Chemical Biology Vol 1 (2007)

Chemical Biology
Edited by
Stuart L. Schreiber,
Tarun M. Kapoor, and Günther Wess
Volume I
Related Titles

Larijani, B., Woscholski, R., Rosser, C. A. (eds.)
Chemical Biology
Applications and Techniques
2006
Hardcover
ISBN 978-0-470-09064-0

Gasteiger, J. (ed.)
Handbook of Chemoinformatics
From Data to Knowledge
2003
Hardcover
ISBN 978-3-527-30680-0

Klipp, E., Herwig, R., Kowald, A., Wierling, C., Lehrach, H.
Systems Biology in Practice
Concepts, Implementation and Application
2005
Hardcover
ISBN 978-3-527-31078-4

Nicolaou, K. C., Hanko, R., Hartwig, W. (eds.)
Handbook of Combinatorial Chemistry
Drugs, Catalysts, Materials
2002
Hardcover
ISBN 978-3-527-30509-4

Kubinyi, H., Muller, G. (eds.)
Chemogenomics in Drug Discovery
A Medicinal Chemistry Perspective
2004
Hardcover
ISBN 978-3-527-30987-0

Beck-Sickinger, A., Weber, P.
Combinatorial Strategies in Biology and Chemistry
2002
Hardcover
ISBN 978-0-471-49726-4

1807-2007 Knowledge for Generations
Each generation has its unique needs and aspirations. When Charles Wiley
first opened his small printing shop in lower Manhattan in 1807, it was a
generation of boundless potential searching for an identity. And we were
there, helping to define a new American literary tradition. Over half a century
later, in the midst of the Second Industrial Revolution, it was a generation
focused on building the future. Once again, we were there, supplying the
critical scientific, technical, and engineering knowledge that helped frame
the world. Throughout the 20th Century, and into the new millennium,
nations began to reach out beyond their own borders and a new international
community was born. Wiley was there, expanding its operations around the
world to enable a global exchange of ideas, opinions, and know-how.
For 200 years, Wiley has been an integral part of each generation’s journey,
enabling the flow of information and understanding necessary to meet their
needs and fulfill their aspirations. Today, bold new technologies are changing
the way we live and learn. Wiley will be there, providing you the must-have
knowledge you need to imagine new worlds, new possibilities, and new
opportunities.
Generations come and go, but you can always count on Wiley to provide you
the knowledge you need, when and where you need it!
William J. Pesce, President and Chief Executive Officer
Peter Booth Wiley, Chairman of the Board
Chemical Biology
From Small Molecules to Systems Biology

and Drug Design
Edited by
Stuart L. Schreiber, Tarun M. Kapoor,
and Günther Wess
WILEY-VCH Verlag GmbH & Co. KGaA

The Editors

Prof. Dr. Stuart L. Schreiber
Howard Hughes Medical Institute
Chemistry and Chemical Biology
Harvard University
Broad Institute of Harvard and MIT
Cambridge, MA 02142
USA

Prof. Dr. Tarun M. Kapoor
Laboratory of Chemistry and Cell Biology
Rockefeller University
1230 York Ave.
New York, NY 10021
USA

Prof. Dr. Günther Wess
GSF - Forschungszentrum fur
Umwelt und Gesundheit
Ingolstadter Landstr. 1
85764 Neuherberg
Germany

All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at <http://dnb.d-nb.de>.

© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form - by photoprinting, microfilm, or any other means - nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Typesetting Laserwords Private Ltd, Chennai, India
Printing betz-druck GmbH, Darmstadt
Binding Litges & Dopf GmbH, Heppenheim
Cover Schulz Grafik-Design, Fussgonheim
Wiley Bicentennial Logo Richard J. Pacifico

Printed in the Federal Republic of Germany
Printed on acid-free paper

ISBN 978-3-527-31150-7
Contents
Preface XV
List of Contributors XVII
Volume 1
Part I Chemistry and Biology - Historical and Philosophical Aspects
1 Chemistry and Biology - Historical and Philosophical Aspects 3

Gerhard Quinkert, Holger Wallmeier, Norbert Windhab, and Dietmar Reichert
1.1 Prologue 3
1.2 Semantics 4
1.2.1 Synthesis - Genesis - Preparation 4
1.2.2 Synthetic Design - Synthetic Execution 8
1.2.3 Preparative Chemistry - Synthetic Chemistry 9
1.3 Bringing Chemical Solutions to Chemical Problems 10
1.3.1 The Present Situation 10
1.3.2 Historical Periods of Chemical Synthesis 12
1.3.3 Diels-Alder Reaction - Prototype of a Synthetically Useful
Reaction 16
1.4 Bringing Chemical Solutions to Biological Problems 18
1.4.1 The Role of Evolutionary Thinking in Shaping Biology 18
1.4.2 On the Sequence of Chemical Synthesis (Preparation) and
Biological Analysis (Screening) 20
1.5 Bringing Biological Solutions to Chemical Problems 45
1.5.1 Proteins [99] 45
1.5.2 Antibodies 52
1.6 Bringing Biological Solutions to Biological Problems 53
1.7 Epilogue 54
1.7.1 The Fossil Fuel Dilemma of Present Chemical Industry 54
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
1.7.2 Two Lessons From the Wealth of Published Total Syntheses 55

Acknowledgments 58
References 59
Part II Using Natural Products to Unravel Biological Mechanisms
2 Using Natural Products to Unravel Biological Mechanisms 71

2.1 Using Small Molecules to Unravel Biological Mechanisms 71
Michael A. Lampson and Tarun M. Kapoor
Outlook 71
2.1.1 Introduction 71
2.1.2 Use of Small Molecules to Link a Protein Target to a Cellular
Phenotype 72
2.1.3 Small Molecules as Probes for Biological Processes 77
2.1.4 Conclusion 89
References 90
2.2 Using Natural Products to Unravel Cell Biology 95

Jonathan D. Gough and Craig M. Crews
Outlook 95
2.2.2 Historical Development 95
2.2.3 General Considerations 96
2.2.4 Applications and Practical Examples 96
2.2.5 Future Development 109
2.2.6 Conclusions 109
Acknowledgments 110
References 110
3 Engineering Control Over Protein Function Using Chemistry 115

3.1 Revealing Biological Specificity by Engineering Protein-Ligand
Interactions 115
Matthew D. Simon and Kevan M. Shokat
Outlook 115
3.1.2 The Selection of Resistance Mutations to Small-molecule Agents 116
3.1.3 Exploiting Sensitizing Mutations to Engineer Nucleotide Binding
Pockets 126
3.1.4 Engineering the Ligand Selectivity of Ion Channels 130
3.1.5 Conclusion 134
References 136
3.2 Controlling Protein Function by Caged Compounds 140

Andrea Giordano, Sirus Zarbakhsh, and Carsten Schultz
3.2.2 Photoactivatable Groups and Their Applications 140
3.2.3 Caged Peptides and Proteins 150
3.2.4 Caged Proteins by Introduction of Photoactive Residues via Site
Directed, Unnatural Amino Acid Mutagenesis 156
3.2.5 Small Caged Molecules Used to Control Protein Activity 159
References 168
3.3 Engineering Control Over Protein Function; Transcription
Control by Small Molecules 174

John T. Koh
Outlook 174
3.3.2 The Role of Ligand-dependent Transcriptional Regulators 175
3.3.3 Engineering New Ligand Specificities into NHRs 179
3.3.4 The Requirement of “Functional Orthogonality” 180
3.3.5 Overcoming Receptor Plasticity 180
3.3.6 Nuclear Receptor Engineering by Selection 183
3.3.7 Ligand-dependent Recombinases 184
3.3.8 Complementation/Rescue of Genetic Disease 186
3.3.9 De Novo Design of Ligand-binding Pockets 188
3.3.10 Light-activated Gene Expression from Small Molecules 189
References 191
4 Controlling Protein-Protein Interactions 199

4.1 Chemical Complementation: Bringing the Power of Genetics to
Chemistry 199
Pamela Peralta-Yahya and Virginia W. Cornish
Outlook 199
4.1.2 History/Development 202
4.1.4 Applications 216
References 223
4.2 Controlling Protein-Protein Interactions Using Chemical
Inducers and Disrupters of Dimerization 227

Tim Clackson
Outlook 227
4.2.2 Development of Chemical Dimerization Technology 228
4.2.3 Dimerization Systems 229
4.2.4 Applications 237
Acknowledgments 246
References 246
4.3 Protein Secondary Structure Mimetics as Modulators of

Protein-Protein and Protein-Ligand Interactions 250
Hang Yin and Andrew D. Hamilton
Outlook 250
4.3.2 History and Development 251
4.3.5 Future Developments 264
Acknowledgments 265
References 265
5 Expanding the Genetic Code 271

5.1 Synthetic Expansion of the Central Dogma 271
Masahiko Sisido
Outlook 271
5.1.2 Aminoacylation of tRNA with Nonnatural Amino Acids 274
5.1.2.2 Micelle-mediated Aminoacylation 275
5.1.2.3 Ribozyme-mediated Aminoacylation 276
5.1.2.4 PNA-assisted Aminoacylation 277
5.1.2.5 Directed Evolution of Existing aaRS/tRNA Pair to Accept
Nonnatural Amino Acids 278
5.1.3 Other Biomolecules That Must Be Optimized for Nonnatural
Amino Acids 281
5.1.3.2 Adaptability of EF-Tu to Aminoacyl-tRNAs Carrying a Wide
Variety of Nonnatural Amino Acids 283
5.1.3.3 Adaptability of Ribosome to Wide Variety of Nonnatural Amino
Acids 283
5.1.4 Expansion of the Genetic Codes 284
5.1.4.2 Four-base Codons 285
5.1.4.3 “Synthetic Codons” That Contain Nonnatural
Nucleobases 286
5.1.5 In vivo Synthesis of Nonnatural Mutants 287
5.1.6 Application of Nonnatural Mutagenesis - Fluorescence

Labeling 289
5.1.7 Future Development and Conclusion 291
Acknowledgments 291
References 291
Part III Engineering Control Over Protein Function Using Chemistry
6 Forward Chemical Genetics 299

Stephen J. Haggarty and Stuart L. Schreiber
Outlook 299
6.1 Introduction 299
6.2 History/Development 302
6.3 General Considerations 307
6.3.1 Small Molecules as a Means to Perturb Biological Systems
Conditionally 307
6.3.2 Forward and Reverse Chemical Genetics 308
6.3.3 Phenotypic Assays for Forward Chemical-Genetic
Screening 312
6.3.4 Nonheritable and Combinations of Perturbations 316
6.3.5 Multiparametric Considerations: Dose and Time 318
6.3.6 Sources of Phenotypic Variation: Genetic versus Chemical
Diversity 318
6.3.7 The “Target Identification” Problem 329
6.3.8 Relationship between Network Connectivity and Discovery of
Small-molecule Probes 323
6.3.9 Computational Framework for Forward Chemical Genetics:
Legacy of Morgan and Sturtevant 325
6.3.10 Mapping of Chemical Space Using Forward Chemical
Genetics 326
6.3.11 Dimensionality Reduction and Visualization of Chemical
Space 330
6.3.12 Discrete Methods of Analysis of Forward Chemical-genetic
Data 334
6.4 Applications and Practical Examples 336
6.4.1 Example 1: Mitosis and Spindle Assembly 336
6.4.2 Example 2: Protein Acetylation 338
6.4.3 Example 3: Chemical-genomic Profiling 340
6.5 Future Development 344
6.6 Conclusion 347
Acknowledgments 348
References 349
7 Reverse Chemical Genetics Revisited 355

7.1 Reverse Chemical Genetics - An Important Strategy for the
Study of Protein Function in Chemical Biology and Drug
Discovery 355
Rolf Breinbauer, Alexander Hillisch, and Herbert Waldmann
7.1.5 Future Developments 376
Acknowledgments 380
References 380
7.2 Chemical Biology and Enzymology: Protein Phosphorylation as a
Case Study 385

Philip A. Cole
Outlook 385
7.2.1 Overview 385
7.2.2 The Enzymology of Posttranslational Modifications
of Proteins 387
References 401
7.3 Chemical Strategies for Activity-based Proteomics 403

Nadim Jessani and Benjamin F. Cravatt
Outlook 403
Acknowledgments 423
References 423
8 Tags and Probes for Chemical Biology 427

8.1 The Biarsenical-tetracysteine Protein Tag: Chemistry
and Biological Applications 427
Stephen R. Adams
Outlook 427
8.1.2 History and Design Concepts of the Tetracysteine-biarsenical
System 429
8.1.4 Practical Applications of the Biarsenical-tetracysteine System
439
8.1.5 Future Developments and Applications 453
Acknowledgments 454
References 454
8.2 Chemical Approaches to Exploit Fusion Proteins for Functional

Studies 458
Anke Arnold, India Sielaff, Nils Johnsson, and Kai Johnsson
Outlook 458
8.2.4 Conclusions and Future Developments 476
Acknowledgments 477
References 477
Volume 2
Part IV Controlling Protein-Protein Interactions
9 Diversity-oriented Synthesis 483
9.1 Diversity-oriented Synthesis 483
Derek S. Tan
9.2 Combinatorial Biosynthesis of Polyketides and Nonribosomal

Peptides 519
Nathan A. Schnarr and Chaitan Khosla
10 Synthesis of Large Biological Molecules 537

10.1 Expressed Protein Ligation 537
Matthew R. Pratt and Tom W. Muir
10.2 Chemical Synthesis of Proteins and Large Bioconjugates 567

Philip Dawson
10.3 New Methods for Protein Bioconjugation 593

Matthew B. Francis
11 Advances in Sugar Chemistry 635

11.1 The Search for Chemical Probes to Illuminate Carbohydrate
Function 635
Laura L. Kiessling and Erin E. Carlson
11.2 Chemical Glycomics as Basis for Drug Discovery 668

Daniel B. Werz and Peter H. Seeberger
12 The Bicyclic Depsipeptide Family of Histone Deacetylase
Inhibitors 693

Paul A. Townsend, Simon J. Crabb, Sean M. Davidson, Peter W. M.
Johnson, Graham Packham, and Arasu Ganesan
Part V Expanding the Genetic Code
13 Chemical Informatics 723

13.1 Chemical Informatics 723
Paul A. Clemons
13.2 WOMBAT and WOMBAT-PK Bioactivity Databases for Lead and

Drug Discovery 760
Marius Olah, Ramona Rad, Liliana Ostopovici, Alina Bora, Nicoleta
Hadaruga, Dan Hadaruga, Ramona Moldovan, Adriana Fulias,
Maria Mracec, and Tudor I. Oprea
Volume 3
Part VI Forward Chemical Genetics
14 Chemical Biology and Drug Discovery 789

14.1 Managerial Challenges in Implementing Chemical Biology
Platforms 789
Frank L. Douglas
14.2 The Molecular Basis of Predicting Druggability 804

Bissan Al-Lazikani, Anna Gaulton, Gaia Paolini, Jerry Lanfear,
John Overington, and Andrew Hopkins
15 Target Families 825

15.1 The Target Family Approach 825
Hans Peter Nestler
15.2 Chemical Biology of Kinases Studied by NMR Spectroscopy 852

Marco Betz, Martin Vogtherr, Ulrich Schieborr, Bettina Elshorst,
Susanne Grimme, Barbara Pescatore, Thomas Langer, Krishna Saxena,
and Harald Schwalbe
15.3 The Nuclear Receptor Superfamily and Drug Discovery 891

John T. Moore, Jon L. Collins, and Kenneth H. Pearce
15.4 The GPCR - 7TM Receptor Target Family 933

Edgar Jacoby, Rochdi Bouhelal, Marc Gerspacher, and Klaus Seuwen
15.5 Drugs Targeting Protein-Protein Interactions 979

Patrick Chène
16 Prediction of ADMET Properties 1003

Ulf Norinder and Christel A. S. Bergstrom
Part VII Reverse Chemical Genetics Revisited
17 Computational Methods and Modeling 1045

17.1 Systems Biology of the JAK-STAT Signaling Pathway 1045
Jens Timmer, Markus Kollmann, and Ursula Klingmüller
17.2 Modeling Intracellular Signal Transduction Processes 1061

Jason M. Haugh and Michael C. Weiger
18 Genome and Proteome Studies 1083

18.1 Genome-wide Gene Expression Analysis: Practical Considerations
and Application to the Analysis of T-cell Subsets in Inflammatory
Diseases 1083
Lars Rogge and Elisabetta Bianchi
18.2 Scanning the Proteome for Targets of Organic Small Molecules

Using Bifunctional Receptor Ligands 1118
Nikolai Kley
Part VIII Tags and Probes for Chemical Biology
19 Chemical Biology - An Outlook 1143

Günther Wess
Index 1151
Preface
Small molecules are at the heart of chemical biology. The contributions in

this book reveal the many ways in which chemical biologists’ studies of small
molecules in the context of living systems are transforming science and society.
Macromolecules are the basis of heritable information flow in living systems.
This is evident in the Central Dogma of biology, where heritable information is
replicated via DNA and flows from DNA to RNA to proteins. Small molecules
are the basis for dynamic information flow in living systems. They constitute
the hormones and neurotransmitters, many intra- and intercellular signaling
molecules, the defensive and offensive “natural products” used in information
flow between organisms, among many others. They are the basis for memory
and cognition, sensing and signaling, and, of course, for many of the most
effective therapeutic agents.
One dominant theme in many of the chapters concerns small molecules
and small-molecule screening. Together, these have dramatically affected life-
science research in recent years. Many of the contributors to Chemical Biology
themselves both provided new tools for understanding living systems and
effected smoother transitions from biology to medicine. The chapters they
have provided offer riveting examples of the field’s impact on life science.
The range of approaches and the creativity that fueled these projects are
truly inspiring. After a period of widely recognized advances by geneticists
and molecular and disease biologists, chemists and chemical biologists are
returning to a position of prominence in the consciousness of the larger
scientific community.
The trend towards small molecules and small-molecule screening has
resulted in an urgent need for advances in synthetic planning and methodology.
Synthesis routes are needed for candidate small molecules and for improved
versions of candidates identified in biological discovery efforts. Several
contributors give hints to the question: How do we synthesize candidate
structures most effectively poised for optimization? They note that planning
and performing multi-step syntheses of natural products in the past resulted
in the recognition and, often, resolution of gaps in synthetic methodology. The
synergistic relationship between organic synthesis planning and methodology
is even more profound as synthetic organic chemists tackle the new challenges
noted above. The objects of synthesis planning, no longer limited by the
biochemical transformations used by cells in synthesizing naturally occurring
small molecules, require radically new strategies and methodologies.
Several contributors help us answer a related question that also influences
synthetic planning: What are the structural features of small, organic molecules
most likely to yield specific modulation of disease-relevant functions? They
note that the ability to assess the performance of these compounds, and to
compare their performance to other small molecules such as commercially
available or naturally occurring ones, is possible through public small-molecule
screening efforts and public small-molecule databases (e.g., WOMBAT,
PubChem, ChemBank). These developments are reminiscent of the early
stage of genomics research, where visionary scientists recognized the need to
create a culture of open data sharing and to develop public data repositories
(e.g., GenBank) and analysis environments (e.g., Ensembl, UCSC Genome
Browser).
Sometimes the line between small and macromolecules is blurred.
Oligosaccharides are often presented as a third class of macromolecules, yet
several contributions here reveal arguably greater similarities of carbohydrates
to small-molecule terpenes than to nucleic acids and proteins, both in terms
of their biosynthesis and cellular functions. Oligosaccharides are shown to be
synthesized by glycosyl transferases (analogous to isopentenyl pyrophosphate
transferases used in terpene biosynthesis) and, like the terpenes, are subject
to tailoring enzymes. Transferase enzymes are used to attach oligosaccharides
and terpenes to proteins, where they serve key functions (e.g., glycoproteins,
farnesylated Ras). Chemical biologists have illuminated and manipulated
oligosaccharides and the unquestionable member of the macromolecule
family, the proteins, with great aplomb. Several of our contributors are
pioneers in the revolution of protein chemistry and protein engineering, and
their chapters provide clear testimony to the consequences of these advances
to life science. Finally, in examining the similarities of and synergies between
chemical biology and systems biology, several of our contributors have perhaps
offered a glimpse into the future of these fields.
Stuart L. Schreiber, Cambridge January 2007

Tarun M. Kapoor, New York
Günther Wess, Neuherberg
List of Contributors
Stephen R. Adams
Department of Pharmacology
University of California, San Diego
310 George Palade Laboratories 0647
La Jolla, CA 92093-0647
USA

Anke Arnold
Ecole Polytechnique Federale de Lausanne (EPFL)
Institute of Chemical Sciences and Engineering
1011 Lausanne
Switzerland

Christel A. S. Bergstrom
AstraZeneca R&D
Discovery Medicinal Chemistry
15185 Sodertalje
Sweden

Marco Betz
Center for Biomolecular Magnetic Resonance
and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Elisabetta Bianchi
Immunoregulation Laboratory
Department of Immunology
Institute Pasteur
25, rue du Dr. Roux
75724 Paris Cedex 15
France

Alina Bora
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Rochdi Bouhelal
Novartis Institutes for BioMedical Research
Lichtstrasse 35
4056 Basel
Switzerland

Rolf Breinbauer
Institute of Organic Chemistry
University of Leipzig
Johannisallee 29
04103 Leipzig
Germany

Erin E. Carlson
Department of Chemistry
University of Wisconsin
1101 University Avenue
Madison, WI 53706
USA

Patrick Chène
Oncology Research
Novartis Institutes for Biomedical Research
4002 Basel
Switzerland

Tim Clackson
ARIAD Pharmaceuticals, Inc.
26 Landsdowne Street
Cambridge, MA 02139-4234
USA

Paul A. Clemons
Chemical Biology
Broad Institute of Harvard & MIT
7 Cambridge Center
Cambridge Center, MA 02142
USA

Philip A. Cole
Department of Pharmacology
Johns Hopkins School of Medicine
725 N. Wolfe St.
Baltimore, MD 21205
USA

Jon L. Collins
Discovery Research
GlaxoSmithKline Discovery Research
Research Triangle Park, NC 27709
USA

Virginia W. Cornish
Department of Chemistry
Columbia University
3000 Broadway, MC 3167
New York, NY 10027-6948
USA

Simon J. Crabb
School of Chemistry
University of Southampton
Highfield
Southampton SO17 1BJ
United Kingdom

Craig M. Crews
Yale University
School of Medicine
333 Cedar Street
New Haven, CT 06510
USA

Benjamin F. Cravatt
Neuro-Psychiatric Disorder Institute
The Skaggs Institute for Chemical Biology
The Scripps Research Institute
BCC 159
10550 North Torrey Pines Rd.
La Jolla, CA 92037
USA

Sean M. Davidson
The Hatter Cardiovascular Institute
67 Chenies Mews
University College Hospital
London WC1E 6DB
United Kingdom

Philip Dawson
Department of Cell Biology and Chemistry
The Scripps Research Institute
10550 N. Torrey Pines Road
La Jolla, CA 92037
USA

Frank L. Douglas
Aventis Pharma
Industriepark Hochst
65926 Frankfurt
Germany

Bettina Elshorst
Center for Biomolecular Magnetic Resonance
and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Matthew B. Francis
Department of Chemistry
University of California, Berkeley
Berkeley, CA 94720-1460
USA

Adriana Fulias
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Arasu Ganesan
School of Chemistry
University of Southampton
Highfield
Southampton SO17 1BJ
United Kingdom

Anna Gaulton
Pfizer Global Research and Development
Pfizer Ltd.
Sandwich, Kent, CT13 9NJ
United Kingdom

Marc Gerspacher
Novartis Institutes for BioMedical Research
Klybeckstrasse 141
4057 Basel
Switzerland

Andrea Giordano
European Molecular Biology Laboratory
Gene Expression Programme
Meyerhofstr. 1
69117 Heidelberg
Germany

Jonathan D. Gough
Yale University
Department of Molecular, Cellular, and Developmental Biology
Kline Biology Tower 442
New Haven, CT 06520-8103
USA

Susanne Grimme
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Dan Hadaruga
Division of Biocomputing
University of New Mexico
School of Medicine, MSC11 6445
Albuquerque, NM 87131
USA

Nicoleta Hadaruga
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Stephen J. Haggarty
Broad Institute of Harvard and MIT
320 Bent Street
Cambridge, MA 02141
USA

Andrew D. Hamilton
Department of Chemistry
Yale University
225 Prospect St.
New Haven, CT 06520-8107
USA

Jason M. Haugh
Department of Chemical and Biomolecular Engineering
North Carolina State University
Raleigh, NC 27695-7905
USA

Alexander Hillisch
Bayer Healthcare AG
PH-GDD-EURC-CR
Aprather Weg 18a
42096 Wuppertal
Germany

Andrew Hopkins
Pfizer Global Research and Development
Pfizer Ltd.
Sandwich, Kent, CT13 9NJ
United Kingdom

Edgar Jacoby
Novartis Institute for Biomedical Research
Lichtstrasse 35
4056 Basel
Switzerland

Nadim Jessani
Department of Cell Biology
Celera
180 Kimball Way
South San Francisco, CA 94080
USA

Kai Johnsson
Ecole Polytechnique Federale de Lausanne (EPFL)
Institute of Chemical Sciences and Engineering
1011 Lausanne
Switzerland

Nils Johnsson
Center for Molecular Biology of Inflammation
Institute of Medical Biochemistry
University of Muenster
Von-Esmarch-Str. 56
48149 Muenster
Germany

Peter W. M. Johnson
School of Chemistry
University of Southampton
Highfield
Southampton SO17 1BJ
United Kingdom

Tarun M. Kapoor
Laboratory of Chemistry and Cell Biology
Rockefeller University
Flexner Hall
1230 York Ave.
New York, NY 10021
USA

Laura L. Kiessling
Department of Chemistry
University of Wisconsin
1101 University Avenue
Madison, WI 53706
USA

Nikolai Kley
GPC Biotech, Inc.
610 Lincoln Street
Waltham, MA 02451
USA

Chaitan Khosla
Department of Chemistry
Stanford University
381 North South Mall
Stanford, CA 94305
USA

Ursula Klingmüller
German Cancer Research Center (DKFZ)
Im Neuenheimer Feld 280
69120 Heidelberg
Germany

John T. Koh
Department of Chemistry and Biochemistry
University of Delaware
Newark, DE 19716
USA

Markus Kollmann
Physics Institute
Hermann-Herder-Str. 3
79104 Freiburg
Germany

Michael A. Lampson
Laboratory of Chemistry and Cell Biology
Rockefeller University
Flexner Hall
1230 York Ave.
New York, NY 10021
USA

Jerry Lanfear
Pfizer Global Research and Development
Pfizer Ltd.
Sandwich, Kent, CT13 9NJ
United Kingdom

Thomas Langer
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Bissan Al-Lazikani
Inpharmatica Ltd.
60 Charlotte Street
London, W1T 2NU
United Kingdom

Ramona Moldovan
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

John T. Moore
Discovery Research
GlaxoSmithKline Discovery Research
Research Triangle Park, NC 27709
USA

Maria Mracec
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Tom W. Muir
The Rockefeller University
1230 York Avenue
New York, NY 10021
USA

Hans Peter Nestler
Sanofi aventis
Combinatorial Technologies Center
1580 East Hanley Blvd.
Tucson, AZ 85737
USA

Ulf Norinder
AstraZeneca R&D
Discovery Medicinal Chemistry
15185 Sodertalje
Sweden

Marius Olah
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Tudor I. Oprea
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Liliana Ostopovici
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

John Overington
Inpharmatica Ltd.
60 Charlotte Street
London, W1T 2NU
United Kingdom

Graham Packham
School of Chemistry
University of Southampton
Highfield
Southampton SO17 1BJ
United Kingdom

Gaia Paolini
Pfizer Global Research and Development
Pfizer Ltd.
Sandwich, Kent, CT13 9NJ
United Kingdom

Kenneth H. Pearce
Gene Exp. and Protein Chem.
GlaxoSmithKline Discovery Research
Research Triangle Park, NC 27709
USA

Pamela Peralta-Yahya
Department of Chemistry
Columbia University
3000 Broadway, MC 3167
New York, NY 10027-6948
USA

Barbara Pescatore
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Matthew R. Pratt
Laboratory of Synthetic Protein Chemistry
The Rockefeller University
New York, NY 10021
USA

Ramona Rad
Division of Biocomputing
University of New Mexico
School of Med, MSC11 6445
Albuquerque, NM 87131
USA

Dietmar Reichert
Degussa AG
Exclusive Synthesis & Catalysis
Rodenbacher Chaussee 4
63457 Hanau
Germany

Lars Rogge
Immunoregulation Laboratory
Department of Immunology
Institute Pasteur
25, rue du Dr. Roux
75724 Paris Cedex 15
France

Gerhard Quinkert
Institut fur Organische Chemie und Chemische Biology
Johann Wolfgang Goethe Universitat
Marie-Curie-Str. 11
60439 Frankfurt
Germany

Krishna Saxena
Center for Biomolecular Magnetic Resonance
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Ulrich Schieborr
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Nathan A. Schnarr
Department of Chemistry
Stanford University
381 North South Mall
Stanford, CA 94305
USA

Harald Schwalbe
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Stuart L. Schreiber
Howard Hughes Medical Institute
Department of Chemistry and Chemical Biology
Harvard University
Broad Institute of Harvard and MIT
Cambridge, MA 02142
USA

Carsten Schultz
European Molecular Biology Laboratory
Gene Expression Programme
Meyerhofstr. 1
69117 Heidelberg
Germany

Peter H. Seeberger
Laboratory for Organic Chemistry
Swiss Federal Institute of Technology Zurich
ETH-Honggerberg
HCI F315
Wolfgang-Pauli-Str. 10
8093 Zurich
Switzerland

Klaus Seuwen
Novartis Institutes for BioMedical Research
Lichtstrasse 35
4056 Basel
Switzerland

Kevan M. Shokat
Department of Cellular and Molecular Pharmacology
UC San Francisco
600 16th Street, Box 2280
San Francisco, CA 90143-2280
USA

India Sielaff
Ecole Polytechnique Federale de Lausanne (EPFL)
Institute of Chemical Sciences and Engineering
1011 Lausanne
Switzerland

Matthew D. Simon
Department of Cellular and Molecular Pharmacology
UC San Francisco
600 16th Street, Box 2280
San Francisco, CA 90143-2280
USA

Masahiko Sisido
Department of Bioscience and Biotechnology
Okayama University
3-1-1 Tsushimanaka
Okayama 700-8530
Japan

Derek S. Tan
Laboratory of Chemistry and
Chemical and Chemical Genetic
Sloan-Kettering Cancer Center
1275 York Ave. RRL 1317
New York, NY 10021
USA

Jens Timmer
Physics Institute
Hermann-Herder-Str. 3
79104 Freiburg
Germany

Paul A. Townsend
School of Chemistry
University of Southampton
Highfield
Southampton SO17 1BJ
United Kingdom

Martin Vogtherr
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Herbert Waldmann
MPI of Molecular Physiology
University of Dortmund
Otto-Hahn-Str. 11
44227 Dortmund
Germany

Holger Wallmeier
Aventis Pharma Deutschland GmbH
Research & Technologies
Industriepark Hochst, K801
65926 Frankfurt am Main
Germany

Michael C. Weiger
Department of Chemical and Biomolecular Engineering
North Carolina State University
Raleigh, NC 27695-7905
USA

Daniel B. Werz
Laboratory for Organic Chemistry
Swiss Federal Institute of Technology Zurich
ETH-Honggerberg
HCI F315, Wolfgang-Pauli-Str. 10
8093 Zurich
Switzerland

Günther Wess
GSF - Forschungszentrum fur Umwelt und Gesundheit
Ingolstadter Landstr. 1
85764 Neuherberg
Germany

Norbert Windhab
Degussa AG
CREAVIS
Rodenbacher Chaussee 4
63457 Hanau
Germany


Hang Yin
Department of Chemistry
Yale University
225 Prospect St.
New Haven, CT 06520-8107
USA

Sirus Zarbakhsh
European Molecular Biology Laboratory
Gene Expression Programme
Meyerhofstr. 1
69117 Heidelberg
Germany

PART I
Introduction
Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
1
Chemistry and Biology - Historical and Philosophical
Aspects
Gerhard Quinkert, Holger Wallmeier, Norbert Windhab, and Dietmar Reichert
Dedicated to Profs. Helmut Schwarz and Utz-Hellmuth Felcht on the occasion of their
respective 60th birthdays.
1.1
Prologue
The reductionistic attitude of philosophers [1] has given way to the emergence-based
thinking [2] of biologists. In place of the view that phenomena occurring
at a higher level in a complex system [3] with hierarchically structured levels of
organization can also be described by rules and in terms of concepts already
verified at a lower level, it has come to be accepted that some of these rules
or concepts may be altered or even gained in the transition from lower to
higher level. This applies even in the case of the structural and functional
basic unit of all biological systems: the living cell. The living cell is a protected
region in which diverse ensembles of molecules interact with one another
in a harmony achieved through self-assembly [4]. The reality of the cell, with
its overlapping functional networks [5] (for regulation of metabolism, signal
transduction, or gene expression, for example) can serve as a model. The
question of the hierarchical organization of such networks arises. Top-down
analysis proceeds in the direction of decreasing complexity of the biological
systems, a cell, a tissue, or even an organism, step by step all the way down to
the level of molecules underlying their intra- and intermolecular interactions.
From chemistry’s molecules and supermolecules bottom-up synthesis starts
in the direction of increasing complexity to reach the totality of the cell and
its higher organizations emerging through modular motifs and supramodular
functional units [6]. Bottom-up synthesis and top-down analysis are signposts
for changes in complexity in emergent systems, lending themselves not only
to narrative representation of what is, but also to reflective conjecture on why
something is as it is.
The interdisciplinary union of the worlds of chemistry and of biology has
to begin with the different entry points to the two disciplines. In the world
of chemistry, for material atoms and their associated interactions within and
between molecules, the crucial aid is the open sesame represented by the periodic
system of the chemical elements. In the world of biology, the fundamental
information flow and the associated ascent from the biochemical network
of metabolism to the biological network of genetic information transfer can
be deciphered by the Rosetta Stone that is the genetic code. Fundamental
to this is the understanding that in biology - as in cosmology¹⁾, but wholly
different in chemistry (and physics) - earlier historical events influence future
developments. It is a characteristic of historical events that they may have
been played out completely differently under other circumstances. In such
cases, it is reasonable to ask why questions. Why did Darwinian evolution
eventually come to entrust its further fate to the chemistries of two polymer
types, nucleic acids and proteins, and their later collaboration in a ribosome?
Why did the dice fall in favor of a genetic code with triplet character? Why
did protein genesis satisfy itself with the 20 canonical amino acids? For a
transdisciplinary perspective it is worth addressing such cases in which the
emergence of chemistry (or, more precisely, biochemistry) into biology (or,
more precisely, molecular biology) signifies a tipping point. This came about
with the appearance of macromolecules possessing the aptitude to store and
distribute information and to translate it into catalytic function [8a]. It became
manifest as awareness grew of the double-faceted nature of protein synthesis:
as an enzymatic chain of chemical reaction steps in biochemical space and as
a genetic information transfer process in molecular biological space [9].
This essay deals with the structures and functions of material things
produced by chemical or biological means. While the products obtained
in both routes are comparable, if not identical, the production facilities
differ substantially. As facilities of human design, they happen to be formed by
machines in the laboratory or in the factory; as facilities of Darwinian evolution,
they start to exist in generative supermolecules of the living world. Having
distinguished the generation of natural products by supramolecular facilities
built up by self-assembly of complementary molecules from the production of
materials in man-made facilities, it seems appropriate to add a brief excursion
into semantics.
1) The developments of stars and galaxies offer no analog to Darwinian evolution by natural selection, of course [7].
1.2
Semantics
1.2.1
Synthesis - Genesis - Preparation
By a chemical reaction, whether it takes place in a laboratory, in a factory, or in
a living cell, an educt is converted into a product. If the product is structurally
more complex than the related educt, the conversion is called a construction
(in biochemistry: an anabolic pathway). In contrast, the conversion is called
a degradation (in biochemistry: a catabolic pathway), if the product is less
complex than the related educt. According to another classification, one may
distinguish between synthesis, genesis, and preparation. While execution follows
a subtle plan in the first and instructions of a naturally selected program
in the second case, tinkering takes place in the last instance. That such a
differentiation may prove useful to the keen mind of a synthetic chemist is
demonstrated by the example of the natural dye, indigo.
While its first offspring is often popularly held to be urea, synthetic chemistry
actually began in the last quarter of the nineteenth century, with the production
of artificial indigo [10]. This dissent can be resolved if consensus is reached
on what should be understood by the term synthesis in organic chemistry
[11]. If it is taken to mean an attempt to construct a previously decided upon
target molecule with a known structure from a suitable starting molecule (or
molecules) according to some plan [12], the choice has to be for indigo. Urea, in
contrast, was discovered by chance as an isomerization product of ammonium
cyanate by Wöhler [13] in 1828, and was not in any way prepared intentionally
[14]. This qualification, however, does not mean that the urea synthesis can be
discounted as inconsequential. On the contrary, Friedrich Wöhler’s production
of artificial urea from hydrogen cyanate and ammonia in 1828 was a key
discovery for the dawning chemical sciences, and researchers at the
ever-advancing frontiers of the science have to this day venerated the narrative
connection between Wöhler’s urea synthesis and their own new findings and
future perspectives. What historians like to unmask as a benign legend [14]
serves scientists as a rhetorical shorthand and metaphorical paraphrase.
In the industrially used Heumann-Pfleger synthesis, N-phenylglycine 1,
readily accessible from aniline, is transformed through indoxyl 2 into indigo 3
in a targeted fashion (Scheme 1-1).
Scheme 1-1 Industrial production of indigo 3 by the Heumann-Pfleger synthesis [15]: from 1 via 2 to 3.
This process represents the culmination of a development first set in motion
in the laboratories of the München University under Adolf Baeyer. Baeyer
had begun his efforts to prepare indigo in the laboratory at a time (before
1883) when the constitution of indigo was not even known [16], starting his
endeavors with degradation products (aniline, anthranilic acid, isatin) obtained
by the application of one of the usual degradative methods (alkali melt, effect
of oxidizing agents) to the naturally occurring dyestuff. These degradation
products were treated with an extraordinarily broad range of chemicals in
a form of intuitive combinatorial process, to examine whether the resulting
products would contain 3. In this way, Baeyer and Emmerling succeeded in
transforming isatin 10 into 3 in 1870. The preparation of 10 (from phenylacetic
acid 4: 1878) was, however, too elaborate to be commercially viable (Scheme 1-2).
Scheme 1-2 Laboratory studies of the preparation of indigo 3 by A. (von) Baeyer and his colleagues.
As long as the constitution of a target molecule is unknown, the above
definition of a synthesis is inadmissible. The sequence of reactions depicted in
Scheme 1-2, however, characterizes a venture that serves for the preparation
of indigo. Two other pathways that afforded indigo in the laboratory were also
not industrially viable. A. von Baeyer encouraged BASF and Farbwerke Hoechst
to undertake a systematic search for an industrial synthesis of artificial indigo
(the constitution of which had meanwhile been established) in competition
with one another. This was finally achieved in a strategically clear and tactically
flexible manner through the already mentioned Heumann-Pfleger synthesis
(Scheme 1-1). It was envisaged that the artificial preparation of dyes from coal
tar should become a source of national wealth. Baeyer’s München University
laboratories and the two representatives of Germany’s flowering chemical
industry had exchanged ideas and experiences on a previously unknown scale
and had thus passed the test for a collaboration in partnership. In 1905, Adolf
von Baeyer was awarded the Nobel Prize for Chemistry for his contribution to
the development of organic chemistry and the chemical industry.
It has thus been demonstrated that the example of indigo is suitable for
conceptual differentiation between molecule construction according to a plan
(synthesis) and one without a plan (preparation). It can also provide an
illustration, based on the different character of the synthetic steps involved,
of differentiation between chemical and biological synthesis steps within the
overall indigo syntheses. Chemical synthesis steps [17a] can be understood
to include transformations achieved not only through the use of reagents or
catalysts prepared by chemists but also those in which enzymes, antibodies,
or even dead cells are used. Synthesis steps in which the synthetic capabilities
of living cells, either possessing their original genomes or new recombinant
variants, are deployed in a targeted manner, are classified as a part of biological
synthesis [17a]. Indigo was synthesized biologically in 1983 (Scheme 1-3) [18].
Biological indigo synthesis made use of an Escherichia coli strain with a
recombinant genome, being capable of converting aromatic hydrocarbons in
general into cis-1,2-dihydrodiols and, in particular, indole (obtained from
tryptophan 11 with the aid of tryptophanase) into cis-2,3-dihydroxy-2,3-dihydroindole 13.
The recombinant E. coli strain was augmented with the genes
expressing naphthalene dioxygenase from Pseudomonas putida. The initially
produced oxidation product spontaneously loses water, and the resulting
indoxyl 2 is converted by aerial oxidation into 3, which can be taken up into
organic solvents.
Scheme 1-3 Formation of indigo 3 in a recombinant strain of E. coli: tryptophan 11 is converted by tryptophanase into indole 12, and 12 by naphthalene dioxygenase into cis-2,3-dihydroxy-2,3-dihydroindole 13; loss of H2O gives indoxyl 2, and air oxidation gives 3.
Scheme 1-4 On the formation of indigo 3: from indole-3-glycerol phosphate via 12 and 2 to 3.
After the discussion on the biological synthesis of indigo with the aid
of a recombinant E. coli strain, one question still remaining relates to the
programmed genesis of indigo precursors in plants. Plants cultivated for indigo
production contain 2, stabilized by glycosylation (e.g., as indican = indoxyl
β-D-glucoside or as isatan B = indoxyl 5-ketogluconate) [19]. Indoxyl on its
part is produced from indole-3-glycerol phosphate [20] (Scheme 1-4) and that
in turn by the chorismate pathway.
This essay deals not only with preparation (intuitive) and synthesis (planned)
but also with genesis (programmed). Such (genetically and somatically
regulated) programs have arisen through Darwinian evolution. A plan for
a synthesis is devised by a synthetic chemist as designer and enacted by the
synthetic chemist as molecule maker. How is a synthesis planned?
1.2.2
Synthetic Design - Synthetic Execution
Unlike the bottom-up-oriented execution of a synthesis, involving real
molecules, the designing of a synthesis is a top-down event using virtual
structures²⁾. Design begins with the target structure and moves through a
greater or lesser number of intermediate structures to the starting structure,
with the complexity generally decreasing. The starting structure is worthy of
that name, once it can reasonably be said to represent a comfortably accessible
starting molecule for the carrying out of the synthesis. E. J. Corey coined some
terms for top-down-oriented synthesis design intended to highlight
the fact that retrosynthetic structure analysis and synthetic building up of the
molecule are concurrent processes. Whilst bottom-up synthesis takes place with
molecules and in synthetic steps through the deployment of suitable synthetic
building blocks, from the appropriate starting molecule to the resulting target
molecule, top-down retrosynthesis operates with structures and in transformation
steps through the identification of appropriate retron structure elements, from the
particular target structure to the resulting starting structure. Some of Corey’s
achievements through his endeavors in the logic of synthesis [21] include:
• the fact that organic synthesis can be taught [22] even where it is not actively practiced;
• the availability of computer-aided synthesis planning [23] as a procedure to generate a population of synthesis plans from which the synthetic chemist can select the best one to use; and
• his being awarded the 1990 Nobel Prize for Chemistry for development and methodology of organic synthesis.
2) Differentiation between abstract structures and concrete molecules will also pay for itself in other circumstances.
Twenty-five years earlier, R. B. Woodward had been awarded the Chemistry
Nobel Prize for his outstanding achievements in the art of organic synthesis.
Woodward’s categorical imperative [12] - Synthesis must always be carried out by
plan - rapidly became the sign of the coming generation of natural products’
synthesis chemists. His qualifying statement in the following sentence can
easily go unremarked: “The synthetic frontier can be defined only in terms
of the degree to which realistic planning is possible”. This is probably the
reason for Woodward’s comment at the end of his essay on the total synthesis of
chlorophyll [24a]: “At the beginning there was detailed synthetic planning. The
degree to which our plans proved realizable is very gratifying, but laboratory
discoveries and knowledge obtained from observation and experimentation
contributed at least as much to the advancement of our studies. We learned and
found out much that would previously not have been knowable or at best would
have been only approximately imaginable.” Elsewhere he sounds the Leitmotif
of natural products synthesis [24b]: “In our time many organic chemists
address themselves explicitly to mechanistic and theoretical problems - and
make outstanding contributions in so doing - it should not be forgotten that
questions too self-consciously asked of Nature may well receive subconsciously
determined answers - answers which only with difficulty contain more than
was presupposed in the questions. It is important to keep open the avenues
for innovation and surprise.”
1.2.3
Preparative Chemistry - Synthetic Chemistry
The terms preparative chemistry and synthetic chemistry are often used
synonymously. We wish to draw some distinction between them: in preparative
chemistry we see a rich fund of knowledge from which the synthetic chemist
can draw, gained from work on chemical reactions. The preparative chemist is
concerned with broadly aimed investigations geared toward the discovery of
chemical reactions and the development and improvement of already known
ones. A chemical reaction may qualify as “mature” [17a] if it is capable of
transforming a starting compound of not too restricted substrate specificity in
a predictable manner:
• under easily maintainable reaction conditions;
• as far as possible with the use of substoichiometric proportions of effective catalysts;
• without restriction to a particular scale;
• with high chemical yield; and
• with high regio- and stereospecificity
into an envisaged product. There is now such an extensive available reservoir
of preparatively useful reactions of this level of comprehensiveness that for the
construction of molecular skeletons it appears expedient to switch to a handful of
trusted reactions in the first instance [25]. In the introduction, modification, and
elimination of functional groups, the a priori restriction to only a few methods
is already becoming more difficult.
Organic synthesis presupposes a substantial body of knowledge, usually
developed through bottom-up strategies, of the structures and reactivities of organic
molecules. In education, though, it is important to begin concurrently practicing
top-down approaches based on this knowledge and its extension and further
enrichment, as early as possible. An example speaks louder than a long discussion
of principles: to demonstrate the problem-solving potential of synthetic
chemistry, it would be useful to identify a molecule that has served for a long
time, commanding undiminished interest both in the past and in the present,
as a sought-after target molecule for a solid synthetic pathway. One such
molecule is estrone. If a particular target structure has been decided upon, it is
appropriate to select a particular synthetic pathway from the multitude of virtual
ones identifiable by combinatorial analysis (Scheme 1-5). In the process, it usually
remains open whether the whole set of alternative synthetic pathways for
the particular decision is evaluated or intuitively only a part of it is considered.
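The combinatorial analysis mentioned above can be made concrete with a small bookkeeping sketch. It only enumerates which ring building blocks (one ring or two rings out of A, B, C, D) could serve as entry points toward the ABCD skeleton, and in which orders the remaining rings could be added one at a time. The function names are hypothetical, and the enumeration is an assumption-laden illustration: it ignores chemical feasibility entirely and is not a reconstruction of the authors' scheme.

```python
from itertools import combinations, permutations

# The four rings of the steroid skeleton, as named in the text.
RINGS = ("A", "B", "C", "D")


def building_blocks(size):
    """Return all ring subsets of the given size, e.g. size=2 -> AB, AC, ..."""
    return ["".join(c) for c in combinations(RINGS, size)]


def stepwise_orders(start):
    """List ring-by-ring build-up orders beginning with one ring.

    Purely formal: every ordering of the remaining rings is listed,
    whether or not it corresponds to a sensible synthesis.
    """
    rest = [r for r in RINGS if r != start]
    return [start + "".join(p) for p in permutations(rest)]


single_ring = building_blocks(1)   # one-ring entry points
two_ring = building_blocks(2)      # two-ring entry points

if __name__ == "__main__":
    print("single-ring building blocks:", single_ring)
    print("two-ring building blocks:", two_ring)
    print("stepwise orders from ring A:", stepwise_orders("A"))
```

Even this formal count shows why, as the text notes, a chemist usually evaluates only part of the set of virtual pathways: the number of candidates grows quickly before any chemical judgment has been applied.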
1.3
Bringing Chemical Solutions to Chemical Problems
1.3.1
The Present Situation
At the beginning of the twenty-first century chemistry finds itself in the middle
of a phase of reorientation. In the chemical industry there is a clear trend
toward specialization and concentration. It cannot be ignored that traditional
organizational structures can be altered appreciably by investment and
disinvestment decisions, the maxim being away from the broadly diversified
chemical concern of yesterday toward the megacorporation of tomorrow,
with its focus on a few core competences. Measures adopted in established
organizations are disposition of particular branches, horizontal fusion of
adjoining core activities, and vertical integration of new high-tech ventures.
Scheme 1-5 Virtual synthetic pathways toward the steroid skeleton with rings A, B, C, and D. Top row: stepwise conversion of a ring A (B, C, or D)-building block into the ABCD system; middle row: expansion in a single step of an AB (AC, AD, BC, BD, or CD)-building block into the ABCD system; bottom row: expansion in a single step of an A (B, C, or D)-building block into the ABCD system.
In the chemical sciences, progressive integration with chemical biology
and also with nanotechnology is underway. Self-organization of molecules
and modules into supramolecular and supramodular functional units plays a
prominent role in both fields of development, as is clear from research and
teaching in the top academic institutions. That this has been possible is due to
the development of physical methods without the aid of which it would be
impossible even to establish the existence or presence of systems with particular
properties. The core competence of chemistry, though, remains the provision
of new molecules through synthesis, a mission equally valid for synthetic
chemists in both industrial and academic environments. Both can point to
great successes in the past. Nonetheless, synthesis finds itself in a dilemma.
Academic synthetic chemists tended to give the highest priority to the
elegance of the design of a synthesis, and this veneration was passed on to
their students. For industry’s molecular engineers, the expediency with which
the synthesis could be carried out held center stage: a concept which new
graduates did not have to come to terms with until their entry into their
industrial careers. Meanwhile, the constructive tension between elegance and
efficiency was usurped by the dream of the perfect reaction and the ideal
synthesis. The perfect reaction can be summarized in Derek Barton’s utopian
view: 100% yield, 100% stereoselectivity [25a]. B. M. Trost [25b] seeks to advance
toward the ideal through observance of atom-economy, and M. Beller [25c]
through transformation of multiple-component educts into single-component
products. The ideal synthesis conforms to the prescription of K. B. Sharpless
[26]: rather than being concerned with the innumerable synthetic methods in
the textbooks one should assemble a handful of “perfect” reactions that may
be used again and again by synthetic chemists in the many-step construction
of a molecular framework. A solution to this dilemma lies in a radical new
orientation, as the synthetic chemist begins to take on a role in chemistry
similar to those long played by the medical doctor in biology or the engineer
in physics [27]. In this way, the synthetic chemist provides assistance to the
fundamental scientist as a practicing technologist for mutual benefit and
being capable of demonstrating that, and in what way, fundamental chemical
knowledge may be applied in a targeted fashion to problem solving in synthesis.
There is still the matter of future target molecules for the synthetic chemist.
The times are gone when it was sufficient to synthesize a target molecule
just because it had not yet been synthesized in another laboratory. The
accent of interest in chemistry has shifted. There are two reasons for this:
one is that the structure space of supramolecular chemistry, unlike that
of molecular chemistry, is in many regions only thinly populated and awaits
selective filling. The attention of chemists has therefore moved from molecular
structure to molecular function [28]. Molecules that combine themselves into
supramolecular functional units attract particular attention from synthetic
chemists. A. Eschenmoser’s vision [29] of creating synthetically accessible
supramolecular systems that will spontaneously assemble and may even be
capable of reproducing themselves, thus representing the first artificial models
of living systems, is heading in this direction, although far into the future.
1.3.2
Historical Periods of Chemical Synthesis
From a distance, scientific and technological advancements look like a
continuous stream, contributed to by many activists. On closer inspection,
though, discontinuities due to outstanding contributions by individuals are
unmistakable. If the development of chemical synthesis is reviewed, it is
possible informally to identify three phases, following on from one another in
the sense that a later phase is characterized by a greater degree of selectivity
than the earlier, with which it partially overlaps. It is easy to make out
prominent protagonists for each of the three phases. The example of the
female sex hormone estrone serves well to demonstrate how the synthetic
chemist has succeeded in meeting growing demands for selectivity.
1.3.2.1 The pre-Woodwardian Era
The first phase of chemical synthesis, ending at about the beginning
of the Second World War, might be termed the pre-Woodwardian era.
The pre-Woodwardian era largely concerned itself with the collection and
classification of synthetic tools: chemical reactions suited to broad application
to the constitutional construction of molecular skeletons (including Kiliani’s
chain-extension of aldoses, reactions of the aldol type, and cycloadditions of
the Diels-Alder type). The pre-Woodwardian era is dominated by two synthetic
chemists: Emil Fischer and Robert Robinson. Emil Fischer was emphasizing the
importance of synthetic chemistry in biology as early as 1907 [30]. He was
probably the first to make productive use of the three-dimensional structures
of organic molecules, in the interpretation of isomerism phenomena in
carbohydrates with the aid of the van’t Hoff and Le Bel tetrahedron model (cf.
family tree of aldoses in Scheme 1-6), and in the explanation of the action of
an enzyme on a substrate, which assumes that the complementarily fitting
surfaces of the mutually dependent partners are noncovalently bound for a
little while to one another (shape complementarity) [31].
Scheme 1-6 The family tree of aldoses derived from (+)-glyceraldehyde. Bottom row: allose, altrose, glucose, mannose, gulose, idose, galactose, talose. The Fischer projections of the corresponding aldaric acids are, variously, chiral and asymmetrical (C1), chiral and symmetrical (C2), or achiral and symmetrical (Cs).
Robert Robinson looked for suitable reactions with the aid of which
constitutional modifications in a pathway to, for example, a steroid synthesis
might be achieved. He was probably the first to employ mechanistic
considerations in the process. There is a tendency toward charge balancing
between anionoid and cationoid atom groups [32] through space and through
the bonds lying between them (charge complementarity). Robinson used a
transparent accounting system (curly arrows) to illustrate the direction of charge
displacement (Scheme 1-7).
Case Study Estrone: Elisabeth Dane’s attempts to produce estrone 24
(Scheme 1-8) synthetically [33], beginning with a Diels-Alder reaction that
might formally give rise to two regioisomeric adduct components, ended in
disappointment: whilst no adduct at all was obtained from an attempted
reaction between the Dane diene 14 and the monoketonic dienophile
15a, the reaction between 14 and the biketonic dienophile 19a resulted
in a mixture of rac-20a and rac-21a, in which rac-20a, with the steroidal
molecular skeleton, was present only as a minor component. It is thus no
surprise that the Dane strategy was consigned to the files, at the end of
the 1930s.
Scheme 1-7 Analysis of the relative orientation of Dane’s diene 14 and the complementary dienophile following Robinson’s way.
Scheme 1-8 Collections of formulae relevant to Dane’s concept of a steroid synthesis following the AB + D → ABCD aufbau principle. Compounds 14, 15-22 (a: R = Me; b: R = Et), 23, and 24.
1.3.2.2 The Woodwardian Era
In the second phase of organic synthesis, which could reasonably be termed
the Woodwardian era, beginning in 1937³⁾, chemical reactions characterized
by diastereoselection in the construction of a molecular skeleton found favor.
Here as well, two synthetic chemists tower over all their contemporaries:
one, naturally, is R. B. Woodward, who advanced the intellectualization of
organic synthesis like no one else. Woodward’s seminars set a new standard
for natural products chemistry⁴⁾. The other is Albert Eschenmoser⁵⁾, the sole
recipient of the privilege of a “collaborative competition” with Woodward
[35]. To master the demands of stereoselection it is necessary to know the
mechanism of the reaction used and its stereostructural consequences. In
particular, knowledge of a mechanism demands the capability to gauge the
diastereomorphic transition states of rival parallel reactions (see Scheme 36
in [37]). A necessary prerequisite for the acceptance of proposed ideas is that
they should be able to predict the sense of chirality of the main product
components accurately.
3) Woodward graduated as a Doctor of Philosophy in 1937, after submission of his dissertation at MIT (Cambridge, Mass.) [34].
4) I have no doubt that they (Woodward’s seminars at ETH Zurich) played a major role in stimulating my own predilection for and enthrallment with the synthesis of complex natural products; A. E.: in [35].
5) See the concise Preface in [36a].
Case Study (±)-Estrone (rac-24): In 1991 [33c], the presumed dead Dane
strategy was resurrected by the use of Lewis acids as mediators. Compound 14
does in fact react with 15a between 0 °C and room temperature in CH2Cl2 - to
provide a mixture of (mainly) rac-16a and (as a minor product) rac-17a - as
soon as Et2AlCl is added [33d]. In the presence of TiCl4 in CH2Cl2 at -80 °C
an 89% yield of rac-18a is obtained.
1.3.2.3 The post-Woodwardian Era
Characteristic of the third phase of organic synthesis, which would logically
be termed the post-Woodwardian era, is that the constitutional construction
of a molecular framework is now concerned not only with the problem
of diastereoselection but also with the more demanding problem of
enantioselection [37]. Certain chemical reactions serving as key stages in
multistep syntheses have been developed to perfection through the preparation
of tailor-made catalysts by Barry Sharpless⁶⁾ [38a], R. Noyori [39], and E. J. Corey
[40], setting the standard for the further development of organic synthesis.
Case Study: (+)-Estrone 24. The “Dane-style estrone synthesis” provides a
classic example of stereoselective access to an envisaged target molecule. The
Diels-Alder reactions between 14 and 15a or 19a are chirogenic⁷⁾ reaction steps
or, put another way, the enantioselective access to the Diels-Alder adducts can
already be set at this stage. This requires, for example, the participation of a
nonracemic Lewis acid with the “right” sense of chirality. In the presence of a
Ti-TADDOLate [42], cycloadduct 20a was thus obtained from the Dane diene
14 and the bidentate dienophile 19a and was further transformed via 23 into
(+)-estrone 24⁸⁾ [33d].
Before leaving estrone, a synthetic model for oral contraceptives, as synthetic
biologicals (vide infra), it should be pointed out that each historical period of
chemical synthesis can be correlated with a characteristic synthetic level
amenable to conscious perception [37]. The resurrection [33c] of the Dane
strategy for estrone prompted synthetic chemists working on the design of
metal-free, chirality-transferring catalysts to use the chirogenic opening step
as a selection assay. In this context, acceleration of adduct formation and
changes in the ratios of the resulting regioisomers are encouraging signs
that enantioselection, which may be finished off here by recrystallization if
necessary, may be anticipated [33d]. M. W. Göbel and coworkers [43] and
E. J. Corey and coworkers [44] have reported on the application of amidinium
catalysts and oxazaborolidinium catalysts, respectively, for the enantioselective
treatment of the Dane diene 14 with 19a or with acyclic dienophiles9).
1.3.3
Diels-Alder Reaction - Prototype of a Synthetically Useful Reaction
The Diels-Alder reaction occupies a cherished place in the hearts of organic
synthetic chemists, not only in the synthesis of steroids [45] but far and wide
in the synthesis of structurally complex natural products [46]. The Diels-Alder
reaction comes closest to meeting the stipulations of K. B. Sharpless [26]
and B. M. Trost [25b] set out in Section 1.3.1. It only remains to comment
that, besides diverse intermolecular examples, the intramolecular
version10) of a Diels-Alder reaction was not neglected in the synthesis of
estrone and its derivatives.

6) The bottom line in Scheme 1-6 shows the eight aldohexoses of natural origin;
they all belong to the D-series. Their L-configured enantiomers have been
synthesized by use of the abiotic Sharpless catalyst [38b].
7) See [41] for the meaning of the term “chirogenic reaction step” and the
usefulness of its application.
8) The (S,S)-configurated Ti-TADDOLate [42] complex with four phenanthren-9-yl
residues is used at -80 °C in CH2Cl2: 65% chemical yield, 93% ee or 78%
chemical yield, and 85% ee (2 or 0.2 equiv, respectively).
9) With cyclic dienophiles, rings C and D in the cycloadduct are joined in cis
fashion. With acyclic dienophiles containing E-configured C=C bonds, an adduct
in which the atom groups necessary for construction of the D ring are oriented
trans is produced; see Chapter 3 in [33d].
Scheme 1-9 summarizes the construction of a steroid framework by the
A + D → AD → [AD]* → ABCD aufbau principle11).
[AD]* 25a is a photoenol generated in situ, and reacts under meticulously
determined conditions [48] by cycloaddition and subsequent dehydration to
provide the estrone derivatives 26a and 27a. The mixture of regioisomeric styryl
derivatives can be reduced to give 24 after temporary protection of the 17-keto
group. The photoenol 25a is produced by regioselective electronic excitation
of the Michael adduct 28a with light having wavelengths of >340 nm. The
Michael adduct is accessible by treatment of the chiral enolate anion 30a with
the achiral acceptor 29 [49]. The strength (the trans fusion of rings C and D
is directly accessible) and weakness (there is still no solution to the problem
of substitution of the multistep procedure that delivers diastereoselection
for a shorter route proceeding in tandem with enantioselection) of the
photochemical synthesis of 24 have already been commented upon [36b].
Scheme 1-9 Collection of formulae relevant to a steroid synthesis following an
A + D → AD → [AD]* → ABCD aufbau principle (structures 25-30; a: R = Me,
b: R = Et).
10) For further examples see the section “Intramolecular Diels-Alder Reactions”
in Ref. [47].
11) Optimization of the reaction conditions was carried out in the racemic
series [48]. See [49] for the synthesis of the enantiomerically pure target
compounds.
1.4
Bringing Chemical Solutions to Biological Problems
1.4.1
The Role of Evolutionary Thinking in Shaping Biology
Biology is such a hugely diversified field that a historical guide hardly helps as
an aid to orientation. Given this, it might then be reasonable to consciously
pick out some particular partial aspect, as Theodosius Dobzhansky did in his
famous statement “Nothing in Biology makes Sense except in the Light of
Evolution”. With evolutionary biology as a compass, it is not hard to discern
three historical periods.
1.4.1.1 The pre-Darwinian Era

One prominent event in the pre-Darwinian era is the Cuvier-Geoffroy debate
(concerning the primacy of anatomical structure over anatomical function or
vice versa) before the Académie des Sciences in Paris in the spring of 1830 12). Its
immediate focus involved opposed viewpoints in comparative anatomy, while
indirectly it represented endeavors to turn “the static Chain of Being into an
ever-moving escalator” [51]. Cuvier represented the functionalist approach of
the designer: Form follows Function. Geoffroy Saint-Hilaire expanded the theme
and took the structuralist standpoint of the evolutionist: Function follows Form.
The public argument was unable to settle the difference between the two
adversaries, though it became clear that fundamental scientific discussions
would in future no longer take place in a neutral environment13). It was
also evident that evolutionary thinking in biology could no longer be kept in
its cage.
1.4.1.2 The Darwinian Era

In the narrow sense, the Darwinian era began with the publication of The
Origin of Species in 1859 and ended at the beginning of the twentieth century
with the rediscovery of Gregor Mendel’s 1866 Versuche über Pflanzen-Hybriden
(Experiments in Plant Hybridization). Charles Darwin’s book “The Origin of
Species by Means of Natural Selection could be read as one long argument.
It supported the claims of science to understand the world in its own terms.
Animals and plants are not the product of special design or special creation.
Natural selection was not self-evident in nature, nor was it the kind of theory in
which one could say, “Look here and see”. Darwin had no crucial experiment
that conclusively demonstrated evolution in action. His whole concept of
natural selection rested on analogy”, an analogy between selective processes
taking place under either artificial or natural conditions [53].

12) See [50] for the Cuvier-Geoffroy debate before and beyond the Académie.
13) See [52]: Discussions between Goethe and Eckermann of the 2nd August 1830.

A series of questions was left open; that of whether in the union of two gametes into
a zygote a mixture of the genes involved took place (blending inheritance),
occupied a key position. It could only be answered after:
Gregor Mendel [54] had set out statistical rules for the passing
on of particular hereditary characteristics from generation to
generation, which are useful for discussion on the complex
relationships in questions of heredity, and
Wilhelm Johannsen [55] had coined the terms phenotype and
genotype, which made it possible to distinguish between a
statistically apparent type (the phenotype) of observable
properties and the corresponding genetic make-up (the
genotype) of an organism.
The distinction between genotype and phenotype facilitated the separation
between genetics and embryology. It is clear from this separation that the
differentiation between genetic and environmental causes in embryology and
the wider discipline of developmental biology is something to talk about.
1.4.1.3 The post-Darwinian Era

The post-Darwinian era saw the vision of Darwinian evolution through natural
selection being accepted as a reality. Since then, evolution has been observed
in action in many living organisms and also in innumerable viruses [56, 57].
Through Manfred Eigen’s paper on the role of “Self-organization of Matter
and the Evolution of Biological Macromolecules” [58] Darwin’s ideas have
been placed on firm physical foundations and have been tested by in vitro
evolution experiments [59]. The Darwinian view of evolution has prompted
biologists to think in terms of dynamic populations while considering a species
[60]. To avoid misunderstandings among nonbiologists, Eigen introduced the
term quasispecies. Because of mutability, self-replicating systems are always
ensembles of mutants and are not, in any circumstances, single species made
up of uniform individuals. To indicate quantitative proportional relationships
between quasispecies and their mutants, Eigen’s evolutionary model uses a
multidimensional representation (sequence space). In a nucleic acid space [61]
(protein space [62]14)), each nucleic acid (protein) sequence is represented
in the sequence space by a point and each change in the sequence by a
vector. If the points in a sequence space are assigned specific scalar fitness
values, a fitness landscape is obtained. The metaphor of a fitness landscape
(adaptive landscape) was introduced into evolutionary biology in 1932 by
Sewall Wright [64] and was afterwards used abundantly, if with a certain
breadth of interpretation, by theoretical biologists15).

14) See [63]: Footnote 10.
15) R. A. Fisher, J. B. S. Haldane, and S. Wright count as mathematical
biologists; their publications were understood only by some of their
professional colleagues. T. Dobzhansky, G. G. Simpson, and E. Mayr successfully
interpreted the mathematically formulated theorems [65].

The picture conveyed by the metaphor is that of an evolving population subject to exclusion of
unfit mutants making uphill progress until a local peak is reached. For the
evolutionary process in the high-dimensional sequence space, local peaks in
the vicinity may readily be reached by small jumps, without the need to traverse
the valleys between them, and a continuous sequence of small jumps to reach
a global summit is a realistic prospect. To use Eigen’s own words: “Because
of frequent criss-crossing of paths in multidimensional sequence space, by
virtue of its inherent non-linear mechanism which gives the appearance of
goal-directedness the process of evolution is steered in the direction of optimal
value peak” [8b]. In brief, biological evolution uses two processes: genetic
mutation (as a means of generating random diversity) and natural selection
(as a means to optimize the peak-jumping technique) in the environmentally
shaped fitness landscape.
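
The interplay of mutation and selection in sequence space described above can be illustrated with a minimal Python sketch. It is not Eigen's quasispecies model: it assumes a single walker rather than a population, a hypothetical optimal sequence (`optimum`), and a deliberately simple single-peaked fitness function (the number of positions matching that optimum). All names are illustrative assumptions. Each point mutation is one step (a vector of Hamming length 1) in sequence space, and selection simply rejects any mutant that is less fit than its parent.

```python
import random

ALPHABET = "ACGU"  # nucleic acid sequence space: 4**n points for length n


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def fitness(seq: str, optimum: str) -> int:
    """Toy scalar fitness (assumption): matches to a hypothetical optimum."""
    return len(seq) - hamming(seq, optimum)


def point_mutant(seq: str, rng: random.Random) -> str:
    """Return a one-step neighbor in sequence space (one position changed)."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def adaptive_walk(start: str, optimum: str, steps: int, rng: random.Random):
    """Mutation proposes a neighbor; selection rejects less fit mutants."""
    current = start
    history = [fitness(current, optimum)]
    for _ in range(steps):
        candidate = point_mutant(current, rng)
        if fitness(candidate, optimum) >= fitness(current, optimum):
            current = candidate
        history.append(fitness(current, optimum))
    return current, history


rng = random.Random(0)          # fixed seed so the walk is reproducible
optimum = "GGAUCCGAUA"          # hypothetical fitness peak
start = "AAAAAAAAAA"
final, history = adaptive_walk(start, optimum, steps=500, rng=rng)
print(final, history[0], history[-1])
```

Because unfit mutants are excluded, the recorded fitness never decreases along the walk. A rugged landscape with many local peaks, as discussed in the text, would require a different fitness function than the single-peaked one assumed here.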
Through the removal of subdisciplinary barriers, biology’s evolutionary
thinking has contributed on two occasions to enhance that science’s voice in
the choir of the natural sciences. In the 1940s and 1950s, a union of Darwinian
and Mendelian perspectives took place in Modern Synthesis [65], whilst at the
turn of the twentieth to the twenty-first century a union of developmental
and evolutionary biology into evolutionary developmental biology (Evo-Devo)
is taking place before our eyes in the New Synthesis [66].
1.4.2
On the Sequence of Chemical Synthesis (Preparation) and Biological Analysis
(Screening)
In an ideal starting situation for the synthetic chemist the structure of the
target molecule is already given. In the real world of the search for active
substances, the matter of whether a target molecule is to be synthesized is
determined by its presumed profile of properties. If a management decision
is made in favor of a target molecule to be synthesized, the synthetic chemist
then looks for a way to relate molecular function back to molecular structure.
This is based on the supposition that a functional unit should contain at
least two structurally complementary molecules non-covalently bound to one
another in a supermolecule. The idea of supermolecules as supramolecular
functional units, nowadays preached and systematically further developed
most conspicuously by Jean-Marie Lehn [67], goes back directly to Emil Fischer
[31], who introduced the instructive lock-and-key metaphor as early as 1894.
Fischer’s metaphor, as the tip of the submerged model of molecular recognition,
traces the function of a supermolecule back to structural interactions between
its complementary constituents. Through this, the complementarity between
substrate and enzyme was to become the basis of enzymology. Paul Ehrlich
seized on the lock-and-key metaphor in his 1908 Nobel lecture [68], and the
goal of chemotherapeutic endeavor thereafter came to be regarded as the
activation or deactivation of a receptor through noncovalent binding of a
complementary effective substance. Structural complementarity of effector and
receptor accordingly represents the fundamentals of chemotherapy, similar to
the way in which complementarity of antigen and antibody is regarded as
central to immunology.
The goal of synthesizing a target molecule with particular properties can
be achieved with the aid of two problem-solving processes based on different
principles.
In one problem-solving process, illustrated by the image of the key and
its lock, the maxim is to modify a designed target structure little by little
until the corresponding target molecule has the very properties of interest. It
involves an iterative procedure, usually of several rounds, based on trial and
error. It is trivial to note that the screening can take place only after the
synthesis.
In the other problem-solving process, which can be illustrated by the image of an
assortment of keys, hopefully containing the key that will be complementary
to a given lock, the maxim is to develop a parallel structured search method,
with the aid of which the matching key will be found, without it being necessary to
subject the whole ensemble of candidates to the totality of functional tests. This is
a procedure based on the principle of trial and selection. Since a distinction
has been drawn between synthesis and preparation (Section 1.2.1), some spin
doctoring should come as no surprise. After preparation is performed on a
microscale, screening will follow before the synthesis on a macroscale. For the
time being, we should come back to the traditional search for a biological, with
a very particular function.
1.4.2.1 Single-component Consecutive Procedure

In traditional single-component consecutive procedures, the synthetic chemist
each time focuses on a structure (a molecule) from a series of successive
candidates. The example of the total synthesis of estrone in Sections 1.3.2 and
1.3.3 demonstrates the adaptation of synthetic goals to the state of the art
in organic synthetics. The case studies described there have academic value
that should not be underestimated, though for industrial synthetic practices
they are not directly relevant because estrone will in general be commercially
more advantageously accessible through partial synthesis than through total
synthesis. In the search for an ovulation inhibitor outlined below, however,
total synthesis plays a commercially acceptable role, since partial synthesis
drops out as a serious contender from the second generation of inhibitors to
be discovered in future.
1.4.2.1.1 Oral Contraceptives

Thanks to initiatives instigated by Margaret Sanger, probably the highest-
profile campaigner worldwide for family planning, a project geared toward
the development of an orally administrable contraceptive was initiated in the
early 1950s under the reproductive biologist Gregory G. Pincus at the Worcester
Foundation for Experimental Biological Research [69a].

It was known that progesterone established and maintained pregnancy as an
endogenous gestagen and so was able to act as a contraceptive. As progesterone
was not suited for oral application, a systematic search of the steroidal
structure space was carried out for an exogenous gestagen [69b] that - orally
administered - would bind to the progesterone receptor, thereby initiating
a series of molecular events culminating in the induction or repression of a
certain set of target genes. Binding of a gestagen to the progesterone receptor is
necessary but not sufficient for the former’s playing an active role as an agonist
in reproductive biology. This became clear as soon as an antigestagen like RU
486 [70] was found, which bound to the progesterone receptor, but - unlike
an agonist - was unable to trigger the gestagenic response. As it turned out,
there is no known parameter of effector binding that can predict differential
agonistic or antagonistic activity of a steroid.
If a metaphorical statement can ever reveal “how things are”, Emil Fischer’s
static lock-and-key metaphor [31a] ought to be replaced with a dynamic one. This
was done by D. E. Koshland’s induced-fit concept [31b], which readily produced
the self-explanatory hand-and-glove metaphor. Binding of a given effector will
bring about a conformational change of the receptor that is favorable for
catalytic activity of the formed supermolecule.
G. G. Pincus and M. C. Chang investigated a diverse range of variants of
about 200 steroids [69b], which were in most cases not naturally occurring
compounds but products that had accrued in countless laboratories as a
result of arduous individual studies on their biological functions. They found
that combinations of a gestagenic and an estrogenic 19-nor-steroid exhibited
the desired effects. These findings from animal experiments (rabbit and
rat) were also confirmed in humans, in almost militarily planned (Pincus)
clinical studies (by the gynaecologists J. Rock and C. R. Garcia). In the
early 1960s, a combination pill made up of norethindrone (prepared by
C. Djerassi at Syntex in 1951 [71]) and 17α-ethynylestradiol (prepared by
H. H. Inhoffen at Schering AG in 1938 [72]) reached the market as the
first-generation pill.
Members of the First Generation

Norethindrone 31a, the gestagenic component in the combination pill, is
smoothly accessible from estrone methyl ether by partial synthesis [71]. The
reaction sequence begins with a dearomatization (Birch reduction) and ends
with an ethynylation (Scheme 1-10), necessary for the oral applicability.
Technical production of estrone 24 (or estradiol) from inexpensive steroids
such as diosgenin or cholesterol by partial synthesis is also feasible. Pyrolytic
aromatization (Inhoffen at Schering AG) assists the transition from the steroid
to the 19-nor-steroid class (such as from androsta-1,4-dien-17β-ol-3-one 32 to
estradiol 33 [72]).
Scheme 1-10 Collection of formulae relevant to Torgov's concept of a steroid synthesis
following the AB + D → ABD → ABCD aufbau principle (a: R = Me, b: R = Et).
Members of the Second Generation

Here the gestagen (-)-norethindrone 31a has been supplanted by
(-)-norgestrel 31b. The difference between the two molecular structures, minor
in itself, still has far-reaching consequences for biological action and synthetic
accessibility. The presence of the ethyl group in place of the methyl
group at C(13) slows down the compound's metabolism, thereby increasing
bioavailability and also ordaining that total synthesis now has to take the
place of partial synthesis. This begins (Scheme 1-10) with the condensation of
(±)-1-vinyl-1-hydroxy-6-methoxy-1,2,3,4-tetrahydronaphthalene (rac-34) with
2-ethylcyclopentane-1,3-dione (35b) [73]. The resulting seco-dione 36b, with
a meso configuration, can be reduced microbiologically to one of four
stereoisomers: the microorganism used (Saccharomyces uvarum) approaches
the surface of the five-membered ring differentially from one of the
two diastereotopic half-spaces and selectively attacks only one of the two
enantiotopic carbonyl groups [74b]. The reduction product 37b can be
stereoselectively converted into (-)-38b (as reported by V. Torgov [74a]) and
finally (H. Smith [75]) into (-)-norgestrel 31b.
Members of Later Generations

The search for unnatural gestagens with improved properties by the trial and
error approach continues. Oral applicability (through ethynylation at C(17))
and at low dosages (thanks to slow metabolism because of the ethyl group
at C(13)) have already been achieved. A new, exogenous gestagen therefore
has prospects of being favored over already known preparations only if it
distinguishes itself in at least one of the three following aspects:
through a higher binding specificity to the complementary
receptor (i.e., biological);
through more economically advantageous accessibility (i.e.,
chemical); and/or
through some advantage arising from patent law
(i.e., commercial).
What this means in detail should become clear through illustration with
later-generation gestagens.
Gestodene 39 (Scheme 1-11) has the lowest ovulation inhibitory dose
of all gestagens known to date. It displays both antiestrogenic and
antimineralocorticoid activity. A lower affinity to the androgen receptor
is not sufficient to produce measurable anabolic androgenic effects.
The pathway to 39 passes through compound 47 (Scheme 1-12) [76] and,
after microbiological introduction of an O function at C(15) (with the aid
of Penicillium raistrickii), on through the stations 48 (R = H or Ac) and
49 [77]. Compound 31b, incidentally, can be easily obtained starting from
47 [78].
Desogestrel 40 (Scheme 1-11) is a progestagen that is transformed in
the intestinal mucosa and in the liver into the actual effective metabolite
3-ketogestrel. The bioavailability is around 75%. Desogestrel, obtained
partially synthetically by chemists at Organon [79], displays minimal
androgenic and estrogenic activity. The long pathway from the 19-nor-steroid
estr-4-ene-3,17-dione includes a microbiological hydroxylation of
the steroid skeleton at C(11) and an intramolecular functionalization of
C(18).

Scheme 1-11 Gestagens of the Pill of later generations: (-)-gestodene 39,
(-)-desogestrel 40, and (-)-drospirenone 41.

Scheme 1-12 Collection of formulae relevant to syntheses of (-)-norgestrel 31b and
(-)-gestodene 39, in both cases via 47.

E. J. Corey et al. [80] reported a total synthesis (Scheme 1-13) beginning with
the reduction product 50, easily accessible from 42 16).
Alkylation of the metallated enol derived from 52 with
m-methoxyphenylethyl iodide to afford the tricyclic β-keto ester 53, followed by
cationic cyclization of this to furnish the steroid derivative 54, warrants
particular attention. Corey and colleagues have recently published another
total synthesis of 40 [82], beginning with an enantioselective Diels-Alder
reaction between Dane's diene 14 and dienophile 61. An oxazaborolidinium
salt (see Section 1.3.2.3) was used as an efficient catalyst (Scheme 1-14).

16) The bicyclic, chiral, non-racemic building block 42 represents a milestone
in the history of organic chemistry. It is accessible in high chemical yield and
enantiomeric excess from the achiral triketone precursor through a
proline-catalyzed, intramolecular aldol condensation
(Hajos-Parrish-Eder-Sauer-Wiechert reaction [76, 81]).

Scheme 1-13 Collection of formulae relevant to a synthesis of (-)-desogestrel 40 opened
by the asymmetric Hajos-Parrish-Eder-Sauer-Wiechert reaction.

Scheme 1-14 Collection of formulae relevant to a synthesis of (-)-desogestrel 40 opened
by an asymmetric Diels-Alder reaction of Dane’s diene 14 and dienophile 61.

Drospirenone 41 (Scheme 1-11), the latest of the exogenous gestagens,
differs from its antecedents in some characteristic ways:
constitutionally, in that both angular positions are occupied
by methyl groups whilst the tetracyclic steroid skeleton is
endowed with three additional rings, and
biologically, in that 41 is an unnatural gestagen that both acts
as an aldosterone antagonist and at the same time displays
pronounced antiestrogenic and antiandrogenic
properties.
With this combination of activities in one and the same dosage, drospirenone
currently holds a leading position in hormonal contraception, although it
requires a higher dosage than gestagens with an ethyl group at C(13).
The synthesis of drospirenone 41 (Scheme 1-15) [83] starts with the inexpensive
androstenolone 66, which can be converted microbiologically (Colletotrichum
lini) into the 7α,15α-dihydroxy derivative 67. A selective epimerization at C(7)
proceeds by way of the acetal 68. Methylenation of the intermediate (C=C) bond
appearing between C(15) and C(16) is successfully accomplished with the aid
of dimethylsulfoxonium methylide to provide 71, and that of the (C=C) bond
between C(6) and C(7) through a Simmons-Smith reaction. The conversion
of 76 into 41 can be carried out in a one-pot procedure, with a Pd-catalyzed
hydrogenation being followed by a Ru-catalyzed oxidation and a hydrochloric
acid-induced dehydration.
Scheme 1-15 Collection of formulae relevant to a synthesis of (-)-drospirenone 41
starting from the easily accessible androstenolone 66.

Pincus and Chang (Section 1.4.2.1.1), in their search for orally applicable
contraceptives, had decided upon norethindrone after some 200 steroidal
candidates had been examined one by one. Chemists at Schering AG had
stumbled upon drospirenone after some 600 newly prepared molecules
with antialdosterone activity had become available [84]. It can be justifiably
stated that the hardly ineffectual pharmaceutical industry had finished up
in a blind alley in its search for new active substances by using traditional
strategies [85].
The rapidly progressing expansion of the world market, where new suppliers
have arrived in great numbers (globalization), places serious decisions before
the management of every multinational company [86] (see Section 1.3.1).
These are not merely restricted to restructuring of portfolios of the products
manufactured; they also do not exclude the reorganization of the entire
company structure17). Under real pressure from financial analysts and
presumptive pressure from shareholders, questions have also been directed
toward the scientists involved: whether there might be new methods that
could afford more rapid access to new active substances. The answer was not
long in coming: with chirotechnology18) and the combinatorial acceleration of
the preparation and screening of whole populations of molecular candidates,
a new turn has been taken in the solution of biological problems through
chemical methods.
1.4.2.2 Multicomponent Simultaneous Procedure

Darwinian evolution is kept in motion by a continual succession of newly
arising variation and its modification by natural selection. The search
for active substances proceeds through multiple-component simultaneous
procedures, in which a restricted variant population is prepared on a
microscale by a combinatorial strategy, to be subjected to the new form
of selection, that is, collective screening. After a successfully applied
unnatural selection of a particular variant with the desired properties,
synthesis on a macroscale can take place. In Section 1.4.2.2.1 a static
variation is going to be prepared and screened for anti-inflammatory
activity of individual variants that might be useful in controlling asthmatic
inflammation19).

17) The consequences arising from reorganization of the structure of a business
may be guessed by careful market analysis. Most difficult to predict is the
reaction of employees. If the creative people among them are not convinced by
the new orientation, or have even been put off by the way in which it has been
implemented, they may defect to the competition, thus doubly weakening their
previous employer.
18) One of the main challenges of synthetic chemistry in the post-Woodwardian
era (see Section 1.3.2.3) is to find routes that satisfy the demands of
industrial applicability to enantiomerically pure compounds [37]. In 1992,
various international journals (Financial Times, Neue Zürcher Zeitung, Science,
and Chemical & Engineering News), as if coordinated by a global editor, touched
on the phenomenon of chirality. C&EN even predicted that chirotechnology may
progress in the future as biotechnology had grown in the past.

The worldwide incidence, morbidity, and mortality of allergic asthma are
increasing. Asthma has become an epidemic, affecting 155 million individuals
throughout the world. It is a complex disorder characterized by local and
systemic allergic inflammation, mucus hypersecretion, and reversible airway
obstruction [88]. The pathogenesis of asthma reflects the activity of cytokines
from TH2 cells. Without these cells there is no asthma. Animal models support
important roles for the cytokines IL-4, IL-5, and the recent IL-13 [89]. The latter
is closely related to IL-4: they both bind to the same IL-4 receptor, to the
α-chain of that receptor, particularly.
The molecular biologist is interested in the molecular consequences
of allergen binding to the T-cell receptor. Experimental investigations
have revealed various signal-transduction pathways that link T-cell surface
molecules with nuclear transcription events. A [Ca2+]-dependent route has
been discovered, emanating from the T-cell receptor, which can be blocked by
natural products of fungi: cyclosporine A (CsA) and FK 506 (Scheme 1-16).
Another signal-transducing pathway, independent of [Ca2+], emanates from
the IL-2 receptor and controls translational events on ribosomes. It can be
blocked by a third natural product, rapamycin, but not by CsA or FK 506.
Two signaling pathways have been targeted for pharmacological treatment
of unwanted immune responses. It is essential to realize that blocking
signal transduction leading to regulated transcription or regulated translation,
requires CsA or FK 506 on the one hand and rapamycin on the other to
be more than an inhibitor of a cognate target protein: calcineurin in the
former and FKBP-rapamycin-associated protein (FRAP) in the latter case. As a
matter of fact, the fungi-derived ligands in each case act as a “molecular glue”
that mediates the interactions of primary and secondary receptors, forming a
ternary receptor-ligand-receptor complex. Calcineurin is blocked by CsA and
by FK 506, but only after the two ligands have been activated by complexation
with their primary receptors, cyclophilin A and FK-506 binding protein 12 (FKBP 12),
respectively. In a similar way, rapamycin, on forming a binary complex with
the primary receptor FKBP 12, is promoted to block the secondary receptor
called FRAP on ternary complex formation (Table 1-1).
An antigen bound by the receptor of a T cell sets in motion a long cascade of
signal carriers and subsequent proliferation of T cells. In allergic subjects, this
signal cascade can be initiated by allergens, which are by themselves actually
harmless, leading to undesired T-cell overproduction. For allergy sufferers,
therefore, it is desirable to specifically interrupt or slow down transcriptional
or translational signal cascades involved in T-cell production. Because FK
506, rapamycin, and CsA are effective immunosuppressants, they cannot be
19) Project of the G e m a n Federal Ministry of’

Education and Research (87a], initiated by
A. Kleemann, K. Brune. G . Quinkert; fordetails
see (631 and [87b]. Beginning: 1 July 1994.
30
I
\
FK 506 Rapamycin
-4
CsA
Scheme 1-16 Natural immunosuppressants.
Table 1-1 Naturally occurring immunosuppressants (ligands)

and their receptor complexes
Ligand Primary receptor Secondary receptor
Cyclosporine Cyclophilin Calcineurin

FK 506 FKBP Calcineurin
Rapamycin FKBP FRAP
Binary complex
Ternary complex
considered suitable for long-term treatment of allergic patients. The search is

on for nonnatura120)ligands with a more specific action on the immune system.
A collection of non-natural ligands - synthesized independently in various
laboratories - has demonstrated an immense chemical production effort in
search of specific modulators of the immune system with significantly reduced
20) V. Prelog [90] has underlined the view that natural products hold a worthwhile message. H. Waldmann et al. [91] entertain the plausible argument that “natural products are biologically validated starting points in structural space for compound library development”.
molecular complexity. One can’t help wondering why the traditional method,
making one compound at a time, analyzing it, and evaluating it biologically,
indubitably was applied by all synthetic groups involved. As the synthetic target
structures aimed at are represented by isolated points scattered irregularly
over a relatively small segment of structure space, a combinatorial approach
furnishing a focused variation, whose members ought to be represented by a
cluster of points in abstract structural space, would seem promising.
1.4.2.2.1 Preparation and Screening of a Static Variation

The combinatorial approach pursued in search of an antiasthma drug was based
on a split-and-mix strategy [92]. As a practical use of the operational
principle of parsimony, the aim was to get the most with the least; in this case, to get
343 different types of variants in only 21 reaction steps. Scheme 1-17 sketches
Scheme 1-17 Construction of a binary-encoded [93] combinatorial variation using the split-and-mix protocol (resulting in a one-bead-one-variant state) and an encoding-decoding alternation (resulting in a state with every bead carrying a single tripeptide sequence).
how a biased variation of 343 members was obtained on resin beads in three
preparative rounds, each round allowing for the parallel attachment of one out
of seven building blocks available.
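The bookkeeping behind these numbers can be checked with a short sketch. It assumes only what the text states (three rounds, seven building blocks per round); the function and variable names are illustrative, and the step counts for the divergent and serial alternatives follow the formulas quoted in the accompanying footnote.

```python
BLOCKS_PER_ROUND = 7  # building blocks offered in each round (from the text)
ROUNDS = 3            # preparative rounds (from the text)


def split_and_mix(blocks_per_round, rounds):
    """Enumerate the sequences made by split-and-mix and count reaction steps.

    In every round the pooled beads are split into equal portions, each
    portion is coupled with one building block (one reaction step), and
    all portions are mixed again before the next round.
    """
    pool = [()]  # bare beads: empty sequence
    steps = 0
    for _ in range(rounds):
        next_pool = []
        for block in range(blocks_per_round):
            steps += 1  # one coupling reaction per building block
            next_pool.extend(seq + (block,) for seq in pool)
        pool = next_pool  # "mix": all portions are recombined
    return pool, steps


variants, split_mix_steps = split_and_mix(BLOCKS_PER_ROUND, ROUNDS)

# Alternative strategies, using the formulas given in the footnote.
divergent_steps = sum(BLOCKS_PER_ROUND ** r for r in range(1, ROUNDS + 1))
serial_steps = ROUNDS * BLOCKS_PER_ROUND ** ROUNDS

print(len(variants), split_mix_steps, divergent_steps, serial_steps)
# prints: 343 21 399 1029
```

The sketch treats every coupling as one step regardless of scale, which is the sense in which the text counts 21 steps for 343 variants.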
The complete set of monomeric building blocks used in the construction
of the combinatorial variation of Scheme 1-17 is shown in Scheme 1-18. The
aesthetic elegance of the combinatorial strategy reveals itself when compared
with alternative strategies21).
The bead-bound substrate variation was screened for binding to a biological
receptor (a fluorescence-conjugated immunophilin [87]) by mixing a sample
of the charged beads with a buffer containing the complementary protein.
The beads that carry variants with affinity for the receptor are easily identified
by visual inspection under a microscope with a fluorescent illuminator and
removed with the aid of a (non-plastic) syringe. The sequence of each bead-
bound substrate variant has been determined indirectly but unambiguously
by Clark Still’s encoding-decoding alternation [93].
Molecular encoding: During each step of the construction of a focused variation
of tripeptides (see Scheme 1-17), tagging molecules are attached to the beads
Scheme 1-18 21 building blocks for the preparation of the 343 tripeptides of Scheme 1-17 (building blocks 6, 10, and 11 were used as racemates).
21) A divergent approach would require 399 ($7^1 + 7^2 + 7^3$) reaction steps, a serial approach even 1029 ($7^3 + 7^3 + 7^3$) reaction steps to reach the same 343 variants [63, 87].
that encode both the step number (one through 21) and the reagent (amino
acid or acid chloride, respectively) used in that step. A combinatorial encoding
of the 21 reaction steps requires altogether seven molecular tags (i.e., A, B, C;
AB, AC, BC; ABC in one round).
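The tag combinations listed for one round are exactly the non-empty subsets of three tags, i.e. a three-bit binary code for seven building blocks. The following sketch illustrates this; the particular assignment of building-block numbers to tag combinations is an assumption made for illustration and is not taken from the original work.

```python
TAGS = ("A", "B", "C")  # three tags available in one round (from the text)


def encode(block):
    """Return the tag combination for building block 1..7 of one round.

    Assumed convention: bit i of the block number decides whether tag i
    is attached (1 -> "A", 2 -> "B", 3 -> "AB", ..., 7 -> "ABC").
    """
    if not 1 <= block <= 2 ** len(TAGS) - 1:
        raise ValueError("block number must lie between 1 and 7")
    return "".join(tag for i, tag in enumerate(TAGS) if (block >> i) & 1)


def decode(tags_found):
    """Recover the block number from the tags detected on a bead."""
    return sum(1 << TAGS.index(tag) for tag in tags_found)


codes = [encode(block) for block in range(1, 8)]
print(codes)
```

Because no block is encoded by the empty combination, a bead without any detectable tag for a given round can be recognized as a failed readout rather than misread as a building block.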
Molecular decoding: After screening the variation, the molecular tags22) can be
cleaved photochemically from each of the selected beads and analyzed by gas
chromatography [93]. The specified on-bead selection test afforded a mixture
of rac-77 and rac-78 (Scheme 1-19).
To explore its biological properties by various functional tests [94],
a substantial amount had to be synthesized. Instead of going for 79
(Scheme 1-19), the more distant compound 80 (Scheme 1-20) was aimed at, by
conventional synthesis technique.
The replacement of target structure 79 by 80 came about accidentally: the
substitution took place during a search for linkers for solid-phase synthesis
that can be cleaved enzymatically. Substitution of the β-methoxyethylamino residue
by the Z-protected lysine residue [87] led to higher biological activity in
various functional tests. Compound 80 has recently [94] been considered
to be a promising candidate for the treatment of diseases accompanied by
immunological inflammation.
The combinatorial approach produces large variations of related molecules,
which can be exploited by appropriate screening techniques. As far as the
production of these variations and their screening are concerned, combinatorial
chemistry reminds one of the immune system. In the immune system,
antibodies recognize cognate antigens. Those antibody-producing cells that
are effective against a particular type of invader molecules preferentially evolve
from a huge population. If the invaders are pathogens or parasites, dynamic
[Structures: rac-77, rac-78, and 79]
Scheme 1-19 On-bead molecules (rac-77 and rac-78) selected from the variation of Scheme 1-17, and the seeming target structure 79.
22) The molecular tags that were used are

composed of a series of electrophoric tags
(halophenol derivatives) plus a photolabile
linker [93].
[Reaction sequence: 81 → 82 (a); 81 → 83 (b); 82 + 83 → 84 (c); 84 → 85 (d); 85 + 86 → 80 (e)]
a) [...], aq. NaOH, dioxane, 90 %
b) MeOH, SOCl2, 98 %
c) 2-Chloro-1-methylpyridinium iodide, CH2Cl2, NEt3, 50 %
d) MeOH, 2.5 N NaOH, 74 %
e) 2-Chloro-1-methylpyridinium iodide, CH2Cl2, NEt3, 86 %
Scheme 1-20 Collection of formulae relevant to a synthesis of the biologically active candidate 80.
coevolution between them and the host may occur. There is, however, a
tremendous difference between a static variation and the immune system.
While the processes of preparation and screening of a static variation were
designed by chemists, what happens in immunology was not designed but
rather evolved.
The preparation of a dynamic variation (to be described in the following
section) is somewhat in between the two extremes, though very much closer
to the designer's end.
1.4.2.2.2 Preparation and Screening of a Dynamic Variation23)

In the previous section, a well-known method was applied to a long-standing
biological problem: the discovery of a new biologically active substance. With
23) For dynamic non-covalent chemistry see [95].
the intention of finding such a substance displaying properties closest to a

setup profile, a static molecular variation was prepared (on microscale) and
screened (collectively) to afford a select variant qualifying as the candidate
for subsequent synthesis (on macroscale). In this section, we present the
self-assembly of a variation of three sets of conjugates from which an added receptor
selects a number of effectors by molecular recognition. This selection works
by way of the interactions of protein surfaces within the receptor-effector
supermolecule, the knowledge of which ought to be helpful in drug design.
The self-assembly to be introduced is based on three pyranosyl-RNA (p-RNA)
[96] single strands (a, b, and c, Scheme 1-21) associating in a Watson-Crick-like
manner, initially into binary and further on into ternary supermolecules24). In
Scheme 1-21 Base-pairing dynamics of single strands a, b, and c.
24) Project of the German Federal Ministry of Education and Research [97a]; for details see [87b] [97b]. Initiated by A. Eschenmoser, U.-H. Felcht, G. Quinkert [97c]. Beginning: 1 April 1995.
addition to the H bridges, intercatenary π,π-stacking effects make a substantial
contribution to the stabilization of the resulting duplexes [96a, 96d].

In its current form, the self-assembly is based on three p-RNA single strands
with 7 (a and b) or 14 (in the case of c) nucleobases. The two short strands are
sequence complementary to the first seven or the last seven bases in the longer
strand. The pairing gives rise eventually to water-soluble ternary complexes
acb (Scheme 1-21). Strand c is involved in all the equilibria. Since strands a
and b are unable to pair with one another and as they bind to non-overlapping
regions of c, they do not compete with each other in binding to c. The unusual
designation acb is used to reflect the dominant role of the longer strand c in
complex formation.
The following equilibria, with five independent equilibrium constants25),
apply to the pairing of the complementary strands:
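$$c_i + a_j \rightleftharpoons a_j{:}c_i \qquad (1)$$
$$c_i + b_k \rightleftharpoons c_i{:}b_k \qquad (2)$$
$$a_j{:}c_i + b_k \rightleftharpoons a_j{:}c_i{:}b_k \qquad (3)$$
$$c_i{:}b_k + a_j \rightleftharpoons a_j{:}c_i{:}b_k \qquad (4)$$
$$a_j + c_i + b_k \rightleftharpoons a_j{:}c_i{:}b_k \qquad (5)$$
Subscripts i, j, and k are used to distinguish various possible sequences
displaying the required complementarity.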
Scheme 1-22 shows a network representation of the above set of equilibria.
The nodes in the network correspond to the individual strands involved in the
equilibria, while the lines represent their possible associations or dissociations.
Along a given line, the concentrations of a single strand or of several strands
vary between zero and the maximum disposable value. Each of the colored
lines corresponds to a single strand, whilst black lines relate to more than one
strand or to a binary complex. With the exceptions of a and b, which have only
two connections each, all other nodes have at least three available connections,
whilst the node for the ternary acb complex has as many as five. The network
here results from the superposition of the synchronous formation from a, b,
and c with the formation both from ac plus b and from cb plus a.
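How the law of mass action distributes the strands over the free and paired species can be illustrated numerically. The sketch below is a simplified model, not the authors' treatment: it assumes that a and b bind to their non-overlapping regions of c independently (no cooperativity), and the association constants `k_a` and `k_b` are hypothetical values chosen only for illustration. Under this assumption each short strand obeys a simple quadratic mass balance.

```python
from math import sqrt


def free_short_strand(total, c_total, k_assoc):
    """Free concentration of a short strand that binds one site on c.

    Solves k*x**2 + (1 + k*(c_total - total))*x - total = 0 for x > 0,
    which follows from total = x + c_total * k*x / (1 + k*x).
    """
    p = 1.0 + k_assoc * (c_total - total)
    return (-p + sqrt(p * p + 4.0 * k_assoc * total)) / (2.0 * k_assoc)


def strand_equilibrium(a_total, b_total, c_total, k_a, k_b):
    """Equilibrium concentrations of a, b, c, ac, cb, and acb.

    Assumption: the two binding regions of c are occupied independently,
    so the fraction of c carrying a does not depend on whether b is bound.
    """
    a = free_short_strand(a_total, c_total, k_a)
    b = free_short_strand(b_total, c_total, k_b)
    theta_a = k_a * a / (1.0 + k_a * a)  # occupancy of the a-site on c
    theta_b = k_b * b / (1.0 + k_b * b)  # occupancy of the b-site on c
    return {
        "a": a,
        "b": b,
        "c": c_total * (1.0 - theta_a) * (1.0 - theta_b),
        "ac": c_total * theta_a * (1.0 - theta_b),
        "cb": c_total * (1.0 - theta_a) * theta_b,
        "acb": c_total * theta_a * theta_b,
    }


# Hypothetical example: 555 nM of each strand, K = 1e7 per molar for both sites.
conc = strand_equilibrium(555e-9, 555e-9, 555e-9, k_a=1e7, k_b=1e7)
for name, value in conc.items():
    print(f"[{name}] = {value * 1e9:.1f} nM")
```

Because the same ternary complex is reached through ac plus b, through cb plus a, or synchronously, the constants of these pathways are linked by the thermodynamic cycle; the independent-site model makes that link explicit through the product of the two occupancies.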
25) (1) and (2) form closed subsystems. As soon as all three components are present, however, the full system of equilibria (1-5) is valid. Equilibrium (5) represents the synchronous formation of the ternary complex out of the three single conjugates. Since this corresponds to third-order kinetics, a process of this type is significantly less probable than the purely bimolecular processes (1-4).
[Network diagram: nodes a, acb, b, and c; colored lines: variation of [a], variation of [b], variation of [c]]
Scheme 1-22 Network representation of equilibria (1)-(5).
In a three-dimensional representation, the strands and their complexes can

be arranged as the vertices of a trigonal bipyramid, its edges corresponding to
the equilibrium arrows from (1)-(5)26). Each state of the system is thus a point
within the trigonal bipyramid.
The stability of the complexes may be preserved when the pairing-capable
strands a, b, and c are extended into sets of conjugates27) A, B, and C
(Scheme 1-23).
Coupling with a series of oligopeptides transforms the pairing system (self-
assembly system) with the three single strands a, b, and c into an exploring
system (molecular recognizing system) with the three sets of conjugates A,
B, and C. The equilibria (1)-(5) also apply to the conjugates, if the subscripts
i, j, and k are used to denote the oligopeptides employed. For the resulting
system there is a particular assignment of roles: the pairing system based
on the p-RNA strands a, b, and c serves to bring the peptide regions into
proximity with each other, thus supporting their joint function. The law of
mass action applies here not only to the self-assembly but also to molecular
recognition, ensuring that the full potential of the structural variation can be
exploited.
As effectors, the triple peptide combinations are capable of entering into
specific interactions with a further component, a receptor R (Scheme 1-24). As
a selector of complementary oligopeptide combinations, the receptor enables
unnatural selection from the variation of conjugates.
26) It should be pointed out that the transition from ac to cb does not take place as a direct, single process, but should be regarded only as a conflation of processes ac ⇌ a + c and cb ⇌ c + b. The corresponding edge of the bipyramid thus - unlike the other edges - does not symbolize a single equilibrium.
27) For the conjugates the following p-RNA sequences have been used: a = {CGGGGGN}, b = {NGAAGGG}, and c = {CCCTCTNCCCCCG}. N is a tryptamine nucleoside [98], which serves to attach the oligopeptides (discrete random variation of hexapeptides composed of the amino acids C, E, F, H, K, L, N, R, S, T, W).
Scheme 1-23 Equilibria between members of the three sets of conjugates of types A, B, and C each with p-RNA moieties (gray) to make self-assembly possible and oligopeptide moieties (green) to allow molecular recognition.
The equilibria (1-5) described above now need to be supplemented, first

to take account of the receptor itself, and second to allow for the receptor
complexes with the various components of binary and ternary aggregates
shown in Scheme 1-23: altogether eight molecular species are now involved.
Scheme 1-25 shows the corresponding network of 8 nodes and 28 possible
equilibria, each of the nodes having 7 connections. As in Scheme 1-22, green,
red, and blue lines represent the possible binary equilibria, whilst black lines
denote potential ternary and quaternary equilibria.
In the interactions with a receptor, unlike in the case of the separate ternary
complex, there are several types of substitution equilibria in which conjugates
Scheme 1-24 Sketch of molecular recognition of a receptor (R) by a complementary effector (here by a discrete variant of type ACB).
are exchanged. There are three types of pure binary substitutions, and two
higher order substitutions where one conjugate is substituted for two others at
a time. Whether these simultaneous exchanges of several conjugates, as well as
the higher order associations and dissociations are relevant, though, remains
to be determined experimentally. The alternative of stepwise processes is
available in any case.
Topologically, the molecular species can be ordered into four levels of
complexity28) (Scheme 1-25). On the simplest level is the free receptor R. The
level above is represented by the binary complexes R:A, R:B, and R:C, the next
level by the ternary complexes RAB, RAC, and RBC, whilst lastly the level of
highest complexity is occupied by the quaternary complex R:ACB. Accordingly,
the participating species can be arranged as vertices of a cube. All possible
equilibria are now either edges, or face- or space-diagonals of the cube and the
system is, by definition, described by a point inside the cube at any time.
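The counts quoted for this network (8 nodes, 28 equilibria, 7 connections per node, four levels of complexity) follow from treating each species as the receptor plus a subset of bound conjugates. The sketch below enumerates them; the labels and the classification by the number of exchanged conjugates are illustrative conventions, not taken from the original scheme.

```python
from collections import Counter
from itertools import combinations

CONJUGATES = ("A", "C", "B")  # order chosen so the full complex reads R:ACB

# Each species is the receptor R plus a subset of bound conjugates,
# i.e. one vertex of a cube with coordinates (A bound?, C bound?, B bound?).
species = [frozenset(sub)
           for n in range(len(CONJUGATES) + 1)
           for sub in combinations(CONJUGATES, n)]


def label(bound):
    """Readable name of a species, e.g. 'R', 'R:A', 'R:ACB'."""
    if not bound:
        return "R"
    return "R:" + "".join(c for c in CONJUGATES if c in bound)


# Levels of complexity: number of conjugates bound to the receptor.
levels = Counter(len(s) for s in species)

# Every pair of species is a possible equilibrium; the number of conjugates
# that differ decides whether it is an edge or a diagonal of the cube.
KIND = {1: "edge", 2: "face diagonal", 3: "space diagonal"}
equilibria = Counter(KIND[len(x ^ y)] for x, y in combinations(species, 2))

degree = Counter()
for x, y in combinations(species, 2):
    degree[label(x)] += 1
    degree[label(y)] += 1

print(sorted(label(s) for s in species))
print(dict(levels), dict(equilibria), set(degree.values()))
```

Edges correspond to the association or dissociation of a single conjugate, while the diagonals collect the higher-order processes whose relevance, as noted above, remains to be determined experimentally.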
The cube-style representation shows, firstly, that pathways from one species
to another are possible either via both edges and diagonals, or exclusively via
28) The free ternary complex and its subsystems

are found on these levels likewise and are
continuously present over the full span of
equilibria. For the sake of clarity, however,
they are not explicitly taken into account here.
Scheme 1-25 Network representation of all possible equilibria extending Scheme 1-24. The eight nodes are labeled by bold characters. All other intersections are artifacts of the two-dimensional representation. For the sake of clarity, face- and space-diagonals of the cube are not shown.
edges or diagonals. Secondly, it also demonstrates the high syntactic symmetry
(equivalence of the different types of interactions) of the system and underlines
the exchangeability of receptor and effectors.
To delineate pharmacological properties of members of the dynamic system
shown in Scheme 1-25, data of an enzyme-binding experiment from a real-time
biomolecular interaction analysis29) and data of an enzyme-inhibition
experiment from a photometric assay30) have been correlated (Scheme 1-26).
One can see that the strongest affinity (binding) does not give rise to the
greatest activity (inhibition). Affinity is not proportional to activity. Species
RAC shows the strongest affinity, whilst species RACB causes the greatest
activity. Since species RCB has the weakest affinity, it is clear that B makes
no cooperative contribution to affinity, but is important for effective activity.
29) The biotinylated conjugates (ACB, AC, BC, or C) are captured by a sensor chip, whose surface is coated with immobilized streptavidin and which acts via surface plasmon resonance as a tool for enzyme (R) binding experiments.
30) The enzyme is mixed with its photolabeled substrate S. Upon cleavage by the enzyme, the label is activated and fluorescence can be detected. In case of inhibition by the effector, cleavage does not occur and fluorescence is not detected.
Obviously, there is no additivity of the individual conjugates’ contributions.
From the quantitative point of view this corresponds to non-linear behavior.
The influence on the enzymatic reaction has to be interpreted in terms of
either competitive inhibition (ACB:R)31), uncompetitive inhibition (ACB:RS),
mixed inhibition (ACB:R + ACB:RS), or substrate capture by the conjugates.
It should be noted that interactions of A, B, and C with the receptor may
mutually influence one another in either a cooperative or an anticooperative fashion.
Furthermore, the coordinating role that conjugate C is playing in self-assembly
(Scheme 1-23) may be pushed into the background or may even be absent
entirely while interacting with the receptor.
Scheme 1-26 Correlation diagram of affinity (binding) and activity (inhibition) for some
nodes of the network of Scheme 1-25. Values for ACB are set to 100%.
31) Here, and in the other possibilities mentioned, ACB:R stands for any of the molecular species from Scheme 1-25 containing the receptor.
For a screening experiment on enzyme inhibition (Scheme 1-27), a variation
of conjugates of types A, B, and C was formatted in a spatially addressable manner using
16 microtiter plates. One out of 1308 different C conjugates was given each in
a separate well, together with 1 of 8 different A conjugates and 1 of 11 different
B conjugates, as indicated on the margins. In 99 of the remaining wells, the
single A or B conjugates were given as inactive blank controls. The last well
was filled with solvent and buffer only. To each of the various mixtures the
enzyme used was added, together with its fluorescence-labeled substrate S.
In each well, the enzyme could either select the substrate or the conjugates
of Scheme 1-25. In the first case, the labeled substrate would be cleaved by
the enzyme and fluorescence observed. In the second case, inhibition of the
enzyme would occur and little or no fluorescence detected.
The color coding in Scheme 1-27 indicates the degree of inhibitory activity
found in each case. White and pale blue denote inactive substances, red and
violet denote strong inhibitory effects. In a separate measurement, an IC50
value of 23 nM was found for the strongest inhibitor (position A8/B11 on the
plate in the fourth column, third row). Surprisingly, there are not only single
point hits but also whole clusters of hits in which the participating conjugates
display inhibitory activity. A closer inspection of, for example, all the wells in
which conjugate A4 is present, reveals that the majority indeed shows activity,
independently of the B and C conjugates added. This notwithstanding, not
all 16 plates show the same distribution of active and inactive triplets, even
though the A and B conjugates are the same in each plate. So, variation in
the C conjugate significantly influences the activity of the A and B conjugates.
This is especially apparent in the mixtures of A3 with B1 through B8 and of
A2 with B1, B3, and B5 through B7 in the plate of the second column, third
row. Only in the presence of a C conjugate do A and B conjugates contribute
to the observed activity in this case.
The law of mass action suggests departing from the 1:1:1 stoichiometry
in the search for maximum activity. On changing the concentrations of
individual conjugates, one shifts the molecular system parallel to edges or
planes of the cube (Scheme 1-25). The statistical weights of the contributions
of individual conjugates to the network of interactions are altered in the
process. Scheme 1-28 shows the results of a pilot experiment in which the
inhibitory activity was measured as a function of the concentrations of the A
and B conjugates32). The results are displayed as a hypersurface for a constant
concentration of conjugate C. The sigmoidal dose-activity relationship is clearly
evident with regard to both A and B. The stoichiometric composition with
[A] = [B] = [C] = 555 nM is represented by a point located on top of a ridge,
separating a flat region of the hypersurface from a descending slope. Starting
from the stoichiometric point, activity increases with the concentrations of A
and B. The strongest inhibition value was found at the bottom of the slope
32) Results relate to the second strongest inhibitor found in the screening. In Scheme 1-27 it is to be found on the plate in the third row and the second column with the conjugates A3/B1. The results presented in Scheme 1-26 refer to the same complex.
Scheme 1-28 Three-dimensional (hypersurface) view of enzyme-inhibition activity of a combination of three conjugates, A, B, and C as a function of the concentrations of conjugates A and B. The stoichiometric composition [A] = [B] = [C] = 555 nM is close to a ridge. Increasing the concentrations of A and B enhances the activity.
with [A] = [B] = 5000 nM and [C] = 555 nM, where the properties of A and B
have a 10 times greater statistical weight than those of C33). From the foregoing
discussion it can be directly inferred that the activity of a conjugate triplet is
not connected to a single molecular species from Scheme 1-25.
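A sigmoidal dose-activity relationship of the kind mentioned above is commonly described by a Hill-type function. The sketch below is such a generic model and not a fit to the data of Scheme 1-28: the function name, the Hill coefficient, and the use of the 23 nM IC50 value as the example parameter are assumptions made for illustration.

```python
def inhibition_percent(conc_nm, ic50_nm, hill=1.0):
    """Percentage inhibition for a Hill-type dose-activity curve.

    conc_nm : inhibitor concentration in nM
    ic50_nm : concentration giving 50 % inhibition, in nM
    hill    : Hill coefficient (assumed 1.0 unless stated otherwise)
    """
    if conc_nm <= 0:
        return 0.0
    return 100.0 / (1.0 + (ic50_nm / conc_nm) ** hill)


# Illustrative curve around the IC50 of 23 nM quoted for the strongest inhibitor.
for conc in (0.23, 2.3, 23.0, 230.0, 2300.0):
    print(f"{conc:8.2f} nM -> {inhibition_percent(conc, 23.0):5.1f} % inhibition")
```

Plotted against the logarithm of the concentration, this function gives the familiar S-shaped curve with its midpoint at the IC50. A single curve of this kind describes one effective inhibitor, whereas the measured hypersurface reflects the whole population of species in the network.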
Given the dynamics of the supramolecular system described, one could
go a step further and transgress the confinements of molecular constitution.
It should be just as possible to use carbohydrates, steroids, terpenes or
even nonbiogenic substance classes - dendrimers, for example - in place
of the peptides. Through the addition of conjugates of different types of
constitution, the transition from one type to another could be studied in a
quasi-continuous way, opening up a further, new option for the determination
of structure-activity relationships.
The dynamics of the system allows it to adapt to changes in the environment.
Adaptation here means that the balance between the interactions inside the
33) Comparing Scheme 1-28 with Scheme 1-26, one can see that the increase of activity on going from C to ACB, from CB to ACB, and from AC to ACB is consistent with the topology of the hypersurface in Scheme 1-28.
effector (between the individual conjugates) on the one hand and those
between the effector and the receptor on the other hand, can change. Therefore,
depending on the prevailing conditions, different molecular species may be
responsible for the effects produced at the receptor. Particular combinations
of members of the three sets described may be used to map the affinity
profile of the receptor. In short: receptor profiling directly results from a
thorough investigation of the dynamic system under discussion. It reveals the
complementarity between the sites of the interacting surfaces of receptor and
effectors and suggests the design for a specific, biologically active substance
finally taking over from the analyzing effectors.
Ultimately, the potential of biologically active substances can only be assessed
in actual biological systems by means of animal experiments (Scheme 1-29)
and confirmed by subsequent clinical studies. En route to this, however, the
dynamic system described here offers various options for the analysis and
optimization of pharmacological parameters like affinity and activity. It is the
heterobifunctional character of the dynamic system that allows the synthetic
chemist to influence both intrinsic self-assembly as well as extrinsic molecular
recognition in a controlled way.
1.5 Bringing Biological Solutions to Chemical Problems
1.5.1 Proteins [99]
Among the bio-macromolecules, proteins are distinguished all-round players.

As fibrous proteins they are used for structural purposes. As enzymes they
catalyze almost every chemical reaction in a cell with great power and high
specificity. As gene regulators they control gene expression in development
and evolution. As antibodies (immunoglobulins) they bind invading antigens.
As motor proteins they convert chemical energy into kinetic energy. As
transport proteins they mediate transmembrane movements of ions or
metabolites.
1.5.1.1 A Look at Protein Structure and Generation from Different Angles

The chemist fills the void in structure space left by the physicist who dislikes
the integrated complexity of the molecular world. Even the chemist, for some
time, had been treating his structure space rather unevenly. According to the
Beilstein Doctrine34), macromolecules neglected by the organic chemist for a
34) Beilstein Handbook of Organic Chemistry, an encyclopedia of known micromolecular carbon compounds, does not concern itself with macromolecular carbon compounds [17e].
Scheme 1-29 Outlook: supramolecular network concept in pharmacology.
long time [17f], were finally taken up by the biochemist who could not afford
to ignore bio-macromolecules like nucleic acids and proteins any longer.
The bottom-up view of the biochemist eventually was complemented by the
top-down attitude of the (molecular) biologist. Quite a few of those scientists
who considered themselves molecular biologists entertained the idea [100a]
that “‘other laws of physics’ might be discovered by studying the gene”. This
search for the physical paradox [100b] remained an important element of the
psychological infrastructure of the creators of molecular biology. As a matter of
fact, the physicists among the new group were going to create a new approach
to biology [101].
1.5.1.1.1 The Chemist’s Look [102]
The Hofmeister-Fischer Theory of Protein Structure was made public in 1902
[103, 104]. Accordingly, proteins consist of polypeptide chains in which the
individual α-amino acids are linked to one another through amide (peptide)
bonds formed between the COOH group of one amino acid and the NH2
group of the next amino acid. With regard to the structure of proteins, Linus Pauling
demonstrated, some time later, how deep knowledge of chemistry can lead
to general rules [105]. The nature of the strong peptide bond, the role of
weak hydrogen bonding, and the importance of complementarity [106] were
such rules used in model building: one of Pauling’s methods to work out the
structure of bio-macromolecules.
Stepwise protein synthesis normally requires [107]
• protection of the amino group of the first amino acid and the carboxy group of the next amino acid;
• activation of the carboxy group of the amino acid carrying the protected amino group to form a peptide bond; and finally,
• removal of the protecting groups.
Polypeptide synthesis on insoluble polymer supports was pioneered by R. B.
Merrifield [108]. This method could be automated and has facilitated protein
synthesis enormously [109]. Chemical ligation of even unprotected peptide
segments has recently been reported [110].
To summarize: systematic variation of structure with the aim of developing
peptides for therapeutic use gives the synthetic chemist a good excuse for
chemical synthesis. α-Amino acids, obtained from natural sources or from
the synthetic chemist’s laboratory, play a trailblazing role in the gradual
growth of chemical biology. For the synthetic protein chemist they are
the obvious building blocks, for the teaching chemical generalist they are
ideal demonstration objects with an unmistakable structural profile: two
unlike functional groups and - with the exception of glycine - at least one
stereogenic center within the smallest possible space. Nearly 50 years were
to pass from Emil Fischer’s view that synthetic chemistry should contribute
to the solution of biological problems [30] to Du Vigneaud’s synthesis of the
neuropeptide oxytocin [111]. Preparative stumbling blocks in the selective
protection and/or activation of functional groups as well as in the effective
separation of complex reaction products first had to be cleared from the path.
Methodological progress toward the achievement of automated solid-phase
synthesis, with or even without utilization of protecting group technology,
finally made peptide synthesis more or less a routine matter. Sophisticated
methods have been developed to ligate smaller peptide segments together to
make larger peptides. As far as larger proteins are concerned, the chemist’s
ability to control their structure (and functions) specifically is still in its
infancy.
1.5.1.1.2 The Biochemist’s Look [112]
In his study of endergonic protein genesis,35) the biochemist is driven by
the desire to understand how the energy barrier from the amino acids to
the peptide is overcome [113]. Paul C. Zamecnik, Mahlon Hoagland, and their
colleagues developed and used a cell-free system for the in vitro study of the
mechanistic details of protein genesis [114]. By the use of radioactive amino
acids, it could be shown that, in an initial step, enzymatic activation of the one
amino acid out of 20 induced by the hydrolysis of ATP took place following
the reaction:
$$\text{Amino acid} + \text{ATP} \xrightarrow{\text{enzyme}} \text{AMP-amino acid residue:enzyme} + \text{pyrophosphate}$$
The resulting adenylated amino acid appears to be tightly bound to its specific
enzyme, the corresponding aminoacyl-tRNA synthetase. Without leaving its
enzyme, the former, in a consecutive step, reacts with a low-molecular-weight
RNA (called soluble RNA = sRNA, later more logically known as transfer RNA
= tRNA) to afford an aminoacyl-tRNA [115,116].
$$\text{AMP-amino acid residue:enzyme} + \text{tRNA} \xrightarrow{\text{GTP}} \text{amino acid residue-tRNA} + \text{AMP} + \text{enzyme}$$
This transacylation furnishes conjugates that structurally bridge the gap
between amino acids and their ordered arrangement in proteins.
1.5.1.1.3 The Molecular Biologist’s Look [117]

Aminoacyl-tRNAs not only bridged the gap between activated amino acids
and their ordered arrangement in proteins but they also, rather dramatically,
brought together the experimental biochemist and the theoretical molecular
biologist [113, 118]. The biochemist, beyond biogenesis, takes a lively interest
in flow of matter and energy during metabolism. The molecular biologist takes
additional interest in the flow of genetic information during gene expression
on the one-way road: DNA → RNA → Protein. M. Hoagland [115] and
P. C. Zamecnik [116] with their sRNAs acted as the experimental biochemists
while Francis Crick, by offering his adaptor hypothesis [119], figured as the
theoretical biologist. Several years before sRNAs were discovered, Crick had
already proposed 20 types of adaptor-RNA molecules, which could line up along
an unspecified template-RNA, and each bind to a particular amino acid. In his
own words: “one would require twenty adaptors, one for each amino acid, and
separate enzymes would be needed to join each adaptor to its cognate amino
35) We distinguish in this essay products of

protein synthesis which were designed by man
from products of protein genesis which were
produced by evolution.
acid. Thus one is lead to suppose that after the activating step, discovered by
Hoagland and described earlier (vide supra), some other more specific step is
needed before the amino acid can reach the template”.
Which template? Several observations had excluded rRNAs from being
candidates for acting as templates. A cell, for example, could make a new type
of protein without making a new type of ribosome. The template-RNA was
finally disinterred as a class of unstable intermediates, self-explanatorily called
messenger-RNAs (mRNAs)36). When J. D. Watson informed the scientific
community “About the Involvement of RNA in the Synthesis of Protein”
[117a] he could begin with the sentence: “The ordered interaction of the three
classes of RNA controls the assembly of amino acids into protein”.
Now essential details in brief: protein genesis (translation) is the central event
in molecular biology. It takes place in the incredibly complex machinery37)
of the ribosome [124], where the syntactic structure of ribonucleic acids is
translated into the syntactic structure of proteins. During the translation
process, the information contained in a triplet codon of mRNA is decrypted by
an anticodon of a tRNA molecule, according to the instructions of the genetic
code. The genetic code is an abstract scheme for the redundant correlation of 64
“words” (nucleoside triplets) in the language of nucleic acids with 20 “words”
(canonical amino acids) in the language of proteins. The synthetic chemist
accepts the limitation on the number of amino acid building blocks as the
price for his readymade use of the ribosomal protein generating system. The
undisputed leading actors in the translation process at the stage of information
transfer from ribonucleic acids to proteins are aminoacyl-tRNAs [125]. These
are conjugates made up of portions of both biopolymer types (language
systems), produced through esterification of an amino acid with a tRNA. A
particular tRNA with its anticodon corresponding to a specific amino acid is
covalently coupled (esterified) with precisely this amino acid. The esterification
takes place through the help of an enzyme (an aminoacyl-tRNA synthetase)
capable of specifically recognizing and coupling that particular tRNA and its
cognate amino acid [126]. Whilst the self-assembly of mRNA and tRNA during
translation is due to codon-anticodon interaction, based on Watson-Crick
pairing of complementary nucleobases, the mutual recognition of a tRNA and
its cognate synthetase during aminoacyl-tRNA formation is due to molecular
shape complementarity.
36) Messenger-RNAs were the last of the RNA trio engaged in protein genesis
to be detected [120]. A further type of RNA has been discovered as a
widespread, universal tool in biology for gene regulation by means of
antisense-like interactions [121]. It acts through RNA interference (RNAi)
and is produced from double-stranded RNA in a cascade of enzymatic processes
by a set of specific RNases. Several regulatory pathways involving RNAi are
known in many eukaryotes, including plants and mammals. RNAi is used
extensively as a tool for research, and its therapeutic potential is becoming
more and more obvious [122].
37) In an urgent appeal, which we are certainly going to follow henceforth,
Carl Woese [123] asks us to stop looking at an organism as a molecular
machine. The machine metaphor, in his view, overlooks much of what biology
is. To understand living systems in any deep sense, “we must come to see them
not materialistically, as machines, but as stable complex, dynamic
organization”.
1.5.1.2 The Genetic Code [127]
1.5.1.2.1 Cracking the Genetic Code
The genetic code was cracked in the early 1960s, beginning with investigations
by Marshall Nirenberg and Heinrich Matthaei using a cell-free E. coli
system. In an inaugural experiment, the NIH researchers demonstrated that
the homopolymer polyuridylic acid coded for the nonnatural protein
polyphenylalanine [128]. Clearly, the natural system of protein genesis would
translate any appropriate message, natural or artificial, into a polypeptide
chain, natural or artificial [116].
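The reading of a message in triplets can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the original essay: the names `CODON_TABLE` and `translate` are chosen here for illustration, amino acids are written in one-letter code, and the model ignores initiation, tRNAs, and the ribosome altogether. It merely shows that, under the standard code, a poly(U) message reads as polyphenylalanine.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons in UCAG order; "*" marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Standard genetic code: codon string -> one-letter amino acid (or "*").
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA string codon by codon until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("UUU" * 5))     # poly(U) message: FFFFF
print(translate("AUGGCUUAA"))   # Met-Ala, then stop: MA
```

The table also reproduces the bookkeeping of the code: 64 entries in total, of which three are stop signals and 61 specify the 20 canonical amino acids.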
1.5.1.2.2 Expanding the Genetic Code
By Natural Selection
The genetic code has the potential for 64 (= 4³) triplet codons, 61 of which
redundantly specify the 20 canonical amino acids. The methionine-specifying
triplet codon AUG may take on the role of a starting signal at the beginning
of protein synthesis: it thus has a double function. Three triplet codons in an
mRNA - UAA (ochre), UGA (opal), and UAG (amber) - known as nonsense
codons, specify no amino acids; that is, there are no tRNAs with complementary
anticodons for these codons. As a consequence, translation breaks off here.
The nonsense codons are also, therefore, termed stop signals (termination
codons). Broader roles in protein genesis, however, have also been established
for two of these three stop signals in recent years. In E. coli (and also in a
whole range of other organisms) the UGA codon may be redefined to perform
one of two different functions: either it may function as a stop codon and thus
end the elongation of the protein chain under construction, or further growth
of the polypeptide chain may carry on with incorporation of selenocysteine
[129], not a member of the standard set of canonical amino acids. Which of
the two instructions is followed by the translation system is dictated by the
secondary and tertiary structure of the mRNA to be decrypted (and possibly by
protein factors). Similarly, structural alterations in mRNA are able to modify
the programming of the UAG codon: once more, a codon that continues a
translation in progress, in this case through the incorporation of pyrrolysine
[130], is produced from a stop codon. The genetic code is thus naturally
expanded from the standard set. Instead of the original 20 amino acids, 22
amino acids specified by mRNA sequences are currently recognized. Further
as yet unrecognized extensions of the genetic code through natural selection
cannot be excluded. Why no sense codon has (yet) been found to be doubly
coded is unclear. The discovery that the genetic code, as a result of natural
selection, already has more than 20 amino acid building blocks for protein
genesis in store, poses the question of whether the genetic code might also be
expandable by design; that is, whether amino acids not specified by the genetic
code in their original version might be introducible into a polypeptide chain
by translation.
By Design [131]
Peter G. Schultz, a leading protagonist of the movement to consider biology
an engineering discipline, is aiming at the construction of new proteins and,
eventually, of new organisms with enhanced properties. Two alternatives for
the site-specific in vivo incorporation into proteins of amino acids not
specified by the genetic code in its original version have been designed to
achieve that goal: systematic reassignment of three-base nonsense codons or
use of supersized codons.
The addition of a noncanonical amino acid to the genetic code requires, in
the first case, additional components of the protein-producing system: a
noncanonical amino acid, an exogenous tRNA/aminoacyl-tRNA synthetase
pair, and a unique codon that specifies the amino acid of interest.
Orthogonality between the exogenous translational components (Scheme 1-30)
and their endogenous counterparts is the key feature of this approach.
Orthogonality means
that the codon for the noncanonical amino acid does not encode a canonical
amino acid;
that neither the new tRNA nor its cognate aminoacyl-tRNA synthetase
cross-reacts with any endogenous tRNA/synthetase pair; and
that the new synthetase recognizes only the noncanonical amino acid and
none of the canonical amino acids.
A completely autonomous bacterium with a 21 amino acid genetic code was
engineered. The bacterium can generate p-aminophenylalanine from basic
carbon sources and incorporate this amino acid into proteins in response to
the amber nonsense codon [132].
As the small number of non-coding triplet codons limits the number of
noncanonical amino acids that can be added, the question arises as to whether
the genetic code might be expanded by use of a supersized codon and a cognate
tRNA with an expanded anticodon loop. A study, “Exploring the Limits of Codon
and Anticodon Size” [133], reveals that the E. coli ribosome is capable of using
codons of three to five nucleobases. The tRNAs that decode these codons are
most efficient with a Watson-Crick complementary anticodon containing two
additional nucleotides on either side of the normal-sized anticodon in the loop.
An orthogonal synthetase/tRNA pair was designed and constructed, which
site-specifically incorporates a noncanonical amino acid (L-homoglutamine)
into proteins of E. coli in response to the four-base codon AGGA [134].
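Both routes to an expanded code, reassignment of the amber codon and use of a four-base codon, can be illustrated with a toy decoder. The sketch below is an assumption-laden illustration and not a model of the published systems: the function name `translate_expanded` and the labels `pAF` (p-aminophenylalanine) and `hGln` (L-homoglutamine) are hypothetical, and the decoder always prefers a registered four-base codon over the overlapping triplet, whereas in a real cell suppression competes with termination and with ordinary triplet decoding.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Standard genetic code in one-letter notation; "*" marks a stop codon.
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_expanded(mrna, amber=None, quadruplets=None):
    """Toy decoder with optional amber suppression and four-base codons.

    amber        label inserted at UAG instead of terminating (None = stop)
    quadruplets  mapping of four-base codons to labels; a registered
                 quadruplet is always read in preference to the triplet
                 (a simplifying assumption of this sketch)
    """
    quadruplets = quadruplets or {}
    residues = []
    i = 0
    while i + 3 <= len(mrna):
        if mrna[i:i + 4] in quadruplets:
            residues.append(quadruplets[mrna[i:i + 4]])
            i += 4
            continue
        codon = mrna[i:i + 3]
        if codon == "UAG" and amber is not None:
            residues.append(amber)       # orthogonal pair reads the amber codon
        elif STANDARD[codon] == "*":
            break                        # ordinary termination
        else:
            residues.append(STANDARD[codon])
        i += 3
    return residues


print(translate_expanded("AUGUAGUUUUAA"))                # ['M']
print(translate_expanded("AUGUAGUUUUAA", amber="pAF"))   # ['M', 'pAF', 'F']
print(translate_expanded("AUGAGGAUUUUAA",
                         quadruplets={"AGGA": "hGln"}))  # ['M', 'hGln', 'F']
print(translate_expanded("AUGAGGAUUUUAA"))               # ['M', 'R', 'I', 'L']
```

The last call shows why the four-base codon needs its own tRNA: read in triplets, the same message shifts out of frame after AGG and yields an unrelated sequence.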
Scheme 1-30 Incorporation of (a) canonical (yellow) and (b) noncanonical (red) amino
acids into proteins in vivo.
1.5.2
Antibodies
The ribosomal system is not the only evolutionary accomplishment the
synthetic chemist might use in pursuit of his ends. The immune system offers
an example of how a biological solution can be exploited successfully:
antibodies can serve as enzymatic catalysts. As far as their functions are
concerned, enzymes and antibodies are normally quite different. Enzymes have
been selected over millions of years to bind the transition state of the
reaction they catalyze [105]. Antibodies have been selected for their affinity
for the immunogen over a period of weeks [135]. If the immunogen were a
transition state analogue, the resulting antibodies should catalyze the
appropriate reaction. Richard A. Lerner and Peter G. Schultz with their
respective colleagues have designed molecules that could be used to guide the
process of clonal expansion and somatic mutation to generate catalytic
antibodies for a variety of reactions [136]. Rather than going into details
here, we refer to the authoritative book on catalytic antibodies [137]. The
various articles of that book make for interesting reading, both for the
synthetic chemist who wants to design new catalysts and for the molecular
biologist who wants to gain structural insight into antibody evolution.
1.6
Bringing Biological Solutions to Biological Problems
The composition of this essay followed the matrix
chemical problems biological problems
Biological answers to biological questions are, of course, given by Nature
directly. Man may use the complex systems of Nature with the aim of
correcting a fault (as was done, e.g., by Robert Edwards and Patrick Steptoe
[138] in reproductive medicine). Reproductive medicine cannot be discussed
without regard to bioethical aspects [139]. The present authors are not
competent to meet the bioethical requirements. For this reason, reproductive
medicine is not commented on further.
Up to now synthetic chemistry has been the dominant part of our reflection.
Now synthetic biology comes in to meet the requirements of the sophisticated
observer who wants to be informed about the newest development. At any
rate, the fundamental question, WHAT IS LIFE? comes up. Under this title,
two essays have been published; one by Erwin Schrödinger [140] in 1944 and
the other by J. B. S. Haldane [141] in 1949. While the former focused on the
physical aspect of the living cell, the latter considered life essentially as a
pattern of chemical processes. A very pragmatic point of view was formulated
in 1994 by Antonio Lazcano [142] with the statement: “Life is like music, you
can describe it, but not define.”
In a state-of-the-art survey, Biology and the Future of Man [143], of the US
National Academy of Sciences, the chances to realize the dream of a man-made
cell were pondered. The conclusion reached was: “Those who are hopeful about
synthesizing a cell in the foreseeable future have every reason to retain their
optimism.” However, they should be warned against false claims. Synthesis
of life is one such false claim. Living things (i.e., a cell) can be synthesized but
not life itself, and that is what people really mean when they are talking about
synthesizing life.
A question that keeps scientists in chemistry as well as in biology busy
is where the line separating inanimate from animate matter can be
drawn. In the past, attempts were made to link the problem to the question of
life’s origin in terms of molecular evolution [144]. Recently, sequencing of
the human and other complete genomes has shed some new light on this
field. The question of which minimal set of genes is necessary
for a living organism can be put more concisely in the context of what
is now called synthetic biology [145]. Both approaches, the top-down way of
deactivating more and more genes of an existing species [146] and the
bottom-up way of assembling genes to build an organism with a fully synthetic
genome [147], have not yet reached the goal of explaining the transition from
the inanimate to the animate world. On the one hand, results obtained through
different methods to identify the minimal set of genes that constitute a living
organism point to roughly 250 genes [148]. On the other hand, none of
the synthetic constructs obtained so far covers the central functionality of
life: self-construction, metabolism, adaptation, self-repair, reproduction, and
evolution [149].
Nonetheless, the bottom-up route has turned into an engineering approach to
synthetic biology [150]. The strategy is to combine predefined DNA modules,
so-called bio-bricks, into bio-circuits designed to
implement biological functions [151]. In that sense, synthetic biology
is seen as the successor of molecular cloning, in particular, with respect to
safety issues.
1.7
Epilogue
To round off this essay, we point to two issues gaining more and more emphasis
in chemistry. One is the problem of shared use of the limited sources of
energy and raw materials. The other is the concept of a total synthesis, in
particular for complex natural substances. Both topics underline that organic
chemistry is far from being pure routine, applying a comprehensive toolbox
to solve any problem in synthesis [152]. Medical therapeutics, agrochemicals,
and high-performance materials must be provided by organic chemistry to
fulfill global needs.
1.7.1
The Fossil Fuel Dilemma of Present Chemical Industry
For the chemical industry, the interdependence of energy source and raw
material supply is typical. This double function of fossil fuel, as a source
of raw material as well as of energy, will have to be terminated in the
not-too-distant future [153]. Being the main source of raw material, fossil
fuel should be preserved for the chemical industry as long as possible. A
way out, disentangling energy requirement and raw material supply,
would be to find new sources for one field or the other. Nuclear energy,
despite political moves to dispense with nuclear power, could play a role
as an alternative to fossil fuel. With petroleum supplies dwindling, there
is increasing interest in selective methods for transforming other carbon
feedstocks into hydrocarbons suitable for transportation fuel. The reductive
oligomerization of CO and H2 to produce hydrocarbons (specifically n-alkanes)
with highly controlled molecular weight (Fischer-Tropsch process [154]) from
the vast reserve of coal, natural gas, oil, or biomass is one such process that
was developed in the 1920s. The Goldman-Brookhart process (tandem alkane
dehydrogenation-olefin metathesis [155]) is of a similar kind, but of recent
origin.
1.7.2
Two Lessons From the Wealth of Published Total Syntheses
For a long time it was common procedure to regard the structure of a natural
product as finally proven once the product had also been synthesized in the
chemist’s lab [156]. In a few cases, disagreement raised a few eyebrows. This
was the case for patchouli alcohol and for a molecule called hexacyclinol
[157]. Quinine is an example of the difficulties associated with the notion of
a total synthesis. Shouts [35, 37, 158] and murmurs [11b, 159] have been
expressed to comment on the wealth of total syntheses of natural products
performed in the second half of the twentieth century.
1.7.2.1 Synthetic Lesson from Patchouli Alcohol: The Trouble with “the Last
Structural Proof” [160]
The peculiar case of patchouli alcohol (87) (Scheme 1-31) was told and
commented on by Jack D. Dunitz [160a]. Following the advice of W. H. Perkin
jun. [156] to perform a total synthesis of a natural product as the final
proof of its structure, 87 was synthesized [160c]. The synthetic product proved
to be identical to the sesquiterpene whose structure had been derived from
the results of a long series of chemical experiments lasting more than
50 years and apparently confirmed in 1961 by total synthesis [160c]. In
spite of this, X-ray structure determination [160a] revealed that the accepted
structure of patchouli alcohol was wrong. A careful reinvestigation showed
that during chemical degradation as well as during synthesis a rearrangement
of the molecular skeleton had taken place. The first reaction step of the
chemical degradation (acetate pyrolysis affording patchoulene 88) and the last
reaction step of the chemical synthesis (hydrolysis of the epoxide 89 obtained
from 88) were each accompanied by a rearrangement proceeding in precisely
the reverse direction of the rearrangement in the other case. Taking this
finding into consideration, a new synthetic approach furnished 87 without any
difficulty [160d].
Scheme 1-31 Synthesis and degradation of Patchouli alcohol.
1.7.2.2 Synthetic Lesson From Quinine 90: The Trouble with Formal Total
Syntheses [161a]
In the period between 1918 and 2001, a series of publications appeared that
changed the claim of the total synthesis of 90 (Scheme 1-32) from a fact into a
myth. It started with a paper by Rabe and Kindler in 1918 [161b] on the partial
synthesis of 90 from quinotoxine (91) via quininone (92) (Scheme 1-32a). 91 is
a relay compound to 90, since it can easily be made from 90. In 1944 and 1945,
Woodward and Döring published two papers [161e] in which they linked the
partial synthesis of Rabe and Kindler to their own synthesis of 91
(Scheme 1-32b), taking the combination as a total synthesis of 90. Not being
convinced by the view of Woodward and Döring, Stork published a new total
synthesis of 90 in 2001 [161f]. He started from the Taniguchi lactone (94) and
proceeded via desoxyquinine (95) (Scheme 1-32c). According to Stork, a
distinction between a real total synthesis and a formal one is necessary.
Accordingly, the work of Woodward and Döring is an example of a formal total
synthesis.
Scheme 1-32 Synthesis of 90. (a) The Rabe-Kindler partial synthesis of 90
[161b]. (b) The Woodward-Döring/Rabe-Kindler formal total synthesis of 90
[161e]. (c) The Stork total synthesis of 90 [161f].
Acknowledgments
Our own investigations on multicomponent simultaneous procedures were
supported by the German Ministry of Education and Research and carried out by
a team of postdoctoral fellows. In addition to these colleagues whose names are
mentioned in the references, Susanne Feiertag, Stefan Kienle, Stefan Raddatz,
Jochen Müller-Ibeler, Jochen Muth, Christoph Brucher, Heike Behrensdorf, Andreas
Kappel, and Marc Pignot have contributed to our understanding of dynamic
variations. Oliver Boden took care of the equipment for the electronic version
of the manuscript. We are indebted to Theodora Ruppenthal for patient and
skillful secretarial help. The greater part of this essay has been translated
from German into English by Dr. Andrew Beard. We are grateful to the
mentioned persons for their assistance and to the indicated institution for its
generous support. Last but not least, we would like to emphasize that it was
Albert Eschenmoser's idea to use p-RNA or analogs for selecting appropriate
candidates from a self-assembly of a dynamic variation.
References
1. (a) F.J. Ayala, T. Dobzhansky (Eds.), Studies in the Philosophy of Biology-Reduction and Related Problems, Macmillan, London, 1974; (b) J. Cornwell (Ed.), Nature’s Imagination-The Frontiers of Scientific Vision, Oxford University Press, Oxford, 1995; (c) G.R. Bock, J.A. Goode (Eds.), Novartis Foundation Symposium 213, The Limits of Reductionism in Biology, John Wiley and Sons, Chichester, 1998; (d) F. Crick, The Astonishing Hypothesis (Introduction), Simon & Schuster, New York, 1995.
2. A. Stephan, Emergenz-Von der Unvorhersagbarkeit zur Selbstorganisation, Dresden University Press, Dresden, 1999.
3. (a) Several authors, Special section on complex structures, Science 1999, 284, 79; (b) T. Vicsek, The bigger picture, Nature 2002, 418, 131; (c) J.M. Ottino, Engineering complex systems, Nature 2004, 427, 399.
4. (a) Z.N. Oltvai, A.-L. Barabasi, Life’s complexity pyramid, Science 2002, 298, 763; (b) L.H. Hartwell, J.J. Hopfield, S. Leibler, A.W. Murray, From molecular to modular biology, Nature 1999, 402, C47.
5. Several authors, Special section on networks in biology, Science 2003, 301, 1863.
6. Several authors, Special section on systems biology, Science 2002, 295, 1661.
7. M. Rees, Our Cosmic Habitat, Weidenfeld & Nicolson, London, 2001.
8. M. Eigen, R. Winkler-Oswatitsch, Steps Towards Life; (a) Part II, Chapter 4; (b) Part III, Oxford University Press, Oxford, 1992.
9. (a) H.-J. Rheinberger, Toward History of Epistemic Things-Synthesizing Proteins in the Test Tube, Stanford University Press, Stanford, 1997; (b) H.-J. Rheinberger, A history of protein biosynthesis and ribosome research, in Protein Synthesis and Ribosome Structure, Eds.: K.H. Nierhaus, D.N. Wilson, Wiley-VCH, Weinheim, 2004.
10. (a) M. Seefelder, Indigo-Kultur, Wissenschaft und Technik, 2nd ed., ecomed Verlagsgesellschaft, Landsberg, 1994; (b) W. Wetzel, Naturwissenschaften und Chemische Industrie in Deutschland, Franz Steiner Verlag, Stuttgart, 1991; (c) W. Abelshauser (Ed.), Die BASF-Eine Unternehmensgeschichte, Verlag C.H. Beck, München, 2002; (d) E. Bäumler, Ein Jahrhundert Chemie (zum 100jährigen Jubiläum der Farbwerke Hoechst AG), Düsseldorf, 1963; (e) E. Steingruber, Indigo and indigo colorants, Ullmann’s Encyclopedia of Industrial Chemistry, 5th ed., Vol A14, Verlag Chemie, Weinheim.
11. (a) C.A. Russell, Role of synthesis in organic chemistry, Ambix 1987, 34, 169; (b) J.W. Cornforth, The trouble with synthesis, Aust. J. Chem. 1993, 46, 157.
12. R.B. Woodward, in Perspectives in Organic Chemistry, Ed.: A. Todd, Interscience Publishers, New York, 1956, p. 155.
13. F. Wöhler, Über künstliche Bildung des Harnstoffs, Ann. Phys. Chem. 1828, 12, 253.
14. J. Weyer, 150 Jahre Harnstoffsynthese, Nachr. Chem. Tech. Lab. 1978, 26, 564.
15. C. Voigt, Immer eine Idee besser-Forscher und Erfinder der Degussa, Degussa AG, Frankfurt am Main, 1998.
16. A. von Baeyer, Zur Geschichte der Indigo-synthese, Ber. Dtsch. Chem. Ges. 1900, 33, LI (Sonderheft).
17. G. Quinkert, E. Egert, C. Griesinger, Aspects of Organic Chemistry, Verlag Helvetica Chimica Acta, Basel, 1996; (a) p. 2; (b) p. 55; (c) Fig. 5.4; (d) Section 10.2.6; (e) p. 5 and p. 79; (f) Section 7.5.
18. B.D. Ensley, B.J. Ratzkin, T.D. Osslund, M.I. Simon, L.P. Wackett, D.T. Gibson,
Expression of naphthalene oxidation genes in Escherichia coli results in the biosynthesis of indigo, Science 1983, 222, 167.
19. (a) Zhi-Qiang X, M.H. Zenk, Biosynthesis of indigo precursors in higher plants, Phytochemistry 1992, 31, 2695; (b) H. Marcinek, W. Weyler, B. Deus-Neumann, M.H. Zenk, Indoxyl-UDPG-glucosyltransferase from Baphicacanthus cusia, Phytochemistry 2000, 53, 201.
20. T. Maugard, E. Enaud, P. Choisy, M.D. Legoy, Identification of an indigo precursor from leaves of Isatis tinctoria (Woad), Phytochemistry 2001, 58, 897.
21. (a) E.J. Corey, M. Ohno, R.B. Mitra, P.A. Vatakencherry, Total synthesis of longifolene, J. Am. Chem. Soc. 1964, 86, 478; (b) E.J. Corey, General methods for the construction of complex molecules, Pure Appl. Chem. 1967, 14, 19; (c) E.J. Corey, Xue-Min Cheng, The Logic of Chemical Synthesis, Wiley, New York, 1989; (d) E.J. Corey, The Logic of Chemical Synthesis, Nobel Lectures Chemistry 1981-1990, World Scientific, Singapore, 1992, p. 686.
22. S. Warren, Designing Organic Syntheses, Wiley, Chichester, 1978.
23. (a) E.J. Corey, W. Todd Wipke, Computer-assisted design of complex organic syntheses, Science 1969, 166, 178; (b) E.J. Corey, Computer-assisted analysis of complex synthetic problems, Q. Rev. 1971, 25, 455; (c) E.J. Corey, A.K. Long, S.D. Rubenstein, Computer-assisted analysis in organic synthesis, Science 1985, 228, 408.
24. (a) R.B. Woodward, Totalsynthese des Chlorophylls, Angew. Chem. 1960, 72, 651; (b) R.B. Woodward, Fundamental studies in the chemistry of macrocyclic systems related to chlorophyll, Ind. Chim. Belg. 1962, 11, 1293.
25. (a) D.H.R. Barton, The invention of reactions useful for the synthesis of specifically fluorinated natural products, Pure Appl. Chem. 1977, 49, 1241; (b) B.M. Trost, Atom economy-A challenge for organic synthesis, Angew. Chem., Int. Ed. Engl. 1995, 34, 259; (c) J.F. Hartwig, Raising the bar for the “Perfect Reaction”, Science 2002, 297, 1653.
26. H.C. Kolb, M.G. Finn, K.B. Sharpless, Click chemistry: Diverse chemical function from a few good reactions, Angew. Chem., Int. Ed. Engl. 2001, 40, 2004.
27. A. Eschenmoser, in Neuorientierung der Chemie-Mode oder mehr? Podiumsdiskussion, Aventis Deutschland, Frankfurt am Main, 2002.
28. G.S. Hammond, Restructuring of chemistry and chemical curricula, Pure Appl. Chem. 1970, 22, 3.
29. A. Eschenmoser, Various comments made on organic synthesis and life sciences, in Chemical Synthesis-Gnosis to Prognosis, Eds.: C. Chatgilialoglu, V. Snieckus, Kluwer Academic Publishers, Dordrecht, 1996.
30. E. Fischer, Synthetical chemistry in its relation to biology, J. Chem. Soc. 1907, 1749.
31. (a) E. Fischer, Bedeutung der stereochemischen Resultate für die Physiologie, Ber. Dtsch. Chem. Ges. 1894, 27, 3228; (b) D.E. Koshland, Jr, The key-lock-theory and the induced-fit-theory, Angew. Chem., Int. Ed. Engl. 1995, 33, 2375.
32. A. Todd, J.W. Cornforth, Robert Robinson, Biographical Memoirs of the Fellows of the Royal Society, 1976, 22, 415.
33. (a) E. Dane, Synthesen in der Reihe der Steroide, Angew. Chem. 1939, 52, 655; (b) G. Singh, Structure of Dane’s adduct, J. Am. Chem. Soc. 1956, 78, 6109; (c) G. Quinkert, M. Del Grosso, A. Bucher, J.W. Bats, G. Dürner, E. Dane’s route to estrone revisited, Tetrahedron Lett. 1991, 32, 3357; (d) G. Quinkert, M. Del Grosso, A. Döring, W. Döring, R.I. Schenkel, M. Bauch, G.T. Dambacher, J.W. Bats, G. Zimmermann, G. Dürner, Total synthesis with a chirogenic
opening move demonstrated on steroids with estrone or 18a-Homoestrone skeleton, Helv. Chim. Acta 1995, 78, 1345.
34. R.B. Woodward, Experiments on the synthesis of estrone, J. Am. Chem. Soc. 1940, 62, 1478.
35. A. Eschenmoser, RBW, Vitamin B12, and the Harvard-ETH Collaboration, in Robert Burns Woodward, Eds.: O.T. Benfey, P.J.T. Morris, Chemical Heritage Foundation, Philadelphia, 2001.
36. G. Quinkert, M.V. Kisakürek, From Molecular Structure Towards Biology, Verlag Helvetica Chimica Acta, Zürich, 2001, (a) p. VII; (b) Section 3.2.1.
37. (a) G. Quinkert, H. Stark, Stereoselective synthesis of enantiomerically pure natural products-estrone as example, Angew. Chem., Int. Ed. Engl. 1983, 22, 637; (b) B. List, J.W. Yang, The organic approach to asymmetric catalysis, Science 2006, 313, 1584.
38. (a) K.B. Sharpless, Searching for new reactivity, Nobel Lecture Chemistry 2001; (b) S.Y. Ko, A.W.M. Lee, S. Masamune, L.A. Reed, III, K.B. Sharpless, F.J. Walker, Total synthesis of the L-Hexoses, Tetrahedron 1990, 46, 245.
39. R. Noyori, Asymmetric catalysis: Science and opportunity, Nobel Lecture Chemistry 2001.
40. E.J. Corey, Catalytic enantioselective Diels-Alder reactions: Methods, mechanistic fundamentals, pathways, and applications, Angew. Chem., Int. Ed. Engl. 2002, 41, 1650.
41. S. Drenkard, J. Ferris, A. Eschenmoser, Chemie von α-Aminonitrilen, Helv. Chim. Acta 1990, 73, 1373.
42. (a) D. Seebach, A.K. Beck, A. Heckel, TADDOL and its derivatives-our dream of universal chiral auxiliaries, in From Molecular Structure Towards Biology, Verlag Helvetica Chimica Acta, Zürich, 2001; (b) K. Narasaka, Chiral Lewis acids in catalytic asymmetric reactions, Synthesis 1990, 1.
43. S.B. Tsogoeva, G. Dürner, M. Bolte, M.W. Göbel, A C2-Chiral Bis(amidinium) catalyst for a Diels-Alder reaction constituting the key step of the Quinkert-Dane estrone synthesis, Eur. J. Org. Chem. 2003, 1661, and earlier papers.
44. Qi-Ying Hu, P.D. Rege, E.J. Corey, Simple, catalytic enantioselective syntheses of estrone and desogestrel, J. Am. Chem. Soc. 2004, 126, 5984.
45. (a) G. Quinkert, Five Decades of Steroid Synthesis, Vorlesungsreihe Schering, Berlin, 1988, Heft 19; (b) G. Quinkert, M. Del Grosso, Progress in the Diels-Alder reaction means progress in steroid synthesis, in Stereoselective Synthesis, Eds.: E. Ottow, K. Schöllkopf, B.G. Schulz, Springer Verlag, Berlin, 1993, S. 109.
46. K. Nicolaou, S.A. Snyder, T. Montagnon, G.E. Vassilikogiannakis, The Diels-Alder reaction in total synthesis, Angew. Chem., Int. Ed. Engl. 2002, 41, 1668.
47. (a) M.B. Groen, F.J. Zeelen, Steroid total synthesis, Recl. Trav. Chim. Pays-Bas 1986, 105, 465; (b) F.J. Zeelen, Steroid total synthesis, Nat. Prod. Rep. 1994, 607.
48. G. Quinkert, W.-D. Weber, U. Schwartz, H. Stark, H. Baier, G. Dürner, Hochselektive Totalsynthese von 19-Nor-Steroiden mit photochemischer Schlüsselreaktion: Racemische Zielverbindungen, Liebigs Ann. Chem. 1981, 2335.
49. G. Quinkert, U. Schwartz, H. Stark, W.-D. Weber, F. Adam, H. Baier, G. Frank, G. Dürner, Asymmetrische Totalsynthese von 19-Nor-Steroiden mit photochemischer Schlüsselreaktion: Enantiomerenreine Zielverbindungen, Liebigs Ann. Chem. 1992, 1999.
50. T.A. Appel, The Cuvier-Geoffroy Debate, Oxford University Press, New York, 1987.
51. M. Ruse, Evolution, in The Oxford Companion to Philosophy, Ed.: T. Honderich, Oxford University Press, Oxford, 1995.
52. J.P. Eckermann, Gespräche mit Goethe in den Letzten Jahren Seines Lebens, C. Michel, H. Grüters (Hrsg.), Deutscher Klassiker Verlag, Frankfurt am Main, 1999.
53. J. Browne, Charles Darwin, Vol. II, A.A. Knopf, New York, 2002.
54. E.A. Carlson, Mendel’s Legacy, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2004.
55. W. Johannsen, Elemente der Exakten Erblichkeitslehre, G. Fischer, Jena, 1909.
56. F.M. Burnet, Evolution made visible, in The Evolution of Living Organisms, Ed.: G.W. Leeper, Melbourne University Press, Melbourne, 1962.
57. M. Eigen, Viren als Modelle der molekularen Evolution, Paul-Ehrlich-Ludwig Darmstädter Award Lecture, Frankfurt am Main, March 14th 1992.
58. (a) M. Eigen, Self-organization of Matter and the Evolution of Biological Macromolecules, Naturwissenschaften 1971, 58, 465; (b) Der Code des Lebens, 3 SAT, 26.04.2006, DVD, ZDF, 2006; (c) M. Eigen, From Strange Simplicity to Complex Familiarity, in preparation.
59. G. Strunk, T. Ederhof, Machines for automated evolution experiments in vitro based on the serial-transfer concept, Biophys. Chem. 1997, 66, 193.
60. E. Mayr, What Evolution is, Weidenfeld & Nicolson, London, 2002.
61. I. Rechenberg, Evolutionsstrategie ‘94, frommann-holzboog, Stuttgart-Bad Cannstatt, 1994.
62. J. Maynard Smith, Concept of protein space, Nature 1979, 280, 445.
63. G. Quinkert, H. Bang, D. Reichert, Variation and selection, Helv. Chim. Acta 1996, 79, 1260.
64. W.B. Provine, Sewall Wright and Evolutionary Biology, The University of Chicago Press, Chicago, 1986.
65. D.L. Hull, History of evolutionary thought, in Encyclopedia of Evolution, Vol. I, Ed.: M. Pagel, Oxford University Press, Oxford, 2002.
66. (a) S.C. Gilbert, J.M. Opitz, R.A. Raff, Resynthesizing evolutionary and developmental biology, Dev. Biol. 1996, 173, 357; (b) J.S. Robert, Embryology, Epigenesis, and Evolution-Taking Development Seriously, Cambridge University Press, Cambridge, 2004; (c) C.R. Woese, A new biology for a new century, Microbiol. Mol. Biol. Rev. 2004, 68, 173; (d) K.M. Weiss, The phenogenetic logic of life, Nat. Rev. Genet. 2005, 6, 36.
67. J.-M. Lehn, Supramolecular Chemistry, VCH, Weinheim, 1995.
68. P. Ehrlich, Partial cell functions, Nobel Lecture Physiology or Medicine 1908.
69. (a) B. Asbell, The Pill, Random House, New York, 1995; L.V. Marks, Sexual Chemistry, Yale University Press, New Haven, 2001; C. Djerassi, This Man’s Pill, Oxford University Press, Oxford, 2001; (b) G. Pincus, Control of contraception by hormonal steroids, Science 1966, 153, 493.
70. (a) A. Brzozowsky, A.C.W. Pike, Z. Dauter, R.E. Hubbard, T. Bonn, O. Engstrom, L. Ohman, G.L. Greene, J.-A. Gustafsson, M. Carlquist, Molecular basis of agonism and antagonism in the oestrogen receptor, Nature 1997, 389, 753; (b) E.-E. Beaulieu, Contragestion and other clinical applications of RU 486, an antiprogesterone at the receptor, Science 1989, 245, 1351.
71. C. Djerassi, L. Miramontes, G. Rosenkranz, F. Sondheimer, Synthesis of 19-Nor-17α-ethynyltestosterone and 19-Nor-17α-methyltestosterone, J. Am. Chem. Soc. 1954, 76, 4092.
72. G. Quinkert, Hans Herloff Inhoffen in His Times, Eur. J. Org. Chem. 2004, 3727.
73. C. Rufer, H. Kosmol, E. Schroder, K. Kiesslich, H. Gibian, Totalsynthese von optisch aktiven 13-Ethyl-gonan-Derivaten, Liebigs Ann. Chem. 1967, 702, 141.
74. (a) I.V. Torgov, Progress in the total synthesis of steroids, Pure Appl. Chem. 1963, 6, 525; (b) C.H. Kuo, D. Taub, N.L. Wendler, Mechanism of the coupling reaction of a vinyl carbinol with a β-Diketone, J. Org. Chem. 1968, 33, 3126.
References I 6 3
75. H. Smith. et al., 13fi-Alkylgona- 85. (a) G. Quinkert, in High-Tech-Das
1,3,5(10)-trienes, 13fi-Alkylgon-4- neue Gesicht der Arzneimitte(forschung,
en-3-ones, and related compounds, /. H.1. Dengler, S . Meuer (Hgb.),
Chem. Soc. (London), 1964,4472. G . Fischer, Stuttgart, 1995;
76. (a) U. Eder, G. Sauer, R. Wiechert, (b) Several authors in: Special Issue of
Neuartige asymmetrische Science on Drug Discovery 2005, 309,
cyclisierung zu optisch aktiven 721-735.
Steroid-CD-Teilstticken, Angew. 86. F. Aftalion, A History ofthe
Chem., Int. Ed. Engl. 1971, 10. 496; International Chemical Industry, 2nd.
(b) Z.G. Hajos, D.R. Parrish, ed., Chemical Heritage Press,
Asymmetric synthesis of bicyclic Philadelphia, 2001.
intermediates of natural product 87. (a) G. Quinkert, D. Reichert, H.-G.
chemistry,/. Org. Chem. 1974, 39, Schaible, B. Cezanne, Final Report of
1615. the BMBF Project No. 0310792,
77. H. Hofmeister, K. Annen, Projekttrager Jiilich, 2000;
H. Laurent, K. Petzoldt, R. Wiechert, (b) G. Quinkert, Kombinatorische
Syntheses of gestodene, Drug Res. Chemie-ein Paradigmenwechsel in der
1986, 36, 781. Chemischen Synthese, Verh. Ges.
78. G. Sauer, U. Eder, G. Haffer, Dtsch. Naturforscher u. Arzte, 120.
G. Neef, R. Wiechert, Synthesis of Vers., Hirzel Verlag, Stuttgart, 1999;
D-Norgestrel, Angew. Chem., Int. Ed. (c) H . 4 . Schaible, Kombinatorische
Engl. 1975, 14, 417. Synthese codierter
79. M.J. van den Heuvel, C.W. van Verbindungsbibliotheken und
Bokhoren, H.P. de Jongh, F.J. Selektion immunsuppressiver
Verbindungen, Dissertation,
Zeelen, A partial synthesis of
University of Frankfurt am Main,
desogestrel based upon
1997.
intramolecular oxidation of an
1Ifi-hydroxy-19-norsteroid.
88. (a) W.W. Busse, R.F. Lemanske,
Recl.
Asthma, N. Engl. /. Med. 2001, 344,
Trav. Chim. Pays-Bas 1988, 107, 331.
350; (b) Several authors in: Nature
80. E.J. Corey, A.X. Huang, A short
1999, B l , 402.
enantioselective total synthesis of the 89. M. Wills-Karp, J. Luyimbazi, X. Xu,
third-generation oral contraceptive B. Schofield, T.Y. Neben, C.L. Karp,
desogestrel, /. Am. Chem. Soc. 1999, D.D. Donaldson, Interleukin-13:
121, 710. central mediator of allergic asthma,
81. B. List, Proline-catalyzed asymmetric Science 1998, 282, 2258.
reactions, Tetrahedron 2002, 58, 5573. 90. V. Prelog, Gedanken nach 118
82. Qi-Ying Hu, P.D. Rege, E.J. Corey, Semestern Chemiestudium, in
Simple, catalytic enantioselective Chemie und Geseflschaft, Ed.:
syntheses of estrone and desogestrel, G . Boche, Wissenschaftl Verlagsges,
1.Am. Chem. Soc. 2004, 126,5984. Stuttgart, 1984, p. 57.
83. (a) H. Laurent, D. Bittler, 91. D. Brohm, S. Metzger, A. Bhargava,
H. Hofmeister, K. Nickisch, 0. Muller, F. Lieb, H. Waldmann,
R. Nickolson, K. Petzoldt, Natural products are biologically
R. Wiechert, Synthesis and activities validated starting points in structural
of anti-aldosterones, J . Steroid space for compound library
Biochem. Mof. Biof.1983, 19, 771; development, Angew. Chem., Int. Ed.
(b) W. Elger, S. Beier, K. Pollow, Engl. 2002, 41, 307.
R. Garfield, S.Q. Shi, A. Hillisch, 92. (a) A. Furka, F. Sebestyen,
Conception and pharmacodynamic M. Asgedom, G. Dibo, General
profile of drospirenone, Steroids 2003, method for rapid synthesis of
68, 891. multicomponent peptide mixtures,
84. R. Wiechert, in Schering 1971- 1993, Int. /. Pep. Protein Res. 1991, 37, 487;
S. 149, Schering AG, Berlin, 2005. (b) K.S. Lam, S.E. Salmon,
I Chemistry and Biology Historical and Philosophical Aspects
64
I -
E.M. Hersh,V.J. Ruby, W.M. 1998, 128, 34984; (c) G. Quinkert,

Kazmierski, R.J. Knapp, A new type Visionen-paradigmenwechsel-
of peptide library for identifying technologieschube, in Chemie-Eine
ligand-binding activity, Nature 1991, reqe lndustrie oder weiterhin
354, 82. Innovationsmotor? Blazek &
93. (a) M.H.J. Ohlmeyer, R.N. Swanson, Bergmann, Frankfurt am Main, 2000.
L.W. Dillard, J.C. Reader, 98. C. Hamon, T. Brandstetter,
G. Asouline, R. Kobayashi, N. Windhab, Pyranosyl-RNA
M. Wigler, W.C. Still, Complex supramolecules containing
synthetic chemical libraries indexed non-hydrogen bonding base-pairs,
with molecular tags, Proc. Natl. Acad. Synlett 1999, (suppl. l),940.
Sci. U.S.A. 1993, 90, 10922; (b) H.P. 99. (a) C. Tanford, J. Reynolds, Nature’s
Nestler, P.A. Bartlett, W.C. Still, A Robots, Oxford University Press,
general method for molecular Oxford, 2001; (b) Th. Creighton,
tagging of encoded combinatorial Proteins Structures and Molecular
chemistry libraries, I. Org. Chem. Properties, 2nd Ed., Freeman, 2002;
1994,59,4723. (c) Proteins at Work, Science 2006,
94. A. Pahl, M. Zhang, K. Torok, 312(S10), 211-230.
H. Kuss, U. Friedrich, Z. Magyar, 100. (a) G.S. Stent, That was the
J. Szekely, I<. Horvath, K. Brune, molecular biology that was, Science
I. Szelenyi, Anti-inflammatory effects 1968, 160, 390; (b) G.S. Stent,
of a cyclosporine receptor binding Introduction: Waiting for the
compound, D-43787,/. Phamacol. paradox, in Phage and the Origins of
Exp. Ther. 2002, 301, 738. Molecular Biology, Eds.: J. Cairns,
95. 1.-M. Lehn, Dynamic combinatorial G.S. Stent, J.D. Watson, Cold Spring
chemistry and virtual combinatorial Harbor Laboratory Press, Cold
libraries, Chem. - Eur. J . 1999, 5, Spring Harbor, 1992, p. 3.
2455. 101 (a) M. Delbriick, A physicist looks at
96. (a) S. Pitsch, S. Wendeborn, B. Jaun, biology, in Phage and the Origins of
A. Eschenmoser, Pranosyl-RNA Molecular Biology, Eds.: J. Cairns,
(p-RNA),Helv. Chim. Acta 1993, 76, G.S. Stent, J.D. Watson, Cold Spring
2161; (b) I. Schlonvogt, S. Pitsch, Harbor Laboratory Press, 1992, p. 9;
C. Lesneur, A. Eschenmoser, B. J a m , (b) A Physicist’s renewed look at
R.M. Wolf, Pyranosyl-RNA (p-RNA): biology-twenty years later, Nobel
Duplex formation by self-pairing, Lecture Medicine, 1969.
Helv. Chim. Acta 1996, 79, 2316; 102. H.G. Khorana, Chemical Biology,
(c) M. Bolli, R. Micura, S. Pitsch, World Scientific, Singapore, 2000.
A. Eschenmoser, Pyranosyl-RNA: 103. F. Hofmeister, Uber Bau und
Further observations on replication, Gruppierung der Eiweisskorper,
Helv. Chim. Acta 1997, 80, 1901; Ergeb. Physiol. 1902, I, 759.
(d) S. Ilin, I. Schlonvogt, M.-0. Ebert, 104. E. Fischer, Uber die Hydrolyse der
B. Jaun, H. Schwalbe, Comparison of Proteinstoffe, Chem. Ztg. 1902, 26,
the N M R spectroscopy solution 939.
structures of pyranosyl-RNA and its 105. L. Pauling, Molecular architecture
Nucleo-b-peptide analogue, and biological reactions, Chem. Eng.
Chembiochem 2002,3,93. News 1946,24,1375.
97. (a) N. Windhab, Final Report of the 106. L. Pauling, M. Delbriick, The nature
BMBF Project No. 0311030, of intermolecular forces operating in
Projekttrager Julich, 2001; biological processes, Science 1940,
(b) C. Miculka, N. Windhab, 92, 77.
G. Quinkert, A. Eschenmoser, Novel 107. M. Bergmann, L. Zervas, Uber ein
substance library and supramolecular allgemeines verfahren der
complexes produced therewith, PCT Peptidsynthese, Ber. Dtsch. Chem.
Int. Appl. WO 97143232. Chem. Abstr. Gei. 1932, 65, 1192
References I 6 5
108. B. Merrifield, Solid phase synthesis, C.G. Kurland, R.W. Risebrough, J.D.
Nobel Lecture Chemistry, 1984. Watson, Unstable ribonucleic acid
109. S.B.H. Kent, Chemical synthesis of revealed by pulse labelling of
peptides and proteins, Annu. Rev. Escherichia Coli, Nature 1961,
Biochem. 1988, 57,957. 170, 581; (c) Walter Gilbert, The RNA
110. P.E. Dawson, S.B.H. Kent, Synthesis World, Nature 1986, 319, 618.
of native proteins by chemical 121. A. Fire, D. Albertson, S.W. Harrison,
ligation, A n n u . Rev. Biochem.. 2000, D.G. Moerman, Production of
69, 923. antisense RNA leads to effective and
111. V. Du Vigneaud, C. Ressler, I.M. specific inhibition of gene expression
Swan, C.W. Roberts, P.G. in C. elegance muscle, Development
Katsoyannis, S. Gardon, The 1991, 113,503.
synthesis of an octapeptide amid with 122. (a) Gregory I. Hannon, John J. Rossi,
the hormonal activity of oxytocin, 1. Unlocking the potential of the
Am. Chem. SOC.1953, 75,4879. human genome with RNA
112. P.C. Zamecnik, Historical aspects of interference, Nature 2004, 431, 371;
protein synthesis, A n n . N.Y. Acad. (b) Chistian P. Petersen, John
Sci. 1979, 325, 269. G. Doench, Alla Grishok, Phillip
113. T. Pederson, 50 years ago protein A. Sharp, The Biology of Short RNAs,
synthesis met molecular biology: the in: 7'he R N A World, 3rd Edition, Eds.:
discoveries of amino-acid activation R.F. Gesteland, T.R. Cech,
and transfer RNA, F A S E B ] . 2005, 19, J.F. Atkins, Cold Spring Harbor
1583. Laboratory Press, Cold Spring
114. P. Zamecnik, The machinery of Harbor, 2006.
protein synthesis, Trends Biol. Sci. 123. CarlR. Woese, A new biology for a
Lett. 1984, 9, 464. new century, Microbiol. Mol. Biol.
115. P.C. Zamecnik, Historical and Rev. 2004, 68, 173.
current aspects of the problem of 124. T.R. Cech, The ribosome is a
protein synthesis, Harvey Lecture, ribozyme, Science 2000, 289, 878.
1959. 125. M. Ibba, D. Soll, The renaissance of
116. M. Hoagland, Toward the Habit of aminoacyl-tRNA synthesis, E M B O
Truth, W.W. Norton & Company, Rep. 2001, 2, 382.
New York, 1990. 126. I?. Schimmel, L.R. De Pouplana,
117. (a) J.D. Watson, Involvement of RNA Footprints of aminoacyl-tRNA
in the synthesis of proteins, Science synthetase are everywhere, Trends
1963, 140, 17; (b) P.B. Moore, T.A. Biol. Sci. ( T I B S )2000, 25, 207.
Steitz, The roles of RNA in the 127. (a) F.H.C. Crick, On the genetic code,
synthesis of protein, in The R N A Nobel Lecture Physiology or
World, Eds.: R.F. Gesteland, T.R. Medicine, 1962; (b) M. Nirenberg,
Cech, J.F. Atkins, 3rd ed., Cold The genetic code, Nobel Lecture
Spring Harbor Laboratory Press, Physiology or Medicine, 1968;
Cold Spring Harbor, 2006. (c) H.G. Khorana, Nucleic acid
118. M. Hoagland, Enter transfer RNA, synthesis in the study of the genetic
Nature 2004, 431,249. code, Nobel Lecture Physiology or
119. (a) F.H.C. Crick, On protein Medicine, 1968.
synthesis, Syrnp. SOC.Exp. Biol. 1958, 128. M.W. Nirenberg, J.H. Matthaei, The
12, 138; (b) F. Crick, W h a t M a d dependence of cell-free protein
Pursuit, Basic Books, New York, 1988. synthesis in E. cali upon naturally
120. (a) S. Brenner, F. Jacob, occurring or synthetic
M. Meselson, An unstable polyribonucleotides, Proc. Natl. Acad.
intermediate carrying information Sci. U.S.A. 1961, 47, 1588.
from genes to ribosomes for protein 129. D.L. Hatfield, In Soon Choi, B.J. Lee,
synthesis, Nature 1961, 190, 576; J.E. lung, Selenocysteine a new
(6) F. Gros, H. Hiatt, W. Gilbert, addition-to the universal genetic code,
66
I I Chemistry and 6io/ogy - Historical and Philc)sophical Aspects
in Transfer RNA in Protein Synthesis, 140. E. Schrodinger, What is Lfe?,

Eds.: D.L. Hatfield, B.J. Lee, R.M. Cambridge University Press,
Pirtle, CRC Press, Boca Raton, 1992. Cambridge, 1944.
130. (a) P. Schimmel, K. Beebe, Genetic 141. J.B.S. Haldane in Philosophy of
code seizes pyrrolysine, Nature Biology, Ed.: M. Ruse, Macmillan
2004, 431, 257; (b) J.F. Atkins, Publishing Comp., New York, 1989.
R. Gesteland, The 22nd amino acid, 142. A. Lazcano in Early Lij on Earth, Ed.:
Science 2002, 296, 1409. S. Bengton, Columbia University
131. (a) L. Wang, P.C. Schultz, Expanding Press, New York, 1994.
the genetic code, Chem. Commun. 143. P. Handler, Biology and the Future
2002, I, 1; (b) J. Xie, P.G. Schultz, An o f M a n , Ed.: P. Handler, Oxford
expanding genetic code, Methods University Press, New York,
2005, 36, 227; (c) L. Wang, P.G. 1970.
Schultz, Expanding the genetic code, 144. (a) S.L. Miller, A production of amino
Angew. Chem., Int. Ed. Engl. 2005, acids under possible primitive earth
44, 34. conditions, Science 1953, I 1 7, 528;
132. (a) R.A. Mehl, J.C. Anderson, S.W. (b) S.L. Miller, L.E. Orgel, The Origins
Santoro, L. Wang, A.B. Martin, D.S. ofLfe on the Earth, Concepts of
King, D.M. Horn, P.G. Schultz, Modern Biology Series, Prentice Hall,
Generation of a bacterium with a 21 Englewood Cliffs, 1974; (c) L.E.
amino acid genetic code, J . Am. Orgel, Molecular replication, Nature
Chem. SOC.2003, 125,935; 1992,358,203.
(b) L. Wang, P.G. Schultz, A general
145. (a) S.A. Benner, A.M. Sismour,
approach for the generation of
Synthetic biology, Nature Reviews
orthogonal tRNAs, Chem. Biol. 2001,
Genetics 2005, 6, 533;
8,883.
(b) R. MeDaniel, R. Weiss, Advances
133. J.C. Anderson, T.J. Magliery, P.G.
in synthetic biology: on the path from
Schultz, Exploring the limits of
codon and anticodon size, Chem. prototypes to applications, Curr.
Biol. 2002, 9, 237. Opin. Biotechnol. 2005, 16, 476.
134. J.C. Anderson, N. Wu, S.W. Santoro, 146. (a) C.A. Hutchinson et al., Global
V. Lakshman, D.S. King, P.G. transposon mutagenesis and
Schultz, An expanded genetic code minimal mycoplasma genome,
with a functional quadruplet codon, Science 1999, 286, 2165; (b) G. Posfai
Proc. Natl. Acad. Sci. U.S.A. 2004, et al., Emergent Properties of
101,7566. Reduced-Genome Escherichia coli,
135. S. Tonegawa, Somatic generation of Science 2006, 312, 1044.
antibody diversity, Nature 1983, 147. H.O. Smith et al., Generating a
302, 575. synthetic genome by whole genome
136. P.G. Schultz, Bringing biological assembly: 4x174 bacteriophage from
solutions to chemical problems, Proc. synthetic oligonucleotides, Proc. Natl.
Natl. Acad. Sci. U.S.A. 1998, 95, Acad. Sci. 2003, 100, 15440.
14590. 148. (a) E.V. Koonin, How many genes
137. E. Keinan (Ed.),Catalytic Antibodies, can make a cell: the minimal-gene-set
Wiley-VCH, Weinheim, 2005. concept, Annu. Rev. Genomics Hum.
138. Robert Edwards, P. Steptoe, Matter of Genet. 2000, I, 99; (b) P.L. Luisi,
Lfe, W. Morrow & Company, New T. Oberholzer, A. Lazcano, The
York, 1980. notion of a DNA minimal cell, Helv.
139. (a) J. Maienschein, Whose View of Chim. Acta 2002, 85, 1759;
Lfe? Harvard University Press, (c) F. Arigoni, F. Talabot, M. k i t s c h ,
Cambridge, 2003; (b) R.M. Green, M.D. Edgerton, E. Meldrum, A
The Human Embryo Research Debates, genome-based approach for the
Oxford University Press, Oxford, identification of essential bacterial
2001. genes, Nature Biotech. 1998, 16, 851.
References 167
149. (a) P.L. Luisi, About various 158. D.H.R. Barton, The relevance of
definitions of life, Origins ofL@ and organic chemistry, Chem. Britain
Evolution ofthe Biosphere 1998, 28, 1973, 9, 149.
613; (b) B. Korzeniewski, Cybernetic 159. (a) R. Huisgen, The adventure
formulation of the definition of life, Playground of Mechanisms and
/. theor. Biol. 2001, 209, 275; (c) Y.N. Novel Reactions, in: Profiles,
Zhuravlev, V.A. Avetisov, The Pathways, and Dreams, J.I. Seeman
definition of life in the context of its (Ed.),American Chemical Society,
origin, Biogeosciences 2006, 3, 281; Washington DC, 1994, p. X X I I ;
(d) D.E. Koshland Jr.,The seven (b) P. Schmalz, Interview mit Gilbert
pillars of life, Science 2002, 295, Stork: Organische - Zukunft und
2215. Gegenwart, Nachr. Chew. Tech. Lab.
150. (a) E. Andrianantoandro, S. Basu, 1987, 35, 349.
D.K. Karig, R. Weiss, Synthetic 160. (a) J.D. Dunitz, X-Ray Analysis and the
biology: new engineering rules for an Structure of Organic Molecules, Cornell
emerging discipline, Mol. Systems University Press, Ithaca, 1978, p. 310;
Biol. 2006, 2, msb4100073; (b) P. Fu, (b) J. Fleming, Selected Organic
A perspective of synthetic biology: Syntheses, Wiley, London, 1973,
assembling building blocks for novel p. 125; (c) G. Buchi, R.E. Erickson,
functions, Biotechnol. /. 2006, 1, 690; N. Wakabayashi, Constitution of
(c) J.B. Tucker, R.A. Zilinskas, The Patchouli Alcohol, /. A m . Chem. Soc.
promise and perils of synthetic 1961, 83,927; (d) G. Buchi, W.D.
biology, Trte New Atlantis 2006, McLeod jr., J. Padilla O., Synthesis of
Spring 2006,25. Patchouli Alcohol, 1.Am. Chem. SOL.
151. A registry of standardized modules 1964, 86,4438.
can be found at http://parts.mit.edu. 161. (a) S.M. Weinreb, Synthetic lessons
152. Editorial, Beauties of Synthesis, from quinine, Nature 2001, 21 1, 429;
Nature 2006, 443, 1. (b) P. Rabe, K. Kindler, Uber die
153. K. Weissermel, Energie und Rohstoff partielle Synthese des Chinins, Ber.
entkoppeln, aber wie?, Lecture given dtsch. chem. Ges. 1918, 51, 466;
in Frankfurt am Main, Feb. 22nd, (c)T.S. Kaufman, E.A. Ruveda, The
1980, Hicom GmbH, quest for quinine: Those Who Won
http://www.hicom.de. the Battles and Those Who Won the
154. K. Weissermel, H.-J. Arpe, Industrial War, Angew. Chem. Internat. Ed.
Organic Chemistry, Fourth Edition, 2005, 44, 854; (d) ].I. Seeman, The
Wiley-VCH, Weinheim, 2003. Woodward-Doeringl Rabe- Kindler
155. A.S. Goldman, A.H. Roy, Z. Ahuja, Total Synthesis of Quinine: Setting
W. Schinski, M. Brookhart, Catalytic the Record Straight, Angew. Chem.
Alkane Metathesis by Tandem Internat. Ed. in press; (e) R.B.
Alkane Dehydrogenation-Olefin Woodward, W.E. Doering, The total
Metathesis, Science 2006, 312, synthesis of quinine, J . A m . Chem.
257. Soc. 1994, 66, 849; 1945, 67,860;
156. W.H. Perkin, Jr., Experiments on the (fl G. Stork, D. Niu, A. Fujimoto,
synthesis of the terpenes. Part I., /. E.R. Koft, J.M. Balkovec, J.R. Tata,
Chem. Soc. 1904,85,654. G.R. Dake, The first stereoselective
157. E. Marris, The proofis in the synthesis of quinine, J . Am. Chem.
product, Nature 2006, 442,492. Soc. 2001, 123, 3239.
PART II
Using Small Molecules to Explore Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
ISBN: 978-3-527-31150-7
Chemical Biology
2
Using Natural Products to Unravel Biological Mechanisms
2.1
Using Small Molecules to Unravel Biological Mechanisms
Michael A. Lampson and Tarun M. Kapoor
Outlook
Experimental strategies designed around small molecule inhibitors have been
critical in advancing our understanding of biological mechanisms. This chapter
introduces a series of biological questions and illustrates how they have been
addressed by using small molecules to perturb protein function.
2.1.1
Introduction
Our understanding of biological processes often develops from discovering
or designing ways to perturb the process and observe the effects of the
perturbation. While genetic approaches have been widely used for this
purpose, small molecule inhibitors have several advantages as a means of
perturbing protein function. First, small molecules provide a high degree
of temporal control, generally acting within minutes or even seconds, and
are often reversible, allowing both rapid inhibition and activation of protein
function. The ability to design perturbations on short timescales has proved
particularly valuable in examining dynamic biological processes. Second, dose
can easily be controlled with small molecule inhibitors to allow varying
degrees of inhibition. Third, small molecules can be applied in multiple
biological systems, including different organisms, different cell types, and in
vitro systems. The examples discussed in this chapter illustrate how these
properties of small molecules have been exploited in designing strategies to
dissect biological mechanisms.
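The second advantage, graded inhibition by dose, can be illustrated with a standard dose-response relation. The sketch below assumes a simple Hill-type model; the function name, the IC50 of 100 nM, and the Hill coefficient of 1 are illustrative assumptions and are not values taken from this chapter.

```python
def fraction_inhibited(dose_nM, ic50_nM, hill=1.0):
    """Fraction of target activity inhibited at a given dose.

    Assumes a Hill-type dose-response curve (an illustrative model,
    not one specified in the text).
    """
    if dose_nM < 0 or ic50_nM <= 0:
        raise ValueError("dose must be >= 0 and IC50 must be > 0")
    return dose_nM**hill / (dose_nM**hill + ic50_nM**hill)


# Hypothetical inhibitor with IC50 = 100 nM: raising the dose
# moves the system from partial to nearly complete inhibition.
for dose in (10.0, 100.0, 1000.0):
    print(f"{dose:7.1f} nM -> {fraction_inhibited(dose, 100.0):.2f} inhibited")
```

Under these assumptions a tenfold change in dose on either side of the IC50 spans roughly 10% to 90% inhibition, which is the sense in which dose gives access to varying degrees of inhibition.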
2.1.2
Use of Small Molecules to Link a Protein Target to a Cellular Phenotype
Small molecules with dramatic cellular phenotypes have been used, without
knowledge of their protein target, to provide insight into biological processes.
If the effects of a small molecule are well characterized, then identification
of the protein target immediately provides a wealth of information about its
cellular functions because of the known inhibition phenotypes.
2.1.2.1 Colchicine and Tubulin

Cell division is the process by which cells divide their contents into two daughter
cells, each of which must receive genetic material identical to that of the mother
cell. Each chromosome is replicated before cell division begins, and a complex
and highly regulated process known as mitosis has evolved to ensure that the
replicated chromosomes are equally partitioned between the two daughter
cells. Progress through mitosis is closely linked to chromosome movements
(Fig. 2.1-1(a)). Chromosomes first move to the center of the spindle, and only
after correct positioning of all chromosomes at metaphase (Fig. 2.1-1(a) iii)
do the sister chromosomes split apart at anaphase (Fig. 2.1-1(a) iv) and move
to opposite sides of the cell before the final division into two daughter cells
(Fig. 2.1-1(a) v, vi). All of these coordinated chromosome movements occur
over the course of approximately one hour. The result is that each daughter
cell receives exactly one copy of each replicated chromosome. Failure of this
process leads to loss or gain of whole chromosomes in the daughter cells, a
condition known as aneuploidy, which is strongly associated with developmental
defects and human diseases such as cancer (reviewed in Ref. [1]).
Examination of fixed samples revealed the existence of a fibrous structure,
known as the mitotic spindle, which appears at each mitosis and disappears
after the chromosomes have separated. One of the great challenges in the
study of cell division has been to understand the organization and function
of the mitotic spindle. Use of the small molecule colchicine (Fig. 2.1-1(b)) has
contributed to our understanding of the physical properties of the spindle
fibers and how they might drive chromosome movements, as well as their
molecular components.
The fibers that make up the mitotic spindle are optically anisotropic, or
birefringent, with different indices of refraction in different directions (i.e.,
parallel or perpendicular to the fiber axis). Exploiting this property of the fibers,
Inoue developed a sensitive polarized light microscope that allowed him to
directly observe the spindle in living cells [2]. The small molecule colchicine
(Fig. 2.1-1(b)) was known to disrupt spindle function, but its mechanism of
action was not known. Using the polarized light microscope, Inoue showed that
the birefringence of the spindle fibers disappeared after colchicine treatment,
indicating loss of the fibers [3]. The time course of this effect ranged from a
few minutes to an hour, depending on the concentration. If colchicine was
removed, the fibers recovered. Small molecule inhibitors of protein synthesis
were used to demonstrate that the fibers recovered by assembly from an
available pool of material [4]. Similar results were obtained by changing the
temperature to manipulate the fibers [5]. Together, these findings suggested
that the observed birefringence was due to oriented polymers that were in
equilibrium with free molecules in solution. The equilibrium is shifted toward
the depolymerized state by colchicine or by low temperature, and returns to
its original state after removal of the inhibitor or rewarming.

Fig. 2.1-1 (a) Overview of mitosis. (i) Chromosomes are replicated before
mitosis. (ii) The spindle forms and chromosomes attach to spindle fibers.
(iii) Chromosomes move to the center of the spindle at metaphase. (iv) Sister
chromosomes separate at anaphase and move in opposite directions. (v) The cell
divides as the cleavage furrow forms between the separated chromosomes.
(vi) Two daughter cells form, each with exactly one copy of each chromosome.
(b) Structures of two small molecules that target microtubules: colchicine and taxol.

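The equilibrium between oriented polymer and free molecules described above can be sketched with a minimal model. The code assumes the simplest equilibrium-polymer picture, in which free subunits accumulate up to a critical concentration and any excess assembles into polymer, and it treats colchicine or cooling simply as raising that critical concentration. Both the model and all numerical values are illustrative assumptions, not measurements from the studies cited here.

```python
def polymer_and_free(total_uM, critical_uM):
    """Steady-state split of subunits between polymer and solution.

    Assumption: free subunits accumulate up to a critical
    concentration; any excess is found in polymer.
    """
    if total_uM < 0 or critical_uM < 0:
        raise ValueError("concentrations must be non-negative")
    free = min(total_uM, critical_uM)
    polymer = total_uM - free
    return polymer, free


TOTAL_UM = 20.0  # hypothetical total subunit concentration

# Hypothetical critical concentrations for each condition.
conditions = [
    ("untreated", 5.0),
    ("colchicine or cold", 25.0),
    ("washout or rewarming", 5.0),
]
for label, critical in conditions:
    polymer, free = polymer_and_free(TOTAL_UM, critical)
    print(f"{label:22s} polymer = {polymer:4.1f} uM, free = {free:4.1f} uM")
```

In this sketch the polymer disappears when the critical concentration exceeds the total pool and returns in full when the perturbation is removed, mirroring the reversible loss and recovery of birefringence.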
To demonstrate the potential functional significance of the spindle fiber
dynamics, the same experimental paradigm was used: perturbation of spindle
function combined with observation of the fibers in living cells. Treatment with
low concentrations of colchicine caused the fibers to contract slowly rather
than immediately eliminating the birefringence. As the fibers contracted,
chromosomes were pulled toward one pole of the spindle, which was anchored
at the cell surface [3]. The effect was reversible, as fibers elongated after removal
of colchicine and chromosomes moved away from the pole. This experiment
demonstrated that force could be generated by coupling polymerization and
depolymerization of the fibers to chromosome movement.
In the studies discussed above, colchicine was used to probe spindle function
without knowing its mechanism of action. Tight binding to an intracellular
target was implied by the low concentration (100 nM) required to arrest cells
in mitosis. A strategy was developed to isolate a colchicine-binding protein.
First, colchicine was labeled with ³H at high specific activity and tested with
a variety of cells, tissues, and organelles [6]. High binding activity was observed
with multiple preparations, including the mitotic spindle, cilia, sperm tails,
and brain tissue, that are enriched in intracellular fibers called microtubules, the
same fibers that Inoue observed in the spindle [7, 8]. These results suggested
that the target of colchicine was a subunit of microtubules. Isolated sea urchin
sperm tails were dissolved to extract the colchicine-binding activity, which was
then purified by gel filtration and sedimentation over a sucrose gradient. A
single component with a sedimentation constant of 6S was identified. Using
porcine brain as a starting material, the same component was isolated and
shown to bind guanosine triphosphate (GTP) [9, 10]. Because this component
was believed to be the primary constituent of microtubules, the protein was
named tubulin [11].
The functions of microtubules in cells depend on the activities of
numerous microtubule-associated proteins (MAPs), including regulators of
polymerization dynamics and molecular motors that move along microtubule
tracks. Identification of MAPs was made difficult by the dynamic nature of
microtubule fibers, particularly the tendency to depolymerize under conditions
used to prepare extracts for biochemical purification. The small molecule taxol
(Fig. 2.1-1(b)) was shown to promote microtubule assembly and to stabilize
polymerized microtubules [12, 13], and these properties were exploited to
develop a procedure for purification of MAPs [14]. Taxol was added to brain
or cell extract to polymerize microtubules, which were subsequently isolated
together with bound MAPs. Washing with high salt released MAPs from
the microtubules, which were stabilized with taxol, so that the soluble MAPs
could be separated from the microtubules. One prominent application of this
strategy was the discovery of the founding member of the kinesin family of
microtubule-based motor proteins [15].
The potential of small molecules targeting microtubules as cancer
therapeutics was demonstrated by the vinca alkaloids, such as vincristine
and vinblastine, which have been used in the clinic for 40 years. At high
concentrations (10-100 nM), these compounds depolymerize microtubules,
which eliminates the mitotic spindle. At lower concentrations that are
used clinically, microtubules remain stable but microtubule dynamics are
suppressed. Taxol, which also inhibits microtubule dynamics, is widely used
to treat a variety of cancers (reviewed in Ref. [16]). These drugs induce a mitotic
arrest, which eventually leads to cell death [17] through mechanisms that are
only beginning to be understood [18, 19, 20].
2.1.2.2 Cytochalasin and Actin

While colchicine was a valuable tool for examining cellular processes that relied
on microtubules, electron microscopy revealed another filamentous structure,
termed microfilaments, that was distinct from microtubules. A key step in
understanding the function of microfilaments was to observe a correlation
between the presence of the filaments, their disruption by the small molecule
cytochalasin (Fig. 2.1-2(a)), and the phenotype of cytochalasin treatment in
multiple systems. Although the molecular target of cytochalasin was unknown,
it was shown to inhibit many forms of cellular or intracellular movement, such
as cytoplasmic cleavage in cytokinesis (Fig. 2.1-2(b)), cell motility, membrane
ruffling, and nerve outgrowth [21, 22]. In all of these systems, microfilaments
were observed and were shown to be disrupted by cytochalasin. Cells recovered
after removal of cytochalasin as the microfilaments returned to their normal
state. Furthermore, the actions of cytochalasin and colchicine were generally
mutually exclusive, suggesting that the two types of filamentous structures
could function independently in the cell. Microtubule-dependent processes,
which were inhibited by colchicine, were often insensitive to cytochalasin, while
processes inhibited by cytochalasin were generally insensitive to colchicine
[22]. The conclusion from these correlative data was that microfilaments likely
played a fundamental role in the generation of forces at the cellular level:
“the evidence seems overwhelming that microfilaments are the contractile
machinery of nonmuscle cells” [22]. The action of the myosin motor, which
uses energy from adenosine triphosphate (ATP) hydrolysis to slide filaments
made up of polymers of the protein actin, was known to drive contractility in
muscle, but the relevance of this mechanism to other cellular processes had
not been demonstrated.

Fig. 2.1-2 (a) Structure of cytochalasin B, a small molecule that
targets actin. (b) Force production by the contractile ring in
cytokinesis. A ring of actin filaments forms at the plasma
membrane and contracts to divide the cell in half.

In experiments with actin filaments purified from muscle, cytochalasin was shown to
decrease the viscosity of actin in solution. This experiment, which established
a direct link between cytochalasin and actin, led to two important conclusions.
First, cytochalasin interacts directly with actin. Second, “an interaction of
cytochalasin with actin or actin-like proteins in vivo could account for the
ability of cytochalasin to inhibit various forms of cell motility and contraction”
[23]. As the molecular target of cytochalasin, actin was implicated as a critical
component of the microfilaments involved in cytochalasin-sensitive processes.
2.1.2.3 Small Molecules and Thermal Sensation

Another example of a small molecule with a dramatic cellular phenotype is
capsaicin (Fig. 2.1-3(a)),the natural product that makes chili peppers “hot”.
Its mechanism of action is of particular interest because of the link to
more general pain sensation. A class of neurons that are excited by various
noxious stimuli (chemical, mechanical, or temperature) are also sensitive to
capsaicin [24]. Therefore, capsaicin could be a useful tool in understanding
the basic mechanisms underlying pain sensation. The discovery of a capsaicin
receptor, in particular, would provide a molecular handle on this process.
Fig. 2.1-3 (a) Structures of the small molecules capsaicin and menthol.
(b) Schematic of the VR1 receptor, a nonspecific cation channel. The channel is
gated by capsaicin binding, heat, and protons. (c) Response of the VR1 receptor
channel to capsaicin, temperature, and pH. Adapted from [Ref. 28].
Studies in cultured neurons showed that capsaicin induced a rapid calcium
influx through activation of a cation channel [25, 26]. On the basis of this
knowledge, an expression cloning strategy was devised to identify the receptor
[27]. The underlying logic of this approach was that if nonneuronal cells were
not sensitive to capsaicin simply because they did not express the receptor,
expression of the receptor would lead to a capsaicin-induced increase in
intracellular calcium. A neuronal cDNA library was transfected into human
embryonic kidney (HEK293) cells and screened by calcium imaging in living
cells. The cloned receptor, named VR1 (vanilloid receptor subtype 1), was
shown to be a nonselective cation channel expressed in sensory neurons
(Fig. 2.1-3(b-c)). The sensitivity of VR1 to heat and acid, as well as capsaicin,
indicated its more general physiological importance in detecting noxious
stimuli [28]. At the whole animal level, the role of VR1 in detection of noxious
stimuli has been demonstrated by gene disruption studies in mice [29, 30].
A similar expression cloning strategy was used to identify a receptor involved
in transduction of cold sensation. In this case, the natural product used to
induce calcium influx was menthol (Fig. 2.1-3(a)), which was known to produce
a sensation of cold and even suggested to interact directly with a cold detection
pathway [31]. TRPM8, a transient receptor potential (TRP) cation channel from the
same family as VR1, was cloned and shown to be activated by both menthol
and cold [32, 33]. Thus, small molecules were used to link our perceptions
of both heat and cold to specific receptors in sensory neurons involved in
thermosensation. Identification of these receptors has opened the door to an
understanding of thermosensation at a molecular level [34].
2.1.3
Small Molecules as Probes for Biological Processes
In strategies developed to use small molecules as probes to understand
biological processes, the effects of the small molecule on the biological
system as a whole are often more important than the specific protein target,
which may not even be known. A number of insightful experiments have
been designed around such perturbations by examining how the system
responds to or recovers from the induced state. Because of the temporal
control available with small molecules and the reversibility of inhibition, these
approaches are particularly powerful with dynamic processes. As initially
shown with colchicine, the mitotic spindle is a highly dynamic structure and
small molecules have played an integral role in understanding its function.
2.1.3.1 Progression through Mitosis

It is clear from observing chromosome movements that cell division occurs
in an ordered sequence of events (Fig. 2.1-1(a)). Chromosomes attach to
spindle microtubule fibers and move to the spindle equator before sister
chromosomes separate at anaphase and move to opposite sides of the
cell, followed by division into two daughter cells. Successful chromosome
segregation requires that events occur in this order. If anaphase begins
prematurely, before chromosomes have properly attached to the spindle, the
sister chromosomes will not segregate equally, leading to aneuploid daughter
cells. Mechanisms that determine the timing of anaphase onset are therefore
critical for the success of mitosis.
One hypothesis for how anaphase onset might be regulated was through
feedback control. This term refers to a mechanism for controlling progression
past a certain point in the cell cycle, known as a checkpoint, where the completion
of an event generates a signal that allows the next event to begin. Failure to
complete the event causes a cell-cycle arrest. In the context of progression
through mitosis, some critical process, such as spindle assembly, would be
monitored to generate a signal regulating anaphase onset. Consistent with
this hypothesis, colchicine was known to induce a mitotic arrest by disrupting
the spindle. The effect of colchicine did not prove the existence of a feedback
control mechanism, however, because the mitotic arrest could also be explained
by direct inhibition of another microtubule-dependent process required for
anaphase. A prediction of the feedback-control hypothesis is that mutations
in genes required for feedback signaling would allow cells to bypass the
colchicine-induced arrest and progress through mitosis without completing
spindle assembly.
A genetic screen was designed to identify such mutations in budding yeast,
using benomyl, a small molecule inhibitor of microtubule polymerization that
is effective in yeast, to perturb spindle assembly. Benomyl could either be used
at a low dose or washed out, as the effect is reversible, so that cells would survive
the treatment. Cells were arrested in mitosis with high benomyl (70 µg mL⁻¹),
which prevents spindle formation, but proceeded normally through mitosis
after removal of benomyl and continued to grow (Fig. 2.1-4(a)) [35]. Alternatively,
spindle assembly was slowed with low benomyl (15 µg mL⁻¹), and anaphase
onset was delayed to allow completion of spindle assembly, but cells continued
to grow [36]. In both cases, massive chromosome missegregation and cell
death were expected if cells entered anaphase prematurely in the presence of
benomyl with incomplete or nonexistent spindles. The difference in survival
between cells with functional and defective feedback control was used to select
mutations in genes required for feedback control [35, 36]. After creating random
genetic mutations, cells that failed to grow after benomyl treatment were
selected (Fig. 2.1-4(b)). As in Inoue's studies with colchicine, the reversibility
of the small molecule and the ability to achieve partial inhibition by decreasing
the dose were important components of the benomyl-screening strategies.
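The selection step of the screen can be sketched in a few lines of Python. This is only an illustration of the logic, not the published procedure: the colony identifiers and growth records are hypothetical, and the function name checkpoint_candidates is introduced here for the example. A colony is a candidate feedback-control mutant if it grows on the replica plate without benomyl but fails to grow on the benomyl-treated replica.

```python
def checkpoint_candidates(grows_without_benomyl, grows_after_benomyl):
    """Return colonies that grow on the untreated replica plate but fail
    to grow on the benomyl-treated replica plate.

    Both arguments are sets of (hypothetical) colony identifiers.
    """
    # Colonies that never grow are uninformative, so start from the
    # untreated plate and subtract everything that survived benomyl.
    return grows_without_benomyl - grows_after_benomyl


# Hypothetical replica-plate records for five mutagenized colonies.
untreated_plate = {"c1", "c2", "c3", "c4", "c5"}
benomyl_plate = {"c1", "c3", "c4"}

candidates = sorted(checkpoint_candidates(untreated_plate, benomyl_plate))
print(candidates)
```

Starting from the untreated plate matters: the candidate colonies are dead on the treated plate, so they can only be recovered from the replica that never saw benomyl.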
The identification of genetic mutations that abolished the benomyl-induced
mitotic arrest provided evidence for a feedback mechanism that delays
anaphase onset until completion of spindle assembly, now often referred
to as the mitotic spindle checkpoint. The names Mad, for mitotic arrest
deficient, and Bub, for budding uninhibited by benomyl, were used for
the genes identified in these screens. The Mad and Bub genes, which
are well conserved from yeast to mammals, have provided the foundation
for much of our current understanding of the mitotic spindle checkpoint.
Studies in transgenic mice have confirmed the importance of several of these
genes for faithful chromosome segregation in higher eukaryotes, as reduced
expression increases both aneuploidy and cancer susceptibility. In human
tumors, mutations have been reported in Mad1, Mad2, Bub1, and BubR1, a
related vertebrate protein (reviewed in [Ref. 1]). Additionally, human germline
mutations in BubR1 have been linked to mosaic variegated aneuploidy, a
condition associated with high risk of cancer [37].
Fig. 2.1-4 Screening strategy used to identify genes required for feedback control
of anaphase onset in budding yeast [35]. (a) Cells were arrested in mitosis for 20 h
with benomyl, a small molecule that targets tubulin and prevents spindle formation.
After removal of benomyl, wild-type cells form a spindle and proceed normally
through mitosis. Mutant cells fail to arrest and enter anaphase without forming a
spindle, causing chromosome missegregation and eventual cell death.
(b) Cells were mutagenized, and colonies were grown from single cells and then
transferred to create two replicate plates. One plate (top) was grown without benomyl.
The second plate (bottom) was treated with benomyl. Colonies that failed to grow on the
second plate, indicating defective feedback control, were selected from the first plate to
identify the mutated gene.
Experiments examining the intracellular localization of Mad2 have suggested
a model for how the feedback control mechanism might operate [38, 39]. At
early stages of mitosis, Mad2 localizes to the kinetochore, a structure that forms
on each chromosome and mediates attachment to spindle microtubules. As
cells progress through mitosis, however, Mad2 disappears from kinetochores,
and at anaphase onset none of the kinetochores have detectable Mad2. The
loss of Mad2 from kinetochores correlates with microtubule attachment.
Furthermore, when spindle microtubules are depolymerized with the small
molecule nocodazole, Mad2 localizes to all kinetochores. These findings
suggest a mechanistic basis for the feedback-control model. Mad2 binds
kinetochores that lack microtubule attachment as a signal that mitosis
is not complete, which prevents anaphase onset. Microtubule binding
displaces Mad2 from kinetochores, so that when all kinetochores have bound
microtubules, anaphase can begin.
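The decision rule in this model is simple enough to write down directly. The following Python sketch is a deliberately minimal abstraction under stated assumptions: each kinetochore is reduced to a single boolean (attached or not), Mad2 is assumed to mark exactly the unattached kinetochores, and the function name anaphase_permitted and the example cells are hypothetical.

```python
def anaphase_permitted(kinetochore_attached):
    """Return True only if no kinetochore carries the Mad2 "wait" signal.

    kinetochore_attached: list of booleans, one per kinetochore
    (True = bound to spindle microtubules).
    """
    # In the model, Mad2 sits on exactly those kinetochores that
    # lack a microtubule attachment.
    mad2_bound = [not attached for attached in kinetochore_attached]
    # A single Mad2-positive kinetochore is enough to block anaphase.
    return not any(mad2_bound)


# Hypothetical cell with four kinetochores under three conditions.
early_mitosis = [True, False, True, False]
fully_attached = [True, True, True, True]
nocodazole_treated = [False, False, False, False]  # no microtubules at all

for label, state in [("early", early_mitosis),
                     ("attached", fully_attached),
                     ("nocodazole", nocodazole_treated)]:
    print(label, anaphase_permitted(state))
```

The sketch captures why one unattached chromosome is sufficient to delay anaphase, and why depolymerizing all microtubules produces a sustained arrest.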
It should be noted that the small molecule benomyl was used in the Mad/Bub
genetic screens not because of its specific protein target but because of the
perturbation of spindle assembly. In principle, the same experiments could
be done by targeting a different component of the spindle. The generality of
the spindle checkpoint has been demonstrated through the use of monastrol,
a small molecule inhibitor of the mitotic kinesin Eg5, which was identified
in a screen for small molecules that arrest cells in mitosis without targeting
tubulin [40]. Because Eg5 is required to separate the spindle poles, monastrol
treatment arrests cells in mitosis with monopolar spindles. In the presence of
monastrol, the checkpoint can be overridden by inhibition of Mad2, through
microinjection of inhibitory antibodies [41]. This finding indicates that the
principle of feedback control applies generally to spindle perturbations through
highly conserved mechanisms.
Inhibitors of Eg5 are currently in development as anticancer drugs because,
like taxol and the vinca alkaloids, they arrest cells in mitosis by activating
the spindle checkpoint. The efficacy of these drugs, as demonstrated by
recent studies, requires a prolonged, checkpoint-dependent mitotic arrest [42,
19]. Drug resistance is conferred by a compromised spindle checkpoint, for
example, through reduced expression of Mad2.
2.1.3.2 Positioning the Cleavage Plane in Cytokinesis

Monastrol, the small molecule inhibitor of Eg5, has been used in several
studies to address questions in the biology of cell division [41, 43, 44]. One
important question is how the position of the cell division (or cleavage)
plane is determined in cytokinesis. The cleavage plane is typically positioned
in the center of the cell so that cellular components are equally divided
between the two daughter cells. Asymmetric divisions do occur, however,
and are particularly important during development, when the location of
the cleavage plane can determine the fate of the daughter cells. Models to
explain the position of the cleavage plane relied on the presence of the bipolar
microtubule array of the mitotic spindle, which would place the division plane
in between the spindle poles.
To test this idea directly, monastrol was used in an experiment designed to
determine if cytokinesis could occur in cells with monopolar spindles [41]. To
allow cells to enter anaphase in the presence of monastrol, inhibitory antibodies
against Mad2 were microinjected to override the mitotic spindle checkpoint.
After entering anaphase, the injected cells successfully completed cytokinesis
(Fig. 2.1-5). This experiment demonstrated that a bipolar microtubule array
is not required for cytokinesis. Careful analysis of microtubule dynamics
during anaphase in the monopolar spindles showed that a population of
microtubules near the chromosomes was stabilized at the location where the cleavage
plane formed. These findings suggest a model in which the position of the
cleavage plane is determined by local regulation of microtubule dynamics,
through association with chromosomes.
Fig. 2.1-5 Assay to examine cytokinesis in the presence of a monopolar spindle [41].
Treatment with monastrol, a small molecule inhibitor of the kinesin Eg5, causes cells to
arrest in mitosis with monopolar spindles due to activation of the spindle
checkpoint. Microinjection of an antibody against the protein Mad2 inactivates the
checkpoint so that cells divide with monopolar spindles.
2.1.3.3 Correcting Errors in Chromosome-Spindle Attachments

Accurate chromosome segregation in mitosis requires not only feedback
control of anaphase onset but also regulation of chromosome attachment to
the spindle. Each pair of replicated chromosomes must achieve a particular
orientation in which microtubule fibers attach sister chromosomes to opposite
poles of the spindle. Experiments in yeast showed that inhibition of the
Ipl1/Aurora family of kinases stabilized improper attachments [45, 46], but
how the active kinase corrected attachment errors was not known. Because
attachment errors are rarely observed in the presence of active Aurora kinase,
this problem was particularly difficult to address. Inhibition of Aurora kinase,
through experimental approaches such as genetic mutation, could be used
to accumulate attachment errors, but not to examine error correction by the
active kinase. Reversible small molecule Aurora kinase inhibitors present a
solution to this problem because they can be used to inhibit kinase function
and subsequently removed to activate the kinase. Understanding the function
of Aurora kinases is particularly important because they have been linked to
oncogenesis, and Aurora kinase inhibitors are currently in development as
cancer therapeutics [47, 48].
Fig. 2.1-6 Correction of improper chromosome attachments by activation of
Aurora kinase [44]. (a) Structures of two Aurora kinase inhibitors (AKI), hesperadin
and AKI-1. (b) Assay schematic. (i) Treatment with the Eg5 inhibitor
monastrol arrests cells in mitosis with monopolar spindles, in which sister
chromosomes are often both attached to the single spindle pole. (ii) Hesperadin, an
Aurora kinase inhibitor, is added as monastrol is removed. As the spindle
bipolarizes with Aurora kinase inhibited, attachment errors fail to correct so that
some sister chromosomes are still attached to the same pole of the bipolar spindle.
(iii) Removal of hesperadin activates Aurora kinase. Incorrect attachments are
destabilized by disassembling the microtubule fibers, pulling the
chromosomes to the pole, while correct attachments are stable. (iv) Chromosomes
move from the pole to the center of the spindle as correct attachments form.
(c) Spindles were fixed after bipolarization either in the absence (i) or in the presence
(ii) of an Aurora kinase inhibitor. Chromosomes are shown in blue and
microtubule fibers in green. The arrows indicate sister chromosomes that are both
attached to the same spindle pole. Projections of multiple image planes are
shown, with optical sections of boxed regions (1 and 2) to highlight attachment
errors. Scale bar 5 µm. (d) After removal of hesperadin, GFP tubulin (top) and
chromosomes (bottom) were imaged live by three-dimensional confocal fluorescence
microscopy and differential interference contrast (DIC), respectively. The arrow and
arrowhead show two chromosomes that move to the spindle pole (marked by a circle
in DIC images) as the associated kinetochore-microtubule fibers shorten, and
then move to the center of the spindle. Time (min:s) after removal of hesperadin. Scale
bar 5 µm. (With permission from Lampson et al. Nat. Cell Biol. 2004, Ref. 44.)
Several issues had to be resolved to devise a strategy for determining
how attachment errors are corrected. First, kinase inhibition
should be temporally controlled to experimentally isolate the error correction
process, as Aurora kinases have been implicated in multiple mitotic processes.
Second, error correction likely involves some regulation of the dynamics of the
microtubule fibers that attach chromosomes to the spindle. These dynamics
can be analyzed with high temporal and spatial resolution by high-resolution
microscopy in living cells. Finally, the dynamics of individual microtubule
fibers are difficult to analyze if that fiber is obscured by other microtubules in
the spindle. The dynamics can be clearly observed, however, under conditions
in which the improperly attached chromosomes are positioned away from the
spindle body.
All of these issues were addressed through the development of an assay
using several reversible small molecule inhibitors (Fig. 2.1-6) [44]. First,
treatment with the Eg5 inhibitor monastrol arrests cells in mitosis with
monopolar spindles (Fig. 2.1-6(b) i). A particular chromosome attachment
error in which both sisters are attached to the single spindle pole, referred to
as syntelic attachment, is frequent in the monopolar spindles [49]. If monastrol
is removed, the spindle becomes bipolar, all of the accumulated attachment
errors are corrected, and anaphase proceeds normally. An Aurora kinase
inhibitor was added immediately after removal of monastrol to determine
if Aurora kinase activity is required for correction of the attachment errors.
Because the Aurora kinase inhibitor is added only at this point, kinase activity was
unperturbed for all the preceding stages of mitosis. To control for possible
off-target activities of the Aurora kinase inhibitors, the assay was performed
with two structurally unrelated inhibitors (Fig. 2.1-6(a)).
Cells expressing GFP (green fluorescent protein) tubulin were used
to examine spindle bipolarization in the presence of an Aurora kinase
inhibitor (Fig. 2.1-6(b-d)). Both chromosome and microtubule dynamics were
analyzed at high resolution by multimode fluorescence and transmitted light
microscopy. The syntelic attachment errors persisted as the spindle bipolarized,
directly demonstrating that Aurora kinase activity is required for correction
of these errors. Notably, some of the improperly attached microtubule fibers
could be clearly observed, unobstructed by other spindle microtubules, as the
chromosomes attached to these fibers were positioned away from the spindle
body. After spindle bipolarization, the Aurora kinase inhibitor was removed to
examine how the active kinase might correct the syntelic attachment errors.
One hypothesis was that attachment errors would correct by chromosome
release from the attached microtubule fiber [50]. Instead, improperly attached
chromosomes were observed to remain attached to the microtubule
fibers and to be pulled to the spindle pole as the fibers shortened. Properly
attached chromosomes were not affected, suggesting local regulation of
microtubule dynamics by Aurora kinase activity. After disassembly of the
microtubule fibers, the chromosomes moved to their usual position at the
center of the spindle as correct attachment formed.
Several advantages of small molecule inhibitors, particularly in combination
with high-resolution live-cell microscopy, are demonstrated by this assay. In a
highly dynamic process such as mitosis, many events occur on timescales of
minutes or seconds. Ideally, perturbation of protein function and observation
of the effects of the perturbation would be possible on similar timescales.
Manipulation of protein function through the use of reversible small molecule
inhibitors, together with live-cell imaging, makes this possible. In the assay
described here, inhibitors of both the kinesin Eg5 and Aurora kinases were
effectively used as switches to turn enzymes on and off. With this high degree
of temporal control, a mechanism for correcting chromosome attachment
errors could be dissected without perturbing the preceding processes, such as
those involved in spindle assembly.
2.1.3.4 Brefeldin A Principles of Membrane Transport

Our understanding of cell division has benefited greatly from studies with
small molecules, but these tools have also been applied successfully to other
dynamic processes in cell biology. One such process is the transport of lipids
and proteins between distinct membrane-bound compartments, or organelles,
inside the cell. The small molecule Brefeldin A (BFA) was instrumental in
uncovering some of the basic principles of intracellular transport.
A fundamental question in cell biology is how an organelle can maintain
its identity in the presence of constant inward and outward flow of lipids
and proteins. In the secretory pathway, for example, proteins are synthesized
in the endoplasmic reticulum (ER), then transported to the Golgi apparatus
for processing, and finally exit the Golgi in transport intermediates that
fuse with the plasma membrane to release their contents outside the cell
(Fig. 2.1-7(a)). As an indication of the flow of lipids and proteins through
this pathway, bulk ER membrane was estimated to be depleted by transport
to the Golgi with a half-time of 10 min [51]. This observation suggested the
existence of a recycling pathway to return membrane to the ER, but the
first direct demonstration of this recycling pathway came from studies with
Brefeldin A (Fig. 2.1-7(b)). Early studies had shown that BFA blocked transport
of proteins out of the ER and caused disassembly of the Golgi [52, 53].
Careful analysis of BFA-treated cells demonstrated that within minutes of BFA
treatment, resident Golgi proteins redistributed to the ER. The redistribution
was shown both by localization of Golgi proteins and biochemically, as resident
ER glycoproteins were processed by the redistributed Golgi enzymes in the
presence of BFA [54, 55]. After removal of BFA, the Golgi rapidly reformed
and the usual localization of Golgi proteins was reestablished, again within
minutes. These findings provided direct evidence for a Golgi-ER recycling
pathway and highlighted the dynamic nature of membrane transport between
the two organelles.
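The argument that a 10 min half-time implies a return pathway can be made concrete with a short calculation. The Python sketch below asks how much ER membrane would remain if forward transport were never balanced by recycling. It assumes simple first-order (exponential) depletion, an idealization that is not stated in the text, and the function name er_fraction_remaining is introduced here for illustration only.

```python
def er_fraction_remaining(minutes, half_time_min=10.0):
    """Fraction of the starting ER membrane left after `minutes`,
    assuming first-order loss with the given half-time and no
    membrane returning from the Golgi (an idealized assumption).
    """
    # Each elapsed half-time halves the remaining membrane.
    return 0.5 ** (minutes / half_time_min)


for t in (10, 30, 60):
    print(f"{t:>2} min: {er_fraction_remaining(t):.3f} of ER membrane left")
```

Under these assumptions less than 2% of the ER membrane would be left after one hour, which is incompatible with a stable organelle and is why a recycling route back to the ER was inferred.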
Subsequent studies with BFA led to additional insights into some essential
features of membrane traffic from the Golgi. A careful analysis of the timing
of events after BFA treatment showed that a 110-kD peripheral membrane
protein, whose identity was at that point unknown, dissociated from Golgi
membranes as the earliest detectable event (within 30 s) in BFA action and
reassociated after removal of BFA as the Golgi reformed [56]. Other peripheral
membrane proteins did not dissociate but redistributed to the ER instead, as
had been shown for resident Golgi proteins. These findings suggested that the
110-kD protein played a critical role in the regulation of membrane transport
from the Golgi.
The 110-kD protein was subsequently purified and cloned and shown to be
identical to β-COP, a component of the coat protein I (COPI) (or coatomer)
complex, which forms the coat of vesicles budding from the Golgi [57, 58].
This finding, together with the known effects of BFA, led to the hypothesis
that COPI-coated vesicles mediate forward membrane flow from the Golgi.
Inhibition of this process with BFA would allow retrograde flow to dominate,
so that Golgi membranes would be transported back to the ER, as observed.
The hypothesis was tested in a cell-free system in which the budding of
COPI-coated vesicles from Golgi membranes could be reconstituted in vitro
[59]. BFA prevented the assembly of the COPI coat in this system, as predicted.
Together these experiments linked the COPI complex with forward membrane
transport from the Golgi, through the observed effects of BFA on both COPI
coat assembly and the dynamics of ER-Golgi trafficking.
Fig. 2.1-7 (a) Schematic of the secretory pathway. Transport vesicles carry membrane
and soluble material from the ER to the Golgi and from the Golgi to the plasma
membrane, where the soluble contents are released into the extracellular space.
(b) Structure of the small molecule Brefeldin A. (c) Regulation of vesicle budding by the
ARF GTPase. Exchange of GDP for GTP on ARF triggers ARF-GTP binding to Golgi
membranes. After ARF-GTP binding, the coatomer complex assembles on the
membrane and induces budding of a transport vesicle. ARF hydrolyzes GTP after
vesicle budding to release coatomer and ARF-GDP from the membrane.
BFA continued to be instrumental in understanding the regulation of coat
assembly. In a semipermeabilized cell system, GTPγS, a nonhydrolyzable
analog of GTP, was shown to prevent the BFA-induced dissociation of the
110-kD protein (at that point not known to be β-COP) from the Golgi [60].
This finding suggested that the GTP-GDP (guanosine diphosphate) cycle
was involved in the process inhibited by BFA. A small GTP-binding protein,
adenosine diphosphate ribosylation factor (ARF), was a candidate involved in
this mechanism because it was known to associate with the Golgi and had
been implicated in Golgi transport processes [61]. When the sensitivity of this
protein to BFA was examined, BFA was found to inhibit ARF binding to Golgi
membranes, both in cells and in vitro, while GTPγS prevented this inhibition
[62]. These results were consistent with the effects of BFA and GTPγS on
β-COP. Furthermore, ARF was shown to be a subunit of the COPI coat
[63]. Together, these findings suggested that the GTP-binding state of ARF
regulates COPI coat formation. To place the events in an ordered biochemical
process, association of ARF with Golgi membranes, the step inhibited by BFA,
was shown to be required for subsequent binding of β-COP [64].
A more detailed biochemical understanding of the mechanism of BFA action
was provided by the finding that an activity associated with Golgi membranes
catalyzes GDP-GTP exchange on ARF and is inhibited by BFA [65, 66].
The interpretation was that BFA acts by preventing nucleotide exchange on
ARF, which prevents ARF binding to membrane, an event required for coat
assembly and vesicle budding. This result suggested a general model for
membrane transport in which ARF proteins regulate assembly of coated
vesicles through changes in the GTP-GDP binding state and therefore control
vesicular trafficking (Fig. 2.1-7(c)) [67].
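The model can be summarized as a two-state cycle, sketched below in Python. This is a toy abstraction under stated assumptions, not a kinetic description: ARF is reduced to two states, BFA is assumed to block only nucleotide exchange, GTPγS is assumed to block only hydrolysis, and all names (ArfState, exchange_nucleotide, hydrolyze_gtp, coat_after_one_cycle) are hypothetical.

```python
from enum import Enum


class ArfState(Enum):
    GDP_OFF_MEMBRANE = "ARF-GDP, released from membrane"
    GTP_ON_MEMBRANE = "ARF-GTP, bound to Golgi membrane"


def exchange_nucleotide(state, bfa_present):
    """GDP-to-GTP exchange recruits ARF to the membrane; BFA blocks it."""
    if state is ArfState.GDP_OFF_MEMBRANE and not bfa_present:
        return ArfState.GTP_ON_MEMBRANE
    return state


def hydrolyze_gtp(state, nonhydrolyzable_analog):
    """GTP hydrolysis releases ARF; a nonhydrolyzable analog prevents it."""
    if state is ArfState.GTP_ON_MEMBRANE and not nonhydrolyzable_analog:
        return ArfState.GDP_OFF_MEMBRANE
    return state


def coat_after_one_cycle(bfa_present, nonhydrolyzable_analog):
    """Start with membrane-bound ARF, run hydrolysis then exchange, and
    report whether ARF is on the membrane to support coat assembly."""
    state = ArfState.GTP_ON_MEMBRANE
    state = hydrolyze_gtp(state, nonhydrolyzable_analog)
    state = exchange_nucleotide(state, bfa_present)
    return state is ArfState.GTP_ON_MEMBRANE


print("untreated:", coat_after_one_cycle(False, False))
print("BFA:", coat_after_one_cycle(True, False))
print("BFA + GTPgammaS:", coat_after_one_cycle(True, True))
```

Even this crude version reproduces the qualitative pattern described above: BFA alone removes ARF (and hence the coat) from the membrane, while a nonhydrolyzable GTP analog keeps ARF bound despite BFA.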
Much more work has been done with BFA, for example, to understand
its mechanism of action in more detail [68], but the studies discussed here
illustrate many key features of the small molecule approach. Interest in BFA
was initially stimulated by its dramatic phenotype on a biological process: traffic
of proteins through the secretory pathway. Before the underlying mechanism
was understood in molecular detail, the inhibitor was instrumental in a
series of experiments that revealed some of the key principles of membrane
transport. Though BFA was not directly involved in all of the experiments,
interpretation of many of the findings depended on placing the results in the
context of BFA action. These experiments demonstrated the dynamic nature of
ER-Golgi transport and the role of the COPI coat complex in vesicle formation.
Furthermore, the role of the ARF GTPase in coat assembly led to a model for
regulation of vesicular trafficking.
Several properties of BFA as a small molecule were exploited throughout
these experiments. Reversibility and temporal control were used to understand
the dynamic nature of the events and to place them in an ordered process. In
addition, BFA was used in multiple systems, including various cell types and
in vitro, so that insights from biochemical experiments could be interpreted in
the context of a complex cellular process.
2.1.3.5 Catalysis by Ribosomal RNA

Small molecules can be used to address problems at the level of biochemical
reactions as well as larger-scale cellular processes. Puromycin, a small molecule
inhibitor of protein synthesis, has contributed to our understanding of the
catalysis of peptide bond formation. Protein synthesis in a cell takes place on
a large assembly of protein and RNA components called the ribosome. This
structure carries out the complex task of reading the codons of an mRNA
molecule, selecting the appropriate amino acid for each codon, and catalyzing
the formation of a peptide bond between that amino acid and the preceding
one in the polypeptide chain (the peptidyl transferase reaction). It was initially
assumed that ribosomal proteins were responsible for the peptidyl transferase
activity, but experiments in the 1970s suggested that ribosomal RNA might be
directly involved. The discovery of catalytic RNA in the 1980s [69, 70] led to the
hypothesis that ribosomal RNA, rather than protein, might catalyze peptide
bond formation.
An experiment was designed to test this idea on the basis of the logic
that if catalysis is RNA based, it might be possible to remove ribosomal
proteins without loss of peptidyl transferase activity. The assay used to
measure transferase activity had been developed two decades earlier as a model
reaction to study the mechanism of peptide bond formation [71]. In this assay,
both ribosomal substrates, the growing polypeptide chain and the incoming
aminoacyl-tRNA, are replaced with simplified molecules: a tRNA fragment,
CAACCA-formyl-methionine, and the small molecule puromycin (Fig. 2.1-8).
The “fragment reaction” requires only the large (50S) ribosomal subunit,
without small subunits or other factors. Peptidyl transferase activity can be
measured as formation of the product f-Met-puromycin, using 35S-labeled
methionine. Exploiting this model system, catalytic activity was measured
following extraction of ribosomal proteins from the 50S subunit, using a
procedure designed to cause minimal perturbation of RNA structure. Ninety-five
percent of the ribosomal protein could be removed by treatment with
SDS (sodium dodecyl sulfate) and proteinase K, followed by phenol extraction,
while maintaining over 80% activity [72]. In contrast, transferase activity was
rapidly lost upon treatment with ribonuclease. While this result could not
formally exclude the possibility that catalysis was carried out by the remaining
5% of ribosomal proteins, it strongly supported the hypothesis that ribosomal
RNA was responsible for peptidyl transferase activity.
In the fragment reaction, the ability of puromycin to mimic the aminoacyl-tRNA
in the peptidyl transferase reaction was exploited to measure catalytic
activity. Puromycin was subsequently used to design a transition-state analog
for the peptidyl transferase reaction, known as the Yarus inhibitor, in which
it is linked to the oligonucleotide CCdA by a phosphoramide group [73]. In
a complex with the 50S ribosomal subunit, the Yarus inhibitor was used to
define the catalytic site in a high-resolution crystal structure. No protein was
found within 18 Å of this site [74]. This result demonstrated conclusively that
the catalytic activity indeed resides in the ribosomal RNA.
Fig. 2.1-8 (a) Elongation of a polypeptide chain. The amino group of the incoming
aminoacyl-tRNA joins the carbonyl group of the growing polypeptide chain to replace the
peptidyl-tRNA. (b) The small molecule puromycin replaces the aminoacyl-tRNA in
the polypeptide chain and prevents further elongation.
2.1.4
Conclusion
The experiments described in this chapter illustrate how small molecule
inhibitors have been used to design strategies to address fundamental
biological problems. As our understanding of the biology advances, the use
of small molecules should complement genetic and RNAi-based approaches.
The advantages of small molecule inhibitors have been emphasized here, but
there are also significant limitations that should be considered, particularly
in comparison with genetic approaches. For example, genetics can be used to
target any gene for mutation or deletion without direct effects on any other
gene. Discovery of a new small molecule inhibitor, however, is challenging.
Another limitation is the difficulty of demonstrating specificity of small
molecule inhibitors. Taking a kinase inhibitor as an example, testing the
effects on over 500 kinases in the human genome is a substantial undertaking.

Using small molecules in focused assays is one way to address specificity, so
that a narrowly defined biological process is examined and off-target effects are
less likely to be relevant. In combination with this approach, several inhibitors
that target the same protein can be compared. If the inhibitors are chemically
unrelated, they are not expected to have similar off-target activities.
2.1.4.1 Future Directions

Only the availability of inhibitors and the assays that can be designed around
them limit the future use of small molecule inhibitors to address biological
questions. Currently, only a small fraction of the proteome can be targeted
by small molecules. As new inhibitors are identified, small molecule-based
strategies will be applicable to an increasing range of biological problems.
The development of methods to monitor protein function with high temporal
and spatial resolution, particularly in living cells, will also increase the scope
for using small molecules. Recent advances in fluorescence-based probes, for
example, have made it possible to monitor numerous properties of living
cells, including membrane potential, pH, posttranslational modifications,
protease activity, and mediators of intracellular signaling such as Ca2+ and
cyclic adenosine monophosphate (cAMP) [75]. These high-resolution readouts,
with the temporal control afforded by small molecule inhibitors, should be
a powerful combination for examining biological mechanisms in living cells.
Methods have also been developed to measure the enzymatic activities of
single protein molecules in vitro. Investigating the effects of small molecule
inhibitors, both at this level and in a more complex cellular context, should
continue to provide insight into protein function.
References
1. G.J. Kops, B.A. Weaver, D.W. Cleveland, On the road to cancer: aneuploidy and the mitotic checkpoint, Nat. Rev. Cancer 2005, 5, 773-785.
2. S. Inoue, Polarization optical studies of the mitotic spindle. I. The demonstration of spindle fibers in living cells, Chromosoma 1953, 5, 487-500.
3. S. Inoue, The effect of colchicine on the microscopic and submicroscopic structure of the mitotic spindle, Exp. Cell Res. 1952, 2(Suppl.), 305.
4. S. Inoue, H. Sato, Cell motility by labile association of molecules. The nature of mitotic spindle fibers and their role in chromosome movement, J. Gen. Physiol. 1967, 50(Suppl.), 259-292.
5. S. Inoue, Organization and function of the mitotic spindle, in Primitive Motile Systems in Cell Biology, (Eds.: R.D. Allen, K. Kamiya), Academic Press, New York, 1964, 549-598.
6. E.W. Taylor, The Mechanism of Colchicine Inhibition of Mitosis. I. Kinetics of Inhibition and the Binding of H3-Colchicine, J. Cell Biol. 1965, 25(Suppl.), 145-160.
7. G.G. Borisy, E.W. Taylor, The mechanism of action of colchicine. Binding of colchicine-3H to cellular protein, J. Cell Biol. 1967a, 34, 525-533.
8. G.G. Borisy, E.W. Taylor, The mechanism of action of colchicine. Colchicine binding to sea urchin eggs and the mitotic apparatus, J. Cell Biol. 1967b, 34, 535-548.
9. M.L. Shelanski, E.W. Taylor, Isolation of a protein subunit from microtubules, J. Cell Biol. 1967, 34, 549-554.
10. R.C. Weisenberg, G.G. Borisy, E.W. Taylor, The colchicine-binding protein of mammalian brain and its relation to microtubules, Biochemistry 1968, 7, 4466-4479.
11. H. Mohri, Amino-acid composition of “Tubulin” constituting microtubules of sperm flagella, Nature 1968, 217, 1053-1054.
12. P.B. Schiff, J. Fant, S.B. Horwitz, Promotion of microtubule assembly in vitro by taxol, Nature 1979, 277, 665-667.
13. P.B. Schiff, S.B. Horwitz, Taxol stabilizes microtubules in mouse fibroblast cells, Proc. Natl. Acad. Sci. U.S.A. 1980, 77, 1561-1565.
14. R.B. Vallee, A taxol-dependent procedure for the isolation of microtubules and microtubule-associated proteins (MAPs), J. Cell Biol. 1982, 92, 435-442.
15. R.D. Vale, T.S. Reese, M.P. Sheetz, Identification of a novel force-generating protein, kinesin, involved in microtubule-based motility, Cell 1985, 42, 39-50.
16. M.A. Jordan, L. Wilson, Microtubules as a target for anticancer drugs, Nat. Rev. Cancer 2004, 4, 253-265.
17. M.A. Jordan, K. Wendell, S. Gardiner, W.B. Derry, H. Copp, L. Wilson, Mitotic block induced in HeLa cells by low concentrations of paclitaxel (Taxol) results in abnormal mitotic exit and apoptotic cell death, Cancer Res. 1996, 56, 816-825.
18. C.L. Rieder, H. Maiato, Stuck in division or passing through: what happens when cells cannot satisfy the spindle assembly checkpoint, Dev. Cell 2004, 7, 637-651.
19. W. Tao, V.J. South, Y. Zhang, J.P. Davide, L. Farrell, N.E. Kohl, L. Sepp-Lorenzino, R.B. Lobell, Induction of apoptosis by an inhibitor of the mitotic kinesin KSP requires both activation of the spindle assembly checkpoint and mitotic slippage, Cancer Cell 2005, 8, 49-59.
20. B.A. Weaver, D.W. Cleveland, Decoding the links between mitosis, cancer, and chemotherapy: the mitotic checkpoint, adaptation, and cell death, Cancer Cell 2005, 8, 7-12.
21. S.B. Carter, Effects of cytochalasins on mammalian cells, Nature 1967, 213, 261-264.
22. N.K. Wessells, B.S. Spooner, J.F. Ash, M.O. Bradley, M.A. Luduena, E.L. Taylor, J.T. Wrenn, K. Yamada, Microfilaments in cellular and developmental processes, Science 1971, 171, 135-143.
23. J.A. Spudich, S. Lin, Cytochalasin B, its interaction with actin and actomyosin from muscle (cell movement-microfilaments-rabbit striated muscle), Proc. Natl. Acad. Sci. U.S.A. 1972, 69, 442-446.
24. M.J. Caterina, D. Julius, The vanilloid receptor: a molecular gateway to the pain pathway, Annu. Rev. Neurosci. 2001, 24, 487-517.
25. U. Oh, S.W. Hwang, D. Kim, Capsaicin activates a nonselective cation channel in cultured neonatal rat dorsal root ganglion neurons, J. Neurosci. 1996, 16, 1659-1667.
26. J.N. Wood, J. Winter, I.F. James, H.P. Rang, J. Yeats, S. Bevan, Capsaicin-induced ion fluxes in dorsal root ganglion cells in culture, J. Neurosci. 1988, 8, 3208-3220.
27. M.J. Caterina, M.A. Schumacher, M. Tominaga, T.A. Rosen, J.D. Levine, D. Julius, The capsaicin receptor: a heat-activated ion channel in the pain pathway, Nature 1997, 389, 816-824.
28. M. Tominaga, M.J. Caterina, A.B. Malmberg, T.A. Rosen, H. Gilbert, K. Skinner, B.E. Raumann, A.I. Basbaum, D. Julius, The cloned capsaicin receptor integrates multiple pain-producing stimuli, Neuron 1998, 21, 531-543.
29. M.J. Caterina, A. Leffler, A.B. Malmberg, W.J. Martin, J. Trafton, K.R. Petersen-Zeitz, M. Koltzenburg, A.I. Basbaum, D. Julius, Impaired nociception and pain sensation in mice lacking the capsaicin receptor, Science 2000, 288, 306-313.
30. J.B. Davis, J. Gray, M.J. Gunthorpe, J.P. Hatcher, P.T. Davey, P. Overend, M.H. Harries, J. Latcham, C. Clapham, K. Atkinson, S.A. Hughes, K. Rance, E. Grau, A.J. Harper, P.L. Pugh, D.C. Rogers, S. Bingham, A. Randall, S.A. Sheardown, Vanilloid receptor-1 is essential for inflammatory thermal hyperalgesia, Nature 2000, 405, 183-187.
31. H. Hensel, Y. Zotterman, The effect of menthol on the thermoreceptors, Acta Physiol. Scand. 1951, 24, 27-34.
32. D.D. McKemy, W.M. Neuhausser, D. Julius, Identification of a cold receptor reveals a general role for TRP channels in thermosensation, Nature 2002, 416, 52-58.
33. A.M. Peier, A. Moqrich, A.C. Hergarden, A.J. Reeve, D.A. Anderson, G.M. Story, T.J. Earley, I. Dragoni, P. McIntyre, S. Bevan, A. Patapoutian, A TRP channel that senses cold stimuli and menthol, Cell 2002, 108, 705-715.
34. S.E. Jordt, D.D. McKemy, D. Julius, Lessons from peppers and peppermint: the molecular logic of thermosensation, Curr. Opin. Neurobiol. 2003, 13, 487-492.
35. M.A. Hoyt, L. Totis, B.T. Roberts, S. cerevisiae genes required for cell cycle arrest in response to loss of microtubule function, Cell 1991, 66, 507-517.
36. R. Li, A.W. Murray, Feedback control of mitosis in budding yeast, Cell 1991, 66, 519-531.
37. S. Hanks, K. Coleman, S. Reid, A. Plaja, H. Firth, D. Fitzpatrick, A. Kidd, K. Mehes, R. Nash, N. Robin, N. Shannon, J. Tolmie, J. Swansbury, A. Irrthum, J. Douglas, N. Rahman, Constitutional aneuploidy and cancer predisposition caused by biallelic mutations in BUB1B, Nat. Genet. 2004, 36, 1159-1161.
38. R.H. Chen, J.C. Waters, E.D. Salmon, A.W. Murray, Association of spindle assembly checkpoint component XMAD2 with unattached kinetochores, Science 1996, 274, 242-246.
39. J.C. Waters, R.H. Chen, A.W. Murray, E.D. Salmon, Localization of Mad2 to kinetochores depends on microtubule attachment, not tension, J. Cell Biol. 1998, 141, 1181-1191.
40. T.U. Mayer, T.M. Kapoor, S.J. Haggarty, R.W. King, S.L. Schreiber, T.J. Mitchison, Small molecule inhibitor of mitotic spindle bipolarity identified in a phenotype-based screen, Science 1999, 286, 971-974.
41. J.C. Canman, L.A. Cameron, P.S. Maddox, A. Straight, J.S. Tirnauer, T.J. Mitchison, G. Fang, T.M. Kapoor, E.D. Salmon, Determining the position of the cell division plane, Nature 2003, 424, 1074-1078.
42. T. Sudo, M. Nitta, H. Saya, N.T. Ueno, Dependence of paclitaxel sensitivity on a functional spindle assembly checkpoint, Cancer Res. 2004, 64, 2502-2508.
43. A. Khodjakov, L. Copenagle, M.B. Gordon, D.A. Compton, T.M. Kapoor, Minus-end capture of preformed kinetochore fibers contributes to spindle morphogenesis, J. Cell Biol. 2003, 160, 671-683.
44. M.A. Lampson, K. Renduchitala, A. Khodjakov, T.M. Kapoor, Correcting improper chromosome-spindle attachments during cell division, Nat. Cell Biol. 2004, 6, 232-237.
45. S. Biggins, F.F. Severin, N. Bhalla, I. Sassoon, A.A. Hyman, A.W. Murray, The conserved protein kinase Ipl1 regulates microtubule binding to kinetochores in budding yeast, Genes Dev. 1999, 13, 532-544.
46. T.U. Tanaka, N. Rachidi, C. Janke, G. Pereira, M. Galova, E. Schiebel, M.J. Stark, K. Nasmyth, Evidence that the Ipl1-Sli15 (Aurora kinase-INCENP) complex promotes chromosome bi-orientation by altering kinetochore-spindle pole connections, Cell 2002, 108, 317-329.
47. E.A. Harrington, D. Bebbington, J. Moore, R.K. Rasmussen, A.O. Ajose-Adeogun, T. Nakayama, J.A. Graham, C. Demur, T. Hercend, A. Diu-Hercend, M. Su, J.M. Golec, K.M. Miller, VX-680, a potent and selective small-molecule inhibitor of the Aurora kinases, suppresses tumor growth in vivo, Nat. Med. 2004, 10, 262-267.
48. P. Meraldi, R. Honda, E.A. Nigg, Aurora kinases link chromosome segregation and cell division to cancer susceptibility, Curr. Opin. Genet. Dev. 2004, 14, 29-36.
49. T.M. Kapoor, T.U. Mayer, M.L. Coughlin, T.J. Mitchison, Probing spindle assembly mechanisms with monastrol, a small molecule inhibitor of the mitotic kinesin, Eg5, J. Cell Biol. 2000, 150, 975-988.
50. R.B. Nicklas, S.C. Ward, Elements of error correction in mitosis: microtubule capture, release, and tension, J. Cell Biol. 1994, 126, 1241-1253.
51. F.T. Wieland, M.L. Gleason, T.A. Serafini, J.E. Rothman, The rate of bulk flow from the endoplasmic reticulum to the cell surface, Cell 1987, 50, 289-300.
52. T. Fujiwara, K. Oda, S. Yokota, A. Takatsuki, Y. Ikehara, Brefeldin A causes disassembly of the Golgi complex and accumulation of secretory proteins in the endoplasmic reticulum, J. Biol. Chem. 1988, 263, 18545-18552.
53. Y. Misumi, K. Miki, A. Takatsuki, G. Tamura, Y. Ikehara, Novel blockade by brefeldin A of intracellular transport of secretory proteins in cultured rat hepatocytes, J. Biol. Chem. 1986, 261, 11398-11403.
54. R.W. Doms, G. Russ, J.W. Yewdell, Brefeldin A redistributes resident and itinerant Golgi proteins to the endoplasmic reticulum, J. Cell Biol. 1989, 109, 61-72.
55. J. Lippincott-Schwartz, L.C. Yuan, J.S. Bonifacino, R.D. Klausner, Rapid redistribution of Golgi proteins into the ER in cells treated with brefeldin A: evidence for membrane cycling from Golgi to ER, Cell 1989, 56, 801-813.
56. J.G. Donaldson, J. Lippincott-Schwartz, G.S. Bloom, T.E. Kreis, R.D. Klausner, Dissociation of a 110-kD peripheral membrane protein from the Golgi apparatus is an early event in brefeldin A action, J. Cell Biol. 1990, 111, 2295-2306.
57. R. Duden, G. Griffiths, R. Frank, P. Argos, T.E. Kreis, Beta-COP, a 110 kD protein associated with non-clathrin-coated vesicles and the Golgi complex, shows homology to beta-adaptin, Cell 1991, 64, 649-665.
58. T. Serafini, G. Stenbeck, A. Brecht, F. Lottspeich, L. Orci, J.E. Rothman, F.T. Wieland, A coat subunit of Golgi-derived non-clathrin-coated vesicles with homology to the clathrin-coated vesicle coat protein beta-adaptin, Nature 1991b, 349, 215-220.
59. L. Orci, M. Tagaya, M. Amherdt, A. Perrelet, J.G. Donaldson, J. Lippincott-Schwartz, R.D. Klausner, J.E. Rothman, Brefeldin A, a drug that blocks secretion, prevents the assembly of non-clathrin-coated buds on Golgi cisternae, Cell 1991, 64, 1183-1195.
60. J.G. Donaldson, J. Lippincott-Schwartz, R.D. Klausner, Guanine nucleotides modulate the effects of brefeldin A in semipermeable cells: regulation of the association of a 110-kD peripheral membrane protein with the Golgi apparatus, J. Cell Biol. 1991b, 112, 579-588.
61. T. Stearns, M.C. Willingham, D. Botstein, R.A. Kahn, ADP-ribosylation factor is functionally and physically associated with the Golgi complex, Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 1238-1242.
62. J.G. Donaldson, R.A. Kahn, J. Lippincott-Schwartz, R.D. Klausner, Binding of ARF and beta-COP to Golgi membranes: possible regulation by a trimeric G protein, Science 1991a, 254, 1197-1199.
63. T. Serafini, L. Orci, M. Amherdt, M. Brunner, R.A. Kahn, J.E. Rothman, ADP-ribosylation factor is a subunit of the coat of Golgi-derived COP-coated vesicles: a novel role for a GTP-binding protein, Cell 1991a, 67, 239-253.
64. J.G. Donaldson, D. Cassel, R.A. Kahn, R.D. Klausner, ADP-ribosylation factor, a small GTP-binding protein, is required for binding of the coatomer protein beta-COP to Golgi membranes, Proc. Natl. Acad. Sci. U.S.A. 1992a, 89, 6408-6412.
65. J.G. Donaldson, D. Finazzi, R.D. Klausner, Brefeldin A inhibits Golgi membrane-catalysed exchange of guanine nucleotide onto ARF protein, Nature 1992b, 360, 350-352.
66. J.B. Helms, J.E. Rothman, Inhibition by brefeldin A of a Golgi membrane enzyme that catalyses exchange of guanine nucleotide bound to ARF, Nature 1992, 360, 352-354.
67. J.E. Rothman, The protein machinery of vesicle budding and fusion, Protein Sci. 1996, 5, 185-194.
68. C.L. Jackson, Brefeldin A revealing the fundamental principles governing membrane dynamics and protein transport, Subcell. Biochem. 2000, 34, 233-272.
69. T.R. Cech, A.J. Zaug, P.J. Grabowski, In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence, Cell 1981, 27, 487-496.
70. C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace, S. Altman, The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme, Cell 1983, 35, 849-857.
71. R.E. Monro, K.A. Marcker, Ribosome-catalysed reaction of puromycin with a formylmethionine-containing oligonucleotide, J. Mol. Biol. 1967, 25, 347-350.
72. H.F. Noller, V. Hoffarth, L. Zimniak, Unusual resistance of peptidyl transferase to protein extraction procedures, Science 1992, 256, 1416-1419.
73. M. Welch, J. Chastang, M. Yarus, An inhibitor of ribosomal peptidyl transferase using transition-state analogy, Biochemistry 1995, 34, 385-390.
74. P. Nissen, J. Hansen, N. Ban, P.B. Moore, T.A. Steitz, The structural basis of ribosome activity in peptide bond synthesis, Science 2000, 289, 920-930.
75. J. Zhang, R.E. Campbell, A.Y. Ting, R.Y. Tsien, Creating new fluorescent probes for cell biology, Nat. Rev. Mol. Cell Biol. 2002, 3, 906-918.
2.2
Using Natural Products to Unravel Cell Biology

Jonathan D. Gough and Craig M. Crews
Outlook
In recent years, a new discipline has emerged from the interface of chemistry
and biology, known as chemical biology. The unique foundation of this field is
the examination of biological questions through the use of chemical probes. An
example of chemical genetics is the use of biologically active natural products
as “inducible alleles” for elucidating protein function. In this chapter, we
discuss a variety of different natural products and their use in understanding
cell biology.
2.2.1
Introduction
With the sequencing of the human genome, advances in biological research
have grown exponentially. The use of genetic knockouts, RNA interference,
and site-directed mutagenesis to understand the roles of genes and gene
products is now becoming commonplace. Fundamentally, these methods
perturb protein expression at the genetic or transcriptional level. Although
these new tools have significantly improved our understanding of molecular,
cellular, and developmental biology, many questions still remain intractable.
Through the use of chemical genetics, biologically active compounds are now
being used as another means to address difficult biological questions.
Small molecules offer a significant advantage over classical genetic
techniques in that they can serve as “conditional alleles”. For example, a
small molecule that targets a specific protein can be used to “knock out” or
inhibit that protein only at a certain point during the cell cycle or during
an organism’s developmental process. In this approach, small molecules act
as “conditional alleles” that can be used in a temporal manner to induce or
inhibit a specific biological response, thus providing a method to selectively
investigate cell-signaling events within a narrow temporal window. In this
way, chemical genetics has provided the means to answer biological questions
that are difficult to study with standard genetic methods.
2.2.2
Historical Development
Evolution has taught us that biological systems find or create ways to adapt
to exogenous forces or stressors. Natural products are often the result of this
survival mechanism. These often highly potent small molecules encompass
a diverse array of structural variation and biological activities. Historically,
isolated compounds and extracts have been utilized as herbal remedies
or drugs. Initially, pharmaceutical companies utilized natural products as
a source or lead toward new drug candidates. Although most of these
compounds lack the potential for use as drugs, biologists in recent years
have found that natural products are useful for perturbing model cell systems.
As a class of compounds, they offer a unique starting point for investigating
biological systems. Because they are created in a living system, they are often
cell permeable and have specific biological targets. Using structure-activity
relationships, via the analysis of analogs, natural products provide a starting
point for the development of new synthetic biological probes and insight into
their mechanism of action.
2.2.3
General Considerations
It is doubtful that Aspergillus fumigatus evolved to produce the potent
antiangiogenic natural product fumagillin as a means to inhibit endothelial
cell growth. Nevertheless, secondary metabolites from many natural sources
have unexpected biological activities and have proved useful as cellular probes
or even as drug candidates. While many biologically active natural products
are isolated each year, not all have the potential to be effective biological
tools. Natural products are often isolated based on relatively simple bioassays
such as cell growth inhibition. Compounds that block cell growth in
a nonselective manner (e.g., by DNA intercalation, ionophore activity, or
electron-transport disruption) offer little ability to control specific intracellular
signaling processes. Thus, natural products that most likely serve
as ligands for enzymes offer the most potential as chemical genetic
probes.
2.2.4
Applications and Practical Examples
2.2.4.1 HDAC Inhibitors: Histone Deacetylase Inhibitors

The posttranslational modification of histones provides a code for the
correct regulation of gene expression by affecting chromatin structure
and interaction with regulatory factors. Modifications include acetylation,
deacetylation, phosphorylation, methylation, and ubiquitination [1]. Histone
acetyltransferases (HATs) serve to activate gene transcription by acetylating the
ε-amine of lysine residues of histone tails. Conversely, histone deacetylases
(HDACs) serve to deacetylate the lysine residues resulting in chromatin
condensation and subsequent transcriptional silencing [2]. Since the discovery
of the first HDAC inhibitors, trichostatin A (TSA) 1 and trapoxin (TPX) 2, in the
1990s [3], these and other similar inhibitors have provided insight into a diverse
array of cell-signaling events: cell cycle arrest, apoptosis, cell differentiation,
angiogenesis, and metastasis inhibition. The general mechanism of action
for many of these natural products entails an aliphatic chain with a metal
chelating moiety that interferes with zinc coordination in the binding pocket
of their targeted HDACs.
2.2.4.1.1 Trichostatin A
The antifungal natural product TSA, originally isolated from a Streptomyces, was
found to have reversible biological activity at low nanomolar concentrations.
Yoshida and coworkers [4] demonstrated that TSA causes the induction of
Friend leukemia cell differentiation as well as inhibition of the cell cycle of
normal rat fibroblasts in the G1 and G2 phases. This initial work revealed that
at low nanomolar concentrations, TSA induces the accumulation of acetylated
histones by inhibiting HDAC activity within the cell.
TSA has also been shown to induce apoptosis in various tumor cell lines
[5], thereby making HDACs possible targets for cancer treatment. By blocking
HDACs, inhibitors such as TSA affect the level of gene transcription, causing
both the up- and downregulation of many genes (~2% of the genome)
[6]. For example, TSA was found to reduce the expression of cyclin B1, a
key cyclin for the G2-M transition, but also stimulated expression of
p21CIP/WAF, an inhibitor of cyclin-dependent kinase (CDK) and Cdc2. Through
TSA-mediated HDAC inhibition, the G2-M transition is blocked because of
altered transcription of the cell cycle regulators p21CIP/WAF and cyclin B1. This
occurs via the modulation of histone acetylation at these gene promoters [7].
In addition, TSA has proved useful in the elucidation of important nuances
of cell differentiation. Cell cycle inhibitors had shown that inhibition of
proliferation was necessary, but not sufficient, for the differentiation of
neuronal precursor cells into oligodendrocytes [8]. Given the significant level
of chromatin remodeling that accompanies cellular differentiation,
Marin-Husstege and colleagues [9] hypothesized that histone acetylation plays a role
in oligodendrocyte differentiation. Using synchronized primary neonatal rat
cortical progenitors that were induced to differentiate into oligodendrocytes,
the authors showed that there is a temporal window during which histone
deacetylation is correlated with the acquisition of a branched morphology and
myelin gene expression. TSA-treated progenitors were able to exit from the
cell cycle but did not progress into oligodendrocytes. The ability of HDAC
inhibitors to block oligodendrocyte differentiation is cell lineage dependent,
because TSA did not affect the precursor cells' ability to differentiate into
astrocytes. These results suggest that transcriptional repression is a crucial
event during oligodendrocyte lineage progression.
2.2.4.1.2 Trapoxin
The irreversible HDAC inhibitor TPX was first isolated as a fungal metabolite
that induced morphological reversion of v-sis-transformed NIH 3T3 cells
[10]. Using the known structure-activity relationship between other HDAC
inhibitors as a guide, a TPX affinity reagent was synthesized and used to
identify its target protein as an HDAC [11].
TPX was used to elucidate the protein interactions necessary for
HDAC-mediated transcriptional repression via the Mad:Max ternary complex [12].
Previous studies had suggested that Mad:Max transcriptional repression
was mediated by ternary complex formation with another unknown protein.
Biochemical experiments identified the proteins mSin3A or B as the primary
candidates responsible for this negative transcriptional function. Coexpression
of activated or inactivated MAD (a DNA-binding transcription factor) in the
presence of TPX demonstrated that HDAC activity was necessary for ternary
complex formation. Additionally, these and other experiments showed that
the Mad:Max heterocomplexes repress transcription in a mSin3A-associated
HDAC-dependent manner.
2.2.4.1.3 Apicidin and Depudecin

Like TPX, the microbially derived HDAC inhibitor depudecin 4 was also
isolated based on its ability to reverse the transformed cellular phenotype
of tumor cells. This diepoxide-containing natural product induced a flat
phenotype in Ki-ras-transformed NIH 3T3 cells and was further characterized
as an HDAC inhibitor by its ability to induce the accumulation of acetylated
histones [13]. Apicidin (APC) 3, a cyclic tetrapeptide HDAC inhibitor with
structural similarity to TPX, was shown to possess potent antiproliferative
activity against various cancer cell lines [14], and like depudecin, displays
potent in vitro and in vivo antiangiogenic activities [15, 16]. Thus, given the
ability of HDAC inhibitors to arrest cell proliferation and reverse tumor cell
morphology, HDAC inhibitors have generated much attention as a new class
of antitumor drugs.
2.2.4.2 Cyclin-dependent Kinase Inhibitors

Cyclin-dependent kinases (CDKs) play key roles in regulating cell cycle
progression. Throughout the cell cycle, different CDKs are activated and are
directly responsible for driving the cell from one phase to the next. Individual
CDK activity is regulated by a number of cellular processes: cyclin association,
association with cyclin-dependent inhibitors (CDIs), CDK synthesis, proteolysis,
and various posttranslational modifications. Progression through the cell cycle
is controlled by the concentrations of different cyclin proteins. Thus, cyclin
degradation results in the loss of activity from its CDK partner, leading to the
arrest of the cell cycle. The regulation of cell cycle progression is important
for the cell's ability to deal with external stresses. Therefore, CDKs serve a
checkpoint function, in that cellular stress can block entry into the next
phase of the cell cycle through the expression of a member of one of the three
major CDI families, including p21CIP/WAF and p16INK4a [17].
The idea of targeting CDKs represents a completely different strategy
for treating tumor cells: finding small molecules that inhibit specific
molecular targets as opposed to drugs that just kill tumor cells. Functionally,
all CDK inhibitors act by competitive inhibition of ATP binding to a
CDK. Whereas disruption of the CDK-cyclin interaction is an attractive
therapeutic strategy given its requirement for CDK activity, the large
protein-protein-binding surface of this interaction makes it a less-than-ideal
target relative to the small, well-defined ATP-binding pocket of CDK.
Accordingly, several antiproliferative natural products target the ATP-binding
site on CDKs.
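
The consequence of ATP-competitive binding can be illustrated with a minimal sketch, assuming simple Michaelis-Menten kinetics with a single competitive inhibitor. The function names and all numerical values (Vmax, the Km for ATP, and Ki) are hypothetical and chosen only for illustration; they are not taken from the studies cited in this chapter.

```python
def competitive_rate(atp, inhibitor, vmax, km, ki):
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    A competitive inhibitor raises the apparent Km for ATP by the factor
    (1 + [I]/Ki) and leaves Vmax unchanged.
    """
    apparent_km = km * (1.0 + inhibitor / ki)
    return vmax * atp / (apparent_km + atp)


def apparent_ic50(atp, km, ki):
    """Cheng-Prusoff relation for a competitive inhibitor: IC50 = Ki * (1 + [S]/Km)."""
    return ki * (1.0 + atp / km)


# Hypothetical parameters for illustration only (concentrations in micromolar).
VMAX, KM_ATP, KI = 1.0, 50.0, 0.01

for atp in (10.0, 100.0, 1000.0):
    ic50 = apparent_ic50(atp, KM_ATP, KI)
    uninhibited = competitive_rate(atp, 0.0, VMAX, KM_ATP, KI)
    inhibited = competitive_rate(atp, ic50, VMAX, KM_ATP, KI)
    print(f"[ATP] = {atp:7.1f} uM  apparent IC50 = {ic50:.3f} uM  "
          f"rate ratio at IC50 = {inhibited / uninhibited:.2f}")
```

Under these assumptions the apparent IC50 rises with the ATP concentration, which is one reason the potency of an ATP-competitive inhibitor measured in vitro may differ from its potency in cells.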
2.2.4.2.1 Purine Analogs

The natural products olomoucine 7 and roscovitine 8 are relatively selective
kinase inhibitors that bind CDK1, 2, and 5 but have little effect on CDK4 and
6 [18]. These selective CDK inhibitory profiles result in cell cycle arrest in
the G1 and G2 phases. Both inhibitors act in a dose-dependent and reversible
manner, thus allowing temporal control of CDK activity at different stages of
the cell cycle.
CDK inhibition by these potent natural products results in four major cellular
consequences: (a) inhibition of cell proliferation; (b) induction of apoptosis in
mitotic cells; (c) induction of cellular differentiation; and (d) protection from
apoptosis. Several studies have shown that purine derivatives arrest cells in
either G1 or G2 [19-21], primarily due to CDK2 and CDK1 inhibition; however,
an effect on Erk1/2 activity has also been demonstrated [22]. CDK purine
derivative inhibitors also induce apoptosis in mitotic cells when combined
with other drug treatments. For example, roscovitine and olomoucine were
found to synergize with a farnesyltransferase inhibitor [23] to induce apoptosis
of human cancer cell lines. In addition, the combination of the microtubule
stabilizing drug Taxol® with the CDK1 inhibitor purvalanol A 9 results in HeLa
cell apoptosis [24]. Treatment with either Taxol® or purvalanol A alone, or with
the two in the reverse order, was ineffectual, demonstrating an ordered
cooperativity between the two drugs. Likewise, the induction of differentiation
in murine erythroleukemia cells is triggered by the combined sequential
inhibition of CDK2 (with roscovitine) and CDK6 (via p16INK4a), while the
reverse sequence of inhibition was ineffective [20, 25, 26]. Finally, purine analog
inhibitors of CDKs (5-10) can protect cells from apoptosis via a mechanism yet
undefined. Examples of this phenomenon include the prevention of cAMP-induced
apoptosis in rat leukemia cells [27], etoposide-induced apoptosis
in rat fibroblasts [28], and cell death in human immunodeficiency virus
(HIV)-induced syncytia [29].
2.2.4.2.2 Flavopiridol
Flavopiridol (FLV) 11 is a semisynthetic flavonoid derived from rohitukine, an
alkaloid from a plant indigenous to India [30]. FLV can induce cell cycle arrest by three
mechanisms: (a) direct inhibition of CDKs via binding in the ATP-binding site;
(b) inhibition of CDK7/cyclin H, consequently leading to loss of CDK activation
[31]; and (c) decreased levels of cyclin D1, an oncogene that is overexpressed in
many human neoplasias [32]. Initial studies revealed that FLV arrested cells in
G1 or G2 due to CDK1 and CDK2 inhibition [33-35]. In vitro studies, however,
revealed that FLV inhibits all CDKs thus far examined (IC50 100-300 nM)
[35-37].
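
To give a feel for what an IC50 in the 100-300 nM range implies, the following minimal sketch computes the fraction of kinase activity expected to remain at several inhibitor concentrations. It assumes a simple one-site dose-response curve with a Hill coefficient of 1, which is an illustrative assumption and not a model taken from the cited studies; the function name and the chosen concentrations are likewise hypothetical.

```python
def remaining_activity(inhibitor_nm, ic50_nm, hill=1.0):
    """Fraction of enzyme activity left at a given inhibitor concentration.

    Uses the standard dose-response form 1 / (1 + ([I]/IC50)^h).
    The Hill coefficient h = 1 is an assumption made for illustration.
    """
    return 1.0 / (1.0 + (inhibitor_nm / ic50_nm) ** hill)


# The two IC50 values bracket the range quoted in the text; the doses are arbitrary.
for ic50 in (100.0, 300.0):
    for dose in (30.0, 100.0, 300.0, 1000.0):
        fraction = remaining_activity(dose, ic50)
        print(f"IC50 = {ic50:5.0f} nM  [FLV] = {dose:6.0f} nM  "
              f"remaining activity = {fraction:.2f}")
```

In this simplified picture, a threefold difference in IC50 between two CDKs translates into only a modest difference in inhibition at any single dose, consistent with the description of FLV as a broad CDK inhibitor.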
2.2.4.3 Proteasome Inhibitors

Cell homeostasis and proliferation depend on both protein synthesis
and protein degradation. The proteasome serves as the primary
regulator of intracellular proteolysis. Specifically, the proteasome is a 700 kDa,
multicatalytic protease complex composed of two 19S regulatory particles
flanking the 20S proteolytic cylinder [38], itself consisting of 28 subunits
organized into four rings. The proteasome has three major classes of protease
activities: (a) trypsin-like activity; (b) chymotrypsin-like activity; and
(c) peptidyl-glutamyl peptide hydrolyzing (PGPH) activity, or caspase-like activity. Each
protease function appears to act independently, thereby degrading most
proteins into six to eight amino acid peptides. Proteins are targeted for
proteolysis via conjugation to the 76-amino-acid polypeptide ubiquitin (Ub),
catalyzed by a multistep process involving a series of enzymes that activate
the Ub monomer (E1), recognize the protein targeted for degradation (E3),
and transfer Ub monomers to lysine residues on the targeted protein (E2).
The proteasome has been implicated as a key player in a number of
important cellular processes including apoptosis, cell differentiation, MHC
class I antigen presentation, NF-κB activation, tumor suppression, and cell
division. In particular, the prominent role that the proteasome plays in cellular
proliferation has generated much attention toward the use of proteasome
inhibitors as antitumor chemotherapeutic agents. As more and more cellular
functions are linked to the proteasome, the use of proteasome inhibitors will
be increasingly important in the investigation of various signaling interactions.
2.2.4.3.1 Lactacystin
Originally characterized as a microbial metabolite that induced neurite
outgrowth in neuroblastoma cells [39, 40], lactacystin 14 was later found
to be a potent inhibitor of cell proliferation [41]. Using a [3H]lactacystin
analog, Fenteany and coworkers [39] demonstrated that lactacystin and
its related clasto-β-lactone covalently bind the N-terminal threonine of the
20S proteasome subunit. Functionally, lactacystin is a relatively nonspecific
protease inhibitor, also showing significant inhibition of peptidyl peptidase II
and cathepsin A [40]. Despite this cross-inhibitory activity, lactacystin has
been used to investigate the role of the Ub-proteasome pathway in a diverse
array of fields, including Alzheimer’s disease, breast cancer, neurobiology, and
nephrology, to name a few [41-46].
2.2.4.3.2 α,β-Epoxyketones
Selective covalent inhibitors of the proteasome have also been developed.
Epoxomicin and eponemycin are members of the α,β-epoxyketone class
of proteasome inhibitors that were isolated from actinomycete strains and
found to exhibit in vivo antitumor activity against B16 melanoma [47, 48]. Early
structure-activity studies and structural motifs present in similar molecules
suggested that the terminal epoxyketone moiety was an important aspect of
the functional pharmacophore, possibly via covalent modification of its target
protein. Through synthetic chemistry and biochemical affinity techniques, the
natural products and corresponding biotinylated affinity reagents have been
used to identify the 20S proteasome as the molecular target of epoxomicin 12
and eponemycin 13 [38, 49].
X-ray crystallographic analysis demonstrated that the epoxyketone
pharmacophore of epoxomicin forms a covalent adduct as a morpholino ring [50]
with the amino-terminal threonine of the 20S proteasome. Epoxomicin draws its
specificity from the uniqueness of the proteasomal N-terminal threonine;
nonproteasomal proteases lack an N-terminal nucleophilic residue and thus cannot
form a stable covalent morpholino adduct with the epoxomicin epoxyketone
pharmacophore [50].
These potent and specific proteasome inhibitors have been used to answer
questions in a number of biological fields and systems, including inflammation,
cancer biology, and neuroscience. In immune research, chemokines and their
receptors play an important role in host immune surveillance and are important
mediators of HIV pathogenesis and the inflammatory response. Chemokines and
their receptors have also been implicated in hematopoiesis, angiogenesis,
embryonic development, and breast cancer metastasis. Specifically, they regulate
the directional migration and activation of leukocytes during immune and
inflammatory responses. The chemokine receptors CXCR4 and CCR5 have been shown
to act as coreceptors for the entry and infection of HIV-1 and HIV-2. The
proteasome inhibitors lactacystin and epoxomicin have been used to show that
downmodulation mechanisms and chemotaxis mediated by CCR5 and CXCR4 are
dependent upon proteasome activity [51].
2.2.4.3.3 TMC-95A
Recently, more selective noncovalent inhibitors of the proteasome have been
developed. TMC-95A 15 is a potent, reversible, and selective inhibitor of the
chymotrypsin-like, trypsin-like, and caspase-like activities of the 20S
proteasome. By comparison, TMC-95A shows no inhibition of calpain, cathepsin, or
trypsin. This selectivity has led to a great deal of current biological interest
in TMC-95A [50, 52, 53], including X-ray crystallographic analysis showing that
TMC-95A does not covalently bind the yeast proteasome [54].
2.2.4.4 ATPase Inhibitors

Vacuolar ATPases (V-ATPases) are a class of enzymes that are found throughout
eukaryotes. Fundamentally, these multisubunit complexes function as
proton pumps, moving hydrogen ions from one side of a membrane to the
other. In so doing, they alter the pH of the distal compartment. Typically,
V-ATPases perform this function on the membrane of cellular vacuoles and
depend on ATP for the energy required to carry out their function.
Structurally, eukaryotic V-ATPases are composed of 13 different polypeptides
organized into two functional domains: Vo is the transmembrane ion channel
domain and V1 is the ATPase, or ATP-binding, domain. Small molecule V-ATPase
inhibitors are thought to function primarily through binding to and inhibiting
the Vo domain. In recent years, V-ATPases have become important drug targets
because their inhibition leads to highly specific cytotoxic effects [55].
2.2.4.4.1 Bafilomycins and Concanamycins

A series of macrolides, the bafilomycins 17 and concanamycins 16, were isolated
in a screen for secondary microbial metabolites having effects similar to those
of the cardiac glycosides ouabain and digitoxin [56]. Their V-ATPase inhibitory
effects were not recognized until Bowman and colleagues discovered that
bafilomycins inhibit H+ V-ATPases at nanomolar concentrations [57]. Until
then these compounds had exhibited a wide range of biological activities: in
vitro inhibition of P-ATPase, antihelminthic activity against Caenorhabditis
elegans, stimulation of γ-aminobutyric acid release from rat brain
synaptosomes, selective antifungal activity, and inhibition of
concanavalin-A-stimulated T-cell proliferation.
From a functional standpoint, V-ATPases act as regulators of organelle pH
by pumping protons from the cytoplasm into the lumen. Inhibition of this
regulatory effect results in cytotoxicity. However, because these compounds
bind reversibly, they can be used to perturb a given system for the purpose of
understanding the effect of pH change on other cellular functions or protein
interactions. In addition, because binding is reversible, recovery from drug
treatment can also be observed. Examples include inhibition of acidification in
pinocytic vesicles and inhibition of lysosomal acidification and of epidermal
growth factor (EGF) degradation in mammalian cells [55].
2.2.4.5 Angiogenesis Inhibitors

Angiogenesis is the formation of new blood vessels from preexisting
blood vessels and is required for wound healing and reproduction. In
addition to these homeostatic roles for angiogenesis, the formation of
new blood vessels has been found to be required for the metastasis and
growth of tumors. Since Judah Folkman [58] proposed the link between
angiogenesis and tumor growth/metastases, much effort has focused on the
identification and development of antiangiogenic small molecules as antitumor
chemotherapeutic agents.
Angiogenesis is closely regulated through the complex interactions of
endogenous factors that promote and inhibit the process. In general,
angiogenesis proceeds through three steps [59, 60]: degradation of the
basement membrane, invasion or migration of cells through the degraded
matrix, and differentiation into mature blood vessels. For endothelial cell
proliferation to occur, the existing blood vessel cells must degrade the
underlying basement membrane and invade the stroma of the neighboring
tissue. Once the barrier has been broken, cells proliferate and migrate
into the underlying tissue. The cells differentiate and form capillary loops.
Subsequently, cell polarity is established and the formation of the lumen
begins. Small molecules that interrupt the various phases of angiogenesis
have provided insight into the important signaling events that regulate
the various processes involved.
2.2.4.5.1 Curcuminoids
Curcuminoids, a group of natural products originally isolated from the
Indian spice turmeric, have been known for many years to be potent antioxidant
and anti-inflammatory agents. Curcuminoids reduce tissue factor (TF) gene
expression through the inhibition of the AP-1 and NF-κB transcription factors
and thereby block the initiation of angiogenesis [61, 62].
2.2.4.5.2 Fumagillin and TNP-470

Fifteen years ago, an astute observation made during the routine culturing
of endothelial cells led to the identification of a new antiangiogenic natural
product. The natural product fumagillin 18 was isolated from a contaminating
Aspergillus fumigatus Fresenius colony in the Folkman laboratory. Subsequent
derivatization of the parent natural product by Takeda Pharmaceuticals yielded
the drug candidate TNP-470 19, which was 50 times more potent than fumagillin
[63]. Using the structure-activity relationship as a guide, a biotinylated
affinity reagent was synthesized and used to identify methionine aminopeptidase
2 (MetAP-2) as the molecular target of fumagillin and TNP-470 [64]. X-ray
crystal structures of the free and the fumagillin-bound MetAP-2 revealed the
mechanism of action of this potent natural product: a covalent bond between the
reactive spirocyclic epoxide of fumagillin and histidine-231 of MetAP-2 blocks
the active site.
Endothelial cells, unlike fibroblasts, display an impressive sensitivity to
fumagillin and TNP-470. At the molecular level, TNP-470 does
not inhibit early G1 mitogenic events such as cellular protein tyrosyl
phosphorylation or the expression of immediate early genes [65]. However,
TNP-470 was found to induce expression of the CDK inhibitor p21CIP/WAF and
p53 in endothelial cells [66]. Moreover, the functions of both p21CIP/WAF and
p53 were shown to be essential for the endothelial cell cycle G1 arrest induced
by TNP-470, and lack of p21CIP/WAF abrogates the inhibitory activity of TNP-470
on corneal angiogenesis in vivo. Thus, these antiangiogenic compounds were shown
to act through p21CIP/WAF induction to cause G1 cell cycle arrest.
2.2.4.6 Immunosuppressant Natural Products

Using the immunosuppressive natural products cyclosporin A (CsA) 20,
rapamycin 22, and FK 506 21, researchers were able to unravel two key
signal transduction pathways in T lymphocytes (T cells). T cells respond to
an immune stimulus through the binding of an antigen-presenting cell to the
T-cell receptor (TCR). Binding subsequently initiates a cascade of intracellular
signaling events leading to activation and proliferation of the T cells and other
cell types required for an immune response. Importantly, this process induces
the transcription, and thereby production, of a range of effector molecules such
as interleukin 2 (IL-2); IL-2 is secreted and binds to IL-2 receptors on various
cells, including T lymphocytes, and stimulates the cells to progress from G1 to
the S phase of the cell cycle. This sequential chain of events drives the immune
response. Immunosuppressive natural products have proved useful in the
elucidation of several immune cell signal transduction pathways through the
identification of specific target proteins.
2.2.4.6.1 Cyclosporin A and FK 506

CsA is a cyclic undecapeptide that was isolated from the fungi Cylindrocarpon
lucidum Booth and Tolypocladium inflatum Gams in 1970 by the Sandoz
Laboratory. Interestingly, CsA has both high potency and selectivity for
inhibition of T-cell activation with low cytotoxicity. The structurally unrelated
polyketide metabolite FK 506, isolated in 1984 by the Fujisawa Pharmaceutical
Company from the actinomycete Streptomyces tsukubaensis 9996, proved to have
100 times greater immunosuppressive activity than CsA. Although the two
natural products are structurally different and have different potencies, they
exhibit the same phenotypic biological activity: both compounds prevent the
progression from G0 to G1 during T-cell activation.
CsA and FK 506 have proved to be critical tools in elucidating the signaling
events downstream of the TCR. Both were found to block the same step in
Ca2+-dependent signaling pathways. Additionally, these natural products were
found to bind to peptidyl-prolyl cis-trans isomerases, collectively known
as immunophilins. CsA binds cyclophilin [67] and FK 506 binds FKBP 12
[68]. Although it appeared that both natural products functioned through the
same mechanism of calcium-dependent gene expression, oddly neither target
protein alone initiated the release of calcium. For cell cycle inhibition,
both the small molecule and the protein must be present. Using
affinity chromatography with immobilized protein-natural product complexes,
the phosphatase calcineurin was identified as the target of both protein-drug
complexes [69]. In vivo, the protein-ligand pairs form immunosuppressive
complexes that inhibit the calcium-dependent phosphatase activity of
calcineurin.
The T-cell-specific transcription factor NFAT is held in the cytosol through
the presence of an inhibitory phosphorylated residue. Upon TCR-mediated
calcium release, calcineurin dephosphorylates NFAT, which then translocates to
the nucleus. CsA and FK 506 have proved useful in identifying this
pharmaceutically vulnerable step in immune cell signaling [70].
2.2.4.6.2 Rapamycin
The bacterial immunosuppressive agent rapamycin was isolated from Streptomyces
hygroscopicus, originally found in a soil sample from Rapa Nui (Easter
Island) in 1975. Although structurally similar to FK 506, rapamycin demonstrated
markedly different activity. Rapamycin does not affect the progression
from G0 to G1, but rather blocks T-cell progression from G1 to S phase.
As FK 506 and rapamycin share structural similarities, it was not surprising
that rapamycin also bound FKBP 12. However, binding studies revealed that
the FKBP 12-rapamycin complex does not target calcineurin, as does the
FK 506-FKBP 12 complex. Rather, using the FKBP 12-rapamycin complex as an
affinity reagent, the lipid kinases target of rapamycin 1 and 2 (TOR1 and
TOR2) were identified [71]; these proteins possess homology to the mammalian
phosphatidylinositol 3-kinases, which are involved in the regulation
of cell cycle progression in stimulated cells. Studies have shown that growth
factor addition to cells leads to TOR activation and subsequently increased p70
S6 kinase activity [72].
2.2.4.7 Other Examples of Biologically Active Natural Products
2.2.4.7.1 Capsaicin
Among the most commonly used spices throughout the world are hot peppers of the
genus Capsicum, of which capsaicin 23 is the major pungent ingredient. Because
of its analgesic and anti-inflammatory activities, topical application of
capsaicin has been used for the treatment of a variety of neuropathic pain
conditions. Autoradiographic visualization of a tritiated resiniferatoxin probe
in tissues of various species identified the vanilloid receptor (VR) as a
molecular target [73, 74]. Additionally, capsaicin was used as a molecular probe
to isolate the first nociceptive receptor, VR1 [75]. Characterization of VR1
revealed it to be a member of the transient receptor potential (TRP) ion channel
family and a nonselective cation channel activated by capsaicin or elevated
temperatures.
2.2.4.7.2 Parthenolide
Parthenolide 24 is the biologically active natural product in the medicinal
herb feverfew, which has been used for 2000 years to treat fevers, headaches, and
inflammation [76]. Initial studies of the anti-inflammatory activity of
parthenolide showed that it was a potent inhibitor of NF-κB nuclear translocation
as well as IκB phosphorylation. Affinity chromatography experiments using a
biotinylated analog of parthenolide revealed that parthenolide formed a
covalent adduct with IκB kinase β (IKKβ) in a specific and dose-dependent
manner [77]. This specific interaction between IKKβ and parthenolide was
confirmed by mass spectrometric analysis. Parthenolide was shown to
form a covalent adduct with Cys179 of IKKβ, which lies between the two
phosphorylated serines in the kinase activation loop. Moreover, constitutively
activated protein with a Cys179Ala point mutation was found to be insensitive
to 40 µM parthenolide, indicating that parthenolide inhibits IKKβ via Michael
addition by Cys179 in the kinase activation loop [77].
2.2.5
Future Development
Mechanism of action studies of biologically active natural products have
profited greatly from the emerging field of chemical biology as chemists and
biologists have worked more closely together over the last 15 years. Moreover,
these natural products will continue to be of great use as drug development
leads in addition to their use as tools for understanding intracellular
processes.
2.2.6
Conclusions
After a decade out of favor, both natural products and cell-based bioassay
screening are enjoying a renaissance in the pharmaceutical industry.
Natural products still offer an impressive range of chemical diversity and
have a long track record of providing scaffolds for successful drugs. A greater
appreciation of their potential for the identification of novel hit structures
is propelling a new interest in the use of natural product screens in the
pharmaceutical industry. Likewise, cell-based bioassays are regaining some
of their previous acceptance in the drug development process, primarily
because of the success of novel target deconvolution strategies. New proteomic
technologies are largely behind the belief that the pharmaceutical industry has
the ability to identify the targets of compounds identified in cell-based assays.
Obviously, not all biologically active compounds identified in these screens
will be developed into therapeutic agents. However, this renewed interest in
both natural products and cell-based assays will, in turn, offer many new
opportunities for the development of novel cell biological probes, using the
fruits of these screens.
Acknowledgments
The authors would like to acknowledge the financial support of the NIH (grant
GM62120).
References
1. A.U. Khan, S. Krishnamurthy, Histone modifications as key regulators of
transcription, Front. Biosci. 2005, 10, 866-872.
2. M. Grunstein, Histone acetylation in chromatin structure and transcription,
Nature 1997, 389, 349-352.
3. M. Yoshida, S. Horinouchi, T. Beppu, Trichostatin A and trapoxin: novel
chemical probes for the role of histone acetylation in chromatin structure and
function, BioEssays 1995, 17, 423-430.
4. M. Yoshida, M. Kijima, M. Akita, T. Beppu, Potent and specific inhibition of
mammalian histone deacetylase both in vivo and in vitro by trichostatin A,
J. Biol. Chem. 1990, 265, 17174-17179.
5. M.H. Kuo, C.D. Allis, Roles of histone acetyltransferases and deacetylases in
gene regulation, BioEssays 1998, 20, 615-626.
6. M. Yoshida, A. Matsuyama, Y. Komatsu, N. Nishino, From discovery to the
coming generation of histone deacetylase inhibitors, Curr. Med. Chem. 2003, 10,
2351-2358.
7. Y. Sowa, T. Orita, S. Hiranabe-Minamikawa, K. Nakano, T. Mizuno, H. Nomura,
T. Sakai, Histone deacetylase inhibitor activates the p21/WAF1/Cip1 gene
promoter through the Sp1 sites, Ann. N.Y. Acad. Sci. 1999, 886, 195-199.
8. X.M. Tang, J.S. Beesley, J.B. Grinspan, P. Seth, J. Kamholz, F. Cambi, Cell
cycle arrest induced by ectopic expression of p27 is not sufficient to promote
oligodendrocyte differentiation, J. Cell. Biochem. 1999, 76, 270-279.
9. M. Marin-Husstege, M. Muggironi, A. Liu, P. Casaccia-Bonnefil, Histone
deacetylase activity is necessary for oligodendrocyte lineage progression,
J. Neurosci. 2002, 22, 10333-10345.
10. H. Itazaki, K. Nagashima, K. Sugita, H. Yoshida, Y. Kawamura, Y. Yasuda,
K. Matsumoto, K. Ishii, N. Uotani, H. Nakai et al., Isolation and structural
elucidation of new cyclotetrapeptides, trapoxins A and B, having
detransformation activities as antitumor agents, J. Antibiot. (Tokyo) 1990, 43,
1524-1532.
11. J. Taunton, C.A. Hassig, S.L. Schreiber, A mammalian histone deacetylase
related to the yeast transcriptional regulator Rpd3p, Science 1996, 272,
408-411.
12. C.A. Hassig, T.C. Fleischer, A.N. Billin, S.L. Schreiber, D.E. Ayer, Histone
deacetylase activity is required for full transcriptional repression by mSin3A,
Cell 1997, 89, 341-347.
13. K. Sugita, H. Yoshida, M. Matsumoto, S. Matsutani, A novel compound,
depudecin, induces production of transformation to the flat phenotype of NIH3T3
cells transformed by ras-oncogene, Biochem. Biophys. Res. Commun. 1992, 182,
379-387.
14. J.W. Han, S.H. Ahn, S.H. Park, S.Y. Wang, G.U. Bae, D.W. Seo, H.K. Kwon,
S. Hong, H.Y. Lee, Y.W. Lee, H.W. Lee, Apicidin, a histone deacetylase
inhibitor, inhibits proliferation of tumor cells via
induction of p21WAF1/Cip1 and gelsolin, Cancer Res. 2000, 60, 6068-6074.
15. S.H. Kim, S. Ahn, J.W. Han, H.W. Lee, H.Y. Lee, Y.W. Lee, M.R. Kim,
K.W. Kim, W.B. Kim, S. Hong, Apicidin is a histone deacetylase inhibitor with
anti-invasive and anti-angiogenic potentials, Biochem. Biophys. Res. Commun.
2004, 315, 964-970.
16. T. Oikawa, C. Onozawa, M. Inose, M. Sasaki, Depudecin, a microbial
metabolite containing two epoxide groups, exhibits anti-angiogenic activity in
vivo, Biol. Pharm. Bull. 1995, 18, 1305-1307.
17. K. Vermeulen, D.R. Van Bockstaele, Z.N. Berneman, The cell cycle: a review
of regulation, deregulation and therapeutic targets in cancer, Cell Prolif.
2003, 36, 131-149.
18. N. Gray, L. Detivaud, C. Doerig, L. Meijer, ATP-site directed inhibitors of
cyclin-dependent kinases, Curr. Med. Chem. 1999, 6, 859-875.
19. N. Villerbu, A.M. Gaben, G. Redeuilh, J. Mester, Cellular effects of
purvalanol A: a specific inhibitor of cyclin-dependent kinase activities, Int.
J. Cancer 2002, 97, 761-769.
20. R.T. Abraham, M. Acquarone, A. Andersen, A. Asensi, R. Belle, F. Berger,
C. Bergounioux, G. Brunn, C. Buquet-Fagot, D. Fagot et al., Cellular effects of
olomoucine, an inhibitor of cyclin-dependent kinases, Biol. Cell 1995, 83,
105-120.
21. F. Alessi, S. Quarta, M. Savio, F. Riva, L. Rossi, L.A. Stivala,
A.I. Scovassi, L. Meijer, E. Prosperi, The cyclin-dependent kinase inhibitors
olomoucine and roscovitine arrest human fibroblasts in G1 phase by specific
inhibition of CDK2 kinase activity, Exp. Cell Res. 1998, 245, 8-18.
22. M. Knockaert, P. Lenormand, N. Gray, P. Schultz, J. Pouyssegur, L. Meijer,
p42/p44 MAPKs are intracellular targets of the CDK inhibitor purvalanol,
Oncogene 2002, 21, 6413-6424.
23. H. Edamatsu, C.L. Gau, T. Nemoto, L. Guo, F. Tamanoi, Cdk inhibitors,
roscovitine and olomoucine, synergize with farnesyltransferase inhibitor (FTI)
to induce efficient apoptosis of human cancer cell lines, Oncogene 2000, 19,
3059-3068.
24. D.S. O’Connor, N.R. Wall, A.C. Porter, D.C. Altieri, A p34(cdc2) survival
checkpoint in cancer, Cancer Cell 2002, 2, 43-54.
25. I. Matushansky, F. Radparvar, A.I. Skoultchi, Reprogramming leukemic cells
to terminal differentiation by inhibiting specific cyclin-dependent kinases in
G1, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 14317-14322.
26. A. Borgne, R.M. Golsteyn, The role of cyclin-dependent kinases in apoptosis,
Prog. Cell Cycle Res. 2003, 5, 453-459.
27. T. Sandal, C. Stapnes, H. Kleivdal, L. Hedin, S.O. Doskeland, A novel,
extraneuronal role for cyclin-dependent protein kinase 5 (CDK5): modulation of
cAMP-induced apoptosis in rat leukemia cells, J. Biol. Chem. 2002, 277,
20783-20793.
28. S. Adachi, A.J. Obaya, Z. Han, N. Ramos-Desimone, J.H. Wyche, J.M. Sedivy,
c-Myc is necessary for DNA damage-induced apoptosis in the G(2) phase of the
cell cycle, Mol. Cell. Biol. 2001, 21, 4929-4937.
29. M. Castedo, T. Roumier, J. Blanco, K.F. Ferri, J. Barretina, L.A. Tintignac,
K. Andreau, J.L. Perfettini, A. Amendola, R. Nardacci, P. Leduc, D.E. Ingber,
S. Druillennec, B. Roques, S.A. Leibovitch, M. Vilella-Bach, J. Chen,
J.A. Este, N. Modjtahedi, M. Piacentini, G. Kroemer, Sequential involvement of
Cdk1, mTOR and p53 in apoptosis induced by the HIV-1 envelope, EMBO J. 2002,
21, 4070-4080.
30. R.G. Naik, S.L. Kattige, S.V. Bhat, B. Alreja, N.J. Desouza, R.H. Rupp, An
antiinflammatory cum immunomodulatory piperidinylbenzopyranone from Dysoxylum
binectariferum: isolation, structure and total synthesis, Tetrahedron 1988, 44,
2081-2086.
31. S. Mani, C. Wang, K. Wu, R. Francis, R. Pestell, Cyclin-dependent kinase
inhibitors: novel anticancer agents,
Expert Opin. Investig. Drugs 2000, 9, 1849-1870.
32. E.A. Sausville, D. Zaharevitz, R. Gussio, L. Meijer, M. Louarn-Leost,
C. Kunick, R. Schultz, T. Lahusen, D. Headlee, S. Stinson, S.G. Arbuck,
A. Senderowicz, Cyclin-dependent kinases: initial approaches to exploit a novel
therapeutic target, Pharmacol. Ther. 1999, 82, 285-292.
33. G. Kaur, M. Stetler-Stevenson, S. Sebers, P. Worland, H. Sedlacek, C. Myers,
J. Czech, R. Naik, E. Sausville, Growth inhibition with reversible cell cycle
arrest of carcinoma cells by flavone L86-8275, J. Natl. Cancer Inst. 1992, 84,
1736-1740.
34. P.J. Worland, G. Kaur, M. Stetler-Stevenson, S. Sebers, O. Sartor,
E.A. Sausville, Alteration of the phosphorylation state of p34cdc2 kinase by
the flavone L86-8275 in breast carcinoma cells. Correlation with decreased H1
kinase activity, Biochem. Pharmacol. 1993, 46, 1831-1840.
35. M.D. Losiewicz, B.A. Carlson, G. Kaur, E.A. Sausville, P.J. Worland, Potent
inhibition of CDC2 kinase activity by the flavonoid L86-8275, Biochem. Biophys.
Res. Commun. 1994, 201, 589-595.
36. B. Carlson, T. Lahusen, S. Singh, A. Loaiza-Perez, P.J. Worland, R. Pestell,
C. Albanese, E.A. Sausville, A.M. Senderowicz, Down-regulation of cyclin D1 by
transcriptional repression in MCF-7 human breast carcinoma cells induced by
flavopiridol, Cancer Res. 1999, 59, 4634-4641.
37. B.A. Carlson, M.M. Dubay, E.A. Sausville, L. Brizuela, P.J. Worland,
Flavopiridol induces G1 arrest with inhibition of cyclin-dependent kinase
(CDK)2 and CDK4 in human breast carcinoma cells, Cancer Res. 1996, 56,
2973-2978.
38. N. Sin, K. Kim, M. Elofsson, L. Meng, H. Auth, B.H.B. Kwok, C.M. Crews,
Total synthesis of the potent proteasome inhibitor epoxomicin: a useful tool
for understanding proteasome biology, Bioorg. Med. Chem. Lett. 1999, 9,
2283-2288.
39. G. Fenteany, R.F. Standaert, W.S. Lane, S. Choi, E.J. Corey, S.L. Schreiber,
Inhibition of proteasome activities and subunit-specific amino-terminal
threonine modification by lactacystin, Science 1995, 268, 726-731.
40. H. Ostrowska, C. Wojcik, S. Omura, K. Worowski, Lactacystin, a specific
inhibitor of the proteasome, inhibits human platelet lysosomal cathepsin A-like
enzyme, Biochem. Biophys. Res. Commun. 1997, 234, 729-732.
41. S. Omura, H. Takeshima, Lactacystin: a tool for elucidation of proteasome
functions, Tanpakushitsu Kakusan Koso 1996, 41, 327-336.
42. J.Y. Zhang, S.J. Liu, H.L. Li, J.Z. Wang, Microtubule-associated protein tau
is a substrate of ATP/Mg(2+)-dependent proteasome protease system, J. Neural
Transm. 2005, 112, 547-555.
43. T. Tsukinoki, H. Sugiyama, R. Sunami, M. Kobayashi, T. Onoda, Y. Maeshima,
Y. Yamasaki, H. Makino, Mesangial cell Fas ligand: upregulation in human lupus
nephritis and NF-kappaB-mediated expression in cultured human mesangial cells,
Clin. Exp. Nephrol. 2004, 8, 196-205.
44. C. Lorz, P. Justo, A.B. Sanz, J. Egido, A. Ortiz, Role of Bcl-xL in
paracetamol-induced tubular epithelial cell death, Kidney Int. 2005, 67,
592-601.
45. K.L. De Moliner, M.L. Wolfson, N. Perrone Bizzozero, A.M. Adamo,
Growth-associated protein-43 is degraded via the ubiquitin-proteasome system,
J. Neurosci. Res. 2005, 79, 652-660.
46. M.R. Brown, V. Bondada, J.N. Keller, J. Thorpe, J.W. Geddes, Proteasome or
calpain inhibition does not alter cellular tau levels in neuroblastoma cells or
primary neurons, J. Alzheimers Dis. 2005, 7, 15-24.
47. K. Sugawara, M. Hatori, Y. Nishiyama, K. Tomita, H. Kamei, M. Konishi,
T. Oki, Eponemycin, a
new antibiotic active against B16 melanoma. I. Production, isolation, structure
and biological activity, J. Antibiot. 1990, 43, 8-18.
48. M. Hanada, K. Sugawara, K. Kaneta, S. Toda, Y. Nishiyama, K. Tomita,
H. Yamamoto, M. Konishi, T. Oki, Epoxomicin, a new antitumor agent of microbial
origin, J. Antibiot. 1992, 45, 1746-1752.
49. L. Meng, R. Mohan, B.H.B. Kwok, M. Elofsson, N. Sin, C.M. Crews,
Epoxomicin, a potent and selective proteasome inhibitor, exhibits in vivo
anti-inflammatory activity, Proc. Natl. Acad. Sci. U.S.A. 1999, 96,
10403-10408.
50. M. Groll, K.B. Kim, R. Huber, C.M. Crews, Crystal structure of
epoxomicin:20S proteasome reveals molecular basis for selectivity of
α′,β′-epoxyketone proteasome inhibitors, J. Am. Chem. Soc. 2000, 122,
1237-1238.
51. A.Z. Fernandis, R.P. Cherla, R.D. Chernock, R.K. Ganju, CXCR4/CCR5
down-modulation and chemotaxis are regulated by the proteasome pathway,
J. Biol. Chem. 2002, 277, 18111-18117.
52. J. Kohno, Y. Koguchi, M. Nishio, K. Nakao, M. Kuroda, R. Shimizu,
T. Ohnuki, S. Komatsubara, Structures of TMC-95A-D: novel proteasome inhibitors
from Apiospora montagnei Sacc. TC 1093, J. Org. Chem. 2000, 65, 990-995.
53. Y. Koguchi, J. Kohno, M. Nishio, K. Takahashi, T. Okuda, T. Ohnuki,
S. Komatsubara, TMC-95A, B, C, and D, novel proteasome inhibitors produced by
Apiospora montagnei Sacc. TC 1093. Taxonomy, production, isolation, and
biological activities, J. Antibiot. (Tokyo) 2000, 53, 105-109.
54. M. Groll, Y. Koguchi, R. Huber, J. Kohno, Crystal structure of the 20 S
proteasome:TMC-95A complex: a non-covalent proteasome inhibitor, J. Mol. Biol.
2001, 311, 543-548.
55. S. Drose, K. Altendorf, Bafilomycins and concanamycins as inhibitors of
V-ATPases and P-ATPases, J. Exp. Biol. 1997, 200, 1-8.
56. L. Huang, G. Albers-Schonberg, R.L. Monaghan, K. Jakubas, S.S. Pong,
O.D. Hensens, R.W. Burg, D.A. Ostlind, J. Conroy, E.O. Stapley, Discovery,
production and purification of the Na+, K+ activated ATPase inhibitor,
L-681,110 from the fermentation broth of Streptomyces sp. MA-5038, J. Antibiot.
(Tokyo) 1984, 37, 970-975.
57. E.J. Bowman, A. Siebers, K. Altendorf, Bafilomycins: a class of inhibitors
of membrane ATPases from microorganisms, animal cells, and plant cells, Proc.
Natl. Acad. Sci. U.S.A. 1988, 85, 7972-7976.
58. J. Folkman, Tumor angiogenesis, Adv. Cancer Res. 1974, 19, 331-358.
59. S.M. Hyder, G.M. Stancel, Regulation of angiogenic growth factors in the
female reproductive tract by estrogens and progestins, Mol. Endocrinol. 1999,
13, 806-811.
60. S. Liekens, E. De Clercq, J. Neyts, Angiogenesis: regulators and clinical
applications, Biochem. Pharmacol. 2001, 61, 253-270.
61. S. Singh, B.B. Aggarwal, Activation of transcription factor NF-kappa B is
suppressed by curcumin (diferuloylmethane) [corrected], J. Biol. Chem. 1995,
270, 24995-25000.
62. T.S. Huang, M.L. Kuo, J.K. Lin, J.S. Hsieh, A labile hyperphosphorylated
c-Fos protein is induced in mouse fibroblast cells treated with a combination
of phorbol ester and anti-tumor promoter curcumin, Cancer Lett. 1995, 96, 1-7.
63. D. Ingber, T. Fujita, S. Kishimoto, K. Sudo, T. Kanamaru, H. Brem,
J. Folkman, Synthetic analogues of fumagillin that inhibit angiogenesis and
suppress tumour growth, Nature 1990, 348, 555-557.
64. N. Sin, L. Meng, M.Q.W. Wang, J.J. Wen, W.G. Bornmann, C.M. Crews, The
anti-angiogenic agent fumagillin covalently binds and inhibits the methionine
aminopeptidase, MetAP-2, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 6099-6103.
65. H. Koyama, Y. Nishizawa, M. Hosoi, S. Fukumoto, K. Kogawa, A. Shioi,
H. Morii, The fumagillin analogue Tnp-470 inhibits DNA synthesis of vascular
smooth muscle cells stimulated by platelet-derived growth factor and
insulin-like growth factor-I: possible involvement of cyclin-dependent kinase
2, Circ. Res. 1996, 79, 757-764.
66. J.R. Yeh, R. Mohan, C.M. Crews, The antiangiogenic agent TNP-470 requires
p53 and p21CIP/WAF for endothelial cell growth arrest, Proc. Natl. Acad. Sci.
U.S.A. 2000, 97, 12782-12787.
67. R.E. Handschumacher, M.W. Harding, J. Rice, R.J. Drugge, D.W. Speicher,
Cyclophilin: a specific cytosolic binding protein for cyclosporin A, Science
1984, 226, 544-547.
68. G.D. Van Duyne, R.F. Standaert, P.A. Karplus, S.L. Schreiber, J. Clardy,
Atomic structure of FKBP-FK506, an immunophilin-immunosuppressant complex,
Science 1991, 252, 839-842.
69. J. Liu, J.D.J. Farmer, W.S. Lane, J. Friedman, I. Weissman, S.L. Schreiber,
Calcineurin is a common target of cyclophilin-cyclosporin A and FKBP-FK506
complexes, Cell 1991, 66, 807-815.
70. N.A. Clipstone, G.R. Crabtree, Identification of calcineurin as a key
signalling enzyme in T-lymphocyte activation, Nature 1992, 357, 695-697.
73. A. Szallasi, Autoradiographic visualization and pharmacological
characterization of vanilloid (capsaicin) receptors in several species,
including man, Acta Physiol. Scand. Suppl. 1995, 629, 1-68.
74. A. Szallasi, S. Nilsson, T. Farkas-Szallasi, P.M. Blumberg, T. Hokfelt,
J.M. Lundberg, Vanilloid (capsaicin) receptors in the rat: distribution in the
brain, regional differences in the spinal cord, axonal transport to the
periphery, and depletion by systemic vanilloid treatment, Brain Res. 1995, 703,
175-183.
75. S.M. Huang, T. Bisogno, M. Trevisani, A. Al-Hayani, L. De Petrocellis,
F. Fezza, M. Tognetto, T.J. Petros, J.F. Krey, C.J. Chu, J.D. Miller,
S.N. Davies, P. Geppetti, J.M. Walker, V. Di Marzo, An endogenous
capsaicin-like substance with high potency at recombinant and native vanilloid
VR1 receptors, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 8400-8405.
76. S. Heptinstall, D.V. Awang, B.A. Dawson, D. Kindack, D.W. Knight, J. May,
Parthenolide content and bioactivity of feverfew (Tanacetum parthenium (L.)
Schultz-Bip.). Estimation of commercial and authenticated feverfew products,
J. Pharm. Pharmacol. 1992, 44, 391-395.
77. B.H. Kwok, B. Koh, M.I. Ndubuisi,
71. E.J. Brown, M.W. Albers, T.B. Shin, M. Elofsson, C.M. Crews, The
K. Ichikawa, C.T. Keith, W.S. Lane, anti-inflammatory natural product
S.L. Schreiber, A mammalian protein parthenolide from the medicinal herb
targeted by G1-arresting Feverfew directly binds to and inhibits
rapamycin-receptor complex, Nature IkappaB kinase, Chew. B i d . 2001, 8,
1994,369,756-758. 759-766.
72. 1. Mann, Natural products as
immunosuppressive agents, Nat. Prod.
Rep. 2001, 18, 417-430.
Chemical Biology
3
Engineering Control Over Protein Function Using Chemistry
3.1
Revealing Biological Specificity by Engineering Protein- Ligand Interactions
Matthew D. Simon and Kevan M. Shokat
Outlook
Protein function can be altered in a rapid and graded manner through small
molecule ligand binding in both natural systems and through drug design. In
natural systems evolutionary pressure can lead to accumulation of mutations
that influence ligand binding specificity, thereby altering protein function.
Similarly, in the laboratory, mutations that have well defined effects on a
protein’s ligand specificity can provide a functional handle to elucidate the
protein’s biological role. Here we explore examples of mutations, introduced
in the laboratory or found in nature, that cause significant changes to protein
ligand specificity, with an emphasis on the biological and biochemical lessons
learned from these studies. The examples described here illustrate both the
challenges and the power of engineering protein-ligand interactions in order
to elucidate a protein’s biological role.
3.1.1
Introduction
The exquisite specificity observed in biological systems emerges from the
composite specificity of interactions at the molecular level. Understanding the
mapping between molecular interactions and their functional consequences is
the aim of molecular biology. While it is common to characterize biochemical
activities of a protein in vitro, identifying the biological importance of these
activities in a complex environment such as a cell extract, an intact cell, or
even an entire organism, remains a daunting task. Genetic approaches provide
powerful means to investigate these biological activities (e.g., observing the
phenotype that results from a gene disruption). However, protein engineering
can provide complementary information that connects the biochemical
specificity of a protein to its functional role. Here we discuss examples
where protein-ligand interactions can be engineered to provide a specificity
handle that can in turn be used to link a molecular interaction to a biological
result.
In these experiments a protein is mutated to alter its ligand specificity.
The resulting engineered protein-ligand interaction is then used to infer
the role of the unmodified protein in the biological system. The success of
this strategy requires that we specifically engineer protein-ligand interactions.
How feasible is it to alter a protein’s ligand specificity in a well-defined manner?
Are mutations that change the ligand specificity of a protein common or rare?
Are mutations that alter the specificity of small-molecule binding more or
less likely to destroy other functions or properties of the protein such as
its catalytic activity, stability, or cellular localization? From all the potential
mutations at a protein-ligand interface, what strategy do we use to identify the
productive mutations that have a desired effect on protein-ligand interactions?
Molecular evolution in nature provides inspiration to help answer these
questions.
While the mechanistic details accounting for the success of natural molecular
evolution are distinct from the practical details governing protein engineering,
there are similarities that are worth elaborating. In particular, molecular
evolution in nature demonstrates that a small number of mutations is often
sufficient to cause dramatic changes in the ligand-binding properties of
a protein. Similarly, in protein engineering, a single point mutation is
often enough to provide a specificity handle that allows a protein to be
uniquely sensitive or uniquely resistant to a small-molecule ligand. By keeping
engineered changes to the protein simple, the potential to rationally engineer
proteins is increased, and the chances of other adverse effects are minimized. In
fact, many engineering strategies based on individual mutations are essentially
indistinguishable from natural strategies found throughout evolution. Here
we discuss several such examples.
3.1.2
The Selection of Resistance Mutations to Small-molecule Agents
3.1.2.1 HIV Protease Inhibition and Substrate Selectivity

Drug resistance mutations are common in patients treated with anti-HIV
compounds such as indinavir and nelfinavir. These drugs act by inhibiting
HIV protease (HIV PR), one of the essential HIV proteins required for viral
growth and infection. These drugs inhibit HIV PR by competing with the
peptide substrate to bind in the active site of HIV PR (see Fig. 3.1-1). The
rapid emergence of inhibitor resistance is caused, in part, by the low fidelity of
the HIV reverse transcriptase. For example, nelfinavir is a potent inhibitor of
the wild-type HIV PR (Ki = 0.28 nM). However, a double mutant of HIV PR
(V82F/I84V) that has been observed in patients causes the virus to become
refractory to nelfinavir (Ki,mut/Ki,wt = 86) [1].
[Structures: indinavir and nelfinavir]
Fig. 3.1-1 HIV PR bound to an NC-p1 peptide substrate [3] (a) and nelfinavir (b) [4].
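The numbers quoted for the V82F/I84V mutant can be combined in a short calculation. The following is a minimal sketch in Python (the function and variable names are illustrative and do not come from the original text). It derives the approximate Ki of nelfinavir for the double mutant from the wild-type Ki and the reported ratio, and the fold loss in catalytic efficiency from the kcat/KM values given in footnote 1 of this section.

```python
# Values quoted in the text and in footnote 1 of this section.
KI_WT_NM = 0.28      # nelfinavir Ki against wild-type HIV PR, in nM
KI_RATIO = 86        # Ki(mutant) / Ki(wild type) for V82F/I84V
EFF_WT = 30.0        # kcat/KM of wild-type HIV PR, in mM^-1 s^-1
EFF_MUT = 0.5        # kcat/KM of V82F/I84V HIV PR, in mM^-1 s^-1


def mutant_ki(ki_wt, ratio):
    """Ki of the mutant, given the wild-type Ki and the mutant/wild-type ratio."""
    return ki_wt * ratio


def fold_loss(wild_type, mutant):
    """How many times lower the mutant value is than the wild-type value."""
    return wild_type / mutant


ki_mut_nm = mutant_ki(KI_WT_NM, KI_RATIO)
efficiency_loss = fold_loss(EFF_WT, EFF_MUT)

print(f"Estimated nelfinavir Ki for V82F/I84V: {ki_mut_nm:.1f} nM")
print(f"Loss of inhibitor affinity: {KI_RATIO}-fold")
print(f"Loss of catalytic efficiency: {efficiency_loss:.0f}-fold")
```

Read this way, the double mutant gives up inhibitor affinity and catalytic efficiency by factors of similar magnitude, yet the virus remains viable, which is the point made in the footnote.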

Given that the inhibitors mimic the protease's natural peptide substrate, it
is perhaps surprising that HIV PR mutants can overcome inhibition without
losing a critical level of enzymatic activity (see footnote 1). The most common
resistance-causing mutations are found in close proximity to the substrate-binding
pocket. Given that the inhibitor-binding surface largely overlaps with the
substrate-binding surface, how do these mutations disrupt inhibitor binding
while retaining substrate recognition? Structural analysis reveals that the
inhibitors tend to penetrate deeply into the same pockets that the protease
uses to bind the side chains of its substrates - in fact, the inhibitors tend
to penetrate more deeply into the pockets than the substrates themselves.
Therefore, mutations in the protease can disrupt the deepest inhibitor contacts
while having a smaller effect on substrate binding [2]. Indeed, it appears
that the majority of the characterized inhibitor-resistance mutations work by
disrupting these deep inhibitor contacts, thereby selectively disrupting the
binding of one ligand (the inhibitor) without affecting another ligand (the
substrate).
While many of the characterized HIV PR mutants do not substantially alter
the protease substrate specificity, there are other resistance-causing mutations
that do, the best characterized of which is V82A. When the in vitro substrate
specificity of the V82A mutant (inhibitor resistant) was compared with the
wild-type (inhibitor sensitive) strain, the V82A-containing enzyme was found
to have a statistically significant increase in activity for Val over Ala at the P2
position of the substrate (P2 is the second amino acid N-terminal to the scissile
bond) [5]. So, in this case, mutations in HIV PR were selected that disrupt
the inhibitor-binding surface, but in doing so, the substrate specificity of the
protease was also affected. So how does the virus accommodate this change in
specificity?
HIV PR cleaves several substrates during viral development. Among these
sites, cleavage of the nucleocapsid-p1 (NC-p1) site is rate limiting to viral
maturation. The occurrence of the V82A mutation in HIV PR correlates with
an alanine-to-valine mutation at the NC-p1 cleavage site. In other words, it
appears that, under the pressure of selection caused by the HIV PR inhibitor,
an HIV PR mutant (V82A) was selected with alterations to the inhibitor-binding
site, thereby changing the substrate specificity at P2. Along with the
altered substrate specificity at P2 came compensatory mutations in one of
the substrate sequences (Ala-to-Val at P2). Residue V82 does not make direct
van der Waals contact with the P2 side chain. Rather, incorporation of an
Ala-to-Val mutation at P2 generally increases the quality of fit between the
substrate and the enzyme thereby compensating for loss of substrate-enzyme
contacts at V82 [3]. This structural difference explains why the V82A mutation
in the enzyme and the Ala-to-Val mutation in the substrate are found to
coevolve.
1) The V82F/I84V mutant HIV PR is functionally active (a virus with these
mutations is viable), yet the catalytic efficiency of the mutant
(kcat/KM = 0.5 mM⁻¹ s⁻¹) is compromised relative to the wild-type enzyme
(kcat/KM = 30 mM⁻¹ s⁻¹) [1].
There are at least three lessons from HIV PR inhibitor resistance. First,
relatively few mutations are often sufficient to induce inhibitor resistance,
and in many cases a single point mutation is sufficient. Interestingly, several
mutations allow HIV to overcome inhibitor sensitivity demonstrating that
there are numerous solutions to the same engineering problem. While the
mutations are focused in regions that directly contact the inhibitor, as we might
expect, some are sufficiently subtle (e.g., acting through slight rearrangements
of the protein core) that it is hard to imagine predicting similar mutations
while attempting to rationally engineer a protein.
Second, relatively few mutations may be necessary to convergently engineer
a protein and its substrate - in this case natural selection led to an HIV PR
mutation (V82A) that changed its substrate selectivity and a compensatory
change in one of its substrates.
While the first two lessons are encouraging for the purposes of engineering
proteins with altered specificities, the third lesson is largely cautionary:
protein functions can be intimately interconnected. In at least one case,
altering the inhibitor surface of HIV PR affected the substrate specificity
of the mutant proteases. For this reason, engineering projects that intend
to dissect individual functions of a given protein must also take care to
control other unintended changes to the protein function. For example, it is
common that engineering a protein will adversely affect its stability or activity.
This natural example demonstrates the feasibility but also the challenges of
mutating a protein to alter its ligand specificity using only a small number of
mutations.
3.1.2.2 Identification of the Target of Rapamycin

While the emergence of HIV strains resistant to HIV PR inhibitors presents a
serious medical challenge, there are cases where the development of resistance
mutations to an inhibitor can be invaluable, particularly when the mode
of action of an inhibitor is unknown. Such was the case with the small-molecule
immunosuppressant rapamycin [6]. The natural product rapamycin
became the subject of intensive study after it was demonstrated to block
helper T-cell activation through an unknown mechanism. Indeed, this is a
common problem with small-molecule agents; although it is straightforward
to isolate compounds that cause interesting phenotypes, identifying the
phenotypically relevant targets of the molecule can be challenging. Similarly,
once a putative target is identified, it can be difficult to establish whether
inhibition of that target is sufficient to cause the biological effect or whether
other targets may also contribute to the observed phenotype. In the case
of rapamycin, several groups conducted research to determine the specific
underlying biochemical interactions that lead to the phenotypic effects of
rapamycin.
[Structure: rapamycin]
Finding the binding partners of a small molecule is one common approach
in target identification. Attempts to identify the physiologically relevant cellular
targets of rapamycin led to the observation that rapamycin binds tightly to
the abundant peptidylprolyl rotamase, FK506 binding protein (FKBP). While
binding to FKBP appeared to be important for the activity of rapamycin, several
lines of evidence suggested that binding and inhibiting FKBP is not sufficient
to account for the cellular activity of rapamycin. For example, rapamycin is
toxic to yeast, yet strains lacking FKBP are viable; since FKBP is not essential,
its inhibition would not be expected to cause toxicity. Intriguingly, however,
yeast lacking FKBP are insensitive to rapamycin. This and other lines of
evidence [6, 7] led to the subsequent realization that rapamycin binds FKBP
and that the FKBP-rapamycin complex then targets other cellular factors; it is
these other cellular factors that are responsible for the specific cellular activity
of rapamycin.
Focusing on yeast, a genetic screen was performed to identify mutations
that conferred rapamycin resistance [8]. To accomplish this, yeast cells were
mutagenized and rapamycin-resistant strains were identified. Some of the
mutations recovered localized to FKBP, as would be expected (see Fig. 3.1-2(b)).
These mutations are recessive, consistent with the role of FKBP as an accessory
protein; even in the presence of the mutant, the wild-type copy is sufficient
to form the active rapamycin-FKBP complex. In addition to the recessive
FKBP mutations, two other proteins were implicated in rapamycin activity,
TOR1 and TOR2 (for target of rapamycin). Unlike the FKBP mutations, it was
found that TOR mutations had dominant effects, suggesting that these TOR
proteins (later identified as protein kinases related to the lipid PI3 kinase)
are the relevant targets of rapamycin responsible for cellular activity (see
Fig. 3.1-2(c)). Indeed, the mutations in the TOR proteins that cause complete
rapamycin resistance have been shown in vitro to block the binding and,
therefore, inhibition of TOR by the FKBP-rapamycin complex. Furthermore,
although the initial identification of TOR was performed in yeast, several
studies demonstrated that a human homolog of TOR is responsible for the
immunosuppressive activity of rapamycin. In fact, that the mammalian TOR
deserves its name can be demonstrated using a similar line of experimentation;
the mutation in mammalian TOR analogous to one of those discovered in
yeast (S1975I) confers rapamycin resistance to mammalian cells.
Fig. 3.1-2 Mechanism of rapamycin inhibition and resistance. (a) Rapamycin
inhibits TOR through an FKBP-rapamycin complex. (b) Resistance mutations in
FKBP lead to loss of inhibition. (c) Dominant resistance mutations in TOR
prevent FKBP-rapamycin binding and inhibition.
In summary, mutations in a protein that caused small-molecule resistance
were used to map the phenotype induced by the small molecule to its
functionally relevant biochemical targets. Specifically, rapamycin acts by
binding the abundant protein FKBP. The resulting small-molecule protein
complex subsequently binds to and inhibits the TOR proteins leading to the
observed cellular effects of rapamycin. This seemingly complicated mechanism
of action is similar for immunosuppressants FK506 and cyclosporin A
(FK506 binds to FKBP and the complex inhibits the phosphatase calcineurin;
cyclosporin A binds cyclophilin and the resulting complex also inhibits
calcineurin).
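The genetic reasoning that separates the accessory protein (FKBP) from the functionally relevant target (TOR) can be written as a small truth-table model. The Python sketch below is a deliberately simplified illustration and is not part of the original study. It assumes a diploid cell, treats each allele as a boolean, and assumes that growth is inhibited only when at least one FKBP copy can form the rapamycin complex and every TOR copy can be bound by that complex. The function and variable names are hypothetical.

```python
def growth_inhibited(fkbp_alleles, tor_alleles):
    """Return True if rapamycin blocks growth in this simplified model.

    fkbp_alleles: list of bool, True if that FKBP copy binds rapamycin.
    tor_alleles:  list of bool, True if that TOR copy is bound (and so
                  inhibited) by the FKBP-rapamycin complex.
    """
    # One functional FKBP copy is enough to form the inhibitory complex.
    complex_forms = any(fkbp_alleles)
    # A single TOR copy that escapes the complex keeps signaling going.
    every_tor_copy_bound = all(tor_alleles)
    return complex_forms and every_tor_copy_bound


# Heterozygous FKBP mutant: the wild-type copy still forms the complex,
# so the cell stays sensitive and the resistance mutation is recessive.
fkbp_het_sensitive = growth_inhibited([True, False], [True, True])

# Heterozygous TOR mutant: the resistant TOR copy escapes inhibition,
# so the cell grows and the resistance mutation is dominant.
tor_het_sensitive = growth_inhibited([True, True], [True, False])

print("FKBP heterozygote inhibited by rapamycin:", fkbp_het_sensitive)
print("TOR heterozygote inhibited by rapamycin:", tor_het_sensitive)
```

Under these assumptions the model reproduces the pattern described in the text: loss of FKBP confers resistance only when no functional copy remains, whereas a single resistant TOR allele is sufficient.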
In the case of rapamycin, it was possible to demonstrate that TOR is the
target using a dominant mutant of TOR that is resistant to FKBP-rapamycin
inhibition. These resistance mutations are the single most definitive means
of demonstrating the phenotypically relevant target of a small molecule.
Unfortunately, the availability of resistance mutations can be limiting; attempts
to engineer such a mutation may adversely affect the function of the protein
(as was demonstrated by the altered substrate specificity of the V82A mutant of
HIV PR). Similarly, isolating resistance mutations from genome-wide screens
(as was the case with TOR) is only tractable in organisms such as yeast
that are conducive to genetic manipulation. Nonetheless, the use of resistance
mutations to uncover and prove the functionally relevant targets of an inhibitor
is a powerful and definitive experiment.
3.1.2.3 Kinase Inhibitors and Resistance

While inside the laboratory whole genome screens (enabled by organisms
amenable to genetic manipulation) have made possible the identification of
resistance mutations, outside the laboratory similar screens are inadvertently
taking place in the lives of cancer patients who are treated with
antineoplastic drugs. The ability to search for increased gene copy number
of known oncogenes and loss of heterozygosity at tumor suppressor loci,
the development of array-based comparative genomic hybridization for
identification of translocation events, and, most relevant here, the ability
to carry out high throughput DNA sequencing of candidate resistance genes
have allowed identification of numerous molecular markers of resistance
to chemotherapeutics. Many resistance loci are associated with increases
in the cancer cell’s ability to pump out the antineoplastic agent, such as
drug efflux pump mutants. These resistance mechanisms are independent
of the targeting agent, causing resistance to cis-platinum, doxorubicin, and
other general antiproliferative agents. Resistance mechanisms to molecularly
targeted therapeutics, in contrast, provide discrete insights into the mechanism
of action of these new generation antineoplastics.
[Structures: BAY 43-9006 and imatinib]
The prototype molecularly targeted therapeutic agent is imatinib, an
inhibitor of the Bcr-Abl tyrosine kinase. This oncogenic kinase is produced by
a translocation that fuses the Bcr locus on chromosome 22 to the c-Abl tyrosine
kinase gene on chromosome 9. The resulting abnormal chromosome is termed the
Philadelphia chromosome because of its discovery
in 1960 at the University of Pennsylvania School of Medicine by Peter Nowell
and David Hungerford from the Institute for Cancer Research [9]. It was later
demonstrated in 1973 by Janet Rowley that the Philadelphia translocation was
responsible for a specific form of leukemia, chronic myelogenous leukemia
(CML) [10]. In 2001, imatinib was approved for treatment of CML patients, and
produced remarkable results with more than 92% of patients achieving 14-month
progression-free survival on imatinib as a monotherapy.
The importance of imatinib lies in its broad implications for molecularly
targeted therapeutics: it demonstrated that a small-molecule tyrosine kinase
inhibitor can be efficacious in cancer therapy. First, it discounted the notion that
protein kinases could not be targeted selectively by small molecules that
bind to the ATP-binding site. In particular, the ATP-binding pockets of
different protein kinases were thought to be too similar for small molecules to
discriminate between them, yet imatinib only targets a handful of kinases (the
known targets of imatinib include Bcr-Abl, c-Abl, PDGFR, and c-Kit). Also, the
high concentration of cellular ATP (>1 mM) was expected to limit the potency
of ATP-competitive drugs, yet imatinib is a potent inhibitor (IC50 < 1 μM). It
was also believed that the side effects associated with inhibition of wild-type
kinases (such as c-Abl) would be prohibitive, yet imatinib causes no overt
toxicity in normal cells while inducing apoptosis in CML leukemia cells.
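The concern about cellular ATP can be made concrete with the standard Cheng-Prusoff relation for a competitive inhibitor, IC50 = Ki (1 + [S]/Km), where the substrate S is ATP. The Python sketch below is an illustration only. The Ki and Km values are hypothetical round numbers chosen to show the trend and are not measurements for imatinib or Bcr-Abl. Only the ATP concentration of about 1 mM is taken from the text.

```python
def competitive_ic50(ki, substrate_conc, km):
    """Cheng-Prusoff relation for a competitive inhibitor.

    ki, and the returned IC50, share one unit (here nM).
    substrate_conc and km share one unit (here micromolar).
    """
    return ki * (1.0 + substrate_conc / km)


# Hypothetical illustrative parameters (assumptions, not data).
KI_NM = 10.0          # intrinsic inhibitor affinity, nM
KM_ATP_UM = 10.0      # kinase Km for ATP, micromolar

# ATP concentrations in micromolar: a typical in vitro assay condition
# (assumed) and the roughly 1 mM cellular level cited in the text.
for atp_um in (10.0, 1000.0):
    ic50_nm = competitive_ic50(KI_NM, atp_um, KM_ATP_UM)
    print(f"[ATP] = {atp_um:7.1f} uM -> apparent IC50 = {ic50_nm:7.1f} nM")
```

With these assumed values, raising ATP from 10 μM to 1 mM shifts the apparent IC50 by roughly 50-fold, which shows why ATP-competitive inhibitors were expected to lose potency in cells.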
Second, because of its ability to target Bcr-Abl expressing tumors, patients
could be classified into potential responders based on their Philadelphia
chromosome status. This genomic prescreening for responder populations is
widely viewed as a major avenue for improving therapeutic efficacy and
minimizing unnecessary toxicity in nonresponder populations, and it
heralds the era of personalized medicine.
A third and more cautionary lesson from imatinib has been the rapid
emergence of imatinib resistance in CML patients. Initially, the advanced
stage CML patients, those in so-called blast crisis stage, who received imatinib
late in disease, showed high rates of resistance. Currently, all CML patients are
given imatinib upon diagnosis, and thus the rate of emergence of resistance is
slower, although still a major challenge to these patients’ long-term survival.
The molecular mechanism behind imatinib resistance mirrors its molecular
mechanism of action. Bcr-Abl gene duplication as well as transcriptional
mechanisms leading to increases in Bcr-Abl transcript levels can lead to
imatinib resistance. Thus, the Bcr-Abl inhibition exerts selective pressure on
CML tumors to increase Bcr-Abl signaling, which is manifest by upregulation
of Bcr-Abl messenger RNA. Another common mechanism of resistance is the
mutation of the Bcr-Abl kinase ATP-binding pocket in which imatinib binds
[11]. The mutation in the ATP-binding pocket produces a Bcr-Abl protein
kinase that can carry out ATP-dependent substrate phosphorylation but
cannot be inhibited by imatinib. Strikingly, the cancer has identified selectivity
determinants for imatinib binding that do not affect ATP binding.

One particular mutation, T315I, is most frequently identified in
imatinib-resistant tumors and serves as an illustration of how a single point mutation
can exquisitely control ligand selectivity (see Fig. 3.1-3). The amino acid at
position 315 of Bcr-Abl makes contact with the exocyclic amine of ATP and,
thus, lines the adenine-binding pocket of the kinase. The ATP-binding pocket
of most protein kinases is larger than necessary for binding ATP, especially
in the vicinity of the exocyclic amine of ATP. Thus, a large hydrophobic
pocket adjacent to adenine is available for small-molecule inhibitor binding.
Importantly, the size of the amino acid residue at position 315 controls access
to this extra pocket, and thus it has been termed the gatekeeper residue. In the
T315I mutant Bcr-Abl kinase, imatinib cannot access the hydrophobic pocket
because the larger isoleucine residue blocks its access. Since the bulkier
isoleucine occupies a pocket not used by substrate ATP, the T315I mutant is
still able to efficiently bind ATP and catalyze phosphotransfer reactions.
Fig. 3.1-3 The crystal structure of imatinib bound to Abl kinase [12]. The gatekeeper
residue (T315, colored red) packs tightly against imatinib (PDB: 1IEP).
As the predominance of imatinib resistance mechanisms can be traced to
Bcr-Abl functional upregulation, the clinical resistance offers another proof
of mechanism akin to the genetic screen which identified TOR as the target
of rapamycin discussed in Section 3.1.2.2. In the former case imatinib was
more or less designed to be a Bcr-Abl inhibitor, thus its target was known
from the outset of the clinical trial. In the case of rapamycin, a genetic
screen to identify its target(s) was carried out to identify the molecular basis
for its effect on immune suppression. In an amalgam between these two
paradigms for target identification and clinical efficacy, a B-Raf inhibitor
BAY43-9006 displayed disappointing efficacy in clinical trials of melanoma
patients, despite the identification of activating mutations in B-Raf in this
form of cancer. Luckily, BAY43-9006 was also used in clinical trials of other
cancer types, where it showed surprising efficacy in the treatment of renal
cancer, which is thought to be particularly dependent on vascularization.
Subsequent biochemical studies demonstrated that BAY43-9006, which was
originally thought to be a highly specific B-Raf inhibitor, is a potent inhibitor
of the vascular endothelial growth factor receptor (VEGFR), providing a post
facto rationale for its efficacy in this VEGFR-dependent cancer type [13].
In another case of small-molecule assisted target identification, the imatinib
response of patients with idiopathic hypereosinophilic syndrome led to the
identification of a chromosomal rearrangement involving the tyrosine kinase,
and the known imatinib target, PDGFR, as a likely cause of this syndrome
[14]. The link between the PDGFR fusion and hypereosinophilic syndrome
was further strengthened when, after extended imatinib therapy, a relapse in
one patient was observed to correlate with the emergence of a T674I mutation
in PDGFRA. T674 is the gatekeeper residue in PDGFRA.
Similarly, imatinib has been found to be a useful therapy for gastrointestinal
stromal tumors (GIST), which are driven by the c-Kit tyrosine kinase, a previously
known "off-target" of imatinib when it was being developed as a Bcr-Abl
inhibitor. Again, resistance to imatinib in GIST patients has emerged, and c-Kit
ATP-binding site mutations of the gatekeeper residue (T670I) are commonly
found [15].
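The three clinical resistance mutations named in this section follow the same pattern, which becomes obvious once the mutation labels are parsed. The Python sketch below is an illustration only. The helper name `parse_mutation` and the dictionary are hypothetical, and the only data used are the mutation labels quoted in the text.

```python
import re

# Single-letter wild-type residue, position, single-letter mutant residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'T315I' into ('T', 315, 'I')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


# Gatekeeper resistance mutations quoted in the text.
GATEKEEPER_MUTATIONS = {
    "Bcr-Abl": "T315I",
    "PDGFRA": "T674I",
    "c-Kit": "T670I",
}

parsed = {kinase: parse_mutation(m) for kinase, m in GATEKEEPER_MUTATIONS.items()}

# In each case a threonine gatekeeper is replaced by the bulkier isoleucine.
all_thr_to_ile = all(wt == "T" and mut == "I" for wt, _, mut in parsed.values())

for kinase, (wt, pos, mut) in parsed.items():
    print(f"{kinase}: residue {pos} changes from {wt} to {mut}")
print("All three are Thr-to-Ile gatekeeper substitutions:", all_thr_to_ile)
```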
The lessons learned from imatinib and BAY43-9006 suggest that cancers can
be uniquely dependent on the catalytic activity of a single kinase. Moreover,
because of the highly conserved nature of the kinase ATP-binding pocket,
drug candidates always inhibit multiple family members. In some cases,
off-target effects will lead to new medicines (BAY43-9006). In some other cases
of course, off-target effects will lead to toxic side effects, and will predictably
lead to failures of clinical trials. Moreover, because a single amino acid in
the binding pocket of kinases, the gatekeeper residue, can control inhibitor-
binding specificity, resistance to these drugs has emerged quickly in cancer
patients. A central challenge in all therapeutic areas is to identify key kinase
targets for the treatment of the signaling defects in human diseases.
3.1.3
Exploiting Sensitizing Mutations to Engineer Nucleotide Binding Pockets
3.1.3.1 Engineering Uniquely Inhibitable Kinases

One approach for determining the function of every protein kinase in the
genome is to develop a highly selective small-molecule inhibitor of each
kinase. The challenge in achieving high specificity is daunting since over
500 kinases are present in the human genome, containing highly similar
ATP-binding pockets. Our laboratory has addressed this specificity problem by
using protein engineering to target a kinase inhibitor to any kinase of interest.
In fact, this is the inverse of the problem discussed in Section 3.1.2.3, the
generation of imatinib-resistant alleles of Bcr-Abl (T315I). Rather than creation
of an inhibitor resistant allele, the approach to discovery of an inhibitor of
any protein kinase is to create a uniquely sensitive kinase allele, which will be
inhibited by a molecule that does not inhibit any wild-type protein kinase.
This is achieved by mutation of the gatekeeper residue in the wild-type
kinase to a small alanine or glycine residue. Importantly, there are no human,
mouse, worm, fly, or yeast kinases with an alanine or glycine gatekeeper
residue, making the mutant kinase unique. A pyrazolopyrimidine-based
inhibitor, 1NM-PP1, was designed (based on the parent inhibitor PP1), which is only
capable of inhibiting kinases containing a glycine or alanine gatekeeper
residue. Importantly, the kinases with the smallest naturally occurring
gatekeeper residues, serine and threonine, are not inhibited by 1NM-PP1
(Fig. 3.1-4). It is interesting to note that the gatekeeper residue was selected
on the basis of structural models of kinase-ATP crystal structures and docking
models of pyrazolopyrimidine-based inhibitors prior to the discovery of the
gatekeeper mutations in imatinib resistant CML patients. The fact that
gatekeeper mutations can be used to confer inhibitor sensitivity through
rational design and inhibitor resistance through natural selection processes
highlights that this residue is a dominant feature controlling small molecule
access to the ATP-binding pocket without affecting kinase activity.
[Structures: PP1 and 1NM-PP1]
Fig. 3.1-4 The structure of kinase inhibitor PP1 bound to the
ATP-binding pocket of Hck kinase. The gatekeeper residue (the
surface of which is colored red) packs tightly against the tolyl
substituent of PP1 [16] (PDB: 1QCF).
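The design rule described above reduces to a simple test on the identity of the gatekeeper residue. The Python sketch below encodes only what the text states: kinases with a glycine or alanine gatekeeper are expected to be inhibited by 1NM-PP1, while those with naturally occurring gatekeepers (including the smallest, serine and threonine) are not. The function names are hypothetical, and real sensitivity would of course have to be confirmed experimentally.

```python
# Gatekeeper residues that the text describes as conferring sensitivity.
SENSITIZING_RESIDUES = {"G", "A"}


def is_analog_sensitive(gatekeeper):
    """True if the one-letter gatekeeper residue is glycine or alanine."""
    return gatekeeper.upper() in SENSITIZING_RESIDUES


def sensitizing_mutation(wild_type, position, replacement="G"):
    """Build a label such as 'I338G' for a gatekeeper-shrinking mutation."""
    if replacement not in SENSITIZING_RESIDUES:
        raise ValueError("replacement must be 'G' or 'A'")
    if is_analog_sensitive(wild_type):
        raise ValueError("gatekeeper is already glycine or alanine")
    return f"{wild_type}{position}{replacement}"


# Examples using residues named in this chapter.
print(sensitizing_mutation("I", 338, "G"))   # v-Src gatekeeper, glycine mutant
print(is_analog_sensitive("T"))              # threonine gatekeeper
print(is_analog_sensitive("G"))              # engineered glycine gatekeeper
```

The same predicate, read in the opposite direction, describes clinical resistance: enlarging the gatekeeper (for example threonine to isoleucine) excludes an inhibitor, whereas shrinking it to glycine or alanine admits one.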
3.1.3.2 Analog-specific Kinases

The enzymatic function of protein kinases is carried out by phosphorylation
of serine, threonine, or tyrosine residues on target proteins. As an estimated
30% of human proteins are thought to be phosphorylated, identification of
the direct substrates of all human protein kinases is a daunting challenge.
Although a wide range of methods have been developed for isolating the
phosphoproteome, critical information about the kinase or kinases responsible
for a given phosphorylation event is not provided by phosphoproteomics. To
directly label and identify the targets of each kinase in the genome, kinases
can be engineered to accept surrogate phosphodonors that are not accepted
by any wild-type protein kinases. These N6-substituted ATP analogs, most
commonly N6-benzyl ATP, are accepted by kinases containing an alanine or
glycine gatekeeper residue.
[Structures: ATP and N6-benzyl ATP]
The N6-benzyl ATP-accepting oncogenic tyrosine kinase I338G v-Src has
been the best characterized analog-specific protein kinase. Several critical
design criteria must be satisfied by an engineered kinase for it to be useful in
studying kinase-signaling pathways. First and foremost, the substrates
phosphorylated by the mutant kinase must be identical to those phosphorylated
by the wild-type protein. Three lines of evidence suggest that mutation of the
gatekeeper residue does not alter substrate specificity. First, using
combinatorial peptide substrates, wild-type Src and I338G Src protein kinases exhibit
identical sequence specificity patterns [17]. Second, using a cellular
transformation assay, v-Src and I338G v-Src produce equivalent levels of
anchorage-independent cell growth, confirming that the cellular targets phosphorylated
by the mutant are able to fully recapitulate the wild-type kinase-induced
phenotype [18]. Lastly, at the structural level, the crystal structure of the mutant Src
(T338G c-Src, see Fig. 3.1-5) shows no rearrangements in the kinase domain
in the phosphoacceptor binding pocket. In fact, the cocrystal structure with
N6-benzyl ADP shows that the nucleotide binding is unchanged from that of
the ADP/c-Src cocomplex. Thus, available biochemical, genetic, and structural
evidence suggests that the mutation of the gatekeeper residue in the Src kinase
causes very limited change to the function of the kinase, while allowing the
use of inhibitors or ATP analogs for the study of Src. Currently, over 30
protein kinases from yeast, mouse, humans, Arabidopsis, and tomato have
been successfully engineered for substrate labeling or inhibitor development.
3.1.3.3 From GTPases to XTPases

Given the convergence between the resistance mutations found in cancer and
the mutations used to engineer orthogonal kinase ligands, it is reasonable
to consider the gatekeeper residue particularly amenable to engineering.
But the gatekeeper residue is not alone. In fact, the strategy to engineer
orthogonal kinase ligands is the descendant of a similarly successful strategy
to engineer orthogonal nucleotide specificity into the nucleotide-binding
pocket of GTPases. The enabling mutation was discovered by Hwang et al. while
dissecting the GTP-binding pocket of EF-Tu, a GTPase essential for ribosome
function in Escherichia coli [19]. Introducing an aspartate-to-asparagine
mutation (D138N) disrupted the hydrogen-bonding interactions between GTP and
the GTPase, thus impairing the GTPase activity of the protein. Remarkably,
using XTP rather than GTP as substrate restored hydrogen bonding (now
reversed, see Fig. 3.1-6), and the activity of the GTPase-turned-XTPase was
nearly identical to that of the wild-type enzyme. This mutation therefore
allows the construction of an orthogonal nucleotide specificity (the XTPase
accepts only XTP; the GTPase only GTP).

Fig. 3.1-5 N6-benzyl ADP is shown bound in the ATP-binding pocket of the
analog-sensitized Src kinase (PDB: 1KSW) (Ref.: Witucki, L.A. et al., Chem.
Biol. 2002, 9, 25-33).

This engineered GTPase was particularly useful for dissecting the GTP
requirements of the E. coli ribosome. In vitro translation experiments had
established that two GTPases are necessary for each round of amino acid
addition to a growing polypeptide. EF-G (one of these two GTPases) is
responsible for the translocation of the peptidyl-tRNA from the A site to the
P site of the ribosome. The other GTPase involved in this process is EF-Tu,
the GTPase previously engineered into an XTPase by Hwang et al. The
role of EF-Tu is to ensure proper binding of the appropriate aminoacyl-tRNA
to the ribosome (Fig. 3.1-7). Because the nucleotide specificity of D138N
EF-Tu is orthogonal to that of wild-type EF-G, Weijland and Parmeggiani were
able to use this mutant and radiolabeled nucleotides (either XTP or GTP) to
quantitate the nucleotide consumption of each protein during the translation
cycle [20, 21]. From this work it was established that, for every amino acid
incorporated into a growing peptide chain, EF-Tu (D138N) consumes two
molecules of XTP and EF-G (wt) consumes one molecule of GTP.

Fig. 3.1-6 GTPases contain a conserved aspartate that hydrogen bonds to the
guanine of GTP. An aspartate-to-asparagine mutation changes the nucleotide
specificity from GTP to XTP by altering these hydrogen bonds.

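The stoichiometry reported above lends itself to a small bookkeeping sketch. The Python snippet below simply scales the two per-residue values quoted in the text (two XTP for D138N EF-Tu, one GTP for wild-type EF-G) to a peptide of a given length; the function name and the dictionary keys are illustrative choices, not anything taken from the cited studies.

```python
# Per-residue nucleotide costs as quoted in the text for the orthogonal system.
XTP_PER_RESIDUE_EF_TU_D138N = 2  # EF-Tu (D138N) consumes two XTP per amino acid
GTP_PER_RESIDUE_EF_G_WT = 1      # EF-G (wt) consumes one GTP per amino acid


def nucleotide_consumption(residues_added: int) -> dict:
    """Return total XTP and GTP consumed while adding `residues_added` amino acids.

    Hypothetical helper: it only multiplies the per-residue values above and
    ignores initiation, termination, and any proofreading-dependent variation.
    """
    if residues_added < 0:
        raise ValueError("residues_added must be non-negative")
    return {
        "XTP (EF-Tu D138N)": XTP_PER_RESIDUE_EF_TU_D138N * residues_added,
        "GTP (EF-G wt)": GTP_PER_RESIDUE_EF_G_WT * residues_added,
    }


if __name__ == "__main__":
    print(nucleotide_consumption(100))
```

Because the two factors draw on different nucleotide pools, the two totals can be read out independently, which is the point of the orthogonal design.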
At the time when Hwang and Miller developed the GTPase-to-XTPase mutation in
EF-Tu, they proposed that, because this mutation is in a highly conserved loop
shared by most GTPases, the D138N mutation should be applicable to endow
other GTPases with XTPase activity. This proposal has proven remarkably
accurate; numerous GTPases have been converted into XTPases using this
strategy [22].
3.1.4
Engineering the Ligand Selectivity of Ion Channels
3.1.4.1 Resistance Mutations in L-type Calcium Channel Signaling

For kinases and GTPases, point mutations can be used to study one member
of a large family by allowing the engineered member to bind to a unique
ligand or substrate. An alternative means of isolating the activity of a single
protein in a family is to engineer the protein of interest to be uniquely resistant
to a general inhibitor. This way, the activity of the protein can be unmasked
by inhibiting all the other family members. The function of L-type calcium
channels was dissected in this manner.

Fig. 3.1-7 The crystal structure of EF-Tu bound to a nonhydrolyzable GTP
analog shows Asp138 hydrogen bonding to guanine (PDB: 1EXM).

Voltage-gated calcium channels play an important role in neuronal signaling.
While there are several different types of voltage-gated calcium channels, they
share a common activity: allowing an influx of calcium into the cytoplasm
upon activation. Despite this commonality, calcium influx from different
types of channels is not equivalent; L-type calcium channel-specific blockers
diminish calcium-dependent cAMP-response element binding (CREB) protein
phosphorylation and activation of the MAP kinase pathway, while N- and
P/Q-type channel blockers have little-to-no effect. This and other differences
led to the proposal that calcium signals can act locally. For example, L-type
calcium channels may have the means of directing the entering calcium
to affect signaling molecules positioned near the channel. These signaling
molecules may then activate other signaling pathways (such as the MAP
kinase pathway). Testing this hypothesis requires a means of isolating the
role of calcium influx through L-type calcium channels from the role of
calcium influx through other types of voltage-gated calcium channel. This feat
was accomplished using a mutant L-type calcium channel that is resistant to
nimodipine, a dihydropyridine (DHP) antagonist of L-type calcium channel
activity.
A dihydropyridine-resistant L-type calcium channel was identified while
trying to map the DHP-binding site [23]. Initially, the binding site was probed
using photoaffinity labels and chimeric channels. These studies implicated
a specific region as responsible for DHP binding. Site-directed mutagenesis
in this region identified several mutations that altered DHP sensitivity. One
mutation in particular, T1006Y, was shown to confer resistance to antagonism
by a DHP. Agonist binding to the mutant channel was dramatically decreased,
as demonstrated in a radioligand-binding assay. The possibility that this
effect is caused by nonspecific disruption of the channel's structure was
ruled out by demonstrating that channel activation and inactivation were not
affected by this mutation. Therefore, biochemical and electrophysiological
evidence suggests that this mutant channel is similar to the wild-type channel
with the exception that the mutant is resistant to DHP antagonists.
In neurons, the activity of the T1006Y mutant channel can be distinguished
from that of the endogenous channel by treating the cells with nimodipine,
thus blocking the wild-type copy and revealing the activity of the transfected
mutant [24]. Upon membrane depolarization in the presence of nimodipine,
the mutant channel rescues Ca2+ influx and downstream signaling, including
CREB phosphorylation and the stimulation of the MAP kinase pathway
(Fig. 3.1-8). Thus, the DHP-resistant T1006Y mutant L-type calcium channel
provides the specificity handle necessary to dissect the activity of L-type
calcium signaling. For example, this T1006Y channel was instrumental in the
identification of a calmodulin-binding site on the C-terminus of the channel.
This binding site provides insight as to how L-type calcium channel signaling
can use local Ca2+ influx to interface specifically with other cellular
signaling pathways.

Fig. 3.1-8 The activity of an exogenous L-type calcium channel was dissected
using a mutation that confers nimodipine resistance (T1006Y). In the presence
of nimodipine, the endogenous, wild-type channel (blue) is blocked and the
activity of the mutant channel (green) is revealed.

3.1.4.2 Capsaicin Sensitivity

Similar to the engineering of DHP-resistant mutant calcium channels, there
are natural examples of the emergence of uniquely resistant channels. One
example involves the small molecule capsaicin, the component of hot chili
peppers that induces the sensation of burning pain. Capsaicin accomplishes
this effect by binding to and opening the VR1 cation channel found in nerve
endings, including those in the mouth. That we consider chili peppers "hot"
is not arbitrary: the VR1 channel is also responsible for recognition of
noxious stimuli including heat (>43 °C) and acid [25].
[Structure: capsaicin]

It has been proposed that capsaicin serves chili peppers by selectively
deterring predators. Birds, productive vectors for seed dispersion, do not
respond to capsaicin. In contrast, mammals are predatory but are deterred by
capsaicin (with the exception of humans) [26]. The molecular basis of the
differential capsaicin sensitivity between birds and mammals can be traced
to VR1 [27]. The avian homolog of VR1, like its mammalian counterpart, is
responsive to heat, but unlike its mammalian counterpart, avian VR1 does not
respond to capsaicin.
The chicken VR1 ortholog (capsaicin insensitive) was compared with the
rat VR1 (capsaicin sensitive), and chimeric channels were used to identify
sites on the chicken VR1 sufficient to render rat VR1 capsaicin insensitive.
When a short stretch of the rat VR1 channel in the third transmembrane-spanning
region (presumably at the capsaicin-binding site, although there
are no high-resolution structures of the VR1 channel) is substituted with the
chick sequence, the mutant channel is rendered capsaicin insensitive. Using
this chimera as inspiration, it was possible to find individual point mutations
sufficient to render the rat channel capsaicin insensitive while having only a
modest impact on the channel's response to heat and acid. Interestingly, the
best resistance-inducing mutations were unnatural ones found in neither
receptor; the natural differences serve as an excellent guide but,
as with many of the examples above, it is often necessary to test a panel of
mutations before a productive mutant is found.
Perhaps more remarkable than the ability to use the differences between
chick (insensitive) and rat (sensitive) to construct a capsaicin-insensitive
mutant rat VR1 was the use of the rat receptor to guide the construction of a
capsaicin-sensitive chick receptor. Building the binding pocket required more
than a point mutation; the active construct borrowed 45 amino acids from rVR1,
inserted into the corresponding position in cVR1. Essentially, the molecular
basis of this selective deterrence, which leads birds, but not mammals, to
consume chili peppers, is explained by a biochemical change in ligand
specificity induced by a few amino acids that differ between mammalian and
avian VR1.
3.1.5
Conclusion
3.1.5.1 Challenges in Protein Engineering

We have presented several natural and synthetic examples of the alteration
of protein-ligand interactions. Several other examples exist and have been
reviewed elsewhere [28-31]. While the utility of altering ligand specificity
is clear, protein engineering remains challenging. Even for the successful
examples presented here, the mutant proteins frequently suffer some level
of compromised function. For example, the space-creating mutation in the
ATP-binding site of Cdk1, an essential yeast kinase involved in the regulation
of cell cycle progression, has a substantial impact on the KM of the kinase for
ATP (KM,wt = 35 µM, KM,mutant = 320 µM) [32]. In this case, the compromised
KM does not significantly impact the utility of the engineered kinase because
the high cellular concentrations of ATP (>1 mM) are substantially above the
KM for both the wild-type kinase and the mutant. This and other similar
concerns can be addressed by using one of the great advantages of convergent
engineering strategies: the activity of the mutant can always be compared to
the activity of the wild type, both with and without ligand (see Table 3.1-1).
Because of these controls, unintended changes to the function of the protein
or the ligand can be dissected. In the case of the analog-sensitized Cdk1, the
mutant compares favorably with the temperature-sensitive mutant that had
previously been used to dissect the function of this kinase. Specifically, this
mutant kinase has been used with 1NM-PP1 to demonstrate the role of Cdk1
in the G2/M transition [32] and with γ-32P-labeled N6-benzyl ATP to identify
numerous substrates of this kinase [33]. Even when the reengineered mutants
do not match the function of the wild type perfectly, they can still serve as
useful tools.
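The argument that a roughly ninefold higher KM is tolerable at cellular ATP levels can be checked with a quick calculation. The sketch below assumes simple Michaelis-Menten behaviour, reads the two KM values quoted above as 35 µM (wild type) and 320 µM (mutant), and takes 1 mM as a lower bound for cellular ATP; the function and variable names are illustrative only, and differences in kcat are not considered.

```python
def fractional_velocity(substrate_um: float, km_um: float) -> float:
    """Michaelis-Menten fraction of Vmax, v/Vmax = [S] / (KM + [S])."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and KM must be > 0")
    return substrate_um / (km_um + substrate_um)


ATP_UM = 1000.0        # assumed cellular ATP, 1 mM (the text states >1 mM)
KM_WT_UM = 35.0        # wild-type KM for ATP, as read from the text
KM_MUTANT_UM = 320.0   # analog-sensitive mutant KM for ATP, as read from the text

wt_fraction = fractional_velocity(ATP_UM, KM_WT_UM)
mutant_fraction = fractional_velocity(ATP_UM, KM_MUTANT_UM)

if __name__ == "__main__":
    print(f"wild type: {wt_fraction:.3f} of Vmax")
    print(f"mutant:    {mutant_fraction:.3f} of Vmax")
```

Under these assumptions the wild-type enzyme runs at about 97% of Vmax and the mutant at about 76%, so both are largely saturated with ATP, and the gap narrows further as ATP rises above 1 mM.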
Table 3.1-1 Controls available when using orthogonally engineered
protein-ligand interactions to study the biological function of a protein

            Without ligand              With ligand
Wild type   Reference state             Control for the off-target effects
                                        of the ligand
Mutant      Control for the effect      Experimental condition to probe the
            of the mutation             functional consequences of the
                                        protein-ligand interaction

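The four conditions of Table 3.1-1 can be combined into simple contrasts. The sketch below is one possible way to do so, assuming a single numeric readout per condition and assuming that off-target and on-target effects add; neither assumption comes from the chapter, and the class name, field names, and example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourConditionReadout:
    """One measurement (e.g., a phenotype score) for each cell of the 2 x 2 design."""
    wt_no_ligand: float       # reference state
    wt_ligand: float          # control for off-target effects of the ligand
    mutant_no_ligand: float   # control for the effect of the mutation
    mutant_ligand: float      # experimental condition

    def off_target_effect(self) -> float:
        # Change caused by the ligand in cells whose protein cannot bind it.
        return self.wt_ligand - self.wt_no_ligand

    def mutation_effect(self) -> float:
        # Change caused by the mutation alone, in the absence of ligand.
        return self.mutant_no_ligand - self.wt_no_ligand

    def on_target_effect(self) -> float:
        # Ligand effect in the mutant, corrected for the off-target effect
        # (assumes the two contributions are additive).
        return (self.mutant_ligand - self.mutant_no_ligand) - self.off_target_effect()


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    readout = FourConditionReadout(100.0, 95.0, 90.0, 20.0)
    print(readout.off_target_effect(), readout.mutation_effect(), readout.on_target_effect())
```

A small off-target term and a small mutation term are what make the on-target contrast interpretable; if either is large, the engineered pair is a poor reporter for the wild-type protein.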
But sometimes the engineered mutations have a substantial impact on the
activity of the protein. For example, while the GTPase-to-XTPase mutation
described in Section 3.1.3.3 has been general for most of the GTPases
studied, attempts to use the Asp-to-Asn mutation to study G-protein coupled
receptor (GPCR) signaling through Goα were initially unsuccessful because
the mutation (D273N) compromises nucleotide binding and GTPase activity
of these G-proteins [34]. In this case, it was possible to rescue the activity of
the mutant G-protein using an additional mutation (Q250L) that resides on
the other side of the GTP-binding pocket from D273. The discovery of this
mutation was apparently serendipitous; Q250L mutants are usually GTPase
deficient. Similarly, the space-creating mutations used to study kinases (see
Section 3.1.3.1) occasionally compromise the activity of a kinase severely. In
several cases, it has been possible to identify second-site suppressor mutations
that rescue the activity of the mutant kinase [35]. In light of the natural
examples we have presented above, perhaps this level of feasibility is to be
expected; within the set of single mutants of a given protein there appears
to be significant functional diversity in ligand-binding activities. The best
mutations are sometimes, but not always, easy to rationalize. While using
rational strategies to identify productive mutations undoubtedly improves
the chances of finding mutants with the desired activities, testing several
mutations is likely necessary. Nonetheless, both the natural and synthetic
examples above illustrate that reengineering a protein's ligand specificity is a
tractable problem.
3.1.5.2 Engineering Extended Biomolecular Interfaces

While this chapter has focused on the engineering of selectivity for small-
molecule ligands, primarily using single mutations, a similar strategy would
clearly be useful for studying the biological specificity of larger interfaces if
the reagents were available. Toward this end, several studies have attempted
more ambitious engineering projects to redesign large regions of protein
interfaces. For example, computational approaches were instrumental in
developing mutants of maltose-binding protein (and related members of the
family) with completely reengineered ligand specificities [36]. Similarly, many
other computational approaches have made significant progress to aid in the
reengineering of protein interfaces [37]. Alternatively, in vitro selections have
provided a means of enriching desired binders from large libraries of mutants.
For instance, phage display has been used to reengineer both protein-protein
[38] and protein-DNA [39-41] interactions. While reengineering complex
biomolecular interfaces remains difficult, these advances, alone or in
combination, will aid in the development of specifically engineered binding
partners that will provide powerful tools to study the biological importance of
these interactions.
3.1.5.3 Conclusion
Reengineering protein-ligand interactions can provide powerful information
that complements traditional biochemical and genetic approaches. The power
of these engineering approaches will increase as new methods are developed
both in protein engineering and in our ability to genetically manipulate
the organisms we wish to study. These engineering approaches are most
useful in vitro or in organisms where genetic manipulation is tractable,
such as bacteria, yeast, flies, and mice. As pharmacological agents that
target wild-type proteins become increasingly selective, these reagents will
complement chemical genetic tools. Even in these cases, however, engineering
protein-ligand interactions can provide important information about the
specificity of the pharmacological agent, as was discussed earlier for rapamycin.
While the genome is vast, many of its features (e.g., domains and
cofactors) recur in several different signaling contexts. This biochemical
similarity presents a specificity problem on one hand but an engineering
opportunity on the other; introducing specificity handles using carefully
designed mutations can help provide insight into critical connections between
biochemical specificity and biological function.
References
1. R.M. Klabe, L.T. Bacheler, P.J. Ala, S. Erickson-Viitanen, J.L. Meek, Resistance to HIV protease inhibitors: a comparison of enzyme inhibition and antiviral potency, Biochemistry 1998, 37(24), 8735-42.
2. N.M. King, M. Prabu-Jeyabalan, E.A. Nalivaika, C.A. Schiffer, Combating susceptibility to drug resistance: lessons from hiv-1 protease, Chem. Biol. 2004, 11(10), 1333-8.
3. M. Prabu-Jeyabalan, E.A. Nalivaika, N.M. King, C.A. Schiffer, Structural basis for coevolution of a human immunodeficiency virus type 1 nucleocapsid-p1 cleavage site with a v82a drug-resistant mutation in viral protease, J. Virol. 2004, 78(22), 12446-54.
4. S.W. Kaldor, V.J. Kalish, J.F.N. Davies, B.V. Shetty, J.E. Fritz, K. Appelt, J.A. Burgess, K.M. Campanale, N.Y. Chirgadze, D.K. Clawson, B.A. Dressman, S.D. Hatch, D.A. Khalil, M.B. Kosa, P.P. Lubbehusen, M.A. Muesing, A.K. Patick, S.H. Reich, K.S. Su, J.H. Tatlock, Viracept (nelfinavir mesylate, ag1343): a potent, orally bioavailable inhibitor of hiv-1 protease, J. Med. Chem. 1997, 40(24), 3979-85.
5. D.S. Dauber, R. Ziermann, N. Parkin, D.J. Maly, S. Mahrus, J.L. Harris, J.A. Ellman, C. Petropoulos, C.S. Craik, Altered substrate specificity of drug-resistant human immunodeficiency virus type 1 protease, J. Virol. 2002, 76(3), 1359-68.
6. J.L. Crespo, M.N. Hall, Elucidating tor signaling and rapamycin action: lessons from saccharomyces cerevisiae, Microbiol. Mol. Biol. Rev. 2002, 66(4), 579-91.
7. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251(4991), 283-7.
8. J. Heitman, N.R. Movva, M.N. Hall, Targets for cell cycle arrest by the immunosuppressant rapamycin in yeast, Science 1991, 253(5022), 905-9.
9. P.C. Nowell, D. Hungerford, A minute chromosome in chronic granulocytic leukemia, Science 1960, 132, 1497.
10. J.D. Rowley, Letter: a new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and giemsa staining, Nature 1973, 243(5405), 290-3.
11. M.E. Gorre, M. Mohammed, K. Ellwood, N. Hsu, R. Paquette, P.N. Rao, C.L. Sawyers, Clinical resistance to sti-571 cancer therapy caused by bcr-abl gene mutation or amplification, Science 2001, 293(5531), 876-80.
12. B. Nagar, W.G. Bornmann, P. Pellicena, T. Schindler, D.R. Veach, W.T. Miller, B. Clarkson, J. Kuriyan, Crystal structures of the kinase domain of c-abl in complex with the small molecule inhibitors pd173955 and imatinib (sti-571), Cancer Res. 2002, 62(15), 4236-43.
13. S.M. Wilhelm, C. Carter, L. Tang, D. Wilkie, A. McNabola, H. Rong, C. Chen, X. Zhang, P. Vincent, M. McHugh, Y. Cao, J. Shujath, S. Gawlak, D. Eveleigh, B. Rowley, L. Liu, L. Adnane, M. Lynch, D. Auclair, I. Taylor, R. Gedrich, A. Voznesensky, B. Riedl, L.E. Post, G. Bollag, P.A. Trail, Bay 43-9006 exhibits broad spectrum oral antitumor activity and targets the raf/mek/erk pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis, Cancer Res. 2004, 64(19), 7099-109.
14. J. Cools, D.J. DeAngelo, J. Gotlib, E.H. Stover, R.D. Legare, J. Cortes, J. Kutok, J. Clark, I. Galinsky, J.D. Griffin, N.C. Cross, A. Tefferi, J. Malone, R. Alam, S.L. Schrier, J. Schmid, M. Rose, P. Vandenberghe, G. Verhoef, M. Boogaerts, I. Wlodarska, H. Kantarjian, P. Marynen, S.E. Coutre, R. Stone, D.G. Gilliland, A tyrosine kinase created by fusion of the pdgfra and fip1l1 genes as a therapeutic target of imatinib in idiopathic hypereosinophilic syndrome, N. Engl. J. Med. 2003, 348(13), 1201-14.
15. E. Tamborini, L. Bonadiman, A. Greco, V. Albertini, T. Negri, A. Gronchi, R. Bertulli, M. Colecchia, P.G. Casali, M.A. Pierotti, S. Pilotti, A new mutation in the kit atp pocket causes acquired resistance to imatinib in a gastrointestinal stromal tumor patient, Gastroenterology 2004, 127(1), 294-9.
16. T. Schindler, F. Sicheri, A. Pico, A. Gazit, A. Levitzki, J. Kuriyan, Crystal structure of hck in complex with a src family-selective tyrosine kinase inhibitor, Mol. Cell 1999, 3(5), 639-48.
17. L.A. Witucki, X. Huang, K. Shah, Y. Liu, S. Kyin, M.J. Eck, K.M. Shokat, Mutant tyrosine kinases with unnatural nucleotide specificity retain the structure and phospho-acceptor specificity of the wild-type enzyme, Chem. Biol. 2002, 9(1), 25-33.
18. K. Shah, K.M. Shokat, A chemical genetic screen for direct v-src substrates reveals ordered assembly of a retrograde signaling pathway, Chem. Biol. 2002, 9(1), 35-47.
19. Y.W. Hwang, D.L. Miller, A mutation that alters the nucleotide specificity of elongation factor tu, a gtp regulatory protein, J. Biol. Chem. 1987, 262(27), 13081-5.
20. A. Weijland, A. Parmeggiani, Toward a model for the interaction between elongation factor tu and the ribosome, Science 1993, 259(5099), 1311-4.
21. A. Weijland, G. Parlato, A. Parmeggiani, Elongation factor tu d138n, a mutant with modified substrate specificity, as a tool to study energy consumption in protein biosynthesis, Biochemistry 1994, 33(35), 10711-7.
22. A. Bishop, O. Buzko, S. Heyeck-Dumas, I. Jung, B. Kraybill, Y. Liu, K. Shah, S. Ulrich, L. Witucki, F. Yang, C. Zhang, K.M. Shokat, Unnatural ligands for engineered proteins: new tools for chemical genetics, Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 577-606.
23. M. He, I. Bodi, G. Mikala, A. Schwartz, Motif iii s5 of l-type calcium channels is involved in the dihydropyridine binding site. a combined radioligand binding and electrophysiological study, J. Biol. Chem. 1997, 272(5), 2629-33.
24. R.E. Dolmetsch, U. Pajvani, K. Fife, J.M. Spotts, M.E. Greenberg, Signaling to the nucleus by an l-type calcium channel-calmodulin complex through the map kinase pathway, Science 2001, 294(5541), 333-9.
25. M.J. Caterina, M.A. Schumacher, M. Tominaga, T.A. Rosen, J.D. Levine, D. Julius, The capsaicin receptor: a heat-activated ion channel in the pain pathway, Nature 1997, 389(6653), 816-24.
26. J.J. Tewksbury, G.P. Nabhan, Seed dispersal. directed deterrence by capsaicin in chilies, Nature 2001, 412(6845), 403-4.
27. S.E. Jordt, D. Julius, Molecular basis for species-specific sensitivity to "hot" chili peppers, Cell 2002, 108(3), 421-30.
28. M.A. Shogren-Knaak, P.J. Alaimo, K.M. Shokat, Recent advances in chemical approaches to the study of biological systems, Annu. Rev. Cell Dev. Biol. 2001, 17, 405-33.
29. J.T. Koh, Engineering selectivity and discrimination into ligand-receptor interfaces, Chem. Biol. 2002, 9(1), 17-23.
30. A.R. Buskirk, D.R. Liu, Creating small-molecule-dependent switches to modulate biological functions, Chem. Biol. 2005, 12(2), 151-61.
31. B.N. Cook, C.R. Bertozzi, Chemical approaches to the investigation of cellular systems, Bioorg. Med. Chem. 2002, 10(4), 829-40.
32. A.C. Bishop, J.A. Ubersax, D.T. Petsch, D.P. Matheos, N.S. Gray, J. Blethrow, E. Shimizu, J.Z. Tsien, P.G. Schultz, M.D. Rose, J.L. Wood, D.O. Morgan, K.M. Shokat, A chemical switch for inhibitor-sensitive alleles of any protein kinase, Nature 2000, 407(6802), 395-401.
33. J.A. Ubersax, E.L. Woodbury, P.N. Quang, M. Paraz, J.D. Blethrow, K. Shah, K.M. Shokat, D.O. Morgan, Targets of the cyclin-dependent kinase cdk1, Nature 2003, 425(6960), 859-64.
34. B. Yu, V.Z. Slepak, M.I. Simon, Characterization of a goalpha mutant that binds xanthine nucleotides, J. Biol. Chem. 1997, 272(29), 18015-9.
35. C. Zhang, D.M. Kenski, J.L. Paulson, A. Bonshtien, G. Sessa, J.V. Cross, D.J. Templeton, K.M. Shokat, A second-site suppressor strategy for chemical genetic analysis of diverse protein kinases, Nat. Methods 2005, 2(6), 435-41.
36. L.L. Looger, M.A. Dwyer, J.J. Smith, H.W. Hellinga, Computational design of receptor and sensor proteins with novel functions, Nature 2003, 423(6936), 185-90.
37. T. Kortemme, D. Baker, Computational design of protein-protein interactions, Curr. Opin. Chem. Biol. 2004, 8(1), 91-7.
38. S. Atwell, M. Ultsch, A.M. De Vos, J.A. Wells, Structural plasticity in a remodeled protein-protein interface, Science 1997, 278(5340), 1125-8.
39. H.A. Greisman, C.O. Pabo, A general strategy for selecting high-affinity zinc finger proteins for diverse dna target sites, Science 1997, 275(5300), 657-61.
40. R.R. Beerli, B. Dreier, C.F. Barbas, Engineering polydactyl zinc-finger transcription factors, Nat. Biotechnol. 2002, 20(2), 135-41.
41. M.D. Simon, K.M. Shokat, Adaptability at a protein-dna interface: re-engineering the engrailed homeodomain to recognize an unnatural nucleotide, J. Am. Chem. Soc. 2004, 126(26), 8078-9.
Chemical Biology
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
3.2
Controlling Protein Function by Caged Compounds
Andrea Giordano, Sirus Zarbakhsh, and Carsten Schultz
3.2.1
Introduction
One of the major tasks in biological sciences is to dissect complex specimens to
learn more about structures, their functions, and the connections between the
components. These days, science is focusing predominantly on the microscopic
and molecular level and therefore the behavior of each molecule, its fate, its
mobility, and its interaction with other molecules are of interest. To achieve
this, it is necessary to generate data with high spatial and temporal resolution.
Most standard methods cannot provide the latter, because they require the
destruction of cells. Even modern techniques like ribonucleic acid interference
(RNAi) or artificial expression of proteins are crude in this respect because
large populations of molecules are affected. It would be most desirable to
interfere with a small subset of molecules in a specific area of a cell or an
organism. Even more advanced would be techniques that permit the onset
of a biochemical reaction or a translocation event at a certain time point
and under the control of the observer. Photoactivatable compounds could
serve these purposes. With a flash of light focused at a particular region of
the specimen, a biologically active compound may be generated or destroyed
within seconds. The caged compound is usually a small molecule that is
able to modulate protein function [1]. In the last decade or so, proteins or
peptides themselves have increasingly been equipped with photoactivatable
groups, generating switchable, biologically active molecules under the direct
control of the experimentalist [2, 3]. When applied to proteins, the photolytic
removal would activate or inactivate the molecule spontaneously, thus
mimicking fast intracellular changes in enzyme activity. In a few cases, the
methodology was used for other macromolecules like DNA and RNA [4-6]. This
chapter gives a brief overview of the various known caging groups suitable for
forming caged proteins, their pros and cons, and the methods of introducing
the groups chemically. Chiefly, the current knowledge of applying cages to
proteins and the questions answered by using caged proteins are described.
During the preparation of this manuscript, a splendid book describing most of
our knowledge on caged compounds and proteins was published [7].
3.2.2
Photoactivatable Groups and Their Applications
3.2.2.1 Nitrobenzyl and Nitrophenyl Groups

In 1962, Barltrop et al. reported the release of glycine from its nitrobenzyl
carbamate upon photolysis [8]. Today, the o-nitrobenzyl group and its
derivatives are the most prevalent photocleavable caging groups in use.
Formally, the reaction is a photochemically induced isomerization of
o-nitrobenzyl alcohol into o-nitrosobenzaldehyde, thereby releasing the
substituent as the free acid (Scheme 3.2-1). Esters, carbamates, and carbonates
are converted into an acetal derivative that spontaneously collapses into the
aldehyde and the released fragment. If the leaving group is a carbamate or
a carbonate, the latter undergoes spontaneous decarboxylation and yields free
amines or alcohols, respectively.
The groups are usually uncharged, of average lipophilicity, and fairly small;
all features that are desirable for cell applications. Nitrobenzyl groups as
well as other caged groups were successfully employed, especially to mask
charged groups like acids, phosphates, and amines (as carbamates) [9]. For
compounds like cAMP the corresponding nitrobenzyl ester or coumaryl esters
were rendered uncharged by the masking groups and the compounds were,
therefore, able to penetrate cell membranes [11, 12]. After photolysis, however,
the released charged compounds were again impermeable and hence trapped
inside cells. This prodrug-like approach combines two crucial features of
biochemical tools: cell permeability and photoactivation. This combination of
properties could also be of major interest in peptide-based tools in the future.
The unsubstituted 2-nitrobenzyl (NB) group (Fig. 3.2-1A) has several
shortcomings that limit its application. First, the wavelength that is required
for deprotection (260 nm) is too short for optical equipment and is known
to damage living cells [13]. Second, the NB caging group is not suitable to
examine fast reactions because there is a lag of a few milliseconds between
the photolysis and the release of the bioactive molecule [14, 15]. Third, the
photoproduct 2-nitrosobenzaldehyde may react with the released compound or
other components, leading to cell damage [16]. These three factors (photolysis
wavelength, kinetics, and product) are most relevant for all cages used in living
cells.
A more suitable photolysis by-product is released from the
1-(2-nitrophenyl)ethyl (NPE) group (Fig. 3.2-1B) [16], which is also removed by
short-wavelength UV light (265 nm). It generates the less reactive
nitrosoacetophenone and therefore exhibits less toxicity. Also, NPE's photolysis
rates are significantly higher at 260 nm than those for NB (10000 versus
850 s⁻¹). Even better are α-carboxy-2-nitrobenzyl (CNB) groups (Fig. 3.2-1C,
17000 s⁻¹) [17]. However, the NPE group is chiral, a property that is
often undesirable due to the formation of diastereomers with chiral
biomolecules. The diastereomers might have different biological and
photochemical properties and separation is usually difficult on a preparative
scale.
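To put the rate constants quoted above in perspective, the following minimal Python sketch converts them into half-lives. It assumes that release follows simple first-order kinetics, which is an assumption made here for illustration and is not stated in the text; the names `RELEASE_RATES` and `half_life_ms` are likewise illustrative.

```python
import math

# Rate constants (s^-1) quoted in the text for photolysis at 260 nm.
RELEASE_RATES = {"NB": 850.0, "NPE": 10_000.0, "CNB": 17_000.0}


def half_life_ms(k_per_s: float) -> float:
    """Half-life in milliseconds of a first-order process with rate constant k.

    Assumes single-exponential release: t1/2 = ln(2) / k.
    """
    return math.log(2) / k_per_s * 1000.0


if __name__ == "__main__":
    for cage, k in RELEASE_RATES.items():
        print(f"{cage}: k = {k:.0f} s^-1, t1/2 = {half_life_ms(k):.3f} ms")
```

Under this assumption the NB half-life falls just under one millisecond, so near-complete release takes a few milliseconds, whereas NPE and CNB release more than ten times faster.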
NPE-caged ATP was used to probe the kinetics of muscle contraction, but
its release rate was modest and, more importantly, the caged compound was
not completely inactive [18, 19]. Sometimes, the increased lipophilicity of the
cage is undesirable. To prevent the interaction of NPE-caged carbamoylcholine
with the nicotinic acetylcholine receptor before photolysis, a negatively charged
carboxylate group was attached to the cage (CNB, Fig. 3.2-1C), eliminating
the problem [17]. In addition, this CNB group showed faster release kinetics
than the NB group [17]. CNB has also been successfully used to cage glycine
derivatives [20]. However, additional charges are not always beneficial.
cAMP-dependent protein kinase A (PKA) was made to react with CNB bromide to
yield a caged version of the enzyme [21]. The caging group was introduced at
Cys199 and inactivated PKA. Unfortunately, the caged protein was unable to
undergo significant photoactivation. In contrast, simple o-nitrobenzyl
bromide-modified PKA not only exhibited a substantial loss in kinase activity
but also showed a 20-30 fold reactivation of the catalytic activity upon
exposure to UV light (for more detailed information on caged PKA, see below).
[Figure: structures labeled A (NB), B (NPE), C (CNB), D (Npg), E (DMNB), followed by DMNPE, DNP, NPT, and DMNPT]
Fig. 3.2-1 Structures of nitrobenzyl groups used for light-induced
deprotection. X represents a leaving group, either in the reagent
used to introduce the cage or for the photochemical release.
A particular form of CNB is (2-nitrophenyl)glycine (Npg). This artificial
amino acid (Npg, Fig. 3.2-1D) was successfully incorporated into ion channels
like the nicotinic acetylcholine receptor [22] by nonsense suppression, a
technique developed by Peter Schultz and coworkers [23-26]. Irradiation (4 h,
>360 nm) of proteins containing Npg led to peptide backbone cleavage in
Xenopus oocytes [22].
Like the nitrobenzyl group, NPE and CNB groups absorb only weakly at
wavelengths greater than 340 nm, thus limiting applications in the suitable
range of 350-400 nm. Wavelengths under 300 nm are inconvenient because
of considerable absorption by proteins and nucleic acids as well as by any kind
of glass, including microscope lenses.
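The trade-off between wavelength and photon energy discussed above can be made concrete with the relation E = hc/λ. The sketch below is only an illustration: the function name and the chosen wavelengths are assumptions of this example, and the constants are the exact SI values.

```python
# Physical constants (exact SI defining values).
PLANCK_J_S = 6.62607015e-34          # Planck constant, J s
LIGHT_SPEED_M_S = 299_792_458.0      # speed of light, m/s
AVOGADRO_PER_MOL = 6.02214076e23     # Avogadro constant, 1/mol


def photon_energy_kj_per_mol(wavelength_nm: float) -> float:
    """Energy of one mole of photons (kJ/mol) at the given wavelength in nm."""
    wavelength_m = wavelength_nm * 1e-9
    joules_per_photon = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return joules_per_photon * AVOGADRO_PER_MOL / 1000.0


if __name__ == "__main__":
    # 260 nm: unsubstituted NB; 300 nm: lower practical limit named in the text;
    # 350-400 nm: the range described as suitable.
    for nm in (260, 300, 350, 400):
        print(f"{nm} nm: {photon_energy_kj_per_mol(nm):.0f} kJ/mol")
```

Light at 350-400 nm carries roughly a quarter to a third less energy per photon than light at 260 nm, which is one way to read the preference for longer photolysis wavelengths in living samples.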
This limitation was overcome when electron-donating groups were added to the
aromatic moiety. The 4,5-dimethoxy-2-nitrobenzyl (DMNB) (Fig. 3.2-1E) cage
(2-nitroveratryl) was introduced in 1970 by Patchornik and Woodward as
a “nitrogen” protecting group [27]. The substituents on the aromatic ring
were located to give a major absorption band at 350 nm. This relatively
long wavelength is attractive, because absorbance of radiation by proteins
and nucleic acids is significantly reduced. To this day, DMNB remains one
of the few photolabile protecting groups that work at lower photon energies (up
to 420 nm). Marriott employed DMNB chloroformate (Fig. 3.2-2A) to cage
G-actin at Lys61 [28]. This modification blocked the polymerization of G-actin
to F-actin. Additionally, he prepared a cysteine-caged myosin using the DMNB
bromide [29]. The DMNB chloroformate and bromide (Fig. 3.2-2A and B)
are both commercially available and are the most commonly used reagents
to introduce the DMNB group. Nitrophenyl-substituted Michael acceptor
systems (Fig. 3.2-2C) have also been employed to cage proteins, for instance
β-galactosidase, probably by reaction with a cysteine residue [30].
Katritzky et al. examined the effect of the electronic nature of nitrobenzyl
groups and two different types of linkage groups, ether and carbonate,
upon photolysis [31]. The 4-monomethoxy substituted nitrobenzyl group
(Fig. 3.2-1F) had a more electron-rich benzylic carbon atom than that of
the 4,5-dimethoxy substituted nitrobenzyl compounds, because, according
to the authors, the methoxy substituent in the meta position was electron
withdrawing with respect to the benzyl carbon atom. On the basis of
quantitative structure-activity relationship calculations it was expected that
monomethoxy-substituted nitrobenzyl molecules would decompose faster
than their dimethoxy analogs under photolysis conditions [31]. Dimethoxy
substitution of caged nitrobenzyl phenylephrine increased the maximum
absorption wavelength and also increased the rate of photolysis relative to
the unsubstituted nitrobenzyl phenylephrine analog, showing that
electron-donating benzyl substituents promoted photolytic cleavage of
2-nitrobenzyl phenolic ethers. Furthermore, it was shown that molecules with
ether linkages decompose faster than molecules with a carbonate linkage. The
faster kinetics of release of DMNB compared to the corresponding NPE-caged
versions were demonstrated for caged cyclic nucleotides [32].
The 1-(4,5-dimethoxy-2-nitrophenyl)ethyl (DMNPE) (Fig. 3.2-1G) group,
which combines the modifications of both the DMNB and NPE groups, failed
to show fast release kinetics with ATP or amino acids [32, 33]. As mentioned
above, another major problem is the formation of diastereomers due to the
stereocenter at the benzylic carbon atom. As expected, the DMNPE group is
removed with UV light >350 nm, which is less harmful to cells. Furthermore,
the photo-by-product is again a nitrosoacetophenone that is less reactive than
the corresponding aldehyde released by photolysis of commonly implemented
o-nitrobenzyl caging groups. Therefore, depending on the application, the use
of the DMNPE group might be beneficial, especially when the formation of
diastereomers is not causing problems. The isomeric 2-ethyl form [34] as
well as the related 2-propyl variety [35] were also examined as cage groups.
The photorelease happened via β-elimination. Because of favorable quantum
yields, these groups may be some of the most promising caging groups in
future applications.
Fig. 3.2-2 Structures of commonly used DMNB reagents.
Some of the isomeric nitroaromatic groups were tried as photocages for
phosphates in the 1960s. The 3,5-dinitrophenyl (DNP) (Fig. 3.2-1H) caged
inorganic phosphate was converted by irradiation at 300-360 nm (ε
about 3000 M⁻¹ cm⁻¹) with a reasonable quantum efficiency (0.67) and
released phosphate at >10⁴ s⁻¹ at pH 7. However, the only successful
example that employs the DNP group was the photorelease of phosphate
in crystals of glycogen phosphorylase b, which permitted its catalytic cycle
to be monitored by Laue X-ray diffraction [36]. DNP-caged ATP was at
least 100-fold less photosensitive than DNP phosphate, clearly a setback for
applications involving compounds with a chromophore. Recently,
N-methyl-N-(2-nitrophenyl)carbamoyl chloride (MNPCC) was introduced to
specifically mask the catalytic serine in butyrylcholinesterase (BChE).
Reactivation was achieved by irradiation at 365 nm [37].
Very recent additions to the nitrobenzyl-based photocleavable protecting
groups are the 1-(2-nitrophenyl)-2,2,2-trifluoroethyl (NPT) (Fig. 3.2-1K) and
the 1-(4,5-dimethoxy-2-nitrophenyl)-2,2,2-trifluoroethyl (DMNPT) (Fig. 3.2-1L)
groups [38]. However, these groups are not stable under the harsh reaction
conditions of the Williamson synthesis. Therefore, it was required to attach the
NPT and DMNPT groups to various alcohols via Mitsunobu coupling. Primary
alcohols reacted with good yields while secondary alcohols gave only poor
coupling. An advantage of the NPT and DMNPT groups is the high quantum yields
(0.4-0.7). Unfortunately, besides the slow fragmentation kinetics observed for
decaging alcohols [38], this caging group exhibited very poor hydrolytic
stability for carboxylic esters (M. Goeldner, personal communication).
An interesting nitrobenzyl-based photocage is the 2,2'-dinitrobenzhydryl
(DNB) group [27]. Here, the benzylic methylene group is substituted with
another o-nitrophenyl group. This group, which was used to cage amino acids,
does not lead to diastereomers due to its symmetry. The related
bis(2-nitro-4,5-dimethoxyphenyl)methyl group was used to cage ion chelators
[39, 40].
A novel cage variety is the 2-(dimethylamino)-5-nitrophenyl (DMNP) group.
With its major absorption band at 400 nm, fast release kinetics, and a decent
extinction coefficient (9000 M⁻¹ cm⁻¹) this group appears to be promising for
in vivo applications [41].
Is it possible to use several of these photoactivatable groups in one molecule
for orthogonal deprotection by wavelength-selective cleavage? First attempts
with various nitrobenzyl group derivatives were only partially successful, mainly
because of energy transfer between the chromophores [42, 43].
3.2.2.2 Other Caging Groups

A significant number of photoremovable protecting groups that are not
derivatives of nitrobenzyl group cages have been devised by organic chemists
for applications in peptide and nucleotide syntheses. These groups and their
respective uses have been extensively reviewed before [9, 44]. We will therefore
describe in detail only those groups that were useful to cage peptides and
proteins. However, several caging groups used to date for small molecules
or as photoremovable protecting groups for synthetic purposes may be very
useful for applications with proteins in the future. Unfortunately, many of
these require photolysis with short wavelength ultraviolet light (<300 nm)
and would be impractical for biological systems. Some, however, are cleaved
at higher wavelengths and do not cause the photodestruction of amino acids
such as tryptophan and tyrosine. These are, in particular, phenacyl esters. They
were used to mask phosphates [45, 46] and peptides [47] and generated mostly
phenylacetic acid derivatives after photocleavage due to an intramolecular
rearrangement reaction [48, 49]. Sheehan introduced substituted benzoin
esters as a protecting group for the carboxyl group over 30 years ago [50].
Later, this moiety was reinvestigated as a replacement for the NPE group
to protect phosphates. Promising results were achieved with
α-benzoyl-3,5-dimethoxybenzyl phosphate due to a high quantum yield (0.78 at
347 nm/0.64 at 366 nm) and fast photolysis rates (>10’ s⁻¹) [51, 52]. A
water-soluble diacetic acid derivative was also introduced [53]. A very elegant
application of a benzoin group is the formation of a peptidic loop by
cyclization via a bifunctional chromophore that keeps the peptide in a
partially unfolded state. Photolysis of the benzoin broke the cyclic structure,
thereby permitting the peptide to fold, which was followed by CD
spectroscopy [54].
Other groups like the sisyl (tris(trimethylsilyl)silyl) group are probably too
lipophilic to be used in an aqueous environment and might interfere with
protein conformation or solubility [55]. A similar problem was anticipated for
coumarin-based cages. While coumarins were successfully used for caging
γ-aminobutyric acid (GABA) derivatives [56] and for two-photon photolysis
of glutamate in brain slices [57], the (7-methoxycoumarin-4-yl)methyl esters
of cAMP and cGMP were poorly soluble [58, 59]. More recently, however, the
substituted coumarinylmethyl esters (7-diethylaminocoumarin-4-yl)methyl ester
(DEACM), (7-carboxymethoxycoumarin-4-yl)methyl ester (CMCM), and
[6,7-bis(carboxymethoxy)coumarin-4-yl]methyl ester (BCMCM) were developed to
cage cyclic nucleotide monophosphates. The CMCM and BCMCM groups
increased the hydrophilicity and solved the solubility problem [59]. The DEACM
protecting group, on the other hand, exhibited remarkable photochemical
properties [60]. The caged cyclic nucleotides could be efficiently released at
nondamaging wavelengths (405 nm). All caged compounds were released
very quickly, with very high rates of photocleavage. 7-Hydroxycoumarinyl
methyl esters of cAMP were also sufficiently soluble to allow for biological
applications [61]. Hence, coumarin-based groups have a high potential for
successful applications in proteins. Other groups worth investigating are
arylazides [62], nitroindolines [63, 64], as well as N-acyl-2-thionothiazolidines
[65] and 5-azido-1,3,4-oxadiazoles [66]. Most of these groups suffer from
laborious preparation procedures or have just not been investigated for
applications with large molecules. Exceptions are cinnamate-based caging groups.
3.2.2.3 Vinylogenic Photocleavable Groups

The cinnamate cage was used in one of the earliest examples of a caged enzyme.
In contrast to other caging groups, the cinnamate cage relied on E → Z
photoisomerization (Scheme 3.2-2). Porter and coworkers showed that a number
of serine proteinases could be inactivated with
p-amidinophenyl-o-hydroxy-methylcinnamate, which forms a stable acyl enzyme
intermediate upon release of the p-amidinophenol leaving group [67, 68]. After
photoisomerization to the Z derivative, the aromatic hydroxy group was
sufficiently close to the ester of the acylated enzyme to permit
reesterification (Scheme 3.2-2). This sterically favorable arrangement allowed
the regeneration of the free serine hydroxy group and gave the decaged protein.
Limitations reside in the extensive overlap between enzyme and inhibitor
absorbance spectra. The intensity of the light source had to be substantial.
At the same time long irradiations degraded the enzyme.
Other photocleavable protecting groups that take advantage of E → Z
photoisomerization are the vinylsilanes (Fig. 3.2-3) [69, 70]. Unfortunately,
these compounds require harsh, short wavelength light (254 nm) for
photoconversion. The introduction of a methylenedioxy group (Fig. 3.2-3B)
failed to shift the absorption to higher wavelengths, but the naphthalene
derivative (Fig. 3.2-3C) was effectively photolyzed at 350 nm in methanol.
Scheme 3.2-2 Decaging of a proteinase via an intramolecular reesterification.
Fig. 3.2-3 Vinylsilanes as photocleavable protecting groups require E → Z
photoisomerization.

3.2.2.4 Attaching Photoactivatable Groups

The introduction of a cage usually requires a nucleophilic group at the molecule
of interest. The relevant groups in proteins and peptides are amino, thiol, or
alcohol groups. Amino groups are readily reacted with chloroformate
derivatives (Scheme 3.2-3). In fact, the most commonly used nitrobenzyl
derivative (DMNB-OCOCl) is commercially available. Other reagents are prepared
by reaction of the alcohol with phosgene or alternatively with
carbonyldiimidazole (CDI) [42]. Caging reactions proceed under mild conditions
in aqueous solution at slightly basic pH (9-10) [28]. An alternative is
p-nitrophenyl carbonate esters. The leaving group permitted the formation of a
carbamate directly from the hydrochloric acid salt of glutamate in the presence
of 4-(dimethylamino)pyridine (DMAP) at room temperature (Scheme 3.2-3) [57].
Thiol groups are preferentially reacted with arylmethyl halides, for
instance, bromo nitrobenzyl derivatives (Scheme 3.2-4). The conditions are
extremely mild (Tris buffer pH 7.2) and reactions were reported to be finished
within an hour [71]. When the reactive caging group is equipped with a suitable
amino acid docking sequence, a specific cysteine can be labeled, even with a
300-fold excess of the reagent [21]. Another photoactivatable caging reagent
that covalently binds to thiols in proteins is the α-haloacetophenone group.
Its aromatic character is recognized particularly well by phosphotyrosine
phosphatases (PTP) [72, 73]. Accordingly, haloacetophenone groups are potent
photoreleasable inhibitors of PTPs in vitro. No details about the labeling
procedure have been published so far.
It is of special interest to label serine and threonine residues, due to their role
as acceptors for posttranslational modifications, namely, for phosphorylation.
To achieve the necessary alkylation, much harsher conditions are required.
Unfortunately, the strongly basic conditions of the Williamson ether synthesis
are unsuitable for halogenated o-nitrobenzyl reagents [74]. A more suitable
leaving group than the halogen is the well-known trichloroacetimidate group
(Scheme 3.2-5). However, the successful reaction requires strongly acidic
conditions (CF3SO3H) and is used for protected amino acids rather than entire
peptides [75].
Scheme 3.2-3 Introduction of caging groups to amino residues.
Scheme 3.2-4 Introduction of caging groups to thiol residues. X is OH or halogen.
A milder method that is suitable for caging hydroxyl groups in proteins is the
reesterification with p-amidino esters of arylcinnamates. With the help of the
leaving group, deactivation of thrombin was achieved within 8 h at pH 7.4 [68].
The phosphorylated varieties are as important for functional studies of
peptides and proteins as the hydroxyl groups. Since the nucleophilicity of a
phosphate is only moderate, thiophosphates are frequently used as targets
for caging reactions. The same conditions that work for labeling cysteines
are applied for thiophosphates [71]. Alternatively, 4-hydroxyphenacyl bromide
(HP-Br) is employed to label a thiophosphothreonine in protein kinase A
under very mild conditions (1 mM reagent, pH 7.3) [76]. For peptides, caged
phosphates can be conveniently introduced during solid phase synthesis via
phosphorus(III) reagents [77-80].
The above-mentioned coumarin cages were introduced to cAMP or cGMP
via the corresponding diazoalkanes [60]. The introduction of cages via diazo
compounds has great versatility and was used for numerous applications,
in particular, for caging small biologically active phosphate esters like ATP
and myo-inositol 1,4,5-trisphosphate (InsP3) [10, 81]. Usually, the carbonyl
derivative of the caging group was reacted with hydrazine, followed by oxidation
to the diazo compound in the presence of MnO2 (Scheme 3.2-6(a)) [10, 81, 82].
After the removal of MnO2 by filtration and several washes, the diazo reagent
was used mostly without further purification. In an alternative method, a tosyl
hydrazone was formed. Treatment with base then gave the diazo compound
(Scheme 3.2-6(b)) [60, 61].
Scheme 3.2-5 A method that does not require base to form ethers of hydroxy amino acids.
Scheme 3.2-6 Two commonly used synthetic routes to diazo compounds. Ts = tosyl.
3.2.3 Caged Peptides and Proteins
The synthesis of caged peptides is accompanied by a series of obstacles.
That is the reason for the formerly small number of caged peptides available
compared to other low-molecular-weight caged species. Proteins contain a
variety of nucleophilic sites and therefore the major problem is the site-specific
modification of a protein with an exogenous caging agent. Furthermore,
the absence of an appropriate nucleophilic residue at or near the desired
site of modification can be a problem. Finally, unlike low-molecular-weight
compounds, proteins and most peptides are generally not
membrane-permeant. The most obvious way to prepare a caged protein seems to be
the addition of a photoactivatable group to a residue that is essential for
protein function. The problem is that the chemistry required needs to deal with
entire proteins and that the residue of interest is not usually unique within
the protein. Nevertheless, several approaches addressed the direct introduction
of cage groups on proteins, either on several residues simultaneously or
specifically on a single amino acid side chain.
3.2.3.1 Multiresidue Protein Caging

Preparation of caged proteins by introduction of an o-nitrobenzyl group
directed toward specific residues dates back to the mid-1990s. In a pilot
study, bovine serum albumin (BSA) was randomly labeled with up to 15
o-nitrobenzyl groups at Lys residues using either 2-nitrobenzyl alcohol or
1-(2-nitrophenyl)ethanol in the presence of diphosgene or 1,1'-CDI, which
yielded up to 90% of caged protein [83]. Notably, the secondary alcohol
coupled with diphosgene, but not with 1,1'-CDI. Exposure of NB-labeled
BSA to UV light led to the release of about 60% of the coupled cages. The
incomplete photolysis was probably due to the propensity of the photoproduct
nitrosobenzaldehyde to either back-react with the protein or to dimerize to
azobenzene-2,2'-dicarboxylic acid, which was suggested to act as an internal
filter lowering the efficiency of photolysis [84]. NPE-labeled BSA, on the other
hand, readily furnished up to 95% of the native protein after UV treatment
(365 nm) with a time-dependent release of about 1/3 of coupled residues
after 1-2 min and about 2/3 of that after 5 min of exposure. Performing
the same caging strategy and using antibodies as models for both receptor
and ligands, these authors successfully modulated affinity of antibody-binding
sites for antigen, antigen binding sites for antibodies, and antibody Fc binding
sites for protein A using an NPE-coated human IgG before and after UV
treatment [85].
With the aim of studying the regulation of the G-actin monomer
pool and the assembly of F-actin filaments in living cells, Marriott
described both preparation and properties of G-actin conjugates [28, 86].
Using the lysine-directed 4,5-dimethoxy-2-nitrobenzyl chloroformate
(DMNB-OCOCl) and an optimized water-based chemistry protocol that avoided
overlabeling of the target protein (thus circumventing problems of
denaturation, insolubility, and low yields of photoactivation), caged monomeric
G-actin was prepared in 30-60% yield, with an average of four DMNB
groups per monomer. Such Lys61-caged G-actin was unable to
polymerize to F-actin in vitro, confirming that residue Lys61 forms part
of an actin-actin interface in F-actin. Upon photo-deprotection with UV light
(320-400 nm) for 12 min, polymerized F-actin was obtained in 60-95% yield.
More recently, Lys-targeted protein caging with DMNB-OCOCl was
performed on the G-actin binding protein thymosin β4 (Tβ4) [87]. Tβ4 is
thought to be involved in the regulation of the large intracellular G-actin
pool. Native Tβ4 is known to inhibit actin polymerization in vitro by
binding to G-actin via a conserved nine-residue segment (LKKTETQEK,
residues 17-25) [88]. In the cited study, DMNB-labeled Tβ4 was shown
to be unable to bind to G-actin in vitro, as indicated by the unaffected rate
of polymerization compared to control actin. Subsequently, DMNB-labeled
Tβ4 was introduced by bead loading in locomoting fish epithelial keratocytes
and was photoactivated locally in the cell wings [87]. Upon UV irradiation
(365 nm), very specific changes in the global locomotory pattern of keratocytes
were observed in vivo, with noticeable turning of cells. These observations may
be explained by local perturbation in actin filament dynamics brought by the
spontaneous increase of active, decaged Tβ4 concentration in the region of
irradiation.
3.2.3.2 Single Residue Protein Caging

A second labeling strategy aimed at the preparation of caged protein conjugates
is based on the targeted modification of essential cysteine residues using
photolabile alkyl halides [86], such as 2-bromo-2-(2-nitrophenyl)acetic acid
(CNB-Br), NB-Br, or DMNB-Br. Proteins to be caged at Cys residues can be
engineered from other proteins by cysteine-scanning mutagenesis: the useful
mutant will be the one that is inactive only after labeling with a thiol-reactive
caging reagent. Because only a single cage group is removed from a
cysteine-targeted caged protein, the photoactivation yield is usually higher
compared to DMNB-caged proteins [89]. The main disadvantage of this approach may be
the necessity of generating and screening a large collection of mutants.
The synthesis and utilization of the water-soluble CNB-Br as a Cys-targeted
caging reagent was reported by Bayley and coworkers [90]. Staphylococcal
α-hemolysin (αHL) is a toxic polypeptide lacking cysteine residues. The
protein self-assembles to form a heptameric pore in cell membranes. A single
cysteine mutant R104C maintained this feature, while pore-forming activity
toward rabbit erythrocytes was lost upon derivatization of Cys104 with
CNB-Br (100 + 10 mM dithiothreitol in aqueous buffer at pH 8.5, yield ca. 80%).
Toxicity of the R104C mutant was regenerated by photoactivation with UV light
(300 nm, 30 min, yield ca. 60%) and subsequent exposure to rabbit erythrocytes
(Fig. 3.2-4).
Fig. 3.2-4 Hemolytic activity of decaged R104C α-hemolysin
(black circles) toward rabbit red blood cells (rRBC) measured by
monitoring light scattering at 595 nm versus a nonilluminated
sample (white circles), plotted against time (min). With permission from Ref. [90].
Marriott and Heidecker reported a Cys-caged heavy meromyosin (HMM)
using DMNB-Br and evaluated the capacity of photoactivated HMM to couple
the energy of calcium/actin-activated ATP hydrolysis to the movement of
F-actin filaments in an in vitro motility assay [29, 86]. It was known from
labeling studies with the thiol-reactive fluorophore tetramethylrhodamine
iodoacetate (IA-TMR) directed against Cys707 that this residue was crucial
for sliding of F-actin filaments in the in vitro motility assay. Therefore, it
was reasoned that Cys707-caged HMM could show a similar behavior, which
eventually could be reversed upon photoactivation. HMM was reacted with
DMNB-Br in aqueous buffer at pH 7.4. Two cage groups per HMM molecule
(or one cage per ATPase domain of HMM) were incorporated in the reported
protocol. Although the calcium/ATPase activity of purified caged HMM was
increased fivefold compared to unlabeled HMM, caged HMM failed to produce
appreciable sliding of F-actin filaments, unless irradiated with pulsed (500 ms)
340-400 nm UV light, conditions that produced sliding of 90% of F-actin
filaments in the in vitro motility assay with a velocity of up to 4 µm s⁻¹, a value
comparable to unmodified HMM [%I.
Protein kinases constitute a large family of enzymes (>500) whose activity
includes the transfer of the γ-phosphoryl group of ATP to serine, threonine,
and tyrosine residues in a wide range of protein substrates, giving rise to
a large collection of phosphorylation-based signal transduction pathways. A
well-defined spatially and temporally activatable kinase is of invaluable utility
in elucidating many aspects of signal transduction phenomena in living cells,
under both physiological and pathological conditions.
One of the best-studied kinases is protein kinase A. An interesting
comparison of the behavior of three different caged catalytic subunits of
PKA was reported by Bayley and colleagues [91]. Working with a single
cysteine mutant (C343S) of the murine catalytic subunit of PKA, the unique
Cys residue 199 was masked with the thiol-reactive cage groups NB-Br,
CNB-Br, and DMNB-Br. Cys199 is placed in close proximity to the critical Thr197
in the “activation loop” of the enzyme [92]. The caged protein showed, as
expected, a significant inactivation when kinase activity was tested in vitro with
the artificial substrate Kemptide (LRRASLG). Interestingly, only the NB-caged
enzyme showed, among the three, low values of residual activity after caging
(3-5%) and satisfactory activity after photolysis (pH 6.0, 80-100%) with respect
to the unmodified enzyme. Moreover, the quantum yield of photolysis was an
impressive 0.84. The “lesson” from this work, using the authors' phrasing,
is that given a particular target protein a variety of photoremovable protecting
groups have to be tested since a reagent that works well with one protein (for
instance, the CNB-caged αHL described earlier) may not work well with others.
Cofilin is a kinase-regulated, F-actin binding protein whose activation state
is regulated by phosphorylation at Ser3 through the LIM-domain-containing
kinase (LIM kinase). Unphosphorylated cofilin monomers bind cooperatively
to F-actin in vitro leading to depolymerization of actin filaments [93], while
phosphorylation by LIM kinase inactivates these features of the cofilin function
(Fig. 3.2-5). Lawrence and coworkers [94] observed that the cysteine mutant S3C
cofilin is constitutively active because it is unable to undergo phosphorylation
by LIM kinase, while a CNB-caged S3C cofilin is unable to depolymerize
actin filaments in vitro. This shows the importance of Ser3 for cofilin activity.
Accordingly, S3C cofilin activity was restored up to 80% upon irradiation and
depolymerization of rhodamine-labeled actin filaments was assessed via an in
vitro light microscopy assay. Subsequently, these investigators could elegantly
extend the role of cofilin in vivo by microinjecting caged CNB-S3C cofilin (up
to 20 µM) into MTLn3 carcinoma cells and by exposing cell territories to UV
irradiation [95].
Fig. 3.2-5 Activity of cofilin initiated by local decaging. A 2-s laser pulse aimed at the area
indicated in F gave local protrusion within 1 to 3 min. With permission from Ref. [95].
Cell-wide photoactivation increased free barbed ends, F-actin content, and
cellular locomotion, while highly localized activation generated lamellipodia
and determined direction of cell locomotion. Showing all the intrinsic power
of caged proteins in biological investigations in vivo, this study expanded the
effective role of cofilin in contrast to motility models in vitro, where cofilin was
predicted to only depolymerize F-actin.
Protein phosphorylation on tyrosine residues is an important
posttranslational modification playing a vital part both in physiological
processes, such as transmembrane signaling, and in pathological processes, for
instance, in cancer and immune dysfunctions [96]. The levels of tyrosine
phosphorylation are regulated by the opposing action of protein Tyr kinases
(PTKs), which catalyze the formation of phosphotyrosine residues (pY) on target
proteins, and phosphotyrosine phosphatases (PTPs), which hydrolyze pY. PTPs of
various origins share a common domain of about 250 residues containing the unique
“signature motif” (I/V)-HCxAGxxR(S/T) in which the catalytic phosphatase
cysteine is located [97]. Being generally less well characterized than protein
kinases, the precise role of PTPs in physiological and pathological conditions
still remains to be investigated in more detail.
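The signature motif quoted above translates directly into a pattern search that locates the candidate catalytic cysteine in a sequence. The following Python sketch is an illustration only: it assumes that "x" stands for any of the 20 standard amino acids, and both the function name `find_ptp_signature` and the short test fragment are hypothetical and not taken from the cited work.

```python
import re

# Signature motif (I/V)-HCxAGxxR(S/T); "x" is interpreted here as any of the
# 20 standard amino acids (an assumption of this sketch).
AA = "ACDEFGHIKLMNPQRSTVWY"
PTP_SIGNATURE = re.compile(rf"[IV]HC[{AA}]AG[{AA}]{{2}}R[ST]")


def find_ptp_signature(sequence: str) -> list[tuple[str, int]]:
    """Return (matched motif, 1-based position of the motif cysteine) pairs."""
    hits = []
    for match in PTP_SIGNATURE.finditer(sequence.upper()):
        # The cysteine is the third residue of the motif; match.start() is
        # 0-based, so the 1-based cysteine position is start + 3.
        hits.append((match.group(), match.start() + 3))
    return hits


if __name__ == "__main__":
    # Hypothetical fragment for illustration; positions refer to the fragment.
    print(find_ptp_signature("NRVHCSAGIGRSGTF"))
```

A cysteine found this way is the residue that the thiol-directed caging reagents discussed in this section would be expected to target.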
Recently, α-halogenated acetophenones (phenacyl groups) have been
reported as a novel, membrane-permeant, non-o-nitrobenzyl-based class of
caging reagents. They are capable of covalent, photoreversible (350 nm)
inhibition of PTPs at the catalytic cysteine (Scheme 3.2-7) [72, 73]. The different
α-bromo and α-chloro acetophenone derivatives were employed in vitro to cage
the catalytic cysteine of various prototypical phosphatases such as PTP1B,
SHP-1, and the catalytic domain of SHP-1, SHP-1(ΔSH2). Recovery of enzyme
activity after irradiation at 350 nm (15 min) was in some cases obtained to a
maximum of 80% of the original value.
In recent years, reports have demonstrated the possibility of producing
caged proteins by targeting specific amino acid residues other than
lysine or cysteine.
After having described a catalytic Cα subunit of PKA caged at Cys199,
Bayley, Zou, and others presented a Cα caged at the active
threonine (Thr197) using the above-mentioned 4-hydroxyphenacyl
photoremovable protecting group [76]. The advantages of such a caging group
over the classical o-nitrobenzyl derivatives were the rapid
photodeprotection (k ≈ 10⁷-10⁸ s⁻¹) and the lack of reactivity of the photolysis
product 4-hydroxyphenylacetic acid [47, 98]. The phenacyl methodology was
also employed to prepare caged thiophosphoryl peptides (see also below)
[76, 99]: the Cα catalytic subunit was first expressed as a recombinant mutant
protein (H6-T197C199A/C343S) in Escherichia coli. Exclusive
thiophosphorylation of Thr197 was performed with the phosphoinositide-dependent
kinase (PDK-1) in the presence of ATP(γ)S. Thiophosphorylation was confirmed
by Western blotting and gel-shift electrophoresis. Finally, purified
thiophosphorylated Cα was caged with 4-HP-Br (Scheme 3.2-8), giving
rise to the modified protein HP-PsT197Cα, which showed an 18-fold reduction
of specific kinase activity in vitro toward Kemptide. Activation by photolysis
was performed with UV light (312 nm) at pH 7.3 with an 85-90% yield in
photoactivation, a quantum yield of 0.21, and a 15-fold increase in activity.
These are promising values for future in vivo studies.
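The fold changes quoted for the caged Cα subunit can be cross-checked with simple arithmetic. The sketch below assumes that both fold changes refer to the same Kemptide assay and that the activity of the uncaged enzyme is normalized to 1; the function names are illustrative only.

```python
def residual_fraction(fold_reduction):
    """Activity of the caged enzyme relative to the uncaged enzyme (= 1)."""
    return 1.0 / fold_reduction


def recovered_fraction(fold_reduction, fold_increase):
    """Activity after photolysis relative to the uncaged enzyme (= 1)."""
    return fold_increase / fold_reduction


caged = residual_fraction(18)           # 18-fold reduction on caging
recovered = recovered_fraction(18, 15)  # 15-fold increase on photolysis

print(f"caged enzyme:    {caged:.1%} of uncaged activity")
print(f"after photolysis: {recovered:.1%} of uncaged activity")
```

Under these assumptions the caged enzyme retains roughly 6% of the original activity and photolysis restores roughly 83%, which is of the same order as the reported 85-90% photoactivation yield.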
Scheme 3.2-7 Cysteine-containing proteins like phosphatases are caged in the active site
with phenacyl bromides or chlorides (X = Cl, Br; R = H, CH3, CH2COOH).
Scheme 3.2-8 Caging of the catalytic subunit Cα of PKA was
achieved by thiophosphorylation and subsequent alkylation of the
thiophosphate by 4-hydroxyphenacyl bromide (HP-Br).
Photoregulation of the catalytic activity of natural and recombinant human
BChE was described in 2003 [37]. This enzyme is closely related to
acetylcholinesterase (AChE), the serine hydrolase that terminates cholinergic
transmission by hydrolysis of the neurotransmitter acetylcholine. Despite the fact that its
endogenous substrate has not been identified yet, this enzyme plays a key role
in detoxification by degrading esters such as succinylcholine and cocaine. In
the reported study, BChE was treated with a novel photoremovable
alcohol-protecting group, MNPCC, targeted at the catalytic serine residue of the
enzyme. MNPCC seemed to act as a pseudoirreversible inhibitor, and the X-ray
structure of the MNPCC:BChE conjugate showed an unambiguous carbamylation of
the catalytic residue as the only modification of the protein [37]. Reactivation
of the caged enzyme was obtained at 365 nm (20 min, pH 7.4) with
an efficiency greater than 80%, as determined by the Ellman test. The
same group had previously set out to exploit efficient photoregulation in
crystals of the MNPCC:BChE conjugate in order to further determine the
mechanistic properties of BChE by time-resolved X-ray crystallography under
cryophotolytic conditions [100].
3.2.4
Caged Proteins by Introduction of Photoactive Residues via Site-Directed,
Unnatural Amino Acid Mutagenesis
Photochemical control of processes such as protein folding, protein-protein
or protein-ligand interactions may be achieved via an alternative procedure in
which the photochemical trigger - that is, the caged amino acid - is directly
incorporated into the native protein sequence as an unnatural residue.
The elegant and sophisticated - yet laborious - biosynthetic methodology
introduced by Peter Schultz made a wider exploration of protein functions
possible by de facto expanding the natural genetic code [23-25].
Introduction of an unnatural amino acid follows a series of defined steps that
are summarized here briefly: (a) the codon for the amino acid to be replaced
is substituted with a nonsense codon (like the amber stop codon UAG)
via standard site-directed mutagenesis, (b) a specific “nonsense suppressor”
tRNA able to recognize this codon is prepared and acylated with the desired
unnatural amino acid, and (c) addition of the mutagenized gene or mRNA and the
aminoacylated suppressor tRNA to an in vitro extract or biosynthetic apparatus
generates a mutant protein containing the unnatural amino acid at the desired
position.
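The logic of steps (a) to (c) can be illustrated with a small in silico model. This is a sketch under stated assumptions, not a laboratory protocol: the mRNA is taken to be in frame from its first nucleotide, the letter "Z" is an arbitrary placeholder for the unnatural residue, and the function names and the example sequence are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
AMBER = "UAG"


def introduce_amber(mrna, residue_number):
    """Step (a): replace the codon of a 1-based residue by the amber codon."""
    mrna = mrna.upper().replace("T", "U")
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    if not 1 <= residue_number <= len(mrna) // 3:
        raise ValueError("residue number outside the coding sequence")
    start = 3 * (residue_number - 1)
    return mrna[:start] + AMBER + mrna[start + 3:]


def translate(mrna, suppressor=False, unnatural="Z"):
    """Steps (b) and (c): translate, reading UAG as the unnatural residue
    only when a charged suppressor tRNA is present."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        if codon == AMBER and suppressor:
            protein.append(unnatural)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":  # stop codon without suppression
            break
        protein.append(residue)
    return "".join(protein)


wild_type = "AUGGAUAAAUAA"  # hypothetical message: Met-Asp-Lys-stop
mutant = introduce_amber(wild_type, 2)
print(translate(wild_type))                # wild-type product
print(translate(mutant))                   # truncated without suppressor
print(translate(mutant, suppressor=True))  # full length with unnatural residue
```

The model also makes the main practical limitation visible: without sufficient charged suppressor tRNA, the amber codon acts as a normal stop codon and only truncated protein is obtained.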
Thus, the generation of the specific suppressor tRNA, its acylation with the
unnatural residue, and the synthesis of a sufficient amount of mutagenized
protein are the key steps of the entire methodology, which has more recently
been extended in some technical aspects beyond its original design [101-103].
With this technique, caged amino acids have been successfully introduced
into various protein sequences as unnatural residues. Enzymatic catalysis
before and after photoirradiation has been explored by means of caged residues
replacing the natural ones in critical positions. Schultz and coworkers described
a mutant phage T4 lysozyme (T4L) containing an aspartyl β-nitrobenzyl ester
in place of the wild-type Asp20 in the active site of the enzyme [104]. This
residue, along with Glu11, is responsible for the catalytic activity [105]. The
caged protein, produced in 37% yield, showed no activity in vitro. Conversely,
activity was restored to a 32% level compared to the wild-type enzyme after
irradiation at 315 nm (1000 W Hg-Xe arc lamp). In another experiment these
investigators managed to photochemically initiate protein splicing from the
Thermococcus litoralis Vent DNA polymerase by introducing the 2-nitrobenzyl
ether of serine in the place of the conserved Ser1082 [106].
NB- or DMNB-caged aspartates were instrumental in controlling the
dimerization of HIV-1 protease [107]. This enzyme exists as a 22-kDa monomer
that self-assembles into the active dimeric aspartyl protease. The active site is
placed at the interface of the homodimer and consists of Asp25 and Asp125,
both necessary for the proteolytic activity [108, 109]. Introduction of a NB-Asp
into position 25 led to minimal proteolytic activity, while its recovery after UV
irradiation (500 W mercury-xenon lamp, 10 min, 0 °C, pH 6.0) was about 97%
as revealed by a fluorescence-based protease assay [110]. The introduction of
the caged aspartate did not prevent dimerization, suggesting that H bonding
involving the wild-type residue is not a prerequisite for monomer association
of HIV-1 protease. Instead, it was believed that it affected the stability of the
dimer [107].
A similar behavior was shown by the H133A mutant of BamHI endonuclease
having incorporated a caged Lys132 [111]. Lys132, along with Glu167, Glu170,
and His133, participates in the salt-bridge network at the dimer interface of the
active wild-type enzyme [112, 113]. Site-directed introduction of
DMNB-OCO-Lys132 (yield 55%) in the H133A mutant did not prevent dimer formation
but abolished enzyme activity almost completely. Photoirradiation (365 nm,
20 min, 0 °C) led to a recovery of both activity and specificity toward a
substrate DNA (λ DNA). A different behavior was shown for the H133A
BamHI mutant incorporating DMNB-Glu167 or DMNB-Glu170, which did not
exhibit recovery of activity after photoactivation, suggesting misfolding of the
protein subsequent to the introduction of these caged residues. A site-directed
incorporation of a phenylazo-Phe residue (azoAla) at the same position 132
was also performed (incorporation efficiency of 52%) [114]. Dimer formation
and enzyme activity were achieved by inducing trans-cis photoisomerization
of the azobenzene moiety. The substitution K132azoAla produced a mutant
enzyme with drastically reduced activity (measured by cleavage efficiency of
a DNA substrate), while after irradiation and trans-cis isomerization almost
full activity was recovered compared to the wild-type enzyme. Thus, in its
trans conformation, the bulkiness of the azoAla residue prevented a correct
association of monomers, while the more compact size of the cis isomer did
not preclude the proper assembly into the active form. Gradual gain of activity
was observed within 5 min of photoirradiation (366 nm, 0 °C) without further
increase over a total exposure time of 20 min.
Several proteins are naturally produced as inactive proenzymes and acquire
full activity only when cleaved at a specific position by another enzyme.
Caspase-3, a cysteine protease, is a key component of the apoptosis signaling
pathway. Its inactive form procaspase-3 is cleaved at position Ser176 by
caspase-8 in the “death receptor-induced” apoptosis pathway, eventually forming
the active tetramer. Majima and coworkers artificially reproduced the activation
mechanism of procaspase-3 by photoinducing the cleavage of the backbone
in a mutant protein containing a Npg residue specifically introduced at
position 176 [115]. The incorporation efficiency of Npg by using an in vitro
transcription/translation system was only 15%. Nevertheless, photoactivation
(366 nm, 0 °C, up to 10 min exposure time) of Npg-caspase-3 was followed
within 1 min by a clear increase in enzymatic activity, as quantified by the
change in fluorescence of the peptidic substrate Z-DEVD-rhodamine 110.
All these studies were performed in vitro. Some in vivo experiments with
caged proteins engineered by nonsense suppression were successful, especially
on the acetylcholine receptor.
In the mouse muscle nicotinic receptor (nAChR), NB-tyrosine was
incorporated at positions 93 and 198 of the α subunit. These are conserved
residues crucial for acetylcholine binding. The mutagenized mRNA and the
corresponding nonsense suppressor tRNA charged with NB-Tyr were injected into
Xenopus oocytes. The channel was successfully expressed and incorporated into
the egg membrane [116]. In the subsequent voltage-clamp study, a train of about
20 near-UV laser pulses (300-350 nm) increased acetylcholine-induced
conductance across the membrane, with about 5% of the Tyr
residues decaged in any one flash.
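The cumulative effect of such a pulse train can be estimated with a short calculation. The sketch assumes that each flash decages the same fraction (5%) of the residues that are still caged and that flashes act independently; this simple model is an assumption made here for illustration and is not taken from the cited study.

```python
def cumulative_decaged(per_flash, n_flashes):
    """Fraction of residues decaged after n identical, independent flashes.

    Each flash removes the cage from `per_flash` of the residues that
    are still caged, so the caged fraction decays geometrically.
    """
    if not 0.0 <= per_flash <= 1.0:
        raise ValueError("per_flash must lie between 0 and 1")
    return 1.0 - (1.0 - per_flash) ** n_flashes


for n in (1, 5, 10, 20):
    print(f"{n:2d} flashes: {cumulative_decaged(0.05, n):.1%} decaged")
```

Under this model, a train of 20 flashes would decage roughly two thirds of the caged residues, so the dose of active receptor can be titrated by the number of pulses.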
A qualitatively similar result was achieved in another elegant experiment
where the same ion channel was mutagenized by direct incorporation of
NB-Cys or NB-Tyr replacing a conserved leucine residue in the γ subunit that
is known to be involved in channel gating [117]. As stated by these authors, the
work represented the first successful incorporation of caged amino acids into a
transmembrane segment of a membrane protein. Interestingly, the presence
of the bulky nitrobenzyl group disturbed neither assembly nor trafficking
of the receptor, but likely distorted its conformation, leading to an alteration of
the conductance. This condition was reversed by photoactivation performed
with 1-ms pulses of UV light. The different and characteristic kinetics of
channel activation after flash photolysis were determined for the tyrosine-
and cysteine-caged receptors. Oocytes expressing the mutant
acetylcholine receptor αVal132Npg showed acetylcholine-induced conductance
similar to the wild type, but upon photoinduced cleavage of the backbone in
the localized region of the α subunit about 90% of the current was lost. Thus,
in addition to playing a key role in the correct assembly of the various subunits,
this conserved portion proved to be essential for receptor function [22].
The work of this group clearly showed the importance and usefulness of
caged proteins as tools for the elucidation of protein function in living cells
[118-120].
3.2.5
Small Caged Molecules Used to Control Protein Activity
An alternative to modifying the protein of interest is to control its
function by an inhibiting or activating ligand. Since these ligands can be
small peptides or other small molecules, a caging group is usually introduced
by preparative chemistry. After decaging, interaction between ligand and
protein is permitted, and the protein is either silenced or activated. For live-cell
applications, the small molecule ligand has to be membrane-permeant or needs
to be introduced by physical methods like microinjection or electroporation.
Among the many caged ligands reported so far are various cyclic and noncyclic
nucleotides [19, 59, 82, 121, 122], nitric oxide [123], lipids [90, 124-126],
carbohydrates [80, 127, 128], inositol polyphosphates [81, 129-131], ion
chelators [40, 132, 133], amino acids [57, 134, 135], receptor agonists [136,
137], and many others [138]. Because the synthesis and application of these
small molecules have been thoroughly reviewed before [1, 7, 44, 139], we will
not discuss them in detail.
3.2.5.1 Caged Peptides

Some of the most potent modulators of protein function are peptides. To
introduce a cage at the correct position, essential residues need to be known.
Alternatively, libraries of potential binding peptides have to be prepared and
tested. There are only a handful of amino acid residues suitable for introducing
a caged group. Typical side chains are those of the basic and acidic amino
acids and the nucleophilic thiol group of cysteine. In addition, phosphorylation
usually takes place at the alcohol groups of serine, threonine, or tyrosine and
caging groups on these residues render the phosphorylation site inaccessible
until the cage is removed. Solid phase peptide synthesis (SPPS) also permits
the introduction of phosphorylated residues equipped with a cage group
attached to the phosphate. From a synthetic standpoint, there are two ways
of preparing caged phosphopeptides: by using an already assembled caged
phosphoamino acid or by introducing the caged phosphate after cleavage of
the mature peptide from the resin. Phosphopeptides will bind to proteins
usually interacting with phosphoproteins as soon as the cage is removed. With
the help of membrane-penetrating peptide sequences, “peptide interference”
is now on its way into biology labs.
3.2.5.1.1 Caged Basic Residues

Caged lysine in the form of Nε-o-nitrobenzyloxycarbonyllysine was reported as
a building block suitable for Fmoc-SPPS. It was used for the preparation
of caged AIPs, autocamtide-2 related inhibitory peptides [2, 140]. AIP
(KKALRRQEAVDAL) is a highly specific inhibitor of calcium/calmodulin-
dependent protein kinase II (CaMKII). The first two lysine residues play
an important role for its activity [141]. As expected, caged AIPs showed
significantly reduced inhibitory activity in vitro toward CaMKII (IC50 =
1.2 x M) and gave instantaneous recovery of activity after irradiation
(IC50 = 3.6 x lo-’ M, as for natural AIP). Interestingly, the photolysis by-
product nitrosobenzaldehyde did not interfere with the behavior of the
photoactivated peptides.
3.2.5.1.2 Caged Tyrosine Residues

One of the first caged peptides contained a NB-caged tyrosine that was
introduced via Fmoc-SPPS [142]. Fmoc-Tyr(NB) was used to prepare caged
neuropeptide Y (NPY) and caged angiotensin II (AII) peptide [142]. NPY
is a 36-amino acid peptide containing Tyr residues at both the N- and the
C-termini. It localizes in both the central and peripheral nervous system and
is potentially involved in various physiological roles, including blood pressure
regulation, anxiety, circadian rhythms, and feeding behavior. Structure/activity
relationship studies indicated that both the N- and the C-terminal fragments
of NPY are essential for the activation of Y1 receptors [143]. Introduction of
one caged Tyr at the naturally occurring Tyr positions at the N/C-termini of
NPY led to a decrease of about 1 order of magnitude in the activation of the
Y1 receptors in SK-N-MC cells, with additional reduction when two caged Tyr
were incorporated at both termini of NPY. Restoration of activity, assessed by
the binding assay performed after UV irradiation, demonstrated the successful
role of the nitrobenzyl group as a cage for Tyr residues and for the NPY peptide
itself.
Interestingly, no differences in activity toward AII receptors in human
neuroblastoma SMS-KAN cells were found between caged and unmodified
AII peptides, indicating that the Tyr residue in this eight-amino acid peptide
is not involved in binding to the receptor [142].
The 20-amino acid residue peptide RS-20, whose sequence derives from
smooth muscle myosin light chain kinase (MLCK), is a well-known
calmodulin-binding peptide [144]. Both RS-20 and LMS-1, a 13-residue peptide
derived from the autoinhibitory domain of MLCK, are capable of inhibiting
MLCK phosphorylation activity, normally directed toward the molecular motor and
actin-binding protein myosin II, which is involved in physiological phenomena
like cell polarization and locomotion [145, 146].
The interaction of RS-20 with its target protein calmodulin has been
extensively studied and hydrophobic residues Trp5 and Leu18 were shown to
be critical for binding [147, 148]. Tyr9 in LMS-1 peptide is in turn crucial for
the inhibitory effect, as predicted from mutagenesis studies on MLCK [149].
Walker and others expanded the study on these molecules, both in vitro and in
vivo, using a caged version of both peptides (Scheme 3.2-9) [150]. Trp5 in RS-20
was replaced with a masked tyrosine bearing a CNB cage on the phenolic group.
The carboxylic group of the cage mimicked the negative charge of a glutamate,
a mutation known to have a negative effect on binding. Accordingly, the caged
RS-20 peptide was largely unable to bind to calmodulin, as assessed in vitro
by a quantitative calmodulin-dependent MLCK assay. The photoproduct
5Y-RS-20 generated after 10-min irradiation at 300-400 nm showed an apparent
50-fold increase in its affinity toward calmodulin. A similarly Tyr9-caged
LMS-1 proved to be an effective switchable inhibitor of MLCK in vitro, being
indistinguishable from authentic LMS-1 in its inhibitory potency after
photolysis. The effect of local photoactivation of the two caged peptides was
finally assessed in vivo in fast-moving newt eosinophil cells [151]. Peptides
were introduced by microinjection at an estimated concentration of 20-100 µM.
Photoactivation was performed locally by pulsed near-UV laser light (series of
10 pulses with a 5 ms duration at 20 ms intervals) with concomitant microscopic
observation of cells. Photorelease of active peptides was followed, within a few
seconds, by acute paralysis of cell movement, ceased flow of cytoplasmic granules,
and inhibition of forward motion of the leading lamellipodia. These results
suggested that calcium/calmodulin regulation of MLCK activity is a major
signaling pathway underlying locomotion in eosinophil cells in vivo, and that
the myosin II motor target of MLCK activity is strongly involved in these
motility functions.
Scheme 3.2-9 The calmodulin-binding peptide RS-20 and LMS-1,
a peptide that inhibits myosin phosphorylation, caged at different
tyrosines (5cgY-RS-20: H2N-ARRKYQKTGHAVRAIGRLSS-COOH; 9cgY-LMS-1:
H2N-LSKDRMKKYMARR-COOH). Both peptides were successfully used in eosinophils
after microinjection [151].
3.2.5.1.3 Caged Cysteine and Thiophosphoryl Residues

As mentioned above, Pan and Bayley reported a generally applicable approach
for caging cysteine-containing peptides or peptides thiophosphorylated on
serine residues in aqueous solution using o-nitrobenzyl bromides such
as NB-Br, CNB-Br, and DMNB-Br [71]. Kemptide (LRRASLG), C-Kemptide
(LRRACLG), and CS-Kemptide dimer (LRRACLGLRRASLG) were used as
model peptides in this study. Both Kemptide and the CS-Kemptide dimer were
successfully thiophosphorylated on the Ser residue using ATP(γ)S and the PKA
catalytic subunit. Thiophosphorylated Kemptide was subsequently
treated with each of the three cages. At pH 7.2, only the NB-Br
and DMNB-Br cages were found to react satisfactorily with the thiophosphate
group, producing the corresponding caged peptides in 95% yield. CNB-Br was
found to be close to unreactive (10% yield at pH 4.0), hence the synthesis of
this caged peptide was not pursued further.
Photoactivation of NB- and DMNB-thiophosphoryl-caged Kemptide at
290-380 nm was obtained with yields of 70 and 55% and with quantum
yields of 0.23 and 0.06, respectively (Scheme 3.2-10).
Selective caging was examined on the CS-Kemptide dimer. The goal was
to selectively introduce a cage at a thiophosphoryl-Ser residue over a cysteine
residue. At pH 4.0, NB-Br (2 mM in 100 mM sodium acetate) showed good
selectivity for the alkylation of thiophosphate, while at pH 7.2 both Cys
and thiophosphoryl residues reacted with NB-Br, as determined by
MALDI-MS.
Scheme 3.2-10 The selective introduction of cages to thiophosphates versus cysteines is
pH dependent (peptide H2N-LRRACLGLRRASLG-COOH; NB-Br, pH 4.0).
Cysteine-targeted caging of C-Kemptide was performed with all three
photolabile groups mentioned above at pH 7.2 with a consistent yield of
caged product (95%), while photolysis with UV light (λmax = 312 nm) gave
yields varying from 62 to 70% and quantum yields from 0.15 to 0.62 at pH 5.8,
with a slight decrease in performance at pH 7.2.
Finally, NB-caged thiophosphoryl Kemptide was used to test the activity of
phage λ protein phosphatase (λ-PPase) before and after photoactivation. The
thiophosphate group of NB-caged thiophosphoryl Kemptide was fully protected
against λ-PPase activity, whereas the corresponding group in the unmodified
peptide was hydrolyzed to an extent of 90% when incubated at 30 °C for
3 h. After UV treatment for 40 min, the uncaged thiophosphoryl Kemptide
underwent thiophosphate hydrolysis to about 70%.
A similar strategy was employed to produce caged thiophosphotyrosyl
peptides [99]. The sequence EPQYEEIPILG was thiophosphorylated on the
tyrosine residue by the action of hematopoietic cell kinase (Hck) in the presence
of Co2+ ions (the authors explain that thiophosphorylation of Tyr with
ATP(γ)S and tyrosine kinases failed under conditions that normally work well
with standard ATP), and the thiophosphate group was afterwards alkylated, again
with NB-Br and 4-HP-Br, respectively. The peptides EPQYp,(HP)EEIPILG
and EPQYp,(NB)EEIPILG were obtained in 90 and 75% yield, respectively,
regardless of the pH of the reaction buffer (range 5.8 to 8.0). Irradiation
of the EPQYp,(HP)EEIPILG peptide at 312 nm afforded the photoproduct
EPQYP,EEIPILG with 50-70% yield. Quantum yields were 0.65 and 0.56 at
pH 5.8 and 7.3, respectively. The same treatment of EPQYp,(NB)EEIPILG
gave EPQYp,EEIPILG in 50 to 60% yield, with quantum yields 0.37 and 0.25
at pH 5.8 and 7.3, respectively. It was verified that the caged peptides were
no longer able to bind to an SH2 domain in vitro, while this feature was
completely restored upon photolysis (Scheme 3.2-11). Despite the promising
characteristics of the above-described thiophosphorylated peptides (especially
the HP-caged one), to the best of our knowledge, no in vivo study has yet been
reported.
By means of caged peptides, Lawrence and coworkers successfully prepared
a caged protein kinase A in two different ways: (a) via a peptidic affinity label
[21] and (b) via a caged inhibitor [152]. The peptidic affinity label was designed
to target Cys199 in the active loop of the enzyme, interacting with the PKA active
site only in its caged form, while transforming itself into a low-affinity ligand
upon photoactivation. This peptide was synthesized by SPPS (see Fig. 3.2-6)
and coupled at the C-terminus to the α-carboxyl group of a CNB cage via a
diethylamine linker. The caged ligand was subsequently coupled to the thiol
group of Cys199, finally affording the caged enzyme.
Scheme 3.2-11 Tyrosine residues equipped with various caging groups rendered
peptides inactive with respect to SH2-domain binding (peptide
H2N-EPQYEEIPILG-COOH; Hck kinase, Co2+; NB-Br, 75%; HP-Br, 90%; hν, 312 nm).
Fig. 3.2-6 Protein kinase A labeling approach. Underlined
letters represent amino acids in the one-letter code.
This caged PKA showed less than 2% of the activity displayed by the native
protein, while UV irradiation (300-400 nm, up to 15 min) restored about 50%
of the activity of the unmodified enzyme in vitro. Following these in vitro
observations, 3-7 µM solutions of caged PKA were microinjected into living rat
embryo fibroblasts (REF; a 10-fold dilution was estimated after injection) and
irradiated with near-UV light (300-400 nm, up to 15 min). In these cells,
photoactivation of PKA led to disruption of actin stress fibers, membrane
ruffling, and a change of cell shape from flat to rounded, in accordance with
the phenotype observed when unmodified, active catalytic PKA subunit was
injected into the same cells. Microinjected cells that were not exposed to UV
irradiation retained their stress fibers and flat morphology, indicating that the
PKA-induced pathway had not been activated [21].
PKI is a heat-stable protein first described in 1982 as a potent inhibitor
of PKA [153]. On the basis of a short binding sequence, a potent inhibitor
peptide with the sequence GRTGRRNAI was identified. The underlined
Arg residue played an essential role in the inhibitory behavior of this
peptide [154]. Consequently, a peptide containing an L-ornithine in place of the
arginine residue was prepared. The ornithine was guanidinylated to obtain a caged
arginine, the first example of this kind to be described [152]. The guanidinylating
reagent was synthesized from DMNB-OCOCl and S-methylisothiourea
(Scheme 3.2-12). The caged peptide was shown to be a 50-fold poorer
inhibitor of PKA in vitro (Ki = 20 µM) compared to the uncaged counterpart
(Ki = 420 nM).
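The quoted fold difference follows directly from the two inhibition constants. The short sketch below reads the Ki of the caged peptide as 20 µM, the value consistent with the stated approximately 50-fold difference; the variable names are illustrative.

```python
MICROMOLAR = 1e-6
NANOMOLAR = 1e-9

ki_caged = 20 * MICROMOLAR     # caged peptide, read as 20 µM (assumption)
ki_uncaged = 420 * NANOMOLAR   # uncaged peptide, 420 nM

# A higher Ki means weaker inhibition, so the ratio gives the factor by
# which caging lowers the inhibitory potency.
fold_poorer = ki_caged / ki_uncaged
print(f"caged peptide is about {fold_poorer:.0f}-fold poorer as an inhibitor")
```

The ratio of about 48 is consistent with the rounded value of 50 given in the text.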
When REFs were exposed to the membrane-permeant PKA activator,
8-(4-chlorophenylthio)-cAMP (CPT-cAMP), they underwent the same
morphological transformation as described above (disruption of actin stress fibers
leading to cell shape changes). In contrast, cells microinjected with the caged
peptide (5 µM estimated intracellular concentration) and exposed to UV
irradiation were unable to respond to the CPT-cAMP stimulus, demonstrating that
the CPT-cAMP activation of the PKA pathway had been efficiently blocked in
vivo by the decaged peptide [152].
3.2.5.1.4 Caged Phosphorylation Sites and Caged Phosphopeptides

Recently, a Ser-caged, photoactivatable fluorescent peptide probe that monitors
protein kinase C (PKC) activity was described [75]. As expected, the Ser-caged
peptide failed to serve as an effective PKC substrate in vitro, but upon
light-induced deprotection (300-400 nm, λmax 360 nm, 90 s), the serine became
phosphorylated and enzyme activity was recorded as a convincing change in
the fluorescent properties of the probe. Photoconversion was estimated to
occur with 60% yield and a quantum yield of 0.06.
With this probe, the investigators also studied the light-induced sampling
of PKC activity in HeLa cells in vivo. Exposure of cells to phorbol ester (TPA)
normally induces PKC activity. HeLa cells microinjected with the caged probe
at an estimated concentration of 20 µM failed to display a fluorescent response
to TPA, while a robust response was recorded as a result of a concomitant TPA
treatment and UV irradiation (365 nm at 1 J cm-2).
Scheme 3.2-12 A peptide caged at an arginine residue was
prepared by attaching a DMNB-coupled S-methylisothiourea
reagent to ornithine [152]. R represents further amino acids.
The phosphorylated varieties with a cage attached to the phosphate
are as desirable as caged serine or threonine. Imperiali and colleagues
have lately introduced an elegant and general method for the synthesis
of peptides containing 2-nitrophenylethyl-caged phosphoserine,
phosphothreonine, and phosphotyrosine by integrating an interassembly approach
into Fmoc-SPPS [78]. The recently reported method for the synthesis of
the phosphocaged Fmoc-building blocks - namely,
N-α-Fmoc-phospho(1-nitrophenylethyl-2-cyanoethyl)-L-serine, -threonine and
-tyrosine - is superior to the introduction of cages to the growing peptide on
resin. In particular, the oxidation step required in phosphorus(III) chemistry
was potentially hazardous toward oxidation-sensitive residues C-terminal of the
caged amino acid [79].
A caged phosphoserine octapeptide equipped with the environmentally
sensitive fluorophore 6-(2-dimethylaminonaphthoyl)alanine (DANA) [155] was
used in vitro to probe the phosphorylation-dependent binding to 14-3-3
proteins [156], a highly conserved family of proteins that plays a role as an
intermediate in the cell cycle regulation through phosphorylation-dependent
protein-protein interactions [157]. The caged phosphopeptide was unable to
bind to the target 14-3-3 protein, as opposed to the photoproduct after irradiation
at 365 nm. This could be monitored by the shift of fluorescence of the DANA
amino acid from λem1 = 522 nm (unbound peptide) to λem2 = 501 nm (bound
peptide).
The investigators have more recently described the use of such caged
phosphoserine-containing phosphopeptides to perform a UV-induced,
“chemical” knock-down of the entire 14-3-3 protein family, thereby observing
the effects on cell cycle progression in vivo [158]. A derivative caged at
the phosphoserine position of a good 14-3-3-binding motif sequence like
MARRLYRpSLPAKK [159] was prepared by SPPS. The efficiency of the
photoactivation was first tested in vitro under conditions mimicking irradiation of
cultured cells (365 nm, 90 s, 2.8 J cm-2 irradiation dose). The uncaged
phosphopeptide was obtained in about 80% yield with a quantum yield of 0.43 and
was able to compete with cellular proteins for 14-3-3 binding in vitro, as
demonstrated by competitive binding assays performed in U2OS cell lysates
(Scheme 3.2-13). The caged phosphopeptide was subsequently supplied to living
U2OS cells by connecting it to the cell-permeable Penetratin sequence [161] via a
disulfide bond between N-terminal cysteine residues. After internalization and
release from the vector peptide by spontaneous hydrolysis of the disulfide bridge,
the effects of the uncaged phosphopeptide on 14-3-3 binding to natural target
proteins were studied under several conditions.
For instance, synchronized U2OS cells that received the peptides in an early
G2 phase and were subjected to UV treatment (365 nm, 90 s) showed (a) an
increased cell death ratio compared to controls, (b) an uncontrolled premature
entry into M phase accompanied by mitotic catastrophe, and (c) a striking
reduction in the stable G1 cell population, suggesting that 14-3-3 proteins
normally regulate the onset and timing of mitosis in cycling cells and maintain
stable interphase arrest in noncycling cells. The role of 14-3-3 proteins in
the S-phase checkpoint response to DNA damage is speculative, since cells
incubated with caged peptides and simultaneously exposed to both UV-A
and UV-B (respectively 365 and 302 nm, 90 s) to induce uncaging and DNA
damage were unable to sustain S-phase arrest compared to controls, resulting
in ca. 50% early apoptotic cell death.
Scheme 3.2-13 An octapeptide equipped with the
environmentally sensitive dye DANA. Only after decaging is binding
to 14-3-3 domains possible; it is measured by a shift in
fluorescence (522 nm to 501 nm) due to the change in the lipophilicity of the
environment [160].
To prepare larger phosphoproteins with cages on the phosphate moiety,
it was necessary to combine the synthesis of caged phosphopeptides [78,
79] with expressed protein ligation [162, 163]. The ligation of a recombinant
Smad2-MH2 thioester with the doubly NPE-caged C-terminal phosphopeptide
yielded a recombinant protein that formed a heterodimer with the cytosolic
retention factor Sara (Smad anchor for receptor activation). Decaging permitted
the release of Sara and subsequently the formation of active homotrimers.
Decaging was also followed in digitonin-permeabilized HeLa cells by monitoring
nuclear entry of Smad2-MH2 after illumination [162]. This methodology was
extended using a cage in the backbone of the MH2 peptide. Photorelease of
the bulky N-terminus permitted homotrimerization. This was made visible by
adding fluorescein next to the phosphorylation sites and a dabcyl quencher to
the N-terminus. Photoinduced homotrimerization was therefore accompanied
by a strong increase of the fluorescein fluorescence [164].
Fig. 3.2-7 A chemotactic tripeptide caged
at the N-formyl group.
3.2.5.1.5 Other Caged Residues

Some N-formylated peptides are known to promote chemotaxis in mammalian
leukocytes, acting specifically via the formyl peptide receptor (FPR) located
on the plasma membrane of neutrophils [165]. Among them, the most active
peptide is the tripeptide N-formyl-(L)Met-(L)Leu-(L)Phe. Caged versions of
such a peptide have been synthesized employing either nitroveratrylaldehyde
or nitropiperonaldehyde as photoremovable protecting groups at the N-formyl
moiety (Fig. 3.2-7) [166]. Although the described caged peptides exhibited a
drop of activity by 3-4 orders of magnitude in a rat basophilic leukemia RBL-
2H3 cell line, a study concerning photoactivation in vivo and related effects on
chemotaxis has not yet been reported.
3.2.6
Conclusions
Caged compounds including caged proteins are extremely useful tools to

study biochemical processes inside and outside of living cells. The respective
molecules have been employed in a large variety of areas. However, the overall
number of research groups benefiting from the technology is still fairly small.
It would be desirable for novel caging groups, caged molecules, and ready-to-use
decaging equipment to be more easily accessible. We will definitely see
more of the exciting applications in the future. For this, as in more and more
areas in biology, the close collaboration of chemists and biologists will be
indispensable.
References
1. J.M. Nerbonne, Curr. Opin. Neurobiol. 1996, 6, 379-386.
2. Y. Tatsu, Y. Yumoto, N. Shigeri, Pharmacol. Ther. 2001, 91, 85-92.
3. K. Lawrence, D.S. Curley, Curr. Opin. Chem. Biol. 1999, 3, 84-88.
4. L. Heckel, A. Krock, Angew. Chem. Int. Ed. 2005, 44, 471-473.
5. H. Furuta, T. Tsien, R.Y. Okamoto, H. Ando, Nat. Genet. 2001, 28, 317-325.
6. H. Furuta, T. Okamoto, H. Ando, Methods Cell. Biol. 2004, 77, 159-171.
7. M. Givens, R.S. Goeldner, (Eds.), Dynamic Studies in Biology, Wiley-VCH, New York, 2005.
8. J.A. Schofield, P. Barltrop, Tetrahedron Lett. 1962, 697-699.
9. C.G. Bochet, J. Chem. Soc. Perkin Trans. 1 2002, 125-142.
10. J.W. Reid, G.P. McCray, J.A. Trentham, D.R. Walker, J. Am. Chem. Soc. 1988, 110, 7170-7177.
11. V. Frings, S. Bendig, J. Lorenz, D. Wiesner, B. Kaupp, U.B. Hagen, Angew. Chem. Int. Ed. 2002, 41, 3625-3628.
12. J. Schlaeger, E.J. Engels, J. Med. Chem. 1977, 20, 907-911.
13. B.Z.U. Patchornik, A. Amit, Isr. J. Chem. 1974, 103-113.
14. H. Wong, W.K. Schnabel, W. Schupp, J. Photochem. 1987, 36, 85-97.
15. Q.Q. Schnabel, W. Schupp, H. Zhu, J. Photochem. 1987, 39, 317-332.
16. J.H. Forbush, B. Hoffman, J.F. Kaplan, Biochemistry 1978, 17, 1929-1935.
17. T. Matsubara, N. Billington, A.P. Udgaonkar, J.B. Walker, J.W. Carpenter, B.K. Webb, W.W. Marque, J. Denk, W. McCray, J.A. Hess, G.P. Milburn, Biochemistry 1989, 28, 49-55.
18. E. Millar, N.C. Homsher, Annu. Rev. Physiol. 1990, 52, 875-896.
19. J.E.T. Barth, A. Munasinghe, V.R.N. Trentham, D.R. Hutter, M.C. Corrie, J. Am. Chem. Soc. 2003, 125, 8546-8554.
20. A.P. Walstrom, K.M. Ramesh, D. Guzikowski, A.P. Carpenter, B.K. Hess, G.P. Billington, Biochemistry 1992, 31, 5500-5507.
21. K. Lawrence, D.S. Curley, J. Am. Chem. Soc. 1998, 120, 8573-8574.
22. P.M. Lester, H.A. Davidson, N. Dougherty, D.A. England, Proc. Natl. Acad. Sci. U. S. A. 1997, 94, 11025-11030.
23. V.W. Mendel, D. Schultz, P.G. Cornish, Angew. Chem. Int. Ed. 1995, 34, 621-633.
24. C.J. Anthony-Cahill, S.J. Griffith, M.C. Schultz, P.G. Noren, Science 1989, 244, 182-188.
25. D. Cornish, V.W. Schultz, P.G. Mendel, Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 435-462.
26. L.E. Collins, C.S. Gilmore, M.A. Carlson, J.E. Ross, J.B.A. Chamberlin, A.R. Steward, J. Am. Chem. Soc. 1997, 119, 6-11.
27. B. Woodward, R.B. Patchornik, A. Amit, J. Am. Chem. Soc. 1970, 92, 6333-6335.
28. G. Marriott, Biochemistry 1994, 33, 9092-9097.
29. G. Heidecker, M. Marriott, Biochemistry 1996, 35, 3170-3174.
30. R. Zehavi, U. Naim, M. Patchornik, A. Smirnoff, P. Golan, Biochem. Biophys. Acta Prot. Struct. Mol. Enzymol. 1996, 1293, 238-242.
31. A.R. Xu, Y.J. Vakulenko, A.V. Wilcox, A.L. Bley, K.R. Katritzky, J. Org. Chem. 2003, 68, 9100-9104.
32. J.F. Wootton, D.R. Trentham in Photochemical Probes in Biochemistry, (Ed.: P.E. Nielsen), Kluwer Academic Publishers, 1989, pp 277-296.
33. M. Viola, R.W. Johnson, K.W. Billington, A.P. Carpenter, B.K. McCray, J.A. Guzikowski, A.P. Hess, G.P. Wilcox, J. Org. Chem. 1990, 55, 1585-1589.
34. H. Eisele-Buhler, S. Hermann, C. Kvasyuk, E. Charubala, R. Pfleiderer, W. Giegrich, Nucleosides Nucleotides 1998, 17, 1987-1996.
35. K.R. DeLisi, C. Laursen, R.A. Bhushan, Tetrahedron Lett. 2003, 44, 8585-8588.
36. E.M.H. Hadfield, A. Walters, S. Wakatsuki, S. Bryan, R.K. Johnson, L.N. Duke, Phil. Trans. Royal Soc. Ser. A Math Phys. Eng. Sci. 1992, 340, 245-261.
37. S. Nicolet, Y. Masson, P. Fontecilla-Camps, J.C. Bon, S. Nachon, F. Goeldner, M. Loudwig, Chembiochem 2003, 4, 762-767.
38. A. Goeldner, M. Specht, Angew. Chem. Int. Ed. 2004, 43, 2008-2012.
39. M.A. Goldman, Y.E. Trentham, D.R. Ferenczi, J. Physiol. (London) 1989, 418, P155.
40. S.R. Kao, J.P.Y. Tsien, R.Y. Adams, J. Am. Chem. Soc. 1989, 111, 7957-7968.
41. A. Grewer, C. Ramakrishnan, L. Jager, J. Gameiro, A. Breitinger, H.G.A. Gee, K.R. Carpenter, B.K. Hess, G.P. Banerjee, J. Org. Chem. 2003, 68, 8361-8367.
42. C.G. Bochet, Tetrahedron Lett. 2000, 41, 6341-6346.
43. A. Bochet, C.G. Blanc, J. Org. Chem. 2002, 67, 5567-5577.
44. G. Prestwich, G.D. Dorman, Trends Biotechnol. 2000, 18, 64-77.
45. R.S. Athey, P.S. Matuszewski, B. Kueper, L.W. Xue, J.Y. Fister, T. Givens, J. Am. Chem. Soc. 1993, 115, 6001-6012.
46. R.S. Kueper, L.W. Givens, Chem. Rev. 1993, 93, 515-66.
47. R.S. Jung, A. Park, C.H. Weber, J. Bartlett, W. Givens, J. Am. Chem. Soc. 1997, 119, 8369-8370.
48. K. Corrie, J.E.T. Munasinghe, V.R.N. Wan, P. Zhang, J. Am. Chem. Soc. 1999, 121, 5625-5632.
49. A. Falvey, D.E. Banerjee, J. Am. Chem. Soc. 1998, 120, 2965-2966.
50. J.C. Wilson, R.M. Sheehan, J. Am. Chem. Soc. 1964, 86, 5277.
51. J.E.T. Trentham, D.R. Corrie, J. Chem. Soc. Perkin Trans. 1 1992, 2409-2417.
52. Y.J. Corrie, J.E.T. Wan, P. Shi, J. Org. Chem. 1997, 62, 8278-8279.
53. R.S. Chan, S.I. Rock, J. Am. Chem. Soc. 1998, 120, 10766-10767.
54. K.C. Rock, R.S. Larsen, R.W. Chan, S.I. Hansen, J. Am. Chem. Soc. 2000, 122, 11567-11568.
55. M.A. Balduzzi, S. Mohamed, M. Gottardo, C. Brook, Tetrahedron 1999, 55, 10027-10040.
56. B. Kullmann, P.H. Bier, M.E. Kandler, K. Schmidt, B.F. Curten, Photochem. Photobiol. 2005, 81, 641-648.
57. T. Wang, S.S.H. Dantzker, J.L. Dore, T.M. Bybee, W.J. Callaway, E.M. Denk, W. Tsien, R.Y. Furuta, Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 1193-1200.
58. T. Torigai, H. Sugimoto, M. Iwamura, M. Furuta, J. Org. Chem. 1995, 60, 3953-3956.
59. V. Bendig, J. Frings, S. Eckardt, T. Helm, S. Reuter, D. Kaupp, U.B. Hagen, Angew. Chem. Int. Ed. 2001, 40, 1045-1048.
60. V. Frings, S. Wiesner, B. Helm, S. Kaupp, U.B. Bendig, J. Hagen, Chembiochem 2003, 4, 434-442.
61. T. Takeuchi, H. Isozaki, M. Takahashi, Y. Kanehara, M. Sugimoto, M. Watanabe, T. Noguchi, K. Dore, T.M. Kurahashi, T. Iwamura, M. Tsien, R.Y. Furuta, Chembiochem 2004, 5, 1119-1128.
62. D.H.R. Sammes, P.G. Weingarten, G.G. Barton, J. Chem. Soc. (C) 1971, 721-725.
63. D.A. Patchornik, A. Amit, B. Ben-Efraim, J. Am. Chem. Soc. 1976, 843-844.
64. G. Ogden, D.C. Barth, A. Corrie, J.E.T. Papageorgiou, J. Am. Chem. Soc. 1999, 121, 6503-6504.
65. L.P.J. White, J.D. Burton, Tetrahedron Lett. 1980, 21, 3147-3150.
66. P.N. Woodward, R.B. Confalone, J. Am. Chem. Soc. 1983, 105, 902-906.
67. A.D. Pizzo, S.V. Rozakis, G.W. Porter, N.A. Turner, J. Am. Chem. Soc. 1987, 109, 1274-1275.
68. A.D. Pizzo, S.V. Rozakis, G. Porter, N.A. Turner, J. Am. Chem. Soc. 1988, 110, 244-250.
69. M.C. Lee, Y.R. Pirrung, J. Org. Chem. 1993, 58, 6961-6963.
70. M.C. Fallon, L. Zhu, J. Lee, Y.R. Pirrung, J. Am. Chem. Soc. 2001, 123, 3638-3643.
71. P. Bayley, H. Pan, FEBS Lett. 1997, 405, 81-85.
72. G. Guo, X.C. Beebe, K.D. Coggeshall, K.M. Pei, D. Arabaci, J. Am. Chem. Soc. 1999, 121, 5085-5086.
73. G. Yi, T. Fu, H. Porter, M.E. Beebe, K.D. Pei, D.H. Arabaci, Bioorg. Med. Chem. Lett. 2002, 12, 3047-3050.
74. L. Goeldner, M. Peng, J. Org. Chem. 1996, 61, 185-191.
75. W.F. Nguyen, Q. McMaster, G. Lawrence, D.S. Veldhuyzen, J. Am. Chem. Soc. 2003, 125, 13358-13359.
76. K.Y. Cheley, S. Givens, R.S. Bayley, H. Zou, J. Am. Chem. Soc. 2002, 124, 8220-8229.
77. D.M. Peterson, E.J. Vazquez, M.E. Brandt, G.S. Dougherty, D.A. Imperiali, B. Rothman, J. Am. Chem. Soc. 2005, 127, 846-847.
78. D.M. Vazquez, E.M. Vogel, E.M. Imperiali, B. Rothman, Org. Lett. 2002, 4, 2865-2868.
79. D.M. Vazquez, M.E. Vogel, E.M. Imperiali, B. Rothman, J. Org. Chem. 2003, 68, 6795-6798.
80. C. Wichmann, O. Schultz, C. Dinkel, Tetrahedron Lett. 2003, 44, 1153-1155.
81. J.W. Feeney, J. Trentham, D.R. Walker, Biochemistry 1989, 28, 3272-3280.
82. J.W. Reid, G.P. Trentham, D.R. Walker, Methods Enzymol. 1989, 172, 288-301.
83. S. Spoors, J.A. Fawcett, M.C. Self, C.H. Thompson, Biochem. Biophys. Res. Commun. 1994, 201, 1213-1219.
84. V.N. Pillai, Synthesis 1980, 1-26.
85. C.H. Thompson, S. Self, Nat. Med. 1996, 2, 817-820.
86. G. Ottl, J. Heidecker, M. Gabriel, D. Marriott, Methods Enzymol. 1998, 291, 95-116.
87. P. Rajfur, Z. Jones, D. Marriott, G. Loew, L. Jacobson, K. Roy, J. Cell Biol. 2001, 153, 1035-1047.
88. D. Nachmias, V.T. Safer, Bioessays 1994, 16, 473-479.
89. G. Roy, P. Jacobson, K. Marriott, Methods Enzymol. 2003, 360, 274-288.
90. C.Y. Niblack, B. Walker, B. Bayley, H. Chang, Chem. Biol. 1995, 2, 391-400.
91. C.Y. Fernandez, T. Panchal, R. Bayley, H. Chang, J. Am. Chem. Soc. 1998, 120, 7661-7662.
92. L.N. Noble, M.E.M. Owen, D.J. Johnson, Cell 1996, 85, 149-158.
93. J.R. McGough, A. Ono, S. Bamburg, Trends Cell. Biol. 1999, 9, 364-370.
94. M. Ichetovkin, I. Song, X.Y. Condeelis, J.S. Lawrence, D.S. Ghosh, J. Am. Chem. Soc. 2002, 124, 2440-2441.
95. M. Song, X.Y. Mouneimne, G. Sidani, M. Lawrence, D.S. Condeelis, J.S. Ghosh, Science 2004, 304, 743-746.
96. T. Hunter, Cell 1995, 80, 225-236.
97. B.G. Tonks, N.K. Neel, Curr. Opin. Cell. Biol. 1997, 9, 193-204.
98. C.H. Givens, R.S. Park, J. Am. Chem. Soc. 1997, 119, 2453-2463.
99. K.Y. Miller, W.T. Givens, R.S. Bayley, H. Zou, Angew. Chem. Int. Ed. 2001, 40, 3049-3051.
100. A. Ursby, T. Weik, M. Peng, L. Kroon, J. Bourgeois, D. Goeldner, M. Specht, Chembiochem 2001, 2, 845-848.
101. L. Brock, A. Herberich, B. Schultz, P.G. Wang, Science 2001, 292, 498-500.
102. T. Ashizuka, Y. Murakami, H. Sisido, M. Hohsaka, Nucleic Acids Res. 2001, 29, 3646-3651.
103. T. Ashizuka, Y. Taira, H. Murakami, H. Sisido, M. Hohsaka, Biochemistry 2001, 40, 11060-11064.
104. D. Ellman, J.A. Schultz, P.G. Mendel, J. Am. Chem. Soc. 1991, 113, 2758-2760.
105. L.H. Matthews, B.W. Weaver, J. Mol. Biol. 1987, 193, 189-199.
106. S.N. Jack, W.E. Xiong, X. Danley, L.E. Ellman, J.A. Schultz, P.G. Noren, C.J. Cook, Angew. Chem. Int. Ed. 1995, 34, 1629-1630.
107. G.F. Lodder, M. Laikhter, A.L. Arslan, T. Hecht, S.M. Short, J. Am. Chem. Soc. 1999, 121, 478-479.
108. L.J. Tomaszek, T.A. Roberts, G.D. Carr, S.A. Magaard, V.W. Bryan, H.L. Fakhoury, S.A. Moore, M.L. Minnich, M.D. Culp, J.S. Desjarlais, R.L. Meek, T.D. Hyland, Biochemistry 1991, 30, 8441-8453.
109. L.J. Tomaszek, T.A. Meek, T.D. Hyland, Biochemistry 1991, 30, 8454-8463.
110. E.D. Wang, G.T. Krafft, G.A. Erickson, J. Matayoshi, Science 1990, 247, 954-958.
111. M. Nakayama, K. Majima, T. Endo, J. Org. Chem. 2004, 69, 4292-4298.
112. M. Strzelecka, T. Dorner, L.F. Schildkraut, I. Aggarwal, A.K. Newman, Structure 1994, 2, 439-452.
113. M. Strzelecka, T. Dorner, L.F. Schildkraut, I. Aggarwal, A.K. Newman, Nature 1994, 368, 660-664.
114. K. Endo, M. Majima, T. Nakayama, Chem. Commun. 2004, 2386-2387.
115. M. Nakayama, K. Kaida, Y. Majima, T. Endo, Angew. Chem. Int. Ed. 2004, 43, 5643-5645.
116. J.C. Silverman, S.K. England, P.M. Dougherty, D.A. Lester, H.A. Miller, Neuron 1998, 20, 619-624.
117. K.D. Gallivan, J.P. Brandt, G.S. Dougherty, D.A. Lester, H.A. Philipson, Am. J. Physiol. Cell. Physiol. 2001, 281, C195-C206.
118. E.J. Brandt, G.S. Zacharias, N.M. Dougherty, D.A. Lester, H.A. Petersson, Biophotonics Pt A 2003, 360, 258-273.
119. G.S. Tong, Y.H. Li, M. Lester, H.A. Dougherty, D.A. Brandt, Biochemistry 2000, 39, 1575-1576.
120. Y.H. Brandt, G.S. Li, M. Shapovalov, G. Slimko, E. Karschin, A. Dougherty, D.A. Lester, H.A. Tong, J. Gen. Physiol. 2001, 117, 103-118.
121. R. Gee, K. Lee, H.C. Aarhus, J. Biol. Chem. 1995, 270, 7745-7749.
122. L.J. Corrie, J.E.T. Wootton, J.F. Wang, J. Org. Chem. 2002, 67, 3474-3478.
123. L.R. Tsien, R.Y. Makings, J. Biol. Chem. 1994, 269, 6282-6285.
124. X.P. Sreekumar, R. Patel, J.R. Walker, J.W. Huang, Biophys. J. 1996, 70, 2448-2457.
125. J. Gadella, T.W.J. Goedhart, Biochemistry 2004, 43, 4263-4271.
126. B.T. Reich, R. Neeman, M. Bercovici, T. Liscovitch, M. Williger, J. Biol. Chem. 1995, 270, 29656-29659.
127. S. Hirokawa, R. Iwamura, M. Watanabe, Bioorg. Med. Chem. Lett. 1998, 8, 3375-3378.
128. J.E.T. Corrie, J. Chem. Soc. Perkin Trans. 1 1993, 2161-2166.
129. W.-h. Llopis, J. Whitney, M. Zlokarnik, G. Tsien, R.Y. Li, Nature 1998, 392, 936-941.
130. C. Schultz, C. Dinkel, Tetrahedron Lett. 2003, 44, 1157-1159.
131. J.A. Prestwich, G.D. Chen, Tetrahedron Lett. 1997, 38, 969-972.
132. S.R. Kao, J.P.Y. Grynkiewicz, G. Minta, A. Tsien, R.Y. Adams, J. Am. Chem. Soc. 1988, 110, 3212-3220.
133. G.C.R. Kaplan, J.H. Barsotti, R.J. Ellis-Davies, Biophys. J. 1996, 70, 1006-1016.
134. R. Ramesh, D. Carpenter, B.K. Hess, G.P. Wieboldt, Biochemistry 1994, 33, 1526-1533.
135. F.M. Margulis, M. Tang, C.M. Kao, J.P.Y. Rossi, J. Biol. Chem. 1997, 272, 32933-32939.
136. L. Wieboldt, R. Ramesh, D. Carpenter, B.K. Hess, G.P. Niu, Biochemistry 1996, 35, 8136-8142.
137. F.M. Kao, J.P.Y. Rossi, J. Biol. Chem. 1997, 272, 3266-3271.
138. Y.Q. Angleson, J.K. Kutateladze, A.G. Wan, J. Am. Chem. Soc. 2002, 124, 5610-5611.
139. S.R. Tsien, R.Y. Adams, Annu. Rev. Physiol. 1993, 55, 755-784.
140. Y. Shigeri, Y. Ishida, A. Kameshita, I. Fujisawa, H. Yumoto, N. Tatsu, Bioorg. Med. Chem. Lett. 1999, 9, 1093-1096.
141. A. Shigeri, Y. Tatsu, Y. Uegaki, K. Kameshita, I. Okuno, S. Kitani, T. Yumoto, N. Fujisawa, H. Ishida, FEBS Lett. 1998, 427, 115-118.
142. Y. Shigeri, Y. Sogabe, S. Yumoto, N. Yoshikawa, S. Tatsu, Biochem. Biophys. Res. Commun. 1996, 227, 688-693.
143. A.G. Jung, G. Beck-Sickinger, Biopolymers 1995, 37, 123-142.
144. T.J. Burgess, W.H. Prendergast, F.G. Lau, W. Watterson, D.M. Lukas, Biochemistry 1986, 25, 1458-1464.
145. K. Debiasio, R. Taylor, D.L. Hahn, Nature 1992, 359, 736-738.
146. K.A. Taylor, D.L. Giuliano, Curr. Opin. Cell. Biol. 1995, 7, 4-12.
147. M. Clore, G.M. Gronenborn, A.M. Zhu, G. Klee, C.B. Bax, A. Ikura, Science 1992, 256, 632-638.
148. A. Ikura, M. Crivici, Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 85-116.
149. M. Ikebe, R. Matsuura, M. Ikebe, M. Tanaka, EMBO J. 1995, 14, 2839-2846.
150. R. Ikebe, M. Fay, F.S. Walker, J.W. Sreekumar, Methods Enzymol. 1998, 291, 78-94.
151. J.W. Gilbert, S.H. Drummond, R.M. Yamada, M. Sreekumar, R. Carraway, R.E. Ikebe, M. Fay, F.S. Walker, Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 1568-1573.
152. J.S. Koszelak, M. Liu, J. Lawrence, D.S. Wood, J. Am. Chem. Soc. 1998, 120, 7145-7146.
153. S. Walsh, D.A. Whitehouse, J. Biol. Chem. 1982, 257, 6028-6032.
154. H.C. Kemp, B.E. Pearson, R.B. Smith, A.J. Misconi, L. Vanpatten, S.M. Walsh, D.A. Cheng, J. Biol. Chem. 1986, 261, 989-992.
155. B.E. McAnaney, T.B. Park, E.S. Jan, Y.N. Boxer, S.G. Jan, L.Y. Cohen, Science 2002, 296, 1700-1703.
156. M.E. Rothman, D.M. Imperiali, B. Vazquez, Org. Biomol. Chem. 2004, 2, 1965-1966.
157. A.J. Tanner, J.W. Allen, P.M. Shaw, A.S. Muslin, Cell 1996, 84, 889-897.
158. A. Rothman, D.M. Stehn, J. Imperiali, B. Yaffe, M.B. Nguyen, Nat. Biotechnol. 2004, 22, 993-1000.
159. M.B. Rittinger, K. Volinia, S. Caron, P.R. Aitken, A. Leffers, H. Gamblin, S.J. Smerdon, S.J. Cantley, L.C. Yaffe, Cell 1997, 91, 961-971.
160. M.E. Nitz, M. Stehn, J. Yaffe, M.B. Imperiali, B. Vazquez, J. Am. Chem. Soc. 2003, 125, 10150-10151.
161. D. Chassaing, G. Prochiantz, A. Derossi, Trends Cell. Biol. 1998, 8, 84-87.
162. M.E. Muir, T.W. Hahn, Angew. Chem. Int. Ed. 2004, 43, 5800-5803.
163. T.W. Muir, Annu. Rev. Biochem. 2003, 72, 249-289.
164. J.P. Hahn, M.E. Muir, T.W. Pellois, J. Am. Chem. Soc. 2004, 126, 7170-7171.
165. E.L. Bleich, H.E. Day, A.R. Freer, R.J. Glasel, J.A. Visintainer, J. Becker, Biochemistry 1979, 18, 4656-4668.
166. M.C. Drabik, S.J. Ahamed, J. Ali, H. Pirrung, Bioconjug. Chem. 2000, 11, 679-681.
3.3
Engineering Control Over Protein Function; Transcription Control by Small
Molecules
John T. Koh
Outlook
Ligand-inducible transcription factors, whether derived from heterologously
expressed prokaryotic regulatory proteins or reengineered eukaryotic receptors,
continue to play an invaluable role in studying gene function. Through the
study of reengineering ligand-binding specificities of nuclear receptors and
other transcription factors, new tools for exploring emerging extranuclear roles
for these receptors can be generated. Developing new strategies for selective,
functionally orthogonal, ligand-receptor pairs can be applied more broadly in
chemical biology in the form of chemical inducers of dimerization (CIDs), or
analog-specific enzymes. Similar design principles may also be applied to the
functional rescue of disease-associated mutant proteins that have defects in
binding small molecules.
The impact that ligand-inducible transcription factors have had on the study
of biology over the past decade highlights the importance of developing new
methods to precisely manipulate and study complex biological systems at the
molecular level. The availability of multiple ligand-dependent transcription
factors further increases the level of complexity and sophistication with which
we can probe complex biological phenomena. In the future new systems such
as light-directed transcription control may play a powerful role in dissecting
the roles of genes that act through their unique spatiotemporal patterns in
tissue. These efforts will similarly require continued development of new tools
based on the marriage of both chemical and biological methods.
3.3.1
Introduction
This chapter reviews strategies for manipulating or engineering de novo
proteins that can regulate gene expression in response to small molecules.
Methods that allow us to control the expression of genes in a spatially and
temporally defined manner provide powerful tools for the study of gene
function. The study of naturally occurring ligand-inducible transcriptional
regulators affords insights into the strategies that nature uses to remotely
regulate protein function, thus providing a basis with which to control
and study the actions of virtually any gene product through the remote
regulation of its expression. Ligand-receptor engineering can be used to
create new transcriptional regulators, to provide the means to selectively
activate one of many cellular pathways responsive to the same ligand, and may
further provide new strategies to rescue disease-associated mutants of ligand-
dependent proteins. In addition, new methods to control gene expression with
light can be used to spatially and temporally pattern genes in tissues.
3.3.2
The Role of Ligand-dependent Transcriptional Regulators
Many naturally occurring inducible transcriptional regulators have been used
in heterologous systems for controlling protein expression. However, more
recently a number of research groups have used a combination of chemical
and genetic approaches to reengineer the specificity of transcriptional
regulators [1-3]. Emerging methods may allow one to convert otherwise
nonligand-responsive proteins into ligand-responsive systems. Several new
technologies offer unprecedented control over gene expression using nucleic
acids such as antisense, ribozyme interference, and RNAi [4-6]. New methods
to control mRNA translation in a ligand-dependent manner offer a new
dimension of transcriptional control [7, 8]. These methods can be used in
conjunction with ligand-dependent expression systems to provide spatial and
temporal control of genes. In addition to strictly “on/off” responses,
dose-dependent or “rheostatic” control of expression can provide exquisite
control of gene functions dependent on specific stoichiometries or
spatiotemporal patterning.
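The difference between an “on/off” response and “rheostatic” control can be illustrated with a Hill-type dose-response curve. The following is only a minimal sketch and is not taken from the chapter: the Hill form itself, the function name `expression_level`, and all parameter values (EC50, Hill coefficient, basal and maximal expression) are illustrative assumptions.

```python
def expression_level(ligand, ec50, hill_n, basal=0.0, maximal=1.0):
    """Relative expression at a given ligand concentration.

    Assumed Hill-type model (hypothetical, for illustration only):
    a low Hill coefficient gives a graded, "rheostatic" response,
    a high one approximates an on/off switch.
    """
    if ligand < 0 or ec50 <= 0 or hill_n <= 0:
        raise ValueError("ligand must be >= 0; ec50 and hill_n must be > 0")
    occupancy = ligand ** hill_n / (ec50 ** hill_n + ligand ** hill_n)
    return basal + (maximal - basal) * occupancy


# Hypothetical doses, expressed in multiples of the EC50.
doses = [0.1, 0.5, 1.0, 2.0, 10.0]
graded = [expression_level(d, ec50=1.0, hill_n=1.0) for d in doses]
switch = [expression_level(d, ec50=1.0, hill_n=8.0) for d in doses]

for d, g, s in zip(doses, graded, switch):
    print(f"dose={d:5.1f}  graded={g:.3f}  switch-like={s:.3f}")
```

Under these assumptions, the graded curve changes smoothly over two orders of magnitude of ligand, whereas the switch-like curve is nearly off below the EC50 and nearly fully on above it.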
3.3.2.1 Ligand-dependent Transcriptional Regulators Derived from Natural Repressors

Ligand-regulated transcription factors used in practical applications need to
function in a highly specific manner and should ideally affect only the
expression of the gene of interest. Some of the most common ligand-inducible
transcriptional regulators are derived from naturally occurring proteins. For
example, the lac repressor binds to operators (i.e., genetic binding sites) in
the promoter region preceding genes, and blocks transcription (Fig. 3.3-1).
The lac repressor forms a homotetramer that spans two operator sites and
blocks association of the transcription machinery through a combination of
direct occlusion of DNA-binding sites and perturbation of DNA topology [9, 10].
Binding of the small molecule lactose or the stable synthetic analog
isopropylthiogalactopyranoside (IPTG) occurs near a dimerization interface and
induces a conformational change that disrupts oligomerization and DNA binding,
thus exposing the full promoter sequence and allowing for gene expression. The
lac operon is highly inducible and widely used for controlling protein
production in prokaryotes. This is particularly important when expression of
the target protein, either toxic to the cell or otherwise, adversely affects
growth.
Several prokaryotic repressors can also be used in eukaryotes. Most notable
is the tetracycline (TET) inducible expression system, which has had a
tremendous impact on the study of eukaryotic protein function [11, 12].
Similar to LacR, DNA binding of TET is also conformationally controlled by
the association of a small molecule, tetracycline (tet). Tet binding triggers
dissociation of the repressor dimer and loss of DNA binding (Fig. 3.3-1(a)).
These ligand-dependent repressors have been converted to ligand-dependent
activators through fusion of LacR or TET to the potent HSV (herpes simplex
virus) transactivation domain (VP16). These systems provide tight control
over genes of interest placed behind minimal promoters having three
flanking operator binding sequences (Fig. 3.3-1(b)). The LacR-VP16 chimera
has approximately 1000-fold inducibility but is slow to respond to the addition
of IPTG [10]. In the original TET system, gene expression is repressed in
cells continuously treated with tet. When tet is removed, gene expression is
activated. The need to continuously treat cells with tet was a significant
drawback, as it was unclear what effects long-term exposure to tet could have
on a specific system. Bujard et al. were able to reengineer the TET so that it
only bound DNA in the presence of doxycycline (dox). Fusion of this modified
form of TET to the VP16 transactivation domain formed a dox-responsive
transactivator, rtTA or “Tet-On,” that tightly and rapidly upregulates
transcription 10⁵-fold when dox is added [12]. These pioneering studies have
led to the development of a number of ligand-inducible activators based on
prokaryotic proteins.

Fig. 3.3-1 Prokaryotic repressors can be exploited to control
eukaryotic expression. (a) Repressor used to turn off transcription,
(b) repressor-activator chimera used to “turn-on” gene
expression. AD - activation domain.
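The switching logic of these systems, and the meaning of “fold inducibility,” can be summarized in a small truth-table model. This is a deliberately simplified sketch, not a description of any published implementation: the function names `gene_on` and `fold_induction`, the three mode labels, and the reporter readings are hypothetical, and real systems show leaky, graded behavior rather than strict Boolean states.

```python
def gene_on(system, ligand_present):
    """Simplified Boolean model of ligand-controlled transcription.

    system (hypothetical labels):
      "repressor" - DNA-bound repressor blocks transcription until the
                    ligand releases it from DNA (Fig. 3.3-1(a)).
      "tet_off"   - repressor-VP16 fusion that binds DNA and activates
                    only in the absence of ligand (original TET system).
      "tet_on"    - reengineered fusion that binds DNA and activates
                    only in the presence of ligand (dox).
    """
    if system == "repressor":
        return ligand_present
    if system == "tet_off":
        return not ligand_present
    if system == "tet_on":
        return ligand_present
    raise ValueError(f"unknown system: {system!r}")


def fold_induction(induced_signal, uninduced_signal):
    """Ratio of reporter signal in the induced vs. uninduced state."""
    if uninduced_signal <= 0:
        raise ValueError("uninduced signal must be positive")
    return induced_signal / uninduced_signal


for system in ("repressor", "tet_off", "tet_on"):
    print(system, {lig: gene_on(system, lig) for lig in (False, True)})

# Hypothetical reporter readings (arbitrary units).
print(fold_induction(5000.0, 5.0))
```

With the hypothetical readings shown, the ratio corresponds to a 1000-fold induction; the fold value depends as much on how low the uninduced (basal) signal is as on how high the induced signal is.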
Ligand-dependent transcriptional regulators have been derived from
prokaryotic repressors that bind DNA in response to small-molecule ligands,
namely commercially available antibiotics of the macrolide and streptogramin
families [13, 14]. Because the protein binds DNA only when liganded, chimeras
generated from the fusion of these repressors to transactivation domains can
serve as potent ligand-dependent transcriptional activators.
Fig. 3.3-2 Prokaryotic regulators of transcription have been
adapted for use as eukaryotic transcriptional regulators:
(a) macrolide, (b) streptogramin, (c) quorum signaling
β-oxo-hexanoylhomoserine lactone.
3.3.2.2 Exploiting Prokaryotic Ligand-dependent Activators

Another class of small-molecule-dependent transcription factors is the
quorum-sensing receptors, which often respond to surprisingly simple small
molecules (Fig. 3.3-2(c)). These naturally occurring small-molecule-dependent
transcriptional regulators have been pursued as a means to control eukaryotic
genes only recently and therefore their discussion here is brief [15].
Nonetheless, these naturally occurring prokaryotic transcriptional regulators
hold promise as an important new source of ligand-dependent transcriptional
regulators of eukaryotic genes. An interesting example is that of the
acetaldehyde-responsive protein, which controls expression in response to a
gaseous molecule and can therefore enforce transcription control in a whole
animal transgenic model through its air supply [16].
3.3.2.3 Reprogramming Eukaryotic Transcriptional Regulators

A critical requirement for any transcriptional regulator to be used in
the study of gene function is the strict selectivity of the ligand-receptor
pair to activate only the gene of interest. Several groups have developed
methods to “reprogram” the ligand-binding specificity and gene target-
ing specificity of transcriptional regulators. The need to change both
ligand-binding specificity as well as DNA-binding specificity greatly limits
the possible receptors that can be directly reengineered to provide con-
trol over transgene expression. For example, G-protein coupled receptors
(GPCRs) are an important class of signaling receptors that regulate gene
expression in response to small molecules. However, GPCRs regulate ex-
pression through signaling pathways that involve the intermediary actions
of multiple proteins. In this case, the ligand binding and DNA recognition
events are separated on many different proteins, making their “reprogram-
ming” difficult.
Nuclear and steroid hormone receptors (NHRs), in contrast to GPCRs, are
ligand-inducible transcription factors, which when liganded directly bind to
hormone response elements in eukaryotic genes and upregulate transcription
through a ligand-dependent transactivation domain [17, 18]. When in their
unliganded forms, most steroid hormone receptors are not bound to DNA,
but are instead sequestered by heat-shock proteins (hsps). Steroid hormone
binding causes dissociation from hsps, dimerization, and DNA binding. In
contrast, the unliganded forms of the “nuclear” receptors such as thyroid
hormone, retinoic acid, peroxisome proliferator-activated receptor (PPAR),
and vitamin D receptor (VDR) are generally bound to DNA as heterodimers
with the retinoid X receptor (RXR) and bind corepressor proteins that actively
repress gene expression (Fig. 3.3-3).
The ligand-dependent transactivation domain and the DNA-binding domains
of these receptors function relatively independently of each other,
allowing one to create functional chimeras that redirect the actions of
specific hormones to new genes through alternate DNA-binding domains.
For example, an early study by Greene and Chambon demonstrated that
by exchanging the glucocorticoid receptor (GR) ligand-binding domain for
that of the estrogen receptor (ER), glucocorticoid-responsive genes could
be rendered responsive to estradiol (E2) [19]. A number of other functional
chimeras have been constructed by exchanging DNA-binding domains from
other NHRs including thyroid hormone receptor (TR)/retinoid X receptor
chimeras [20], retinoic acid/VDR chimeras [21], and TR/GR chimeras [22].
Functional chimeras have also been generated using non-NHR DNA-binding
domains such as the progesterone receptor (PR) Gal4 DNA-binding domain
chimera, developed by Wang and O’Malley [23]. Several studies have
shown that DNA-binding domains can be reengineered or evolved to bind
to new DNA-binding sequences [24, 25]. Therefore, NHRs are an attractive
scaffold from which to develop new, selective transcriptional regulators,
as they can in principle be modified to regulate almost any transgene of
interest. However, application of these systems is still limited by the
presence of other endogenous receptors that are also responsive to the same
hormone.

Fig. 3.3-3 General mechanism of nuclear/steroid hormone receptor action.
(a) Steroid hormone receptors are generally sequestered by heat-shock
proteins (hsp) in their unliganded forms. (b) Nuclear receptors can bind to
DNA in the absence of ligand and can associate with transcriptional
repressors.
The use of heterologous NHRs is one way to selectively control only
the targeted gene of interest. The ecdysone receptor (EcR) is unique to
insects and crustacea and therefore has been widely used to selectively
regulate mammalian genes [26, 271. Inducible gene expression in mammals or
mammalian cell culture can be achieved with EcR, although highly inducible
expression generally requires coexpression of RXR. It is unclear whether
overexpression of RXR influences expression of other NHR-responsive genes.
Nonetheless, the EcR has become an important heterologous regulator of
mammalian gene expression. The need for additional and multiple ligand-
inducible transcription factors has prompted several groups to develop new
transcriptional regulators by reengineering the ligand-binding domains of
existing NHRs.
3.3.3
Engineering New Ligand Specificities into NHRs
The reengineering of NHR ligand-binding domains to selectively respond
to synthetic ligands has proved to be an important and challenging area
in ligand-receptor engineering. Since the original studies of Kirsh and
Holbrook directed toward reengineering substrate specificity of enzymes,
ligand-receptor engineering has become an important tool for studying
complex biomolecular systems [28, 29]. Schreiber was perhaps the first to use
a combination of mutagenesis and synthesis to generate selective probes for
biological function in the form of chemical inducers of dimerization (CIDs;
covered elsewhere in this volume) [30-32]. The basic design principle used in
these studies was the use of “bumps and holes” to alter the interface between
ligand and protein in a complementary manner [31]. The bump refers to a
molecular appendage on the ligand that would cause a steric clash if it were to
try to bind to the wild-type receptor. However, the “bumped ligand” could bind
to a receptor that is appropriately modified through mutagenesis to contain
a compensatory “hole.” The “bump and hole” approach to ligand-receptor
engineering has been applied to a number of protein-ligand/enzyme-substrate
systems. One of the most successful is the ATP analog-selective kinase
system developed by Shokat et al. [33].
3.3.4
The Requirement of “Functional Orthogonality”
The application of “bump and hole” engineering toward the generation
of selective transcriptional regulators has been limited, largely because
“hole-modified” proteins often retain substantial affinity for their natural
ligand [34]. For some applications, such as the selective labeling of kinase
substrates by radiolabeled ATP analogs that are only recognized by modified
kinases, competing reactions of the natural (nonlabeled) ATP substrate with
the kinase can be tolerated, so absolute selectivity is not strictly
required [35].
However, a selective transcriptional regulator used to study gene function
would have to function independently of any endogenous receptors. Absolute
selectivity over all concentrations of ligand is rarely observed. In practice, it is
sufficient for a modified ligand-receptor pair to be “functionally orthogonal”
such that the modified receptor is nonresponsive to endogenous concentrations
of the natural ligand and the modified ligand is unable to activate the
natural receptor at the concentrations used to modulate the modified receptor [33,
34]. It is important to recognize that while high potency is generally desirable,
the ligand-analog need not bind the modified receptor with the same affinity
as the natural ligand-receptor pair so long as it has high selectivity.
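The criterion of functional orthogonality can be made concrete with a simple dose-response model. The sketch below assumes a one-site Hill equation with a Hill coefficient of 1; the function names, the 10% and 90% response thresholds, and all EC50 values and concentrations are hypothetical and chosen only for illustration, not taken from the studies cited here.

```python
def fractional_response(conc_nM, ec50_nM, hill=1.0):
    """Fraction of the maximal response predicted by a simple Hill model."""
    if conc_nM <= 0:
        return 0.0
    return conc_nM**hill / (ec50_nM**hill + conc_nM**hill)


def is_functionally_orthogonal(ec50_mut_natural, ec50_wt_analog, ec50_mut_analog,
                               natural_conc, analog_conc,
                               off_max=0.1, on_min=0.9):
    """Check the two "off" conditions and the one "on" condition.

    All EC50 values and concentrations are in nM. off_max and on_min are
    illustrative thresholds (assumptions), not values from the chapter.
    """
    # Modified receptor must ignore endogenous levels of the natural ligand.
    mutant_off = fractional_response(natural_conc, ec50_mut_natural) <= off_max
    # Natural receptor must ignore the analog at the dose actually used.
    wild_type_off = fractional_response(analog_conc, ec50_wt_analog) <= off_max
    # Modified receptor must respond to the analog at that same dose.
    mutant_on = fractional_response(analog_conc, ec50_mut_analog) >= on_min
    return mutant_off and wild_type_off and mutant_on


# Hypothetical numbers chosen only to exercise the function.
print(is_functionally_orthogonal(1000, 5000, 5, natural_conc=1, analog_conc=100))
print(is_functionally_orthogonal(1000, 200, 5, natural_conc=1, analog_conc=100))
```

In the second call the analog is assumed to be only 40-fold selective for the mutant, so the dose needed to switch on the modified receptor also partially activates the natural receptor and the pair fails the test, even though the analog's potency on the mutant is unchanged.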
3.3.5
Overcoming Receptor Plasticity
The greatest challenge presented by engineered nuclear receptors is the
significant structural flexibility of the ligand-binding domain. NHR ligand-binding
domains undergo substantial structural reorganization upon hormone
binding. The hormone generally provides a hydrophobic nucleus around which
the ligand-binding domain repacks its core. The structural changes to the
receptor’s core cause changes to the receptor surface resulting in coactivator
recruitment and changes in receptor dimerization. It is therefore not surprising
that the estrogen receptor binds many ligands that are substantially larger than
E2 and would otherwise appear to be too large to fit within the binding
pocket observed in the E2-ER crystal structure (Fig. 3.3-4) [36]. These studies
imply that identifying “bumped” hormones that will not bind wild-type NHRs
could be more challenging than ligand-receptor engineering with more rigid
proteins.
Through targeted site-directed mutagenesis, Corey et al. searched a library
of failed drug candidates, “near drugs,” for their ability to selectively activate
mutants of the 9-cis retinoic acid receptor, RXR [37, 38]. These mutants were carefully
selected not to have significant activity with the natural hormone 9-cis retinoic
acid. Although mutants that improved the activity of these ligands
were identified, the ligands, which largely contained only hydrophobic
groups aside from the requisite carboxylate, were generally less than 10-fold
selective for the mutant over the wild-type receptor. These studies highlight
the remarkable ability of the wild-type receptor to accommodate ligands that
differ in hydrophobic shape even when modeling might suggest that these
ligands should not be accommodated by the ligand-binding site. In general,
protein plasticity limits the use of “bump and hole” engineering of flexible
proteins.
[Structures shown: estradiol (E2); RU-58668; ICI-182,780; Hanson: 17α-phenylvinyl estradiol; Katzenellenbogen: 4-allyl]
Fig. 3.3-4 The estrogen receptor has sufficient flexibility to accommodate a diverse array
of ligands that interact with ER at low or sub-nanomolar potencies.
Our group has therefore focused on exploring methods to manipulate polar
groups to impart specificity to engineered ligand/receptor pairs, following
the general notion that polar interactions impart specificity to molecular
recognition events because mismatched polar interactions cannot be easily
avoided by simple side-chain reorganization. In early work on the retinoic
acid receptor, hormone-binding selectivity was changed by modifying a key
arginine residue (Arg278) that forms a salt bridge to the carboxylate of
bound retinoic acid [39]. Although a neutral ethylamide analog of retinoic acid
displayed some mutant versus wild-type selectivity, this analog was notably
less potent than the wild-type retinoic acid-RAR (retinoic acid receptor)
pair and showed only partial selectivity. A more dramatic attempt to impart
selectivity through the manipulation of polar interactions was the reversal of
a ligand-receptor salt bridge by creating a guanidine-functionalized retinoid,
which showed selective but weak activity for the charge-complementing mutant
RARγ (S289G/R278E). The weaker cellular activity of this ligand-receptor pair
is not entirely unexpected in the light of studies by Warshel suggesting that
salt-bridge interactions are stabilized by protein dipoles that would be destabilizing
if the salt bridge were reversed [40, 41]. In general, charged or neutral polar
groups found in the interior of proteins are stabilized by multiple polar
interactions from the protein in the form of ion pairs, hydrogen bonds, and
local or macrodipoles. Adding, removing, or rearranging polar groups found in
the interior of protein-ligand complexes is generally disfavored as it leaves the
associated polar groups unsatisfied. The solution to this problem of selectivity
is not immediately obvious but in at least some cases can be solved.
The Koh and the Katzenellenbogen groups simultaneously explored estrogen
analogs that could complement the same Glu353 → Ala or Ser mutation
in the estrogen receptor [42-44]. Glu353 forms an intramolecular salt bridge
with Arg274 and both residues form key hydrogen bonds to the 3-hydroxyl
of E2 (Fig. 3.3-5(a)). Mutations to Glu353 greatly reduce the receptor’s affinity
for the natural ligand E2. While a number of estrogen analogs bearing neutral
functional groups in place of the 3-hydroxyl of E2 could activate the Glu353
mutants with high affinity, in almost all cases, these analogs activated the
wild-type ERs with equal or greater potency. A few low-potency ligands (<2%
wild-type potency) show receptor selectivities as high as 34-fold (mutant/wild
type) (Fig. 3.3-6(a)) [42]. By comparison, carboxylate-functionalized estrogen
analogs designed to restore (intermolecularly) the lost protein salt bridge
with Arg274 form high affinity/potency complexes with the mutant receptor
(Fig. 3.3-5(b)). These complexes are not of higher affinity than the analogs
having neutral appendages, suggesting that the favorable energetics of forming
a salt bridge with Arg274 is offset by the substantial cost of desolvating
the ligand-associated carboxylate [44]. However, carboxylate-functionalized
ligands of appropriate size and shape provided a significant gain in selectivity,
which can be as high as 95- to 400-fold in favor of the mutant over the wild-type
ERs (Fig. 3.3-6(b)). This greater selectivity is imparted as a result of weaker
binding of the carboxylate-functionalized ligands to the wild type, presumably
as a result of mismatched polar interactions at the ligand-receptor interface.
Fig. 3.3-5 Accessible surface model of functionally orthogonal
ER/ES8 pair. (a) Wild-type ER-E2 receptor based on structure
modeled by Brzozowski et al. [45]. (b) Modeled structure of
ES8-ER(E353A).
[Values shown, in order. Panel (a): RTP = 1.5, RS = 34; RTP = 15, RS = 1.3; RTP = 17, RS = 11; RTP = 0.9, RS = 9.2; RTP = 2, RS = 1.6. Panel (b): RTP = 0.8, RS = 22; RTP = 3.0, RS = 95; RTP = 38, RS = 56.]
Fig. 3.3-6 Complements for ERα(E353A). (a) Neutral modifications tend to provide
only modest mutant versus wild-type selectivity. (b) Acidic analogs of appropriate
structure provide high selectivity without significant loss in affinity. RTP - relative
transcription potency; RS - receptor selectivity (EC50 wild type/EC50 mutant).
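The receptor selectivity (RS) reported in Fig. 3.3-6 is defined as EC50(wild type)/EC50(mutant). As a minimal sketch of that arithmetic, the code below computes RS and converts it into an apparent free-energy difference, ΔΔG = RT ln(RS). Treating EC50 ratios as if they were ratios of binding constants is an assumption made here purely for illustration, as are the function names and the example EC50 values.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def receptor_selectivity(ec50_wild_type, ec50_mutant):
    """RS = EC50(wild type) / EC50(mutant); values above 1 favor the mutant."""
    if ec50_wild_type <= 0 or ec50_mutant <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_wild_type / ec50_mutant


def selectivity_free_energy(rs, temperature_K=298.15):
    """Apparent free-energy difference (kcal/mol) implied by a selectivity ratio.

    Assumes EC50 ratios can stand in for ratios of dissociation constants.
    """
    return R_KCAL * temperature_K * math.log(rs)


# Hypothetical EC50 values (nM) that give a 95-fold selectivity.
rs = receptor_selectivity(950.0, 10.0)
print(rs, round(selectivity_free_energy(rs), 2))
```

Under these assumptions a 95-fold selectivity corresponds to a difference of roughly 2.7 kcal/mol near room temperature, which is on the order of a single mismatched polar interaction.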
We termed the process of exchanging polar groups across the ligand-receptor
interface “polar group exchange” [43]. In essence, the same
key functional groups are present in more or less the same positions in the
wild-type and the engineered ligand-receptor complexes, which differ only in the
covalent connectivity of a key polar group. In the present example, the carboxylate
group is presumed to be in more or less the same position but covalently
linked to the ligand rather than to the receptor. This minimizes the impact of altering
polar groups within the interior of the protein by preserving the orientation
of key dipolar interactions. The most selective system reported is ERβ(E305A)
with the synthetic ligand ES8. This mutant is no longer activated by endogenous
concentrations of E2, but can be fully activated by concentrations of ES8 that do
not activate the wild-type ERs. This system therefore comprises a functionally
orthogonal ligand-receptor pair that, in principle, can be used to regulate gene
expression independent of endogenous estrogen responsive receptors.
3.3.6
Nuclear Receptor Engineering by Selection
Miller and Whelan were perhaps the first to recognize the potential of screening
or selecting NHR mutants from receptor libraries to identify ERs with modified
ligand specificities [46, 47]. Using error-prone PCR, they generated populations
of mutant ERs in yeast that had decreased responsiveness to E2 but increased
responsiveness to the synthetic diphenyl indene-ol GR132706X. Despite their
elegant plan, the selected mutants had good potencies but relatively modest
selectivities, exhibiting only a 10- to 25-fold improvement in the potency
of GR132706X with the mutant when compared to wild type. One of the
limitations of the Miller and Whelan study was that their modified ER
regulated the expression of β-galactosidase, which was laboriously followed
colorimetrically. Doyle has recently succeeded in using a true selection method
to screen codon-randomized libraries of RXR for variants activated by the
synthetic compound LG335 [2]. A key component of their strategy was to
utilize a fusion of the nuclear receptor coactivator ACTR linked to the potent
Gal4 activation domain (ACTR-GAD). This provided tight control of ADE2
expression to conditionally control survival of the PJ69-4A auxotroph on media
lacking Trp and Leu. The mutant RXR(I268V/A272V/I310L/F313M) was 300
times more responsive toward LG335 than wild-type RXR in mammalian cell
culture. This particular ligand-receptor pair has only 30% of the wild-type
efficacy but nonetheless represents a significant advance in the strategies used
to develop functionally orthogonal transcriptional regulators. This general
strategy could be easily extended to other NHRs.
3.3.7
Ligand-dependent Recombinases
Other NHR reengineering strategies do not require the engineered ligand-bound
complex to be transcriptionally active but can exploit the ligand-dependent
association of steroid receptors with hsps. Pioneering work by Chambon’s group
demonstrated that site-specific recombinases can be placed under the control
of nuclear receptor ligand-binding domains [48-50]. The chimeric fusion
protein composed of the site-specific recombinase Cre and the ER ligand-binding
domain is only active in the presence of an ER ligand such as E2 or the
antagonist tamoxifen (Fig. 3.3-7). The unliganded ER ligand-binding domain
is associated with hsp90 and interferes with the formation of the tetrameric
Cre complex, which mediates recombination. Ligand-dependent recombinases
provide a powerful tool for controlling gene expression because flanking a gene of
interest with Cre recognition sites can be used to permanently turn on or turn
off its expression. Because recombination causes a permanent change to the
cellular genome, all the progeny of a cell that has undergone recombination
will propagate the same genomic change. Conditional recombinases used in
conjunction with cell-type-specific promoters can therefore be powerful tools
for following cell lineages in vivo [51].
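The logic of a ligand-gated recombinase switch can be sketched as a toy string operation. The example below is an illustration only: it uses the canonical 34-bp loxP sequence from general knowledge (not the sequence printed in Fig. 3.3-7), treats Cre-ER activity as a simple on/off flag set by the presence of ligand, and handles only excision between two directly repeated sites (inversion between oppositely oriented sites is ignored). All function names and the short promoter, stop, and gene sequences are hypothetical.

```python
# Canonical loxP: two 13-bp inverted repeats flanking an 8-bp spacer
# (assumed from general knowledge, not taken from this chapter).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def arms_are_inverted_repeats(site=LOXP):
    """True if the two 13-bp arms of the site are inverted repeats."""
    return reverse_complement(site[:13]) == site[-13:]


def excise(genome, ligand_present, site=LOXP):
    """Model Cre-ER excision between the first two directly repeated sites.

    Without ligand the fusion protein is treated as inactive and the
    genome is returned unchanged. With ligand, the DNA between the two
    sites is removed and a single site is left behind.
    """
    if not ligand_present:
        return genome
    first = genome.find(site)
    if first == -1:
        return genome
    second = genome.find(site, first + len(site))
    if second == -1:
        return genome
    return genome[:first] + genome[second:]


# Toy construct: promoter, floxed stop cassette, then the gene of interest.
promoter, stop, gene = "TTGACA", "TAATAA", "ATGGCC"
floxed = promoter + LOXP + stop + LOXP + gene

print(excise(floxed, ligand_present=False) == floxed)
print(excise(floxed, ligand_present=True) == promoter + LOXP + gene)
```

Because the returned string is what every "daughter" copy would inherit, the sketch also mirrors the point made above: once excision has occurred, removing the ligand does not restore the stop cassette.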
Since the development of the original Cre-ER system, mutagenesis and
screening strategies have identified modified ER ligand-binding domains that
have reduced responsiveness to E2 but can mediate tamoxifen-dependent
recombination [48]. It is important to make the distinction that these
modified ligand-receptor pairs do not necessarily form transcriptionally active
complexes. Since the first report of the Cre-ER system, several new systems
have been reported that make use of Cre or the site-specific recombinase Flp
including Cre, Cre-PR (progesterone receptor fusion), Cre-GR (glucocorticoid
receptor fusion), and EcR-Flp [51-53].
[Panel labels: (a) LoxP site 5’-TATAACTTCGTATAGATATGCTATACGAAGTTAT-3’; (b) ER ligand, ATG, STOP.]
Fig. 3.3-7 Site-specific recombinases can be used to control gene expression.
(a) Homologous recombination by Cre is performed at specific LoxP sites. (b) The
chimeric Cre-ER is only active in the presence of an ER ligand. Recombination
can be used to switch on or off genes by placing them downstream of promoter
sequences.
Although some of these ligand-dependent recombinases have been
reengineered to selectively respond to synthetic receptor antagonists, such
as tamoxifen-responsive Cre-ER or RU486-responsive Cre-PR, the need to
treat cells for up to several days with these potent receptor antagonists may
have unwanted side effects, particularly when used in in vivo developmental
models [50, 53]. This suggests that functionally orthogonal ligands may still
have an important role to play, providing the next generation of highly selective
ligand-dependent recombinases.
3.3.7.1 Chemical Biology of NHRs and the Potential of Engineered Nuclear
Receptors
A rapidly emerging area in nuclear receptor biology is the “nongenomic” or
“extranuclear” actions of NHRs [54]. Several lines of evidence suggest that
nuclear receptors may activate signaling complexes outside of the nucleus that
only indirectly affect gene transcription. For example, the rapid nongenomic
actions of the vitamin D receptor (VDR) have been known for many years. Vitamin
D analogs that selectively activate the nongenomic actions of vitamin D have
played an invaluable role in the study of its nonnuclear actions [55-57].
Nongenomic activities of thyroid hormone [58], glucocorticoids, androgens,
and mineralocorticoids have also been identified [54, 59]. Currently, the most
well characterized of these systems involves estrogen and the estrogen receptor.
In addition to identifying that the GPCR GPR30 is an estrogen-responsive
receptor [60-62], several studies have also confirmed that the estrogen receptor
can act outside the nucleus in complex with scaffolding proteins such as
MNAR to activate Src kinase or in palmitoylated form in association with caveolins
to activate PI3 kinase (Fig. 3.3-8) [63-66]. In this case, the nuclear receptor
is found to play multiple extranuclear roles in regulating cellular signaling
pathways. Analog selective hormone receptors may yet play an important role
in dissecting the multiple signaling pathways activated by steroid hormones.
Fig. 3.3-8 Estradiol is involved in many different signaling pathways, some of which
involve the same ligand-receptor pair. a - classic nuclear activation of
transcription, b - MNAR scaffolded activation of Src kinase, c - palmitoylated
ER can localize to caveolins in an estrogen dependent manner, d - GPCR signaling by
estradiol. Pathways a, b, and c may potentially involve ERα and ERβ.
3.3.8
Complementation/Rescue of Genetic Disease
The development of analog-specific forms of nuclear/steroid hormone
receptors has prompted us to investigate many naturally occurring mutations
found in nuclear receptors associated with genetic disease. Mutations to
nuclear receptors are associated with a family of human genetic diseases, which
include VDR mutations associated with rickets, TR mutations associated with
resistance to thyroid hormone, mineralocorticoid resistance, PPAR mutations
associated with certain forms of severe insulin independent diabetes,
and androgen receptor mutations associated with androgen insensitivity
syndrome [67-69]. Additionally, mutations to the androgen, estrogen, and TRs
A significant subset of these disease-associated mutations is located at the
receptor-hormone interface suggesting that appropriately designed hormone
analogs may be able to “complement” or “rescue” the function of these
receptors. Unlike current gene therapy strategies that use nucleic acid analogs,
hormone analogs typically have good druglike properties (i.e., bioavailability,
biostability) suggesting that hormone receptor complements may represent a
new strategy toward developing new treatments for genetic disease.
The possibility of using hormone analogs to rescue nuclear receptor
mutations was perhaps first explored by DeGroot et al. who demonstrated that
some synthetic hormone analogs were more potent than triiodothyronine (T3)
in mutant forms of TR, associated with resistance to thyroid hormone [71].
More recently, Feldman and Peleg similarly screened vitamin D3 analogs
that partially complement VDR mutants associated with vitamin D resistant
rickets [72], and Chatterjee et al. have identified PPAR agonists that can
restore activity to PPAR mutants associated with severe insulin independent
diabetes [73]. The first example of a molecule designed to rescue the
function of a mutant protein associated with a genetic disease was the
development of the thyroid hormone analog HY1, which was designed
to complement the RTH (thyroid hormone resistance) associated mutant
TRβ(R320C) [74]. This study represented a significant advance over the earlier
studies by DeGroot, in that the complementing analog was selective for the
mutant form of TRβ over the TRα subtype. In more recent work, new thyroid
hormone analogs have been developed that restore efficacy and potency to three
of the most common RTH-associated mutants, Arg320 → Cys, Arg320 → His, and
Arg316 → His (Fig. 3.3-9) [75, 76]. All of these mutations affect the
carboxylate-binding cluster of arginines, and all of the compounds used to rescue
them are based on the same general complementation strategy involving more neutral hydrogen
bonding groups in place of the ligand’s carboxylate. This suggests that once
general rules for designing complementing analogs are established, the process
of identifying new compounds may be reasonably efficient.
It is important to distinguish these “functional rescue” studies from several
other important studies showing that small molecules can stabilize or
chaperone folding of mutant proteins such as mutant p53 associated with
cancer [77, 78], mutant forms of V2R associated with nephrogenic diabetes
insipidus [79, 80], mutant forms of opsin associated with retinitis pigmentosa
[81], and β-glucosidase mutants associated with Gaucher disease [82, 83].
By contrast, nuclear receptor mutants are often well-folded, stable proteins that
have lost ligand-dependent transactivation function that can be complemented
by appropriate ligand design.
[Structures shown: HY1 and KG-8. Values, in order shown: TRβ(R320C) EC50 = 7.0 nM, mut/α selectivity = 5.5; TRβ(R320H) EC50 = 0.46 nM, mut/α selectivity = 1.0; TRβ(R320C) EC50 = 7 nM, mut/α selectivity = 12; TRβ(R316H) EC50 = 12.6 nM, mut/α selectivity = 4.]
Fig. 3.3-9 Analogs that rescue function to TRβ mutants
TRβ(R320C), TRβ(R320H), and TRβ(R316H) associated with
resistance to thyroid hormone. Receptor selectivity of ligand,
mut/α, defined as (EC50 with TRα)/(EC50 with mutant TRβ).
The challenge to designing compounds that rescue mutations associated
with genetic diseases is that there are generally very few individuals with
any specific mutation. This poses an even greater challenge to chemists to
efficiently design compounds that can complement any specific mutation
in a receptor-binding pocket. We evaluated the ability of computer-aided
design to discover molecular complements for the rickets associated mutation
VDR(R274L), which is more than 1000 times less responsive to the natural
hormone 1,25-dihydroxyvitamin D3. We used a virtual screening strategy to
evaluate a focused library of analogs of the synthetic VDR agonist LG190155
(Fig. 3.3-10) [84]. Although the bound structure of LG190155 with wild-type
VDR was not available, half of the analogs selected by virtual screening were
able to restore more than 60% activity at 200 nM. When tested in cell based
assays, the best analogs were able to restore almost fully the potency and
efficacy to this otherwise unresponsive mutant. Computer-aided design was
similarly successful at identifying seco-steroid analogs that could complement
this same mutant (Fig. 3.3-10) [85]. These findings suggest that for at least
some mutants, computer-aided molecular design can be used to efficiently
design compounds that rescue genetic mutations.
[Labels and values, in order shown: 1,25-dihydroxyvitamin D3: wild-type VDR, EC50 = 2.0 nM; VDR(R274L), EC50 = 2000 nM. ss-III: VDR(R274L), EC50 = 7.0 nM. LG190155: wild-type VDR, EC50 = 110.0 nM. VDR(R274L), EC50 = 85 nM. ss-III: VDR(R274L), EC50 = 3.3.]
Fig. 3.3-10 Molecular rescue of rickets associated mutant VDR(R274L) by designed
synthetic analogs of known agonists.
3.3.9
De Novo Design of Ligand-binding Pockets
In addition to reengineering existing ligand-binding pockets, it is also possible
to generate de novo ligand-binding sites in proteins. A notable early
example shown by Matthews was the formation of de novo benzene- and
guanidine-binding sites by making Phe → Ala or Arg → Ala mutations in
lysozyme [86]. Although these de novo binding sites have only weak affinity for
these solvent substrates, they clearly demonstrated that new small-molecule
binding sites could be created into proteins. Barbas and Schultz have been
able to use this strategy to create zinc finger domains that bind only in the
presence of isoindole derivatives [87]. By fusing these inducible zinc finger
domains to transactivation domains, the isoindoles can be used to remotely
regulate gene transcription. Currently, the affinity of these de novo designed
cavities for their ligands is only modest. However, combined with
recent advances in computational methods to de novo design ligand-binding
cavities [88-911, this general strategy provides a potentially powerful approach
to creating ligand-inducible transcriptional regulators.
3.3.10
Light-activated Gene Expression from Small Molecules
A new and exciting area in ligand-induced transcriptional regulators is
the development of photoresponsive transcriptional regulators, which utilize
photocaged small molecules. Just as ligand-inducible transcriptional regulators
have revolutionized our study of protein function, light-activated transcription
(or translation) systems may prove to be a powerful tool for studying the
function of genes that elicit their effects only through their expression in precise
three-dimensional patterns, gradients, or arrays. This includes morphogens,
which are important guidance cues for neurogenesis, vascular genesis, and
limb development as well as other critical steps during development [92, 93].
Spatial gene patterning may also potentially play a role in creating artificial
tissues.
By photocaging nuclear receptor agonists, Koh et al. were able to show
that transcription could be controlled in an exposure-dependent manner [94].
Currently, photocaged agonists for the estrogen, thyroid, retinoic acid, and
VDRs have been used to place nuclear receptor mediated transcription under
the control of light [94-96]. Using a photocaged agonist of the ecdysone
receptor, Lawrence et al. have demonstrated that even though photoreleased
agonists are freely diffusing, spatially discrete patterns of expressed genes
can be made on the micron scale in cultured cells [97]. The photoregulation
of gene expression by uncaging small molecules presents many challenges.
Small-molecule triggers for transcription have the advantage of being easily
delivered into cells by passive diffusion. Therefore, a multicellular system or
organism is only light sensitive after the addition of the caged compound.
Conversely, a cultured cell monolayer can be again rendered light-insensitive
minutes after the caged compound is removed from the media.
Ligand diffusion can affect the resolution at which genes can be patterned,
as the photoreleased activator can diffuse into neighboring cells. When the
patterned feature sizes are small, the region of activation will be confined
through the effects of ligand dilution upon bulk diffusion. In other words, the
concentration of released hormone activator will be too dilute to activate cells
that are remote to the site of activation. Photocaged antagonists may provide a
means to selectively turn off gene expression in a small region of cells within
a larger tissue [96].
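The dilution argument can be illustrated with the textbook solution for free diffusion from an instantaneous point source in three dimensions, C(r, t) = N / (4πDt)^(3/2) · exp(−r² / 4Dt). This is a deliberately idealized sketch: it assumes an infinite homogeneous medium and ignores cellular uptake, receptor binding, and the geometry of a real monolayer or tissue. The function names and numerical values are hypothetical, and any consistent set of units can be used.

```python
import math


def concentration(r, t, n_released, diff_coeff):
    """Concentration at distance r and time t after a point release."""
    spread = 4.0 * math.pi * diff_coeff * t
    return n_released / spread**1.5 * math.exp(-r**2 / (4.0 * diff_coeff * t))


def time_of_peak(r, diff_coeff):
    """Time at which the concentration at distance r is highest."""
    return r**2 / (6.0 * diff_coeff)


def peak_concentration(r, n_released):
    """Highest concentration ever reached at distance r.

    Obtained by inserting time_of_peak into concentration(); the result
    falls off as 1/r**3 and does not depend on the diffusion coefficient.
    """
    return n_released * (3.0 / (2.0 * math.pi * math.e))**1.5 / r**3


# Hypothetical release: compare a nearby cell with one twice as far away.
near = peak_concentration(10.0, n_released=1e6)
far = peak_concentration(20.0, n_released=1e6)
print(near / far)
```

In this idealized picture, doubling the distance from the site of photorelease lowers the highest concentration a cell ever experiences by a factor of eight, which is why small illuminated features can remain confined even though the ligand diffuses freely.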
The photorelease of nuclear receptor agonists in a subpopulation of cells
within a tissue presents another challenge, as the diffusion of ligand back
out of the cell will limit the duration of transcription response. For some
ligand-receptor pairs, the duration of reporter gene response may be limited
to less than a few hours, whereas for other ligand-receptor pairs, reporter gene
expression can last for several hours and as long as 1.5 days [95]. The duration
of reporter gene response is much longer than the half-life of free ligand
within the cell because many ligand-receptor complexes have very slow off-rates.
However, ligand-receptor pairs with apparently slow off-rates can have
a relatively limited duration of response as NHR transcription complexes are
generally disassembled by chaperones and are targets of ubiquitin ligases and
proteasomes [98-101]. The effects of photoreleased antagonists to turn off gene
expression can be similarly limited by ligand off-rates and receptor proteolysis.
Even when a covalent-binding antagonist that has a very long ligand-receptor
half-life is used, gene expression is recovered over several hours as new protein
is resynthesized by the cell [96]. The long duration response observed, for at
least some ligand-receptor pairs, suggests that photocaged agonists can be
used to generate unique spatiotemporal patterns of gene expression.
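The interplay between ligand off-rate and receptor turnover can be sketched with a first-order model in which the active complex is lost by two parallel routes, ligand dissociation (rate constant k_off) and receptor degradation (rate constant k_deg), giving a half-life of ln 2 / (k_off + k_deg). The model, the function name, and the rate constants below are assumptions made for illustration; they are not measurements from the studies cited.

```python
import math


def complex_half_life(k_off, k_deg=0.0):
    """Half-life of a ligand-receptor complex lost by two first-order routes.

    k_off: ligand dissociation rate constant (per hour)
    k_deg: receptor degradation rate constant (per hour), an assumed term
    """
    total = k_off + k_deg
    if total <= 0:
        raise ValueError("at least one rate constant must be positive")
    return math.log(2) / total


# Hypothetical slow off-rate, with and without receptor turnover.
print(round(complex_half_life(0.01), 1))
print(round(complex_half_life(0.01, k_deg=0.2), 1))
```

With these illustrative numbers, a complex that would persist for days on the basis of its off-rate alone is cleared within a few hours once receptor turnover is included, consistent with the qualitative picture described above.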
The use of small molecules to activate gene expression should be compared
to methods used to photocage proteins or nucleic acids [102-106]. In general,
photocaged biopolymers are difficult to deliver into cells or organisms,
whereas caged small molecules can in principle be added in vitro or in vivo but
require the use of transfected cells or transgenic animals. Tsien et al. elegantly
demonstrated that photocaged forms of RNA and DNA can be injected into
zebrafish oocytes (single cell stage) and are sufficiently stable to be carried
into essentially all cells of the developed organism [106]. The caged RNA could
then be released in a subpopulation of cells where it is locally translated into
gene product. The use of caged nucleic acids to photoregulate gene expression
was first demonstrated by Haselton et al. in mouse models [103-105]. The
application of caged RNAs has recently been expanded to light-activated RNAi
methods by Friedman [107].
References
1. D.F. Doyle, D.J. Mangelsdorf, D.R. Corey, Modifying ligand specificity of
gene regulatory proteins, Curr. Opin. Chem. Biol. 2000, 4, 60-63.
2. L.J. Schwimmer, P. Rohatgi, B. Azizi, K.L. Seley, D. Doyle, Creation and
discovery of ligand-receptor pairs for transcriptional control with small
molecules, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 14707-14712.
3. A.R. Buskirk, D.R. Liu, Creating small-molecule-dependent switches
to modulate biological functions, Chem. Biol. 2005, 12, 151-161.
4. D.A. Braasch, D.R. Corey, Novel antisense and peptide nucleic acid
strategies for controlling gene expression, Biochemistry 2002, 41,
4503-4510.
5. S.A. Raillard, G.F. Joyce, Targeting sites within HIV-1 cDNA with a
DNA-cleaving ribozyme, Biochemistry 1996, 35, 11693-11701.
6. L. Malphettes, M. Fussenegger, Macrolide- and tetracycline-adjustable
siRNA-mediated gene silencing in mammalian cells using polymerase
II-dependent promoter derivatives, Biotechnol. Bioeng. 2004, 88, 417-425.
7. M. Mandal, M. Lee, J.E. Barrick, Z. Weinberg, G.M. Emilsson, W.L.
Ruzzo, R.R. Breaker, A glycine-dependent riboswitch that uses
cooperative binding to control gene expression, Science 2004, 306,
275-279.
8. J.E. Barrick, K.A. Corbino, W.C. Winkler, A. Nahvi, M. Mandal,
J. Collins, M. Lee, A. Roth, N. Sudarsan, I. Jona, J.K. Wickiser,
R.R. Breaker, New RNA motifs suggest an expanded scope for
riboswitches in bacterial genetic control, Proc. Natl. Acad. Sci. U.S.A.
2004, 101, 6421-6426.
9. M. Lewis, G. Chang, N.C. Horton, M.A. Kercher, H.C. Pace, M.A.
Schumacher, R.G. Brennan, P.Z. Lu, Crystal structure of the lactose
operon repressor and its complexes with DNA and inducer, Science 1996,
271, 1247-1254.
10. S.B. Baim, M.A. Labow, A.J. Levine, T. Shenk, A Chimeric Mammalian
Transactivator Based on the Lac Repressor That Is Regulated by
Temperature and Isopropyl Beta-D-Thiogalactopyranoside, Proc.
Natl. Acad. Sci. U.S.A. 1991, 88, 5072-5076.
11. M. Gossen, H. Bujard, Tight control of gene expression in mammalian
cells by tetracycline-responsive promoters, Proc. Natl. Acad. Sci.
U.S.A. 1992, 89, 5547-5551.
12. M. Gossen, A.L. Bonin, H. Bujard, Control of gene activity in higher
eukaryotic cells by prokaryotic regulatory elements, Trends Biochem.
Sci. 1993, 18, 471-475.
13. W. Weber, M. Fussenegger, Approaches for trigger-inducible viral
transgene regulation in gene-based tissue engineering, Curr. Opin.
Biotechnol. 2004, 15, 383-391.
14. W. Weber, C. Fux, M. Daoud-El Baba, B. Keller, C.C. Weber,
B.P. Kramer, C. Heinzen, D. Aubel, J.E. Bailey, M. Fussenegger,
Macrolide-based transgene control in mammalian cells and mice, Nat.
Biotechnol. 2002, 20, 901-907.
15. P. Neddermann, C. Gargioli, E. Muraglia, S. Sambucini,
F. Bonelli, R. De Francesco, R. Cortese, A novel, inducible,
eukaryotic gene expression system based on the quorum-sensing
transcription factor TraR, EMBO Rep. 2003, 4, 159-165.
16. W. Weber, M. Rimann, M. Spielmann, B. Keller, M. Daoud-El Baba,
D. Aubel, C.C. Weber, M. Fussenegger, Gas-inducible transgene
expression in mammalian cells and mice, Nat. Biotechnol. 2004, 22,
1440-1444.
17. A.C.U. Steinmetz, J.P. Renaud, D. Moras, Binding of ligands and
activation of transcription by nuclear receptors, Annu. Rev. Biophys.
Biomol. Struct. 2001, 30, 329-359.
18. A. Aranda, A. Pascual, Nuclear hormone receptors and gene
expression, Physiol. Rev. 2001, 81, 1269-1304.
19. S. Green, P. Chambon, Oestradiol induction of a glucocorticoid-responsive
gene by a chimaeric receptor, Nature 1987, 325, 75-78.
20. I.J. Lee, P.H. Driggers, J.A. Medin, V.M. Nikodem, Recombinant
Thyroid Hormone Receptor and Retinoid X Receptor Stimulate
Ligand-Dependent Transcription in vitro, Proc. Natl. Acad. Sci. U.S.A.
1994, 91, 1647-1651.
21. S.M. Pemrick, P. Abarzua, C. Kratzeisen, M.S. Marks, J.A.
Medin, K. Ozato, J.F. Grippo, Characterization of the chimeric
retinoic acid receptor RARalpha/VDR, Leukemia 1998, 12, 554-562.
22. C.C. Thompson, R.M. Evans, Trans-activation by Thyroid
Hormone receptors: functional parallels with Steroid Hormone
receptors, Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 3494-3498.
23. Y. Wang, B.W. O’Malley, Jr, S.Y. Tsai, B.W. O’Malley, A regulatory
system for use in gene transfer, Proc. Natl. Acad. Sci. U.S.A. 1994, 91,
8180-8184.
24. H.A. Greisman, C.A. Pabo, A general strategy for selecting high-affinity
Zinc finger proteins for diverse DNA targets, Science 1997, 275, 657-661.
25. Y. Choo, A. Klug, Selection of DNA binding sites for Zinc fingers using
rationally randomized DNA reveals coded interactions, Proc. Natl. Acad.
Sci. U.S.A. 1994, 91, 11168-11172.
26. D. No, T.P. Yao, R.M. Evans, Ecdysone-inducible gene expression
in mammalian cells and transgenic mice, Proc. Natl. Acad. Sci. U.S.A.
1996, 93, 3346-3351.
27. S.T. Suhr, E.B. Gil, M.C. Senut, F.H. Gage, High level transactivation by a
modified Bombyx ecdysone receptor in mammalian cells without
exogenous retinoid X receptor, Proc. Natl. Acad. Sci. U.S.A. 1998, 95,
7999-8004.
28. C.N. Cronin, B.A. Malcolm, J.F. Kirsch, Reversal of substrate charge
specificity by site-directed mutagenesis of aspartate
aminotransferase, J. Am. Chem. Soc. 1987, 109, 2222-2223.
29. A.R. Clarke, T. Atkinson, J.J. Holbrook, From analysis to
synthesis: new ligand binding sites on the lactate dehydrogenase
framework. Part I, Trends Biochem. Sci. 1989, 14, 101-105.
30. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling
signal transduction with synthetic ligands, Science 1993, 262,
1019-1024.
31. P.J. Belshaw, J.G. Schoepfer, K.-Q. Liu, K.L. Morrison, S.L. Schreiber,
Rational Design of Orthogonal Receptor-Ligand Combinations,
Angew. Chem., Int. Ed. Engl. 1995, 34, 2129-2132.
32. S.N. Ho, S.R. Biggar, D.M. Spencer, S.L. Schreiber, G.R. Crabtree,
Dimeric ligands define a role for transcriptional activation domains in
reinitiation, Nature 1996, 382, 822.
References I 1 9 3
33. A. Bishop, 0. Buzko, S. Heyeck- Katzenellenbogen, The estrogen

Dumas, I. Jung, B. Kraybill, Y. Liu, receptor: a structure-based approach
K. Shah, S. Ulrich, L. Witucki, to the design of new specific
F. Yang, C. Zhang, K.M. Shokat, hormone-receptor combinations,
Unnatural ligands for engineered Chem. Biol. 2001, 8,277-287.
proteins: new tools for chemical 43. Y. Shi, J.T. Koh, Selective regulation
genetics, Annu. Rev. Biophys. Biomol. of gene expression by an orthogonal
Stwct. 2000, 29, 577-606. estrogen receptor-ligand pair created
34. J.T. Koh, Engineering selectivity and by polar-group exchange, Chem. Biol.
discrimination into ligand-receptor 2001, 8,501-510.
interfaces, Chem. Biol. 2002, 9, 44. Y.H. Shi, J.T. Koh, Functionally
17-23. orthogonal ligand-receptor pairs for
35. S.M. Ulrich, 0. Buzko, K. Shah, K.M. the selective regulation of gene
Shokat, Towards the engineering of expression generated by
an orthogonal protein kinasel manipulation of charged residues at
nucleotide triphosphate pair, the ligand-receptor interface of ER
Tetrahedron 2000, 56, 9495-9502. alpha and ER beta, /. Am. Chem. Soc.
36. J.A. Katzenellenbogen, R. Muthyala, 2002, 124,6921-6928.
B.S. Katzenellenbogen, Nature of the 45. A.M. Brzozowski, A.C. Pike,
ligand-binding pocket of estrogen Z. Dauter, R.E. Hubbard, T. Bonn,
receptor alpha and beta: the search 0. Engstrom, L. Ohrnan, G.L.
for subtype-selective ligands and Greene, J.A. Gustafsson,
implications for the prediction of M. Carlquist, Molecular basis of
estrogenic activity, Pure Appl. Chew. agonism and antagonism in the
2003, 75,2397-2403. oestrogen receptor, Nature 1997, 389,
37. D.J. Peet, D.F. Doyle, D.R. Corey, 753-758.
D.J. Mangelsdorf, Engineering novel 46. N. Miller, J. Whelan, Random
specificities for ligand-activated mutagenesis of human estrogen
transcription in the nuclear hormone receptor ligand binding domain
receptor RXR, Chem. Biol. 1998, 5, identifies mutations that decrease
13-21. sensitivity to estradiol and increase
38. D.F. Doyle, D.A. Braasch, L.K. sensitivity to a diphenol indene-ol
Jackson, H.E. Weiss, M.F. Boehm,
compound: basis for a regulatable
expression system, 1.Steroid.
D.J. Mangelsdorf, D.R. Corey,
Engineering orthogonal
Biochem. Mol. Biol. 1998, 64,
ligand-receptor pairs from “Near
129-135.
Drugs”, /. Am. Chew. SOC.2001, 123,
47. J. Whelan, N. Miller, Generation of
11367-11371.
estrogen receptor mutants with
39. J.T. Koh, M. Putnam,
altered ligand specificity for use in
M. Tomic-Canic, C.M. McDaniel,
Selective regulation of gene establishing a regulatable gene
expression using rationally-modified expression system, /. Steroid.
retinoic acid receptors, /. Am. Chem. Biochem. Mol. Biol. 1996, 58, 3-12.
SOC.1999, 121,1984-1985. 48. D. Metzger, J. Clifford, H. Chiba,
40 A. Warshel, J. Aqvist, Electrostatic P.Chambon, Conditional
energy and macromolecular site-specific recombination in
function, Annu. Rev. Biophys. Chem. mammalian cells using a
1991, 20,267-298. ligand-dependent chimeric Cre
41. J.-K. Hwang, A. Warshel, Why ion recombinase, Proc. Natl. Acad. Sci.
pair reversal by protein engineering U.S.A. 1995, 92,6991-6995.
is unlikely to succeed, Nature 1988, 49. R. Feil, J. Brocard, B. Mascrez,
334,270-272. M. LeMeur, D. Metzger,
42. R. Tedesco, J.A. Thomas, B.S. P. Chambon, Ligand-activated
Katzenellenbogen, J.A. site-specific recombination in mice,
194
I Proc. Natl. Acad. Sci. U.S.A. 1996, 93, nongenomic effects, Pharmacol. Rev.
10887-10890. 2000,52, 513-555.
50. R. Feil, J. Wagner, D. Metzger, 60. E.J. Filardo, J.A. Quinn, K.I. Bland,
P. Chambon, Regulation of Cre A.R. Fracltelton, Estrogen-induced
recombinase activity by mutated activation of Erk-1 and Erk-2 requires
estrogen receptor ligand-binding the G protein-coupled receptor
domains, Biochem. Biophys. Res. homolog, GPR30, and occurs via
Commun. 1997,237,752-757. trans-activation of the epidermal
51. J.A.Sawicki, B. Monks, R.J. Morris, growth factor receptor through
Cell-specific ecdysone-inducible release of HB-EGF, Mol. Endocrinol.
expression of FLP recombinase in 2000, 14,1649-1660.
mammalian cells, Biotechniques 1998, 61. E.J. Filardo, J.A. Quinn, A.R.
25,868-870,872-865. Frackelton, K.I. Bland, Estrogen
52. J. Brocard, R. Feil, P. Chambon, action via the G protein-coupled
D. Metzger, A chimeric Cre receptor, GPR30: stimulation of
recombinase inducible by adenylyl cyclase and CAMP-mediated
synthetic,but not by natural ligands attenuation of the epidermal growth
of the glucocorticoid receptor, Nucleic factor receptor-to-MAPK signaling
Acids Res. 1998, 26,4086-4090. axis, Mol. Endocrinol. 2002, 16, 70-84.
53. C. Kellendonk, F. Tronche, A.P. 62. P. Thomas, Y. Pang, E. J. Filardo,
Monaghan, P.O. Angrand, J. Dong, Identity of an estrogen
F. Stewart, G. Schutz, Regulation of membrane receptor coupled to a G
Cre recombinase activity by the protein in human breast cancer cells,
synthetic steroid RU 486, Nucleic Endocrinology 2005, 146,624-632.
Acids Res. 1996, 24, 1404-1411. 63. S. Balasenthil, R.K. Vadlamudi,
54. R. Losel, M. Wehling, Nongenomic Functional interactions between the
actions of steroid hormones, Nut. estrogen receptor coactivator
PELPl/MNAR and retinoblastoma
Rev. Mol. Cell Bio. 2003, 4, 46-56.
55. M.C. Farach-Carson, I. Nemere, protein, 1.Biol. Chem. 2003, 278,
22119-22127.
Membrane receptors for vitamin D
64. F. Barletta, C.W. Wong, C. McNally,
steroid hormones: potential new
B.S. Kornm, B. Katzenellenbogen,
drug targets, Curr. Drug Targets 2003,
B.J. Cheskis, Characterization of the
4, 67-76.
interactions of estrogen receptor and
56. K. Nemere, S.E. Safford, B. Rohe,
MNAR in the activation of cSrc, Mol.
M.M. DeSouza, M.C. Farach-Carson, Endocrinol. 2004, 18,1096-1108.
Identification and characterization of 65. D.P. Edwards,
1,25D(3)-membrane-associated rapid V. Boonyaratanakornkit, Rapid
response, steroid (1,25D(3)-MARRS) extranuclear signaling by the
binding protein, J . Steroid Biochem. estrogen receptor (ER): MNAR
Mol. Biol. 2004, 89-90, 281-285. couples ER and Src to the MAP
57. R. Khoury, A.L. Ridall, A.W. kinase signaling pathway, Mol. Intern.
Norman, M.C. Farachcarson, 2003,3,12-15.
Analogs of vitamin-D(3) selectively 66. L. Li, M.P. Haynes, J.R. Bender,
activate genomic and nongenomic Plasma membrane localization and
pathways in osteoblasts, J. Bone function of the estrogen receptor
Miner. Res. 1993, 8, S220-S220. alpha variant (ER46) in human
58. P. J. Davis, F.B. Davis, Nongenomic endothelial cells, Proc. Natl. Acad.
actions of thyroid hormone on the Sci. U.S.A. 2003, 100,4807-4812.
heart, Thyroid 2002, 12,459-466. 67. D.S. Latchman, Transcription-factor
59. E. Falkenstein, H.C. Tillmann, mutations and disease, N. Engl. J.
M. Christ, M. Feuring, M. Wehling, Med. 1996,334,28-33.
Multiple actions of steroid 68. D.M. Tanenbaum, Y. Wang,
hormones - A focus on rapid, S.P. Williams, P.B. Sigler,
References I195
Crystallographic comparison of the 75. Y. Shi, H. Ye, K.H. Link, M.C.

estrogen and progesterone receptor’s Putnam, I. Hubner, S. Dowdel, J.T.
ligand binding domains, Proc. Natl. Koh, Mutant-selective thyromimetics
Acad. Sci. U.S.A. 1998, 95, for the chemical rescue of thyroid
5998-6003. hormone mutants associated with
69. I . Barroso, M. Gurnell, V.E. Crowley, resistance to thyroid hormone (RTH),
M. Agostini, J.W. Schwabe, M.A. Biochem. J . 2005, 44,4612-4626.
Soos, G.L. Maslen, T.D. Williams, 76. A. Hashimoto, Y. Shi, K. Drake, J.T.
H. Lewis, A.J. Schafer, V.K. Koh, Design and synthesis of
Chatterjee, S. O’Rahilly, Dominant complementing ligands for mutant
negative mutations in human thyroid hormone receptor
PPARgamma associated with severe TRb(R320H): a tailor-made approach
insulin resistance, diabetes mellitus towards the treatment of resistance to
and hypertension [see comments], thyroid hormone, Bioorg. Med. Chem.
Nature 1999, 402,880-883. 2005, 13(11):3627-3639 In Press.
70. M. Marcelli, M. Ittmann, S. Mariani, 77. B.A. Foster, H.A. Coffey, M.J. Morin,
R. Sutherland, R. Nigam, L. Murthy, F. Rastinejad, Pharmacological
rescue of mutant p53 conformation
Y.L. Zhao, D. DiConcini, E. Puxeddu,
and function, Science 1999, 286,
A. Esen, J. Eastham, N.L. Weigel, D.J.
2507- 25 10.
Lamb, Androgen receptor mutations
78. A.N. Bullock, A.R. Fersht, Rescueing
in prostate cancer, Cancer Res. 2000,
the function of mutant p53, Nat. Rev.
60,944-949.
Cancer 2001, I , 68-76.
71. T. Takeda, S. Suzuki, R.T. Liu, L.J.
79. V. Bernier, J.P. Morello, A. Salah-
DeGroot, Triiodothyroacetic acid has pour, M.F. Arthus, A. Laperriere,
unique potential for therapy of M. Lonergan, M. Bouvier, D.G.
resistance to thyroid hormone, J . Bichet, A pharmacological chaperone
Clin.Endocrinol. Metab. 1995, 80, acting at the V2-vasopressin receptor
2033-2040. offers a treatment for nephrogenic
72. S.A. Gardezi, C. Nguyen, P.J. Malloy, diabetes insipidus, F A S E B J . 2002,
G.H. Posner, D. Feldman, S. Peleg, A 16, A142-Al43.
rationale for treatment of hereditary 80. J.P. Morello, A. Salahpour,
vitamin D-resistant rickets with A. Laperriere, V. Bernier, M.F.
analogs of 1 alpha,25- Arthus, M. Lonergan, U. Petaja-
dihydroxyvitamin D-3,J . Biol. Chem. Repo, S. Angers, D. Morin, D.G.
2001, 276 29148-29156. Bichet, M. Bouvier, Pharmacological
73. M. Agostini, M. Gurnell, D.B. chaperones rescue cell-surface
Savage, E.M. Wood, A.G. Smith, expression and function of misfolded
0. Rajanayagam, K.T. Garnes, S.H. V2 vasopressin receptor mutants, J .
Levinson, H.E. Xu, J.W.R. Schwabe, Clin. Invest. 2000, 105, 887-895.
T.M. Willson, S. O’Rahilly, V.K. 81. S.M. Noonvez, V. Kuksa,
Chatterjee, Tyrosine Agonists reverse Y.Imanishi, L. Shu, S. Filipek,
the molecular defects associated with K. Palczewski, S. Kauushal,
dominant-negative mutations in Pharmacological
human peroxisome Chaperone-mediated in vivo folding
proliferator-activated receptor and stabilization of the P23H-opsin
gamma, Endocrinology 2004, 145, mutant associated with Autosomal
1527-1538. Dominant Retinitis Pigmentosa, J .
74. H.F. Ye, K.E. O’Reilly, J.T. Koh, A Biol. Chew. 2003,278,14442-14450.
subtype-selective thyromimetic 82. A.R. Sawkar, W.C. Cheng, E. Beutler,
designed to bind a mutant thyroid C.H. Wong, W.E. Balch, J.W. Kelly,
hormone receptor implicated in Chemical chaperones increase the
resistance to thyroid hormone, J . Am. cellular activity of N370S
Chem. Soc. 2001, 223,1521-1522. beta-glucosidase: a therapeutic
196
I strategy for Gaucher Disease, Proc. the life of a mouse, Development
Natl. Acad. Sci. U.S.A. 2002, 99, 2002, 129,815-829.
15428-15433. 94. F.G. Cruz, J.T. Koh, K.H. Link,
83. F.E. Cohen, J.W. Kelly, Therapeutic Light-activated gene expression, J .
approaches to protein-misfolding Am. Chem. SOC.2000, 122,
diseases, Nature 2003, 426,905-909. 877778778,
84. S.L. Swann, J. Bergh, M.C. Farach- 95. K.H. Link, F.G. Cruz, H.-F. Ye,
Carson, C.A. Ocasio, J.T. Koh, K. O’Reilly, S. Dowdell, J.T. Koh,
Stmcture-based design of selective Photo-caged agonists of the nuclear
agonists for a rickets-associated receptors RARg and TRb provide
mutant of the vitamin D receptor, J . unique time-dependent gene
Am. Chem. SOC.2002, 124, expression profiles for light-activated
13795- 13805. gene patterning, Bioorg. Med. Chem.
85. S.L. Swann, J.J. Bergh, M.C. 2004, 12,5949-5959.
Farach-Carson, J.T. Koh, Rational 96. Y.H. Shi, J.T. Koh, Light-activated
design of vitamin D-3 analogues transcription and repression by using
which selectively restore activity to a photocaged SERMs, Chembiochem
vitamin D receptor mutant associated 2004,5,788-796.
with rickets, Org. Lett. 2002, 4, 97. W.Y. Lin, C. Albanese, R.G. Pestell,
3863-3866. D.S. Lawrence, Spatially discrete,
86. E. Baldwin, W.A. Baase, X.J. Zhang, light-driven protein expression,
V. Feher, B.W. Matthews, Generation Chem. Biol. 2002, 9,1347-1353.
of ligand binding sites in T4 98. 2 . Nawaz, D.M. Lonard, A.P. Dennis,
lysozyme by deficiency-creating C.L. Smith, B.W. O’Malley,
substitutions, J . Mol. Biol. 1998, 277, Proteasome-dependent degradation
467 -485. of the human estrogen receptor, Proc.
87. Q. Lin, C.F. Barbas, P.G. Schultz, Natl. Acad. Sci. U.S.A. 1999, 96,
Small-molecule switches for zinc 1858-1862.
finger transcription factors, J . Am. 99. A. Dace, L. Zhao, K.S. Park,
Chem. Soc. 2003, 125, 612-613. T. Fumno, N. Takamura,
88. L.L. Looger, M.A. Dwyer, J.J.Smith, M. Nakanishi, B.L. West, J.A.
H.W. Hellinga, Computational Hanover, S. Cheng, Hormone
design of receptor and sensor binding induces rapid
proteins with novel functions, Nature proteasome-mediated degradation of
2003,423,185-190. thyroid hormone receptors, Proc.
89. M. Allert, S.S. Rizk, L.L. Looger, Natl. Acad. Sci. U.S.A. 2000, 97,
H.W. Hellinga, Computational 8985-8990.
design of receptors for an 100. D.L. Osburn, G. Shao, H.M. Seidel,
organophosphate surrogate of the I.G. Schulman, Ligand-dependent
nerve agent soman, Proc. Natl. Acad. degradation of retinoid X receptors
Sci. U.S.A.2004, 101,7907-7912, does not require transcriptional
90. X. Yang, J.G. Saven, Computational activity or coactivator interactions,
combinatorial protein design: Mol. Cell. Biol. 2001, 21, 4909-4918.
sequence search and statistical 101. M. Qiu, C.A. Lange, MAP kinases
design, Abstr. Pap. Am. Chem. SOC. couple multiple functions of human
2004,228, U523-US23. progesterone receptors: degradation,
91. J.G. Saven, Combinatorial protein transcriptional synergy, and nuclear
design, Curr. Opin. Struct. Biol. 2002, association, J. Steroid Biochem. Mol.
12,453-458. Biol. 2003, 85, 147-157.
92. C. Tickle, Patterning i n Vertibrate 102. K. Curley, D.S. Lawrence,
Development, Vol. 41, Oxford Light-activated proteins, Curr. Opin.
University Press, Oxford, 2003. Chem. Biol. 1999, 3, 84-88.
93. M. Zernicka-Goetz, Patterning of the 103. M.S. Chang, F.R. Haselton, Light -
embryo: the first spatial decisions in activated protein expression using
References I 197
caged transfected plasmid 11: delivery Targeting expression with light using
by gene gun to organ cultured caged DNA, 1.Biol. Chem. 1999, 274,
corneas, Invest. Ophthalmol. Vis. Sci. 20895-20900.
1997,38,2083-2083. 106. H. Ando, T. Fumta, R.Y. Tsien,
104. F.R. Haselton, W.C. Tseng, M.S. H. Okamoto, Photo-mediated gene
Chang, Light activated protein activation using caged RNA/DNA in
expression using caged transfected zebrafish embryos, Nat. Genet. 2001,
plasmid I: delivery by liposomes to 28,317-325.
cultured retinal endothelium, Invest. 107. S. Shah, S. Rangarajan, S.H.
Ophthalmol. Vis. Sci. 1997, 38, Friedman, Light-activated RNA
2082-2082. interference, Angew. Chem. Int. Ed
105. W.T. Monroe, M.M. McQuain, M.S. 2005,44,1328-1332.
Chang, J.S. Alexander, F.R. Haselton,
4
Controlling Protein-Protein Interactions
4.1
Chemical Complementation: Bringing the Power of Genetics to Chemistry
Pamela Peralta-Yahya and Virginia W. Cornish
Outlook
Genetics in many ways is the underpinning of modern cell biology, having
provided a straightforward experimental approach to identify the proteins
involved in a given biological pathway. As practised, however, genetics leaves
us with a picture of the cell composed largely of proteins. The roles of other
molecules, such as phosphoinositides or siRNAs, have long been overlooked.
With growing interest in developing a complete description of a living cell
and with the backdrop of the genome sequencing projects, the question would
seem to be how to extend the ease of genetics to these other classes of
molecules. With a complete palette, it would then be possible to fully harness
the powerful synthetic and functional capabilities of the cell for chemistry
beyond that naturally carried out by the cell (Fig. 4.1-1). Here we consider a
particular genetic assay, the yeast two-hybrid assay, in light of these challenges.
4.1.1
Introduction
The two-hybrid assay, which detects protein-protein interactions as
reconstitution of a transcriptional activator, provides a general,
high-throughput assay for cloning any protein on the basis of its interaction
with another protein. Introduced only in 1989, the two-hybrid assay has proven
so robust that today roughly half of the known protein-protein interactions
have been determined in part using the two-hybrid assay. In this chapter, we
look at more recent efforts to extend this powerful genetic assay to read out
the other important molecules in the cell, such as nucleic acids and small
molecules. We also consider the possibilities for exploiting the two-hybrid
assay for chemical discovery, extending the power of genetics to chemistry not
naturally carried out in the cell.
Fig. 4.1-1 Chemical Complementation combines the power of genetic assays and
small molecule chemistry to understand small molecule function and develop new
chemistry inside the cell.
ISBN: 978-3-527-31150-7
The two-hybrid assay detects a protein-protein interaction as the
reconstitution of a transcriptional activator, a natural eukaryotic
transcription factor, and hence as activation of a reporter gene. One protein
is fused to the DNA-binding domain (DBD) of the transcriptional activator, and
the other protein is fused to the activation domain (AD). If the two proteins
bind to one another, they effectively dimerize and hence reconstitute the
transcriptional activator (Fig. 4.1-2). In practice, this assay is used not
just to test a single protein-protein interaction, but to test all of the
proteins expressed in a given organism or cell line for binding to the protein
of interest. A library of AD-fusion proteins, encoding all ca. 10^4 different
proteins, is transformed en masse
into an appropriate two-hybrid selection strain containing the DBD-protein
fusion of interest. Only cells expressing an AD-protein fusion that binds
to the DBD-protein fusion will then survive under the appropriate reporter
gene selection conditions. The assay is general because the transcription-
based selection works for any protein-protein interaction. Therefore, while
traditional genetic assays rely on pathway-specific cell survival selections or
phenotypic screens, to which not all pathways or proteins in a pathway are
amenable, the two-hybrid assay can be applied to any given protein-protein
interaction, since the transcription-based read-out is independent of the
particular pathway being studied. The assay is high-throughput because
standard molecular biology techniques allow large libraries (ca. 10^5-10^7
in yeast) to be tested simultaneously, where only the cells expressing an
interacting protein pair survive.
Fig. 4.1-2 In the yeast two-hybrid system, dimerization of fusion proteins
X-DNA-binding domain and Y-activation domain reconstitutes the transcriptional
activator. The reconstituted transcriptional activator recruits the
transcriptional machinery to the promoter region of the reporter gene,
initiating its transcriptional activation.
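The selection logic described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any laboratory protocol: the names `Clone`, `reporter_on`, and `select`, the placeholder preys `ProteinA` and `ProteinB`, and the interaction table are all hypothetical, and binding is treated as a simple yes/no lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """One yeast cell: the bait (DBD fusion) plus one prey (AD fusion)."""
    bait: str
    prey: str


def reporter_on(clone, interactions):
    """The reporter is transcribed only if bait and prey bind each other,
    which brings the DBD and the AD together at the promoter."""
    return frozenset((clone.bait, clone.prey)) in interactions


def select(bait, library, interactions):
    """Return the preys whose host cells would survive reporter selection."""
    return [prey for prey in library
            if reporter_on(Clone(bait, prey), interactions)]


# Hypothetical toy data: one real pair from the text, two made-up preys.
known_interactions = {frozenset(("SNF1", "SNF4"))}
library = ["SNF4", "ProteinA", "ProteinB"]

survivors = select("SNF1", library, known_interactions)
print(survivors)
```

Because the read-out depends only on whether the two fusions bind, the same function works for any bait, which is the sense in which the assay is general.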
The other strength of the two-hybrid assay is the ease with which it can
be carried out using modern methods in molecular biology. At the end of
a two-hybrid selection, the interacting proteins can be read out simply by
extracting the DNA encoding the AD-fusion proteins from the surviving cells
and sequencing it. As proof of the power of this approach, the
two-hybrid assay is now essential to any effort to clone proteins along a given
biological pathway. Moreover, the fortuitous development of the two-hybrid
assay concurrent with the genome sequencing projects enables the construction
of exact cDNA-AD libraries based on these data, so that protein identity can
be readily extracted from a random DNA library. The high-throughput nature
of the two-hybrid assay even allows protein interaction studies to be carried out
on a genome-wide scale, for example, by analyzing all ca. 6000 proteins expressed
in yeast for binding to one another by testing all 6000 DNA-binding protein
fusions against their 6000 AD counterparts.
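To give a sense of the scale implied by a genome-wide screen, the short sketch below counts the bait-prey combinations for the approximate figure of 6000 yeast proteins quoted above. The function names are hypothetical, and the count assumes every protein is tested both as a DBD fusion and as an AD fusion.

```python
def ordered_pairs(n):
    """Bait-prey combinations when each protein is tested as both
    a DBD fusion and an AD fusion (self-pairs included)."""
    return n * n


def unordered_pairs(n):
    """Distinct protein pairs, counting each self-pair once."""
    return n * (n + 1) // 2


N_YEAST_PROTEINS = 6000  # approximate figure quoted in the text

print(ordered_pairs(N_YEAST_PROTEINS))    # 36000000
print(unordered_pairs(N_YEAST_PROTEINS))  # 18003000
```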
As with the field of genetics as a whole, the two-hybrid assay is biased
toward proteins. As variations of this assay that can detect DNA, RNA,
and small molecule binding are now being developed, it is exciting to imagine
the potential for basic science discovery of the roles of these molecules in
the cell. Furthermore, these so-called n-hybrid assays extend these powerful
transcription-based genetic assays to chemistry not naturally carried out in
the cell. This extension should allow these genetic assays to be used not only
for the discovery of biological pathways but also for new chemistry, including
drug discovery and the directed evolution of molecules with new functional
properties.
4.1.2
History/Development
Since the conception of the two-hybrid assay to detect protein-protein
interactions in vivo at the end of the 1980s, key modifications to this assay
have expanded its scope to detect DNA-, RNA-, and small molecule-protein
interactions in so-called n-hybrid assays. More recently, “n-hybrid” assays
have also been used to detect enzyme catalysis, where enzyme activity is
linked to cell survival via transcription of a reporter gene. Here we look at the
initial publications that moved the two-hybrid assay into each of these new
directions.
4.1.2.1 Protein-Protein Interactions

In 1989, Fields and Song introduced the “Yeast Two-Hybrid Assay”,
which provides a straightforward method for detecting protein-protein
interactions in vivo [1]. Until the development of the two-hybrid methodology,
protein-binding interactions had been detected using traditional biochemical
techniques such as coimmunoprecipitation, affinity chromatography, and
photoaffinity labeling [2]. There are three significant advantages to this in vivo
assay that led almost immediately to its widespread use: first, it is technically
straightforward and can be carried out rapidly; second, the sequence of the
two interacting proteins can be read off directly from the DNA sequence of the
plasmids encoding them; and third, it does not depend on the identity of the
interacting proteins and so is general.
The two-hybrid assay was based on the observation that eukaryotic
transcriptional activators can be dissected into two functionally independent
domains, a DBD and a transcription AD, and that hybrid transcriptional
activators can be generated by mixing and matching these two domains [3].
It appears that the DBD only needs to bring the AD into the proximity of the
transcription start site, suggesting that the linkage between the DBD
and the AD can be manipulated without disrupting activity. Thus, the linkage
in the two-hybrid assay is the noncovalent bond between the two interacting
proteins.
As outlined in Fig. 4.1-3(a), the yeast two-hybrid system consists of two
protein chimeras, and a reporter gene downstream from the binding site for
the transcriptional activator. If the two proteins of interest (X and Y) interact,
they effectively dimerize the DNA-binding protein chimera (DBD-X) and the
transcription activation protein chimera (AD-Y). Dimerization of the DBD and
the transcription AD helps to recruit the transcription machinery to a promoter
adjacent to the binding site for the transcriptional activator, thereby activating
transcription of the reporter gene.
Fig. 4.1-3 Different yeast n-hybrid systems that have been developed to study
protein-protein, protein-DNA, protein-RNA, and protein-small molecule
interactions. (a) In the original version of the yeast two-hybrid system,
transcriptional activation of the reporter gene is reconstituted by recruitment
of the activation domain (AD) to the promoter region through direct interaction
of protein X and Y, since protein X is fused to a DNA-binding domain (DBD) and
protein Y is fused to the AD. (b) In the one-hybrid system, the AD is fused
directly to the DBD. This system can be used to assay either DBDs that can bind
to a specific DNA sequence or the in vivo binding site for a given DBD. (c) The
three-hybrid system that can detect RNA-protein interactions has one more
component than the yeast two-hybrid system: a hybrid RNA molecule. One half of
the hybrid RNA is a known RNA (R) that binds to the MS2 coat protein (MS2) with
high affinity and serves as an anchor. The other half is RNA X, whose
interaction with protein Y is being tested. (d) Another version of the yeast
three-hybrid system can be used to detect small molecule-protein interactions.
Ligand L1 that interacts with protein X is covalently linked to ligand L2.
Thus, if L2 interacts with Y, transcriptional activation of the reporter gene
will be reconstituted.
The assay was demonstrated initially by using two yeast proteins known to
be physically associated in vivo [1]. The yeast SNF1 protein, a serine-threonine
protein kinase, was fused to the GAL4 DBD, and the SNF1 activator protein
SNF4 was fused to the GAL4 transcription AD. A GAL4 binding sequence was
placed upstream of a β-galactosidase reporter gene (lacZ). Plasmids encoding
the protein fusions and the reporter gene were introduced into the yeast.
Positive protein-protein interactions lead to an increase in β-galactosidase
activity inside the cell, which can be tested in a colorimetric assay using
5-bromo-4-chloro-3-indolyl β-D-galactopyranoside (X-gal), which turns the cells
blue, or by direct measurement of enzyme activity using chlorophenol red β-D-
galactopyranoside as a substrate. Control experiments established that neither
the DBD and AD domains on their own nor the individual protein chimeras
induced β-galactosidase synthesis above background levels. β-Galactosidase
synthesis levels were increased 200-fold when the DBD-SNF1 and SNF4-AD
fusion proteins were introduced together. By comparison, the direct DBD-AD
fusion protein activated β-galactosidase synthesis levels 4000-fold.
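The two fold-activation figures quoted above can be put on a common scale with a one-line calculation. The sketch below is only arithmetic on the numbers given in the text; the function name `relative_activity` is hypothetical.

```python
def relative_activity(fold_hybrid, fold_direct):
    """Reporter activation by the reconstituted activator as a fraction
    of the activation achieved by the direct DBD-AD fusion."""
    return fold_hybrid / fold_direct


ratio = relative_activity(200, 4000)  # fold activations quoted in the text
print(f"{ratio:.0%}")  # 5%
```

On these figures, the noncovalently reconstituted activator reaches about one-twentieth of the activation of the covalent fusion, while still giving a signal well above background.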
It was quickly realized that the strength of the two-hybrid assay would lie
not in its ability to detect a single protein-protein interaction but rather in
its ability to screen an entire genome to detect novel protein-protein
interactions [4-9]. For example, Murray and coworkers, as a first step toward
testing their hypothesis that the cyclin-dependent kinase (CDK) Cdc20 is
involved in the spindle assembly checkpoint in budding yeast, used the yeast
two-hybrid assay to determine if any of the proteins known to be involved in
the spindle checkpoint physically interact with Cdc20 [10]. In this experiment,
haploid strains containing DBD-MAD (mitotic arrest defective) fusions were
crossed with haploid strains containing AD-Cdc20 fusions. Protein-protein
interactions in the resulting diploids lead to transcription activation of the
lacZ reporter gene. As controls, haploid strains containing SNF1-AD and
SNF4-DBD fusions were also mated and tested for β-galactosidase activity. The
yeast two-hybrid system detected three new protein partners for Cdc20: MAD1,
MAD2, and MAD3. In this experiment, the yeast two-hybrid assay was the key to
rapidly and effectively identifying the new protein-protein interactions.
Identification of these interactions using more traditional biochemical
methods, such as coimmunoprecipitation, would have been cumbersome and time
consuming since those methods require prior isolation of large quantities of
all possible interacting proteins before running the assays. By facilitating
the discovery of cascades of interacting proteins (in this case, the spindle
assembly checkpoint), the yeast two-hybrid assay helps researchers to put
together entire biochemical pathways and to begin understanding how these
proteins function together inside a cell.
4.1.2.2 DNA-Protein Interactions

Early on it was appreciated that, just as the yeast two-hybrid assay could be
used to detect protein-protein interactions, transcriptional activators could
be used directly, in a “one-hybrid” assay, to detect DNA-protein interactions
(Fig. 4.1-3(b)) [11, 12]. DNA-binding proteins that bind to a given target DNA
sequence could be isolated from cDNA libraries encoding all the proteins
expressed in a given organism or specific cell type. Alternatively, the optimal
or naturally occurring recognition sequences for a given regulatory protein
could be determined. With such an approach, Wang and Reed isolated a
complementary DNA for the transcriptional activator, Olf-1, believed to be the
critical switch for the coordinated expression of olfactory-specific genes [13].
To achieve this, they fused an olfactory cDNA library, consisting of 3.6 million
clones, to the GAL4 transcription AD. The reporter plasmid consisted of three
tandem Olf-1 binding sites upstream of a low activity promoter directing the
transcriptional activation of the HIS3 gene. Transcription of the HIS3 gene
thus requires an AD-cDNA fusion protein to bind to the Olf-1 sites. Therefore,
only cells expressing an AD-cDNA fusion that binds these sites are able to
grow on medium lacking histidine.
4.1.2.3 RNA-Protein Interactions

Selecting for RNA-protein interactions is less straightforward because RNA-
protein fusions cannot be generated directly in vivo and because routine
biochemical assays that turn RNA-binding events into an amplifiable signal
are not available. These difficulties were circumvented by adding a third
component to the two-hybrid system to generate a “three-hybrid” assay
(Fig. 4.1-3(c)) [14, 151. The third component is a hybrid RNA molecule, in
which one half is a well-studied RNA molecule that binds to a known protein
with high affinity and the other half is the RNA molecule of interest whose
protein-binding partner is in question. In total, the three-hybrid system consists
of two protein chimeras, one RNA chimera, and a reporter gene. The hybrid
RNA molecule bridges the DNA-binding and AD-fusion proteins and activates
transcription of a reporter gene.
In a proof of principle experiment, Wickens and coworkers showed that
the RNA three-hybrid system could detect the interactions between two
well-studied protein-RNA pairs: the iron regulatory protein (IRP1) with the iron
response element (IRE) RNA sequence, and the HIV transactivator (TAT)
protein with the HIV transactivation response (TAR) element RNA sequence
[16]. First, they constructed a bifunctional RNA containing an RNA sequence
known to bind the coat protein MS2 and the RNA sequence of either IRE
or TAR. Next, they fused the DNA-binding domain to the coat protein MS2,
and the AD to either the IRP1 or TAT proteins. The two protein fusions and
the bifunctional RNA were introduced into a yeast strain containing a reporter
construct that directs activation of both a lacZ reporter gene and a HIS3
reporter gene upon RNA-protein interaction. These reporter genes allowed the
authors to carry out the assay both as a colorimetric screen, using the lacZ
reporter gene, and as a selection in which only cells containing an interacting
RNA-protein pair survive on medium lacking histidine. Furthermore, using
3-amino-1,2,4-triazole (3-AT), a competitive inhibitor of the enzyme encoded by
the HIS3 gene, Wickens and coworkers were able to select only cells with elevated
expression levels of the HIS3 gene, reducing the number of false positives in
the HIS3 growth selection.
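The way 3-AT raises the bar for growth can be illustrated with a small threshold model. This is a sketch under loud assumptions: the expression levels are arbitrary units invented for illustration, the `basal` value stands in for leaky HIS3 expression, and the function names are hypothetical rather than taken from the original study.

```python
# RNA-protein pairs described in the text as interacting.
RNA_PROTEIN_PAIRS = {("IRE", "IRP1"), ("TAR", "TAT")}


def bridge_forms(rna_x, protein_y):
    """The hybrid RNA is anchored to DBD-MS2 through its MS2-binding half;
    the bridge is complete only if protein Y binds the RNA X half."""
    return (rna_x, protein_y) in RNA_PROTEIN_PAIRS


def his3_level(rna_x, protein_y, basal=1.0, induced=50.0):
    """HIS3 expression in arbitrary, made-up units: leaky basal expression
    without a bridge, strong expression with one."""
    return induced if bridge_forms(rna_x, protein_y) else basal


def grows_without_histidine(level, three_at_threshold):
    """Cells grow only if HIS3 expression exceeds the level that the
    applied 3-AT concentration can inhibit (hypothetical threshold)."""
    return level > three_at_threshold


mismatched = his3_level("IRE", "TAT")
matched = his3_level("IRE", "IRP1")

print(grows_without_histidine(mismatched, 0.0))  # no 3-AT: leaky growth
print(grows_without_histidine(mismatched, 5.0))  # with 3-AT: suppressed
print(grows_without_histidine(matched, 5.0))     # true pair still grows
```

In this toy model, the mismatched pair grows when no 3-AT is present because of leaky expression, which is the kind of false positive that the inhibitor is used to remove.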
4.1.2.4 Small molecule-Protein Interactions

Just as a dimeric RNA molecule can be introduced to mediate the interaction
between the DNA-binding and activation domains, so can a dimeric small
molecule [17]. In fact, well before their use in a small molecule three-hybrid
assay, dimeric small molecules were used as “chemical inducers of dimerization”
(CIDs) to artificially oligomerize fusion proteins in vivo [18]. In 1996,
Licitra and Liu built what they called a yeast three-hybrid assay [19], in
which a heterodimeric small molecule CID brings two fusion proteins together
to activate the transcription of a reporter gene (Fig. 4.1-3(d)).
Licitra and Liu employed two fusion proteins: the glucocorticoid receptor
(GR) fused to the DBD LexA, and FK506-binding protein (FKBP12) fused to
the transcription AD B42 [19]. A heterodimeric dexamethasone (Dex)-FK506
molecule that binds to GR and FKBP12, respectively, bridges the two fusion
proteins and activates the transcription of a lacZ reporter gene. Further,
using the GR-LexA fusion protein and the Dex-FK506 molecule in their yeast
three-hybrid assay, Licitra and Liu were able to isolate the FKBP isoform
with the highest affinity for FK506 (FKBP12) from a Jurkat cDNA library.
This experiment opened the yeast three-hybrid system as a tool for drug
discovery.
4.1.2.5 Catalysis
In all the previous applications, the n-hybrid assay is used to detect a binding
event, whether it is protein, DNA, RNA, or small molecule binding. Our
laboratory and others have been interested in the idea that this powerful
genetic assay could be brought to bear on a broader variety of questions.
Several different approaches have now been devised for linking enzyme
catalysis to reporter gene transcription using the n-hybrid assay. Our laboratory
introduced “Chemical Complementation”, which detects enzyme catalysis of
bond formation or cleavage reactions on the basis of covalent coupling of two
small molecule ligands in vivo (Fig. 4.1-4) [20]. In this assay, the enzyme is
introduced as a fourth component to the small molecule yeast three-hybrid
system, and the linker in the small molecule CID acts as the substrate for the
enzyme. Bond formation is detected as synthesis of the CID and hence the
activation of an essential reporter gene; bond cleavage is detected as cleavage
of the CID and hence the repression of a toxic reporter gene. In theory, this
approach should be readily extended to new chemistry, simply by synthesizing
small molecule heterodimers with different chemical linkers as the enzyme
substrates. Inspired by traditional genetics, our hope is to make a general
complementation assay that would link enzyme catalysis of a broad range of
chemical reactions to cell survival-extending genetic selections to chemistry
beyond that naturally carried out in the cell.
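
The selection logic described in this paragraph can be summarized as a small truth table. The Python sketch below is an idealized Boolean model under stated assumptions, not the authors' implementation: the function names `cid_intact` and `cell_survives` are hypothetical, and real reporter read-outs are graded rather than all-or-nothing.

```python
def cid_intact(reaction: str, enzyme_active: bool) -> bool:
    """Return True if a bridging CID is present in the cell.

    'formation': the enzyme must couple the two ligands to make the CID.
    'cleavage':  the CID is supplied intact and an active enzyme cuts it.
    """
    if reaction == "formation":
        return enzyme_active
    if reaction == "cleavage":
        return not enzyme_active
    raise ValueError(f"unknown reaction type: {reaction!r}")


def cell_survives(reaction: str, enzyme_active: bool) -> bool:
    """Idealized selection outcome for the two assay formats.

    Bond formation is coupled to an essential reporter (reporter ON -> growth);
    bond cleavage is coupled to a toxic reporter (reporter OFF -> growth).
    """
    reporter_on = cid_intact(reaction, enzyme_active)
    if reaction == "formation":
        return reporter_on
    return not reporter_on


if __name__ == "__main__":
    for reaction in ("formation", "cleavage"):
        for enzyme_active in (False, True):
            print(reaction, enzyme_active, cell_survives(reaction, enzyme_active))
```

In this simplified picture the cell survives only when the enzyme is active, whichever direction the reaction runs, which is what allows catalysis to be tied to a growth selection.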
Fig. 4.1-4 Chemical Complementation. A reaction-independent complementation
assay for enzyme catalysis based on the yeast three-hybrid assay. A heterodimeric
small molecule bridges a DNA-binding domain-receptor fusion protein and an
activation domain-receptor fusion protein, activating transcription of a downstream
reporter gene in vivo. Enzyme catalysis of either cleavage or formation of the bond
between the two small molecules can be detected as a change in transcription of the
reporter gene. The assay can be applied to new chemical reactions simply by
synthesizing small molecules with different substrates as linkers and adding an enzyme
as a fourth component to the system.

In our initial report, we chose cephalosporin hydrolysis by the Enterobacter
cloacae P99 β-lactamase (P99) as a well-studied enzyme-catalyzed
cleavage reaction around which to develop Chemical Complementation [20].
Cephalosporins are β-lactam antibiotics, and β-lactamases are the bacterial
resistance enzymes that hydrolyze and inactivate these antibiotics. The P99
β-lactamase is well characterized biochemically and structurally, and the
synthesis of cephalosporins is well established. First, we designed a small molecule
CID cephalosporin substrate, incorporating the CID ligands at the C3′ and
C7 positions of the cephem core. Using a lacZ reporter gene, we showed
that Chemical Complementation could be used to detect β-lactamase activity
using this dexamethasone-methotrexate (Dex-Mtx) heterodimer with a cephem
linker (Dex-Cephem-Mtx). In the absence of enzyme, the Dex-Cephem-Mtx
CID dimerizes the appropriate DBD- and AD-fusion proteins, activating
transcription of a lacZ reporter gene. Expression of the P99 β-lactamase then
presumably leads to cleavage of the Dex-Cephem-Mtx CID, disrupting
transcription activation. We also showed that the system could distinguish the
wild-type (wt) enzyme from the inactive P99:S64A variant, in which the critical
active site serine nucleophile has been mutated to an alanine, via a lacZ
screen. These experiments established the feasibility of detecting enzyme
catalysis using the yeast n-hybrid assay.
Benkovic and coworkers took a related approach in an assay they called
Quest (Querying for Enzymes using the Three-hybrid system), which detects
catalysis by coupling substrate turnover to transcription of a reporter gene
[21]. Here, the CID that dimerizes the transcriptional activator is a homodimer
of the substrate. Enzyme catalysis of free substrate to product is detected as
displacement of homodimeric CID substrate from the transcriptional activator
fusion proteins. Although this approach has the advantage of using unmodified
substrate, a new CID-protein pair has to be developed for each new reaction.
In a more biological approach, Peterson and coworkers have developed a
two-hybrid-based system to detect protein tyrosine kinase (PTK) activity [22].
This assay relies on the PTK-dependent phosphorylation of a tyrosine residue
present in a peptide that has been fused to the DBD. The phosphorylated
tyrosine is then bound by the phosphotyrosine-binding protein fused to the
AD, leading to transcriptional activation of the reporter gene. While limited
to peptide substrates, this approach has the advantage that it does not require
chemical synthesis, making it more accessible to biologists.
4.1.3
Whether being applied as in the original two-hybrid assay to detect

protein-protein interactions or in the related n-hybrid assays to detect
protein-DNA, RNA, or small molecule interactions, the basic components
of the n-hybrid assay remain the same. Thus, while we focus in this section on
the small molecule three-hybrid assay because it is in this that our laboratory
specializes, this section could also be used as a technical introduction to any of
the other n-hybrid systems. The real strength of the n-hybrid assays lies in how
straightforward they are to implement in the laboratory with basic knowledge
of Escherichia coli and Saccharomyces cerevisiae molecular biology. Moreover, the
commercial availability of the components of the two-hybrid system permits
any laboratory to rapidly implement the system. Finally, laboratories without
prior experience working with S. cerevisiae should not be deterred from carrying
out n-hybrid assays, as molecular biology techniques for this organism are
similar to those for E. coli.
4.1.3.1 The Chemical Inducer of Dimerization (CID)

The effectiveness of any three-hybrid system depends critically on the CID used
to dimerize the transcriptional activator in vivo [23,24]. The subject of CIDs has
been considered fully in the previous chapter by Clackson, so here we focus
on the issues we have found particularly important for the use of CIDs in the
three-hybrid assay. Our presentation of these considerations is based largely on
our own work with the yeast three-hybrid system and the CID ligand/receptor
pairs Dex/GR, FK506/FK506-binding protein 12 (FK506/FKBP12), SLF (a synthetic
analog of FK506)/FK506-binding protein 12 (SLF/FKBP12),
methotrexate/dihydrofolate reductase (Mtx/DHFR), O6-benzylguanine/O6-alkylated
guanine-DNA alkyltransferase (BG/AGT), estrone/estrogen receptor
(ES/ER), and biotin/streptavidin (biotin/SA) (Fig. 4.1-5) [19, 23-28].
Fig. 4.1-5 Small molecules used to create chemical inducers of dimerization (CIDs) for
the yeast three-hybrid system. Structures shown: dexamethasone, FK506, SLF,
trimethoprim, estrone, and biotin.

First and foremost, a successful three-hybrid system seems to require a
high-affinity (low nanomolar KD) CID pair [29]. Using the most sensitive
reporter genes commercially available for the Brent LexA yeast three-hybrid
system, we found that FK506-Dex, Mtx-Dex, Mtx-Mtx, and Mtx-SLF could all
activate transcription, but Dex-Dex and Dex-SLF could not [25]. Second, the
directionality of the system is important for a strong transcription read-out. We
reported that the Dex-Mtx yeast three-hybrid system showed higher levels of
transcription activation when DHFR was fused to the DBD than when fused to
the AD [30]. Third, as with any CID application, the ligand/receptor pair must
be considered in the context of the host cell line. For example, the Dex/GR
interaction is dependent on associated heat shock proteins. Thus, the KD of
this interaction is significantly higher in S. cerevisiae, in which there are only
homologous heat shock proteins, than in the native mammalian background.
Also, this CID pair cannot be used in E. coli, in which there are no such
homologous heat shock proteins. Finally, there are also more subtle effects.
For example, for reasons we do not understand, only the E. coli DHFR, not the
murine homolog, is functional in the Dex-Mtx yeast three-hybrid system [30].
4.1.3.2 The Genetic Assay

For a laboratory new to the three-hybrid assay, we recommend beginning with
the yeast two-hybrid system, which is based on reconstitution of a eukaryotic
transcriptional activator protein. Not only is this assay straightforward to
practice but also all the necessary strains and plasmids are commercially
available. As discussed below, however, there are potential advantages to
working in E. coli or using a nontranscription-based assay. Several E. coli-
based transcription assays and general protein complementation assays (PCA)
have now been developed as two-hybrid assays. Notably, while the E. coli
transcription assays have proven amenable to the introduction of small
molecule CIDs, the PCAs have not.
4.1.3.2.1 The Yeast n-Hybrid System

There are two key versions of the yeast two-hybrid system. The GAL4 system
originally introduced by Fields and Song uses the DBD and the AD of the
yeast GAL4 gene [1]. The LexA system introduced by Brent and coworkers
uses the E. coli DBD LexA and the E. coli B42 AD [31]. Over time, these
two systems have benefited from a number of improvements. Convenient
DBD and AD vectors were developed to carry diverse bacterial drug-resistance
markers, yeast origins of replication, and yeast auxotrophic markers. These
technical improvements facilitate the testing of large pools of protein variants
(ca 10⁶) using growth selections. In addition to the basic activator system,
reverse and split-hybrid systems were developed to detect the disruption of
protein-protein interactions, and a transcriptional repressor-based system has
been reported [32, 33]. Today components for these systems are commercially
available from vendors including Stratagene and Clontech, which market the Gal4 system,
Origene, for the LexA system, and Invitrogen, which offers versions of both
systems. All of the basic features of the two-hybrid system have been covered
already in several excellent reviews and the chapters on methods.
In our laboratory we have used the Brent two-hybrid system to build our
Dex-Mtx yeast three-hybrid system. We favor the Brent system, which uses
LexA, an E. coli transcription factor, and B42, an artificial activator isolated
from E. coli genomic DNA. Both LexA and B42 are orthogonal to standard
yeast genetic tools and nontoxic to the yeast cell, yet the artificial LexA-B42
transcriptional activator is on par with the strongest transcriptional activators
endogenous to S. cerevisiae [31]. Moreover, the LexA system permits the use of
the tightly regulated GAL1 promoter to drive the expression of the LexA DBD
and B42 AD-protein fusions by varying the ratio of galactose and glucose in the
growth medium. As reported by Lin et al., we use pMW103, a multicopy 2μ
plasmid with a HIS3 marker, to encode the LexA DBD fusions and pMW102,
a multicopy 2μ plasmid with a TRP1 marker, to encode the B42 AD fusions.
Rather than the original EGY48 LEU2 selection strain, we chose the FY251
strain (MATa trp1Δ63 his3Δ200 ura3-52 leu2Δ1 Gal+), which provides an
additional selective marker for greater flexibility. The LEU2 or URA3 markers
can then be used either for the transcription activation growth selection or
introduction of additional plasmids. In this initial publication, we then used the
lacZ reporter plasmid pMW112, which encodes the lacZ gene under control of
eight tandem LexA operators. Thus, small molecule CID-induced transcription
activation could be detected using standard lacZ transcription assays either on
plates or in liquid culture [25]. Further optimization of the yeast three-hybrid
system in our lab led us to conclude that integration of either the AD or DBD
into the yeast chromosome stabilizes the transcription read-out of the reporter
gene without losing transcriptional strength, effectively reducing the number
of false positives in the detection of novel ligand-receptor interactions [34].
4.1.3.2.2 E. coli Transcription Activation Assays

Widespread use of the yeast two-hybrid system led several groups to develop
alternate transcription-based assays. While the yeast two-hybrid assay is quite
powerful, a bacterial equivalent would increase by several orders of magnitude
the number of proteins that could be tested, as the transformation efficiency
and doubling rate of E. coli are significantly greater than those of S. cerevisiae.
There may also be applications where it is advantageous to test a eukaryotic
protein in a prokaryotic environment, in which many pathways are not
conserved. The yeast two-hybrid assay cannot, however, be transferred directly
to bacteria since the components of the transcription machinery and the
mechanism of transcriptional activation differ significantly between bacteria
and yeast.
The first bacterial repressor assay was developed in 1990 by Sauer and
coworkers, who adapted a bacterial λ transcriptional repressor system to
read out the GCN4 leucine zipper fusion [35]. The transcriptional repressor
λcI controls the lytic/lysogenic pathway in bacteriophage λ. As a dimer,
λcI is bound to the λ operator and prevents the expression of genes
involved in the lytic pathway, allowing integration of the λ DNA into the
bacterial chromosome. Taking advantage of the λcI dimerization requirement,
Sauer and coworkers fused the DNA-binding domain of λcI to a
GCN4 leucine zipper dimerization motif to restore a functional hybrid
repressor.
Seven years later, Hochschild and coworkers designed a bacterial two-
hybrid activation system based on the transcription mechanism of E. coli
RNA polymerase (RNAP) [36]. This assay is based on their observation that
binding of the C-terminus of the α subunit of the RNAP (α-CTD) to an
upstream element leads to transcription activation of a downstream gene. To
create a bacterial two-hybrid system, the authors replaced the α-CTD with
the C-terminus of the transcriptional repressor λcI (λcI-CTD), generating an
α-λcI chimera. Binding of the transcriptional repressor λcI to the λ operator
leads to recruitment of RNAP via the α-λcI chimera, which in turn directs
transcription activation of a reporter gene downstream of the λ operator. By
simply replacing the α-λcI chimera with arbitrary protein-protein interactions,
they created a bacterial two-hybrid activation system. This technology was
successfully applied to detect two interacting yeast proteins, Gal4 and Gal11,
fused to λcI and α-NTD (N-terminus of the α subunit of the RNAP),
respectively (Fig. 4.1-6).
Our development of a successful yeast three-hybrid system, and the
advantages promised by an analogous system in bacteria, led us to construct
a bacterial three-hybrid system from the RNAP two-hybrid system developed
by Hochschild and coworkers [37]. We chose to adapt this assay because it
is a transcriptional activation system, and reconstitution of transcriptional
activation should be largely conformation independent. The key to converting
this two-hybrid assay into a three-hybrid system was the design of a dimeric
ligand that could bridge λcI and α-NTD through the receptors of the ligand.
For the bridging small molecule, we chose to prepare a heterodimer of Mtx and
a synthetic analogue of FK506 (SLF). We call this heterodimer Mtx-SLF. We
did not pursue building a bacterial three-hybrid system based on the Mtx-Dex
heterodimer previously used in our yeast three-hybrid system because the
Dex/GR interactions require heat shock proteins that are absent in E. coli. The
heterodimer Mtx-SLF gives a strong transcription read-out in the E. coli RNAP
three-hybrid system, providing a robust platform for high-throughput assays
based on protein-small molecule interactions.

Fig. 4.1-6 The bacterial two-hybrid system developed by Hochschild and coworkers.
The λcI repressor and the α-subunit of RNAP are fused to two arbitrary proteins, X
and Y. Binding of the λcI repressor to the λ operator followed by dimerization of X and Y
recruits RNAP, leading to transcription activation of a downstream reporter gene.

4.1.3.3 Protein Complementation Assay

All of the above assays are based on transcription of a reporter gene. A
different method for studying protein-protein interactions is the use of a
PCA. Here an enzyme with a phenotype detectable via either a screen or
a selection is divided into two nonfunctional fragments that are fused to
proteins to be tested for dimerization. If the tested proteins dimerize, the two
enzyme fragments are brought into close proximity leading to reconstitution
of enzyme activity (Fig. 4.1-7) [38, 39]. Since PCAs are independent of the cell’s
transcription machinery, they can be used to detect protein interactions
in any cell type or cell compartment in vivo or in vitro. Furthermore,
PCAs can potentially quantify protein-protein interactions since there is a
simple relationship between protein dimerization and reconstituted enzyme
activity. PCAs have been developed using a variety of proteins including
β-galactosidase, β-lactamase, DHFR, GFP (green fluorescent protein), and
YFP (yellow fluorescent protein) [40-42].
Fig. 4.1-7 Protein complementation assays. A protein that carries out a
detectable function is separated into two fragments that show no detectable
reconstituted enzyme activity on their own (blue and green), but can effectively
reconstitute enzyme activity when fused to two interacting proteins, X and Y.

For example, in a proof of principle paper, Michnick and coworkers showed
that mDHFR can be split into two fragments that show no detectable
reconstituted enzyme activity on their own but can effectively reconstitute
enzyme activity when fused to two interacting proteins. Bacteria expressing
a functionally reassembled mDHFR can easily be selected since mDHFR
activity is essential for growth of E. coli in the presence of trimethoprim,
which selectively inhibits bacterial DHFR but not its eukaryotic counterpart
mDHFR. Further, the mDHFR PCA works as a selection system in eukaryotic
cells deficient in endogenous DHFR activity [43]. In a remarkable application
of this system, Michnick and coworkers were able to detect a protein-protein
interaction, locate the interaction to a specific cell compartment, and place
the interaction in a signal transduction pathway by doing a single assay based
on the DHFR PCA in mammalian cells deficient in DHFR [44]. Specifically,
they examined protein interactions in the well-studied signal transduction
pathway of receptor tyrosine kinase, which mediates control of initiation of
translation in eukaryotes. From 35 interactions tested, the DHFR PCA selection
identified 14 interacting partners that were localized to specific intracellular
compartments using fluorescein-Mtx, a fluorophore in which the Mtx portion
binds to the reconstituted DHFR with nanomolar affinity. The position of
the protein interaction in the signal transduction pathway was determined
by using three small molecule inhibitors known to act at key points of the
pathway.
In view of the advantages PCAs would bring to the detection of protein-small
molecule interactions, our laboratory has made some efforts to develop a small
molecule PCA three-hybrid assay, though without success [45]. Specifically, we
tested both the Mtx-SLF adenylate cyclase PCA and the Mtx-SLF β-lactamase
PCA in E. coli (E. Althoff, V. Cornish, unpublished results). In addition,
we tested a Dex-Mtx GFP PCA, also in E. coli, in collaboration with Regan
and coworkers (E. Althoff, V. Cornish, T. Magliery, L. Regan, unpublished
results). From both a simple thermodynamic consideration and these results,
we hypothesize that without the high degree of cooperativity found in
the transcription-based assays, the PCAs cannot detect a three-component
interaction.
4.1.3.4 Problem Choice

The two-hybrid assay was originally used simply for cloning proteins based on
their interaction with other proteins in a given biological pathway. However,
the more recent development of one- and three-hybrid assays opens the door
to studying DNA, RNA, and small molecule interactions, and even catalysis.
Though developed as a genetic assay for cloning, there is no reason that the
n-hybrid assays cannot be used for a broad range of applications, including
drug discovery, directed evolution, and enzymology.
It is interesting to consider how well suited the two-hybrid assay is for
its original conception - the discovery of new proteins on the basis of their
binding to other known proteins - particularly as this assay begins to be carried
out on a genome-wide scale. An important paper that bears on this question,
in our opinion, comes from Golemis and Brent, in which they estimated that
the KD cutoff for the yeast two-hybrid assay is ca 1 μM [46]. Assuming that
the proteins are being expressed at ca 1 μM concentrations, the two-hybrid
assay can only detect relatively high-affinity interactions (ca KD = 1 μM).
Thus, while the two-hybrid assay is quite successful at identifying new
interactions, it is probably not appropriate to assume that a high-throughput
two-hybrid assay gives a snapshot of all interactions. In fairness, however,
it should be pointed out that traditional affinity chromatography approaches
are even further impaired because they rely on the natural abundance of
any given protein in the cell. Extending this analysis to drug discovery
using the small molecule three-hybrid assay, it is our opinion that the three-
hybrid assay was long underutilized because the original systems had low
sensitivity owing to the CID anchor. Recently, we have shown that our
Mtx three-hybrid system has a KD cutoff of ca 100 nM [29]. Consistent
with this idea, GPC Biotech reported last year the use of the Mtx three-
hybrid system for identification of protein targets of CDK inhibitors [47].
Interestingly, Hochschild and coworkers have shown that they can build
additional sensitivity into their bacterial two-hybrid assay by adding cooperative
interactions [48].
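
To make the KD cutoff argument concrete, the following Python sketch evaluates a simple 1:1 binding isotherm at the ca 1 μM expression level assumed above. It is only an illustration: the function name `fraction_bound` is hypothetical, the model assumes the free partner concentration is approximately equal to its total concentration, and it ignores the cooperative effects of a real transcription read-out.

```python
def fraction_bound(partner_conc_molar: float, kd_molar: float) -> float:
    """Fraction of one protein occupied by its partner at equilibrium.

    Simple 1:1 binding, assuming the free partner concentration is
    approximately its total concentration (no ligand depletion).
    """
    if partner_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return partner_conc_molar / (partner_conc_molar + kd_molar)


if __name__ == "__main__":
    expression_level = 1e-6  # ca 1 uM, the expression level assumed in the text
    for kd in (1e-8, 1e-7, 1e-6, 1e-5, 1e-4):
        occupancy = fraction_bound(expression_level, kd)
        print(f"KD = {kd:.0e} M -> fraction bound = {occupancy:.3f}")
```

Under these assumptions, occupancy is one half when KD equals the expression level and falls off quickly for weaker interactions, which is consistent with a practical cutoff near 1 μM.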
The n-hybrid assay can also be used for directed evolution. For example,
Pabo and coworkers have adapted a bacterial one-hybrid assay to evolve zinc-
finger variants with defined DNA-binding specificities [49]. Starting with a
three zinc-finger protein that has nanomolar affinity for its DNA-binding
site, the authors replaced the binding site for the third zinc finger with a
new DNA sequence and then randomized the third finger to evolve a zinc-
finger variant with increased affinity for the target sequence. Impressively,
the evolved zinc finger showed DNA affinity within 10-fold of the wt protein,
KD = 0.01 nM, and a 10- to 100-fold preference for the modified over the
wt DNA sequence. Given the low KD cutoff and the fact that the n-hybrid
assay is governed by equilibrium binding, there are two likely limitations to
using this assay for directed evolution. First, the assay cannot effectively detect
initial, weak binders. Second, the assay is limited in its ability to distinguish
evolved variants on the basis of improvements in KD since energy differences
of only a few kilocalories per mole determine whether a molecule is bound
at equilibrium. In theory, however, these limitations could be overcome by
varying the concentration of the n-hybrid components or, again, by building in
a series of tunable, cooperative interactions. Pabo and coworkers, then, chose
their problem well. They began with a zinc-finger protein with two out of three
zinc fingers intact. This initial binding affinity enabled them to select good
binders in a single round of selection, rather than trying to improve binding
affinity through multiple rounds of selection. A similar analysis suggests that
the n-hybrid assays may be ideally suited to catalysis applications since large
differences in catalytic activity are needed to significantly affect the half-life of
product formation.
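
The remark that only a few kilocalories per mole separate bound from unbound variants can be checked with the standard relation between a ratio of dissociation constants and a binding free energy difference, ΔΔG = RT ln(KD,weak/KD,tight). The sketch below is a worked illustration only; the function name `ddg_kcal_per_mol` and the choice of 298.15 K are assumptions, not values taken from the text.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def ddg_kcal_per_mol(kd_ratio: float, temperature_k: float = 298.15) -> float:
    """Free energy difference corresponding to a fold change in KD.

    kd_ratio is KD(weaker binder) / KD(tighter binder), so the result is
    positive when the second variant binds more tightly.
    """
    if kd_ratio <= 0 or temperature_k <= 0:
        raise ValueError("kd_ratio and temperature_k must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_ratio)


if __name__ == "__main__":
    for fold in (10, 100, 1000):
        print(f"{fold}-fold change in KD = {ddg_kcal_per_mol(fold):.2f} kcal/mol")
```

At room temperature each 10-fold change in KD corresponds to roughly 1.4 kcal/mol, so the difference between a starting variant and a 100-fold improved one is under 3 kcal/mol.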
4.1.4
Applications
Although introduced only in 1989, the yeast two-hybrid assay has emerged as
an integral tool for biology research. Two-hybrid screens now appear regularly
in the biology literature. Genome-wide two-hybrid screens are even the focus of
major research publications. Somewhat surprisingly then, there have been few
applications of the related n-hybrid technologies to detect protein interactions
with DNA, RNA, and small molecules, or applications beyond cloning. Here
we look at more recent applications of n-hybrid assays with an eye for asking
whether this discrepancy results from the relative power of these different
n-hybrid assays or rather the biases of current research.
4.1.4.1 Protein-Protein Interactions

Traditional genetic assays and more recently the yeast two-hybrid assay have
been primarily used to identify natural protein-protein interactions. Two-
hybrid screens are now fully integrated into the biologist’s toolbox and
appear routinely in the published literature. Almost half of the published
protein-protein interactions to date have been detected, at least in part, using
the yeast two-hybrid assay [50]. Beyond these simple cloning applications,
the two-hybrid assay would seem perfectly suited for genomics. For example,
automation techniques were used to identify all possible protein-protein
interactions in S. cerevisiae [51]. Every open-reading frame encoding a protein,
ca 6000 in S. cerevisiae, was fused both to a DNA-binding domain and an AD,
and the two fusion libraries were screened against one another. The major
challenge in this project was how to transform all combinations of the 6000
DBD and 6000 AD fusions into yeast and then how to assay so many cells. Since
a library of 10⁷ is at the limit of the transformation efficiency of yeast, it is in
theory achievable. Uetz and coworkers compared two approaches. In the first
approach, they explicitly mated haploid mating type (MATa) cells containing
192 DBD fusions with haploid MATa cells containing the 6000 AD fusions
in a spatially addressable format, such as a microtiter plate, and assayed each
well using a HIS3 growth selection. In the second one, MATa cells containing
the 6000 DBD fusions were mated with MATa cells containing the 6000 AD
fusions, and only diploids that survived in a LEU2 growth selection were
arrayed and analyzed individually. Interestingly, there were significantly more
“hits” in the first spatially addressable format, underscoring the importance of
parameterizing new methods for high-throughput screening and the problem
of distinguishing false positives and negatives in genomics. This example
highlights how well suited the n-hybrid assays are for extracting some of the
information provided by recent genome sequencing efforts.
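
The scale of the genome-wide screen follows from simple counting. The Python sketch below, with a hypothetical helper `pairwise_combinations`, compares the number of DBD-AD pairs in the two formats with the ca 10⁷ transformation limit quoted above; it assumes every pairing is tested exactly once.

```python
def pairwise_combinations(n_dbd_fusions: int, n_ad_fusions: int) -> int:
    """Number of distinct DBD-fusion x AD-fusion pairings."""
    if n_dbd_fusions < 0 or n_ad_fusions < 0:
        raise ValueError("fusion counts must be non-negative")
    return n_dbd_fusions * n_ad_fusions


if __name__ == "__main__":
    yeast_library_limit = 10**7  # approximate transformation limit from the text

    full_matrix = pairwise_combinations(6000, 6000)
    arrayed_subset = pairwise_combinations(192, 6000)

    print(f"6000 x 6000 pairs: {full_matrix:,}")
    print(f"192 x 6000 pairs:  {arrayed_subset:,}")
    print(f"full matrix / library limit: {full_matrix / yeast_library_limit:.1f}")
```

The full matrix comes to 3.6 × 10⁷ pairings, the same order of magnitude as the transformation limit, whereas the arrayed format with 192 DBD fusions covers about 1.2 × 10⁶ pairings.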
Table 4.1-1 The sequences and binding affinities of 14 different
aptamers for binding to Cdk2 isolated in a yeast two-hybrid system

Aptamer   KD (nM)     Amino acid sequence
Pep1      ND[a]       ELRHRLGRAL SEDMVRGLAW GPTSHCATVP GRSDLWRVIR
Pep2      64 ± 16     LVCKSYRLDW EAGALFRSLF
Pep3      112 ± 17    YRWQQGWPS NMASCSFRQ
Pep4      ND          SSFSLWLLMV KSIKRAAWEL GPSSAWNTSG WASLSDFY
Pep5      52 ± 3      SVRMRYGIDA FFDLGGLLHG
Pep6      ND          RVKLGYSFWA QSLLRCISVG
Pep7      ND          QLYAGCYLGV VIASSLSIRV
Pep8      3? ± 5      YSFVHHGFFN FRVSWREMLA
Pep9      ND          QQRFVFSPSW FTCAGTSDFW GPEPLFDWTR D
Pep10     105 ± 10    QVWSLWALGW RWLRRYGWNM
Pep11     87 ± 7      WRRMELDAEI RWVKPISPLE
Pep12     ND          RPLTGRWVVW GRRHEECGLT
Pep13     ND          PVCCMMYGHR TAPHSVFNVD
Pep14     ND          WSPELLRAMV AFRWLLERRP

[a] ND - not determined

While the two-hybrid method has been extensively used to detect natural
protein-protein interactions, it should also be well suited for protein evolution.
Brent and coworkers demonstrated that the two-hybrid assay can be used to
identify peptide aptamers that inhibit Cdk2 from a library of random peptide
sequences (Table 4.1-1) [52]. The 20-residue peptide library was displayed in
the active site loop of E. coli thioredoxin (TrxA). The TrxA loop library was
fused to the AD, and Cdk2 was fused to the DBD. In a single round of assay,
6 × 10⁶ TrxA-AD transformants, a very small percentage of the 20mers
possible, were tested for binding to LexA-Cdk2. From this assay, they isolated
66 colonies that activated transcription of both a LEU2 and a lacZ reporter
gene. Remarkably, these colonies converged on 14 different peptide sequences
that bound Cdk2 with high affinity. Using surface plasmon resonance, the
peptide aptamers were shown to bind Cdk2 with KDs of 30-120 nM. In kinase
inhibition assays, the peptide aptamers had IC50s for the Cdk2/cyclin E kinase
complex of 1-100 nM. What is particularly impressive about this experiment is
that nanomolar affinity ligands are being isolated in a single round of selection
from a library only on the order of 10⁶-10⁸. Similar results have been obtained
using peptide aptamers in a traditional genetic selection [53].
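
How small a fraction of the possible 20mers was sampled can be estimated directly. The Python sketch below uses hypothetical helper names and assumes the 20 standard amino acids at every position and that every transformant carries a distinct sequence, so the result is an upper bound on coverage.

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of possible peptides of a given length."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return alphabet_size ** length


def fraction_sampled(n_clones: int, length: int) -> float:
    """Upper bound on the fraction of sequence space covered by n_clones."""
    return n_clones / sequence_space(length)


if __name__ == "__main__":
    transformants = 6 * 10**6  # library size tested in the text
    print(f"possible 20mers:  {sequence_space(20):.3e}")
    print(f"fraction sampled: {fraction_sampled(transformants, 20):.2e}")
```

With about 10²⁶ possible 20mers, the 6 × 10⁶ transformants cover less than one part in 10¹⁹ of the sequence space.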
Given the success of this and related “aptamer” selections, it is somewhat
surprising that these “aptamer” scaffolds are not more widely used.
There are several potential advantages to directed evolution over traditional
monoclonal antibody technology for generating selective binding proteins.
Optimistically, six months are required from the start of immunization,
through immortalization, and finally screening to generate a monoclonal
antibody. On the other hand, if several peptide aptamer libraries were
maintained for routine use, the libraries could be screened against a new target,
false positives could be sorted out, and biochemical assays could validate a
target in less than a month and at considerably less expense. Moreover, protein
scaffolds other than antibodies may prove more robust for use as reagents and
in therapeutic applications. Perhaps because monoclonal antibody technology
has become so robust over the years, the momentum does not seem to be
there to seriously explore replacing this technology with directed evolution. It
is also interesting to compare these “aptamer” scaffolds to chemical genetic
approaches for generating inhibitors for a broad array of biological targets.
4.1.4.2 DNA-Protein Interactions

Just as the yeast two-hybrid assay can be used to detect protein-protein inter-
actions, transcriptional activators can be used directly to detect protein-DNA
interactions. In truth, this type of experiment was done before the one-hybrid
assay was conceptualized as such. For example, as early as 1983 a His6 → Pro
Mnt variant was generated that preferentially binds a mutant Mnt operator
using a transcription-based selection [54]. A plasmid encoding Mnt was
mutagenized both by irradiation with UV light and by passage through a
mutator strain. The mutant plasmids were then introduced into E. coli and
selected against binding to the wt operator and for binding to the mutant
operator. Because there are a variety of convenient reporter genes, the E. coli
was engineered to link DNA recognition to cell survival in both the negative
(selection against binding to the wt operator) and the positive (selection for
binding to the mutant operator) directions. Binding to the wt Mnt operator was
selected against by placing a tet resistance (tetR) gene under negative control
of the wt Mnt operator. If a Mnt mutant bound the wt operator, it would
block expression of the tetR gene, and the E. coli cells would die in the presence
of tetracycline. Then Mnt variants with altered DNA-binding specificity were
selected for on the basis of immunity to infection by a P22 phage containing
a mutant Mnt operator. The mutant Mnt operator controlled synthesis of the
proteins responsible for lysing the bacterial host. If a Mnt variant could bind
to this mutant operator, it would turn off the lytic machinery, and the bacteria
would survive phage infection. Four independent colonies were isolated from
the two selections. Again, only a single round of selection was required for each
step. All four colonies encoded the same His6 → Pro mutation, two by a CAC
→ CCC and two by a CAC → CCT mutation. Not only did these mutants bind
to the mutant operator but they also did not bind efficiently to the wt operator.
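
The logic of the two selections described above can be summarized as a small truth table. The sketch below is illustrative only: the class, function, and variant names are hypothetical, and operator binding is reduced to a yes/no phenotype, which the real experiment of course is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MntVariant:
    """Hypothetical Mnt variant with binary operator-binding phenotypes."""
    name: str
    binds_wt_operator: bool
    binds_mutant_operator: bool


def survives_negative_selection(v: MntVariant) -> bool:
    # tetR is under negative control of the wt operator: a variant that
    # binds the wt operator shuts off tetR, so the cell dies on tetracycline.
    return not v.binds_wt_operator


def survives_positive_selection(v: MntVariant) -> bool:
    # The phage lytic functions are controlled by the mutant operator: only a
    # variant that binds it turns off lysis and lets the cell survive.
    return v.binds_mutant_operator


def survives_both(v: MntVariant) -> bool:
    return survives_negative_selection(v) and survives_positive_selection(v)


# Illustrative phenotypes (assumed, not measured values).
variants = [
    MntVariant("wild type", True, False),
    MntVariant("nonbinder", False, False),
    MntVariant("relaxed specificity", True, True),
    MntVariant("His6->Pro", False, True),
]

survivors = [v.name for v in variants if survives_both(v)]
print(survivors)
```

Only a variant with switched specificity passes both steps, which is why the negative selection is needed in addition to the positive one: without it, a variant with merely relaxed specificity would also survive.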
More recently, Pabo and coworkers adapted a bacterial two-hybrid assay into
a bacterial one-hybrid system to evolve zinc-finger variants with defined DNA-
binding specificities [49]. In this assay, three tandem zinc fingers function as
the DBD of this one-hybrid system and are fused to the Gal11 protein, known
to dimerize with Gal4, which is fused to the RNA polymerase (RNAP). Binding of
the three tandem zinc fingers to a specific DNA sequence upstream of the
reporter gene recruits the RNAP to the promoter region of the reporter gene
and initiates transcription thereof (Fig. 4.1-8). This assay allows testing of 10^8
protein variants per round of selection. However, if all three zinc fingers were to
be randomized simultaneously, it would create ca. 8 × 10^24 protein variants (using
24 codons at six amino acid positions in each of three zinc fingers = (24^6)^3), which cannot
be covered by this high-throughput method. Thus, the authors are limited to
randomizing one finger at a time, while keeping the other two unchanged. We
believe that conserving the high affinity of two zinc fingers for the DNA may be
important for the success of Pabo and coworkers’ directed evolution, because
starting a directed evolution with a high-affinity protein for DNA ensures the
evolution of proteins within the dynamic range of the n-hybrid system. For this
zinc-finger evolution, they created a library of ca 10’ variants, and identified
a total of nine sequences that bound specifically to three target DNAs with a
preference of 10- to 100-fold for the modified over the wt DNA.

Fig. 4.1-8 Development of zinc fingers specific for a given DNA sequence using a
one-hybrid assay adapted from a bacterial two-hybrid system. Zinc fingers (ZF) 1, 2,
and 3 from the Zif268 protein were fused to the Gal11 protein. The Gal4 protein, which
binds Gal11 with high affinity, was fused to the α-subunit of RNAP. If ZF3 bound to the
first site with high affinity, the RNAP complex would be recruited, activating
transcription of a HIS3 reporter gene. Significantly, in just one round of assay,
several proteins were identified that bound specifically to the target DNA sequence.
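
The library-size argument above is simple arithmetic and can be checked directly. The short sketch below uses only the numbers given in the text (24 codons per position, six randomized positions per finger, three fingers, 10^8 variants per round); the variable names are ours.

```python
CODONS_PER_POSITION = 24    # codon choices per randomized position (from the text)
POSITIONS_PER_FINGER = 6    # randomized amino acid positions per zinc finger
FINGERS = 3                 # tandem zinc fingers in the DBD
ASSAY_CAPACITY = 10**8      # variants that can be tested per round of selection

# Randomizing a single finger: 24^6 combinations.
one_finger = CODONS_PER_POSITION ** POSITIONS_PER_FINGER

# Randomizing all three fingers at once: (24^6)^3 = 24^18 combinations.
three_fingers = one_finger ** FINGERS

print(f"one finger:    {one_finger:.2e}")
print(f"three fingers: {three_fingers:.2e}")
print(f"fraction of a three-finger library tested per round: "
      f"{ASSAY_CAPACITY / three_fingers:.1e}")
```

A single randomized finger gives roughly 1.9 × 10^8 combinations, which is about what one round of selection can handle, whereas the three-finger library is on the order of 10^24 to 10^25, close to the figure quoted above and far beyond the reach of the assay.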
Comparing their results for the zinc-finger evolution using the bacterial
hybrid system with earlier results obtained in a similar zinc-finger evolution
study using phage display, Pabo and coworkers conclude that the affinity and
specificity of the selected zinc fingers are superior to those obtained in earlier
phage display studies. Moreover, the bacterial hybrid system is a more rapid
alternative to phage display because it permits isolation of functional fingers
in a single selection step instead of using multiple rounds of enrichments.
Speaking to the power of this approach, Sangamo uses a modified one-hybrid
assay for its selection of artificial DNA-binding proteins for commercial applications
[55, 56]. The success found here raises the question of other binding
interactions. One could speculate that the success here depends on starting
with two known zinc fingers with high affinity for their DNA target, except that
the protein “aptamer” scaffold selections described in the previous section
have begun with scaffolds with no measurable affinity for their protein target.
4.1.4.3 RNA-Protein Interactions

Before the development of the RNA three-hybrid system, identification of
protein-RNA interactions was limited to in vitro methods such as pull-down
assays using radiolabeled RNA. The introduction of the RNA three-hybrid
system has allowed not only the detection of well-studied protein-RNA
pairs, but also the identification of novel protein-RNA interactions. An
impressive application of this system is the cloning of a regulatory protein from
Caenorhabditis elegans that binds to the 3′ untranslated region of fem-3
(fem-3 3′UTR) and mediates the sperm/oocyte switch in hermaphrodites [57].
In this assay, a bifunctional RNA plasmid possessing the fem-3 3′UTR and the RNA
ligand for the MS2 coat protein was introduced into a yeast strain expressing
a DBD-MS2 fusion upstream of the HIS3 and lacZ reporter genes. Into this strain,
a complementary DNA-AD library was introduced. Cells containing a positive
protein-RNA interaction were selected first for HIS3 and lacZ activation
followed by screening for the presence of the bifunctional RNA plasmid. The
RNA plasmid from successful candidates was lost by reverse selection and
the cells were tested again for lacZ activation to reduce the number of false
positives. Cells that failed to activate lacZ after plasmid loss were tested for
fem-3 3′UTR binding specificity by reintroduction of the bifunctional RNA
plasmids. The protein encoded in the only cDNA-AD that satisfied all selection
and screening criteria was found to have 93% homology at the nucleotide level
with two genes encoded in the C. elegans genome. Further testing confirmed
these genes to be regulators of the sperm/oocyte switch in hermaphrodite
C. elegans. The specificity with which the RNA three-hybrid assay selected
just one protein from thousands for the selected protein-RNA interaction
illustrates the power of this assay for finding novel protein-RNA interactions
[16]. The recent discovery of RNAi, for example, highlights the need not to forget
about molecules other than proteins when carrying out genetic assays [58, 59].
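
The successive selection and screening steps of this three-hybrid experiment amount to a filter cascade, sketched below. The record fields, clone names, and the use of an unrelated control RNA in the final specificity test are assumptions made for illustration; the text states only that specificity was tested by reintroducing bifunctional RNA plasmids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical phenotype record for one cDNA-AD clone."""
    name: str
    reporters_on_with_rna: bool        # HIS3 and lacZ active with the hybrid RNA
    lacz_on_after_rna_loss: bool       # lacZ still active once the RNA plasmid is lost
    lacz_on_with_bait_restored: bool   # lacZ active after reintroducing the bait RNA
    lacz_on_with_control_rna: bool     # lacZ active with an unrelated hybrid RNA (assumed control)


def passes_screen(c: Candidate) -> bool:
    if not c.reporters_on_with_rna:
        return False   # no interaction detected in the primary selection
    if c.lacz_on_after_rna_loss:
        return False   # RNA-independent activation: false positive
    if not c.lacz_on_with_bait_restored:
        return False   # activation not restored by the bait RNA
    return not c.lacz_on_with_control_rna   # must be specific for the bait RNA


# Illustrative clones (made-up phenotypes).
candidates = [
    Candidate("clone A", False, False, False, False),
    Candidate("clone B", True, True, True, True),
    Candidate("clone C", True, False, True, True),
    Candidate("clone D", True, False, True, False),
]

hits = [c.name for c in candidates if passes_screen(c)]
print(hits)
```

Each step removes a different class of false positive, which is how a library of thousands of clones can be narrowed to a single RNA-dependent, sequence-specific binder.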
4.1.4.4 Small Molecule-Protein Interactions

While several small molecule three-hybrid systems have now been reported, it
was only in 2004 that such a system was used successfully for drug discovery
research. Specifically, Becker and coworkers reported that the Mtx yeast three-
hybrid system developed in our laboratory could be used to clone novel protein
targets of CDK inhibitors (Table 4.1-2) [47]. The CIDs used in this study took
advantage of the low picomolar affinity of Mtx for DHFR [25]. Three known
CDK inhibitors, roscovitine, purvalanol B, and indenopyrazole, were linked to
Mtx and introduced into a yeast strain expressing a DBD-DHFR protein fusion
upstream of the HIS3 reporter gene and a library of kinase cDNAs linked to
a transcription AD. With this system they isolated, besides the known CDK
targets, 29 new kinase targets, 22 of which were confirmed by either in vitro
binding or enzyme inhibition assays. We speculate that the success here
resulted from the use of the high-affinity Mtx/DHFR anchor, which, as we recently
showed, gives a KD cutoff of ca 100 nM in the yeast three-hybrid assay.
Table 4.1-2 Summary of biochemical analysis of purvalanol B-protein
interactions. Binding of proteins to immobilized purvalanol B, but not to
CDK-inactive N6-methylated purvalanol B, was evaluated by immunoblotting or
liquid chromatography-mass spectrometry (for endogenous Jurkat proteins).
Enzyme assays were performed with purified enzymes and percentage inhibition
of kinase activity observed with 1 µM purvalanol B

4.1.4.5 Catalysis
The widespread utility and robust transcription read-out of the n-hybrid system
motivated several laboratories to develop general methods to detect enzyme
catalysis in vivo around the small molecule three-hybrid system. Several
proofs of principle papers have been published in the last few years, and
now the key test of these systems is whether they can be readily applied to
new chemistry. Toward that end, our laboratory recently demonstrated that
Chemical Complementation could be used to detect glycosidic bond formation
using a glycosynthase [60].
We chose glycosidic bond formation because despite the fundamental role
of carbohydrates in biological processes and their potential use as therapeutics,
carbohydrates still remain difficult to synthesize. Specifically, this system was
developed using the E197A mutant of Cel7B from Humicola insolens, which
had previously been shown to be an efficient “glycosynthase” using an α-fluoro
donor substrate. Here, enzymatic activity is detected as formation of a bond
between a Mtx-disaccharide-fluoride donor (Mtx-Lac-F) and a dexamethasone-
disaccharide acceptor (Dex-Cel), which dimerize DBD-eDHFR and AD-GR,
activating transcription of a LEU2 reporter gene that permits survival under
appropriate selective conditions. The growth advantage conferred by the
glycosynthase activity was used to select the Cel7B:E197A glycosynthase from
a pool of inactive variants (Cel7B). A mock library containing 100:1 inactive
variants to glycosynthase underwent 400-fold enrichment in glycosynthase
after a single round of selection. Encouraged by this result, we carried out the
directed evolution of the glycosidase Cel7B to improve its glycosynthase activity
using a Glu197 saturation library. From a library of 10^5 mutants, Cel7B:E197S
was selected, which showed a fivefold improvement in glycosynthase activity over
the known Cel7B:E197A glycosynthase (Table 4.1-3).
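
To put the mock-selection numbers in perspective, the sketch below converts the 400-fold enrichment into a population composition. It assumes that "enrichment" means the fold change in the active:inactive ratio (the text does not give the definition; a 400-fold change in the active fraction itself would exceed 100%). The function name is ours, and the specific activities are those listed for E197A and E197S in Table 4.1-3.

```python
from fractions import Fraction


def active_fraction_after(active: int, inactive: int, enrichment: int) -> Fraction:
    """Fraction of active clones after selection, treating `enrichment` as the
    fold change in the active:inactive ratio (an assumed definition)."""
    odds_before = Fraction(active, inactive)
    odds_after = odds_before * enrichment
    return odds_after / (1 + odds_after)


# Mock library: 100 inactive variants per glycosynthase clone.
before = Fraction(1, 1 + 100)
after = active_fraction_after(active=1, inactive=100, enrichment=400)

print(f"active clones before selection: {float(before):.1%}")
print(f"active clones after selection:  {float(after):.1%}")

# Improvement of E197S over E197A from the specific activities in Table 4.1-3.
fold_improvement = Fraction(40, 8)
print(f"E197S vs E197A: {float(fold_improvement):.0f}-fold")
```

Under this reading, a single round takes the glycosynthase from about 1% of the population to a 4:1 majority.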
As intended, no further modifications to Chemical Complementation were
needed to extend this assay to detect glycosynthase activity. All that was
required to detect glycosynthase activity was to add the Dex and Mtx saccharide
substrates. This result shows the generality of Chemical Complementation,
and the ease with which it can be applied to new chemical reactions. Moreover,
it shows that Chemical Complementation can detect not only bond cleavage
but also bond formation reactions. Although the size of the Glu197 saturation
library selected here was quite small, with only 32 members at the DNA level,
the transformation efficiency of S. cerevisiae allows much larger
libraries, on the order of 10^5-10’.
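
A saturation library with 32 members at the DNA level is what one obtains from a degenerate codon such as NNK (N = A, C, G, or T; K = G or T). The text does not say which randomization scheme was used, so NNK is an assumption here; the sketch simply enumerates such a library and translates it with the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3 (assumed scheme).
nnk_codons = ["".join(c) for c in product(BASES, BASES, "GT")]

encoded = {CODON_TABLE[c] for c in nnk_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

print(len(nnk_codons), "codons")
print(len(amino_acids), "amino acids:", "".join(amino_acids))
print("stop codons:", stop_codons)
```

The 32 codons cover all 20 amino acids with a single stop codon, so a library this small is sampled many times over even at modest transformation efficiencies.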
Table 4.1-3 Glycosynthase activities and protein purification yields for Cel7B variants

                                                     E197A    E197S    N196D/E197A
Specific activity (mol [F] min^-1 (mol [E])^-1)      8 ± 2    40 ± 5   7 ± 1
Protein purification yield [nmol l^-1]               6.1      4.6      7.3

Glycosynthase activity for tetrasaccharide synthesis from α-lactosyl fluoride and
p-nitrophenyl β-cellobioside (PNPC) was measured for the Humicola insolens Cel7B
variants in sodium phosphate buffer, pH 7.0, at room temperature. Specific activities
were determined by measuring the fluoride ion release rate by a fluoride ion selective
electrode. The protein purification yields are the yield of purified protein as determined
by western analysis from total cell culture.

4.1.5
Future Development
The yeast two-hybrid assay no doubt will continue to be a mainstay technique
for the discovery of new protein-protein interactions. As biological pathways
are being studied increasingly at the systems level, the two-hybrid assay has
the potential to be quite useful for analyzing total protein dynamics in living
cells. As seen in the PCA work by Michnick and coworkers, it is here that
technical improvements will prove important for the two-hybrid assay.
But it is the n-hybrid assays that have the potential to extend the power
of genetics to molecules other than proteins, such as nucleic acids and
small molecules. Despite this enormous potential, use of these other n-hybrid
assays pales in comparison to that of the two-hybrid assay. As we argue in
this chapter, a consideration of the published literature suggests that this
discrepancy is not the result of some inherent technical limitation to the
n-hybrid assays, but rather likely reflects the bias of current practice. Thus,
it is here that we believe there is most potential for the future development
of the n-hybrid assay and indeed genetics as a whole. Technically, the n-
hybrid assays probably still can be further developed for different classes
of molecules or posttranslational modifications. But already in their present
form these assays seem to have tremendous potential for biological discovery,
uncovering new functions for the many classes of molecules that make up
the cell.
These advances also expand our ability to engineer the cell to harness
its synthetic and functional capabilities for chemical discovery. Just as
protein engineering impacted both basic research and the biotechnology
and pharmaceutical industries in the last 25 years, so should cell engineering
in this century. Such systems engineering likely will require a much more
quantitative understanding of cellular processes, and accordingly the n-hybrid
assays will have to be characterized and rebuilt on this level, allowing, for
example, the KD cutoff of the assay to be dialed in. Using this genetic assay
in entirely new ways should then open the door for new chemistry, with the
potential to match the complexity of cell function.
References
1. S. Fields, O. Song, A novel genetic system to detect protein-protein interactions, Nature 1989, 340, 245-246.
2. E.M. Phizicky, S. Fields, Protein-protein interactions: methods for detection and analysis, Microbiol. Rev. 1995, 59, 94-123.
3. L. Keegan, G. Gill, M. Ptashne, Separation of DNA binding from the transcription-activating function of a eukaryotic regulatory protein, Science 1986, 231, 699-704.
4. E.A. Golemis, Protein-Protein Interactions: a Molecular Cloning Manual, 1st ed., Cold Spring Harbor Lab Press, New York, 2002.
5. B.T. Carter, H. Lin, V.W. Cornish, in Directed Molecular Evolution of Proteins, (Eds.: S. Brakmann, K. Johnsson), Wiley-VCH Verlag, Weinheim, 2002.
6. E. Phizicky, P.I. Bastiaens, H. Zhu, M. Snyder, S. Fields, Protein analysis on a proteomic scale, Nature 2003, 422, 208-215.
7. C.R. Geyer, R. Brent, Selection of genetic agents from random peptide aptamer expression libraries, Methods Enzymol. 2000, 328, 171-208.
8. H. Lin, V.W. Cornish, In vivo protein-protein interaction assays: beyond proteins, Angew. Chem., Int. Ed. Engl. 2001, 40, 871-875.
9. H. Lin, V.W. Cornish, Screening and selection methods for large-scale analysis of protein function, Angew. Chem., Int. Ed. Engl. 2002, 41, 4402-4425.
10. L.H. Hwang, L.F. Lau, D.L. Smith, C.A. Mistrot, K.G. Hardwick, E.S. Hwang, A. Amon, A.W. Murray, Budding yeast Cdc20: a target of the spindle checkpoint, Science 1998, 279, 1041-1044.
11. J.A. Chong, G. Mandel, in The Yeast Two-Hybrid System, (Eds.: P.L. Bartel, S. Fields), Oxford University Press, New York, 1997, pp. 289-297.
12. M.K. Alexander, D. Bourns, V.A. Zakian, in Two-Hybrid Systems, Methods and Protocols, Vol. 177 (Ed.: P.N. MacDonald), Humana Press, New Jersey, 2001, pp. 241-260.
13. M.M. Wang, R.R. Reed, Molecular cloning of the olfactory neuronal transcription factor Olf-1 by genetic selection in yeast, Nature 1993, 364, 121-126.
14. S. Jaeger, G. Eriani, F. Martin, Results and prospects of the yeast three-hybrid system, FEBS Lett. 2004, 556, 7-12.
15. B. Zhang, B. Kraemer, D. SenGupta, S. Fields, M. Wickens, Yeast three-hybrid system to detect and analyze interactions between RNA and protein, Methods Enzymol. 1999, 306, 93-113.
16. D.J. SenGupta, B. Zhang, B. Kraemer, P. Pochart, S. Fields, M. Wickens, A three-hybrid system to detect RNA-protein interactions in vivo, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8496-8501.
17. N. Kley, Chemical dimerizers and three-hybrid systems: scanning the proteome for targets of organic small molecules, Chem. Biol. 2004, 11, 599-608.
18. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251, 283-287.
19. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12817-12821.
20. K. Baker, C. Bleczinski, H. Lin, G. Salazar-Jimenez, D. Sengupta, S. Krane, V.W. Cornish, Chemical complementation: a reaction-independent genetic assay for enzyme catalysis, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 16537-16542.
21. S.M. Firestine, F. Salinas, A.E. Nixon, S.J. Baker, S.J. Benkovic, Using an AraC-based three-hybrid system to detect biocatalysts in vivo, Nat. Biotechnol. 2000, 18, 544-547.
22. D.D. Clark, B.R. Peterson, Rapid detection of protein tyrosine kinase activity in recombinant yeast expressing a universal substrate, J. Proteome Res. 2002, 1, 207-209.
23. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262, 1019-1024.
24. J.F. Amara, T. Clackson, V.M. Rivera, T. Guo, T. Keenan, S. Natesan, R. Pollock, W. Yang, N.L. Courage, D.A. Holt, M. Gilman, A versatile synthetic dimerizer for the regulation of protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 10618-10623.
25. H. Lin, W. Abida, R. Sauer, V.W. Cornish, Dexamethasone-methotrexate: an efficient chemical inducer of protein dimerization in vivo, J. Am. Chem. Soc. 2000, 122, 4247-4248.
26. S.J. Kopytek, R.F. Standaert, J.C. Dyer, J.C. Hu, Chemically induced dimerization of dihydrofolate reductase by a homobifunctional dimer of methotrexate, Chem. Biol. 2000, 7, 313-321.
27. S. Gendreizig, M. Kindermann, K. Johnsson, Induced protein dimerization in vivo through covalent labeling, J. Am. Chem. Soc. 2003, 125, 14970-14971.
28. S.S. Muddana, B.R. Peterson, Facile synthesis of CIDs: biotinylated estrone oximes efficiently heterodimerize estrogen receptor and streptavidin proteins in yeast three hybrid systems, Org. Lett. 2004, 6, 1409-1412.
29. K.S. de Felipe, B.T. Carter, E.A. Althoff, V.W. Cornish, Correlation between ligand-receptor affinity and the transcription readout in a yeast three-hybrid system, Biochemistry 2004, 43, 10353-10363.
30. W.M. Abida, B.T. Carter, E.A. Althoff, H. Lin, V.W. Cornish, Receptor-dependence of the transcription read-out in a small-molecule three-hybrid system, Chembiochem 2002, 3, 887-895.
31. J. Gyuris, E. Golemis, H. Chertkov, R. Brent, Cdi1, a human G1 and S phase protein phosphatase that associates with Cdk2, Cell 1993, 75, 791-803.
32. M. Vidal, R.K. Brachmann, A. Fattaey, E. Harlow, J.D. Boeke, Reverse two-hybrid and one-hybrid systems to detect dissociation of protein-protein and DNA-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 10315-10320.
33. H.M. Shih, P.S. Goldman, A.J. DeMaggio, S.M. Hollenberg, R.H. Goodman, M.F. Hoekstra, A positive genetic selection for disrupting protein-protein interactions: identification of CREB mutations that prevent association with the coactivator CBP, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 13896-13901.
34. K. Baker, D. Sengupta, G. Salazar-Jimenez, V.W. Cornish, An optimized dexamethasone-methotrexate yeast 3-hybrid system for high-throughput screening of small molecule-protein interactions, Anal. Biochem. 2003, 315, 134-137.
35. J.C. Hu, E.K. O’Shea, P.S. Kim, R.T. Sauer, Sequence requirements for coiled-coils: analysis with lambda repressor-GCN4 leucine zipper fusions, Science 1990, 250, 1400-1403.
36. S.L. Dove, J.K. Joung, A. Hochschild, Activation of prokaryotic transcription through arbitrary protein-protein contacts, Nature 1997, 386, 627-630.
37. E.A. Althoff, V.W. Cornish, A bacterial small-molecule three-hybrid system, Angew. Chem., Int. Ed. Engl. 2002, 42, 2327-2330.
38. S.W. Michnick, I. Remy, F.X. Campbell-Valois, A. Vallee-Belisle, J.N. Pelletier, Detection of protein-protein interactions by protein fragment complementation strategies, Methods Enzymol. 2000, 328, 208-230.
39. I. Remy, J.N. Pelletier, A. Galarneau, in Protein-Protein Interactions, (Ed.: E. Golemis), Cold Spring Harbor Laboratory Press, New York, 2001, pp. 449-475.
40. S.W. Michnick, I. Remy, F. Valois, in Methods in Enzymology, Vol. 14, (Eds.: J. Abelson, S. Emr, J. Thorner), Academic Press, London, 2000, pp. 208-230.
41. F. Rossi, C.A. Charlton, H.M. Blau, Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 8405-8410.
42. T. Wehrman, B. Kleaveland, J.H. Her, R.F. Balint, H.M. Blau, Protein-protein interactions monitored in mammalian cells via complementation of beta-lactamase enzyme fragments, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 3469-3474.
43. I. Remy, S.W. Michnick, Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 5394-5399.
44. I. Remy, S.W. Michnick, Visualization of biochemical networks in living cells, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 7678-7683.
45. E.A. Althoff, Engineering Ligand-Receptor Interactions Using a Bacterial Three-Hybrid System, Columbia University, New York, 2004.
46. J. Estojak, R. Brent, E.A. Golemis, Correlation of two-hybrid affinity data with in vitro measurements, Mol. Cell. Biol. 1995, 15, 5820-5829.
47. F. Becker, K. Murthi, C. Smith, J. Come, N. Costa-Roldan, C. Kaufmann, A. Hanke, S. Dedier, S. Dill, D. Kinsman, N. Hediger, N. Bockovich, S. Meier-Ewert, A three-hybrid approach to scanning the proteome for targets of small molecule kinase inhibitors, Chem. Biol. 2004, 11, 211-223.
48. A. Hochschild, M. Ptashne, Cooperative binding of lambda repressors to sites separated by integral turns of the DNA helix, Cell 1986, 44, 681-687.
49. K. Joung, E. Ramm, C. Pabo, A bacterial two-hybrid selection system to study protein-DNA and protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 7382-7387.
50. I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.M. Kim, D. Eisenberg, DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Res. 2002, 30, 303-305.
51. P. Uetz, L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, J.M. Rothberg, A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae, Nature 2000, 403, 623-627.
52. P. Colas, B. Cohen, T. Jessen, I. Grishina, J. McCoy, R. Brent, Genetic selection of peptide aptamers that recognize and inhibit cyclin-dependent kinase 2, Nature 1996, 380, 548-550.
53. M. Yang, Z. Wu, S. Fields, Protein-peptide interactions analyzed with the yeast two-hybrid system, Nucleic Acids Res. 1995, 23, 1152-1156.
54. P. Youderian, A. Vershon, S. Bouvier, R.T. Sauer, M.M. Susskind, Changing the DNA-binding specificity of a repressor, Cell 1983, 35, 777-783.
55. S. Tan, D. Guschin, A. Davalos, Y.L. Lee, A.W. Snowden, Y. Jouvenot, H.S. Zhang, K. Howes, A.R. McNamara, A. Lai, C. Ullman, L. Reynolds, M. Moore, M. Isalan, L.P. Berg, B. Campos, H. Qi, S.K. Spratt, C.C. Case, C.O. Pabo, J. Campisi, P.D. Gregory, Zinc-finger protein-targeted gene regulation: genomewide single-gene specificity, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 11997-12002.
56. Sangamo BioSciences, Inc., www.sangamo.com, 2005; biotechnology company focused on the research and development of novel transcription factors for regulating human, plant, and microbial genes.
57. B. Zhang, M. Gallegos, A. Puoti, E. Durkin, S. Fields, J. Kimble, M.P. Wickens, A conserved RNA-binding protein that regulates sexual fates in the C. elegans hermaphrodite germ line, Nature 1997, 390, 477-484.
58. G.J. Hannon, RNA interference, Nature 2002, 418, 244-251.
59. D.R. Engelke, J.J. Rossi, RNA Interference, Methods in Enzymology, Vol. 392, 2005, pp. 1-454.
60. H. Lin, H. Tao, V.W. Cornish, Directed evolution of a glycosynthase via chemical complementation, J. Am. Chem. Soc. 2004, 126, 15051-15059.
4.2
Controlling Protein-Protein Interactions Using Chemical Inducers and
Disrupters of Dimerization
Tim Clackson
Outlook
Transient interactions between proteins are a common mechanism of
information transfer in biological systems. Chemical inducers of dimerization
allow these interactions to be brought under specific, real-time chemical
control, and have become established tools for cell biology research. This
chapter reviews the diverse types of ligands and cognate binding proteins
that can be used to control protein-protein associations, and discusses
the applications of the technology, both in basic research and in potential
therapeutic settings.
4.2.1
Introduction
Many cellular processes are triggered by the induced interaction of signaling
proteins [1, 2]. Examples include the clustering of cell surface receptors
by extracellular growth factors and the subsequent stepwise recruitment
and activation of intracellular signaling proteins. Indeed, many signaling
cascades proceed almost entirely through such interactions, from the initial
extracellular receptor engagement through signaling to the nucleus, proximity-
driven activation of gene transcription, and subsequent effector steps such as
regulated protein secretion.
A chemical inducer of dimerization, or “dimerizer”, is a cell-permeant
organic molecule with two separate motifs, each of which binds with high
affinity to a specific protein module. In principle, any cellular process that
is activated (or inactivated) by protein-protein interactions can be brought
under dimerizer control by fusing the protein(s) of interest to the binding
domain(s) recognized by the dimerizer. Addition of the dimerizer then non-
covalently links the chimeric signaling proteins, activating the cellular event
that it controls (Fig. 4.2-1(a)).
This conceptually simple approach, described more than 10 years ago [3],
has proved broadly applicable and has been widely adopted not only in the
chemical biology community but also across biological research in general. It
has also spawned several related technologies, such as systems for “reverse
dimerization”. This chapter will review the various protein-ligand systems
that have been designed, and describe examples of their use, both in research
and drug discovery.
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7

Fig. 4.2-1 Schemes showing the principle of chemically induced dimerization of
proteins. (a) Homodimerization. In this example, fusion proteins are tethered to the
cell membrane through fusion to a peptide sequence that becomes myristoylated inside
cells. (b) Heterodimerization. In this example, one fusion protein is membrane
tethered; the other is expressed as a soluble cytosolic protein and is recruited to the cell
membrane upon addition of dimerizer.
4.2.2
Development of Chemical Dimerization Technology
The concept of chemically induced dimerization was introduced by Schreiber
and Crabtree and their colleagues in 1993 [3]. The inspiration for their work
came from the mechanism of the natural product immunosuppressive drug
FK506, which binds simultaneously to FK506 binding protein 12 (FKBP12
or FKBP), a ubiquitous peptidyl-prolyl cis-trans isomerase, and the signaling
phosphatase calcineurin, inhibiting the latter’s phosphatase activity and hence
blocking signaling. This suggested a general way to bring any protein-protein
interaction under small molecule control. Bifunctional organic molecules
could be designed, with two protein-binding moieties. Target proteins for
these molecules could be appended to the signaling domains of interest at
the genetic level to create chimeric proteins. Addition of the bifunctional
organic molecule to cells expressing the chimeric proteins would induce
dimerization of the engineered proteins, mimicking the natural activation
process (Fig. 4.2-1(a)).
In the initial paper, Spencer et al. used the FK506-FKBP interaction itself to
provide building blocks for the dimerizer system. They generated a dimerizer
by linking two molecules of FK506 to create FK1012, a molecule that can bind
two FKBP domains simultaneously (but not calcineurin). They then created
a suitable variant of their target protein, the T-cell receptor zeta chain, by
appending three copies of FKBP. Addition of FK1012 to cells expressing the
engineered protein led to clustering of the protein and activation of authentic
downstream cellular events.
FK1012 is a homodimerizer, with two identical binding motifs. It was
quickly recognized that induced heterodimerization should also be feasible, by
fusing the two proteins of interest to different protein-binding domains
that are targeted by a suitable nonsymmetrical dimerizer (Fig. 4.2-1(b))
[4-61. Dimerizers used for such approaches have included, for example,
dimers of FK506 and cyclosporine (FK-CsA) [4]. However, it is most
straightforward to simply use the bifunctional natural products directly.
Rapamycin, an immunosuppressive drug related to FK506, functions by
binding simultaneously to FKBP and the protein kinase FRAP/mTOR [7] and
can be used to heterodimerize proteins fused to these protein modules [5, 61.
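
The difference between homo- and heterodimerizers can be captured in a simple lookup of the two protein modules each molecule binds. The sketch below is only a bookkeeping aid: the dictionary and function names are hypothetical, and cyclophilin as the binding partner of the cyclosporine half of FK-CsA is our addition, since it is not named in the text.

```python
# Each dimerizer is mapped to the two protein modules it can bind at once.
DIMERIZERS = {
    "FK1012": ("FKBP", "FKBP"),
    "FK-CsA": ("FKBP", "cyclophilin"),   # cyclophilin: assumed, not named in the text
    "rapamycin": ("FKBP", "FRAP/mTOR"),
}


def is_homodimerizer(name: str) -> bool:
    """A homodimerizer has two identical binding motifs."""
    first, second = DIMERIZERS[name]
    return first == second


def can_bridge(name: str, module_a: str, module_b: str) -> bool:
    """True if the dimerizer can link fusion proteins carrying these two modules."""
    return sorted(DIMERIZERS[name]) == sorted((module_a, module_b))


for name in DIMERIZERS:
    kind = "homodimerizer" if is_homodimerizer(name) else "heterodimerizer"
    print(f"{name}: {kind}")
```

Choosing a dimerization system then reduces to deciding which module is fused to which protein of interest, for example FKBP on one partner and the rapamycin-binding portion of FRAP/mTOR on the other.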
The ability to induce a protein-protein interaction inside cells provided a
general way to generate inducible alleles of signaling and other proteins - one
that can be activated in real time, in contrast to classical genetic approaches [8].
This suggested a series of important applications, ranging from mechanistic
analysis of protein function to understanding the consequences of activating
signaling in whole cells and even transgenic animals. Initial hopes have been
more than fulfilled, and several hundred papers have now been published that
describe diverse uses of the technology [9].
4.2.3
Dimerization Systems
A major focus, following the initial reports, was on refining the tools used to
achieve chemical dimerization - in particular, the dimerizers themselves.
Important aims were to improve chemical feasibility, specificity, and
pharmacological properties, the latter to permit studies in experimental
animals. This section will describe the options that have evolved for
different types of induced dimerization. The focus will be on the FKBP-
based technologies and applications developed by the author’s group and its
collaborators, although other systems will also be mentioned.
4.2.3.1 Homodimerization
A series of FK1012 variants has been described with different linkers and, in
some cases, facile syntheses using FK506 as a starting point (Fig. 4.2-2) [10].
All of these can be used to effect dimerization between FKBP fusion proteins.
Fig. 4.2-2 First-generation homodimerization agents FK1012 and
AP1510. These molecules are able to induce homodimers between
wild-type FKBP fusion proteins. The variant FK1012s differ only in
the linker region.
We sought to develop fully synthetic, lower-molecular-weight replacements
for FK1012, to allow full exploration of structure-activity relationship (SAR)
and optimization of pharmaceutical properties. These efforts led to the design
of AP1510 (Fig. 4.2-2), which comprises two synthetic FKBP-binding ligands
joined by a short spacer [11]. Although AP1510 binds less tightly to FKBP
than FK1012, it is more potent in most applications, perhaps due to a greater
conformational rigidity.
FK1012s or AP1510 can be used to induce discrete homodimers between
molecules of an FKBP fusion protein when that protein contains a single FKBP
domain. Higher-order clustering can, in principle, be achieved by including
two or more FKBP domains, although the geometry and stoichiometry of the
resulting complexes are difficult to control.
In addition to FKBP-based systems, homodimerization has also been
achieved using the naturally dimeric natural product coumermycin, which
can dimerize proteins fused to Escherichia coli DNA gyrase [12].
4.2.3.2 Heterodimerization
Although early heterodimerization studies used molecules such as FK-CsA,
the most common approach is the use of rapamycin, which naturally functions
as a heterodimerizer [7]. One protein is fused to FKBP, and the other to the
~100 amino acid domain of FRAP/mTOR which binds to the FKBP-rapamycin
complex, termed FRB (for FKBP-rapamycin binding domain) [13]. FKBP and
FRB have no detectable affinity for one another in the absence of rapamycin,
yet the drug binds simultaneously to both proteins with high affinity. Thus,
addition of rapamycin to cells expressing FKBP and FRB fusion proteins leads
to strictly drug-dependent heterodimerization.
Because of its inherent directionality, heterodimerization is often a more
precise tool than homodimerization and can be used in many configurations.
For example, a protein can be inducibly recruited to the plasma membrane
by fusing it to one of the drug-binding domains, and fusing the other
to a myristoylation motif (see Fig. 4.2-1(b)) [4]. A major application of
heterodimerization is in the control of transcription (see Section 4.2.3.4) [5, 6].
In addition to the rapamycin system, other heterodimerization systems
have been described, including dimers of methotrexate and dexamethasone
to target dihydrofolate reductase and glucocorticoid receptor fusion proteins,
respectively [14, 15], and dimers of estrogen analogs and biotin analogs to
target fusions to estrogen receptors and streptavidin [16].
4.2.3.3 Refining Ligand-Protein Pairs: “Bumps and Holes”

Although the ligand-protein interfaces provided by nature are good starting
points for building dimerization systems, there is room for improvement.
In particular, it is desirable to maximize the selectivity of the ligands for
their target fusion proteins compared to endogenous proteins, to ensure
that the ligands have no effect on natural cellular physiology. In the case of
FKBP-based homodimerization, the ligands might interfere with the natural
function of FKBP as a modulator of transmembrane signaling proteins
(although this is unlikely given the high intracellular FKBP levels). There
is also the possibility that dimerizer potency could be blunted by sequestration
of the drug into the extensive cellular FKBP pool. In the case of rapamycin-
based heterodimerization, adding rapamycin to cells inhibits endogenous
mTOR/FRAP activity, inducing antiproliferative effects.
The solution devised for these problems has become known as “bumps
and holes”, and takes advantage of the fact that the sequences of the drug-
binding domains are available for genetic modification, since they are being
expressed heterologously in the cell (Fig. 4.2-3). In this approach, the ligand
is modified to introduce a steric clash (a “bump”) that abolishes binding to
the target protein. Then, using structure-guided or screening approaches, one
or more compensating mutations are identified in the drug-binding domain
that restore the ability to bind the modified ligand (a “hole”). The bumped
dimerizer is now able to bind only to the modified drug-binding domain of the
chimeric protein and not to endogenous proteins.
In addition to affording highly specific protein-ligand pairs, this
interface-engineering approach has also provided insights into the structural and
thermodynamic plasticity of small molecule-protein interfaces [17, 18]. The
approach has echoes in many other areas of chemical biology, in particular
the pioneering work of Shokat and coworkers in engineering allele-selective
kinase inhibitors and substrates (see Chapter 3.1).
Fig. 4.2-3 Engineering specificity into FKBP dimerizing agents using “bumps
and holes”. (a) Homodimerization system. Bumped homodimers are able to induce
dimers between FKBP fusion proteins engineered with appropriate “holes”, while
evading endogenous FKBP. (b) Rapamycin-based heterodimerization system.
Bumped “rapalogs” are able to induce heterodimers between FKBP fusion
proteins and FRB fusion proteins engineered with a specific “hole”. The
compounds can still bind to endogenous FKBP, but have reduced or eliminated
antiproliferative activity because this complex cannot bind effectively to
endogenous FRAP/mTOR.
4.2.3.3.1 Bumped Homodimerizers

Highly potent and selective homodimerizers have been designed by engineering
the interface between AP1510 and FKBP. X-ray crystallographic analysis
suggested that alkyl substitution of a specific carbonyl group on the FKBP
ligand would destroy binding and that loss-of-size mutations at FKBP residue F36
should restore affinity (Fig. 4.2-3(a)). Subsequent studies resulted in AP1903,
a bumped dimerizer with very high affinity (Kd ~0.1 nM) and 1000-fold
selectivity for the FKBP mutant F36V compared to the wild-type protein (Fig. 4.2-4)
[19]. Related dimerizers with different linkers but equivalent potencies have
also been described (such as AP20187; see Fig. 4.2-4).
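To put the quoted affinity and selectivity in perspective, the following minimal sketch estimates how much of each FKBP variant would be occupied at a given free ligand concentration. It assumes simple 1:1 equilibrium binding at each FKBP-binding site and, as a further assumption not stated in the text, that the 1000-fold selectivity corresponds to a 1000-fold ratio of Kd values (a wild-type Kd of roughly 100 nM). The function and constant names are illustrative only.

```python
def fraction_bound(free_ligand_nM: float, kd_nM: float) -> float:
    """Fraction of binding sites occupied at equilibrium for simple 1:1 binding."""
    if free_ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ligand_nM / (free_ligand_nM + kd_nM)


KD_F36V_NM = 0.1        # approximate value quoted in the text
SELECTIVITY = 1000.0    # fold-selectivity quoted in the text
# Assumption: the selectivity is a ratio of Kd values (wild type / F36V).
KD_WILD_TYPE_NM = KD_F36V_NM * SELECTIVITY

# Compare occupancy of the two FKBP variants at a few free ligand concentrations.
for ligand_nM in (0.1, 1.0, 10.0):
    f36v = fraction_bound(ligand_nM, KD_F36V_NM)
    wild_type = fraction_bound(ligand_nM, KD_WILD_TYPE_NM)
    print(f"{ligand_nM:5.1f} nM free ligand: F36V {f36v:.3f}, wild type {wild_type:.3f}")
```

Under these assumptions, a free ligand concentration near 1 nM would occupy most F36V sites while leaving wild-type FKBP largely unbound. The model ignores ligand depletion, the bivalency of the dimerizer, and cellular uptake, so it should be read as an order-of-magnitude illustration only.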
These dimerizers, in general, have proved to be much more potent than their
unbumped cousins and suitable for in vivo studies in a range of experimental
animals. Numerous studies have reported the use of FKBP-F36V fusion
proteins and AP20187 to control cellular processes [9], and AP1903 itself has
completed a phase I clinical trial in healthy human volunteers, where it was
found to be safe and well tolerated [20].
Fig. 4.2-4 Bumped homodimerizers. These compounds are designed to bind potently
and specifically to the F36V mutant of FKBP.
4.2.3.3.2 Bumped Heterodimerizers: “Rapalogs”

“Bumped” rapamycin systems have been developed by chemically modifying
the FRB-binding portion of rapamycin, to generate “rapalogs” with reduced
or eliminated FRB binding and, hence, biological activity. Compensating
mutations in FRB have then been identified using structure-guided mutagenesis
and screening/selection, which can then be introduced into target protein FRB
fusions (Fig. 4.2-3(b)).
Several rapamycin bump-hole solutions have been described (Fig. 4.2-5).
In one, bulky substitutions at the C16 methoxy group of rapamycin were used
to abrogate binding to wild-type FRB. In a structure-guided screen, mutation
of FRB residue Thr2098 (which abuts C16) to Leu was found to allow binding
of a wide range of C16-substituted rapalogs (Ref. 21 and our unpublished
work) (Fig. 4.2-5). In fact, the T2098L substitution is a versatile “tag” that
functionally accommodates numerous rapamycin analogs, modified at C16
and/or other positions, as well as rapamycin itself. As a result it is routinely
incorporated into all our FRB fusion protein constructs and has been used
with C16-bumped rapalogs in numerous in vitro and in vivo studies.
Another system uses C20-methallyl rapamycin (Ma-rap; Fig. 4.2-5), which
is unable to bind wild-type FRB and is therefore devoid of FRAP/mTOR
inhibitory activity [22]. Ma-rap was found in a screen to bind very specifically
to a triple mutant of FRB known as PLF [22]. Using the PLF mutant of FRB,
Ma-rap can be used to achieve highly selective heterodimerization of proteins
[Figure panel: core structures of rapamycin/AP rapalogs and of MA-rap, with a
table of R16 and R32 substituents for rapamycin, AP22594, AP1861, AP21967, and
AP23102.]
Fig. 4.2-5 Bumped rapalogs used as heterodimerizers. The rapalogs listed in the
panel are all active in dimerization systems incorporating the T2098L mutation
in FRB fusion proteins. Ma-rap (C20-methallyl rapamycin), in which the triene
portion of rapamycin is modified as shown, is active in dimerization systems
incorporating the specific FRB triple mutation PLF (K2095P/T2098L/W2101F) [22].
Fig. 4.2-6 Schemes for controlling transcription using chemically induced dimerization.
(a) Control using homodimerizers. (b) Control using heterodimerizers (rapalogs).
of FKBP binds to itself in a manner that can be reversed using an FKBP
ligand [27]. The phenomenon was initially noted in a two-hybrid assay
and subsequently confirmed by biophysical studies on the purified protein.
Although the monomer-monomer affinity is relatively weak (Kd ~30 µM),
the interaction is specific, and concatenated F36M domains form discrete
aggregates by virtue of multivalent binding. Interactions can be completely
disrupted by addition of a monomeric “bumped” ligand analogous to one
half of AP1903 (see Fig. 4.2-4), suggesting that the F36M mutation, similar to
F36V, introduces a “hole” in the protein surface. This result also implies that
the proteins interact through their ligand-binding sites - a finding confirmed
crystallographically (see next section).
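The consequence of a weak monomer-monomer affinity can be illustrated with a short calculation. The sketch below assumes a simple two-state equilibrium M + M <-> D for a protein carrying a single F36M domain, with Kd defined as [M]^2/[D] and taken as the approximate 30 µM quoted above. The model, the function name, and the example concentrations are assumptions made for illustration and are not taken from the cited study.

```python
import math


def dimer_fraction(total_uM: float, kd_uM: float) -> float:
    """Fraction of protein chains found in dimers for M + M <-> D.

    Kd = [M]^2 / [D]; mass balance: total = [M] + 2 [D].
    Solving 2[M]^2 + Kd [M] - Kd * total = 0 for the free monomer [M].
    """
    if total_uM < 0 or kd_uM <= 0:
        raise ValueError("total must be >= 0 and Kd must be > 0")
    if total_uM == 0:
        return 0.0
    monomer_uM = kd_uM * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0) / 4.0
    return 1.0 - monomer_uM / total_uM


KD_F36M_UM = 30.0  # approximate value quoted in the text

# Hypothetical total protein concentrations, chosen only for illustration.
for total_uM in (1.0, 30.0, 300.0):
    print(f"{total_uM:6.1f} uM total: {dimer_fraction(total_uM, KD_F36M_UM):.2f} in dimers")
```

With these assumptions, a single F36M domain present at low micromolar concentration would be mostly monomeric, which is consistent with the point made in the text that concatenated domains rely on multivalent binding to form aggregates.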
This system can be used to reversibly aggregate any protein to which multiple
F36M domains are attached. For example, intracellular expression of a fusion
between four F36M domains and green fluorescent protein (GFP) results in
large fluorescent intracellular aggregates that disperse within minutes upon
adding monovalent ligand [27]. Removal of ligand leads to rapid re-formation
of aggregates.
Fig. 4.2-7 Comparison of conventional and “reverse” FKBP dimerization systems.
(a) Induced dimerization using bumped homodimerizer AP20187 and F36V fusion
proteins. (b) Reverse dimerization system using monomeric ligand (AP21998) and
F36M fusion proteins.
4.2.3.6 Structural Basis of Induced Dimerization

One attraction of using inducible dimerization is that the interacting molecules
are understood in great detail. The high-resolution X-ray structures of
all three FKBP-based complexes in the dimerized state are available - the
AP1903 homodimerization system (our unpublished work), rapamycin
heterodimerization system [7], and the F36M reverse dimerization system
[27] (Fig. 4.2-8). These structures have been invaluable for engineering and
optimizing the drug-protein interfaces. In addition, they provide important
guidance on the orientations in which the binding proteins can be fused
to heterologous proteins of interest, in order to induce dimerization of the
appropriate geometry.
4.2.4
Applications
With protein-protein interactions being pervasive throughout biology,
chemically controlled dimerization has proved to be a remarkably versatile
technology, and more than 300 papers have described use of the approach
[9]. These applications can be broadly separated into two classes. The first
is the use of dimerization technologies in basic and applied biological
research, to understand the functions of proteins or pathways, and to create
inducible animal models of disease. The second is the direct use of the
technologies in potential therapeutic applications, generally in the context of
cell or gene therapies. Examples of both will be reviewed in the following
sections.
Fig. 4.2-8 X-ray crystal structures of dimerized complexes. In each case, protein
N-termini are marked in blue and C-termini in red. (a) Structure of AP1903 in
complex with two molecules of FKBP-F36V (our unpublished data). The two proteins
are brought close to each other in a “parallel” configuration, and intramolecular
drug-drug interactions are extensive. (b) Structure of rapamycin in complex with
wild-type FKBP (green) and the FRB domain of FRAP/mTOR (gray) (Protein Data Bank
(PDB) ID: 4FAP) [7]. (c) Structure of the homodimeric complex of the
self-associating FKBP mutant F36M (PDB ID: 1EYM) [27]. The two molecules
interact through their ligand-binding sites in an “antiparallel” configuration.
4.2.4.1 Analysis of Protein Function

A very common and powerful application is creating an inducible allele of a
protein in order to dissect its function. Typically, the protein of interest is fused
to a dimerization domain, cells expressing the fusion protein are exposed to
dimerizer, and the consequences are assessed by any appropriate technique,
such as assaying downstream signaling or profiling mRNA expression. The
key advantages of chemically induced dimerization are that activation can be
restricted to one particular protein and can be initiated and then monitored in
“real time” by addition of drug. This allows very specific questions to be asked
about the function of a protein or of the pathway that it controls.
Over 100 proteins have been successfully brought under dimerizer control
in this way [9]. In many cases, these are signaling proteins such as cell
surface receptors, intracellular protein kinases, and signaling proteases such as
caspases. Often, the experimental goal is simply to test whether dimerization is
sufficient to activate the protein. For example, such studies support an induced
proximity model for activation of Raf-1 [12], caspase 8 [28], and leukemia-
associated fusion proteins [29]. However, more complex questions can be
asked, particularly through combined use of homo- and heterodimerization.
Dimerizable alleles of the epidermal growth factor (EGF) receptor family have
been used to show that EGFR1 homodimers, EGFR2 (HER2) homodimers, and
EGFR1-EGFR2 heterodimers all have different effects on breast tumor cell
proliferation and invasion in three-dimensional culture models [30]. By using
dimerizable alleles, the roles of each complex could be probed independently
and without the complicating effects of the natural receptor ligands.
More broadly, dimerization can be used to gain control over a specific
molecular process or even cellular event that can be induced by proximity:
examples include cell adhesion and rolling [31], DNA looping [32], recombinase
enzymatic activation [33], RNA splicing [34], protein splicing [35], and
glycosylation [36]. These inducible alleles allow the process in question to
be dissected, but often also provide tools that have applications in their
own right: for example, the use of inducible recombinase activity to achieve
temporal control of gene deletion [33].
4.2.4.2 Animal Models of Disease

Because the inducing compounds are suitable for use in vivo, and are generally
orthogonal to mammalian biology, studies can also be performed in a
whole-animal context. A common approach is to generate transgenic mice in which
expression of the fusion protein is restricted to a tissue of interest. These mice
allow study of protein or pathway function in vivo, but can also provide an
inducible model of any disease that is associated with activation (or inhibition).
For example, transgenic mice expressing inducible versions of either fibroblast
growth factor receptor 1 (FGFR1) or FGFR2 specifically in the prostate have
been used to show that only the former receptor can induce the neoplasia and
hyperplasia typical of early prostate cancer [37] (Fig. 4.2-9). These mice could
also be used to test potential drugs for the ability to block the induced FGFR1
signal and its consequences.
Fig. 4.2-9 Use of dimerization technology to probe the roles of FGF receptor
subtypes in prostate cancer development. Transgenic mice were prepared in which
dimerizer-inducible alleles of FGFR1 or FGFR2 were expressed exclusively in
prostate tissue. Administration of dimerizer (AP20187) induced prostate
neoplasia and hyperplasia only in the FGFR1 mice, implicating this receptor
subtype in early prostate cancer development.
A general approach to creating animal models of degenerative diseases is to
induce apoptosis specifically in target tissues or organs. This can be achieved
through tissue-specific expression of inducible alleles of the Fas receptor or
through any number of downstream caspases. Mice in which hepatocytes can
be inducibly ablated represent a valuable model for liver diseases [38], and
mice expressing inducible caspase in macrophages are a valuable resource for
probing the roles of these cells [39].
4.2.4.3 Regulated Cell Therapies

A powerful use of dimerizer technology is in controlling the proliferation,
differentiation, and/or survival of genetically engineered cells [40]. Cell
therapies have broad potential to treat diseases but suffer from limitations,
including the inability to manipulate the cells once introduced into the body.
Blau and coworkers have used dimerizer-activated alleles of cytokine receptors
to acquire control over cell proliferation. Cells modified with a gene of
interest are also engineered with this “cell growth switch”; administration of
dimerizer then stimulates proliferation only of modified cells, in vitro or in vivo
(Fig. 4.2-10). This approach has been successfully demonstrated in small [41]
and large animal studies [42] and offers a way to expand very rare modified cell
populations into a therapeutic range. Other signaling proteins can be used to
achieve different outcomes - for example, dimerizing CD40 induces a potent
immunomodulatory response in cells and could be used as part of a cellular
cancer vaccine [43].
Fig. 4.2-10 Application of a dimerization-based “cell growth switch” to
achieve expansion of genetically modified cells. Hemopoietic cells are
transduced with a retrovirus encoding a therapeutic gene along with a fusion
between FKBP-F36V and the signaling domain of mpl, a cytokine receptor.
Although transduced cells are rare, following infusion in vivo they can be
selectively expanded by administering dimerizer (AP20187), which induces their
proliferation and differentiation. Expansion can also be carried out in cell
culture.
The opposite approach to inducing proliferation is to induce cell death,
using conditional alleles of Fas or caspases. A Fas “death switch” has been
used to eliminate engineered T cells infused into animals [44], as a model
for depleting the T cells that cause graft-versus-host disease following bone
marrow transplantation [45]. More potent caspase-based switches can also be
used [46] and, in principle, could be installed into any therapeutic cell to provide
a “fail-safe” mechanism for cell destruction should adverse events ensue.
4.2.4.4 Regulated Transcription and Regulated Gene Therapies

Use of dimerizers to control transcription of engineered target genes represents
an alternative to technologies such as the tetracycline-inducible (“Tet”) system
that rely on allosteric activation [47]. A key advantage of dimerizer approaches
is the very low background transcription in the absence of dimerizer, most
likely because the activation domain (AD) is physically separated from DNA prior
to activation (see Fig. 4.2-6) [25]. This feature has been exploited to achieve
inducible expression of proteins that are highly toxic, such as diphtheria toxin
[21], or highly potent, such as activators of viral replication [48]. The modular
nature of the dimerizer system also facilitates control of endogenous (as opposed
to introduced) genes, achieved by fusing FKBP modules to a DNA-binding domain
(DBD) engineered to recognize the appropriate natural promoter [49].
There is considerable interest in the use of dimerizer-controlled gene
expression in regulated gene therapies. Extensive work has gone into optimizing
the rapamycin-inducible system for potential clinical use, including
identifying rapalogs with optimal pharmacology, and developing “humanized”
DNA-binding and activation domains, so that all protein components of the
system are of human origin, to minimize immunogenicity in a clinical setting
(reviewed in Refs 25, 47). The rapamycin system has been successfully
incorporated into most gene therapy vector contexts, including adenovirus and
adeno-associated virus (AAV) [50], onco-retrovirus, lentivirus, herpes simplex
virus, and naked DNA (reviewed in Ref. 25). Tightly controlled erythropoietin
(Epo) production in response to rapamycin has been demonstrated in nonhuman
primates for over 6 years following a single intramuscular administration
of AAV vectors (Fig. 4.2-11) [51]. Rapamycin- or rapalog-controlled gene
expression has also been demonstrated in animal models after gene delivery to
the liver [52], eye [53], and brain [54]. These studies support the concept of
bringing therapeutic protein production under dimerizer control in the clinical
setting.
Fig. 4.2-11 Use of dimerizer-controlled transcription to achieve long-term
regulated expression of a therapeutic gene in a nonhuman primate. At time zero,
the animal received a single intramuscular injection of adeno-associated viral
vectors encoding primate erythropoietin (Epo) under the control of the
rapamycin-regulated dimerization system. Subsequent administrations of rapamycin
at the indicated doses (mg/kg, intravenously; triangles) induced discrete and
reversible increases in serum Epo levels (black symbols, left axis) and
commensurate elevations in hematocrit (open symbols, right axis). Inducibility
has persisted for over 6 years to date and the study is ongoing. This figure was
originally published in Blood [51]. © The American Society of Hematology.
4.2.4.4.1 Three-hybrid Approaches
Another use of dimerizer-controlled transcription is in three-hybrid assays
[14, 151. In these applications, the “third hybrid” is the dimerizer, and gene
activation serves merely as an assay to report on the interaction between a
dimerizer and the two fusion proteins, rather than as the end in itself. Three-
hybrid assays can be used to identify target proteins for a given small molecule
(by incorporating the molecule into a dimerizer and screening against a cDNA
library fused to an AD; see Chapter 18.2), or to identify small molecules that
bind a given target (by cloning the target as an AD fusion protein and screening
against a library of dimerizers in which one monomer is diversified). More
recently, they have been applied to directed evolution of the catalytic properties
of proteins using “chemical complementation” (see Chapter 4.1).
4.2.4.5 Regulated Secretion Using “Reverse Dimerization” System

The reverse dimerization system (Section 4.2.3.5) has been used to develop an
approach for the regulated pulsatile secretion of proteins [55]. The aim of this
work was to mimic the natural, rapid release of proteins such as insulin using a
regulated gene therapy strategy. Since control at the transcriptional level takes
place on the timescale of days, it is necessary to directly regulate the secretion
process. To achieve this, the protein of interest is expressed as a secreted protein
fused to tandem copies of the FKBP-F36M domain, resulting in the formation
of aggregates in the endoplasmic reticulum (ER) that are too large to exit to the
Golgi (Fig. 4.2-12). Addition of a monomeric ligand breaks up the aggregates,
allowing the proteins to proceed to the Golgi, where they are processed by the
endogenous protease furin, releasing the authentic protein for secretion.
Using this system, rapid pulses of insulin secretion could be iteratively
induced by adding ligand to cells in vitro (Fig. 4.2-12(c)). Furthermore, in a
mouse model of insulin-dependent diabetes, induced release of insulin could
transiently reverse hyperglycemia [55]. More recently, we have incorporated the
system into an AAV vector and demonstrated long-term inducible secretion
following gene transfer into mice (our unpublished studies). These findings
suggest that regulated secretion could be useful for regulating the expression
of proteins that require delivery in rapid pulses.
The ability to reversibly induce large protein aggregates has also provided a
useful tool in basic research on the mechanisms of intracellular transport - for
example, allowing demonstration, for the first time, of the existence of
“megavesicles” that traffic between the ER (endoplasmic reticulum) and
Golgi [56].
Fig. 4.2-12 Use of the reverse dimerization system to control protein secretion
in mammalian cells. (a) Scheme for inducible secretion. (b) Chemical structure
of monomeric ligand AP21998. (c) Pulsatile release of insulin from engineered
cells. Cells expressing an insulin-F36M fusion protein were exposed to AP21998
for three 1-h periods as indicated, and medium was collected every hour and
assayed for insulin levels [55].
4.2.5
Future Development
Inducible dimerization technologies are now firmly established as research
tools. The components of the various systems are largely developed, although
refinements will likely continue in some areas - for example, the optimization
of protein-ligand pairings, particularly rapamycin analogs. A worthwhile goal
now within reach is the simultaneous regulation of multiple pathways or
proteins using dimerizers and binding proteins that are completely orthogonal
to one another [24].
Some of the most powerful research applications of the technology are only
now starting to be explored - a consequence of the time necessary to establish
transgenic mouse lines expressing appropriate fusion proteins. The next few
years will likely see many more reports using such mice to dissect the roles of
individual proteins and pathways in normal physiology and in disease.
Similarly, although the feasibility and promise of therapeutic uses of
dimerizer technology have been well established in animal models, translation
into the clinic has been slow owing to the general issues and complexities
associated with gene and cell therapies. As these issues are resolved, dimerizer
technology may have a key role to play in conferring control and safety on such
therapies.
Looking further ahead, interesting extensions of the dimerizer concept are
emerging. These include attempts to enhance the potency of drugs by linking
them to another small molecule, such as an FKBP ligand, that can recruit an
endogenous protein and improve overall binding affinity [57]. The ultimate
extrapolation of chemical dimerization would be dimerizers that bind directly
to native target proteins, as opposed to engineered fusion proteins. Attempts
to build fully synthetic transcriptional activators that directly bind both DNA
and transcriptional regulators are a step in this direction [58], and compounds
that directly dimerize and activate cytokine receptors may, in time, become a
therapeutic alternative to recombinant proteins such as Epo [59].
4.2.6
Conclusion
Chemically controlled dimerization represents a clear and successful example
of how chemical biology approaches can “cross over” into mainstream biology
and become established as powerful and generally accepted research tools. The
technology has contributed significant new insights into numerous biological
processes and, in turn, has inspired new directions in chemical biology
research. Both of these benefits are likely to continue as the technology
becomes more broadly utilized.
Acknowledgments
I thank Len Rozamus, Xiaotian Zhu, Vic Rivera, and Renate Hellmiss
for preparing the figures. I am indebted to my many ARIAD colleagues
and collaborators, past and present, who have contributed to our work on
dimerization technology. Particular thanks are due to Vic Rivera for numerous
discussions over many years. Kits for the regulated dimerization of proteins
may be requested through ARIAD’s website at www.ariad.com/regulationkits.
References
1. G.R. Crabtree, S.L. Schreiber, Three-part inventions: intracellular
signaling and induced proximity, Trends Biochem. Sci. 1996, 21, 418-422.
2. J.D. Klemm, S.L. Schreiber, G.R. Crabtree, Dimerization as a regulatory
mechanism in signal transduction, Annu. Rev. Immunol. 1998, 16, 569-592.
3. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling
signal transduction with synthetic ligands, Science 1993, 262, 1019-1024.
4. P.J. Belshaw, S.N. Ho, G.R. Crabtree, S.L. Schreiber, Controlling protein
association and subcellular localization with a synthetic ligand that induces
heterodimerization of proteins, Proc. Natl. Acad. Sci. U.S.A. 1996, 93,
4604-4607.
5. S.N. Ho, S.R. Biggar, D.M. Spencer, S.L. Schreiber, G.R. Crabtree, Dimeric
ligands define a role for transcriptional activation domains in reinitiation,
Nature 1996, 382, 822-826.
6. V.M. Rivera, T. Clackson, S. Natesan, R. Pollock, J.F. Amara, T. Keenan, S.R.
Magari, T. Phillips, N.L. Courage, F. Cerasoli Jr, D.A. Holt, M. Gilman, A
humanized system for pharmacologic control of gene expression, Nat. Med.
1996, 2, 1028-1032.
7. J. Choi, J. Chen, S.L. Schreiber, J. Clardy, Structure of the
FKBP12-rapamycin complex interacting with the binding domain of human FRAP,
Science 1996, 273, 239-242.
8. L.A. Banaszynski, T.J. Wandless, Conditional control of protein function,
Chem. Biol. 2006, 13, 11-21.
9. A complete list of publications describing use of chemical dimerization
technologies can be found at http://www.ariad.com/regulationkits.
10. S.T. Diver, S.L. Schreiber, Single-step syntheses of cell permeable protein
dimerizers that activate signal transduction and gene expression, J. Am. Chem.
Soc. 1997, 119, 5106-5109.
11. J.F. Amara, T. Clackson, V.M. Rivera, T. Guo, T. Keenan, S. Natesan,
R. Pollock, W. Yang, N.L. Courage, D.A. Holt, M. Gilman, A versatile synthetic
dimerizer for the regulation of protein-protein interactions, Proc. Natl. Acad.
Sci. U.S.A. 1997, 94, 10618-10623.
12. M.A. Farrar, I. Alberol, R.M. Perlmutter, Activation of the Raf-1 kinase
cascade by coumermycin-induced dimerization, Nature 1996, 383, 178-181.
13. J. Chen, X.F. Zheng, E.J. Brown, S.L. Schreiber, Identification of an 11-kDa
FKBP12-rapamycin-binding domain within the 289-kDa FKBP12-rapamycin-associated
protein and characterization of a critical serine residue, Proc. Natl. Acad.
Sci. U.S.A. 1995, 92, 4947-4951.
14. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small
ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93,
12817-12821.
15. H. Lin, W.M. Abida, R.T. Sauer, V.W. Cornish, Dexamethasone-methotrexate:
an efficient chemical inducer of protein dimerization in vivo, J. Am. Chem.
Soc. 2000, 122, 4247-4248.
16. S.S. Muddana, B.R. Peterson, Facile synthesis of acids: biotinylated estrone
oximes efficiently heterodimerize estrogen receptor and streptavidin proteins
in yeast three hybrid systems, Org. Lett. 2004, 6, 1409-1412.
17. W. Yang, L.W. Rozamus, S. Narula, C.T. Rollins, R. Yuan, L.J. Andrade,
M.K. Ram, T.B. Phillips, M.R. van Schravendijk, D. Dalgarno, T. Clackson,
D.A. Holt, Investigating protein-ligand interactions with a mutant FKBP
possessing a designed specificity pocket, J. Med. Chem. 2000, 43, 1135-1142.
18. T. Clackson, Redesigning small molecule-protein interfaces, Curr. Opin.
Struct. Biol. 1998, 8, 451-458.
safety and pharmacokinetics of a novel dimerizer drug, AP1903, in healthy
volunteers, J. Clin. Pharmacol. 2001, 41, 870-879.
21. R. Pollock, R. Issner, K. Zoller, S. Natesan, V.M. Rivera, T. Clackson,
Delivery of a stringent dimerizer-regulated gene expression system in a single
retroviral vector, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 13221-13226.
22. S.D. Liberles, S.T. Diver, D.J. Austin, S.L. Schreiber, Inducible gene
expression and protein translocation using nontoxic ligands identified by a
mammalian three-hybrid screen, Proc. Natl. Acad. Sci. U.S.A. 1997, 94,
7825-7830.
23. K. Stankunas, J.H. Bayle, J.E. Gestwicki, Y.M. Lin, T.J. Wandless,
G.R. Crabtree, Conditional protein alleles using Knockin mice and a chemical
inducer of dimerization, Mol. Cells 2003, 12, 1615-1624.
24. J.H. Bayle, J.S. Grimley, K. Stankunas, J.E. Gestwicki, T.J. Wandless, G.R.
Crabtree, Rapamycin analogs with differential binding specificity permit
orthogonal control of protein activity, Chem. Biol. 2006, 13, 99-107.
25. R. Pollock, T. Clackson, Dimerizer-regulated gene expression, Curr. Opin.
Biotechnol. 2002, 13, 459-467.
26. W. Yang, T.P. Keenan, L.W. Rozamus, X. Wang, V.M. Rivera, C.T. Rollins,
T. Clackson, D.A. Holt, Regulation of gene expression by synthetic dimerizers
with novel specificity, Bioorg. Med. Chem. Lett. 2003, 13, 3181-3184.
19. T. Clackson, W. Yang, L.W. Rozamus, 27. C.T. Rollins, V.M. Rivera, D.N.
M. Hatada, J.F. Amara, C.T. Rollins, Woolfson, T. Keenan, M. Hatada, S.E.
L.F. Stevenson, S.R. Magari, S.A. Adams, L. J. Andrade, D. Yaeger, M.R.
Wood, N.L. Courage, X. Lu, van Schravendijk, D.A. Holt,
F. Cerasoli Jr, M. Gilman, D.A. Holt, M. Gilman, T. Clackson, A
Redesigning an FKBP-ligand interface ligand-reversible dimerization system
to generate chemical dimerizers with for controlling protein-protein
novel specificity, Proc. Natl. Acad. Sci. interactions, Proc. Natl. Acad. Sci.
U.S.A. 1998, 95,10437-10442. U.S.A. 2000, 97, 7096-7101.
20. J.D. Iuliucci, S.D. Oliver, S . Morley, 28. M. Muzio, B.R. Stockwell, H.R.
C. Ward, I. Ward, D. Dalgarno, Stennicke, G.S. Salvesen, V.M. Dixit,
T. Clackson, H.J. Berger, Intravenous An induced proximity model for
4 Controlling Protein-Protein interactions
248
I caspase-8 activation,J. Biol. Chew. Conditional cell ablation by tight
1998, 273,2926-2930. control of caspase-3 dimerization in
29. K.M. Smith, R.A. Van Etten, transgenic mice, Nat. Biotechnol. 2002,
Activation of c-Abl kinase activity and 20,1234-1239.
transformation by a chemical inducer 39. S.H. Burnett, E.J. Kershen, J. Zhang,
of dimerization, J. Bzol. Chew. 2001, L. Zeng, S.C. Straley, A.M. Kaplan,
276,24372-24379. D.A. Cohen, Conditional macrophage
30. L. Zhan, B. Xiang, S.K. Muthuswamy, ablation in transgenic mice expressing
Controlled activation of ErbBl/ErbB2 a Fas-based suicide gene, J. Leukocyte
heterodimers promote invasion of Biol. 2004, 75, 612-623.
three-dimensional organized epithelia 40. T. Neff, C.A. Blau, Pharmacologically
in an ErbB1-dependent manner: regulated cell therapy, Blood 2001, 97,
implications for progression of 2535-2540.
ErbB2-overexpressingtumors, Cancer 41. L. Jin, H. Zeng, S. Chien, K.G. Otto,
Res. 2006,66,5201-5208. R.E. Richard, D.W. Emery, A.C. Blau,
31. X. Li, D.A. Steeber, M.L.K. Tang, M.A. In vivo selection using a cell-growth
Farrar, R.M. Perlmutter, T.F. Tedder, switch, Nat. Genet. 2000, 26, 64-66.
Regulation of L-selectin-mediated 42. R.E. Richard, R.A. De Claro, J. Yan,
rolling through receptor dimerization, S. Chien, H. Von Recum, J. Morris,
J . Exp. Med. 1998, 188,1385-1390. H.P. Kiem, D.C. Dalgarno,
32. S.L. Ameres, L. Drueppel, S. Heimfeld, T. Clackson, R. Andrews,
K. Pfleiderer, A. Schmidt, W. Hillen, C.A. Blau, Differences in
C. Berens, Inducible DNA-loop F36VMpl-based in vivo selection
formation blocks transcriptional
among large animal models, Mol.
activation by an SV40 enhancer,
Ther. 2004, 10, 730-740.
EMBOJ. 2005, 24,358-367.
43. B.A. Hanks, J. Jiang, R.A. Singh,
33. N. Jullien, F. Sampieri, A. Enjalbert,
W. Song, M. Barry, M.H. Huls, K.M.
J.P. Herman, Regulation of Cre
Slawin, D.M. Spencer, Re-engineered
recombinase by ligand-induced
complementation of inactive CD40 receptor enables potent
fragments, Nucleic Acids Res. 2003, 31, pharmacological activation of
e131. dendritic-cell cancer vaccines in vivo,
34. B.R. Graveley, Small molecule control Nat. Med. 2005, 11, 130-137.
of pre-mRNA splicing, R N A 2005, 11, 44. C. Berger, C.A. Blau, M.L. Huang, J.D.
355-358. Iuliucci, D.C. Dalgarno, J. Gaschet,
35. H.D. Mootz, T.W. Muir, Protein S. Heimfeld, T. Clackson, S.R. Riddell,
splicing triggered by a small molecule, Pharmacologically regulated
J . Am. Chem. SOC.2002, 124(31), Fas-mediated death of adoptively
9044- 9045. transferred T cells in a nonhuman
36. J.J.Kohler, C.R. Bertozzi, Regulating primate model, Blood 2004, 103(4),
cell surface glycosylation by small 1261-1269.
molecule control of enzyme 45. D.C. Thomis, S. Marktel, C. Bonini,
localization, Chew. Biol. 2003, 10, C. Traversari, M. Gilman,
1303-1311. C. Bordignon, T. Clackson, A
37. K.W. Freeman, B.E. Welm, R.D. Fas-based suicide switch in human T
Gangula, J.M. Rosen, M. Ittmann, cells for the treatment of
N.M. Greenberg, D.M. Spencer, graft-versus-host disease, Blood 2001,
Inducible prostate intraepithelial 97,1249-1257.
neoplasia with reversible hyperplasia 46. K.C. Straathof, M.A. Pule, P. Yotnda,
in conditional FG F R1 -expressing G. Dotti, E.F. Vanin, M.K. Brenner,
mice, Cancer Res. 2003, 63,8256-8263. H.E. Heslop, D.M. Spencer, C.M.
38. V.O. Mallet, C. Mitchell, J.E. Guidotti, Rooney, An inducible caspase 9 safety
P. Jaffray, M. Fabre, D. Spencer, switch for T-cell therapy, Blood 2005,
D. Arnoult, A. Kahn, H. Gilgenkrantz, 105,4247-4254.
References I 2 4 9
47. T. Clackson, Regulated gene 54. L.M. Sanftner, V.M. Rivera, B.M.
expression systems, Gene Ther. 2000, Suzuki, L. Feng, L. Berk, S. Zhou, J.R.
7, 120-125. Forsayeth, T. Clackson,
48. H. Chong, A. Ruchatz, T. Clackson, J. Cunningham, Dimerizer regulation
V.M. Rivera, R.G. Vile, A system for of AADC expression and behavioral
small-molecule control of response in AAV-transduced 6-OHDA
conditionally replication-competent lesioned rats, Mol. Ther. 2006, 13,
adenoviral vectors, Mol. Ther. 2002, 5, 167- 174.
195-203. 55. V.M. Rivera, X. Wang, S. Wardwell,
49. R. Pollock, M. Giel, K. Linher, N.L. Courage, A. Volchuk, T. Keenan,
T. Clackson, Regulation of D.A. Holt, M. Gilman, L. Orci,
endogenous gene expression with a F. Cerasoli Jr, J.E. Rothman,
small-molecule dimerizer, Nat. T. Clackson, Regulation of protein
Biotechnol. 2002, 20, 729-733. secretion through controlled
50. X. Ye, V.M. Rivera, P. Zoltick, aggregation in the endoplasmic
F. Cerasoli Jr, M.A. Schnell, G. Gao, reticulum, Science 2000, 287,826-830.
J.V. Hughes, M. Gilman, J.M. Wilson, 56. A. Volchuk, M. Amherdt,
Regulated delivery of therapeutic M. Ravazzola, B. Brugger, V.M.
proteins after in vivo somatic cell gene Rivera, T. Clackson, A. Perrelet, T.H.
transfer, Science 1999, 283, 88-91. Sollner, J.E. Rothman, L. Orci,
51. V.M. Rivera, G.P. Gao, R.L. Grant, Megavesicles implicated in the rapid
M.A. Schnell, P.W. Zoltick, L.W. transport of intracisternal aggregates
Rozamus, T. Clackson, J.M. Wilson, across the Golgi stack, Cell 2000, 102,
Long-term pharmacologically 335- 348.
regulated expression of erythropoietin 57. J.E. Gestwicki, G.R. Crabtree, I.A.
in primates following AAV-mediated Graef, Harnessing chaperones to
gene transfer, Blood 2005, 105, generate small-molecule inhibitors of
1424-1430. amyloid beta aggregation, Science
52. A. Auricchio, G.P. Gao, Q.C. Yu, 2004,306,865-869.
S. Raper, V.M. Rivera, T. Clackson, 58. C.Y. Majmudar, A.K. Mapp, Chemical
J.M. Wilson, Constitutive and approaches to transcriptional
regulated expression of processed regulation, Curr. Opin. Chem. Biol.
insulin following in vivo hepatic gene 2005, 9,467-474.
transfer, Gene Ther. 2002, 9, 963-971. 59. S.A. Qureshi, R.M. Kim, Z. Konteatis,
53. A. Auricchio, V. Rivera, T. Clackson, D.E. Biazzo, H. Motamedi,
E. O’Connor, A. Maguire, R. Rodrigues, J.A. Boice, J.R. Calaycay,
M. Tolentino, J. Bennett, J. Wilson, M.A. Bednarek, P. Griffin, Y.D. Gao,
Pharmacological regulation of protein K. Chapman, D.F. Mark, Mimicry of
expression from adeno-associated viral erythropoietin by a nonpeptide
vectors in the eye, Mol. Ther. 2002, 6, molecule, Proc. Natl. Acad. Sci. U.S.A.
238-242. 1999, 96,12156-12161.
4.3
Protein Secondary Structure Mimetics as Modulators of Protein-Protein and
Protein-Ligand Interactions
Hang Yin and Andrew D. Hamilton
Outlook
The development of low-molecular-weight agents that modulate protein-protein
interactions has been regarded as a difficult goal due to the
relatively large and featureless protein interfacial surfaces involved [1-3].
Conventional methods for identifying inhibitors of protein-protein interactions
generally involve the preparation and screening of large chemical libraries to
discover lead compounds [4]. Despite significant advances in high-throughput
methods, screening a large number of compounds cannot guarantee the
delivery of potential drug candidates with necessary potency and selectivity.
Structure-based design is an area of great current interest and represents a
much-considered alternative to conventional methods. In this chapter, we will
review some representative studies using synthetic agents that mimic protein
secondary structures in drug discovery, in particular, to target protein-protein
and protein-ligand interactions. These studies have expanded the horizon
of drug design, strengthened our understanding of protein-protein and pro-
tein-ligand interactions, and offered an economical alternative to conventional
screening methods.
4.3.1
Introduction
Modulating protein-protein interactions using synthetic compounds is a
highly active field in medicinal chemistry. Conventional targets for small
molecule agents are usually enzyme active sites within the interior of proteins
because: (a) the enzyme recognition sites are usually well-defined clefts or
cavities within the protein, with multiple points of contact often leading to
high affinity, (b) hydrogen bonding, salt bridges, and electrostatic interactions
play critical roles in the recognition of small molecules within the cavities,
so inhibitors containing complementary hydrogen-bond donors or acceptors
often work well, (c) native enzyme substrates can provide good models for the
inhibitor design, and (d) the assay methods to test these enzyme inhibitors are
well established and readily available.
In contrast, the development of synthetic agents that modulate pro-
tein-protein interactions is much more demanding even though it is
of great therapeutic value. In particular, approaches for the disruption
of protein-protein interactions are made more difficult because: (a) large
and mobile protein surfaces are involved in protein-protein interactions,
(b) natural protein-binding partners are usually not good models for small
molecule antagonist design as the binding regions are often discontigu-
ous and relatively featureless, (c) few “druglike” small molecules have been
identified from library screening as effective disrupters of large surface
area contact, and (d) finally, biological assays that evaluate the functional
consequence of disrupting protein-protein interactions are less readily
available.
In spite of these daunting challenges, several successful approaches
have appeared in recent years using small molecule agents to mediate
protein-protein interactions. General methodologies, such as virtual and
fragment screening, tethering techniques, and computer-aided inhibitor
design, have been established and applied in drug discovery. The rational
design of synthetic inhibitors that mimic protein secondary structural domains
is an active area of research in the development of protein-protein disrupters.
Such structural mimetics of α-helices and β-turns or β-strands are anticipated
to maintain the biological functions of their protein progenitors and should
possess biological activity.
4.3.2
History and Development
The rational design of low-molecular-weight inhibitors that disrupt protein-protein
interactions is challenging because of their large interfaces.
Often, as much as 1600 Å² of interfacial area with 10 to 30 amino acid residues
(170 atoms) from each protein are buried upon complex formation [1]. To effectively
compete with such a vast binding surface using low-molecular-weight
agents is a daunting task. Despite this, as early as 1925 it had been recognized
that morphine competes with peptide ligands in binding to protein receptors
[5]. In 1980, Farmer, with great foresight, proposed the use of cyclohexane as
a scaffold to project functionality as a mimetic of protein secondary structures
[6]. Moreover, several groups reported, in the late 1980s, nonpeptide agents
that mimic β-turns or β-strands and this area has recently been summarized by
Fairlie and Loughlin [7].
In a milestone analysis, the energetics for human growth hormone (hGH)
binding to the extracellular domain of its receptor (hGHbp) was studied
[8], leading to the conclusion that the critical binding region of one protein
partner might be reduced to a small domain, and therefore, mimicked by
relatively simple molecules. By conducting alanine scanning of the interfacial
residues, Clackson and Wells found that a small and complementary set of
these residues, the “hot spot”, accounts for most of the free energy change
in the complex formation. They showed that the hGHbp residues Trp104
and Trp169 (Fig. 4.3-1) dominate the binding interface, with each donating
over -4.5 kcal mol⁻¹ to a total binding energy of -12.3 kcal mol⁻¹ for the
complex formation. In a similar manner, Asp171, Lys172, and Thr175 of hGH
make substantial contributions to the binding [9]. In contrast, half of the 31
interfacial residues do not make significant contributions.
Fig. 4.3-1 X-ray crystal structure of the hGH (purple)/hGHbp (cyan) complex. Side
chains of the critical amino acid residues (hot spots) are shown in stick representation.
Some of the earliest work on protein surface mimetics came, in the early
1990s, from Hirschmann, Nicolaou, and Smith, who reported a series of
nonpeptide agents that mimic β-strands and β-turns. These compounds were
used to develop inhibitors of several protein targets, such as HIV protease
and somatostatin (SRIF) receptors [10,11]. In an early example of synthetic
mimics of α-helices, Horwell et al. showed that 1,6-disubstituted indanes
present functionalities in a similar spatial arrangement to the i and i + 1
residues of an α-helix [12]. However, these mimics do not cover a surface
area large enough to adequately represent an α-helix. In an attempt
to improve on this, Kahne and coworkers have reported an α-helix
mimic, based on an oligosaccharide scaffold, which binds the minor groove
of DNA with selectivity over RNA [13]. Similarly, Hamilton et al. have recently
reported terphenyl, oligoamide, and terephthalamide derivatives as structural
and functional mimics of extended regions of α-helices and have confirmed
their binding to a series of protein targets [14-16].
Several reviews have provided insights into the key issues involved in
identifying disrupters of protein-protein interactions. Stites has presented
a thorough discussion on the thermodynamic aspects of protein-protein
association and the relative importance of enthalpy, entropy, and the heat
capacity effects in stabilizing complexation [1]. Cochran has summarized the
early development of synthetic antagonists of protein-protein interactions and
a number of recent reviews have brought the field up to date [1,3,4,17]. Most
recently, Hamilton et al. have discussed the strategies for designing synthetic
agents to target protein-protein interactions [18].
4.3.3
Conventional drug discovery often starts by screening a large and diverse chem-
ical library, from which lead compounds can be identified using biochemical
and cell-based evaluation methods. The subsequent steps involve an iterative
loop of structure determination, modeling, and lead optimization. In many
cases, millions of compounds in the preliminary screening, dozens of high-
resolution X-ray structures of a drug target, as well as months of collaborative
research are necessary to achieve the potency, selectivity, and pharmacokinetic
and toxicological properties required of a preclinical drug candidate.
Rational inhibitor design offers a compelling alternative for the identification
of protein-protein disrupters as it is based on a structural knowledge of the
interface. In particular, synthetic scaffolds that mimic the key elements of a
protein surface can potentially lead to small molecules with the full activity of
a protein domain, a fraction of the molecular weight, and no peptide bonds.
Furthermore, lead compounds derived from rational design can be readily
optimized by structure-activity relationship (SAR) studies.
In general, structure-based drug design treats the backbone of the protein as
a relatively rigid entity. Once the structure of a complex of the protein with a
representative ligand has been solved experimentally, it can be used as a valid
template, onto which atoms or functional groups can be added to the ligand
if free space is available within the binding pocket. In reality, protein side
chains within the binding pocket may move to accommodate a ligand and, in
some cases, there may even be limited movement of the polypeptide backbone.
Moreover, bound solvent may define the surface of the binding pocket, rather
than the protein itself, and thus limit the space available for the addition of
substituents.
Before designing small molecule agents that target certain protein-protein
interfaces, it is helpful to consider the characteristics of a general protein-protein
complex. The association constant, which is determined by
the free energy difference (ΔG) between the associated and unassociated
states of the proteins, is the parameter of the utmost importance since it
determines at what concentrations the protein complex is formed. However,
the changes in enthalpy, entropy, and heat capacity all provide useful insights
into the nature of the complexation and the interacting sites. In his review,
Stites listed the thermodynamic characteristics for 43 protein-protein, and 26
protein-peptide interactions, most of which were determined by isothermal
titration calorimetry. The range of ΔG is -7.0 to -17.2 kcal mol⁻¹ for protein-protein
interactions and -5.3 to -11.7 kcal mol⁻¹ for protein-peptide
interactions. The ranges of ΔH and ΔS are +12.6 to -66.7 kcal mol⁻¹ and
+78.6 to -188.4 cal mol⁻¹ K⁻¹ for protein-protein interactions and +19.9
to -41.9 kcal mol⁻¹ and +95.7 to -109 cal mol⁻¹ K⁻¹ for protein-peptide
interactions. The values of heat capacity (ΔCp), which can be correlated
to the amount of polar and nonpolar surface areas buried upon complex
formation, range from 2 to -767 and -100 to -1200 cal mol⁻¹ K⁻¹ for
protein-protein and protein-peptide interactions, respectively. The average
ΔG value for protein-protein interactions
is -10.40 kcal mol⁻¹ with a standard deviation of 2.49 kcal mol⁻¹.
The average ΔH value is -8.60 ± 13.63 kcal mol⁻¹, and that of ΔS is
6.12 ± 43.68 cal mol⁻¹ K⁻¹. Protein-protein interactions have an average ΔCp
of -333 ± 202 cal mol⁻¹ K⁻¹. The most important conclusion to be drawn
from this analysis is that the thermodynamic driving force for protein-protein
interactions is highly variable, ranging from strongly enthalpically to strongly
entropically driven.
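The link between ΔG and the concentration at which a complex forms can be made concrete with the standard relations ΔG = -RT ln Ka (so that Kd = exp(ΔG/RT)) and ΔG = ΔH - TΔS. The following is a minimal Python sketch that evaluates them for the average protein-protein values quoted in this paragraph. The temperature of 298.15 K, the 1 M standard state, and the function names are assumptions made for illustration; they are not taken from Stites' review.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_ASSUMED = 298.15    # assumed temperature in K (not stated in the text)


def dissociation_constant(delta_g_kcal, temp_k=T_ASSUMED):
    """Kd in mol/L (1 M standard state) from a binding free energy in kcal/mol."""
    # delta_g = -RT ln(Ka) and Kd = 1/Ka, so Kd = exp(delta_g / RT)
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def free_energy(delta_h_kcal, delta_s_cal, temp_k=T_ASSUMED):
    """delta_g = delta_h - T * delta_s, with delta_s given in cal mol^-1 K^-1."""
    return delta_h_kcal - temp_k * delta_s_cal / 1000.0


if __name__ == "__main__":
    # Average values for protein-protein complexes quoted in the text
    print(f"Kd for dG = -10.40 kcal/mol: {dissociation_constant(-10.40):.1e} M")
    print(f"dG from dH = -8.60, dS = 6.12: {free_energy(-8.60, 6.12):.2f} kcal/mol")
```

Under these assumptions the average ΔG corresponds to a dissociation constant in the low tens of nanomolar, and ΔH - TΔS computed from the average enthalpy and entropy agrees with the quoted average ΔG to within a few hundredths of a kcal mol⁻¹.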
Stites also concluded that hydrophobic interactions generally provide the
key contact forces for protein-protein complexation though other alternatives,
such as electrostatic effects can also play a dominant role [19]. The association
of proteins generally follows a two-step mechanism, with the first being a
diffusion-controlled association resulting in a loose complex and the second
involving specific docking of complementary surfaces that yields the high
affinity complex [20]. A common feature of associating proteins is that
the on-rate for interaction shows strong dependence on ionic strength,
whereas the off-rate is relatively insensitive. The study of the association
of bacterial ribonuclease barnase and its polypeptide inhibitor barstar,
which is driven by strong complementary electrostatic forces, shed light
on the influence of electrostatic forces on the structure of the activated
complex [21]. Fersht and Schreiber probed the interaction of barnase and
barstar at various ionic strengths and found that at low ionic strength, all
proximal charge pairs form contacts. Increasing the ionic strength, which
masks the electrostatic forces, induced a partial loss of the charge-charge
interactions. However, the barnase-barstar interface still aligned itself
correctly [22].
Extensive work has been done on the amino acid composition at
protein-protein interfaces, which provides useful information for inhibitor
design. Bogan et al. examined 2325 alanine mutants for which changes in
free energy of binding have been measured and showed that the energetic
contributions of the individual side chains did not correlate with their buried
surfaces [23]. In several cases, a set of energetically unimportant contacts
surrounded the hot spot, seeming to occlude bulk solvent in the manner of an
O-ring. Certain amino acid residues, in particular, tryptophan (21%), arginine
(13%), and tyrosine (12%), appear more frequently in hot spots (residues that contribute
more than 2 kcal mol⁻¹ to a binding interaction) than others, such as leucine,
methionine, serine, threonine, and valine, each of which account for less than
3% of the overall hot spot residues [24]. Tryptophan, arginine, and tyrosine
residues are also found more frequently in the protein interfaces, with 3.91-,
2.47-, and 2.29-fold enrichment, respectively, in hot spot areas. An enrichment
of tyrosine and tryptophan as well as a discrimination against valine, isoleucine,
and leucine has also been reported in antibody complementarity-determining
region (CDR) sequences [25]. Padlan et al. proposed that the enrichment
of these aromatic amino acid residues is due to their ability to participate
in hydrophobic contacts without large entropic penalty, as they have fewer
rotatable bonds.
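The hot spot criterion used here (an alanine substitution that costs more than 2 kcal mol⁻¹ of binding free energy) can be sketched in a few lines of Python. The example converts a ratio of dissociation constants into ΔΔG = RT ln(Kd,mutant/Kd,wild type) and flags residues above the cutoff. The residue labels and Kd values are invented purely for illustration, and the temperature of 298.15 K is an assumption; none of these numbers come from the alanine-scanning studies cited above.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
HOT_SPOT_CUTOFF = 2.0  # kcal/mol, the threshold quoted in the text


def ddg_from_kd(kd_mutant, kd_wild_type, temp_k=298.15):
    """Change in binding free energy (kcal/mol) caused by an alanine substitution."""
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wild_type)


def hot_spots(ddg_by_residue, cutoff=HOT_SPOT_CUTOFF):
    """Return, in sorted order, residues whose mutation costs more than `cutoff`."""
    return sorted(res for res, ddg in ddg_by_residue.items() if ddg > cutoff)


if __name__ == "__main__":
    # Hypothetical Kd values in mol/L, invented purely for illustration
    kd_wt = 1.0e-9
    kd_mutants = {"Trp-A": 2.5e-6, "Arg-B": 8.0e-8, "Ser-C": 1.5e-9, "Leu-D": 4.0e-9}
    ddg = {res: ddg_from_kd(kd, kd_wt) for res, kd in kd_mutants.items()}
    for res, value in ddg.items():
        print(f"{res}: ddG = {value:+.2f} kcal/mol")
    print("hot spots:", hot_spots(ddg))
```

At the assumed temperature, the 2 kcal mol⁻¹ cutoff corresponds to roughly a 30-fold loss in affinity on mutation to alanine.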
Recent developments in bioinformatics have provided insights into the
analysis of protein-protein interfaces and have aided the detection of hot
spots. A wealth of data on alanine mutations in various protein-protein
complexes is available (www.asedb.org) and has assisted in the design of
small molecules to modulate their interactions [26]. Table 4.3-1 lists the
protein-protein interactions whose alanine scanning energetic data are
currently available in the ASEdb database. Alternatives for detecting hot
spot regions include computational tools that generate combinatorial libraries
of functional epitopes and identify recurring sets of residues in the epitope [27].
The spatial arrangement of key structural motifs at protein-protein interfaces
has been efficiently detected by this method. Ben-Tal and coworkers have
developed an algorithm, Rate4Site, and a web-server Consurf (consurf.tau.ac.il)
[28] for identification of functional interfaces based on the evolutionary
relations among homologous proteins, as reflected in phylogenetic trees [29].
Using the tree topology and branch lengths corresponding to the evolutionary
relationships between two proteins, the algorithm accurately identified a
homodimer interface of a hypothetical protein Mj0577 that was also detected
in an X-ray crystallographic analysis.
4.3.4
A major problem with peptide-based modulators of protein-protein interactions
is that they are vulnerable to proteolytic cleavage and thus have poor
bioavailability. Different strategies have been used to overcome this problem.
For example, peptides in which L-amino acids at potential protease cleavage
sites are replaced by D-amino acids or constrained analogs have improved half-lives
in cellular assays. However, these methods have serious limitations as the
unnatural amino acids and conformational constraints sometimes interfere
with the complexation process. Furthermore, it has been suggested that the
poor oral bioavailability of peptides is not solely due to their susceptibility to
cleavage by peptidases as the peptide bond itself contributes, at least partially,
to the problem [30]. Such limitations make the development of nonpeptide
agents that mediate protein-protein interactions a matter of much interest
and therapeutic value.

Table 4.3-1 Protein-protein interactions currently listed in the ASEdb database

Ab hu4D5-5/p185HER2               IL-2 (human)/IL-2R
Agitoxin/shaker                   IL-2 (murine)/IL-2RB
Angiogenin/RNase inhibitor        IL-4/IL4-BP
Barnase/barstar                   IL-6/IL-6R
bFGF/FGFR1b                       IL-6/MAb8
BMP type IA receptor/BMP-4        IL-8/IL-8R
Bovine profilin I/rabbit actin    IL-8/IL-8RA
BPTI/chymotrypsin                 IL4(IL4bp)/γ-c
BPTI/trypsin                      Im2/E9 DNase
CD2/CD48                          κ-Conotoxin PVIIA/shaker K+ channel
CD4/gp120                         Kistrin/GP IIb-IIIa
Charybdotoxin/shaker              MAb A4.6.1/VEGF
Complement C1q/IgG2b              mIL-2/mIL-2Ra
D1.3/E5.2                         NmmI/nAChR
D1.3/HEL                          NT-3/p75
Dendrotoxin K/K+ channels         NT-3/trkC
Erabutoxin A/AChR                 Protein A/IgG1
Erabutoxin A/Ma2-3                RNase inhibitor/angiogenin
Factor VII/tissue factor          RNase inhibitor/RNase A
HEL/HYHEL-10                      SCTCRVb/SEC3-1A4
hG-CSF/hG-CSFbp                   SEC3/TCR Vb
hGH/MAb (1-21)                    Shaker/agitoxin
hGHbp/MAb12B8                     Shaker/CTX
hGHbp/MAb13E1                     sHIR/insulin
hGHbp/MAb263                      Tissue factor/Fab 5G9
hGHbp/MAb3B7                      Tissue factor/factor VIIa
hGHbp/MAb3D9                      VEGF/KDR
hIL-18 binding protein/hIL-18     VEGF/MAb 3.2E3.1.1
HYHEL-10/HEL                      VEGF/MAb A4.6.1
IGF-1/IGF-1R                      yCaM/calcineurin
4.3.4.1 Peptidomimetics of β-Turns/Strands

Hirschmann, Nicolaou, and Smith have pioneered the development of
synthetic agents that mimic β-strand and β-turn conformations. As an early
example, Hirschmann and Nicolaou reported a mimetic of the cyclic peptide
hormone somatostatin (SRIF) using a β-D-glucose scaffold [10]. SRIF is a
cyclic tetradecapeptide that inhibits the release of growth hormone (GH) [31].
Fig. 4.3-2 Structure of β-D-glucose-based peptidomimetics of SRIF.
Previous studies had shown that cyclic hexapeptide 1 was a potent agonist
of SRIF [32], due to the dipeptide motif of Phe-Pro, enforcing a β-turn
conformation and the correct positioning of the remaining four side chains. In
addition, the aromatic side chains of the Phe-Pro dipeptide provide favorable
hydrophobic interactions with the SRIF receptor.
On the basis of this peptide agonist of SRIF, compound 2 was designed with
the critical side chains of 1 projected on a β-D-glucose scaffold (Fig. 4.3-2).
β-D-Glucose is a good design for a β-turn mimetic because: (a) the pyran
ring imposes an appropriate projection of the side chains, and (b) the glucose
backbone is relatively rigid. The shape and substitution pattern of β-D-glucose
were found to best present the Trp, Lys, and Phe side chains. A radiolabeled
binding assay showed that 2 completely displaced a peptide ligand, ¹²⁵I-CGP
23996, from the SRIF receptor on membranes from AtT-20 cell lines with an
IC50 of 1.9 µM. Binding studies using cerebral cortex and pituitary membrane
cells showed similar results. Taken together, this study supported the validity
of using nonpeptide scaffolds to mimic protein secondary structures that are
of biological interest.
In a follow-up study, Smith and Hirschmann have elaborated a pyrrolinone-based
mimetic of the β-strand/β-sheet conformations [33, 34], in which
all of the key recognition features (i.e., side chains and hydrogen-bond
donors/acceptors) are faithfully represented within a low-molecular-weight
nonpeptide analog 4 (Fig. 4.3-3). This design has been applied to the
development of antagonists of HIV-1 protease and more recently to mimics of
major histocompatibility complex (MHC) class II protein substrate [34, 35].
Computational modeling using the Macromodel program suggested that
3,5-linked pyrrolin-4-ones can structurally mimic a short peptide in a
β-strand conformation. In a computer-simulated conformational search, the
pyrrolinone rings fix the dihedral angles analogous to φ, ψ, and ω in a
peptide (Fig. 4.3-3). This favored conformation is due to the hindrance of the
gauche interaction between the side chain substituents and their neighboring
pyrrolinone rings. The side chains appended at the 5-positions of pyrrolinone
take up an orientation axial to the heterocyclic ring. Comparison of peptide
3 with the mimetic 4 suggested that the disposition of the vinylogous amide
carbonyls in 4 closely reproduces the orientation of the peptide carbonyls in
3. By this means, compound 4 maintains the hydrogen-bond acceptors of the
native β-strand using the vinylogous amide nitrogen. Despite the presence
of the vinylogous substitution, pyrrolinone -NH groups are comparable to
amide groups in basicity and may further stabilize the requisite β-strand and
β-sheet conformations through intra- and intermolecular hydrogen bonding,
respectively.
Fig. 4.3-3 Polypyrrolinone-based β-turn peptidomimetic 4.
As a test of this β-strand mimetic design, Hirschmann and Smith selected a
fragment of equine angiotensinogen, tetrapeptide methyl ester 3, as the initial
target. Least-squares comparison showed good spatial agreement between
the optimized conformation of 4 and the X-ray crystal structure of 3. The
X-ray crystal structure of 4 confirmed that this mimetic adopts a β-strand
conformation in the solid state. Moreover, the side chain trajectories and carbonyl
orientations showed similar spatial projection with those of the tetrapeptide,
affirming that 4 is a good structural mimetic of 3.
To evaluate the biological applicability of this design, Smith and Hirschmann
have developed HIV-1 protease inhibitors based on the polypyrrolinone
scaffold. Previous studies have shown that many binding interactions are
conserved in the HIV-1 protease/inhibitor complex formation [36]. β-Strand
peptide inhibitors, such as 5 and JG-365 (Ac-Ser-Leu-Asn-Phe-Hea-Pro-Ile-
Val-OMe, Hea = hydroxyethylamine [CH(OH)CH2N]), bind in an active site on
the HIV-1 protease surface with their side chains inserting into hydrophobic
pockets (Fig. 4.3-4).
The inhibitory effects of the pyrrolinone derivatives were evaluated using
enzyme inhibition and cellular activation assays. Compound 6 (Fig. 4.3-5)
showed an IC50 of 10 nM, compared to 0.6 nM for the related peptide
inhibitor 5 (L682,679). However, the synthetic agent 6 showed better cell
transport capacity. In a cellular antiviral assay, 5 and 6 showed CIC95 values
(the concentration that inhibits 95% of virus multiplication in the cellular
cultures) of 6.0 and 1.5 µM, respectively. Smith and Hirschmann proposed
that the improved cellular uptake properties of polypyrrolinones are due to a
reduction in the inhibitor solvation. Solvation is an impediment to transport
because extraction of a molecule into a lipid bilayer from an aqueous phase is
thermodynamically disfavored [37]. The polypyrrolinone compounds can form
intramolecular hydrogen bonds, which reduce the number of solvating water
molecules by two and favor the entry of the mimetics into the cell membrane.
Fig. 4.3-4 Complex of the HIV-1 protease and β-strand peptide inhibitor JG-365.
Fig. 4.3-5 HIV-1 protease inhibitors 5 (L682,679) and 6.
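The trade-off between enzymatic and cellular potency can be illustrated with a short calculation. The sketch below takes the values for inhibitors 5 and 6 as IC50 = 0.6 nM and 10 nM and CIC95 = 6.0 µM and 1.5 µM, respectively, and computes the ratio of cellular to enzymatic potency for each compound. The ratio is only a crude, qualitative indicator introduced here for illustration (IC50 and CIC95 measure different endpoints); it is not a metric used by Smith and Hirschmann, and the function name is hypothetical.

```python
def cell_to_enzyme_ratio(cic95_micromolar, ic50_nanomolar):
    """Ratio of cellular CIC95 to enzymatic IC50 after converting both to mol/L."""
    cic95_molar = cic95_micromolar * 1e-6
    ic50_molar = ic50_nanomolar * 1e-9
    return cic95_molar / ic50_molar


if __name__ == "__main__":
    # (CIC95 in micromolar, IC50 in nanomolar) as read from the text
    inhibitors = {"5 (peptide)": (6.0, 0.6), "6 (polypyrrolinone)": (1.5, 10.0)}
    for name, (cic95, ic50) in inhibitors.items():
        ratio = cell_to_enzyme_ratio(cic95, ic50)
        print(f"{name}: CIC95/IC50 = {ratio:,.0f}")
```

With these inputs, the peptide loses about four orders of magnitude in potency between the enzyme and cellular assays, whereas the polypyrrolinone loses about two. This is consistent with the better cell transport proposed for 6.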
Smith and Hirschmann’s studies opened a new field of using de novo
designed synthetic scaffolds to mimic relatively large protein secondary
structures. While more structural studies, such as X-ray and NMR analyses, are
needed to confirm whether these compounds recognize their protein targets
in the same manner as their peptide models, the concept of using small
molecules to project critical functionalities to target proteins is established.
Although many of the β-strand mimetic designs were used only to modulate
protein-ligand interactions, the potential application of this strategy in other
biological processes is clear.
4.3.4.2 Terphenyl-based Helical Mimetics that Disrupt the Bcl-xL/Bak Interaction
α-Helices are another major protein secondary structure found in nature.
About 40% of all amino acids in natural proteins take up α-helical
conformations. A typical α-helix rises 5.4 Å per turn, or 1.5 Å per residue
(Fig. 4.3-6(a)). The amino acid residues at the i, i + 3, i + 4, and i + 7 positions
are aligned on the same face of the helical backbone and often combine
in the recognition of a complementary surface. α-Helices play key roles in
numerous protein-protein, protein-DNA, and protein-RNA interactions,
making them an attractive target for the design of small molecule agents
that mimic both their structures and functions [38]. In recent years, major
strides have been made in this field, evolving from strategies based on induced
helix stabilization to the recent advent of helix proteomimetics, molecules that
mimic the surface functionalities presented by α-helical secondary structures
[2, 39].
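The same-face claim can be checked with a short calculation. The sketch below is illustrative only: it assumes an idealized, perfectly regular helix built from the two numbers quoted above (5.4 Å per turn and 1.5 Å per residue), and the function names are hypothetical.

```python
# Idealized alpha-helix geometry from the values quoted in the text.
PITCH = 5.4             # rise per turn, in angstroms
RISE_PER_RESIDUE = 1.5  # rise per residue, in angstroms
RESIDUES_PER_TURN = PITCH / RISE_PER_RESIDUE     # about 3.6 residues
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def helix_position(offset):
    """Return (angle in degrees, axial rise in angstroms) of residue i + offset."""
    angle = (offset * DEGREES_PER_RESIDUE) % 360.0
    rise = offset * RISE_PER_RESIDUE
    return angle, rise


def angular_separation(angle_a, angle_b):
    """Smallest angle in degrees between two directions around the helix axis."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    reference_angle, _ = helix_position(0)
    for offset in (0, 3, 4, 7, 11):
        angle, rise = helix_position(offset)
        sep = angular_separation(angle, reference_angle)
        print(f"i + {offset}: {sep:5.1f} deg from residue i, {rise:4.1f} A along the axis")
```

Under these idealized assumptions the i + 3, i + 4, and i + 7 side chains all point within 60° of residue i, which is why they can act together on one face of the helix.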
Hamilton et al. have reported a series of synthetic agents based on a
terphenyl scaffold that mimic the helical region of the Bak peptide. The
terphenyl derivatives (Fig. 4.3-6(b)), substituted with alkyl or aryl side chains
at the 3,2',2"-positions, project these side chains in a fashion similar to the
arrangement of the i, i + 4, and i + 7 residues on an α-helical backbone.
Fig. 4.3-6 (a) Surface displacement of residues on an α-helix surface.
(b) Terphenyl-based α-helical mimetics.
To test this general design, Hamilton and coworkers have developed α-helix
mimetics of the Bak protein that binds into a shallow hydrophobic cleft on
the surface of Bcl-xL. Bak and Bcl-xL are members of the B-cell lymphoma-2
(Bcl-2) protein family, which plays an important role in the apoptotic
pathway [40]. This protein family can be divided into two subgroups: the
proapoptotic and the prosurvival subfamilies. The proapoptotic subfamily
proteins, such as Bak, Bad, and Bax, share a minimal helical homologous
region, the BH3 domain, which is responsible for mediation of apoptosis
through heterodimerization with the prosurvival Bcl-2 family members [41].
Overexpression of the prosurvival proteins, such as Bcl-2 and Bcl-xL, can
inhibit the potency of many currently available anticancer drugs by blocking
the apoptotic pathway [42].
A current strategy for modulating apoptosis is to target the Bak-recognition
site on Bcl-xL and thereby disrupt the protein-protein contact. The structure
of the Bcl-xL/Bak complex determined by NMR spectroscopy showed that a
helical region of Bak (amino acids 72 to 87) binds to a hydrophobic cleft on
the surface of Bcl-xL (Kd = 340 nM) [43]. Furthermore, the crucial residues for
binding, shown by alanine scanning, are Val74, Leu78, Ile81, and Ile85, which
project at the i, i + 4, i + 7, and i + 11 positions along one face of the Bak helix.
The design of agents that directly mimic the death-promoting BH3 domain
of the proapoptotic subfamily of Bcl-2 proteins is of much current interest as
they can potentially provide drugs that control apoptosis [44].
A series of terphenyl derivatives with different side chains was prepared
as structural mimetics of the Bak peptide using a modular and convergent
synthesis. We used a fluorescence polarization assay to monitor the interaction
between the inhibitor and the target protein. Some of the structure-activity
results are listed in Table 4.3-2. Terphenyl 7, with two carboxyl groups and
a substituent sequence of isobutyl, 1-naphthylmethylene, and isobutyl groups in
the 3,2',2"-positions, was identified as a potent inhibitor (Kd = 114 nM) of
Bak/Bcl-xL complexation. The binding specificity was confirmed by scrambling
the sequence of the substituents, as in isomer 12, which caused an approximately
25-fold loss of affinity. The importance of the side chains was confirmed by
terphenyl 13, which lacks the ability to disrupt Bak binding to Bcl-xL, ruling
out the possibility of nonspecific binding by the terphenyl backbone.
¹⁵N-HSQC NMR experiments with 7 indicated that the terphenyl derivatives
target the same hydrophobic cleft on Bcl-xL as the Bak peptide (shown in blue,
Fig. 4.3-7). Residues A89, L99, L108, T109, S110, Q111, I114, Q125, L130,
F131, W137, G138, R139, I140, A142, S145, and F146 (shown in magenta
in Fig. 4.3-7) showed significant chemical shift changes on addition of the
synthetic inhibitor 7. Some other residues, including G94, L112, S122, G134,
K157, E158, and M159 (shown in yellow in Fig. 4.3-7) showed moderate
chemical shift changes under the same conditions. All these affected residues
lie near the shallow cleft on the protein surface into which the Bak BH3 helix
binds. The targeted residues V74, L78, and I81 of Bak BH3 are within 4 Å
of residues F97, R102, L108, L130, I140, A142, and F146 of Bcl-xL,
most of which showed significant chemical shift changes (F97 overlapped with
NS), confirming that 7 and Bak BH3 target the same area on the exterior surface
of Bcl-xL. Overlay of 7 and the Bak BH3 peptide suggested that the terphenyl
indeed adopts a staggered conformation, mimicking the cylindrical shape of
the helix with the substituents making a series of hydrophobic contacts with
the protein surface.
Table 4.3-2 Results of the fluorescence polarization assay for the
terphenyl-based Bak mimetics.
Bn -iBu 11 2.73
iBu iBu 12 2.70
H H H 13 >30.0
Polarization measurements were recorded on titration of
inhibitors at varying concentrations in a solution of 15 nM
labeled Bak peptide (Fl-GQVGRQLAIIGDDINR-CONH2) and
184 nM Bcl-xL (25 °C, 1.0 mM PBS, pH 7.4)
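The assay conditions quoted for Table 4.3-2 (15 nM labeled peptide, 184 nM Bcl-xL) can be related to the fraction of labeled peptide that is bound before any inhibitor is added. The following is a minimal sketch using the standard quadratic solution for 1:1 binding. It assumes, purely for illustration, that the fluorescein-labeled peptide binds with the same Kd (340 nM) reported above for the Bak peptide; the function names are hypothetical.

```python
import math


def complex_concentration(protein_total, ligand_total, kd):
    """Concentration of a 1:1 complex from the exact quadratic binding equation.

    All three arguments must use the same concentration unit (nM here).
    """
    total = protein_total + ligand_total + kd
    discriminant = total * total - 4.0 * protein_total * ligand_total
    return (total - math.sqrt(discriminant)) / 2.0


def fraction_tracer_bound(protein_total, tracer_total, kd):
    """Fraction of the labeled peptide (tracer) that is bound to the protein."""
    return complex_concentration(protein_total, tracer_total, kd) / tracer_total


if __name__ == "__main__":
    # Assumed inputs: 184 nM Bcl-xL, 15 nM labeled peptide, Kd = 340 nM.
    frac = fraction_tracer_bound(184.0, 15.0, 340.0)
    print(f"fraction of labeled peptide bound: {frac:.2f}")
```

With these assumed inputs roughly one third of the labeled peptide is bound at the start of the titration; a competing inhibitor lowers this fraction and, with it, the measured polarization.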
Further studies using human embryonic kidney 293 (HEK293) cells have
shown that terphenyl 7 disrupts Bak/Bcl-xL binding in whole cells [16].
HEK293 cells transfected with both HA-Bcl-xL and flag-Bax, an analog of Bak,
were treated with terphenyl derivatives. After 24-h incubation, the cells were
harvested and lysed. HA-tagged Bcl-xL was collected via immunoprecipitation
with HA antibody. The resulting mixture was loaded on to a 12.5% SDS-PAGE
gel, and proteins transferred to nitrocellulose for western blot analysis. The
presence of Bax protein was probed with antiflag antibody. The inhibitory
potencies of the terphenyl compounds were determined by measuring the
relative intensity of the Bax protein bound to Bcl-xL. We found that 51% of the
Bak/Bcl-xL interaction was disrupted in HEK293 cells treated with terphenyl
7, indicating that certain terphenyls are competitive with the full-length
protein-protein interaction in a cellular environment.
Fig. 4.3-7 Results of the ¹⁵N-HSQC and computational docking experiments of 7
binding to Bcl-xL. The residues that showed significant chemical shift changes in the
presence of 7 are shown in yellow. The highest ranked binding mode of inhibitor 7
predicted from a computational docking simulation (Autodock 3.0) has been
superimposed on the helical Bak BH3 domain for comparison.
A critical issue in the design of small molecule α-helix mimetics is
the selectivity of these compounds among different helix-binding proteins,
as lack of specificity might lead to damage to normal cells [45]. Nature
frequently uses secondary structure modules, such as α-helices, to recognize
different protein targets and achieves high specificity through spatial and
charge complementarity [17]. As an example, the tumor suppressor protein
p53 selectively binds, with its helical N-terminal domain, to the regulatory
protein HDM2 over other oncogenic proteins, such as Bcl-xL and Bcl-2,
which both complex with the α-helical Bak BH3 domain [46]. Comparison
of terphenyl isomers 7 and 10, with 1- and 2-naphthylmethylene side chains,
respectively, on the middle phenyl rings, showed that terphenyl derivatives
can selectively bind to different helix-binding proteins (Table 4.3-3) [15, 16].
Terphenyl 7 binds to Bcl-xL more than 10-fold more strongly than 10, whereas
terphenyl 10 specifically disrupts the HDM2/p53 complexation, possibly due
to the deeper pocket in HDM2 for W23 at the i + 4 position compared to
the L78 pocket of Bcl-xL or Bcl-2. These results confirm the generality of
the terphenyl scaffold as a mimic of the side chain induced selectivity of
α-helices and provide a useful tool for the rational design of protein-binding
agents.
Table 4.3-3 Comparison of terphenyl derivatives 7 and 10 in
inhibition of different protein-protein complexes
Ki (μM)    HDM2/p53    Bcl-xL/Bak    Bcl-2/Bak
7          25.7        0.114         0.121
10         0.182       2.50          15.0
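The selectivity described for terphenyls 7 and 10 can be expressed as simple ratios of the Ki values in Table 4.3-3. The sketch below only re-uses the tabulated numbers; the dictionary layout and the function name are illustrative choices, not part of the original study.

```python
# Ki values in micromolar, copied from Table 4.3-3.
KI_MICROMOLAR = {
    "7": {"HDM2/p53": 25.7, "Bcl-xL/Bak": 0.114, "Bcl-2/Bak": 0.121},
    "10": {"HDM2/p53": 0.182, "Bcl-xL/Bak": 2.50, "Bcl-2/Bak": 15.0},
}


def fold_selectivity(compound, preferred, other):
    """Ratio Ki(other) / Ki(preferred); values above 1 favor `preferred`."""
    ki = KI_MICROMOLAR[compound]
    return ki[other] / ki[preferred]


if __name__ == "__main__":
    print(f"7:  Bcl-xL/Bak over HDM2/p53 = {fold_selectivity('7', 'Bcl-xL/Bak', 'HDM2/p53'):.0f}-fold")
    print(f"10: HDM2/p53 over Bcl-xL/Bak = {fold_selectivity('10', 'HDM2/p53', 'Bcl-xL/Bak'):.0f}-fold")
    print(f"10: HDM2/p53 over Bcl-2/Bak  = {fold_selectivity('10', 'HDM2/p53', 'Bcl-2/Bak'):.0f}-fold")
```

On these numbers, 7 prefers the Bcl-xL/Bak complex over HDM2/p53 by more than 200-fold, while 10 shows the opposite preference by more than 10-fold.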
4.3.5
Future Developments
The future development of structure-based drug design depends heavily on the
progress of computer techniques. In a recent review, Jorgensen has pointed out
that despite widespread suspicion, computer-aided drug design has become
a useful tool in generating focused libraries [47]. The recently developed
computer program BOMB is among the first software packages that can assist
in the design of inhibitors for a specific protein target, from scratch, on the
basis of the available structural information. Even though these approaches are
in their infancy, when more parameters, such as solvent effects, ionic strength,
and surface mobility, are taken into account the accuracy and credibility of the
methods will be improved. It is unlikely that dramatic improvements in current
sampling algorithms and scoring functions will occur in the near future; thus,
advancement of the field will likely come from better understanding of how to
apply existing technologies.
The techniques applied to the identification of potential inhibitors of
protein-protein interactions have been another evolving area. NMR-based
screening methods that focus either on the protein receptor or the ligand
have been used in pharmaceutical research, although they can still be lengthy
processes [48]. Structure-based NMR screening and fragment combination
strategies are particularly effective for discovering novel leads that target
a different area on a protein surface. Furthermore, Mrksich et al. have
described a strategy using matrix-assisted laser-desorption ionization time-
of-flight (MALDI-TOF) mass spectrometry (MS) to screen large libraries of
low-molecular-weight compounds [49]. The major advantage of MS is that
it avoids the requirement of analyte labeling. Mrksich and coworkers used
self-assembled monolayers (SAM) that are engineered to measure enzyme
activities and MALDI-TOF to detect lead compounds. Currently, this approach
has been used only in identifying small molecule agents that inhibit enzyme
activity. MS will certainly be applied more broadly to detect inhibitors for
protein-protein interactions as an efficient alternative to the conventional
fluorescence-based screening methods.
Fragment-based lead discovery has drawn much attention as a novel
discovery strategy. By screening a relatively small number of fragment units,
functional groups can be found to recognize subpockets within an active
site. This approach is especially useful with protein targets that have more
than one binding pocket, each of which might contribute separately to the
complex formation. Furthermore, smaller molecules offer better starting
points for drug discovery because they can be readily assembled into larger
compounds. Wells et al. have reported a powerful technique for identifying
antagonists of protein-protein interactions with only medium to low potency
(micromolar to millimolar) by using a dynamically interconverting
thiol-tethered library [50]. This method has a great advantage in searching for
inhibitors that target a mobile protein surface. Kodadek et al. have developed
a general methodology that is effective in searching for a second binding site
on the protein surface. A library of combinatorial oligomeric compounds
is attached to a low-affinity anchor compound that can recognize the
target protein. The resulting library is then screened under conditions too
demanding for the lead to support robust binding to the protein target.
Using MDM2 as a model, they have identified relatively potent chimeric
compounds that simultaneously recognize multiple binding sites on the
protein surface [51].
4.3.6
Conclusion
Several examples of rationally designed protein secondary structure mimetics
that modulate protein-protein and protein-ligand interactions have appeared
in recent years. These studies showed that the strategy of mimicking protein
secondary structures in small molecules provides an alternative to conventional
library screening in drug discovery. To further accelerate progress in this area,
we need a more in-depth understanding of receptor-ligand complexation,
which requires a collaborative effort in organic synthesis, structural analysis,
computational simulation, and biological evaluation.
Acknowledgments
We thank the National Institutes of Health (GM69850) for financial support
of this work.
References
1. W.E. Stites, Protein-protein interactions: interface structure, binding thermodynamics, and mutational analysis, Chem. Rev. 1997, 97, 1233-1250.
2. M.W. Peczuh, A.D. Hamilton, Peptide and protein recognition by designed molecules, Chem. Rev. 2000, 100, 2479-2493.
3. P.L. Toogood, Inhibition of protein-protein association by small molecules: approaches and progress, J. Med. Chem. 2002, 45, 1543-1558.
4. A.G. Cochran, Antagonists of protein-protein interactions, Chem. Biol. 2000, 7, R85-R94.
5. J.M. Gulland, R. Robinson, The constitution of codeine and thebaine, Mem. Proc. Manch. Lit. Phil. Soc. 1925, 69, 79.
6. P.S. Farmer, in Drug Design, (Ed.: E.J. Ariens), Vol. X, Academic, New York, 1980, pp. 119.
7. W.A. Loughlin, J.D. Tyndall, M.P. Glenn, D.P. Fairlie, Beta-strand mimetics, Chem. Rev. 2004, 104, 6085-6118.
8. T. Clackson, J.A. Wells, A hot-spot of binding-energy in a hormone-receptor interface, Science 1995, 267, 383-386.
9. B.C. Cunningham, J.A. Wells, Comparison of a structural and a functional epitope, J. Mol. Biol. 1993, 234, 554-563.
10. R. Hirschmann, K.C. Nicolaou, S. Pietranico, J. Salvino, E.M. Leahy, P.A. Sprengeler, G. Furst, A.B. Smith, C.D. Strader, M.A. Cascieri, M.R. Candelore, C. Donaldson, W. Vale, L. Maechler, Nonpeptidal peptidomimetics with a beta-D-glucose scaffolding - a partial somatostatin agonist bearing a close structural relationship to a potent, selective substance-P antagonist, J. Am. Chem. Soc. 1992, 114, 9217-9218.
11. A.B. Smith, R. Hirschmann, A. Pasternak, R. Akaishi, M.C. Guzman, D.R. Jones, T.P. Keenan, P.A. Sprengeler, P.L. Darke, E.A. Emini, M.K. Holloway, W.A. Schleif, Design and synthesis of peptidomimetic inhibitors of Hiv-1 protease and renin - evidence for improved transport, J. Med. Chem. 1994, 37, 215-218.
12. D. Horwell, M. Pritchard, J. Raphy, G. Ratcliffe, 'Targeted' molecular diversity: design and development of non-peptide antagonists for cholecystokinin and tachykinin receptors, Immunopharmacology 1996, 33, 68-72; D.C. Horwell, W. Howson, G.S. Ratcliffe, H.M.G. Willems, The design of dipeptide helical mimetics: the synthesis, tachykinin receptor affinity and conformational analysis of 1,1,6-trisubstituted indanes, Bioorg. Med. Chem. 1996, 4, 33-42.
13. H. Xuereb, M. Maletic, J. Gildersleeve, I. Pelczer, D. Kahne, Design of an oligosaccharide scaffold that binds in the minor groove of DNA, J. Am. Chem. Soc. 2000, 122, 1883-1890.
14. B.P. Orner, J.T. Ernst, A.D. Hamilton, Toward proteomimetics: terphenyl derivatives as structural and functional mimics of extended regions of an alpha-helix, J. Am. Chem. Soc. 2001, 123, 5382-5383; J.T. Ernst, O. Kutzki, A.K. Debnath, S. Jiang, H. Lu, A.D. Hamilton, Design of a protein surface antagonist based on alpha-helix mimicry: inhibition of gp41 assembly and viral fusion, Angew. Chem. Int. Ed. Engl. 2001, 41, 278-282; O. Kutzki, H.S. Park, J.T. Ernst, B.P. Orner, H. Yin, A.D. Hamilton, Development of a potent Bcl-X(L) antagonist based on alpha-helix mimicry, J. Am. Chem. Soc. 2002, 124, 11838-11839; J.T. Ernst, J. Becerril, H.S. Park, H. Yin, A.D. Hamilton, Design and application of an alpha-helix-mimetic scaffold based on an oligoamide-foldamer strategy: antagonism of the bak Bh3/Bcl-Xl complex, Angew. Chem. Int. Ed. Engl. 2003, 42, 535-550; H. Yin, A.D. Hamilton, Terephthalamide derivatives as mimetics of the helical region of bak peptide target Bcl-Xl protein, Bioorg. Med. Chem. Lett. 2004, 14, 1375-1379; H. Yin, G.I. Lee, K.A. Sedey, J.M. Rodriguez, H.G. Wang, S.M. Sebti, A.D. Hamilton, Terephthalamide derivatives as mimetics of helical peptides: disruption of the Bcl-Xl/Bak interaction, J. Am. Chem. Soc. 2005, 127, in press.
15. H. Yin, G.I. Lee, H.S. Park, G.A. Payne, J.M. Rodriguez, S.M. Sebti, A.D. Hamilton, Terphenyl-based helical mimetics that disrupt the P53/Hdm2 interaction, Angew. Chem. Int. Ed. Engl. 2005, 44, 2704-2707.
16. H. Yin, G.I. Lee, K.A. Sedey, O. Kutzki, H.S. Park, B.P. Orner, J.T. Ernst, H.G. Wang, S.M. Sebti, A.D. Hamilton, Terphenyl-based bak-Bh3 alpha-helical proteomimetics as low-molecular-weight antagonists of Bcl-Xl, J. Am. Chem. Soc. 2005, 127, 10191-10196.
17. T. Berg, Modulation of protein-protein interactions with small organic molecules, Angew. Chem. Int. Ed. Engl. 2003, 42, 2462-2481; D.L. Boger, J. Desharnais, K. Capps, Solution-phase combinatorial libraries: modulating cellular signaling by targeting protein-protein or protein-DNA interactions, Angew. Chem. Int. Ed. Engl. 2003, 42, 4138-4176; D.L. Boger, Solution-phase synthesis of combinatorial libraries designed to modulate protein-protein or protein-DNA interactions, Bioorg. Med. Chem. 2003, 11, 1607-1613; A.G. Cochran, Protein-protein interfaces: mimics and inhibitors, Curr. Opin. Chem. Biol. 2001, 5, 654-659; T.R. Gadek, J.B. Nicholas, Small molecule antagonists of proteins, Biochem. Pharmacol. 2003, 65, 1-8; A.V. Veselovsky, Y.D. Ivanov, A.S. Ivanov, A.I. Archakov, P. Lewi, P. Janssen, Protein-protein interactions: mechanisms and modification by drugs, J. Mol. Recognit. 2002, 15, 405-422; M.R. Arkin, J.A. Wells, Small-molecule inhibitors of protein-protein interactions: progressing towards the dream, Nat. Rev. Drug Discov. 2004, 3, 301-317.
18. H. Yin, A.D. Hamilton, Strategies for targeting protein-protein interactions using synthetic agents, Angew. Chem. Int. Ed. Engl. 2005, 44, 4130-4163.
19. G.C. Kresheck, L.B. Vitello, J.E. Erman, Calorimetric studies on the interaction of horse ferricytochrome-C and yeast cytochrome-C peroxidase, Biochemistry 1995, 34, 8398-8405.
20. H. Wendt, L. Leder, H. Harma, I. Jelesarov, A. Baici, H.R. Bosshard, Very rapid, ionic strength-dependent association and folding of a heterodimeric leucine zipper, Biochemistry 1997, 36, 204-213.
21. C. Frisch, G. Schreiber, C.M. Johnson, A.R. Fersht, Thermodynamics of the interaction of barnase and barstar: changes in free energy versus changes in enthalpy on mutation, J. Mol. Biol. 1997, 267, 696-706.
22. C. Frisch, A.R. Fersht, G. Schreiber, Experimental assignment of the structure of the transition state for the association of barnase and barstar, J. Mol. Biol. 2001, 308, 69-77.
23. A.A. Bogan, K.S. Thorn, Anatomy of hot spots in protein interfaces, J. Mol. Biol. 1998, 280, 1-9.
24. B.Y. Ma, T. Elkayam, H. Wolfson, R. Nussinov, Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 5772-5777.
25. E.A. Padlan, On the nature of antibody combining sites - unusual structural features that may confer on these sites an enhanced capacity for binding ligands, Proteins Struct. Funct. Genet. 1990, 7, 112-124.
26. K.S. Thorn, A.A. Bogan, Asedb: a database of alanine mutations and their effects on the free energy of binding in protein interactions, Bioinformatics 2001, 17, 284-285.
27. N. Leibowitz, Z.Y. Fligelman, R. Nussinov, H.J. Wolfson, Automated multiple structure alignment and detection of a common substructural motif, Proteins Struct. Funct. Genet. 2001, 43, 235-245; B.Y. Ma, H.J. Wolfson, R. Nussinov, Protein functional epitopes: hot spots, dynamics and combinatorial libraries, Curr. Opin. Struct. Biol. 2001, 11, 364-369.
28. F. Glaser, T. Pupko, I. Paz, R.E. Bell, D. Bechor-Shental, E. Martz, N. Ben-Tal, Consurf: identification of functional regions in proteins by surface-mapping of phylogenetic information, Bioinformatics 2003, 19, 163-164.
29. R.E. Bell, N. Ben-Tal, In silico identification of functional protein interfaces, Comp. Funct. Genom. 2003, 4, 420-423.
30. R. Hirschmann, Medicinal chemistry in the golden-age of biology - lessons from steroid and peptide research, Angew. Chem. Int. Ed. Engl. 1991, 30, 1278-1301.
31. P. Brazeau, W. Vale, R. Burgus, R. Guillemin, Isolation of Somatostatin (a somatotropin-release-inhibiting-factor) of ovine hypothalamic origin, Can. J. Biochem. 1974, 52, 1067-1072.
32. P. Brazeau, W. Vale, R. Burgus, N. Ling, M. Butcher, J. Rivier, R. Guillemin, Hypothalamic polypeptide that inhibits secretion of immunoreactive pituitary growth-hormone, Science 1973, 179, 77-79.
33. A.B. Smith, W.Y. Wang, P.A. Sprengeler, R. Hirschmann, Design, synthesis, and solution structure of a pyrrolinone-based beta-turn peptidomimetic, J. Am. Chem. Soc. 2000, 122, 11037-11038; A.B. Smith, H. Liu, R. Hirschmann, A second generation synthesis of polypyrrolinone nonpeptidomimetics: prelude to the synthesis of polypyrrolinones on solid support, Org. Lett. 2000, 2, 2037-2040; A.B. Smith, T.P. Keenan, R.C. Holcomb, P.A. Sprengeler, M.C. Guzman, J.L. Wood, P.J. Carroll, R. Hirschmann, Design, synthesis, and crystal-structure of a pyrrolinone-based peptidomimetic possessing the conformation of a beta-strand - potential application to the design of novel inhibitors of proteolytic-enzymes, J. Am. Chem. Soc. 1992, 114, 10672-10674; A.B. Smith, L.D. Cantin, A. Pasternak, L. Guise-Zawacki, W.Q. Yao, A.K. Charnley, J. Barbosa, P.A. Sprengeler, R. Hirschmann, S. Munshi, D.B. Olsen, W.A. Schleif, L.C. Kuo, Design, synthesis, and biological evaluation of monopyrrolinone-based Hiv-1 protease inhibitors, J. Med. Chem. 2003, 46, 1831-1844; A.B. Smith, M.C. Guzman, P.A. Sprengeler, T.P. Keenan, R.C. Holcomb, J.L. Wood, P.J. Carroll, R. Hirschmann, De-novo design, synthesis, and x-ray crystal-structures of pyrrolinone-based beta-strand peptidomimetics, J. Am. Chem. Soc. 1994, 116, 9947-9962.
34. A.B. Smith, A.B. Benowitz, P.A. Sprengeler, J. Barbosa, M.C. Guzman, R. Hirschmann, E.J. Schweiger, D.R. Bolin, Z. Nagy, R.M. Campbell, D.C. Cox, G.L. Olson, Design and synthesis of a competent pyrrolinone-peptide hybrid ligand for the class Ii Major histocompatibility complex protein Hla-Dr1, J. Am. Chem. Soc. 1999, 121, 9286-9298.
35. A.B. Smith, R. Hirschmann, A. Pasternak, W.Q. Yao, P.A. Sprengeler, M.K. Holloway, L.C. Kuo, Z.G. Chen, P.L. Darke, W.A. Schleif, An orally bioavailable pyrrolinone inhibitor of Hiv-1 protease: computational analysis and X-ray crystal structure of the enzyme complex, J. Med. Chem. 1997, 40, 2440-2444; P.V. Murphy, J.L. O'Brien, L.J. Gorey-Feret, A.B. Smith, Synthesis of novel Hiv-1 protease inhibitors based on carbohydrate scaffolds, Tetrahedron 2003, 59, 2259-2271; P.V. Murphy, J.L. O'Brien, L.J. Gorey-Feret, A.B. Smith, Structure-based design and synthesis of Hiv-1 protease inhibitors employing beta-D-mannopyranoside scaffolds, Bioorg. Med. Chem. Lett. 2002, 12, 1763-1766.
36. J.R. Huff, Hiv Protease - a Novel Chemotherapeutic Target for Aids, J. Med. Chem. 1991, 34, 2305-2314; A.L. Swain, M.M. Miller, J. Green, D.H. Rich, J. Schneider, S.B.H. Kent, A. Wlodawer, X-ray crystallographic structure of a complex between a synthetic protease of human immunodeficiency virus-1 and a substrate-based hydroxyethylamine inhibitor, Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 8805-8809.
37. W.D. Stein, The Movement of Molecules across Cell Membranes, Academic, New York, 1967, pp. 65-125.
38. D.P. Fairlie, M.L. West, A.K. Wong, Towards protein surface mimetics, Curr. Med. Chem. 1998, 5, 29-62.
39. L.D. Walensky, A.L. Kung, I. Escher, T.J. Malia, S. Barbuto, R.D. Wright, G. Wagner, G.L. Verdine, S.J. Korsmeyer, Activation of apoptosis in vivo by a hydrocarbon-stapled Bh3 helix, Science 2004, 305, 1466-1470.
40. J.M. Adams, S. Cory, The Bcl-2 protein family: arbiters of cell survival, Science 1998, 281, 1322-1326; J.C. Reed, Double identity for proteins of the Bcl-2 family, Nature 1997, 387, 773-776.
41. D.T. Chao, S.J. Korsmeyer, Bcl-2 family: regulators of cell death, Annu. Rev. Immunol. 1998, 16, 395-419.
42. A. Strasser, D.C.S. Huang, D.L. Vaux, The role of the Bcl-2/Ced-9 gene family in cancer and general implications of defects in cell death control for tumourigenesis and resistance to chemotherapy, Biochim. Biophys. Acta Rev. Cancer 1997, 1333, F151-F178.
43. M. Sattler, H. Liang, D. Nettesheim, R.P. Meadows, J.E. Harlan, M. Eberstadt, H.S. Yoon, S.B. Shuker, B.S. Chang, A.J. Minn, C.B. Thompson, S.W. Fesik, Structure of Bcl-X(L)-Bak peptide complex: recognition between regulators of apoptosis, Science 1997, 275, 983-986.
44. J.M. Adams, S. Cory, Life-or-death decisions by the Bcl-2 protein family, Trends Biochem. Sci. 2001, 26, 61-66.
45. J.W. Harbour, T.G. Murray, in Ophthalmic Surgery: Principles and Techniques, (Ed.: D. Albert), Blackwell Publishers, Malden, 1998, pp. 682-705.
46. J.W. Harbour, L. Worley, D.D. Ma, M. Cohen, Transducible peptide therapy for uveal melanoma and retinoblastoma, Arch. Ophthalmol. 2002, 120, 1341-1346.
47. W.L. Jorgensen, The many roles of computation in drug discovery, Science 2004, 303, 1813-1818.
48. C.A. Lepre, J.M. Moore, J.W. Peng, Theory and applications of Nmr-based screening in pharmaceutical research, Chem. Rev. 2004, 104, 3641-3675.
49. D.H. Min, W.J. Tang, M. Mrksich, Chemical screening by mass spectrometry to identify inhibitors of anthrax lethal factor, Nat. Biotechnol. 2004, 22, 717-723.
50. D.A. Erlanson, A.C. Braisted, D.R. Raphael, M. Randal, R.M. Stroud, E.M. Gordon, J.A. Wells, Site-directed ligand discovery, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9367-9372.
51. M.M. Reddy, K. Bachhawat-Sikder, T. Kodadek, Transformation of low-affinity lead compounds into high-affinity protein capture agents, Chem. Biol. 2004, 11, 1127-1137.
5
Expanding the Genetic Code
5.1
Synthetic Expansion of the Central Dogma
Masahiko Sisido
Outlook
The protein biosynthetic system has been expanded to incorporate a variety of
nonnatural amino acids. The expansion includes nonenzymatic attachment
of a nonnatural amino acid to a specific tRNA, design of orthogonal tRNAs
that cannot be aminoacylated by any of the endogenous aminoacyl-tRNA
synthetases, examination of whether elongation factor (EF-Tu) accepts a wide
variety of nonnatural amino acids, extension of the codon/anticodon pairs
for assigning the positions of nonnatural amino acids, and finally expansion
of the ribosomal system to accept nonnatural amino acids. The extent of the
expansion required at each step depends on the types of nonnatural amino
acid. For amino acids whose structures resemble some of the naturally
occurring ones, relatively small alterations on the relevant biomolecules may
be sufficient. For large-sized nonnatural amino acids that carry specialty side
groups, however, further modifications of the biomolecules are required and
sometimes even creation of totally artificial “bio”molecules is needed. The
author will refer to the small expansion that requires only minor modification
within the framework of conventional protein engineering, as the biological
expansion. On the other hand, if the expansion requires introduction of
a synthetic component it may be called chemical or synthetic expansion. In
this chapter, we are inclined to describe the chemical expansion more than the
biological one, because our final goal is to introduce chemical functions into
living organisms by the incorporation of nonnatural amino acids that often
have large-sized specialty side groups. But, of course, the above discrimination
is tentative and there is no clear boundary between the two.
The technology of nonnatural mutagenesis is finding a wide range of
applications in fluorescence labeling for proteome analysis, synthesis of
phosphorylated or glycosylated proteins as medicinal tools, and so on.
Furthermore, synthesis of mutant proteins that contain specialty amino acids
in living cells will open a way toward “synthetic microorganisms” that function
differently from the existing organisms.
5.1.1
Introduction
Progress in synthetic chemistry during the last century was truly overwhelming.
Chemists with state-of-the-art knowledge and technique can
produce almost any compound that can exist in nature. Moreover, they
can fabricate compounds into membranes, vesicles, and other supramolecu-
lar assemblies by using secondary forces, like hydrogen bonds, electrostatic
forces, hydrophobic interactions, and so on. Then, a question arises, whether
chemists can create a living organism. Creation of a living organism is not
an unrealistic target, because essential mechanisms of major reactions in
living cells and important structures of biomolecules that function inside
the cells have been clarified during the last 30 years. It may be possible, at
least in theory, to put all components of the DNA replicating system and
the protein synthesizing system inside an artificial liposome together with
relevant monomers for creation of a minimum prototype of a self-replicating
system.
The most advantageous point of the synthetic approach is, however, not
a simple reconstitution of the existing living organisms, but expansion or
alteration of the existing systems by introducing analogs and surrogates
of biomolecules. Analogs of biomolecules are artificial compounds that
resemble existing biomolecules and function like they do in living organisms.
Nonnatural amino acids and nonnatural nucleic bases, described in this
chapter, are typical analogs. Surrogates are artificial molecules that have
structures different from those of existing biomolecules but function similarly
to, or as alternatives to, some of them. Peptide nucleic acid (PNA) is a typical
surrogate that emulates the hybridization behavior of DNAs and RNAs. By
introducing analogs and surrogates into biochemical systems, we can alter
or expand biochemical functions to create novel functions that have not
been observed in existing organisms. In particular, expansion of the protein
biosynthesizing system to include a variety of nonnatural amino acids is the
subject of this chapter.
The introduction of a 21st and further nonnatural amino acids requires
expansion of all steps in protein synthesis (central dogma), as illustrated in
Fig. 5.1-1 [1-4].
Fig. 5.1-1 Mechanism of protein synthesis (central dogma) and its expansion to include
nonnatural amino acids.
1. Synthesis of nonnatural amino acids of desired functions.
2. Preparation of an orthogonal tRNA that cannot be
aminoacylated by any aminoacyl-tRNA synthetases
(aaRSs) in the biochemical system. The orthogonal tRNA,
once it has been aminoacylated with a nonnatural amino
acid, must work like other aminoacylated tRNAs.
3. Aminoacylation of the orthogonal tRNA by a nonnatural
amino acid. For in vivo synthesis of nonnatural mutant
proteins, the aminoacylation must be tRNA specific, that
is, must take place only on a particular orthogonal tRNA
even in the presence of different types of tRNAs.
4. Modification of an elongation factor for translation
(EF-Tu) to accept aminoacyl-tRNAs carrying nonnatural
amino acids and to bring them into the A site of the ribosome.
5. Expansion of the codon/anticodon pairs to assign
positions of nonnatural amino acids in proteins.
6. Modification of the ribosome system to accept nonnatural
amino acids.
Steps 4 and 6 may not be serious obstacles, since both EF-Tu and the ribosome
accept all 20 naturally occurring amino acids, and this tolerance
may extend to some nonnatural amino acids. However, if we want to
incorporate large nonnatural amino acids whose side chain structures
are very different from the naturally occurring ones, we cannot assume that
EF-Tu and the ribosome will tolerate them. In these cases, we will also have to
engineer EF-Tu and the ribosome.
5.1.2
Aminoacylation of tRNA with Nonnatural Amino Acids
5.1.2.1 Hecht Method for Chemical Aminoacylation of Isolated tRNAs

Since the enzymes for tRNA aminoacylation (aaRSs) show high specificity for a
particular amino acid and a particular tRNA, it is difficult, if not impossible,
to obtain mutants that accept a specific nonnatural amino acid (aa*) and do
not accept any naturally occurring ones. The aminoacylation with nonnatural
amino acids, therefore, has to be carried out nonenzymatically. Nonenzymatic
aminoacylation was pioneered by Hecht and coworkers [5] (Fig. 5.1-2).
They synthesized a 2'(3')-aminoacylated mixed dinucleotide pCpA-aa*, then
ligated it with a tRNA that lacks a pCpA unit at the 3' end. Later, the pCpA
dinucleotide was replaced by a pdCpA unit to simplify the synthesis. The Hecht
method is applicable to any type of amino acid and any type of tRNA with
relatively high yields. At present, the Hecht method is the one most
widely employed for aminoacylation of isolated tRNA in vitro. However, there are several
drawbacks. First, a large-scale synthesis of pdCpA is difficult, although a few
milligrams of pdCpA can be obtained by the solid-phase method.
Second, for coupling of pdCpA with an N-protected amino acid, the former must be
solubilized in dimethylformamide through formation of its tetrabutylammonium
salt. This process is sometimes tricky, although the problem can be avoided by
using cationic micelles as the reaction medium [6]. Third, ligation of the pdCpA-aa*
to tRNA(-CA) by T4 RNA ligase must compete with formation of a cyclic tRNA
as a by-product. Unfortunately, the cyclic tRNA works as an inhibitor of protein
synthesis [7]. Finally, the Hecht method is not tRNA selective, so it cannot
be used to aminoacylate one specific tRNA within a mixture of tRNAs, in vitro or in vivo.
Nonenzymatic aminoacylation has also been attempted by simpler procedures.
Krzyzaniak et al. reported that aminoacylation took place when a solution of
amino acid and tRNA was incubated under pressures as high as 6000 bar [8].
However, they have not confirmed whether the aminoacylated tRNA really works
in vitro or in vivo.
Fig. 5.1-2 Hecht method for chemical aminoacylation of tRNA with a nonnatural amino
acid.
5.1.2.2
Micelle-mediated Aminoacylation
Very recently, the author found that cationic micelles mediate aminoacylation
of tRNAs with N-protected amino acid activated esters under ultrasonic
irradiation (Fig. 5.1-3) [9]. A cationic micelle, such as a CTACl micelle, solubilizes the
hydrophobic N-pentenoyl amino acid cyanomethyl ester inside its hydrophobic
core, whereas the negatively charged tRNA molecules are concentrated on the
positively charged micelle surface. The two components are thus separated inside
and outside the micelle and do not react with each other while the mixture stands still.
When the mixture is ultrasonicated, the micellar structure is presumably perturbed
and the reaction takes place. For example, when 5 mM N-pentenoyl-L-2-
naphthylalanine cyanomethyl ester and 0.01 mM tRNA were sonicated in a
90 mM imidazole buffer (pH 7.5) that contained 18 mM CTACl, a yield of up to 75%
of the aminoacylated tRNA was achieved within 10 minutes. Product
analysis indicated that about 70% of the aminoacylation occurs at the 2'
or 3' OH group of the 3' end and that no aminoacylation of the amino groups of the
nucleobases occurs. This high regioselectivity is surprising, because there are
77 OH groups in the tRNA and most of them are exposed to the solvent. The
remaining 30% of the aminoacylation occurs at the OH groups of other nucleotide units.
Fig. 5.1-3 Micelle-mediated aminoacylation under ultrasonic agitation

Fortunately, the incorrectly aminoacylated tRNAs did not seriously inhibit
protein synthesis, presumably because they cannot bind to EF-Tu and cannot
enter the A site of the ribosome. Indeed, when the crude aminoacyl-tRNA was
added to an Escherichia coli in vitro protein biosynthesizing system, a mutant
protein containing 2-naphthylalanine was obtained. The success of
micellar aminoacylation suggests that tRNA aminoacylation is inherently
specific to the 2’(3’)-OH group, presumably because of the high reactivity of
the terminal cis-diol group. A drawback of the micellar aminoacylation is that a small
amount of the cationic detergent remains attached to the negatively charged
tRNA. This may reduce the protein yield to some extent.
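The figures quoted for the micellar reaction can be combined into a short worked calculation. The sketch below is only an illustration: it assumes that the 70% regioselectivity applies at the best-case 75% yield, and that the "correct" sites are the two hydroxyl groups (2' and 3') of the 3'-terminal residue among the 77 OH groups mentioned in the text. The function and variable names are chosen here for illustration.

```python
# Values quoted in the text; how they are combined below is an assumption.
ESTER_MM = 5.0            # activated amino acid ester, mM
TRNA_MM = 0.01            # tRNA, mM
TOTAL_YIELD = 0.75        # best-case fraction of tRNA that is aminoacylated
TERMINAL_FRACTION = 0.70  # share of acylation found at the 3'-terminal 2'/3' OH
TOTAL_OH = 77             # OH groups per tRNA, as stated in the text
TERMINAL_OH = 2           # assumption: 2' and 3' OH of the 3'-terminal residue


def molar_excess(reagent_mm: float, substrate_mm: float) -> float:
    """Molar ratio of reagent to substrate."""
    return reagent_mm / substrate_mm


def statistical_share(sites_of_interest: int, total_sites: int) -> float:
    """Share expected if every site reacted with equal probability."""
    return sites_of_interest / total_sites


excess = molar_excess(ESTER_MM, TRNA_MM)
correctly_charged = TOTAL_YIELD * TERMINAL_FRACTION
random_share = statistical_share(TERMINAL_OH, TOTAL_OH)
enrichment = TERMINAL_FRACTION / random_share

print(f"ester excess over tRNA:        {excess:.0f}-fold")
print(f"tRNA charged at the 3' end:    {correctly_charged:.1%}")
print(f"share expected at random:      {random_share:.1%}")
print(f"enrichment over random attack: {enrichment:.1f}-fold")
```

Under these assumptions the ester is in roughly 500-fold excess, about half of the tRNA ends up correctly charged, and the 3'-terminal hydroxyls react about 27 times more often than a purely statistical attack on 77 equivalent OH groups would predict.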
5.1.2.3
Ribozyme-mediated Aminoacylation
Suga and coworkers undertook the challenging task of creating a surrogate of
aaRS with their ribozyme technique (Fig. 5.1-4) [10-13]. Inspired by the fact
that tRNAs are biosynthesized through cleavage of 5’ flankers, they attached a
random RNA sequence at the 5’ end of a tRNA to obtain a library of extended
tRNAs. From the library, they selected those that undergo self-aminoacylation
with a biotinylated amino acid cyanomethyl ester. The identified RNA sequence
worked as an artificial aaRS even after it was cleaved off from the original
tRNA. Because the ribozyme is flexible enough to aminoacylate a wide variety
of tRNAs that have a common ACCA 3’ end with a variety of p-substituted
phenylalanine derivatives, it was named a flexizyme. After optimization
and minimization of the RNA sequence, the flexizyme was loaded onto a
gel column. The flexizyme column can aminoacylate tRNAs with a variety
of p-substituted phenylalanine cyanomethyl esters simply by passing a tRNA
with an amino acid cyanomethyl ester through the column [14-16]. The
aminoacylated tRNA has been shown to work in an E. coli in vitro system to
introduce the p-substituted phenylalanine derivatives into proteins. Recently,
the flexizyme has been given tRNA specificity by extending its 3’ end with a
chain complementary to a specific tRNA [17].
Fig. 5.1-4 Ribozyme-mediated aminoacylation.
5.1.2.4
PNA-assisted Aminoacylation
Recently, the author’s group developed another aminoacylation method using
PNA [18] as a tRNA-recognizing molecule (Fig. 5.1-5) [19]. An amino acid
thioester was linked through a spacer to a 9-mer PNA that is complementary
to the 3’ region of a tRNA. When the PNA is hybridized with the tRNA, the
amino acid thioester comes close to the 3’ OH group of the tRNA, provided the
spacer chain is properly designed. The PNA must bind to a specific tRNA, but
not too tightly; otherwise it will remain attached after the aminoacylation and
retard or even inhibit protein synthesis. In the case of yeast phenylalanine
tRNA, the 9-mer PNA was the best choice, but the chain length has to be
optimized for other tRNAs. Addition of an equimolar amount of the
aa*-S-sp-PNA conjugate to yeast phenylalanine tRNA gave a 40-50% yield of
aminoacylation.
Fig. 5.1-5 PNA-mediated aminoacylation.
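The design of the recognition strand can be illustrated with a minimal sketch that derives a 9-mer PNA sequence from the 3' region of a tRNA. Several points are assumptions made here rather than details given in the text: the tRNA sequence is a made-up placeholder, the PNA is taken to bind antiparallel (PNA N-terminus facing the 3' end of the RNA), the PNA carries T opposite A, and the `offset` parameter (how many 3'-terminal nucleotides are left unpaired) is hypothetical.

```python
# RNA base -> complementary PNA base (PNA monomers carry T rather than U).
PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def pna_probe(trna_5to3: str, length: int = 9, offset: int = 0) -> str:
    """Return a PNA sequence (N- to C-terminus) complementary to a stretch of
    `length` nucleotides that ends `offset` nucleotides before the tRNA 3' end.

    Antiparallel binding is assumed, so the target is read 3'->5'.
    """
    end = len(trna_5to3) - offset
    start = end - length
    if start < 0 or offset < 0:
        raise ValueError("target window does not fit inside the tRNA sequence")
    target = trna_5to3[start:end]
    return "".join(PAIR[base] for base in reversed(target))


# Hypothetical 3'-terminal segment ending in ACCA (not a real tRNA sequence).
segment = "GCAUGGCAUCGACCA"
print(pna_probe(segment))            # pairs with the last nine nucleotides
print(pna_probe(segment, offset=4))  # leaves the ACCA end unpaired
```

The `length` argument reflects the remark that the chain length has to be re-optimized for each tRNA; in practice the choice would be guided by measured binding strength, which this sketch does not model.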
The PNA-assisted aminoacylation was specific to a target tRNA whose
3’ region is complementary to the PNA, even in an E. coli S30 in vitro protein
synthesizing system that contained a variety of endogenous tRNAs. When
we put a 2-naphthylalanine thioester-spacer-PNA conjugate together with an
orthogonalized yeast phenylalanine tRNA into the S30 system, the nonnatural
amino acid was successfully incorporated into the target protein.
The PNA-assisted aminoacylation/in vitro translation system is currently the
simplest way to obtain nonnatural mutants, once the relevant conjugate is in hand.
Since this is a chemical expansion of the aminoacylation process, it will be
applicable to a wide variety of nonnatural amino acids and different tRNAs.
The PNA-assisted aminoacylation is specific to a complementary tRNA and
is potentially effective in a living cell. The only obstacle to in vivo
aminoacylation is that the Nielsen-type PNA does not easily penetrate
cell membranes. Efforts to design different types of PNAs that can penetrate
cell membranes are in progress [20, 21].
5.1.2.5
Directed Evolution of Existing aaRS/tRNA Pair to Accept Nonnatural
Amino Acids
An alternative approach to nonnatural aminoacylation is to alter the substrate
specificity of existing aaRSs. This is not an easy task, since aaRSs show
rigorous specificity for a particular amino acid and a particular tRNA, and
link the former specifically to the 3’- or 2’-OH group of the latter. This rigorous
specificity is what maintains the fidelity of the translation process. Schultz and
coworkers, however, constructed a sophisticated selection scheme to find a
mutant aaRS that aminoacylates a particular tRNA with a specific nonnatural
amino acid, but not with any of the natural amino acids [22, 23]. They started
from a TyrRS/tRNA pair of Methanococcus jannaschii and mutated the tRNA
so that it is not charged with any natural amino acid by the endogenous aaRSs
in the E. coli system (Fig. 5.1-6). The mutated tRNA/TyrRS pair worked as
an orthogonal aaRS/tRNA pair in the E. coli system, independently of the
endogenous aaRS/tRNA pairs [22]. Next, they mutated the TyrRS so that it does not
accept Tyr or any other natural amino acid (Fig. 5.1-7), but accepts only
O-methyltyrosine (Fig. 5.1-8) [23]. They introduced the orthogonal tRNA/aaRS
pair into E. coli and obtained the first living cell that incorporates
O-methyltyrosine as a 21st amino acid into a protein (Fig. 5.1-9). By using a
similar procedure, they introduced various nonnatural amino acids into living
cells [24-26]. Later, they put the orthogonal tRNA/aaRS pair together with
an enzyme that synthesizes p-aminophenylalanine from basic carbon sources
[27]. This is the first example of a cell that creates its own 21st amino acid and
lives with it.
Fig. 5.1-6 Selection of tRNAs that are not aminoacylated by any of the aaRSs in E. coli.
Fig. 5.1-7 Negative selection for eliminating TyrRS mutants that aminoacylate the
orthogonal tRNA with Tyr or any of the natural amino acids in E. coli.
Fig. 5.1-8 Positive selection for picking up TyrRS mutants that aminoacylate the
orthogonal tRNA with O-methyltyrosine.
Fig. 5.1-9 Expanded living organism that produces proteins including a nonnatural
amino acid as the 21st one.
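The logic of the two selections named in the captions of Figs. 5.1-7 and 5.1-8 can be sketched as two filters applied to a library of synthetase mutants. This is a conceptual illustration only, not the experimental protocol: the mutant names and their substrate sets are hypothetical, and real selections act on cell survival rather than on known substrate lists.

```python
from dataclasses import dataclass

# One-letter codes of the 20 naturally occurring amino acids.
NATURAL = frozenset("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class SynthetaseMutant:
    name: str
    substrates: frozenset  # amino acids this mutant loads onto the orthogonal tRNA


def negative_selection(library):
    """Discard mutants that charge the orthogonal tRNA with any natural amino acid."""
    return [m for m in library if not (m.substrates & NATURAL)]


def positive_selection(library, target):
    """Keep mutants that charge the orthogonal tRNA with the target amino acid."""
    return [m for m in library if target in m.substrates]


# Hypothetical library; "OMeTyr" stands for O-methyltyrosine.
library = [
    SynthetaseMutant("wild-type TyrRS", frozenset({"Y"})),
    SynthetaseMutant("mutant A", frozenset({"Y", "OMeTyr"})),
    SynthetaseMutant("mutant B", frozenset({"OMeTyr"})),
    SynthetaseMutant("mutant C", frozenset()),
]

survivors = positive_selection(negative_selection(library), "OMeTyr")
print([m.name for m in survivors])
```

In this idealized picture the two filters commute, so the order in which they are applied does not change which mutants survive; only a mutant that accepts the nonnatural amino acid and rejects all natural ones passes both.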
Yokoyama and coworkers used a similar approach to find an orthogonal
aaRS/tRNA pair that works in mammalian cells. They used the orthogonal
pair to incorporate iodotyrosine into proteins [28, 29]. An in vivo system that
produces proteins in which iodine atoms are incorporated at specific positions
will find applications in large-scale production of heavy-atom-labeled proteins
for X-ray analysis.
The elegant approaches of Schultz and Yokoyama are, however, typical
examples of biological expansion. It is not surprising, therefore, that their
screening processes, so far, produced aaRS/tRNA pairs only for amino acids
that are not far from the naturally occurring ones. It seems difficult, if not
impossible, to identify aaRS/tRNA pairs that can introduce large-sized amino
acids from their screening processes. Since nonnatural amino acids of specialty
functions, like fluorescence, electron donating, and accepting functions, often
carry large side groups, a more widely applicable method for aminoacylation
is needed.
At this moment, aminoacylation of tRNA with a nonnatural amino acid is
still the bottleneck step for nonnatural mutagenesis both in vitro and in vivo.
The Hecht method is applicable to almost any type of amino acid, but can be
carried out only for isolated tRNAs in a test tube. Further, the aminoacylation step
of pdCpA is sometimes tricky. For aminoacylation in a test tube, the
micelle-mediated method is easier than the Hecht method, at least for some types
of amino acids. The ribozyme technique of Suga is applicable to a variety
of p-substituted phenylalanines and to a wide variety of tRNAs. This is, at
present, the simplest and most dependable method of aminoacylation for
isolated tRNAs. It has not, however, been applied to in vivo systems or to
large-sized amino acids. Our PNA-assisted aminoacylation method may also
be applicable to a wide variety of amino acids and tRNAs. Since the
PNA-assisted aminoacylation is tRNA selective, it could potentially work as an amino acid
donor in living cells. The orthogonal tRNA/aaRS pairs reported by Schultz
and by Yokoyama are effective for some nonnatural amino acids with small
side groups, but they have not been applied to large-sized amino acids
so far.
5.1.3
Other Biomolecules That Must Be Optimized for Nonnatural Amino Acids
5.1.3.1 Orthogonal tRNAs

As pointed out above, the tRNA to be used as a carrier of a nonnatural amino
acid must not be aminoacylated by any aaRSs in the system, but once it
is aminoacylated with a nonnatural amino acid by any means, it must work
efficiently as an ordinary aminoacyl-tRNA. In Schultz’s approach, the orthogonal
tRNA has to be selected as part of an orthogonal tRNA/aaRS pair. This imposes tough
restrictions on the tRNA structure and makes it difficult to identify rigorously
orthogonal and highly efficient tRNAs for a nonnatural amino acid. When
the aminoacylation is instead carried out on isolated tRNAs, or on a specific
tRNA with a ribozyme or an amino acid-PNA conjugate, the orthogonality
condition has to be satisfied only against the aaRSs in the system. Namely, the
tRNA must be protected from the attack of endogenous aaRSs, but does not
have to be a specific and efficient substrate of an engineered aaRS for a
nonnatural amino acid. Under these relaxed conditions, we have found several
orthogonal tRNAs that efficiently deliver a nonnatural amino acid to the E. coli
ribosomal system [30].
We started with tRNAs having nonstandard secondary structures, such
as those in mitochondria and other species, and made small changes
to their stem structures. The tRNAs were examined for their ability to
introduce exclusively a nonnatural amino acid into a protein in an E. coli
in vitro protein synthesizing system. The nonstandard tRNAs that carry a
CCCG four-base anticodon were completely protected from attack by
the endogenous aaRSs in the E. coli system. Fortunately, some of
these nonstandard CCCG-anticodon tRNAs, when chemically aminoacylated with
p-nitrophenylalanine, very efficiently decoded a CGGG four-base codon on
the streptavidin mRNA to introduce the nonnatural amino acid. The results
indicate that tRNAs of nonstandard structures make a good starting
point for finding orthogonal tRNAs as carriers of nonnatural amino
acids. Some of the orthogonal tRNAs that have been identified to work
efficiently as carriers of nonnatural amino acids in the E. coli system are listed in
Fig. 5.1-10.
Fig. 5.1-10 Orthogonal tRNAs that are not aminoacylated with any of the natural amino acids
in E. coli, but can bring a nonnatural amino acid efficiently into the ribosome A site.
Structures shown: Suga; Schultz; Schultz with yeast Phe acceptor stem; bovine mt tRNA.
5.1.3.2
Adaptability of EF-Tu to Aminoacyl-tRNAs Carrying a Wide Variety of Nonnatural
Amino Acids
Aminoacyl-tRNAs that carry nonnatural amino acids enter the A site
of the ribosome with the aid of an enzyme called an elongation factor, EF-Tu.
Only a single type of EF-Tu molecule exists in E. coli and it delivers
all types of aminoacyl-tRNAs into the ribosome A site. Therefore, the
EF-Tu molecule is adaptable enough to bind a wide range of
aminoacyl-tRNAs, presumably including those with some nonnatural amino acids.
Our preliminary experiment indicates that the E. coli EF-Tu binds yeast
phenylalanine tRNA that carries a variety of nonnatural amino acids, although
with reduced affinities [31]. Aminoacyl-tRNAs carrying bulky nonnatural
amino acids, like 1-pyrenylalanine, bind very weakly to EF-Tu. Although
the binding affinity to EF-Tu may not be directly proportional to the
incorporation efficiency, it is clear that insufficient binding to EF-Tu leads
to unsuccessful incorporation of the nonnatural amino acid. Design and
synthesis of engineered EF-Tus that bind a wider range of aminoacyl-tRNAs
with bulky nonnatural amino acids are now in progress.
5.1.3.3
Adaptability of Ribosome to Wide Variety of Nonnatural Amino Acids
Since the peptide bonds form in the ribosome, its expansion to accept a wide
range of nonnatural amino acids will be the final target. It is somewhat
surprising that amino acids that carry large side groups like those shown
in Fig. 5.1-11 (left) have been incorporated into proteins in fairly high yields
in E. coli and other biosynthesizing systems [32]. This indicates that the
ribosomes of various species are very tolerant of a wide variety of amino
acids, even beyond the naturally occurring ones. At the same time, however,
there are kinds of nonnatural amino acids that are rigorously rejected by
the ribosome, although their side groups are not very bulky [32]. Some
examples are shown in Fig. 5.1-11 (right). Typically, D-amino acids have been
rigorously rejected by the E. coli ribosome [33, 34]. Similarly, our recent
experiment suggests that 9-anthrylalanine is rigorously rejected [32], even
though yeast Phe tRNA chemically aminoacylated with 9-anthrylalanine binds
to EF-Tu with only somewhat reduced affinity [31].
The adaptability of the E. coli ribosome has been investigated by using puromycin
analogs that carry a variety of nonnatural amino acids [35]. Since puromycin
is known to bind to the ribosomal A site without the assistance of EF-Tu, the
extent to which the puromycin analogs inhibit translation can be a direct
measure of the adaptability of the A site to a variety of nonnatural amino acids.
The inhibition efficiency indicated that some aromatic amino acids that carry
widely expanded side groups, like 9-anthrylalanine and 1-pyrenylalanine, are
not accepted by the A site. Recently, Roberts and coworkers also showed that
analogs carrying D-amino acids or β-amino acids bind only weakly to the A site,
although they do not carry very large side groups [36].
Fig. 5.1-11 Relatively large-sized nonnatural amino acids that are efficiently incorporated
into proteins and small-sized ones that cannot be incorporated into proteins.
Panel labels: relatively large amino acids that are allowed by the E. coli ribosome (left);
relatively small amino acids, including D-amino acids, that are rejected by the E. coli ribosome (right).
These facts suggest that the inner structure of the A site is critical in rejecting
some types of amino acids, and that even small modifications of its structure may
expand its amino acid adaptability significantly. Indeed, Hecht and coworkers
showed that an E. coli ribosome whose 23S rRNA has a UGGCA sequence
instead of GAUAA in the region 2447-2451 accepts D-amino acids to some
extent [37]. Elaboration of the ribosome structure will open a way to synthesize
proteins that contain a much wider variety of nonnatural amino acids.
5.1.4
Expansion of the Genetic Codes
5.1.4.1 Amber and Other Stop Codons

The second key step for the expansion of the biosynthesizing system to
introduce nonnatural amino acids is the expansion of the genetic codes.
Schultz [38] and Chamberlin [39] first assigned an amber (UAG) stop codon to
a nonnatural amino acid (aa*). By adding an aa*-tRNA with a CUA anticodon as
a suppressor of the amber codon, they successfully introduced the nonnatural
amino acid at that position. Since then, the amber suppression method has
been employed by a number of researchers. This method is advantageous
in that an unsuccessful decoding of the UAG codon automatically leads to
truncation of the protein synthesis. No full-length protein that erroneously
contains one of the 20 naturally occurring amino acids is produced, provided
that the tRNA is rigorously orthogonal. One of the drawbacks of the stop-codon
suppression method is that only one or two of the three stop codons (UAG,
UAA, UGA) can be assigned to nonnatural amino acids and, therefore, only
one or two nonnatural amino acids can be incorporated into a single protein.
This restricts the application of the nonnatural mutagenesis.
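The all-or-nothing outcome of amber suppression can be illustrated with a toy translation routine. The sketch below is deliberately idealized: suppression is treated as a yes/no switch, whereas in a real system the suppressor tRNA competes with the release factor, and the mRNA sequence and the symbol `X` for the nonnatural amino acid are made up for the example.

```python
from itertools import product

# Standard genetic code, built in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon (5'->3') that pairs antiparallel with the given codon."""
    return "".join(COMPLEMENT[b] for b in reversed(codon))


def translate(mrna: str, suppressor_charged: bool = True, aa_star: str = "X") -> str:
    """Translate in frame; UAG is read as aa* only if the suppressor is present."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG" and suppressor_charged:
            peptide.append(aa_star)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # termination: either a true stop or an unsuppressed UAG
        peptide.append(residue)
    return "".join(peptide)


mrna = "AUGGCUUAGAAAGGCUAA"  # hypothetical message with an in-frame UAG
print(anticodon("UAG"))                            # suppressor anticodon
print(translate(mrna, suppressor_charged=True))    # full length, contains X
print(translate(mrna, suppressor_charged=False))   # truncated at UAG
```

The point of the example is that no third outcome exists in this model: the product is either full length with the nonnatural residue at the UAG position, or truncated there.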
It is not self-evident that the amber suppression method can be used in living cells,
because some essential proteins may not be synthesized properly in the
presence of a large amount of the aminoacylated suppressor tRNA, which could
read through their own UAG stop codons. However,
the amber suppression method has been reported to work successfully in
Xenopus oocytes [40, 41], E. coli [23-25], and mammalian cells [28, 42-44].
5.1.4.2
Four-base Codons
We have demonstrated that several four-base codons like CGGG and AGGU
can be used independently within the framework of the existing three-base codon
system [45, 46]. The idea of the four-base codon was inspired by
naturally occurring frame-shift suppression. An undesired frame shift that
originates from an insertion of one nucleotide unit can be suppressed by
a frame-shift suppressor tRNA that contains a four-base anticodon. In the same
way, some four-base codons can be
successfully decoded by artificial frame-shift suppressor tRNAs that contain
the complementary four-base anticodons. Unsuccessful translation of a
four-base codon as the corresponding three-base codon causes an undesired
frame shift, but this often leads to an encounter with a stop codon downstream
(Fig. 5.1-12). Therefore, in the four-base codon method, as in the amber method,
the only full-length protein produced is the one that contains a nonnatural amino acid at
that position, whereas an undesired decoding as a three-base codon gives a truncated
protein. The probability of the undesired three-base decoding can be
reduced by choosing rare codons as the first three bases of the four-base codons.
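The two possible fates of a four-base codon can be traced with a small reading-frame simulation. As before, this is an idealized sketch: decoding is all-or-nothing rather than a competition between tRNAs, the mRNA is a made-up sequence designed so that the shifted frame meets a stop codon early, and `X` marks the nonnatural amino acid.

```python
from itertools import product

# Standard genetic code, built in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str, four_base_codon: str = "CGGG",
              frameshift_suppressor: bool = True, aa_star: str = "X") -> str:
    """Translate from position 0, reading the four-base codon as aa* when the
    matching four-base-anticodon tRNA is available, else as a plain triplet."""
    peptide = []
    i = 0
    while i + 3 <= len(mrna):
        if frameshift_suppressor and mrna[i:i + 4] == four_base_codon:
            peptide.append(aa_star)
            i += 4  # four bases consumed: downstream frame is preserved
            continue
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
        i += 3
    return "".join(peptide)


# Hypothetical message: AUG GCU CGGG CCU GAC AAA GGC UAA
mrna = "AUGGCUCGGGCCUGACAAAGGCUAA"
print(translate(mrna, frameshift_suppressor=True))   # full length with X
print(translate(mrna, frameshift_suppressor=False))  # CGG read as Arg, then early stop
```

When CGGG is read as the triplet CGG, the extra G shifts the frame, and in this example the shifted frame reads GCC followed by UGA, so synthesis stops after four residues. The model reads every in-frame CGGG as a four-base codon, which is a simplification.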
The most remarkable advantage of the four-base codons as compared
with the amber codon is that we can incorporate two or more different
nonnatural amino acids into single proteins [47, 48]. We have identified five
different four-base codons that work independently in the E. coli system, namely,
AGGU, CGGG, GGGU, CUCU, and CCCU [46]. Similarly, CGGU(CGCU),
CCCU, CUCU(CUAU), and GGGU work efficiently in the lysate of rabbit
reticulocyte [49]. Since they are independent and orthogonal to each other, we
can introduce, in theory, up to five different nonnatural amino acids into a
single protein in the E. coli system, and up to four in the rabbit system. In practice,
however, because of the reduced incorporation efficiencies of nonnatural
amino acids, the maximum number of nonnatural amino acids in a single
protein is limited to three at this moment. Multiple incorporation has
actually been demonstrated by introducing a fluorophore-quencher pair into
a single streptavidin [48]. Four-base codons can be used in conjunction with stop
codons for multiple incorporations [50, 51].
Fig. 5.1-12 Principle of the four-base codon strategy.
It has been argued that extending the lengths of codons and anticodons might
cause steric overcrowding between the tRNAs in the ribosomal A site and P
site. Overcrowding in the ribosome, however, is avoided by a bend of the
mRNA chain at the junction between the A and P sites [52]. Because of this
bend, the main bodies of the two tRNAs are well separated, while the two
anticodons, as well as the amino acid and the peptide C-terminus, are close to
each other. Indeed, even five-base codons [53] and a tandem four-base codon
[54] have been reported to be successful.
Like the amber codon method, the four-base codon method has been
shown to work in living cells [55].
5.1.4.3
"Synthetic Codons" That Contain Nonnatural Nucleobases
Nonnatural nucleobases are another important and challenging area of
chemical biology. Benner reported that the isoC-isoG pair works as an orthogonal
base pair in addition to the existing A-T and G-C pairs (Fig. 5.1-13) [56].
The “synthetic codon/anticodon pair”, like isoCAG/CUisoG, has actually been
used to assign a nonnatural amino acid in an E. coli in vitro system [57]. Hirao
and Yokoyama reported that a y-s pair also works as an orthogonal base pair.
The y-s pair is advantageous because “s” on DNA can be transcribed to “y”
on mRNA with high enough fidelity in the presence of yTP. The resulting
synthetic codon yAG was successfully translated by a tRNA containing the
corresponding synthetic anticodon CUs [58, 59]. Unfortunately, transcription
of “y” on DNA to “s” on RNA was not accurate enough and the CUs-anticodon tRNA
had to be synthesized chemically. Recently, they reported an improved version
of the nonnatural base pair, the s-z pair, to solve this problem [60]. Nonnatural
base pairs have also been explored by Schultz’s group, using hydrophobic
interactions as the unique forces for base pairing [61].
Fig. 5.1-13 Nonnatural base pairs that are orthogonal to the A-T and G-C pairs.
Structures shown: isoC-isoG (Benner) and two pairs from Hirao and Yokoyama.
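How much coding capacity each strategy could add in principle follows from simple counting. The sketch below gives combinatorial upper bounds only; it is an illustration added here, and it ignores every practical limit discussed in the text (transcription fidelity, tRNA availability, decoding efficiency).

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of a given length over a given base alphabet."""
    return alphabet_size ** codon_length


natural_triplets = codon_count(4)            # A, U, G, C
quadruplets = codon_count(4, 4)              # four-base codons, natural letters only
triplets_one_new_letter = codon_count(5)     # one extra letter in mRNA, e.g. "y"
triplets_two_new_letters = codon_count(6)    # both members of a new pair usable in mRNA

print(natural_triplets, quadruplets, triplets_one_new_letter, triplets_two_new_letters)
print("extra triplets with one new mRNA letter:",
      triplets_one_new_letter - natural_triplets)
```

The five-letter case corresponds to a pair such as y-s when only one member appears in the mRNA and the other is confined to the anticodon; the six-letter case assumes, hypothetically, that both members could be written into the message.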
5.1.5
In vivo Synthesis of Nonnatural Mutants
So far, nonnatural mutants have been synthesized mostly in cell-free in
vitro protein synthesizing systems, mainly because chemical aminoacylation
had to be carried out on isolated tRNAs in a test tube. In vivo synthesis of
nonnatural mutant proteins is advantageous because it produces a much larger
amount of mutant protein and provides an opportunity for in vivo testing of drugs
and other small molecules by selective fluorescent labeling of target proteins
in vivo. For in vivo synthesis of nonnatural mutants, the aminoacylation has
to be carried out on a specific tRNA with a specific nonnatural amino acid. At
this moment, in vivo aminoacylation has been successfully carried out only
by engineered aaRSs that have been selected to accept a specific nonnatural
amino acid [23-29]. As mentioned above, however, the engineered aaRSs have
been successful only for small-sized amino acids, and no successful result has
been reported for large-sized amino acids, like fluorescent ones.
Although ribozyme- and PNA-assisted aminoacylation are potentially tRNA
specific and would work as aminoacylating agents in vivo, their application in
living cells has not yet been reported.
Import of aminoacyl-tRNA into living cells is another approach toward in
vivo production of nonnatural mutant proteins. Dougherty and coworkers
microinjected [41] or electroporated [44] an aminoacyl-tRNA/mRNA pair into
Xenopus oocytes to synthesize fluorescently labeled acetylcholine receptor. The
microinjection method is applicable to any type of tRNA and amino acid, but
the number of cells that can be treated at one time is very limited.
RajBhandary and coworkers [42, 43] showed that aminoacyl-tRNAs can be
imported safely by the use of transfection reagents (Fig. 5.1-14). By importing
two types of tRNAs, one for suppressing the amber (UAG) codon and the other for
suppressing the ocher codon, that were preaminoacylated with different amino acids,
they successfully obtained a multiply mutated protein in a mammalian cell. The
transfection method is also applicable to any type of tRNA and amino acid and to
a wide variety of cells. A possible drawback of this method is the short lifetime of
aminoacyl-tRNAs, which is often less than an hour at neutral pH, whereas
most of the transfection reagents form endosomes that are stable in the cytoplasm
for a few hours or even a day. Fortunately, however, since the pH inside
the endosome is estimated to be about 4, where the aminoacyl ester bond is more
stable, a significant amount of aminoacyl-tRNA
will still remain when the endosome breaks down. Even so,
for the transfection method to be efficient, the endosomes must be broken
in the cytoplasm as quickly as possible, or alternatively, another technique that
leads to direct penetration of aminoacyl-tRNA must be developed.
Fig. 5.1-14 Import of tRNA aminoacylated with nonnatural amino acids into a living cell
through endocytosis.
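The trade-off between endosome stability and aminoacyl-tRNA lifetime can be made concrete with a first-order decay estimate. The sketch below rests on assumptions that are not in the text: hydrolysis is treated as a simple first-order process, "less than an hour" is read as a half-life of one hour at neutral pH, and the half-life at pH 4 is a purely hypothetical placeholder chosen only to show the direction of the effect.

```python
def fraction_remaining(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of intact aminoacyl-tRNA after first-order hydrolysis."""
    if half_life_h <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_h / half_life_h)


NEUTRAL_HALF_LIFE_H = 1.0   # assumption: upper bound suggested by "less than an hour"
ACIDIC_HALF_LIFE_H = 10.0   # hypothetical value for pH ~4; not given in the text

for hours in (1, 3, 6):
    neutral = fraction_remaining(hours, NEUTRAL_HALF_LIFE_H)
    acidic = fraction_remaining(hours, ACIDIC_HALF_LIFE_H)
    print(f"{hours} h: neutral pH {neutral:.1%} intact, acidic endosome {acidic:.1%} intact")
```

With these placeholder numbers, three hours at neutral pH would leave about one eighth of the aminoacyl-tRNA intact, whereas most of it would survive the same time in the acidic endosome; real half-lives depend strongly on the amino acid and would need to be measured.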
5.1.6
Application of Nonnatural Mutagenesis - Fluorescence Labeling
Nonnatural mutagenesis has been finding applications in probing protein
functions and structures, in glycosylation [62-64] and phosphorylation [65] as
alternative routes to posttranslational modifications, in controlling protein
functions by external factors like photoirradiation, and so on. Since the
amount of mutant protein produced in a conventional in vitro system is usually
less than a microgram, fluorescence labeling seems the most practical and
promising application. Position-specific fluorescence labeling is a key step in
vast biochemical fields including in vitro and in vivo proteome analysis and
protein network analysis, in vitro and in vivo conformational analysis, and
single-molecule spectroscopic analysis.
A variety of fluorescent amino acids have been synthesized and examined
for their incorporation into proteins. The fluorescent amino acids that
show excitation wavelengths longer than 350 nm and have been successfully
incorporated into proteins are listed in Fig. 5.1-15 [66-73].
When polarity-sensitive fluorescent amino acids, like 1, 2, 4, 5, and 6, were
incorporated into antibodies, receptors, and enzymes, the mutants worked
as sensors for the antigens, ligands, and substrates or inhibitors. For the
fluorescently labeled proteins to be sensitive enough, however, the fluorophore
must be located at a specific position where binding of a low-molecular-weight
compound causes a polarity change around the fluorophore, while, at the same
time, the body of the fluorophore should not disturb the binding of the
low-molecular-weight compound. In short, the fluorophore must be located
neither too close to nor too far from the binding site. Only position-specific
incorporation of fluorescent amino acids can satisfy these conflicting conditions.
When an acridonylalanine (acdAla) was incorporated at different positions of a
camel single-chain antibody against hen lysozyme, the Tyr106acdAla mutant
responded sensitively to the binding of nanomolar concentrations of the antigen,
whereas the Trp123acdAla mutant was insensitive to the binding (Fig. 5.1-16)
[71]. When the same fluorescent amino acid was incorporated into streptavidin,
some mutants responded to even a picomolar quantity of biotin [71]. The lower
limit of the detectable concentration is determined not by the fluorescence
sensitivity, but by the dissociation constants of the protein-small molecule
interactions.
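The statement that the detection limit is set by the dissociation constant can be illustrated with the simplest binding model. The sketch assumes 1:1 binding with the ligand in large excess over the labeled protein, so that the free ligand concentration equals the total; the dissociation constants used are hypothetical round numbers, not values reported in the text.

```python
def fraction_bound(ligand_conc_m: float, kd_m: float) -> float:
    """Fraction of sensor protein occupied by ligand for simple 1:1 binding."""
    if ligand_conc_m < 0 or kd_m <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return ligand_conc_m / (kd_m + ligand_conc_m)


# Hypothetical dissociation constants for a weaker and a tighter binder.
KD_WEAK_M = 1e-9     # 1 nM
KD_TIGHT_M = 1e-12   # 1 pM

for ligand in (1e-12, 1e-11, 1e-9):
    weak = fraction_bound(ligand, KD_WEAK_M)
    tight = fraction_bound(ligand, KD_TIGHT_M)
    print(f"[ligand] = {ligand:.0e} M: occupancy {weak:.2%} (Kd 1 nM), {tight:.2%} (Kd 1 pM)")
```

However bright the fluorophore, a sensor whose occupancy is well below one percent produces almost no change in signal, which is why only a very tight binder can respond to picomolar ligand.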
Fig. 5.1-15 Nonnatural amino acids carrying fluorescent groups that have been
incorporated into proteins with high efficiency.
Fig. 5.1-16 Detection of an antigen molecule by a fluorescently labeled antibody.

Incorporation of two different fluorescent amino acids into a single protein
can expand the scope of fluorescence analysis from the simple quenching
analysis described above to a detailed study of the conformational changes
associated with folding processes. Fluorescence resonance energy transfer
(FRET) is often the method of choice [53] because it rests on a firm theoretical
background and has been shown experimentally to obey Förster's 1/r^6
distance dependence, provided that the orientation factor has been averaged
out [74]. The only restriction at present is that the types of fluorescent amino
acids available as energy donors and energy acceptors are very limited, as listed in
Fig. 5.1-15.
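The 1/r^6 distance dependence can be made concrete with a short sketch. It
uses the standard Förster expression for transfer efficiency,
E = 1 / (1 + (r/R0)^6), where R0 is the donor-acceptor distance at which
half of the excitation energy is transferred. The function name and the
R0 value of 5 nm are assumptions chosen for illustration; they are not
properties of the amino acids in Fig. 5.1-15.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Forster transfer efficiency for a single donor-acceptor pair.

    Assumes the orientation factor has been averaged out, so that a single
    Forster radius R0 describes the pair.
    """
    if distance_nm <= 0 or forster_radius_nm <= 0:
        raise ValueError("distance and Forster radius must be positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Hypothetical R0 of 5 nm: efficiency falls steeply around r = R0.
for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, 5.0):.3f}")
```

Because of the sixth-power term, the efficiency is most sensitive to
distance changes near R0, which is why the choice of donor-acceptor pair
matters for a given folding study.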
5.1.7
Future Development and Conclusion
The basic strategy of nonnatural mutagenesis was first reported more than 15 years
ago, as a promising technology for structural and functional analyses of
proteins in vitro and in vivo and for creating proteins with specialty functions.
However, it has remained a specialized method used by only a limited number of
researchers, mainly because of the lack of an easy method of aminoacylation
and the lack of appropriate nonnatural amino acids for useful applications.
Fortunately, facile and dependable methods for aminoacylation are now
available, and several recently reported nonnatural amino acids appear to
be genuinely useful for fluorescence labeling, glycosylation, phosphorylation, and
other applications. Commercialization of the reagents for aminoacylation
and of nonnatural amino acids carrying specialty side groups will further
accelerate the spread of this method. Nonnatural mutagenesis is a unique
method that enables position-specific labeling with a variety of functional
groups. Further, the labeling can be done even in living cells. No alternative
technique can do this. Wide application of this method will open a new area
in protein research in general and, especially, in drug discovery and protein
network analysis.
Acknowledgments
Recent experimental results from our laboratory described in this chapter were
obtained with support from a Grant-in-Aid for Scientific Research from the
Ministry of Education, Science, Sports, and Culture, Japan (No. 15101008).
References
1. T. Hohsaka, M. Sisido, Incorporation of non-natural amino acids into proteins, Curr. Opin. Chem. Biol. 2002, 6, 809-815.
2. M. Sisido, Proteins containing nonnatural amino acids, in Biopolymers, Vol. 8 (Eds.: A. Steinbüchel, S.R. Fahnestock), Chapter 2, Wiley-VCH, Weinheim, Germany, 2002, pp. 26-49.
3. M. Sisido, Synthetic expansion of the central dogma: chemical aminoacylation, 4-base codons and nonnatural mutagenesis, in Peptide Revolution: Genomics, Proteomics & Therapeutics, Proceedings of the Eighteenth American Peptide Symposium (Eds.: M. Chorev, T.K. Sawyer), American Peptide Society, Cardiff, CA, USA, 2004, pp. 294-300.
4. C. Köhrer, U.L. RajBhandary, Proteins with one or more unnatural amino acids, in The Aminoacyl-tRNA Synthetases (Eds.: M. Ibba, C. Francklyn, S. Cusack), Landes Bioscience, Georgetown, Texas, USA, 2005.
5. T.G. Heckler, L.H. Chang, Y. Zama, T. Naka, M.S. Chorghade, S.M. Hecht, T4 RNA ligase mediated preparation of novel “chemically misacylated” tRNAPhes, Biochemistry 1984, 23, 1468-1473.
6. K. Ninomiya, T. Kurita, T. Hohsaka, M. Sisido, Facile aminoacylation of pdCpA dinucleotide with a nonnatural amino acid in cationic micelle, Chem. Commun. 2004, 2242-2243.
7. K. Yamanaka, H. Nakata, T. Hohsaka, M. Sisido, Efficient synthesis of nonnatural mutants in E. coli in vitro protein synthesizing system, J. Biosci. Bioeng. 2004, 97, 395-399.
8. A. Krzyzaniak, P. Salanski, J. Jurczak, T. Twardowski, J. Barciszewski, tRNA aminoacylated at high pressure is correct substrate for protein biosynthesis, Biochem. Mol. Biol. Int. 1998, 45, 489-500.
9. N. Hashimoto, K. Ninomiya, T. Endo, M. Sisido, Simple and quick chemical aminoacylation of tRNA in cationic micellar solution under ultrasonic agitation, Chem. Commun. 2005, 4321-4323.
10. N. Lee, Y. Bessho, K. Wei, J.W. Szostak, H. Suga, Ribozyme-catalyzed tRNA aminoacylation, Nat. Struct. Biol. 2000, 7, 28-34.
11. H. Saito, H. Suga, A ribozyme aminoacylates exclusively on the 3'-hydroxyl group of the 3'-terminus of tRNA, J. Am. Chem. Soc. 2001, 123, 7178-7179.
12. Y. Bessho, D.R.W. Hodgson, H. Suga, A tRNA aminoacylation system for non-natural amino acids based on a programmable ribozyme, Nat. Biotechnol. 2002, 20, 723-728.
13. H. Saito, D. Kourouklis, H. Suga, An in vitro evolved precursor tRNA with aminoacylation activity, EMBO J. 2001, 20, 1797-1806.
14. H. Murakami, N.J. Bonzagni, H. Suga, Aminoacyl-tRNA synthesis by a resin-immobilized ribozyme, J. Am. Chem. Soc. 2002, 124, 6834-6835.
15. H. Murakami, H. Saito, H. Suga, A versatile tRNA aminoacylation catalyst based on RNA, Chem. Biol. 2003, 10, 655-662.
16. H. Murakami, D. Kourouklis, H. Suga, Using a solid-phase ribozyme aminoacylation system to reprogram the genetic code, Chem. Biol. 2003, 10, 1077-1084.
17. H. Saito, H. Murakami, K. Shiba, K. Ramaswamy, H. Suga, Designer ribozymes: programming the tRNA specificity into flexizyme, J. Am. Chem. Soc. 2004, 126, 11454-11455.
18. P.E. Nielsen, M. Egholm, R.H. Berg, O. Buchardt, Sequence selective recognition of DNA by strand displacement with a thymine-substituted polyamide, Science 1991, 254, 1497-1500.
19. K. Ninomiya, T. Minohata, M. Nishimura, M. Sisido, In situ chemical aminoacylation with amino acid thioesters linked to a peptide nucleic acid, J. Am. Chem. Soc. 2004, 126, 15984-15989.
20. M. Kitamatsu, M. Shigeyasu, T. Okada, M. Sisido, Oxy-peptide nucleic acid with a pyrrolidine ring that is configurationally optimized for hybridization with DNA, Chem. Commun. 2004, 1208-1209.
21. M. Kitamatsu, M. Shigeyasu, M. Saitoh, M. Sisido, Configurational preference of pyrrolidine-based oxy-peptide nucleic acids as hybridization counterparts with DNA and RNA, Biopolymers Pept. Sci. 2006, 84, 267-273.
22. L. Wang, P.G. Schultz, A general approach for the generation of orthogonal tRNAs, Chem. Biol. 2001, 8, 883-890.
23. L. Wang, A. Brock, B. Herberich, P.G. Schultz, Expanding the genetic code of Escherichia coli, Science 2001, 292, 498-500.
24. L. Wang, A. Brock, P.G. Schultz, Adding L-3-(2-naphthyl)alanine to the genetic code of E. coli, J. Am. Chem. Soc. 2002, 124, 1836-1837.
25. J.W. Chin, S.W. Santoro, A.B. Martin, D.S. King, L. Wang, P.G. Schultz, Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli, J. Am. Chem. Soc. 2002, 124, 9026-9027.
26. J.W. Chin, T.A. Cropp, J.C. Anderson, M. Mukherji, Z. Zhang, P.G. Schultz, An expanded eukaryotic genetic code, Science 2003, 301, 964-967.
27. R.A. Mehl, J.C. Anderson, S.W. Santoro, L. Wang, A.B. Martin, D.S. King, D.M. Horn, P.G. Schultz, Generation of a bacterium with a 21 amino acid genetic code, J. Am. Chem. Soc. 2003, 125, 935-939.
28. D. Kiga, K. Sakamoto, K. Kodama, T. Kigawa, T. Matsuda, T. Yabuki, M. Shirouzu, Y. Harada, H. Nakayama, K. Takio, Y. Hasegawa, Y. Endo, I. Hirao, S. Yokoyama, An engineered Escherichia coli tyrosyl-tRNA synthetase for site-specific incorporation of an unnatural amino acid into proteins in eukaryotic translation and its application in a wheat germ cell-free system, Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 9715-9720.
29. K. Sakamoto, A. Hayashi, A. Sakamoto, D. Kiga, H. Nakayama, A. Soma, T. Kobayashi, M. Kitabatake, K. Takio, K. Saito, M. Shirouzu, I. Hirao, S. Yokoyama, Site-specific incorporation of an unnatural amino acid into proteins in mammalian cells, Nucleic Acids Res. 2002, 30, 4692-4699.
30. T. Manabe, T. Ohtsuki, M. Sisido, Design and synthesis of orthogonal tRNAs of nonstandard structures as carriers of nonnatural amino acids in E. coli in vitro protein synthesizing system, in preparation.
31. H. Nakata, T. Ohtsuki, M. Sisido, in preparation.
32. T. Hohsaka, D. Kajihara, Y. Ashizuka, H. Murakami, M. Sisido, Efficient incorporation of nonnatural amino acids with large aromatic groups into streptavidin in in vitro protein synthesizing systems, J. Am. Chem. Soc. 1999, 121, 34-40.
33. J.R. Roesser, C. Xu, R.C. Payne, C.K. Surratt, S.M. Hecht, Preparation of misacylated aminoacyl-tRNAPhes useful as probes of the ribosomal acceptor site, Biochemistry 1989, 28, 5185-5195.
34. J.D. Bain, E.S. Diala, C.G. Glabe, D.A. Wacker, M.H. Lyttle, T.A. Dix, A.R. Chamberlin, Site-specific incorporation of nonnatural residues during in vitro protein biosynthesis with semi-synthetic aminoacyl-tRNAs, Biochemistry 1991, 30, 5411-5421.
35. T. Hohsaka, K. Sato, M. Sisido, K. Takai, S. Yokoyama, Adaptability of nonnatural aromatic amino acids to the active center of E. coli ribosomal A site, FEBS Lett. 1993, 335, 47-50.
36. S.R. Starck, X. Qi, B.N. Olsen, R.W. Roberts, The puromycin route to assess stereo- and regiochemical constraints on peptide bond formation in eukaryotic ribosomes, J. Am. Chem. Soc. 2003, 125, 8090-8091.
37. L.M. Dedkova, N.E. Fahmi, S.Y. Golovine, S.M. Hecht, Enhanced D-amino acid incorporation into protein by modified ribosomes, J. Am. Chem. Soc. 2003, 125, 6616-6617.
38. C.J. Noren, S.J. Anthony-Cahill, M.C. Griffith, P.G. Schultz, A general method for site-specific incorporation of unnatural amino acids into proteins, Science 1989, 244, 182-188.
39. J.D. Bain, C.G. Glabe, T.A. Dix, A.R. Chamberlin, E.S. Diala, Biosynthetic site-specific incorporation of a non-natural amino acid into a polypeptide, J. Am. Chem. Soc. 1989, 111, 8013-8014.
40. M.W. Nowak, P.C. Kearney, J.R. Sampson, M.E. Saks, C.G. Labarca, S.K. Silverman, W. Zhong, J. Thorson, J.N. Abelson, N. Davidson, P.G. Schultz, D.A. Dougherty, Nicotinic receptor binding site probed with unnatural amino acid incorporation in intact cells, Science 1995, 268, 439-442.
41. D.A. Dougherty, Unnatural amino acids as probes of protein structure and function, Curr. Opin. Chem. Biol. 2000, 4, 645-652.
42. C. Köhrer, L. Xie, S. Kellerer, U. Varshney, U.L. RajBhandary, Import of amber and ochre suppressor tRNAs into mammalian cells: a general approach to site-specific insertion of amino acid analogues into proteins, Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 14310-14315.
43. C. Köhrer, J.-H. Yoo, M. Bennett, J. Schack, U.L. RajBhandary, A possible approach to site-specific insertion of two different unnatural amino acids into proteins in mammalian cells via nonsense suppression, Chem. Biol. 2003, 10, 1095-1102.
44. S.L. Monahan, H.A. Lester, D.A. Dougherty, Site-specific incorporation of unnatural amino acids into receptors expressed in mammalian cells, Chem. Biol. 2003, 10, 573-580.
45. T. Hohsaka, Y. Ashizuka, H. Murakami, M. Sisido, Incorporation of nonnatural amino acids into streptavidin through in vitro frame-shift suppression, J. Am. Chem. Soc. 1996, 118, 9778-9779.
46. T. Hohsaka, Y. Ashizuka, H. Taira, H. Murakami, M. Sisido, Incorporation of nonnatural amino acids into proteins by using various four-base codons in an Escherichia coli in vitro translation system, Biochemistry 2001, 40, 11060-11064.
47. T. Hohsaka, Y. Ashizuka, H. Sasaki, H. Murakami, M. Sisido, Incorporation of two different nonnatural amino acids independently into a single protein through extension of the genetic code, J. Am. Chem. Soc. 1999, 121, 12194-12195.
48. M. Taki, T. Hohsaka, H. Murakami, K. Taira, M. Sisido, Position-specific incorporation of a fluorophore-quencher pair into a single streptavidin through orthogonal four-base codon/anticodon pairs, J. Am. Chem. Soc. 2002, 124, 14586-14589.
49. H. Taira, M. Fukushima, T. Hohsaka, M. Sisido, Four-base codon-mediated incorporation of nonnatural amino acids into proteins in a eukaryotic cell-free translation system, J. Biosci. Bioeng. 2005, 99, 473-476.
50. R.D. Anderson, J. Zhou, S.M. Hecht, Fluorescence resonance energy transfer between unnatural amino acids in a structurally modified dihydrofolate reductase, J. Am. Chem. Soc. 2002, 124, 9674-9675.
51. S.W. Santoro, J.C. Anderson, V. Lakshman, P.G. Schultz, An archaebacteria-derived glutamyl-tRNA synthetase and tRNA pair for unnatural amino acid mutagenesis of proteins in Escherichia coli, Nucleic Acids Res. 2003, 31, 6700-6709.
52. M.M. Yusupov, G.Z. Yusupova, A. Baucom, K. Lieberman, T.N. Earnest, J.H.D. Cate, H.F. Noller, Crystal structure of the ribosome at 5.5 Å resolution, Science 2001, 292, 883-896.
53. T. Hohsaka, Y. Ashizuka, H. Murakami, M. Sisido, Five-base codons for incorporation of nonnatural amino acids into proteins, Nucleic Acids Res. 2001, 29, 3646-3651.
54. B. Moore, C.C. Nelson, B.C. Persson, R.F. Gesteland, J.F. Atkins, Decoding of tandem quadruplets by adjacent tRNAs with eight-base anticodon loops, Nucleic Acids Res. 2000, 28, 3615-3624.
55. J.C. Anderson, N. Wu, S.W. Santoro, V. Lakshman, D.S. King, P.G. Schultz, An expanded genetic code with a functional quadruplet codon, Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 7566-7571.
56. C. Switzer, S.E. Moroney, S.A. Benner, Enzymatic incorporation of a new base pair into DNA and RNA, J. Am. Chem. Soc. 1989, 111, 8322-8323.
57. J.D. Bain, C. Switzer, A.R. Chamberlin, S.A. Benner, Ribosome-mediated incorporation of a non-standard amino acid into a peptide through expansion of the genetic code, Nature 1992, 356, 537-539.
58. I. Hirao, T. Ohtsuki, T. Fujiwara, T. Mitsui, T. Yokogawa, T. Okuni, H. Nakayama, K. Takio, T. Yabuki, T. Kigawa, K. Kodama, T. Yokogawa, K. Nishikawa, S. Yokoyama, An unnatural base pair for incorporating amino acid analogs into proteins, Nat. Biotechnol. 2002, 20, 177-182.
59. T. Ohtsuki, M. Kimoto, M. Ishikawa, T. Mitsui, I. Hirao, S. Yokoyama, Unnatural base pairs for specific transcription, Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 4922-4925.
60. I. Hirao, Y. Harada, M. Kimoto, T. Mitsui, T. Fujiwara, S. Yokoyama, A two-unnatural-base-pair system toward the expansion of the genetic code, J. Am. Chem. Soc. 2004, 126, 13298-13305.
61. Y. Wu, A.K. Ogawa, M. Berger, P.G. Schultz, Efforts toward expansion of the genetic alphabet: optimization of interbase hydrophobic interactions, J. Am. Chem. Soc. 2000, 122, 7621-7632.
62. H. Liu, L. Wang, A. Brock, C.-H. Wong, P.G. Schultz, A method for the generation of glycoprotein mimetics, J. Am. Chem. Soc. 2003, 125, 1702-1703.
63. S.V. Mamaev, A.L. Laikhter, T. Arslan, S.M. Hecht, Firefly luciferase: alteration of the color of emitted light resulting from substitutions at position 286, J. Am. Chem. Soc. 1996, 118, 7243-7244.
64. S. Manabe, K. Sakamoto, Y. Nakahara, M. Sisido, T. Hohsaka, Y. Ito, Preparation of glycosylated amino acid derivatives for glycoprotein synthesis by in vitro translation system, Bioorg. Med. Chem. 2002, 10, 573-581.
65. D.M. Rothman, E.J. Peterson, M.E. Vazquez, G.S. Brandt, D.A. Dougherty, B. Imperiali, Caged phosphoproteins, J. Am. Chem. Soc. 2005, 127, 846-847.
66. H. Murakami, T. Hohsaka, Y. Ashizuka, K. Hashimoto, M. Sisido, Site-directed incorporation of fluorescent nonnatural amino acids into streptavidin for highly sensitive detection of biotin, Biomacromolecules 2000, 1, 118-125.
67. T. Hohsaka, N. Muranaka, C. Komiyama, K. Matsui, S. Takaura, R. Abe, H. Murakami, M. Sisido, Position-specific incorporation of dansylated nonnatural amino acids into streptavidin by using a four-base codon, FEBS Lett. 2004, 560, 173-177.
68. H. Hamada, N. Kameshima, A. Szymanska, K. Wegner, L. Łankiewicz, H. Shinohara, M. Taki, M. Sisido, Position-specific incorporation of a highly photodurable and blue-laser excitable fluorescent amino acid into proteins for fluorescence sensing, Bioorg. Med. Chem. 2005, 13, 3379-3384.
69. V.W. Cornish, D.R. Benson, C.A. Altenbach, K. Hideg, W.L. Hubbell, P.G. Schultz, Site-specific incorporation of biophysical probes into proteins, Proc. Natl. Acad. Sci. U. S. A. 1994, 91, 2910-2914.
70. G. Turcatti, K. Nemeth, M.D. Edgerton, U. Meseth, F. Talabot, M. Peitsch, J. Knowles, H. Vogel, A. Chollet, Probing the structure and function of the tachykinin neurokinin-2 receptor through biosynthetic incorporation of fluorescent amino acids at specific sites, J. Biol. Chem. 1996, 271, 19991-19998.
71. L.E. Steward, C.S. Collins, M.A. Gilmore, J.E. Carlson, J.B. Alexander Ross, A.R. Chamberlin, In vitro site-specific incorporation of fluorescent probes into β-galactosidase, J. Am. Chem. Soc. 1997, 119, 6-11.
72. C.F.W. Becker, C.L. Hunter, R.P. Seidel, S.B.H. Kent, R.S. Goody, M.A. Engelhard, A sensitive fluorescence monitor for the detection of activated Ras: total chemical synthesis of site-specifically labeled Ras binding domain of c-Raf1 immobilized on a surface, Chem. Biol. 2001, 8, 243-252.
73. B.E. Cohen, T.B. McAnaney, E.S. Park, Y.N. Jan, S.G. Boxer, L.Y. Jan, Probing protein electrostatics with a synthetic fluorescent amino acid, Science 2002, 296, 1700-1703.
74. M. Kuragaki, M. Sisido, Long-distance singlet energy transfer along α-helical polypeptide chains, J. Phys. Chem. 1996, 100, 16019-16025.
PART III
Discovering Small Molecule Probes for Biological
Mechanisms
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
ISBN: 978-3-527-31150-7
Chemical Biology
6
Forward Chemical Genetics
Stephen J. Haggarty and Stuart L. Schreiber
Outlook
This chapter will review important historical and conceptual developments in
the use of chemical genetics to discover small-molecule probes of biological
mechanisms. The main focus will be on the notion of using “forward” chemical
genetics (phenotype-based discovery of biologically active small molecules) to
dissect the functions of genes. We will provide a comparison of this approach
to its classical genetic counterpart and to “reverse” chemical genetics (gene
product-based discovery of small molecules). We will summarize recent
technical advances that facilitate the discovery process - most notably the
use of high-throughput, phenotypic assays that measure cell-state changes on
the basis of the recognition of epitopes by antibodies, messenger ribonucleic
acid (mRNA) expression levels, and fluorescence imaging of individual cells and
populations of cells. As practical examples of the application of forward
chemical genetics, we will discuss the ongoing development of a
“molecular tool box” for the study of the cell cycle and chromatin remodeling,
which has both basic- and clinical-research applications. Besides these specific
examples, and by way of an analogy to the creation of genetic maps using
classical genetics, we will generalize the notion of using an individual
chemical-genetic screen to find an active compound to the systematic use of chemical
genetics to map “chemical space” using phenotypic descriptors. Lastly, we will
discuss possible future developments in the field of chemical genetics.
6.1
Introduction
It is sometimes thought that the Neurospora work was responsible for the “one gene-one
enzyme” hypothesis - the concept that genes in general have single primary functions, aside
from serving an essential role in their own replication, and that in many cases this function
is to direct specificities of enzymatically active proteins. The fact is that it was the other way
around - the hypothesis was clearly responsible for the new approach.
George Wells Beadle
Nobel prize in medicine or physiology, 1958
Since the time of Gregor Mendel (1822-1884) and the discovery of “heritable
factors” [1], which are now referred to as genes, classical genetics, and more
recently molecular genetics, have become the dominant experimental paradigm
for understanding biological systems [2]. An attractive feature of the genetic
approach is its adherence to the logic that to understand a system you
should perturb it and observe the consequences. Another important feature
is its generality: genetics provides an experimental approach that is
applicable to the dissection of almost all biological systems, provided that the
systems can reproduce and heritable mutations in genes can be made.
Despite the successes of classical genetics and knowledge of the complete
sequence of deoxyribonucleic acid (DNA) that comprises the human genome
[3], the functions of the majority of genes and other regulatory elements
within the genome remain as enigmatic as they were at the time of Mendel.
In fact, many recent studies analyzing what constitutes a
“gene”, as well as studies on the regulatory roles of ribonucleic acids (RNA),
challenge many of the tenets of the central dogma (DNA-to-RNA-to-protein).
Moreover, while knowledge of the complete human-genome sequence provides
a foundation for understanding disease biology, even for the majority of cases
of single-gene Mendelian disorders (e.g., Huntington’s disease, cystic fibrosis),
knowledge of the genetic variation that causes the diseases is only the first step
toward an understanding of the disease pathogenesis and the development of
therapeutic treatments. Furthermore, it is now widely recognized that many
common human diseases, including cancer, schizophrenia, and diabetes, have
a strong genetic component, but the heritability of these diseases is
complex in terms of the number of alleles (variants of genes) that contribute
to the final outcome and susceptibility. As a result of these challenges, there
exist only a handful of medical treatments based on an understanding of the
molecular etiology of a particular disease, and very few treatments that take
into account an individual’s genetic history. Therefore, there exists a great need
to expand the “molecular toolkit” available to both research scientists and
clinicians - the field of chemical biology is well poised to contribute toward
this task.
As stated above, George W. Beadle, in his acceptance speech for the Nobel
prize in medicine or physiology in 1958 (shared with Edward L. Tatum “for
their discovery that genes act by regulating definite chemical events” using
the red bread mold Neurospora crassa, and with Joshua Lederberg “for his
discoveries concerning genetic recombination and the organization of the
genetic material of bacteria”), noted that the desire to test new hypotheses
in science can be the genesis of new approaches that are transformative to
the existing scientific paradigm - rather than the other way around. With
this notion in mind, and with the aim of deciphering the functions of the
human and other model genomes, chemical genetics provides an approach
both to discover and to dissect the functions of gene products encoded within a
genome using biologically active small molecules (Fig. 6-1) [4-11]. By directly
targeting gene products, mostly proteins, rather than by mutating
an organism’s genetic material, this approach differs from classical genetics.
However, as discussed in this chapter and elsewhere in this book, the overall
logic of chemical genetics and many of the principles of the approach are
similar to those of classical genetics. Given the temporal control offered by small
molecules, and the ability to use combinations of small-molecule modulators,
chemical genetics promises to complement the use of pure genetic analysis
to study a wide range of biological systems and mechanisms. In this regard,
it is possible that many of the hypotheses that can be tested using chemical
genetics will ultimately play a transformative role in the coming years, much
like Beadle and Tatum’s efforts over a half-century ago.
Fig. 6-1 Classical genetics versus chemical genetics. Chemical
genetics aims to target gene products using small molecules
rather than to target the genes themselves by mutating an
organism’s genetic material.

To be effective as probes of biological mechanisms, and to function as
therapeutic agents in the clinical setting, small molecules must modulate
biological states by perturbing cellular networks through interactions with
macromolecules. The challenge of doing this effectively is
highlighted by emerging models from genome- and proteome-wide interaction
studies [11-15]. These models have revealed the highly interconnected nature
of the underlying networks of biochemical and genetic interactions, in which
the nodes are proteins or genes and the edges represent a physical or genetic
interaction. Here, the observation that biological systems are robust to random
perturbations but are highly susceptible to the targeted perturbation of highly
connected nodes means that not all gene products involved in a particular
cellular process have equal importance in terms of the fidelity or robustness
of the process [11, 14]. As such, contrary to the original tenets of Beadle
and Tatum’s “one gene-one enzyme” hypothesis, many gene products are not
enzymes and many gene products have multiple functions, some of which
are redundant in that their absence can be compensated for by other
gene products. Thus, because of the connectivity of biological networks, while
targeting a highly connected node may produce a desired phenotype, doing
so may also result in untoward effects due to modulation of functionally
connected nodes that are neither directly relevant to nor needed for the
desired phenotypic outcome. The development of experimental methods to
uncover and modulate selectively the functions of individual nodes (mostly
representing proteins) in such networks is the central aim of functional
genomics in general, and of chemical genetics in particular [4-11].
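The contrast between random and targeted perturbation can be illustrated
with a minimal sketch on a toy network. The 18-node topology below (three
hubs, each with five peripheral partners) is invented purely for
illustration and does not come from the interaction studies cited above;
the function names are likewise hypothetical. The sketch removes one node
at a time and reports the size of the largest connected component that
remains.

```python
from collections import deque


def build_toy_network():
    """Return an adjacency map for a hypothetical 18-node interaction network.

    Three hubs (A, B, C) are chained A-B-C, and each hub has five
    peripheral partners. The topology is invented for illustration only.
    """
    edges = [("A", "B"), ("B", "C")]
    for hub in "ABC":
        for i in range(5):
            edges.append((hub, f"{hub}{i}"))
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def largest_component(adjacency, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


network = build_toy_network()
# Targeted perturbation: delete the single most highly connected node.
hub = max(network, key=lambda n: len(network[n]))
targeted = largest_component(network, {hub})
# Random perturbation: average over every possible single-node deletion.
random_mean = sum(largest_component(network, {n}) for n in network) / len(network)
print(hub, targeted, round(random_mean, 2))
```

In this toy case, deleting the central hub leaves a largest component of
6 nodes, whereas deleting a node chosen at random leaves about 16 of the 17
remaining nodes connected on average. Real interaction networks are far
larger and their topology is only partially known, so the numbers carry no
biological meaning beyond the qualitative contrast.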
6.2
History/Development
Throughout history, small molecules have played an important role in many
basic discoveries in science and have provided medicinally useful agents for the
treatment of disease for millennia. Although it is difficult to define precisely what
constitutes a “small molecule”, as compared to other molecules in general,
it is instructive to examine examples (Fig. 6-2). In general, small molecules
are composed of stable arrangements of the atoms carbon, hydrogen, oxygen,
nitrogen, sulfur, and phosphorus - the same constituents as those of the amino acids,
nucleic acids (DNA and RNA), carbohydrates, lipids, and other chemicals
that form the macromolecular building blocks of life itself. Unlike the
macromolecules DNA, RNA, and protein, small molecules
are generally of lower molecular weight and are usually not composed of
polymeric, repeating subunits.
A few important examples of small molecules include (Fig. 6-2): penicillin
(1), an antibiotic discovered by Alexander Fleming; thiamine (vitamin B1) (2),
used by George W. Beadle and Edward L. Tatum to rescue auxotrophic mutants
of N. crassa; geldanamycin (3), a natural product that targets HSP90, resulting
in aberrant protein folding and suppression of oncogenic mutations that occur
in certain cancers; dopamine (4), an important excitatory neurotransmitter
that mediates many aspects of human behavior and cognition; haloperidol
(5), an antipsychotic used to treat schizophrenia that targets a family of
neurotransmitter receptors, including the dopamine D2 receptor; colchicine
(6), first used by the Egyptians over 35 centuries ago for the treatment
of what is now recognized as cancer, and later used to discover tubulin,
a major component of the cytoskeleton; rapamycin (7), a natural product
with anticancer properties first isolated from the bacterium Streptomyces and
later used to discover mammalian FKBP12-rapamycin-associated protein
(FRAP)/mammalian target of rapamycin (mTOR); latrunculin (8), a natural
product isolated from a marine sponge that causes destabilization of the
actin cytoskeleton; and caffeine (9), a naturally occurring methylxanthine
found in coffee and tea, which has several cellular actions, including the
inhibition of cyclic nucleotide phosphodiesterases. Indeed, many aspects of
biological research - from using antibiotics (e.g., ampicillin) to select for
the transformation of Escherichia coli with a recombinant DNA plasmid, to
the vitamin constituents (e.g., vitamin B6) of the basic culture media used
to culture mammalian cells, to the inhibition of proteases (e.g., leupeptin)
and phosphatases (e.g., pervanadate) during biochemical purification of
proteins - rely on the use of small molecules. Besides these routine uses
in biology, biologically active small molecules are widely used as imaging
reagents in basic research and clinical diagnosis (e.g., fluorodeoxyglucose
positron-emission tomography (FDG-PET)). They play essential roles in
newly developed technologies such as somatic cell nuclear transfer (e.g.,
A23187, a calcium ionophore), and many small molecules are produced in
mammalian cells using endogenous metabolic pathways (e.g., the opiate
analgesic morphine).

Fig. 6-2 Examples of biologically active small molecules whose structural
complexity, protein targets, and consequent observable phenotypes are
different. (1) Penicillin G, an antibiotic; (2) thiamine (vitamin B1), a
metabolite that is an enzyme cofactor; (3) geldanamycin, an inhibitor of
heat-shock protein 90 (HSP90); (4) dopamine, a neurotransmitter; (5)
haloperidol, a central nervous system depressant and sedative; (6) colchicine,
an inhibitor of mitosis that causes microtubule destabilization; (7) rapamycin,
an anticancer agent that inhibits TOR proteins when complexed to FKBP12;
(8) latrunculin B, a destabilizer of actin microfilaments; (9) caffeine, a
central nervous system stimulant that targets proteins including cyclic
nucleotide phosphodiesterases.
By using small-molecule libraries in appropriate cell-based assays, the
functions of a growing number of novel gene products and biologically active
small molecules from both natural sources and laboratory syntheses have
been discovered (Table 6-1). Many of these small molecules cause a loss
of function of their cognate targets, including kinases and phosphatases,
deacetylases and acetyltransferases, membrane receptors, proteases, isoprenyl
transferases, and polymerases, and to a lesser extent, small molecules that
cause a gain of function have also been discovered or invented.
An important example of using chemical genetics to characterize a signaling
pathway from the cell membrane to the nucleus is the discovery
of the common targets of the immunosuppressant drugs cyclosporine A
(CsA) and FK506 (reviewed in Refs 16, 17). Prior to this discovery, CsA
was known to inhibit the production of IL-2, a T-cell-derived cytokine that
mediates the immune response leading to rejection of transplanted organs
in humans, although the mechanism of action was unknown. Scientists
looking to discover new immunosuppressants first isolated FK506 from the
fermentation broth of Streptomyces tsukubaensis after discovering that an extract
of this organism could also block IL-2 secretion [18]. Since FK506 was a potent
immunosuppressant with activity at concentrations several hundredfold lower
than CsA, scientists became interested in identifying the cellular receptors or
targets of both CsA and FK506, leading first to the recognition that they
target separate “immunophilins”, cyclophilin and FK506 binding protein-12
(FKBP12) [19]. Further investigation led to the recognition that the complexes
cyclophilin-CsA and FKBP12-FK506 competitively bind and inhibit the Ca2+-
and calmodulin-dependent phosphatase calcineurin [20]. Collectively, these
studies revealed a previously unknown family of evolutionarily conserved gene
products (the immunophilins), revealed a biological function of calcineurin,
identified and characterized new biologically active small molecules, provided
an important example of using synthetic chemistry to manipulate an important
class of small molecules to identify their cellular targets using affinity
chromatography, and expanded the repertoire and medical understanding
of immunosuppressant drugs. Since the time of these discoveries, calcineurin
has been recognized as an important mediator of the T-cell signal transduction
pathway, regulating transcription factors such as the nuclear factor of activated
T cells (NF-AT), which are involved in the expression of a number of important
genes involved in T-cell-receptor activation, including IL-2; calcineurin has also
been shown to be an important regulator of the nervous and cardiovascular
systems [21].
Table 6-1
Small molecule | Assay format | Key phenotype | Target

Cytoskeleton and cell division
Colchicine | Cultured cells | Perturbs mitosis | Tubulin
Taxol | Cultured cells | Perturbs mitosis | Tubulin
Hesperadin | Cultured cells | Perturbs mitosis | Aurora kinases
Latrunculin | Cultured cells | Perturbs mitosis | Actin
Synstab A | Cultured cells | Perturbs mitosis | Tubulin
Depol-2b | Cultured cells | Perturbs mitosis | Tubulin
Y-27632 | Smooth muscle tissue | Inhibits smooth muscle contraction | p160ROCK
Wiskostatin | Xenopus extract | Inhibits actin polymerization | N-WASP
Monastrol | Cultured cells | Induces monopolar spindles | Mitotic kinesin Eg5
Diminutol | Xenopus extract | Induces a small mitotic spindle | NADP-dependent quinone oxidoreductase
Tubacin | Cultured cells | Increases tubulin acetylation | Histone deacetylase 6
Myoseverin | Cultured cells | Depolymerizes microtubules | Tubulin
Isogranulatimide | Cultured cells | Bypasses DNA damage induced G2 checkpoint | Chk1
Suptopins | Cultured cells | Bypasses chromatid catenation induced G2 checkpoint | Unknown
Erastin | Cultured cells | Synthetic lethal with transforming oncogenes | Unknown
Macbecin II | Cultured cells | Synthetic lethal with RNAi of Tsc2 | Unknown
Dihydromotuporamine C | Cultured cells | Prevents cell invasion | Sphingolipid metabolism

Chromatin remodeling
Trapoxin B | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
Depudecin | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
Trichostatin A | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
ITSA1 | Cultured cells | Bypasses cell-cycle arrest by trichostatin A | Unknown

Protein synthesis, folding, trafficking, and secretion
Geldanamycin
Leptomycin B | Antiviral/antifungal | Inhibits nuclear export | Crm1
Multiple inhibitors | In vitro translation extract | Inhibition of translation initiation and elongation | RNA and varied
Multiple inhibitors | Cultured cells | Inhibit FOXO1a nuclear export | Varied
Brefeldin A | Antiviral/antifungal | Blocks ER-to-Golgi transport | Arf1
Exo1 | Cultured cells | Blocks ER-to-Golgi transport | Unknown
Exo2 | Cultured cells | Blocks ER-to-Golgi transport | Unknown
Multiple sulfonamides | Cultured cells | Block Golgi-to-cell-membrane transport | Unknown
Sortins | Cultured cells | Induce secretion | Unknown

Ubiquitin-proteasome pathway
Lactacystin | Cultured cells | Neurite induction and protease inhibition | Proteasome
Ubistatin | Xenopus extract | Inhibits ubiquitin-dependent proteolysis | Multiubiquitin chain

Signaling pathway
Cyclopamine | Cultured cells | Inhibits hedgehog signaling | Smoothened
Cyclosporin | Cultured cells | Inhibits T-cell signaling | Cyclophilin and calcineurin
FK506 | Cultured cells | Inhibits T-cell signaling | FKBP12 and calcineurin
Rapamycin | Cultured cells | Inhibits T-cell signaling | FKBP12 and TOR kinase
Fumagillin | Cultured cells | Inhibits endothelial cell proliferation | Methionine aminopeptidase
SMIR4 | Cultured cells | Suppresses rapamycin | Nir1p (Ybr077cp)
Purmorphamine | Cultured cells | Induces osteogenesis | Hedgehog signaling agonist
TWS119 | Cultured cells | Induces neurogenesis | Glycogen synthase kinase-3β
Cardiogenol | Cultured cells | Induces cardiomyogenesis | Unknown
Concentramide | Zebrafish embryos | Disrupts heart patterning | Unknown
GS4012 | Zebrafish embryos | Suppresses cardiac defect | Upregulates VEGF levels
6.3 General Considerations I 307
Although many important individual discoveries, like the role of calcineurin
in T-cell signaling, have been made using chemical genetics (Table 6-1), one
of the limiting factors in making such discoveries is the gap between the fields
of chemistry and biology. In an effort to bridge the differences between these
fields, a notable “cross-talk” article entitled “Toward a Pharmacological Genetics”
in the inaugural issue of the journal Chemistry & Biology in 1994 cited many of
the advantages of using small molecules to study biological systems and the
need for increased interaction among chemists and biologists [4]. Over a decade
later, many of the ideas discussed in this article continue to be favored topics of
discussion and provide challenges that the field of chemical biology as a whole
continues to face. Besides the development of high-throughput phenotypic
assays for screening large collections of small molecules, which has enabled
chemical-genetic approaches, and high-throughput binding and enzymatic
assays, which have enabled reverse chemical-genetic approaches, chemical
genetics has evolved to emulate classical genetics in a number of other ways:
(a) the development of high-throughput phenotypic assays compatible with
performing screens of large collections of chemicals, (b) the use of chemical-
genetic modifier (suppressor and enhancer) screens to reveal connections
between pathways and networks as well as epistatic relationships between
gene products, (c) the use of synthetic-lethal (and synthetic-viable) screening
to reveal redundant elements of pathways and networks, and (d) the creation of
“chemical-genetic maps” that position chemicals in a multidimensional space
formed from phenotypic or computed descriptors. It is the objective of this
chapter to discuss these topics and to provide examples in which the approach
of chemical genetics has been successful in discovering small-molecule probes
for biological mechanisms.
6.3
General Considerations
6.3.1
Small Molecules as a Means to Perturb Biological Systems Conditionally
Although chemical genetics is modeled after classical genetics, especially
with respect to the use of phenotype-based screening (the word phenotype is
derived from Greek phaino-, from phainein, meaning to show or be observable),
it differs from classical genetics in the use of small molecules, rather than
mutations, to perturb the function(s) of gene products [4-11]. Thus, chemical
genetics applies the principles and logic of genetics, but the analyses focus on
proteins rather than genes.
Several features of small molecules render them ideal for use with
complex biological systems and for complementing classical genetic analysis
and methods based on ribonucleic acid interference (RNAi). These
features include the ability to offer nearly instantaneous temporal control,
the ability to use combinations of small-molecule modulators, the ability to
disrupt protein-protein interactions, the ability to cause both gain and loss
of individual functions, and the ability to modulate individual functions of
multifunctional proteins.
Since small molecules can alter specifically the function of a gene product
from all copies of a gene (assuming there are no functional differences
between the alleles), a small molecule can be used analogously to an inducible
dominant or homozygous recessive mutation in diploid genetic systems that
possess two copies or alleles of each gene. This circumvents the difficulty of
generating these types of mutations in the case of mammalian systems. Also,
just as mutation sites can identify functionally relevant coding sequences of
genes, small molecules can identify functionally relevant amino acid residues
of proteins, on the basis of their mechanism of interaction.
Unlike most mutagenic methods, the use of small molecules will not
generally produce heritable alterations in genes. Since a small molecule can
generally be added and removed from an experiment at will, the perturbations
induced by small molecules are generally conditional and reversible. Large
numbers of small molecules, and not mutations, are required to perturb the
complete complement of cellular gene products. Determining which gene
product is altered in a genetic assay requires mapping of a mutation or
sequencing of a gene as opposed to identifying the protein(s) targeted by a
small molecule, that is, the “target identification problem” (see below).
Although the focus of the chemical genetics described in this chapter is
that of the screening of small organic molecules, other exogenously added
chemicals, such as DNA sequences that encode an amino acid or nucleic
acid polymer or other compositions of matter that may alter the state of a
biological system, are also of interest. In particular, the use of RNAi and related
phenomena now provides powerful reverse genetic approaches for functional
genomics [22, 23]. However, while RNAi can provide selectivity (assuming
that the probe is appropriately designed and validated for the system being
tested), RNAi probes must first be synthesized using the knowledge of gene
sequence, and their effects are limited to loss or reduction of function of gene
products. Furthermore, the inability of RNAi to selectively target individual
functions of proteins or to directly disrupt protein-protein interactions,
together with its extended temporal scale, limits the generality and
applicability of this strategy for modulating gene-product function.
Ultimately, however, the combination of different forms of perturbations
will be an important means of elucidating pathways and targets.
6.3.2
Forward and Reverse Chemical Genetics
Overall, the use of genetic approaches can be subdivided into “forward
genetics”, which involves the use of phenotype-based screening, and “reverse
genetics”, which involves studying the phenotypic consequences of mutations
in a known gene (Table 6-2). The use of forward genetics entails determining
the phenotypic consequences of mutations in genes and identifying the
gene product that produces a heritable phenotypic change when mutated.
By starting with a phenotype of interest and working toward an altered
gene sequence, the forward genetic approach allows the ordering of gene
products into functional pathways and the analysis of the interactions between
other gene products and pathways (epistasis). Although initially developed
for the study of how genes control inheritance by establishing a connection
between changes in genotype and changes in phenotype, a forward genetic
approach allows the identification of novel gene products involved in almost
any biological process of interest. Since the pioneering work of Mendel, a
number of genetically tractable model organisms have become widely used,
including: Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode
worms), Saccharomyces cerevisiae (budding yeast), Arabidopsis thaliana (plant),
and even complex vertebrates such as Danio rerio (zebrafish) and Mus musculus
(mice) [24-28]. Each of these provides a number of strengths and weaknesses
for elucidation of genotype-phenotype relationships.
Like its genetic counterpart, “forward” chemical genetics relies on a
phenotype of interest to guide the selection of biologically active small
molecules that modulate a particular biological system or mechanism
(Fig. 6-3) [5-7]. Overall, this approach entails a three-step process that
begins with the development of a phenotypic assay to measure a biological
property or mechanism of interest, followed by the screening of small-molecule
libraries for compounds that induce a change in the desired phenotypic
property or mechanism. After identifying active compounds, the third,
and often most challenging, step involves the identification of interacting
protein targets and genetic pathways. Thus, by starting with a phenotype
of interest and working toward identifying the protein whose function is
altered (rather than an altered gene sequence), the forward chemical-genetic
approach still allows the ordering of gene products into functional pathways
and the analysis of the interactions between other gene products and
pathways (epistasis). In addition to identifying functions of gene products,
by using phenotypic variation as a means to study biologically active small
molecules, the forward chemical-genetic approach allows the ordering of
biologically active small molecules into functional pathways irrespective of
knowledge of their targets and mechanism of action. By analogy to the
study of “genotype-phenotype” relations, these efforts contribute toward
an understanding of “chemotype-phenotype” relations, which includes
quantitative structure-activity relationship (QSAR) modeling, which attempts
to explain the chemical properties of small molecules that produce molecular
recognition events that lead to specific phenotypes. As discussed below, a
greater understanding of the relationship between chemotype and phenotype
may come about through efforts similar to those involved in the mapping of
genetic mutations.

Table 6-2
Forward genetics (from phenotype to gene/protein): columns 1 and 2
Reverse genetics (from gene/protein to phenotype): columns 3 and 4
Classical genetic approach | Chemical-genetic approach | Classical genetic approach | Chemical-genetic approach
Random mutagenesis (e.g., irradiating cells) | Add library of small molecules to a biological system (extracts, cells, whole organisms) | Mutate single gene of interest in cells or whole organisms (e.g., knockout mouse) | Use a purified protein to screen a collection of small molecules for binders or modulators of function
Select mutants with the phenotype of interest | Select small molecules that produce the phenotype of interest | Generate cells or animals with mutant gene | Add the molecules that bind to the protein of interest to cells or whole organisms
Identify the mutated genes by mapping and sequencing | Identify the protein(s) and genetic pathways with which the small molecules interact | Observe phenotype(s) |

Fig. 6-3 Forward versus reverse chemical genetics. While forward chemical genetics
relies on a phenotype of interest to guide the selection of biologically active
small molecules, reverse chemical genetics uses a protein of interest to identify
small molecules that can be used to probe the function of the selected protein.
Both approaches require the use of small molecules and phenotypic assays but
differ in the starting points of discovery.
Fig. 6-4 Phenotypic assays for chemical genetics. (a) Types of assays that have
been used for chemical-genetic screening. (b) Example of a cell-based assay
involving phospho-specific antibody-based determination of a cell state [31]. A
cytoblot involves growing cells on the bottom of a well, fixing the cells and
probing the cells for the presence of a particular antigen using a specific
primary antibody in solution. A secondary antibody covalently linked to
horseradish peroxidase is added and the presence of the entire complex is
detected through the chemiluminescent reaction caused by addition of luminol and
hydrogen peroxide.
be low such that methods of analysis can readily identify which molecules
are active. Ideally, instead of using visual observations or considering a binary
descriptor of “0” or “1”, the assay being used is quantitative in nature in terms
of providing a continuous valued measure of activity that can be recorded
electronically using plate readers designed to measure changes in absorbance,
fluorescence, and luminescence.
High-throughput (10 000-200 000 compounds per day) phenotypic assays
involving the measurement of changes in calcium levels or second messengers,
like cyclic adenosine monophosphate (cAMP), in cultured cells have been
possible using “fluorescence imaging plate readers” (FLIPRs) for many years.
However, almost exclusively, these assays have been performed in the context
of the development of drugs directly targeting specific cell surface receptors,
including the large family of G-protein coupled receptors (GPCRs), whose
expression has been engineered to occur in a particular cell line that is readily
amenable to high-throughput screening. While these assays have produced
many biologically active small molecules that work as either receptor agonists
or antagonists, some of which are therapeutically used drugs, the focused
nature of the screens means that they have not been used to purposefully
target the full diversity of possible biological mechanisms.
Another assay type that has been widely used is that of using a “reporter
gene”, which acts as an easy-to-measure surrogate for a gene product of
interest. Such reporter genes contain one or more specific gene regulatory
elements that often bind transcription factors whose function is directly
linked to a pathway of interest (e.g., cAMP response element binding
(CREB) protein), the reporter gene sequence itself (e.g., luciferase or
β-galactosidase), and other sequences required for the formation of functional
mRNA. Once the reporter construct is introduced into the cells, a direct
assay of the reporter protein’s enzymatic activity provides a means to monitor
the upstream signaling pathways, as well as other factors affecting mRNA
stability and protein turnover. Through the use of gene expression-based high-
throughput screening (GE-HTS), in which a gene-expression signature is used
as a surrogate for cellular states, it is now possible to multiplex the number of
reporters that are used, although the concept of coupling phenotypic changes
in response to small molecules interacting with protein to changes in mRNA is
the same [32]. Once a signature consisting of a small set of genes is obtained,
this approach provides a general method of screening applicable to many
cell types and biological mechanisms. By not having to introduce a reporter
gene construct and instead relying on expression of genes from endogenous
promoters and read-outs based on hybridization of specific transcripts, these
assays have the advantage of examining gene expression under the influence
of its natural chromatin and chromosomal context. In the limit of using a
full genome’s level of mRNA expression patterns as a phenotype, even with
the coexpression patterns of many genes, this approach to forward chemical-
genetic screening provides a truly high information content read-out of cell
states [33]. However, since mRNA levels are not always directly related to
protein levels and they cannot reflect directly the posttranslational state or
localization of proteins in cells, there has been much effort put forth to develop
assays that can measure additional biological mechanisms.
One common mechanism of biological regulation that cannot be measured
directly by a reporter gene or FLIPR assays involves the reversible, covalent
modification of proteins. Many posttranslational modifications, including
protein glycosylation, methylation, lipidation, isoprenylation, ubiquitination,
phosphorylation, and acetylation, have been found to be integral components of
the signal transduction mechanisms operating to transfer information in and
between cells. By rapidly and reversibly altering the chemical properties of gene
products in a manner dependent on and capable of influencing subcellular
localization and the interaction with other protein partners, such intracellular
chemistry provides a means to both observe and modulate biological systems.
To assess the intracellular pathways regulating posttranslational modifications
using forward chemical genetics, a number of assays have been developed
that allow screening of small-molecule libraries for modulators of such
modifications. One nonradioactive format, called the cytoblot, is capable of
detecting posttranslational events using an appropriate antibody (Fig. 6-4(b))
[31]. Because this assay, unlike a reporter gene assay, does not require the
engineering of the cellular system and instead analyzes the proteins that
cells produce in their endogenous context, without overexpression, this
format facilitates the assaying of transformed or primary cell lines that are
from different tissue types or from different genetic backgrounds.
Two of the emerging technological developments, which when combined
together promise to play an important role in forward chemical genetics, are
the use of optical imaging and automated microscopy [34]. Through the use of
appropriate fluorescent dyes, antibodies, and genetically encoded probes, such
as the green-fluorescent protein (GFP), these techniques allow the resolution
of individual cells and subcellular organelles within cultured cells in multiwell
plates (Fig. 6-5). The term “high-content” is often used to refer to the high
information content of these types of assays, which follows from their ability
to extract a variety of features from images. Thus, instead of considering
either a binary descriptor or a continuous valued measure of activity that is
produced from the entire content of a well, as is often obtained from using
visual inspection or a plate reader, these assays can quantify phenotypes in
individual cells, as well as provide a population average.

Fig. 6-5 Example of a high-content image-based screen for small molecules that
alter neural stem-cell differentiation. Unlike in homogeneous, plate-reader-based
assays, multiple cell types and phenotypes can be quantified from a single image
using image segmentation and computational analysis.
Since routine imaging allows the use of multiple (3-4) fluorophores with
different excitation and emission properties, ratiometric and multiplexed
measurements can be made. For example, by considering a binary measurement
of intensity alone, and not the morphology of cells, for three separate colors
(red, blue, green) there are a total of 2^3 = 8 possible ratiometric measurements
per well. Furthermore, besides overall intensity, image segmentation allows the
features of only a subset of objects in a well to be quantified separately from
others. As a result, complex mixtures of cell types can be assayed
simultaneously in a multiplexed assay that provides a more physiologically
relevant environment. Figure 6-5 shows an example of an image-based screen to
look for small molecules that modulate the differentiation of mouse neuronal
stem cells into the three principal cell types of the brain: astrocytes,
oligodendrocytes, and neurons. The following three examples highlight the
usefulness of image-based screening for chemical genetics.
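Before turning to those examples, the counting argument for three color
channels can be made concrete. The short Python sketch below is illustrative
only: the channel names and the function name binary_channel_states are
assumptions introduced here, not part of any screening software. It enumerates
every on/off combination of three channels, which is where the figure of
2^3 = 8 comes from.

```python
from itertools import product

# Assumed channel names, used only for illustration.
CHANNELS = ("red", "green", "blue")


def binary_channel_states(channels=CHANNELS):
    """Return every on/off (1/0) combination of the given channels."""
    return [
        dict(zip(channels, bits))
        for bits in product((0, 1), repeat=len(channels))
    ]


if __name__ == "__main__":
    states = binary_channel_states()
    for state in states:
        print(state)
    print(len(states))  # 8 combinations for three binary channels
```

Adding a fourth fluorophore doubles the count to 16, which illustrates how
quickly multiplexed read-outs grow even before cell morphology is considered.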
Example 1: Perlman and colleagues performed a fully automated, image-
based, centrosome-duplication assay that measured the size of centrosomes
in individual cells [35]. Using this assay, they performed a series of
chemical-genetic modifier screens (see below) looking for suppressors and
enhancers of hydroxyurea, a compound that was known to induce centrosome
duplication. Out of a collection of known biologically active compounds this
assay revealed that compounds targeting microtubules and protein synthesis
blocked centrosome duplication, while certain paralog-specific protein kinase C
inhibitors and retinoic acid receptor agonists increased it. Then using a library
of uncharacterized small molecules, they were able to identify five novel
centrosome-duplication inhibitors that do not target microtubule dynamics or
protein synthesis.
Example 2: In a phenotypic screen for inhibitors of the secretory pathway
(endoplasmic reticulum - Golgi apparatus - cell membrane), Feng and
colleagues identified several structural classes of small molecules that perturb
membrane trafficking [36]. Through more in-depth analysis [37], one class of
sulfonamide-containing molecules was shown to inhibit the ATPase activity
of the vacuolar ATPase and others were shown to act by a mechanism distinct
from that of the natural product brefeldin A, which inhibits Arf1 GTPase by
stabilizing it in its inactive GDP-bound state.
Example 3: Using a visual, image-based phenotypic assay that measured
the subcellular localization of GFP-tagged FOXO1a, Kau and colleagues
screened for inhibitors of FOXO1a nuclear export in the absence of the PTEN
phosphatase [38]. These studies led to the discovery
of general inhibitors of nucleocytoplasmic transport, which, like the natural
product leptomycin, directly inhibited the nuclear export factor CRM1. Besides
this class of compounds, a number of other compounds inhibiting PI3K/Akt
signaling were discovered, which included multiple antagonists of calmodulin
signaling and psammaplysene A [39], a natural product isolated from marine
extracts. Given the importance of the PI3K/PTEN/Akt signal transduction
pathway in a variety of cancers, and the ability of FOXO1a targeted to the
nucleus to reverse tumorigenicity of PTEN null cells, these small molecules
and their targets may provide a new generation of therapeutic agents.
6.3.4
Nonheritable and Combinations of Perturbations
One of the significant differences between chemical genetics and classical

genetics is that the possible perturbations are not limited to those that can
be made by making heritable changes in discrete factors, such as a gene. In
addition, unlike a genetic perturbation that needs to be recreated if one wants
to study a new organism or the mutation in a different genetic background,
many small molecules are active in multiple biological systems. In fact, if
a small molecule can be found to have a similar phenotype in a genetically
tractable organism, such as S. cerevisiae or C. elegans, then exploiting the
evolutionary conservation of biological systems provides a means to assist in
the identification of the targets of the small molecules.
Because it is easy to add different small molecules to
an experimental system, as compared to the difficulty of making extensive
double or other combinations of genetic mutants, it is possible to exploit the
combinatorics of possible perturbations to discover combinations of small
molecules or other perturbations that produce a desired phenotype [39]. For
example, if we consider a chemical library composed of N small molecules that
are to be tested at C concentrations, there are: C × N possible single treatments,
C × N (C × N - 1)/2 possible unique combinations, and (C × N)^2 possible
combinations (if the order of addition of the small molecules is relevant).
Thus, even for a small collection of compounds (N = 100) tested at three
concentrations (C = 3) there are 44 850 possible unique pairwise combinations
of treatments. However, the diversity of the resulting perturbations might be
less optimal for discovering new probes, as it would be expected that many
of the different combinations would be functionally similar. Alternatively,
instead of performing an “all against all” screen, it is possible to select specific
small molecules of interest and purposefully perform what is referred to as a
“chemical-genetic modifier” screen to look for suppressors and enhancers
of the phenotypic effect of the small molecule of interest (Fig. 6-6). In
classical genetics, suppressor and enhancer screens are used to identify
genes that, when mutated, suppress or enhance a previously identified
phenotype of interest. The advantage of such screening, as compared to
using a wild-type (WT) genetic background, is in the sensitization of the
pathway to further perturbation, rendering the mutations identified often more
relevant to the pathway of interest. In the end, like the synthesis of diverse
compounds via two-component coupling reactions, the sparse sampling
of a larger matrix of possible combinations via chemical-genetic modifier
screens may prove beneficial for identifying novel small-molecule probes of
biological mechanisms. Examples of chemical-genetic modifier screens that
have been performed include the identification of suppressors of (a) the histone
deacetylase inhibitor trichostatin A [40], (b) ICRF-193 [41], a topoisomerase II
inhibitor that causes a G1-checkpoint arrest, (c) rapamycin [42], an inhibitor
of TOR proteins, (d) FK506 and its effect on calcineurin’s regulation of
salt stress [43], and (e) hydroxyurea’s effect on centrosome duplication [35].
Suppressors and enhancers have also been identified for a variety of other
small molecules, including the motor protein kinesin-5 inhibitor monastrol,
the microtubule destabilizer nocodazole, the microtubule stabilizer taxol, the
actin destabilizer latrunculin, the protein translation inhibitor cycloheximide,
and the calmodulin inhibitor W7 (S.J.H. and S.L.S., unpublished data).

Fig. 6-6 Chemical-genetic modifier screens. (a) By putting cells in a defined
cell state, it is possible to identify small-molecule suppressors and enhancers.
(b) Examples of data collected from a screen for chemical-genetic modifiers
using a growth assay in budding yeast (data from Harvard University, MCB100
Experimental Biology course). Each row corresponds to a small molecule from a
chemical library and each column a different small-molecule modifier that puts
the yeast into a different cell state. The level of red and green is indicative
of the observed growth measured by optical density of wells. Certain compounds
allow the yeast to grow, whereas others prevent growth.
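The treatment counts quoted earlier in this section for a library of N
compounds at C concentrations can be checked with a few lines of Python. This
is a minimal sketch under stated assumptions: the function name
count_treatments and its dictionary keys are hypothetical, and the
order-dependent count is read as (C × N)^2, an interpretation that also counts
pairing a condition with itself.

```python
def count_treatments(n_compounds: int, n_concentrations: int) -> dict:
    """Count single and pairwise treatments for a compound library.

    A "condition" is one compound at one concentration, so there are
    C x N conditions in total.
    """
    conditions = n_compounds * n_concentrations
    return {
        # C x N single treatments
        "single": conditions,
        # C x N (C x N - 1) / 2 unordered pairs of distinct conditions
        "unordered_pairs": conditions * (conditions - 1) // 2,
        # (C x N)^2 ordered pairs (assumed reading; includes self-pairs)
        "ordered_pairs": conditions ** 2,
    }


if __name__ == "__main__":
    # N = 100 compounds at C = 3 concentrations, as in the text
    print(count_treatments(100, 3))
```

For N = 100 and C = 3 this gives 300 single treatments and 44 850 unordered
pairs, matching the figure in the text, which is why sparse sampling through
modifier screens is attractive compared with an “all against all” design.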
6.3.5
Multiparametric Considerations: Dose and Time
From first principles, other important considerations for determining the
phenotypic effect of small molecules are those of the concentration and the
length of treatment, which are collectively referred to as dosage effects. Not
unlike the challenges faced by geneticists who induce multiple different alleles
by mutagenesis and determine which mutations are hypomorphic (reduction
of function), hypermorphic (gain of function), or a complete null allele (no
function), chemical biologists studying small molecules that show different
phenotypes at different concentrations have to determine whether the molecule
is interacting with multiple protein targets with different thresholds of activity,
or with a single target that induces different phenotypes with different levels
of modulation. Depending on the resolution of the assay being used to screen
the small molecules and to assess their phenotypic effects, there may be
a threshold for the length of treatment with a small molecule, which can
also be affected by the concentration. For example, measuring the effects of
a small molecule on the progression of mammalian cells through the cell
cycle requires a few hours of treatment, but cellular processes such as the
synaptic vesicle cycle require only a few seconds. As discussed below, these
along with other parameters are beginning to be addressed upfront as part of
“multidimensional” screening efforts.
6.3.6
Sources of Phenotypic Variation: Genetic versus Chemical Diversity
In many ways, the ongoing development of improved collections of small
molecule perturbagens (SMPs) for forward chemical genetics is reminiscent
of the development of improved methods for mutagenesis in classical genetics.
Before it was realized that the genetic material was a molecule, early geneticists,
such as Thomas H. Morgan, who was awarded the Nobel Prize in Physiology
or Medicine in 1933 “for his discoveries concerning the role played by the
chromosome in heredity”, had to rely on spontaneous mutants as their
source of genetic variation, thus limiting the power of forward genetics. A
great leap forward was made in 1927 when Herman J. Muller, a student of
Thomas H. Morgan, discovered that heritable mutations in Drosophila could
be induced. For “the discovery of the production of mutations by means of
X-ray irradiation” Herman J. Muller was recognized in 1946 with the Nobel
Prize in Physiology or Medicine. This finding meant that for the first time
it was possible to access a wide swath of genetic variation and associated
diversity of phenotypes. With the advent of chemical mutagens, such as
ethylnitrosourea, capable of inducing point mutations (changes in single base
pairs), many different types of alleles could be induced, including both loss-
of-function and gain-of-function mutations. While the early practitioners of
genetics would likely have never anticipated such developments, the advent of
further improved methods for genome manipulation, including gene disruptions
due to insertion of transposable elements, gene trap vectors, and homologous
recombination, now allows a wide spectrum of genetic variation to be studied.
The serendipitous discovery of small molecules “spontaneously” produced
by natural sources, such as cultured bacteria and marine sponges, has been a
long-standing source of bioactive small molecules [44, 45]. Much as geneticists
learned to use X-rays and other agents to induce phenotypic variation, chemical
biologists are becoming increasingly adept at making small molecules that are
suitable for use in forward and reverse chemical-genetic studies [6, 46-49].
These methods include DNA template-mediated synthesis, target- and
diversity-oriented organic synthesis, peptide and carbohydrate synthesis, and
enzyme-mediated synthesis, the last of which enables in vitro evolution,
protein engineering, and even the incorporation of nonnatural amino acids
into polypeptides. The collective aim is to provide increasingly complex and
effective small-molecule modulators of biological processes by developing
efficient (three- to five-step) syntheses of collections of small molecules having
rich skeletal and stereochemical diversity. Such synthetic strategies are not
directed toward any one molecular target, as occurs in target-oriented synthesis;
instead, the efforts are ultimately aimed at being able to target all molecular
components of the networks regulating biological processes [6, 46].
An important conceptual development in chemical library synthesis has
been the recognition of the importance of not only creating diversity (so as to
increase the likelihood of finding an active small molecule) but also retaining
the potential to site- and stereoselectively attach appendages to the small
molecule during a postscreening optimization stage. Such chemical handles
not only facilitate the addition of functionalities that increase the potency
or selectivity of the small molecule but, equally as important, can also be
used to facilitate the identification of interacting target proteins and pathways
(see below). With access to such idealized collections of small molecules, the
challenge for the field of chemical biology includes: (a) determining which of
these molecules have specific effects on biological systems (at various levels of
resolution from proteins to whole organisms), (b) determining the structural
and physicochemical properties of molecules that specify associated biological
activities, and ultimately (c) directing future synthetic efforts along particular
pathways in the synthetic network to effectively produce small molecules that
modulate biological systems in any desired manner.
6.3.7
The “Target Identification” Problem
Like its classical genetic counterpart, an important aspect of forward chemical
genetics is the reliance on the ability of biological systems to reveal a set of
possible targets that, when perturbed, create a desired phenotype [4-7, 10].
However, reliance on phenotype alone to select active small molecules requires
that the exact nature of the molecular interactions that give rise to the phenotype
be further investigated, usually by lower-throughput methods. This situation
differs from efforts directed toward target validation through indirect means,
such as loss of function caused by gene targeting, overexpression, or reduction
in expression by RNAi. By considering the effects of small molecules on intact
biological networks as part of the initial discovery process, the logic of forward
chemical genetics is a reversal of the logic of most of the current efforts in drug
discovery. Current drug discovery often picks a specific molecular target based
on indirect means of target validation, and then optimizes the interactions of
small molecules with a network of main- and side-chain interactions from an
individual polypeptide in vitro or in silico. Since the eventual desire of the drug
discovery approach is to use the small molecule in the context of intact living
systems, the full spectrum of phenotypic effects is later explored only for a few
select compounds. As such, there exists a paucity of information about the
phenotypic effects of large collections of small molecules. Such information
would help enable the design of new probes and generations of small-molecule
therapeutics.
Besides the examples of the identification of the targets of the immuno-
suppressant compounds CsA and FK506 that are described above, there are
a growing number of successful examples of identifying the targets of small
molecules identified from forward chemical-genetic screens (Table 6-2) [50].
However, as was true for early geneticists who used random mutagenesis to
introduce genetic variation and then faced the challenge of identifying where
in the genome the mutation was, the most challenging aspect of forward
chemical genetics, and the rate-limiting step in the discovery cycle, involves
the identification of the target of the small-molecule perturbation. To be suc-
cessful in targeting the myriad possible gene products that might result in a
desired phenotypic effect, chemical genetics requires access to diverse small
molecules that incorporate structural features to assist in target identification
and resynthesis.
One method of target identification that requires the modification of the
small molecules, which was the approach taken to identify the cellular targets
of CsA and FK506, involves the fractionation of cellular extracts with an
affinity matrix covalently modified with the biologically active small molecules.
A classic example of this approach is the identification of the target of the
microbially derived cyclotetrapeptide trapoxin B (Fig. 6-7) [51]. Like trichostatin
A and butyrate [52], trapoxin B was known at the time to cause
both reversion of oncogene-transformed fibroblast cells and the
accumulation of acetylated histones [51]. However, unlike trichostatin A
and butyrate, trapoxin B was found to be an irreversible inhibitor of the
deacetylation of histones, and its cellular and in vitro activity were dependent
on the presence of the epoxide functionality [51]. Since trapoxin by itself was
not directly amenable to modification to facilitate target identification, using
a total of 20 steps from commercially available starting material, Taunton and
Fig. 6-7 Target identification of an inhibitor of histone deacetylation.
(a) Cap-linker-chelator model of HDAC inhibitors and structures of
trichostatin A and trapoxin B. (b) Histone acetyltransferase (HAT) activity
opposes that of HDAC activity. (c) Synthesis of the K-trap affinity matrix
that led to the identification by affinity chromatography of HDAC1 [53].
(d) Crystal structure of trichostatin A in an HDAC-like protein revealing
chelation by the hydroxamate of a metal atom important to the hydrolytic
activity of the enzyme [55].
colleagues replaced one of the amino acid moieties (phenylalanine) in the
cyclic ring with a lysine group to afford a modified trapoxin B, named K-trap,
which could be directly attached to a solid support (Affi-Gel 10) [53]. After
first using subcellular fractionation and anion exchange chromatography to
reduce the complexity of the proteome of human cells, the K-trap affinity
matrix isolated two nuclear proteins that copurified with histone deacetylase
activity [54]. Using peptide microsequencing, a complementary DNA (cDNA)
encoding the histone deacetylase catalytic subunit 1 (HDAC1) was identified,
which showed sequence similarity to Rpd3p, a known transcriptional regulator
in yeast [54].
Since the discovery of HDAC1, the family of HDAC-related enzymes has
grown to include a total of 11 paralogs, and is now the subject of both research
and clinical investigation. As reviewed recently, these proteins have emerged as
multifunctional nodes involved in many cellular processes including cell-cycle
progression, cellular differentiation, transcriptional regulation, cytoskeletal
dynamics, and protein trafficking [55, 56]. Histone hyperacetylation induced
by HDAC inhibitors, such as trichostatin A and trapoxin B, correlates with
gene expression, cell-cycle arrest, cell differentiation, and cell death depending
on the cell type, the duration of treatment, and the concentration used. As
a result, there is a growing interest in developing means to modulate HDAC
activity, both as research tools and as therapeutic agents. HDAC inhibitors
have been proposed for treatment of cancer as well as neurodegenerative
disorders associated with mutations in polyglutamine encoding tracts [57].
In addition, agents already used clinically for other purposes, such as
valproate (which is used for the treatment of epilepsy and bipolar disorder,
and as an adjuvant therapy for schizophrenia), inhibit HDACs
and cause histone hyperacetylation in cultured cells [58]. Further research
aimed at elucidating a functional role for acetylation of proteins other than
histones is necessary to understand better the physiological targets of protein
deacetylases and the mechanisms by which HDAC inhibitors mediate their
spectrum of phenotypic effects (see below for an example of identifying
inhibitors of protein deacetylases with selectivity patterns different from that
of trichostatin A).
A second method of target identification involves preparing radiolabeled
derivatives of the small molecule and determining the molecular targets that
are labeled, perhaps covalently, by these radioactive probes. Ideally, covalent
labeling allows for the isolation of a small molecule-protein complex under
the denaturing conditions required for separating proteins by
sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), or
for mass spectrometric detection of an altered mass of a given peptide
or protein. An excellent example of this approach is the identification of the
target of the steroidal alkaloid cyclopamine by Chen and colleagues (Fig. 6-8)
[59]. Cyclopamine had been known for many years to possess both teratogenic
and antitumor activities, and prior to their work had been shown to inhibit the
Hedgehog signaling pathway in vertebrate cells and organisms, but through
unknown mechanisms. By synthesizing a 125I-labeled photoaffinity derivative
(125I-PA-cyclopamine), they were able, on light activation and consequent
cross-linking, to detect labeling of “smoothened”, a seven-transmembrane
protein that is the receptor for the ligand “patched”, when expressed in
COS-1 cells [59, 60]. In further support of the target being smoothened, a
fluorescent (BODIPY)-cyclopamine derivative was synthesized, and this probe
fluorescently labeled the membrane region of cells that express the smoothened
target in a manner that could be competed using cyclopamine itself [59, 60].
A third method of target identification uses a “three-hybrid” transcriptional
activation system that anchors a derivative of the active ligand for display
against a library of cDNAs fused to a transcriptional activation domain [61].
A fourth method involves the use of mRNA expression analysis to identify
targets and associate patterns of gene expression to specific perturbations
[33, 62]. A fifth method involves the display of target proteins on
phage [63]. Lastly, with the recent advent of microarray technology and the
Fig. 6-8 Target identification of an inhibitor of Hedgehog
signaling. (a) Structure of the alkaloid cyclopamine.
(b) Photoaffinity and radioactive derivative of cyclopamine [59, 60].
(c) Fluorescently labeled derivative of cyclopamine [59, 60].
development of increasingly large collections of recombinant proteins from a
variety of organisms, including humans, it has become possible to search for
the protein targets of a small molecule in a high-throughput manner using
protein microarrays (Fig. 6-9) [42, 64]. This approach in conjunction with
libraries of small molecules that can be easily modified to include a fluorescent
label provides a very promising path forward for target identification.
In addition to these biochemical methods, genetic mutations that render
a cell or organism resistant to the effects of a small molecule have also
been used to identify the target of small molecules and other components of
the interacting pathway. Now with the advent of collections of genome-wide
deletion strains in S. cerevisiae, and related knock-down collections created
using RNAi, the loss of function of genes and the matching of mutants with
similar phenotypes is being used to suggest candidate targets for further testing
[10, 65-68]. Another approach uses multicopy gene suppression in which the
expression of a genomic library is screened for sensitivity or resistance to a
particular small molecule [69]. While the success of biochemical approaches is
dependent on both the specificity of the compound and its affinity, the success
of genetic approaches depends on both the specificity of the compound and the
availability of existing mutant phenotypes to match the observed phenotypic
defects or to discover an interacting mutation. Technical developments in
both biochemical and genetic methods, along with the use of computational
science described below, will continue to provide improved solutions for target
identification in the years to come.
6.3.8
Relationship between Network Connectivity and Discovery of Small-molecule
Probes
A question raised by chemical-genetic screens is why some proteins are targeted
by small molecules more frequently than others. For example, in an antimitotic
Fig. 6-9 Target identification of a suppressor of rapamycin [42].
(a) SMIR4, a suppressor of rapamycin identified using a
chemical-genetic modifier screen. (b) Identification of gene
products that interact with biotin-SMIR4 using a yeast protein
microarray [42].
screen performed by Haggarty and colleagues [70], over 80 small molecules
directly targeted tubulin, whereas two structurally distinct small molecules that
arrested cells in mitosis without targeting tubulin were later shown to target
the motor protein kinesin-5 (monastrol [71] and HR22C16 [72]). Similarly,
DeBonis and colleagues, by screening growth inhibitory compounds that were
obtained from the National Cancer Institute collection identified S-trityl-L-
cysteine, gossypol, flexeril, and two phenothiazines as kinesin-5 inhibitors
[73]. Kau and colleagues, in a screen for inhibitors of FOXO1a nuclear export,
found many general inhibitors of nucleocytoplasmic transport, which, like the
natural product leptomycin, directly bind the nuclear export factor CRM1 [38].
In addition, multiple antagonists of calmodulin signaling were identified [38].
Are some proteins simply more susceptible to modulation by small molecules
or do biases exist in the way that targets are identified?
One explanation for these observations is provided by emerging models of the
global organization of cellular networks in which gene products are modeled
as nodes and the functions of genes are represented by edges [11-15]. In these
models, where protein and genetic interaction networks are robust and have a
power-law distribution of edges, if a random perturbation results in a change
in phenotype, then the perturbation is more likely to target a highly connected
node (a node with many edges) than a node with a low degree of connectivity.
The relevance of these network properties can be illustrated by the following
experiment designed to simulate the act of screening small molecules in a cell-
based assay. Consider four nodes (modeling proteins), with edges (modeling a
function of a protein) of degrees of one, two, three, and four respectively, such
that the total sum of edges equals 10. If these nodes are randomly sampled by
picking an edge (simulating a molecular recognition event in which a small
molecule modulates a protein function), then even though there is a 25%
chance of picking each node, 70% of the time nodes of a degree equal to or
greater than three will be selected (assuming replacement of nodes after each
selection). This preferential selection of highly connected nodes is due to the
increased probability of interacting with a node with many edges. Thus, if we
consider that biological systems have evolved over time, and that many gene
products have been formed by reusing protein domains (e.g., immunoglobulin
or GTP-binding domains) and by gene duplications, then identifying small
molecules with similar phenotypic effects in evolutionarily distant organisms
may provide a method for mapping the chemical properties of highly connected
and, therefore, functionally important nodes in biological networks.
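The four-node thought experiment above can be checked with a short calculation. The following is a minimal sketch, assuming hypothetical node names (A to D) and a uniform random choice among the 10 edge endpoints; it is meant only to reproduce the 70% figure quoted in the text, not to model any real screen.

```python
import random
from fractions import Fraction

# Hypothetical node labels; the degrees (1, 2, 3, 4) are the ones used in the text.
degrees = {"A": 1, "B": 2, "C": 3, "D": 4}


def edge_sampling_probabilities(degrees):
    """Probability of hitting each node when one edge endpoint is picked uniformly."""
    total = sum(degrees.values())
    return {node: Fraction(k, total) for node, k in degrees.items()}


def simulate_high_degree_hits(degrees, trials, min_degree=3, seed=0):
    """Monte Carlo estimate of how often a node of degree >= min_degree is selected."""
    rng = random.Random(seed)
    # One entry per edge endpoint, so a node with k edges appears k times.
    endpoints = [node for node, k in degrees.items() for _ in range(k)]
    hits = sum(1 for _ in range(trials) if degrees[rng.choice(endpoints)] >= min_degree)
    return hits / trials


probabilities = edge_sampling_probabilities(degrees)
p_high = sum(p for node, p in probabilities.items() if degrees[node] >= 3)
print(p_high)  # exact probability of selecting a node of degree three or four
print(simulate_high_degree_hits(degrees, trials=20000))
```

Sampling nodes directly would give each a 25% chance; sampling by edge weights each node by its degree, which is why the two highest-degree nodes account for (3 + 4)/10 of the selections.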
In support of this, many small molecules, including rapamycin (inhibitor
of TOR proteins), FK506 (calcineurin phosphatase inhibitor), trichostatin A
(histone deacetylase inhibitor), colchicine/nocodazole (microtubule
destabilizers), taxol (microtubule stabilizer), latrunculin B (actin microfilament
destabilizer), brefeldin A (inhibits ADP ribosylation), etoposide/camptothecin
(topoisomerase inhibitors), wortmannin (phosphatidylinositol kinase inhibitor),
staurosporine (protein kinase C inhibitor), UCN-01 (Chk1/2 inhibitor),
caffeine (ATM/ATR kinase inhibitor), and roscovitine (cyclin-dependent kinase
inhibitor), target functionally important nodes in mammalian cells and have
similar biochemical interactions and phenotypic effects in organisms such as
S. cerevisiae. Testing the hypothesis that there exists a correlation between the
connectivity of proteins in a biological network and the likelihood of finding a
modulating small molecule by screening will require further characterization
of the targets of biologically active small molecules in multiple biological
systems, and the analysis of the connectivity of these targets in the relevant
biological network.
6.3.9
Computational Framework for Forward Chemical Genetics: Legacy of Morgan
and Sturtevant
On testing a set of small molecules in a chemical-genetic screen, it is natural
to ask how the same small molecules, or ones that are close structural
analogs, performed in other related or unrelated chemical-genetic screens. As
a result of numerous such screens now available in the public domain, the
resulting datasets allow answering this question, but the size and complexity
(in terms of the number of possible comparisons between objects) of the
datasets require the use of computational tools that are designed for allowing
visualization and pattern recognition in high-dimensional spaces.
The need to develop a suitable computational framework is reminiscent of
the need of classical geneticists close to a century ago to develop an analytical
framework to guide the then nascent field. At that time, geneticists such
as Thomas H. Morgan and his graduate student Alfred H. Sturtevant, were
struggling with understanding the nature of Mendelian genes and trying
to interpret a growing amount of observational data on heritable variation
collected using forward genetic screens in the fruit fly Drosophila [2]. Particularly
puzzling was the pattern of inheritance of combinations of traits that did not
sort independently during meiosis as predicted by Mendel’s second law (law
of independent assortment) [1]. After many years of collecting mutants and
analyzing data, Morgan and Sturtevant recognized that the “. . .frequency of
crossing over (recombination) furnish[ed] evidence of the linear order of the
elements (genes) in each linkage group and of the relative position of the
elements (genes) with respect to each other” [2]. Accordingly, mutant genes
(or allelic variation) could be “mapped” as a point in a one-dimensional
space using the metric (measured in centiMorgans) of 1% recombination
equal to one map unit. By making overlapping distance measurements, it was
discovered that a genetic map corresponding to the relative arrangement of
genes in the linear space could be constructed.
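The mapping logic described above (1% recombination equals one map unit, and overlapping pairwise distances fix a linear order) can be sketched in a few lines. The gene names and offspring counts below are hypothetical values chosen purely for illustration, and the ordering rule (pick the arrangement with the shortest total of adjacent distances) is one simple assumption, not a method taken from the text.

```python
from itertools import permutations


def map_units(recombinant_offspring, total_offspring):
    """Percent recombination, read directly as map units (centiMorgans)."""
    return 100.0 * recombinant_offspring / total_offspring


def infer_linear_order(pairwise_distances):
    """Return the gene order whose adjacent distances sum to the smallest total."""
    genes = sorted({gene for pair in pairwise_distances for gene in pair})

    def path_length(order):
        return sum(pairwise_distances[frozenset(step)] for step in zip(order, order[1:]))

    # A linear map and its mirror image are equivalent; min() keeps the first one found.
    return min(permutations(genes), key=path_length)


# Hypothetical recombinant counts out of 1000 offspring for three genes a, b, c.
distances = {
    frozenset(("a", "b")): map_units(100, 1000),
    frozenset(("b", "c")): map_units(50, 1000),
    frozenset(("a", "c")): map_units(140, 1000),
}
order = infer_linear_order(distances)
print(order)
```

In this toy dataset the a-c distance is the largest, so a and c are placed at the ends with b between them, which is the kind of inference overlapping distance measurements make possible.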
From these genetic maps, it became apparent that the deviation observed
from Mendel’s law of independent assortment could be explained by “linkage”
of genes due to their location within a similar position in the space representing
the underlying DNA sequence [2]. Although not obvious at the onset of Morgan
and Sturtevant’s studies, the maps of these genetic spaces are now known
to correspond physically to the arrangement of genes within a linear and
continuous sequence of the DNA, constituting a chromosome. In the end,
the recognition that genes could be arranged as a linear series provided the
conceptual foundation for the eventual sequencing of the complete genomes of
humans and model organisms [3].
6.3.10
Mapping of Chemical Space Using Forward Chemical Genetics
By analogy to the framework for classical genetics developed by Morgan
and colleagues, the development of an experimentally driven, computational
framework for chemical genetics, which allows the “mapping” of the functional
units (chemicals) that can induce variation in biological systems, holds the
potential to revolutionize the discovery of small-molecule probes for basic
research and, potentially, the discovery of novel therapeutic targets and agents
[74-76]. But how can biologically active small molecules be “mapped” as points
(loci) in a space? If they can be mapped, what would the global properties
of this space look like and, moreover, what might the global properties of
such space reveal about the nature of the interaction of small molecules with
biological systems? While it is much too early to have a full answer to these
questions, a number of ideas have emerged as to how the “mapping” of small
molecules using biological descriptors might be approached.
Unlike genes, which are physically located at a locus on a chromosome based
on their linkage to other sequences of DNA (although they may move owing
to transpositions and recombination events), small molecules that induce
phenotypic variation in biological systems are themselves not physically located
in a space. Thus, if small molecules are to be mapped to a common space, then
the space must be considered to represent “abstract space” in the sense that it
is mathematically derived [74-76]. This abstract space, which we will refer to
as “chemical space”, is formed by multiple dimensions, or axes, such that the
relative distance between small molecules represented by points becomes a
measure of their structural or functional similarity. The notion is that certain
regions in this space correspond to small molecules that have similar structure
or function.
According to such a framework, the corresponding data structure for
analyzing chemical space is most often a two-dimensional array, or
matrix, denoted by S, consisting of an ordered array of n columns and m rows
(Fig. 6-10). Each column ($\mathbf{y}_j$) in S corresponds to a descriptor and is denoted
by a boldface, lowercase letter subscripted j (where j = 1 to n). Each row ($\mathbf{x}_i$)
in S corresponds to a chemical and is denoted by a boldface, lowercase letter
subscripted i (where i = 1 to m). Accordingly, an element ($e_{ij}$) of S encodes
information about chemical i for descriptor j. This allows the elements
of S to be considered as coordinates in a multidimensional space spanned by
the descriptor axes, which, in turn, allows each chemical to be represented
as a vector whose magnitude and direction are given by the corresponding
values in S, $\mathbf{x}_i = [e_{i1}, e_{i2}, \ldots, e_{in}]$. In this matrix-based representation of chemical
space, the relative distance between chemicals $\mathbf{x}_i$ becomes a measure of their
similarity with respect to the particular descriptors considered.
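As a concrete illustration of this data structure, the sketch below stores a small matrix S as a list of rows (chemicals) and computes all pairwise distances between the row vectors. The numerical values are hypothetical, and the Euclidean metric is assumed because it is the one named in the caption to Fig. 6-10; other metrics could be substituted.

```python
import math

# Hypothetical descriptor matrix S: m = 3 chemicals (rows) by n = 2 descriptors (columns).
S = [
    [0.0, 0.0],  # x_1
    [3.0, 4.0],  # x_2
    [3.0, 0.0],  # x_3
]


def euclidean_distance(x_a, x_b):
    """Square root of the summed squared differences over all descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_a, x_b)))


def distance_matrix(S):
    """Symmetric m-by-m matrix of distances between every pair of chemicals."""
    return [[euclidean_distance(x_i, x_k) for x_k in S] for x_i in S]


D = distance_matrix(S)
for row in D:
    print(row)
```

Small entries of D mark chemicals that lie close together in descriptor space and are therefore treated as similar with respect to the descriptors chosen.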
As depicted in Fig. 6-10, when considering the dimensions or axes of
chemical space there are two fundamentally different classes of descriptors
that are used: computed and measured [74-76]. These classes differ insofar as
the former are generally calculated using a computer and various algorithms
designed to determine the value of a specified mathematical function [77,
78], whereas the latter involve the observation of the effect of a given
small molecule on, for example, the function of a gene product (nucleic
acids, proteins) or metabolite (carbohydrate, lipid, other organic molecules)
[79, 80]. Recognizing the distinction between chemical spaces derived from
computed descriptors as compared to measured descriptors is of fundamental
importance. While the former is unambiguously definable, the latter involves
Fig. 6-10 Mapping chemical space [76]. Principal component models of chemical
space are shown for 480 small molecules analyzed using 24 computed molecular
descriptors and 60 measured phenotypic descriptors derived from a cell-based
assay of cell proliferation. By considering the elements of S as coordinates,
small molecules can be modeled as vectors, $\mathbf{x}_i = [e_{i1}, e_{i2}, \ldots, e_{in}]$, in an
n-dimensional vector space. By defining the Euclidean distance D between two
vectors (e.g., $\mathbf{x}_1$ and $\mathbf{x}_2$) in this vector space to be
$D_{12} = \sqrt{\sum_{j=1}^{n} (e_{1j} - e_{2j})^2}$, the space of chemical-genetic observation can be
considered as a metric space. This means the relative distance D between
chemicals $\mathbf{x}_i$ is informative with respect to similarity between the particular
descriptors considered. Accordingly, small molecules $\mathbf{x}_i$ can be considered to
be functionally similar if they are closely positioned (i.e., within a
specified radius) in the underlying descriptor space. Since similarity between
small molecules is determined by the pattern of interaction with biological
systems, the corresponding distance metric D complements the definition of
similarity obtained from calculated molecular descriptors based on chemical
structure. Furthermore, since similarity in cell-based assays results from
patterns of small molecules interacting with expressed gene products, the
corresponding distance metric D complements the definition of similarity
obtained from DNA sequence or gene-expression analysis.
the process of observation, and as such involves noise inherent to the process
of measurement. Measured phenotypic descriptors are also subject to the
influence of a variety of other variables, including the dose of the chemical,
length of treatment, and the genotype of the biological system.
Most representations of the structure of small molecules are themselves
graphical models of chemicals embedded in a three-dimensional space and
projected onto the two-dimensional plane of the paper (or screen) [81].
While such models are useful for visualization purposes, for computational
purposes small molecules are best represented more abstractly in the
form of an adjacency matrix. This adjacency matrix encodes the
connectivity of a graph in which nodes represent atoms and edges represent
bonds between nodes (Fig. 6-11). Once represented in this manner, the
structure of a small molecule can be analyzed using various graph- and
information-theoretic descriptors to quantify topological properties, along
with physicochemical properties, such as the molecular weight and estimations
of the partition coefficient between octanol and water (cLogP) [74, 75, 81].
This format enables a quantitative definition of molecular “similarity”,
and provides a means to create a map representing the relative position
of small molecules in a space formed from their descriptors (see below)
[77, 78].
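To make the adjacency-matrix representation concrete, the sketch below encodes the heavy atoms of acetic acid (hydrogens omitted) and derives two simple topological descriptors from it. The choice of molecule, the use of bond order as the matrix entry, and the Wiener index (the sum of shortest-path lengths over all atom pairs) are illustrative assumptions; the text does not prescribe a particular descriptor set.

```python
from collections import deque

# Heavy atoms of acetic acid, CH3-C(=O)-OH, indexed 0..3 (hydrogens are not nodes).
atoms = ["C", "C", "O", "O"]

# adjacency[i][j] holds the bond order between atoms i and j (0 means no bond).
adjacency = [
    [0, 1, 0, 0],  # methyl carbon
    [1, 0, 2, 1],  # carboxyl carbon
    [0, 2, 0, 0],  # carbonyl oxygen
    [0, 1, 0, 0],  # hydroxyl oxygen
]


def heavy_atom_degrees(adj):
    """Number of bonded heavy-atom neighbors for each node."""
    return [sum(1 for bond in row if bond > 0) for row in adj]


def shortest_path_lengths(adj, start):
    """Breadth-first search giving the bond count from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, bond in enumerate(adj[u]):
            if bond > 0 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(adj):
    """Sum of shortest-path lengths over all atom pairs (assumes a connected graph)."""
    n = len(adj)
    total = 0
    for i in range(n):
        dist = shortest_path_lengths(adj, i)
        total += sum(dist[j] for j in range(i + 1, n))
    return total


print(heavy_atom_degrees(adjacency))
print(wiener_index(adjacency))
```

Descriptors of this kind depend only on the graph, so they can be computed for any molecule once its adjacency matrix is available.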
One challenge with using molecular descriptors to create maps of chemical
space that can both locally and globally predict biological activity is that
a given chemical can exist as a variety of structures corresponding to
various protonation, tautomeric, and stereochemical states depending on
the molecule’s environment [44, 78]. Another challenge is the ability of
enzymes to metabolize small molecules into what might be either an active
or inactive component. Together, these and other factors contribute to the
difficulty of predicting the function of a small molecule, particularly in the
context of an intact living system as complex as the human body. Nonetheless,
since chemical space can be explicitly defined using specific algorithms to
compute molecular descriptors, it seems reasonable to expect that a universally
agreed upon set or perhaps biological, mechanism-specific sets of molecular
descriptors will be useful for creating maps of chemical space.
Fig. 6-11 Small molecules as chemical graphs [81]. Representation of the
structure of small molecules as graphs encoded by an adjacency matrix that
specifies the type of node (atom), the type of edge (bond), and the
connectivity of nodes. Hydrogen atoms are not considered as nodes in the graph.
In contrast to computed molecular descriptors, observed or phenotypic
descriptors involve the measurements of the effects of a small molecule on a
biological system. Accordingly, phenotypic descriptors provide the opportunity
to classify chemical structures by creating maps of chemical space according
to biologically relevant descriptors (Fig. 6-12) [74-76]. Given the wide range
of observable properties of biological systems, the challenge for mapping
chemical space using chemical genetics is to determine the most relevant
phenotypic descriptors and to measure them with sufficiently high
throughput, which in turn may depend on the biological system and process
being studied. Ultimately, it is the relationships between the positions of
small molecules in different chemical spaces that will allow researchers to
understand the chemotype-phenotype mapping at increasing resolutions
(Fig. 6-13).
6.3.11
Dimensionality Reduction and Visualization of Chemical Space
Given a multidimensional matrix of data derived from chemical-genetic
screens and computed molecular descriptors (Fig. 6-10), meaningful visual
Fig. 6-12 Mapping chemical space using multidimensional phenotypic
descriptors. Phenotypic data from multiple assays are arranged in a
chemical-genetic data array and computational methods are used to select
small molecules for further characterization. Clustering and the
construction of chemical-genetic networks provide methods for visualization
of high-dimensional observation spaces and pattern finding.
Fig. 6-13 Overview of chemical space. On the left, chemicals are
positioned in space using computed molecular descriptors. On
the right, chemicals are positioned in space using measured
phenotypic descriptors of biological activity.
and compact representations are required to allow for data exploration
and to facilitate subsequent modeling efforts aimed at understanding the
relationships between objects (small molecules and assays). To solve related
problems in other fields of study, a variety of “dimensionality-reduction” and
pattern-finding techniques have been developed [77-79]. Although differences
exist in the specific algorithms, the techniques share the common goals of
extracting trends and information that are otherwise not apparent from manual
inspection and of providing a more compact representation or model of the data.
In doing so, dimensionality-reduction and pattern-finding techniques allow
for the creation of higher-level representations of the information inherent
in the lower-level relational data within a large data matrix. In general, two
types of such “learning” techniques are used: supervised and unsupervised.
In supervised learning, a set of labeled or known data is used to classify the rest
of an unknown dataset. Alternatively, in unsupervised learning the goal is to
discover a “natural” grouping of objects without knowledge of any class labels.
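The distinction between the two kinds of learning can be illustrated with a toy sketch. Both functions, the nearest-neighbor rule used for the supervised case, the radius-based linking used for the unsupervised case, and all coordinates and labels are hypothetical choices made for illustration; they stand in for the many algorithms available and are not taken from the text.

```python
import math


def nearest_neighbor_label(query, labeled):
    """Supervised: give the query the label of its closest labeled vector."""
    closest = min(labeled, key=lambda item: math.dist(query, item[0]))
    return closest[1]


def group_by_radius(points, radius):
    """Unsupervised: link points lying within `radius` of each other into groups."""
    groups = []
    for idx, point in enumerate(points):
        # Existing groups that contain at least one point close to the new one.
        linked = [g for g in groups
                  if any(math.dist(point, points[j]) <= radius for j in g)]
        merged = [idx]
        for g in linked:
            merged.extend(g)
            groups.remove(g)
        groups.append(sorted(merged))
    return sorted(groups)


# Hypothetical two-descriptor vectors.
labeled = [((0.0, 0.0), "inactive"), ((1.0, 1.0), "active")]
print(nearest_neighbor_label((0.9, 0.8), labeled))

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
print(group_by_radius(points, radius=0.5))
```

The first function needs class labels to work at all, whereas the second discovers the two groups from the distances alone.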
One method of unsupervised learning that has proved useful for analyzing
data from chemical-genetic screens is called clustering. This method attempts
to cluster objects into sets that are somehow related on the basis of a set
332
I of descriptors. For example, consider a model dataset consisting of seven
SMPs (SMP-1to -7) and a control treatment (e.g., only organic solvent), which
are subject to an array of five, chemical-genetic screens consisting of three
cell-based assays measuring: (a) neurite extension, (b) neuron viability, and
(c) synapse formation, and two in vitro assays with cell extracts to measure the
polymerization of: (d) actin, and (e) tubulin (Fig. 6-14(a)). In the resulting data
matrix, a value of “1” encodes the observation that the SMPs were active in
the assay and otherwise a value of “0” is used. Even with such a small dataset,
which uses a binary rather than a continuous valued measure, the challenge
of defining the major activity patterns and the compounds that are similar to
each other becomes apparent. What exactly does “similar” mean and how is it
computed?
Although for binary data other distance metrics are in general more
appropriate (e.g., Tanimoto metrics), for simplicity we can compute the
standardized (to the mean and standard deviation of the distribution) Pearson
correlation matrix, which contains the correlation coefficients between each of
the five assays. These data can then be used to cluster the chemicals based on
their correlation as a metric of similarity. The groupings depicted in Fig. 6-14(b)
[Fig. 6-14 panels: (a) activity matrix for SMP-1 to SMP-7 and a control across
the assays Neurite Extension, Neuron Viability, Synapse Formation, Actin, and
Tubulin; (b) Small Molecule Clustering; (c) Assay Clustering.]
Fig. 6-14 Cluster analysis of multidimensional, chemical-genetic data.
(a) Example of seven small-molecule perturbagens (SMP-1 to -7) and their activity
in five phenotypic assays. A value of “1” indicates activity and a value of “0”
indicates that the compound was inactive. (b) Dendrogram showing clustering of
the small molecules. (c) Dendrogram showing clustering of the assays.
reflect the fact that, of the seven SMPs, some had identical patterns of activity
(analogous to mutations mapping to the same region of the chromosome),
while others showed varying levels of common activity (analogous to mutations
mapping to different regions of a chromosome). Likewise, by transposing
the data matrix and considering the small molecules as descriptors for the
phenotypic assays, it becomes possible to use the information encoded in the
pattern of interaction of small molecules with biological systems to classify
the assay measurements instead of the small molecules (Fig. 6-14(c)). Just
as for the small molecules, the resulting data create a high-dimensional,
information-rich signature of the biological system being probed, which in
turn can be used for pattern recognition and classification. The activity patterns
from small-molecule descriptors can provide a measure of the diversity of
particular cell types or cell states when subject to additional perturbations,
such as those provided by natural genetic variation and chemical-genetic
modifiers. When characterizing different genotypes, the generation of these
“perturbation profiles”, by analogy to mRNA profiling, has been referred to
as chemical-genomic profiling (see below) [82]. The nature of these profiles can
shed light on the underlying chemical differences between cell states, and
may eventually be useful as cellular network-based diagnostics to complement
traditional use of DNA sequence analysis. However, to date there have been
only a few studies that have purposefully used the patterns of activities of small
molecules to classify biological systems.
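The question of what "similar" means for binary activity profiles can be made concrete with a small sketch. The profiles below are hypothetical values chosen for illustration (they are not the data of Fig. 6-14(a)), and the function names are assumptions introduced here. The sketch computes a Tanimoto coefficient and a Pearson correlation between compounds, groups compounds with identical profiles, and transposes the matrix so that the compounds become descriptors of the assays.

```python
from itertools import combinations
from math import sqrt

# Hypothetical binary activity profiles (1 = active, 0 = inactive) across
# five assays; illustrative values only, not the data of Fig. 6-14(a).
ASSAYS = ["neurite", "viability", "synapse", "actin", "tubulin"]
PROFILES = {
    "cmpd_a": [1, 1, 1, 1, 0],
    "cmpd_b": [1, 0, 0, 0, 1],
    "cmpd_c": [1, 1, 1, 1, 0],
    "cmpd_d": [0, 1, 1, 0, 0],
}


def tanimoto(x, y):
    """Assays where both are active divided by assays where either is active."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Convention assumed here: two all-inactive profiles count as identical.
    return both / either if either else 1.0


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def identical_groups(profiles):
    """Group compound names that share exactly the same activity pattern."""
    groups = {}
    for name, vec in profiles.items():
        groups.setdefault(tuple(vec), []).append(name)
    return sorted(groups.values())


def transpose(profiles, assays):
    """Return one vector per assay, with the compounds as descriptors."""
    columns = zip(*profiles.values())
    return {assay: list(col) for assay, col in zip(assays, columns)}


if __name__ == "__main__":
    for p, q in combinations(PROFILES, 2):
        print(p, q,
              round(tanimoto(PROFILES[p], PROFILES[q]), 2),
              round(pearson(PROFILES[p], PROFILES[q]), 2))
    print(identical_groups(PROFILES))
    print(transpose(PROFILES, ASSAYS))
```

A hierarchical clustering routine would take one of these pairwise measures as its similarity input; that step is omitted here to keep the sketch short.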
Besides clustering, which has been widely used to group small molecules
into various structural and activity classes, another method of dimensionality
reduction for multidimensional chemical-genetic screening is that of principal
component analysis (PCA). Unlike clustering, this method does not group
small molecules into discrete groups by imposing a particular structure on the
data (i.e., to form clusters). Instead, to analyze the diversity of small molecules,
PCA consists of a linear transformation of the original system of axes formed
by the n-dimensions of the data matrix, where n is the number of descriptors.
This transformation is in the form of a Euclidean distance-preserving rotation,
the directions of which are determined by computing a set of eigenvectors
and corresponding eigenvalues of a diversity matrix created by computing
a standardized covariance matrix (i.e., Pearson correlation coefficients). The
resulting eigenvectors provide a new set of linearly independent, orthogonal
axes, called factors or principal components, each of which accounts for successive
directions in the n-dimensional ellipsoid spanning the multivariate distribution
of the original data. The corresponding eigenvalues account for progressively
smaller fractions of the total variance in the original data. Accordingly, PCA
creates a global model that minimizes the information lost on projection into a
space of reduced dimensionality, and is thus well suited for exploring complex
activity patterns and datasets that do not have a clustered structure. Besides
allowing for visualization of multidimensional data, PCA has a practical
application for data analysis, as the reduced number of dimensions simplifies
subsequent computations that may be memory- and time-intensive. While PCA
provides a readily computable, linear dimensionality reduction affording linear
combinations of descriptors that allow for the maximum amount of variance
to be described by a minimum set of descriptors, a number of algorithms
with improved outcome have been described, and others will undoubtedly be
developed in the years to come.
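The PCA procedure described above (standardize the descriptors, form the correlation matrix, extract eigenvectors and eigenvalues) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: it recovers only the first principal component, it uses power iteration in place of a full eigendecomposition, and the function names and the example measurements are hypothetical.

```python
from math import sqrt


def correlation_matrix(rows):
    """Pearson correlation between the columns (descriptors) of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    # Standardize each column; assumes no descriptor is constant (sd > 0).
    z = [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(d)]
            for a in range(d)]


def leading_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and a unit eigenvector for it."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-uniform start vector
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # start vector had no component along any eigenvector
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def explained_fraction(eigenvalue, n_descriptors):
    """Share of total variance; a correlation matrix has trace n_descriptors."""
    return eigenvalue / n_descriptors


if __name__ == "__main__":
    # Hypothetical measurements: four compounds (rows), three descriptors.
    data = [[0.1, 1.0, 3.2], [0.4, 1.9, 2.9], [0.5, 3.1, 1.0], [0.9, 4.2, 0.8]]
    corr = correlation_matrix(data)
    lam, axis = leading_component(corr)
    print(lam, axis, explained_fraction(lam, len(corr)))
```

Power iteration converges slowly when the two largest eigenvalues are close, so a library eigensolver would normally be preferred for real screening data.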
Following the example of the model data set shown in Fig. 6-14(a), to perform
PCA the correlation matrix is computed to reveal the relationship between the
descriptors being considered. From the correlation matrix, the eigenvalues and
corresponding eigenvectors are computed (Fig. 6-15(a)). Each eigenvalue measures
the share of the variance captured along its eigenvector, and the eigenvalues
therefore indicate the quality of the dimensionality reduction
from the original multidimensional space. For ideal representations, the
first two or three eigenvalues will correspond to a high percentage of the
variance. Each eigenvalue corresponds to a factor (a linear combination of
the initial descriptors that is uncorrelated with the other factors), and each
factor corresponds to one dimension in the new space. In this example
(Fig. 6-15(a)), the first eigenvalue equals 2.43 and represents 48.5% of the
total variability. This means that if we were to represent the data on only
one axis we would still be able to see 48.5% of the total variability of the
data. The “cumulative %” calculated from the eigenvalues provides an idea
of the global variability represented when using the axes of interest. Using
the corresponding eigenvectors to create a new rotated axis, the SMPs can be
seen distributed throughout the resulting assay measurement space, with the
distance between them in the reduced space (here three of the five original
dimensions) a measure of their similarity (Fig. 6-15(b)). Thus, as in the cluster
analysis, we conclude that the compounds within each of the pairs SMP-1 and -3,
and SMP-5 and -7, have identical activity profiles, with the distance between
the other compounds a measure
of their functional differences. As the size of the dataset and complexity of the
activity patterns increases, methods of analysis like PCA become invaluable
tools for discerning the global activity patterns and relationships between
objects on the axes [80].
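The percentage quoted for the first eigenvalue follows from a general property of correlation matrices, written out here as a worked step. The symbols R (the correlation matrix), n (the number of descriptors), and f_k (the fraction of variance for factor k) are notation introduced for this illustration. Because each diagonal entry of R equals 1, the eigenvalues sum to n, which is 5 in the model dataset:

```latex
\[
\sum_{k=1}^{n} \lambda_k = \operatorname{tr}(R) = n,
\qquad
f_k = \frac{\lambda_k}{\sum_{j=1}^{n} \lambda_j} = \frac{\lambda_k}{n},
\qquad
f_1 = \frac{2.43}{5} \approx 0.486 .
\]
```

This agrees with the reported 48.5% to within the rounding of the eigenvalue to two decimal places. The "cumulative %" for the first m factors is the sum of f_1 through f_m.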
6.3.12
Discrete Methods of Analysis of Forward Chemical-genetic Data
Given a multidimensional matrix of data derived from chemical-genetic
screens, it is also possible to use computational tools derived from the field
of discrete mathematics and principles, again, borrowed from graph theory
[81]. For example, through multiple screens biologically active small molecules
can be linked together into a network of chemical-genetic interactions, which
can be represented by the graph G = (V, E), where V represents either small
molecules or assays and E represents edges indicating the activity of a small
molecule in a given assay (Fig. 6-16). To determine that a small molecule is
active, a threshold or a statistical measure based on a control distribution of
inactive or control compounds can be used. Ultimately, the topology of the
Fig. 6-15 Principal component analysis of multidimensional, chemical-genetic data.
(a) Eigenvalues and associated variance, and eigenvectors and associated factor
scores computed from the data in Fig. 6-14(a). The matrix of eigenvectors
defines a coordinate transform (rotation) that best decorrelates the data into
orthogonal linear subspaces. (b) Resulting three-dimensional chemical space
created from using the first factors (principal components) as axes.
Fig. 6-16 A chemical-genetic network representing a graph G = (V, E) (data from
[82]). Each node (V; circles) represents a biologically active small molecule or a
phenotypic assay and each edge (E; line) represents an observed biological activity.
Shown here is an undirected, unweighted, bipartite graph with a total of 426 nodes (V)
and 1107 edges (E) between small-molecule nodes (colored red or yellow for active;
gray for inactive; total of 352) and an assay node (colored blue; total of 74 in
7 organisms). This “energy-minimized” representation was computed using
Pajek v0.72 (see http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
chemical-genetic network for a particular biological system will be determined
by the selectivity of the small molecules and constrained by the properties of
the underlying biological networks being studied.
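The construction of the graph G = (V, E) from screening data can be sketched as follows. This is a hedged illustration only: the readout values, the names, and the rule "active if the value exceeds the control mean by k standard deviations" (with k = 3) are assumptions made for the example, standing in for whatever threshold or statistical measure a given screen uses.

```python
from statistics import mean, stdev

# Hypothetical data: control-well readouts and compound readouts per assay.
CONTROLS = {
    "assay_1": [1.0, 1.1, 0.9, 1.0],
    "assay_2": [2.0, 2.2, 1.8, 2.0],
}
READOUTS = {
    "assay_1": {"cmpd_a": 5.0, "cmpd_b": 1.05},
    "assay_2": {"cmpd_a": 2.1, "cmpd_b": 9.0},
}


def activity_threshold(control_values, k=3.0):
    """Cutoff = control mean + k sample standard deviations (k is assumed)."""
    return mean(control_values) + k * stdev(control_values)


def build_bipartite_graph(readouts, controls, k=3.0):
    """Return (compound_nodes, assay_nodes, edges) for G = (V, E).

    An edge (compound, assay) is added when the compound's readout exceeds
    the assay's control-derived cutoff. This one-sided rule assumes that
    activity increases the signal.
    """
    compounds, assays, edges = set(), set(), set()
    for assay, values in readouts.items():
        assays.add(assay)
        cutoff = activity_threshold(controls[assay], k)
        for compound, value in values.items():
            compounds.add(compound)
            if value > cutoff:
                edges.add((compound, assay))
    return compounds, assays, edges


def degree(node, edges):
    """Number of edges touching a node (compound or assay)."""
    return sum(1 for edge in edges if node in edge)


if __name__ == "__main__":
    compounds, assays, edges = build_bipartite_graph(READOUTS, CONTROLS)
    print(sorted(edges))
    print({c: degree(c, edges) for c in sorted(compounds)})
```

In this representation the degree of a compound node counts the assays in which it scored, which gives a simple measure of selectivity.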
This graph-theoretic framework is well suited for visualizing the results of
performing chemical-genetic modifier screens iteratively on any of the active
products of an assay. Here, each node represents a biologically active small
molecule (e.g., an enhancer or a suppressor) that is linked (represented by an
edge) to new nodes (small molecules with differentfunctions) through different
phenotypic assays. The result is reminiscent of the use of pairs of complexity-
generating reactions with an essential product-substrate relationship along
a synthetic pathway to create structurally complex and diverse compounds.
In this case, each node in the corresponding network represents a discrete
chemical entity that can be linked (represented by an edge) to new nodes
(small molecules with different structures) through synthetic transformations.
Thus, the recognition of “product-substrate” relationships is useful for both
the designing of diverse collections of small molecules and the exploration of
the diversity of biological mechanisms.
6.4
Applications and Practical Examples
One of the most useful applications of chemical genetics is to reveal the gene
products that function in pathways or processes in an unbiased manner. In
this section we will describe two practical examples. We will then end with
another example of applying collections of small molecules discovered using
chemical genetics to study the phenotypic differences of cells with different
genotypes in an unbiased, global manner (chemical-genomic profiling).
6.4.1
Example 1: Mitosis and Spindle Assembly
Since Pernice’s description in 1889 of the effects of colchicine, small
molecules have played essential roles in dissecting the molecular mechanisms
involved in chromosome segregation during mitosis (Fig. 6-2) [83], and later in
the discovery of tubulin as the cellular target. Owing to the clinical efficacy of
inhibitors of mitosis as antitumor agents, such as paclitaxel (Taxol) [84], which
was originally discovered by the National Cancer Institute’s plant natural-
product screening program in the early 1960s [85], numerous chemical-genetic
screens for inhibitors of mitosis have been performed. Most of these screens
have used natural-product extracts as a source material of chemical diversity
[83]. In an attempt to discover new inhibitors of mitosis from a synthetic library
that worked in ways similar to and different from existing small molecules,
Haggarty and colleagues used a collection of 16 320 compounds and both
Fig. 6-17 Forward chemical-genetic screen for inhibitors of mitosis (data from
Ref. 73). (a) Overview of mitotic cell cycle. (b) Example of data from one
384-well plate from the cytoblot primary screen with increased TG-3 mAb
reactivity indicative of an increased mitotic index. (c) Summary of compound
activity from the initial cell-based and in vitro tubulin polymerization assay.
(d) Examples of a compound that destabilized microtubules (deploy-2b) and a
compound that stabilized microtubules (synstab A).
phenotypic and biochemical assays (Fig. 6-17(a)) [70]. As an initial filter, the
compounds were screened using a high-throughput cytoblot assay, where an
antibody is used to detect a posttranslational modification characteristic of the
process of interest [31]. This assay used TG-3, a monoclonal antibody (mAb) that
recognizes a phosphorylated form of the protein nucleolin formed in mitosis,
to report indirectly on the progress of cells through mitosis [86]. Accordingly,
small molecules that increase the reactivity of this mAb in cells are likely
to have arrested cells in the mitotic state. Since many compounds that were
previously shown to arrest cells in mitosis directly affect the polymerization
of α- and β-tubulin (the heterodimeric subunits of microtubules), and thereby
alter the microtubule dynamics of the mitotic spindle, compounds that
scored positive in this initial assay were subsequently tested in an in vitro
tubulin polymerization assay. Finally, to classify compounds further based
on their phenotypic effects, fluorescence microscopy was used to visualize
the distribution of microtubules, actin, and chromatin in cells treated with
compounds of interest.
Two rounds of screening 16 320 compounds at ~20-50 µM resulted in
the identification of 139 compounds that increased the number of cells in
mitosis (Fig. 6-17(b)) [70]. Fifty-two of these compounds destabilized and one
compound, named synstab A (for synthetic stabilizer), stabilized microtubules
through a direct interaction with tubulin. Although the discovery of small-
molecule inhibitors of protein-protein interactions is in general demanding,
approximately 0.3% of compounds screened were found to be direct inhibitors
of α/β-tubulin interactions in this study, which illustrates an example of using
phenotypic screenings to identify components in a pathway that are most
easily targeted by small molecules. It also suggests that the toxicity associated
with many compounds may be due to their ability to destabilize microtubules.
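The figures quoted in this paragraph can be checked with a short worked calculation. The symbols are notation introduced here for illustration, and the last step assumes that the 86 compounds examined next are those left once the direct tubulin binders are set aside:

```latex
\[
N_{\text{tubulin}} = 52 + 1 = 53,
\qquad
\frac{N_{\text{tubulin}}}{N_{\text{screened}}} = \frac{53}{16\,320} \approx 0.0032 \;(\approx 0.3\%),
\qquad
N_{\text{remaining}} = 139 - 53 = 86 .
\]
```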
To determine the mechanism of action of the remaining 86 compounds, each was
tested in a TG-3 cytoblot assay using cells that had previously been arrested
in interphase by the histone deacetylase inhibitor trichostatin A or the
topoisomerase II inhibitor ICRF-193 (Fig. 6-18). Under these conditions, none
of the compounds allowed cells to accumulate in mitosis, indicating that they
require active cell-cycle progression for an increase of reactivity with the TG-3
mAb. Subsequent cellular studies revealed that many of these small molecules
cause an altered stability of microtubules in cells in interphase, suggesting
that they also targeted tubulin (Fig. 6-17(c)). The common occurrence of
compounds targeting microtubules recapitulated what has been observed in
natural-product screening, where the sensitivity of cells to perturbation of the
mitotic spindle was first observed [83]. This screen, however, identified for
the first time compounds that affect the mitotic machinery without directly
targeting microtubules. As discussed in Mayer et al. [71], one of these
compounds, named monastrol, produces a unique monopolar phenotype and
specifically inhibits the motor protein kinesin-5 (Fig. 6-18). This provided
evidence for the first
time of a means to perturb the mitotic spindle without directly targeting
tubulin. Subsequently, monastrol has been a useful tool for dissecting the
molecular mechanisms underlying spindle assembly [87]. Second-generation,
more potent kinesin-5 inhibitors have now been discovered and are beginning
to be tested in tumor models.
6.4.2
Example 2: Protein Acetylation
To expand further the molecular toolbox available for studying intracellular
protein acetylation [88], a number of chemical-genetic screens have been
performed. To identify probes of the mechanism through which HDAC
inhibitors cause cell-cycle arrest and affect histone acetylation, a “cytoblot”
cell-based screen was used to identify small-molecule suppressors of
trichostatin A, named the ITSAs (for inhibitor of trichostatin A) (Fig. 6-19) [40].
Fig. 6-18 New activities in chemical space and the target of monastrol.
(a) Three-dimensional representation of chemical space showing the position of
15 120 small molecules (colored balls) in a molecular descriptor space derived
from the first three principal components axes (W1-W3) obtained from the
analysis of the corresponding structural and physiochemical descriptors (data
from Refs 40, 41, 70, 80). Inset shows 132 biologically active small molecules
colored based on phenotypic data from cell-based assays for suppressors of the
topoisomerase inhibitor ICRF-193 (red), suppressors of the histone deacetylase
inhibitor trichostatin A (green), and antimitotics (blue). In all, there were 20
suppressors of ICRF-193, 21 suppressors of ITSA, 89 antimitotics, and 2 small
molecules that scored in both the antimitotic and trichostatin A suppressor
screen. Monastrol's location was as shown. Testing of over 30 structurally
similar analogs revealed no other active compounds [71]. (b) Cocrystal structure
of monastrol with the motor domain of human KSP (Eg5) showing that monastrol
confers inhibition by creating an “induced-fit” to a pocket away from the
adenosine triphosphate and magnesium binding site within the catalytic center
(data from Ref. 87).
Besides counteracting the cell-cycle arrest phenotype of trichostatin A, the
ITSAs counteract trichostatin-induced histone acetylation and transcriptional
activation. Some of these ITSAs are active as suppressors of trichostatin A
in zebrafish and yeast, suggesting they target an evolutionarily conserved
component of chromatin remodeling. As such, suppressors of HDAC
inhibitors, such as the ITSAs, may prove to be valuable probes of many
biological processes involving protein acetylation.
In addition to butyrate, trichostatin A, and trapoxin B, other small-molecule
inhibitors of protein deacetylation have been identified from both natural and
Fig. 6-19 Chemical-genetic modifiers of trichostatin A (data from Ref. 40).
Trichostatin A causes cell-cycle arrest, which is correlated with an increase in
histone acetylation and altered chromatin remodeling. The “ITSAs” (for inhibitor
of trichostatin A) suppress the ability of trichostatin A to arrest the cell cycle.
synthetic sources [55]. For example, using a panel of cell-based assays based
on the recognition of histone and α-tubulin acetylation on specific lysine
residues using antibodies and a library of over 7200 small molecules derived
from a diversity-oriented synthesis that included “biasing” elements to target
the compounds toward the family of HDACs [89], over 600 small-molecule
inhibitors of protein deacetylation were identified (Fig. 6-20) [80]. Following
the decoding of chemical tags and resynthesis, the selectivity of one inhibitory
molecule (tubacin) was shown toward α-tubulin deacetylation and that of
another (histacin) toward histone deacetylation (Fig. 6-21) [80]. Tubacin was
found not to affect the level of histone acetylation, gene-expression patterns,
or cell-cycle progression. Using immunoprecipitated, recombinant enzyme,
it was determined that the class II histone deacetylase 6 (HDAC6) is the
intracellular target of tubacin [90]. Through a combination of the use of
catalytically inactive point mutations in each of the two catalytic domains
of HDAC6 and tubacin, it was shown that only one of the two catalytic
domains of HDAC6 possesses tubulin deacetylase activity, and that only that
domain’s deacetylase activity could be inhibited by tubacin. Collectively, the
small molecules identified as suppressors of trichostatin A (ITSAs) and the
selective inhibitors of protein deacetylation should facilitate dissecting of the
role of acetylation in a variety of cell-biological processes (Fig. 6-22) [40, 90].
6.4.3
Example 3: Chemical-genomic Profiling
With increasing appreciation of the contribution of genotype to the outcome of

therapeutic treatments, efforts in drug discovery are moving more toward
Fig. 6-20 Forward chemical-genetic screen for inhibitors of protein deacetylation
(data from Ref. 80). (a) Overview of cell-based screens of the 1,3-dioxane-based,
diversity-oriented synthesis-derived library using antibodies to measure tubulin
and histone acetylation. (b) Relative position of selected active compounds in a
three-dimensional principal component model computed from five cell-based assay
descriptors. AcTubulin-selective (red), AcLysine-selective (green), and most
potent (blue). (c) Chemical-genetic network from screening data after applying
the Fruchterman-Reingold “energy” minimization algorithm
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/). Nodes represent either
assays or small molecules according to the indicated colors. Edges (black lines)
connect bioactive small molecules to the corresponding assay.
“personalized medicine” based on an individual’s genetic makeup. As
a result, there is much interest in characterizing the genetic differences
between cells using profiling experiments, where genome-wide measurements
yield rich fingerprints for comparison and interpretation. While differential
labeling of mRNA or protein samples and their analyses on microarrays and
two-dimensional gels, respectively, are facilitating global views of biological
networks, they do so by ultimately analyzing intrinsic molecular features
of gene products strictly in an observational manner. In contrast, a new
type of profiling experiment has been developed in which the response of
genetically similar but not
identical cells to individual or pairwise combinations of biologically active small
molecules is measured; this is referred to as chemical-genomic profiling
(Fig. 6-23(a)). Using this method of profiling, the ability of combinations
of small molecules to interact antagonistically or synergistically provides a
chemical tool to resolve differences between biological networks. Because the
outcomes of this method of profiling are dependent on the interaction of small
molecules in the context of an intact genetic network (i.e., perturbations),
Fig. 6-21 Selective inhibitors of α-tubulin (tubacin) and histone deacetylation (histacin)
identified by chemical-genetic screening [90].
this method differs fundamentally from profiling methods based on DNA
sequence or mRNA/protein expression patterns (i.e., observations).
For example, chemical-genomic profiling was performed using a WT strain
of the budding yeast S. cerevisiae along with nine otherwise isogenic deletion
strains, each missing a component of the cell polarity network [82]. As a model
phenotype relevant to the function of the deleted genes, cell-cycle progression
was used. To obtain a chemical-genomic profile, a two-dimensional matrix
of all possible pairwise combinations of 24 small molecules, each with a
different structure, was expanded in a third dimension by using the WT
and nine deletion strains. In total, 5760 assay measurements were obtained
(Fig. 6-23(b)). Besides a set of 4 known biologically active small molecules,
20 additional biologically active small molecules were used that had been
discovered in yeast chemical-genetic modifier and synthetic-lethal screens.
Given that many of these modulators have unknown targets and mechanisms
of action, they were referred to as SMPs, for “small-molecule perturbagens”.
After analyzing the growth of each well, the data were encoded into the form
of a binary adjacency matrix, A, with one row and one column for each of the
24 small molecules. A value of 0 was used to indicate no observable effect on
growth, and a value of 1 was used to indicate no growth or that growth was
reduced, in both replicates. Each adjacency matrix was then used to construct
Fig. 6-22 Molecular tools for the dissection of intracellular protein acetylation [40, 80].
a discrete model in the form of a graph G = (V, E) composed of V nodes, one
for each small molecule, and E edges connecting nodes representing small
molecules whose combination resulted in a value of 1 in the adjacency matrix
A. The results obtained revealed that the structure of the genetic network
determines the structure of the chemical-genetic network with none of the
deletion strain networks being identical to each other or the WT network
(Fig. 6-23(c)). Given a graphical representation of the phenotypic differences,
graph-theoretic descriptors that are analogous to molecular descriptors used for
the quantitative analysis and comparison of the structures of small molecules
were computed for each of the 10 chemical-genetic networks. Collectively, the
numerical values of the descriptors yielded a topological fingerprint of each
chemical-genetic network; standard clustering and dimensionality-reduction
algorithms were used to reveal global similarities/differences of the observed
chemical-genetic networks. Besides aiding the characterization of molecular
diversity and annotation of chemical space, the results suggest that chemical-
genomic profiling may serve as a tool for the characterization of perturbations
in biological networks or of the networks themselves (e.g., as a diagnostic
tool). These capabilities may lead to new approaches to discern the molecular
Fig. 6-23 Chemical-genomic profiling (data from Ref. 82). (a) 276 unique
combinations and 24 single treatments of “small-molecule perturbagens” (SMPs)
were assayed for an effect on the cell cycle of budding yeast. Each of the 10
strains profiled had a different genotype yielding a three-dimensional matrix of
24 x 24 x 10 observations. (b) Structures of 23 small molecules (other than
dimethylsulfoxide) used to profile 10 yeast genotypes in a three-dimensional
matrix. (c) Twenty-four node networks derived from the mapping of a matrix of
24 x 24 combinations of small molecules against a set of 10 strains of the
budding yeast. Graphs were visualized using Pajek v0.72 and “energy”
minimizations performed using the Fruchterman-Reingold algorithm
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/). None of the 10
chemical-genetic networks were identical, indicating that the structure of the
genetic network determines the structure of the chemical-genetic network.
etiology of complex phenotypes, including those involved in human disease,
that, in the case of quantitative traits, emerge as a result of the additive effects
of multiple alleles.
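The encoding used in the chemical-genomic profiling example can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions rather than the procedure of Ref. 82: the function names are hypothetical, the diagonal of the matrix (single treatments) is ignored when forming edges, the matrix is assumed symmetric, and the fingerprint uses only three simple descriptors (edge count, density, and degree sequence).

```python
from itertools import combinations

N_COMPOUNDS = 24  # small molecules in the profiling matrix
N_STRAINS = 10    # wild type plus nine deletion strains


def adjacency_from_replicates(rep1, rep2):
    """A[i][j] = 1 only if growth was absent or reduced in both replicates."""
    n = len(rep1)
    return [[1 if rep1[i][j] and rep2[i][j] else 0 for j in range(n)]
            for i in range(n)]


def edges_of(adjacency):
    """Undirected edges (i, j) with i < j, read from the upper triangle."""
    n = len(adjacency)
    return {(i, j) for i, j in combinations(range(n), 2) if adjacency[i][j]}


def fingerprint(adjacency):
    """A minimal topological fingerprint of one chemical-genetic network."""
    n = len(adjacency)
    edges = edges_of(adjacency)
    degrees = [0] * n
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    max_edges = n * (n - 1) // 2
    return {
        "edges": len(edges),
        "density": len(edges) / max_edges if max_edges else 0.0,
        "degree_sequence": sorted(degrees, reverse=True),
    }


if __name__ == "__main__":
    # Counts quoted in the text and in Fig. 6-23.
    print(len(list(combinations(range(N_COMPOUNDS), 2))))  # unique pairs
    print(N_COMPOUNDS * N_COMPOUNDS * N_STRAINS)           # total measurements
    # Hypothetical three-compound replicates for one strain.
    rep1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    rep2 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(fingerprint(adjacency_from_replicates(rep1, rep2)))
```

Computing one such fingerprint per strain gives a small table of descriptors that can be passed to the clustering and dimensionality-reduction methods described earlier.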
6.5
Future Development
For chemical genetics to truly compete with classical genetics, and for it to
function as a general approach to dissecting biological mechanisms, there
needs to be continued development and refinement of the techniques for
screening and assessing complex patterns of phenotypic changes. Besides
the specific examples of identifying inhibitors of mitosis and modulators of
protein deacetylation described above, it is worth noting the remarkable ability
of antibodies to detect posttranslational modifications of proteins and other
biosynthetic events that occur intracellularly at a single-cell level. Antibodies
differ from small molecules in their size, composition, and origin as they are
immunoglobulins composed of both heavy and light chains, which are secreted
by immune system cells. The ability to recognize epitopes, as small as a single
acetyl group within the context of chromatin or a single phosphate group on a
protein within the cytoplasm of cells, speaks of their specificity and power as
markers of phenotypes. The development of an expanded collection of cell-state
selective antibodies, and improved methods for multiplexing multiple probes
in parallel or in series would have widespread utility for chemical genetics
as part of cytoblot and image-based screens. Similarly, further development
of genetically encoded probes that allow for imaging of signaling events and
cellular processes in live cells in real time will open up previously unexplored
areas of cellular biology. In particular, the use of genetically encoded probes
targeted to specific cell populations will be useful for creating more complex
and physiologically relevant assays, particularly in animal models.
By aiming to provide information-rich profiles of chemical and biological
systems, chemical genetics should provide a framework for a number of
lines of deeper inquiry that will continue to challenge chemical biologists
for many years to come. One line of inquiry will be to investigate the
cellular mechanisms in terms of interaction(s) with a molecular target, a
cell, and an entire organism. A prerequisite for many of these studies and the
understanding of such chemotype-phenotype relations will be the discovery of
specific molecular targets of small molecules using proteome-wide approaches
(Fig. 6-9). With targets in hand, these efforts can be merged with structural
biology efforts to look at atomic resolution interactions, and an examination of
the degree to which specificity for targets influences the observed phenotypic
effects.
With the use of phenotypic descriptors derived from cell-based assays,
a second line of inquiry will be to determine how well traditional
statistical approaches involving linear and nonlinear regression can derive
structure-activity relationships, or whether alternative approaches, for
example, based on creating discrete graphical networks, are required
(Fig. 6-16). There also remains a paucity of studies addressing more general
properties of bioactive molecules, independent of those that are developed into
drugs. Furthermore, with the development of numerous natural-product-like
small molecules that are entering the realm of screening, and the noted
differences between many natural products and drugs, it remains to be seen
whether a strict adherence to rules, such as those developed by Lipinski based
on analyzing known drugs, continues to hold up as the best predictor of
biological activity for probe development and therapeutic drug discovery.
Lastly, it may be possible to search for a “molecular recognition
code(s)” that ultimately determines the mapping, both locally and globally,
between molecules in multidimensional molecular descriptor spaces and
multidimensional phenotypic descriptor spaces (Fig. 6-10). These codes may
be considered at a variety of levels, including more general categories that
allow the prediction of properties relevant to the interaction with different
subcellular structures (e.g., the mitochondria or cytoskeleton) or different
biological systems (e.g., the xenobiotic transformation systems involved in
drug metabolism). Knowledge of such codes would, as did knowledge of the
genetic code, usher in a new era of research and medical advances that would
allow the systematic modulation of gene-product function.
Besides these lines of inquiry, there are a number of “grand challenges” for
chemical genetics (Fig. 6-24). One of these grand challenges will undoubtedly
be to assay, in a high-throughput multiplexed manner, in real time, in live
cells, the signal transduction events leading from an extracellular stimulus, to
the intracellular signaling events that lead to a change in chromatin structure,
changes in gene expression, protein translation, and consequent biological
response. To return to its roots, perhaps the ideal model pathway for developing
this capability will be that of T-cell-receptor activation in lymphocytes, leading
to the activation of calcineurin, changes in chromatin remodeling at NF-AT
target genes, and the resulting secretion of IL-2, which were elucidated in
part as described above using CsA and FK506. Here, assays exist for many of
the steps in the pathway, although not yet in a suitable manner that allows
the interrogation of live cells and the measurement of changes in real time.
For the latter reasons, such an effort will require further advances in the use
of genetically-encoded or small-molecule fluorescent probes, and automated
imaging.
Fig. 6-24 Future challenges for forward chemical-genetic discovery of probes of
biological mechanisms.
A second “grand challenge” will be to test the hypothesis that there exists
a correlation between the connectivity of proteins in the underlying biological
network and the likelihood of finding a cognate small molecule by
chemical-genetic screening. As explained above, this will require substantial
development in the future of improved methods for target identification and
understanding the overall topology of biological networks (genetic and biochemical).
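One way such a hypothesis might be tested, once target-identification data exist, is sketched below in Python: compute each protein's connectivity (degree) in an interaction network and correlate it with whether a cognate small molecule was found. The network edges, the protein labels `P1` to `P5`, and the hit assignments are invented for illustration only, and the Pearson coefficient is just one possible choice of statistic.

```python
from collections import defaultdict
from math import sqrt


def degrees(edges):
    """Return the number of interaction partners (degree) of each protein."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy network: undirected protein-protein interactions.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
# Hypothetical screening outcome: 1 if a cognate small molecule was found.
hits = {"P1": 1, "P2": 1, "P3": 0, "P4": 0, "P5": 0}

deg = degrees(edges)
proteins = sorted(deg)
r = pearson([deg[p] for p in proteins], [hits[p] for p in proteins])
print(deg, round(r, 3))
```

With a binary outcome the Pearson coefficient is equivalent to a point-biserial correlation; a real analysis would need far larger networks and a significance test.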
A final “grand challenge” that, ideally, would be incorporated into a scheme
for assaying the effects of small molecules from the cell surface to the nucleus as
described above, would be to use immortalized human cell lines, or even
differentiated human stem-cell lines, that have been fully genotyped and are known
to provide a comprehensive sample of the major patterns of genetic diversity
for screening. With this set of cell lines as a reference set, it could then be
possible to determine whether individual or combinations of “SMPs” can reveal
phenotypic consequences of otherwise cryptic allelic differences that act in
concert to create complex, non-Mendelian traits associated with human disease.
Should this be possible, then chemical genetics will truly have proven its merit
and contributed to our understanding of genotype-phenotype relationships.
6.6
Conclusion
Indeed, the vista of the biochemist is one with an infinite horizon. And yet, this program of
explaining the simple through the complex smacks suspiciously of the program of explaining
atoms in terms of complex mechanical models. It looks sane until the paradoxes crop up and
come into sharper focus. In Biology we are not yet at the point where we are presented with
clear paradoxes and this will not happen until the analysis of the behavior of living cells has
been carried into far greater detail. This analysis should be done on the living cell’s own terms
and the theories should be formulated without fear of contradicting molecular physics.
Max Delbrück
Nobel Prize in Physiology or Medicine, 1969
Mendel’s rules for considering the discreteness and combinatorics of
inherited traits provided a foundation for classical genetics that has continued
to provide insight into genotype-phenotype relations and the nature of heredity
for more than a century [1]. By using small molecules to perturb biological
systems conditionally at the level of gene products, rather than at the level
of genes themselves, chemical genetics promises to complement the use of
classical genetic analysis to study a wide range of biological mechanisms and
systems [5-10]. Because of the confluence of recent technical and conceptual
developments, the field of chemical biology in general, and chemical genetics
in particular, is well poised to translate the discoveries made by genomic
and proteomic studies into tools and technologies that will be transformative
in basic and biomedical research. While earlier advances in the field have
previously come from molecular biology, chemical synthesis, and materials
science, future advances will require integration of the information derived
from computational studies of molecular structure and observational studies of
molecular function into global models that are both explanatory and predictive.
To this end, the analysis of multidimensional data derived from chemical
genetics, using methods of dimensionality-reduction and pattern-finding
techniques, is beginning to provide a computational framework for mapping
multidimensional, chemical descriptor spaces [74-77, 91]. Overall, these
techniques allow for the creation of higher-level representations of the
information inherent in the lower-level relational data encoded within matrices
of data. The systematic screening of small molecules in minimally redundant,
cell- and organism-based assays, which cover a wide range of biological
phenotypes relevant to basic and clinical research, will enable accurate
maps of chemical space to be constructed, which can be compared to those
derived from using computed molecular descriptors. Here, the use of global
methods of analysis, when coupled with local methods aimed at validating
and elucidating the mechanisms of action of reference markers (landmarks)
in these spaces, should allow, over time, for increasingly higher resolution
maps to be created, analogous to the progression of genetic maps over the past
century. As evidenced by the efforts toward the development of ChemBank
[92], Blueprint’s Small-Molecule Interaction Database (SMID) [93], and the
PubChem Database [94], the importance of computational science, and open
access to information on small-molecule activities and structures, to chemical
biology is rapidly growing and will continue to do so in the future. Through
continued refinement and development of new techniques, particularly for
target identification and understanding the influence of genotype on biological
activity of small molecules [95], it should be possible to annotate genomes, not
only by sequence analysis but also functionally using the language of organic
chemistry. Should it prove possible to use individual or combinations of SMPs
to reveal phenotypic consequences of otherwise cryptic allelic differences that
act in concert to create complex, non-Mendelian traits, chemical genetics
will truly have earned its name. As summarized by Max Delbrück, who
originally trained as a physicist under Niels Bohr, the vista of the chemical
biologist indeed “is one with an infinite horizon”. For this reason, the use
of forward chemical genetics to discover small-molecule probes for biological
mechanisms will likely continue to flourish in the years to come.
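As a minimal illustration of the dimensionality-reduction idea mentioned above, the following Python sketch projects a small compound-by-assay activity matrix onto its first principal component, found by power iteration on the covariance matrix. The matrix values, the grouping of compounds into two classes, and all function names are hypothetical; the methods used in the cited studies may differ.

```python
from math import sqrt


def center_columns(matrix):
    """Subtract each column (assay) mean from every row (compound)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix (assays x assays) of a centered matrix."""
    n_rows, n_cols = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n_rows - 1)
         for j in range(n_cols)]
        for i in range(n_cols)
    ]


def leading_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    n = len(cov)
    v = [1.0 / (i + 1) for i in range(n)]  # deterministic, non-uniform start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical activity profiles: rows are compounds, columns are assays.
profiles = [
    [0.9, 0.8, 0.1],  # compound A1
    [1.0, 0.7, 0.0],  # compound A2
    [0.1, 0.2, 0.9],  # compound B1
    [0.0, 0.1, 1.0],  # compound B2
]

centered = center_columns(profiles)
pc1 = leading_component(covariance(centered))
# One-dimensional "map": each compound's coordinate along the first component.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print(scores)
```

In this toy example the two A compounds and the two B compounds fall on opposite sides of the origin along the first component, which is the kind of higher-level representation that pattern-finding methods extract from a matrix of screening data.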
Acknowledgments
Members of the Schreiber Lab, the Broad Institute’s Chemical Biology
Program, and Michel Roberge are thanked for sharing their insight and
passion for chemical genetics. We apologize to our colleagues whose work we
were unable to cite for reasons of space constraints.
References
1. G. Mendel, Experiments in Plant Hybridization, Harvard University Press, Cambridge, 1963.
2. T.H. Morgan, A.H. Sturtevant, H.J. Muller, C.B. Bridges, The Mechanism of Mendelian Heredity, Henry Holt and Company, New York, 1915.
3. E.S. Lander, et al., Initial sequencing and analysis of the human genome, Nature 2001, 409, 860-921.
4. T.J. Mitchison, Towards a pharmacological genetics, Chem. Biol. 1994, 1, 3-6.
5. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nat. Rev. Genet. 2000, 1, 116-125.
6. S.L. Schreiber, The small-molecule approach to biology: chemical genetics and diversity-oriented organic synthesis make possible the systematic exploration of biology, Chem. Eng. News 2003, 81, 51-61.
7. K.M. Specht, K.M. Shokat, The emerging power of chemical genetics, Curr. Opin. Cell Biol. 2002, 14, 155-159.
8. S.L. Schreiber, Chemical genetics resulting from a passion for synthetic organic chemistry, Bioorg. Med. Chem. 1998, 6, 1127-1152.
9. B.R. Stockwell, Exploring biology with small organic molecules, Nature 2004, 432, 846-854.
10. S. Shang, D.S. Tan, Advancing chemistry and biology through diversity-oriented synthesis of natural product-like libraries, Curr. Opin. Chem. Biol. 2005, 9, 248-258.
11. J.R. Sharom, D.S. Bellows, M. Tyers, From large networks to small molecules, Curr. Opin. Chem. Biol. 2004, 8, 81-90.
12. H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, A.L. Barabasi, The large scale organization of metabolic networks, Nature 2000, 407, 651-654.
13. S. Maslov, K. Sneppen, Specificity and stability in topology of protein networks, Science 2002, 296, 910-913.
14. R. Albert, H. Jeong, A.L. Barabasi, Error and attack tolerance of complex networks, Nature 2000, 406, 378-382.
15. T.I. Lee, N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber, N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, J. Zeitlinger, E.G. Jennings, H.L. Murray, D.B. Gordon, B. Ren, J.J. Wyrick, J.B. Tagne, T.L. Volkert, E. Fraenkel, D.K. Gifford, R.A. Young, Transcriptional regulatory networks in Saccharomyces cerevisiae, Science 2002, 298, 799-804.
16. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251, 283-287.
17. S. Ho, N. Clipstone, L. Timmermann, J. Northrop, I. Graef, D. Fiorentino, J. Nourse, G.R. Crabtree, The mechanism of action of cyclosporin A and FK506, Clin. Immunol. Immunopathol. 1996, 80, S40-S45.
18. T. Kino, H. Hatanaka, M. Hashimoto, M. Nishiyama, T. Goto, M. Okuhara, M. Kohsaka, H. Aoki, H. Imanaka, FK-506, a novel immunosuppressant isolated from a Streptomyces. I. Fermentation, isolation, and physico-chemical and biological characteristics, J. Antibiot. 1987, 40, 1249-1255.
19. M.W. Harding, A. Galat, D.E. Uehling, S.L. Schreiber, A receptor for the immunosuppressant FK506 is a cis-trans peptidyl-prolyl isomerase, Nature 1989, 341, 758-760.
20. J. Liu, J.D. Farmer Jr, W.S. Lane, J. Friedman, I. Weissman, S.L. Schreiber, Calcineurin is a common target of cyclophilin-cyclosporin A and FKBP-FK506 complexes, Cell 1991, 66, 807-815.
21. J. Aramburu, J. Heitman, G.R. Crabtree, Calcineurin: a central controller of signalling in eukaryotes, EMBO Rep. 2004, 5, 343-348.
22. G.J. Hannon, J.J. Rossi, Unlocking the potential of the human genome with RNA interference, Nature 2004, 431, 371-378.
23. C.C. Mello, D. Conte Jr, Revealing the world of RNA interference, Nature 2004, 431, 338-342.
24. L.H. Hartwell, Twenty-five years of cell cycle genetics, Genetics 1991, 4, 975-980.
25. M.M. Metzstein, G.M. Stanfield, H.R. Horvitz, Genetics of programmed cell death in C. elegans: past, present and future, Trends Genet. 1998, 14, 410-416.
26. C. Nusslein-Volhard, E. Wieschaus, Mutations affecting segment number and polarity in Drosophila, Nature 1980, 287, 795-801.
27. M.C. Mullins, M. Hammerschmidt, P. Haffter, C. Nusslein-Volhard, Large-scale mutagenesis in the zebrafish: in search of genes controlling development in a vertebrate, Curr. Biol. 1994, 4, 189-201.
28. P.M. Nolan, J. Peters, M. Strivens, D. Rogers, J. Hagan, N. Spurr, I.C. Gray, L. Vizor, D. Brooker, E. Whitehill, R. Washbourne, T. Hough, S. Greenaway, M. Hewitt, X. Liu, S. McCormack, K. Pickford, R. Selley, C. Wells, Z. Tymowska-Lalanne, P. Roby, P. Glenister, C. Thornton, C. Thaung, J.A. Stevenson, R. Arkell, P. Mburu, R. Hardisty, A. Kiernan, A. Erven, K.P. Steel, S. Voegeling, J.L. Guenet, C. Nickols, R. Sadri, M. Nasse, A. Isaacs, K. Davies, M. Browne, E.M. Fisher, J. Martin, S. Rastan, S.D. Brown, J. Hunter, A systematic, genome-wide, phenotype-driven mutagenesis programme for gene function studies in the mouse, Nat. Genet. 2000, 25, 440-443.
29. R.T. Peterson, B.A. Link, J.E. Dowling, S.L. Schreiber, Small molecule developmental screens reveal the logic and timing of vertebrate development, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 12965-12969.
30. S.N. Bailey, D.M. Sabatini, B.R. Stockwell, Microarrays of small molecules embedded in biodegradable polymers for use in mammalian cell-based screens, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 16144-16149.
31. B.R. Stockwell, S.J. Haggarty, S.L. Schreiber, High-throughput screening of small molecules in miniaturized mammalian cell-based assays involving post-translational modifications, Chem. Biol. 1999, 6, 71-83.
32. K. Stegmaier, K.N. Ross, S.A. Colavito, S. O’Malley, B.R. Stockwell, T.R. Golub, Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation, Nat. Genet. 2004, 36, 257-263.
33. T.R. Hughes, M.J. Marton, A.R. Jones, C.J. Roberts, R. Stoughton, C.D. Armour, H.A. Bennett, E. Coffey, H. Dai, Y.D. He, M.J. Kidd, A.M. King, M.R. Meyer, D. Slade, P.Y. Lum, S.B. Stepaniants, D.D. Shoemaker, D. Gachotte, K. Chakraburtty, J. Simon, M. Bard, S.H. Friend, Functional discovery via a compendium of expression profiles, Cell 2000, 102, 109-126.
34. T.J. Mitchison, Small-molecule screening and profiling by using automated microscopy, Chembiochem 2004, 29, 33-39.
35. Z.E. Perlman, T.J. Mitchison, T.U. Mayer, High-content screening and profiling of drug activity in an automated centrosome-duplication assay, Chembiochem 2005, 6, 145-151.
36. Y. Feng, S. Yu, T.K. Lasell, A.P. Jadhav, E. Macia, P. Chardin, P. Melancon, M. Roth, T. Mitchison, T. Kirchhausen, Exo1: a new chemical inhibitor of the exocytic pathway, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 6469-6474.
37. T.J. Nieland, Y. Feng, J.X. Brown, T.D. Chuang, P.D. Buckett, J. Wang, X.S. Xie, T.E. McGraw, T. Kirchhausen, M. Wessling-Resnick, Chemical genetic screening identifies sulfonamides that raise organellar pH and interfere with membrane traffic, Traffic 2004, 5, 478-492.
38. T.R. Kau, F. Schroeder, S. Ramaswamy, C.L. Wojciechowski, J.J. Zhao, T.M. Roberts, J. Clardy, W.R. Sellers, P.A. Silver, A chemical genetic screen identifies inhibitors of regulated nuclear export of a Forkhead transcription factor in PTEN-deficient tumor cells, Cancer Cell 2003, 4, 463-476.
39. F.C. Schroeder, T.R. Kau, P.A. Silver, J. Clardy, The psammaplysenes, specific inhibitors of FOXO1a nuclear export, J. Nat. Prod. 2005, 68, 574-576.
40. K.M. Koeller, S.J. Haggarty, B.D. Perkins, I. Leykin, J.C. Wong, M.C. Kao, S.L. Schreiber, Chemical genetic modifier screens: small molecule trichostatin suppressors as probes of intracellular histone and tubulin acetylation, Chem. Biol. 2003, 10, 397-410.
41. S.J. Haggarty, K.M. Koeller, T.R. Kau, P.A. Silver, M. Roberge, S.L. Schreiber, Small molecule modulation of the human chromatid decatenation checkpoint, Chem. Biol. 2003, 10, 1267-1279.
42. J. Huang, H. Zhu, S.J. Haggarty, D.R. Spring, H. Hwang, F. Jin, M. Snyder, S.L. Schreiber, Finding new components of the target of rapamycin (TOR) signaling network through chemical genetics and proteome chips, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 16594-16599.
43. R.A. Butcher, S.L. Schreiber, A small molecule suppressor of FK506 that targets the mitochondria and modulates ionic balance in Saccharomyces cerevisiae, Chem. Biol. 2003, 10, 521-531.
44. J. Clardy, C. Walsh, Lessons from natural molecules, Nature 2004, 432, 829-837.
45. J. Handelsman, M.R. Rondon, S.F. Brady, J. Clardy, R.M. Goodman, Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products, Chem. Biol. 1998, 5, 245-249.
46. S.L. Schreiber, Target-oriented and diversity-oriented organic synthesis in drug discovery, Science 2000, 287, 1964-1969.
47. D.S. Tan, M.A. Foley, M.D. Shair, S.L. Schreiber, Stereoselective synthesis of over two million compounds having structural features both reminiscent of natural products and compatible with miniaturized cell-based assays, J. Am. Chem. Soc. 1998, 120, 8565-8566.
48. H.E. Blackwell, L. Perez, R.A. Stavenger, J.A. Tallarico, E. Cope Eatough, M.A. Foley, S.L. Schreiber, A one-bead, one-stock solution approach to chemical genetics: part 1, Chem. Biol. 2001, 8, 1167-1182.
49. P.A. Clemons, A.N. Koehler, B.K. Wagner, T.G. Sprigings, D.R. Spring, R.W. King, S.L. Schreiber, M.A. Foley, A one-bead, one-stock solution approach to chemical genetics: part 2, Chem. Biol. 2001, 8, 1183-1195.
50. G.P. Tochtrop, R.W. King, Target identification strategies in chemical genetics, Comb. Chem. High Throughput Screen. 2004, 7, 677-688.
51. M. Kijima, M. Yoshida, K. Sugita, S. Horinouchi, T. Beppu, Trapoxin, an antitumor cyclic tetrapeptide, is an irreversible inhibitor of mammalian histone deacetylase, J. Biol. Chem. 1993, 268, 22429-22435.
52. M. Yoshida, M. Kijima, M. Akita, T. Beppu, Potent and specific inhibition of mammalian histone deacetylase both in vivo and in vitro by trichostatin A, J. Biol. Chem. 1990, 265, 17174-17179.
53. J. Taunton, J.L. Collins, S.L. Schreiber, Synthesis of natural and modified trapoxins, useful reagents for exploring histone deacetylase function, J. Am. Chem. Soc. 1996, 118, 10412-10422.
54. J. Taunton, C.A. Hassig, S.L. Schreiber, A mammalian histone deacetylase related to the yeast transcriptional regulator Rpd3p, Science 1996, 272, 408-411.
55. M.S. Finnin, J.R. Donigian, A. Cohen, V.M. Richon, R.A. Rifkind, P.A. Marks, R. Breslow, N.P. Pavletich, Structures of a histone deacetylase homologue bound to the TSA and SAHA inhibitors, Nature 1999, 401, 188-193.
56. C.M. Grozinger, S.L. Schreiber, Deacetylase enzymes: biological functions and the use of small-molecule inhibitors, Chem. Biol. 2002, 9, 3-16.
57. B. Langley, J.M. Gensert, M.F. Beal, R.R. Ratan, Remodeling chromatin and stress resistance in the central nervous system: histone deacetylase inhibitors as novel and broadly effective neuroprotective agents, Curr. Drug Targets CNS Neurol. Disord. 2005, 4, 41-50.
58. C.J. Phiel, F. Zhang, E.Y. Huang, M.G. Guenther, M.A. Lazar, P.S. Klein, Histone deacetylase is a direct target of valproic acid, a potent anticonvulsant, mood stabilizer, and teratogen, J. Biol. Chem. 2001, 276, 36734-36741.
59. J.K. Chen, J. Taipale, M.K. Cooper, P.A. Beachy, Inhibition of hedgehog signaling by direct binding of cyclopamine to smoothened, Genes Dev. 2002, 16, 2743-2748.
60. J.K. Chen, J. Taipale, K.E. Young, T. Maiti, P.A. Beachy, Small molecule modulation of smoothened activity, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 14071-14076.
61. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12817-12821.
62. M.J. Marton, J.L. DeRisi, H.A. Bennett, V.R. Iyer, M.R. Meyer, C.J. Roberts, R. Stoughton, J. Burchard, D. Slade, H. Dai, D.E. Bassett Jr, L.H. Hartwell, P.O. Brown, S.H. Friend, Drug target validation and identification of secondary drug target effects using DNA microarrays, Nat. Med. 1998, 4, 1293-1301.
63. P.P. Sche, K.M. McKenzie, J.D. White, D.J. Austin, Display cloning: functional identification of natural product receptors using cDNA-phage display, Chem. Biol. 1999, 6, 707-716.
64. J. Labaer, N. Ramachandran, Protein microarrays as tools for functional proteomics, Curr. Opin. Chem. Biol. 2005, 9, 14-19.
65. H. Luesch, T.Y. Wu, P. Ren, N.S. Gray, P.G. Schultz, F.A. Supek, Genome-wide overexpression screen in yeast for small-molecule target identification, Chem. Biol. 2005, 12, 55-63.
66. T. Hughes, B. Andrews, C. Boone, Old drugs, new tricks: using genetically sensitized yeast to reveal drug targets, Cell 2004, 116, 5-7.
67. P.Y. Lum, C.D. Armour, S.B. Stepaniants, G. Cavet, M.K. Wolf, J.S. Butler, J.C. Hinshaw, P. Garnier, G.D. Prestwich, A. Leonardson, P. Garrett-Engele, C.M. Rush, M. Bard, G. Schimmack, J.W. Phillips, C.J. Roberts, D.D. Shoemaker, Discovering modes of action for therapeutic compounds using a genome-wide screen of yeast heterozygotes, Cell 2004, 116, 121-137.
68. G. Giaever, P. Flaherty, J. Kumm, M. Proctor, C. Nislow, D.F. Jaramillo, A.M. Chu, M.I. Jordan, A.P. Arkin, R.W. Davis, Chemogenomic profiling: identifying the functional interactions of small molecules in yeast, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 793-798.
69. H. Luesch, T.Y. Wu, P. Ren, N.S. Gray, P.G. Schultz, F. Supek, A genome-wide overexpression screen in yeast for small-molecule target identification, Chem. Biol. 2005, 12, 55-63.
70. S.J. Haggarty, T.U. Mayer, D.T. Miyamoto, R. Fathi, R.W. King, T.J. Mitchison, S.L. Schreiber, Dissecting cellular processes using small molecules: identification of colchicine-like, taxol-like, and other small molecules that perturb mitosis, Chem. Biol. 2000, 7, 275-286.
71. T.U. Mayer, T.M. Kapoor, S.J. Haggarty, R.W. King, S.L. Schreiber, T.J. Mitchison, Small molecule inhibitor of mitotic spindle bipolarity identified in a phenotype-based screen, Science 1999, 286, 971-974.
72. S. Hotha, J.C. Yarrow, J.G. Yang, S. Garrett, K.V. Renduchintala, T.U. Mayer, T.M. Kapoor, HR22C16: a potent small-molecule probe for the dynamics of cell division, Angew. Chem. Int. Ed. Engl. 2003, 42, 2379-2382.
73. S. DeBonis, D.A. Skoufias, L. Lebeau, R. Lopez, G. Robin, R.L. Margolis, R.H. Wade, F. Kozielski, In vitro screening for inhibitors of the human mitotic kinesin Eg5 with antimitotic and antitumor activities, Mol. Cancer Ther. 2004, 3, 1079-1090.
74. C.M. Dobson, Chemical space and biology, Nature 2004, 432, 824-828.
75. C. Lipinski, A. Hopkins, Navigating chemical space for biology and medicine, Nature 2004, 432, 855-861.
76. S.J. Haggarty, The principle of complementarity: chemical versus biological space, Curr. Opin. Chem. Biol. 2005, 9, 296-303.
77. D.K. Agrafiotis, Multiobjective optimization of combinatorial libraries, Mol. Divers. 2002, 5, 209-230.
78. D.K. Agrafiotis, V.S. Lobanov, F.R. Salemme, Combinatorial informatics in the post-genomics era, Nat. Rev. Drug Discov. 2002, 1, 337-346.
79. J.N. Weinstein, T.G. Myers, P.M. O’Connor, S.H. Friend, A.J. Fornace, K.W. Kohn, T. Fojo, S.E. Bates, L.V. Rubinstein, N.L. Anderson, J.K. Buolamwini, W.W. van Osdol, A.P. Monks, D.A. Scudiero, E.A. Sausville, D.W. Zaharevitz, B. Bunow, V.N. Viswanadhan, G.S. Johnson, R.E. Wittes, K.D. Paull Jr, An information-intensive approach to the molecular pharmacology of cancer, Science 1997, 275, 343-349.
80. S.J. Haggarty, K.M. Koeller, J.C. Wong, R.A. Butcher, S.L. Schreiber, Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays, Chem. Biol. 2003, 10, 383-396.
81. A.T. Balaban, Chemical Applications of Graph Theory, Academic Press, London, 1976.
82. S.J. Haggarty, P.A. Clemons, S.L. Schreiber, Chemical genomic profiling of biological networks using graph theory and combinations of small molecule perturbations, J. Am. Chem. Soc. 2003, 125, 10543-10545.
83. E. Hamel, Antimitotic natural products and their interactions with tubulin, Med. Res. Rev. 1996, 16, 207-231.
84. P.B. Schiff, J. Fant, S.B. Horwitz, Promotion of microtubule assembly in vitro by taxol, Nature 1979, 277, 665-667.
85. M.C. Wani, H.L. Taylor, M.E. Wall, P. Coggon, A.T. McPhail, Plant antitumor agents. VI. The isolation and structure of taxol, a novel antileukemic and antitumor agent from Taxus brevifolia, J. Am. Chem. Soc. 1971, 93, 2325-2327.
86. M. Roberge, B. Cinel, H.J. Anderson, L. Lim, X. Jiang, L. Xu, C.M. Bigg, M.T. Kelly, R.J. Andersen, Cell-based screen for antimitotic agents and identification of analogues of rhizoxin, eleutherobin, and paclitaxel in natural extracts, Cancer Res. 2000, 60, 5052-5058.
87. Y. Yan, V. Sardana, B. Xu, C. Homnick, W. Halczenko, C.A. Buser, M. Schaber, G.D. Hartman, H.E. Huber, L.C. Kuo, Inhibition of a mitotic motor protein: where, how, and conformational consequences, J. Mol. Biol. 2004, 335, 547-554.
88. T. Kouzarides, Acetylation: a regulatory modification to rival phosphorylation? EMBO J. 2000, 19, 1176-1179.
89. S.M. Sternson, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Synthesis of 7200 small molecules based on a substructural analysis of the histone deacetylase inhibitors trichostatin and trapoxin, Org. Lett. 2001, 3, 4230-4242.
90. S.J. Haggarty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small molecule inhibitor of HDAC6-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
91. S.J. Haggarty, P.A. Clemons, J.C. Wong, S.L. Schreiber, Mapping chemical space using molecular descriptors and chemical genetics: deacetylase inhibitors, Comb. Chem. High Throughput Screen. 2004, 7, 669-676.
92. ChemBank, 2006; http://www.broad.harvard.edu/chembio.
93. Blueprint’s Small-Molecule Interaction Database (SMID), 2006; http://smid.blueprint.org.
94. PubChem, 2006; http://pubchem.ncbi.nlm.nih.gov/.
95. A.B. Parsons, R. Geyer, T.R. Hughes, C. Boone, Yeast genomics and proteomics in drug discovery and target validation, Prog. Cell Cycle Res. 2003, 5, 159-166.
7
Reverse Chemical Genetics Revisited
7.1
Reverse Chemical Genetics - An Important Strategy for the Study of Protein
Function in Chemical Biology and Drug Discovery
Rolf Breinbauer, Alexander Hillisch, and Herbert Waldmann
7.1.1
Introduction
Drug discovery has seen several paradigm shifts over the last two decades.
Several new techniques have been introduced to widen what was believed
to be the bottleneck of this endeavor at the given time. Although many of
these techniques did not keep their initial promise, there is no doubt that
high-throughput screening (HTS) and protein structure-based drug design
have contributed enormously to the process of developing new high-affinity
protein binders and have made it more efficient. The sequencing of whole
genomes has provided numerous new potential drug targets. Unfortunately,
the undisputed value of these techniques has not (yet) led to an increase
in the number of new chemical entities entering the market. Spectacular
cases of costly failures of drug candidates in late-stage clinical trials
or, even worse, the withdrawal because of unanticipated side effects of
several drugs (e.g., COX-2 inhibitors) that had benefited millions of
patients have reminded us that the biological systems with which we are dealing are
extremely complex. Target validation has become the critical factor in drug
discovery. Consequently, all methods that contribute to a deeper understanding
of biological systems ranging from protein function within a cell to the
complex interplay within multicellular organisms will gain importance in the
future. Systems biology, although still in its infancy, might be one approach
to achieve this goal.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
The pharmacological approach, in which protein function is modulated by
small molecules, has played a prominent role in the study of biological systems.
Compared to other and complementary approaches, such as DNA knockouts,
antisense, or RNA interference, it has several advantages. The most significant
among them is the fact that small molecules probe biological systems at the
level of proteins. This aspect is shared only by antibodies, which are usually
limited to the interaction with extracellular proteins (Scheme 7.1-1).
DNA               mRNA            Proteins
- Gene knockout   - Antisense     - Small molecules
- DNA-binder      - RNAi
Scheme 7.1-1 Probing of biological systems on different levels of hierarchy.
In an analogy to related terms in mutation genetics, Schreiber and
Mitchison defined “forward chemical genetics” as the probing of biological
systems with small molecules and observing changes in phenotypes or
biomarkers. On the other hand, in “reverse chemical genetics” a small
molecule probe with validated affinity to a defined protein is used as a tool
to study the biological function of this particular protein within its natural
context [1-3].
7.1.2
The concept of reverse chemical genetics has been applied ever since natural
product probes were first discovered as research tools in biology. In
experiments on the salivary gland of the cat, J. N. Langley (in 1878)
showed the mutually antagonistic effect between pilocarpine and atropine.
He observed a similar relationship between nicotine and curare in his study
of the contraction of muscle cells. These results inspired him to formulate the
“receptor theory” of drug/target interaction, which has become the main pillar
of pharmacology [4]. Once it was realized that the toxicity of colchicine, the
poison of meadow saffron, originates from its ability to cause cell cycle arrest,
biologists exploited this property to create this condition intentionally
and study the biological consequences. The use of microtubule poisons has
enabled numerous important discoveries, such as the determination of the
correct number of diploid chromosomes in humans or the demonstration
of the role of microtubules in cell migration, tumor invasion, or anchoring
of the Golgi complex at the microtubule-organizing center [5]. Many other
such probes have been identified and, as shown in Table 7.1-1, the number
of references containing their names may serve as an indicator of how great
their impact on biological studies has been.
[Table 7.1-1: small-molecule probes and the number of references containing their names.]
But it is not only secondary metabolites that function as natural
poisons that have stimulated small molecule-fueled research of protein
function. In 1914, Henry Dale classified cholinergic receptors as being
either nicotinic or muscarinic on the basis of whether nicotine or muscarine
stimulated a response [6]. Similarly, Raymond Ahlquist explained the different
pharmacological actions of drugs on smooth muscle by postulating the existence
of two types of adrenoceptors. Noradrenaline was an α-receptor agonist
(causing smooth muscle to contract), whereas isoprenaline was a β-receptor
agonist (causing smooth muscle to relax). Adrenaline, which is a mixed
α/β-receptor agonist, exhibits both activities, depending on the site of action
(Fig. 7.1-1) [6, 7]. Today, 60 years later, these receptors have been recognized
to be membrane-located G-protein-coupled receptors (GPCRs) for which
several subtypes (α1-2, β1-3) and even subsubtypes have been identified. These
receptors represent some of the most important drug targets addressed
by current medications. Very selective inhibitors have been identified and
developed as drugs. For example, selective β-antagonists (“β-blockers”) have
saved millions of lives and have reached blockbuster status.
Fig. 7.1-1 Agonists of α- and β-adrenergic receptors: adrenaline (α/β-agonist), noradrenaline (α-agonist), and isoprenaline (β-agonist).
James Black, who was one of the most important contributors to the
development of the β-blockers, applied the lessons learnt there to the
development of the most successful drug of the 1980s. He and others
interpreted the observation that alkyl-substituted histamine analogs did not
exhibit equal activity on histamine receptors in different tissues as a result
of the existence of more than one histamine receptor. Indeed, it could be
shown that classical antihistamines in the treatment of inflammation affected
the so-called histamine H1-receptor, whereas in the stomach, a new type of
receptor named histamine H2-receptor was involved in the release of gastric
acid. Refinement of the early antihistamine compounds led to the development
of the selective H2-receptor antagonist cimetidine, which revolutionized the
treatment of ulcers (Fig. 7.1-2) [6].
Until the early 1980s small molecules played an important role in the
discovery of new proteins. Tissue-dependent differences in the responses to
drug candidates often indicated that several subtypes of a receptor might exist,
stimulating research in this direction. On the other hand, clinical observations
of the side effects of the drugs used revealed that other protein targets were
affected as well. By varying the structure, such a side effect could be optimized
to yield a new drug against a different disease. This approach is highlighted
by the classic example of the development of sulfonamide antibiotics into diuretics
targeting carbonic anhydrases, enzymes which had been characterized just a
few years before [8, 9]. For a long time, the search for proteins was guided
by the proposition that an observation made within a biological experiment
could be best explained if a corresponding protein existed. This meant that
in many cases essential features of its function were known before it was
identified.
In contrast, today, with the emergence of new techniques in molecular
biology, the dominant scenario is one in which new genes and proteins are
found for which no experimental evidence of function is available [10].
Sequence comparisons by bioinformatics tools often allow qualified guesses
about their potential functions by proposing functional relationships with
proteins of similar sequence. While sequencing a gene or a protein had
previously been a multiyear effort, it is now performed routinely and
offered by service groups within large research institutions or commercial
companies. The pending functional assignment of the many newly
sequenced proteins will benefit from a renaissance in the use of small
molecule probes.
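The idea of proposing a function from sequence similarity can be illustrated
with a minimal sketch. It computes the percent identity of two sequences that
are assumed to be already aligned. The function name percent_identity, the gap
symbol, and the two toy fragments are illustrative assumptions and do not
correspond to any real protein or bioinformatics tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical, pre-aligned toy fragments (not real protein sequences).
QUERY = "MKT-LLVAGG"
KNOWN = "MKTALLIAGG"
identity = percent_identity(QUERY, KNOWN)
```

A high identity to a protein of known function is only a starting hypothesis;
as the text notes, the functional assignment still has to be tested
experimentally.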
Fig. 7.1-2 Development of cimetidine as an H2-selective antagonist for the
treatment of ulcers. Structures shown: histamine (agonist) and cimetidine
(H2-antagonist).
7.1.3
The key element of any reverse chemical genetics approach is the access
to a small molecule, which modulates protein function by binding to the
target protein [11]. Such molecules can be identified using two different
approaches: (1) HTS of large compound collections and (2) computer-aided
design of compounds on the basis of the structure of the target protein,
directed synthesis, and biological testing of selected compounds.
1. High-throughput screening: HTS is used to test large numbers of
compounds for their ability to affect the activity of target proteins. Today,
entire in-house compound libraries with millions of compounds can be
screened with a throughput of 10,000 (HTS) up to 100,000 compounds per
day (ultra high-throughput screening, uHTS) using robust test assays [12, 13].
Homogeneous “mix and measure” assays are preferred for HTS as they avoid
filtration, separation, and wash steps that can be time consuming and difficult
to automate. Assays for HTS can be grouped into two categories: so-called
solution-based biochemical assays and cell-based assays [14, 15]. The former
are based on radioactive (scintillation proximity assay, SPA), fluorescence
(fluorescence resonance energy transfer, FRET, fluorescence polarization, FP,
homogeneous time resolved fluorescence, HTRF, and fluorescence correlation
spectroscopy, FCS), calorimetric and surface plasmon resonance (SPR, e.g.,
BiaCore) detection methods to quantify the interaction of test compounds with
biological target molecules. SPAs in HTS have largely replaced heterogeneous
assays that make use of radiolabeled ligands with subsequent filtration steps to
measure high-affinity binding to receptors. Cell-based assays include (a) second
messenger assays that monitor signal transduction, (b) reporter gene assays
that monitor cellular responses at the transcriptional/translational level, (c) cell
proliferation assays that detect induction or inhibition of cell growth, and
(d) phenotypic assays that monitor change in cell morphology or related
parameters.
Once a robust test assay has been set up, the choice of suitable compound
libraries is the next key step. An excellent source of selective small molecule
probes is the natural product pool. In an evolutionary process of millions of
years, nature has come up with molecular structures that offer an evolutionary
advantage to the species that makes the effort to synthesize these molecules. In
most cases, these molecules are used to defend against enemies or to paralyze
or kill prey. It is in the nature of these processes that such molecular weapons
act most efficiently if they interfere with important biological processes of the
target species, meaning that biologically relevant protein targets are addressed.
A disadvantage of natural compounds is the often complex structure and
the associated low synthetic accessibility. However, as has been outlined in
Chapter 7.1.2, natural products were the first small molecule probes
used in biological studies and continue to be of significant importance (vide
infra). Recently, the combination of chemoinformatics, bioinformatics, and the
chemistry of natural products has led to the insight that natural products can
be regarded as evolutionarily selected starting points in chemical space and to
the establishment of “natural product guided compound library development”
[16, 17]. Historically grown libraries of synthetic compounds or compounds
from combinatorial chemistry approaches are usually the first choice in the
pharmaceutical industry for HTS. Every large pharmaceutical company and an
increasing number of startup companies and research institutions now have
access to a collection of these compounds. These collections have been built by
in-house synthetic efforts, purchased from commercial vendors, or obtained
by the synthesis of compound libraries using combinatorial methods [18].
2. Computer-assisted drug design: Small molecule probes can also be identified
or designed from scratch using computational tools exploiting knowledge of
pharmacophores or the protein structure as a guiding principle. Computational
tools encompass 3D-pharmacophore searches and high-throughput docking
[17, 191. In 3D-database searching, structures of compounds from virtual
or physically existing libraries are screened to identify compounds that
fulfill a certain spatial arrangement of functional groups (a pharmacophore).
High-throughput docking involves the in silico docking of small molecules
into binding sites of target proteins with known or predicted structure.
Empirical scoring functions are used to evaluate the steric and electrostatic
complementarity (the fit) between the compounds and the target protein.
The highest ranked compounds are then suggested for biological testing.
These software tools are attractive and cost-effective approaches to generate
chemical lead structures, virtually and before committing expensive synthetic
chemistry. Furthermore, they allow rapid and thorough understanding of the
relationship between chemical structure and biological function. Depending
on the software used, the virtual screening of small molecules normally takes
less than a minute per chemical structure per computer processor (CPU)
[17]. Utilizing clusters of CPUs results in a high degree of parallelization. The
throughput with 100 parallel CPUs is even higher than that of current
uHTS technologies. The main advantage is that the method does not depend
on the availability of compounds, meaning that not only in-house libraries can
be searched but also external or virtual libraries. The application of scoring
functions on the resulting data sets facilitates smart decisions about which
chemical structures bear the potential to exhibit the desired biological activity.
On the other hand, the high-throughput docking approach can only be applied
to protein targets for which structural information based on X-ray
crystallography, nuclear magnetic resonance (NMR), or homology models is
available.
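The throughput comparison made above can be checked with a short calculation.
The sketch below takes the figures quoted in the text (at most one minute per
structure per CPU, 100 CPUs, and 100,000 compounds per day for uHTS) and
assumes, purely for illustration, that the cluster runs around the clock and
scales perfectly with the number of CPUs. The function name docking_per_day is
hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
UHTS_PER_DAY = 100_000  # uHTS throughput quoted in the text


def docking_per_day(cpus: int, minutes_per_structure: float) -> float:
    """Structures docked per day, assuming 24 h operation and ideal scaling."""
    if cpus <= 0 or minutes_per_structure <= 0:
        raise ValueError("cpus and minutes_per_structure must be positive")
    return cpus * MINUTES_PER_DAY / minutes_per_structure


# Upper bound from the text: one minute per structure per CPU, 100 CPUs.
cluster_rate = docking_per_day(cpus=100, minutes_per_structure=1.0)
exceeds_uhts = cluster_rate > UHTS_PER_DAY

# Docking time per structure at which 100 CPUs merely match uHTS.
break_even_minutes = 100 * MINUTES_PER_DAY / UHTS_PER_DAY
```

Under these assumptions 100 CPUs process 144,000 structures per day, and the
advantage over uHTS disappears only if docking takes longer than about 1.44
minutes per structure.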
Once a hit compound has been identified, its specificity for the protein target
has to be assessed. Ideally, the small molecule should exhibit perfect
selectivity toward the protein of interest. In reality, it is likely that none
of the small molecule probes used today fulfills this requirement. Compounds
that had previously been thought to be specific have turned out to hit further
protein targets once they were subjected to screens against other proteins. In
the light of new technological opportunities, and prompted by the failure of
drugs in clinical trials or practice due to off-target activity, efforts have
been initiated to reinvestigate the biological activity of existing drugs or
interesting chemical compounds and to annotate their activity against as many
proteins as are available. An example of a pioneering effort in this direction
is the proteomic analysis of the selectivity of kinase inhibitors by the groups
of Meijer, Daub, and Lockhart [20-23].
As the development of protein assays progresses rapidly and leads to
improvements in quality and quantity of information and a significant increase
in scope of screened protein targets, the door for full annotation of chemical
compounds has been opened. Screening the hit compound against many
protein targets has become imperative for two reasons: First of all, lack of
selectivity might be addressed by preparation of a second generation compound
library using the methods described above, and secondly, if this process does
not lead to further improvement, knowledge about the off-target promiscuity
of a small compound probe will allow a careful and critical interpretation of
the results of the biological studies carried out with this probe (Scheme 7.1-2).
The small molecule probe that has been selected by the process detailed
above is then used as a tool in a series of biological studies, exploiting the
whole repertoire of modern molecular and cell biology, such as genomic or
proteomic profiling, imaging techniques, or functional readouts [24].
Other techniques that are used for the assignment of gene function involve
the preparation of DNA mutants or gene knockouts, the application of
gene silencing via antisense probes, or RNA interference [25]. As shown
in Scheme 7.1-1, biological systems are probed with these strategies at the
level of genetic information or transcriptional expression. Consequently, the
main advantage of these genetic techniques is the pronounced, in many
cases even absolute, specificity with which they allow the probing of biological
systems (Table 7.1-2). On the other hand, reverse chemical genetics has several
unique advantages complementing these genetic techniques [26, 27]:
Table 7.1-2 Comparison of different strategies to probe biological systems
Property Gene knockout RNA interference Small molecule
Rate of action x - +++
Specificity +++ ++ 0
Tunability - 0 +++
Cost of individual experiment - - ++
Time to set up experiment - ++
Reversibility + ++
Developmental studies + +++
+: positive, -: negative, 0: neutral, x: not relevant
Scheme 7.1-2 Flow scheme for a reverse chemical genetics approach.
• The effect of small molecules is rapid (high temporal control
  of the experiment).
• The concentration of small molecules can in many cases be
  spatially controlled and monitored.
• The effect is tunable. By varying the concentration, different
  degrees of phenotype expression can be created.
• In most cases the biological effect is reversible (due to
  metabolism or excretion), which allows transient study of
  protein function.
• The effect is conditional. It can be initiated at any stage during
  the development of an organism. In contrast, a gene knockout
  that is lethal for embryonic development cannot be studied in
  an adult organism.
• Knockout studies cannot differentiate between different
  protein forms that result from the same gene. Small
  molecules should, in principle, be able to distinguish between
  the different forms. Small molecules can even stabilize
  protein structures in different conformations (agonists or
  antagonists, respectively), allowing gain-of-function as well as
  loss-of-function studies to be performed.
• As the ligand-binding sites of a protein in many cases exhibit
  very high structural similarity across species, the same
  small molecule probe can be used for studies in different
  species, whereas any genetic experiment would have to be
  adapted to the different genetic repertoire.
• The effect can be studied by anyone who has access to the
  small molecule probe (simple reproducibility).
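The tunability of small molecule probes can be made concrete with a simple
model. The sketch below assumes a one-site equilibrium binding model, in which
the fraction of target occupied is c / (c + Kd). This model, the function name
fractional_occupancy, and the dissociation constant of 100 nM are illustrative
assumptions, not values or methods taken from the studies discussed here.

```python
def fractional_occupancy(conc_nm: float, kd_nm: float) -> float:
    """Fraction of target bound for a one-site equilibrium: c / (c + Kd)."""
    if conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_nm / (conc_nm + kd_nm)


KD_NM = 100.0                      # hypothetical dissociation constant
doses_nm = [10.0, 100.0, 1000.0]   # hypothetical probe concentrations
occupancies = [fractional_occupancy(c, KD_NM) for c in doses_nm]
```

In this toy model a hundredfold change in probe concentration moves the
occupied fraction of the target from roughly 9% to roughly 91%, which is the
kind of graded intervention that a gene knockout cannot provide.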
Recently, several techniques have been introduced that combine the
experimental advantages of chemical probes with the specificity of genetic
methods. Conklin et al. have established the “receptors activated solely by
synthetic ligands” (RASSL) approach for the study of G-proteins in vivo. In
one example they removed the third extracellular loop of the κ opioid receptor
(KOR), which reduced the binding affinity of the natural endogenous peptide
ligand dynorphin to <0.05%, while maintaining affinity for small molecule
κ agonists that have a different binding pocket close to the transmembrane
region [28].
The human genome encodes >500 kinases, many of them playing important
roles in key processes such as cell signaling and cell division. Although
all kinases have an ATP-binding pocket, which qualifies them for small
molecule binding, the structural similarity of these ligand-binding sites
renders specificity almost impossible. Shokat et al. have developed an elegant
approach that allows allele-specific chemical intervention in kinases.
A promiscuous kinase inhibitor was modified with a bulky substituent, which
prevented binding to the regular ATP-binding sites of native kinases. Almost
all kinases exhibit a hydrophobic residue at the ATP-binding site, which
functions as the “gatekeeper”. Mutational replacement of the gatekeeper
residue by Gly does not affect the regular activity of the kinase, but
opens the site to the bulky inhibitor, which interacts only with the sensitized
kinases. Shokat et al. used this technique, for instance, to show that there
are significant phenotypic differences between the rapid loss of activity by
inhibition and the deletion of the genomic copy of the cyclin-dependent kinase
Pho85 [29, 30].
7.1.4
Since a comprehensive description of all reverse chemical genetics
investigations carried out is beyond the scope of this chapter,
we will highlight several notable examples from seven case studies, which
exemplify key elements of this approach. Many other important contributions,
such as the seminal work of the Schreiber group in revealing the chemical
biology of immunophilins and histone deacetylases, and the preparation of
subtype-selective agonists of the somatostatin receptor through combinatorial
chemistry by researchers from Merck, are listed in Table 7.1-3. A recently
published review article describes forward and reverse chemical genetics
related to cell division, cytoskeleton, protein trafficking, and the ubiquitin-
proteasome pathway [31].
Case Study 1: Isotype-Selective Small Molecule Probes for Orphan Nuclear
Receptors (GW4064 and Farnesoid X Receptor)
To date, 48 nuclear receptors have been identified in the human genome. Each
of these receptors contains the signature DNA-binding and/or ligand-binding
domain (LBD). However, only 12 receptors bind to the classical steroid and
retinoid hormones, and the remaining 36 have been designated as orphan
nuclear receptors. Researchers from GlaxoSmithKline Inc. used HTS of natural
compound and combinatorial chemistry libraries to deorphanize selected
members of the nuclear receptor family [49, 50]. The farnesoid X receptor (FXR)
has been shown to be weakly activated by farnesol. However, this effect is only
indirect since farnesol does not bind to the receptor. Screening of a collection
of naturally occurring steroids revealed that FXR is a receptor for bile acids, with
Table 7.1-3 Selected examples for reverse chemical genetics
Small molecule probes Comments References
Cytochalasin, latrunculin Inactivates actin (cytoskeleton)
Cyclosporin, FK506, rapamycin Calcineurin, FRAP, TOR pathway (signal transduction)
Trichostatin A, tubacin, histacin Histone deacetylase (gene expression)
Uretupamine Ure2p (glucose signaling)
MT1-2 agonists and antagonists Melatonin receptors (cell signaling)
Kinase inhibitors Raf/MAP kinase pathway (cell signaling)
SST1-5 selective agonists Somatostatin receptors (cell signaling)
Src-kinase inhibitors Maturation of T-cell contacts
SAG Smo protein (Hedgehog signaling)
Monastrol kinase inhibitors Aurora kinases (cell division)
Tunicamycin Glycoprotein biosynthesis
experiments have been aiding in gaining insight into estrogen signaling,
additional information on the function of ERα and ERβ was provided by
the application of isotype-selective ER agonists. These compounds include
the ERα-selective agonist propyl pyrazole triol (PPT) [55], the ERβ-selective
agonist diarylpropionitrile (DPN) [56], and the benzoxazole derivative
ERB-041 [57]. On the basis of the crystal structure of the ERα-LBD and a
homology model of the ERβ-LBD (59% sequence identity to ERα) [58], Hillisch
et al. designed steroidal ligands that exploited the differences in size and
flexibility between the two ligand-binding cavities (Fig. 7.1-4). Computer-aided
drug design methods were used to dock compounds into the binding pockets.
Compounds predicted to bind preferentially to either ERα or ERβ were
synthesized and tested in vitro. This approach led directly to potent ligands
with high ER isotype selectivity (200-250 fold). To unravel the physiological
roles of each of the two receptors, in vivo experiments with rats were conducted
using the ERα- and ERβ-selective agonists in comparison to the natural ligand,
17β-estradiol. The compounds were administered to Wistar rats using osmotic
pumps to overcome pharmacokinetic deficiencies of these tool compounds.
A specifically developed, highly sensitive RIA (radioimmunoassay) allowed
the detection and quantification of the compounds in systemic circulation [59].
The ERα agonist 16α-LE2 was shown to be responsible for most of the known
estrogenic effects such as induction of uterine growth, and bone-protective,
pituitary, and liver effects. In addition, the compound showed positive
effects on blood vessels in ovariectomized spontaneously hypertensive rats,
on endothelium-dependent NO-mediated vasorelaxation, and on e-NOS
(endothelial nitric oxide synthase) expression [59]. The ERβ agonist 8β-VE2
was shown to stimulate early folliculogenesis, decrease follicular atresia,
induce ovarian gene expression, and stimulate late follicular growth,
accompanied by an increase in the number of ovulated oocytes in
hypophysectomized rats and gonadotropin-releasing hormone antagonist-treated
mice [60]. Affymetrix analysis revealed the expression of a considerable number
of genes to be strongly modulated in the ovary by treatment of juvenile rats
with the natural hormone estradiol (E2) and the tool compound 8β-VE2, among
these cellular retinoic acid binding protein II (CRABP-II), α-L-fucosidase
(ALFUC), calcium-binding protein (CaBP), prostacyclin synthase (PGIS), and
inhibin α. These experiments revealed several new aspects of estrogen signaling
and stimulated further research. Use of the ERβ agonist might provide
clinicians with a new option for tailoring classical ovarian stimulation
protocols. These studies show that it is possible to design highly selective
compounds if structure information on all relevant homologs of the target is
available, and that the designed tool compounds contribute essentially to the
elucidation of the physiological roles of the target protein.
Fig. 7.1-4 Isotype-selective probes for ERα and ERβ. Reprinted with permission
from The Endocrine Society [58].
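Isotype selectivity of the kind quoted above (200-250 fold) is commonly
expressed as a ratio of binding or potency values. The following sketch shows
the arithmetic only. The function name fold_selectivity and the numbers in the
panel are hypothetical and were chosen to give a round result; they are not
measured values for any of the compounds discussed.

```python
from typing import Dict


def fold_selectivity(panel_nm: Dict[str, float], target: str) -> Dict[str, float]:
    """Ratio of each off-target value to the target value (lower value = more potent)."""
    reference = panel_nm[target]
    if reference <= 0:
        raise ValueError("target value must be positive")
    return {
        name: value / reference
        for name, value in panel_nm.items()
        if name != target
    }


# Hypothetical affinities in nM for one compound against two receptor isotypes.
example_panel = {"ER_alpha": 1.0, "ER_beta": 200.0}
selectivity = fold_selectivity(example_panel, target="ER_alpha")
```

With these illustrative numbers the compound would be described as 200-fold
selective for the first isotype over the second.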
Case Study 3: Deorphanizing Receptors by Reverse Pharmacology (Orexins and
GPCRs)
The sequencing of the human genome has resulted in the identification of
300-400 nonolfactory GPCRs, for most of which an endogenous ligand has
not yet been identified (“orphan receptors”). GPCRs respond to a variety of
signals, including photons, biogenic amines, lipids, or peptides. The biological
activity of all known small regulatory peptides (small peptide hormones and
neuropeptides) is associated with their action on GPCRs. It is believed that,
for most orphan GPCRs, peptides are the unidentified signaling molecules.
To understand the biological significance of the many GPCRs in the human
genome, deorphanization is a goal of utmost importance. Sakurai et al. have
demonstrated that “reverse pharmacology” is a powerful strategy to accomplish
this task. After generating over 50 transfectant cell lines, each expressing a
distinct orphan GPCR, they challenged the cells with HPLC (high performance
liquid chromatography) fractions of extracts derived from different tissues and
monitored a number of signal transduction readouts for G-protein activation.
In such an experiment, they observed interesting initial activity in an extract
from rat brain. Several rounds of reversed-phase HPLC purification revealed
a 33-amino-acid peptide as the active substance, which received the name
orexin-A. The corresponding receptor received the name orexin receptor
(Greek: orexis = appetite). Further investigations showed that
two substances, orexin-A and orexin-B, exist, both exhibiting intramolecular
disulfide bridges, which activate two receptors A and B that are found mainly
in the brain [61]. A combination of chemical, genetic, and physiological studies
revealed that these peptides stimulate food consumption and that their
production is influenced by the nutritional state of a test animal. The
discovery of orexin deficiency in narcoleptic patients showed that orexins play
an important role in the regulation of sleep and wakefulness [62].
The strategy of “reverse pharmacology” has turned out to be a generally
applicable and productive approach for the deorphanization of GPCRs [63].
For example, it has been used for the functional annotation of the receptors
Drostar-1 and Drostar-2, for which a role in visual information processing has
been identified [64].
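The screening logic of reverse pharmacology, in which many receptor-expressing
cell lines are challenged with many tissue fractions, amounts to searching a
response matrix for signals above background. The sketch below is a minimal
illustration of that bookkeeping. The receptor and fraction names, the readout
values (fold over baseline), and the threshold are all hypothetical and are not
data from the studies cited.

```python
from typing import Dict, List, Tuple


def find_active_pairs(
    responses: Dict[str, Dict[str, float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return (receptor, fraction, signal) for signals at or above threshold, strongest first."""
    hits = [
        (receptor, fraction, signal)
        for receptor, by_fraction in responses.items()
        for fraction, signal in by_fraction.items()
        if signal >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical readouts: fold change over baseline for each cell line and fraction.
example_responses = {
    "orphan_receptor_1": {"brain_fraction_7": 1.1, "brain_fraction_8": 6.3},
    "orphan_receptor_2": {"brain_fraction_7": 0.9, "brain_fraction_8": 1.2},
}
active = find_active_pairs(example_responses, threshold=3.0)
```

An active receptor and fraction pair identified in this way is the starting
point for the further rounds of purification described above.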
Case Study 4: Isoform Selective Inhibitor Made by Combinatorial Chemistry
Unravels the Roles of Isoforms In Vivo (Granzymes A and B)
Natural killer (NK) cells and cytotoxic T lymphocytes (CTL) are the primary line
of defense against viruses and other intracellular pathogens in the immune
system. The cytotoxic lymphocytes recognize infected host cells and kill them
with the help of the pore-forming protein perforin and by proteolytic events
carried out by members of the granzyme family of serine proteases. Although
an essential component of immunity under normal conditions, aberrant
cytotoxic lymphocyte activity has been associated with autoimmune disorders
such as rheumatoid arthritis, diabetes, or allograft rejection [65].
Craik and Mahrus applied a reverse chemical genetics approach to reveal
the role of the most important granzymes A and B in cell lysis, as two
classical approaches of cell biology have led to contradictory results: Cytotoxic
lymphocytes from knockout mice (lacking either granzyme A, granzyme B, or
both) behave relatively normally in their ability to lyse target cells. On the other
hand, a reconstituted system in which target cells are treated with sublytic
levels of perforin and either granzyme A or granzyme B leads to efficient cell
lysis. This discord in findings could result from the well-known limitations
of these two approaches: It is known that the results from genetic deletion
studies are obscured by compensation effects of similar genes, whereas in
reconstituted systems the concentrations and mode of delivery of the agents
can be nonphysiological.
Craik and Mahrus used a positional scanning approach to prepare two
isozyme-specific phosphonate inhibitors as affinity labels of granzymes A and
B (Fig. 7.1-5). Both inhibitors were tested against a panel of all known human
granzymes (A, B, H, K, and M) and exhibited activity only against their target
protein. Use of these activity-based probes in cytotoxicity assays then allowed
dissection of the contribution of granzymes A and B to lysis of target cells by
NK cells. Granzyme B functions as a major effector of target cell lysis, whereas
granzyme A is only a minor effector in the same process. The difference
between the outcome of the reverse chemical genetics approach and the above
mentioned conventional experiments might be a consequence of the fact that
in pharmacological studies high temporal control circumvents compensation,
and also because no alterations are made to the concentrations and mode of
delivery of granzymes and perforin.
Fig. 7.1-5 Isozyme-selective probes for reverse chemical genetics of granzymes
A and B: probe A (granzyme A-selective) and probe B (granzyme B-selective).
Case Study 5: Design of an Inhibitor of a Protein to Study Protein Function in a
Cell (Raspalin 3 and APT1)
The observation that the Ras proteins are critically involved in the development
of cancer has spurred substantial interest in developing new classes of
antitumor drugs on the basis of interference with the impaired signal
transducing activities of Ras. The Ras proteins belong to the class of proteins
whose biological activity is dependent on lipid modification. In the normal
and oncogenic state, the H- and N-Ras isoforms are anchored to the plasma
membrane by means of S-farnesylation and S-palmitoylation at their C-
terminus, which are required to exert their full biological activity. While
inhibition of the enzyme farnesyltransferase is known and has become a drug
target for intervention of tumors carrying a mutation in the Ras oncogene, the
enzyme responsible for the palmitoylation of Ras and other G-proteins has
not been identified so far.
The only known “bona fide player” in Ras-palmitoylation was acyl protein
thioesterase 1 (APT1), which depalmitoylates H-Ras and other lipidated
proteins [66]. However, its relevance to Ras biology was unclear. In an attempt
to elucidate the biological role of APT1, the groups of Giannis, Kuhlmann,
and Waldmann followed a chemical genetics approach, that is, they developed a
potent inhibitor of APT1 to perform a chemical knockout of the protein in
cellular assays and to study the subsequent response of the biological system.
Peptidomimetics that imitate the C-terminus of the H-Ras protein and embody
different lipidation patterns, in particular a nonhydrolyzable sulfonamide as
an analog of the palmitic acid thioester, were designed and investigated as
inhibitors of APT1, among which Raspalin 3 emerged as the most useful
inhibitor (Fig. 7.1-6) [67].
Fig. 7.1-6 Raspalin 3, an inhibitor of APT1 (APT1: IC50 = 148 nM).
Raspalin 3 was then used in experiments employing the neuronal
precursor cell line PC12, in which semisynthetic Ras proteins modified
with fluorescent probes played a major role (Fig. 7.1-7). Cell-biological
experiments with these protein conjugates had shown that if a farnesylated
yet still palmitoylatable Ras protein (that is, with a free and palmitoylatable
cysteine-SH) was microinjected into PC12 cells, the cellular machinery
would carry out the palmitoylation, resulting in localization of the protein
at the plasma membrane and neurite outgrowth from the cells. It was
to be expected that APT1, through depalmitoylation, should antagonize this
process, leading to reduced neurite outgrowth. Consequently, inhibition of
the depalmitoylating thioesterase by the newly designed inhibitors should
lead to an increase in neurite formation. However, whether microinjected or
added to the culture medium, the APT1 inhibitor surprisingly caused
reduced formation of neurites. Thus, this compound did not
behave as an inhibitor of Ras-depalmitoylation but rather as an inhibitor
of Ras-palmitoylation. This finding was backed up by employing a different
semisynthetic Ras protein that is biologically active yet neither
palmitoylatable nor depalmitoylatable (it embodies a stable hexadecyl thioether
instead of a labile palmitic acid thioester and was synthesized employing the
methods described above). Use of yet another semisynthetic Ras protein, one
that is palmitoylatable and additionally fluorescently labeled, in the PC12
cell assay and inspection of the cells by confocal laser fluorescence
microscopy showed that in the presence of the inhibitor the Ras protein is no
longer localized to the plasma membrane but rather accumulates in
intracellular membranes (Fig. 7.1-8), as expected if palmitoylation and not
depalmitoylation was inhibited. Taken together, these findings indicated
that APT1 may be involved in mediating both Ras-depalmitoylation and
Ras-palmitoylation.
Fig. 7.1-7 Reduction of PC12 cell differentiation rate by Raspalin in the PC12
differentiation assay.
Fig. 7.1-8 Inhibition of plasma membrane localization of fluorescently labeled
Ras protein by Raspalin 3. Localization of the fluorescent lipoprotein was
monitored 7 h after microinjection by confocal microscopy. Although Ras
protein alone shows a distinct staining of the plasma membrane (a),
coinjection of 2 µM inhibitor Raspalin 3 results in an accumulation of the
lipoprotein in cytoplasmic structures, which is typical for nonpalmitoylatable
Ras constructs (b).
Case Study 6: Rationally Designed Isoform Selective Inhibitor Exhibiting a New
Clinical Aspect of the Protein Target (Viagra and PDE5)
Cyclic guanosine monophosphate (cGMP) is the ubiquitous second messenger
for GPCRs activated by endogenous substances such as nitric oxide (NO)
and atrial natriuretic factor (ANF). Intracellular levels of cGMP are
controlled by cyclic nucleotide cyclases (synthesis of cGMP from GTP) and
phosphodiesterases (PDE) (hydrolysis of cGMP to inactive GMP). Among
at least seven families of PDEs, PDE5 is a calcium/calmodulin insensitive
cGMP PDE, occurring in the lung, platelets, and in various forms of smooth
muscles. A research team at Pfizer/UK was of the opinion that a selective
PDE5 inhibitor would preserve tissue levels of cGMP and hence would
potentiate the vasodilator and natriuretic effects of ANF. Therefore, such
a PDE5 inhibitor would show potential for the treatment of hypertension
and other cardiovascular indications [68]. Starting from an unselective lead
substance, a medicinal chemistry approach led to sildenafil showing, at that
time, an unprecedented selectivity over other PDE isoenzymes (Fig. 7.1-9).
Despite encouraging results in the laboratory, the clinical results in coronary
heart disease were disappointing. Surprisingly, several participants in a trial of
sildenafil on 30 men in Merthyr Tydfil/Wales refused to return their unused
tablets when the trial was stopped. On questioning by the physician in charge, it
emerged that the patients had discovered that PDE5 is the predominant cGMP
hydrolyzing activity in the cytosolic fraction from human corpus cavernosum
[6]. As penile erection is mediated by NO and thus cGMP, sildenafil improves
erection by enhancing relaxation of the corpus cavernosal smooth muscle
(Scheme 7.1-3). Sildenafil (Viagra™) revolutionized the treatment of male
erectile dysfunction and became a blockbuster drug in the market. Follow-up
drugs exhibit even higher potency and isozyme selectivity, potentially reducing
some of the unwanted side effects of sildenafil.
Fig. 7.1-9 Structure and isozyme selectivity of sildenafil (Viagra™).
Scheme 7.1-3 NO-signaling pathway interfered with by sildenafil: NO stimulates
the conversion of GTP to cGMP, which leads to smooth muscle relaxation and
erection; sildenafil inhibits the hydrolysis of cGMP to GMP.
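The reasoning behind PDE5 inhibition, namely that cGMP levels reflect a balance
between synthesis and hydrolysis, can be illustrated with a toy steady-state
model. The sketch assumes a constant synthesis rate and first-order hydrolysis,
so that the steady-state level is synthesis / (k × (1 − inhibition)). The
model, the function name steady_state_cgmp, and all parameter values are
illustrative assumptions and not measured quantities.

```python
def steady_state_cgmp(
    synthesis_rate: float, hydrolysis_rate_const: float, pde_inhibition: float
) -> float:
    """Steady-state level for constant synthesis and first-order hydrolysis.

    pde_inhibition is the fraction of hydrolytic activity that is blocked (0 <= x < 1).
    """
    if synthesis_rate < 0 or hydrolysis_rate_const <= 0:
        raise ValueError("rates must be non-negative and the rate constant positive")
    if not 0.0 <= pde_inhibition < 1.0:
        raise ValueError("pde_inhibition must be in [0, 1)")
    return synthesis_rate / (hydrolysis_rate_const * (1.0 - pde_inhibition))


# Hypothetical parameters in arbitrary units.
baseline = steady_state_cgmp(1.0, 0.5, pde_inhibition=0.0)
inhibited = steady_state_cgmp(1.0, 0.5, pde_inhibition=0.75)
fold_increase = inhibited / baseline
```

In this simplified picture, blocking three quarters of the hydrolytic activity
raises the steady-state cGMP level fourfold without any change in synthesis.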
Case Study 7: Natural Products Allow the Characterization of Different Binding
Sites within a Family of Proteins (Conotoxins and Nicotinic Acetylcholine
Receptors)
As mentioned above, the classic experiments by Langley with the nicotinic

acetylcholine receptor (nAChR) at the neuromuscular junction has led to the
376
I formulation of the receptor concept. nAChRs are ligand-gated ion channels
belonging to the Cys-loop receptor superfamily, which allow the passage

of potassium, sodium, or calcium ions across the synaptic membrane.
Two classes of nAChRs exist - neuromuscular and neuronal - each being
composed of five subunits that can form heteropentameric or homopentameric
membrane-bound channel structures [69-71]. While the identification and
pharmacological distinction of nAChR subtypes at the neuromuscular
endplate (responsible for muscle contraction) and in sympathetic and
parasympathetic ganglia (mediating neurotransmission) were accomplished
earlier, the investigation of neuronal nAChRs in the brain has proven more elusive.
The basic framework of neuronal nAChRs takes the form (α)2(β)3, whose
extraordinary variety and complexity results from the fact that so far α2-α7,
α9, α10, and β2-β4 subunits have been cloned from neuronal and sensory
mammalian tissues. Diseases like Alzheimer's, Parkinson's, epilepsy, and
schizophrenia, as well as nicotine addiction, have been shown to be connected to
specific subclasses of nAChRs, which creates an urgent need to understand these
potential targets for pharmaceutical intervention [70].
The venom of the Conus genus of marine snails contains a family of
oligopeptide toxins that are highly selective at blocking nAChRs by
binding to acetylcholine binding pockets between specific subunit pairs. The
so-called α-conotoxins range in size between 12 and 19 amino acids and use
disulfide bonds to maintain their three-dimensional shape. Although only a
fraction of the α-conotoxins has been isolated from snail venom so far, the small
proportion of toxins whose biological activity has been annotated has proven
to be a bounty of selective tools for the study of both neuromuscular and
neuronal nAChRs (Table 7.1-4) [70].
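The size range and the role of cysteines described above can be checked on a single entry of Table 7.1-4. The following is a minimal sketch in Python, not part of the original chapter: the function names are hypothetical, and the input is the ImI sequence as listed in the table, written in three-letter code with the C-terminal amide given as NH2. It only counts residues and cysteines; it does not predict which cysteines pair with each other.

```python
# Illustrative sketch only: helper names are assumptions, not from the chapter.
# Sequence of alpha-conotoxin ImI as listed in Table 7.1-4 (C-terminal amide = NH2).
IMI = "Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Ala-Trp-Arg-Cys-NH2"


def parse_peptide(seq):
    """Split a hyphen-separated three-letter sequence and drop a trailing NH2 tag."""
    tokens = [t.strip() for t in seq.split("-") if t.strip()]
    if tokens and tokens[-1].upper() == "NH2":
        tokens = tokens[:-1]
    return tokens


def summarize(seq):
    """Report length, 1-based cysteine positions, and the maximum number of disulfides."""
    residues = parse_peptide(seq)
    cys = [i for i, res in enumerate(residues, start=1) if res == "Cys"]
    return {
        "length": len(residues),
        "cys_positions": cys,
        # Each disulfide bond consumes two cysteines.
        "max_disulfides": len(cys) // 2,
        # Size range of 12 to 19 residues quoted in the text.
        "in_reported_size_range": 12 <= len(residues) <= 19,
    }


print(summarize(IMI))
```

For ImI this gives 12 residues and four cysteines, which is consistent with the lower end of the quoted size range and with two disulfide bonds.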
The conotoxins have not only proven invaluable for the chemical biological
study of nAChRs but some of them have also been developed for the treatment
of neurological conditions and are in advanced stages of clinical trials [72].
Recently, Elan Pharmaceuticals introduced the synthetic equivalent
of the ω-conotoxin MVIIA, ziconotide (Prialt™), to the market as a novel
nonopioid drug for the treatment of severe chronic pain. Ziconotide acts by
potently and selectively blocking neuronal N-type voltage-sensing calcium
channels, thereby inhibiting the activity of a subset of neurons, including
pain-sensing primary nociceptors [73].
7.1.5
Future Developments
Although the pharmacological approach of target validation is almost as old
as the idea of target receptors, a series of recent breakthroughs in method
development in chemistry, biochemistry, bioinformatics, cheminformatics,
biology, and pharmacology will boost reverse chemical genetics to new heights.
Table 7.1-4 Sequences and mammalian subunit specificities of neuronal α-conotoxins [70]

Name    Sequence    Subunit specificity
MII     Gly-Cys-Cys-Ser-Asn-Pro-Val-Cys-His-Leu-Glu-His-Ser-Asn-Leu-Cys-NH2    α6β2 > α3β2
AuIA    Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Ser-Asp-Tyr-Cys-NH2    α3β4
AuIC    Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Ser-Gly-Tyr-Cys-NH2    α3β4
PnIA    Gly-Cys-Cys-Ser-Leu-Pro-Pro-Cys-Ala-Ala-Asn-Asn-Pro-Asp-Tyr[a]-Cys-NH2    α3β2
PnIB    Gly-Cys-Cys-Ser-Leu-Pro-Pro-Cys-Ala-Leu-Ser-Asn-Pro-Asp-Tyr[a]-Cys-NH2    α7
EpI     Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Asn-Met-Asn-Asn-Pro-Asp-Tyr[a]-Cys-NH2    α3β2, α3β4, α7
AnIA    Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Ala-Asn-Asn-Gln-Asp-Tyr[a]-Cys-NH2    α3β2
AnIB    Gly-Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Ala-Asn-Asn-Gln-Asp-Tyr[a]-Cys-NH2    α3β2
AnIC    Gly-Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Phe-Ala-Ser-Asn-Pro-Asp-Tyr[a]-Cys-NH2    α3β2
GIC     Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Gly-Asn-Asn-Gln-His-Ile-Cys-NH2    α3β2, α6β2β3
GID     Ile-Arg-Asp-Gla[b]-Cys-Cys-Ser-Asn-Pro-Ala-Cys-Arg-Val-Asn-Asn-Hyp-His-Val-Cys    α3β2 ≈ α7
Vc1.1   Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Asn-Tyr-Asp-His-Pro-Glu-Ile-Cys-NH2    α3β4
PIA     Arg-Asp-Pro-Cys-Cys-Ser-Asn-Pro-Val-Cys-Thr-Val-His-Asn-Pro-Glu-Ile-Cys-NH2    α6/α3β2β3
AuIB    Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Pro-Asp-Cys-NH2    α3β4
ImI     Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Ala-Trp-Arg-Cys-NH2    α7
ImII    Ala-Cys-Cys-Ser-Asp-Arg-Arg-Cys-Arg-Trp-Arg-Cys-NH2    α7
ImIII   Tyr-Cys-Cys-His-Arg-Gly-Pro-Cys-Met-Val-Trp-Cys-NH2    n.d. (not determined)
BuIA    Gly-Cys-Cys-Ser-Thr-Pro-Pro-Cys-Ala-Val-Leu-Tyr-Cys-NH2    α6/α3β2 > α6/α3β4

Disulfide bonds are linked as bold pairs and underlined pairs.
[a] Sulfotyrosine.
[b] Carboxyglutamate.

We think that the following developments will shape the future of the field to
a major extent:
1. The completion of the sequencing of the human genome
has provided a global map of the potential landscape of
efforts in reverse chemical genetics. At present, a qualified
estimate of the total number of genes or gene products is
available, and most proteins are available at least as expressed
sequence tag (EST) sequence data. Future efforts in sequencing
and single nucleotide polymorphism (SNP) analysis of
subpopulations, defined by health or disease
status, genetic heritage, ethnic background, etc., will
increase the resolution of sequence data and information.
2. The large-scale efforts in biochemistry and biology using
the whole repertoire of classical mutation genetics,
antisense, RNAi, cell-biological methods, etc. will
continue and support the exponential growth of biological
understanding of cells and organisms.
3. The now fruit-bearing structural genomic initiatives will
increase the number of available protein structures that
could be exploited for rational design of small molecule
ligands, as detailed above. Unfortunately, for a series
of important target protein classes such as GPCRs and ion
channels, only a very limited number of experimentally
solved protein structures are available. Hopefully,
new protein expression techniques and crystallization
procedures will eliminate this bottleneck in the near future.
Homology modeling techniques have been improved
substantially in recent years and they provide a way to
bridge the time gap until experimentally derived structure
information on target proteins becomes available [74].
4. Combinatorial chemistry, parallel synthesis, and solid-
phase synthesis will continue to become more efficient
and productive tools for the synthesis of compound
libraries. Despite their still incomplete status, rationales
about library diversity, drug-likeness, promiscuity
of functional groups or structural elements, metabolic
stability, bioavailability, etc. will become increasingly
important guiding principles for library design. Growing
accessibility of building blocks and an increasing number
of different scaffolds will allow creation of chemical
compounds of a new quantity and quality, which can
be subjected to biological screening for protein-binding
assays or phenotypic forward genetic screening.
5. An increasing number of available protein-binding assays,
functional cell-based assays, and methods of chemical
proteomics (affinity chromatography, three-hybrid assays,
pull-down assays) will allow for a better assignment of the
specificity and selectivity of a hit compound. It would be
desirable that the data collected during these screening
programs be translated into an understanding of the
correlation between the chemical structure and the
protein-binding capability. New cheminformatic
approaches will support this effort.
6. With the more specific chemical probes identified from
the screening processes outlined in points 1-5, more
informative functional analyses of cells and
organisms can be carried out, taking advantage of new
methodologies describing the physiological state of an
object, such as DNA-chip analysis, imaging techniques,
RT-PCR, proteomics, phenotypic assaying using
antibodies, and many more [75-77].
7. The holistic approach of systems biology is assisted by
large-scale computing that is able to deal with the
complexity of the biological networks and experimental
data. Once it is possible to compute the global response of
a biological system to a perturbation or external
intervention, the system can be regarded as understood
and this might accelerate the search for new
pharmacological targets tremendously [78].
Although these techniques will certainly bear fruit, the difficulty and the
complexity of the task tackled should not be underestimated. Research carried out at
the interface of chemistry and biology over the last two decades has taught one
important lesson: the increase in our understanding of processes at a cellular
or organismic level goes parallel to the notion that nature is much more
complicated than most might have anticipated. What once were signal pathways have
turned into signal networks, which show an almost brainlike plasticity that
is currently beyond our understanding. Recent results indicate that “dirty”
drugs (i.e., drugs targeting several protein targets at the same time) [79] used in
the treatment of CNS (central nervous system) disorders are more effective and
cause fewer side effects than “clean” drugs [80]. A similar effect, in which a
synergistic interplay between kinases plays a role, has been proposed for cancer drugs
[81]. Manipulation of a network with multiple redundant backup lines needs the
orchestrated tracking down of a signal via multiple interactions but most likely
not the knockout of a single node (i.e., a single protein). This will lead to new
rules for drug discovery. Whether randomly created or intentionally designed
unselective drugs or mixtures of selective drugs will be the ideal remedies
against those diseases is a question that has to be answered in the future.
7.1.6
Conclusion
Reverse chemical genetics is one of several necessary tools in target
validation. Among these tools it holds a particularly prominent role because
full control over the biological function of a protein is the key to its complete
understanding in a physiological context. Unfortunately, it will not be easy to
achieve this ultimate goal, as it will be very difficult to develop chemical probes
with complete selectivity and specificity. Nevertheless, even an approximation
to this goal will be rewarded with a major gain in insight and understanding
of biological systems.
Acknowledgments
R. B. and H. W. thank the Max-Planck-Society, the Deutsche
Forschungsgemeinschaft, the Fonds der Chemischen Industrie, and the University of
Dortmund for continuous and generous financial support of their research.
References
1. S.L. Schreiber, Chemical genetics resulting from a passion for synthetic organic chemistry, Bioorg. Med. Chem. 1998, 6, 1127-1152.
2. T.J. Mitchison, Towards a pharmacological genetics, Chem. Biol. 1994, 1, 3-6.
3. H.E. Blackwell, Y. Zhao, Chemical genetic approaches to plant biology, Plant Physiol. 2003, 133, 448-455.
4. A.H. Maehle, C.-R. Prüll, R.F. Halliwell, The emergence of the drug receptor theory, Nat. Rev. Drug Discov. 2002, 1, 637-641.
5. J.R. Peterson, T.J. Mitchison, Small molecules, big impact: a history of chemical inhibitors and the cytoskeleton, Chem. Biol. 2002, 9, 1275-1285.
6. W. Sneader, Drug Discovery: A History, Wiley, Chichester, 2005.
7. R.P. Ahlquist, A study of the adrenotropic receptors, Am. J. Physiol. 1948, 153, 586-600.
8. C.G. Wermuth, Selective optimization of side activities: another way of drug discovery, J. Med. Chem. 2004, 47, 1303-1314.
9. J. Drews, Drug discovery: a historical perspective, Science 2000, 287, 1960-1964.
10. B.R. Bochner, New technologies to assess genotype-phenotype relationships, Nat. Rev. Genet. 2003, 4, 309-314.
11. M. Bredel, E. Jacoby, Chemogenomics: an emerging strategy for rapid target and drug discovery, Nat. Rev. Genet. 2004, 5, 262-275.
12. R.P. Hertzberg, A.J. Pope, High-throughput screening: technology for the 21st century, Curr. Opin. Chem. Biol. 2000, 4, 445-451.
13. J. Wolcke, D. Ullmann, Miniaturized HTS technologies-uHTS, Drug Discov. Today 2001, 6, 637-646.
14. S.A. Sundberg, High-throughput and ultra-high-throughput screening: solution- and cell-based approaches, Curr. Opin. Biotechnol. 2000, 11, 47-53.
15. L. Silverman, R. Campbell, J.R. Broach, New assay technologies for high-throughput screening, Curr. Opin. Chem. Biol. 1998, 2, 397-403.
16. R. Breinbauer, I.R. Vetter, H. Waldmann, From protein domains to drug candidates-natural products as guiding principles in the design and synthesis of compound libraries, Angew. Chem. 2002, 114, 3002-3115; Angew. Chem. Int. Ed. Engl. 2002, 41, 2879-2890.
17. G. Schneider, H.J. Böhm, Virtual screening and fast automated docking methods, Drug Discov. Today 2002, 7, 64-70.
18. Glaxo Wellcome, Redesigning drug discovery, Nature 1996, 384 (Suppl-5).
19. L.M. Toledo-Sherman, D. Chen, High-throughput virtual screening for drug discovery in parallel, Curr. Opin. Drug Discov. Devel. 2002, 5, 414-421.
20. M. Knockaert, N. Gray, E. Damiens, Y.-T. Chang, P. Grellier, K. Grant, D. Fergusson, J. Mottram, M. Soete, J.-F. Dubremetz, K. Le Roch, C. Doerig, P.G. Schultz, L. Meijer, Intracellular targets of cyclin-dependent kinase inhibitors: identification by affinity chromatography using immobilised inhibitors, Chem. Biol. 2000, 7, 411-422.
21. J. Wissing, K. Godl, D. Brehmer, S. Blencke, M. Weber, P. Habenberger, M. Stein-Gerlach, A. Missio, M. Cotton, S. Muller, H. Daub, Chemical proteomic analysis reveals alternative modes of action for pyrido[2,3-d]pyrimidine kinase inhibitors, Mol. Cell. Proteomics 2004, 3, 1181-1193.
22. D. Brehmer, Z. Greff, K. Godl, S. Blencke, A. Kurtenback, M. Weber, S. Muller, B. Klebl, M. Cotton, G. Keri, J. Wissing, H. Daub, Cellular targets of gefitinib, Cancer Res. 2005, 65, 379-382.
23. M.A. Fabian, W.H. Biggs III, D.K. Treiber, C.E. Atteridge, M.D. Azimioara, M.G. Benedetti, T.A. Carter, P. Ciceri, P.T. Edeen, M. Floyd, J.M. Ford, M. Galvin, J.L. Gerlach, R.M. Grotzfeld, S. Herrgard, D.E. Insko, M.A. Insko, A.G. Lai, J.-M. Lelias, S.A. Mehta, Z.V. Milanov, A.M. Velasco, L.M. Wodiscka, H.K. Patel, P.P. Zarrinkar, D.J. Lockhart, A small molecule-kinase interaction map for clinical kinase inhibitors, Nat. Biotechnol. 2005, 23, 329-336.
24. R.A. Butcher, S.L. Schreiber, Using genome-wide transcriptional profiling to elucidate small-molecule mechanism, Curr. Opin. Chem. Biol. 2005, 9, 25-30.
25. M.D. Adams, J.J. Sekelsky, From sequence to phenotype: reverse genetics in Drosophila melanogaster, Nat. Rev. Genet. 2002, 3, 189-198.
26. T.U. Mayer, Chemical genetics: tailoring tools for cell biology, Trends Cell Biol. 2003, 13, 270-277.
27. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nat. Rev. Genet. 2000, 1, 116-125.
28. K. Scearce-Levie, P. Coward, C.H. Redfern, B.R. Conklin, Tools for dissecting signaling pathways in vivo: receptors activated solely by synthetic ligands, Meth. Enzymol. 2002, 343, 232-248.
29. K. Shokat, M. Vellaca, Novel chemical genetic approaches to the discovery of signal transduction inhibitors, Drug Discov. Today 2002, 7, 872-879.
30. A.S. Carroll, A.C. Bishop, J.L. DeRisi, K.M. Shokat, E.K. O'Shea, Chemical inhibition of the Pho85 cyclin-dependent kinase reveals a role in the environmental stress response, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 12578-12583.
31. N.A. Hathaway, R.W. King, Dissecting cell biology with chemical scalpels, Curr. Opin. Cell Biol. 2005, 17, 12-19.
32. M.-A. Bjornsti, P.J. Houghton, The TOR pathway: a target for cancer therapy, Nat. Rev. Cancer 2004, 4, 335-348.
33. S.L. Schreiber, Immunophilin-sensitive phosphatase action in cell signaling pathways, Cell 1992, 70, 365-368.
34. C.M. Grozinger, S.L. Schreiber, Deacetylase enzymes: biological functions and the use of small-molecule inhibitors, Chem. Biol. 2002, 9, 3-16.
35. S.J. Haggerty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
36. S.J. Haggerty, K.M. Koeller, J.C. Wong, R.A. Butcher, S.L. Schreiber, Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays, Chem. Biol. 2003, 10, 383-396.
37. F.G. Kuruvilla, A.F. Shamji, S.M. Sternson, P.J. Hergenrother, S.L. Schreiber, Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays, Nature 2002, 416, 653-657.
38. J.A. Boutin, V. Audinot, G. Ferry, P. Delagrange, Molecular tools to study melatonin pathways and actions, Trends Pharmacol. Sci. 2005, 26, 412-419.
39. J.S. Sebolt-Leopold, R. Herrera, Targeting the mitogen-activated protein kinase cascade to treat cancer, Nat. Rev. Cancer 2004, 4, 937-947.
40. J.S. Sebolt-Leopold, D.T. Dudley, R. Herrera, K. van Becelaere, A. Wiland, R.C. Gowan, H. Tecle, S.D. Barrett, A. Bridges, S. Przybranowski, W.R. Leopold, A.R. Saltiel, Blockade of the MAP kinase pathway suppresses growth of colon tumors in vivo, Nat. Med. 1999, 5, 810-816.
41. S.P. Rohrer, E.T. Birzin, R.T. Mosley, S.C. Berk, S.M. Hutchins, D.-M. Shen, Y. Xiong, E.C. Hayes, R.M. Parmar, F. Foor, S.W. Mitra, S.J. Degrado, M. Shu, J.M. Klopp, S.-J. Cai, A. Blake, W.W.S. Chan, A. Pasternak, L. Yang, A.A. Patchett, R.G. Smith, K.T. Chapman, J.M. Schaeffer, Rapid identification of subtype-selective agonists of the somatostatin receptor through combinatorial chemistry, Science 1998, 282, 737-740.
42. S.P. Rohrer, J.M. Schaeffer, Identification and characterization of subtype selective somatostatin receptor agonists, J. Physiol. 2000, 94, 211-215.
43. K.L. Geris, B. De Groef, S.P. Rohrer, S. Geelissen, E.R. Kuhn, V.M. Darras, Identification of somatostatin receptors controlling growth hormone and thyrotropin secretion in the chicken using receptor subtype-specific agonists, J. Endocrinol. 2003, 177, 279-286.
44. M. Pawlikowski, G. Melen-Mucha, Somatostatin analogs-from new molecules to new applications, Curr. Opin. Pharmacol. 2004, 4, 608-613.
45. K. Kohler, A.C. Lellouch, S. Vollmer, O. Stoevesandt, A. Hoff, L. Peters, H. Rogl, B. Malissen, R. Brock, Chemical inhibitors when timing is critical: a pharmacological concept for the maturation of T cell contacts, Chembiochem 2005, 6, 152-161.
46. J.K. Chen, J. Taipale, K.E. Young, T. Maiti, P.A. Beachy, Small molecule modulation of Smoothened activity, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 14071-14076.
47. M.A. Lampson, K. Renduchitala, A. Khodjakov, T.M. Kapoor, Correcting improper chromosome-spindle attachments during cell division, Nat. Cell Biol. 2004, 6, 232-237.
48. W. McDowell, R.T. Schwarz, Dissecting glycoprotein biosynthesis by the use of specific inhibitors, Biochimie 1998, 70, 1535-1549.
49. T. Willson, Chemical genomics of orphan nuclear receptors, in Ernst Schering Research Foundation Workshop 42: Small Molecule-Protein Interactions, (Eds.: H. Waldmann, M. Koppitz), Springer, Berlin, 2003, pp. 29-42.
50. S.A. Kliewer, J.M. Lehmann, T.M. Willson, Orphan nuclear receptors: shifting endocrinology into reverse, Science 1999, 284, 757-760.
51. D.J. Parks, S.G. Blanchard, R.K. Bledsoe, G. Chandra, T.G. Consler, S.A. Kliewer, J.B. Stimmel, T.M. Willson, A.M. Zavacki, D.D. Moore, J.M. Lehmann, Bile acids: natural ligands for an orphan nuclear receptor, Science 1999, 284, 1365-1368.
52. A.M. Zavacki, J.M. Lehmann, W. Seol, T.M. Willson, S.A. Kliewer, D.D. Moore, Activation of the orphan receptor RIP14 by retinoids, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 7909-7914.
53. P.R. Maloney, D.J. Parks, C.D. Haffner, A.M. Fivush, G. Chandra, K.D. Plunket, K.L. Creech, L.B. Moore, J.G. Wilson, M.C. Lewis, S.A. Jones, T.M. Willson, Identification of a chemical tool for the orphan nuclear receptor FXR, J. Med. Chem. 2000, 43, 2971-2974.
54. B. Goodwin, S.A. Jones, P.R. Price, M.A. Watson, D.D. McKee, L.B. Moore, C. Galardi, J.G. Wilson, M.C. Lewis, M.E. Roth, P.R. Maloney, T.M. Willson, S.A. Kliewer, A regulatory cascade of the nuclear receptors FXR, SHP-1, and LRH-1 represses bile acid biosynthesis, Mol. Cell 2000, 6, 517-526.
55. S.R. Stauffer, C.J. Coletta, R. Tedesco, G. Nishiguchi, K. Carlson, J. Sun, B.S. Katzenellenbogen, J.A. Katzenellenbogen, Pyrazole ligands: structure-affinity/activity relationships and estrogen receptor-alpha-selective agonists, J. Med. Chem. 2000, 43, 4934-4947.
56. M.J. Meyers, J. Sun, K.E. Carlson, G.A. Marriner, B.S. Katzenellenbogen, J.A. Katzenellenbogen, Estrogen receptor-beta potency-selective ligands: structure-activity relationship studies of diarylpropionitriles and their acetylene and polar analogues, J. Med. Chem. 2001, 44, 4230-4251.
57. H.A. Harris, L.M. Albert, Y. Leathurby, M.S. Malamas, R.E. Mewshaw, C.P. Miller, Y.P. Kharade, J. Marzolf, B.S. Komm, R.C. Winnek, D.E. Frail, R.A. Henderson, Y. Zhu, J.C. Keith Jr, Evaluation of an estrogen receptor-beta agonist in animal models of human disease, Endocrinology 2003, 144, 4241-4249.
58. A. Hillisch, O. Peters, D. Kosemund, G. Muller, A. Walter, B. Schneider, G. Reddersen, W. Elger, K.-H. Fritzemeier, Dissecting physiological roles of estrogen receptor alpha and beta with potent selective ligands from structure-based design, Mol. Endocrinol. 2004, 18, 1599-1609.
59. J. Widder, T. Pelzer, C. Poser-Klein, K. Hu, V. Jazbutyte, K.H. Fritzemeier, C. Hegele-Hartung, L. Neyses, J. Bauersachs, Improvement of endothelial dysfunction by selective estrogen receptor-alpha stimulation in ovariectomized SHR, Hypertension 2003, 42, 991-996.
60. C. Hegele-Hartung, P. Siebel, O. Peters, D. Kosemund, G. Müller, A. Hillisch, A. Walter, J. Kraetzschmar, K.-H. Fritzemeier, Impact of isotype-selective estrogen receptor agonists on ovarian function, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 5129-5134.
61. T. Sakurai, A. Amemiya, M. Ishii, I. Matsuzaki, R.M. Chemelli, H. Tanaka, S.C. Williams, J.A. Richardson, G.P. Kozlowski, S. Wilson, J.R.S. Arch, R.E. Buckingham, A.C. Haynes, S.A. Carr, R.S. Annan, D.E. McNulty, W.S. Liu, J.A. Terrett, N.A. Elshourbagy, D.J. Bergsma, M. Yanagisawa, Orexins and orexin receptors: a family of hypothalamic neuropeptides and G protein-coupled receptors that regulate feeding behaviour, Cell 1998, 92, 573-585.
62. T. Sakurai, Reverse pharmacology of orexin: from an orphan GPCR to integrative physiology, Regul. Pept. 2005, 126, 3-10.
63. S. Katugampola, A. Davenport, Emerging roles for orphan G-protein coupled receptors in the cardiovascular system, Trends Pharmacol. Sci. 2003, 24, 30-35.
64. H.J. Kreinkampf, H.J. Larusson, I. Witte, T. Roeder, N. Birgül, H.-H. Honck, S. Harder, G. Ellinghausen, F. Buck, D. Richter, Functional annotation of two orphan G-protein-coupled receptors, drostar-1 and -2 from Drosophila melanogaster and their ligands by reverse pharmacology, J. Biol. Chem. 2002, 277, 39937-39943.
65. S. Mahrus, C.S. Craik, Selective chemical functional probes of granzymes A and B reveal granzyme B is a major effector of natural killer cell-mediated lysis of target cells, Chem. Biol. 2005, 12, 567-577.
66. J.A. Duncan, A.G. Gilman, A cytoplasmic acyl-protein thioesterase that removes palmitate from G protein alpha subunits and p21(RAS), J. Biol. Chem. 1998, 273, 15830-15837.
67. P. Deck, D. Pendzialek, M. Biel, M. Wagner, B. Popkirova, B. Ludolph, G. Kragol, J. Kuhlmann, A. Giannis, H. Waldmann, Development and biological evaluation of acyl protein thioesterase 1 (APT1) inhibitors, Angew. Chem. 2005, 117, 5055-5060; Angew. Chem. Int. Ed. Engl. 2005, 44, 4975-4980.
68. N.K. Terrett, A.S. Bell, D. Brown, P. Ellis, Sildenafil (Viagra™), a potent and selective inhibitor of type 5 cGMP phosphodiesterase with utility for the treatment of male erectile dysfunction, Bioorg. Med. Chem. Lett. 1996, 6, 1819-1824.
69. A. Nicke, S. Wonnacott, R.J. Lewis, α-Conotoxins as tools for the elucidation of structure and function of neuronal nicotinic acetylcholine receptor subtypes, Eur. J. Biochem. 2004, 271, 2305-2319.
70. R.W. James, α-Conotoxins as selective probes for nicotinic acetylcholine, Curr. Opin. Pharmacol. 2005, 5, 280-292.
71. R.C. Hogg, M. Raggenass, D. Bertrand, Nicotinic acetylcholine receptors: from structure to brain function, Rev. Physiol. Biochem. Pharmacol. 2003, 147, 1-46.
72. B.G. Livett, K.R. Gayler, Z. Khalil, Drugs from the sea: conopeptides as potential therapeutics, Curr. Med. Chem. 2004, 11, 1715-1723.
73. G.P. Miljanich, Ziconotide: neuronal calcium channel blocker for treating severe chronic pain, Curr. Med. Chem. 2004, 11, 3029-3040.
74. A. Hillisch, L.F. Pineda, R. Hilgenfeld, Utility of homology models in the drug discovery process, Drug Discov. Today 2004, 9, 659-669.
75. D.E. Root, S.P. Flaherty, B.P. Kelley, B.R. Stockwell, Biological mechanism profiling using an annotated compound library, Chem. Biol. 2003, 10, 881-892.
76. Z.E. Perlman, M.D. Slack, Y. Feng, T.J. Mitchison, L.F. Wu, S.J. Altschuler, Multidimensional drug profiling by automated microscopy, Science 2004, 306, 1194-1198.
77. Z.E. Perlman, T.J. Mitchison, T.U. Mayer, High-content screening and profiling of drug activity in an automated centrosome-duplication assay, Chembiochem 2005, 6, 145-151.
78. E.C. Butcher, E.L. Berg, E.J. Kunkel, Systems biology in drug discovery, Nat. Biotechnol. 2004, 22, 1253-1259.
79. R. Morphy, C. Kay, Z. Rankovic, From magic bullets to designed multiple ligands, Drug Discov. Today 2004, 9, 641-651.
80. B.L. Roth, D.J. Sheffer, W.K. Kroeze, Magic shotguns versus magic bullets: selectively non-selective drugs for mood disorders and schizophrenia, Nat. Rev. Drug Discov. 2004, 3, 353-359.
81. C. Kung, D.M. Kenski, S.H. Dickerson, R.W. Howson, L.F. Kuyper, H.D. Madhani, K.M. Shokat, Chemical genomic profiling to identify intracellular targets of a multiplex kinase inhibitor, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 3587-3592.
7.2
Chemical Biology and Enzymology: Protein Phosphorylation as a Case Study
Philip A. Cole
Outlook
This chapter discusses two chemical technologies used to evaluate protein
kinase structure and function. The introduction of phosphonate analogs of
phosphoamino acids site-specifically into proteins by protein semisynthesis
has allowed for unique insights into the regulation of protein tyrosine
phosphatases (PTPs) and melatonin production. Mechanistically designed peptide
and protein-based bisubstrate analogs of protein kinases have been
demonstrated to be selective and also high-affinity ligands for both tyrosine and
serine/threonine kinases. These compounds can be useful structural as well
as functional proteomic tools. By complementing well-established methods
used in protein kinase analysis, phosphonate incorporation into proteins and
bisubstrate analogs show promise in sorting out cell-signaling pathways. More
broadly, this chapter attempts to convey the enormous opportunities for
modern chemical intervention in the study of enzymes in the postgenomic era.
7.2.1
Overview
The discovery of enzymes as protein-based catalysts for chemical reactions
in living organisms represents a milestone in our understanding of life and
in our development of cures in post-nineteenth-century medicine. While we
now know that not all proteins are enzymes, the study of enzymes in a range
of venues is still a central focus of modern biomedical research. Historians
of science point out that it has been a combination of the discovery and
development of new technologies and their experimental exploitation that
has led to new scientific concepts. Over the course of the twentieth century,
the application of novel technologies provided fundamental advances in our
understanding of enzyme mechanism and function. In the early years of
enzymology, extensive use of chemically modified substrates (including isotopic
labels), group-modifying reagents to target specific amino acid side chains, and
varied reaction conditions (salt, pH, viscosity) led to relatively simple, but
surprisingly accurate, models of how enzymes work. Later in the
twentieth century, the revolutions in structural biochemistry beginning with
the first X-ray structure of an enzyme (lysozyme) bound to a substrate analog
in 1965 have been critical to elucidating catalytic mechanisms and substrate
selectivity [1]. Other biophysical techniques, especially NMR spectroscopy,
mass spectrometry, and fluorescence spectroscopy, have, in parallel, led to key
Chemical Biology From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
ISBN: 978-3-527-31150-7
In 1994, the method of native chemical ligation was developed, which allows
for the efficient linking of large peptide segments with amide bonds [7].
The native chemical ligation strategy is based on Wieland’s chemoselective
reaction between an N-terminal Cys of one peptide and a C-terminal thioester
of another. This methodology was subsequently expanded in 1996 to use
in protein semisynthesis by generating N-terminal cysteines in recombinant
protein fragments via proteolysis [8]. An even more practical advance was
achieved when recombinant protein fragments containing thioesters were
generated by exploiting nature’s inteins [9, 10]. These thioesters can be linked
to N-terminal cysteine-containing peptides in a process that has been called
expressed protein ligation (EPL) (Fig. 7.2-1). This technology has been particularly
useful in the study of enzyme recognition, mechanism, and regulation. EPL
is most efficiently applied when the region of the protein under study is near
the C-terminus such that chemical modification can be introduced within the
N-terminal cysteine-containing synthetic peptide.
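The constraint described above (the modified residue must fall inside a synthetic peptide that starts with a cysteine and runs to the C-terminus) can be expressed as a simple search over a sequence. The following Python sketch is an illustration only and is not taken from the chapter: the function name, the example sequence, and the `max_peptide_len` cutoff are all hypothetical, and a practical upper limit on synthetic peptide length would have to be chosen by the experimenter.

```python
# Illustrative sketch only: names, the example sequence, and the length cutoff
# are assumptions made for this example, not values from the chapter.

def candidate_ligation_sites(seq, site, max_peptide_len=40):
    """Return 1-based positions of Cys residues that could start the synthetic peptide.

    seq             one-letter protein sequence
    site            1-based position of the residue to be chemically modified
    max_peptide_len assumed practical limit on the synthetic peptide length
    """
    if not 1 <= site <= len(seq):
        raise ValueError("site must lie within the sequence")
    positions = []
    for pos, aa in enumerate(seq, start=1):
        peptide_len = len(seq) - pos + 1  # peptide spans from this Cys to the C-terminus
        # The Cys must sit at or before the modification site, and the
        # resulting C-terminal peptide must be short enough to synthesize.
        if aa == "C" and pos <= site and peptide_len <= max_peptide_len:
            positions.append(pos)
    return positions


# Hypothetical 20-residue sequence with a single Cys at position 10.
EXAMPLE = "MSTAGKVIKCLAEDRYSPQG"
print(candidate_ligation_sites(EXAMPLE, site=16, max_peptide_len=15))
```

With the hypothetical sequence above, a modification at residue 16 can be reached from the cysteine at position 10, whereas a modification at residue 5 cannot, which mirrors the statement that EPL works best for regions near the C-terminus.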
7.2.2
The Enzymology of Posttranslational Modifications of Proteins
Whereas the field of enzymology has primarily concerned small-molecule
metabolic pathways over the past 80 years, there is a growing interest
in focusing on enzyme structure and function that relates to protein
posttranslational modifications. It is now believed that posttranslational
modifying pathways are hierarchically elevated in the context of governing cell
growth and differentiation in health and disease. Modifications under particularly
intensive investigation include proteolysis, phosphorylation, acetylation,
methylation, ubiquitination, glycosylation, and carboxylation [11]. Current
understanding of these processes, in general, is rather primitive. Many of the
chemical tags produced by posttranslational modifying enzymes are reversible
and tightly regulated by cellular machinery. Reconstructing these enzyme
pathways is especially challenging since protein substrates are abundant
and varied in the cell, creating an almost infinite number of potential sites of
modification. It is in addressing problems in the posttranslational modification
arena that the experimental arsenal of biochemists is sorely tested.
Fig. 7.2-1 Method of expressed protein ligation. Thiophenol can be substituted by
MESNA (mercaptoethylsulfonate).
7.2.2.1 Protein Kinases and Phosphatases

Among enzyme superfamilies, protein kinases and protein phosphatases
(Fig. 7.2-2) have occupied a preeminent position in biomedical research
both because of their relatively large size and involvement in a myriad
of cell regulatory and disease processes. It is estimated that the human
genome encodes 500 protein kinases, about 80% serine/threonine selective
and the remaining 20% tyrosine selective [12]. There are about 100 protein
tyrosine phosphatases (PTPs)which include classical as well as dual specificity
enzymes [13]. Understanding the function and regulation of these enzymes
is a daunting task because of their large numbers as well as the numerous
potential cellular substrates and complex signaling networks in which they
participate. Peptide substrates and in vitro kinase assays are often unable to
replicate the specificity of in vivo phosphorylation events [14]. Protein kinase
inhibitors developed so far lack the specificity necessary to pinpoint kinase
function. Genetic knockouts, coimmunoprecipitation studies, two-hybrid
screens, site-directed mutagenesis, and other classical molecular biological
techniques have been of enormous help in analyzing protein kinases and their
functions but even these can be imprecise tools. Kinase-substrate interactions
are often very weak with regard to ground-state binding. Thus, standard
protein-protein interaction techniques can lack the sensitivity necessary
to identify kinase-substrate relationships. Gene deletions, even conditional
and tissue-specific knockouts, are unable to provide the temporal resolution
needed to follow the rapid phosphorylation events catalyzed by kinases. While
mutagenesis can be effective in analyzing the role of phosphorylation events,
the genetically encoded amino acids fall short in mimicking phosphoserine
and especially phosphotyrosine function. Since the early 1990s, chemical
biologists have designed several powerful approaches to augment our ability
to analyze phosphorylation networks and functions [15-18]. We will discuss
the development of two of these approaches, their scopes and limitations, and
highlight several applications.

Fig. 7.2-2 Reversible protein phosphorylation. [Scheme: a protein kinase converts
ROH to ROPO3(2-), and a protein phosphatase reverses the reaction.]
7.2.2.1.1 Phosphonates as Probes of Kinase Function

As described earlier, the ability to site specifically replace one amino acid with
another genetically encoded residue provides extraordinary access to analyze
protein structure and function. An area where it is often applied is in the
assessment of the role of phosphorylation of side chains. Typically, two classes
of mutants are made: those that prevent modification (nonphosphorylatable)
and those that are constitutive (nonhydrolyzable) phosphorylated mimics. For
the former, the phosphorylatable residues Ser and Thr are replaced with Ala,
and Tyr with Phe (Fig. 7.2-3). These are reasonably successful in many cases,
although they can be misleading because they lack the hydrogen-bonding and
polarity characteristics of the authentic residues [19]. More difficult is the
substitution of a phosphoamino acid with one of the 20 encoded residues.
Phosphoserine/threonine is commonly replaced with Asp or Glu residues
(Fig. 7.2-4). However, Asp and Glu are deficient in several respects. First,
Asp and Glu are considerably smaller than phosphoserine/threonine. Second,
Asp and Glu side chains have only two oxygen atoms available for receiving
hydrogen bonds and can only be monoanionic, unlike the typical dianionic
form of phosphate. Third, the pKa values of Asp and Glu are considerably
higher than that of the phosphate monoanion - indeed Asp and Glu
carboxylates can sometimes be found in the neutral form. Thus, interpreting
results with Asp and Glu substitutions can be difficult. For phosphotyrosine,
there are no really suitable replacements among the 20 natural amino acids.
Recognition of the lack of similarity between the phosphoamino acids
and the natural residue mimics has led investigators to design synthetic
analogs. Among these, the phosphonates are probably the closest mimics
and have been the most popular alternatives [20]. In these analogs, the
bridging oxygen between phosphorus and carbon is replaced by a methylene
or a difluoromethylene (Fig. 7.2-5). While the bond distances and angles are
slightly different from an ester linkage, they are fairly close approximations.
The relative merits of fluoro versus hydrogen substitution in the bridging
methylene have also been described [21]. While the CF2 is slightly larger than
CH2 and sterically bulkier than a single oxygen atom, CF2, like oxygen, has the
potential to be a hydrogen bond acceptor via the fluorine lone pairs. Perhaps
more importantly, it confers a more physiologic pKa for the nonbridging
phosphate oxygens, encouraging the dianionic form at neutral pH. From a
practical perspective, the CF2 group can be exploited as a specific and sensitive
probe in NMR studies, although this has not been performed routinely.

Fig. 7.2-3 Amino acid residues targeted by eukaryotic protein kinases (Ser, Thr,
Tyr) and their nonphosphorylatable analogs (Ala, Phe).

Fig. 7.2-4 Phosphorylated amino acid residues (PhosphoSer, PhosphoThr) and
genetically encoded mimics (Asp, Glu).

Fig. 7.2-5 Phosphonate mimics of phosphorylated amino acids (Pma, F2Pma, Pmp,
F2Pmp).
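The effect of pKa on charge state described above can be made concrete with a short Henderson-Hasselbalch calculation. The sketch below is only an illustration: the function name and the three second-ionization pKa values are assumptions chosen to show the trend, not values taken from this chapter, and the first ionization is assumed to be complete.

```python
def dianion_fraction(pka2: float, ph: float = 7.0) -> float:
    """Fraction of a phosphate or phosphonate group present as the dianion.

    Treats the second ionization as a simple monoprotic equilibrium
    (Henderson-Hasselbalch) and assumes the first ionization is complete.
    """
    return 1.0 / (1.0 + 10.0 ** (pka2 - ph))


# Illustrative second pKa values (assumptions, not data from the chapter).
ASSUMED_PKA2 = {
    "phosphate ester": 6.0,
    "CH2 phosphonate": 7.5,
    "CF2 phosphonate": 5.5,
}

for name, pka2 in ASSUMED_PKA2.items():
    fraction = dianion_fraction(pka2)
    print(f"{name:16s} assumed pKa2 = {pka2:.1f}, dianion fraction at pH 7 = {fraction:.2f}")
```

Under these assumed values, a group with a lower second pKa is more fully dianionic at neutral pH, which is the qualitative argument made for CF2 over CH2 substitution.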
Early work on the use of phenylalanine phosphonates in synthetic peptides
as SH2 domain ligands and phosphotyrosine phosphatase inhibitors proved
the efficacy of these agents in medicinal chemistry [20, 22]. Incorporation of
phosphonomethylene alanine (Pma) and phosphonomethylene phenylalanine
(Pmp) using nonsense-mediated suppression has also been shown to be
feasible using in vitro translation [5], but this has not been used for practical
applications, perhaps because of scale-up challenges. Pma and Pmp have not
yet been used in vivo in nonsense suppression, presumably because of the
limited cell permeability of the amino acids.
Protein semisynthesis and, in particular, EPL can provide a straightforward
route to phosphonate incorporation. Indeed, these techniques prove valuable
for site-specific incorporation of the standard phosphoamino acids which have
been effectively used in structural and enzymatic analyses [9, 23]. EPL is most
efficiently used when the phosphate modification is within 50 amino acids of
the C-terminus of the desired protein or protein fragment. The next simplest
case for protein semisynthesis occurs when the modification of interest is near
the N-terminus and is installed in a C-terminal thioester containing peptide.
Because of the somewhat more challenging task of preparing complex peptides
carrying thioesters, this strategy can be a bit more cumbersome than EPL.
However, phosphonates have now been incorporated using both strategies
and in the following text, we will describe applications of these approaches in
investigations on PTPs and serotonin N-acetyltransferase.
7.2.2.1.2 Protein Tyrosine Phosphatases as Substrates of Kinases

The PTPase family consists of about 100 members that include
both classical and dual specificity (Ser/Tyr) enzymes, which hydrolyze phosphoproteins
and, sometimes, phospholipids [13]. Like protein kinases, they are usually
multidomain enzymes and are subject to a range of regulatory events.
Somewhat paradoxically, many PTPases are themselves substrates for
protein tyrosine kinases [24]. However, working out the function of these
phosphorylation events has been a challenging task. As one might expect,
these phosphorylated PTPase forms are quite unstable and readily undergo
presumed autodephosphorylation, complicating biochemical analysis. Some
investigators have attempted to use thiophosphorylation catalyzed by protein
kinases, but achieving high stoichiometry and site specificity is very difficult;
moreover, thiophosphates are still susceptible to enzymatic hydrolysis, albeit
more slowly [25]. Here, phosphonate analog incorporation is an attractive
solution.
7.2.2.1.3 SHP-1 and SHP-2

Examples of tyrosine phosphatases that are subject to tyrosine phosphorylation
include SHP-1 and SHP-2 [26]. These phosphatases are the SH2 domain
containing tyrosine phosphatases that have the domain architecture shown
and include two tandem N-terminal SH2 domains followed by a catalytic
domain and ending in a C-terminal tyrosine phosphorylated tail (Fig. 7.2-6).
They are quite homologous overall in terms of the amino acid sequence but
do show significant functional differences. SHP-2 is ubiquitously expressed
and implicated as a positive effector of growth factor receptor tyrosine kinase
signaling through MAP kinases [26]. Noonan syndrome, which is a genetic
disease involving multiple developmental abnormalities, is often caused by
mutations in SHP-2 [26]. SHP-1 is most prominently expressed in
cells of hematopoietic lineage [26]. In contrast to SHP-2, SHP-1 is generally
regarded as a negative regulator of MAP kinase signaling [26]. Mutations of
SHP-1 in mice lead to pulmonary fibrosis through unclear mechanisms [26].
Both SHP-1 and SHP-2 show similar three-dimensional structures which
are noteworthy for a large surface of interaction between the N-terminal
SH2 domain and the catalytic domain [26]. Enzymatic studies show that this
interaction, which can be disrupted by point mutations or SH2 engagement by
trans-phosphotyrosine peptide ligands, is quite repressive for catalytic activity
[26]. Removal of the SH2 domains activates the phosphatase activity of SHP-1
and SHP-2 by 10-fold or more and the binding of the SH2 domains by
phosphotyrosine ligands can be comparably stimulating [26].

Fig. 7.2-6 Domain architecture of protein tyrosine phosphatases SHP-1 and SHP-2. The
highlighted tyrosine residues are modified by protein tyrosine kinases.
[Schematic: N-terminus, N-SH2, C-SH2, PTPase domain, C-terminal tail.]
7.2.2.1.4 Phosphonates as Probes of SHP-1 and SHP-2 Regulation

Several groups have shown that SHP-2 and SHP-1 are C-terminally phosphorylated
on two tyrosine residues but the function of these phosphorylation events
is controversial. One model is that these phosphorylation events may recruit
SH2 domain containing adaptor proteins such as Grb2. Another model is that
they may modulate the activity of the enzymes. To address these problems,
EPL was employed to incorporate the phosphonate analogs Pmp or F2Pmp
at the sites of modification. Semisynthetic proteins containing one or two
phosphonates at the physiologic sites were prepared [24, 27, 28].
In the case of SHP-2, each of the phosphonate replacements was responsible
for two- to threefold stimulation of phosphatase activity [24]. It should
be noted that F2Pmp was associated with about 1.5-fold greater activation
than the corresponding Pmp substitution [27]. Moreover, the two Pmps,
when present together, showed nearly additive effects, suggesting concerted
mechanistic models [27]. Partial proteolysis studies along with site-directed
mutagenesis experiments revealed that Y-542 was likely interacting with
the N-terminal SH2 domain and Y-580 with the C-terminal SH2 domain [24,
27], each in an intramolecular fashion (Fig. 7.2-7). Not surprisingly, the
corresponding phosphotyrosine groups were “protected” from intermolecular
phosphatase activity by these SH2 interactions [27]. While the activation by
Pmp-542 resulting from intramolecular engagement of the N-SH2 domain
could be readily rationalized from the X-ray structure, the effects of the
C-SH2 interaction with Pmp-580 were less easily understood and were
presumably related to an indirect effect on conformation. To evaluate the
relevance of these findings to in vivo signaling, cellular microinjection
studies were undertaken [24]. It should be pointed out that a practical
shortcoming of in vitro semisynthesis of an engineered protein is the need
to rely on relatively cumbersome techniques, such as microinjection, to
study its intracellular effects and behavior. Nevertheless, the microinjection
method for the introduction of semisynthetic SHP-2-modified proteins proved
feasible and permitted an analysis of the effects of Pmp-542 modification on
protein stability and MAP kinase activation [24]. The effects on MAP kinase
activation were indirectly monitored via a serum response element reporter.
Immunocytochemical analysis revealed that the Pmp-542 containing SHP-2
showed a significant relative activation of MAP kinase compared with Tyr542
containing SHP-2, whereas both the proteins showed similar stabilities in the
cell. This provided compelling data that the tyrosine phosphorylation of SHP-2
could specifically stimulate signaling in an important cellular pathway, and this
finding has subsequently been confirmed and extended in other studies [29].

Fig. 7.2-7 Model for structural regulation of SHP-2 by tyrosine phosphorylation.
[Schematic: unphosphorylated SHP-2 is converted by a protein tyrosine kinase to
the 542-phosphorylated and 580-phosphorylated forms, in which pY-542 and pY-580
engage the N-SH2 and C-SH2 domains, respectively.]
In experiments on SHP-1, related but nonidentical effects of tail phosphonates
were observed [28]. While up to an eightfold enhancement of
catalytic activity by F2Pmp substitution at Tyr536 was detected, only a 1.6-fold
stimulation of phosphatase action by substitution at Tyr564 was found
[28]. Mutagenesis revealed that these effects were mediated by intramolecular
interactions with the N-SH2 and C-SH2 domains, respectively, analogous to
the behavior of SHP-2 [28]. Interestingly, unlike SHP-2, these phosphonylated
residues were quite accessible to Grb2 interaction, indicating that the
intramolecular interactions were less energetically favorable than the SHP-2
case [24, 28]. These studies reveal the value of studying the detailed molecular
energetics of posttranslational effects on individual protein homologs.
7.2.2.2 Regulation of Serotonin N-acetyltransferase by Phosphorylation

Serotonin N-acetyltransferase (arylalkylamine N-acetyltransferase, AANAT)
catalyzes the penultimate and regulated step in the pineal gland biosynthesis
of melatonin, the critical circadian rhythm hormone (Fig. 7.2-8) [30]. It has
been known for over 30 years that the rhythm of melatonin production is
driven by a rise and fall of AANAT, which is highest at night and falls during
the day [30]. Moreover, when mammals and people are exposed to light in
the middle of the night, a rapid decrease in AANAT follows [30]. Over the
last few years, the role of phosphorylation of AANAT has been proposed to
contribute to this regulatory process. In the current model, AANAT can be
phosphorylated on Thr32 and Ser205 by protein kinase A (PKA), which is, in
turn, under the regulation of the adrenergic G-protein-coupled receptor [31].
Upon phosphorylation, recruitment of 14-3-3 is believed to occur, which might
somehow shield AANAT from proteolytic degradation (Fig. 7.2-9).
7.2.2.2.1 Phosphonates as Probes of Serotonin N-acetyltransferase Regulation

A prediction of the kinase regulatory model for melatonin rhythm is that
AANAT, which incorporates phosphate mimics at the protein kinase A (PKA)
phosphorylation sites, should show resistance to proteolysis and increased
cellular stability [32, 33]. The usual Ser/Thr to Glu mutations were considered
unlikely to be a promising strategy on the basis of the structural features
of the 14-3-3-phosphoprotein interaction [32]. The phosphoAANAT-14-3-3
complex reveals that each of the three nonbridging phosphate oxygens is
involved in hydrogen-bonding interactions with 14-3-3 residues [47]. Thus,
phosphonate-containing AANATs were prepared by the methods of native
chemical ligation (Thr32 replacement) and EPL (Ser205) [32, 33]. These
studies used Pma (Thr32) and F2Pma (Ser205). The corresponding Glu32
AANAT was generated for use in 14-3-3 binding analysis [32]. As expected, the
Pma-32 and PhosThr32 AANAT proteins showed strong (and similar) affinity
for the 14-3-3 interaction, whereas the Ala and Glu AANAT proteins showed
minimal binding to 14-3-3 under these conditions [32]. Likewise, F2Pma-205
and PhosSer205 AANAT showed similar 14-3-3 binding affinity to each other
but enhanced 14-3-3 affinity compared to Ser205 AANAT.

Fig. 7.2-8 Biosynthetic pathway to melatonin from tryptophan. [Scheme: L-tryptophan
is converted by tryptophan hydroxylase and aromatic amino acid decarboxylase to
serotonin (5-hydroxytryptamine), then by serotonin N-acetyltransferase to
N-acetylserotonin, and by hydroxyindole O-methyltransferase to melatonin.]

Fig. 7.2-9 Proposed model for the regulation of serotonin N-acetyltransferase (AANAT)
by phosphorylation. [Schematic labels: "Destruction", dimer, "Protection".]
The stabilities of semisynthetic AANATs were explored in Chinese hamster
ovary (CHO) cells using microinjection methods [32, 33]. This cell type, while
not identical to the natural pinealocytes, has been shown to recapitulate many
of the features of AANAT regulation and has, thus, been used as a model system
[34]. Immunocytochemistry showed that nonphosphorylated AANAT injected
into CHO cells is readily observed minutes after microinjection but disappears
mostly by 1 h [32]. Stabilities were low and similar for PhosThr32 and Glu32
containing AANATs. Strikingly, Pma-32 AANAT is greatly stabilized compared
to each of these other proteins, indicating a direct role for this phosphorylation
event in stimulating melatonin production [32]. It is noteworthy that PhosThr32
AANAT showed diminished stability compared to Pma-32 AANAT and this
suggests that phosphatases play a critical role in rapidly reversing the effects
of cellular phosphorylation. The importance of 14-3-3 in contributing to the
AANAT regulation was revealed by demonstrating that PhosThr32 AANAT but
not Glu32 AANAT was significantly stabilized by concomitant microinjection
with the 14-3-3 adaptor protein [32]. Related findings were demonstrated in
the case of Ser205-modified protein comparing F2Pma and Ser205 AANAT
stability [33]. Thus, phosphonate analogs have been effectively utilized to clarify
the basis of AANAT and melatonin regulation.
7.2.2.3 Bisubstrate Analogs as Protein Kinase Inhibitors

For the past 20 years, investigators have recognized the need for selective
protein kinase inhibitors as research tools [35]. Such tools can be used to
examine the function of a particular kinase in cell lysates, cell culture, or in
vivo. They can be used to aid in structural studies and other biophysical analyses.
Numerous natural products and synthetic scaffolds have been employed for
this purpose [35]. Most efforts that have led to potent protein kinase inhibitors
have exploited the ATP-binding site [35]. The advantage of this site is that it is
relatively hydrophobic, deep, and contains hydrogen bond donors/acceptors,
which allow for enhanced affinity. Molecules that target the ATP site are often
cell permeable and can show favorable pharmacokinetic properties. However,
ATP binding is relatively conserved among protein kinases, making specificity
difficult to achieve.
Because protein kinases, by definition, always must bind a protein substrate
prior to phosphorylation, compounds that disrupt this interaction would also
be useful kinase inhibitors. The advantage of protein substrate sites is that
they often display relatively specific interactions with their individual tar-
gets, necessary for achieving their precise biological functions [36]. However,
the kinase interactions with protein targets are often of modest affinity,
reflecting the shallow interaction surfaces involved. Aside from a few notable
exceptions often inspired by naturally occurring protein kinase inhibitor
peptide sequences [37], protein substrate site inhibitors have not yet proved to
be highly efficacious.
An approach to inhibitors that have the potential to improve both potency
and specificity involves the covalent linking of nucleotide and peptide site
ligands. Often termed bisubstrate analogs, these compounds can, in principle,
achieve binding energies that are equal to or greater than the sum of the
binding energies of the individual ligands [38]. In the case of protein kinases,
much of the potency can be expected to be derived from the nucleotide-
binding site, whereas the specificity should relate to the more divergent
protein substrate-binding site. A critical element in the design of such protein
kinase-bisubstrate analog inhibitors relates to the choice of the linker. To
underscore this point, an early effort to prepare a potent protein kinase A
bisubstrate inhibitor resulted in a relatively weak compound [39]. In this
design, the consensus peptide substrate kemptide was directly linked via its
Ser oxygen to the γ-phosphate of ATP generating 1 (Fig. 7.2-10). Bisubstrate
analog 1 showed an approximate Ki of 125 µM and was slightly weaker in
affinity than ATP itself [39].
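The additivity argument for bisubstrate analogs can be sketched numerically. The example below assumes the standard relation between a dissociation constant and binding free energy (ΔG = RT ln Kd, with a 1 M standard state) at an assumed temperature of 298.15 K; the function names and the two ligand affinities in the usage note are hypothetical and chosen only for illustration. It ignores linker strain and entropic effects, which is why a real conjugate may bind more weakly or more tightly than the simple sum predicts.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


def kd_from_free_energy(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL_PER_MOL_K * temp_k))


def additive_bisubstrate_kd(kd_a: float, kd_b: float, temp_k: float = 298.15) -> float:
    """Kd expected if the two ligands' binding free energies simply add."""
    dg_total = binding_free_energy(kd_a, temp_k) + binding_free_energy(kd_b, temp_k)
    return kd_from_free_energy(dg_total, temp_k)
```

For example, with hypothetical affinities of 10 µM for a nucleotide-site ligand and 1 mM for a peptide-site ligand, `additive_bisubstrate_kd(1e-5, 1e-3)` gives the product of the two constants, on the order of 10 nM. Compound 1, at about 125 µM, falls far short of this kind of idealized estimate, consistent with the point that linker choice is critical.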
7.2.2.3.1 Bisubstrate Tyrosine Kinase Inhibitors Designed for Dissociative Mechanisms
Finding effective linkers for bisubstrate analogs could, in principle, be based
on combinatorial chemistry or rational design principles. Since compounds
synthesized to mimic the transition state are often potent enzyme inhibitors,
a consideration of enzyme mechanism might be helpful in linker design.
In this regard, a preponderance of evidence including enzyme model
reactions, linear free-energy relationships, pH-rate profiles, and X-ray crystal
structures suggests that protein kinases catalyze phosphoryl transfer via a
dissociative transition state [18]. In such a transition state, the entering
group (Ser/Thr/Tyr) forms little or no bond with the attacked phosphorus
before near-complete severing of the bond between the phosphorus and
the leaving group (ADP). This mechanism relies on the high reactivity of
the electrophilic metaphosphate-like species. Mildvan has suggested that the
reaction coordinate distance between the ATP and Ser or Tyr might be 5 Å
prior to the development of a dissociative transition state [40].

Fig. 7.2-10 Bisubstrate analogs for protein kinases. [Structures of compounds 1-3.
Compound 1: R1 = NH2-Leu-Arg-Arg-Ala-, R2 = -Leu-Gly-CO2H. Compound 2:
R1 = AcNH-Lys-Lys-Lys-Leu-Pro-Ala-Thr-Gly-Asp-,
R2 = -Met-Asn-Met-Ser-Pro-Val-Gly-Asp-CO2H.]
A bisubstrate analog 2 for the insulin receptor kinase (IRK) was developed
with this framework in mind, in which an acetyl spacer was inserted between
the ATPγS and an IRK peptide substrate [41]. Because pH-rate studies had
suggested that proton removal from the substrate Tyr occurs late [18], a Tyr
isostere was chosen in which the Tyr oxygen was replaced with a nitrogen
atom. This anilino nitrogen could comprise part of the linker but still serve as
a hydrogen bond donor to the highly conserved catalytic-loop Asp carboxylate.
The extended distance from the anilino nitrogen to the γ-phosphorus was
estimated to be 5.7 Å, slightly longer than the 5 Å reaction coordinate distance
predicted for a dissociative transition state. The synthesis of this compound was
efficiently achieved by exploiting a chemoselective ligation between ATPγS
and the readily prepared bromoacetanilido peptide [41]. While these peptide-ATP
conjugates are acid labile, they are quite stable under physiologic buffer
conditions. In accordance with design, compound 2 was shown to be a
potent IRK inhibitor with Ki of 370 nM, competitive versus both ATP and
peptide substrate [41]. This potency was nearly equivalent to that expected
for summing the binding energies of the individual ligands ATPγS and
the insulin receptor peptide substrate. Deletion of the peptide moiety (as in
compound 3, Fig. 7.2-10) led to a much weaker inhibitor, comparable to the
potency of ATPγS itself [41]. An X-ray crystal structure of the IRK-bisubstrate
analog complex (Fig. 7.2-11) indicated that several of the design principles
were validated [41]. Thus, the nucleotide- and peptide-binding sites on the
IRK were dually occupied by the inhibitor, the distance between the anilino
nitrogen and the γ-phosphate was about 5 Å, and a hydrogen bond between
the anilino nitrogen and the catalytic Asp was maintained. Surprisingly, the
acetyl linker carbonyl was found to be a ligand for the active site Mg, replacing
a water molecule observed in the ternary complex structure.
The structural basis for potent inhibition has also been probed by preparing
and testing a series of closely related analogs of 2 as IRK inhibitors (Fig. 7.2-12)
[42]. Among these, replacement of the anilino nitrogen with a more native
oxygen atom (compound 4) introduced an 80-fold penalty in binding affinity
[42]. This gave further credence to the relative importance of the hydrogen
bond between the anilino nitrogen and Asp. Also deleterious to potency
were alterations in the spacer length by methylene insertion (compound 5)
or phosphate removal (compound 6) which cost 18-fold and more than 200-fold
penalties, respectively [42]. These observations underscore the value of
targeting the precise reaction coordinate distance by the designed inhibitor.

Fig. 7.2-11 Cocrystal structure of bisubstrate analog 2 bound to the insulin
receptor kinase (IRK) domain [41]. IRK is shown in molecular surface representation
with atoms of the N-terminal lobe colored blue and atoms of the C-terminal lobe
colored gray. The molecular surface is semitransparent and shows the ATP moiety
of compound 2. Compound 2 is shown in a ball-and-stick representation with nitrogen
atoms colored blue, oxygen atoms colored red, sulfur atoms colored green, and
phosphorus atoms colored black. Carbon atoms of the peptide moiety are colored
yellow, and carbon atoms of the ATP moiety and linker are colored orange.

Fig. 7.2-12 Bisubstrate analog inhibitors of the insulin receptor kinase with varying
linkers. R1 = AcNH-Lys-Lys-Lys-Leu-Pro-Ala-Thr-Gly-Asp-;
R2 = -Met-Asn-Met-Ser-Pro-Val-Gly-Asp-CO2H.
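The fold penalties quoted for compounds 4, 5, and 6 can be restated as approximate free-energy costs using ΔΔG = RT ln(fold change). The sketch below assumes a temperature of 298.15 K, which is not stated in the text, and the function name and dictionary labels are illustrative only. The value for compound 6 is a lower bound because the text reports a penalty of more than 200-fold.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold_change: float, temp_k: float = 298.15) -> float:
    """Free-energy cost (kcal/mol) equivalent to a fold loss in binding affinity."""
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold_change)


# Fold penalties relative to compound 2, as quoted in the text [42].
PENALTIES = {
    "compound 4 (anilino N replaced by O)": 80.0,
    "compound 5 (methylene insertion)": 18.0,
    "compound 6 (phosphate removal, lower bound)": 200.0,
}

for label, fold in PENALTIES.items():
    print(f"{label}: {fold:.0f}-fold, about {fold_to_ddg(fold):.1f} kcal/mol")
```

At the assumed temperature these correspond to roughly 2.6, 1.7, and at least 3.1 kcal/mol, respectively, which gives a sense of scale for the hydrogen bond and spacer-length contributions discussed above.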
One unanticipated dividend of the structure of the complex between the IRK
and 2 was the more detailed information relating to the molecular recognition
of the peptide moiety-kinase interaction [42]. Many more contacts between
the enzyme and peptide moiety were seen in this structure than in the ternary
complex where the peptide was largely disordered [43]. In hindsight, this can
be understood as reflecting the greater stability of the bisubstrate complex. As
expected, substitution or deletion of key amino acids observed in the structure
led to reduced affinity, in the range of 5-10-fold per modification [42]. These
results indicate that bisubstrate analogs combined with X-ray crystallographic
analysis have the potential to enhance the understanding of peptide recognition
by kinases.
7.2.2.3.2 Bisubstrate Analog Designed for a Serine/Threonine Kinase

The favorable results in the case of the insulin receptor tyrosine kinase
prompted the application of the bisubstrate analog approach to a serine/
threonine kinase [44]. Protein kinase A was selected because it had been
previously targeted with the directly linked ATP-kemptide conjugate 1 as
described above [39]. In this case, aminoalanine was used as a surrogate
for serine, and bromoacetamide was readily coupled with ATPγS, affording
compound 7 (Fig. 7.2-13) [44]. The ATPγS-acetyl-kemptide conjugate 7 proved
to be a moderately potent inhibitor of protein kinase A with a Ki of 3 µM
[44]. Interestingly, this compound was a competitive inhibitor against ATP but
noncompetitive against peptide [44]. This pattern of inhibition can be attributed
to the previously established preferred order of the binding mechanism of ATP
prior to peptide [44]. Bisubstrate analog 7 was about 40-fold more potent than
the original ATP-kemptide conjugate 1, consistent with the importance of
spacer length. Bisubstrate analog 7 showed very weak ability to block protein
kinase C, which is noteworthy because of the overlapping peptide substrate
specificity of these two enzymes [44]. While its structural basis is not yet
understood, this selectivity highlights the potential of using the bivalent
approach to individually target closely related protein kinases.
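The two inhibition patterns reported for compound 7 can be illustrated with the textbook single-substrate rate laws for competitive and pure noncompetitive inhibition. This is a simplified sketch: a protein kinase is a two-substrate enzyme, so each function stands for an experiment in which one substrate is varied while the other is held fixed. The function names and the kinetic parameters used in the usage note are hypothetical; only the two Ki values (125 µM for compound 1 and 3 µM for compound 7) come from the text.

```python
def rate_competitive(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Initial rate with a competitive inhibitor: apparent Km rises, Vmax is unchanged."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def rate_noncompetitive(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Initial rate with a pure noncompetitive inhibitor: apparent Vmax falls, Km is unchanged."""
    return vmax * s / ((km + s) * (1.0 + i / ki))


# Ki values quoted in the text, in micromolar.
KI_COMPOUND_1_UM = 125.0
KI_COMPOUND_7_UM = 3.0

# Ratio of the two inhibition constants.
FOLD_IMPROVEMENT = KI_COMPOUND_1_UM / KI_COMPOUND_7_UM
```

With hypothetical values `vmax=1.0`, `km=1.0`, `ki=3.0`, and `i=3.0`, raising the varied substrate to saturating levels restores the competitive rate to nearly `vmax`, whereas the noncompetitive rate stays near half of `vmax`. That difference is what distinguishes the behavior of compound 7 against ATP from its behavior against peptide. `FOLD_IMPROVEMENT` evaluates to a little under 42, in line with the "about 40-fold" figure quoted above.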
7.2.2.3.3 Protein-ATP Conjugates as Kinase Ligands Prepared by Expressed Protein Ligation
Many protein kinases are rather inefficient at catalyzing the phosphorylation
of short synthetic peptides but are highly effective at attaching a phosphate
to full-length protein substrates. In general, the molecular basis for this
specificity is not understood. A classical example of this behavior is the
phosphorylation of the tail tyrosine residue of Src by the protein tyrosine
kinase Csk [45]. This phosphorylation event is known to be important because
it downregulates the Src kinase activity by inducing a complex conformational
change in the Src protein [45]. It has been demonstrated that C-terminal
tyrosine containing peptides derived from Src are very poor Csk substrates
in vitro [45]. Nevertheless, recombinant Src protein that includes at least the
Src catalytic domain and C-terminal tail is an excellent in vitro substrate,
about 1000-fold better than peptides [45]. It is noteworthy that the ground-state
interaction between Csk and Src is quite weak (Kd > 50 µM) even
though the apparent Src Km is in the 2-4 µM range [45]. A high-resolution
cocrystal structure of the Csk-Src complex that might provide insights into
the molecular recognition has not yet been obtained.

Fig. 7.2-13 Synthetic scheme for the generation of a protein kinase A selective
bisubstrate analog inhibitor based on a dissociative transition state. [Reagents
shown: (Ph3P)4Pd(0); Et2NCS2H, Et3N; bromoacetic acid, DIC; TFA, H2O, CH2Cl2,
thioanisole. R1 = AcNH-Leu-Arg-Arg-Ala-; R2 = -Leu-Gly-CO2H;
R3 = AcNH-Leu-Arg(Pmc)-Arg(Pmc)-Ala-; R4 = -Leu-Gly-CO2-Wang resin.]

Fig. 7.2-14 A Src-ATPγS conjugate as a high-affinity Csk ligand produced by expressed
protein ligation.
In order to generate a high-affinity Src-related ligand for Csk which might
aid structural studies, a bivalent Src conjugate was prepared in which ATPγS
linkage was introduced into the Src tail [46]. Because the target molecule
contains a protein of greater than 300 amino acids, total chemical synthesis was
an unrealistic option. However, using EPL, the ATPγS-acetanilide function
was readily introduced into the Src tail (Fig. 7.2-14) [46]. As expected, this
produced a potent (sub-micromolar) ligand for Csk [46]. Specificity of this
Src-ATP conjugate for Csk was shown using a pull-down experiment from
cell extracts [46]. These studies also point to the use of both peptide- and
protein-ATP conjugates in proteomic analysis.
References
1. L.N. Johnson, D.C. Phillips, Nature 1965, 206, 761-763.
2. C.T. Walsh, Enzymatic Reaction Mechanisms, W.H. Freeman, 1978, New York, NY.
3. G. Winter, A.R. Fersht, A.J. Wilkinson, M. Zoller, M. Smith, Nature 1982, 299, 756-758.
4. T.W. Muir, S.B. Kent, Curr. Opin. Biotechnol. 1993, 4, 420-427.
5. L. Wang, P.G. Schultz, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
6. C.J. Wallace, Curr. Opin. Biotechnol. 1995, 6, 403-410.
7. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B. Kent, Science 1994, 266, 776-779.
8. D.A. Erlanson, M. Chytil, G.L. Verdine, Chem. Biol. 1996, 3, 981-991.
9. T.W. Muir, D. Sondhi, P.A. Cole, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6705-6710.
10. T.C. Evans Jr, J. Benner, M.Q. Xu, Protein Sci. 1998, 7, 2256-2264.
11. C.T. Walsh, Posttranslational Modification of Proteins: Expanding Nature’s Inventory, Roberts & Co, 2005, Greenwood Village, CO.
12. G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, Science 2002, 298, 1912-1934.
13. A. Alonso, J. Sasin, N. Bottini, I. Friedberg, A. Osterman, A. Godzik, T. Hunter, J. Dixon, T. Mustelin, Cell 2004, 117, 699-711.
14. K.M. Shokat, Chem. Biol. 1995, 2, 509-514.
15. M.A. Shogren-Knaak, P.J. Alaimo, K.M. Shokat, Annu. Rev. Cell Dev. Biol. 2001, 17, 405-433.
16. S.A. Johnson, T. Hunter, Nat. Methods 2005, 2, 17-25.
17. D.M. Williams, P.A. Cole, Trends Biochem. Sci. 2001, 26, 271-273.
18. P.A. Cole, A.D. Courtney, K. Shen, Z. Zhang, Y. Qiao, W. Lu, D.M. Williams, Acc. Chem. Res. 2003, 36, 444-452.
19. D. Wang, P.A. Cole, J. Am. Chem. Soc. 2001, 123, 8883-8887.
20. S.M. Domchek, K.R. Auger, S. Chatterjee, T.R. Burke Jr, S.E. Shoelson, Biochemistry 1992, 31, 9865-9870.
21. L. Chen, L. Wu, A. Otaka, M.S. Smyth, P.P. Roller, T.R. Burke Jr, J. den Hertog, Z.Y. Zhang, Biochem. Biophys. Res. Commun. 1995, 216, 976-984.
22. T.R. Burke Jr, Z.J. Yao, D.G. Liu, J. Voigt, Y. Gao, Biopolymers 2001, 60, 32-44.
23. J.W. Wu, M. Hu, J. Chai, J. Seoane, M. Huse, C. Li, D.J. Rigotti, S. Kyin, T.W. Muir, R. Fairman, J. Massague, Y. Shi, Mol. Cell. 2001, 8, 1277-1289.
24. W. Lu, D. Gong, D. Bar-Sagi, P.A. Cole, Mol. Cell. 2001, 8, 759-769.
25. H. Cho, R. Krishnaraj, M. Itoh, E. Kitas, W. Bannwarth, H. Saito, C.T. Walsh, Protein Sci. 1993, 2, 977-984.
26. B.G. Neel, H. Gu, L. Pao, Trends Biochem. Sci. 2003, 28, 284-293.
27. W. Lu, K. Shen, P.A. Cole, Biochemistry 2003, 42, 5461-5468.
28. Z. Zhang, K. Shen, W. Lu, P.A. Cole, J. Biol. Chem. 2003, 278, 4668-4674.
29. T. Araki, H. Nawa, B.G. Neel, J. Biol. Chem. 2003, 278, 41677-41684.
30. S. Ganguly, S.L. Coon, D.C. Klein, Cell Tissue Res. 2002, 309, 127-137.
31. S. Ganguly, J.L. Weller, A. Ho, P. Chemineau, B. Malpaux, D.C. Klein, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 1222-1227.
32. W. Zheng, Z. Zhang, S. Ganguly, J.L. Weller, D.C. Klein, P.A. Cole, Nat. Struct. Biol. 2003, 10, 1054-1057.
33. W. Zheng, D. Schwarzer, A. LeBeau, J.L. Weller, D.C. Klein, P.A. Cole, J. Biol. Chem. 2005, 280, 10462-10467.
34. G. Ferry, J. Mozo, C. Ubeaud, S. Berger, M. Bertrand, A. Try, P. Beauverger, C. Mesangeau, P. Delagrange, J.A. Boutin, Cell. Mol. Life Sci. 2002, 59, 1395-1405.
35. P. Cohen, Nat. Rev. Drug Discov. 2002, 1, 309-315.
36. D.S. Lawrence, J. Niu, Pharmacol. Ther. 1998, 77, 81-114.
37. J.H. Lee, S.K. Nandy, D.S. Lawrence, J. Am. Chem. Soc. 2004, 126, 3394-3395.
38. K. Parang, P.A. Cole, Pharmacol. Ther. 2002, 93, 145-157.
39. D. Medzihradszky, S.L. Chen, G.L. Kenyon, B.W. Gibson, J. Am. Chem. Soc. 1994, 116, 9413-9419.
40. A.S. Mildvan, Proteins 1997, 29, 401-416.
41. K. Parang, J.H. Till, A.J. Ablooglu, R.A. Kohanski, S.R. Hubbard, P.A. Cole, Nat. Struct. Biol. 2001, 8, 37-41.
42. A.C. Hines, K. Parang, R.A. Kohanski, S.R. Hubbard, P.A. Cole, Bioorg. Chem. 2005, 33, 285-297.
43. S.R. Hubbard, EMBO J. 1997, 16, 5572-5581.
44. A.C. Hines, P.A. Cole, Bioorg. Med. Chem. Lett. 2004, 14, 2951-2954.
45. P.A. Cole, K. Shen, Y. Qiao, D. Wang, Curr. Opin. Chem. Biol. 2003, 7, 580-585.
46. K. Shen, P.A. Cole, J. Am. Chem. Soc. 2003, 125, 16172-16173.
47. T. Obsil, R. Ghirlando, D.C. Klein, S. Ganguly, F. Dyda, Cell 2001, 105, 257-267.
7.3
Chemical Strategies for Activity-based Proteomics
Nadim Jessani and Benjamin F. Cravatt
Outlook
The assignment of molecular and cellular functions to the numerous protein
products encoded by prokaryotic and eukaryotic genomes presents a major
challenge to the field of proteomics. To address this need for higher order
functional proteomic strategies, a chemical proteomic method referred to as
activity-based protein profiling (ABPP) was introduced, in which active site-
directed small-molecule probes are employed to measure protein activity
rather than abundance. By covalently labeling the active sites of enzyme
superfamilies, ABPP provides a direct readout of global changes occurring in
the functional state of enzyme families present in samples of high biological
complexity.
The goal of this chapter is to detail the need for such activity-based methods,
and to describe the development and application of ABPP by highlighting
several studies that have established the utility of this chemical proteomic
method as a powerful strategy for the discovery and functional analysis of
complex biological proteomes, as well as their individual constituents.
7.3.1
Introduction
The molecular information provided by the availability of complete genome
sequences for numerous prokaryotic and eukaryotic organisms has granted
biomedical researchers an unprecedented opportunity to understand better
the molecular basis of life in its many forms. To accelerate this process,
global experimental approaches, such as genomics [1] and proteomics [2],
have been introduced to characterize genes and proteins collectively, rather
than individually. Most genomic and proteomic methods, however, rely
on measurements of mRNA and protein abundance as indirect estimates
of protein function, a potentially risky assumption considering that most
proteins are regulated by posttranslational events in vivo [3]. Considering
that proteins mediate nearly all biochemical events underlying cell and
organismal physiology and pathophysiology, the need to develop general
methods to measure levels and activities of these biomolecules directly in
cell and tissue proteomes is apparent. Indeed, the ability to profile classes of
proteins based on their activity would greatly accelerate assignment of protein
function and identification of new biomarkers and therapeutic targets for the
diagnosis and treatment of human disease. To address this need for higher
order functional proteomics methods, a chemical proteomic strategy referred
to as activity-based protein profiling (ABPP) [4,5] emerged, which utilizes
active site-directed chemical probes that measure protein activity rather than
abundance to profile the functional state of enzyme families directly in
complex proteomes. By providing a covalent link between labeled proteins
and a chemical tag, ABPP permits the consolidated detection, isolation, and
identification of active enzymes directly from samples of high biological
complexity.
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN 978-3-527-31150-7
7.3.2
History/Development
7.3.2.1 Global Approaches for Biological Research in the Postgenome Era

A fundamental goal of biological research is to understand the complex
roles that enzymes play in physiological and pathological processes and
to use this knowledge to decipher the molecular correlates of health
and disease. Until recently, this process of discovery principally entailed
an iterative cycle of identifying, isolating, and functionally characterizing
proteins and genes associated with a particular molecular or cellular event.
However, with the dawn of complete genome sequence availability for
numerous prokaryotic and eukaryotic organisms, the scientific community
experienced a paradigm shift that transformed the most basic methods of
experimentation. From this, several global experimental approaches evolved
to meet the emerging challenge and opportunity of characterizing genes
and/or proteins collectively, rather than individually. These approaches
included genomics [1], the analysis of a cell’s complete transcript repertoire
(transcriptome), and proteomics [2], the analysis of a cell’s complete protein
repertoire (proteome). Indeed, genomics, or “functional” genomics, evolved
rapidly as a field, with gene microarray studies nearing the goal of quantitatively
comparing in a single experiment the complete transcriptomes of two
test samples. Such studies have provided valuable insights into the global
gene expression patterns of many pathologies, such as cancer [6] and
inflammation [7].
However, inherent to most genomics approaches is their reliance on mRNA
transcript levels as an indirect measure of protein quantity and function. To
grant biochemical and cell biological meaning to genomic data, one must
accept that dynamics in mRNA expression correlate with similar changes in
protein levels and activity, a potentially problematic assumption given the
numerous posttranscriptional and posttranslational events known to regulate
protein expression and function [3]. Furthermore, although transcript profiling
has become a standard tool in biomedical research, global
characterization of biological samples at the level of the proteome will likely be
critical for the identification of new diagnostic markers and drug targets. While
proteomics as a field has rapidly evolved to meet these challenges, standard
approaches are often restricted to detecting changes in protein abundance
and therefore do not take into account numerous posttranslational events that
regulate protein activity. Thus, the need for proteomic methods that measure
activity rather than abundance to complement conventional genomic and
proteomic strategies has become apparent.
7.3.2.2 Chemical Strategies for Functional Proteomics

Given the success of genome sequencing projects, biological research has been
launched into a new era where focus has shifted from the identification of
novel genes to the functional characterization of gene products. Considering
that the number of unique human genes appears to exceed 25 000, the
daunting task of assigning molecular, cellular, and physiological function to
the protein products encoded by these genes awaits postgenomic researchers.
To accelerate this process, and as a complement to genomics, the field of
proteomics has as one of its major goals the development and application of
methods for the parallel analysis of large numbers of proteins [2]. However,
the technical challenges associated with proteomic studies greatly exceed those
faced by genomics [8]. For example, while gene microarrays can exploit the
inherent specificity of complementary oligonucleotide hybridization to analyze
vast numbers of distinct mRNA transcripts in parallel, proteins lack such
high-specificity binding partners for use as selective probes. In addition,
proteins, unlike nucleic acids, cannot be amplified by molecular strategies
such as PCR (polymerase chain reaction), which restricts the ability to analyze
samples where only minimal or limited quantities of cellular material are
available (e.g., single cell analysis or clinical specimens). Moreover, while
nucleic acids generally display similar biochemical properties, proteins
exhibit a wide range of distinct biochemical properties and cannot be treated
as experimentally equivalent. These properties include membrane association,
hetero- and homo-oligomerization, and a host of posttranslational
modifications, meaning that no single experimental protocol is suitable for
the characterization of all proteins.
Given these technical challenges, the development of complementary
analytical strategies must maximize the information content extractable
from proteomic samples. Such proteomic strategies include efforts to
characterize both protein expression and protein function on a global
scale. The most mature current method for analyzing protein expression
patterns utilizes two-dimensional electrophoresis (2DE) for the separation
of proteins coupled with protein staining and mass spectrometry (MS)
for protein detection and identification, respectively [9]. Although 2DE-MS
methods permit the consolidated analysis of the relative expression levels of
many proteins across multiple proteomic samples, these approaches suffer
from an inability to resolve several important protein classes, including
low abundance and membrane-associated proteins [10]. To address these
shortcomings, several powerful MS-based strategies for the gel-free analysis
of proteomes have emerged, including isotope-coded affinity tagging (ICAT)
for quantitative proteomics [11] and multidimensional protein identification
technology (MudPIT) for comprehensive proteomics [12]. ICAT, for example,
utilizes chemical labeling reagents, referred to as isotope-coded affinity
tags, to enable the comparative analysis of protein expression levels by
liquid chromatography (separation) and tandem MS (detection), thereby
circumventing several limitations of gel-based methods, and providing
improved access to membrane-associated and low abundance proteins [13].
Nonetheless, these methods, like 2DE-MS, still focus on measuring changes
in protein abundance and, therefore, provide only an indirect estimate
of dynamics in protein function. Indeed, several important forms of
posttranslational regulation, including protein-protein and protein-small
molecule interactions [3], may elude detection by abundance-based proteomic
methods.
To facilitate the analysis of protein function, several proteomic methods
have been introduced to characterize the activity of proteins on a global scale.
These include large-scale yeast two-hybrid screens [14] and epitope-tagging
immunoprecipitation experiments [15, 16], which aim to construct comprehensive
maps of protein-protein interactions, and protein microarrays [17, 18],
which aim to provide an assay platform for the rapid assessment of
protein activities. Although these methods have the advantage of assigning
specific molecular functions to individual protein products, they typically rely
on the recombinant expression of proteins in artificial environments and
therefore do not directly assess the functional state of these biomolecules in
their native settings. It was to address this need for higher order functional
proteomic methods that ABPP emerged as a strategy to measure protein
activity rather than abundance (Fig. 7.3-1). In contrast to conventional
proteomic strategies, which aim to catalogue the entire complement of protein
products in a given sample, ABPP is designed to address the proteome at the
level of discrete enzyme families, providing a way to distinguish, for example,
active enzymes from their inactive zymogen [19] and/or inhibitor-bound
forms [20].
[Figure 7.3-1: schematic. DNA → RNA → Protein → Protein activity. Microarrays
(genomics) report on RNA, MudPIT (proteomics) on protein, and chemical probes
(ABPP) on protein activity.]
Fig. 7.3-1 Overview of genomic and proteomic methods. Standard genomic and
proteomic approaches measure changes in mRNA and protein abundance,
respectively. In contrast, activity-based protein profiling (ABPP) applies
active site-directed chemical probes to measure dynamics in enzyme activities,
directly in the context of whole proteomes and living systems.
7.3.3
7.3.3.1 Activity-based Protein Profiling (ABPP) - A Chemical Strategy for the
Global Profiling of Enzyme Activities in Complex Proteomes
7.3.3.1.1 The Need for Activity-based Proteomic Methods

As described above, genomic and proteomic approaches assess protein
function indirectly, by measuring changes in mRNA and protein level,
respectively. A proponent of these strategies might reasonably argue that
alterations in transcript and protein level will generally correlate well with
changes in protein function. However, several enzyme families clearly
represent important exceptions to this premise. For example, most proteases
are produced as inactive precursors (zymogens), and upon activation are
often bound by a complex array of endogenous inhibitors that serve as
critical posttranslational regulators of their catalytic activities in vivo [3,
21]. Thus, a change in the level of a given protease may or may not
have functional impact depending on whether the enzyme is processed
and/or its abundance exceeds the level of its endogenous inhibitors
(Fig. 7.3-2).
[Figure 7.3-2: schematic. Labels, in order: protease gene, transcription,
protease mRNA, translation, inactive zymogen, secretion, inactive zymogen,
activation, active protease (with endogenous inhibitors), degradation, ECM.]
Fig. 7.3-2 Schematic of representative protease posttranslational regulation
mechanisms. Multiple levels of posttranscriptional and posttranslational
regulation of protease expression levels and function, including production as
inactive zymogens, compartmentalization/secretion, and inhibition by
endogenous protein-binding partners.
Chemical probes that can react with proteases in an activity-dependent
manner offer a powerful means to distinguish in a given proteome those
enzymes that are active (free) from those that are inactive (zymogens;
inhibitor-bound), thereby providing a readout of net proteolytic activity.
Notably, several other enzyme families, including kinases [22] and
phosphatases [23], also undergo dramatic changes in their activities in the
absence of alterations in abundance, indicating that numerous classes of
enzymes are relevant targets for ABPP. Moreover, because labeling by ABPP
probes is based on conserved features contained within enzyme active sites
(rather than abundance), these reagents provide a means to access low
abundance proteins contained within samples of high complexity, thus
addressing the large dynamic range of protein expression displayed by most
proteomes [24].
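The divergence between protease abundance and protease activity described in this section can be illustrated with a toy calculation. The sketch below is purely illustrative: the function name `net_active_protease`, the assumption of one-to-one stoichiometric inhibition, and all numerical values are hypothetical and are not taken from the studies cited.

```python
def net_active_protease(total, fraction_processed, inhibitor):
    """Toy estimate of free, active protease (arbitrary units).

    Assumes (hypothetically) that only processed enzyme can be active and
    that each inhibitor molecule blocks exactly one processed enzyme.
    """
    if not 0.0 <= fraction_processed <= 1.0:
        raise ValueError("fraction_processed must lie between 0 and 1")
    processed = total * fraction_processed
    return max(0.0, processed - inhibitor)


# Doubling abundance has no functional effect while inhibitor is in excess...
low = net_active_protease(total=10.0, fraction_processed=0.5, inhibitor=12.0)
high = net_active_protease(total=20.0, fraction_processed=0.5, inhibitor=12.0)
# ...whereas the same abundance yields free enzyme once inhibitor levels fall.
released = net_active_protease(total=20.0, fraction_processed=0.5, inhibitor=4.0)
print(low, high, released)
```

In this simplified picture, an abundance-based measurement would report a twofold change between the first two cases, whereas an activity-dependent probe would report no labeling in either.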
7.3.3.1.2 The Design of Chemical ABPP Probes for Functional Proteomics

In the appraisal of strategies for ABPP that focus on protein function rather
than abundance, it is important to consider how the cell regulates protein
activity. In the case of enzymes, most posttranslational regulatory mechanisms
share a common feature in that they perturb, either structurally or sterically,
the active sites of these proteins [3]. Accordingly, it was hypothesized that
chemical probes capable of directly reporting on the integrity of enzyme active
sites might serve as effective activity-based profiling tools capable of parallel
monitoring of many enzymes directly within the proteomes in which they are
naturally expressed. Such “activity-based” probes can be defined as chemical
reagents that meet the following criteria:
1. React with a broad range of enzymes from a particular
class (or classes) directly in complex proteomes.
2. React with these enzymes in a manner that correlates
with their catalytic activities.
3. Display minimal cross-reactivity with other undesired
protein classes.
4. Possess a chemical tag for the rapid detection and
isolation of reactive enzymes.
An activity-based probe meeting these requirements could, in principle,
enable the comparative measurement and molecular identification of all the
active members of a given enzyme class present in one or more proteomes.
Importantly, these enzyme activity profiles can be read out in a variety of
formats including gels [20,25], microarrays [26], liquid chromatography-mass
spectrometry (LC-MS) [27], and capillary electrophoresis [28] (Fig. 7.3-3).
7.3.3.1.3 The General Structure of Activity-based Probes: Directed versus
Nondirected Strategies
An activity-based chemical probe consists of at least two general elements:
(a) a reactive group (RG) that binds and covalently modifies the active sites
of a broad range of enzymes from a particular enzyme class (or classes), and
(b) one or more chemical tags, such as biotin and/or a fluorophore, for the
consolidated detection and isolation of probe-labeled enzymes from complex
proteomes. RG elements of moderate reactivity and electrophilicity were
selected, thereby priming them to preferentially modify enzyme active sites
that offer a binding pocket enriched in nucleophilic residues important for
catalysis. Finally, in certain cases a third structural element may also be
introduced into the probe design in the form of a binding group (BG) intended
to direct RGs to different enzyme active sites present in the proteome.
Fig. 7.3-3 General strategy for activity-based protein profiling (ABPP).
Proteomes are treated with chemical probes that label active enzymes of a
particular class (or classes) in a manner that allows for their detection,
isolation, and identification. Active enzymes are denoted by open/unshaded
active sites, with their inactive counterparts (e.g., zymogen or
inhibitor-bound forms) shaded in black. RG - reactive group, BG - binding
group, tag - biotin and/or fluorophore. Probe-labeled proteomes can be analyzed
via several different platforms, including gel [20] or microarray [26] analysis
of probe-labeled enzymes, or capillary electrophoresis (CE) [28] and liquid
chromatography-mass spectrometry (LC-MS) [27] analysis of probe-labeled
tryptic peptides.
Directed ABPP - Probe Design for Enzyme Classes Possessing
Cognate Affinity Labels
Initial strategies for ABPP focused on the design and application of chemical
probes that targeted specific classes of enzymes. In this approach, well-
characterized affinity labels were incorporated as the RG to direct probe
reactivity toward enzymes sharing a similar catalytic mechanism and/or
substrate specificity. The design of first-generation serine hydrolase
(SH)-directed ABPP probes, for example, exploited the irreversible inhibition
that fluorophosphonate (FP) compounds exhibit toward the majority of the
members of this enzyme superfamily (Fig. 7.3-4). To date, these directed ABPP
efforts have generated probes that profile numerous enzyme classes, including
members of all major families of proteases (serine [4,19], cysteine [29-32],
metallo [33,34], aspartyl [35], proteasomal [36,37]), as well as select
phosphatases [38,39], kinases [40,41], and glycosidases [42]. Some specific
examples of directed ABPP probes include: (a) biotinylated/fluorophore-tagged
FPs that target the SH superfamily [4,19], (b) biotinylated electrophilic
ketones that target the caspase class of cysteine proteases [30], and
(c) biotinylated/fluorophore-tagged variants of the natural product E-64 that
target the papain class of cysteine proteases [29]. In many of these cases,
the chemical probes have been shown to label their enzyme targets in an
activity-dependent manner directly within complex proteomes, distinguishing,
for example, active enzymes from inactive zymogen or inhibitor-bound forms
[4,19,20].
Fig. 7.3-4 Fluorophosphonate labeling of serine hydrolase (SH) active sites.
As a result of a shared catalytic mechanism, nearly all SHs are potently and
irreversibly inhibited by fluorophosphonates (FPs). Reactivity of FPs depends
on SHs being catalytically active, which enables FP reagents coupled with
reporter tags to serve as activity-based probes for this large enzyme family.
Nondirected ABPP - Probe Design for Enzyme Classes Lacking
Cognate Affinity Labels
From these examples of directed approaches for ABPP it may be extrapolated
that, for enzyme classes with known covalent inhibitors, the design of activity-
based proteomic probes is, at least in concept, straightforward. However,
covalent inhibitors do not yet exist for the majority of proteins in the proteome;
therefore, an alternative strategy is needed to discover active site-directed
profiling reagents for proteins lacking known affinity labels. With this goal
in mind, a combinatorial, or “nondirected” strategy for ABPP was introduced
in which libraries of candidate probes with fixed RGs and variable BGs are
synthesized and screened against complex proteomes to identify “specific”
protein labeling events, which are defined as those that occurred in native,
but not heat-denatured proteomes [43,44]. Probe-protein reactions that are
heat-sensitive were predicted to occur in structured, small molecule-binding
sites that would often determine the biological activity of the proteins (e.g., the
active site of an enzyme or ligand-binding pocket of a receptor). In contrast,
proteins reacting with probes in a heat-insensitive manner would be considered
“nonspecific” targets, as these labeling events could occur with either native or
denatured versions of the proteins. This type of general screen to distinguish
specific from nonspecific labeling was deemed particularly important for
nondirected ABPP, which utilizes probes that, unlike directed reagents, lack
well-established selectivity for a given class of enzymes. Screening libraries of
probes against individual proteomes also provided a complementary method to
detect specifically labeled proteins, which were expected to show selectivity for
a select number of probes on the basis of the structure of their respective BGs
and should therefore be discernible from proteins that reacted indiscriminately
(i.e., nonspecifically) with the probe library.
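The heat-sensitivity criterion described above lends itself to a simple classification rule. The following is a minimal sketch under stated assumptions: the `LabelingEvent` record, the `classify` function, and the two cutoffs (`min_signal`, `max_retained`) are hypothetical illustrations, not parameters reported in the original screens [43,44].

```python
from dataclasses import dataclass


@dataclass
class LabelingEvent:
    protein: str
    native_signal: float     # probe signal in the native proteome
    denatured_signal: float  # probe signal after heat denaturation


def classify(event, min_signal=1.0, max_retained=0.2):
    """Call an event 'specific' if heat denaturation removes most of the signal.

    min_signal and max_retained are hypothetical cutoffs chosen for illustration.
    """
    if event.native_signal < min_signal:
        return "unlabeled"
    retained = event.denatured_signal / event.native_signal
    return "specific" if retained <= max_retained else "nonspecific"


events = [
    LabelingEvent("enzyme_A", 50.0, 2.0),    # signal lost on heating
    LabelingEvent("protein_B", 40.0, 38.0),  # signal persists on heating
    LabelingEvent("protein_C", 0.3, 0.2),    # too weak to call
]
calls = {e.protein: classify(e) for e in events}
print(calls)
```

A second filter of the kind mentioned in the text, selectivity for a subset of probes in the library, could be layered on top by applying the same rule per probe and requiring that a protein be called specific for only some of them.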
The utility of nondirected methods for ABPP was initially demonstrated
with a modest-sized library of sulfonate ester (SE) probes bearing varying
alkyl/aryl BGs that was generated and screened against a collection of tissue
and cell line proteomes [43,44]. The SE-group was selected as the library’s
RG based on a general survey of the literature, which revealed that a large
range of enzyme classes, including proteases, kinases, and phosphatases,
are susceptible to covalent inactivation by natural products and/or synthetic
inhibitors that possess carbon electrophiles. Accordingly, it was hypothesized
that ABPP probes incorporating a carbon electrophile RG may prove capable
of profiling enzymes not only within but also across mechanistically distinct
classes. Consistent with this premise, several heat-sensitive protein targets of
the sulfonate library were identified and found to represent members of at least
nine different enzyme classes (Table 7.3-1). Interestingly, each enzyme target
displayed a unique reactivity profile with the SE probe library, indicating
that the structure of the variable BG strongly influenced probe-protein
interactions. Several lines of evidence supported that the sulfonate probes
labeled the active sites of their enzyme targets. For example, the addition
of cofactors and/or substrates was found to inhibit the labeling of several
enzymes, while the reactivity of others was either positively or negatively
affected by known allosteric regulators of catalytic activity [43,44]. Notably,
for one enzyme target, aldehyde dehydrogenase-1 (ALDH-1), sulfonate probes
were shown to act as time-dependent inactivators of catalytic activity [43, 44].
Finally, advanced LC-MS platforms for ABPP have revealed that, in nearly
all cases, SE probes label their enzyme targets on conserved active site
residues [27].
While these original studies demonstrated that nondirected strategies can in
fact deliver bona fide activity-based probes for enzyme families not yet accessible
by directed methods, one major drawback remained: the limited structural
diversity of the SE library, a factor proposed to be responsible for the modest
differences in the proteome reactivity profiles observed for these probes. To
test the hypothesis that exploring further proteome space would require a
more structurally diverse library of electrophilic agents, one such library was
developed in which an α-chloroacetamide (α-CA) RG was coupled to a variable
dipeptide BG that would enable the intrinsic diversity of amino acid functional
groups to be exploited for probe binding to additional enzyme families [45]. In
addition to its tempered electrophilicity (stable under many synthetic chemistry
conditions), the α-CA group is small in size, therefore limiting the likelihood
of unduly influencing noncovalent probe-protein interactions driven by the
dipeptide BG. Furthermore, given the precedent for other carbon electrophile
RGs, such as the SEs [43, 44] and epoxides [29], to label a range of active
site residues, it was proposed that the inherent reactivity of the α-CA probe
library would not be strongly biased toward a specific enzyme class. Indeed,
initial studies identified more than 10 different classes of enzymes targeted
by a representative “optimal set” of α-CA dipeptide library members, most
of which were not labeled by previously developed ABPP probes, including
several obesity-associated enzyme activities, and proteins involved in lipid
metabolism and gluconeogenesis (Table 7.3-1).
Collectively, these studies reveal that, through the use of both directed and
nondirected strategies, activity-based probes compatible with whole proteome
analysis can be generated for numerous enzyme classes. While comparing
directed and nondirected approaches for ABPP, it is perhaps most interesting
to note the striking nonoverlap between enzyme targets profiled by each
method (Table 7.3-1). Indeed, none of the SE-labeled enzymes identified to
date represent known targets of directed ABPP probes. This finding suggests
that the amount of “active site space” in the proteome accessible to chemical
profiling is still far from saturation.
7.3.4
7.3.4.1 Biological Applications: Comparative and Competitive ABPP

Methods for ABPP have matured rapidly since their introduction in
the late 1990s, providing a new avenue for identifying novel disease-
associated enzymes (target discovery) and chemical inhibitors thereof
(inhibitor discovery). In addition to highlighting the biological utility of activity-
based proteomic methods to provide information content not readily achieved
by other expression-based techniques, the studies presented in this section
are aimed at demonstrating the benefit of parsing the proteome into tractable
functional units (activity states of given enzyme classes), for the discovery of
disease-related enzymes, as well as lead inhibitors that target these enzymes.
7.3.4.1.1 Comparative Profiling for the Discovery of Enzyme Activities
Associated with Discrete Physiological and Pathological States
The identification of enzymes selectively expressed by tumor cells and tissues
may provide a rich source of new biomarkers and targets for the diagnosis and
treatment of cancer. In one such effort, the activity, subcellular distribution,
and glycosylation state of members from the SH superfamily of enzymes
was quantitatively profiled across a panel of human cancer cell lines [20].
The SHs represent one of the largest and most diverse enzyme classes
in higher eukaryotic proteomes, consisting of proteases, lipases, esterases,
and amidases that collectively constitute approximately 1% of the predicted
protein products encoded by the human genome. By profiling the secreted,
membrane-associated, and soluble cellular fractions derived from human
breast carcinoma and melanoma lines, this study led to the identification of
SH activities that distinguished cancer lines according to their respective tissue
of origin. Interestingly, nearly all of these activities were downregulated in
the most invasive cancer lines analyzed, which instead upregulated a distinct set
of secreted and membrane-associated SH activities. In contrast to the diverse
patterns of enzyme activity observed in the secreted and membrane proteomes
of cancer cells, their soluble proteomes appeared quite similar, with few
enzyme activities exhibiting restricted patterns of distribution. These findings
suggest that, at least for the SH superfamily, the membrane and secreted
proteomes are enriched in enzyme activities that depict cellular phenotype,
highlighting the value of methods, like ABPP, that can analyze technically
challenging proteomic fractions (e.g., secreted, membrane, glycosylated, and
low abundance proteins). More generally, these results suggest that invasive
cancer cells share discrete proteomic signatures that are more reflective of their
biological phenotype than their cellular heritage, highlighting that a common
set of enzymes may support the progression of tumors from a variety of origins
and thus represent attractive targets for the diagnosis and treatment of cancer.
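One simple way to picture how such comparative profiles might be mined for phenotype-associated activities is to compare mean probe-labeling signals between groups of cell lines. The sketch below is an assumption-laden illustration: the function `fold_changes`, the enzyme and cell-line names, the signal values, and the pseudocount are all hypothetical, and the original study [20] may have used a different analysis.

```python
from statistics import mean


def fold_changes(profiles, group_a, group_b, pseudocount=1.0):
    """Ratio of mean probe-labeling signal (group_a / group_b) for each enzyme.

    profiles maps enzyme -> {cell_line: signal}. The pseudocount avoids
    division by zero and is an arbitrary, illustrative choice.
    """
    result = {}
    for enzyme, signals in profiles.items():
        a = mean(signals[line] for line in group_a)
        b = mean(signals[line] for line in group_b)
        result[enzyme] = (a + pseudocount) / (b + pseudocount)
    return result


# Hypothetical activity signals for two invasive and two noninvasive lines.
profiles = {
    "hydrolase_1": {"inv_1": 99.0, "inv_2": 79.0, "non_1": 9.0, "non_2": 7.0},
    "hydrolase_2": {"inv_1": 4.0, "inv_2": 2.0, "non_1": 39.0, "non_2": 39.0},
}
ratios = fold_changes(profiles, ["inv_1", "inv_2"], ["non_1", "non_2"])
print(ratios)
```

Here `hydrolase_1` would be flagged as elevated in the invasive group and `hydrolase_2` as reduced; a real analysis would also need replicates and a statistical test.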
This comparative ABPP analysis was subsequently extended to a more
sophisticated in vivo model of human cancer: breast cancer xenografts grown
in immunodeficient mice [46]. The mixed species nature of the xenograft model
enabled the discrimination of active enzymes that were tumor-associated
(human) or host-derived (mouse), resulting in the identification of several
different classes of activities, including: carcinoma enzyme activities expressed
selectively in culture or in xenograft tumors, as well as host stromal activities
that either infiltrated or were excluded from xenograft tumors.
Interestingly, cell lines derived from xenograft tumors exhibited profound
differences in their enzyme activity profiles, as compared to the parental
line, which correlated with increased tumor growth rates and metastasis
upon reintroduction into mice. In particular, xenograft-derived breast cancer
cells exhibited dramatic elevations in secreted protease activities (urokinase
and tissue-type plasminogen activator), as well as the downregulation of key
glycolytic enzymes (phosphofructokinase). These findings suggest that the
behavior of human cancer cell lines grown in vivo may vary considerably from
their characteristics in culture, and that the in vivo microenvironment of the
mouse mammary fat pad cultivates the growth of human breast cancer cells
with altered enzyme activity profiles and elevated tumorigenic properties.
The benefit of addressing the proteome at the level of distinct enzyme classes,
as well as the versatility of ABPP reagents, is highlighted in a third example
of comparative ABPP profiling. In this study, carried out by Greenbaum and
colleagues, activity-based probes were applied to characterize the functional
role of the papain subclass of cysteine proteases in the Plasmodium falciparum
life cycle [47]. While cysteine proteases are known to be essential for the
survival of several human parasites, the specific roles played by these enzymes
during the complex life cycle of P. falciparum remain ill defined. ABPP of
P. falciparum proteomes isolated at various stages of the parasite life cycle
identified a specific cysteine protease, falcipain 1, that was upregulated during
the invasive merozoite stage of growth. Falcipain 1-selective inhibitors were
then identified by screening epoxide-based chemical libraries for compounds
that blocked probe labeling of this enzyme in complex proteomes. These
inhibitors were subsequently demonstrated to inhibit parasite invasion of host
erythrocytes, with no detectable effect on other parasite processes (as opposed
to the general papain family protease inhibitor, E-64, which produced multiple
aberrations and, ultimately, developmental arrest). Importantly, this ABPP
analysis of falcipain 1 function and inhibition was carried out directly in whole
parasite lysates, circumventing the need for technically difficult gene ablation
experiments and/or recombinant enzyme expressions that often serve as the
basis for such studies.
7.3.4.1.2 Competitive ABPP for Discovering Potent and Selective Reversible Enzyme Inhibitors
While activity-based probes can serve as powerful tools for the discovery of
enzyme activities associated with discrete (patho)physiological functions, the
target promiscuity displayed by these profiling agents limits their utility for
defining the biological function of individual enzymes, which often depends on
the development of specific reagents to perturb the protein function of defined
members contained within large enzyme classes. However, as illustrated in
the study done by Greenbaum and colleagues [47, 48], ABPP can in fact be
effectively applied to identify irreversible inhibitors that, for certain enzyme
classes like cysteine proteases, achieve sufficient selectivity to serve as useful
pharmacological agents in vivo. Since, for many enzyme classes, irreversible
inhibitors display poor target selectivity due to their inherent reactivity, it was
also necessary to adapt the ABPP method to serve as an effective primary
screen for reversible enzyme inhibitors. Toward this end, Leung and
colleagues devised a competitive screening strategy to evaluate the activity of
libraries of candidate reversible inhibitors, in this case against SH activities
expressed in mouse tissue proteomes [49].
In this study, proteomes were incubated with a library of candidate
inhibitors and an SH-directed probe for a restricted period of time during
which most enzymes had not yet reacted to completion with the probe.
Under such kinetically controlled conditions, the binding of competitive
reversible inhibitors to specific enzymes was detected as a reduction in probe
labeling (Fig. 7.3-5). By performing this screen in mouse brain and heart
proteomes using varying inhibitor concentrations, both potencies (IC50 values)
and selectivities of inhibitors were determined concurrently. Importantly,
the values calculated from ABPP measurements matched closely with Ki values
determined by standard substrate assays. Analysis of the resulting data
sets demonstrated that inhibitors selective for individual SHs could be readily
distinguished from compounds that displayed comparable or greater activity
toward multiple enzymes. Notably, inhibitors were discovered for both known
enzymes of therapeutic interest (e.g., fatty acid amide hydrolase) and novel
enzymes that lack known substrates. A further advantage of inhibitor screening
by ABPP is that these analyses can be carried out directly in native proteomes
without the need for recombinant expression or purification of proteins.
Finally, because inhibitors are tested against numerous enzymes in parallel
within the context of their native proteomes, promiscuous agents can be readily
triaged in favor of equally potent compounds that display high target selectivity.
Fig. 7.3-5 Inhibitor discovery by ABPP. The potency and selectivity of inhibitors
can be profiled in parallel by performing competitive ABPP reactions in proteomes.
Complex proteomes are treated with a reversible inhibitor library and an
activity-based probe, and subsequently analyzed to identify enzymes sensitive to
individual inhibitors (reflected by a reduction in intensity of probe labeling).
Active enzymes are denoted by open/unshaded active sites, with their
inhibitor-bound counterparts shaded in color.
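To make the role of the restricted incubation time concrete, the following Python sketch uses a deliberately simple kinetic model. It is an assumption made for illustration, not the analysis used in the cited study: the probe is taken to label each enzyme irreversibly with a pseudo-first-order rate constant `k_probe`, and a rapidly equilibrating competitive inhibitor with dissociation constant `ki` is taken to slow that rate by the factor 1 + [I]/Ki. All function names, parameter values, and enzyme names are hypothetical.

```python
import math


def relative_labeling(inhibitor, ki, k_probe, t):
    """Probe signal at time t relative to the no-inhibitor control (assumed model)."""
    labeled = 1.0 - math.exp(-k_probe * t / (1.0 + inhibitor / ki))
    control = 1.0 - math.exp(-k_probe * t)
    return labeled / control


def ic50_by_interpolation(concs, signals):
    """Concentration giving 50% signal, by log-linear interpolation between points."""
    points = sorted(zip(concs, signals))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            frac = (s_lo - 0.5) / (s_lo - s_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None  # the 50% point is not bracketed by the tested concentrations


def selectivity(ic50_by_enzyme, target):
    """Fold-selectivity: most sensitive off-target IC50 divided by the target IC50."""
    off_targets = [v for name, v in ic50_by_enzyme.items() if name != target]
    return min(off_targets) / ic50_by_enzyme[target]


if __name__ == "__main__":
    concs = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100]  # arbitrary units
    ki = 1.0  # same units as concs

    # Short incubation: probe labeling is far from completion (k_probe * t = 0.1).
    short = [relative_labeling(c, ki, k_probe=0.1, t=1.0) for c in concs]
    # Long incubation: labeling is nearly complete in the control (k_probe * t = 5).
    long_ = [relative_labeling(c, ki, k_probe=5.0, t=1.0) for c in concs]

    print("apparent IC50, short incubation:", ic50_by_interpolation(concs, short))
    print("apparent IC50, long incubation: ", ic50_by_interpolation(concs, long_))

    # Made-up IC50 values for three hypothetical enzymes in one proteome.
    print("fold-selectivity:", selectivity({"enzyme_A": 0.1, "enzyme_B": 10.0, "enzyme_C": 50.0}, "enzyme_A"))
```

Under these assumptions the apparent IC50 stays close to Ki while labeling is far from completion and drifts upward as the probe reaction approaches completion, which is one way to see why the screen is run under kinetically controlled conditions.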
Inhibitor screening by ABPP has also facilitated the design of selective
covalent agents for several proteases, including papain-directed ABPP probes
that have been used as in vivo imaging tools for characterizing cathepsin
protease activity in mouse models of human multistage tumorigenesis [50].
This study culminated in the detection of a pronounced upregulation of
cathepsin activity associated with the angiogenic vasculature and invasive
fronts of pancreatic and uterine cervical carcinomas, distinguishing the
activities derived from the differential expressions in immune, endothelial,
and cancer cells. Consistent with these findings, pharmacological inhibition
of protease activity with a broad-spectrum cathepsin inhibitor at defined
stages of tumorigenesis resulted in the impairment of angiogenic switching in
progenitor lesions, as well as tumor growth, tumor vascularity, and invasion
in the pancreatic model.
7.3.4.1.3 ABPP Strategies for the in vivo Analysis of Enzyme Activities

The in vivo imaging studies carried out with cysteine protease-directed
probes [50] underscored the need for a generally applicable methodology
for in vivo analysis of enzyme activities. Indeed, as exemplified by many
protease families, most enzymes are subject to multiple mechanisms for
tightly regulating their activity within the cell, including spatial and temporal
expression, binding to small-molecule or protein cofactors, and posttranslational
modification. Furthermore, since the physical disruption of cells and
tissues may alter the concentrations of endogenous activators/inactivators of
enzymes, as well as their respective subcellular distributions, in vitro proteomic
preparations can only, at best, approximate the dynamic functional state of
proteins within the physiologically relevant environment of the living cell or
organism.
A general method for performing ABPP in vivo required that this strategy
be transformed into a “tag-free” method, as most reporter groups (e.g., biotin
and fluorophores) impair the cell permeability and distribution of probes. To
address this issue, bio-orthogonal chemical reactions were sought to enable
ligation of reporter tags onto proteins after covalent labeling by ABPP probes.
In one example, conjugation of the reporter group to the probe following
proteome labeling was accomplished by engineering into these reagents a pair
of biologically inert coupling partners, the alkyne and azide, which can react
to form a stable triazole product via the Huisgen 1,3-dipolar cycloaddition
reaction [51, 52]. The key to the success of this strategy was the recent
description by Sharpless and colleagues of a Cu(I)-catalyzed, stepwise version
of the azide-alkyne cycloaddition reaction, which can be carried out under
mild conditions to produce high yields of product in short reaction times (“click
chemistry” [53]). Click chemistry-based ABPP has been applied to living cells
and organisms, leading to the discovery of enzymes that are selectively labeled
in vivo but not in vitro [52]. A second bio-orthogonal reaction, the Staudinger
ligation, has also been applied to profile proteasomal subunits labeled in
situ with azide-modified probes [37]. Collectively, these studies emphasize
the importance of performing ABPP in vivo and underscore the value of
bio-orthogonal chemical reactions to achieve this goal.
7.3.4.2 Expanding the Scope of ABPP
7.3.4.2.1 Activity-based Probes for the Proteomic Profiling of Metalloproteases

So far we have described the development of ABPP probes derived from
a combination of two complementary approaches, namely directed and
nondirected ABPP, where covalent modification of enzyme active sites was
achieved by electrophilic labeling of complementary nucleophilic residues.
What about enzyme families that do not utilize an enzyme-bound nucleophile
for catalysis? The metalloprotease family of enzymes, for instance, plays
key roles in many physiological and pathological processes, including tissue
remodeling, peptide hormone signaling, and cancer, and is also regulated
by myriad posttranslational events [54], thus making it an attractive target
for ABPP. However, unlike previous enzyme families targeted by ABPP,
metalloproteases (MPs) do not use a protein-bound nucleophile, but rather a
zinc-activated water molecule.
To address this important challenge, a novel approach to ABPP probe
design was undertaken, in which a zinc-chelating group (hydroxamate) and
a photocrosslinking group (benzophenone) were incorporated to promote
selective binding and modification of MP active sites, respectively [33, 34] (see
Table 7.3-1 for probe structure). Some of these hydroxamate-benzophenone
(HxBP) probes were shown to serve as bona fide activity-based probes for
several matrix metalloproteases (MMPs), including MMP-2, MMP-7, and
MMP-9, labeling the active forms of these proteases but not their zymogen or
inhibitor-bound variants [33]. Interestingly, competitive profiling experiments
carried out with HxBP probes uncovered several MPs in tissue proteomes that
constituted “off-target” sites of action for the MMP-directed inhibitor GM6001.
Notably, none of these enzymes shared any sequence homology with MMPs,
indicating that GM6001 (a compound currently in clinical trials) inhibits
several MPs outside its intended target family (MMPs) and, more generally,
that these off-target sites may be partially responsible for the repeated failure
of MMP inhibitors developed for clinical use. These findings also emphasize
that enzymes can share considerable active site homology without showing
sequence relatedness and underscore the value of ABPP for the discovery
of such unanticipated sites of action for inhibitors and drugs.
7.3.4.2.2 Class Assignment of Sequence-unrelated Members of Enzyme Superfamilies
As a corollary to the notion that enzyme superfamilies comprise members
that share a common catalytic mechanism, but not necessarily sequence or
structural homology, recent studies have shown that directed ABPP probes,
which typically target a large set of mechanistically related enzymes (e.g.,
SHs, metalloproteases), can also facilitate the identification of unannotated
members of enzyme superfamilies [55, 56].
Typically, probe-labeled activities identified by ABPP can be readily assigned
to a superfamily on the basis of database (BLAST) searches, which identify
conserved sequence elements shared by members of a particular enzyme class.
For instance, in the analysis of the human cancer cell lines described earlier,
numerous FP-labeled protease, lipase, and esterase activities were identified
in this manner. However, one FP target identified in this study, sialic acid
9-O-acetylesterase (SAE), which was selectively expressed in melanoma cell
lines, shared no sequence homology with SHs or any other known enzyme
class. Thus, to determine whether SAE was, in fact, a member of the SH
superfamily, experiments were carried out to determine the site of FP probe
labeling, a site that was identified as a serine residue that is completely
conserved among all SAE family members [55]. Mutagenesis of this residue
to alanine produced an SAE variant that exhibited negligible FP-labeling and
enzyme activity, indicating that SAE and its sequence homologs constitute a
novel branch of the SH superfamily. More generally, these findings suggest
that ABPP can uncover cryptic members of enzyme classes that have eluded
classification based on sequence comparisons, an important discovery given
the large numbers of unannotated proteins that have come out of recent
eukaryotic and prokaryotic genome sequencing projects; “orphan” or
cryptic members of many enzyme classes likely still exist in these proteomes.
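The conservation check described above (a probe-labeled serine that is invariant across all family members) can be illustrated with a short Python sketch. It assumes the homolog sequences have already been aligned to equal length; the function name and the toy sequences are hypothetical and are not real SAE sequences.

```python
def conserved_columns(aligned, residue="S"):
    """Return 0-based alignment columns where every sequence carries `residue`."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i in range(length) if all(seq[i] == residue for seq in aligned)]


if __name__ == "__main__":
    # Made-up aligned fragments ('-' marks a gap), for illustration only.
    toy_alignment = [
        "MKGDSAGTL-SV",
        "MRGDSSGAIASV",
        "LKGDSAGSL-TV",
    ]
    candidates = conserved_columns(toy_alignment)
    print("fully conserved serine columns:", candidates)
```

A column returned by this check is only a candidate. In the study described above, the assignment rested on mapping the actual site of probe labeling and on the loss of labeling and activity in the serine-to-alanine mutant.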
7.3.5
Future Development
The discipline of chemistry is perhaps uniquely suited to provide powerful
new tools and methods for the functional analysis of the proteome. As has
been highlighted in this chapter, chemical approaches for ABPP have, over the
past few years, enjoyed an intense phase of technical innovation, during which
these strategies have advanced our understanding of the role that enzymes
play in complex physiological and pathological processes. Looking forward,
researchers interested in broadening the scope and impact of ABPP are faced
with several conceptual and experimental challenges. First, active site-directed
chemical probes, which constitute the fundamental currency of ABPP, have, to
date, only been developed for a modest portion of the proteome. The successful
generation of proteomic-compatible profiling reagents for additional enzyme
(and protein) classes will likely require the synthesis of more structurally
diverse libraries of candidate probes, which may be either directed (e.g.,
possess reactive and/or BGs that bias probe affinity for certain enzyme classes)
or nondirected in nature. Enticing forays have already been made into
“high-priority” enzyme families, like kinases [40, 41] and phosphatases [38, 39],
suggesting that most, if not all, enzyme classes should be amenable to active
site profiling in whole proteomes.
In the development of new active site-directed proteomic probes, it is also
important to consider the fidelity with which these reagents will report on
changes in protein activity. For certain probes, like the FPs, which react
with conserved catalytic residues in the active sites of their enzyme targets,
probe labeling has been shown to provide an excellent readout of catalytic
activity. However, it is likely that other probes may be discovered that
modify enzyme active sites on noncatalytic residues, akin to the manner
in which microcystin labels a noncatalytic cysteine residue in serine/threonine
phosphatases [57]. Although such active site-directed labeling events would
not be considered purely activity-based in a mechanistic sense, from a more
biological perspective, if, as is commonly the case, enzyme activity is regulated
in vivo by steric blockade of the active site (by autoinhibitory domains or
protein/small molecule-binding partners, for example) [3], then any probe
that is sensitive to these molecular interactions should effectively report on the
functional state of enzymes in complex proteomes. More generally, these issues
highlight the importance of understanding the molecular basis for individual
probe-enzyme reactions, especially those originating from nondirected ABPP
efforts, where the parameters that dictate probe binding/labeling are not always
obvious.
Finally, as the proteome coverage of ABPP continues to grow, it is
becoming clear that this strategy would benefit from improved methods for
the qualitative and quantitative analysis of probe-labeled samples. Currently,
most probe-labeled proteomes are analyzed by 1DE or 2DE, which exhibit
limited resolving power, especially for large protein families with members
of similar molecular mass. Future efforts to merge ABPP with gel-free (e.g.,
LC-MS [27], capillary electrophoresis [28]) proteomic platforms may provide a
complementary strategy for resolving large numbers of probe-labeled enzyme
activities. The enhanced resolution offered by gel-free methods may permit the
multiplexing of ABPP probes, such that proteomes of limited quantity could
be analyzed simultaneously with a collection of probes. Adapting ABPP for
direct LC-MS analysis should also permit comparative quantitation of probe-
labeled proteomes by isotope-coded mass tagging [11]. Still, it is important to
emphasize that, although such LC-MS platforms will surely exhibit superior
resolving power compared to 1DE gel-based methods for analyzing probe-
labeled proteomes, the 1DE approach does possess the advantage of exhibiting
much higher throughput (i.e., dozens of proteomes can be compared on
a single gel). Thus, the choice of whether to employ gel-based or gel-free
strategies (or both) for the analysis of ABPP experiments will likely depend on
the scientific problem under examination, with the former strategy being more
suitable for the rapid comparison of large numbers of proteomes and the latter
approach being superior for the in-depth analysis of a restricted set of samples.
In either case, continued efforts to advance both the chemical and technical
components of ABPP should foster the development of an increasingly robust
and sensitive platform for the functional analysis of both the proteome and its
individual constituents.
7.3.6
Conclusions
The field of proteomics aims to develop new tools and methods for the
functional characterization of proteins on a global scale. The daunting size and
diversity of eukaryotic proteomes, however, have inspired efforts to approach
this goal by developing technologies that address the proteome as tractable
functional units, that is, the profiling of the activity state of specific enzyme classes.
In this chapter, we have attempted to illustrate how ABPP offers a powerful
strategy to directly access higher order biological information to assist in
elucidating the function of proteins in complex cell and organismal systems.
Ultimately, the general and systematic application of ABPP will likely require
the advent of integrated platforms for the design, synthesis, and analysis of
chemical probes that target a large diversity of enzyme classes. However,
as outlined here, the success of ABPP studies carried out thus far suggests
that this goal may in fact be attainable. This is highlighted by the impressive
number of enzyme classes for which activity-based probes have already been
developed as a result of both directed and nondirected approaches, as well as
the insights that have been gained by applying ABPP to complex biological
systems, ranging from cancer cells and tumors to invasive malarial parasites
to mouse models of obesity.
More broadly, this chapter has attempted to emphasize the potential of ABPP
to identify new diagnostic markers and therapeutic targets for human disease.
Through the integration of the comparative and competitive profiling platforms
that have been described here, ABPP provides a powerful new avenue for
the parallel discovery of disease-associated enzymes (target discovery) and
chemical inhibitors thereof (inhibitor discovery), thus complementing the
studies being carried out within other realms of chemical biology, as well
as providing valuable tools and insight that can be beneficial across multiple
disciplines, extending from the lab to the clinic. Indeed, it has been recently
stated that chemical biology, as a whole, has as one of its grand challenges the
charge of identifying small-molecule modulators for each individual function
of all human proteins [58], which would address the large gap that currently
exists between basic and clinical research. We anticipate that ABPP will play
an important role in achieving this goal.
Acknowledgments
The authors would like to acknowledge the support of the National Institutes of
Health [CA087660 (B.F.C.)], the California Breast Cancer Research Foundation
(N.J. and B.F.C.), and the Skaggs Institute for Chemical Biology.
References
1. P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 1999, 21, 33.
2. S.D. Patterson, R. Aebersold, Proteomics: the first decade and beyond, Nat. Genet. 2003, 33, 311.
3. B. Kobe, B.E. Kemp, Active site-directed protein regulation, Nature 1999, 402, 373.
4. Y. Liu, M.P. Patricelli, B.F. Cravatt, Activity-based protein profiling: the serine hydrolases, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 14694.
5. N. Jessani, B.F. Cravatt, The development and application of methods for activity-based protein profiling, Curr. Opin. Chem. Biol. 2004, 8, 54.
6. L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, S.H. Friend, Gene expression profiling predicts clinical outcome of breast cancer, Nature 2002, 415, 530.
7. R.A. Heller, M. Schena, A. Chai, D. Shalon, T. Bedilion, J. Gilmore, D.E. Woolley, R.W. Davis, Discovery and analysis of inflammatory disease-related genes using cDNA microarrays, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 2150.
8. T. Kodadek, Protein microarrays: prospects and problems, Chem. Biol. 2001, 8, 105.
9. W.F. Patton, B. Schulenberg, T.H. Steinberg, Two-dimensional electrophoresis: better than a poke in the ICAT? Curr. Opin. Biotechnol. 2002, 13, 321.
10. V. Santoni, M. Molloy, T. Rabilloud, Membrane proteins and proteomics: un amour impossible? Electrophoresis 2000, 21, 1054.
11. S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol. 1999, 17, 994.
12. M.P. Washburn, D. Wolters, J.R. Yates III, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol. 2001, 19, 242.
13. D.K. Han, J. Eng, H. Zhou, R. Aebersold, Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry, Nat. Biotechnol. 2001, 19, 946.
14. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki, A comprehensive two-hybrid analysis to explore the yeast protein interactome, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 4569.
15. A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, G. Superti-Furga, Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature 2002, 415, 141.
16. Y. Ho, A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, M. Tyers, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, Nature 2002, 415, 180.
17. G. MacBeath, S. Schreiber, Printing proteins as microarrays for high-throughput function determination, Science 2000, 289, 1760.
18. H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone, N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, T. Mitchell, P. Miller, R.A. Dean, M. Gerstein, M. Snyder, Global analysis of protein activities using proteome chips, Science 2001, 293, 2101.
19. D. Kidd, Y. Liu, B.F. Cravatt, Profiling serine hydrolase activities in complex proteomes, Biochemistry 2001, 40, 4005.
20. N. Jessani, Y. Liu, M. Humphrey, B.F. Cravatt, Enzyme activity profiles of the secreted and membrane proteome that depict cancer invasiveness, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 10335.
21. Y.A. DeClerck, S. Imren, A.M.P. Montgomery, B.M. Mueller, R.A. Reisfeld, W.E. Laug, Proteases and protease inhibitors in tumor progression, Adv. Exp. Med. Biol. 1997, 425, 239.
22. M. Huse, J. Kuriyan, The conformational plasticity of protein kinases, Cell 2002, 109, 275.
23. H. Shirato, H. Shima, G. Sakashita, T. Nakano, M. Ito, E.Y. Lee, K. Kikuchi, Identification and characterization of a novel protein inhibitor of type 1 protein phosphatase, Biochemistry 2000, 39, 13848.
24. G.L. Corthals, V.C. Wasinger, D.F. Hochstrasser, J.C. Sanchez, The dynamic range of protein expression: a challenge for proteomic research, Electrophoresis 2000, 21, 1104.
25. D. Greenbaum, A. Baruch, L. Hayrapetian, Z. Darula, A. Burlingame, K.F. Medzihradszky, M. Bogyo, Chemical approaches for functionally probing the proteome, Mol. Cell. Proteomics 2002, 1, 60.
26. S.A. Sieber, T.S. Mondala, S.R. Head, B.F. Cravatt, Microarray platform for profiling enzyme activities in complex proteomes, J. Am. Chem. Soc. 2004, 126, 15640.
27. G.C. Adam, J.J. Burbaum, J.W. Kozarich, M.P. Patricelli, B.F. Cravatt, Mapping enzyme active sites in complex proteomes, J. Am. Chem. Soc. 2004, 126, 1363.
28. E.S. Okerberg, J. Wu, B. Zhang, B. Samii, K. Blackford, D.T. Winn, K.R. Shreder, J.J. Burbaum, M.P. Patricelli, High-resolution functional proteomics by active-site peptide profiling, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4996.
29. D. Greenbaum, K.F. Medzihradszky, A. Burlingame, M. Bogyo, Epoxide electrophiles as activity-dependent cysteine protease profiling and discovery tools, Chem. Biol. 2000, 7, 569.
30. L. Faleiro, R. Kobayashi, H. Fearnhead, Y. Lazebnik, Multiple species of CPP32 and Mch2 are the major active caspases present in apoptotic cells, EMBO J. 1997, 16, 2271.
31. A. Borodovsky, H. Ovaa, N. Kolli, T. Gan-Erdene, K.D. Wilkinson, H.L. Ploegh, B.M. Kessler, Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family, Chem. Biol. 2002, 9, 1149.
32. D. Kato, K.M. Boatright, A.B. Berger, T. Nazif, G. Blum, C. Ryan, K. Chehade, G.S. Salvensen, M. Bogyo, Activity-based probes that target diverse cysteine protease families, Nat. Chem. Biol. 2005, 1, 33.
33. A. Saghatelian, N. Jessani, A. Joseph, M. Humphrey, B.F. Cravatt, Activity-based probes for the proteomic profiling of metalloproteases, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 10000.
34. E.W. Chan, S. Chattopadhaya, R.C. Panicker, X. Huang, S.Q. Yao, Developing photoactive affinity probes for proteomic profiling: hydroxamate-based probes for metalloproteases, J. Am. Chem. Soc. 2004, 126, 14435.
35. Y.M. Li, M. Xu, M.T. Lai, Q. Huang, J.L. Castro, J. DiMuzio-Mower, T. Harrison, C. Lellis, A. Nadin, J.G. Neduvelil, R.B. Register, M.K. Sardana, M.S. Shearman, A.L. Smith, X.P. Shi, K.C. Yin, J.A. Shafer, S.J. Gardell, Photoactivated gamma-secretase inhibitors directed to the active site covalently label presenilin 1, Nature 2000, 405, 689.
36. M. Groll, T. Nazif, R. Huber, M. Bogyo, Probing structural determinants distal to the site of hydrolysis that control substrate specificity of the 20S proteasome, Chem. Biol. 2002, 9, 655.
37. H. Ovaa, P.F. Van Swieten, B.M. Kessler, M.A. Leeuwenburgh, E. Fiebiger, A.M. Van Den Nieuwendijk, P.J. Galardy, G.A. Van Der Marel, H.L. Ploegh, H.S. Overkleeft, Chemistry in living cells: detection of active proteasomes by a two-step labeling strategy, Angew. Chem., Int. Ed. Engl. 2003, 42, 3626.
38. S. Kumar, B. Zhou, F. Liang, W.Q. Wang, Z. Huang, Z.Y. Zhang, Activity-based probes for protein tyrosine phosphatases, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 7943.
39. K.R. Shreder, Y. Liu, T. Nomanhboy, S.R. Fuller, M.S. Wong, W.Z. Gai, J. Wu, P.S. Leventhal, J.R. Lill, S. Corral, Design and synthesis of AX7574: a microcystin-derived, fluorescent probe for serine/threonine phosphatases, Bioconjugate Chem. 2004, 15, 790.
40. Y. Liu, K.R. Shreder, W. Gai, S. Corral, D.K. Ferris, J.S. Rosenblum, Wortmannin, a widely used phosphoinositide 3-kinase inhibitor, also potently inhibits mammalian polo-like kinase, Chem. Biol. 2005, 280, 99.
41. M.C. Yee, S.C. Fas, M.M. Stohlmeyer, T.J. Wandless, K.A. Cimprich, A cell-permeable activity-based probe for protein and lipid kinases, J. Biol. Chem. 2005, 280(32), 29053-9.
42. D.J. Vocadlo, C.R. Bertozzi, A strategy for functional proteomic analysis of glycosidase activity from cell lysates, Angew. Chem., Int. Ed. Engl. 2004, 43, 5338.
43. G.C. Adam, B.F. Cravatt, E.J. Sorensen, Profiling the specific reactivity of the proteome with non-directed activity-based probes, Chem. Biol. 2001, 8, 81.
44. G.C. Adam, E.J. Sorensen, B.F. Cravatt, Proteomic profiling of mechanistically distinct enzyme classes using a common chemotype, Nat. Biotechnol. 2002, 20, 805.
45. K.T. Barglow, B.F. Cravatt, Discovering disease-associated enzymes by proteome reactivity profiling, Chem. Biol. 2004, 11, 1523.
46. N. Jessani, M. Humphrey, W.H. McDonald, S. Niessen, K. Masuda, B. Gangadharan, J.R. Yates III, B.M. Mueller, B.F. Cravatt, Carcinoma and stromal enzyme activity profiles associated with breast tumor growth in vivo, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 13756.
47. D.C. Greenbaum, A. Baruch, M. Grainger, Z. Bozdech, K.F. Medzihradszky, J. Engel, J. DeRisi, A.A. Holder, M. Bogyo, A role for the protease falcipain 1 in host cell invasion by the human malaria parasite, Science 2002, 298, 2002.
48. D.C. Greenbaum, W.D. Arnold, F. Lu, L. Hayrapetian, A. Baruch, J. Krumrine, S. Toba, K. Chehade, D. Bromme, I.D. Kuntz, M. Bogyo, Small molecule affinity fingerprinting. A tool for enzyme family subclassification, target identification, and inhibitor design, Chem. Biol. 2002, 9, 1085.
49. D. Leung, C. Hardouin, D.L. Boger, B.F. Cravatt, Discovering potent and selective reversible inhibitors of enzymes in complex proteomes, Nat. Biotechnol. 2003, 21, 687.
50. J.A. Joyce, A. Baruch, K. Chehade, N. Meyer-Morse, E. Giraudo, F.Y. Tsai, D.C. Greenbaum, J.H. Hager, M. Bogyo, D. Hanahan, Cathepsin cysteine proteases are effectors of invasive growth and angiogenesis during multistage tumorigenesis, Cancer Cell 2004, 5, 443.
51. A.E. Speers, G.C. Adam, B.F. Cravatt, Activity-based protein profiling in vivo using a copper(I)-catalyzed azide-alkyne [3 + 2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 4686.
52. A.E. Speers, B.F. Cravatt, Profiling enzyme activities in vivo using click chemistry methods, Chem. Biol. 2004, 11, 535.
53. H.C. Kolb, K.B. Sharpless, The growing impact of click chemistry on drug discovery, Drug Discov. Today 2003, 8, 1128.
54. C. Chang, Z. Werb, The many faces of metalloproteases: cell growth, invasion, angiogenesis, and metastasis, Trends Cell Biol. 2001, 11, S37.
55. N. Jessani, J.A. Young, S.L. Diaz, M.P. Patricelli, A. Varki, B.F. Cravatt, Class assignment of sequence-unrelated members of enzyme superfamilies by activity-based protein profiling, Angew. Chem., Int. Ed. Engl. 2005, 44, 2400.
56. S.M. Baxter, J.S. Rosenblum, S. Knutson, M.R. Nelson, J.S. Montimurro, J.A. Di Gennaro, J.A. Speir, J.J. Burbaum, J.S. Fetrow, Synergistic computational and experimental proteomics approaches for more accurate detection of active serine hydrolases in yeast, Mol. Cell. Proteomics 2004, 3, 209.
57. M. Runnegar, N. Berndt, S.M. Kong, E.Y. Lee, L. Zhang, In vivo and in vitro binding of microcystin to protein phosphatases 1 and 2A, Biochem. Biophys. Res. Commun. 1995, 216, 162.
58. S.L. Schreiber, Small molecules: the missing link in the central dogma, Nat. Chem. Biol. 2005, 1, 64.
8
Tags and Probes for Chemical Biology
8.1
The Biarsenical-tetracysteine Protein Tag: Chemistry and Biological Applications
Stephen R. Adams
Outlook
The biarsenical-tetracysteine method was first introduced more than 7 years
ago, and further refinements and development of novel applications are
still appearing. Within the last few years, biologists have started to exploit
the unique features of this system for probing protein trafficking, turnover,
localization, and dynamics. This review aims to describe the conception and
development of this protein tag and its applications in the biological sciences.
8.1.1
Introduction
The ability to label proteins with green fluorescent protein (GFP) in living cells
has been a major research advance in cell biology in the last decade [1]. In
response to this success, chemical biologists have devised an ever-increasing
variety of alternative methods to provide a wider range of fluorescent colors and
other useful functionalities than those available from GFP and its variants. One
of the key features of GFP is that it can be genetically encoded; that is, the DNA
of the GFP gene can be fused to the DNA of any desired protein by standard
molecular biology techniques and then the chimeric protein can be expressed
in cells, tissues, or transgenic animals [2]. All the chemical biological methods
incorporate this major stratagem but differ from GFP in that the genetically
encoded peptide or protein sequence does not become autofluorescent (like
GFP) but acts as a specific receptor for derivatives of fluorophores that can
be added exogenously to the expressing cells. The size and structure of this
receptor can be quite varied, from proteins or enzymes the size of GFP (~240
amino acids) such as O6-alkylguanine-DNA alkyltransferase (AGT) [3-5] and
single-chain antibodies [6], to small peptide epitopes of 6-20 amino
acids [7-9] (Fig. 8.1-1). Binding of the fluorophore derivative with the receptor
can be through covalent or ionic bonds or through noncovalent interactions,
and may be reversible or irreversible.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
ISBN: 978-3-527-31150-7
This review will discuss a method that uses a genetically encoded peptide
epitope: a tetracysteine-containing sequence that forms a high-affinity yet
reversible covalent complex with biarsenical fluorophores [7, 8, 10]. This
was one of the first chemical biological methods for tagging proteins to
be introduced and has been particularly useful in applications for which
GFP is (so far) less suited, such as protein turnover [11, 12], correlated
fluorescence and electron microscopy [11], and chromophore-assisted light
inactivation (CALI) [13, 14]. It has also been shown to have advantages over the
conventional chemical labeling of proteins in vitro, as an affinity-purification
handle [8, 15], and as a fluorescence anisotropy probe of protein dynamics
[8, 16, 17]. New examples of applications of this method, in progress or
recently published, include targeting fluorescent calcium sensors to channels
inside living cells and replacement for cyan fluorescent protein (CFP) in
fluorescence resonance energy transfer (FRET) sensors of G-protein coupled
receptor (GPCR) activation [18].
Fig. 8.1-1 A comparison of the relative sizes of GFP and the
biarsenical-tetracysteine complex. The atoms comprising the
chromophores are shown in color with the peptide backbone
depicted in green.
8.1.2
History and Design Concepts of the Tetracysteine-biarsenical System
Forming a high-affinity interaction with a peptide as short as 6-20 amino acids
generally requires covalent bonds (a notable exception is the fluorettes for Texas
Red; Ref. 19). The thiolate of cysteine is one of the most reactive chemical
groups in proteins and its comparative rarity in intracellular proteins offers
some hope of specificity. Well-known reactants of protein cysteines include
arsenite ion and phenylarsenoxides, both of which contain the arsenic(III)
atom. Importantly, these form only weak complexes (about millimolar affinity)
with single cysteines (such as those in glutathione, which is present at 5-10 mM
in cytoplasm) but bind with micromolar affinity to closely spaced pairs of
cysteines. The reaction of such vicinal thiols in cells with arsenic is well
described, as is their regeneration by small dithiols such as 1,2-ethanedithiol
(EDT), which form more stable, five-membered ring chelates with the arsenic
(Scheme 8.1-1).
Scheme 8.1-1 The regeneration of protein lipoate cofactors and enzyme thiols bound to
arsenic by reaction with small dithiols (X = H: 1,2-ethanedithiol, EDT;
X = CH2OH: British Anti-Lewisite, BAL).
Fig. 8.1-2 Fluorescent enhancement of FlAsH-EDT2 on binding a tetracysteine peptide.
The concept was to design a high-affinity ligand containing two arsenic
groups (a biarsenical) that binds four appropriately spaced cysteines (a
tetracysteine), forming a complex that would be stable to such dithiol antidotes.
The antidotes could thereby be used to prevent binding of the ligand to
endogenous vicinal cysteines or thiols, which would otherwise lead to
additional nonspecific or background labeling and toxicity.
The first such molecule was 4’,5’-bis(dithioarsolanyl)fluorescein (FlAsH), which
binds with picomolar affinity to peptides or proteins containing appropriately
spaced tetracysteines with the general sequence Cys-Cys-Xaa-Xaa-Cys-Cys, in
which Xaa is an amino acid other than cysteine [7]. Such tetracysteine motifs are
very rare in naturally occurring proteins, so only the tagged protein is labeled
with FlAsH. When FlAsH is bound to two moles of EDT, forming FlAsH-EDT2,
its fluorescence is almost completely quenched; but on reaction with
a tetracysteine peptide a strongly fluorescent complex is formed (Fig. 8.1-2).
This feature is particularly useful when labeling cells expressing
tetracysteine-tagged proteins, as unbound dye does not have to be fully removed by
washing to generate contrast, unlike most alternative labeling methods. Even
so, nonspecific binding of FlAsH to thiols and hydrophobic sites can generate
some background signal that limits the sensitivity of this method compared to
GFP [8, 10, 20].
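The general sequence Cys-Cys-Xaa-Xaa-Cys-Cys can be written as a simple pattern over one-letter amino acid codes. The following Python sketch is only an illustration of how one might check a protein sequence for such sites; the function name and the choice of example peptides (taken from this chapter, with the terminal acetyl and amide caps omitted) are assumptions made here, not part of the published method.

```python
import re

# Cys-Cys-Xaa-Xaa-Cys-Cys, where Xaa is any residue other than cysteine.
# The lookahead lets overlapping sites be reported as well.
TETRACYSTEINE = re.compile(r"(?=(CC[^C]{2}CC))")


def find_tetracysteine_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start position, matched motif) for each CCXXCC site."""
    return [(m.start(), m.group(1)) for m in TETRACYSTEINE.finditer(sequence.upper())]


# Peptides quoted in this chapter, written without the Ac- and -NH2 caps.
peptides = {
    "original helical design": "WEAAAREACCRECCARA",
    "Pro-Gly spacer": "WDCCPGCCK",
    "three-residue spacer": "WDCCDEACCK",
}

for name, seq in peptides.items():
    print(name, find_tetracysteine_motifs(seq))
```

The first two peptides each contain one site, whereas the peptide with three intervening residues does not match the two-residue pattern. A scan of this kind only finds sequence matches; it says nothing about whether the cysteines are reduced or accessible in the folded protein.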
8.1.3
8.1.3.1 The Chemistry of Biarsenicals

FlAsH-EDT2 is synthesized by the palladium acetate-catalyzed transmetallation
of fluorescein 4’,5’-bis-mercuric acetate (or trifluoroacetate) by arsenic
trichloride in polar aprotic solvents such as N-methylpyrrolidinone [10]. The
resulting unstable dichlorophenylarsine intermediate is not isolated; instead, EDT is
added to generate FlAsH-EDT2, which can be purified in modest overall yield
by chromatography on silica gel (Scheme 8.1-2). FlAsH-EDT2 can be hydrophobic
like its parent fluorescein (e.g., soluble in toluene) or hydrophilic (soluble
in aqueous neutral buffer) because of a reversible lactone-quinone tautomerization
(Scheme 8.1-2). FlAsH is therefore permeable across cell membranes
but can still generate a sufficiently high concentration in the cytoplasm to give
a rapid reaction with a tetracysteine-tagged protein.
Scheme 8.1-2 The synthesis of FlAsH-EDT2 and FlAsHO. (Structures not reproduced.
Species shown: fluorescein; fluorescein 4',5'-bis-mercuric trifluoroacetate;
FlAsH-EDT2 free acid, lactone tautomer, colorless; FlAsH-EDT2 dianion, quinone
tautomer, colored, non-fluorescent; FlAsHO dianion, quinone tautomer, colored,
weakly fluorescent. Reagents shown: HgO, TFA; AsCl3, Pd(OAc)2; 2 EDT;
1. Hg2+, 2. -2 H+.)
Biarsenicals sharing the dihydroxyxanthene skeleton of FlAsH (Scheme 8.1-3)
can be synthesized analogously (Scheme 8.1-2). Mercuration of
the parent dye usually occurs quite cleanly using mercuric trifluoroacetate in
trifluoroacetic acid as a solvent; using mercuric acetate-acetic acid can lead
to a mixture of substituted products that are difficult to separate. ReAsH,
the corresponding derivative of the red-fluorescent dye resorufin [8], is the
most important biarsenical besides FlAsH, as it can act as a
photosensitizer in addition to a fluorophore [8]. A blue fluorescent biarsenical
[8], CHoXAsH-EDT2, completes the range of colors available, although it is
more prone to photobleaching than FlAsH or ReAsH. Biarsenicals substituted
with halogens, carboxylic acids, amines, sulfonic acids, and so on can be
synthesized and are useful in adding other functionalities or reactivities [8].
For example, carboxy- or amino-FlAsH can be used to attach the biarsenical to
a solid support for affinity chromatography of tetracysteine-tagged proteins [8,
15], or can serve as intermediates in the synthesis of more complex biarsenicals
such as environmentally sensitive fluorophores [21] and calcium indicators
(unpublished [22]). The sulfo derivative renders the biarsenical membrane
impermeable, allowing the labeling of extracellular or membrane proteins with
no intracellular staining [8]. Adding halogens such as the chloro substituents
in CHoXAsH-EDT2 decreases the pH sensitivity of the dye in the physiological
range, whereas adding bromine substituents in FlAsH or ReAsH increases the
photosensitizing properties via the heavy atom effect. Replacing the oxygen
bridge of the xanthone with sulfur has a similar effect, but almost completely
quenches the fluorescence [8].
Scheme 8.1-3 Biarsenical ligands for tetracysteine tags. (Structures not reproduced.
Ligands and annotations shown: FlAsH-EDT2, green fluorescence, FRET from CFP;
CHoXAsH-EDT2, blue fluorescence, FRET to GFP, YFP; ReAsH-EDT2, red fluorescence,
FRET from GFP, YFP, photoconversion of diaminobenzidine for electron microscopy;
BrAsH-EDT2; BArNile-EDT2, environment-sensitive fluorescence; SulfoFlAsH-EDT2,
membrane-impermeant ligand for extracellular proteins; immobilized FlAsH-EDT2,
affinity chromatography; Calcium Green FlAsH-EDT2, low-affinity fluorescent
Ca2+ indicator.)
Biarsenicals in which one or both of the phenolic groups are replaced with amino
substituents, forming rhodol or rhodamine biarsenicals, can also be synthesized.
An amino derivative of Nile Red, a naphthorhodol, can be converted by the
usual method to give an environmentally sensitive biarsenical fluorophore
(Scheme 8.1-3) [23]. Biarsenical derivatives of tetramethyl rhodamine or
rhodamine B have also been made [8]; the usual mercuration conditions
gave no reaction, but reaction of the free base in nonpolar solvents
was successful. However, although both rhodamine biarsenicals bound to
tetracysteine peptides, the complexes were neither fluorescent nor colored,
suggesting that the rhodamine is in the lactone form. This is presumably
because steric hindrance between the arsenic-dicysteine group and the
N,N-dialkyl group forces the nitrogen lone pairs out of conjugation and destabilizes
the quinone tautomer. Screening a library of tetracysteine variants failed to
find any sequences that formed fluorescent complexes with these biarsenicals,
appropriately named TrAsH and RbAsH (unpublished results). Rhodamines
lacking alkyl substituents have proven much harder to synthesize so far
and would also fluoresce in the green, like FlAsH; however, their improved
resistance to photobleaching might make them valuable as labels for
single-molecule studies.
Biarsenical derivatives of other fluorophores emitting at longer wavelengths
would also be useful, particularly those based on nonxanthene skeletons
such as cyanines. Naphthofluorescein biarsenicals have been synthesized but
become less fluorescent on binding cysteine-containing peptides (unpublished
results). Cyanines containing two pendant phenylarsenoxides attached by
flexible linkers have been reported but their properties have not yet been
described in sufficient detail [24]. The loss of rigidity between the two
arsenic atoms may lead to a greatly reduced affinity. In general, the
increased distance between the arsenic groups in biarsenicals based
on such fluorophores would probably require tetracysteine motifs with more
intervening residues between the cysteines, namely, CC(X)3-5CC, which might
result in a labeling system orthogonal to the existing biarsenical-tetracysteine
system.
8.1.3.2 The Tetracysteine Motif

The first tetracysteine peptide to be designed (AcWEAAAREACCRECCARA-NH2)
was based upon an α-helical peptide with four cysteines incorporated
such that they form a patch on one face of the helix. Amino acids such as
alanine were included to increase the proportion of helical character in such a
short peptide (17 amino acids) and salt bridge forming arginine and glutamate
pairs were included at the i, i + 4 positions. A number of biarsenicals were
synthesized in which two phenylarsenoxides were connected by linkers of
various lengths or on disubstituted ferrocenes, but they all proved unstable to
even stoichiometric amounts of dithiol antidotes. Only the complex formed
with the rigidly spaced arsenic groups of FlAsH was sufficiently stable with
the low concentrations of EDT required to prevent promiscuous binding to
other cellular thiols. A helical structure of the complex was proposed because a
model monoarsenical increased helicity in dicysteine peptides according to
circular dichroism measurements. This could not be shown to be the case with
the FlAsH-tetracysteine complex because of obscuring absorbance changes
from FlAsH at these wavelengths (Griffin, unpublished). As a test of the
proposed helical structure of the complex, we designed a peptide containing
the helix-breaking residues proline and glycine interposed between the pairs of
dicysteine residues [8]. Surprisingly, FlAsH readily bound to this peptide with
higher affinity (Table 8.1-1), forming a more fluorescent complex. Preliminary
NMR studies with a short peptide, AcWDCCAECCK-NH2, indicated through
interactions consistent with a hairpinlike structure, with the turn centered at
the residues between the cysteines (unpublished results with D. Wemmer).
A more conclusive answer must await either more complete NMR work or a
crystal structure.
Table 8.1-1 Fluorescent properties and stability of FlAsH-tetracysteine complexes

Peptide sequence            CCXCC,    Fluorescence quantum   Apparent Kd in 20 mM
                            X =       yield of complex       monothiol (pM)
AcWDCCCCK-NH2               (none)    0.14                   1800
AcWDCCACCK-NH2              A         0.6                    67
AcWDCCGCCK-NH2              G         0.55                   100
AcWDCCPCCK-NH2              P         0.6                    150
AcWEAAAREACCRECCARA-NH2     RE        0.5                    70
AcWDCCAECCK-NH2             AE        0.58                   72
AcWDCCSECCK-NH2             SE        0.58                   42
AcWDCCDECCK-NH2             DE        0.5                    41
AcWDCCPGCCK-NH2             PG        0.67                   4
AcWDCCGPCCK-NH2             GP        0.44                   72
AcWDCCDEACCK-NH2            DEA       0.23                   92 000

Confirmation of the increased affinity of FlAsH for the motif CCPGCC
came from detailed kinetic studies of the on- and off-rates for the complexes
[8]. The reaction of FlAsH-EDT2 with tetracysteines involves at least two
steps, as each EDT has to dissociate before binding the cysteines. To simplify
the reaction, we looked at the reaction of FlAsH lacking EDT in which
the arsenics are present as arsenoxides (Scheme 8.1-2). Reaction of FlAsH
bis-arsenoxide (generated by addition of two equivalents of Hg2+) with a
tetracysteine peptide can be followed by an increase in fluorescence and occurs
rapidly with rates of 10^6 M^-1 s^-1. To determine the off-rates of the complex,
we used an exchange reaction with a large excess of the red biarsenical,
ReAsH, also present as the arsenoxide. To accelerate this extremely slow
reaction and also to mimic intracellular thiol concentrations, the off-rates
were determined in the presence of varying amounts of a monothiol,
2-mercaptoethane sulfonate (MES). Even with 20 mM MES, exchange with
the AcWDCCPGCCK-NH2 complex still took several weeks for completion,
indicating affinity constants of the complex in the low picomolar range.
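The link between an on-rate of 10^6 M^-1 s^-1, an exchange taking weeks, and a picomolar affinity can be checked with a short calculation. The sketch below assumes a simple one-step binding model (Kd = k_off / k_on) and, purely for illustration, an exchange half-life of one week; the text states only "several weeks for completion", so the half-life and the function names are assumptions, not measured values.

```python
import math


def k_off_from_half_life(t_half_s: float) -> float:
    """First-order dissociation rate constant (s^-1) from a half-life in seconds."""
    return math.log(2) / t_half_s


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Dissociation constant (M) for a one-step binding model."""
    return k_off / k_on


K_ON = 1e6                  # M^-1 s^-1, on-rate quoted in the text
T_HALF = 7 * 24 * 3600      # s; ASSUMED half-life of one week (illustrative only)

k_off = k_off_from_half_life(T_HALF)
kd = kd_from_rates(K_ON, k_off)
print(f"k_off ~ {k_off:.2e} s^-1, Kd ~ {kd * 1e12:.1f} pM")
```

With these inputs the estimate comes out at roughly a picomolar, the same order of magnitude as the 4 pM listed for the CCPGCC peptide in Table 8.1-1. A longer assumed half-life would lower the estimate proportionally.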
Devising a method of measuring the complex stability allowed some testing
of the optimal spacing of the four cysteines in the tetracysteine peptide
(Table 8.1-1). Sequences with spacing of one or two amino acids turned out to
be considerably more stable than the ones with zero or three residues, with the
latter having off-rates up to 3 orders of magnitude greater. Additionally, the
fluorescent quantum yields of the complexes roughly paralleled their stability
with the CCPGCC sequence yielding a value of >0.6, comparable with that
of GFP.
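The trend with spacer length can be read directly off Table 8.1-1. As a rough illustration, the Python sketch below groups the tabulated apparent dissociation constants by the number of residues between the cysteine pairs and reports the median for each group; the values are transcribed from the table, the units are taken to be pM, and the grouping function is a hypothetical helper introduced here.

```python
from statistics import median

# (spacer between the cysteine pairs, quantum yield, apparent Kd in pM),
# transcribed from Table 8.1-1; "" means no intervening residue.
TABLE = [
    ("", 0.14, 1800), ("A", 0.6, 67), ("G", 0.55, 100), ("P", 0.6, 150),
    ("RE", 0.5, 70), ("AE", 0.58, 72), ("SE", 0.58, 42), ("DE", 0.5, 41),
    ("PG", 0.67, 4), ("GP", 0.44, 72), ("DEA", 0.23, 92000),
]


def median_kd_by_spacer_length(rows):
    """Map spacer length -> median apparent Kd (pM) over peptides of that length."""
    grouped = {}
    for spacer, _quantum_yield, kd_pm in rows:
        grouped.setdefault(len(spacer), []).append(kd_pm)
    return {length: median(values) for length, values in sorted(grouped.items())}


print(median_kd_by_spacer_length(TABLE))
```

Peptides with one or two intervening residues cluster around 50-100 pM, while the zero- and three-residue spacers are weaker by one to three orders of magnitude. The zero- and three-residue groups each contain a single peptide, so their "median" is just that one value.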
Studies with model peptides were also able to suggest the orientation of
FlAsH binding to a tetracysteine peptide [8]. Each arsenic atom can bind
to adjacent cysteines, namely, at the i, i + 1 positions, or to alternating
cysteines, i, i + 4. The reaction of the two peptides, AcWEAAARECCARA-NH2
and AcWEACARECAARA-NH2, with a fluorescein monoarsenical
gave binding kinetics and fluorescent products similar to those of FlAsH with
a tetracysteine peptide with the first peptide only, indicating FlAsH prefers
the i, i + 1 configuration. HPLC (high-performance liquid chromatography) of
the reaction products of FlAsH with different peptides indicated that a number
of isomeric products can be formed, showing that such different binding
configurations are possible. However, with all the peptides containing the
CCXXCC motif, only two products were formed, with identical fluorescent
quantum yields, suggesting conformational isomers involving the hindered
benzoic acid group. These isomers interconvert at pH 7 and are only isolatable
under the acidic conditions of the HPLC separation. Indeed, ReAsH, which has
no benzoic acid substituent, forms only a single product with these tetracysteine
peptides.
8.1.3.3 Optimizing the Tetracysteine Sequence with Peptide Libraries

Despite the picomolar affinity of FlAsH for peptides containing the CCPGCC
motif, further improvements seemed possible, as a fluorescein monoarsenical
bound to a dicysteine peptide with an affinity of 100 nM, simplistically
suggesting that an affinity of about 10 fM was theoretically possible. A higher
affinity ligand is desirable to increase the specificity of labeling with FlAsH
in living cells. Under typical labeling conditions, low concentrations (10 µM)
of the EDT antidote must be used to ensure that labeling occurs within
a reasonable time frame, as high EDT concentrations reverse the reaction
with tetracysteines. Under these less stringent conditions, nonspecific binding
of FlAsH to cellular thiols occurs, leading to some fluorescent background
staining. This staining is almost completely removable by washing with a
high (millimolar) concentration of antidotes such as British anti-Lewisite
(BAL) but this completely reverses the desirable staining to tetracysteine-
tagged proteins [8]. Optimized tetracysteine sequences that are stable in such
dithiol concentrations would greatly enhance the specificity of the biarsenical-
tetracysteine method.
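The simplistic 10 fM estimate mentioned at the start of this section amounts to assuming that the two arsenic-dicysteine interactions contribute independent, additive binding free energies, so that the dissociation constants (expressed relative to a 1 M standard state) multiply. A minimal Python sketch of that arithmetic follows; the temperature and function names are illustrative assumptions, and real effects such as strain or cooperativity in the chelate are ignored.

```python
import math

R = 8.314       # gas constant, J mol^-1 K^-1
T = 298.15      # K; assumed temperature (it cancels out of the final Kd)


def binding_free_energy(kd_molar: float) -> float:
    """Standard binding free energy (J/mol) for a Kd in mol/L, 1 M standard state."""
    return R * T * math.log(kd_molar)


def kd_from_free_energy(delta_g: float) -> float:
    """Inverse of binding_free_energy: Kd in mol/L."""
    return math.exp(delta_g / (R * T))


kd_mono = 100e-9                                # monoarsenical-dicysteine Kd, from the text
dg_mono = binding_free_energy(kd_mono)
kd_bi = kd_from_free_energy(2 * dg_mono)        # two additive interactions
print(f"Idealized biarsenical Kd: {kd_bi * 1e15:.1f} fM")
```

Doubling the free energy is the same as squaring the dissociation constant, (100 nM)^2 / 1 M, which gives about 10 fM. The observed low picomolar affinity is therefore a few hundred-fold weaker than this idealized limit.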
A library approach was required to sample enough peptides to find optimal sequences. We
used a retrovirally transduced library of tetracysteine peptides fused to GFP in
mammalian cells, which could be labeled with ReAsH and screened for high
affinity (resistance to dithiol) and enhanced fluorescence with a fluorescence
activated cell sorter (FACS) [22, 251. Advantages of this approach include the
reducing cellular environment that maintains the thiol form of the cysteines
required for reaction with biarsenicals (phage libraries containing cysteines are
usually avoided because of the formation of disulfides). Peptides are screened
in the environment in which they will be used and any peptides that are toxic
or express poorly will be selected against. Inclusion of GFP as a reference
fluorophore allows measurement of ReAsH binding through FRET while
expression levels of the peptide can be assessed by emission from GFP. This
method also selects for an optimal GFP-tetracysteine combination (i.e., no
unfavorable effect of the peptide on the folding efficiency and fluorescence of
GFP) for biological applications that use GFP fluorescence but require ReAsH
for additional functionalities such as pulse-chase labeling, photoconversion,
or CALI. Using a retroviral vector allows generation of libraries with high
complexity (>10^8 different peptide sequences) and integration into the cell’s
DNA forms stable cell lines expressing single variants with high expression
levels. Recovery of the peptide sequence can be achieved by polymerase chain
reaction (PCR). Finally, FACS permits comparatively quick sorting of cells
(about 10 million cells/day) on the basis of their fluorescence properties at
several different wavelengths (i.e., FRET or ReAsH fluorescence) with either
pooling or single-cell collection options available.
Two libraries were constructed: the first to test the feasibility of the approach
and optimize the residues between the cysteines and some flanking residues,
and the second to incorporate the intervening residues into a larger library
of flanking residues. The first retroviral library, RR1, included all amino acids
other than cysteine, was expressed as a C-terminal fusion peptide to GFP, and
was used to infect CHO cells so that on average only one sequence was expressed
per cell. Initial FACS analysis of the ReAsH-stained cells showed, as expected,
a range of GFP to ReAsH fluorescence ratios, with a few cells showing increased ReAsH
fluorescence. Those cells were collected and expanded, and then restained and
sorted at higher stringency for binding by increasing the dithiol concentration
during the washing step. After three cycles of increasing EDT washes and
sorts, single cells were sorted, and the resulting 10 clones were sequenced.
The clonal cell lines, which included several duplicates, were then assayed
on a multiplate reader or a fluorescence microscope for ReAsH binding when
titrated with EDT. The most resistant clones contained the CCPGCC motif,
with the highest affinity sequence being MPCCPGCCGC, which showed a
twofold improvement in its resistance to EDT compared to our previous best
peptide, AEAAARECCPGCCARA. The additional cysteine, when mutated to
serine, showed no change in EDT resistance, indicating that ReAsH still bound
to the invariant cysteines.
Encouraged by this confirmatory result, we used the second library, RR2, consisting
of XXXCCPGCCXXX (where X is any amino acid other than cysteine) fused
to the N-terminus of GFP, to produce 30 million transduced HEK
293 cells. This time the top 10-15% of cells in the FRET-to-ReAsH ratios were
collected by FACS, so that library diversity was not lost through overly
restrictive sorts, given the high degree of noise in such
single-cell measurements. Ten subsequent sorts at ever-increasing BAL wash
conditions enriched the population in high binders and improved fluorescence
until the cells were sorted into two pools for either high FRET or high ReAsH
fluorescence. Dithiol titration of these cells indicated that in many lines
high ReAsH fluorescence resulted merely from GFP overexpression, whereas the high
FRET fluorescence lines had the desired properties. Cloning and sequencing
some of these lines gave two peptide sequences (HRWCCPGCCKTF and
FLNCCPGCCMEP) with greatly improved BAL resistance (up to 10-fold
improvement) and increased ReAsH fluorescence (a twofold increase in
brightness). The two sequences showed little consensus for acidic or basic
residues at any position but did include a surprisingly high number of aromatic
residues including fluorescence quenchers such as tryptophan.
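The size of the RR2 sequence space follows from the library design: six randomized positions, each drawn from the 19 non-cysteine amino acids. The Python sketch below compares that number with the 30 million transduced cells quoted above. The coverage estimate assumes that every cell carries one variant drawn uniformly and independently at random, which is an idealization introduced here (codon bias and unequal transduction are ignored), and the function name is hypothetical.

```python
import math

ALLOWED_RESIDUES = 19        # 20 standard amino acids minus cysteine
RANDOMIZED_POSITIONS = 6     # XXX on each side of the fixed CCPGCC core

diversity = ALLOWED_RESIDUES ** RANDOMIZED_POSITIONS


def expected_coverage(n_clones: int, n_variants: int) -> float:
    """Expected fraction of variants present at least once, assuming
    uniform, independent sampling (Poisson approximation)."""
    return 1.0 - math.exp(-n_clones / n_variants)


transduced_cells = 30_000_000    # from the text
coverage = expected_coverage(transduced_cells, diversity)
print(f"{diversity:,} possible peptides; estimated coverage {coverage:.0%}")
```

Under these assumptions the theoretical diversity (about 47 million peptides) exceeds the number of transduced cells, so only about half of the possible sequences would be sampled. The two peptides recovered are therefore best regarded as good solutions from a partial search rather than proven optima.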
These improved properties were maintained when the peptides were fused to cellular
proteins such as actin and gave increased staining specificity compared to earlier
tetracysteine sequences. The improvements were independent of GFP, so actin
tagged with the peptides alone showed high contrast staining in the presence
of up to 1 mM BAL. (Dependence on GFP was a distinct possibility because GFP
was coselected in the screen; in fact, a peptide that precipitated GFP when
bound to ReAsH, and so mimicked high affinity, was one of the final sorted
clones.) Comparison of the staining sensitivity of these new peptides with GFP
showed that their sensitivity was about 15-fold lower, with detection limits in
the low micromolar range for diffuse cytoplasmic staining; this sensitivity is
about 10-fold higher than with our previous best peptide.
8.1.3.4 Toxicity and Antidotes

Toxicity is often an initial concern for potential users of the
biarsenical-tetracysteine system for labeling living cells. However, the typical staining
conditions used for FlAsH and ReAsH (50 nM-2.5 µM biarsenical and 10 µM
EDT) with incubations of a few hours have shown no general effects on
cell viability or signaling pathways and no mitochondrial toxicity (mitochondria
are the site of several proteins containing lipoic acid cofactors sensitive to
arsenic; see Scheme 8.1-1) [7]. Inclusion of the usual 10 µM EDT during staining
does not even appear necessary according to a recent labeling protocol [26]. Cells can be
stained overnight (usually with lower concentrations of dye) with no apparent
negative effects. Cells stained with FlAsH or ReAsH have been followed
through cell division by fluorescence microscopy (unpublished results), and
have been clonally expanded following single-cell sorting, suggesting little
if any long-term effects from labeling [25]. Sensitive primary cells such as
cultured neurons tolerate staining with biarsenicals [12], in addition to more
robust tissue culture cell lines. Reports of staining of transgenic flies do
not mention any adverse effects, although the desired labeling was ultimately
unsuccessful as FlAsH was preferentially absorbed by fat bodies, presumably
because of its high lipophilicity [13].
A number of dithiol arsenic antidotes are useful with the
biarsenical-tetracysteine method. EDT is the simplest and most hydrophobic, and has the
weakest affinity for biarsenicals, which makes it the most suitable antidote
for use during labeling. Solutions of FlAsH-EDT2 can lose the bound EDT
through a process of hydrolysis followed by irreversible oxidation, such that
labeling cells with a 10-fold excess of EDT usually improves the staining. FlAsH
lacking one or both bound EDT is less membrane permeable but readily reacts
with free EDT at physiological pH. Removing nonspecific staining is best
achieved with BAL, as it is about 2.5-fold more efficacious than EDT [8]; it
is also less toxic to cells (it is still used clinically for treating acute arsenic
and heavy metal poisoning, especially for organoarsenic compounds), more
soluble in cell culture solutions, and has a considerably less offensive odor.
However, using it (at appropriately low concentrations) during cell staining
can prevent labeling, presumably because the FlAsH-BAL2 that is formed
is considerably less membrane permeable (unpublished results). The high
concentrations of dithiol used in destaining, particularly with the optimized,
high-affinity tetracysteines, could cause cellular toxicity. BAL has been reported
to decrease cell viability when present continuously at concentrations over
1 mM [27]; however, the brief incubation of a few minutes required for
the destaining of biarsenical-labeled cells seems much less toxic. BAL and
EDT are weak reducing agents compared to dithiols such as
dithiothreitol (DTT) and probably do not significantly alter the redox potential
of the cytoplasm (which contains 5-10 mM reduced thiol as glutathione).
They do not appear capable of reducing oxidizing environments such as the
endoplasmic reticulum and Golgi, as the reduction of oxidized
tetracysteine-tagged proteins in such compartments requires the addition of more potent
reducing agents such as phosphines (unpublished results).
Aromatic dithiols such as 1,2-benzenedithiol and 3,4-toluenedithiol also
quench the fluorescence of FlAsH when bound to the arsenic groups
(unpublished results). Their increased lipophilicity and weaker binding to
biarsenicals compared to EDT, make them faster but less specific reagents
for labeling tetracysteines in cells. So far, all attempts to make dithiol analogs
that have a higher affinity for biarsenicals than EDT (to reduce nonspecific
staining) but with a low molecular weight and suitable polarity have been
unsuccessful.
8.1.3.5 Comparison with Other Small-molecule Labeling Systems

The revolutionary effect of GFP on cell biological research has inspired
numerous, often ingenious, alternatives requiring the interplay of chemistry
and biology [3-6, 9, 19, 28, 29]. All these methods, including the
biarsenical-tetracysteine method, generally require the addition of an exogenous ligand,
usually labeled with a fluorophore, that has to penetrate cell membranes
(for the more useful intracellular applications) and bind specifically, after
which the excess has to be removed by washing. The difficulty in achieving these
steps with high efficiency
and speed for a range of different conjugated fluorophores and labels is the
major limitation of all these methods. By comparison, autofluorescent proteins
do not require any additional components other than molecular oxygen and
so are far easier to use experimentally. In general, the more complex the
method is (i.e., the number of components such as receptor, ligand/substrate,
modifying enzyme, etc.), the more limited the applications. Of course, GFP
and related proteins have limitations: size of tag, colors available, no readouts
other than fluorescence; yet these have proven to be less of a problem than
might have been expected, and active research has made steady progress
in improving their properties and extending applications. To achieve widespread
use, alternative methods need to offer new functionalities such as sensors,
caged molecules, and so on, and additional readout modes such as MRI
(magnetic resonance imaging) and PET (positron emission tomography), in
addition to improvements in their practicability in simple tissue culture cells
and eventually intact organisms. The development of such new molecules will
require chemical ingenuity coupled with expertise and knowledge of biological
systems.
A detailed comparison between genetically encoded tags for proteins
is beyond the scope of this review; the topic has been reviewed quite
extensively [30-32] and elsewhere in this book (Chapter 8.2). Ultimately,
such comparisons must include an assessment of the biological discoveries
made with these methods, as has been attempted in this review for one of them,
the biarsenical-tetracysteine system. Many newer methods have not yet had
the time to progress beyond proof-of-concept experiments and await testing
in the more complex and challenging experiments that are often required for
advances in biological knowledge.
8.1.4
Practical Applications of the Biarsenical-tetracysteine System
8.1.4.1 A Small, Genetically Encoded Fluorescence Tag

The small size of the biarsenical-tetracysteine tag has made it useful in
biological studies that require a genetically encoded fluorescent tag but in which GFP
is not tolerated or has deleterious effects because of its size. FlAsH binding
occurs rapidly and does not require any protein secondary structure to generate
fluorescence (unlike GFP and its variants, which can take minutes to days to
become fully fluorescent), and should therefore be a more faithful reporter
of the initiation of protein synthesis. The following examples include studies
in viruses, bacteria, yeast, and mammalian cells and indicate the broad
applicability of the biarsenical-tetracysteine system.
8.1.4.1.1 Trafficking of Viral Coat Proteins

The biarsenical-tetracysteine tag has been used successfully to monitor the
targeting and trafficking of the Ebola virus coat protein, VP40, to lipid rafts in
mammalian cells [33]. Tagging this protein with GFP results in a failure
to efficiently form virus-like particles (VLP). However, a 17 amino acid
N-terminal tetracysteine tag (WEAAAREACCRECCARA), when coexpressed
with the viral glycoprotein, resulted in the efficient release of filamentous
viral particles that could be visualized by electron microscopy. HEK 293T cells
transfected with tagged VP40 and stained with FlAsH gave a plasma membrane
(PM) localization but with an unexpected fraction in intracellular globular
or tubular compartments. FlAsH staining correlated well with fixed cells
labeled with anti-GP40 antibody. Live cell imaging of VP40 and its truncation
mutants, coupled with biochemical fractionation experiments, suggests that
the protein concentrates in lipid rafts at the PM and that its oligomerization
results in viral budding.
8.1.4.1.2 Infection of Mammalian Cells by Yersinia Bacteria

The small size of the biarsenical-tetracysteine tag was also used advantageously
to investigate the role of specific proteins of the pathogenic bacteria Yersinia in
the infection of mammalian cells [34]. This family of bacteria, which includes
Yersinia pestis, the agent responsible for plague, infects cells by injecting various
Yops (Yersinia outer proteins) through a thin needle complex. Two of the 14
Yops present in Yersinia enterocolitica, YscM1 and YscM2, which are known to
regulate the expression of these proteins in the bacteria, were tagged with a
16 amino acid tetracysteine tag. The tagged proteins were used to directly detect
their secretion into HeLa cells following infection, washing, fixation, and
labeling with FlAsH. The tag did not appear to affect protein function, as the
results confirmed biochemical differential fractionation studies showing that YscM2
was injected into the cytoplasm. A larger tag, such as GFP, might
hinder secretion of these proteins through the narrow pore of the needle
complex.
8.1.4.1.3 Comparing GFP and Tetracysteine Tags to β-tubulin in Yeast

GFP as a fusion tag is surprisingly well tolerated in living cells and even
organisms despite its size. For example, in yeast when GFP was systematically
fused to the C-termini of all the open reading frames in their chromosomal
location, over 75% of the resulting proteins were expressed at levels that
permitted subcellular localization of the protein [35]. Recently, the effects
of tagging β-tubulin with GFP or with the much smaller one, two, or three
tandem tetracysteine tags (10, 20, or 29 aa, respectively) have been compared
as chromosomal insertions in yeast strains [36]. Generally, the larger the tag,
the greater the effect on yeast viability. For example, strains with one or two
tetracysteine tags were viable as haploid yeast (in which only the tagged protein is expressed),
whereas three tetracysteine tags or GFP could only be tolerated in diploid
heterozygous yeast (with an unmodified tubulin also present), although both
could be incorporated into microtubules. However, even one tetracysteine tag
had a discernible effect on spore viability and caused a subtle growth defect
at elevated temperatures. All the yeast strains expressing tetracysteine-tagged
β-tubulin could be labeled with FlAsH after overnight staining of growing cells
and gave the expected microtubule staining with remarkably low nonspecific
background. However, β-tubulin carrying multiple tetracysteine
tags was no more fluorescent than the singly tagged version, although it
was more resistant to photobleaching. This is reminiscent of fluorophore-labeled
antibodies that are quenched at high dye-to-protein ratios and may be caused
by dye-dye stacking interactions.
8.1.4.1.4 Replacing CFP with FlAsH in a FRET Sensor of G-protein Coupled Receptor (GPCR) Activation

One of the most successful and novel applications of GFP has been in the
design of FRET sensors of biochemical pathways [1]. Two color mutants of
GFP, typically CFP and YFP (yellow fluorescent protein), which are capable
of FRET, are fused with intervening proteins or protein domains. Biochemical
pathways that modify these domains, for example, phosphorylation, or binding
of a small ion or molecule such as Ca2+ or cAMP, alter the distance and/or the
orientation of CFP and YFP relative to each other, changing the degree of FRET
occurring. The resulting ratiometric fluorescence signals allow monitoring of
such pathways by the imaging of single cells, tissues, and organisms.
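
The distance dependence that underlies such sensors can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The following is only a minimal sketch: the Förster radius of 5 nm and the function names are assumptions chosen for illustration and are not taken from the studies cited here, and orientation effects are ignored.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Forster relation E = 1 / (1 + (r / R0)**6) for one donor-acceptor pair."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def emission_ratio(acceptor_intensity: float, donor_intensity: float) -> float:
    """Ratiometric readout: acceptor emission divided by donor emission."""
    if donor_intensity <= 0:
        raise ValueError("donor intensity must be positive")
    return acceptor_intensity / donor_intensity


# Hypothetical Forster radius; the real value depends on the dye pair
# and on the relative orientation of the two fluorophores.
ASSUMED_R0_NM = 5.0

if __name__ == "__main__":
    # A change of 1 nm around R0 shifts the efficiency substantially,
    # which is why small conformational changes can be detected.
    for r_nm in (4.0, 5.0, 6.0):
        efficiency = fret_efficiency(r_nm, ASSUMED_R0_NM)
        print(f"r = {r_nm} nm -> E = {efficiency:.2f}")
```

Because the efficiency depends on the sixth power of the distance, the readout is most sensitive when the donor-acceptor separation is close to R0.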
Recently, Lohse’s group has replaced YFP with FlAsH in two FRET sensors of
GPCR activation with two advantageous consequences [18]. These constructs
consist of a C-terminal CFP and a tetracysteine site inserted in the third
intracellular loop of either the A2-adenosine receptor or the α2-adrenergic
receptor. FlAsH labeling of cells expressing these chimeras showed FRET
from CFP to FlAsH that was modulated by the binding of the receptors to their
natural ligands, adenosine and norepinephrine, respectively. Similar sensors
containing CFP inserted at this loop (with a C-terminal YFP) also respond to
receptor activation but their FRET changes are small (a few percent change
in the FRET ratio). However, the presence of CFP greatly impairs G-protein
coupling and therefore the downstream signaling of the receptors [37]. In
contrast, the FlAsH versions give much larger changes in FRET (three- to
fivefold), permitting single-cell imaging of the adrenergic receptor activation.
Importantly, for the adenosine receptor, downstream activation of adenyl
cyclase was unaltered compared to the wild-type receptor, indicating the less
perturbative effect of the FlAsH-tetracysteine tag compared to the much larger
fluorescent proteins. Such improvements in these fluorescent reporters should
be very useful in the development of optical assays for GPCRs.
8.1.4.1.5 Release of Mitochondrial Cytochrome c during Apoptosis

One of the key events in programmed cell death or apoptosis is the release
of cytochrome c from the mitochondrial matrix into the cytoplasm where it
triggers the formation of the apoptosome, resulting in caspase activation and
eventual cell death. What controls the release of cytochrome c and what extra
steps are required (if any) following loss of the mitochondrial transmembrane
potential (ΔΨm) are current areas of active research. Single-cell studies of
cytochrome c release have used cytochrome c-GFP as a fluorescent marker in
conjunction with a fluorescent indicator of ΔΨm and revealed that release is
sudden, rapid, and complete before any changes in ΔΨm [38]. However,
concerns have been voiced about the size of GFP (~30 kDa) relative to
cytochrome c (10.5 kDa) and whether this assay faithfully reports the behavior
of the endogenous polypeptide. Recent work [39] has taken advantage of
the small size of the biarsenical-tetracysteine system to construct a much
smaller fluorescent reporter (13.3 kDa). Interestingly, this construct, when
labeled with FlAsH in single living cells, targeted to the mitochondria and
was released in a similar manner and with similar kinetics to the GFP reporter, again
preceding decreases in ΔΨm. Additionally, the ReAsH-labeled reporter could
be monitored simultaneously with the GFP reporter in the same cell and
showed identical kinetics, demonstrating that the pores generated during
mitochondrial permeabilization do not hinder diffusion of large proteins.
8.1.4.1.6 An Assay for Targeted Nucleic Acid Repair for Gene Therapy in Yeast

The biarsenical-tetracysteine system has also been used to develop an in vivo
assay of nucleic acid repair, targeted by DNA-RNA hybrid oligonucleotides
[40, 41]. These double-stranded hairpin-capped molecules form a double-D
loop structure on hybridization with the targeted chromosomal gene, initiating
repair, and have been used to correct mutated genes in several animal models.
As a model system to investigate the proteins involved in such repair and
the conversion efficiency, yeast was transfected with a plasmid that expresses
a mutated neomycin marker containing an internal stop codon, TAG, and
tagged with a C-terminal 19 amino acid tetracysteine tag. Expression of the
protein results in premature termination, so no fluorescence is seen when the
cells are labeled with FlAsH. Repair oligonucleotides can be coelectroporated
into the cells and, if repair is successful, the G is converted to C and a
complete protein is produced, resulting in green fluorescence after labeling
with FlAsH. The biarsenical-tetracysteine system is advantageous over GFP
as the dyes bind and generate fluorescence rapidly with no requirement for
protein folding, allowing the rate of conversion of different oligonucleotides
to be compared. Cellular inheritance of the repair can also be demonstrated
by washing out the label, expanding individual cells, and then by relabeling;
long-lived GFP molecules might yield false positives.
8.1.4.1.7 Monitoring Protein Synthesis and Folding in Bacteria with FlAsH

To generate an indicator of protein folding, Gierasch and coworkers inserted
a tetracysteine motif into a surface-exposed loop of a mammalian cellular
retinoic acid binding protein (CRABP 1) and expressed it in E. coli [42]. A
CCGPCC site was created by mutating two adjacent amino acids to cysteines
around an existing GP followed by insertion of two additional cysteines.
The tetracysteine insertion had a minimal effect on folding compared to the
wild type. Unexpectedly, FlAsH-labeled protein increased in fluorescence on
folding, allowing the measurement of the free energy of unfolding and
urea-dependent denaturation of CRABP in the crowded cellular environment of
living E. coli bacteria. Preloading the cells with FlAsH permitted the real-time
monitoring of protein folding and the formation of inclusion bodies by a
slow-folding mutant, which show up as bright fluorescent puncta at the poles
of the cells. Although other assays of protein folding that use GFP have
been proposed, this method probes directly the protein of interest. As FlAsH
binding can occur as soon as the protein is synthesized, it also potentially
reports much faster than GFP, which can take up to tens of minutes to fold
and fluoresce.
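
One common way to turn a urea titration of a folding-sensitive fluorescence signal into a free energy of unfolding is a two-state model with linear extrapolation, in which the unfolding free energy falls linearly with denaturant concentration. The sketch below illustrates that general approach only; it is an assumption that such a model applies here, it is not presented as the analysis used in [42], and the parameter values and function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal / (mol K)


def unfolding_free_energy(dg_water: float, m_value: float, urea_molar: float) -> float:
    """Linear extrapolation: dG_unfold([urea]) = dG_water - m * [urea]."""
    return dg_water - m_value * urea_molar


def fraction_folded(dg_unfold: float, temperature_k: float = 298.15) -> float:
    """Two-state population of the folded state for a given unfolding free energy."""
    k_unfold = math.exp(-dg_unfold / (GAS_CONSTANT_KCAL * temperature_k))
    return 1.0 / (1.0 + k_unfold)


def observed_signal(f_folded: float, signal_folded: float, signal_unfolded: float) -> float:
    """Population-weighted fluorescence of folded and unfolded protein."""
    return f_folded * signal_folded + (1.0 - f_folded) * signal_unfolded


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to produce a readable curve.
    dg_water = 6.0   # kcal/mol in the absence of urea
    m_value = 1.5    # kcal/(mol M)
    for urea in (0.0, 2.0, 4.0, 6.0, 8.0):
        dg = unfolding_free_energy(dg_water, m_value, urea)
        folded = fraction_folded(dg)
        # Folded protein is given the brighter signal, as reported for FlAsH-labeled CRABP.
        print(f"{urea:.1f} M urea: fraction folded = {folded:.3f}, "
              f"signal = {observed_signal(folded, 1.0, 0.3):.3f}")
```

In practice the parameters would be obtained by fitting such a curve to the measured fluorescence; the midpoint of the transition lies at the urea concentration where the unfolding free energy is zero.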
8.1.4.2 Multicolor Pulse-chase Labeling

One noticeable feature of the biarsenical-tetracysteine system is the high
stability of the complex when formed, with dissociation taking up to weeks in vitro [8].
This property can be exploited with a two-color pulse chase of protein turnover
in living cells, using the sequential addition of FlAsH and ReAsH to label
old and newly synthesized proteins. The time course of protein turnover can
be determined simply by varying the time interval between removal of the
first label and the addition of the second label. Unlike traditional biochemical
methods of pulse chase that use incorporation of radioactive amino acids, the
two-color method allows continuous imaging of single cells and can reveal
additional information about subcellular localization of protein turnover.
8.1.4.2.1 Turnover of Connexin43 in Gap Junctions

The two-color pulse chase was developed to examine the turnover of gap
junctions in HeLa cells [11]. Gap junctions are channels that connect adjacent
cells and are permeable to small molecules and ions. They are composed
of connexins, a family of transmembrane proteins with molecular weights
ranging from 25 to 50 kDa. A functional channel is composed of 12 connexin
subunits with each adjacent cell contributing a hexameric hemichannel.
Large semicrystalline clusters of these channels form gap junctional plaques.
Specific FlAsH or ReAsH staining of plaques in HeLa cells could be achieved
by expressing connexin43 tagged with a C-terminal EAAAREACCRECCARA
tetracysteine motif (Fig. 8.1-3). Old protein was first stained with FlAsH, the
unbound dye being removed by washing with a low concentration of EDT,
and subsequent freshly synthesized connexin43 was visualized by staining
with ReAsH. When the period between FlAsH and ReAsH staining was
24 h, green fluorescence was no longer visible in the gap junctional plaques but only in
cytoplasmic vesicles, indicating complete turnover of connexin43 in that time
period. However, decreasing the staining interval to about 4 h (to match the
expected half-life of connexin43) gave striking images of plaques with green
centers surrounded by a red rim (Fig. 8.1-4), indicating that freshly synthesized
protein is added to the outside of the plaque while old protein is removed
from its center. The cytoplasm contained abundant separate green and red
vesicles of labeled protein trafficking to and from the plaque, respectively.
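
The choice of chase intervals can be rationalized with a simple first-order decay estimate. This is a minimal sketch that assumes single-exponential turnover with the roughly 4 h half-life quoted above; real turnover need not follow a single exponential, and the function names are illustrative.

```python
def fraction_old_remaining(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of first-label (old) protein left after a chase, first-order decay."""
    if elapsed_h < 0 or half_life_h <= 0:
        raise ValueError("elapsed time must be >= 0 and half-life must be > 0")
    return 0.5 ** (elapsed_h / half_life_h)


def fraction_replaced(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of the pool replaced by newly synthesized protein at steady state."""
    return 1.0 - fraction_old_remaining(elapsed_h, half_life_h)


if __name__ == "__main__":
    half_life_h = 4.0  # approximate connexin43 half-life quoted in the text
    for chase_h in (4.0, 8.0, 24.0):
        old = fraction_old_remaining(chase_h, half_life_h)
        print(f"chase {chase_h:4.1f} h: old = {old:.3f}, new = {1.0 - old:.3f}")
```

Under this assumption a 4 h interval leaves about half of the old protein in place, so both colors are visible in the same plaque, whereas a 24 h interval spans six half-lives and leaves under 2% of the first label.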
Fig. 8.1-3 Specificity of FlAsH staining in HeLa cells expressing Cx43-tetracysteine. A
gap junction plaque between two transfected cells is marked with an arrow.
(a) FlAsH fluorescence, (b) staining with a Cx43-specific antibody, (c) overlay of these
channels combined with a propidium iodide stain (blue) to indicate nuclei.

Fig. 8.1-4 Two-color pulse chase of connexin43-tetracysteine in HeLa cells. See text for
details.

8.1.4.2.2 Activity-dependent Turnover and Trafficking of Glutamate Receptors in Neurons

The role of AMPA glutamate receptor trafficking in the neural plasticity
required for learning and memory is currently an area of active research
in neurobiology. Malenka and coworkers tagged the carboxy termini of two
AMPA receptor subunits, GluR1 and GluR2, with the tetracysteine motif
EAAAREACCRECCARA and used two-color pulse-chase labeling to look at
their trafficking and localized synthesis in living cultured hippocampal neurons
by confocal microscopy [12]. After staining, the tagged proteins gave the
expected localization to synaptic puncta with no detectable signs of toxicity
caused by the procedure. Labeling existing GluR with ReAsH, and freshly
synthesized protein with FlAsH, gave clear evidence of an increase in the rate
of synthesis of GluR1 but not GluR2 under pharmacological conditions (activity
blockade) known to enhance synaptic strength in such cultures. An important
control was that adding the protein synthesis inhibitor cycloheximide
immediately after ReAsH labeling greatly reduced subsequent FlAsH staining,
indicating that ReAsH had saturated all preexisting tetracysteine-tagged GluR.
That both GluR subunits were locally synthesized in dendrites rather than the
cell body was demonstrated by physically isolating these regions by transection
and showing the appearance of freshly synthesized protein at the synapses.
Again, activity blockade only increased the localized synthesis of GluR1 and
not GluR2 in transected neurons. The application of this technique to other
synaptic proteins synthesized in response to activity may permit elucidation
of the molecular mechanisms involved in synaptic plasticity.
8.1.4.2.3 Probing the Intracellular Site of Synthesis of the HIV-1 Gag Protein

Recently, the two-color pulse chase has been used to image the dynamics
of recently synthesized Gag, a primary structural protein of human
immunodeficiency virus type 1 (HIV-1), in living HeLa, Mel JuSo, and
Jurkat T cells [43]. The biarsenical-tetracysteine system was used for its small
size and because binding of the dye is independent of localized secondary
structure, unlike GFP, which only generates fluorescence after folding (various
mutants have maturation half-times of 30 min to 4 h). Gag was tagged with a C-terminal
improved sequence (GSMPCCPGCCGC) derived from the first peptide library
screen described above, and gave good FlAsH staining in these cell types
that colocalized with anti-Gag antibody staining. Deconvolution microscopy
revealed that Gag-TC (tetracysteine) localized primarily to discrete areas
(possibly lipid rafts) of the plasma membrane (PM), even when using two-color
pulse chase to detect recently synthesized protein (~30 min), suggesting that
Gag is synthesized close to the PM. Gag-tetracysteine and a similar construct
containing an extended linker were compatible with forming VLPs when cells
were transfected with a plasmid containing the complete HIV-1 genome. These
lower expressing viral plasmids also gave good plasma membrane staining,
although the construct with a longer linker showed more intracellular vesicular
staining that colocalized with markers for the protein degradation pathway. The
importance of posttranslational myristoylation of Gag for correct targeting was
demonstrated, as mutations at this site gave diffuse cytoplasmic fluorescence
with no plasma membrane or organellar fluorescence. In contrast, mutations
in the L-domain required for efficient budding from the PM had no effect on
Gag localization.
8.1.4.3 Environment-sensitive Fluorescent Biarsenicals

The fluorescence of FlAsH and ReAsH is, like that of the parent fluorophores
fluorescein and resorufin, relatively insensitive to the local protein environment
surrounding the tetracysteine. Such insensitivity is useful in
quantitative studies such as protein localization or trafficking in cells when
the protein monitored may experience different local environments. However,
environment-sensitive fluorophores are very useful probes of changes in
protein conformation, and the synthesis of such biarsenical derivatives has
permitted their use in in vitro studies of purified proteins or in living
cells [21, 23]. Umezawa and coworkers have pioneered this application of
the biarsenical-tetracysteine system with the synthesis of BarNile-EDT2 and
mansyl FlAsH-EDT2. Their first strategy was to add two arsenoxide groups
to the 9-amino derivative of Nile Red, a well-known environment-sensitive
fluorophore containing the phenoxazine ring system of ReAsH. The
9-amino group was chosen rather than the diethylamino group of Nile Red
because of potential steric hindrance with the dithioarsolanyl substituent, an
effect seen with the biarsenical derivative of rhodamines (see Section 8.1.3.1).
BarNile specifically bound to a tetracysteine peptide fused to GST (glutathione
S-transferase) and to calmodulin (CaM)-tetracysteine in vitro, and the latter
construct gave a small (10%) increase in fluorescence in living cells when Ca2+
was added. CaM undergoes a large conformational change upon metal binding.
Larger fluorescence increases of up to almost 40% were achieved with
mansyl-FlAsH (an amino derivative of FlAsH conjugated to the
environment-sensitive mansyl fluorophore), possibly through a PET process. CaM labeled
with FlAsH at the N-terminal helix has been reported to give a 12% decrease
in fluorescence on binding Ca2+ [16], but the same construct with BarNile
gave negligible change. These results indicate the difficulty in predicting
the optimal site for an environment-sensitive fluorophore even in well-studied
proteins, such as CaM, and the difficulties in interpreting any spectral changes.
8.1.4.4 Fluorescence Anisotropy of the FlAsH-tetracysteine Complex

The four arsenic-sulfur bonds present in the FlAsH-tetracysteine complex
rigidly lock the fluorophore to the peptide so that any rotational motion reflects
that of the peptide or protein rather than the dye. This is in direct contrast
to conventional coupling chemistries used to modify proteins, in which the
fluorophore is attached by a rotatable single bond to the flexible side chain of
an amino acid.
8.1.4.4.1 Protein Dynamics of Calmodulin on Ca2+ Activation

This useful feature of the biarsenical-tetracysteine labeling system was
demonstrated [8, 16] by the high fluorescence anisotropy values found for
CaM labeled with FlAsH at the N-terminal helix mutated to give the motif
CCEQCC. By comparison, CaM labeled with FITC (fluorescein isothiocyanate)
at a lysine residue had low anisotropy values in nonviscous aqueous solutions,
reflecting decoupling of the mobility of the fluorophore and protein. Increases
in the steady-state anisotropy of the FlAsH-labeled CaM on Ca2+ binding
revealed that this helical region rotates somewhat independently of the remaining
protein until rigidified by the Ca2+-induced conformational change [16]. The
ninefold slower labeling of Ca2+-bound CaM by FlAsH was consistent
with this, if FlAsH preferentially binds disordered structures.
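
For readers unfamiliar with the quantity being measured, steady-state anisotropy is conventionally computed from the parallel and perpendicular emission intensities, and its dependence on rotational mobility is often summarized by the Perrin equation. The sketch below uses these textbook relations; the numerical values (limiting anisotropy, lifetime, correlation times) are hypothetical and are not data from [8, 16].

```python
def steady_state_anisotropy(i_parallel: float, i_perpendicular: float,
                            g_factor: float = 1.0) -> float:
    """r = (I_par - G * I_perp) / (I_par + 2 * G * I_perp).

    g_factor corrects for the instrument's unequal sensitivity to the two
    polarizations; 1.0 means no correction.
    """
    denominator = i_parallel + 2.0 * g_factor * i_perpendicular
    if denominator <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - g_factor * i_perpendicular) / denominator


def perrin_anisotropy(r0: float, lifetime_ns: float, correlation_time_ns: float) -> float:
    """Perrin equation: r = r0 / (1 + tau / theta) for an isotropic rotor."""
    if lifetime_ns < 0 or correlation_time_ns <= 0:
        raise ValueError("lifetime must be >= 0 and correlation time must be > 0")
    return r0 / (1.0 + lifetime_ns / correlation_time_ns)


if __name__ == "__main__":
    r0 = 0.4           # hypothetical limiting anisotropy
    lifetime_ns = 4.0  # hypothetical fluorescence lifetime
    # A dye spinning freely on a flexible linker (short correlation time)
    # versus a dye locked to a slowly tumbling protein (long correlation time).
    for label, theta_ns in (("free linker", 0.2), ("rigidly bound", 10.0)):
        print(f"{label}: r = {perrin_anisotropy(r0, lifetime_ns, theta_ns):.3f}")
```

The comparison shows why a rigidly attached dye reports on the motion of the protein, while a dye on a flexible linker gives low anisotropy regardless of how the protein moves.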
8.1.4.4.2 Monitoring Proteolysis with Biarsenical-Tetracysteines

FlAsH-tetracysteine anisotropy was shown to be useful in monitoring the
rates and specificity of proteases in cleaving affinity-purification tags from
expressed proteins [44]. The initial high values of anisotropy and subsequent
large decreases measurable on cleavage of a FlAsH-labeled 3-4 kDa peptide
fragment from the target protein permitted parallel real-time measurements
in a plate-reader format. Strategies in which a CCPGCC motif was placed adjacent to
N-terminal His6 and S-tag affinity sites, or inserted in multidomain
proteins, were both successful. Optimization of protease cleavage sites and the
ability to easily monitor completion of reaction are important for large-scale,
automated purification of proteins for structure analysis. Alternative methods
using HPLC, capillary electrophoresis, or polyacrylamide gel electrophoresis
(PAGE) are discontinuous and time-consuming.
8.1.4.4.3 Determining the Structure of the Phospholamban Pentamer by Homo FRET

The high stability of the biarsenical-tetracysteine complex and the sensitivity
of its fluorescence polarization to localized protein dynamics have
recently been used to probe the structure of the pentameric oligomer of
phospholamban [17], a key regulator of contractility in the heart. Tetracysteine
sites were formed at three internal locations within the α-helical region involved
in oligomerization by mutation of existing amino acids at positions 5, 6,
9, 10, or 23, 24, 27, 28, or 41, 42, 45, 46 to give sequences containing
CCLTCC, CCRQCC, and CCLLCC motifs, respectively. A fourth construct with
an N-terminal tag used the hairpin-favoring sequence MCCPGCCMDK. All
these sites could be labeled with biarsenicals apart from the CCLLCC site in
the transmembrane region of phospholamban. The latter site was probably
resistant to labeling because of suppression of cysteine ionization in the
membrane. Labeling of these mutant phospholambans with a mixture of
FlAsH and ReAsH, followed by separation of the pentamer from monomer by
gel electrophoresis, revealed the presence of intraoligomer FRET occurring
in the gel. Similar FRET could be measured by confocal microscopy in
living Sf21 insect cells expressing the constructs by again labeling with
both FlAsH and ReAsH. The uncertainty in the relative stoichiometries of
labeling with FlAsH and ReAsH prevents determination of the distance
between the tetracysteine sites in each oligomer. However,
homoFRET measurements using in-gel fluorescence anisotropy with
FlAsH-labeled phospholamban did allow such distances to be calculated. Surprisingly,
the amount of FRET decreased as the labeling site was moved away from the
transmembrane region toward the N-terminus, which was consistent with the
pentamer having a helical pinwheel conformation in which each N-terminus is
slightly bent back toward the membrane. This work provides strong evidence
that FlAsH can bind tightly to α-helices containing tetracysteine sites without
inducing a hairpin turn. The leucine/isoleucine zipper-stabilized quaternary
conformation of phospholamban in these helices presumably disfavors hairpin
formation.
8.1.4.5 Single-molecule Studies Using Biarsenical-tetracysteines

There has been considerable interest in single-molecule experiments in the
last 5-10 years as they often give different and unique insights from ensemble-
averaged measurements [45]. Fluorescence techniques give the required
sensitivity but require very photostable fluorophores such as rhodamines
or cyanines. As the biarsenicals currently available are based on the less
photoresistant fluorophores of fluorescein and resorufin, single-molecule
studies using tetracysteine labeling are somewhat limited. If more photostable
biarsenicals can be synthesized, their potential for specifically labeling proteins
in cells and complex mixtures would be of considerable use in single-molecule
experiments. Despite these current limitations, the use of both FlAsH and
ReAsH has been demonstrated to be feasible for some applications.
8.1.4.5.1 Single-molecule Fluorescence Anisotropy Measurements of Calmodulin

Protein motions in single FlAsH-labeled CaM molecules tethered to glass
slides have been measured by anisotropy using time-correlated single-photon
counting in a confocal microscope [46]. Average anisotropy values were similar
to bulk measurements but showed wide variability from molecule to molecule.
Decay rates indicated that rapid protein motions occur in the N-terminal
domain on a nanosecond timescale, but limited signal-to-noise levels precluded
detailed analysis. Comparable experiments with CaM labeled with Texas Red
failed to detect such motions because of faster dye rotation, independent of
the protein motions.
8.1.4.5.2 Nanometer Localization of Single ReAsH-tetracysteine Complexes

Selvin and coworkers have shown that a single ReAsH molecule (bound
to CaM-tetracysteine on the surface of a glass slide) can be localized with
a precision of 5 nm in less than a second using total internal reflection
microscopy and imaging the emitted photons with a CCD (charge-coupled
device) camera [47]. This technique requires collecting thousands of emitted
photons per molecule and determining the center of their distribution, and
has recently been used by this group to measure the step size of the
molecular motors myosin V and kinesin, using conventional single-molecule
fluorophores such as Cy3 and GFP [48-50]. ReAsH was shown to produce
more photons than GFP before photobleaching (but fewer than Cy3), and it could be
used to follow the 25 or 40 nm stepping movements with variable stepping rate
shown by molecular motors. This study opens up the possibility of using the
biarsenical-tetracysteine system to measure the movements of such proteins
inside living cells.
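
The reason that thousands of photons are needed can be seen from the shot-noise limit of centroid localization, in which the uncertainty of the center is roughly the width of the photon distribution divided by the square root of the number of photons. The following is a simplified one-dimensional sketch: the point-spread-function width of 125 nm is an assumed value, and pixelation, background, and camera noise are ignored.

```python
import math
import random
import statistics


def localization_precision_nm(psf_sigma_nm: float, n_photons: int) -> float:
    """Shot-noise-limited standard error of the centroid: sigma / sqrt(N)."""
    if n_photons <= 0 or psf_sigma_nm <= 0:
        raise ValueError("sigma and photon count must be positive")
    return psf_sigma_nm / math.sqrt(n_photons)


def simulate_centroid_nm(true_x_nm: float, psf_sigma_nm: float,
                         n_photons: int, rng: random.Random) -> float:
    """Draw photon positions from a Gaussian spot and return their mean position."""
    positions = [rng.gauss(true_x_nm, psf_sigma_nm) for _ in range(n_photons)]
    return sum(positions) / len(positions)


def simulated_spread_nm(psf_sigma_nm: float, n_photons: int,
                        n_repeats: int, seed: int = 0) -> float:
    """Standard deviation of repeated centroid estimates of the same emitter."""
    rng = random.Random(seed)
    centroids = [simulate_centroid_nm(0.0, psf_sigma_nm, n_photons, rng)
                 for _ in range(n_repeats)]
    return statistics.pstdev(centroids)


if __name__ == "__main__":
    sigma_nm = 125.0  # assumed width of the diffraction-limited spot
    for n in (100, 625, 2500):
        print(f"N = {n:5d} photons -> predicted precision "
              f"{localization_precision_nm(sigma_nm, n):.1f} nm")
    print(f"simulated spread at N = 1000: {simulated_spread_nm(sigma_nm, 1000, 200):.1f} nm")
```

Under these assumptions a few hundred to a few thousand photons bring the precision into the nanometer range, which is why the photon yield of a dye before photobleaching matters for this technique.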
8.1.4.6 Photoinduced Generation of Singlet Oxygen by Biarsenicals

Unlike GFP, FlAsH and ReAsH can generate significant amounts of reactive
oxygen species such as singlet oxygen when strongly illuminated. The physical
processes responsible for this are illustrated in Scheme 8.1-4. Excitation of
the biarsenical dye promotes it to its first excited state, which usually rapidly
decays to the ground state in a few nanoseconds by emitting a photon,
resulting in fluorescence. Occasionally, intersystem crossing from the excited singlet
state results in the formation of the comparatively long-lived (microseconds)
triplet state that can sensitize molecular oxygen (whose ground state is
triplet) producing highly reactive singlet oxygen and the ground state of
the biarsenical dye. Strong illumination can therefore catalytically generate
many hundreds to tens of thousands of singlet oxygen molecules per dye
molecule. The cycle is stopped by destruction of the dye (photobleaching),
often on reaction with singlet oxygen or its by-products. Most fluorophores
(and chromophores) are capable of generating singlet oxygen, especially
those with increased rates of intersystem crossing (or high triplet quantum
yield) resulting from incorporation of atoms of high atomic weight such
as bromine, iodine, or sulfur. However, the fluorophore of GFP does not
seem to be an efficient generator of singlet oxygen probably because it is
deeply buried within the interior of the β-barrel structure (Fig. 8.1-1). Oxygen
diffusion to and singlet oxygen diffusion away from the chromophore are
probably strongly hindered. Singlet oxygen’s high reactivity (with a half-
life in aqueous solutions of a few microseconds) results in highly localized
zones of this species around the generating fluorophore. This property has
been exploited with the biarsenical-tetracysteine system to locally inactivate
proteins (CALI) or to generate localized precipitates that permit visualization
of the tagged protein on electron microscopy (diaminobenzidine (DAB)
photoconversion).
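
The link between a short lifetime and a localized zone of action can be made concrete with a diffusion estimate, using the root-mean-square displacement sqrt(6Dt) for three-dimensional diffusion. This is an order-of-magnitude sketch only: the diffusion coefficient (about 2 x 10^-9 m^2/s, typical of O2 in water) and the 3 microsecond lifetime are assumed values, and quenching by cellular components may shorten the effective range considerably.

```python
import math


def rms_displacement_nm(diffusion_m2_per_s: float, time_s: float) -> float:
    """Root-mean-square 3D displacement, sqrt(6 * D * t), converted to nanometers."""
    if diffusion_m2_per_s < 0 or time_s < 0:
        raise ValueError("diffusion coefficient and time must be non-negative")
    return math.sqrt(6.0 * diffusion_m2_per_s * time_s) * 1e9


if __name__ == "__main__":
    assumed_d = 2e-9  # m^2/s, assumed value for O2 in water
    for lifetime_us in (0.1, 1.0, 3.0):
        distance = rms_displacement_nm(assumed_d, lifetime_us * 1e-6)
        print(f"lifetime {lifetime_us:.1f} us -> about {distance:.0f} nm")
```

With these assumed values the displacement for a 3 microsecond lifetime is roughly 190 nm, well below the size of a typical cell, which is consistent with damage being confined to the neighborhood of the dye.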
Scheme 8.1-4 The photogeneration of singlet oxygen by the ReAsH-tetracysteine
complex and its use for CALI and photoconversion of diaminobenzidine (DAB).

8.1.4.6.1 Chromophore or Fluorophore Assisted Laser or Light Inactivation (CALI or FALI)

Jay and coworkers used illumination of antibodies that had been labeled with
the photosensitizer malachite green or fluorescein and bound to the target
protein to acutely inactivate the protein in living cells with high specificity
[51]. The requirements for antibodies that do not block the biological
function of the protein before CALI, for their microinjection into cells, and
for laser excitation are all limitations that can be avoided using tetracysteine-tagged
proteins labeled with biarsenicals. Independently, Davis and coworkers
and our group have demonstrated the high efficiency and specificity of
FlAsH- and ReAsH-mediated CALI in living cells [13, 14, 52]. CALI offers
advantages over alternative methods (such as transgenic knockouts, siRNA, and
pharmacological methods) in that the inactivation can be acute (within a few
seconds), it is generally applicable provided the tetracysteine tag is tolerated,
and it can be spatially targeted to specific cells or even subcellular regions of a
single cell.
FlAsH-FALI Inactivation of Synaptotagmin

Davis and coworkers used FlAsH-mediated CALI (or FlAsH-FALI) to
investigate the function of synaptotagmin, a protein involved in transmitter
vesicle release at synapses and neuromuscular junctions in animals [13].
Drosophila synaptotagmin I (Syt I) tagged with an AEAAARECCRECCARA
motif at the C-terminus and expressed in Syt I null mutant fly larvae had
no detectable deleterious biological effects. Specific labeling of the expressed
protein in dissected larvae with FlAsH could be achieved with no noticeable
background staining or perturbation of normal transmitter release monitored
by patch-clamp recordings. A few seconds of illumination with a mercury
lamp in an epifluorescence microscope produced Syt I inactivation without
affecting other proteins necessary for vesicle fusion. Surprisingly, almost
the same degree of inactivation was achieved when Syt I was overexpressed
fourfold in wild-type flies, suggesting that a dominant negative effect is
possible. In further studies [52], CALI of Syt I combined with simultaneous
imaging of endocytosis of released vesicles with a GFP-based pH-sensitive
reporter, synapto-pHluorin, revealed an additional role for Syt I in endocytosis
of released vesicles.
ReAsH-mediated CALI of Connexin43 and L-type Calcium Channels

The high efficiency of ReAsH in generating reactive oxygen species (for
the photoconversion of DAB in correlated electron microscopy; see below)
compared to that seen with FlAsH suggested the use of ReAsH for CALI. Oded
Tour of our group measured the inactivation of ReAsH-labeled,
tetracysteine-tagged gap junctions using a connexin43 construct [14]. Illumination of gap
junctions in living cells resulted in a rapid (few seconds), light-dependent
decrease in their electrical coupling, measured by double patch-clamp
recordings, indicating channel inactivation. FlAsH-labeled gap junctions
were inactivated substantially more slowly, and GFP and monomeric red fluorescent
protein (mRFP) tagged constructs showed minimal CALI. As 12 connexins
are required to form one channel, each of which is labeled with ReAsH
and capable of generating reactive oxygen species, we tested the ability of
CALI to inactivate the α1C L-type Ca2+ channel, which contains a single
pore-forming subunit. Using an N-terminal tag containing two tandem, high-affinity
CCPGCC motifs, light-dependent inactivation of the ReAsH-labeled channel
could be monitored by whole cell patch-clamp recordings. Importantly, the low
expression levels of these channels in HEK 293 cells and the nonspecific ReAsH
staining prevented visualization of the labeling by fluorescence microscopy
but still permitted specific CALI. The inhibitory effects of scavengers such as
azide and imidazole, and the enhancement by D2O and by increased concentrations of
O2, were evidence for the role of singlet oxygen in CALI.
8.1.4.6.2 ReAsH-mediated Photoconversion of Diaminobenzidine for
Correlated Fluorescence and Electron Microscopy (EM)

Imaging of fluorescently labeled proteins or molecules in cells is a powerful
technique but its limited resolution often prevents precise subcellular
localization and requires the use of electron microscopy. Fluorescent labels
do not intrinsically show contrast in the electron microscope but some can
photosensitize the polymerization of DAB to generate a localized precipitate
that can be stained with electron dense metals. Tetracysteine-tagged proteins
stained with ReAsH, unlike those labeled with FlAsH or GFP, can efficiently
photoconvert DAB, and the approach has distinct advantages over alternative EM
methods such as immunogold labeling. The large size of antibodies hinders their
penetration into fixed cells, resulting in low labeling efficiencies, and requires
membrane permeabilization with detergents and low concentrations of fixatives,
which result in poor preservation of ultrastructure. For photoconversion of
tetracysteine-tagged proteins, only the small membrane-permeable molecules ReAsH,
DAB, and O2 are required, and these are compatible with fixation protocols that
result in excellent cellular preservation.
The technique was demonstrated with gap junctions of connexin43-
tetracysteine that had been pulse chased with FlAsH and ReAsH [11]
as described above in Section 8.1.4.2.1. Fixation of the cells and strong
illumination of ReAsH in the presence of DAB and O2 gave localized
deposition of precipitates that could be stained with osmium tetroxide. Electron
microscopy revealed electron dense material only at cell regions that had been
stained with ReAsH (Fig. 8.1-5). Higher magnification of cross sections of gap
junction plaques showed structures with appropriate dimensions of individual
connexons. Photoconverted vesicles of characteristic size trafficking to or from
the plaque could also be visualized.
Fig. 8.1-5 Correlated fluorescence and electron microscopy of ReAsH-labeled
connexin43-tetracysteine in HeLa cells. (a) Fluorescence confocal image of a gap
junction plaque after FlAsH (green) and ReAsH (red) two-color pulse-chase labeling.
(b) The corresponding electron micrograph with photoconverted DAB staining indicated
with arrows. (c) Higher magnification micrograph of boxed region in (b).

The sensitivity of this method still requires improvement before being
generally applicable to all proteins, as it requires high concentrations of
localized protein such as those present in the gap junction. Attempts to
improve the ability of the biarsenical to generate singlet oxygen have so far been
unsuccessful. Adding the heavy atom substituents of classical photosensitizers
(e.g., Rose Bengal and eosin), such as bromo groups, and replacing the xanthene
or phenoxazine ring oxygen with a sulfur atom decrease fluorescence and increase
photobleaching and hydrophobicity, which decreases the specific labeling in live
cells [8]. A more promising strategy may be to increase the photostability
of the biarsenical, permitting more singlet oxygen to be produced before
photodestruction. This is probably why ReAsH is far superior to FlAsH in
photoconverting DAB, as it is about fivefold more resistant to photobleaching.
The ideal molecule will therefore have modest fluorescence and triplet
quantum yields coupled with high photostability. However, ReAsH-mediated
photoconversion has proved practical for proteins other than connexin43,
including actin, Golgi transmembrane proteins, and DNA-binding proteins
(unpublished results from the Ellisman and Tsien labs).
8.1.4.7 Affinity Purification of Tetracysteine-tagged Proteins

The picomolar affinity of biarsenicals for a tetracysteine peptide and its quick
reversal by millimolar concentrations of dithiols make it a useful system
for affinity purification of recombinant proteins. Two similar approaches
(with comparable results) have been described that differ only in the
chemistry used to immobilize the biarsenical to the solid support. Vale
and coworkers prepared a β-alanyl derivative of FlAsH with an aliphatic amino
substituent in four steps, suitable for reaction with carboxy-sepharose gel
activated as the N-hydroxysuccinimide ester [15]. Our approach [8], which couples
an N-hydroxysuccinimidyl (NHS) ester of 5-carboxyFlAsH (prepared from
carboxyfluorescein by the usual steps of mercuration and transmetallation)
with an amino-agarose, has fewer steps, gives higher overall yields, and
involves only simple column separations. Despite these minor differences,
both supports specifically bind proteins tagged with tetracysteines from
bacterial or mammalian lysates with reasonable efficiency. However, care has
to be taken to prevent any oxidation of the tetracysteine thiols, which would
prevent binding to the support, by the inclusion of reducing agents such as
DTT, monothiols, or phosphines. The absence of endogenous tetracysteine-
containing proteins in mammalian and bacterial extracts allows removal of
any contaminating proteins with a simple wash containing low concentrations
of EDT or BAL. The tetracysteine-tagged protein can then be eluted with
a high concentration of a highly water-soluble dithiol antidote such as 2,3-
dimercaptopropanesulfonate (DMPS) or by DTT. The protein purity obtainable
usually exceeds that achievable using the His6 tag.
8.1.4.8 SDS-polyacrylamide Gel Electrophoresis (PAGE) Analysis

The biarsenical-tetracysteine complex is also stable in the denaturing
conditions used in SDS-PAGE and provides a quick method to check the
specificity of biarsenical staining in live cells, crude assays, or purified
proteins [8, 53]. The fluorescent complex can be visualized using a light
box and appropriate filters before staining for protein with Coomassie Blue
or similar reagents. The high sensitivity of this method with FlAsH has been
reported to permit detection limits of less than 1 pmol per band with UV
excitation (Invitrogen), and sensitivity would probably be higher using blue light. The
high concentration of thiols such as 2-mercaptoethanol or DTT used in some
sample buffers can disrupt the complex so it is advisable to use phosphines
or low concentrations of monothiols or DTT as reducing agents [8]. Any
unbound FlAsH runs with the tracking dye front as a brightly fluorescent
band presumably because of loss of EDT during sample preparation or
electrophoresis.
8.1.5
Future Developments and Applications
Work in progress on future applications of the tetracysteine-biarsenical system
includes targeting fluorescent indicators, such as Ca2+ sensors, to voltage-
gated Ca2+ channels on the plasma membrane (unpublished results [22]).
Unexpected localized hotspots of channel activity have already been visualized
with this approach in tissue culture cells; future work in more biologically
interesting cell types such as neurons is likely to be informative about the spatial
and temporal dynamics of Ca2+ signaling pathways. Imaging modes
other than fluorescence are currently being explored, such as luminescence,
which in conjunction with time-resolved imaging could lead to more sensitive
detection of tagged proteins in cells. Transgenic animals expressing
tetracysteine-tagged proteins have already been described but the development
of new protocols or biarsenical derivatives for labeling live animals and tissues
will probably be required. Nonspecific staining will probably be the major
limitation in these applications and better antidotes will be required to either
prevent or remove such background. This will be aided by the more than 60
years of research into more effective dithiol antidotes to combat the enormous
human health problems resulting from arsenic-contaminated water supplies
in many parts of the world today. Finally, this method has considerable
potential as a general approach for site-specific labeling of crude or purified
proteins in vitro with any desired probe, for example, for phosphorescence and
fluorescence anisotropy, FRET, NMR, EPR (electron paramagnetic resonance),
and so on, by simple conjugation to a biarsenical. The ability to predict where
tetracysteines can be inserted and labeled in proteins may become possible
with the determination of the three-dimensional structure of the complex and
will lead to more applications, both in vitro and in living cells.
8.1.6
Conclusions
Designing protein tags for use in living cells requires chemistry that is
compatible with the complex biochemical milieu and proceeds with high reactivity
and selectivity. The biarsenical-tetracysteine method was one of the first such
methods to be developed and the lessons learnt during the process and from
its application to address biological questions should be of general interest to
chemical biologists.
Acknowledgments
I would like to thank all my coworkers over the years in the Tsien and Ellisman
labs who have contributed to the development of the biarsenical-tetracysteine
method, particularly Roger Tsien for devising the original concepts and for
continual input into their improvement.
References
1. J. Zhang, R. Campbell, A. Ting, R. Tsien, Creating new fluorescent probes for cell biology, Nat. Rev. Mol. Cell Biol. 2002, 12, 906-918.
2. R. Tsien, The green fluorescent protein, Annu. Rev. Biochem. 1998, 67, 509-544.
3. A. Keppler, S. Gendreizig, T. Gronemeyer, H. Pick, H. Vogel, K. Johnsson, A general method for the covalent labeling of fusion proteins with small molecules in vivo, Nat. Biotechnol. 2003, 21, 86-89.
4. A. Juillerat, T. Gronemeyer, A. Keppler, S. Gendreizig, H. Pick, H. Vogel, K. Johnsson, Directed evolution of O6-alkylguanine-DNA alkyltransferase for efficient labeling of fusion proteins with small molecules in vivo, Chem. Biol. 2003, 10, 313-317.
5. A. Keppler, H. Pick, C. Arrivoli, H. Vogel, K. Johnsson, Labeling of fusion proteins with synthetic fluorophores in live cells, Proc. Natl. Acad. Sci. 2004, 101, 9955-9959.
6. J. Farinas, A. Verkman, Receptor-mediated targeting of fluorescent probes in living cells, J. Biol. Chem. 1999, 274, 7603-7606.
7. B. Griffin, S. Adams, R. Tsien, Specific covalent labeling of recombinant protein molecules inside live cells, Science 1998, 281, 269-272.
8. S. Adams, R. Campbell, L. Gross, B. Martin, G. Walkup, Y. Yao, J. Llopis, R. Tsien, New biarsenical ligands and tetracysteine motifs for protein labeling in vitro and in vivo: synthesis and biological applications, J. Am. Chem. Soc. 2002, 124, 6063-6076.
9. I. Chen, M. Howarth, W. Lin, A. Ting, Site-specific labeling of cell surface proteins with biophysical probes using biotin ligase, Nat. Methods 2005, 2, 99-104.
10. B. Griffin, S. Adams, J. Jones, R. Tsien, Fluorescent labeling of recombinant proteins in living cells with FlAsH, Methods Enzymol. 2000, 327, 565-578.
11. G. Gaietta, T. Deerinck, S. Adams, J. Bouwer, O. Tour, D. Laird, G. Sosinsky, R. Tsien, M. Ellisman, Multicolor and electron microscopic imaging of connexin trafficking, Science 2002, 296, 503-507.
12. W. Ju, W. Morishita, J. Tsui, G. Gaietta, T. Deerinck, S. Adams, C. Garner, R. Tsien, M. Ellisman, R. Malenka, Activity-dependent regulation of dendritic synthesis and trafficking of AMPA receptors, Nat. Neurosci. 2004, 7, 244-253.
13. K. Marek, G. Davis, Transgenically encoded protein photoinactivation (FlAsH-FALI): acute inactivation of synaptotagmin I, Neuron 2002, 36, 805-813.
14. O. Tour, R. Meijer, D. Zacharias, S. Adams, R. Tsien, Genetically targeted chromophore-assisted light inactivation, Nat. Biotechnol. 2003, 21, 1505-1508.
15. K. Thorn, N. Naber, M. Matuska, R. Vale, R. Cooke, A novel method of affinity-purifying proteins using a bis-arsenical fluorescein, Protein Sci. 2000, 9, 213-217.
16. B. Chen, M. Mayer, L. Markillie, D. Stenoien, T. Squier, Dynamic motion of helix A in the amino-terminal domain of calmodulin is stabilized upon calcium activation, Biochemistry 2005, 44, 905-914.
17. S. Robia, N. Flohr, D. Thomas, Phospholamban pentamer quaternary conformation determined by in-gel fluorescence anisotropy, Biochemistry 2005, 44, 4302-4311.
18. C. Hoffmann, G. Gaietta, M. Bunemann, S. Adams, S. Oberdorff-Maass, B. Behr, J. Vilardaga, R. Tsien, M. Ellisman, M. Lohse, A FlAsH-based FRET approach to determine G protein-coupled receptor activation in living cells, Nat. Methods 2005, 17, 171-176.
19. K. Marks, M. Rosinov, G. Nolan, In vivo targeting of organic calcium sensors via genetically selected peptides, Chem. Biol. 2004, 11, 347-356.
20. K. Stroffekova, C. Proenza, K. Beam, The protein-labeling reagent FLASH-EDT2 binds not only to CCXXCC motifs but also non-specifically to endogenous cysteine-rich proteins, Pflugers Arch. 2001, 442, 859-866.
21. J. Nakanishi, M. Maeda, Y. Umezawa, A new protein conformation indicator based on biarsenical fluorescein with an extended benzoic acid moiety, Anal. Sci. 2004, 20, 273-278.
22. R. Tsien, Building and breeding molecules to spy on cells and tumors, FEBS Lett. 2005, 579, 927-932.
23. J. Nakanishi, T. Nakajima, M. Sato, T. Ozawa, K. Tohda, Y. Umezawa, Imaging of conformational changes of proteins with a new environment-sensitive fluorescent probe designed for site-specific labeling of recombinant proteins in live cells, Anal. Chem. 2001, 73, 2920-2928.
24. R. Ebright, Y. Ebright, Reagents and procedures for high-specificity labeling, U. S. Pat. Appl. 2004, US 0019104 A1.
25. B. Martin, B. Giepmans, S. Adams, R. Tsien, Mammalian cell-based optimization of the biarsenical-binding tetracysteine motif for improved fluorescence and affinity, Nat. Biotechnol. 2005, 23, 1308-1314.
26. Instruction Manual for Lumio In-Cell Labeling Kits, Invitrogen Life Technologies, Carlsbad, 2003.
27. V. Boyd, J. Harbell, R. O'Connor, E. McGown, 2,3-Dithioerythritol, a possible new arsenic antidote, Chem. Res. Toxicol. 1989, 2, 301-306.
28. L. Miller, J. Sable, P. Goelet, M. Sheetz, V. Cornish, Methotrexate conjugates: a molecular in vivo protein tag, Angew. Chem. Int. Ed. Engl. 2004, 43, 1672-1675.
29. E. Guignet, R. Hovius, H. Vogel, Reversible site-selective labeling of membrane proteins in live cells, Nat. Biotechnol. 2004, 22, 440-444.
30. N. Johnsson, K. Johnsson, A fusion of disciplines: chemical approaches to exploit fusion proteins for functional genomics, Chembiochem 2003, 4, 803-810.
31. I. Chen, A. Ting, Site-specific labeling of proteins with small molecules in live cells, Curr. Opin. Biotechnol. 2005, 16, 35-40.
32. L. Miller, V. Cornish, Selective chemical labeling of proteins in living cells, Curr. Opin. Chem. Biol. 2005, 9, 56-61.
33. R. Panchal, G. Ruthel, T. Kenny, G. Kallstrom, D. Lane, S. Badie, L. Li, S. Bavari, M. Aman, In vivo oligomerization and raft localization of ebola virus protein VP40 during vesicular budding, Proc. Natl. Acad. Sci. 2003, 100, 15936-15941.
34. E. Cambronne, J. Sorg, O. Schneewind, Binding of SycH chaperone to YscM1 and YscM2 activates effector yop expression in Yersinia enterocolitica, J. Bacteriol. 2004, 186, 829-841.
35. W. Huh, I. Falvo, L. Gerke, A. Carroll, R. Howson, J. Weissman, E. O'Shea, Global analysis of protein localization in budding yeast, Nature 2003, 425, 686-691.
36. M. Andresen, R. Schmitz-Salue, S. Jakobs, Short tetracysteine tags to beta-tubulin demonstrate the significance of small labels for live cell imaging, Mol. Biol. Cell 2004, 15, 5616-5622.
37. J. Vilardaga, M. Bunemann, C. Krasel, M. Castro, M. Lohse, Measurement of the millisecond activation switch of G protein-coupled receptors in living cells, Nat. Biotechnol. 2003, 21, 807-812.
38. J. Goldstein, N. Waterhouse, P. Juin, G. Evan, D. Green, The coordinate release of cytochrome c during apoptosis is rapid, complete and kinetically invariant, Nat. Cell Biol. 2000, 2, 156-162.
39. J. Goldstein, C. Muñoz-Pinedo, J.-E. Ricci, S. Adams, A. Kelekar, M. Schuler, R. Tsien, D. Green, Cytochrome c is released during apoptosis in a single step, Cell Death Differ. 2005, 12, 453-462.
40. M. Rice, K. Czymmek, E. Kmiec, The potential of nucleic acid repair in functional genomics, Nat. Biotechnol. 2001, 19, 321-326.
41. M. Rice, M. Bruner, K. Czymmek, E. Kmiec, In vitro and in vivo nucleotide exchange directed by chimeric RNA/DNA oligonucleotides in Saccharomyces cerevisiae, Mol. Microbiol. 2001, 40, 857-868.
42. Z. Ignatova, L. Gierasch, Monitoring protein stability and aggregation in vivo by real-time fluorescent labeling, Proc. Natl. Acad. Sci. 2004, 101, 523-528.
43. L. Rudner, S. Nydegger, L. Coren, K. Nagashima, M. Thali, D. Ott, Dynamic fluorescent imaging of human immunodeficiency virus type 1 gag in live cells by biarsenical labeling, J. Virol. 2005, 79, 4055-4065.
44. P. Blommel, B. Fox, Fluorescence anisotropy assay for proteolysis of specifically labeled fusion proteins, Anal. Biochem. 2005, 336, 75-86.
45. S. Weiss, Measuring conformational dynamics of biomolecules by single molecule fluorescence spectroscopy, Nat. Struct. Biol. 2000, 7, 724-729.
46. X. Tan, D. Hu, T. Squier, H. Lu, Probing nanosecond protein motions of calmodulin by single-molecule fluorescence anisotropy, Appl. Phys. Lett. 2004, 85, 2420-2422.
47. H. Park, G. Hanson, S. Duff, P. Selvin, Nanometre localization of single ReAsH molecules, J. Microsc. 2004, 216, 199-205.
48. A. Yildiz, J. Forkey, S. McKinney, T. Ha, Y. Goldman, P. Selvin, Myosin V walks hand-over-hand: single fluorophore imaging with 1.5-nm localization, Science 2003, 300, 2061-2065.
49. A. Yildiz, M. Tomishige, R. Vale, P. Selvin, Kinesin walks hand-over-hand, Science 2004, 303, 676-678.
50. G. Snyder, T. Sakamoto, J. Hammer, J. Sellers, P. Selvin, Nanometer localization of single green fluorescent proteins: evidence that myosin V walks hand-over-hand via telemark configuration, Biophys. J. 2004, 87, 1776-1783.
51. F. Wang, D. Jay, Chromophore-assisted laser inactivation (CALI): probing protein function in situ with a high degree of spatial and temporal resolution, Trends Cell Biol. 1996, 6, 442-445.
52. K. Poskanzer, K. Marek, S. Sweeney, G. Davis, Synaptotagmin I is necessary for compensatory synaptic vesicle endocytosis in vivo, Nature 2003, 426, 559-563.
53. G. Feldman, R. Bogoev, J. Shevirov, A. Sartiel, I. Margalit, Detection of tetracysteine-tagged proteins using a biarsenical fluorescein derivative through dry microplate array gel electrophoresis, Electrophoresis 2004, 25, 2447-2451.
8.2
Chemical Approaches to Exploit Fusion Proteins for Functional Studies
Anke Arnold, India Sielaff, Nils Johnsson, and Kai Johnsson
Outlook
Contemporary approaches to study protein function often rely on the
expression of the protein of interest as a fusion protein with an additional
polypeptide or tag, whose role is to aid in the purification, detection, or
functional characterization of the corresponding fusion protein. Recently,
the role of these polypeptides has been extended to mediate the labeling of
the protein of interest with chemically diverse compounds to monitor and
manipulate protein function in both living cells and in vitro. To highlight
the potential and limitations of this approach we discuss in this chapter two
methods developed in our laboratories for the specific and covalent labeling of
fusion proteins in living cells and in vitro.
8.2.1
Introduction
Proteins participate in almost all biological processes and a detailed
understanding of protein function and mechanism is therefore a prerequisite
for an understanding of biological processes on a molecular level. The function
of a protein is affected by temporal control of its expression, its localization
and posttranslational modifications, its chemical microenvironment, and
interactions with other biomolecules. Because of the enormous complexity
of this problem and the large number of different proteins that are expressed
in any given cell, cell biologists and protein chemists are struggling to invent
more efficient and generally applicable methods for studying proteins. By
far the most successful strategy to meet this challenge has been the use
of fusion proteins. Here, the protein of interest is genetically engineered to
contain an additional sequence either at its N- or C-terminus. This so-called
tag equips the resulting fusion with a unique property that can be exploited
to study certain activities of the protein. Tags of fusion proteins currently
have two main applications: their use in purification schemes and as tools
to explore the basic cellular properties of the protein. Examples of tags used
in purification schemes are the polyhistidine tag recognized by immobilized
metals, and glutathione S-transferase recognized by immobilized glutathione
[1, 2]. Most applications for the use of fusion proteins to examine biological
functions in live cells involve the use of autofluorescent proteins, the green
fluorescent protein (GFP) being the most prominent example [3, 4]. Coupled
with sensitive imaging techniques, the behavior of autofluorescent fusion
proteins can be observed in real time, thereby giving new insights into the
dynamic distribution and localization of proteins in the cell. Other prominent
fusion protein-based approaches to study protein function in live cells include
the yeast two-hybrid system and the split-ubiquitin sensor, two methods that
allow the characterization and identification of protein-protein interactions
[5, 6]. One impressive proof of the enormous utility of fusion proteins can
be found in the efforts to fuse all open reading frames of a given organism
to an appropriate tag and to exploit the properties of the tag to investigate
certain aspects of the biological function of the corresponding fusion protein
library. Examples include efforts to construct genome-wide protein interaction
maps using the two-hybrid system, to gain an inventory of all cellular protein
complexes using affinity tags, to observe the intracellular localization of all
proteins in the cell via GFP fusions, to map protein-protein interactions
among 705 integral membrane proteins of the yeast Saccharomyces cerevisiae,
or to display the entire proteome of an organism as a protein microarray
using a polyhistidine tag and nickel-coated glass slides [7-12]. As impressive
as these efforts were, the genome-wide application of fusion proteins dramatically
revealed two shortcomings of the currently available tags: their limitation to
properties that can be genetically encoded and the restriction of each fusion
tag to one particular type of functional assay. The latter point is acceptable
for the studies of individual proteins but more bothersome for genome-wide
approaches. In recent years, a new approach to exploit fusion proteins in
functional proteomics that addresses these limitations has been developed.
This approach is based on a tag-mediated labeling of fusion proteins,
either in vitro or in live cells, with synthetic molecules that transfer a unique
and specific property to the fusion protein [13].
8.2.2
The labeling of proteins with small molecules that can serve as spectroscopic
probes or cross-linkers is one of the cornerstones of protein chemistry.
However, the lack of specificity of the underlying chemistry used in traditional
protein labeling makes its application in the living cell or in complex protein
mixtures impossible. Currently, there exist two different strategies to equip
proteins with synthetic probes to monitor and manipulate protein function:
the incorporation of unnatural amino acids pioneered by the group of Schultz
[14], and the use of protein tags to mediate an exclusive labeling with synthetic
molecules, which will be the focus of this article. To be of general use, the
mechanism of labeling must be sufficiently promiscuous with respect to the
synthetic molecule so that different functionalities can be attached to the
tag but at the same time highly specific with respect to the protein tag so
that only the fusion protein is labeled with the synthetic molecule. Currently
used approaches for the labeling of fusion proteins with small molecules or
ligands that carry the desired functionality can be classified into three groups:
(a) intein-based labeling of proteins with small molecules; (b) tags that bind to
a small molecule through noncovalent interactions, and (c) tags that bind to
a small molecule through covalent bond formation. Intein-based approaches
are a powerful method for the derivatization and semisynthesis of proteins in
vitro and applications of this approach will be discussed in detail in a different
chapter of this book [15, 16]. Concerning the labeling of proteins with small
synthetic probes in live cells, an approach based on trans-splicing inteins has
been developed by the group of Tom Muir [17]. This approach is very elegant
as the intein tag removes itself in the process of the labeling; however, its
general applicability remains to be shown. A list compiling the approaches
developed so far for the noncovalent or covalent labeling of fusion proteins is
shown in Table 8.2-1. Concerning the labeling of fusion proteins via tags that
noncovalently interact with small molecules, a variety of different approaches
has been developed over the last few years. These include antibodies binding
to haptens, streptavidin binding to biotin derivatives, dihydrofolate reductase
(DHFR) binding to methotrexate (Mtx) or trimethoprim derivatives, FKBP12
mutants binding to a synthetic ligand, and short peptides binding to derivatized
α-bungarotoxin or to Texas red derivatives [18-24]. These tags have been
successfully used for the labeling of fusion proteins with fluorophores and
other probes in live cells. A good example is the study of receptor trafficking of
the α-amino-3-hydroxy-5-methyl-4-isoxazole-propionate (AMPA) receptor [24].
In this study, the AMPA receptor was expressed as a fusion protein with an
α-bungarotoxin-binding peptide and subsequently labeled with fluorescent,
radioactive, or biotinylated α-bungarotoxin derivatives. Using this approach,
the total receptor expression, surface expression, internalization, and insertion
of receptors into the plasma membrane could be visualized and quantified in
fixed or live cells. A possible limitation of tags labeled through noncovalent
interactions is the reversibility of the labeling. This feature is disadvantageous
for applications such as pulse-chase type labeling experiments, long-term
studies, and the detection of the labeled protein under denaturing conditions.
The remaining part of the chapter is therefore dedicated to discussing the
approaches for covalent labeling of fusion proteins in more detail.
The first tag allowing for a covalent labeling of fusion proteins in vitro
and in live cells was the tetracysteine tag that specifically binds to biarsenical
compounds such as FlAsH, a biarsenical fluorescein derivative [25, 33]. The two
main advantages of the tetracysteine tag are its relatively small size, which can
be as small as 6 amino acids (CCPGCC in one-letter code), and the possibility
of using different fluorophores. Potential disadvantages of the approach are the
reported unspecific binding of the biarsenical fluorophores and the need to
coincubate with dithiols such as 1,2-ethanedithiol to minimize this unspecific
binding. However, the Tsien group recently reported sequences with increased
affinity toward biarsenical compounds that should enhance the performance
of the approach in live cells [34]. The use of the tetracysteine tag is discussed
in detail by Steve Adams in another chapter of this book.
Table 8.2-1 Tags used for the selective labeling of fusion proteins with synthetic molecules

| Tag | Size[a] | Label | Required additives | Type of linkage | Applications | Comments | References |
|---|---|---|---|---|---|---|---|
| Tetracysteine tag | 6-12 | Biarsenical fluorophores | Dithiols to suppress unspecific binding | Covalent and reversible | Intracellular | Cell surface applications require reduction of disulfide bonds | [25] |
| AGT | 182 | Benzylguanine derivatives | None | Covalent and irreversible | Intracellular, cell surface | - | [26] |
| N-terminal Cys | >1[b] | Thioester derivatives | None | Covalent and irreversible | Intracellular | Limited specificity of labeling; slow reaction | [27] |
| His tag | ≥6 | NTA derivatives | None | Noncovalent and reversible | Intracellular, cell surface | - | |
| Texas red binders | 38, 42 | Texas red derivatives | None | Noncovalent and reversible | Intracellular, cell surface | | |
| α-Bungarotoxin-binding peptide | 13 | α-Bungarotoxin derivatives (74 a.a.) | None | Noncovalent and reversible | Cell surface | | |
| DHFR | 157 | Methotrexate/trimethoprim derivatives | None | Noncovalent and reversible | Intracellular | - | [21, 22] |
| FKBP12 mutant | 108 | Synthetic ligand | None | Noncovalent and reversible | Intracellular | | |
| Streptavidin | >125 | Biotin derivatives | None | Noncovalent and reversible | Intracellular | | |
| scFv | ~250 | Hapten derivatives | None | Noncovalent and reversible | Intracellular, cell surface | | |
| CP | >77 | CoA derivatives | Phosphopantetheine transferase | Covalent and irreversible | Cell surface | | |
| Biotin acceptor peptides | ≥15 | Biotin | Biotin ligase, ATP | Covalent and irreversible | Intracellular, cell surface | Allows use of derivatized streptavidins to label cell surface proteins | [31] |
| Biotin acceptor peptides | ≥15 | Keto isostere of biotin | Biotin ligase, ATP, hydrazide derivatives | Covalent and reversible | Cell surface | Two-step labeling required | [32] |

a Size is given in amino acids; AGT - O6-alkylguanine-DNA
alkyltransferase; DHFR - dihydrofolate reductase;
NTA - nitrilotriacetic acid; CP - carrier protein.
b Requires expression as N-terminal fusion with ubiquitin or
intein.
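When choosing a tag for a given experiment, the linkage and application columns of Table 8.2-1 act as simple selection criteria. The following Python sketch illustrates this with the rows of the table; the `Tag` record, the `select_tags` helper, and the boolean encoding of the columns are illustrative assumptions introduced here, not part of the original work, and the tetracysteine tag is encoded as intracellular only because the table notes that cell surface use requires reduction of disulfide bonds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One row of Table 8.2-1, reduced to linkage and application columns."""
    name: str
    label: str
    covalent: bool
    reversible: bool
    intracellular: bool
    cell_surface: bool


# Rows transcribed from Table 8.2-1 (hypothetical encoding, see lead-in).
TAGS = [
    Tag("Tetracysteine tag", "biarsenical fluorophores", True, True, True, False),
    Tag("AGT", "benzylguanine derivatives", True, False, True, True),
    Tag("N-terminal Cys", "thioester derivatives", True, False, True, False),
    Tag("His tag", "NTA derivatives", False, True, True, True),
    Tag("Texas red binders", "Texas red derivatives", False, True, True, True),
    Tag("alpha-Bungarotoxin-binding peptide", "alpha-bungarotoxin derivatives",
        False, True, False, True),
    Tag("DHFR", "methotrexate/trimethoprim derivatives", False, True, True, False),
    Tag("FKBP12 mutant", "synthetic ligand", False, True, True, False),
    Tag("Streptavidin", "biotin derivatives", False, True, True, False),
    Tag("scFv", "hapten derivatives", False, True, True, True),
    Tag("CP", "CoA derivatives", True, False, False, True),
    Tag("Biotin acceptor peptide", "biotin", True, False, True, True),
    Tag("Biotin acceptor peptide", "keto isostere of biotin", True, True, False, True),
]


def select_tags(tags, **required):
    """Return the tags whose attributes equal every keyword in `required`."""
    return [t for t in tags
            if all(getattr(t, key) == value for key, value in required.items())]


if __name__ == "__main__":
    # Example: tags for pulse-chase labeling of cell surface proteins,
    # which calls for a covalent, irreversible linkage.
    for tag in select_tags(TAGS, covalent=True, reversible=False, cell_surface=True):
        print(f"{tag.name}: {tag.label}")
```

Encoding each column as a separate field keeps the query readable; other criteria, such as tag size or required additives, could be added as further fields in the same way.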
We have developed two general approaches for the covalent labeling of
fusion proteins with chemically diverse compounds. The first approach
allows for the labeling of fusion proteins of human O6-alkylguanine-DNA
alkyltransferase (AGT) with synthetic molecules [26]. The labeling is based
on the irreversible and specific reaction of AGT with O6-benzylguanine (BG)
derivatives, leading to the irreversible transfer of a synthetic probe to a reactive
cysteine residue of AGT. The second approach is based on the expression
of the protein of interest as a fusion with acyl carrier protein (ACP) and the
specific ligation of chemically diverse compounds to this fusion protein using
a phosphopantetheine transferase (PPTase), an approach that is particularly
well suited for the labeling of cell surface proteins [30]. Using these two
technologies as representative examples, we will discuss in the following text
the potential and limitations of such a chemical approach to exploit fusion
proteins for functional studies. As the tags are only the links between the
protein of interest and the synthetic molecule, at least some of the experiments
described in the following section could have been performed with different
tags but similar synthetic molecules.
8.2.3
8.2.3.1 Labeling of AGT Fusion Proteins

AGT is a DNA repair enzyme that reverts lesions resulting from the
O6-alkylation of guanine [35]. DNA repair is achieved by irreversibly
transferring the alkyl group to a reactive cysteine of AGT. The mechanism is
unusual in that alkylated AGT is not regenerated after repair but is degraded
at some point after alkyl transfer. Taking advantage of the observation that
human AGT reacts not only with alkylated guanine incorporated in DNA but
also with the base BG, we demonstrated that BG derivatives carrying various
labels at the 4-position of the benzyl ring can be used for the specific labeling
of AGT fusion proteins, both in living cells and in vitro (Fig. 8.2-1a) [26].
Importantly, the labeling is highly specific for AGT fusion proteins as BG
derivatives do not show any appreciable reactivity toward other proteins or
simple nucleophiles. Also important for practical applications is the ease with
which BG derivatives can be synthesized, which has already resulted in a large
number of derivatives (Fig. 8.2-1b).
Fig. 8.2-1 (a) General scheme for labeling of AGT fusion proteins
using BG derivatives. (b) BG derivatives described in this work;
compounds are listed with their abbreviations and the name of the
label transferred to AGT fusion proteins. Panel (b) includes BGBT (biotin),
BGMtx (methotrexate), and BGBD (biotin and dinitrophenol).
Wild-type human AGT is a monomeric protein of 207 residues that, owing to
its affinity for DNA, is located in the nucleus of the cell [36]. To remove this
undesired feature, we engineered AGT so that its affinity for DNA was
suppressed and, at the same time, its activity against BG derivatives
was increased by a factor of about 50 [37, 38]. When expressed in mammalian
cells, these mutants show cellular distributions similar to GFP, and the
increased activity of AGT against BG derivatives also translates into a more
efficient labeling of AGT fusion proteins [39]. Another potential limitation of
the approach is interference from endogenous wild-type O6-alkylguanine-DNA
alkyltransferase (wtAGT). In contrast to AGTs from yeast or Escherichia coli,
which do not react with BG derivatives, mammalian wtAGTs can react with
BG derivatives. Although the activities of the currently used mutants are at
least 50-fold higher than that of wtAGT, the mammalian wtAGTs might still
lead to some unwanted background labeling. To exclude such undesired labeling
of endogenous wtAGT, the specific labeling of AGT fusion proteins was
initially restricted to AGT-deficient mammalian cell lines. To address
this limitation in a more general way, we have synthesized an inhibitor of
wtAGT and have generated AGT mutants that are resistant to this inhibitor.
This scheme allows wtAGT to be inactivated while labeling of the AGT mutant
is still possible [40]. The ability to specifically label AGT fusion proteins in
the presence of endogenous AGT after a brief preincubation of the cells with a
small molecule significantly broadens the scope of application of AGT fusion
proteins in living cells.
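The consequence of a 50-fold activity difference for background labeling can be estimated with a short calculation. The sketch below is only an illustration: it assumes that both the mutant and wtAGT react with the BG derivative under pseudo-first-order conditions (BG in excess) and that the two proteins differ only by the ratio of their rate constants. The function name and the chosen labeling levels are hypothetical and are not taken from the original work.

```python
def background_fraction(target_fraction: float, activity_ratio: float = 50.0) -> float:
    """Fraction of wtAGT labeled at the time the mutant reaches target_fraction.

    Assumption (for illustration only): both proteins follow
        labeled(t) = 1 - exp(-k * [BG] * t)
    with k_mutant = activity_ratio * k_wt.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    # The unlabeled mutant fraction is exp(-k_mutant * [BG] * t); the exponent
    # for wtAGT is smaller by the activity ratio.
    unlabeled_mutant = 1.0 - target_fraction
    return 1.0 - unlabeled_mutant ** (1.0 / activity_ratio)


if __name__ == "__main__":
    for target in (0.50, 0.90, 0.99):
        wt = background_fraction(target)
        print(f"mutant {target:.0%} labeled -> wtAGT {wt:.1%} labeled")
```

Under these assumptions, driving the mutant to 99% labeling would still label roughly 9% of the wtAGT molecules, which is consistent with the concern about residual background in cells that express wtAGT.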
One important application of AGT fusion proteins is fluorescence labeling.
A variety of fluorophores have been coupled to BG and, depending on their
cell permeability, these molecules can be used for the labeling
of AGT fusion proteins in live cells [39]. Up to now, the BG derivatives of the
fluorophores fluorescein, Oregon green, rhodamine green,
tetramethylrhodamine, and SNARF-1 have been used for fluorescence labeling
within live cells, whereas the conjugates of BG with fluorophores such as Cy3
and Cy5 proved to be cell impermeable and are therefore more suitable for
applications on cell surfaces and in vitro (Fig. 8.2-1(b)). The possibility to
choose the nature of the fluorophore and the time point of labeling opens up a
number of interesting applications. One such application is sequential labeling
of AGT fusion proteins to distinguish older copies from newer copies of the
same protein within one cell through multicolor analysis. To demonstrate
the feasibility of multicolor imaging of AGT fusion proteins, we analyzed the
translocation of a fusion of AGT with the temperature-sensitive glycoprotein
of vesicular stomatitis virus (tsVSVG-AGT) [39]. The tsVSVG
is a membrane protein that is transported via the secretory pathway to the
plasma membrane at permissive temperatures, whereas at the restrictive
temperature of 40 °C the protein reversibly misfolds and is retained in the
endoplasmic reticulum (ER) [41]. AGT was fused to the cytoplasmic C-terminus
of tsVSVG, and Chinese hamster ovary (CHO) cells transiently
expressing tsVSVG-AGT at the permissive temperature of 34 °C show efficient
transport of the fusion protein to the plasma membrane, as demonstrated by
labeling with fluorescein. We then tried to discriminate between tsVSVG-AGT
populations expressed before and after a temperature shift through
sequential labeling with different fluorophores. In these experiments, CHO cells
transiently expressing tsVSVG-AGT were incubated for 20 h at 34 °C and
subsequently labeled with fluorescein. The temperature of the medium was
then shifted to 40 °C for 75 min, and the tsVSVG-AGT synthesized at this
temperature was labeled with SNARF-1. Subsequent fluorescence imaging of
cells demonstrated that fluorescein-labeled tsVSVG-AGT was located
predominantly in the plasma membrane, whereas SNARF-1-labeled tsVSVG-AGT was
predominantly located in internal membrane structures, most likely
perinuclear ER or Golgi (Fig. 8.2-2). The data clearly demonstrate that within live
cells, older and newer copies of AGT fusion proteins can be discriminated by
sequential labeling with different fluorophores.
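The logic of sequential labeling can be made explicit with a small model. Because the transfer of the label to AGT is irreversible, a copy that has reacted in one pulse cannot react in a later one, so each copy carries the fluorophore of the first pulse applied after its synthesis. The sketch below assumes that every pulse labels all accessible copies quantitatively and instantaneously. The function name and the example synthesis times are hypothetical, whereas the pulse times follow the protocol described above (fluorescein after 20 h, SNARF-1 after a further 75 min).

```python
from bisect import bisect_left


def assign_labels(synthesis_times_h, pulses):
    """Return the fluorophore carried by each protein copy.

    pulses: iterable of (time_h, fluorophore) tuples.
    Assumption: each pulse is quantitative and irreversible, so a copy made
    at time t is tagged by the first pulse applied at or after t. Copies made
    after the last pulse stay unlabeled (None).
    """
    ordered = sorted(pulses)
    pulse_times = [time for time, _ in ordered]
    labels = []
    for t in synthesis_times_h:
        index = bisect_left(pulse_times, t)  # first pulse with time >= t
        labels.append(ordered[index][1] if index < len(ordered) else None)
    return labels


if __name__ == "__main__":
    pulses = [(20.0, "fluorescein"), (20.0 + 75.0 / 60.0, "SNARF-1")]
    copies = [5.0, 19.5, 20.5, 21.0, 22.0]  # hypothetical synthesis times in hours
    for t, label in zip(copies, assign_labels(copies, pulses)):
        print(f"copy synthesized at {t:5.2f} h -> {label}")
```

In this picture, the color of a copy reports the time window in which it was synthesized, which is what allows older and newer populations to be told apart in the same cell.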
Fig. 8.2-2 Multicolor analysis of tsVSVG-AGT. (a-c) Sequential labeling of
tsVSVG-AGT: Labeling with fluorescein at permissive temperature (34 °C) and
with SNARF-1 at nonpermissive temperature (40 °C). (a) Overlay of
transmission and fluorescence micrographs. (b) Fluorescence channel for
fluorescein-labeled tsVSVG-AGT (ex. 488 nm/em. 505-530 nm).
(c) Fluorescence channel for SNARF-1-labeled tsVSVG-AGT
(ex. 543 nm/em. >650 nm).
Fluorescence resonance energy transfer (FRET) measurements between
pairs of (auto)fluorescent proteins are an important tool to investigate the
spatial distance between proteins in a time-resolved manner [42]. Furthermore,
various reporter systems to monitor cellular parameters are based on
intramolecular FRET between two autofluorescent proteins fused in a single
polypeptide chain [3]. To demonstrate the use of AGT fusion proteins in FRET
applications, we constructed a fusion protein of AGT, enhanced green
fluorescent protein (EGFP), and a nuclear localization sequence
(AGT-EGFP-NLS3) [39]. Here, EGFP would be the donor (excited at 488 nm) and
SNARF-1-labeled AGT would be the acceptor for intramolecular FRET. The
broad absorbance of SNARF-1-labeled AGT in the region from 500 to 580 nm
at pH 7.4 makes it an ideal acceptor in such experiments. Labeling of
AGT-EGFP-NLS3 with 5 µM benzylguanine-SNARF (BGSF) for 1 h led to a 98%
decrease in EGFP emission at 505 to 530 nm, indicating both efficient FRET and
efficient labeling of AGT-EGFP-NLS3 with SNARF-1. As SNARF-1
can also be excited to some extent at 488 nm, we assumed that the observed
emission above 650 nm is due to both direct laser excitation and FRET.
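The reported donor quenching can be converted into an apparent FRET efficiency with the standard relations E = 1 - F_DA/F_D and E = 1/(1 + (r/R0)^6). The following sketch is only an illustration: it assumes that the fusion protein is completely labeled and that the entire loss of EGFP emission is due to FRET, and it reports the donor-acceptor distance relative to the Förster radius R0, because no R0 for this donor-acceptor pair is given here. The function names are not taken from the original work.

```python
def fret_efficiency(donor_with_acceptor: float, donor_alone: float) -> float:
    """Apparent FRET efficiency from donor emission with and without acceptor."""
    if donor_alone <= 0:
        raise ValueError("donor_alone must be positive")
    return 1.0 - donor_with_acceptor / donor_alone


def relative_distance(efficiency: float) -> float:
    """Donor-acceptor distance in units of R0, from E = 1 / (1 + (r/R0)**6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    # A 98% decrease in donor emission leaves 2% of the original signal.
    efficiency = fret_efficiency(donor_with_acceptor=2.0, donor_alone=100.0)
    print(f"apparent FRET efficiency: {efficiency:.2f}")
    print(f"distance relative to R0:  {relative_distance(efficiency):.2f}")
```

With these assumptions the apparent efficiency is 0.98, corresponding to a donor-acceptor distance of roughly half the Förster radius. Incomplete labeling would lower the apparent efficiency, so the value is best read as a combined measure of labeling and energy transfer.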
The labeling of AGT fusion proteins is not restricted to monitoring protein
function, but can also be used to manipulate it. One example of this approach is
induced protein dimerization through covalent labeling of AGT fusion proteins
[43]. The dynamic dimerization and dissociation of pairs of proteins play an
important role in various biological processes. As a chemical approach to study
processes that depend on protein dimerization, the teams of Schreiber and
Crabtree have introduced "chemical inducers of dimerization" (CIDs) [44].
CIDs are cell-permeable molecules that can bind simultaneously to two different
proteins, thereby inducing their dimerization. Various biological processes have
been controlled and studied with this approach, including signal transduction
and control of transcription in eukaryotic and prokaryotic cells. Previously used
CIDs relied on noncovalent interactions, and we extended the approach through
covalent labeling of AGT fusion proteins with ligands capable of interacting
with other proteins. As a first ligand we chose methotrexate (Mtx). Mtx is a
tight-binding inhibitor of DHFR, and heterodimers of Mtx and dexamethasone,
a ligand of the glucocorticoid receptor (GR), have been used as CIDs to control
transcription in yeast [45]. In this so-called three-hybrid system, a DNA-binding
domain and a transcriptional activation domain were expressed as DHFR and GR
fusion proteins, respectively, and transcription was initiated through the
addition of the CID. On the basis of these studies, we synthesized an
O6-benzylguanine-methotrexate (BGMtx) heterodimer as CID (Figs. 8.2-1(b)
and 8.2-3) [43]. To use BGMtx as CID in a three-hybrid system, we constructed
fusion proteins of AGT with the DNA-binding domain LexA and of DHFR with the
transcriptional activation domain B42 (Fig. 8.2-3). The in vivo labeling of the
AGT fusion protein with Mtx using BGMtx then induced the dimerization of the
AGT and DHFR fusion proteins, leading to stimulation of transcription of a
reporter gene. Pairs of plasmids encoding LexA and B42 fusion proteins were
transformed into the yeast strain L40, in which the dimerization of LexA and
B42 fusion proteins leads to transcription of the reporter genes HIS3 and lacZ.
Growing these yeast strains in the presence of BGMtx then complemented the
histidine auxotrophy of the yeast and also induced the expression of
β-galactosidase. These experiments clearly showed that BG derivatives can be
used as CIDs to control transcription in yeast and, more generally, also
demonstrated how AGT fusion proteins can be used to control protein
dimerization in vivo.
Fig. 8.2-3 AGT-based three-hybrid system.
The previously described experiments focused on labeling of AGT fusion
proteins in live cells, but a number of interesting in vitro applications are also
possible. One important application is the covalent immobilization of AGT
fusion proteins. We were able to show that by linking BG via a flexible linker
to a bioinert surface, AGT fusion proteins can be specifically and covalently
immobilized (Fig. 8.2-4) [46]. Importantly, AGT fusion proteins generally retain
their function after immobilization and, because of the specificity of the
reaction, can also be immobilized directly out of cell extracts without prior
purification of the fusion protein. These features make AGT fusion proteins
particularly well suited for the generation of protein microarrays. Protein
microarrays are regarded as one key research tool in proteomics [47]. They are
generated by arraying a repertoire of different proteins on a solid support at
high spatial density for the subsequent characterization of the immobilized
proteins. While the continuous identification of proteins with unknown function
fuels a need for high-throughput technologies for their characterization,
protein function microarrays so far have been used in relatively few
laboratories. The reasons for this are the technological hurdles associated with
their generation, in particular, the parallel expression and purification of large
numbers of proteins and their subsequent immobilization on the microarray in a
functional state. AGT fusion proteins appear as attractive candidates for
applications on protein microarrays for two main reasons. Firstly, the
specificity of the reaction between AGT and BG should allow a direct covalent
immobilization of fusion proteins from complex mixtures. This is particularly
important when large numbers of proteins must be processed in parallel.
Secondly, the ability both to immobilize and to fluorescence label AGT fusion
proteins should facilitate rapid screening for protein-protein interactions by
generating a defined array of AGT fusion proteins and subsequently probing the
microarray with a fluorescence-labeled AGT fusion protein (Fig. 8.2-5(a)).
Such a strategy requires that both the labeling of the protein and its
immobilization are practically irreversible, a requirement that is not fulfilled
by low-affinity tags such as the His tag.
Fig. 8.2-4 General scheme for immobilization of AGT fusion proteins.
Fig. 8.2-5 (a) Use of AGT-based protein microarrays to screen for
protein-protein interactions. (b) Purified AGT-FKBP and AGT-FRB (both 1 µM)
were immobilized in arrays of 8 x 8 spots each on a BG-covered glass. The slide
was then incubated with a solution containing Cy3-labeled AGT-FKBP (100 pM),
Cy5-labeled AGT-FRB (100 pM), and rapamycin (100 nM) and after washing
analyzed for fluorescence: (1) detection of Cy3, (2) detection of Cy5 on same
microarray as in (1), (3) overlay of (1) and (2). (c) Same experiments as in (b)
but using cell lysates of E. coli BL21 (DE3) expressing either AGT-FKBP or
AGT-FRB for spotting.
The generation of AGT-based protein microarrays requires the display
of BG on otherwise bioinert glass slides. We have previously shown that
surfaces covered either with carboxymethylated dextran or polymer brushes
of poly(oligo(ethylene glycol) methacrylate) (POEGMA) and displaying BG are
sufficiently bioinert for the selective immobilization of AGT fusion proteins
[46, 48]. Building on these results, we used glass slides covered either
with carboxylated hydrogel or POEGMA. To demonstrate the use of AGT-based
protein microarrays for the analysis of protein-protein interactions, we
first analyzed the rapamycin-dependent heterodimerization of FK506-binding
protein (FKBP) and the binding domain of FKBP rapamycin-associated protein
(FRB). FKBP and FRB were expressed in E. coli with AGT fused to their
N-termini, yielding AGT-FKBP and AGT-FRB. Each of these proteins had a His
tag fused to the N-terminus of AGT for purification. In the initial experiments,
AGT-FKBP and AGT-FRB were expressed and purified via their His tag
and the purified proteins were arrayed on glass slides displaying BG. In
separate experiments, AGT-FKBP was labeled with Cy3 and AGT-FRB was
labeled with Cy5, using appropriate BG derivatives. The protein microarray
displaying AGT-FKBP and AGT-FRB was then simultaneously incubated with
fluorescence-labeled AGT-FKBP and AGT-FRB in the presence of rapamycin.
Analysis of the microarray clearly showed that Cy3-labeled FKBP interacts only
with immobilized FRB and Cy5-labeled FRB interacts only with immobilized
FKBP, demonstrating that both AGT fusion proteins remain functional after
immobilization or labeling (Fig. 8.2-5(b)). As mentioned above, the specificity
of the reaction of AGT fusion proteins with BG derivatives should allow for
a direct arraying of AGT fusion proteins from cell extracts. We therefore
repeated the above experiments by spotting extracts of E. coli expressing
either AGT-FKBP or AGT-FRB. The resulting microarrays were incubated
with Cy3-labeled FKBP and Cy5-labeled FRB in the presence of rapamycin,
allowing for the recapitulation of the specific interaction between FKBP and
FRB and demonstrating that a purification of the fusion protein prior to
immobilization was not necessary (Fig. 8.2-5(c)). The possibility of using
sets of AGT fusion proteins to rapidly screen for mutual protein-protein
interactions is certainly one of the major applications we envision for
AGT-based protein microarrays. Furthermore, we were able to show that small
molecule-protein interactions as well as posttranslational modifications can
be detected on AGT-based protein microarrays. As already mentioned, the
possibility to label AGT fusion proteins with fluorophores is also an important
feature for functional studies. To facilitate the purification of
fluorescence-labeled AGT fusion proteins, we developed a synthesis of BG
derivatives that allows for the labeling of AGT fusion proteins with
bifunctional synthetic probes that combine a fluorophore with an affinity label
(Fig. 8.2-1(b)) [49]. The affinity label allows for the isolation of
fluorescence-labeled AGT fusion proteins, and the bifunctional substrates could
become useful tools for various applications in functional proteomics.
Together, these features should make AGT-based protein microarrays a powerful
tool for functional proteomics.
8.2.3.2 Labeling of CP-fusion Proteins as a Tool to Study Cell Surface Proteins

The cell surface plays a key role in a variety of complex biological processes
ranging from signal transduction to cell-cell and host-pathogen interactions.
Proteins that act as receptors, channels, transporters, or enzymes that build
and remodel the extracellular matrix play the most prominent role in these
activities. The detailed in vivo characterization of these proteins is therefore
an important prerequisite for understanding the biology of the cell surface in
molecular terms. As the surfaces of cultured cells are freely accessible to
chemical treatment, the labeling of their proteins with synthetic molecules
appears to be an attractive strategy to equip them with probes that allow
for their functional characterization [50]. The tetracysteine tag and AGT are
two examples of tags that were designed primarily for the covalent modification
of intracellular proteins. Consequently, these protein tags are not necessarily
suitable for applications in the oxidizing environment of the cell surface. For
example, the application of the tetracysteine tag on cell surfaces requires the
reduction of the otherwise oxidized and unreactive cysteines of the tag using
membrane-impermeable reductants such as 2-mercaptoethanesulfonate and
tris(carboxyethyl)phosphine [51]. Since this treatment will also reduce the
disulfide bridges of most cell surface proteins, it will automatically perturb
many of their activities. The labeling of AGT fusion proteins, on the other
hand, relies on the alkylation of the reactive cysteine of AGT. While we
have previously shown that AGT mutants with increased stability toward
oxidizing conditions can be displayed in an active form on cell surfaces or viral
particles, the requirement for a reactive cysteine nevertheless makes AGT
fusion proteins to some extent sensitive to the oxidative environment of cell
surfaces. The noncovalent labeling of cell surface proteins can alternatively
be achieved by expressing them with an oligohistidine tag and incubating
the corresponding cells with probes comprising a chromophore together
with a metal-ion-chelating nitrilotriacetate (NTA) moiety [28]. This moiety
binds reversibly to the oligohistidine sequences that are displayed by the
fusion proteins. The feasibility of the approach has been demonstrated by
binding NTA-chromophore conjugates to oligohistidine fusion proteins of a
ligand-gated ion channel and a G protein-coupled receptor (GPCR). Possible
drawbacks of the approach are the modest stability of the complex and
unspecific binding of the NTA derivative to other proteins. As already
mentioned, an alternative strategy is based on the expression of a cell surface
protein as a fusion protein with an α-bungarotoxin-binding peptide and the
incubation of cells expressing this protein with covalently derivatized
α-bungarotoxin [24]. This labeling is of higher specificity than the His
tag-based labeling, but also suffers from the fact that it is noncovalent and
hence reversible.
We have recently developed a novel labeling strategy for cell surface proteins,
which promises to overcome some of the limitations of these approaches [30].
Here, the protein of interest is fused to a carrier protein (CP) and the
corresponding fusion protein is then specifically labeled with CoA derivatives
through a posttranslational modification catalyzed by a PPTase.
CPs are integral components of various primary and secondary metabolic
pathways, including fatty acid synthesis (FAS), nonribosomal peptide synthesis
(NRPS), polyketide synthesis (PKS), and lysine biosynthesis. All CPs harbor
a phosphopantetheine (Ppant) as a covalently attached prosthetic group
(Fig. 8.2-6(a)) [52]. The Ppant serves as the attachment site for the building
blocks and intermediates of different pathways. The different substrates are
coupled as acyl thioesters to the free SH group of Ppant. Depending on
the structure of the bound substrate, CPs are named acyl carrier proteins
(ACPs), peptidyl carrier proteins (PCPs), or aryl carrier proteins (ArCPs). The
covalent attachment of Ppant to the CP is catalyzed by a group of enzymes
named phosphopantetheine transferases (PPTases) [52]. PPTases use CoA as the
source for Ppant and attach it as a phosphodiester to an invariant serine
residue of the CP (Fig. 8.2-6(a)). Representative examples of PPTases are the
PPTase acyl-carrier protein synthase (AcpS) from E. coli, which modifies ACPs,
and the PPTase Sfp from Bacillus subtilis, which accepts PCPs from NRPS as
substrates but also ACPs of FAS and PKS [52]. The overlapping substrate
specificity of Sfp stands in contrast to that of AcpS, which transfers the Ppant
only to ACPs, but not to the PCPs of the enterobactin synthetase EntF from
E. coli or other PCPs.
Structural and biochemical studies have revealed that the
β-mercaptoethylamine group of CoA does not participate in the recognition of
CoA by PPTases and that thiol-modified CoA derivatives can be employed for the
labeling of CPs [53]. This lack of sensitivity with respect to the modification
of the β-mercaptoethylamine of CoA has been exploited to achieve specific
labeling of CP-fusion proteins on the surface of eukaryotic cells (Fig. 8.2-6(b))
[30, 50]. In initial applications, we chose the ACP/PPTase pair from E. coli.
ACP from E. coli is a small protein of only 77 residues that folds into a
compact structure composed of four α-helices, a fold shared by other CPs.
The Ppant derivative is attached to Ser36 of ACP. The protein contains no
cysteines, thus avoiding a potential misfolding of secreted ACP fusion proteins
due to unwanted oxidations. When tested in vitro, ACP from E. coli is readily
modified by CoA derivatives and the rate of reaction does not show a significant
dependence on the nature of the label. At concentrations of 0.2 µM AcpS, 1 µM
ACP, and 5 µM of the CoA derivative, a typical labeling experiment is complete
within 10 min and the reaction is nearly quantitative.
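The concentrations quoted above imply that the PPTase acts catalytically, since there is less enzyme than carrier protein. A short calculation makes the requirement explicit. The sketch below only uses ratios of the stated concentrations, so any consistent concentration unit can be used. It assumes that labeling is complete within the stated time, and the function name and the returned keys are hypothetical.

```python
def labeling_budget(enzyme_conc: float, carrier_conc: float,
                    coa_conc: float, minutes: float) -> dict:
    """Summarize what complete labeling demands of the enzyme and substrate.

    All concentrations must share one unit. Assumption: every carrier protein
    molecule is labeled within `minutes`.
    """
    if min(enzyme_conc, carrier_conc, coa_conc, minutes) <= 0:
        raise ValueError("all arguments must be positive")
    turnovers = carrier_conc / enzyme_conc  # labeling events per enzyme molecule
    return {
        "turnovers_per_enzyme": turnovers,
        "min_average_turnover_per_min": turnovers / minutes,
        "coa_excess_over_carrier": coa_conc / carrier_conc,
    }


if __name__ == "__main__":
    # Values as quoted in the text: AcpS 0.2, ACP 1, CoA derivative 5, 10 min.
    for key, value in labeling_budget(0.2, 1.0, 5.0, 10.0).items():
        print(f"{key}: {value:.2f}")
```

For the quoted conditions each AcpS molecule must turn over about five times, that is, at an average rate of at least 0.5 per minute, while the CoA derivative is present in fivefold excess over ACP. This is a lower bound derived from the stated endpoint, not a measured rate constant.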
The ACP-Sag1p fusion protein serves as a representative example for the
modification of a protein on the surface of the yeast S. cerevisiae. Sag1p is
the α-agglutinin of yeast cells and is covalently attached to the β-1,6-glucan
of the cell wall via its modified glycosylphosphatidylinositol anchor [54].
For the construction of the fusion protein we replaced the natural signal
sequence of Sag1p with the signal sequence of the α-factor followed by the
coding sequence of the bacterial ACP. The combined addition of CoA-Cy3
and AcpS resulted in the specific labeling of yeast cells expressing
ACP-Sag1p (Fig. 8.2-7(a)). The observed specificity and efficiency of labeling
can be rationalized by two properties of the system. First, the cell surface
separates the cell-impermeable CoA derivatives and the appropriate PPTase from
host PPTase, host ACPs, and underivatized CoA, thereby suppressing unwanted
side reactions such as the labeling of internal CPs. Second, bacterial ACPs
are not efficient substrates of eukaryotic PPTases. This feature minimizes
unwanted phosphopantetheinylation of the fusion protein before it escapes
from the cytosol into the secretory pathway. In addition to ACP-Sag1p, we
have previously shown that ACP C-terminally attached to the α-agglutinin
receptor Aga2p (Aga2p-ACP) can be effectively labeled on the surface of
yeast. Together, these experiments demonstrate the flexibility of the ACP tag
with respect to different orientations in the fusion protein. As the N- and
C-termini of ACP reside on the same side of the protein and are proximal
to each other, it should be possible to insert ACP into the loops of cell
surface proteins without dramatically perturbing the structures of the host
and the guest protein. ACP fusion proteins can also be specifically labeled
on the surfaces of mammalian cells. For example, ACP was attached to the
exoplasmic N-terminus of the human GPCR neurokinin 1 (NK1) [30]. GPCRs
represent an important class of therapeutic targets and the specific labeling
of these proteins with spectroscopic probes on live cells makes the technique
an interesting starting point for the development of functional cell-based
assays [55]. As observed for yeast, HEK293 cells transiently expressing
ACP-NK1 could be labeled with different fluorophores or affinity labels, whereas
nontransfected cells were not labeled to any significant extent (Fig. 8.2-7(b)).
Furthermore, PCP fusion proteins can be labeled specifically in vitro and on the
surface of bacteriophage M13, further extending the number of hosts shown
to be able to display active CP-fusion proteins [56, 57].
Fig. 8.2-6 CP-based labeling of fusion proteins.
(a) Phosphopantetheinylation of CPs, (b) Labeling of CP (cell
surface) fusion proteins, (c) CoA derivatives described in this
article.
Fig. 8.2-7 Fluorescence labeling of ACP fusion proteins on cell surfaces.
(a) Fluorescence micrographs of yeast cells expressing ACP-Sag1p. Cells shown
in (a) were labeled with CoA-Cy3. (b,c) Labeling of HEK293 cells transiently
coexpressing ACP-NK1 and enhanced green fluorescent protein fused to a nuclear
localization sequence (EGFP-NLS3). The nuclear green fluorescence identifies
the transfected cells. Confocal micrographs show overlays of fluorescence and
transmission channels. (b) Labeling with Cy3 using CoA-Cy3, (c) Labeling with
Cy5 using CoA-Cy5.
In addition to tolerating a wide range of labels, CP fusions of cell surface
proteins can be used for studying the dynamics of their distribution on and
in the cell. Specifically, the membrane impermeability of PPTases and CoA
derivatives limits the labeling to proteins that are already displayed on the
cell surface during the incubation and leaves those proteins unlabeled that
are either still in the secretory pathway or already internalized. This feature
allows monitoring of the subsequent movement of the fusion protein from
the plasma membrane to other cellular locations. Furthermore, the controlled
addition of enzyme and substrate and their rapid removal permits a precise
timing of the labeling. Thus, labeling reactions with different fluorophores at
different times could be used to discriminate between different generations of
CP-fusion proteins in individual cells. As a proof of principle, we have recently
performed pulse-chase experiments in which three different generations of
ACP-Sag1p were labeled with different fluorophores on yeast cells, allowing for
a stunning visualization of localized cell wall growth. Of course, speed and high
efficiency of the labeling are important prerequisites for these applications.
Our previous measurements have indicated that the kinetics of the labeling
of ACP fusion proteins on cell surfaces are comparable to those measured
for the purified ACP. Consequently, labeling can be quantitative within a
period of about 10 min, provided that sufficiently high substrate and PPTase
concentrations are used.
Another general approach for the labeling of fusion proteins, which
conceptually resembles the labeling of CP-fusion proteins is the biotinylation
of so-called acceptor peptides by biotin ligase [31, 58]. The biotinylation of
fusion proteins by itself is a valuable modification, as numerous streptavidin-
and avidin-based probes and materials are commercially available. However,
streptavidin is a tetramer of 53 kDa, which can be problematic for a number of
applications. The versatility of the approach would therefore be significantly
broadened if the synthetic probe could be directly attached to the biotin.
Recently, Ting and coworkers have demonstrated that biotin ligase BirA from
E. coli also accepts a ketone isostere of biotin as a substrate for the labeling
of a protein fused to an acceptor peptide [32]. The introduction of a keto
functionality
into fusion proteins allows their subsequent labeling with hydrazides linked to
biophysical probes. This two-step labeling approach is particularly attractive for
the labeling of fusion proteins of the cell surface or for the labeling of proteins
in vitro. Its main advantages are the short size of the tag (15 amino acids) and
the ease with which hydrazide derivatives can be synthesized. However, the
formation of the hydrazone has two problematic features. Firstly, hydrazone
formation is a slow process at pH 7 and, secondly, its formation is reversible.
8.2.4
Conclusions and Future Developments
The labeling of AGT and CP-fusion proteins discussed here demonstrates the
two main advantages of a tag-mediated labeling of fusion proteins. Firstly,
proteins can be equipped with functionalities that cannot be genetically
encoded. This can be achieved in live cells or in vitro and possible functionalities
range from synthetic fluorophores to ligands that mediate the interaction with
other proteins. Secondly, a single fusion protein can be used for a variety of
different applications. This second point applies, in particular, to AGT fusion
proteins that can be used for pulse-chase experiments in live cells or for
the generation of protein microarrays. Together, these properties make such
fusion proteins powerful tools for functional proteomics and we are convinced
that we will see many applications of these and other related technologies in
the near future.
What kind of further technological developments can be expected in this
area of research? An obvious extension of the previous work is the specific
labeling of fusion proteins in multicellular organisms such as Drosophila
melanogaster, Caenorhabditis elegans, or mice. Another important development
would be the specific and simultaneous labeling of multiple fusion proteins
with different (fluorescent) probes to monitor multiple parameters and proteins
simultaneously in one cell. Such multicolor imaging could be achieved either
by using different labeling approaches, such as AGT and the tetracysteine tag,
or by generating mutants of one tag with so-called orthogonal substrate
specificities. As previous experiments have shown, AGT appears to be an
ideal candidate for the latter strategy. Furthermore, the active transport of
membrane-impermeable compounds for labeling experiments in live cells
would significantly extend the general applicability of the approach. Here, the
recently described arginine transporters are attractive candidates to achieve
this goal [59, 60]. Research in these directions is currently pursued in a
number of laboratories.
Acknowledgments
I.S. was supported by a stipend of the Fonds der Chemischen Industrie. We
thank the Swiss National Science Foundation, the EPFL, and the Human
Frontier Science Program for generous support.
References
1. J.A. Bornhorst, J.J. Falke, Methods Enzymol. 2000, 326, 245-254.
2. D.B. Smith, Methods Enzymol. 2000, 326, 254-270.
3. J. Zhang, R.E. Campbell, A.Y. Ting, R.Y. Tsien, Nat. Rev. Mol. Cell Biol.
2002, 3, 906-918.
4. J. Lippincott-Schwartz, G.H. Patterson, Science 2003, 300, 87-91.
5. S. Fields, O. Song, Nature 1989, 340, 245-246.
6. N. Johnsson, A. Varshavsky, Proc. Natl. Acad. Sci. U.S.A. 1994, 91,
10340-10344.
7. M.R. Martzen, S.M. McCraith, S.L. Spinelli, F.M. Torres, S. Fields,
E.J. Grayhack, E.M. Phizicky, Science 1999, 286, 1153-1155.
8. H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone,
N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, T. Mitchell, P. Miller,
R.A. Dean, M. Gerstein, M. Snyder, Science 2001, 293, 2101-2105.
9. A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer,
J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert,
M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak,
D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein,
M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes,
M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer,
G. Superti-Furga, Nature 2002, 415, 141-147.
10. P. Uetz, L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight,
D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li,
B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang,
M. Johnston, S. Fields, J.M. Rothberg, Nature 2000, 403, 623-627.
11. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki,
Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 4569-4574.
12. J.P. Miller, R.S. Lo, A. Ben-Hur, C. Desmarais, I. Stagljar,
W. Stafford Noble, S. Fields, Proc. Natl. Acad. Sci. U.S.A. 2005, 102,
12123-12128.
13. N. Johnsson, K. Johnsson, ChemBioChem 2003, 4, 803-810.
14. L. Wang, P.G. Schultz, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
15. C.J. Noren, J. Wang, F.B. Perler, Angew. Chem., Int. Ed. Engl. 2000, 39,
450-466.
16. T.W. Muir, Annu. Rev. Biochem. 2003, 72, 249-289.
17. I. Giriat, T.W. Muir, J. Am. Chem. Soc. 2003, 125, 7180-7181.
18. J. Farinas, A.S. Verkman, J. Biol. Chem. 1999, 274, 7603-7606.
19. M.M. Wu, J. Llopis, S. Adams, J.M. McCaffery, M.S. Kulomaa,
T.E. Machen, H.P. Moore, R.Y. Tsien, Chem. Biol. 2000, 7, 197-209.
478
I 20. D.I. Israel, R.J. Kaufman, Proc. Natl. A. Terskikh, K. Johnsson,
Acad. Sci. U.S.A. 1993, 90,4290-4294. Chembiochem 2005,6,1263-1269.
21. L.W. Miller, J. Sable, P. Goelet, 39. A. Keppler, H. Pick, C. Arrivoli,
M.P. Sheetz, V.W. Cornish, Angew. H. Vogel, K. Johnsson, Proc. Natl.
Chem., Int. Ed. Engl. 2004, 43, Acad. Sci. U.S.A. 2004, 101,
1672-1675. 9955-9959.
22. L.W. Miller, Y. Cai, M.P. Sheetz, 40. A. Juillerat, C. Heinis, I. Sielaff,
V.W. Cornish, Nut. Methods 2005, 2, J. Barnikow, H. Jaccard, B. Kunz,
255-257. A. Terskikh, K. Johnsson,
23. K.M.Marks, P.D. Braun,G.P. Nolan, ChemBioChem 2005, 6,1263-1269.
Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 41. J.E. Bergmann, S.J. Singer, J . Cell Biol.
9982-9987. 1983, 97,1777-1787.
24. Y. Sekine-Aizawa, R.L. Huganir, Proc. 42. Y. Chen, J.D. Mills, A. Periasamy,
Natl. Acad. Sci. U.S.A. 2004, 101, Diferentiation 2003, 71, 528-541.
17114-171 19. 43. S. Gendreizig, M. Kindermann,
25. B.A. Griffin, S.R. Adams, R.Y. Tsien, K. Johnsson,]. Am. Chem. SOC.2003,
Science 1998, 281, 269-272. 125,14970-14971.
26. A. Keppler, S. Gendreizig, 44. D.M. Spencer, T.J. Wandless,
T. Gronemeyer, H. Pick, H. Vogel, S.L. Schreiber, G.R. Crabtree, Science
K. Johnsson, Nut. Biotechnol. 2003, 21, 1993,262,1019-1024.
86-89. 45. H.N. Lin, W.M. Abida, R.T. Sauer,
27. D.S. Yeo, R. Srinivasan,
V.W. Cornish, J. Am. Chem. Soc. 2000,
M. Uttamchandani, G.Y. Chen, 122,4247-4248.
Q. Zhu, S.Q. Yao, Chem. Commun. 46. M. Kindermann, N. George,
(Camb.) 2003, 23,2870-2871. N. Johnsson, K. Johnsson,J. A m .
28. E.G. Guignet, R. Hovius, H. Vogel,
Chem. SOC.2003, 125,7810-7811.
Nut. Biotechnol. 2004, 22,440-444.
47. J. LaBaer, N. Ramachandran, Curr.
29. K.M. Marks, M. Rosinov, G.P. Nolan,
Opin. Chem. Biol. 2005, 9, 14-19.
Chem. Biol. 2004, 1I, 347-356.
48. S. Tugulu, A. Arnold, 1. Sielaff,
30. N. George, H. Pick, H. Vogel,
K. Johnsson, H.A. Klok,
N.Johnsson, K. Johnsson, J. Am.
Chem. SOC.2004, 126,8896-8897. Biomacromolecules 2005, 6,
31. J.E. Cronan Jr,J. Biol. Chem. 1990,
1602-1607.
265,10327-10333. 49. M. Kindermann, I . Sielaff,
32. I. Chen, M. Howarth, W. Lin,
K. Johnsson, Bioorg. Med. Chem. Lett.
A.Y. Ting, Nut. Methods 2005, 2, 2004, 14,2725-2728.
99-104. 50. N.Johnsson, N. George, K. Johnsson,
33. G. Gaietta, T.J. Deerinck, S.R. Adams, ChemBioChem 2005, 6,47-52.
J. Bouwer, 0. Tour, D.W. Laird, 51. S.R. Adams, R.E. Campbell,
G.E. Sosinsky, R.Y. Tsien, L.A. Gross, B.R. Martin, G.K. Walkup,
M.H. Ellisman, Science 2002, 296, Y. Yao, J. Llopis, R.Y. Tsien, J. Am.
503-507. Chem. SOC.2002, 124,6063-6076.
34. R.Y. Tsien, FEBS Lett. 2005, 579, 52. R.H. Lambalot, A.M. Gehring,
927-932. R.S. Flugel, P. Zuber, M. LaCelle,
35. A.E. Pegg, Mutat. Res. 2000, 462, M.A. Marahiel, R. Reid, C. Khosla,
83-100. C.T. Walsh, Chem. Biol. 1996, 3,
36. A. Lim, B.F. Li, E M B O J . 1996, 15, 923-936.
4050-4060. 53. J.J.La Clair, T.L. Foley, T.R. Schegg,
37. A. Juillerat, T. Gronemeyer, C.M. Regan, M.D. Burkart, Chem.
A. Keppler, S. Gendreizig, H. Pick, Biol. 2004, 11, 195-201.
H. Vogel, K. Johnsson, Chem. Biol. 54. P.N. Lipke, J. Kurjan, Microbiol. Rev.
2003, 10, 313-317. 1992,56,180-194.
38. A. Juillerat, C. Heinis, I. Sielaff, 55. J . Drews, Science 2000, 287,
1. Barnikow, H. Jaccard, B. Kunz, 1960-1964.
References I 4 7 9
56. J. Yin, F. Liu, X. Li, C.T. Walsh,]. Am. 59. D. Derossi, G. Chassaing,
Chem. SOC.2004, 126,7754-7755. A. Prochiantz, Trends Cell Biol. 1998,
57. J. Yin, F. Liu, M. Schinke, C. Daly, 8, 84-87.
C.T. Walsh, J. Am. Chem. SOC.2004, 60. J.B. Rothbard, E. Kreider, C.L.
126, 13570-13571. VanDeusen, L. Wright, B.L. Wylie,
58. D. Beckett, E. Kovaleva, P.J. Schatz, P.A. Wender,]. Med. Chem. 2002, 45,
Protein Sci. 1999, 8, 921-929. 3612-3618.
PART IV
Expanding the Scope of Chemical Synthesis
Chemical Biology. From Small Molecules to Systems Biology and Drug Design.
ISBN: 978-3-527-31150-7
9
Diversity-oriented Synthesis
9.1
Diversity-oriented Synthesis
Derek S. Tan
Outlook
Diversity-oriented synthesis (DOS) involves the synthesis of combinatorial
libraries of diverse small molecules for biological screening. Rather than being
directed toward a single biological target, DOS libraries can be used to identify
novel ligands for a variety of targets. These ligands can then be used as powerful
probes to investigate biological processes. This chapter discusses the origins of
DOS, key enabling technologies, and library design strategies. Several recent
examples of novel chemical probes identified from DOS libraries are also
described.
9.1.1
Introduction
Small molecules are extremely powerful tools for studying biological systems
[1]. They allow rapid and conditional modulation of biological functions,
often in a reversible, dose-dependent manner. Moreover, they can modulate
individual functions of multifunctional targets and distinguish between
different posttranslational modification and conformational states of proteins.
These features make the chemical genetic, or pharmacological, approach
a valuable complement to genetic and RNA interference-based methods,
particularly for dissecting complex, dynamic biological processes. Small
molecules can also be used to illuminate new potential therapeutic targets
and provide a very direct means of validating these targets in model systems.
However, the identification of new, highly specific small molecule probes
remains a major challenge in chemical biology. Structure- or mechanism-
based rational design is sometimes feasible when a single protein target and
a natural ligand are known. Conversely, high-throughput screening (HTS)
of small molecule libraries provides a practical and effective solution for
individual targets that may be less well characterized and for systems that
involve multiple targets. Diversity-oriented synthesis (DOS) has emerged as a
valuable approach to generate combinatorial libraries for use in these screens,
particularly novel libraries that explore untapped or underrepresented regions
of biologically relevant chemical structure space [2]. Efforts in DOS have led
to the discovery of powerful new biological probes and have also spurred
continuing advances in synthetic organic chemistry.
9.1.2
Early efforts in DOS were reported in the 1990s. However, several key synthetic
technologies that were developed earlier laid the foundations for DOS.
Foremost among these are (1) solid-phase synthesis and related separation
techniques and (2) combinatorial synthesis.
9.1.2.1 Solid-phase Synthesis and Related Separation Techniques
9.1.2.1.1 Solid-phase Synthesis

Solid-phase peptide synthesis was developed by Merrifield in the early 1960s
[3]. Subsequently, solid-phase strategies were developed for the synthesis
of oligonucleotides, as well as for nonbiopolymer small molecules, such
as synthetic drugs and natural products. In a solid-phase synthesis, the
starting material (e.g., first residue or molecular scaffold) is attached to an
insoluble solid support, such as a polymer or glass bead, via a chemical
“linker” that can be cleaved under specific, orthogonal reaction conditions
[4] (Fig. 9.1-1(a)). The support-bound substrate is then exposed to solutions
of reagents to effect building block coupling reactions or other chemical
transformations. The support-bound product is then separated from the
excess reagents and reaction by-products simply by rinsing the solid supports
with appropriate solvents. This allows stoichiometric excesses and multiple
couplings to be used to drive these reactions to completion. This process is
repeated at each step in the synthesis. Finally, the product is cleaved from
the solid supports and purified as necessary, using standard techniques such
as column chromatography. Thus, solid-phase synthesis provides a rapid,
convenient, and often automatable means to isolate synthetic intermediates
that circumvents the need for tedious purifications at each step of a multistep
synthesis.
9.1.2.1.2 Precipitation Tags and Fluorous Tags

A number of other related strategies have been developed more recently to
facilitate the recovery and handling of synthetic intermediates [5]. The hetero-
geneous nature of solid-phase reactions sometimes causes problems related
to reaction kinetics, due to the slow diffusion of reagents into the polymer
matrix. Monitoring reaction progress also presents a new challenge, generally
requiring cleavage of an aliquot of the solid support, which introduces an
additional chemical reaction that may alter or degrade the reaction product, or
recourse to “on-bead” spectroscopic analysis, which often suffers from poor
sensitivity and resolution. Thus, in one alternative technique, the starting material
is linked to a “precipitation tag” or “phase switch” that is soluble under
most reaction conditions (Fig. 9.1-1(b)). Non-cross-linked polymers such as
poly(ethylene glycol) and polystyrene have been used for this purpose, as well
as individual functional groups. This allows standard homogeneous reaction
conditions and analytical techniques to be used during the synthesis. The
reaction product is then precipitated from the reaction mixture by adding a
solvent or reagent that induces phase switching of the tag. The precipitated
product is rinsed with the appropriate solvents to remove excess reagents and
reaction by-products. Ideally, the product can then be resolubilized to allow
subsequent reactions in the synthetic sequence.
Along similar lines, polyfluorocarbon chains have been used as “fluorous
tags” that are soluble in both organic and fluorocarbon solvents [59]
(Fig. 9.1-1(c)). This again allows homogeneous reaction conditions to be employed.
The tagged reaction products are easily separated from the reagents and
reaction by-products by extraction of the reaction mixture with an immiscible
fluorocarbon solvent or by passage over a column of fluorinated silica gel. The
recovered products can then be subjected to subsequent downstream reactions.
9.1.2.1.3 Solid-supported Reagents and Scavengers

As a variation of this theme, reagents, instead of reaction substrates, can also
be attached to solid supports or otherwise tagged to facilitate their separation
from reaction mixtures [6]. This approach is particularly useful for reagents
or reaction by-products that are difficult to remove by extraction or silica gel
chromatography. Carbodiimide coupling reagents and their urea by-products
are excellent examples (Fig. 9.1-1(d)).
Furthermore, solid supports carrying reactive functional groups can be used
as “scavengers” to trap excess reagents and reaction by-products [7]. This
allows completely homogeneous reaction conditions to be used, followed by
direct addition of or passage over a column of the appropriate scavenger-
bearing supports. Thus, for example, excess isocyanates are readily trapped
with solid supports having primary amine functionalities (Fig. 9.1-1(e)).
Fig. 9.1-1 Separation techniques used in diversity-oriented synthesis.
(a) Solid-phase synthesis allows reaction products to be separated easily from
excess reagents and reaction by-products by rinsing the solid supports (yellow)
with appropriate solvents. At the end of the synthesis, the products are
usually cleaved from the solid support for screening. (b) Precipitation tags or
phase switches (red) allow homogeneous reaction conditions, but can then be
precipitated (blue) from the crude reaction mixture by addition of the
appropriate solvent or reagent. The reaction products are again easily
separated from excess reagents and reaction by-products. Ideally, the
precipitation tag can then be resolubilized for subsequent reactions.
(c) Fluorous tags (purple) are soluble in organic solvents, again allowing
homogeneous reaction conditions to be used. The reaction products can then be
separated from excess reagents and reaction by-products by extraction with an
immiscible fluorocarbon solvent, or by passage over a column of fluorinated
silica gel (not shown). (d) Solid-supported reagents, such as the carbodiimide
shown, are used with substrates in solution to facilitate removal of reaction
by-products that may be difficult to separate using traditional extraction or
chromatographic techniques. (e) Solid-supported scavengers, such as the amine
shown, are used to remove excess reagents or reaction by-products from
solution phase reactions.
9.1.2.2 Combinatorial Synthesis
9.1.2.2.1 The Power of Combinatorialization

Combinatorial synthesis actually traces its origins to biological processes.
For example, the genetic recombination processes at the heart of the
immune response involve mixing and matching of various gene seg-
ments to produce libraries of antibodies and cell surface receptors.
Similarly, combinatorial chemistry involves systematic mixing and match-
ing of various chemical building blocks and transformations to gen-
erate libraries of small molecules. Importantly, solid-phase synthesis
allows convenient handling and distribution of synthetic intermediates
to facilitate this combinatorialization process. This feature was initially
leveraged to generate combinatorial peptide libraries. Subsequently, as
with solid-phase synthesis, combinatorial chemistry has been extended
to the synthesis of oligonucleotide and nonbiopolymer small molecule
libraries.
The power of combinatorialization in systematically generating large num-
bers of compounds is readily demonstrated by the following example:
Consider that there are 20 naturally occurring amino acids in humans.
This seems a rather small number in comparison to the tremendous
diversity of protein structures and functions that result by simply cou-
pling these 20 monomers. However, when one considers the exponen-
tially increasing number of combinatorial possibilities in the progressively
longer polypeptide chains below, the source of this diversity becomes
clear. (For the sake of simplicity, additional possibilities arising from
the disulfide formation between cysteine residues are omitted from this
analysis.)
Amino acid               20                          = 20^1   = 20 monomers
Dipeptide                20 x 20                     = 20^2   = 400 combinations
Tripeptide               20 x 20 x 20                = 20^3   = 8000 combinations
Tetrapeptide             20 x 20 x 20 x 20           = 20^4   = 160 000 combinations
Decapeptide (10-mer)     20 x 20 x 20 x 20 ...       = 20^10  = 1.02 x 10^13 combinations
Icosapeptide (20-mer)    20 x 20 x 20 x 20 ......    = 20^20  = 1.05 x 10^26 combinations
Hectapeptide (100-mer)   20 x 20 x 20 x 20 ......... = 20^100 = 1.27 x 10^130 combinations
The separation techniques described in Section 9.1.2.1 allow rapid parallel

processing of synthetic reactions, a critical requirement for the synthesis
of combinatorial libraries that may contain hundreds, thousands, or even
millions of members. Several synthetic formats are discussed below, using
two examples. First, we will consider a library of all the possible tetrapeptides
formed using any of the four amino acids: aspartate (D), histidine (H), lysine
(K), and threonine (T). There are 4 x 4 x 4 x 4 = 4^4 = 256 combinations.
Second, we will consider a library of all the possible tetrapeptides formed
using any of the 20 natural amino acids. There are 160 000 combinations as
indicated above.
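The library sizes quoted above all follow from one formula: the number of building blocks raised to the power of the chain length. A minimal Python sketch (the function name `peptide_library_size` is illustrative, not from the chapter) reproduces the numbers:

```python
def peptide_library_size(n_building_blocks: int, length: int) -> int:
    """Number of distinct linear sequences of the given length."""
    return n_building_blocks ** length


# The 20 natural amino acids at increasing chain lengths.
for name, length in [
    ("dipeptide", 2),
    ("tripeptide", 3),
    ("tetrapeptide", 4),
    ("decapeptide", 10),
    ("icosapeptide", 20),
    ("hectapeptide", 100),
]:
    size = peptide_library_size(20, length)
    print(f"{name:>13}: 20^{length} = {size:.3g}")

# The two example libraries used in the rest of this section.
print("D/H/K/T tetrapeptides:", peptide_library_size(4, 4))
print("20-amino-acid tetrapeptides:", peptide_library_size(20, 4))
```

Python integers have arbitrary precision, so even the 100-mer count is computed exactly before it is rounded for display.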
9.1.2.2.2 Mixture Synthesis

Conceptually, the most straightforward approach to combinatorial synthesis is
to simply use all the possible building blocks in a one-pot reaction at each step
of the synthesis. Using this approach, the 256-member library described above
could be synthesized using only four coupling reactions, in which all four
possible amino acid building blocks are used in each reaction (Fig. 9.1-2(a)).
(For simplicity, the requisite intermediate N-deprotection reactions and the
final cleavage step are omitted from this analysis. Further, the relative
stoichiometries of the amino acids would need to be adjusted for differential
reactivities.) Interestingly, the 160 000-member library can also be synthesized
using only four coupling reactions, in which all 20 amino acids are used at
each step.
Although this approach is extremely efficient synthetically, it creates a
daunting problem in terms of subsequent biological evaluation of the libraries,
since the "product" of the synthesis is a complex mixture of all library
members. The library members cannot be screened individually, and so
the only hope of identifying active compounds is through in vitro binding
assays. Recourse to Edman sequencing provides some possibilities when
using peptide libraries; however, identifying active compounds from libraries
of nonbiopolymer small molecules represents a major, if not impossible,
analytical challenge.
It is worth noting, however, that processes based on selection, rather than
screening, would allow amplification of the active library members to facilitate
their eventual identification using standard analytical techniques. This is the
basis of the phage display selection protocols used for peptide libraries. Recent
progress in the DNA-templated synthesis of small molecules suggests that one
day this may be possible for nonbiopolymer small molecule libraries as well [8].
9.1.2.2.3 Parallel Synthesis

The analytical problems caused by mixture synthesis can be overcome by using
parallel synthesis. In this approach, each library member is synthesized in a
separate reaction vessel and, thus, can later be screened individually. Key early
efforts in this area were made separately by Geysen and Houghten [9, 10]. The
efficiency of separation techniques such as solid-phase synthesis allows the
parallel synthesis of relatively small libraries to be accomplished quite readily.
For example, the 256-member library above could be synthesized using
256 x 4 = 1024 coupling reactions, which could probably be accomplished in
a week or two (Fig. 9.1-2(b)). Additional efficiencies can be achieved by carrying
out the earlier coupling reactions in larger combined batches, which are then
progressively split into smaller batches as the synthesis proceeds. In this
fashion, the synthesis can be accomplished using 4 + 16 + 64 + 256 = 340
coupling reactions. Even so, manual parallel synthesis rapidly becomes
unmanageable for larger libraries and recourse to expensive robotics becomes
necessary. For example, the 160 000-member library above would require
160 000 x 4 = 640 000 coupling reactions under totally parallel conditions,
and 20 + 400 + 8000 + 160 000 = 168 420 coupling reactions using partially
batched protocols.
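The coupling-reaction counts for mixture, fully parallel, and partially batched parallel synthesis can be checked with a short sketch. The function names are illustrative. As in the text, only coupling reactions are counted, and deprotection and cleavage steps are ignored.

```python
def mixture_couplings(n_blocks: int, n_steps: int) -> int:
    """One-pot mixture synthesis: a single coupling reaction per step."""
    return n_steps


def parallel_couplings(n_blocks: int, n_steps: int) -> int:
    """Fully parallel synthesis: each library member needs n_steps couplings."""
    return n_blocks ** n_steps * n_steps


def batched_parallel_couplings(n_blocks: int, n_steps: int) -> int:
    """Partially batched synthesis: step k is run once per k-residue intermediate."""
    return sum(n_blocks ** k for k in range(1, n_steps + 1))


for n_blocks in (4, 20):
    print(
        f"{n_blocks} building blocks, tetrapeptides:",
        f"mixture={mixture_couplings(n_blocks, 4)},",
        f"parallel={parallel_couplings(n_blocks, 4)},",
        f"batched={batched_parallel_couplings(n_blocks, 4)}",
    )
```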
9.1.2.2.4 Split-Pool Synthesis

A third alternative approach, split-pool synthesis, combines much of the
synthetic efficiency of mixture synthesis with the ease of screening provided
by parallel synthesis. This method was developed separately by Furka and Lam
[11, 12]. At each step in the synthesis, each individual building block is coupled
to the solid supports in a separate reaction vessel. The solid supports are then
combined (pooled), mixed, and redistributed (split) into new reaction vessels,
one for each building block to be coupled in the next step. The resulting
stochastic distribution of solid supports provides roughly equal numbers of
the different synthetic intermediates in each of the subsequent reactions.
Typically, enough solid supports are used to yield at least three “copies” of
the library to increase the probability that each desired library member will
be represented at least once. Thus, the 256-member library above could be
synthesized using only 4 x 4 = 16 coupling reactions (Fig. 9.1-2(c)) and the
160 000-member library could be produced with only 20 x 4 = 80 coupling
reactions!
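The stochastic character of split-pool synthesis can be illustrated with a small simulation. This is a sketch under simplifying assumptions: beads are treated as ideal, every coupling is quantitative, and each pooled batch is shuffled and divided into equal portions. The names `split_pool` and `expected_coverage` are hypothetical. If each bead's sequence is taken as an independent random draw, three copies of the 256-member library should cover roughly 95% of the members.

```python
import random
from collections import Counter

BLOCKS = "DHKT"  # aspartate, histidine, lysine, threonine


def split_pool(n_beads: int, n_steps: int, blocks: str = BLOCKS, seed: int = 0):
    """Simulate split-pool synthesis; return bead histories and the coupling count."""
    rng = random.Random(seed)
    beads = [""] * n_beads  # each entry is the coupling history of one bead
    n_couplings = 0
    for _ in range(n_steps):
        rng.shuffle(beads)  # pool and mix
        coupled = []
        for i, block in enumerate(blocks):
            portion = beads[i::len(blocks)]  # split into roughly equal portions
            coupled.extend(history + block for history in portion)
            n_couplings += 1  # one coupling reaction per vessel
        beads = coupled
    return beads, n_couplings


def expected_coverage(n_members: int, n_copies: int) -> float:
    """Expected fraction of members present, assuming independent random beads."""
    return 1 - (1 - 1 / n_members) ** (n_copies * n_members)


beads, n_couplings = split_pool(n_beads=3 * 256, n_steps=4)
members = Counter(beads)
print("coupling reactions:", n_couplings)
print("distinct members on beads:", len(members), "of 256")
print("expected coverage:", round(expected_coverage(256, 3), 3))
```

Each bead carries exactly one history string, which mirrors the one-bead, one-compound property described in the following paragraph.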
The split-pool process does provide a mixture of solid supports at the end
of the synthesis. However, a critical feature of split-pool synthesis is that
each solid support has been exposed to only a single reaction sequence. Thus,
assuming ideal reaction efficiency, each solid support carries only a single
library member, which can be cleaved and assayed individually. One drawback
of split-pool synthesis is that the pooling steps obscure the precise identity
of each individual library member. Thus, either recursive deconvolution or
encoding strategies must be used to determine the identities of active library
members.
Fig. 9.1-2 Synthetic strategies used to generate combinatorial libraries.
Several approaches to a 256-member library of tetrapeptides, composed of the
four amino acids aspartate (D), histidine (H), lysine (K), and threonine (T),
are shown. For simplicity, only the coupling reactions are considered in these
analyses. (a) Mixture synthesis involves simultaneous coupling of all building
blocks in one-pot reactions. The synthesis requires only four coupling steps,
but provides a complex mixture of 256 products, complicating screening and
identification of active library members. (b) In parallel synthesis, each
library member (3 out of 256 are shown) is synthesized in a separate reaction
vessel, allowing each to be cleaved, purified, and screened individually. Since
each tetrapeptide requires four coupling steps, the overall synthesis requires
256 x 4 = 1024 coupling reactions. For larger libraries, recourse to robotics
may be necessary. (c) In split-pool synthesis, each building block is coupled
to the solid supports in a separate reaction vessel. The solid supports are
then pooled, mixed, and split into new reaction vessels to provide a stochastic
distribution of substrates for the next reaction. Generally, at least three
copies of a library are synthesized to maximize the probability that each
putative library member is represented at least once. Since there are four
coupling reactions required at each step, the overall synthesis requires
4 x 4 = 16 coupling reactions. Importantly, each solid support has been exposed
to only a single synthetic sequence, and hence carries only a single library
member. Encoding the solid supports with orthogonal chemistry or physical
methods (e.g., TAGT) allows the history of each bead to be reconstructed to
identify active library members. (d) Recursive deconvolution can also be used
to identify active library members. In the first round (1), the last set of
reaction products are not repooled, but are screened separately so that, for a
given active library member (*), the identity of the final (N-terminal)
building block is known. Using this information, progressively smaller
sublibraries are made (2, 3, 4) with an increasing number of fixed building
blocks until the identities of all the building blocks have been determined.
9.1.2.2.5 Recursive Deconvolution

One approach to identifying individual members of split-pool libraries
is recursive deconvolution, which involves resynthesis and screening with
progressively smaller sublibraries, based on the initial screening results of the
primary library [13]. So, in the example of the 256-member library above, the
products of the fourth coupling step are not pooled prior to distribution of the
individual solid supports for the final cleavage step (Fig. 9.1-2(d)). Thus, the
identity of the fourth (N-terminal) residue is known for each compound being
screened. Importantly, the compounds must still be cleaved and screened
individually to avoid the complications of compound mixtures. Once an active
library member is found, a smaller sublibrary of 4^3 = 64 tetrapeptides is
synthesized in which the N-terminal residue is fixed according to the active
library member. In this case, the products of the third coupling step are not
pooled prior to coupling of the fourth residue and cleavage. Thus, the identities
of the third and fourth residues are known for each compound being screened.
This process is continued with a sublibrary of 4^2 = 16 tetrapeptides having
fixed third and fourth residues, then a final sublibrary of four tetrapeptides
having fixed second, third, and fourth residues, until a single active library
member is identified. The process is analogous for the 160 000-member library.
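The round-by-round logic of recursive deconvolution can be sketched as follows. The assay is represented by a hypothetical `is_active` function that returns True for an active compound, and the sketch assumes the library contains a single active member. Sequences are written in coupling order, so the last character is the fourth (N-terminal) residue.

```python
from itertools import product

BLOCKS = "DHKT"


def recursive_deconvolution(is_active, n_steps: int = 4, blocks: str = BLOCKS):
    """Identify one active sequence by fixing residues from the last coupling backwards.

    Returns the identified sequence (in coupling order) and the size of the
    sublibrary screened in each round.
    """
    fixed = ""   # residues already identified (the end of the coupling sequence)
    sizes = []   # number of compounds screened per round
    for position in range(n_steps, 0, -1):
        n_variable = position - 1  # earlier positions remain fully randomized
        hit = None
        screened = 0
        for candidate in blocks:  # one unpooled group per building block
            for prefix in product(blocks, repeat=n_variable):
                screened += 1
                compound = "".join(prefix) + candidate + fixed
                if hit is None and is_active(compound):
                    hit = candidate
        if hit is None:
            raise ValueError("no active compound found in this sublibrary")
        sizes.append(screened)
        fixed = hit + fixed
    return fixed, sizes


# Hypothetical assay: a single active tetrapeptide, written in coupling order.
sequence, sizes = recursive_deconvolution(lambda compound: compound == "KTDH")
print(sequence, sizes)
```

The sublibrary sizes shrink by a factor of four per round (256, 64, 16, 4), which is why the approach is effective but requires a new synthesis and screen at every round.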
9.1.2.2.6 Encoding
Recursive deconvolution is effective but time consuming. An alternative
approach to identifying individual members of split-pool libraries is to use
a physical or chemical method to encode the building block coupled in each
reaction vessel [13]. This can be accomplished by attaching an inert chemical
“tag” to the solid supports using orthogonal reactions (Fig. 9.1-2(c)). Once an
active library member is found, its identity can be determined by decoding the
tags from the corresponding solid support. Notably, the tags do not identify
the structure of the product directly, but instead provide a history of reaction
conditions to which the solid support has been exposed. This reaction sequence
must then be repeated to determine the structure of the library member using
standard analytical techniques.
Another approach involves physical tagging of the solid supports with
“barcoding” devices. This can be accomplished by direct modifications to
the solid supports or by enclosing the solid supports in a small permeable
reaction vessel along with the tag. A variety of tags have been used for this
purpose, ranging from Houghten’s original “tea bags” [14], to colored plastic
pegs, to fluorescent colloids, to radiofrequency transponders, to laser-etched
two-dimensional barcodes. In some cases, if the barcodes are assigned at the
very beginning of the synthesis and then read prior to each split step,
an exactly even distribution of synthetic intermediates can be accomplished,
allowing synthesis of exactly one copy of the library. Again, the tag only tracks
the history of reactions to which the solid support has been exposed.
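Barcode-directed sorting can be sketched in a few lines. In this illustration each support carries an integer barcode, and the routing rule is an assumption chosen for simplicity: at step k the support goes to the vessel given by the k-th base-4 digit of its barcode. The function name `directed_sort_synthesis` is hypothetical.

```python
BLOCKS = "DHKT"


def directed_sort_synthesis(n_steps: int = 4, blocks: str = BLOCKS) -> dict:
    """Route barcoded supports deterministically; return barcode -> reaction history."""
    n_supports = len(blocks) ** n_steps  # exactly one support per library member
    history = {barcode: "" for barcode in range(n_supports)}
    for step in range(n_steps):
        for barcode in history:
            # Read the barcode before the split and pick the vessel from it.
            vessel = (barcode // len(blocks) ** step) % len(blocks)
            history[barcode] += blocks[vessel]
    return history


history = directed_sort_synthesis()
print("supports used:", len(history))
print("distinct library members:", len(set(history.values())))
print("barcode 27 was exposed to:", history[27])
```

Because the sorting is deterministic rather than stochastic, every member appears exactly once. Decoding a hit means looking up the barcode's recorded history, which gives the reaction sequence rather than a directly confirmed structure.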
9.1.2.3 Early Efforts in Diversity-oriented Synthesis

Although the term “diversity-oriented synthesis” had not yet been coined,
early efforts in the synthesis of diverse, nonoligomeric small molecule libraries
were taken by several labs in the 1990s. Ellman and Hobbs DeWitt separately
developed libraries built around benzodiazepines, privileged scaffolds that had
been shown to bind a variety of biological targets. Ellman synthesized a library
of 192 benzodiazepines by parallel solid-phase synthesis using 2 x 12 x 8
building blocks with a solid-phase cyclization reaction as the key step [15]
(Fig. 9.1-3(a)). Hobbs DeWitt synthesized a library of 40 benzodiazepines by
parallel solid-phase synthesis using 5 x 8 building blocks [16] (Fig. 9.1-3(b)).
In this case, the key cyclization reaction occurred concurrently with cleavage
from the solid support, providing products with >90% purity, which could be
used directly in biological screening.
Schreiber’s early efforts in this area were focused on libraries of compounds
having structural features reminiscent of rigid, complex, stereochemically rich
natural products. In a key early example, solid-phase split-pool synthesis
was used to generate a combinatorial library of over two million complex,
polycyclic compounds derived from shikimic acid [17]. A stereoselective
tandem acylation-nitrone cycloaddition was used to generate 18 tetracyclic
scaffolds, to which 30 alkynes were coupled using a Sonogashira reaction, 62
amines were coupled via γ-lactone aminolysis, and 62 carboxylic acids were
coupled by alcohol esterification (Fig. 9.1-3(c)). In addition, a portion of the
solid supports were left unreacted at each of the last three steps to generate a
“skip codon” that further increased the diversity of the library.
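The sizes of these three early libraries can be reproduced with simple arithmetic. The sketch below follows the counting adjustment described in the caption of Fig. 9.1-3: products that skip the aminolysis step retain the unreacted γ-lactone, are not substrates for the final esterification, and are therefore counted separately. The variable names are illustrative.

```python
# Ellman and Hobbs DeWitt benzodiazepine libraries (parallel synthesis).
ellman_library = 2 * 12 * 8
hobbs_dewitt_library = 5 * 8

# Schreiber shikimic acid-derived library (split-pool synthesis).
N_SCAFFOLDS = 18
N_ALKYNES = 30
N_AMINES = 62
N_ACIDS = 62
SKIP = 1  # the "skip codon": supports left unreacted at a given step

after_sonogashira = N_SCAFFOLDS * (N_ALKYNES + SKIP)
aminolysis_products = after_sonogashira * N_AMINES
unreacted_lactones = after_sonogashira * SKIP  # cannot be esterified
schreiber_library = aminolysis_products * (N_ACIDS + SKIP) + unreacted_lactones

print("Ellman:", ellman_library)
print("Hobbs DeWitt:", hobbs_dewitt_library)
print("unreacted lactones:", unreacted_lactones)
print("Schreiber:", schreiber_library)
```

The total agrees with the 2 180 106 members and the 558 separately counted tetracyclic products given in the figure caption.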
Bartlett proposed some early guidelines for library synthesis: (a) the
sequence should involve a small number of steps, (b) no more than one variable
should be introduced in any step, (c) starting materials should be readily
obtained with a diverse selection of substituents, and (d) cyclic, nonoligomeric
structures represent the most interesting targets [18]. Furthermore, Armstrong
did some early work toward libraries composed of multiple scaffolds,
derived from common synthetic intermediates [19, 20]. In one case, Ugi
multicomponent coupling reaction products were converted to various linear
and cyclic derivatives (Fig. 9.1-4(a)). In another, squaric acid was proposed as
a precursor that could be converted to multiple cyclic and polycyclic products
(Fig. 9.1-4(b)) and several such transformations were demonstrated.
DOS has evolved significantly since these early efforts, particularly in the
areas of library design and synthetic planning. DOS has also proven to be
a fertile ground for the development of new chemistry. These topics are
discussed in greater detail in Section 9.1.3 below. DOS libraries have also
provided a variety of powerful new probes to dissect biological systems. Recent
examples are discussed in Section 9.1.4 below.
9.1.2.4 Related Alternative Strategies

Several related approaches that are complementary to the synthesis and
screening of combinatorial libraries deserve mention. Many of these can be
grouped under the broad heading of fragment-based ligand discovery [21].
This involves identification of two or more low-molecular-weight “fragments”
that bind to an individual protein target of interest. Importantly, the individual
fragments can bind with very low affinities (micromolar to millimolar), but
once they are covalently linked, through either deliberate laboratory synthesis
or in situ target-directed coupling, high-affinity (nanomolar) ligands can be
obtained. These fragment-based approaches have proven to be an effective
means to identify new ligands, although they require selection of an individual
biological target and are currently limited to biochemical screening methods.
In addition to traditional “wet” screening, in silico “virtual” screening
has also been used to identify new ligands or ligand fragments [22,231.
494
4 Fig. 9.1-3 Early efforts in diversity-oriented library o f rigid, complex, stereochemically

synthesis o f combinatorial libraries. rich, natural productlike polycyclics
(a) Ellman’s solid-phase parallel synthesis o f featuring a stereoselective t a n d e m
a 192-member library built around a acylation-nitrone cycloaddition reaction
benzodiazepine scaffold, a privileged [17]. The library size calculations are
structure f o u n d i n many synthetic drugs adjusted for the fact that the aminolysis
[15]. (b) H o b b s DeWitt’s solid-phase parallel “skip codon” (unreacted y-lactone) leaves
synthesis o f a 40-member library o f 558 tetracyclic products that are n o t
benzodiazepines featuring a cyclorelease substrates for t h e final alcohol acylation step
strategy [16]. (c) Schreiber’s solid-phase and, thus, m u s t be counted separately.
split-pool synthesis o f a 2 180 106-member
Computational algorithms are used to “dock” potential binders to an

experimentally determined protein structure, or to a homology model based
on a similar protein. The virtual hits are then purchased or synthesized and
binding is confirmed in traditional wet experiments. This approach can be
more cost-effective than wet screening and has successfully produced a number
of new ligands. However, it again requires selection of an individual biological
target and is also dependent on the availability of structural information about
that target.

Fig. 9.1-4 Early efforts toward multiscaffold libraries. (a) Armstrong
converted an Ugi multicomponent reaction product to several linear and cyclic
derivatives [19] and (b) proposed squaric acid as a versatile precursor to
various cyclic scaffolds, demonstrating several such reactions [20].

9.1.3
Current efforts in DOS are focused in three areas. First, a variety of library
design strategies are being explored to generate libraries that will provide
new biologically active molecules to probe a wide range of targets. Second,
new synthetic strategies are being developed to generate structural diversity
in a flexible, efficient fashion. Third, new chemical methodologies are being
developed to meet the stringent demands of DOS on reaction efficiency and
selectivity.
9.1.3.1 DOS Library Design Strategies
9.1.3.1.1 Chemical and Biological Space

Chemical structure space [24], the complete set of all possible small molecules,
has been variously calculated to contain 10^30-10^200 structures, depending on
the algorithms used and the upper limits placed on molecule size. Clearly, it
is impossible to synthesize all the possible small molecules. Moreover, even
the largest industrial screening campaigns are limited to ≈10^6 compounds, a
practically infinitesimal fraction of the total possibilities. Fortunately, however,
one can expect that only a fraction of that space will comprise molecules that
are stable and soluble in aqueous media, have appropriate functional groups
to interact with biological targets such as proteins and nucleic acids, and have
sufficient structural complexity to do so with useful levels of specificity.
Additional structural constraints are imposed when cell permeability or
bioavailability in whole organisms are considered.
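
To put the scale in perspective, the short calculation below takes the screening limit as roughly 10^6 compounds and the size of chemical structure space as anywhere between 10^30 and 10^200 structures, and reports the fraction of the space that a single campaign could cover. The function name is purely illustrative.

```python
import math
from fractions import Fraction

SCREENED = 10**6                   # approximate size of a large screening campaign
SPACE_ESTIMATES = (10**30, 10**200)  # lower and upper estimates of chemical space


def fraction_screened(space_size, screened=SCREENED):
    """Exact fraction of chemical space covered by one screening campaign."""
    return Fraction(screened, space_size)


for size in SPACE_ESTIMATES:
    frac = fraction_screened(size)
    # Report the order of magnitude, since the fraction underflows a float at 10^200.
    exponent = math.log10(frac.numerator) - math.log10(frac.denominator)
    print(f"space of 10^{round(math.log10(size))}: fraction screened = 10^{round(exponent)}")
```

Even with the most conservative estimate, a campaign samples about one part in 10^24 of the space, which is why library design has to concentrate on the biologically relevant regions.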
Thus, a key goal in DOS is to design combinatorial libraries that target
the biologically relevant regions of chemical structure space. To address this
issue, most DOS library design strategies leverage information about known
biologically active small molecules to generate compounds that will target
these regions in a similar manner. These can be based on synthetic drugs
or natural products and both classes are attractive, complementary starting
points for DOS library design.
9.1.3.1.2 Drug-like Libraries

Synthetic drugs are often built around nitrogen-containing heteroaromatic
scaffolds that are of appropriate size to bind in the active site pockets
of biological targets such as enzymes and G protein-coupled receptors
(Fig. 9.1-5(a)). They tend to have few or no stereogenic centers,
which greatly simplifies their synthesis. Some of these scaffolds have
been identified as privileged structures in that they have an empirically
demonstrated ability to bind multiple classes of protein targets [25, 26].
The benzodiazepine scaffold is a classic example. Although the underlying
basis for this “privileged” standing is not well understood, it has
been suggested that conservation of protein folds may contribute toward
this [25, 27].
These common drug scaffolds often serve as the basis for DOS of “drug-like”
libraries [29]. Furthermore, since synthetic drugs are most useful when orally
bioavailable, these library designs often take into account physicochemical
properties that have been found to correlate with this feature [30]. Notably,
many of the current commercially available libraries still fail to recapitulate
these properties [28]. Thus, efforts continue to develop drug-like libraries that
match the properties of known synthetic drugs more closely.
Recent attention has also been drawn to generating “lead-like” libraries [31].
During the drug development process, lead optimization to provide clinical
candidates often results in increased molecular weight and hydrophobicity,
factors that can adversely affect permeability and solubility. Thus, lead-like
libraries are composed of relatively simple, low-molecular-weight, hydrophilic
compounds that must be screened at relatively high concentrations but are
then more suitable candidates for medicinal chemistry optimization.
9.1.3.1.3 Natural Product-like Libraries

Natural products exhibit tremendous structural diversity and often have
increased size and structural complexity compared to synthetic drugs
(Fig. 9.1-5(b)). They frequently contain a greater proportion of oxygen than
nitrogen heteroatoms and a significant number of stereogenic centers [28].
Although clinically used natural products are sometimes not orally bioavailable,
they are able to address a wider range of biological targets than synthetic
drugs. For example, rather than acting as ligands that bind in a protein pocket,
glycopeptide antibiotics such as vancomycin act as receptors for the C-terminal
D-Ala-D-Ala of bacterial peptidoglycan precursors. Moreover, protein-protein
interactions, which have historically been very difficult targets for synthetic
drugs [32, 33], can often be modulated with natural products. The natural
product anticancer drugs paclitaxel (Taxol) and vincristine are examples that
modulate tubulin protein-protein interactions.
Thus, DOS of “natural product-like” libraries is a major area of current
interest. Library design strategies have been divided into three general
categories, according to the degree of structural similarity to natural products
[34, 35]: (a) libraries based on the core scaffold of an individual natural product,
(b) libraries based on specific structural motifs that are found across a class of
natural products, and (c) libraries that emulate the structural characteristics of
natural products in a more general sense. Each strategy balances the degree
of connection to natural product structure space against the accessibility
of structural diversity that is likely required to address multiple different
biological targets. Interestingly, some structural motifs originally found in
natural products have subsequently been identified as privileged structures
and have been used in synthetic drugs. Some examples include purines,
indoles, and benzopyrans [26].

Fig. 9.1-5 Structures of synthetic drugs and natural products.
(a) Representative examples of approved synthetic drugs, which are often rich
in aromatic rings and nitrogen atoms. (b) Representative examples of clinically
used natural products, which are often rich in stereochemical features and
complex ring systems. For a recent comparison of synthetic drugs and natural
products, see Ref. [28]. See also Fig. 9.1-7.
[Structures shown: (a) Diflucan (fluconazole), Cipro (ciprofloxacin), Paxil
(paroxetine), Claritin (loratadine), Viagra (sildenafil); (b) penicillin G,
amphotericin B, vancomycin, vincristine, paclitaxel (Taxol).]

9.1.3.2 DOS Synthetic Strategies
9.1.3.2.1 New Synthetic Planning Strategies

DOS requires the development of new synthetic planning strategies. Several
early ideas were outlined by Bartlett (see Section 9.1.2.3). More recently,
Schreiber has advanced the concept of “forward synthetic analysis” as a
more formal approach to DOS planning [36, 37]. Chemical transformations
are classified as generating structural diversity and/or complexity, both of
which are considered as important aims in DOS. Diversity allows a given
library to address a wide range of biological targets while complexity provides
compounds having useful levels of biological specificity [38].
To maximize the efficiency of a synthetic route, each diversity-generating
reaction should provide products that are substrates for another such reaction.
For example, in Schreiber’s two million-member library (see Section 9.1.2.3),
γ-lactone aminolysis simultaneously couples a building block and unmasks a
hydroxyl group for the subsequent acylation step (Fig. 9.1-3(c)). Diversity-
generating reactions can be grouped into those that afford appendage,
stereochemical, and skeletal diversity. Complexity-generating reactions are
also desirable to convert simple substrates into complex products. In general,
such products will have multiple ring systems and stereochemical features.
Complexity can also be quantitated using graph theoretical and structure-
based approaches [38]. Ideally, complexity-generating reactions also introduce
structural diversity simultaneously, although this is not always possible. In the
two million-member library example above, the tandem acylation-nitrone
cycloaddition reaction generates structural complexity, while concurrently
introducing a degree of diversity through the use of ortho, meta, and para
substituted aromatics.
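
The library size quoted in the caption of Fig. 9.1-3 (2 180 106 members, with 558 tetracyclic products counted separately) can be reproduced with a short calculation. The building-block counts used below (18 scaffolds, 30 alkynes, 62 amines, 62 acids) and the placement of the skip codons are assumptions that are not stated in this section; they are chosen only because they are consistent with the two totals given in the caption.

```python
def split_pool_library_size(scaffolds=18, alkynes=30, amines=62, acids=62):
    """Count library members when selected steps include a "skip codon".

    All four building-block counts are assumed values, not taken from the text.
    A skip codon means a portion of the resin is left unreacted in that step.
    """
    # Alkyne coupling: each scaffold meets every alkyne, or is skipped.
    tetracycles = scaffolds * (alkynes + 1)
    # Aminolysis skipped: the lactone stays closed, no hydroxyl is unmasked,
    # so these compounds cannot enter the acylation step and are counted once.
    unopened = tetracycles
    # Aminolysis performed: each amine product meets every acid, or is skipped.
    opened = tetracycles * amines * (acids + 1)
    return tetracycles, unopened + opened


tetracycles, total = split_pool_library_size()
print(tetracycles, total)
```

The calculation also shows how a diversity-generating step that unmasks the handle for the next step multiplies the library size, whereas a skipped step terminates that branch of the synthesis.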
Developing synthetic routes that provide skeletal diversity with multiple core
scaffolds or backbone structures is an area of particular current interest. Several
approaches to generating such multiscaffold libraries have been advanced. In
one straightforward strategy, Schultz synthesized a 45 140-member library
from multiple heterocyclic scaffolds, each having a set of functional groups in
common [39]. The scaffolds were coupled as building blocks, which were then
all substrates for the same set of subsequent appendage-coupling reactions
(Fig. 9.1-6(a)).

Fig. 9.1-6 Approaches to skeletal diversity. (a) Schultz used multiple
heterocyclic scaffolds, all having a common set of functional groups, as
building blocks that could then undergo the same set of appendage-coupling
reactions [39]. (b) Schreiber used consecutive stepwise Diels-Alder
cycloaddition reactions with various dienophiles to generate multiple
scaffolds from precursors, all having a triene functionality in common [40].
(c) Schreiber used a set of differentially functionalized furan precursors to
generate multiple scaffolds under a single set of reaction conditions [41].
The nature of the scaffold was determined by the functionalization of the
furan sidechains.

In another approach, originally advanced by Armstrong (see Section 9.1.2.3),
a set of substrates bearing a common functional group is exposed to different
complexity-generating reactions to provide a variety of new scaffolds. For
example, Schreiber has used a triene intermediate in various appendage-
coupling consecutive Diels-Alder reactions to generate multiple complex
polycyclic scaffolds in a 29 400-member library [40] (Fig. 9.1-6(b)).
Alternatively, substrates bearing different functional groups can be exposed
to a single set of reaction conditions that generates different products
depending on the “programming” provided by the functional groups. In
a key demonstration of this strategy, Schreiber used an oxidative-acidic
reaction sequence to generate multiple scaffolds from furan precursors in
a 1260-member library [41] (Fig. 9.1-6(c)). The nature of each scaffold was
determined by the functionality of the furan sidechains.
9.1.3.2.2 Assessing Library Diversity

A key goal of these synthetic strategies is to increase the structural diversity
provided by each individual library. Since DOS libraries are not directed toward
a single biological target, their utility is based on their ability to provide selective
probes for multiple different targets. While this “functional diversity” can be
assessed only through biological screening, “structural diversity” is often
used as an intermediate metric, since it is more readily accessible and likely
correlates, at least to some extent, with functional diversity. In both cases, a
key tool for analyzing diversity (and similarity) is a statistical method called
principal component analysis (PCA) [42].
In this process, a set of n descriptors is defined for each compound in
the library. These can be structural descriptors, such as molecular weight;
physicochemical descriptors, such as experimentally determined artificial
membrane permeability; or biological descriptors, such as binding constants.
Each compound can then be represented as a vector in n-dimensional space.
Of course, for n > 3 such vectors are difficult to visualize. Thus, PCA is
used to analyze the entire data set and to define new unitless axes, called
principal components or eigenvectors. Each new axis is a linear combination
of the original descriptors, calculated to represent as much of the variance
in the dataset as possible in each successive principal component, based on

correlations between the original descriptors. The new axes are orthogonal
and uncorrelated. Each compound can then be replotted as a vector in readily
visualized one-, two-, or three-dimensional space using its coordinates, or
scores, on these new axes (Fig. 9.1-7). This representation limits the loss
of information relative to the original n-dimensional dataset and allows further
processing using statistical methods such as clustering or partitioning [42].
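
The procedure described above can be sketched in a few dozen lines. The example below is a minimal illustration rather than the analysis used for Fig. 9.1-7: the descriptor table is hypothetical, each descriptor is first standardized so that descriptors with different units contribute equally (an assumption, since the text does not specify the scaling), and the principal components are found by simple power iteration with deflation instead of a library routine.

```python
import math

# Hypothetical descriptors per compound: (molecular weight, log P, H-bond donors).
DESCRIPTORS = [
    (300.0, 2.5, 1.0),
    (350.0, 3.1, 2.0),
    (420.0, 3.4, 1.0),
    (800.0, 1.2, 6.0),
    (850.0, 0.4, 8.0),
    (1200.0, -1.0, 12.0),
]


def standardize(rows):
    """Center each descriptor column and scale it to unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
           for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def covariance(z):
    """Covariance matrix of already centered data."""
    n, d = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(d)]
            for i in range(d)]


def multiply(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    vector = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        product = multiply(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(x * y for x, y in zip(vector, multiply(matrix, vector)))
    return eigenvalue, vector


def pca(rows, n_components=2):
    """Return per-compound coordinates and the variance fraction of each axis."""
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, explained = [], []
    for _ in range(n_components):
        eigenvalue, vector = top_eigenpair(cov)
        components.append(vector)
        explained.append(eigenvalue / total_variance)
        # Deflate: subtract the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in z]
    return scores, explained


scores, explained = pca(DESCRIPTORS)
for row in scores:
    print([round(value, 2) for value in row])
print([round(fraction, 3) for fraction in explained])
```

Each row of `scores` gives the two-dimensional coordinates of one compound, and `explained` gives the fraction of the total variance captured by each new axis, which corresponds to the percentages reported in figure captions of this kind. The sign of each axis is arbitrary, so plots from different implementations may appear mirrored.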
It is important to recognize that the PCA results are highly dependent on the
compounds selected for analysis and the descriptors used for each compound,
especially for small datasets and for those with outliers. However, PCA has
been useful in comparing the molecular properties of synthetic drugs, natural
products, and commercial combinatorial libraries [28] and in visualizing
small molecule inhibitors of protein-protein interactions in comparison to
commercial libraries [33]. Moreover, PCA has proven to be a powerful tool
for analyzing biological screening data to assess the functional diversity or
similarity of small molecules (see Section 9.1.4.2).
9.1.3.3 New Chemical Methodologies for DOS

DOS has proven to be fertile ground for new advances in chemical
methodology. Although synthetic techniques such as solid-phase synthesis
facilitate the separation of synthetic intermediates from excess reagents
and soluble reaction by-products, they do not allow separation of support-
bound impurities that may arise from undesired side reactions. With
traditional chromatographic purification of synthetic intermediates precluded,
extraordinarily high requirements are placed on reaction efficiency and
selectivity. In general, DOS routes require reactions that provide ~90% yield
and stereoselectivity, lest the synthetic sequence produce such a complex
mixture as to make purification of the final product impossible. Further,
each reaction must be compatible with hundreds or even thousands of
different substrates generated by the preceding combinatorial steps. Thus,
the same ideals that have driven reaction development in traditional organic
synthesis - high yield, selectivity, and generality - apply to DOS to an even
greater extent and, as a result, DOS has been an important engine for new
advances in synthetic organic chemistry [2, 34]. In particular, efforts in DOS
have led to a variety of new stereoselective reactions and a resurgence of
interest in multicomponent coupling reactions.
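
The effect of per-step efficiency on a multistep route without intermediate purification can be estimated by compounding the yields. The sketch below assumes that every step proceeds with the same yield and treats that yield as the fraction of material that remains the desired product; the step counts and yields are illustrative values only.

```python
def overall_yield(step_yield, n_steps):
    """Fraction of desired product after n_steps, each with the same step yield."""
    return step_yield ** n_steps


for step_yield in (0.80, 0.90, 0.99):
    # Compare short, medium, and long hypothetical routes.
    summary = {n: round(overall_yield(step_yield, n), 3) for n in (3, 5, 8)}
    print(f"{step_yield:.0%} per step: {summary}")
```

With 90% per step, a five-step route still delivers about 59% of the desired product, whereas at 80% per step the same route falls to about a third, with the remainder present as support-bound impurities that cannot be removed until the end.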
9.1.4
Screening of DOS libraries has provided a significant number of new biological
probes [1]. Several recent examples are presented below, with a particular focus
on studies that have provided new biological insights [2]. In addition, many of
these studies highlight a key advantage of screening synthetic combinatorial
libraries, as opposed to collections of individually archived compounds.
Namely, once a flexible synthetic route has been developed, a “primary” library
of diverse molecules can be screened to identify initial “hit” molecules and
to provide information on structure-activity relationships (SAR). Using the
same synthetic route, the initial hits can then be readily optimized through the
synthesis and evaluation of “secondary”, “tuning”, or “focused” libraries and
individual analogs to identify compounds with improved potency, specificity,
and pharmacological properties.

Fig. 9.1-7 Example of principal component analysis comparison of synthetic
drugs and natural products. A set of 20 synthetic drugs, including the top 10
best-sellers in 2004, and 20 natural products was analyzed for nine molecular
descriptors: molecular weight, hydrophobicity (X log P or C log P),
# hydrogen-bond donors, # hydrogen-bond acceptors, # rotatable bonds,
topological polar surface area [43], # stereogenic centers, # nitrogen atoms,
# oxygen atoms. PCA was used to reduce the nine-dimensional vectors to
two-dimensional vectors, which were then replotted as shown. The first
principal component accounts for 55.1% of the original information and the
first two principal components account for 84.2%. Synthetic drugs (squares,
capitalized) and natural products (circles, italicized) cover distinct regions
of chemical space with limited overlap; Flonase and Zocor are synthetic drugs
that are analogs of natural products. Molecular descriptors were obtained from
PubChem (http://pubchem.ncbi.nlm.nih.gov/) and ChemBank
(http://chembank.broad.harvard.edu/) or calculated using ChemDraw/Biobyte and
Molinspiration (http://www.molinspiration.com). PCA was performed with R v1.01
(http://cran.r-project.org/). Adapted from Ref. [2] with permission.

Fig. 9.1-8 Uretupamines, tubacin, and histacin. (a) Schreiber discovered
uretupamine A as a function-selective suppressor of the yeast nutrient
signaling protein Ure2p through HTS of a library of natural productlike
compounds [44]. Analysis of SAR led to the development of an improved analog,
uretupamine B. See Fig. 9.1-9 for biological data. (b) Tubacin and histacin
were discovered as paralog-selective HDAC (histone deacetylase) family
inhibitors from a related library [45]. This biased library was targeted to
HDACs by capping each library member with a metal-binding functional group at
the end of a long alkyl chain (YR4). Each subset of the library was screened
in two cytoblot assays for histone acetylation and α-tubulin acetylation. PCA
was used to replot the data to identify selective inhibitors of histone versus
α-tubulin deacetylation, including histacin and tubacin. See Fig. 9.1-9 for
biological data.
[Scheme: (a) 1,890-member library, high-throughput screening, uretupamine A,
structure-activity relationship analysis, uretupamine B. (b) 7,200-member
biased library, high-throughput screening and statistical analysis, tubacin
and histacin.]

9.1.4.1 Uretupamines, Ure2p, and Glucose Signaling

Ure2p is a yeast signaling protein that regulates cellular responses to the quality
of both carbon and nitrogen nutrients (e.g., glucose vs. acetate and ammonium
vs. proline). Ure2p represses the transcription factors Nil1p and Gln3p, and
differential regulation is thought to distinguish carbon- and nitrogen-nutrient-
responsive signaling. Thus, these two effects cannot be separated using Ure2p
knockouts (ure2Δ), while a function-selective small molecule inhibitor would
be ideally suited to this task.
Since the functional binding sites of Ure2p have not been identified,
structure-based rational design cannot be used to identify such an inhibitor.
Thus, Schreiber screened a DOS library of 1890 natural productlike
compounds in a Ure2p binding assay on a small molecule microarray [44]
(Figs. 9.1-8(a) and 9.1-9(a)). The initial hits were retested in a secondary
cell-based reporter gene assay, leading to the identification of uretupamine A
as a functional Ure2p suppressor. The availability of analogs using the
established synthetic route allowed rapid development of a more potent
analog, uretupamine B. Despite their relatively moderate binding affinities,
uretupamines A and B (Kd = 18.1 and 7.5 μM) exhibited high specificity
for targeting Ure2p-mediated effects in transcriptional profiling studies of
wild-type and targetless ure2Δ knockout strains (Fig. 9.1-9(b)).
Further examination of the transcriptional profiling data revealed that the
uretupamines upregulated a subset of genes that are induced in response to
carbon nutrient quality, including Nil1p. Although Ure2p is usually considered
a nitrogen-nutrient-responsive signaling protein, this suggested that it might
also be a direct target of carbon-nutrient-responsive pathways (as opposed to
pathways bypassing Ure2p and acting directly upon Nil1p). Further evidence
for this model was provided by transcriptional profiling experiments with
the uretupamines in nil1Δ and gln3Δ strains. Ure2p was also found to be
selectively dephosphorylated in response to changes in carbon, but not
nitrogen, nutrient quality. Thus, these studies with a function-selective
small molecule probe from a DOS library shed new light on the role of Ure2p
in glucose signaling.

Fig. 9.1-9 Biological data obtained using probes identified from DOS
libraries. Uretupamine: (a) A small molecule microarray of library members
was probed with Cy5-labeled Ure2p. The resulting fluorescent spot
corresponding to Ure2p-bound uretupamine A is shown. (b) The biological
effects of uretupamine A were assessed by transcriptional profiling of
wild-type (PM38) and ure2Δ knockout yeast treated with 100 μM uretupamine A
versus vehicle control (N,N-dimethylformamide). Uretupamine upregulates
URE2-dependent genes in wild type, but not “targetless” ure2Δ cells,
indicating a high degree of specificity (right). Reprinted from Ref. [44]
with permission. Tubacin and histacin: (c) Fluorescence microscopy
experiments were used to evaluate the effects of trichostatin A (TSA), a
pan-HDAC inhibitor, tubacin, and histacin on histone acetylation (green, top),
and α-tubulin acetylation (red, bottom) in A549 cells. Nuclei are stained with
Hoechst dye (blue). TSA upregulates both histone and α-tubulin acetylation
while tubacin is selective for α-tubulin acetylation and histacin is selective
for histone acetylation. Adapted from Refs. [45] and [46] with permission.
Stem cell differentiation modulators: (d) TWS119 (1-5 μM) induces
neurogenesis of mouse embryonic stem cells (D3), as demonstrated by
immunofluorescence staining with the neuron-specific markers
microtubule-associated protein 2(a+b) (red, top), neurofilament-M (red,
bottom), and βIII-tubulin (green, top and bottom). (e) Cardiogenol C
(0.25 μM) induces cardiomyogenesis of mouse embryonic stem cells (D3), as
demonstrated by immunofluorescence staining with the cardiomyocyte-specific
markers myosin heavy chain (green, top) and the transcription factor MEF2
(red, bottom). Cell nuclei are stained with DAPI
(4’,6-diamidino-2-phenylindole) (blue, top and bottom). (f) Purmorphamine
(2 μM) induces osteogenesis of mouse multipotent mesenchymal progenitor cells
(C3H10T1/2) as demonstrated by histochemical staining of the
osteoblast-specific marker alkaline phosphatase (red) in
purmorphamine-treated (bottom), but not dimethyl sulfoxide (DMSO)-treated
(top) cells. Cell nuclei are stained blue. (g) Conversely, reversine (5 μM)
induces dedifferentiation of lineage-specific murine myoblasts (C2C12) to
multipotent mesenchymal progenitor cells, which can then be induced to
differentiate into osteoblasts or adipocytes (not shown). Histochemical
staining for the osteoblast-specific marker alkaline phosphatase (red) was
apparent in cells exposed to osteogenesis-inducing medium following initial
dedifferentiation with reversine (bottom), but not DMSO (top). Adapted from
Refs. [47-50] with permission. Fexaramine: (h) Transcriptional profiling was
used to analyze the effects of various FXR agonists in human primary
hepatocytes. Profiles were compared following treatment with fexaramine
(10 μM), a highly specific FXR agonist; chenodeoxycholic acid (CDCA, 100 μM),
the primary bile acid; and GW4064 (10 μM), another nonsteroidal FXR agonist,
versus DMSO-treated controls. Genes whose expression patterns were altered by
>2-fold relative to DMSO were identified and subjected to hierarchical
clustering as shown. The differences between the expression profiles indicate
that CDCA and GW4064 affect other signaling pathways as well as the FXR
pathway. (i) Fexaramine was cocrystallized with the FXR and the binding
interactions were identified. (j) This allowed construction of a structural
model for the weak binding of CDCA to FXR. Reprinted from Ref. [51] with
permission. IIA6B17 Myc-Max inhibitor: (k and l) IIA6B17 inhibits cell foci
formation in Myc-transformed chicken embryo fibroblasts. This compound also
inhibits foci formation in Jun- but not Src-transformed cells, indicating a
limited degree of specificity. Adapted from Ref. [52] with permission.

9.1.4.2 Tubacin, Histacin, and the HDACs

The HDAC family of proteins plays a critical role in modulating chromatin
structure and in regulating the functions of other proteins. Several HDAC
inhibitors are used in clinical trials as anticancer drugs. However, these
inhibitors are not selective among the multiple HDAC paralogs that have
been identified. Thus, new selective inhibitors are required to separate the
functions of the various HDAC family members. In particular, treatment
with pan-HDAC inhibitors also results in hyperacetylation of α-tubulin, the
functional implications of which are unclear.
Despite the availability of protein structural information, structure-based
design of selective HDAC inhibitors has proved to be challenging. Thus,
Schreiber leveraged this structural information in combination with DOS to
synthesize a library of 7200 dioxane-containing natural productlike molecules
that were targeted to HDACs [45] (Fig. 9.1-8(b)). Each library member was
capped with a metal-binding functional group at the end of a long alkyl
chain, designed to bind a zinc ion at the bottom of a channel in the HDAC
active site. This library was first screened using cell-based cytoblot assays
that monitored levels of histone and tubulin acetylation. Statistical analysis
of the screening data using PCA was then carried out to identify compounds
that selectively induced histone or tubulin acetylation. These initial hits were
retested in fluorescence microscopy assays to confirm these effects, leading
to the identification of tubacin as a selective inducer of α-tubulin acetylation
(EC50 = 2.9 μM) and histacin as a selective inducer of histone acetylation
(EC50 = 34 μM) [46] (Fig. 9.1-9(c)).
Tubacin proved to be a particularly valuable tool for studying HDAC6,
an α-tubulin deacetylase having two catalytic domains [53, 54]. In contrast
to the pan-HDAC inhibitor trichostatin A (TSA), tubacin had no effect on
gene expression in transcriptional profiling experiments and did not affect
cell cycle progression. Further, tubacin-induced α-tubulin hyperacetylation
did not alter microtubule dynamics, but it did inhibit cell migration.
Conversely, overexpression of HDAC6 had previously been shown to
increase cell motility [54]. Additional experiments indicated that HDAC6
colocalized with acetylated α-tubulin following tubacin treatment, possibly
via the HDAC6 N-terminal catalytic domain, which did not have tubulin
deacetylase activity. This suggested a direct role for HDAC6 in modulating the
activities of other microtubule-associated proteins and implicated HDAC6 in
metastasis and angiogenesis, as well as in neurodegenerative disorders such

as Alzheimer’s disease. Recently, tubacin was also shown to synergize with
bortezomib against multiple myeloma [60].
9.1.4.3 Stem Cell Differentiation Modulators

The ability to control the fate of stem cells has major potential therapeutic
implications in areas such as cancer, neurodegenerative disease, and
tissue regeneration. Small molecules that can induce differentiation (or
dedifferentiation) are valuable tools for studying these processes and the
underlying signaling pathways that regulate them. Schultz has identified
several such molecules [55] by screening a multiscaffold DOS library of 45 140
druglike molecules [39] (Fig. 9.1-6(a)). Cell-based phenotypic assays have been
useful in identifying molecules that may act by novel mechanisms to elucidate
new signaling pathways that control differentiation.
Several molecules have been identified that induce differentiation of
pluripotent embryonic stem cells into particular tissue-specific adult stem
cells (Fig. 9.1-10). These adult stem cells have exciting therapeutic potential,
but have generally been difficult to obtain by direct isolation and expansion.
HTS was accomplished using pluripotent mouse carcinoma cell lines
transfected with reporter genes driven by lineage-specific markers. SAR
analysis and the ease of secondary tuning library synthesis again proved
useful for optimizing the initial hits. Differentiation-inducing activity was
further confirmed by immunostaining for additional neuronal or cardiac
muscle markers in the carcinoma cell line as well as mouse embryonic
stem cell lines. TWS119 was identified as a compound that induces
neurogenesis (EC50 ≈ 1 μM) [47] (Fig. 9.1-9(d)) while another series of
compounds, the cardiogenols, induce cardiomyogenesis (EC50 = 0.1-1.0 μM)
[48] (Fig. 9.1-9(e)). Affinity chromatography experiments identified glycogen
synthase kinase-3β (GSK-3β) as one target of TWS119 (Kd = 126 nM,
IC50 = 30 nM), supporting a role for this protein in neuronal differentiation.
Studies to identify the molecular targets of the cardiogenols are ongoing.
Another molecule, purmorphamine, was identified in a screen for molecules
that induce differentiation of multipotent mesenchymal stem cells into
osteoblasts (EC50 = 1 μM) [49] (Fig. 9.1-9(f)). HTS was accomplished using a
fluorescence-based enzymatic assay for the bone-specific marker, alkaline
phosphatase. Consistent with its osteogenic activity, purmorphamine also
upregulated Cbfa1/Runx2, a master regulator of bone development, and other
bone-specific markers. Subsequent transcriptional profiling experiments
revealed that purmorphamine upregulates the Hedgehog signaling pathway [55].
Conversely, dedifferentiation of tissue-specific progenitor cells could provide
another source of multipotent stem cells, which could then be retasked to
other lineages. This would be analogous to the process of tissue regeneration
observed in some amphibians. Along these lines, reversine has been identified
as a compound that induces dedifferentiation of myoblasts to multipotent
mesenchymal progenitor cells (complete at 5 µM) [50] (Fig. 9.1-9(g)). HTS
was accomplished using a two-stage assay involving initial treatment of
myoblasts with a compound to induce dedifferentiation, followed by exchange
into osteogenesis-inducing medium and assaying for alkaline phosphatase
expression as above, to detect osteoblast formation. The dedifferentiating
capacity of reversine was further confirmed by dedifferentiation of myoblasts
followed by redifferentiation to adipocytes, and by the inability of reversine to
induce direct transdifferentiation of myoblasts to osteoblasts. Efforts to identify
the molecular targets of reversine and to improve its potency and specificity
are ongoing.

[Figure 9.1-10: chemical structures of TWS119, cardiogenol A (R = NHPh),
cardiogenol B (R = OPh), cardiogenol C (R = OMe), cardiogenol D
(R = (E)-CH=CHPh), purmorphamine, and reversine.]
Fig. 9.1-10 Small molecule modulators of stem cell differentiation. Schultz has
discovered a number of small molecules that modulate stem cell differentiation
from a multiscaffold library of druglike heterocycles [39] (see Fig. 9.1-6(a)).
(a) TWS119 induces neurogenesis of mouse embryonic stem cells [47]. (b) The
cardiogenols induce cardiomyogenesis of mouse embryonic stem cells [48].
(c) Purmorphamine induces osteogenesis of mouse mesoderm fibroblast cells [49].
(d) Reversine induces dedifferentiation of lineage-specific murine myoblasts to
multipotent mesenchymal progenitor cells, which can then be induced to
differentiate into osteoblasts or adipocytes [50]. See Fig. 9.1-9 for
biological data.
9.1.4.4 Fexaramine and the Farnesoid X Receptor

The farnesoid X receptor (FXR) is a nuclear hormone receptor implicated in
the regulation of cholesterol metabolism. In response to bile acids, FXR is
thought to repress genes responsible for conversion of excess cholesterol to
bile acids and to induce genes involved in bile acid transport. However, bile
acids are low-affinity ligands for FXR. Thus, novel high-affinity ligands would
be useful probes to study the physiological functions of FXR and to evaluate
its potential as a new therapeutic target.
In the absence of protein structural information, Nicolaou and Evans
used a reporter gene assay to screen a DOS library of 10000 compounds
based on 2,2-dimethylbenzopyran, a privileged substructure that is found
in numerous natural products and has also been used in synthetic drugs
[51, 56]. This provided a number of moderate agonists (EC50 = 5-10 µM).
Through extensive SAR analysis and the synthesis and evaluation of several
secondary focused libraries, they identified fexaramine as a potent agonist
(EC50 = 25 nM), which no longer contained the benzopyran substructure
(Fig. 9.1-11). A fluorescence resonance energy transfer (FRET) assay was used
to confirm that fexaramine induces binding of FXR and the steroid receptor
coactivator (SRC-1). Fexaramine was further demonstrated to upregulate
known FXR target genes in FXR-expressing cell lines. However, it did not
activate a panel of other nuclear hormone receptors, indicating a high degree
of specificity.
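The potency gain described above can be checked with simple arithmetic. The following is only an illustrative sketch: the function name and the choice of nanomolar units are assumptions made here, and the inputs are the EC50 values quoted in the text (5-10 µM for the initial hits, 25 nM for fexaramine).

```python
def fold_improvement(ec50_lead_nm: float, ec50_optimized_nm: float) -> float:
    """Ratio of two EC50 values in the same units; >1 means the optimized compound is more potent."""
    if ec50_lead_nm <= 0 or ec50_optimized_nm <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_lead_nm / ec50_optimized_nm


# EC50 values quoted in the text, converted to nanomolar.
LEAD_EC50_RANGE_NM = (5_000.0, 10_000.0)  # initial hits, 5-10 micromolar
FEXARAMINE_EC50_NM = 25.0                 # fexaramine, 25 nanomolar

low, high = (fold_improvement(v, FEXARAMINE_EC50_NM) for v in LEAD_EC50_RANGE_NM)
print(f"Fexaramine is roughly {low:.0f}- to {high:.0f}-fold more potent than the initial hits")
```

On these figures, the secondary libraries improved potency by about two orders of magnitude.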
The genome-wide effects of fexaramine-induced FXR activation were then
evaluated in transcriptional profiling experiments (Fig. 9.1-9(h)). Strikingly,
fexaramine induced a distinct transcriptional profile compared to a bile acid,
indicating that the latter likely interacts with multiple signaling pathways.
Moreover, new potential roles for FXR in the bilirubin biosynthetic pathway,
thyroid metabolism, and amino acid transport were revealed. Fexaramine was
also cocrystallized with FXR (Fig. 9.1-9(i)) to gain structural insights into the
binding interactions, allowing a model for low-affinity binding by bile acids to
be proposed (Fig. 9.1-9(j)). Thus, this molecule identified from a DOS library
has proven to be a valuable tool for probing FXR structure and function.
9.1.4.5 Protein-Protein and Protein-DNA Interaction Antagonists

Historically, protein-protein and protein-DNA interactions have been
extremely difficult targets to address with synthetic druglike molecules, owing
in part to the large, flat, discontinuous binding surfaces often involved and
the lack of endogenous small molecule ligands to use as starting points
for rational design [32]. To address this important challenge, Boger has
synthesized a variety of natural product-like libraries that are based on
peptides, peptidomimetics, or other oligomeric natural products. Notably,
efficient solution phase syntheses and mixture deconvolution protocols were
developed to synthesize and screen these libraries.

[Figure 9.1-11: screening scheme. A 10,000-member library (R' = 9 scaffolds)
undergoes high-throughput screening to give lead compounds (EC50 = 5-10 µM);
repeated rounds of screening and secondary library synthesis then lead to
fexaramine (EC50 = 25 nM).]
Fig. 9.1-11 Fexaramine, a potent, highly specific nonsteroidal agonist of the
farnesoid X receptor. Nicolaou used a library of compounds built around the
privileged 2,2-dimethylbenzopyran substructure, which is found in a wide range
of natural products, to discover lead compounds that were moderate agonists of
the farnesoid X receptor [51, 56]. Synthesis and screening of multiple
secondary libraries provided extensive SAR data, ultimately leading to the
development of fexaramine as a potent agonist. Fexaramine proved to be highly
specific for activation of the FXR signaling pathway. See Fig. 9.1-9 for
biological data.

This approach has yielded an impressive collection of molecules that
inhibit both extracellular and intracellular protein-protein interactions, as
well as protein-DNA interactions [57]. In one particularly interesting case, a
series of isoindoline-based compounds were identified by Vogt and Boger
as inhibitors of the protein-protein interaction between the Myc and
Max transcription factors [52]. Myc is aberrantly activated in a number
of human cancers and acts by heterodimerization with Max via their
helix-loop-helix leucine zipper domains, leading to transcription of Myc
target genes.
Several different DOS libraries were screened using a biochemical FRET
assay, yielding four hits, including IIA6B17, from a 240-member library built
around a peptidomimetic isoindoline scaffold (Fig. 9.1-12). The activity of
the hits was further confirmed using enzyme-linked immunosorbent assays
(ELISA) and electrophoretic mobility shift assays (EMSA) (IIA6B17 ELISA
IC50 = 125 µM; EMSA IC50 = 50 µM). Two of the hits also inhibited cell
focus formation in Myc-transformed chicken embryo fibroblasts (IIA6B17
IC50 = 15-20 µM) (Fig. 9.1-10(k,l)). In control experiments, IIA6B17 also
inhibited focus formation in Jun-transformed cells, but not Src-transformed
cells, indicating a limited degree of biological specificity. While further
characterization of these inhibitors is necessary, this work demonstrated
the feasibility of inhibiting transcription factor protein-protein interactions
with small molecules. Such probes should be valuable tools for dissecting the
roles of these transcription factors in cancer and for evaluating their potential
as new therapeutic targets.

[Figure 9.1-12: scheme. Solution-phase synthesis from an isoindoline diester
gives a 240-member library; high-throughput screening identifies IIA6B17.]
Fig. 9.1-12 IIA6B17, a small molecule inhibitor of the Myc-Max protein-protein
interaction. Vogt and Boger identified IIA6B17 by screening a library built
around a peptidomimetic isoindoline scaffold [52]. A biochemical FRET assay was
used in the initial screen and the hits were analyzed further using ELISA,
EMSA, and cell foci formation assays. See Fig. 9.1-9 for biological data.

9.1.5 Future Development
DOS has provided a powerful arsenal of new small molecule probes to dissect
complex biological processes. It has also driven new advances in the field
of synthetic organic chemistry. In the continuing evolution of this field, the
current focus is on refining library design strategies so that new probes can be
identified as efficiently as possible given a particular biological target or system
of interest. For example, correlation of particular chemical scaffolds with
specific classes of biological targets will facilitate prioritization of appropriate
compounds to screen against these targets. Other targets may prove more
challenging, requiring ventures into new, uncharted regions of chemical
structure space. Systematic evaluation of various library design strategies
across a wide range of biological assays is on the horizon under the Molecular
Libraries Initiative of the National Institutes of Health [58]. Importantly, the
results of these experiments will be deposited into the publicly available
PubChem database (http://pubchem.ncbi.nlm.nih.gov/) to allow subsequent
statistical analyses through data mining. This will provide valuable information
for future efforts in library design.
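As a starting point for such data mining, PubChem records can be retrieved programmatically. The sketch below only builds a request URL for PubChem's PUG REST interface and performs no network access. The helper name and the default property list are assumptions chosen for illustration, and the example fetches basic compound properties rather than screening results.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that looks up a compound by name and requests selected properties as JSON."""
    if not name:
        raise ValueError("compound name must not be empty")
    prop_list = ",".join(properties)
    # quote() percent-encodes spaces and other characters that are unsafe in a URL path segment.
    return f"{PUG_REST_BASE}/compound/name/{quote(name)}/property/{prop_list}/JSON"


url = compound_property_url("fexaramine")
print(url)
```

The resulting URL can be passed to any HTTP client; whether a given probe name resolves depends on what has been deposited in the database.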
9.1.6 Conclusion
DOS is a powerful new approach to identifying new small molecule probes
to dissect complex biological systems. Both drug-like and natural
product-like libraries that target biologically relevant regions of chemical structure
space have proven useful for discovering such probes. New synthetic planning
strategies and new chemical methodologies have also been developed in the
context of DOS. Thus, the exciting potential of DOS in chemical biology has
now been demonstrated clearly. Further evolution and refinement of this field
can be expected in the coming years.
Acknowledgments
Generous financial support for my laboratory has been provided by the
NIH (P41 GM076267, R21 CA 104685), CDMRP (CM030085), NYSTAR
James D. Watson Investigator Program, William Randolph Hearst Fund in
Experimental Therapeutics, Mr. William H. Goodwin and Mrs. Alice Goodwin
and the Commonwealth Foundation for Cancer Research, and Experimental
Therapeutics Center of MSKCC.
References
1. J.S. Potuzak, S.B. Moilanen, D.S. Tan, Discovery and applications of small molecule probes for studying biological processes, Biotechnol. Genet. Eng. Rev. 2004, 21, 11-77.
2. D.S. Tan, Diversity-oriented synthesis: exploring the intersections between chemistry and biology, Nat. Chem. Biol. 2005, 1, 74-84.
3. R.B. Merrifield, Solid phase peptide synthesis. I. The synthesis of a tetrapeptide, J. Am. Chem. Soc. 1963, 85, 2149-2154.
4. F. Guillier, D. Orain, M. Bradley, Linkers and cleavage strategies in solid-phase organic synthesis and combinatorial chemistry, Chem. Rev. 2000, 100, 2091-2157.
5. C.C. Tzschucke, C. Markert, W. Bannwarth, S. Roller, A. Hebel, R. Haag, Modern separation techniques for efficient workup in organic synthesis, Angew. Chem. Int. Ed. Engl. 2002, 41, 3964-4000.
6. A. Kirschning, H. Monenschein, R. Wittenberg, Functionalized polymers - emerging versatile tools for solution-phase chemistry and automated parallel synthesis, Angew. Chem. Int. Ed. Engl. 2001, 40, 650-679.
7. J.G. Garcia, Scavenger resins in solution-phase combichem, Methods Enzymol. 2003, 369, 391-412.
8. X. Li, D.R. Liu, DNA-templated organic synthesis: Nature’s strategy for controlling chemical reactivity applied to synthetic molecules, Angew. Chem. Int. Ed. Engl. 2004, 43, 4848-4870.
9. H.M. Geysen, R.H. Meloen, S.J. Barteling, Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid, Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 3998-4002.
10. R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 5131-5135.
11. A. Furka, F. Sebestyen, M. Asgedom, G. Dibo, General method for rapid synthesis of multicomponent peptide mixtures, Int. J. Pept. Protein Res. 1991, 37, 487-493.
12. K.S. Lam, S.E. Salmon, E.M. Hersh, V.J. Hruby, W.M. Kazmierski, R.J. Knapp, A new type of synthetic peptide library for identifying ligand-binding activity, Nature 1991, 354, 82-84.
13. R.L. Affleck, Solutions for library encoding to create collections of discrete compounds, Curr. Opin. Chem. Biol. 2001, 5, 257-263.
14. R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 5131-5135.
15. J.A. Ellman, Design, synthesis, and evaluation of small-molecule libraries, Acc. Chem. Res. 1996, 29, 132-143.
16. S. Hobbs DeWitt, J.S. Kiely, C.J. Stankovic, M.C. Schroeder, D.M. Reynolds Cody, M.R. Pavia, “Diversomers”: an approach to nonpeptide, nonoligomeric chemical diversity, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 6909-6913.
17. D.S. Tan, M.A. Foley, M.D. Shair, S.L. Schreiber, Stereoselective synthesis of over two million compounds having structural features both reminiscent of natural products and compatible with miniaturized cell-based assays, J. Am. Chem. Soc. 1998, 120, 8565-8566.
18. M.A. Marx, A.-L. Grillot, C.T. Louer, K.A. Beaver, P.A. Bartlett, Synthetic design for combinatorial chemistry. Solution and polymer-supported synthesis of polycyclic lactams by intramolecular cyclization of azomethine ylides, J. Am. Chem. Soc. 1997, 119, 6153-6167.
19. T.A. Keating, R.W. Armstrong, Postcondensation modifications of Ugi four-component condensation products: 1-isocyanocyclohexene as a convertible isocyanide. Mechanism of conversion, synthesis of diverse structures, and demonstration of resin capture, J. Am. Chem. Soc. 1996, 118, 2574-2583.
20. P.A. Tempest, R.W. Armstrong, Cyclobutenedione derivatives on solid support: toward multiple core structure libraries, J. Am. Chem. Soc. 1997, 119, 7607-7608.
21. D.A. Erlanson, R.S. McDowell, T. O’Brien, Fragment-based drug discovery, J. Med. Chem. 2004, 47, 3463-3482.
22. B.K. Shoichet, Virtual screening of chemical libraries, Nature 2004, 432, 862-865.
23. D.B. Kitchen, H. Decornez, J.R. Furr, J. Bajorath, Docking and scoring in virtual screening for drug discovery: methods and applications, Nat. Rev. Drug Discov. 2004, 3, 935-949.
24. C.M. Dobson, Chemical space and biology, Nature 2004, 432, 824-828.
25. B.E. Evans, K.E. Rittle, M.G. Bock, R.M. DiPardo, R.M. Freidinger, W.L. Whitter, G.F. Lundell, D.F. Veber, P.S. Anderson, et al., Methods for drug discovery: development of potent, selective, orally effective cholecystokinin antagonists, J. Med. Chem. 1988, 31, 2235-2246.
26. R.W. DeSimone, K.S. Currie, S.A. Mitchell, J.W. Darrow, D.A. Pippin, Privileged structures: applications in drug discovery, Comb. Chem. High Throughput Screen. 2004, 7, 473-493.
27. M.A. Koch, R. Breinbauer, H. Waldmann, Protein structure similarity as guiding principle for combinatorial library design, Biol. Chem. 2003, 384, 1265-1272.
28. M. Feher, J.M. Schmidt, Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry, J. Chem. Inf. Comput. Sci. 2003, 43, 218-227.
29. D.A. Horton, G.T. Bourne, M.L. Smythe, The combinatorial synthesis of bicyclic privileged structures or privileged substructures, Chem. Rev. 2003, 103, 893-930.
30. M.S. Lajiness, M. Vieth, J. Erickson, Molecular properties that influence oral drug-like behavior, Curr. Opin. Drug Discov. Devel. 2004, 7, 470-477.
31. M.M. Hann, T.I. Oprea, Pursuing the leadlikeness concept in pharmaceutical research, Curr. Opin. Chem. Biol. 2004, 8, 255-263.
32. M.R. Arkin, J.A. Wells, Small-molecule inhibitors of protein-protein interactions: progressing towards the dream, Nat. Rev. Drug Discov. 2004, 3, 301-317.
33. L. Pagliaro, J. Felding, K. Audouze, S.J. Nielsen, R.B. Terry, C. Krog-Jensen, S. Butcher, Emerging classes of protein-protein interaction inhibitors and new tools for their development, Curr. Opin. Chem. Biol. 2004, 8, 442-449.
34. S. Shang, D.S. Tan, Advancing chemistry and biology through diversity-oriented synthesis of natural product-like libraries, Curr. Opin. Chem. Biol. 2005, 9, 248-258.
35. D.S. Tan, Current progress in natural product-like libraries for discovery screening, Comb. Chem. High Throughput Screen. 2004, 7, 631-643.
36. S.L. Schreiber, Target-oriented and diversity-oriented organic synthesis in drug discovery, Science 2000, 287, 1964-1969.
37. M.D. Burke, S.L. Schreiber, A planning strategy for diversity-oriented synthesis, Angew. Chem. Int. Ed. Engl. 2004, 43, 46-58.
38. P. Selzer, H.-J. Roth, P. Ertl, A. Schuffenhauer, Complex molecules: do they add value? Curr. Opin. Chem. Biol. 2005, 9, 310-316.
39. S. Ding, N.S. Gray, X. Wu, Q. Ding, P.G. Schultz, A combinatorial scaffold approach toward kinase-directed heterocycle libraries, J. Am. Chem. Soc. 2002, 124, 1594-1596.
40. O. Kwon, S.B. Park, S.L. Schreiber, Skeletal diversity via a branched pathway: efficient synthesis of 29,400 discrete, polycyclic compounds and their arraying into stock solutions, J. Am. Chem. Soc. 2002, 124, 13402-13404.
41. M.D. Burke, E.M. Berger, S.L. Schreiber, Generating diverse skeletons of small molecules combinatorially, Science 2003, 302, 613-618.
42. L. Xue, F.L. Stahura, J. Bajorath, Cell-based partitioning, Methods Mol. Biol. 2004, 275, 279-289.
43. P. Ertl, B. Rohde, P. Selzer, Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties, J. Med. Chem. 2000, 43, 3714-3717.
44. F.G. Kuruvilla, A.F. Shamji, S.M. Sternson, P.J. Hergenrother, S.L. Schreiber, Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays, Nature 2002, 416, 653-657.
45. S.J. Haggarty, K.M. Koeller, J.C. Wong, R.A. Butcher, S.L. Schreiber, Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays, Chem. Biol. 2003, 10, 383-396.
46. J.C. Wong, R. Hong, S.L. Schreiber, Structural biasing elements for in-cell histone deacetylase paralog selectivity, J. Am. Chem. Soc. 2003, 125, 5586-5587.
47. S. Ding, T.Y.H. Wu, A. Brinker, E.C. Peters, W. Hur, N.S. Gray, P.G. Schultz, Synthetic small molecules that control stem cell fate, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 7632-7637.
48. X. Wu, S. Ding, Q. Ding, N.S. Gray, P.G. Schultz, Small molecules that induce cardiomyogenesis in embryonic stem cells, J. Am. Chem. Soc. 2004, 126, 1590-1591.
49. X. Wu, S. Ding, Q. Ding, N.S. Gray, P.G. Schultz, A small molecule with osteogenesis-inducing activity in multipotent mesenchymal progenitor cells, J. Am. Chem. Soc. 2002, 124, 14520-14521.
50. S. Chen, Q. Zhang, X. Wu, P.G. Schultz, S. Ding, Dedifferentiation of lineage-committed cells by a small molecule, J. Am. Chem. Soc. 2004, 126, 410-411.
51. M. Downes, M.A. Verdecia, A.J. Roecker, R. Hughes, J.B. Hogenesch, H.R. Kast-Woelbern, M.E. Bowman, J.-L. Ferrer, A.M. Anisfeld, P.A. Edwards, J.M. Rosenfeld, J.G.A. Alvarez, J.P. Noel, K.C. Nicolaou, R.M. Evans, A chemical, genetic, and structural analysis of the nuclear bile acid receptor FXR, Mol. Cell 2003, 11, 1079-1092.
52. T. Berg, S.B. Cohen, J. Desharnais, C. Sonderegger, D.J. Maslyar, J. Goldberg, D.L. Boger, P.K. Vogt, Small-molecule antagonists of Myc/Max dimerization inhibit Myc-induced transformation of chicken embryo fibroblasts, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 3830-3835.
53. S.J. Haggarty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
54. C. Hubbert, A. Guardiola, R. Shao, Y. Kawaguchi, A. Ito, A. Nixon, M. Yoshida, X.-F. Wang, T.-P. Yao, HDAC6 is a microtubule-associated deacetylase, Nature 2002, 417, 455-458.
55. S. Ding, P.G. Schultz, A role for chemistry in stem cell biology, Nat. Biotechnol. 2004, 22, 833-840.
56. K.C. Nicolaou, R.M. Evans, A.J. Roecker, R. Hughes, M. Downes, J.A. Pfefferkorn, Discovery and optimization of non-steroidal FXR agonists from natural product-like libraries, Org. Biomol. Chem. 2003, 1, 908-920.
57. D.L. Boger, J. Desharnais, K. Capps, Solution-phase combinatorial libraries: modulating cellular signaling by targeting protein-protein or protein-DNA interactions, Angew. Chem. Int. Ed. Engl. 2003, 42, 4138-4176.
58. C.P. Austin, L.S. Brady, T.R. Insel, F.S. Collins, Policy forum: molecular biology: NIH molecular libraries initiative, Science 2004, 306, 1138-1139.
59. W. Zhang, Fluorous technologies for solution-phase high-throughput organic synthesis, Tetrahedron 2003, 59, 4475-4489.
60. T. Hideshima, J.E. Bradner, J. Wong, D. Chauhan, P. Richardson, S.L. Schreiber, K.C. Anderson, Small-molecule inhibition of proteasome and aggresome function induces synergistic antitumor activity in multiple myeloma, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 8567-8572.
9.2 Combinatorial Biosynthesis of Polyketides and Nonribosomal Peptides
Nathan A. Schnarr and Chaitan Khosla
Outlook
The pursuit of novel biologically active molecules remains a difficult and
critical challenge at the forefront of the chemistry-biology interface. Nature
provides a vast array of chemical scaffolds on which to build diversity and
functionality. This chapter outlines the advances in the area of chemical
and genetic manipulation of the biosynthetic machinery responsible for the
production of polyketide and nonribosomal peptide natural products. We
hope to familiarize the reader with important developments and remaining
challenges in this area as well as demonstrate the enormous potential that lies
ahead for chemists, biologists, and engineers alike.
9.2.1 Introduction
As the need for new and improved pharmaceutical and material-based
compounds continues to grow, it is abundantly clear that cooperation among
scientists with very diverse backgrounds is essential to meet the demand.
As questions become increasingly complex, we rely more heavily on nature
to provide insight. This is especially true in the area of drug discovery and
design where biological systems offer an inordinate amount of important
information regarding structure-activity relationships in small molecules. In
many cases, the organism has accomplished the difficult task of creating the
appropriate chemical scaffold and it is left to the researcher to optimize for
a particular target. Reprogramming the biosynthetic machinery responsible
for assembling these molecules offers unmatched potential for production of
useful natural product analogs.
Polyketides and nonribosomal peptides are an important class of compounds,
which display a wide range of properties including antibiotic
(erythromycin, vancomycin), immunosuppressant (rapamycin), and antitumor
(epothilone) activities [1, 2] (Fig. 9.2-1). Although the specific building
blocks that make up these structurally diverse molecules vary widely, their
biosynthetic pathways remain highly conserved and readily deconstructed.
Significant efforts have focused on understanding the basic processes associated
with polyketide and nonribosomal peptide syntheses, resulting in several
successful reprogramming attempts to create “unnatural” natural products.
To understand better the rationale behind these biosynthetic manipulations,
we need to first become familiar with the mechanisms involved in constructing
these molecules.

[Figure 9.2-1: chemical structures of actinorhodin, tetracenomycin,
erythromycin A, tyrocidine A, and surfactin A.]
Fig. 9.2-1 Polyketide and nonribosomal peptide structures described in the
text. Erythromycin A is produced by a modular polyketide synthase. Actinorhodin
and tetracenomycin are constructed via aromatic polyketide synthases.
Surfactin A and tyrocidine A are produced through nonribosomal peptide
synthetases.

Polyketides are generally separated into two common classes on the basis
of the precise organization of biosynthetic enzymes [3-6]. Multimodular
(type I) polyketide synthases (PKSs) consist of large polypeptides containing
individual, covalently tethered modules responsible for single ketide-unit
elongation of the growing chain. The specific arrangement of these modules
directly determines the structural and stereochemical outcome of the final
product. In contrast, type II PKSs, primarily involved in biosynthesis of
aromatic compounds, function through iterative cycling of the growing
polyketide chain between noncovalently interacting enzymes. Product size
is ultimately determined by a chain length factor (CLF) associated with the
clustered enzymatic domains. Although subtle, this mechanistic distinction
produces vastly different structures and each will be discussed separately.

[Figure 9.2-2: reaction scheme showing the KS, AT, KR, and ACP domains of
adjacent modules with their thioester-bound intermediates.]
Fig. 9.2-2 Proposed mechanism for polyketide formation in modular PKSs.
(A) Substrate is transferred to KS from upstream ACP. (B) AT is loaded with
methylmalonyl extender unit. (C) Extender unit is transferred to downstream
ACP. Claisen-like condensation between diketide and extender unit produces
ACP-bound triketide. KR domain reduces β-ketothioester to β-hydroxy thioester.
KS - ketosynthase, AT - acyltransferase, KR - ketoreductase,
ACP - acyl carrier protein.

As stated, modular PKSs function through cooperation of large, multienzyme
polypeptides. Primer units, which vary widely from simple acetate/propionate
to complex aromatic acids, are loaded onto the ketosynthase (KS) domain of
the first module via thioester linkage to the active-site cysteine (Fig. 9.2-2).
The next sequential (downstream) acyl carrier protein (ACP) receives a specific
extender unit, usually derived from malonyl- or methylmalonyl-CoA, from
the appropriate acyl transferase (AT) domain. A Claisen-like decarboxylative
condensation between the primer and extender units affords an ACP-bound
β-ketothioester. The ultimate oxidation state and stereochemical configuration
of the intermediates are determined by collaboration of optional ketoreductase
(KR), dehydratase (DH), and enoyl reductase (ER) domains while docked at
the ACP. Once fully processed, the extended chain is passed to the KS of
the subsequent module by a transthioesterification reaction. The process is
repeated, leading to the final module where the product is generally excised
via hydrolysis or thioesterase (TE)-mediated macrocyclization.
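The module-by-module logic described above can be captured in a small model. This is a minimal sketch, not a chemical simulator: the class and function names are hypothetical, the three-module line in the usage example is invented for illustration, and stereochemistry is ignored. It assumes that the optional domains act in the order KR, then DH, then ER, so that a module lacking an earlier domain leaves the β-carbon at the corresponding earlier oxidation state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PKSModule:
    extender: str                        # extender unit selected by the AT domain
    reductive: frozenset = frozenset()   # optional domains present: "KR", "DH", "ER"


def beta_carbon_state(reductive: frozenset) -> str:
    """State of the beta-carbon after the optional domains act (assumed order KR -> DH -> ER)."""
    if "KR" not in reductive:
        return "ketone"      # beta-ketothioester is left unreduced
    if "DH" not in reductive:
        return "hydroxyl"    # KR only
    if "ER" not in reductive:
        return "alkene"      # KR + DH
    return "methylene"       # KR + DH + ER


def run_assembly_line(primer: str, modules: list) -> list:
    """Return one record per elongation step, in assembly-line order."""
    chain_units = [primer]
    steps = []
    for number, module in enumerate(modules, start=1):
        # Each decarboxylative condensation adds exactly one ketide unit to the chain.
        chain_units.append(module.extender)
        steps.append({
            "module": number,
            "extender": module.extender,
            "units_in_chain": len(chain_units),
            "beta_carbon": beta_carbon_state(module.reductive),
        })
    return steps


# Hypothetical three-module line, for illustration only.
toy_line = [
    PKSModule("methylmalonyl", frozenset({"KR"})),
    PKSModule("malonyl"),
    PKSModule("methylmalonyl", frozenset({"KR", "DH", "ER"})),
]
for step in run_assembly_line("propionyl", toy_line):
    print(step)
```

Because each module is an independent record, changing one entry in the list is the in-silico analog of the domain and module manipulations discussed later in this chapter.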
The less clearly understood aromatic PKSs utilize a single KS(CLF)/ACP
pair capable of multiple elongation reactions to construct the complete
polyketide backbone. The number of elongation events is controlled by the CLF
associated with the KS domain. Transthioesterification and decarboxylative
condensation reactions proceed in an analogous fashion to modular systems.
The ultimate topology of advanced aromatic polyketides is controlled by
precise combination of tailoring enzymes responsible for redox chemistry and
cyclization pattern.
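The iterative logic of an aromatic PKS reduces to a simple loop. The sketch below is an illustration under stated assumptions: it treats the number of elongation cycles as the quantity fixed by the CLF, assumes a two-carbon (acetate-derived) starter, and assumes each malonyl-derived extender contributes two backbone carbons after decarboxylation. The function and parameter names are hypothetical.

```python
def iterative_elongation(n_cycles: int, starter_carbons: int = 2,
                         carbons_per_extender: int = 2) -> int:
    """Backbone carbon count after n_cycles passes through a single KS(CLF)/ACP pair."""
    if n_cycles < 0:
        raise ValueError("number of elongation cycles cannot be negative")
    carbons = starter_carbons
    for _ in range(n_cycles):
        # One decarboxylative condensation per cycle; CO2 is lost, so only
        # two carbons of the malonyl unit remain in the backbone.
        carbons += carbons_per_extender
    return carbons


for cycles in (7, 9):
    print(f"{cycles} elongation cycles -> {iterative_elongation(cycles)}-carbon backbone")
```

In this model the same enzyme pair produces backbones of different length solely by changing the cycle count, which is the role the text assigns to the CLF.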
Analogous to polyketide biosynthesis, nonribosomal peptide natural products
are produced by nonribosomal peptide synthetase (NRPS) assembly lines.
A thioester template similar to the PKS systems is employed but with very
different extender units. In place of simple malonate and substituted malonate
groups, NRPSs utilize amino acids (proteinogenic and nonproteinogenic) as
their aminoacyl-AMP derivatives for chain extension. Minimal NRPSs consist
of an adenylation domain (A), peptidyl carrier protein (PCP) or thiolation
domain (T), and a condensation domain (C). The A domain is responsible
for loading the PCP or T domain with the appropriate aminoacyl component.
The condensation domain then catalyzes the peptide bond formation between
flanking aminoacyl-PCP/T domains. Auxiliary domains including methylation
(M), epimerization (E), cyclization (Cy), and TEs combine to control peptide
topology and functionality similar to aromatic PKS assemblies.
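The NRPS domain logic can be sketched in the same way. The code below is a simplified illustration with hypothetical names: each module is reduced to the amino acid its A domain loads plus any auxiliary domains, an E domain is assumed to deliver the D-configured residue, an M domain is assumed to N-methylate it, and one peptide bond is counted between each pair of neighboring modules. The three-residue example is invented and does not correspond to a natural product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NRPSModule:
    amino_acid: str          # residue activated by the A domain and loaded onto the PCP/T domain
    auxiliary: tuple = ()    # optional tailoring domains, e.g. ("E",) or ("M",)


def tailored_residue(module: NRPSModule) -> str:
    """Label a residue according to the auxiliary domains present in its module."""
    residue = module.amino_acid
    if "M" in module.auxiliary:
        residue = "N-Me-" + residue   # methylation domain (assumed N-methylation)
    if "E" in module.auxiliary:
        residue = "D-" + residue      # epimerization domain (assumed L to D conversion)
    return residue


def assemble(modules: list) -> tuple:
    """Return the linear peptide string and the number of C-domain-catalyzed peptide bonds."""
    residues = [tailored_residue(m) for m in modules]
    peptide_bonds = max(len(residues) - 1, 0)
    return "-".join(residues), peptide_bonds


# Hypothetical three-module assembly line, for illustration only.
peptide, bonds = assemble([
    NRPSModule("Phe", ("E",)),
    NRPSModule("Pro"),
    NRPSModule("Val", ("M",)),
])
print(peptide, bonds)
```

Swapping the amino acid or auxiliary tuple of a single module changes only the corresponding residue of the output, mirroring the modular reprogramming idea.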
An increasing number of “hybrid” systems containing both NRPS and PKS
components are being identified. The compatibility of these systems speaks
to the mechanistic similarities and offers an additional level of potential
regarding genetic and chemical reprogramming. Despite the many lingering
questions concerning nonribosomal peptide and polyketide syntheses (vide
infra), our current level of understanding provides numerous possibilities for
combinatorial biosynthesis. It is clear that deciphering the elaborate interplay
between chemistry and biology that governs the reactivity in these systems will
require innovative thought and experimentation.
In the simplest of terms, manipulation of polyketide and nonribosomal
peptide components involves alteration of materials, tools, or both. From a
chemical standpoint, modification of building blocks can ideally result in
structures limited only by our imagination. Biologically, genetic control over
biosynthetic machinery could allow, theoretically, for boundless reprogramming
capabilities. Realistically, insight from both perspectives will be required
as enzyme selectivity and reactivity can impede combinatorial prospects.
With a basic understanding of the intricate construction of polyketides
and nonribosomal peptides, we can discuss the potential for biosynthetic
generation of analogous compounds. Chemical synthesis provides a powerful
approach to this end. Modification of simple reagents incorporated into these
elaborate scaffolds opens possibilities for customized tailoring of structure and
functionality. In addition, subjecting more advanced intermediates to specific
sets of enzymes allows for additional chemical variation and control. This
approach may permit circumvention of highly selective enzymes that limit
processing capabilities.
Genetic manipulation of macromolecular components offers promise as
an orthogonal method of analog production. In lieu of chemical synthesis,
the biosynthetic machinery itself can be redirected to produce novel
compounds. Numerous approaches can be considered, including physical
swapping of domains or modules, addition or inactivation of tailoring
enzymes, and alteration of product release. The significant challenge to these
methods, thoroughly discussed later in this chapter, involves optimization of
protein-protein recognition elements to achieve usable kinetic parameters for
product transfer.
Combating inherent selectivity, both small molecule and macromolecular,
will likely require combinations of the above methods. Subtle changes in
polyketide structure may necessitate reconstruction of synthase components.
Each case will provide important advances and significant obstacles. As we will
see, progress toward true combinatorial biosynthesis continues to advance and
with it, our understanding of polyketide and nonribosomal peptide synthesis
on the whole.
9.2.2 History/Development
For the past few decades, efforts toward combinatorial biosynthesis of
polyketides and nonribosomal peptides have primarily focused on determining
enzyme reactivity and specificity in truncated synthases [7-12]. Given the
enormous size of the intact systems, obtaining information about individual
steps would prove challenging. Despite this, several successful attempts at
producing full-length products have been realized. This section will highlight
some of these accomplishments for each class of molecule described above.
Most of our knowledge regarding modular PKSs comes from the work on the
6-deoxyerythronolide B synthase (DEBS) that is responsible for production of
the 14-membered macrolactone precursor to erythromycin [3]. This relatively
small PKS is composed of three polypeptides (DEBS1, DEBS2, and DEBS3),
each of which contains two distinct modules (Fig. 9.2-3). In addition, DEBS1
possesses a loading didomain, which specifically transfers the propionate
group to the KS of module 1. Module 6 bears a TE domain responsible for
cyclization of the full-length polyketide. Recognition domains, termed linker
regions, control the precise arrangement of the individual polypeptides [13-18].
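The organization of DEBS described above can be written down as a small data structure. The following sketch encodes only what the text states (three polypeptides, two modules each, a loading didomain on DEBS1, a TE on module 6). The dictionary layout and function names are hypothetical, and the carbon bookkeeping assumes a three-carbon propionate starter and methylmalonyl extenders that each add two backbone carbons plus one methyl branch.

```python
# Layout of the synthase as described in the text.
DEBS = {
    "DEBS1": {"loading_didomain": True, "modules": [1, 2], "thioesterase": False},
    "DEBS2": {"loading_didomain": False, "modules": [3, 4], "thioesterase": False},
    "DEBS3": {"loading_didomain": False, "modules": [5, 6], "thioesterase": True},
}


def count_modules(synthase: dict) -> int:
    """Total number of elongation modules across all polypeptides."""
    return sum(len(protein["modules"]) for protein in synthase.values())


def carbon_counts(n_modules: int, starter_carbons: int = 3) -> tuple:
    """Return (backbone carbons, total carbons) under the stated assumptions."""
    backbone = starter_carbons + 2 * n_modules   # two backbone carbons per extender
    total = backbone + n_modules                 # plus one methyl branch per extender
    return backbone, total


n = count_modules(DEBS)
backbone, total = carbon_counts(n)
print(f"{n} modules -> {backbone} backbone carbons, {total} carbons in total")
```

Six elongation steps on a propionate primer therefore give a seven-unit chain, which the terminal TE cyclizes and releases.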

[Figure 9.2-3: schematic of the DEBS1, DEBS2, and DEBS3 polypeptides and their
domains.]
Fig. 9.2-3 Schematic diagram of 6-deoxyerythronolide B synthase (DEBS).
The synthase consists of three separate polypeptides composed of two modules
each, which are responsible for a single ketide-unit elongation of the
6-deoxyerythronolide B backbone. The terminal TE domain is responsible for
cyclization and release of the fully elongated product. LDD - loading didomain,
KS - ketosynthase, AT - acyltransferase, KR - ketoreductase,
ER - enoyl reductase, DH - dehydratase, ACP - acyl carrier protein,
TE - thioesterase.

Early studies showed high selectivity for the natural propionate starter
unit on the loading didomain of DEBS. Slight alterations in chain length
(acetyl or butyryl) resulted in significantly lower incorporation rates relative
to propionate [19]. More complex substrates including benzoyl, phenylacetyl,
and β-hydroxybutyryl displayed little to no relative loading propensity. To
circumvent this obvious difficulty, a strategy was employed where the KS of
module 1 was inactivated through site-directed mutagenesis. This allowed for
direct incorporation of a phenyl analog of the natural diketide resulting in
production of 14-phenyl-6-deoxyerythronolide B [20].
The AT domain, responsible for selecting suitable extender units, has also
been shown to possess high substrate specificity. To address the challenge
of incorporating unnatural extender units, methylmalonyl-specific DEBS
AT domains have been replaced with malonyl- and ethylmalonyl-specific
AT domains to generate novel macrolactones [21-23]. Production of
6-desmethyl-6-ethylerythromycin A, from the ethylmalonyl-specific AT domain
replacement, required increased levels of intracellular ethylmalonyl-CoA. The
authors attribute this to competitive loading with methylmalonyl-CoA,
suggesting some level of relaxed substrate specificity in the heterologous
AT domain.
Several successful attempts at altering the extent of reduction have been
completed through mutagenic inactivation of KR, DH, and ER domains
[24-26]. The difficult task of adding these domains where they are absent
has been accomplished through generation of hybrid modules. Santi and
coworkers were able to control the ultimate oxidation state of 6-deoxyerythronolide
B analogs by genetic insertion of redox-active domains from the rapamycin
synthase into various DEBS modules [27]. Interestingly, some modifications
resulted in incomplete reduction of intermediates, possibly due to competition
between reduction and chain transfer to the downstream module. This
observation underscores the delicate reactivity balance that must be addressed
when combining domains and modules not naturally associated with one
another.
Attempts at altering polyketide chain length have resulted in a number
of abridged lactones. By repositioning the thioesterase (TE) domain in DEBS to the
C-terminal end of module 5, a 12-membered macrolactone analog of 10-
deoxymethynolide, the aglycon precursor to methymycin, was produced [28].
This study revealed the propensity for TE cyclization of nonnatural substrates,
which has since been used to permit multiple turnover experiments using
single, isolated modules. In contrast, the stand-alone TE domain exhibits
increased selectivity relative to those fused to various modules, indicating a
possible change in the mechanism [29].
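
The ring sizes quoted above follow from simple carbon counting, which can be sketched as below. This is only an illustrative calculation, not a method taken from the cited studies: the function names are hypothetical, and the sketch assumes a propionate starter (three carbons), two backbone carbons added per extension module, and lactonization through the hydroxyl on the starter-derived carbon, so that the two remaining starter carbons stay outside the ring.

```python
# Simplified carbon counting for a DEBS-type macrolactone.
# Assumptions (labeled, not from the source): propionate starter,
# two backbone carbons per extension module, and ring closure through
# the hydroxyl on the carbon that was the starter's carbonyl carbon.

STARTER_CARBONS = 3        # propionate starter unit
CARBONS_PER_EXTENSION = 2  # backbone carbons added by each module
EXOCYCLIC_STARTER_CARBONS = 2  # the ethyl group left outside the ring


def backbone_carbons(n_modules: int) -> int:
    """Carbons in the linear chain after n_modules extension steps."""
    return STARTER_CARBONS + CARBONS_PER_EXTENSION * n_modules


def macrolactone_ring_size(n_modules: int) -> int:
    """Ring atoms: in-ring carbons plus the single ester oxygen."""
    ring_carbons = backbone_carbons(n_modules) - EXOCYCLIC_STARTER_CARBONS
    return ring_carbons + 1


if __name__ == "__main__":
    for n in (6, 5):
        print(n, "modules ->", macrolactone_ring_size(n), "membered ring")
```

Under these assumptions, six modules give the 14-membered ring of 6-deoxyerythronolide B, and moving the TE to the end of module 5 gives the 12-membered analog described in the text.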
In contrast to the modular systems, our understanding of aromatic PKSs
remains largely undeveloped. However, this area does benefit from several
high-resolution crystal and solution structures of individual domains, which
provide enormous insight into enzyme specificity and mechanism [30-34].
The ability to program specific polymerization parameters promises readily
accessible structure variation. By simply choosing an appropriate starter unit
and polyketide length determinant, arrays of small aromatic molecules could
potentially be designed.
To elucidate the precise role of the CLF, the chain length specificity in
the actinorhodin (act) and tetracenomycin (tcm) PKSs was effectively altered
by site-specific mutagenesis of the CLF [35]. For this, residues associated
with the KS-CLF dimer interface (as determined from crystallographic data)
were compared across a number of aromatic PKSs that specifically produce
polyketide backbones ranging from C16 to C24. Mutation of two key residues
in the CLF enabled the production of decaketide products in the typically
octaketide-specific act system. Similarly, single point mutation of the
wild-type tcm CLF effected conversion of a decaketide synthase to an octaketide
one. Importantly, overall polyketide yields in these mutant systems were
comparable to the natural synthases, indicating no significant influence on
enzyme reactivity.
Some aromatic polyketides including frenolicin and R1128 are derived
from nonacetate starter units, which require a unique primer module for
their incorporation into the iterative portion of the PKS [11]. Tang et al.
have recently combined the R1128 priming module with the actinorhodin
or tetracenomycin minimal PKS in an attempt to generate novel aromatic
polyketide structures [36-38]. The engineered bimodular PKS could efficiently
produce novel hexaketides (act), octaketides (tcm), and decaketides (pms)
bearing propionyl and isobutyryl primer units in place of acetyl primers
(Fig. 9.2-4). KR, aromatase, and cyclase enzymes could effectively recognize
and modify these nonnative substrates, indicating that specificity arises from
functional group recognition rather than polyketide chain length. This could
potentially allow for generation of large libraries of related, fully processed
aromatic compounds via simple, bimodular synthases.

Fig. 9.2-4 Production of aromatic polyketide analogs. Combining
the R1128 loading module with act minimal PKS produces a novel
biaromatic polyketide. See text for domain abbreviations,
MAT - malonyl acyltransferase. Labels in the original figure: R1128 loading
module; C16 minimal PKS; ZhuC, ZhuH, ZhuG; act KS-CLF; KR, DH, ER; ZhuN, MAT.

Efforts toward reprogramming NRPSs have closely resembled those
for polyketides. Through chemical modification of building blocks and
rearrangement of biosynthetic scaffolds, the fundamental rules governing
nonribosomal peptide synthesis are gradually being deciphered [8, 10].
Increased substrate complexity within these systems, relative to PKSs,
underscores the potential for developing elaborate functionality yet unmatched
amongst polyketide structures. However, more sophisticated substrates often
bring with them challenges concerning enzyme specificity and synthetic
feasibility.
Early efforts toward novel nonribosomal peptide production focused on
module replacement in the surfactin (srf) NRPS system. Marahiel and
coworkers genetically replaced the leucine-incorporating A-T components
of module 2 and module 7 with A-T components specific for cysteine
(from δ-aminoadipyl-cysteinyl-D-valine, ACV synthetase) and ornithine (from
gramicidin S synthetase) respectively [39, 40]. Although surfactin analogs
containing the predicted amino acid alterations were identified, their yields,
relative to wild-type production of surfactin, were significantly reduced. This
again underscores the importance of understanding the consequences of
mismatched protein-protein interfaces when engineering heterologous or
hybrid synthases.
The isolated TE domain from the tyrocidine (tyc) NRPS has recently been
shown to catalyze the macrocyclization of unnatural substrates to generate a
variety of cyclic peptides. In conjunction with standard solid-phase peptide
synthesis, Walsh and coworkers demonstrated a broad substrate tolerance for
peptidyl-N-acetylcysteamine thioesters by the tyrocidine TE [41, 42]. Cyclization
of peptide analogs, where individual amino acids were replaced with ethylene
glycol units, was observed with high efficiency. In addition, hydroxyacid starter
units were readily cyclized by the isolated TE domain to form nonribosomal
peptide-derived macrolactones. More recently, Walsh and coworkers have
demonstrated effective cyclization of PEGA resin-bound peptide/polyketide
hybrids by the tyrocidine TE domain [43]. The pantetheine mimic used for
covalent attachment of small molecules to the resin serves as an appropriate
recognition element for the enzyme. As peptide macrocyclizations remain
challenging in the absence of enzymatic assistance, this approach promises
facile construction of previously unattainable structures.
9.2.3
To achieve vast chemical diversity through biosynthetic manipulation, the
basic principles, outlined above, must be extended to generate small molecule
libraries efficiently. Although seemingly straightforward, this process brings
with it many difficult challenges. Fortunately, initial efforts at combinatorial
biosynthesis have provided some early insight into specific requirements that
researchers should bear in mind when venturing into this area. This section will
outline the essential components and necessary considerations for bringing
library generation to practice.
With the goal of producing many novel natural product analogs in a
timely manner, the precise method of small molecule generation is a critical
consideration that must be addressed. For in vivo production, this often means
appropriate selection of the host organism. It must be readily engineered
to produce compounds of interest in at least high enough quantities for
facile detection and analysis. In addition, the host proteome should be well
characterized and readily controlled to avoid unintentional post-PKS/NRPS
tailoring that may attenuate activity.
Methods involving in vitro polyketide and nonribosomal peptide production
involve a similar set of considerations. High turnover numbers are essential
to increase product yields and minimize the amount of enzyme required. It
is important that proteins used in these experiments be readily expressible
in practical quantities and exhibit broad substrate tolerance. The latter is
imperative to minimize laborious purification of numerous proteins for library
construction.

Fig. 9.2-5 Combinatorial library of 6-deoxyerythronolide B analogs by domain
substitution. Colors correspond to specific engineered ketide units resulting
from substitution of modules indicated in the legend. Figure taken from
R. McDaniel, A. Thamchaipenet, C. Gustafsson, H. Fu, M. Betlach, G. Ashley,
Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 1846-1851.

9.2.4
Thus far, we have examined several approaches toward generating natural
product analogs through chemical and genetic manipulation of PKS and NRPS
assembly lines. Realization of combinatorial biosynthetic methods requires
extension of these basic principles to create larger libraries of compounds from
known templates. The complexity of these molecules precludes traditional
chemical synthesis, making biosynthetic manipulation the only viable means
to access them. This section will focus on several examples of successful library
generation using the techniques described above.
Manipulation of the DEBS system has led to the most impressive
demonstration of combinatorial biosynthesis to date. McDaniel and coworkers
have utilized specific module-swapping strategies to access a variety of
6-deoxyerythronolide B analogs with modifications at each carbon of the
macrolide backbone [26]. Modules 1-6 of DEBS were systematically replaced
with individual rapamycin synthase components to alter oxidation state and
methylation in the final polyketide product. The study produced 60 unique
structures at yields ranging from 1 to 70% of that of 6-deoxyerythronolide
B (Fig. 9.2-5). However, each new compound required independent synthase
engineering, which made library construction quite tedious.
To circumvent this laborious process, Santi and coworkers developed a
multiplasmid approach whereby genetic variations on separate plasmids could
be combined to produce a variety of analogs with multiple modifications [27].
Specifically, three discrete plasmids, each encoding one DEBS polypeptide
(e.g., DEBS1), were prepared and appropriate module swaps were executed for
each. The modified plasmids could then be selectively combined to generate
genetically altered DEBS systems. The novel synthases produced a library
comparable to the single plasmid one, but with a fraction of the effort and time
(Fig. 9.2-6).
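
The saving offered by the multiplasmid approach is combinatorial, which a short enumeration can make concrete. The sketch below is only an illustration: the variant labels and their counts are hypothetical placeholders, not the constructs used in the cited work. It shows that the number of distinct synthases grows as the product of the variants available per plasmid, while the number of plasmids that must be built grows only as their sum.

```python
from itertools import product

# Hypothetical variant labels for each DEBS polypeptide; the real study
# used its own set of engineered plasmids, which are not reproduced here.
plasmid_variants = {
    "DEBS1": ["wild-type", "variant-1a", "variant-1b"],
    "DEBS2": ["wild-type", "variant-2a"],
    "DEBS3": ["wild-type", "variant-3a", "variant-3b", "variant-3c"],
}


def enumerate_synthases(variants):
    """Return every combination of one variant per polypeptide."""
    names = list(variants)
    pools = [variants[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*pools)]


synthases = enumerate_synthases(plasmid_variants)
plasmids_to_build = sum(len(pool) for pool in plasmid_variants.values())

if __name__ == "__main__":
    print("plasmids constructed:", plasmids_to_build)
    print("synthase combinations:", len(synthases))
```

With these placeholder counts, nine plasmids give 24 three-plasmid combinations, whereas a single-plasmid strategy would need a separately engineered construct for each one.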
The potential for combinatorial biosynthesis of aromatic polyketides has
remained largely untapped. However, recent work has laid the appropriate
groundwork for further exploration. Matching various initiation modules with
heterologous elongation components produced a moderately sized library of
small aromatic compounds [38]. For instance, coexpression of the R1128
loading module with the tcm minimal PKS generated the predicted products
YT127 and YT127b derived from propionyl and isobutyryl starter units
respectively (Fig. 9.2-7). Structural variants of these compounds were readily
formed by simple swapping of act with the tcm minimal PKS.
In addition to the array of molecules prepared in this study, the authors
suggest numerous possibilities for production of related structures through
alternative bimodular combinations. In all, a library of over 100 known and
predicted aromatic polyketides could be described with this methodology.
More recently, similar strategies have also been applied for the engineered
biosynthesis of nonacetate primed decaketides.
Combinatorial methods in NRPS systems have been limited to chemoenzymatic
strategies as described above. However, given the relative ease of modern
peptide synthesis, these studies have resulted in a vast array of highly
functionalized macrocycles. A particularly impressive work in this area, executed
by Burkart and coworkers, involved the synthesis and subsequent cyclization
of more than 300 distinct peptides [44]. In an effort to gain access to improved
tyrocidine A analogs, an assortment of peptides containing both natural and
nonnatural amino acids at the D-Phe 1 and D-Phe 4 positions were synthesized
and cyclized by tyc TE on the solid phase. Products were assayed for antimi-
crobial activity and most of the analogs tested showed improved therapeutic
profiles over natural tyrocidine A. The authors mention that this methodology
may ideally be used for initial discovery purposes: because the linear peptides
are prepared by chemical synthesis, little NRPS engineering is needed until
promising candidates are identified.
9.2.5
Future Development
Future success in combinatorial biosynthesis will rely heavily on increased
understanding of specific recognition interfaces. This includes both motifs
associated with protein-substrate and protein-protein interactions. In
addition, development of improved techniques for monitoring and optimizing
engineered processes will be critical to test the viability of using these methods
to produce novel compounds efficiently. Despite the impressive examples
described above, the area of combinatorial biosynthesis is still in its infancy and
will require significant attention and ingenuity to truly harness its potential.
Structure-based design of catalytically efficient synthetases will prove vital
for future success in this area. As we have seen in the case of CLF
engineering above, intrinsic specificity in these enzymes may be altered
through manipulation of a set of key residues. However, this approach
requires knowledge of three-dimensional protein structure. As little is known
regarding the precise arrangement of specificity determinants in modular
PKS and NRPS systems, efforts toward elucidating this information are
critical to advancement. The extent to which these systems must be altered
to achieve appreciable yields of natural product analogs remains to be seen.
In some cases, analog production may be hindered by a single module or
domain, whereas others may require extensive engineering. The future of
combinatorial biosynthesis will rely on our collective ability to answer these
questions.

Fig. 9.2-7 Aromatic polyketide library from genetically combining initiation
modules (IM) with minimal aromatic PKSs. Compounds that have been reported are
shown in bold. Predicted combinations are shown in plain text. KS-CLFs that
have not been identified are in gray. Blue - ketoreductase (KR) requirements,
red - cyclase requirements, green - other methyltransferases (MT), and
additional KRs. Figure taken from Y. Tang, T.S. Lee, H.Y. Lee, C. Khosla,
Tetrahedron 2004, 60, 7659-7671.

Techniques to monitor individual transformations along the assembly line
will offer necessary insight into analog processing. Ideally, problematic steps
could be precisely identified in a high-throughput manner. Recent work
by Kelleher and coworkers provides promise for realization of this goal
[45-48]. In short, they have established high-resolution mass spectrometry as
a tool for evaluating intact domain-bound intermediates. This allows for facile
assessment of mechanism and specificity in these systems under biologically
relevant conditions. The enormous technological and intellectual advances
in bioanalytical chemistry promise numerous opportunities for the future of
real-time monitoring and troubleshooting.
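
To illustrate why high-resolution mass spectrometry can report on carrier-bound intermediates, the sketch below computes the mass shifts expected when a carrier domain is phosphopantetheinylated and then acylated. It is a generic calculation, not the workflow of the cited studies: the function name is hypothetical, and the elemental compositions of the added groups are standard values supplied here as assumptions rather than taken from the text.

```python
# Monoisotopic masses of the elements involved (in Da).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}

# Net elemental composition added to the protein in each step.
# These compositions are assumptions of this sketch (standard chemistry),
# not values reported in the chapter.
ADDED_GROUPS = {
    "phosphopantetheine (apo to holo)": {"C": 11, "H": 21, "N": 2, "O": 6, "P": 1, "S": 1},
    "acetyl": {"C": 2, "H": 2, "O": 1},
    "propionyl": {"C": 3, "H": 4, "O": 1},
    "malonyl": {"C": 3, "H": 2, "O": 3},
    "methylmalonyl": {"C": 4, "H": 4, "O": 3},
}


def monoisotopic_mass(composition):
    """Sum the monoisotopic masses for an element-count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


mass_shifts = {name: monoisotopic_mass(comp) for name, comp in ADDED_GROUPS.items()}

if __name__ == "__main__":
    for name, shift in mass_shifts.items():
        print(f"{name}: +{shift:.4f} Da")
```

Because each loading or elongation step changes the mass of the domain by a characteristic amount, comparing observed and expected shifts indicates which intermediate is bound, and therefore where a nonnatural substrate stalls.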
Genetic selection of organisms capable of efficiently producing natural
product analogs represents a complementary approach to the structure-
based design described above. Evolution of microorganisms in response to
external pressures can provide an efficient means of producing novel bioactive
molecules. It may be possible to produce strains whose survival relies on their
ability to utilize heterologous biosynthetic machinery introduced through
genetic manipulation. In this way, compounds can be selected for specific
targets by simply altering the external stimuli. For instance, the discovery of
antibiotics active against certain resistant bacterial strains may be realized
by providing competitors with a host of chemical and biosynthetic resources
followed by high-throughput analysis of those that produce effective small
molecule defenses.
9.2.6
Conclusion
Given a wealth of natural chemical scaffolds for improved drug design, our
ability to generate novel pharmaceuticals requires increased understanding of
the biosynthetic processes that may lead to their discovery and production.
Polyketide and nonribosomal peptide assembly offers enormous potential for
development of combinatorial biosynthetic methods. The structural complexity
of these natural products often prohibits practical chemical synthesis, which
underscores the need for alternative means of accessing them in usable
quantities. Research in this area requires in-depth knowledge of chemical,
biological, and engineering principles that typify the field of chemical biology.
The studies highlighted in this chapter demonstrate significant forward
progress but there is much need for motivated scientists from all disciplines
to take part in the development and exploration of improved methods.
Acknowledgment
This work was supported by grants from the National Institutes of Health
(CA66736 and CA77248). Nathan A. Schnarr is a recipient of an NIH
postdoctoral fellowship.
References
1. D. O'Hagan, The Polyketide Metabolites, Ellis Horwood, New York, 1991.
2. David E. Cane (Ed.), For a thematic review covering polyketide and nonribosomal peptide biosynthesis see, Chem. Rev. 1997, 97(7).
3. J. Staunton, K. Weissman, Polyketide biosynthesis: a millennium review, Nat. Prod. Rep. 2001, 18, 380-416.
4. C. Khosla, Natural product biosynthesis, J. Org. Chem. 2000, 65, 8127-8133.
5. D. Cane, C. Walsh, C. Khosla, Harnessing the biosynthetic code: Combinations, permutations, and mutations, Science 1998, 282, 63-68.
6. L. Katz, G. Ashley, Translation and protein synthesis: Macrolides, Chem. Rev. 2005, 105, 499-528.
7. H. Floss, Antibiotic biosynthesis: From natural to unnatural compounds, J. Indust. Micro. Biotech. 2001, 27, 183-194.
8. C. Walsh, Combinatorial biosynthesis of antibiotics: Challenges and opportunities, ChemBioChem 2002, 3, 124-134.
9. S. Donadio, M. Sosio, Strategies for combinatorial biosynthesis with modular polyketide synthases, Comb. Chem. High Throughput Screen. 2003, 6, 489-500.
10. U. Keller, F. Schauwecker, Combinatorial biosynthesis of non-ribosomal peptides, Comb. Chem. High Throughput Screen. 2003, 6, 527-540.
11. I. Kantola, T. Kunnari, P. Mantsala, K. Ylihonko, Expanding the scope of aromatic polyketides by combinatorial biosynthesis, Comb. Chem. High Throughput Screen. 2003, 6, 501-512.
12. J. Staunton, B. Wilkinson, Combinatorial biosynthesis of polyketides and nonribosomal peptides, Curr. Opin. Chem. Biol. 2001, 5, 159-164.
13. N. Wu, S. Tsuji, D. Cane, C. Khosla, Assessing the balance between protein-protein interactions and enzyme-substrate interactions in the channeling of intermediates between polyketide synthase modules, J. Am. Chem. Soc. 2001, 123, 6465-6474.
14. S. Tsuji, N. Wu, C. Khosla, Intermodular communication in polyketide synthases: Comparing the role of protein-protein interactions to those in other multidomain proteins, Biochemistry 2001, 40, 2317-2325.
15. N. Wu, D. Cane, C. Khosla, Quantitative analysis of the relative contributions of donor acyl carrier proteins, acceptor ketosynthases, and linker regions to intermodular transfer of intermediates in hybrid polyketide synthases, Biochemistry 2002, 42, 5056-5066.
16. R. Broadhurst, D. Nietlispach, M. Wheatcroft, P. Leadlay, K. Weissman, The structure of docking domains in modular polyketide synthases, Chem. Biol. 2003, 10, 723-731.
17. S. Tsuji, D. Cane, C. Khosla, Selective protein-protein interactions direct the channeling of intermediates between polyketide synthase modules, Biochemistry 2001, 40, 2326-2331.
18. P. Kumar, Q. Li, D. Cane, C. Khosla, Intermodular communication in modular polyketide synthases: structural and mutational analysis of linker mediated protein-protein recognition, J. Am. Chem. Soc. 2003, 125, 4097-4102.
19. J. Lau, D. Cane, C. Khosla, Substrate specificity of the loading didomain of the erythromycin polyketide synthase, Biochemistry 2001, 29, 10514-10520.
20. J. Jacobsen, C. Hutchinson, D. Cane, C. Khosla, Precursor-directed biosynthesis of erythromycin analogs by an engineered polyketide synthase, Science 1997, 277, 367-369.
21. X. Ruan, A. Pereda, D. Stassi, D. Zeidner, R. Summers, M. Jackson, A. Shivakumar, S. Kakavas, M. Staver, S. Donadio, L. Katz, Acyl transferase domain substitutions in erythromycin polyketide synthase yield novel erythromycin derivatives, J. Bacteriol. 1997, 179, 6416-6425.
22. J. Lau, H. Fu, D. Cane, C. Khosla, Dissecting the role of acyl transferase domains of modular polyketide synthases in the choice and stereochemical fate of extender units, Biochemistry 1999, 38, 1643-1651.
23. D. Stassi, S. Kakavas, K. Reynolds, G. Gunawardana, S. Swanson, D. Zeidner, M. Jackson, H. Liu, A. Buko, L. Katz, Ethyl-substituted erythromycin derivatives produced by directed metabolic engineering, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 7305-7309.
24. S. Donadio, M. Staver, J. McAlpine, S. Swanson, L. Katz, Modular organization of genes required for complex polyketide biosynthesis, Science 1991, 252, 675-679.
25. S. Donadio, J. McAlpine, P. Sheldon, M. Jackson, L. Katz, An erythromycin analog produced by reprogramming of polyketide synthesis, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 7119-7123.
26. R. McDaniel, A. Thamchaipenet, C. Gustafsson, H. Fu, M. Betlach, G. Ashley, Multiple genetic modifications of the erythromycin polyketide synthase to produce a library of novel "Unnatural" natural products, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 1846-1851.
27. Q. Xue, G. Ashley, C. Hutchinson, D. Santi, A multiplasmid approach to preparing large libraries of polyketides, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11740-11745.
28. C. Kao, G. Luo, L. Katz, D. Cane, C. Khosla, Manipulation of macrolide ring size by directed mutagenesis of a modular polyketide synthase, J. Am. Chem. Soc. 1995, 117, 9105-9106.
29. R. Gokhale, D. Hunziker, D. Cane, C. Khosla, Mechanism and specificity of the terminal thioesterase domain from the erythromycin polyketide synthase, Chem. Biol. 1999, 6, 117-125.
30. M. Crump, J. Crosby, C. Dempsey, J. Parkinson, M. Murray, D. Hopwood, T. Simpson, Solution structure of the actinorhodin polyketide synthase acyl carrier protein from Streptomyces coelicolor A3(2), Biochemistry 1997, 36, 6000-6008.
31. H. Pan, S.-C. Tsai, E. Meadows, L. Miercke, A. Keating-Clay, J. O'Connell, C. Khosla, R. Stroud, Crystal structure of the priming β-ketosynthase from the R1128 polyketide biosynthetic pathway, Structure 2002, 10, 1559-1568.
32. S. Findlow, C. Winsor, T. Simpson, J. Crosby, M. Crump, Solution structure and dynamics of oxytetracycline polyketide synthase acyl carrier protein from Streptomyces rimosus, Biochemistry 2003, 42, 8423-8433.
33. Q. Li, C. Khosla, J. Puglisi, C. Liu, Solution structure and backbone dynamics of the holo form of the frenolicin acyl carrier protein, Biochemistry 2003, 42, 4648-4657.
34. K. Watanabe, C. Khosla, R. Stroud, S.-C. Tsai, Crystal structure of an acyl-ACP dehydrogenase from the FK520 polyketide biosynthetic pathway: Insights into extender unit biosynthesis, J. Mol. Biol. 2003, 334, 435-444.
35. Y. Tang, S.-C. Tsai, C. Khosla, Polyketide chain length control by chain length factor, J. Am. Chem. Soc. 2003, 125, 12708-12709.
36. Y. Tang, T.S. Lee, C. Khosla, Engineered biosynthesis of regioselectively modified aromatic polyketides using bimodular polyketide synthases, PLoS Biol. 2004, 2, 227-238.
37. Y. Tang, T.S. Lee, S. Kobayashi, C. Khosla, Ketosynthases in the initiation and elongation modules of aromatic polyketide synthases have orthogonal acyl carrier protein specificity, Biochemistry 2003, 42, 6588-6595.
38. Y. Tang, T.S. Lee, H.Y. Lee, C. Khosla, Exploring the biosynthetic potential of bimodular aromatic polyketide synthases, Tetrahedron 2004, 60, 7659-7671.
39. T. Stachelhaus, A. Schneider, M. Marahiel, Rational design of peptide antibiotics by targeted replacement of bacterial and fungal domains, Science 1995, 269, 69-72.
40. A. Schneider, T. Stachelhaus, M. Marahiel, Targeted alteration of the substrate specificity of peptide synthetases by rational module swapping, Mol. Gen. Genet. 1998, 257, 308-318.
41. R. Kohli, J. Trauger, D. Schwarzer, M. Marahiel, C. Walsh, Generality of peptide cyclization catalyzed by isolated thioesterase domains of nonribosomal peptide synthetases, Biochemistry 2001, 40, 7099-7108.
42. J. Trauger, R. Kohli, C. Walsh, Cyclization of backbone-substituted peptides catalyzed by the thioesterase domain from the tyrocidine nonribosomal peptide synthetase, Biochemistry 2001, 40, 7092-7098.
43. R. Kohli, M. Burke, J. Tao, C. Walsh, Chemoenzymatic route to macrocyclic hybrid peptide/polyketide-like molecules, J. Am. Chem. Soc. 2003, 125, 7160-7161.
44. R. Kohli, C. Walsh, M. Burkart, Biomimetic synthesis and optimization of cyclic peptide antibiotics, Nature 2002, 418, 658-661.
45. S. McLoughlin, N. Kelleher, Kinetic and regiospecific interrogation of covalent intermediates in the nonribosomal peptide synthesis of yersiniabactin, J. Am. Chem. Soc. 2004, 126, 13265-13275.
46. L. Hicks, S. O'Connor, M. Mazur, C. Walsh, N. Kelleher, Mass spectrometric interrogation of thioester-bound intermediates in the initial stages of epothilone biosynthesis, Chem. Biol. 2004, 11, 327-335.
47. S. Garneau, P. Dorrestein, N. Kelleher, C. Walsh, Characterization of the formation of the pyrrole moiety during clorobiocin and coumermycin A1 biosynthesis, Biochemistry 2005, 44, 2770-2780.
48. G. Gatto, S. McLoughlin, N. Kelleher, C. Walsh, Elucidating the substrate specificity and condensation domain activity of FkbP, the FK520 pipecolate-incorporating enzyme, Biochemistry 2005, 44, 5993-6002.
10
Synthesis of Large Biological Molecules
10.1
Expressed Protein Ligation
Matthew R. Pratt and Tom W. Muir
Outlook
The generation of proteins containing homogeneous natural and unnatural

modifications is a key component in understanding biological processes.
With this goal in mind a variety of protein-enineering approaches have been
developed, including expressed protein ligation (EPL). EPL is an intein-based
approach that yields chemically modified proteins from smaller synthetic
and/or recombinant fragments, allowing for the construction of proteins
containing a broad range and a theoretically unlimited number of modifications.
The history and applications of this powerful protein-engineering technology
are highlighted below.
10.1.1
Introduction
As the biological sciences continue forward in what is referred to as the
postgenomic era, an intimate understanding of protein structure and function
has become a core goal in biological study. Looking at the number of genes
in the human genome this goal appears large but within reach; however, the
grand scope of this task is further complicated by the spatial and temporal
dynamics of protein modification on the pre- and posttranslational levels.
Seventy to ninety percent of the transcripts encoded in the human genome
contain two or more exons, allowing for the alternative splicing of pre-mRNAs.
In addition, one-third of all mammalian proteins are thought to be
phosphorylated [1], and 1% of all genes (~300 genes) encode
glycosyltransferases involved in the biosynthesis of carbohydrates appended
ISBN: 978-3-527-31150-7
to glycoproteins and glycolipids [2]. It is becoming increasingly clear that a
full understanding of the human proteome will be achieved only when the
individual members have been considered in a context that includes tissue and
cell-type expression, modification patterns, and how those patterns change over
timescales, ranging from minutes to years. Cataloging the human proteome
begins with a full description of the modifications of a given protein and
how they affect function, stability, structure, localization, and interactions with
other molecules. This task is a very large proposition, yet it is a crucial long-
term objective of biology. Indeed, many new fields including bioinformatics,
chemical biology, proteomics, and structural genomics have emerged in recent
years providing new technologies with these goals clearly in mind.
Chemistry has long played a key role in the elucidation of biological pro-
cesses. The strength of chemistry has been, and always will be, the synthesis
of homogeneous, structurally defined materials. The extension of this strength
to proteins has been a major focus of biological chemistry research, both for
the understanding of native biological function and from the perspective of
harnessing that function for nonbiological applications (e.g., reaction catalysis,
surface chemistry). Chemical synthesis has elegantly allowed for the incorpora-
tion of unnatural or modified amino acids into proteins that would otherwise
be unattainable using standard ribosomal synthesis and has facilitated the
construction of proteins possessing natural posttranslational modifications.
This second feature is of importance because it is extremely difficult to
obtain, by traditional recombinant methods, homogeneous preparations of
posttranslationally modified proteins for structural and functional studies.
The demand for specifically modified proteins has encouraged the develop-
ment of a variety of protein-engineering approaches. These techniques range
from classical chemical labeling methods to more recent methodologies such as
specific chemical reactions [3, 4], enzymatic labeling [5], nonsense suppression
mutagenesis [6, 7], and expressed protein ligation (EPL) [8-12]. EPL involves
the linking of synthetic and recombinant peptide/protein building blocks to
give a final protein product. This semisynthesis is achieved using chemoselective
functional groups at the appropriate ends of the fragments, allowing for
their assembly to take place with complete regioselectivity in water at
physiological pH. Although EPL can involve more chemical steps (e.g., peptide
synthesis) than the other methods mentioned above, it has two important
advantages: a theoretically unlimited number of unnatural amino acids can be
incorporated, and a much broader range of modifications is possible. For these reasons,
EPL has been successfully applied to a broad variety of protein-engineering
problems, and this technology and its applications are highlighted below.
10.1.2
History/Development
EPL had its genesis in the convergence of chemical synthesis and protein
biochemistry. The established areas of peptide and protein chemistry provided
the technical foundation, and inputs from a naturally occurring biological
process, protein splicing, catalyzed the development of the technology. To
see how this union led to the development of EPL, it is worth reviewing the
relevant areas of protein chemistry.
10.1.2.1 Protein Semisynthesis

Protein semisynthesis was originally achieved as the process by which
proteolytic or chemical cleavage fragments of natural proteins were used as the
building blocks for the resynthesis of the protein [13]. For example, it has been
shown that CNBr-induced cleavage fragments of certain proteins (pancreatic
trypsin inhibitor and cytochrome c) [14, 15] spontaneously reform the native
peptide bond between them. This spontaneous process was used to incorporate
natural and unnatural amino acids into cytochrome c. More recently, the scope
of protein semisynthesis has been broadened to include the site-specific
modification of a natural protein. The most successful approach of this type
to date has been the introduction, by standard site-directed mutagenesis,
of a unique cysteine residue into the protein of interest, permitting selective
derivatization of the sulfhydryl group with any number of thiol-reactive probes.
This method has been used to incorporate photoactivatable cross-linkers [16],
fluorophores [17], and carbohydrates [18] into proteins and has been used to
prepare photocaged enzymes [19].
Another approach to protein semisynthesis involves the use of proteolytic
enzymes to facilitate the regioselective ligation of peptide fragments. Carrying
out reverse proteolysis involves the altering of the reaction conditions such
that aminolysis of an acyl-enzyme intermediate is favored over hydrolysis. This
is typically achieved by using high concentrations of organic solvents such as
glycerol, dimethylformamide (DMF), or acetonitrile in the reaction medium.
Under these conditions, the acyl-enzyme intermediate will undergo aminolysis
with a second peptide fragment, giving an amide-linked product [20].
Significant progress in the area of enzyme-mediated protein ligation has
been realized through enzyme active site engineering. In an elegant example,
Wells and coworkers made a double mutant of subtilisin, termed subtiligase,
giving an enzyme capable of ligating peptide fragments with a high level
of efficiency [21]. The Bordusa laboratory has also improved the reverse
proteolysis technology by developing substrate mimetic leaving groups at the
C-terminus of the N-terminal peptide-coupling partner. These peptide esters
have been successful in trypsin-, V8 protease-, and chymotrypsin-catalyzed
reactions [22].
10.1.2.2 Chemical Ligation

Over the last ~15 years, chemoselective ligation has emerged as a powerful
technique in chemical biology, allowing mutually and exclusively reactive
partners to be joined without the need for protecting groups in an aqueous
environment. Naturally, this ligation strategy was further developed as a
solution to the classic problems associated with fragment condensation
reactions, which are handicapped by the necessity for protected peptide build-
ing blocks. In the area of protein engineering, Offord and Rose pioneered the
use of hydrazone/oxime forming reactions for chemically ligating synthetic
and recombinant peptide fragments together [23-25]. In the early 1990s, the
idea of using a chemoselective coupling reaction with fully synthetic peptides
was realized in the Kent laboratory when the 99-residue HIV-1 protease was
assembled from two ~50-residue unprotected peptides using a thioester-bond
forming reaction [26]. Given the simplicity and elegance of chemoselective
ligation, a large amount of effort has gone into expanding the technique to
include thioether, thiazolidine, and amide forming reactions [27].
The next major step in establishing chemoselective ligation as a general route
for protein synthesis came with the development of native chemical ligation
(NCL) [28]. Using this technique, two fully unprotected peptide fragments can
be reacted under neutral aqueous conditions culminating in the formation of a
native peptide bond at the ligation site (Fig. 10.1-1(a)). The first step in NCL
involves the chemoselective transthioesterification reaction between one peptide
containing an N-terminal cysteine residue and another peptide containing an
α-thioester group. This initial reaction is followed by a spontaneous
intramolecular S → N acyl shift, generating a native amide bond at the ligation junction.
NCL is compatible with all naturally occurring side chain functionalities in-
cluding the sulfhydryl group of cysteine. This compatibility with cysteine is due
to the reversibility of the initial transthioesterification step and allows for the
presence of internal cysteine residues in both peptide sequences. Because of its
compatibility with all naturally occurring amino acids, NCL is ideally suited for
protein semisynthesis. The only requirement for the recombinant protein is
that it contains one of the two chemoselective reactive groups, either α-Cys or an
α-thioester. Indeed, NCL has been used in a semisynthetic context through the
recombinant incorporation of an α-Cys residue, providing access to natural
proteins modified by synthetic molecules at their N-terminus [29]. The remaining
obstacle is how to prepare recombinant protein α-thioesters, which are required
if synthetic peptides are to be incorporated at the C-terminus or the middle of
semisynthetic proteins. The solution to this problem fell serendipitously out
of studies on the naturally occurring process known as protein splicing.
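
The two requirements of NCL described above (an α-thioester on one fragment and an N-terminal cysteine on the other) can be expressed as a simple check. The Python sketch below is illustrative only: the `Fragment` class and the `can_ligate` and `ligate` helpers are hypothetical names introduced here, sequences are plain one-letter strings, and no chemistry (kinetics, side reactions, folding) is modeled.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical model of a polypeptide building block."""
    sequence: str                   # one-letter codes, N- to C-terminus
    c_term_thioester: bool = False  # True if the C-terminus is an alpha-thioester


def can_ligate(n_piece: Fragment, c_piece: Fragment) -> bool:
    """NCL needs a C-terminal thioester on one piece and an N-terminal Cys on the other."""
    return n_piece.c_term_thioester and c_piece.sequence.startswith("C")


def ligate(n_piece: Fragment, c_piece: Fragment) -> Fragment:
    """Join two pieces through a native peptide bond (modeled as concatenation)."""
    if not can_ligate(n_piece, c_piece):
        raise ValueError("need an alpha-thioester and an N-terminal Cys")
    # The product keeps whatever C-terminal group the C-terminal piece carried.
    return Fragment(n_piece.sequence + c_piece.sequence, c_piece.c_term_thioester)
```

In this toy model, a recombinant fragment lacking a thioester fails the check, which mirrors the obstacle noted above for C-terminal or internal modification.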
10.1.2.3 Protein Splicing

Fig. 10.1-1 (a) Mechanism of native chemical ligation. Both polypeptides are
fully unprotected, and the reaction proceeds in water at neutral pH.
(b) Schematic representation of protein splicing. Intramolecular rearrangements
result in the ligation of two polypeptides with the requisite removal of an
internal segment.

Protein splicing is a posttranslational process whereby a precursor protein
undergoes a series of self-catalyzed intramolecular rearrangements that result
in the removal of an internal protein segment, termed an intein, and the ligation
of the two flanking polypeptides, referred to as exteins (Fig. 10.1-1(b)) [30,
31]. One hundred and seventy-six members of the intein protein domain
family are currently cataloged (http://www.neb.com/neb/inteins.html), being
characterized by several conserved sequence motifs. Inteins are autocatalytic
and some are promiscuous for the sequences of the two flanking exteins,
allowing many polypeptides to participate in protein splicing. As shown in
Fig. 10.1-1(b), the first step of protein splicing involves an N → S (or N → O)
acyl shift in which the N-terminal extein is transferred to the side chain of a
cysteine (or Ser) residue at the immediate N-terminus of the intein. A second
cysteine residue (or Ser/Thr) located at the N-terminus of the remaining extein
attacks the resulting thioester yielding a branched thioester intermediate. The
branched intermediate is subsequently resolved on cyclization of the conserved
asparagine residue located at the C-terminus of the intein. The intein
is thus excised as a C-terminal succinimide derivative. The final step in this
process involves an S → N (or O → N) acyl shift providing the spliced protein
product. The final step of protein splicing closely resembles the second step of
NCL. In fact, NCL provided the chemical insight for unraveling the last step
in the protein splicing mechanism [32].
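
The net outcome of splicing, and the key residues the mechanism above depends on, can be summarized at the sequence level. The following Python sketch is a hypothetical illustration, not a model of the chemistry: the `splice` function and its arguments are invented here, and it only checks the residues named in the text (Cys or Ser at the start of the intein, Asn at its end, and Cys, Ser, or Thr at the start of the C-extein).

```python
from typing import Tuple


def splice(precursor: str, start: int, end: int) -> Tuple[str, str]:
    """Return (spliced product, excised intein).

    The intein occupies precursor[start:end] (0-based, end-exclusive).
    """
    n_extein, intein, c_extein = precursor[:start], precursor[start:end], precursor[end:]
    if not intein or intein[0] not in "CS":
        raise ValueError("intein must begin with Cys or Ser")
    if intein[-1] != "N":
        raise ValueError("intein must end with the conserved Asn")
    if not c_extein or c_extein[0] not in "CST":
        raise ValueError("C-extein must begin with Cys, Ser, or Thr")
    # The exteins are joined by a native peptide bond; the intein is released.
    return n_extein + c_extein, intein
```

Replacing the terminal Asn of the intein in this toy model makes `splice` raise an error, loosely analogous to an intein that can no longer complete the full splicing pathway.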
Inteins have been found in proteins of species spanning eubacteria,
archaea, and eucarya, suggesting that they have an ancient evolutionary
origin. However, a biological role for inteins is yet to be discovered.
Interestingly, the products of inteins share structural homology to autoprocessing
domains, such as hedgehog proteins, present in higher eukaryotes. Fur-
thermore, inteins are often found in gene products responsible for DNA
replication or recombination, ensuring their conservation. The subject of
intein distribution and evolutionary history has been discussed at length
elsewhere [33].
Although the biological role of protein splicing remains a matter of inquiry,
the process has been exploited extensively in the areas of biotechnology and
protein chemistry. The first of these applications exploits the knowledge of the
mechanism of protein splicing to produce beneficial intein mutants. A number
of mutant inteins (many contain a C-terminal Asn → Ala mutation) have been
designed that can achieve only the first step of protein splicing [32, 34-37].
Proteins expressed as in-frame N-terminal fusions to one of these mutant
inteins can be cleaved by thiols via an intermolecular transthioesterification
reaction. This system provides two things: first it acts as a traceless chemical
protease that can be exploited for the purification of recombinant proteins [34],
and more importantly, a key ingredient of NCL, protein α-thioesters, can also
be prepared by this method. A second application involves the use of naturally
or artificially split inteins [38-41]. These split inteins individually have no
activity but when combined associate noncovalently to give a functional
protein. Protein transsplicing, as this process is generally known, provides a
way of selectively ligating two different polypeptides together and represents
an augmenting alternative to EPL. Indeed, transsplicing has been exploited for
the generation of cyclic peptides and proteins, for detecting protein-protein
interactions, and for controlling protein function, some of which will be
discussed later in this chapter.
Harnessing protein splicing, researchers now have the ability to generate
recombinant protein α-thioesters through the thiolysis of an appropriately
mutated protein-intein fusion. In principle, this means that synthetic and
recombinant building blocks can be fused in a semisynthetic version of NCL.
Such an approach was first reported in 1998 and has been named expressed
protein ligation [8].
10.1.3
10.1.3.1 Generation of Thioesters

The bottleneck of EPL is the generation of peptide or protein thioesters.
This has encouraged many groups to develop methods for their construction.
Fig. 10.1-2 Generation of peptide α-thioesters by Fmoc-based SPPS using
sulfonamide safety catch linker resin (a), a masked thioester equivalent
incorporated post-SPPS (b), and a masked thioester linker strategy (c).

Several methods for the production of peptide thioesters using solid-phase
peptide synthesis (SPPS) have been fashioned. The most general strategy
involves the use of tert-butyloxycarbonyl (Boc)-based peptide synthesis
because the thioester is labile to the repeated base treatments required
in 9-fluorenylmethoxycarbonyl (Fmoc)-based SPPS [28]. However, different
technologies employing the Fmoc synthesis method have been developed
because the strategy has the advantage of milder cleavage conditions allowing
for the incorporation of acid sensitive functionality, such as phosphates and
carbohydrates, not accessible through Boc chemistry. One such method is
based on the modifications of Kenner’s sulfonamide “safety catch” linker
(Fig. 10.1-2(a)) [42]. The growing peptide chain is attached to the resin with an
acid and base stable N-acyl sulfonamide linker. After the peptide synthesis is
complete, the sulfonamide can be activated by N-alkylation using electrophiles
such as iodoacetonitrile. This activated species can then be cleaved with a
thiol nucleophile to generate the peptide thioester [43]. An aryl hydrazine resin
has also been reported recently, which could be utilized in a similar fashion
to create peptide thioesters through thiolysis [44]. Another method involves
the coupling of “masked” thioester equivalents to fully protected peptide free
acids post-SPPS [45]. In one example (Fig. 10.1-2(b)), an amino acid derivative
was coupled to a fully protected peptide, followed by global deprotection, to
give a masked thioester intermediate. Treatment of this intermediate with
exogenous thiols reduces the disulfide bond, allowing for a spontaneous
rearrangement resulting in the formation of a peptide thioester. Finally, a
masked thioester equivalent has recently been introduced as a linker for SPPS
(Fig. 10.1-2(c)) [46]. Standard cleavage conditions allow for the isolation of the
peptide-linker intermediate, which upon treatment with thiols, rearranges to
yield a peptide thioester. These examples, along with others, have been used
successfully in NCL and EPL syntheses of peptides and proteins.
Fig. 10.1-3 Expressed protein ligation. Synthesis of recombinant protein
thioesters using the IMPACT™ system. Thioesters are obtained by expressing a
protein of interest as a fusion to the N-terminus of an intein. The CBD allows
for purification. The thioester resulting from thiolysis can be ligated under
the conditions of NCL.

As noted above, the production of recombinant protein thioesters was first
achieved by the use of mutant inteins rendered incapable of resolving their
thioester intermediate [32, 34-36]. This technology is commercially available
as the IMPACT (intein-mediated purification with an affinity chitin binding
tag) system (Fig. 10.1-3) [34]. In this system, a target protein is expressed as
an N-terminal fusion of a modified intein. A chitin binding domain (CBD)
from Bacillus circulans is fused to the C-terminal portion of the intein allowing
for affinity purification of the three-component fusion protein of interest over
chitin resin. Other proteins are washed away from the desired immobilized
protein, followed by cleavage with an excess of thiol, yielding the protein
of interest as a C-terminal thioester. Modified mini inteins, containing an
Asn → Ala mutation, from the genes of Mycobacterium xenopi (Mxe GyrA),
Saccharomyces cerevisiae (Sce VMA), Methanobacterium thermoautotrophicum
(Mth RIR1), and Synechocystis sp. PCC6803 (Ssp DnaB) are commonly used for
this process. The cleavage occurs directly at the N-terminus of the intein due
to the lack of Asn cyclization. These inteins can be cleaved with various thiols
such as ethanethiol, thiophenol, and 2-mercaptoethanesulfonic acid (MESNA)
with great efficiency.
10.1.3.2 Protecting Groups and Sequential Ligations

Fig. 10.1-4 Schematic representation of sequential ligation reactions. A
synthetic peptide containing an N-terminal thioproline residue can be ligated
to the N-terminus of a protein containing an α-cysteine. The thioproline can
then be transformed into a new α-cysteine residue poised for the next ligation
reaction. Likewise, a recombinant protein's α-cysteine residue can be masked by
a prosequence cleavable by the protease factor Xa.

Most EPL applications involve just two building blocks and thus a single
ligation reaction. However, the restrictions of SPPS, which limit the length
of a synthesized peptide to about 50 residues, require that the region of
interest in a protein be relatively close to the native N- or C-terminus.
To address this issue, a sequential ligation method is necessary, and thus,
protecting groups for N-terminal cysteine residues, both in synthetic peptides
and recombinant proteins, are needed. The cysteine protection is necessary
to prevent the peptide or protein from reacting with itself in either an intra-
or intermolecular fashion. This allows for a sequential ligation strategy such
that multiple (three or more) building blocks can be linked together in series.
Two commonly used protecting group strategies are outlined in Fig. 10.1-4.
Synthetic peptide fragments can be protected as an N-terminal thioproline
residue [47], which can be removed by treatment with 0.2 M methoxylamine
following a ligation reaction [48]. Recombinant proteins can contain a cryptic
α-cysteine residue masked by a factor Xa cleavable prosequence [49]. The
advantage of this proteolytic approach is that the protecting group sequence
can be encoded at the genetic level. Thus, the prosequence can be used for
both synthetic and recombinant inserts in sequential EPL reactions.
10.1.3.3 Alternatives to N-terminal Cysteine

The only absolute requirement for NCL and EPL, other than an α-thioester, is a
cysteine residue or a homolog at the ligation site. The natural occurrence of this
amino acid is low and there is the possibility that insertion of additional cysteine
residues can alter the structure and function of a given protein. Therefore,
different approaches have been developed to overcome this requirement [50].
The first such approach extends NCL methodology to -X-Gly- and -Gly-X-
ligation sites through the use of removable auxiliaries, an example of which
is shown in Fig. 10.1-5(a) [51]. In this case, an oxyethanethiol group acts as
a cysteine surrogate allowing for the formation of a thioester intermediate
capable of rearranging to give a peptide bond. The auxiliary can then be
removed by reaction with Zn and acid. A second method allows for the ligation
site to be extended to -X-Ala- (Fig. 10.1-5(b)) [52]. NCL is performed in the
usual fashion yielding a cysteine at the ligation site. In the following step, the
Cys is converted to an Ala by desulfurization using Raney nickel and hydrogen.
However, selectivity of the desulfurization reaction is impossible to achieve,
prohibiting the use of this method in the case of proteins containing further
Cys residues. In the final example, an entirely different chemoselective ligation,
the Staudinger ligation [53], has been used to extend the NCL methodology
(Fig. 10.1-5(c)) [54]. A peptide containing a C-terminal phosphinothioester
is coupled to another peptide bearing an N-terminal α-azido functionality. The
reaction proceeds through the formation of an iminophosphorane possessing
a nucleophilic nitrogen that will react with a nearby acyl donor to form a peptide
bond. This methodology has successfully extended the NCL methodology to
an -X-Gly- ligation site. Further development of these and similar technologies
should allow NCL to be extended to many different ligation sites in the future.

Fig. 10.1-5 The extension of ligation technology past the requirement of cysteine using
auxiliaries (a), desulfurization (b), and the Staudinger ligation (c).
10.1.3.4 Ligation Strategies

EPL requires, by the limitations of SPPS, that a Cys residue be located
relatively close to the region of the protein where unnatural moieties will be
introduced. As noted above, it is possible to reproducibly synthesize peptides
of ~50 residues in length. Thus, for a protein to be completely accessible to
modification by EPL, there must be a Cys residue for every 50 or fewer residues
in the primary sequence. Many proteins meet this requirement and are ideal
targets for EPL. However, many more proteins do not contain suitable Cys
residues, and the simplest solution is to introduce one through mutation.
This technique has been used successfully for the semisynthesis of several
fully active proteins [9, 55-57]. In these cases, the mutation site should be
chosen with care. The mutation should be as conservative as
possible in relation to primary sequence (e.g., Ala → Cys or Ser → Cys) [58] and
structure (e.g., loops or linkers) [9]. Highly conserved residues from a family
of related proteins should also be avoided as sites of mutation. Given the
availability of straightforward site-directed mutagenesis strategies, the effect
of a Cys mutation can often be evaluated prior to beginning a semisynthesis
by recombinant expression of the protein containing a point mutation [59]. As
noted in the above section, technologies are being developed to overcome the
requirement of an N-terminal cysteine; however, the use of these methods is
yet to be reported in the context of EPL.
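
The rule of thumb above (a Cys residue at least every 50 or so residues) can be checked directly on a primary sequence. The Python sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, every Cys is treated as a usable ligation junction, each synthetic segment is assumed to begin at a Cys (or at the native N-terminus), and the 50-residue limit is taken from the text as an approximate figure.

```python
from typing import List


def segment_lengths(sequence: str, residue: str = "C") -> List[int]:
    """Lengths of the segments obtained by cutting immediately before each Cys."""
    cuts = [i for i, aa in enumerate(sequence) if aa == residue and i > 0]
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def fully_accessible(sequence: str, max_len: int = 50) -> bool:
    """True if every Cys-delimited segment is short enough for SPPS."""
    return all(n <= max_len for n in segment_lengths(sequence))
```

A sequence that fails this check is a candidate for introducing a Cys by mutation, subject to the considerations on conservative substitution discussed above.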
Another factor affecting the choice of where a Cys residue should be
introduced for ligation is the identity of the preceding amino acid. This
residue will be at the C-terminus of the thioester fragment, and the effects
of varying this amino acid on the kinetics of NCL have been studied [60].
Increasing the steric bulk of the side chain (particularly β-substitution)
slows the rate of the reaction. Thus, Cys substitutions directly following
bulky amino acids, especially Thr, Ile, and Val, should be avoided. A related
issue is the effect of the identity of this amino acid on the efficiency of
the protein-intein thiolysis step [61]. Certain residues result in premature
cleavage (e.g., Asp, Asn, Glu, Gln), while others result in no cleavage at all
(e.g., Pro).
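
The residue preferences just described lend themselves to a quick screen of candidate ligation sites. The following Python sketch simply encodes the residues named in the text; the function names and warning strings are hypothetical, and the screen is a coarse filter rather than a predictor of ligation or thiolysis yields.

```python
from typing import List, Tuple

SLOW_LIGATION = set("TIV")        # bulky, beta-branched residues slow NCL
PREMATURE_CLEAVAGE = set("DNEQ")  # reported to cause premature intein cleavage
NO_CLEAVAGE = set("P")            # reported to give no cleavage at all


def assess_junction(preceding: str) -> List[str]:
    """Return warnings for the residue that would precede the ligation-site Cys."""
    warnings = []
    if preceding in SLOW_LIGATION:
        warnings.append("slow ligation")
    if preceding in PREMATURE_CLEAVAGE:
        warnings.append("premature cleavage")
    if preceding in NO_CLEAVAGE:
        warnings.append("no thiolysis")
    return warnings


def candidate_sites(sequence: str) -> List[Tuple[int, str, List[str]]]:
    """List (1-based Cys position, preceding residue, warnings) for each internal Cys."""
    return [(i + 1, sequence[i - 1], assess_junction(sequence[i - 1]))
            for i, aa in enumerate(sequence) if aa == "C" and i > 0]
```

Sites returned with an empty warning list are the more attractive junctions under these criteria alone.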
EPL reactions can be carried out in two different ways: thiolysis and NCL
can be carried out in one pot, or the recombinant protein thioester can
be isolated initially. The first method obviates the need for a purification
step but somewhat limits the types of additives that can be present in the
reaction mixture. However, one-pot EPL reactions have been successful in
the presence of detergents, guanidinium chloride, urea, and organic solvent
mixtures [11]. Thiols, such as MESNA or thiophenol, which generate reactive
thioesters can be used directly in one-pot reactions. If the protein thioester
is first isolated, then harsher denaturants may be used in the subsequent
NCL reactions [27]. This has the advantage of increasing the solubility of
the reaction partners, allowing for high concentrations (millimolar) of the
polypeptides to be achieved, increasing the ligation yield. Less reactive alkyl
thiols are often used for the thiolysis of proteins to be isolated, followed by
in situ activation through the addition of MESNA or thiophenol in the NCL
reaction.
10.1.4
EPL has been applied to an array of proteins ranging from kinases and
phosphatases, to transcription factors, polymerases, ion channels, and many
others. A variety of modifications have been introduced into these proteins,
allowing for studies of protein structure and function that would be
difficult with other techniques. Some of these applications are highlighted
below.
10.1.4.1 Introduction of Fluorescent Probes

Fluorescence spectroscopy, because of its high level of sensitivity, has long been
a powerful method for studying protein behavior. Site-specific attachment
of fluorophores to a unique cysteine in a protein of interest is a traditional
route for the production of fluorescent proteins. In addition, the discovery
of fluorescent proteins, such as the green fluorescent protein (GFP) from
the jellyfish Aequorea victoria [62], has provided a genetic approach for the
production of fluorescently labeled proteins. Both these methods, however,
have drawbacks. The chemical labeling of a unique cysteine is often practically
difficult and the tagging of a protein with GFP appends a ~30 kDa protein,
which may affect the properties of the protein of interest. The use of EPL
can in principle overcome both these limitations. Typically, a fluorophore is
attached to the side chain of an amino acid (e.g., the ε-amino group of lysine)
in the synthetic peptide and subsequently incorporated into the protein through
EPL. Several protection schemes have been developed to allow probes, such
as fluorescein or tetramethylrhodamine, to be introduced into peptides using
SPPS [8]. Simple derivatives of fluorophores have also been created that can
participate in EPL reactions directly [63, 64].
The ability to introduce a fluorescent probe into a specific site in a protein
opens up many possibilities for assaying function. The simplest of these
approaches involves the monitoring of intrinsic fluorescence of the probe
during the biological process under investigation. Several fluorophores are
known to be sensitive to the surrounding environment, that is, their quantum
yields and/or Stokes shifts are responsive to changes in the dielectric constant
of the immediate surroundings. Thus, the incorporation of one of these
probes near the area of a protein that will undergo a structural change
or to a site of ligand binding allows direct observation of these events.
In one example, Alexandrov and coworkers incorporated a dansyl probe
into a semisynthetic version of a GTPase, Rab7 [65]. The fluorophore was
incorporated near the C-terminus of Rab7, which has been shown to be
posttranslationally prenylated by the enzyme Rab geranylgeranyl transferase
(RabGGTase). This modification controls the subcellular localization, and
thus the activity, of Rab7. The prenylation reaction is further modulated by the
presence of Rab escort protein (REP), which is necessary for enzymatic
activity. Both steady-state and time-resolved fluorescence measurements
were used to determine micromolar affinities of Rab7 for RabGGTase and
REP, independent of each other. This finding supports a hypothesis that
RabGGTase possesses two independent weak binding sites for Rab7 and
REP. The same group used semisynthesis to obtain a crystal structure of
mono-prenylated Ypt1 (a Rab homolog) bound to RabGDI, a critical GDP
dissociation inhibitor involved in the regulation of Rab proteins [66]. This
structure provided a basis for the ability of RabGDI to inhibit the release
of nucleotide by Rab proteins. Initial binding of RabGDI to Ypt1 causes a
conformational change that opens a hydrophobic cavity in RabGDI. This cavity
can then accept an isoprenyl group on Ypt1, forming a soluble complex that
is free to dissociate from the membrane where prenylated Rab proteins are
localized.
Fluorescence resonance energy transfer (FRET) is another powerful technique
for the determination of structural and functional information using
fluorescent proteins. FRET is a physical phenomenon from which the distance
between donor and acceptor fluorophores can be determined with reasonable
accuracy [67]. This phenomenon was harnessed to study the c-Crk-II
signaling protein, which is a substrate of the c-Abl protein kinase [68]. Using
EPL, a FRET pair, tetramethylrhodamine and fluorescein, was incorporated
in c-Crk-II. By judicious placement of the fluorophores within the c-Crk-II
molecule, it was possible to monitor the phosphorylation state of the protein
using FRET measurements (Fig. 10.1-6). In a subsequent study, an extremely
sensitive dual labeled c-Crk-II analog was developed that enabled real-time
monitoring of c-Abl kinase activity, and provided a nonradioactive assay for
the screening of potential inhibitors of the kinase [69].

Fig. 10.1-6 Biosensor for c-Abl phosphorylation of c-Crk-II. c-Abl
phosphorylates Tyr221 of c-Crk-II, which induces an intramolecular association
with the SH2 domain. This rearrangement yields a change in the distance between
the termini of the protein. This change is reported by the FRET pair
tetramethylrhodamine (Rh) and fluorescein (Fl) incorporated at the N- and
C-termini, respectively.
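
The distance dependence that makes FRET useful as a molecular ruler is commonly described by the Förster relation, E = 1 / (1 + (r/R0)^6), where R0 is the distance at which transfer is 50% efficient. This relation is standard background rather than something stated in the chapter, and the Python sketch below is only an illustration: the function names are hypothetical, and any R0 value supplied for a given donor/acceptor pair is an assumption that must come from the literature or from measurement.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Transfer efficiency for donor-acceptor distance r and Forster radius r0."""
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e: float, r0: float) -> float:
    """Invert the Forster relation to estimate distance from a measured efficiency."""
    if not 0.0 < e <= 1.0 or r0 <= 0:
        raise ValueError("efficiency must be in (0, 1] and r0 positive")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)
```

Because efficiency varies with the sixth power of distance, it is most sensitive to changes near r = R0, which is why placement of the two fluorophores matters in a biosensor of the kind described above.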
10.1.4.2 Introduction of Posttranslational Modifications and Unnatural Amino Acids

As noted above, the heterogeneous and often dynamic nature of posttransla-
tional modifications, such as phosphorylation, lipidation, and glycosylation,
makes their effects on protein structure and function extremely difficult to
study using traditional biological techniques. The semisynthetic nature of EPL,
however, is ideally suited for the incorporation of homogeneous
posttranslational modifications, as well as for the introduction of completely
unnatural amino acids. In the previous section, prenylation of a Rab GTPase
was shown to be necessary not only for its correct localization but also for
interactions with the inhibitory molecule RabGDI. Shown in Fig. 10.1-7 are some of
the noncoded amino acids that have been incorporated into proteins using this
approach [11]. In most cases, these amino acids were used to study some aspect
of protein function that was difficult or impossible to study by other means.
Fig. 10.1-7 Some of the amino acids introduced into proteins using EPL.
[Structures shown: homocysteine, selenocysteine, kynurenine, R-nipecotic acid,
S-nipecotic acid, Dapa(Nβ-levulinic acid), Dapa(Nβ-benzophenone), 2-Me-Tyr,
α-Me-Tyr, amino-Phe, 2,6-difluoro-Tyr, homotyrosine, cysteine(S-geranylgeranyl),
NorLeu, phospho-Ser/Thr, phospho-Tyr, Tyr phosphonate, N-biotin, EDTA,
(α-GalNAc)Ser/Thr, (β-GlcNAc)Asn, N-fluorescein, N-rhodamine, and N-dansyl;
R = Nε-lysine.]

Glycosylation is a vital posttranslational modification involved in a variety of
cellular processes including development, immune recognition, and cellular
trafficking [70]. Establishing the biological consequences of specific
oligosaccharides is difficult owing to glycoprotein microheterogeneity, which arises
from the fact that protein glycosylation is not under direct genetic control.
Because of the complex structure of oligosaccharides and the inherent
incompatibilities between carbohydrate and peptide chemistry (e.g., glycan stability,
protecting group compatibilities), the synthesis of homogeneous glycoproteins
remains a daunting task. In a recent example, EPL was applied toward
the understanding of protein glycosylation on the mucinlike glycoprotein
GlyCAM-1 [71]. GlyCAM-1 functions as a ligand for the leukocyte adhesion
molecule L-selectin, which is involved in leukocyte trafficking to sites of injury
and infection. GlyCAM-1 comprises two glycosylated mucin domains, separated
by a central, unglycosylated domain. The mucin domains, which are
characterized by clusters of oligosaccharides linked through an α-O-glycosidic
bond between N-acetyl galactosamine (GalNAc) and the hydroxyl groups of
Ser/Thr residues of the protein backbone, are essential for binding L-selectin.
To address the question of which mucin domains are important for GlyCAM-1
function, Bertozzi and Macmillan used EPL to make three semisynthetic
versions containing either or both of the mucin domains (Fig. 10.1-8). The
two proteins containing only one mucin domain were synthesized using one
ligation site between a synthetic glycopeptide and a recombinant protein.
GlyCAM-1 containing both mucin domains was created using a three-part
sequential ligation strategy with two synthetic glycopeptides and a
recombinant thioester protected at the N-terminus with a factor Xa cleavage peptide.
The resulting glycoproteins bearing α-GalNAc residues can then be
enzymatically elaborated with further glycosyltransferases to generate the endogenous
6-sulfo sialyl Lewis x motifs required for L-selectin binding.

Fig. 10.1-8 Semisynthesis of three different GlyCAM-1 molecules bearing different
glycosylation patterns.
Transforming growth factor β (TGFβ) is a member of a large family of
secreted cytokines of central importance in eukaryotic development and
homeostasis [72]. The initiation of TGFβ signaling involves a ligand-induced
multiple phosphorylation event of TGFβ receptor I by TGFβ receptor II (TβR-I
and TβR-II, respectively). This yields an activated TβR-I, enabling it to
phosphorylate members of the Smad family of transcription factors. The modification
of Smads allows them to oligomerize, giving active transcription complexes
that can enter the nucleus and mediate gene expression. EPL has been used
elegantly to shed light on the molecular mechanisms of many of these steps in the
TGFβ signaling pathway. To understand the activation of TβR-I by
phosphorylation, a semisynthetic version of the receptor was produced containing three
phosphoserines and one phosphothreonine [73]. Access to this homogeneous
preparation of activated TβR-I allowed the mechanism of receptor activation
to be studied for the first time [74]. Accordingly, phosphorylation was shown
to increase the binding affinity of TβR-I for Smad2 and decrease its affinity
for an inhibitor of the pathway, FKBP12. These observations yielded a new
model of receptor activation in which phosphorylation of the receptor switches
it from an inhibited state into an activated form capable of binding substrate.
The next step in the pathway, the effect of phosphorylation on Smad2, has also
been investigated using EPL [75]. Phosphorylation occurs on the last two serine
residues at the C-terminus of Smad2 during signaling. It had been shown
previously that phosphorylation of Smad2 is indispensable in TGFβ signaling,
but how phosphorylation affects the conformation and function of Smad was
yet to be elucidated. To investigate this, a homogeneous, doubly phosphorylated
version of Smad2 was synthesized. Biochemical studies on this protein
indicated that phosphorylation induced trimerization of the protein. As shown
in Fig. 10.1-9, this conclusion was confirmed when the crystal structure of such
a trimer was determined. These investigations revealed how phosphorylation
of Smad2 allows dissociation from the activated TβR-I receptor and
simultaneously induces hetero-oligomerization with a key regulatory protein, Smad4.

Fig. 10.1-9 Semisynthetic Smad2 containing two phosphoserines was used to confirm
the trimeric state of the active protein.

Muir and coworkers have used EPL to generate two semisynthetic versions
of Smad2 to probe its transport to the nucleus. The first such protein contains
two phosphates, a fluorescent probe, a fluorescence quenching molecule, and
a photocleavable linker (Fig. 10.1-10) [76]. The linker acts as a bifunctional
caging group, both interfering with Smad2 trimerization and quenching the
fluorescence of the molecule. Thus, cleavage of this linker with light results in
the formation of active protein, as well as the induction of protein fluorescence.
Indeed, when examined by gel filtration, the caged protein was found to be
incapable of forming trimers, but after cleavage there was a clean conversion
to the trimeric state. Importantly, this was also accompanied by an
approximately 26-fold increase in fluorescence. This caged protein is currently
the focus of study for unraveling the behavior of Smad2 and the kinetics of the
TGFβ signaling pathway. In a complementary system, the same group synthesized
a unique version of Smad2 in which the phosphate groups on the last two serines
are photocaged (Fig. 10.1-11(a)) [77]. Again, the caged protein was unable
to form the obligatory trimers for signaling. However, after photoactivation
the phosphates were released and oligomerization could occur. Furthermore,
the semisynthetic protein was used successfully in a nuclear import assay,
demonstrating that the caged protein behaves controllably and as desired in a
biological context (Fig. 10.1-11(b)).
Fig. 10.1-10 Design of caged Smad2 based on a modified C-terminal
phosphopeptide. Fluorescence and activity of Smad2 are blocked by a
photocleavable caging group. Photolysis with 365 nm light causes simultaneous
activation of both Smad2 and fluorescence.
Fig. 10.1-11 (a) Smad2 bearing two caged phosphoserines, and its subsequent
activation with light. (b) Caged Smad2 is excluded from the nucleus, while
deprotected Smad2 forms trimers and accumulates in the nucleus.
The selectivity filter of K+ channels contains four main chain carbonyl
oxygen atoms directed toward the pore. These carbonyl oxygens create four
K+-binding sites in a row inside the filter. To create these binding sites, the
peptide backbone has to adopt an unusual conformation in which the dihedral
angles of the four amino acid sequence alternate between the left-handed
and right-handed regions of the Ramachandran plot. One way to achieve this
conformation is to use alternating L- and D-amino acids. However, in
ribosome-synthesized proteins, nature uses exclusively L-amino acids,
precluding the enantiomeric D-configuration of side chains. These L-amino acids
strongly prefer right-handed α-helical conformations. Glycine is the only amino
acid in proteins synthesized by the ribosome to comfortably reside in the
left-handed α-helical region of the Ramachandran plot, and, therefore in this
instance, could be acting as a surrogate D-amino acid. Muir, MacKinnon, and
coworkers used EPL to construct a semisynthetic version of the K+ channel
KcsA containing a D-alanine in place of the conserved glycine (Gly77) [78].
Indeed, it was demonstrated that replacement of Gly77 with D-Ala yielded a
protein that exhibited complete retention of function. In contrast, substitution
with L-Ala resulted in a nonfunctional channel. Therefore, it was
concluded that, above all, glycine is used in the K+ channel's selectivity filter
to fulfill specific dihedral angle requirements, and, thus, it serves as a
D-amino acid surrogate.
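The alternation argument above can be made concrete with a small sketch. It is an illustration only, under a deliberately coarse assumption: a residue is assigned to the left- or right-handed region of the Ramachandran plot purely from the sign of its phi angle. The function names and the idealized (phi, psi) values are hypothetical and are not taken from the KcsA structure.

```python
from typing import List, Tuple


def handedness(phi_deg: float) -> str:
    """Coarse Ramachandran label based only on the sign of phi.

    Right-handed helical conformations have negative phi (near -60 degrees);
    left-handed helical conformations have positive phi (near +60 degrees).
    """
    return "L" if phi_deg > 0 else "R"


def alternates(backbone: List[Tuple[float, float]]) -> bool:
    """Return True if consecutive residues switch between the two regions."""
    labels = [handedness(phi) for phi, _psi in backbone]
    return all(a != b for a, b in zip(labels, labels[1:]))


# Idealized (phi, psi) pairs in degrees. These are illustrative values,
# not measured coordinates from any channel structure.
ALPHA_R = (-60.0, -45.0)
ALPHA_L = (60.0, 45.0)

filter_like = [ALPHA_L, ALPHA_R, ALPHA_L, ALPHA_R]  # alternating pattern
plain_helix = [ALPHA_R] * 4                         # ordinary right-handed run

print(alternates(filter_like))
print(alternates(plain_helix))
```

In this toy picture, a position that must sit in the left-handed region is one that an L-amino acid with a side chain accommodates poorly, which is the role the text assigns to glycine or a D-amino acid.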
10.1.4.3 Introduction of Stable Isotopes

EPL has also been used successfully to develop a segmental isotopic labeling
strategy designed to overcome the practical size limit for protein structure
determination using nuclear magnetic resonance (NMR) spectroscopy [79].
This limit exists because of the loss of spectral resolution arising both from
increased linewidths at longer rotational correlation times and from the
increased number of amino acids in the protein. The first of these problems has
to a large extent been overcome with the development of new NMR techniques
and technology. However, standard isotopic labeling techniques involving the
uniform incorporation of ¹³C, ¹⁵N, and ²H cannot address the problem of
signal overlap for larger systems. Segmental isotopic labeling solves this
problem by allowing selected portions of a protein to be enriched with
NMR-active isotopes. Unlabeled regions can then be filtered out of the NMR
spectrum using suitable heteronuclear correlation experiments. Therefore,
segmental labeling significantly reduces the spectral complexity of large
proteins, allowing for a variety of NMR experiments.
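The reduction in spectral complexity can be illustrated with a rough peak count. The sketch below assumes, as a simplification, that each ¹⁵N-labeled residue other than proline contributes one backbone amide cross peak to a ¹H-¹⁵N correlation spectrum, and it ignores side-chain and N-terminal signals. The sequence and the function name are hypothetical.

```python
from typing import Optional


def expected_amide_peaks(sequence: str, labeled: Optional[range] = None) -> int:
    """Rough count of backbone amide cross peaks from 15N-labeled residues.

    Simplification: every labeled residue except proline (which has no
    backbone N-H) gives one peak; side-chain and N-terminal signals are
    ignored. If `labeled` is None the whole chain is treated as labeled.
    """
    indices = range(len(sequence)) if labeled is None else labeled
    return sum(1 for i in indices if sequence[i] != "P")


toy_protein = "MAPKGSTPLV"  # hypothetical 10-residue sequence

uniform = expected_amide_peaks(toy_protein)                        # all labeled
segmental = expected_amide_peaks(toy_protein, labeled=range(5, 10))  # C-terminal half

print(uniform, segmental)
```

For a real multidomain protein the same bookkeeping applies per ligated fragment: only the fragment expressed in isotopically enriched medium contributes peaks to the filtered spectrum.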
Segmental isotopic labeling has been accomplished using both protein
trans-splicing and EPL. Yamazaki and coworkers used a protein trans-splicing
system based on a split PI-Pfu intein to selectively ¹⁵N label the C-terminal
domain of the Escherichia coli RNA polymerase α subunit [41]. EPL was first
applied to this area when a single domain within the Src homology domain
derived from the Abl protein tyrosine kinase was labeled with ¹⁵N [58]. In
both these pioneering experiments, one-half of the protein of interest was
bacterially expressed using a growth medium enriched with a ¹⁵N source.
Subsequent ligation of this labeled fragment with another protein fragment,
in this case unlabeled, yielded the selectively labeled protein. EPL and protein
trans-splicing have been successfully applied to a variety of proteins and have
yielded proteins labeled not only at either terminus but in internal segments
as well [79]. For example, the mechanism of autoregulation of a bacterial σ
factor was explored using EPL [80]. Autoregulation of this protein was proposed
to occur through direct interactions between two regions of the protein. By
specifically labeling one of these domains, the authors were able to use NMR to
argue against a high affinity interaction between the two regions and suggest
that autoinhibition of DNA binding occurs through an indirect steric and/or
electrostatic mechanism. In another example, Muir and coworkers used
internal isotopic labeling to study the mechanism of intein-catalyzed protein
splicing [81]. The peptide bond at the N-extein-intein junction was labeled
with ¹³C using semisynthesis. The subsequent NMR experiments showed that
this peptide bond exists in an unusual conformation, which may help catalyze
the first step of protein splicing.
10.1.4.4 Topology Engineering of Proteins

Protein engineering has traditionally involved the modification of amino
acid side chains; however, there has been increasing interest in altering the
underlying backbone and even the overall topology of a protein. Examples
of such topological changes include cyclic and branched polypeptides.
EPL and protein trans-splicing have both been used for the synthesis of
cyclic peptides and proteins. Protein circularization is of particular interest
because basic polymer theory predicts that cyclization will yield a net
thermodynamic stabilization of a protein's folded state owing to reduced
conformational entropy in the denatured state. Indeed, some circular proteins
prepared by EPL and protein trans-splicing are more stable than their linear
counterparts (e.g., GFP [82], β-lactamase [83], and dihydrofolate reductase
(DHFR) [84]). Other proteins, however, such as the c-Crk-II SH3 domain [85]
and pancreatic trypsin inhibitor [86], have not been found to be more
stable. In both these latter examples, it is likely that unfavorable enthalpic
effects (e.g., strain) offset the beneficial entropic effect resulting from
circularization.
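The entropic argument can be sketched numerically under a textbook idealization that is not taken from this chapter: for a Gaussian (random-coil) chain, the probability that the two ends meet scales as N^(-3/2), so the entropy of closing a loop of N residues is roughly -(3/2) R ln N plus a constant. Because the constant is model dependent, the hypothetical helper below only compares two chain lengths.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def loop_entropy_difference(n_short: int, n_long: int) -> float:
    """Loop-closure entropy of an n_long-residue chain minus that of an
    n_short-residue chain, in J/(mol K), assuming the Gaussian-chain
    scaling dS(N) = -(3/2) * R * ln(N) + const (the constant cancels).
    """
    if n_short <= 0 or n_long <= 0:
        raise ValueError("chain lengths must be positive")
    return -1.5 * R * math.log(n_long / n_short)


# Example: compare cyclizing a 200-residue chain with a 100-residue chain.
delta_s = loop_entropy_difference(100, 200)
# Corresponding free-energy difference at 298 K, in J/mol (-T * dS).
delta_g = -298.0 * delta_s
print(delta_s, delta_g)
```

The estimate addresses only the denatured-state entropy. As the examples in the text show, strain or other enthalpic penalties in the folded circular protein can cancel this contribution.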
Many pharmaceutically important natural products, including antibiotics
and immunosuppressants, are based on cyclic peptides. Therefore, the ability
to synthesize backbone cyclic peptides using EPL or protein trans-splicing
is an enticing opportunity for drug development. For example, Payan and
coworkers used a split intein approach for identifying bioactive peptides [87].
A random cyclic pentapeptide library was introduced into human B cells
using a retroviral delivery system. A cell-based screen was then used to
identify peptides that exhibited the ability to inhibit the IL-4 signaling pathway.
These active peptides have potential as anti-inflammatory therapeutics or
may serve as lead compounds for the synthesis of even more efficacious
drugs.
10.1.4.5 Protein Splicing in Living Cells

Although a large amount of information can be gleaned from in vitro protein
characterization and semisynthesis, characterization of proteins in the context
of a living cell is of extreme importance for a complete understanding of their
function. Although classical genetic methods to disrupt protein function (e.g.,
mutagenesis, gene knockouts, and overexpression) and posttranscriptional
technology such as RNAi have provided incredible insights into protein
function, they have their limitations. Genetic knockouts, although exquisitely
precise, can in many instances lead to a lethal phenotype for essential genes
or show a limited phenotype in cases of genetic compensation. RNAi can
overcome some of these limitations and has been used with great success;
however, as with gene knockouts, protein levels cannot be tuned subtly and
thus delicate effects of protein activity are difficult to study. Semisynthesis of
proteins in living cells can to some extent surmount these problems, as it is
an inducible, temporal, and tunable technology for the modulation of protein
function at the posttranslational level.
Muir and Giriat described the first example of protein semisynthesis in a
living cell (Fig. 10.1-12) [88]. In this system, a protein of interest is
expressed in cultured cells with the first half of the naturally occurring Ssp
DnaE split intein (intein^N) genetically fused to its C-terminus. Then a
semisynthetic polypeptide, comprising the second half of the intein (intein^C)
covalently attached to a synthetic probe and a protein transduction domain
(PTD) peptide, is added to the cellular media. The PTD peptide delivers the
semisynthetic construct into the cells, where the intein^C can interact with its
complementary half, triggering protein splicing. This yields the protein of
interest linked to the probe through a native peptide bond. As a proof of
principle, GFP was ligated to a short synthetic peptide based on the FLAG
epitope.
Fig. 10.1-12 Principle of protein semisynthesis in living cells. The
protein transduction domain (PTD) delivers the probe to the cell,
which is followed by complementation of the DnaE intein halves
and protein splicing.
Muir and coworkers have developed a technology to control protein splicing
in a living cell. This technology, termed conditional protein splicing (CPS),
relies on the FKBP/rapamycin/FRB three-hybrid heterodimerization system [89].
Fusing separate halves of a split intein to either FKBP or FRB allows the intein
fragments to be brought together in response to the dimerizer molecule.
Provided the juxtaposition of the intein fragments in the resulting dimer is
compatible with functional complementation, this results in splicing together
of the flanking extein sequences (Fig. 10.1-13(a)). This was realized through
the use of an artificially split S. cerevisiae VMA intein. Two model exteins,
maltose binding protein (MBP) and a polyhistidine-containing sequence (HIS),
were used to explore the scope of the technology. CPS displays little to no
background and produces the product within 10 min of the addition of
rapamycin, indicating the advantage of the posttranslational nature of CPS for
quick responses. Furthermore, the level of product formation was dose and
time dependent (Fig. 10.1-13(b)) and can be attenuated with inhibitors of the
three-hybrid system, such as ascomycin [90].
Because of the promiscuity of inteins for their flanking extein sequences,
CPS is expected to have a certain level of generality. In fact, the only strict
extein sequence requirement is the cysteine residue of the C-extein, necessary
in EPL. In the most general form of CPS, a polypeptide with a novel func-
tion could be obtained by splicing together two fragments that lack function
individually. This general goal can be achieved in several ways. For example,
two domains of a protein that display no activity could be spliced together to
give a functional protein. Alternatively, one splicing partner could be a peptide
localization sequence, resulting in relocalization of the splicing product on
addition of rapamycin.
Fig. 10.1-13 (a) Principle of conditional protein splicing (CPS). A
split intein is reconstituted by the addition of rapamycin, which
heterodimerizes FKBP and FRB, resulting in protein splicing.
(b) Dose and time dependence of the CPS reaction.
Liu and coworkers have recently developed a different strategy for
small-molecule activated protein splicing [91]. In this report, an intein was
inserted into a protein of interest, interrupting its function, which is
restored after splicing. Simple insertion of a natural ligand-binding domain
into a minimal intein destroyed the splicing activity and yielded an evolvable
intein-based molecular switch that transduces binding of a small molecule into
the activation of a protein of interest. Specifically, the Mycobacterium
tuberculosis RecA intein was modified with the human estrogen receptor (ER)
ligand binding domain (LBD) (residues 304-551), which binds the small-molecule
4-hydroxytamoxifen. This protein was then evolved through multiple rounds of
mutation and selection in S. cerevisiae by linking the splicing to cell
survival or fluorescence. Iterated cycles of mutagenesis and selection yielded
inteins with strong splicing activities that depended highly on the presence of
the small molecule. Insertion of one of these inteins into different unrelated
proteins in living cells revealed that the technology allows for
ligand-dependent protein function that is fairly rapid, dose dependent, and
posttranslational. This system represents an exciting complementary technology
to the CPS discussed above.
10.1.5
Future Development
Because of the power of EPL and protein splicing, these techniques will
undoubtedly be used for many applications in the future. EPL provides
researchers with a versatile tool for the study of protein function by allowing
the preparation of proteins containing both natural and artificial modifications.
As seen above, this technology is well suited for biochemical and biophysical
studies; however, it may also be a valuable tool for areas such as proteomics,
material science, and nanotechnology. For example, the Yao group has reported
on the preparation of a protein microarray by first biotinylating proteins
using EPL and then spatially arranging these on an avidin-coated slide [92].
Importantly, EPL ensures that the site of modification in all proteins is
consistent with respect to the site of immobilization, the C-terminus in this
case. These types of protein surfaces could be used for both proteomic profiling
of cellular interactions and protein modifications. In addition, homogeneous
surfaces coated with specific proteins can be prepared, which can be useful
for materials and other biophysical applications (e.g., assay development
and cellular patterning). The highly controlled nature of EPL could also be
used in the areas of biomedicine, through the generation of novel protein
therapeutic drugs and diagnostic tools. In one such example, Sydor et al.
established conditions that allow single-chain antibodies to be utilized in EPL
reactions [93]. Thus, it should now be possible to attach any synthetic molecule
to the C-terminus of an antibody. Used in conjunction with technologies such
as quantum dots and contrast reagents, EPL can be powerful in the area of
bioimaging, as well as vaccine development and targeted-drug delivery.
Protein trans-splicing also has potential in the area of proteomics.
The Umezawa group has developed a two-hybrid approach to probe for
protein-protein interactions in the cytosol of prokaryotic [94] and eukaryotic
cells [95]. The strategy involves fusing each half of a reporter protein (GFP or
luciferase) to the appropriate end of a split intein. The intein fragments are
then fused to either a receptor protein (fish) or to a library of potential ligands
(bait). Interaction between a fish and bait pair results in protein splicing
and generation of an active reporter protein. This type of strategy could be
extended to profile interacting partners of a protein of interest, by tagging
binding partners with a reporter construct. CPS could also be extended to the
investigation of enzymes and signaling proteins. Indeed, this has already been
accomplished in vitro through the generation of an inducible version of the
kinase PKA [96]. Extrapolation of this technology to cellular systems should
follow in due course, and the development of nontoxic rapamycin analogs [97]
may broaden the technology to living animals.
10.1.6
Conclusion
As noted at the beginning of this chapter, a true understanding of biological

processes requires that they be studied in a context that accounts for tissue
and cell-type expression, modification patterns, and temporal changes in these
patterns. EPL and protein splicing have been used with great success to scratch
the surface of some of these questions by allowing for homogeneous protein
engineering. In the future, these technologies should provide for a more
intimate understanding of protein structure and function.
References
1. P. Cohen, The development and Natl. Acad. Sci. U.S.A. 1998, 95,
therapeutic potential of protein kinase 6705-6710.
inhibitors, Curr. Opin. Chem. Bid. 9. K. Severinov, T.W. Muir, Expressed
1999, 3,459-465. protein ligation, a novel method for
2. N.L. Pohl, Functional proteomics for studying protein-protein interactions
the discovery of carbohydrate-related in transcription, J . Biol. Chem. 1998,
enzyme activities, C u r . Opin. Chem. 273,16205-16209.
Biol. 2005, 9, 76-81. 10. T.C. Evans Jr, I. Benner, M.Q.Xu,
3. J.M. Antos, M.B. Francis, Selective Semisynthesis of cytotoxic proteins
tryptophan modification with rhodium using a modified protein splicing
carbenoids in aqueous solution, J . Am. element, Protein Sci. 1998, 7,
Chem. SOC.2004, 126,10256-10257. 2256-2264.
4. N.S. Joshi, L.R. Whitaker, M.B. 11. T.W. Muir, Semisynthesis ofproteins
Francis, A three-component by expressed protein ligation, Annu.
Mannich-type reaction for selective Rev. Biochem. 2003, 72, 249-289.
tyrosine bioconjugation, J. Am. Chem. 12. R. David, M.P. Richter, A.G.
SOC.2004, 126,15942-15943. Beck-Sickinger, Expressed protein
5. I. Chen, A.Y. Ting, Site-specific ligation. Method and applications, Eur.
labeling of proteins with small J . Biochem. 2004, 271,663-677.
molecules in live cells, Curr. Opin. 13. C.J. Wallace, Peptide ligation and
Biotechnol. 2005, 16, 35-40. semisynthesis, Curr. Opin. Biotechnol.
6. P.M. England, Unnatural amino acid 1995, 6,403-410.
mutagenesis: a precise tool for probing 14. D.F. Dyckes, T. Creighton, R.C.
protein structure and function, Sheppard, Spontaneous re-formation
Biochemistry 2004, 43, 11623-11629. of a broken peptide chain, Nature
7. L. Wang, P.G. Schultz, Expanding the 1974,247,202-204.
genetic code, Angew. Chem., Int. Ed. 15. C.J. Wallace, I. Clark-Lewis,
E& 2004, 44,34-66. Functional role of heme ligation in
8. T.W. Muir, D. Sondhi, P.A. Cole, cytochrome c. Effects of replacement
Expressed protein ligation: a general of methionine 80 with natural and
method for protein engineering, Proc. non-natural residues by
562
I semisynthesis,]. Biol. Chem. 1992, 25. K. Rose, Facile synthesis of homo-
267,3852-3861. geneous artificial proteins,]. Am.
16. Y. Chen, Y.W. Ebright, R.H. Ebright, Chem. SOC.1994, 116,30-33.
Identification of the target of a 26. M. Schnnlzer, S.B.H. Kent,
transcription activator protein by Constructing proteins by dovetailing
protein-protein photocrosslinking, unprotected synthetic
Science 1994, 265, 90-92. peptides-backbone-engineered HIV
17. J. Mukhopadhyay, A.N. Kapanidis, protease, Science 1992, 256, 221-225.
V. Mekler, E. Kortkhonjia, Y.W. 27. P.E. Dawson, S.B. Kent, Synthesis of
Ebright, R.H. Ebright, Translocation native proteins by chemical ligation,
ofo(70)with RNA Polymerase during Annu. Rev. Biochem. 2000, 69,
transcription: fluorescence resonance 923-960.
energy transfer assay for movement 28. p , ~D, ~T.W. ~~ ~ ~ i ~~ ,~ ,
relative to DNA, Cell 2001, 106, I. Clark-Lewis,S.B. Kent, Synthesis of
45 3-463. proteins by native chemical ligation,
18. D. Macmillan, R.M. Bill, K.A. Sage, Science 1994, 266, 776-779.
D. Fern, S.L. Flitsch, Selective in vitro 29. M, Chytil, B,R, peterson, D,A,
glycosylation of recombinant proteins: Erlanson, G,L, Verdine, The
semi-synthesis Of novel homogeneous orientation ofthe AP-1 heterodimer on
glycoforms of human erythropoietin, DNA strongly affects transcriptional
Chem. Bid. 2001, 8,133-145.
potency, Proc. Natl. Acad. Sci. U.S.A.
19. M. Ghosh, I. Ichetovkin, X. Song, J.S.
1998, 95, 14076-14081,
Condeelis, D.S. Lawrence, A new
30. C.J. Noren, J. Wang, F.B. Perler,
strategy for caging proteins regulated
Dissecting the chemistry of protein
by kinases,]. Am. Chem. SOC.2002,
splicing and its applications, Angew.
124,2440-2441.
Chem., [nt. Ed. Engl. 2000, 39,
20. G.A. Homandberg, M. Laskowski Jr,
450-466.
Enzymatic resynthesis of the
31. H. Paulus, Protein splicing and related
hydrolyzed peptide bond(s) in
ribonuclease S, Biochemistry 1979, 18, forms of protein autoprocessing,
586-592. Annu. Rev. Biochem. 2000, 69,
21. D.Y. Jackson, J. Burnier, C. Quan, 447-496.
M. Stanley, J. Tom, J.A. Wells, A 32. M.Q. Xu, F.B. Perler, The mechanism
designed peptide ligase for total of protein splicing and its modulation
synthesis of ribonuclease A with by mutation, EMBO]. 1996, 15,
unnatural catalytic residues, Science 5146-5153.
1994,266,243-247. 33. I. Giriat, T.W. Muir, F.B. Perler,
22. F. Bordusa, Proteases in organic Protein splicing and its applications,
synthesis, Chem. Rev. 2002, 102, Genet. Eng. (N.Y.) 2001, 23, 171-199.
4817-4868. 34. S. Chong, F.B. Mersha, D.G. Comb,
23. H.F. Gaertner, K. Rose, R. Cotton, M.E. Scott, D. Landry, L.M. Vence,
D. Timms, R. Camble, R.E. Offord, F.B. Perler, J. Benner, R.B. Kucera,
Construction of protein analogues by C.A. Hirvonen, J.J. Pelletier,
site-specificcondensation of H. Paulus, M.Q. Xu, Single-column
unprotected fragments, Bioconjugate purification of free recornbinant
Chem. 1992,3,262-268. proteins using a self-cleavableaffinity
24. H.F. Gaertner, R.E. Offord, R. Cotton, tag derived from a protein splicing
D. Timms, R. Camble, K. Rose, element, Gene 1997, 192,271-281.
Chemo-enzymic backbone 35. T.C. Evans Jr, J. Benner, M.Q. Xu, The
engineering of proteins. Site-specific in vitro ligation of bacterially
incorporation of synthetic peptides expressed proteins using an intein
that mimic the 64-74 disulfide loop of from Methanobacterium themoauto-
granulocyte colony-stimulating factor, trophicum, 1.Bid. Chem. 1999, 274,
I. Bid. Chem. 1994, 269,7224-7230. 3923-3926.
References I 5 6 3
36. S. Mathys, T.C. Evans, I.C. Chute, synthetic glycoproteins by ultimately
H. Wu, S. Chong, J. Benner, X.Q. Liu, convergent routes: a solution to a
M.Q. Xu, Characterization of a long-standing problem, /. Am. Chem.
self-splicing mini-intein and its Soc. 2004, 126,6576-6578,
conversion into autocatalytic N- and 46. P. Botti, M. Villain, S. Manganiello,
C-terminal cleavage elements: facile H. Gaertner, Native chemical ligation
production of protein building blocks through in situ 0 to S acyl shift, Org.
for protein ligation, Gene 1999, 231, Lett. 2004, 6, 4861-4864.
1-13. 47. M. Villain, J. Vizzavona, K. Rose,
37. D.W. Wood, W. Wu, G. Belfort, Covalent capture: a new tool for the
V. Derbyshire, M. Belfort, A genetic purification of synthetic and
system yields self-cleaving inteins for recombinant polypeptides, Chem. Biol.
bioseparations, Nut. Biotechnol. 1999, 2001, 8,673-679.
17,889-892. 48. D. Bang, S.B. Kent, A one-pot total
38. M.W. Southworth, E. Adam, synthesis of crambin, Angew.Chem.,
D. Panne, R. Byer, R. Kautz, F.B. lnt. Ed. Engl. 2004, 43, 2534-2538.
Perler, Control of protein splicing by 49. G.J. Cotton, B. Ayers, R. Xu, T.W.
intein fragment reassembly, E M B O J . Muir, Insertion of a synthetic peptide
1998, 17,918-926. into a recombinant protein
39. K.V. Mills, B.M. Lew, S. Jiang, framework: a protein biosensor, /. Am.
H. Paulus, Protein splicing in trans by Chem. Soc. 1999, 121, 1100-1101.
purified N- and C-terminal fragments 50. R.M. Hofmann, T.W. Muir, Recent
of the Mycobacterium tuberculosis RecA advances in the application of
intein, Proc. Natl. Acad. Sci. U.S.A. expressed protein ligation to protein
1998, 95, 3543-3548. engineering, Curr. Opin. Biotechnol.
40. H. Wu, Z. Hu, X.Q. Liu, Protein 2002, 13,297-303.
trans-splicing by a split intein encoded 51. L.E. Canne, S.J. Bark, S.B. Kent,
in a split DnaE gene of Synechocystis Extending the applicability of native
sp. PCC6803, Proc. Nutl. Acad. Sci. chemical ligation, 1.Am. Chem. Soc.
U.S.A. 1998, 95,9226-9231. 1996, 118,5891-5896.
41. T. Yamazaki, T. Otomo, N. Oda, 52. L.Z. Yan, P.E. Dawson, Synthesis of
Y. Kyogoku, K. Uegaki, N. Ito, peptides and proteins without cysteine
Y. Ishino, H. Nakamura, Segmental residues by native chemical ligation
isotope labeling for protein NMR combined with desulfurization, 1.Am.
using peptide splicing, /. Am. Chem. Chem. SOC. 2001, 123,526-533.
SOC.1998, 120,5591-5592. 53. E. Saxon, C.R. Bertozzi, Cell surface
42. B.J. Backes, ].A. Ellman, An engineering by a modified Staudinger
alkanesulfonamide “safety-catch” reaction, Science 2000, 287,
linker for solid-phase synthesis, /. Org. 2007-2010.
Chem. 1999, 64,2322-2330. 54. B.L. Nilsson, R.J. Hondal, M.B.
43. Y. Shin, K.A. Winans, B.J. Backes, Soellner, R.T. Raines, Protein
S.B.H. Kent, J.A. Ellman, C.R. assembly by orthogonal chemical
Bertozzi, Fmoc-based synthesis of ligation methods, 1.Am. Chem. Soc.
peptide-(cu)thioesters: Application to 2003, 125,5268-5269.
the total chemical synthesis of a 55. R.J. Hondal, B.L. Nilsson, R.T. Raines,
glycoprotein by native chemical Selenocysteine in native chemical
ligation, /. Am. Chem. Soc. 1999, 121, ligation and expressed protein
11684-11689. ligation, /. Am. Chem. SOC.2001, 123,
44. Y. Kwon, K. Welsch, A.R. Mitchell, 5140- 5141.
J.A. Camarero, Preparation of peptide 56. D. Wang, P.A. Cole, Protein tyrosine
p-nitroanilides using an aryl hydrazine kinase Csk-catalyzed phosphorylation
resin, Org. Lett. 2004, 6, 3801-3804. of Src containing unnatural tyrosine
45. 1.D. Warren, 1,s. Miller, S.I. Keding, analogues, 1. Am. Chem. Sac. 2001,
S.J. Danishekky, Toward fully ” 123, f883-8886.
564
I 57. K. Alexandrov, I . Heinemann, K. Alexandrov, Structure of Rab
T. Durek, V. Sidorovitch, R.S. Goody, GDP-dissociation inhibitor in complex
H. Waldmann, Intein-mediated with prenylated YPTl GTPase, Science
synthesis of geranylgeranylated Rab7 2003,302,646-650.
protein in vitro, /. Am. Chem. SOC. 67. P.R. Selvin, Fluorescence resonance
2002, 124,5648-5649. energy transfer, Methods Enzymol.
58. R. Xu, B. Ayers, D. Cowburn, T.W. 1995,246,300-334.
Muir, Chemical ligation of folded 68. G.J. Cotton, T.W. Muir, Generation of
recombinant proteins: segmental a dual-labeled fluorescence biosensor
isotopic labeling of domains for N M R for Crk-I1 phosphorylation using
studies, Proc. Natl. Acad. Sci. U.S.A. solid-phase expressed protein ligation,
1999, 96, 388-393. Chem. Biol. 2000, 7,253-261.
59. F.I. Valiyaveetil, R. MacKinnon, T.W. 69. R.M. Hofmann, G.J. Cotton, E. J.
Muir, Semisynthesis and folding of Chang, E. Vidal, D. Veach,
the potassium channel KcsA, 1.Am. W. Bornmann, T.W. Muir,
Chem. SOC.2002, 124,9113-9120. Fluorescent monitoring of kinase
60. T.M. Hackeng, J.H. Griffin, P.E. activity in real time: development of a
Dawson, Protein synthesis by native robust fluorescence-based assay for
chemical ligation: expanded scope by Abl tyrosine kinase activity, Bioorg.
using straightforward methodology, Med. Chem. Lett. 2001, 11,3091-3094.
Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 70. A. Varki, R. Cummings, J. Esko,
10068- 10073. Essentials of Clycobiology, Cold Spring
61. S. Chong, K.S. Williams, Harbor Labs, Cold Spring Harbor,
C. Wotkowicz, M.Q. Xu, Modulation 1999.
of protein splicing of the 71. D. Macmillan, C.R. Bertozzi, Modular
Saccharomycescerevisiae vacuolar assembly of glycoproteins: towards the
membrane ATPase intein, /. Biol. synthesis of GlyCAM-1 by using
Chem. 1998,273,10567-10577. expressed protein ligation, Angew.
62. R.Y. Tsien, The green fluorescent Chem., Int. Ed. Engl. 2004, 43,
protein, Annu. Rev. Biochem. 1998, 67, 1355-1359.
509-544. 72. P.M. Siegel, J. Massague, Cytostatic
63. T.J. Tolbert, C.-H. Wong, Intein- and apoptotic actions of TGFP in
mediated synthesis of proteins homeostasis and cancer, Nat. Rev.
containing carbohydrates and other Cancer 2003,3,807-821.
molecular probes, /. Am. Chem. SOC. 73. M. Huse, M.N. Holford, J. Kuriyan,
2000, 122,5421-5428. T.W. Muir, Semisynthesis of
64. V. Mekler, E. Kortkhonjia, hyperphosphorylated type I TGFB
J. Mukhopadhyay, J. Knight, receptor: addressing the mechanism
A. Revyakin, A.N. Kapanidis, W. Niu, of kinase activation, /. Am. Chem. SOC.
Y.W. Ebright, R. Levy, R.H. Ebright, 2000, 122,8337-8338.
Structural organization of bacterial 74. M . Huse, T.W. Muir, L. Xu, Y.G.
RNA polymerase holoenzyme and the Chen, J. Kuriyan, J. Massague, The
RNA polymerase-promoter open TGF beta receptor activation process:
complex, Cell 2002, 108, 599-614. an inhibitor- to substrate-binding
65. A. lakovenko, E. Rostkova, switch, Mol. Cells 2001, 8, 671-682.
E. Merzlyak, A.M. Hillebrand, N.H. 75. J.W. Wu, M. Hu, J. Chai, J. Seoane,
Thoma, R.S. Goody, K. Alexandrov, M. Huse, C. Li, D.J. Rigotti, S. Kyin,
Semi-synthetic Rab proteins as tools T.W. Muir, R. Fairman, J. Massague,
for studying intermolecular Y. Shi, Crystal structure o f a
interactions, FEBS Lett. 2000, 468, phosphorylated Smad2. Recognition
155- 158. ofphosphoserine by the MH2 domain
66. A. Rak, 0. Pylypenko, T. Durek, and insights on Smad function in
A. Watzke, S. Kushnir, L. Brunsveld, TGF-beta signaling, Mol. Cells 2001, 8,
H. Waldmann, R.S. Goody, 1277-1289.-
76. J.P. Pellois, M.E. Hahn, T.W. Muir, protein fold through backbone
Simultaneous triggering of protein cyclization, /. Mol. Biol. 2001, 308,
activity and fluorescence, /. Am. Chem. 1045- 1062.
Soc. 2004, 126,7170-7171. 86. D.P. Goldenberg, T.E. Creighton,
77. M.E. Hahn, T.W. Muir, Photocontrol Folding pathway of a circular form of
of Smad2, a multiphosphorylated bovine pancreatic trypsin inhibitor, /.
cell-signaling protein, through caging Mol. Biol. 1984, 179, 527-545.
of activating phosphoserines, Angew. 87. T.M. Kinsella, C.T. Ohashi, A.G.
Chem., Int. Ed. Engl. 2004, 43, Harder, G.C. Yam, W. Li, B. Peelle,
5800-5803. E.S. Pali, M.K. Bennett, S.M.
78. F.I. Valiyaveetil, M. Sekedat, Molineaux, D.A. Anderson, E.S.
R. Mackinnon, T.W. Muir, Glycine as Masuda, D.G. Payan, Retrovirally
a D-amino acid surrogate in the delivered random cyclic Peptide
K(+)-selectivity filter, Proc. Natl. Acad. libraries yield inhibitors of
Sci. U.S.A. 2004, 101,17045-17049. interleukin-4 signaling in human B
79. D. Cowburn, T.W. Muir, Segmental cells, J . Biol. Chem. 2002, 277,
isotopic labeling using expressed 37512-37518.
protein ligation, Methods Enzymol. 88. I. Giriat, T.W. Muir, Protein
2001,339,41-54. semi-synthesis in living cells, /,Am.
80. J.A. Camarero, A. Shekhtman, E.A. Chem. SOC.2003, 125,7180-7181.
Campbell, M. Chlenov, T.M. Gruber, 89. H.D. Mootz, T.W. Muir, Protein
D.A. Bryant, S.A. Darst, D. Cowburn, splicing triggered by a small molecule,
T.W. Muir, Autoregulation of a 1.Am. Chem. SOC.2002, 124,
bacterial m factor explored by using 9044- 9045.
segmental isotopic labeling and N M R , 90. H.D. Mootz, E.S. Blum,A.B.
Proc. Natl. Acad. Sci. U.S.A. 2002, 99, Tyszkiewicz, T.W. Muir, Conditional
8536-8541. protein splicing: a new tool to control
81. A. Romanelli, A. Shekhtman, protein structure and function in vitro
D. Cowburn, T.W. Muir, and in vivo, J. Am. Chem. SOC.2003,
Semisynthesis of a segmental 125,10561-10569.
isotopically labeled protein splicing 91. A.R. Buskirk, Y.C. Ong, Z. J. Gartner,
precursor: N M R evidence for an D.R. Liu, Directed evolution of ligand
unusual peptide bond at the dependence: small-molecule-activated
N-extein-intein junction, Proc. Natl. protein splicing, Proc. Natl. Acad. Sci.
Acad. Sci. U.S.A. 2004, 101, U.S.A. 2004, 101, 10505-10510.
6397 - 6402. 92. M.L. Lesaicherre, R.Y.P. Lue, G.Y.J.
82. H. Iwai, A. Lingel, A. Pluckthun, Chen, Q. Zhu, S.Q. Yao,
Cyclic green fluorescent protein Intein-mediated biotinylation of
produced in vivo using an artificially proteins and its application in a
split PI-PfuI intein from Pyrococcus protein microarray, I . Am. Chem. SOC.
furiosus,J. Biol. Chem. 2001, 276, 2002, 124,8768-8769.
16548-16554. 93. J.R. Sydor, M. Mariano, S. Sideris,
83. H. Iwai, A. Pluckthun, Circular S. Nock, Establishment of
b-lactamase: stability enhancement by intein-mediated protein ligation under
cyclizing the backbone, FEBS Lett. denaturing conditions: C-terminal
1999,459,166-172. labeling of a single-chain antibody for
84. C.P. Scott, E. Abel-Santos, M. Wall, biochip screening, Bioconjugate Chem.
D.C. Wahnon, S.J. Benkovic, 2002, 13,707-712.
Production of cyclic peptides and 94. T. Ozawa, S. Nogami, M. Sato,
proteins in vivo, Proc. Natl. Acad. Sci. Y. Ohya, Y. Umezawa, A fluorescent
U.S.A. 1999, 96,13638-13643. indicator for detecting protein-protein
85. J.A. Camarero, D. Fushman, S. Sato, interactions in vivo based on protein
I. Giriat. D. Cowburn, D.P. Raleigh, splicing, Anal. Chem. 2000, 72,
T.W. Muir, Rescuing a destabilized 515 1- 5157.
95. T. Ozawa, A. Kaihara, M. Sato, Angew. Chem., Int. Ed. Engl. 2004, 43,
K. Tachihara, Y. Umezawa, Split 5189-5192.
luciferase as an optical probe for 97. S.D. Liberles, S.T. Diver, D.J. Austin,
detecting protein-protein interactions S.L. Schreiber, Inducible gene
in mammalian cells based on protein expression and protein translocation
splicing, Anal. Chern. 2001, 73, using nontoxic ligands identified by a
2516-2521. mammalian three-hybrid screen, Proc.
96. H.D. Mootz, E.S. Blum, T.W. Muir, Natl. Acad. Sci. U.S.A.1997, 94,
Activation of an autoregulated protein 7825-7830.
kinase by conditional protein splicing,
10.2
Chemical Synthesis of Proteins and Large Bioconjugates
Philip Dawson
Outlook
This chapter describes the strategies and techniques used to chemically
synthesize large macromolecules. Because of the large size and functional
diversity of biological macromolecules, traditional approaches that require
extensive use of protecting groups have limited utility. Instead, biological
macromolecules are synthesized using chemical ligation methods that use highly
chemoselective reactions to link medium-sized synthetic precursors without the
need for extensive functional group protection. Although these reactions are
also used for the synthesis of carbohydrates and nucleic acids, the general
principles will be described with a focus on the chemical synthesis of proteins.
10.2.1
Introduction
In many ways, proteins represent the most functionally diverse family of
organic molecules. Polypeptides fold to form enzymes that are potent catalysts
of an astounding variety of chemical transformations, and molecular machines
and motors that drive cell motility and the movement of cargo within cells. Other
proteins form selective ion channels and highly specific binding proteins,
while others play structural roles in maintaining cellular structure or in
forming the coat of a virus. Much of our knowledge about protein function is
a result of detailed biophysical analysis of altered proteins. These proteins are
produced using site-specific amino acid substitutions enabled by a technique
termed site-directed mutagenesis [1]. Although this technique is powerful,
the ability to incorporate noncoded elements of structure and function enables
new questions to be addressed experimentally. This ability may also be
applied in the development of novel proteins with altered functions for use as
pharmaceuticals, biosensors, or in nanotechnology [2-4].
The sophisticated tools of organic synthesis have enabled the straight-
forward assembly of biopolymers such as peptides, oligonucleotides, and
carbohydrates. Many complex biopolymers can be assembled using classical
solution phase organic synthesis. In addition, solid phase organic chemistry,
originally developed for the synthesis of these biopolymers [5, 6], has greatly
facilitated the handling and solubility of protected biological macromolecules.
These methods have been further elaborated for the synthesis of more com-
plex biopolymers containing nonstandard subunits such as posttranslational
modifications to amino acids, unnatural amino acids, unnatural base pairs,
and modified glycans.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
However, the application of these tools becomes significantly more
challenging as the molecular weight and functional group complexity of
the biological macromolecules increase [7]. As a result, the synthesis of large
proteins and their bioconjugates remains a significant challenge. To address
these challenges, a growing set of highly chemoselective reactions has been
developed that enables the conjugation of unprotected fragments of biological
macromolecules in aqueous solution [2, 8, 9]. These chemoselective ligation
reactions bridge the gap between the biopolymers accessible by classical
solution phase and solid phase methodologies and the larger products that
correspond to macromolecules such as proteins and glycoproteins. Although
this chapter will focus on proteins and protein conjugates, the chemoselective
ligation approach can be used to covalently assemble any large organic molecule
of interest, and is not limited to biological polymers.
10.2.2
10.2.2.1 Chemical Synthesis of Peptides

Attaining synthetic access to proteins was a stated goal of Emil
Fischer at the turn of the twentieth century [10]. Early approaches to peptide
synthesis utilized α-haloacids, acyl chlorides, and azide coupling methods
[10, 11]. Interestingly, α-haloacids are still commonly used, both in the
synthesis of N-alkyl peptides [12] and for chemical ligation [13, 14]. Indeed, the
challenge of synthesizing peptides has driven the development of key methods
used in modern synthetic organic chemistry, including the use of reversible
protecting groups [15], novel activation methods for carboxylic acids [16], and
solid supported organic chemistry [5, 6]. The chemical synthesis of
polypeptides in solution was refined throughout the twentieth century, with
notable achievements such as the syntheses of glutathione, oxytocin, and
β-corticotropin. Although these methods tend to be time consuming and
suffer from the extremely poor solubility of large, fully protected fragments,
the synthesis of several proteins using traditional solution phase methods has
been achieved, notably angiogenin (123 aa) and midkine (121 aa) by Sakakibara
and coworkers [17]. The use of solvent mixtures greatly enhanced the solubility
of late-stage fully protected synthetic products [17]. More recently, the solubility
problem of fully protected peptides has been addressed by reversible backbone
protection strategies that disrupt the aggregation caused by backbone hydrogen
bonding [18, 19].
10.2.2.2 Solid Phase Peptide Synthesis

Despite the achievements of polypeptide synthesis in solution, most
polypeptides are currently synthesized at the research level by solid phase
peptide synthesis (SPPS) [5, 6]. This approach, pioneered by Bruce Merrifield,
revolutionized the synthesis of peptides, and its principles have been applied
to oligonucleotides and, in recent years, carbohydrates. The essential idea was
to covalently anchor the C-terminal residue of a peptide to an insoluble swollen
polymer support. The subsequent amino acids could then be assembled in a
stepwise manner with activated amino acids while the growing polypeptide
chain remained on the “solid support.” Following chain assembly, the
polypeptide could be cleaved from the support and deprotected to yield the
desired polypeptide product. The advantage of the method was that synthetic
intermediates did not require extensive isolation and purification following
the coupling of each amino acid. Instead, all reagents could be washed away,
leaving the polypeptide attached to the solid support. The facile removal of
reagents enabled an excess of activated amino acids to be used to ensure
pseudo first-order kinetics throughout the course of the coupling reaction.
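The role of reagent excess can be illustrated with a short numerical sketch. It assumes that the coupling is first order in both the resin-bound amine and the activated amino acid, and that the activated amino acid is in such large excess that its concentration stays effectively constant. The function names and the rate constant are hypothetical illustration values, not data from this chapter.

```python
import math


def fraction_coupled(k, excess_conc, t):
    """Fraction of resin-bound amine acylated after time t (s).

    k           -- second-order rate constant, M^-1 s^-1 (hypothetical value)
    excess_conc -- concentration of activated amino acid, M, assumed constant
    """
    k_obs = k * excess_conc  # pseudo-first-order rate constant, s^-1
    return 1.0 - math.exp(-k_obs * t)


def time_to_conversion(k, excess_conc, target=0.999):
    """Time (s) needed to reach the target fractional conversion."""
    k_obs = k * excess_conc
    return -math.log(1.0 - target) / k_obs


if __name__ == "__main__":
    k = 0.1  # M^-1 s^-1, illustrative only
    for conc in (0.05, 0.5):  # dilute reagent versus a large excess
        minutes = time_to_conversion(k, conc) / 60.0
        print(f"[activated amino acid] = {conc} M: 99.9% coupled in {minutes:.0f} min")
```

Under these assumptions the time needed to reach a given conversion scales inversely with the reagent concentration, which is why the ability to wash away a large excess is so useful.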
One key advantage of SPPS, which is often overlooked, is the tremendous
solvation of the peptide on the solid support. As discussed before,
fully protected peptides are poorly soluble in organic solvents such as
dimethylformamide (DMF). However, as the polypeptide grows on a solid
support (typically cross-linked polystyrene, although many new resins have
been introduced in recent years) the peptide remains soluble and the peptide
resin swells as much as 10-fold in volume. As a result, resin bound peptides are
effectively in solution at a much higher concentration than the same peptide
that is free in solution [20].
Through years of intense effort to perfect protecting groups, coupling
reagents, and deprotection strategies, SPPS has become a standard technique
for making polypeptides. There are two basic protecting group strategies used
in a majority of peptide syntheses. The first method, Boc/Bzl, uses
trifluoroacetic acid (TFA) for deprotection of the Boc group at the N-terminus of the
growing peptide chain and anhydrous hydrogen fluoride (HF) for side chain deprotection
and cleavage from the solid support [5-7]. The second method is Fmoc/tBu,
in which the N-terminal Fmoc group is removed by treatment with base
(piperidine) and TFA is used to deprotect side chains and cleave the peptide
from the resin [21]. In addition to improvements in synthetic techniques,
SPPS has been enabled by the development of powerful methods for
the analysis and subsequent purification of the complex mixture of products
typically produced by SPPS. In particular, the development of reversed phase
high performance liquid chromatography (HPLC) and macromolecular mass
spectrometry [22], including matrix assisted laser desorption/ionization mass
spectrometry (MALDI) [23] and electrospray ionization mass spectrometry (ESI-MS)
[24], has revolutionized our ability to produce high quality synthetic peptides.
10.2.2.3 Protein Synthesis using Peptide Fragments Derived from Solid Phase
Peptide Synthesis
The ability of SPPS to generate high purity polypeptides (30-60 amino
acids) in reasonable yields (5-25% based on the loading of the C-terminal
amino acid) has led to the development of approaches to assemble these
polypeptide fragments into the large polypeptides that compose proteins. One
approach uses the backbone protection methods described above to enable the
purification and assembly of protected peptide fragments [25]. However, more
frequently, these approaches start with largely unprotected peptides derived
from SPPS and purified by HPLC.
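One reason SPPS fragments are limited to a few tens of residues is that per-cycle losses compound. As a rough sketch, assume every deprotection/coupling cycle proceeds with the same fractional yield. This is a simplification: the per-cycle values below are hypothetical, real syntheses vary from residue to residue, and the result is not directly comparable to the isolated yields quoted above, which also include purification losses. Under that assumption, the fraction of chains carrying the full-length sequence falls geometrically with length:

```python
def full_length_fraction(per_cycle_yield, n_residues):
    """Fraction of chains carrying the complete sequence after chain assembly.

    The C-terminal residue is already anchored to the resin, so a peptide of
    n_residues requires n_residues - 1 coupling cycles.
    """
    if not 0.0 <= per_cycle_yield <= 1.0:
        raise ValueError("per_cycle_yield must lie between 0 and 1")
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return per_cycle_yield ** (n_residues - 1)


if __name__ == "__main__":
    for length in (30, 60, 120):
        for y in (0.99, 0.995):  # hypothetical per-cycle yields
            frac = full_length_fraction(y, length)
            print(f"{length:>3} residues, {y:.1%} per cycle: {frac:.1%} full length")
```

The deletion products that accumulate alongside the full-length chain are what make HPLC purification and mass spectrometric analysis essential.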
10.2.2.4 Partially Protected Peptides

Peptide fragment condensation using partially protected fragments in polar
organic solvents was developed as a strategy to avoid some of the solubility and
deprotection problems associated with fully protected peptides [26]. One key
observation of this approach was that many amino acid side chains such as
those of Ser, Thr, Asp, Glu, His, Asn, Gln, and Trp could be left unprotected
during fragment coupling while the amino group of Lys and the thiol group
of Cys required protection. The second key observation was that thioacid (and
later thioester) groups could be chemoselectively activated toward acylation
in the presence of Glu and Asp carboxylic acid side chains. In this method
(Fig. 10.2-1), peptides were synthesized by SPPS on a resin that yielded a
C-terminal thioacid group. These peptides were deprotected and cleaved from
the solid support and the resulting unprotected peptides were purified to
homogeneity by chromatography. In order to assemble these peptides, the
Lys side chains had to be selectively reprotected. This approach has been
refined to enable the synthesis of several proteins, some with posttranslational
modifications. For example, cAMP response element binding protein with
two phosphorylated threonine residues was synthesized by this method [27].
However, the general use of these methods has been limited because of the
challenges associated with side chain reprotection and epimerization of the
C-terminal activated amino acid in polar organic solvents. In addition, a final
deprotection of a large peptide is still necessary to complete the synthesis.
Fig. 10.2-1 Thioester method for the fragment condensation of partially protected
peptides (R = H or alkyl).
A philosophically different approach for the coupling of partially protected
peptides was developed by Kemp (Fig. 10.2-2) [28]. In this method, the
intermolecular linking of the peptides was achieved by an initial, nonamide forming
reaction: a rapid asymmetric disulfide formation between an N-terminal Cys
peptide and a peptide with a C-terminal 4-hydroxy-6-mercaptodibenzofuran
ester. Once the peptide fragments were joined together, an intramolecular O-to-N
acyl shift enabled peptide bond formation using only moderate activation of
the C-terminus (aryl ester). Since the method avoids strong activation of the
C-terminus, most side chains did not need protection except for the Cys thiol
group. However, this approach was not demonstrated using Lys with an
unprotected side chain amine. Nevertheless, these acyl transfer reactions
proceeded over several hours in dimethylsulfoxide (DMSO)/base and enabled the
synthesis of several peptides of up to 39 amino acids.
Fig. 10.2-2 Auxiliary mediated segment condensation in organic solvent.
10.2.2.5 Chemoselective Ligation of Unprotected Peptides

The majority of chemically synthesized proteins have been synthesized using
chemoselective ligation methods. In principle, the problems associated with
protected peptides could be avoided entirely by using fully unprotected
peptides. However, this approach is complicated by the lack of selectivity
of fragment coupling chemistries for the N-terminal amine over Lys side chain
amino groups. The initial approaches to solve this problem were enabled by
the powerful insight that molecules as large as proteins are able to tolerate
significant changes to their covalent structure without significant effects
on their function. For example, Ala scanning mutagenesis of proteins has
demonstrated the tolerance of most side chains to alteration, except for a
select few critical residues involved in binding or catalysis [29]. As a result,
the synthetic chemist need not be limited to amide bond formation to link
peptides together if the objective is to use synthetic chemistry to understand and
manipulate proteins. With this insight in mind, Offord and Rose utilized the
chemoselective reaction of hydrazides and aldehydes to form a stable hydrazone
linkage [30]. The reaction between one peptide with a C-terminal hydrazide
and another peptide incorporating an N-terminal glyoxylyl functionality was
facile in aqueous buffer at pH 4.6 (Fig. 10.2-3).
Fig. 10.2-3 Hydrazone ligation of unprotected peptides in aqueous solution.
Fig. 10.2-4 Thioester ligation of unprotected peptides in aqueous solution.
Concurrently, Kent demonstrated the chemoselective ligation principle with
a thioester forming ligation reaction between a C-terminal thioacid group
and an N-terminal bromoacetyl moiety (Fig. 10.2-4) [31]. This ligation took
advantage of the unique nucleophilicity of thioacids at low pH. All strong
nucleophiles in proteins have high pKa values, for example, Cys pKa ~9, and
Lys and Tyr pKa ~10. In contrast, thioacids have a pKa of ~3 and react rapidly and
selectively with alkyl bromides at pH 3-4. A key feature of the thioester
and oxime ligations is that no side chain protecting groups are needed [32],
and the final polypeptide product is generated after ligation with no further
chemical manipulation.
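The pKa argument can be made quantitative with the Henderson-Hasselbalch relationship. The sketch below uses the approximate pKa values quoted in the paragraph above and computes the fraction of each group present in its deprotonated, nucleophilic form. The deprotonated fraction is only a proxy for reactivity, since intrinsic nucleophilicity also differs between groups; the function and dictionary names are illustrative.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of an acid HA present as A- at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Approximate pKa values quoted in the text.
PKA = {"thioacid": 3.0, "Cys thiol": 9.0, "Lys amine": 10.0, "Tyr phenol": 10.0}

if __name__ == "__main__":
    for ph in (3.5, 7.0):
        print(f"pH {ph}")
        for group, pka in PKA.items():
            print(f"  {group:<10} {fraction_deprotonated(pka, ph):.2e} deprotonated")
```

At pH 3-4 the thioacid is largely ionized while the side chain nucleophiles are almost entirely protonated, which is the basis of the selectivity described above.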
The concept of chemoselective ligation for polypeptides inspired the
development of an expanding set of selective chemical reactions to link complex
organic molecules in aqueous solution [33]. These reactions include Schiff base
type ligations (hydrazone [30], oxime [34]), thiazolidine-based ligations [33],
alkylation of sulfhydryl groups [31] (thioester, thioether), Staudinger chemistry
[35-38] (chemoselective reaction between a phosphine and an azide followed
by acyl transfer to form an amide), and [3 + 2] cycloaddition/click chemistry
(reaction of an azide and alkyne to yield a triazole) [39-41]. Many of these
reactions have found wide utility in the synthesis of proteins and other
biological macromolecules.
A conceptually different approach to assembling fully unprotected peptides is
to use an enzyme to attain both specificity and catalysis of amide bond
formation. This strategy has been developed using proteases, enzymes that
cleave peptide backbone amide bonds. Following the principle of microscopic
reversibility, an enzyme that catalyzes a reaction in the forward direction
can be coerced to catalyze it in the reverse direction as well. Such “reverse
proteolysis” methods typically use substrates containing activated C-termini,
altered reaction conditions (changing the solvent polarity, temperature, or pH),
or active site modified enzymes [42-44]. In addition, the product ratio can
be shifted in favor of ligated products by using organic solvents (lowering
the concentration of water). However, slow ligation rates and background
aminolysis of the peptides are significant problems with the approach.
The most successful strategy for this reverse proteolysis approach is the
engineered protease “subtiligase” developed by Wells and coworkers [45, 46].
This approach took advantage of (a) C-terminal glycolate ester dipeptides that
are stable to background hydrolysis but are excellent substrates since they
mimic the natural substrate, and (b) protein engineering of the protease,
thiolsubtilisin [44], to yield an enzyme that catalyzes amide bond
formation more efficiently than hydrolysis. The so-called “subtiligase” was used to
synthesize RNase A with fluorinated His analogs incorporated to probe the
mechanism of RNase A catalysis. Later studies used phage display to evolve a
subtiligase variant that was more robust in the presence of denaturants [45].
Even with these improvements, the main hurdle for extensive use of this approach
is the low solubility of large unprotected peptides in the nondenaturing buffer
conditions required for efficient enzyme catalysis.
10.2.2.6 Practical Requirements for Chemical Ligation Reactions

An effective ligation chemistry needs to fulfill several criteria. First, the reaction
needs to be chemoselective: there should be no cross-reactivity with other
functional groups found in biomolecules such as peptides, carbohydrates, or
oligonucleotides. The necessity of even a single protecting group greatly
complicates a synthesis and limits the utility of the method. Second, the
ligation needs to be compatible with neutral or weakly acidic aqueous solutions
to ensure compatibility with hydrophilic biomolecules without promoting base
catalyzed side reactions. Third, the reaction kinetics need to be rapid. As their
name implies, biological macromolecules are high in molecular weight, and
they also have limited solubility. In addition, ligation reactions between
two large biopolymers are bimolecular and are run with equimolar amounts of
reactants to avoid wasting precious starting materials. As a result, reaction
rates decline rapidly as the concentration decreases. Typically, effective ligation
reactions need to proceed to completion within 24 hours at room
temperature and at peptide concentrations at or below 1 mM.
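The kinetic requirement can be estimated from the integrated rate law for a bimolecular reaction with equimolar reactants, 1/[A] = 1/[A]0 + kt. The sketch below is an illustration under that idealized rate law only. The 95% conversion target is an assumption standing in for "completion", and the function names are hypothetical.

```python
def conversion_equimolar(k, c0, t):
    """Fractional conversion of A + B -> P with [A]0 = [B]0 = c0.

    Integrated second-order rate law: 1/[A] = 1/c0 + k*t.
    k in M^-1 s^-1, c0 in M, t in s.
    """
    x = k * c0 * t
    return x / (1.0 + x)


def rate_constant_needed(c0, t, target=0.95):
    """Smallest k (M^-1 s^-1) that reaches the target conversion within t."""
    return target / ((1.0 - target) * c0 * t)


if __name__ == "__main__":
    day = 24 * 3600.0
    k = rate_constant_needed(1e-3, day)  # 1 mM, 24 h, assumed 95% target
    print(f"k needed at 1 mM: {k:.2f} M^-1 s^-1")
    for c0 in (1e-3, 1e-4, 1e-5):
        print(f"c0 = {c0:.0e} M: {conversion_equimolar(k, c0, day):.0%} after 24 h")
```

Because the half-life of an equimolar second-order reaction is 1/(k[A]0), each tenfold dilution lengthens the reaction tenfold, which is why poorly soluble fragments are so much harder to ligate.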
10.2.2.7 Chemoselective Ligation to Form Native Peptide Bonds

The most commonly used chemical ligation reaction for the synthesis of
proteins utilizes the highly chemoselective reaction between one peptide
bearing an N-terminal Cys residue and another peptide containing a C-terminal
thioester moiety (Fig. 10.2-5) [47]. In this native chemical ligation strategy, the
deprotonated thiolate of the N-terminal Cys residue undergoes facile exchange
with the C-terminal thioester group, forming an intermediate structure that
links the peptides through a thioester bond. Subsequently, a rapid S-to-N
intramolecular acyl transfer yields a stable amide bond at the site of ligation
[47, 48]. An advantage of this reaction is that the “native” polypeptide with a Cys
residue at the ligation site is obtained without further chemical manipulation.
The chemoselectivity of the reaction stems from the combination of a Cys
specific, reversible thioester exchange (any Cys residue in either peptide can
participate in this equilibrium) with an essentially irreversible intramolecular
reaction that is specific to N-terminal Cys residues. Under typical ligation
conditions (pH 6.5-7.5, 1 mM peptide) the intermolecular transthioesterification
is rate limiting and no thioester intermediate is observed because of rapid
rearrangement [47]. The reaction also utilizes the unique reactivity profile of
the thioester as an activated acyl group. Compared to oxoesters with identical
substituents, thioesters are much more reactive toward thiol nucleophiles
[49] (and to a lesser extent toward amine nucleophiles [50]), facilitating rapid
reaction kinetics without resorting to high levels of activation that could
result in epimerization of the C-terminal amino acid. In contrast to their high
reactivity toward thiols, thioesters are remarkably resistant to hydrolysis, the
main competing reaction in aqueous solution (55 M water). Indeed, thioesters have
been shown to hydrolyze more slowly than the corresponding ester derivatives
[50, 51]. It is these properties of thioesters that have made them important
reactive intermediates in numerous biological processes, including nonribosomal
peptide synthesis, ubiquitination, polyketide synthesis, and lipid biosynthesis.
Fig. 10.2-5 Native chemical ligation in aqueous solution.
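The competition between thiol exchange and hydrolysis in 55 M water can be framed as a simple partitioning problem. The sketch below treats both pathways as second-order reactions of the thioester, one with the thiol and one with water, which is a deliberate simplification that ignores hydroxide and buffer catalysis. The selectivity ratio is a hypothetical input, not a measured value from this chapter.

```python
WATER_MOLARITY = 55.0  # M, as quoted in the text


def fraction_to_thiol(selectivity, thiol_conc, water_conc=WATER_MOLARITY):
    """Fraction of thioester consumed by the thiol rather than by water.

    selectivity -- ratio k_thiol / k_water of second-order rate constants
                   (hypothetical input; not a value from this chapter)
    """
    thiol_rate = selectivity * thiol_conc
    return thiol_rate / (thiol_rate + water_conc)


def selectivity_needed(thiol_conc, target=0.99, water_conc=WATER_MOLARITY):
    """k_thiol / k_water required for the target fraction to react with thiol."""
    return target / (1.0 - target) * water_conc / thiol_conc


if __name__ == "__main__":
    needed = selectivity_needed(1e-3)  # 1 mM Cys peptide, assumed 99% target
    print(f"k_thiol/k_water needed at 1 mM thiol: {needed:.2e}")
```

With water present at a concentration tens of thousands of times higher than a 1 mM peptide, the thioester must discriminate in favor of the thiol by many orders of magnitude for ligation to dominate.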
The native chemical ligation reaction has proved to be remarkably robust
and has enabled the synthesis of a variety of proteins [52], either from two
polypeptide fragments or, using a single N-terminal protecting group, from
multiple peptide segments assembled in a sequential manner [53]. The chemoselectivity of
the reaction extends beyond functional groups found in polypeptides, and
the reaction has been used in the context of posttranslationally modified
peptides/proteins including glycopeptides, lipopeptides, and phosphopeptides
[2-4]. In addition, native chemical ligation has proved to be an effective
approach for the synthesis of macromolecules that do not require “native” amide
bonds. For example, the reaction has been used for the conjugation of peptides
to deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and peptide nucleic
acids (PNA), and to N-terminal Cys or thioester bearing complex carbohydrates,
and in the assembly of branched dendritic macromolecules [54]. Because
most chemical ligation reactions are fundamentally bimolecular, the
ligation rate is highly sensitive to concentration. As a result, successful
application of native chemical ligation to a given target is largely a function of the
solubility of the macromolecules and is generally independent of their molecular
weight. Indeed, as described in Chapter 10.1, methods have been developed to
use large biologically derived protein fragments in these reactions.
10.2.2.8 Some Variations on the Native Chemical Ligation Theme

Although most applications of native chemical ligation utilize the originally
envisioned cysteine-thioester pairing, several variations of this reaction have
been described [3, 551. The critical amino-thiol moiety of an N-terminal Cys
residue can be varied to yield alternative ligating groups. For example, adding
an additional methylene group to the side chain yields a homocysteine (hCys) that
can react with a peptidyl thioester at pH 8 to form a thioester intermediate
that can rearrange through a six-membered ring to form an amide bond at an
hCys ligation site [56]. Similarly, selenocysteine has been substituted for Cys
to facilitate ligation at pH 6 to form selenoproteins. These ligation reactions
have excellent kinetics due to the high nucleophilicity of the selenol side chain
[57-60]. An alternative strategy is to form the thioester intermediate by the
reaction of a nucleophilic thioacid group with an N-terminal β-bromoalanine
residue at low pH, in analogy to the thioester forming ligation described earlier
in this chapter. Subsequent neutralization of the reaction leads to acyl transfer,
generating an amide bond at the Cys ligation site [61].
10.2.2.9 Native Chemical Ligation to Yield Noncysteine Ligation Products

The main limitation of the native chemical ligation approach is the requirement
for a Cys residue at the site of ligation. Although Cys is a natural amino acid,
it is found in low abundance, limiting the chances of finding a convenient
natural ligation site. In addition, the reactivity of free thiols, which is so
useful in ligation, can be a liability when present in the final protein product.
One approach to address this limitation is to modify the Cys residue following
ligation. For example, Cys residues can be alkylated by alkyl halides to yield
analogs of amino acid side chains, for example, glutamine, glutamate, or lysine
[53]. This reaction is high yielding at pH 8 and is specific for reduced Cys
residues. An alternative approach is to convert all reduced cysteine residues in
the final polypeptide product to alanine by desulfurization [62]. This reaction
is facilitated by treatment with hydrogenation catalysts such as activated
Raney nickel and has been shown to proceed with retention of the peptide
stereochemistry.
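The search for usable ligation junctions can be sketched as a simple scan of the target sequence. The code below is a hypothetical planning aid, not a published tool. It lists positions where the C-terminal fragment would begin with Cys and, optionally, with Ala, on the assumption that a Cys used for ligation is later desulfurized. Because desulfurization converts all reduced Cys residues, the Ala option is only appropriate when the target needs no free Cys. The 60-residue fragment limit is taken from the SPPS range quoted earlier in the chapter.

```python
def ligation_sites(sequence, allow_desulfurization=False):
    """0-based indices i where a fragment starting at residue i has a usable
    N-terminal residue (Cys, or Ala if post-ligation desulfurization is planned)."""
    junction_residues = {"C"}
    if allow_desulfurization:
        junction_residues.add("A")
    return [i for i in range(1, len(sequence)) if sequence[i] in junction_residues]


def plan_fragments(sequence, sites, max_fragment=60):
    """Greedy choice of cut positions so that every fragment is <= max_fragment.

    Returns the list of cut indices, or None if no valid plan exists.
    """
    cuts, start, n = [], 0, len(sequence)
    while n - start > max_fragment:
        reachable = [s for s in sites if start < s <= start + max_fragment]
        if not reachable:
            return None  # no junction close enough to the current fragment start
        start = max(reachable)  # taking the farthest junction never hurts feasibility
        cuts.append(start)
    return cuts


if __name__ == "__main__":
    target = "G" * 50 + "C" + "G" * 49 + "C" + "G" * 49  # toy 150-residue sequence
    sites = ligation_sites(target)
    print("candidate junctions:", sites)
    print("cut plan:", plan_fragments(target, sites))
```

A real plan would also weigh the identity of the residue on the thioester side of each junction and the solubility of each fragment, neither of which is modeled here.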
10.2.2.10 Amide Ligation Using Auxiliaries

Another approach for assembling unprotected peptides is through the
reversible attachment of the functional equivalent of a Cys side chain (a ligation
auxiliary) to the N-terminus of a peptide. In analogy to native chemical
ligation (Fig. 10.2-6(a)), an intermolecular thioester exchange followed by an
intramolecular S-to-N acyl transfer yields an amide bond at the site of ligation.
Subsequent removal of the auxiliary yields the desired polypeptide [63, 64, 66].
Two strategies for ligation auxiliaries have proven to be practical for polypeptide
synthesis, and both utilize a benzyl moiety that is stable as a benzyl amine
but labile as a benzyl amide. The first strategy is to incorporate a
3,4,5-trimethoxy-2-mercaptobenzyl (Tmb) group onto the N-terminus of the peptide
(Fig. 10.2-6(b)) [64, 65]. Following the S-to-N acyl transfer to yield a secondary
benzyl amide, the Tmb group can be removed by TFA and scavengers. A
second strategy is to use an N-terminal 1-phenyl-2-mercaptoethyl group to
facilitate ligation (Fig. 10.2-6(c)). When the phenyl ring has a 2,4-dimethoxy
substitution, the auxiliary can be removed with TFA; alternatively, substitution
with a 2-nitro moiety results in an auxiliary that is photolabile following
ligation [66-69]. Both these approaches enable ligation when there is a Gly
residue at the ligation junction.
Fig. 10.2-6 Auxiliary mediated native chemical ligation. (a) Transthioesterification,
S-to-N acyl transfer, and removal of the auxiliary. (b) Tmb auxiliary. (c) 2-Phenylethanethiol auxiliary.
10.2.3
10.2.3.1 Synthesis of N-terminally Functionalized Peptides

N-terminal modification of peptides with reactive groups for chemoselective
ligation is synthetically straightforward using both Boc and Fmoc SPPS. As
shown in Fig. 10.2-7, bromoacetyl, ketone, aminooxy, azide, alkyne, thiol, and
Cys groups can all be incorporated at the N-terminus using standard peptide
coupling conditions. Aldehydes are most easily introduced after solid phase
synthesis through quantitative transformation of an N-terminal Ser residue to
a glyoxylyl group using NaIO4.
10.2.3.2 Synthesis of Functionalized Amino Acid Side Chains

Any group that can be attached to the N-terminus of a peptide can be attached to
an amine side chain through appropriate protecting group manipulation. For
example, Lys(Alloc) side chain protecting groups can be removed selectively
after full chain assembly in both Fmoc and Boc solid phase synthesis protocols.
The revealed amino group can be modified as described above. In addition,
numerous amino acids with chemoselective ligation moieties have been
synthesized for direct incorporation into peptides.
10.2.3.3 Synthesis of Peptides Modified at the C-terminus

C-terminal modification is significantly more complicated since it requires
manipulation of the cleavable peptide linker or activation of the C-terminus
after chain assembly. Specific peptide resin-linkers have been developed
that generate C-terminal moieties such as thioacids, thiols, aldehydes, and
hydrazones directly upon cleavage from the resin (Fig. 10.2-8(a)). Alternatively,
safety-catch linkers, such as the sulfonamide linker, can be selectively activated
following chain assembly, enabling the peptide to be cleaved from the resin
by a desired nucleophile (typically an amine or thiol). An additional approach
is to modify the side chain of the C-terminal amino acid (Fig. 10.2-8(b)). This
approach is useful if the geometry of the ligation site can tolerate significant
changes. Such side chain manipulation is easier to perform than direct
modification of the peptide C-terminus, making it an attractive alternative
approach for C-terminal modification.
Fig. 10.2-7 Solid phase synthesis of N-terminally modified peptides.
Fig. 10.2-8 Solid phase synthesis of C-terminally modified peptides.
10.2.3.4 Synthesis of C-terminal Thioester Peptides

C-terminal thioester peptides are critical for the native chemical ligation
approaches for peptide synthesis. In addition, thioester peptides are useful
synthetic intermediates for many C-terminal modifications that can be
introduced through aminolysis of the thioester bond after chain assembly
by SPPS (see Chapter 10.1). Boc-based SPPS is the most effective method for
the generation of thioester peptides because the thioester group is stable to
the conditions used for Boc removal (TFA) and for side chain deprotection
(HF). In contrast, Fmoc-based SPPS methods are less compatible with these
thioester linkers since the Fmoc group must be removed with base, typically
the secondary amine piperidine, a nucleophile that also attacks the thioester.
Several protocols have been developed using hindered, nonnucleophilic bases
for Fmoc deprotection that facilitate the generation of short (~10 amino acid)
thioester peptides. Other approaches for thioester peptide synthesis by Fmoc
SPPS protocols utilize "safety-catch" linkers that are stable during peptide
elongation but can subsequently be converted into activated acyl groups.
Following activation, the peptide can be cleaved from the resin using thiols.
Several of these strategies are discussed in Chapter 10.1. However, Fmoc-based
thioester peptide synthesis is still technically challenging and an active area
of methodology development. Development of Fmoc SPPS compatible thioester
synthesis is important since several posttranslational modifications, such as
phosphorylation and glycosylation, are most efficiently incorporated using
Fmoc SPPS.
10.2.3.5 Native Chemical Ligation Reactions
10.2.3.5.1 Selection of the Ligation Site

The first consideration when planning a protein synthesis by native chemical
ligation approaches is the selection of an appropriate ligation site. Because
of the challenges of large polypeptide synthesis by SPPS, no segment should
exceed ~60 amino acids in length and, typically, peptides of 25-60 amino acids
are selected. Many synthetic protein targets must be assembled sequentially
from more than two components, requiring a protecting group for the
N-terminal Cys residue of all internal segments. Since ligation requires both
the amine and the thiol of the N-terminal Cys, only one of these groups needs
to be protected, and the N-Msc and S-Acm protecting groups have been utilized
for this purpose [53, 70, 71]. Alternatively, protection of both groups can be
achieved using a thiaproline residue that can be converted to Cys through
treatment with methoxylamine [72]. An additional synthetic constraint is the
requirement for a Cys residue to facilitate the ligation reaction. Ideally, a
natural Cys residue is selected, and it has been shown that native chemical
ligation is compatible with a variety of Xaa-Cys ligation sites [73, 74]. If no
native Cys residue is available, one solution is to substitute a Cys residue at
a noncritical site in the polypeptide sequence. Following ligation, this Cys
residue can be left as a free thiol or alkylated to mimic a natural side chain,
or the polypeptide can be globally desulfurized, yielding a protein with Ala in
place of all Cys residues [62]. An alternative approach to non-Cys ligation
sites is the use of a ligation auxiliary that facilitates ligation at
unhindered Gly residues in a polypeptide sequence. Although a Gly-Gly sequence
is the most synthetically straightforward sequence to use with these
auxiliaries, ligation sites using Xaa-Gly and Gly-Xaa sequences have been
demonstrated [64-69]. Overall, it is important to consider that the strategy
involving the fewest chemical manipulations and purifications following SPPS
is likely to result in the highest yield of synthetic product.
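
The length constraint described above can be turned into a quick planning aid.
The following Python sketch is illustrative only. It assumes that every cut is
made immediately before a Cys residue (an Xaa-Cys junction), uses the 25-60
residue window quoted in the text as default parameters, and prefers the plan
with the fewest segments. The function name plan_segments and the toy sequence
are hypothetical, and the sketch ignores other practical factors such as
solubility, difficult couplings, or the use of auxiliaries at Gly.

```python
from functools import lru_cache


def plan_segments(sequence, min_len=25, max_len=60):
    """Return the cut positions (0-based index of each N-terminal Cys) that
    split `sequence` into the fewest segments whose lengths all fall within
    [min_len, max_len]. Returns () if no cut is needed and None if no plan
    exists under these assumptions."""
    n = len(sequence)
    # A cut before index i makes residue i the N-terminal Cys of a segment.
    cys_sites = [i for i, aa in enumerate(sequence) if aa == "C" and i > 0]

    @lru_cache(maxsize=None)
    def best_from(start):
        # Fewest-cut plan for the part of the chain beginning at `start`.
        if min_len <= n - start <= max_len:
            return ()
        best = None
        for cut in cys_sites:
            if min_len <= cut - start <= max_len:
                rest = best_from(cut)
                if rest is not None and (best is None or len(rest) + 1 < len(best)):
                    best = (cut,) + rest
        return best

    return best_from(0)


# Hypothetical 100-residue chain with Cys at positions 31 and 61 (1-based).
toy = "A" * 30 + "C" + "A" * 29 + "C" + "A" * 39
cuts = plan_segments(toy)
bounds = [0, *cuts, len(toy)]
print(cuts, [b - a for a, b in zip(bounds, bounds[1:])])
```

For this toy chain the sketch selects a single ligation at the second Cys,
giving segments of 60 and 40 residues, rather than a three-segment assembly.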
10.2.3.5.2 Selection of Ligation Conditions

Chemical ligation methods are typically compatible with a wide range of
reaction conditions. However, it is important to note that in addition to
optimizing ligation rates, maintenance of the chemoselectivity of the reaction
is critical. As a result, native chemical ligation is typically performed at a
pH of 6.5-8.0 at 25-40°C to avoid the possibility of unwanted thioester
reactivity such as aminolysis, hydrolysis, or epimerization of the C-terminal
amino acid. To maintain pH control in the presence of high concentrations
of peptide functional groups and thiol additives, high buffer concentrations
(100-500 mM) are used. Another important consideration is that Cys residues
are prone to oxidation to form disulfides, which are unable to participate
in ligation. An alkyl thiol or soluble phosphine is typically added to provide
reducing conditions for the ligation reaction.
Chemical ligation reactions proceed rapidly in aqueous solution and additives
or cosolvents can be added to facilitate peptide solubility. The most common
additive is the denaturant 6 M guanidine hydrochloride that facilitates the
solubility of unstructured peptide fragments, thereby increasing peptide
concentration and reducing the possibility of peptide conformation affecting
ligation rates. Similarly, detergents have been used to facilitate the solubility
of hydrophobic peptides and in some cases may also increase ligation rates by
concentrating the peptides in peptide-micellar structures. Organic cosolvents
such as trifluoroethanol, DMF, dimethylsulfoxide, or acetonitrile can also
enhance peptide solubility, although these additives can make purification by
HPLC more challenging.
10.2.3.5.3 Enhancing Ligation Rates

Ideally, chemical ligation reactions should proceed with fast kinetics to avoid
unwanted side reactions. Since ligations are typically equimolar bimolecular
reactions, the most straightforward approach for increasing ligation rates
is to increase peptide concentration. In addition, it has been shown that
thioester peptides with better thiol leaving groups undergo faster ligation and
transthioesterification. Such thioesters can be synthesized before ligation or,
preferably, generated in situ by adding an excess of thiophenol to the ligation
reaction. It should be noted that the ligation buffer can significantly affect
thiophenol solubility, and more soluble thiols, such as mercaptoethylsulfonate,
can be used when solubilizing agents such as 6 M guanidine HCl are not used.
Another approach to enhance the ligation rate is to increase the effective
concentration of the peptides. It has been shown that some proteins can adopt a
native-like (although less stable) folded conformation following cleavage into
two or more polypeptide segments. As a result, performing the ligation reaction
under conditions that promote polypeptide folding can significantly accelerate
the ligation reaction [75]. Similarly, use of detergents or lipid bilayers [76]
can increase the effective concentration of hydrophobic polypeptides.
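
The concentration dependence noted above follows from simple second-order
kinetics. The Python sketch below is a minimal illustration, assuming an
idealized equimolar bimolecular reaction with a single rate constant k and no
side reactions; the function names and the numerical values of k and the
concentrations are hypothetical placeholders, not measured ligation data.

```python
def fraction_ligated(k, conc0, t):
    """Fraction of peptide consumed after time t for an equimolar
    second-order reaction, from [A](t) = [A]0 / (1 + k*[A]0*t).
    k in M^-1 s^-1, conc0 in M, t in s."""
    x = k * conc0 * t
    return x / (1.0 + x)


def half_life(k, conc0):
    """Time for half of the peptide to react: t_1/2 = 1 / (k*[A]0)."""
    return 1.0 / (k * conc0)


k = 2.0  # hypothetical rate constant, M^-1 s^-1
for conc0 in (0.5e-3, 1e-3, 5e-3):  # hypothetical peptide concentrations, M
    print(f"{conc0 * 1e3:.1f} mM: half-life = {half_life(k, conc0):.0f} s")
```

Under these assumptions the half-life is inversely proportional to the starting
concentration, so doubling the peptide concentration halves the time needed to
reach 50% conversion.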
10.2.4
10.2.4.1 Structure-function Analysis of Chemokines and the Development of Protein Pharmaceuticals

Chemokines are a large family of proteins that mediate the directed migration
of leukocytes in the body. The moderate size (~70 amino acids) and medical
importance of these proteins have made them an attractive target for chemical
synthesis. These proteins adopt a conserved fold, consisting of three
antiparallel β-strands and a C-terminal α-helix, which is stabilized by two
conserved disulfide bonds (Fig. 10.2-9). The structure-function analysis of
chemokines has been greatly enhanced by chemical synthesis, particularly in the
work of Clark-Lewis and coworkers. Using total SPPS, over 1000 chemokines and
chemokine analogs were synthesized, utilizing both natural and unnatural
amino acids to probe the molecular basis of chemokine function. One
notable study probed the relevance of dimerization for the biological
activity of chemokines. The chemokine interleukin-8 (IL-8) dimerizes at the high
concentrations necessary for structural determination by nuclear magnetic
resonance (NMR) or crystallography. The dimerization interface includes an
extended β-sheet structure between the monomers. To test the hypothesis
that IL-8 functions as a monomer at biologically relevant subnanomolar
concentrations, a derivative of IL-8 was synthesized with a methyl group
attached to a backbone amide (N-Me amide), designed to disrupt backbone
hydrogen bonding and to prevent dimerization. The full biological activity of
this analog provided the first strong support for monomeric IL-8 being the
biologically relevant form of the chemokine.

Fig. 10.2-9 Protein Medicinal Chemistry. The N-terminus of the chemokine RANTES was
systematically modified to improve receptor binding and HIV microbicide activity.
(Scheme labels: natural product, moderate potency; position 1 optimization;
position 2 optimization; positions 1, 2, and 3 combination; highly potent
protein mimetic.)

The chemokine IL-8 was also the first protein synthesized by native
chemical ligation. Forming the protein by ligation has the advantage of
using smaller synthetic peptides that can be synthesized rapidly with high
purity. (Although chemokines can be synthesized by SPPS, at ~70 amino
acids, they represent the upper limit of effective synthesis by this approach and
different chemokines contained variable amounts of microheterogeneity.) The
centrally located Cys34 provided a convenient site for ligation between peptides
corresponding to IL-8 1-33-thioester and IL-8 34-72. Following ligation, the
reduced polypeptide was oxidatively folded in 1 M guanidine HCl, pH 8.5 to
yield fully active IL-8.
Recently, work on synthetic chemokines has been stimulated by the potential
for analogs of the chemokine RANTES (regulated on activation, normal, T
expressed, and secreted) to block human immunodeficiency virus (HIV) entry
into cells. This inhibition is achieved through intracellular sequestration of
the chemokine receptor CCR5, which is also a coreceptor for HIV entry. In
order to develop RANTES as a pharmacological agent for use as an HIV
microbicide, a large set of RANTES analogs was synthesized with nonnatural
amino acid structures at the N-terminus of the protein. The analogs were
synthesized by native chemical ligation in analogy to the approach described
for IL-8. As shown in Fig. 10.2-9, chemical synthesis enabled the screening
of multiple analogs and resulted in a RANTES analog with >50-fold greater
potency than the starting lead compound, AOP (aminooxypentane)-RANTES.
Interestingly, AOP-RANTES was originally generated by an oxime ligation
between aminooxypentane and an N-terminal glyoxylyl-RANTES analog
(derived from biological expression), demonstrating the power of semisynthetic
methods in protein chemistry. It is also notable that attempts to generate more
potent N-terminal variants of RANTES using phage display libraries were
unsuccessful. It was concluded that this work "was able to exploit the greater
breadth of possible substitutions and thus higher degree of spatial resolution,
afforded by total chemical synthesis."
10.2.4.2 Synthesis of N-myristoylated HIV-1 Matrix Protein p17 from Three Peptide Segments

Protein lipidation is a critical posttranslational modification that serves
to regulate the membrane attachment of numerous cellular and viral
proteins. HIV-1 matrix protein p17 is a 131 amino acid protein with an
N-terminal myristoyl (C14) group. When covalently linked to the HIV Gag
polyprotein, p17 targets the polyprotein to the host-cell membrane for particle
assembly. However, on HIV viral maturation, proteolytic cleavage occurs
at the C-terminus of p17 and enables p17 to partially dissociate from the
viral membrane. Since large quantities of myristoylated p17 cannot be
obtained through heterologous expression systems, the protein was chemically
synthesized to study the effects of myristoylation on p17 structure and function.
As shown in Fig. 10.2-10, the 131 amino acid protein was assembled from
three peptide segments using an S-Acm protecting group for the peptide
corresponding to residues 56-85 to avoid cyclization of this central subunit.
Using this approach, 275 mg of this 15-kDa lipoprotein was synthesized, which
enabled detailed biophysical measurements. These studies suggest that the
role of the myristoyl group is to stabilize the trimeric folded state of the
protein rather than to effect a conformational change as had been previously
proposed. Significantly, this large protein was synthesized with an overall yield
of 7.5% based on the loading of the peptide resin used in solid phase synthesis,
emphasizing the efficiency of the synthetic procedures (over 300 synthetic
steps were performed in the synthesis of this protein).

Fig. 10.2-10 Total synthesis of HIV-1 matrix protein with an N-terminal myristoyl group.
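
To put the quoted 7.5% overall yield in perspective, the short Python sketch
below back-calculates the average yield per step. It is only a rough
illustration: it assumes exactly 300 steps (the text says "over 300") and
treats every step, including cleavage, ligation, and purification, as equally
efficient, which is certainly a simplification. The function name is
hypothetical.

```python
def average_step_yield(overall_yield, n_steps):
    """Geometric-mean yield per step that reproduces `overall_yield`
    after `n_steps` sequential steps."""
    return overall_yield ** (1.0 / n_steps)


per_step = average_step_yield(0.075, 300)
print(f"average yield per step: {per_step:.4f}")

# For comparison: a uniform 98% yield per step over the same 300 steps.
print(f"overall yield at 98% per step: {0.98 ** 300:.4f}")
```

Under these assumptions each step must average slightly better than 99%
yield; at a uniform 98% per step, the overall yield over 300 steps would
fall well below 1%.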
10.2.4.3 Synthesis of Nonlinear Protein Structures

The synthesis of proteins with nonlinear architecture has found many
applications in protein design. One class of designed proteins consists of
a linear template that contains multiple reactive groups onto which linear
peptides can be ligated to generate a branched peptide structure. Chemical
ligation approaches are the methods of choice for the generation of such
template assembled synthetic protein (TASP) [77] and multiantigenic peptide
(MAP) [78] structures, and they have been assembled using thioester [79],
thioether, oxime, hydrazone, and thiazolidine ligation reactions.
A notable example of this approach for assembling proteins is the synthesis of
tetrameric and pentameric TASP molecules on the basis of the transmembrane
(TM) domain of HIV virus protein u (Vpu). Viral membrane proteins frequently
oligomerize to form ion channels, but analysis of these channels is complicated
by difficulties in determining the oligomerization state of the protein. As a
result, the chemical synthesis of branched peptides with a desired (four or five)
stoichiometry of TM peptides is an attractive approach. However, TM peptides
are highly insoluble, which complicates the purification and assembly of the
multimeric product. To overcome these problems, a polyethylene glycol-derived
polyamide (PPO) solubilization tag was attached through a cleavable thioester
bond to the C-terminus of each Vpu TM peptide. In order to ligate the peptides
to the tetravalent or pentavalent template, an N-terminal aminooxy group was
incorporated into each TM peptide, complementary to the ketoamide moieties
on the template. As shown in Fig. 10.2-11, this synthetic strategy enabled the
assembly of soluble Vpu TM-PPO-based TASP molecules with a molecular
weight of over 20 000 Da. Cleavage of the thioester link to the solubilizing
PPO moiety and incorporation into liposomes enabled the characterization of
4 and 5 helical bundle ion channels. Conductivity measurements on these Vpu
TASP molecules suggest that a pentamer is the oligomeric state of the Vpu
ion channel.
Another nonlinear architecture that has been explored in proteins is head-
to-tail cyclization. Small cyclic peptides are common in peptidomimetic efforts
to mimic protein loops using peptides but traditional peptide cyclization
methods are not applicable to large polypeptide chains. Cyclic proteins
can be synthesized from a polypeptide containing both an N-terminal
Cys and a C-terminal thioester [80-82]. It has been shown in multiple
proteins that the intramolecular ligation reaction proceeds at a faster rate
than the competing polymerization reaction, yielding near-quantitative cyclic
polypeptide structures. This procedure has been used to synthesize naturally
cyclic proteins such as the cyclotide family [82] and also engineered cyclic
proteins designed to increase thermodynamic stability [80-82].
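
The competition between cyclization and polymerization described above can be
sketched with a simple initial-rate argument. The Python snippet below is a
hedged illustration, not a model taken from the cited work: it assumes
first-order intramolecular ligation (rate k_intra[P]) competing with
second-order intermolecular ligation (rate k_inter[P]^2), and summarizes the
two rate constants as an effective molarity EM = k_intra / k_inter. The
function name and all numerical values are hypothetical.

```python
def initial_fraction_cyclized(effective_molarity, conc):
    """Initial share of ligation events that are intramolecular:
    k_intra[P] / (k_intra[P] + k_inter[P]^2) = EM / (EM + [P]).
    Both arguments in M."""
    if effective_molarity <= 0 or conc < 0:
        raise ValueError("effective molarity must be > 0 and conc >= 0")
    return effective_molarity / (effective_molarity + conc)


em = 0.1  # hypothetical effective molarity, M
for conc in (1e-4, 1e-3, 1e-2):  # hypothetical peptide concentrations, M
    share = initial_fraction_cyclized(em, conc)
    print(f"{conc * 1e3:.1f} mM: {share:.3f} intramolecular")
```

In this picture, cyclization dominates whenever the peptide concentration is
well below the effective molarity of the two chain ends, which is consistent
with the near-quantitative cyclization reported in the text.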
Protein cyclization was taken one step further by the synthesis of a protein
catenane, consisting of two interlocked cyclic peptides [83, 84]. This structure
was designed from the tetramerization domain of p53, which folds in a
bisecting U conformation (Fig. 10.2-12). To construct the catenane, linear
peptides corresponding to the p53 tet domain were synthesized with both an
N-terminal Cys and a C-terminal thioester. The catenane was assembled by
folding the peptide to preorganize the bisecting conformation. Since protein
folding is faster than chemical ligation, native chemical ligation of the ends
of the p53 polypeptide resulted in quantitative catenane formation, forming
a topologically linked dimer. These interlocked protein structures were found
to be extremely thermodynamically stable, with the fold stabilized by >50 °C
at 10 μM. Interestingly, the stability of these proteins stems from
destabilization of the denatured state rather than stabilization of the folded
state.

Fig. 10.2-11 Assembly of a pentameric ion channel based on a transmembrane domain of
HIV (Vpu). The membrane domain was attached to a PPO-peg group to solubilize the
peptide for purification and ligation. Upon assembly into the 5-helix TASP molecule, the
PPO-peg group was removed by hydrolysis.
10.2.5 Future Directions
Chemical ligation approaches have revolutionized the synthesis of
macromolecules, enabling the synthesis of monodisperse products over 50 000 Da
in molecular weight. These highly chemoselective reactions have proven to be
robust for the assembly of a wide variety of biological macromolecules and, as
a result, many of the future directions in this field depend on the application
of synthetic macromolecules to address fundamental questions about protein
structure and function in vitro as well as in vivo. Systematic incorporation of
unnatural amino acids to modify the side chains and backbone structures of
polypeptides promises to yield new insights into protein structure and function
as well as into enzymatic catalysis. In addition, the incorporation of specific
stable isotopes into proteins (²H, ¹³C, ¹⁵N) promises to be a powerful approach
for both NMR and infrared (IR) analysis of proteins.

Fig. 10.2-12 Synthesis of a protein catenane based on the p53 tetramerization domain
In order to use chemical ligation approaches, it is necessary to synthesize
the large macromolecular precursors in a straightforward manner. Indeed,
the synthesis of the modified synthetic polypeptides is frequently the rate-
determining step in synthesizing a protein. New methods for the synthesis
of all peptides but particularly peptide thioesters need to be developed
to improve synthetic access to proteins. For example, new approaches
for synthesizing fragile posttranslationally modified glyco-, phospho-, and
lipopeptides are being developed [85-87]. Similarly, improvements to SPPS
will increase the length of peptide precursors, and enable larger proteins to be
synthesized.
Current methods for chemical ligation have great utility, but new advances
will greatly enhance the size and quantity of proteins that can be chemically
synthesized. Of particular importance is the development of straightforward
methods for the handling of peptides following ligation reactions. The
development of solid phase ligation approaches [88, 89], one-pot syntheses
[90, 91], and the use of affinity tags [92] promise to greatly simplify the
assembly, and improve the yield, of synthetic proteins built from multiple
components. New approaches for chemical ligation will provide greater synthetic
flexibility, as shown with amide-forming ligation auxiliaries [62-69].
Approaches have been described to use the chemoselective reaction between
phosphines and azides to yield a thioester-linked aminophosphorane intermediate
that rearranges to yield a native amide bond [36, 37, 93]. In addition,
non-native ligation chemistries forming structures such as triazoles promise to
enhance the types of modifications that can be made to synthetic macromolecules
[39-41]. Further development of simple and general ligation approaches will
greatly enhance the synthesis of macromolecules and protein natural products.
References
1. M. Smith, In vitro mutagenesis, Annu. Rev. Genet. 1985, 19, 423.
2. P.E. Dawson, S.B.H. Kent, Synthesis of native proteins by chemical ligation, Annu. Rev. Biochem. 2000, 69, 923.
3. B.L. Nilsson, M.B. Soellner, R.T. Raines, Chemical synthesis of proteins, Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 91.
4. J.D. Hartgerink, Covalent capture: a natural complement to self-assembly, Curr. Opin. Chem. Biol. 2004, 8, 604.
5. R.B. Merrifield, Solid phase peptide synthesis, J. Am. Chem. Soc. 1963, 85, 2149.
6. B. Merrifield, in Peptides: Synthesis, Structures, and Applications, 1st ed., (Ed.: B. Gutte), Academic Press, San Diego, 1995, 93.
7. S.B. Kent, Chemical synthesis of peptides and proteins, Annu. Rev. Biochem. 1988, 57, 957.
8. J.A. Borgia, G.B. Fields, Chemical synthesis of proteins, Trends Biotechnol. 2000, 18, 243.
9. H.C. Hang, C.R. Bertozzi, Chemoselective approaches to glycoprotein assembly, Acc. Chem. Res. 2001, 34, 727.
10. E. Fischer, Untersuchungen über Aminosäuren, Polypeptide und Proteine, Ber. Chem. Ges. 1906, 39, 530.
11. T. Kimmerlin, D. Seebach, '100 years of peptide synthesis': ligation methods for peptide and protein synthesis with applications to beta-peptide assemblies, J. Pept. Res. 2005, 65, 229.
12. R.N. Zuckermann, J.M. Kerr, S.B.H. Kent, W.H. Moos, Efficient method for the preparation of peptoids [oligo(N-substituted glycines)] by submonomer solid-phase synthesis, J. Am. Chem. Soc. 1992, 114, 10646.
13. F.A. Robey, R.A. Fields, Automated synthesis of N-bromoacetyl-modified peptides for the preparation of synthetic peptide polymers, peptide-protein conjugates, and cyclic peptides, Anal. Biochem. 1989, 177, 373.
14. M. Schnolzer, S.B.H. Kent, Constructing proteins by dovetailing unprotected synthetic peptides: backbone engineered HIV protease, Science 1992, 256, 221.
15. M. Bergmann, L. Zervas, Biochem. Z. 1932, 203, 280.
16. F. Albericio, L.A. Carpino, Methods Enzymol. 1997, 289, 104.
17. S. Sakakibara, Chemical synthesis of proteins in solution, Biopolymers 1999, 51, 279.
18. J. Bedford, C. Hyde, T. Johnson, W. Jun, D. Owen, M. Quibell, R.C. Sheppard, Amino acid structure and "difficult sequences" in solid phase peptide synthesis, Int. J. Pept. Protein Res. 1992, 40, 300.
19. M. Mutter, A. Nefzi, T. Sato, X. Sun, F. Wahl, T. Wohr, Pseudo-prolines (psi-pro) for accessing inaccessible peptides, Pept. Res. 1995, 8, 145.
20. V.K. Sarin, S.B.H. Kent, R.B. Merrifield, Properties of swollen polymer networks: solvation and swelling of peptide-containing resins in solid phase peptide synthesis, J. Am. Chem. Soc. 1980, 102, 5463.
21. R.C. Sheppard, New solid-phase methods in the synthesis of natural peptides, Biochem. Soc. Trans. 1980, 8, 744.
22. B.T. Chait, S.B.H. Kent, Weighing naked proteins: practical, high-accuracy mass measurement of peptides and proteins, Science 1992, 257, 1885.
23. K. Tanaka, The origin of macromolecule ionization by laser irradiation (Nobel lecture), Angew. Chem., Int. Ed. Engl. 2003, 42, 3860.
24. T. Hunt, Nobel Lecture. Protein synthesis, proteolysis, and cell cycle transitions, Biosci. Rep. 2002, 22, 465.
25. M. Quibell, L.C. Packman, T. Johnson, Solid-phase assembly of backbone amide-protected peptide segments: an efficient and reliable strategy for the synthesis of small proteins, J. Chem. Soc., Perkin Trans. 1 1996, 1, 1227.
26. J. Blake, C.H. Li, New segment-coupling method for peptide synthesis in aqueous solution, Proc. Natl. Acad. Sci. U.S.A. 1981, 78, 4055.
27. S. Aimoto, Contemporary methods for peptide and protein synthesis, Curr. Org. Chem. 2001, 5, 45.
28. D.S. Kemp, R.I. Carey, Synthesis of a 39-peptide and a 25-peptide by thiol-capture ligations: observation of a 40-fold rate acceleration of the intramolecular O,N-acyl transfer reaction between peptide fragments bearing only cysteine protecting groups, J. Org. Chem. 1993, 58, 2216.
29. J.A. Wells, Systematic mutational analyses of protein-protein interfaces, Methods Enzymol. 1991, 202, 390.
30. K. Rose, L.A. Vilaseca, R. Werlen, A. Meunier, I. Fisch, R.M. Jones, R.E. Offord, Preparation of well-defined protein conjugates using enzyme-assisted reverse proteolysis, Bioconjug. Chem. 1991, 2, 154.
31. M. Schnolzer, S.B. Kent, Constructing proteins by dovetailing unprotected synthetic peptides: backbone-engineered HIV protease, Science 1992, 256, 221.
32. M. Baca, T.W. Muir, M. Schnolzer, S.B.H. Kent, Chemical ligation of cysteine-containing peptides: synthesis of a 22 kDa tethered dimer of HIV-1 protease, J. Am. Chem. Soc. 1995, 117, 1881.
33. J.P. Tam, J.X. Xu, K.D. Eom, Methods and strategies of peptide ligation, Biopolymers 2001, 60, 194.
34. K. Rose, Facile synthesis of homogeneous artificial proteins, J. Am. Chem. Soc. 1994, 116, 30.
35. E. Saxon, C.R. Bertozzi, Cell surface engineering by a modified Staudinger reaction, Science 2000, 287, 2007.
36. E. Saxon, J.I. Armstrong, C.R. Bertozzi, A "traceless" Staudinger ligation for the chemoselective synthesis of amide bonds, Org. Lett. 2000, 2, 2141.
37. B.L. Nilsson, L.L. Kiessling, R.T. Raines, Staudinger ligation: a peptide from a thioester and azide, Org. Lett. 2000, 2, 1939.
38. M. Kohn, R. Breinbauer, The Staudinger ligation: a gift to chemical biology, Angew. Chem., Int. Ed. Engl. 2004, 43, 3106.
39. H.C. Kolb, M.G. Finn, K.B. Sharpless, Click chemistry: diverse chemical function from a few good reactions, Angew. Chem., Int. Ed. Engl. 2001, 40, 2004.
40. Q. Wang, T.R. Chan, R. Hilgraf, V.V. Fokin, K.B. Sharpless, M.G. Finn, Bioconjugation by copper(I)-catalyzed azide-alkyne [3+2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 3192.
41. C.W. Tornoe, C. Christensen, M. Meldal, Peptidotriazoles on solid phase: [1,2,3]-triazoles by regiospecific copper(I)-catalyzed 1,3-dipolar cycloadditions of terminal alkynes to azides, J. Org. Chem. 2002, 67, 3057.
42. Z. Machova, R. von Eggelkraut-Gottanka, N. Wehofsky, F. Bordusa, A.G. Beck-Sickinger, Expressed enzymatic ligation for the semisynthesis of chemically modified proteins, Angew. Chem., Int. Ed. Engl. 2003, 42, 4916.
43. Z.P. Wu, D. Hilvert, Conversion of a protease into an acyl transferase: selenolsubtilisin, J. Am. Chem. Soc. 1989, 111, 4513.
44. T. Nakatsuka, T. Sasaki, E.T. Kaiser, Peptide segment coupling catalyzed by the semisynthetic enzyme thiolsubtilisin, J. Am. Chem. Soc. 1987, 109, 3808.
45. S. Atwell, J.A. Wells, Selection for improved subtiligases by phage display, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9497.
46. D.Y. Jackson, J. Burnier, C. Quan, M. Stanley, J. Tom, J.A. Wells, A designed peptide ligase for total synthesis of ribonuclease A with unnatural catalytic residues, Science 1994, 266, 243.
47. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B.H. Kent, Synthesis of proteins by native chemical ligation, Science (Washington, D.C.) 1994, 266, 776.
48. T. Wieland, E. Bokelmann, L. Bauer, H.U. Lang, H. Lau, Über Peptidsynthesen. 8. Mitteilung. Bildung von S-haltigen Peptiden durch intramolekulare Wanderung von Aminoacylresten, Liebigs Ann. Chem. 1953, 583, 129.
49. I.H. Um, G.R. Kim, D.S. Kwon, The effects of solvation and polarizability on the reaction of S-p-nitrophenyl thiobenzoate with various anionic nucleophiles, Bull. Korean Chem. Soc. 1994, 15, 585.
50. K.A. Connors, M.L. Bender, Kinetics of alkaline hydrolysis and N-butylaminolysis of ethyl p-nitrobenzoate and ethyl p-nitrothiolbenzoate, J. Org. Chem. 1961, 26, 2498.
51. W. Yang, D.G. Drueckhammer, Understanding the relative acyl-transfer reactivity of oxoesters and thioesters: computational analysis of transition state delocalization effects, J. Am. Chem. Soc. 2001, 123, 11004.
52. P.E. Dawson, S.B. Kent, Synthesis of native proteins by chemical ligation, Annu. Rev. Biochem. 2000, 69, 923.
53. T.W. Muir, P.E. Dawson, S.B.H. Kent, Protein-synthesis by chemical ligation of unprotected peptides in aqueous-solution, Methods Enzymol. 1997, 289, 266.
54. A. Dirksen, E.W. Meijer, W. Adriaens, T.M. Hackeng, Strategy for the synthesis of multivalent peptide-based nonsymmetric dendrimers by native chemical ligation, Chem. Commun. 2006, 15, 1667.
55. J.P. Tam, Q. Yu, Z. Miao, Orthogonal ligation strategies for peptide and protein, Biopolymers 2000, 51, 311.
56. J.P. Tam, Q. Yu, Methionine ligation strategy in the biomimetic synthesis of parathyroid hormones, Biopolymers 1998, 46, 319.
57. R. Quaderer, A. Sewing, D. Hilvert, Selenocysteine-mediated native chemical ligation, Helv. Chim. Acta 2001, 84, 1197.
58. W.A. van der Donk, M.D. Gieselman, Synthesis of selenocysteine-containing peptides by native chemical ligation, Abstr. Pap. Am. Chem. Soc. 2001, 222, U45.
59. S.M. Berry, M.D. Gieselman, M.J. Nilges, W.A. van der Donk, Y. Lu, An engineered azurin variant containing a selenocysteine copper ligand, J. Am. Chem. Soc. 2002, 124, 2084.
60. R.J. Hondal, B.L. Nilsson, R.T. Raines, Selenocysteine in native chemical ligation and expressed protein ligation, J. Am. Chem. Soc. 2001, 123, 5140.
61. J.P. Tam, Y.A. Lu, L. Chuan-Fa, J. Shao, Peptide synthesis using unprotected peptides through orthogonal coupling methods, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 12485.
62. L.Z. Yan, P.E. Dawson, Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization, J. Am. Chem. Soc. 2001, 123, 526.
63. L.E. Canne, S.J. Bark, S.B.H. Kent, Extending the applicability of native chemical ligation, J. Am. Chem. Soc. 1996, 118, 5891.
64. J. Offer, P.E. Dawson, Nα-2-Mercaptobenzylamine-assisted chemical ligation, Org. Lett. 2000, 2, 23.
65. J. Offer, C.N. Boddy, P.E. Dawson, Extending synthetic access to proteins with a removable acyl transfer auxiliary, J. Am. Chem. Soc. 2002, 124, 4642.
66. T. Kawakami, K. Akaji, S. Aimoto, Peptide bond formation mediated by 4,5-dimethoxy-2-mercaptobenzylamine after periodate oxidation of the N-terminal serine residue, Org. Lett. 2001, 3, 1403.
67. C. Marinzi, J. Offer, R. Longhi, P.E. Dawson, An o-nitrobenzyl scaffold for peptide ligation: synthesis and applications, Bioorg. Med. Chem. 2004, 12, 2749.
68. P. Botti, M. Villain, S. Manganiello, H. Gaertner, Chemical synthesis of proteins through native and extended chemical ligation, Biopolymers 2003, 71, 283.
69. P. Botti, M.R. Carrasco, S.B.H. Kent, Native chemical ligation using removable N-alpha-(1-phenyl-2-mercaptoethyl) auxiliaries, Tetrahedron Lett. 2001, 42, 1831.
70. T.M. Hackeng, J.A. Fernandez, P.E. Dawson, S.B. Kent, J.H. Griffin, Chemical synthesis and spontaneous folding of a multidomain protein: anticoagulant microprotein S, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 14074.
71. G.S. Beligere, P.E. Dawson, Synthesis of a three zinc finger protein, Zif268, by native chemical ligation, Biopolymers 2000, 52, 363.
72. D. Bang, S.B. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534.
73. T.M. Hackeng, J.H. Griffin, P.E. Dawson, Protein synthesis by native chemical ligation: expanded scope by using straightforward methodology, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10068.
74. M. Villain, H. Gaertner, P. Botti, Native chemical ligation with aspartic and glutamic acids as C-terminal residues: scope and limitations, Eur. J. Org. Chem. 2003, 17, 3267.
75. G.S. Beligere, P.E. Dawson, Conformationally assisted protein ligation using C-terminal thioester peptides, J. Am. Chem. Soc. 1999, 121, 6332.
76. C.L. Hunter, G.G. Kochendoerfer, Native chemical ligation of hydrophobic [corrected] peptides in lipid bilayer systems, Bioconjugate Chem. 2004, 15, 437.
77. M. Mutter, P. Dumy, P. Garrouste, C. Lehmann, M. Mathieu, C. Peggion, S. Peluso, A. Razaname, G. Tuchscherer, Template assembled synthetic proteins (TASP) as functional mimetics of proteins, Angew. Chem., Int. Ed. Engl. 1996, 35, 1482.
78. J.P. Tam, Recent advances in multiple antigen peptides, J. Immunol. Methods 1996, 196, 17.
79. P.E. Dawson, S.B.H. Kent, Convenient total synthesis of a 4-helix template-assembled synthetic protein (TASP) molecule by chemoselective ligation, J. Am. Chem. Soc. 1993, 115, 7263.
80. J.P. Tam, Y.A. Lu, Synthesis of large cyclic cystine-knot peptide by orthogonal coupling strategy using unprotected peptide precursor, Tetrahedron Lett. 1997, 38, 5599.
81. J.A. Camarero, T.W. Muir, Biosynthesis of a head-to-tail cyclized protein with improved biological activity, J. Am. Chem. Soc. 1999, 121, 5597.
82. N.L. Daly, S. Love, P.F. Alewood, D.J. Craik, Chemical synthesis and folding pathways of large cyclic polypeptide: studies of the cystine knot polypeptide kalata B1, Biochemistry 1999, 38, 10606.
7 0 Synthesis of Large Bio/ogica/Mo/ecules
592
I 83. L.Z. Yan, P.E. Dawson, Design and 88. L.E. Canne, P. Botti, R.J. Simon,
synthesis of a protein catenane, Angew. Y.J. Chen, E.A. Dennis, S.B.H. Kent,
Chem., lnt. Ed. Engl. 2001, 40, 3625. Chemical Protein Synthesis by Solid
84. J.W. Blankenship, P.E. Dawson, phase ligation, J . Am. Chem. Soc.,
Thermodynamics of a designed 1999, 121,8720.
protein catenane, J . Mol. Biol. 2003, 89. A. Brik, E. Keinan, P.E. Dawson,
327, 537. Protein synthesis by solid-phase
85. J.D. Warren, J.S. Miller, S.J. Keding, chemical ligation using a safety
S.J. Danishefsky, Toward fully catch linker, J. Org. Chem. 2000, 65,
synthetic glycoproteins by ultimately 3829.
convergent routes: a solution to a 90. D. Bang, S.B.H. Kent, A one-pot total
long-standing problem, J . Am. Chem. synthesis of crambin, Angew. Chem.,
SOC.2004, 126, 6576. lnt. Ed. Engl. 2004, 43, 2534.
86. R.S. Goody, T. Durek, H. Waldmann, 91. T.W. Muir, Development and
L. Brunsveld, K. Alexandrov, in application of expressed protein
GTPases Regulating Membrane ligation, Synlett 2001,733.
Targeting and Fusion, Methods 92. D. Bang, S.B. Kent, His6 tag-assisted
Enzymol.,2005, 403, 29. chemical protein synthesis, Proc. Natl.
87. Y. Kajihara, N. Yamamoto, Acad. Sci. U.S.A.2005, 102, 5014.
T. Miyazaki, H. Sato, Synthesis of 93. B.L. Nilsson, L.L. Kiessling, R.T.
diverse asparagine linked Raines, High-yielding Staudinger
oligosaccharides and synthesis of ligation of a phosphinothioester and
sialylglycopeptide on solid phase, azide to form a peptide, Org. Lett.
Cum. Med. Chem. 2005, 12,527. 2001, 3, 9.
10.3
New Methods for Protein Bioconjugation
Matthew B. Francis
Outlook
This chapter surveys new chemical methods for the attachment of synthetic
molecules to proteins. Strategies targeting both native and unnatural functional
groups are discussed, including an evaluation of the selectivity that each
technique can achieve. A particular emphasis has been placed on the
unique mechanistic attributes that these reactions possess and the practical
circumstances under which they can be used.
10.3.1
Introduction
The field of bioconjugation occupies a central role in chemical biology.
At its simplest, this technique involves the attachment of new synthetic
components to biomolecules of interest, with the goal of altering their chemical
function or biological properties. The resulting hybrid structures have served
as powerful tools for a variety of applications, including the observation of
protein trafficking [1, 2], the elucidation of electron transfer pathways [3],
the improvement of pharmacokinetic properties [4, 5], the synthesis of
artificial glycoproteins [6], and the construction of nanoscale materials [7,
8]. Figure 10.3-1 summarizes some of the molecules and materials that are
commonly used to achieve these goals. Regardless of the application, the
preparation of each bioconjugate critically relies on at least one chemical
reaction that forms a well-defined covalent link between the biomolecule and
the synthetic group, creating a need for organic transformations that can
modify biomolecules with high yield and specificity. The goals of this chapter
are to survey the new chemical tools that have emerged to meet this demand
and to provide a perspective on the unique reactivity attributes that have led to
their success.
Synthetic organic chemistry has provided countless powerful and elegant
strategies for the construction of complex natural products. Generally, the
reactions used for this purpose arise from the systematic optimization
of reaction parameters, such as solvent, temperature, concentration, and
protecting groups, until the desired reactivity and selectivity are achieved. In
sharp contrast, reactions for biomolecule modification cannot be developed
with this flexibility because they must be carried out under a narrow set of
conditions to maintain the properly folded structure of the protein substrates.
Ideally, they should proceed in aqueous solution within a pH range of 6-8,
at temperatures ranging from 4 to 37 °C, and in the absence of any protective
groups. In most cases, they also require the complete removal of excess
reagents before the proteins are returned to the biological setting. Perhaps the
most significant challenge to meet, however, is the low concentration of most
biomolecules in solution (typically well below 100 µM), requiring reaction rate
constants that are effectively 1000-100 000 times greater than those needed
for traditional synthetic operations. Thus, from the perspective of an organic
chemist, the field of biomolecule modification provides a fascinating context
for the development of chemical transformations that push the limits of
reactivity, chemoselectivity, and functional group tolerance.
Fig. 10.3-1 A survey of molecules and materials that are commonly attached to proteins
through bioconjugation reactions.
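The scaling quoted above can be made concrete with a small sketch. It assumes a simple second-order reaction with equal starting concentrations of the two partners, for which the half-life is 1/(k c0), and it takes 0.1 M as a representative concentration for a traditional synthetic reaction. That reference concentration and the function names are illustrative assumptions, not values from the text.

```python
def half_life_equal_conc(k, c0):
    """Half-life (s) of A + B -> P when [A]0 = [B]0 = c0 (mol/L).

    For a second-order reaction with equal starting concentrations,
    t_1/2 = 1 / (k * c0), with k in L mol^-1 s^-1.
    """
    return 1.0 / (k * c0)


def required_rate_constant(k_reference, c_reference, c_target):
    """Rate constant needed at c_target to keep the half-life seen at c_reference."""
    return k_reference * c_reference / c_target


if __name__ == "__main__":
    c_synthesis = 0.1   # mol/L, assumed bench-scale concentration
    k_synthesis = 1e-2  # L mol^-1 s^-1, arbitrary illustrative value
    for c_protein in (100e-6, 1e-6):  # 100 uM and 1 uM protein
        k_needed = required_rate_constant(k_synthesis, c_synthesis, c_protein)
        factor = k_needed / k_synthesis
        print(f"protein at {c_protein:.0e} M: k must be about {factor:.0f} times larger")
```

Under these assumptions, moving from 0.1 M to 100 µM or 1 µM requires rate constants larger by factors of about 1000 and 100 000, which is the range given in the text.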
Conceptually, the new bioconjugation reactions described herein have
been divided into two types: Those that introduce new functionality by
modifying the natural amino acid side chains, and those that target reactive
groups not occurring in natural biomolecules. Historically, bioconjugation
techniques targeting native functionality have been used more widely, as the
introduction of abiotic functional groups into proteins has been difficult to
achieve. However, with the advent of new technologies for the biosynthetic
incorporation of unnatural amino acids, sugars, and lipids into proteins,
exquisitely selective reactions targeting chemically distinct functional groups
have become possible. These techniques are not used for the majority of
bioconjugation reactions at the time of this writing, but they are certain to
provide countless new strategies as these methods become more available
and general. Although these techniques are described in more detail in other
chapters of this book, some examples of their use in selective bioconjugation
will be presented whenever possible.
10.3.2
History/Development
By far, the most common bioconjugation reactions target nucleophilic amino
acid side chains, including lysine, cysteine, and aspartic/glutamic acid residues
that occur in areas of the protein that are not required for proper function [9].
Of these, the reaction of NHS esters 1, isocyanates 2, and isothiocyanates 3 with
the ε-amino groups of lysine residues (Fig. 10.3-2(a)) is perhaps the most widely
used strategy, as most proteins possess multiple copies of this residue (often 20
or more) on their surface. These reactions rely on the ability of these reagents
to acylate amino groups much more rapidly than they are hydrolyzed by the
aqueous solvent. Because of the reliability of this reaction for simple protein
modification, dozens of active acylating agents are now commercially available.
As an alternative, lysine residues can also be modified through reductive
alkylation. This reaction proceeds through the condensation of aldehydes
with the amino groups, forming transient imines that are reduced by water-
compatible hydride sources, such as NaBH3CN, NaBH4, or transition metal
hydrides (see below). An advantage of this technique over lysine acylation is
that it maintains the basicity of the amino group, thus preserving the overall
charge state of the protein target.
The carboxylate residues of proteins can also serve as sites for
functionalization. Water-soluble carbodiimides, such as N-ethyl-3-N',N'-
dimethylaminopropyl carbodiimide (EDC, 4), form active esters with aspartic
and glutamic acid residues that react with exogenous amines to form amide
bonds, Fig. 10.3-2(b). It should be noted that this reaction often generates side
products arising from the rearrangement of the O-acylisouronium intermediate
to form N-acyl urea 5, although nucleophilic catalysts (such as HOBT
(hydroxybenzotriazole), 6) have been shown to suppress this pathway [10].
In instances where lysine amino groups are located near the activated
carboxylates, this strategy can serve as a particularly useful method for protein
cross-linking [11].
Unfortunately, the high prevalence of lysine and carboxylate-containing
residues on protein surfaces places severe limitations on the ability to control
the precise locations and the number of times a particular biomolecule is modified
(for a notable exception, see Ref. 12). The need for this selectivity depends
on the application at hand: while many experiments are tolerant of unevenly
labeled samples, studies designed to probe enzyme function or to measure
distances with fluorescence resonance energy transfer (FRET) [13] require
exquisite labeling specificity. To a limited extent, differences in pKa values can
be used to distinguish between multiple copies of a single residue, but this
does not provide a general method for achieving site selectivity.
Fig. 10.3-2 Common strategies for protein bioconjugation,
targeting lysine, cysteine, aspartic acid, and glutamic acid
residues. In most situations, only cysteine modification reactions
are site selective. Panels: (a) lysine residues (2: X = O, isocyanates;
3: X = S, isothiocyanates); (b) aspartic and glutamic acid residues
(5, formed in varying amounts; 6: HOBT); (c) cysteine residues
(8: iodoacetamides).
At present, virtually all applications that require the site-specific modification
of proteins target cysteine residues. The low pKa of the sulfhydryl group
(~8), coupled with the potent nucleophilicity of the thiolate anion, provides a
particularly reactive functional group for alkylation reactions. Cysteine is the
rarest of the genetically encoded amino acids [14], and typically does not occur
in the reduced form as a surface residue; as a result, it is frequently possible to
introduce a uniquely reactive cysteine group using site-directed mutagenesis.
Although this strategy can sometimes be accompanied by unwanted disulfide
bond formation or scrambling, the reliability of cysteine modification reagents
renders this the current method of choice for applications that require
functionalization in a precise location.
Reagents for the modification of cysteine fall into two general classes.
The first involves a series of alkylation reagents, including maleimides
7, acrylamides, iodoacetamides 8, and vinyl sulfones, designed to modify
cysteines through the formation of a sulfur-carbon bond. This method is
usually quite selective for thiolate anions, and in cases where lysine cross-
reactivity is problematic, the selectivity can sometimes be improved by lowering
the pH of the reaction medium. Similar to lysine modification strategies, a
range of reagents is commercially available for the alkylation of cysteine
residues.
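The effect of pH on thiol versus amine reactivity can be sketched with the Henderson-Hasselbalch relation. The pKa values used here (8.3 for the cysteine thiol and 10.5 for the lysine ε-amino group) are typical textbook figures assumed for illustration; actual values shift with the local protein environment, and the sketch ignores intrinsic differences in nucleophilicity.

```python
PKA_CYS_THIOL = 8.3   # assumed typical value for a cysteine side chain
PKA_LYS_AMINE = 10.5  # assumed typical value for a lysine side chain


def fraction_deprotonated(pka, ph):
    """Fraction of an acid present as its conjugate base at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def thiolate_to_amine_ratio(ph):
    """Ratio of reactive thiolate fraction to reactive free-amine fraction."""
    thiolate = fraction_deprotonated(PKA_CYS_THIOL, ph)
    free_amine = fraction_deprotonated(PKA_LYS_AMINE, ph)
    return thiolate / free_amine


if __name__ == "__main__":
    for ph in (8.0, 7.0, 6.5):
        thiolate = fraction_deprotonated(PKA_CYS_THIOL, ph)
        ratio = thiolate_to_amine_ratio(ph)
        print(f"pH {ph}: thiolate fraction {thiolate:.3f}, thiolate/amine ratio {ratio:.0f}")
```

With these assumed pKa values, lowering the pH raises the thiolate-to-amine ratio while reducing the absolute amount of thiolate, so any gain in selectivity comes at the cost of a slower reaction.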
The second class of cysteine modification reagents includes disulfide
formation reagents. Free cysteine residues participate in rapidly equilibrating
exchange reactions with symmetric disulfides, such as 9, with complete
modification occurring through mass action [15]. For more precious reagents,
asymmetric disulfides can be generated with 4- and 2-thiopyridines [16]. These
species react with cysteine residues through the selective release of the
stabilized thiopyridone group. Disulfide formation reactions are inherently
chemoselective, and offer the unique feature of reversibility. This property can
be used to release chemical groups on entrance of the protein into reducing
environments, a useful feature for drug delivery applications [17].
Despite the utility of cysteine modification, there remains a growing need for
reactions that can target other functional groups on proteins. These techniques
are necessary in cases where it is inconvenient or impossible to introduce a
unique cysteine residue, or when complementary strategies are required to
attach two different functional groups to a single protein (e.g., for FRET and
optical tweezer studies). Additionally, the targeting of a cysteine residue alone
is not sufficient to select a single protein of interest in a living cell or crude
lysate. To address these needs, new chemical strategies have become available
to expand the set of residues that can be modified and to improve the selectivity
with which they can be targeted. The remainder of this chapter focuses on the
development, application, and future directions of this active area of research.
10.3.3
New Bioconjugation Methods Targeting the Natural Amino Acids
10.3.3.1 New Chemical Tools for the Modification of Tyrosine Residues

Tyrosine residues are underutilized targets for bioconjugate preparation. As
it is displayed with intermediate frequency on protein surfaces, tyrosine can
often be modified with greater selectivity than other residues. In contrast to
charged amino acids, tyrosine residues are often partially “buried” in the
surface of the proteins owing to the amphipathic nature of the phenolic
group, Fig. 10.3-3(a-d). This close association with the topography of protein
surfaces results in varying levels of accessibility for tyrosine residues, and
thus significant differences in their reactive properties. In cases where no
surface accessible tyrosines are present, they can be introduced using genetic
methods, with the added advantage that their incorporation produces minimal
changes in the charge state and redox sensitivity of the expressed proteins. As
an additional consideration, the tyrosine reactivity is largely complementary
to that of cysteine, lysine, and carboxylate-containing residues. When used
in conjunction with other methods, this chemical orthogonality is extremely
useful for the preparation of proteins that are labeled in multiple sites.
Fig. 10.3-3 Tyrosine residues as targets for bioconjugation. (a) In contrast to
charged amino acid side chains, tyrosine residues (yellow) are more closely
associated with the protein surface. The reactive 3- and 5-positions of the
phenolic ring (indicated by the white arrows) can be (b) fully exposed,
(c) partially buried, or (d) fully buried. The protein shown is
α-chymotrypsinogen A. (e) Modification of tyrosine residues through
electrophilic aromatic substitution reactions.
Electrophilic aromatic substitution is the most common method for the
modification of tyrosine residues, typically involving iodination [18, 19],
nitration [20], or azo bond formation [21-23], Fig. 10.3-3(e). Coupling reactions
with diazonium salts provide the most general method for the introduction of
new functional groups, as virtually any substituent can be attached to the aniline
precursor. Through quantitative reactivity studies, it has been determined that
diazonium salts prepared from 4-nitroaniline derivatives (such as 10a) are
particularly effective, typically reaching very high levels of conversion in under
30 min using less than five equivalents of reagent [10, 24]. Diazonium salts
bearing nitrile 10b and acyl 10c substituents in the 4-position provide efficient
coupling in some instances, but more electron-rich analogs are generally low
yielding. A general route to appropriately functionalized diazonium salts is
provided using 4-nitro-3-anthranilic acid (11), Fig. 10.3-4(a).
Fig. 10.3-4 Highly efficient modification of tyrosine residues using
electron-deficient diazonium salts. (a) General preparation method for
nitro-substituted diazonium salts. (b) There are 180 copies of tyrosine 85
(green) displayed on the interior surface of bacteriophage MS2. (c) Virtually
all these sites can be modified using diazonium salt 10a, as evidenced by
(d) MALDI-TOF MS and (e) the appearance of an azo absorption band in the
visible spectrum. (f) Similarly, 2100 copies of tyrosine 139 (yellow) line the
exterior surface of the tobacco mosaic virus (TMV). (g) These sites can be
modified using a two-step diazonium-coupling/oxime formation strategy. In both
cases, the reactions are completely selective for the indicated tyrosine residues.
An advantage of diazonium-coupling strategies is the high level of conversion

that can be reached. This is particularly useful for the functionalization of
protein assemblies designed to serve as scaffolds for material applications,
as their surfaces possess hundreds or thousands of individual sites for
potential functionalization. As an example, diazonium-coupling reactions
have been used to modify the tyrosine residues of two viral capsids, resulting
in supramolecular assemblies that are homogeneously functionalized on the
interior or exterior surfaces. In the first example, the targeting of tyrosine 85 of
the protein capsid of bacteriophage MS2 provided 180 attachment sites on the
interior surface of the spherical protein shell, Fig. 10.3-4(b) [24]. After exposure
to two equivalents of nitrodiazonium salt 10a, analysis by MALDI-TOF MS
and UV-vis spectroscopy indicated that >90% of the sites had been modified
(Fig. 10.3-4(c-e)). Remarkably, no capsid disassembly was observed in these
studies. Through further elaboration of these sites, carrier materials are being
prepared for drug delivery applications and as targeted diagnostic agents. As
a second example, tyrosine 139 of the tobacco mosaic virus (TMV) capsid
was modified using ketone-substituted diazonium salt 10c, resulting in the
installation of 2100 sites on the exterior surface, which can be further labeled
through oxime formation [10]. Once again, virtually complete conversion was
obtained, and the capsid remained assembled after the modification reaction.
As a result, tubelike materials with tailorable surface properties have become
available for nanoscience applications.
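The numbers quoted for the MS2 capsid translate into per-capsid counts with simple arithmetic, sketched below. The site count (180), the conversion threshold (90%), and the two equivalents of reagent come from the text; the tyrosine site concentration in the usage example is a hypothetical value chosen only to show the calculation.

```python
def minimum_modified_sites(sites_per_capsid, percent_conversion):
    """Lower bound on modified sites per capsid for a whole-number percent conversion."""
    return sites_per_capsid * percent_conversion // 100


def reagent_concentration(site_concentration, equivalents):
    """Reagent concentration needed to supply a given number of equivalents per site."""
    return site_concentration * equivalents


if __name__ == "__main__":
    ms2_sites = 180  # tyrosine 85 copies per MS2 capsid (from the text)
    print("MS2 sites modified at 90% conversion:", minimum_modified_sites(ms2_sites, 90))

    site_conc = 50e-6  # mol/L of tyrosine 85 sites, hypothetical
    needed = reagent_concentration(site_conc, 2)
    print(f"diazonium salt for two equivalents: {needed * 1e6:.0f} uM")
```

Because the text reports more than 90% conversion, the figure for MS2 is a lower bound rather than a measured count.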
The above studies emphasize the ability of diazonium-coupling reactions
to modify proteins with extremely high efficiency, but one of the limitations
of this method is the lack of selectivity that can be obtained when there
are multiple tyrosines on the surface of a single protein. This has not
been problematic for the viral capsids shown above, as only one tyrosine
is accessible on each monomer, but many applications demand higher levels
of selectivity than allowed by these coupling reactions. To address this need,
and to increase the substrate scope for bioconjugation reactions in general, a
versatile Mannich-type reaction has been developed for tyrosine modification,
Fig. 10.3-5 [25]. In this reaction, aldehydes and anilines are mixed to form
imines 12, which subsequently react with phenolic side chains through an
electrophilic aromatic substitution reaction [26]. Anilines bearing
electron-donating substituents have proven to be the most effective components in
the reaction, affording over 70% overall conversion in some cases. To date,
no aliphatic amines have been observed to participate in the reaction, a
useful feature, as cross-linking reactions with lysine residues are avoided.
Formaldehyde has yielded the highest amount of reactivity, although aldehydes
such as pyruvaldehyde, glyoxylic acid, and furan-carboxylic acid have proven
effective in some instances. Enolizable aldehydes are generally ineffective in
the reaction, presumably due to competing aldol self-condensation pathways.
Some particularly attractive features of this reaction include its mild conditions
(pH 6.5, aqueous buffer, 22-37 °C), very high selectivity for tyrosine residues,
and broad substrate tolerance with respect to the aniline component. It
should be noted that formaldehyde cross-linking techniques require high
concentrations of the aldehyde (up to 37%) and/or elevated temperatures [27].
With the low concentrations used in these reactions, no modification of the
proteins has been observed in the absence of the aniline component.
Fig. 10.3-5 Tyrosine modification using a three component Mannich-type
reaction. (a) Aldehydes and anilines condense to form imines in situ, which
react with tyrosine residues through an electrophilic aromatic substitution
reaction. No reaction occurs when proteins are treated alone with either
component. (b) The reaction conversion is listed for a number of anilines and
aliphatic amines using α-chymotrypsinogen A as the substrate and formaldehyde
as the aldehyde component.
In many labeling applications, anilines bearing additional aliphatic amino
groups (such as 13) are particularly useful building blocks, as the aliphatic
amino group of these compounds can be coupled to NHS esters before
using them in the Mannich coupling reaction. This effectively converts the
large number of commercially available lysine labeling reagents into more
selective tyrosine modification reagents using a simple one-pot procedure,
Fig. 10.3-6. This strategy has been applied to the labeling of two antibody
binders, protein A and protein G', with a number of useful functional groups
for immunoassays [28].
As the Mannich reaction does not target cysteine or lysine residues,
both thiols and aliphatic amines can be present in the bioconjugation
substrates. This allows unprotected peptides to be coupled to tyrosine residues
using a tandem Mannich-native chemical ligation (NCL) [29] strategy. To do
this, N-terminal cysteine mimic 14 has been coupled to tyrosine residues
using the Mannich reaction, Fig. 10.3-7(a). This functional group couples
to peptide thioesters (e.g., 15), ultimately resulting in the synthesis of
branched polypeptide backbone architectures. By moving the location of the
tyrosine residue through site-directed mutagenesis, the branch point can be
repositioned on the protein surface, Fig. 10.3-7(b). The use of this technique
allows the growing set of peptide building blocks, including lanthanide binding
peptides [30] and affinity tags, to be appended to proteins in a flexible manner.
10.3.3.2 Protein Modification Using Transition Metal Catalyzed Reactions

Transition metal-mediated reactions provide an exceptionally powerful set of
tools for site-selective protein modification. These strategies have had a striking
impact on organic synthesis over the last three decades due to the ability of
transition metals to activate otherwise unreactive functional groups with
exceptional selectivity. Many of these reactions have been used successfully in
aqueous solution and possess virtually complete functional group tolerance.
These features suggest that transition metal catalyzed reactions could similarly
expand the synthetic repertoire for bioconjugation by targeting previously
unmodifiable protein functional groups. It is also possible to tune the reactivity
of transition metals through adjustments in the ligand sphere, and the complex
stereochemical environments provided by asymmetric ligands could provide
a way to distinguish between several otherwise identical amino acid residues.
To demonstrate the feasibility of this approach, pioneering studies by several
groups have shown that aryl halides introduced into amino acids and peptides
can participate in cross-coupling reactions using palladium catalysts in aqueous
solution [31-33].
Fig. 10.3-6 Tyrosine modification using commercially available lysine-reactive
probes. (a) The aliphatic amino group reacts chemoselectively with NHS esters,
leaving the aniline amino group free to participate in the Mannich reaction. On
addition of formaldehyde and a protein target, tyrosine residues are modified.
(b) Modification of chymotrypsinogen A with several chromophores using a
two-step, one-pot procedure. Control reactions carried out in the absence of
formaldehyde indicate that no lysine modification occurred owing to remaining
NHS esters. (c) Structures of the chromophores used in (b).
Fig. 10.3-7 Native chemical ligations using tyrosine residues. (a) Reactive
N-terminal cysteine mimics can be installed through tyrosine modification using
the Mannich reaction. After disulfide reduction with DTT (dithiothreitol), these
groups react with C-terminal thioesters (e.g., 15) obtained using solid-phase
synthesis techniques. (b) By changing the location of the tyrosine residue, the
branch point of the resulting structure can be moved.
As proteins possess a number of nucleophilic groups, it is likely that
electrophilic transition metal complexes will prove to be the most useful.
As an example, a new palladium based method has been developed for
the alkylation of tyrosine residues [34]. In this reaction, allylic carbonates,
esters, and carbamates are activated by palladium(0) complexes in aqueous
solution, resulting in the formation of electrophilic π-allyl complexes (such
as 16), Fig. 10.3-8(a). These species react at pH 8-10 with the phenolate
anions of tyrosine residues, resulting in the formation of aryl ether 17 and
regeneration of the Pd(0) catalyst. The reaction requires no organic cosolvent,
is catalytic in palladium, and requires P(m-C6H4SO3-)3 as a water-soluble
phosphine ligand. In contrast to alkyl or allylic halides, the inert character of
the allyloxycarbonyl compounds used in this reaction ensures that nonspecific
background alkylation of the protein does not occur. Extensive reactivity
studies and trypsin digests have confirmed that the reaction displays excellent
selectivity for tyrosine residues. Activated π-allyl complexes that do not react
with tyrosine residues undergo β-elimination under the basic conditions, to
yield diene by-products.
Fig. 10.3-8 Tyrosine modification using palladium π-allyl chemistry.
(a) Allylic acetates (shown), carbonates, and carbamates can be activated by
palladium(0) in aqueous solution to yield electrophilic π-allyl complexes.
These species alkylate tyrosine residues with high selectivity. (b) Charged
groups can be attached to hydrophobic chains to assist in solubilization. These
carriers are lost on formation of the π-allyl complexes, and thus are not
incorporated into the protein targets. This provides a useful method for the
synthesis of membrane-associated proteins. Labels legible in panel (b):
water-soluble farnesyl derivative 18; chymotrypsinogen A (200 µM); pH 9, RT,
3 h; ESI-MS (m/z) 25667 (expected 25656, unmodified) and 25875 (expected
25860, M+1 modification).
A particularly attractive feature of this method is the use of a “disposable”
activating group that is cleaved prior to protein attachment. This allows
otherwise prohibitively hydrophobic molecules to be solubilized in water by
coupling them to charged carrier groups, such as taurine, Fig. 10.3-8(b). This
group is lost on activation of carbamate 18 by the water-soluble palladium
complex, which then transfers the hydrophobic group to the protein. This
“solubility switching” strategy is used to prepare lipid membrane-associated
proteins.
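The ESI-MS labels in Fig. 10.3-8(b) give expected masses of 25656 for unmodified chymotrypsinogen A and 25860 for the singly modified protein. As a rough consistency check, the sketch below assumes that the transferred group is a farnesyl unit (C15H25) that replaces the phenolic hydrogen of tyrosine, so that the net addition is C15H24. The atomic masses are standard average values, and the names in the code are illustrative only.

```python
# Standard average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008}

# Assumption: a farnesyl group (C15H25) replaces the phenolic hydrogen of
# tyrosine, so the net change in composition is +C15H24.
FARNESYL_NET_ADDITION = {"C": 15, "H": 24}

# Masses read from the labels of Fig. 10.3-8(b).
EXPECTED_UNMODIFIED = 25656
EXPECTED_MODIFIED = 25860
OBSERVED_UNMODIFIED = 25667
OBSERVED_MODIFIED = 25875


def average_mass(composition):
    """Sum average atomic masses for an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


if __name__ == "__main__":
    calculated = average_mass(FARNESYL_NET_ADDITION)
    print(f"calculated shift for +C15H24: {calculated:.1f}")
    print(f"expected shift from figure:   {EXPECTED_MODIFIED - EXPECTED_UNMODIFIED}")
    print(f"observed shift from figure:   {OBSERVED_MODIFIED - OBSERVED_UNMODIFIED}")
```

Under this assumption the calculated addition of about 204.4 agrees with the difference of 204 between the two expected masses, while the observed peaks are separated by 208.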
10.3.3.3 Modification of Tryptophan Residues Using Metallocarbenoids

Similar to cysteine residues, the low abundance of surface accessible
tryptophans suggests that these residues could serve as highly selective
bioconjugation handles when introduced using genetic methods. Furthermore,
the importance of indole side chains as mediators of protein-protein
interactions and electron transfer processes [35] creates a need for modification
reactions that can target this residue. To achieve this, a highly selective
transition metal-based method has been reported for the functionalization of
these groups [36]. On exposure of vinyl diazo compound 19 [37] to Rh2(OAc)4 in
aqueous solution, electrophilic metallocarbenoid intermediate 20 is produced,
Fig. 10.3-9. Normally, this highly reactive species reacts with water to form
alcohol 21; however, it has been found that 20 can react with the indole side
chains of tryptophan residues with comparable rates, resulting in one of the
first modification reactions for this residue. The reaction proceeds readily in
aqueous solution with ethylene glycol (up to 20%) [38] added as a cosolvent
to assist in the solubilization of the diazo compound. Typically, 10 mM diazo
compound and 100 µM Rh2(OAc)4 are used, reaching up to 70% conversion
with protein concentrations as low as 10 µM. On the basis of reactions
carried out with small molecule analogs, mixtures of N-alkyl 22 and 2-alkyl
23 products are produced, presumably resulting from direct NH insertion or
through cyclopropanation followed by ring opening, respectively. Although
the addition of some cosolvents can lead to the modification of disulfides (see
below), the reaction otherwise displays excellent tryptophan selectivity.
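The stoichiometry implied by these conditions is worth spelling out. Reading the quoted concentrations as 10 mM diazo compound, 100 µM Rh2(OAc)4, and 10 µM protein, the short sketch below computes the excess of each reagent over the protein and the catalyst loading relative to the diazo compound. The function names are illustrative, and the calculation treats the protein concentration as a proxy for the concentration of tryptophan sites.

```python
def equivalents(reagent_conc, substrate_conc):
    """Molar equivalents of a reagent relative to a substrate (same units)."""
    return reagent_conc / substrate_conc


def mol_percent(catalyst_conc, reagent_conc):
    """Catalyst loading in mol% relative to a reagent (same units)."""
    return 100.0 * catalyst_conc / reagent_conc


if __name__ == "__main__":
    diazo = 10e-3     # mol/L, diazo compound 19
    rhodium = 100e-6  # mol/L, Rh2(OAc)4
    protein = 10e-6   # mol/L, lowest protein concentration quoted
    conversion = 0.70 # upper conversion quoted in the text

    print(f"diazo equivalents per protein:   {equivalents(diazo, protein):.0f}")
    print(f"rhodium equivalents per protein: {equivalents(rhodium, protein):.0f}")
    print(f"rhodium loading vs. diazo:       {mol_percent(rhodium, diazo):.1f} mol%")
    print(f"modified protein at 70%:         {conversion * protein * 1e6:.0f} uM")
```

On these figures the rhodium complex is catalytic with respect to the diazo compound but is present in excess over the protein itself, which is consistent with the large reagent excess needed to compete with the reaction of 20 with water.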
Early studies identified hydroxylamine hydrochloride as an essential
component for the success of the reaction. When added to an unbuffered
aqueous solution, this additive results in dramatically enhanced catalytic
activity, presumably through the binding of the oxygen atom to the remaining
vacant coordination site of the bimetallic metallocarbene complex (species
24 in Fig. 10.3-9(b)). However, the addition of this HCl salt lowers the pH
of the solution to 3.5, effectively denaturing many protein targets. Elevating the
reaction pH results in substantial losses in reactivity with this additive,
possibly by liberating the nitrogen lone pair and switching the preferred
binding mode to 25. A solution to this problem was found through the use of
N-tert-butyl hydroxylamine (26), which discourages the deprotonated nitrogen
from binding through steric interactions with the catalyst ligand sphere. Using
this additive, reactions can be carried out at pH 6-7.
Fig. 10.3-9 Tryptophan modification using rhodium carbenoids. (a) These
species can be formed in situ through the reaction of vinyldiazo compound 19
with catalytic amounts of Rh2(OAc)4. Intermediate 20 can react with tryptophan
residues, forming a mixture of N- and 2-alkylated indoles, in addition to
reacting with the aqueous solvent. Control experiments that were run in the
absence of rhodium catalyst afford no modification products. (b) Proposed
binding modes for hydroxylamine at low 24 and elevated 25 pH levels.
Conditions legible in panel (a): 10 mM diazo compound 19, 100 µM Rh2(OAc)4,
75 mM HONH2·HCl, H2O/ethylene glycol (80:20), RT, 7 h.
10.3.3.4 Modification of Disulfide Bonds Using Metallocarbenoids

The selectivity of transition metal-mediated reactions is often sensitive to
changes in the specific reaction conditions. This behavior is also observed
in the case of metallocarbenoid-based protein modification reactions. In the
presence of >25% tert-butanol, metallocarbenoid intermediate 20 reacts with
the nucleophilic lone pairs of disulfide groups to form ylides, such as 27a,
Fig. 10.3-10(a) [39]. In some instances, this species undergoes a sigmatropic
rearrangement to form 1,3-dithiane 28 [40], a stable species that incorporates
the functionality originating from the diazo compound while maintaining
the overall linkage provided by the disulfide bond. Similar to the tryptophan
reaction described above, this reaction requires hydroxylamine hydrochloride
as a reaction additive, and it affords high product yields with substrate
concentrations as low as 100 µM. Specific reaction conditions are shown for
tocinoic acid (30) in Fig. 10.3-10(b).

Fig. 10.3-10 Disulfide modification using rhodium carbenoids. (a) Disulfide
bonds react with metallocarbenes to form ylide-like intermediates 27a-c (where
X = a negative charge, coordinated rhodium, or a proton). These can undergo a
sigmatropic rearrangement to form 1,3-adducts, or alkylate nearby nucleophiles.
These pathways are demonstrated for (b) a cyclic peptide hormone and (c) a
protein. The disulfide in (c) is represented by the yellow spheres, and the
N-terminus is indicated by the yellow arrow.

This reaction has also been applied to the modification of a protein
target, although a new reaction pathway has been observed in this case.
Instead of the pericyclic rearrangement, the ylide intermediate formed with a
disulfide of chymotrypsinogen A is attacked by the nearby N-terminus, likely
after protonation (yielding species 27c) or recomplexation of the rhodium
catalyst 27b. This results in the transfer of the styryl acetic acid group to
this neighboring site. As is the case with the 1,3-insertion pathway, this
reaction preserves the disulfide linkage after protein modification. Although
the conditions of this reaction are unlikely to maintain secondary and tertiary
protein structures, it still provides the only protein modification method that
is directed by disulfide groups.
10.3.3.5 Reductive Alkylation of Lysine Residues Using Transfer Hydrogenation

A transition metal catalyst has also been used to effect the reductive
alkylation of amino groups on proteins [41]. This reaction uses
[Cp*Ir(4,4'-dimethoxybipy)(H2O)]SO4 (31) as a mild transfer hydrogenation
catalyst and formate ion as the stoichiometric hydride source, Fig. 10.3-11(a).
Presumably, this reaction occurs via the reversible formation of imine 33
with free amino groups on the protein surface, followed by reduction by
iridium hydride 32. For most proteins, multiple modifications are observed
(Fig. 10.3-11(b)), although the overall level of conversion can be altered
through variation of either the reaction temperature or the concentrations of
the aldehyde and catalyst. In general, the reaction has shown excellent
reliability for protein alkylation between pH 5 and 7.4.
Compared to lysine acylation with NHS esters, reductive alkylation strategies
offer several key advantages. First, the overall charge state of the protein
remains unchanged after the modification takes place, thus minimizing
changes in protein solubility and stability. This method also avoids competitive
hydrolysis pathways that can be problematic in some activated esters. Similarly,
the aldehyde feedstock materials that are used in this technique are frequently
more convenient to prepare and store than the corresponding NHS esters. As
an example of the latter case, a simple two-step oxidation/reductive alkylation
protocol can be used to attach unfunctionalized poly(ethylene glycol) (PEG) to
proteins. Conversion of commercially available PEG alcohol to the corresponding
aldehyde 34 is accomplished through oxidation with Dess-Martin periodinane
(DMP) in CH2Cl2, Fig. 10.3-11(c). After isolation of polymer 34 by precipitation
from ethyl ether, it can be coupled to proteins using the transfer hydrogenation
reaction. In addition to providing access to PEGylated proteins for biomedical
applications, the simplicity of this technique allows the facile attachment of
virtually any polymer bearing primary hydroxyl groups.
Fig. 10.3-11 Reductive alkylation of proteins using iridium catalyzed transfer
hydrogenation. (a) The iridium(III) catalyst shown reacts with formate ion to
form a water-stable hydride. This species reduces imines formed in situ.
(b) This reduction process proceeds readily on proteins, affording multiple
alkylated products. (c) Commercially available PEG alcohols can be readily
oxidized to aldehydes using the Dess-Martin periodinane (DMP). This product
can then be conjugated to proteins using the transfer hydrogenation process, as
observed by SDS-PAGE analysis. The arrows indicate the PEG conjugates. No
reaction occurs in the absence of catalyst. Conditions shown in (b): 10 µM
protein, 1 mM aldehyde, 20 µM catalyst, 25 mM HCO2Na, 50 mM K2HPO4 buffer,
pH 7.4, 22-37 °C, 2-18 h; ESI-MS peaks labeled 14309, 14428 (+1), 14547 (+2),
14665 (+3), and 14781 (+4). Conditions shown in (c): oxidation with DMP in
CH2Cl2 for 1 h followed by PEG precipitation (MW = 2000, 37% conversion);
conjugation with 100 µM lysozyme, 20 µM catalyst, 1 mM aldehyde 34, 25 mM
HCO2Na, 50 mM K2HPO4 buffer, pH 7.4, 37 °C, 15 h.

10.3.3.6 Site-selective Modification of the N-terminus

The N-termini of proteins offer several reactive options for the installation
of a single new functional group. Compared to lysine side chains, the lower
pKa of N-terminal amino groups (6-8) [42] can in principle be used to acylate
this position selectively, although absolute specificity is seldom achieved in
practice. More reliable strategies instead target the amino group in combination
with β-functional groups that are absent in the case of competing lysine
residues. This approach has been particularly successful in the context of NCL
strategies with thioesters (Fig. 10.3-12(a)) [29], a technique that is discussed in
depth elsewhere in this book (see also Fig. 10.3-14). In addition, N-terminal
cysteines can be modified with aldehydes through thiazolidine formation
(Fig. 10.3-12(b)) [43], although the amide linkage formed in NCL reactions is
more resistant to hydrolysis. Similar linkages have been reported using the
Pictet-Spengler reaction (Fig. 10.3-12(c)) [44], which proceeds via electrophilic
aromatic substitution reactions between indoles and imines formed with the
N-terminus. An extensive review of these techniques has recently appeared in
Ref. 43.
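
The pKa argument made at the start of this section can be put on a rough
quantitative footing with the Henderson-Hasselbalch relationship. The sketch
below estimates the fraction of an amino group present as the nucleophilic free
base at a given pH. The N-terminal pKa values span the 6-8 range quoted above;
the lysine side-chain pKa of 10.5 is an assumed textbook value that does not
appear in this chapter, and the function name is purely illustrative.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Fraction of an amine present as the free base (R-NH2) at a given pH.

    Derived from Henderson-Hasselbalch: [R-NH2]/[R-NH3+] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumed typical value for the lysine epsilon-amino group (not from the text).
LYSINE_PKA = 10.5

if __name__ == "__main__":
    ph = 7.0
    lysine = fraction_unprotonated(LYSINE_PKA, ph)
    for n_term_pka in (6.0, 7.0, 8.0):  # range quoted in the text
        n_term = fraction_unprotonated(n_term_pka, ph)
        print(f"pKa {n_term_pka}: free-base fraction {n_term:.3f}, "
              f"{n_term / lysine:.0f}x that of lysine")
```

Even at the upper end of the range the N-terminus is far more extensively
deprotonated than lysine at neutral pH, but a typical protein carries many
lysines and only one N-terminus, which is consistent with the observation that
absolute specificity is seldom achieved.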
Fig. 10.3-12 Common strategies for modification of the N-terminus.

A critical consideration for N-terminal modification strategies is the ease
with which the identity of the first amino acid can be established. Although all
proteins begin with methionine as the first amino acid due to the commonality
of the AUG start codon, this group is nearly always removed after translation
in eukaryotes. The situation is more complicated in prokaryotes, however, as
the methionyl aminopeptidases are sensitive to the size of the second amino
acid residue [45]. Virtually 100% of the proteins expressed in Escherichia coli
lack the N-terminal methionine if the second residue is small (such as glycine,
alanine, serine, or cysteine). However, in cases where leucine, tryptophan, or
tyrosine is present, the methionine is usually retained (note that the initial
N-formyl group is always cleaved posttranslationally). As a result, 40% of the
proteins in the E. coli genome retain a methionine at the N-terminus, thus
requiring further processing of the protein before some modification reactions
can take place.
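
The processing rule described above lends itself to a simple lookup when
planning an N-terminal modification of an E. coli-expressed protein. The
following is a minimal sketch that encodes only the residues named in the
text; any other second residue is reported as "unknown" rather than guessed.
The function name and return strings are illustrative assumptions.

```python
# Second-residue classes taken directly from the text (one-letter codes).
REMOVED_WHEN_SECOND = set("GASC")   # glycine, alanine, serine, cysteine
RETAINED_WHEN_SECOND = set("LWY")   # leucine, tryptophan, tyrosine


def initiator_met_fate(sequence: str) -> str:
    """Predict whether the initiator Met is 'removed', 'retained', or 'unknown'."""
    seq = sequence.strip().upper()
    if len(seq) < 2 or seq[0] != "M":
        raise ValueError("expected at least two residues, starting with M")
    second = seq[1]
    if second in REMOVED_WHEN_SECOND:
        return "removed"
    if second in RETAINED_WHEN_SECOND:
        return "retained"
    # Residues not discussed in the text are deliberately left unclassified.
    return "unknown"


if __name__ == "__main__":
    for seq in ("MSKGEELF", "MLKVAAGT", "MKTAYIAK"):
        print(seq, "->", initiator_met_fate(seq))
```

A sequence that returns "removed" with serine in the second position is a
candidate for the periodate oxidation strategy discussed next.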
For the modification of proteins expressed in prokaryotes, strategies targeting
N-terminal serine residues offer the advantage that the initial methionine is
always removed when this residue is present. The resulting β-amino alcohol
can then be oxidized in the presence of periodate to afford a glyoxamide
group [46], which can serve as a handle for additional bioconjugation through
oxime or hydrazone formation (Fig. 10.3-12(d)). This reaction is reported to
occur under mild conditions (40 µM NaIO4, pH 7, 0 °C) and with high yield.
As an alternative to these techniques, reactive functionality can also be
introduced at the N-terminus using a biomimetic strategy [47, 48]. In the
presence of pyridoxal phosphate (PLP, 35), imines are formed reversibly
with lysine side chains and the N-terminus; in the latter case, the relatively
low pKa of the α-proton enables a tautomerization reaction, affording imine
36, Fig. 10.3-13. This species hydrolyzes in the presence of water, resulting
in an overall transamination sequence that generates a reactive pyruvamide
or glyoxamide group 37 for further elaboration. In the case of proteins
bearing N-terminal aspartic acid residues, this reaction is accompanied by
a decarboxylation step. Because it can be used with many amino acids, this
technique provides a general method for the site-selective modification of
virtually any protein under mild reaction conditions.

Fig. 10.3-13 A biomimetic strategy for N-terminal modification. After
condensation with pyridoxal phosphate (PLP), a variety of N-terminal amino
acids undergo oxidative transamination at pH 6.5 and at 22-37 °C. The
resulting pyruvamides can be further derivatized through oxime formation.

10.3.3.7 Selective Modification of the C-terminus

In contrast to the relatively large number of reactions that target the N-
terminus, only one generally effective method for C-terminal modification
is currently available. This can be achieved through the use of intein-based
methods to produce C-terminal thioesters, which can then be modified through
NCLs with functionalized cysteines 38, Fig. 10.3-14 [49]. A more thorough
description of the scope of this convenient method appears elsewhere in
this book.
Fig. 10.3-14 Modification of the C-terminus using native chemical ligation.

10.3.3.8 Binding of Tetracysteine Motifs to Fluorescein Bis(arsenical) (FlAsH) Dyes

The labeling of a single biomolecule in a complex protein mixture presents
a particularly difficult challenge, as no bioconjugation reaction targeting a
single natural amino acid can be expected to display the required selectivity.
As a solution to this problem, a labeling technique based on the recognition
of a specific sequence of amino acids has been reported. It was recognized
that the ethanedithiol groups of fluorescein bis(arsenical) dye 39 (FlAsH)
can be displaced by tetracysteine motifs expressed on a protein of interest,
Fig. 10.3-15 [2]. Conformational changes that occur on binding reduce the
fluorescence-quenching effect of the arsenic atoms, resulting in a substantial
(up to 60-fold) enhancement in the quantum yield of the chromophore. The
unbound dye remains relatively nonfluorescent, thereby reducing the need for
scrupulous removal of the excess reagent. Although many (Cys)4 sequences
can be recognized, CCPGCC has been particularly effective. Since the initial
publication, additional chromophores with varied optical characteristics have
become available [50]. Although similar labeling selectivity can be achieved
on the translational level using green fluorescent protein (GFP) fusion
techniques [51], a particular strength of the FlAsH approach is the reliance on
a small molecule modification that is less likely to affect protein trafficking,
binding, and catalytic function. A more detailed description covering the
applications of this powerful technique in cellular imaging appears elsewhere
in this book.

Fig. 10.3-15 Sequence-specific protein labeling with FlAsH dyes. Tetracysteine
motifs on expressed proteins replace the ethanedithiol groups on biarsenical
dye 39, resulting in a substantial enhancement in fluorescence.
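
Because FlAsH labeling depends on a short sequence motif, it can be useful to
check a construct for intended and accidental binding sites before expression.
The sketch below scans a one-letter protein sequence for tetracysteine motifs.
CCPGCC is the motif named in the text; the looser CCxxCC pattern (any two
residues between the cysteine pairs) is an assumption adopted here only for
illustration, as are the function and variable names.

```python
import re

# Motif named in the text.
PREFERRED = re.compile(r"CCPGCC")
# Assumed general form; the lookahead lets overlapping candidates be reported.
GENERAL = re.compile(r"(?=(CC[A-Z]{2}CC))")


def find_tetracysteine_motifs(sequence: str) -> list[tuple[int, str, bool]]:
    """Return (1-based start, motif, is_preferred) for each CCxxCC match."""
    seq = sequence.upper()
    hits = []
    for match in GENERAL.finditer(seq):
        motif = match.group(1)
        is_preferred = PREFERRED.fullmatch(motif) is not None
        hits.append((match.start() + 1, motif, is_preferred))
    return hits


if __name__ == "__main__":
    # Hypothetical tagged construct, used only to exercise the function.
    print(find_tetracysteine_motifs("MAGSCCPGCCKLAA"))
```

A construct would ideally contain exactly one hit, flagged as the preferred
motif, at the position where the tag was introduced.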
10.3.4
New Methods for the Biosynthetic Incorporation of Unnatural Functional Groups
A number of versatile methods have recently become available for the
incorporation of unnatural functional groups into biomolecules, allowing
the development of previously impossible bioconjugation reactions that target
these sites. The advantage of such techniques lies in their selectivity, as abiotic
groups can be targeted with reagents that show no reactivity with ordinary
biomolecules. As such, these reactions are exceptionally useful for the labeling
of a single target in a crude cell lysate or on the surface of living cells.
Although more detailed descriptions of these techniques appear elsewhere in
this book, each is briefly summarized here because of the striking impact that
these methods are destined to have on protein modification. Corresponding
bioconjugation reactions designed to target unnatural functional groups are
detailed in Section 10.3.5.
Recently, efforts by two groups have made it possible to incorporate unnatural
amino acids into proteins on the translational level. The first of these makes
use of the “Amber” codon, which lacks a cognate tRNA and therefore normally
halts protein biosynthesis. Synthetic tRNAs generated to recognize this codon
are charged with the new amino acid of interest and added to in vitro protein
expression systems [52, 53]. This effectively reprograms the ribosome to install
a 21st amino acid wherever the Amber codon occurs. Although this method
displays absolute site selectivity and is remarkably general with respect to the
amino acids that can be introduced, the difficulty in preparing and purifying
the synthetic tRNAs restricts the quantities of protein that can be obtained.
More recently, this challenge has been addressed through the evolution of
modified tRNA synthetases that attach the unnatural amino acid to the tRNA
molecules directly [54-57]. This allows bacteria or yeast to generate their own
tRNA molecules, thus expanding the approach to large-scale applications. In
one instance, bacterial hosts capable of biosynthesizing even the new amino
acid have also been developed [58]. As this method continues to improve with
respect to practicality and accessibility, it is certain to provide many new
avenues for selective protein modification.
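
Since the unnatural amino acid is installed wherever the Amber codon occurs,
it is worth confirming that a designed open reading frame contains the codon
only at the intended position. The following is a minimal sketch that lists
in-frame TAG codons in a coding sequence; the function name and the example
sequences are hypothetical, and the final stop codon is assumed to be something
other than TAG.

```python
def amber_sites(orf: str) -> list[int]:
    """Return 1-based codon numbers of in-frame TAG (amber) codons.

    The sequence is read from its first base in steps of three, so a TAG
    that straddles two codons is ignored.
    """
    dna = orf.strip().upper().replace("U", "T")  # accept RNA or DNA input
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [i // 3 + 1 for i in range(0, len(dna), 3) if dna[i:i + 3] == "TAG"]


if __name__ == "__main__":
    # ATG GCT TAG AAA TAA: one amber codon at position 3.
    print(amber_sites("ATGGCTTAGAAATAA"))
    # ATG CTA GCA TAA: 'TAG' appears only out of frame.
    print(amber_sites("ATGCTAGCATAA"))
```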
An alternative strategy takes advantage of the promiscuity with which
some naturally occurring tRNA synthetases attach amino acids to tRNA
carriers [59]. In this technique, auxotrophic hosts that cannot produce a
targeted amino acid are used. When the amino acid is removed from the
culture medium, protein biosynthesis comes to a halt. A replacement amino
acid is then added, and the expression of a desired protein is induced. The
new amino acid is recognized by the synthetase and is incorporated into
the expressed protein at each site where the original amino acid would
have appeared. This residue-specific method has been demonstrated for both
methionine [60] and phenylalanine [61] analogs to date. The technique is rapid
and can incorporate new functionality without using directed evolution or
heterologous tRNA/synthetase expression, although the site selectivity is not
absolute.
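
The practical consequence of residue-specific incorporation is that every
occurrence of the replaced amino acid becomes a potential modification site.
As a rough illustration, the sketch below lists the positions that would carry
the analog for a given protein sequence; the function name and the example
sequence are hypothetical.

```python
def replaced_positions(sequence: str, residue: str = "M") -> list[int]:
    """Return 1-based positions at which an analog would replace `residue`."""
    target = residue.upper()
    return [i + 1 for i, aa in enumerate(sequence.upper()) if aa == target]


if __name__ == "__main__":
    # A methionine analog would appear at positions 1, 4, and 6 here.
    print(replaced_positions("MKTMAM"))
```

A protein with several methionines will therefore display several handles,
which is one way to read the remark that the site selectivity is not absolute.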
The incorporation of new functional groups can also be accomplished
using the metabolic machinery for posttranslational protein modifications.
These methods rely on the ability of some modification enzymes to process
and install analogs of their natural substrates containing reactive handles
of interest. In an early demonstration of this technique, it was shown that
derivatives of N-acetylmannosamine 40a bearing ketones (40b) [62] or azides
(40c) [63] in the acyl moiety are tolerated by enzymatic pathways that produce
sialic acid. By “feeding” these unnatural building blocks to cell cultures,
the new functional groups are incorporated into the secreted and cell-surface
glycoproteins of mammalian cells, Fig. 10.3-16. This technique has also proven
successful for N-acetylglucosamine derivatives [64]. The new sites can then
be targeted for chemoselective modification using secondary bioconjugation
methods described below, enabling one to modulate the surface interactions
of living cells [65, 66], achieve surface attachment [67], and identify specific
glycoprotein subtypes at the proteome level [68].

Fig. 10.3-16 Introduction of unnatural functional groups through
posttranslational modification. (a) Ketones and azides can be introduced onto
cell surfaces by “feeding” cells with unnatural sialic acid precursors, such as
mannosamine derivatives 40b and c. These are incorporated into cell-surface
glycans, which can be further elaborated using additional bioconjugation
reactions. (b) Specific amino acid sequences can be modified using biotin
ligase. Interestingly, “ketobiotin” is also recognized as a substrate for the
enzyme, allowing a uniquely electrophilic handle to be introduced on a single
lysine residue. In this example, fluorescent hydrazide 42 is condensed with
this group to form a hydrazone. (c) C-terminal modification through protein
farnesylation. Azide-containing farnesyl derivatives 43a and b are recognized
by farnesyltransferases and added to “CaaX” sequences. (d) Biotinylated
phosphopantetheinyl derivative 44 can be added to fusion proteins bearing
peptide carrier proteins (PCPs) derived from nonribosomal peptide synthetases.

As many posttranslational modification enzymes display exquisite specificity
for a particular amino acid sequence, they are uniquely effective tools for
labeling one particular protein present in a complex mixture. A recent report
has capitalized on this feature by using biotin ligase to install biotin groups
41a on a single lysine residue embedded in a specific 15 amino acid sequence,
Fig. 10.3-16(b) [69]. After amide bond formation, these proteins can be further
modified using commercially available avidin derivatives, including those
bearing fluorescent semiconducting nanocrystals [70]. As the recognition
sequence appears with little or no frequency in the proteome of most cells,
virtually absolute selectivity can be obtained.
Interestingly, biotin ligase has been shown to possess some substrate
tolerance, allowing even “ketobiotin” analog 41b to be added to the
proteins [69]. This results in the installation of a uniquely electrophilic
ketone (see below) on the protein of interest, which can then be labeled
chemoselectively with fluorescent chromophore 42 through hydrazone
formation. Although this method is currently limited to the labeling of
cell surfaces because of the presence of competing ketone metabolites in the
cytoplasm, it provides a powerful tool for chemospecific labeling using the
natural set of amino acids.
A method for the incorporation of artificial functional groups into
lipoproteins has also been developed [71]. In this technique, isoprenoid
biosynthesis is first halted through the addition of lovastatin to a cell culture.
After this, exogenous azidofarnesol derivatives 43a and b are added, leading to
the attachment of these groups to “CaaX” boxes by farnesyl transferases (where
a is any aliphatic amino acid and X is a C-terminal amino acid). The azide groups
can then be biotinylated using the Staudinger ligation (see Section 10.3.5.2),
allowing detection of the modified proteins using Western blot analysis.
Importantly, proteins targeted for geranylgeranylation are not modified using
the azidofarnesyl analogs. In conjunction with mass spectrometric sequencing
techniques, this method can be used for the proteomic analysis of farnesylated
proteins.
Specific labeling can also be accomplished through the targeting of specific
protein domains. In one report, an 80 amino acid peptide carrier protein (PCP)
domain derived from a nonribosomal peptide synthetase was fused to a protein
of interest [72]. After expression and lysis of the cells, biotinylated derivative
44 was installed on this domain by an added phosphopantetheinyl transferase,
Fig. 10.3-16(d). After the labeling step, the protein was attached to avidin-coated
glass for further screening applications. The use of an exogenous transferase
for the labeling step is advantageous, as the native transferases present in the
E. coli lysate do not recognize the protein domain or substrate 44.
10.3.5
New Bioconjugation Methods Targeting Unnatural Functional Groups
Once artificial functional groups are incorporated into biomolecules, new
reactive strategies can be developed to target these sites chemoselectively.
As virtually any reactive group can be chosen in principle, these reactions
allow the full range of organic chemical transformations to be considered.
Several functional group pairs that exhibit orthogonal chemical reactivity in a
biological setting have already been identified.
10.3.5.1 Ketone Functionalization through Hydrazone and Oxime Formation

Although proteins possess numerous nucleophilic groups that can be used for
covalent modification, the natural amino acids do not provide any electrophilic
sites. Because of this, a useful strategy for the introduction of chemically
orthogonal reactive groups is to add electrophiles that react selectively with
exogenous nucleophiles. Perhaps the first functional group that was used
for this purpose was the ketone, using the highly selective condensation of
this group with hydrazine and alkoxyamine derivatives to form hydrazones
and oximes, respectively [62, 731. Both these reactions proceed readily in
aqueous solution, typically using 25-1000 µM concentrations of
ketone-reactive reagents. The reactions occur with a maximum rate at pH 6.5, and
the dehydration of the aminocarbinol intermediate has been determined to be
the rate-limiting step [74]. In most cases, complete selectivity is observed, and
high conversions can be obtained through the use of excess reagent. These
condensation reactions can be found in many of the examples provided above
(Figs. 10.3-4(g), 10.3-12(d), 10.3-13, and 10.3-16(b)). This is one of the most
reliable strategies for protein modification under mild reaction conditions.
Ketone groups are often introduced using the primary bioconjugation
reactions described above, and are generated directly by biotin ligase [69],
N-terminal serine oxidation with periodate [46], and N-terminal transamination
with PLP [47]. In addition, this group has been introduced through the
incorporation of 4-acetophenylalanine using the nonsense codon suppression
technique [75]. As an alternative, ketone functional groups can also be
incorporated into glycoproteins using metabolic engineering [62], as described
above.
10.3.5.2 Azide Modification Using the Staudinger Ligation

Although ketones show predictable and chemospecific reactivity on the exterior
of cells, they are of limited use in labeling studies that must take place inside
living cells or in crude cell extracts because of the presence of endogenous
ketone metabolites. To provide labeling reactions that can function under
these circumstances, several pairs of reagents that do not react with any
biological functional groups have been developed. Of these, the azide has
proven particularly useful, as it has a high thermodynamic driving force for
several reactions, and yet it is kinetically inert under physiological conditions.
The first bioconjugation reaction to capitalize on these properties was the
Staudinger ligation [63]. In this method, azides on biomolecules react with
triarylphosphines (such as 45) to form iminophosphorane 46 with concomitant
loss of nitrogen gas, Fig. 10.3-17(a). Normally, this species would be hydrolyzed
to yield the amine and the phosphine oxide; however, it was shown that this
intermediate could be trapped by a pendant ester group displayed on the
aromatic ring. This ultimately results in the formation of an amide bond
that links the phosphine group to the biomolecular target of interest. The
mechanism of the reaction was examined in detail, including the isolation
and X-ray characterization of intermediate 47b, when the reaction was carried
out in anhydrous solvent [76]. These studies have determined that the reaction
rate is accelerated both in polar solvents (such as water) and when
electron-rich aryl rings are attached to the phosphorus atom (although this also
leads to more rapid aerobic oxidation). For aliphatic azides, the rate-determining
step is the formation of iminophosphorane 46, but for aromatic azides the
intramolecular attack on the ester group is rate limiting. The size of the
ester substituent also influences the efficiency of the reaction, with bulky
alkyl groups favoring competing hydrolysis pathways. It should be noted that
“traceless” versions of this reaction have also been developed [77, 78], in which
the phosphine oxide moiety is excised during the peptide bond formation step.
This alternative method has proven especially useful for protein synthesis
via segment condensation reactions [79]. A review of both Staudinger ligation
types has recently appeared in Ref. 80.

Fig. 10.3-17 The Staudinger ligation. (a) Triarylphosphines and azides react
to form iminophosphorane intermediate 46, which is trapped by the pendant
ester group. Intermediate 47b has been characterized by X-ray crystallography
under anhydrous conditions. (b) Modification of cell surfaces using the
Staudinger ligation. Treatment of mammalian cells with mannose derivative 40c
results in the incorporation of azides into sialic acid residues through
metabolic engineering. These groups can then be labeled using biotinylated
phosphine 49. (c) For direct protein modification, azidohomoalanine 50 can be
incorporated into proteins biosynthesized in methionine auxotrophs. In this
example, the azides were labeled with a phosphine conjugated to a FLAG peptide
epitope 51.

The use of this reaction in the biological context was first demonstrated for
the chemospecific labeling of Jurkat cell surfaces [63]. Metabolic engineering
with N-acetylmannosamine derivative 40c was used to incorporate azides
into sialic acid groups on cell surfaces. The cells were then incubated with
biotinylated phosphine 49, and the extent of the reaction was quantified by
flow cytometry after treatment with fluorescent avidin. Importantly, neither
the azide nor the phosphine displayed any reactivity with the cell-surface
groups in the absence of its reactive partner. In addition, the cells showed
unchanged growth rates after modification.
Since the original disclosure, the Staudinger ligation has evolved into a
powerful tool for the study of glycosylation pathways. The reactive specificity
for the azide/phosphine pair allows virtually any substrate bearing an azido
sugar to be derivatized and quantified in a Western blot or well-plate assay.
As examples, azide analogs have been used to identify protein targets
for N-acetylglucosamine modification in crude lysates [81] and to identify
glycosidases using azidosugars further substituted with fluorine atoms to
prevent enzyme turnover [82]. It has also been used to develop a parallel-plate
“azido-ELISA” assay for the identification of specific peptide sequences that
are targeted for mucin-type O-glycosylation [68, 83]. More recently, the reaction
has even been used to modify cell-surface glycoproteins in living animals [84].
The Staudinger ligation has also been used to modify proteins into
which azides have been incorporated directly [85]. In this case, methionine
auxotrophic hosts were used to introduce azidohomoalanine 50 into multiple
sites of murine dihydrofolate reductase. These groups were then modified
using a phosphine bearing a FLAG peptide epitope 51 and detected using
a Western blot assay. The specificity of the labeling reaction was again
demonstrated by labeling proteins in crude cell lysates.
Second generation phosphine reagents have been developed for the
fluorescent detection of azide groups [86]. This system employs coumarin-
substituted phosphine 52, which is nonfluorescent due to excited state
quenching by the lone pair on the phosphorus atom. On oxidation of the
phosphine in the Staudinger ligation this quenching process is relieved,
resulting in a dramatic enhancement in the quantum yield for the dye,
Fig. 10.3-18. The use of this activatable fluorescence system provides significant
advantages over the traditional Western blot analyses because it can detect
azide-labeled proteins without the need for extensive washing steps and
antibody-based detection schemes.
Fig. 10.3-18 Generation of fluorescent Staudinger ligation products. Excited
state quenching by the phosphorus lone pair is lost on ligation to azides,
resulting in a dramatic enhancement in the fluorescence quantum yield
(52: nonfluorescent, Φ = 0.011; 53: fluorescent, Φ = 0.65).
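
The size of the fluorescence turn-on can be estimated directly from the two
quantum yields given in Fig. 10.3-18 (0.011 for phosphine 52 and 0.65 for
ligation product 53). The short calculation below is only a sketch: it assumes
that both species absorb equally at the excitation wavelength, so that the
brightness ratio equals the ratio of quantum yields, and the names used are
illustrative.

```python
# Quantum yields as given in Fig. 10.3-18.
PHI_PHOSPHINE_52 = 0.011
PHI_PRODUCT_53 = 0.65


def fold_enhancement(phi_before: float, phi_after: float) -> float:
    """Ratio of quantum yields after and before ligation."""
    if phi_before <= 0:
        raise ValueError("phi_before must be positive")
    return phi_after / phi_before


if __name__ == "__main__":
    ratio = fold_enhancement(PHI_PHOSPHINE_52, PHI_PRODUCT_53)
    print(f"approximately {ratio:.0f}-fold enhancement")
```

Under the stated assumption the enhancement is roughly 59-fold, which is what
makes detection possible without washing away unreacted phosphine.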
10.3.5.3 [3 + 2] Dipolar Cycloadditions of Azides and Alkynes
In 2001, Kolb, Finn, and Sharpless published an article enumerating the
stereospecific chemical reactions that can join reactive components with high
yields and little by-product formation [87]. An interesting feature shared by
these transformations, termed Click reactions, is a great deal of exothermicity
through the use of “spring-loaded” reactive components. This report also focused on
the use of reactions that are air and water tolerant and can be used in the
absence of protecting groups. Thus, many of the reactions on the “Click”
list (e.g. hydrazone formation, oxime formation, and epoxide opening) would
be natural considerations for biomolecule modification, and, in fact, have
been used.
One reaction that proceeds particularly well in aqueous solution is the
Huisgen [3 + 2] cycloaddition of azides and alkynes [88]. Although the
individual components of the reaction are unreactive under most conditions,
they can be joined under thermal conditions (often by heating them to
80°C in the absence of solvent) to form triazole products. In the thermal
reaction, equimolar mixtures of syn- and anti-triazoles are obtained when
terminal alkynes are used. As an early demonstration of the specificity of these
components in this reaction, highly potent enzyme inhibitors were synthesized
in the active site of acetylcholine esterase using a library of azide and alkyne
components [89]. Although no reaction occurred between these compounds
in the absence of enzyme, the proximity of the reactive groups in the active
site promoted the [3 + 2] cycloaddition at room temperature, affording hybrid
compounds with femtomolar binding constants.
The chemospecificity of the reaction suggested that it could be carried out
using azides or alkynes attached to proteins if the reaction temperature could be
lowered. This breakthrough was achieved by two groups who simultaneously
reported that the reaction could be dramatically accelerated in the presence
of Cu(1) salts [90, 911. This allowed the reaction to take place in aqueous
solution with temperatures from 4 “C to RT. In the copper-catalyzed version
of the reaction, terminal alkynes show high specificity for the antiproduct.
Because of the handling difficulties associated with Cu(I) salts, CuSO4 is
typically reduced in situ using a reducing agent, such as ascorbic acid,
tris(carboxyethyl)phosphine (TCEP), or small portions of copper wire. It has
also been reported that the tris(pyrazolylmethyl)amine ligand 58 dramatically
accelerates the reaction, presumably by stabilizing the Cu(I) oxidation
state [92]. As a result, the copper-catalyzed [3 + 2] cycloaddition reaction
has matured into a powerful method for the construction of glycosylation
inhibitors [93], dendrimers [94], and many other targets. In a recent study,
a binuclear copper cluster has been implicated as the active species in the
reaction mechanism [95].
Several reports have used this reaction as a protein bioconjugation
technique. In the first report, 60 azide functional groups attached to the
surface of the cowpea mosaic virus (CPMV) using NHS ester or iodoacetamide
reactions served as attachment sites for fluorescent alkyne derivatives [92].
The optimal conditions for the reaction were 21 µM protein (based on
azide-functionalized capsid monomers), 1 mM CuSO4, 2 mM TCEP, and 2 mM 58
in pH 8 buffer at 4 °C for 16 h. As an alternative, small amounts of copper
wire were also effective in generating and maintaining the Cu(I) species. The
reaction was also successful when the alkynes were attached to the viral capsid
and azides were attached to the dye.
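As a rough illustration of the stoichiometry implied by these conditions, the
following Python sketch converts the quoted concentrations into molar
equivalents relative to the azide-functionalized capsid monomer. The
concentrations are taken from the text, reading the protein concentration as
21 µM; the function names and the 100 µL reaction volume are illustrative
assumptions and do not come from the original report.

```python
def molar_equivalents(reagents_uM, reference_uM):
    """Divide each reagent concentration (µM) by the reference concentration (µM)."""
    if reference_uM <= 0:
        raise ValueError("reference concentration must be positive")
    return {name: conc / reference_uM for name, conc in reagents_uM.items()}


def nanomoles(conc_uM, volume_uL):
    """Amount in nmol for a concentration in µM and a volume in µL."""
    # µM x µL = 1e-6 mol/L x 1e-6 L = 1e-12 mol = 1e-3 nmol
    return conc_uM * volume_uL * 1e-3


# Concentrations quoted in the text, expressed in µM.
conditions_uM = {"CuSO4": 1000.0, "TCEP": 2000.0, "ligand 58": 2000.0}
protein_uM = 21.0  # azide-functionalized capsid monomer

equiv = molar_equivalents(conditions_uM, protein_uM)
for name, value in equiv.items():
    print(f"{name}: about {value:.0f} equiv per capsid monomer")

# Hypothetical 100 µL reaction: amount of capsid monomer present.
print(f"capsid monomer: {nanomoles(protein_uM, 100.0):.1f} nmol")
```

Under these assumptions the copper source is present in roughly 50-fold excess
over the azide sites, with the reductant and ligand at about twice that level.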
Further improvements in the reaction rate have resulted from ligand
optimization studies using a fluorescence-quenching assay [96]. In particular,
sulfonated bathophenanthroline ligand 59 was identified as an improved
ligand for the reaction, leading to substantial rate enhancements. It has been
suggested that the reaction acceleration is largely due to the improved solubility
of the charged ligand, compared to 58. With this ligand system, and the use
of Cu(MeCN)4OTf, efficient protein modification has been reported using as
little as 2-2.5 equivalents of the coupling partner for each protein monomer.
One drawback of this system is the requirement for the rigorous exclusion
of oxygen. The exceptional functional group tolerance of the reaction was
demonstrated through the coupling of PEG polymers, peptides, and even
an intact protein (transferrin) to the surface of azide-functionalized CPMV
capsids [97].
The [3 + 2] cycloaddition reaction has also been used as a method to detect
probes attached to protein reactive sites [98]. Termed activity-based protein
profiling, this approach first involves the alkylation of active site thiols in
proteases using azide-bearing phenylsulfonates. Because of the enhanced
nucleophilicity of these residues in the active site, only these locations are
modified. The protein conjugates are then coupled to alkyne-functionalized
biotin or rhodamine compounds for subsequent identification. Interestingly,
the reaction takes place in crude homogenates, further underscoring the high
functional group tolerance of the [3 + 2] cycloaddition chemistry. The advantage
of this approach is that it avoids the use of bulky chromophores or biotin
affinity tags that could influence the selectivity of the key protein-labeling step.
Although the previously described studies have relied on primary bioconjugation
reactions to introduce the azide and alkyne functional groups, several
reports have used the [3 + 2] cycloaddition reaction for the modification of
artificial amino acids incorporated at the translational level. In the first
case, propargyloxy- and azide-functionalized phenylalanine derivatives were
effectively added to the genetic code of Saccharomyces cerevisiae using orthogonal
tRNA/synthetase pairs, Fig. 10.3-19(b) [99]. This approach was used to
prepare human superoxide dismutase (SOD) mutants in which a tryptophan
residue was replaced by either 60 or 61. After purification of the His6-tagged
protein using Ni-affinity chromatography, the [3 + 2] cycloaddition reaction
was carried out through exposure of the protein to CuSO4, copper wire, ligand
58, and the appropriate reactive partner. No reaction was observed in the case
of the wild-type protein. This study highlights the power of artificial amino
acid incorporation in the development of selective bioconjugation.

Fig. 10.3-19 Modification of proteins using “Click” chemistry. (a) Sixty azide
groups were introduced on the surface of the cowpea mosaic virus (CPMV) through
the alkylation of genetically introduced cysteine residues. These groups can be
modified through exposure to alkynes, Cu(I) (the Cu(II) source is reduced in
situ by the TCEP), and ligand 58 or 59. Similar results were obtained when
capsids bearing alkynes were exposed to azides. (b) Azide- and alkyne-containing
amino acids were incorporated into proteins using unnatural tRNA/synthetase
pairs obtained using selection techniques. These groups can be modified with
high chemoselectivity using the appropriate Click coupling partners.

In another example, azide functional groups were displayed on the surface
of E. coli methionine auxotrophs by adding azidohomoalanine 50 to the culture
medium before induction [100, 101]. The Cu-catalyzed [3 + 2] cycloaddition
reaction was then used to attach a biotinylated alkyne to expressed proteins on
the surface of the living cells. After secondary labeling with fluorescent avidin,
flow cytometry was used to confirm the success of the labeling reaction. No
conjugation was detected in the case of cells grown in methionine-containing
media.
A potential cause for concern in Click reactions is the requirement of
copper ions, as Cu(I) can bind to proteins and is toxic to cells. Although
the previous studies indicate that these problems are not insurmountable
(particularly if high affinity, water-soluble ligands are used), a metal-free
version of the Click reaction has been developed for applications in which
cells must remain viable after modification [102]. This reaction is driven by the
relief of ring strain for cyclooctyne, resulting in successful reaction with azides
at room temperature (Fig. 10.3-20). This reaction was tested in the context
of a bioconjugation reaction with recombinant glycoprotein GlyCAM-Ig, into
which azides were incorporated through metabolic engineering techniques
(see above). On treatment with biotin-cyclooctyne 62, the [3 + 2] cycloaddition
occurred in the absence of metal ions, as determined by biotin quantification
using Western blot analysis. Under identical conditions terminal alkynes
did not react, and control reactions performed on proteins lacking azide
groups showed no labeling. This reaction was also used successfully for the
labeling of Jurkat cells bearing azide-containing sialic acid derivatives on
their surface. Although this reaction appears to be somewhat slower than the
copper-catalyzed reaction, no losses in cell viability were observed in these
studies.

Fig. 10.3-20 Metal-free bioconjugation using a strain-promoted [3 + 2] dipolar
cycloaddition reaction. This reaction is accelerated by the relief of ring
strain in the transition state as the alkyne carbons become sp2 hybridized.
(Scheme labels: biotin-cyclooctyne 62, 250 µM; products 63a and 63b.)

10.3.5.4 Aniline Functionalization through Oxidative Coupling Reactions

The techniques described above demonstrate the value of identifying new
coupling partners that possess no cross-reactivity with native functionality. To
add to this group, new reactive pairs based on the chemoselective oxidative
coupling of aniline groups have recently been developed [103]. This reaction
is based on the observation that N-acyl phenylenediamine derivatives (such as
64) trimerize extremely rapidly under oxidative conditions, ultimately resulting
in the formation of highly stable dye molecules (e.g., 65), Fig. 10.3-21 [104].
It was found that the addition of alkyl groups to the free phenylene diamine
nitrogen atom blocks its participation as a nucleophile in this reaction, but
still allows species 66 to react rapidly with additional anilines to form adduct
69. Subsequent reoxidation of this intermediate, followed by the nucleophilic
addition of water and a final oxidation step, affords product 70. This “A + B”
analog of 65 has similar stability, showing no degradation at pH levels 1 to 11
and under both oxidative and reductive conditions. The reaction proceeds in
aqueous solution and is complete in less than 1 min, even at low concentrations
of 66 and 67. The reaction has also been carried out using anilines attached to
proteins, and shows virtually complete selectivity for the desired modification
pathway. No protein modification occurs using either the aniline or the
bis(iminoquinone) component alone. This chemoselectivity is presumed to
result from the intermediacy of radical pair 68, produced on
electron transfer between the aniline and the oxidized phenylene diamine
groups.

Fig. 10.3-21 Secondary bioconjugation using oxidative coupling reactions.
(a) N-acylphenylene diamine derivatives rapidly trimerize under oxidative
conditions to yield stable dyes, such as 65. (b) By adding substituents to the
amino group, a chemoselective two-component analog was developed. Following
formation of adduct 69, a series of oxidation and water addition steps occur to
afford stable product 70. Presumably this reaction proceeds via charge transfer
complex 68.

A similar coupling reaction has also been developed for aminotyrosines,
which can be conveniently prepared from azotyrosine 71 through reduction
with sodium dithionite, Fig. 10.3-22(a) [24]. In the presence of NaIO4,
(NH4)2Ce(NO3)6, or H2O2, these groups undergo exceptionally rapid coupling
reactions with phenylene diamine derivatives (again, typically reaching
complete conversion in under 1 min), presumably through an analogous
charge transfer complex 74 [103]. Subsequent oxidation of adduct 75 yields
stable product 76. The chemoselectivity of this technique for aminotyrosine-
containing proteins is shown in Fig. 10.3-22(b).
10.3.6
New Methods for Bioconjugate Purification
Many, if not most, bioconjugation reactions do not reach full conversion,
affording mixtures of modified and unmodified proteins that are difficult
to separate. While the presence of residual wild-type protein is tolerable in
some cases, many applications (such as FRET studies) would benefit from the
removal of unreacted proteins from the sample. Furthermore, the isolation
and concentration of specifically labeled proteins or peptide fragments can
assist their characterization by mass spectrometry. This is most commonly
achieved using affinity chromatography based on biotin/avidin interactions.
However, the harsh conditions required to release the protein substrate from
the solid support can lead to substantial losses in activity in many instances.
This method also requires the synthesis of bifunctional labeling agents that
possess both the group of interest and the biotin tag.
A simpler and more general alternative relies on the ability of β-cyclodextrin
immobilized on a Sephacryl support to form host/guest complexes with a wide
range of organic molecules, including chromophores [105]. On addition of
cyclodextrin-functionalized Sephacryl resin 77 to the protein-labeling reactions,
the modified protein is bound by the resin while unmodified protein is left in
the solution. Following isolation of the resin via filtration, the captured protein
can be released through the addition of a competitive cyclodextrin binder,
such as adamantane carboxylic acid 78 (Fig. 10.3-23). A particularly attractive
feature of this method is the mild conditions that can be used to elute the
protein from the resin.

Fig. 10.3-22 Secondary bioconjugation using aminotyrosines. (a) Following
reduction of azotyrosine 71 using dithionite, the aminotyrosine product 72
couples rapidly with N-acylphenylene diamine 73 under oxidative conditions.
Following the formation of adduct 75, an additional oxidation step occurs to
yield stable product 76. Similar to the aniline coupling strategy described
above, this reaction is believed to proceed through charge transfer complex 74.
(b) The chemoselectivity of the reaction was demonstrated by mixing proteins
containing (boxed) or lacking aminotyrosine and labeling them with 73.
Separation by SDS-PAGE and visualization of the fluorescent dye confirmed that
only the aminotyrosine-labeled proteins participated in the reaction.
Proteins: A - BSA, B - chymotrypsinogen A, C - RNAse A.

10.3.7
Future Development
Fig. 10.3-23 A general strategy for the purification of chromophore-labeled
proteins. (a) This approach takes advantage of host/guest interactions between
Sepharose-bound cyclodextrins and hydrophobic organic molecules. A sample of
compatible chromophores is shown at right. (b) The resin captures
chromophore-labeled proteins selectively, allowing facile removal of unmodified
protein via filtration. The captured proteins can be eluted from the resin
using a competitive cyclodextrin binder, such as adamantane carboxylic acid
(78). (c) Purification of Oregon Green labeled myoglobin. The removal of
residual unlabeled protein can be confirmed through UV-vis analysis, or (d-f)
by using ESI-MS.

As the field of bioconjugation continues to evolve at a rapid pace, there
are a number of challenges that are likely to be addressed. The first of
these involves improvements in the overall reliability of existing protein
modification reactions. A lesson from the field of natural product synthesis
is that even the most predictable chemical reactions can display unexpected
reactivity and selectivity when applied to complex molecular targets. Similar
behavior is often observed for protein modification, as each biomolecular
target presents multiple chemical environments of unmatched complexity.
The “personality” of each protein can be difficult to predict, owing to
variations in the solvent accessibility of targeted residues and the effects of
local environments on pKa values. Further complications arise on consideration of
the rapid conformational changes of the surface groups and the aggregation
of proteins and reagents in aqueous solution. As a result, the scope and utility
of each bioconjugation reaction can be evaluated only by applying it to many
targets over a period of time. Although some of the structure/reactivity data
can be generated through crystal structure analysis, it is likely that these
studies will be facilitated by the new understanding which NMR structure
determination, single molecule spectroscopy, and molecular dynamics can now
provide.
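The influence of local environment on pKa values noted in this section can be
made concrete with the Henderson-Hasselbalch relationship, which gives the
fraction of a residue present in its deprotonated (and typically more
nucleophilic) form at a given pH. The short Python sketch below is only an
illustration: the two pKa values are hypothetical examples of an unperturbed
and a perturbed amine, not measurements from any protein discussed here.

```python
def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


# Hypothetical pKa values for the same type of amine in two local environments.
environments = {
    "solvent-exposed amine (assumed pKa 10.5)": 10.5,
    "perturbed amine (assumed pKa 8.0)": 8.0,
}

reaction_pH = 7.4  # assumed buffer pH for the comparison
for label, pKa in environments.items():
    frac = fraction_deprotonated(reaction_pH, pKa)
    print(f"{label}: {100.0 * frac:.2f}% deprotonated at pH {reaction_pH}")
```

A shift of a few pKa units changes the reactive fraction by orders of
magnitude, which is one reason why nominally identical residues can differ so
markedly in reactivity.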
Given this structural diversity, the continued development of new reactions
is also crucial. Even in cases where a modification strategy is already
in place for a particular functional group, alternative reactions can allow
expansions in substrate scope, alterations in modification selectivity, synthetic
convenience, and perhaps even greater biocompatibility. Just as a well-
trained synthetic chemist must know a dozen methods for the oxidation
of an alcohol to an aldehyde, protein bioconjugation will be approached with
much more success if many techniques are available to address the situation
at hand.
In terms of the reactions themselves, there are several areas that are likely
to see improvement. First, there are still many native functional groups
for which reliable modification strategies are yet to be developed (including
disulfides, asparagines (potentially allowing an efficient synthesis of N-linked
glycoproteins), and methionines). Secondly, many bioconjugation reactions
currently in use do not reach full conversion in a reasonable period of time.
The availability of new transition metal-based methods is likely to address both
of these limitations as the design rules for effecting these transformations in
aqueous media are further elucidated. Currently, it is difficult or impossible
to distinguish between two instances of a native residue - a situation that
is also likely to change as more complex modification reagents and catalyst
ligands are applied. Clearly, artificial amino acid incorporation techniques will
also be used to address these challenges. As noted above, chromatography
techniques that can purify modified proteins are beginning to appear, and the
analytical techniques for bioconjugate characterization are steadily improving.
Advances in mass spectrometry (both in terms of capability and accessibility)
will certainly continue to play a key role in this regard, as the mass accuracy
of this analysis method can reveal information about a reaction outcome that
SDS-PAGE cannot resolve.
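To illustrate how mass accuracy translates into information about a reaction
outcome, the following Python sketch estimates the number of labels attached to
a protein from a deconvoluted mass and reports the mass error in parts per
million. All masses used here are hypothetical values chosen for the example,
and the function names are illustrative rather than part of any
instrument software.

```python
def ppm_error(observed, expected):
    """Signed mass error in parts per million."""
    return (observed - expected) / expected * 1e6


def label_count(observed_mass, unmodified_mass, label_mass):
    """Nearest whole number of labels that accounts for the mass increase."""
    return round((observed_mass - unmodified_mass) / label_mass)


# Hypothetical values in Da: unmodified protein, mass added per label, observed conjugate.
unmodified = 16951.5
label = 412.3
observed = 17776.2

n = label_count(observed, unmodified, label)
expected = unmodified + n * label
print(f"labels attached: {n}")
print(f"mass error: {ppm_error(observed, expected):.1f} ppm")
```

A gel would show only a small mobility shift for such a conjugate, whereas the
mass difference identifies the number of modifications directly.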
A frontier that is certainly gaining considerable attention is the ability to
modify proteins inside living cells. Given the success of FlAsH techniques
for in situ labeling, these strategies are certain to provide important tools
that increase our understanding of cellular function. The design of these
reactions is very challenging, in part due to the very low concentration of
individual protein targets in a cell and the exquisitely high specificity required
to avoid background labeling. Another significant challenge is presented by the
high concentration (~5 mM) of glutathione in the cytoplasm of mammalian
cells, as the free thiol group of this reagent foils most reactions involving
electrophilic reagents, radicals, and oxidants. Added to this is the requirement
for nontoxic reagents and the challenge of designing compounds that can
cross the cell membrane to reach the targets inside. Over time these criteria
will undoubtedly be met, perhaps by using drug-design principles from the
pharmaceutical industry.
In the light of all of these considerations, perhaps the most important
frontier is a conceptual one. As the access to structural information increases,
along with convenient computer programs that can be used to visualize and
analyze complex biomolecules, it is hoped that more chemists will see proteins
as the organic molecules that they are. The principles of chemical reactivity
and conformational analysis apply to these compounds just as they do to any
other natural product, and many groups have used exactly these same concepts
to develop the reactions described herein. The development of future methods
to meet the above challenges will be achieved most successfully by those who
have adopted this mindset.
10.3.8
Conclusion
Taken together, the new chemical tools described herein have dramatically
altered the landscape of chemical biology. Each of these techniques has
expanded the scope of bioconjugates that can be prepared, and thus the
creativity with which new experimental systems can be designed. Many of
the labeling reactions can achieve levels of selectivity that were previously
impossible to attain, even allowing single proteins to be targeted in the
complex biochemical settings of living cells. Equally important is the
continued development of the conceptual framework that is needed to create
future reactions. In addition to improving our understanding of enzyme
function and protein trafficking, these new techniques have enabled frontier
applications in proteomics, single molecule spectroscopy, and the preparation
of biomolecular materials, among others. As new strategies continue to
emerge, this field is certain to retain its crucially important role in chemical
biology.
Acknowledgments
I would especially like to thank the students with whom I have had the pleasure
of working during the past four years. They are an extremely talented and
creative group of scientists, and I cannot overemphasize my gratitude for their
enthusiasm, hard work, and intellectual input. Our efforts in the area of protein
modification have been generously funded by the Biomolecular Materials
Program at Lawrence Berkeley National Labs, the DOE Nanoscale Science and
Engineering Technology (NSET) program, the NIH (R01 GM072700-01), and
the Department of Chemistry at UC Berkeley.
References
1. A.F. Straight, A. Cheung, J. Limouze, I. Chen, N.J. Westwood, J.R. Sellers,
T.J. Mitchison, Dissecting temporal and spatial control of cytokinesis with
a myosin II inhibitor, Science 2003, 299, 1743-1747.
2. B.A. Griffin, S.R. Adams, R.Y. Tsien, Specific covalent labeling of
recombinant protein molecules inside live cells, Science 1998, 281, 269-272.
3. E. Babini, I. Bertini, M. Borsari, F. Capozzi, C. Luchinat, X.Y. Zhang,
G.L.C. Moura, I.V. Kurnikov, D.N. Beratan, A. Ponce, A.J. Di Bilio, J.R.
Winkler, H.B. Gray, Bond-mediated electron tunneling in ruthenium-modified
high-potential iron-sulfur protein, J. Am. Chem. Soc. 2000, 122, 4532-4533.
4. S. Zalipsky, Chemistry of polyethylene-glycol conjugates with
biologically-active molecules, Adv. Drug Deliv. Rev. 1995, 16, 157-182.
5. S. Zalipsky, J.M. Harris, Introduction to chemistry and biological
applications of poly(ethylene glycol), Poly(Ethylene Glycol) 1997, 680, 1-13.
6. H.C. Hang, C.R. Bertozzi, Chemoselective approaches to glycoprotein
assembly, Acc. Chem. Res. 2001, 34, 727-736.
7. C.M. Niemeyer, Nanoparticles, proteins, and nucleic acids: biotechnology
meets materials science, Angew. Chem. Int. Ed. Engl. 2001, 40, 4128-4158.
8. N.C. Seeman, A.M. Belcher, Emulating biology: building nanostructures from
the bottom up, Proc. Nat. Acad. Sci. U. S. A. 2002, 99, 6451-6455.
9. For an excellent review of common bioconjugation techniques, see G.T.
Hermanson, Bioconjugate Techniques, Academic Press, San Diego, 1996.
10. T.L. Schlick, Z.B. Ding, E.W. Kovacs, M.B. Francis, Dual-surface
modification of the tobacco mosaic virus, J. Am. Chem. Soc. 2005, 127,
3718-3723.
11. J.A. Maurer, D.E. Elmore, H.A. Lester, D.A. Dougherty, Comparing and
contrasting Escherichia coli and Mycobacterium tuberculosis mechanosensitive
channels (MscL) - new gain of function mutations in the loop region, J. Biol.
Chem. 2000, 275, 22238-22244.
12. Q. Wang, T.W. Lin, L. Tang, J.E. Johnson, M.G. Finn, Icosahedral virus
particles as addressable nanoscale building blocks, Angew. Chem. Int. Ed.
Engl. 2002, 41, 459-462.
13. For an example of double chromophore labeling for FRET studies, see
M. Borsch, M. Diez, B. Zimmermann, R. Reuter, P. Graber, Stepwise rotation of
the γ-subunit of EFoF1-ATP synthase observed by intramolecular single-molecule
fluorescence resonance energy transfer, FEBS Lett. 2002, 527, 147-152.
14. R.F. Doolittle, Redundancies in protein sequences, in Prediction of
Protein Structure and the Principles of Protein Conformation, (Ed.: G.D.
Fasman), Plenum Press, New York, 1989.
15. J. Houk, G.M. Whitesides, Structure reactivity relations for thiol
disulfide interchange, J. Am. Chem. Soc. 1987, 109, 6825-6836.
16. T.P. King, Y. Li, L. Kochoumian, Preparation of protein conjugates via
intermolecular disulfide bond formation, Biochemistry 1978, 17, 1499-1506.
17. For an example, see S. Zalipsky, M. Qazen, J.A. Walker, N. Mullah, Y.P.
Quinn, S.K. Huang, New detachable poly(ethylene glycol) conjugates:
cysteine-cleavable lipopolymers regenerating natural phospholipid, diacyl
phosphatidylethanolamine, Bioconjug. Chem. 1999, 10, 703-707.
18. H.R. Adams, C.H. Paik, W.C. Eckelman, R.C. Reba, Electrophilic iodination
of aromatic rings, J.
Labelled Comp. Radiopharm. 1982, 19, 1477-1478.
19. W.C. Eckelman, H.R. Adams, C.H. Paik, Electrophilic iodination of aromatic
rings, Int. J. Nucl. Med. Biol. 1984, 11, 163-166.
20. J.F. Leite, M. Cascio, Probing the topology of the glycine receptor by
chemical modification coupled to mass spectrometry, Biochemistry 2002, 41,
6140-6148.
21. H.G. Higgins, D. Fraser, The reaction of amino acids and proteins with
diazonium compounds. 1. A spectrophotometric study of azo-derivatives of
histidine and tyrosine, Australian J. Sci. Res. Ser. A Phys. Sciences 1952, 5,
736-753.
22. H.G. Higgins, K.J. Harrington, Reaction of amino acids and proteins with
diazonium compounds. 2. Spectra of protein derivatives, Arch. Biochem.
Biophys. 1959, 85, 409-425.
23. J.A. Shin, Specific DNA binding peptide-derivatized solid support, Bioorg.
Med. Chem. Lett. 1997, 7, 2367.
24. J.M. Hooker, E.W. Kovacs, M.B. Francis, Interior surface modification of
bacteriophage MS2, J. Am. Chem. Soc. 2004, 126, 3718-3719.
25. N.S. Joshi, L.R. Whitaker, M.B. Francis, A three-component mannich-type
reaction for selective tyrosine bioconjugation, J. Am. Chem. Soc. 2004, 126,
15942-15943.
26. For an example of a lanthanide-promoted phenol modification with imines in
organic solvents, see T.S. Huang, C.J. Li, Synthesis of amino acids via a
three-component reaction of phenols, glyoxylates and amines, Tetrahedron Lett.
2000, 41, 6715.
27. H. Fraenkel-Conrat, H.S. Olcott, Reaction of formaldehyde with proteins.
VI. cross-linking of amino groups with phenol, imidazole, or indole groups,
J. Biol. Chem. 1948, 174, 827-843.
28. N.S. Joshi, M.B. Francis, Submitted.
29. P.E. Dawson, T.W. Muir, I. Clarklewis, S.B.H. Kent, Synthesis of proteins
by native chemical ligation, Science 1994, 266, 776-779.
30. K.J. Franz, M. Nitz, B. Imperiali, Lanthanide-binding tags as versatile
protein coexpression probes, Chembiochem 2003, 4, 265-271.
31. H. Dibowski, F.P. Schmidtchen, Bioconjugation of peptides by
palladium-catalyzed C-C cross-coupling in water, Angew. Chem. Int. Ed. Engl.
1998, 37, 476-478.
32. D.T. Bong, M.R. Ghadiri, Chemoselective Pd(0)-catalyzed peptide coupling
in water, Org. Lett. 2001, 3, 2509-2511.
33. A. Ojida, H. Tsutsumi, N. Kasagi, I. Hamachi, Suzuki coupling for protein
modification, Tetrahedron Lett. 2005, 46, 3301-3305.
34. S.D. Tilley, M.B. Francis, Submitted.
35. J. Stubbe, D.G. Nocera, C.S. Yee, M.C.Y. Chang, Radical initiation in the
class I ribonucleotide reductase: long-range proton-coupled electron
transfer? Chem. Rev. 2003, 103, 2167-2201.
36. J.M. Antos, M.B. Francis, Selective tryptophan modification with rhodium
carbenoids in aqueous solution, J. Am. Chem. Soc. 2004, 126, 10256-10257.
37. H.M. Davies, P.R. Bruzinski, D.H. Lake, N. Kong, M.J. Fall, Asymmetric
cyclopropanations by rhodium(II) N-(arylsulfonyl)prolinate catalyzed
decomposition of vinyldiazomethanes in the presence of alkenes. Practical
enantioselective synthesis of the four stereoisomers of
2-phenylcyclopropan-1-amino acid, J. Am. Chem. Soc. 1996, 118, 6897-6907.
38. Most proteins are not denatured by the use of this cosolvent. For
examples, see Y.L. Khmelnitsky, V.V. Mozhaev, A.B. Belova, M.V. Sergeeva,
K. Martinek, Denaturation capacity - a new quantitative criterion for
selection of organic-solvents as reaction media in biocatalysis, Eur. J.
Biochem. 1991, 198, 31-41.
39. J.M. Antos, M.B. Francis, Unpublished results.
40. An analogous rearrangement pathway has been observed for small molecule
disulfides in organic solvents: M. Hamaguchi, T. Misumi, T. Oshima, Reaction of
vinylcarbenoids with cyclic disulfides: formation of 1,3-insertion products
as well as 1,1-insertion products, Tetrahedron Lett. 1998, 39, 7113-7116.
41. J.M. McFarland, M.B. Francis, Reductive alkylation of proteins using
iridium catalyzed transfer hydrogenation, J. Am. Chem. Soc. 2005, in press.
42. T.J. Sereda, C.T. Mant, A.M. Quinn, R.S. Hodges, Effect of alpha-amino
group on peptide retention behavior in reversed-phase chromatography -
determination of the pK(a) values of the alpha-amino group of 19 different
N-terminal amino-acid-residues, J. Chromatogr. 1993, 646, 17-30.
43. J.P. Tam, Q.T. Yu, Z.W. Miao, Orthogonal ligation strategies for peptide
and protein, Biopolymers 1999, 51, 311-332.
44. X.F. Li, L.S. Zhang, S.E. Hall, J.P. Tam, A new ligation method for
N-terminal tryptophan-containing peptides using the Pictet-Spengler reaction,
Tetrahedron Lett. 2000, 41, 4069-4073.
45. P.H. Hirel, J.M. Schmitter, P. Dessen, G. Fayat, S. Blanquet, Extent of
N-terminal methionine excision from escherichia-coli proteins is governed by
the side-chain length of the penultimate amino-acid, Proc. Nat. Acad. Sci.
U. S. A. 1989, 86, 8247-8251.
46. K.F. Geoghegan, J.G. Stroh, Site-directed conjugation of nonpeptide groups
to peptides and proteins via periodate-oxidation of a 2-amino alcohol -
application to modification at N-terminal serine, Bioconjug. Chem. 1992, 3,
138-146.
47. J.M. Gilmore, R.A. Scheck, M.B. Francis, Unpublished results.
48. For a related reaction catalyzed by copper ions, see H.B.F. Dixon,
N-terminal modification of proteins - a review, J. Protein Chem. 1984, 3,
99-108.
49. For examples, see T.J. Tolbert, C.H. Wong, Intein-mediated synthesis of
proteins containing carbohydrates and other molecular probes, J. Am. Chem.
Soc. 2000, 122, 5421-5428.
50. S.R. Adams, R.E. Campbell, L.A. Gross, B.R. Martin, G.K. Walkup, Y. Yao,
I. Llopis, R.Y. Tsien, New biarsenical ligands and tetracysteine motifs for
protein labeling in vitro and in vivo: synthesis and biological applications,
J. Am. Chem. Soc. 2002, 124, 6063-6076.
51. R.Y. Tsien, The green fluorescent protein, Annu. Rev. Biochem. 1998, 67,
509-544.
52. C.J. Noren, S.J. Anthonycahill, M.C. Griffith, P.G. Schultz, A general
method for site-specific incorporation of unnatural amino-acids into
proteins, Science 1989, 244, 182-188.
53. J.A. Ellman, D. Mendel, S. Anthonycahill, C.J. Noren, P.G. Schultz,
Biosynthetic method for introducing unnatural amino-acids site-specifically
into proteins, Methods Enzymol. 1991, 202, 301-336.
54. L. Wang, A. Brock, B. Herberich, P.G. Schultz, Expanding the genetic code
of Escherichia coli, Science 2001, 292, 498-500.
55. J.W. Chin, S.W. Santoro, A.B. Martin, D.S. King, L. Wang, P.G. Schultz,
Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli,
J. Am. Chem. Soc. 2002, 124, 9026-9027.
56. L. Wang, P.G. Schultz, Expanding the genetic code, Chem. Commun. 2002, 1,
1-11.
57. L. Wang, Z. Zhang, A. Brock, P.G. Schultz, Addition of the keto functional
group to the genetic code of Escherichia coli, Proc. Nat. Acad. Sci. U. S. A.
2003, 100, 56-61.
58. R.A. Mehl, J.C. Anderson, S.W. Santoro, L. Wang, A.B. Martin, D.S. King,
D.M. Horn, P.G. Schultz, Generation of a bacterium with a 21 amino acid
genetic code, J. Am. Chem. Soc. 2003, 125, 935-939.
59. K.L. Kiick, D.A. Tirrell, Protein engineering by in vivo incorporation of
non-natural amino acids: control of incorporation of methionine analogues by
methionyl-tRNA synthetase, Tetrahedron 2000, 56, 9487-9493.
60. K.L. Kiick, R. Weberskirch, D.A. Tirrell, Identification of an expanded
set of translationally active methionine analogues in Escherichia coli, FEBS
Lett. 2001, 502, 25-30.
61. K. Kirshenbaum, I.S. Carrico, D.A. Tirrell, Biosynthesis of proteins
incorporating a versatile set of phenylalanine analogues, Chembiochem 2002, 3,
235-237.
62. L.K. Mahal, K.J. Yarema, C.R. Bertozzi, Engineering chemical reactivity on
cell surfaces through oligosaccharide biosynthesis, Science 1997, 276,
1125-1128.
63. E. Saxon, C.R. Bertozzi, Cell surface engineering by a modified Staudinger
reaction, Science 2000, 287, 2007-2010.
64. E. Saxon, S.J. Luchansky, H.C. Hang, C. Yu, S.C. Lee, C.R. Bertozzi,
Investigating cellular metabolism of synthetic azidosugars with the
Staudinger ligation, J. Am. Chem. Soc. 2002, 124, 14893-14902.
65. J.H. Lee, T.J. Baker, L.K. Mahal, J. Zabner, C.R. Bertozzi, D.F. Wiemer,
M.J. Welsh, Engineering novel cell surface receptors for virus-mediated gene
transfer, J. Biol. Chem. 1999, 274, 21878-21884.
66. S.J. Luchansky, C.R. Bertozzi, Azido sialic acids can modulate
cell-surface interactions, Chembiochem 2004, 5, 1706-1709.
67. R.A. Chandra, E.A. Douglas, R.A. Mathies, C.R. Bertozzi, M.B. Francis,
Programmable cell adhesion encoded by DNA hybridization, Angew. Chem. Int.
Ed. Engl. 2006, 45, 896-901.
68. H.C. Hang, C. Yu, D.L. Kato, C.R. Bertozzi, A metabolic labeling approach
toward proteomic analysis of mucin-type O-linked glycosylation, Proc. Nat.
Acad. Sci. U. S. A. 2003, 100, 14846-14851.
69. I. Chen, M. Howarth, W. Lin, A.Y. Ting, Site-specific labeling of cell
surface proteins with biophysical probes using biotin ligase, Nat. Methods
2005, 2, 99-104.
70. M. Howarth, K. Takao, Y. Hayashi, A.Y. Ting, Targeting quantum dots to
surface proteins in living cells with biotin ligase, Proc. Nat. Acad. Sci.
U. S. A. 2005, 102, 7583-7588.
71. Y. Kho, S.C. Kim, C. Jiang, D. Barma, S.W. Kwon, J.K. Cheng, J. Jaunbergs,
C. Weinbaum, F. Tamanoi, J. Falck, Y.M. Zhao, A tagging-via-substrate
technology for detection and proteomics of farnesylated proteins, Proc. Nat.
Acad. Sci. U. S. A. 2004, 101, 12479-12484.
72. J. Yin, F. Liu, X.H. Li, C.T. Walsh, Labeling proteins with small
molecules by site-specific posttranslational modification, J. Am. Chem. Soc.
2004, 126, 7754-7755.
73. V.W. Cornish, K.M. Hahn, P.G. Schultz, Site-specific protein modification
using a ketone handle, J. Am. Chem. Soc. 1996, 118, 8150-8151.
74. W.P. Jencks, Studies on the mechanism of oxime and semicarbazone
formation, J. Am. Chem. Soc. 1959, 81, 475-481.
75. Z.W. Zhang, B.A.C. Smith, L. Wang, A. Brock, C. Cho, P.G. Schultz, A new
strategy for the site-specific modification of proteins in vivo, Biochemistry
2003, 42, 6735-6746.
76. F.L. Lin, H.M. Hoyt, H. van Halbeek, R.G. Bergman, C.R. Bertozzi,
Mechanistic investigation of the Staudinger ligation, J. Am. Chem. Soc. 2005,
127, 2686-2695.
77. E. Saxon, J.I. Armstrong, C.R. Bertozzi, A “traceless” Staudinger ligation
for the chemoselective synthesis of amide bonds, Org. Lett. 2000, 2,
2141-2143.
78. B.L. Nilsson, L.L. Kiessling, R.T. Raines, Staudinger ligation: a peptide
from a thioester and azide, Org. Lett. 2000, 2, 1939-1941.
79. B.L. Nilsson, R.J. Hondal, M.B. Soellner, R.T. Raines, Protein assembly by
orthogonal chemical
References I 6 3 3
ligation methods, J . Am. Chem. Soc. 90. V.V. Rostovtsev, L.G. Green,V.V.
2003, 125,5268-5269. Fokin, K.B. Sharpless, A stepwise
80. M. Kohn, R. Breinbauer, The huisgen cycloaddition process:
staudinger ligation-A gift to copper(1)-catalyzed regioselective
chemical biology, Angew. Chem. Znt. “ligation” of azides and terminal
Ed. Engl. 2004,43, 3106-3116. alkynes, Angew. Chem. Znt. Ed. Engl.
81. D.J. Vocadlo, H.C. Hang, E.J. Kim, 2002,41,2596-2599.
J.A. Hanover, C.R. Bertozzi, A 91. C.W. Torn~re,C. Christensen,
chemical approach for identifying M. Meldal, Peptidotriazoles on solid
0-GlcNAc-modified proteins in cells, phase: [1,2,3]-triazoles by
Proc. Natl. Acad. Sci. U. S. A. 2003, regiospecific copper(1)-catalyzed
100,9116-9121. 1,3-dipolar cycloadditions of terminal
82. D. J. Vocadlo, C.R. Bertozzi, A alkynes to azides, J . Org. Chem. 2002,
strategy for functional proteomic 67,3057-3062.
analysis of glycosidase activity from 92. Q. Wang, T.R. Chan, R. Hilgraf, V.V.
cell lysates, Angew. Chem. Int. Ed. Fokin, K.B. Sharpless, M.G. Finn,
Engl. 2004,43,5338-5342. Bioconjugation by
83. H.C. Hang, C. Yu, M.R. Pratt, C.R. copper(1)-catalyzedazide-alkyne
Bertozzi, Probing glycosyltransferase +
(3 21 cycloaddition,]. Am. Chem.
activities with the staudinger ligation, SOC.2003, 125, 3192-3193.
1.Am. Chem. Soc. 2004, 126,6-7. 93. L.V. Lee, M.L. Mitchell, S.J. Huang,
84. J.A. Prescher, D.H. Dube, C.R. V.V. Fokin, K.B. Sharpless, C.H.
Bertozzi, Chemical remodelling of Wong, A potent and highly selective
cell surfaces in living animals, Nature inhibitor of human
2004,430,873-877. alpha-l,3-fucosyltransferasevia click
85. K.L. Kiick, E. Saxon, D.A. Tirrell, C.R. chemistry,]. Am. Chem. SOC.2003,
Bertozzi, Incorporation of azides into 125,9588-9589.
recombinant proteins for 94. P. Wu, A.K. Feldman, A.K. Nugent,
chemoselective modification by the C.J. Hawker, A. Scheel, B. Voit,
staudinger ligation, Proc. Natl. Acad. J. Pyun, J.M.J. Frechet, K.B.
S C ~U.. S . A. 2002, 99, 19-24. Sharpless, V.V. Fokin, Efficiency and
86. G.A. Lemieux, C.L. de Graffenried, fidelity in a click-chemistry route to
C.R. Bertozzi, A fluorogenic dye triazole dendrimers by the
activated by the staudinger ligation,]. copper(I)-catalyzed ligation of azides
Am. Chem. Soc. 2003, 125, and alkynes, Angew. Chem. Int. Ed.
4708-4709. Engl. 2004, 43, 3928-3932.
87. H.C. Kolb, M.G. Finn, K.B. 95. V.O. Rodionov, V.V. Fokin, M.G.
Sharpless, Click chemistry: diverse Finn, Mechanism of the ligand-free
chemical function from a few good Cu-I-catalyzed azide-alkyne
reactions, Angew. Chem. Znt. Ed. Engl. cycloaddition reaction, Angew. Chem.
2001,40,2004-2021. Znt. Ed. Engl. 2005, 44, 2210-2215.
88. R. Huisgen, in 1,3-Dipolar 96. W.G. Lewis, F.G. Magallon, V.V.
Cycloaddition Chemistry, (Ed.: Fokin, M.G. Finn, Discovery and
A. Padwa), Vol I, Wiley, New York, characterization of catalysts for
1984, pp. 1-176. azide-alkyne cycloaddition by
89. W.G. Lewis, L.G. Green, fluorescence quenching,]. Am.
F. Grynszpan, 2 . Radic, P.R. Carlier, Chem. SOC.2004, 126,9152-9153.
P. Taylor, M.G. Finn, K.B. Sharpless, 97. S . S . Gupta, J. Kuzelka, P. Singh,
Click chemistry in situ: W.G. Lewis, M. Manchester, M.G.
Acetylcholinesterase as a reaction Finn, Accelerated bioorthogonal
vessel for the selective assembly of a conjugation: a practical method for
femtomolar inhibitor from an array the ligation of diverse functional
of building blocks, Angew. Chem. Int. molecules to a polyvalent virus
Ed. Engl. 2002,41,1053-1057. scaffold, Bioconjug. Chem. in press.
634
98. A.E. Speers, G.C. Adam, B.F. Cravatt, functionality in bacterial cell surface
Activity-based protein profiling in proteins,]. Am. Chem. SOC.2004, 126,
vivo using a copper(1)-catalyzed 10598- 10602.
azide-alkyne [ 3 + 21 cycloaddition, ]. 102. N.J. Agard, J.A. Prescher, C.R.
Am. Chem. SOC. 2003, 125, Bertozzi, A strain-promoted ( 3 + 21
4686-4687. azide-alkyne cycloaddition for
99. A. Deiters, T.A. Cropp, M. Mukherji, covalent modification of
J.W. Chin, J.C. Anderson, P.G. biomolecules in living systems, J .
Schultz, Adding amino acids with Am. Chem. SOC.2004, 126,
novel reactivity to the genetic code of 15046- 15047.
Saccharomyces cerevisiae, /. Am. 103. J.M. Hooker, M.B. Francis,
Chem. SOC.2003, 125,11782-11783. Submitted.
100. A.J. Link, D.A. Tirrell, Cell surface 104. J.F. Corbett, Benzoquinone imines.
labeling of Escherichia coli via part IV. Mechanism and kinetics of
copper(1)-catalyzed [3 + 21 the formation of bandrowski’s base,
cycloaddition, J. Am. Chem. SOC. J . Chem. SOC. B 1969, 818.
2003, 125,11164-11165. 105. T. Nguyen, N.S. Joshi, M.B. Francis,
101. A.J. Link, M.K.S. Vink, D.A. Tirrell, Submitted.
Presentation and detection of azide
11
Advances in Sugar Chemistry
11.1
The Search for Chemical Probes to Illuminate Carbohydrate Function
Laura L. Kiessling and Erin E. Carlson
Outlook
Until the 1970s, it was believed that the major cellular functions of
carbohydrates were confined to their use as structural elements or energy
sources. Since then, evidence that glycoconjugates function in many diverse
roles has led to an increased appreciation of these biomolecules. Saccharides
act as information carriers and effect many signaling events, cell-cell
communication, cell adhesion, differentiation, inflammation, and tumor cell
metastasis [1-3]. Moreover, defects in the production of glycoconjugates cause
a series of human diseases referred to as congenital disorders of glycosylation
(CDG) [4,5]. In prokaryotes, carbohydrates are essential constituents of
bacterial cell walls; consequently, agents that block their incorporation can
function as novel antimicrobials. These examples underscore the value
of understanding glycoconjugate biosynthesis and function for human
health.
11.1.1
Introduction
One barrier to understanding glycoconjugates is that their function is often
only manifested in the context of the organism or physiologically relevant
environment. For example, the loss of a glycosyltransferase can have no effect
on eukaryotic cells grown in culture, but can have significant effects on the
organism [6]. Similarly, some glycoconjugates in prokaryotes are likely to
function only in pathogenesis [7]. Genetic approaches, such as the generation
of knockout mice or the application of RNA interference, can illuminate
ISBN: 978-3-527-31150-7
the physiological roles of glycoconjugates. The use of compounds that block
a specific protein-carbohydrate interaction or that inhibit the biosynthesis
of carbohydrates would facilitate insights complementary to those obtained
using only genetic approaches. Here, we will discuss the issues that have
complicated the efforts to determine how carbohydrates function, the tools
that have been developed to enhance our understanding, and the advances
in the generation and identification of chemical agents to probe carbohydrate
function.
11.1.2
11.1.2.1 Protein-Carbohydrate Interactions

The diversity of carbohydrate structures is a major obstacle to understanding
their function. Carbohydrate units can be connected in a variety of ways to
afford many structural isomers and therefore thousands of polysaccharides.
Moreover, complex glycoconjugates can be further elaborated via enzymes
that add acyl, sulfate, and phosphate groups to sugar heteroatoms. Even subtle
differences in functional group display can result in drastic differences in
the specific proteins that recognize oligosaccharides and the affinity of the
resulting complexes. Biologically relevant glycoconjugates are synthesized by
the actions of many different enzymes. Activated sugar-nucleotide substrates
are produced by nucleotidyltransferases and utilized by glycosyltransferases
for the addition of carbohydrate units to a growing saccharide chain. Finally,
glycoconjugates can also be heterogeneous. An oligosaccharide structure can
differ depending on environmental conditions, cell type, and other variables.
Thus, it is often very difficult to ascertain the substrate(s) of a carbohydrate-
binding protein or enzyme of interest.
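A rough count illustrates how quickly this diversity grows. The Python sketch below rests on a deliberately simplified model that is an assumption, not something stated in the text: linear chains only, four candidate acceptor hydroxyls per residue, two anomeric configurations per linkage, and no branching or acyl, sulfate, or phosphate modification. The function name and default values are hypothetical.

```python
def linear_isomer_count(n_residues: int, n_monomer_types: int,
                        acceptor_positions: int = 4, anomers: int = 2) -> int:
    """Count linear oligosaccharides under a simplified model (assumption).

    Each residue is chosen from n_monomer_types sugars, and each of the
    (n_residues - 1) glycosidic bonds can use any acceptor position and
    either anomeric configuration.
    """
    if n_residues < 1 or n_monomer_types < 1:
        raise ValueError("need at least one residue and one monomer type")
    sequences = n_monomer_types ** n_residues
    linkage_choices = (acceptor_positions * anomers) ** (n_residues - 1)
    return sequences * linkage_choices


if __name__ == "__main__":
    # Compare with a linear tripeptide from three amino acids (3**3 sequences).
    print("trisaccharides:", linear_isomer_count(3, 3))
    print("tripeptides:", 3 ** 3)
```

Under these assumptions, three residues drawn from three monosaccharide types give 1728 linear trisaccharides, against 27 sequences for a tripeptide built from three amino acids. Real numbers are larger once branching and functional group modification are allowed.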
Advances in chemistry are facilitating the identification of glycoconjugate
ligands for target lectins of interest. Access to improved mass
spectrometers and new methods for sample preparation and analysis
are critical for determining the oligosaccharide sequences of physiological
glycoconjugates [8,9]. Additionally, the advent of new methods for
carbohydrate and glycoprotein synthesis provides access to homogeneous
samples of oligosaccharides and glycoproteins [10-20]. Glycoarrays, which
can be generated using isolated glycoconjugates or chemically synthesized
oligosaccharides or glycoconjugates, have emerged as useful tools
to rapidly assess lectin specificity [21-24]. Increasing access to these
complex biomolecules has dramatically improved our ability to probe
protein-carbohydrate interactions.
Chemical approaches also have fueled the development of new assays to
study protein-carbohydrate interactions. The features of these interactions
mandate such novel strategies. When extracellular carbohydrate-binding
events are investigated in solution, the resulting complexes are often of low
affinity. Specifically, monovalent protein-carbohydrate dissociation constants

are typically on the order of to M, and many protein-carbohydrate
interactions are multivalent [25-29]. Both these factors complicate the
evaluation of the binding kinetics and thermodynamics of these proteins.
Thus, assay selection can be crucial for identifying those compounds that
interfere with the target process under physiologically relevant conditions. For
example, selectins are involved in the attachment and rolling of leukocytes
on the vascular surface. Like most proteins that bind carbohydrates, their
affinity for the monovalent tetrasaccharide is very weak (Kd ~ 1 mM); thus, a
multimeric carbohydrate display is necessary for effective assay development.
Additionally, because the selectins act under blood flow, static assays
can provide dramatically different results than the more physiologically
relevant flow-based methods [30]. Unfortunately, assays that closely mimic
physiological conditions are often of very low throughput and are therefore
not ideal for ligand discovery.
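To see why a dissociation constant near 1 mM is hard to work with, it helps to estimate how little protein is occupied at typical ligand concentrations. The sketch below assumes simple 1:1 binding with ligand in large excess over protein, so that free ligand is approximately equal to total ligand. The function name and the example concentrations are illustrative assumptions, not values from the text.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of protein sites occupied for 1:1 binding: [L] / (Kd + [L]).

    Assumes free ligand is approximately total ligand (ligand in excess).
    """
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("ligand must be non-negative and Kd positive")
    return ligand_molar / (kd_molar + ligand_molar)


if __name__ == "__main__":
    kd = 1e-3  # 1 mM, a weak monovalent interaction (illustrative)
    for ligand in (1e-6, 1e-5, 1e-4, 1e-3):
        occupancy = fraction_bound(ligand, kd)
        print(f"[L] = {ligand:.0e} M -> fraction bound = {occupancy:.4f}")
```

In this model a ligand at 10 μM occupies only about 1% of the binding sites when Kd is 1 mM, and half occupancy is reached only when the ligand concentration equals Kd. This is one way to see why multivalent displays are often needed to obtain a usable assay signal.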
Despite these challenges, a number of techniques have been applied
successfully to study either monovalent or multivalent protein-carbohydrate
interactions, including fluorescence anisotropy assays [31-37], enzyme-linked
immunosorbent assays (ELISA), cell agglutination assays, and surface plasmon
resonance studies [38-40]. Carbohydrate affinity screening using derivatized
latex beads, magnetic particles, agarose, or Sepharose resins is another
promising approach [41]. As mentioned previously, glycoarrays are useful
new tools, and have been used to characterize protein-carbohydrate [23,
42-45] and enzyme-carbohydrate [39, 46-48] interactions and to examine
the adhesion properties of hepatocytes, leukocytes [49], and bacteria [50].
These successes emphasize the value of high-throughput assay formats for
elucidating carbohydrate function.
Many of these techniques have also been successfully utilized for the
identification of inhibitors of protein-carbohydrate interactions. ELISAs are
often employed, and their utility is illustrated in the efforts to identify selectin
inhibitors [51-53]. However, other assays such as fluorescence polarization
[33, 54] and carbohydrate affinity matrices [55] have also been utilized. These
methods have provided valuable information about the substrate specificity
of a number of carbohydrate-binding proteins and led to the discovery of
many useful inhibitors. Still, there remains an acute need to develop effective
high-throughput assays.
Another key issue for identifying inhibitors of protein-carbohydrate
interactions is the lack of information on what types of compounds might target
lectins. Oligosaccharides that resemble the known or putative physiological
ligand are the logical starting point, yet these compounds have drawbacks.
They are typically polar, have low binding affinities, and often lack specificity
(i.e., they interact with many related lectins). Moreover, even with advances
in carbohydrate synthesis, it is difficult to rapidly assemble oligosaccharide
derivatives to optimize their potency. These challenges are also apparent in
the efforts to block carbohydrate-modifying enzymes.
11.1.2.2 Carbohydrate-modifying Enzymes
For understanding glycoconjugate function, a complementary strategy to
inhibiting protein-carbohydrate interactions is to block glycoconjugate
assembly. The development of strategies for high-throughput analysis of
enzymes that act on carbohydrates or glycoconjugates is also challenging. Many
such enzymes use identical or similar glycosyl donors; therefore, it is difficult
to determine their acceptor specificity. For example, there are several hundred
glycosyltransferases in humans [56], and many utilize similar or identical
sugar-nucleotide substrates. Traditional proteomics-based strategies, such as
two-dimensional gel electrophoresis [57] and isotope-coded affinity tags [58],
can provide valuable information about protein abundance. However, these
experiments give no information about enzyme activity levels or substrate
specificity. The need for this information has prompted the development of
several new strategies [59]. Recently, Pohl and coworkers reported an assay
based on mass spectrometry for the study of a rabbit muscle phosphorylase
[60] and sugar nucleotidyltransferases from yeast and Escherichia coli [61]. This
group also has reported the design of a library of mass-differentiated substrates
to examine the substrate specificity of glycosidases [62]. Additionally, several
research groups have used sugar derivatives to directly label and detect active
enzymes [63-65]. Finally, carbohydrate microarrays have also been utilized
to explore the substrate specificity of carbohydrate-utilizing enzymes [39,
48]. These tools have provided valuable information about the activity and
specificity of carbohydrate-utilizing enzymes. This knowledge is fundamental
and also critical for developing assays that monitor the activities of biosynthetic
enzymes.
Another issue facing those interested in developing inhibitors of glycoconjugate
biosynthesis is what types of compounds to test as inhibitors. Many
known ligands for carbohydrate-utilizing enzymes are transition state analogs.
For example, imino sugars are commonly used to mimic oxocarbenium ions
that serve as intermediates in glycosidase or glycosyltransferase reactions [66].
Although transition state analogs have provided important information about
many enzymes, they often lack selectivity for the target of interest. Moreover,
few of these compounds are cell permeable. Unlike protein-carbohydrate
interactions that occur on the surface of the cell, carbohydrate processing
occurs within the cell, necessitating the development of cell-permeable ligands
to investigate the roles of carbohydrate-utilizing enzymes within an
organism. Genetic knockout animals and human genetics have uncovered
new and unexpected roles for enzymes that participate in glycoconjugate
biosynthesis [6]. Some of these enzymes, however, are essential for early
development. Cell-permeable compounds that block these enzymes would offer a
number of benefits, including the ability to exert temporal control over enzyme
function [67].
One of the most efficient ways to identify "cell-permeable" ligands is
through the utilization of high-throughput screens. Identification of inhibitors
through this method has been hampered by the lack of effective assays.
Recently, however, several such assays have been developed for the study
of carbohydrate-enzyme interactions. As mentioned previously, carbohydrate
microarrays can be employed for inhibitor identification. Indeed, they were
recently used by Wong and coworkers to identify fucosyltransferase inhibitors
from a small library (85 compounds) of triazole-containing compounds [46,
47]. Kiessling [68] and Walker [69] have reported high-throughput binding
assays that use fluorescence polarization to facilitate the identification
of ligands for uridine 5’-diphosphate-galactopyranose mutase (UGM) and
MurG, enzymes that utilize nucleotide-sugar substrates and are involved
in bacterial cell wall biosynthesis. These assays were used to screen
large commercially available small molecule libraries (~16 000 and ~49 000
members respectively). The availability of data from high-throughput screens
such as these may lead to the identification of key scaffolds for inhibitor
design. Such information will guide the development of effective probes for
glycobiology.
11.1.2.3 Inhibition Strategies

Given the barriers to identifying inhibitors of protein-carbohydrate interactions
and carbohydrate-modifying enzymes, it is perhaps not surprising that
most inhibitors studied to date are analogs of natural carbohydrate substrates.
Inhibitors based on the sugar scaffold, or “carbohydrate-derived” glycomimetics,
have been used extensively to explore saccharide binding events [66]. Many
incorporate structural alterations to improve their affinity or stability over the
natural sugar substrates (Fig. 11.1-1). Common strategies involve the removal
of unnecessary functional groups (e.g., hydroxyl groups) or the addition of
hydrophobic or charged groups to alter the polarity of the ligand and, thereby,
facilitate additional interactions. Other designs incorporate changes in the
pyranose ring to afford compounds with enhanced stability or altered electronic
properties, such as imino sugars. Glycosidic linkages have also been
replaced with a carbon or sulfur to yield a more stable substrate. Rarely do
these kinds of changes result in high potency inhibitors. Still, multiple iterative
rounds of synthesis, testing, and redesign have resulted in the generation of
effective ligands.
An alternative approach to the use of monomeric sugar mimics is to
employ multivalent displays of either natural saccharides or glycomimetics.
A number of reviews describe the generation and applications of multivalent
ligands [28, 29, 40, 70-74]. There are several key variables that determine
the activity of synthetic multivalent ligands: epitope valency and density,
and the arrangement and flexibility of the binding epitopes [29, 75]. For
example, well-defined, low-molecular-weight dimeric sugar displays have
been employed as have more complex, high-molecular-weight ligands that
display hundreds of copies of a binding epitope. Structurally distinct
scaffolds have been used to create multivalent carbohydrate displays including
dendrimers, linear polymers, neoglycoproteins, and polydisperse polymers.
Fig. 11.1-1 Common strategies for glycomimetic design.
The biological processes that have been studied using these ligands include
virus interaction with host cells [73, 76, 77], bacterial toxin binding
[78, 79], and adhesion of leukocytes to endothelial cells [30, 74, 80, 81].
Thus, multivalent ligands have been used to explore protein-carbohydrate
interactions, and they often serve as potent inhibitors. The identification of
monovalent ligands of modest affinity can be leveraged to create multivalent
probes.
As the aforementioned examples highlight, most efforts to inhibit
either protein-carbohydrate interactions or the enzymes responsible for
glycoconjugate biosynthesis have focused on the utilization of carbohydrates
and their derivatives. Many of the available compounds, however, are
not optimal for studies in cells or organisms because they have low
binding affinity and selectivity, poor metabolic stability, and limited cell
permeability. Additionally, the synthesis of carbohydrate derivatives can
be difficult and labor intensive, and many iterations may be required
to improve the activities of the typical low-affinity carbohydrate leads.
Therefore, attention has recently turned to the design of compounds that
are not derived from carbohydrate building blocks. This review highlights the
development of noncarbohydrate-like ligands to study the physiological roles
of carbohydrates. First, we discuss the approaches to examine lectins, receptors
that use sugar-binding interactions to facilitate cell adhesion or cell signaling
events.
In conjunction with our overview of glycomimetics that block protein-carbohydrate
interactions, we also discuss strategies to develop inhibitors
of carbohydrate-processing enzymes; this section focuses on the enzymes involved
in bacterial cell wall biosynthesis because it is an area in which many
new advances have been made. Enzymes that utilize sugars and synthesize
glycoconjugates unique to pathogens have been identified, and cell-permeable
inhibitors can be used to explore their biological roles or validate a potential
therapeutic target. The scaffolds identified in this work may also be applicable
to the development of probes of other prokaryotic and perhaps even eukaryotic
carbohydrate-utilizing enzymes.
11.1.3
General Considerations: Cell-surface Carbohydrate Recognition Interactions
11.1.3.1 S-type Lectins, Galectins

Galectins, or S-type lectins, are one of the several classes of mammalian
proteins that possess carbohydrate recognition domains (CRDs) [82, 83].
There are 14 known galectins (two families, galectin-1 and galectin-3) that
possess one or more CRD [25]. Galectins bind β-galactosides in a shallow
and highly solvent-exposed binding pocket (Fig. 11.1-2(a)) [84]. They are
closely related in structure to plant lectins, such as concanavalin A. Most
and possibly all galectins are functionally multivalent; members of this class
either possess two CRDs within one peptide chain or form dimers or higher
order oligomers. These lectins are involved in cell growth, differentiation and
apoptosis, cell adhesion, chemoattraction, and cell migration [85]. Galectins
are produced in the cytosol and then secreted, differentiating them from
traditional lectins, which are often membrane-bound. These lectins lack
a transmembrane segment or a secretion signal peptide, making their
mechanism of secretion especially intriguing [86]. Consistent with their
ability to occupy two different cellular locations, galectins appear to have
important intracellular as well as extracellular functions. They have been
implicated in RNA splicing, apoptosis, and the cell cycle [87, 88], although
their intracellular roles remain obscure. Many questions about the roles and
actions of galectins are unanswered, and glycomimetics could serve to resolve
them.
To probe galectin function, several groups have used mutagenesis [89,
90], X-ray crystallography [91, 92], and modeling [93] to design galectin
inhibitors. It has been determined that the galactose 4- and 6-position
hydroxyl groups form hydrogen bonds to the protein, while H1, H3, H4,
and H5 form a hydrophobic patch that is in van der Waals contact with
a tryptophan side chain. The remaining hydroxyl groups do not directly
contact the protein, suggesting that these positions can be functionalized
without the loss of activity. Indeed, a structure of the galectin-3 and
N-acetyllactosamine complex, which was determined by X-ray crystallography
[84], suggests that derivatization of the galactose 3-OH and 4-OH positions
could afford higher affinity ligands (Fig. 11.1-3) [52]. Since the 4-OH is
involved in a critical interaction with the protein, it was hypothesized that
functionalization at the C3-position would yield more effective inhibitors. This
hypothesis is consistent with the affinity of some complexes of galectins
and natural β-galactosides that possess carbohydrate residues at the
3-position [93].
Fig. 11.1-2 (a) Structure of S-type lectin, galectin-3, bound to N-acetyllactosamine and
(b) C-type lectin, E-selectin, bound to sialyl Lex.
Fig. 11.1-3 Examination of galectin-3 bound to N-acetyllactosamine suggests that
additional affinity may be gained by derivatization at C3.
Compounds with functional groups at this 3-position have been shown
to have affinity for galectins. For example, Sorme et al. synthesized
12 N-acetyllactosamine (LacNAc, 1) derivatives with C3 functionalization.
Specifically, they generated a 3′-deoxy-3′-aminolactosamine derivative, which
could be further modified via N-acylation or N-sulfonylation reactions [52].
These compounds were tested for their ability to bind to galectin-3, a lectin
proposed to participate in the formation of the “immunological synapse”
between T cells and antigen-presenting cells [94]. Compounds with inhibitory
activity were found, including a derivative approximately 50-fold more potent
than LacNAc in a competitive ELISA (the most potent inhibitor 2 is
depicted in Fig. 11.1-4, IC50 = 4.4 μM) [52]. Structural data derived from the
inhibitor-galectin-3 complex were later obtained and utilized along with a
fluorescence polarization assay [35] to design and identify even more potent
LacNAc derivatives (Kd ≥ 320 nM; Kd of 2 = 880 nM) [54]. The resulting
compounds exhibit affinity that is more than 2 orders of magnitude tighter than
the parent LacNAc. Although these compounds are carbohydrate derivatives
rather than glycomimetics, they illustrate that sugar modification can be used
to identify inhibitors with much higher potencies than the lead compound.
Fig. 11.1-4 Natural substrate N-acetyllactosamine 1 and galectin inhibitor 2.
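Fold changes in potency can be translated into differences in binding free energy, which makes the size of these gains easier to compare. The sketch below uses the standard relation ΔΔG = -RT ln(fold improvement). It assumes, for illustration only, that ratios of IC50 or Kd values measured in the same assay approximate ratios of true dissociation constants, and that the temperature is 298.15 K. The function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def ddg_kj_per_mol(fold_improvement: float, temperature_k: float = 298.15) -> float:
    """Binding free energy change (kJ/mol) for a given fold gain in affinity.

    Negative values mean tighter binding. Assumes the fold change reflects
    a ratio of dissociation constants (an approximation for IC50 ratios).
    """
    if fold_improvement <= 0 or temperature_k <= 0:
        raise ValueError("fold improvement and temperature must be positive")
    return -GAS_CONSTANT * temperature_k * math.log(fold_improvement) / 1000.0


if __name__ == "__main__":
    for fold in (50, 100):
        print(f"{fold}-fold tighter: ddG = {ddg_kj_per_mol(fold):.2f} kJ/mol")
```

On these assumptions, a 50-fold gain corresponds to roughly 9.7 kJ/mol of additional binding free energy, and a gain of 2 orders of magnitude to roughly 11.4 kJ/mol.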
As with most lectins, multivalency is an important component in galectin
binding. Accordingly, several groups have generated polyvalent displays of
lactose. Although these compounds were more potent than monovalent lactose,
their binding was nearly equivalent when compared on the basis of lactose
residue concentration [95, 96]. A recent study identified several peptidic ligands
for a galectin [97, 98]; however, to date no other noncarbohydrate probes have
been utilized for their study. Undoubtedly, such glycomimetics can further
our understanding of galectin function.
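The comparison on a per-residue basis can be made explicit. The sketch below uses hypothetical numbers (they are not data from the cited studies) and assumes inhibitor concentrations are expressed per molecule, so that dividing the relative potency by the valency gives the enhancement per saccharide residue.

```python
def relative_potency(ic50_monovalent: float, ic50_multivalent: float) -> float:
    """Potency of a multivalent ligand relative to the monovalent sugar."""
    if ic50_monovalent <= 0 or ic50_multivalent <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_monovalent / ic50_multivalent


def potency_per_residue(ic50_monovalent: float, ic50_multivalent: float,
                        valency: int) -> float:
    """Relative potency corrected for the number of sugar residues displayed."""
    if valency < 1:
        raise ValueError("valency must be at least 1")
    return relative_potency(ic50_monovalent, ic50_multivalent) / valency


if __name__ == "__main__":
    # Hypothetical example: monovalent IC50 of 200 uM, decavalent IC50 of 20 uM.
    print("per molecule:", relative_potency(200.0, 20.0))
    print("per residue:", potency_per_residue(200.0, 20.0, 10))
```

A per-residue value near 1 means the multivalent display gains nothing beyond presenting more copies of the sugar, which is the situation described for the lactose displays. Values well above 1 would indicate a genuine multivalent enhancement.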
11.1.3.2 C-type Lectins, Selectins

Many cell-surface interactions are mediated by selectins, which are members
of a large class of Ca2+-dependent sugar-binding proteins known as the
C-type lectins. There are a number (>70) of human proteins that contain
C-type lectin-like domains (CTLDs). Many of these bind carbohydrates in a
Ca2+-dependent manner, though some appear to bind proteins, lipids, and even
sugar moieties in a Ca2+-independent manner [99]. CTLD-containing proteins
that act via carbohydrate-binding include the selectins, mannose-binding
proteins (MBPs), and dendritic cell-specific intercellular adhesion molecule-3-
grabbing non-integrin (DC-SIGN), all of which are involved in immune system
function. C-type lectins are produced either as secreted soluble proteins, such
as MBPs, or as transmembrane proteins, such as the selectins and DC-SIGN.
In the immune system, C-type lectins mediate both adhesion and pathogen
recognition events. MBPs are involved in the recognition of pathogens [100];
selectins mediate cell-cell adhesion [101]; and DC-SIGN is involved in both
processes [102]. Most glycomimetic inhibition strategies reported to date have
focused on the selectins.
The selectins were first identified in 1989 as proteins that participate in
the inflammatory immune response. Three members of this class have been
identified; they are designated as E- [103], P- [104], and L-selectin [105] with
reference to the cell type on which they were discovered (endothelium, platelets,
and lymphocytes, respectively). The selectins facilitate migration of leukocytes
into tissue (Fig. 11.1-5). Although neutrophil recruitment is necessary for
an effective immune response, overrecruitment causes widespread tissue
inflammation and damage and is the known cause of asthma, septic
shock, psoriasis, and rheumatoid arthritis [106, 107]. As a result of their
biological significance, many researchers have focused on identifying selectin
inhibitors.
The selectins share many features. Each of the selectins has a similar domain
structure; they are composed of an N-terminal Ca2+-dependent lectin domain,
an epidermal growth factor repeat, and modules similar to those found in
certain complement binding proteins [108]. In the CRD, human selectins share
>50% homology. Similar to galectins, these proteins bind their substrates in
shallow, solvent-exposed pockets (Fig. 11.1-2(b)) [109]; however, the tertiary
structures of the selectins and the galectins are dramatically different, as is
their carbohydrate recognition specificity. Selectins have been shown to bind
sialyl LeX (sLeX) or the related sialyl Lea in vitro; yet in physiological settings,
each selectin appears to recognize more complex glycoconjugates. Still, the
finding that each selectin can recognize these tetrasaccharides has led to their
use as blueprints for inhibitor design.
Both monomeric and polymeric derivatives of sLeX have been generated
[72, 74, 80]. Of these sLeX-based compounds, the oligomeric and polymeric
inhibitors are the most potent. They exhibit significant increases in activity
compared to their monovalent counterparts [30, 110]. Despite these potencies,
the search for high affinity monovalent inhibitors has continued. Specifically,
the therapeutic value of selectin inhibitors has prompted considerable effort
to develop more conventional “druglike” compounds that block these
protein-carbohydrate interactions. Moreover, higher affinity monovalent
ligands could be used to generate even more potent multivalent inhibitors.
Fig. 11.1-5 Selectins mediate the rolling of white blood cells, causing them to adhere to
and then pass through the endothelium toward the site of infection.
11.1.4
Applications: Identification of Inhibitors of Protein-Carbohydrate Interactions
One of the first examples of the identification of non-carbohydrate inhibitors of
a selectin came from Kondo and coworkers. Although structural data for E- or
P-selectin bound to sLeX was unavailable at the time of their study, they utilized
hypothesized interactions between E-selectin and a previously identified sLeX
mimetic 3 [111] to devise a pharmacophore 4 (Fig. 11.1-6) [112]. They used
this model to perform a high-throughput screen of a commercially available
compound database in silico. These studies led first to the identification of
lead structures and ultimately to the design of compound 5 (Fig. 11.1-7).
Compound 5 was found to inhibit E-, P-, and L-selectin (IC50 values of 86, 6.1,
and 30 µM, respectively, in an ELISA). In subsequent investigations, analogs
of 5 were synthesized to develop structure-activity relationships, and several
compounds with the ability to differentially inhibit specific selectins were
identified [51]. It is not clear, however, whether compound 5 and its congeners
are monomeric inhibitors of the selectins. Given their long hydrophobic alkyl
substituents, compounds of this class may be acting as aggregates. Still, the
results suggest that this general approach could be explored further.
In an alternative approach, the Gravel research group utilized a cyclohexane
scaffold 6 to generate a mimetic of sLeX (Fig. 11.1-8) [113]. As in the previous
example, these experiments were performed prior to the publication of
structural data for E- or P-selectin bound to sLeX. They utilized the unbound
structures of these proteins to perform docking experiments with both the
natural ligand and their proposed library of compounds. Pivotal to the design
of their docking experiments was the hypothesis that interactions between
the 2- and 3-OH groups on the fucose unit of sLeX and the calcium ion, and
the carboxylic acid group of the sialic acid moiety and Arg97 are essential
for binding. Their modeling studies suggested that a bicyclic mimic such
as compound 6, though significantly smaller than the natural ligand, would
possess the features necessary to favorably interact with the receptor. Indeed,
these molecules did have inhibitory activity comparable to the natural ligand.
However, the authors found that both enantiomers (only one of which has a
display of hydroxyl groups similar to that of fucose) had the same activity as
measured by a cell-based competition assay (IC50 = 4.5-7.0 mM). This result,
along with the elucidation of the structure of both P- and E-selectins bound to
sLeX by X-ray crystallography [109], suggests that their model was only partially
correct. Structural data confirms that interactions between the carboxylic acid
moiety and Arg97 are important for binding. These data also indicate that it
is the 3- and 4-OH groups that are important for substrate binding, not the
hypothesized 2- and 3-OH groups. This difference likely explains the lack of
specificity of compound 6 and its enantiomer.
Fig. 11.1-6 Comparison of the previously identified sLeX mimetic 3 and the
pharmacophore design 4 (labels: hydrophobic interaction, ionic interaction,
calcium coordination).
Fig. 11.1-7 Potent inhibitor 5 of E-, P-, and L-selectins.
Fig. 11.1-8 Bicyclohexyl mimetic designed to probe E- and P-selectins.
As previously mentioned, high-throughput screening may lead to potent
inhibitors of protein-carbohydrate interactions. Some success in the selectin
field has been achieved by Slee et al., who identified several potent inhibitors
of P-selectin by screening a library of compounds in an ELISA [53]. After
initial lead identification, they performed modeling studies that suggested
ligand modifications that would enhance the activity of their ligand. They
ultimately identified a compound, 7, with very good P-selectin inhibitory activity
(IC50 = 300 nM) (Fig. 11.1-9). Which sites on P-selectin this ligand binds,
however, is not apparent. Given that the lectin interacts with glycosylated
peptide sequences that contain sulfated tyrosine residues, compound 7 may
compete with the peptide sequence. Interestingly, compound 7 bears some
structural resemblance to the Kondo inhibitor 5, suggesting that this type
of "trimodal" scaffold may be a general selectin inhibitor. As mentioned
for the Kondo ligands, it is not clear whether this compound, with its long
alkyl substituent, acts as a true monovalent inhibitor. Still, this compound is
notable because it was also found to inhibit selectin-mediated rolling in vivo
and dramatically reduce inflammation in a mouse peritonitis model.
Fig. 11.1-9 Potent inhibitor of P-selectin and selectin-mediated rolling in vivo.
The vast majority of glycomimetic studies have been targeted toward one
or two members of each lectin class. Strategies in which the same scaffold
can be used to derive specific inhibitors of different members of a large
class of proteins are even more powerful. Until recently, general scaffolds for
inhibitors of protein-carbohydrate interactions had not been described. In
contrast, peptidomimetic scaffolds such as benzodiazepines have been shown
to be useful for generating a variety of agonists and antagonists to G-protein
coupled receptors [114, 115]. Kiessling and coworkers sought to develop this
type of privileged scaffold for use in generating glycomimetics. To ascertain
whether such a strategy could be implemented, they targeted C-type lectins.
Many C-type lectins bind oligosaccharides that possess a key carbohydrate
residue with the axial-equatorial-equatorial hydroxyl orientation found in mannose
(and L-fucose). While these groups can be essential for binding, substitution
at C1 and C6 of the mannosylated (or fucosylated) ligand often varies [116].
Thus, Schuster et al. utilized shikimic acid 8 as a building block to synthesize
mannose (fucose)-like compounds 9 (Fig. 11.1-10) [55]. Functionalization of
shikimic acid through the conjugate addition of a nucleophile (i.e., a thiolate)
generates a structure that possesses the desired hydroxyl group orientation,
while introducing a site of diversity. Further library diversification can be
achieved by varying the amino acid substituent at the acid moiety (R1), adding
dithiols (R2), and subsequently functionalizing the resulting free thiol with
alkyl or benzyl bromides (R3). To test this strategy, they synthesized a focused
library of 192 compounds, which was screened for inhibition of MBP. From
this small library, they identified 10 compounds with activity comparable to or
better than the known ligand, α-methyl mannopyranoside (IC50 = 4-14 mM).
The high hit rate underscores the utility of this strategy.
Fig. 11.1-10 Shikimic acid 8 can be functionalized to yield a mannose mimetic 9.
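The hit rate referred to here follows from the two numbers given in the text (10 active compounds from a 192-member library). The short Python sketch below makes that arithmetic explicit; the function name `hit_rate` is illustrative and is not taken from the original study.

```python
def hit_rate(n_hits: int, library_size: int) -> float:
    """Return the fraction of screened compounds scored as hits."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    if not 0 <= n_hits <= library_size:
        raise ValueError("n_hits must lie between 0 and library_size")
    return n_hits / library_size


# Numbers reported in the text: 10 hits from a focused library of 192 compounds.
rate = hit_rate(10, 192)
print(f"hit rate = {rate:.1%}")  # prints "hit rate = 5.2%"
```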
11.1.5
Overview and Future Development: Inhibition of Protein-Carbohydrate
Interactions
While several researchers have identified non-carbohydrate ligands for
protein-carbohydrate interactions, this research area is still in its infancy.
To date, most inhibitors of protein-carbohydrate interactions are based on the
structure of the natural ligand. As highlighted in the above examples, rational
design has facilitated the identification of effective inhibitors. Several research
groups have developed non-carbohydrate probes of the selectins using design
strategies aimed at mimicking key interactions between the natural ligand and
the calcium ion. Nevertheless, the potent glycomimetics have features that
suggest that they may not be functioning as monomeric inhibitors. Still, it
is intriguing that the inhibitors identified in a high-throughput screen share
common features with those developed through rational design. Together,
these results emphasize the need to develop high-throughput screens that
can be used to identify new scaffolds. One of the problems in optimizing
glycomimetics is that it is often difficult to rapidly synthesize variants of the
lead compounds. Thus, it is critical that additional privileged glycomimetic
scaffolds be developed that can be readily diversified.
11.1.6
General Consideration: Inhibitors of Sugar-Nucleotide-binding Enzymes
Prokaryotes and eukaryotes devote significant resources to the biosynthesis
of glycoconjugates. As a result of recent genome-sequencing projects, more
than 7200 glycosyltransferase-related sequences have been identified. This
accounts for about 1% of the open reading frames in a given organism [56].
Nucleotide-sugar substrates are used by organisms to make multisaccharide
units or to perform posttranslational modifications. The enzymes that facilitate
these processes primarily belong to the glycosyltransferase family; however,
a number of other enzymes catalyze critical reactions that depend on
these substrates. Given the many different known roles they play, studying
the function and mechanism of these biosynthetic enzymes is of critical
importance. The lack of cell-permeable inhibitors of these enzymes, however,
is a barrier.
The importance of glycoconjugate biosynthesis is underscored by the
number of human diseases that are associated with mutations in the
biosynthetic machinery. For example, reduction in the activity of
β-1,4-galactosyltransferase-T1 [117] appears to be a factor in rheumatoid arthritis,
and loss of β-1,4-galactosyltransferase-T7 [118] activity has been implicated in
progeroid-type Ehlers-Danlos syndrome [119]. Additionally, there are a growing
number of congenital disorders of glycosylation (CDGs), multisystemic
diseases caused by defects in the synthesis and processing of N-linked glycans
[4, 5]. These glycans are involved in protein-carbohydrate interactions vital to
normal function such as T-cell clustering through the galectins [94]. Moreover,
N-linked glycoconjugates are thought to play crucial roles in protein folding,
localization, and half-life [120, 121]. The study of N-glycan biosynthesis in
mammals is complicated because when key glycosyltransferases are knocked
out in mice, embryonic lethality can result [122, 123]. Thus, the temporal
control offered by chemical inhibitors could have a major impact on our
understanding of enzymes that mediate glycan biosynthesis.
There are several natural products that block glycoconjugate biosynthesis,
including the N-glycosylation inhibitor tunicamycin 10 and glycosidase
inhibitors such as castanospermine and deoxymannojirimycin (Fig. 11.1-11).
These probes have proved valuable in a number of studies. Tunicamycin blocks
a key transphosphorylation event required for N-glycan biosynthesis, which
is initiated by the assembly of N-acetylglucosamine (GlcNAc) and a dolichol
pyrophosphate [124]. Tunicamycin inhibits this assembly and has therefore
been useful in the study of N-glycoprotein deficiency effects [125]. Most studies
using this compound have been performed in cell culture, but there have been
several reports on the effects of tunicamycin treatment on sea urchin [126],
Xenopus [127], and chick embryos [128], and recently Caenorhabditis elegans
[129]. These studies indicate that tunicamycin has a dramatic effect during
early development; thus, they highlight the utility of small molecule probes
for investigating carbohydrate-processing enzymes. Similarly, the natural
product glycosidase inhibitors can also be used to investigate glycoconjugate
biosynthesis [130, 131]. These agents, such as deoxymannojirimycin, function
as transition state inhibitors; their charged amino groups presumably mimic
the charge distribution of an oxocarbenium ion-like transition state. Thus,
nature has provided two general inhibitor strategies, both of which depend
on carbohydrate derivatives. These have spawned many efforts to design
bisubstrate inhibitors or transition state analogs. Still, many of the resulting
inhibitors have low cell permeability or lack the specificity required to target
a single enzyme. Thus, new inhibitor strategies are being sought.
Fig. 11.1-11 Tunicamycin 10 is an inhibitor of N-glycan biosynthesis, while natural
products such as 1-deoxymannojirimycin inhibit glycosidase function.
Many of these new inhibitor strategies have emerged from studying
glycoconjugate biosynthesis in prokaryotes. Indeed, sugar-nucleotide-utilizing
enzymes are of critical importance in prokaryotes. All bacteria (gram-positive,
gram-negative, and acid-fast or mycobacteria) utilize sugar-nucleotide
substrates for the construction of their cell walls. The carbohydrate-containing cell
wall acts as a formidable barrier against cellular destruction, and compromised
structural integrity can result in the loss of cell viability [132, 133]. Despite
the differing compositions of these cell walls, many of the same enzymes
are involved in their biosynthesis. Peptidoglycan is the most well studied of
these crucial cell wall components and its synthesis is a common target of
antibiotics [134, 135]. Unfortunately, traditional antibiotics have begun to lose
their effectiveness due to the emergence of antibiotic-resistant strains of many
gram-positive bacteria, gram-negative bacteria, and mycobacteria [134-136]. Common
antibiotics do not target enzymes that mediate carbohydrate biosynthesis. Thus, the
development of ligands to study and inhibit these enzymes may facilitate the
development of new antimicrobial agents.
The structurally complex mycobacterial cell wall provides an example of the
importance of carbohydrate residues (Fig. 11.1-12). In mycobacteria, the inner
lipid membrane is attached to a peptidoglycan layer, which is composed of
a complex structure of peptides and sugar moieties. The peptidoglycan layer
is tethered to an arabinogalactan layer through a rhamnose-GlcNAc sugar
linker. This rhamnose-GlcNAc disaccharide is found only in bacteria and its
biosynthetic enzymes have been shown to be essential for mycobacterial growth
[134, 137]. The arabinogalactan layer also contains sugar residues that have not
been found in humans, specifically arabinofuranose and galactofuranose [135,
138]. The enzymes involved in synthesis of the arabinogalactan are necessary
for mycobacterial viability [139-141]. Thus, development of inhibitors of the
key biosynthetic enzymes would provide valuable therapeutic leads and useful
probes of cell wall biosynthesis.
Fig. 11.1-12 Mycobacterial cell wall components.
Efforts to generate inhibitors of glycan biosynthesis suggest that agents that
function in this capacity can be identified [66, 142]. Interestingly, much of
the binding affinity for the natural sugar-nucleotide substrate arises from
nucleotide-protein interactions. Thus, inhibitors that exploit this binding
region should be more potent. Here, we will describe recent advances in
understanding and inhibiting enzymes that use nucleotide-sugar building
blocks for glycoconjugate biosynthesis. To date, several efforts have focused on
enzymes critical for bacterial cell wall assembly. These general strategies will
undoubtedly be useful for investigating eukaryotic glycoconjugate biosynthesis
pathways as well.
11.1.7
Applications: Identification of Inhibitors of Sugar-Nucleotide-binding Enzymes
11.1.7.1 Probe Identification through High-throughput Screening
The recent development of high-throughput screens for several
sugar-nucleotide-processing enzymes has aided in finding inhibitors of glycan biosynthesis.
An example of the utility of this approach is highlighted by the identification of
compounds that block members of the Mur family that use nucleotide-sugar
substrates. The Mur family of enzymes mediates peptidoglycan construction in
eubacteria (Fig. 11.1-13). MurC, MurD, MurE, and MurF are involved in the
formation of peptide bonds, and inhibitors for these enzymes have been reviewed
recently [143, 144]. In this chapter, we will focus on the enzymes that
utilize sugar-nucleotide substrates including MurA (UDP-N-acetylglucosamine
enolpyruvyltransferase), MurB (UDP-N-acetylenolpyruvylglucosamine
reductase), and MurG (glycosyltransferase).
MurA is the enzyme responsible for the first committed step in bacterial
cell wall biosynthesis, catalyzing the transfer of phosphoenolpyruvate (PEP) to
position 3 of UDP-N-acetylglucosamine (Fig. 11.1-13). As with tunicamycin,
nature has generated an inhibitor of this enzyme: the natural product antibiotic,
fosfomycin 11. Fosfomycin covalently labels a cysteine residue in the PEP
binding site of MurA and renders the enzyme inactive [145]. A structure of the
MurA-fosfomycin complex, determined by X-ray crystallographic analysis,
has provided valuable information about the complex [146]. Moreover, it has
been utilized in the design of inhibitors of this sugar-nucleotide-processing
enzyme [143, 146].
Fig. 11.1-13 Peptidoglycan synthesis proceeds through a complex set of reactions,
primarily glycosyl-transfers and amide bond-forming reactions.
Several research groups have reported the identification of non-carbohydrate
inhibitors of MurA [146-148]. For example, Bush and coworkers identified
inhibitors using a high-throughput screen of a library of compounds in an assay
that monitored formation of inorganic phosphate (Fig. 11.1-13) [148]. Three of
the identified inhibitors, 12-14, exhibit lower IC50 values than does fosfomycin
(Fig. 11.1-14). Modeling and inhibition studies were used to determine the
likely binding mode of these compounds. These data suggest that the identified
inhibitors are noncovalently binding at or near the PEP binding site, leaving
the sugar site unoccupied. These compounds are not glycomimetics, yet they
suggest that targeting unique features of the sugar-nucleotide-binding site
can lead to potent inhibitors.
High-throughput screening techniques have also been utilized to identify
inhibitors of MurG, a glycosyltransferase that mediates one of the final
steps of peptidoglycan synthesis (Fig. 11.1-13) [69]. Rather than assaying
for activity, the Walker group screened for compounds that could inhibit
binding of the substrate UDP-GlcNAc. With a fluorescence polarization
assay, they tested a commercially available library of approximately 49 000
druglike compounds, and identified several MurG inhibitors containing a
2-thioxo-4-thiazolidinone core (15, Fig. 11.1-15, Kd = 1.3 µM, IC50 = 1.4 µM)
[69, 149]. Using the MurG structure determined by X-ray crystallography
[150], they modeled the complexes to explore the possible binding mode(s) of
this scaffold. The authors suggest that the thiazolidinone heterocycle could
mimic the diphosphate moiety by engaging in hydrogen-bonding interactions.
Presumably, the carbonyl (and carbonyl-like) moieties of the heterocycle
interact with hydrogen-bond donors on the protein. The studies also suggest
that the thiazolidinone substituents interact with the uridine and sugar-binding
regions of the protein. More recently, these inhibitors have been shown to
selectively block MurG over several other enzymes that utilize similar or
identical substrates [149].
Fig. 11.1-14 Fosfomycin 11 and several other MurA inhibitors 12-14.
Fig. 11.1-15 MurG inhibitor 15 identified by Walker and coworkers.
Inhibitors of enzymes that use sugar-nucleotide substrates have also been
found in the pathway that leads to arabinogalactan synthesis in mycobacteria
[151, 152]. Arabinogalactan is composed of two sugars derived from the
donors, UDP-arabinofuranose and UDP-galactofuranose (UDP-Galf). The
biosynthetic donor of galactofuranose moieties (UDP-Galf) is synthesized by
UGM and the Galf-containing oligosaccharides are assembled by the putative
enzyme, UDP-galactofuranosyltransferase. Most efforts to explore Galf
incorporation have focused on UGM.
UGM is responsible for the isomerization of the thermodynamically favored
UDP-galactopyranose to the less favored UDP-galactofuranose (Fig. 11.1-16).
Sugar-based probes have been employed to study both UGM [153-156] and
the transferase [157], but only recently have non-carbohydrate inhibitors
been identified. The Bertozzi and McNeil groups used a design strategy
that appears similar to that used by nature for tunicamycin. Specifically,
they modified a uridine with substituents. From their uridine-based library,
they identified several inhibitors of UGM [158]. Although the results are
promising, the initial hits did not appear to be cell permeable. It will
be interesting to explore the specificity of such ligands. Although it is
not clear whether one can achieve selectivity against other UDP-sugar
binding enzymes with this strategy, tunicamycin acts selectively on its
target.
Fig. 11.1-16 UGM is responsible for the isomerization of UDP-galactopyranose to
UDP-galactofuranose (93% UDP-galactopyranose, 7% UDP-galactofuranose).
Fig. 11.1-17 Recently identified inhibitors of UGM.
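The 93% / 7% split given with Fig. 11.1-16, read here as the equilibrium composition of the UGM reaction, can be converted into an approximate standard free energy difference through ΔG° = -RT ln K. The Python sketch below is only an illustration under stated assumptions: ideal behavior and a temperature of 298.15 K, neither of which is specified in the text. The function name is hypothetical.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J mol^-1 K^-1


def delta_g_kj_per_mol(frac_product: float, frac_reactant: float,
                       temperature_k: float = 298.15) -> float:
    """Standard free energy change (kJ/mol) from equilibrium fractions.

    K = [product] / [reactant] and delta G = -R * T * ln(K).
    The default temperature is an assumption, not a value from the text.
    """
    if frac_product <= 0 or frac_reactant <= 0:
        raise ValueError("fractions must be positive")
    k_eq = frac_product / frac_reactant
    return -R_J_PER_MOL_K * temperature_k * math.log(k_eq) / 1000.0


# UDP-galactopyranose (93%) -> UDP-galactofuranose (7%), as given with Fig. 11.1-16.
dg = delta_g_kj_per_mol(0.07, 0.93)
print(f"K = {0.07 / 0.93:.3f}, delta G = {dg:+.1f} kJ/mol")
```

Under these assumptions the value is positive (roughly 6 kJ/mol), which is consistent with the description of the furanose form as the less favored isomer.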
The Kiessling group pursued an alternative approach. Although the assay
used by Bertozzi and McNeil monitored UGM activity, a high-throughput
fluorescence polarization-binding assay was used by Soltero-Higgin et al.
to identify UGM inhibitors. As with the Walker screen, the hits identified
contain a thiazolidinone or related nitrogen-containing heterocyclic core
(16-18, Kd ≥ 4.0 µM, IC50 ≥ 1.6 µM) (Fig. 11.1-17) [68]. It is intriguing that
these compounds have structural features similar to those identified for
MurG. These shared features include the five-membered ring heterocycle
and the 1,3-arrangement of the substituents. This display of functionality
likely facilitates interactions with the sugar-nucleotide-binding regions of
the protein. Unlike the most potent MurG lead 15, which displays one
aromatic and one aliphatic substituent, all the UGM inhibitors contain
two aromatic substituents. Both MurG and UGM inhibitors possess an
aromatic group, which may serve as a uracil mimetic. The second aromatic
functionality in the UGM inhibitors may act as a sugar mimic and/or
participate in hydrophobic interactions with the enzyme's cofactor, flavin
adenine dinucleotide (FAD). Soltero-Higgin et al. found that their most potent
compound shows selectivity; it blocks UGM activity but has little or no effect
on an α-1,3-galactosyltransferase. These leads, along with recently reported
information about the mechanism of this enzyme [159], can be used to develop
even more potent inhibitors.
Several inhibitors of the rhamnose synthetic pathway, a recently validated
mycobacterial target [137], have also been identified through the use of a
high-throughput screen [160]. McNeil and coworkers developed a microtiter
plate-based assay and examined 8000 commercially available compounds.
They identified 11 compounds that were active against one or more of
the enzymes in the rhamnose biosynthetic pathway (RmlB, RmlC, and
RmlD, percentage inhibition at 10 µM = 39-97%). Additionally, four of these
molecules were found to hinder Mycobacterium tuberculosis growth in culture
(minimum inhibitory concentration or MIC = 16-128 µg/mL). One of the
most intriguing findings of these studies is the structural similarity of the
inhibitors identified. Interestingly, two of the four molecules contained the
same thiazolidinone core found by Walker and Kiessling (15 and 16).
11.1.7.2 Design of Effective Probes
The heterocycles identified as inhibitors of sugar-nucleotide-binding enzymes
are similar to those identified by researchers employing design strategies.
Andres et al. successfully utilized known structural information on MurB
(Fig. 11.1-13) complexed to its substrate, enolpyruvyl uridine diphosphate
N-acetylglucosamine (EP-UNAG), to design effective inhibitors [161]. Their
design incorporates a heterocyclic core decorated with three substituents 19.
They hypothesized that the acid functionality would provide ionic interactions
equivalent to that of the natural diphosphate substrate. Using an assay in which
enzyme activity was monitored by spectrophotometric detection of cofactor
(NADPH) consumption, they identified six inhibitors. The most effective of
these (19, Fig. 11.1-18) has good potency, IC50 = 7.7 µM.
This success attracted the attention of other research groups, prompting the
development of several structurally related inhibitors. Snyder and coworkers
utilized the best hit from the aforementioned study, 19, to design compounds
with a more rigid core structure, 20 [162]. The authors point out that the scaffold
used by Andres et al. was generated as a mixture of four diastereomers. They
theorize that utilization of bioisosteric imidazolinone analogs, a core structure
that does not contain any stereogenic centers, might provide compounds
with potent activity without the difficulties associated with the biological
analysis of diastereomeric mixtures. The resulting compounds not only had
high inhibitory activity against MurB but also showed significant whole cell
antibacterial activity, which compound 19 did not have. One of the most potent
inhibitors 20 (IC50 = 15 µM, MIC = 4 µg/mL) is depicted in Fig. 11.1-18.
Fig. 11.1-18 Potent MurB inhibitors developed by Walsh 19 and Snyder 20.
To identify probes of rhamnose biosynthesis, Lee and coworkers developed
an in silico library of 3888 compounds that were based on heterocycle 19. The
authors selected RmlC, as they believed it to be the best drug target in this
biosynthetic cascade. It has high substrate specificity and a unique structure,
and it lacks a cofactor binding site. They docked these compounds into the active
site of RmlC and selected compounds with the best affinity (the top 5%) for
synthesis. They reported the synthesis of 47 of the 144 prospects (each of
the 47 compounds was synthesized as the esterified and free acid forms, for
example, 21 and 22 in Fig. 11.1-19). Although they did not find any compounds
that potently inhibit bacterial growth, they were able to identify molecules 21
and 22 that can differentiate between two similar enzymes, RmlC and RmlD
(Fig. 11.1-19) [163]. This result provides additional evidence that selective
inhibitors of nucleotide-sugar-processing enzymes can be discovered.
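The selection step in this study, ranking docked compounds and keeping the best-scoring fraction for synthesis, can be sketched in a few lines of Python. This is a generic illustration and not the authors' workflow: the compound names and scores are made up, and the convention that a more negative score means better predicted affinity is an assumption that depends on the docking program used.

```python
import math


def select_top_fraction(scores: dict[str, float], fraction: float) -> list[str]:
    """Return compound IDs in the best-scoring `fraction` of the library.

    Assumes lower (more negative) docking scores indicate better predicted
    binding; flip the sort order if the scoring function works the other way.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # ascending: best score first
    return ranked[:n_keep]


# Hypothetical library of 40 compounds with made-up scores.
library = {f"cmpd_{i:02d}": -5.0 - 0.1 * (i % 7) for i in range(40)}
library["cmpd_17"] = -9.3  # two illustrative stand-out compounds
library["cmpd_33"] = -8.8

print(select_top_fraction(library, 0.05))  # keeps the 2 best of 40 compounds
```

Ranking by a single docking score is only a first filter; in practice the selected compounds still have to be synthesized and assayed, as was done in the study described here.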
To identify inhibitors of several Mur enzymes, Mansour and coworkers
synthesized a small library (~50 members) of urea- or carbonate-containing
compounds. They discovered several effective in vitro inhibitors of both MurA
and MurB [164]. Moreover, some compounds showed good antimicrobial
activity against several gram-positive bacteria. The core of each of the most
potent inhibitors contains a urea moiety (the most potent inhibitor, 23, is
depicted in Fig. 11.1-20, MIC = 0.5-64 µg/mL, IC50 for MurA >25 µg/mL,
MurB = 19 µg/mL) [164]. Modeling studies (using MurB structural data)
suggest that these compounds occupy regions of the binding site spanned
by both the nucleotide and sugar portions of the substrate. The authors
propose that the urea occupies the phosphate-binding region and that a strong
hydrogen bond is formed between the carbonyl oxygen and an active-site lysine.
Additionally, they suggest that the two aromatic moieties could be occupying
the sugar and the nucleobase binding sites. Structural data are needed to
determine how these compounds are oriented within the binding site.
Fig. 11.1-19 Inhibitors of the rhamnose biosynthetic pathway.
Fig. 11.1-20 The most potent urea-containing inhibitor 23 of MurA and B and
bacterial growth.
11.1.8
Overview and Future Development: Inhibitors of Carbohydrate-processing
Enzymes
Despite the relatively small number of studies that have identified
non-carbohydrate inhibitors of sugar-nucleotide-processing enzymes, it is
apparent that structural commonalities exist between these inhibitors (Fig. 11.1-21).
Some authors have suggested that these core structures may be acting as
electronic mimics of the diphosphate through hydrogen-bonding interactions
with their protein-binding partners. It is also possible that these core elements
are simply effective spatial mimics of the diphosphate moiety. The oriented
display of substituents of these heterocyclic scaffolds appears to be conserved
throughout the currently developed probes, suggesting that the spatial
orientation enforced by these core elements is at least partially responsible for the
inhibitory activity of these compounds. Undoubtedly, much will be learned
from the continued pursuit of molecules based on these and similar core
structures.
While the identification of these core structures suggests a promising
direction for generating inhibitors of glycan biosynthesis, it also suggests a
potential problem. Specifically, given the aforementioned similarities between
these probes, it may be difficult or impossible to achieve selectivity for
targeting one enzyme over another. While this problem may arise, the current
data suggest that selective inhibitors can be developed. For example, despite
the large similarities between the MurG and UGM inhibitors presented here,
both the Walker and Kiessling groups report selective inhibition of their
target enzyme over related proteins [68, 149]. Thus, it seems likely that these
common core structures can be diversified to yield selective inhibitors of
many different sugar-nucleotide-utilizing enzymes. It is also possible that
information acquired from the study of bacterial sugar-processing enzymes
will provide clues for the development of probes for eukaryotic enzymes
that mediate glycan biosynthesis. In addition to its role in bacterial cell
wall biosynthesis, UGM is also found in eukaryotic parasites, such as
Leishmania, and multicellular organisms, such as C. elegans [165]. Therefore,
the thiazolidinone-based inhibitors identified for a bacterial UGM could be
tested for efficacy in a eukaryotic system. It will be intriguing to determine
whether these scaffolds or others will be identified as hits from screens with
eukaryotic enzymes. We anticipate that with the advent of cell-permeable
probes of glycan biosynthesis, a greater understanding of the roles of these
enzymes in human disease will emerge.
Fig. 11.1-21 Several structurally and/or electronically related scaffolds have been
identified.
11.1.9
Conclusion
Elucidating the biological roles of glycoconjugates is difficult. Through
genetics, molecular biology, biochemistry, and chemistry, compelling evidence has
emerged that glycoconjugates control fundamental processes ranging from
developmental patterning [166] to immune system function [167]. Despite the
power of current tools, inhibitors that can be used to explore key interactions or
biosynthetic pathways are largely lacking from our armamentarium. Still,
significant progress has been made toward the identification of potent inhibitors
of glycan biosynthesis and their utilization for understanding
carbohydrate-binding proteins and enzymes. Key elements enabling this progress are the
development of effective high-throughput assays and advances in chemical
syntheses, which provide access to defined carbohydrate substrates. It is
intriguing that common inhibitor structures have emerged from these
studies, suggesting that some scaffolds may be well suited to occupy lectin or
nucleotide-sugar-binding sites. Undoubtedly, additional scaffolds will be
uncovered as more targets are investigated. We envision that the chemical probes
that result will provide insight into the biological roles of glycoconjugates.
References
1. G.E. Ritchie, B.E. Moffatt, R.B. Sim, B.P. Morgan, R.A. Dwek, P.M. Rudd, Glycosylation and the complement system, Chem. Rev. 2002, 102, 305-319.
2. C.R. Bertozzi, L.L. Kiessling, Chemical glycobiology, Science 2001, 291, 2357-2364.
3. T. Feizi, Carbohydrate-mediated recognition systems in innate immunity, Immunol. Rev. 2000, 173, 79-88.
4. S. Grunewald, G. Matthijs, J. Jaeken, Congenital disorders of glycosylation: a review, Pediatr. Res. 2002, 52, 618-624.
5. H.H. Freeze, Human disorders in N-glycosylation and animal models, Biochim. Biophys. Acta 2002, 1573, 388-393.
6. J.B. Lowe, J.D. Marth, A genetic approach to mammalian glycan function, Annu. Rev. Biochem. 2003, 72, 643-691.
7. M.A. Schmidt, L.W. Riley, I. Benz, Sweet new world: glycoproteins in bacterial pathogens, Trends Microbiol. 2003, 11, 554-561.
8. A. Dell, H.R. Morris, Glycoprotein structure determination by mass spectrometry, Science 2001, 291, 2351-2356.
9. J. Zaia, Mass spectrometry of oligosaccharides, Mass Spectrom. Rev. 2004, 23, 161-227.
10. A. Holeman, P.H. Seeberger, Carbohydrate diversity: synthesis of glycoconjugates and complex carbohydrates, Curr. Opin. Biotechnol. 2004, 15, 615-622.
11. S.J. Keding, S.J. Danishefsky, Prospects for total synthesis: a vision for a totally synthetic vaccine targeting epithelial tumors, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11937-11942.
12. S. Hanson, M. Best, M.C. Bryan, C.-H. Wong, Chemoenzymatic synthesis of oligosaccharides and glycoproteins, Trends Biochem. Sci. 2004, 29, 656-663.
13. D. Kahne, Combinatorial approaches to carbohydrates, Curr. Opin. Chem. Biol. 1997, 1, 130-135.
14. P. Sears, C.-H. Wong, Toward automated synthesis of oligosaccharides and glycoproteins, Science 2001, 291, 2344-2350.
15. C. Leimkuhler, Z. Chen, R.G. Kruger, M. Oberthur, W. Lu, C.T. Walsh, D. Kahne, Glycosylation of glycopeptides: a comparison of chemoenzymatic and chemical methods, Tetrahedron: Asymmetry 2005, 16, 599-603.
16. P. Mowery, Z.Q. Yang, E.J. Gordon, O. Dwir, A.G. Spencer, R. Alon, L.L. Kiessling, Synthetic glycoprotein mimics inhibit L-selectin-mediated rolling and promote L-selectin shedding, Chem. Biol. 2004, 11, 725-732.
17. M.J. Grogan, M.R. Pratt, L.A. Marcaurelle, C.R. Bertozzi, Homogeneous glycopeptides and glycoproteins for biological investigation, Annu. Rev. Biochem. 2002, 71, 593-634.
18. Y. He, R.J. Hinklin, J. Chang, L.L. Kiessling, Stereoselective N-glycosylation by Staudinger ligation, Org. Lett. 2004, 6, 4479-4482.
19. D. Macmillan, A.M. Daines, Recent developments in the synthesis and discovery of oligosaccharides and glycoconjugates for the treatment of disease, Curr. Med. Chem. 2003, 10, 2733-2773.
20. W. Zhang, Fluorous tagging strategy for solution-phase synthesis of small molecules, peptides and oligosaccharides, Curr. Opin. Drug. Discov. 2004, 7, 2269-2272.
21. T. Feizi, W.G. Chai, Oligosaccharide microarrays to decipher the glyco code, Nat. Rev. Mol. Cell Biol. 2004, 5, 582-588.
22. I. Shin, S. Park, M.R. Lee, Carbohydrate microarrays: an advanced technology for functional studies of glycans, Chem. - Eur. J. 2005, 11, 2894-2901.
23. D.M. Ratner, E.W. Adams, J. Su, B.R. O'Keefe, M. Mrksich, P.H. Seeberger, Probing protein-carbohydrate interactions with microarrays of synthetic oligosaccharides, ChemBioChem 2004, 5, 379-383.
24. O. Blixt, S. Head, T. Mondala, C. Scanlan, M.E. Huflejt, R. Alvarez, M.C. Bryan, F. Fazio, D. Calarese, J. Stevens, N. Razi, D.J. Stevens, J.J. Skehel, I. van Die, D.R. Burton, I.A. Wilson, R. Cummings, N. Bovin, C.-H. Wong, J.C. Paulson, Printed covalent glycan array for ligand profiling of diverse glycan binding proteins, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17033-17038.
25. Y.C. Lee, R.T. Lee, Carbohydrate-protein interactions: basis of glycobiology, Acc. Chem. Res. 1995, 28, 321-327.
26. E.J. Toone, Structure and energetics of protein carbohydrate complexes, Curr. Opin. Struct. Biol. 1994, 4, 719-728.
27. L.L. Kiessling, N.L. Pohl, Strength in numbers: non-natural polyvalent carbohydrate derivatives, Chem. Biol. 1996, 3, 71-77.
28. R. Roy, Syntheses and some applications of chemically defined multivalent glycoconjugates, Curr. Opin. Struct. Biol. 1996, 6, 692-702.
29. B.E. Collins, J.C. Paulson, Cell surface biology mediated by low affinity multivalent protein-glycan interactions, Curr. Opin. Chem. Biol. 2004, 8, 617-625.
30. W.J. Sanders, E.J. Gordon, O. Dwir, P.J. Beck, R. Alon, L.L. Kiessling, Inhibition of L-selectin-mediated leukocyte rolling by synthetic glycoprotein mimics, J. Biol. Chem. 1999, 274, 5271-5278.
31. K. Kakehi, M. Oda, M. Kinoshita, Fluorescence polarization: analysis of carbohydrate-protein interaction, Anal. Biochem. 2001, 297, 111-122.
32. E.G. Weinhold, J.R. Knowles, Design and evaluation of a tightly binding fluorescent ligand for influenza A hemagglutinin, J. Am. Chem. Soc. 1992, 114, 9270-9275.
33. G.S. Jacob, C. Kirmaier, S.Z. Abbas, S.C. Howard, C.N. Steininger, J.K. Welply, P. Scudder, Binding of sialyl Lewis X to E-selectin as measured by fluorescence polarization, Biochemistry 1995, 34, 1210-1217.
34. R.V. Weatherman, L.L. Kiessling, Fluorescence anisotropy assays reveal affinities of C- and O-glycosides for concanavalin A, J. Org. Chem. 1996, 61, 534-538.
35. P. Sorme, B. Kahl-Knutsson, M. Huflejt, U.J. Nilsson, H. Leffler, Fluorescence polarization as an analytical tool to evaluate galectin-ligand interactions, Anal. Biochem. 2004, 334, 36-47.
36. C.T. Oberg, S. Carlsson, E. Fillion, H. Leffler, U.J. Nilsson, Efficient and expedient two-step pyranose-retaining fluorescein conjugation of complex reducing oligosaccharides: galectin oligosaccharide specificity studies in a fluorescence polarization assay, Bioconjugate Chem. 2003, 14, 1289-1297.
37. M. Mizuno, M. Noguchi, T. Imai, T. Motoyoski, T. Inazu, Interaction assay of oligosaccharide with lectin using glycosylasparagine, Bioorg. Med. Chem. Lett. 2004, 14, 485-490.
38. E.A. Smith, W.D. Thomas, L.L. Kiessling, R.M. Corn, Surface plasmon resonance imaging studies of protein-carbohydrate interactions, J. Am. Chem. Soc. 2003, 125, 6140-6148.
39. B.T. Houseman, M. Mrksich, Carbohydrate arrays for the evaluation of protein binding and enzymatic modification, Chem. Biol. 2002, 9, 443-454.
40. D.A. Mann, L.L. Kiessling, in Glycochemistry: Principles, Synthesis, and Applications, 1st ed., (Eds.: P.G. Wang, C.R. Bertozzi), Marcel Dekker, New York, 2001, pp. 221-275.
41. D.M. Ratner, E.W. Adams, M.D. Disney, P.H. Seeberger, Tools for glycomics: mapping interactions of carbohydrates in biological systems, ChemBioChem 2004, 5, 1375-1383.
42. E.W. Adams, D.M. Ratner, H.R. Bokesch, J.B. McMahon, B.R. O'Keefe, P.H. Seeberger, Oligosaccharide and glycoprotein microarrays as tools in HIV glycobiology: glycan-dependent gp120/protein interactions, Chem. Biol. 2004, 11, 875-881.
43. S. Fukui, T. Feizi, C. Galustian, A.M. Lawson, W. Chai, Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions, Nat. Biotechnol. 2002, 20, 1011-1017.
44. S. Park, M.-r. Lee, S.-J. Pyo, I. Shin, Carbohydrate chips for studying high-throughput carbohydrate-protein interactions, J. Am. Chem. Soc. 2004, 126, 4812-4819.
45. T. Feizi, F. Fazio, W. Chai, C.-H. Wong, Carbohydrate microarrays - a new set of technologies at the frontiers of glycomics, Curr. Opin. Struct. Biol. 2003, 13, 637-645.
46. M.C. Bryan, L.V. Lee, C.-H. Wong, High-throughput identification of fucosyltransferase inhibitors using carbohydrate microarrays, Bioorg. Med. Chem. Lett. 2004, 14, 3185-3188.
47. F. Fazio, M.C. Bryan, O. Blixt, J.C. Paulson, C.-H. Wong, Synthesis of sugar arrays in microtiter plate, J. Am. Chem. Soc. 2002, 124, 14397-14402.
48. H.C. Hang, C. Yu, M.R. Pratt, C.R. Bertozzi, Probing glycosyltransferase activities with the Staudinger ligation, J. Am. Chem. Soc. 2004, 126, 6-7.
49. L. Nimrichter, A. Gargir, M. Gortler, R.T. Altstock, A. Shtevi, O. Weisshaus, E. Fire, N. Dotan, R.L. Schnaar, Intact cell adhesion of glycan microarrays, Glycobiology 2004, 14, 197-203.
50. M.D. Disney, P.H. Seeberger, The use of carbohydrate microarrays to study carbohydrate-cell interactions and to detect pathogens, Chem. Biol. 2004, 11, 1701-1707.
51. H. Moriyama, Y. Hiramatsu, T. Kiyoi, T. Achiha, Y. Inoue, H. Kondo, Studies on selectin blocker. 9. SARs of non-sugar selectin blocker against E-, P-, L-selectin bindings, Bioorg. Med. Chem. 2001, 9, 1479-1491.
52. P. Sorme, Y. Qian, P. Nyholm, H. Leffler, U.J. Nilsson, Low micromolar inhibitors of galectin-3 based on 3'-derivatization of N-acetyllactosamine, ChemBioChem 2002, 3, 183-189.
53. D.H. Slee, S.J. Romano, J. Yu, T.N. Nguyen, J.K. John, N.K. Raheja, F.U. Axe, T.K. Jones, W.C. Ripka, Development of potent non-carbohydrate imidazole-based small molecule selectin inhibitors with antiinflammatory activity, J. Med. Chem. 2001, 44, 2094-2107.
54. P. Sorme, P. Arnoux, B. Kahl-Knutsson, H. Leffler, J.M. Rini, U.J. Nilsson, Structural and thermodynamic studies on cation-π interactions in lectin-ligand complexes: high-affinity galectin-3 inhibitors through fine-tuning of an arginine-arene interaction, J. Am. Chem. Soc. 2005, 127, 1737-1743.
55. M.C. Schuster, D.A. Mann, T.J. Buchholz, K.M. Johnson, W.D. Thomas, L.L. Kiessling, Parallel synthesis of glycomimetic libraries: targeting a C-type lectin, Org. Lett. 2003, 5, 1407-1410.
56. P.M. Coutinho, E. Deleury, G.J. Davies, B. Henrissat, An evolving hierarchical family classification for glycosyltransferases, J. Mol. Biol. 2003, 328, 307-317.
57. H. Wang, S. Hanash, Intact-protein based sample preparation strategies for proteome analysis in combination with mass spectrometry, Mass Spectrom. Rev. 2005, 24, 413-426.
58. S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol. 1999, 17, 994-999.
59. N.L. Pohl, Functional proteomics for the discovery of carbohydrate-related enzyme activities, Curr. Opin. Chem. Biol. 2005, 9, 76-81.
60. C.J. Zea, N.L. Pohl, Kinetic and substrate binding analysis of phosphorylase b via electrospray ionization mass spectrometry: a model for chemical proteomics of sugar phosphorylases, Anal. Biochem. 2004, 327, 107-113.
61. C.J. Zea, N.L. Pohl, General assay for sugar nucleotidyltransferases using electrospray ionization mass spectrometry, Anal. Biochem. 2004, 328, 196-202.
62. Y. Yu, K.-S. Ko, C. Zea, N.L. Pohl, Discovery of the chemical function of glycosidases: design, synthesis, and evaluation of mass-differentiated carbohydrate libraries, Org. Lett. 2004, 6, 2031-2033.
63. C.-S. Tsai, Y.-K. Li, L.-C. Lo, Design and synthesis of activity probes for glycosidases, Org. Lett. 2002, 4, 3607-3610.
64. M. Ichikawa, Y. Ichikawa, A mechanism-based affinity-labeling agent for possible use in isolating N-acetylglucosaminidase, Bioorg. Med. Chem. Lett. 2001, 11, 1769-1773.
65. D.J. Vocadlo, C.R. Bertozzi, A strategy for functional proteomic analysis of glycosidase activity from cell lysates, Angew. Chem., Int. Ed. Engl. 2004, 43, 5338-5342.
66. P. Sears, C.-H. Wong, Carbohydrate mimetics: a new strategy for tackling the problem of carbohydrate-mediated biological recognition, Angew. Chem., Int. Ed. Engl. 1999, 38, 2300-2324.
67. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nat. Rev. Genet. 2000, 1, 116-125.
68. M. Soltero-Higgin, E.E. Carlson, J.H. Phillips, L.L. Kiessling, Identification of inhibitors for UDP-galactopyranose mutase, J. Am. Chem. Soc. 2004, 126, 10532-10533.
69. J.S. Helm, Y. Hu, L. Chen, B. Gross, S. Walker, Identification of active-site inhibitors of MurG using a generalizable, high-throughput glycosyltransferase screen, J. Am. Chem. Soc. 2003, 125, 11168-11169.
70. L.L. Kiessling, J.K. Pontrello, M.C. Schuster, in Carbohydrate-Based Drug Discovery, 1st ed. (Ed.: C.-H. Wong), Wiley-VCH, Weinheim, 2003, pp. 575-608.
71. L.L. Kiessling, T. Young, K.H. Mortell, in Glycoscience: Chemistry and Chemical Biology I-III, 1st ed., (Eds.: B. Fraser-Reid, K. Tatsuta, J. Thiem), Springer, New York, 2003, pp. 1817-1861.
72. L.L. Kiessling, J.E. Gestwicki, L.E. Strong, Synthetic multivalent ligands in the exploration of cell-surface interactions, Curr. Opin. Chem. Biol. 2000, 4, 696-703.
73. M. Mammen, S.-K. Choi, G.M. Whitesides, Polyvalent interactions in biological systems: implications for design and use of multivalent ligands and inhibitors, Angew. Chem., Int. Ed. Engl. 1998, 37, 2755-2794.
74. E.E. Simanek, G.J. McGarvey, J.A. Jablonowski, C.-H. Wong, Selectin-carbohydrate interactions: from natural ligands to designed mimics, Chem. Rev. 1998, 98, 833-862.
75. J.E. Gestwicki, C.W. Cairo, L.E. Strong, K.A. Oetjen, L.L. Kiessling, Influencing receptor-ligand binding mechanisms with multivalent ligand architecture, J. Am. Chem. Soc. 2002, 124, 14922-14933.
76. H. Kamitakahara, T. Suzuki, N. Nishigori, Y. Suzuki, O. Kanie, C.-H. Wong, A lysoganglioside poly-L-glutamic acid conjugate as a picomolar inhibitor of influenza hemagglutinin, Angew. Chem., Int. Ed. Engl. 1998, 37, 1524-1528.
77. J.D. Reuter, A. Myc, M.M. Hayes, Z.H. Gan, R. Roy, D.J. Qin, R. Yin, L.T. Piehler, R. Esfand, D.A. Tomalia, J.R. Baker, Inhibition of viral adhesion and infection by sialic-acid-conjugated dendritic polymers, Bioconjugate Chem. 1999, 10, 271-278.
78. P.I. Kitov, J.M. Sadowska, G. Mulvey, G.D. Armstrong, H. Ling, N.S. Pannu, R.J. Read, D.R. Bundle, Shiga-like toxins are neutralized by tailored multivalent carbohydrate ligands, Nature 2000, 403, 669-672.
79. E.K. Fan, Z.S. Zhang, W.E. Minke, Z. Hou, C. Verlinde, W.G.J. Hol, High-affinity pentavalent ligands of Escherichia coli heat-labile enterotoxin by modular structure-based design, J. Am. Chem. Soc. 2000, 122, 2663-2664.
80. N. Kaila, B.E. Thomas, Design and synthesis of sialyl Lewis x mimics as E- and P-selectin inhibitors, Med. Res. Rev. 2002, 22, 566-601.
81. E.J. Gordon, J.E. Gestwicki, L.E. Strong, L.L. Kiessling, Synthesis of end-labeled multivalent ligands for exploring cell-surface-receptor-ligand interactions, Chem. Biol. 2000, 7, 9-16.
82. N.L. Perillo, M.E. Marcus, L.G. Baum, Galectins: versatile modulators of cell adhesion, cell proliferation, and cell death, J. Mol. Med. 1998, 76, 402-412.
83. H.-J. Gabius, H.-C. Siebert, S. Andre, J. Jimenez-Barbero, H. Rüdiger, Chemical biology of the sugar code, ChemBioChem 2004, 5, 740-764.
84. J. Seetharaman, A. Kanigsberg, R. Slaaby, H. Leffler, S.H. Barondes, X-ray crystal structure of the human galectin-3 carbohydrate recognition domain at 2.1-Å resolution, J. Biol. Chem. 1998, 273, 13047-13052.
85. R.-Y. Yang, F.-T. Liu, Galectins in cell growth and apoptosis, Cell. Mol. Life Sci. 2003, 60, 267-276.
86. R.C. Hughes, Secretion of the galectin family of mammalian carbohydrate-binding proteins, Biochim. Biophys. Acta 1999, 1473, 172-185.
87. S.F. Dagher, J.L. Wang, R.J. Patterson, Identification of galectin-3 as a factor in pre-mRNA splicing, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 1213-1217.
88. R.Y. Yang, D.K. Hsu, F.T. Liu, Expression of galectin-3 modulates T-cell growth and apoptosis, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 6737-6742.
89. J. Hirabayashi, K. Kasai, Effect of amino acid substitution by site-directed mutagenesis on the carbohydrate recognition and stability of human 14-kDa β-galactoside-binding lectin, J. Biol. Chem. 1991, 266, 23648-23653.
90. W.M. Abbott, T. Feizi, Soluble 14-kDa β-galactoside-specific bovine lectin, J. Biol. Chem. 1991, 266, 5552-5557.
91. Y.D. Lobsanov, M.A. Gitt, H. Leffler, S.H. Barondes, J.M. Rini, X-ray crystal structure of the human dimeric S-Lac lectin, L-14-II, in complex with lactose at 2.9 Å resolution, J. Biol. Chem. 1993, 268, 27034-27038.
92. D.-I. Liao, G. Kapadia, H. Ahmed, G.R. Vasta, O. Herzberg, Structure of S-lectin, a developmentally regulated vertebrate β-galactoside-binding protein, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 1428-1432.
93. K. Henrick, S. Bawumia, E.A.M. Barboni, B. Mehul, R.C. Hughes, Evidence for subsites in the galectins involved in sugar binding at the nonreducing end of the central galactose of oligosaccharide ligands: sequence analysis, homology modeling and mutagenesis studies of hamster galectin-3, Glycobiology 1998, 8, 45-57.
94. M. Demetriou, M. Granovsky, S. Quaggin, J.W. Dennis, Negative regulation of T-cell activation and autoimmunity by Mgat5 N-glycosylation, Nature 2001, 409, 733-739.
95. I. Vrasidas, S. Andre, P. Valentini, C. Bock, M. Lensch, H. Kaltner, R.M.J. Liskamp, H.-J. Gabius, R.J. Pieters, Rigidified multivalent lactose molecules and their interactions with mammalian galectins: a route to selective inhibitors, Org. Biomol. Chem. 2003, 1, 803-810.
96. N.L. Pohl, L.L. Kiessling, Scope of multivalent ligand function: lactose-bearing neoglycopolymers by ring-opening metathesis polymerization, Synthesis 1999, SI, 1515-1519.
97. S. Andre, C.J. Arnusch, I. Kuwabara, R. Russwurm, H. Kaltner, H.-J. Gabius, R.J. Pieters, Identification of peptide ligands for malignancy- and growth-regulating galectins using random phage-display and designed combinatorial peptide libraries, Bioorg. Med. Chem. 2005, 13, 563-573.
98. C.J. Arnusch, S. Andre, P. Valentini, M. Lensch, R. Russwurm, H.-C. Siebert, M.J.E. Fischer, H.-J. Gabius, R.J. Pieters, Interference of the galactose-dependent binding of lectins by novel pentapeptide ligands, Bioorg. Med. Chem. Lett. 2004, 14, 1437-1440.
99. K. Drickamer, C-type lectin-like domains, Curr. Opin. Struct. Biol. 1999, 9, 585-590.
100. K. Håkansson, K.B.M. Reid, Collectin structure: A review, Protein Sci. 2000, 9, 1607-1617.
101. W.I. Weis, M.E. Taylor, K. Drickamer, The C-type lectin superfamily in the immune system, Immunol. Rev. 1998, 163, 19-34.
102. T.B.H. Geijtenbeek, D.J. Kroopshoop, D.A. Bleijs, S.J. van Vliet, G.C.F. van Duijnhoven, V. Grabovsky, R. Alon, C.G. Figdor, Y. van Kooyk, DC-SIGN-ICAM-2 interaction mediates dendritic cell trafficking, Nat. Immunol. 2000, 1, 353-357.
103. M.P. Bevilacqua, S. Stengelin, M.A. Gimbrone, B. Seed, Endothelial leukocyte adhesion molecule 1: an inducible receptor for neutrophils related to complement regulatory proteins and lectins, Science 1989, 243, 1160-1165.
104. G.I. Johnston, R.G. Cook, R.P. McEver, Cloning of GMP-140, a granule membrane-protein of platelets and endothelium - sequence similarity to proteins involved in cell-adhesion and inflammations, Cell 1989, 56, 1033-1044.
105. L.A. Lasky, M.S. Singer, T.A. Yednock, D. Dowbenko, C. Fennie, H. Rodriguez, T. Nguyen, S. Stachel, S.D. Rosen, Cloning of a lymphocyte homing receptor reveals a lectin domain, Cell 1989, 56, 1045-1055.
106. J.G. Geng, M. Chen, K.C. Chou, P-selectin cell adhesion molecule in inflammation, thrombosis, cancer growth and metastasis, Curr. Med. Chem. 2004, 11, 2153-2160.
107. D. Marshall, D.O. Haskard, Clinical overview of leukocyte adhesion and migration: where are we now? Semin. Immunol. 2002, 14, 133-140.
108. L.A. Lasky, Selectins: interpreters of cell-specific carbohydrate information during inflammation, Science 1992, 258, 964-969.
109. W.S. Somers, J. Tang, G.D. Shaw, R.T. Camphausen, Insights into the molecular basis of leukocyte tethering and rolling revealed by structures of P- and E-selectin bound to sLeX and PSGL-1, Cell 2000, 103, 467-479.
110. E.J. Gordon, L.E. Strong, L.L. Kiessling, Glycoprotein-inspired materials promote the proteolytic release of cell surface L-selectin, Bioorg. Med. Chem. 1998, 6, 1293-1299.
111. H. Tsujishita, Y. Hiramatsu, N. Kondo, H. Ohmoto, H. Kondo, M. Kiso, A. Hasegawa, Selectin-ligand interactions revealed by molecular dynamics simulations in solution, J. Med. Chem. 1997, 40, 362-369.
112. Y. Hiramatsu, T. Tsukida, Y. Nakai, Y. Inoue, H. Kondo, Study of selectin blocker. 8. Lead discovery of a non-sugar antagonist using a 3D-pharmacophore model, J. Med. Chem. 2000, 43, 1476-1483.
113. M. De Vleeschauwer, M. Vaillancourt, N. Goudreau, Y. Guindon, D. Gravel, Design and synthesis of a new sialyl Lewis X mimetic: how selective are the selectin receptors? Bioorg. Med. Chem. Lett. 2001, 11, 1109-1112.
114. M.A. Estiarte, D.H. Rich, Burger's Medicinal Chemistry and Drug Discovery, 6th ed., (Ed.: D. Abraham), John Wiley and Sons, New York, 2003, pp. 633-685.
115. G.R. Dawson, N. Collinson, J.R. Atack, Development of subtype selective GABA(A) modulators, CNS Spectr. 2005, 10, 21-27.
116. R.T. Lee, M. Ichikawa, K. Fay, K. Drickamer, M.-C. Shao, Y.C. Lee, Ligand-binding characteristics of rat serum-type mannose-binding protein (MBP-A), J. Biol. Chem. 1991, 266, 4810-4815.
117. E.G. Berger, J. Rohrer, Galactosyltransferase - still up and running, Biochimie 2003, 85, 261-274.
118. R. Almeida, S.B. Levery, U. Mandel, H. Kresse, T. Schwientek, E.P. Bennett, H. Clausen, Cloning and expression of a proteoglycan UDP-galactose:β-xylose β-1,4-galactosyltransferase I. A seventh member of the human β4-galactosyltransferase gene family, J. Biol. Chem. 1999, 274, 26165-26171.
119. T. Okajima, S. Fukumoto, K. Furukawa, T. Urano, K. Furukawa, Molecular basis for the progeroid variant of Ehlers-Danlos syndrome. Identification and characterization of two mutants in galactosyltransferase I gene, J. Biol. Chem. 1999, 274, 28841-28844.
120. C. Hammond, I. Braakman, A. Helenius, Role of N-linked oligosaccharide recognition, glucose trimming, and calnexin in glycoprotein folding and quality control, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 913-917.
121. P. Schieffele, J. Peranen, K. Simons, N-glycans as apical sorting signals in epithelial cells, Nature 1995, 378, 96-98.
122. E. Ioffe, P. Stanley, Mice lacking N-acetylglucosaminyltransferase I activity die at mid-gestation, revealing an essential role for complex or hybrid N-linked carbohydrates, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 728-732.
123. M. Metzler, A. Gertz, M. Sarkar, H. Schachter, J.W. Schrader, J.D. Marth, Complex asparagine-linked oligosaccharides are required for morphogenic events during post-implantation development, EMBO J. 1994, 13, 2056-2065.
124. E.S. Trombetta, The contribution of N-glycans and their processing in the endoplasmic reticulum to glycoprotein biosynthesis, Glycobiology 2003, 13, 77R-91R.
125. J.S. Tkacz, O. Lampen, Tunicamycin inhibition of polyisoprenyl N-acetylglucosaminyl pyrophosphate formation in calf-liver microsomes, Biochem. Biophys. Res. Commun. 1975, 65, 248-257.
126. A. Mizoguchi, T. Mizuocki, Y. Kitazume, G. Tamura, A. Kobata, Abnormal spicule formation induced by tunicamycin in the early development of the sea-urchin embryo, Cell Struc. Funct. 1981, 6, 341-346.
127. R.S. Winning, N.C. Bols, J.J. Heikkila, Tunicamycin-inducible polypeptide-synthesis during Xenopus laevis embryogenesis, Differentiation 1991, 46, 167-172.
128. N. Zagris, M. Panagopoulou, N-glycosylated proteins interfere with the 1st cellular migration in early chick embryo, Int. J. Dev. Biol. 1992, 36, 439-443.
129. X. Shen, R.E. Ellis, K. Lee, C.-Y. Liu, K. Yang, A. Solomon, H. Yoshida, R. Morimoto, D.M. Kurnit, K. Mori, R.J. Kaufman, Complementary signaling pathways regulate the unfolded protein response and are required for C. elegans development, Cell 2001, 107, 893-903.
130. K.M. Koeller, C.-H. Wong, Emerging themes in medicinal glycoscience, Nat. Biotechnol. 2000, 18, 835-841.
131. N. Asano, Glycosidase inhibitors: update and perspectives on practical use, Glycobiology 2003, 13, 93R-104R.
132. D.S. Boyle, W.D. Donachie, MraY is an essential gene for cell growth in Escherichia coli, J. Bacteriol. 1998, 180, 6429-6432.
133. S.A. Denome, P.K. Elf, D.E. Henderson, D.E. Nelson, K.D. Young, Escherichia coli mutants lacking possible combinations of eight penicillin binding proteins: viability, characteristics, and implications for peptidoglycan synthesis, Antimicrob. Agents Chemother. 1999, 181, 3981-3993.
134. C. Walsh, Antibiotics: Actions, Origins, Resistance, ASM Press, Washington, 2003.
135. T.L. Lowary, Recent progress towards the identification of inhibitors of mycobacterial cell wall polysaccharide biosynthesis, Mini. Rev. Med. Chem. 2003, 3, 689-702.
136. G.D. Wright, Mechanisms of resistance to antibiotics, Curr. Opin. Chem. Biol. 2003, 7, 1-7.
137. Y. Ma, F. Pan, M. McNeil, Formation of dTDP-rhamnose is essential for growth of mycobacteria, J. Bacteriol. 2002, 184, 3392-3395.
138. L.L. Pederson, S.J. Turco, Galactofuranose metabolism: a potential target for antimicrobial chemotherapy, Cell. Mol. Life Sci. 2003, 60, 259-266.
139. R. Koplin, J.R. Brisson, C.J. Whitfield, UDP-galactofuranose precursor required for formation of the lipopolysaccharide O antigen of Klebsiella pneumoniae serotype O1 is synthesized by the product of the rfbD(KPO1) gene, J. Biol. Chem. 1997, 272, 4121-4128.
140. P.M. Nassau, S.L. Martin, R.E. Brown, A. Weston, D. Monsey, M. McNeil, K. Duncan, Galactofuranose biosynthesis in Escherichia coli K-12: Identification and cloning of UDP-galactopyranose mutase, J. Bacteriol. 1996, 178, 1047-1052.
141. F. Pan, M. Jackson, Y. Ma, M. McNeil, Determination that cell wall galactofuran synthesis is essential for growth of mycobacteria, J. Bacteriol. 2001, 183, 3991-3998.
142. P. Compain, O.R. Martin, Carbohydrate mimetics-based glycosyltransferase inhibitors, Bioorg. Med. Chem. 2001, 9, 3077-3092.
143. A.H. Katz, C.E. Caufield, Structure-based design approaches to cell wall biosynthesis inhibitors, Curr. Pharm. Des. 2003, 9, 857-866.
144. L.L. Silver, Novel inhibitors of bacterial cell wall synthesis, Curr. Opin. Microbiol. 2003, 6, 431-438.
145. F.M. Kahan, J.S. Kahan, P.J. Cassidy, H. Kropp, The mechanism of action of fosfomycin (phosphonomycin), Ann. N.Y. Acad. Sci. 1974, 235, 364-386.
146. E. Schonbrunn, S. Eschenburg, K. Luger, W. Kabsch, N. Amrhein, Structural basis for the interaction of the fluorescence probe 8-anilino-1-naphthalenesulfonate (ANS) with the antibiotic target MurA, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 6345-6349.
147. S. Eschenburg, M.A. Priestman, F.A. Abdul-Latif, C. Delachaume, F. Fassy, E. Schonbrunn, A novel inhibitor that suspends the induced fit mechanisms of UDP-N-acetylglucosamine enolpyruvyl transferase (MurA), J. Biol. Chem. 2005, 280, 14070-14075.
148. E.Z. Baum, D.A. Montenegro, L. Licata, I. Turchi, G.C. Webb, B.D. Foleno, K. Bush, Identification and characterization of new inhibitors of the Escherichia coli MurA enzyme, Antimicrob. Agents Chemother. 2001, 45, 3182-3188.
149. Y. Hu, J.S. Helm, L. Chen, C. Ginsberg, B. Gross, B. Kraybill, K. Tiyanont, X. Fang, T. Wu, S. Walker, Identification of selective inhibitors for the glycosyltransferase MurG via high-throughput screening, Chem. Biol. 2004, 11, 703-711.
150. Y. Hu, L. Chen, S. Ha, B. Gross, B. Falcone, D. Walker, M. Mokhtarzadeh, S. Walker, Crystal structure of the MurG:UDP-GlcNAc complex reveals common structural principles of a superfamily of glycosyltransferases, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 845-849.
151. X. Wen, D.C. Crick, P.J. Brennan, P.G. Hultin, Analogues of the mycobacterial arabinogalactan linkage disaccharide as cell wall biosynthesis inhibitors, Bioorg. Med. Chem. 2003, 11, 3579-3587.
152. K. Marotte, T. Ayad, Y. Genisson, G.S. Besra, M. Baltas, J. Prandi, Synthesis and biological evaluation of imino sugar-oligoarabinofuranoside hybrids, a new class of mycobacterial arabinofuranosyltransferase inhibitors, Eur. J. Org. Chem. 2003, 14, 2557-2565.
153. A. Caravano, D. Mengin-Lecreulx, J.-M. Brandello, S.P. Vincent, P. Sinay, Synthesis and inhibition properties of conformational probes for the mutase-catalyzed UDP-galactopyranose/furanose interconversion, Chem. - Eur. J. 2003, 9, 5888-5898.
154. Q. Zhang, H. Liu, Mechanistic investigation of UDP-galactopyranose mutase from Escherichia coli using 2- and 3-fluorinated UDP-galactofuranose as probes, J. Am. Chem. Soc. 2001, 123, 6756-6766.
155. J.N. Barlow, J.S. Blanchard, Enzymatic synthesis of UDP-(3-deoxy-3-fluoro)-D-galactose and UDP-(2-deoxy-2-fluoro)-D-galactose and substrate activity with UDP-galactopyranose mutase, Carbohydr. Res. 2000, 328, 473-480.
156. N. Veerapen, Y. Yuan, D.A.R. Sanders, B.M. Pinto, Synthesis of ammonium and selenonium ions and their evaluation as inhibitors of UDP-galactopyranose mutase, Carbohydr. Res. 2004, 339, 2205-2217.
157. S. Cren, S.S. Gurcha, A.J. Blake, G.S. Besra, N.R. Thomas, Synthesis and biological evaluation of new inhibitors of UDP-Galf transferase - a key enzyme in M. tuberculosis cell wall biosynthesis, Org. Biomol. Chem. 2004, 2, 2418-2420.
158. M.S. Scherman, K.A. Winans, R.J. Stern, V. Jones, C.R. Bertozzi, M.R. McNeil, Drug targeting Mycobacterium tuberculosis cell wall synthesis: development of a microtiter plate-based screen for UDP-galactopyranose mutase and identification of an inhibitor from a uridine-based library, Antimicrob. Agents Chemother. 2003, 47, 378-382.
159. M. Soltero-Higgin, E.E. Carlson, T.D. Gruber, L.L. Kiessling, A unique catalytic mechanism for UDP-galactopyranose mutase, Nat. Struct. Mol. Biol. 2004, 11, 539-543.
160. Y. Ma, R.J. Stern, M.S. Scherman, V.D. Vissa, W. Yan, V. Cox Jones, F. Zhang, S.G. Franzblau, W.H. Lewis, M.R. McNeil, Drug targeting Mycobacterium tuberculosis cell wall synthesis: Genetics of dTDP-rhamnose synthetic enzymes and development of a microtiter plate-based screen for inhibitors of conversion of dTDP-glucose to dTDP-rhamnose, Antimicrob. Agents Chemother. 2001, 45, 1407-1416.
161. C.J. Andres, J.J. Bronson, S.V. D'Andrea, M.S. Deshpande, P.F. Falk, K.A. Grant-Young, W.E. Harte, H.-T. Ho, P.F. Misco, J.G. Robertson, D. Stock, Y. Sun, A.W. Walsh, 4-Thiazolidinones: Novel inhibitors of the bacterial enzyme MurB, Bioorg. Med. Chem. Lett. 2000, 10, 715-717.
162. J.J. Bronson, K.L. DenBleyker, P.J. Falk, R.A. Mate, H.-T. Ho, M.J. Pucci, L.B. Snyder, Discovery of the first antibacterial small molecule inhibitors of MurB, Bioorg. Med. Chem. Lett. 2003, 13, 873-875.
163. K. Babaoglu, M.A. Page, V.C. Jones, M.R. McNeil, C. Dong, J.H. Naismith, R.E. Lee, Novel inhibitors of an emerging target in Mycobacterium tuberculosis; substituted thiazolidinones as inhibitors of dTDP-rhamnose synthesis, Bioorg. Med. Chem. Lett. 2003, 13, 3227-3230.
164. G.D. Francisco, Z. Li, D. Albright, N.H. Eudy, A.H. Katz, P.J. Petersen, P. Labthavikul, G. Singh, Y. Yang, B.A. Rasmussen, Y. Lin, T.S. Mansour, Phenyl thiazolyl urea and carbamate derivatives as new inhibitors of bacterial cell-wall biosynthesis, Bioorg. Med. Chem. Lett. 2004, 14, 235-238.
165. S.M. Beverley, K.L. Owens, M. Showalter, C.L. Griffith, T.L. Doering, V.C. Jones, M.R. McNeil, Eukaryotic UDP-galactopyranose mutase (GLF gene) in microbial and metazoal pathogens, Eukaryot. Cell 2005, 4, 1147-1154.
166. R.S. Haltiwanger, Regulation of signal transduction pathways in development by glycosylation, Curr. Opin. Struct. Biol. 2002, 12, 593-598.
167. A. Cambi, C.G. Figdor, Dual function of C-type lectin-like receptors in the immune system, Curr. Opin. Cell Biol. 2003, 15, 539-546.
11.2
Chemical Glycomics as Basis for Drug Discovery
Daniel B. Werz and Peter H. Seeberger
Outlook
Chemical glycomics uses synthetic carbohydrates and glycoconjugates to
study natural carbohydrates and glycoconjugates and their roles in important
biological processes such as inflammation, cell-cell recognition, immunological
response, metastasis, and fertilization. The development of an automated
oligosaccharide synthesizer greatly accelerates the assembly of complex,
naturally occurring carbohydrates as well as chemically modified oligosaccharide
structures, and promises to make a major impact in the field of glycobiology.
Tools such as microarrays, surface plasmon resonance (SPR), and fluorescent
carbohydrate conjugates that map interactions of carbohydrates in biological
systems are presented. Case studies of the successful application of carbohydrates
as active agents are discussed: fully synthetic oligosaccharide vaccines to
combat tropical diseases (e.g., malaria), bacterial infections (e.g., tuberculosis),
viral infections (e.g., HIV), and cancer. Aminoglycosides serve as examples of drugs
acting via carbohydrate-nucleic acid interactions, while heparin works through
carbohydrate-protein interactions. A carbohydrate-functionalized fluorescent
polymer has been shown to detect minuscule amounts of bacteria faster than
commonly used methods.
11.2.1
Introduction
Three major classes of polymers are responsible for the storage and transfer
of information in biological systems: nucleic acids, proteins, and
polysaccharides. DNA, the genetic material transferring information from
generation to generation, functions as the blueprint of life. RNA serves as a
transient repository of genetic information on the way from DNA to proteins,
but also has pivotal roles in cell division, gene expression, and catalysis. The
protein synthesis machinery, called the ribosome, consists of RNA [1]. Proteins,
the second major class of biopolymers, which are encoded by nucleic acids,
represent the catalytic machinery carrying out most of the reactions in the
cell. Proteins are also important as skeletal material of numerous organisms
to provide strength as well as flexibility. Glycosyltransferases, a special class of
enzymes, are responsible for the synthesis of carbohydrates, the third class of
biopolymers.
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
While nucleic acids and proteins are linear assemblies, carbohydrates are
structurally and stereochemically more diverse. A wide array of available
monosaccharide building blocks, together with the possibility of different
stereochemical linkages between each pair of carbohydrates, results in
tremendous complexity. Additionally, the chain length can vary widely, from
monosaccharides up to branched oligosaccharides with more than 30 building
blocks or, in the case of polysaccharides, to several thousand building blocks.
The most prominent examples of the latter type are cellulose, which is the
major constituent of plant tissues, and chitin, which forms the shells of
insects and crabs.
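The scale of this complexity can be illustrated with a rough count. The sketch
below is a minimal illustration, not a figure taken from this chapter: it
considers unbranched chains only and assumes, hypothetically, that each
glycosidic bond can be formed at one of four hydroxyl positions in either the
alpha or beta configuration. The function names and default parameters are
illustrative assumptions.

```python
def linear_polymer_count(n_monomer_types: int, length: int) -> int:
    """Number of sequences when every bond is of one fixed type
    (as in peptides or oligonucleotides)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return n_monomer_types ** length


def linear_oligosaccharide_count(
    n_monomer_types: int,
    length: int,
    linkage_positions: int = 4,  # ASSUMPTION: free hydroxyls per acceptor sugar
    anomers: int = 2,            # alpha or beta configuration
) -> int:
    """Number of unbranched oligosaccharides under the stated assumptions.

    Each of the (length - 1) glycosidic bonds independently picks a linkage
    position and an anomeric configuration. Branching is ignored, so the
    real diversity is larger than this count.
    """
    sequences = linear_polymer_count(n_monomer_types, length)
    return sequences * (linkage_positions * anomers) ** (length - 1)


if __name__ == "__main__":
    # Four monomer types is a hypothetical value chosen for comparison only.
    for n in (2, 3, 6):
        print(n, linear_polymer_count(4, n), linear_oligosaccharide_count(4, n))
```

With four hypothetical monomer types, a trimer has 64 possible sequences as a
fixed-linkage polymer but 4096 under these assumptions, before any branching
is considered.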
Moreover, oligosaccharides are present in the form of glycoconjugates in
all cell walls, where they mediate a variety of events such as inflammation,
cell-cell recognition, immunological response, metastasis, and fertilization
[2]. The carbohydrate coat surrounding a cell, called the glycocalyx, is
specific for a particular species, its cell type, and its developmental status.
Alterations in cell-surface oligosaccharides have been found in association
with many pathological conditions such as cancer and tuberculosis.
Usually, the desired glycoconjugates exist in heterogeneous mixtures from
which they are difficult to isolate in pure form, and when isolation is
possible, only small amounts are obtained.
For the other two major classes of biopolymers, many tools are available
to elucidate their structure, their function, and their structure-function rela-
tionships. Detailed insights into protein-protein interactions, protein-nucleic
acid interactions, and nucleic acid-nucleic acid interactions have been gained
(Fig. 11.2-1). This research has been of fundamental importance for the
development of new therapeutics that aim to modify, enhance, or disrupt
these interactions. In contrast, carbohydrates, although studied for more than
100 years, have attracted less interest in the field of drug discovery. Forty
years ago, biochemical research concerning carbohydrates was focused on
their role in energy storage and supply in biological systems. Biosynthesis and
biodegradation pathways were discovered, but the function of carbohydrates
in biologically important recognition processes became evident much later.
Thus, all aspects of glycobiology, now often termed glycomics, are still not as
well understood as those of its two counterparts, genomics and proteomics,
which deal with nucleic acids and proteins.
The era of biotechnology was initiated by two major breakthroughs that
paved the way for further developments in biochemical research. First, the
sequencing of nucleic acids and proteins has been automated and allows
the composition of an unknown sample to be determined quickly and reliably
[3]. Second, the synthesis of defined oligonucleotides [4] and peptides [5]
has also been automated and even allows nonspecialists to rapidly obtain
larger quantities of these important classes of biopolymers. The
rational design of specific modifications has come within reach and is an
important research tool in biomedicine, biotechnology, and pharmaceutics.
In contrast, oligosaccharide sequencing and structure determination remain
a difficult task, even though major efforts have been directed toward
the improvement of modern analytical methods such as high-performance
liquid chromatography (HPLC), two-dimensional nuclear magnetic resonance
(NMR) techniques, and special mass spectrometric methods such as
electrospray ionization and matrix-assisted laser desorption/ionization [6].
Until recently, access to pure oligosaccharides remained technically difficult
and extremely time-consuming. Multiple chemical [7] and enzymatic methods [8]
are known, and an automated method has been developed, but no general approach
has evolved to date.
Fig. 11.2-1 Interactions of the three main classes of biopolymers:
protein-protein interactions (proteomics), nucleic acid-nucleic acid
interactions, and carbohydrate-carbohydrate interactions (glycomics).
11.2.2
Automated Carbohydrate Synthesis
Analogous to the highly efficient synthesis of peptides and oligonucleotides,
solid-phase synthesis has been used for the automated assembly of
oligosaccharides [9, 10]. Two advantages of the solid-phase approach are
noteworthy: the use of excess reagent drives reactions to completion, and
purification after each reaction step is not required because washing
procedures remove excess reagents [9, 10].
Our laboratory decided to utilize an acceptor-bound approach for the
carbohydrate assembly, whereby the anomeric position of the first carbohydrate
is attached at its reducing end to the solid support [9, 10]. For this
approach, glycosyl phosphates [11] and glycosyl trichloroacetimidates [12]
proved to be ideal glycosylating agents that are relatively stable and can be
stored for many months in the refrigerator. Glycosyl phosphates are readily
synthesized by a one-pot procedure starting from differentially protected
glycals. Epoxidation with dimethyldioxirane (DMDO) is followed by opening of
the 1,2-anhydrosugar with dibutyl phosphate. Protection of the resulting C2
hydroxyl group then gives the desired glycosyl phosphates in good to excellent
yield [11]. Glycosylation reactions in the presence of trimethylsilyl triflate
(TMSOTf) proceed in good yield, with reaction times that usually range between
10 and 30 minutes. Selectivity at the anomeric center is achieved by using
appropriate participating or nonparticipating groups at the C2 hydroxyl.
Easily and selectively removable temporary protecting groups such as Fmoc
(9-fluorenylmethoxycarbonyl), which is cleaved by weak bases, have been shown
to be important for successful oligosaccharide syntheses [13]. Orthogonal
protecting groups are utilized in concert to access branched oligosaccharides
[13, 14]. In addition to a useful protecting group strategy, the next strategic
consideration involves the choice of an appropriate resin and the right linker
connecting the first sugar at its reducing end with the solid support. The
linker has to be compatible with a wide range of reaction conditions applied
during oligosaccharide assembly. However, after the synthesis is completed,
rapid and efficient cleavage is necessary. Two linkers that are readily
connected to Merrifield's resin have been shown to fulfill these requirements:
an alkene-containing linker [15], which is released from the solid support by
olefin cross-metathesis using Grubbs' catalyst and ethylene, and an
ester-containing linker, which is cleaved by strong bases such as methanolate
[13]. The latter linker can be used only when the deprotecting sequences during
oligosaccharide assembly avoid strongly basic conditions. Furthermore, novel
capping and tagging methods [16] developed for automated synthesis help to
greatly simplify the postsynthetic workup and purification of synthetic
oligosaccharides. Following each coupling step, unreacted hydroxyl groups that
may give rise to shorter carbohydrate sequences are treated with a capping
reagent that renders them silent in subsequent couplings.
Usually, branched carbohydrates such as the Lewis antigens have been
synthesized in solution by highly convergent routes [17, 18]. The Lewis X
pentasaccharide, the Lewis Y hexasaccharide, and dimeric combinations
of Lewis antigens, including the Le^y-Le^x nonasaccharide, are blood group
determinant oligosaccharides. The latter two also act as tumor markers that
are currently being explored in cancer therapy [19]. A retrosynthesis of the
fully protected Lewis blood group oligosaccharides 1-3 is shown in
Scheme 11.2-1. With our sequential strategy using a small number of glycosyl
donors 4-8 as building blocks, an automated solid-phase synthesis of these
biologically important compounds was possible [13].
Scheme 11.2-1 Retrosynthesis of the protected Lewis X pentasaccharide 1, Lewis Y
hexasaccharide 2, and Le^y-Le^x nonasaccharide 3 indicates monosaccharide
building blocks 4-8. Bn - benzyl, Bu - butyl, Fmoc - 9-fluorenylmethoxycarbonyl,
Lev - levulinoyl, Piv - pivaloyl, TCA - trichloroacetyl. [Structures of the
glycosyl phosphate building blocks 4-8 not reproduced.]
Activation of the glycosyl phosphate monomers 4-8 was carried out at
-15 °C in dichloromethane under acidic conditions with the Lewis acid TMSOTf.
Removal of Fmoc was accomplished by treatment with excess piperidine,
whereas the levulinoyl group was removed by treatment with a solution of
hydrazine. The coupling as well as the deprotection steps were repeated at least
twice to ensure high coupling efficiencies and a single deprotection event. A
general cycle for the installation of one building block is shown in
Table 11.2-1.
Repetition of these cycles (Scheme 11.2-2) with the corresponding building
blocks completed the assembly of the penta-, the hexa-, and the
nonasaccharide, respectively. The total times for assembly of the carbohydrate
skeleton were 12 h for 1, 14 h for 2, and 23 h for 3 [13].
Cleavage of the ester linker from the resin using a solution of sodium
methanolate over a period of 6 h provided the crude oligosaccharides. HPLC
purification produced the fully protected Lewis X pentasaccharide 1, Lewis Y
hexasaccharide 2, and Le^y-Le^x nonasaccharide 3 in 12.6, 9.9, and 6.5% yields,
respectively [13].
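The timing and yield figures above can be put on a per-cycle basis with simple
arithmetic. The sketch below is only illustrative. It sums the step times that
Table 11.2-1 lists (the four washes after deprotection carry no time in the
table and are left out), assumes that the 34 min deprotection belongs to
piperidine and the 80 min one to hydrazine, and assumes one coupling cycle per
monosaccharide unit. The function and variable names are hypothetical.

```python
# Step times in minutes, as listed in Table 11.2-1.
COUPLE_MIN = 21
WASH_MIN = 9
# ASSUMPTION: 34 min pairs with piperidine (Fmoc), 80 min with hydrazine (Lev).
DEPROTECTION_MIN = {"piperidine": 34, "hydrazine": 80}

# name: (monosaccharide units, assembly time in h, isolated overall yield)
TARGETS = {
    "Lewis X pentasaccharide 1": (5, 12, 0.126),
    "Lewis Y hexasaccharide 2": (6, 14, 0.099),
    "Le^y-Le^x nonasaccharide 3": (9, 23, 0.065),
}


def listed_cycle_minutes(deprotection: str) -> int:
    """Two couplings, two washes, and one deprotection; the final washes,
    whose times are not listed in the table, are not included."""
    return 2 * (COUPLE_MIN + WASH_MIN) + DEPROTECTION_MIN[deprotection]


def average_yield_per_cycle(overall_yield: float, n_cycles: int) -> float:
    """Geometric-mean yield per cycle that reproduces the overall yield."""
    return overall_yield ** (1.0 / n_cycles)


if __name__ == "__main__":
    for reagent in DEPROTECTION_MIN:
        print(reagent, listed_cycle_minutes(reagent), "min of listed steps")
    for name, (units, hours, overall) in TARGETS.items():
        per_cycle = average_yield_per_cycle(overall, units)
        print(f"{name}: {hours / units:.1f} h per unit, "
              f"{per_cycle:.0%} average yield per cycle")
```

Under these assumptions the isolated yields correspond to an average of
between about 65% and 75% per cycle. Because the overall yields also include
cleavage from the resin and HPLC purification, this average understates the
efficiency of the coupling and deprotection steps themselves.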
Table 11.2-1 General cycle used with glycosyl phosphates for the
construction of oligosaccharides 1-3
Step  Function      Reagent                                 Time (min)
1     Couple        5 equiv donor and 5 equiv TMSOTf        21
2     Wash          Dichloromethane                         9
3     Couple        5 equiv donor and 5 equiv TMSOTf        21
4     Wash          N,N-Dimethylformamide (DMF)             9
5     Deprotection  3 x 175 equiv piperidine in DMF or      34 or 80
                    5 x 10 equiv hydrazine in DMF
6     Wash          N,N-Dimethylformamide (DMF)
7     Wash          0.2 M acetic acid in tetrahydrofuran
8     Wash          Tetrahydrofuran
9     Wash          Dichloromethane
Scheme 11.2-2 Automated oligosaccharide synthesis with glycosyl phosphates.
Initial glycosylation of resin-bound acceptor produces a coupling product that
may be subsequently deprotected. Iteration of coupling and deprotection cycles
with phosphate donors 4-8 followed by cleavage of the resin-bound
oligosaccharides and purification gives 1-3.
11.2.3
Tools for Glycomics
Once a carbohydrate structure of biological interest has been synthesized,
several tools [20] to map the interactions of the carbohydrates in biological
systems are at the disposal of today's glycobiologist. Figure 11.2-2 provides
an overview of tools including modified surfaces for microarrays and surface
plasmon resonance (SPR), monovalent fluorescent conjugates, neoglycoproteins
and carbohydrate vaccines, multivalent quantum dot conjugates,
affinity-tagged saccharides, derivatized magnetic particles, and latex
microspheres. All these methods rely on clever linking chemistries.
Amine-containing linkers are able to react with amine-reactive substrates such
as activated esters. Analogously, carboxyl group-containing linkers react with
amine-containing molecules. Furthermore, thiol-containing linkers react readily
with maleimide and iodoacetyl moieties and vice versa. In addition,
thiol-containing moieties show a high affinity for gold surfaces.
Fig. 11.2-2 Tools for glycobiology: a - modified surfaces for microarrays and
surface plasmon resonance (SPR), b - monovalent fluorescent conjugates,
c - neoglycoproteins and carbohydrate vaccines, d - multivalent quantum dot
conjugates, e - future neoglycoconjugates, f - affinity tag conjugates,
g - magnetic particle conjugates, h - latex microsphere and sepharose affinity
resin conjugates.
One special linker has been devised for most tools described in this chapter
(Scheme 11.2-3). 2-(2-(2-Mercaptoethoxy)ethoxy)ethanol was selected due to its
compatibility with existing synthetic methods, the ease of temporarily masking
the thiol functionality with a protecting group, and the readily applicable
thiol-based conjugation chemistry.
Scheme 11.2-3 2-(2-(2-Mercaptoethoxy)ethoxy)ethanol as a linker for preparing
neoglycoconjugates: a - Linker synthetically incorporated into reducing end of
mono- or oligosaccharide. b - All protecting groups removed from carbohydrate
and thiol. c - Reduced thiol coupled to maleimide or iodoacetyl functionalized
structure (chip, bead, resin, fluorescent dye, quantum dot, etc.).
11.2.3.1 Carbohydrate Microarrays

Microarrays [21] in the "chip" format, prepared by attachment of biopolymers
to a surface in a spatially discrete pattern, have enabled a low-cost
and high-throughput methodology for screening interactions involving these
molecules. The most important advantage compared to classical methods is that
microarrays allow several thousand binding events to be screened in parallel,
whereby the experiment requires only minuscule amounts of both analyte and
ligand. Thus, binding profiles and lead structures are readily examined.
Miniaturization through the construction of microarrays is particularly well
suited to all investigations in the field of glycomics [22]. In contrast to the
other two classes of biopolymers, no biological amplification strategy such as
the polymerase chain reaction (PCR) or cloning exists to produce usable
quantities of complex oligosaccharides. Therefore, the miniaturized assay
format is the method of choice to perform several experiments with only
minuscule molar quantities of compound.
Hitherto, many methods for the preparation of carbohydrate microarrays
have been described, such as nitrocellulose-coated slides for
noncovalent immobilization of microbial polysaccharides [23] and
self-assembled monolayers modified by Diels-Alder mediated coupling of
cyclopentadiene-derivatized oligosaccharides [24], to name just two.
Unfortunately, the first method requires large polysaccharides or
lipid-modified sugars for the noncovalent interaction. The latter method
requires the preparation of oligosaccharides bearing the sensitive
cyclopentadiene moiety.
In our laboratory, the best results were obtained by utilizing maleimide
functionalization of glass slides and the immobilization of the oligosaccharides
with thiol-containing linkers. However, with this linker system two methods of
surface functionalization should be distinguished: One presents a relatively low
density of immobilized oligosaccharides and excellent resistance to nonspecific
binding of proteins to the chip surface. The other permits a high-density
immobilization of carbohydrates, and therefore, allows for the examination of
oligosaccharide clusters at the surface.
11.2.3.2 Hybrid Carbohydrate/Glycoprotein Microarrays

A chip containing both carbohydrates and glycoproteins permits the rapid
determination of the context of binding to the glycoprotein. Incubation of
proteins with this hybrid array establishes whether the peptide context is
essential for binding or the carbohydrate structure alone is sufficient. To
prepare these slides, the glass surface is usually modified with two different
chemistries, for example, on one side a maleimide chemistry, and on the other
an N-hydroxysuccinimide (NHS) activated ester.
11.2.3.3 Microsphere Arrays

In contrast to common microarrays, the microsphere system uses optical
methods to define the position and structure of a carbohydrate series
[25]. Incubation of the immobilized microspheres with a fluorophore-labeled
carbohydrate-binding protein and subsequent measurement of the
fluorescence signals permits a determination of the binding profile. A binding
event is registered when a bead emits at both the wavelength of its internal
code, which is used as a marker for the oligosaccharide attached to the
microsphere, and the wavelength of the fluorophore-labeled protein.
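The decoding rule described above can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not the published analysis: the
wavelengths, intensities, threshold, and sugar names are hypothetical, and each
bead reading is assumed to be a mapping from emission wavelength (nm) to
measured intensity.

```python
def decode_beads(readings, code_book, protein_nm, threshold):
    """Group beads by the oligosaccharide their internal code identifies and
    record, for each bead, whether the labeled protein was also detected.

    readings   -- list of {wavelength_nm: intensity} dicts, one per bead
    code_book  -- {code_wavelength_nm: oligosaccharide name}
    protein_nm -- emission wavelength of the fluorophore-labeled protein
    threshold  -- minimum intensity counted as a signal
    """
    calls = {}
    for emission in readings:
        lit = {nm for nm, value in emission.items() if value >= threshold}
        codes = lit & set(code_book)
        if len(codes) != 1:
            continue  # code unreadable or ambiguous: skip this bead
        sugar = code_book[codes.pop()]
        calls.setdefault(sugar, []).append(protein_nm in lit)
    return calls


if __name__ == "__main__":
    # All values below are hypothetical and chosen only for illustration.
    code_book = {520: "sugar A", 580: "sugar B"}
    readings = [
        {520: 900, 670: 850},  # code A plus protein signal: binding
        {580: 950, 670: 20},   # code B, no protein signal
        {520: 880, 670: 40},   # code A, no protein signal
    ]
    print(decode_beads(readings, code_book, protein_nm=670, threshold=100))
```

Keeping one call per bead rather than a single yes/no per sugar makes it
possible to see how consistently replicate beads report binding.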
11.2.3.4 Surface Plasmon Resonance (SPR)

SPR provides quantitative, real-time insight into the binding of analytes to
ligands [26]. For SPR experiments, one of the interacting species is
immobilized on the surface of a chip. The prospective binding partner is flowed
over the chip. During this process, the refractive index at the chip surface
changes owing to the interaction as well as the accumulation of analyte. The
kinetic data obtained in this fashion allow one to calculate association and
dissociation constants from sub-microgram quantities of material. There is no
need to label the ligand or the analyte, and any influence of a label on the
binding affinities can be excluded. A further advantage is that these
measurements permit evaluations of low and high affinity interactions. SPR is
on its way to becoming an extremely powerful tool in glycomics, since
structure-activity relationships are quickly assessed.
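How rate constants relate to the measured response can be sketched with the
simplest kinetic picture. The example below assumes a 1:1 (Langmuir)
interaction, which is an assumption made here for illustration and may not hold
for multivalent carbohydrate-protein binding. The rate constants,
concentrations, and function names are hypothetical.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant K_D = k_off / k_on (in M)."""
    return k_off / k_on


def equilibrium_response(conc: float, k_d: float, r_max: float) -> float:
    """Steady-state response for analyte concentration conc (in M)."""
    return r_max * conc / (conc + k_d)


def association_response(t, conc, k_on, k_off, r_max):
    """Response at time t (s) after analyte injection starts, 1:1 model."""
    k_obs = k_on * conc + k_off
    r_eq = equilibrium_response(conc, dissociation_constant(k_on, k_off), r_max)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r_start, k_off):
    """Response at time t (s) after switching back to buffer."""
    return r_start * math.exp(-k_off * t)


if __name__ == "__main__":
    # Hypothetical values: k_on in 1/(M s), k_off in 1/s, response in RU.
    k_on, k_off, r_max = 1.0e5, 1.0e-3, 100.0
    k_d = dissociation_constant(k_on, k_off)
    print(f"K_D = {k_d:.1e} M")
    for t in (0, 60, 300):
        print(t, round(association_response(t, k_d, k_on, k_off, r_max), 2))
```

In this model an analyte concentration equal to K_D gives half of the maximal
response at steady state, which is one common way to read an affinity from a
concentration series.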
11.2.3.5 Fluorescent Carbohydrate Conjugates

Microarrays do not represent ideal formats for the examination of monovalent
protein-carbohydrate interactions. Commonly, the densities of the immobi-
lized oligosaccharides are too high to ensure that monovalent interactions
are observed. Another limitation of the array technique is the requirement of
purified receptor. Therefore, another more appropriate approach is needed to
study interactions with cells.
Monovalent and multivalent fluorescent probes can be utilized to evaluate
the influence of oligosaccharide clustering on recognition by cell-surface
lectins. Fluorescence microscopy and flow cytometry are appropriate methods
to visualize the corresponding receptor-carbohydrate interactions.
11.2.3.6 Carbohydrate Affinity Screening

In contrast to the array technique that usually utilizes purified receptors,
this synthetic tool facilitates the isolation and purification processes of
carbohydrate-binding proteins [20]. Crude mixtures or biological extracts are
separated by carbohydrate-containing affinity columns. Thus, this purification
method also provides information about the interaction of carbohydrates with
other biopolymers.
11.2.4
Oligosaccharide Conjugate Vaccines: Malaria and HIV
In addition to serving as tools, carbohydrates also hold great potential as
vaccines, as small amounts of antigen can be used to protect a large number
of people. Immunological investigations using fully synthetic carbohydrate
vaccines have shown very promising results in the treatment of various
diseases. These diseases include cancer, bacterial infections such as
tuberculosis, and tropical diseases such as leishmaniasis and malaria.
The malaria parasite Plasmodium falciparum, infecting 5-10% of the human
population worldwide, accounts for about 100 million clinical cases and
the death of more than 2 million people annually, caused by the malaria
toxin [27]. Therefore, the development of a malaria vaccine is of the
highest importance. Glycosylphosphatidylinositol (GPI), which is released
when parasites rupture the host's red blood cells, has the properties predicted
of this mortality-inducing toxin [28]. Experiments demonstrated that anti-GPI
vaccination can prevent malarial pathology in an animal model [29].
To prepare this antigen, the synthetic hexasaccharide malaria toxin 9
(Fig. 11.2-3) [30] was reacted with a linker and conjugated to a
maleimide-activated carrier protein. Mice treated with chemically synthesized
GPI attached to the protein were substantially protected from death by malaria.
Between 60 and 75% of the vaccinated mice survived, whereas the survival
rate for unvaccinated mice was only 0-9%. It should be noted that only
minuscule amounts (10^-9 to 10^-7 g per person) of the hexasaccharide 9, which
was partly assembled by automated synthesis, are necessary to perform the
vaccination. This study suggests that GPI is a highly conserved endotoxin
of malarial parasite origin. The preclinical model revealed that a nontoxic
GPI oligosaccharide coupled to a carrier protein is immunogenic and
provides significant protection against malarial pathogenesis. An antitoxic
oligosaccharide vaccine against malaria might be within reach.
The elucidation of HIV envelope glycoprotein interactions with prospective
binding partners advances our understanding of the viral entry and provides a
basis for the design of new vaccines interfering with HIV entry. Using the chip
format, interactions of carbohydrates decorating the viral surface envelope
proteins with receptors are readily discovered. Relevant substructures that are
important for binding can be identified simultaneously when the arrays are
composed of a series of closely related analogs [31].
Fig. 11.2-3 The anti-toxin malaria GPI vaccine candidate 9.

One important carbohydrate structure found on the HIV envelope glycoprotein
gp120 is the triantennary N-linked mannoside (Man)9(GlcNAc)2. Utilizing
a variety of synthetic mannose-containing substructures 10-16 (Fig. 11.2-4(a)),
a chip with a wide range of concentrations was printed to establish a
saturation point for observed binding to a fluorescently labeled protein [31].
Thus, a carbohydrate-binding profile can be established for a given protein by
comparing the integrated fluorescence of different spots.
Incubation of these arrays with a series of different gp120-binding
proteins (ConA, 2G12, Cyanovirin-N, DC-SIGN, and Scytovirin-N) allowed
a precise evaluation of their binding profiles [31]. Figure 11.2-4(b) shows the
corresponding chips. The experiments with 2G12 showed no binding with 12,
15, and 16, suggesting that a Manα1-2Man linkage, the only structural motif
in common, is necessary for recognition by 2G12. In contrast, Scytovirin-N,
a protein that was isolated from the cyanobacterium Scytonema varium, binds
only to the structures 10 and 14. This result clearly illustrates that a
different structural motif within the oligosaccharide is recognized by
Scytovirin-N. The terminal Manα1-2Man linkage, together with the underlying
α1-6 trimannoside moiety, is necessary for Scytovirin-N binding. These studies
also corroborate that these proteins can bind high-density arrays of
Manα1-2Man-containing oligosaccharides in the absence of the polypeptide
backbone.
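The reasoning from binding patterns to recognition motifs is essentially set
logic, which the following sketch makes explicit. It uses only the structure
numbers quoted above and assumes that every structure not reported as a
non-binder did bind 2G12. The helper `motifs_shared_by` is hypothetical; it
expects a motif table that the reader would have to compile from
Fig. 11.2-4(a), since the text does not list the motifs of each structure.

```python
STRUCTURES = frozenset(range(10, 17))        # mannose substructures 10-16
NON_BINDERS_2G12 = frozenset({12, 15, 16})   # no binding observed with 2G12
BINDERS_SCYTOVIRIN = frozenset({10, 14})     # the only Scytovirin-N binders


def binders(non_binders, all_structures=STRUCTURES):
    """Structures that bind, ASSUMING all others on the chip were bound."""
    return all_structures - non_binders


def motifs_shared_by(structures, motifs_by_structure):
    """Motifs present in every listed structure; these are candidates for
    the element a protein recognizes. The motif table is user-supplied."""
    motif_sets = [set(motifs_by_structure[s]) for s in structures]
    return set.intersection(*motif_sets) if motif_sets else set()


if __name__ == "__main__":
    binders_2g12 = binders(NON_BINDERS_2G12)
    print("2G12 binders:", sorted(binders_2g12))
    # Scytovirin-N binds a subset of the 2G12 binders, consistent with it
    # needing an additional structural element beyond the 2G12 motif.
    print("Scytovirin-N subset of 2G12 binders:",
          BINDERS_SCYTOVIRIN <= binders_2g12)
    print("Bound by 2G12 only:", sorted(binders_2g12 - BINDERS_SCYTOVIRIN))
```

A motif that appears in all binders and in none of the non-binders is the
natural candidate for the recognition element; the sketch only computes the
first half of that test and leaves the motif assignments to the reader.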
11.2.5
Carbohydrate-Nucleic Acid Interactions: Aminoglycosides
Aminoglycosides represent a family of naturally occurring
pseudooligosaccharides that consist of two to five monomers and a one-to-one
ratio between the amino and hydroxy groups. Clinically, these compounds have
been used to treat infectious diseases induced by a variety of gram-negative
bacteria. Aminoglycosides exert their antibiotic activity by binding to
bacterial ribosomes and thereby inhibiting protein synthesis. Most commonly,
aminoglycosides bind to the A site in the small ribosomal subunit (30S) of the
bacterial ribosome, resulting in misreading during the translational process.
Not surprisingly, charge interactions between amino groups and the phosphate
backbone dominate as binding forces in these aminoglycoside-RNA complexes. As
with many other antibiotics, the efficacy of aminoglycosides has been
compromised by the emergence of resistant bacterial strains [32, 33]. The most
prominent mechanisms that cause resistance are enzymatic modifications of the
aminoglycoside, including N-acetylation and O-phosphorylation. These
modifications result in a large decrease in binding affinity to the therapeutic
target [34].
To facilitate the discovery of safer and more active aminoglycosides,
high-throughput methods are necessary. Microarray techniques enable medicinal
chemists to identify weak binders to resistance-causing enzymes and tight
binders to ribosomal RNA. Recently, our laboratory reported the construction
of aminoglycoside microarrays to study antibiotic resistance [35, 36].
The antibiotics were immobilized on amine-reactive glass slides using a
DNA arraying robot. Two aminoglycoside acetyltransferase resistance enzymes,
2'-acetyltransferase (AAC(2')) from Mycobacterium tuberculosis [37] and
6'-acetyltransferase (AAC(6')) from Salmonella enterica [38], were used as
examples. Incubation of the aminoglycoside arrays with these enzymes revealed
that each aminoglycoside interacts with both enzymes. Comparison with
calorimetric studies of aminoglycoside-binding affinities to AAC(6') [39] found
a strong correlation with the array results. Arrays were also incubated with
two different RNA sequences to determine binding specificity for bacterial and
human A-site RNA.
Fig. 11.2-4 (a) Synthetic substructures of the triantennary N-linked mannoside
including thiol-containing linker for immobilization and conjugation chemistry.
(b) Carbohydrate microarrays containing synthetic mannose 10-16 and galactose,
printed at 2 mM. Each carbohydrate is spotted with a diameter of approximately
100-200 µm. False color image of incubations with fluorescently labelled ConA,
2G12, CVN, DC-SIGN, and Scytovirin. [Structures 10-16 and array images not
reproduced.]
To facilitate the discovery of inhibitors of resistance-causing enzymes, a
library of aminoglycoside mimetics was synthesized and immobilized.
Guanidinoglycosides [40] (Fig. 11.2-5) were chosen as aminoglycoside analogs
for several reasons. First, guanidinoglycosides can be readily prepared from
aminoglycosides. Second, the increased positive charge due to the larger
number of nitrogen-containing guanidino groups may allow guanidinoglycosides
to bind more tightly to the negatively charged aminoglycoside binding
pocket [41]. Third, the large difference in the pKa values of guanidino and
amino groups (12.5 vs. 8.8) suggests that guanidinoglycosides are likely not
substrates for acetyltransferases such as AAC(2') and AAC(6'). As anticipated,
guanidinoglycosides showed higher affinity for resistance-causing enzymes
than the corresponding aminoglycosides. Guanidinoglycosides do not serve as
substrates and inhibit acylation of several clinically important antibiotics.
This promising approach is valuable for screening a plethora of compounds in
a short time to discover improved drugs that evade current modes of bacterial
resistance.
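The pKa argument can be made quantitative with the Henderson-Hasselbalch
relation. The sketch below is an illustration under stated assumptions: it uses
the two pKa values quoted above, assumes a pH of 7.4, and takes the neutral
(unprotonated) fraction of each group as a rough proxy for the nucleophilic
species that an acetyltransferase could acylate. That interpretation and the
function names are assumptions made here, not statements from the cited work.

```python
PKA_GUANIDINO = 12.5  # value quoted in the text
PKA_AMINO = 8.8       # value quoted in the text
ASSUMED_PH = 7.4      # ASSUMPTION: approximately physiological pH


def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a basic group present in its protonated (charged) form,
    from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def neutral_fraction(pka: float, ph: float) -> float:
    """Fraction present as the free base."""
    return 1.0 - protonated_fraction(pka, ph)


if __name__ == "__main__":
    for name, pka in (("amino", PKA_AMINO), ("guanidino", PKA_GUANIDINO)):
        print(f"{name}: {neutral_fraction(pka, ASSUMED_PH):.2e} neutral "
              f"at pH {ASSUMED_PH}")
```

At the assumed pH a few percent of the amino groups are unprotonated, whereas
for guanidino groups the neutral fraction is below one part in 100 000, which
is consistent with the expectation that guanidino groups resist acetylation
while remaining fully charged for binding.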
Fig. 11.2-5 Representative examples of aminoglycosides (Kanamycin A, Neomycin
B, Ribostamycin). Furthermore, a guanidinoglycoside
(6'-N-β-alanin-1,3,3'-N-guanidinoribostamycin) with a corresponding linker for
immobilization chemistry is shown. [Structures not reproduced.]
11.2.6
Carbohydrate-Protein Interactions: Selectins and Heparin
Cell-surface carbohydrates also act as recognition molecules allowing for the
normal trafficking of lymphocytes through the vascular system to the lymphatic
compartment [42]. During this process, lymphocytes have to migrate through
specialized endothelial cells in the high endothelial venules. It has been
shown that the binding of the lymphocytes is dependent on the presence of
sialic acid and calcium. As binding counterparts, three different
calcium-dependent proteins, called E-, P-, and L-selectins, were identified
[43, 44]. These proteins allow for normal trafficking and are involved in the
extravasation of leukocytes during the inflammatory cascade.
With the aid of monoclonal antibodies, sialylated carbohydrate structures,
notably sialyl Le^a and sialyl Le^x, were discovered to function as receptors
for the selectins [43]. Sialyl Le^x is usually located on leukocytes, but is
also highly expressed on a variety of different cancer cells [45]. The same
holds true for sialyl Le^a, which serves as a tumor marker on gastrointestinal
and pancreatic cancers [46]. Owing to the function of sialyl Lewis structures
in the extravasation of cancer cells from the bloodstream and in promoting
metastatic spread to other tissues, a clear correlation of expression of sialyl
Le^a and sialyl Le^x on tumors with enhanced progression and metastasis was
observed. Since it is assumed that these tumor-associated carbohydrate markers
enhance extravasation and metastasis by interactions with selectins,
experiments were performed in which selectin expression was inhibited.
Long-term studies showed that cancer patients with tumors that express high
amounts of sialyl Le^a had a 4.5 times higher probability of surviving over a
10-year period if the expression of E-selectin was prevented [47]. These
results point to a specific new form of cancer therapy by directly inhibiting
the carbohydrate-protein interactions that are responsible for metastasis and
tumor progression. Thus, the pharmaceutical industry has explored the use of
the bioactive conformations of sialyl Le^a and sialyl Le^x to design
glycomimetic drugs that bind to selectins. Beyond the development of
glycomimetics based on rational design, combinatorial approaches have had much
success. Solid-phase techniques were used to obtain libraries of fucopeptides
[48] for in vitro screening, and high-throughput screening in a P-selectin
assay showed that glycomimetics devoid of carbohydrate structures also bind
strongly [49]. However, selectins are in general problematic for drug discovery
because they show relatively weak multivalent interactions that make a general
approach more difficult.
Heparin is widely known to be a biologically important and chemically
unique polysaccharide, regulating a large variety of physiological processes. It
interacts with a plethora of different proteins of physiological importance [50].
The interaction with antithrombin III (AT III) is best understood. Thus, since
the late 1930s heparin has served as a clinical anticoagulant in the treatment
of heart disease. Interactions with growth factors, chemokines, lipid-binding
proteins, and viral envelope proteins are worth noting [50].
Heparin is a linear, unbranched, highly sulfonated polymer that consists of
(1→4)-linked pyranosyluronic acid and glucosamine units (Fig. 11.2-6) [51].
The type of uronic acid varies; usually 90% L-iduronic acid and 10%
D-glucuronic acid are found. Commonly, 20 to 200 disaccharide repeat units
are found, giving rise to a tremendous complexity.
Because of the high content of negatively charged sulfate and carboxyl groups,
the most prominent type of interaction between heparin and the basic amino
acids of a protein is ionic. In some cases, however, hydrogen bonding and
even hydrophobic interactions are not negligible. With the exception of the
AT III-heparin interaction, where the exact sequence of heparin associating
with the protein has been identified, the structure-function relationship of
heparin is still very poorly understood. A better understanding is necessary to
apply defined heparin sequences in the treatment of other diseases. A variety
of techniques including SPR have been applied to study heparin-protein
interactions [50].
Fig. 11.2-6 Schematic view of heparin.
11.2.7
Detection of Pathogenic Bacteria
Usually, the detection of pathogenic bacteria, such as Escherichia coli, is based
on the selective growth of these bacteria in liquid media or on plates. This
procedure may require several days [52]. More recently, methods such as
pathogen recognition by fluorescently labeled antibodies, DNA probes, or
bacteriophages have been developed and have proved to be much faster [52].
In many cases, bacteria as well as viruses bind to carbohydrates displayed
on the host cells they infect. To name two examples, Escherichia coli binds to
mannose and influenza virus binds to sialic acid [53]. To ensure the high binding
affinity necessary for strong adhesion and successful infection of the cell,
the pathogen often uses multivalent interactions [54]. Conducting polymers
displaying carbohydrates can simulate these binding events and serve as an
ideal material to detect even small amounts of pathogens.
Scheme 11.2-4 Synthesis of the carbohydrate-functionalized fluorescent polymer 17 for
the detection of pathogenic bacteria.
Recently, our laboratory reported a carbohydrate-functionalized poly(p-
phenylene ethynylene) (PPE) 17 (Scheme 11.2-4) that can be used for the
detection of Escherichia coli by multivalent interactions [55]. To this end,
2'-aminoethyl mannoside and galactoside were coupled to PPE. Unreacted
succinimide esters were quenched by addition of excess ethanolamine before
washing with water removed uncoupled reagents. The loading of the polymer
was determined by a phenol-sulfuric acid test, which revealed that about 25% of
the reactive sites were functionalized with glycosides. A fluorescence resonance
energy transfer (FRET) experiment confirmed that mannose-binding lectins
interact with mannose displayed on the polymer without loss of binding
selectivity and do not exhibit any nonspecific binding. Experiments with two
bacterial strains differing in their mannose-binding properties revealed that the
mannose-functionalized polymer imparted strong fluorescence to mannose-
binding Escherichia coli. Even separation and rinsing procedures were not able to
remove the bacteria from the polymer. In contrast, the mutated strain unable
to bind mannose showed no signal and no aggregation of bacteria.
The binding events involving the functionalized polymers and the bacteria
were studied by microscopy. Mutant bacteria that have lost the ability to bind
to mannose do not bind to the polymer, whereas the mannose-binding bacteria
aggregate in clusters with fluorescent centers (Fig. 11.2-7). The number of cells
in these clusters varies between 30 and several thousand. As anticipated, the
larger the aggregates, the stronger the fluorescence signal. Competitive binding
experiments with other carbohydrates displayed on the polymer do not reveal
any fluorescent clusters. To determine the detection limit of this new method,
serially diluted solutions of mannose-binding E. coli were incubated with the
mannose-containing polymer. Fluorescence microscopy experiments revealed
a limit in the range of 10³-10⁴ bacteria. Similar values were obtained earlier
by using fluorescently labeled antibodies. Further competitive experiments
have shown that only relatively high concentrations of free mannose (10 mM)
significantly inhibit binding to the polymer. At concentrations of less than
10 µM the clustering is not affected.
However, many pathogens bind to the same carbohydrates, for example,
E. coli as well as Salmonella enterica bind to mannose. This limitation may
be overcome using cross-reactive sensor analysis [56]. Thus, the binding to a
variety of different analytes is checked in parallel. By comparison with known
data, the detection and determination of single or multiple pathogens, even
within complex mixtures, should be possible in the near future. The underlying
principle is the basis for the olfactory sense in most animals.
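The parallel comparison described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each pathogen produces a characteristic pattern of normalized binding signals on three carbohydrate-displaying polymers (mannose, galactose, sialic acid). The numerical fingerprints and the function name `classify` are hypothetical and are not taken from ref. [56]; a real cross-reactive array would rely on measured reference data and a more robust statistical classifier.

```python
import math

# Hypothetical reference fingerprints (invented values): normalized signal of
# each pathogen on polymers displaying (mannose, galactose, sialic acid).
REFERENCE = {
    "E. coli (mannose-binding)": (0.9, 0.1, 0.2),
    "Salmonella enterica": (0.8, 0.6, 0.1),
    "influenza virus": (0.1, 0.2, 0.9),
}


def classify(pattern):
    """Return the reference pathogen whose fingerprint lies closest
    (Euclidean distance) to the measured response pattern."""
    return min(REFERENCE, key=lambda name: math.dist(REFERENCE[name], pattern))


if __name__ == "__main__":
    # Both bacteria respond on the mannose sensor, but the full pattern
    # across all three sensors still tells them apart.
    print(classify((0.85, 0.15, 0.25)))
    print(classify((0.75, 0.55, 0.10)))
```

In this sketch no single sensor has to be selective: the mannose signal alone cannot separate the two bacteria, but the combined response pattern can.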
11.2.8
Conclusion
The isolation, purification, and structure elucidation as well as the synthesis of
carbohydrates have been challenging goals for decades. Recently, new methods
to gain access to these complex molecules have been developed, including a
fully automated oligosaccharide synthesizer. Glycosyl phosphates and glycosyl
trichloroacetimidates proved to be powerful classes of glycosylating agents for
this purpose. High-yielding coupling steps are achieved on the solid support by
using an excess amount of building blocks in the presence of a stoichiometric
amount of TMSOTf. Suitable protection and deprotection strategies enable
the assembly of linear and even branched oligosaccharides, which can now be
performed in a fully automated manner.
Fig. 11.2-7 Laser scanning confocal microscopy image of: (a) Mutant Escherichia
coli that does not bind to polymer 17. (b) A fluorescent bacterial aggregate due to
multivalent interactions between the mannose-binding bacterial pili and the
polymer 17 (superimposed fluorescence and transmitted light images). (c) Fluorescence
microscopy image of a large fluorescent bacterial cluster. (d) Conventional
fluorescence spectra of polymer 17 (black) and normalized fluorescence spectra of a
bacterial cluster obtained using confocal microscopy (red).
Several tools to understand the intricate role of oligosaccharides in various
cell-signaling processes have been developed. The “chip” format enables
glycoscientists to elucidate interactions of carbohydrates with fluorescently
labeled proteins, including bacterial and viral toxins. Clever linking chemistries
provide a wider range of glycans for screening in the microarray format. The
chips are constructed by using standard DNA gene chip instrumentation. To
detect interactions, only minuscule amounts of both ligand and analyte are
necessary.
The tool kit consisting of carbohydrate synthesizer and carbohydrate
microarrays lays the foundation for the discovery and elucidation of new
drugs, as studies with the fully synthetic antitoxin malaria vaccine candidate
have shown. HIV neutralizing proteins have been identified by studies
with carbohydrate microarrays; aminoglycoside microarrays were used to
test antibacterial resistance. Fluorescent polymers can be utilized to detect
small amounts of pathogenic bacteria in a short time.
Although many complex carbohydrate structures of pyranosides are now
accessible by automated synthesis, the automated assembly of bacterial sugars
is still a difficult goal to achieve. A further bottleneck is the rapid and highly
efficient synthesis of the monosaccharide building blocks. More efficient
syntheses for most of the approximately 50 carbohydrate building blocks are
required.
Future glycobiologists will be able to screen a plethora of complex
carbohydrates that are thought to play previously unimaginable roles in
biological systems. The knowledge gained from glycomics will be as important
a basis for future drug discovery as that gained in the fields of genomics and
proteomics during the last 30 years. We are still just beginning to understand
the importance of carbohydrates in biological information transfer and much
remains to be discovered.
Acknowledgments
We thank all present and past members of the Seeberger group and our
collaborators who contributed to the results reported in this chapter. Daniel
B. Werz is grateful to the Alexander von Humboldt Foundation for a Feodor
Lynen Research Fellowship and to the Deutsche Forschungsgemeinschaft
(DFG) for an Emmy Noether Fellowship. Peter H. Seeberger thanks the ETH
for financial support.
References
1. (a) P. Nissen, J. Hansen, N. Ban, T.A. Steitz, The structural basis of ribosome activity in peptide bond synthesis, Science 2000, 289, 920-930; (b) N. Ban, P. Nissen, J. Hansen, P.B. Moore, T.A. Steitz, The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution, Science 2000, 289, 905-920.
2. (a) A. Varki, Biological roles of oligosaccharides: all the theories are correct, Glycobiology 1993, 3, 97-130; (b) H. Lis, N. Sharon, Protein glycosylation. Structural and functional aspects, Eur. J. Biochem. 1993, 218, 1-27; (c) R.A. Dwek, Glycobiology: Toward understanding the functions of sugars, Chem. Rev. 1996, 96, 683-720; (d) Y.C. Lee, R.T. Lee, Carbohydrate-protein interactions: Basis of glycobiology, Acc. Chem. Res. 1995, 28, 322-327; (e) W.H. Chambers, C.S. Brisette-Storkus, Hanging in the balance: natural killer cell recognition of target cells, Chem. Biol. 1995, 2, 429-435.
3. T. Hunkapiller, R.J. Kaiser, B.F. Koop, L. Hood, Large-scale and automated DNA sequence determination, Science 1991, 254, 59-67.
4. (a) M.H. Caruthers, Gene synthesis machines: DNA chemistry and its uses, Science 1985, 230, 281-285; (b) M.H. Caruthers, Chemical synthesis of DNA and DNA analogs, Acc. Chem. Res. 1991, 24, 278-284.
5. E. Atherton, R.C. Sheppard, Solid-phase peptide synthesis: A practical approach, Oxford University Press, Oxford, 1989.
6. (a) R. Rodebaugh, S. Joshi, B. Fraser-Reid, H.M. Geysen, Polymer-supported oligosaccharides via n-pentenyl glycosides: methodology for a carbohydrate library, J. Org. Chem. 1997, 62, 5660-5661; (b) J. Rademann, A. Geyer, R.R. Schmidt, Solid-phase supported synthesis of the branched pentasaccharide moiety that occurs in most complex type N-glycan chains, Angew. Chem., Int. Ed. 1998, 37, 1241-1245.
7. (a) S.J. Danishefsky, M.T. Bilodeau, Glycals in organic synthesis: the evolution of comprehensive strategies for the assembly of oligosaccharides and glycoconjugates of biological consequence, Angew. Chem., Int. Ed. Engl. 1996, 35, 1380-1419; (b) P.H. Seeberger, S.J. Danishefsky, Solid-phase synthesis of oligosaccharides and glycoconjugates by the glycal assembly method: A five year retrospective, Acc. Chem. Res. 1998, 31, 685-695; (c) R.R. Schmidt, J.C. Castro-Palomino, O. Retz, New aspects of glycoside bond formation, Pure Appl. Chem. 1999, 71, 729-744.
8. C.-H. Wong, Enzymic and chemo-enzymic syntheses of carbohydrates, Pure Appl. Chem. 1995, 67, 1609-1616.
9. O.J. Plante, E.R. Palmacci, P.H. Seeberger, Automated solid-phase synthesis of oligosaccharides, Science 2001, 291, 1523-1527.
10. P.H. Seeberger, Automated carbohydrate synthesis to drive chemical glycomics, Chem. Commun. 2003, 1115-1121.
11. O.J. Plante, R.B. Andrade, P.H. Seeberger, Synthesis and use of glycosyl phosphates as glycosyl donors, Org. Lett. 1999, 1, 211-214.
12. R.R. Schmidt, W. Kinzy, Anomeric-oxygen activation for glycoside synthesis: the trichloroacetimidate method, Adv. Carbohydr. Chem. Biochem. 1994, 50, 21-123.
13. K.R. Love, P.H. Seeberger, Automated solid-phase synthesis of protected tumor-associated antigen and blood group determinant oligosaccharides, Angew. Chem., Int. Ed. 2004, 43, 602-605.
14. M.C. Hewitt, P.H. Seeberger, Automated solid-phase synthesis of a branched Leishmania cap tetrasaccharide, Org. Lett. 2001, 3, 3699-3702.
15. R.B. Andrade, O.J. Plante, L.G. Melean, P.H. Seeberger, Solid-phase oligosaccharide synthesis: Preparation of complex structures using a novel linker and different glycosylating agents, Org. Lett. 1999, 1, 1811-1814.
16. E.R. Palmacci, M.C. Hewitt, P.H. Seeberger, “Cap-Tag” - novel methods for the rapid purification of oligosaccharides prepared by automated solid-phase synthesis, Angew. Chem., Int. Ed. 2001, 40, 4433-4437.
17. G. Hummel, R.R. Schmidt, Glycosylimidates. 79. A versatile preparation of the lactoneo-series antigens - preparation of sialyl dimer Lewis X and the dimer Lewis Y, Tetrahedron Lett. 1997, 38, 1173-1176.
18. P.P. Deshpande, S.J. Danishefsky, Total synthesis of the potential anticancer vaccine KH-1 adenocarcinoma antigen, Nature 1997, 387, 164-166.
19. G. Ragupathi, P.P. Deshpande, D.M. Coltart, H.M. Kim, L.J. Williams, S.J. Danishefsky, P.O. Livingston, Constructing an adenocarcinoma vaccine: immunization of mice with synthetic KH-1 nonasaccharide stimulates anti-KH-1 and anti-Le(y) antibodies, Int. J. Cancer 2002, 99, 207-212.
20. D.M. Ratner, E.W. Adams, M.D. Disney, P.H. Seeberger, Tools for glycomics: Mapping interactions of carbohydrates in biological systems, ChemBioChem 2004, 5, 1375-1383.
21. (a) D. Barnes-Seemann, S.B. Park, A.N. Koehler, S.L. Schreiber, Expanding the functional group compatibility of small molecule microarrays: Discovery of novel calmodulin ligands, Angew. Chem., Int. Ed. 2003, 42, 2376-2379; (b) S. Fukui, T. Feizi, C. Galustian, A.M. Lawson, W. Chai, Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions, Nat. Biotechnol. 2002, 20, 1011-1017; (c) A.N. Koehler, A.F. Shamji, S.L. Schreiber, Discovery of an inhibitor of a transcription factor using small molecule microarrays and diversity oriented synthesis, J. Am. Chem. Soc. 2003, 125, 8420-8421; (d) P.J. Hergenrother, K.M. Depew, S.L. Schreiber, Small molecule microarrays: Covalent attachment and screening of alcohol-containing small molecules on glass slides, J. Am. Chem. Soc. 2000, 122, 7849-7850.
22. (a) S. Bidlingmaier, M. Snyder, Carbohydrate analysis prepares to enter the “omics” era, Chem. Biol. 2002, 9, 400-401; (b) K.R. Love, P.H. Seeberger, Carbohydrate arrays as tools for glycomics, Angew. Chem., Int. Ed. 2002, 41, 3583-3586; (c) L.L. Kiessling, C.W. Cairo, Hitting the sweet spot, Nat. Biotechnol. 2002, 20, 234-235; (d) D.M. Ratner, E.W. Adams, J. Su, B.R. O’Keefe, M. Mrksich, P.H. Seeberger, Probing protein-carbohydrate interactions with microarrays of synthetic oligosaccharides, ChemBioChem 2004, 5, 379-383.
23. D. Wang, S. Liu, B.J. Trummer, C. Deng, A. Wang, Carbohydrate microarrays for the recognition of cross-reactive molecular markers of microbes and host cells, Nat. Biotechnol. 2002, 20, 275-281.
24. B.T. Houseman, M. Mrksich, Carbohydrate arrays for the evaluation of protein binding and enzymatic modification, Chem. Biol. 2002, 9, 443-454.
25. E.W. Adams, J. Ueberfeld, D.M. Ratner, B.R. O’Keefe, D.R. Walt, P.H. Seeberger, Encoded fiber-optic microsphere arrays for probing protein-carbohydrate interactions, Angew. Chem., Int. Ed. 2003, 42, 5317-5320.
26. B.T. Houseman, E.S. Gawalt, M. Mrksich, Maleimide functionalized self-assembled monolayers for the preparation of peptide and carbohydrate biochips, Langmuir 2003, 19, 1522-1531.
27. World Health Organization, World malaria situation 1990, World Health Stat. Q. 1992, 45, 257-266.
28. L. Schofield, F. Hackett, Signal transduction in host cells by a glycosylphosphatidylinositol toxin of malaria parasites, J. Exp. Med. 1993, 177, 145-153.
29. L. Schofield, M.C. Hewitt, K. Evans, M.-A. Siomos, P.H. Seeberger, Synthetic GPI as a candidate anti-toxic vaccine in a model of malaria, Nature 2002, 418, 785-789.
30. M.C. Hewitt, D.A. Snyder, P.H. Seeberger, Rapid synthesis of a glycosylphosphatidylinositol-based malaria vaccine using automated solid-phase oligosaccharide synthesis, J. Am. Chem. Soc. 2002, 124, 13434-13436.
31. E.W. Adams, D.M. Ratner, H.R. Bokesch, J.B. McMahon, B.R. O’Keefe, P.H. Seeberger, Oligosaccharide and glycoprotein microarrays as tools in HIV glycobiology: Glycan-dependent gp120/protein interactions, Chem. Biol. 2004, 11, 875-881.
32. C. Walsh, Molecular mechanisms that confer antibacterial drug resistance, Nature 2000, 406, 775-781.
33. G.D. Wright, Mechanisms of resistance to antibiotics, Curr. Opin. Chem. Biol. 2003, 7, 563-569.
34. B. Llano-Sotelo, E.F. Azucena Jr, L.P. Kotra, S. Mobashery, C.S. Chow, Aminoglycosides modified by resistance enzymes display diminished binding to the bacterial ribosomal aminoacyl-tRNA site, Chem. Biol. 2002, 9, 455-463.
35. M.D. Disney, S. Magnet, J.S. Blanchard, P.H. Seeberger, Aminoglycoside microarrays to study antibiotic resistance, Angew. Chem., Int. Ed. 2004, 43, 1591-1594.
36. M.D. Disney, P.H. Seeberger, Aminoglycoside microarrays to explore interactions of antibiotics with RNAs and proteins, Chem. - Eur. J. 2004, 10, 3308-3314.
37. S.S. Hegde, F. Javid-Majd, J.S. Blanchard, Overexpression and mechanistic analysis of chromosomally encoded aminoglycoside 2'-N-acetyltransferase, J. Biol. Chem. 2001, 276, 45876-45881.
38. S. Magnet, T. Lambert, P. Courvalin, J.S. Blanchard, Kinetic and mutagenic characterization of chromosomally encoded Salmonella enterica AAC(6')-Iy aminoglycoside N-acetyltransferase, Biochemistry 2001, 40, 3700-3709.
39. S.S. Hegde, T.K. Dam, C.F. Brewer, J.S. Blanchard, Thermodynamics of aminoglycoside and acyl-coenzyme A binding to Salmonella enterica (AAC(2')k) from Mycobacterium tuberculosis, Biochemistry 2002, 41, 7519-7527.
40. (a) T.J. Baker, N.W. Luedtke, Y. Tor, M. Goodman, Synthesis and anti-HIV activity of guanidino glycosides, J. Org. Chem. 2000, 65, 9054-9058; (b) N.W. Luedtke, T.J. Baker, M. Goodman, Y. Tor, Guanidinoglycosides: A novel family of RNA ligands, J. Am. Chem. Soc. 2000, 122, 12035-12036.
41. M.W. Vetting, S.S. Hegde, F. Javid-Majd, J.S. Blanchard, S.L. Roderick, Aminoglycoside 2'-N-acetyltransferase from Mycobacterium tuberculosis in complex with coenzyme A and aminoglycoside substrates, Nat. Struct. Biol. 2002, 9, 653-658.
42. B.M. Gesner, V. Ginsburg, Effect of glycosidases on the fate of transfused lymphocytes, Proc. Natl. Acad. Sci. U.S.A. 1964, 52, 750-755.
43. (a) M.L. Phillips, E. Nudelman, F.C. Gaeta, M. Perez, A.K. Singhal, S. Hakomori, J.C. Paulson, ELAM-1 mediated cell adhesion by recognition of carbohydrate ligand, sialyl-Lex, Science 1990, 250, 1130-1132; (b) E.L. Berg, J. Magnani, R.A. Warnok, M.K. Robinson, E.C. Butcher, Comparison of L-selectin and E-selectin ligand specificities: the L-selectin can bind the E-selectin ligands sialyl Le(x) and sialyl Le(y), Biochem. Biophys. Res. Commun. 1992, 184, 1048-1055; (c) M. Yoshida, A. Uchimura, M. Kiso, A. Hasegawa, Synthesis of chemically modified sialic acid-containing sialyl LeX-ganglioside analogues recognized by the selectin family, Glycoconjugate J. 1993, 10, 3-15.
44. J.L. Magnani, The discovery, biology, and drug development of sialyl Lea and sialyl Lex, Arch. Biochem. Biophys. 2004, 426, 122-131.
45. R. Kannagi, Carbohydrate-mediated cell adhesion involved in hematogenous metastasis of cancer, Glycoconjugate J. 1997, 14, 577-584.
46. J.L. Magnani, B. Nilsson, M. Brockhaus, D. Zopf, Z. Steplewski, H. Koprowski, V. Ginsburg, A monoclonal antibody-defined antigen associated with gastrointestinal cancer is a ganglioside containing sialylated lacto-N-fucopentaose II, J. Biol. Chem. 1982, 257, 14365-14369.
47. S. Matsumoto, Y. Imaeda, S. Umemoto, K. Kobayashi, H. Suzuki, T. Okamoto, Cimetidine increases survival of colorectal cancer patients with high levels of sialyl Lewis-X and sialyl Lewis-A epitope expression on tumor cells, Br. J. Cancer 2002, 86, 161-167.
48. C.M. Huwe, T.J. Woltering, J. Jiricek, G. Weitz-Schmidt, C.-H. Wong, Design, synthesis and biological evaluation of aryl-substituted sialyl Lewis-X mimetics prepared via cross-metathesis of C-fucopeptides, Bioorg. Med. Chem. 1999, 7, 773-788.
49. D.H. Slee, S.J. Romano, J. Yu, T.N. Nguyen, J.K. John, N.K. Raheja, F.U. Axe, T.K. Jones, W.C. Ripka, Development of potent non-carbohydrate imidazole-based small molecule selectin inhibitors with anti-inflammatory activity, J. Med. Chem. 2001, 44, 2094-2107.
50. I. Capila, R.J. Linhardt, Heparin-protein interactions, Angew. Chem., Int. Ed. 2002, 41, 390-412.
51. (a) B. Casu, Structure and biological activity of heparin, Adv. Carbohydr. Chem. Biochem. 1985, 43, 51-134; (b) W.D. Comper, Heparin and Related Polysaccharides, Vol. 7, Gordon and Breach, New York, 1981.
52. R.C. Willis, Improved molecular techniques help researchers diagnose microbial conditions, Mod. Drug Discov. 2004, 7, 36-38.
53. (a) K.A. Karlsson, Bacterium-host protein-carbohydrate interactions and pathogenicity, Biochem. Soc. Trans. 1999, 27, 471-474; (b) K.A. Karlsson, Pathogen-host protein-carbohydrate interactions as the basis of important infections, Adv. Exp. Med. Biol. 2001, 491, 431-443.
54. M. Mammen, S.-K. Choi, G.M. Whitesides, Polyvalent interactions in biological systems: implications for design and use of multivalent ligands and inhibitors, Angew. Chem., Int. Ed. 1998, 37, 2745-2794.
55. M.D. Disney, J. Zheng, T.M. Swager, P.H. Seeberger, Detection of bacteria with carbohydrate-functionalized fluorescent polymer, J. Am. Chem. Soc. 2004, 126, 13343-13346.
56. K.J. Albert, N.S. Lewis, C.L. Schauer, G.A. Sotzing, S.E. Stitzel, T.P. Vaid, D.R. Walt, Cross-reactive chemical sensor arrays, Chem. Rev. 2000, 100, 2595-2626.
12
The Bicyclic Depsipeptide Family of Histone Deacetylase
Inhibitors
Paul A. Townsend, Simon J. Crabb, Sean M. Davidson, Peter W. M. Johnson,
Graham Packham, and Arasu Ganesan
Outlook
It is only a decade since the first human histone deacetylase (HDAC) was
identified. Within this short period of time, these enzymes have had a
glorious history. Broad ranging studies by both chemists and biologists have
dramatically increased our fundamental understanding of HDACs and their
function in eukaryotic cell regulation. On the drug discovery front, multiple
HDAC inhibitors are at stages of clinical development as anticancer agents.
It is probable that more than one will soon be approved as a drug. A further
development is the link between HDAC inhibitors and a growing set of
therapeutic indications outside the cancer area. One can anticipate proof of
concept animal models leading to clinical trials for these drugs in the near
future.
In this review, we have focused on the bicyclic depsipeptide family of natural
product HDAC inhibitors. Compared to other classes, these compounds exhibit
high potency and a marked degree of selectivity between individual HDACs.
One of the natural products, FK228, is currently in advanced clinical trials
for cancer. Others, the spiruchostatins, were recently discovered and show a
similar biological profile of action. With these natural products, it is unclear
(and unlikely) that their precise structure represents the optimal molecule
within this class for human therapeutics. Several academic laboratories,
including our own, have achieved the total synthesis of depsipeptides. These
routes are being applied to the preparation of novel unnatural analogs,
which hold great promise in further exploiting the depsipeptides as subtype-
selective biological probes of HDAC function and as potential therapeutic
agents.
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
12.1
Epigenetic Mechanisms of Gene Regulation
One of the hallmarks of cellular pathologies such as neoplastic transformation
is that the normal control of differentiation, cell-cycle progression, and
appropriate entry into apoptosis (programmed cell death) becomes deranged.
This abnormal phenotype is a consequence of altered patterns of protein
expression, which in turn result from a variety of genetic abnormalities. An
area of increasing interest in basic and clinical research is that of epigenetic
control mechanisms [1], which focuses on the modulation of DNA packaging as a
means of regulating gene expression.
The genomic DNA of eukaryotes is tightly compacted into the higher order
structure of chromatin, which comprises histones, nonhistone proteins, and
DNA. These components come together in a tightly wound and organized
structure that is dynamic in nature. The basic repeating unit of such
chromosomal organization is the nucleosome, which occurs approximately every
200 bp of DNA and consists of 146 bp of DNA wrapped twice, in a left-handed
manner, around an octamer core of paired histones H3, H4, H2A, and H2B,
forming successive “beads on a string”. Nucleosomes are then usually further
packed together via the linker histone H1, allowing condensation of this
fundamental unit into higher order structures visible as chromosomes at
metaphase. Posttranslational modification of the higher order structure of DNA
has now been demonstrated to have an important role in regulating gene
expression, bearing out a prediction [2] made over 40 years ago. Modification
of DNA occurs primarily by methylation at CpG residues, which appears [3] to be
a gene-silencing mechanism. In a similar manner, the histone proteins undergo a
variety of reversible posttranslational modifications (Fig. 12-1) that cause an
alteration in chromatin structure and, hence, have a profound impact on the
accessibility of DNA to the transcriptional machinery. Histones exist as
globular domains with long N-terminal tails making up 25% of their structure.
Lysine residues in the tail can undergo acetylation, methylation,
ubiquitinylation, and sumoylation. Additional posttranslational modifications
include methylation of arginine, phosphorylation of serine, and poly-ADP
ribosylation of glutamate and aspartic acid residues.
Fig. 12-1 Examples of posttranslational modifications at histone tails. Source: M. Biel,
M. Wascholowski, A. Giannis, Angew. Chem., Int. Ed. 2004, 44, 3186-3216.
Histone acetyltransferases (HATs) mediate the transfer of an acetyl group
from acetyl-coenzyme A (CoA) to the ε-amino group of lysine. This simple
change dramatically alters the lysine side chain from its protonated, positively
charged state at physiological pH to a neutral residue. As a result, the
affinity between the negatively charged DNA phosphodiester backbone and
the positively charged histones is weakened, enabling protein complexes
such as yeast mating type switching (SWI)/sucrose nonfermenting (SNF) and
other transcriptional factors to bind DNA, further relaxing its tightly wound
structure. Acetylation of the K9 and K4 lysines in the N-terminal tails of
the internal core histones of the nucleosome is particularly associated with
enhanced gene expression. The return of acetyl-lysine to lysine is catalyzed by
a second family of hydrolyzing enzymes, the histone deacetylases (HDACs).
Such deacetylation tends to lead to a more tightly bound and transcriptionally
silenced state (Fig. 12-2).
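The charge argument above can be made quantitative with the Henderson-Hasselbalch relation. The following Python sketch estimates the fraction of lysine ε-amino groups that carry a positive charge at physiological pH. The pKa of 10.5 is an assumed textbook value for the free lysine side chain (the pKa in a histone tail may differ), pH 7.4 is taken as physiological, and the function name is illustrative only.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a basic group present in its protonated (charged) form,
    from the Henderson-Hasselbalch relation: 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Assumed values (not given in the text): typical pKa of the lysine
# epsilon-amino group and physiological pH.
LYSINE_SIDE_CHAIN_PKA = 10.5
PHYSIOLOGICAL_PH = 7.4

if __name__ == "__main__":
    f = fraction_protonated(LYSINE_SIDE_CHAIN_PKA, PHYSIOLOGICAL_PH)
    # An acetylated lysine is an amide with no basic nitrogen, so it
    # carries no charge at this pH.
    print(f"Unmodified lysine, fraction positively charged: {f:.4f}")
```

Under these assumptions more than 99% of unmodified lysine side chains are protonated, whereas the acetylated amide is neutral, which is why a single acetyl group removes essentially one full positive charge per residue.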
In general, transcriptional activators can bind and recruit HATs, while
transcriptional repressors and corepressors interact with HDACs. The
unwinding of DNA off histones by lysine acetylation is conceptually helpful for
understanding the action of HATs and HDACs. It is, nevertheless, a simplistic
and incomplete explanation for the way in which these enzymes control gene
expression. For example, in some cases [4] inhibition of HDACs can lead to a
counterintuitive decrease in gene expression. It is likely that the overall pattern
of histone modification (of which acetylation is but one example) represents
a “histone code” that in turn acts as a conduit for the recruitment of binding
partners that determine the state of gene transcription.
Fig. 12-2 Schematic representation of histone acetylation as a model for transcriptional
control by epigenetic mechanisms.

Among the histone modifying enzymes, HATs and HDACs are the
best characterized biochemically and provide attractive opportunities for
interdisciplinary research between chemists and biologists. At present, the
HDACs have outstripped the HATs in terms of their impact on drug
discovery. Multiple HDAC inhibitors from several chemical classes are
currently in clinical trials for cancer chemotherapy, whereas the literature
on HAT inhibitors is limited to in vitro data.
12.2
Histone Deacetylases
HDACs are an evolutionarily conserved group of enzymes that catalyze
the hydrolysis of acetyl-lysine residues in proteins. While the importance of
this process for histones in modulating chromatin structure cannot be
overestimated, “histone” deacetylase is a dangerously misleading nomenclature.
Reversible lysine acetylation has been identified [5] in an increasing number
of nonhistone proteins, both nuclear and cytoplasmic (Table 12-1).
Transcription factors dominate the list, with over 40 documented including MyoD,
NF-κB, GATA-1, c-Jun, B-Myb, and AML-1. With these proteins, acetylation
can modify DNA binding affinity, coregulator association, nuclear localization,
and susceptibility to posttranslational modification such as phosphorylation
or ubiquitinylation.
There are more than a dozen individual human HDAC enzymes [6], which
can be divided into three main classes on the basis of structure and functional
characteristics through homology to yeast HDACs. HDACs 1 through 11
share the common mechanistic feature of being metalloenzymes, with a
highly conserved catalytic domain of 390 amino acids containing a zinc atom.
They are further subdivided into class I and class II enzymes. The class I
HDACs 1, 2, 3, and 8 are homologous in their catalytic sites to the yeast HDAC
Rpd3. They have a ubiquitous distribution and are localized to the nucleus. The
class II HDACs 4, 5, 6, 7, 9, and 10 are larger in size, restricted in their tissue
distribution, have the ability to shuttle between the cytoplasm and nucleus, and
are homologous to the yeast HDAC Hda1. HDAC11 has similarities to both
class I and class II and is usually classified separately.
The class III HDACs (sirtuins) comprise a distinct set of enzymes, sirtuins
(SIRT) 1-7, with a common 275 amino acid catalytic domain and homology
to yeast silent information regulator 2 (Sir2). These HDACs do not contain a
catalytic zinc; instead they use nicotinamide adenine dinucleotide (NAD+) as a
cofactor, which acts as an acetyl acceptor following hydrolysis of the
nicotinamide moiety. The sirtuins potentially constitute a link between cellular
energy status and transcriptional regulation. They are gaining widespread
interest [7] because of their intriguing involvement in several fundamental
processes, including longevity, apoptosis, gene silencing, and DNA damage
repair. Nevertheless, our understanding of the sirtuins is at a more embryonic
stage than that of the zinc metalloenzymes. For this reason, they will not be
discussed further.

Table 12-1 Nonhistone proteins regulated by acetylation status

Function                        Acetylation targets
Transcription factor            p73, TCF, GATA-1, RelA, E2F, UBF, EKLF, NF-Y,
                                STAT6, CREB, c-Jun, C/EBPβ, E2A, HMGI(Y),
                                UBF, NF-κB p65/RelA, NF-κB p50, YY1, Bcl6,
                                Cart-1, HIV-1 Tat, Brm, MyoD, TAL1/SCL, E2A,
                                HIF-1α, TFIIE, TFIIF, PC4, TFIIB, TAFI68
Tumor suppressor                p53
Cell cycle                      Rb
Cell adhesion                   β-Catenin
Nuclear hormone receptor        AR, ERα
Nuclear import factor           Importin α, Rch1
Cytoskeleton protein            α-Tubulin
Chaperone protein               HSP90
Signaling regulation            Smad7
Apoptosis regulator             Ku70
Nonhistone chromatin protein    HMGB1/HMG1, HMGB2/HMG2,
                                HMGN1/HMG14, HMGN2/HMG17
DNA metabolism                  Flap endonuclease-1, thymine DNA glycosylase,
                                Werner DNA helicase
DNA replication factor          PCNA, MCM3
Chromatid cohesion protein      San, cohesion subunits
Viral protein                   Adenoviral E1A, large T antigen, HIV Tat, s-HDAg
Bacterial protein               Alba, CheY, acetyl CoA synthetase
Histone acetyl transferase      PCAF, p300, CBP
12.3
Class I and Class II HDACs as Drug Discovery Targets
HDACs play a fundamental role in determining the state of chromatin, and
are involved in the modulation of numerous other important proteins. Thus,
although the first human HDACs were only identified a decade ago, it is
not surprising that these enzymes are already attractive therapeutic targets [8]
for a host of diseases including cancer, neurodegenerative disorders, cardiac
hypertrophy, inflammation, diabetes, atherosclerosis, and infectious diseases.
Altered acetylation patterns are a hallmark [9] of many primary tumor
types. The best evidence for the importance of HDACs in cancer comes
from studies with small molecule HDAC inhibitors, ranging from cell-based
in vitro experiments to tumor xenograft models and human clinical trials.
Reassuringly, despite the potential for HDAC inhibitors to affect a range of
normal processes in healthy cells, early clinical studies have established [10]
that they are relatively well tolerated in humans.
Investigations into HDAC inhibitors outside the cancer area are more recent
and are at an earlier stage of drug development. Nevertheless, there are mouse
and Drosophila models demonstrating [11] positive effects of HDAC inhibitors
for the treatment of neurodegenerative ailments such as Parkinson’s and
Huntington’s disease. Similarly, mouse knockouts and in vitro studies link [12]
aberrant HDAC activity with cardiac hypertrophy. The inhibition of some
HDACs has a beneficial effect in repressing hypertrophy, while HDACs 5 and
9 are antihypertrophic. This suggests that a nonselective HDAC inhibitor will
not be suitable for therapy. In scientific papers and the patent literature, there are
reports of the beneficial effects of HDAC inhibitors in models for various other
therapeutic indications including inflammation [13], immunomodulation [14],
diabetes [15], and atherosclerosis [16].
HDAC inhibitors are potentially useful for the treatment of infectious
diseases. This is best documented with the malaria parasite. Merck and
GlaxoSmithKline have reported [17] a series of inhibitors based on the
apicidin cyclic tetrapeptide natural product scaffold with some selectivity
for Plasmodium over human HDACs. In the antiviral field, HDAC inhibitors
were recently shown [18] to drive the expression of latent reservoirs of HIV,
thus facilitating their eradication. Outside the human therapeutic areas,
there is an interesting recent patent [19] from Dow, which independently
isolated FK228, an HDAC inhibitor, from a Madagascar plant and showed that
it is an insecticidal agent.
12.4
HDAC Inhibitors
The lead small molecule inhibitors of zinc-dependent class I and class II
HDACs were identified indirectly before an understanding of their mechanism
of action or the characterization of the human enzymes. Thus, Breslow’s
pioneering studies on the cell-differentiating ability of DMSO led to
synthetic hydroxamic acid compounds that were later recognized as potent
HDAC inhibitors. Meanwhile, high-throughput screening of crude natural
product extracts in cell-based antimicrobial and anticancer assays followed
by isolation of the active principle provided compounds such as trichostatin,
trapoxin, and FK228 that were later shown to share the common mechanism
of HDAC inhibition.
Regardless of their origin, the structures of most inhibitors of the zinc-
dependent HDACs can be easily rationalized. They conform to the
classical medicinal chemistry dogma for modulating hydrolase enzymes with
a catalytic metal at the active site by competitive reversible inhibitors. Such
compounds have two key features:
1. A resemblance to the enzyme substrate, promoting high
   affinity recognition and binding by the enzyme.
2. Replacement of the scissile bond by a metal-binding
   group, often a bidentate chelator.
This strategy has yielded successful drugs in the past, such as the angiotensin
converting enzyme (ACE) inhibitor Captopril and later congeners. More
recent examples include inhibitors of matrix metalloproteinases and peptide
deformylase. For HDACs, the pharmacophore is defined by a metal-binding
group attached to a linear unit of similar dimensions to the lysine side chain of
the substrate. This is terminated by a “cap” that serves to orient the inhibitor
in the enzyme’s substrate-binding channel.
The difficulty of expressing eukaryotic HDACs and obtaining them in
pure form has hampered our understanding of the mechanism of action
at the molecular level. A seminal breakthrough came about in 1999 with
the X-ray structure [20] of an HDAC-like protein from the thermophilic
bacterium Aquifex aeolicus. Since bacteria lack histones, presumably the
protein acts as a lysine deacetylase upon other substrates. The bacterial
protein shares high homology with class I HDACs in its catalytic domain
and offers a reliable working model for the latter. The zinc atom in
the enzyme active site lies at the end of a narrow substrate-binding
channel that binds the acetyl-lysine side chain (Fig. 12-3). More recently,
the structures of human HDAC8 [21] and a bacterial enzyme [22] homologous
to class II HDACs were disclosed. At a gross level, all these structures are
similar in their substrate-binding channels. They are less informative in
predicting the differences between isoforms at the “rim” of the channel,
and it is precisely these differences that are likely to determine substrate
specificity (Fig. 12-4). In the absence of X-ray structures, one
approach has been to approximate [23] the eukaryotic enzymes by homology
modeling.

Fig. 12-3 The X-ray structure of a bacterial histone deacetylase-like protein
homologous to human class I HDACs. The color coding of amino acid residues
corresponds to that of Fig. 12-4, with the catalytic zinc in purple. Source:
T. A. Miller, D. J. Witter, S. Belvedere, J. Med. Chem. 2003, 46, 5097-5116.

Fig. 12-4 Sequence homology between mammalian HDACs and the bacterial
HDAC-like protein (HDLP). Conserved residues within the active site, channel,
and rim regions are shown in color. Source: T. A. Miller, D. J. Witter,
S. Belvedere, J. Med. Chem. 2003, 46, 5097-5116.
The simplest HDAC inhibitors are short chain carboxylic acids such as
butyric acid, where presumably the acid is the zinc-binding group. These are
relatively low in potency (micromolar IC50). Valproic acid, an old drug used as
an anticonvulsant, is similarly a modest HDAC inhibitor and has now advanced
to clinical trials as an anticancer agent (Fig. 12-5). Low potency, a short
half-life, and metabolic instability are the liabilities associated with
this class of HDAC inhibitors.
Hydroxamic acids are excellent metal-binding chelators, and they represent
the most important family of HDAC inhibitors with many examples of
nanomolar potency. This motif has been exploited by nature, as in the natural
product trichostatin A (TSA). Although too toxic for therapeutic use, TSA was
the first HDAC inhibitor to be mechanistically identified as such [24]. It
remains the standard chemical probe of HDAC function and is widely used
as a molecular biological tool. Thousands of synthetic hydroxamic acid HDAC
inhibitors have been reported. Breslow's suberoylanilide hydroxamic acid
(SAHA) illustrates the design requirements for HDAC inhibition perfectly:
a hydroxamic acid metal-binding group, a linear spacer, and an anilide
cap. SAHA was commercialized via the startup Aton Pharmaceuticals, later
acquired by Merck for several hundred million dollars. SAHA is currently
under review for FDA approval and is an excellent illustration that drugs can be
minimalistic in structure and be successfully discovered in an academic setting.

Fig. 12-5 Examples of HDAC inhibitors that have reached clinical trials as
anticancer agents: valproic acid, SAHA, MS-275, and FK228.
The third family of HDAC inhibitors comprises the cyclic tetrapeptide natural
products exemplified by the trapoxins and apicidins. A ketone functions as the
metal-binding group and an adjacent epoxide capable of irreversible covalent
binding to the enzyme is often present. The natural products contain a mixture
of L and D amino acids and a proline residue to favor the tight turn necessary to
cyclize a tetrapeptide. Although the cyclic tetrapeptides have yet to advance
to clinical trials, they are important biological tools. Schreiber's group [25]
used an affinity column with immobilized trapoxin B to identify its target of
action, and this led to the first characterization of a mammalian HDAC. More
recently, Nishino and Yoshida have reported [26] a series of unnatural analogs
based on the tetrapeptide scaffold with different zinc-binding groups.
Benzamides represent a fourth class of HDAC inhibitors. Unlike the other
HDAC inhibitors above, benzamides do not conform to the simple
pharmacophore model with an obvious metal-binding group connected to a linear
spacer. Whether they work by the same mechanism or target an allosteric
site on the enzyme is not fully resolved. Nevertheless, they display nanomolar
potency, and more than one compound has reached phase I clinical trials for
cancer.
At least in part, different HDACs presumably achieve selectivity by
discriminating between the side chains of adjacent residues near the scissile
acetyl-lysine in the protein substrate. The minimal pharmacophore for HDAC
inhibition, a metal-binding group and a linear spacer, does not take these
additional interactions into account. The early hydroxamic acid inhibitors have
fairly small caps that do not protrude much beyond the substrate-binding
channel. Although potent, they are pan-HDAC inhibitors that are effective
against all the isoforms.
Until a better understanding of the function of individual HDACs is available,
it is unclear whether a global HDAC inhibitor is best for a therapeutic setting.
The history of drug discovery does suggest that subtype-selective agents are
generally superior to nonselective inhibitors. By real-time quantitative
polymerase chain reaction (qPCR), we have investigated the expression level of
HDAC genes in a wide variety of cancer cell lines compared to normal human
dermal fibroblasts (Fig. 12-6). Although it is difficult to directly compare
probes, the results suggest that cancer cells express more of the class I
HDACs and that these should be the ones targeted by inhibitors. Similar
observations [27] with patient samples show elevated levels of HDAC1 and
HDAC2 in different cancers.
Fig. 12-6 Relative expression levels of HDACs, by qPCR, in a series of cancer cell lines.
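
The chapter does not state how the qPCR data in Fig. 12-6 were quantified. As a
purely illustrative sketch, the Python function below implements the widely
used comparative Ct (2^-ΔΔCt) calculation, which expresses the level of a
target gene in a sample relative to a calibrator such as normal fibroblasts.
The function name, the use of a single reference gene, the assumption that the
product doubles exactly once per cycle, and the Ct values are all hypothetical
and are not taken from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Comparative Ct (2^-ddCt) estimate of relative expression.

    Assumes perfect amplification efficiency (one doubling per cycle)
    for both the target gene and the reference gene.
    """
    # Normalize the target gene to the reference gene in each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample with the calibrator (e.g. normal fibroblasts).
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, chosen only to illustrate the arithmetic.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,          # cancer cell line
    ct_target_calibrator=26.0, ct_ref_calibrator=18.0,  # normal fibroblasts
)
print(f"Relative expression: {fold_change:.1f}-fold")
```

With these made-up values the target crosses threshold two cycles earlier in
the cancer line than in the calibrator, which corresponds to a four-fold higher
relative level. Because each probe amplifies with its own efficiency, such
values are best compared within one gene across cell lines rather than between
genes, which matches the caution expressed in the text.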
To achieve selectivity in a classical metal-binding HDAC inhibitor, the cap
needs to contain functionality for additional interactions with the “rim”. Of
the inhibitors described above, the cyclic tetrapeptides have this potential due
to their large macrocyclic scaffold, but have yet to result in clinical candidates.
Structurally, the most complex HDAC inhibitors are the depsipeptide natural
products exemplified by FK228. These compounds, which are treated separately
in the next section, have even more elaborate “caps”, and are the best-
documented example of selective HDAC inhibitors.
12.5
The Depsipeptide HDAC Inhibitors
The depsipeptide FK228 (originally called FR901,228) was isolated [28] by
Fujisawa Pharmaceuticals from an extract of the bacterium Chromobacterium
violaceum No. 968 on the basis of an assay for phenotypic reversal of
ras-transformed tumor cells. The compound was shown to be active in a tumor
xenograft animal model, and to have effects [29] similar to the known HDAC
inhibitors, trichostatin A and trapoxin. Superficially, FK228 (Fig. 12-7) does
not resemble these classic HDAC inhibitors. Within the reducing environment of
the cell, however, one can anticipate reduction of the disulfide bridge to give
free thiols, which now fit the model of a metal-binding group connected to a
linear spacer. Key experiments [30] by Yoshida’s group provided supporting
evidence for this hypothesis. Thus, when assayed in vitro against partially
purified HDAC1 and 2, FK228 is significantly more active in the presence of
the reducing agent dithiothreitol (DTT). The activity is lost when the oxidizing
agent H2O2 is added, or when the reduced dithiol version of FK228 is used.
Furthermore, a thiomethyl derivative obtained by alkylation of the thiol was
inactive. These results indicate that the free thiol is needed for enzyme
inhibition. Excitingly, the data also revealed that FK228 was much more active
against the class I HDACs 1 and 2 than the class II HDACs 4 and 6. Compared
to simpler inhibitors such as trichostatin, the large macrocyclic “cap” contains
sufficient structural information for additional rim interactions outside the
substrate-binding channel, enabling differences in affinity between isoforms.

Fig. 12-7 The bicyclic depsipeptide HDAC inhibitors: FK228, FR901,375, and
spiruchostatins A (R = i-Pr), B (R = s-Bu), C (R = i-Bu), and D. Intracellular
disulfide reduction converts FK228 into its active form.
Another patent by Fujisawa disclosed [31] the structure of FR-901375 from an
extract of Pseudomonas chlororaphis No. 2522. While it is a likely HDAC inhibitor,
no data have been reported in this regard and the decision seems to have been
made to promote FK228 instead as the clinical candidate. In 2001, additional
depsipeptide natural products, the spiruchostatins, were reported [32] by
Shin-ya’s group at the University of Tokyo and Yamanouchi Pharmaceuticals.
These compounds were isolated from an extract of Pseudomonas sp. Q71576,
on the basis of the ability to increase expression of luciferase driven by
the plasminogen activator inhibitor (PAI-1) promoter. Given the structural
similarity to FK228, the spiruchostatins were likely to be HDAC inhibitors and
this was confirmed in a later patent [33] and in our biochemical studies (see
below) with the natural product prepared by total synthesis.
12.6
Total Synthesis of Depsipeptide HDAC Inhibitors - Routes to the β-Hydroxy Acid
Fragment
Compared to other classes of HDAC inhibitors, the depsipeptides exhibit
two impressive features. Firstly, they are highly potent with IC50s in the
low nanomolar range. Secondly, they are significantly more active against
class I HDACs compared to class II HDACs. Fortuitously, it is the former
that are more heavily implicated in cancer and cardiac hypertrophy. On the
other hand, the depsipeptides are structurally the most complicated class
of HDAC inhibitors. Their elaborate framework has apparently deterred the
pharmaceutical industry from the preparation of unnatural analogs and the
iterative improvement of their properties. The Fujisawa and Yamanouchi
patents only cover the natural products and so far only academic groups have
described the total synthesis of depsipeptides.
Disconnection of the depsipeptides at the amide and ester bonds plus the
intramolecular disulfide bridge leads to a peptide fragment and a β-hydroxy
acid. Neither of these is particularly daunting by the standards of modern day
complex molecule total synthesis. Nevertheless, the molecule as a whole has
an intricate array of functional groups that need to be selectively manipulated.
In addition, two macrocycles need to be made, which is always challenging
due to the entropic difficulty of making large-sized rings.
All the depsipeptides contain a common β-hydroxy acid, which can be
disconnected by an aldol reaction. However, it is an example of an “acetate
aldol” that suffers from poor facial selectivity of the acetate enolate. Many
of the auxiliaries and reagent-based conditions that work for propionate and
other α-substituted enolates are unsuitable for acetate aldols. In the event, each
depsipeptide total synthesis has featured a different route for the synthesis of
this β-hydroxy acid fragment.
In Simon’s pioneering FK228 synthesis [34], methyl pentadienoate was
reacted with trityl thiol to give the 1,6-conjugate addition product 1 as an
inconsequential mixture of α,β- and β,γ-unsaturated isomers. Reduction
to the alcohol 2 and oxidation provided the α,β-unsaturated aldehyde 3.
The key asymmetric acetate aldol reaction was carried out using Carreira’s
conditions (Scheme 12-1) to give 4 in nearly quantitative yield and with
excellent enantioselectivity, followed by hydrolysis to acid ent-5. This is the
enantiomer of the fragment present in the natural products. Because of later
difficulties with the macrolactonization, that step was carried out under
Mitsunobu conditions with inversion of the alcohol, hence necessitating the
opposite stereochemistry in precursor ent-5.
In the Wentworth-Janda synthesis [35] of FR-901375, aldehyde 3 was
obtained by a shorter route via conjugate addition to acrolein and Wittig
reaction (Scheme 12-2). The authors had difficulties reproducing the
high enantioselectivity of Simon’s aldol reaction and alternative solutions
were sought. The successful synthesis utilized the Evans chiral auxiliary
with chloroacetate. The chloride is a “dummy” substituent ensuring high
diastereoselectivity in the aldol adduct 6. The chloride was then reduced and
the auxiliary removed to give acid ent-5.
In our synthesis [36] of spiruchostatin A, we followed Simon’s procedure for
the preparation of 3. We too were unable to achieve the Carreira aldol in good
yield. Moreover, the reaction requires the preparation of three noncommercial
materials: the binaphthyl chiral aminophenol, the t-butyl salicylaldehyde, and
the silyl ketene acetal. Instead, we opted for a diastereoselective aldol with
the Nagao auxiliary. For reasons that are not completely clear, the Nagao
thiazolidinethione auxiliary exhibits high diastereoselectivity in acetate aldols
unlike the more popular Evans oxazolidinone auxiliary. In this case, aldol
adduct 7 was obtained in good yield (Scheme 12-3). Unlike the other syntheses,
this was coupled directly to the peptide rather than hydrolyzed to the acid 5.
In the Doi-Takahashi synthesis [37] of spiruchostatin A, the acetate aldol was
performed with the Seebach quaternary oxazolidinone chiral auxiliary. The best
diastereoselectivity was observed with transmetallation of the lithium enolate
to zirconium. Basic hydrolysis of the product 8 then afforded free acid 5.

Scheme 12-1 Simon's route to acid 5. Reagents and conditions: (a) 1.2 equiv
TrtSH, 1.2 equiv Cs2CO3, THF, 20 h. (b) 2 equiv DIBAL, CH2Cl2, -78 °C, 3 h.
(c) 1.2 equiv (COCl)2, 2.4 equiv DMSO, CH2Cl2, -78 °C, 30 min; 2.4 equiv Et3N,
-30 °C, 4 h. (d) 10 equiv LiOH, MeOH, 3 h. Yields shown in the scheme: 1, 78%
(6:1 α,β to β,γ isomer); 2, 91% (6:1 α,β to β,γ isomer); 4, 99%, >98% ee
(Ti(Oi-Pr)4, toluene, 4 °C, 36 h; TBAF, THF, 5 min); ent-5, 100%.
12.7
Total Synthesis of Depsipeptide HDAC Inhibitors - Peptide Synthesis and
Formation of the seco-Hydroxy Acid
Simon's FK228 synthesis, the first in this area, provided a blueprint for
preparation of the peptide fragment and its linkage to the β-hydroxy acid
5. Starting from D-valine, standard peptide coupling furnished the linear
peptide 9 (Scheme 12-4). The dehydrobutyrine side chain was then introduced
by conversion of the threonine to a tosylate followed by elimination. After
Fmoc deprotection, the free N-terminus was coupled to acid ent-5, and the
C-terminus methyl ester hydrolyzed to give seco-acid 10. A similar strategy was
employed in Wentworth and Janda's synthesis of FR901,375. For this target,
the absence of the dehydrobutyrine unit simplifies the tetrapeptide synthesis,
which was accomplished in a straightforward manner. Coupling with ent-5
and hydrolysis gave the seco-acid 11.

Scheme 12-2 The Wentworth-Janda route to acid 5. Reagents and conditions:
(a) (i) 0.7 equiv TrtSH, 0.7 equiv Et3N, CH2Cl2, 1 h; (ii) Ph3P=CH-CHO,
benzene, reflux, 7 h. (b) (i) Al amalgam, aq THF, 0 °C, 2 h; (ii) aq
LiOH/H2O2 in THF, 1 h. Conditions shown in the scheme for aldol adduct 6:
1.8 equiv Bu2BOTf, i-Pr2NEt, CH2Cl2, -78 to -10 °C, 8 h; 69%, >90% de.

Scheme 12-3 The Ganesan and Doi-Takahashi procedures for enantioselective
acetate aldol reactions with aldehyde 3. Conditions shown in the scheme:
adduct 7, 1.7 equiv auxiliary, 1.9 equiv TiCl4, 1.9 equiv i-Pr2NEt, CH2Cl2,
-78 °C, 30 min, 76%; adduct 8, BuLi, 1.2 equiv Cp2ZrCl2, THF, -78 °C to rt,
51%.

In the spiruchostatin syntheses, the presence of a statine unit in the peptide
fragment requires a significantly different protecting group strategy. Statine
esters, unless sterically hindered, rapidly undergo intramolecular cyclization
to the lactam when the amine is deprotected. Furthermore, the β-hydroxy
ester unit is prone to protecting group migration and elimination. In our
total synthesis (Scheme 12-5), the eventual solution used a nonhindered ester
that can be removed under neutral conditions without destroying the fragile
β-hydroxy acid. Meanwhile, the statine was N-protected with a Boc group.
Upon acidic removal, the resulting protonated amine is not nucleophilic and
does not cyclize to the lactam. Addition of an acylating agent and a base
neutralizes the amine in situ, which then undergoes intermolecular coupling.
This is a testament to the speed of acylations with activated carboxylic acids,
given the presence of an undesirable intramolecular pathway that does not
compete effectively.

Scheme 12-4 Simon’s and Wentworth-Janda’s routes to a linear seco-hydroxy
acid. Reagents and conditions: (a) (i) 1 equiv Fmoc-L-Thr-OH, 1.5 equiv BOP,
3 equiv i-Pr2NEt, MeCN, 30 min; (ii) 5% Et2NH/MeCN, 3 h; 1.1 equiv
Fmoc-D-Cys(Trt)-OH, 1.1 equiv BOP, 2.5 equiv i-Pr2NEt, MeCN, 30 min; (iii) 5%
Et2NH/MeCN, 3 h; (iv) 1.1 equiv Fmoc-D-Val-OH, 1.6 equiv BOP, 6 equiv
i-Pr2NEt, MeCN, 30 min. (b) (i) 3 equiv Ts2O, pyridine, 0 °C, 20 min; (ii) 10
equiv DABCO, MeCN, 2 h; 5% Et2NH/MeCN, 22 h; (iii) 1 equiv acid ent-5, 1.5
equiv BOP, 3 equiv i-Pr2NEt, MeCN/CH2Cl2, 30 min; (iv) 2 equiv LiOH, aq THF,
0 °C, 3.5 h. (c) (i) 1 equiv Fmoc-D-Cys(STrt)-OH, 1.2 equiv EDC, 1.2 equiv
HOBt, DMF/CH2Cl2, 20 h; (ii) 1.3 equiv TBSCl, 1.3 equiv imidazole, DMF, 20 h;
(iii) 50% Et2NH/CH2Cl2, 0 °C, 3 h; (iv) 1.1 equiv Fmoc-D-Val-OH, 1.3 equiv
EDC, 1.4 equiv HOBt, DMF/CH2Cl2, 20 h; (v) 50% Et2NH/CH2Cl2, 0 °C, 4 h;
(vi) 1.1 equiv Fmoc-D-Val-OH, 1.3 equiv EDC, 1.4 equiv HOBt, 20 h; (vii) 38%
Et2NH/CH2Cl2, 0 °C, 3 h, rt, 3 h; (viii) 1 equiv acid ent-5, 1.5 equiv BOP,
3 equiv i-Pr2NEt, MeCN/CH2Cl2, 1 h; (ix) LiOH, aq THF, 16 h.

The statine 12 was prepared by Claisen condensation of valine with methyl
acetate followed by stereoselective reduction of the β-keto ester, following
precedents as in Joullié’s total synthesis [38] of tamandarin. The Boc group
was removed, and the amine coupled with D-Cys(Trt) as described above. The
free alcohol was protected and the peptide sequentially coupled with D-alanine
and thiazolidinethione 7. Reductive removal of the trichloroethyl ester under
neutral buffered conditions provided seco-acid 13. The Doi-Takahashi route was
essentially similar, except that the statine unit 14 was an allyl ester, and the
seco-acid 15 had a free alcohol in place of the triisopropylsilyl (TIPS)
protected alcohol of 13.

Scheme 12-5 The Ganesan and Doi-Takahashi syntheses of spiruchostatin A
seco acids. Reagents and conditions: (a) (i) 1.1 equiv PfpOH, 1.2 equiv
EDC·HCl, 0.2 equiv DMAP, CH2Cl2, 0 °C, 30 min, rt, 4 h; (ii) 3.2 equiv
LiCH2CO2CH3, THF, -78 °C, 45 min; (iii) 3.5 equiv KBH4, MeOH, -78 to 0 °C,
50 min; (iv) 26 equiv LiOH, 4:1 THF/H2O, 0 °C, 2 h; (v) 15 equiv TceOH, 6.2
equiv DCC, 0.12 equiv DMAP, CH2Cl2, 0 °C to rt, 18 h. (b) (i) 20% TFA/CH2Cl2,
3 h; (ii) 1 equiv Fmoc-D-Cys(Trt)-OH, 1.2 equiv PyBOP, 3.5 equiv i-Pr2NEt,
CH3CN, 20 min; (iii) 4 equiv TIPSOTf, 6 equiv 2,6-lutidine, CH2Cl2, 3 h;
(iv) 5% Et2NH/CH3CN, 3 h; (v) 1.3 equiv Fmoc-D-Ala-OH, 1.3 equiv PyBOP, 3
equiv i-Pr2NEt, CH3CN, 1 h; (vi) 5% Et2NH/CH3CN, 5 h; (vii) 0.9 equiv 7, 0.1
equiv DMAP, CH2Cl2, 0 °C, then rt, 7 h; (viii) 10 equiv Zn, NH4OAc/THF, 5 h.
(c) (i) Im2CO, (EtO2CCH2CO2)2Mg, THF; (ii) NaBH4, THF/MeOH; (iii) LiOH, aq
THF; (iv) allyl bromide, K2CO3. (d) (i) HCl, EtOAc; (ii) Fmoc-D-Cys(STrt)-OH,
EDC, HOBt, i-Pr2NEt; (iii) Et2NH; (iv) Fmoc-D-Ala-OH, EDC, HOBt, i-Pr2NEt;
(v) acid 5, PyBOP, i-Pr2NEt; (vi) Pd(PPh3)4, morpholine, MeOH.

12.8
Total Synthesis of Depsipeptide HDAC Inhibitors - Macrocyclizations and
Completion of the Synthesis
Interestingly, all the depsipeptide total syntheses to date have chosen
to close the macrocyclic ring at the same ester bond.
There are two strategies for such macrolactonizations. The first, which is
more common, involves the activation of the carbonyl group followed by
nucleophilic intramolecular displacement by the alcohol. In Simon's FK228
synthesis, attempts at cyclizing seco-acid 10 in this manner were unsuccessful.
Consequently, the second strategy, whereby the alcohol is converted into a
leaving group that is displaced by the carboxylic acid, was explored. Under
carefully controlled Mitsunobu conditions, the macrocycle was obtained in
good yield (Scheme 12-6). The stereochemical inversion occurring in this
process meant that the alcohol in 10 had to have the opposite configuration to
that in the natural product. After macrolactonization, the second cyclization
involving formation of the disulfide bridge was smoothly accomplished by iodine
oxidation, completing the total synthesis of FK228. The same sequence of
reactions was used in the Wentworth-Janda synthesis of FR901,375.
For our spiruchostatin total synthesis, we chose to reexamine the first
strategy of carbonyl activation. At the very least, since our target was different
from Simon's, it was possible that his negative results would not apply to us.
Initial experiments with the popular Yamaguchi method, whereby the hydroxy
acid is treated with trichlorobenzoyl chloride, proved promising. When the
additional alcohol in the seco-acid was protected, this furnished the macrocycle
in good yield (Scheme 12-7). The mechanism of the Yamaguchi procedure
is believed to involve a mixed anhydride. A recent paper [39] suggests that in
some cases, the activated species is actually the symmetrical anhydride, and the
reagent can be replaced by simpler benzoic acids. Following cyclization, iodine
oxidation by the Simon procedure gave the disulfide, which was deprotected
to furnish the natural product. Similarly, the minor diastereomer of 7 was
carried forward through the whole sequence to provide 16, which is identical to
spiruchostatin A except for being the epimer in the β-hydroxy acid fragment.

Scheme 12-6 Completion of the total syntheses of FK228 and FR901,375 by
Mitsunobu macrolactonization. Conditions as printed in the scheme. For FK228:
(a) 25 equiv Ph3P, 20 equiv TsOH, 5 equiv DIAD, THF, 0 °C, 4 h; (b) 1 equiv
I2, MeOH, 10 min; 52%. For FR901,375 from 11: (a) 25 equiv Ph3P, 5 equiv
TsOH, 20 equiv DIAD, THF, 0 °C, 4 h; (b) 20 equiv I2, MeOH, 10 min; (c) 5% aq
HF/MeCN, 1 h; 37%.

Scheme 12-7 Final stages in the Ganesan and Doi-Takahashi total syntheses of
spiruchostatin A, and the structure of spiruchostatin A epimer 16. Conditions
shown in the scheme. From 13: (a) acylating reagent as drawn, 1.5 equiv Et3N,
2 h; 1 equiv DMAP, toluene; (b) I2, MeOH/CH2Cl2; (c) HCl, EtOAc; 34%. From
15: (a) 1.2 equiv of the anhydride as drawn, 2.4 equiv DMAP, CH2Cl2; (b) I2,
MeOH/CH2Cl2; 67%.

In the Doi-Takahashi synthesis of spiruchostatin A, in which the additional
alcohol remains unprotected, the Shiina procedure for carbonyl activation was
used. This enabled the macrolactonization to proceed under milder conditions
at room temperature. The spiruchostatin syntheses show that it is possible to
form the macrocycle by the classical carbonyl activation method rather than
the alcohol activation seen in the Simon and Wentworth-Janda syntheses of
FK228 and FR901,375, respectively. The Shiina reagent is the reagent of choice
due to its room temperature activation, and we have successfully used [40]
this method for the preparation of a number of unnatural analogs. Since the
depsipeptides contain two macrocycles, the depsipeptide framework and the
disulfide bridge, the sequence in which these are formed is a separate issue.
All the syntheses have first made the cyclic depsipeptide. In the Doi-Takahashi
route, an intermediate with the intramolecular disulfide bridge in place did not
undergo macrolactonization. This surprising result suggests that the disulfide
bridge does not predispose the system toward the second cyclization, although
modeling indicates that favorable low-energy conformations are accessible.
12.9
The Biological Characterization of Spiruchostatin A
As described above, the spiruchostatins were first isolated on the basis of their
ability to regulate gene expression in cell-based reporter assays. Nevertheless,
the close structural similarity to FK228 suggested that these natural products
were HDAC inhibitors. Following our total synthesis, we characterized in
detail the activity of spiruchostatin A as an HDAC inhibitor in various model
systems. Initial analysis [36] demonstrated that spiruchostatin A was a potent
nanomolar growth inhibitor of MCF7 human breast cancer cells. An increase
was observed in histone acetylation and in p21cip1/waf1 promoter activity - two
characteristic cellular responses to HDAC inhibitors.
FK228 is believed to work by a prodrug mechanism involving intracellular
activation by reduction of the disulfide bond. Using in vitro enzyme inhibition
assays, we have obtained evidence [41] that spiruchostatin A works in a similar
fashion. In the presence of DTT, reduced spiruchostatin A inhibited total HeLa
cell HDAC activity with an IC50 of approximately 2 nM. In the absence of
DTT, intact spiruchostatin A was essentially inactive. Another hallmark of
FK228 is its selectivity between HDAC isoforms. The Yoshida group has
investigated this with overexpressed HDACs containing an epitope tag, which are
partially purified from cell lysates by immunoprecipitation using antibodies.
In this assay, spiruchostatin A was approximately 500-fold more active
against the class I HDAC1 than against the class II HDAC6 (Table 12-2). These
results show that FK228 and spiruchostatin A have similar characteristics and
mechanisms as HDAC inhibitors.
Table 12-2 Inhibition values of depsipeptides against HDAC1
and HDAC6. For comparison, the values with the nonselective
inhibitor trichostatin A are shown

Compound                       HDAC1 IC50 [nM]   HDAC6 IC50 [nM]
Trichostatin A                 15.0              61
FK228 (with DTT)               4.0               790
Spiruchostatin A (with DTT)    0.6               360
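
The isoform selectivity discussed above can be read off Table 12-2 as the
ratio of the HDAC6 IC50 to the HDAC1 IC50. The short Python sketch below
performs that arithmetic on the tabulated values; the function name and the
dictionary layout are illustrative choices only, and the ratio is a simple
convenience measure rather than a quantity reported by the authors.

```python
# IC50 values in nM as listed in Table 12-2: (HDAC1, HDAC6).
ic50_nm = {
    "Trichostatin A": (15.0, 61.0),
    "FK228 (with DTT)": (4.0, 790.0),
    "Spiruchostatin A (with DTT)": (0.6, 360.0),
}


def selectivity_ratio(hdac1_ic50, hdac6_ic50):
    """Fold preference for HDAC1 over HDAC6 (larger = more class I selective)."""
    return hdac6_ic50 / hdac1_ic50


ratios = {name: selectivity_ratio(*values) for name, values in ic50_nm.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}-fold selective for HDAC1")
```

On the tabulated values, trichostatin A shows only about a four-fold
preference, FK228 roughly 200-fold, and spiruchostatin A about 600-fold, which
is of the same order as the approximately 500-fold figure quoted in the text.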

Fig. 12-8 Growth inhibition curves of four HDAC inhibitors (spiruchostatin A,
FK228, TSA, and SAHA) in MCF7 and normal human dermal fibroblast (NHDF) cell
lines, plotted against log dose [nM]. Cells were treated with inhibitor and
relative cell growth determined 6 days later.

We have performed side-by-side experiments to compare the growth
inhibitory activity of spiruchostatin A and FK228 with that of other classes of
HDAC inhibitors (Fig. 12-8). The depsipeptides FK228 and spiruchostatin A
are extremely potent inhibitors with subnanomolar/low nanomolar potency in
MCF7 growth inhibition assays. By contrast, the hydroxamic acids SAHA and
PXD101, both of which are in clinical trials, are much less potent. The same
result was obtained with the benzamide MS-275, another clinical candidate,
with high nanomolar IC50 values in these cells. We observe similar relative
potency for these inhibitors across various tumor cell types, including A2780
ovarian carcinoma cells and PC3 prostate cancer cells.
We have also compared the activities of depsipeptide and hydroxamic
acid HDAC inhibitors on cellular responses of malignant and normal cells.
When tested at equipotent concentrations, these inhibitors have remarkably
similar effects, inducing essentially identical levels of histone acetylation
and p21cip1/waf1 protein expression. However, only SAHA and TSA induced
robust α-tubulin acetylation. This is consistent with previous findings [42] that
HDAC6, which is very weakly inhibited by spiruchostatin A and FK228, is
responsible for α-tubulin deacetylation, and with previous studies [43] of FK228.
The inhibitors also had essentially identical effects on G2M cell-cycle arrest
and cell death. Of a subset of eight genes selected from approximately
100 TSA-regulated genes identified [44] by microarray analysis, all were also
regulated by spiruchostatin A. The growth inhibitory activity of HDAC inhibitors is
relatively selective toward malignant cells [45] and all inhibitors tested showed
approximately equivalent levels of sparing of normal human dermal fibroblasts
(Fig. 12-8).
These findings clearly demonstrate that the relative selectivity of depsipeptide
inhibitors for class I enzymes does not limit their anticancer activity, at least in
vitro, and confirm that inhibition of HDAC6 is not required for the effects [46]
of HDAC inhibitors on cell-cycle arrest and gene expression. Such observations
are consistent with the predominant expression of class I HDACs in malignant
cells, and the correlation between expression of these enzymes and outcome
in malignancies.
To address the importance of the cyclic cap structure of spiruchostatin A,
we examined the properties of analog 16, epimeric at the thiol-bearing side
chain. Although this compound conforms to the general requirements for
an HDAC inhibitor (i.e., a zinc-binding group, an aliphatic chain to mimic
the lysine side chain and a cap structure), epi-spiruchostatin A was inactive
in both in vitro and cell-based assays. Because of its epimeric nature, if this
compound is oriented within the active site in the same way as spiruchostatin
A, the rest of the depsipeptide framework will be a mirror image. Clearly this
leads to unfavorable interactions with the “rim”, or to the loss of favorable
interactions, and hence to loss of activity.
One potential limitation of HDAC inhibitors is that they can induce
expression of P-glycoprotein (pgp)-1, a major drug efflux pump. This
may lead to resistance to the HDAC inhibitors, as well as potential drug-drug
interactions by decreasing the intracellular accumulation of coadministered
agents. Using quantitative PCR, we demonstrated that spiruchostatin A
significantly induced expression of pgp-1 RNA in MCF7 cells, as documented
with FK228 (Fig. 12-9). Interestingly, pgp-1 RNA was not induced by epi-
spiruchostatin A, demonstrating that induction is likely to be predominantly
mediated by a direct effect of HDAC inhibition, rather than the xenobiotic
responses that mediate the induction of pgp-1 by many other compounds.
Previous work using TSA has demonstrated [47] a significant role for the
transcription factor NF-Y in the control of pgp-1 expression.
Besides their application as anticancer agents, HDAC inhibitors also have
potential clinical utility in cardiovascular disease. We have characterized [48] the
effects of spiruchostatin A in cardiac myocytes. In these cells, phenylephrine
triggers a cascade of events leading to hypertrophy, including activation of
markers of fetal cardiac gene expression, such as atrial natriuretic factor (ANF)
and β-MHC, and reorganization of fibers to form sarcomeres. Spiruchostatin
A increased histone acetylation in cardiac myocytes, and reversed the effects
of phenylephrine on ANF and β-MHC expression and sarcomere formation,
suggesting that depsipeptides may have antihypertrophic activity.
Fig. 12-9 Induction of pgp-1 RNA expression. MCF7 cells were treated with the indicated compounds for 16 h and the expression of pgp-1 RNA analyzed using Q-RT-PCR. Fold induction is shown relative to DMSO treated cells. [Bar chart; treatments: FK228 (3.8 nM), epi-Spi (30 nM), Spi (30 nM), DMSO; x-axis: fold induction, 0 to 300.]
Despite the overall similarities between the effects of hydroxamic acid and
depsipeptide HDAC inhibitors on cancer cells, we have identified some
important class-specific effects (in addition to the selective induction of α-tubulin
acetylation). Importantly, the kinetics of inhibition of cellular HDACs by these
inhibitors varies widely. While hydroxamic acids induce rapid histone
acetylation in intact cells, the onset of action of the depsipeptide inhibitors is much
slower (Fig. 12-10). Also, following removal of compound by extensive
washing, histone acetylation is rapidly lost in hydroxamic acid treated cells, but is
maintained for protracted periods in cells treated with depsipeptide inhibitors.
The mechanisms responsible for these differences are not known, but
presumably relate to uptake of compound, and/or its metabolism to active
forms by intracellular reduction mechanisms.
Fig. 12-10 Histone acetylation in spiruchostatin A- or TSA-treated cells. MCF7 cells were treated with 15 nM spiruchostatin A or 80 nM TSA and analyzed by immunoblotting for histone acetylation at the indicated time points.
Fig. 12-11 Induction of histone acetylation by spiruchostatin A. (a) MCF7 cells were treated with spiruchostatin A, reduced spiruchostatin A or spiruchostatin A in serum free media (SFM), all at 15 nM for up to 24 h. Untreated cells were analyzed as a control (Co). (b) MCF7 cells were treated with indicated concentrations of spiruchostatin A in the presence or absence of epi-spiruchostatin A. Histone acetylation and PCNA expression (loading control) was analyzed by immunoblotting.
The kinetics of acetylation were not altered by culturing cells in the absence
of serum, suggesting that binding to serum proteins does not limit drug
action (Fig. 12-11(a)). We also tested the effect of prereducing spiruchostatin
A before addition to cells. However, the kinetics of acetylation induced by
reduced and oxidized spiruchostatin A were essentially identical, suggesting
that intracellular reduction is not a rate-limiting step (Fig. 12-11(a)). Finally,
we used the inactive epimer of spiruchostatin to investigate the potential
contribution of saturable transporters (Fig. 12-11(b)). We reasoned that this
chemically similar compound might compete for a putative transporter and
interfere with spiruchostatin A-induced acetylation. However, spiruchostatin
A-induced acetylation was equivalent in the presence or absence of its epimer.
Further studies are required to determine the factors that influence the kinetics
of action of depsipeptide HDAC inhibitors.
The significance of these findings for the clinical application of these
compounds is unclear. We and others have shown that transient histone
acetylation associated with “pulse” treatment of cells with hydroxamic acids
is not sufficient to promote G2M arrest. Consistent with this, it may be the
duration of histone acetylation rather than the peak levels that best predicts
responses in individual patients in clinical trials. Therefore, the ability of
depsipeptide inhibitors to promote prolonged acetylation may be advantageous.
However, it may be necessary to maintain the circulating concentrations of
these compounds above a threshold for a considerable time before acetylation
is induced. Pairing a rapid acting hydroxamic acid HDAC inhibitor
with a long-lived depsipeptide HDAC inhibitor may therefore be a particularly
attractive combination.
References
1. (a) N. Sengupta, E. Seto, Regulation of histone deacetylase activities, J. Cell. Biochem. 2004, 93, 57-67; (b) M. Biel, M. Wascholowski, A. Giannis, Epigenetics - An epicenter of gene regulation: Histones and histone-modifying enzymes, Angew. Chem., Int. Ed. Engl. 2004, 44, 3186-3216.
2. V.G. Allfrey, R. Faulkner, A.E. Mirsky, Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis, Proc. Natl. Acad. Sci. U.S.A. 1964, 51, 786-794.
3. S.B. Baylin, J.E. Ohm, Epigenetic gene silencing in cancer - a mechanism for early oncogenic pathway addiction?, Nat. Rev. Cancer 2006, 6, 107-116.
4. I. Nusinzon, C.M. Horvath, Histone deacetylases as transcriptional activators? Role reversal in inducible gene regulation, Sci. STKE 2005, re11.
5. K. Zhang, S.Y. Dent, Histone modifying enzymes and cancer: Going beyond histones, J. Cell. Biochem. 2005, 96, 1137-1148.
6. M. Dokmanovic, P.A. Marks, Prospects: Histone deacetylase inhibitors, J. Cell. Biochem. 2005, 96, 293-304.
7. J.M. Denu, The Sir2 family of protein deacetylases, Curr. Opin. Chem. Biol. 2005, 9, 431-440.
8. (a) T.A. Miller, D.J. Witter, S. Belvedere, Histone deacetylase inhibitors, J. Med. Chem. 2003, 46, 5097-5116; (b) C. Monneret, Histone deacetylase inhibitors, Eur. J. Med. Chem. 2005, 40, 1-13; (c) O. Moradei, C.R. Maroun, I. Paquin, A. Vaisburg, Histone deacetylase inhibitors: Latest developments, trends and prospects, Curr. Med. Chem. Anticancer Agents 2005, 5, 529-560.
9. M.F. Fraga, E. Ballestar, A. Villar-Garea, M. Boix-Chornet, J. Espada, G. Schotta, T. Bonaldi, C. Haydon, S. Ropero, K. Petrie, N.G. Iyer, A. Perez-Rosado, E. Calvo, J.A. Lopez, A. Cano, M.J. Calasanz, D. Colomer, M.A. Piris, N. Ahn, A. Imhof, C. Caldas, T. Jenuwein, M. Esteller, Loss of acetylated lysine 16 and trimethylated lysine 20 of histone H4 is a common hallmark of human cancer, Nat. Genet. 2005, 37, 391-400.
10. (a) A. Mai, S. Massa, D. Rotili, I. Cerbara, S. Valente, R. Pezzi, S. Simeoni, R. Ragno, Histone deacetylation in epigenetics: an attractive target for cancer therapy, Med. Res. Rev. 2005, 25, 261-309; (b) S. Minucci, P.G. Pelicci, Histone deacetylase inhibitors and the promise of epigenetic (and more) treatments for cancer, Nat. Rev. Cancer 2006, 6, 38-51.
11. (a) J.S. Steffan, L. Bodai, J. Pallos, M. Poelman, A. McCampbell, B.L. Apostol, A. Kazantsev, E. Schmidt, Y.Z. Zhu, M. Greenwald, R. Kurokawa, D.E. Housman, G.R. Jackson, J.L. Marsh, L.M. Thompson, Histone deacetylase inhibitors arrest polyglutamine-dependent neurodegeneration in Drosophila, Nature 2001, 413, 739-743; (b) E. Hockly, V.M. Richon, B. Woodman, D.L. Smith, X. Zhou, E. Rosa, K. Sathasivam, S. Ghazi-Noori, A. Mahal, P.A. Lowden, J.S. Steffan, J.L. Marsh, L.M. Thompson, C.M. Lewis, P.A. Marks, G.P. Bates, Suberoylanilide hydroxamic acid, a histone deacetylase inhibitor, ameliorates motor deficits in a mouse model of Huntington’s disease, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 2041-2046.
12. (a) T. McKinsey, E.N. Olson, Toward transcriptional therapies for the failing heart: chemical screens to modulate genes, J. Clin. Invest. 2005, 115, 538-546; (b) J. Backs, E.N. Olson, Control of cardiac growth by histone acetylation/deacetylation, Circ. Res. 2006, 98, 15-24.
13. (a) N. Yamaji, N. Shindou, Y. Terada, World Patent, 2004, 017996; (b) F. Blanchard, C. Chipoy, Histone deacetylase inhibitors: New drugs for the treatment of inflammatory diseases?, Drug Discov. Today 2005, 10, 197-204.
14. S. Skov, K. Rieneck, L.F. Bovin, K. Skak, S. Tomra, B.K. Michelsen, N. Odum, Histone deacetylase inhibitors: a new class of immunosuppressors targeting a novel signal pathway essential for CD154 expression, Blood 2003, 101, 1430-1438.
15. S.G. Gray, P. De Meyts, Role of histone and transcription factor acetylation in diabetes pathogenesis, Diabetes Metab. Res. Rev. 2005, 21, 416-433.
16. M. Crestani, C. Godio, N. Mitro, World Patent, 2005, 105066.
17. (a) S.B. Singh, D.L. Zink, J.M. Liesch, R.T. Mosley, A.W. Dombrowski, G.F. Bills, S.J. Darkin-Rattray, D.M. Schmatz, M.A. Goetz, Structure and chemistry of apicidins, a class of novel cyclic tetrapeptides without a terminal α-keto epoxide as inhibitors of histone deacetylase with potent antiprotozoal activities, J. Org. Chem. 2002, 67, 815-825; (b) P.J. Murray, M. Kranz, M. Ladlow, S. Taylor, F. Berst, A.B. Holmes, K.N. Keavey, A. Jaxa-Chamiec, P.W. Seale, P. Stead, R.J. Upton, S.L. Croft, W. Clegg, M.R. Elsegood, The synthesis of cyclic tetrapeptoid analogues of the antiprotozoal natural product apicidin, Bioorg. Med. Chem. Lett. 2001, 11, 773-776.
18. G. Lehrman, I.B. Hogue, S. Palmer, C. Jennings, C.A. Spina, A. Wiegand, A.L. Landay, R.W. Coombs, D.D. Richman, J.W. Mellors, J.M. Coffin, R.J. Bosch, D.M. Margolis, Depletion of latent HIV-1 infection in vivo: a proof-of-concept study, Lancet 2005, 366, 549-555.
19. P. Lewer, D.O. Duebelbeis, P.R. Graupner, J.X. Huang, US Patent 2005, 261174.
20. M.S. Finnin, J.R. Donigan, A. Cohen, V.M. Richon, R.A. Rifkind, P.A. Marks, R. Breslow, N.P. Pavletich, Structures of a histone deacetylase homologue bound to the TSA and SAHA inhibitors, Nature 1999, 401, 188-193.
21. J.R. Somoza, R.J. Skene, B.A. Katz, C. Mol, J.D. Ho, A.J. Jennings, C. Luong, A. Arvai, J.J. Buggy, E. Chi, J. Tang, B.-C. Sang, E. Verner, R. Wynands, E.M. Leahy, D.R. Dougan, G. Snell, M. Navre, M.W. Knuth, R.V. Swanson, D.E. McRee, L.W. Tari, Structural snapshots of human HDAC8 provide insights into the class I histone deacetylases, Structure 2004, 12, 1325-1334.
22. T.K. Nielsen, C. Hildmann, A. Dickmanns, A. Schwienhorst, R. Ficner, Crystal structure of a bacterial class 2 histone deacetylase homologue, J. Mol. Biol. 2005, 354, 107-120.
23. D.-F. Wang, P. Helquist, N.L. Wiech, O. Wiest, Toward selective histone deacetylase inhibitor design: Homology modeling, docking studies, and molecular dynamics simulations of human class I histone deacetylases, J. Med. Chem. 2005, 48, 6936-6947.
24. M. Yoshida, M. Kijima, M. Akita, T. Beppu, Potent and specific inhibition of mammalian histone deacetylase both in vivo and in vitro by trichostatin A, J. Biol. Chem. 1990, 265, 17174-17179.
25. J. Taunton, J.L. Collins, S.L. Schreiber, Synthesis of natural and modified trapoxins, useful reagents for exploring histone deacetylase function, J. Am. Chem. Soc. 1996, 118, 10412-10422.
26. (a) N. Nishino, B. Jose, S. Okamura, S. Ebisusaki, T. Kato, Y. Sumida, M. Yoshida, Cyclic tetrapeptides bearing a sulfhydryl group potently inhibit histone deacetylases, Org. Lett. 2003, 5, 5079-5082; (b) B. Jose, Y. Oniki, T. Kato, N. Nishino, Y. Sumida, M. Yoshida, Novel histone deacetylase inhibitors: cyclic tetrapeptide with trifluoromethyl and pentafluoroethyl ketones, Bioorg. Med. Chem. Lett. 2004, 14, 5343-5346; (c) M.P. Bhuiyan, T. Kato, T. Okauchi, N. Nishino, S. Maeda, T.G. Nishino, M. Yoshida, Chlamydocin analogs bearing carbonyl group as possible ligand toward zinc atom in histone deacetylases, Bioorg. Med. Chem. 2006, 14, 3438-3446.
27. (a) P. Zhu, E. Martin, J. Mengwasser, P. Schlag, K.P. Janssen, M. Gottlicher, Induction of HDAC2 expression upon loss of APC in colorectal tumorigenesis, Cancer Cell 2004, 5, 455-463; (b) K. Halkidou, L. Gaughan, S. Cook, H.Y. Leung, D.E. Neal, C.N. Robson, Upregulation and nuclear recruitment of HDAC1 in hormone refractory prostate cancer, Prostate 2004, 59, 177-189; (c) C.A. Krusche, P. Wulfing, C. Kersting, A. Vloet, W. Bocker, L. Kiesel, H.M. Beier, J. Alfer, Histone deacetylase-1 and -3 protein expression in human breast cancer: a tissue microarray analysis, Breast Cancer Res. Treat. 2005, 90, 15-23.
28. (a) H. Ueda, H. Nakajima, Y. Hori, T. Fujita, M. Nishimura, T. Goto, M. Okuhara, FR901228, A novel antitumor bicyclic depsipeptide produced by Chromobacterium violaceum No. 968. I. Taxonomy, fermentation, isolation, physico-chemical and biological properties, and antitumor activity, J. Antibiot. 1994, 47, 301-310; (b) N. Shigematsu, H. Ueda, S. Takase, H. Tanaka, K. Yamamoto, T. Tada, FR901228, A novel antitumor bicyclic depsipeptide produced by Chromobacterium violaceum No. 968. II. Structure determination, J. Antibiot. 1994, 47, 311-314; (c) H. Ueda, T. Manda, S. Matsumoto, S. Mukumoto, F. Nishigaki, I. Kawamura, K. Shimomura, FR901228, A novel antitumor bicyclic depsipeptide produced by Chromobacterium violaceum No. 968. III. Antitumor activities on experimental tumors in mice, J. Antibiot. 1994, 47, 315-323.
29. H. Nakajima, Y.B. Kim, H. Terano, M. Yoshida, S. Horinouchi, FR901228, a potent antitumor antibiotic, is a novel histone deacetylase inhibitor, Exp. Cell Res. 1998, 241, 126-133.
30. R. Furumai, A. Matsuyama, N. Kobashi, K.-H. Lee, N. Nishiyama, H. Nakajima, A. Tanaka, Y. Komatsu, N. Nishino, M. Yoshida, S. Horinouchi, FK228 (depsipeptide) as a natural prodrug that inhibits class I histone deacetylases, Cancer Res. 2002, 62, 4916-4921.
31. M. Okuhara, T. Goto, T. Fujita, Y. Hori, H. Ueda, Japanese Patent, 1991, 3141296.
32. (a) K. Shin-ya, Y. Masuoka, A. Nagai, K. Furihata, K. Nagai, K. Suzuki, Y. Hayakawa, Y. Seto, Spiruchostatins A and B, novel gene expression-enhancing substances produced by Pseudomonas sp., Tetrahedron Lett. 2001, 42, 41-44; (b) K. Nagai, M. Taniguchi, N. Shindo, Y. Terada, M. Mori, N. Amino, K. Suzumura, I. Takahashi, M. Amase, World Patent, 2004, 020460.
33. N. Shindou, A. Terada, M. Mori, N. Amino, K. Hayata, K. Nagai, Y. Hayakawa, K. Shinke, Y. Masuoka, Japanese Patent, 2001, 348340.
34. K.W. Li, W. Xing, J.A. Simon, Total synthesis of the antitumor depsipeptide FR901,228, J. Am. Chem. Soc. 1996, 118, 7237-7238.
35. Y. Chen, C. Gambs, Y. Abe, P. Wentworth Jr, K.D. Janda, Total synthesis of the depsipeptide FR-901375, J. Org. Chem. 2003, 68, 8902-8905.
36. A. Yurek-George, F. Habens, M. Brimmell, G. Packham, A. Ganesan, Total synthesis of spiruchostatin A, a potent histone deacetylase inhibitor, J. Am. Chem. Soc. 2004, 126, 1030-1031.
37. T. Doi, Y. Iijima, K. Shin-ya, A. Ganesan, T. Takahashi, A total synthesis of spiruchostatin A, Tetrahedron Lett. 2006, 47, 1177-1180.
38. B. Liang, D.J. Richard, P. Portonovo, M.M. Joullié, Total syntheses and biological investigations of tamandarins A and B and tamandarin A analogs, J. Am. Chem. Soc. 2001, 123, 4469-4474.
39. I. Dhimitruka, J. SantaLucia Jr, Investigation of the Yamaguchi esterification mechanism. Synthesis of a lux-s enzyme inhibitor using an improved esterification method, Org. Lett. 2006, 8, 47-50.
40. A. Yurek-George, A. Cecil, T. Hill, A. Ganesan, unpublished results.
41. S.J. Crabb, H. Rogers, P.A. Townsend, A. Yurek-George, K. Carey, B.M. Pickering, S. Maeda, P.W.M. Johnson, K. Shin-ya, M. Yoshida, A. Ganesan, G. Packham, Depsipeptide histone deacetylase inhibitors induce delayed and protracted histone acetylation, submitted for publication.
42. Y. Zhang, N. Li, C. Caron, G. Matthias, D. Hess, S. Khochbin, P. Matthias, HDAC-6 interacts with and deacetylates tubulin and microtubules in vivo, EMBO J. 2003, 22, 1168-1179.
43. K.M. Koeller, S.J. Haggarty, B.D. Perkins, I. Leykin, J.C. Wong, M.C. Kao, S.L. Schreiber, Chemical genetic modifier screens: small molecule trichostatin suppressors as probes of intracellular histone and tubulin acetylation, Chem. Biol. 2003, 10, 397-410.
44. M. Howell, B.M. Pickering, K. Carey, S.J. Crabb, R. Mitter, P.W.M. Johnson, G. Packham, Microarray analysis of histone deacetylase regulated genes in MCF7 human breast cancer cells, Manuscript in preparation.
45. J.S. Ungerstedt, Y. Sowa, W.S. Xu, Y. Shao, M. Dokmanovic, G. Perez, L. Ngo, A. Holmgren, X. Jiang, P.A. Marks, Role of thioredoxin in the response of normal and transformed cells to histone deacetylase inhibitors, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 673-678.
46. S.J. Haggarty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
47. S. Jin, K.W. Scotto, Transcriptional regulation of the MDR1 gene by histone acetyltransferase and deacetylase is mediated by NF-Y, Mol. Cell. Biol. 1998, 18, 4377-4384.
48. S.M. Davidson, P.A. Townsend, C. Carroll, A. Yurek-George, K. Balasubramanyam, T.K. Kundu, A. Stephanou, G. Packham, A. Ganesan, D.S. Latchman, The transcriptional co-activator p300 plays a critical role in the hypertrophic and protective pathways induced by phenylephrine in cardiac cells but is specific to the hypertrophic effects of urocortin, ChemBioChem 2005, 6, 162-170.
PART V
Chemical Informatics
Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
13
Chemical Informatics
13.1
Chemical Informatics
Paul A. Clemons
Outlook
This chapter begins with an overview of cheminformatics and chemical space,
presenting concepts and terminology that will aid the reader’s understanding of
the following sections. The second section provides a conceptual perspective
on chemical structure, summarizing the evolution of the molecular graph
representation now intimately familiar to the synthetic organic chemist. The
third and main section outlines the development of computable molecular
descriptors, including those based on both empirical and theoretical models.
The purpose of this section is to demystify the process of computing
descriptors and to give readers, especially experimental chemists and biologists,
a clear connection between their intuitive concept of chemical structure
and how molecular structures can be represented computationally. The
fourth section uses several recent examples to illustrate how the concept
of chemical space can be applied to problems in cheminformatics, such as
property prediction, diversity analysis, and reagent selection. A brief final
section challenges cheminformatics to approach future efforts to understand
molecular diversity in terms of the experimental performance of small
molecules across multiple biological contexts. The novice reader should
use this narrative as a starting point for further inquiry, particularly by
exploring the primary sources and other references cited herein. The expert
reader is encouraged to allow this chapter to bring fresh perspective to a
familiar field, and especially to appreciate how future challenges will require
increasingly tight connections between synthetic chemists, chemical biologists,
and computational scientists.
13.1.1
Introduction: Cheminformatics and Chemical Space
The similarity of small molecules and the diversity of small-molecule
collections can be described in many ways, both computational and
experimental. Predictive models and classification methods that relate
computable properties to measured outcomes can provide useful insights
into synthetic library planning, selection of compounds for screening efforts,
and prioritization of “hits” from high-throughput screening (HTS). In the past,
chemical intuition dominated analyses of small-molecule structure, structural
similarity, and chemical reactivity. Chemists trained in synthetic organic
chemistry, for example, have developed over two centuries of deep intuition
about chemical reactivity that can now be expressed in terms of formal logical
rules [1]. Medicinal chemists have similarly built extensive working knowledge
of the structural patterns “accepted” by human biology as bioavailable drug
molecules.
In recent decades, chemists have increasingly turned to computation to
solve chemical problems [2]. The diversity of applications for computers in
chemistry reflects the variety in chemical research, and computers are now
indispensable in all areas of chemistry. In 1959, Konrad Zuse sold the first
commercially available computer, the magnetic drum-based Z22, to Bayer
AG [2, 3]. Beginning in the mid-1960s, chemists began to make use of
the rapidly growing capabilities afforded by computers to frame and solve
problems in chemistry. Initially, computer assistance to chemical research
focused on structure elucidation based on assisted evaluation of spectroscopic
data [4, 5], and on programs to design organic syntheses on the basis of
known reaction data [2, 6]. More than a decade ago, Ugi et al. made a
distinction between “computational chemistry”, in which the calculation of
molecular energy levels and geometries prevails, and “computer chemistry”,
in which the logical and combinatorial capabilities of computers (rather than
the arithmetic ones) are exploited to solve chemical problems not approachable
by numerical computations per se [2]. Though this conceptual distinction is
an important one, the precise terminology did not persist. Instead, the newer
term cheminformatics now enjoys wide use to encapsulate a broad range of
activities at the interface of chemistry and computer science, such as synthetic
planning, molecular property calculation, database searching, combinatorial
library manipulation, chemical similarity and diversity, and simulations of
molecular behavior.
While early efforts to use computers in chemistry were significant
accomplishments if a computer provided any solution to a chemical problem,
the present situation is far different. Today, a proliferation of methods and
approaches requires a distillation of meaningful results from a vast array
of potential solutions. Most frequently, this situation necessitates agile and
iterative feedback between hypothesis generation (afforded by computational
scientists) and hypothesis testing (usually performed in the laboratory).
Thus, despite the emergence of cheminformatics as a thriving and distinct
subdiscipline, the need for close connections between cheminformatics and
experimental (e.g., synthetic organic) chemists has never been greater.
Against this backdrop, making clear distinctions between computed
properties and measured properties of small molecules is especially important.
In both cases, a structural representation of a small molecule is the input
parameter to a conceptual set of operations that give rise to numerical outputs
such as molecular descriptors, physicochemical properties, or biological
outcomes (Fig. 13.1-1(a)). However, to be useful in predictive ways, such
as when used to support prospective decisions about the investment of
synthetic chemistry resources, at least some of these numerical outputs must
be computable given only a structure representation. Only this situation allows
relationships between experimentally determined values and computed values
to be used to predict experimental outcomes for new molecules, based on
their structural similarity to molecules that have already been experimentally
tested (Fig. 13.1-1(b)). Most broadly, chemical space is a colloquialism that
refers to the ranges and distributions of computed or measured outputs
based on chemical structure inputs, and serves as a mathematical framework
for quantitative comparisons of similarities and differences between small
molecules (Fig. 13.1-1(c)).
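The idea of chemical space as a framework in which distance stands for dissimilarity can be sketched in a few lines. In the example below, each molecule is reduced to a vector of computed descriptors, and a measured value for an untested molecule is estimated from its nearest tested neighbors. All names and numbers (`TESTED`, `mol_A`, the descriptor values, the measured values) are hypothetical and chosen only for illustration; Euclidean distance and a nearest-neighbor average are just one simple choice among many possible metrics and models.

```python
import math

# Hypothetical, pre-scaled descriptor vectors for molecules that have already
# been tested, each paired with a hypothetical measured outcome.
TESTED = {
    "mol_A": ((0.0, 0.0, 1.0), 1.0),
    "mol_B": ((1.0, 0.0, 1.0), 2.0),
    "mol_C": ((0.0, 3.0, 0.0), 10.0),
}


def distance(u, v):
    """Euclidean distance between two descriptor vectors ("dissimilarity")."""
    return math.dist(u, v)


def nearest(query, k):
    """Names of the k tested molecules closest to the query in descriptor space."""
    ranked = sorted(TESTED, key=lambda name: distance(query, TESTED[name][0]))
    return ranked[:k]


def predict(query, k=2):
    """Estimate the measured value as the mean over the k nearest neighbors."""
    names = nearest(query, k)
    return sum(TESTED[name][1] for name in names) / len(names)


# A new, untested molecule described only by its computed descriptors.
query = (0.9, 0.1, 1.0)
print(nearest(query, 2), predict(query))
```

The prediction uses only quantities computable from structure for the new molecule, which is the condition the text identifies as necessary for prospective use.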
13.1.2
General Considerations: Chemical Structure Graphs
Synthetic organic chemistry can be viewed as an ongoing series of experiments
to relate properties of chemical structure, particularly topological, steric, and
electronic properties, to a particular class of measured outcomes, namely,
the reactivities of combinations of functional groups under diverse reaction
conditions, as judged by reaction rates and yields of product formation. Physical
chemistry seeks relationships between chemical structure and such outcomes
as boiling or melting points, vapor pressure, and electrochemical potential.
Analytical chemistry often relates chemical structure to the measured behavior
of molecules in appropriately applied electromagnetic fields.
Each of these aspects of the field of chemistry is connected through
the basic principle of chemical structure, which is a profound physical
feature of the molecular world where we live. At its most fundamental,
stereoelectronic structure is a quantum-mechanical reality of all molecules,
with the intrinsic uncertainty that this reality implies. Thus, perfectly
accurate structural descriptions of molecules are both elusive and potentially
cumbersome. Instead, chemists have devised an exceptional model of molecular
structure by inference. This model has been built over decades through the
interplay of evolving theory and experiments that measure various molecular
properties that derive from structure itself. Closely aligned with our intuitive
definition of “structure”, of course, are methods that provide direct information about
the “size” and “shape” of molecules, such as X-ray crystallography and
magnetic resonance spectroscopy. However, even these methods provide
only a partial picture of molecular structure. Experimental realities such as
lattice constraints, resolution limits, dynamic equilibria between rotamers,
and modeling ambiguities often raise questions about how the same molecule
might “look” under other experimental (or natural) circumstances.
Fig. 13.1-1 The concept of chemical space. (a) Chemical structure as an input to operations producing numerical outputs. (b) Conceptual illustration of a possible predictive relationship between arbitrary computed and measured properties. (c) Chemical space as a mathematical framework for comparing molecules, where “distance” is related to “dissimilarity”.
Considering structure in this manner, however, promotes the notion that
it is rarely molecular structure per se that intrigues and excites us. Rather,
molecular structure is often just a surrogate that we use to encode likely
behaviors of molecules under different sets of circumstances. We often wonder,
for example, how a change in structure might result in some difference in
a measurable outcome. Indeed, it is molecular properties that are of primary
interest after all! Because of this fact, chemists have developed very elegant
and compact representations of chemical structure.
The concept of the chemical graph has a history that predates modern
theories of chemical bonds and molecular structure. Scottish chemist William
Cullen introduced “affinity diagrams” in his mid-eighteenth century lectures,
using lines to represent forces acting between molecules undergoing chemical
reactions [7]. Subsequently, in 1789, William Higgins used lines to denote
forces connecting atoms to depict individual molecules, in this case the various
oxides of nitrogen [7]. Both of these “chemical graphs” predated the modern
concept of the chemical bond as articulated much later by Couper and Kekulé
[8], among others [9], but they did set the stage for more serious attempts to
study the spatial arrangement of atoms in molecules, notably by Dalton and
Wollaston, each of whom made use of models reminiscent of the modern
“ball-and-stick” depictions of chemical structure [7].
A more familiar concept of the molecular graph was introduced implicitly by
Sir Arthur Cayley in 1874 [10], though the term graph was not used explicitly
until several years later by Sylvester [11], who was inspired by the valence-theory
pioneer Edward Frankland’s “graphic-like symbolic formulae” [12]. Cayley’s
seminal paper in chemical graph theory considered the mathematical theory of
isomers, and identified two types of molecular graphs, which Cayley named
“plerograms” and “kenograms” [10, 13, 14]. Though a contemporary of the
chemists involved in the development of chemical-bonding theory, such as
Couper and Kekulé [9], Cayley is most widely known as a pure mathematician,
a fact that foreshadows the modern need for interdisciplinary approaches to
chemical research. In modern terminology, Cayley’s plerograms are molecular
graphs in which all atoms are represented by vertices, and all bonds by edges.
Cayley’s kenograms represent what are known today as hydrogen-suppressed
molecular graphs [12].
Many advances in the understanding of electronic structure accompanied
the first half of the twentieth century, especially including the introduction of
shared electrons and electron-dot structures by Lewis in 1916 [15], quantum
mechanics in 1926, and Pauling’s hybrid molecular orbitals in 1931 [8].
Despite these advances, chemists rarely take the time or trouble to draw
the more “accurate” space-filling, or even three-dimensional ball-and-stick,
structures during normal presentation. Rather, chemists have developed
conventions such as condensed formulas, dashed-wedge line notation, and
hydrogen-suppressed chemical graphs, each of which embeds implications of
electronegativity, lone pairs, molecular orbitals, and three-dimensionality as a
symbolic logic [15, 16] that trained chemists interpret automatically.
(c) 3R,4S,5R-trihydroxy-cyclohex-1-enecarboxylic acid
(d) 011100 01010010 10100010 01011100 00001100
    01000100 00000000 00000000 10000100
    00000111 00000000 00010001 00000000
    00000000 00110010 00000000 00000000
    00000000 00000000 00000000 00000000
(e) O[C@@H]1CC(=C[C@@H](O)[C@H]1O)C(=O)O
(f) 12 12 0 0 0 0 0 0 0 0999 V2000
    Atomic coordinates:
    -0.7145  0.2062 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.7145 -0.6187 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.0000 -1.0312 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     0.7145 -0.6187 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     0.7145  0.2062 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.0000  0.6187 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.0000  1.4437 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     0.7145  1.8562 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
    -0.7145  1.8562 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
     1.4289 -1.0312 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
    -0.0000 -1.8562 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
    -1.4289 -1.0312 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
    Connection table:
     1  2 1 0
     2  3 1 0
     3  4 1 0
     4  5 1 0
     5  6 1 0
     6  1 2 0
     6  7 1 0
     7  8 2 0
     7  9 1 0
     4 10 1 1
     3 11 1 6
     2 12 1 6
Most importantly in the present context, of course, the intersection of
hydrogen-suppressed graphs with general topological and graph-theoretical
considerations [13] represents an important conceptual advance in the
transition between human-readable and machine-readable structure
representations, as we shall see in the following section. However, it is also
important to remember that one result of simplifying representations, whether
made by man or machine, is the concealment of a considerable amount of latent
complexity.
Any representation of chemical structure is thus a complex cipher, allowing
our model of structure such brevity as to mask the distinction between the
model and the reality of chemical structure. The foregoing evolution of such
representations is a testament to both our evolving understanding of structure
and the human capacity for encoding any information. In this latter sense,
however, chemical structure representation is quite naturally suited to the
computer age.
13.1.3 History and Development: Computable Representations of Structure
Since the advent of modern computers, much attention has been paid to
methods to represent chemical structure in ways that are electronically
encodable. Such representations underlie most modern systems designed
to store and utilize chemical information, such as chemical documentation
using databases. Beginning in the mid-twentieth century, several methods
of encoding chemical information for machine processing were developed.
Chemical cipher notations had been introduced and refined by Gordon [17,
18], Dyson [19], Waldo [20, 21], and Wiswesser [22, 23], among others,
beginning in the late 1940s. In 1962, Bouman introduced one of the first
linear-cipher representations, a “linearly organized chemical code for use in
computer systems (Locus)”, whose representations of chemical structure are
recognizably ancestral to modern molecular line-entry notations (Fig. 13.1-2(a))
[24]. Significantly, one stated objective of Bouman was to reduce the chemical
knowledge required to use the system, allowing more of the coding work to be
done by machines or by chemically naive clerical staff.

Fig. 13.1-2 Encoding chemical structure. (a) Early encoding after Bouman [24],
similar to modern line notation. (b) Early encoding after Spialter [25], similar
to modern connection table. Modern encoding methods using (c) International
Union of Pure and Applied Chemistry (IUPAC) systematic nomenclature,
(d) fragment codes exemplified by MDL public keys (Elsevier MDL; San Ramon, CA),
(e) Simplified Molecular Input Line Entry Specification (SMILES) [28, 29] line
notation, and (f) atomic coordinates and a connection table from the
industry-standard structure-definition file (SDF) format.

In 1964, Spialter introduced the “atom connectivity matrix (ACM)” in
an attempt to define algebraically a “characteristic polynomial” associated
with chemical topology [25]. Again, though clearly inspired by earlier
graph-theoretic work such as that by Ray and Kirsch [26] among others, Spialter’s
paper is among the first to show something recognizable as a precursor
to a molecular connection table (Fig. 13.1-2(b)) [25]. Many issues familiar in
modern cheminformatics were addressed by these early studies, such as the
trade-off in readability by an algorithm versus a trained chemist, the rank and
seniority of substructures, and the uniqueness and generality of chemical
representations. On the other hand, stereochemical distinctions were not
addressed by these early systems; rather the focus of encoding was on the
topological connectivity of the molecular graph. For the most part, current
methods of computer-encodable structure representation fall into four classes
[27]: systematic nomenclature (Fig. 13.1-2(c)), fragmentation codes (Fig. 13.1-2(d)),
line notations (Fig. 13.1-2(e)), and connection tables (Fig. 13.1-2(f)). In general,
unambiguous stereochemical representation remains a problem for all but the
most sophisticated of encoding systems.
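To make the connection-table idea concrete, the following minimal Python sketch turns a bond list into a symmetric connectivity matrix. The atom and bond lists follow the heavy-atom example of Fig. 13.1-2(f), assuming that the ring-closing bond joins atoms 5 and 6. The function names and the choice to store element symbols on the diagonal are illustrative assumptions, not Spialter's exact formulation.

```python
# Heavy-atom connection table, numbered from 1 as in an SDF atom block.
# Assumption: the ring is closed by a bond between atoms 5 and 6.
ATOMS = ["C", "C", "C", "C", "C", "C", "C", "O", "O", "O", "O", "O"]

# (atom_i, atom_j, bond_order) rows, as in an SDF bond block.
BONDS = [
    (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 1, 2),
    (6, 7, 1), (7, 8, 2), (7, 9, 1), (4, 10, 1), (3, 11, 1), (2, 12, 1),
]


def connectivity_matrix(atoms, bonds):
    """Return a symmetric matrix with bond orders off the diagonal.

    The diagonal holds the element symbol (an illustrative layout choice).
    """
    n = len(atoms)
    matrix = [[0] * n for _ in range(n)]
    for i, symbol in enumerate(atoms):
        matrix[i][i] = symbol
    for a, b, order in bonds:
        matrix[a - 1][b - 1] = order
        matrix[b - 1][a - 1] = order
    return matrix


def heavy_atom_degree(matrix, atom_number):
    """Count the bonded neighbours of a 1-indexed atom."""
    row = matrix[atom_number - 1]
    return sum(
        1 for j, value in enumerate(row)
        if j != atom_number - 1 and value != 0
    )


acm = connectivity_matrix(ATOMS, BONDS)
```

Here `heavy_atom_degree(acm, 7)` counts the three neighbours of the carboxyl carbon. Stereochemical flags from the bond block are deliberately ignored, which mirrors the purely topological focus of the early encoding systems.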
Importantly, encoding methods such as these give rise directly to a wealth
of computational approaches to assess similarity between compounds and
diversity among compound collections. Rather than relying on chemical
training to interpret chemical similarity or dissimilarity, such structure-
encoding methods allow algorithmic processing of often-large collections
of structures for specific properties, such as substructure matches, or general
properties, such as the overall diversity of a compound collection. Many
methods have been developed to take advantage of increased computing power
and computer science sophistication in the representation and computation
of structural features. The remainder of this section provides some key details
about illustrative examples of several such molecular descriptor methods.
13.1.3.1 Functional Group Constants

Attempts to investigate the effects of physicochemical properties on chemical
reactivity, biological activity, and toxicity date back over a hundred years
[30-32]. In 1936, Hammett predicted entropies of ionization of benzoic acid
derivatives on the basis of both structural changes and a consideration of
the temperature-dependence of the dielectric constant of the solvent in which
ionization occurred. Hammett’s own comments prefigure ongoing controversy
about interpreting the structural determinants of molecular properties: “The
effect of a change in structure of reactant upon the equilibrium or rate of
an organic chemical reaction . . . has been attributed [both] to an increase or
decrease in the electrical work [of ionization due to] the substitution” [33, 34].
Further extensions of these groundbreaking ideas by Hammett resulted in the
so-called Hammett equation, initially used to summarize substituent effects
on rate and equilibrium constants for meta- and para-substituted benzene
derivatives [35, 36]:
$$\log k = \log k_0 + \rho\,\sigma_p$$
The symbol $k_0$ is an intercept term that is equal to $k$ for the parent
(unsubstituted) compound. The reaction constant $\rho$ depends on reaction
conditions such as solvent and temperature, representing the susceptibility
of the reaction to environmental effects. In contrast, the substituent constant
$\sigma_p$ is a measure of the electronic effect of replacing hydrogen by a given
substituent, and is assumed to be independent of the reaction conditions. By
defining $\rho = 1$ for the room temperature ionization of substituted benzoic
acids in water, Hammett calculated $\sigma_p$ values directly for 13 substituents, and
predicted those for a further 17 substituents by applying the primary $\sigma_p$ values
to other reactions. Later work increased the number of $\sigma_p$ values to 44 and the
number of reaction series to 51 [35].
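As a worked illustration, the usual form of the Hammett relation, log k = log k0 + (reaction constant) x (substituent constant), can be applied to the reference reaction itself, the ionization of para-substituted benzoic acids in water, where the reaction constant is 1 by definition. The substituent constants and the pKa of benzoic acid below are approximate, commonly tabulated values included only for illustration; the function names are hypothetical.

```python
def hammett_log_k(log_k0, rho, sigma):
    """Hammett relation: log k = log k0 + rho * sigma."""
    return log_k0 + rho * sigma


# Approximate para-substituent constants (illustrative, rounded values).
SIGMA_PARA = {"H": 0.00, "CH3": -0.17, "OCH3": -0.27, "Cl": 0.23, "NO2": 0.78}

PKA_BENZOIC_ACID = 4.20  # approximate value in water at room temperature
RHO_REFERENCE = 1.0      # defined as 1 for benzoic acid ionization in water


def predicted_pka(substituent):
    """Estimate the pKa of a para-substituted benzoic acid.

    Since pKa = -log Ka, the relation becomes pKa = pKa0 - rho * sigma.
    """
    log_ka = hammett_log_k(
        -PKA_BENZOIC_ACID, RHO_REFERENCE, SIGMA_PARA[substituent]
    )
    return -log_ka
```

With these inputs an electron-withdrawing nitro group lowers the estimated pKa (stronger acid), while an electron-donating methoxy group raises it, which is the qualitative behavior that the substituent constant is meant to capture.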
From a cheminformatic perspective, the most important consequence
of the Hammett equation is that it separates explicitly the contribution
of environment from that of chemical structure in the prediction of an
outcome (in this case, a reactivity property). As such, the Hammett equation
represents one of the earliest attempts to predict molecular behavior on
the basis of chemical structure alone. Notably, however, later investigators
experienced difficulties when trying to apply Hammett-type relationships to
biological systems, indicating that additional structural determinants need to
be considered [32, 37].
In the 1960s, several seminal papers by Hansch and coworkers inaugurated
the era of quantitative structure-activity relationships (QSARs), using structural
determinants to model and predict first the physicochemical, and then
the biological properties. First, Fujita et al. explicitly measured partition
coefficients between 1-octanol and water for over 200 mono- and disubstituted
benzenes [38]. These measured values were used to derive new substituent
constants for 67 functional groups attached to various benzene derivatives,
representing the change in partition coefficient introduced by adding the
substituent. While some variation between these constants was observed
across different electronic environments, the variations were relatively small
and were sometimes related by simple linear expressions, allowing the authors
to use this system to establish correlations between partition coefficients and
biological activities. Shortly thereafter, Iwasa et al. demonstrated the value
of using substituent constants, this time for aliphatic groups, to correlate
chemical structure with the narcotic action of alcohols, esters, ketones, and
ethers on tadpoles [39].
These seminal papers set the stage for the entire field of QSARs, which
in general attempts to derive equations that relate predicted or measured
physicochemical properties to some biological outcome. In 1969, Hansch
reflected on these early results in the Accounts of Chemical Research [37, 40],
relating nearly 20 years of interest in indole derivatives, and an ongoing
collaboration with Robert Muir of the Pomona botany department to correlate
chemical structure with the biological activities of indoleacetic acid-like
synthetic hormones. In an almost prescient allusion to the ongoing challenges
of interdisciplinary work, Hansch recounts that “attempts to formulate these
[results] in quantitative terms were frustrated by our conceptual training . . .
Muir was well aware of “lock and key” theory of enzyme-substrate reactions,
. . . [and] I was conditioned to explain substituent effects in the electronic terms
of the Hammett equation.” Hansch et al. were considering different ways of
mathematically combining Hammett constants and partition coefficients to
reduce data variance in their models, and Fujita had initially suggested a linear
combination. Only later, when Hansch could “bring [him]self to postulate that
log (1/C) was not linearly but parabolically dependent on log P”, did they obtain
a generally useful relationship.
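A parabolic dependence of this kind can be sketched as log(1/C) = -a (log P)^2 + b (log P) + c, which has a maximum at log P = b / (2a) whenever a is positive. The coefficients in the sketch below are hypothetical placeholders chosen for easy arithmetic, not fitted values from any study, and any electronic (Hammett-type) term is omitted for brevity.

```python
def hansch_parabolic(log_p, a, b, c):
    """Parabolic model: log(1/C) = -a * (log P)**2 + b * (log P) + c."""
    return -a * log_p ** 2 + b * log_p + c


def optimal_log_p(a, b):
    """Return the log P at which the parabolic model peaks (requires a > 0)."""
    if a <= 0:
        raise ValueError("a must be positive for the parabola to have a maximum")
    return b / (2.0 * a)


# Hypothetical coefficients, for illustration only.
A, B, C = 0.5, 2.0, 1.0
best = optimal_log_p(A, B)
```

Evaluating `hansch_parabolic` on either side of `best` gives lower predicted activity, in keeping with the idea that an intermediate lipophilicity is most favorable.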
Hansch rationalized this relationship by saying that molecules that are highly
hydrophilic will not penetrate lipophilic barriers, while highly hydrophobic
molecules will be soaked up by the first lipophilic material they encounter;
either way, such molecules will have difficulty reaching their sites of action.
Thus, only molecules with intermediate lipophilicities will readily exert
biological influence. These insights represent groundbreaking thinking for
their time, and herald the modern age of QSAR. Currently, both linear and
nonlinear relationships between structure and activity are routinely considered,
and the effects of both electronic (polar) and hydrophobic interactions
are embedded within QSAR models. Such considerations allow generally
predictive models of activity based on small-molecule structures, at least
within congeneric series of molecules. Moreover, hydrophobicity, expressed
as the octanol-water partition coefficient (log P), has proven useful in
predicting various biological observations [37, 40], and this property is now used
extensively in drug discovery and predictive toxicology [41, 42]. The Hansch-type
approach that correlates physicochemical properties with activities using
multivariable regression techniques has subsequently been widely applied to
problem areas such as toxicity, enzyme inhibition, ligand-receptor binding,
carcinogenicity, mutagenesis, and metabolism [43], and the insights of Hansch
with respect to the interplay of hydrophobic and electronic parameters presage
decades of research into molecular descriptor analysis that continues to
this day.
13.1.3.2 Graph-Theoretic Indices

Recalling Cayley’s plerograms and kenograms [10, 12, 14], small molecules
can be (and usually are) represented as polygonal shapes where each vertex
represents an atom and each edge represents a bond. This representation
is termed the molecular graph, and a given structure can be a path, a tree,
or a graph, in the formal language of topology. Graph theory provides for
the calculation of indicators defined over such graphs, generally termed
indices [14, 44]. The use of topological indices in chemistry began in 1947
when Harold Wiener developed the oldest among the topological indices for
molecular structure [45-47], the Wiener index, and used it to predict physical
properties of paraffins [44, 48]. The Wiener index, $W$, on a graph $G$, is
given by:
$$W(G) = \sum_{i<j} d(\mathrm{atom}_i, \mathrm{atom}_j)$$
where $d$ is the shortest distance obtained by counting bonds between the two
atoms, and the sum is computed over all pairs of atoms in $G$.
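The definition above translates directly into a breadth-first search over a hydrogen-suppressed molecular graph. The sketch below assumes the graph is connected and supplied as an adjacency dictionary; the function names and example graphs are illustrative.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: bond-count distance from start to every atom."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return distance


def wiener_index(adjacency):
    """Sum of shortest bond-count distances over all unordered atom pairs."""
    total = sum(
        sum(bond_distances(adjacency, atom).values()) for atom in adjacency
    )
    return total // 2  # every pair was counted once from each end


# Hydrogen-suppressed graphs as adjacency dictionaries.
N_BUTANE = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
ISOBUTANE = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
CYCLOHEXANE = {i: [(i - 2) % 6 + 1, i % 6 + 1] for i in range(1, 7)}
```

For the two butane isomers the index is 10 for the linear chain and 9 for the branched one, showing how branching compacts the graph and lowers the value.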
Importantly, it has subsequently been shown that the Wiener index for
a molecular graph may have strong correlations with chemical properties
[49-51]. Consequently, it is often the objective of synthetic efforts, particularly
in drug discovery optimization, to construct compounds with certain properties
by synthesizing lead compounds with a particular Wiener index. This strategy
is an important example of how computed properties (that correlate predictively
with desired properties) can be used to create new compounds that have certain
values; that is, they occupy certain regions of a chemical descriptor space [44].
Wiener also observed the following relation for molecules that have acyclic
graphs:
$$W(G) = \sum_{i} n_1(\mathrm{bond}_i \mid G)\, n_2(\mathrm{bond}_i \mid G)$$
where the sum is computed over all bonds in $G$, and where $n_1(\mathrm{bond}_i \mid G)$ and
$n_2(\mathrm{bond}_i \mid G)$ are the numbers of atoms lying on either side of a given bond
[14, 46, 48]. This result can be conceptualized first by considering that large
contributions to the sum in the first definition of $W$ will come from atoms near
the molecular perimeter, since these are more bonds removed, on average,
from most other atoms, whereas smaller contributions to the sum will come
from more central atoms (Fig. 13.1-3(a)). Since all pair-wise distances used
in the sum are obtained by counting bonds, the alternative calculation of $W$
involves considering the number of times each bond must be traversed to
account for all paths between pairs of atoms separated by at least that bond
(Fig. 13.1-3(b)).
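The bond-counting form of the index can be sketched as follows for acyclic graphs: each bond splits the tree into two sides, and the product of the two side sizes is the number of atom pairs whose connecting path crosses that bond. The function names are illustrative, and the graph is assumed to be connected with integer atom labels.

```python
def side_size(adjacency, start, blocked):
    """Count atoms reachable from start without crossing the start-blocked bond."""
    seen = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        for neighbour in adjacency[atom]:
            if atom == start and neighbour == blocked:
                continue  # do not cross the bond being cut
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen)


def wiener_by_bonds(adjacency):
    """Wiener index of an acyclic graph as a sum of n1 * n2 over all bonds."""
    n_atoms = len(adjacency)
    n_bonds = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    if n_bonds != n_atoms - 1:
        raise ValueError("the bond-cut formula applies only to acyclic graphs")
    total = 0
    for a in adjacency:
        for b in adjacency[a]:
            if a < b:  # visit each bond once
                n1 = side_size(adjacency, a, b)
                total += n1 * (n_atoms - n1)
    return total


N_BUTANE = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
ISOBUTANE = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
ISOPENTANE = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
```

For n-butane the three bonds contribute 1 x 3, 2 x 2, and 3 x 1, giving 10, the same value obtained by summing all pair-wise distances.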
Wiener’s work set the stage for one of the first true multidimensional
molecular descriptor spaces. In 1979, Randić et al. published the details of
a program, written in both BASIC and FORTRAN, which found all the
paths through a molecular skeleton represented using a molecular graph [52].
Though the total number of such paths increases rapidly with molecular size,
and especially with the number of rings in a molecule, even at the time of
first publication such path counting was a practical computing task for most
chemical structures.
The strategy of this approach was to develop a set of molecular codes
corresponding to the number of self-avoiding paths of each length in a
molecule, for use both as a convenient representation in subsequent similarity
searches [52-54], and as a quantitative measure of structural complexity.
Since the basic calculation method for these codes was again based on
counting bonds, it is easy to visualize how these path codes are related to the
Wiener index (Fig. 13.1-3). While these initial molecular codes did not address
multiple bonds, Randić later published a second version of the program [53]
that enumerates paths in chemical graphs with multiple bonds (Fig. 13.1-3(d)).
In this case, both the input information and the algorithm are more complex,
and the numeric values of the codes could be much larger, especially in the
case of molecules with multiple double bonds, but this improvement was a
step closer to representing the chemical reality of bond order.

Fig. 13.1-3 Topological indices and path counts. (a) Illustration of path counting
leading to topological indices; beginning with the atom labeled 1, red bonds
illustrate paths of lengths 1 through 6, terminating with the atoms labeled with
asterisks. (b) Illustration of Wiener’s observation [14, 48] that a bond, labeled
with an asterisk, will be traversed 3 x 9 = 27 times to account for all paths
between pairs of atoms on either side of the bond. (c) Illustration of a graph G
as a molecular representation, and of the relationship between Randić’s path
coding system [52] and the Wiener index. (d) Illustration of a graph G’
representing Randić’s later attempts [53] to include bond order in finding
paths; note how this modification breaks the symmetry of this graph, requiring
relabeling of four atoms.
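A path code of the kind described above can be sketched by enumerating self-avoiding paths with a depth-first search and tallying them by length. This is a minimal illustration that ignores bond order; it is not a reconstruction of the published BASIC or FORTRAN program, and the function name is hypothetical.

```python
from collections import Counter


def path_counts(adjacency):
    """Count self-avoiding paths of each length (in bonds) in a molecular graph.

    Every path is found once from each end, so directed counts are halved.
    """
    directed = Counter()

    def extend(atom, visited, length):
        for neighbour in adjacency[atom]:
            if neighbour not in visited:
                directed[length + 1] += 1
                visited.add(neighbour)
                extend(neighbour, visited, length + 1)
                visited.remove(neighbour)

    for start in adjacency:
        extend(start, {start}, 0)
    return {length: count // 2 for length, count in sorted(directed.items())}


N_BUTANE = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
CYCLOPROPANE = {1: [2, 3], 2: [1, 3], 3: [1, 2]}

butane_code = path_counts(N_BUTANE)
```

In an acyclic graph exactly one path joins each pair of atoms, so summing length times count over the code (3 x 1 + 2 x 2 + 1 x 3 for n-butane) recovers the sum of all pair-wise distances. In ring systems several self-avoiding paths can join the same pair of atoms, which is why the counts grow quickly with the number of rings.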
Randić’s methods allow the association of numerical parameters with
chemical structure in a way analogous to more detailed structural studies based
on numerical calculations derived from theoretical models (e.g., quantum
chemical calculations). The distinction between these two approaches is in
the nature of the parameters, rather than the goal, which in both cases
is to define correlative relationships between numerical computation and
actual molecular properties. While graph theory emphasizes conceptual
development, information encoding, and speed of calculation,
quantum-mechanical calculations focus on practical simulation of physical chemistry
theory and a more accurate (though more computationally demanding)
depiction of chemical structure. Since this initial work, several other topological
indices proposed by Randić [54-57], Basak [55, 58-60], and Balaban [55, 61,
62], have been used to predict toxicity as well as many physicochemical and
biological properties [55]. These indices have also been used in diversity
analysis [55, 63] and in analyzing the “drug-likeness” of compounds and
compound collections [55, 64]. In general, topological indices have now been
highly developed theoretically [65], and are being complemented by related
information-theoretic indices, such as the Shannon index [50, 65, 66].
13.1.3.3 Structural Feature Counts

Early QSAR studies concentrated on establishing correlations between
biological activities and experimentally derived physicochemical properties,
such as partition coefficients, molar refractivity, or pKa, and predominantly
used linear regression as a correlation technique [38, 43, 67]. Although this
approach is still used, experimental physicochemical parameters have largely
been supplanted by computer-generated descriptors. In many cases, these
descriptors consist of feature counts, often computed on whole molecules
(e.g., number of carbons, number of rings, etc.), but increasingly fragment
codes have also been used to allow prediction of outcomes based on
molecular fragments or more local structural features, such as pharmacophores.
Such methods are generally fast, since only simple forms of structure
representation are needed for this type of modeling, circumventing the need
for time-consuming three-dimensional rendering, conformational analysis,
and molecular alignment, as is done with some other QSAR methods.
Fragment-based QSAR approaches are especially suited to rapid virtual
screening of large libraries against protein structures, a need that is often
encountered in both drug discovery and toxicology. The earliest attempt to
utilize substructural fragments to predict outcomes was the 1956 Free-Wilson
method [68], which uses linear correlations between an observable property
and constant, additive contributions of substituents to a common skeleton.
Subsequently, similar approaches have been used both by Leo et al. [69] and
by Ghose, Crippen, and coworkers [70-72] to calculate log P values by adding
partial log P “contributions” from each fragment in the molecule.
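The additive idea behind these fragment methods can be sketched in a few lines: a property is estimated as an intercept plus the sum of constant per-fragment contributions. The contribution values below are hypothetical placeholders chosen for easy arithmetic; they are not parameters from any published fragment scheme, and the function name is illustrative.

```python
# Hypothetical per-fragment contributions, for illustration only.
FRAGMENT_CONTRIBUTIONS = {"CH3": 0.5, "CH2": 0.5, "OH": -1.0}


def additive_estimate(fragment_counts, contributions, intercept=0.0):
    """Estimate a property as intercept + sum(count * contribution)."""
    missing = set(fragment_counts) - set(contributions)
    if missing:
        raise KeyError("no contribution defined for: %s" % sorted(missing))
    return intercept + sum(
        count * contributions[fragment]
        for fragment, count in fragment_counts.items()
    )


# Fragment counts for two homologous alcohols.
ETHANOL = {"CH3": 1, "CH2": 1, "OH": 1}
PROPANOL = {"CH3": 1, "CH2": 2, "OH": 1}
```

Because each contribution is constant, adding one methylene group changes the estimate by the same increment regardless of the rest of the molecule. That is both the strength of the approach (speed and interpretability) and its main limitation.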
Attempting to couple a description of molecular topology with atom identities
in the molecular graph, Carhart et al. at Lederle Laboratories presented a
new descriptor methodology based on atom-pairs [73], inspired by Wiener’s
earlier work (Fig. 13.1-4(a)). Prior work had focused either on topology, as we
have seen from Wiener and Randić (for example, summing electronegativity
products for all pairs of atoms separated by paths of the same length [74]), or on
developing chemically intuitive relationships only between directly connected
pairs of atoms [75]. Carhart outlines two applications to structure-activity
problems to which molecular descriptors are to be applied: similarity between
compounds, and correlations of descriptors with measured biological activities.
Notably, this paper clearly frames the important goal of all molecular descriptor
studies, namely, to “express an irregular object like a chemical structure in a
regular form that allows the quantitative comparing and contrasting of those
structures” [73]. Further, Carhart explicitly enjoins his readers to consider
structure as a vector of numerical descriptors representing the position of
compounds in a high-dimensional space (i.e., a chemical space) with each
coordinate axis representing a different descriptor.

Fig. 13.1-4 Atom-pairs and topological torsions. (a) Illustration of atom-types
used in atom-pair descriptor calculation including atomic identities,
pi-bonding, and molecular topology. (b) Distinct atom-types, some of which occur
multiple times, make up the basic unit of atom-pair calculation. (c) Atom-pairs
are enumerated by assembling the list of all distinct pairs of atom-types and
the path length connecting them. (d) Distance metric defined by Carhart [73] in
an atom-pair descriptor space. (e) Topological torsions represent the topologies
of sets of four directly connected atoms, using the same atom-types as
atom-pairs; these encode local information only, whereas atom-pairs contain
information about both local and distant pair-wise relationships. Note how the
inclusion of stereochemistry in topological torsions would increase the number
of distinct topological torsions in this molecule from 18 to 20 (gray ovals).
Carhart argues that Hansch analysis requires a set of compounds that are
closely related, sharing a common skeleton and differing only in the nature
and positioning of a few substituents [73]. Descriptors such as molecular
connectivity and other topological indices have the advantage that they can
be computed easily from the connection table of a structure and can be
applied to much more diverse sets of compounds. However, these descriptors
encode only whole-molecule measures of topology and therefore may be
difficult to interpret even if they do correlate with some measurable biology.
In contrast, certain other parameters that are computed from analyses of
molecular shape-encoded space-filling and electrostatic potential features can
also yield good models of activity; however, their computation requires detailed
conformational analyses.
As a compromise, Carhart et al. offered the atom-pair, which is meant to
afford generality, ease of interpretation, and encoding of local topological
structure. Perhaps the simplest method of encoding chemical structure
using a computer is simply to count molecular features, such as atoms,
substructures, or topological elements (e.g., rings). In general, descriptors
relating to topological substructure can take the form either of counts (e.g.,
the number of hydroxyl groups in a structure) or of binary variables that
record the simple presence or absence (as 1 or 0, respectively) of a particular
moiety. Atom-pairs, in particular, encode the number of occurrences of pairs
of atom-types separated by a particular number of bonds in the molecular
graph.
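A minimal sketch of atom-pair enumeration follows. Each atom is typed by element, number of non-hydrogen neighbours, and number of pi electrons, and each pair of atoms is recorded together with the number of bonds on the shortest path between them. The input format, the function names, and the use of a bond count as the separation are illustrative assumptions.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Breadth-first search: bond-count distance from start to every atom."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return distance


def atom_pairs(elements, pi_electrons, adjacency):
    """Count (atom-type, atom-type, separation) triples over all atom pairs."""
    types = {
        atom: (elements[atom], len(adjacency[atom]), pi_electrons.get(atom, 0))
        for atom in adjacency
    }
    pairs = Counter()
    for a in adjacency:
        for b, separation in bond_distances(adjacency, a).items():
            if a < b:  # each unordered pair once
                first, second = sorted((types[a], types[b]))
                pairs[(first, second, separation)] += 1
    return pairs


# Propane as a hydrogen-suppressed graph with no pi electrons.
PROPANE_ELEMENTS = {1: "C", 2: "C", 3: "C"}
PROPANE_GRAPH = {1: [2], 2: [1, 3], 3: [2]}
propane_pairs = atom_pairs(PROPANE_ELEMENTS, {}, PROPANE_GRAPH)
```

Propane has three pairs of atoms but only two distinct atom-pairs, because the two terminal carbons share a type and the order of atoms within a pair is ignored.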
Atom-type designations in atom-pairs have constitutional, topological, and
electronic character - atoms of the same type share atomic identity, the same
number of non-hydrogen bonding partners, and the same number of bonding
π electrons (Fig. 13.1-4(b)). Because of this representation, molecules tend to
have many fewer distinct atom-pairs than the theoretical maximum possible
($\tfrac{1}{2}n(n-1)$ for a molecule with $n$ atoms), both by virtue of having multiple
atoms of the same type, and because the order of the two atoms’ appearance
within an atom-pair is not important (Fig. 13.1-4(c)). A very significant aspect
of the definition of atom-types is that Carhart et al. provided both a distance
metric and a normalized similarity score for molecules based on the atom-pair
definition (Fig. 13.1-4(d)). Formally, such a provision is a requirement for any
metric descriptor space intended to afford a basis for comparison of molecular
similarity or analysis of molecular diversity. Often, however, descriptors are
provided without such a formalism, leaving their value as a mathematical
description of chemical structure wanting.
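The text does not reproduce the exact expressions used in [73], so the sketch below shows one plausible pairing of measures on descriptor-count vectors: a city-block distance, which satisfies the metric axioms, and a normalized overlap score of the Dice type, which ranges from 0 to 1. Both function names and the choice of formulas are assumptions made for illustration.

```python
def city_block_distance(counts_a, counts_b):
    """Sum of absolute differences in descriptor counts (a true metric)."""
    keys = set(counts_a) | set(counts_b)
    return sum(abs(counts_a.get(k, 0) - counts_b.get(k, 0)) for k in keys)


def overlap_similarity(counts_a, counts_b):
    """Shared descriptor count divided by the mean total count (0 to 1)."""
    shared = sum(
        min(counts_a[k], counts_b[k]) for k in set(counts_a) & set(counts_b)
    )
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 2.0 * shared / total if total else 1.0


# Toy descriptor-count vectors; the keys stand in for atom-pair labels.
MOL_A = {"x": 2, "y": 1}
MOL_B = {"x": 1, "z": 1}
```

For these toy vectors one descriptor occurrence is shared out of five in total, giving a similarity of 0.4 and a city-block distance of 3.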
One obvious criticism of atom-pairs is the loss of conformational information
associated with two-body topological descriptions of structure. However,
additional work at Lederle Laboratories sought to address this problem. The
reasoning was that although a specific three-dimensional arrangement of
atoms may be necessary for activity, the features essential for activity are
actually encoded in the topological description of the molecule. Torsion angles
defined by four consecutively bonded atoms represent the minimal structural
unit in terms of which molecular conformation can be completely described.
On the basis of this rationale, Nilakantan et al. proposed topological torsions as a
new descriptor set for use in QSAR studies [76]. The topological torsion consists
of four consecutively bonded non-hydrogen atoms along with the number of
non-hydrogen branches (Fig. 13.1-4(e)), and is arguably the topological analog
of the torsion angle.
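Topological torsions can be enumerated by walking outward from each central bond. In the sketch below every atom in a four-atom sequence is described by its element, its pi-electron count, and its number of non-hydrogen branches, taken here as the heavy-atom neighbours not already part of the torsion. The descriptor layout, input format, and function name are illustrative assumptions.

```python
from collections import Counter


def topological_torsions(elements, pi_electrons, adjacency):
    """Count linear sequences a-b-c-d of four distinct bonded heavy atoms."""

    def describe(atom, bonds_in_torsion):
        branches = len(adjacency[atom]) - bonds_in_torsion
        return (elements[atom], pi_electrons.get(atom, 0), branches)

    torsions = Counter()
    for b in adjacency:
        for c in adjacency[b]:
            for a in adjacency[b]:
                if a == c:
                    continue
                for d in adjacency[c]:
                    if d == b or d == a:
                        continue
                    if a < d:  # keep one of the two reading directions
                        forward = (
                            describe(a, 1), describe(b, 2),
                            describe(c, 2), describe(d, 1),
                        )
                        torsions[min(forward, forward[::-1])] += 1
    return torsions


CARBONS = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C"}
N_PENTANE = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
NEOPENTANE = {1: [2], 2: [1, 3, 4, 5], 3: [2], 4: [2], 5: [2]}
```

n-Pentane yields two torsions that share a single type, whereas neopentane contains no chain of four consecutively bonded atoms and yields none. This illustrates how strictly local the description is.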
Immediately, the workers at Lederle recognized that the short-range
description provided by topological torsions complements the atom-pair
description in that each encodes different information about molecular
topology and shape. Like atom-pairs, topological torsions correspond to
readily recognizable features of molecules, and are similarly easy to calculate.
Comparing the two descriptions gives slightly different results in similarity
calculations. Whereas atom-pairs are sensitive to small changes even in
large molecules, topological torsions are local - the effect of changing a
single atom in a molecule is independent of the total number of atoms.
Rather, the actual number of topological torsions affected by a change in
structure depends only on the local topology in the vicinity of the change
[76]. Nilakantan et al. suggest that similarity analyses using both sets of
descriptors be combined by merging the lists of similar compounds that
were obtained by each method. Of course, by modern standards of high-
dimensional descriptor spaces, this approach somewhat misses the point - a
more useful measure of similarity (after accounting for differences in range and
variance among the descriptors) would be to perform similarity calculations
in a metric space containing both atom-pairs and topological torsions at the
outset.
Ongoing adaptation of the early work at Lederle has led to an explosion
of methods designed to exploit structural feature counts. A computer-age
generalization of feature-count descriptors is the structure key, which
works by associating bits in a string with the presence or absence of
defined molecular features (see Fig. 13.1-2(d)). In their most general form,
structure keys require that the choice of features to be included be specified
in advance, and the position of the bit in the bit-string encodes the
same feature for all molecules [77]. Indeed, in principle, Carhart’s
atom-pairs could be encoded as structure keys, with each bit corresponding to
the presence or absence of an allowed (predefined) atom-pair. Structure
keys are an important advance in the substructure searching of large
databases, or (more accurately) substructure screening. A screen is a
process by which candidates are ruled out efficiently, leaving only a small
number of candidates for more accurate but time-consuming comparisons.
Because structure keys encode substructural features exactly, and at defined
positions within bit-strings, encoded database objects that fail to contain any
definite feature of a query structure can immediately be eliminated from
consideration.
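The screening step reduces to a bit-wise test: a candidate survives only if every bit set in the query key is also set in the candidate key. The feature list, molecule labels, and function names in the sketch below are hypothetical, and the features of each molecule are assumed to have been perceived already.

```python
# Hypothetical predefined feature list; bit position = index in this list.
KEY_FEATURES = ["ring", "carboxylic acid", "hydroxyl", "amine", "halogen"]


def encode_key(features_present):
    """Pack the presence or absence of each predefined feature into an integer."""
    key = 0
    for position, feature in enumerate(KEY_FEATURES):
        if feature in features_present:
            key |= 1 << position
    return key


def passes_screen(query_key, candidate_key):
    """True if the candidate has every bit that the query has."""
    return (query_key & candidate_key) == query_key


def screen(query_key, database):
    """Return the names of database entries that survive the screen."""
    return [
        name for name, key in database.items()
        if passes_screen(query_key, key)
    ]


DATABASE = {
    "mol_A": encode_key({"ring", "carboxylic acid", "hydroxyl"}),
    "mol_B": encode_key({"ring", "amine"}),
    "mol_C": encode_key({"hydroxyl", "halogen"}),
}
hits = screen(encode_key({"ring", "hydroxyl"}), DATABASE)
```

Only `mol_A` survives the query for a ring plus a hydroxyl group. Survivors would still need an exact (and slower) substructure comparison, since sharing features does not guarantee that the features are arranged as in the query.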
Another conceptual possibility used in structure-key descriptors is to set
lower limits for the number of occurrences of a structural feature required
to set a particular bit. For example, one could associate a series of bits
with properties encoding the number of rings in a molecule, with one bit
for each of the features “>1 ring”, “>2 rings”, etc. In this way, bit-strings
can be made to encode not only the presence but also the number of each
desired feature. It is important to note that this type of strategy enables the
encoding of any collection of molecular features, provided that a sufficient
number of bits are allowed. In addition to discrete parameters (e.g., number
of rings), even continuous-valued parameters (such as log P) could be encoded
in keys, provided that the continuous values can be binned to an acceptable
resolution.
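Both ideas, occurrence thresholds and binned continuous values, can be sketched with two small helpers. The thresholds are read here as "at least t occurrences", and the bin edges are arbitrary; both choices, like the function names, are assumptions for illustration.

```python
def threshold_bits(count, thresholds=(1, 2, 3, 4)):
    """Set one bit per threshold t whenever count >= t."""
    return [1 if count >= t else 0 for t in thresholds]


def binned_bits(value, edges):
    """Set the bit of the half-open bin [edges[i], edges[i+1]) holding value.

    Values outside the covered range set no bit.
    """
    bits = [0] * (len(edges) - 1)
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            bits[i] = 1
    return bits


# A molecule with two rings and log P of 2.3, using arbitrary bin edges.
ring_bits = threshold_bits(2)
log_p_bits = binned_bits(2.3, [-2, 0, 2, 4, 6])
key = ring_bits + log_p_bits
```

The threshold encoding is cumulative, so a two-ring molecule also sets the one-ring bit and still matches queries that ask for at least one ring. The binned encoding is exclusive, and its resolution is fixed by the choice of edges.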
Similar to atom-pairs, atom-triples have also been used to encode ligand
features in terms of the properties of triangles [78] since three-body objects
retain more information than pair-wise representations. However, often the
number of constituent triples for which calculations are required becomes
limiting, allowing fewer structures or fewer conformers to be considered. In
one adaptation, Good et al. [78] restricted their consideration to “key functional
centers” in molecules that participate in the triplet descriptions. While this
method reduces computation times for large databases, its inherent bias
(i.e., the preselection of which pharmacophores are allowed to participate)
presents a new set of problems for truly generic database and substructure
searching.
Circumventing this conceptual limitation of structure keys required an
important evolution of feature-counting methodologies - the notion of the
molecular fingerprint [77, 79, 80]. In general, molecular fingerprints are bit-
strings that encode information about molecular atom-types, topology, and
even extended functional groups, but without prespecifying which features
are to be encoded. This generality is accomplished by generating the list of
features from the molecular structure itself, with a pattern representing each
atom, each pair of connected atoms, each triplet of connected atoms, and so
on. Each of these patterns, up to some connectivity radius, is used to seed a
pseudorandom number generator that determines which bits are set by that
pattern. Though this hash-coding procedure does not preserve the positional
meaning of individual bits within the overall fingerprint, it does ensure that
any molecular fingerprint containing a given pattern will contain the bits
associated with that pattern.
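The hash-coding idea can be sketched as follows. This is not the algorithm of any particular fingerprinting package: the fingerprint length, the number of bits per pattern, the use of SHA-256 to derive the seed, and the string patterns standing in for enumerated paths are all assumptions made for illustration.

```python
import hashlib
import random

FP_LENGTH = 1024        # assumed fingerprint length
BITS_PER_PATTERN = 4    # assumed number of bits set by each pattern

def pattern_bits(pattern, length=FP_LENGTH, k=BITS_PER_PATTERN):
    """Seed a pseudorandom generator with the pattern and draw its bits."""
    digest = hashlib.sha256(pattern.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return {rng.randrange(length) for _ in range(k)}

def fingerprint(patterns):
    """Union of the bits set by every pattern found in the molecule."""
    bits = set()
    for pattern in patterns:
        bits |= pattern_bits(pattern)
    return bits

# Hypothetical path patterns, written as strings for readability.
ethanol = ["C", "O", "C-C", "C-O", "C-C-O"]
propanol = ethanol + ["C-C-C", "C-C-C-O"]

# Any fingerprint containing a pattern contains the bits of that pattern.
print(fingerprint(["C-O"]) <= fingerprint(ethanol))    # True
print(fingerprint(ethanol) <= fingerprint(propanol))   # True
```

Because the same pattern always seeds the generator identically, the subset relation holds by construction, even though no individual bit has a fixed meaning.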
A fingerprint space can thus be viewed as a bit-string that is shared among
a very large unknown set of molecular features. Since each feature sets its own
subset of the bits (usually 4-5), the presence of a feature is related to the chance
that at least one of these bits is shared with no other pattern. Obviously, this
probability depends upon the total length of the fingerprint, the total number of
bits set by each pattern, and the total number of patterns. While structure keys
indicate the definitive presence or absence of a particular feature, fingerprints
are better at ruling out features (a required bit is absent) than confirming
them, since the presence of a pattern can only be determined with some
probability [77, 79-81]. Nevertheless, because of their higher density than
structure keys and their generality, fingerprints are now quite widely used in
cheminformatic applications. Since the introduction of fingerprints, and their
wide adoption in database systems such as Daylight, other fingerprints have
been developed that are tailored for other applications, such as learning and
clustering [77, 81, 82].
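The dependence of this probability on fingerprint length, bits per pattern, and number of patterns can be made concrete with a deliberately crude model. The sketch below assumes that every pattern sets its bits uniformly and independently at random, which real hashing only approximates, so the numbers it produces are indicative at best.

```python
def p_unshared_bit(length, bits_per_pattern, n_other_patterns):
    """Approximate chance that at least one bit of a feature is set by no
    other pattern, assuming independent, uniformly random bit choices."""
    draws = bits_per_pattern * n_other_patterns
    p_bit_free = (1.0 - 1.0 / length) ** draws
    return 1.0 - (1.0 - p_bit_free) ** bits_per_pattern

for n_patterns in (100, 1000):
    for length in (1024, 2048):
        print(n_patterns, length,
              round(p_unshared_bit(length, 4, n_patterns), 3))
```

Under these assumptions, the chance of an unambiguous bit falls as more patterns crowd into the fingerprint and rises as the fingerprint is lengthened.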
13.1.3.4 Electrotopological States (E-states)

Among the most self-contained and complete molecular descriptions is
that of Kier and Hall [83, 84], termed the electrotopological state (E-state).
This description combines electronic and topological characteristics of small
molecules, making use of the hydrogen-suppressed graph to generate state
values for each non-hydrogen atom. To compute E-state values, individual
non-hydrogen atoms within the molecular structure first receive intrinsic state
values according to the formula:
\[ I = \frac{(2/N)^{2}\,\delta^{v} + 1}{\delta} \]
where $N$ is the principal quantum number, $\delta$ is the number of connected
atoms other than hydrogen, and $\delta^{v}$ is the number of valence electrons not
involved in bonds to hydrogen (Fig. 13.1-5(a)). The intrinsic state aims to
encode the accessibility of an atom to intramolecular interaction as well as the
collection of bonds over which adjacent atoms may influence its state [83, 84].
Note that this definition provides identical resolution of structural elements as
the atom-types used for atom-pair and topological torsion calculation (compare
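with Fig. 13.1-4(b)). E-states, however, modify the intrinsic state by accounting
for all influences between atoms using the formula:
\[ S_i = I_i + \sum_{j \neq i} \frac{I_i - I_j}{r_{ij}^{2}} \]
where $S_i$ is the E-state value of atom i, $I_i$ and $I_j$ are intrinsic state
values, and $r_{ij}$ is the number of atoms in the shortest path containing atoms i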
and j, and the sum is taken over all atoms j in the molecule.

Fig. 13.1-5 Intrinsic and electrotopological states. (a) Illustration of
intrinsic state values; note that these values encode similar information and
have equivalent resolution to the atom-type definitions in Fig. 13.1-4(b).
(b) Illustration of the electrotopological state (E-state) values of Kier and
Hall [83, 84].

The resulting E-state values now reflect the influences of neighboring atoms, and thus
discriminate atoms with quite similar environments as having at least slightly
different E-state values (Fig. 13.1-5(b)). One of the primary benefits of the
E-state description of molecules is its generality; the calculations proceed from
first principles and can produce, overall, a high-dimensional “state space”
into which each molecule is positioned. Indeed, Kier and Hall argue that
to “generalize any analysis of molecular description to large collections of
arbitrary structures, it is necessary to work in a mathematical framework
that accounts adequately for the number and type of descriptors necessary
to build a relatively complete description of chemical structure.” This and
similar methods allow for an encoding of such structural features as size,
branching, unsaturation, cyclicity, heteroatom content, etc., in quantitative
terms, and provide a framework for numerous structure-activity applications
[55, 56, 85-87].
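As a worked illustration of the E-state calculation, the sketch below assumes the commonly quoted Kier-Hall forms: an intrinsic state of ((2/N)^2 x (valence electrons not bonded to H) + 1) / (number of non-hydrogen neighbors), perturbed by the sum over all other atoms of the intrinsic-state difference divided by the squared count of atoms on the shortest connecting path. The ethanol input table and the function names are illustrative choices, not taken from the original work.

```python
from collections import deque

def intrinsic_state(n_quantum, delta, delta_v):
    """Intrinsic state from N, the non-H neighbor count, and delta-v."""
    return ((2.0 / n_quantum) ** 2 * delta_v + 1.0) / delta

def path_atom_counts(adjacency, start):
    """Breadth-first search: atoms on the shortest path from start to each
    atom, counting both end atoms (so bonded neighbors give 2)."""
    counts = {start: 1}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in counts:
                counts[neighbor] = counts[atom] + 1
                queue.append(neighbor)
    return counts

def e_states(atoms, adjacency):
    """Return (intrinsic states, E-states) for a hydrogen-suppressed graph."""
    intrinsic = {a: intrinsic_state(*params) for a, params in atoms.items()}
    states = {}
    for i in atoms:
        r = path_atom_counts(adjacency, i)
        perturbation = sum((intrinsic[i] - intrinsic[j]) / r[j] ** 2
                           for j in atoms if j != i)
        states[i] = intrinsic[i] + perturbation
    return intrinsic, states

# Ethanol as CH3-CH2-OH; each atom is (N, delta, delta_v).
atoms = {"CH3": (2, 1, 1), "CH2": (2, 2, 2), "OH": (2, 1, 5)}
adjacency = {"CH3": ["CH2"], "CH2": ["CH3", "OH"], "OH": ["CH2"]}
intrinsic, states = e_states(atoms, adjacency)
for name in atoms:
    print(name, intrinsic[name], round(states[name], 3))
# CH3 2.0 1.681
# CH2 1.5 0.25
# OH 6.0 7.569
```

The perturbations cancel pairwise, so the E-states sum to the same total as the intrinsic states (9.5 here); the electronegative oxygen gains at the expense of its carbon neighbors.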
13.1.3.5 Shape and Field Descriptor Methods

While most of the foregoing methods focus on the rapid encoding of molecular
structure, particularly to facilitate large database searches and similarity
comparisons, it is still desirable and practical in some circumstances to encode
chemical structure using descriptors that explicitly account for molecular
shape properties, such as surface area or volume, in some regular fashion. In
general, one obstacle to conformation-dependent drug design is the accurate
characterization of molecular shape. One of the pioneers of this type of work,
Hopfinger made an important distinction between shape and conformation,
noting that conformation “is a component of shape in that conformation
defines the location of atoms in space. The properties of these atoms, most
notably their ‘sizes’, represent an additional set of factors needed” to fully
specify molecular shape [88].
Earlier work in this area of shape analysis focused on QSAR studies
accounting for conformational features of molecules, such as interatomic
distances [89], explicit atomic coordinate sets [90], computed intermolecular
distances [91], and simpler shape descriptors such as molecular volume [92].
Each of these descriptor types formally requires conformational analysis, and
therefore produces, accordingly, a family of solutions for most structures.
Against this backdrop, Hopfinger developed a model of molecular shape
on the basis of shape overlap, and used these descriptors to aid in the
prediction of activities of a series of dihydrofolate reductase inhibitors. In
this study, Hopfinger compares his QSAR example favorably to a similar
model from Silipo and Hansch [93], which is based solely on physicochemical
and substructural features. In this example, at least two shape descriptors
and one physicochemical feature were required to explain the variance in
enzymatic inhibition data [88]. Thus, at least in this QSAR example, systematic
consideration of three-dimensional molecular geometry was essential to
explain drug potency. Hopfinger later developed a general formalism, on
the basis of a molecular mechanics pair-wise potential function, to compute
molecular potential energy fields [94]. These functions, too, are conformation-
specific, requiring additional analysis and multiple solutions per molecule.
However, molecular descriptors can be derived from the resulting potential
energy fields, which in turn can be used in QSAR studies.
In 1988, Cramer et al. introduced comparative molecular field analysis
(CoMFA) [95], a descriptor methodology based on the notion that the most
relevant calculable properties to small molecule-receptor interaction are
shape-dependent properties. Cramer argued that because biological effects are
noncovalent, molecular mechanics force fields used to model stereoelectronic
effects could account for most such effects. CoMFA attempts to sample these
fields by considering a probe object designed to “feel” these forces from
a molecule at each point of a three-dimensional lattice. Each lattice point
gives rise to a steric and electronic potential term experienced by the probe
object, and thus the size of the resulting descriptor list can depend greatly
on the resolution of the probe object. However, because each descriptor has
the same energetic unit (e.g., kcal/mol), there is no need to normalize the
descriptor set before deriving a QSAR model. In general, CoMFA produces
descriptor lists that are considerably larger than the number of compounds
under consideration. Accordingly, CoMFA was one of the first QSAR methods
to rely on partial least-squares (PLS) analysis [88, 95-98], which seeks to derive
linear equations from tables having many more columns than rows. Since the
development of CoMFA, a number of modifications and evolutionary advances
have afforded methods to improve model performance through variable subset
selection.
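To make the size of a CoMFA-style descriptor list tangible, the sketch below samples a steric and an electrostatic term at every point of a small lattice. It is only a schematic: the Lennard-Jones 6-12 and Coulomb forms, the unit probe charge, the 30 kcal/mol truncation, the lattice extent, and the three-atom "molecule" with its parameters are all assumptions for illustration rather than the published CoMFA settings.

```python
import math

COULOMB = 332.06  # converts e^2/Angstrom to kcal/mol

def lattice(lo, hi, step):
    """Regular cubic lattice of probe positions."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + k * step for k in range(n)]
    return [(x, y, z) for x in axis for y in axis for z in axis]

def probe_fields(atoms, point, probe_charge=1.0, cap=30.0):
    """Steric (6-12) and electrostatic energies felt by a probe at point."""
    steric = 0.0
    electrostatic = 0.0
    for position, charge, r_min, well_depth in atoms:
        r = max(math.dist(position, point), 0.1)  # avoid division by zero
        ratio = r_min / r
        steric += well_depth * (ratio ** 12 - 2.0 * ratio ** 6)
        electrostatic += COULOMB * charge * probe_charge / r
    # Truncate large values so points inside atoms do not dominate.
    steric = min(steric, cap)
    electrostatic = max(-cap, min(electrostatic, cap))
    return steric, electrostatic

# Hypothetical molecule: (position, partial charge, r_min, well depth).
atoms = [((0.0, 0.0, 0.0), -0.4, 3.4, 0.10),
         ((1.2, 0.0, 0.0), 0.3, 3.6, 0.10),
         ((-0.6, 1.0, 0.0), 0.1, 3.0, 0.05)]

points = lattice(-4.0, 4.0, 2.0)
descriptors = [value for p in points for value in probe_fields(atoms, p)]
print(len(points), len(descriptors))  # 125 250
```

Even this coarse 2 Angstrom lattice yields 250 columns per molecule, which illustrates why a method such as PLS, suited to tables with many more columns than rows, becomes necessary.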
QSAR methods such as those used by Hopfinger and Cramer measure the
overall stereoelectronic similarity between pairs of molecules, in general by
relating activity data to comparisons of query molecules with a single lead
molecule. Good et al. extended this work by attempting such correlations
to data matrices obtained by the complete set of pair-wise comparisons
among a collection of molecules [78, 99, 100], which gave excellent correlation
for a set of steroids. This work extends the notion of a property overlap
parameter, such as that used by Carhart [73] as a measure of similarity;
again, the numerator measures property overlap while the denominator
normalizes the similarity result (see also Fig. 13.1-4(d)). As originally applied,
electron density was used as the structural property for which overlap
was measured. In the study by Good, electrostatic potential, electric field,
and shape were also used by modifying the original program. These
additional parameters were used to derive good QSAR models for several
systems.
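One common way to realize such a normalized overlap is a cosine-style index, in which the numerator sums the product of two property fields and the denominator is the geometric mean of their self-overlaps. The sketch below assumes that both molecules have already been aligned and that the property (electron density, electrostatic potential, and so on) has been sampled on a shared grid; the sample values are invented.

```python
import math

def overlap_similarity(prop_a, prop_b):
    """Normalized overlap of two property fields sampled on the same grid."""
    numerator = sum(a * b for a, b in zip(prop_a, prop_b))
    denominator = (math.sqrt(sum(a * a for a in prop_a))
                   * math.sqrt(sum(b * b for b in prop_b)))
    return numerator / denominator

field_a = [0.0, 0.2, 0.9, 0.4, 0.0]
field_b = [0.0, 0.3, 0.8, 0.5, 0.1]
field_c = [0.9, 0.1, 0.0, 0.0, 0.7]

print(round(overlap_similarity(field_a, field_b), 3))
print(round(overlap_similarity(field_a, field_c), 3))
```

Because the index is normalized, it responds to the shape of the property distribution rather than its magnitude: multiplying one field by a constant leaves the similarity unchanged.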
In 1996, Cramer introduced another advance in shape-based molecular
description as an extension of CoMFA, introducing “topomers” [101].
Topomers make use of the substructural commonalities among members of
congeneric series of molecules to align the structures in a CoMFA field. For this
reason, their use is restricted to cases in which all members compared contain
a common substructural element, which is reminiscent of the empirical work
of Hansch on substituted benzenes. Cramer uses a “topomeric” algorithm
to align the variable portion of each molecule, in the process selecting a
representative conformation. The steric components of CoMFA are then
calculated for each of these variable portions, and the resulting descriptors
used to generate clusters of similar molecules. In the case of the original
topomer paper, Cramer segregated over 700 commercially available thiols into
231 bioisosteric clusters whose compositions agreed with medicinal chemistry
experience and intuition at least as well as clusters derived with previous
computational methods.
Cramer’s topomer work is based on the idea that earlier efforts at
molecular alignment (including in his earlier CoMFA work) overemphasize
the need to find receptor-bound or minimum-energy conformations [101].
The authors offer three explanations for why this might be so. First, they
argue that steric interactions are the most important class of noncovalent
interactions responsible for receptor engagement. Second, they cite the
nonindependence of electronic factors from steric factors, alluding to the
possibility of correlations between different descriptors, a complication
that is endemic to multidimensional descriptor spaces. Third, they note
that adding another geometric field (such as the electronic components
of CoMFA) would halve the contribution of steric information to the
differences between one molecular shape and another - in this case, many
more compounds would be required to recapitulate the observed bioisosteric
classes. This last reason is especially thought-provoking - there are infinitely
many possible descriptors, but choosing too many for a particular comparison
may obscure the classification one is seeking, particularly if the “extra”
descriptors do not encode information germane to that classification. In
Cramer’s case, bioisosteric classes were sought that aesthetically agreed with
the intuition of medicinal chemists; for this reason, topomer classification
of these thiols was restricted to descriptors resulting from steric field
interactions.
A less direct but equally significant feature of the topomer paper is the
fact that Cramer et al. explicitly considered (and discussed in detail) several
features of the available clustering methods and the consequences of the chosen
number of clusters, and justified their choices. Sadly, such rigor is often
lacking in molecular descriptor analysis, particularly as commercial descriptor
calculation and clustering packages with fewer adjustable parameters (or
more “entrenched” default values for these parameters) emerge. Cramer et al.
rationalize the use of hierarchical clustering with complete linkage (where
intercluster distances are defined in terms of the worst-case scenario, or
maximum distance, between any pair of objects, one from each cluster) with the
intention of maximizing intracluster similarity at the expense of computational
resources. In particular, complete linkage hierarchical clustering produces
roughly spherical clusters, whose positions remain essentially stationary as
new objects are added, and which merge reluctantly. Practically speaking, such
clusters should be relatively robust to the input set of molecules.
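The complete-linkage rule is simple enough to sketch directly. The naive implementation below recomputes every intercluster distance at each step, which reflects the computational expense mentioned in the text; the two-dimensional points are arbitrary stand-ins for molecules in a descriptor space.

```python
import math

def complete_linkage(points, n_clusters):
    """Agglomerative clustering; intercluster distance is the maximum
    distance between any pair of objects, one from each cluster."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (9.0, 0.0)]
print(complete_linkage(points, 3))  # [[0, 1, 2], [3, 4], [5]]
```

Because a merge is judged by its worst-case pair, a cluster cannot grow by chaining through a single near neighbor, which is one way to see why the resulting clusters stay compact.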
In one particularly simple and elegant shape-based approach to molecular
description, Sauer and Schwarz [102, 103] proposed the use of ratios between
principal moments of inertia (Fig. 13.1-6(a)). Here the authors reasoned that
the shape envelope of small molecules could be viewed as falling between three
limiting cases representing rods, disks, and spheres (Fig. 13.1-6(b)). By using
ratios computed using the principal moments of inertia of small molecules,
the authors reduced the problem of shape to a two-dimensional mapping onto
an isosceles triangle (Fig. 13.1-6(c)). Using this framework, the authors set
out to describe differences in chemical space coverage coming from skeletal
diversity, as defined by the number of different scaffolds represented by a
compound collection, versus appendage diversity, as defined by the inclusion
of multiple building blocks on a common scaffold. Most importantly, this
method encodes molecular shape independently of molecular size, allowing
shape comparisons to be made between molecules spanning large ranges of
molecular weight.
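A sketch of the moment-of-inertia ratio calculation follows. It assumes the usual normalization in which the two smaller principal moments are divided by the largest, so that an ideal rod maps to (0, 1), a disk to (0.5, 0.5), and a sphere to (1, 1); the point sets with unit masses are idealized test shapes rather than real conformers, and the closed-form eigenvalue routine is a standard trigonometric method for symmetric 3 x 3 matrices.

```python
import math

def inertia_tensor(coords, masses):
    """Inertia tensor about the center of mass."""
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - center[k] for k in range(3))
        t[0][0] += m * (y * y + z * z)
        t[1][1] += m * (x * x + z * z)
        t[2][2] += m * (x * x + y * y)
        t[0][1] -= m * x * y
        t[0][2] -= m * x * z
        t[1][2] -= m * y * z
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t

def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted(a[k][k] for k in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[k][k] - q) ** 2 for k in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([smallest, 3.0 * q - largest - smallest, largest])

def pmi_ratios(coords, masses):
    """Return (I1/I3, I2/I3) with I1 <= I2 <= I3; needs a nonzero I3."""
    i1, i2, i3 = symmetric_eigenvalues(inertia_tensor(coords, masses))
    return i1 / i3, i2 / i3

rod = [(-1.0, -1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
disk = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
sphere = disk + [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
for name, shape in (("rod", rod), ("disk", disk), ("sphere", sphere)):
    print(name, [round(v, 3) for v in pmi_ratios(shape, [1.0] * len(shape))])
```

Scaling all coordinates or all masses by a constant leaves both ratios unchanged, which is the size independence emphasized in the text.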
In general, shape-based descriptor methods can be viewed as the most
“realistic” picture of chemical structure, since latent features such as
molecular topology and valence remain implicitly encoded, whereas the
overall description is capable of encoding additional stereochemical and
conformational information. In general, this accuracy bears a certain
computational cost, either because detailed modeling must be employed
to generate a “good” three-dimensional structure for which to compute
descriptors, or because conformational uncertainty warrants calculation of
descriptors for a family of conformers. Nonetheless, shape-based molecular
description can provide powerful insights into the relationships between
topology, stereochemistry, and conformation in determining molecular
properties.
Fig. 13.1-6 Shape-envelope analysis based on principal moments of inertia.
(a) Illustration of principal moments of inertia. (b) Relationships of principal
moments of inertia to the idealized "shape envelope" of small molecules.
(c) Two-dimensional map of a chemical space based on principal
moments-of-inertia ratios.
13.1.4 Applications and Examples: Molecular Descriptor Spaces
As we have seen, molecular descriptors constitute information about steric and
electronic constraints conferred by chemical structure [104, 105]. Molecular
descriptors underlie both pharmacophore models [106, 107] and analyses of
similarity or diversity among compound collections [108, 109]. The calculation
of descriptors therefore serves as a starting point in the analyses of small-
molecule relationships assessed prior to compound synthesis, before selecting
compounds for HTS, and in the interpretation of biological measurements of
small-molecule perturbation.
As described earlier, QSARs have emerged as a computational paradigm in
modern drug design [110-112]. This approach attempts to encode biological
activity as a mathematical function using numerical methods to correlate large
amounts of screening data for hundreds or thousands of candidate compounds.
The data are mapped onto a chemical space consisting of several descriptors,
with the hope that this space can reliably estimate the properties of new
molecules [44]. A fundamental assumption of QSAR is that variations in the
biological activity of a series of chemicals that target a common mechanism of
action are correlated with variations in their structural, physical, and chemical
properties [32, 113]. Since structural properties of a small molecule can
often be determined more efficiently than biological properties, a statistically
valid QSAR model is a desirable substitute for the time- and labor-intensive
processes of chemical synthesis and biological testing.
Obtaining a statistically robust model depends on how well the selected
descriptors encode variations in activity within a structure series [32].
Information about molecular mechanism can aid a chemist in selecting
among available descriptors, but as we have seen, there are numerous bodies
of molecular descriptor theory, and the overall number of available descriptors
can easily number in the thousands. For this reason, modern molecular
modeling programs often include statistical tools to help evaluate which
descriptors best encode structure-activity variation.
About a decade ago, computational chemistry researchers began to address
the questions associated with how to validate a descriptor or set of descriptors.
Patterson et al. [114] established a framework for considering diversity in the
context of both lead discovery and lead optimization. In particular, Patterson’s
method relies on the discovery of “neighborhood behavior” between molecules
when considering the effects of changes in a measure of molecular diversity
and some biological activity. The chief requirement of a “valid” molecular
diversity description, argue the authors, is that small differences (distances)
in the underlying descriptor space do not often produce large differences in
biological response. A second important result of this work was the finding
that, in general, higher dimensionality of an underlying descriptor space
most often was predictive of good neighborhood behavior, and therefore of
“validity” of the descriptor space with respect to arbitrarily chosen biological
outcomes. In this particular study, Patterson et al. used their method to validate
a number of individual descriptors and multidimensional descriptor spaces,
concluding that CoMFA fields, as well as two-dimensional (2-D) fingerprints
of the variable portions of the molecule series (each a molecular description of
high dimensionality), most often exhibited neighborhood behavior.
Satisfyingly, later work using these concepts at Bristol-Myers Squibb [115]
allowed for the prospective choice of molecules to synthesize that were
significantly enriched in biological activity against angiotensin II. In these later
studies, the topomer shape similarity description was once again shown to be a
highly effective predictor of activity, followed by the atom-pair description. For
this particular problem, most other descriptions did not exhibit the required
“neighborhood” behavior.
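The neighborhood criterion can be paraphrased computationally: among pairs of molecules that are close in descriptor space, few should differ greatly in activity. The sketch below is a simplified check of that idea and not the published neighborhood-plot procedure; the distance threshold, the activity-gap threshold, and the data are all invented for illustration.

```python
import math
from itertools import combinations

def neighborhood_violations(descriptors, activities,
                            max_distance, max_activity_gap):
    """Fraction of close pairs whose activities differ by more than the gap."""
    close = 0
    violations = 0
    for i, j in combinations(range(len(descriptors)), 2):
        if math.dist(descriptors[i], descriptors[j]) <= max_distance:
            close += 1
            if abs(activities[i] - activities[j]) > max_activity_gap:
                violations += 1
    return violations / close if close else 0.0

# Invented two-dimensional descriptors and activities (e.g., pIC50 values).
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 3.0)]
activities = [6.0, 6.2, 5.9, 8.0, 4.0]

fraction = neighborhood_violations(descriptors, activities,
                                   max_distance=0.5, max_activity_gap=1.0)
print(fraction)  # 0.25
```

A descriptor space for which this fraction stays low across many assays would count as "valid" in the sense used above; here one of the four close pairs violates the criterion.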
Consistent with the results of Patterson, which allow large differences
in diversity descriptors to produce large variation in biological activity,
later work found that the use of “valid” molecular description methods
was more important than whether the test compounds used to inform the
prospective syntheses were weakly active or strongly active, suggesting that
this method should be a general way to approach lead optimization problems.
To generalize these conclusions with respect to chemical descriptor spaces,
especially notable is the better performance of two-dimensional fingerprints
of variable side-chains relative to whole-molecule two-dimensional fingerprints in the
original validation study [114], suggesting that the highest dimensional space
relating to the variable portions of the molecules is desirable to use as a diversity
description. Intuitively, such descriptor spaces satisfactorily correspond to the
most information-rich description of the molecules under consideration.
Benigni et al. [116] also compared different molecular description methods,
inspired by the study of global versus local properties of a molecular descriptor
space. Comparing a series of 148 structure keys, similar to those described
earlier, to a heterogeneous set of 37 one-dimensional (e.g., molecular weight),
two-dimensional (e.g., Wiener indices and E-states), and three-dimensional
(e.g., surface areas) molecular descriptors, Benigni et al. investigated a
collection of nearly 300 noncongeneric small molecules at both global and local
levels. Among the strengths of this approach was the authors’ clear distinction
between effects evident using local methods such as cluster analysis and effects
evident using global methods such as principal component analysis (PCA).
While cluster analysis techniques provided a detailed description of local
structure within a chemical space, such as similarities between cluster members
and intercluster distances, factorial techniques, such as PCA, describe the
entire dataset in terms of a small number of orthogonal basis vectors. The authors
make use of this complementarity to show that the two descriptor spaces are
globally similar (isomorphic) as judged by the overall high mutual correlation
of their PCA transforms, and the progressive increase in this concordance
with increasing numbers of principal components (matched between the two
spaces to achieve similar levels of explanation of the overall variance). On the
other hand, cluster analysis, using k-means clustering and several choices of
k, revealed that the structure-key description had much lower cluster propensity
(departure from a uniform population of the descriptor space) than did
the composite space composed of the one-dimensional, two-dimensional, and
three-dimensional descriptors. The authors suggest that this result can be
explained by the much lower information density of the former space, composed
as it is from a series of binary features (presence or absence of predefined
structural features; see also Section 13.1.3.3) rather than from a collection of
discrete- or continuous-value variables. The generality of these results to
additional descriptor spaces will likely require additional experiments involving
many more compounds, but the conclusion that global isomorphism between
two descriptor spaces does not predict similarity in the fine structure between
those spaces is inescapable. The latter result has very important consequences
when considering the use of molecular descriptors in different computational
chemistry tasks. First, it suggests that any sufficiently information-rich
representation of chemical structure, whether composed of a large number of binary
variables (such as fingerprints) or composed of a smaller number of discrete-
or continuous-valued variables, is suitable for global analysis problems, such
as maximizing the overall diversity of a screening collection. On the other
hand, it suggests that the choice of descriptor space is quite important for local
problems such as lead exploration as envisioned in the neighborhood plots of
Patterson, or QSAR studies among members of congeneric series.
Rusinko et al. [117] reported an elegant method for feature (chemical
subspace) selection among binary descriptors using recursive partitioning.
The method requires that some measure of activity be recorded for the
compounds, but this activity figure can be qualitative. In this study, the
activities were simply 0, 1, 2, or 3, representing no activity, weak, moderate, or
strong activity. The authors' method uses sparse-matrix techniques to move
quickly through a very large set of descriptors and choose those descriptors
most responsible for discriminating active compounds from inactive ones.
The descriptors used were atom-pairs, topological torsions, and atom-triples,
computed for a group of 1650 monoamine oxidase (MAO) inhibitors. Using
the statistical t-test to find individual descriptors that accounted for large
differences in mean activities between the two groups, the authors achieved
15-fold enrichment (7/227) versus 72/3 5631 in inhibitors relative to random
selection. However, the false-negative and false-positive rates were both high,
since the method picked 220 other molecules that were not MAO inhibitors
and failed to find 65 MAO inhibitors in the dataset. The authors provide an
excellent discussion of the comparison of this method with other methods,
especially including those methods that fail badly when multiple mechanisms
of action are simultaneously operant in a dataset.
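A single partitioning step of this kind can be sketched as follows: for each binary descriptor, split the compounds by presence or absence of the feature and score the split by a t statistic on mean activity. The sketch uses Welch's form of the statistic, which is an assumption (the variant used in the original work is not stated here), and the eight-compound dataset is invented. A full recursive partitioning would reapply the same step to each resulting subset.

```python
import math
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Welch's t statistic for the difference in mean activity."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se if se > 0 else 0.0

def best_split(bit_rows, activities):
    """Return (bit index, t) for the descriptor with the largest |t|."""
    best = None
    for bit in range(len(bit_rows[0])):
        on = [a for row, a in zip(bit_rows, activities) if row[bit]]
        off = [a for row, a in zip(bit_rows, activities) if not row[bit]]
        if len(on) < 2 or len(off) < 2:
            continue  # variance is undefined for fewer than two compounds
        t = t_statistic(on, off)
        if best is None or abs(t) > abs(best[1]):
            best = (bit, t)
    return best

# Invented data: rows are compounds, columns are binary descriptors;
# activities use the 0-3 scale described in the text.
bit_rows = [[1, 0, 0], [1, 0, 1], [0, 0, 1], [1, 0, 0],
            [0, 1, 0], [0, 1, 1], [0, 1, 0], [1, 0, 1]]
activities = [0, 0, 1, 0, 3, 2, 3, 1]

best_bit, best_t = best_split(bit_rows, activities)
print(best_bit, round(best_t, 2))  # 1 5.48
```

In this toy set the second descriptor separates the active compounds most cleanly, so it would be chosen for the first split.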
Also using chemical space as a framework, Agrafiotis [118] presented a
very fast method for diversity analysis on the basis of simple assumptions,
statistical sampling of outcomes, and principles of probability theory. This
method presumes that the optimal coverage of a chemical space is that
of uniform coverage. The central limit theorem of probability theory
suggests that the distribution of distances between uniformly distributed
points becomes normal in the limit of a large number of dimensions. By
representing uniform coverage of chemical space in terms of a normal
distribution of distances, Agrafiotis was able to use a statistical test for
normality, the Kolmogorov-Smirnov (K-S) test, to determine whether a
given experimental coverage of chemical space, represented by a collection
of compounds under study, is more or less uniform. An important result
of this work was that a relatively small sampling of the overall collection
of intercompound distances closely approximated the expected distribution if
all pair-wise distances were explicitly computed, allowing the method to be
used to select subsets of building blocks in a combinatorial synthesis that
provided the most uniform coverage of products in the descriptor space of
interest.
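The sampling idea can be sketched with the standard library alone. The code below draws a random sample of intercompound distances and computes the K-S statistic against a normal distribution fitted to that sample. Fitting the mean and standard deviation from the same data, the sample sizes, and the random 20-dimensional "compounds" are all assumptions made for illustration; the original method may differ in these details.

```python
import math
import random
from statistics import mean, pstdev

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for k, x in enumerate(ordered):
        f = cdf(x)
        gap = max(gap, (k + 1) / n - f, f - k / n)
    return gap

def sampled_distances(points, n_pairs, rng):
    """Distances for a random sample of pairs rather than all pairs."""
    distances = []
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)
        distances.append(math.dist(a, b))
    return distances

rng = random.Random(0)
# Hypothetical collection: 500 compounds spread uniformly in 20 dimensions.
points = [[rng.random() for _ in range(20)] for _ in range(500)]
distances = sampled_distances(points, 2000, rng)
mu, sigma = mean(distances), pstdev(distances)
d_stat = ks_statistic(distances, lambda x: normal_cdf(x, mu, sigma))
print(round(d_stat, 3))
```

A small statistic indicates a distance distribution close to normal, which under the assumptions of the method is read as evidence of roughly uniform coverage; only 2000 of the roughly 125,000 possible pairs are examined here.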
Oprea provided a novel and important advance in descriptor space analysis
by introducing the ChemGPS system [119]. The key feature of this work
is to attempt to provide a global map of “drug-like” descriptor space by
deliberately choosing molecules well outside the drug-like space as “satellites”
with extreme values relative to the molecules under consideration. As a method
for providing a standard metric for chemical space, ChemGPS is essentially
generic; though it focuses on the drug-like space, the principles could be
applied broadly and are largely independent of the choice of molecular
descriptors used. In later work, Oprea applied a different descriptor set to
molecules in an effort to produce a chemical space relevant to absorption,
distribution, metabolism, and excretion (ADME)/toxicology studies [120]. In
this case, the principal components corresponding to this space, named
GPSVS, were shown to be correlated to physically interpretable properties
of compounds, namely, solubility and permeability. This finding is certainly
not a general feature of PCA-based methods, since a priori there is little
reason to expect a preservation of chemical interpretability in the light of
a PCA transformation of data. However, in this case, the combination of
the ChemGPS method with a particular descriptor set (VolSurf) chosen for
its relevance to ADME properties, afforded a solution that provided a map
of chemical space subject to practical interpretation, despite its reduced
dimensionality.
In an effort to compare descriptor distributions between compounds
from different sources and synthetic paradigms, Feher and Schmidt [121]
used PCA-based methods to compare property distributions from natural
products, drugs, and combinatorial libraries. In this case, the authors used
chemical space as a common framework to ask questions about how
the origins of compounds are manifest in their structural features at a
global level. In particular, this study demonstrates the general dominance
of synthetic efficiency, rather than structural diversity, in the preparation
of compounds by combinatorial chemistry. The descriptors most able
to distinguish natural products from those synthetic molecules studied
were those that rendered the latter class easier to make, such as fewer
stereocenters, more aromatic rings, fewer complex ring systems, and more
flexible substituents. The authors confront the apparent paradox that the
search for synthetic substitutes for natural compounds often proceeds by
making exactly the types of changes known to medicinal chemists to
result in weaker and less specific activities. Not surprisingly, actual drug
molecules occupy a region of chemical space overlapping with both natural
products and synthetic molecules, since some drugs come from each of these
sources. Here, the authors suggest complementing traditional “drug-like”
property filters (i.e., Lipinski’s “rule of 5” [40]) with “natural product-like”
property filters in an effort to synthesize molecules sharing more features
in common with natural products, in hope of synthetically accessing a
potentially underpopulated portion of pharmacologically relevant chemical
space.
These examples provide a good survey of approaches to problems in
cheminformatics, which rely on molecular descriptors and the definition
of a molecular descriptor space. One take-home message underpinning
all of these studies is that in defining chemical similarity and diversity,
both the choices of objects (molecules) and attributes (descriptors) are
important in determining the outcome. Many of these studies also show
how advances in computer hardware and software have been brought
to bear to address large-scale problems not explicitly tractable even a
generation ago.
13.1.5 Future Development: Multidimensional Outcome Metrics
In the past, it has been difficult to assemble collections of data on
small molecules that afford global comparisons of outcomes over both
broad structural classes of molecules and broad coverage of biological
motivation, for several reasons. First, many assays are still carried out
in a low- or medium-throughput format, and are typically performed on
subsets of compounds identified by higher throughput methods [122-125].
Consequently, the scope of chemical structural diversity exposed to these
assays is restricted; indeed, such assays are often focused intensely
on lead series lacking skeletal diversity. Furthermore, since many such
assays are performed in the private sector by pharmaceutical companies,
the results from diverse assays are often not cross-referenced between
different organizations, producing result-sets that are either disjoint, or
whose relationships are difficult to interpret [41]. However, the advent
of technologies such as various microarray formats, and the increasing
prevalence of HTS and high-content screening in the academic sector,
now facilitates the public assessment of diverse compound collections
in many different biological contexts, especially including phenotypic
assays [126].
Early work in the area of generating multidimensional biological measurements
of small molecules was carried out by Kauvar et al., who
focused on generating vectors of binding affinities to collections of pro-
teins [127]. Additional multidimensional phenotypic screening has involved
chemical-genomic profiling of yeast with different genetic backgrounds
for growth sensitivity [128], a study of stereochemical and skeletal diver-
sity among a collection of carbohydrates using chemical-genetic modifier
screens [129], and mechanism discovery by profiling small molecules using
high-throughput microscopy [130], among others. More recently, similar studies
have been extended to models of the proteome [131] and the tyrosine
kinome [80].
The most obvious consequence of these types of experimental advances
is the need for new computational methods in modeling structure-outcome
relationships. Traditionally, QSAR has considered situations where the de-
scriptors used to characterize molecular structure form a chemical space,
but the measurement of activity is a scalar quantity, usually an IC50 against
a particular target (Fig. 13.1-7(a)). In future, however, profile-based char-
acterization of small molecules, particularly early in drug discovery or in
the academic sector, will provide a much richer set of biological charac-
terization - inherently multidimensional - about small-molecule collections.
Under many circumstances, the data from multiple parallel or multiplexed
biological assays can be rendered formally comparable, allowing activity (or,
more broadly, phenotype) to be encoded as a vector of values (Fig. 13.1-7(b)).
Thus, modeling the relationships between small-molecule structures and
the phenotypes that they cause in biological systems will require new
computational approaches beyond the traditional regression techniques of
QSAR.
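The shift from a scalar activity to an activity vector can be made concrete with a small sketch. The example below is a deliberately simplified illustration, assuming a single hypothetical descriptor and synthetic, noise-free data; it uses plain least squares, which is exactly the traditional kind of regression the text says will not be sufficient, and serves only to show how the response changes from one column to a profile of assay outcomes. The function names and all numbers are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (one descriptor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_profile(x, Y):
    """Fit one line per assay; Y has one row per compound, one column per assay."""
    return [fit_line(x, list(column)) for column in zip(*Y)]


# Hypothetical descriptor values for four compounds.
descriptor = [1.0, 2.0, 3.0, 4.0]

# (a) Traditional QSAR: one scalar activity per compound.
scalar_activity = [3.0, 5.0, 7.0, 9.0]
scalar_model = fit_line(descriptor, scalar_activity)

# (b) Profile-based view: one activity vector (two assays) per compound.
activity_profiles = [
    [3.0, 0.5],
    [5.0, 0.0],
    [7.0, -0.5],
    [9.0, -1.0],
]
profile_model = fit_profile(descriptor, activity_profiles)
```

Here `scalar_model` is a single (slope, intercept) pair, whereas `profile_model` holds one pair per assay. Fitting each assay independently ignores any relationship between the assays, which is one reason the vector-valued case calls for methods beyond this sketch.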
The more subtle, but potentially more exciting, consequence of such
multidimensional data analysis is the superposition of biological annotations
onto a collection of measurements, allowing connections between the
biological “coordinates” to be made independently of the measurements
themselves. As we have seen in this chapter, there are specific relationships
between various calculated molecular descriptors, based on the theory of
their construction or on their relationships to molecular properties such
as size and shape. Similarly, there are implicit encodable relationships
between the different assays that comprise any multidimensional fingerprint
of assay outcomes, such as combinations of cell states and cellular assays
(Fig. 13.1-7(c)). Exploiting such relationships across diverse collections of
small molecules indeed may uncover new relationships between the biological
states themselves.
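One way to picture how annotations on assays might be exploited is sketched below. It assumes a small, entirely hypothetical matrix of compound profiles and a hypothetical annotation of each assay by cell state and readout; it then compares the average correlation between assays that share a cell state with that between assays that do not. This is an illustrative sketch, not a method described in the studies cited above.

```python
from itertools import combinations
from math import sqrt

# Hypothetical assay annotations: (cell state, readout).
assays = [
    ("wild-type", "growth"),
    ("wild-type", "reporter"),
    ("mutant", "growth"),
    ("mutant", "reporter"),
]

# Hypothetical profiles: one row per compound, one column per assay.
profiles = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.7],
    [0.8, 0.9, 0.2, 0.1],
    [0.2, 0.1, 0.8, 0.9],
    [0.5, 0.6, 0.4, 0.5],
]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def mean_correlation_by_annotation(profiles, assays, field):
    """Average assay-assay correlation for pairs that share, or do not share,
    the annotation at index `field` (0 = cell state, 1 = readout)."""
    columns = list(zip(*profiles))
    shared, unshared = [], []
    for i, j in combinations(range(len(assays)), 2):
        r = pearson(columns[i], columns[j])
        (shared if assays[i][field] == assays[j][field] else unshared).append(r)
    return sum(shared) / len(shared), sum(unshared) / len(unshared)


same_state, different_state = mean_correlation_by_annotation(profiles, assays, 0)
```

In this toy data set, assays run in the same cell state respond similarly across compounds, so `same_state` is positive and `different_state` is negative. With real data, a pattern of this kind would be a statement about the biological states rather than about any single compound.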
Even more powerful is the notion of a global set of annotations encompassing
any conceivable small-molecule assay design and allowing connections
between experiments (on the same or similar compounds) conceived and
performed independently in different laboratories worldwide. In their simplest
form, such annotations can take the form of literature terms [132], for example,
to connect members of different target classes among a large group of proteins.
More complex examples are clearly possible, including visual phenotypes
measured via high-content screening [130, 133-135], or the genotypes of cell
lines used in cell-based assays [136, 137]. Fully leveraging this type of analysis
will require a rich ontology for phenotypes that explicitly links the biological
literature to the experimental design of small-molecule assays. It is in this way,
requiring full engagement of experimental biologists, that cheminformatics
and chemical space can realize their full potential in modern chemical biology
research.

Fig. 13.1-7 Transition from one-dimensional to multidimensional
activity measurements. (a) Traditional quantitative structure-activity relationship
(QSAR) considers the relationship between some calculated descriptor space and a
single measurement of activity, such as an IC50 for enzyme inhibition. (b) Future work
with chemical space will require a more complex mapping to activities that are
vector, rather than scalar, quantities, as increasing amounts of multidimensional
data become available. (c) Conceptual illustration of complex design and
experimental relationships possible among components of multidimensional biological
activities (see text).
References
1. E.J. Corey, X.-M. Cheng, The Logic of Chemical Synthesis, John Wiley, New York, 1989.
2. I. Ugi, J. Bauer, K. Bley, A. Dengler, A. Dietz, E. Fontain, B. Gruber, R. Herges, M. Knauer, K. Reitsam, N. Stein, Computer-assisted solution of chemical problems - the historical development and the present state of the art of a new discipline of chemistry, Angew. Chem., Int. Ed. Engl. 1993, 32, 201-227.
3. K. Zuse, Der Computer, Mein Lebenswerk, Springer, Berlin, New York, 1984.
4. J. Lederberg, Topological mapping of organic molecules, Proc. Natl. Acad. Sci. U.S.A. 1965, 53, 134-139.
5. R.K. Lindsay, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill Book, New York, 1980.
6. G.E. Vleduts, Concerning one system of classification and codification of organic reactions, Inf. Storage Retr. 1963, 1, 117.
7. D. Bonchev, D.H. Rouvray, Chemical Graph Theory: Introduction and Fundamentals, Abacus Press, New York, 1991.
8. J. McMurry, Organic Chemistry, Brooks/Cole Publisher, Pacific Grove, 1992.
9. C.A. Russell, The History of Valency, Humanities Press, New York, 1971.
10. A. Cayley, On the mathematical theory of isomers, Philos. Mag. 1874, 47, 444-446.
11. J.J. Sylvester, Chemistry and algebra, Nature 1877, 17, 284.
12. D. Vukicevic, A. Milicevic, S. Nikolic, J. Sedlar, N. Trinajstic, Paths and walks in acyclic structures: plerographs versus kenographs, ARKIVOC 2005, x, 33-44.
13. N. Biggs, E.K. Lloyd, R.J. Wilson, Graph Theory 1736-1936, Clarendon Press, Oxford [England], 1976.
14. I. Gutman, D. Vidovic, L. Popovic, Graph representation of organic molecules: Cayley's plerograms vs his kenograms, J. Chem. Soc., Faraday Trans. 1998, 94, 857-860.
15. A. Streitwieser, C.H. Heathcock, E.M. Kosower, Introduction to Organic Chemistry, Macmillan, New York, 1992.
16. K.P.C. Vollhardt, Organic Chemistry, W.H. Freeman, New York, 1987.
17. W.H.T. Davison, M. Gordon, Sorting for chemical groups using Gordon-Kendall-Davison ciphers, Am. Doc. 1957, VIII, 202.
18. M. Gordon, C.E. Kendall, W.H.T. Davison, Chemical Ciphering: A Universal Code as an Aid to Chemical Systematics, Royal Institute of Chemistry of Great Britain and Ireland, London, 1948.
19. G.M. Dyson, E.F. Riley, Mechanical storage and retrieval of organic chemical data, Chem. Eng. News 1961, 74-80.
20. W.H. Waldo, Searching two dimensional structures by computer, J. Chem. Doc. 1962, 2, 1.
21. W.H. Waldo, R.S. Gordon, J.D. Porter, Routine report writing by computer, Am. Doc. 1958, 9, 28.
22. W.J. Wiswesser, The Wiswesser line formula notation, Chem. Eng. News 1952, 3523.
23. W.J. Wiswesser, A Line-Formula Chemical Notation, W. Y. Crowell Co., New York, 1954.
24. H. Bouman, Linearly organized chemical code for use in computer systems (locus), J. Chem. Doc. 1962, 3, 92-96.
25. L. Spialter, The atom connectivity matrix (ACM) and its characteristic polynomial (ACMCP), J. Chem. Doc. 1964, 4, 261-269.
26. L.C. Ray, R.A. Kirsch, Finding chemical records by digital computers, Science 1957, 126, 814-819.
27. A.M.M. Jorgensen, J.T. Pedersen, Structural diversity of small molecule libraries, J. Chem. Inf. Comput. Sci. 2001, 41, 338-345.
28. D.A. Weininger, SMILES, a chemical language and information system 1: Introduction and encoding rules, J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
29. D.A. Weininger, J.L. Weininger, SMILES 2: Algorithm for generation of unique SMILES notation, J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
30. S. Borman, Production of optically active drugs using lipases, Chem. Eng. News 1990, 28, 9-14.
31. R.L. Lipnick, Charles Ernest Overton: narcosis studies and a contribution to general pharmacology, Trends Pharmacol. Sci. 1986, 7, 161-164.
32. R. Perkins, H. Fang, W. Tong, W.J. Welsh, Quantitative structure-activity relationship methods: perspectives on drug discovery and toxicology, Environ. Toxicol. Chem. 2003, 22, 1666-79.
33. L.P. Hammett, The effect of structure upon the reactions of organic compounds. Temperature and solvent influences, J. Chem. Phys. 1936, 4, 613-617.
34. C. Hansch, A. Leo, R.W. Taft, A survey of Hammett substituent constants and resonance and field parameters, Chem. Rev. 1991, 91, 165-195.
35. L.P. Hammett, Physical Organic Chemistry; Reaction Rates, Equilibria, and Mechanisms, McGraw-Hill Book Company, Inc., New York, London, 1940.
36. J. Shorter, The prehistory of the Hammett equation, Chem. Listy 2000, 94, 210-214.
37. C. Hansch, A quantitative approach to biochemical structure-activity relationships, Acc. Chem. Res. 1969, 2, 232-239.
38. T. Fujita, J. Iwasa, C. Hansch, A new substituent constant, pi, derived from partition coefficients, J. Am. Chem. Soc. 1964, 86, 5175-5180.
39. J. Iwasa, T. Fujita, C. Hansch, Substituent Constants For Aliphatic Functions Obtained From Partition Coefficients, J. Med. Chem. 1965, 56, 150-3.
40. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Delivery Rev. 1997, 23, 3-25.
41. A.P. Beresford, M. Segall, M.H. Tarbit, In silico prediction of ADME properties: are we making progress? Curr. Opin. Drug Discov. Devel. 2004, 7, 36-42.
42. H. Yu, A. Adedoyin, ADME-Tox in drug discovery: integration of experimental and computational technologies, Drug Discov. Today 2003, 8, 852-61.
43. C. Hansch, A. Leo, D.H. Hoekman, Exploring QSAR, American Chemical Society, Washington, 1995.
44. Y.A. Ban, S. Bereg, N.H. Mustafa, A conjecture on Wiener indices in combinatorial chemistry, Algorithmica 2004, 40, 99-117.
45. R. Gozalbes, J.P. Doucet, F. Derouin, Application of topological descriptors in QSAR and drug design: history and new trends, Curr. Drug Targets Infect. Disord. 2002, 2, 93-102.
46. I. Gutman, O.E. Polansky, Mathematical Concepts in Organic Chemistry, Springer-Verlag, Berlin, New York, 1986.
47. O. Ivanciuc, S.L. Taraviras, D. Cabrol-Bass, Quasi-orthogonal basis sets of molecular graph descriptors as a chemical diversity measure, J. Chem. Inf. Comput. Sci. 2000, 40, 126-134.
48. H. Wiener, Structural determination of paraffin boiling points, J. Am. Chem. Soc. 1947, 69, 17-20.
49. E. Estrada, E. Uriarte, Recent advances on the role of topological indices in drug discovery research, Curr. Med. Chem. 2001, 8, 1573-1588.
50. A.R. Katritzky, V.S. Lobanov, M. Karelson, Normal boiling points for organic compounds: correlation and prediction by a quantitative structure-property relationship, J. Chem. Inf. Comput. Sci. 1998, 38, 28-41.
51. D.E. Needham, I.C. Wei, P.J. Seybold, Molecular modeling of the physical properties of alkanes, J. Am. Chem. Soc. 1988, 110, 4186-4194.
52. M. Randic, G.M. Brissey, R.B. Spencer, C.L. Wilkins, Search for all self-avoiding paths for molecular graphs, Comput. Chem. 1979, 3, 5-13.
53. M. Randic, G.M. Brissey, R.B. Spencer, C.L. Wilkins, Use of self-avoiding paths for characterization of molecular graphs with multiple bonds, Comput. Chem. 1980, 4, 27-43.
54. M. Randic, On characterization of molecular branching, J. Am. Chem. Soc. 1975, 97, 6609-6615.
55. A.K. Debnath, Quantitative structure-activity relationship (QSAR) paradigm - Hansch era to new millennium, Mini Rev. Med. Chem. 2001, 1, 187-195.
56. L.B. Kier, L.H. Hall, W.J. Murray, M. Randic, Molecular connectivity. I: Relationship to nonspecific local anesthesia, J. Pharm. Sci. 1975, 64, 1971-4.
57. T. Pisanski, D. Plavsic, M. Randic, On numerical characterization of cyclicity, J. Chem. Inf. Comput. Sci. 2000, 40, 520-523.
58. S.C. Basak, S. Bertelsen, G.D. Grunwald, Use of graph theoretic parameters in risk assessment of chemicals, Toxicol. Lett. 1995, 79, 239-50.
59. B.D. Gute, G.D. Grunwald, S.C. Basak, Prediction of the dermal penetration of polycyclic aromatic hydrocarbons (PAHs): a hierarchical QSAR approach, SAR QSAR Environ. Res. 1999, 10, 1-15.
60. C. Hansch, D. Hoekman, H. Gao, Comparative QSAR: Toward a Deeper Understanding of Chemicobiological Interactions, Chem. Rev. 1996, 96, 1045-1076.
61. A.T. Balaban, Highly discriminating distance-based topological index, Chem. Phys. Lett. 1982, 89, 399-404.
62. A.T. Balaban, D. Mills, S.C. Basak, Correlation between structure and normal boiling points of acyclic carbonyl compounds, J. Chem. Inf. Comput. Sci. 1999, 39, 758-764.
63. R.A. Lewis, J.S. Mason, I.M. McLay, Similarity measures for rational set selection and analysis of combinatorial libraries: the diverse property-derived (DPD) approach, J. Chem. Inf. Comput. Sci. 1997, 37, 599-614.
64. S.L. Dixon, H.O. Villar, Investigation of classification methods for the prediction of activity in diverse chemical libraries, J. Comput.-Aided Mol. Des. 1999, 13, 533-45.
65. A. Katritzky, E.V. Gordeeva, Traditional topological indices vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research, J. Chem. Inf. Comput. Sci. 1993, 33, 835-857.
66. C.E. Shannon, W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1998.
67. C. Hansch, B.R. Telzer, L. Zhang, Comparative QSAR in toxicology: examples from teratology and cancer chemotherapy of aniline mustards, Crit. Rev. Toxicol. 1995, 25, 67-89.
68. T.C. Bruice, N. Kharasch, R.J. Winzler, A correlation of thyroxine-like activity and chemical structure, Arch. Biochem. Biophys. 1956, 62, 305-17.
69. A. Leo, C. Hansch, D. Elkins, Partition coefficients and their uses, Chem. Rev. 1971, 71, 525-616.
70. A.K. Ghose, G.M. Crippen, Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships I. Partition coefficients as a measure of hydrophobicity, J. Comput. Chem. 1986, 7, 565-577.
71. A.K. Ghose, A. Pritchett, G.M. Crippen, Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships III: modeling hydrophobic interactions, J. Comput. Chem. 1988, 9, 80-90.
72. V.N. Vishwanadhan, A.K. Ghose, G.R. Revankar, R.K. Robins, Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships: 4. Additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics, J. Chem. Inf. Comput. Sci. 1989, 29, 163-172.
73. R.E. Carhart, D.H. Smith, R. Venkataraghavan, Atom pairs as molecular features in structure-activity studies: Definition and applications, J. Chem. Inf. Comput. Sci. 1985, 25, 64-73.
74. G. Moreau, P. Broto, The auto-correlation of a topological structure: A new molecular descriptor, Nouv. J. Chim. 1980, 4, 359-360.
75. T.H. Varkony, Y. Shiloach, D.H. Smith, Computer-assisted examination of chemical compounds for structural similarities, J. Chem. Inf. Comput. Sci. 1979, 19, 104-111.
76. R. Nilakantan, N. Bauman, J.S. Dixon, R. Venkataraghavan, Topological torsion: A new molecular descriptor for SAR applications. Comparison with other descriptors, J. Chem. Inf. Comput. Sci. 1987, 27, 82-85.
77. L. Xue, J. Bajorath, Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening, Comb. Chem. High Throughput Screen. 2000, 3, 363-72.
78. A.C. Good, I.D. Kuntz, Investigating the extension of pairwise distance pharmacophore measures to triplet-based descriptors, J. Comput.-Aided Mol. Des. 1995, 9, 373-9.
79. C. Bologa, T.K. Allu, M. Olah, M.A. Kappler, T.I. Oprea, Descriptor collision and confusion: toward the design of descriptors to mask chemical structures, J. Comput.-Aided Mol. Des. 2005, 19, 625-35.
80. J.S. Melnick, J. Janes, S. Kim, J.Y. Chang, D.G. Sipes, D. Gunderson, L. James, J.T. Matzen, M.E. Garcia, T.L. Hood, R. Beigi, G. Xia, R.A. Harig, H. Asatryan, S.F. Yan, Y. Zhou, X.J. Gu, A. Saadat, V. Zhou, F.J. King, C.M. Shaw, A.I. Su, R. Downs, N.S. Gray, P.G. Schultz, M. Warmuth, J.S. Caldwell, An efficient rapid system for profiling the cellular activities of molecular libraries, Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 3153-8.
81. Y.C. Martin, J.L. Kofron, L.M. Traphagen, Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-8.
82. J. Hert, P. Willett, D.J. Wilton, P. Acklin, K. Azzaoui, E. Jacoby, A. Schuffenhauer, Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures, J. Chem. Inf. Comput. Sci. 2004, 44, 1177-85.
83. L.B. Kier, L.H. Hall, An electrotopological-state index for atoms in molecules, Pharm. Res. 1990, 7, 801-7.
84. L.B. Kier, L.H. Hall, Molecular Structure Description: the Electrotopological State, Academic Press, San Diego, 1999.
85. G.E. Kellogg, L.B. Kier, P. Gaillard, L.H. Hall, E-state fields: applications to 3D QSAR, J. Comput.-Aided Mol. Des. 1996, 10, 513-20.
86. L.B. Kier, L.H. Hall, General definition of valence delta-values for molecular connectivity, J. Pharm. Sci. 1983, 72, 1170-3.
87. L.B. Kier, W.J. Murray, L.H. Hall, Molecular connectivity. 4. Relationships to biological activities, J. Med. Chem. 1975, 18, 1272-4.
88. A.J. Hopfinger, A QSAR investigation of dihydrofolate reductase inhibition by Baker triazines based upon molecular shape analysis, J. Am. Chem. Soc. 1980, 102, 7196-7206.
89. L.B. Kier, The preferred conformations of ephedrine isomers and the nature of the alpha adrenergic receptor, J. Pharmacol. Exp. Ther. 1968, 164, 75-81.
90. H.J. Weintraub, A.J. Hopfinger, Conformational analysis of some phenethylamine molecules, J. Theor. Biol. 1973, 41, 53-75.
91. G.M. Crippen, Distance geometry approach to rationalizing binding data, J. Med. Chem. 1979, 22, 988-97.
92. K. Yamamoto, A quantitative approach to the evaluation of 2-acetamide substituent effects on the hydrolysis by Taka-N-acetyl-beta-D-glucosaminidase. Role of the substrate 2-acetamide group in the N-acyl specificity of the enzyme, J. Biochem. (Tokyo) 1974, 76, 385-90.
93. C. Silipo, C. Hansch, Correlation analysis. Its application to the structure-activity relationship of triazines inhibiting dihydrofolate reductase, J. Am. Chem. Soc. 1975, 97, 6849-61.
94. A.J. Hopfinger, Theory and application of molecular potential energy fields in molecular shape analysis: a quantitative structure-activity relationship study of 2,4-diamino-5-benzylpyrimidines as dihydrofolate reductase inhibitors, J. Med. Chem. 1983, 26, 990-6.
95. R.D. Cramer, D.E. Patterson, J.D. Bunce, Comparative molecular field analysis (CoMFA): 1. Effect of shape on binding of steroids to carrier proteins, J. Am. Chem. Soc. 1988, 110, 5959-5967.
96. R.D. Cramer, J.D. Bunce, D.E. Patterson, I.E. Frank, Cross-validation, bootstrapping, and partial least squares compared with multiple linear regression in conventional QSAR studies, Quant. Struct.-Act. Relat. 1988, 7, 18-25.
97. W. Lindberg, J.-A. Persson, S. Wold, Partial least-squares method for spectrofluorimetric analysis of mixtures of humic acid and ligninsulfonate, Anal. Chem. 1983, 55, 643-648.
98. S. Wold, A. Ruhe, H. Wold, W.J. Dunn, The collinearity problem in linear regression: The partial least squares (PLS) approach to generalized inverses, SIAM J. Sci. Stat. Comput. 1984, 5, 735-742.
99. A.C. Good, E.E. Hodgkin, W.G. Richards, Utilization of Gaussian functions for the rapid evaluation of molecular similarity, J. Chem. Inf. Comput. Sci. 1992, 32, 188.
100. A.C. Good, S.J. Peterson, W.G. Richards, QSAR's from similarity matrices, Technique validation and application in the comparison of different similarity evaluation methods, J. Med. Chem. 1993, 36, 2929-37.
101. R.D. Cramer, R.D. Clark, D.E. Patterson, A.M. Ferguson, Bioisosterism as a molecular diversity descriptor: steric fields of single "topomeric" conformers, J. Med. Chem. 1996, 39, 3060-9.
102. W.H. Sauer, M.K. Schwarz, Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity, J. Chem. Inf. Comput. Sci. 2003, 43, 987-1003.
103. W.H. Sauer, M.K. Schwarz, Size doesn't matter: Scaffold diversity, shape diversity and biological activity of combinatorial libraries, Chimia 2003, 57, 276-283.
104. M.G. Bures, Y.C. Martin, Computational methods in molecular diversity and combinatorial chemistry, Curr. Opin. Chem. Biol. 1998, 2, 376-80.
105. P. Willett, Chemoinformatics - similarity and diversity in chemical libraries, Curr. Opin. Biotechnol. 2000, 11, 85-8.
106. O.F. Guner, History and evolution of the pharmacophore concept in computer-aided drug design, Curr. Top. Med. Chem. 2002, 2, 1321-32.
107. F. Yamashita, M. Hashida, In silico approaches for predicting ADME properties of drugs, Drug Metab. Pharmacokinet. 2004, 19, 327-38.
108. M.P. Bradley, An overview of the diversity represented in commercially-available databases, J. Comput. Aided Mol. Des. 2002, 16, 301-9.
109. J.H. Voigt, B. Bienfait, S. Wang, M.C. Nicklaus, Comparison of the NCI open database with seven large chemical structural databases, J. Chem. Inf. Comput. Sci. 2001, 41, 702-12.
110. C. Hansch, D. Hoekman, A. Leo, D. Weininger, C.D. Selassie, Chem-bioinformatics: Comparative QSAR at the interface between chemistry and biology, Chem. Rev. 2002, 102, 783-812.
111. C. Hansch, A. Kurup, R. Garg, H. Gao, Chem-bioinformatics and QSAR: a review of QSAR lacking positive hydrophobic terms, Chem. Rev. 2001, 101, 619-72.
112. Y.C. Martin, 3D QSAR: current state, scope, and limitations, Perspectives in Drug Discovery and Design 1998, 12-14, 3.
113. M.A. Johnson, G.M. Maggiora, American Chemical Society. Meeting, Concepts and Applications of Molecular Similarity, Wiley, New York, 1990.
114. D.E. Patterson, R.D. Cramer, A.M. Ferguson, R.D. Clark, L.E. Weinberger, Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors, J. Med. Chem. 1996, 39, 3049-59.
115. R.D. Cramer, M.A. Poss, M.A. Hermsmeier, T.J. Caulfield, M.C. Kowala, M.T. Valentine, Prospective identification of biologically active structures by topomer shape similarity searching, J. Med. Chem. 1999, 42, 3919-33.
116. R. Benigni, G. Gallo, F. Giorgi, A. Giuliani, On the equivalence between different descriptions of molecules: Value for computational approaches, J. Chem. Inf. Comput. Sci. 1999, 39, 575-578.
117. A. Rusinko III, M.W. Farmen, C.G. Lambert, P.L. Brown, S.S. Young, Analysis of a large structure/biological activity data set using recursive partitioning, J. Chem. Inf. Comput. Sci. 1999, 39, 1017-26.
118. D.K. Agrafiotis, A constant time algorithm for estimating the diversity of large chemical libraries, J. Chem. Inf. Comput. Sci. 2001, 41, 159-67.
119. T.I. Oprea, J. Gottfries, Chemography: the art of navigating in chemical space, J. Comb. Chem. 2001, 3, 157-66.
120. T.I. Oprea, I. Zamora, A.L. Ungell, Pharmacokinetically based mapping device for chemical space navigation, J. Comb. Chem. 2002, 4, 258-66.
121. M. Feher, J.M. Schmidt, Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry, J. Chem. Inf. Comput. Sci. 2003, 43, 218-27.
122. G.W. Caldwell, Compound optimization in early- and late-phase drug discovery: Acceptable pharmacokinetic properties utilizing combined physicochemical, in vitro and in vivo screens, Curr. Opin. Drug Discov. Devel. 2000, 3, 30-41.
123. C.M. Krejsa, D. Horvath, S.L. Rogalski, J.E. Penzotti, B. Mao, F. Barbosa, J.C. Migeon, Predicting ADME properties and side effects: the BioPrint approach, Curr. Opin. Drug Discov. Devel. 2003, 6, 470-80.
124. T.R. Stouch, J.R. Kenyon, S.R. Johnson, X.Q. Chen, A. Doweyko, Y. Li, In silico ADME/Tox: why models fail, J. Comput. Aided Mol. Des. 2003, 17, 83-92.
125. H. van de Waterbeemd, E. Gifford, ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192-204.
126. P.A. Clemons, Complex phenotypic assays in high-throughput screening, Curr. Opin. Chem. Biol. 2004, 8, 334-8.
127. L.M. Kauvar, D.L. Higgins, H.O. Villar, J.R. Sportsman, A. Engqvist-Goldstein, R. Bukar, K.E. Bauer, H. Dilley, D.M. Rocke, Predicting ligand binding to proteins by affinity fingerprinting, Chem. Biol. 1995, 2, 107-18.
128. S.J. Haggarty, P.A. Clemons, S.L. Schreiber, Chemical genomic profiling of biological networks using graph theory and combinations of small molecule perturbations, J. Am. Chem. Soc. 2003, 125, 10543-5.
129. Y.K. Kim, M.A. Arai, T. Arai, J.O. Lamenzo, E.F. Dean III, N. Patterson, P.A. Clemons, S.L. Schreiber, Relationship of stereochemical and skeletal diversity of small molecules to cellular measurement space, J. Am. Chem. Soc. 2004, 126, 14740-5.
130. Z.E. Perlman, M.D. Slack, Y. Feng, T.J. Mitchison, L.F. Wu, S.J. Altschuler, Multidimensional drug profiling by automated microscopy, Science 2004, 306, 1194-8.
131. A.F. Fliri, W.T. Loging, P.F. Thadeio, R.A. Volkmann, Biospectra analysis: model proteome characterizations for linking molecular structure and biological response, J. Med. Chem. 2005, 48, 6918-25.
132. D.E. Root, S.P. Flaherty, B.P. Kelley, B.R. Stockwell, Biological mechanism profiling using an annotated compound library, Chem. Biol. 2003, 10, 881-92.
133. Z.E. Perlman, T.J. Mitchison, T.U. Mayer, High-content screening and profiling of drug activity in an automated centrosome-duplication assay, Chembiochem 2005, 6, 145-51.
134. J.C. Yarrow, Y. Feng, Z.E. Perlman, T. Kirchhausen, T.J. Mitchison, Phenotypic screening of small molecule libraries by high throughput cell imaging, Comb. Chem. High Throughput Screen. 2003, 6, 279-86.
135. J.C. Yarrow, Z.E. Perlman, N.J. Westwood, T.J. Mitchison, A high-throughput cell migration assay using scratch wound healing, a comparison of image-based readout methods, BMC Biotechnol. 2004, 4, 21.
136. E.O. Perlstein, D.M. Ruderfer, G. Ramachandran, S.J. Haggarty, L. Kruglyak, S.L. Schreiber, Revealing complex traits with small molecules and naturally recombinant yeast strains, Chem. Biol. 2006, 13, 319-27.
137. S.L. Schreiber, Small molecules: the missing link in the central dogma, Nat. Chem. Biol. 2005, 1, 64-66.
13.2
WOMBAT and WOMBAT-PK Bioactivity Databases for Lead and Drug Discovery
Marius Olah, Ramona Rad, Liliana Ostopovici, Alina Bora, Nicoleta Hadaruga,
Dan Hadaruga, Ramona Moldovan, Adriana Fulias, Maria Mracec, and
Tudor I. Oprea
Outlook
This chapter highlights the importance of gathering appropriate and accurate
information with respect to chemical structures and associated bioactivities,
focused on drug discovery. The contents of WOMBAT and WOMBAT-PK
are summarized, and examples are given for some of the problems that
are encountered when indexing correct biological properties and chemical
structures. Two examples for data mining in WOMBAT are given.
13.2.1
Introduction: The WOMBAT Databases
The current paradigm for drug discovery allows a relatively short
period, 6-12 months, for the process that modifies an initial active compound,
either from high-throughput screening (HTS) or from publications
and patents, into a well-characterized lead molecule. During this time, project
team members have relatively little time to familiarize themselves with ‘prior
art’, that is, to gather information pertinent to the new biological target, the
disease models, as well as active chemotypes on the intended, or related targets.
The task of gathering background information related to chemotypes is made
easier if one has access to chemical databases such as Chemical Abstracts via
SciFinder [1], Beilstein [2], and Spresi [3], or to medicinal chemistry-related
patent databases such as the MDL Drug Data Report, MDDR [4], the World
Drug Index, WDI [5], and Current Patents Fast Alert [6]. Collections of biologically
active compounds include Comprehensive Medicinal Chemistry, CMC [7]
and DiscoveryGate [8], while the PubChem [9] database, part of the Molecular
Libraries Initiative (MLI) [10], is more focused on tools for chemical biology.
Clinical pharmacokinetics data for marketed drugs are captured in the Physician
Desk Reference, PDR [11], while DrugBank [12] also captures compounds in
clinical trials.
Primary HTS data are captured in PubChem [9], which has author-defined
labels for “active” and “inactive” chemical probes. However, most of the
other databases listed above do not capture biological endpoints in a simple
searchable manner: there are no fields that one can query in a quantitative
manner to identify the target-related activity of a particular compound,
or what other measured properties it has. Such information is important if
one considers that (a) not all chemotypes indexed in patent databases are
indeed active, and some are merely patent claims with no factual basis; (b) not all
chemotypes disclosed as active are equally active, or selective for that matter, on
the target(s) of choice; and (c) not all compounds sharing the same therapeutic
indication behave in the same manner with respect to, for example, side effects.

Chemical Biology. From Small Molecules to Systems Biology and Drug Design.
ISBN: 978-3-527-31150-7

Some of these issues were considered at AstraZeneca R&D Molndal, Sweden, in
May 2001, to initiate a data-gathering project centered primarily on the Journal
of Medicinal Chemistry (JMC), in collaboration with scientists at the Romanian
Academy Institute of Chemistry in Timisoara, Romania. The major goal of
this project was to capture chemical structures and the associated biological
activities disclosed in the JMC, with an initial goal of 20 000 entries set for
the first year. The first version of this database was available at AstraZeneca
R&D Molndal in May 2002; this version contained 21 700 structures (with
duplicates), and 36 738 experimental activities on 324 targets, captured from
837 JMC papers (1996-1999).
Because the internal dissemination of this database within AstraZeneca
R&D (a company with 11 R&D sites across four continents) was not deemed
a success, AstraZeneca decided to discontinue the project as of May 2002.
Backed by private funding, the database, renamed World of Molecular BioAcTivity
(WOMBAT) in 2003, continued to evolve [13] as discussed for WOMBAT
2006.1, below. Recognizing the paucity of chemical databases that capture clinical
pharmacokinetics data in a searchable manner, we further developed
WOMBAT-PK (WOMBAT-Pharmacokinetics) to index such data from the literature [14].
This chapter summarizes the contents of WOMBAT and WOMBAT-PK [15],
some of the problems encountered in appropriately indexing biological
activities and correct chemical structures (with a focus on machine-readable
contents for data mining), and provides some examples of data mining with
WOMBAT. Other bioactivity databases [16], focused mostly on patent literature,
are shown in Table 13.2-1 together with the on-line references.
13.2.2
WOMBAT 2006.1: Overview
WOMBAT 2006.1 contains 154 236 entries (136 091 unique SMILES, Simplified
Molecular Input Line Entry System [17, 18]), covering 6801 series from
over 6791 papers with more than 307 700 activities for 1320 unique targets.
All biological activities are automatically converted to the -log10 of the molar
concentration, regardless of activity type. Numerical values for activity are
stored in three fields; the additional two fields capture the experimental error,
when reported (see footnote 1). Besides exact numeric values (the vast majority), WOMBAT
1) In the absence of reported errors, the 3 activity
value fields are equal. The decision to index
these values for each molecule was taken
because 'missing values' are given a different
interpretation by statistical techniques.
Table 13.2-1 Examples of annotated databases, modified from [16]

Database | Description | Homepage
AurSCOPE | Databases containing biological and chemical information relating to a class of drug targets or a pharmaceutical topic of interest | http://www.aureus-pharma.com/
BIDD | Bioinformatics databases about drugs, natural products, protein targets, ADME (Absorption, Distribution, Metabolism and Excretion)/Tox, and drug-protein binding | http://bidd.nus.edu.sg/
Bioprint | Ligand profiling data including target-specific activity, pharmacology, and ADME-related properties | http://www.cerep.fr/
Blueprint | Resource for biomolecular data focused on public databases for small molecule/domain interactions | http://www.blueprint.org/
ChemBank | Database about small molecules and resources for studying their effects on biology | http://chembank.broad.harvard.edu/
Drugmatrix | Pharmacological, pathological, and gene expression profiles for benchmark drugs | http://www.iconixpharm.com/
GVK Biosciences databases | Chemical structures, biological activities, toxicity, and pharmacological data for a large number of compounds curated from patents and journals | http://www.gvkbio.com/
KiBank | Database of chemical structures with associated binding affinity (Ki) for given targets | http://kibank.iis.u-tokyo.ac.jp/
Kinase Knowledgebase | Captures published information for therapeutically relevant kinases | http://www.eidogen-sertanty.com/
Jubilant Biosys databases | Chemical structures, bioactivities, therapeutically relevant databases for a large number of compounds curated from journals and patents | http://www.jubilantbiosys.com/products.htm
Ligand Info | Small molecule meta-database which compiles various publicly available small molecule databases | http://ligand.info/
MDL Drug Data Report | Contains biologically relevant compounds, including launched and candidate drugs, and well-defined derivatives | http://www.mdli.com/
PDSP Ki | Public domain resource that provides information related to drugs and their binding properties | http://pdsp.cwru.edu/
PubChem | Provides a high volume of information on the biological activities of small molecules; it links chemical structures to other Entrez databases | http://pubchem.ncbi.nlm.nih.gov/
ZINC | Online resource of commercially available compounds dedicated to virtual screening practitioners | http://blaster.docking.org/zinc/
Fig. 13.2-1 Bioactivity distribution pie charts in WOMBAT
2006.1, classified by target type. The size of the pie chart is
proportional to the representation of each target class: enzymes,
42%; ion channels, 7%; proteins, 7%; and receptors, 45%.
now captures 'inactives' (3639), 'less than' (21 926), 'greater than' (635), as well
as percentage inhibition values (8448 single-dose experiments). The bioactivity
distribution by target type is given in Fig. 13.2-1. Four target types are captured
in WOMBAT: receptors (which include GPCRs - G-protein coupled receptors,
nuclear hormone receptors, integrins and other receptors, e.g., sigma), enzymes
(associated with the Enzyme Commission E.C. number [19]), ion channels, and
proteins (biological targets that are not known as receptors, enzymes, or ion
channels, e.g., transporters).
A vast majority of the biological activities are related to inhibitors and antagonists:
~56% of the activities are IC50 values (and variations), and 37% are Ki
values (and variations). Much less frequent are D2 or EC50 values (~3% of the
measurements are for agonists or substrates) and binding affinity constants
(~1% Kb and Kd). In WOMBAT 2006.1, enzyme inhibitors populate more of
the inactive/low-activity bins, while receptor antagonists populate more of the
medium/high-activity bins (see also Fig. 13.2-2). The target profile of biological
activities is given in Table 13.2-2, with a focus on some target classes of current
interest to the pharmaceutical industry. Table 13.2-2 further indicates the ratio
of "actives" in this release of WOMBAT. The table shows that for some target
classes (e.g., phosphatases) there is a relatively small number of "actives", a
trend that is observed in most of the indexed enzymes. Receptor classes, on the
other hand, have a higher ratio of "actives". The target type distribution by
activity in Fig. 13.2-2 reflects approximately 15 years of medicinal chemistry
(see also Table 13.2-2). Medicinal chemistry publications currently indexed in
WOMBAT are listed in Table 13.2-3.
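As a rough illustration of the "actives" ratio discussed above, the following minimal sketch computes actives divided by entries for a few rows copied from Table 13.2-2. The function names and the choice of rows are ours, for illustration only; this is not part of the WOMBAT software.

```python
# Entries and "actives" (100 nM or better) for a few target classes,
# copied from Table 13.2-2. The selection of rows is arbitrary.
TARGET_CLASSES = {
    "G-protein coupled receptors": (50778, 31111),
    "Aspartyl proteases": (4904, 2881),
    "Kinases": (9705, 3241),
    "Phosphatases": (1361, 81),
}


def active_ratio(entries, actives):
    """Fraction of entries in a target class that count as actives."""
    return actives / entries


def rank_by_active_ratio(classes):
    """Return (name, ratio) pairs sorted from highest to lowest ratio."""
    ratios = [(name, active_ratio(e, a)) for name, (e, a) in classes.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_by_active_ratio(TARGET_CLASSES):
        print(f"{name}: {ratio:.1%}")
```

Note that the percentages printed here are relative to each class, whereas the "Percentage" columns of Table 13.2-2 relate to the total number of entries.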
Fig. 13.2-2 Target type distribution pie charts in WOMBAT 2006.1, classified by
activity value (in the -log10 scale). The size of the pie chart is proportional to the
representation of each activity category: inactives, 2%; low activity (0-6), 18%;
medium activity (6-8), 41%; and high activity (8-14.4), 40%.
Table 13.2-2 Target class profile for WOMBAT 2006.1*)

Target class | Entries | Percentage | Actives | Percentage
G-protein coupled receptors | 50 778 | 32.92 | 31 111 | 20.17
Integrins | 3127 | 2.03 | 1692 | 1.10
Nuclear hormone receptors | 4335 | 2.81 | 2436 | 1.58
Sigma receptors | 2123 | 1.38 | 1661 | 1.08
Ion channels | 13 500 | 8.75 | 5352 | 3.47
Serine proteases | 7596 | 4.92 | 3166 | 2.05
(Oxido)reductases | 7770 | 5.04 | 2865 | 1.86
Kinases | 9705 | 6.29 | 3241 | 2.10
Phosphatases | 1361 | 0.88 | 81 | 0.05
Oxygenases | 6051 | 3.92 | 1716 | 1.11
Aspartyl proteases | 4904 | 3.18 | 2881 | 1.87
Metalloproteases | 4296 | 2.79 | 1471 | 0.95
Cysteine proteases | 2063 | 1.34 | 771 | 0.50
Transporters | 5462 | 3.54 | 2860 | 1.85
Others | 31 165 | 20.21 | N/A | N/A
*) Entries indicate the number of structures recorded for each target class, whereas
"actives" indicate those entries with an activity of 100 nM or better; percentage
values relate to the total number of entries.

The WOMBAT database schemata, illustrated in Fig. 13.2-3, are further
discussed in the next section. Their organization, illustrated in Figs. 13.2-4
to 13.2-6, shows the three panels of the database: The Bioactivity Summary
Table 13.2-3 Medicinal chemistry publications covered in WOMBAT 2006.1

Journal title | Percentage | Publication years
J. Med. Chem. | 77.6 | 1991-2004 [complete]; 2005 [partial coverage]
Bioorg. Med. Chem. Lett. | 15.4 | 2002-2003 [complete]
Bioorg. Med. Chem. | 5.6 | 2002-2003 [complete]
Eur. J. Med. Chem. | 1.0 | 2002-2003 [complete]
ROOT
  SMDLID        entry identifier
  SID           series identifier (related to the references database)
  Structure     chemical structure (MDL MOL & SMILES formats)
  Reference     short bibliographic reference
  Keywords      structure keywords (stereo & salt data)
  Properties    calc & exp properties (LogP/S, Ro5, LigEff, etc.)
  AID           activity identifier (1, 2, ..., n)
    TargetType    target type (receptor, enzyme, ...)
    TargetName    target name
    ActType       activity type (IC50, Ki, EC50, ...)
    ValueType     activity value type (=, <, >, inhib%, inactive)
    ActValue      numeric activity value, in -log10 units
    Range         confidence range for the activity value
    BioKeywords   target & exp determination information
    SwissProtID   SwissProt ID / AN & species
    RecClassif    GPCR/NHR family/subfamily classification
Fig. 13.2-3 WOMBAT database schemata (simplified)

Fig. 13.2-4 WOMBAT bioactivity summary panel (example).
panel (Fig. 13.2-4) provides bioactivity types and values, some basic target
information, the minimal reference information, as well as structural and chemical
information (2D depiction and SMILES code) and related information (chirality, salt). The
Target and Biological Information panel (Fig. 13.2-5) provides detailed target
information, including biological information (species, tissue, etc.), detailed
target and target class information (including hierarchical classification for
G-protein coupled receptors, nuclear hormone receptors, and enzymes), as
well as further information regarding the bioassays (radioligand, assay type,
etc.). SwissProt [20] reference IDs are stored for most targets (~88%). The
Computed Chemical Properties panel (Fig. 13.2-6) includes several calculated
and experimental properties for each chemical structure, for example, counts
of miscellaneous atom types, Lipinski's rule-of-five (Ro5) parameters [21]
(including the calculated octanol/water partition coefficient), ClogP [22] and
Tetko's calculated water solubility [23], polar surface areas (PSAs) and nonpolar
surface areas (NPSAs), and so on. Finally, the Reference Database contains
bibliographic information (Fig. 13.2-7), including the Digital Object Identifier
Fig. 13.2-5 WOMBAT target and biological information panel (example).
(DOI) format [24] with URL links to pdf files for all literature entries, as well
as the PubMed ID for each paper.
13.2.3
WOMBAT Database Structure
WOMBAT is a dynamic database, which evolves as new data types are included.
The database structure is, however, preserved as much as possible from one
release to the next. Each root record (or WOMBAT entry) is identified by a
unique number (SMDLID), and is defined by the combination of one chemical
structure and one or more associated biological activities as entered in one
publication (Fig. 13.2-3). One field, the series identifier (SID), links all the root
records indexed from one reference (article). There are 6801 SID values in
WOMBAT 2006.1 (see also Fig. 13.2-7). At the root level, information about
the bibliographic reference (unique SID) from which the entry originated
is recorded together with various properties (illustrated in Fig. 13.2-6).
Separate keywords describe structural characteristics, related to stereochemistry
(e.g., absolute, relative, ±, R/S, 'non-chiral' or racemic) and to the salt
Fig. 13.2-6 WOMBAT computed chemical properties panel (example).
Fig. 13.2-7 WOMBAT references database (example).

form - see also Fig. 13.2-3. We record the salt separately to avoid the salt-removal
step that is usually performed in cheminformatic studies prior to
structure computations. For each SMDLID, we define the following biological
activity sub-records: the activity identifier (AID), with values from 1 to n,
where n is the number of biological activity determinations for one structure;
TargetName (the name of the target on which the activity was measured); ActType
(the activity type, e.g., IC50); ValueType, which can be one of five types: exactly
(=), lower than (<), greater than (>), percentage inhibition at a given concentration
(@I), or inactive; ActValue, the numeric value of the activity, in -log10
of the molar concentration; and Range, the experimental confidence range for the
measured activity, also in logarithmic units. For each SMDLID and each AID,
we also record a number of BioKeywords related to biological activity information
(e.g., bio-species, tissue and cell types, and so on) and target-related
information (e.g., the E.C. number [19], what radio-labeled substrate or ligand
was used, and so on) - see also Fig. 13.2-5. Thus, for one series (same SID
value), each activity block (AID range 1, . . ., n) has separate TargetName,
ActType, ValueType, and BioKeywords.
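The record layout described above can be sketched as a small data structure. This is only an illustration under our own assumptions: the class names, the attribute spellings, the unit table, and the `add_activity` helper are hypothetical and are not taken from the WOMBAT schema itself; only the field meanings (SMDLID, SID, AID, TargetName, ActType, ValueType, ActValue in -log10 molar units) follow the text.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

# Conversion factors to molar concentration (illustrative subset).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def neg_log10_molar(value: float, unit: str) -> float:
    """Convert a concentration to -log10 of its molar value."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


@dataclass
class Activity:
    aid: int                     # activity identifier, 1..n within one entry
    target_name: str
    act_type: str                # e.g. "IC50", "Ki", "EC50"
    value_type: str              # "=", "<", ">", inhibition %, or "inactive"
    act_value: Optional[float] = None  # -log10 molar; None if not numeric
    bio_keywords: List[str] = field(default_factory=list)


@dataclass
class Entry:
    smdlid: int                  # unique root-record identifier
    sid: int                     # series identifier (one per reference)
    smiles: str
    activities: List[Activity] = field(default_factory=list)

    def add_activity(self, target_name, act_type, value, unit, value_type="="):
        """Append an activity, numbering AIDs consecutively from 1."""
        activity = Activity(
            aid=len(self.activities) + 1,
            target_name=target_name,
            act_type=act_type,
            value_type=value_type,
            act_value=neg_log10_molar(value, unit),
        )
        self.activities.append(activity)
        return activity
```

With this sketch, an IC50 of 100 nM is stored as an activity value of about 7, and 10 µM as about 5, which matches the activity bins used elsewhere in the chapter.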
13.2.4
WOMBAT Quality Control
Quality control is performed at the moment of data entry, in particular with
respect to errors present in publications. Chemical structures are checked for
structural consistency by matching the molecular weight (MW) and chemical
formula with the ones given in the Experimental section and/or Supporting
Information, whenever available, and by comparison to prior publications.
Whenever in doubt, we also use other sources, such as the Merck Index [25]
and free Internet resources. In the instances where external and literature
data cannot be reconciled, SciFinder [1] is also used. The error rate so far
in medicinal chemistry publications is not at all negligible: we find an
average of approximately two errors per publication across the 6791 papers
indexed in WOMBAT 2006.1. Given the median of 25 compounds per series,
this implies an overall error rate of 8%. These errors are distributed as
follows [26]:
• incorrectly drawn or written structures (3%);
• incorrect molecular formula or MW (3%);
• unspecified position of attachment of substituents, or ambiguous numbering scheme for the heterocyclic backbone (0.9%);
• structures with the incorrect backbone (0.7%);
• incorrect generic names or chemical names (0.2%);
• duplicates (0.2%);
• incorrect biological activity (0.3%);
• incorrect references (0.2%).
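The arithmetic behind the quoted error rate can be checked with a short sketch. The numbers are taken from the text above; the variable names and category labels are ours. Note that the individual categories as listed add up to about 8.5%, slightly more than the rounded 8% overall figure.

```python
# Figures quoted in the text: about two errors per paper, and a median
# of 25 compounds per series (one series per paper).
ERRORS_PER_PAPER = 2
MEDIAN_COMPOUNDS_PER_SERIES = 25

# Overall error rate per compound, as a fraction (2 / 25 = 0.08).
overall_rate = ERRORS_PER_PAPER / MEDIAN_COMPOUNDS_PER_SERIES

# Per-category percentages from the list above (labels abbreviated).
ERROR_CATEGORIES = {
    "structure drawn or written incorrectly": 3.0,
    "molecular formula or MW": 3.0,
    "attachment position or numbering": 0.9,
    "incorrect backbone": 0.7,
    "generic or chemical names": 0.2,
    "duplicates": 0.2,
    "biological activity": 0.3,
    "references": 0.2,
}

listed_total = sum(ERROR_CATEGORIES.values())

if __name__ == "__main__":
    print(f"overall rate: {overall_rate:.0%}")
    print(f"sum of listed categories: {listed_total:.1f}%")
```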
[Fig. 13.2-8: structure drawings not reproduced. Panel titles: 'Not machine-readable' and 'Machine-readable'. Labels recovered from the figure: (1R,2S,3S,5S)-8-methyl-3-phenyl-2-propyl-8-azabicyclo[3.2.1]octane; (2R,3R,4S,5R)-2-(6-amino-9H-purin-9-yl)-5-(Rgroup)-tetrahydrofuran-3,4-diol; Error: 'Stereo bonds are only allowed between chiral and achiral atoms'; cross up/down wedge error; undefined chirality may be interpreted as both R and S.]
Fig. 13.2-8 Human vs. machine-readable chemical structure
representations. Names based on the depicted structures were
interpreted using ACDName [30]. The cross up/down wedge error
(middle) causes errors in assigning the absolute chirality.
Special attention is given to stereochemistry, as some compounds
are published without proper chirality representation even though the
information is available, for example, for natural compounds and their
derivatives. Furthermore, as illustrated in Fig. 13.2-8, compounds published
in medicinal chemistry literature are often depicted in a "human-readable"
format; that is, structures are drawn in a format that chemists can interpret
to reconstruct proper chirality. However, this format is not "machine-readable";
that is, cheminformatics software for 3D structural conversion,
or for automatically generating IUPAC (International Union of Pure and
Applied Chemistry) nomenclature, cannot perceive the stereo centers correctly
if the "above/below plane" convention is not strictly enforced. We illustrate
this with ACDName [27] on the structures depicted in Fig. 13.2-8: the
software does not perceive two stereo centers for the tropane ring on
the left side and returns an error for the sugar structure. The errors
are not specific to ACDName; this program is used only to illustrate
the problem. Another type of problem in structure conversion is the cross
up/down wedge error, when two such bonds emerge from the same chiral
center (Fig. 13.2-8): software cannot assign the proper chirality, since by
convention three atoms are in the 'paper plane', and only one is 'wedged'
(up or down); two wedged bonds are simply not possible according to the
convention. Most of these errors can be corrected by checking previous
literature. Sometimes, even the cited reference may turn out to be in
error; for example, the reported MW is not consistent with the drawn, or
named, structure.
From a quality control standpoint, the assignment of the SwissProt ID
for each target can be a challenge, as publications do not always specify the
exact target used in an assay. In some instances, the species from which the
target was isolated is not explicitly mentioned, whereas some publications
do not mention what target subtype was used. For example, there are 1780
entries in WOMBAT 2006.1 that contain ‘estrogen receptor’ (ER) in the target
name, which implies that ERs present in a particular organ (e.g., uterus,
breast, brain) were tested for binding, agonism or antagonism. Of these, 1201
entries were annotated for a specific receptor subtype, either ERα or ERβ, or
'3A1' and '3A2' according to the nuclear receptor nomenclature [28]. For the
remaining 579 entries, a target could not specifically be assigned to a single
SwissProt ID.
This raises the question of storing multiple SwissProt ID values when a
mixture of targets is present. This situation is common for integrin receptors
that have the two protein chains separately defined in SwissProt. In the
ER example, 114 of the 579 entries were tested on MCF7 cells; however, it
is now clear that a third ER, GPR30 [29], could be present in MCF7 cells
[30]. Therefore, the observed anti-estrogenic activities for these 114 entries
should be questioned in the light of this new information; should three such
receptors be encoded? It further illustrates the dynamic nature of biological
targets: as biologists uncover more information about a particular target or
class of targets, and as our understanding about each target evolves, the
exact nomenclature changes as well. For example, there are 852 entries in
WOMBAT 2006.1 that contain 'VEGFR-2' as the target name. This target
name stands for the vascular endothelial growth factor receptor subtype 2,
but was previously known as 'Flk-1/KDR', or 'fetal liver kinase-1' and 'kinase
insert domain-containing receptor'. The VEGFR-2 name is present in all
852 entries, even though some of the older (before 1999) publications did
not refer to this target by the VEGFR-2 name. In an annotated database
such as WOMBAT, one has to monitor and update not only changes
related to biology but also changes related to chemistry (and chemical
errors), discussed in more detail below. Practical applications based on
WOMBAT data mining using targets [31] and descriptors [32] have been
described.
13.2.5
Uncovering Errors From Literature
As the demand for integrated chemical and biological information increases,

scientists rely more often on annotated databases that capture medicinal
chemistry literature (see Tables 13.2-1 and 13.2-3). There is little, if any,
error checking downstream from publication time, even though mechanisms
for publishing errata have been in place for quite some time. While the
responsibility for published data accuracy resides primarily with the author(s),
it is also the responsibility of annotated database curators to capture as
many of these errors as possible. While ensuring quality control in
WOMBAT, we have found inconsistencies in many of these publications. These
errors may have a significant effect on the way we understand the molecular
basis of chemical-biological interactions, at least for some particular series
used for structure-activity studies. Coats has traced the errors in a known
steroid benchmark for quantitative structure activity relationship (QSAR)
studies to the original publications [33]. Some of these errors are discussed
below.
Example 1. The following errors were found in Table 1 of Ref. 34, page 126:
• compounds with molecular names 53 and 56, respectively, appear to be duplicated because all their substituents are identical. On the basis of their activities, 56 (compound 15e in [35]) has the meta -OCH3-C6H4 substituent, while 53 (compound 15g in [35]) has the para -OCH3-C6H4 substituent;
• the -NH- group is missing from the L substituent in compound 27 (compound 9 in [36]), and the -CH2- group is missing from the L substituent in compound 45 (compound 13 in [35]);
• the R substituent of compound 66 is -C6H2-2-CO2CH3-4,5-(CH3)2 instead of the correct -C6H2-2-CO2CH3-4,5-(OCH3)2 group (compound 51, in [36]);
• the R substituent of compound 68 is -C6H2-2-CO2-4,5-(CH3)2 instead of the correct -C6H2-2-CO2-4,5-(OCH3)2 group (compound 7 in [35]);
• compound 44 has a -log(IC50) of 7.67 instead of the correct 7.74 (compound 15d in [35]).
Example 2. In Table l b of Ref. 37, page 4361, the core structure contains an
oxygen atom instead of the correct sulfur atom [38]:
[Structure drawings not reproduced: the 'wrong' core (containing oxygen) and the 'correct' core (containing sulfur).]
Thus, 47 structures (where X is the rest of the molecule) are incorrect

in Ref. 37. Since the paper illustrates the capabilities of a particular
structure-activity method, the consistent error does not influence the validity
of the models; it would, however, greatly influence the use of this series/model
in a medicinal chemistry project where the goal would be to improve the
binding affinity. Starting from the same initial publication [38], other errors
were propagated in [39]:
• compound 37 has an incorrect double substitution in the para position of the aromatic ring, 2,4-NO2,4-OH, while the correct one is 3-NO2,4-OH;
• the R substituent of compound B.12 is 2,4,6-Cl2,4-OMe instead of the correct 2,6-Cl2,4-OMe.
Example 3. Errors could also be found in Chemical Abstracts' SciFinder [1].
All the errors we encountered originate in the primary publications; their
appearance in SciFinder illustrates how such errors can propagate (since
SciFinder is a very popular resource). For example, the compound RB-380
(CAS# 187454-94-0), published in [40] (original molecule name 24), has a ring
size of 14 atoms, instead of the correct 13:

SciFinder structure: C34H42N6O7S2; L-Phenylalaninamide, N-(5-mercapto-1-oxopentyl)-α-methyl-D-tryptophyl-homocysteinyl-L-α-aspartyl-, cyclic (1→2)-disulfide (this name is given in CAS)
Correct structure: C33H40N6O7S2; Cyclo-S,S-[(5-thiopentanoyl)-αMe(R)-Trp-Cys]-Asp-Phe-NH2 (this name is given in the original publication [40], in the experimental section)
The correction we propose is based on the experimental section name and
on the following text fragment (p. 648, results section [40]): ". . . by introducing
an additional amide bond (compound 16 or RB 370) or a disulfide bridge (24 or
RB 380) into the 13-membered ring (Schemes 2 and 3), and by changing the size
of the ring (Table 1, compounds 43 and 45)." According to the data in Table 1
of Ref. 40, compound 43 (which is actually 44, another small error)
has a 13-atom ring, while compound 45 has a 14-atom ring.
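A formula-versus-MW consistency check of the kind described in the quality control section can be sketched as follows. This is a minimal illustration, not the procedure used for WOMBAT: the parser handles only simple formulas without brackets or charges, the atomic masses are rounded standard values for five elements, and the 0.5 mass-unit tolerance is an arbitrary assumption. The two formulas are those shown for RB-380 above; they differ by one CH2 unit, consistent with a 14- versus 13-membered ring.

```python
import re

# Rounded standard atomic masses; only the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Parse a simple formula such as 'C33H40N6O7S2' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic masses over the parsed formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def is_consistent(formula, reported_mw, tolerance=0.5):
    """True if the reported MW agrees with the formula within the tolerance."""
    return abs(molecular_weight(formula) - reported_mw) <= tolerance


if __name__ == "__main__":
    for label, formula in [("SciFinder", "C34H42N6O7S2"),
                           ("corrected", "C33H40N6O7S2")]:
        print(label, formula, round(molecular_weight(formula), 3))
```

A check like this flags a mismatch between a drawn structure and a reported MW, but it cannot say which of the two is wrong; that still requires going back to the primary literature.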
Example 4. Stereochemical ambiguities and structural errors can be encoun-
tered in the Merck Index [25] as well, as shown in these two examples:
Compound identifiers and error description | Merck Index structure | Correct structure
M630, anagyrine (CAS# 486-89-5): chiral center inversion and cross up/down wedge | [structure not reproduced] | [structure not reproduced]
M1854, carisoprodol (CAS# 78-44-4): completely different structure. All other information about M1854 is correct (name, formula and molecular weight). The formula is correct in the ninth edition of the Merck Index | [structure not reproduced] | [structure not reproduced]
The examples from SciFinder and the Merck Index are not intended to
question the quality of these products, which we consider to be outstanding.
They are invaluable resources to many chemists worldwide, and the error rate
in these two databases is insignificant if one takes into account the enormous
volume of indexed data. We have published a structure-activity paper on
HIV-protease inhibitors [41] in which a modified peptide was present in both
the training set and the test set. Al Leo of Pomona College has recently [42]
detected 100 chemical and name errors in the printed version of the sixth
edition of Burger's Medicinal Chemistry [43], errors that are to be corrected in
the on-line edition [44]. One can never be too careful in verifying the available
information, in particular if one is to invest a significant amount of resources
in that area.
13.2.6
WOMBAT-PK: Clinical Pharmacokinetics (PK) and Toxicological (Tox) Data
As PK data has become more important during lead discovery and evaluation,
we screened the clinical pharmacokinetics literature and developed a chemical
database that captures such data in numerical searchable format (WOMBAT-
PK). Its organization is illustrated in Figs. 13.2-9 to 13.2-11, which show three
of the four panels of the database. The Compound Description panel (Fig. 13.2-9)
provides the drug marketed names, some physico-chemical characteristics,
as well as structural, chemical (2D depiction and SMILES code), and related
information (chirality, salt). The Pharmacokinetic Data panel (Fig. 13.2-10)
provides the drug target information, and multiple PK and Tox parameters,
indexed in both numerical and text form. The third panel, Potential Side
Effects, captures data for BBB (blood-brain barrier) permeability, cardiac
toxicity data, possibly related to hERG (human ether-a-go-go potassium
channel 1) bioactivity, in vitro bioactivities from WOMBAT, as well as
mammalian tox data (e.g., the lethal dose 50%, LD50). The fourth panel, the
Computed Chemical Properties panel, is identical to the one in WOMBAT
Fig. 13.2-9 WOMBAT-PK compound description panel (example).

Fig. 13.2-10 WOMBAT-PK pharmacokinetic data panel (example)
(see Fig. 13.2-6). The 2006 release of WOMBAT-PK contains 900 marketed
drugs (in rare cases, some are metabolites) with documented PK and Tox
properties.
Currently indexed PK, Tox, and physico-chemical properties data are
summarized in Table 13.2-4. The top nine properties were captured from
the following sources: Goodman & Gilman's ninth edition [45] (G&G),
Avery's fourth edition [46] (Av), and the Physician Desk Ref. 11 (PDR). FDA's
Center for Drug Evaluation and Research website [47] was consulted for
FDA-approved drug labels. Other resources (e.g., Google™) were sometimes
used to compile the WOMBAT-PK database. The maximum recommended
therapeutic dose [48] (MRTD) is available from the FDA [49], whereas MRTD-U
(MRTD corrected for the fraction unbound) was determined by using the
percentage plasma protein binding (%PPB) data already indexed in WOMBAT-PK.
Thus, MRTD-U = MRTD × (1 - %PPB), with %PPB expressed as a fraction; it is
available for 498 drugs.
Experimental LogD7.4 and LogP values from compilation tables [50] and from
the Sangster database [51], and pKa values from Avery [46] and the Merck
Index [25], were collected for these drugs. In WOMBAT-PK, drug targets
are assigned to 753 drugs (of these, 97% have SwissProt IDs), whereas the
phase I metabolizing enzymes (all with SwissProt IDs) are recorded for
Fig. 13.2-11 WOMBAT-PK potential side effects panel (example)
419 entries. Regarding cardiac toxicity, there are 218 drugs indexed for QT-
prolongation (a clinical observation based on the ECG, the electrocardiogram),
89 for Torsade de Pointes risk (another ECG signal), and 71 with hERG
binding data. Curating clinical PK data requires individual examination
[52], and sources such as Goodman & Gilman’s are often considered more
reliable.
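The fraction-unbound correction given above, MRTD-U = MRTD × (1 - %PPB), can be written as a one-line function. In this sketch we assume that plasma protein binding is supplied as a percentage between 0 and 100 and is converted to a fraction inside the function; the function name and the input check are ours.

```python
def mrtd_unbound(mrtd, ppb_percent):
    """MRTD corrected for the fraction unbound.

    mrtd        -- maximum recommended therapeutic dose, in any unit
    ppb_percent -- plasma protein binding, as a percentage (0-100)
    """
    if not 0.0 <= ppb_percent <= 100.0:
        raise ValueError("ppb_percent must be between 0 and 100")
    fraction_unbound = 1.0 - ppb_percent / 100.0
    return mrtd * fraction_unbound


if __name__ == "__main__":
    # Hypothetical drug: MRTD of 10 units, 90% bound to plasma proteins.
    print(mrtd_unbound(10.0, 90.0))
```

The result keeps the unit of the MRTD input, so a highly bound drug (for example, 99% bound) ends up with an MRTD-U two orders of magnitude below its MRTD.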
Often, such experimental values are "greater than" or "less than" a given cut-off
value. A systematic round-off procedure was implemented, whereby "<5"
was attributed a higher value (=2.5), compared to "<1" (=0.5). Numeric values
also differ, sometimes significantly, due to various factors (e.g., multiple dose
vs. single dose, children vs. healthy volunteers); thus, conflicting values were
sometimes reported. The "on file" values in Table 13.2-4 are often averages
between G&G and Avery data, although ~30% of the indexed values differ by
more than 20% between these two sources (data not shown). To identify trends,
we attenuated the effect of such discrepancies by implementing an incremental
increase procedure to some of the PK properties, as illustrated in Table 13.2-5.
Incremental rank values were selected from experience whenever possible:
for example, experimental errors related to percentage oral bioavailability occur mostly for
values between 20 and 80%; 6/7 and 12/7 represent the 1/2 and full value
Table 13.2-4 Experimental PK and Tox data captured in WOMBAT-PK 2006.1

Property On file G&G Avery
%Oral bioavailability 740 277
%Urinary excretion 339 N/A
%Plasma protein binding 776 434
Clearance, Cl (mL min-1 kg-1) 514 422
Nonrenal clearance (fractional) 442 442
Volume of distribution, VDss (L kg-1) 552 453
Half-life, T1/2 (hr) 839 576
Terminal half-life, TT'1/2 (hrs) 581 580
Effective concentration (mM L-1) 119 N/A
MRTD (µmole kg-1-bw/day) 575 N/A
MRTD-U (µmole kg-1-bw/day; fu corrected) 498 N/A
LogD7.4 (measured) 513 N/A
LogPoct (measured) 472 N/A
pKa1 350 274
pKa2 99 75
In vitro Binding Data (from WOMBAT) 453 N/A
of creatinine clearance (120 mL/70 kg min-1), respectively; 3, 5.5, and 12
are typical 70-kg man volumes in liters for plasma, blood, and extracellular
fluids [14].
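The round-off of "less than" values and the rank binning described above can be sketched together. Two assumptions are ours: first, the "<5" to 2.5 and "<1" to 0.5 examples are read as "half of the cut-off"; the text gives no corresponding rule for "greater than" values, so the sketch refuses them. Second, the %PPB bin edges (5, 20, 80, 95, 99) are taken from Table 13.2-5, and values falling in the small gaps between printed bins (e.g., 99.05) are assigned to the higher rank.

```python
from bisect import bisect_left


def parse_reported_value(text):
    """Turn a reported value such as '12.3' or '<5' into a number.

    '<x' is replaced by x / 2, following the examples in the text.
    """
    text = text.strip()
    if text.startswith("<"):
        return float(text[1:]) / 2.0
    if text.startswith(">"):
        raise ValueError("no round-off rule is given for 'greater than' values")
    return float(text)


# Upper edges of the %PPB bins for ranks 0..4; anything above is rank 5.
PPB_UPPER_EDGES = [5.0, 20.0, 80.0, 95.0, 99.0]


def rank_ppb(percent_bound):
    """Map a %PPB value to its rank (0-5) using the bin edges above."""
    return bisect_left(PPB_UPPER_EDGES, percent_bound)


if __name__ == "__main__":
    for raw in ["<5", "<1", "42"]:
        print(raw, "->", parse_reported_value(raw))
    for ppb in [3.0, 50.0, 97.0, 99.5]:
        print(ppb, "-> rank", rank_ppb(ppb))
```

Binning of this kind trades resolution for robustness: two sources that disagree by 20% on a mid-range value will usually still land in the same rank.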
WOMBAT-PK also captures information about the known (or intended)
drug target(s). These are often retrieved from the therapeutic classification
data (e.g., anti-histaminic compounds are intended to act as antagonists
of the H1 histamine receptor), or can be inferred by searching medicinal
chemistry literature - see also Fig. 13.2-10. Of interest is the cross-index of
Table 13.2-5 Parent value ranking for certain PK parameters in WOMBAT-PK 2006.1

% Oral | Rank 3 oral | Rank 5 oral | % Urine | Rank urine
0-5 | 0 | 0 | 0-1 | 0
5.1-19.99 | 0 | 1 | 1.01-5 | 1
20.0-79.99 | 1 | 2 | 5.01-20 | 2
80.0-95 | 2 | 3 | 20.01-50 | 3
>95.1 | 2 | 4 | 50.01-80 | 4
 | | | >80 | 5

%PPB | Rank PPB | Cl (mL min-1 kg-1) | Rank Cl | VD (L kg-1) | Rank VD
0-5 | 0 | 0-(6/7) | 0 | 0-1 | 0
5.01-20 | 1 | (6.01/7)-(12/7) | 1 | 1.01-3 | 1
20.01-80 | 2 | (12.01/7)-5 | 2 | 3.01-5.5 | 2
80.01-95 | 3 | 5.01-10 | 3 | 5.51-12 | 3
95.01-99 | 4 | 10.01-15.5 | 4 | >12 | 4
>99.1 | 5 | >15.5 | 5 | |
the WOMBAT and WOMBAT-PK databases, which shows in vitro binding
information for certain drugs in medicinal chemistry literature. For example,
aspirin has a relatively weak binding affinity to cyclooxygenases COX-1
and COX-2 (but acts as a suicide inhibitor); at the same time, it appears
to be 2 to 3 orders of magnitude more potent on GP IIb/IIIa, an αIIbβ3
integrin involved in platelet aggregation. This probably explains why aspirin
is effective at the 75-80 mg/day dose range as an antiaggregant, compared to
the 500-1000 mg/day dose range for the anti-inflammatory effects [53].
13.2.7
Datamining With WOMBAT
Example 1. One of the major areas of interest in medicinal chemistry is
oncology. The cancer medicinal chemistry space was described earlier by
mining the WOMBAT and WOMBAT-PK databases [54]. The oncology subset of
WOMBAT 2006.1 contains 917 unique active targets, detailed in Table 13.2-6.
A query for targets that have over 300 entries allows us to establish an activity
histogram, contrasting low-activity entries (Fig. 13.2-12a) with high-activity
entries (Fig. 13.2-12b). This allows the user to rapidly identify targets for
which the number of low-activity entries significantly exceeds the number of
high-activity entries, such as GGTase, PKA, and Tubulin - see Fig. 13.2-12
legend for target names. In fact, there are only seven entries of Tubulin
inhibitors with activity better than 100 nM, and only two of them are Ro5
[21] compliant (data not shown). One can conclude that such targets are
areas of opportunity for the design of novel inhibitors. By the same token,
AR, ERβ, and MMP-13 are targets where the number of high-activity
entries greatly exceeds the number of low-activity records. These targets are
probably already abundant with high-quality ligands, indicating that perhaps
selectivity or pharmacokinetic profiling are currently the key areas for further
optimization.
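The low- versus high-activity comparison in Example 1 amounts to a grouped count. The sketch below uses the cut-offs given in the Fig. 13.2-12 legend (10 µM or less, i.e. an activity value of 5 or lower; 100 nM or better, i.e. 7 or higher) and the "over 300 entries" filter from the text. The record format, function names, and demo data are hypothetical and are not taken from WOMBAT.

```python
from collections import Counter

LOW_MAX = 5.0    # -log10(10 uM): low activity at or below this value
HIGH_MIN = 7.0   # -log10(100 nM): high activity at or above this value


def activity_histogram(records, min_entries=300):
    """Count low- and high-activity entries per target.

    records     -- iterable of (target_name, act_value) pairs,
                   with act_value in -log10 molar units
    min_entries -- keep only targets with more than this many entries
    Returns {target: (n_low, n_high)}.
    """
    totals, low, high = Counter(), Counter(), Counter()
    for target, act_value in records:
        totals[target] += 1
        if act_value <= LOW_MAX:
            low[target] += 1
        elif act_value >= HIGH_MIN:
            high[target] += 1
    return {t: (low[t], high[t]) for t, n in totals.items() if n > min_entries}


def low_dominated(histogram):
    """Targets where low-activity entries outnumber high-activity ones."""
    return sorted(t for t, (n_low, n_high) in histogram.items() if n_low > n_high)


# Tiny made-up data set, only to show the mechanics.
DEMO = [("Tubulin", 4.2), ("Tubulin", 4.8), ("Tubulin", 7.5),
        ("AR", 8.1), ("AR", 9.0), ("AR", 4.0), ("PKA", 5.5)]

if __name__ == "__main__":
    hist = activity_histogram(DEMO, min_entries=2)
    print(hist)
    print(low_dominated(hist))
```

On real data one would keep the default threshold of 300 entries; the demo lowers it only because the made-up data set is tiny.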
Example 2. The concept of leadlikeness [32, 55, 56] and its application in
developing leadlike libraries [55, 57, 58] have been extensively discussed. The
reduction of the leadlike concept into practice at Astex [59] resulted in a
proposal for fragment libraries in lead discovery called the 'Rule of Three':
Table 13.2-6 Distribution of target types among oncology targets in WOMBAT 2006.1

Target type | Count | Percentage
Enzyme | 759 | 82.77
Ion channels | 4 | 0.44
Protein | 56 | 6.11
Receptor | 98 | 10.69
Fig. 13.2-12 Activity histogram for the most-populated oncology-related targets in
WOMBAT 2006.1. There were at least 300 records per target. The top panel (a) shows
low-activity compounds (10 µM or less), whereas the bottom panel (b) shows
high-activity compounds (100 nM or better). The bars are color-coded according to Ro5
violations (see also legend). Numbers on top of each bar indicate the number of
compounds with low (a) and high (b) entries per target. Target names are as follows:
AR - androgen (or dihydrotestosterone) receptor, CDK2/cyclin A - cell division
protein kinase 2, DHFR - dihydrofolate reductase, EGFR - epidermal growth factor
receptor, ER, ERα, and ERβ - estrogen receptor, ER alpha and beta subtypes,
respectively, FTase - protein farnesyltransferase, GGTase - protein
geranyl-geranyltransferase, Lck - proto-oncogene tyrosine-protein kinase,
MAPK p38 - cytokine suppressive anti-inflammatory drug binding protein, or
mitogen-activated protein kinase p38α, MMP-1 through MMP-9 - matrix
metalloproteases 1 through 9, respectively, PDGFR - platelet derived growth factor
receptor, PKA - cAMP-dependent protein kinase A, PKC-α - protein kinase C, alpha
type, VEGFR-2 - vascular endothelial growth factor receptor 2, or kinase insert
domain receptor, c-Src - proto-oncogene tyrosine-protein kinase SRC, and Tubulin.
MW < 300, ClogP < 3, number of hydrogen bond donors and acceptors 5 3,
flexible bonds 5 3, and PSA 5 60 A’. Using these criteria, WOMBAT 2006.1
returns 6607 entries. Of these, 2001 entries contain at least one biological
activity better than, or equal to 100 nM, and 543 of these contain a generic
name. This usually means that they are either launched drugs, or natural
products, or otherwise in an advanced stage of development. The examples
given in Fig. 13.2-13 illustrate the chemotype, target, and activity diversities
that can be found in rule-of-three compliant molecules: Neurotransmitter
and nuclear hormone receptor agonists (EC50) and antagonists (Ki, ICso, and
A’), neurotransmitter transporters, as well as enzyme inhibitors are present,
most of them with multiple activities. On the basis of the WOMBAT 2006.1
entries, it appears that there are a number of interesting chemotypes that
are rule-of-three compliant. Such cheminformatics-based mining can identify
target-specific small molecules for fragment library design [63].
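The rule-of-three query described above can be written as a simple property filter. The sketch below is a minimal illustration, assuming that the descriptors have already been computed elsewhere (as in the precomputed properties panel); the names `Descriptors`, `is_rule_of_three`, `p_activity`, and `is_high_activity` are hypothetical and do not correspond to any WOMBAT field or function. Activities are handled on the negative-logarithm scale, so that 100 nM corresponds to a value of 7.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical field names)."""
    mw: float          # molecular weight
    clogp: float       # calculated logP
    hbd: int           # hydrogen bond donors
    hba: int           # hydrogen bond acceptors
    flex_bonds: int    # flexible (rotatable) bonds
    psa: float         # polar surface area, in Å²


def is_rule_of_three(d: Descriptors) -> bool:
    """Apply the thresholds exactly as quoted in the text."""
    return (
        d.mw < 300
        and d.clogp < 3
        and d.hbd <= 3
        and d.hba <= 3
        and d.flex_bonds <= 3
        and d.psa <= 60
    )


def p_activity(concentration_nm: float) -> float:
    """Convert a concentration in nM to -log10(molar concentration)."""
    return 9.0 - math.log10(concentration_nm)


def is_high_activity(p_value: float) -> bool:
    """True for activities better than, or equal to, 100 nM."""
    return p_value >= 7.0
```

As a usage example with invented descriptor values, `is_rule_of_three(Descriptors(mw=219.3, clogp=2.0, hbd=1, hba=3, flex_bonds=2, psa=31.0))` passes the filter, while the same molecule with `mw=450.0` fails on molecular weight alone.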
13.2.8
Conclusions and Future Challenges
As annotated databases, WOMBAT and WOMBAT-PK continue to evolve in
time, not only with the addition of more entries but also with updates and
restructuring of the biological, clinical, and chemical information, which is
subject to revision even after the data are captured and indexed. The inclusion
of the precomputed properties panel allows the users to quickly identify rule-of-
five or rule-of-three compliant datasets, or to constrain the query with respect
to, for example, flexible bonds, PSA, computed solubility or LogP, and so on.
WOMBAT and WOMBAT-PK are currently available in the MDL Isis/Base
format. WOMBAT is also integrated in CABINET (Chemical And Biological
Informatics NETwork) [61, 62] as a server. CABINET [62], a federation of
high-performance scientific databases that collaborate through web-like
interfaces to provide integrated access to diverse chemical and biological
information, is described elsewhere [61].

[Fig. 13.2-13 structure drawings: Quinpirole, Physostigmine, Norethindrone,
RTI-110, Morphine, Ondansetron, 5-OMe-α-Me-Tryptamine, LY-191704, and SU-5416.]

Fig. 13.2-13 Examples of rule-of-three compliant molecules that have
biological activity better than 10 nM. Under each molecule, the following
information is included: molecule name, MW, ClogP, the biological activity
type, value, and target. Target names are as follows: D3 and D4 -
dopaminergic receptor types 3 and 4, AChE and BChE - acetyl- and
butyryl-choline esterases, PRA and PRB - progesterone receptor types A and B,
H1 and H3 - histamine receptor types 1 and 3, 5-HT2A, 5-HT2B, 5-HT2C, 5-HT3,
5-HT4 - serotonin receptor subtypes 2A, 2B, 2C, and types 3 and 4, DAT, NET,
5-HTT - dopamine, norepinephrine, and serotonin transporter proteins, μ1, μ2,
δ, κ1, κ3 - opioid receptor types mu-1, mu-2, delta, kappa-1, and kappa-3,
5α-R1 and 5α-R2 - 5-alpha-reductase isozymes 1 and 2, Flt-1 - fms-like
tyrosine kinase receptor.
Federated database servers such as CABINET could, for example, bring
together WOMBAT and C-QSAR [63], but the challenge goes beyond
technical issues related to field correspondence. Data normalization (e.g.,
ensuring similar treatment regarding chirality, salt information, measured
and computed properties) is likely to require on-the-fly data interpreters,
which in turn demands that all data entries in WOMBAT and other databases
be unambiguous. Data transparency is not always possible: for example,
most WOMBAT entries related to epidermal growth factor receptor (EGFR)
are classified as ‘TargetType = enzyme’, because EGFR is a membrane
receptor-linked tyrosine-protein kinase and medicinal chemists target EGFR
for kinase inhibition. However, in one instance, ‘TargetName’ was assigned
as ‘receptor’ because the endogenous ligand, EGF, was used to test for EGFR
antagonism [64]. Thus, restricting data fields to certain value types, usually
an asset for database indexing, can become a hindrance when the unexpected
occurs. One of the challenges in database federation therefore remains adaptive
data normalization for biology-related data fields, since biological phenomena
are not always amenable to unambiguous mapping. If these problems are
addressed successfully, it is quite likely that integrated data mining tools
will change the way we conduct everyday research.
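The EGFR case can be made concrete with a small sketch of context-dependent normalization. It is offered only as an illustration of the problem, not as the mechanism used in WOMBAT or CABINET: the enumeration `TargetType`, the lookup table `DEFAULT_TARGET_TYPE`, the function `resolve_target_type`, and the assay-context string are all hypothetical.

```python
from enum import Enum
from typing import Optional


class TargetType(Enum):
    """A restricted vocabulary for the target-type field (hypothetical)."""
    ENZYME = "enzyme"
    RECEPTOR = "receptor"
    ION_CHANNEL = "ion channel"
    PROTEIN = "protein"


# Default annotation per target; EGFR is usually pursued for kinase inhibition.
DEFAULT_TARGET_TYPE = {"EGFR": TargetType.ENZYME}


def resolve_target_type(target: str, assay_context: Optional[str] = None) -> TargetType:
    """Return the target type, letting the assay context override the default.

    A fixed one-to-one mapping from target name to type would mislabel the
    ligand-displacement experiment; the optional context keeps both cases
    representable without loosening the restricted vocabulary.
    """
    if assay_context == "endogenous ligand antagonism":
        return TargetType.RECEPTOR
    return DEFAULT_TARGET_TYPE[target]
```

Here `resolve_target_type("EGFR")` returns `TargetType.ENZYME`, whereas passing `assay_context="endogenous ligand antagonism"` returns `TargetType.RECEPTOR`. The design choice is to keep the field strictly typed for indexing while recording the exception as explicit context rather than as free text.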
Acknowledgments
The authors thank Prof. Hugo Kubinyi (Heidelberg, Germany) for suggestions.
References
1. Chemical Abstracts online and its search module, SciFinder, are available from the American Chemical Society, http://www.cas.org/SCIFINDER/, 2006.
2. The Beilstein Information Systems is available from http://www.beilstein.com/, 2006.
3. The Spresi Database is available from InfoChem GmbH, München, http://www.spresiweb.de/; and from Daylight Chemical Information Systems, http://www.daylight.com/products/databases/Spresi.html, 2006.
4. MDDR is available from MDL Information Systems, http://www.mdli.com/products/finders/database_finder/ and from Prous Science Publishers, http://www.prous.com/index.html, 2006.
5. WDI. The Derwent World Drug Index is available from Derwent Publications Ltd., http://thomsonderwent.com/products/lr/wdi/ and from Daylight Chemical Information Systems, http://www.daylight.com/products/databases/WDI.html, 2006.
6. The Current Patents Fast Alert database is available from Current Patents Ltd., London, http://www.current-patents.com/, 2006.
7. The Comprehensive Medicinal Chemistry database is available from MDL Information Systems, Inc., http://www.mdli.com/products/knowledge/medicinal_chem/index.jsp, 2006.
8. DiscoveryGate is available from MDL Information Systems, Inc., http://www.mdli.com/products/knowledge/discoverygate/; a subset of DiscoveryGate is available through the PubChem system, see http://www.mdli.com/company/news/press-releases/2006/pr-pubchem-21mar06.jsp, 2006.
9. The PubChem database is available online at the National Center for Biotechnology Information, http://pubchem.ncbi.nlm.nih.gov/, 2006.
10. C.P. Austin, L.S. Brady, T.R. Insel, F.S. Collins, NIH molecular libraries initiative, Science 2004, 306, 1138-1139.
11. The Physician Desk Reference is produced by …, 2003, ISBN 1-56363-472-4, and is available online at http://www.pdr.net/, 2006.
12. The DrugBank database is available at http://redpoll.pharmacy.ualberta.ca/drugbank/, 2006.
13. M. Olah, M. Mracec, L. Ostopovici, R. Rad, A. Bora, N. Hadaruga, I. Olah, M. Banda, Z. Simon, M. Mracec, T.I. Oprea, WOMBAT: World of molecular bioactivity, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 223-239.
14. T.I. Oprea, P. Benedetti, G. Berellini, M. Olah, K. Fejgin, S. Boyer, Rapid ADME filters for lead discovery, in Molecular Interaction Fields, (Ed.: G. Cruciani), Wiley-VCH, New York, 2006, 249-272.
15. WOMBAT and WOMBAT-PK are available from Sunset Molecular Discovery, Santa Fe, New Mexico, http://www.sunsetmolecular.com, 2006.
16. M. Olah, T.I. Oprea, Bioactivity databases, in Comprehensive Medicinal Chemistry II, (Eds.: J. Taylor, D. Triggle), Elsevier, New York, 2006.
17. D. Weininger, SMILES 1. Introduction and encoding rules, J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
18. D. Weininger, A. Weininger, J.L. Weininger, SMILES 2. Algorithm for generation of unique SMILES notation, J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
19. The Nomenclature is recommended by the International Union of Biochemistry and Molecular Biology, and is available at http://www.chem.qmul.ac.uk/iubmb/enzyme/, 2006.
20. Swiss-Prot Protein knowledgebase database, http://kr.expasy.org/sprot/, 2006.
21. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Delivery Rev. 1997, 23, 3-25.
22. A. Leo, Estimating LogPoct from structures, Chem. Rev. 1993, 93, 1281-1306.
23. I.V. Tetko, V.Y. Tanchuk, Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program, J. Chem. Inf. Comput. Sci. 2002, 42, 1136-1145, http://146.107.217.178/lab/alogps/index.html.
24. The Digital Object Identifier (DOI) is a system for identifying and exchanging intellectual property in the digital environment (http://www.doi.org/). An object is directly accessible using the customized address http://dx.doi.org/DOI_VALUE, 2006.
25. Merck Index (13th edition), Merck & Co, Rahway NJ, 2001.
26. T.I. Oprea, M. Olah, L. Ostopovici, R. Rad, M. Mracec, On the Propagation of Errors in the QSAR Literature, in EuroQSAR 2002 - Designing Drugs and Crop Protectants: Processes, Problems and Solutions, (Eds.: M. Ford, D. Livingstone, J. Dearden, H. Van de Waterbeemd), Blackwell Publishing, New York, 2003, 314-315.
27. ACDName is available from Advanced Chemistry Development Inc., Toronto, Ontario, CA, http://www.acdlabs.com.
28. G-protein coupled receptors are classified according to the GPCR nomenclature available at http://www.gpcr.org/7tm, whereas nuclear receptors are annotated based on the NR nomenclature available at http://www.receptors.org/NR, 2006.
29. E.J. Filardo, J.A. Quinn, K.I. Bland, A.R. Frackelton Jr, Estrogen-induced activation of Erk-1 and Erk-2 requires the G protein-coupled receptor homolog, GPR30, and occurs via trans-activation of the epidermal growth factor receptor through release of HB-EGF, Mol. Endocrinol. 2000, 14, 1649-1660.
30. C.M. Revankar, D.F. Cimino, L.A. Sklar, J.B. Arterburn, E.R. Prossnitz, A transmembrane intracellular estrogen receptor mediates rapid cell signaling, Science 2005, 307, 1625-1630.
31. N. Nidhi, M. Glick, J.W. Davies, J.L. Jenkins, Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases, J. Chem. Inf. Model. 2006, 46, 000-000.
32. T.I. Oprea, Cheminformatics in lead discovery, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 27-42.
33. E.A. Coats, The CoMFA steroids as a benchmark dataset for development of 3D-QSAR methods, in 3D QSAR in Drug Design, Vol. 3, Recent Advances, (Eds.: H. Kubinyi, G. Folkers, Y.C. Martin), Kluwer/ESCOM, Dordrecht, 1998, 199-213.
34. Q. Chen, C. Wu, D. Maxwell, G.A. Krudy, R.A.F. Dixon, T.I. You, A 3D-QSAR analysis of in vitro binding affinity and selectivity of 3-izoxazolylsulfonylaminothiophenes as endothelin receptor antagonists, Quant. Struct.-Act. Relat. 1999, 18, 124-133.
35. C. Wu, M.F. Chan, F. Stavros, B. Raju, I. Okun, S. Mong, K.M. Keller, T. Brock, T.P. Kogan, R.A. Dixon, Discovery of TBC11251, a potent, long acting, orally active endothelin receptor-A selective antagonist, J. Med. Chem. 1997, 40, 1690-1697.
36. C. Wu, M.F. Chan, F. Stavros, B. Raju, I. Okun, R.S. Castillo, Structure-activity relationships of N2-aryl-3-(isoxazolylsulfamoyl)-2-thiophenecarboxamides as selective endothelin receptor-A antagonists, J. Med. Chem. 1997, 40, 1682-1689.
37. S.S. So, M. Karplus, Three-dimensional quantitative structure-activity relationships from molecular similarity matrices and genetic neural networks. 2. Applications, J. Med. Chem. 1997, 40, 4360-4371.
38. B.J. Burke, A.J. Hopfinger, 1-(Substituted-benzyl)imidazole-2(3H)thione inhibitors of dopamine β-hydroxylase, J. Med. Chem. 1990, 33, 274-281.
39. A. Vedani, D.R. McMasters, M. Dobler, Multi-conformational ligand representation in 4D-QSAR: Reducing the bias associated with ligand alignment, Quant. Struct.-Act. Relat. 2000, 19, 149-161.
40. A.G.S. Blommaert, H. Dhotel, B. Ducos, C. Durieux, N. Goudreau, A. Bado, C. Garbay, B.P. Roques, Structure-based design of new constrained cyclic agonists of the cholecystokinin CCK-B receptor, J. Med. Chem. 1997, 40, 647-658.
41. T.I. Oprea, C.L. Waller, G.R. Marshall, 3D-QSAR of human immunodeficiency virus (I) protease inhibitors. II. Predictive power using limited exploration of alternate binding modes, J. Med. Chem. 1994, 37, 2206-2215.
42. A. Leo, Personal communication, 2004.
43. D. Abraham, Burger’s Medicinal Chemistry (6th edn), Wiley-VCH, New York, 2003.
44. D. Abraham, Personal communication, 2004.
45. J.G. Hardman, L.E. Limbird, P.B. Molinoff, R.W. Ruddon, A.G. Gilman, Goodman & Gilman’s the Pharmacological Basis of Therapeutics (9th edn), McGraw Hill, New York, 1996.
46. T.M. Speight, N.H.G. Holford, Avery’s Drug Treatment (4th edn), Adis International, Auckland, 1997.
47. FDA labels are at the Center for Drug Evaluation and Research (CDER) website, http://www.accessdata.fda.gov/scripts/cder/drugsatfda/, 2006.
48. J.F. Contrera, E.J. Matthews, N.L. Kruhlak, R.D. Benz, Estimating the safe starting dose in phase I clinical trials and no observed effect level based on QSAR modeling of the human maximum recommended daily dose, Regul. Toxicol. Pharmacol. 2004, 40, 185-206.
49. MRTD is available from the CDER website, http://www.fda.gov/cder/Offices/OPS_IO/MRTD.htm, 2006.
50. C. Hansch, A. Leo, D. Hoekman, Exploring QSAR, Vol. 2, ACS Publishers, Washington D.C., 1995.
51. The Sangster database is available at http://logkow.cisti.nrc.ca/.
52. L.Z. Benet, Personal communication, 2006.
53. S. Andrieu, M. Lebret, J. Maclouf, F. Beverelli, J.F. Giudicelli, A. Berdeaux, Effects of antiaggregant and antiinflammatory doses of aspirin on coronary hemodynamics and myocardial reactive hyperemia in conscious dogs, J. Cardiovasc. Pharmacol. 1999, 33, 264-272.
54. D.G. Lloyd, G. Golfis, A.J.S. Knox, D. Fayne, M.J. Meegan, T.I. Oprea, Oncology exploration: charting cancer medicinal chemistry space, Drug Discov. Today 2006, 11, 149-159.
55. M.M. Hann, A. Leach, D.V.S. Green, Computational chemistry, molecular complexity and screening set design, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 43-57.
56. M.M. Hann, T.I. Oprea, Pursuing the leadlikeness concept in pharmaceutical research, Curr. Opin. Chem. Biol. 2004, 8, 255-263.
57. R.A. Goodnow Jr, P. Gillespie, K. Bleicher, Cheminformatic tools for library design and the hit-to-lead process: a user’s perspective, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 381-435.
58. K.H. Baringhaus, H. Matter, Efficient strategies for lead optimization by simultaneously addressing affinity, selectivity and pharmacokinetic parameters, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 333-379.
59. M. Congreve, R. Carr, C. Murray, H. Jhoti, A ‘Rule of Three’ for fragment-based lead discovery? Drug Discov. Today 2003, 8, 876-877.
60. T.I. Oprea, J. Blaney, Cheminformatics approaches to fragment-based lead discovery, in Fragment-based Approaches in Drug Discovery, (Eds.: W. Jahnke, D.A. Erlanson), Wiley-VCH, New York, 2006, 99-121.
61. V. Povolna, S. Dixon, D. Weininger, CABINET - Chemistry and biological informatics network, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 241-269.
62. CABINET is available from Metaphorics LLC, Santa Fe, NM, http://cabinet.metaphorics.com/.
63. C. Hansch, D. Hoekman, A. Leo, D. Weininger, C.D. Selassie, C-QSAR database. Available from the BioByte Corporation, Claremont, CA, http://www.biobyte.com.
64. P. Furet, B. Gay, G. Caravatti, C. Garcia-Echeverria, J. Rahuel, J. Schoepfer, H. Fretz, Structure-based design and synthesis of high affinity tripeptide ligands of the Grb2-SH2 domain, J. Med. Chem. 1998, 41, 3442-3449.
PART VI
Drug Discovery
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
14
Chemical Biology and Drug Discovery
14.1
Managerial Challenges in Implementing Chemical Biology Platforms
Frank L. Douglas
14.1.1
Introduction
This chapter presents the experiences and perspectives that led to the
creation of a concept named the Chemical Biology Platform (CBP). CBPs embrace
the modern-day version of the “drug discoverer” and the management
challenges associated with innovation. The management challenges are largely
due to the complexity and marked increase in quantity of information about
chemical structures, disease targets, and pathophysiology, as well as the
pharmacology studies in disease models and patient subpopulations.
Currently, management must also address the additional complexity of
mergers, which affects both information integration and organizational
collaboration. The challenge of accessing and correlating information
generated by the partners in the merger is often underestimated. Perhaps
even more challenging is the attempt to build a culture for the newly merged
company in which scientists from different countries and organizations
share information, collaborate, determine global standards, and leverage both
tacit and explicit knowledge. The discussion will therefore focus on both
the scientific and cultural underpinnings of CBPs within an organizational
context.
14.1.2
The Management Challenge
The discovery and development cycle requires 10 to 15 years to move from a
conceptual biological and chemical approach, through preclinical and clinical
development, to approval. Since, not surprisingly, the probability of success
(POS) increases with a project’s progression along the development path to
approval, the key challenge of management is to change the traditional
relationship between POS and time, compressing it into one that is more
representative of a knowledge-driven paradigm (see Fig. 14.1-1). The
knowledge-driven S curve is often achieved when a team is working on
follow-on compounds for a validated disease target, or when the target being
pursued is a “common mechanism” that is relevant for more than one disease
and the compound has approval for one of the diseases within the common
mechanism.
Practically, in the innovation of new drugs, one can classify the set of research
and development activities into four primary areas or clusters of activities,
technologies, and responsibilities: target identification, lead finding, lead
optimization, and proof of product or product realization. Note that target
validation, a critical element in research and development, is ultimately
demonstrated in successful phase III clinical studies.
In Aventis, the traditional Research and Development organization was
reorganized into four divisions where the most relevant disciplines were
clustered within each division, as shown in Table 14.1-1. This organizational
design was based on three principles. First, clustering disease expertise
with concomitant technological support increases innovation and knowledge.
Second, aligning global resources and accountability leverages scarce
resources, enhances technological innovation, and reduces cycle time. Third,
late-stage expertise applied to early innovative projects rapidly identifies
issues, conserves resources, and provides clinical knowledge for
next-generation projects.
[Figure: probability of success (vertical axis) plotted against discovery and
development time (horizontal axis, 0 to 15 years), with the stages TI, LI, LO,
PR, and GRAMS and the IND/CTX milestone marked along the time axis.]

Fig. 14.1-1 TI - target identification; LI - lead identification; LO - lead
optimization; PR - product realization; GRAMS - global regulatory and
marketing support.
Table 14.1-1 Centers of expertise in the Drug Innovation and
Approval Organization of Aventis

Lead generation                            Lead optimization
Functional genomics                        Drug metabolism and pharmacokinetics (DMPK)
Lead discovery technologies                Drug safety evaluation
Chemistry (medicinal and computational)    Clinical discovery and human pharmacology (phases I and IIa)
Chemical development

Product realization                        Global regulatory and marketing support
Clinical development (phases IIb and III)  Quality assurance
Pharmaceutical development                 Chemistry, manufacturing and control
Biostatistics and data management          Pharmacovigilance and epidemiology
Global project teams                       Regulatory liaison and policy
                                           Global labeling
We, at Aventis, were also convinced that a dramatic increase in the
POS would occur only when the following three conditions were satisfied:
• The selected target is relevant and critical in the disease
  process.
• Proof of principle of target validation can be demonstrated in
  the relevant patient population, usually in phase IIa clinical
  trials.
• Clinical trials can be designed and performed to demonstrate
  a good benefit/safety ratio, usually in phase III.
Each of these requirements represented unique challenges in which
insufficient information and knowledge affect the POS.
14.1.3
Observation-based Discovery Background
Historically, drug discovery proceeded through the exploitation of observations
about a potential therapeutic product without having either an optimized
compound or an identified target as a starting point. Two outstanding examples
are aspirin and penicillin, and both exemplify how POS is increased and the
time to development of follow-on products is accelerated by use of accumulated
knowledge. The story of acetylsalicylic acid (aspirin) began as early as the
fifth century B.C., when Hippocrates noted that the powder from the bark and
leaves of the willow tree could treat headaches and fever. However, it was not
until the late 1820s that the work of several European scientists, including
Johann Buchner of Germany, Brugnatelli and Fontana of Italy, and Henri Leroux
of France, resulted in the extraction of the active ingredient salicin [1].
In 1899, the German chemist Felix Hoffmann convinced Bayer to market
acetylsalicylic acid, which had first been synthesized in 1853 by Frederic
Gerhardt and was devoid of the severe stomach irritation that was seen with
unbuffered salicylic acid. This was followed by the rapid development of
several organic acids similar to aspirin, for example, ibuprofen and diclofenac,
which were approved for the treatment of pain and inflammatory disorders
[2]. Thus, the focus became the modification of chemical structure to
optimize the activity of these compounds to treat inflammation and pain.
Finally, in 1971, Sir John Vane identified aspirin’s mechanism of action,
namely the inhibition of the cyclooxygenase (COX) enzyme that converts
arachidonic acid to prostaglandins [3]. The identification of COX as the target
accelerated the discovery and development of nonsteroidal anti-inflammatory
drugs (NSAIDs).
Perhaps one of the most impressive accelerations of the time from discovery
to product was the development of COX-2 inhibitors, such as celecoxib
(Celebrex). Celebrex, a COX-2 selective inhibitor, was brought onto the market
in 1999, about 8 years after identification of the COX-2 enzyme. This example
demonstrates the marked reduction in cycle time that is possible when
one is able to satisfy the requirements of achieving the “S Curve”. These
requirements are:
• a validated target
• knowledge of the structure of the target
• a large library of compounds with clear structure-activity
  relationship (SAR)
• predictive animal models, and/or
• a human model of disease.
By a human model of the disease, we mean a human illness in which Koch-like
postulates can be demonstrated, that is:
• a marker of the disease is present in the population;
• the intervention impacts the marker;
• the change in the marker correlates with the clinical response.
The recognition of the role of prostaglandins in inflammation and platelet
function led to the rapid use of the production or inhibition of various
prostaglandins as markers for inflammatory and thrombotic diseases.
Another example of this historical approach is the discovery of penicillin. In
1871, the English surgeon Lister observed that urine samples contaminated
with mold did not allow the growth of bacteria. More importantly, in 1897
Ernest Duchesne reported that Penicillium glaucum inhibited the growth of
Escherichia coli when both were grown in the same culture and that P. glaucum
also prevented animals inoculated with lethal doses of typhoid bacilli from
contracting typhoid. Duchesne’s premature death from tuberculosis prevented
his further pursuit of the observations [4]. In 1928, Sir Alexander Fleming
observed that a species of the mold Penicillium had inhibited the growth of
Staphylococcus aureus in a culture.
Like a true drug discoverer, however, Sir Alexander Fleming, having
discovered lysozyme in 1922, sensed the importance of his serendipitous
observation and pursued it. His tireless enthusiasm for and presentation
of his work on penicillin finally won the interest of Drs. Cecil Paine,
Howard Florey, and Ernest Chain. They were able to demonstrate the medical
potential of penicillin in individual infected cases, and succeeded in
extracting “purified” drug in about 1940. Between 1940 and 1942, efforts
were successfully focused on the challenge of optimizing the production of
penicillin. Seventeen years later, John Sheehan of Massachusetts Institute of
Technology achieved a total synthesis of natural penicillin [5].
Thus, the penicillin story demonstrates a history similar to that of aspirin.
As in the case of aspirin, a target was serendipitously recognized from an in
vitro observation and there was a simultaneous proof of presence of an active
ingredient or compound. Drug discovery was thereafter focused on isolating
and synthesizing the active ingredient while pharmacological experiments
were performed in parallel. Like aspirin, penicillin is a case in which one
started with a validated, unidentified target and an active, unidentified,
unoptimized drug. Progress was accelerated when the structure of penicillin
was solved and its mechanism in inhibiting the cross-linking of peptidoglycan
was identified [6]. This discovery led to a number of semisynthetic and
synthetic penicillins and cephalosporins, both based on the β-lactam
structure that inhibits the enzyme that forms the peptidoglycan structure of
the cell walls of bacteria.
14.1.4
Mechanism-based Discovery Background
Propranolol is an interesting development and example of the progression
toward mechanism-based research. The hypothesis of the existence of the
β-receptor and the search for an antagonist occurred almost simultaneously.
“Tools” to optimize compounds and to characterize α- versus β-receptors
became available. The continued modification of these compounds along
with simultaneous improvement of the bioassays resulted in a rapid cycle of
information generation and exploitation. In addition, Sir James Black was able
to go rapidly into a proof of concept in healthy volunteers with pronethalol, a
prototype and predecessor compound to propranolol.
This evidence revealed that a drug discovery team of pharmacologists and
chemists was rapidly incorporating new information, making correlations,
and prototyping. It was the genesis of the concept of chemical biology,
although it was not yet formally accepted as a practice. The POS was greater
than what would have been expected at the beginning of this project because
tool compounds existed that allowed simultaneous attempts at validating the
hypothesized target as well as finding the optimal compound.
Similar conditions existed in the case of antihistamines, which enabled
Sir James Black to propose that there were two histamine receptors and to
validate rapidly the hypothesized H2 receptor with cimetidine, an optimized
compound. The key point of these successes, however, is the fact that Sir
James Black, the pharmacologist, and Dr Stephenson, the chemist, integrated
and correlated previous information to uncover new drugs [7].
14.1.5
Twenty-first Century Experience: Ketek (Novel Anti-infective Drug in 2003)
In our own experience at Aventis with Ketek, we could go rapidly from concept
to regulatory submission for two reasons. First, in vitro biological models
existed that rapidly validated (a) its antibacterial activities and (b) the
binding at two sites on the 23S rRNA of the 50S ribosomal subunit, which made
it effective against penicillin-resistant Streptococcus pneumoniae [8]. Second,
an understanding of the drug’s metabolism enabled targeted clinical studies to
evaluate any potential liabilities with respect to liver side effects or QTc.
Thus, the POS was high owing to the extensive knowledge in the antibiotic
arena and the expertise in QTc that existed in Hoechst Marion Roussel, where it
could be leveraged during the discovery and development of Ketek. This was the
case of a validated target but unoptimized compound (Fig. 14.1-1). Ketek was
also the second compound in the series, as the first compound was terminated
because of liver side effects.
The above examples satisfy the Sir James Black criteria for selecting projects
with a high initial POS. Sir James Black’s advice was:
1. Start with a clinical problem.
2. Identify the controlling chemicals or hormones in the
   system.
3. Start at the most basic molecular level and test similar
   molecules for in vitro activity [9].
The three points mentioned above were clearly observed in the discovery
of Enbrel. In this case, a fusion protein consisting of soluble p75-TNF (tumor
necrosis factor) receptor type II and the Fc portion of human IgG
was the “chemical” of interest. This approach was very clever in that Craig
Smith and Raymond Goodwin proposed that injecting a soluble TNF receptor
would assist in binding the excess TNF, which on interacting with its receptor
on the cell triggers the inflammatory process in rheumatoid arthritis patients.
The excess circulating TNFα was the identified and somewhat validated target.
This cytokine plays a critical role in synovial proliferation. The technical
optimization step was the cloning and expression of the TNF receptor. And
as in the earlier case of propranolol, an animal model existed, namely,
the collagen-induced arthritis mouse model, in which the concept could be
simultaneously optimized and validated. Further, TNF served as a biomarker
in the patient studies.
14.1.6
Observation Summary and Future Application
The above examples reveal the following characteristics for an enhanced POS:
1. degree of validation of the target
2. optimization of leads
3. ability to link optimization of lead with in vivo validation
   of target
4. ability to test early in humans, particularly with aid of
   biomarkers
5. rapid prototyping through leveraging of knowledge
   generated from previous, relevant studies.
In complex, global organizations, the challenge is to create an environment
that enables the transfer of information and knowledge, and utilizes rapid
prototyping. One answer, which was applied in Aventis, is the establishment
of CBPs.
Figure 14.1-1 schematically shows the above scenario for a CBP project and
compares it with known mechanism-based approaches and unidentified and
unvalidated target projects. The middle curve represents the case of aspirin or
penicillin in which a validated but unidentified target is discovered. Concurrent
with the discovery of this target there is also recognition of the existence of
an active principle or compound. The discovery effort is therefore initially
focused on isolating and characterizing the active compound, followed by
simultaneous development of in vitro and in vivo biological assays to enable
optimization of the compound.
The positive POS value depends on the disease being studied. For example,
it is greater for anti-infectives than for antipsychotics, because
in vitro efficacy and animal assays are more predictive of efficacy in man when
one is dealing with anti-infectives. The POS rises rapidly through phase IIa,
the end of the lead optimization period.
The bottom curve for a selected and unidentified and unvalidated target
represents today’s paradigm. Here, the example is a selected putative target
based on differential gene expression. Targets of this nature are rarely validated.
A second challenge is that the protein product, for example an enzyme, although
easily identified, often is not easily crystallized, and therefore little structural
information is available to permit a rational drug design approach. This
period of target identification/lead identification (LI) can sometimes be
quite long, 2 to 5 years, before one can start the lead optimization phase
of activities. The POS approaches 100% much more slowly, even after
initial work in clinical phase III is underway, and only at the conclusion
of phase III are the data available to determine whether the target is valid and
relevant.
The upper curve is the best-case scenario. Here the target is not only
identified but also validated. In addition, the biological structure is known and
as a result one can start with rational drug design and de novo synthesis. Here,
the time to LI is shortest. At the very outset of the project, the POS is very high,
both because the target is validated and there is structural information that
enables rapid lead finding, optimization, and prototyping. This situation is
approached when one is working on follow-on or next generation compounds
for a drug that is already in the market, and has a clear mechanism of action
or target.
The genomic age presents a significant opportunity to rapidly generate information
and approximate the upper or common mechanism curve. Genomics,
proteomics, metabolomics, pharmacogenomics, and bioinformatics will bear
fruit when two additional disciplines mature. These disciplines are structural
biology and the application of knowledge management to families of
targets such as kinases, proteases, ion channels, and G-protein coupled receptors
(GPCRs). This will enable prediction and generation of SARs in silico,
which is the hope and future of CBPs.
14.1.7
Establishment of Organizational Structures for Chemical Biology Platforms
In 1997, as mentioned above, Hoechst Marion Roussel, later to become Aventis,
reorganized Research and Development and renamed it Drug Innovation and
Approval (DI&A) (Fig. 14.1-2). A key aspect of this organization was the
creation of the Lead Optimization organization that had the responsibility
to develop proof of concept in man. This organization provided support
to the project teams by generating data in the areas of drug metabolism
and pharmacokinetics (DMPK), toxicology, biomarkers, and phases I and IIa
clinical trials. The goal was to go rapidly into human studies and through
“rapid prototyping” feed back information to the project teams to enable the
optimization of their compounds.
Another key component of the Drug Innovation and Approval organization
was the multidisciplinary project teams. The project teams were the “units of
innovation” and were managed by the Heads of the various sites, who had
responsibility from target identification through phase IIa. After phase IIa, the
projects were managed on a global basis from the Global Drug Development
Center, in Bridgewater, New Jersey. Since each site had responsibility for
specific diseases, through phase IIa, as well as the global functions, lead
generation and lead organization had units at each site (see Table 14.1-1);
all members of these project teams were colocated through phase IIa. This
permitted the close, rapid exchange of information and collaboration around
each project. The members of project teams also benefited from the knowledge
that existed in their disciplines, as they could bring the expertise of their
colleagues to any challenge.
Fig. 14.1-2 Drug Innovation and Approval (DI&A).
In 1999, during another set of discussions on how to best share knowledge
across project teams in different sites, we discerned several key points. First,
we had 54 projects with kinases as targets. These projects were focused on
inflammatory diseases, cancer, and central nervous system disorders and
existed in all three sites. Secondly, there were no organized mechanisms to
foster communication or knowledge sharing among the scientists.
A third revelation was that there were some common problems, for example,
the toxicity of lead compounds against kinase targets; or the need to develop
biased libraries of compounds to enhance “hit” finding; or lack of structural
information about the specific kinase enzymes.
A fourth revelation was that, although we had made significant progress
in DMPK, we were still dramatically losing compounds in man because of
safety issues. However, sharing of knowledge among the DMPK scientists did
contribute positively to the improvement in attrition rate due to poor DMPK
characteristics.
Another reality was that 60% of the 200 top selling drugs came from four
classes of mechanisms, namely, GPCRs, proteases, kinases, and ion channels
and transporters.
Finally, there was the recognition that the strategies used to find leads
were related to the amount of information we had about the structure of the
target. Thus the more knowledge available, the less time was needed to find
a lead compound. In fact, the strategies used to find lead compounds were,
in decreasing order of the knowledge required: de novo synthesis, virtual
screening, focused screening, and high-throughput screening.
A focus on understanding the structure of the target to identify the spatial and
energy requirements of the potential agonist or inhibitor was a clear need.
The anticipated deciphering of the human genome was seen as the event
that would catalyze the ability to elucidate the structure of targets and further
enable rational drug design.
14.1.8
Chemical Biology Platforms (CBP)
In 2000, I introduced the Kinase Chemical Biology Platform that was the
first of our four CBPs. The initial step was to identify all scientists across
the company (now Aventis) with expertise and interest in kinases. The survey
yielded about 300 scientists, many of whom were actively involved in kinase
projects. We created a Kinase Community of Practice with these scientists as
members and used knowledge mail to facilitate communication, exchange,
and development of the kinase network.
The second step was the establishment of the Platform. There were two
key principles in establishing the CBP: (a) there would be no changes in the basic DI&A
organizational structure and (b) the goal of the Platform was to facilitate
knowledge transfer to enable simultaneous drug discovery. (Simultaneous
drug discovery meant anticipating the critical issues and working on them in
a parallel rather than sequential fashion.)
A CBP core team was appointed and given a charter. This team consisted of
senior scientists who were respected by their peers. Each represented one of the
following disciplines: medicinal chemistry, computational chemistry, struc-
tural biology, molecular biology, toxicology, DMPK, clinical pharmacology,
and IT. A knowledge management specialist was assigned to the CBP.
The overall responsibility of each CBP core team was to:
• leverage globally the target family knowledge across projects, independent of disease focus and priorities of each site;
• improve Aventis’ target family compound collections (focused libraries);
• develop and apply the concept “all target compounds see all targets of a family”;
• develop target family-specific predictive models and tools;
• use external networks of experts in the field
to produce better compounds faster.
Each member of the CBP core team was expected to convene a small team of
individuals from his/her discipline, who were active members of project teams
within the same target family. These CBP strategy teams, as they were called,
identified problems that were common to several project teams and developed
strategies to solve them. Sometimes this involved engaging academic experts
to assist in the resolution. The results and “learnings” were shared with all
interested scientists (Fig. 14.1-3).
The responsibility of the core team was to discuss issues being pursued by the
strategy teams, identify the downstream implications for their individual areas,
and to look for “breakthrough” solutions or new methods of solving problems.
Areas of particular interest included use of structural biology information,
strategies for designing focused libraries, and identification of biomarkers.
14.1.8.1 Chemical Biology Early Success and Organizational Benefits

One of the early successes in the kinase CBP was the establishment of a
core panel of kinases against which all compounds of interest were screened,
and from which “surrogates” were used to form cocrystals and develop SAR.
Within 1 year, active compounds were found for the kinases, including 21
active series, and 9 lead compounds were selected.
A second immediate success was in DMPK. When a project team working on
ITK realized that their early compounds had safety problems due to inhibition
of P-450, the ITK team collaborated with the SYK team who had had the same
issue and had resolved it after a 2-year effort. ITK was able to benefit from the
recent knowledge that was gained in solving the SYK problem. As a result, ITK
required 6 months less to successfully design lead compounds without P-450
inhibition liabilities.
A third, and perhaps the most significant, achievement was the reduction
of the portfolio from 54 to 38 kinase projects based on a more robust
evaluation of the POS of each project and of the resource commitment required
to prosecute the project. Thus, the organization conserved scarce resources
and reallocated them to other priorities.
We enabled knowledge sharing through the use of methods to capture
lessons learned in projects. A particularly effective method was the use of
the interrupted case study approach. Whenever a “breakthrough” or novel
solution to a problem was found, the scientists involved were invited to write
up the results as a case, and present the study in a workshop setting with
an interrupted problem-solving approach. The scientist would, at the outset,
describe the problem and its importance to the project. The participants would
brainstorm among themselves on potential solutions. The presenter would
select one or more suggested solutions that were tried and share the results.
After another round of brainstorming about other approaches or further
efforts, the final direction was presented. In this manner, the presenter would
finally unveil the unique solution. This method gained tremendous popularity
because it sometimes uncovered additional unanticipated approaches.
During the establishment of the kinase CBP, we encouraged the core team,
led by Dr Andreas Batzer, to develop a “Book of Knowledge” in which they
recorded the organizational hurdles and the solutions that were encountered
in establishing the platform. This turned out to be a very useful exercise and
led to one of the most memorable experiences that I have had in my career in
the pharmaceutical industry.
About 6 months after the initiation of the kinase CBP, I was invited by
Dr Hans Peter Nestler to attend a workshop that he organized. He had no
other request but my presence. I was on vacation but in Frankfurt, so I
decided to attend the afternoon session. The first thing that was remarkable
was that Hans Peter had organized a “virtual” workshop among the centers in
Frankfurt, Paris, and New Jersey, conducted by videoconference. The
second was that it brought together scientists from the different disciplines
who were working on projects in the protease target family.
I listened without interruption and at the end of the session, Hans Peter
asked for my comments. I complimented him on the excellent effort and asked
how he was able to organize this workshop. He explained that he had used the
recommendations from the chemical biology Book of Knowledge as well as
had benefited from discussions with Andreas Batzer and his colleagues. And
thus, the Protease Chemical Biology Platform with Hans Peter as head was
launched.
Shortly thereafter, a total of four chemical biology platforms were in operation:
kinase (CBK) led by Dr Andreas Batzer, protease (CBP) led by Dr Hans Peter Nestler, ion
channels and transporters (CBICT) led by Dr Heiner Glombik, and G-protein
coupled receptors (CBG) led by Dr Bruce Baron. Thus,
within 18 months of my describing CBPs in my keynote address at the IBC Drug
Discovery Conference in Boston in 2000, four CBPs were functioning.
Incidentally, this conference was very significant because the other keynote
address was delivered by Dr Craig Venter, who described the challenges
of deciphering the human genome. The next address was mine and it
acknowledged that, due to this incredible achievement that was led by
Dr Venter and Dr Francis Collins, one would be able to think in terms of target
families and develop knowledge about both structure and pathophysiology
more rapidly. The deciphering of the genome was critical to the application of
CBPs in industry.
14.1.9
Other Organizational and Knowledge Challenges
The desire to correlate information across projects and sites disclosed a critical
barrier. As a consequence of mergers or groups working independently, such
as in business unit structures within a single company, there was a lack of
standardization of assays, connectivity of databases, and annotation of data;
hence, we were unable to leverage knowledge or data, and the correlation of
chemical and biological data was very difficult. We therefore launched, with
the help of a small team from McKinsey & Company, a program to establish
an informatics platform to support the CBPs. The goals of this effort included:
• Provision of a curated, standardized, central repository to enable rapid querying and retrieval of diverse, accurate biological data (e.g., sequence similarity, expression, disease association).
• Knowledge-based establishment of correlations between chemical space (compounds, hits, leads, etc.) and biological space (e.g., target sequence and target 3D structure, as well as ADMET data).
• Ability to increase POS of the selected portfolio of projects by selecting groups of targets with similar biological properties.
• Identification of additional predictive and simulation tools to leverage curated data, for example, ADMET (absorption, distribution, metabolism, elimination and toxicology).
• Rapid identification of “privileged fragments” that lead to selection of compounds of high interest for a specific target.
The overall hope was that the IT platform would not only improve
communications among the scientists but lead to increased correlations and
serendipitous findings.
14.1.10
Conclusion
Table 14.1-2 summarizes the differences between the traditional drug discovery
approach and that fostered by chemical biology principles. CBPs were
designed to take advantage of the promise of genomics and power of
information technology in improving decision making and POS in drug
discovery and development. The platforms were expected to become the
“Knodes” or knowledge nodes of scientific networks that were focused
on understanding and generating information about families of enzymes,
receptors, ion channels, and transporters with respect to their ability to provide
solutions for altered homeostasis and disease in man.
By the end of 2002, the Aventis project portfolio was transformed. Of the 139
projects in the LI phase, the kinase and GPCR target families each contributed
19%, and the protease and ion channels/transporters about 8% each. For projects in
the candidate identification phase, GPCR, kinase, and protease target families
each contributed about 20% of the compounds and ion channels/transporters
about 12% of the compounds in the portfolio.
With respect to processes, there were improved attempts and greater focus
on assuring standardization of assays, sharing of information, as well as
biased compound libraries across project teams, thus facilitating common
mechanism projects across sites. External networks were under way and the
early results of the experiment were encouraging. I would recommend further
evaluation of this organizational approach to improve productivity in the
biopharmaceutical industry, and of the attempts made to quantify the results
to determine organizational benefits.
Table 14.1-2 Chemical biology
                     Traditional drug discovery        DI&A chemical biology
Targets              Collection of targets             Selected target families
Workflow             Sequential activities in          Simultaneous efforts in internal
                     chemistry and biology             and external networks
Scientific concept   Traditional disciplines           Knowledge-based approaches in
                                                       biology and chemistry
Organization         Silos of functionality            Cross-functional, beyond
                                                       disciplines, virtual, capability
                                                       oriented, DI&A network centric
Capabilities         Existing skills in disciplines    Best in class, knowledge-based,
                                                       learning curves
Value                Individual projects               Focus on optimizing the global
                                                       target family portfolio
Mind-set             Functional, hierarchical lines    Entrepreneurial, value oriented
                     of command
Source: CBK
References
1. Mary Bellis, History of Aspirin, About.com.
2. John S. Nicholson, Ibuprofen, in Chronicles of Drug Discovery, (Eds.: J.S. Bindra, D. Leidner), John Wiley, New York, 1982, 149.
3. John R. Vane, Inhibition of prostaglandin synthesis as a mechanism of action for aspirin-like drugs, Nature 1971, 231, 232-235.
4. Mary Bellis, the History of Penicillin in Inventors.guide@about.com.
5. C&EN Special Issue, The Top Pharmaceuticals that changed the world, vol 83, Issue 25 (6/20/05).
6. E.M.J.R. Wise, J.T. Park, Penicillin: its basic site of action as an inhibitor of a peptide cross-linking reaction in a cell wall mucopeptide synthesis, Proc. Natl. Acad. Sci. U.S.A. 1965, 54(1), 75-81.
7. James W. Black, Nobel Lecture: Drugs from Emasculated Hormones: the Principles of Syntopic Antagonism, 1988, Dec. 8.
8. R. Bersicio, et al. Structural insight into the antibiotic action of telithromycin against resistant mutants, J. Bacteriol. 2003, 185(14), 4276-4279.
9. James Black Foundation Promotional Materials, Published by The James Black Foundation, King’s College School of Medicine and Dentistry, Half Moon Lane, Dulwich (London), England.
14.2
The Molecular Basis of Predicting Druggability
Bissan Al-Lazikani, Anna Gaulton, Gaia Paolini, Jerry Lanfear, John Overington,
and Andrew Hopkins
14.2.1
Introduction
Medicinal chemists have learnt through the experience of many hundreds of
screening campaigns in the pharmaceutical industry that for many targets
small-molecule modulators have not yet been discovered, even when screened
against a diverse chemical file of hundreds of thousands to millions of
compounds. Even when the medicinal chemist is fortunate enough to discover
a small-molecule modulator of the biological target of interest, it is common
for many “lead” compounds to be unsuitable for optimization into prototype
drugs. Chemical biologists may not require such optimized chemical tools,
but both the chemical biologist and the medicinal chemist can learn from
each other’s experience in discovering chemical tools and leads. The failure
of many screening campaigns to discover druglike leads or chemical tools
against certain targets has led to two competing hypotheses to explain and
overcome this phenomenon. The first hypothesis is that the discovery of a
chemical tool against a target is a function of the diversity of chemical space
screened against the target, independent of the target: the diversity argument.
The second hypothesis claims that the ability to discover a small-molecule
modulator is an inherent property of the physicochemical topology of a
biological target, independent of chemical space: the druggability argument.
These constraints are more severe if the aim is to discover drugs that can
be orally administered. The concept of druggability postulates that since
the binding sites on biological molecules are complementary in terms of
volume, topology, and physicochemical properties to their ligands, only
certain binding sites on putative drug targets are compatible with high-affinity
binding of compounds with “druglike” properties [1].
Furthermore, the concept also asserts that molecular recognition on biological
targets, such as proteins, has evolved to be exquisitely specific at discrete sites
on protein surfaces and creates stringent physicochemical limits that restrict
the target set available to modulation by small molecules. The extension
of this concept to a whole genome analysis leads to the identification of
the druggable genome: the genes and their expressed proteomes predicted
to be amenable to modulation by compounds compatible with druglike
properties [2, 3].
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
14.2.2
Chemical Properties of Drugs, Leads, and Tools
For in vitro or cellular experiments, the chemical biologist would require a
compound to have a minimum set of physicochemical characteristics to ensure
that the compound is within a range of solubility and polar/hydrophobic
balance of properties that enables the tool to permeate the cell membrane
and reach the site of action. For the medicinal chemist, the same principles
apply, but the range of biological barriers that a drug needs to pass
through to affect the biological system of a whole organism is far greater
and thus reduces the molecular property range of chemical space. Lipinski
introduced the concept of physicochemical property limits for drugs, with
respect to solubility and permeability, from a seminal analysis of
the Derwent World Drug Index, which demonstrated that orally administered
drugs are far more likely to reside in areas of chemical space defined by a
limited range of molecular properties. Lipinski’s analysis demonstrated that
90% of orally absorbed drugs had molecular weights of less than 500 Da,
no more than 5 hydrogen-bond donors (the OH and NH group count),
no more than 10 hydrogen-bond acceptors (the total, combined nitrogen
and oxygen atom count), and lipophilicity of logP of 5 or
less [4]. The multiples of five observed in the molecular properties of drugs
led to the coining of the term Lipinski’s rule-of-five (Ro5). Since the work of
Lipinski et al., various expansions of the definition of, and methods to predict,
“drug-likeness” have been proposed in the literature [4-16]. The common
thread emerging from the field is that drug-likeness is defined by a range
of molecular properties and descriptors that can discriminate between drugs
and nondrugs for such characteristics as oral absorption, aqueous solubility,
and permeability. This is illustrated by the observation that the distribution
of mean molecular properties of approved oral (small-molecule) drugs has
changed little in the past 20 years, despite changes in the range of indications
and targets [17].
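The rule-of-five limits described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `MolecularProperties` container, the function names, and the `max_violations` parameter are hypothetical, and the descriptor values are assumed to have been calculated elsewhere; the example compounds are made-up numbers, not measured data.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    """Precomputed descriptors for one compound (hypothetical container)."""
    molecular_weight: float  # Da
    h_bond_donors: int       # OH and NH group count
    h_bond_acceptors: int    # combined N and O atom count
    log_p: float             # calculated lipophilicity


def ro5_violations(p: MolecularProperties) -> list[str]:
    """Return the rule-of-five limits that the compound exceeds."""
    violations = []
    if p.molecular_weight > 500:
        violations.append("molecular weight > 500 Da")
    if p.h_bond_donors > 5:
        violations.append("hydrogen-bond donors > 5")
    if p.h_bond_acceptors > 10:
        violations.append("hydrogen-bond acceptors > 10")
    if p.log_p > 5:
        violations.append("logP > 5")
    return violations


def passes_ro5(p: MolecularProperties, max_violations: int = 0) -> bool:
    """True if the compound exceeds at most `max_violations` limits.

    The tolerance is an assumption of this sketch; some practitioners
    allow one violation before flagging a compound.
    """
    return len(ro5_violations(p)) <= max_violations


if __name__ == "__main__":
    # Illustrative, made-up descriptor values.
    small = MolecularProperties(350.0, 2, 5, 3.0)
    large = MolecularProperties(800.0, 6, 12, 6.0)
    print(passes_ro5(small), ro5_violations(small))
    print(passes_ro5(large), ro5_violations(large))
```

A filter of this kind only flags compounds likely to show poor absorption or permeation; it does not by itself predict activity against a target.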
14.2.3
Molecular Recognition is the Basis for Druggability
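The molecular basis of the a priori druggability hypothesis is derived from
the biophysical study of molecular recognition. The binding energy (ΔG) of a
ligand to a molecular target (e.g., protein, RNA, DNA, carbohydrate) is defined
in Eq. (1).
$$\Delta G = -RT \ln\left(1/K_i\right) = RT \ln K_i \approx 1.4 \log_{10} K_i \qquad (1)$$
where R = gas constant = 1.987 cal mol⁻¹ K⁻¹, Ki is the dissociation (inhibition) constant of the ligand, and the factor 1.4 gives ΔG in kcal mol⁻¹ near room temperature.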

The affinity of binding is predominantly driven by the van der Waals
and entropic components of the binding energy, through the burying
of hydrophobic surfaces. Thus for a ligand, such as a drug molecule, to
bind with an affinity of Ki = 10 nM it requires a binding energy (ΔG) of
−11 kcal mol⁻¹. A lower affinity “hit” from a high-throughput screen of
Ki = 1 µM affinity equates to −8.4 kcal mol⁻¹. Thus a 10-fold increase in
potency is equivalent to 1.36 kcal mol⁻¹ of binding energy. The binding energy
potential of a ligand is, in general, proportional to the available surface area
and its properties. The hydrophobic effect from the displacement of water
and the van der Waals attractions between atoms contributes approximately
0.03 kcal mol⁻¹ Å⁻². Thus, a ligand with a 10 nM dissociation constant
would be required to bury 370 Å² of hydrophobic surface area, assuming that
there are no strong ionic interactions between the protein and the ligand.
Empirical analysis of nearly 50 000 biologically active druglike molecules
reveals a linear correlation between molecular weight and molecular surface
area (Fig. 14.2-1). The contribution of the hydrophobic surface to the binding
energy is demonstrated by the phenomenon of the “magic methyl”, in which
experienced medicinal chemists often observe that a single methyl group,
judiciously placed, can increase ligand affinity by 10-fold, approximately
equivalent to the maximal affinity per nonhydrogen atom [18]. The accessible
hydrophobic surface area of a methyl group is approximately 46 Å² (if one
assumes that all of the hydrophobic surface area is encapsulated by the
protein-binding site and thus makes full contact with the target), which, with a hydrophobic
effect of 0.03 kcal mol⁻¹ Å⁻², equals approximately 1.36 kcal mol⁻¹, equivalent
to the observed 10-fold affinity increase. In addition to the predominantly
hydrophobic contribution to the binding of many drugs, ionic interactions,
such as those found in zinc proteases (such as ACE inhibitors), contribute to
the binding energy. The attraction of complementary polar groups contributes
up to 0.1 kcal mol⁻¹ Å⁻², with an ionic salt bridge contributing approximately three times
more, allowing low-molecular-weight compounds to bind strongly. Unlike
hydrophobic interactions, complementary polar interactions are dependent
on the correct geometry. Thus encapsulated cavities are capable of binding
low-molecular-weight compounds with high affinities since they maximize the
ratio of the surface area to the volume.
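The arithmetic in this section can be checked with a short calculation. The sketch below assumes a temperature of 298.15 K and uses the 0.03 kcal mol⁻¹ Å⁻² hydrophobic contribution quoted in the text; the function names are illustrative only. Because the text rounds RT ln 10 to 1.4, the values computed here come out slightly smaller in magnitude than the rounded figures quoted above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def binding_energy(ki_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from a dissociation constant in mol/L.

    Uses Delta G = RT ln(Ki); tighter binders give more negative values.
    """
    return R_KCAL * temperature_k * math.log(ki_molar)


def buried_area_required(ki_molar: float,
                         energy_per_area: float = 0.03,
                         temperature_k: float = 298.15) -> float:
    """Hydrophobic surface area (in square angstroms) needed to supply the
    binding energy, assuming a purely hydrophobic contribution of
    `energy_per_area` kcal/mol per square angstrom."""
    return -binding_energy(ki_molar, temperature_k) / energy_per_area


if __name__ == "__main__":
    dg_10nm = binding_energy(10e-9)   # about -10.9 kcal/mol
    dg_1um = binding_energy(1e-6)     # about -8.2 kcal/mol
    ten_fold = binding_energy(1e-6) - binding_energy(1e-7)  # about 1.36
    print(f"10 nM ligand:  {dg_10nm:.2f} kcal/mol")
    print(f"1 uM hit:      {dg_1um:.2f} kcal/mol")
    print(f"10-fold step:  {ten_fold:.2f} kcal/mol")
    print(f"Buried area for 10 nM: {buried_area_required(10e-9):.0f} A^2")
    print(f"Methyl group (46 A^2): {46 * 0.03:.2f} kcal/mol")
```

The methyl-group line reproduces the "magic methyl" estimate: 46 Å² of buried hydrophobic surface corresponds to roughly one 10-fold step in affinity.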
Thus, the physicochemical characteristics of the binding site define the
physical and chemical properties of the ligand. Therefore, a target needs a
pocket that is either predefined or formed on binding by allosteric mechanisms.
In general, thermodynamics and selection pressure play a part in reducing
the accidental existence of such favorable pockets for ligand interactions. The
thermodynamic argument contends that it costs energy to maintain an exposed
hydrophobic pocket in an aqueous environment. Selection pressure may also
increase the specificity of molecular recognition for ligand pockets to avoid
inappropriate signaling or inactivation from the milieu of metabolites and small
molecules in which cells are bathed.
Fig. 14.2-1 Relationship between molecular weight and molecular surface area. Analysis of 49 456 biologically active, druglike compounds (1100 Da MW) with IC50 <= 100 nM. Molecular weight was calculated from the chemical structures represented as desalted, canonical SMILES strings. The calculated molecular surface area of N, O, P, and S atoms was estimated using the fast Ertl method [19] using a 2D approximation. All other atom types (excluding hydrogen atoms) were estimated using an overlapping spheres method. All calculations were performed using Scitegic’s Pipeline Pilot (San Diego, CA).
A quantitative approach is already well established for assessing the druglike
properties of a small molecule. Could such a quantitative approach be
established for assessing the properties of proteins as drug targets? The “rule-of-five”
is a set of properties to suggest which compounds are likely to show
poor absorption or permeation, since such compounds are unlikely to show
good oral bioavailability [4]. Physicochemical constraints such as this limit
the type of proteins we see as drug targets; simply put, drug targets need to
be able to bind compounds with complementary properties. Since a receptor-
binding site must be complementary to a drug, it is reasonable to assume that
equivalent rules could be developed to describe the physicochemical properties
of binding sites with the potential to bind “rule-of-five” compliant molecules
with a potent binding constant (e.g., Ki < 100 nM). A number of properties
complementary to the “rule-of-five” can be calculated, for example, the surface
area and volume of the pocket, hydrophobic and hydrophilic characters, and the
curvature and shape of the pocket. Following the assumption that properties
of the drug are complementary to those of the binding site, analysis of the
calculated physicochemical properties of the putative drug-binding pocket on
the target protein can provide an important guide to the medicinal chemist
in predicting the likelihood of discovering a drug against the particular target
site. On the basis of the known physicochemical properties of passively
absorbed oral drugs, one would predict “druggable” binding sites to be
predominately apolar cavities of 400-1000 Å³, where over 65% of the pocket is
buried or encapsulated, with an accessible hydrophobic surface area of at least
350 Å².
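The pocket criteria just stated can be written down as a simple check. This is a minimal sketch, not a published scoring method: the `Pocket` container and function name are hypothetical, the pocket descriptors are assumed to come from some external pocket-detection calculation, and the "predominately apolar" condition is not encoded beyond the hydrophobic surface area threshold.

```python
from dataclasses import dataclass


@dataclass
class Pocket:
    """Precomputed descriptors for one candidate binding site (hypothetical)."""
    volume_a3: float            # cavity volume, cubic angstroms
    buried_fraction: float      # fraction of the pocket buried, 0.0 to 1.0
    hydrophobic_area_a2: float  # accessible hydrophobic surface, square angstroms


def meets_druggable_pocket_criteria(p: Pocket) -> bool:
    """Apply the thresholds quoted in the text: volume of 400-1000 cubic
    angstroms, over 65% buried, and at least 350 square angstroms of
    accessible hydrophobic surface."""
    return (
        400.0 <= p.volume_a3 <= 1000.0
        and p.buried_fraction > 0.65
        and p.hydrophobic_area_a2 >= 350.0
    )


if __name__ == "__main__":
    # Illustrative, made-up pockets.
    enclosed = Pocket(volume_a3=650.0, buried_fraction=0.80, hydrophobic_area_a2=420.0)
    shallow = Pocket(volume_a3=300.0, buried_fraction=0.40, hydrophobic_area_a2=150.0)
    print(meets_druggable_pocket_criteria(enclosed))
    print(meets_druggable_pocket_criteria(shallow))
```

Hard cut-offs are used here only to mirror the prose; a practical assessment would more likely treat these properties as continuous evidence rather than a pass/fail gate.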
Druggability predictions have been empirically explored using heteronuclear
NMR (nuclear magnetic resonance) to identify and characterize the binding
surfaces on proteins by screening ~10 000 low-molecular-weight molecules (average
MW 220, average cLogP 1.5) [20]. Screening results from 23 proteins reveal
that 90% of the ligands bind to sites known to be small-molecule ligand-binding
sites. In the relatively small sample of proteins studied, Hajduk
et al. noted a high correlation between experimental NMR hit rates and the
ability to find high-affinity ligands. In only 3 of the 23 proteins were distinct,
uncompetitive new binding sites discovered. The authors postulated
that these new sites could possibly play an unknown physiological role in the
protein’s functions.
14.2.4
Estimating the Size of the Druggable Genome
Whilst our current knowledge may be limited in predicting a priori where

uncompetitive allosteric-binding sites may appear from a protein sequence,
we may be able to identify, at the sequence and structural levels, which targets
are more likely to be potentially amenable to modulation by druglike small
molecules from extrapolation of our current knowledge.
Using knowledge about the proteins to which current drugs and leads
bind, we can infer the subset of the human genes and proteins that
have a high probability of being potentially druggable, that is, capable of
binding druglike small molecules with high affinity. Outlined below are
a number of methodologies and approaches that have been used to infer
the druggable portion of targets encoded by the human genome. In this
paper, we have extended the work of Hopkins and Groom and attempted
to estimate the size of the druggable human genome using three distinct
methodologies:
homology-based analysis from comprehensive survey of drugs
and leads;
14.2 The Molecular Basis of Predicting Druggability
feature-based probabilistic druggability analysis;

I 809
structure-based amenability analysis.
14.2.4.1 Initial Estimates

To gauge the number of possible drug targets in the human genome, one
should begin with a survey of the knowledge of the current modes of action
of existing drugs. In a review of the pharmacological literature, Drews [21,
22] identified 483 targets for known drugs. From this figure, Drews later
estimated that the number of ligand-binding domains, as a measure of the
number of potential points at which small-molecule therapeutic agents could
act, could be close to 10 000; however, the methodology by which these numbers
were derived is not disclosed [23].
14.2.4.2 Hopkins and Groom’s Method

The first systematic survey of the druggable genome, following the publication
of the draft human genome [24, 25], was by Hopkins and Groom [2]. Hopkins
and Groom attempted to identify the genes that produced potentially druggable
proteins by their membership in druggable gene families. The explicit
assumption of a gene family based analysis is that the architecture
of the druggable protein domain is likely to be conserved amongst related
members of that domain’s gene family. Hopkins and Groom approached the
problem in two stages. Firstly, a database of drug target sequences was
compiled from a comprehensive survey of the literature and investigation of
drug databases. Secondly, the constructed drug target sequence database was
used to identify related members of a putative druggable gene family from the
protein domain annotation of the translated human protein sequences. Hopkins
and Groom’s analysis of the literature, the Investigational Drugs Database and
the Pharma Projects database identified 399 nonredundant molecular targets
shown to bind rule-of-five compliant compounds with binding affinities below
10 μM. Whilst there is some degree of overlap with Drews’s work [21, 22],
a significant amount of redundancy was observed in the initial study. In
addition, a number of new proteins targeted by experimental drugs were
captured. Likewise, some targets for biological agents, for which modulation
by rule-of-five compliant compounds has not yet been shown, were eliminated
from the survey. Nearly half of the targets fall into just six major gene
families: GPCRs (G-protein coupled receptors), the serine/threonine and
tyrosine protein kinase superfamily, zinc metallopeptidases, serine proteases,
nuclear hormone receptors, and phosphodiesterases. Of the 399 targets of the
marketed and experimental drugs identified, 376 sequences could be assigned to
130 drug-binding domains, as captured by their InterPro domain annotation. Of
these, 125 are domains with homologs and orthologs present in the human
proteome. The sequence and functional similarities within a gene family
are taken to imply a general conservation of binding site architecture between
family members. The explicit assumption is that if one member of a gene
family is modulated by a drug molecule, other members of the family could
also be able to bind a compound with similar physicochemical properties.
Following the above logic, 3051 genes were identified as belonging to the
125 druggable InterPro domains and thus predicted to encode proteins
that have some precedence for inferring their ability to bind druglike
molecules.
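
The two-stage logic described above (collect the drug-binding domains of known targets, then select every gene annotated with one of those domains) can be sketched as follows. This is an illustration under stated assumptions rather than a reconstruction of the original analysis: the domain labels and gene names are hypothetical placeholders, and domain annotations are assumed to be available as plain sets of identifiers.

```python
def druggable_domains(target_domains: dict[str, set[str]]) -> set[str]:
    """Stage 1: pool the drug-binding domains annotated on known drug targets."""
    pooled: set[str] = set()
    for domains in target_domains.values():
        pooled |= domains
    return pooled


def predicted_druggable_genes(
    gene_domains: dict[str, set[str]], druggable: set[str]
) -> set[str]:
    """Stage 2: keep every gene whose annotation shares a domain with a known target."""
    return {gene for gene, domains in gene_domains.items() if domains & druggable}


# Hypothetical placeholder annotations (not real InterPro accessions).
known_targets = {
    "target_1": {"DOM_KINASE"},
    "target_2": {"DOM_GPCR_A"},
}
proteome = {
    "gene_a": {"DOM_KINASE", "DOM_SH2"},
    "gene_b": {"DOM_ZINC_FINGER"},
    "gene_c": {"DOM_GPCR_A"},
}

domains = druggable_domains(known_targets)
print(sorted(predicted_druggable_genes(proteome, domains)))
```

Because membership is decided by shared annotation alone, the result inherits any false positives in the domain assignment, a point the later updates to this estimate address.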
Hopkins and Groom’s database identifies only 120 biological targets as
the modes of action for marketed, rule-of-five compliant drugs, significantly
fewer than the previous estimate that launched drugs acted on 483 targets.
Interestingly, the vast majority of the drugs and leads identified in this
survey, about 90%, are competitive with endogenous ligands at a structurally
defined binding site. This figure is similar to the proportion of ligands found
to bind at known binding sites by Hajduk et al. [20] (AH, personal communication).
14.2.4.3 Orth et al. Update 2004

Orth et al. [26] based an estimate of the druggable gene families on the
InterPro domain assignments in the annotated gene-encoding loci of the 2004
release of the CCDS. The authors estimate that 3080 nonredundant
gene-encoding loci in the human genome belong to the druggable
genome, with over 2950 druggable gene sequences in public databases.
14.2.4.4 Russ and Lampel’s Update 2005

Russ and Lampel [27] estimated the druggable genome on the basis
of the preliminary final assembly (Ensembl release 35) of the human
genome, in which 99% of the sequence has high-quality coverage. The authors
found that PFAM protein domain annotation predicted fewer false positives
than the InterPro classification used by Hopkins and Groom [2], estimating
3100 druggable genes from the previously defined set of druggable protein
domains, approximately 2900 of which were predicted by both approaches.
Of the 3100 predicted genes, 2600 are covered by the consensus CCDS
annotation of the major genome databases. Extrapolation from the manual
VEGA genome annotation databases (about 40% of total genome) leads the
authors to a conservative estimate of around 2500 druggable genes. The
authors consider these assessments from the highly confident gene prediction
databases to be a lower, conservative estimate of the size of the druggable
genome.
14.2.4.5 Homology-based Analysis of Drug Targets

To expand the homology analysis methodology for identifying which targets
expressed from the human genome are likely to be druggable, it is necessary to
expand our survey to identify all the known biological targets of drugs and lead
compounds. Inpharmatica commissioned the construction of two databases,
DrugStore™ and StARLITe™, to accurately ascertain the number of biological
targets modulated by drugs and preclinical medicinal chemistry compounds,
respectively.
Inpharmatica's DrugStore is a relational database relating all FDA approved
drugs to their molecular targets and approved indications. From this analysis,
we have identified 26 000 drug products, which reduce to 1783 unique new
molecular entities (NMEs), of which 1415 are small-molecule chemical entities,
180 are biological therapeutics (18 of which are antibodies), and the remainder
are vitamins and supplements. As the research modus operandi of drug discovery
has become more target centric over the past two decades, a key
point of debate has been how many modes of action are acted upon by
approved drugs. The first attempt to ascertain this number was by Drews,
who estimated that known drugs acted on 483 targets, the source of the
often quoted "500 targets" figure. Hopkins and Groom's analysis challenged
this figure and suggested that, irrespective of polypharmacology and off-target
effects, rule-of-five compliant (orally administered) approved drugs acted
primarily on only 120 modes of action. A subsequent analysis by Burgess and
Golden proposed that all approved NMEs, consisting of new chemical entities
(NCEs) and new biological entities (NBEs), targeted 272 proteins [28-31]. Here
we propose, from the analysis of the DrugStore™ database, that all NMEs
primarily act on 301 drug targets, of which 238 are human proteins and
only 170 are human proteins targeted by small-molecule drugs (Table 14.2-1,
Fig. 14.2-2). Biological drugs target 59 modes of action, with the currently
marketed antibody therapeutics acting on 15 human targets. Only nine
targets are currently found to be modulated by both small-molecule and
biological drugs. The remaining targets are predominately anti-infective drug
targets.
The drug target universe expands considerably if we expand our analysis to
include biological targets for which medicinal chemists have developed small-
molecule leads. Unlike the bioinformatics community which has developed a
wealth of public databases to assemble and disseminate protein and genomic
sequences, medicinal chemistry structure-activity relationship (SAR) data is
Table 14.2-1 Molecular targets of approved drugs
Class of drug target              Species                           Number of molecular targets
Targets of approved NMEs          All (anti-infectives and human)   301
Targets of approved NMEs          Human only                        238
Targets of approved NCEs          Human                             170
Targets of approved antibodies    Human                             15
Targets of approved biologicals   All (anti-infectives and human)   59
Fig. 14.2-2 Molecular targets of currently FDA approved drugs
(a) by number of drug substances and (b) by number of drug
targets in gene family. Figures are derived from analysis of 1606
active ingredients (25 024 approved products), Orange Book, Sept
2002.
not publicly available in a systematic database and is spread between company
in-house data warehouses, peer-reviewed journal articles, and patents, often in
formats not easily accessible to machine processing. To survey the universe
of drug targets with known leads, Inpharmatica have created the StARLITe™
database of bioactive compounds by extracting structures, assays, targets,
and SAR from the key medicinal chemistry journals (i.e., J. Med. Chem.
1980-2004, Bioorg. Med. Chem. Lett. 1990-2004) covering 350 000 compounds
and 1 275 000 assay points. The comprehensive survey of the medicinal chemistry
literature identifies 1155 known targets with at least one drug or lead compound
with a binding affinity below 10 μM, 707 of which are human molecular targets
(Table 14.2-2, Fig. 14.2-3). Applying Lipinski's criteria to the compounds in
the dataset (represented as desalted, canonical SMILES strings) reveals 587
human proteins, each unambiguously identified and assigned to a protein
sequence, with at least one compound that complies with the "rule-of-five"
and has a binding affinity more potent than 10 μM (Fig. 14.2-4). The
extremely thorough analysis of the literature, represented in the StARLITe™
Table 14.2-2 Molecular targets with chemical leads and tools.
Identified from the medicinal chemistry literature in
Inpharmatica's StARLITe™ database and unambiguously assigned
to a molecular target via a protein sequence. All counts are for targets
with leads of binding affinity <10 μM; Ro5 columns count targets with
rule-of-five compliant leads.
                              Redundant ortholog     Redundant ortholog     Nonredundant
                              targets (all species)  mammalian targets      human targets
Gene family                   All       Ro5          All       Ro5          All       Ro5
Aminergic GPCRs               71        71           61        61           34        34
Aspartyl proteases            10        4            9         4            7         3
Cysteine proteases            20        18           19        17           16        14
Enzymes - others              149       117          131       104          102       81
GPCRs class A - others        59        47           49        38           35        30
GPCRs class B                 12        7            10        5            5         2
GPCRs class C                 20        20           19        19           10        10
Hydrolases                    54        44           46        37           34        28
Ion channels - ligand gated   52        42           47        37           26        20
Ion channels - others         20        18           18        16           14        12
Kinases - others              11        8            11        8            7         6
Metalloproteases              60        56           53        50           41        39
Nuclear hormone receptors     45        33           33        26           22        19
Others                        188       144          146       109          108       79
Oxidoreductases               67        63           62        58           39        37
PDEs                          15        13           15        13           11        11
Peptide GPCRs                 99        72           80        59           52        42
Protein kinases               101       90           87        78           75        66
Serine proteases              34        30           34        30           27        24
Transferases                  68        46           57        39           42        30
Total                         1155      943          987       808          707       587
Fig. 14.2-3 Gene family distribution of nonredundant human
proteins with small-molecule chemical leads with binding affinities
<10 μM. Data derived from an analysis of Inpharmatica’s
StARLITe™ database.
database, more than doubles the number of identified proteins with existing
lead matter.
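
The filtering step described above, in which Lipinski's criteria and a 10 μM affinity cut-off are applied to the compounds recorded against each target, can be sketched as below. The four rule-of-five thresholds are the standard ones (molecular weight at most 500, cLogP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). Everything else is an assumption made for illustration: the `Compound` container, its field names, the requirement of zero violations for "compliance", and the example values. Descriptors are assumed to have been calculated beforehand from the structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float
    clogp: float
    h_bond_donors: int
    h_bond_acceptors: int
    affinity_uM: float  # binding affinity in micromolar; lower is more potent


def ro5_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five thresholds the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def is_ro5_lead(c: Compound, max_violations: int = 0,
                affinity_cutoff_uM: float = 10.0) -> bool:
    """True if the compound is rule-of-five compliant and more potent than the cut-off.

    max_violations=0 is an assumption; the text does not say how many
    violations, if any, were tolerated.
    """
    return ro5_violations(c) <= max_violations and c.affinity_uM < affinity_cutoff_uM


def targets_with_ro5_leads(leads_by_target: dict[str, list[Compound]]) -> set[str]:
    """Targets that have at least one compliant, sufficiently potent compound."""
    return {
        target
        for target, compounds in leads_by_target.items()
        if any(is_ro5_lead(c) for c in compounds)
    }


# Illustrative values only.
example = {
    "target_x": [Compound(350.0, 2.1, 2, 5, 0.05)],
    "target_y": [Compound(720.0, 6.3, 6, 12, 0.01)],  # potent but not compliant
    "target_z": [Compound(300.0, 1.0, 1, 3, 50.0)],   # compliant but too weak
}
print(sorted(targets_with_ro5_leads(example)))
```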
Fig. 14.2-4 Proportion of targets with leads observed with at least one rule-of-five
compliant compound within each gene family.
Using this larger database of drug targets, which show some precedent
of modulation by small-molecule leads or drugs, we attempted to estimate
the size of the potential druggable genome based on homology to known
drug targets. The underlying assumption in this analysis is that if one gene
family member has shown the propensity to selectively bind small-molecule
modulators, other members of the gene family may well share
physicochemical and architectural properties that also make them likely to bind
druglike small molecules. Proteins that have a similar sequence are generally
likely to share very similar three-dimensional properties and perform similar
or related functions. If a protein therefore has a high degree of sequence
similarity to the target of a drug (or other protein that is known to be
druggable), we predict that the protein is likely to be druggable too, if we
believe the binding site architecture to be conserved. Where proteins are
less closely related in sequence, it is more difficult to infer druggability.
Relatively small differences in the binding site of a protein could have a large
impact on its ability to bind small molecules. The authors recognize that
this is a simplistic assumption and is likely, if anything, to overestimate
the number of potential members of the predicted druggable subset of
the human genome. For example, because many individual members of a gene
family may bind distinct ligands, the molecular recognition properties of their
respective binding sites could be significantly divergent. Using the BLAST
sequence alignment algorithm to search each of the sequences against the
human genome, we identified 945 distinct genes that show homology to the
molecular targets of approved drugs at a cut off of 30% sequence identity
and E value less than or equal to lo-’. Expanding the BLAST analysis to
include human proteins with known druglike leads from the StARLITe™
database identified 2921 protein sequences within the same sequence identity
cut-offs.
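
The homology filter described above can be sketched as a pass over BLAST hits in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E value, bit score). This is a minimal illustration, assuming the search has already been run and its output saved in that layout. The function name and sample hits are hypothetical, and the default E-value threshold is an arbitrary placeholder rather than the value used in the analysis.

```python
def homologous_subjects(
    blast_tabular: str, min_identity: float = 30.0, max_evalue: float = 1e-5
) -> set[str]:
    """Return subject IDs with at least one hit passing both cut-offs.

    max_evalue is a placeholder default (an assumption), not the source's value.
    """
    passing: set[str] = set()
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject = fields[1]
        percent_identity = float(fields[2])
        evalue = float(fields[10])
        if percent_identity >= min_identity and evalue <= max_evalue:
            passing.add(subject)
    return passing


# Hypothetical hits: query, subject, %identity, length, mismatches, gap opens,
# query start/end, subject start/end, E value, bit score.
sample_hits = "\n".join(
    "\t".join(row)
    for row in [
        ("drug_target_1", "gene_a", "45.2", "300", "150", "4", "1", "300", "10", "310", "1e-60", "220"),
        ("drug_target_1", "gene_b", "28.0", "250", "170", "9", "5", "255", "1", "250", "1e-12", "75"),
        ("drug_target_2", "gene_c", "33.5", "120", "70", "3", "1", "120", "40", "160", "0.5", "31"),
    ]
)
print(sorted(homologous_subjects(sample_hits)))
```

Requiring both cut-offs at once matters: in the sample, one hit is significant but below 30% identity, and another is above 30% identity but not significant, so neither is counted.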
In addition to using a sequence homology approach, we also approached the
problem of identifying the druggable subset of the human proteome using a
feature-based Bayesian method.
14.2.4.6 Feature-based Druggability Prediction

Drug targets, be they targets of small molecular weight drugs or protein
therapeutics, may share common sequence-based features that are not
necessarily detectable by overall sequence similarity. An alternative approach
to using sequence-based similarity methods is to examine the presence of
sequence-based features that are enriched in drug targets compared with
the rest of the genome. A large set of over 100 protein properties and
features was calculated for each sequence in the DrugStore database, such
as the number of transmembrane helices, signal peptides, isoelectric point,
length distribution, percentage of helical structure, antigenicity, net charge at
pH 7.4, domain complexity, subcellular localization, and so on. Features that
were enriched in existing drug targets were retained and used to construct
probabilistic Bayesian models for both small-molecule druggability prediction
and protein therapeutic druggability prediction. The implementation of this
Bayesian probabilistic scoring allows ranking of any portfolio of targets based
on their predicted druggability. The major advantage of this approach is
the independence of any prior knowledge about the examined protein, or
homology to precedented target families. The Bayesian models also hold the
advantage of being tunable to reflect specific gene families, or drug profiles.
The probabilistic models were then used to rank all sequences from the
human genome according to both small-molecule and protein druggability
as predicted by the presence of druggable features in the protein sequence.
The small-molecule model predicts 2325 gene products to be druggable with
high confidence level (i.e., achieving scores comparable with those of existing
targets).
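
The idea of scoring proteins by features enriched in known drug targets can be illustrated with a naive Bayes log-odds model over binary features. The text does not describe the actual model, so this is a sketch under assumptions: features are treated as present or absent and as independent, smoothing uses a pseudocount, and all feature names and training sets are hypothetical.

```python
import math


def train_feature_weights(
    targets: list[set[str]],
    background: list[set[str]],
    features: list[str],
    pseudocount: float = 1.0,
) -> dict[str, tuple[float, float]]:
    """For each feature, return (log-odds if present, log-odds if absent).

    Frequencies are smoothed with a pseudocount so that no probability is 0 or 1.
    """
    weights = {}
    for feature in features:
        p_target = (sum(feature in s for s in targets) + pseudocount) / (
            len(targets) + 2 * pseudocount
        )
        p_background = (sum(feature in s for s in background) + pseudocount) / (
            len(background) + 2 * pseudocount
        )
        weights[feature] = (
            math.log(p_target / p_background),
            math.log((1 - p_target) / (1 - p_background)),
        )
    return weights


def druggability_score(
    protein_features: set[str], weights: dict[str, tuple[float, float]]
) -> float:
    """Sum the per-feature log-odds; higher scores resemble known targets more."""
    return sum(
        present if feature in protein_features else absent
        for feature, (present, absent) in weights.items()
    )


# Hypothetical training data: each protein is the set of features it displays.
known_targets = [{"tm_helices", "signal_peptide"}, {"tm_helices"}, {"tm_helices"}]
rest_of_genome = [{"signal_peptide"}, set(), {"tm_helices"}, set()]
weights = train_feature_weights(
    known_targets, rest_of_genome, ["tm_helices", "signal_peptide"]
)

candidates = {"protein_1": {"tm_helices"}, "protein_2": set()}
ranking = sorted(
    candidates, key=lambda name: druggability_score(candidates[name], weights),
    reverse=True,
)
print(ranking)
```

Ranking by a summed score, as here, is what allows a portfolio of targets to be ordered without any reference to homology.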
14.2.4.7 Structure-based Druggability Analysis of PDB Structures

Following the hypothesis that druggable-binding sites can be predicted a
priori, we have developed an algorithm to analyze the Protein Data Bank
(PDB) for druggable-binding sites. Actual and putative ligand-binding sites
were respectively identified either by virtue of the presence of a ligand in
the crystal structure or by analysis of the surface of the protein structure.
A range of physicochemical properties of the identified binding sites and
cavities were calculated from the protein structures including volume, depth,
curvature, accessibility, hydrophobic surface area, and polar surface area. The
algorithm was trained against a test set of 400 protein complexes binding
small-molecule, rule-of-five compliant ligands. From this analysis, a decision
tree was derived to predict the druggability of a binding site or cavity from
calculated physicochemical properties. The decision tree predicts whether a
cavity is druggable within the statistical confidence levels of the tree. This
method has demonstrated a 91% success rate when predicting druggability on
the protein drug targets (of oral drugs as defined in Inpharmatica’s DrugStore
database of approved drugs). The method requires either an experimentally
derived structure or a high quality homology model. Ideally, because of the
inherent flexibility of many protein-ligand-binding sites, a sample of multiple
conformations is preferred. The method is scalable to be employed on the
entire PDB (December 2004 release). By removing short peptides, 27 409 files
were suitable for analysis, which were further classified into 76 322 structural
domains using SCOP [32] and DISCO base, of which 28% (21 522) of the
structural domains were found to have at least one site predicted, to some
degree, to be druggable. Because of the high redundancy in the PDB and the
high number of ligand-protein complexes, the results were reduced to a
nonredundant set of human targets: 427 proteins were predicted to contain a
druggable-binding site, with 281 of these proteins having no prior known
compounds or drugs developed against those targets. Structure-based
druggability algorithms could
be automatically applied to continuously assess the stream of novel structures
determined by the structural genomic initiatives.
Combining a nonredundant set of genes from all the following methods
outlined earlier:
current targets of approved drugs;
current targets of chemical leads or chemical tools;
sequence homology to current drug targets;
sequence homology to current chemical lead targets;
feature-based sequence probability prediction;
structure-based prediction;
sequence homology to structure-based prediction,
we can identify a total of 3505 unique genes that are predicted, with first-
and second-order evidence and with a high confidence level, to encode
small-molecule druggable proteins, of which only 170 are the
primary human targets for marketed drugs (Table 14.2-3). The results of this
combined analysis concur with the previous estimate by Hopkins and
Groom [2], which shows that approximately 14% of the human genome could
be inferred to be potentially druggable.
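
Combining the methods listed above amounts to taking the union of the gene sets each one produces, so a gene predicted by several methods is counted once. The sketch below illustrates this with hypothetical gene identifiers. The genome size used in the final line (25 000 genes) is an assumption, chosen because it is roughly what a figure of 14% implies for 3505 genes; it is not stated in the text.

```python
def combine_predictions(gene_sets: dict[str, set[str]]) -> set[str]:
    """Union of all per-method predictions; duplicates across methods collapse."""
    combined: set[str] = set()
    for genes in gene_sets.values():
        combined |= genes
    return combined


def fraction_of_genome(n_predicted: int, n_genes_in_genome: int) -> float:
    """Share of the genome covered by the prediction."""
    return n_predicted / n_genes_in_genome


# Hypothetical gene identifiers, for illustration only.
predictions = {
    "approved drug targets": {"g1", "g2"},
    "homology to drug targets": {"g2", "g3", "g4"},
    "feature-based": {"g3", "g5"},
    "structure-based": {"g1", "g6"},
}
unique_genes = combine_predictions(predictions)
print(len(unique_genes), "unique genes from",
      sum(len(s) for s in predictions.values()), "individual predictions")

# Reported total against an assumed genome of 25 000 genes.
print(f"{fraction_of_genome(3505, 25_000):.0%}")
```

This is also why the combined total in Table 14.2-3 is far smaller than the sum of the individual rows: the methods overlap heavily.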
14.2.5
How Many Drug Targets are Accessible to Protein Therapeutics?
If, in our explorations, the proportion of the protein targets expressed by the
human genome that is accessible to modulation by high-affinity druglike small
Table 14.2-3 Predictions of the size of the human druggable genome
Druggability prediction method                                         Number of molecular targets
Targets of approved NCEs                                               170
Sequence homology to NCE drug targets                                  945
Targets of chemical leads with activities (binding affinities)         707
  below 10 μM
Targets of Ro5 chemical leads with activities (binding                 587
  affinities ≤10 μM)
Sequence homology to targets with chemical leads                       2921
Feature-based druggability sequence probability prediction             2325
Structure-based prediction                                             427
Sequence homology to proteins predicted druggable by                   3541
  structure-based method (high confidence level)
Sequence homology to proteins predicted druggable by                   6619
  structure-based method (low confidence level)
Predicted druggable genome (high confidence level)*                    3505
* Unique druggable targets from combining drug targets,
targets with leads, homology to drug/lead targets and
structure-based prediction.
molecules is limited, how much larger is the universe for drug targets if we
expand our investigations to include targets of protein therapeutics such as
antibodies and recombinant biologicals? At the time of writing, approved
antibody therapeutics were known to act on 15 human targets, whilst in total
all biological drugs in the pharmacopeia currently work via 59 modes of
action. Because of the inherently lower toxicity observed for fully humanized
antibodies and the rising rate of biological approvals, it has been argued that
antibodies may soon overtake NCE approvals [33]. Interestingly, it has also
been observed by studying rates of attrition that antibodies acting against novel
modes of action often show a higher chance of success in phase II clinical
studies than small-molecule drugs acting on mechanisms of precedence
[34-36]. Thus, we attempted to estimate how many targets are accessible to
biological drugs as the targets of antibody therapies. Other criteria, such as
antigenicity, are also important in developing inhibitory antibodies. However,
these have not been considered in this analysis, as they are not common to
both antibody and other protein drugs.
To estimate the number of genes expressing products that could be accessible
to antibody therapeutics, we assume that proteins are required to be located
in the extracellular matrix. We also assume that the extracellular location
is the union of secreted and transmembrane sets of proteins. Where the
extracellular location is known, this is often included in Swiss-Prot and gene
ontology (GO) [37] database annotation for the protein. Secreted proteins
can be predicted by the presence of a signal peptide, whilst transmembrane
domains can be identified by sequence property prediction. Analysis reveals
1384 genes predicted to encode secreted proteins with a high confidence level
(i.e., predicted by multiple different methods). If the confidence level is
lowered (i.e., signal peptide predicted by a single method), 6560 genes are
predicted to be secreted. Our transmembrane analysis reveals that 973 genes
are predicted by multiple methods to have transmembrane domains and be
located at the plasma membrane; this number increases by 1407 genes that
may be plasma membrane proteins when predicted by only a single method.
Combining these results, we identified that the total number of extracellular
proteins with high confidence levels is expressed by 2287 genes. The study was
extended to identify proteins that have features similar to the current set of
biological drug targets using the Bayesian probabilistic feature-based algorithm
discussed above. Trained on the existing set of biological drug targets, 1637
gene products were predicted to be druggable via biological therapeutics with
high confidence levels (i.e., achieving scores comparable with those of existing
protein targets). Therefore, the total number of genes predicted to encode
protein therapeutic druggable proteins is 3258, equivalent to 13% of the genes
in the human genome (Table 14.2-4).
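
The two confidence levels used above (agreement between multiple prediction methods versus a call by a single method) and the definition of the extracellular set as a union can be sketched as follows. The function names, the choice of two agreeing methods as the high-confidence cut-off, and the gene identifiers are assumptions made for illustration.

```python
def consensus(predictions: list[set[str]], min_methods: int) -> set[str]:
    """Genes predicted by at least `min_methods` of the supplied methods."""
    counts: dict[str, int] = {}
    for predicted in predictions:
        for gene in predicted:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_methods}


def extracellular_genes(
    secreted_by_method: list[set[str]],
    transmembrane_by_method: list[set[str]],
    min_methods: int = 2,
) -> set[str]:
    """Union of the secreted and transmembrane consensus sets."""
    return consensus(secreted_by_method, min_methods) | consensus(
        transmembrane_by_method, min_methods
    )


# Hypothetical outputs of two signal-peptide and two transmembrane predictors.
secreted = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}]
transmembrane = [{"g3", "g5"}, {"g3", "g5", "g6"}]

high_confidence = extracellular_genes(secreted, transmembrane, min_methods=2)
low_confidence = extracellular_genes(secreted, transmembrane, min_methods=1)
print(sorted(high_confidence), sorted(low_confidence))
```

Because the combination is a union, genes in both sets are counted once. On the reported figures, 1384 + 973 = 2357 against a combined total of 2287 would imply that about 70 genes appear in both high-confidence sets.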
14.2.6
Conclusion
From a comprehensive survey of the medicinal chemistry literature and by
combining a variety of methodologies - sequence homology, structure-based,
and feature-based - we have identified that approximately 3500 genes in the
human genome are predicted to be accessible to modulation by high-affinity
Table 14.2-4 Predictions of the number of genes in the human
genome accessible to protein therapeutics (recombinant soluble
proteins and antibodies)
Druggability prediction method                                   Number of molecular targets
Targets of approved antibodies                                   15
Targets of approved biologicals                                  59
Secreted proteins (high confidence level)                        1384
Secreted proteins (low confidence level)                         6560
Transmembrane predictions (high confidence level)                973
Transmembrane predictions (low confidence level)                 1407
Unique, combined transmembrane, and secreted                     2287
  predictions (high confidence level)
Feature-based biological target sequence probability             1637
  prediction
Total unique genes predicted to be accessible via biological     3258
  therapeutics
druglike small molecules: approximately 14% of the human genome. Of the
approximately 3500 human druggable genes, small-molecule chemical tools
or leads (with binding affinities equal to or more potent than 10 μM) have
already been identified that act on 707 of these, and 170 are the primary targets
for approved, small-molecule drugs. While there may be many more proteins
expressed by the human genome that may be discovered to be modulated
by small-molecule tools or drugs, the proteins identified as belonging to the
subset known as the druggable genome represent those targets we can readily
predict as having a higher confidence level of discovering a small-molecule
chemical tool than the remaining genes in the genome. Since it was first
proposed that the various physicochemical constraints on druglike chemicals
would reduce the available target space, it has been suggested that accessible
drug target space may expand considerably with the application of biologic
drugs such as fully humanized antibodies. Protein therapies approved to date
act via about 59 human targets, 18 of which are targeted by marketed antibodies.
With the commercialization of recombinant protein production, the number
of biological drugs receiving approval and being studied in the clinic is steadily
rising. Several commentators predict that the rise of antibody therapies may
challenge the premier position of small-molecule chemical entities as the
dominant technology of medicines [33]. Our estimate of the proportion of
the genome potentially accessible to modulation by protein therapeutics such
as antibodies is around 13%, with 3258 genes predicted to encode proteins
druggable via protein therapeutics. Interestingly, 70% of all the drug targets
are also predicted to be accessible to modulation by antibody therapy. Indeed, if
we expand the analysis to compare the overlap between the antibody-accessible
druggable genome and the small-molecule druggable genome, 1516 genes
are predicted to encode proteins druggable by both small molecules and
protein therapeutics, which is approximately 45% of our current estimate of
the small-molecule druggable genome (Figs. 14.2-5 and 6).
Fig. 14.2-5 Gene family distributions: (a) small-molecule druggable genome; (b) protein
therapeutics.
Fig. 14.2-6 Overlap of antibody and small-molecule druggable universes.

Acknowledgments
We would like to thank Colin Groom (UCB Celltech, Cambridge, UK) for
his long-standing contribution to this work. We also sincerely thank Edith
Chan (Inpharmatica, London), Robin Spencer (Pfizer, Groton), Lee Beeley
(Pharmamatters, Ramsgate), and Jonathan Mason (Pfizer, Sandwich) for their
helpful discussions in the development of this work.
References
1. A.L. Hopkins, C.R. Groom, Target analysis: a priori assessment of
druggability, Ernst Schering Research Foundation Workshop, Berlin, 2003, 42.
2. A.L. Hopkins, C.R. Groom, The Druggable Genome, Nat. Rev. Drug
Discov. 2002, 1, 727-730.
3. J. Overington, Prioritizing the proteome: identifying pharmaceutically
relevant targets, Drug Discov. Today 2002, 7, 516-521.
4. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental
and computational approaches to estimate solubility and permeability in
drug discovery and development settings, Adv. Drug Deliv. Rev. 1997,
23, 3-25.
5. A. Ajay, W.P. Walters, M.A. Murcko, Can we learn to distinguish between
“drug-like” and “nondrug-like” molecules? J. Med. Chem. 1998, 41,
3314-3324.
6. J. Wang, K. Ramnarayan, Towards designing drug-like libraries: a novel
computational approach for prediction of drug feasibility of compounds, J.
Comb. Chem. 1999, 1, 524-533.
7. W.P. Walters, A. Ajay, M.A. Murcko, Recognizing molecules with drug-like
properties, Curr. Opin. Chem. Biol. 1999, 3, 384-387.
8. C.A. Lipinski, Drug-like properties and the causes of poor solubility and
poor permeability, J. Pharmacol. Toxicol. Methods 2000, 44, 3-25.
9. B.L. Podlogar, I. Muegge, L.J. Brice, Computational methods to estimate
drug development parameters, Curr. Opin. Drug Discov. Devel. 2001, 4,
102-109.
10. I. Muegge, S.L. Heald, D. Brittelli, Simple selection criteria for drug-like
chemical matter, J. Med. Chem. 2001, 44, 1841-1846.
11. D.F. Veber, S.R. Johnson, H.Y. Cheng, B.R. Smith, K.W. Ward, K.D. Kopple,
Molecular properties that influence the oral bioavailability of drug
candidates, J. Med. Chem. 2002, 45, 2615-2623.
12. J.R. Proudfoot, Drugs, leads, and drug-likeness: an analysis of some
recently launched drugs, Bioorg. Med. Chem. Lett. 2002, 12, 1647-1650.
13. W.P. Walters, M.A. Murcko, Prediction of ‘drug-likeness’, Adv.
Drug Delivery Rev. 2002, 54, 255-271.
14. W.J. Egan, W.P. Walters, M.A. Murcko, Guiding molecules towards
drug-likeness, Curr. Opin. Drug Discov. Devel. 2002, 5, 540-549.
15. I. Muegge, Selection criteria for drug-like compounds, Med. Res. Rev.
2003, 23, 302-321.
16. M.S. Lajiness, M. Vieth, J. Erickson, Molecular properties that influence
oral drug-like behavior, Curr. Opin. Drug Discov. Devel. 2004, 7, 470-477.
17. M. Vieth, M.G. Siegel, R.E. Higgs, I.A. Watson, D.H. Robertson, K.A. Savin,
P.A. Durst Hipskind, et al. Characteristic physical properties and
structural fragments of marketed oral drugs, J. Med. Chem. 2004, 47,
224-232.
18. I.D. Kuntz, K. Chen, K.A. Sharp, P.A. Kollman, The maximal affinity of
ligands, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9997-10002.
19. P. Ertl, B. Rohde, P. Selzer, Fast calculation of molecular polar surface
area as a sum of fragment based contributions and its application to the
prediction of drug transport properties, J. Med. Chem. 2000, 43,
3714-3717.
20. P.J. Hajduk, J.R. Huth, S.W. Fesik, Druggability indices for protein
targets derived from NMR-based screening data, J. Med. Chem. 2005,
48, 2518-2525.
21. J. Drews, S. Ryser, Classic drug targets, Nat. Biotechnol. 1997, 15,
1318-1319.
22. J. Drews, Genomic sciences and the medicine of tomorrow, Nat.
Biotechnol. 1996, 14, 1516-1518.
23. J. Drews, Drug discovery: a historical perspective, Science 2000, 287,
1960-1964.
24. E. Lander, Initial sequencing and analysis of the human genome, Nature
2001, 409, 860-921.
25. J. Venter, The sequence of the human genome, Science 2001, 1304-1351.
26. A.P. Orth, S. Batalov, M. Perrone, S.K. Chanda, The promise of genomics to
identify novel therapeutic targets, Expert Opin. Ther. Targets 2004, 8,
587-596.
27. A.P. Russ, S. Lampel, The druggable genome, Drug Discov. Today 2005,
10(23-24), 1577-9.
28. K. Davies, Cracking the ‘Druggable Genome’, Bio-IT World, 2002,
http://www.bio-itworld.com/archive/100902/firstbase.html.
29. C. Burgess, I. Golden, IBC Drug Discovery and Technology Conference,
Curagen Corp., Boston, 2002.
30. J.B. Golden, Prioritizing the human genome: knowledge management for
drug discovery, Curr. Opin. Drug Discov. Devel. 2003, 6, 310-316.
31. J. Golden, Towards a tractable genome: knowledge management in
drug discovery, Curr. Drug Discov. 2003, 17-20.
32. A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, SCOP: a
structural classification of proteins database for the investigation of
sequences and structures, J. Mol. Biol. 1995, 274, 536-540.
33. S. Arlington, S. Barnett, S. Hughes, J. Palo, Pharma 2010: The Threshold of
Innovation, IBM Business Consulting Services, London, 2002.
34. A.K. Pavlou, J.M. Reichert, Recombinant protein therapeutics - success
rates, market trends and values to 2010, Nat. Biotechnol. 2004, 22,
1513-1519.
35. J.M. Reichert, Protein therapeutic success rates increase with biotech
advances, Tufts Center for the Study of Drug Development Impact Report
2005, 7.
36. Windhover, Know thy R&D enemy: the key to fighting attrition, In Vivo
2005.
37. G.O. Consortium, Creating the gene ontology resource: design and
implementation, Genome Res. 2001, 11, 1425-1433.
15
Target Families
15.1
The Target Family Approach
Hans Peter Nestler
Outlook
Chemical Biology strives to combine structural information about biological

and chemical molecules to design and discover novel molecular entities to
modulate biological processes. An integral concept is the clustering of proteins
into target families based on their structural and functional similarities. In
this chapter, we review the foundations of target families and highlight the
application of this knowledge for the efficient use of synthesis and screening
technologies to develop novel pharmaceutical agents.
15.1.1
Introduction
The sequencing of the human genome [1] marked the apex of the transformation
of biology from an observational and descriptive activity to a
hypothesis-driven science. With the information about the building blocks for cells, it is
now possible to modulate and investigate the phenomenology of organisms
at a molecular level. Drug discovery underwent, in parallel, a tremendous
change from an empirical process driven by the experience of medicinal
chemists that translated pharmacological effects to changes in molecules, to a
knowledge-driven operation based on biochemistry, high-throughput synthesis
and screening, and structure-driven drug design. Yet, in spite of this evolution,
the productivity of the pharmaceutical industry has plummeted, and 2004 saw
the lowest number of new drugs coming to the market in history. Soon after
the sequences became available, discussions arose about how many of the
approximately 27 000 genes that had been assigned [1] would be “druggable”, that
is, their associated protein products could be modulated with small molecules
in a directed fashion to achieve a desired therapeutic effect [2]. Considering

that most novel therapies would rely on oral administration of drugs, these
molecules have to fulfill the requirements to achieve suitable pharmacokinetic
behavior. The most quoted and commonly used guidelines are Lipinski’s
“rule-of-five” [3] and Veber’s “rotatable bonds” criterion [4] that have been
based on a statistical analysis of marketed oral drugs. Taking such boundaries
into account, it has been estimated that about 10-15% of the human genome would
be “druggable” [5]. While this number may seem low, it should be considered that
only one-third of these mechanisms are consciously addressed and a significant
fraction of drugs, even those in development, still act through undefined
molecular pathways. Furthermore, the hype around the sequencing of the genome
and its assumed impact on drug discovery has since vanished, as it was
recognized that biological networks are too complex and redundant to allow
control through one molecular dial. “Systems biology” tries to address this
challenge by exploring the interactions of proteins and the resulting pathways
for transferring biological signals and actions. “Chemical biology” is the
matching complement in drug discovery that tries to exploit structural
relationships among proteins to efficiently address the druggable genome
(Fig. 15.1-1) [6].
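The guidelines mentioned above can be expressed as a simple filter. The sketch below is illustrative only: the thresholds are the commonly quoted ones (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors and 10 acceptors for the rule-of-five; at most 10 rotatable bonds and a polar surface area of at most 140 Å² for Veber's criteria) and are stated here as assumptions, since the text does not list them. The `Descriptors` class, the function names, and the example values are hypothetical, and the descriptors are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float          # g/mol
    log_p: float               # octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    polar_surface_area: float  # in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Check the commonly quoted Veber limits (assumed thresholds)."""
    return d.rotatable_bonds <= 10 and d.polar_surface_area <= 140


def looks_orally_available(d: Descriptors, max_violations: int = 1) -> bool:
    """Combine both guidelines; one rule-of-five violation is tolerated."""
    return lipinski_violations(d) <= max_violations and passes_veber(d)


# Hypothetical descriptor sets, not taken from any real compound.
small_polar = Descriptors(350.0, 2.5, 2, 5, 4, 80.0)
large_greasy = Descriptors(700.0, 6.5, 6, 12, 15, 200.0)
print(looks_orally_available(small_polar))
print(looks_orally_available(large_greasy))
```

Such filters are statistical guidelines rather than hard rules, which is why the number of tolerated violations is left as a parameter.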
Similarities among protein structures have been investigated for a long
time, covering all levels from primary (sequence) via secondary (domain folds)
to tertiary (overall three-dimensional) structures. Investigations of tertiary
structures help to predict functional sites and roles for novel proteins and to
understand enzyme mechanisms at a molecular level. In particular, bioinformatic
analysis has made it possible to identify homologous reaction mechanisms, even
among proteins with low sequence similarity and differing biochemical
activities [7, 8], as highlighted by the cases of leukotriene A4 hydrolase and
angiotensin converting enzyme (a zinc metalloprotease), which are both
inhibited by bestatin but have distinct biological roles [9]. Primary
structure investigations have been of particular interest for evolutionary
analyses. Yet, these phylogenetic analyses have been crucial in defining target
families, groups of proteins of pharmaceutical interest that have similar gene
and therefore protein sequences. Kinases are the prototype of a target family,
as their active sites are structurally highly homogeneous and bind the same
cosubstrate, adenosine triphosphate (ATP) (Fig. 15.1-1). Other gene families
include G-protein-coupled receptors (GPCRs), ion channels and transporters, and
proteases, although the structural diversity within these families is higher and
they therefore group into structurally and mechanistically diverse subfamilies,
such as cysteine or metalloproteases.

Fig. 15.1-1 Distribution of genes in representative target families, drug
candidates by target families, and drugs by target families (data sources: [1]
and “Pharmaprojects”). While a significant fraction of molecular targets is still
unknown, GPCRs have been identified as the most prominent target of drugs on the
market, with a representation significantly higher than in the human genome. The
analysis furthermore shows the emergence of kinases in drug discovery, with a
significant percentage of drugs in clinical trials, while proteases and ion
channels are represented according to their occurrence in the human genome.

“Chemical biology”, as we term these target-family-oriented concepts, has
reshaped all the stages of drug discovery and today it is a widely used discovery
paradigm in the pharmaceutical industry. The focus as well as the impact
of using target family knowledge has definitely been on the early stages,
from target identification via structural understanding through lead finding
efforts. The later stage of the drug discovery process, the optimization of
lead compounds into drug candidates, is not as amenable to technological
solutions that can be provided through target family concepts, as the challenges
become very specific for each lead series. Still, transferring insights and
understanding compound interactions with targets and other proteins helps to
avoid dead ends in compound modification. As the impact of the latter
aspects of chemical biology is hard to track (mostly because no
“what-would-have-happened-if” control data exist) and is best shown
by anecdotal examples, we will focus on target family ideas that enable the early
stages. We will demonstrate their application and applicability to representative
target families in this chapter. We will pay particular attention to the core
aspect of “chemical biology”, the matching of chemical and biological spaces
[6]. For a drug molecule to exert its pharmaceutical action, it is crucial that
the molecular shape complements the cast offered by the target protein. This
fact was first recognized by Emil Fischer, who phrased it as a “Key-Lock”
principle [10], being unaware of the dynamic and flexible nature of protein
structures, and today we understand the interactions of two molecules more
in a “Hand-Glove” fashion with strong elements of induced fit [11]. We term
the ensemble of available interaction shapes in the genome the “biological
space”, while the “chemical space” is considered the ensemble of shapes
offered by small molecules. With our structural understanding constantly
evolving through molecular biology and crystallography, the efforts to rationally
design matching chemical structures increased and led to successes in drug
development, such as the HIV-protease inhibitors. Rational design depends
on valid starting points and structure-activity relationship (SAR) and is quite
powerful for the optimization and understanding of structural motifs that
trigger activity and selectivity at the protein target level. Rational design suffers
shortcomings when we attempt to address the challenges of finding novel
starting structures for optimization. High-throughput screening efforts try to
tackle this challenge by playing a high-number trial-and-error game. As the
screening collections reflect the target history of the respective company, they
often cover narrow aspects of chemical space. Combinatorial chemistry claimed
to fill the chemotype gaps in the collections and to cover the chemical space with
diverse structures. Despite the tremendous number of compounds produced
at the peak of combinatorial chemistry, the libraries fell short of the promise, as
the libraries offered diversity around a point in space, thus densely populating
this area but neglecting others completely. This effort can be imagined as
putting a small rubber ball on the tip of a needle and trying to fill and represent
a large lecture hall in this manner. Thus, compound libraries can be very
powerful for exploring the match of chemical and biological spaces once an
active compound has been identified. Unfortunately, combinatorial chemistry
was limited in its early years to a small repertoire of usable synthetic
methods and therefore to a limited structural diversity, and it often started
from a biologically naive structure, thereby populating unpromising areas
of chemical space. We will revisit these aspects and the attempts to resolve
these issues when we discuss the lead finding approaches later in this chapter.
15.1.2
Understanding Biological Space
As mentioned above, the key concept of “chemical biology” is the structural
matching of chemical and biological spaces. Thus, the first important element
must be the understanding of the biological space. Many questions have to be
addressed: Which proteins cluster into families and are related to each other at
a structural level? Which genes are expressed under which physiological setting
and how do their levels respond to insults on the system? Are the expressed
proteins functionally active, on which ligands do they exert their actions, and
is their functioning dependent on their subcellular distribution? With the
sequencing of the human genome, the blueprint of human physiology became
accessible: All proteins can be enumerated at the gene level and classified on
the basis of their sequence homology by bioinformatic tools [1]. However, this
comfortably straightforward picture is complicated by the fact that genes can
be expressed in various forms, although the target family classifications hold
up to a first approximation.
In spite of the successes at the genomic and proteomic levels, the identifica-
tion of novel protein targets for modulation does not proceed at the expected
pace as the proteins do not act as isolated entities but as complexes in an
almost overcrowded environment. To exert their biological effect, the individ-
ual entities enter into dynamic physical interactions with each other and our
textbook knowledge about kinetics and thermodynamics does not necessarily
stand up to the task because of the high concentration and high viscosity of
the cytoplasmic space. Furthermore, the monitoring of gene expression and
protein analysis does not reveal the complete picture about their respective
binding partners. Today, we are still ignorant about many protein/protein
and protein/ligand complexes, such as GPCR-agonist, ion-channel-modulator,
or protease-substrate pairs, that associate and dissociate in a cell and are
responsible for biological activity. Even in cases where we know the respective
binding partners, we are a long way from understanding the structural basis
and dynamics of these interactions. Structural biology methods such as
crystallography and nuclear magnetic resonance (NMR) have taught us much
about soluble proteins, such as kinases and proteases, but gaining structural
insights about membrane-bound proteins, such as GPCRs and ion channels,
remains difficult. To date, only one structure for a GPCR (rhodopsin) and three
for ion channels have been reported [12-15]. We will discuss in this section
the approaches to identify physiological and artificial ligands for proteins as
well as to gain structural knowledge about their interactions.
15.1.2.1 Charting Biological Space - Structural Biology and Informatics

Chemical biology relies heavily on a structure-driven rationale to make
lead finding efforts within target families more efficient and to anticipate
cross-reactivities between target family members on the basis of structural
similarities. After the sequencing of the human genome opened the way for
comparing proteins with each other at a sequence level, attempts to correlate
primary sequence to three-dimensional structure intensified especially for
membrane-bound proteins to provide a counterpart to structural biological
information available for soluble proteins. Sequence comparisons within the
protein families of GPCRs [16, 17], ion channels [18], and kinases [19] yielded
phylogenetic trees with functional and structural clustering according to ligand
types, especially for the GPCR family [20].
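The sequence comparisons described above can be illustrated with a minimal sketch. It assumes that the sequences have already been aligned to equal length (with "-" marking gaps), computes pairwise percent identity, and groups family members by single-linkage clustering at a chosen identity threshold. This is a simplification and not a phylogenetic tree reconstruction as used in the cited studies; the function names, the threshold, and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def cluster_by_identity(aligned: dict, threshold: float) -> list:
    """Single-linkage clustering: link any pair at or above the threshold."""
    parent = {name: name for name in aligned}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in aligned:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy aligned fragments, invented for illustration only.
toy_alignment = {
    "kinA": "GEGSFGKVYK",
    "kinB": "GEGAFGKVYR",
    "gpcrA": "LLVAWDRYLA",
    "gpcrB": "LLIAWDRYVA",
}
print(cluster_by_identity(toy_alignment, threshold=50.0))
```

As the text notes, such sequence-based groupings give functional and structural hints but do not by themselves predict ligands or substrates.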
While phylogenetic analyses give some functional and structural hints, the
resolution of these analyses does not allow prediction or assignment of ligands
or substrates. For families of soluble protein targets, the situation for gaining
structural knowledge is quite comfortable. A wealth of crystal structures in free
as well as inhibitor-bound forms is available for proteases and kinases, very
often for a variety of ligands to each protein. This information is used intensely
for inhibitor optimization purposes but also allows structural comparisons at a
target family level. These analyses are based on structural overlays of the protein
structures within the target families (or, for proteases, within the subfamilies)
and on the affinity and repulsion of various small molecular probes, such as water
or methanol, to the active site’s surface. The studies provide “target family
landscapes” that show the relationships of the target family members at a
structural level [21-23]. The landscapes provide the tools necessary to understand
the cross-reactivities of inhibitors with closely related proteins or to assess the
likelihood of success for transforming an inhibitor for a particular target into
an inhibitor for another family target (Fig. 15.1-2). Furthermore, they allow
selection of closely related proteins as structural surrogates for those family
members, where crystallographic information is not available. This so-called
homology modeling is of crucial importance for understanding the structural
space covered by membrane-bound proteins, such as GPCRs or ion channels.
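The idea of selecting a structural surrogate from a target family landscape can be sketched as follows. Each protein is represented by a vector of probe interaction energies sampled at fixed positions of the aligned active sites, and the closest family member with a known crystal structure is chosen as the surrogate. The plain Euclidean distance used here is a stand-in for the statistical analysis of the interaction surfaces in the cited studies [21-23]; all names and numbers are hypothetical.

```python
import math


def profile_distance(p, q) -> float:
    """Euclidean distance between two probe-interaction profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must be sampled at the same positions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_structural_surrogate(target, profiles, has_structure):
    """Return the closest family member that has a crystal structure.

    Returns None if no other member with a structure is available.
    """
    candidates = [
        name for name in profiles
        if name != target and has_structure.get(name, False)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda name: profile_distance(profiles[target], profiles[name]),
    )


# Invented interaction energies (arbitrary units) at three grid positions.
profiles = {
    "P1": [-1.0, -0.5, 0.2],
    "P2": [-0.9, -0.4, 0.3],
    "P3": [0.8, 0.1, -1.2],
    "P4": [-1.0, -0.5, 0.25],
}
has_structure = {"P1": False, "P2": True, "P3": True, "P4": False}
print(nearest_structural_surrogate("P1", profiles, has_structure))
```

In this toy example P4 is the closest neighbor of P1 but is skipped because no structure is available for it, so the next closest member with a structure is returned.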
Using the rhodopsin GPCR structure [12] as a template and target family
homology, it has been possible to get topological information about the
binding sites for many GPCRs to foster an understanding of the binding modes
of ligands [25]. At a resolution of about 3.5 Å, which can usually be achieved,
it is possible to understand differential binding of ligands to the receptors
and to rationalize their activation, as demonstrated by Goddard et al. in a
homologous series of ketones activating the olfactory receptor 912-93. Fur-
thermore, the differences in activation between mouse and human orthologs
could be assigned to a Ser105/Gly105 mutation [26]. This study also points to
an instrumental aspect for the structural modeling of membrane proteins. In
addition to sequence homologies, ligand-binding strengths are used to refine
the topologies and interactions. If combined with molecular dynamics
refinement of the loops connecting the transmembrane helices, as demonstrated
by the program PREDICT [27], the accuracy of the models becomes high
enough to perform virtual screening and to discriminate between ligands and
their binding modes [28, 29]. In the ion-channel field, homology models can
be based on three crystal structures of various potassium channels, two of
which show the channel in the open [13, 14] and one in the closed state [15].
Although ion channels are multimeric proteins and structurally more diverse
than GPCRs, good models have become available using the three structures
and ligand-activity information, as highlighted by the possibility of predicting
hERG blocking activity of ligands [30, 31]. The hERG channel is of general
pharmacological interest as an antitarget, because blocking this channel can
induce fatal cardiac fibrillation. Thus, the most biological data are available
for this channel, and the homology-based models, even though they are built on
the bacterial MthK channel [13], have meanwhile reached the same accuracy as
models derived from SAR data [32] and can guide chemical optimization to achieve
specificity of ligands. Beyond the prediction of ligand binding, homology models help
the functional analysis of ion channels. In a recent example, the gating of the
Kir6.2 channel by ATP could be explained at the atomic level [33] utilizing the
structures of the open Kir3.1 channel [14] and the closed KirBac1.1 channel [15].

Fig. 15.1-2 Assigning membership of a protein to a protein family and analyzing
the structural relationships can be achieved by two major concepts. Starting from
protein sequence information, the similarities of the sequences can be
investigated and proteins can be clustered in phylogenetic trees. These analyses
were the basis of the assignments of target families as reported, for example, by
Venter et al. [1]. At a higher resolution, such trees can also be generated within
gene families. While these trees can provide information about the evolutionary
relationships, the relations do not translate into structural similarities at a
detailed level, as shown by the distribution of affinities toward various small
molecule ligands throughout the kinome [24]. To gain insight at the structural
level, three-dimensional structures must be aligned and compared. The comparison
involves studies of interactions with various probes such as amides, carbonyl, or
water. The proteins are positioned in a cube and the interaction of the probes at
various positions in the cube is measured. The statistical analysis of the
interaction surfaces provides the dimensions for separating the proteins in
structure-based landscape maps [21, 22]. The protein relations within these maps
reflect the affinity profiles toward small molecule ligands and can be used to
rationalize specificities.

15.1.2.2 Understanding Biological Machines - From Structure to Function

With the structural knowledge acquired, the second challenging aspect is to
establish the biological relevance of target family members. Gene expression
analysis is a very powerful tool to identify changes of gene regulation under
various physiological and pathophysiological conditions. The mRNA levels
in cells and tissues give indications about which proteins could be relevant
for a specific biological response. However, whereas genomic tools have
been quite successful in identifying candidate targets in the GPCR and kinase
families, where the activity of proteins is tightly correlated with expression
levels, there are other gene families, such as proteases, that are not regulated
through gene expression levels. Proteases usually have rather constant expression
levels as proenzymes over a broad range of physiological conditions and are
activated irreversibly by proteolytic cleavage, a characteristic that is important
for quick responses through activation cascades and distinguishes them from
other gene families. Thus, while we can deduce proprotease levels from the
gene expression patterns, we cannot infer proteolytic activity levels from
them. This fact points directly to a more important challenge that cannot be
resolved at a genomic level: How do we find the interaction partners for our
target proteins and what are the structural determinants of the interactions?
Although phylogenetic analysis allows some classification in structural and
functional terms, the question concerns all target families and the approaches
used are termed deorphaning for GPCRs, phosphoproteomics for kinases, and
substrate mapping for proteases. All processes require tedious work, but can
be rewarding by yielding structural knowledge that can be employed in lead
discovery and optimization.
Orphan GPCRs are receptors without known agonistic or antagonistic
ligands. As GPCRs are usually identified on the basis of sequence homologies,
most of the GPCRs have no pharmacologic function or ligands associated
at the time of their identification. To find such ligands and later on elicit a
biological response, GPCRs are cloned and overexpressed with linkage to easily
detectable reporter genes and are screened against a collection of known signal
transmitters or dedicated libraries. Especially with the evolution of screening
technology for GPCRs, it is possible today to deorphan many GPCRs, either
with their endogenous ligands or synthetic analogs. The identified ligands
give insight into the structural requirements for binding, information that can
be used to refine the above-mentioned homology models, and can be used as
tools to elucidate the biological functions [34]. Fortunately, the endeavor of
deorphaning GPCRs is supported by the existence of many GPCR-targeted
drugs. As GPCRs are readily accessible targets, because they can
be addressed extracellularly, the majority of drugs on the market are directed
at GPCRs (see also Fig. 15.1-1). These compounds can be applied to modulate
GPCR action, and it is a valid assumption that many of the “orphan drugs” will
prove to be GPCR modulators, thus expanding the tool chest of deorphaning
agents.
For kinases and proteases, the search for substrates may seem more
straightforward, as these enzymes act on and transform other proteins.
Phosphoproteomics has been established for kinases to identify interaction
partners at the protein level on a genomic scale [35]. Basically, cell cultures
are incubated with 32P-ATP and the cellular extracts are analyzed by
two-dimensional gel electrophoresis. As all kinases can use ATP as a substrate,
the phosphorylation patterns become very complex and do not point to an
individual kinase. To achieve specificity in detection and to avoid the heavy
use of radioisotopes, antibodies reacting to the phosphorylated proteins are
required. While nonspecific phosphoserine- or phosphotyrosine-recognizing
antibodies are available, they pose the same challenge of deconvoluting the
specific phosphorylation of one substrate by a specific kinase. Sequence-specific
antibodies can be raised against the phosphorylated peptide epitope [36].
To identify the epitopes, combinatorial peptide libraries are incubated with
purified kinases and 32P-ATP. The phosphorylated peptides can be identified
by microradiography and Edman degradation [37, 38] and can be used for
raising the antibodies. The sequence information gained could be applied to
designing selective inhibitors addressing the substrate-binding pockets instead
of the ATP site, an approach that is currently not followed, as the
peptide-binding sites are not as distinct as for proteases. While antibodies reveal
information on the phosphorylation state of a protein, it remains unclear
which kinase is responsible for the phosphorylation at a specific position. In
a complementary approach, Shokat et al. were able to track phosphorylation
substrates for individual kinases, using kinases mutated to have an enlarged
ATP-binding site together with a bulky ATP derivative. As only the mutated
kinases are able to use the bulky ATP analog, only the substrates of these
kinases will be phosphorylated with it at the specific phosphorylation sites
[39]. Taking the information from all these
approaches together, it is possible to decipher the signaling pathways of the
kinome and to derive structural insights from the substrate sequences, which
could be translated into inhibitors and drugs.
Tracking protease activity remains one of the major challenges. As
mentioned earlier, gene expression levels do not correlate tightly with the
activity of a protease and even monitoring tools like in situ hybridization
cannot elucidate the protease activity in tissues or cellular systems, as the
antibodies employed often do not discriminate between the proenzyme and
activated proteases. Recently, efforts to image protease activity in a cell have led
to activity labeling probes that act as suicide substrates and lead to fluorescent
tagging of the active site of active proteases [40]. Currently, this technology is
limited to proteases that allow for covalent attachment of the probes, namely,
serine and cysteine proteases that act through a nucleophilic substitution, and
it does not reveal the proteins that are cleaved by the protease. Unfortunately,
straightforward labeling approaches as for kinases are not suitable, as no
additional moieties are introduced. Therefore, alternate approaches based
on two-dimensional gel electrophoresis have been devised that either allow
the differential labeling of substrates or utilize the differential mobility of
substrates and cleavage products after digestion. The identities of the proteins
are determined by mass spectrometry and sequence analyses, although these
technologies do not reach a resolution that would allow determination of the
characteristics of the protease selectivity pockets. For the first approach, cell
extracts are divided into two parts and in each portion the proteins are labeled
with a fluorescent dye, using different dyes for the portions. One fraction is
subjected to proteolytic digestion, while the other fraction remains untreated.
After mixing of the portions and electrophoretic separation, substrates can
be identified through the varying color of the spots [34]. In the second
approach, a cellular extract is separated by electrophoresis in one dimension.
After proteolytic digestion in the gel, the protein mixture is separated in
the second dimension where the cleavage products show a different mobility
from the parent proteins [41]. While the first approach allows for analysis at a
proteomic level and under various conditions, the latter approach allows a direct
correlation of the cleavage peptides to the parent proteins. We use the insights
into the biological space of the target families to select screening collections as
well as to define specificity requirements for target family members to build
appropriate profiling panels. To gain a more detailed insight into the structural
parameters controlling substrate selections, peptide libraries have been used
intensely. Proteolytic digestion of such libraries that commonly contain hexa-
to octapeptides returns ensembles of peptide substrates [42]. These substrate
ensembles carry pharmacophoric information of the substrate pockets as well
as on the specificity of these pockets. Together with the knowledge about
the preferred β-strand geometry of protease inhibitors [43] and the ensuing
privileged scaffolds, this information can guide protease inhibitor design.
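How a substrate ensemble from a peptide library carries information about the selectivity pockets can be sketched with a position-wise residue count. The example below assumes that the cleaved peptides have equal length and are already aligned relative to the cleavage site; it tabulates the residue frequencies per position and derives a consensus sequence. The peptides are invented for illustration and the function names are hypothetical.

```python
from collections import Counter


def position_frequencies(peptides):
    """Return one {residue: frequency} dictionary per peptide position."""
    if not peptides:
        return []
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    table = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        table.append({res: n / len(peptides) for res, n in counts.items()})
    return table


def consensus(peptides) -> str:
    """Most frequent residue per position (alphabetical order breaks ties)."""
    return "".join(
        max(sorted(column), key=column.get)
        for column in position_frequencies(peptides)
    )


# Hypothetical hexapeptides recovered as substrates of one protease.
cleaved = ["AVLRSG", "AVLKSG", "GVLRSA", "AILRTG"]
print(consensus(cleaved))
for position, column in enumerate(position_frequencies(cleaved), start=1):
    print(position, column)
```

Positions with a sharply peaked distribution point to a selective pocket, whereas flat distributions indicate tolerance; in practice, far larger ensembles than this toy set are needed for meaningful statistics.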
15.1.3
Exploring Chemical Space
As mentioned in the introduction, the expedition through biological space with
small molecules has gone through several stages, swiveling between post- and
presynthesis selection of chemical structures [44]. From a purely empirical
level led by phenomenological studies without guidance from structural
information, through a phase of strong desire to rationally design drugs,
and via the high-number trial-and-error games of high-throughput screening and
combinatorial chemistry, we have today reached a stage where chemical biology
strives to integrate knowledge and technologies in the quest of finding novel
starting points for biological space exploration (Fig. 15.1-3). The achievements
of the past are not forgotten, but are used today in a biologically conscious
combination, which is exemplified in the novel lead discovery approaches that
were established in the last 5 years.

Fig. 15.1-3 Schematic visualization of the various concepts to address chemical
and biological space (shaded areas) in drug discovery. Medicinal chemistry focused
on compound series (red dots) that had shown activity in pharmacological assays,
and compound optimization was driven by a tight feedback from biological
experiments, leading to a focused nonarrayed addressing of chemical space. The
combinatorial promise was to systematically explore the chemical space with
diverse arrays of compounds (blue dots) to find the suitable starting points.
Analysis of combinatorial chemistry libraries showed their limited diversity and
often mismatch to biological space. Chemical biology approaches combine the
technologies established for array synthesis with choosing appropriate starting
points for the libraries. Focused libraries start from known active compounds.
Scaffold hopping (blue arrows) and morphing (green arrows) attempt to evolve known
structures by searching for close neighbors or by combination of elements of two
compound series. Fragment approaches identify chemical motifs with biological
activity that can provide novel starting points (flags) for arrayed synthesis.

15.1.3.1 Building on the Established - Privileged Scaffolds

Combinatorial chemistry had raised the expectations of solving the challenge
of making the complete chemical space available for testing. Yet, it was
quickly realized that this hope was futile. Calculating the numbers of possible
chemical structures that would be considered druglike, for example, based on
carbon, hydrogen, nitrogen, oxygen, sulfur, and phosphorus with molecular
weights below 500, estimates reached ballpark figures of 10'' [45]. Even if
we assume that we could represent this space through 1% of the structures,
an estimate that is made often for representative selections from compound
sets, we are still looking at structures. The material requirements for
a single representation of each structure go beyond the resources available
in the known universe. Besides the disillusionment caused by the numbers, it
was soon recognized that compounds from combinatorial libraries were often
inactive or poorly active on biological molecules unless they were derived from
known active compounds. The structures were based on chemical feasibility
and therefore densely populated the regions of chemical space offered by the
scaffolds. With the insight that combinatorial libraries would not be capable
of addressing the biological space and would even fall way short of filling
the chemical space even within the boundary of molecular weights below
500, the utilization of combinatorial chemistry and parallel synthesis shifted
from a diversity approach to densely populating chemical space around proven
starting points, compounds with documented biological activity.
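The scale of the material requirements discussed above can be made tangible with a short calculation in logarithmic form. The exponent of the estimate quoted from [45] is not legible in this copy of the text, so the value of 60 used below is purely a placeholder input and not the source's figure; the sample size of 1 mg per compound is likewise an assumption. The function name is hypothetical.

```python
import math

EARTH_MASS_KG = 5.97e24  # approximate mass of the Earth, used only for scale


def log10_total_mass_kg(log10_structures, fraction, mg_per_sample):
    """log10 of the mass (kg) needed for one sample of each selected structure."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if mg_per_sample <= 0:
        raise ValueError("sample mass must be positive")
    log10_samples = log10_structures + math.log10(fraction)
    # Convert milligrams to kilograms: 1 mg = 1e-6 kg.
    return log10_samples + math.log10(mg_per_sample) - 6.0


ASSUMED_LOG10_STRUCTURES = 60  # placeholder exponent, NOT the value from [45]
log10_mass = log10_total_mass_kg(ASSUMED_LOG10_STRUCTURES, 0.01, 1.0)
orders_above_earth = log10_mass - math.log10(EARTH_MASS_KG)
print(f"log10(total mass in kg) = {log10_mass:.1f}")
print(f"orders of magnitude above the Earth's mass: {orders_above_earth:.1f}")
```

Whatever exponent is inserted, representing 1% of the structures lowers it by only two, which is why sampling alone does not make the full chemical space physically accessible.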
The literature and databases on marketed drugs provide many of these
starting points. The analysis of drugs in the market and development revealed
that a limited set of 32 frameworks formed the basis of more than 50% of
the marketed drugs [46]. Although this analysis, like all retrospective studies,
may be biased toward GPCR activity modulators that represent a significant
fraction of drugs in the market, the study underlines two aspects. First, to
date we have explored only a very limited subset of chemical space in our drug
discovery efforts, but remaining within this space makes us quite successful.
Second, nature may not be as structurally creative and tolerant as has been
assumed, and therefore biological space may not be as diverse as envisioned.
Beyond these points, the bias toward GPCR ligands may not be as limiting
as it may seem, as GPCRs, through their subfamilies, bind a variety of
structural motifs, such as nucleotides, lipids, and peptides, and small molecule
ligands like nicotinic acid or dopamine [47]. These ligand types are actually
shared with other target families and therefore the structural motifs from
the drugs in the market can be transferred to drug discovery of other target
families that may seem unrelated at first glance, such as nucleotide mimics
for kinases and peptide mimics for proteases. Although we are using a target
family approach, molecular frameworks may be the uniting concept between
target families, a fact underlining the importance of structural analysis and
knowledge gathering discussed earlier.
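The observation that a small number of frameworks accounts for a large share of marketed drugs can be checked on any annotated compound list with a simple frequency count. The following Python sketch is illustrative only: the function name `frameworks_needed` and the toy framework labels are our assumptions and are not taken from the cited analysis; in practice the framework of each drug would be assigned by a cheminformatics toolkit.

```python
from collections import Counter


def frameworks_needed(framework_per_drug, coverage=0.5):
    """Return how many of the most frequent frameworks are needed to
    cover at least `coverage` (a fraction) of the drugs in the list."""
    counts = Counter(framework_per_drug)
    total = len(framework_per_drug)
    covered = 0
    # Walk through the frameworks from most to least frequent.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= coverage:
            return rank
    return len(counts)


# Hypothetical toy data: one framework label per drug (not real drug data).
toy_drugs = (["benzene"] * 5 + ["indole"] * 3 + ["steroid"] * 2
             + ["purine", "quinoline"])

print(frameworks_needed(toy_drugs))       # frameworks needed for 50% coverage
print(frameworks_needed(toy_drugs, 0.9))  # frameworks needed for 90% coverage
```

A steep curve, where very few frameworks reach 50% coverage, is the signature of the retrospective bias discussed above.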
These insights have reshaped our thinking about library synthesis and high-
throughput screening and led to the concept of focused target family libraries
to improve screening efficiency. Focused screening sets provide, if constructed
appropriately, multiple advantages. First, they reduce the cost and effort of
screening campaigns and address the throughput limitations of some assay
types. Second, high-quality activity data are gathered from the beginning, as
the smaller compound numbers allow multiple data points to be measured per
compound and thus reduce the occurrence of false positives and negatives. Third,
they provide higher hit rates, and thus SAR, from the initial screening and
provide guidance for chemical programs directly. Yet, a delicate balance between
focused screening and the chance for serendipity remains to be maintained,
especially to address the challenge of discovering novel chemotypes that enable
securing an intellectual property position and exploring novel interfaces of
chemical and biological spaces.
15.1.3.2 A Journey through Chemical Space - Focused Libraries and Scaffold Hopping
The heavy use of privileged scaffolds leads to an incestuous reinvestigation
of established structures. While this may be advantageous for efficiently
optimizing lead structures toward drug candidates as we are moving on
known terrain, it also limits our ability to resolve old issues or to find
new activities. It is commonly understood that similar chemical structures
elicit similar biological responses and we base our optimization strategies on
this concept [48]. Yet, the investigation of target families and the ensuing
structural investigations highlight one pitfall: If two similar molecules cause
a similar response on the target, then we have to assume that two structurally
similar targets respond to a molecule in a similar way. Especially for kinases,
the prototypic target family, we observe this phenomenon with significant
activities of one compound on several kinases. Most known kinase inhibitors
act as competitors of ATP, the universal cosubstrate for all kinases, and
therefore frequent hitters are quite common in high-throughput screening of
kinase inhibitors [49]. In a recent investigation, the binding affinities of 20
structurally diverse kinase inhibitors that are in clinical trials or marketed drugs
were investigated against a panel of 113 kinases distributed across the kinome.
The study highlights that even “selectivity”-optimized kinase inhibitors are
a long way from being selective and hit targets across the kinome [24]. The
kinome maps are phylogenetic trees based on sequence similarities, and we
have already discussed the shortcomings of phylogenetic analysis for high-
resolution structural grouping. Inhibition profiles of series of compounds
can give us guidance for structural clustering of kinases that is necessary to
devise selective and potent inhibitors [23]. Taking the structural similarities of
proteins and especially their ligand-binding sites one step further, we realize
that kinases are not the only proteins interacting with nucleotides, such as ATP.
A large group of GPCRs binds to nucleotides and their modulators bear strong
structural similarities to kinase inhibitors. Their scaffolds are interchangeable
and the activity of kinase inhibitors is often observed on nucleotide-binding
GPCRs, most likely being an additional source of the side effects of kinase
inhibitors observed in physiological settings. However, as the nucleotide-binding
sites of GPCRs are structurally more diverse, the problem of cross-reactivity is
not as pronounced, and other GPCR subfamilies do not suffer as strongly from
ligand promiscuity.
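The idea of grouping kinases by their inhibition profiles rather than by sequence can be illustrated with a small calculation. In the Python sketch below, each kinase is described by the affinities of a series of compounds, and two kinases are considered related when these profiles correlate. The affinity values, the kinase labels K1 to K3, and the function names are hypothetical and serve only to show the principle; they do not reproduce the data of the cited profiling studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles.
    Assumes neither profile is constant (nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def profile_similarity(matrix, names):
    """matrix: rows = compounds, columns = kinases (e.g. pKd values).
    Returns the pairwise correlation of the kinase columns."""
    cols = list(zip(*matrix))
    return {
        (names[i], names[j]): pearson(cols[i], cols[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Hypothetical affinities (pKd) of four compounds against three kinases.
kinases = ["K1", "K2", "K3"]
affinity = [
    [8.0, 7.8, 5.0],
    [6.0, 6.2, 8.5],
    [7.5, 7.4, 5.5],
    [5.0, 5.3, 9.0],
]

for pair, r in profile_similarity(affinity, kinases).items():
    print(pair, round(r, 2))
```

In this toy set, K1 and K2 respond to the compounds almost identically and would be placed in one cluster, whereas K3 shows the opposite preference, irrespective of where the three kinases sit on a phylogenetic tree.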
As we have gained more and more structural insights, the rational design
of lead structures and the virtual screening of compound collections or
even virtual compound collections have gained tremendous importance.
While the methods have become more sophisticated over the years, the
challenges of making extrapolations from known chemotypes and data
remain. With the advent of combinatorial chemistry, molecular diversity was
one of the predominant themes. Although many measures for diversity
have been devised, the “Tanimoto” coefficient being the best known, the
results depend heavily on the descriptors used to span the chemical space.
Furthermore, coming from a structural diversity assessment, the measures do
not reflect the diversity with respect to the targets. To this day, the development
and selection of suitable descriptors for the chemical space remain a
challenge: an exhaustive enumeration of molecules in the “druglike” space
is not feasible; therefore, all the descriptor sets in use focus on specific
applications and pharmacophoric subregions of chemical space. The use of
the above-mentioned “privileged fragments” as virtual building blocks for the
enumeration of structures constitutes one approach that has proven useful
for the design of target family oriented libraries (Fig. 15.1-4). Utilizing these
scaffolds, for example, fused heteroaromatic cores for kinases and nucleotide-
binding GPCRs, offers the ability to target the libraries toward the respective
protein families and ensures the stability of the computational methods
through the similarity of the generated structures. The targeted libraries
usually represent 200-1000 compounds around a given scaffold, giving a high
certainty in assessing whether the elaborated chemotype is suitable for a given
target or target family.
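The enumeration of a targeted library around one scaffold is, computationally, a Cartesian product of the building blocks allowed at each variation point. The Python sketch below shows this under simplifying assumptions: the scaffold is a plain text template with named slots, and the scaffold name, slot names, and building block labels are hypothetical placeholders rather than real structures. A production workflow would enumerate and filter actual chemical structures.

```python
from itertools import product


def enumerate_library(scaffold, substituents):
    """Yield one label per virtual compound.

    scaffold:      template string with named slots, e.g. 'core[R1={R1}]'
    substituents:  dict mapping each slot name to a list of building blocks
    """
    slots = sorted(substituents)
    for combo in product(*(substituents[s] for s in slots)):
        yield scaffold.format(**dict(zip(slots, combo)))


# Hypothetical building block sets for two variation points.
blocks = {
    "R1": [f"amine_{i}" for i in range(20)],
    "R2": [f"aryl_{i}" for i in range(25)],
}

library = list(enumerate_library("aminopyrimidine[R1={R1}, R2={R2}]", blocks))
print(len(library))   # 20 x 25 combinations
print(library[0])
```

With 20 and 25 building blocks the product falls within the 200-1000 compounds mentioned above; adding a third variation point quickly exceeds that range, which is why the building block sets have to be chosen selectively.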
The privileged fragments mimic in most cases the natural ligands.
This makes kinases and nucleotide-binding GPCRs quite suitable to this
approach and the scaffolds used cannot deny their pedigree. In addition
to these ATP mimetics, kinases accept another class of ligands, “hinge-
binders”, out of their catalytically inactive conformation. This conformation
has been termed DFG-out conformation, due to the observed orientation
of a loop containing the amino acid triplet aspartate-phenylalanine-glycine.
This binding mode was unexpected but is used by many selective kinase
inhibitors, such as Gleevec. The other subclasses of GPCRs, such as amine
or peptide-binding GPCRs, accept tertiary amines or dipeptide ligand mimics.
Peptidomimetic approaches are used heavily to build protease scaffolds.
Selective protease inhibitors are quite straightforward to obtain because
of the substrate variety and specificity of the proteases. However, the
concept of privileged scaffolds does not carry far. The unifying element
in protease substrates is the extended β-strand conformation that allows
interactions with four to six subpockets in the protease active site [43].
Mimics for this conformation have been developed but they still lack universal
applicability. Unlike the scaffolds for kinase or GPCR ligands, the cores
of protease inhibitors, like the peptidic backbone in the substrate, do not
contribute the majority of binding energy, and are therefore not crucial for
affinity to the target (although they may severely affect the pharmacokinetic
properties of the inhibitors). The energetic drivers of protease/inhibitor
binding are the interactions in the subpockets, determining activity and
selectivity of the inhibitors. Recently, these pockets were probed directly
with molecular fragments that are linked to each other upon showing
affinity to the targets. These fragment-based approaches will be discussed
below.
Fig. 15.1-4 Examples of privileged scaffolds and their relation to target
families. Drugs and compounds in discovery tend to mimic the natural ligands.
For example, protease inhibitors, although they are diverse in structure, mimic
the β-strand peptide conformation common to all protease substrates to orient
small groups into the binding pockets. Some of them, in addition, address the
catalytic residues with covalent binding (e.g., Pralnacasan or GW 311616) or by
mimicking the transition state of proteolysis, like saquinavir. Even protease
inhibitors not reaching through the catalytic triad use cyclic scaffolds to
achieve an extended conformation, like the factor Xa inhibitor [50]. With this
characteristic, the structures come close to those of peptide-binding GPCR
antagonists, as shown in the example of the neuropeptide Y antagonist.
Structural orientation through cyclic structures is also used in more compact
ligands, like the CB1 antagonist rimonabant, where the template establishes the
correct orientation of the lipophilic residues to the tertiary amine. The same
scheme helps position the two lipophilic residues of the MAP p38 kinase
inhibitor relative to the ATP mimicking pyrimidine amine. Inhibitors binding to
the open conformation of kinases, like gleevec or iressa, show a more extended
shape. In addition to addressing some ATP interactions in the active site, they
also reach into the peptide substrate region and therefore have to mimic the
strand conformation of phosphorylated peptides as well.
As mentioned before, the design based on privileged scaffolds has an impact
on the novelty of the discovered molecules. This problem is augmented by
the fact that the virtual screening tools that are used today tend to favor
interpolation to known or closely similar structures over extrapolation to novel
scaffolds. To circumvent these issues, which became especially prominent in
kinase-directed drug discovery, the concepts of “scaffold hopping” or “scaffold
morphing” are applied (Fig. 15.1-5). In both exercises, the matrices comparing
inhibition or affinity across targets and compounds [23,24,49,52] are crucial to
support the selection of appropriate starting points.
[Figure 15.1-5, chemical structures not reproduced. Panel “Scaffold hopping:
major variations discovered in silico”: starting probe (described as active
against 5-HT3A in the MDDR). Panel “Scaffold morphing in biology: structural
variation to modulate function”: atropine (muscarinic cholinergic receptor
antagonist, plant alkaloid), cocaine (dopamine receptor antagonist, plant
alkaloid), epibatidine (acetylcholine receptor, poison frog), and anatoxin
(very fast death factor, cyanobacteria).]
Fig. 15.1-5 In silico scaffold hopping and biological scaffold morphing.
Starting from a bioactive probe reported as active against the 5-HT3A receptor
in the MDDR, about 120 000 records of the MDDR were searched using relaxed
similarity requirements. The discovered chemotypes provide novel ideas for
chemistry [51]. The bicyclic structure evolved to address multiple targets for
biological defense. While cyanobacteria as monocellular organisms use only
cytotoxicity for defending themselves, multicellular organisms have fine-tuned
the activity of tropane-like molecules to affect the central nervous system of
natural enemies, while at the same time being resistant to the poisons. Yet the
successful bicyclic amine was maintained as a core of the molecule.
Scaffold hopping describes
a virtual screening technique that uses rather loose similarity boundaries. To
assess the similarity of molecules, Boolean strings of structural property
descriptors, so-called fingerprints, are compared with each other. One of the
metrics of similarity is the “Tanimoto coefficient”, which relates the number
of bits set in both strings to the total number of bits set in either string.
If the fingerprints are based on two-dimensional structural descriptors, like
frameworks and small fragments, compounds with a Tanimoto coefficient
larger than 85% are usually considered similar. At this similarity level,
compounds retrieved from database similarity searches are expected to be active
on the same target [53]. For scaffold hopping, the similarity boundaries are
loosened to a 60-70% level compared to the starting structures. The resulting
compounds are clustered according to the similarity of their scaffolds. These
scaffold clusters are investigated and, if found active, used for compound
library design. Usually, the resulting compounds carry structural elements of
the starting molecules, which also serve as anchors in the target protein [51, 54].
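The fingerprint comparison described above can be written down in a few lines. In the Python sketch below, a fingerprint is represented as the set of bit positions that are switched on, and the Tanimoto coefficient is the number of shared bits divided by the number of bits set in either fingerprint. The function names, the toy fingerprints, and the way the two thresholds are combined into three classes are our assumptions for illustration; real fingerprints contain hundreds to thousands of bits and are generated by cheminformatics software.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections
    of 'on' bit positions: |A and B| / |A or B|."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: defined as 0 here by convention
    return len(a & b) / union


def classify(query, candidate, similar=0.85, hop_low=0.60):
    """Sort a database hit into one of three classes.
    The cut-offs follow the 85% and 60% levels mentioned in the text."""
    t = tanimoto(query, candidate)
    if t >= similar:
        return "similar"
    if t >= hop_low:
        return "scaffold-hop candidate"
    return "dissimilar"


# Hypothetical toy fingerprints (bit positions that are set).
query = set(range(1, 11))                 # bits 1..10
close_analog = set(range(1, 11)) | {11}   # shares all 10 bits, adds one
hop = set(range(1, 9)) | {20, 21}         # shares 8 bits
unrelated = set(range(1, 8)) | {20, 21}   # shares 7 bits

for name, fp in [("close_analog", close_analog), ("hop", hop),
                 ("unrelated", unrelated)]:
    print(name, round(tanimoto(query, fp), 3), classify(query, fp))
```

Because the coefficient depends entirely on which descriptors define the bits, the same pair of molecules can fall on different sides of a threshold with different fingerprint types, which is the descriptor dependence noted earlier in this section.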
The description of molecules in two dimensions basically reflects the con-
nectivity of the atoms, which is useful for fast searches of large databases
starting from known structures. However, the interaction with the biological
target occurs in a three-dimensional space and currently, we assume that
the target recognizes more properties of the ligand than individual atoms.
Thus, virtual screening is often performed using 3D-pharmacophore models.
These pharmacophore models are rather straightforward to derive, if detailed
structural information on the target is available. The structural information is
then translated into a cast that is used to select fitting molecules. The shapes and,
especially, electrostatic properties can be refined by information on ligands
to the target proteins. For kinases, the addition of the shapes provides an
enveloping shape for the ATP-binding pocket that can be addressed through
screening [XI. Using ligand information for the building of pharmacophore
models comes especially into play when little or no structural information on
the target is available, such as for GPCRs and ion channels. We discussed above
the approaches for structural prediction based on sequence similarity, but in
reality virtual screening with pharmacophores derived from ligand-activity
relationships provides more accurate information. The need for ligand-target
information is addressed by databases that collect and consolidate information
from the literature in a target family oriented fashion. Under the target
family paradigm, the crystallographic information for GPCRs and ion channels
discussed above [12-15] is used to template the pharmacophore models.
Although these ligand-based models are more challenging to build and are not as
accurate as the models based on structural information, they are - in part also
because of their fuzziness - quite useful in scaffold-hopping approaches [55].
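How a pharmacophore model selects “fitting molecules”, and why a fuzzy model helps in scaffold hopping, can be made concrete with a distance-based matching sketch. In the Python example below, both the query and the candidate are lists of typed feature points in three dimensions, and a candidate matches if some of its features reproduce all pairwise distances of the query within a tolerance. The feature types, coordinates, tolerance, and the function name `matches` are hypothetical; this brute-force search is only a minimal illustration and not the algorithm of any particular screening program. Note that distances alone do not distinguish mirror-image arrangements.

```python
from itertools import permutations
from math import dist


def matches(query, features, tol=0.5):
    """Return True if the candidate's features can be assigned to the query
    points so that feature types agree and every pairwise distance differs
    by at most `tol` (same length unit as the coordinates).

    query, features: lists of (feature_type, (x, y, z)) tuples.
    """
    k = len(query)
    for cand in permutations(features, k):
        # Feature types must agree position by position.
        if any(c[0] != q[0] for c, q in zip(cand, query)):
            continue
        # All pairwise distances must agree within the tolerance.
        if all(
            abs(dist(query[i][1], query[j][1])
                - dist(cand[i][1], cand[j][1])) <= tol
            for i in range(k) for j in range(i + 1, k)
        ):
            return True
    return False


# Hypothetical three-point pharmacophore.
query = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
         ("hydrophobe", (0, 4, 0))]

# Candidate A: same geometry, shifted in space, plus one extra feature.
mol_a = [("donor", (1, 1, 1)), ("acceptor", (4, 1, 1.2)),
         ("hydrophobe", (1, 5, 1)), ("acceptor", (9, 9, 9))]

# Candidate B: acceptor much too far from the donor.
mol_b = [("donor", (0, 0, 0)), ("acceptor", (7, 0, 0)),
         ("hydrophobe", (0, 4, 0))]

print(matches(query, mol_a))
print(matches(query, mol_b))
```

Widening `tol` admits candidates whose features sit on a different scaffold but in a similar spatial arrangement, which is the sense in which the fuzziness of ligand-based models becomes useful.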
The observation that structural elements are conserved even through
changing scaffolds led to the idea of “scaffold morphing”. Several scaffolds
with proven activities are overlaid and combined to yield novel chemotypes.
In addition to generating novel chemical matter, it is hoped to combine
favorable properties from the individual scaffolds while losing the undesired
characteristics in the process. Scaffold morphing is not unknown to medicinal
chemistry and biological evolution. If we recall the previous discussion on
protein homology and phylogeny in this chapter, we realize that nature uses
combinations of functioning domains to provide novel three-dimensional
structures. The best domain combinations survive the evolutionary pressure.
Thus, it should not really come as a surprise that we find small molecule
motifs repeated with minor modifications in various natural products. Once
a successful scaffold was selected the biochemical synthesis pathway had an
evolutionary advantage and propagated itself into various organisms. Medicinal
chemistry uses iterative modification of bioactive structures in its efforts
to provide selective and pharmacokinetically optimized compounds, once a
suitable starting point for variation has been found. Although the approach has
been established experimentally for some time, the adaptation of “scaffold
morphing” ideas and algorithms to lead finding and virtual drug discovery has
been tackled only recently, and the success of generating structural diversity
for finding novel starting points and entering novel regions of chemical space
remains to be evaluated. The observation of “privileged fragments” across target
families in the literature, as well as in the virtual screening approaches
discussed above, led to novel screening approaches that investigate the
interactions of such fragments instead of “full-size” ligands with their protein
counterparts.
15.1.3.3 Putting the Pieces Together - Fragment Approaches

For a long time it was thought to be impossible to detect the interaction of
small molecular fragments with target proteins, as the energetic determinants
of small fragments binding to a protein surface or pocket were believed to
work against high affinity interactions. Studying protein structures and the
energetics of protein-ligand interactions leads to a different perception:
First, it should be possible to identify weakly binding
molecules by measuring the affinity of a ligand to the protein
instead of attempting to influence the biochemical behavior in
competition with natural ligands, as these molecules only have
to interact in a two-way equilibrium instead of a three-way
competition.
Second, the required molecular size for ligand-protein
interactions in defined pockets has been overestimated. A
recent study by Kuntz et al. shows that even small molecules
can form tight complexes with proteins. Each heavy atom can
contribute as much as 1.5 kcal mol⁻¹ in binding energy or a
10-fold increase in affinity [56].
Third, it is not so much the enthalpy contributions but
entropic aspects that determine the suitability of fragments to
serve as anchors for lead optimization. “Molecular anchors”
show an energetic “stability gap” between the best binding
conformation and the second-best binding mode.
Promiscuously binding fragments show a more or less
continuous distribution of energy levels for different
interactions of fragment and protein [57].
Consequently, even a molecule fragment with as few as 10-12 heavy atoms
could theoretically lead to a nanomolar inhibitor or ligand. The small size may
even prove advantageous as the detrimental effects of molecule parts bumping
into the protein surface could be avoided. However, as the surface area
addressed through the ligand determines the binding energy, the topology of
the protein surface will be of crucial importance and will bias the applicability
of this technology to enzymes. In fact, most of the approaches are directed to
the deep specificity pockets of proteases or address the ATP site of kinases.
Recently, several reviews have summarized some of the successes from the
chemical point of view [58-60]; so, we will highlight here only some examples
that are illustrative of the use in target families.
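The claim that 10-12 heavy atoms could in theory suffice for a nanomolar ligand follows from the standard relation between binding free energy and dissociation constant, ΔG = RT ln Kd. The Python sketch below applies this relation to the upper limit of 1.5 kcal mol⁻¹ per heavy atom quoted above. The temperature of 298.15 K and the function names are our assumptions, and the result is a theoretical ceiling: real fragments usually realize only part of this maximal contribution.

```python
from math import exp, log

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def kd_from_dg(dg, temp=298.15):
    """Dissociation constant (mol/L) for a binding free energy dg
    (kcal/mol, negative for favorable binding): Kd = exp(dg / RT)."""
    return exp(dg / (R * temp))


def best_case_kd(heavy_atoms, per_atom=1.5, temp=298.15):
    """Kd if every heavy atom contributed the maximal `per_atom` kcal/mol."""
    return kd_from_dg(-per_atom * heavy_atoms, temp)


def fold_change(ddg, temp=298.15):
    """Factor by which affinity changes for an energy difference ddg."""
    return exp(ddg / (R * temp))


print(f"gain per 1.5 kcal/mol: {fold_change(1.5):.1f}-fold")
print(f"energy for 10-fold:    {R * 298.15 * log(10):.2f} kcal/mol")
for n in (10, 12):
    print(f"{n} heavy atoms: best-case Kd = {best_case_kd(n):.1e} M")
```

At room temperature a 10-fold change in affinity corresponds to about 1.4 kcal/mol, so the figure of 1.5 kcal/mol per atom and the "10-fold" statement are consistent, and even a fraction of the maximal contribution leaves a 10-12 atom fragment within reach of nanomolar affinity.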
Two different concepts are currently followed for affinity screening
approaches. One focuses on optimizing the throughput for detecting
interactions and employs mass spectrometry or surface plasmon resonance
(SPR) technologies, establishing structural insights only in second level
experiments. Although several approaches have been described to use mass
spectrometry in affinity screening, the most promising concept couples the
equilibration with a brief size exclusion chromatography to remove unbound
library members before determining the ligands bound to target proteins by
mass spectrometry [61]. A family experiment using several JNK kinases thus
provided selective inhibitors with nanomolar activities and molecular weights
starting at around 350 Da [62]. As the removal of unbound compounds relies
on the size difference between small molecules and proteins, the approach has
also been shown to be quite powerful for screening membrane-bound proteins
that are captured in micelles. In a pilot study, GPCR aggregates provided a
high molecular size during separation from the small unbound ligands and
allowed the identification of ligands of the M2 receptor [63]. SPR, another established
methodology for quantifying protein-protein interactions, suffered for a long
time from the slow speed because long equilibration times are required before
the readout. Additionally, the detection of the interactions of small molecules
with proteins seemed to be impossible as the SPR signal correlates with the
increase in layer thickness. Small molecules lead to only a small change in layer
thickness but improvements in technology meanwhile allow the measurement
of weaker affinities. The breakthrough of using SPR for affinity screening came
with the capability to combinatorially synthesize small ligands conjugated to
surface attachment tags. These conjugates can be spotted in arrays on the SPR
detection [64]. Recently, the search for fragments binding to the S1-specificity
pocket of the serine protease factor VIIa yielded haloaromatic moieties that
can be substituted for the well-known but undesired benzamidine as anchor.
Haloaromatic moieties were known as ligands to the benzamidine binding S1
pockets of the S1-clan serine proteases, such as factor Xa or thrombin. This
knowledge guided the design for a library of approximately 1500 small-size
fragments, which were immobilized on a microarray. Affinity screening with
factor VIIa identified several small ligands, and their interaction in the S1
pocket could be confirmed by crystallography using trypsin as a surrogate for
faster crystallographic screening and reconfirmation of the binding in factor
VIIa [65].
The second line of concepts for fragment screening tries to extract as much
structural information from the initial interaction experiment and relies on
either NMR or crystallography, paying for the increased information content
with a limitation in throughput. The door to these experiments was pushed
wide open when Fesik and colleagues reported the successful screening of
small molecular fragments against the S1' pocket of stromelysin (matrix
metalloprotease 3). Using biaryl systems, they could show that the resonances
in the NMR spectra shifted when the molecule fragments bound to the protein
(Fig. 15.1-6). Conjugating these fragments with hydroxamic acid, a potent zinc
chelator, provided compounds with nanomolar affinities [66]. The elegance
and potential of fragment-based screening approaches were underlined by a
detailed investigation of the thermodynamics of the interactions [67]. Using
the NMR-fragment screening but another fragment set, high affinity inhibitors
with novel structural motifs were discovered from a small set of fragments
for urokinase, wherein the deep S1-specificity pocket served as an anchoring
point for the ligands [68]. Starting from these anchors the ligands grew into
the S2, S3, and S4 pockets of the enzyme. Biochemical data as well as structural
information from NMR experiments guided the optimization toward selective
and nanomolar inhibitors [69]. Fragment-based screening for kinase ligands
takes a slightly similar approach as kinase inhibitors do not have to bridge
several pockets. They can be grown from a central scaffold into some side
pockets of the ATP-binding site to improve selectivity and activity. The growth
of inhibitors has been demonstrated in the case of growing nonnucleotide
binders into the nucleotide-binding pocket of adenosine kinase [70].
In addition to being used for confirmation and investigation of binding
modes, crystallography has recently been established as a screening tool. The
technological advances in computing power and structure solving algorithms
allow the soaking and high-throughput crystallography of compound libraries
[71-73]. The intriguing aspect of seeing the ligand’s orientation directly as
a screening result may counterbalance a higher false negative rate caused
by ligands cracking the crystals or by ligands not being able to penetrate
the protein because of restricted conformational flexibility in the crystal.
Fragments to be screened are usually selected on the basis of known ligands
or crystal structures of the protein. Millimolar activities in biochemical assays
usually provide enough affinity to yield cocrystals, but compounds with effects
in this range are usually not detected in traditional high-throughput screenings
as they are in competition with the biological substrates.
Fig. 15.1-6 Selected fragment screening experiments applied to proteases and
kinases. In their landmark study, Fesik et al. equilibrated hydrophobic
molecules with stromelysin and detected binding by shift of NMR signals,
retrieving structural information from the initial study [66]. Other studies
screened fragment collections using mass spectrometry [62] or surface plasmon
resonance [65] and established the binding modes of the ligands after
identification through crystallography. In a recent approach, crystals of CDK2
were used to select oxindole ligands from a dynamic combinatorial library and
established the binding modes by crystallography in situ [76].
In a de novo lead
finding approach, Sanders et al. utilized the structural similarity of the active
pockets from urokinase and dihydroneopterin aldolase (DHNA) to select the
fragments to be screened against DHNA. The probing of the enzyme with the
same fragment set that had been used for urokinase by Nienaber et al. [71],
allowed establishing the structural requirements for selectivity in the initial
screening run and guiding the extension of the discovered fragments into
nanomolar inhibitors for DHNA [74]. Starting from privileged scaffolds for
the ATP pocket of kinases, fragments binding to p38 MAP kinase and cyclin-
dependent kinase 2 (CDK2) were discovered, that can serve as novel central
building blocks for kinase inhibitors [75]. As the throughput of crystallography
is still limited compared to biochemical screenings, collection sizes have to be
small or, as in the previous example, mixtures of fragments have to be screened.
To expand the size of collections that can be screened by crystallography,
Congreve et al. devised a dynamic combinatorial library system using CDK2
protein crystals as selectors for the tightest binding ligands which are formed
from the condensation of isatin and hydrazines. Instead of equilibrating with a
large amount of template protein, the reaction mixture is exposed to individual
crystals of CDK2 guiding the selective formation of imino-indolones. The
structures of selected reaction products are determined by crystallography,
immediately establishing a binding mode for the nanomolar inhibitors of
CDK2 [76].
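The remark above that millimolar binders are sufficient for cocrystals but go undetected in conventional high-throughput screens can be rationalized with simple equilibrium arithmetic. In a direct binding or soaking experiment the fragment faces a two-way equilibrium, whereas in a biochemical assay it has to compete with the substrate. The Python sketch below uses the textbook single-site occupancy expression and the standard correction for a competitive binder; the concentrations are hypothetical examples, and the function names are ours.

```python
def occupancy(ligand_conc, kd):
    """Fraction of protein bound at free ligand concentration `ligand_conc`
    for a simple one-site, two-way equilibrium (same units for both)."""
    return ligand_conc / (ligand_conc + kd)


def apparent_kd(kd, competitor_conc, competitor_kd):
    """Apparent Kd of the ligand when a competitor for the same site is
    present: Kd,app = Kd * (1 + [competitor] / Kd,competitor)."""
    return kd * (1 + competitor_conc / competitor_kd)


fragment_kd = 1e-3  # hypothetical fragment with millimolar affinity (mol/L)

# Direct binding, e.g. soaking at 1 mM fragment without competitor.
direct = occupancy(1e-3, fragment_kd)

# Hypothetical biochemical screen: 10 uM fragment, substrate at 10 x its Km.
kd_app = apparent_kd(fragment_kd, competitor_conc=10.0, competitor_kd=1.0)
screen = occupancy(1e-5, kd_app)

print(f"occupancy in direct binding:     {direct:.2f}")
print(f"occupancy in competitive screen: {screen:.4f}")
```

Under these assumed conditions half of the protein carries the fragment in the direct experiment, while well below 1% is occupied in the competitive screen, which is too little to register as inhibition.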
Today, the application of fragment approaches is still limited to soluble
proteins, but in future there will be adaptations to membrane-bound proteins,
especially those in which the ligand does not have to compete with natural
ligands, like GPCRs or ion channels, to exert a functional response in a
biochemical assay. The structural insights in the target families will guide the
selection of fragment sets and allow using individual proteins as surrogates
for the whole target family.
15.1.4
Epilogue
Over the last 5 years chemical biology has reshaped the methods of doing
drug discovery. The investigations of the structural characteristics of target
families allow us today to take a more rational approach toward selecting
appropriate compounds for synthesis and testing. Through the sequencing
of the human genome, we have the blueprint of the building blocks of life
that can be modulated in their interactions through therapeutics. In addition
to the aspects discussed in this chapter, the analysis of pharmacokinetic
characteristics of molecules in the human body has established guidelines and
boundaries for molecules that help us to navigate chemical space toward regions
that offer a higher population of structures that may be suitable as
drugs [3, 4]. While many of the concepts of the target family approach may not
be novel if looked at individually, their conscious combination adds another
dimension: “chemical biology” is based on a thorough structural knowledge of
similarities and differences within a target family. On the basis of the sequence
homologies of proteins we can currently make predictions for ligands to
hitherto unexplored targets, thus building a powerful stepping-stone for lead
discovery. We have also learned how to use closely related family members
as surrogates when the target under study is not amenable to a particular
technology, such as crystallography. Today’s structural understanding also
allows us to make more sophisticated choices about investigations to prevent
side effects, and the increasing biological knowledge helps us to rationalize
side effects of drugs and to modify affected drugs accordingly. Yet, we
still run into the trap of building assay schemes for drug discovery that
allow high throughput and are self-consistent. The high-throughput design
sacrifices the biochemical mimicking of the cellular environment, such as the
previously mentioned high concentration and viscosity, for technical feasibility.
The self-consistency often carries the risk of losing the relevance for the
pathophysiological phenomenology and thus jeopardizes the predictivity for
the therapeutic setting, being detached from reality like the “Hessian glass
bead game” [77]. Eventually “systems biology” will elucidate how the building
blocks of life work together in networks and pathways and which results can
be expected by tweaking one dial in the system, leading to novel and powerful
assay set-ups. Thus, drug discovery may come full circle to where it started,
but equipped with the chemical biology armamentarium of understanding and
predicting the phenomenological changes observed in diseased states and after
the administration of drugs.
References
1. J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton,
H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, J.D. Gocayne,
P. Amanatides, R.M. Ballew, D.H. Huson, J.R. Wortman, Q. Zhang,
C.D. Kodira, X.H. Zheng, L. Chen, M. Skupski, G. Subramanian,
P.D. Thomas, J. Zhang, G.L. Gabor Miklos, C. Nelson, S. Broder,
A.G. Clark, J. Nadeau, V.A. McKusick, N. Zinder, A.J. Levine,
R.J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos,
A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern,
S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert,
K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi,
R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab,
K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck,
C. Evangelista, A.E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu,
P. Guan, T.J. Heiman, M.E. Higgins, R.R. Ji, Z. Ke, K.A. Ketchum,
Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G.V. Merkulov,
N. Milshina, H.M. Moore, A.K. Naik, V.A. Narayan, B. Neelam,
D. Nusskern, D.B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun,
Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao,
C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao,
L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert,
S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali,
H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow,
K. Beeson, D. Busam, A. Carver, A. Center, M.L. Cheng, L. Curry,
S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup,
S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes,
C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam,
J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May,
S. McCawley, T. McIntosh, The sequence of the human genome,
Science 2001, 291, 1304-1351.
2. J. Drews, Drug discovery: a historical perspective, Science 2000, 287,
1960-1963.
3. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental
and computational approaches to estimate solubility and permeability in
drug discovery and development settings, Adv. Drug Delivery Rev. 1997,
23, 3-25.
4. D.F. Veber, S.R. Johnson, H.-Y. Cheng, B.R. Smith, K.W. Ward et al.,
Molecular properties that influence the oral bioavailability of drug
candidates, J. Med. Chem. 2002, 45, 2615-2623.
5. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug
Discov. 2002, 1, 727-730.
15 Target Families
848
I 6. G. Wess, M. Urmann, G-protein-coupled receptors in the
B. Sickenberger, Medicinal chemistry: human genome form five main
challenges and opportunities, Angew. families. Phylogenetic analysis,
Chem., Int. Ed. Engl. 2001, 40, paralogon groups, and fingerprints,
3341-3350. Mol. Pharmacol. 2003, 63, 1256-1272.
7. P.P. Wangikar, A.V. Tendulkar, 17. D.K. Vassilatis, J.G. Hohmann,
S. Ramya, D.N. Mali, S. Sarawagi, H. Zeng, F. Li, J.E. Ranchalis et al.,
Functional sites in protein families The G protein-coupled receptor
uncovered via an objective and repertoires of human and mouse,
automated graph theoretic approach, Proc. Natl. Acad. Sci. U.S.A. 2003, 100,
]. Mol. Bid. 2003, 326, 955-978. 4903-4908.
8. M.A. Koch, R. Breinbauer, 18. M.H. Saier Jr, A functional-
H. Waldmann, Protein structure phylogenetic classification system for
similarity as guiding principle for transmembrane solute transporters,
combinatorial library design, Biol. Microbiol. Mol. Biol. Rev. 2000, 64,
Chem. 2003,384,1265-1272. 354-411.
9. L. Orning, G. Krivi, F.A. Fitzpatrick, 19. S . Caenepeel, G. Charydczak,
Leukotriene A4 hydrolase. Inhibition S . Sudarsanam, T. Hunter,
by bestatin and intrinsic G. Manning, The mouse kinome:
aminopeptidase activity establish its discovery and comparative genomics
functional resemblance to of all mouse protein kinases, Proc.
metallohydrolase enzymes, ]. Biol. Natl. Acad. Sci. U.S.A. 2004, 101,
Chem. 1991,266,1375-1378. 11707-11712.
10. E. Fischer, Effekt der 20. S.M. Foord, Receptor classification:
Zuckerkonfiguration auf die post genome, Curr. Opin. Phamacol.
Enzymwirkung. Ber. Dtsch. Chenz. Ges. 2002, 2,561-566.
1894, 27,2985. 21. T. Naumann, H. Matter, Structural
11. D.E. Koshland Jr, The lock-and-key classification of protein kinases using
principle and the induced-fit theory, 3D molecular interaction field analysis
Angew. Chem., Int. Ed. Engl. 1994, 33, of their ligand binding sites: target
2475-2478. family landscapes, 1.Med. Chem. 2002,
12. K. Palczewski, T. Kumasaka, T. Hori, 45,2366-2378.
C.A. Behnke, H. Motoshima et al., 22. H. Matter, W. Schwab, Affinity and
Crystal structure of rhodopsin: A G selectivity of matrix metalloproteinase
protein-coupled receptor, Science 2000, inhibitors: a chemometrical study
289,739-745. from the perspective of ligands and
13. Y. Jiang, A. Lee, J. Chen, M. Cadene, proteins,]. Med. Chem. 1999, 42,
B.T. Chait et al., Crystal structure and 4506-4523.
mechanism of a calcium-gated 23. M. Vieth, R.E. Higgs, D.H. Robertson,
potassium channel, Nature 2002, 417, M. Shapiro, E.A. Gragg et al.,
515-522. Kinomics-structural biology and
14. M. Nishida, R. MacKinnon, Structural chemogenomics of kinase inhibitors
basis of inward rectification: and targets, Biochim. Biophys. Acta
cytoplasmic pore of the G 2004, 1697,243-257.
protein-gated inward rectifier GIRKl 24. M.A. Fabian, W.H. Biggs, D.K.
at 1.8. ANG. resolution, Cell 2002, 111, Treiber, C.E. Atteridge, M.D.
957-965. Azimioara et al., A small
15. A. Kuo, J.M. Gulbis, J.F. Antcliff; molecule-kinase interaction map for
T. Rahman, E.D. Lowe et al., Crystal clinical kinase inhibitors, Nat.
structure of the potassium channel Biotechnol. 2005, 23, 329-336.
KirBacl.1 in the closed State, Science 25. N. Vaidehi, W.B. Floriano,
2003,300,1922-1926. R. Trabanino, S.E. Hall, P. Freddolino
16. R. Fredriksson, M.C. Lagerstrom, et al., Prediction of structure and
L.-G. Lundin, H.B. Schioth, The function of G protein-coupled
References I 8 4 9
receptors, Proc. Natl. Acad. Sci. U.S.A. and strategies, Curr. Opin. Chem. Biol.
2002, 99, 12622-12627. 2003, 7,64-69.
26. P. Hummel, N. Vaidehi, W.B. 36. H. Zhang, X. Zha, Y. Tan, P.V.
Floriano, S.E. Hall, W.A. Goddard 111, Hornbeck, A.J. Mastrangelo et al.,
Test of the binding threshold Phosphoprotein analysis using
hypothesis for olfactory receptors: antibodies broadly reactive against
explanation of the differential binding phosphorylated motifs, /. Biol. Chem.
of ketones to the mouse and human 2002, 277,39379-39387.
orthologs of olfactory receptor 912-93, 37. 2 . Songyang, S. Blechner,
Protein Sci. 2005, 14, 703-710. N. Hoagland, M.F. Hoekstra,
27. S. Shacham. Y. Marantz, S. Bar-Haim, H. Piwnica-Worms et al., Use of a n
0. Kalid, D. Warshaviak N. Avisar, oriented peptide library to determine
B. Inbal, A. Heifetz, M. Fichman, the optimal substrates of protein
M. Topf, 2 . Naor, S . Noiman, kinases, Curr. Biol. 1994, 4, 973-982.
O.M. Becker, PREDICT modeling and 38. P.M. Chan, H.P. Nestler, W.T. Miller,
in-silico screening for G-protein Investigating the substrate specificity
coupled receptors, Proteins 2004, 57, of the Her-Z/Neu kinase using peptide
51-86. libraries, Cancer Lett. 2000, 1 GO,
28. T. Klabunde, G. Hessler, Drug 159-169.
design strategies for targeting 39. L.A. Witucki, X. Huang, K. Shah,
G-protein-coupled receptors, Y.Liu, S . Kyin eta]., Mutant tyrosine
ChemBioChem 2002,3,928-944. kinases with unnatural nucleotide
29. O.M. Becker, Y. Marantz, S. Shacham, specificity retain the structure and
B. Inbal, A. Heifetz et al., G phospho-acceptor specificity of the
protein-coupled receptors: in silico wild-type enzyme, Chem. Biol. 2002, 9,
drug discovery in 3D, Proc. Natl. Acad. 25-33.
Sci. U.S.A. 2004, 101, 11304-11309. 40. D.C. Greenbaum, W.D. Arnold, F. Lu,
30. J.S. Mitcheson, J. Chen, M. Lin, L. Hayrapetian, A. Baruch et al., Small
C. Culberson, M.C. Sanguinetti, A molecule affinity fingerprinting a tool
Structural basis for drug-induced long for enzyme family subclassification,
QT-syndrome, Proc. Natl. Acad. Sci. target identification, and inhibitor
U.S.A. 2000, 97, 12329-12333. design, Chem. Biol. 2002, 9,
31. R.A. Pearlstein. R.J. Vaz, J. Kang, X.-L. 1085-1094.
Chen, M. Preobrazhenskaya et al., 41. H.P. Nestler, A. Doseff, A
Characterization of HERG potassium two-dimensional, diagonal sodium
channel inhibition using CoMSiA 3D dodecyl sulfate polyacrylamide gel
QSAR and homology modeling electrophoresis technique to screen for
approaches, Bioorg. Med. Chem. Lett. protease substrates in protein
2003, 13,1829-1835. mixtures, Anal. Biochem. 1997, 251,
32. A.M. Aronov, Predictive in silico 122-125.
modeling for hERG channel blockers. 42. M. Meldal, 1. Svendsen, K. Breddam,
Drug Discou. Today 2005, 10,149-155. F.-I. Auzanneau, Portion-mixing
33. J.F. Antcliff, S. Haider, P. Proks, peptide libraries of quenched
M.S.P. Sansom, F.M. Ashcroft, fluorogenic substrates for complete
Functional analysis of a structural subsite mapping of endoprotease
model of the ATP-binding site of the specificity, Proc. Natl. Acad. Sci. U.S.A.
KATP channel Kir6.2 subunit, E M B O 1994, 91, 3314-3318.
I. 2005, 24,229-239. 43. I.D.A. Tyndall, T. Nall, D.P. Fairlie,
34. 0. Civelli, GPCR deorphanizations: Proteases universally recognize beta
the novel, the known and the strands in their active sites, Chem. Rev.
unexpected transmitters, Trends 2005, 105,973-999.
Pharmacol. Sci. 2005, 26, 15-19. 4.A. Eschenmoser, One hundred years
35. D.E. Kalume, H. Molina, A. Pandey, of the lock-and-key principle, Angew.
Tackling the phosphoproteome: tools Chem., Int. Ed. Engl. 1994, 33, 2363.
15 Target Families
850
I 45. R.S. Bohacek, C. McMartin, W.C. 54. L. Naerum, L. Norskov-Lauritsen, P.H.
Guida, The art and practice of Olesen, Scaffold hopping and
structure-based drug design: a optimization towards libraries of
molecular modeling perspective, Med. glycogen synthase kinase-3 inhibitors,
Res. Rev. 1996, 16, 3-50. Bioorg. Med. Chem. Lett. 2002, 12,
46. G.W. Bemis, M.A. Murcko, The 1525-1528.
properties of known drugs. 1. 55. D.G. Lloyd, C.L. Buenemann, N.P.
Molecular frameworks, J. Med. Chem. Todorov, D.T. Manallack, P.M. Dean,
1996,39,2887-2893. Scaffold hopping in de novo design:
47. K. Bondensgaard, M. Ankersen, ligand generation in absence of
H. Thogersen, B.S. Hansen, B.S. receptor information, J. Med. Chem.
Wulff et al., Recognition of privileged 2004,47,493-496.
structures by G-protein coupled 56. I.D. Kuntz, K. Chen, K.A. Sharp, P.A.
receptors, J. Med. Chem. 2004, 47, Kollman, The maximal affinity of
888-899. ligands, Proc. Natl. Acad. Sci. U.S.A.
48. Y.C. Martin, J.L. Kofron, L.M. 1999, 96,9997-10002.
Traphagen, Do structurally similar 57. P.A. Rejto, G.M. Verkhiver,
molecules have similar biological Unraveling principles of lead
activity? I.Med. Chem. 2002, 45, discovery: from unfrustrated energy
4350-4358. landscapes to novel molecular
49. A.M. Aronov, M.A. Murcko, Toward a anchors, Proc. Natl. Acad. Sci. U.S.A.
pharmacophore for kinase frequent 1996, 93,8945-8950.
hitters, J . Med. Chem. 2004, 47, 58. D.A. Erlanson, R.S. McDowell,
5616-5619. T. O’Brien, Fragment-based drug
discovery, J . Med. Chem. 2004, 47,
50. H. Matter, E. Defossa, U. Heinelt,
3463-3482.
P.-M. Blohm, D. Schneider et al.,
59. D.C. Rees, M. Congreve, C.W. Murray,
Design and quantitative
R. Carr, Fragment-based lead
structure-activity relationship of
discovery, Nat. Rev. Drug Discov. 2004,
3-amidinobenzyl-1H-indole-2-
3,660-672.
carboxamides as potent, nonchiral,
60. H.P. Nestler, Combinatorial chemistry
and selective inhibitors of blood
and fragment screening - two unlike
coagulation factor Xa, J . Med. Chem.
siblings? Curr. Drug Discov. Technol.
2002,45,2749-2769.
2005, 2, 1-12.
51. J.L. Jenkins, M. Glick, J.W. Davies, A 61. Y.M.Dunayevskiy, P. Vouros, E.A.
3D similarity method for scaffold Wintner, G.W. Shipps, T. Carell et al.,
hopping from known drugs or natural Application of capillary
ligands to new chemotypes, J. Med. electrophoresis-electrospray ionization
Chem. 2004,47,6144-6159. mass spectrometry in the
52. D. Horvath, C. Jeandenans, determination of molecular diversity,
Neighborhood behavior of in silico Proc. Natl. Acad. Sci. U.S.A. 1996, 93,
structural spaces with respect to in 6152-6157.
vitro activity spaces-a novel 62. G. Agnihotri, M.P. Scott, M.H.
understanding of the molecular Alaoui-Ismaili, U.F. Mansoor,
similarity principle in the context of D. Murphy et al., Identification of
multiple receptor binding profiles, I. potent inhibitors of c-Jun N-terminal
Chem. InJ Comput. Sci. 2003, 43, kinase-1 (JNK1) using ultra
680-690. high-throughput affinity based
53. H. Matter, Selecting optimally diverse screening, 12th Symposium on Second
compounds from structure databases: Messengers and Phospho-proteins
a validation study of two-dimensional (SMP-2004),2004.
and three-dimensional molecular 63. Y. Hou, J. Felsch, A. Annis, C.E.
descriptors, I . Med. Chem. 1997, 40, Whitehurst, C.C. Cheng et al.,
1219-1229. Identification of small molecule
References I851
ligands for G protein coupled receptor Bayburt et al., Design of adenosine

using affinity selection screening, kinase inhibitors from the NMR-based
GPCR IBC Conference, 2002. screening of fragments, J . Med. Chem.
64. G . Metz, H. Ottleben, D. Vetter, Small 2000,43,4781-4786.
molecule screening on chemical 71. V.L. Nienaber, P.L. Richardson,
microarrays, Meth. Princ. Med. Chem. V. Klighofer, 7.1. Bouska, V.L. Giranda
2003, 19,213-236. et al., Discovering novel ligands for
65. S. Dickopf, M. Frank, H.-D. Junker, macromolecules using X-ray
S. Maier, G. Metz et al., Custom crystallographic screening, Nut.
chemical microarray production and Biotechnol. 2000, 18, 1105-1108.
affinity fingerprinting for the S 1 72. R. Carr, H. Jhoti, Structure-based
pocket of factor VIIa, Anal. Biochem. screening of low-affinity compounds,
2004,335,50-57. Drug Discov. Today 2002, 7, 522-527.
66. P.J. Hajduk, G. Sheppard, D.G. 73. A. Sharff, H. Jhoti, High-throughput
Nettesheim, E.T. Olejniczak, S.B. crystallography to enhance drug
Shuker et al., Discovery of potent discovery, Curr. Opin. Chem. Biol.
nonpeptide inhibitors of stromelysin 2003, 7, 340-345.
using SAR by NMR,]. Am. Chem. SOC. 74. W.J. Sanders, V.L. Nienaber, C.G.
1997, 119,5818-5827. Lerner, J.O. McCall, S.M. Merrick
67. E.T. Olejniczak, P.J. Hajduk, P.A. et al., Discovery of potent inhibitors of
Marcotte, D.G. Nettesheim, R.P. dihydroneopterin aldolase using
Meadows et al., Stromelysin inhibitors crystaLEAD high-throughput X-ray
designed from weakly bound crystallographic screening and
fragments: effects of linking and structure-directed lead optimization, J .
cooperativity,]. Am. Chem. SOC. 1997, Med. Chem. 2004, 47, 1709-1718.
119, 5828-5832. 75. M.J. Hartshorn, C.W. Murray,
68. P.J. Hajduk, S. Boyd, D. Nettesheim, A. Cleasby, M. Frederickson, I.J.
V. Nienaber, J. Severin et al., Tickle et al., Fragment-based lead
Identification of novel inhibitors of discovery using X-ray crystallography,
urokinase via NMR-based screening, J . J. Med. Chem. 2005, 48,403-413.
Med. Chem. 2000,43. 3862-3866. 76. M.S. Conpreve, D.1. Davis, L. Devine,
Y
69. M.D. Wendt, T.W. Rockway, A. Geyer, C. Granata, M. O'Reilly et a].,

W. McClellan, M. Weitzberg et a]., Detection of ligands from a dynamic
Identification of novel binding combinatorial library by X-ray
interactions in the development of crystallography, Angew. Chem., lnt. Ed.
potent, selective 2-naphthamidine Engl. 2003, 42,4479-4482.
inhibitors of urokinase. Synthesis, i'7. D.F. Horrobin, Opinion: modern
structural analysis, and SAR of biomedical research: an internally
N-phenyl amide 6-substitution, J . Med. self-consistent universe with little
Chem. 2004, 47,303-324. contact with medical reality? Nat. Rev.
70. P.J. Hajduk, A. Gomtsyan, Drug Discov. 2003, 2, 151-154.
S . Didomenico, M. Cowart, E.K.
15.2
Chemical Biology of Kinases Studied by NMR Spectroscopy
Marco Betz, Martin Vogtherr, Ulrich Schieborr, Bettina Elshorst, Susanne Grimme,
Barbara Pescatore, Thomas Langer, Krishna Saxena, and Harald Schwalbe
Outlook
The review presents NMR methods that contribute to structure-guided drug
design for the family of protein kinases. Eight kinase-targeted oncology drugs
emerged on the market in the past eight years, although the understanding
of the key molecular events of tumourigenesis has made great advances.
Kinases have a key role in the dysregulation of tumour growth and survival.
Consequently, tumour-specific kinase inhibitors are needed to open new
therapeutic opportunities for cancer patients.
The recent advances in the recombinant expression of the catalytic domains
of protein kinases, which have extended the range of kinases amenable
to NMR-guided drug discovery, will be described. The review will focus on methods
that provide information on the binding properties of small molecules to the
catalytic domains of protein kinases as identified from NMR-based screening
trials. Moreover, aspects of the dynamic behaviour of key residues involved
in kinase-ligand interactions at the active site will be explained. A practical
tool (LIGDOCK) to calculate docking complexes with small molecules at
high precision, which helps medicinal chemists to judge structure-activity
relationships, will be presented. The resulting information about selectivity,
binding site, and binding mode was used for the step-by-step optimisation
of molecular fragments. The insights obtained from NMR studies had
important implications for the drug discovery process, as demonstrated by
an enhanced selectivity of small compound collections towards a given
target kinase.
15.2.1
Introduction
15.2.1.1 Kinases as Drug Targets

The so-called gene-family approach to drug discovery allows the simultaneous
assessment of both potency and selectivity of protein targets. Primary screens
of individual therapeutic targets are followed by secondary screens composed
of homologous members of the protein family [1-3]. To benefit from synergy,
drug discovery programs have integrated the recent advances in genomics
and structural genomics [4]. This facilitates a gene-family approach for protein
classes such as G-protein coupled receptors (GPCRs), ion channels, protein
kinases, and protein phosphatases. These protein families are key players in cell
signaling networks, governing processes such as cell growth, cell division, and
cell death. The pathophysiology of many diseases is caused by the perturbation
of these pathways, whether through environmental stresses or genetic defects.
Regarding protein kinases, deregulated activity is involved in all aspects of
neoplasia, including proliferation, invasion, angiogenesis, and metastasis [5].
As a result of the sequencing of the human genome, approximately 500 protein
kinases have been predicted. Although only a comparatively small number
of protein kinases have actually been targeted by established drugs, it is now
accepted that finding protein kinase inhibitors is a viable way to discover
new drugs. The remarkable success of the first inhibitors, the anticancer
drugs Gleevec (Novartis) [6] and Iressa (AstraZeneca) [7], supports the idea
of targeting a kinase that is pivotal to a malignant phenotype. These findings
have increased the efforts in drug discovery and development research in
this area.
15.2.1.2 Kinases - An Overview

The highly organized succession of biochemical reactions found in living
organisms takes place along the route of signal transduction. The concept
of “activating” a protein is of central importance in signal transduction [8].
Activation is usually accomplished by structural reorganization of the protein
triggered by events such as ligand binding, protein-protein interactions, or
chemical modifications. One of the most versatile activation mechanisms is
activation by phosphorylation. A defined activation status of a target is
maintained by two counteracting classes of enzymes, the protein kinases that
specifically phosphorylate targets and the protein phosphatases that specifically
dephosphorylate them. The majority of protein kinases catalyze the transfer
of the terminal phosphate group from ATP to specific serine or threonine
residues, and a minority phosphorylate tyrosine residues of their protein
substrates [9, 10].
Inhibition of protein kinases has been a powerful tool to study signal
transduction pathways. It is easily achieved by inhibitors such as staurosporine.
However, staurosporine inhibits a broad spectrum of kinases and is therefore
not suited for therapeutic purposes [11]. As the understanding of kinase
mechanism and inhibition has advanced in the past years, there is now an
increasing number of kinase inhibitors that are specific for one particular
kinase. The discovery of such specific inhibitors has in turn enhanced our
understanding of kinase action and signal transduction pathways.
15.2.1.3 Structural Biology of Kinases

Most native protein kinases are assembled in a modular fashion. This assembly
always contains the protein kinase catalytic domain that catalyzes the transfer
of a phosphate group to a substrate. Most kinases have additional variable
domains that are involved in kinase recognition, activation, or localization.
Common variable kinase domains include protein-protein recognition
domains (e.g., SH2, SH3, PH, or polo box domains), signaling domains, and
membrane anchoring domains [12, 13]. Although the variable domains are of
particular interest in biochemical, structural, and pharmacological research,
the focus of this review is restricted to the kinase catalytic domains. Their high
degree of structural and functional conservation is an ideal prerequisite for a
target-family approach; this domain has therefore also been the major point of
attack in most kinase-directed drug strategies.
All related serine/threonine kinases and protein tyrosine kinases share a
structurally conserved catalytic domain of about 270 amino acids. Numerous
kinase catalytic domains have been structurally characterized by X-ray
crystallography [14-18]. All of them share the highly conserved bilobal fold
that is depicted in Fig. 15.2-1. In this fold, the N-terminal lobe is composed
almost entirely of β-sheets, whereas the C-terminal lobe is dominated by
α-helices. The two lobes are joined by a polypeptide chain, which functions
as a hinge. The catalytic site is located at the interface region between both
lobes. The adenine moiety of ATP binds deep in a hydrophobic pocket between
the lobes, while the phosphates of ATP are aligned by interactions with the
backbone amides of a glycine-rich loop. The protein substrate binding site is
associated mostly with the C-terminal lobe. The catalytic cycle of the phosphate
transfer and the conformational reorganizations linked to it are reasonably
well understood [19].

Fig. 15.2-1 Ribbon diagram showing the structure of the catalytic
domain of murine protein kinase A (PKA) in complex with
Mg/ATP (1Q24.pdb). The basic architecture that has been
observed in all subsequent kinase domain structures is denoted.
Crystallographic studies of mammalian protein kinase A (PKA) with and
without Mg-ATP and an inhibitory polypeptide have revealed two different
conformational states. The so-called open form is seen in the apo form and in
the binary complex with the peptide. The N-terminal lobe is turned away from
the C-terminal lobe by 14° when compared to the closed conformation. The
closed structure can be observed in the ternary complex with Mg-ATP and the
peptide substrate. This conformation is necessary to bring the residues into
the correct orientation to promote catalysis [20].
A key aspect of regulation is that most kinases can be activated by specific
phosphorylation, but there are numerous other kinase-specific activation
and inactivation pathways that involve protein-protein interactions. The
phosphorylation takes place on residues located in a particular segment
in the center of the kinase domain, which is termed the activation segment. It
is defined as the region spanning the conserved sequences DFG and APE. The
conversion from an inactive to an active state involves conformational changes
in the protein that lead to the correct disposition of substrate binding and
catalytic groups. Structures of kinases with unmodified activation loops fall
into two classes. Cyclin-dependent kinase 2 (CDK2) and insulin receptor kinase
(IRK) are representatives of enzymes that adopt inactive conformations in
their resting state. Their activation loop has an inhibitory fold, blocking the
sites for ATP and substrate [14, 16]. p21-activated protein kinase 1 (PAK1) and
PKA, when freed of a negative regulator such as the inhibitory switch (IS) domain
[21] or the R subunit [22], respectively, appear to relax into an accessible
conformation.
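The definition of the activation segment as the region spanning the conserved DFG and APE sequences can be illustrated with a short sketch. It simply searches a one-letter amino acid sequence for the first DFG motif and the next APE motif after it. The function name and the toy sequence are hypothetical and serve illustration only; a real analysis would more likely start from a curated kinase alignment, since a plain string search cannot handle variant motifs.

```python
def find_activation_segment(sequence, start_motif="DFG", end_motif="APE"):
    """Return (start, end, segment) for the first DFG...APE span, or None.

    Indices are 0-based; `end` is exclusive, so sequence[start:end]
    includes both motifs.
    """
    seq = sequence.upper()
    start = seq.find(start_motif)
    if start == -1:
        return None  # no DFG motif present
    # Look for APE only downstream of the DFG motif.
    end_motif_pos = seq.find(end_motif, start + len(start_motif))
    if end_motif_pos == -1:
        return None  # no APE motif after DFG
    end = end_motif_pos + len(end_motif)
    return start, end, seq[start:end]


# Hypothetical toy sequence for illustration, not a real kinase domain.
toy = "MKKLIVTDFGLARHTDDEMTGYVATRWYRAPEIMLNW"
result = find_activation_segment(toy)
print(result)
```

The returned indices can then be used to flag which residues of a construct belong to the activation segment, for example when inspecting candidate phosphorylation sites.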
15.2.1.4 NMR Spectroscopy in Drug Discovery

NMR spectroscopy is involved at many different stages in pharmaceutical
research. Biomolecular NMR spectroscopy, which will be discussed in this review,
is just one application of a technique that is routinely used in the pharmaceutical
industry for reaction control, metabonomics [23], or characterization of natural
products [24]. The most commonly used biomolecular NMR techniques in
pharmaceutical research are screening of small-molecule substance libraries
and the structural characterization of protein-ligand complexes. These
applications can be at least partially accomplished by other approaches
(high-throughput screening (HTS), surface plasmon resonance (SPR), modeling,
and X-ray crystallography) [25]. However, it has turned out that NMR ideally
complements these methods.
For a long time, NMR spectroscopy of proteins was restricted to
small proteins and peptides. However, recent methodological and technical
progress has enabled NMR spectroscopy to routinely study proteins that
are as large as 40 kDa, thereby in principle allowing NMR studies on most
protein kinase catalytic domains. Although numerous internal studies on
protein kinases have been conducted in pharmaceutical companies, there
are only a few publications that describe the advantages of biomolecular
NMR spectroscopy for protein kinases. In this review, the typical workflow
of an NMR spectroscopic approach, when investigating protein kinases as
pharmaceutical targets, is outlined. In particular, protein NMR experiments to
characterize the target protein are covered. The review includes the NMR-driven
search for the best expression construct and the optimal buffer
conditions, where the achieved protein yield meets adequate signal-to-noise
ratio (discussed in Chapters 15.2.2.1 and 2.2). Several kinases are screened
in a fragment-based approach; the set-up of ligand-based NMR experiments
results in the identification of new ligands, whose binding affinities toward
the individual kinase are derived (Chapter 3.3). The combination strategy with
the protein-based NMR characterization of the interactions, which leads to
detailed knowledge about viable ligands at their residue-specific binding site,
will be explained (Chapter 15.2.4). Subsequent molecular docking with NMR-derived
constraints, which reveals the binding modes of molecular fragments
at atomic resolution and serves as a starting point for further optimization
steps, is shown (Chapter 15.2.4.3).
15.2.2
Protein NMR Spectroscopy on Kinases
15.2.2.1 Protein Expression

The requirement for stable proteins with good solution behavior is common to
both X-ray crystallography and NMR techniques. In most protein expression
laboratories, the two most frequently utilized expression hosts for recombinant
proteins are Escherichia coli and baculovirus-infected insect cells [26]. For
advanced NMR investigations, the incorporation of the nonradioactive but
NMR-active isotopes ¹⁵N, ¹³C, and ²H throughout the protein is necessary.
The host expression organism is grown in isotopically enriched minimal media
or special commercial full media. In practical terms, this limits labeled
protein expression to E. coli and yeast when cost efficiency is taken into
account. Proteins with disulfide bonds and proteins that require
glycosylation or other posttranslational modifications are often difficult if not
impossible to obtain from expression in E. coli. In these cases, yeast expression
systems such as Pichia pastoris can be used for NMR purposes, since P. pastoris
can metabolize methanol as the sole carbon source and provides glycosylated
proteins [27].
As an alternative for the recombinant expression of eukaryotic proteins,
selectively labeled amino acids can be introduced by the baculovirus expression
system in Spodoptera frugiperda (Sf9) insect cells at reasonable cost, as
shown in [28]. Recently, the uniform isotope labeling of the Abl kinase using
baculovirus-infected insect cells has been reported [29].
The expression of protein kinases involves several considerations, particularly if the
specific protein targets are difficult to express. A typical workflow is depicted
in Fig. 15.2-2. The employed strategy for recombinant kinase expression is
to utilize E. coli and P. pastoris as expression hosts in parallel. Screening
of several constructs and fusion partner tags in both expression systems
leads to properly folded kinases in most cases. If both expression hosts fail,
the expression system can be changed to baculovirus-infected insect cells.
Figure 15.2-3 shows an ensemble of [¹H, ¹⁵N]-TROSY (transverse relaxation-optimized
spectroscopy) spectra recorded of Bruton's tyrosine kinase (BTK) labeled with
five different amino acids [30].

Fig. 15.2-2 Typical workflow of the expression of protein kinases. Optimization
is performed in E. coli and Pichia pastoris simultaneously to find the most suitable
expression system. If both expression hosts fail, the expression system is changed to
baculovirus-infected insect cells. With the systems A and B, uniformly
¹⁵N/¹³C/²H-labeled protein kinases are possible and lead to a triple-labeled sample
for the NMR assignment. With the baculovirus system only ¹⁵N-selective amino
acid labeling is economically feasible.
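The decision logic of this expression workflow can be summarized in a few lines. The sketch below assumes that "systems A and B" in the figure caption refer to E. coli and P. pastoris, which is a reading of the text rather than something stated explicitly. The function name and the input format are hypothetical.

```python
def plan_expression(folded_in_host):
    """Pick expression hosts and a labeling scheme from screening results.

    `folded_in_host` maps a host name to True if screening in that host
    gave a properly folded kinase (hypothetical input format).
    """
    primary_hosts = ("E. coli", "P. pastoris")  # screened in parallel
    viable = [host for host in primary_hosts if folded_in_host.get(host, False)]
    if viable:
        # Either primary host permits uniform triple labeling.
        return {"hosts": viable, "labeling": "uniform 15N/13C/2H"}
    # Fallback: only selective 15N amino acid labeling is economical here.
    return {
        "hosts": ["baculovirus-infected insect cells"],
        "labeling": "selective 15N amino acid",
    }


both_failed = plan_expression({"E. coli": False, "P. pastoris": False})
pichia_only = plan_expression({"E. coli": False, "P. pastoris": True})
print(both_failed)
print(pichia_only)
```

The sketch deliberately returns all viable primary hosts instead of ranking them, because the text leaves the choice of the most suitable system to the outcome of the construct and tag screening.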
The expression of a particular kinase can be toxic to the host cells. This
effect was reported for PAK1, which could only be expressed in E. coli in
an autoinhibited state. A single point mutation of residue lysine 299 to an
arginine residue forces the activation segment into a conformation in which
ATP binding is prevented, leading to a kinase-dead protein that is no longer toxic to
the host cells [31]. However, for drug discovery that targets the active
site of PAK1 with ATP analogs, a different point mutation is introduced: ATP
binding should still be possible, but the catalytic activity should be inhibited.
In case of expression difficulties, surrogate kinases are of major importance.
[Figure 15.2-3: five [¹H, ¹⁵N]-TROSY spectra. Panel titles: (a) Met: 12/12 signals, (b) Ile: 12/13 signals, (c) Leu: 25/26 signals, (d) Val: 17/18 signals, (e) Phe: 11/12 signals. Horizontal axis: δ(¹H) [ppm].]
Fig. 15.2-3 [¹H, ¹⁵N]-TROSY spectra of different BTK samples with selectively
labeled ¹⁵N-amino acids: Met (a), Ile (b), Leu (c), Val (d), and Phe (e). The circles in
the upper corner represent the 20 possible amino acids. Proline is not considered
because it lacks the NMR-detectable amide proton.
Protein kinase B (PKB/Akt) is a validated target in drug design, but the catalytic
domain cannot be expressed in E. coli in a functional form. PKA and PKB/Akt,
both of which belong to the AGC family (PKA/protein kinase G/PKC) of
protein kinases, share high sequence homology. Distinct point mutations
in the active site of PKA (PKAB6 and PKAB8 chimeras) are introduced to
enhance their similarity and their corresponding binding profile [32].
Depending on the expression system and fusion protein used, the yield
of an expressed kinase can vary by one order of magnitude. Changing the glutathione
S-transferase (GST) fusion protein to an N-terminal His tag for the expression
of p38 increases the yield of the recombinant protein by a factor of 8.
Another example of construct optimization is mitogen-activated
protein kinase-activated protein kinase-2 (MAPKAP-2). Screening
of 20 protein constructs with different N- and C-terminal ends leads to an
NMR-feasible kinase. In this case, the protein expression yield of the different
constructs is of minor importance; the major goal is a properly
folded protein with long-term stability during the recording of the NMR
spectra.
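A construct screen of this kind can be set up by combining candidate N- and C-terminal boundaries. The following sketch enumerates all such combinations for a given sequence. The boundary positions, the toy sequence, and the function name are hypothetical; the text does not state how the 20 MAPKAP-2 constructs were chosen, and the 4 x 5 grid used here is only one way to arrive at 20 candidates.

```python
from itertools import product


def enumerate_constructs(sequence, n_starts, c_ends):
    """List constructs for every valid (start, end) boundary pair.

    Residue numbers are 1-based and inclusive, as is usual for proteins.
    """
    constructs = []
    for start, end in product(n_starts, c_ends):
        if 1 <= start <= end <= len(sequence):
            constructs.append(
                {
                    "name": f"{start}-{end}",
                    "start": start,
                    "end": end,
                    "sequence": sequence[start - 1:end],
                }
            )
    return constructs


# Hypothetical 60-residue toy sequence and boundary candidates.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 3
constructs = enumerate_constructs(
    toy_sequence, n_starts=[1, 5, 10, 15], c_ends=[40, 45, 50, 55, 60]
)
print(len(constructs), constructs[0]["name"], constructs[-1]["name"])
```

Each entry could then be annotated with the outcome of the folding and stability checks, which, according to the text, matter more here than the expression yield.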
The domain boundary has to be carefully chosen, which is also observed for
PKA expression. The A helix (amino acids 16-31) contains an N-myristylation
motif and a 50-residue extension at the C-terminus of the catalytic domain.
This amino acid stretch with the aromatic FTEF sequence must be
included during the NMR investigations because it folds back onto a
hydrophobic patch of the N-terminal lobe, thus stabilizing the whole protein
construct [33].
As a result, these expression efforts lead to a triple-labeled protein kinase
sample, providing the basis for the NMR assignment of the specific kinase.
15.2.2.2 Construct and Condition Optimization

The highly conserved protein kinase catalytic domain fold has a size
of approximately 40 kDa. This hampers protein NMR investigations through
aggravated signal overlap and rapid 1H and 13C transverse relaxation.
Deuteration of nonexchangeable protons in combination with 2H decoupling
efficiently increases the size limit for solution NMR by eliminating possible
relaxation pathways. Additionally, the introduction of [1H,15N]-TROSY [34]
based triple-resonance methods enhances the advantages of deuteration,
allowing the sequential resonance assignment of large proteins to be
obtained.
It is necessary to assign at least the NMR resonances of atoms comprising
the protein backbone prior to further site-specific NMR studies. This can be
accomplished routinely by using a suite of triple-resonance experiments and
uniformly 2H/13C/15N-labeled protein samples [35]. However, the relatively
high concentrations (100-600 µM) necessary for assignment require a
careful choice of measuring conditions. Only in some cases can the published
conditions for crystallization or testing for enzymatic activity be
directly translated for NMR investigations. Conditions can, however, be
optimized directly by the DOSY (diffusion ordered spectroscopy) NMR
experiment, from which diffusion constants can be obtained [36], and by
analyzing [1H,15N]-TROSY spectra for a set of conditions. Both methods
are powerful diagnostic tools. Usually it is sufficient to analyze a small
number of possible constructs at defined buffer conditions. Later the buffer
conditions can be optimized for the best construct by a two-dimensional
grid search for optimal pH and salt concentration. However, in some
cases the optimal conditions for various constructs can be different. It
is essential to keep the concentration constant in all samples because
TROSY signal intensity and aggregation behavior are themselves concentration
dependent.
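The two-dimensional grid search over pH and salt concentration can be sketched in a few lines. This is a minimal illustration only: `toy_score` is a hypothetical stand-in for a measured quality criterion (for example, the number of resolved TROSY peaks or a relative diffusion constant), and the grid values are assumptions, not conditions taken from the studies cited here.

```python
from itertools import product


def grid_search(ph_values, salt_mm_values, score):
    """Return (best_score, pH, salt_mM) over a two-dimensional buffer grid."""
    return max(
        (score(ph, salt), ph, salt)
        for ph, salt in product(ph_values, salt_mm_values)
    )


def toy_score(ph, salt_mm):
    """Hypothetical quality score with an optimum at pH 6.5 and 150 mM salt.

    In practice this would be replaced by a value measured on each sample
    at constant protein concentration.
    """
    return -((ph - 6.5) ** 2) - ((salt_mm - 150) / 100.0) ** 2


best = grid_search([5.5, 6.0, 6.5, 7.0, 7.5], [50, 100, 150, 200, 300], toy_score)
print("best pH and salt (mM):", best[1], best[2])
```

Every grid point is scored independently, so the protein concentration has to be identical in all samples for the comparison to be meaningful.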
For diffusion measurements, the bipolar LED sequence with water
suppression gives the best results [36]. While an absolute comparison between
different proteins is generally difficult, the relative diffusion constants for
one particular protein at different buffer conditions are good measures
for its oligomeric state or aggregated state, respectively. Furthermore, the
concentration dependence of the diffusion constant is a practical indicator
of the aggregation tendency. Figure 15.2-4 shows an example of what can be
achieved by proper optimization of buffer conditions for MAPKAP-2.
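How a relative diffusion constant is obtained from a gradient series can be illustrated with a short sketch. It assumes the simple Stejskal-Tanner attenuation, I = I0 exp(-D b) with b = (gamma g delta)^2 (Delta - delta/3); the bipolar LED sequence adds small timing corrections to b that are omitted here. Gradient strengths, delays, and diffusion constants are synthetic values chosen for illustration, not measured data.

```python
import math

GAMMA_1H = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1


def b_factor(g, delta, big_delta, gamma=GAMMA_1H):
    """Stejskal-Tanner factor b = (gamma*g*delta)^2 * (Delta - delta/3) in s/m^2."""
    return (gamma * g * delta) ** 2 * (big_delta - delta / 3.0)


def fit_diffusion(gradients, intensities, delta, big_delta):
    """Least-squares slope of ln(I) versus b; returns D in m^2/s."""
    xs = [b_factor(g, delta, big_delta) for g in gradients]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope


# Synthetic gradient series (assumed values): 2 ms gradient pulses, 100 ms delay.
delta, big_delta = 2e-3, 0.1
grads = [0.05 * k for k in range(1, 9)]  # gradient strengths in T/m


def synthetic_decay(d_true):
    """Noise-free attenuation curve for a given diffusion constant."""
    return [math.exp(-d_true * b_factor(g, delta, big_delta)) for g in grads]


d_buffer_a = fit_diffusion(grads, synthetic_decay(1.0e-10), delta, big_delta)
d_buffer_b = fit_diffusion(grads, synthetic_decay(0.7e-10), delta, big_delta)
ratio = d_buffer_b / d_buffer_a  # a ratio below 1 hints at slower diffusion in buffer B
print(f"relative diffusion constant: {ratio:.2f}")
```

Only the ratio between conditions is interpreted, in line with the remark that absolute comparisons between different proteins are difficult.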
NMR conditions should follow the usual guidelines for NMR samples [37].
Ample buffering is particularly important, preferably at low pH and at a
sufficient distance from the isoelectric point. To improve the signal-to-noise
ratio, the highest possible concentration is to be used without running into
aggregation problems.
Since one of the fundamental requirements for crystallization is good
solution properties, this type of screen can also be used to assess suitable
solution conditions for crystallization trials. The application of the described
workflow led to the successful expression and purification of several kinases,
which are validated targets in drug discovery. Figure 15.2-5 shows a gallery of
the corresponding spectra, which were recorded on the uniformly 15N-labeled
proteins in our group.
15.2.2.3 NMR Resonance Assignment

The NMR resonance assignment is possible in a standard fashion using a set
of triple-resonance experiments (e.g., HNCO, HN(CA)CO, HNCA, HNCACB,
HN(CO)CACB) [35]. The use of uniformly 2H/13C/15N-labeled protein samples
and a set of TROSY-type versions of these experiments is indispensable.
High-field instruments (800 MHz or higher) and cryogenic probeheads contribute
significantly to enhanced sensitivity. Figure 15.2-6 shows the amide region of
a [1H,15N]-TROSY spectrum of uniformly 15N-labeled PKA catalytic domain
with construct, buffer, and expression essentially taken over from published
X-ray crystallization studies. The spectrum obtained on an 800 MHz NMR
spectrometer is well resolved and demonstrates that NMR studies on protein
kinases are viable.
To benefit from the enhanced sensitivity of proton detection, 1H nuclei
are required at exchangeable sites such as the amide protons or the Cα
protons. Deuterated protein samples are usually prepared from host cells
grown in D2O-based media, which contain deuterated carbon sources.
Subsequent D/H exchange in the H2O-based NMR buffer reintroduces
protons at the labile positions. However, large proteins usually yield fewer
peaks in the 1H-15N correlation than expected from the primary
sequence. One explanation is a high deuteration level of nonexchangeable
amide sites in the hydrophobic core of large proteins. This problem
can be minimized, but not abolished, if the protein is overexpressed in
perdeuterated media powder that has been dissolved in H2O [38]. Even
then, protein resonances are missing in the HN-based triple-resonance
spectra. The absence of resonances presumably has more fundamental
reasons. Possible reasons are fast proton exchange rates with the solvent
or excessive line broadening caused by intramolecular motions. The
dynamic properties of a particular segment of mitogen-activated
protein (MAP) kinase p38α are discussed in more detail in Chapter 15.2.4.2.
In any case, the absence of peaks complicates sequential resonance assignment.
As compensation, additional information beyond elaborate NMR pulse
sequences is to be included. There are three practical tools that are used
to enable the assignment of the protein: (a) the chemical shift matching
procedure, (b) the use of paramagnetic spin labels, and (c) the use of amino
acid-selective labeled samples as described previously in Sections 15.2.1
and 15.2.2 [33].

Fig. 15.2-5 [1H,15N]-TROSY spectra of the catalytic domains of various protein kinases.
For p38, PKA and PKC assignments of the correlation peaks are available.

Fig. 15.2-6 [1H,15N]-TROSY spectrum of active murine protein kinase A (PKA) with the
annotated assignment.
15.2.2.3.1 The Chemical Shift Matching Procedure

The standard set of triple-resonance experiments needed for the sequential
assignment complements each 15N-1H correlation resonance with a set of
up to six intra- and interresidual cross peaks (Fig. 15.2-7(a)). The assignment
process can be divided into two steps. The first step is the search for resonances
of amino acids that are identified as neighbors on the basis of cross peaks with
identical carbonyl or Cα/Cβ carbon chemical shifts. An ensemble of several
consecutive cross peaks is called a stretch. Assumptions about the type of the
amino acid can be made from the chemical shifts of the corresponding Cα
and Cβ carbons. The second step is the unique positioning of a stretch onto
the primary sequence. This process is called matching. It is possible that the
stretch is so long that the consecutive amino acid types occur only once within
the sequence. In practice, the incompleteness of observable resonances for
a given protein implies that most of the stretches are comparatively short.
On the other hand, the number of possible matching positions grows with
increasing size of a protein.
The MAPPER procedure [39] has been suggested to alleviate this problem.
This procedure is based upon the average chemical shifts of each amino acid
type and its standard deviation. It ranks possible assignments according to
their probabilities based on their chemical shift statistics. However, the NMR
investigation of protein kinases is made easier if the X-ray structure of the
particular kinase is available. For a protein with known structure it is possible
to predict chemical shifts quite accurately [40]. The chemical shift matching
procedure uses these chemical shift values instead of a statistical input of
normally distributed standard resonances.
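The matching step can be sketched as a sliding-window comparison between the shifts of an observed stretch and the shifts predicted for every position of the sequence. The sketch below is an illustration under stated assumptions, not the published implementation: only Cα and Cβ shifts are compared, the RMSD is unweighted, and the predicted and observed values are invented numbers for a hypothetical eight-residue sequence.

```python
import math


def match_stretch(observed, predicted):
    """Slide an observed stretch along the sequence and return one RMSD per start index.

    observed:  list of (CA, CB) shifts in ppm for consecutive residues of the stretch
    predicted: list of (CA, CB) shifts in ppm for every residue of the protein
    """
    n = len(observed)
    rmsds = []
    for start in range(len(predicted) - n + 1):
        squared = [
            (o - p) ** 2
            for obs, pred in zip(observed, predicted[start:start + n])
            for o, p in zip(obs, pred)
        ]
        rmsds.append(math.sqrt(sum(squared) / len(squared)))
    return rmsds


def candidates(rmsds, tolerance):
    """Start indices whose RMSD lies within the tolerance (in ppm)."""
    return [i for i, r in enumerate(rmsds) if r <= tolerance]


# Hypothetical predicted shifts for the sequence Ala-Arg-Leu-Ala-Arg-Ser-Leu-Ala.
predicted = [
    (53.0, 19.1), (56.9, 30.5), (55.5, 42.4), (53.3, 18.8),
    (56.6, 30.9), (58.6, 63.9), (55.8, 42.1), (52.9, 19.2),
]
# Hypothetical observed stretch of three residues (Ala-Arg-Ser like shifts).
observed = [(53.2, 18.9), (56.7, 30.8), (58.7, 63.8)]

two_residue_hits = candidates(match_stretch(observed[:2], predicted), 0.5)
three_residue_hits = candidates(match_stretch(observed, predicted), 0.5)
print(two_residue_hits, three_residue_hits)
```

In this toy case the two-residue stretch fits two positions, and the third residue removes the ambiguity, which mirrors the behavior described for real data.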
This approach was tested on proteins with known assignments to investigate
its success rate and unambiguity. Three cases are to be distinguished: (a) the
correct position is found as a unique solution, (b) the correct position is
found along with other solutions (i.e., minima within the fold of the global
minimum), (c) the correct position is not found. To be useful for the sequential
assignment, case (a) is favored and case (c) is to be avoided. Figure 15.2-7(b)
exemplifies the matching procedure for a given stretch of four amino acids. As
expected, the assignment becomes less ambiguous with increasing number
of residues within a given stretch. However, even for the chemical shift data
belonging to one single 15N-1H correlation, the correct and unambiguous
solution is found for nearly 50% of the amino acids. These data include carbon
chemical shifts for two neighboring amino acids. For stretches of four amino
acids, the solution is correct and unambiguous in 97% of the cases. Such
stretches are difficult to find, particularly for larger proteins, but easier to
match on the basis of qualitative chemical shift arguments. The strength of
the matching procedure is more obvious for short stretches, which are easy to
find but difficult to locate. Even for sparse data, the correct solution is found in
approximately 90% or more of all proteins under investigation. For more than
60% of all two-residue stretches, the correct solution is observed unambiguously.
In nearly all cases, the solution becomes unique upon the addition of a third
amino acid.

Fig. 15.2-7 The shift match procedure can be divided into two steps: the
identification of resonances of subsequent residues and the calculation of the
root mean square difference (RMSD) in calculated and predicted chemical
shifts. (a) Strips from the three-dimensional NMR spectra HNCOCACB and
HNCACB. Through the matching of Cα and Cβ chemical shifts the neighboring
amino acids can be concatenated to a "stretch". The positioning of this stretch
onto the primary sequence, putative Ala93 to Leu96, is ambiguous at this
stage. (b) In each of these diagrams, at the bottom, the RMSD is shown as a
function of the assumed start residue in the protein sequence (horizontal axis:
residue number). Low RMSD values indicate possible correct start residues. The
diagram shows the alignment of the resonances of two residues, 93-94 (left),
three residues, 93-95 (middle) and four residues, 93-96 (right). The alignment
of two residues leads to three different possible solutions (circles). After the
identification of three or four subsequent resonance sets, the correct alignment
is the unique solution of the shift matching process.
15.2.2.3.2 The Use of Paramagnetic Spin Labels

Unpaired electrons cause faster relaxation of all neighboring nuclear spins
in a distance-dependent manner. Functional groups with unpaired electrons
can be introduced by chemical modification of known inhibitors [41]. It
has been demonstrated that these effects allow the detection and the
structural characterization of protein-ligand interactions relative
to the location of the paramagnetic center within the structure [42, 43].
The reverse case, identification of neighboring protein resonances within a
published protein-inhibitor structure, can be used as a tool for the assignment
of signals. The quantification of this effect can be utilized to deduce distances
between isotopically labeled protein residues and the paramagnetic center [44].
There are two types of effects that are caused by the paramagnetic center: line
broadening due to increased relaxation rates and chemical shift changes due
to contact or pseudocontact shifts.
As an alternative to chemically modified inhibitors, short polypeptide
extensions to the primary sequence of the target protein can be appended,
which contain binding sites for trivalent metal ions. On loading with
paramagnetic lanthanide ions, which bind with high affinity to the binding
tags, chemical shifts of the neighboring residues are perturbed. Moreover,
bound lanthanide ions induce residual dipolar couplings (RDCs) because the
unpaired electron restricts the molecular tumbling by a weak alignment in the
static magnetic field. This provides an additional tool for the determination of
protein structures by solution NMR [45, 46].
However, for assignment (particularly of a large protein with overlapping
signals) it is desirable to keep chemical shift changes to a minimum. Therefore,
paramagnetic agents that induce a purely relaxing effect are best suited for
assignment purposes. Metal ions (e.g., Mn2+) can bind nonspecifically to many
proteins. Spin-labeled ligands that bind to well-defined sites in the protein
are better suited to help the assignment procedure. For the Abelson kinase,
spin-labeled Gleevec (Fig. 15.2-8(a)) has been reported previously as the ligand
[44]. Because of their high affinity, such ligands cause additional problems due
to the chemical shift perturbations (CSPs) of the protein resonances. This
can be circumvented by a weakly binding adenosine derivative (Fig. 15.2-8(b)).
The position of the paramagnetic center can be inferred with sufficient
accuracy from kinase-ATP complex structures, which are publicly available.
By superimposing the spin-labeled derivative over the adenosine moiety in
the X-ray complex, the position of the ATP β-phosphorus atom can serve as
a good approximation for the paramagnetic center. For apo and/or inactive
kinases where no complex structure is available, the position can be inferred
from a molecular docking approach on the basis of the known binding mode
of adenosine to the hinge region of a kinase [33].
Fig. 15.2-8 Chemical structures of (a) the spin-labeled analog of
STI571 (Gleevec; imatinib), (b) spin-labeled adenosine,
(c) SB203580 (DFG-in inhibitor), and (d) BIRB796 (DFG-out
inhibitor).
Although this approach works in principle with uniformly 15N-labeled
protein, it is more favorable in combination with selective amino acid labeling.
First, in the absence of a spin-labeled inhibitor, the specific amino acid
resonances are identified by comparison with the spectra recorded from
uniformly labeled kinases. The number of peaks is reduced and therefore
more manageable when the spin-labeled inhibitor is added in a second step.
Especially in regions with aggravated signal overlap, the selective labeling
technique clearly separates the peaks and, therefore, enables the quantification
of the induced peak attenuation.
Figure 15.2-9 shows an example of selectively 15N-Met labeled kinase BTK,
which was expressed in baculovirus-infected insect cells. For this kinase no
assignment is available so far. The [1H,15N]-TROSY spectrum shows 12 peaks
corresponding to the 12 methionines of the primary sequence (Fig. 15.2-9(a)).
Upon adding the spin-labeled adenosine, four peaks are strongly attenuated,
corresponding to the four methionines that are close to the spin
label. This information is very valuable for the assignment process.
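The link between distance and attenuation can be made concrete with the distances listed in Fig. 15.2-9(c). The sketch below ranks the methionines by their distance to the spin label and scales the expected paramagnetic relaxation enhancement with the inverse sixth power of the distance, which is the usual distance dependence of this effect. It is an illustration only: correlation times and absolute relaxation rates are not modeled, and the helper `relative_pre` is a hypothetical name introduced here.

```python
# Distances in angstrom between methionines and the spin label, from Fig. 15.2-9(c).
distances = {
    "MET431": 12.8, "MET437": 16.2, "MET449": 13.6, "MET450": 14.9,
    "MET477": 16.5, "MET489": 25.1, "MET501": 21.3, "MET509": 14.9,
    "MET570": 18.9, "MET587": 17.1, "MET596": 18.9, "MET630": 23.3,
}


def relative_pre(r, r_ref):
    """Relaxation enhancement at distance r relative to r_ref, assuming r^-6 scaling."""
    return (r_ref / r) ** 6


# The four methionines closest to the paramagnetic center.
closest_four = sorted(distances, key=distances.get)[:4]

# Enhancement of every residue relative to the closest one.
r_min = min(distances.values())
ranking = {res: relative_pre(r, r_min) for res, r in distances.items()}

for res in sorted(ranking, key=ranking.get, reverse=True):
    print(f"{res}  {distances[res]:5.1f} A  relative PRE {ranking[res]:.3f}")
```

Because of the steep distance dependence, residues beyond roughly twice the shortest distance are expected to be almost unaffected, which is why only a few peaks are strongly attenuated.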
15.2.2.4 Protein-based Results of NMR Investigations on Kinases
15.2.2.4.1 Extent of Assignment

Assignments have recently been published for two protein kinases: active
murine PKA [33] and the inactive human MAP kinase p38α [47].
Both kinases have been studied by numerous other biophysical methods and
thus are well-characterized proteins. A wealth of structural and functional
data is available to be compared with the NMR results. As pointed out above,
there are commonly fewer peaks in the spectra recorded from large proteins
than expected from their primary sequence. For the p38 MAP kinase only
three quarters of the theoretically observable HN peaks could be detected in
the [1H,15N]-TROSY spectrum and, therefore, the number of assigned peaks is
correspondingly lower. Even the samples with 15N selectively labeled amino
acids yield fewer than the expected number of peaks, indicating that the
disappearance of signals is not due to overlapping resonances. Detailed
statistics for p38 MAP kinase are given in Table 15.2-1. For PKA even fewer
peaks could be observed and assigned. Figure 15.2-10 depicts the extent of
both assignments mapped on the crystal structures of each kinase.

Fig. 15.2-9 (a) [1H,15N]-TROSY spectra of the selectively 15N-Met labeled
kinase BTK (black spectrum) showing 12 peaks corresponding to the 12
methionines in the primary sequence. Upon addition of spin-labeled adenosine,
the peak intensities are attenuated according to the distance of the amino
acid to the paramagnetic center. The percentage of the residual peak intensity
is denoted by the peaks (light gray spectrum). Four peaks, which are strongly
attenuated, are marked with an asterisk. (b) Ribbon presentation of the
structure of BTK. Methionines are depicted as balls and the spin-labeled
adenosine is shown as sticks with the unpaired electron marked as a star.
(c) Table of the distances of the methionines to the spin-labeled adenosine.
Four methionines are in closer distance to the spin-labeled adenosine, marked
with an asterisk.

Residue   Distance [Å]     Residue   Distance [Å]
MET431    12.8 *           MET501    21.3
MET437    16.2             MET509    14.9 *
MET449    13.6 *           MET570    18.9
MET450    14.9             MET587    17.1
MET477    16.5             MET596    18.9
MET489    25.1             MET630    23.3
Table 15.2-1 Statistics of amino acids, observable and assigned
peaks in the [1H,15N]-TROSY spectra of the kinase protein p38

Selectively labeled   Number of     Number of          Number of
construct             amino acids   observable peaks   assigned peaks
Asp                   27            20                 15
Ile                   22            15                 15
Leu                   42            37                 22
Met                   10            9                  5
Phe                   13            12                 12
Tyr                   15            12                 9
Val                   22            21                 19
Total                 345 (a)       261 (76%)          167 (64%)

(a) The total number of amino acids without prolines, which
in principle do not show a correlation peak due to the lack of an
amide proton.
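The percentages in the Total row of Table 15.2-1 can be checked with a few lines of arithmetic. The sketch reads the total number of non-proline residues as 345 (footnote a) and shows that the 76% refers to all non-proline residues, whereas the 64% refers to the observable peaks rather than to the full sequence. Variable names are illustrative.

```python
# Rows of Table 15.2-1: (amino acids, observable peaks, assigned peaks).
rows = {
    "Asp": (27, 20, 15), "Ile": (22, 15, 15), "Leu": (42, 37, 22),
    "Met": (10, 9, 5), "Phe": (13, 12, 12), "Tyr": (15, 12, 9),
    "Val": (22, 21, 19),
}
total_residues, total_observable, total_assigned = 345, 261, 167

# Fraction of non-proline residues that give an observable peak.
observable_pct = round(100 * total_observable / total_residues)
# Fraction of the observable peaks that could be assigned.
assigned_pct = round(100 * total_assigned / total_observable)
# Fraction of all non-proline residues that are assigned.
assigned_of_all_pct = round(100 * total_assigned / total_residues)

print(observable_pct, assigned_pct, assigned_of_all_pct)

# Per-residue-type coverage for the selectively labeled samples.
for name, (n_res, n_obs, n_asg) in rows.items():
    print(f"{name}: {n_obs / n_res:.0%} observable, {n_asg / n_obs:.0%} of those assigned")
```

Relative to the full sequence, slightly fewer than half of the non-proline residues are assigned.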
Fig. 15.2-10 Ribbon representation of the protein kinase PKA (a) and p38 MAP
kinase (b) showing the N-lobe, the C-lobe, and the ATP-binding site. In both
proteins the assigned regions marked in yellow are the more surface exposed
regions. (c) Statistics of the assigned and unassigned peaks in the
[1H,15N]-TROSY spectra.
In both proteins, the assigned regions are the more surface exposed regions.
The N- and C-terminal sequences and also the β-sheet N-lobe are almost
entirely assigned. On the other hand, the C-helix, the catalytic loop and parts of
the activation segment remain unassigned in both proteins. These unassigned
regions are solvent inaccessible in the tertiary structure and form a contiguous
patch. However, the distribution of assigned versus unassigned regions of
both proteins (see Fig. 15.2-11) is different in many regions of the C-lobe.
It can be speculated that this observation indicates that the dynamics in the
C-lobe, or in the activation segment, are different in the two kinases, which
could correspond to the different functionality of these two proteins.
Fig. 15.2-11 Sequence alignment of the protein kinases PKA and
p38. Assigned amino acids (black), prolines (gray), and
unassigned amino acids (white) are mapped onto the sequence.

It is documented for the crystal structures of inactive human CDK2 and the
partially activated human CDK2-cyclin A complex that large conformational
changes of the activation segment occur [48]. Comparing the position of the
activation segment in the structures for Twitchin Kinase, IRK,
calmodulin-dependent kinase I (CaMKI), and MAPK reveals a variety of
conformations that are accessible to different kinases in their inactive state
[15, 16, 49, 50]. The survey over the static crystal structures provides clues
to the conformational malleability of particular regions of the protein kinases, as
they move through the catalytic cycle while various substrates, inhibitors,
and scaffold proteins participate. It can be presumed that the mentioned
regions have residual mobility even in the absence of any ligands. These
local segmental motions could happen on a timescale that is unfavorable
for conventional detection by solution NMR; as a consequence, resonances
vanish because of excessive line broadening. Fluorescence resonance energy
transfer (FRET) measurements support this hypothesis. One cysteine at the
N-terminal lobe of PKA was labeled with a fluorescent probe acting as an acceptor.
The fluorescent donor was anchored at the opposing lobe and the observed
intramolecular anisotropy decay revealed that the apoenzyme is likely to be
highly dynamic [51].
15.2.2.4.2 Addressing the Activation Status by NMR

Activation and substrate binding of kinases are accompanied by concerted
motions between the two domains of the catalytic core. These motions
influence the relative orientation of the two lobes with respect to each other. In
particular, the activation of PKA is triggered by phosphorylation of the pivotal
residue Thr197 in the activation segment, which contributes a stabilizing ionic
interaction with the conserved Arg165 preceding the catalytic aspartate. The
crystal structure of CDK2, partially activated upon the binding of cyclin
A, shows that the helix comprising the PSTAIRE residues is also involved in
the long-range nature of the structural rearrangement [48]. Notably, the
corresponding helix C in PKA has not been assigned because of the lack
of correlation peaks, which indicates the dynamic nature of this structural
region. The length of the activation segment varies by up to 10 amino acids
among protein kinases. The variability may allow a kinase to be constitutively
active (e.g., phosphorylase kinase (PhK) possesses a glutamate residue at the
conserved position [52]), or it may allow control by autophosphorylation, if the
segment has a sequence corresponding to the specificity of the kinase itself
(e.g., PKA). Alternatively, the specificity attracts other protein kinases, which
function as part of the signal cascade.
The complex rearrangements caused by the activation/inactivation are also
easily detected by NMR spectroscopy since they result in large CSPs in
[1H,15N]-TROSY spectra. As an example, PKA possesses four phosphorylation
sites. Phosphorylation of the autophosphorylation site Thr197 is sufficient to
achieve full activation, whereas the function of the other phosphorylation sites
is presently unclear. Since PKA possesses a low basal activity, the suppression
of the autophosphorylation reaction can only be achieved by introducing mutants
that lack the phosphorylation site. Figure 15.2-12 shows the spectrum of the
resulting constitutively inactive mutant T197A [32]. Large CSPs as compared
to wild-type PKA document the inactivation of the protein. By contrast,
mutation of the other possible phosphorylation sites, as exemplified by the
mutant S338A, leads to much smaller CSPs. Finding constitutively active
mutants is an alternative, but care has to be taken: a single point mutation of
the corresponding threonine to an acidic residue is successful in only a few cases.
The implications have been investigated in more detail for MAPK and PKA,
respectively [32, 53].

Fig. 15.2-12 (a) Overlay of a section of the [1H,15N]-TROSY spectra of wild-type
protein kinase A (PKA) (black) and mutant T197A (red). The mutant T197A lacks
the autophosphorylation site Thr197 and is therefore constitutively inactive.
The conformational rearrangement between the active state of wild-type PKA and
the inactive mutant T197A is proven by the large CSPs shown in the overlay.
(b) The mutation of the other phosphorylation site Ser338 to an alanine does
not cause conformational changes since no CSPs could be observed in the
spectrum.
Changing the activation status in vitro can be a tedious undertaking, since the
reaction by incubation with the activating kinase or the inactivating phosphatase,
respectively, tends to be incomplete. Coexpression of a kinase together with its
activating predecessor in the signal cascade usually leads to a defined activation
status. For example, p38 MAP kinase was expressed in a dual construct with the
activating kinase MKK6, which itself had to be activated by point mutation [54].
Comparing the recorded [1H,15N]-TROSY spectrum of a uniformly 15N-labeled sample
with that of the inactive p38 MAP kinase provided the proof for the long-range
nature of interactions that rearrange the particular protein loops at the lip of
the catalytic cleft. Biomolecular NMR can monitor the success of such
experiments as a complementary technique to the measurement of the enzymatic
activity of the kinase. Mapping the observed CSPs on the crystal structure of
the corresponding kinase illustrates atomic details of the activation itself.
Activation can also be monitored through ATP-binding studies. Measurement
of ATP binding, for example, by the highly sensitive 1H STD (saturation
transfer difference) NMR, needs small amounts of protein and does not rely
on 15N-labeled protein. However, care should be taken since in many cases
protein kinases possess very weak affinity to ATP that is caused by their low
basal activity. Although these affinities are very low (millimolar KD), STD NMR
can be sensitive enough to indicate binding that is easily misinterpreted in
terms of phosphotransferase ability.
Direct observation of 31P NMR signals seems to be a more straightforward
approach to monitor protein phosphorylation. It has the additional advantage
that the phosphate groups in different phosphorylation sites can be
discriminated and characterized in terms of function and dynamics [55]. However,
31P NMR cannot characterize alternative phosphorylation-independent activation
mechanisms. The direct observation of 31P NMR signals is inherently less
sensitive than the observation of proton signals and can thus easily fail to
detect low phosphorylation levels.
15.2.2.4.3 Studying the Dynamic Behavior of Kinases

Complex motions that are essential for the biochemical function are typical
for the entire class of protein kinases. A thorough understanding of protein
dynamical processes can provide novel points of attack for pharmaceutical
applications. X-ray structural analyses of protein kinases provide snapshot
pictures of the protein at different stages of their conformational cycle.
For PKA, which is the best studied kinase in this respect, many different
crystal structures provide a thorough understanding of the motions [56,
57]. These data can help understand activation processes in other kinases
too. However, this picture is inherently incomplete as it relies upon the
availability of structural data that covers the whole motion. Models based
on analogy arguments have to be tested further to establish whether motions
inferred for PKA can in fact be transferred to other kinases. Solution-state NMR
can act as a complement in providing a dynamic picture that links the
static structures obtained by X-ray crystallography. Two NMR methods can
provide information about conformational rearrangements of a protein at
atomic resolution. NMR relaxation measurements yield information on the
timescale of a process, whereas RDC can characterize the spatial nature of
such a motion.
15N and 2H relaxation can be employed to detect fluctuations of
backbone dynamics of protein kinases on the nano- to picosecond and
milli- to microsecond timescales. Analysis of the relaxation data allows for
a semiquantitative estimation of the conformational entropy change for
the main chain of protein kinases dependent on ligand binding or point
mutation.
RDCs can be measured in weakly aligning media such as phage solutions,
bicelles or liquid crystals. The protein is restricted in its free tumbling by the
media. The orientation-dependent dipolar coupling between nuclear spins,
which is averaged to zero in free solution, becomes measurable without
losing the advantages of solution-state NMR. RDCs contain information about
the orientation of a nuclear spin pair with respect to an alignment tensor
that is dependent on the way the protein rotation is hindered. Notably,
RDCs can be useful for the assignment of a protein with known tertiary
structure. The knowledge of an orientation can be used as an additional tool to
distinguish between several possible assignments. But the main contribution
is the ability to determine relative orientations of protein domains or, as in the
case of protein kinases, to distinguish between the more rigid and the flexible
segments.
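Why the dipolar coupling vanishes in free solution but survives under weak alignment can be illustrated numerically. The sketch assumes the usual angular dependence of the dipolar coupling, (3 cos^2(theta) - 1)/2, where theta is the angle between the internuclear vector and the magnetic field, and averages it over an orientation distribution. The alignment strength of 10^-3 and the axially symmetric weighting function are assumptions chosen for illustration.

```python
import math


def p2(theta):
    """Angular factor of the dipolar coupling, (3*cos^2(theta) - 1) / 2."""
    return 0.5 * (3.0 * math.cos(theta) ** 2 - 1.0)


def orientational_average(weight, n=20000):
    """Average of p2 over theta in [0, pi] for a given orientation weight.

    Uses the midpoint rule with the solid-angle factor sin(theta).
    """
    h = math.pi / n
    total = norm = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        w = weight(theta) * math.sin(theta) * h
        total += p2(theta) * w
        norm += w
    return total / norm


# Free tumbling: all orientations are equally likely.
isotropic = orientational_average(lambda theta: 1.0)

# Weak alignment: orientations along the field are favored by about 0.1%.
aligned = orientational_average(lambda theta: 1.0 + 1e-3 * p2(theta))

print(f"isotropic average: {isotropic:.2e}")
print(f"weakly aligned average: {aligned:.2e}")
```

Even a very small orientational preference leaves a residual fraction of the full coupling, which is what makes RDCs measurable while the sharp lines of solution-state NMR are retained.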
15.2.2.4.4 Binding Mechanisms by Lineshape Analysis

The elucidation of different mechanisms of ligand binding can lead to a
deep understanding of specificity. The lineshapes of NMR signals include
information about the kinetics of processes on the micro- to
millisecond timescale [58, 59]. On titration of a ligand to a kinase, the
lineshape alters differently for different underlying reaction mechanisms
[60]. Figure 15.2-13 shows the expected lineshape of a nucleus that changes its
resonance frequency during the binding event. Typical lineshapes are revealed
when two states exist in equilibrium: protein/ligand complexes and unbound
molecules. At the beginning and at the endpoint of the titration, a sharp peak
is expected at the resonance frequency of the free protein and of the complex,
respectively (Figure 15.2-13(a)). The lineshape alters continuously during the
titration to broad peaks at intermediate resonances. The mechanism showing
this titration behavior corresponds to a classical lock-and-key principle. The
ligand initially fits to the pocket of the protein without a conformational change
in the neighborhood of the observed nucleus. In Fig. 15.2-13(b) it is presumed
that the ligand first binds to the protein and an intermediate state is formed,
which then reacts to the complex. A third peak occurs during the titration, which
gives evidence for the existence of the intermediate state. A titration as depicted
in Fig. 15.2-13(c) would indicate an induced fit mechanism. The ligand induces
a conformational change of the protein (or the free protein already exists in
two forms) and only in this induced state can the protein react with the ligand
to the complex. The peak of the "activated" protein initially occurs during the
titration and disappears with increasing ligand concentration. The successful
interpretation of lineshape titrations in terms of reaction mechanisms has
already been demonstrated on SH2 domains [61, 62]. The reaction mechanism
can be obtained independently for different areas of the protein and the
consistency of the interpretations can be checked by mapping the results of
each nucleus on the structure of the protein. With such an amount of data
redundancy, ligand binding mechanisms can also be elucidated for catalytic
domains of kinases. A relevant interpretation for a drug discovery process is
even possible for cases in which a resonance appears or disappears upon ligand
binding, as shown in an example in Chapter 15.2.4.2.

Fig. 15.2-13 Examples for simulated lineshapes of titration curves, which
indicate different binding mechanisms. The color of the curves changes from
blue to red with increasing ligand concentration, while the protein
concentration is kept constant. (a) The model P + L ⇌ PL describes a titration
curve of a small ligand binding, for example, to the hinge of the ATP-binding
site. The lineshape changes from a sharp peak of the particular amide resonance
in the free protein state to another sharp peak at the resonance of the bound
state. The intermediate curves show broader peaks with maxima between the
Larmor frequencies of the two states. This titration would give no evidence for
a reaction mechanism that is more complicated than a simple lock-and-key
mechanism. (b) The model P + L ⇌ PL* ⇌ PL shows a possible titration curve of a
ligand that does not initially fit into the ATP-binding pocket. The ligand binds
to the protein building an intermediate state. This intermediate state is in a
conformational equilibrium with the complex. The titration ends with two
separate peaks. This ensemble of titration curves is incompatible with the
lock-and-key mechanism described in (a). (c) The model P + L ⇌ P* + L ⇌ PL
depicts possible titration curves of a weak binder to an alternative
conformation of the kinase (e.g., the putative DFG-in/DFG-out). A conformational
equilibrium already exists in the free form. The ligand binds only to one of
the conformations. The titration starts with two peaks. One of them constantly
decreases while the other signal broadens and then arises in a new sharp peak.
15.2.3
Screening of Kinases by NMR
15.2.3.1 Screening Techniques/Strategies

Fragment-based screening is a lead discovery approach that serves as an alternative to
HTS-based techniques [63, 64]. Compounds of much lower molecular weight (150-300 Da)
are screened than in HTS campaigns. Fragment-based hits are
typically weak inhibitors (10 µM to the mM range) and therefore need to be screened at
higher concentrations using sensitive biophysical detection techniques such as
protein crystallography and NMR. For high-concentration bioassays, SPR, and
conventional HTS, the interpretation of data is often hindered by
the high rate of false-positive readouts. X-ray crystallography and NMR spectroscopy
provide robust and straightforward information, but the typical throughput of
up to 10000 compounds per screen is to be regarded as medium size. This
drawback is compensated by higher hit rates than observed with HTS, because
compounds of lower complexity have a higher probability of matching
a target protein-binding site. Moreover, fragment hits typically possess high
efficiency upon binding (binding energy per unit molecular mass). If the
binding interactions of these fragment hits can be structurally validated, they
are highly suitable for subsequent chemical optimization into clinical candidates
with good druglike properties [25].
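The notion of binding efficiency mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming the standard relation ΔG = RT ln(KD) and taking "efficiency" literally as binding free energy per unit molecular mass; the function names and the two example compounds are hypothetical and serve only as illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def efficiency_per_mass(kd_molar, mol_weight_da, temperature_k=298.15):
    """Binding energy gained per Dalton of ligand mass (kcal/mol per Da)."""
    return -binding_free_energy(kd_molar, temperature_k) / mol_weight_da


# Hypothetical example values, not taken from the chapter:
fragment = efficiency_per_mass(kd_molar=100e-6, mol_weight_da=200.0)  # weak, small
hts_hit = efficiency_per_mass(kd_molar=100e-9, mol_weight_da=500.0)   # potent, large
print(f"fragment: {fragment:.4f}  HTS-like hit: {hts_hit:.4f} kcal/mol per Da")
```

With these assumed numbers the weak fragment gains more binding energy per unit mass than the larger and more potent compound, which is the sense in which fragment hits are called efficient.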
Particularly when an initial HTS screen fails to produce viable hits,
pharmaceutical research seeks to expand its lead identification strategies.
NMR-based screening is therefore gaining momentum relative to HTS-based
techniques [65-68].
Unlike enzymatic assays, NMR or X-ray crystallography screens do not
require enzymatic activity but measure the binding event itself. As an
advantage, an inactive kinase can be targeted by the screening efforts.
Additionally, both methods can monitor binding sites other than the active site.
The general advantage of an NMR-based approach is that it is a solution-state
technique, which facilitates the handling. The folded state is discriminated
from the unfolded state of the target protein in each single experiment. There
is no need for immobilization (as in SPR) or crystallization (as in X-ray
screening) of the protein.
NMR-based screening methods can be classified into two groups according
to the source of the observed signals: ligand-detected NMR and
protein-detected NMR.
Ligand-detected NMR is a robust method, which is well suited to screening
compound mixtures with rapid deconvolution [69]. The target protein does
not have to be labeled and its size is typically much more than 20 kDa. The
protein production requirements are considered to be moderate. If the screen
is performed in the presence of a competitor with known interaction mode,
active site versus nonactive site binders can be distinguished and binding
affinities can be derived.
Protein-detected NMR, as exemplified by the patented "SAR (structure-activity
relationship) by NMR" approach [70], provides the principal interactions
between ligand and protein, if a backbone assignment is available. As with
ligand-detected NMR, compound mixtures can be screened, but the
deconvolution to identify the real hit needs additional experiments. Usually, the
15N CSPs of the protein amide resonances that are caused by ligand binding
are observed. If the three-dimensional structure of the target protein is
available, direct information about the binding interactions is extracted at
residue-specific resolution. In combination with a follow-up investigation
using NMR-restrained molecular docking, the binding mode of the viable hits
can be derived at atomic resolution [71]. At least 15N-labeling is required, which
increases the demands for the protein production and pushes the achievable
size limit to 40-50 kDa. An investment in a cryo-probehead reduces the
required protein amount by at least fourfold.
The synergy of ligand- and protein-based NMR screening is revealed in their
combination. Ligand-detected NMR is used as a primary screen for large-scale
sampling; hit validation is performed with protein-based NMR on far fewer
samples. False positives obtained from the primary screen are ruled out and
subsequent analysis during the validation step increases the knowledge gained
about the desired interaction mode of the prestage drug candidates.
15.2.3.2 Fragment Approach

The fragments can be considered as building scaffolds of a more complex
compound. After an initial validation they are combined or optimized into
compounds that meet the rational criteria of lead generation. As reviewed in
a more detailed fashion, the optimization process can be divided into three
strategies [64].
Fragment linking is used if two fragments have been identified, which bind
in adjacent binding sites being close enough to each other to be chemically
linked. Fragment evolution means the subsequent chemical modification
by introducing optimized functional groups or new side chains that target
additional interactions in the active site of the protein. Fragment self-assembly
makes use of reactive templates that are capable of self-assembly in the
presence of a seed template molecule.
The first two methods depend heavily on the available structural
data. The assignment of at least the protein backbone resonances is clearly a
prerequisite to be achieved prior to a chemical optimization series. If an
assignment is lacking, the information obtained by NMR-based fragment identification
and the corresponding binding affinities can be transferred to X-ray
crystallography to reduce the number of trials.
The prominent role of virtual screening of large compound databases should also
be outlined. Both NMR and X-ray crystallography have a medium throughput
combined with costly instrumentation or specialized infrastructure. A
rational in silico approach that selects a smaller subset for screening
addresses the economic demands of industrial research. The filter rules
used during the virtual screening already incorporate the basic properties
of fragments from known inhibitors. The first filter rule reduces
the available compound collection to compounds with properties suitable
for NMR or crystallography, for example water solubility, and removes unwanted
functional groups. During the next filtering step the wanted functionality is
included, which matches the localized recognition elements in a simplified
model of interaction. As an example, the pharmacophoric fingerprint
for a given protein kinase considers the aromaticity of the adenine
moiety in combination with its H-bond donor and acceptor functions
[72]. Successful applications have been reported not only for single protein
kinases, for example carboxyl-terminal Src kinase homologous
kinase-1 (ChK-1) kinase and CK2 [73, 74], but also at the gene-family
level [75, 76].
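The two-stage filtering described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Compound` record, the list of unwanted groups, and the simple adenine-like criterion are hypothetical stand-ins for whatever descriptors a real virtual screening pipeline would use; only the 150-300 Da window and the idea of solubility, unwanted-group, and pharmacophore filters come from the text.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float      # Da
    soluble: bool          # water solubility flag (measured or predicted)
    groups: frozenset      # functional-group labels (hypothetical vocabulary)
    aromatic: bool
    hbond_donors: int
    hbond_acceptors: int


# Hypothetical examples of functional groups a library owner might exclude.
UNWANTED = frozenset({"acyl_halide", "aldehyde", "michael_acceptor"})


def passes_property_filter(c):
    """First rule: fragment-sized, soluble, no unwanted functional groups."""
    return 150 <= c.mol_weight <= 300 and c.soluble and not (c.groups & UNWANTED)


def matches_adenine_like_fingerprint(c):
    """Second rule: aromatic scaffold with H-bond donor and acceptor functions."""
    return c.aromatic and c.hbond_donors >= 1 and c.hbond_acceptors >= 1


def select_library(compounds):
    """Apply both rules in sequence and return the surviving compounds."""
    stage1 = [c for c in compounds if passes_property_filter(c)]
    return [c for c in stage1 if matches_adenine_like_fingerprint(c)]


collection = [
    Compound("frag_ok", 210.0, True, frozenset({"amine"}), True, 1, 2),
    Compound("too_heavy", 420.0, True, frozenset(), True, 2, 3),
    Compound("reactive", 180.0, True, frozenset({"aldehyde"}), True, 1, 1),
    Compound("no_donor", 160.0, True, frozenset(), True, 0, 1),
]
print([c.name for c in select_library(collection)])
```

Keeping the property filter separate from the pharmacophore filter mirrors the order given in the text: the cheap, target-independent rule runs first and the kinase-specific rule only sees what remains.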
15.2.3.3 NMR Reporter Screening

Ligand-detected NMR screening in the presence of a competitor with known
complex structure and affinity has several advantages. Firstly, the strong
resonances of the well-behaved and highly soluble competitor serve as the
reporter for the binding event. As a matter of principle, it has lower affinity
and its signals can be obtained by various NMR methods (e.g., T1ρ relaxation
or STD NMR experiments, depending on the best dispersion within a given
screening run) [42, 43, 77, 78]. This permits the detection of potential
high-affinity molecules that are only marginally soluble, thus significantly enlarging
the diversity of compounds amenable to NMR screening. Secondly, with
the known binding constant of the reporter compound, Ki values of the hits
can be derived. This allows the primary hit list to be ranked.
For practical purposes, relative binding affinities are sufficient in
most cases if the absolute KD value of the reporter remains unknown. It is
the prioritization of viable compounds for further investigation that is of
importance, especially in the competitive environment of industrial research.
Thirdly, if the compound was identified in the absence of the competitor and
the experiment is repeated in its presence, conclusions about the binding site
can be drawn. Either the competitor is directly displaced by the candidate or
the binding event takes place at an allosteric site. If applied to protein kinases,
a simple derivative of the adenine group is suitable, which provides rather low
affinity combined with good aqueous solubility.
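How an inhibition constant follows from a reporter-based competition experiment can be sketched as below. The chapter does not state which equation is used; this sketch assumes simple competitive binding at one site and the Cheng-Prusoff relation, Ki = IC50 / (1 + [reporter]/KD,reporter), with the protein concentration well below the ligand concentrations. All numbers and names are hypothetical.

```python
def ki_from_ic50(ic50, reporter_conc, reporter_kd):
    """Cheng-Prusoff conversion for a competitive binder (all values in mol/L).

    ic50          hit concentration that halves the reporter signal
    reporter_conc total reporter concentration in the sample
    reporter_kd   dissociation constant of the reporter (assumed known)
    """
    return ic50 / (1.0 + reporter_conc / reporter_kd)


def rank_hits(ic50_by_hit, reporter_conc, reporter_kd):
    """Return (name, Ki) pairs sorted from tightest to weakest binder."""
    kis = {name: ki_from_ic50(v, reporter_conc, reporter_kd)
           for name, v in ic50_by_hit.items()}
    return sorted(kis.items(), key=lambda item: item[1])


# Hypothetical primary hit list (IC50 values in mol/L).
hits = {"frag_1": 300e-6, "frag_2": 1.2e-3, "frag_3": 80e-6}
ranking = rank_hits(hits, reporter_conc=1e-3, reporter_kd=0.5e-3)
for name, ki in ranking:
    print(f"{name}: Ki = {ki * 1e6:.0f} uM")
```

Because the correction factor is the same for every hit measured under identical conditions, the order of the ranking does not change if the reporter KD is unknown, which matches the remark that relative affinities are usually sufficient.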
In a similar manner, allosteric kinase inhibitors can be discovered with
spin-labeled adenine analogs (Fig. 15.2-8(b)). The degree of paramagnetic relaxation
enhancement allows an estimation of the distance of the ligand relative to the
spin label. This method is ideally suited to the fragment-linking approach of
lead generation [79, 80].
The use of the ATP resonances as the reporter has been described
recently. Reduction of the ATP STD NMR signals by a competitive inhibitor
permitted a direct measurement of the inhibitor Ki with respect to the natural
substrate ATP. After this initial measurement, the assay was combined with
the paramagnetic relaxation enhancement effect. In a second step, manganese
ions were added to the samples, which turned the Mg2+/ATP complex into
a paramagnetic probe. In this way, the proximity of a potential non-ATP-competitive
compound can be inferred [81].
Alternatively, a recognition site in close proximity to the ATP-binding site
can be targeted with an oligopeptide as the reporter ligand, whose sequence
is derived from the activating kinase MKK3b in the case of p38 MAP kinase
[82]. The inhibition of the protein-protein interaction with a small peptide
could serve as a template for peptidomimetic inhibitor development [83, 84].
Though no nonpeptidic small molecules have been reported to bind at the
protein substrate recognition sites or the recruitment sites, respectively, this
still offers considerable scope. The diversity of potential contacts and the resulting
selectivity nurture prospects of further research efforts as an alternative to targeting the
conserved ATP-binding site.
15.2.3.4 Results for Screening

An effective follow-up strategy for initial fragment-based hits is crucial if the
information from weak NMR binders is to lead to the identification of potent
and selective optimization candidates. In particular, the protein kinase-family
approach with its deep and well-defined ATP-binding pocket is challenged by
the large amount of data being generated by the high hit rates. Depending on
the virtual screening method that leads to the tailored screening library, hit rates
of up to roughly 5% can be expected. Figure 15.2-14 exemplifies that the family
approach with a kinase-biased compound collection yields a large number
of hits. Data mining and the subsequent selection for specificity toward
a particular kinase is an important but difficult process [85]. The general
workflow, which proves to be valid for compound or library optimization
respectively, is outlined below.
First, the existing knowledge about kinase inhibitors is incorporated into the
starting compound collection, which serves as a validation set. The fragments of
known high-affinity ligands are identified and the ligand-detected NMR approach
is applied to the ensemble of kinases. As a proof of principle, the binding
profile of different kinases, that is, the binding sites of selected fragments, must
be identified by the different NMR methods, and the results are compared with
the data obtained by other assays or X-ray crystallography, respectively.
Second, the validated NMR methodology is applied to a screening approach at
larger scale. Virtual screening creates the kinase-biased compound collection.
Simplified models of the interactions observed in the validation set provide
the basis for the filtering rules, which downsize large in-house compound
collections.
Third, all hits are subjected to an NMR competition assay to derive binding
affinities (Ki values) for each kinase. These data generate a ranking list, in which
compounds with reasonably sufficient affinity or selectivity toward a particular
kinase are identified. The patent situation and the chemical feasibility, that is,
whether the fragment is capable of further development, are also taken into account.
Fourth, protein-detected NMR by [1H, 15N]-TROSY spectra is applied to
a selection of compounds obtained during the ranking. On the one hand, this
step verifies the observed hits and rules out false positives. On the other
hand, the binding site is characterized by mapping the observed CSPs on the
corresponding 3D structure.
Fifth, the CSPs for compounds of high interest are subsequently used as
restraints for molecular docking simulations. The binding mode is revealed at
atomic resolution and medicinal chemists can select desired ligand properties
to design the next optimized compound collection. In an iterative fashion, the
newly synthesized library is subjected to step three. Figure 15.2-15 exemplifies
the improvement after two iterations.
Fig. 15.2-14 Kinase selectivity profiles for a representative dataset obtained by the
ligand-detected NMR fragment approach. Each row represents a kinase and each column
represents a small-molecule fragment. Eight hundred and seventy compounds were
chosen out of a larger kinase-biased screening library. The color-coding scheme
corresponds to a particular compound having Ki values greater (light gray) or lower
(dark gray) than 1.5 mM toward a single kinase. The horizontal order is the result of a
hierarchical clustering analysis (euclidean-Ward) with 65 descriptors (out of
210 chemical descriptors), which correlate best with the observed kinase affinities.
Twenty-nine clusters are lined up consecutively and all the compounds that
are members of a single cluster are ordered again. The higher the average affinity, the
further to the left a compound is positioned within a given cluster, and vice versa. At
the bottom, three clusters are denoted as an example. Most fragments bind with similar
affinities to all kinases. To select for selectivity, isolated dark gray areas are to be
picked within a row, where several similar compounds group together but do not show
the same affinities toward the other kinases (light gray in the vertical).
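The selection for selectivity from such a kinase-by-fragment affinity table can be sketched as follows. This is a minimal illustration, assuming the table is available as nested dictionaries of Ki values in mM; the 1.5 mM cutoff is the one quoted for Fig. 15.2-14, while the kinase names, fragment names, and Ki values are invented for the example.

```python
THRESHOLD_MM = 1.5  # cutoff between "binds" (dark gray) and "does not bind" (light gray)


def binds(ki_mm, threshold=THRESHOLD_MM):
    """A fragment counts as a binder if its Ki is below the cutoff."""
    return ki_mm < threshold


def selective_fragments(ki_table, target, threshold=THRESHOLD_MM):
    """Fragments that bind `target` but none of the other kinases in the table."""
    others = [kinase for kinase in ki_table if kinase != target]
    picked = []
    for fragment, ki in ki_table[target].items():
        hits_target = binds(ki, threshold)
        hits_other = any(binds(ki_table[o][fragment], threshold) for o in others)
        if hits_target and not hits_other:
            picked.append(fragment)
    return picked


# Hypothetical Ki values in mM: rows are kinases, columns are fragments.
ki_table = {
    "kinase_A": {"f1": 0.4, "f2": 0.3, "f3": 2.0},
    "kinase_B": {"f1": 0.6, "f2": 3.0, "f3": 2.5},
    "kinase_C": {"f1": 0.5, "f2": 4.0, "f3": 0.9},
}
for kinase in ki_table:
    print(kinase, selective_fragments(ki_table, kinase))
```

In this toy table `f1` binds all three kinases and is therefore not selective, which corresponds to the observation that most fragments bind with similar affinities to all kinases.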
The workflow clearly demonstrates that the combination strategy of
ligand-detected and protein-detected NMR meets the economic demands of
pharmaceutical research. The throughput of ligand-detected NMR is higher
and the requirements for producing unlabeled protein are moderate. With
the increase of ligand knowledge, protein-detected NMR with 15N-labeled
kinases becomes more prominent. Specific questions are addressed with more
sophisticated NMR methods, which are more time consuming but lead to
detailed information about protein-ligand interaction at atomic resolution.
Fig. 15.2-15 Development of kinase-biased screening libraries during an NMR-based
fragment approach. The Ki values of the fragments are obtained by quantification of
the STD NMR resonances of an adenosine derivative, which is used as the reporter
ligand in a competition assay. (a) As a proof of principle, published high-affinity ligands
are fragmented into their components. The NMR method is applied to this validation
set, which reveals the typical affinities of a particular kinase toward the “standard”
kinase fragments. For example, fragments usually exhibit higher affinities to kinase A
than to the others due to the activated state of this kinase. (b) A larger fragment library is
created by virtual screening, which utilizes the pharmacophoric fingerprints of known
kinase inhibitors. After the screening, viable hits are selected and characterized by
protein-based NMR. The information about selectivity, binding site, and binding mode
was used for step-by-step optimization of small compound collections. (c) and
(d) show that the synthesis efforts result in the enhancement of selectivity toward the
third kinase.
A potent inhibitor of the serine/threonine Jun N-terminal Kinase 3 (JNK3)
was identified using a fragment-based NMR approach. A follow-up study
with competition-binding methods and molecular docking based on the crystal
structure of JNK3 proposed potential binding models. These models were
used in turn to synthesize a set of several thousand optimized compounds
that contained elements from the original fragments, leading to the final
hits [86].
The same NMR approach with the fragment-linking strategy was used to
enhance the activity of a weak inhibitor of p38 MAP kinase. By adding first
one and then a second aromatic ring onto the central five-membered heterocyclic
ring, the activity reached the nanomolar regime [87].
15.2.4
Characterizing Kinase-Ligand Interactions by NMR
There are many interaction sites for inhibiting the phosphotransferase activity
of a protein kinase. Antagonism of the ATP-binding site to inhibit enzymatic
activity is the focus of most investigations. Inhibition of this site can be
accomplished by unspecific inhibitors like staurosporine, and various
kinase-specific inhibitors have been discovered. Nevertheless, selectivity continues to
be a problem due to the commonality in the binding of ATP. All ATP site
binders bind to the highly conserved “hinge” region that connects the N- and
C-terminal lobes. But the deep ATP cleft consists of several subsites that can
be utilized in the structure-based design of inhibitors. For example, the pivotal
role of protruding nonconserved residues has been reported, which control
the access to particular subpockets, like a gatekeeper. In the cases of imatinib,
gefitinib, and erlotinib, clinical trials showed that single point mutations in
the active site lead to chronic resistance during the drug treatment [88, 89].
Alternatively, kinase activation can be prevented by interfering with regulatory
subunit binding. Interactions can be stabilized that maintain the
kinase in the inactive form, where it cannot bind ATP or where the residues
are misaligned for catalytic activity. Since inactive kinases must be correctly
recognized by activating enzymes, they differ more strongly from one another
than the activated forms, all of which fulfill the same function. The design of
binders to the inactive form could achieve a higher degree of selectivity. In
particular, the Asp-Phe-Gly motif (DFG) of the activation loop has attracted
much attention from medicinal chemists. A selective inhibitor at an adjacent
binding site turns a residue of the DFG loop into an “out” conformation
that precludes ATP from binding [90, 91]. Kinase activity can be indirectly
inhibited by blocking the protein substrate recruitment site or by direct
inhibition of the substrate phosphoacceptor subsite. Like all protein-protein
interaction surfaces, this binding site is more difficult to target with
small-molecule inhibitors. Selectively targeting individual kinases in this manner
remains a considerable task.
15.2.4.1 Mapping of Chemical Shift Perturbations

The observation of ligand-induced NMR CSPs usually defines the interaction
site of a ligand reasonably well, if the assignment of NMR resonances is
available. As mentioned above, protein-based NMR screening approaches
are ideally suited as a follow-up to a ligand-detected NMR approach at a
larger scale. Simultaneously with the validation of the primary assay, it
allows the determination of the binding site for identified ligands (also
offering the combination with other primary assays like HTS or SPR).
Figure 15.2-16 shows the ligand-induced CSPs for p38 interacting with
either the small-molecule inhibitor SB203580 (see structure in Fig. 15.2-8(c))
binding to the ATP-binding site or an oligopeptide binding to the protein
substrate docking site for MEF2A [82]. As the protein-ligand complexes
are known in both cases, CSPs can be easily compared with the complex
structures and show close similarity between the X-ray structure and
the structure derived by NMR. The pronounced CSPs of SB203580
are induced by the ring current effects of the aromatic ring systems
in SB203580. The peptide derived from MEF2A induces weaker CSPs
because of the lack of aromatic amino acids in the peptide sequence.
The affected region covers a larger part of the protein, reflecting the
size of the peptide and the tertiary rearrangements known from the X-ray
structure.
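The mapping step can be illustrated with a small calculation that turns per-residue shift changes into color classes. This is a sketch under assumptions: the chapter only states that the color coding considers the average CSP and its deviation, so the combined-shift formula with a 15N weighting factor of 0.2, the class boundaries (mean and mean plus one standard deviation), and all residue labels and shift values are hypothetical choices for illustration.

```python
import math
import statistics


def combined_csp(d_h_ppm, d_n_ppm, n_weight=0.2):
    """Combine 1H and 15N shift changes into one value (ppm).

    n_weight scales the 15N dimension; 0.2 is an assumed, commonly used value.
    """
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)


def classify_csps(shift_changes):
    """Assign each residue a class relative to the mean and standard deviation."""
    csps = {res: combined_csp(dh, dn) for res, (dh, dn) in shift_changes.items()}
    mean = statistics.mean(csps.values())
    sd = statistics.pstdev(csps.values())
    classes = {}
    for res, value in csps.items():
        if value > mean + sd:
            classes[res] = "strong"
        elif value > mean:
            classes[res] = "medium"
        else:
            classes[res] = "weak"
    return classes


# Hypothetical (delta 1H, delta 15N) changes in ppm on ligand binding.
shift_changes = {
    "residue_1": (0.30, 1.5),
    "residue_2": (0.10, 0.5),
    "residue_3": (0.02, 0.1),
    "residue_4": (0.01, 0.05),
    "residue_5": (0.03, 0.1),
}
classes = classify_csps(shift_changes)
print(classes)
```

The resulting classes can then be written out as a per-residue property and projected onto the crystal structure in a molecular viewer.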
Fig. 15.2-16 Ligand binding is detected by CSPs. The two-dimensional [1H,15N]-TROSY
spectra of the uniformly 15N-labeled p38 MAP kinase in the absence and presence
of the ligand are compared. The difference of a given amide resonance on ligand
binding is calculated and projected on the crystal structure of the kinase. A
color-coding scheme is used, which considers the average value of CSPs
and their mean square deviation. (a) CSPs of the small-molecule inhibitor SB203580
are mapped on 1A9U.pdb. (b) CSPs of the oligopeptide (KPDLRVVIPP) derived from
the protein substrate MEF2A mapped on 1LEW.pdb.
15.2.4.2 DFG-in/DFG-out
Recently, an alternative binding site adjacent to the ATP-binding cleft has been
exploited for pharmaceutical intervention. The pyrazole-urea-based inhibitor
BIRB796 (for the structure see Fig. 15.2-8(d)) induces an alternative conformation of
the DFG motif of p38 MAP kinase, turning the side chain of Phe169 from an
“in” to an “out” configuration. The corresponding loop undergoes a 10 Å shift
that precludes ATP binding through the incompatibility of the new position
of the Phe side chain. This recognition principle has been successfully applied
to protein kinases such as Raf [92], p38 MAP kinase [90, 91, 93], or kinase
insert domain receptor (KDR) [94].
In the NMR analysis of this part of the polypeptide chain, the DFG loop
(Asp168-Phe169-Gly170) turned out to be one of the segments that could
not be assigned in the spectra of the apo form of p38 MAP kinase. A [1H,
15N]-TROSY spectrum recorded from selectively 15N-Phe labeled samples
revealed 12 of 13 phenylalanine correlations. The 12 visible signals were
unambiguously assigned; the unobservable signal belongs to Phe169 in the
DFG loop. This finding was confirmed by the spectrum of the selectively 15N-Phe
labeled mutant Phe169Tyr, which exhibited an identical TROSY spectrum with
12 peaks. Altered field strengths, temperatures, and more sensitive acquisition
conditions with a cryoprobe head did not affect the result.
On addition of the pyrazole-urea-based DFG-out inhibitor to a selectively
15N-Phe labeled p38 sample, 13 peaks can be detected. A further investigation
with 13C'-labeled Asp/15N-labeled Phe, recording an HNCO-type experiment,
confirmed the assignment of Phe169. The lineshape of the Phe169 amide
resonance was simulated and analyzed with respect to the ability to detect the peak
in a [1H, 15N]-TROSY spectrum. The chemical shift difference between DFG-in
and DFG-out conformations was estimated by a chemical shift prediction
according to the published X-ray structures. Figure 15.2-17 shows the relative
maximum peak intensities of the amide 15N-resonance of Phe169 as a function
of the exchange rate and the relative population of the “out” state in the
simulated spectra. The lowest peak intensities are expected for medium exchange
rates at equally distributed states. The extent of this area shrinks with
decreasing field strength of the spectrometer. The situation during the NMR
measurement of the apo protein and complexes with DFG-in ligands seems to
be in the depicted area, where the exchange leads to excessive line broadening. In
principle, the peak is detectable again by changes in temperature (move left or
right in the diagram), by a decrease of the field strength or by “freezing” one
of the two conformations with a ligand (move up or down in the diagram).
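The qualitative behavior described above can be reproduced with a standard two-site exchange lineshape. The following is a minimal sketch, not the simulation used for Fig. 15.2-17: it assumes the Bloch-McConnell equations for two exchanging sites, an intrinsic transverse relaxation rate of 20 s^-1 for both states (an arbitrary choice), and a 15N Larmor frequency of about 60.8 MHz at a 1H frequency of 600 MHz. Only the 13.7 ppm shift difference is taken from the text.

```python
import math

N15_MHZ = 60.8                                    # 15N frequency at 600 MHz 1H (approx.)
DELTA_OMEGA = 2 * math.pi * 13.7 * N15_MHZ        # in/out shift difference in rad/s


def two_state_spectrum(offsets, delta_omega, k_ex, p_out, r2=20.0):
    """Absorption lineshape for in <-> out exchange (two-site Bloch-McConnell).

    offsets      angular frequency offsets (rad/s) at which to evaluate
    delta_omega  shift difference between the two states (rad/s)
    k_ex         sum of forward and backward exchange rates (1/s)
    p_out        population of the "out" state (0..1)
    r2           intrinsic transverse relaxation rate (1/s), assumed equal
    """
    p_in = 1.0 - p_out
    k_io = k_ex * p_out            # rate in -> out
    k_oi = k_ex * p_in             # rate out -> in
    w_in, w_out = -0.5 * delta_omega, 0.5 * delta_omega
    spectrum = []
    for w in offsets:
        a = complex(r2 + k_io, w - w_in)
        d = complex(r2 + k_oi, w - w_out)
        numerator = (d + k_io) * p_in + (a + k_oi) * p_out
        denominator = a * d - k_io * k_oi
        spectrum.append((numerator / denominator).real)
    return spectrum


def max_peak_intensity(k_ex, p_out, delta_omega=DELTA_OMEGA, points=4001):
    """Height of the tallest peak in the simulated spectrum."""
    step = 2 * delta_omega / (points - 1)
    offsets = [-delta_omega + i * step for i in range(points)]
    return max(two_state_spectrum(offsets, delta_omega, k_ex, p_out))


slow = max_peak_intensity(k_ex=1.0, p_out=0.5)
intermediate = max_peak_intensity(k_ex=DELTA_OMEGA, p_out=0.5)
fast = max_peak_intensity(k_ex=1e7, p_out=0.5)
skewed = max_peak_intensity(k_ex=DELTA_OMEGA, p_out=0.95)
print(slow, intermediate, fast, skewed)
```

Under these assumptions the tallest peak is lowest for an exchange rate comparable to the shift difference at equal populations, and it recovers when the exchange is much slower or faster, or when one state dominates, in line with the three ways of making the peak visible again that are listed in the text.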
For the apo form of the p38 MAP kinase it was deduced that the absence
of the amide peak for Phe169 in the DFG motif under all tested NMR
conditions is consistent with a conformational “in/out” equilibrium taking
place at an intermediate NMR timescale. Binding of the pyrazole-urea-based
DFG-out inhibitor is not compatible with the DFG-in conformation; therefore,
the conformational exchange process of the DFG loop is directly interfered with.
The observation of the Phe169 amide resonance in the presence of DFG-out
inhibitors confirms this hypothesis.
Fig. 15.2-17 Simulation of NMR spectra of a two-state DFG-in/DFG-out model. The
grayscale represents the relative maximum peak intensities of the 15N-amide resonance
of Phe169 as a function of the exchange rate and the population of the “out” state. The
magnetic field strength is set according to a 1H resonance at 600 MHz. The chemical
shift difference was set to 13.7 ppm, as predicted by chemical shift calculations
applied to the published X-ray structures (1P38.pdb and 1KV1.pdb). The lowest
maximum detectable peak intensities are expected for medium exchange rates and
about uniformly distributed states. The extent of this area shrinks with decreasing
field strength of the spectrometer. Unobservable peaks can be made visible
again by changes in temperature (move left or right in the diagram), by a decrease of the
field strength, or by “freezing” one of the two conformations with a ligand (move up
or down in the diagram).
In contrast to the pyrazole-urea-based compounds, the inhibitor class
similar to SB203580 has been described as DFG-in binders [95], where the
conformation of the DFG loop is similar to that observed in the crystal structures
of the apo form of p38 [96, 97]. SB203580 or SKF86002 as DFG-in ligands of p38
do not give rise to additional peaks in the [1H, 15N]-TROSY spectrum of
15N-Phe-p38. The observed NMR data suggest that the reported “DFG-in” binders
leave a putative conformational DFG “in/out” equilibrium in a time regime
where the Phe169 amide correlation vanishes. This suggestion is in agreement
with the results obtained by biological assays, that is, DFG-in ligands do not
interfere with the p38 activation [98], whereas DFG-out inhibitors block both
activity and activation of p38 [99].
15.2.4.3 LIGDOCK
The LIGDOCK procedure [71] was suggested for the determination of
protein-ligand complex structures from non-X-ray data. Ambiguous
experimental data from NMR [100, 101] or from other biophysical or biochemical
experiments are introduced in an ambiguous manner [102-104], which makes
it possible to determine protein/ligand complexes on the basis of only a few
experiments. The concept is based on the idea of collecting readily available CSPs
first. If necessary, more sophisticated experimental results have to be added to
improve the accuracy of the structure determination. The calculations consist
of three stages. In the first step, the two molecules are positioned apart from
each other and a rigid body minimization is performed. Poses that best fulfill
the experimental parameters proceed to a simulated annealing in torsion angle
space, keeping the ligand and the binding area of the protein flexible. Possible
solutions are equilibrated with a molecular dynamics simulation using explicit
water. A critical step of the procedure is the ranking of the structures. Accurate
structures are picked from a “selection plot” in which both the intermolecular
van der Waals and experimental energy are plotted. Structures having both a
low van der Waals and a low experimental energy are possible solutions. By
contrast, structures in which only one of the two energy terms is low are
discarded.
The approach was tested for three examples with increasing degree of
complexity. The determination of PTP8 in complex with PTP1B can be resolved
with CSPs only. Here, the definition of the binding site suffices to resolve the
structural problem. The calculation for H7 in complex with PKA presents two
problems, which are common in structure determination using non-X-ray
data: only a partial NMR assignment of the protein was available and,
additionally, the protein conformation in the complex is an “open” conformation but
the apo structure has a “closed” form. The choice of the starting conformation
influences the result of the simulation. Nevertheless, the calculations were
started with the “wrong” apo form. Surprisingly, the orientation and possible
constructive interactions of the quinazoline ring that is the main feature of the
H series of inhibitors are correctly reproduced, although the starting structure
of the protein and the known X-ray structure of the complex were very different
and only a partial assignment of PKA was available. The determination of the
structure of SB203580 in complex with p38 was most complicated because of
the specific shape of the ligand. It has one twofold and one threefold rotation
symmetry axis, implying that the ligand can also occupy the binding site in
other symmetry-related orientations. Therefore, it is not possible to determine
the complex structure with CSPs only. But in combination with either STD
experiments on selectively labeled p38 (SOS-NMR, structural information
using Overhauser effects and selective labeling) [105] or the introduction of a
knowledge-based restraint, this structural problem could be resolved.
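The idea behind the “selection plot” can be sketched as a simple two-criterion filter. The text does not specify how “low” is defined in LIGDOCK, so the rule used here (both energies must lie within the lowest fraction of all poses) and every pose name and energy value are assumptions for illustration only.

```python
def low_cutoff(values, fraction=0.5):
    """Energy below which the lowest `fraction` of all values lies."""
    ordered = sorted(values)
    index = max(0, int(len(ordered) * fraction) - 1)
    return ordered[index]


def select_poses(poses, fraction=0.5):
    """Keep poses whose van der Waals AND experimental energies are both low.

    poses: list of (name, e_vdw, e_exp) tuples; lower energies are better.
    """
    vdw_cut = low_cutoff([p[1] for p in poses], fraction)
    exp_cut = low_cutoff([p[2] for p in poses], fraction)
    return [name for name, e_vdw, e_exp in poses
            if e_vdw <= vdw_cut and e_exp <= exp_cut]


# Hypothetical docking results (arbitrary energy units).
poses = [
    ("pose1", -40.0, 5.0),
    ("pose2", -38.0, 60.0),   # good packing, violates experimental restraints
    ("pose3", -10.0, 4.0),    # fulfills restraints, poor packing
    ("pose4", -35.0, 8.0),
    ("pose5", -5.0, 90.0),
    ("pose6", -12.0, 70.0),
]
print(select_poses(poses))
```

Poses such as `pose2` and `pose3`, in which only one of the two energy terms is low, are discarded, which is the behavior described for the selection plot.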
References
1. A. Bellacosa, C.C. Kumar, A. Di Cristofano, J.R. Testa, Adv. Cancer Res. 2005, 94, 29-86.
2. H. Hirai, N. Kawanishi, Y. Iwasawa, Curr. Top. Med. Chem. 2005, 5, 167-179.
3. J.G. Shelton, L.S. Steelman, S.L. Abrams, F.E. Bertrand, R.A. Franklin, M. McMahon, J.A. McCubrey, Expert Opin. Ther. Targets 2005, 9, 1009-1030.
4. S.M. Keenan, J.A. Geyer, W.J. Welsh, S.T. Prigge, N.C. Waters, Comb. Chem. High Throughput Screen. 2005, 8, 27-38.
5. J. Dancey, E.A. Sausville, Nat. Rev. Drug Discov. 2003, 2, 296-313.
6. T. Schindler, W. Bornmann, P. Pellicena, W.T. Miller, B. Clarkson, J. Kuriyan, Science 2000, 289, 1938-1942.
7. S. Kobayashi, T.J. Boggon, T. Dayaram, P.A. Janne, O. Kocher, M. Meyerson, B.E. Johnson, M.J. Eck, D.G. Tenen, B. Halmos, N. Engl. J. Med. 2005, 352, 786-792.
8. T. Hunter, Cell 2000, 100, 113-127.
9. S.K. Hanks, Genome Biol. 2003, 4, 111.
10. S.S. Taylor, E. Radzio-Andzelm, T. Hunter, FASEB J. 1995, 9, 1255-1266.
11. A. Gescher, Gen. Pharmacol. 1998, 31, 721-728.
12. F.A. al-Obeidi, J.J. Wu, K.S. Lam, Biopolymers 1998, 47, 197-223.
13. T.J. Boggon, M.J. Eck, Oncogene 2004, 23, 7918-7927.
14. H.L. De Bondt, J. Rosenblatt, J. Jancarik, H.D. Jones, D.O. Morgan, S.H. Kim, Nature 1993, 363, 595-602.
15. S.H. Hu, M.W. Parker, J.Y. Lei, M.C. Wilce, G.M. Benian, B.E. Kemp, Nature 1994, 369, 581-584.
16. S.R. Hubbard, L. Wei, L. Ellis, W.A. Hendrickson, Nature 1994, 372, 746-754.
17. D.R. Knighton, J.H. Zheng, L.F. Ten Eyck, V.A. Ashford, N.H. Xuong, S.S. Taylor, J.M. Sowadski, Science 1991, 253, 407-414.
18. F. Zhang, A. Strand, D. Robbins, M.H. Cobb, E.J. Goldsmith, Nature 1994, 367, 704-711.
19. J.A. Adams, S.S. Taylor, Protein Sci. 1993, 2, 2177-2186.
20. J. Zheng, D.R. Knighton, N.H. Xuong, S.S. Taylor, J.M. Sowadski, L.F. Ten Eyck, Protein Sci. 1993, 2, 1559-1573.
21. M. Lei, M.A. Robinson, S.C. Harrison, Structure (Camb) 2005, 13, 769-778.
22. D.R. Knighton, J.H. Zheng, L.F. Ten Eyck, N.H. Xuong, S.S. Taylor, J.M. Sowadski, Science 1991, 253, 414-420.
23. M.E. Bollard, E.G. Stanley, J.C. Lindon, J.K. Nicholson, E. Holmes, NMR Biomed. 2005, 18, 143-162.
24. W.F. Reynolds, R.G. Enriquez, J. Nat. Prod. 2002, 65, 221-244.
25. R.A. Carr, M. Congreve, C.W. Murray, D.C. Rees, Drug Discov. Today 2005, 10, 987-992.
26. I. Hunt, Protein Expr. Purif. 2005, 40, 1-22.
27. M.J. Wood, E.A. Komives, J. Biomol. NMR 1999, 13, 149-159.
28. M. Bruggert, T. Rehm, S. Shanker, J. Georgescu, T.A. Holak, J. Biomol. NMR 2003, 25, 335-348.
29. A. Strauss, F. Bitsch, G. Fendrich, P. Graff, R. Knecht, B. Meyhack, W. Jahnke, J. Biomol. NMR 2005, 31, 343-349.
30. C. Mao, M. Zhou, F.M. Uckun, J. Biol. Chem. 2001, 276, 41435-41443.
888 15 Target Families
I 31. M. Lei, W. Lu, W. Meng, M.C. 49. J. Goldberg, A.C. Nairn, J. Kuriyan,
Parrini, M.J. Eck, B. J. Mayer, S.C. Cell 199G, 84, 875-887.
Harrison, Cell 2000, 102, 387-397. 50. D.M. Payne, A.J. Rossomando,
32. T. Langer, S. Sreeramulu, P. Martino, A.K. Erickson, J.H. Her,
M. Vogtherr, B. Elshorst, M. Brtz, J. Shabanowitz, D.F. Hunt, M.J.
U. Schieborr, K. Saxena, Weber, T.W. Sturgill, E M B O J . 1991,
H. Schwalbe, FEBS Lett. 2005, 579, 10,885-892. .
4049-4054. 51. F. Li, M. Gangal, C. Juliano,
33. T. Langer, M. Vogtherr, 8 . Elshorst, E. Gorfain, S.S. Taylor, D.A. Johnson,
M. Betz, U. Schieborr, K. Saxena, J . Mol. Biol. 2002, 315,459-469.
H. Schwalbe, Chembiochem 2004, 5, 52. L.N. Johnson, M.E. Noble, D.J.
1508- 1516. Owen, Cell 1996,85,149-158.
34. K. Pervushin, R. Riek, G. Wider, 53. J. Zhang, F. Zhang, D. Ebert, M.H.
K. Wuthrich, Proc. Natl. Acad. Sci. Cobb, E.J. Goldsmith, Structure 1995,
U.S.A. 1997, 94,12366-12371. 3,299-307.
35. M. Sattler, J. Schleucher, 54. D. Brancho, N. Tanaka, A. Jaeschke,
C. Griesinger, Prog. N M R Spectrosc. 1.1. Ventura, N. Kelkar, Y. Tanaka,
1999,34,93-158. M. Kyuuma, T. Takeshita, R.A.
36. C.S.J. Johnson, Prog. N M R Spectrosc. Flavell, R.J. Davis, Genes Dev. 2003,
1998,34,203-256. 17,1969-1978.
37. W. Kremer, H.R. Kalbitzer, Methods 55. M.H. Seifert, C.B. Breitenlechner,
Enzymol. 2001, 339, 3-19. D. Bossemeyer, R. Huber, T.A.
38. F. Lohr, V. Katsemi, J. Hartleib, Holak, R.A. Engh, Biochemistry 2002,
U. Gunther, H. Ruterjans, J. Biomol. 41,5968-5977.
N M R 2003,25,291-311. 56. D.A. Johnson, P. Akamine,
39. P. Guntert, M. Salzmann, D. Braun, E. Radzio-Andzelm,
K. Wuthrich, J. Biomol. N M R 2000, M. Madhusudan, S.S. Taylor, Chem.
18,129-137. Rev. 2001, 101,2243-2270.
40. S. Neal, A.M. Nip, H. Zhang, D.S. 57. S.S. Taylor, J. Yang, J. Wu, N.M.
Wishart, J. Biomol. N M R 2003, 26, Haste, E. Radzio-Andzelm,
21 5-240. G. Anand, Biochim. Biophys. Acta
41. P.A. Kosen, Methods Enzymol. 1989, 2004, 1697,259-269.
177,86-121. 58. P.W. Andersen,3. Phys. Soc.Jpn.
42. W. Jahnke, Chembiochem 2002, 3, 1954, 9,316-339.
167-173. 59. J. Sandstrom, Dynamic nuclear
43. W. Jahnke, S. Rudisser, M. Zurini, J magnetic resonance spectroscopy,
Am. Chem. SOC.2001, 123, Academic Press, New York, 1982.
3 149- 3 150. 60. U.L. Gunther, B. Schaffhausen,J.
44. B. Cutting, A. Strauss, G. Fendrich, Biomol. N M R 2002, 22,201-209.
P.W. Manley, W. Jahnke, J. Biomol. 61. U. Gunther, T. Mittag,
N M R 2004, 30,205-210. B. Schaffhausen, Biochemistry 2002,
45. J. Wohnert, K. J. Franz, M. Nitz, 41,11658-11669.
B. Imperiali, H. Schwalbe,J. Am. 62. T. Mittag, B. Schaffhausen, U.L.
Chem. SOC.2003, 125,13338-13339. Gunther, /. Am. Chem. SOC.2004,
46. K.J. Franz, M. Nitz, B. Imperiali, 126,9017-9023.
Chembiochem 2003,4, 265-271. 63. D.A. Erlanson, R.S. McDowell,
47. M. Vogtherr, K. Saxena, S. Grimme, T. O’Brien, C. Wiesmann, K.J. Barr,
M. Betz, U. Schieborr, B. Pescatore, J. Kung, J. Zhu, W. Shen, B.J. Fahr,
T. Langer, H. Schwalbe, J . Biomol. M. Zhong, L. Taylor, M. Randal, S.K.
N M R 2005,32,175. Hansen, 3. Med. Chem. 2004, 47,
48. P.D. Jeffrey, A.A. Russo, K. Polyak, 3463- 3482.
E. Gibbs, J. Hunvitz, J. Massague, 64. D.C. Rees, M. Congreve, C.W.
N.P. Pavletich, Nature 1995, 376, Murray, R. Carr, Nat. Rev. Drug
313-320. Discou. 2004, 3, 660-672.
References I889
65. M. Vogtherr, K. Fiebig, EXS 2003, 93, 80. W. Jahnke, A. Florsheimer, M.J.
183-202. Blommers, C.G. Paris, J. Heim, C.M.
66. J.M. Moore, Curr. Opin. Biotechnol. Nalin, L.B. Perez, Curr. Top. Med.
1999, 10,54-58. Chem. 2003,3,69-80.
67. B. Meyer, T. Peters, Angew. Chem., 81. M.A. McCoy, M.M. Senior, D.F.
Int. Ed. Engl. 2003, 42, 864-890. Wyss,]. Am. Chem. Soc. 2005, 127,
68. M. Pellecchia, D.S. Sem, 7978-7979.
K. Wuthrich, Nut. Rev. Drug DiSCOv. 82. C.I. Chang, B.E. Xu, R. Akella, M.H.
2002, I , 211-219. Cobb, E.J. Goldsmith, Mol. Cells
69. K.A. Mercier, R. Powers,]. Biomol. 2002, 9, 1241-1249.
N M R 2005,31,243-258. 83. G. Kontopidis, M.J. Andrews,
70. S.B. Shuker, P.J. Hajduk, R.P. C. Mclnnes, A. Cowan, H. Powers,
Meadows, S.W. Fesik, Science 1996, L. Innes, A. Plater, G. Griffiths,
274,1531-1534. D. Paterson, D.I. Zheleva, D.P. Lane,
71. U. Schieborr, M. Vogtherr, S. Green, M.D. Walkinshaw, P.M.
B. Elshorst, M. Betz, S. Grimme, Fischer, Structure ( C u m b ) 2003, 1I ,
B. Pescatore, T. Langer, K. Saxena, 1537- 1546.
H. Schwalbe, Chembiochem2005, 13, 84. C, ~ ~M.J, ~ l ~D,I,
~ d ~ ~ ~ ~ ~ ~
13. Zheleva, D.P. Lane, P.M. Fischer,
72. N. Baurin, F. Aboul-Ela, X. Barril, Curr. Med. Chem. Anticancer Agents
B. Davis, M. Drysdale, B. Dymock, 2003, 3,57-69.
H. Finch, C. Fromont, 85. C. Mclnnes, P.M. Fischer, Curr.
'"f:
C. Richardson, H. Simmonite, R.E. P h a m . Des. 2005, 11,1845-1863.
Hubbard']' Cornput' sci' 86. J. Fejzo, C. Lepre, X. Xie, Curr. Top.
2004,44,2157-2166. Med. Chem. 2003, 3,81-97.
73. P.D. Lyne, P.W. Kenny, D.A. 87. J. Fejzo, C.A. Lepre, J.W. Peng,
Cosgrove, C. Deng, S. Zabludoff, J.J. G.W. Bemis, M.A. Murcko, Ajay,
Wendoloski, S. Ashwell, 1.Med. J.M. Moore, Chem. Biol. 1999, 6,
Chem. 2004,47,1962-1968.
755-769.
74. E. Vangrevelinghe, K. Zimmermann, 88. R. Ren, Nut. Rev. Cancer2005, 5,
J. Schoepfer, R. Portmann,
172-183.
D. Fabbro, P. Furet,J. Med. Chem.
89. T.A. Carter, L.M. Wodicka, N.P.
2003,46,2656-2662.
Shah, A.M. Velasco, M.A. Fabian,
75. E. ter Haar, W.P. Walters,
D.K. Treiber, Z.V. Milanov, C.E.
S. Pazhanisamy, P.Taslimi, A.C.
Atteridge, W.H. Biggs 111, P.T.
Pierce, G.W. Bemis, F.G. Salituro,
Edeen, M. Floyd, J.M. Ford, R.M.
S.L. Harbeson, Mini. Rev. Med.
Chem. 2004,4,235-253.
Grotzfeld, S. Herrgard, D.E. Insko,
S.A. Mehta, H.K. Patel, W. Pao, C.L.
76. C. Chuaqui, 2 . Deng, J. Singh,].
Med. Chem. 2005,48, 121-133. Sawyers, H. Varmus, P.P. Zarrinkar,
77. C. Dalvit, M. Flocco, S. Knapp, D.J. Lockhart, Proc. Natl. Acad. Sci.
M. Mostardini, R. Perego, B.J. U.S.A. 2005, 102, 11011-11016.
Stockman, M, Veronesi, M, Varasi,]. 90. C . Pargellis, L. Tong, L. Churchill,
Am. Chem. SOC.2002, 124, P.F. Cirillo, T. Gilmore, A.G.
7702-7709. Graham, P.M. Grob, E.R. Hickey,
78. W. Jahnke, P. Floersheim, N. Moss, S. Pav, J . Regan, Nut. Struct.
C. Ostermeier, X.Zhang, B i d . 2002, 9, 268-272.
R. Hemmig, K. Hurth, D.P. Uzunov, 91. J. Regan, A. Capolino, P.F. Cirillo,
Angew. Chem., Int. Ed. Engl. 2002, 41, T. Gilmore, A.G. Graham, E. Hickey,
3420-3423. R.R. Kroe, J. Madwed, M. Moriak,
79. W. Jahnke, M.J. Blommers, R. Nelson, C.A. Pargellis,
C. Fernandez, C. Zwingelstein, A. Swinamer, C. Torcellini,
R. Amstutz, Chernbiochem 2005, 6, M. Tsang, N. Moss,]. Med. Chem.
1607- 1610. 2003,46,4676-4686.
15 Target Families
890
I 92. P.T. Wan, M.J. Garnett, S.M. Roe, P.G. McCaffrey, S.P. Chambers, M.S.
S. Lee, D. Niculescu-Duvaz, V.M. Su,J. Biol. Chem. 1996, 271,
Good, C.M. Jones, C.J. Marshall, C.J. 27696-27700.
Springer, D. Barford, R. Marais, Cell 98. S. Kumar, M.S. Jiang, J.L. Adams,
2004, I 1 6,855-867. J.C. Lee, Biochem. Biophys. Res.
93. J. Branger, B. van den Blink, Commun. 1999,263,825-831.
S. Weijer, J. Madwed, C.L. Bos, 99. Y. Kuma, G. Sabio, J. Bain,
A. Gupta, C.L. Yong, S.H. Polmar, N. Shpiro, R. Marquez, A. Cuenda, J.
D.P. Olszyna, C.E. Hack, S.J. van Biol. Chem. 2005, 280,19472-19479.
Deventer, M.P. Peppelenbosch, 100. C. Dominguez, R. Boelens, A.M.
T. van der Poll, J. Immunol. 2002, Bonvin, J. Am. Chem. SOC.2003, 125,
168,4070-4077. 1731-1737.
94. P.W. Manley, G. Bold, J. Bruggen, 101. A.D. van Dijk, R. Boelens, A.M.
G. Fendrich, P. Furet, J. Mestan, Bonvin, J.P. Linge, S.I. O’Donoghue,
C. Schnell, B. Stolz, T. Meyer, M. Nilges, FEBSJ. 2005, 272,
B. Meyhack, W. Stark, A. Strauss, 293-312.
J. Wood, Biochim. Biophys. Acta 2004, 102. J.P. Linge, S.I. O’Donoghue,
1697,17-27. M. Nilges, Methods Enzymol. 2001,
95. Z . Wang, B.J. Canagarajah, J.C. 339,71-90.
Boehm, S. Kassisa, M.H. Cobb, P.R. 103. M. Nilges,J. Mol. Biol. 1995, 245,
Young, S. Abdel-Meguid, J.L. Adams, 645-660.
E.J. Goldsmith, Structure 1998, 6, 104. M. Nilges, S.I. O’Donoghue, Prog.
1117-1128. N M R Spectrosc. 1998, 32, 107-139.
96. 2. Wang, P.C. Harkins, R.J. Ulevitch, 105. P.J. Hajduk, J.C. Mack, E.T.
J. Han, M.H. Cobb, E.J. Goldsmith, Olejniczak, C. Park, P.J. Dandliker,
Proc. Nutl. Acud. Sci. U.S.A. 1997, 94, B.A. Beutel,]. Am. Chem. SOC.2004,
2327-2332. 126,2390-2398.
97. K.P. Wilson, M.J. Fitzgibbon, P.R.
Caron, J.P. Griffith, W. Chen,
15.3
The Nuclear Receptor Superfamily and Drug Discovery*
John T. Moore, Jon L. Collins, and Kenneth H. Pearce
Outlook
Nuclear receptors are an evolutionarily related family of proteins unified by
common structural and functional properties. In general, these receptors act as
specialized transcription factors that ultimately regulate target genes involved
in a variety of critical biological processes, such as cellular differentiation,
reproduction, metabolic homeostasis, and immune system function. For a
subset of the receptors, activities can be regulated by endogenous hormones,
lipids or metabolites and in some cases synthetic small molecule ligands.
Drug discovery advances within the field have shown that designer ligands can
exert pathway- and tissue-selective effects on the receptors, thus maximizing
medically beneficial responses over side-effect liabilities. This review covers
many of the features of nuclear receptor structure/function and highlights
some of the key methodologies currently being used to aid discovery of new
nuclear receptor-targeted drugs.
15.3.1
Introduction
A central theme that defines the field of endocrinology is the act of controlling
activities and processes at distal sites in the body. Signaling molecules, in
some cases nonprotein small molecules, traverse the body and ultimately relay
their chemically encoded information to a protein receptor at the target tissue.
The nuclear hormone receptor (NR) is a classic example of a receiver for such
small-molecule chemical messengers. The NR is well adapted for this type of
function because it not only specifically binds the small molecule but is also
capable of relaying or transducing a complex set of signals carried along by
the properties of the ligand. As reviewed herein, the nature of the information
that the ligand-bound NR relays depends on a complex interplay of factors,
such as ligand and cell type.
In humans, 48 NR genes have been identified (Fig. 15.3-1) [1]. A feature that
unifies the NRs as a superfamily is that each receptor consists of an assembly
of functional modules (Fig. 15.3-2) [2]. For the purpose of this review, the
module most relevant to current drug discovery approaches is the C-terminal
ligand-binding domain (LBD). The LBD is typically about 250 amino acids in
length and contains a key regulatory element, the so-called activation function
2 (AF2) domain, as well as all the recognition elements required for ligand
binding (Fig. 15.3-3(a), left) [3].

Fig. 15.3-1 The NR superfamily represented as a phylogeny plot. The 48
identified receptors within the human genome are shown clustered according to
amino acid sequence relationships. NRs are named according to the accepted
unified nomenclature (see Table 15.3-1 for more description) [1].

* A similar version of this paper was published in ChemMedChem 2006, 1, 504-523, Wiley-VCH,
Weinheim, Germany.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7

The fold of the NR LBD is typically described as three stacked α-helical
sheets. The helices comprising the “front” and “back” sheets are roughly
aligned parallel to one another. The helices in the middle sheet run across the
two outer sheets and occupy space only in the upper portion of the domain
(Fig. 15.3-3(a), right). The space in the lower part of the domain is relatively
void of protein, and for most NRs this creates an internal cavity for small
molecule ligands.

Fig. 15.3-2 Domain organization of the NRs. Shown are the basic structural
modules comprising an NR (AF-1, activation function-1; DBD, DNA binding domain;
LBD, ligand binding domain). In the linear schematic at the top, the general
functions of the respective regions of NRs are noted. Examples of selected NRs
(see Table 15.3-1 for abbreviations) are shown below to demonstrate that most
NR LBDs are similar in amino acid length, but the N-terminal region varies
amongst family members. Numbers represent amino acid position.

Fig. 15.3-3 Representative structures of NR functional modules. (a) The first NR
LBD to be solved crystallographically was the apo RXR LBD [5]. The representative
example structure shown here, depicted as a ribbon diagram, is for PR bound to its
natural ligand progesterone [6]. This structure, which was the first of the steroid
receptors to be solved, shows the basic fold conserved among members of the NR
superfamily. The major helices (red) are labeled, the well-conserved small β-sheet
is shown in yellow, and the random coil stretches connecting the major structural
elements are colored green. The final C-terminal helix is labeled as the AF2 helix
and is described in more detail in the text. The progesterone molecule is colored by
atom type with carbons shown as blue and oxygens shown as red. The domain on the
right is rotated 90° to clearly show the three helical layers that comprise the NR
LBD fold. (b) Shown, as a ribbon diagram, this is the first X-ray crystal structure
of an NR DBD bound to a DNA response element. This representative structure is the
DBD from GR bound in an antiparallel fashion to its inverted direct-repeat DNA
response site [7]. The GR DBD is bound as a homodimer, where one of the subunits is
colored yellow and the other is colored blue. The DNA helix is shown with atoms
represented as spheres and colored according to atom type (carbon - green;
oxygen - red; nitrogen - blue; phosphorus - magenta).

The central part of the typical NR contains the DNA binding domain (DBD),
which is usually about 70 amino acids, contains two zinc-finger motifs, and
is the most highly conserved sequence segment amongst the NRs. For some
NRs, the DBD forms a dimer and binds a DNA response element containing
a direct repeat of six base pairs [4]. The typical DBD contains three helices
(Fig. 15.3-3(b)), the first of which docks into the major groove of the DNA
recognition site. A second smaller helix and the loop preceding it create a
domain-domain interface. The third helix makes no DNA or other contacts.
Most NRs have an N-terminal domain, commonly referred to as the activation
function 1 (AF1) domain. This module varies greatly in length amongst
receptors and generally contains a ligand-independent transcriptional
activation function.
Upon activation by the ligand messenger, NRs typically function as
transcription factors where they bind to recognition elements and regulate
the expression of target genes. Once complexed to DNA, NRs recruit accessory
proteins such as coactivators, corepressors, and basal transcriptional factors,
thus initiating gene transcription (Fig. 15.3-4). In some cases, genes under the
control of a negative response element are downregulated by an NR; thus NRs
are able to act directly as activators or suppressors of gene function. As will
be discussed later in this chapter, NR pathway regulation goes beyond direct,
DNA-mediated transcriptional regulation. For example, some NRs crosstalk
with other important signal transduction schemes such as nuclear factor
kappa B (NF-κB) and activator protein 1 (AP-1) [8] (Fig. 15.3-4).
NRs have a rich and long-standing history in drug discovery. This can be
attributed to several features inherent to this class of targets: (a) NRs have been
designed by nature to selectively bind “druglike” small molecules and (b) a
diverse set of biologically important functions can be regulated through a single
ligand-activated receptor (see Table 15.3-1 for examples of NR-targeted drugs). Data
compiled for the year 2003 (http://www.rxlist.com/top200.htm) show that 34
of the top 200 most prescribed drugs target an NR. Currently, drugs targeting
an NR account for over 30 billion dollars in pharmaceutical sales and treat
numerous debilitating diseases. In the light of these facts, the NR field remains
an area of intense research with most of the current effort directed toward
improving upon current NR drugs or screening currently unexploited NRs. The
purpose of this review is to briefly cover the following general topics as they
pertain to the chemical biology of NRs: the history of NR-targeted drug discovery,
principles of NR-ligand recognition and protein conformational change,
biological pathways controlled by NRs, recent NR drug pursuits, and finally some
new technologies and future pharmaceutical prospects for this target class.

Fig. 15.3-4 A simplified schematic depicting the general mechanisms of NR
function. Some unliganded NRs, such as the steroid receptors, exist in the
cytoplasm in an inactive complex with heat shock proteins (hsps). Ligand binding
triggers hsp uncoupling and transport of the NR to the nucleus. To directly
regulate gene transcription, the ligand-bound NR associates with a DNA response
element within the promoter of the target gene. In many cases the NR localizes in
the form of a homo- or heterodimer. This complex is able to recruit coactivator
(CoA) proteins and other transcriptional components to regulate target gene
expression. Another mechanism whereby ligand-activated NRs can affect gene
transcription involves association with other transcription factors (TFs) such as
NF-κB, activator protein-1 (AP-1), and GATA. The precise molecular mechanism of
this latter activity remains controversial.

Table 15.3-1 The human nuclear receptor superfamily and examples of ligands and therapeutic utilities

Rows are grouped by general category[a]. Columns:
Name | Subtypes and abbreviations (other common abbreviations) | Unified nomenclature[b] | Natural ligand | Examples of therapeutic ligands (trade name) | Therapeutic relevance[c]

General category: Classic steroid receptors
Estrogen receptor | ERα; ERβ | NR3A1; NR3A2 | Estradiol, estrogens | Tamoxifen, raloxifene (Evista), genistein, diethylstilbestrol, equine estrogens (Premarin) | Menopausal symptoms, osteoporosis prevention, breast cancer
Glucocorticoid receptor | GR | NR3C1 | Cortisol; glucocorticoids | Prednisone, dexamethasone, fluticasone propionate (Flovent, Flonase), mometasone furoate (Nasonex), budesonide (Rhinocort/Pulmicort) | Inflammatory and immunological diseases, asthma, arthritis, allergic rhinitis, cancer, immune suppressant for transplant
Mineralocorticoid receptor | MR | NR3C2 | Aldosterone; deoxycorticosterone | Spironolactone (Aldactone), eplerenone (Inspra) | Hypertension, heart failure
Progesterone receptor | PR | NR3C3 | Progesterone; progestins | RU486 (Mifepristone) | Abortifacient, menstrual control
Androgen receptor | AR | NR3C4 | Testosterone; androgens | Flutamide, bicalutamide (Casodex) | Prostate cancer

General category: Classic RXR-heterodimer receptors
Thyroid hormone receptor | TRα; TRβ | NR1A1; NR1A2 | Thyroid hormone | Levothyroxine (Synthroid) | Thyroid deficiency
Retinoic acid receptor | RARα; RARβ; RARγ | NR1B1; NR1B2; NR1B3 | Retinoic acid | Isotretinoin (Accutane) | Acne
Peroxisome proliferator-activated receptor | PPARα; PPARδ; PPARγ | NR1C1; NR1C2; NR1C3 | Fatty acids, eicosanoids | Fenofibrate (Tricor; PPARα), thiazolidinediones (Avandia, Actos; PPARγ) | Dyslipidemia (PPARα), diabetes and insulin sensitization (PPARγ)
Liver X receptor | LXRα; LXRβ | NR1H2; NR1H3 | 24,25-Epoxycholesterol, 24-hydroxycholesterol | - | Role in lipid and cholesterol metabolism; atherosclerosis
Farnesoid X receptor | FXR | NR1H4 | Chenodeoxycholic acid | - | Cholesterol maintenance, protect hepatocytes from bile toxicity; cholestasis
Vitamin D receptor | VDR | NR1I1 | Vitamin D, bile acids | Calcitriol (Rocaltrol) | Hypocalcemia, osteoporosis, renal failure
Retinoid X receptor | RXRα; RXRβ; RXRγ | NR2B1; NR2B2; NR2B3 | All trans-retinoic acid | LG1069 (Targretin) | Skin cancer

General category: Xenobiotic receptors
Pregnane X receptor | PXR | NR1I2 | Xenobiotics | St. John’s wort, rifampicin | Role in protection from toxic metabolites
Constitutive androstane receptor | CAR | NR1I3 | Xenobiotics | Phenobarbital | Role in protection from toxic metabolites

General category: Orphan receptors (or recently deorphaned)
ER-related receptor | ERRα; ERRβ; ERRγ | NR3B1; NR3B2; NR3B3 | Unknown | Tamoxifen, diethylstilbestrol (ERRγ) | Muscle fatty acid metabolism (ERR[...]
RAR-related orphan receptor | RORα; RORβ; RORγ | NR1F1; NR1F2; NR1F3 | Cholesterol, cholesterol sulfate | - | Role in cerebellu[...] development, maintenance of b[...] (RORα); circadian [...] (RORβ); lymph n[...] organogenesis (R[...]
Human nuclear factor 4 | HNF4α; HNF4γ | NR2A1; NR2A2 | Palmitic acid | - | Role in diabetes
Reverse erbA | Rev-erbAα; Rev-erbAβ | NR1D1; NR1D2 | Unknown | - | Circadian rhythm
Testis receptor | TR2; TR4 | NR2C1; NR2C2 | Unknown | - | Unknown
Tailless-like | TLX | NR2E1 | Unknown | - | Role in neuronal development
Photoreceptor-specific nuclear receptor | PNR | NR2E3 | Unknown | - | Role in photorece[...] differentiation
Chicken ovalbumin upstream promoter-transcription factor | COUP-TFI; COUP-TFII; COUP-TFIII (Ear2) | NR2F1; NR2F2; NR2F6 | Unknown | - | Role in neuronal development (CO[...] vascular develop[...] (COUP-TFII)
NGF-induced factor B | NGFIBα (also NUR77) | NR4A1 | Unknown | - | Role in thymocyte apoptosis
Nur related factor 1 | NGFIBβ (NURR1, NOT1) | NR4A2 | Unknown | - | Role in dopaminergic neuron development
Neuron-derived orphan receptor 1 | NGFIBγ (NOR1) | NR4A3 | Unknown | - | Unknown
Steroidogenic factor 1 | SF1 | NR5A1 | Phospholipids | - | Role in mammalian sexual development
Liver receptor homologous protein 1 | LRH1 | NR5A2 | Phospholipids | - | Role in lipid homeostasis, cell-cycle control
Germ cell nuclear factor | GCNF | NR6A1 | Unknown | - | Role in vertebrate embryogenesis

General category: NR-like, DBD-less repressors
DSS-AHC critical region on the chromosome, gene 1 | DAX1 | NR0B1 | Unknown | - | Role in sex determination and development
Short heterodimer partner | SHP | NR0B2 | Unknown | - | General repressor of NRs, obesity

a Each of the 48 human receptors is roughly categorized into several very
generalized groups. The order descends from the historically more studied,
classical receptors (top) to the more recently discovered family members (bottom).
b Nomenclature from Ref. [1].
c Biological role of the receptor if ligand is currently not identified.
[...] marks table text cut off at the page edge in the source.

15.3.2
Brief History of NRs in Medicine and Drug Discovery
The first generation of NR drugs was discovered prior to a detailed knowledge
of the target class. Many clinically useful compounds were initially found by
tracking down biological activity from natural extracts. Only later did these
bioactive molecules lead scientists to the actual drug target.
Studies dealing with bioactive fractions from natural extracts containing
steroid or thyroid hormones helped lay the foundation for modern NR-based
endocrinology. For example, study of adrenal gland extracts initiated GR
drug discovery and these tissue extracts were used clinically to correct the
manifestations of Addison’s disease (glucocorticoid deficiency) (for review
see Refs. [9, 10]). From this early clinical work, a well-defined relationship
that connected the adrenal extract with maintenance of homeostatic function
emerged. For example, it was noted that, in addition to bringing about
remission from stress-related diseases, the extracts also suppressed symptoms
in patients suffering from inflammatory conditions such as allergy, hay
fever, and asthma. At the same time, biochemical characterization of the
adrenal gland extracts identified cortisone as an active steroidal component.
In 1948, when sufficient quantities of cortisone could be purified, its effects
on inflammatory diseases were directly tested. Ultimately, total synthesis of
cortisone was accomplished by Woodward and colleagues and a group at
Merck [11, 12], thus completing the first-generation evolution of this drug
and setting the stage for later syntheses of potent synthetic steroids such as
prednisolone and dexamethasone.
A similar history was seen with the first generation of drugs that targeted
other steroid receptors. It was known as early as 1916 that ovariectomy could
reduce the incidence of mammary cancer in high-incidence strains of mice
[13]. Studies of the biological effects of extracts containing estrogenic activity
prompted screens for compounds with antiestrogenic effects, initially for
contraception in the 1960s, but later for estrogen-responsive breast cancers.
Screens for antiestrogenic nonsteroidal compounds led to the discovery of
ethamoxytriphetol, clomiphene, and then tamoxifen. Tamoxifen ultimately
became the gold standard for the endocrine treatment of breast cancer and
relatively recently became the first approved cancer chemopreventative agent.
Not surprisingly, the first set of NR genes cloned were from the steroid
receptor subgroup where prior research yielded compounds to aid in
purification of the receptor. The first human NR cloned was the glucocorticoid
receptor (GR), an accomplishment that relied heavily on reagents made
available from the purification and biochemical characterization of adrenal
extracts. With purified receptors, selective antibodies were used to help isolate
the corresponding cDNA [14-161. cDNAs representing the full-length coding
region of GR provided the first full-length amino acid sequence of an NR. The
estrogen receptor (ER) was also cloned around the same time by three groups
using independent strategies [17-191.
Comparison of emerging NR sequences (from human as well as from other
species) revealed conserved domains shared among virtually all NRs. The
finding that NRs could be isolated without knowledge of their ligand increased
the rate at which new NRs could be identified. Initially, oligonucleotides
representing conserved NR motifs (such as the highly conserved DBD) were
employed as molecular probes to perform low stringency DNA hybridizations
to cDNA libraries. The number of orphan NRs quickly surpassed the number
of classical NRs [20-221.
By the late 1990s, the chosen method for identification of new NRs shifted
from the laboratory to in silico methods. This advance was made possible by the
availability of large databases of randomly generated partial cDNA sequences,
known as expressed sequence tags (ESTs), and the development of bioinformatic
searches and query tools such as BLAST. Two new mammalian NRs were
successfully identified through automated searches of EST databases. The
pregnane X receptor (PXR) was identified in a public database of mouse
ESTs by a high-throughput in silico screen for NR-like sequences [23],
and the photoreceptor cell-specific receptor (PNR) was found in a human
EST database [24]. After the isolation of PNR from EST databases, the
number of human NRs totaled 48. The availability of the complete human
genome sequence in 2001 confirmed that this set of 48 is the complete human NR
complement [25, 26].
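The in silico screening idea described above can be illustrated with a minimal sketch. It is not the method used in Refs. [23, 24], which relied on database search tools such as BLAST; here a simple regular expression stands in for the similarity search. The cysteine spacing of the two DBD zinc fingers and the 15 to 17 residue linker are simplifying assumptions of this sketch, and the sequences and names (`NR_DBD_PATTERN`, `find_nr_like`, `toy_ests`) are hypothetical.

```python
import re

# Assumed, simplified cysteine spacing for the two zinc fingers of an NR DBD:
# C-X2-C-X13-C-X2-C, a short linker, then C-X5-C-X9-C-X2-C.
NR_DBD_PATTERN = re.compile(
    r"C.{2}C.{13}C.{2}C"   # first zinc finger
    r".{15,17}"            # linker between the fingers (assumed length range)
    r"C.{5}C.{9}C.{2}C"    # second zinc finger
)

def find_nr_like(translated_ests):
    """Return the IDs of translated sequences that contain the DBD-like pattern."""
    return [seq_id for seq_id, protein in translated_ests.items()
            if NR_DBD_PATTERN.search(protein)]

# Toy protein sequences (hypothetical, not real ESTs).
finger1 = "C" + "AA" + "C" + "G" * 13 + "C" + "AA" + "C"
finger2 = "C" + "S" * 5 + "C" + "T" * 9 + "C" + "AA" + "C"
toy_ests = {
    "est_hit": "MK" + finger1 + "L" * 15 + finger2 + "KR",
    "est_miss": "MKLVAGSTPLLKRCAAC" + "G" * 40,
}

print(find_nr_like(toy_ests))  # prints ['est_hit']
```

A real screen would translate each EST in all six reading frames and score partial matches, since ESTs are short and error-prone; the exact-match pattern here only conveys the principle of searching for a conserved motif without knowing the ligand.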
As new NRs were isolated, new connections between first-generation drugs
and their targets were made. For example, thiazolidinediones (TZDs) had
previously been discovered through traditional pharmacological methods
to show clinical benefit in diabetes; however, the molecular basis for
this therapeutic effect remained unclear. By using expression constructs
derived from the isolated NR genes, activity screens for each receptor were
developed. Using these screens, TZDs were found to be potent and selective
activators of peroxisome proliferator activated receptor gamma (PPARy)
[27]. Once this link was made, the search for a second generation of
PPARy compounds could be initiated using an in vitro assay for PPARy
activation.
This second generation approach of using the receptor rather than a
bioactive extract can be characterized as a “reverse endocrinology” approach.
Traditionally, ligands were identified on the basis of their biological effects.
But, when this process is reversed, the orphan receptors are used to identify
the ligands, which are subsequently used to dissect the biology of the
receptors. For example, a reverse endocrinology approach was used to link
farnesoid X receptor (FXR) to bile acid ligands. Availability of chemical tools
(bile acids as well as synthetic ligands) for FXR led to experiments that
linked FXR to bile acid homeostasis and suggested the possibility that FXR
ligands could be of benefit in treating disorders involving cholestatic liver
disease [28].
Amongst the NR superfamily, a third generation drug discovery effort
has recently begun. In this phase, screening methods that give information
beyond potency and selectivity (e.g., selective effects on gene expression) are
used to discover compounds with therapeutic advantages over present drugs.
Strategies that underlie this new drug discovery effort are the subject of a
following discussion on NR modulators.
15.3.3
Basic Principles for Ligand-NR Recognition
From a medicinal chemistry perspective, targeting NRs via novel small
molecule ligands is a fairly tractable exercise. As mentioned above, most
NRs have a small, enclosed ligand-binding pocket, and a wide variety of
druglike, high-affinity molecules that bind in this pocket can be identified.
The inherent difficulty of rational drug design for NRs derives from the vast
complexity of NR-associated biology. While small molecules that bind the
target NR with high affinity can be fairly readily identified, the corresponding
functional activity is not always obvious or immediately interpretable given
the current level of biological understanding (discussed in more detail below).
In this section, we will discuss the general principles of ligand binding
for NRs.
15.3.3.1 Steroid Receptors: GR, MR, PR, AR, and ER

The ligand-binding pockets of the steroid receptors, which includes GR, the
mineralocorticoid receptor (MR),progesterone receptor (PR),and the androgen
receptor (AR),as well as the more divergent ER, have many common features
required for binding the natural hormone. At least one crystal structure exists
for each ofthese LBDs [6,29-311 (see Figs. 15.3-3(a)and 15.3-5 for an example
of PR and GR, respectively). Typically, about 75% (roughly 17 of 22 residues)
of the ligand-binding pocket’s inner lining consists of hydrophobic residues.
Generally, all the polar residues within the binding pocket (roughly three to five
residues) make a hydrogen bond to the natural ligand. In each case, the A ring
of the steroid hormone is positioned between helices 3 and 5. The oxosteroid
receptors GR, MR, PR, and AR lock the A-ring 3-position carbonyl of the
steroid into place with a hydrogen bond “charge clamp” using a conserved
glutamine and arginine on helices 3 and 5, respectively. With ER, coordination
of the 3-position hydroxyl is made via a glutamate and an arginine at the
respective locations. In all cases, the D ring of the steroid points toward helix
10 and the AF2 helix.

Fig. 15.3-5 Structure of the GR LBD and features of ligand binding. (a) Crystal
structure of the GR LBD bound with dexamethasone [29]. The protein is shown as
a ribbon diagram and the AF2 helix, which is in the active orientation, is
colored red. The GR agonist, dexamethasone, is shown in space-filling mode and
carbons are colored blue, oxygens are red, and hydrogens are white. (b) Close-up
of the GR ligand-binding site. The pocket is shown as a cut away and the back
face represents the hydrophobic nature of the pocket (carbons are colored
green). Dexamethasone is shown oriented with the A ring 3-position ketone toward
the back of the pocket and the D ring is positioned toward the AF2 helix.
Hydrogen bonds with key amino acids within the pocket are shown as dotted yellow
lines. (c) Representative structures of well-known GR ligands: dexamethasone,
fluticasone propionate, and RU486.

The volume of the pocket varies slightly amongst the receptors when in
complex with the respective natural ligand: approximately 420, 450, 560, 580,
and 590 Å³, for AR, ERα, PR, MR, and GR, respectively. However, depending
on the size and shape of the bound ligand, the volume of the pocket can
change significantly. This dynamic flexibility allows this class of receptors to
accept a wide variety of synthetic ligands with numerous shapes and volumes.
Interestingly, no crystal structures of unliganded steroid receptors have been
reported, so the precise nature of the pocket in the absence of ligand is
unknown.
Crystal structures of steroid receptors in complex with synthetic ligands
have revealed alternative binding modes as compared to the natural steroid
hormone. To date, the ERα and ERβ subtypes [32] have provided the greatest variety
of crystal structures with bound synthetic ligands [33]. There are currently
several examples of ER in complex with synthetic ligands: diethylstilbestrol
(DES), 4-hydroxytamoxifen (OHT) [34], genistein [35], raloxifene [36],
(R,R)-5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (THC) [37], and
the pure antiestrogen ICI 164384 [38]. Each of these complexes, either with
ERα or ERβ, reveals that the hydrogen bond clamp with a hydroxyl off the
A-ring analog is conserved. The presence of this interaction in each of the
structures emphasizes the importance of this hydrogen bond for high-affinity
binding. The other commonality between these ligands is that they fill the core
of the ligand-binding pocket with hydrophobic atoms, each roughly occupying
the same volume. One of the key features of the OHT, raloxifene, and ICI
164384 structures is that each contains an extended amine or hydrophobic
group directed toward the AF2 helix, which causes steric repositioning of this
structural element (see Fig. 15.3-8(b) and the discussion in section 15.3.4).
15.3.3.2 RXR-heterodimer Receptors: PPARs, RXR, LXR, FXR

Unlike the steroid receptors, most of which function as homodimers, a second
class of NRs functions as heterodimers with the retinoid X receptor (RXR).
Importantly, these receptors serve as sensors for metabolites such as fatty
acids, oxysterols, and bile acids. Key elements of ligand recognition and
receptor activation have been elucidated following structure-function analyses
of several receptors in this family including the PPARs, liver X receptors
(LXRs), and FXR.
The X-ray crystal structures of the PPARs, LXRs, and FXR have been
determined in various unliganded and liganded states. The volumes of the
ligand-binding pockets are larger than the steroid receptors and range from
700 to 850 Å³ for FXR/LXRs and up to 1300 Å³ for the PPARs. As with the steroid
receptors, the size and shape of the ligand-binding pockets can vary depending
on the size and shape of the ligand. This plasticity permits the binding of
diverse, structurally distinct chemotypes.

The majority of amino acids that line the ligand-binding pockets in these
receptors are hydrophobic; however, several key polar amino acids are present,
which have been shown to be critical for ligand recognition and receptor
activation. For the PPARs, an acidic group present in fatty acids is involved
in a complex hydrogen-bond network consisting of a tyrosine on AF2 and two
histidine residues on helices 5 and 10, most of which are conserved between the
three PPAR subtypes (Fig. 15.34). Importantly, the direct hydrogen-bonding
interaction of the acidic moiety with tyrosine on AF2 stabilizes AF2 in an
active conformation and initiates transcriptional activation. The requirement
for this interaction for transcriptional activation is evidenced by the fact that
PPAR ligands (such as GW0072, Fig. 15.3-9) that lack this hydrogen-bonding
interaction show partial agonist or antagonist activity [39].
In contrast to the PPARs, the interaction between oxysterols and bile
acids with LXR and FXR, respectively, does not occur through a direct
interaction with an amino acid on AF2 [41, 42]. A critical hydrogen-bond
interaction is observed between a histidine on helix 10/11 and either an
acceptor oxygen on the natural ligand (epoxycholesterol) or a donor oxygen
on a synthetic ligand (T0901317). This interaction positions the histidine
perpendicularly to a tryptophan residue that is located on the AF2 helix
(Fig. 15.3-7), which, in turn, promotes an electrostatic interaction between
these two amino acids. In addition to contributing to ligand binding, this
network of interactions connecting ligand to the AF2 helix helps stabilize
the receptor in an active conformation (Fig. 15.3-7). It should be noted that
hydrophobic interactions between ligand and receptor can also initiate the
histidine/tryptophan electrostatic switch [43]. The cumulative data suggest
that this histidine/tryptophan interaction is the molecular basis for ligand-
dependent activation of the LXRs and FXR. Clearly, a select number of polar
amino acids within the binding pockets of PPARs, LXRs, and FXR play
important roles in mediating ligand recognition and receptor activation.
15.3.3.3 “Orphan” Receptors: HNF4, CAR, NGFIB

While the steroid and RXR-heterodimer receptors show low transcriptional
activity in the basal state, several NRs have been identified that are
transcriptionally active in the basal state and are thus referred to as constitutively
active receptors. Structural analyses of two NRs in this class, the hepatocyte
nuclear factors 4 (HNF4s) [44] and nerve growth factor-induced B (NGFIB) [45],
provide insight into two unique mechanisms that give rise to the constitutive
activity. The X-ray crystal structure of HNF4γ has revealed the presence of
host-derived fatty acids in the ligand-binding pocket. A similar observation was
made in HNF4α [46]. The fact that these fatty acids were not displaceable led to
the proposal that these natural ligands serve as structural cofactors for HNF4.
In contrast to HNF4γ, the X-ray crystal structures of NURR1 and DHR38,
the mouse and Drosophila orthologs of NGFIB, respectively, showed the
absence of a ligand-binding pocket [45, 47]. Instead, several bulky hydrophobic
residues fill the space that is normally occupied by the ligand, suggesting
that the receptor may not be regulated via the classical ligand-based approach.
Clearly, determination of the X-ray crystal structures for the remaining orphan
NRs will provide insight into the tractability of these targets for drug discovery.

Fig. 15.3-6 Structure of the PPARγ LBD and features of ligand binding. (a) Shown
in blue is a ribbon diagram of the crystal structure of PPARγ LBD bound with
rosiglitazone [40]. The AF2 helix, which is colored red, is in the active
position for binding an LXXLL coactivator peptide (not shown). The rosiglitazone
molecule is buried in the receptor and is represented in space-filling mode with
carbons colored green, oxygens red, and nitrogens blue. (b) Close-up of the
binding site with the PPARγ LBD. The front face of the site is clipped away to
show the bound rosiglitazone molecule and the hydrophobic backside of the
binding pocket. As shown, a tyrosine residue from the AF2 helix of PPARγ makes a
hydrogen bond with the thiazolidinedione head group of rosiglitazone.
(c) Representative structure of a well-known PPARγ ligand.

Fig. 15.3-7 Structure of the LXRβ LBD and features of ligand binding. (a) A
ribbon diagram representing the crystal structure of LXRβ in complex with the
synthetic agonist ligand, T0901317, is shown in orange [42]. The AF2 helix,
which assumes the agonist conformation, is colored red. The ligand is shown in
space-filling mode and carbons are colored green, oxygens red, nitrogens blue,
and fluorines magenta. Similar to the orientation of steroids with the steroid
receptors, the D ring, or D-ring mimetics in the case of nonsteroidal synthetic
molecules, protrudes toward the AF2 helix. (b) Close-up of the ligand-binding
pocket for LXRβ. The front half of the receptor is cut away to show the ligand
bound back face of the pocket. The histidine/tryptophan switch, which is key for
ligand-induced activation of LXR, is highlighted. The His-mediated hydrogen bond
is indicated with a yellow line. (c) Representative structures of well-known LXR
ligands.

15.3.4
Influence of Ligand on NR LBD Conformation
There have been numerous key studies demonstrating that ligand binding
does not simply trigger NRs from an off-state to an on-state. In fact, these
studies revealed, at a molecular level, that activation of an NR by a small
molecule ligand is dramatically more complex than a two-state process. The
concept that ligand alters NR conformation to produce activity profiles pertains
mostly to the steroid receptors, PPARs, TR, RXR, RAR (retinoic acid receptor),
LXR, and FXR. Considerable doubt exists whether this concept applies to select
“constitutively active” receptors such as HNF4 and NGFIB.
One of the first studies to reveal the conformational effect of ligand utilized
a protease digestion assay to show that ER ligands could differentially affect
the pattern of protease-generated peptides [48]. As suspected from earlier
work, this study demonstrated that different ligand classes could affect NR
conformation and thus alter the AF2 activity of the receptor.
Predominantly structural studies using X-ray crystallography have shed
light on how ligands can alter NR conformation. In the late 1990s, two
groundbreaking reports on ER showed that ligand can particularly affect the
orientation of the most C-terminal α-helix of the LBD, referred to as the AF2
helix [34, 36]. In these studies, the AF2 helix of ER, bound with an agonist
ligand such as estradiol or the synthetic DES, was shown to adopt a position
similar to that seen in the original RAR and PPARy agonist-bound structures
[S, 40] (Fig. 15.3-8(a)). In this active conformation, the AF2 helix spans across
H3 and H10. This arrangement creates a shallow, hydrophobic groove adjacent
to the AF2 helix. This pocket accommodates a short helical peptide presented
at the surface of a coactivator protein (reviewed in a section below). Peptides
that bind this region of the activated NR typically contain an LXXLL motif
(where L and X represent leucine and any amino acid, respectively). This
short peptide motif is typically α-helical and the leucine residues are presented
on one face of the amphipathic helix. Additional electrostatic interactions
between amino acid side chains of the receptor and the peptide backbone are
believed to aid the orientation and stability of the interaction.

Fig. 15.3-8 Examples showing the many possible conformations of the AF2 helix.
(a) ERα with the agonist diethylstilbestrol [34]; (b) ERα with the antiestrogen
4-hydroxytamoxifen [34]; (c) PPARα with the antagonist GW6471 [49]. Each
receptor, oriented in the standard position with H1/H3 in front and slightly off
to the right, is shown in space-filling mode. The AF2 helix for each receptor is
shown as a green ribbon, or as a green random coil for PPARα. On DES:ERα, the
AF2 helix lies across the receptor to help form a binding site for an LXXLL
coactivator peptide, which is colored yellow. The ligand tamoxifen sterically
interferes with the loop preceding the AF2 helix and causes the AF2 helix to
reorient, bind within the coactivator cleft, and block LXXLL peptide binding.
For the PPARα:GW6471 complex, the AF2 helix is perturbed in a way to allow
accommodation of a corepressor peptide (shown in magenta). In this case the AF2
helix is somewhat unwound and localizes on the receptor in a different position
relative to that seen for other NR LBD structures.

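As a small illustration of the motif definition above, the following sketch scans a one-letter protein sequence for LXXLL patterns using Python's standard `re` module. The function name `find_lxxll` and the example peptide are hypothetical and chosen only for illustration. A sequence match alone says nothing about whether the segment is helical or actually binds an NR.

```python
import re

# L, any two residues, then LL. The lookahead allows overlapping
# motifs to be reported rather than only the first of each cluster.
LXXLL = re.compile(r"(?=(L[A-Z]{2}LL))")


def find_lxxll(sequence):
    """Return (1-based start, motif) for every LXXLL match in a protein sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in LXXLL.finditer(seq)]


if __name__ == "__main__":
    # Made-up peptide, not a real coactivator sequence.
    example = "AAHKILHRLLQEGS"
    print(find_lxxll(example))  # [(6, 'LHRLL')]
```

A pattern search of this kind is only a first filter for candidate NR boxes; structural or binding data would be needed to confirm any hit.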
The structures of ER bound with either tamoxifen or raloxifene, where both
are antagonists for AF2 function, strikingly revealed that the AF2 helix could
be repositioned from the agonist conformation (Fig. 15.3-8(b)). In each of
these structures, an amine-containing head group from the ligand protrudes
toward the surface of ER to destabilize the active position of the AF2 helix.
This shift causes the AF2 helix to rotate approximately 90° from the active
position. In the antagonist position, the AF2 helix occupies the coactivator
peptide-binding site on the surface of the receptor. These studies highlight
the ligand-induced flexibility and plasticity of the NR LBD, particularly with
respect to the AF2 helix.
More recent structural studies using the GR LBD further demonstrate how
ligand can influence the conformation of the LBD [29, 50]. The structure of
GR bound with the agonist dexamethasone shows that the AF2 helix exists in
an active position to allow coactivator peptide association. Two structures of
GR bound with the antagonist ligand RU486 have shown that a protruding
dimethylaniline group effectively prevents the AF2 helix from occupying the
active position. In one of these structures, the AF2 helix intramolecularly
blocks the coactivator site. In the other structure, the AF2 helix extends away
from the core of the LBD and associates with an adjacent LBD subunit in
the crystal. Again, these studies suggest that the AF2 helix and the loop that
precedes it are prone to ligand-induced conformational flexibility.
Two studies dealing with PPAR also demonstrate the ligand-induced
conformational aspects of the LBD. In a structure of PPARa, in complex
with both an antagonist ligand GW6471 (Fig. 15.3-9) and a peptide motif
from a corepressor (reviewed below), the AF2 helix assumes an alternative
location (Fig. 15.3-8(c)) [49]. Here, the AF2 helix occupies neither the agonist
nor antagonist position (i.e., the coactivator groove as seen with ER), but lies
adjacent to the corepressor peptide. Another study using nuclear magnetic
resonance on PPARγ shows that the apo LBD is a highly flexible module in
which over half of the chemical shifts of the backbone atoms are missing [51].
When bound with rosiglitazone, these shifts, particularly in the ligand-binding
pocket and the AF2-helix regions, can be assigned. In general, these studies
suggest that physicochemical properties of the NR ligand can dramatically
influence conformational dynamics of the LBD, which in turn ultimately
governs the downstream signaling aspects of the liganded receptor.

Fig. 15.3-9 Examples of NR tool compounds and drugs, many of which are referred
to and discussed in the text. For some ligands, the region of the molecule that
is oriented toward the AF2 helix (as determined from the crystal structure of
the NR-ligand complex) is shaded.

15.3.5
The Multitude of Ligand-induced NR Actions
By virtue of their ability to interact with a repertoire of molecules within the
cell, ranging from DNA response elements to protein accessory factors, the
NRs represent a target class of complex, multitasking proteins (see Refs. [52,
53] for reviews). Most of the NRs were initially considered to be simple
ligand-induced transcription factors. However, studies over the past decade
have revealed that NRs are much more complicated and serve more than a
unified functional purpose. In this section we will highlight some of the types
of activities of NRs using particular examples.
15.3.5.1 Gene Regulation and the Role of Activity-Enhancing Accessory Proteins
At various stages in the activity cycle, NRs act in concert with a variety of
binding partners. For example, prior to ligand binding, GR resides in the
cytoplasm of the cell in complex with chaperone proteins such as hsp90 or
p23 [54]. Ligand association causes dissociation of chaperones and allows
GR to traverse the nuclear envelope. Using amino acids within the DBD,
the GR binds to a recognition site on a specific promoter, a site referred
to as a glucocorticoid response element (GRE). NR response elements have a
general half site consensus of RGGTCA (where R is a purine); these DNA
half sites are commonly arranged as repeats, either direct or inverted. The
precise mechanism by which NRs associate with DNA response elements
varies amongst the superfamily. In general, the steroid receptors bind to their
response elements as homodimers, although GR can form heterodimers with
MR, and ERα and ERβ also can bind DNA as heterodimers. Several NRs, such
as TR, PPARs, LXR, VDR, RAR, and FXR, require heterodimerization with
RXR. Further, many of the orphan receptors, such as LRH1, SF1, and NGFIB,
can bind DNA as monomers.
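The half-site consensus described above lends itself to a simple sequence scan. The sketch below, written in Python with only the standard library, reports RGGTCA half sites on both strands of a DNA string, so that hits on the same strand suggest a direct repeat and hits on opposite strands suggest an inverted repeat. The function names and the test sequences are hypothetical illustrations, not real promoter sequences, and the scan allows no mismatches.

```python
import re

# R (purine) is A or G. The lookahead reports overlapping matches.
HALF_SITE = re.compile(r"(?=([AG]GGTCA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_half_sites(dna):
    """Return sorted (1-based forward start, strand, site) for each RGGTCA hit."""
    seq = dna.upper()
    n = len(seq)
    hits = [(m.start() + 1, "+", m.group(1)) for m in HALF_SITE.finditer(seq)]
    for m in HALF_SITE.finditer(reverse_complement(seq)):
        # Map the reverse-strand match back to forward-strand coordinates.
        start = n - (m.start() + 6) + 1
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up direct repeat with a 4-nucleotide spacer.
    print(find_half_sites("AGGTCACAGGAGGTCA"))
    # [(1, '+', 'AGGTCA'), (11, '+', 'AGGTCA')]
    # Made-up inverted repeat with a 3-nucleotide spacer.
    print(find_half_sites("AGGTCACAGTGACCT"))
    # [(1, '+', 'AGGTCA'), (10, '-', 'AGGTCA')]
```

Exact matching keeps the example short; real response elements often deviate from the consensus, so a practical search would need to tolerate mismatches.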
The DNA-bound, ligand-activated NR serves as the docking site for a rather
large extended family of proteins called coactivators. Binding of a coactivator
protein is believed to be one of the key events in initiating transcriptome
assembly and consequent gene transcription. The first coactivator, called
steroid receptor coactivator 1 (SRC1), was identified in 1995 [55] and since
then over 200 such cofactors have been discovered. The variety of functions
for coactivators, as well as their nomenclature, is a vastly complex field
and a full description of the multitude of their functions is beyond the scope
of this review (for more detail see Refs. [56-58]). Focusing on one representative,
SRC1 is a member of the p160 family of coactivators, which also includes
SRC2 (also called transcription intermediary factor 2 (TIF2)) and SRC3 (also
called ACTR/pCIP/receptor-associated coactivator 3 (RAC3)/TRAM1/amplified
in breast cancer 1 (AIB1)).
SRC1 illustrates many features common among the coactivators. First, it
contains several LXXLL motifs, otherwise known as NR boxes [59, 60]. As
mentioned above, these short, α-helical motifs present a hydrophobic surface
that is critical for successfully docking the coactivator protein onto an activated
NR. Second, an activation domain within SRC1 contains an acetyltransferase
activity which acts locally on histones to unravel DNA at the initiation site
[61]. Third, SRC1 is able to aid recruitment of other nuclear enzymes,
such as other histone acetylating proteins including cAMP-response element
binding protein (CREB)-binding protein (CBP) and p300, and an arginine
methyltransferase called coactivator-associated arginine methyltransferase 1
(CARM1). Ultimately, to initiate gene
transcription, the NR-coactivator complex recruits the chromatin remodeling
complex SWI/SNF and the basal transcription factor-recruiting complex,
TR-associated protein/vitamin D receptor-interacting protein (TRAP/DRIP), and
other basal transcription factors.
15.3.5.2 Corepressors and the Role of Activity-Diminishing Accessory Proteins

Essentially the functional counterpart of coactivators, corepressor proteins
bind to many NRs in the absence of ligand and serve to repress basal
transcriptional activity [62]. Corepressors play a particularly important role for
NRs that are found almost exclusively in the nucleus, unlike the apo steroid
receptors that are cytoplasmically localized. Studies involving the nuclear-
localized receptors TR and RAR led to the identification of silencing mediator
of retinoid and thyroid (SMRT) receptors and nuclear receptor corepressor
(NCoR) [63, 64]. Both SMRT and NCoR recruit histone deacetylases (HDACs),
namely HDAC3, which function to reverse the chromatin unwinding produced
by the coactivator-recruited histone acetylases [65].
Similar to how the coactivators use the LXXLL motif as a docking point,
the corepressors contain an LXXIIXXXL peptide referred to as the corepressor
nuclear receptor (CoRNR) box [66]. The precise nature of the interaction
between corepressors and NRs remained elusive before the solution of the
crystal structure of PPARα in complex with a peptide from SMRT. As mentioned
briefly in a previous section, this structure shows that the CoRNR box occupies
the same general site on PPARα as the coactivator LXXLL motif. However,
the CoRNR box is approximately one α-helical turn longer, and the AF2
helix on PPARα is pushed out of position and does not play a role in
molecular recognition (Fig. 15.3-8(c)). There are several reports showing that
NRs occupied by nonagonist ligands, such as ER with raloxifene and GR with
RU486, increase corepressor binding. These results suggest that these types of
ligands not only disfavor coactivator binding but also create a surface on the
NR favorable for corepressor binding.
15.3.5.3 Interference in NF-κB and AP-1 Pathways

In addition to interaction with coactivator and corepressor proteins, NRs
have been shown to associate with a variety of other proteins key to cellular
maintenance and function. It has been well documented that several NRs,
predominantly the steroid receptors and also PPARs, RXR, and RAR, have the
ability to crosstalk with signaling pathways involving the transcription factors
NF-κB and AP-1 [67, 68]. Activated NRs typically repress the ability of NF-κB
and/or AP-1 to transcribe their targeted genes. This interference is believed to
be the basis for the anti-inflammatory actions of corticosteroids and estrogens
[69, 70]. Several mechanisms have been proposed for these activities, but
a conclusive molecular basis remains elusive. One proposal
suggests a direct interaction between the NR and NF-κB [71, 72]. Since both
NRs and NF-κB require the aid of coactivator proteins, such as SRC1 and CBP,
another proposed mechanism involves a “cofactor squelching” event. A third
proposal, specific to GR, involves a direct association between the NR and protein
kinase A, whereby cross-coupling of NF-κB and GR occurs in the cytoplasm
[73]. Clearly, these studies show that NRs play a complex and integrated role
in pathway management beyond the direct DNA-mediated regulation of gene
transcription.
15.3.5.4 Nonnuclear Functions and Interactions with Other Cellular Proteins

Another level of complexity in NR functions, apart from the vast network
of coactivator, corepressor, and NF-κB/AP-1 interactions, involves interaction
with a wide variety of cellular proteins. In general, these activities are commonly
referred to as nongenomic actions [74]. Full coverage of this arena is beyond
the scope of this review, but a few selected examples are highlighted to
demonstrate the breadth and complexity of the effects that liganded NRs have
on adjacent pathways. For example, PR and other steroid receptors have been shown to
interact with numerous cytoplasmic kinases, such as c-Src tyrosine kinases,
in a ligand-dependent manner [75, 76]. GR has been shown to interact with a
variety of cellular factors such as SMAD3 [77] and JNK [78]. ER has been shown
to interact with a variety of factors, such as phosphatidylinositol-3-OH kinase
(PI3K) [79]. Additionally, NRs are phosphorylation targets, primarily within
the AF1 domain, and it has been shown that NR activities can be modulated
by phosphorylation state [80-83].
15.3.6
Specific Examples of Recent NR Drugs and Novel Drug Candidates
As mentioned earlier in the text, the NRs have a rather illustrious history in
pharmaceutical discovery (Table 15.3-1). Once a synthetic ligand has been
identified for a receptor, typically via screening and/or structure-guided
design efforts, the goal is to chemically alter the properties of the ligand
to appropriately modulate the activities of the receptor. Throughout the last
decade or so, ligands that display differential activities relative to the natural
ligand have been commonly referred to as selective nuclear receptor modulators
(SNuRMs). One of the original demonstrations of this concept involved ER
and the two classic selective estrogen receptor modulators (SERMs), OHT and
raloxifene. Essentially, it was found that these SERMs retained tissue-selective
agonist activity (such as in bone tissue and on lipid profile for raloxifene),
but functioned as antagonists in reproductive tissues [84, 851. Furthermore,
even though both molecules were originally considered "antiestrogens", OHT
generally shows a trend toward estradiol-like activity in uterine tissue [85],
whereas raloxifene does not. The groundbreaking work around novel ER
ligands has opened the gates to find novel, tissue-selective synthetic modulators
for several of the therapeutically relevant NRs.
In this section we will highlight a few of the more recent pursuits of
SNuRMs (Fig. 15.3-9). The purpose of this brief discussion is to give an
overview of the current state of the art for ligand and drug discovery by
mentioning a few somewhat recent specific examples. Overall, the present
mission in NR drug discovery is to manipulate the receptor with ligand
to retain tissue-selective benefits while minimizing the unwanted activities
(Table 15.3-2). These few selected examples cover the basic principles of NR
drug discovery - such as identifying small molecule binders and modifying
hits for NR modulation - and the use of recent techniques and methodologies.
15.3.6.1 Selective ER Modulators (SERMs)

First reported in the 1970s, tamoxifen was the first synthetic NR small
molecule to show differential tissue effects. The primary reason it has not
been widely used to treat menopausal symptoms is the fact that this molecule
shows stimulatory effects on the uterus, which cause a significant risk for
endometrial cancer [86]. However, tamoxifen remains a first-line treatment
for ER-positive breast cancer. A second generation SERM, raloxifene, was
originally developed as a tamoxifen follow-up for breast cancer, but it was
demonstrated that this molecule has significant osteoporosis protective effects
without the endometrial activities relative to tamoxifen [87]. The molecular
basis for these ER-modulating activities has been the focal point for a wide
body of pharmacological research [88]. One proposed mechanism is the
differential effects of SERM-bound ER to promote corepressor association
versus coactivator association [89, 90].
Table 15.3-2 Examples of therapeutic profiles for designer,
tissue-selective nuclear receptor modulator ligands

Receptor                      Desired efficacy with therapeutic          Unwanted activity to be reduced
                              modulator compound                         with desired modulator

Estrogen receptor (ER)        • Reduce menopausal hot flashes            • Breast and uterine tissue stimulation
                              • Prevent postmenopausal osteoporosis
Glucocorticoid receptor       • Reduce inflammatory conditions           • Fat redistribution and weight gain
                              • Suppress immune system for transplant    • Increased bone loss
                                                                         • Diabetes
                                                                         • Depression/mood effects
Mineralocorticoid receptor    • Reduce hypertension                      • Hyperkalemia
                              • Protection against congestive
                                heart failure
Progesterone receptor         • Reduce endometriosis                     • Abortive activities
Androgen receptor             • Protection against skeletal muscle       • Prostate stimulation
                                atrophy
PPARα                         • Improve dyslipidemia                     • Peroxisome proliferation
PPARδ                         • Improve dyslipidemia                     • Unknown
PPARγ                         • Glucose lowering                         • Edema and weight gain
Liver X receptor (LXRα        • Reduce atherosclerosis                   • Hypertriglyceridemia
or LXRβ)                      • Anti-inflammatory
                              • Antidiabetic
Farnesoid X receptor          • Protection against cholestasis           • Unknown

Working from the theory that ligands can induce specific ER conformations, a
series of triphenylethylene ligands for ER were made and screened through
a uterine Ishikawa cellular assay [91]. Compounds showing the ability to
reduce estrogen-induced stimulation of Ishikawa cells were then tested in
ovariectomized rats for the ability to protect against loss of bone mineral
density. The molecule GW5638 was identified using this approach (Fig. 15.3-9);
it was further shown that the compound had antagonist properties on the
uterus and agonist activities on the bone and the cardiovascular system [92].
A further study has shown that the unique biological properties of GW5638
are derived from the unique structural conformation of ER when bound to
GW5638 relative to other SERMs [93]. In addition to this one example, a
number of novel SERMs have been identified using a combination of cellular
screens, primarily uterine cell- and breast cell-based assays [94, 95]. These
SERMs include idoxifene, lasofoxifene, Wyeth 424, levomeloxifene, and others
(Fig. 15.3-9).
Two new approaches to ER ligand discovery have recently been reported.
One involves the use of NF-κB-driven reporter assays to discover
pathway-selective ligands with the potential to treat inflammatory disorders [96].
Another relatively recent focus for ER-directed drug discovery relates to the
fact that there are two subtypes of this receptor, ERα and ERβ, which derive
from two separate genes [32, 97]. Stimulated by the distinct tissue distribution
pattern of these two related receptors, the concept is that new indications, such
as inflammation and cancer, can be treated with an ER subtype-selective molecule.
Toward this goal, several reports have demonstrated it to be possible to identify
ERβ-selective ligands [37, 98-100].
15.3.6.2 Selective GR Modulators (SGRMs)

A variety of debilitating diseases, such as rheumatoid arthritis, inflammatory
myopathies, cancers, and a variety of immunological diseases are treated
with the classic synthetic glucocorticoids dexamethasone and prednisone.
However, long-term treatment with these drugs often leads to serious side
effects such as fat redistribution, diabetes, vascular necrosis, and osteoporosis.
There is currently an intense effort to identify new small molecules that are able
to differentially modulate GR to retain the beneficial effects of glucocorticoids
and reduce the incidence of unwanted side effects [10].
A key genetic study, utilizing a knock-in mutation of a dimerization-deficient
mutant of GR, has shed light on the molecular basis for dissociative activity
[101]. In essence, this GRdim mutant demonstrated that some of the direct
gene transduction properties of GR can be reduced while other
immune-modulating functions of the receptor can be retained. This concept forms one
of the principles of selective modulation of GR. Importantly, many of the
anti-inflammatory effects of GR are believed to be driven by the ability of the
monomeric form of the receptor to interfere with NF-κB and AP-1 function,
which ultimately results in reduction of proinflammatory cytokines such as
interleukins (IL)-1, -2, -6, -8, and tumor necrosis factor α (TNFα) [69].
There have been several recent reports of ligands that display differential GR
activation. Although a complete survey is beyond the scope of this review, we
will select a few examples to demonstrate the concept and the methods used
to discover the ligands. Typically, three measures of GR activity were used to
identify these ligands: (a) direct GR binding relative to other steroid receptors,
(b) a cell-based assay measuring GRE-mediated gene transcription (referred
to as transactivation), and (c) cell-based assays measuring the ability of GR to
regulate NF-κB- and AP-1-driven genes (referred to as transrepression).
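One simple way to summarize readouts (b) and (c) for a panel of ligands is a potency ratio between the two cell-based assays. The following is only a minimal sketch of that idea: the `GRProfile` class, the `dissociation_ratio` function, and every compound name and potency value are hypothetical and are not taken from the studies cited in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRProfile:
    """Hypothetical summary of one ligand across the two cell-based readouts."""
    name: str
    transactivation_ec50_nm: float  # GRE-mediated transcription assay
    transrepression_ic50_nm: float  # NF-kB/AP-1-driven gene assay


def dissociation_ratio(profile: GRProfile) -> float:
    """Return EC50(transactivation) / IC50(transrepression).

    A ratio above 1 means the ligand is more potent in transrepression
    than in transactivation, i.e. it leans toward a dissociated profile.
    """
    if profile.transactivation_ec50_nm <= 0 or profile.transrepression_ic50_nm <= 0:
        raise ValueError("potencies must be positive")
    return profile.transactivation_ec50_nm / profile.transrepression_ic50_nm


def rank_by_dissociation(profiles):
    """Sort ligands so that the most dissociated one comes first."""
    return sorted(profiles, key=dissociation_ratio, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    panel = [
        GRProfile("reference_steroid", 5.0, 5.0),
        GRProfile("candidate_A", 200.0, 10.0),
        GRProfile("candidate_B", 30.0, 15.0),
    ]
    for p in rank_by_dissociation(panel):
        print(f"{p.name}: ratio = {dissociation_ratio(p):.1f}")
```

A potency ratio of this kind ignores differences in maximal efficacy and in receptor binding selectivity (measure (a)), so it would be at most a first-pass filter.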
Several steroid-based compounds have been shown to differentially reduce
transactivation with only minimal effects on transrepression (see Figure
15.3-9) [102, 103]. In the nonsteroidal class of GR ligands, a quinoline-based
series of compounds, particularly ones with an aryl substituent at the C5
position, yielded a trend toward a preferred transactivation/transrepression
profile in cellular assays. Some of these ligands also showed a more
promising therapeutic window for selective in vivo effects [104, 105]. In
another study, a nonsteroidal GR ligand, ZK 216348, has been reported
to show significant dissociation of transactivation and transrepression
activities [106]. Following a GR-binding assay to identify high-affinity binding
compounds, hits were characterized using (a) an assay measuring a GRE-driven
reporter (induction of tyrosine aminotransferase), (b) an assay monitoring
reduction of lipopolysaccharide (LPS)-induced IL-8 production from THP-1
monocyte/macrophage cells, and (c) an assay measuring inhibition of TNFα
and IL-12 p70 from LPS-induced peripheral blood mononuclear cells. This
linear approach highlighted ZK 216348 as a dissociative molecule. Further in
vivo work, using an ear inflammatory model for efficacy and models for skin
atrophy, weight gain, adrenal weight, and blood glucose levels for unwanted
side effects, showed an improved therapeutic profile relative to prednisone.
15.3.6.3 Other Modulator Efforts: PR, MR, AR, PPAR, FXR, LXR

The concept of selective NR modulation to produce an activity and therapeutic
profile distinct from the natural ligand has been applied to numerous other
receptors (Fig. 15.3-9). For example, a modified steroid ligand for PR, called
asoprisnil, has been shown to produce antiuterotrophic effects with only
minimal antiabortive and breakthrough bleeding effects [107]. A selective
MR modulator called eplerenone, a molecule that was discovered decades
ago, has recently been approved as a drug for hypertension [108]. This
synthetic steroid has improved specificity for MR over related receptors,
and functions as a partial antagonist of aldosterone [109]. Currently, there is
an effort to identify a modulator of AR for utility in prostate cancer as well
as possibly treating the neurological and muscular degenerative symptoms of
androgen deficiency [110-112]. One recent example of a tissue-selective AR
modulator is LGD2226, which appears to retain some anabolic effects on the
bone and muscle with reduced proliferative effects on the prostate [113].
Several groups have shown progress in developing selective peroxisome
proliferator-activated receptor gamma modulators (SPPARMs). The
first-generation TZD class of PPARγ agonists, used pharmacologically as insulin
sensitizers, also exhibits dose-limiting liabilities such as hemodilution and
edema (see Table 15.3-2). Initial studies of PPARγ activation by TZDs revealed
that these compounds activated the receptor via a direct interaction with
the C-terminal AF2 helix [40]. Structural studies have also revealed PPARγ
activators, such as the partial agonist GW0072, that bind the LBD using
non-TZD epitopes [39]. Compounds that have distinct binding and/or activation
modes represent a potential avenue to discover PPARγ modulators with
modified biological activities. Non-TZD selective PPARγ modulator compounds
(e.g., nTZDpa [45]) have been found which induce an altered LBD
conformation compared to TZDs as measured by protease protection and
NMR spectroscopy [114]. Like GW0072, these compounds function as partial
agonists and could antagonize the activity of PPARγ full agonists in 3T3-L1
adipogenesis assays. Moreover, the nTZDpa compounds demonstrated
qualitative differences versus traditional agonists on gene expression in cell
culture (3T3-L1 adipocytes) and in vivo (white adipose tissue) and also on in
vivo physiological responses such as adipose depot size. Thus, further efforts
to develop SPPARMs may lead to compounds with improved characteristics
relative to existing clinical compounds.
Modulator efforts have also begun for NRs that to date have only been
investigated preclinically. In studies of both FXR and the LXRs [115-117],
compounds with potential novel biological activity relative to natural ligands
are being identified. For example, LXRα/β are regulated in vivo by oxysterols
and this regulation is consistent with the role of the LXRs in cholesterol
homeostasis. Animal models using nonselective LXR tool compounds indicate
that, in addition to conferring atheroprotective effects, these agonists also
promote lipogenesis and triglyceride accumulation in liver. Miao et al. reported
that two LXR agonists (T0901317 and GW3965) show differential effects on
cofactor recruitment in human hepatoma cell assays. Additionally, these two
compounds differ in their in vivo effects on hepatic lipogenesis genes. These
studies point toward the promise of developing LXR modulator compounds
that possess antiatherogenic activities with limited hepatic liability. Whether
the difference between these compounds reflects tissue versus gene selectivity
remains to be elucidated. For both the steroid receptor and nonsteroid receptor
modulators, more work is needed to understand better the underlying basis of
modulator effects.
Taken together, these examples highlight the degree of complexity required
on several levels, such as high-affinity binding to the receptor, inducing
conformational change or altered structural dynamics, selecting an appropriate
cellular assay for measuring NR modulation, and using relevant in vivo models
for measuring the therapeutic index of effects. Because of the structural and
functional similarities within the NR superfamily, lessons learned from one
receptor concerning modulation by a designer small molecule can probably be
applied to other members of the family [3, 118, 119]. Overall, with increasing
knowledge of NR functions, the promise is high that novel, safer, and more
effective medicines will be the eventual outcome. Important in this pursuit is
the use of new technologies to profile ligands; this is the topic for the final
section below.
15.3.7
New Approaches to NR Drug Discovery
One of the more recent principles in the field of NR research and drug
discovery is the realization that a subset of the myriad of functions of NRs
can be selectively manipulated by ligand, a general concept referred to as NR
modulation. New technologies, including advanced computational methods, are
inspiring new strategies for discovering novel NR modulating drug candidates.
Importantly, new technologies allow profiling of NR ligands at greater speed
and in a more physiologically relevant context. Several new approaches to NR
modulator discovery are illustrated in this section, drawing on recent work on
ERα/ERβ to provide specific examples.
As discussed briefly in the previous sections, NRs do not act in isolation,
but in complex associations with other cellular factors. Cofactor interaction
screening exploits the relationship between NR structure and functional
activity. If a particular ligand uniquely alters the pattern of cofactor interaction
relative to other ligands, there is a likelihood that the differential in vitro profile
will translate into a unique gene expression pattern or physiological outcome in
vivo. Peptides representing these interactions can be synthesized on the basis
of known interaction motifs or isolated through screening random peptide
libraries. In ER modulator discovery, this method has been used to characterize
known SERMs and to discover ER ligands with unique properties [120]. Norris
et al. applied affinity selection of peptides to identify binding surfaces that
are exposed on ERα/β when complexed with different ligands, such as
estradiol or 4-OH tamoxifen. They found that the established SERMs, known
to produce distinct biological effects, induced distinct conformational changes
in the receptors. The ability of the peptides to discriminate between different
ERα/β ligand complexes has enabled development of screens to detect subtle
differences between ER ligands. Ligand screens have been developed on
the basis of NR-peptide interactions using a high-throughput multiplexed
technology, which utilizes fluorescently encoded microspheres [121, 122].
Purified NR LBDs can be used in these screens, and the repertoire of
novel NR-interacting cofactors has expanded dramatically in the past few years.
To rapidly identify novel interactors, genome-wide screens for binding partners
have been carried out in yeast and mammalian-based two-hybrid systems. As
mentioned above, over 200 human NR cofactors have been identified. These
interactors are important in the era of NR modulator discovery since each new
cofactor carries the potential to recapitulate a particular cellular interaction
and thus provides the basis for a molecular screen for molecules that uniquely
affect the interaction.
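The idea that a ligand can be recognized by its pattern of cofactor-peptide interactions can be illustrated by comparing interaction profiles as vectors. The sketch below assumes that each ligand has been reduced to a list of binding signals, one per peptide in a fixed panel; the function names, the choice of cosine similarity, and all ligand labels and signal values are illustrative assumptions rather than the analysis used in the cited screens.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length signal vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same peptide panel")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a profile with no signal cannot be compared")
    return dot / (norm_a * norm_b)


def most_distinct_ligand(profiles, reference):
    """Return the ligand whose peptide profile is least similar to the reference."""
    ref = profiles[reference]
    others = {name: p for name, p in profiles.items() if name != reference}
    return min(others, key=lambda name: cosine_similarity(others[name], ref))


if __name__ == "__main__":
    # Hypothetical signals for a four-peptide panel; not measured data.
    profiles = {
        "reference_agonist": [1.0, 0.9, 0.1, 0.0],
        "ligand_X": [0.9, 1.0, 0.2, 0.1],
        "ligand_Y": [0.1, 0.0, 0.8, 1.0],
    }
    print(most_distinct_ligand(profiles, "reference_agonist"))
```

Cosine similarity compares the shape of a profile rather than its overall intensity, which is one reasonable choice when signal strength varies between receptor preparations; other distance measures could be substituted.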
Since NRs are transcription factors, monitoring ligand effects on NR target
genes is a powerful approach to NR drug discovery. The difficulties and
expenses involved in measuring endogenous gene expression limited
this approach as a drug-screening method until recently. Microarray technology
has made it possible to assess endogenous gene expression on a genome-wide
scale and this technology has been used to define an unbiased set of NR target
genes. For example, multiple groups have utilized microarray technology to
differentiate the functions of ERα and ERβ in estrogen target organs such
as the bone, breast, and uterus. In one specific set of experiments, human
U2OS osteosarcoma cells (which express neither ERα nor ERβ) were stably
transfected with human ERα/β to selectively overexpress the receptors in
this bone model system [123]. Treatment of the two cell lines with
17β-estradiol resulted in two overlapping but distinct patterns of gene
expression. Interestingly, 28% of the estradiol-regulated genes were ERα cell
specific while 11% were ERβ cell specific. Not only did this work allow the
functional dissection of the pathways regulated by two functionally similar
receptors but it has also identified unique sets of endogenous target genes for
use in ligand screening assays.
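The comparison of two receptor-specific cell lines amounts to partitioning the regulated genes into receptor-specific and shared sets. A minimal sketch of that bookkeeping follows; the function name and the gene identifiers are placeholders, and expressing each group as a fraction of the union of regulated genes is an assumption about how such percentages might be computed.

```python
def partition_regulated_genes(alpha_genes, beta_genes):
    """Split regulated genes into receptor-specific and shared groups.

    Returns (groups, fractions), where fractions are relative to the
    union of genes regulated in either cell line.
    """
    alpha, beta = set(alpha_genes), set(beta_genes)
    union = alpha | beta
    if not union:
        raise ValueError("no regulated genes supplied")
    groups = {
        "alpha_specific": alpha - beta,
        "beta_specific": beta - alpha,
        "shared": alpha & beta,
    }
    fractions = {name: len(genes) / len(union) for name, genes in groups.items()}
    return groups, fractions


if __name__ == "__main__":
    # Placeholder gene identifiers, for illustration only.
    groups, fractions = partition_regulated_genes(
        ["gene1", "gene2", "gene3"], ["gene3", "gene4"]
    )
    for name in ("alpha_specific", "beta_specific", "shared"):
        print(name, sorted(groups[name]), f"{fractions[name]:.0%}")
```

The receptor-specific sets are the ones that could serve as endogenous readouts in a ligand screen, since they report on one receptor without interference from the other.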
Using a system similar to that described above (U2OS cells expressing either
ERα or ERβ), Tee and colleagues [124] evaluated the effects of different ER
ligands (including the SERMs raloxifene and tamoxifen) on ERα and ERβ target
genes. Microarray analysis showed that raloxifene and tamoxifen regulated only
27% of the same genes in both the ERα- and ERβ-containing cells. These results
indicate that estrogens and SERMs exert tissue-specific effects by regulating
unique sets of target genes through ERα/β. Thus, these specific genes serve
as unique identifiers of compound action, and a subset is especially useful in
discriminating ER ligands.
Higher throughput methods to analyze gene expression hold the promise of
screening large numbers of compounds in a cellular environment using a
cost-effective technology. For example, with advances in glass slide preparations
for monitoring transcriptional changes of thousands of genes, a hit from a
multiwell cell treatment can be inexpensively assessed over a genome-wide
range of genes. With such an analysis, it is possible to observe distinctions
between even very closely related chemotypes. A recent study has used
gene expression profiling to characterize breast cancer cells and to identify
desired “molecular fingerprints” within the data [125]. Key “biomarkers” can
be identified, which provide information linked to the phenotypic effect of a
compound. With such a screen, knowledge of the target of the compounds (e.g.,
whether a compound has antiestrogen effects) is not an a priori requirement.
One challenge in this type of approach is that vast amounts of data are
generated and bioinformatics analysis becomes a limiting factor. Current
advances in gene expression profiling as a drug-screening method must go
hand in hand with advances in bioinformatics and data handling.
Changes in the steady-state levels of mRNA do not tell the whole story. Research
groups are now integrating the data obtained from mRNA steady-state level
analysis with proteomic data. Huber et al. analyzed differences
between the gene and protein expression patterns of the human breast
carcinoma cell line T47D and its derivative T47D-r, which is resistant to the
pure antiestrogen ZM 182780 [126]. Microarray analysis was carried out in
parallel to a proteomics analysis where the total cellular protein content of
T47D or T47D-r was separated on two-dimensional gels. Thirty-eight proteins
were found to be reproducibly up- or downregulated more than twofold in
T47D-r versus T47D in the proteomics analysis. Comparison with differential
mRNA analysis revealed that 19 of these were up- or downregulated in parallel
with the corresponding mRNA molecules. For 11 proteins, the corresponding
mRNA was not found to be differentially expressed, and for 8 proteins an
inverse regulation was found at the mRNA level. A general conclusion from
such studies is that, though the pattern of expression of the two data sets is
similar, the disconnected trends emphasize the importance of posttranslational
mechanisms in cellular development. These types of changes can only be
observed through integration of the proteomic and transcriptomic approaches.
New higher throughput methods to measure proteome variation are making
this type of analysis more practical.
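The three outcomes reported for the 38 proteins (parallel, mRNA unchanged, inverse) correspond to a simple comparison of the direction of change at the protein and mRNA levels. The sketch below shows one way such a classification could be written. The twofold protein cutoff follows the text; applying the same cutoff to mRNA, and the function names, are assumptions made for illustration.

```python
def direction(fold_change, threshold=2.0):
    """Return +1 (up), -1 (down), or 0 (unchanged) for a fold-change ratio."""
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    if fold_change >= threshold:
        return 1
    if fold_change <= 1.0 / threshold:
        return -1
    return 0


def classify(protein_fc, mrna_fc, threshold=2.0):
    """Compare protein and mRNA regulation for one gene product."""
    p = direction(protein_fc, threshold)
    m = direction(mrna_fc, threshold)
    if p == 0:
        return "protein_unchanged"
    if m == 0:
        return "mrna_unchanged"
    return "parallel" if p == m else "inverse"


# Counts as stated in the text for the 38 regulated proteins.
REPORTED_COUNTS = {"parallel": 19, "mrna_unchanged": 11, "inverse": 8}

if __name__ == "__main__":
    total = sum(REPORTED_COUNTS.values())
    for label, count in REPORTED_COUNTS.items():
        print(f"{label}: {count}/{total} = {count / total:.1%}")
```

With the counts given above, half of the regulated proteins (19 of 38) track their mRNA, while the remaining 19 would have been missed or misread by transcript profiling alone.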
The above examples illustrate how NR target genes have been discovered
through physical experimentation. In silico approaches are also being developed
that increase the speed of NR drug discovery. For example, comprehensive
computational approaches can now be carried out to identify NR target
genes. NUBIScan represents a new computer algorithm for predicting NR
target sequences in regulatory regions of genes [127]. This approach is
being combined with other methods to quickly validate the target genes
predicted by the in silico method. High-throughput, genome-wide chromatin
immunoprecipitation methods have been combined with computational
methods to identify ER target genes and promoter sequences [128]. Genes
identified by computational analysis are not biased by target tissue or
expression levels, and thus complement microarray approaches.
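To give a sense of what an in silico search for NR target sequences involves, the sketch below scans a DNA sequence for direct repeats of a hexamer half-site separated by a short spacer. This is a deliberately simplified stand-in and is not the algorithm of the tool cited above: it assumes the consensus half-site AGGTCA, requires exact matches, and searches only the forward strand. The function name and example sequence are hypothetical.

```python
import re

# Consensus hexamer half-site assumed for this illustration.
HALF_SITE = "AGGTCA"


def find_direct_repeats(sequence, max_spacer=5):
    """Find exact direct repeats of HALF_SITE with 0..max_spacer spacer bases.

    Returns a sorted list of (start_position, spacer_length) tuples,
    using 0-based positions on the forward strand only.
    """
    seq = sequence.upper()
    hits = []
    for n in range(max_spacer + 1):
        # A lookahead lets overlapping candidate sites be reported too.
        pattern = re.compile(f"(?={HALF_SITE}[ACGT]{{{n}}}{HALF_SITE})")
        hits.extend((m.start(), n) for m in pattern.finditer(seq))
    return sorted(hits)


if __name__ == "__main__":
    promoter = "TTAGGTCAAAGGTCATT"  # made-up sequence
    for start, spacer in find_direct_repeats(promoter):
        print(f"direct repeat with {spacer}-base spacer at position {start}")
```

A practical predictor would also need to tolerate mismatches, score both strands and other repeat orientations, and weigh candidate sites against background sequence, none of which is attempted here.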
In summary, NR drug discovery is moving toward the high-throughput profiling
of compounds either in a setting closer to the native physiological environment
or in an in vitro environment with a physiologically comprehensive array of
functional partners.
15.3.8
Future Developments and Conclusions for NR Chemical Biology
The human NRs as a structural class are essential for life and survival, and
they play an integral role in many critical physiological processes such as
metabolism, homeostasis, differentiation, growth and development, aging,
and reproduction. This family of receptors has a common evolutionary
history as evidenced by their sequence relationship and their commonality
in cellular function [129]. The myriad of functions of NRs is vastly complex
and the pathways they control are intertwined with each other as well as with
numerous accessory proteins and partners in function. Even with this inherent
complexity, as reviewed briefly above, this family of receptors has had a long
and fruitful history for drug discovery. With the advent of high-throughput
chemistries, structural biology, novel biochemical methods, and pathway
analysis technologies, such as differential gene expression and proteomics,
there will undoubtedly be new discoveries leading to drugs with improved
therapeutic profiles. These NR modulator efforts should help in defining
better the ligand-induced activities that produce tissue-selective beneficial
effects and in minimizing unwanted activities. In addition, there are likely
to be advances toward ligand discovery for the remaining orphan receptors.
Studies using these tool compounds should lead to target validation and better
definition of therapeutic relevance for the remaining orphan NRs. Overall,
the future of targeting the NR superfamily with novel synthetic ligands holds
tremendous potential and should lead to a variety of safer, more effective
medicines for treatment of a plethora of human diseases.
Acknowledgment
We would like to thank Tim Willson for critically reading this review. We also
thank Lakshman Ramamurthy for his kind contribution of the NR superfamily
phylogeny plot. Finally, we would like to thank our many GlaxoSmithKline
colleagues for helpful discussions and collaborations on NR-related projects.
References
1. Nuclear Receptors Nomenclature Committee, A unified nomenclature system for
the nuclear receptor superfamily, Cell 1999, 97, 161-163.
2. O. Wrange, J.A. Gustafsson, Separation of the hormone- and DNA-binding sites
of the hepatic glucocorticoid receptor by means of proteolysis, J. Biol. Chem.
1978, 253, 856-865.
3. J.M. Wurtz, W. Bourguet, J.P. Renaud, V. Vivat, P. Chambon, D. Moras,
H. Gronemeyer, A canonical structure for the ligand-binding domain of nuclear
receptors, Nat. Struct. Biol. 1996, 3, 87-94.
4. S. Khorasanizadeh, F. Rastinejad, Nuclear-receptor interactions on
DNA-response elements, Trends Biochem. Sci. 2001, 26, 384-390.
5. W. Bourguet, M. Ruff, P. Chambon, H. Gronemeyer, D. Moras, Crystal
structure of the ligand-binding domain of the human nuclear receptor RXR-alpha
[see comment], Nature 1995, 375, 377-382.
6. S.P. Williams, P.B. Sigler, Atomic structure of progesterone complexed
with its receptor, Nature 1998, 393, 392-396.
7. B.F. Luisi, W.X. Xu, Z. Otwinowski, L.P. Freedman, K.R. Yamamoto, P.B.
Sigler, Crystallographic analysis of the interaction of the glucocorticoid
receptor with DNA, Nature 1991, 352, 497-505.
8. M.I. Diamond, J.N. Miner, S.K. Yoshinaga, K.R. Yamamoto, Transcription
factor interactions: selectors of positive or negative regulation from a single
DNA element, Science 1990, 249, 1266-1272.
9. R.F. Witzmann, Steroids, Keys to Life, 1981.
10. M.J. Coghlan, S.W. Elmore, P.R. Kym, M.E. Kort, The pursuit of
differentiated ligands for the glucocorticoid receptor, Curr. Top. Med. Chem.
2003, 3, 1617-1635.
11. R.B. Woodward, F. Sondheimer, D. Taub, The total synthesis of cortisone,
J. Am. Chem. Soc. 1951, 73, 4057.
12. L.H. Sarett, G.E. Arth, R.M. Lukes, R.E. Beyler, G.I. Poos, W.F. Johns,
J.M. Constantin, Stereospecific total synthesis of cortisone, J. Am. Chem.
Soc. 1952, 74, 4974-4976.
13. V.C. Jordan, Tamoxifen: a most unlikely pioneering medicine [see
comment], Nat. Rev. Drug Discov. 2003, 2, 205-213.
14. R. Miesfeld, S. Okret, A.C. Wikstrom, O. Wrange, J.A. Gustafsson, K.R.
Yamamoto, Characterization of a steroid hormone receptor gene and mRNA in
wild-type and mutant cells, Nature 1984, 312, 779-781.
15. M.V. Govindan, M. Devic, S. Green, H. Gronemeyer, P. Chambon, Cloning of
the human glucocorticoid receptor cDNA, Nucleic Acids Res. 1985, 13, 8293-8304.
16. S.M. Hollenberg, C. Weinberger, E.S. Ong, G. Cerelli, A. Oro, R. Lebo, E.B.
Thompson, M.G. Rosenfeld, R.M. Evans, Primary structure and expression of a
functional human glucocorticoid receptor cDNA, Nature 1985, 318, 635-641.
17. S. Green, P. Walter, G. Greene, A. Krust, C. Goffin, E. Jensen, G. Scrace,
M. Waterfield, P. Chambon, Cloning of the human oestrogen receptor cDNA,
J. Steroid Biochem. 1986, 24, 77-83.
18. P. Walter, S. Green, G. Greene, A. Krust, J.M. Bornert, J.M. Jeltsch,
A. Staub, E. Jensen, G. Scrace, M. Waterfield, P. Chambon, Cloning of the
human estrogen receptor cDNA, Proc. Natl. Acad. Sci. U.S.A. 1985, 82,
7889-7893.
19. G.L. Greene, P. Gilna, M. Waterfield, A. Baker, Y. Hort, J. Shine, Sequence
and expression of human estrogen receptor complementary DNA, Science 1986,
231, 1150-1154.
20. D.J. Mangelsdorf, R.M. Evans, The RXR heterodimers and orphan receptors,
Cell 1995, 83, 841-850.
21. B. Blumberg, R.M. Evans, Orphan nuclear receptors - new ligands and new
possibilities, Genes Dev. 1998, 12, 3149-3155.
22. V. Giguere, Orphan nuclear receptors: from gene to function, Endocr. Rev.
1999, 20, 689-725.
23. S.A. Kliewer, J.T. Moore, L. Wade, J.L. Staudinger, M.A. Watson, S.A.
Jones, D.D. McKee, B.B. Oliver, T.M. Willson, R.H. Zetterstrom, T. Perlmann,
J.M. Lehmann, An orphan nuclear receptor activated by pregnanes defines a
novel steroid signaling pathway, Cell 1998, 92, 73-82.
24. M. Kobayashi, S. Takezawa, K. Hara, R.T. Yu, Y. Umesono, K. Agata,
M. Taniwaki, K. Yasuda, K. Umesono, Identification of a photoreceptor
cell-specific nuclear receptor, Proc. Natl. Acad. Sci. U.S.A. 1999, 96,
4814-4819.
25. J.M. Maglich, A.E. Sluder, T.M. Willson, J.T. Moore, Beyond the human
genome: examples of nuclear receptor analysis in model organisms and potential
for drug discovery, Am. J. Pharmacogenomics 2003, 3, 345-353.
26. M. Robinson-Rechavi, A.S. Carpentier, M. Duffraisse, V. Laudet, How many
nuclear hormone receptors are there in the human genome? Trends Genet. 2001,
17, 554-556.
27. J.M. Lehmann, L.B. Moore, T.A. Smith-Oliver, W.O. Wilkison, T.M. Willson,
S.A. Kliewer, An antidiabetic thiazolidinedione is a high affinity ligand for
peroxisome proliferator-activated receptor gamma (PPAR gamma), J. Biol. Chem.
1995, 270, 12953-12956.
28. D.J. Parks, S.G. Blanchard, R.K. Bledsoe, G. Chandra, T.G. Consler, S.A.
Kliewer, J.B. Stimmel, T.M. Willson, A.M. Zavacki, D.D. Moore, J.M. Lehmann,
Bile acids: natural ligands for an orphan nuclear receptor [see comment],
Science 1999, 284, 1365-1368.
29. R.K. Bledsoe, V.G. Montana, T.B. Stanley, C.J. Delves, C.J. Apolito, D.D.
McKee, T.G. Consler, D.J. Parks, E.L. Stewart, T.M. Willson, M.H. Lambert,
J.T. Moore, K.H. Pearce, H.E. Xu, Crystal structure of the glucocorticoid
receptor ligand binding domain reveals a novel mode of receptor dimerization
and coactivator recognition, Cell 2002, 110, 93-105.
30. R.K. Bledsoe, K.P. Madauss, J.A. Holt, C.J. Apolito, M.H. Lambert, K.H.
Pearce, T.B. Stanley, E.L. Stewart, R.P. Trump, T.M. Willson, S.P. Williams,
A ligand-mediated hydrogen bond network required for the activation of the
mineralocorticoid receptor, J. Biol. Chem. 2005, 280, 31283-31293.
31. P.M. Matias, P. Donner, R. Coelho, M. Thomaz, C. Peixoto, S. Macedo,
N. Otto, S. Joschko, P. Scholz, A. Wegg, S. Basler, M. Schafer, U. Egner,
M.A. Carrondo, Structural evidence for ligand specificity in the binding
domain of the human androgen receptor. Implications for
pathogenic gene mutations, J. Biol. Chem. 2000, 275, 26164-26171.
32. G.G. Kuiper, E. Enmark, M. Pelto-Huikko, S. Nilsson, J.A. Gustafsson,
Cloning of a novel receptor expressed in rat prostate and ovary, Proc. Natl.
Acad. Sci. U.S.A. 1996, 93, 5925-5930.
33. A.C. Pike, A.M. Brzozowski, R.E. Hubbard, A structural biologist's view of
the oestrogen receptor, J. Steroid Biochem. Mol. Biol. 2000, 74, 261-268.
34. A.K. Shiau, D. Barstad, P.M. Loria, L. Cheng, P.J. Kushner, D.A. Agard,
G.L. Greene, The structural basis of estrogen receptor/coactivator recognition
and the antagonism of this interaction by tamoxifen, Cell 1998, 95, 927-937.
35. A.C.W. Pike, A.M. Brzozowski, R.E. Hubbard, T. Bonn, A.G. Thorsell,
O. Engstrom, J. Ljunggren, J.A. Gustafsson, M. Carlquist, Structure of the
ligand-binding domain of oestrogen receptor beta in the presence of a partial
agonist and a full antagonist, EMBO J. 1999, 18, 4608-4618.
36. A.M. Brzozowski, A.C. Pike, Z. Dauter, R.E. Hubbard, T. Bonn, O. Engstrom,
L. Ohman, G.L. Greene, J.A. Gustafsson, M. Carlquist, Molecular basis of
agonism and antagonism in the oestrogen receptor, Nature 1997, 389, 753-758.
37. A.K. Shiau, D. Barstad, J.T. Radek, M.J. Meyers, K.W. Nettles, B.S.
Katzenellenbogen, J.A. Katzenellenbogen, D.A. Agard, G.L. Greene, Structural
characterization of a subtype-selective ligand reveals a novel mode of estrogen
receptor antagonism, Nat. Struct. Biol. 2002, 9, 359-364.
38. A.C.W. Pike, A.M. Brzozowski, J. Walton, R.E. Hubbard, A.G. Thorsell, Y.L.
Li, J.A. Gustafsson, M. Carlquist, Structural insights into the mode of action
of a pure antiestrogen, Structure 2001, 9, 145-153.
39. J.L. Oberfield, J.L. Collins, C.P. Holmes, D.M. Goreham, J.P. Cooper, J.E.
Cobb, J.M. Lenhard, E.A. Hull-Ryde, C.P. Mohr, S.G. Blanchard, D.J. Parks,
L.B. Moore, J.M. Lehmann, K. Plunket, A.B. Miller, M.V. Milburn, S.A. Kliewer,
T.M. Willson, A peroxisome proliferator-activated receptor gamma ligand
inhibits adipocyte differentiation, Proc. Natl. Acad. Sci. U.S.A. 1999, 96,
6102-6106.
40. R.T. Nolte, G.B. Wisely, S. Westin, J.E. Cobb, M.H. Lambert, R. Kurokawa,
M.G. Rosenfeld, T.M. Willson, C.K. Glass, M.V. Milburn, Ligand binding and
co-activator assembly of the peroxisome proliferator-activated receptor-gamma,
Nature 1998, 395, 137-143.
41. S. Svensson, T. Ostberg, M. Jacobsson, C. Norstrom, K. Stefansson,
D. Hallen, I.C. Johansson, K. Zachrisson, D. Ogg, L. Jendeberg, Crystal
structure of the heterodimeric complex of LXRalpha and RXRbeta ligand-binding
domains in a fully agonistic conformation, EMBO J. 2003, 22, 4625-4633.
42. S. Williams, R.K. Bledsoe, J.L. Collins, S. Boggs, M.H. Lambert, A.B.
Miller, J. Moore, D.D. McKee, L. Moore, J. Nichols, D. Parks, M. Watson,
B. Wisely, T.M. Willson, X-ray crystal structure of the liver X receptor beta
ligand binding domain: regulation by a histidine-tryptophan switch, J. Biol.
Chem. 2003, 278, 27138-27143.
43. M. Farnegardh, T. Bonn, S. Sun, J. Ljunggren, H. Ahola, A. Wilhelmsson,
J.A. Gustafsson, M. Carlquist, The three-dimensional structure of the liver X
receptor beta reveals a flexible ligand-binding pocket that can accommodate
fundamentally different ligands, J. Biol. Chem. 2003, 278, 38821-38828.
44. G.B. Wisely, A.B. Miller, R.G. Davis, A.D. Thornquest Jr, R. Johnson,
T. Spitzer, A. Sefler, B. Shearer, J.T. Moore, T.M. Willson, S.P. Williams,
Hepatocyte nuclear factor 4 is a transcription factor that constitutively
binds fatty acids, Structure 2002, 10, 1225-1234.
45. Z. Wang, G. Benoit, J. Liu, S. Prasad, P. Aarnisalo, X. Liu, H. Xu, N.P.
Walker, T. Perlmann, Structure and function of Nurr1 identifies a class of
ligand-independent nuclear receptors, Nature 2003, 423, 555-560.
46. S. Dhe-Paganon, K. Duda, M. Iwamoto, Y.I. Chi, S.E. Shoelson, Crystal
structure of the HNF4 alpha ligand binding domain in complex with endogenous
fatty acid ligand, J. Biol. Chem. 2002, 277, 37973-37976.
47. K.D. Baker, L.M. Shewchuk, T. Kozlova, M. Makishima, A. Hassell,
B. Wisely, J.A. Caravella, M.H. Lambert, J.L. Reinking, H. Krause, C.S.
Thummel, T.M. Willson, D.J. Mangelsdorf, The Drosophila orphan nuclear
receptor DHR38 mediates an atypical ecdysteroid signaling pathway, Cell 2003,
113, 731-742.
48. D.P. McDonnell, D.L. Clemm, T. Hermann, M.E. Goldman, J.W. Pike, Analysis
of estrogen receptor function in vitro reveals three distinct classes of
antiestrogens, Mol. Endocrinol. 1995, 9, 659-669.
49. H.E. Xu, T.B. Stanley, V.G. Montana, M.H. Lambert, B.G. Shearer, J.E.
Cobb, D.D. McKee, C.M. Galardi, K.D. Plunket, R.T. Nolte, D.J. Parks, J.T.
Moore, S.A. Kliewer, T.M. Willson, J.B. Stimmel, Structural basis for
antagonist-mediated recruitment of nuclear co-repressors by PPARalpha, Nature
2002, 415, 813-817.
50. B. Kauppi, C. Jakob, M. Farnegardh, J. Yang, H. Ahola, M. Alarcon,
K. Calles, O. Engstrom, J. Harlan, S. Muchmore, A.K. Ramqvist, S. Thorell,
L. Ohman, J. Greer, J.A. Gustafsson, J. Carlstedt-Duke, M. Carlquist, The
three-dimensional structures of antagonistic and agonistic forms of the
glucocorticoid receptor ligand-binding domain: RU-486 induces a
transconformation that leads to active antagonism, J. Biol. Chem. 2003, 278,
22748-22754.
51. B.A. Johnson, E.M. Wilson, Y. Li, D.E. Moller, R.G. Smith, G. Zhou,
Ligand-induced stabilization of PPARgamma monitored by NMR spectroscopy:
implications for nuclear receptor activation, J. Mol. Biol. 2000, 298, 187-194.
52. M. Beato, J. Klug, Steroid hormone receptors: an update, Hum. Reprod.
Update 2000, 6, 225-236.
53. H. Gronemeyer, J.A. Gustafsson, V. Laudet, Principles for modulation of
the nuclear receptor superfamily, Nat. Rev. Drug Discov. 2004, 3, 950-964.
54. W.B. Pratt, The role of heat shock proteins in regulating the function,
folding, and trafficking of the glucocorticoid receptor, J. Biol. Chem. 1993,
268, 21455-21458.
55. S.A. Onate, S.Y. Tsai, M.J. Tsai, B.W. O’Malley, Sequence and
characterization of a coactivator for the steroid hormone receptor
superfamily, Science 1995, 270, 1354-1357.
56. C.K. Glass, D.W. Rose, M.G. Rosenfeld, Nuclear receptor coactivators,
Curr. Opin. Cell Biol. 1997, 9, 222-232.
57. J.W. Lee, Y.C. Lee, S.Y. Na, D.J. Jung, S.K. Lee, Transcriptional
coregulators of the nuclear receptor superfamily: coactivators and
corepressors, Cell. Mol. Life Sci. 2001, 58, 289-297.
58. N.J. McKenna, B.W. O’Malley, Minireview: nuclear receptor coactivators -
an update [Review], Endocrinology 2002, 143, 2461-2465.
59. D.M. Heery, E. Kalkhoven, S. Hoare, M.G. Parker, A signature motif in
transcriptional co-activators mediates binding to nuclear receptors [see
comment], Nature 1997, 387, 733-736.
60. D.M. Heery, S. Hoare, S. Hussain, M.G. Parker, H. Sheppard, Core LXXLL
motif sequences in CREB-binding protein, SRC1, and RIP140 define affinity and
selectivity for steroid and retinoid receptors, J. Biol. Chem. 2001, 276,
6695-6702.
61. T.E. Spencer, G. Jenster, M.M. Burcin, C.D. Allis, J. Zhou, C.A. Mizzen,
N.J. McKenna, S.A. Onate, S.Y. Tsai, M.J. Tsai, B.W. O’Malley, Steroid
receptor coactivator-1 is a histone acetyltransferase, Nature 1997, 389,
194-198.
62. M.L. Privalsky, The role of corepressors in transcriptional regulation by
nuclear hormone receptors, Annu. Rev. Physiol. 2004, 66, 315-360.
63. J.D. Chen, R.M. Evans, A transcriptional co-repressor that interacts with
nuclear hormone receptors [see comment], Nature 1995, 377, 454-457.
64. A.J. Horlein, A.M. Naar, T. Heinzel, J. Torchia, B. Gloss, R. Kurokawa,
A. Ryan, Y. Kamei, M. Soderstrom, C.K. Glass, M.G. Rosenfeld,
Ligand-independent repression by the thyroid hormone receptor mediated by a
nuclear receptor co-repressor [see comment], Nature 1995, 377, 397-404.
65. M.G. Guenther, O. Barak, M.A. Lazar, The SMRT and N-CoR corepressors are
activating cofactors for histone deacetylase 3, Mol. Cell. Biol. 2001, 21,
6091-6101.
66. X. Hu, M.A. Lazar, The CoRNR motif controls the recruitment of
corepressors by nuclear hormone receptors, Nature 1999, 402, 93-96.
67. M. Gottlicher, S. Heck, P. Herrlich, Transcriptional cross-talk, the second
mode of steroid hormone receptor action [see comment], J. Mol. Med. 1998, 76,
480-489.
68. L.I. McKay, J.A. Cidlowski, Cross-talk between nuclear factor-kappa B and
the steroid hormone receptors: mechanisms of mutual antagonism, Mol.
Endocrinol. 1998, 12, 45-56.
69. L.I. McKay, J.A. Cidlowski, Molecular control of immune/inflammatory
responses: interactions between nuclear factor-kappa B and steroid
receptor-signaling pathways, Endocr. Rev. 1999, 20, 435-459.
70. A. Ray, K.E. Prefontaine, P. Ray, Down-modulation of interleukin-6 gene
expression by 17 beta-estradiol in the absence of high affinity DNA binding by
the estrogen receptor, J. Biol. Chem. 1994, 269, 12940-12946.
71. A. Ray, K.E. Prefontaine, Physical association and functional antagonism
between the p65 subunit of transcription factor NF-kappa B and the
glucocorticoid receptor, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 752-756.
72. E. Caldenhoven, J. Liden, S. Wissink, A. Van de Stolpe, J. Raaijmakers,
L. Koenderman, S. Okret, J.A. Gustafsson, P.T. Van der Saag, Negative
cross-talk between RelA and the glucocorticoid receptor: a possible mechanism
for the antiinflammatory action of glucocorticoids, Mol. Endocrinol. 1995, 9,
401-412.
73. V. Doucas, Y. Shi, S. Miyamoto, A. West, I. Verma, R.M. Evans, Cytoplasmic
catalytic subunit of protein kinase A mediates cross-repression by NF-kappa B
and the glucocorticoid receptor, Proc. Natl. Acad. Sci. U.S.A. 2000, 97,
11893-11898.
74. R. Losel, M. Wehling, Nongenomic actions of steroid hormones, Nat. Rev.
Mol. Cell Biol. 2003, 4, 46-56.
75. V. Boonyaratanakornkit, M.P. Scott, V. Ribon, L. Sherman, S.M. Anderson,
J.L. Maller, W.T. Miller, D.P. Edwards, Progesterone receptor contains a
proline-rich motif that directly interacts with SH3 domains and activates
c-Src family tyrosine kinases, Mol. Cells 2001, 8, 269-280.
76. M.A. Shupnik, Crosstalk between steroid receptors and the c-Src-receptor
tyrosine kinase pathways: implications for cell proliferation, Oncogene 2004,
23, 7979-7989.
77. C.Z. Song, X. Tian, T.D. Gelehrter, Glucocorticoid receptor inhibits
transforming growth factor-beta signaling by directly targeting the
transcriptional activation function of Smad3, Proc. Natl. Acad. Sci. U.S.A.
1999, 96, 11776-11781.
78. A. Bruna, M. Nicolas, A. Munoz, J.M. Kyriakis, C. Caelles, Glucocorticoid
receptor-JNK interaction mediates
References I929
inhibition of the I N K pathway by of SERMs [see comment], Science

glucocorticoids, E M B O J . 2003,22, 2002,295,2465-2468.
6035- 6044. 90. P. Webb, P. Nguyen, P.J. Kushner,
79. T. Simoncini, A. Hafezi-Moghadam, Differential SERM effects on
D.P. Brazil, K. Ley, W.W. Chin, J.K. corepressor binding dictate ERalpha
Liao, Interaction of oestrogen activity in vivo, J. Biol. Chem. 2003,
receptor with the regulatory subunit 278,6912-6920.
of phosphatidylinositol-3-OH kinase, 91. T.M. Willson, B.R. Henke, T.M.
Nature 2000,407,538-541. Momtahen, P.S. Charifson, K.W.
80. S. Kato, Estrogen receptor-mediated Batchelor, D.B. Lubahn, L.B. Moore,
cross-talk with growth factor B.B. Oliver, H.R. Sauls, J.A.
signaling pathways, Breast Cancer Triantafillou, S.G. Wolfe, P.G. Baer,
2001,8,3-9. 3-[4-(1,2-Diphenylbut-l-
81. J. Bastien, C. Rochette-Egly, Nuclear enyl)phenyl]acrylic acid: a
retinoid receptors and the non-steroidal estrogen with
transcription of retinoid-target genes, functional selectivity for bone over
Gene 2004,328,l-16. uterus in rats, J . Med. Chem. 1994,
82. C . Rochette-Egly, Nuclear receptors: 37,1550-1552.
integration of multiple signalling 92. T.M. Willson, J.D. Norris, B.L.
Wagner, I. Asplin, P. Baer, H.R.
pathways through phosphorylation,
Cell. Signalling 2003,IS,355-366.
Brown, S.A. Jones, B. Henke,
H. Sauls, S. Wolfe, D.C. Morris, D.P.
83. D.A. Lannigan, Estrogen receptor
phosphorylation, Steroids 2003,68,
McDonnell, Dissection of the
molecular mechanism of action of
1-9.
GW5638,a novel estrogen receptor
84. R.R. Love, R.B. Mazess, H.S. Barden,
ligand, provides insights into the role
S. Epstein, P.A. Newcomb, V.C. of estrogen receptor in bone,
Jordan, P.P. Carbone, D.L. DeMets, Endocrinology 1997,138,3901-3911.
Effects of tamoxifen on bone mineral 93. C.E. Connor, J.D. Norris,
density in postmenopausal women G. Broadwater, T.M. Willson, M.M.
with breast cancer [see comment], N. Gottardis, M.W. Dewhirst, D.P.
Engl.J. Med. 1992,326,852-856. McDonnell, Circumventing
85. P.D. Delmas, N.H. Bjarnason, B.H. tamoxifen resistance in breast
Mitlak, A.C. Ravoux, A.S. Shah, W.J. cancers using antiestrogens that
Huster, M. Draper, C. Christiansen, induce unique conformational
Effects of raloxifene on bone mineral changes in the estrogen receptor,
density, serum cholesterol Cancer Res. 2001,61,2917-2922.
concentrations, and uterine 94. H.U. Bryant, Selective estrogen
endometrium in postmenopausal receptor modulators, Rev. Endocr.
women [see comment], N. Engl. J. Metab. Disord. 2002,3,231-241.
Med. 1997,337,1641-1647. 95. M.J. Meegan, D.G. Lloyd, Advances
86. S.M. Ismail, The effects oftamoxifen in the science of estrogen receptor
on the uterus, Curr. Opin. Obstet. modulation, Curr. Med. Chem. 2003,
Gynecol. 1996,8, 27-31. 10,181-210.
87. C.H. Turner, M. Sato, H.U. Bryant, 96. C.C. Chadwick, S. Chippari,
Raloxifene preserves bone strength E. Matelan, L. Borges-Marcucci, A.M.
and bone mass in ovariectomized Eckert, J.C. Keith Jr, L.M. Albert,
rats, Endocrinology 1994,135, Y. Leathurby, H.A. Harris, R.A. Bhat,
2001-2005. M. Ashwell, E. Trybulski, R.C.
88. D.P. McDonnell, The molecular Winneker, S.J. Adelman, R.J. Steffan,
pharmacology of SERMs, Trends D.C. Harnish, Identification of
Endocrinol. Metab. 1999,10, 301-311. pathway-selective estrogen receptor
89. Y.Shang, M. Brown, Molecular ligands that inhibit NF-kappaB
determinants for the tissue specificity transcriptional activity, Proc. Natl.
930
Acad. Sci. U.S.A. 2005,102, antiinflammatory activity in vivo,

2543-2548. Mol. Endocritd. 1997,11,1245-1255.
97. J.A. Gustafsson, What 104. S.W. Elmore, M.J. Coghlan, D.D.
pharmacologists can learn from Anderson, J.K. Pratt, B.E. Green,
recent advances in estrogen A.X. Wang, M.A. Stashko, C.W. Lin,
signalling, Trends Pharmacol. Sci. C.M. Tyree, J.N. Miner, P.B.
2003,24,479-485. Jacobson, D.M. Wilcox, B.C. Lane,
98. B.R. Henke, T.G. Consler, N. Go, Nonsteroidal selective glucocorticoid
R.L. Hale, D.R. Hohman, S.A. Jones, modulators: the effect of C-5 alkyl
A.T. Lu, L.B. Moore, J.T. Moore, L.A. substitution on the transcriptional
Orband-Miller, R.G. Robinett, activation/repression profile of 2,s-
J. Shearin, P.K. Spearing, E.L. dihydro-lO-methoxy-2,2,4-trimethyl-
Stewart, P.S. Turnbull, S.L. Weaver, 1H-[l]benzopyrano[3,4-flcluinolines,
S.D. Williams, G.B. Wisely, M.H. J . Med. Chem. 2001,44,4481-4491.
Lambert, A new series of estrogen 105. S.W. Elmore, J.K. Pratt, M.J.
receptor modulators that display Coghlan, Y. Mao, B.E. Green, D.D.
selectivity for estrogen receptor beta, Anderson, M.A. Stashko, C.W. Lin,
/. Med. Chem. 2002,45,5492-5505. D. Falls, M. Nakane, L. Miller, C.M.
99. H.A. Harris, J.A. Katzenellenbogen, Tyree, J.N. Miner, B. Lane,
B. S. Katzenellenbogen, Differentiation of in vitro
Characterization of the biological transcriptional repression and
roles of the estrogen receptors, activation profiles of selective
ERalpha and ERbeta, in estrogen glucocorticoid modulators, Bioorg.
target tissues in vivo through the use Med. Chem. Lett. 2004,14,
of an ERalpha-selective ligand, 1721-1727.
Endocrinology 2002,143,4172-4177. 106. H. Schacke, A. Schottelius, W.D.
100. E.S. Manas, R.J. Unwalla, Z.B. Xu, Docke, P. Strehlke, S. Jaroch,
M.S. Malamas, C.P. Miller, H.A. N. Schmees, H. Rehwinkel,
H. Hennekes, K. Asadullah,
Harris, C. Hsiao, T. Akopian, W.T.
Dissociation of transactivation from
Hum, K. Malakian, S. Wolfrom,
transrepression by a selective
A. Bapat, R.A. Bhat, M.L. Stahl, W.S.
glucocorticoid receptor agonist leads
Somers, J.C. Alvarez, Structure-based
to separation of therapeutic effects
design of estrogen receptor-beta
from side effects, Proc. Natl. Acad.
selective ligands, J. Am. Chem. Soc.
Sci. U.S.A. 2004,101, 227-232.
2004,126,15106-15119. 107. D. DeManno, W. Elger, R. Garg,
101. H.M. Reichardt, K.H. Kaestner,
R. Lee, B. Schneider,
J. Tuckermann, 0. Kretz, 0. Wessely, H. Hess-Stumpp, G. Schubert,
R. Bock, P. Gass, W. Schmid, K. Chwalisz, Asoprisnil (J8G7): a
P. Herrlich, P. Angel, G. Schutz, selective progesterone receptor
DNA binding of the glucocorticoid modulator for gynecological therapy,
receptor is not essential for survival, Steroids 2003,68, 1019-1032.
Cell 1998,93,531-541. 108. B.J. Barnes, P.A. Howard,
102. B.R. Walker, Deflazacort: towards Eplerenone: a selective aldosterone
selective glucocorticoid receptor receptor antagonist for patients with
modulation? Clin. Endocrinol. 2000, heart failure, Ann. Pharmacother.
52,13-15. 2005,39,68-76.
103. B.M. Vayssiere, S. Dupont, 109. J.A. Delyani, Mineralocorticoid
A. Choquart, F. Petit, T. Garcia, receptor antagonists: the evolution of
C. Marchandeau, H. Gronemeyer, utility and pharmacology, Kidney Int.
M. Resche-Rigon, Synthetic 2000,57,1408-1411.
glucocorticoids that dissociate 110. J.P. Heaton, Andropause: coming of
transactivation and AP-1 age for an old concept? CUT. Opin.
transrepression exhibit Urol. 2001,11, 597-601.
References I931
111. R.S. Tan, S.J. Pu, J.W. Culberson, function of residues in the nuclear
Role of androgens in mild cognitive receptor ligand-binding domain, J .
impairment and possible Mol. Biol. 2004, 341, 321-335.
interventions during andropause, 119. J.D. Baxter, J.W. Funder, J.W.
Med. Hypotheses 2004, 62, 14-18. Apriletti, P. Webb, Towards
112. A.F. Santos, H. Huang, D.J. Tindall, selectively modulating
The androgen receptor: a potential mineralocorticoid receptor function:
target for therapy of prostate cancer, lessons from other systems, Mol.
Steroids 2004, 69, 79-85. Cell. Endocrinol. 2004, 217, 151-165.
113. J . Rosen, A. Negro-Vilar, Novel, 120. J.D. Norris, L.A. Paige, D.J.
non-steroidal, selective androgen Christensen, C.Y. Chang, M.R.
receptor modulators (SARMs) with Huacani, D. Fan, P.T. Hamilton,
anabolic activity in bone and muscle D.M. Fowlkes, D.P. McDonnell,
and improved safety profile, 1. Peptide antagonists of the human
Musculoskelet. Neuronal Interact. estrogen receptor, Science 1999, 285,
2002,2,222-224. 744-746.
114. J.P. Berger, A.E. Petro, K.L. Macnaul, 121. M.A. Iannone, C.A. Simmons, S.H.
L. J. Kelly, B.B. Zhang, K. Richards, Kadwell, D.L. Svoboda, D.E.
A. Elbrecht, B.A. Johnson, G. Zhou, Vanderwall, S.-J. Deng, T.G. Consler,
T.W. Doebber, C. Biswas, M. Parikh, J . Shearin, J.G. Gray, K.H. Pearce,
N. Sharma, M.R. Tanen, Correlation between in vitro peptide
G.M. Thompson, J. Ventre, binding profiles and cellular activities
A.D. Adams, R. Mosley, R.S. Sunvit, for estrogen receptor modulating
D.E. Moller, Mol. Endocrinol. 2003, compounds, Mol. Endocrinol. 2004,
17,662-676. 18,1064-1081.
115. M. Downes, M.A. Verdecia, A.J. 122. K.H. Pearce, M.A. Iannone, C.A.
Roecker, R. Hughes, J.B. Hogenesch, Simmons, J.G. Gray, Discovery of
H.R. Kast-Woelbern, M.E. Bowman, novel nuclear receptor modulating
J.L. Ferrer, A.M. Anisfeld, P.A. ligands: an integral role for peptide
Edwards, J.M. Rosenfeld, J.G.
interaction profiling, Drug Discov.
Alvarez, J.P. Noel, K.C. Nicolaou,
Today 2004, 9, 741-751.
R.M. Evans, A chemical, genetic, and
123. F. Stossi, D.H. Barnett, J. Frasor,
structural analysis of the nuclear bile
B. Komm, C.R. Lyttle, B.S.
acid receptor FXR [see comment],
Katzenellenbogen, Transcriptional
Mol. Cells 2003, I I , 1079-1092.
profiling of estrogen-regulated gene
116. E.M. Quinet, D.A. Savio, A.R.
expression via estrogen receptor (ER)
Halpern, L. Chen, C.P. Miller,
P. Nambi, Gene-selective modulation
alpha or ERbeta in human
by a synthetic oxysterol ligand of the osteosarcoma cells: distinct and
liver X receptor, J . Lipid Res. 2004, 45, common target genes for these
1929-1942. receptors, Endocrinology 2004, 145,
117. B. Miao, S. Zondlo, S. Gibbs, 3473-3486.
D. Cromley, V.P. Hosagrahara, T.G. 124. M. Kian Tee, I. Rogatsky,
Kirchgessner, J. Billheimer, C. Tzagarakis-Foster, A. Cvoro, J. An,
R. Mukherjee, Raising HDL R.J. Christy, K.R. Yamamoto, D.C.
cholesterol without inducing hepatic Leitman, Estradiol and selective
steatosis and hypertriglyceridemia by estrogen receptor modulators
a selective LXR modulator, /. Lipid differentially regulate target genes
Res. 2004,45,1410-1417. with estrogen receptors alpha and
118. S . Folkertsma, P. van Noort, J. Van beta, Mol. Biol. Cell 2004, 15,
Durme, H.J. Joosten, E. Bettler, 1262-1272.
W. Fleuren, L. Oliveira, F. Horn, 125. P.E. Young, D.K. Bol, High-
J . De Vlieg, G. Vriend, A throughput transcriptional profiling
family-based approach reveals the for drug discovery and lead
15 Target Families
932
I development, Genet. Eng. News response elements, Mol. Endocrinol.
2003, 23. 2002, 16,1269-1279.
126. M. Huber, I. Bahr, J.R. Kratzschmar 128. V.X. Jin, Y.W. Leu, S. Liyanarachchi,
A. Becker, E.C. Muller, P. Donner, H. Sun, M. Fan, K.P. Nephew, T.H.
H.D. Pohlenz, M.R. Schneider, Huang, R.V. Davuluri, Identifying
A. Sommer, Comparison of estrogen receptor alpha target genes
proteomic and genomic analyses of using integrated computational
the human breast cancer cell line genomics and chromatin
T47D and the antiestrogen-resistant immunoprecipitation microarray,
derivative T47D-r, Mol. Cell. Nucleic Acids Res. 2004, 32,
Proteomics 2004, 3, 43-55. 6627-6635.
127. M. Podvinec, M.R. Kaufmann, 129. H. Escriva, S. Bertrand, V. Laudet,
C. Handschin, U.A. Meyer, The evolution of the nuclear receptor
NUBIScan, an in silico approach for superfamily, Essays Biochem. 2004,
prediction of nuclear receptor 40,ll-26.
15.4
The GPCR - 7TM Receptor Target Family
Edgar Jacoby, Rochdi Bouhelal, Marc Gerspacher, and Klaus Seuwen
Outlook
Chemical biology approaches have a long history in the exploration of the
G-protein-coupled receptor (GPCR) family, which represents the largest
and most important group of targets for therapeutics. The analysis of
the human genome revealed a significant number of new members with
unknown physiological functions, which are today the focus of many
reverse pharmacology drug discovery programs. As the seven hydrophobic
transmembrane segments are a defining common structural feature of these
receptors, and as signaling via heterotrimeric G-proteins is not demonstrated
in all cases, these proteins are also referred to as seven transmembrane
(7TM) or serpentine receptors. This chapter will summarize important historic
milestones of GPCR research, from the beginning when pharmacology
was mainly descriptive, to the age of modern molecular biology with the
cloning of the first receptor, and now the availability of the entire human
GPCR repertoire at the sequence and protein levels. The chapter will
show how GPCR-directed drug discovery was initially based on the careful
testing of few specifically made chemical compounds and is today pursued
with modern drug discovery approaches, including combinatorial library
design, structural biology, and molecular informatics, as well as advanced
screening technologies for the identification of new compounds activating or
inhibiting GPCRs specifically. Such compounds, in conjunction with other
new technology, allow us to study the role of receptors in physiology and
medicine, and hopefully result in novel therapies. We will also outline how
basic research on the signaling and regulatory mechanisms of GPCRs is
advancing, leading to the discovery of new GPCR-interacting proteins, and
thus opening new perspectives for drug development. Practical examples
from GPCR expression studies, high-throughput screening (HTS), and the
design of monoamine-related GPCR-focused combinatorial libraries illustrate
ongoing GPCR chemical biology research. Finally, we will attempt to outline
future progress that may relate today’s discoveries to the development of new
medicines.
15.4.1
Introduction
G-protein-coupled receptors (GPCRs) are the largest known gene superfamily
of the human genome. Around 30% of all marketed prescription drugs act on
GPCRs (Fig. 15.4-1); in addition, they include around 30% of all targets
investigated until now, which makes this class of proteins historically the
most successful therapeutic target family [1]. As illustrated in
Table 15.4-1, GPCR-directed drugs cover a wide range of therapeutic
indications [1, 2].
ISBN: 978-3-527-31150-7
Fig. 15.4-1 Chemical structures of top-selling GPCR drugs listed in
Table 15.4-1. Structures shown: sumatriptan (5-HT1 agonist), olanzapine
(mixed 5-HT2/D1/D2 antagonist), fexofenadine (H1 antagonist), gabapentin
(GABA agonist), salmeterol (β2 agonist), valsartan (AT1 antagonist),
risperidone (mixed 5-HT2/D2 antagonist), clopidogrel (P2Y12 antagonist),
famotidine (H2 antagonist), and leuprorelin (LH-RH agonist).
Table 15.4-1 Examples of top-selling GPCR drugs (source: IMS Knowledge
Link). GPCR drugs cover many therapeutic indications and represent a
substantial part of today's marketed medicines. Reported world sales are
for the 12 months ending with Q1 2005.

Trade name        Molecular entity  Company               Therapeutic indication  World sales ($ millions)
Allegra/Telfast®  Fexofenadine      Sanofi-Aventis        Allergies               1792
Diovan®           Valsartan         Novartis              Hypertension            2214
Gaster®           Famotidine        Yamanouchi            Gastric ulcer            656
Imigran           Sumatriptan       GlaxoSmithKline       Migraine                1454
Leuplin/Lupron®   Leuprorelin       Takeda/Abbott         Cancer                   904
Neurontin®        Gabapentin        Pfizer                Neurological pain       2480
Plavix®           Clopidogrel       Bristol-Myers Squibb  Stroke                  5277
Risperdal®        Risperidone       Johnson & Johnson     Schizophrenia           3716
Serevent®         Salmeterol        GlaxoSmithKline       Asthma                   679
Zyprexa®          Olanzapine        Eli Lilly             Schizophrenia           4905

“Before cloning”, GPCRs were originally defined as receptors transducing
signals from the extracellular compartment to the interior via biochemical
processes involving GTP-binding proteins. Molecular cloning of the first
receptor genes suggested protein structures similar to rhodopsin, with seven
transmembrane α-helical domains (hence also “7TM receptors”). Today,
GPCRs are known as extremely versatile receptors for extracellular messengers
as diverse as biogenic amines, purines and nucleic acid derivatives, lipids,
peptides and proteins, odorants, pheromones, tastants, ions like calcium and
protons, and even photons in the case of rhodopsin. GPCRs can form homo-
and heterodimers, as well as complex receptosomes, which, depending on the
case, can incorporate additional intra- and extracellular soluble
and transmembrane proteins [3, 4].
As illustrated in Fig. 15.4-2, three main families of human GPCRs are
known. The rhodopsin-like family A is the largest and the best studied from
the structural and functional points of view. The other two main subfamilies
are the secretin-like receptor family B, which binds several neuropeptides and
other peptide hormones, and the metabotropic glutamate receptor (mGluR)-like
family C. A separate group is constituted by the receptors of the
frizzled family, for which the direct coupling to heterotrimeric G-proteins is
still a matter of debate [5].
The human GPCRs have recently been reclassified using phylogenetic
analyses into five different groups named GRAFS, which is the acronym
for the groups: glutamate, rhodopsin, adhesion, frizzled/taste2, and secretin
[9]. The GRAFS system shows some distinct differences to the classification
given above: (a) the adhesion receptors, which are expressed on leukocytes
and in the central nervous system, are formed by secretin-like receptors
that have a long N-terminal domain including adhesion molecule repeats
like epidermal growth factor (EGF) domains and are likely involved in
cell-cell interactions and (b) the taste receptors were reclassified into
two subgroups, one within the glutamate group and one within the
frizzled/taste2 group. While the GRAFS classification is useful, in this
chapter for historic reasons we will maintain the A, B, C nomenclature, as
described above.

Fig. 15.4-2 Classification of human GPCRs. As described in greater detail
below, the human genome contains 720-800 genes coding for functional GPCRs.
On the basis of their primary sequence, these genes were historically
classified into three main families, A, B, C or 1-3, respectively [6].
Sequences within each family generally share over 25% sequence identity in
the 7TM core region. The rhodopsin-like family A is by far the largest
subgroup and contains the opsins, the olfactory GPCRs, small
molecule/peptide hormone GPCRs, and glycoprotein hormone GPCRs. Family A
GPCRs are characterized by several highly conserved amino acids in the 7TM
bundle, and there is usually a disulfide bridge linking extracellular loops
E1 and E2 (there are only a few exceptions, including the
melanocyte-stimulating/adrenocorticotropic hormone (MSH/ACTH) and the
cannabinoid receptors). Most of the family A receptors have a palmitoylated
cysteine in the intracellular C-terminal tail. The binding sites of the
endogenous small molecule hormone ligands of class A GPCRs are located
within the 7TM bundle ((a), the ligand-binding site is indicated in orange).
For peptide and glycoprotein hormone receptors (respectively, (b) and (c)),
binding occurs via the N-terminal, the extracellular loop segments, and the
superior parts of the TM helices. Family B comprises 50 GPCRs for peptides
like secretin, calcitonin, and parathyroid hormone. The family B GPCRs are
characterized by a relatively long N-terminal tail, which together with the
juxtamembrane 7TM parts is implicated in ligand binding and contains a
network of three conserved disulfide bridges defining a globular domain
structure (d); the 3D structure of the N-terminal ligand-binding domain of
the mouse CRF2β receptor was recently determined by high-resolution NMR
spectroscopy. As in family A, the family B receptors show a number of
conserved proline residues within the TM segments, which are thought to be
essential for the conformational dynamics of the receptors. Family B
receptors appear to couple preferentially to activate the effector adenylate
cyclase through the G-protein Gs, and in general to a lesser extent to Gq
and Gi [7]. Family C GPCRs include the mGluR, the γ-aminobutyric acid type B
(GABAB) and Ca2+-sensing receptors (CaR). This group has 17 members in the
human genome, including notably the pheromone receptors, which form a small
family in humans, but a much larger one in rodents. The majority of family C
receptors are characterized by very large N- and C-terminal tails, a
disulfide bridge connecting the first and second extracellular loop,
together with a very short and well conserved third intracellular loop (e).
A number of the strongly conserved residues of class A GPCRs are also
strongly conserved in class C GPCRs; this is consistent with class A and
class C receptors having a common ancestor. The ligand-binding site is
located in the N-terminal domain which is composed of the so-called Venus
flytrap module (VFTM) that shares sequence similarity with bacterial
periplasmic amino acid-binding proteins. In all class C GPCRs, except the
GABAB receptor, a cysteine-rich domain (CRD) containing nine conserved
cysteines links the VFTM to the 7TM domain. For mGluR1, the VFTM domain was
crystallized in the liganded and unliganded state and was shown to form a
disulfide-linked homodimer undergoing considerable reorganization upon
ligand binding [8]. The 11 human frizzled/smoothened receptors control cell
development and proliferation mediated by secreted glycoproteins called Wnt
and Hedgehog. The N-terminus contains a CRD ligand-binding domain with 10
conserved cysteines, all of which form disulfide bonds. The names frizzled
and smoothened refer to specific Drosophila phenotypes that are linked to
mutations in the Drosophila orthologs. The N-terminal domains of GPCRs
contain, in general, N-glycosylation sites for posttranslational
modification, which ensure correct folding in the endoplasmic reticulum and
cell-surface expression.

In the last decades, several GPCR subfamilies were explored systematically in
such a way that today selective ligands and drugs are known for a large number
of receptors of these families [10]. The elucidation of the human genome with
the discovery of the sequences of many novel orphan GPCRs with unknown
functions provided the basis for further systematization of the exploration
of the GPCR superfamily for drug discovery. Because of the evolutionarily
conserved commonalities existing inside a homogeneous subgroup of GPCRs,
especially for aspects of molecular recognition, it is a very rational expectation
that through further focus within subfamilies it will be possible to find ligands
of the new receptors and to discover innovative medicines [11, 12].
This chapter will summarize the milestones of GPCR research and show how
modern chemical biology disciplines and discovery technologies are currently
used to explore this highly important target family and to contribute to new
and better medicines.
15.4.2
History/Development
Given the unparalleled significance of GPCRs for medicine, the history of
GPCR chemical biology is in principle as old as the history of
pharmacology [13]. Since the beginning of the twentieth century,
pharmacologists like Ariens, Furchgott, Schild, Black, and others
investigated animal models, isolated organs, and tissues to study the
dose-dependent activity of neurotransmitters and peptide hormones as well as
natural and synthetic drugs. The targets for most of these
molecules later turned out to be GPCRs and ion channels. Many essential
concepts like the binding site and receptor theory, the definitions of agonists
and antagonists, affinity and efficacy, as well as the usage of radioligands for
binding studies and receptor quantification, were established (Table 15.4-2).
Several methods emerged to analyze quantitatively the dose response of
compounds.
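The quantitative ideas mentioned above (full versus partial agonism, and the
action of a competitive antagonist) can be made concrete with a short
calculation. The following is a minimal sketch, not a method taken from this
chapter: it assumes the standard Hill equation for the agonist response and
the Gaddum relation for the rightward shift of the curve caused by a
competitive antagonist. The function names and all numerical values (EC50,
KB, concentrations) are hypothetical and chosen only for illustration.

```python
"""Illustrative dose-response calculations (all numbers are made-up examples)."""


def hill_response(conc, ec50, emax=1.0, hill=1.0):
    """Response to an agonist at concentration `conc` (Hill equation).

    emax is the maximal response relative to the system maximum:
    1.0 models a full agonist, a value below 1.0 a partial agonist.
    """
    if conc <= 0.0:
        return 0.0
    return emax * conc**hill / (ec50**hill + conc**hill)


def apparent_ec50(ec50, antagonist_conc, kb):
    """EC50 shifted by a competitive antagonist (Gaddum relation).

    kb is the antagonist's equilibrium dissociation constant; the factor
    (1 + [B]/KB) is the dose ratio used in Schild analysis.
    """
    return ec50 * (1.0 + antagonist_conc / kb)


if __name__ == "__main__":
    ec50 = 1e-8        # mol/L, hypothetical agonist potency
    kb = 1e-9          # mol/L, hypothetical antagonist affinity
    antagonist = 1e-8  # mol/L, fixed antagonist concentration

    print("log[A]  full  partial  full+antagonist")
    for log_conc in range(-10, -4):
        conc = 10.0 ** log_conc
        full = hill_response(conc, ec50)
        partial = hill_response(conc, ec50, emax=0.4)
        blocked = hill_response(conc, apparent_ec50(ec50, antagonist, kb))
        print(f"{log_conc:6d}  {full:.2f}  {partial:7.2f}  {blocked:15.2f}")
```

In this simplified picture a partial agonist reaches a lower plateau than a
full agonist, whereas a competitive antagonist leaves the plateau unchanged
and only moves the curve to higher agonist concentrations. Real tissue
responses also depend on receptor reserve and signal amplification, which the
sketch does not model.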
The molecular nature of the receptors remained, however, unrevealed
long after pioneers of biochemistry, including Krebs, Rodbell, and Gilman,
working on adrenoceptors, had discovered important elements of the
signaling cascade in the 1960s and 1970s [15, 16]. The early milestones for the
elucidation of the signaling cascades, which couple hormones via the receptors
to the intracellular effector proteins, included the discovery of cyclic adenosine
monophosphate (cAMP) by Sutherland as the first characterized second
messenger [17]; the enzyme adenylate cyclase responsible for its synthesis;
and heterotrimeric G-proteins as transducers. Intracellular free calcium and
inositol phosphates were later characterized as further second messengers,
and phospholipases, kinases, and ion channels emerged as important effector
systems downstream of GPCR activation. The list of effectors is ever expanding
(see Fig. 15.4-3) [18, 19].

Table 15.4-2 General pharmacological terms used in this chapter
to describe compound action at the GPCRs. Adapted from the recommendation
of the IUPHAR Committee on Receptor Nomenclature and Drug Classification [14].

Term: Definition

Receptor: A cellular macromolecule, or an assembly of macromolecules, that is
concerned directly and specifically in chemical signaling between
and within cells. Combination of a hormone, neurotransmitter,
drug, or intracellular messenger with its receptor(s) initiates a
change in cell function.

Agonist: A ligand that binds to a receptor and alters the receptor state
resulting in a biological response. Conventional agonists increase
receptor activity. Full agonists stimulate the maximum response
capacity of the system; partial agonists do not reach the maximum
response capacity. The designation of full versus partial agonist is
system dependent, and a full agonist for one tissue or measurement
may be a partial agonist for another.
Inverse agonists reduce the constitutive biological response.
Nonendogenous agonists may combine either with the same site as the
endogenous agonist (primary or orthosteric site), or with a different
site on the receptor (allosteric or allotopic site).

Antagonist: A drug that reduces the action of another drug, generally an
agonist. Many antagonists act at the same receptor macromolecule as the
agonist.
In competitive antagonism, the binding of the agonist and antagonist
is mutually exclusive, either because the agonist and antagonist
compete for the same binding site or combine with spatially adjacent
and overlapping binding sites (syntopic interaction); a third
possibility is that different binding sites are involved but they
influence the macromolecule in such a manner that simultaneous
binding is impossible.

Allosteric (allotopic) modulator: A ligand that increases or decreases the
action of a (primary or orthosteric) agonist or antagonist by combining with
a distinct (allosteric or allotopic) site on the receptor macromolecule.

Desensitization: Decline in the response to continuous or repeated
application of agonist.

Long before the GPCR proteins were isolated and sequenced, many
important therapeutic classes were successfully introduced into the clinics,
including the β-blockers, antihistaminics, anticholinergics, analgesic opiates,
and neuroleptics [20]. These compounds were developed from discovery to
market very rapidly and were successful in the pharmaceutical industry. The
sales provided funds to fuel further research in the field. A critical success
factor for their discovery was the existence of relatively well established
knowledge of the physiology of the related hormone, and that new chemical
compounds were systematically tested in biological models of multiple disease
areas in parallel, allowing a rather complete understanding of their mode of
action. Binding profiles of drugs and reference compounds were generated
on membrane preparations from different organs, leading to the first clear
evidence for receptor subtypes expressed in different tissues.

Fig. 15.4-3 Classical GPCR signaling. Receptors couple to heterotrimeric
G-proteins to regulate a variety of cell responses. Agonist binding at the
receptor leads to exchange of G-protein bound GDP to GTP. The activated
heterotrimer dissociates into the α-subunit (symbolized as α*) and the
βγ-dimer, both of which have an independent capacity to signal forward
through the activation or inhibition of effectors. Hydrolysis of GTP to GDP
leads to signal termination and reassociation of the heterotrimer;
regulators of G-protein signaling (RGS) proteins enhance the intrinsic
GTPase activity of the Gα subunit. Some G-protein subunits and effectors are
expressed ubiquitously, others only in specific tissues. The 16 mammalian
G-protein α-subunits fall into four broad families based on primary
structure and the dependent signaling cascade. The stimulatory Gαs family
couples to adenylate cyclase to cause an increase in intracellular cAMP
levels. The eight members of the Gαi/o family inhibit adenylate cyclase and
trigger other signaling events. The three members of the Gαq/11 family
activate phospholipase Cβ (PLCβ) resulting in the intramembrane hydrolysis
of phosphatidylinositol-4,5-bisphosphate (PIP2) to
inositol-1,4,5-trisphosphate (IP3) and diacyl glycerol (DAG); DAG increases
the activity of protein kinase C (PKC) and IP3 triggers the release of Ca2+
ions from intracellular stores. Finally, the two members of the Gα12/13
family regulate Rho proteins. Gβγ-dimers are combinations of five known
isoforms of the Gβ subunit and 13 known isoforms of the Gγ subunit. Each
individual isoform can associate with a set of effectors and regulators.
Gβγ-dimers signal to a large number of effectors including ion channels,
phospholipases, phosphoinositide kinases, and the ras/raf/extracellular
signal-regulated kinase (ERK) pathways. Examples of effectors include:
G-protein regulated inward rectifying K+ channel (GIRK1-4),
voltage-dependent Ca2+ channels (VDCC), phospholipase A2 (PLA2), PLCβ and
Na+/H+ exchanger (NHE1). The specific function of individual Gβγ-dimers is
not fully explored. A single GPCR can activate more than one type of
G-protein. For further detail see Refs. [18, 19].

The development of new protein chemistry technologies, like affinity labeling
and affinity chromatography procedures, allowed access to enriched and
purified sources of receptors and finally introduced the molecular age of
GPCR research. Having access to a broad range of adrenergic ligands and by
coupling the new affinity chromatography procedures with more conventional
chromatographic procedures, the Lefkowitz group was the first to purify the
β2-adrenoceptor in 1979 [21]. The proof-of-concept experiment showing that the
purified β2-adrenoceptor protein is indeed the functional receptor was
achieved by reconstitution experiments in phospholipid vesicles with purified
G-protein and the catalytic moiety of adenylate cyclase [22]. The progress
in molecular cloning techniques provided access to the DNA sequence of
the receptors. Microsequencing of small peptide stretches obtained from
the purified adrenoceptors enabled the design of oligonucleotide probes,
allowing, in 1986, Merck Research Laboratories to clone the gene and cDNA
encoding the hamster β2-adrenoceptor by using a genomic cDNA library
and by identifying overlapping clones that encoded all the peptide stretches
defining the full sequence [23].
The cloning of the β2-adrenoceptor was a historic breakthrough and catalyzed
molecular GPCR research. The analysis of the sequence revealed the homology
to bovine rhodopsin, which, since the beginning of the 1980s, had been a model
system for the study of membrane proteins and the investigation of the
molecular basis of vision. Given its remarkably easy access from retinal rod
preparations, the sequence of bovine rhodopsin was determined in 1982 using
conventional protein sequencing by Ovchinnikov, and cloned in 1983 [24]. A
structure-function relationship was established to bacteriorhodopsin, the
photon-driven and retinal-binding proton pump from the purple membrane
of Halobacterium halobium, for which Henderson and Unwin, in 1975, had
already determined a 7TM topology using electron microscopy techniques
[25]. Given that the investigation of the signaling mechanisms of rhodopsin
had revealed its linkage to the G-protein transducin, the knowledge of the
β2-adrenoceptor sequence and signaling mechanism consolidated the view
that rhodopsin would provide an ideal model system for other GPCRs. The
speculation about the existence of a large family of such receptors, with the
7TM arrangement being a characteristic fold, was then confirmed in the following
years by the successful cloning of essentially all monoamine GPCRs and several
peptide class A GPCRs, all showing the characteristic 7TM signatures in the
hydrophobicity plot analyses.
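The hydrophobicity plot analysis mentioned above can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale, a sliding window of 19 residues, and a cutoff of 1.6 for flagging candidate membrane-spanning stretches; these are commonly used heuristic settings chosen here for illustration, not necessarily those used in the cited cloning studies. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    # Running sum so that each step only adds one residue and drops one.
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Return 0-based start indices of windows whose mean exceeds the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, value in enumerate(profile) if value > threshold]


if __name__ == "__main__":
    # Toy sequence (assumption): a hydrophobic stretch flanked by charged residues.
    toy = "K" * 10 + "L" * 19 + "D" * 10
    print(candidate_tm_windows(toy))
```

For a real receptor sequence, seven well-separated groups of flagged windows would be the expected 7TM signature; the toy sequence above contains only a single hydrophobic stretch.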
To this end, cDNA libraries, prepared from cells or tissues known to be
rich in certain receptors, were screened by low stringency hybridization, or
were used for polymerase chain reaction (PCR) amplification of candidate
genes using degenerate primers. Proof of function was obtained after the
expression of the cloned receptor in heterologous cells, by measuring an
agonist response. In many cases, however, the functional identity of the
cloned receptors could not be matched, and receptors of unknown function
were identified. Since the end of the 1990s, such orphan GPCRs have been the
object of reverse pharmacology-based drug discovery programmes [26, 27]. The
successfully completed deorphanization projects resulted in relevant patent
and intellectual property claims for the inventors. The elucidation of the human
genome, in 2001, motivated additional projects in this direction, because almost
all members of the GPCR-7TM target family became visible at the DNA
sequence level, and advanced gene expression analysis and bioinformatics
methods became available for mining and classification purposes. Around
60 orphan receptors have since been paired with ligands, and progress was, in
many cases, achieved for entire subfamilies such as the trace amine receptors
or the endothelial differentiation gene (EDG) receptors.
Besides the systematic exploration in drug discovery programmes, other
branches of GPCR research focused on the detailed investigation of receptor
signaling and regulation. Generally, direct signaling via second messengers
resulting in immediate cell responses can be distinguished from the persistent
activation of gene expression in the nucleus. The discovery of mutations
conferring constitutive receptor activation led to the identification of
receptors that signal in the absence of agonist ligands and that were later
related to a number of diseases [28]. In Jansen’s disease [29], for example,
the hypercalcemia and skeletal dysplasia found in many cases are the result
of a constitutively overactive parathyroid hormone/parathyroid hormone
related protein (PTH/PTHrP) receptor carrying a point mutation. Studies of
the constitutive activity of receptors led to both the in vitro and in vivo
demonstrations of inverse agonism.
In the extended ternary complex model of receptor activation, inverse agonists
are ligands that preferably bind to and stabilize the inactive conformational
state of the receptor and therefore reduce background signaling [30]. Many
receptors show a weak constitutive activity in specific cell systems following
overexpression, and this can be used to determine the coupling mechanisms
engaged downstream.
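The effect of an inverse agonist on background signaling can be made concrete with a small numerical sketch. It uses the simple two-state receptor model (inactive R in equilibrium with active R*), which is a deliberate simplification of the extended ternary complex model named above because it omits the G-protein. The parameter names `l_const` (the ratio [R*]/[R] without ligand), `k_a` (ligand dissociation constant at the inactive state), and `alpha` (relative affinity for the active state) are assumptions introduced for this illustration, as are all numerical values.

```python
def fraction_active(ligand_conc, k_a, alpha, l_const):
    """Fraction of receptors in the active state in a two-state model.

    Species and their relative weights:
      R   -> 1
      AR  -> ligand_conc / k_a
      R*  -> l_const
      AR* -> alpha * l_const * ligand_conc / k_a
    alpha > 1 describes an agonist, alpha == 1 a neutral antagonist,
    and alpha < 1 an inverse agonist.
    """
    x = ligand_conc / k_a
    active = l_const * (1.0 + alpha * x)
    return active / (1.0 + x + active)


if __name__ == "__main__":
    # Hypothetical receptor with weak constitutive activity (l_const = 0.1).
    basal = fraction_active(0.0, k_a=1.0, alpha=1.0, l_const=0.1)
    for label, alpha in [("agonist", 100.0),
                         ("neutral antagonist", 1.0),
                         ("inverse agonist", 0.01)]:
        occupied = fraction_active(100.0, k_a=1.0, alpha=alpha, l_const=0.1)
        print(f"{label}: basal={basal:.3f}, with ligand={occupied:.3f}")
```

In this sketch the inverse agonist lowers the active fraction below the ligand-free level, the neutral antagonist leaves it unchanged, and the agonist raises it, which matches the qualitative description in the text.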
Using mutagenesis and chimeric receptors, the ligand-binding domains
and intracellular domains interacting with G-proteins and other effectors were
determined [31, 32]. Multiple signaling roles and signal switching mechanisms
were discovered for many GPCRs. For instance, the β2-adrenoceptor signals
on initial agonist binding via the Gs pathway. Protein kinase A (PKA)
mediated phosphorylation within the third intracellular loop switches the
signaling specificity toward Gi signaling pathways. A subsequent change of
the signaling properties occurs through G-protein-coupled receptor kinase
(GRK) mediated phosphorylation of the receptor C-terminal tail, resulting
in binding of β-arrestin proteins, which mediate receptor downregulation
via clathrin-coated pits. The internalized complexes subsequently undergo
regulated endosomal sorting either toward lysosomal degradation or toward
recycling back to the plasma membrane. β-Arrestin also acts as a scaffold
protein for other signaling pathways and recruits, for instance, the c-Src kinase
via the poly-Pro-SH3 domain, and thereby activates mitogen-activated protein
(MAP) kinase signaling. Also, G-protein independent signaling toward the
NHE1 ion exchanger was observed. This occurs via the Na+/H+ exchanger
regulatory factor (NHERF) protein interacting by its postsynaptic density-95,
discs large, zonula occludens-1 (PDZ) domain with the PDZ binding motifs
found at the C-terminus of several GPCRs [33].
The investigation of the mechanism of agonist-induced receptor signaling,
desensitization, internalization, trafficking, and recycling resulted in the
discovery of many proteins that interact with GPCRs and are collectively
called G-protein-coupled receptor interacting proteins (GIPs) [34, 35]. The GIPs
link GPCRs to large protein networks, called receptosomes, whose mechanistic
investigation and exploration for drug discovery is the subject of intense
research activity. We will elaborate more on this topic at the end of the chapter.
15.4.3
15.4.3.1 GPCRs in Human and Other Genomes

The human genome as well as genomes from several other species (mouse,
rat, zebrafish, Drosophila, Caenorhabditis elegans) are now relatively well
analyzed with respect to GPCRs, and these receptors constitute the largest gene
family in mammals. Depending on the stringencies of the different
bioinformatics data mining methods used, the most recent studies concluded
that there are 720-800 human GPCRs, accounting for around 2% of the human
genome. These include ca. 380 unique functional nonolfactory/nonsensory
GPCR sequences for which endogenous ligands are expected and which are
therefore referred to as endo-GPCRs [9, 36]. The endo-GPCR group has attracted
the most attention in recent years. These receptors are expressed in different
tissues and regulate various aspects of physiology. A recent comparative
investigation of the human and mouse endo-GPCR repertoire [36] revealed
367 human and 392 mouse GPCRs, of which 343 were found in common to both
species. The human receptors without orthologs in mice include several
orphan receptors, but notably also the melanin-concentrating hormone subtype
2 (MCH2) receptor and the recently identified receptor for the eosinophil
chemoattractant 5-oxo-eicosatetraenoic acid. Of the 362 human GPCRs, 284
belong to the rhodopsin-like class A, 50 to the secretin receptor-like class
B, 17 to the class C, and 11 to the frizzled-smoothened receptor-like class
F/S; and of the 387 mouse GPCRs, 313, 47, 17, and 10 belong to classes A,
B, C, and F/S, respectively. The cataloguing of these receptors according to
ligand specificities reported in the literature identified 224 human and 214
mouse GPCRs with known ligands. The remaining 138 human and 173 mouse
GPCRs have no known ligands and are therefore orphan receptors. Among
the orphan receptors, 98 human and 136 mouse receptors belong to class A,
34 human and 31 mouse receptors belong to class B, 6 receptors belong to
class C in both species, and none belong to class F/S.
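The class and orphan counts quoted above can be cross-checked with a short tally. This is only a bookkeeping sketch: the dictionary layout and function name are hypothetical, and the class C orphan count is taken as 6 for both species, which is the value that makes the per-class orphan counts add up to the stated totals of 138 and 173.

```python
# Counts as quoted in the text for the classified endo-GPCRs.
ENDO_GPCR_COUNTS = {
    "human": {
        "by_class": {"A": 284, "B": 50, "C": 17, "F/S": 11},
        "orphans_by_class": {"A": 98, "B": 34, "C": 6, "F/S": 0},
        "with_known_ligand": 224,
    },
    "mouse": {
        "by_class": {"A": 313, "B": 47, "C": 17, "F/S": 10},
        "orphans_by_class": {"A": 136, "B": 31, "C": 6, "F/S": 0},
        "with_known_ligand": 214,
    },
}


def summarize(species):
    """Return total, orphan, and liganded counts for one species."""
    entry = ENDO_GPCR_COUNTS[species]
    total = sum(entry["by_class"].values())
    orphans = sum(entry["orphans_by_class"].values())
    return {
        "total": total,
        "orphans": orphans,
        "liganded": total - orphans,
        # True when the derived liganded count matches the quoted one.
        "consistent": total - orphans == entry["with_known_ligand"],
    }


if __name__ == "__main__":
    for species in ("human", "mouse"):
        print(species, summarize(species))
```

The tally reproduces the totals of 362 human and 387 mouse receptors used in the class breakdown. These differ by five from the 367 and 392 receptors quoted for the overall repertoire, a difference the text does not explain.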
Olfactory receptor genes represent the largest mammalian subgroup. They
are class A receptors encoded by single exons, and they are transcribed in the
olfactory epithelium, where they interact specifically with the G-protein Golf
to transduce odorant signals. They provided the basis for the understanding of
odor recognition, for which Buck and Axel were awarded the 2004 Nobel Prize
in Physiology or Medicine [37]. For some olfactory receptors (e.g., the
prostate-specific gene receptor (PSGR)), expressed sequence tags (ESTs) were
picked up in peripheral organs; however, the significance of these findings
remains unclear at present. Especially for the human olfactory receptor family,
it is not yet entirely clear which of these receptors are functionally expressed,
as about 50% of the genes identified likely represent pseudogenes. In the
mouse family, the majority of olfactory receptors appear to be functional.
The annotation and functional characterization of olfactory receptors is
rapidly evolving and specific databases have been created that follow recent
developments “online” (see Table 15.4-3). In addition to olfactory receptors,
taste and pheromone receptors are identified as chemosensory.
The pheromone receptors play an important role in modulating behavior in
rodents; whether they are involved in human behavior is a matter of debate.
Pheromone receptors belong to class C. They are specifically expressed in the
vomeronasal organ in rodents, which is a specific structure separate from, but
in proximity to, the main olfactory epithelium. While there are more than 100
active receptors in the mouse, only 11 have been identified in humans, and
their ligands are unknown.
Taste receptors come in two families that are rather well conserved between
human and mouse. One group belongs to class C and has three members
(T1R1, T1R2, T1R3); these receptors form heterodimers like γ-aminobutyric acid
type B (GABAB) receptors, and the different entities formed are responsible for
detecting sugars and the amino acid glutamate. The second group of taste
receptors is class A-like (T2Rs) and comprises more than 30 receptors in humans,
which appear to be involved in detecting bitter tastes. All taste receptors are
expressed exclusively in the tongue, and there is a separation between cells
expressing T1- and T2-type receptors.
The opsins represent the highly interesting small family of light-detecting
GPCRs [38]. In addition to the four well known opsins operating in rod
and cone cells, there are four additional opsin-related receptors (retinal G
protein-coupled receptor (RGR) opsin, peropsin, melanopsin, encephalopsin)
that are likely to bind chromophores and appear to play interesting roles in
light sensing, outside the well-described primary phototransduction processes.
For instance, melanopsin may be involved in the control of circadian rhythms.
ESTs for encephalopsin were isolated from several tissues, including brain
and skin.
Table 15.4-3 Publicly available Internet molecular informatics resources
providing relevant information for GPCR chemical biology research

Internet resource URL
    Specification of GPCR-related information available

http://www.iuphar-db.org/iuphar-rd/index.html/
    Official database of the IUPHAR Committee on Receptor Nomenclature and
    Drug Classification; includes information on name synonyms, structure,
    functional assays, ligands, agonist and antagonist potencies, radioligand
    assays, transduction mechanisms, receptor distribution, tissue function,
    and phenotype.

http://kidb.bioc.cwr.edu/
    Database of the NIMH Psychoactive Drug Screening Program.
    Pharmacoinformatics system with a strong focus on GPCR pharmacology and
    profile structure-activity data.

http://www.gpcr.org/7tm/
    GPCRDB: information system of the CMBI in Nijmegen; contains information
    about sequences, multiple sequence alignments, phylogenetic trees, 3D
    models, GPCR mutation data, and ligand-binding constants.

http://bioinfo-pharma.u-strasbg.fr/gpcrdb/gpcrdb-form.html
    hGPCRdb: the human druggable GPCR database at the University Louis
    Pasteur of Strasbourg; provides searching capabilities for chemogenomics
    analyses of the 7TM and the binding cavity domains of human GPCRs.

http://senselab.med.yale.edu/senselab/ORDB/
    Olfactory Receptor Database of the SenseLab project at Yale University,
    which is a long-term effort to build integrated, multidisciplinary models
    of neurons and neural systems, using the olfactory pathway as a model. The
    database provides metadata of gene and protein sequences of olfactory
    receptors.

http://www-grap.fagmed.uit.no/GRAP/homepage.html/
    The GRAP database at the University of Tromso contains information on
    mutants of family A GPCRs with detailed description of the ligand-binding
    and signal transduction properties.

http://umber.sbs.man.ac.uk/dbbrowser/gpcrPRINTS/
    A diagnostic bioinformatics resource at the University of Manchester that
    profiles a query sequence against the PRINTS fingerprint database to
    determine the most similar families or receptor subtypes.

http://bioinformatics.biol.uoa.gr/PRED-GPCR/
http://www.soe.ucsc.edu/research/compbio/gpcr-subclass/
    Additional bioinformatics classifiers of GPCRs exist at the University of
    Athens and the University of California Santa Cruz, and are, respectively,
    based on Hidden Markov Model (HMM) and SVM methods.

http://chembank.med.harvard.edu/
http://pubchem.ncbi.nlm.nih.gov/
    ChemBank at Harvard University and PubChem at the NCBI are cheminformatics
    databases for small molecules and their biological activities. Both
    systems are supported by the NCI’s initiative for chemical genetics.

http://www.ebi.ac.uk/interpro/
    InterPro at EBI is a general bioinformatics database of protein families,
    domains, and functional sites in which identifiable features found in
    known proteins can be applied to unknown protein sequences.

The genome of the nematode C. elegans was the first to be sequenced
in full, followed by Drosophila shortly after. These very distantly related
organisms share with mammals the existence of receptor systems for
monoamines, acetylcholine, GABAB, glutamate, Wnt glycoproteins, and
several neuropeptides, suggesting their potential use as model organisms
to explore the biology of the conserved receptor systems [39].
Virally encoded GPCRs might have a direct role in human diseases. Indeed,
the GPCR from Kaposi’s sarcoma-associated herpesvirus has recently been
implicated in Kaposi’s sarcomagenesis, and the human cytomegalovirus-
encoded GPCRs have been implicated in atherosclerosis. Given the versatility
of GPCR signaling and its wide involvement in physiological processes, it
is not surprising that viruses have evolved to exploit these receptors to their
advantage [40].
15.4.3.2 Strategies for the Deorphanization of GPCRs

Deorphanization, the identification of activating ligands for previously orphan
receptors, is a key task in reverse molecular pharmacology. Identifying
receptor/agonist pairs usually allows the rapid elucidation of the physiological
role of both partners, sometimes putting them in unexpected context. Thus, the
identification of orexin unexpectedly led to an understanding of narcolepsy; the
discovery of pH-sensing receptors triggered new experimental approaches in
several areas of biology. Although bioinformatics methods were initially helpful
in successfully directing ligand-pairing experiments as illustrated by examples
given in Section 4.3.5, deorphanization strategies rely on biological screening
of orphan GPCRs expressed in specific recombinant expression systems, such
as immortalized mammalian cells, yeast, or Xenopus melanophores [26, 27].
The agonist ligand libraries used for deorphanization include small
molecules, peptides, proteins, and lipids or tissue extracts, which are
specifically selected as described in Section 4.3.4. The identification of an
activating agonistic ligand of the cell-surface expressed receptors is dependent
on the activation of an intracellular signaling cascade. The difficulty in the assay
design is that the signaling cascade is a priori not known for a new orphan
receptor. Generic assay systems amenable for high-throughput screening
(HTS), therefore, need to be designed to allow screening of large surrogate
ligand collections.
One of the most successful approaches to deorphanization uses the
fluorescent imaging plate reader (FLIPR) screening technology, which detects
ligand-induced intracellular Ca2+ mobilization. To direct the signaling via the
PLC Ca2+ readout, the receptors are transiently expressed in mammalian
cells in the presence of one or more cocktails of promiscuous G-proteins such
as Gα15/16, which couple to the majority of GPCRs [41], or of engineered
chimeric G-proteins like Gαqi5-6 or Gαqs5-6, in which five or six amino acids
of the C-terminus of Gαq have been replaced by the corresponding amino
acids of Gαi or Gαs to redirect coupling of Gαi- or Gαs-specific receptors via
phospholipase Cβ (PLCβ) [42]. Through mechanisms that are not yet fully
understood, prestimulation of some cell types with the agonists of Gq-coupled
receptors dramatically sensitizes these cells to stimulation by Gi- and
Gs-coupled receptors, again linking such receptors to the calcium signaling
system.
GPCRs have been successfully expressed in yeast and coupled to the
endogenous mating response pathway. Yeast-based assays use a variety of
stably expressed synthetic G-proteins, and the readout is linked, for instance,
to the expression of β-galactosidase or other reporter genes. The use
of Xenopus melanocytes (frog skin cells) for transfection with mutant orphan
GPCRs with increased constitutive activity represents an alternative to the
mammalian and yeast expression systems. In response to selective GPCR
signaling via Gαs or Gαq, the melanosomes disperse the melanin pigment
and cause darkening of the cells. Conversely, when signaling is via Gαi, the
melanosomes aggregate and cause lightening of the cells. Activation and
signaling can thus specifically be determined by simple measurements of
light transmittance. The so-called constitutively activating receptor technology
(CART), which is limited in compound throughput, provides the advantage of
identifying agonists and inverse agonists in the same experiment [43].
The validation of the hits includes testing of the possible interference with
endogenous receptors of the heterologous expression system and is followed by
selectivity screening on other GPCRs and further investigation on cell-based,
tissue or in vivo models. These experiments help determine the physiological
role of the newly discovered ligands and receptors.
There are limitations in such screening strategies, as the heterologous
expression systems may not provide a permissive context for signaling. For
example, the class B GPCR calcitonin receptor-like receptor (CRLR) requires
the presence of single-transmembrane-domain receptor activity modulating
proteins (RAMPs), which regulate the transport to the membrane and ligand
specificity properties of the receptor. Depending on the RAMP subtype,
CRLR can act as a calcitonin-gene-related peptide or as an adrenomedullin
receptor [44]. Other receptors are active only as heterodimers, as was first
demonstrated for the GABAB receptors, which require the coexpression of the
two partner receptor proteins.
The C5L2 receptor, which shares homology with the C3a and C5a
anaphylatoxin receptors, is currently thought to work simply as a ligand
sink without any classical signaling activity. Similarly, the chemokine receptor
D6 is thought to bind several chemokines with the only purpose being to
internalize and degrade them. Unfortunately, it is not possible at present
to predict with reasonable certainty from a receptor’s primary sequence
whether it will signal. This raises the possibility that other orphan
receptors either do not signal or use alternative G-protein independent
signaling pathways. These examples illustrate the need for the development
of novel screening and imaging technologies, reporting, for instance, on
the translocation of receptors or other proteins between subcompartments of
living cells using light resonance energy transfer based on either fluorescence
(FRET) or bioluminescence (BRET) [45, 46].
Other receptors, like viral GPCRs (e.g., ORF74 of Kaposi sarcoma-associated
herpesvirus), are highly constitutively active and function in the absence of
ligand, which raises the possibility that other ligandless orphans exist.
Still other orphan receptors might play roles only in intracellular
mechanisms, acting, for instance, as trafficking factors via heterodimerization,
or being expressed in the membranes of intracellular organelles; exogenously
applied nonmembrane-permeable ligands will not activate such receptors. The
correct plasma membrane localization of the orphan receptors studied should
therefore be verified using immunocytochemistry methods. There are several cases
where the reported receptor agonists may not be the physiological ones. For
instance, the receptors HM74 and HM74a respond to niacin (nicotinic acid), a
clinically useful molecule normalizing dyslipidemia, but the physiological first
messenger(s) remains to be discovered. In some cases, the original reports
describing new receptor-ligand pairings were not reproducible. For instance,
several years ago, the related receptors ovarian cancer G protein-coupled
receptor 1 (OGR1), G protein-coupled receptor 4 (GPR4), and T cell
death-associated gene 8 (TDAG8) were described as receptors for lipid
messengers. Later, it was demonstrated that OGR1, GPR4, and TDAG8 may in fact
be considered as genuine pH-sensing receptors [47].
15.4.3.3 Structural Biology of GPCRs and Molecular Modeling of Ligand-Receptor Interactions
Until the year 2000, when the first 2.8-Å crystal structure of bovine rhodopsin
was solved by Palczewski and coworkers, structural biological investigations
of GPCRs were limited to indirect mutagenesis and second generation affinity
labeling methods. The rhodopsin structure was later refined, in 2004, to 2.2-Å
resolution, and in total seven crystallographic conformational states are
deposited at the Protein Data Bank (PDB) [48]. The affinity labeling methods
were based on the substituted-cysteine accessibility method (SCAM), in which
sulfhydryl-reactive affinity reagents are combined with either wild-type or a
series of substituted-cysteine mutant receptors (e.g., D2 receptors). The first
3D molecular models were based on the analysis of the 2D projection maps
generated from cryoelectron microscopic data of 2D crystals of rhodopsin and
of the analogous bacteriorhodopsin, for which a 2.5-Å resolution X-ray
structure became available in 1997 using microcrystals grown in lipid cubic
phases [49]. The comparison of the 3D structures of rhodopsin and
bacteriorhodopsin clearly showed differences in the length of the loop and
helix segments and in the relative arrangement, tilts, and kinks of the
individual helices between the two proteins, which had already been inferred
from the 2D projection maps. While the early 3D models based on
the bacteriorhodopsin template were able to explain, to some degree, the data
generated from mutagenesis experiments [50, 51], the quality of these analyses
clearly improved when the 3D structure of bovine rhodopsin became
available. This applies especially to the class A GPCRs, which, although
having only a sequence similarity of 20-30% to rhodopsin, share characteristic
signature motifs in each TM helix [52]. The main ligand-binding site of small
molecule hormones and nonpeptidic agonists and antagonists is located within
the central crevice of the 7TM bundle, in analogy to the lipophilic binding
pocket of the retinal molecule in the light-sensing proteins. The similarity is
remarkable, especially the overlap between the positions of the proposed
ligand-contact residues and the positions of the retinal contact residues in
rhodopsin. The extracellular side involved in ligand binding appears to form
a receptor-specific binding site, while the cytoplasmic side and the ends of the
transmembrane helices toward the cytoplasm are significantly more conserved.
Illustrative examples include the work on the β2 adrenergic, serotonin
5-HT1A (see Fig. 15.4-4), neurokinin-1 (NK1), adenosine A3, purine P2Y1,
angiotensin AT1, and chemokine CCR2 receptors [2, 52-54], where the 3D models
helped in understanding detailed aspects of the observed structure-activity
relationships (SARs) based on the analysis of the ligand-receptor interactions
probed especially by two-dimensional mutagenesis experiments, that is,
experiments in which both the ligand and the receptor are simultaneously
modified according to the presumed nature of the specific molecular
interaction. Such experiments are expected to be of better quality than the
more frequent one-dimensional mutations of the receptor, whereby the described
effect on the binding might not exclusively result from a direct
ligand-receptor contact but also from long-range structural perturbations.
Such studies demonstrated that antagonists of small molecule hormone receptors
bind isosterically to the endogenous ligands, whereas nonpeptide antagonists
may bind rather differently from the peptide agonists. Mutation experiments in
combination with molecular modeling of ligand-receptor interactions were also
useful in understanding the species differences for ligand affinities and
specificities (e.g., NK1 antagonists in human and rat).
Fig. 15.4-4 Three ligand binding sites model for monoamine-related GPCRs
illustrated by a rhodopsin-based 3D model of the 5-HT1A receptor (left:
extracellular view; right: side view). We recently proposed a three binding
site hypothesis for the molecular recognition of ligands at monoamine GPCRs
by combining: (a) analyses of the architectures of known monoamine GPCR
ligands (see Fig. 15.4-9), (b) analyses of molecular models of the
ligand-receptor interactions, and (c) structural bioinformatics analyses of
the sequence similarities of the three distinct binding regions of "one-site
filling" ligand fragments within the monoamine GPCR family. For the 5-HT1A
receptor, which provided a template for the discussion of other related
ligand-GPCR interactions, mutagenesis studies map three spatially distinct
binding regions, which correspond to the binding sites of the "small, one-site
filling" ligands 5-HT (serotonin - yellow), propranolol (cyan), and
8-hydroxy-N,N-dipropylaminotetralin (8-OH-DPAT - green), respectively. All
three binding sites are located within the highly conserved 7TM domain of the
receptor and overlap at the residue Asp3.32 (D116) in TM3, which constitutes
the key anchor site for basic monoamine ligands. The three distinct binding
sites are also reflected by the architectures of known high-affinity ligands,
which cross-link two or three "one-site filling" fragments around a basic
amino group. For further detail see references [51, 53]. Throughout this
chapter the residue positions are number coded according to van Rhee and
Jacobson [32]: The first digit gives the transmembrane domain and the
following number indicates the position of the residue relative to position
50, which is arbitrarily attributed to the most conserved residue in each helix.

More prospectively, the models were used to provide a conceptual
framework for combinatorial library design strategies, as in the Novartis
and Biofocus chemogenomics knowledge-based approaches described below,
and for the optimization of the selectivity aspects of lead series. For instance,
modeling of the 5-HT2C receptor-ligand interaction in combination with
ligand-derived comparative molecular field analysis (CoMFA) was crucial for
the discovery and optimization of 5-HT2C/2B-selective indoline urea ligands
not targeting the 5-HT2A receptors [55]. Rhodopsin-based models of the
7TM domain were also instrumental to researchers at Novo Nordisk in
understanding the molecular recognition of privileged structures used in
generalized library design approaches, which provide a means to target orphan
GPCRs in the absence of the knowledge of the endogenous or surrogate ligand
[56]. Modeling the interactions of three sets of privileged motif-based ligands
with their receptors, including 2-aryl-indole based ligands in the serotonin
5-HT6 and melanocortin-4 (MC4) receptors, spiro-piperidine-indane based
ligands in the growth hormone secretagogue (GHS) and MC4 receptors, and
2-tetrazole-biphenyl based ligands in the AT1 and GHS receptors, showed the
correlation of conserved patterns of residues in the ligand-binding pockets
of the receptors with the recognition of specific privileged fragments. These
findings imply that any one particular privileged structure can target a specific
subset of receptors and that motif-based searches can be used for subsetting the
receptor repertoire including the orphan receptors. The models also showed
that only parts of the privileged structures are accommodated within the
conserved subpocket; some contacts are between substructure elements of the
952
I full privileged motif and the nonconserved part of the pocket, which suggests
15 Target Families
the possibility for design of selective ligands based on privileged motifs. A broad
spectrum of homology modeling techniques ranging from strict, template-
based methods to de novo prediction methods (e.g., the PREDICT method [57])
are used to build GPCR models. Although some reports suggest that rhodopsin
template-based approaches can be adapted to the entire GPCR repertoire
[%I, the underlying sequence alignments of such models must be carefully
investigated, which for some helices in some subfamilies are not obvious [9].
While most of the time these models neglect the long intracellular loops and
N- and C-terminal domains, some studies emphasized the role of the second
extracellular loop E2 in ligand specificity. In the bovine rhodopsin structure,
the E2 loop, which is bridged via a conserved disulfide link to the residue
Cys3.25 in top of TM3, covers parts of the central binding crevice in a lidlike
manner. One of the two ,!?-strandsthat defines the fold of the loop, contacts
directly with the retinal ligand. As the length of the loop varies significantly
within the class A family, general conclusions are difficult. Recently, it was
proposed on the basis of random saturation mutagenesis experiments of the
C5aR that the E 2 loop acts as a negative regulator of receptor activation and
stabilizes the nonsignaling receptor conformation in the absence of the agonist
ligand [59]. Also, the E2 loop has been implicated in ligand-ligand allosteric
interactions which were experimentally investigated by the SCAM approach
[60]. For instance, in the interaction of the muscarinic M1 receptor with the
allosteric modulator gallamine, an acidic sequence segment just before the
loop cysteine residue could be linked to these effects. A potential role of the E2
loop has also been reported in the allosteric effects of amiloride on the action
of antagonists of the α1A and α2A adrenoceptors and dopamine receptors.
Recently, the potential value of GPCR models for in silico screening
applications has become of interest. Using a 3D model of the NK1 receptor
generated by the modeling binding sites including ligand information explicitly
(MOBILE) approach, in combination with 2D and 3D database searches, novel
submicromolar NK1 antagonists were discovered [61]. As shown in another
study [62], models of the dopamine D3, muscarinic M1, and vasopressin V1a
receptors based on the rhodopsin template seemed to be of sufficient accuracy
to be useful (20- to 40-fold enrichment compared to random screening) in
protein-based virtual screening experiments. This procedure used standard
docking software like DOCK, FlexX, or GOLD and searched for GPCR
antagonists starting from antagonist-bound models shaped by minimizing a
manually docked antagonist in the binding site. The same procedure
was, however, not applicable when a single agonist ligand was used for
the binding site shaping step, indicating that the structural changes that can
be achieved by minimization to expand the binding site are not sufficient
for simulating the conformational changes occurring in receptor activation.
Instead, a multiagonist ligand pharmacophore-based receptor refinement
method needed to be used to generate useful models for agonist virtual
screening. Corroborative findings were described for models generated with
the PREDICT method and using the DOCK software in prospective virtual
screening for the D2, 5-HT1A, 5-HT4, NK1, and CCR3 GPCRs [63].
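The enrichment quoted above (20- to 40-fold compared to random screening) can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming the common definition of the enrichment factor as the hit rate in the top-ranked fraction divided by the hit rate in the whole library; the function name and the toy ranking are illustrative assumptions and are not taken from the cited studies.

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate in the top-ranked subset divided by the hit rate of the library.

    ranked_is_active: booleans ordered by docking rank (best score first).
    top_fraction: fraction of the ranked library that is inspected, e.g. 0.02.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, int(round(n_total * top_fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical toy library: 1000 compounds, 10 actives, 6 ranked in the top 2%.
ranking = [False] * 1000
for position in (0, 3, 5, 8, 12, 19, 150, 400, 700, 950):
    ranking[position] = True

ef = enrichment_factor(ranking, 0.02)
print(f"EF(2%) = {ef:.1f}")
```

In this toy case the top 20 compounds contain 6 of the 10 actives, so the enrichment is about 30-fold over random selection.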
Given especially the differences in the length of the intra- and extracellular
loops, the latter are expected to contribute to ligand entry, binding and/or
modulation especially for the peptide and protein binding GPCRs, and given
that the currently available inactive state rhodopsin structures can, at best,
be a reference for an antagonist state of related class A GPCRs, there are
many significant unknowns for the understanding of the structure-function
relationship of GPCRs. In this respect, the modeling and indirect structural
experiments of GPCRs also revealed the functional role of structural
microdomains as opposed to simply considering individual residues. An
important microdomain is the so-called DRY domain, which refers to a
conserved sequence patch at the cytoplasmic end of TM3 in class A GPCRs
and which also involves residues in TM2, TM6, and TM7 [64]. The overall
picture common to many class A GPCRs is that residue Arg3.50 is hydrogen
bonded to a carboxylate side chain at position Asp3.49 and to one or two residues
in TM6 equivalent to residues Glu6.30 and Thr6.34 in rhodopsin. Removal of
these interactions often results in constitutive activation of the receptor,
and based on this and the findings of analysis of structural intermediates
of the photocycle of rhodopsin, the emerging theory for receptor activation
suggests a mechanism involving a separation of the TM3 and TM6 domains
together with a twist in TM6, which pulls the third intracellular I3 loop
into the cell, uncovering residues related to G-protein coupling. Since the
DRY microdomain is not conserved in other GPCR families (exceptions are
some class C GPCRs), it may be concluded that the conformational changes
and signaling mechanisms are not strictly conserved. Importantly, as the
active conformations generated through constitutively activating mutations
and specific agonist ligands seem to be nonidentical, the concept of protean
ligands was defined by Kenakin to explain that each specific ligand-receptor
pair defines a functional entity with distinct signaling and functional properties
[65]. Obviously, this concept raises questions on the generality of the above
mentioned virtual screening studies for GPCR agonists.
Regarding class B and class C GPCRs, significantly fewer modeling studies
have been reported. For class B GPCRs, a general two-site model has emerged for
peptide binding [7]. In this mechanism, the C-terminal ligand region binds
the extracellular N-terminal domain of the receptor. This interaction acts as an
affinity trap, promoting the interaction of the N-terminal region of the ligand
with the juxtamembrane 7TM domain of the receptor. Molecular models
were, for instance, generated for the interaction of peptide agonists with the
CRF2 and PTH receptors, putting emphasis on α-helix recognition sites [66, 67].
Nonpeptide ligands bind the juxtamembrane or the N-terminal domain and, in
most cases, allosterically modulate peptide-ligand binding [7]. Also noteworthy
is the modeling work around the allosteric binding sites of the class C
Ca2+-sensing receptor (CaR) [68] and mGluR1 and mGluR5 receptors [69], where
site-directed mutagenesis and rhodopsin-based homology modeling showed a
novel antagonist binding site within the 7TM bundles, clearly separated from
the agonist binding site located in the N-terminal domains of these receptors.
Oligomerization of GPCRs appears to further contribute to the complexity
of the picture [70, 71], and recently a structural hypothesis was provided using
molecular modeling to describe how the G-protein transducin docks on to
dimer and tetramer oligomeric states of rhodopsin, revealing structural details
of this critical interface in the signal transduction process [72]. Biophysical
studies, using a combination of mass spectrometry after chemical
cross-linking together with neutron scattering in solution, of the leukotriene B4
BLT1 receptor, reconstituted with a heterotrimeric G-protein, sustain this
hypothesis by providing evidence for the overall assembly of a pentameric
complex formed by two BLT1 units and one trimeric G-protein [73].
Ultimately, it will require high-resolution structures of multiple receptors
bound to multiple ligands including agonist, inverse agonist, and antagonist,
coupled to G-proteins and other modulators to understand fully the confor-
mational dynamics of GPCRs. The development of systematic approaches for
X-ray and nuclear magnetic resonance (NMR) analysis of GPCR structures is
hence currently a major scientific challenge, which requires further progress in
the expression, purification, and crystallization of GPCRs and their interacting
proteins [74].
15.4.3.4 Designing Compound Libraries Targeting GPCRs

In recent years, the design of GPCR-directed compound libraries has become an
intense activity of drug discovery chemistry [75-77]. Generally, the design of
deorphanization libraries can be distinguished from targeted lead-finding
libraries. Given the broad chemical diversity of the hormones that are
recognized by GPCRs, deorphanization libraries try to cover as many as
possible known active chemical classes. The term surrogate agonist library
is also appropriate given that the purpose of these libraries is to find a
chemical compound that selectively activates a given orphan receptor of
interest [26, 27]. Typically, compounds identical or similar to previously
identified GPCR agonists are included along with approved drugs and other
reference compounds with known bioactivity, like primary metabolites (e.g.,
KEGG compound set), or commercially available compilations, like the Tocris
LOPAC, the Prestwick, or the Sial Biomol sets. In addition to high-performance
liquid chromatography (HPLC) fractionations of tissue extracts to identify new
peptides and metabolites, of interest are protein mimetic libraries including
β-turn/α-helix mimetics together with random or designed peptide libraries
based on the bioinformatics analysis of putatively secreted peptides and protein
hormones defined in the genome. Typically, the size of deorphanization or
surrogate sets is in the order of a few thousand well-characterized compounds
amenable for medium-throughput screening.
The design of lead-finding libraries follows the same molecular mimicry
principles and makes the best use of the substantial medicinal chemistry
knowledge generated during the last decades around GPCR compounds
together with modern concepts, including lead/drug likeness and computational
combinatorial library design [78, 79]. Although focused library design
concepts target, in general, the classical binding sites, design concepts of
bivalent ligands and allosteric ligands are expected to become more important
in the future, given the anticipated progress in the understanding of the GPCR
oligomerization phenomenon [80]. Divalent ligands selectively targeting δ-κ
opioid receptor heterodimers provide a recent example [81]. The general
experience with focused libraries and screening sets for GPCRs is very positive, and
hit rates of up to 1-10% can be expected with library sizes of 500-2500
compounds, when the libraries are designed toward new members with expected
conserved molecular recognition. Peptide and protein mimetic libraries
including β-turn/α-helix mimetics are recognized to be of central importance [82, 83]. A
number of important hormones, like angiotensin, bradykinin, cholecystokinin
(CCK), melanine stimulating factor (MSF), and somatostatin (SST) make their
key recognition via specific β-turn motifs. Others, like corticotrophin releasing
factor (CRF), PTH/PTHrP, neuropeptide Y (NPY), vasoactive intestinal peptide
(VIP), or growth hormone releasing factor (GHRF) interact via α-helix motifs
[7, 84]. While the design of organic druglike α-helix mimetics is still in its
infancy, the design of orally active β-turn mimetics based on organic druglike
scaffolds, or based on cyclic α-peptides or β/γ-peptides, has advanced to a quite
routine methodology. The work of Garland and Dean [85, 86], defining a set of
triangular distance constraints that the substitution points of a scaffold have
to satisfy to mimic the specific Cα atoms of the peptide template, provided a
generalized frame for the design of novel β-turn mimetic scaffolds and was,
in combination with database searches, successfully applied to the
design of CCK and SST antagonists [84].
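The idea of triangular distance constraints can be illustrated with a short check. This is a minimal sketch, assuming three template Cα positions and three scaffold substitution points given as 3D coordinates; the coordinates, the tolerance of 0.5 Å, and the function names are hypothetical and do not reproduce the published constraint set of Garland and Dean.

```python
from itertools import permutations
from math import dist


def triangle(points):
    """Return the three side lengths (a-b, b-c, a-c) of a point triple."""
    a, b, c = points
    return (dist(a, b), dist(b, c), dist(a, c))


def mimics_template(template_ca, scaffold_points, tol=0.5):
    """True if some assignment of scaffold substitution points to the template
    C-alpha atoms reproduces all three distances within tol (in angstroms)."""
    target = triangle(template_ca)
    for perm in permutations(scaffold_points):
        if all(abs(t - s) <= tol for t, s in zip(target, triangle(perm))):
            return True
    return False


# Hypothetical coordinates, for illustration only.
template = [(0.0, 0.0, 0.0), (5.5, 0.0, 0.0), (2.0, 5.0, 0.0)]
good_scaffold = [(1.0, 1.0, 1.0), (6.3, 1.0, 1.0), (3.1, 6.2, 1.0)]
small_scaffold = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.5, 2.5, 0.0)]

print(mimics_template(template, good_scaffold))   # distances agree within 0.5
print(mimics_template(template, small_scaffold))  # triangle is too small
```

Because only interpoint distances are compared, the check is independent of how the scaffold is oriented in space, which is what makes such constraints convenient for database searches.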
The use of privileged substructures or molecular master keys, whether
target class specific or mimicking protein secondary structure elements, is an
accepted concept in medicinal chemistry. The privileged structure approach
emphasizes the molecular scaffolds or selected substructures that are able
to provide high-affinity ligands (agonist or antagonists) for diverse receptors
and originates from work at Merck Research Laboratories on the design of
benzodiazepine-based CCK antagonists, where the previously known κ-opioid
Tifluadom was identified as a lead structure [87]. A number of recent literature
reviews provide impressive reference repertoires of empirically derived
privileged structures, most notably the spiropiperidines, biphenyltetrazoles,
benzimidazoles, and benzofurans [88-90]. The 2-aryl-indole scaffold illustrated
in Fig. 15.4-5 represents a particularly successful example and was shown at
Merck to generate actives for diverse class A GPCRs [91].
Fig. 15.4-5 Examples of GPCR active compounds based on the 2-aryl-indoles
privileged scaffold identified from a focused combinatorial library at Merck.
Screening of the library against several GPCRs led to the discovery of NPY5,
NK1, chemokine CCR3/CCR5, serotonin 5-HT2A/5-HT6, and SST receptor
antagonists [91]. Activities given for the depicted compounds: NPY5 (IC50 =
0.8 nM), NK1 (IC50 = 0.8 nM), two chemokine CCR receptors (1190 nM and
920 nM), 5-HT2A (IC50 = 10 nM), 5-HT6 (IC50 = 0.7 nM), and SST (Ki = 0.7 nM).

In the view of the above mentioned modeling of the ligand-receptor
interactions, the privileged structural classes will need to be analyzed further
to allow a more directed use of such libraries for specific receptor subsets.
The development of cheminformatic methods and procedures enabling the
automatic identification and extraction of privileged structures is especially
needed in the context of generating knowledge from HTS data [92]. On the
basis of the molecular framework approach developed by Bemis and Murcko
[93], we recently initiated a systematic analysis using reference compound
and target information. Using the framework analysis as implemented in the
Scitegic Pipeline Pilot software, we designed a data pipelining protocol that
generates frequency analysis based on the input of the various reference sets.
The approach is illustrated in Fig. 15.4-6 for the monoamine GPCRs.
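The frequency analysis described above can be sketched in a few lines. This is not the Pipeline Pilot protocol itself but a minimal Python illustration, assuming the molecular frameworks have already been generated and are supplied as strings; the framework strings and subtype assignments below are hypothetical examples, not data from the reference sets.

```python
from collections import Counter, defaultdict


def scaffold_target_matrix(records):
    """Build a framework x subtype count matrix from (framework, subtype) pairs.

    Returns rows of (framework, {subtype: compound count}, frequency), where
    frequency is the number of distinct subtypes addressed by the framework.
    Rows are sorted by decreasing frequency.
    """
    matrix = defaultdict(Counter)
    for framework, subtype in records:
        matrix[framework][subtype] += 1
    rows = [(fw, dict(counts), len(counts)) for fw, counts in matrix.items()]
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Hypothetical input: one record per reference ligand.
records = [
    ("c1ccc(cc1)N1CCNCC1", "5-HT1A"),
    ("c1ccc(cc1)N1CCNCC1", "D2"),
    ("c1ccc(cc1)N1CCNCC1", "5-HT1A"),
    ("c1ccc(cc1)N1CCNCC1", "alpha1A"),
    ("c1ccc2[nH]ccc2c1", "5-HT1A"),
    ("c1ccc2[nH]ccc2c1", "5-HT6"),
    ("C1CCNCC1", "H1"),
]

rows = scaffold_target_matrix(records)
for framework, counts, frequency in rows:
    print(frequency, framework, counts)
```

Frameworks that address many subtypes appear at the top of the sorted list and are the candidates for privileged structures.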
A different type of fragment-based design method called thematic analysis
was developed by researchers at Biofocus for the design of focused class
A GPCR libraries [77]. This knowledge-based method is comparable to a
method developed at Novartis, which is illustrated in Section 4.4.3 [53].
SARs were analyzed in detail across the whole class A GPCR family, and
family-activity relationships were used to develop a new classification process
on the basis of the pairing of sequence themes and ligand structural motifs.
A sequence theme is a consensus collection of amino acids within the central
binding cavity and a motif is a specific structural element binding to such
a particular microenvironment of the binding site. The analysis resulted in
a compilation of themes and motifs that, to date, are used at Biofocus to
generate focused discovery libraries and to increase the lead optimization
efficiency for these targets.

Fig. 15.4-6 Analysis of privileged scaffold-target matrix of monoamine GPCR
ligands. For each GPCR ligand assigned in the MDL Drug Data Report (MDDR)
database to a specific monoamine GPCR subtype, the Bemis-Murcko frameworks
were generated. The lists of frameworks were then combined and duplicates were
eliminated. The comprehensive list of unique frameworks defines the row vector of
the matrix, and the GPCR subtypes were arranged to the column vector. The matrix
elements were assigned by the number of compounds reported including a given
framework for a given subtype. In addition, for each framework the total number of
monoamine GPCR subtypes addressed were added and summarized in the frequency
column; the rows were then sorted by decreasing frequency. The structures of the
seven most represented frameworks (PS1-PS7) together with the addressed
monoamine GPCR subtypes are shown.

The individual compound libraries are targeting
subsets of GPCRs, including orphans, which share a predefined combination
of themes consisting of a central dominant theme and peripheral ancillary
themes. The library scaffold is designed such that it complements the central
theme and is amenable to incorporate a variety of structural motifs addressing
the individual sequence themes. Each library, consisting of approximately 1000
compounds, can thus be thought of as representing a number of predefined
themes, which are either present or absent in any given receptor, allowing
through such fingerprinting the computation of a library appropriateness score
for each receptor. Thematic analysis is also used to aid lead optimization by
the analysis of those themes, which may or may not be involved in the binding
of a particular hit molecule and the exploitation of new combinations of used
and unused themes to increase affinity and selectivity.
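The fingerprinting idea behind a library appropriateness score can be sketched as follows. The actual scoring used in thematic analysis is not given in the text, so the formula here is an assumption made purely for illustration: a receptor scores zero when the central theme of the library is absent, and otherwise the score is the fraction of all library themes (central plus ancillary) that are present in the receptor. Theme and receptor names are hypothetical.

```python
def appropriateness(library, receptor_themes):
    """Assumed score in [0, 1] for how well a library matches a receptor.

    library: dict with a "central" theme and a tuple of "ancillary" themes.
    receptor_themes: set of themes present in the receptor binding cavity.
    """
    if library["central"] not in receptor_themes:
        return 0.0  # the scaffold has no complementary central theme
    ancillary = library["ancillary"]
    matched = sum(1 for theme in ancillary if theme in receptor_themes)
    return (1 + matched) / (1 + len(ancillary))


# Hypothetical library and receptor theme fingerprints.
library = {"central": "T1", "ancillary": ("T4", "T7", "T9")}
receptors = {
    "receptor_a": {"T1", "T4", "T7"},
    "receptor_b": {"T2", "T4", "T7", "T9"},
    "receptor_c": {"T1", "T4", "T7", "T9"},
}

scores = {name: appropriateness(library, themes)
          for name, themes in receptors.items()}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, score)
```

Ranking receptors, including orphans, by such a score is one simple way to decide which library to screen against which target.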
Compared to the fragment-based approaches, several groups have developed
knowledge-based library design strategies which are, in principle, based on
Sir James Black's frequently quoted statement: “the most fruitful basis for the
discovery of a new drug is to start with an old drug”. The associated selective
optimization of side activities (SOSA) approach is an additional very successful
medicinal chemistry concept in which the atypical neuroleptics acting on a
couple of GPCRs simultaneously provide a relevant illustration of the rationale
[94]. The related computer-assisted drug design (CADD) methods make use
of selected reference compound sets and molecular descriptors together with
advanced cheminformatic methods to compare and rank the similarity of
designed candidate molecules [95, 96]. Homology-based similarity searching
was developed at Novartis as a cheminformatics similarity searching method
able to identify not only ligands binding to the same target as the reference
ligand(s) but also potential ligands of other homologous targets for which no
ligands are yet known [97]. The method is based on the Similog descriptor,
which describes molecules as counts of pharmacophore triplets formed by the
individual nonhydrogen atoms and uses a centroid of the reference compounds
to describe the distance to the candidate molecule. In a retrospective analysis,
the method was shown to be highly effective for monoamine GPCR and
became an essential tool for the compilation of focused screening sets.
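The centroid-based ranking described above can be illustrated with a small sketch. It assumes that each molecule has already been converted into counts of pharmacophore triplets; the triplet keys, the counts, and the use of a Euclidean distance are illustrative assumptions and are not the actual Similog encoding.

```python
from math import sqrt


def centroid(vectors):
    """Mean count per triplet key over all reference compounds."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0) for v in vectors) / len(vectors) for k in keys}


def distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))


# Hypothetical triplet counts for two reference ligands and two candidates.
references = [
    {"DAH": 2, "AAH": 1},
    {"DAH": 4, "AAH": 1, "DHH": 2},
]
candidates = {
    "cand_1": {"DAH": 3, "AAH": 1, "DHH": 1},
    "cand_2": {"AAH": 5},
}

center = centroid(references)
ranked = sorted(
    ((name, distance(vec, center)) for name, vec in candidates.items()),
    key=lambda item: item[1],
)
for name, d in ranked:
    print(f"{name}: {d:.3f}")
```

Candidates closest to the centroid of the reference set are ranked first and would be prioritized for a focused screening set.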
Related to the cheminformatics similarity searching methods are machine
learning methods, like artificial neural networks, Kohonen self-organizing
maps, and support vector machines (SVMs), which try to align the chemical
and biological spaces on the basis of mapping procedures [98]. The goal here
is to identify which parts (islands) of the chemical-property space correspond
to specific target family or therapeutic activities, and vice versa. A number of
groups have applied such methods for design of broad GPCR-focused libraries,
and more recently, to specifically distinguish between family subgroups class
A, B, and C GPCR ligands, or to identify specific GPCR ligands for the
adenosine A2A, cannabinoid, CRF, and endothelin GPCRs [99]. De novo
design methods are reported, in which, based on ligand-based pharmacophore
models and abstract feature tree representations of GPCR ligands, virtually
generated molecules are evaluated and proposed for synthesis [100, 101].
15.4.3.5 The Contribution of Molecular Informatics to GPCR Chemical Biology

Given the fast growing number of molecular data and information related
to GPCRs, the need for molecular information systems which integrate
bioinformatic and cheminformatic systems was recently recognized [11,
102]. The cross-linking of the chemical and biological GPCR knowledge
spaces via classification and annotation schemes is an essential element of
chemogenomics knowledge-based ligand design strategies, which are based
on the fact that similar ligands bind to similar targets. The systems allow the
compilation of relevant reference sets for cheminformatics-based similarity
searches and for the library design of target class focused collections; vice versa,
the ligand similarity principle can be used to infer putative molecular targets
of compounds of interest. Most of the systematically generated information
on GPCRs is today publicly accessible via the Internet and a selection of
relevant information sites is summarized in Tab. 15.4-3. In addition, a growing
number of chemogenomics knowledge-based companies, like Aureus Pharma,
Inpharmatica, GVKBio, Evolvus, and Jubilant Biosys are developing molecular
information systems, which integrate, in a comprehensive manner, GPCR
data from patents and selected literature together with chemical and biological
search engines.
Molecular information systems like the Cerep Bioprint Matrix or Iconix
DrugMatrix, summarizing the analysis of validated IC50 profiling data of drug
and development compounds on a panel of GPCR and other targets together
with absorption, distribution, metabolism, excretion, and toxicity (ADMET)
data, are becoming important for lead prioritization and design of safety
pharmacology studies. Currently, such data are used up front to identify
potential side effects for clinical investigations using both in vitro and in silico
testing [103-105].
Given the fast growing complexity of the knowledge around GPCRs and
their interacting effector and regulator proteins, opening many new potential
mechanisms for interaction with small drug molecules, the design of the data
models of the molecular information systems will need to evolve further to
enable integration and mining of knowledge within a broader system biological
and chemical genetic network concept space.
Bioinformatics analyses provide an essential contribution to GPCR chemical
biology. The investigation of sequence similarities through phylogenetic,
diagnostic fingerprint, or Hidden Markov Model (HMM) analyses is a
commonly used strategy to classify new orphan members and to facilitate
the identification of the endogenous ligands [106, 107]. Phylogenetic analyses
predicted, for instance, that sphingosine-1-phosphate (S1P), the endogenous
ligand of the EDG1 GPCR, is also the ligand of the EDG3, EDG5, EDG6,
and EDG8 GPCRs. Also, the ligand and the pharmacology of the human
histamine H4 GPCR was predicted through phylogeny, noting that it shares
only 26% identity with the histamine H1 receptor. Conversely, examples are
known where sequence homology can be misleading; for example, a receptor
originally known as P2Y7 (BLT1) was thought to be a nucleotide receptor based
on its similarity to P2Y purinoceptors, but it was shown to be activated by an
unrelated ligand, leukotriene B4.

A different type of bioinformatics analysis focuses on specific
sequence motifs and signatures, which may lead to different conclusions
than the analysis of the overall sequence identity. For instance, two orphan
receptors, GPR61 and GPR62, reported to have overall sequence identity of
30% to the human 5-HT6 receptor were thus classified as monoamine-like
receptors. Strikingly, both of them show mutations of the D3.32 residue and
should therefore belong to a different subfamily.
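A motif-based check of this kind is easy to automate once residues have been mapped to generic positions. The sketch below assumes a precomputed mapping from position labels such as 3.32 to one-letter residue codes; the receptor names and residues are hypothetical placeholders, not the actual GPR61 or GPR62 sequences.

```python
def lacks_asp_3_32(receptors):
    """Return the names of receptors whose residue at position 3.32 is not Asp."""
    return sorted(
        name
        for name, residues in receptors.items()
        if residues.get("3.32") != "D"
    )


# Hypothetical position-to-residue maps for illustration only.
receptors = {
    "monoamine_reference": {"3.32": "D", "3.50": "R"},
    "orphan_x": {"3.32": "S", "3.50": "R"},
    "orphan_y": {"3.32": "G", "3.50": "R"},
}

flagged = lacks_asp_3_32(receptors)
print(flagged)
```

Receptors flagged in this way would be candidates for reclassification despite their overall sequence identity to monoamine receptors.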
Understanding the principles of molecular recognition, in combination with
residue and motif-based 1D and 3D bioinformatics data mining, is becoming
an essential element for successful chemogenomics knowledge-based ligand
design strategies. Noteworthy in this perspective is the recent work done at
Pfizer and Biofocus, where, based on analysis of sequence data, mutation data,
and physicochemical properties of the ligands, approaches were outlined to
discover sequence patterns characteristic of specific ligand classes [77, 108].
The potential of such computational methods was recently illustrated for the
identification of ligands of the prostaglandin D2 receptor chemoattractant
receptor-homologous molecule expressed on T helper type 2 (CRTH2). Using
a computational strategy which emphasizes the classification of GPCRs
with respect to physicochemical features of selected amino acid residues
of the central binding cavity, researchers at 7TM Pharma showed that the
angiotensin AT1 and AT2 references can be used to identify high-affinity
ligands for the CRTH2 receptor; notably in the ordinary phylogenetic analysis,
the AT1 and AT2 receptors are not identified as close neighbors according to
the conventional evolutionary relationship models [109].
Other signature motifs direct the signaling interactions of the receptors with
effector and regulator proteins. The identification of a conserved motif within
second intracellular loops I2 and I3 of the somatostatin receptor subtypes
(SST1, SST3, and SST4), the dopamine D2, and the α2B-adrenoceptors, which
confers inhibitory coupling to the NHE1 ion exchanger, is given as a recent
example [110].
15.4.4
15.4.4.1 Biological Expression of GPCRs

The analysis of the tissue distribution of the receptors provides valuable
information related to the potential physiological function and therapeutic
indication of a given GPCR, and is an essential part of the pharmacological
target validation in the drug discovery process. Validation is based on
the evidence that the target gene is expressed in cells relevant to the
pathophysiologic mechanism of the disease indication. This information
is combined with the epidemiological evidence that target gene expression
is associated with the appearance/progression of the disease indication.

Furthermore, evidence that a target gene activity is necessary for a defined
phenotypic response relevant to disease indication is tested by the inhibition
of its expression or function, or by overexpression.
For instance, the undecapeptide substance P is a neurotransmitter that
mediates diverse biological responses in the nervous and immune systems
mainly through the NK1 GPCR. The specific response of the hormone depends
on the location of the NK1 receptor, and pain, neurogenic inflammation,
asthma, and emesis are currently discussed as potential therapeutic indications
for NK1 antagonists. The knowledge of the tissue distribution is thus essential
to predict potential main and side activities.
To this aim, specifically designed functional genomic experiments using
oligonucleotide GPCR chips or reverse transcriptase-polymerase chain reaction
(RT-PCR) technologies in combination with immunochemistry approaches
allow the identification of gene expression profiles across a wide variety
of healthy versus diseased human and animal tissues [36, 43]. The GPCR
expression matrix generated by Vassilatis et al. [36] and represented in
Fig. 15.4-7, shows the expression of 100 randomly selected endo-GPCRs
in peripheral and neural mouse tissues demonstrating that most GPCRs are
expressed in multiple tissues and that individual tissues express multiple
receptors. Strikingly, over 90% of the analyzed GPCRs are expressed in the
brain. The profiles of most GPCRs are unique, yielding thousands of tissue-
specific receptor combinations for the potential modulation of physiological
processes and design of therapeutics. That each tissue appears to have
a unique combination of GPCRs indicates that second messenger pathways
are used in different contexts to allow differentiation of cellular responses to
hormone action. Expression profiling also contributes to the understanding
of the functional significance of receptor subtypes, which in different tissues,
couple the same hormone to different G-proteins and effector systems and
which might also show differences in their constitutive activity or regulatory
aspects, like the desensitization kinetics.
15.4.4.2 Advances in HTS of GPCRs

Since the birth of modern HTS in the mid-1980s, drug discovery experienced
an explosion in novel assay methodologies and technologies. While around
10 000 compounds were tested every year in few assays in the mid-1980s,
these numbers rapidly increased in the major pharmaceutical companies
during the past 20 years to reach 1-2 million compounds tested within
50-100 assays. The major challenge is to develop and implement simple
assay methods to expedite HTS while maintaining high quality and generating
relevant information at low cost. GPCRs are targets where these criteria apply
well since their mode of activation by ligands offers many opportunities
for assay design and miniaturization. As illustrated in Fig. 15.4-3, the
signaling cascade subsequent to GPCR activation opens, in addition to basic
ligand-binding assays, versatile opportunities to develop HTS assays based
on G-protein activation, determination of second messengers, or nuclear
activation. Currently, a variety of biophysical readout techniques and assay
formats are routinely used and have advantages and limitations as summarized
in Table 15.4-4.

Fig. 15.4-7 Cluster analysis of the expression of 100 randomly selected mouse
endo-GPCR genes in 17 peripheral tissues and 9 different brain regions. The genes
were analyzed individually by RT-PCR as shown and the intensity of the observed
bands was determined by scanning. Each gene is represented by a single row of
colored boxes with four different expression levels: no expression, blue; low
expression, purple; moderate expression, dark red; strong expression, pure red.
Three groups of endo-GPCRs with broadly related profiles were observed. In the
first group (a) genes were expressed primarily in peripheral tissues. Seven of
these genes were expressed exclusively in peripheral tissues and not in the brain.
The second group (b) contained genes expressed primarily in brain. Of these 41
genes, 14 were solely expressed in brain and not in peripheral tissues. In the
third group (c), the genes were broadly expressed in the brain and throughout the
periphery. Figure reproduced with permission from [36].

Every assay will be selected on the basis of a set of criteria including among
others, infrastructure, instrumentation, throughput requirement, or the type
of information requested. For cell-based GPCR assays the question comes
to measuring affinity or efficacy [114]; both are fundamental and distinct
characteristics of the compound-receptor pair pharmacology [115]. Functional
cellular assays are especially superior in information compared to ligand-
binding assays when seeking allosteric modulators acting at receptor sites
other than the binding site of the endogenous agonist, or when multiple
measurements are required in the same well to provide additional activity and
selectivity information. For instance, Sabroe et al. showed in a single HTS run
that dual CCR1 and CCR3 blockers are able to abrogate chemokine-induced
cell chemotaxis and other functional parameters such as eosinophil shape
changes and calcium mobilizations [116].
FLIPR duplex calcium mobilization assays were developed at Novartis to
identify blockers of the chemokine CXCR4 receptor. Screening compounds
are tested in the same well against the CXCR4 receptor and subsequently
against the muscarinic MS receptor expressed in the same CEM-T cells. This
duplex readout provides a hint on compound selectivity in a cost-effective
fashion already in primary screening. The approach is rendered possible by
the noninvasive nature of FLIPR calcium assays and enables the prioritization
of compounds acting at the receptor level and the exclusion of compounds
interfering with cellular components common to the two GPCRs.
Table 15.4-4 Commonly used assays for GPCR HTS

Molecular             Assay type                                    Plate      Comments and coupling
principle                                                           format     limitations

Ligand binding        Radioligand filtration assay                  96         Safety and costs
                      SPA radioligand binding                       384        Costs
                      FP                                            384, 1536  Ligand labeling
G-protein activation  GTPγ35S filtration assay                      96         Safety and costs
                      GTPγ35S SPA                                   384        Costs
Second messenger      cAMP determination based on fluorescence      1536       FP: sensitivity
                      approaches: FP, FI, HTRF/LANCE
                      IP3 determination based on binding and        96, 384    Low throughput, mainly Gq
                      chromatographic approaches
                      Ca2+ determination using specific             384        Mainly with Gq, and Gs with
                      fluorescence reader technology and                       CNG2 channels
                      indicator dyes (FLIPR/Fluo-4,
                      FDSS6000/Fluo-4, Fura-2) or proteins
                      (Aequorin)
Nuclear activation    Reporter gene assays activated via CRE and    384, 1536  May lead to signal variation
                      SRE response elements and SEAP,                          based on cell quality. Long
                      luciferase, and β-lactamase readouts                     incubation.

HTRF/LANCE - homogeneous time-resolved fluorescence/lanthanide cryptate
excitation; SPA - scintillation proximity assay - homogeneous assay which
detects radioisotopes in close proximity to a solid scintillant;
FP - fluorescence polarization; FI - fluorescence intensity;
CNG2 - cAMP gated ion channel 2; CRE - cAMP response element;
SRE - serum response element; SEAP - secreted alkaline phosphatase.
For further details see Refs. [111-113].

Furthermore, GPCR triplex assays are routinely used at Novartis and rely
on three successive readouts obtained from the same well. As shown in
Fig. 15.4-8, the triplex GABAB heterodimeric receptor calcium assay enables
the detection of agonist, modulator, or antagonist properties of screening
compounds in a single run. The presence of an agonist is revealed not only
by its own activity (Fig. 15.4-8(a)) but also through the receptor
desensitization during the antagonist assay phase. Modulators are detected
by using a small agonist concentration in the second phase (Fig. 15.4-8(b))
and may be devoid of agonist properties. Antagonists appear clearly in the
third phase, following an injection of a higher GABA concentration, and are
characterized by a lack of intrinsic activity in the first phase
(Fig. 15.4-8(c)).
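The three-phase logic of the triplex readout can be made concrete with a small
sketch. The following Python fragment is only an illustration under stated
assumptions: the class name TriplexReadout, the function classify, and all
threshold values are hypothetical and are not taken from the Novartis assay;
a real screen would derive its cut-offs from plate controls and would also
inspect the full kinetic traces.

```python
from dataclasses import dataclass


@dataclass
class TriplexReadout:
    """Peak fluorescence changes of one well (arbitrary units)."""
    phase1: float  # after injection of the screening compound
    phase2: float  # after injection of GABA at a low (EC20-like) concentration
    phase3: float  # after injection of GABA at a high (EC80-like) concentration


def classify(well: TriplexReadout, control: TriplexReadout,
             agonist_fraction: float = 0.3,
             potentiation_factor: float = 1.5,
             inhibition_fraction: float = 0.5) -> str:
    """Assign a coarse category by comparing a well with a buffer-only control.

    All three thresholds are illustrative assumptions, not published values.
    """
    # Agonist: intrinsic activity already in the first phase.
    if well.phase1 >= agonist_fraction * control.phase3:
        return "agonist"
    # Modulator: no intrinsic activity, but the low-agonist response is enhanced.
    if well.phase2 >= potentiation_factor * control.phase2:
        return "modulator"
    # Antagonist: no intrinsic activity, and the high-agonist response is blunted.
    if well.phase3 <= inhibition_fraction * control.phase3:
        return "antagonist"
    return "inactive"
```

The checks are ordered so that a compound with first-phase activity is never
reported as an antagonist merely because receptor desensitization reduces its
later responses.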
Multiplex assays do not achieve the compound throughputs possible with
single measurement assays; however, they produce much richer information
already in primary screening, which is invaluable for compound categorization
and prioritization by the medicinal chemists. A further advantage of such
assays is the possibility to exploit fluorescent kinetic traces to exclude
compounds interfering nonspecifically with the readout or affecting the
cells. The lower compound throughput per time unit can be largely compensated
by a careful assay design and by using assay automation to ensure overnight
operations.

Fig. 15.4-8 FLIPR calcium traces from a GABAB receptor triplex assay in a
384-well format. Experiments are performed with Chinese Hamster Ovary K1
(CHOK1) cells stably expressing the GABABR1 and GABABR2 receptor subunits.
The cells are loaded with the calcium-sensitive dye Fluo-4. Three successive
injections are performed during the course of the experiment. The first
injection is with screening compound, in general at 10 µM, followed by two
injections of GABA at concentrations corresponding to its EC20 (0.15 µM) and
EC80 (10 µM), respectively. Different FLIPR traces are obtained depending on
the nature of the screening compound. (a) The agonist GABA; (b) L-Baclofen,
a GABAB receptor modulator; (c) CGP56999, a competitive GABAB antagonist
[117]. The signals shown are expressed as nonnormalized fluorescence changes.
Although the information from cell-based, mainly heterologous, systems
is very valuable, caution is necessary when interpreting its in vivo
physiological relevance. For instance, in stably transfected CHO cells,
Cevimeline (AF102B) behaves as a classical M1 antagonist when adenylate
cyclase activity is measured, fully blocking the activation by carbachol.
However, when IP3 activation is measured in the same cell line via PLCβ or
PLA2, the compound behaves as a partial agonist. Even more strikingly, when
intracellular Ca2+ mobilization is monitored with confocal microscopy, it
behaves as a superagonist with a stronger response than carbachol [118].
Thus, advanced HTS data analysis plays an important role in decision support
in drug discovery [119].
15.4.4.3 Designing a Focused Combinatorial Library for Monoamine-related GPCRs
On the basis of the central chemogenomics principle that similar ligands bind
to similar targets, and given that ligands of closely homologous receptors
are generally considered putative starting points in lead-finding programs
for receptors for which no specific ligands are yet known, we proposed a
chemogenomics knowledge-based combinatorial library design strategy for lead
finding [53]. The strategy is founded on the integration of both the
deconvolution of known modular ligands of homologous receptors into their
component fragments and the structural bioinformatics comparison of the
binding sites for the individual ligand fragments. In essence, by analyzing
in the ligand space both the ligand architectures and the structures of the
component “one-site filling” fragments of known ligands, it should be
possible (by referring to the locally most directly related and characterized
receptors) to identify those component ligand fragments that, on the basis of
binding site similarities, are potentially best suited for the design of
ligands tailored to the new target receptor.
The strategy was presented in the context of designing the tertiary amine
(TAM) combinatorial library directed toward monoamine-related GPCRs, for
which the conserved aspartate residue D3.32 in TM3 was demonstrated by
two-dimensional mutation experiments to be responsible for the recognition of
the charged amino group of monoamine ligands by their GPCRs (Fig. 15.4-9).
Focusing on the central importance of the D3.32 residue and using the
D3.32X16(DE)R(YFH) motif in TM3 as the sequence signature defining
relatedness to the monoamine GPCR subfamily, we identified, by database
searches, 50 human GPCRs, which included 7 orphan GPCRs (two of which are now
known to correspond to pseudogenes) and constituted the originally intended
target repertoire of the library. Later it was recognized that trace amine
receptors, which conserve the D3.32 residue, and also chemokine receptors,
which lack the D3.32 residue but in which a corresponding glutamate residue
E7.39 in TM7 is responsible for the recognition of the TAM chemotype, have to
be considered, on the basis of molecular recognition principles, as
monoamine-like GPCRs, which extends the target repertoire to around 80 GPCRs,
a significant part of all the class A GPCRs [11].

[Figure 15.4-9: chemical structures not reproduced. Panels: (a) Known
Reference Architectures; (b) Novel Compound Prototypes. Compound labels
legible in the figure: Serotonin, Propranolol, 8-OH-DPAT, Ketanserin,
Janssen-1, WAY-100635, RO-16814, Kissei-1.]

Fig. 15.4-9 Prototype structures of the Novartis TAM combinatorial libraries
generated through reductive amination of selected aldehydes and secondary
amines. The new structures for which examples are shown in (b) were designed
to be similar to known monoamine GPCR ligands for which examples are shown in
(a). Ligands, which are of the size of the endogenous ligands, are herein
called simple - one-site filling ligands. In addition to this natural
architecture, ligands exist where two or three such “simple” ligand fragments
are linked around a basic positively charged group: these ligands are called,
correspondingly, double and triple ligands. All three architectures -
“simple”, “double”, and “triple” - of known monoamine GPCR ligands are
represented in the TAM library.
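A database search with a TM3 sequence signature of this kind can be sketched
as a regular-expression scan. The fragment below is a minimal illustration,
assuming that the signature reads: aspartate D3.32, followed by 16 arbitrary
residues (positions 3.33 to 3.48), followed by (D or E), R, and (Y, F, or H)
at positions 3.49 to 3.51. The function name find_signature is hypothetical,
and the sequences in the usage note are made-up toy strings, not real receptor
sequences. A real search would also restrict the scan to the predicted TM3
segment to avoid chance matches elsewhere in the protein.

```python
import re
from typing import Optional

# D3.32, then 16 arbitrary residues, then (D/E) R (Y/F/H); one-letter codes.
TM3_SIGNATURE = re.compile(r"D.{16}[DE]R[YFH]")


def find_signature(sequence: str) -> Optional[int]:
    """Return the 0-based index of the putative D3.32 residue, or None."""
    match = TM3_SIGNATURE.search(sequence.upper())
    return match.start() if match else None
```

For a toy sequence built as "AAAA" + "D" + 16 × "V" + "DRY" + "AAA", the
function reports index 4 for the aspartate; shortening the spacer to 15
residues gives no match.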
Databases of site-specific ligand fragments, which should be recombined
on an appropriate scaffold to yield ligands, are the keystones of such a
knowledge-based system. Their generation is in principle possible through
the deconvolution of the known ligands guided by SAR and by molecular
similarity considerations. Given the promiscuity of some fragments (e.g.,
symmetric ligands), caution must be exercised before drawing definitive
conclusions about the actual positioning of the fragments. Pragmatically,
these limitations to the generation of site-specific ligand fragment
databases were approached by pooling fragments into multiple pools and by
designing generic combinatorial libraries of known privileged active
fragments around appropriate scaffolds. The TAM library was screened in a
number of GPCR campaigns, and high hit rates were observed, especially for
the monoamine and chemokine GPCRs. Especially noteworthy is that the hit
rates of the designed TAM library are higher than those observed for a
corresponding library without specific design input. The TAM library
includes many new combinations of known active fragments and privileged GPCR
motifs. In addition to addressing new receptors, this should allow the
discovery of fascinating multireceptor profiles of potential pharmacological
interest. The search for antagonists of the 5-HT7 GPCR, which has the 5-HT1A
receptors as neighbors in the sequence dendrogram, illustrates the successful
use of the TAM library. Searching within the TAM library with 5-HT1A
reference compounds, using the Similog method, we obtained a 10% hit rate
(KB < 5 µM) at a time when only a biological assay with limited throughput
capacity was available. The hits were arylpiperazines, which in follow-up
studies were also found to be active on other monoamine GPCRs.
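The idea of selecting library members by similarity to reference ligands, and
then expressing the outcome as a hit rate, can be sketched generically. The
fragment below does not reproduce the Similog method; as a stand-in it uses
the Tanimoto coefficient on fingerprints represented as sets of feature
identifiers. All function names and the toy fingerprints in the test data are
hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_library(references: Iterable[FrozenSet[int]],
                 library: Dict[str, FrozenSet[int]],
                 top_n: int) -> List[str]:
    """Rank compounds by their best similarity to any reference ligand."""
    refs = list(references)
    scored = [(max(tanimoto(fp, ref) for ref in refs), name)
              for name, fp in library.items()]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_n]]


def hit_rate(selected: List[str], actives: Set[str]) -> float:
    """Fraction of the selected compounds confirmed active in the assay."""
    if not selected:
        return 0.0
    return sum(1 for name in selected if name in actives) / len(selected)
```

Taking the maximum over several reference compounds is one simple way to use
multiple bioactive references; other fusion rules are possible and may rank
compounds differently.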
15.4.5
Future Development
The molecular knowledge of GPCRs as information processing units continues
to progress at an impressive pace [4, 120]. Besides the many efforts
and opportunities on orphan receptors, GPCR research focuses on deeper
characterization of GIP networks and receptosomes.
Key questions focus on the physiological and therapeutic relevance of
receptor homo- and heterodimerization [71, 73]. GPCRs were initially believed
to be monomeric entities, but accumulating evidence, obtained using
techniques like immunoblotting and coimmunoprecipitation combined with FRET
and BRET experiments in living cells, now supports the presence of GPCRs in
multimeric forms. The existence of homodimers is established for many class A
GPCRs (e.g., dopamine D2 and D3, β2-adrenoceptor, muscarinic M1 and M2
receptors, NK1, opiate, and SST5 receptors), and class C GPCRs (e.g., mGluR
and CaR forming covalent dimers via a cysteine bridge linking the N-terminal
domains of the two receptors). Proposed roles for heterodimerization include
diversifying the pharmacological response, providing a further mechanism for
the fine tuning of hormone signaling and G-protein specificity, and
regulating the receptor ontogenesis and internalization. Differences in the
pharmacological properties of heterodimer GPCRs were observed for the
δ/κ-opiate receptors, dopamine/somatostatin receptors, and GABABR1/GABABR2
receptors. The GABABR1/GABABR2 heterodimer is particularly illustrative
[121]. It is known that GABABR1 is not trafficked effectively to the cell
surface in the absence of GABABR2 expression. In addition, GABABR1 binds the
agonist ligand but is not coupled to G-proteins, whereas GABABR2 activates
G-protein signaling but does not bind the ligand. It was recognized that new
compound screening strategies, allowing the detection of ligand binding or
function only by a heterodimer pair in the presence of the corresponding
homodimers, are required to allow rapid and effective identification of
ligands with these characteristics. Only with such ligands at hand will it be
possible to tease out the physiological relevance of GPCR heterodimerization
[71]. The opioid agonist ligand 6’-guanidinonaltrindole (6’-GNTI) is the
first example of such a ligand. 6’-GNTI has the unique property of
selectively activating only δ/κ-opioid receptor heterodimers but not homomers
[122]. Importantly, 6’-GNTI is an analgesic, thereby demonstrating that
opioid receptor heterodimers are indeed functionally relevant in vivo.
However, 6’-GNTI induces analgesia only when it is administered in the spinal
cord but not in the brain, suggesting that the organization of heterodimers
is tissue specific. Other studies are indirect and may reflect cross-talk
between the signaling pathways at a level downstream of receptor activation.
The ability of β-blockers to interfere with angiotensin AT1-mediated
signaling, and the ability of the AT1 receptor blocker valsartan to reduce
catecholamine-induced elevation in the heart rate, may indicate functional
angiotensin AT1-adrenoceptor interactions in vivo.
The discovery that some GPCRs appear to function in preformed and
dynamic complexes with other signal transduction and scaffolding proteins
opens many interesting possibilities for drug discovery. For instance, targeting
the postsynaptic density (PSD-95) and Homer scaffolding proteins might
result in a new manner to modulate receptor activity [123]. PSD-95 is known
to function in synaptic neurotransmission and plasticity by enhancing or
depressing the synaptic strength depending on the frequency of neuronal
firing. The protein is a multiadapter, which binds via its PDZ domain
specific GPCRs (e.g., 5-HT2A, 5-HT2C) and ion channels (e.g., N-methyl-
D-aspartate (NMDA)) and enables, together with other protein-protein
interactions, the spatial organization of complex microarchitectures jointly
with the cytoskeleton. Similarly, the Homer proteins, which play a role in
the glutamatergic synaptic transmission, are composed of an N-terminal
enabled VASP homology type 1 (EVH1) domain, interacting with GPCRs
(e.g., mGluR1, mGluR5), ion channels (e.g., IP3 or ryanodine Ca2+ receptor
channels, transient receptor potential cation channels 1 and 2) and other
proteins, and a C-terminal coiled-coil domain that enables dimerization and
complex formation. It remains to be seen how general or specific these
intracellular GPCR modulator mechanisms are. In addition, small molecular
compounds able to disrupt or reinforce these interactions are needed to further
understand their physiological importance.
A new trend is also the therapeutic evaluation of monoclonal antibodies
against GPCRs. Although small molecule drugs seem to be the preferred
agents, recent success stories targeting the CCR5 receptors against HIV entry,
or the thyroid-stimulating hormone (TSH) receptor in Graves’ disease, show
that this route is also feasible.
More generally, in the age of genomic medicine, the pharmacogenetics
of GPCRs is becoming increasingly important and plays a role especially in
the target validation of the new GPCRs and the clinical validation of the
drugs [124], as was recently exemplified for adrenoceptors [125]. The study
of allelic variations, based on single nucleotide polymorphism (SNP) or other
sequence polymorphism data, allows the identification of the major allele of
the target gene needed for the development of screening and profiling assays.
Alternative splicing (e.g., for the human histamine H3 receptor 20 isoforms
are reported), RNA editing (e.g., for the 5-HT2C receptor, seven major
isoforms are predicted differing in their second intracellular loop), and
coupling to specific G-proteins have all been selected by evolution to
modulate the activity of GPCRs, providing multiple regulatory switches to
fine-tune basal cellular activities. In addition, genetic linkage studies
provide evidence that a mutation in the gene is associated with
susceptibility to the appearance or progression of the disease indication.
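Identifying the major allele at a polymorphic site amounts to counting allele
observations across genotyped individuals. The following fragment is a
minimal sketch of that counting step; the function name major_allele and the
genotype representation (one tuple of two allele symbols per individual) are
assumptions made for illustration, and the example data in the usage note are
invented.

```python
from collections import Counter
from typing import Iterable, Tuple


def major_allele(genotypes: Iterable[Tuple[str, str]]) -> Tuple[str, float]:
    """Return the most frequent allele at one site and its frequency.

    Each genotype is a pair of allele symbols for a diploid individual.
    Ties are resolved in favor of the allele encountered first.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    allele, count = counts.most_common(1)[0]
    return allele, count / total
```

For four invented individuals with genotypes A/A, A/G, G/G, and A/A, the
allele A is observed five times out of eight, so the function returns A with
a frequency of 0.625.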
Beyond the drug discovery applications discussed thus far, olfactory
receptors play an important role in the perfume and cosmetic industry; the
screening and design of new odorants is an economically interesting
application. The discovery that the malaria-transmitting mosquito Anopheles,
which is responsible for the death of more than one million people each year,
possesses odorant receptors for particular components of human sweat means
that different ligands could be screened for their activation or inhibition
of these receptors, potentially leading to new, more effective insect traps
and repellents [126].
15.4.6
Conclusions
Chemical biology investigations of GPCRs started with very simple questions
aimed at understanding how hormones, such as adrenaline or glucagon, signal
at the intracellular level and how this signaling translates into the
physiological response. During the last 25 years of molecular GPCR research,
the machinery was elucidated in great detail for a few model GPCRs and
revealed a fascinating beauty, which turned out to be far more complex than
initially expected [4]. During the next several years, our detailed knowledge
about many newly deorphanized GPCRs and the organization and regulation of
the GIP network constituting the receptosomes will certainly continue to
grow. The chemical biology approaches mentioned herein will all contribute to
the identification of chemical compounds that enable the directed targeting
of each of these components. From the perspective of drug discovery, it will
be especially interesting to follow whether signaling drugs will be
discovered further downstream, or whether the GPCR ligand-binding sites will
remain the preferred entry point for medication. A central question will be
how fast these molecular discoveries will translate into new medicines.
Especially with the newly discovered and deorphanized receptors, the ultimate
challenge resides in the enormous knowledge gap existing between the new
molecular discoveries and their significance for disease processes and
medicine. While the classical hormone GPCR targets were “top-down” validated,
on the basis of pharmacology, physiology, and clinical medicine, the new
hormone GPCR systems come “bottom-up”: their early validation based on
bioinformatics and genetics data is expected to direct clinical research, and
the comprehensive understanding of their role in physiology will take time.
It will be interesting to see the medical outcome of these activities after
another decade of research.
Acknowledgments
Drs. K. Azzaoui, B. Faller, P. Floersheim, P. Fuerst, D. Hoyer, H. Mattes,
H.-J. Roth, A. Sailer, G. Scheel, P. Schoeffter, S. Siehler, R.S. Tsai, and
R. Wolf (all NIBR associates) are acknowledged for various support and
discussions. We thank Drs. J. Mosbacher and K. Kaupman (also NIBR
associates) for CHOK1 cells expressing the heterodimeric GABAB receptor and
the selective modulators and antagonists. B. Frisch and M. Brasey, from NIBR
Knowledge Center, are acknowledged for support with IMS Knowledge Link.
S.C. DiClemente, from NIBR Communications, is gratefully acknowledged for
editorial assistance.
References
1. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov. 2002, 1, 727-730.
2. T. Klabunde, G. Hessler, Drug design strategies for targeting G-protein-coupled receptors, ChemBioChem 2002, 3, 928-944.
3. T.W. Schwartz, B. Holst, in Textbook of Receptor Pharmacology, (Eds.: J.C. Foreman, T. Johansen), CRC Press, Washington, 2003, pp. 81-109.
4. C. Ellis, The state of GPCR research in 2004, Nat. Rev. Drug Discov. 2004, 3, 577-626.
5. H.C. Huang, P.S. Klein, The Frizzled family: receptors for multiple signal transduction pathways, Genome Biol. 2004, 5, 234.1-234.7.
6. F. Horn, E. Bettler, L. Oliveira, F. Campagne, F.E. Cohen, G. Vriend, GPCRDB information system for G protein-coupled receptors, Nucleic Acids Res. 2003, 31, 294-297.
7. S.R.J. Hoare, Mechanisms of peptide and nonpeptide ligand binding to class B G-protein-coupled receptors, Drug Discov. Today 2005, 10, 417-427.
8. J.P. Pin, T. Galvez, L. Prezeau, Evolution, structure, and activation mechanism of family 3/C G-protein-coupled receptors, Pharmacol. Ther. 2003, 98, 325-354.
9. R. Fredriksson, M.C. Lagerstrom, L.G. Lundin, H.B. Schioth, The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints, Mol. Pharmacol. 2003, 63, 1256-1272.
10. The IUPHAR committee on receptor nomenclature and drug classification. The IUPHAR Compendium of Receptor Characterization and Classification, 2nd ed., IUPHAR Media, London, 2000.
11. E. Jacoby, A. Schuffenhauer, P. Acklin, in Chemogenomics in Drug Discovery: A Medicinal Chemistry Perspective, (Eds.: H. Kubinyi, G. Müller), Wiley-VCH, Weinheim, 2004, pp. 139-166.
12. G. Wess, How to escape the bottleneck of medicinal chemistry, Drug Discov. Today 2002, 4, 533-535.
13. R. Lefkowitz, Historical review: a brief history and personal retrospective of seven-transmembrane receptors, Trends Pharmacol. Sci. 2004, 25, 413-422.
14. R.R. Neubig, M. Spedding, T. Kenakin, A. Christopoulos, International union of pharmacology committee on receptor nomenclature and drug classification. XXXVIII. Update on terms and symbols in quantitative pharmacology, Pharmacol. Rev. 2003, 55, 597-606.
15. M. Rodbell, Nobel lecture. Signal transduction: evolution of an idea, Biosci. Rep. 1995, 15, 117-133.
16. A.G. Gilman, Nobel lecture. G proteins and regulation of adenylyl cyclase, Biosci. Rep. 1995, 15, 65-97.
17. E.W. Sutherland, Studies on the mechanism of hormone action, Science 1972, 177, 401-408.
18. M.J. Marinissen, J.S. Gutkind, G-protein-coupled receptors and signaling networks: emerging paradigms, Trends Pharmacol. Sci. 2001, 22, 368-376.
19. H. Hamm, The many faces of G protein signaling, J. Biol. Chem. 1998, 273, 669-672.
20. J. Black, Nobel lecture in physiology or medicine-1988. Drugs from emasculated hormones: the principle of syntopic antagonism, In Vitro Cell. Dev. Biol. 1989, 25, 311-320.
21. M.G. Caron, Y. Srinivasan, J. Pitha, K. Kociolek, R.J. Lefkowitz, Affinity chromatography of the beta-adrenergic receptor, J. Biol. Chem. 1979, 254, 2923-2927.
22. R.A. Cerione, B. Strulovici, J.L. Benovic, C.D. Strader, M.G. Caron, R.J. Lefkowitz, Reconstitution of beta-adrenergic receptors in lipid vesicles: affinity chromatography-purified receptors confer catecholamine responsiveness on a heterologous adenylate cyclase system, Proc. Natl. Acad. Sci. U.S.A. 1983, 80, 4899-4903.
23. R.A. Dixon, B.K. Kobilka, D.J. Strader, J.L. Benovic, H.G. Dohlman, T. Frielle, M.A. Bolanowski, C.D. Bennett, E. Rands, R.E. Diehl, Cloning of the gene and cDNA for mammalian beta-adrenergic receptor and homology with rhodopsin, Nature 1986, 321, 75-79.
24. Y.A. Ovchinnikov, Structure of rhodopsin and bacteriorhodopsin, Photochem. Photobiol. 1987, 45, 909-914.
25. R. Henderson, P.N. Unwin, Three-dimensional model of purple membrane obtained by electron microscopy, Nature 1975, 257, 28-32.
26. N. Robas, M. O'Reilly, S. Katugampola, M. Fidock, Maximizing serendipity: strategies for identifying ligands for orphan G-protein-coupled receptors, Curr. Opin. Pharmacol. 2003, 3, 121-126.
27. A. Wise, S.C. Jupe, S. Rees, The identification of ligands at orphan G-protein coupled receptors, Annu. Rev. Pharmacol. Toxicol. 2004, 44, 43-66.
28. R. Seifert, K. Wenzel-Seifert, Constitutive activity of G-protein-coupled receptors: cause of disease and common property of wild-type receptors, Naunyn Schmiedebergs Arch. Pharmacol. 2002, 366, 381-416.
29. E. Schipani, K. Kruse, H. Juppner, A constitutively active mutant PTH-PTHrP receptor in Jansen-type metaphyseal chondrodysplasia, Science 1995, 268, 98-100.
30. P. Strange, Mechanisms of inverse agonism at G-protein-coupled receptors, Trends Pharmacol. Sci. 2002, 23, 89-95.
31. C.D. Strader, I.S. Sigal, R.B. Register, M.R. Candelore, E. Rands, R.A. Dixon, Identification of residues required for ligand binding to the beta-adrenergic receptor, Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 4384-4388.
32. A.M. van Rhee, K.A. Jacobson, Molecular architecture of G protein-coupled receptors, Drug Dev. Res. 1996, 37, 1-38.
33. J. Bockaert, P. Marin, A. Dumuis, L. Fagni, The ‘magic tail’ of G protein-coupled receptors: an anchorage for functional protein networks, FEBS Lett. 2003, 546, 65-72.
34. J. Bockaert, J.P. Pin, Molecular tinkering of G protein-coupled receptors: an evolutionary success, EMBO J. 1999, 18, 1723-1729.
35. J. Bockaert, A. Dumuis, L. Fagni, P. Marin, GPCR-GIP networks: a first step in the discovery of new therapeutic drugs?, Curr. Opin. Drug Discov. Devel. 2004, 7, 649-657.
36. D.K. Vassilatis, J.G. Hohmann, H. Zeng, F. Li, J.E. Ranchalis, M.T. Mortrud, A. Brown, S.S. Rodriguez, J.R. Weller, A.C. Wright, J.E. Bergmann, G.A. Gaitanaris, The G protein-coupled receptor repertoires of human and mouse, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4903-4908.
37. L. Buck, R. Axel, A novel multigene family may encode odorant receptors: a molecular basis for odor recognition, Cell 1991, 65, 175-187.
38. A. Terakita, The opsins, Genome Biol. 2005, 6, 213.1-213.9.
39. R. Fredriksson, H.B. Schioth, The repertoire of G-protein coupled receptors in fully sequenced genomes, Mol. Pharmacol. 2005, 67, 1414-1425.
40. A. Sodhi, S. Montaner, J.E. Gutkind, Viral hijacking of G-protein-coupled-receptor signaling networks, Nat. Rev. Mol. Cell. Biol. 2004, 5, 998-1012.
41. S. Offermanns, M.I. Simon, G alpha 15 and G alpha 16 couple a wide variety of receptors to phospholipase C, J. Biol. Chem. 1995, 270, 15175-15180.
42. B.R. Conklin, Z. Farfel, K.D. Lustig, D. Julius, H.R. Bourne, Substitution of three amino acids switches receptor specificity of Gq alpha to that of Gi alpha, Nature 1993, 363, 274-276.
43. D.T. Chalmers, D.P. Behan, The use of constitutively active GPCRs in drug discovery and functional genomics, Nat. Rev. Drug Discov. 2002, 1, 599-607.
44. L.M. McLatchie, N.J. Fraser, M.J. Main, A. Wise, J. Brown, N. Thompson, R. Solari, M.G. Lee, S.M. Foord, RAMPs regulate the transport and ligand specificity of the calcitonin-receptor-like receptor, Nature 1998, 393, 333-339.
45. G. Milligan, High-content assays for ligand regulation of G-protein-coupled receptors, Drug Discov. Today 2003, 8, 579-585.
46. C. Granas, B.K. Lundholt, A. Heydorn, V. Linde, H.-C. Pedersen, C. Krog-Jensen, M.M. Rosenkilde, L. Pagliaro, High content screening for G protein-coupled receptors using cell-based protein translocation assays, Comb. Chem. High Throughput Screen. 2005, 8, 301-309.
47. M.G. Ludwig, M. Vanek, D. Guerin, J.A. Gasser, C.E. Jones, U. Junker, H. Hofstetter, R.M. Wolf, K. Seuwen, Proton-sensing G-protein-coupled receptors, Nature 2003, 425, 93-98.
48. K. Palczewski, T. Kumasaka, T. Hori, C.A. Behnke, H. Motoshima, Crystal structure of rhodopsin: A G protein-coupled receptor, Science 2000, 289, 739-745.
49. E. Pebay-Peyroula, G. Rummel, J.P. Rosenbusch, E.M. Landau, X-ray structure of bacteriorhodopsin at 2.5 angstroms from microcrystals grown in lipidic cubic phases, Science 1997, 277, 1676-1681.
50. M.F. Hibert, S. Trumpp-Kallmeyer, J. Hoflack, This is not a G protein-coupled receptor, Trends Pharmacol. Sci. 1993, 14, 7-12.
51. E. Jacoby, J.L. Fauchère, E. Raimbaud, S. Ollivier, A. Michel, M. Spedding, A three binding site hypothesis for the interaction of ligands with monoamine G-protein coupled receptors: implications for combinatorial ligand design, Quant. Struct.-Act. Relat. 1999, 18, 561-572.
52. S. Filipek, D.C. Teller, K. Palczewski, R. Stenkamp, The crystallographic model of rhodopsin and its use in studies of other G protein-coupled receptors, Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 375-397.
53. E. Jacoby, A novel chemogenomics knowledge-based ligand design strategy - application to G-protein coupled receptors, Quant. Struct.-Act. Relat. 2001, 20, 115-123.
54. D.R. Flower, Modelling G-protein-coupled receptors for drug design, Biochim. Biophys. Acta 1999, 1422, 207-234.
55. S.M. Bromidge, S. Dabbs, D.T. Davies, D.M. Duckworth, I.T. Forbes, P. Ham, G.E. Jones, F.D. King, D.V. Saunders, S. Starr, K.M. Thewlis, P.A. Wyman, F.E. Blaney, C.B. Naylor, F. Bailey, T.P. Blackburn, V. Holland, G.A. Kennett, G.J. Riley, M.D. Wood, Novel and selective 5-HT2C/2B receptor antagonists as potential anxiolytic agents: synthesis, quantitative structure-activity relationships, and molecular modeling of substituted 1-(3-pyridylcarbamoyl)indolines, J. Med. Chem. 1998, 41, 1598-1612.
56. K. Bodensgaad, M. Ankersen, H. Thorgensen, B.S. Hansen, B.S. Wulff, R.P. Baywater, Recognition of privileged structures by G-protein coupled receptors, J. Med. Chem. 2004, 47, 888-899.
57. O.M. Becker, S. Shacham, Y. Marantz, S. Noiman, Modeling the 3D structure of GPCRs: advances and application to drug discovery, Curr. Opin. Drug Discov. Devel. 2003, 6, 353-361.
58. C. Bissantz, A. Logean, D. Rognan, High-throughput modeling of human G-protein coupled receptors: amino acid sequence alignment, three-dimensional model building, and receptor library screening, J. Chem. Inf. Comput. Sci. 2004, 44, 1162-1176.
59. J.M. Klco, C.B. Wiegand, K. Narzinski, T.J. Baranski, Essential role for the second extracellular loop in C5a receptor activation, Nat. Struct. Mol. Biol. 2005, 12, 320-326.
60. L. Shi, J.A. Javitch, The binding site of aminergic G protein-coupled receptors: the transmembrane segments and second extracellular loop, Annu. Rev. Pharmacol. Toxicol. 2002, 42, 437-467.
61. A. Evers, G. Klebe, Successful virtual screening for a submicromolar antagonist of the neurokinin-1 receptor based on a ligand-supported homology model, J. Med. Chem. 2004, 47, 5381-5392.
62. C. Bissantz, P. Bernard, M. Hibert, D. Rognan, Protein-based virtual screening of chemical databases. II. Are homology models of G-protein coupled receptors suitable targets?, Proteins 2003, 50, 5-25.
63. O.M. Becker, Y. Marantz, S. Shacham, B. Inbal, A. Heifetz, O. Kalid, S. Bar-Haim, D. Warhaviak, M. Fichman, S. Noiman, G protein-coupled receptors: in silico drug discovery in 3D, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11304-11309.
64. J. Ballesteros, S. Kitanovic, F. Guarnieri, P. Davies, B.J. Fromme, K. Konvicka, L. Chi, R.P. Millar, J.S. Davidson, H. Weinstein, S.C. Sealfon, Functional microdomains in G-protein-coupled receptors. The conserved arginine-cage motif in the gonadotropin-releasing hormone receptor, J. Biol. Chem. 1998, 273, 10445-10453.
65. T. Kenakin, Protean agonists. Keys to receptor active states?, Ann. N.Y. Acad. Sci. 1997, 812, 116-125.
66. C.R. Grace, M.H. Perrin, M.R. DiGruccio, C.L. Miller, J.E. Rivier, W.W. Vale, R. Riek, NMR structure and peptide hormone binding site of the first extracellular domain of a type B1 G protein-coupled receptor, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 12836-12841.
67. R.C. Gensure, N. Shimizu, J. Tsang, T.J. Gardella, Identification of a contact site for residue 19 of parathyroid hormone (PTH) and PTH-related protein analogs in transmembrane domain two of the type 1 PTH receptor, Mol. Endocrinol. 2003, 17, 2647-2658.
68. S.U. Miedlich, L. Gama, K. Seuwen, R.M. Wolf, G.E. Breitwieser, Homology modeling of the transmembrane domain of the human calcium sensing receptor and localization of an allosteric binding site, J. Biol. Chem. 2004, 279, 7254-7263.
69. A. Pagano, D. Ruegg, S. Litschig, N. Stoehr, C. Stierlin, M. Heinrich, P. Floersheim, L. Prezeau, F. Carroll, J.P. Pin, A. Cambria, I. Vranesic, P.J. Flor, F. Gasparini, R. Kuhn, The non-competitive antagonists 2-methyl-6-(phenylethynyl)pyridine and 7-hydroxyiminocyclopropan[b]chromen-1a-carboxylic acid ethyl ester interact with overlapping binding pockets in the transmembrane region of group I metabotropic glutamate receptors, J. Biol. Chem. 2000, 275, 33750-33758.
70. S. Angers, A. Salahpour, M. Bouvier, Dimerization: an emerging concept for G protein-coupled receptor ontogeny and function, Annu. Rev. Pharmacol. Toxicol. 2002, 42, 409-435.
71. G. Milligan, G protein-coupled receptor dimerization: function and ligand pharmacology, Mol. Pharmacol. 2004, 66, 1-7.
72. S. Filipek, K.A. Krzysko, D. Fotiadis, Y. Liang, D.A. Saperstein, A. Engel, K. Palczewski, A concept for G protein activation by G protein-coupled receptor dimers: the transducin/rhodopsin interface, Photochem. Photobiol. Sci. 2004, 63, 628-638.
73. J.L. Baneres, J. Parello, Structure-based analysis of GPCR function: evidence for a novel pentameric assembly between the dimeric leukotriene B4 receptor BLT1 and the G-protein, J. Mol. Biol. 2003, 329, 815-829.
74. K. Lundstrom, Structural genomics of GPCRs, Trends Biotechnol. 2005, 23, 103-108.
75. K.H. Bleicher, L.G. Green, R.E. Martin, M. Rogers-Evans, Ligand identification for G-protein-coupled receptors: a lead generation perspective, Curr. Opin. Chem. Biol. 2004, 8, 287-296.
76. P. Jimonet, R. Jager, Strategies for designing GPCR-focused libraries and screening sets, Curr. Opin. Drug Discov. Devel. 2004, 7, 325-333.
77. R. Crossley, The design of screening libraries targeted at G-protein coupled receptors, Curr. Top. Med. Chem. 2004, 4, 581-588.
78. J. Bajorath, Integration of virtual and high-throughput screening, Nat. Rev. Drug Discov. 2002, 1, 882-894.
79. M.M. Hann, T.I. Oprea, Pursuing the leadlikeness concept in pharmaceutical research, Curr. Opin. Chem. Biol. 2004, 8, 255-263.
80. S. Halazy, G-protein coupled receptors bivalent ligands and drug design, Exp. Opin. Ther. Patents 1999, 9, 431-446.
81. D.J. Daniels, A. Kulkarni, Z. Xie, R.G. Bhushan, P.S. Portoghese, A bivalent ligand (KDAN-18) containing δ-antagonist and κ-agonist pharmacophores bridges δ2 and κ1 opioid receptor phenotypes, J. Med. Chem. 2005, 48, 1713-1716.
82. V.J. Hruby, Designing peptide receptor agonists and antagonists, Nat. Rev. Drug Discov. 2002, 1, 847-858.
83. M. Eguchi, M. McMillan, C. Nguyen, J.L. Teo, E.Y. Chi, W.R. Henderson Jr, M. Kahn, Chemogenomics with peptide secondary structure mimetics, Comb. Chem. High Throughput Screen. 2003, 6, 611-621.
84. T.R. Webb, in Chemogenomics in Drug Discovery: A Medicinal Chemistry Perspective, (Eds.: H. Kubinyi, G. Müller), Wiley-VCH, Weinheim, 2004, pp. 313-324.
976
85. S.L. Garland, P.M. Dean, Design 1. Molecular frameworks, J. Med.

criteria for molecular mimics of Chem. 1996,39,2887-2893.
fragments of the beta-turn. 1. C alpha 94. C.G. Wermuth, Selective
atom analysis, J. Cornput.-Aided Mol. optimization of side activities:
Des. 1999, 13,469-483. another way for drug discovery, J.
86. S.L. Garland, P.M. Dean, Design Med. Chem. 2004,47,1303-1314.
criteria for molecular mimics of 95. R.P. Sheridan, S.K. Kearsley, Why do
fragments of the beta-turn. 2. C we need so many chemical similarity
alpha-C beta bond vector analysis, J. search methods?, Drug Discov. Today
Cornput.-Aided Mol. Des. 1999, 13, 2002,4,903-911.
485-498. 96. J. Hert, P.Willett, D.J. Wilton,
87. B.E. Evans, K.E. Rittle, M.G. Bock, P. Acklin, K. Azzaoui, E. Jacoby,
R.M. DiPardo, R.M. Freidinger, W.L. A. Schuffenhauer, Topological
Whitter, G.F. Lundell, D.F. Veber, descriptors for similarity-based
P.S. Anderson, R.S. Chang, Methods virtual screening using multiple
for drug discovery: development of bioactive reference structures, Org.
potent, selective, orally effective Biomol. Chem. 2004, 2, 3256-3266.
cholecystokinin antagonists, I. Med. 97. A. Schuffenhauer, P. Floersheim,
Chem. 1988,31,2235-2246. P. Acklin, E. Jacoby, Similarity
88. A.A. Patchett, R.P. Nargund, in metrics for ligands reflecting the
Annual Reports in Medicinal similarity of the target proteins, 1.
Chemistry, Vol. 35, (Ed.: G.L. Chem. In{ Comput. Sci. 2003, 43,
Trainor), Academic Press, San Diego, 391-405.
2000, pp. 289-298. 98. N.P. Savchuck, K.V. Balakin, S.E.
89. G. Muller, in Chemogenomics i n Drug Tkachenko, Exploring the
Discovery -A Medicinal Chemistry chemogenomic knowledge space
Perspective, (Eds.: H. Kubinyi, with annotated chemical libraries,
G. Muller), Wiley-VCH, Weinheim, Curr.Opin. Chem. Biol. 2004, 8,
412-417.
2004, pp. 7-42.
90. T. Guo, D.W. Hobbs, Privileged 99. M. von Korff, M. Steger,
structure-based combinatorial GPCR-tailoredpharmacophore
pattern recognition of small
libraries targeting G protein-coupled
molecular ligands, J. Chem. In$
receptors, Assay Drug Dev. Technol.
Comput. Sci. 2004,44, 1137-1147.
2003, I, 579-592.
100. G. Schneider, M.L. Lee, M. Stahl,
91. C.A. Willoughby, S.M. Hutchins,
P. Schneider, De novo design of
K.G. Rosauer, M.J. Dhar, K.T.
molecular architectures by
Chapman, G.G. Chicchi,
evolutionary assembly of
S. Sadowski, D.H. Weinberg, drug-derived building blocks, 1.
S. Patel, L. Malkowitz, J. Di Salvo, Cornput.-Aided Mol. Des. 2000, 14,
S.G. Pacholok, K. Cheng, 487-494.
Combinatorial synthesis of 101. G. Jenkins, Targeting GPCRs in
3-(amidoalkyl)and silico, Curr. Drug Discov. 2004, 3,
3-(aminoalkyl)-2-arylindole 23-26.
derivatives: discovery of potent 102. A. Schuffenhauer, J. Zimmermann,
ligands for a variety of G-protein R. Stoop, J.J. van der Vyver,
coupled receptors, Bioorg. Med. S. Lecchini, E. Jacoby,An ontology
Chem. Lett. 2002, 12, 93-96. for pharmaceutical ligands and its
92. R.P. Sheridan, Finding multiactivity application for library design and In
substructures by mining databases of Silico screening, J . Chem. Inf:
drug-like compounds, /. Chem. In$ Comput. Sci. 2002, 42, 947-955.
Comput. Sci. 2003,43,1037-1050. 103. C.M. Krejsa, D. Horvath, S.L.
93. G.W. Bemis, M.A. Murcko, The Rogalski, J.E. Penzotti, B. Mao,
properties of known drugs. F. Barbosa, J.C. Migeon, Predicting
References I977
ADME properties and side effects: 113. R.M. Eglen, Functional G
the BioPrint approach, Curr. Opin. protein-coupled receptor assays for
Drug. Discov. Devel. 2003, 6 , 470-480. primary and secondary screening,
104. H. Roter, Large-scale integrated Comb. Chem. High Throughput
databases supporting drug discovery, Screen. 2005, 8, 311-318.
C u r . Opin. Drug. Discou. Devel. 2005, 114. C. Williams, A. Sewing, G-protein
8, 309-315. coupled receptor assays: To measure
105. T.Klabunde, A. Evers, GPCR affinity or efficacy that is the
antitarget modeling: Pharmacophore question, Comb. Chem. High
models for biogenic amine binding Throughput Screen. 2005, 8, 285-292.
GPCRs to avoid GPCR-mediated side 115. D. Colquhoun, Binding, gating,
effects, Chembiochem 2005, 6, affinity and efficacy: the
876-889. interpretation of structure-activity
106. E.S. Huang, Predicting ligands for relationships for agonists and of the
orphan GPCRs, Drug Discou. Today effects of mutating receptors, Br. J.
2005, 10,69-73. Pharmacol. 1998, 125,924-947.
107. A. Gaulton, T.K. Attwood, 116. I. Sabroe, M.J. Peck, B.J . van Keulen,
Bioinformatics approaches for the A. Jorritsma, G. Simmons, P.R.
classification of G-protein-coupled Clapham, T.J. Williams, J.E. Pease, A
small molecule antagonist of
receptors, Curr. Opin. Phamacol.
chemokine receptors CCRl and
2003,3,114-120.
108. E.S. Huang, Construction of a
CCR3. Potent inhibition of
eosinophil function and
sequence motif characteristic of
CCR3-mediated HIV-1 entry, /. Biol.
aminergic G protein-coupled
Chem. 2000, 275,25985-25992.
receptors, Protein Sci. 2003, 12,
117. M. Marcoli, S. Scarrone, G. Maura,
1360- 1367.
109. T.M. Frimurer, T.Ulven, C.E. Elling,
G. Bonanno, M. Raiteri, A subtype of
the y-aminobutyric acid B receptor
L.O. Gerlach, E. Kostenis, regulates cholinergic twitch response
T. Hogberg, A physicogenetic in the guinea pig ileum, J. Pharmacol.
method to assign ligand-binding Exp. Ther. 2000, 293, 42-47.
relationships between 7TM 118. D. Gunvitz, R. Haring,
receptors, Bioorg. Med. Chem. Lett. Ligand-selective signaling and
2005, 15,3707-3712. high-content screening for GPCR
110. C.Y. Lin, M.G. Varma, A. Joubel, drugs, Drug Discou. Today 2003, 8,
S. Madabushi, 0. Lichtarge, D.L. 1108- 1109.
Barber, Conserved motifs in 119. H.P. Fischer, S. Heyse, From targets
somatostatin, D2-dopamine, and to leads: the importance of advanced
alpha 2B-adrenergic receptors for data analysis for decision support in
inhibiting the Na-H exchanger drug discovery, Curr. Opin. Drug.
NHE1,J. Biol. Chern. 2003, 278, Discov. Devel. 2005, 8, 334-346.
15128-15135. 120. T. Kenakin, Predicting therapeutic
111. A. Cacace, M. Banks, T. Spicer, value in the lead optimization phase
F. Civoli, J. Watson, An ultra-HTS of drug discovery, Nat. Rev. Drug
process for the identification of small Discov. 2003, 2,429-438.
molecule modulators of orphan 121. A. Couve, A.R. Calver, B. Fairfax, S.J.
G-protein-coupled receptors, Drug Moss, M.N. Pangalos, Unravelling
Discov. Today 2003,8, 785-792. the unusual signalling properties of
112. D.Gabriel, M. Vernier, M.J. Pfeifer, the GABA(B)receptor, Biochem.
B. Dasen, L. Tenaillon, R. Bouhelal, Phamacol. 2004, 68,1527-1536.
High throughput screening 122. M. Waldhoer, J. Fong, R.M. Jones,
technologies for direct cyclic AMP M.M. Lunzer, S.K. Sharma,
measurement, Assay Drug Dev. E. Kostenis, P.S. Portoghese, J.L.
Technol. 2003, I , 291-303. Whistler, A heterodimer-selective
978
agonist shows in vivo relevance of G 125. K.M. Small, D.W. McGraw, S.B.
protein-coupled receptor dimers, Liggett, Pharmacology and
Proc. Natl. Acad. Sci. U.S.A. 2005, physiology of human adrenergic
102,9050-9055. receptor polymorphisms, Annu. Rev.
123. J. Bockaert, L. Fagni, A. Dumuis, Pharmacol. Toxicol. 2003, 43,
P. Marin, GPCR interacting proteins 381-411.
(GIP), Phamtacol. Thher. 2004, 103, 126. E.A. Hallem, A. Nicole Fox, L.J.
203-221. Zwiebel, J.R. Carlson, Olfaction:
124. W.E. Evans, H.L. McLeod, mosquito receptor for human-sweat
Pharmacogenomics-drug disposition, odorant, Nature 2004, 427, 212-213.
drug targets, and side effects, N. Engl.
J . Med. 2003,348,538-549.
15.5
Drugs Targeting Protein-Protein Interactions
Patrick Chène
Outlook
Most biological processes involve permanent and nonpermanent
interactions between different proteins, and many protein complexes play
a key role in various human diseases. Therefore, molecules preventing the
formation of these protein complexes could be valuable new therapeutic
agents to treat these diseases. Unlike enzyme catalytic sites, protein
interfaces have not evolved to bind low-molecular-weight molecules. It
is therefore difficult to identify small compounds that inhibit protein-protein
interactions. However, there is considerable diversity in the structure of protein
interfaces, some of which may be more attractive than others for medicinal
chemistry. One of the main challenges in drug discovery is therefore to identify
these interfaces and to exploit their properties to make marketable drugs. In
this article, the properties of protein interfaces will be studied in the light of
their use as drug targets.
15.5.1
Introduction
The discovery of new drug targets is a constant challenge for pharmaceutical
companies. In the last few decades, most drugs that have been developed are
enzyme inhibitors [1]. One reason for this preference is that enzymes
naturally bind small molecules, their substrates. This offers
the possibility of identifying small molecules that mimic the substrate
and bind to these proteins, inhibiting their biological activity. For example,
transition state analogs bind with high affinity to enzymes and are potent
inhibitors [2]. Furthermore, because enzyme inhibitors are normally small
molecules they usually have an acceptable bioavailability, which facilitates
their development. Currently, however, while many enzymes have still not
been targeted or are in the process of being evaluated in a more systematic
fashion [3], the pharmaceutical industry is looking for new opportunities
outside the enzyme field. Amongst the potential candidates, inhibitors of
protein-protein interactions represent an attractive new class of molecules.
Many proteins, including enzymes, exert part if not all of their biological
activity by interacting with other proteins. The prevention of these interactions
is then a way of modulating the activities of these proteins. The structural
diversity and large number of protein-protein interfaces offer an enormous
number of new targets for the pharmaceutical industry.
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
A certain caution is needed, however, because this large number of
possible new targets may not be as enormous as it appears. Protein interfaces
have not evolved to bind low-molecular-weight molecules, as enzymes
did. It might therefore be more
difficult to identify protein-protein interaction inhibitors than it is to identify
enzyme inhibitors. A second difficulty comes from the diversity of the protein
interfaces. Since large families of enzymes bind to the same substrate (e.g.,
ATP for the kinases), it is possible to use the knowledge gained and the
compound libraries that were synthesized to target the first members of the
family to design more rapidly compounds that target new members of the
family. This of course dramatically enhances the speed of the drug discovery
process. In the case of protein-protein interactions, even if similarities have
been observed between some interfaces, it does not appear that binding sites
are conserved among protein interfaces. Therefore, each protein interface is
essentially unique, and new strategies in chemistry (synthesis and optimization
of new scaffolds) may have to be developed for each new protein-protein
interaction that is studied. This is of course more time consuming and less
attractive for the pharmaceutical companies because they have to maintain
high productivity in the very competitive field of drug discovery. The following
section presents an overview of the properties of protein interfaces followed
by the application of this knowledge to the design of competitive inhibitors of
protein-protein interaction.
15.5.2
The Diversity of Protein-Protein Interfaces
In living organisms, a large number of proteins form transient or permanent
complexes to exert their biological function, and recent studies have revealed
the complexity of these protein-protein interaction networks [4]. Since so
many protein-protein interactions occur in cells, one can expect differences
in the structure and composition of the regions of the proteins committed to
the formation of these complexes. These differences are necessary to reach
the degree of specificity needed to form the “right” complexes in the crowded
cellular environment and to obtain complexes with different stabilities. To
illustrate this crowding, the protein concentration in the endoplasmic
reticulum is estimated to be 100 mg mL⁻¹ [5]. This diversity of the protein
interfaces is an opportunity for drug discovery because it may allow more
specific inhibitors to be generated.
However, it is very likely that many of these interfaces do not have the
properties required for the design of potent inhibitors. It is therefore very
important - before starting any drug discovery process aimed at designing
competitive inhibitors of protein-protein interactions - that the druggability
of the selected interfaces be evaluated. This depends on both the structure
and the physicochemical properties of the interface. In this section, we will
summarize the general properties of protein interfaces.
Protein complexes are formed from identical subunits (homo-oligomers)
or from different subunits (hetero-oligomers) [6]. These oligomers can be
formed directly during protein synthesis (obligate complexes) or on the
encounter (nonobligate complexes) between the different subunits. Protein
complexes also differ in their half-lives. Permanent complexes are very stable
and their subunits remain associated, while others exist only transiently
(nonpermanent) and their chains associate/dissociate more easily. This means
that the subunits of some protein complexes (obligate/permanent) never exist
in cells as stable independent structures. Furthermore, the formation of some
complexes depends on the presence of effector molecules (e.g., GTP), on
changes in protein expression/localization, or on physiological conditions
(e.g., pH).
These general properties are already valuable for drug discovery. Targeting
the interface of permanent oligomers is a priori a difficult task since the only
way to abolish this type of interaction is to identify compounds that act during
protein synthesis/folding. However, it is conceivable that compounds may be
identified that, upon binding to the contact surface of one subunit, prevent
interaction with the other subunit in nonpermanent oligomers. Finally, the
synthesis of compounds mimicking the natural effector might be an attractive
way of inhibiting the formation of effector-regulated complexes. In this case,
the inhibitors are designed in such a way that they bind not at the protein
interface but to the effector-binding pocket. Depending on the structure of
the effector-binding site, the design of such inhibitors might be similar to the
design of enzyme inhibitors. This type of approach will not be considered in
this section, which focuses on compounds that, on binding at protein interfaces,
prevent the association between two proteins (competitive inhibitors).
Upon binding, the components of a protein complex bury part of their
accessible surface to create the contact interface. On average, the size of
the subunit interface in permanent homodimers is larger than in other
protein complexes [6, 7]. Jones et al. have studied a set of 59 complexes and
found that the surface buried in homodimers varies from 368 to 4746 Å²,
while in heterocomplexes it ranges from 639 to 3228 Å² [8]. Janin and
collaborators found similar results [7, 9]. The study of the structure
of the free and associated subunits shows that they are likely to undergo
conformational changes when they form large interfaces (≥1500 Å²) [10, 11].
With the exception of coexpressed subunits (obligate complexes) [11], it does
not seem that there is a strong correlation between the size of the interface
and the binding energy (ΔG¹⁾) [12]. However, the entire contact surface does
not contribute equally to binding. Some regions, called recognition patches or
hot spots, are more important for recognition/binding [13]. These regions have
a core and a rim [14], with the more accessible rim residues surrounding the
more buried core residues. The amino acid composition of the rim is similar to
the rest of the protein surface, while the core contains more aromatic residues,
revealing a higher lipophilicity for this part of the contact region. There is a
correlation between the number of recognition patches and the size of the
interface [7, 14]. The larger the interface, the more hot spots are present.
However, in most cases, only one hot spot is present at the interface, and on
average it buries 1560 ± 340 Å² of surface upon binding. In interfaces with
multiple recognition patches, one of them is generally larger and it has a size
similar to that of the hot spots found in single-patch interfaces. The presence
of recognition patches at protein interfaces is interesting for drug discovery.
Compounds that interact with these hot spots should prevent interaction
because a large part of the binding energy is concentrated in these areas. Since
the hot spots are smaller than the full interface, it might be easier
to identify low-molecular-weight compounds, comparable in size to enzyme
inhibitors, that inhibit interaction. By contrast, if the binding energy were
equally distributed over the entire interface, much larger molecules, with a
lower likelihood of success as drug-development candidates, would have to be
designed.
1) ΔG: Gibbs free energy. The change in Gibbs free energy is linked to the
change in enthalpy (ΔH) and entropy (ΔS) by the following formula:
ΔG = ΔH - TΔS. A process occurs spontaneously, at constant temperature and
pressure, when ΔG < 0.
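The Gibbs relation given in the footnote to this paragraph (ΔG = ΔH - TΔS) can be turned into a small numerical sketch. The example below is illustrative only: the function names and the input values are assumptions chosen for the example, not data from the studies cited above, and the conversion to a dissociation constant uses the standard textbook relation ΔG = RT ln Kd (1 M standard state), which the chapter itself does not state.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def gibbs_free_energy(delta_h, delta_s, temperature=298.15):
    """Return dG (kcal/mol) from dH (kcal/mol) and dS (kcal/mol/K)."""
    return delta_h - temperature * delta_s


def dissociation_constant(delta_g, temperature=298.15):
    """Return Kd (mol/L) from a binding free energy dG (kcal/mol).

    Uses dG = RT ln(Kd) with a 1 M standard state (textbook relation,
    not taken from the chapter).
    """
    return math.exp(delta_g / (R_KCAL * temperature))


# Hypothetical binding event: favorable enthalpy, unfavorable entropy.
dg = gibbs_free_energy(delta_h=-12.0, delta_s=-0.010)
kd = dissociation_constant(dg)
print(f"dG = {dg:.2f} kcal/mol, spontaneous: {dg < 0}")
print(f"Kd = {kd:.2e} M")
```

With these made-up inputs the enthalpy term outweighs the entropic penalty, so ΔG is negative and the association is spontaneous.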
The shape of the interface is another important parameter for drug discovery
because it is more difficult to obtain potent inhibitors for flat interfaces than
for interfaces that contain well-defined cavities (pockets). The less flat the
interface between two proteins, the greater the tendency of one partner to be
buried and to form a more stable complex. The heterocomplexes have more
planar interfaces than homodimers, and permanent heterocomplexes have
more twisted contact surfaces than nonpermanent ones [8]. This suggests
that the most attractive complexes for drug discovery - the nonpermanent
complexes (see above) - have rather flat interfaces. The presence of cavities
(pockets) at the contact region should therefore be looked at very carefully
during the evaluation of a protein-protein interaction.
Even if an interface contains cavities, they must be suitable for drug
discovery. Of course, they must be large enough to accommodate inhibitors,
but their shape complementarity is also important. It might be more difficult
to generate potent competitive inhibitors if the two interacting chains are
closely packed and make an extensive number of direct interactions²⁾. In
contrast, if within the cavity, the shape complementarity between the two
chains is low, the interacting subunits may only make a limited number of
direct interactions. For such cavities, it might be easier to improve the potency
of the inhibitors. A potent inhibitor should contain chemical groups that, upon
binding to the target protein, mimic the key interactions (the most important
for ΔG) made by the competing subunit, together with chemical groups that make
new interactions with the target protein. The creation of these additional
contacts between the inhibitor and the target protein leads to a favorable
enthalpic contribution (ΔH < 0) to the binding energy and, therefore, to an
increased potency. “Loose” interfaces have a higher probability of containing
atoms not directly involved in the formation of the protein complex than
do very complementary protein contact regions. They therefore offer more
possibilities for improving the potency of inhibitors. Several methods are
used to determine the complementarity between two interacting proteins [10].
Thornton and collaborators [8] have used one method, the gap index, to
measure the complementarity of different complexes. Their results show that
the homodimers and permanent heterodimers make more complementary
interfaces than the nonobligatory heterocomplexes. The latter may therefore
be more “druggable”. It should be kept in mind that these methods give an
indication of the atom density (packing) but not of the interaction network.
Since it is important, to enhance potency, that inhibitors make more
interactions than the competing chain, loose packing does not necessarily
imply that the cavity is a good drug target. During the study of an interface,
therefore, it is important, even in the case of “loose” interfaces, to carefully
check that, in addition to the key interactions made in the protein complex, it
is possible to create new interactions that will help enhance the potency of the
inhibitors.
2) A direct interaction is an interaction that does not involve any bridging
water molecules between the two interacting protein subunits.
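The text names the gap index but does not define it. As a hedged illustration, the sketch below assumes the commonly described definition (gap volume between the two subunits divided by the interface accessible surface area). The function name and the numerical inputs are hypothetical and serve only to show that, under this assumed definition, a larger index corresponds to looser packing.

```python
def gap_index(gap_volume_A3, interface_asa_A2):
    """Gap index in angstroms: gap volume (A^3) / interface ASA (A^2).

    Assumed definition; the chapter itself does not spell out the formula.
    """
    if interface_asa_A2 <= 0:
        raise ValueError("interface area must be positive")
    return gap_volume_A3 / interface_asa_A2


# Hypothetical interfaces with the same buried area but different gap volumes.
tight = gap_index(gap_volume_A3=3000.0, interface_asa_A2=1500.0)
loose = gap_index(gap_volume_A3=6000.0, interface_asa_A2=1500.0)
print(f"tight interface: {tight:.1f} A, loose interface: {loose:.1f} A")
```

As the paragraph cautions, such an index reflects packing density only and says nothing about the interaction network within the cavity.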
One consequence of a lack of complementarity between two interacting
proteins is that water molecules are present at the interface to satisfy the H-bond
network between the subunits. The study of different protein interfaces shows
that contact regions with few cavities do not contain many water molecules,
while interfaces with more cavities contain a larger number of water molecules
that are used to maintain close packing at the interface [15]. These trapped
water molecules are involved in bridging H bonds between the two chains
[15]. Water is therefore an important element of the interaction, and it should
be considered during drug design. The displacement of key bound water
molecules by the inhibitor should enhance its affinity because of a favorable
entropic effect.
The presence of water molecules at the interface reflects its polar nature, but
protein contact regions also contain hydrophobic areas, which are important
for the interaction. In terms of energy, hydrophobic interfaces are more
suitable for drug discovery than polar ones. The partial desolvation of both
the protein and the inhibitor upon binding is a favorable component of the
binding energy. The design of molecules that contain lipophilic moieties is
then a prerequisite for obtaining potent drugs. The chemical nature of protein
interfaces has been extensively studied, and their content of polar/apolar
groups analyzed [6-10, 14, 16-18]. On average, 56% nonpolar
carbon-containing groups, 29% neutral polar groups, and 15% charged groups are
present at protein interfaces [10]. The interfaces in permanent complexes are
generally more hydrophobic than the ones of nonpermanent complexes [6].
This could be explained by the fact that solvent-exposed hydrophobic patches
are energetically unfavorable and that subunits with hydrophobic surfaces are
therefore not stable. The presence of hydrophobic cavities at the interface
between two proteins is particularly attractive for drug discovery because it
allows the design of lipophilic molecules which, upon binding, become buried
in a hydrophobic environment.
Another feature of the interaction between two proteins is the loss
of flexibility of their contact regions upon binding. It is expected from
thermodynamics that better binding is obtained when the interaction does
not induce a large loss of conformational entropy. Indeed, it has been shown
that protein interacting sites are less flexible than the rest of the protein
surface [19]. The loss of flexibility that occurs during the association between
two proteins can be advantageously used to design inhibitors. The design of
compounds conformationally constrained in such a way that they already take
on their bound conformation in solution is a way of improving potency. Such
molecules will not undergo large conformational changes upon binding, and
they will therefore “pay” a decreased entropic penalty when compared with
more flexible inhibitors.
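To make the thermodynamic argument concrete, the following sketch converts a reduction in the entropic penalty into a fold change in affinity, using the textbook relation between a free-energy difference and a ratio of dissociation constants. The 1.4 kcal/mol figure is an arbitrary illustrative value, not a measurement from the chapter, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def affinity_fold_change(delta_delta_g, temperature=298.15):
    """Fold improvement in Kd for a free-energy gain delta_delta_g (kcal/mol)."""
    return math.exp(delta_delta_g / (R_KCAL * temperature))


# Illustrative: rigidifying an inhibitor saves 1.4 kcal/mol of T*dS penalty.
fold = affinity_fold_change(1.4)
print(f"Estimated affinity gain: about {fold:.1f}-fold")
```

The sketch assumes that constraining the molecule changes only the entropic term; in practice a constraint can also alter the enthalpy of binding.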
Altogether, this short summary indicates that there is no common
recognition template used by oligomeric proteins to form complexes. On the
contrary, even if protein interfaces share some general properties, they differ
to a large extent. It is therefore very difficult to make a general statement
about the druggability or nondruggability of protein interfaces. Amongst the
large number of protein interfaces, some are more druggable, and the major
challenge for drug discovery is to identify them.
15.5.3
A Proposed Decision Tree to Select Interfaces for Drug Discovery
To help in identifying druggable interfaces, a decision tree is proposed. Two
points need to be addressed before describing this tree. First, drug discovery
is not, at least today, an exact science. So even if an interface does not fit
the decision tree, it might still be possible to obtain molecules that prevent
its formation. This leads to the second point: the potency of protein-protein
interaction inhibitors. In many cases, molecules (peptides or
low-molecular-weight compounds) with IC₅₀³⁾ values in the micromolar range are
described as protein-protein interaction inhibitors. However, a large number of
these compounds, while they may be useful tools to study the interaction, will
never enter clinical use, which is the ultimate goal for pharmaceutical
companies. These molecules need further optimization to achieve this goal.
Protein-protein interaction inhibitors will only be considered attractive as new
drugs when they demonstrate clinical efficacy, as do enzyme inhibitors. Such
drugs can only be obtained if the target interface allows the design of potent
and bioavailable molecules. A detailed analysis of the interface to assess its
druggability is therefore required before any drug discovery programme can
be started. The proposed decision tree may help in selecting the interfaces that
possess the structural and physicochemical properties required for the design
of potent inhibitors (Fig. 15.5-1).
3) IC₅₀: concentration of inhibitor required to inhibit 50% of the interaction
between two proteins.
Just as it is easier to pick cherries from a tree in daylight than during a
moonless night, so too is it easier to guide the drug discovery process when
it is possible to see the structure of the target interface. A drug discovery
programme can be successful even without using any structural information,
but it might be harder and take longer to obtain potent molecules without this
precious knowledge. The structure of the interface should not only help in
deciding whether it is druggable, using the criteria described below, but will
also help improve the potency of the compounds during their optimization. This
explains why the availability of the interface structure is considered the most
favorable case in the decision tree.
[Figure: decision tree running from "Interface to evaluate" through criteria
including Structure, Hydrophobicity, and Complementarity to "Attractive
interface"; branches are marked LF or MF.]
Fig. 15.5-1 A decision tree to evaluate the druggability of protein interfaces.
This tree can be used to determine whether a selected interface possesses some
of the features required for drug discovery. LF - less favorable; MF - more
favorable.
The second criterion in the decision tree is the presence of cavities at the
interface. The most favorable case is when a well-defined binding pocket is
found at the contact region between both proteins. The presence of such
a pocket allows the formation of a stable inhibitor-protein complex when
the inhibitor mimics the protruding chain. The contact region of some
uncomplexed proteins is flexible and, upon binding, this plasticity/flexibility
allows conformational changes that enhance interface complementarity. The
structure of the final protein complex may therefore not reveal the presence
of cavities that are present on the surface of the unbound proteins but absent
in the final complex. Compounds that bind to these pockets could block the
conformational changes required for the formation of the complex preventing
the interaction. The knowledge of the structure of the unbound proteins is
therefore very useful to identify this type of pocket.
The next selection criterion concerns the polarity of the selected cavity. In the
most favorable case, it should contain hydrophobic residues to favor the design
of lipophilic inhibitors. The addition of hydrophobic substituents (taking care
to preserve the solubility of the inhibitor) is an effective way of improving the potency of an
inhibitor thanks to the hydrophobic effect. It has been shown that electrostatic
interactions are important for the rate of association, but not for the stability of
protein complexes [20]. Furthermore, electrostatic interactions are weakened
by the high dielectric constant of water. It might therefore be more difficult
to identify inhibitors that bind tightly to the target cavity when it is essentially
polar.
The presence of a hydrophobic cavity is important, but its size is also relevant
for drug discovery. It should be large enough to accommodate an inhibitor. An
analysis of 20 marketed drugs shows that they have a solvent-accessible surface
ranging from 150 to 500 Å² [21], so the target cavity should accommodate such
molecules. On the other hand, the cavity should not be so large that the
key contact residues for the interaction are too distant from each other. In
such cases, inhibitors designed to contact these different residues might be
excessively large. Keeping the size of inhibitors small is important for their
bioavailability. As a general trend, the larger a synthetic molecule is, the lower
its bioavailability.
The last criterion of the decision tree is the shape complementarity between
the two interacting subunits within the cavity. The less favorable case is when
both chains are densely packed and make many direct interactions within the
cavity. As already mentioned, inhibitors should mimic the natural substrate,
but they should also make additional contacts that help enhance
their potency. The cavity should therefore contain atoms that are not directly
engaged in the interaction between the two proteins such that it is possible
to design molecules that interact directly with them. Interfaces that possess
cavities with low complementarity might therefore be more attractive. Since
water molecules are present in such cavities, the potency of the inhibitors
could be enhanced if they are designed in such a way that upon binding they
displace some key water molecules.
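The five criteria discussed above can be summarized as a simple checklist. The sketch below is one possible encoding, not the author's published tree: the class and field names are hypothetical, each criterion is reduced to a yes/no flag, and the 150-500 Å² window is borrowed from the marketed-drug surface range quoted in the text as a rough proxy for a suitable cavity size.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical description of a protein-protein interface."""
    has_structure: bool          # 3D structure of the interface available
    has_cavity: bool             # well-defined pocket at the contact region
    cavity_hydrophobic: bool     # pocket lined mainly by hydrophobic residues
    cavity_area_A2: float        # rough cavity surface in square angstroms
    low_complementarity: bool    # loosely packed chains within the cavity


def evaluate(interface, min_area=150.0, max_area=500.0):
    """Return (criterion, 'MF' or 'LF') pairs in the order used in the text."""
    size_ok = min_area <= interface.cavity_area_A2 <= max_area
    checks = [
        ("structure", interface.has_structure),
        ("cavity", interface.has_cavity),
        ("hydrophobicity", interface.cavity_hydrophobic),
        ("size", size_ok),
        ("complementarity", interface.low_complementarity),
    ]
    return [(name, "MF" if ok else "LF") for name, ok in checks]


def is_attractive(interface):
    """An interface is called attractive here only if every check is MF."""
    return all(flag == "MF" for _, flag in evaluate(interface))


candidate = Interface(True, True, True, 320.0, True)
flat_polar = Interface(True, False, False, 0.0, False)
print(evaluate(candidate), is_attractive(candidate))
print(evaluate(flat_polar), is_attractive(flat_polar))
```

Requiring every criterion to be MF is a deliberately strict choice made for this sketch; as the text stresses, an interface that does not fit the tree may still yield inhibitors.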
The analysis of protein interfaces using the proposed decision tree leads to the
selection of competitive inhibitors because it focuses on the characterization of
the contact region between the two proteins. However, it is important to note
that molecules that do not bind at the interface can also inhibit protein-protein
interactions. The potency of competitive inhibitors, as determined by the
measure of their IC₅₀, is affected by the concentration of the substrate
(Fig. 15.5-2). The higher the concentration of the substrate, the less potent
the inhibitor becomes. Therefore, if the competing subunit is very abundant
and/or very stable (low turnover), so that it accumulates after inhibition of the
interaction, it might be more difficult to reach efficacy with low doses of a
competitive inhibitor. Higher doses of inhibitor will have to be administered
to counterbalance this effect, but then compound-related toxicity could arise.
Molecules that are not competitive inhibitors do not suffer these disadvantages.
These molecules, which do not bind at the interface, induce conformational
changes that prevent complex formation. Several such inhibitors - allosteric
inhibitors - have been identified; see, for example, Arkin in Table 15.5-1.
However, it is very likely that this strategy does not apply to every protein
complex. Furthermore, if such binding sites do exist, they must also possess
structural and physicochemical properties that allow the design of potent
compounds.
Fig. 15.5-2 Competitive inhibition (scheme: P1I ⇌ I + P1 + P2 ⇌ P1P2). The
inhibitor (I) binds to the target protein P1, blocking its association with
protein P2. I50 corresponds to the concentration of inhibitor required to
inhibit/inactivate the P1P2 complex by 50%. Note the influence of [S] on I50.
Cheng and Prusoff have published a detailed analysis on the relationship
between IC50 and inhibition of enzymes [22].
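
The dependence of IC50 on the concentration of the competing partner can be
illustrated with a short numerical sketch. It assumes that the Cheng-Prusoff
relation derived for enzymes [22] carries over to a protein-protein competition
assay, with the dissociation constant of the P1P2 complex taking the place of
Km. The function name and all numerical values are hypothetical and serve only
as an illustration.

```python
def ic50_competitive(ki: float, competitor_conc: float, kd_competitor: float) -> float:
    """Cheng-Prusoff relation for a competitive inhibitor.

    IC50 = Ki * (1 + [S] / Kd), where [S] is the concentration of the
    competing partner (here P2) and Kd its dissociation constant for P1.
    All three arguments must use the same concentration unit.
    """
    if ki <= 0 or kd_competitor <= 0 or competitor_conc < 0:
        raise ValueError("Ki and Kd must be positive; [S] must be non-negative")
    return ki * (1.0 + competitor_conc / kd_competitor)


# Illustrative (assumed) values in nM, not taken from the text.
KI_NM = 10.0
KD_P1P2_NM = 100.0

for p2_nm in (0.0, 100.0, 1000.0):
    ic50 = ic50_competitive(KI_NM, p2_nm, KD_P1P2_NM)
    print(f"[P2] = {p2_nm:7.1f} nM -> IC50 = {ic50:6.1f} nM")
```

With these assumed numbers, a tenfold excess of competing protein over its Kd
raises the IC50 elevenfold, which is why a very abundant or very stable partner
makes it harder to reach efficacy at low doses.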
Table 15.5-1 Some articles reviewing the latest findings in the
discovery of protein-protein interaction inhibitors. These articles
cover the period 2000-2004

First author | Title | Reference
Arkin, M.R. | Small-molecule inhibitors of protein-protein interactions: progress toward the dream | Nat. Rev. Drug Discov. 2004, 3, 301
Pagliaro, L. | Emerging classes of protein-protein interaction inhibitors and new tools for their development | Curr. Op. Chem. Biol. 2004, 8, 442
Janin, Y.L. | Peptides with anticancer use or potential | Amino Acids 2003, 25, 1
Berg, T. | Modulation of protein-protein interactions with small organic molecules | Angew. Chem. 2003, 42, 2462
Loregian, A. | Protein-protein interactions as targets for antiviral chemotherapy | Rev. Med. Virol. 2002, 12, 239
Ockey, D.A. | Inhibitors of protein-protein interactions | Expert Opin. Ther. Patents 2002, 12, 393
Huang, Z. | The chemical biology of apoptosis: exploring protein-protein interactions and the life and death of cells with small molecules | Chem. Biol. 2002, 9, 1059
Perez-Montfort, R. | The interfaces of oligomeric proteins as targets for drug design against enzymes from parasites | Curr. Top. Med. Chem. 2002, 2, 457
Toogood, P.L. | Inhibition of protein-protein association by small molecules: approaches and progress | J. Med. Chem. 2002, 45, 1543
Cochran, A.G. | Antagonists of protein-protein interactions | Chem. Biol. 2000, 7, R85
Zeng, J. | Computational structure-based design of inhibitors that target protein surfaces | Combi. Chem. High Throughput Screen 2000, 3, 355
Huang, Z. | Structural chemistry and therapeutic intervention of protein-protein interactions in immune response, human immunodeficiency virus entry, and apoptosis | Pharmacol. Ther. 2000, 86, 201

15.5.4
Experimental Validation of the Selected Interface
All the selection criteria presented in the decision tree in Fig. 15.5-1 are
general, and many protein interfaces will only fulfill some of them. In these
cases - and also for interfaces that meet all the decision tree criteria - an
experimental study of the interface should be carried out before starting drug
discovery activities. This experimental validation should enable a good level of
confidence to be obtained on the druggability of the selected interface.
A powerful way of performing this experimental validation is to combine
site-directed mutagenesis and peptide-binding experiments. Site-directed
mutagenesis is used to demonstrate the role of selected residues in the
interaction, while peptides will help in mapping the binding site and also
in defining the importance of key amino acids. The synthesis of peptides
containing nonnatural amino acids can also be used to create new contacts
with the targeted subunit. This should help in validating some optimization
strategies that could be used later on in the design of low-molecular-weight

compounds. It must be kept in mind that peptides can be used only if at
least one of the two contact regions at the interface is formed by a contiguous
stretch of amino acids. This is not often the case, and many protein-binding
sites are fragmented [8].
Peptides are also useful tools to demonstrate the validity of the biological
concept and thereby show that the inhibition of the selected protein-protein
interaction leads to the expected phenotype. Since peptides generally have a low
bioavailability, they often have to be coupled to special sequences that facilitate
their transport into cells [23]. Finally, the peptides can serve as starting points
for a drug discovery programme. They can be transformed to peptidomimetics
that - in some cases - can be further depeptidized.
15.5.5
Screening Techniques, Compound Libraries, and Targets
Since the goal of any drug discovery programme that deals with a
protein-protein interaction is to identify low-molecular-weight compounds
that bind to a well-defined pocket, the technologies and compound libraries
used to identify enzyme inhibitors can also be used to identify protein-protein
interaction inhibitors.
Various assays are used to identify competitive inhibitors of protein-protein
interactions, but the ones in which the inhibition of the complex is directly
measured - competition assays - are the most commonly used. Several assay
formats exist: enzyme-linked immunosorbent assay (ELISA), fluorescence
polarization, fluorescence resonance energy transfer, and others. These
assays are designed in such a way that they use either the two full-
length proteins, only their interacting domains, or even, when possible,
peptides that mimic the binding region. One must be very cautious with
this type of assay when determining IC50s. The potency of competitive
inhibitors depends on the amount of the competing protein present in
the assay (Fig. 15.5-2). The amount of competing protein present in the
assay may vary between laboratories and even between different protein
batches (change in specific activity). To obtain an accurate estimate of the
binding properties of the inhibitors, their Kd 4) should be measured. The data
obtained with the competition assay should therefore be complemented with
the Kd measurements obtained, for example, by isothermal titration calorimetry.
Calorimetric measurements also provide valuable information about the
energy of the interaction, which can be used to further optimize the
compounds (e.g., to generate more enthalpy-driven or entropy-driven
compounds [24]).
4) Kd: apparent dissociation constant of the protein-inhibitor complex.
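
How a measured Kd and a calorimetric enthalpy translate into the energy terms
mentioned above can be sketched as follows. The example assumes the standard
relations ΔG = RT ln Kd (1 M standard state) and ΔG = ΔH - TΔS at 25 °C. The
function names and the numerical inputs are hypothetical and are not values
from the studies cited in this chapter.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def entropic_term(delta_g: float, delta_h: float) -> float:
    """Return -T*dS (kcal/mol), obtained from dG = dH - T*dS."""
    return delta_g - delta_h


# Assumed example: a 100 nM inhibitor with a measured binding enthalpy of -6 kcal/mol.
dg = binding_free_energy(100e-9)
minus_tds = entropic_term(dg, delta_h=-6.0)
print(f"dG = {dg:.2f} kcal/mol, -TdS = {minus_tds:.2f} kcal/mol")
```

A negative -TΔS term indicates that entropy contributes favorably to binding
and a positive one that an entropic penalty is paid, which is the kind of
information used to steer compounds toward a more enthalpy-driven or
entropy-driven profile.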
The other assays used to identify protein-protein interaction inhibitors
are the binding assays. In these cases, only one of the two interacting
chains is present and the binding of the compounds to this protein is
measured. Several assay formats are used: surface plasmon resonance, 1H-15N
heteronuclear single quantum correlation nuclear magnetic resonance (NMR),
ultracentrifugation, and others. Many of these methods only indicate that the
compounds bind to the target protein, but they do not show that their binding
inhibits the interaction. This needs to be demonstrated in a subsequent
analysis (e.g., with a competition assay).
It is important to note that, in some competition and binding assays,
it is difficult to directly determine whether the inhibiting molecules are
competitive inhibitors. The inhibitors may bind to a pocket located outside
the interacting region and modulate the interaction by an allosteric effect. To
allow a better optimization of these inhibitors, their binding mode should be
firmly demonstrated. It is essential in this process to determine the structure
of the inhibitor-protein complex.
All types of compound libraries can be screened to identify protein-protein
interaction inhibitors: low-molecular-weight compound libraries, natural
compound libraries, peptide/peptidomimetic libraries, combinatorial chemistry
libraries, fragment libraries, and so on. A simple literature survey shows
that molecules belonging to these different types of libraries are described
as protein-protein interaction inhibitors. However, there is an argument
sometimes cited in the literature about the diversity of compounds in these
libraries: the libraries available in the pharmaceutical companies reflect their
drug discovery history. Since most of them have focused on the design of
enzyme inhibitors, it is possible that the structural diversity of their libraries
might not match what is required to identify protein-protein interaction
inhibitors. Although this might be the case, the increasing number of drug
discovery programmes dealing with protein interfaces will ensure that the
chemical diversity of these libraries will change, and they may contain more
compounds that prevent protein-protein interactions. An alternative reason
to explain the low success rate when randomly screening large libraries for
protein-protein interaction inhibitors is that the selected interfaces have low
druggability and that, independent of the chemical diversity of these libraries,
the probability of finding inhibitors is also low.
The availability of the three-dimensional structure of the protein complex
allows structure-driven drug discovery approaches. In this case, a
pharmacophore model is first established. This corresponds to identifying the
interactions that take place at the interface and which contribute most to
ΔG. The importance of these interactions can be validated by site-directed
mutagenesis or, when possible, by the use of peptides. Once these interactions
are validated, molecules containing chemical groups mimicking these key
interactions are selected from compound libraries and tested. Very often these
initial molecules are not optimal (e.g., they do not make all the key contacts)
and they must be modified to enhance their potency. This is done, for example,
by adding the missing pharmacophores and/or by creating contacts that are
not present in the natural complex. Alternatively, de novo drug design may be
carried out. In this case a “very basic” scaffold - which mimics only a few of the
key interactions made by the competing subunit - is selected and modified
progressively to obtain molecules that contain the different pharmacophores.
This of course is very time consuming and resource demanding, because the
affinity of the initial scaffold is usually low and a great deal of chemistry is
required to improve its potency. The structure-driven and screening approaches
are not mutually exclusive, but the former requires good comprehension of the
interaction, while the latter can be used without information regarding the
target interface.
The list of protein-protein interactions that have been the subject of drug
discovery programmes is constantly increasing, and many excellent articles
have reviewed the latest findings in this area. Some of these reviews are
listed in Table 15.5-1, and they can provide the reader with an idea of the
protein-protein interactions that have already been selected as targets for
drug discovery programmes and of the inhibitors that have been identified in
these studies. In the following section, we will focus on one protein interface:
the p53-hdm2 interface. This protein-protein interaction has been selected
from the literature because, when the various results are put together, the
work carried out by the different research groups working on this interface
makes up a very nice case study for the design of competitive protein-protein
inhibitors.
15.5.6
An Example: The Design of Inhibitors of the p53-hdm2 Interaction
15.5.6.1 Biological Background

The p53 protein is a transcription factor that regulates the expression of
several genes with different biological functions, such as cell-cycle regulation,
apoptosis, DNA repair, and differentiation [25]. The loss of p53 function has
dramatic consequences and the p53 gene is deleted or mutated in more than
50% of human cancers [26]. The overexpression of the hdm2 protein can
also lead to the inactivation of p53. The p53 and hdm2 proteins form an
autoregulatory feedback loop [27, 28]: p53 stimulates the expression of hdm2,
which in turn acts negatively on p53 in several ways (Fig. 15.5-3). It inhibits
its transcriptional activity [29], promotes its degradation [30, 31], and favors
its export from the nucleus [32]. The hdm2 gene is amplified in about 7% of
human cancers [33], and hdm2 is overexpressed in different types of tumors
[34, 35]. It is therefore likely that the p53 pathway is not active in these tumors,
because the overexpressed hdm2 protein constantly inhibits the p53 protein.
The idea that several pharmaceutical companies have pursued is to generate
molecules that, by preventing the p53-hdm2 interaction, will activate the
p53 pathway in these tumors and thereby show anticancer activity.
Fig. 15.5-3 Regulation of p53 by hdm2. The tumor suppressor p53 is a tetrameric
transcription factor. Upon various stress conditions such as DNA damage, and
activation of various oncogenes or hypoxia, p53 is activated and binds to DNA.
Depending on the cell line and/or the cellular stress, p53 induces either a
cell-cycle arrest or apoptosis. p53 is also able to mediate other biological
answers such as senescence. hdm2 is a negative regulator of p53. Upon binding
to p53 it inhibits its transcriptional activity, promotes its degradation, and
favors its export from the nucleus. Therefore, in the presence of hdm2 the
tumor suppressor activity of p53 is inhibited.

15.5.6.2 Characterization of the Interface

Yeast two-hybrid screen [36] and immunoprecipitation experiments [37] were
initially used to map the contact regions of the two proteins.
The hdm2-binding domain on p53 was localized between residues 1 and 52
[36, 37], and the p53-binding domain on hdm2 between residues 1 and 118
[36, 37]. Further studies using site-directed mutagenesis identified Leu14,
Phe19, Leu22, and Trp23 as key p53 contact residues [38], and a minimal
hdm2-binding site on the p53 protein was mapped between residues 18 and
23 [39]. The strength of the interaction (Kd) between p53 peptides and hdm2
fragments has been determined by several methods and, depending on the
length of these fragments and the methodology used, Kd values between 60
and 700 nM have been obtained.
The availability of the structure of a p53 peptide (residues 15-29) in complex
with a hdm2 fragment (residues 17-125) permits a more detailed analysis of
the interface (Fig. 15.5-4(a)) [40]. The p53-binding site on the hdm2 protein
is a cleft, about 25 Å long and 10 Å wide. In the bound p53 peptide, residues
19 to 25 form an α-helix, and residues 17, 18, and 26 to 29 take a more
extended conformation. The structure of the bound p53 peptide is stabilized

by several intramolecular hydrogen bonds. This first observation indicates that
hdm2 is the only one of the two proteins to possess a well-defined pocket.
Inhibitors then have to be designed in such a way that they mimic p53.
The calculated accessible surface area buried at the interface on hdm2 and
p53 is about 660 and 809 Å², respectively. So the interface between these
two proteins is not excessively large, and it can accommodate standard sized
drugs (see above). The planarity [8] of the hdm2 contact
region is 3.1. This confirms that the contact region is not flat but twisted, in
agreement with the presence of the above-described pocket. NMR experiments
show that p53-derived peptides do not take a well-defined structure in solution
[41, 42], suggesting that the p53 fragments only adopt the observed helical
conformation when bound to hdm2. This structural organization of p53
upon binding is associated with a decrease in entropy, and experimental
data give a change in entropy of -40.4 cal mol⁻¹ for the binding of a p53
fragment to hdm2 [43]. Upon p53 binding, conformational changes are also
detected within the hdm2 protein [44, 451. The interaction between p53 and
hdm2 is essentially hydrophobic, and 70% of the atoms at the interface are
nonpolar. The three amino acids Phe19, Trp23, and Leu26 from p53 are
located on the same side of the helix and their lateral chains point toward
the hdm2 protein (Fig. 15.5-4(a)). These amino acids make several interactions
with hydrophobic hdm2 residues (Leu54, Leu57, Ile61, Met62, Tyr67, Val75,
Val93, Phe86, Ile99, Phe91, and Ile103). Only three direct hydrogen bonds are
present at the interface (p53 Phe19 - hdm2 Gln72; p53 Trp23 - hdm2 Leu54;
p53 Asn29 - hdm2 Tyr100), and there is no water molecule bridging the two
contact regions. This suggests high packing at the interface. Indeed, the gap
volume [46] between both proteins is 892 Å³ and the gap volume index (ratio
between the gap volume and the interface accessible surface area) [8] is 0.61 Å.
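
The gap volume index quoted above can be checked from the other figures given
in this paragraph. The sketch below assumes that the interface accessible
surface area in the denominator is the sum of the areas buried on hdm2 and on
p53; this reading is an assumption, although it does reproduce the quoted
value. The function and constant names are hypothetical.

```python
def gap_volume_index(gap_volume_a3: float, interface_asa_a2: float) -> float:
    """Ratio of gap volume (cubic angstroms) to interface ASA (square angstroms)."""
    if interface_asa_a2 <= 0:
        raise ValueError("interface ASA must be positive")
    return gap_volume_a3 / interface_asa_a2


# Values quoted in the text for the p53-hdm2 interface.
HDM2_BURIED_ASA = 660.0  # square angstroms buried on hdm2
P53_BURIED_ASA = 809.0   # square angstroms buried on p53
GAP_VOLUME = 892.0       # cubic angstroms

# Assumption: the interface ASA is the sum of both buried surfaces.
index = gap_volume_index(GAP_VOLUME, HDM2_BURIED_ASA + P53_BURIED_ASA)
print(f"gap volume index = {index:.2f} angstroms")
```

A small index means little empty space per unit of buried surface, which fits
the description of this interface as tightly packed.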
Altogether the structural study of the p53-hdm2 interface suggests that
there is a good likelihood of it being a druggable target. It fits most of the
criteria of the decision tree presented in Fig. 15.5-1 (except for its high shape
complementarity). Furthermore, since the p53 contact region is formed by
only one segment of contiguous amino acids, peptides mimicking p53 can be
used to establish/confirm a pharmacophore model and to study the effect of
the inhibition of the p53-hdm2 interaction in tumor cells.
15.5.6.3 Establishment of a Pharmacophore Model and its Validation

The structure of p53 in complex with hdm2 [40] and the initial data obtained
with p53-derived peptides [36, 39] indicate that peptides can be used to
study this interaction and to establish a pharmacophore model. Phage display
experiments [47] allowed the identification of a 12-mer phage-derived peptide
(peptide 2, Table 15.5-2) that is 29 times more potent than the wild-type peptide
(peptide 1, Table 15.5-2) [48]. Peptide 2 was truncated to eight residues, leading
to a peptide with micromolar activity (peptide 3, Table 15.5-2) [48]. It should
Fig. 15.5-4 The structure of p53 (residues 17 to 29) in complex with hdm2
(residues 25 to 109) [40]. (a) The surface of hdm2 is represented in white, the
p53-binding site in green, and the p53 peptide in red. The lateral chains of
p53 Phe19, Trp23, and Leu26 are shown. (b) p53 Leu22 has been replaced by a
tyrosine residue and the lateral chain is manually located in the structure of
the p53-hdm2 complex. The backbone of the p53 peptide is shown in gray and hdm2
Lys94 is represented. (c) The different hdm2 residues (Leu57, Phe86, Ile99, and
Ile103) surrounding p53 Trp23 are indicated and their van der Waals surface is
represented in green.
Table 15.5-2 Example of peptidic inhibitors used as tool
compounds for studying the p53-hdm2 interaction. The IC50
values were obtained in a competition assay [49]. The position of
the three key residues Phe19, Trp23, and Leu26 is indicated

Peptide | Sequence | IC50 (µM)
1 | Ac-Gln-Glu-Thr-Phe19-Ser-Asp-Leu-Trp23-Lys-Leu-Leu26-Pro-NH2 | 8.7
2 | Ac-Met-Pro-Arg-Phe19-Met-Asp-Tyr-Trp23-Glu-Gly-Leu26-Asn-NH2 | 0.3
3 | Ac-Phe19-Met-Asp-Tyr-Trp23-Glu-Gly-Leu26-NH2 | 8.9
4 | Ac-Phe19-Met-Aib-Tyr-Trp23-Glu-Ac3c-Leu26-NH2 | 2.2
5 | Ac-Phe19-Met-Aib-Pmp-6-Cl-Trp23-Glu-Ac3c-Leu26-NH2 | 0.005

Aib - α-amino isobutyric acid;
Ac3c - 1-amino-cyclopropanecarboxylic acid;
Pmp - phosphonomethylphenylalanine;
6-Cl-Trp - 6-chloro-tryptophan.

be noted that further deletions of peptide 3, which remove the essential
residues Phe19 or Leu26, induce a dramatic drop in activity. Since short
peptides are usually very flexible in solution and because the bound p53 takes
a well-ordered structure when bound to hdm2, the next step was to decrease
peptide 3 flexibility to decrease the entropic penalty “paid” upon binding. The
two nonnatural amino acids - α-amino isobutyric acid (Aib) and
1-amino-cyclopropanecarboxylic acid (Ac3c) - were used to fix the conformation of
the peptides in solution [49, 50]. Different peptides were synthesized and
the more potent peptide 4 was obtained (Table 15.5-2). NMR
measurements confirm a higher preorganization in solution for peptide 4.
This peptide was modified to determine whether its potency can be improved
by making new interactions with the hdm2 protein. Tyr22 was replaced by
a phosphonomethylphenylalanine (Pmp) and Trp23 by a 6-chloro-tryptophan
(6-Cl-Trp) [49]. The modification at p53 Tyr22 creates a salt bridge
with the amino group of hdm2 Lys94 (Fig. 15.5-4(b)). The addition of a chlorine
atom at position 6 on Trp23 was used to fill a small hydrophobic cavity formed
by the hdm2 residues Leu57, Phe86, Ile99, and Ile103, which is unoccupied
in the p53-hdm2 complex (Fig. 15.5-4(c)). Making these new contacts via
Pmp22 and 6-Cl-Trp23 results in an approximately 440-fold increase in the
potency of peptide 4 (compare peptides 4 and 5, Table 15.5-2). This gain in
potency is probably associated with a more favorable enthalpic contribution in
the binding energy.
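
The potency gains quoted for the peptides follow directly from the IC50 values
in Table 15.5-2, as the short sketch below shows. The conversion of an IC50
ratio into an apparent free-energy difference is an added assumption: it treats
the ratio as a proxy for the Kd ratio, which holds only because all values come
from the same competition assay [49], and it assumes a temperature of 25 °C.
The function names are hypothetical.

```python
import math

# IC50 values (micromolar) from Table 15.5-2, keyed by peptide number.
IC50_UM = {1: 8.7, 2: 0.3, 3: 8.9, 4: 2.2, 5: 0.005}

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fold_gain(reference: int, improved: int) -> float:
    """How many times more potent `improved` is than `reference`."""
    return IC50_UM[reference] / IC50_UM[improved]


def apparent_ddg_kcal(reference: int, improved: int, temperature_k: float = 298.15) -> float:
    """Apparent change in binding free energy (kcal/mol); negative means tighter."""
    ratio = IC50_UM[improved] / IC50_UM[reference]
    return R_KCAL * temperature_k * math.log(ratio)


print(f"peptide 1 -> 2: {fold_gain(1, 2):.0f}-fold")
print(f"peptide 4 -> 5: {fold_gain(4, 5):.0f}-fold, "
      f"apparent ddG = {apparent_ddg_kcal(4, 5):.1f} kcal/mol")
```

Under these assumptions, the two new contacts made by Pmp22 and 6-Cl-Trp23
together account for a few kilocalories per mole of additional apparent
binding energy.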
Altogether this study with the peptides shows that the key contacts made
by p53 Phe19, Trp23, and/or Leu26 are important for the binding of
p53 to hdm2 and, therefore, nonpeptidic inhibitors should mimic these
important interactions. The work carried out with the peptides containing
nonnatural amino acids also indicates that, despite the high complementarity
of the interface, it is possible to create additional interactions with hdm2
(e.g., Pmp22 and 6-Cl-Trp23), enhancing the potency of the inhibitors. This
could also be exploited with nonpeptidic inhibitors.
The peptides were also used to demonstrate that inhibition of the p53-hdm2
interaction in tumor cells leads to activation of the p53 pathway. Three different
strategies have been used to introduce the p53 peptides into cells. Peptide 2
has been inserted into the Escherichia coli thioredoxin protein [51] or fused to
the glutathione S-transferase protein [52], and peptide 5 has been directly used
without further modification [53, 541. The data obtained with these different
tools reveal that p53-hdm2 interaction inhibitors stimulate p53 activity (as
measured by the induction of p53-regulated genes) in different tumor cells.
These results are expected, since preventing the hdm2-mediated degradation
of p53 should induce its accumulation in cells and, as a consequence, its
activation. The activation of p53 by the peptides induces either cell-cycle arrest
or apoptosis, depending on the tumor cell lines, revealing that p53-hdm2
inhibitors have an antiproliferative effect and, therefore, they behave as
anticancer drugs.
The peptides were also used to study the effect of inhibiting the p53-hdm2
interaction in vivo. A p53 peptide (residues 16-27) was linked to the Tat
transduction sequence and was used in New Zealand white rabbits with
intraocular retinoblastoma [55]. Injecting this peptide into the interior chamber
induced tumor regression, and apoptosis was observed. This effect is specific
to the tumor cell, since the peptide induced damage only to the tumor and
not to the surrounding ocular tissues (lens, cornea, retina, etc.). These in vivo
experiments suggest that inhibitors of p53-hdm2 have an anticancer activity
in vivo and, in addition, they may not be toxic to nontumour tissues. This latter
information is of importance, since p53-hdm2 inhibitors also activate p53 in
nontumour cells [54].
Biological validation is a key step in any drug discovery programme because,
even if a protein-protein interaction is a “top” drug target for medicinal
chemistry, its inhibition should lead to the expected biological output. In the
case of the p53-hdm2 interaction, the results obtained both in vitro and in vivo
tend to demonstrate that inhibitors of this interaction will exert an anticancer
activity in at least some tumors.
15.5.6.4 The Synthesis of Low-molecular-weight Compounds

For many years, the only synthetic low-molecular-weight inhibitors of
the p53-hdm2 interaction described were not very potent. Only chalcone
derivatives (6 - Fig. 15.5-5) [45], some polycyclic compounds (7 - Fig. 15.5-5)
[56], and sulfonamides (8 - Fig. 15.5-5) [57] were described. A fungal
metabolite, chlorofusin (9 - Fig. 15.5-5), was also described as an inhibitor
of the p53-hdm2 interaction [58]. Finally, 1,4-benzodiazepine-2-ones were
proposed from a computational approach (10 - Fig. 15.5-5) [59].
These data were not very encouraging and, despite the attractiveness of this
approach, it seemed that not only the druggability of the p53-hdm2 interaction

Fig. 15.5-5 Low-molecular-weight inhibitors of the p53-hdm2
interaction. 6 Chalcone derivative [45], 7 polycyclic compound [56],
8 sulfonamide [57], 9 chlorofusin, 10 1,4-benzodiazepine-2-one
[59], 11 cis-imidazoline [60].

was not as good as predicted by the structural analysis of the interface but
also obtaining potent low-molecular-weight inhibitors was not an achievable
goal. However, scientists at Hoffmann-La Roche recently demonstrated the
feasibility of inhibiting the p53-hdm2 interaction with low-molecular-weight
compounds. Since the publication of the first reports on peptidic inhibitors
of the p53-hdm2 interaction, it took about 10 years to obtain such results!
By screening a library of synthetic chemicals, Vassilev et al. were able to
identify cis-imidazolines (11 - Fig. 15.5-5), which they optimized for potency
and specificity [60]. These compounds bind at the p53-binding site on hdm2,
and their different substitutions mimic the key contacts made by p53 Phe19,
Trp23, and Leu26 (Fig. 15.5-6). Furthermore, the halogen (Cl or Br) present
on one of their phenyl groups mimics the chlorine atom of 6-Cl-Trp in peptide
5. Finally, these molecules are built around a heterocycle and have a rigid
conformation that minimizes the entropic penalty paid upon binding. Their
potency (IC50), measured in a competition assay, is in the 100 to 300 nM

Fig. 15.5-6 Binding mode of cis-imidazoline and p53 peptide. The structures of
the cis-imidazoline-hdm2 complex [60] and the p53-hdm2 complex [40] have been
superimposed. Only the bound cis-imidazoline and the p53 peptide (in red) are
represented. The lateral chains of p53 Phe19, Trp23, and Leu26 are shown.

range. These compounds are active in various tumor cells (IC50 between 1
and 2 µM), in which they induce the activation of the p53 pathway. More
importantly, they show efficacy as single agents in a tumor model in mice. One
of these compounds (11 - Fig. 15.5-5) given orally at a dose of 200 mg kg⁻¹
twice daily for 20 days induces 90% inhibition of tumor growth (i.e., of cells
overexpressing hdm2). This treatment does not induce toxicity as measured by
bodyweight measurements and necropsy. These data are highly encouraging,
and it will be very exciting to see the effect of these molecules - or of their
follow-up - in the clinic.
15.5.7
Conclusions
The design of protein-protein interaction inhibitors is a hot topic in drug

discovery today because many protein interfaces are exciting targets for
pharmaceutical companies. However, one should be cautious while making
any assumption that designing protein-protein interaction inhibitors will be a
new Eldorado for the pharmaceutical industry or conversely that programmes
based on protein-protein interactions should be avoided because of the low
probability of obtaining potent inhibitors. Protein interfaces are quite unique,
and the only way to decide whether an interface is a “good” or a “bad” target for
drug discovery is to carry out a careful analysis of its structure before starting
any drug discovery activity. This should help in selecting better targets, thereby
reducing the risk of investing time and resources in programmes that do not
deliver the expected molecules. The p53-hdm2 interaction is one example of
the interfaces that have been successfully targeted with low-molecular-weight
compounds (see also Table 15.5-1). Many other protein-protein interactions
are under investigation, and it is likely that new inhibitors of protein-protein
interaction will be described in the future.
References
1. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov. 2002, 1, 727-730.
2. A.R. Fersht, Enzyme Structure and Mechanism, 2nd ed., Freeman, New York, 1985.
3. P. Chene, The ATPases: a new family for a family-based drug design approach, Expert Opin. Ther. Targets 2003, 7, 453-461.
4. S. Li, C.M. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P.O. Vidalain, J.D. Han, A. Chesnau, T. Hao, D.S. Goldberg, A map of interactome network of the metazoan C. elegans, Science 2004, 303, 540-543.
5. B. Kleizen, I. Braakman, Protein folding and quality control in the endoplasmic reticulum, Curr. Opin. Cell Biol. 2004, 16, 343-349.
6. I.M.A. Nooren, J.M. Thornton, Diversity of protein-protein interactions, EMBO J. 2003, 22, 3486-3492.
7. R.P. Bahadur, P. Chakrabarti, F. Rodier, J. Janin, Dissecting subunit interfaces in homodimeric proteins, Proteins 2003, 53, 708-719.
8. S. Jones, J.M. Thornton, Principles of protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 13-20.
9. L. Lo Conte, C. Chothia, J. Janin, The atomic structure of protein-protein recognition sites, J. Mol. Biol. 1999, 285, 2177-2198.
10. S.J. Wodak, J. Janin, Structural basis of macromolecular recognition, Adv. Protein Chem. 2003, 61, 9-73.
11. I.M. Nooren, J.M. Thornton, Structural characterisation and functional significance of transient protein-protein interactions, J. Mol. Biol. 2003, 325, 991-1018.
12. N. Brooijmans, K.A. Sharp, I.D. Kuntz, Stability of macromolecular complexes, Proteins 2002, 48, 645-653.
13. W.L. DeLano, Unraveling hot spots in binding interfaces: progress and challenges, Curr. Opin. Struct. Biol. 2002, 12, 14-20.
14. P. Chakrabarti, J. Janin, Dissecting protein-protein recognition sites, Proteins 2002, 47, 334-342.
15. J. Janin, Wet and dry interfaces: the role of solvent in protein-protein and protein-DNA recognition, Structure 1999, 7, R277-R279.
16. C.J. Tsai, S.L. Lin, H.J. Wolfson, R. Nussinov, Protein-protein interfaces: architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences, Crit. Rev. Biochem. Mol. Biol. 1996, 31, 127-152.
17. T.A. Larsen, A.J. Olson, D.S. Goodsell, Morphology of protein-protein interfaces, Structure 1998, 6, 421-427.
18. Y. Ofran, B. Rost, Analysing six types of protein-protein interfaces, J. Mol. Biol. 2003, 325, 377-387.
19. C. Cole, J. Warwicker, Side-chain conformational entropy at protein-protein interfaces, Protein Sci. 2002, 11, 2860-2870.
20. J.A. Wells, Binding in the growth hormone receptor, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 7-12.
21. T.R. Gadek, J.B. Nicholas, Small molecule antagonists of proteins, Biochem. Pharmacol. 2003, 65, 1-8.
22. Y.C. Cheng, W.H. Prusoff, Relationship between the inhibition constant (Ki) and the concentration of inhibitor which causes 50 per cent inhibition (IC50) of an enzymatic reaction, Biochem. Pharmacol. 1973, 22, 3099-3108.
23. J.J. Schwartz, S. Zhang, Peptide-mediated cellular delivery, Curr. Opin. Mol. Ther. 2000, 2, 162-167.
24. A. Velazquez-Campoy, I. Luque, E. Freire, The application of thermodynamic methods in drug design, Thermochim. Acta 2001, 380, 217-227.
25. K.H. Vousden, X. Lu, Live or let die: the cell’s response to p53, Nat. Rev. Cancer 2002, 2, 594-604.
26. T. Soussi, K. Dehouche, C. Beroud, p53 website and analysis of p53 gene mutations in human cancer: forging a link between epidemiology and carcinogenesis, Hum. Mutat. 2000, 15, 105-213.
27. S.M. Picksley, D.P. Lane, The p53-mdm2 autoregulatory feedback loop: a paradigm for the regulation of growth control by p53?, BioEssays 1993, 15, 689-690.
28. X. Wu, J.H. Bayle, D. Olson, A.J. Levine, The p53-mdm-2 autoregulatory feedback loop, Genes Dev. 1993, 7, 1126-1132.
29. J. Momand, G.P. Zambetti, D.C. Olson, D. George, A.J. Levine, The mdm-2 oncogene product forms a complex with the p53 protein and inhibits p53-mediated transactivation, Cell 1992, 69, 1237-1245.
30. R. Honda, H. Yasuda, Activity of MDM2, a ubiquitin ligase, toward p53 or itself is dependent on the RING finger domain of the ligase, Oncogene 2000, 19, 1473-1476.
31. [...] protein ligase for itself and p53, J. Biol. Chem. 2000, 275, 8945-8951.
32. J. Roth, M. Dobbelstein, D.A. Freedman, T. Shenk, A.J. Levine, Nucleo-cytoplasmic shuttling of the hdm2 oncoprotein regulates the levels of the p53 protein via a pathway used by the human immunodeficiency virus rev protein, EMBO J. 1998, 17, 554-564.
33. J. Momand, D. Jung, S. Wilczynski, J. Niland, The MDM2 gene amplification database, Nucleic Acids Res. 1998, 26, 3453-3459.
34. B. Eymin, S. Gazzeri, C. Brambilla, E. Brambilla, Mdm2 overexpression and p14ARF inactivation are two mutually exclusive events in primary human lung tumors, Oncogene 2002, 21, 2750-2761.
35. D. Polsky, B.C. Bastian, C. Hazan, K. Melzer, J. Pack, A. Houghton, K. Busam, C. Cordon-Cardo, I. Osam, hdm2 protein overexpression, but not amplification, is related to tumorigenesis of cutaneous melanoma, Cancer Res. 2001, 61, 7642-7646.
36. J.D. Oliner, J.A. Pietenpol, S. Thiagalingam, J. Gyuris, K.W. Kinzler, B. Vogelstein, Oncoprotein mdm2 conceals the activation domain of tumour suppressor p53, Nature 1993, 362, 857-860.
37. J. Chen, V. Marechal, A.J. Levine, Mapping of the p53 and mdm-2 interaction domains, Mol. Cell. Biol. 1993, 13, 4107-4114.
38. J. Lin, J. Chen, B. Elenbaas, A.J. Levine, Several hydrophobic amino acids in the p53 amino-terminal domain are required for transcriptional activation, binding to mdm-2 and the adenovirus 5 E1B 55-kD protein, Genes Dev. 1994, 8, 1235-1246.
39. S.M. Picksley, B. Vojtesek, A. Sparks, D.P. Lane, Immunochemical analysis of the interaction of p53 with mdm2;-fine mapping of the mdm2
31. S. Fang, J.P. Jensen, R.L. Ludwig, K.H. binding site on p53 using synthetic
Vousden, A.M. Weissman, Mdm2 is a peptides, Oncogene 1994, 9,
RING finger-dependent ubiquitin 2523-2529.
References I1001
40. P.H. Kussie, S. Gorina, V. Marechal, S.F. Howard, S.M. Picksley, D.P. Lane,
B. Elenbaas, J. Moreau, A.J. Levine, Molecular characterization of the
N.P. Pavletich, Structure ofthe mdm2 hdm2-p53 interaction, /. Mol. Biol.
oncoprotein bound to the p53 tumor 1997, 269,744-756.
suppressor transactivation domain, 49. C. Garcia-Echeverria, P. Chene, M.J.
Science 199G, 274, 948-953. Blommers, P. Furet, Discovery of
41. M.J.J.Blommers, G. Fendrich, potent antagonists of the interaction
C. Garcia-Echeverria, P. Chene, On between human double minute 2 and
the interaction between p53 and tumor suppressor p53,J. Med. Chem.
mdm2: transfer NOE study of 2000,43, 3205-3208.
p53-derived peptide ligated to mdm2, 50. R. Banerjee, G. Basu, P. Chene,
J. Am. Chem. Soc. 1997, 119, S. Roy, Aib-based peptide backbone as
3425-3426. scaffolds for helical peptide mimics, 1.
42. M. Uesugi, G.L. Verdine, The a-helical Pept. Res. 2002, GO, 88-94.
FXXFF motif in p53: TAF interaction 51. A. Bottger, V. Bottger, A. Sparks,
and discrimination by mdm2, Proc. W.L. Liu, S.F. Howard, D.P. Lane,
Natl. Acad. Sci. U.S.A. 1999, 96, Design of a synthetic Mdm2-binding
14801- 14806. mini protein that activates the p53
43. Z. Lai, K.R. Auger, C.M. Manubay, response in vivo, C u r . Biol. 1997, 7,
R.A. Copeland, Thermodynamics of 860-869.
p53 binding to hdm2(1-126): effects 52. C. Wasylyk, R. Salvi, M. Argentini,
of phosphorylation and p53 peptide C. Dureuil, I. Delumeau, J. Abecassis,
length, Arch. Biochem. Biophys. 2000, L. Debussche, B. Wasylyk, p53
381,278-284. mediated death of cells overexpressing
44. 0. Schon, A. Friedler, M. Bycroft, MDMZ by an inhibitor of MDMZ
S.M.V. Freund, A.R. Fersht, Molecular interaction with p53, Oncogene 1999,
mechanism of the interaction between 18, 1921-1934.
mdm2 and p53,]. Mol. B i d . 2002, 323, 53. P. Chene, J. Fuchs, J. Bohn,
491-501. C. Garcia-Echeverria, P. Furet,
45. R. Stoll, C. Renner, S. Hansen, D. Fabbro, A small synthetic peptide,
S. Palme, C. Klein, A. Belling, which inhibits the p53-hdm2
W. Zeslawski, M. Kamionka, T. Rehm, interaction, stimulates the p53
P. Muhlhahn, R. Schumacher, pathway in tumour cell lines, J. Mol.
F. Hesse, B. Kaluza, W. Voelter, R.A. Biol. 2000, 299, 245-253.
Engh, T.A. Holak, Chalcone 54. P. Chene. J. Fuchs, 1. Carena, P. Furet,
derivatives antagonize interactions C. Garcia Echeverria, Study of the
between the human oncoprotein cytotoxic effect of a peptidic inhibitor
MDMZ and p53, Biochemistry 2001, 40, ofthe p53-hdm2 interaction in tumour
336- 344. cells, FEBS Lett. 2002, 529, 293-297.
46. R.A. Laskowski, SURFNET a program 55. J.W. Harbour, L. Worley, D. Ma,
for visualizing molecular surfaces, M. Cohen, Transducible peptide
cavities and intramolecular therapy for uveal melanoma and
interactions, /. Mol. Graph. 1995, 13, retinoblastoma, Arch. Ophthalmo.
323-330. 2002, 120,1341-1346.
47. V. Bottger, A. Bottger, S.F. Howard, 56. J. Zhao, M. Wang, J. Chen, A. Luo,
S.M. Picksley, P. Chene, X. Wang, M. Wu, D. Yin, 2 . Liu, The
C. Garcia-Echeverria, H.K. initial evaluation of non-peptidic
Hochkeppel, D.P. Lane, Identification small-molecule HDM2 inhibitors
of novel mdm2 binding peptides by based on p53-HDM2 complex
phage display, Oncogene 1996, 13, structure, Cancer Lett. 2002, 183,
2141 -2147. 69-77.
48. A. Bottger, V. Bottger, 57. P.S. Galatin, D.J. Abraham, A
C. Garcia-Echeverria, P. Chene, H.K. nonpeptidic sulfonamide inhibits the
Hochkeppel, W. Sampson, K. Ang, p53-mdm2 interaction and activates
15 Target Families
1002
I p53-dependent transcription in 59. N. Majeu, M. Scarsi, A. Caflisch,
mdm2-overexpressing cells, J. Med. Efficient electrostatic model for
Chem. 2004,47,4163-4165. protein-fragment docking, Proteins
58. S.J. Duncan, S. Gruschow, D.H. 2001,42,256-268.
Williams, C. McNicholas, R. Purewal, 60. L.T. Vassilev, B.T. Vu, B. Graves,
M. Hajek, M. Gerlitz, S. Martin, S.K. D. Carvajal, F. Podlaski, Z. Filipovic,
Wrigley, M. Moore, Isolation and N. Kong, U. Kammlott, C. Lukacs,
structure elucidation of chlorofusin, a C. Klein, N. Fotouhi, E.A. Liu, In vivo
novel p53-mdm2 antagonist from a activation of the p53 pathway by
Fusarium sp, J. Am. Chem. Soc. 2001, small-molecule antagonists of mdm2,
123, 554-560. Science 2004, 303, 844-848.
Chemical Biology
I 1003
16
Prediction of ADMET Properties
Ulf Norinder and Christel A. S. Bergström
Outlook
This chapter describes some of the approaches and techniques used currently
to derive in silico models for the prediction of absorption, distribution,
metabolism, elimination/excretion, and toxicity (ADMET) properties. The
chapter also discusses some of the fundamental requirements for deriving
statistically sound and predictive ADMET relationships as well as some of
the pitfalls and problems encountered during these investigations. It is
the intention of the authors to make the reader aware of some of the
challenges involved in deriving useful in silico ADMET models for drug
development.
16.1
Introduction
With the use of genomics, proteomics, and bioinformatics, the ability to
identify and validate target proteins has recently improved. Once the target has
been identified, the search starts for a pharmacophore, that is, a structural
fragment that binds to the target and exerts the effect, with an acceptable
therapeutic potency. After such a structure has been found, lead optimization
is initiated. Computational chemistry (CC) and high-throughput screening (HTS)
are used to synthesize new compounds and optimize them with regard to increased
potency. The lead optimization is performed in cycles, and in the end the
leads with the highest potency may differ considerably in structure from
the starting structure. The resulting chemical library can comprise
several thousand new structures. The synthesized library is experimentally
examined for developability with the use of rapid experimental techniques for
measuring, for example, stability, solubility, permeability, and toxicity. After
these determinations, one to two candidate drugs (CDs) are selected from the
library for further development (Fig. 16-1).

Fig. 16-1 From target identification to candidate drug (CD). Target
identification and validation are followed by lead discovery and lead
optimization. The lead optimization process is performed in cycles, and at the
end of the lead optimization process the developability of the compounds is
traditionally investigated.

The increase in new structures generated each year has not resulted in
the expected increase in new drugs marketed annually. Among other factors,
this has been attributed to poor pharmacokinetic (PK) properties of the
CDs, and as much as 40% of the attrition of CDs has been related to poor
PK profiles [1]. Given this, reliable screening filters for factors such as
absorption, distribution, metabolism, elimination/excretion, and toxicity
(ADMET) are highly desirable [2-4]. Indeed, the considerable effort that has
been invested in the development of experimental absorption filters, for
example, cell monolayers for permeability determinations [5, 6] and the
turbidimetric method for solubility measurements [7], has lately resulted in
a decrease in the attrition rate related to PK properties (Fig. 16-2) [8].
However, to allow an ADMET analysis of computationally designed druglike
molecules to be performed prior to their chemical synthesis, computer-based
filters for predicting PK properties are needed.

Fig. 16-2 Reasons for attrition in drug development in the years of 1991 and
2000. The following reasons were observed: clinical safety (black), efficacy
(red), formulation (green), PK/bioavailability (blue), commercial (yellow),
toxicology (gray), cost of goods (purple), and unknown/others (white). Note
that formulation and cost of goods were only observed as reasons for attrition
in 2000 and not 1991. Further, PK/bioavailability profiles of new drugs were
largely improved during this decade. Finally, commercial reasons for attrition
were more than threefold higher in 2000 than in 1991 [8].

Also, current pharmaceutical research faces new challenges in which
additional considerations have to be taken with respect to toxicological
effects, such as avoiding interactions with the human Ether-a-go-go-Related Gene
(hERG) as well as potential cytochrome P450 interactions related to avoidance
of phase 1 metabolism. A particular problem associated with the prediction of
toxicological effects is the lack of one well-defined and measurable target (end
point) where the same mechanism is involved in giving rise to the observed
effects. On the contrary, even fairly similar compounds may exert their toxicity
using different mechanisms.
From a development perspective, one of the first properties to be evaluated
is the gastrointestinal (GI) absorption, since the extent to which a drug is
absorbed through the intestine will determine if it is possible to give the
drug in an oral dosage form. This formulation is the most convenient dosage
form for the patient, allowing the patient to take care of the medication
himself/herself. Two of the main factors influencing intestinal absorption are
the solubility of the compound in the GI fluid and the permeability of the
compound through the intestinal wall. The solubility will restrict the
absorption if the oral dose given is not soluble in 250 mL in the pH interval
relevant in the GI tract (pH 1 in the stomach up to pH 8 in the colon) [9].
Permeability will restrict the absorption if the permeability coefficient through
the enterocytes is low, so that only a fraction of the compound in solution
is transported over the epithelium during the transit time in
the small intestine. Both solubility and permeability are dependent on the
physicochemical properties of the molecule, unfortunately in an opposed
manner (Fig. 16-3). For instance, lipophilicity, which is the major driving
force for permeability, is one of the most restricting properties for aqueous
solubility.
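
The 250 mL solubility criterion above amounts to a simple comparison, sketched
here in Python. The function names and the example compound (dose and
pH-dependent solubilities) are hypothetical and only illustrate the arithmetic;
they are not taken from the chapter.

```python
def dose_dissolves(dose_mg, solubility_mg_per_ml, volume_ml=250.0):
    """Return True if the whole dose can dissolve in the given fluid volume."""
    return dose_mg <= solubility_mg_per_ml * volume_ml


def solubility_limited_ph(dose_mg, solubility_by_ph, volume_ml=250.0):
    """Return the pH values at which the dose does not fully dissolve."""
    return [ph for ph, solubility in sorted(solubility_by_ph.items())
            if not dose_dissolves(dose_mg, solubility, volume_ml)]


# Hypothetical compound: 100 mg oral dose, solubility in mg/mL at three pH values.
profile = {1.0: 2.0, 6.5: 0.1, 8.0: 0.05}
print(solubility_limited_ph(100.0, profile))
```

In this made-up case the dose dissolves at gastric pH but not at the higher pH
values, so solubility would be expected to restrict absorption further down the
GI tract.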
Fig. 16-3 Molecular properties important for solubility and permeability.
(a) In the GI tract the tablet needs to dissolve to be able to permeate the
intestinal wall. One of the main properties restricting solubility, for
example, hydrophobicity, is a driving force for the transcellular permeability.
(b) The following general properties can be extracted for permeability (from
the left-hand side): the transcellular route is used by nonpolar, medium-sized
(MW < 500), and uncharged compounds; the paracellular route is utilized by
compounds that are polar, small (MW < 180), and charged; energy-dependent
active transport processes (transport efflux and influx proteins) are used by
compounds that are medium to large sized, both by polar and nonpolar
compounds. Further, the compounds may be charged or uncharged. The figure
presents hydrophobic atoms in gray (carbon atoms) and white (hydrogens bound
to carbon atoms), and polar atoms are shown in red (oxygen atoms), pink
(hydrogens bound to oxygen atoms), blue (nitrogens), and light blue (hydrogens
bound to nitrogens).
16.1.1
Drug Solubility
The aqueous solubility of the compound is dependent both on the
forces between the drug molecules in the solid state and on the forces between
the drug molecule and the surrounding intestinal fluid. The solubility will be
poor if it is more energetically favorable for the molecules to bind to each
other than to the water molecules, so that the molecules remain
as a compact solid rather than dissolving in the water-based fluid. However, poor
solubility might also be a result of unfavorable binding between the water
and the drug molecule. Depending on which of these underlying
properties is the most important, different physicochemical properties will
govern the behavior of the molecule in the water. Multivariate data
analysis of melting point, a property reflecting the stability of the solid state,
has shown that molecules proven to form stable crystals are, in general, small,
rigid, and polar [10]. On the other hand, compounds that are hydrophobic,
flexible, and large demand a larger cavity to be formed in the aqueous fluid to
get dissolved, and may be solubility restricted due to these properties. Models
for prediction of solubility will be further discussed in Section 16.4.2, but
the above-mentioned contrasts indicate that solubility is not a straightforward
property to predict.
16.1.2
Intestinal Permeability
A compound can permeate the intestinal wall by using the paracellular route
(between the cells) or the transcellular route (through the cells) by passive
diffusion. To generalize, small, hydrophilic, and/or charged compounds, which
cannot permeate the lipophilic cell membrane, diffuse through the aqueous
pores. However, the pores cover less than 1% of the intestinal surface [11], and
this, in concert with the solute restriction caused by the tight junctions of the
pores, largely limits the contribution of the paracellular pathway. Compounds
that show a reasonable hydrophobicity (log D at pH 7.4 of 0-2) and intermediate
size (up to a molecular weight of 500) are assumed to permeate the intestinal
wall by passive transcellular diffusion. Even though transport by the transcellular
route seems to be a rather complex process, demanding partitioning between
lipophilic and hydrophilic milieus several times, the vast majority of druglike
compounds utilize this pathway. Larger molecules with a large number of
hydrogen bond donors and acceptors, sometimes in combination with a high
lipophilicity value, may utilize active processes and transport proteins to
get through the cells. However, the latter properties also increase the risk that
the compound might be transported by efflux proteins, resulting in secretion
of the compound back to the intestinal lumen. Such efflux results in a lower
drug concentration reaching the blood circulation and the site of action.
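
The generalizations above can be written down as a rough rule of thumb. The
sketch below is only an illustration: the MW < 180 cut-off is taken from the
caption of Fig. 16-3, the log D and MW 500 limits from the text, and treating
log D below 0 as "hydrophilic" is an assumption made here. The function name is
hypothetical, and real compounds will often not fit such crisp rules.

```python
def likely_permeation_route(mw, log_d_ph74, charged):
    """Very coarse guess at the dominant intestinal permeation route."""
    # Small and hydrophilic and/or charged: aqueous pores between the cells.
    if mw < 180 and (charged or log_d_ph74 < 0):
        return "paracellular"
    # Uncharged, moderately hydrophobic, intermediate size: through the cells.
    if mw <= 500 and 0 <= log_d_ph74 <= 2 and not charged:
        return "passive transcellular"
    # Everything else: no simple rule; active transport or efflux may be involved.
    return "unclear; consider carrier-mediated transport or efflux"


print(likely_permeation_route(150.0, -1.0, True))
print(likely_permeation_route(350.0, 1.2, False))
print(likely_permeation_route(800.0, 3.5, False))
```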
To conclude, two of the main factors influencing intestinal drug absorption
are aqueous solubility and intestinal permeability. These characteristics
depend on opposing physicochemical properties, which makes it difficult
to find easily interpretable models for prediction of the drug absorption
process. Several computational solubility and permeability models have so far
been developed, and a majority of these are either dataset restricted, that is,
only a small volume of the druglike space has been included in the training
of the model, or mechanism based, for example, valid for a specific transport
route or transport protein. This indicates, firstly, that the datasets used in the
development of absorption models applicable in the drug discovery process
need to cover a large volume of the druglike space. Secondly, the development
of pharmaceutical informatics tools is crucial to extract correct information
from combinations of all mechanism-based models that are available.
16.1.3
Toxicity
Structure-activity relationships (SARs) in toxicology are based on the
assumption that an adequate representation, that is, geometric and electronic,
of the investigated structures will permit the derivation of a quantitative
statistical model. This assumption is not unique to toxicological modeling
but holds for all other areas of ADME modeling as well. However, in toxicology,
the situation is further complicated by the fact that toxicological
effects may result from many different mechanisms. This, in turn, means
that although it is possible to establish good in silico models for congeneric
series of molecules, more general models may be difficult to derive. In 1969,
Corwin Hansch, the founder of modern quantitative structure-activity
relationship (QSAR) analysis, proposed that, in general, a biological and
toxicological action for a congeneric series of structures could be described
by the model:

$$\log(\text{activity}) = a(\pi) + b(\varepsilon) + c(S) + d \tag{1}$$

where $\pi$, $\varepsilon$, and $S$ are related to the hydrophobic, electronic, and steric
descriptions, respectively, of the studied compounds.
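
To illustrate how the coefficients a, b, c, and d in Eq. (1) might be estimated
for a congeneric series, the following sketch fits them by ordinary least
squares. This is an assumption about the fitting procedure, not something
stated in the text. The descriptor values are invented, and the activities are
simulated from known coefficients so that the fit can be checked. Only the
Python standard library is used.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_hansch(descriptors, log_activity):
    """Least-squares estimate of (a, b, c, d) from rows of (pi, epsilon, S)."""
    rows = [list(d) + [1.0] for d in descriptors]  # last column carries d
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_activity)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) coeffs = X'y


# Hypothetical congeneric series: (pi, epsilon, S) for each compound.
series = [(0.5, 0.1, 1.0), (1.0, -0.2, 1.5), (1.5, 0.3, 0.8),
          (2.0, 0.0, 2.0), (0.2, 0.4, 1.2), (1.2, -0.1, 0.5)]
true_coeffs = (0.8, -0.5, 0.3, 1.0)  # a, b, c, d used to simulate the data
activity = [true_coeffs[0] * p + true_coeffs[1] * e + true_coeffs[2] * s
            + true_coeffs[3] for p, e, s in series]

fitted = fit_hansch(series, activity)
print([round(v, 3) for v in fitted])
```

With noise-free simulated data the fit recovers the coefficients used to
generate it; with measured activities the estimates would of course carry
uncertainty and should be validated as discussed in Section 16.3.3.4.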
Toxicological structure-activity investigations have over the years been
conducted in areas such as nonspecific toxicity, aquatic toxicology, mutagenicity,
and carcinogenicity as well as developmental toxicity and skin sensitization.
For a recent article on the subject see Ref. 12.
16.2
Traditionally, the discovery setting has worked serially, with the primary
focus set on the identification of new structures that show good pharmacological
effect. After the screening for pharmacological effects, other important
properties such as solubility, permeability, stability, metabolism, distribution,
elimination, and toxicity have been investigated one after another. This is
an ineffective, time-consuming drug discovery process that does not necessarily
result in the identification of the optimal drug molecule, because only one
property is investigated at a time. Currently, the pharmaceutical
industry is working with experimental screens in a parallel setting, in which
the above-mentioned properties are experimentally examined at the same time
and thereafter evaluated. Hence, all properties affect the final decision on
which compounds to pursue, leading to better selection of the CDs. Further,
the discovery setting is now moving into the virtual era, applying several virtual
tools to further cut time and costs during the discovery process. By designing
virtual compound libraries and testing these by virtual docking to targets and
with in silico models for ADMET properties, a prioritized library predicted to
have a favorable pharmaceutical profile and acceptable pharmacological potency is
computationally selected and thereafter synthesized. This scenario results in
knowledge-based synthesis of fewer compounds with better properties than
in either the serial or the parallel setting described above. After the synthesis
of the prioritized library, the potency and the developability of the compounds
must be experimentally confirmed (Fig. 16-4). Thus, methods for rapid and
reliable experimental screening of these properties are warranted. Currently,
rapid methods have been devised for the screening of several of the ADMET
properties at the expense of reliability [7, 13], resulting in a large number of
false-positive results in the screens. By incorporating reliable computational
and experimental screens, better leads will be produced, saving time and
money during the discovery process. However, if the virtual-based discovery
setting is to be successful, new computational tools need to be developed.
The development of informatics tools that are applicable to pharmaceutical
profiling and have the capacity to handle large databases with such diverse
information as in silico, in vitro, and in vivo data as well as qualitative and
quantitative information will be of utmost importance.
16.3
16.3.1
General Terms
When trying to develop in silico models for the prediction of ADMET properties
there is in most cases a trade-off between accuracy, speed, and, many times,
transparency of the derived models. This is not always a significant problem
as the various models may be intended for different usages, for example, for
high-throughput in silico screening or for guidance and focusing, respectively.
In reality, this often means that rapidly computed descriptors, often of one-
and two-dimensional nature, are utilized in the former kind of models while
more computer-intensive, three-dimensional variables are employed
(sometimes in conjunction with one- and two-dimensional representations) in
the latter type of models.

[Figure 16-4: workflow diagram with three panels, Traditional (serial), Current
(parallel), and Near future (knowledge based); labels include Library, Virtual
library, Privileged library, and CD selection.]

Fig. 16-4 The traditional setting applied in the candidate drug (CD) selection
was a serial experimental testing of pharmacology (P) followed by the different
ADMET properties, resulting in extended development times and difficulties in
finding the optimal compound. Currently the pharmaceutical industry applies a
parallel setting and moves toward the knowledge-based setting. In the parallel
setting, both pharmacology and ADMET properties are experimentally evaluated
simultaneously and the complete profile can be used when selecting the CD. In
the knowledge-based setting, a virtual library designed in the computer is
first evaluated through different in silico models for pharmacology and ADMET
properties. A privileged library is synthesized on the basis of the results
from the virtual screening and the compounds are thereafter experimentally
tested.

Cronin and Schultz have in a recent article [14] put forward
some basic requirements for deriving statistically sound models:
1. a well-defined and measurable target
2. a chemically and biologically diverse data set
3. physicochemical descriptors that are consistent with the
modeled target
4. usage of an appropriate statistical technique
5. where possible, a strong mechanistic basis.
16.3.2
Datasets and Models
One consideration to take into account in ADMET modeling is the availability
of relevant and accurate datasets. In general, there exists a relatively small
number of datasets, especially public ones, with the desired quality of data,
diversity of the investigated structures, and large enough size to permit
sufficient validation of the derived model. In the ADMET literature, especially
within the areas of solubility, absorption, and permeability, it is quite common
that models are derived from rather few compounds (fewer than 50). These
models are usually quite local, with a limited scope with respect to
their predictive ability. Local models are in many cases quite useful
for advancing a particular project or set of compounds. In one particular
respect, however, a vast majority of the published models lack information,
namely, with respect to the applicability domain in which they operate. Very
few publications of ADMET models explicitly point out or discuss how the
applicability domain of the derived model in question is established. Statistical
models in general, including in silico ADMET models, should always have
some protocol (measure) to determine if the prediction of a property for a
particular compound is within, on the border of, or outside (perhaps also
how far outside) the applicability domain of the model based on the chemical
description employed. This aspect will be further discussed in Section 16.3.3.6
together with an approach on how to proactively use the information on outliers
to further advance the model. Absorption and permeability models, and the
datasets they are based on, also have a particular problem with respect to active
transport. In the past, datasets were modeled under the assumption that the
absorption or permeation process was devoid of active transport, although later
analysis showed that this was not entirely true. Most probably, compounds in
datasets currently being investigated will later be found to be involved in active
transport by transporters not yet identified. An extenuating circumstance is
the fact that if a model with good statistics as well as good predictive ability
is derived even though some compounds of the training set, that is,
the compounds used to derive the model, are involved in active transport, then
two alternative explanations may emerge: (a) that the amount of active
transport of a particular compound is rather small (negligible) or (b) that the
derived model somehow encompasses the information also related to active
transport, although this was, in most cases, not the intent from the start.
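
As an illustration of the kind of protocol called for above, the sketch below
reports whether a new compound lies inside, on the border of, or outside a
model's applicability domain, and how far outside. It uses one simple approach
among many: the distance to the training-set centroid in autoscaled descriptor
space, compared with the largest distance seen in training. This particular
measure, the 10% border margin, the function names, and the descriptor values
are all assumptions made for the example; Section 16.3.3.6 may describe a
different procedure.

```python
import math
import statistics


def build_domain(training_rows):
    """Return a distance function and the largest distance seen in training."""
    columns = list(zip(*training_rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]

    def distance(row):
        # Euclidean distance to the training centroid in autoscaled units.
        return math.sqrt(sum(((v - m) / s) ** 2
                             for v, m, s in zip(row, means, stds)))

    limit = max(distance(row) for row in training_rows)
    return distance, limit


def domain_status(row, distance, limit, margin=0.1):
    """Classify a compound and report its distance relative to the limit."""
    ratio = distance(row) / limit
    if ratio < 1.0 - margin:
        label = "inside"
    elif ratio <= 1.0 + margin:
        label = "border"
    else:
        label = "outside"
    return label, ratio


# Hypothetical training descriptors: (log P, polar surface area).
training = [(1.0, 60.0), (2.0, 75.0), (3.0, 50.0), (2.5, 90.0), (1.5, 80.0)]
distance, limit = build_domain(training)
print(domain_status((2.0, 71.0), distance, limit))
print(domain_status((6.0, 200.0), distance, limit))
```

The returned ratio gives a rough sense of how far outside the domain a
compound lies, which is the extra piece of information the text asks for.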
16.3.3
Statistical Tools
16.3.3.1 Linear Multivariate Methods

The statistical methods most often employed for developing ADMET in silico
structure-property relationships are linear multivariate methods, such as
multiple linear regression (MLR) or partial least squares (PLS). Although aimed
at the same end point, namely, to derive a statistically sound and predictive
structure-property relationship, the underlying assumptions regarding the
information contained in the independent variables, that is, the chemical
description of the investigated structures, are quite different for the two
methods. With respect to MLR the following should be considered:
1. MLR assumes each variable to be exact and relevant, that
is, the information content in each variable is to be used
in its entirety for developing the statistical model.
2. Strongly collinear variables must be eliminated by removing
all but one of the strongly correlated variables; otherwise
spurious chance correlations may result.
3. The number of variables cannot exceed the number of
observations, for example, the number of measured
ADMET property points, to be studied. A rule of thumb is
that the number of variables used should not exceed a
fourth of the number of observations.
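
Points 2 and 3 above can be checked before an MLR model is fitted. The sketch
below flags descriptor pairs whose absolute Pearson correlation exceeds a limit
and tests the one-fourth rule of thumb. The limit of 0.9 for a "strong"
correlation, the function names, and the descriptor values are assumptions made
for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def check_mlr_inputs(columns, r_limit=0.9):
    """Return a list of warnings for a dict of descriptor name -> values."""
    n_obs = len(next(iter(columns.values())))
    issues = []
    if len(columns) > n_obs / 4:
        issues.append(f"{len(columns)} variables for {n_obs} observations "
                      "exceeds one fourth of the observations")
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= r_limit:
            issues.append(f"{name_a} and {name_b} are strongly correlated "
                          f"(r = {r:.2f})")
    return issues


# Hypothetical descriptors for 12 compounds; mw is built to track logp closely.
descriptors = {
    "logp": [float(i) for i in range(1, 13)],
    "mw": [2.0 * i + (0.1 if i % 2 else -0.1) for i in range(1, 13)],
    "psa": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0] * 2,
}
for issue in check_mlr_inputs(descriptors):
    print(issue)
```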
Regarding PLS the following applies:

1. The descriptors (variables) are not treated as exact and
relevant but as consisting of two parts, one part related to
the dependent variable and the other part not related
(noise).
2. Strong correlations between relevant variables are not a
problem in PLS and all such variables can be kept in the
analysis. In fact, the models derived using PLS become
more stable with the inclusion of strongly correlated and
relevant parameters.
3. The number of original descriptors may vastly exceed the
number of compounds in the analysis since PLS uses,
internally, only a few (usually fewer than 5-10) latent
variables for the actual statistical analysis.
4. Again, a rule of thumb is that the number of latent
variables used should not exceed a fourth of the number
of observations.
The PLS model becomes identical to the MLR model when the number of
latent variables of a PLS-derived model becomes equal to the number
of actual independent variables, which rarely happens because model
validation usually retains fewer latent variables. The regression coefficients
of the MLR model are straightforward to interpret, while the PLS latent variables
need to be retransformed into original variable space to be interpreted in a
similar manner. This also means that the PLS “regression” coefficients
are dimension dependent, that is, they depend on how many latent variables
(PLS components) are used. However, since each PLS component explains
a decreasing amount of variance, it is usually not that important whether a PLS
model is based on three or four components, which also means that the PLS
“regression” coefficients will not differ very much between the three- and
four-component models.
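
The relationship between latent variables and the original descriptors can be
made concrete with a minimal PLS sketch for a single dependent variable. The
version below follows the commonly described NIPALS-style PLS1 scheme (weights
from the covariance with y, then deflation of X and y); it is an illustration
written for this section, not the implementation used by the authors, and the
data are invented. With as many components as descriptors, the prediction
reproduces the exact linear relation used to simulate the data, in line with
the statement that PLS then coincides with MLR.

```python
def pls1_fit(x, y, n_components):
    """Fit PLS1 and return (x_mean, y_mean, [(weights, loadings, q), ...])."""
    n, p = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xr = [[row[j] - x_mean[j] for j in range(p)] for row in x]  # residual X
    yr = [v - y_mean for v in y]                                # residual y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(xr[i][j] * yr[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        t = [sum(xr[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        for i in range(n):
            for j in range(p):
                xr[i][j] -= t[i] * load[j]
            yr[i] -= q * t[i]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, row):
    """Predict y for one compound by repeating the deflation on its descriptors."""
    x_mean, y_mean, components = model
    residual = [v - m for v, m in zip(row, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(a * b for a, b in zip(residual, w))
        y_hat += q * t
        residual = [a - t * b for a, b in zip(residual, load)]
    return y_hat


# Hypothetical data: two descriptors and an exactly linear response.
x = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in x]
full_model = pls1_fit(x, y, n_components=2)
print(pls1_predict(full_model, (7.0, 2.0)))
```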
16.3.3.2 Nonlinear Multivariate Methods

Although a majority of the published ADMET models are based on linear
multivariate methods as discussed in Section 16.3.3.1, nonlinear methods
have also been employed. The most commonly used nonlinear method
in ADMET modeling is neural networks (NNs). Backpropagation NNs have
been used to model absorption, permeation, as well as solubility and
toxicological effects. A particular problem for many NNs is the tendency for
these networks to overtrain (see further discussions on model validation in
Section 16.3.3.4), which needs to be closely monitored to avoid the situation
where the derived model becomes an “encyclopedia”, that is, the model can
perfectly explain the variance of the investigated property of the compounds
used to derive the model but has quite poor predictive ability with respect to
new compounds.
16.3.3.3 Dataset Pretreatment

It is very important to give the variables used in the model development
an equal chance, regardless of their respective numerical scales, to influence
the outcome of the analysis. This can be achieved by scaling the variables in an
appropriate way. One popular method for scaling variables is autoscaling,
whereby the variance of each variable is adjusted to 1. Sometimes it is
also desirable to center each of the variables with respect to their mean
values.
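
A minimal sketch of this pretreatment is shown below: each descriptor is
divided by its sample standard deviation so that its variance becomes 1, and is
optionally mean-centered first. The function name and the descriptor values are
hypothetical.

```python
import statistics


def autoscale(columns, center=True):
    """Scale each descriptor to unit variance; optionally subtract its mean."""
    scaled = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        std = statistics.stdev(values)  # zero for a constant descriptor
        shift = mean if center else 0.0
        scaled[name] = [(v - shift) / std for v in values]
    return scaled


# Hypothetical descriptors on very different numerical scales.
raw = {"mw": [180.2, 250.3, 320.4, 410.5], "logp": [0.5, 1.2, 2.8, 3.1]}
scaled = autoscale(raw)
for name, values in scaled.items():
    print(name, [round(v, 2) for v in values])
```

A constant descriptor has zero variance and cannot be autoscaled; such
variables carry no information and would normally be removed beforehand.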
16.3.3.4 Model Validation

Stringent model validation is a cornerstone for the successful development
of any statistical model. Without proper validation the predictive
ability of the derived model cannot be estimated, and the derived
model may be nothing more than a random model. There are
a few standard techniques that should be employed to ensure proper
validation:
1. Cross-validation is one technique for the internal
validation of a proposed model. When using
cross-validation the training set is divided into groups,
usually four to seven, and one group is removed from the
set. The model is then derived using the rest of the
training set. The dependent property of the compounds of
the left-out group is then predicted by the developed model.
Each group is successively left out and predicted in the
same manner as just described. The predicted residual
error sum of squares (PRESS) is computed from all the
predictions. The PRESS value is compared with the sum
of squares for the dependent variable y (SSY):

$$\mathrm{SSY} = \sum_i \left(y_{i,\mathrm{measured}} - y_{\mathrm{mean}}\right)^2 \tag{2}$$

A squared correlation coefficient ($Q^2$) is then defined as:

$$Q^2 = 1 - \mathrm{PRESS}/\mathrm{SSY} \tag{3}$$

A significant difference between $Q^2$ and the normal
squared correlation coefficient ($R^2$) is that the former may
also assume negative values, indicating that the model has
worse predictive ability than using the mean value as the
predicted value for each compound. $Q^2$ should be ≥ 0.5
for the model to be considered to have reasonable
practical predictive performance.
2. An external validation set should be used as an
independent test of the predictive ability of a derived
model.
3. Randomization of the dependent variable, that is, the
values of the dependent variable are randomly redistributed
among the compounds. A model is then derived on the
basis of the redistributed values and checked for its
predictive performance using the methods outlined under
points 1 and 2. This procedure is repeated a number of
times, typically between 50 and 100 times. There should
exist a clear separation in predictive ability between the
model based on the “true” dependent values versus the
model based on redistributed values.
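The cross-validation and randomization procedures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-descriptor least-squares line stands in for the statistical model, the data are synthetic, five left-out groups are used, and the names `fit_line`, `q2`, and `randomized_q2` are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x for a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def q2(x, y, n_groups=5):
    """Cross-validated Q2 = 1 - PRESS/SSY, leaving out one group at a time."""
    n = len(y)
    y_mean = sum(y) / n
    ssy = sum((yi - y_mean) ** 2 for yi in y)
    press = 0.0
    for g in range(n_groups):
        test = [i for i in range(n) if i % n_groups == g]
        train = [i for i in range(n) if i % n_groups != g]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
    return 1.0 - press / ssy

def randomized_q2(x, y, n_rounds=50, seed=0):
    """Q2 values obtained after randomly redistributing the dependent variable."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)
        results.append(q2(x, shuffled))
    return results

# Synthetic data: a linear trend plus small, fixed "experimental" noise.
x = [float(i) for i in range(20)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1,
         0.2, -0.2, 0.4, -0.1, 0.0, -0.3, 0.1, 0.2, -0.2, 0.1]
y = [2.0 + 0.5 * xi + e for xi, e in zip(x, noise)]

q2_true = q2(x, y)
q2_random = randomized_q2(x, y)
```

A clear gap between `q2_true` and every value in `q2_random` corresponds to the separation in predictive ability that the randomization test is meant to demonstrate.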
16.3.3.5 Training and Test Set Selection

It is certainly possible to choose a training set at random and also to derive a
statistically sound and predictive model. Chances are, however, that the choice
of training set compounds is somewhat skewed. This, in turn, most probably
means that many of the remaining compounds, the external test set, will fall
outside of the applicability domain of the derived model and constitute outliers
to the present model. For a model to have the ability to show a good predictive
capability and to cover the investigated descriptor space in a good manner
the training set must be chosen with some care. There are several methods
available for the selection of well-distributed training sets. Two such methods
will be exemplified here:
1. Experimental design methods of some appropriate
complexity are one such choice. The number of
compounds to be used for the training set depends on the
chosen design scheme and the number of investigated
independent variables (descriptors) but may typically
range between 8 and 64.
2. Maximin methods, where the aim is to maximize the
closest (minimum) distance between two potential
training set compounds in the investigated descriptor
space. By maximizing the closest distance all other
distances between training set compounds are greater,
thus ensuring a rather uniform distribution of
compounds comprising the final training set.
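The maximin idea can be sketched with a simple greedy heuristic in Python. This is one possible realization, not necessarily the procedure used in practice: starting from an arbitrary compound, it repeatedly adds the candidate whose closest already-selected neighbor is farthest away. The descriptor coordinates and the name `maximin_select` are hypothetical.

```python
import math

def maximin_select(points, n_select, start=0):
    """Greedy maximin selection of n_select points in descriptor space."""
    selected = [start]
    while len(selected) < n_select:
        best, best_dist = None, -1.0
        for i, p in enumerate(points):
            if i in selected:
                continue
            # Distance from candidate i to its closest selected compound.
            d = min(math.dist(p, points[j]) for j in selected)
            if d > best_dist:
                best, best_dist = i, d
        selected.append(best)
    return selected

# Hypothetical two-descriptor space with two tight clusters and isolated points.
descriptors = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0),
               (5.1, 4.9), (0.0, 5.0), (5.0, 0.0), (2.5, 2.5)]
training_idx = maximin_select(descriptors, 5)
```

In this example the selection takes one representative from each cluster and all isolated points, which is the kind of even coverage of descriptor space the method aims for.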
16.3.3.6 Applicability Domain Estimation

It is rather essential that the applicability domain of a derived model can be
evaluated so that outliers to the model may be indicated. If an established
statistical model is to be regarded as poor from a predictive point of view
this should be done on the basis of correct reasons, that is, that the model
has truly poor predictive ability and not from the fact that the model cannot
estimate outliers to the model with acceptable accuracy. The latter case is
probably the most common cause for statistical (ADMET) models to “fall from
fame” especially those that can be accessed through internal or external web
services. In many cases it is difficult, if not impossible, to find out about
the compounds used as training set and/or the chemical description used
in the model. Thus, many compounds outside the applicability domain of
the model will be submitted. It is therefore of great importance to have an
indication together with the prediction whether the compound is considered
to fall inside or outside of this domain, that is, if the compound is an outlier
or not. The outlier information, and possibly also how far from the model the
compound in question is, may in many cases be utilized in a more proactive
way than just realizing that a number of compounds submitted to the model
for prediction are, in fact, outliers to the present model. Thus, by analyzing the
outliers, perhaps virtual compounds, from various points of view, for example,
structural or synthetic, some of these compounds may later be synthesized
and tested experimentally. The same compounds may then be incorporated
into a revised model that will have a broader applicability domain. There are
different methods available to determine whether a particular compound is to
be labeled as an outlier. In this section, we will describe two of these methods:
1. The first of these methods is the Mahalanobis distance.
This distance in descriptor space measures how similar
the investigated compound is to the training set
compounds. The Mahalanobis distance is superior to the
corresponding, and more familiar, Euclidean distance
since the former takes the correlations that normally exist
between the variables into account, that is, the Mahalanobis
distance does not assume orthogonal descriptors as does the
Euclidean distance.
2. The second method is related to the remaining
information present in the variables used to describe the
compound that has not been utilized by the model. This
method is closely related to the PLS method and its
assumption with respect to the relevance of each variable
(see Section 16.3.3.1). Thus, if a particular compound
contains a lot of unexplained variance (information) in the
chemical descriptor variables, much more than the
training set compounds, it is quite likely that the
compound in question will have other properties, not
accounted for by the present model, which will impact on
the true value for the investigated ADMET property. The
predicted value will therefore, most likely, deviate
substantially from the corresponding experimental value.
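The difference between the Mahalanobis and Euclidean distances can be illustrated with a small Python sketch. To stay within the standard library it is restricted to two descriptors, so that the covariance matrix can be inverted in closed form; the training data, the query compounds, and the name `mahalanobis_2d` are hypothetical.

```python
import math

def mahalanobis_2d(point, training):
    """Mahalanobis distance of a two-descriptor point from the training set mean."""
    n = len(training)
    m1 = sum(p[0] for p in training) / n
    m2 = sum(p[1] for p in training) / n
    s11 = sum((p[0] - m1) ** 2 for p in training) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in training) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in training) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d1, d2 = point[0] - m1, point[1] - m2
    # d^T S^-1 d, using the closed-form inverse of the 2x2 covariance matrix S.
    d_sq = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(d_sq)

# Hypothetical training set with two strongly correlated descriptors.
training = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 5.9)]
center = (3.5, 3.5)            # mean of the training set above
along_trend = (5.5, 5.5)       # follows the correlation seen in the training set
against_trend = (5.5, 1.5)     # same Euclidean distance, but breaks the correlation

euclid_along = math.dist(along_trend, center)
euclid_against = math.dist(against_trend, center)
maha_along = mahalanobis_2d(along_trend, training)
maha_against = mahalanobis_2d(against_trend, training)
```

Both query compounds are equally far from the training set mean in Euclidean terms, yet only the one that violates the descriptor correlation receives a large Mahalanobis distance and would be flagged as an outlier.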
16.3.3.7 Calculation of Descriptors

A large number of different descriptors have been used to model ADMET
properties. All 1-D, 2-D, and 3-D based computed chemical properties have
been found useful for deriving statistically sound and predictive ADMET
models. The choice of which type of descriptors, or combinations thereof, to
use depends very much on the aim of the derived model. Is the model to be used
for screening large sets of (virtual) compounds or for smaller sets of structures?
How important is interpretability versus predictive accuracy and robustness
of the prediction? How much computational time is allowed for spending on
each individual prediction? In fortunate cases many of these considerations
coincide, that is, the model is robust and shows good predictive capability as
well as being based on rapidly computed descriptors that are easy to interpret
from a mechanistic or physicochemical point of view. However, in most cases
there exists a trade-off between objectives. Depending on the priorities for the
development of the particular model at hand different sets of descriptors have to
be employed. Having these aspects in mind, it is usually quite useful to develop
more than one model for the same ADMET property based on different sets
of descriptors. This way, both interpretability (incorporated into a model with
acceptable although perhaps not the best predictive ability) and robustness,
as well as speed and accuracy can be achieved. For instance, the former
kind of model can be used for understanding the important physicochemical
properties influencing the particular ADMET property in question and how these
physicochemical properties should be modified to achieve a suitable level for
the investigated ADMET property. This, in turn, gives an indication of how
new and improved compounds could be designed, as well as enables focusing
on promising regions of the chemical space of the model. Thus, instead of
simulating a very large number of virtual compounds for prediction by the
model, a much smaller number can be submitted. Subsequently, this smaller
number of structures can then be submitted to a more robust and accurate,
although more complex, model for the final estimation of the ADMET property.
In many cases, consensus or ensemble models, although more complex in
nature, are quite useful for deriving in silico ADMET models with high levels
of predictive accuracy as well as high degrees of robustness. These models may
quite often be looked upon as “gray”, not “black”, boxes since each model can
be interpreted but the multitude of them makes the overall picture difficult to
comprehend.
1-D and 2-D descriptors are generally much faster to compute than the
corresponding 3-D based ones. Also, the possible problems associated with
generating a reasonable 3-D conformation for the investigated structure are
eliminated.
1. 1-D descriptors such as molecular weight, molar
refractivity, as well as number of atoms and bonds have
been used to model permeability, absorption, solubility,
and toxicological effects. These kinds of descriptors are
usually rather easy to interpret.
2. A large number of 2-D descriptors exist. Many of them are
topological in nature, that is, they are computed from the
connectivity of the investigated compound or, more
specifically, from the mathematical graph that the
structure represents, and often contain important
information with respect to ADMET modeling. Some of
the more well known, and often much used, topological
variables are the Kier and Hall descriptors. However,
many times these topological descriptors are somewhat
difficult to interpret with respect to the question: “How
should the present structure be modified to improve the
ADMET property presently investigated?” A particular
subset of topological descriptors, the so-called
electrotopological ones, is an exception with respect to
interpretability. These kinds of descriptors are quite easy
to interpret in terms of hydrogen bonding and quite a few
published investigations have found the electrotopological
(or e-state) descriptors useful for deriving good ADMET
models.
3. In many cases 3-D based descriptors are superior to lower
dimensional ones because they capture important
information, such as internal hydrogen bonds, and other
potentially important, but buried functional groups
revealed only by using the actual 3-D representation of
the investigated compound. The 3-D descriptors may also be
easier to interpret than some of the previously mentioned
variables. However, choosing the correct 3-D
conformation may, in some cases, cause problems
depending on how rapidly the descriptors must be
generated. There are software packages for converting 2-D
structures into 3-D ones, for example, Corina and
Concord, but although quite successful in a vast majority
of cases, both these programs sometimes fail during the
conversion process or the 3-D conformation given is not a
reasonable one for this particular modeling exercise
(Tab. 16-1). Certainly, some sort of conformational
analysis would in many cases be desirable. For the 3-D
descriptors there exists a large difference in complexity
and computational speed ranging from rapid calculations
of various surfaces and volumes of a structure to high
level, for example, ab initio, quantum mechanical based
descriptors such as orbital energies, charges,
polarizabilities as well as multipole moments.
In some cases it is possible to go from more computationally demanding
descriptors to more rapidly computed ones while preserving the information
content from one descriptor matrix to the other.
16.4
Applications and Practical Examples
16.4.1
Physiological Factors and Experimental Parameters Influencing the Accuracy of
Predictions of Intestinal Drug Absorption
16.4.1.1 Solubility
The intestinal solubility of a compound is dependent on physicochemical
properties of the molecule (discussed in Sections 16.1.1 and 16.4.2), the
location in the GI tract, the general physiology, and the dosage form. By
analyzing the descriptors in the Noyes-Whitney equation [15] the physiological
and pharmaceutical influence on dissolution becomes apparent:
$$\frac{\mathrm{d}m}{\mathrm{d}t} = \frac{D A C_s}{h} \qquad (4)$$
where Cs is the maximum amount of drug that can be dissolved in the fluid,
that is, the solubility value, A is the surface area of the undissolved compact,
D is the diffusion coefficient in the intestinal fluid, and h is the height of the
diffusion layer adjacent to the undissolved tablet. The diffusion coefficient of
a molecule will be dependent on the viscosity of the fluid; the higher the viscosity,
the lower the diffusion coefficient and thereby the smaller the amount of compound
dissolved per time unit. Furthermore, the larger the surface area of the
undissolved compact and the higher the solubility of the compound, the more
compound will be dissolved per time unit.
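The dependencies described above can be checked numerically with a short Python sketch of the dissolution rate relation, rate = D·A·Cs/h. All numerical values and units are hypothetical and serve only to show the proportionalities; the function name `dissolution_rate` is likewise an assumption.

```python
def dissolution_rate(diff_coeff, area, solubility, layer_height):
    """Amount dissolved per unit time, D * A * Cs / h."""
    if layer_height <= 0:
        raise ValueError("diffusion layer height must be positive")
    return diff_coeff * area * solubility / layer_height

# Hypothetical values: D in cm^2/s, A in cm^2, Cs in mg/cm^3, h in cm.
base = dissolution_rate(7e-6, 2.0, 0.5, 3e-3)
larger_area = dissolution_rate(7e-6, 4.0, 0.5, 3e-3)       # doubled surface area
more_viscous = dissolution_rate(3.5e-6, 2.0, 0.5, 3e-3)    # halved D in a more viscous fluid
more_soluble = dissolution_rate(7e-6, 2.0, 1.0, 3e-3)      # doubled solubility
```

Doubling the surface area or the solubility doubles the rate, whereas halving the diffusion coefficient halves it.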
Table 16-1 Examples of commercial software available for
prediction of ADMET related properties
Software Company Dissolution Sol Perm Trp Oral bioavailability HIA BBB Metabolism Other PK Toxicity
AbSolv X X
ACD Solubility DB ACD Labs X
ADME batches PharmaAlgorithms X X X
ADME boxes PharmaAlgorithms X X X X
Cerius2 Accelrys X X X X X X
Cloe PK Cyprotex X X X
GastroPlus Simulations Plus X X X X X X X
iDEA PKexpress Lion Biosciences X X X X
KnowItAll ADME/Tox Bio-Rad Laboratories X X X X X X
Oraspotter ZyxBio X X X
PK-sim Bayer Technology Services X X X X X X
QikProp Schrödinger X X X
QMPRPlus Simulations Plus X X X X
SLIPPER TimeTec X X X X
Crosses show the properties predicted by each of the reported
software packages. The following abbreviations are used: Sol - solubility,
Perm - intestinal permeability, Trp - transporters,
HIA - human intestinal absorption, BBB - blood-brain
barrier permeability, PK - pharmacokinetic properties.
The pH of the GI tract varies from pH 1 in the stomach up to pH 8 in
the colon. Thus, the solubility of protolytes, that is, compounds with one or
several ionizable groups, will be dependent on the location in the GI tract [16].
Compounds with an acidic functional group will show increased solubility
at pH values above the pKa, whereas the solubility of bases will improve at
pH values below the pKa value. For ampholytes, the lowest solubility will be
found at the isoelectric point, which is obtained at a pH value between the
acidic and basic pKa values. Another physiological factor that will influence the
solubility is the ionic strength of the intestinal fluid. This will be dependent on
food and fluid intake, and on the absorption and secretion of fluid within the
intestine [17]. In general, the solubility decreases with increased ionic strength,
because of the salting-out effect and/or the common ion effect displayed by
the counterions in the solution [18, 19]. However, the presence of electrolytes
can in specific cases improve the solubility [20]. This phenomenon is known
as the salting-in effect, and occurs when additives such as electrolytes loosen
up the tight water structure and thereby drive the formation of solvent cavities
for the drug molecule. Further, food induces the secretion of bile salts, that
is, surfactants secreted by the gallbladder, which may improve the solubility
of poorly soluble compounds by acting as a wetting agent or by solubilization
within the lipophilic core of bile salt micelles formed at higher bile salt
concentrations [21].
The in silico models derived for solubility are based on intrinsic solubility
as their experimental input data. The intrinsic solubility is the solubility value
determined for the neutral (i.e., uncharged) species of the compound and
is generally determined at 2 pH units above the pKa value for bases and
2 pH units below the pKa value for acids. Ampholytes are determined at
their isoelectric point. The solubility values used for the model development
therefore seldom reflect the apparent solubility seen in the intestinal
fluids. Hence, the predicted values obtained from the models need to be
transferred to an in vivo situation, for instance, by use of the Henderson-
Hasselbalch equation, which takes into account the pH dependency of
solubility [16].
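As an illustration of this transfer, the following Python sketch applies the Henderson-Hasselbalch relationship to an intrinsic solubility value. It assumes a monoprotic acid or base and ignores any upper limit set by salt solubility; the compound, its pKa, and the function name `apparent_solubility` are hypothetical.

```python
def apparent_solubility(intrinsic, pka, ph, kind):
    """pH-dependent solubility of a monoprotic compound (Henderson-Hasselbalch)."""
    if kind == "acid":
        # Acids ionize, and become more soluble, above their pKa.
        return intrinsic * (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        # Bases ionize, and become more soluble, below their pKa.
        return intrinsic * (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")

# Hypothetical weak acid: intrinsic solubility 0.01 mg/mL, pKa 4.5.
s_stomach = apparent_solubility(0.01, 4.5, 1.5, "acid")    # essentially intrinsic
s_intestine = apparent_solubility(0.01, 4.5, 6.5, "acid")  # about 100-fold higher

# Hypothetical weak base with pKa 8.0 shows the opposite trend.
b_stomach = apparent_solubility(0.01, 8.0, 1.5, "base")
b_intestine = apparent_solubility(0.01, 8.0, 6.5, "base")
```

The example makes the gap between intrinsic and apparent solubility explicit: two pH units above the pKa, the predicted solubility of the acid is roughly 100 times its intrinsic value.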
16.4.1.2 Permeability
The rate and extent of intestinal permeation is dependent on the physico-
chemical properties of the compound (see Sections 16.1.2 and 16.4.3) and the
physiological factors. Drugs are mainly absorbed in the small intestine due to
its much larger surface area and less tight epithelium in comparison to the
colon [17]. The permeation of the intestine may be affected by the presence of
an aqueous boundary layer and mucus adjacent to cells, but for a majority of
substances the epithelial barrier is the most important barrier to drug absorp-
tion. The lipoidal cell membrane restricts the permeability of hydrophilic and
charged compounds, whereas large molecules are restricted by the ordered
structure of the lipid bilayer.
In the GI tract, a pH-dependent permeability is seen (see also
Section 16.4.1.1): the higher the degree of ionization of the compound, the
poorer the permeability. Other physiological factors influencing the perme-
ability value of the compounds are the motility of the GI tract, the expression
of transport proteins, and the thickness of the mucus layer adherent to the
enterocytes. These factors influence the permeability as follows: the better
the motility of the intestine, the thinner the unstirred water layer (UWL)
adjacent to the cells. In general, peristalsis is so effective in vivo that the
UWL does not become the rate-limiting step in the absorption process. Fur-
ther, the extent to which the transport proteins are expressed will largely
influence the absorption. Depending on whether the transport protein is an
influx protein, transporting the compounds through the enterocytes into the
blood circulation or an efflux protein, transporting the compound out from
the cell back to the intestinal lumen, the fraction absorbed (FA) will either
increase or decrease with a high expression of the transporter. Finally, a
thick mucus layer adjacent to the cells may slow down the diffusion of the
compound and become the rate-limiting step of the absorption process. Taken
altogether, these physiological factors may result in large interindividual vari-
ability in the permeability value, giving large standard deviations in the FA
in vivo.
The in silico models derived for permeability are based on experimentally
determined permeability values using different cell culture models. The most
commonly used is the Caco-2 cell line, which is a human colon carcinoma
cell line [22, 23]. This cell line is inexpensive and easy to culture, and these
factors in concert with its human origin make it a popular cell model.
However, the colonic epithelium is somewhat tighter than the small intestinal
epithelium, resulting in permeability values of 1-2 orders of magnitude less
than that seen in small intestinal tissue. Despite this fact, the permeability
ranking of the compounds is in good agreement with that obtained in the
small intestine, and therefore the model is a valid tool for estimations of
FA over the small intestinal wall. Other cell lines used for determination
of permeability values are MDCK cells that originate from canine kidney
tissue [24] and 2/4/A1 cells originating from the rat small intestine [25].
The drawback with these cell lines is that they are not obtained from
human tissues and the MDCK cell line is further restricted by its kidney
origin resulting in, for example, a different expression pattern of transporters than
that in the human small intestine. In vivo, perfusion studies in humans
can be used to determine intestinal drug permeability [26]. All the different
experimental settings and protocols applied for permeability measurements
will largely influence the obtained permeability data. It is therefore important
that the experimental values used in the development of computational
models are determined in a consistent manner, within the same laboratory
using one experimental setting and one experimental protocol. Only then
is the in silico model based on high-quality data and the noise level
minimized.
16.4.1.3 Fraction Absorbed

Several computational absorption models based on human FA data have
been published [27-30]. These models should be interpreted with caution,
due to the fact that the datasets are compiled from a large number of
literature sources of varying quality. The following facts must be taken into
consideration:
1. Different experimental methods are used to determine
the FA, resulting in a large variability in the numbers
reported.
2. The influence of active transporters and the concentration
dependency in vivo are not always clear.
3. It is not transparent whether the FA is solubility limited
and/or permeability limited, resulting in difficulties in
obtaining a mechanistically transparent model.
4. The datasets obtained are often heavily biased toward
compounds with high FA due to the fact that a majority of
the compounds for which FA is known are commercially
available compounds. Hence, these compounds are the
results of years of discovery and development and they are
expected to show a good absorption profile. However, this
fact will influence the obtained in silico models. These will
be rather good at identifying compounds with high FA, but
poor at assigning other classes such as intermediate or
poor FA due to the lack of such compounds in the
training sets.
To conclude, it is not unusual that FA data for the same compound vary
by 50% in the literature, for example, FA can be reported as either 10 or 60%,
generally sorted as poor and intermediate FA, respectively. If such data are used
for training the in silico model, the model will to a large extent be based on
noise, leading to poor external predictions and noninterpretable results. In our
view, it is more relevant to estimate the FA on the basis of in silico solubility
and permeability screens.
16.4.2
In silico Solubility Models
Modeling solubility represents perhaps a bigger challenge than modeling
absorption and permeability. Why is this so? Some of the particular issues
involved in trying to derive good statistical models for solubility are related
to quality and precision of the dependent variable, namely, the solubility
values, the complexity (or lack thereof) and diversity of the compounds
of the investigated datasets, the possible influence of the solid state for
each of the studied compounds as well as whether modeling solubility is
fundamentally a linear or nonlinear problem. With respect to the first issue,
the quality (precision) of the solubility values found in literature, it must be
recognized that the published values stem from a variety of experimental
procedures that make comparisons between sets of measurements rather
difficult. It is not uncommon that published values of a particular compound
may differ by as much as a factor of 10! This, in turn, certainly makes
modeling solubility a difficult problem. Many of the publications on modeling
solubility contain a large number of compounds but in many cases the majority
of these structures are rather simple, non-drug-like molecules in which the
structural complexity with respect to functional groups and ring systems is
somewhat limited. Good quantitative structure-solubility relationships are
easier to derive for such datasets. Also, it has been recognized for many
years that the solid state of each of the investigated compounds may very
well play an important role for the modeling attempt to be a successful
one. The difficulty here lies in the fact that it is rather difficult to obtain
a theoretical estimate of the solid phase within reasonable computation
time and with satisfactory precision. Nevertheless, many attempts have
been made and many articles published over the years on how to model
solubility. In this section some of these recently published works will
be described and commented upon to illustrate the present status of the
field:
1. A well-known paper is that by Huuskonen [31]. In this
investigation a backpropagation artificial neural network
(ANN) was used as the statistical engine and e-state
descriptors were used to parameterize the chemical structures. The
investigation was based on 1297 compounds, also known
as the Huuskonen dataset, and used a large training set, a
randomly chosen test set, and a second (external) test set
composed of 21 compounds. A model with good statistical
quality was developed (see Table 16-2). A point worth
noticing in this investigation is the use of the dataset
specific “test” set where, in this case, according to the
publication: “The network architecture and the training
end point giving the highest coefficient of determination,
r2pred, and the lowest standard error s for the predictions of
the test set were then used”. This means that the
randomly chosen test set is in fact a validation set for the
training of the NN and the only “true” external test set is
the 21-compound set. A somewhat larger external test set
is desirable to more extensively evaluate the predictive
ability of the derived model in question. The statistical
results are presented in Table 16-2.
2. Several other investigations of solubility using the
“Huuskonen dataset” and other datasets using ANNs and
various other NN methods, for example, Bayesian NNs,
Kohonen’s self-organizing NNs, have been published in
the last few years (see Table 16-2 for results and
references).
3. Jorgensen and Duffy have published a recent review of
predictions of solubility focused on drugs [32].
4. Consensus modeling using ANNs has been published by
Manallack and coworkers [33]. They used BCUT variables
with diagonal elements consisting of charges, hydrogen
bonding acceptor and donor ability, respectively, as well as
polarizability.
Many, not to say an overwhelming majority, of the investigations that
have been published on the prediction of aqueous solubility of drugs (and other
compounds) have identified the most important (influential) factors to be
related to hydrogen bonding, polarizability or polarity, as well as hydrophobicity
expressed through terms such as e-state indices, hydrogen bonding terms, and
the log P variable.
Lately, consensus modeling has come into play as a useful tool for obtaining
robust models with good predictive ability. By using this approach the weakness
of one particular model is compensated by the other models thus obtaining a
much more robust behavior for the ensemble of models.
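The compensating effect of a consensus can be shown with a deliberately simple Python sketch. The three linear log(S) models below are invented for illustration only (their coefficients are not taken from any published model), and averaging is just one of several possible ways to combine predictions.

```python
from statistics import mean

# Three hypothetical single models predicting log(S) from log P.
def model_a(log_p):
    return -0.5 - 1.0 * log_p

def model_b(log_p):
    return -0.2 - 1.1 * log_p

def model_c(log_p):
    return -1.0 - 0.8 * log_p

def consensus_predict(models, log_p):
    """Consensus prediction as the plain average of the individual models."""
    return mean(model(log_p) for model in models)

models = [model_a, model_b, model_c]
individual = [model(2.0) for model in models]
consensus = consensus_predict(models, 2.0)
```

The consensus value always lies between the most extreme individual predictions, so a single poorly performing model cannot dominate the result.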
Apart from the accuracy of experimental data as discussed earlier, however,
there exists a problem with the presently derived models, that is, although, at
first sight, appearing to be quite respectable statistical models with rather good
predictive ability these models are not so optimal for predicting the solubility
of drugs. Why is that?
An investigation by Norinder and coworkers [34] will be used to illustrate the
situation but, again, this is a general deficiency among the published models
for predicting aqueous solubility. The statistics for the PLS model are quite
respectable; see Table 16-2 (Norinder; PLS) and the plot of experimental
versus calculated solubility (Fig. 16-5).
However, a closer inspection of the solubility range relevant for most
drugs, that is, -6 to -3, reveals a rather different picture (Fig. 16-6). For
the accurate prediction of such entities the derived model is not very useful.
This is, however, the situation that investigators are faced with when trying
to derive models for accurately predicting drug solubility that can be of
valuable practical use for medicinal chemists, biologists, pharmacologists,
and others in trying to advance research projects to arrive at compounds
with reasonable solubility. Using consensus or ensemble modeling instead
of a single model usually improves the situation somewhat as exemplified
by a rule-based ensemble model using two-dimensional parameters on the
Huuskonen dataset (Table 16-2: Norinder; RDS/classification/ensemble) [34].
Sometimes, depending on the targeted use of the model as well as the precision
of the experimental data, it is more useful to classify the range of solubility
into two or three bins (categories). This approach is exemplified on the same
dataset in which three categories (log(S); good: > -2, medium: -2 to -4,
poor: < -4) were used. The results of a single model approach as well as an
ensemble modeling (50 models) are reported in Table 16-2.
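The binning of solubility values and the resulting classification accuracy can be sketched as follows in Python. The log(S) values are hypothetical, and assigning the boundary values -2 and -4 to the medium class is an assumption, since the text does not state how the boundaries are handled.

```python
def solubility_class(log_s):
    """Assign a log(S) value to one of three solubility categories."""
    if log_s > -2.0:
        return "good"
    if log_s >= -4.0:   # boundaries -2 and -4 counted as medium (assumption)
        return "medium"
    return "poor"

def accuracy(experimental, predicted):
    """Percentage of compounds whose predicted class matches the experimental class."""
    hits = sum(solubility_class(e) == solubility_class(p)
               for e, p in zip(experimental, predicted))
    return 100.0 * hits / len(experimental)

# Hypothetical experimental and predicted log(S) values for five compounds.
experimental = [-1.2, -2.5, -3.9, -4.6, -6.1]
predicted = [-1.8, -1.9, -3.5, -4.2, -5.0]

classes = [solubility_class(v) for v in experimental]
acc = accuracy(experimental, predicted)
```

Note that the last compound is classified correctly although its predicted value is off by more than one log unit, which shows why classification can be more forgiving than regression when the experimental data are imprecise.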
Table 16-2 Summary of different methods and models for the
Huuskonen aqueous solubility dataset
                                        Training set        Test set           Test set 2
Model      Type                         n     R2    s       n    R2    s       n   R2    s       References
Gasteiger  MLR                          797   0.79  0.93    496  0.82  0.79    21  0.56  1.20    Yan and Gasteiger, J. Chem. Inf. Comput. Sci., 2003, 429-434
           ANN 40-8-1                   797   0.93  0.50    496  0.92  0.59    21  0.85  0.77
Liu        ANN 7-2-1                    1033  0.86  0.70    258  0.86  0.71    21  0.79  0.93    Liu and So, J. Chem. Inf. Comput. Sci., 2001, 1633-1639
Tetko      MLR                          879   0.86  0.75    412  0.85  0.81    21  0.77  0.99    Tetko et al., J. Chem. Inf. Comput. Sci., 2001, 1488-1493
           ANN 33-4-1                   879   0.94  0.47    412  0.91  0.60    21  0.90  0.64
Huuskonen  MLR                          884   0.89  0.67    413  0.88  0.71    21  0.83  0.88    Huuskonen, J. Chem. Inf. Comput. Sci., 2000, 773-777
           ANN 30-12-1                  884   0.94  0.47    413  0.88  0.60    21  0.91  0.63
Wegner     ANN 9-15-1                   1016  0.94  0.52    253  0.93  0.54    21  0.82  0.79    Wegner and Zell, J. Chem. Inf. Comput. Sci., 2003, 1077-1084
Norinder   PLS                          800   0.87  0.69    497  0.93  0.58    21  0.80  0.82    Unpublished work
Norinder   RDS/ensemble                 800   0.97  0.35    497  0.95  0.51    21  0.87  0.67    Unpublished work
Model      Type                         n     Accuracy (%)  n    Accuracy (%)  n   Accuracy (%)  References
Norinder   RDS/classification           800   82.10         497  80.30         21  0.83          Unpublished work
Norinder   RDS/classification/ensemble  800   98.00         497  86.90         21  0.91          Unpublished work
n - number of compounds, R2 - squared correlation
coefficient, s - standard deviation, accuracy - % correctly
classified compounds into the three classes: good,
medium, poor.
Fig. 16-5 Model of the Huuskonen aqueous solubility dataset using PLS [34].
Triangles - training set, circles - test set. The plot shows the "deceptively"
good performance of the developed model with respect to usage for predicting
aqueous solubility for new potential drug compounds (see also Figure 16-6).
Horizontal axis: Experimental log(S).
16.4.3
In silico Models of Permeability and FA
16.4.3.1 Descriptors Used for Permeability Predictions

Response parameters when studying permeability related absorption can
be the permeability through a cell monolayer, such as Caco-2, MDCK,
and 2/4/A1; the effective permeability in the intestine; or the FA of the
dose. Permeability models predicting intestinal absorption are generally
models of transcellular passive diffusion, and descriptors of hydrophobicity,
hydrophilicity, and size have proven important (see Table 16-3). Hydrophobic
descriptors can be regarded as measures of distribution capacity into the
membrane, hydrophilic descriptors as desolvation restriction when the
compound partitions from the intestinal aqueous fluid into the hydrophobic
membrane, and size reflects the steric hindrance to diffusion through the
membrane [35]. The log Poct descriptor has been used historically to predict
membrane permeability and hence, it is incorporated into a large number
of the models developed. For noncomplex datasets, properties such as
log Poct, polar surface area (PSA), and hydrogen bond counts have each been
used as a single predictor of permeability [36-39]. However, lipophilicity
can be regarded as a composite property that is largely dependent on
the size and hydrophilicity of the compound, and thus, the use of these
two components might be regarded as more sound than log Poct. Indeed,
the use of molecular weight and number of hydrogen bonds has been
shown to predict the permeability of a smaller dataset better than did
log Poct [40].
Fig. 16-6 Close-up of the area of aqueous solubility interest from drug
development perspective [34]. Triangles - training set, circles - test set.
The graph shows the “true” or limited performance of the developed solubility
model with respect to predictive capability for new compounds.
Horizontal axis: Experimental log(S).
The introduction of more complex datasets for model development
has pointed to the need for several descriptors and multivariate data analysis
(Table 16-3). For instance, combinations of PSA and nonpolar surface area
(NPSA) predicted the permeability of a series of peptides when PSA
alone failed [41]. Moreover, the introduction of larger and more flexible
structures showed that the partitioned total surface areas (PTSAs),
that is, the surface area of the molecule occupied by a specific atom, and/or
descriptors related to the flexibility of the molecule are important in the
permeability predictions [42, 43].
Electrotopological indices have been used to predict permeability
computationally (Table 16-3). The electrotopological descriptors are not always
easily comprehended, even though they can be interpreted as describing
hydrophobicity, hydrophilicity, and size. Other typical 2D-generated descriptors
are related to dispersion forces, polarizability, solute molar volume, and
hydrogen bonding acidity and basicity [44-47]. Descriptors such as
log Poct/log Doct, polarizability, polarity, strength of Lewis base and acid,
and number and strength of hydrogen bond donors/acceptors, obtained from
quantum mechanics, have also been correlated to permeability [42, 48, 49].
These descriptors did show high accuracy in the prediction, even though less
complex and more rapidly calculated descriptors were almost as accurate. Thus,
since quantum mechanical descriptors do not outperform the fragment-based
descriptors with respect to accuracy, they will not be usable in the drug
discovery setting until such calculations become faster.

Table 16-3 Quantitative in silico models based on Caco-2 permeability values or human fraction absorbed (FA) data

Response | Type of descriptors | Statistical method | R2 | Ntr | Nte | Reference
Caco-2 Papp | Number of hydrogen bonds | LR | 0.94 | 10 | 0 | Conradi et al., Pharm. Res., 1992, 435-439
Caco-2 Papp | PWASA | LR | 0.98 | 11 | 0 | Hjorth Krarup et al., Pharm. Res., 1998, 972-978
Caco-2 Papp | PSA | SR | 0.96 | 9 | 0 | Ertl et al., J. Med. Chem., 2000, 3714-3717
Caco-2 Papp | Molecular surface areas | MLR | 0.96 | 19 | 0 | Stenberg et al., Pharm. Res., 1999, 205-212
Caco-2 Papp | Solute and solvation related | MLR | 0.86 | 30 | 8 | Kulkarni et al., J. Chem. Inf. Comput. Sci., 2002, 331-342
Caco-2 Papp | PSA, lipophilicity, size, and flexibility | MLR | 0.71 | 77 | 23 | Hou et al., J. Chem. Inf. Comput. Sci., 2004, 1585-1600
Caco-2 Papp | Hydrogen bond capacity, lipophilicity, and size | MLR | 0.71 | 33 | 12 | Marrero et al., J. Pharm. Pharm. Sci., 2004, 186-199
Caco-2 Papp | Hydrogen bond strength and electrostatics | PLS | 0.85 | 9 | 8 | Norinder et al., Pharm. Res., 1997, 1786-1791
Caco-2 Papp | Hydrogen bond capacity, lipophilicity, size, and flexibility | PLS | 0.80 | 16 | 0 | Oprea and Gottfries, J. Mol. Graph Model, 1999, 261-274
Caco-2 Papp | Hydrogen bond capacity and lipophilicity | PLS | 0.92 | 11 | 0 | Osterberg and Norinder, J. Chem. Inf. Comput. Sci., 2000, 1408-1411
Caco-2 Papp | Size, surface tension, and dielectric constant | PLS | 0.90 | 16 | 0 | Osterberg and Norinder, Eur. J. Pharm. Sci., 2001, 327-337
Caco-2 Papp | Electrotopological indices | PLS | 0.71 | 17 | 10 | Stenberg et al., J. Med. Chem., 2001, 1927-1937
Caco-2 Papp | Hydrogen bond strength and electrostatics | PLS | 0.87 | 17 | 10 | Stenberg et al., J. Med. Chem., 2001, 1927-1937
Caco-2 Papp | Surface areas | PLS | 0.93 | 17 | 10 | Stenberg et al., J. Med. Chem., 2001, 1927-1937
Caco-2 Papp | Electrotopological indices | PLS | 0.91 | 9 | 8 | Norinder and Osterberg, J. Pharm. Sci., 2001, 1076-1085
Caco-2 Papp | Surface areas | PLS | 0.93 | 13 | 10 | Bergstrom et al., J. Med. Chem., 2003, 558-570
Caco-2 Papp | Hydrogen bond capacity, PSA, and charge | PLS | 0.83 | 20 | 10 | Matsson et al., J. Med. Chem., 2005, 604-613
Caco-2 Papp | Hydrogen bond capacity, charge, polarizability, and dipole moment | NN | 0.62 | 87 | 0 | Fujiwara et al., Int. J. Pharm., 2002, 95-105
Caco-2 Pc | PSA | SR | 0.91 | 9 | 0 | Palm et al., J. Med. Chem., 1998, 5382-5392
Caco-2 active trp (peptides) | Size, electrostatics, and flexibility | PLS | 0.75 | 20 | 0 | Wanchana et al., J. Pharm. Sci., 2004, 3057-3065
Caco-2 active trp (peptides) | Electrotopological indices | PLS | 0.92 | 20 | 0 | Wanchana et al., J. Pharm. Sci., 2004, 3057-3065
FA | PSA | SR | 0.94 | 20 | 0 | Palm et al., Pharm. Res., 1997, 568-571
FA | PSA | SR | 0.91 | 20 | 0 | Ertl et al., J. Med. Chem., 2000, 3714-3717
FA | Structural fragments | MLR | 0.79 | 417 | 50 | Klopman et al., Eur. J. Med. Chem., 2002, 253-263
FA | Hydrogen bond capacity, lipophilicity, size, and flexibility | PLS | 0.50 | 85 | 0 | Oprea and Gottfries, J. Mol. Graph Model, 1999, 261-274
FA | Hydrogen bond capacity and lipophilicity | PLS | 0.93 | 74 | 0 | Osterberg and Norinder, J. Chem. Inf. Comput. Sci., 2000, 1408-1411
FA | Electrotopological indices | PLS | 0.83 | 13 | 7 | Norinder and Osterberg, J. Pharm. Sci., 2001, 1076-1085
FA | Hydrogen bond capacity, size, and flexibility | NN | 0.87 | 76 | 10 | Wessel et al., J. Chem. Inf. Comput. Sci., 1998, 726-735
FA | Hydrogen bond capacity, flexibility, and hydrophobicity | NN | 0.86 | 76 | 10 | Niwa, J. Chem. Inf. Comput. Sci., 2003, 113-119

Compilation of descriptors, size of datasets, statistical
models used, and accuracy of published in silico absorption
models. Several classification models can be found in the
literature, which are regarded as qualitative models and
therefore not reported. Caco-2 and FA data were selected for
the compilation, since these are the main responses used in the
development of computational models. However, other
responses such as permeability in 2/4/A1 cell monolayers,
artificial membranes, and the MDCK cell line have also been
used as responses in the computational model development.
The following abbreviations are used: R2 - coefficient of
determination, Ntr and Nte - number of compounds in training
set and test set, Papp - apparent permeability, Pc - cellular
permeability, active trp - active transport, PWASA - polar
water accessible surface area, PSA - polar surface area,
LR - linear regression, SR - sigmoidal regression,
MLR - multiple linear regression, PLS - partial least squares
projection to latent structures, NN - neural network.
16.4.3.2 Factors Influencing the Accuracy of Computational Permeability Models
Most published models are based on experimentally determined permeability
data in Caco-2 cell monolayers. However, models based on FA (human
intestinal absorption) have also been developed. The descriptors used in these
models are of the same type as found in the cell-based models. However, the
FA response generally shows large variability, depending on the
methodology used to determine the FA in humans and the interindividual
variability (see Section 16.4.1.3), and this variability largely influences
the accuracy of the obtained model. Even for datasets where the compounds have
been selected carefully to utilize only passive diffusion to permeate the
intestinal cell membrane [50], it has later become evident that some of the
included compounds also have an active component in their transport
mechanism. The quality of the response parameters can also vary for
the datasets used in permeability models based on cell lines. Permeability
values obtained for the same compound using the same cell line in different
laboratories will differ in their absolute numbers owing to effects of cell
culture protocols and experimental procedures during the measurements. Hence,
the dataset used for training and evaluation should be determined within the
same laboratory using the same experimental protocol. However, classification
models might be based on compiled data, since measurements in the different
laboratories in general will result in the same ranking of compounds, that
is, the compounds will be correctly sorted as poor, intermediate, or high
permeability compounds even though the absolute numbers may differ largely
between the laboratories.
Other important factors influencing the accuracy and applicability of the
model are the chemical diversity of the training set used in the model
development, the statistical tools used in the development, and the transport
mechanisms included in the response parameter. These will influence the
models as follows: to be generally applicable and to have high accuracy in
the prediction of drug permeability, the training set used should cover a large
volume of the druglike space. If a model applicable for a specific therapeutic
class is warranted, the training set should be focused on this region of the
druglike space. In any of these scenarios, the most important fact to bear in
mind is that the training set should be representative of the type of compounds
that are to be predicted, that is, if a model is to predict the permeability
of drugs then druglike molecules must be used in the model development.
Regarding the statistical tool used, it is important to select a statistical and
mathematical tool that is sound. Hence, the data has to be preanalyzed
so that linear versus nonlinear methods are correctly selected. Finally, it is
difficult to obtain transparent and interpretable models if all different kinds
of transport routes are included in the measured permeability value. Ideally,
separate models are developed for passive transcellular diffusion, passive
paracellular diffusion, and for each of the transport proteins that can be
utilized. After the establishment of these models, pharmaceutical informatics
tools can be used to extract the information about the apparent permeability
through the intestinal wall.
When plotting the permeability versus FA, different cell models will result
in largely different slopes and ranges of the respective permeability curve.
The cell models have, in common, relatively steep slopes, as exemplified in
Fig. 16-7. The presented curves are obtained from permeability measurements
using Caco-2 and 2/4/A1 cell lines in our laboratory. The 2/4/A1 cell line
has the steepest slope and highest apparent permeability values of the two
cell lines, and is in good agreement with the values obtained in human
perfusion studies [25]. Because of the steep slopes of these model systems,
in silico models based on these data are good at discriminating high
permeability from low permeability. However, a small difference between the
predicted and the experimental permeability in the region of the slope
may shift the compound from being predicted as intermediately permeable to
being either highly or poorly permeable. Hence, the predictions in the midrange
of the permeability values are much more difficult to interpret and draw
conclusions from regarding further development.

Fig. 16-7 Permeability versus human fraction absorbed (FA). The range and the
slope of the apparent permeability values obtained from different cell models
used for in vitro studies of absorption differ largely, as exemplified with
Caco-2 cell permeability values (full line) and 2/4/A1 cell permeability
(dashed line). Drawn after Matsson et al., J. Med. Chem., 2005, 48, 604-613.
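
The sensitivity of midrange predictions can be illustrated with a sigmoidal
relation between FA and the logarithm of the apparent permeability. The sketch
below is illustrative only: the function names, the midpoint (log Papp = -6.0),
and the slope are hypothetical values chosen for the example, not parameters
fitted to the curves in Fig. 16-7, and the class limits of 20% and 80% FA are
borrowed from the six-class scheme in Section 16.4.4.

```python
def fraction_absorbed(log_papp, log_papp50=-6.0, slope=2.0):
    """Hypothetical sigmoid: FA (%) as a function of log10 apparent permeability.

    log_papp50 (midpoint) and slope are assumed values, not fitted parameters.
    """
    return 100.0 / (1.0 + 10.0 ** (-slope * (log_papp - log_papp50)))


def permeability_class(fa_percent):
    """Sort a compound as low, intermediate, or high from its FA (%)."""
    if fa_percent < 20.0:
        return "low"
    if fa_percent > 80.0:
        return "high"
    return "intermediate"


# Apply the same prediction error (0.4 log units) in the steep midrange
# and on the upper plateau of the curve.
for experimental in (-6.0, -4.5):
    predicted = experimental + 0.4
    fa_exp = fraction_absorbed(experimental)
    fa_pred = fraction_absorbed(predicted)
    print(f"log Papp {experimental:+.1f}: FA {fa_exp:5.1f}% ({permeability_class(fa_exp)})"
          f" -> predicted FA {fa_pred:5.1f}% ({permeability_class(fa_pred)})")
```

With these assumed parameters, a 0.4 log unit overprediction moves a compound
at the midpoint from intermediate (50% FA) to high (about 86% FA), whereas the
same error on the plateau leaves the class unchanged.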
16.4.4
A Computer-based Biopharmaceutical Classification System
The biopharmaceutics classification system (BCS) is one way of getting
information on drug absorption [51]. According to the BCS, compounds can
be sorted into four classes depending on their solubility and permeability:
class I compounds with high solubility and high permeability; class II
compounds with poor solubility and high permeability; class III compounds
with high solubility and poor permeability; and class IV compounds with
poor solubility and poor permeability. High solubility is defined as the
maximum dose given orally being soluble in 250 mL fluid within the pH interval
of 1-7.5; otherwise the compound is of low solubility. High permeability is
defined as ≥90% absorbed; otherwise it is low [9]. If a compound is sorted as a
class I compound, no further clinical studies need to be performed after
minor changes in the formulation. Various cut-off values for the BCS have
previously been applied as qualitative screening tools for drug absorption in
drug discovery and development [9, 52, 53]. Recently, a semiexperimental study
using literature solubility data in combination with FA data predicted from the
calculated log Poct correctly sorted 65% of a series of 29 compounds [54].
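
The four-class sorting described above can be written down as a short decision
rule. The following is a minimal sketch, not a regulatory implementation: the
function names are hypothetical, and it assumes that "soluble within the pH
interval of 1-7.5" is checked against the lowest solubility measured anywhere
in that interval.

```python
def is_highly_soluble(max_dose_mg, min_solubility_mg_per_ml, volume_ml=250.0):
    """True if the maximum oral dose dissolves in 250 mL of fluid.

    min_solubility_mg_per_ml is assumed to be the lowest solubility
    observed over the pH interval 1-7.5.
    """
    return max_dose_mg <= min_solubility_mg_per_ml * volume_ml


def bcs_class(max_dose_mg, min_solubility_mg_per_ml, fa_percent):
    """Sort a compound into BCS class I-IV from dose, solubility, and FA (%)."""
    soluble = is_highly_soluble(max_dose_mg, min_solubility_mg_per_ml)
    permeable = fa_percent >= 90.0  # high permeability: at least 90% absorbed
    if soluble and permeable:
        return "I"
    if permeable:
        return "II"
    if soluble:
        return "III"
    return "IV"


# Invented example compounds: (maximum dose in mg, solubility in mg/mL, FA in %)
for dose, solubility, fa in [(100, 1.0, 95), (500, 0.01, 95),
                             (100, 1.0, 50), (500, 0.01, 50)]:
    print(dose, solubility, fa, "-> class", bcs_class(dose, solubility, fa))
```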
If a computer-based BCS with high accuracy in the prediction of the
absorption characteristics were to be devised, it would be possible to sort
compounds absorption-wise, prior to synthesis. Such virtual tools applied
in early drug discovery would result in fewer CDs with formulation
problems.
In a recent study we used a BCS with six classes, where the solubility
was classified as either "low" or "high" in accordance with the cut-off
values set by the FDA and the permeability was classified as "low"
(FA < 20%), "intermediate" (20% < FA < 80%), or "high" (FA > 80%) [55].
This classification was chosen because we believe it provides a better tool
for absorption ranking of compounds in drug discovery than the stricter
permeability classification provided by the FDA. Experimental determinations
of the Caco-2 permeability and intrinsic solubility were performed in-house,
and PLS in silico models based on PTSAs were derived. In comparison to the
experimentally determined data, the combination of the two in silico models
resulted in 87% of the compounds being sorted into the correct class. The
compounds included in a reference test set given by the FDA were correctly
sorted with an accuracy of 77%. To summarize, these results indicate that
more sophisticated in silico models combining computational analysis of the
solubility and permeability can successfully estimate the absorption process
both qualitatively and quantitatively [55].
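
How the outputs of a solubility model and a permeability model combine into
one of six classes, and how the fraction of correctly sorted compounds is
obtained, can be sketched as follows. The function names and the four example
compounds are invented for illustration and do not reproduce the data or the
87% figure of [55]; the treatment of FA values of exactly 20% or 80% as
"intermediate" is also an assumption, since the text leaves the boundaries
open.

```python
def permeability_class(fa_percent):
    """Three permeability classes; exact boundary handling is an assumption."""
    if fa_percent < 20.0:
        return "low"
    if fa_percent > 80.0:
        return "high"
    return "intermediate"


def six_class_bcs(solubility_is_high, fa_percent):
    """Combine a solubility class and a permeability class into one label."""
    solubility = "high" if solubility_is_high else "low"
    return (solubility, permeability_class(fa_percent))


def fraction_correctly_sorted(predicted, experimental):
    """Share of compounds whose predicted class equals the experimental class."""
    hits = sum(six_class_bcs(*p) == six_class_bcs(*e)
               for p, e in zip(predicted, experimental))
    return hits / len(experimental)


# Invented data: (solubility is high?, FA in %) per compound.
predicted = [(True, 85.0), (True, 45.0), (False, 15.0), (False, 70.0)]
experimental = [(True, 92.0), (True, 30.0), (False, 25.0), (False, 60.0)]
print(fraction_correctly_sorted(predicted, experimental))
```

In this invented example the third compound is predicted as low permeability
but measured as intermediate, so three of the four compounds are sorted
correctly.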
16.4.5
In silico Toxicity Models
Toxicology is a rather different matter compared with the other ADME
disciplines because many different mechanisms may be involved. Thus, the
compounds of the investigated dataset may, although they appear to be rather
similar, be subject to different toxicological mechanisms that, in turn, give
rise to different types of toxicological responses. A large number of papers
have been published over the years with proposed models (relationships) that
relate molecular structure to a toxicological end point of some sort. Three
good literature starting points with respect to the present state of in silico
toxicology statistical modeling are by Green [56], Schultz and coworkers [57],
and Dearden [58]. The first article is an update on the various
software packages that exist for predictive toxicology, for example, DEREK,
OncoLogic, HazardExpert, COMPACT, multi-CASE, and TOPCAT, while the article by
Schultz and coworkers focuses on QSARs in toxicology. Toxicological end
points that are referred to in this investigation are aquatic toxicity,
receptor-mediated toxicity, mutagenicity and carcinogenicity, skin
sensitization, and skin and eye irritation, and they are acute. The article by
Dearden deals with software but also has references to some specific
toxicological QSAR investigations related to end points such as cytotoxicity,
drug resistance, and skin permeability. A study with a historical perspective
on the development of QSARs in toxicology published by Schultz and coworkers
makes useful reading [12]. Within the area of modeling QSARs, including
pharmacophore approaches, several articles have appeared in recent years. A
QSAR-related article on cytochromes P450 has been published by Lewis [59].
Relationships that relate binding affinities to various binding site
interactions, such as hydrogen bonding and π-π stacking, and also to parameters
related to hydrophobicity, namely, log P and log D, have been developed. An
extensive review article related to QSARs of cytochrome P450s has recently been
published by Hansch and coworkers [60]. A large number of P450 end points
and datasets for which QSARs have been investigated are presented in this
review article. A slight drawback of many of the P450 datasets in this review
is that they are relatively small in size. Typically, many P450 datasets contain
between 7 and 15 compounds and the largest investigated dataset contains
only 28 members. Although useful for elucidating important properties and
possibly rendering some mechanistic insight in fortunate cases, the resulting
statistical models are rather local in character with a small applicability
domain. The practical use of these models for predicting the behavior of
new, virtual, sets of compounds may therefore be of limited value. Lately,
additional considerations with respect to avoiding interactions with hERG
have entered into the drug development scenario. Avoiding interactions with
hERG has become a top priority for many drug companies because of the increased
attention given to this issue by the Food and Drug Administration (FDA)
and regulatory agencies in other countries, owing to the severe consequences
associated with hERG interaction, such as Q-T interval prolongation. Only a
few studies on hERG SARs have been published so far, and much
work is currently being conducted to identify properties and/or structural
entities that cause hERG channel inhibition. One structure-based model of
hERG inhibition based on the KcsA crystal structure has been published,
while the other models are ligand based, using 3D QSAR techniques such
as CoMFA, CoMSIA, and Catalyst. Recently, 2D QSAR descriptions using
both more traditional variables and holograms have been used to derive
models for hERG inhibition. Again, the publicly available training sets for
developing in silico models for hERG are rather limited in size, which restricts
the predictive ability of these models when estimating inhibition by new
compounds.
An interesting article published by Stouch and coworkers [61] addresses
some cases where ADME/Tox models fail and the reasons for these failures.
In some cases, the failure is related to the intended use of the in silico
model and the expectations of the users of the model. In other cases, failures
are related to developmental aspects of the model, such as choice of statistical
tool, description of the investigated structures, as well as limited model
validation. Feng and coworkers [62] have benchmarked different sets of
descriptors, for example, constitutional descriptors (CONS), topological
information indices (TI), BCUT parameters, as well as some fragment
(fingerprint) descriptors (FRAG), and statistical methods, for example,
recursive partitioning (RP), ANNs, and PLS, on four different datasets with
different toxicological end points. They found that three combinations, BCUT
and RP, FRAG and PLS, as well as FRAG and RP, worked better than expected,
while two combinations, BCUT and NN together with TI and RP, worked somewhat
worse than expected. The fact that fragment (fingerprint) descriptors seem to
work well is not too surprising since the concept of toxicophores has been used
for quite some time in explaining the toxicological behavior of compounds. At
the same time, however, the authors of the article also state that for large
datasets there is a clear need for the development of new descriptors and/or
statistical methods.
16.5
Future Development and Conclusions
To improve solubility, permeability, and toxicity predictions further, a number
of actions are needed. Firstly, as mentioned above, focus should be set on
the datasets used for the training of the in silico models. The compounds
included in the model development and validation need to be representative
of the application of the model. Hence, if a general in silico model is to be
developed, a large dataset (i.e., hundreds of compounds) with a chemical
diversity covering the volume of the druglike space should be used. On the
other hand, if a model applicable for the prediction of a specific subset
is warranted, focus should be set on this region of the druglike space
to improve the accuracy of the model. Secondly, the experimental setting
needs to be standardized and the experimental values used in the model
development should be consistently determined using one type of assay.
Only high quality data should be incorporated to minimize the effect of
noise on the model. Thirdly, the models should be as simple as possible.
In our opinion, it is therefore better, for permeability, to develop several
mechanism-based models revealing, for example, the extent of the passive
transcellular and/or paracellular transport, and possible binding to important
transport proteins. Finally, to extract information from such different models
and transfer the computational predictions to approximations of the in
vivo behavior, new data-mining tools need to be devised (Fig. 16-8). The
need for data-mining tools devised for pharmaceutical informatics can be
exemplified by the absorption process per se. The extent to which a compound
is absorbed will be dependent on its dissolution rate, stability (chemical and
enzymatic), solubility, and permeability (passive transcellular component,
passive paracellular component, active influx, and active efflux). For each
component in the ADMET screen, the same scenario is valid, that is, a large
number of in silico models need to be devised to predict each of the ADMET
components. Hence, one of the future challenges will be the development of
user-friendly, transparent, and fast data-mining tools, allowing pharmaceutical
informatics to be performed early in drug discovery. If such computational
tools are devised and highly accurate in silico models of ADMET properties
applicable to the druglike space are developed, then the prerequisites for a
successful virtual drug discovery setting are present.

Fig. 16-8 (a) To improve the drug discovery setting, the development of
informatics tools suitable for virtual pharmaceutical screening is highly
desirable. Such tools must have the ability to extract important information
related to each of the main areas investigated during the drug discovery and
early development process, that is, pharmacological effect and ADMET
properties. (b) Each of these groups is further divided into a large number of
subgroups, as exemplified by absorption. These subgroups may cooperate,
counteract, or be independent of each other. Furthermore, both qualitative and
quantitative information are compiled in these screenings, further stressing
the importance of development of specific software for this application.
Acknowledgments
Christel Bergstrom acknowledges financial support from the Knut and Alice
Wallenberg foundation and the Swedish Fund for Research without Animal
Experiments.
Glossary
Multiple Linear Regression (MLR)

The relationship between the independent input variables $x_i$ and the dependent
variable $y$ is described in the equation:

$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \varepsilon \qquad (5)$$

The error parameter $\varepsilon$ is the residual. The parameters $a_i$ are
adjusted so that the sum of the squared errors ($\sum \varepsilon^2$) for all
the investigated objects (compounds) is minimized.
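
The least-squares adjustment of the parameters can be sketched in a few lines.
The example below is an illustration under stated assumptions rather than the
procedure used in any of the cited studies: it solves the normal equations by
Gauss-Jordan elimination, assumes that the descriptor columns are linearly
independent, and uses invented descriptor values; the function names are
hypothetical.

```python
def fit_mlr(X, y):
    """Return [a0, a1, ..., an] minimising the sum of squared residuals.

    Solves the normal equations (A'A) a = A'y, where A is X with a leading
    column of ones for the intercept a0. Assumes A has full column rank.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [A'A | A'y].
    M = []
    for i in range(p):
        line = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        line.append(sum(r[i] * yi for r, yi in zip(rows, y)))
        M.append(line)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def sum_squared_errors(coefficients, X, y):
    """Sum of squared residuals for a fitted coefficient vector."""
    total = 0.0
    for row, yi in zip(X, y):
        prediction = coefficients[0] + sum(a * x for a, x in zip(coefficients[1:], row))
        total += (yi - prediction) ** 2
    return total


# Invented data generated from y = 1 + 2*x1 - 0.5*x2 without noise.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2.0, 4.5, 5.0, 7.5, 8.5]
coefficients = fit_mlr(X, y)
print(coefficients, sum_squared_errors(coefficients, X, y))
```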
Partial Least Squares (PLS)

PLS re-expresses the original matrix of data ($X$) for the investigated objects
(compounds) as the product of a score matrix $T$ and a loading matrix $P'$. The
scores, where each investigated object (compound) has a computed set of score
values, give the best summary of $X$ and can be seen as the underlying factors
of the studied system. Similarly, the dependent variable $Y$ is decomposed into
$U$ and $C'$.

$$U = B \times T$$

The PLS algorithm then minimizes $F$, the residual of the $Y$ decomposition,
while preserving the correlation between $X$ and $Y$ through the equation
$U = B \times T$.
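
For a single dependent variable, the decomposition can be illustrated with the
NIPALS form of the algorithm (often called PLS1). This variant is swapped in
here as a compact sketch and is not necessarily the implementation used in the
studies cited in this chapter; the function names and the data are invented.
Each component contributes a weight vector w, scores t, an X loading p, and a
scalar y loading c, and the residuals of X and y are deflated after every
component.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by NIPALS; returns the model as a tuple."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [yi - y_mean for yi in y]                              # y residual
    W, P, C = [], [], []
    for _ in range(n_components):
        # Weights: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        c = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate both residuals before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        C.append(c)
    return x_mean, y_mean, W, P, C


def pls1_predict(model, x):
    """Predict y for one object by repeating the deflation steps on x."""
    x_mean, y_mean, W, P, C = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, c in zip(W, P, C):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += c * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Invented data generated from y = 1 + 2*x1 - 0.5*x2 without noise.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2.0, 4.5, 5.0, 7.5, 8.5]
model = pls1_fit(X, y, n_components=2)
print(pls1_predict(model, [6, 2]))
```

With as many components as there are independent descriptors, the PLS
prediction coincides with that of ordinary least squares; the practical value
of PLS lies in using fewer components when descriptors are many and correlated.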
Neural Networks (NN)

NN systems are inspired by the manner in which biological nervous systems,
for example, the brain, handle information. A typical NN is constructed from
a number of "input nodes" (the X variables), a "hidden layer" of nodes, and
an "output node" (the dependent Y variable).
The basic idea of the network is to adjust the weights ($w_i$) of each connection
so that, as was the case for MLR, the sum of the squared errors ($\sum \varepsilon^2$)
between experimental and predicted output for all the investigated objects
(compounds) is minimized.
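
The weight adjustment can be sketched with a very small network trained by
gradient descent. The architecture (one hidden layer of tanh nodes and a linear
output node), the learning rate, the number of epochs, and the data are all
assumptions made for this illustration; the networks in the cited studies may
differ in every one of these respects.

```python
import math
import random


def forward(x, w_hidden, w_out):
    """Return (hidden activations, output) for one object; 1.0 is the bias input."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x + [1.0])))
              for ws in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden + [1.0]))
    return hidden, output


def sum_squared_error(data, w_hidden, w_out):
    """Sum of squared differences between experimental and predicted output."""
    return sum((y - forward(x, w_hidden, w_out)[1]) ** 2 for x, y in data)


def train(data, n_hidden=3, rate=0.05, epochs=2000, seed=0):
    """Adjust all weights by per-object gradient descent on the squared error."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(x, w_hidden, w_out)
            error = output - y
            # Hidden-layer weights first, using the not yet updated output weights.
            for k, ws in enumerate(w_hidden):
                delta = error * w_out[k] * (1.0 - hidden[k] ** 2)
                for i, xi in enumerate(x + [1.0]):
                    ws[i] -= rate * delta * xi
            for k, h in enumerate(hidden + [1.0]):
                w_out[k] -= rate * error * h
    return w_hidden, w_out


# Invented data: two input nodes, one output node, y = x1 + x2.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)]
untrained = train(data, epochs=0)
trained = train(data)
print(sum_squared_error(data, *untrained), sum_squared_error(data, *trained))
```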
Huuskonen Dataset
The Huuskonen dataset [31] consists of 1297 compounds compiled from
the AQUASOL dATAbASE of the University of Arizona (Yalkowsky, S. H.;
Dannenfelser, R. M. The ARIZONA dATAbASE of Aqueous Solubility;
College of Pharmacy, University of Arizona:
Tucson, AZ, 1990) and SRC's PHYSPROP Database (Syracuse Research
Corporation. Physical/Chemical Property Database (PHYSPROP); SRC
Environmental Science Center: Syracuse, NY, 1994). The experimental
aqueous solubility values for the investigated compounds were measured
between 20 and 25°C. The log S values of the dataset range from -11.62
to +1.58.
BCUT Descriptors
The BCUT descriptors are the lowest and highest eigenvalues of a connectivity
matrix of a molecule in which the diagonal elements for each atom are assigned
properties such as atomic charges, atomic polarizability, or atomic hydrogen
bond parameters, respectively.
Recursive Partitioning (RP)

RP is a method that in a repetitive (recursive) manner selects variables that
separate and enrich different classes, for example, active and inactive or toxic
and nontoxic, of compounds to achieve a good discrimination between the
classes, thus creating sets of rules to attain that objective.
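
A minimal sketch of the recursive selection of variables is given below. The
text does not state which splitting criterion is used, so the Gini impurity is
assumed here; the function names, the depth limit, and the two-descriptor
dataset are likewise invented for illustration.

```python
def gini(labels):
    """Gini impurity of a set of 0/1 class labels (0.0 means a pure set)."""
    if not labels:
        return 0.0
    share = sum(labels) / len(labels)
    return 2.0 * share * (1.0 - share)


def best_split(rows, labels):
    """Return (variable index, threshold, weighted impurity) of the best split."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [c for row, c in zip(rows, labels) if row[j] <= threshold]
            right = [c for row, c in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


def partition(rows, labels, depth=0, max_depth=2):
    """Recursively split the data; leaves are majority class labels (0 or 1)."""
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no variable can separate the remaining objects
        return majority
    j, threshold, _ = split
    left = [(r, c) for r, c in zip(rows, labels) if r[j] <= threshold]
    right = [(r, c) for r, c in zip(rows, labels) if r[j] > threshold]
    return (j, threshold,
            partition([r for r, _ in left], [c for _, c in left], depth + 1, max_depth),
            partition([r for r, _ in right], [c for _, c in right], depth + 1, max_depth))


def classify(tree, row):
    """Follow the rules of the tree down to a leaf."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree


# Invented data: two descriptors per compound, label 1 = active, 0 = inactive.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [6.0, 2.0], [7.0, 6.0], [8.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
tree = partition(rows, labels)
print(tree, classify(tree, [5.0, 0.0]))
```

In this invented dataset the first descriptor separates the two classes
completely at a threshold of 4.5, so a single rule is sufficient.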
References
1. T. Kennedy, Managing the drug discovery/development interface, Drug Discov. Today 1997, 2, 436-444.
2. D.E. Clark, P.D. Grootenhuis, Progress in computational methods for the prediction of ADMET properties, Curr. Opin. Drug Discov. Devel. 2002, 5, 382-390.
3. S. Modi, Computational approaches to the understanding of ADMET properties and problems, Drug Discov. Today 2003, 8, 621-623.
4. H. van de Waterbeemd, E. Gifford, ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192-204.
5. P. Artursson, J. Karlsson, Correlation between oral drug absorption in humans and apparent drug permeability coefficients in human intestinal epithelial (Caco-2) cells, Biochem. Biophys. Res. Commun. 1991, 175, 880-885.
6. P. Artursson, R.T. Borchardt, Intestinal drug absorption and metabolism in cell cultures: Caco-2 and beyond, Pharm. Res. 1997, 14, 1655-1658.
7. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev. 1997, 23, 3-25.
8. I. Kola, J. Landis, Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 2004, 3, 711-716.
9. C.A. Lipinski, Drug-like properties and the causes of poor solubility and poor permeability, J. Pharmacol. Toxicol. Methods 2000, 44, 235-249.
10. C.A.S. Bergstrom, U. Norinder, K. Luthman, P. Artursson, Molecular descriptors influencing melting point and their role in classification of solid drugs, J. Chem. Inf. Comput. Sci. 2003, 43, 1177-1185.
11. J.R. Pappenheimer, K.Z. Reiss, Contribution of solvent drag through intercellular junctions to absorption of nutrients by the small intestine of the rat, J. Membr. Biol. 1987, 100, 123-136.
12. T.W. Schultz, M.T.D. Cronin, J.D. Walker, A.O. Aptula, Quantitative structure-activity relationships (QSARs) in toxicology: a historical perspective, J. Mol. Struct. (THEOCHEM) 2003, 622, 1-22.
13. M. Kansy, F. Senner, K. Gubernator, Physicochemical high throughput screening: Parallel artificial membrane permeation assay in the description of passive absorption processes, J. Med. Chem. 1998, 41, 1007-1010.
14. T.W. Schultz, M.T.D. Cronin, Pitfalls in QSAR, J. Mol. Struct. (THEOCHEM) 2003, 622, 39-51.
15. A.A. Noyes, W.R. Whitney, The rate of solution of solid substances in their own solutions, J. Am. Chem. Soc. 1897, 19, 930-934.
16. K.A. Hasselbalch, The calculation of the hydrogen number of the blood from the free and bound carbon dioxide of the same and the binding of oxygen by the blood as a function of the hydrogen number, Biochem. Z. 1916, 78, 112-144.
17. T.T. Kararli, Comparison of the gastrointestinal anatomy, physiology, and biochemistry of humans and commonly used laboratory animals, Biopharm. Drug Dispos. 1995, 16, 351-380.
18. J.B. Bogardus, Common ion equilibriums of hydrochloride salts and the Setschenow equation, J. Pharm. Sci. 1982, 71, 588-590.
19. E. Khalil, S. Najjar, A. Sallam, Aqueous solubility of diclofenac diethylamine in the presence of pharmaceutical additives: a comparative study with diclofenac sodium, Drug Dev. Ind. Pharm. 2000, 26, 375-381.
20. T. Arakawa, S.N. Timasheff, Mechanism of protein salting in and salting out by divalent cation salts: balance between hydration and salt binding, Biochemistry 1984, 23, 5912-5923.
21. W.N. Charman, C.J. Porter, S. Mithani, J.B. Dressman, Physicochemical and physiological mechanisms for the effects of food on drug absorption: The role of lipids and pH, J. Pharm. Sci. 1997, 86, 269-282.
22. I.J. Hidalgo, T.J. Raub, R.T. Borchardt, Characterization of the human colon carcinoma cell line (Caco-2) as a model system for intestinal epithelial permeability, Gastroenterology 1989, 96, 736-749.
23. P. Artursson, Epithelial transport of drugs in cell culture. I: A model for studying the passive diffusion of drugs over intestinal absorptive (Caco-2) cells, J. Pharm. Sci. 1990, 79, 476-482.
24. J.D. Irvine, L. Takahashi, K. Lockhart, J. Cheong, J.W. Tolan, H.E. Selick, J.R. Grove, MDCK (Madin-Darby canine kidney) cells: A tool for membrane permeability screening, J. Pharm. Sci. 1999, 88, 28-33.
25. S. Tavelin, V. Milovic, G. Ocklind, S. Olsson, P. Artursson, A conditionally immortalized epithelial cell line for studies of intestinal drug transport, J. Pharmacol. Exp. Ther. 1999, 290, 1212-1221.
26. H. Lennernas, O. Ahrenstedt, R. Hallgren, L. Knutson, M. Ryde, L. Paalzow, Regional jejunal perfusion, a new in vivo approach to study oral drug absorption in man, Pharm. Res. 1992, 9, 1243-1251.
27. Y.H. Zhao, J. Le, M.H. Abraham, A. Hersey, P.J. Eddershaw, C.N. Luscombe, D. Butina, G. Beck, B. Sherborne, I. Cooper, J.A. Platts, Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structure-activity relationship (QSAR) with the Abraham descriptors, J. Pharm. Sci. 2001, 90, 749-784.
28. G. Klopman, L.R. Stefan, R.D. Saiakhov, ADME evaluation: 2. A computer model for the prediction of intestinal absorption in humans, Eur. J. Pharm. Sci. 2002, 17, 253-263.
29. T. Niwa, Using general regression and probabilistic neural networks to predict human intestinal absorption with topological descriptors derived from two-dimensional chemical structures, J. Chem. Inf. Comput. Sci. 2003, 43, 113-119.
30. M.A. Perez, M.B. Sanz, L.R. Torres, R.G. Avalos, M.P. Gonzalez, H.G. Diaz, A topological sub-structural approach for predicting human intestinal absorption of drugs, Eur. J. Med. Chem. 2004, 39, 905-916.
31. J. Huuskonen, Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology, J. Chem. Inf. Comput. Sci. 2000, 40, 773-777.
32. W.L. Jorgensen, E.M. Duffy, Prediction of drug solubility from structure, Adv. Drug Deliv. Rev. 2002, 54, 355-366.
33. D.T. Manallack, B.G. Tehan, E. Gancia, B.D. Hudson, M.G. Ford, D.J. Livingstone, D.C. Whitley, W.R. Pitt, A consensus neural network-based technique for discriminating soluble and poorly soluble compounds, J. Chem. Inf. Comput. Sci. 2003, 43, 674-679.
34. U. Norinder, P. Liden, H. Bostrom, Prediction of aqueous solubility using rule-based systems (RDS, www.compumine.com) and ensemble modelling, unpublished results.
35. S.J. Marrink, H.J.C. Berendsen, Simulation of water transport through a lipid membrane, J. Phys. Chem. 1994, 98, 4155-4168.
36. R.A. Conradi, A.R. Hilgers, N.F. Ho, P.S. Burton, The influence of peptide structure on transport across Caco-2 cells. II. Peptide bond modification which results in improved permeability, Pharm. Res. 1992, 9, 435-439.
37. K. Palm, P. Stenberg, K. Luthman, P. Artursson, Polar molecular surface properties predict the intestinal absorption of drugs in humans, Pharm. Res. 1997, 14, 568-571.
38. L. Hjorth Krarup, I. Thooger Christensen, L. Hovgaard, S. Frokjaer, Predicting drug absorption from molecular surface properties based on molecular dynamics simulations, Pharm. Res. 1998, 15, 972-978.
39. K. Palm, K. Luthman, A.L. Ungell, G. Strandlund, F. Beigi, P. Lundahl, P. Artursson, Evaluation of dynamic polar molecular surface area as predictor of drug absorption: Comparison with other computational and experimental predictors, J. Med. Chem. 1998, 41, 5382-5392.
40. G. Camenisch, J. Alsenz, H. van de Waterbeemd, G. Folkers, Estimation of permeability by passive diffusion through Caco-2 cell monolayers using the drugs' lipophilicity and molecular weight, Eur. J. Pharm. Sci. 1998, 6, 313-319.
41. P. Stenberg, K. Luthman, P. Artursson, Prediction of membrane permeability to peptides from calculated dynamic molecular surface properties, Pharm. Res. 1999, 16, 205-212.
References I1041
42. P. Stenberg, U.Norinder, 50. M.D. Wessel, P.C. Jurs, 1.W. Tolan,
K. Luthman, P. Artursson, S.M. Muskal, Prediction of human
Experimental and computational intestinal absorption of drug
screening models for the prediction of compounds from molecular structure,
intestinal drug absorption, J . Med. /. Chem. I$ Comput. Sci. 1998, 38,
Chem. 2001,44,1927-1937. 726-735.
43. D.F. Veber, S.R. Johnson, H.Y. Cheng, 51. G.L. Amidon, H. Lennernas, V.P.
B.R. Smith, K.W. Ward, K.D. Kopple, Shah, J.R. Crison, A theoretical basis
Molecular properties that influence for a biopharmaceutic drug
the oral bioavailability of drug classification: the correlation of in
candidates, I. Med. Chem. 2002, 45, vitro drug product dissolution and in
2615-2623. vivo bioavailability, Pharm. Res. 1995,
44. M.J. Karnlet, R.M. Doherty, 12,413-420.
v, Fiserova-Bergerova, P,W, Carr, 52. E. Walter, S. Janich, B.J. Roessler, J.M.
M.H. Abraham, R.W. Taft, Solubility Hilfinger, G.L.J.Amidon,
properties in biological media 9 HT29-MTX/Caco-2cocultures as an in
prediction of solubility and part tion of vitro m ~ ~for e the
l intestinal
organic nonelectrolytes in blood and epithelium: in vitro-in vivo correlation
tissues from solvatochrornic with permeability data from rats and
parameters, _I. Pharm. Sci. 1987, 76, humans, Pharm. Sci. 1996, 85,
1070-1076.
14-17.
53. S. Winiwarter, N.M. Bonham, F. Ax,
45. J.A. Gratton, M.H. Abraham, M.W.
A. Hallberg, H. Lennernas, A. Karlen,
Bradbury, H.S. Chadha, Molecular
Correlation of human jejunal
factors influencing drug transfer
permeability (in vivo) of drugs with
across the blood-brain barrier, /.
experimentally and theoretically
Pharm. Pharmacol. 1997,49, derived parameters. A multivariate
1211-1216.
data analysis approach, /. Med. Chem.
46. M.H. Abraham, Y.H. Zhao, J. Le, 1998,41,4939-4949.
A. Hersey, C.N. Luscombe, D.P. 54. N.A. Kasim, M. Whitehouse,
Reynolds, G. Beck, B. Sherborne, C. Ramachandran, M. Bermejo,
I. Cooper, On the mechanism of H. Lennernas, A.S. Hussain, H.E.
human intestinal absorption, Eur. J . Junginger, S.A. Stavchansky, K.K.
Med. Chem. 2002,37,595-605. Midha, V.P. Shah, G.L. Amidon,
47. O.A. Raevsky, S.V. Trepalin, H.P. Molecular properties of WHO
Trepalina, V.A. Gerasimenko, O.E. essential drugs and provisional
Raevskaja, SLIP P ER-2001- Software biopharmaceutical classification, Mol.
for predicting molecular properties on phamacol, 2004, 1, 85-96,
the basis of physicochemical 55. C.A.S. Bergstrom, M. Strafford,
descriptors and Structural Similarity,/. L. Lazorova, A, Avdeef, K. Luthman,
Chem. In$ Comput. Sci. 2002, 42, P. Artursson, Absorption classification
540-549. of oral drugs based on molecular
48. U.Norinder, T. Osterberg, surface properties, /. Med. Chem. 2003,
P. Artursson, Theoretical calculation 46,558-570.
and prediction of Caco-2 cell 56. N. Green, Computer systems for the
permeability using MolSurf prediction of toxicity: an update, Adv.
parametrization and PLS statistics, Drug D e h . Rev. 2002, 54, 417-431.
Pharm. Res. 1997, 14,1786-1791. 57. T.W. Schultz, M.T.D. Cronin, T.I.
49. U.Norinder, T. Osterberg, Netzeva, The present status of QSAR
P. Artursson, Theoretical calculation in toxicology,/. Mol. Struct. (THEO)
and prediction of intestinal absorption 2003, 622, 23-38.
of drugs in humans using MolSurf 58. J.C. Dearden, In silico prediction of
parametrization and PLS statistics, drug toxicity,/. Cornput.-Aided Mol.
Eur. I.Pharm. Sci. 1999,8,49-56. Des. 2003, 17, 119-127.
7G Prediction ofADMET Properties
1042
I 59. D.F.V. Lewis, S. Modi, M. Dickins, 61. T.R. Stouch, J.R. Kenyon, S.R.
Quantitative structure-activity Johnson, X.-Q. Chen, A. Doweyko,
relationships (QSARs)within Y. Li, In silico ADME/Tox: why
substrates of human cytochromes models fail, J . Cornput.-AidedMol. Des.
P450 involved in drug metabolism, 2003, 17,83-92.
Drug Metab. Drug Interact. 2001, 18, 62. J. Feng, L. Lurati, H. Ouyang,
221-242. T. Robinson, Y. Wang, S . Yuan, S. S
60. C. Hansch, S.B. Mekapati, A. Kamp, Young, Predictive toxicology:
R.P. Verma, QSAR of cytochromes benchmarking molecular descriptors
P450, Drug. Metab. Rev. 2004, 36, and statistical methods, J. Chem. Inf:
105- 156. Comput. S C ~2003,43,14G3-1470.
.
PART VII
Systems Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
ISBN: 978-3-527-31150-7
Chemical Biology
17
Computational Methods and Modeling
17.1
Systems Biology of the JAK-STAT Signaling Pathway
Jens Timmer, Markus Kollmann, and Ursula Klingmüller
Outlook
Systems biology is a rapidly growing field of research worldwide. Its central
idea is to apply mathematical modeling to understand
the dynamics of regulatory processes in cell biology. In this chapter we
discuss the necessity of the systems biology approach and illustrate it with an
application to cellular signal transduction.
17.1.1
Introduction
After sequencing the genomes of several organisms, including humans, the
“text of life” is available. The next step is to learn how to “read”
it. This includes understanding and predicting cellular responses
to external stimuli and deciphering the evolutionary design principles of
biochemical networks. Of special medical importance is the understanding
of conditions that promote health or lead to disease. In some cases, single
gene mutations decide between the two states. But it is increasingly
recognized that the function of cellular processes is not determined by a
single gene, but by the regulation of complex cellular networks. Diseases like
cancer result from dysregulation in these networks. Regulation is determined
by the dynamical interaction of the components involved. Biological
function therefore becomes a systems property of dynamic networks. The goal
of systems biology is to elucidate the network-based functions of cellular
processes. Because of the complexity of these processes, intuition-based
reasoning is not sufficient to reach this goal; mathematical, computer-based
approaches are necessary.
17.1.2
History and Development
A systems-based approach to biology dates back to Norbert Wiener (1894-1964)
[1] and Ludwig von Bertalanffy (1901-1972) [2]. These early approaches might
have suffered from oversimplifying assumptions and far-reaching general
claims, but provided groundbreaking examples of how mathematical modeling
and ideas from control theory can contribute to a systems level understanding
of biology.
In the 1970s, two groups independently developed the systems biology
of metabolic systems [3, 4]. Metabolic systems are especially suited for a
mathematical treatment because, in contrast to signaling pathways and gene
regulatory networks, they
are completely determined by the involved enzymes;
usually operate in steady state;
obey conservation laws for their components, the metabolites.
The control theory developed for metabolic systems allows one to infer,
for example, the effects of local changes, such as the properties of an enzyme,
on global properties such as the flux through the system. Furthermore, general
global properties of the systems were captured by summation and connectivity
theorems; see [5] for a comprehensive review.
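The summation theorem for flux control can be checked numerically on a toy pathway. The following sketch is an illustration under stated assumptions and is not taken from the cited work: a two-step chain S <-> X -> P with linear kinetics, hypothetical rate constants, and hypothetical enzyme levels e1 and e2. It estimates the flux control coefficients C_i = (dJ/de_i)(e_i/J) by finite differences; for this chain they should sum to one.

```python
def steady_state_flux(e1, e2, s=1.0, k1=2.0, k_minus1=1.0, k2=3.0):
    """Steady-state flux of the chain S <-> X -> P (all constants are illustrative).

    v1 = e1 * (k1 * s - k_minus1 * x)   reversible first step
    v2 = e2 * k2 * x                    irreversible second step
    Setting v1 = v2 gives the steady-state concentration of X.
    """
    x = e1 * k1 * s / (e1 * k_minus1 + e2 * k2)
    return e2 * k2 * x


def flux_control_coefficient(which, e1=1.0, e2=1.0, rel_step=1e-6):
    """Scaled sensitivity (dJ/de_i) * (e_i / J), estimated by central differences."""
    j0 = steady_state_flux(e1, e2)
    if which == 1:
        up = steady_state_flux(e1 * (1 + rel_step), e2)
        down = steady_state_flux(e1 * (1 - rel_step), e2)
    else:
        up = steady_state_flux(e1, e2 * (1 + rel_step))
        down = steady_state_flux(e1, e2 * (1 - rel_step))
    return (up - down) / (2 * rel_step * j0)


c1 = flux_control_coefficient(1)
c2 = flux_control_coefficient(2)
print(f"C1 = {c1:.4f}, C2 = {c2:.4f}, sum = {c1 + c2:.4f}")
```

Because the steady-state flux of this chain is homogeneous of degree one in the enzyme levels, the sum stays at one whatever rate constants are chosen; only the split between the two steps changes.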
For signaling pathways and gene regulatory networks, the above constraints
do not hold and similar general statements are not available. But for specific
examples, the ideas from metabolic systems have been generalized to signaling
pathways [6] and design principles of signaling pathways and gene regulatory
networks have been discovered [7, 8]. An important topic of recent research
is the robustness of the systems because they have to function in a noisy
environment under fluctuating conditions. These investigations range from
bacterial chemotaxis [9, 10] via components of signaling pathways [11] to
developmental biology [12]; see Ref. 13 for a recent review.
For signaling pathways, recent years have seen an increasing number of
studies of specific pathways where mathematical modeling is applied to infer
systems’ properties from the models. These applications include the
mitogen-activated protein (MAP) kinase pathways [14-16], apoptotic pathways [17-19],
the Wnt/β-catenin pathway [20], and the Janus kinase-signal transducer and activator
of transcription (JAK-STAT) pathway [21]. A regulatory network that has been
studied intensively is the cell cycle [7, 22, 23].
Because of the nascent state of systems biology, only few textbooks are
available [24-26].
17.1.3
Since Newton’s days, Physics and Engineering have been extremely successful
in understanding the inanimate part of nature by applying mathematics and
translating these insights into technological developments. It is foreseeable
that in the twenty-first century an analogous development will take place for
the animate part of nature, including technology based on the insights of the
basic sciences.
Arguments for the helpful contributions of mathematics applied to the life
sciences include:
Make assumptions explicit
Decades of work in biology have produced enormous
amounts of knowledge rendering it difficult to see the forest
for the trees, that is, to judge what the important players and
effects are. A mathematical description necessitates being
explicit about what the assumptions of a model are.
Understand essential properties from failing models
If a mathematical model fails to describe biological data, this
gives the valuable information that the assumptions of the
model missed an essential part.
Condense information, handle complexity
The huge extent of biological knowledge is also an obstacle
since it does not allow for intuition-based reasoning due to its
complexity. Mathematical modeling can help handle the
complexity by condensing it into a model.
Understand the role of dynamical processes, for example, feedback
Dynamic properties like combinations of positive and
negative feedbacks induce system properties that can only be
captured by mathematics, see Ref. 16 for an example, where a
mathematical treatment elucidates why cells react differently
to transient and sustained stimuli.
Impossible experiments become possible
Mathematical models allow for in silico biology. Experiments
that might be impossible biochemically can be conducted
using the computer.
Prediction and control
On the basis of mathematical models, new experiments can
be suggested and their outcome can be predicted. Especially,
the control of networks can be investigated in silico. This
enables identification of targets for medical intervention.
Understand what is known
Pure biological facts can be understood in the context of
dynamic behavior.
Discover general principles
It is expected that nature developed a limited number of
“tricks” and principles independent of specific
implementations to ensure, for example, robustness of the
biological function in a noisy environment. Mathematics will
be helpful in discovering these general design principles.
“You don’t understand it until you can model it”
Being able to mathematically model a biological process
might be the final proof of understanding.
All these arguments apply to biology in general; but due to its network
structure, especially to cell biology in terms of metabolism, signal transduction,
and gene regulation.
Systems biology can be defined as the endeavor to understand biomedical
systems using data-based mathematical modeling of their dynamic behavior.
The final goal is to turn the life sciences from a qualitative, descriptive science
into a quantitative, predictive science. Systems biology relies on other fields of
research but should also be distinguished from them, since systems biology is
more than . . .
. . . Mathematical Biology because systems biology is data
based
Mathematical Biology formulates and investigates
mathematical models inspired by biology but it is de facto a
part of mathematics often not getting back to biology. Systems
biology requires close collaborations between theoreticians
and experimentalists. This ranges from the joint planning of
experiments to the corporate interpretation of the results of
the mathematical models including the formulation of new
hypotheses to be tested in the next cycle between “wet-lab”
and “dry-lab”.
. . . Bioinformatics because systems biology considers the
dynamics
Bioinformatics is an important basis for systems biology in,
for example, identifying the components involved but does
not deal with the dynamic aspects of networks that are
substantial for systems biology.
. . . another ‘omics’ technology because systems biology
involves mathematics
Proteomics, genomics, metabolomics, and other
high-throughput technologies to monitor the state of cells in
certain respects provide important information for systems
biology, but systems biology should not be understood as
“putting the . . . omics together”. It should be noted that the
term systems in systems biology originates from systems
sciences, that is, the mathematical discipline of how to infer
properties from dynamical models.
. . . “one Postdoc - one protein” because systems biology
considers the system
Although “systems” in systems biology stems from systems
sciences the goal is to understand systems in the colloquial
sense. The detailed investigation of the components of the
systems is the indispensable basis to reach this aim.
17.1.4
Practical Example
Considerable progress has been made in identifying the molecular composition
of complex signaling networks. However, as outlined above, to reveal the
systems properties, quantitative models based on experimental observations
have to be developed. In this section, the core module of the JAK-STAT pathway
of the Epo receptor is investigated. On the basis of time-resolved quantitative
measurements of the receptor activity and of unphosphorylated and phosphorylated
STAT-5 in the cytoplasm, the parameters in differential equations describing
the pathway are estimated. The analysis will show that the previously held
assumption of a feed-forward cascade to describe the pathway is not compatible
with the experimental data. A generalization of the model that includes
nucleocytoplasmic cycling is suggested. The final model is validated by
successfully predicting the outcome of a new experiment. From this model,
we infer the time courses of the unobserved STAT-5 populations and show
that, on a systems level, fast nucleocytoplasmic cycling of STAT-5 serves as
a remote sensing system to couple gene activation to receptor activity.
The JAK-STAT pathway of the Epo receptor is essential for proliferation and
differentiation of erythroid progenitor cells.
Binding of Epo to the extracellular part of the receptor leads to activation
by phosphorylation of JAK2 at the cytoplasmic domain of the receptor. In
turn, this leads to receptor recruitment and to phosphorylation of monomeric
STAT-5, a member of the STAT family of transcription factors. The
phosphorylated monomeric STAT-5 forms dimers and these dimers migrate into
the nucleus where they bind to promoter regions of the DNA and initiate
gene transcription. Afterwards, STAT-5 is dephosphorylated and the dimers
dissociate. It was debated whether STATs are degraded in the nucleus [27] or
exported back to the cytoplasm [28]. In any case it was believed that the active
role of STAT-5 ends in the nucleus. Thus, the JAK-STAT signaling pathway
represents a feed-forward cascade. Its graphical representation is given in Fig. 17.1-1.
Fig. 17.1-1 Graphical representation of the JAK-STAT pathway of
the Epo receptor. The dashed line represents a possible export of
STAT-5 from the nucleus back to the cytoplasm that is, however,
not involved in the signaling.
Assuming mass-action kinetics and denoting the amount of activated
Epo receptors by EpoRA(t), unphosphorylated monomeric STAT-5 by x1(t),
phosphorylated monomeric STAT-5 by x2(t), phosphorylated dimeric STAT-5
in the cytoplasm by x3(t), and phosphorylated dimeric STAT-5 in the nucleus
by x4(t), we arrive at the following dynamic model, where the time dependence
is suppressed for the sake of clarity:
\begin{align}
\dot{x}_1 &= -k_1 x_1 \mathrm{EpoR}_A \tag{1} \\
\dot{x}_2 &= +k_1 x_1 \mathrm{EpoR}_A - k_2 x_2^2 \tag{2} \\
\dot{x}_3 &= +0.5\, k_2 x_2^2 - k_3 x_3 \tag{3} \\
\dot{x}_4 &= +k_3 x_3 \tag{4}
\end{align}
These equations describe the gain and loss of the different components.
For example, Eq. (1) states that the unphosphorylated STAT monomer x1
is reduced, expressed by the minus sign, with a certain rate k1 due to the
interaction of the STAT monomer with the activated receptor, described by
x1 EpoRA. Since this interaction leads to the phosphorylated STAT monomer
x2, the same term as in Eq. (1), but with positive sign, appears in Eq. (2). The
second part of Eq. (2) describes the loss of the phosphorylated STAT monomer
x2 by dimerization with rate constant k2. This term appears in Eq. (3) with the
factor of 0.5 since two monomers form one dimer. The second term in Eq. (3)
and the right-hand side of Eq. (4), finally, describe the transport of the dimer
into the nucleus.
The initial values for x2, x3, and x4 are zero; the initial value for x1 is a free
parameter that, in addition to the parameters k1, k2, and k3, has to be estimated
from the data.
These equations have a vivid meaning. For example, Eq. (1) means that the
rate of change of the unphosphorylated monomer is negative and proportional
to the interaction of the monomer with the activated receptor. The rate is
determined by k1.
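A short consistency check on Eqs. (1)-(4), added here as a worked step and not part of the original analysis: if each dimer is counted as two STAT-5 molecules, the gain and loss terms described above cancel pairwise, so the feed-forward model conserves the total amount of STAT-5.

```latex
\begin{align*}
\frac{d}{dt}\left(x_1 + x_2 + 2x_3 + 2x_4\right)
  &= -k_1 x_1 \mathrm{EpoR}_A
     + \left(k_1 x_1 \mathrm{EpoR}_A - k_2 x_2^2\right)
     + 2\left(0.5\, k_2 x_2^2 - k_3 x_3\right)
     + 2 k_3 x_3 \\
  &= 0 .
\end{align*}
```

With the initial values given above, this means x1 + x2 + 2 x3 + 2 x4 = x1(t = 0) at all times, which is a convenient test for any numerical solution of the model.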
By quantitative immunoblotting, the time courses of the phosphorylated
(monomeric, x2, and dimeric, x3) STAT-5 in the cytoplasm, y1(t), the total
amount of STAT-5 in the cytoplasm, y2(t), and the activation of the Epo receptor,
y3(t), were determined. The measured values represent relative units. For a
detailed description of the biochemical techniques to measure the different
components, see Ref. 21.
Altogether, the observation equations read:
\begin{align}
y_1 &= k_5 (x_2 + 2 x_3) \tag{5} \\
y_2 &= k_6 (x_1 + x_2 + 2 x_3) \tag{6} \\
y_3 &= k_7 \mathrm{EpoR}_A \tag{7}
\end{align}
where k5 to k7 have to be included as scaling parameters since only relative
units can be measured. The factor of 2 in Eqs. (5) and (6) reflects the fact that
a dimer produces a signal that is twice as high as that of a monomer. Note that
EpoRA, measured by y3, is not a dynamical variable but an external input. The
observables y1 and y2 will later be used to estimate the parameters.
To first gain insights into the properties of this system, a simulation study
is performed. To this end, all parameters are set to 1, and an artificial
Epo-receptor time course is chosen. The dynamical model is solved numerically
and the observation equations are evaluated. The resulting time courses for the
phosphorylated STAT-5 in the cytoplasm y1 and the total amount of STAT-5
in the cytoplasm y2 are displayed in Fig. 17.1-2.
The qualitative behavior is identical for all parameter settings: The
phosphorylated STAT-5 in the cytoplasm shows a biphasic behavior, the
total amount of STAT-5 in the cytoplasm decreases monotonically. However,
the quantitative behavior depends on the parameters. Thus, if simulated
model predictions are compared to experimental data, it is difficult to decide
whether discrepancies between simulated and measured data result from
inadequate parameters or from an insufficient model. To resolve this simulation
dilemma [29], we will estimate the parameters from the experimental data.
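A simulation study of this kind might be sketched as follows. This is a minimal illustration, not the authors' code: the receptor time course `epo_ra` is an arbitrary assumed shape, the integrator is a hand-written fourth-order Runge-Kutta scheme, and the observables are taken as y1 = k5 (x2 + 2 x3) and y2 = k6 (x1 + x2 + 2 x3), following the description of Eqs. (5) and (6). All parameters are set to 1, as in the text.

```python
import math


def epo_ra(t):
    """Artificial receptor activation (assumed shape): rises, peaks at t = 1, decays."""
    return t * math.exp(1.0 - t)


def rhs(t, x, k):
    """Right-hand side of the feed-forward model, Eqs. (1)-(4)."""
    x1, x2, x3, x4 = x
    k1, k2, k3 = k
    u = epo_ra(t)
    return (
        -k1 * x1 * u,
        k1 * x1 * u - k2 * x2 ** 2,
        0.5 * k2 * x2 ** 2 - k3 * x3,
        k3 * x3,
    )


def rk4_step(f, t, x, h, k):
    """One classical Runge-Kutta step of size h."""
    a = f(t, x, k)
    b = f(t + h / 2, [xi + h / 2 * ai for xi, ai in zip(x, a)], k)
    c = f(t + h / 2, [xi + h / 2 * bi for xi, bi in zip(x, b)], k)
    d = f(t + h, [xi + h * ci for xi, ci in zip(x, c)], k)
    return [xi + h / 6 * (ai + 2 * bi + 2 * ci + di)
            for xi, ai, bi, ci, di in zip(x, a, b, c, d)]


def simulate(k=(1.0, 1.0, 1.0), x1_0=1.0, k5=1.0, k6=1.0, t_end=10.0, h=0.01):
    """Return (times, y1, y2): the observables of Eqs. (5) and (6) on a fixed grid."""
    x = [x1_0, 0.0, 0.0, 0.0]
    ts, y1, y2 = [], [], []
    for i in range(int(round(t_end / h)) + 1):
        t = i * h
        ts.append(t)
        y1.append(k5 * (x[1] + 2 * x[2]))
        y2.append(k6 * (x[0] + x[1] + 2 * x[2]))
        x = rk4_step(rhs, t, x, h, k)
    return ts, y1, y2


ts, y1, y2 = simulate()
peak = max(range(len(y1)), key=y1.__getitem__)
print(f"y1 peaks at t = {ts[peak]:.2f}; y2 goes from {y2[0]:.3f} to {y2[-1]:.3f}")
```

Calling `simulate(k=(10.0, 1.0, 1.0))` and similar variants changes the numbers but not the qualitative picture: y1 rises and falls, and y2 can only decrease because its time derivative is -2 k6 k3 x3.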
Fig. 17.1-2 Results of a simulation study for y1 (phosphorylated
STAT-5 in the cytoplasm, red) and y2 (total STAT-5 in the
cytoplasm, blue). Initially, upper left, all parameters are set to 1;
for the other plots, parameters k1 to k3 are set to 10.
Mathematically, the equations of the system under investigation can be
summarized as:
\begin{align}
\dot{\vec{x}}(t) &= \vec{f}(\vec{x}(t), \vec{k}, u(t)) \tag{8} \\
\vec{y}(t_i) &= \vec{g}(\vec{x}(t_i), \vec{k}) + \vec{\varepsilon}(t_i) \tag{9}
\end{align}
Equation (8) captures the dynamical equations (1-4), the parameters, and
the activation of the Epo receptor as an external input u. Equation (9) describes
how the sampled observables are linked to the dynamical variables and
also includes the observational noise ε(ti) that is always present in experimental data.
Estimation of the parameters is based on minimizing the error function:
\begin{equation}
\chi^2(x_1(t=0), \vec{k}) = \sum_{i} \sum_{j} \frac{\left( y_j^D(t_i) - y_j(t_i; x_1(t=0), \vec{k}) \right)^2}{\sigma_{y_j}^2(t_i)} \tag{10}
\end{equation}
where y^D(ti) denotes the experimental data, y(ti; x1(t = 0), k) denotes the
model predictions depending on the parameters and the initial values, and
σy² denotes the variance of the noise. Numerical techniques are established to
fulfill this task [29, 30].
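The idea of estimating parameters by minimizing a noise-weighted error function can be illustrated on synthetic data. The sketch below is deliberately simplified and rests on several assumptions that are not in the original: the receptor time course is an arbitrary shape, the scaling parameters are set to 1, the noise level `SIGMA` is invented, and only k3 is scanned over a coarse grid, whereas the actual analysis estimates all parameters and the initial value with dedicated numerical optimizers [29, 30].

```python
import math
import random


def observables(k3, k1=1.0, k2=1.0, x1_0=1.0, t_end=10.0, h=0.01, every=100):
    """Integrate Eqs. (1)-(4) with RK4; return [(y1, y2), ...] sampled every `every` steps."""
    def rhs(t, x):
        u = t * math.exp(1.0 - t)  # assumed receptor activation
        x1, x2, x3, x4 = x
        return (-k1 * x1 * u,
                k1 * x1 * u - k2 * x2 ** 2,
                0.5 * k2 * x2 ** 2 - k3 * x3,
                k3 * x3)

    def shifted(x, d, s):
        return [xi + s * di for xi, di in zip(x, d)]

    x, out = [x1_0, 0.0, 0.0, 0.0], []
    for i in range(int(round(t_end / h)) + 1):
        if i % every == 0:
            out.append((x[1] + 2 * x[2], x[0] + x[1] + 2 * x[2]))
        t = i * h
        a = rhs(t, x)
        b = rhs(t + h / 2, shifted(x, a, h / 2))
        c = rhs(t + h / 2, shifted(x, b, h / 2))
        d = rhs(t + h, shifted(x, c, h))
        x = [xi + h / 6 * (ai + 2 * bi + 2 * ci + di)
             for xi, ai, bi, ci, di in zip(x, a, b, c, d)]
    return out


def chi_square(data, model, sigma):
    """Sum of squared, noise-weighted residuals over all times and observables."""
    return sum(((d - m) / sigma) ** 2
               for d_row, m_row in zip(data, model)
               for d, m in zip(d_row, m_row))


random.seed(0)
SIGMA = 0.002    # assumed standard deviation of the observational noise
TRUE_K3 = 1.0    # value used to generate the synthetic "measurements"
data = [tuple(v + random.gauss(0.0, SIGMA) for v in row)
        for row in observables(TRUE_K3)]

grid = [0.5 + 0.1 * i for i in range(11)]   # candidate values 0.5 ... 1.5
scores = [chi_square(data, observables(k3), SIGMA) for k3 in grid]
best_k3 = grid[scores.index(min(scores))]
print(f"k3 with the smallest error on the grid: {best_k3:.1f}")
```

A grid scan is used only because it is easy to follow; it does not scale to several parameters, which is one reason why gradient-based or multiple-shooting methods are preferred in practice.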
Figure 17.1-3 displays time courses of Epo-receptor activation,
phosphorylated STAT-5 in the cytoplasm, and the total amount of STAT-5 in the cytoplasm
for one representative experiment. The receptor displays its maximal activity
8 min after stimulation. In the time series of phosphorylated STAT-5, a plateau
is reproducibly detected between 10 and 30 min.
Fig. 17.1-3 Examples of the measured time series. (a) Activation of the Epo receptor.
(b) Phosphorylated STAT-5 in the cytoplasm. (c) Total amount of STAT-5 in the cytoplasm.
With the feed-forward model, Eqs. (1-4), derived from the graphical
representation in Fig. 17.1-1, the experimental data in Fig. 17.1-3, connected
to the model by Eqs. (5-7), and the numerical techniques to estimate the
parameters, we arrive at the modeling results displayed in Fig. 17.1-4.
For the phosphorylated STAT-5 in the cytoplasm, the model does not
capture the plateau between 10 and 30 min, and the behavior of total STAT-5
in the cytoplasm is completely missed. This calls for a reconsideration of the
biological assumptions that led to Fig. 17.1-1.
Fig. 17.1-4 Fit of the feed-forward model, Eqs. (1-4), to the measured time series of
phosphorylated (a) and total (b) STAT-5 in the cytoplasm.
In an iterative process, different extensions of the model were tested; see [21,
31, 32] for mathematical and statistical details. The result is that the export of
STAT-5 from the nucleus plays an active and essential role in this pathway.
The export of STAT-5 was modeled by a delay term x3^τ = x3(t - τ), describing
the sojourn time τ of STAT-5 in the nucleus. The extended model reads:
\begin{align}
\dot{x}_1 &= 2 k_4 x_3^{\tau} - k_1 x_1 \mathrm{EpoR}_A \tag{11} \\
\dot{x}_2 &= k_1 x_1 \mathrm{EpoR}_A - k_2 x_2^2 \tag{12} \\
\dot{x}_3 &= 0.5\, k_2 x_2^2 - k_3 x_3 \tag{13} \\
\dot{x}_4 &= k_3 x_3 - k_4 x_3^{\tau} \tag{14}
\end{align}
The results of a fit of this model to the data are displayed in Fig. 17.1-5 and
demonstrate a good agreement of the model trajectories with the experimental
data. As a surprising result, the sojourn time τ of STAT-5 in the nucleus
turned out to be approximately 6 min. The fitted trajectory for phosphorylated
STAT-5 shows that the "plateau" between 10 and 30 min is not a plateau, but
results from waves of phosphorylated STAT-5 through the nucleus.
Simulating the model allows investigation of the single populations x1 to x4
of STAT-5. The in silico results are given in Fig. 17.1-6.
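To make the role of the delay concrete, the extended model can be simulated with a simple fixed-step scheme that looks up x3(t - τ) in the stored trajectory. The sketch below is an illustration only: the parameter values, the delay, and the receptor time course are assumptions chosen for readability, not the fitted values, and forward Euler is used in place of a dedicated delay-equation solver.

```python
import math


def simulate_extended(k1=1.0, k2=1.0, k3=1.0, k4=1.0, tau=1.0,
                      x1_0=1.0, t_end=20.0, h=0.001):
    """Forward-Euler integration of the delay model, Eqs. (11)-(14).

    x3(t - tau) is read from the stored trajectory; it is taken as zero for
    t < tau because no dimers exist before stimulation.
    Returns the time grid and a list of states [x1, x2, x3, x4].
    """
    lag = int(round(tau / h))
    x = [x1_0, 0.0, 0.0, 0.0]
    ts, traj = [0.0], [list(x)]
    for i in range(int(round(t_end / h))):
        t = i * h
        u = t * math.exp(1.0 - t)  # assumed receptor activation
        x3_delayed = traj[i - lag][2] if i >= lag else 0.0
        x1, x2, x3, x4 = x
        dx = (2 * k4 * x3_delayed - k1 * x1 * u,
              k1 * x1 * u - k2 * x2 ** 2,
              0.5 * k2 * x2 ** 2 - k3 * x3,
              k3 * x3 - k4 * x3_delayed)
        x = [xi + h * di for xi, di in zip(x, dx)]
        ts.append((i + 1) * h)
        traj.append(list(x))
    return ts, traj


ts, traj = simulate_extended()
totals = [x[0] + x[1] + 2 * x[2] + 2 * x[3] for x in traj]
print(f"total STAT-5 stays within {max(abs(s - 1.0) for s in totals):.1e} of its initial value")
print("final x1..x4:", [round(v, 4) for v in traj[-1]])
```

As in the feed-forward case, the weighted sum x1 + x2 + 2 x3 + 2 x4 is conserved, which gives a quick sanity check on the implementation; the difference is that STAT-5 leaving the nucleus now replenishes x1 and can be phosphorylated again while the receptor is still active.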
Fig. 17.1-5 Fit of the extended model, Eqs. (11-14), including
nucleocytoplasmic cycling, to the measured time series of
phosphorylated (a) and total (b) STAT-5 in the cytoplasm.
Fig. 17.1-6 In silico results. Simulation of the single STAT
components. Blue: unphosphorylated monomer x1, black:
phosphorylated monomer x2, green: phosphorylated dimer in the
cytoplasm x3, red: phosphorylated dimer in the nucleus x4.
It is observed that the unphosphorylated monomer x1 is completely
processed in the first wave of activation. Furthermore, the concentration of the
phosphorylated monomer x2 is low for the whole time because the dimerization
process is fast. This explains, in a natural way, the experimental experience
that the phosphorylated monomer is difficult to measure.
On the basis of the fitted model, a sensitivity analysis is performed. In such
an in silico investigation, the parameters of the model are changed and
the predicted effect on the function of the system is determined. Because we
deal with signal transduction, activation of target genes is the most important
function. For the study, target gene activation is assumed to be proportional to
the shuttling STAT-5 in the nucleus. The results are displayed in Fig. 17.1-7.
Fig. 17.1-7 In silico results. Sensitivity analysis. Predicted influence of the single
parameters on gene transcription. Black: phosphorylation of the monomer (k1), blue:
dimerization (k2), green: nuclear import (k3), red: sojourn time in the nucleus (τ),
yellow: nuclear export (k4).
Surprisingly, the first step in the network, that is, variation of the
phosphorylation of the monomeric STAT-5 described by k1, has the smallest
influence on gene activation. It can be varied by a factor of 2 with next to
no effect. The parameters describing the nucleocytoplasmic shuttling (k3, k4,
and τ) have the largest influence. In particular, setting k4 to zero, which means
inhibiting the nuclear export, is predicted to decrease target gene activation by a
factor of 2.
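A sensitivity analysis of this type can be sketched in a few lines. The example below is hypothetical in several respects: it uses toy parameter values (all equal to 1) and an assumed receptor time course, not the fitted model, and it takes the cumulative import of dimers into the nucleus, the time integral of k3 x3, as a stand-in for target gene activation. It therefore shows the procedure, and its numbers should not be expected to reproduce Fig. 17.1-7.

```python
import math


def nuclear_import(k1=1.0, k2=1.0, k3=1.0, k4=1.0, tau=1.0,
                   x1_0=1.0, t_end=20.0, h=0.001):
    """Cumulative import of STAT-5 dimers into the nucleus (integral of k3 * x3).

    The delay model, Eqs. (11)-(13), is integrated with forward Euler; x4 is not
    needed for this readout and is therefore not tracked.
    """
    lag = int(round(tau / h))
    x1, x2, x3 = x1_0, 0.0, 0.0
    x3_hist = []
    total = 0.0
    for i in range(int(round(t_end / h))):
        t = i * h
        u = t * math.exp(1.0 - t)  # assumed receptor activation
        x3_hist.append(x3)
        x3_delayed = x3_hist[i - lag] if i >= lag else 0.0
        total += h * k3 * x3
        x1, x2, x3 = (x1 + h * (2 * k4 * x3_delayed - k1 * x1 * u),
                      x2 + h * (k1 * x1 * u - k2 * x2 ** 2),
                      x3 + h * (0.5 * k2 * x2 ** 2 - k3 * x3))
    return total


baseline = nuclear_import()
for name in ("k1", "k2", "k3", "k4", "tau"):
    halved = nuclear_import(**{name: 0.5})   # vary one parameter at a time
    print(f"{name} halved: relative import = {halved / baseline:.2f}")

no_export = nuclear_import(k4=0.0)           # in silico block of nuclear export
print(f"k4 = 0:  relative import = {no_export / baseline:.2f}")
```

Without export every STAT-5 molecule can enter the nucleus at most once, so the cumulative import is bounded by half the initial monomer amount; with cycling the same molecules are reused, which is the mechanism behind the predicted drop in gene activation when export is inhibited.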
This prediction can be tested experimentally. The substance Leptomycin
B inhibits the nuclear export of STAT-5. Figure 17.1-8(a) shows the time
course of the protein CIS, whose translation is initiated by the JAK-STAT
signaling pathway. The areas under the curves differ roughly by the predicted
factor of 2. Results for repeated experiments in Fig. 17.1-8(b) demonstrate that
Leptomycin B has no effect on CIS translation without Epo stimulus. In the
case of stimulation, the protein production is decreased by a factor of 2 if
Leptomycin B is applied, which confirms the in silico prediction of the extended
model and finally validates the model.
Fig. 17.1-8 Experimental confirmation of the in silico prediction
of the extended model. (a) Time course of the translation of the
protein CIS with and without inhibition of the nuclear export of
STAT-5 by Leptomycin B (LMB). (b) Summary of repeated
experiments.
In summary, the mathematical model allows for the inference of two
systems properties:
STAT-5 is not available in excess. The cell acts economically:
by cycling, STAT-5 is “recycled”.
Fast cycling of STAT-5 represents a remote sensor system to
closely couple gene expression to receptor activation.
A saying in mathematical modeling reads: “All models are wrong, but some
are useful”. This also holds in the presented case:
No scaffolding for receptor-STAT-5 interaction
The interaction of STAT-5 with the receptor that we have
described by Eq. (1) is a highly complex process. A detailed
modeling of this process would require up to 50 equations
containing approximately the same number of parameters.
Spatial effects
We have treated the cell as a well-stirred reactor, which is
certainly not true for the highly structured cell.
Stochastic effects
The deterministic description by the proposed model does not
capture the stochastic effects that are always present in living
systems.
Data averaged over 10‘ cells
The biochemical process to generate the experimental data
averages over 10‘ cells, which are not identical.
Nevertheless, the final model is reasonable because it fulfills the two central
requirements of a successful model:
Capture the main effect
Make testable predictions
De facto, the shortcomings listed above are not relevant. Moreover, it is in
fact not desirable to have a model that exactly copies the cell. It would have too
many parameters and it would not tell what the relevant effects are.1) In this
sense, successful modeling means to make well-chosen “errors”.
In summary, the example has shown that, given quantitative time-resolved
experimental data, it is possible to turn qualitative, static cartoons like Fig. 17.1-1
into quantitative dynamical models allowing for
Testing the cartoon
Calculating unobservable components
Manipulating the system in silico
Identifying efficient manipulation targets
Predicting the outcome of new experiments
Inferring systems’ properties
1) In analogy to Goethe’s saying: “If I draw my
dog exactly as he is, I will have two dogs, but
never a piece of art”, for modeling holds: “If
I model the cell exactly as it is, I will have two
cells, but never a model”.
17.1.5
Future Development
The limiting factor in systems biology is high-quality data [16]. Mathematical
modeling can only give as much information as is coded in the data.
Unfortunately, most techniques, including the high-throughput “omics” technologies,
up to now produce mainly qualitative data. The rapid technological
developments in these areas and new techniques like quantitative immunoblotting [33]
or protein chips will allow building and validating larger models, including
also the interactions between signaling, gene regulatory, and metabolic
networks. So far, most of the measurement techniques average over a large
number of cells, not taking into account cell-to-cell variability. Imaging methods
will allow investigation of the dynamic behavior in single cells [34, 35].
On the basis of these technologies, systems biology is expected to have a major
impact on medicine:
As demonstrated by Fig. 17.1-7 in the above application to the
JAK-STAT pathway, sensitivity analysis can contribute to the
identification of drug targets facilitating the early stages of
drug development.
Adverse effects are a major reason for terminating clinical
trials in the late stages of drug development. Systems biology
models, including, for example, drug metabolism, can help
discover adverse effects earlier.
The effects of the drugs show a large interindividual
variability due to polymorphisms [36, 371. Systems biology
approaches taking this into account will help in transferring
current medicine from mainly being reactive to a predictive
and preventive personalized medicine as visualized in Ref. 38.
References
1. N. Wiener, Cybernetics, or Control and Communication in the Animal and the Machine, MIT Press, Cambridge, 1948.
2. L. von Bertalanffy, General Systems Theory, Braziller, New York, 1968.
3. R. Heinrich, T.A. Rapoport, A linear steady-state treatment of enzymatic chains. General properties, control and effector strength, Eur. J. Biochem. 1974, 42, 89-95.
4. H. Kacser, J.A. Burns, The control of flux, Symp. Soc. Exp. Biol. 1973, 27, 65-104.
5. R. Heinrich, S. Schuster, The Regulation of Cellular Systems, Chapman & Hall, New York, 1996.
6. R. Heinrich, B.G. Neel, T.A. Rapoport, Mathematical models of the protein kinase signal transduction, Mol. Cell 2002, 9, 957-970.
7. J.J. Tyson, K.C. Chen, B. Novak, Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell, Curr. Opin. Cell Biol. 2003, 15, 221-231.
8. S. Shen-Orr, R. Milo, S. Mangan, U. Alon, Network motifs in the transcriptional regulation network of Escherichia coli, Nat. Genet. 2002, 31, 64-68.
9. N. Barkai, S. Leibler, Robustness in simple biochemical networks, Nature 1997, 387, 913-917.
10. U. Alon, M.G. Surette, N. Barkai, S. Leibler, Robustness in bacterial chemotaxis, Nature 1999, 397, 168-171.
11. N. Blüthgen, H. Herzel, How robust are switches in intracellular signaling cascades? J. Theor. Biol. 2003, 225, 293-300.
12. G. von Dassow, E. Meir, E.M. Munro, G.M. Odell, The segment polarity
References I1059
network is a robust developmental cell division cycle of fission yeast,

module, Nature 2000, 406, 188-192. Chaos 2001, 1 I , 277-286.
13. J. Stelling, U. Sauer, 2. Szallasi, 23. B. Novak, J.J.Tyson, Modelling the
F.J. Doyle, J. Doyle, Robustness of controls of the eukaryotic cell cycle,
cellular functions, Cell 2004, 118, Biochem. Soc. Trans. 2003, 31,
675-685. 1526- 1529.
14. A.R. Asthagiri, D.A. Lauffenburger, A 24. H.Kitano, Foundations ofsystems
computational study of feedback Biology, MIT Press, Cambridge, 2001
effects on signal dynamics in a 25. E. Klipp, R. Henvig, A. Kowald,
mitogen-activated protein kinase C. Wierling, H. Lerrach, Systems
(MAPK) pathway model, Biotechnol. Biology in Practice, Wiley-VCH,
Prog. 2001, 17, 227-239. Weinheim, 2005.
15. B. Schoeberl, C. Eichler-Jonsson, 26. L. Alberghina, H.V. Westerhoff,
E.D. Gilles, C . Muller, Computational Systems Biology,Springer, New York,
modeling of the dynamics of the MAP 2005.
kinase cascade activated by surface 27. T.K. Kim, T.Maniatis, Regulation of
and internalized EGF receptors, Nut. interferon-y-activated STATl by the
Biotechnol. 2002, 20, 370-375. ubiquitin-proteasome pathway, Science
16. U.S. Bhalla, P.T. Ram, R. Iyengar, 1996,273,1717-1719.
MAP kinase phosphatase as a locus of 28. M. Koster, H. Hauser, Dynamic
flexibility in a mitogen-activated redistribution of STATl protein in
protein kinase signaling network, IFN signaling visualized by GFP
Science 2003, 297,1018-1023. fusion proteins, Eur. J. Biochem. 1999,
17. M. Fussenegger, J.E. Bailey, J. Varner, 260,137-144.
A mathematical model of caspase 29. J. Timmer, H. Rust, W. Horbelt,
function in apoptosis, Nat. Biotechnol. H.U. Voss, Parametric, nonparametric
2000, 18,768-774. and parametric modelling of a chaotic
18. T. Eissing, H. Conzelmann, circuit time series, Phys. Lett. A 2000,
E.D. Gilles, F. Allgower, E. Bullinger, 274, 123-134.
P.Scheurich, Bistability analyses of a 30. H.G. Bock, Recent advances in
caspase activation model for parameter identification for ordinary
receptor-induced apoptosis, /. Biol. differential equations, in Progress in
Chem. 2004, 279, 36892-36897. Scientijc Computing, vol. 2, (Eds.:
19. M. Bentele, I. Lavrik, M. Ulrich, P. Deuflhard, E. Hairer), Birkhauser,
S. StoBer, H. Kaltoff, P.H. Krammer, Boston, MA, 1983,95-121.
R. Eils, Mathematical modeling 31. T.G. Muller, D. Faller, J. Timmer,
reveals threshold behavior of I. Swameye, 0 . Sandra,
CD95-induced apoptosis, /. Bid. U. Klingmuller, Tests for cycling in a
Chem. 2004, 166,839-851. signalling pathway, J. Royal. Stat. Soc.
20. E. Lee, A. Salic, R. Kruger, C: Applied Stat. 2004, 53, 557-568.
R. Heinrich, M.W. Kirschner, The 32. J.Timmer, T. Muller, 0. Sandra,
roles of APC and Axin derived from 1. Swameye, U. Klingmuller,
experimental and theoretical analysis Modelling the non-linear dynamics of
of the Wnt pathway, PLoS 2003, 1, cellular signal transduction, Int. /.
116-132. Bfurcat. Chaos 2004, 14,2069-2079.
21. I. Swameye, T. Muller, J. Timmer, 33. M.Schilling, T.Maiwald, S. Bohl,
0. Sandra, U. Klingmuller, M. Kollmann, J. Timmer,
Identification of nucleocytoplasmic U . Klingmuller, Quantitative data
cycling as a remote sensor in cellular generation for systems biology - the
signaling by data-based modeling, impact of randomisation, calibrators,
Proc. Natl. Acad. Sci. U.S.A. 2003, 100, and normalisers, I E E Proc. Systems
1028-1033. Biology, 2006, 152, 193-200.
22. B. Novak. 2. Pataki, A. Ciliberto, 34. D.E. Nelson, A.E.C. Ihekwaba,
1.7. Tyson, Mathematical model of the M. Elliott, J.R. Johnson, C.A. Gibney,
1060
B.E. Foreman, G. Nelson, V. See, CYP2B6 gene with impact on

C.A. Horton, D.G. Spiller, expression and function in human
S.W. Edwards, H.P. McDowell, liver, Phamacogenetics 2001, I I,
J.F. Unitt, E. Sullivan, R. Grimley, 399-415.
N. Benson, D. Broomhead, D.B. Kell, 37. 0. Burk, H. Tegude, I. Koch,
M.R.H. White, Oscillations in NF-KB E. Hustert, R. Wolbold, H. Glaeser,
signaling control the dynamics of gene K. Klein, M.F. Fromm, A.K. Nuessler,
expression, Science 2004, 306, P. Neuhaus, U.M. Zanger,
704-708. M. Eichelbaum, L. Wojnowski,
35. N. Rosenfeld, J.W. Young, U. Alon, Molecular mechanisms of
P.S. Swain, M. Elowitz, Gene polymorphic CYP3A7 expression in
regulation at the single-cell level, adult human liver and intestine, /.
Science 2005, 307,1962-1965. Biol. Chem. 2002,277,24280-24288.
36. T. Lang, K. Klein, J. Fischer, 38. L. Hood, J.R. Heath, M.E. Phelps,
A.K. Niissler, P. Neuhaus, B. Lin, Systems biology and new
U. Hofmann, M. Eichelbaum, technologies enable predictive and
M. Schwab, U.M. Zanger, Extensive preventative medicine, Science 2004,
genetic polymorphism in the human 306,640-643.
17.2
Modeling Intracellular Signal Transduction Processes
Jason M. Haugh and Michael C. Weiger
Outlook
The ability to control normal and diseased cell function will require quantitative
analyses of how cells perceive and decode information. Involving enzyme-
catalyzed reactions and assembly of protein-protein and protein-lipid
complexes that modulate enzyme activity, signal transduction is the biochemical
integration of information inside the cell, and manipulation of signal
transduction networks thus offers a broad-based approach to influence cell
behavior. Mathematical modeling approaches, wherein chemical kinetics,
spatial distributions of molecules, and biophysical constraints may be described
in dynamic and unambiguous terms, are being applied with increasing
frequency to analyze biochemical signaling mechanisms more critically. Once
validated by quantitative measurements, such models may soon offer a means
to predict the integrated behavior of interacting pathways and combinations
of cell stimuli. We discuss here the recent advances in, and challenges faced
by, this emerging field.
17.2.1
Introduction
The past 15 years or so have seen a shift in the focus of biological research to
the study of molecular mechanisms underlying cell regulation and function.
Thus, we now have a qualitative roadmap of how intracellular molecules are
organized to form signal transduction pathways, which govern cell decision-
making in a tightly controlled, context-dependent manner [1]. However, it is
not yet fully appreciated how biochemical mechanisms affect the kinetics of
pathway activation, or how the magnitudes and/or timing of those signals are
related to the likelihood and quality of a cell response.
Mathematical modeling of signal transduction interactions, pathways, and
networks is emerging as a powerful tool that can aid in explaining and
interpreting experimental data. In most cases, the explanations are fairly
intuitive (at least in hindsight) once the model has been applied to the problem
at hand; in other cases, the conclusions are less so. In any case, quantitative
models provide a way to organize hypotheses and integrate the many effects
that may be at play. If done correctly, all the inherent assumptions are clearly
laid out, because the system is described in the unambiguous language of
mathematics.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
In theory, quantitative models of signaling processes offer two distinct
advantages over the conceptual models routinely invoked in the signaling
literature. First, models may be formulated that are mechanistic, meaning they
are based on established principles of physical chemistry and/or mechanics, in
which case the form of the model equations is determined by the hypothetical
mechanism assumed. In many cases, one may formulate multiple models
corresponding to different candidate mechanisms and rule out one or more of
them on the basis of a quantitative analysis. Models that are phenomenological,
on the other hand, aim to capture at least the qualitative features observed
in experiments. They are naturally less powerful, but they serve a definite
and useful role and are appropriate in situations where the mechanisms that
“connect the dots” are much less certain. Second, to the extent that the model
has been trained on a large amount of high-quality quantitative data, and its
mechanistic assumptions are sound, it may be used to predict the outcomes
of novel experiments and may thus generate new, hypothesis-driven research.
Some of the experimental findings will inevitably contradict the predictions
of the model, but just as with conceptual models, one would iteratively refine the
model on the basis of such data.
In this chapter we review the progress that has been made in modeling
signal transduction, mostly in recent years, while also noting the pioneering
contributions to this field, and we critically assess the open questions that need
to be addressed if the field is to advance. We have intentionally organized the
discussion in a top-down manner, starting from the cell’s initial perception of
external stimuli and building up step by step to the complex models, which
incorporate multiple, interacting signaling pathways (Fig. 17.2-1). Although
reductionism is not so fashionable these days, we wish to stress that there
is still much to learn from generalized models of relatively simple systems,
and that it is easy to neglect the details as we strive toward models of greater
scope [2]. Finally, we refer the interested reader to a number of related reviews
published recently on the topic of modeling signal transduction [3-8].
17.2.2
Receptor-Binding and Regulation Mechanisms
The first step in most signaling pathways is the binding of cell surface receptors,
which links the presence and concentration of a specific extracellular ligand
to the intracellular processes that ultimately govern the cellular response.
One often thinks of receptor binding simply as a reversible, bimolecular
process, characterized by the dissociation (inverse equilibrium) constant, KD;
an apparent KD value is generally defined as the free concentration of ligand that
yields half-maximal binding to the cell surface (or to receptors immobilized on
a solid support). In the simplest model, each ligand-bound receptor is activated
for signal transduction. This picture belies a number of complexities, however,
which are most often neglected in models of signal transduction. Arguably the
two most important complexities involve the dimerization/aggregation and
intracellular trafficking of receptors, which significantly impact the kinetics and
dose response of receptor activation and subsequent intracellular signaling.

Fig. 17.2-1 Fundamentals of intracellular signaling. In this chapter, we discuss
modeling of intracellular signaling processes from the top down. (a) One must
first consider the binding of ligands to receptors and receptor dimerization at the
cell surface, as well as the internalization and intracellular processing of receptors and
ligands, which affect the number of functional complexes available for signaling.
(b) Signaling complexes organize signal transduction pathways through the
recruitment and covalent modification of signaling adaptors and enzymes, many of
which act upon substrates associated with the membrane or colocalized in the receptor
complex. (c) In many situations, such as the perception of ligand gradients, one must
explicitly account for the spatial patterns of intracellular signaling molecules. After
establishing these general concepts, we discuss modeling of the downstream
signaling pathways and networks.
17.2.2.1 Receptor Dimerization

Many receptors form dimers or higher oligomers on the cell surface,
spontaneously and/or in response to ligand binding. In many cases, receptor
dimerization is required for downstream signal transduction. For example,
structural constraints generally prevent receptor tyrosine kinases (RTKs) from
phosphorylating their own cytosolic tails in an intramolecular fashion, and thus
dimerization permits phosphorylation of receptor sites in trans. In the case
of multi-subunit receptors such as the interleukin 2 (IL-2) receptor, different
subunits can bring together distinct non-RTKs that rely on each other for
activation. Although many receptor systems rely on dimerization, this process
can occur in different ways, and models can be and have been used to discern
between candidate mechanisms. The underlying issues informing such
models include the number of binding sites per ligand and receptor molecule,
whether multiple subunits/receptor types are involved and their relative
affinities for ligand, and whether ligand binding is required/sufficient for
dimerization or if other receptor domains are involved. These considerations
and the receptor density determine whether receptor activation will exhibit
a hyperbolic (as for 1:1 binding or Michaelis-Menten kinetics), sigmoidal
(apparent cooperativity), or bell-shaped dose-response curve (Fig. 17.2-2), and
evaluation of candidate models is generally achieved through comparisons
with quantitative ligand binding and receptor activation data measured at
various times and/or ligand concentrations. Models that focus on or include
receptor dimerization have emerged for epidermal growth factor (EGF) [9-13],
insulin [14], fibroblast growth factor [15, 16], FcεRI (immunoglobulin E) [17,
18], platelet-derived growth factor (PDGF) [19], human growth hormone [20,
21], and IL-2 [22] receptors.

Fig. 17.2-2 Receptor dimerization mechanisms and dose response. The
manner in which receptor dimers form affects the dose response of receptor
activation and downstream signaling. Here we invoke simple, steady-state models that
account for receptor binding, dimerization, and trafficking to illustrate this point.
(a) When dimers form via the lateral association of two 1:1 ligand-receptor
complexes, the resulting steady-state dose response (solid curve) is predicted to exhibit
more cooperativity than does the simple 1:1 binding case (dashed curve); here, ligand
concentration is normalized by the value that yields half-maximal activation. (b) When
dimers form via lateral association of one 1:1 complex and a free receptor, a bell-shaped
dose-response curve is predicted.
[Plot axes: (a) [Ligand], dimensionless, ticks from 0.01 to 100; (b) [Ligand], arbitrary units, ticks from 0.01 to 1000.]
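The three curve shapes named above can be illustrated with a minimal sketch. It is not one of the cited models: it assumes simple 1:1 equilibrium binding with no ligand depletion, ignores trafficking, and takes the weak-dimerization limit in which the dimer level is simply proportional to the product of the two partner concentrations. The function names and the normalization are choices made for illustration only.

```python
def occupancy(ligand, kd=1.0):
    """Equilibrium fraction of receptors in 1:1 complexes (hyperbolic)."""
    return ligand / (kd + ligand)


def dimers_from_two_complexes(ligand, kd=1.0):
    """Relative dimer level when two 1:1 complexes associate laterally.

    Assumed proportional to occupancy squared (weak-dimerization limit),
    which gives a steeper, apparently cooperative dose response.
    """
    return occupancy(ligand, kd) ** 2


def dimers_from_complex_and_free(ligand, kd=1.0):
    """Relative dimer level when one complex pairs with one free receptor.

    Assumed proportional to occupancy * (1 - occupancy); the factor 4
    rescales the curve so that its maximum value is 1.
    """
    theta = occupancy(ligand, kd)
    return 4.0 * theta * (1.0 - theta)


if __name__ == "__main__":
    print(f"{'[L]/KD':>8} {'1:1':>8} {'C+C':>8} {'C+R':>8}")
    for ligand in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(f"{ligand:8.2f} {occupancy(ligand):8.3f} "
              f"{dimers_from_two_complexes(ligand):8.3f} "
              f"{dimers_from_complex_and_free(ligand):8.3f}")
```

Under these assumptions the complex-plus-free-receptor curve peaks where half of the receptors are occupied and falls off at higher ligand concentrations, because free receptors become scarce.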
17.2.2.2 Receptor Trafficking

Receptors are not static on the cell surface, as the plasma membrane and
all its constituents are turned over at various rates. Membrane proteins
undergo endocytosis, whereby they are internalized in vesicles that bud off
from the plasma membrane and later fuse with endosomes inside the cell.
There, they are sorted for one of two fates: recycling back to the plasma
membrane, or degradation in lysosomes. Receptor trafficking processes
are modulated in response to receptor binding through a combination of
protein-protein interactions and covalent modifications (e.g., ubiquitylation),
which can specifically immobilize/sequester activated receptors in endocytic
or endosomal structures or otherwise mark them for enhanced internalization
and/or degradation rates. Certain growth factor receptors of the RTK family,
as well as other receptor types are regulated in this fashion, which over time
leads to a significant downregulation of the number of receptors available
for binding and signaling at the cell surface. Models accounting for these
effects at various timescales and levels of abstraction have been offered, most
notably for EGF/EGF receptor [23-25], and for other systems as well [21,
26, 27]. Besides the consideration of receptor abundance at the cell surface,
one must also consider whether the receptor remains ligated and/or active
in endosomes, and if so which signaling processes endure or are initiated
there. Although it is commonly assumed that internalized receptors are silent,
implicitly or based on specific evidence, compartmentalization of signaling and
its potential role in prolonging specific signaling events have been considered,
using modeling [28, 29].
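Ligand-induced downregulation can be sketched with a two-species steady-state calculation. The model below is an illustrative assumption, not one of the cited trafficking models: free surface receptors are synthesized at a constant rate and internalized constitutively, ligand-bound receptors are internalized with their own rate constant, and recycling and endosomal signaling are ignored. All parameter names and values are hypothetical.

```python
def surface_receptors_steady_state(ligand, v_syn, k_t, k_e, k_on, k_off):
    """Return (free, bound) surface receptor levels at steady state.

    Assumed model (illustrative, not taken from the cited references):
        dR/dt = v_syn - k_t*R - k_on*L*R + k_off*C
        dC/dt = k_on*L*R - (k_off + k_e)*C
    R: free surface receptors, C: ligand-bound surface receptors,
    L: free ligand concentration, held constant.
    """
    # Setting dC/dt = 0 gives the ratio C/R.
    bound_per_free = k_on * ligand / (k_off + k_e)
    # Adding both equations at steady state gives v_syn = k_t*R + k_e*C.
    free = v_syn / (k_t + k_e * bound_per_free)
    bound = bound_per_free * free
    return free, bound


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only so that k_e > k_t.
    params = dict(v_syn=100.0, k_t=0.02, k_e=0.2, k_on=1.0, k_off=0.1)
    for ligand in (0.0, 0.1, 1.0, 10.0):
        free, bound = surface_receptors_steady_state(ligand, **params)
        print(f"L = {ligand:5.1f}  free = {free:8.1f}  "
              f"bound = {bound:8.1f}  total = {free + bound:8.1f}")
```

In this sketch the total number of surface receptors falls from v_syn/k_t without ligand toward v_syn/k_e at saturating ligand, so downregulation appears whenever bound receptors are internalized faster than free ones.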
17.2.3
Receptor-mediated Covalent Modifications and Molecular Interactions
Once the functional receptor-ligand complex has been assembled, it is rapidly
activated for intracellular signaling. This often occurs through conformational
changes in the receptor, which result in the switching on of an intrinsic
enzymatic activity or the association of enzymes from the cytosol. In the case of
G-protein-coupled receptors, the story ends here, as ligated receptors may then
activate heterotrimeric G-proteins that are precoupled to the receptor or that
encounter receptor complexes by lateral diffusion in the membrane. However,
growth factor and cytokine receptors present a more complex situation, given
the aforementioned phosphorylation of one or more receptor subunits by
receptor-associated kinase activities. Receptors tend to be phosphorylated on
multiple sites, and each site may be phosphorylated to a different extent on
average. They are phosphorylated by the kinase(s) and dephosphorylated
by protein phosphatases in a dynamic fashion and at various rates, and the
pattern of phosphorylation might change with increasing receptor occupancy.
The general purpose of receptor phosphorylation is to provide a scaffold for
the association of cytosolic signaling enzymes and adaptor proteins, which
possess one or more modular binding domains (e.g., Src-homology 2 and
phosphotyrosine-binding domains) that recognize specific phosphorylation
sites. The recruited proteins are thus activated to initiate various signaling
pathways, and each functional receptor might have the capacity to form large,
multiprotein complexes.
17.2.3.1 Receptor Phosphorylation and Binding States

It is clear that even these early stages of receptor signaling present significant
challenges from the standpoint of modeling, as one has to decide whether to
ignore or account for the combinatorial diversity of phosphorylated receptor
species and their complexes with intracellular proteins. The former strategy
is adopted most often, particularly when the downstream signal transduction
is the focus, which may be appropriate when phosphorylation of a specific
site and the resulting activation of an enzyme are known or assumed to
be independent from other processes. One must deal with these issues,
however, when receptor-binding sites overlap or when one receptor-bound
protein affects another in the complex. To this end, the Cell Signaling
group at Los Alamos National Labs has recently devised a general modeling
framework that accounts for all possible receptor species while assuming that
receptor binding, dimerization, and receptor phosphorylation are kinetically
independent [4, 18]. Such assumptions are generally necessary to avoid an
explosion in the number of rate constant values that must be specified.
Another recent model has explicitly considered receptor-mediated regulation
and localization of phosphatase activities (e.g., Shp-1 and -2) as a means
of modulating receptor phosphorylation states and signaling specificity [30].
Even with these advances, we are far from capturing the true complexity in the
formation of receptor complexes; multivalent interactions between different
proteins suggest the possibility that protein interactions form cyclic (ring)
structures, which could be important for maintaining the stability of the
complex but are notoriously difficult to model even in the simplest cases [31].
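The combinatorial diversity mentioned above is easy to quantify in a toy setting. The sketch below is not the framework of the cited references; it assumes, for illustration only, that each phosphorylation site is in one of three states (unphosphorylated, phosphorylated, or phosphorylated with one bound partner) and that the sites are independent. The state labels and function names are hypothetical.

```python
from itertools import product

# Assumed per-site states; a real receptor may have more or fewer.
SITE_STATES = ("unphosphorylated", "phosphorylated", "phosphorylated+bound")


def receptor_states(n_sites):
    """Enumerate every combination of site states for one receptor."""
    return list(product(SITE_STATES, repeat=n_sites))


def count_monomer_states(n_sites):
    """Number of distinct receptor species: 3 ** n_sites."""
    return len(SITE_STATES) ** n_sites


def count_dimer_states(n_sites):
    """Number of distinct unordered pairs of two such receptors."""
    s = count_monomer_states(n_sites)
    return s * (s + 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10):
        print(f"{n:2d} sites: {count_monomer_states(n):8d} monomer species, "
              f"{count_dimer_states(n):14d} dimer species")
```

Even this simplified count grows geometrically with the number of sites, which is why assumptions of kinetic independence are needed to keep the number of rate constants manageable.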
Proteins in complex with activated receptors are often phosphorylated by
the associated kinase(s), leading to modulation of enzymatic activity or, in
the case of adaptor proteins such as Shc, IRS-1/-2, and Gab-1/-2, binding
of other proteins to the phosphorylated site(s). Because these proteins
are substrates of receptor-associated kinase activity, they are commonly
assumed to leave the receptor complex after phosphorylation in some models
[11, 32], according to the Briggs-Haldane mechanism of enzyme action. Most
of the biochemical evidence suggests otherwise, however, as the binding
domains tend to be truly modular, and hence other models have treated the
binding and phosphorylation of receptor-binding proteins as independent
events [28, 331. Certain phosphorylated enzymes such as phospholipase C
(PLC) and phosphoinositide (PI) 3-kinase act on substrates at the plasma
membrane and do so in a spatially localized manner, consistent with the
view that maintenance of receptor association is critical for the functions of
these enzymes. This is the perspective from which some models of these
pathways have been formulated [19, 34]. Considering this, receptor binding
of certain phosphorylated proteins may be compromised by competing intra-
or intermolecular interactions, reflecting the need to access other locations or
compartments; the phosphorylation and dimerization of STAT transcription
factors is a case in point. Generally speaking, one needs to carefully consider
whether phosphorylation of a particular protein affects its receptor-binding
properties.
17.2.3.2 Kinetic Considerations

Ligands with sub-nanomolar effective KD values, including many growth
factors and cytokines, tend to form functional receptor complexes that remain
active for several minutes. In fact, some receptor dimers may dissociate so
slowly that they rely on internalization for signal termination [21]. In such
cases, it is generally safe to assume that intracellular phosphorylation and other
reactions respond rapidly to changes in receptor occupancy (pseudosteady
state). In cases where the functional complex dissociates more rapidly,
however, one must also account for receptor complexes that are formed
but are not yet phosphorylated as well as active complexes that dissociate but
have not yet been dephosphorylated (or otherwise deactivated) (Fig. 17.2-3).
For example, such issues arise in the case of T-cell receptor engagement
of peptide-MHC complexes presented on antigen-presenting cells in which
prospective peptide ligands naturally vary widely in receptor-binding affinity.
Kinetic proofreading refers to the inability of rapidly dissociating ligands to
transmit signals, because the short-lived receptor-ligand complexes fail to
be activated, whether by dimerization, phosphorylation, association of other
proteins, and/or other mechanisms [35, 36]. On the other hand, a shorter
lifetime can be advantageous when active receptors persist for some time
after ligand dissociation, particularly when ligand molecules may be limiting
in number as in the case of antigen presentation [37]. Each ligand may thus
participate in serial engagement of multiple receptors [38, 391. As discussed in
the following section, a shorter lifetime may also be beneficial when significant
spatial gradients develop in the vicinity of an active receptor.
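The kinetic proofreading argument can be made concrete with a common idealization, used here as an assumption and not as the specific scheme of the cited references: a newly formed complex must complete a fixed number of sequential first-order steps before it signals, and ligand dissociation at any point resets it. Each step then competes with dissociation, and the per-step success probabilities multiply. The rate constants below are hypothetical.

```python
def activation_probability(k_off, k_step, n_steps):
    """Probability that a complex completes n_steps before dissociating.

    Assumes each step is first order with rate k_step and competes with
    ligand dissociation at rate k_off, so one step succeeds with
    probability k_step / (k_step + k_off).
    """
    if k_off < 0 or k_step <= 0 or n_steps < 0:
        raise ValueError("rates must be positive and n_steps non-negative")
    return (k_step / (k_step + k_off)) ** n_steps


if __name__ == "__main__":
    k_step = 1.0  # hypothetical modification rate, 1/s
    for k_off in (0.1, 1.0, 10.0):  # hypothetical off-rates, 1/s
        p = activation_probability(k_off, k_step, n_steps=5)
        print(f"k_off = {k_off:5.1f} 1/s  mean lifetime = {1.0 / k_off:5.2f} s  "
              f"P(activation) = {p:.2e}")
```

With several steps in series, a modest difference in off-rate translates into a much larger difference in activation probability, which is the discrimination that proofreading provides.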
Signaling outcomes may also be affected by disparities in the timescales
associated with intracellular processes (Fig. 17.2-3). Substrate exchange refers
to the ability of phosphorylated (or otherwise modified) proteins to associate
with and dissociate from receptor complexes before they are dephosphorylated. Slow
versus rapid exchange is determined by the relative rates of substrate phosphorylation,
dissociation from the receptor complex, and dephosphorylation within
the complex and in the cytosol; fast exchange has the effect of homogenizing
the phosphorylation state of the protein, which thereby responds globally to the
average status of the receptor complexes [28, 30]. The ability to hold information
about the local receptor environment, in the context of phosphorylation
within the receptor complex, requires slow substrate exchange [33].

Fig. 17.2-3 Kinetic considerations at the level of receptor complexes. The kinetic
proofreading concept (top left) holds that ligands with fast off-rates will not allow the
sequence of events required for activation of signaling to occur; however, a high off-rate,
relative to the rate of receptor deactivation, can be advantageous when the number of
ligand molecules is limiting (serial engagement, top right). The binding
kinetics of intracellular proteins is also important relative to the rates of
phosphorylation/dephosphorylation by receptor-associated, membrane-associated,
and cytosolic kinases/phosphatases. Substrate exchange is said to be high when
the kinetics are such that the phosphorylation state of the protein reflects
a global average of these activities.
17.2.4
Spatial Organization and Gradients on Cellular and Subcellular Length Scales
Most of the examples cited above are purely kinetic models with variables
changing only with respect to time. While processes may be compartmentalized
in such models, with rate terms that account for transfer between cellular
compartments, spatial gradients within compartments are obviously not
accounted for. In most cases, signaling molecules encounter one another
through mutual diffusion, and net molecular transport from one location to
another depends on such gradients. However, the concept of a concentration
gradient serving as a “driving force” for macroscopic diffusion leads to a
common misconception. On a microscopic level, biological molecules are
constantly in motion through collisions with water (and occasionally other)
molecules, and thus it is obvious that they can associate in the absence of
concentration gradients. If one were to survey the cytoplasm of a typical cell,
the average distance between the plasma membrane and the nucleus is in the
range of L ~ 1-10 µm. The diffusion coefficient D of a small molecule such as
Ca²⁺ or ATP in the cytosol is ~10³ µm² s⁻¹, and that of a larger macromolecule
is ~10 µm² s⁻¹ (the cytosolic D value for green fluorescent protein, medium
sized at 27 kDa, has been measured at 40 µm² s⁻¹). In three dimensions, the
average time associated with traversing that distance is L²/6D, which yields a
range of times from 0.2 ms to 2 s. One concludes that diffusive transport in the
cytosol is relatively efficient on cellular length scales, and that the formation of
macroscopic gradients requires a fairly rapid degradation/turnover of the
molecule. In the case of intracellular calcium and certain other second
messengers, fluorescence imaging experiments and detailed kinetic and spatial
modeling [40, 411 have demonstrated that spatial waves propagate in the cell
as a result of rapid dynamical processes characteristic of excitable media [42].
For signaling proteins that are phosphorylated or otherwise modified at the
plasma membrane and/or at endosomal membranes but dephosphorylated
throughout the cell, models have been used to evaluate the possibility and
functional consequences of gradients of these phosphorylated proteins in the
cytosol [28, 43-45]; when the cytosolic phosphatase activity is either very strong
or very weak, however, a kinetic model is adequate [33].
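The time-scale estimate quoted above follows directly from the three-dimensional relation t = L²/6D, and the short sketch below reproduces the two extremes. The second function is an addition not stated in the text: it uses the standard result that a species diffusing with coefficient D and removed by first-order turnover at rate k decays over a length of order (D/k)^½, which is one way to see why appreciable gradients require rapid turnover. The turnover rates in the example are hypothetical.

```python
import math


def diffusion_time_3d(distance_um, diff_coeff_um2_per_s):
    """Average time (s) to traverse a distance by 3D diffusion: L**2 / (6*D)."""
    return distance_um ** 2 / (6.0 * diff_coeff_um2_per_s)


def gradient_decay_length(diff_coeff_um2_per_s, turnover_rate_per_s):
    """Characteristic decay length (um) with first-order turnover: sqrt(D/k).

    Assumption: standard reaction-diffusion estimate, not taken from the text.
    """
    return math.sqrt(diff_coeff_um2_per_s / turnover_rate_per_s)


if __name__ == "__main__":
    # Extremes quoted in the text: small molecule over 1 um, protein over 10 um.
    print(f"fast limit: {diffusion_time_3d(1.0, 1e3) * 1e3:.2f} ms")
    print(f"slow limit: {diffusion_time_3d(10.0, 10.0):.2f} s")
    # Hypothetical turnover rates for a protein with D = 10 um^2/s.
    for k in (0.01, 1.0, 100.0):
        print(f"k = {k:6.2f} 1/s  decay length = "
              f"{gradient_decay_length(10.0, k):6.2f} um")
```

The two printed times round to the 0.2 ms and 2 s limits given in the text.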
17.2.4.1 Spatial Gradient Sensing and Chemotaxis

Spatial gradients, both inside and outside the cell, are an inherent component
of directed cell migration, or chemotaxis, in which cell movement is biased
over time toward the highest extracellular concentration of chemoattractant, or
away from the highest concentration of repellent. Such gradients are formed
as a natural consequence of physiological settings during development, the
immune response and wound healing, for example. Eukaryotic cells sense
the gradients spatially, that is by linking the local chemoattractant receptor
signaling to the cytoskeletal and/or adhesion processes that drive cell crawling.
The signaling pathways that mediate this linkage have been studied intensely in
recent years, and in cells that exhibit rapid, amoeboid migration (Dictyostelium
discoideum, neutrophils), it has been established that external gradients are
amplified inside the cell to the point where an all-or-none decision is made
concerning the direction of membrane protrusion. In response, numerous
models have been proposed that include autocatalytic signaling processes
or other positive feedback mechanisms, negative feedback that tends to
desensitize the response, and/or a combination of slow- and fast-diffusing
species (Fig. 17.2-4). While the classic treatment in this vein was offered
over 30 years ago by Gierer and Meinhardt [46], most of the models have
emerged recently [47-50], in tandem with experimental work revealing some
of the underlying molecular details. One of the key features of spatial gradient
sensing is the ability to localize the intracellular second messenger(s), which
requires an appropriate turnover rate relative to diffusion across distances
of ~10 µm. Well suited in this regard are membrane lipids such as 3' PIs,
products of receptor-activated PI 3-kinases known to mediate spatial sensing
[47, 51, 521, whose role is to organize motility processes specifically at the
protruding plasma membrane.
17.2.4.2 Gradients on the Molecular Scale

The concentrating effect of enzyme recruitment by receptors combined
with the slow diffusion of membrane-associated substrates that many
signaling enzymes act upon can push such receptor-proximal reactions
into a regime in which their rates are limited by lateral diffusion of the
substrate. In such cases, substrate gradients would tend to form depletion
zones surrounding the enzyme molecules (radius ~10-100 nm). Theoretical
consideration of this problem in the biological context dates back to the
seminal contributions of Adam and Delbrück [53] and Berg and Purcell [54],
and more recent theories and simulations have focused on specific enzymatic
mechanisms relevant to early signaling processes [55-58]. Another layer
of complexity at this level of modeling is the subcompartmentalization or
domain structure of the plasma membrane, which has been shown using
models to affect the rates of enzyme-mediated reactions and the apparent
motion of single particles tracked at various frame rates [59-61]. Accurate
microscopic models of signaling reactions/interactions are needed especially
in the light of the inability to spatially resolve such gradients by fluorescence
microscopy.

Fig. 17.2-4 Spatial sensing of chemoattractant gradients. (a-c) Depict
phenomena seen in gradient sensing by certain fast-moving cells, with
concentrations of chemoattractant (dashed lines) and intracellular messenger (solid
curves) at the "front" and "rear" of the cell shown as a function of time. (a) Uniform
stimulation typically elicits adaptation of the signaling response. (b) Gradient
stimulation, on the other hand, yields a persistent and amplified messenger
gradient. (c) The sensing mechanism is able to track changes in the orientation of the
extracellular gradient. (d) Models have been formulated on the basis of the opposition of
positive and negative feedback loops, together with fast and slow diffusion of the
various components. Here, m* denotes the active intracellular messenger.
17.2.5
Downstream Signaling Cascades and Networks
After the receptor-mediated events described above, signals are transduced
through conserved biochemical pathways (Fig. 17.2-5), ultimately leading to
the actuation of functional responses such as specific control of transcription,
translation, or cytoskeletal dynamics. A signaling cascade generally refers
to a series of enzyme modification processes, as in the activation of
the various mitogen-activated protein (MAP) kinases, and thus presents a
linear picture of signal transmission. As considered theoretically by Bray
[62], the use of multiple intermediates in signaling pathways affords more
opportunities for regulation, often from parallel pathways (crosstalk). In
fact, most signaling “pathways” are simply dominant routes of regulation
embedded in larger networks of interactions, in which proteins may
interact with and/or modify multiple substrates (branch points) and receive
regulatory inputs from multiple molecular partners (convergence points)
(Fig. 17.2-5).

Fig. 17.2-5 Signal transduction pathways and networks. A partial interaction map,
focusing on receptor-proximal signaling processes, is illustrated for the network
typically activated by growth factor receptors (RTKs) and cytokine receptors that associate
with nonreceptor tyrosine kinases such as those of the Src and JAK families (not
depicted). Adaptor proteins are shown on the first level below the receptor, followed by
the enzymes in complex with the receptor. These act upon membrane-associated
substrates, which once modified recruit serine-threonine kinases and other
enzymes to the membrane for initiation of signaling cascades. Of particular interest are
branch points (blue), which act upon multiple molecules/pathways, and points of
convergence (red), which receive and integrate inputs from multiple pathways.
Pathway modulators are also shown (light green).
17.2.5.1 General Considerations and Pathway-specific Models

In addition to providing multiple nodes for pathway regulation, signaling
cascades have long been considered a mechanism for amplifying signals.
Biologists often refer to amplification in the linear sense, suggesting that
a signaling cascade will amplify the absolute number of activated proteins,
but in theory this outcome should not be expected. The sensitivity of the
pathway, defined as the fractional change in output relative to that of
the input, is another matter. Borrowing from formalisms developed for
the analysis of metabolic pathways, it is readily shown that the sensitivity
is multiplicative as one moves down a sequence of reactions; that is, the
overall sensitivity is the product of the sensitivities of the individual
steps [63]. Pioneering
theoretical work by Goldbeter and Koshland [64, 65] and later by Ferrell
[66] showed that amplified sensitivity to a stimulus is readily achieved in
systems governed by reversible, enzyme-mediated covalent modifications.
These effects were shown to arise when the modifying enzymes are close
to saturation, and when activation requires multiple modifications by the
same enzyme (as in the dual phosphorylations of MEK and Erk in the
MAP kinase cascade). More recent studies along these lines have considered
the effects of enzyme/substrate compartmentalization [67, 68] and binding
to scaffolding proteins [69], the kinetics in response to transient stimuli
[67, 70], pathway feedback and branching [63, 71], and the existence and
functional significance of bistability [72] in signaling cascades. Another suite
of models has analyzed or otherwise considered the mechanisms involved
in specific pathways. Within the past 10 years or so, such models have been
formulated to describe receptor-mediated formation of Ras-GTP [11, 67, 73],
activation of the Raf-MEK-Erk and homologous kinase cascades [29, 69, 74-79],
regulation of PtdIns(4,5)P2 lipid levels through activation of its synthesis and
PLC-mediated hydrolysis [34, 80], and activation of PI 3-kinase and Akt [19],
and still others have considered pathways of activation of NF-κB [81], STAT
[82], and Gli [83] transcription factors. For the sake of simplicity, each of the
models cited above implicitly assumes that its pathway operates in isolation;
however, as models become more detailed it is clear that they will need to
consider crosstalk interactions from other pathways emanating from the same
receptor(s).
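
The two points made above (sensitivities multiply along a cascade, and a covalent modification cycle becomes highly sensitive when its enzymes are near saturation) can be illustrated with a minimal sketch. It uses the closed-form steady state of a modification cycle, commonly called the Goldbeter-Koshland function, in the form given by Tyson et al. [6]. The two-level cascade, the gain factor of 2, and all parameter values are illustrative assumptions, not taken from any of the cited models.

```python
import math

def goldbeter_koshland(u, v, J, K):
    """Steady-state modified fraction of a covalent modification cycle.

    u, v: maximal rates of the modifying and demodifying enzymes.
    J, K: their Michaelis constants scaled by total substrate
          (small values mean the enzymes operate near saturation).
    """
    b = v - u + v * J + u * K
    return 2.0 * u * K / (b + math.sqrt(b * b - 4.0 * (v - u) * u * K))

def log_sensitivity(f, x, rel_step=1e-4):
    """Central-difference estimate of d ln f / d ln x at x."""
    up = f(x * (1.0 + rel_step))
    down = f(x * (1.0 - rel_step))
    return (math.log(up) - math.log(down)) / (
        math.log(1.0 + rel_step) - math.log(1.0 - rel_step))

J_SAT = 0.05  # assumed scaled Michaelis constant (near-saturated enzymes)

def level1(stimulus):
    # First cycle: the stimulus sets the modifying-enzyme activity.
    return goldbeter_koshland(stimulus, 1.0, J_SAT, J_SAT)

def level2(active_upstream):
    # Second cycle: driven by the active fraction of the first (assumed gain 2).
    return goldbeter_koshland(2.0 * active_upstream, 1.0, J_SAT, J_SAT)

def cascade(stimulus):
    return level2(level1(stimulus))

x0 = 0.9
s1 = log_sensitivity(level1, x0)
s2 = log_sensitivity(level2, level1(x0))
s_total = log_sensitivity(cascade, x0)
print(f"step sensitivities: {s1:.3f}, {s2:.3f}")
print(f"product: {s1 * s2:.3f}  cascade: {s_total:.3f}")
```

The printed product of the step sensitivities should agree with the sensitivity of the composed cascade, which is simply the chain rule applied to logarithmic derivatives.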
17.2.5.2 Pathway Crosstalk and Signaling Networks

When confronted with a system in which multiple signaling pathways
are activated and crosstalk between them is prevalent, it is difficult to
predict the consequences of mutations or interventions at the level of
signaling intermediates, particularly those nodes that serve as branch and/or
convergence points through which signals are distributed and integrated.
Especially when a branch point leads to activation of some downstream signals
and suppression of others, or when a convergence point receives both positive
and negative signals, it is crucial to quantitatively characterize the magnitudes
of the effects and how they influence the overall response [84]. An example of
this sort of signal integration is seen in the activation of Erk, which is activated
by the Raf-MEK-Erk cascade and negatively regulated through phosphorylation
of Raf by Akt, a PI 3-kinase-dependent pathway; a model accounting for this
crosstalk relationship has appeared recently [85]. Pathway crosstalk interactions
may also be involved in positive feedback loops that produce prolonged
responses, provided a threshold level of receptor signaling has been achieved.
Activation of a negative feedback is then needed to break the cycle. To illustrate
such bistable signaling mechanisms, Bhalla and Iyengar have formulated
complex models in the context of Erk activation, which are robust with respect
to producing bistability over relatively wide ranges of parameter values [32,
86]. Pathway crosstalk remains an important and developing area of signal
transduction research, in both the experimental and modeling arenas.
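
The threshold behavior described above, in which a positive feedback loop sustains the response once sufficient signaling has been achieved, can be sketched with a deliberately minimal one-variable caricature. This is not the Bhalla and Iyengar model; the Hill-type feedback term, the function name `simulate`, and all parameter values are assumptions chosen only to make the bistability visible.

```python
def simulate(stimulus_amplitude, stimulus_duration=5.0, t_end=30.0, dt=0.01,
             k_fb=1.0, K=0.5, n=4, k_deg=1.0):
    """Forward-Euler integration of dx/dt = s(t) + k_fb*x^n/(K^n + x^n) - k_deg*x.

    s(t) equals stimulus_amplitude for t < stimulus_duration and zero afterward.
    Returns the messenger level x at t_end, long after the stimulus is removed.
    """
    x = 0.0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        s = stimulus_amplitude if t < stimulus_duration else 0.0
        feedback = k_fb * x ** n / (K ** n + x ** n)  # positive feedback term
        x += dt * (s + feedback - k_deg * x)
    return x

final_weak = simulate(0.1)    # subthreshold pulse: response decays away
final_strong = simulate(0.5)  # suprathreshold pulse: response persists
print(f"weak pulse   -> x(t_end) = {final_weak:.4f}")
print(f"strong pulse -> x(t_end) = {final_strong:.4f}")
```

With these assumed parameters the unstimulated system has stable states near zero and near 0.9, separated by an unstable threshold at 0.5. In this sketch only a reduction of the feedback strength or an increase in the removal rate (the role played by negative feedback in the text) would return the strongly stimulated system to the low state.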
17.2.6
Prospects and Challenges
With our ever-expanding knowledge of signal transduction mechanisms, it is
envisioned that complex kinetic models incorporating all major intracellular
pathways will be constructed. In tandem, stochastic simulations accounting
for the full diversity of molecular interactions and intracellular compartments
will allow researchers to visualize, at the single-molecule level, the sequence
of signaling complex assembly and the local and global activation of signaling
pathways that follows. Another exciting frontier is the linkage of signaling
dynamics with control of the cytoskeleton, which will require an appreciation
of both kinetics and mechanics, and yet another is the interface with gene
regulatory networks and genomic data. In terms of implementation, the question
is not whether such efforts are feasible; indeed, efforts along these lines are
well underway. Rather, the real test will be to extract mechanistic insights that
allow one to predict or at least explain the outcomes of specific experiments.
17.2.6.1 Limitations of Complex Models

If the field is to move toward more complicated models that include multiple
pathways and cell stimuli, a number of hurdles must be overcome. First and
perhaps foremost, one must choose a model structure that relates to molecular
mechanisms that may not be known completely, and so it is inevitable
that complex models will include controversial elements. Like conceptual
models of signaling mechanisms, quantitative models will need to be refined
and/or revised in the light of new findings, but then the model bears the
burden of showing whether earlier predictions and analyses remain valid.
Second, a fundamental problem with complex models is that they require the
specification of an increasing number of parameter (e.g., rate constant) values;
even when such values are obtained from the literature or from best-fits to
available data sets, it must be recognized that there is a great deal of uncertainty
associated with this exercise. In the best-case scenario, the model would be
validated by direct comparison with quantitative measurements that assess
multiple intermediates activated under the same stimulation conditions, and
even then a sensitivity analysis will be warranted to identify those parameter
values that drive the quality of fit; in spite of the vast literature on signaling
mechanisms, the field is currently limited by the availability of such data.
Model generality is a related issue; it seems unlikely that a model that was
trained on one cellular context will transfer well to the analysis of other systems.
Finally, more comprehensive models can be cumbersome to work with, and
how one might approach the analysis depends on the specific question(s)
being asked. In response, it has been suggested that one might build models
from smaller process modules, which might be analyzed individually and in
the context of other modules [87, 88]. Software packages such as Virtual Cell
(http://www.nrcam.uchc.edu/) [89] have been developed for the purpose of
linking models together in a seamless and interactive way.
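
The sensitivity analysis mentioned above can be sketched as follows: each parameter is perturbed in turn and the fractional change in the model output is recorded. The toy steady-state model, the parameter names, and their values are hypothetical and serve only to illustrate the procedure.

```python
import math

def steady_state_output(p):
    """Hypothetical toy model: receptor occupancy drives an activation cycle."""
    occupancy = p["L"] / (p["K_D"] + p["L"])
    k_on_eff = p["k_act"] * p["R_total"] * occupancy  # effective activation rate
    return k_on_eff / (k_on_eff + p["k_deact"])       # active fraction of target

def log_sensitivities(model, params, rel_step=1e-4):
    """Normalized local sensitivities d ln(output) / d ln(parameter)."""
    result = {}
    for name, value in params.items():
        up = dict(params)
        up[name] = value * (1.0 + rel_step)
        down = dict(params)
        down[name] = value * (1.0 - rel_step)
        result[name] = (math.log(model(up)) - math.log(model(down))) / (
            math.log(1.0 + rel_step) - math.log(1.0 - rel_step))
    return result

params = {"L": 1.0, "K_D": 1.0, "k_act": 0.01, "R_total": 100.0, "k_deact": 1.0}
s = log_sensitivities(steady_state_output, params)
for name in sorted(s, key=lambda k: -abs(s[k])):
    print(f"{name:8s} {s[name]:+.4f}")
```

In this sketch `k_act` and `R_total` have identical sensitivities because they enter the model only as a product. That is one simple sign that the two values cannot be estimated separately from such an output and could be lumped into a single parameter.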
17.2.6.2 Model Compression and Integration

The issues of model structure, parameter estimation, generality, and
modularity all point to the continued need for detailed analyses of smaller
models that focus on a particular aspect of the system, in conjunction with
focused, quantitative experiments. While the modular strategy described above
will no doubt become increasingly valuable as efforts are made to link the
models, we offer here an approach that is similar in spirit yet distinct in
one important respect; that is, once a submodel has been formulated and
analyzed in full-blown mechanistic detail, we favor a compression step whereby
the submodel is simplified by lumping parameters and processes to the extent
where it retains its basic features (as illustrated in Fig. 17.2-6). Classically, this
is achieved through a consideration of fast and slow kinetic processes, perhaps
with input from sensitivity analysis. We argue that such a coarse-graining
approach is forgiving with respect to the choices made in the submodel
formulation and facilitates the process of submodel integration; one might
initially explore the phenomenological behavior of the higher-level model
with fewer parameters to specify, simplifying the sensitivity analysis and
portability to other systems. The simplifying assumptions used to condense
each submodel may be reevaluated at any time, and it would be expected
that some findings would prompt a revision of the submodel, while others
will simply reveal accessory processes that modulate the existing lumped
parameters.

Fig. 17.2-6 Compression of a signaling module. As an illustrative example of
model compression, we consider the activation of PDGF receptor as a module to
be embedded in a model of cell response to PDGF. (a) Model schematic (adapted
from Ref. 19). Our previous model accounted for PDGF receptor binding,
dimerization, and internalization; in addition to the processes shown here, we
have added basal receptor turnover, synthesis, and recycling. (b) The complete
kinetic model is posed in terms of ordinary differential equations according to
the laws of mass action. There are 10 adjustable rate constants in this model.
(c) It is assumed that the ultimate cell responses are slow relative to the
processes considered here, hence we assume a steady or pseudosteady state. The
simplifications shown here further assume that the processes described by the
rate constants k- and k,,,,, are much faster than those describing the initial
receptor binding or receptor trafficking. (d) One could stop at this stage and
simplify, or make further assumptions. The simplified receptor balance shown
here, with R(0) defined as the cell surface receptor number prior to ligand
stimulation (R(0) = (Vs/kt)(1 + krec/kdeg)), assumes that k, >> k,, kt
(pseudoequilibrium, with KD = k,/kf). (e) The simplified equations are used to
solve for C2, the number of functional signaling complexes, in terms of only
three lumped parameters. If one is interested only in the shape of the
dose-response curve, one might normalize the ligand concentration, [L], and C2
(by KD and V,/k,, respectively; alternatively, C2 could be normalized by its
maximum value, taken at 6 = 1). In that case, the normalized dose-response
curve would be determined by a single parameter, Kx.
17.2.7
Concluding Remarks
Quantitative models, in conjunction with quantitative experimentation, are
being used to evaluate biochemical signaling mechanisms, predict the
outcomes of novel experiments, and generate nonintuitive insights and
hypotheses warranting further study. Generalized and pathway-specific models
have elucidated relationships between molecular properties and kinetics of
signaling responses, incorporating spatial information where appropriate. The
lessons learned from smaller, “reductionist” models have been significant,
and one of the challenges we now face is how best to integrate such models to
analyze complex intracellular systems.
Acknowledgments
Support from the NIH (R01-GM067739), NSF (#0133594), and Office of Naval
Research (N00014-03-1-0594) is gratefully acknowledged.
References
1. T. Hunter, Signaling - 2000 and beyond, Cell 2000, 100, 113-127.
2. D. Bray, Reductionism for biochemists: how to survive the protein jungle, Trends Biochem. Sci. 1997, 22, 325-326.
3. B.M. Slepchenko, J.C. Schaff, J.H. Carson, L.M. Loew, Computational cell biology: spatiotemporal simulation of cellular events, Annu. Rev. Biophys. Biomol. Struct. 2002, 31, 423-441.
4. W.S. Hlavacek, J.R. Faeder, M.L. Blinov, A.S. Perelson, B. Goldstein, The complexity of complexes in signal transduction, Biotechnol. Bioeng. 2003, 84, 783-794.
5. A. Levchenko, Dynamical and integrative cell signaling: challenges for the new biology, Biotechnol. Bioeng. 2003, 84, 773-782.
6. J.J. Tyson, K.C. Chen, B. Novak, Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell, Curr. Opin. Cell Biol. 2003, 15, 221-231.
7. N.J. Eungdamrong, R. Iyengar, Computational approaches for modeling regulatory cellular networks, Trends Cell Biol. 2004, 14, 661-669.
8. H.M. Sauro, B.N. Kholodenko, Quantitative analysis of signaling networks, Prog. Biophys. Mol. Biol. 2004, 86, 5-43.
9. C. Wofsy, B. Goldstein, K. Lund, H.S. Wiley, Implications of epidermal growth factor (EGF) induced EGF receptor aggregation, Biophys. J. 1992, 63, 98-110.
10. S.G. Chamberlin, D.E. Davies, A unified model of c-erbB receptor homo- and heterodimerisation, Biochim. Biophys. Acta 1998, 1384, 223-232.
11. B.N. Kholodenko, O.V. Demin, G. Moehren, J.B. Hoek, Quantification of short term signaling by the epidermal growth factor receptor, J. Biol. Chem. 1999, 274, 30169-30181.
12. P. Klein, D. Mattoon, M.A. Lemmon, J. Schlessinger, A structure-based model for ligand binding and dimerization of EGF receptors, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 929-934.
13. B.S. Hendriks, G. Orr, A. Wells, H.S. Wiley, D.A. Lauffenburger, Parsing ERK activation reveals quantitatively equivalent contributions from epidermal growth factor receptor and HER2 in human mammary epithelial cells, J. Biol. Chem. 2005, 280, 6157-6169.
14. S. Wanant, M.J. Quon, Insulin receptor binding kinetics: modeling and simulation studies, J. Theor. Biol. 2000, 205, 355-364.
15. K.E. Forsten, M. Fannon, M.A. Nugent, Potential mechanisms for the regulation of growth factor binding by heparin, J. Theor. Biol. 2000, 205, 215-230.
16. K. Forsten-Williams, C.C. Chua, M.A. Nugent, The kinetics of FGF-2 binding to heparan sulfate proteoglycans and MAP kinase signaling, J. Theor. Biol. 2005, 233, 483-499.
17. C. Wofsy, B.M. Vonakis, H. Metzger, B. Goldstein, One Lyn molecule is sufficient to initiate phosphorylation of aggregated high-affinity IgE receptors, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 8615-8620.
18. J.R. Faeder, W.S. Hlavacek, I. Reischl, M.L. Blinov, H. Metzger, A. Redondo, C. Wofsy, B. Goldstein, Investigation of early events in FcεRI-mediated signaling using a detailed mathematical model, J. Immunol. 2003, 170, 3769-3781.
19. C.S. Park, I.C. Schneider, J.M. Haugh, Kinetic analysis of platelet-derived growth factor receptor/phosphoinositide 3-kinase/Akt signaling in fibroblasts, J. Biol. Chem. 2003, 278, 37064-37072.
20. M.M. Ilondo, A.B. Damholt, B.C. Cunningham, J.A. Wells, P. De Meyts, R.M. Shymko, Receptor dimerization determines the effects of growth hormone in primary rat adipocytes and cultured human IM-9 lymphocytes, Endocrinology 1994, 134, 2397-2403.
21. J.M. Haugh, Mathematical model of human growth hormone (hGH)-stimulated cell proliferation explains the efficacy of hGH variants as receptor agonists or antagonists, Biotechnol. Prog. 2004, 20, 1337-1344.
22. B. Goldstein, D. Jones, I.G. Kevrekidis, A.S. Perelson, Evidence for p55-p75 heterodimers in the absence of IL-2 from Scatchard plot analysis, Int. Immunol. 1992, 4, 23-32.
23. H.S. Wiley, D.D. Cunningham, A steady state model for analyzing the cellular binding, internalization and degradation of polypeptide ligands, Cell 1981, 25, 433-440.
24. C. Starbuck, D.A. Lauffenburger, Mathematical model for the effects of epidermal growth factor receptor trafficking dynamics on fibroblast proliferation responses, Biotechnol. Prog. 1992, 8, 132-143.
25. A.R. French, D.A. Lauffenburger, Intracellular receptor/ligand sorting based on endosomal retention components, Biotechnol. Bioeng. 1996, 51, 281-297.
26. E.M. Fallon, D.A. Lauffenburger, Computational model for effects of ligand/receptor binding properties on interleukin-2 trafficking dynamics and T cell proliferation response, Biotechnol. Prog. 2000, 16, 905-916.
27. C.A. Sarkar, D.A. Lauffenburger, Cell-level pharmacokinetic model of granulocyte colony-stimulating factor: implications for ligand lifetime and potency in vivo, Mol. Pharmacol. 2003, 63, 147-158.
28. J.M. Haugh, D.A. Lauffenburger, Analysis of receptor internalization as a mechanism for modulating signal transduction, J. Theor. Biol. 1998, 195, 187-218.
29. B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles, G. Muller, Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors, Nat. Biotechnol. 2002, 20, 370-375.
30. J.M. Haugh, I.C. Schneider, J.M. Lewis, On the cross-regulation of protein tyrosine phosphatases and receptor tyrosine kinases in intracellular signaling, J. Theor. Biol. 2004, 230, 119-132.
31. R.G. Posner, C. Wofsy, B. Goldstein, The kinetics of bivalent ligand-bivalent receptor aggregation: ring formation and the breakdown of the equivalent site approximation, Math. Biosci. 1995, 126, 171-190.
32. U.S. Bhalla, R. Iyengar, Emergent properties of networks of biological signaling pathways, Science 1999, 283, 381-387.
33. J.M. Haugh, A.C. Huang, H.S. Wiley, A. Wells, D.A. Lauffenburger, Internalized epidermal growth factor receptors participate in the activation of p21ras in fibroblasts, J. Biol. Chem. 1999, 274, 34350-34360.
34. J.M. Haugh, A. Wells, D.A. Lauffenburger, Mathematical modeling of epidermal growth factor receptor signaling through the phospholipase C pathway: mechanistic insights and predictions for molecular interventions, Biotechnol. Bioeng. 2000, 70, 225-238.
35. T.W. McKeithan, Kinetic proofreading in T-cell receptor signal transduction, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 5042-5046.
36. W.S. Hlavacek, A. Redondo, C. Wofsy, B. Goldstein, Kinetic proofreading in receptor-mediated transduction of cellular signals: receptor aggregation, partially activated receptors, and cytosolic messengers, Bull. Math. Biol. 2002, 64, 887-911.
37. P.A. Gonzalez, L.J. Carreno, D. Coombs, J.E. Mora, E. Palmieri, B. Goldstein, S.G. Nathenson, A.M. Kalergis, T cell receptor binding kinetics required for T cell activation depend on the density of cognate ligand on the antigen-presenting cell, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4824-4829.
38. C. Wofsy, D. Coombs, B. Goldstein, Calculations show substantial serial engagement of T cell receptors, Biophys. J. 2001, 80, 606-612.
39. D. Coombs, A.M. Kalergis, S.G. Nathenson, C. Wofsy, B. Goldstein, Activated TCRs remain marked for internalization after dissociation from pMHC, Nat. Immunol. 2002, 3, 926-931.
40. C.C. Fink, B. Slepchenko, I.I. Moraru, J. Schaff, J. Watras, L.M. Loew, Morphological control of inositol-1,4,5-trisphosphate-dependent signals, J. Cell Biol. 1999, 147, 929-935.
41. J.C. Schaff, B.M. Slepchenko, Y.S. Choi, J. Wagner, D. Resasco, L.M. Loew, Analysis of nonlinear dynamics on arbitrary geometries with the virtual cell, Chaos 2001, 11, 115-131.
42. S.Y. Shvartsman, Shooting from the hip: spatial control of signal release by intracellular waves, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 9087-9089.
43. B.N. Kholodenko, G.C. Brown, J.B. Hoek, Diffusion control of protein phosphorylation in signal transduction pathways, Biochem. J. 2000, 350, 901-907.
44. B.N. Kholodenko, MAP kinase cascade signaling and endocytic trafficking: a marriage of convenience? Trends Cell Biol. 2002, 12, 173-177.
45. I.V. Maly, H.S. Wiley, D.A. Lauffenburger, Self-organization of polarized cell signaling via autocrine circuits: computational model analysis, Biophys. J. 2004, 86, 10-22.
46. A. Gierer, H. Meinhardt, A theory of biological pattern formation, Kybernetik 1972, 12, 30-39.
47. M. Postma, P.J.M. Van Haastert, A diffusion-translocation model for gradient sensing by chemotactic cells, Biophys. J. 2001, 81, 1314-1323.
48. A. Levchenko, P.A. Iglesias, Models of eukaryotic gradient sensing: application to chemotaxis of amoebae and neutrophils, Biophys. J. 2002, 82, 50-63.
49. K.K. Subramanian, A. Narang, A mechanistic model for eukaryotic gradient sensing: spontaneous and induced phosphoinositide polarization, J. Theor. Biol. 2004, 231, 49-67.
50. L. Ma, C. Janetopoulos, L. Yang, P.N. Devreotes, P.A. Iglesias, Two complementary, local excitation, global inhibition mechanisms acting in parallel can explain the chemoattractant-induced regulation of PI(3,4,5)P3 response in Dictyostelium cells, Biophys. J. 2004, 87, 3764-3774.
51. J.M. Haugh, F. Codazzi, M. Teruel, T. Meyer, Spatial sensing in fibroblasts mediated by 3' phosphoinositides, J. Cell Biol. 2000, 151, 1269-1279.
52. J.M. Haugh, I.C. Schneider, Spatial analysis of 3' phosphoinositide signaling in living fibroblasts: I. Uniform stimulation model and bounds on dimensionless groups, Biophys. J. 2004, 86, 589-598.
53. G. Adam, M. Delbrück, Reduction of dimensionality in biological diffusion processes, in Structural Chemistry and Molecular Biology, (Eds.: A. Rich, N. Davidson), W.H. Freeman and Co., San Francisco, 1968, 198-215.
54. H.C. Berg, E.M. Purcell, Physics of chemoreception, Biophys. J. 1977, 20, 193-219.
55. L.D. Shea, G.M. Omann, J.J. Linderman, Calculation of diffusion-limited kinetics for the reactions in collision coupling and receptor cross-linking, Biophys. J. 1997, 73, 2949-2959.
56. J.M. Haugh, A unified model for signal transduction reactions in cellular membranes, Biophys. J. 2002, 82, 591-604.
57. H. Berry, Monte Carlo simulations of enzyme reactions in two dimensions: fractal kinetics and spatial segregation, Biophys. J. 2002, 83, 1891-1901.
58. P.J. Woolf, J.J. Linderman, Untangling ligand induced activation and desensitization of G-protein-coupled receptors, Biophys. J. 2003, 84, 3-13.
59. M.J. Saxton, K. Jacobson, Single-particle tracking: applications to membrane dynamics, Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 373-399.
60. L.D. Shea, J.J. Linderman, Compartmentalization of receptors and enzymes affects activation for a collision coupling mechanism, J. Theor. Biol. 1998, 191, 249-258.
61. K. Ritchie, X. Shan, J. Kondo, K. Iwasawa, T. Fujiwara, A. Kusumi, Detection of non-brownian diffusion in the cell membrane in single molecule tracking, Biophys. J. 2005, 88, 2266-2277.
62. D. Bray, Intracellular signaling as a parallel distributed process, J. Theor. Biol. 1990, 143, 215-231.
63. B.N. Kholodenko, J.B. Hoek, H.V. Westerhoff, G.C. Brown, Quantification of information transfer via cellular signal transduction pathways, FEBS Lett. 1997, 414, 430-434.
64. A. Goldbeter, D.E. Koshland Jr, An amplified sensitivity arising from covalent modification in biological systems, Proc. Natl. Acad. Sci. U.S.A. 1981, 78, 6840-6844.
65. A. Goldbeter, D.E. Koshland Jr, Ultrasensitivity in biochemical systems controlled by covalent modification: interplay between zero-order and multistep effects, J. Biol. Chem. 1984, 259, 14441-14447.
66. J.E. Ferrell Jr, Tripping the switch fantastic: how a protein kinase cascade can convert graded inputs into switch-like outputs, Trends Biochem. Sci. 1996, 21, 460-466.
67. J.M. Haugh, D.A. Lauffenburger, Physical modulation of intracellular signaling processes by locational regulation, Biophys. J. 1997, 72, 2014-2031.
68. J.E. Ferrell Jr, How regulated protein translocation can produce switch-like responses, Trends Biochem. Sci. 1998, 23, 461-465.
69. A. Levchenko, J. Bruck, P.W. Sternberg, Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 5818-5823.
70. R. Heinrich, B.G. Neel, T.A. Rapoport, Mathematical models of protein kinase signal transduction, Mol. Cell 2002, 9, 957-970.
71. V.K. Mutalik, A.P. Singh, J.S. Edwards, K.V. Venkatesh, Robust global sensitivity in multiple enzyme cascade system explains how the downstream cascade structure may remain unaffected by cross-talk, FEBS Lett. 2004, 558, 79-84.
72. J.E. Ferrell Jr, Self-perpetuating states in signal transduction: positive feedback, double-negative feedback and bistability, Curr. Opin. Cell Biol. 2002, 14, 140-148.
73. H. Resat, J.A. Ewald, D.A. Dixon, H.S. Wiley, An integrated model of epidermal growth factor receptor trafficking and signal transduction, Biophys. J. 2003, 85, 730-743.
74. C.F. Huang, J.E. Ferrell Jr, Ultrasensitivity in the mitogen-activated protein kinase cascade, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 10078-10083.
75. W.R. Burack, T.W. Sturgill, The activating dual phosphorylation of MAPK by MEK is nonprocessive, Biochemistry 1997, 36, 5929-5933.
76. F.A. Brightman, D.A. Fell, Differential feedback regulation of the MAPK cascade underlies the quantitative differences in EGF and NGF signalling in PC12 cells, FEBS Lett. 2000, 482, 169-174.
77. B.N. Kholodenko, Negative feedback and ultrasensitivity can bring about oscillations in the mitogen-activated protein kinase cascades, Eur. J. Biochem. 2000, 267, 1583-1588.
78. A.R. Asthagiri, D.A. Lauffenburger, A computational study of feedback effects on signal dynamics in a mitogen-activated protein kinase (MAPK) pathway model, Biotechnol. Prog. 2001, 17, 227-239.
79. S. Sasagawa, Y. Ozaki, K. Fujita, S. Kuroda, Prediction and validation of the distinct dynamics of transient and sustained ERK activation, Nat. Cell Biol. 2005, 7, 365-373.
80. C. Xu, J. Watras, L.M. Loew, Kinetic analysis of receptor-activated phosphoinositide turnover, J. Cell Biol. 2003, 161, 779-791.
81. A. Hoffmann, A. Levchenko, M.L. Scott, D. Baltimore, The IκB-NF-κB signaling module: temporal control and selective gene activation, Science 2002, 298, 1241-1245.
82. S. Yamada, S. Shiono, A. Joo, A. Yoshimura, Control mechanism of JAK/STAT signal transduction pathway, FEBS Lett. 2003, 534, 190-196.
83. K. Lai, M.J. Robertson, D.V. Schaffer, The sonic hedgehog signaling system as a bistable genetic switch, Biophys. J. 2004, 86, 2748-2757.
84. B.N. Kholodenko, A. Kiyatkin, F.J. Bruggeman, E. Sontag, H.V. Westerhoff, J.B. Hoek, Untangling the wires: a strategy to trace functional interactions in signaling and gene networks, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12841-12846.
85. M. Hatakeyama, S. Kimura, T. Naka, T. Kawasaki, N. Yumoto, M. Ichikawa, J. Kim, K. Saito, M. Saeki, M. Shirouzu, S. Yokoyama, A. Konagaya, A computational model on the modulation of mitogen-activated protein kinase (MAPK) and Akt pathways in heregulin-induced ErbB signalling, Biochem. J. 2003, 373, 451-463.
86. U.S. Bhalla, P.T. Ram, R. Iyengar, MAP kinase phosphatase as a locus of flexibility in a mitogen-activated protein kinase signaling network, Science 2002, 297, 1018-1023.
87. G. Weng, U.S. Bhalla, R. Iyengar, Complexity in biological signaling systems, Science 1999, 284, 92-96.
88. A.R. Asthagiri, D.A. Lauffenburger, Bioengineering models of cell signaling, Annu. Rev. Biomed. Eng. 2000, 2, 31-53.
89. J. Schaff, C.C. Fink, B. Slepchenko, J.H. Carson, L.M. Loew, A general computational framework for modeling cellular structure and function, Biophys. J. 1997, 73, 1135-1146.
18
Genome and Proteome Studies
18.1
Genome-wide Gene Expression Analysis: Practical Considerations and
Application to the Analysis of T-cell Subsets in Inflammatory Diseases
Lars Rogge and Elisabetta Bianchi
Outlook
The scope of this chapter is twofold. We will first review some important
conceptual and technical issues related to experiment design that we feel
should be addressed while designing studies using microarrays. In the second
part, we will illustrate how this technology can be employed practically to
promote insight into a specific biological field, by reviewing several studies
that address the molecular basis of inflammatory diseases using gene profiling.
We will focus on the gene expression analysis of T-lymphocyte subsets, the
key players in several inflammatory diseases.
18.1.1
Introduction
The concept of systems biology is to use a holistic approach to understand
the function of an organism. This approach involves a large-scale analysis of
the interplay of the constituents of the organism using genetics, genomics,
and proteomics. Systems biology would have remained an illusion without the
significant progress that has been made in each of the three fields mentioned
above. Genomic-scale gene expression profiling has developed from its infancy
in the mid-1990s into a robust tool used currently in many laboratories and now
has increasing impact on biological and biomedical research. This technology
is based on the development of the so-called microarrays. Microarrays consist
of an ordered array of DNA sequences on a solid support that allows measuring
the expression level of many genes in parallel. The technology can reveal the
physiology of cells and tissues on an unprecedented scale by quantitating the
mRNA levels of tens of thousands of genes [1].
The amount of data generated by microarray experiments cannot be handled
by simple sorting in spreadsheets or by plotting on graphs. Microarray data
analysis has recently developed as a separate field, with growing contributions
from mathematicians developing dedicated algorithms and tools [2-4]. Sophisticated
computational tools are now available, but it should be noted that a basic
understanding of these tools is required for meaningful data analysis.
18.1.2
History/Development
Gene expression profiling using microarrays is a relatively new technology.
Initially, global gene expression studies relied mainly on two technologies:
spotted complementary DNA (cDNA) microarrays and commercial high-
density oligonucleotide microarrays generated by light-directed, chemical
synthesis [5, 6] (see Refs. 7 and 8 for reviews of the two technologies). It
is of interest to note that the technology of light-directed, chemical synthesis
was initially developed for the parallel synthesis of multiple peptides (e.g.,
for the identification of epitopes of monoclonal antibodies) [9], then applied
to the parallel synthesis of oligonucleotides for rapid DNA sequence analysis
(e.g., of HIV or other pathogens) [10], before it was commercialized for
the monitoring of gene expression [11]. Currently, in addition to the two
technologies described above, custom-designed and commercial platforms
using “long” oligonucleotides (approximately 60 nucleotides) are increasingly
used.
Apart from the development of dedicated production technology,
microarray technology depends on knowledge of the transcriptome
(cDNA sequences) of the respective organism. In the early days, microarrays
contained large amounts of expressed sequence tags (ESTs), whose origins
and significance were sometimes dubious. The scarce annotation of ESTs
sometimes turned the biological interpretation of microarray experiments
into a nightmare. The availability of the draft sequence of the human
genome represented a milestone in the development of microarray technology.
The notion that humans have “only” approximately 30 000 genes made it
technically possible to design microarrays that could measure the expression
levels of all human genes on a single chip. In addition, the published
draft sequence allowed the control of the cDNA sequences represented on
microarrays and resulted in a much higher quality of both custom-made and
commercial microarrays. The recent publication of the finished euchromatic
sequence of the human genome [12] will certainly result in a further refinement
of this technology. Currently, custom-made and commercial microarrays
typically interrogate the expression levels of approximately 30 000 human
genes, although the international human genome sequencing consortium

predicts “only” 20 000-25 000 protein-coding genes. This discrepancy
indicates that it may still take some time to further improve this technology.
Nevertheless, it is fair to say that genome-wide gene expression analysis has
developed in only 10 years from a splendid idea into a robust tool.
18.1.3
18.1.3.1 Issues in Experimental Design

Array experiments are still far from being inexpensive, both in terms of
reagents and time. Careful design of these experiments is therefore essential
to optimize information retrieval, in particular, in studies involving primary
human samples, which have to take into account the limitations imposed
by restricted availability of sample material and the high donor-to-donor
variability. Two basic experimental designs are possible: in two-fluorescence
methods, the two samples to be compared are labeled with two different
dyes and hybridized to the same array, allowing direct comparison of gene
expression levels; in one-fluorescence methods, each sample is hybridized to a
separate array, and differences in gene expression levels between samples are
determined by comparison with a common reference sample (Fig. 18.1-1).
18.1.3.1.1 Reference Sample

Microarray experiments are often employed to determine relative fold
differences in gene expression levels between different experimental samples.
The reference sample is the one to which the other samples are compared. For
one-color platforms, in which each sample is hybridized to a separate array,
the choice of the sample of reference is quite flexible, and can be performed
after the experiment is carried out. For technologies in which two extracts
are hybridized to the same array, the choice of the reference sample has to
be included in the experimental design. The direct comparison between two
samples (e.g., tumor vs. normal sample) reduces variations in measurements,
providing a more accurate representation of expression changes. A method for
optimizing direct comparisons, the loop design, has been proposed by Kerr and
Churchill [13]. In loop design studies, samples are systematically compared
with each other, an approach that allows the generation of more relevant data
and of very precise assessment of gene expression levels. A drawback of this
approach is its limited flexibility, since extension of these studies to include
additional samples calls for a redesign of the experiment and rapidly growing
requirements for larger amounts of RNA and microarrays. In addition, with
this study design, the efficiency of estimation of gene expression levels is
greatly compromised by the loss of just one sample.
The use of a common reference sample allows the comparison of data from
multiple arrays, and, ideally, from multiple experiments or laboratories that use
Fig. 18.1-1 Global gene expression studies rely mainly on two technologies:
spotted complementary DNA (cDNA) microarrays (a) and oligonucleotide
microarrays (b). The first type of microarray is generated by robotic spotting
of cDNA fragments for defined genes on a glass slide, in an ordered fashion. In
general, each gene is represented by a double-stranded DNA probe (up to 1 kb)
that is usually generated by polymerase chain reaction (PCR) amplification.
Current technology allows the deposition of more than 10 000 genes on a single
slide. High-density oligonucleotide arrays are generated by in situ synthesis
of short oligonucleotides (25-mers) on a glass slide. A sophisticated process
developed in the semiconductor industry, termed photolithography, is used to
synthesize approximately 1 300 000 distinct oligonucleotide features in defined
places on a chip. In contrast to spotted cDNA arrays, each gene is represented
by 11 to 20 pairs of oligonucleotides on a single chip. This allows the design
of oligonucleotide probes that hybridize to a specific exon of a given gene.
More recently, custom-designed and commercial platforms using "long"
oligonucleotides (60-mers) are increasingly used.
To generate hybridization targets, RNA is extracted from the tissue of interest
and mRNA is reverse transcribed into cDNA. In protocols used mainly for spotted
cDNA arrays, fluorescently labeled nucleotides are incorporated into the cDNA
during this step. In other protocols used mainly for high-density
oligonucleotide arrays, a biotin-labeled cRNA target is generated by
transcribing the double-stranded cDNA target with T7 RNA polymerase. This last
step also results in a linear amplification (approximately 50-fold) of the
material. In both cases, the labeled target cDNA or cRNA is hybridized to the
array, and the intensity of hybridization to individual cDNA fragments or
oligonucleotides on the array is revealed by a high-resolution scanner. The
hybridization signal is then used to determine the expression level of each
gene represented on the array.
the same reference, making it easier to build common databases of microarray

data. The desirable characteristics of a reference sample are that it should be
homogeneous, available in large quantities, and stable over time. Frequently
used reference samples are genomic DNA or RNA from different cell lines that
have been pooled to obtain coverage of all expressed genes [14, 15]. In a study to
compare a direct two-dye measurement (where two samples are hybridized to
the same array) with a common reference measurement (where each sample is
hybridized to a separate array), Park et al. found a high correlation between the
two settings, suggesting that multiple comparisons of experimental conditions
using a common control can achieve a satisfactory degree of accuracy [16].
18.1.3.1.2 Replication and Sample Size

Microarray technology is very powerful, but quite noisy - and this characteris-
tic should be taken into account while planning array experiments. Replication
is a good approach to decrease the effects of variability. Technical replicates
(such as multiple hybridizations performed with the same RNA sample) can be
used to assess the experimental noise of the system and to ensure quality con-
trol of the experiment. Technical replicates that have entered common practice
include dye swapping for experiments in which two extracts are hybridized to
the same array. In this case, it is recommended to repeat sample hybridization
by inverting the dyes that label the samples. This expedient is commonly
employed to control gene-specific dye bias [17-19]. Another common example
of technical replication is the presence, on the array, of multiple probes that
identify the same transcript. Reporter sequence replication may provide the
additional advantage of facilitating cross-platform comparison of data, which
requires adequate matching of corresponding probe sets and may be optimally
performed by matching the sequence of the probes present on the different
microarrays, rather than the genes represented [20]. It is generally agreed
that experimental variations due to technical aspects of the process (such as
cDNA and cRNA synthesis or chip hybridization) do not constitute the major
source of variability of microarray experiments, which is instead provided
by the natural variability of gene expression levels, with variations among
samples obtained from different individuals being the most pronounced. This
variability is most effectively addressed by the use of biological replicates (e.g.,
mRNA from different extractions or from multiple biological samples) [21,
22]. The importance of replicate microarray experiments has been emphasized
in a study addressing the natural differences of gene expression in inbred
mouse strains [23]. The authors used a 5406-clone spotted cDNA microarray
to quantitate transcript levels in the kidney, liver, and testis from each of
six normal male C57BL6 mice. Analysis of variance (ANOVA) was used to
compare the variance across the six mice to the variance among four replicate
experiments performed for each tissue. The conspicuous finding was that
statistically significant variable gene expression was detected for 3.3, 1.9, and
0.8% of the genes in the kidney, testis, and liver, respectively [23]. Importantly,
many of the transcripts that were found to be most variable were immune-
modulated, stress-induced, and hormonally regulated genes. Pritchard et al.
point out that genetically diverse populations such as humans are very likely to
show an even greater variability in gene expression than inbred mice [23]. This
suggests that a meaningful study of the outbred human population will require
many replicate experiments and/or an extensive characterization of normal
variability, to discriminate between informative variations in gene expression
and effects due to uncontrolled variables. The estimation of adequate sample

size for microarray studies takes into account several factors, including the
variability of the population, the desired detectable fold differences in gene
expression, the power (probability) to detect differences, and the acceptable
error rate [24-28]. A number of papers provide computational methods or
orientation tables to help determine the desirable number of replicates to be
included in a statistically significant study. A general and sobering conclusion
that derives from many of these calculations is that the number of samples
required for a reasonably informative experiment is much larger than the
number commonly used in human microarray case-control studies [25].
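As a rough illustration of how these factors interact, the sketch below uses the
textbook normal-approximation formula for comparing two groups on the log2
scale, n = 2 (z(1-alpha/2) + z(power))^2 sigma^2 / delta^2 per group. It is not
the method of any of the papers cited above; the function name and the example
values of the standard deviation, fold change, and error rate are illustrative
assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(sigma_log2, fold_change, alpha=0.001, power=0.9):
    """Approximate number of biological replicates needed per group.

    sigma_log2  : assumed standard deviation of log2 expression across individuals
    fold_change : smallest fold difference one wishes to detect (e.g. 2.0)
    alpha       : acceptable two-sided per-gene false-positive rate
    power       : desired probability of detecting a true difference
    """
    delta = math.log2(fold_change)                 # effect size on the log2 scale
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the error rate
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    n = 2 * ((z_alpha + z_beta) * sigma_log2 / delta) ** 2
    return math.ceil(n)


# Hypothetical inputs: SD of 1.0 (log2 units), twofold change, alpha 0.05, power 0.8
print(samples_per_group(1.0, 2.0, alpha=0.05, power=0.8))
```

With these assumed inputs the formula gives 16 samples per group. A stricter
per-gene error rate, as is needed when tens of thousands of genes are tested in
parallel, or a more variable population, pushes the number higher.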
18.1.3.1.3 Pooling of RNA Samples

Messenger RNA is often pooled in microarray experiments, either because
of the impossibility of obtaining sufficient material from a single individual
or to reduce costs, by reducing the number of microarrays hybridized. The
effect of pooling on data quality is still debated in the literature. Pooling
can be useful in reducing the variability in individual samples induced by
experimental artifacts or by sample inhomogeneity [21]. However, a serious
drawback of pooling is the loss of information regarding population variability,
and therefore pooling should not be used if inferences are sought for single
subjects. This is typically the case of studies aimed at identifying gene profiles
that classify individual subjects and predict their membership in classes
(e.g., cancer patients vs. normal patients, or distinguishing cancer subsets).
An additional disadvantage of pooling is the inability to detect outliers and
possibly remove them from the analysis.
It has been proposed that appropriate RNA pooling can provide adequate
statistical power and improve the efficiency and cost-effectiveness for many
types of microarray experiments when inferences are made at the group level
[29]. In particular, for small experimental designs, in which only few arrays are
available for each biological condition, pooling could actually improve accuracy
[30]. For larger designs that include several biological replicates, pooling is
not usually advantageous. Pooling extra subjects on a fixed number of arrays
decreases slightly the variability across experiments, at the price of loss of
individual information. As pooling is often taken into consideration to reduce
the number of arrays (and therefore the costs) of an experiment, it should be
noted that to maintain accuracy, the number of subjects analyzed must be
greatly increased, and that the added expense of additional samples for the
pooled design may outweigh the benefit of saving on microarray cost [30, 31].
18.1.3.1.4 RNA Amplification

An alternative approach to pooling, in the analysis of small samples, is
RNA amplification. In particular, this approach has been successfully used to
derive enough RNA from sources such as laser capture microdissection
of solid tissues (Refs. 32-36 and references therein). King et al. found
that gene expression measurements from small sample RNA are not
really equivalent to measurements from standard sample RNA, possibly
because of amplification failure of low-abundance transcripts and sequence-
specific differences in amplification efficiency. They, however, concluded
that biological variability in gene expression between independent samples
is greater than the technical variability associated with the amplification
process [36]. Some amplification methods have been shown to have
reproducible bias (such as overrepresentation of T-rich sequences), related
to the amount of starting material and to the number of amplification
cycles. Underrepresentation of mRNA with extensive secondary structure
may be partially resolved by performing the reverse transcription step at
higher temperatures [37]. Comparisons between amplified and nonamplified
samples show that the best correlations of expression levels are obtained for
abundant transcripts [38].
The choice of the amplification protocol may be important in determining the
quality and robustness of the results, as even small variations in methodology
introduce considerable distortion of gene expression profiles. Klur et al. have
focused on procedures in which a double-stranded cDNA produced from total
RNA is used as a template to generate a labeled cRNA, and have compared
random PCR amplification, which includes a PCR amplification step at the
double-stranded cDNA level and linear amplification, consisting of two cycles
of cDNA synthesis followed by in vitro transcription. The authors found that
brain microdissections prepared with either method gave similar expression
results, in their ability to identify differentially expressed genes. Analysis of
technical replicates, however, suggests that random PCR amplification may
be more reproducible, requires smaller RNA input, and generates cRNA of
higher quality than linear amplification [39]. Several comparisons between
amplification procedures are available in the literature [40-43].
18.1.3.2 Some Principles of Data Analysis

The raw data produced by microarray analysis is a digital image. To generate
numeric data of gene expression levels, the hybridization spots on the array
have to be identified and their intensity measured (image quantitation). Image
analysis is often performed through manufacturer’s software, which also
generally provides the means for initial quality control and low-level analysis of the
data (preprocessing). Initial transformation of the data includes background
subtraction and elimination (flagging) of aberrant signals and hybridization
spots of low intensity (usually, those with intensity less than two or three times
the standard deviation of the background intensity). Data are normalized to
eliminate systematic, nonbiological variations, such as those introduced by
differences in RNA amounts used, sample labeling, dye incorporation, or
scanner settings. Normalization makes adjustments for these effects, so that
average gene expression levels are made equivalent among the arrays compared.
There are several normalization methods commonly used, and they
can be either based on the complete set of arrayed genes, or on endogenous
(housekeeping) or exogenous (spiked-in) control genes. All normalization
methods are based on some assumptions, such as that most gene expression
levels do not change across conditions or that total RNA levels in a sample
do not change. When relying on housekeeping genes for normalization, it
is useful to refer to a large number of genes, since expression of many of
the housekeeping genes can actually vary among different biological settings.
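
The preprocessing steps just described can be sketched for a single two-color
array as follows. This is a minimal illustration, not the procedure of any
particular manufacturer's software: the spot record layout, the function name,
and the choice of global median centering (which rests on the assumption that
most genes do not change between the two samples) are all assumptions made for
the example.

```python
import math
from statistics import median


def preprocess_two_color(spots, k=2.0):
    """Return {gene: normalized log2(Cy5/Cy3)} for spots that pass the filter.

    spots : list of dicts with keys 'gene', 'cy5', 'cy3' (foreground signal),
            'bg5', 'bg3' (local background) and 'bg_sd5', 'bg_sd3'
            (standard deviation of the background); a hypothetical layout.
    k     : spots whose background-subtracted signal does not exceed
            k * background SD in either channel are flagged and dropped.
    """
    log_ratios = {}
    for s in spots:
        red = s["cy5"] - s["bg5"]      # background subtraction
        green = s["cy3"] - s["bg3"]
        if red <= k * s["bg_sd5"] or green <= k * s["bg_sd3"]:
            continue                   # flagged: signal too close to background
        log_ratios[s["gene"]] = math.log2(red / green)
    if not log_ratios:
        return {}
    # Global normalization: shift the array so that the median log ratio is zero.
    centre = median(log_ratios.values())
    return {gene: ratio - centre for gene, ratio in log_ratios.items()}
```

The result is one column of a gene expression matrix; repeating the step for
each array and aligning the outputs by gene yields the full table.
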
For more detailed discussion of data preprocessing, see Refs. 4, 44. These
first steps of data transformation are required to organize the data into a gene
expression matrix, a table where each row represents a gene and each column
an experimental condition. In addition to information on gene expression
levels, the table ideally contains information on the variability and accuracy
of measurement (e.g., standard deviations among replicates). Data organized
in such a way can then be used for analysis: the simplest is the identification
of differentially expressed genes. Many publications still characterize differen-
tially expressed genes as those whose expression ratios, or “fold changes” are
above an arbitrary set level; however, more complex algorithms that take into
account the intrinsic variability of the dataset are possible (see Refs. 4, 45, 46
for an overview of current methods). To further biological insight, additional
analytical methods can be applied to simplify the dataset and produce an
overview of the data. These analysis approaches can be “unsupervised”, that
is, based exclusively on the information intrinsic to the data (Figs. 18.1-2 and
18.1-3), or “supervised”, such as class prediction, which assigns new samples
to known classes, on the basis of already acquired biological information
(Figs. 18.1-4 and 18.1-5). Examples of unsupervised analyses are the various
“clustering” algorithms that create categories of similar data, either by group-
ing genes into classes with similar expression profiles, or by grouping samples
in classes defined by similarly expressed genes. Microarray analysis can also
be used to delineate the biological pathways involved in a process, by analyzing
whether certain functional classes of genes are overrepresented in a cluster.
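
One common way to ask whether a functional class is overrepresented in a
cluster is a one-sided hypergeometric test. The sketch below is a generic
version of that test; it is an assumption that it resembles what any of the
tools named in this section compute, and the function and parameter names are
hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_class, cluster_size, hits):
    """Probability of seeing at least `hits` class members in a cluster by chance.

    n_genes      : number of genes on the array
    n_in_class   : number of those genes annotated to the functional class
    cluster_size : number of genes in the cluster
    hits         : number of cluster genes annotated to the class
    """
    upper = min(n_in_class, cluster_size)
    total = comb(n_genes, cluster_size)        # all possible clusters of this size
    tail = sum(
        comb(n_in_class, k) * comb(n_genes - n_in_class, cluster_size - k)
        for k in range(hits, upper + 1)        # clusters with k class members
    )
    return tail / total
```

Because many classes are usually tested for each cluster, the resulting
p-values need a multiple-testing correction before they are interpreted.
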
There is a current effort to develop informatic tools that provide informative
gene annotation and correlation with biological pathways. Many of these,
such as ArrayXPath (http://www.snubi.org/software/ArrayXPath/), GoMiner
(http://discover.nci.nih.gov/gominer), MAPPFinder
(http://www.genmapp.org/MAPPFinder.html), or Onto-tools [47], use the organizing principles of
Gene Ontology, which characterize genes on the basis of molecular function,
biological process, and cellular component (http://www.geneontology.org). We
will be unable to discuss here the many algorithms that have been formulated
to aid both in unsupervised and supervised analysis. For an introduction, we
refer the reader to Refs. 4, 45, 46. For links to analysis software the reader can
refer to further websites for array databases:
http://genopole.toulouse.inra.fr/bioinfo/microarray/;
http://www.rockefeller.edu./genearray/links.php;
Fig. 18.1-2 In the unsupervised approach, pattern-recognition algorithms are
used to identify subgroups of samples that have related gene expression
profiles. A commonly used method, termed hierarchical clustering [2],
calculates the similarity in expression of two different genes across a set of
samples. Using this similarity measure, genes can be ordered hierarchically,
leading to the identification of genes that are regulated in a similar fashion
(coregulation). This method can also be used to determine the similarity in
gene expression between different samples, such as hierarchical clustering of
groups of genes with similar patterns of expression in a set of tumor samples.
These so-called gene expression signatures may include genes expressed in a
specific cell type or stage of differentiation, or genes expressed during a
particular biological response, such as activation of a specific intracellular
signaling pathway or cell proliferation. Typical graphic representations of
data clustering are a dendrogram and a “heat map”, which usually color codes
the levels of gene expression.
Fig. 18.1-3 (panel title: K-means Clustering) Another unsupervised learning
approach is provided by “K-means clustering”. A K number of cluster centers
(“centroids”, in black) are chosen randomly among the samples. The algorithm
iteratively assigns samples (in white) to the nearest (most similar) centroid’s
cluster and recalculates the centroid based on the new inclusion. The process
is repeated until all samples are assigned and centroids no longer change.
http://www.stat.uni-muenchen.de/~strimmer/rexpress.html;
http://nslij-genetics.org/microarray/soft.html;
ihome.cuhk.edu.hk/~b400559/arraysoft.html
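
The K-means procedure outlined in Fig. 18.1-3 can be sketched in a few lines.
This is a minimal illustration under stated assumptions: it uses squared
Euclidean distance as the similarity measure, reassigns all samples in each
pass before recalculating the centroids (the batch variant), and keeps the
previous centroid if a cluster becomes empty. The function name and the `seed`
and `max_iter` parameters are hypothetical.

```python
import random


def k_means(samples, k, seed=0, max_iter=100):
    """Cluster expression profiles (lists of floats) into k groups.

    Returns (assignments, centroids); assignments[i] is the index of the
    cluster to which samples[i] belongs.
    """
    rng = random.Random(seed)
    # Choose k samples at random as the initial centroids.
    centroids = [list(c) for c in rng.sample(samples, k)]
    assignments = None
    for _ in range(max_iter):
        # Assign every sample to the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(s, centroids[j])))
            for s in samples
        ]
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids no longer change
        assignments = new_assignments
        # Recalculate each centroid as the mean profile of its members.
        for j in range(k):
            members = [s for s, a in zip(samples, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

The outcome depends on the random starting centroids and on the choice of K, so
in practice the procedure is usually repeated with several starting points.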
18.1.3.3 Interplatform Comparison of Results

With the expanding application of high throughput technologies for analysis
of gene expression, an increasingly attractive possibility is the comparison
Fig. 18.1-4 Supervised methods represent an alternative that can be applied if
previous information is available about which genes are expected to be
coregulated. In general, supervised methods use a “training set” in which genes
known to be related by function are provided as positive examples and genes not
known to be members of that class are negative examples. This “training set” is
used by the computer program to learn to distinguish between members and
nonmembers of a class on the basis of gene expression data. The computer
program is subsequently used to recognize and classify genes in the “data set”
according to their gene expression levels. Supervised methods therefore compare
biological information (e.g., clinical data) with already known gene expression
features that are characteristic of a group.
Fig. 18.1-5 (panel title: Supervised learning: linear classifiers; plot labels:
Disease 1, Disease 2, Gene combination 1) Class prediction can also be obtained
through the use of support vector machines (SVMs). SVM can test several
mathematical combinations of genes to find a line or plane that optimally
separates groups of samples in the training set and accurately classifies new
samples.
of data sets from independent experiments, sometimes based on different

microarray platforms. Unfortunately, the obvious advantage of having multiple
observations at our disposal is often offset by the difficulty in comparing
experiments that are heterogeneous in format, sample annotation, type of
microarray used, and statistical processing of results. While intraplatform
reproducibility is quite satisfactory in many of the studies that have
addressed this issue, the analysis of interplatform variability has occasionally

produced discouraging findings. Studies comparing gene expression levels
and significant gene expression changes obtained by analyzing the same RNA
samples with different microarray systems often show relatively low correlation
between platforms (Refs. 16,48-50 and references therein), so that completely
different sets of differentially expressed genes may be identified when the same
sample is analyzed with different arrays [51]. Perhaps not surprisingly, the best
correlations are obtained for highly expressed genes [16, 49]. A major source
of variation for oligonucleotide arrays is the choice of the probe sequence,
which determines the affinity of hybridization with the sample [16]. Short
oligonucleotides result in more specific target identification compared to long
cDNA clones that are more likely to give cross-hybridization to homologous
sequences on other genes [52]. Jarvinen et al. report a fairly good correlation for
gene expression data from two commercial platforms, Affymetrix and Agilent
(r = 0.78-0.86), but lower correlations for data obtained from custom-made
arrays. Their analysis shows that more than half of the discrepancies can be
traced back to incorrect clones on the custom-made arrays and to problems
in gene designation and annotation [52]. Another source of variation is
introduced during data analysis, as different algorithms may cause variability
in the measured spot intensity levels or in the number of analyzable data
points between different microarray platforms. Low-level analysis, such as
quality filtering and normalization, is most often performed with the software
provided by the array manufacturer and may have substantial influence
on subsequent processing of the data. An additional level of difficulty in
comparing results obtained with different microarray settings is introduced
by the lack of standardization in gene annotation [48]. One note of caution in
the interpretation of the above validation studies is the observation that the
number of replicates analyzed is often quite small. This fact could contribute
to the limited overlap observed between findings obtained with different
platforms.
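Correlation coefficients such as those quoted above are obtained by pairing the
measurements made for the same genes on two platforms. The sketch below
computes a Pearson r under the simplifying assumption that the two datasets
already share a gene identifier; as discussed earlier, matching by probe
sequence is preferable to matching by gene name. The function name and the
dictionary layout are hypothetical.

```python
import math


def platform_correlation(platform_a, platform_b):
    """Pearson r of log2 ratios for the genes measured on both platforms.

    platform_a, platform_b : dicts mapping a shared gene identifier to a log2 ratio.
    Returns (r, number_of_shared_genes). The measurements on each platform must
    not all be identical, otherwise r is undefined.
    """
    shared = sorted(set(platform_a) & set(platform_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [platform_a[g] for g in shared]
    ys = [platform_b[g] for g in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), len(shared)
```

Reporting the number of shared genes alongside r is useful, because the set of
genes that can be matched at all is often much smaller than either array.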
The differences between multiple platforms have also been exploited as
a method to cross-validate microarray data. Lee et al. have proposed the
application of a mutual validation algorithm to data obtained from two
microarray platforms (oligonucleotide and cDNA arrays) that are subject to
different artifacts, to generate a consensus gene expression dataset more
reliable than either set. Such an approach would substitute individual
validation of differentially expressed genes through more “classic” methods,
such as northern blot or quantitative RT-PCR [53]. A conceptually similar
approach has been used in silico, by comparing publicly available datasets
for acute lymphoblastic leukemia to cross-validate findings from a new
microarray experiment [54]. A list of differentially expressed genes that
had been reported in the literature as possible subclass predictors was
validated on all of the independent datasets generated on the various array
platforms [54].
18.1.3.4 Toward a Standardization of Microarray Data

Microarray data are context-dependent since they rely on the use of different
reagents and software packages for data processing and analysis. The large
number of hardware and software tools employed, as well as the fragmentary
information on the experimental settings, constitute an obvious obstacle
to the meaningful comparison of microarray data from different sources.
Efforts to standardize the recording of microarray-based experiments and the
formulation of gene expression data have been promoted by the Microarray
Gene Expression Data (MGED) Society. MGED is an international organization
of biologists, computer scientists, and data analysts whose aim is to develop
and promote tools that facilitate the sharing of high throughput data generated
by functional genomics and proteomics experiments. Its efforts are articulated
mainly in three areas:
Minimum Information About a Microarray Experiment (MIAME, http://
www.mged.org/miame) is a document that describes the minimum informa-
tion required to ensure easy interpretation and independent verification of
microarray data [55]. A guideline itemizes the detailed information that should
be included while reporting a microarray experiment (see Table 18.1-1 for a
summarized checklist).
MIAME-required information should be encoded using a standard lan-
guage, MAGE-ML (for Microarray Gene Expression Markup Language).
MAGE-ML is a formal language designed to describe information about
microarray-based experiments, including microarray designs and manufactur-
ing information, microarray experiment setup and execution information,
and gene expression data and data analysis results. The MAGE Work-
group (http://www.mged.org/Workgroups/MAGE/mage.html) has simplified
the MAGE language by omitting some elements and producing MAGE-
ML-Lite; however, the MAGE format may still be somewhat hostile for the
inexperienced user.
Furthermore, terms used to provide MIAME-compliant information
should be chosen from a controlled vocabulary, codified by the Ontology
Working Group (OWG, http://mged.sourceforge.net/ontologies/index.php).
The primary purpose of the MGED Ontology is to provide standard terms
for the annotation of microarray experiments. The terms are provided in the
form of an ontology, which not only defines precisely the terms included
in the vocabulary but also describes how the terms are related to each
other. The MGED website lists several links to MIAME-supportive gene
expression databases or microarray analysis tools that use the ontology standard
vocabulary.
Although compliance with MGED guidelines is still somewhat limited, it
is of note that journals such as Nature, Cell, and The Lancet have adopted
these guidelines for submitting microarray expression data for publication.
In addition to demanding MIAME-compliant data, Nature and Cell require
authors to submit their microarray data to a public repository as a precondition
Table 18.1-1 MIAME checklist

Experiment design
• Goal of the experiment
• Description of the experiment (e.g., abstract from a related publication)
• Keywords (e.g., time course, cell type comparison)
• Experimental factors (the parameters or conditions tested)
• Experimental design - relationships between samples, treatments, extracts, and so on
• Quality control steps taken (e.g., replicates or dye swaps)
• Links to the publication, any supplemental websites or accession numbers

Samples used, extract preparation and labeling
• Origin of each biological sample and its characteristics (e.g., gender, age, developmental stage, strain, or
  disease state)
• Manipulation of biological samples and protocols used
• Technical protocols for preparing the hybridization extract and labeling
• External controls (spikes), if used

Hybridization procedures and parameters
• Protocol and conditions used for hybridization, blocking and washing, including any postprocessing
  steps such as staining

Measurement data and specifications
• Data
  - The raw data, namely, scanner or imager and feature extraction output
  - The normalized and summarized data (gene expression data matrix)
• Data extraction and processing protocols
  - Image scanning hardware and software, processing procedures
  - Normalization, transformation, and data selection procedures

Array design
• General array design, including the platform type
• Array feature and reporter annotation, normally represented as a table
• For each feature (spot) on the array, its location on the array and the reporter present in the location
• For each reporter, unambiguous characteristics of the reporter molecule, including the sequence for
  oligonucleotide based reporters, the source, preparation and database accession number for long
  reporters, and primers for PCR-based reporters
• Appropriate biological annotation for each reporter

for publication, a requirement shared by a number of other life-science journals

as well.
18.1.3.5 Public Databases for Gene Expression Data

Gene profiling experiments produce large volumes of data, whose significance
typically goes beyond the first immediate analysis of the first report.
The data generated in one laboratory may become a useful source of
information for a large number of researchers and clinicians. The need
to reinvestigate and compare over time the gene expression datasets generated
in different experimental systems has encouraged the establishment of
a growing number of public databases for gene expression data [56].
Examples are the ArrayExpress repository of the European Bioinformatics
1096
Institute (http://www.ebi.ac.uk/arrayexpress),the Gene Expression Omnibus

(GEO) at the National Center for Biotechnology Information (NCBI) of
the National Institute of Health (GEO, http://ncbi.nlm.nih.gov/geo/), and
the Center for Information Biology Experimentation Databases (CIBEX,
http://cibex.nig.ac.jp/index.jsp) in Japan. These databases have adopted the
standards proposed by the MGED Society and implement the Gene Ontology
vocabulary.
The RNA Abundance Database (RAD; http://www.cbil.upenn.edu/RAD)
has recently been updated to provide a MIAME-supportive infrastructure
for gene expression data management [57]. Software has been developed
to generate MAGE-ML documents that permit export of studies from
RAD to other MAGE-ML compatible databases. RAD has also been
linked to an integrated database system, Genomics Unified Schema
(GUS; http://www.gusdb.org). GUS maximizes information from stored
data by providing a platform that integrates genomic and transcriptome data
from multiple organisms (http://www.allgenes.org). The RIKEN Expression
Array Database of the Institute of Physical and Chemical Research, Japan
(READ, http://read.gsc.riken.go.jp/) is a database of expression profile
data from the RIKEN mouse cDNA microarray. It stores the microarray
experimental data and information, and provides Web interfaces for
researchers to retrieve, analyze, and display their data [58]. The Stanford
Microarray Database (SMD; http://genome-www.stanford.edu/microarray/)
serves as a microarray research database for the entire scientific community,
by providing full public access to the data published by SMD users, along with
many tools to explore and analyze those data. SMD currently provides public
access to data from 5000 microarrays. Stanford Genomic Resources also offer
a comprehensive yeast gene expression database (SGD). A project-dedicated
database is represented by Germonline (http://www.germonline.org), which
provides cross-species microarray data relevant to the mitotic and meiotic cell
cycles, as well as gametogenesis [59].
Several databases offer the possibility to perform global analysis of datasets
derived from different technologies. CleanEx (http://www.cleanex.isb-sib.ch/)
of the Swiss Institute of Experimental Cancer Research is a curated database
that includes microarray and serial analysis of gene expression (SAGE)
data. The data are presented in a way that facilitates joint analysis and
cross-dataset comparisons [60]. By collecting and integrating different types of expression
data, the Gene Expression Database (GXD, http://www.informatics.jax.org/
or http://www.informatics.jax.org/menus/expression_menu.shtml) provides
information about expression profiles in different mouse strains and mutants.
The database classifies genes and gene products according to the Gene
Ontology project [61, 62]. Links to additional gene expression databases can be
found at http://ihome.cuhk.edu.hk/~b400559/arraysoft_public.html and
http://www.123genomics.com.
18.1.4
18.1.4.1 Development and Function of CD4+ T-cell Subsets: Gene Profiling as a Tool to Identify Transcriptional Networks in Infectious and Inflammatory Diseases
The discovery of polarized subsets of CD4+ T cells that differ in their
cytokine secretion pattern and effector functions has provided the molecular
framework for the understanding of the diversity of T-cell-dependent immune
responses against different types of pathogens [63, 64]. The two subsets of
differentiated CD4+ T cells, T helper type 1 (Th1) and T helper type 2 (Th2),
protect against different microbial pathogens by producing cytokines able to
mobilize different mechanisms of defense. Th1 cells are characterized by the
secretion of interferon-γ (IFN-γ) and are adept at macrophage activation.
Such cells have been demonstrated in numerous infectious disease models to
activate appropriate host defenses against intracellular pathogens, including
viruses, bacteria, yeast, and protozoa. Th2 cells produce interleukin (IL)-4,
IL-5, and IL-13, and are involved in the development of humoral immunity
protecting against extracellular pathogens (Fig. 18.1-6). On the other hand,
uncontrolled Th1 responses are associated with inflammatory or autoimmune
pathologies such as rheumatoid arthritis (RA), insulin-dependent diabetes
mellitus (IDDM), or psoriasis, and excessive Th2 responses are associated with
allergies and asthma [65]. This indicates that the development of Th1 and Th2
cells must be tightly controlled and that therapeutic modulation of immune
responses may have an impact on human diseases.
During the past decade, important progress has been made in the
understanding of the mechanisms that regulate the development and the
functional properties of Th1 and Th2 cells. Th1 and Th2 cells develop from
a common precursor, the naïve CD4+ T cell. T helper cell differentiation is
initiated by triggering of the T-cell receptor (TCR) on naïve CD4+ T cells, and
cytokines present at the time of stimulation are essential to determine the cell
fate of the developing effector T-cell population: IL-4 activates signal transducer
and activator of transcription 6 (STAT6) and promotes Th2 differentiation,
while IL-12 is a potent inducer of Th1 development, through activation of
STAT4 [65-67]. IFN-γ has been shown to be an important cofactor for Th1
cell development, because of its ability to stimulate antigen-presenting cells
(activated macrophages and dendritic cells) to produce high levels of IL-12.
An important breakthrough in the understanding of the molecular events
that determine the differentiation and the activity of Th1 and Th2 cells has
been the identification of two so-called master regulators, T-bet and GATA-3.
The transcription factor T-bet is expressed in Th1 cells and activates Th1
cell-specific transcripts such as IFN-γ [68]. Conversely, the transcription factor
GATA-3 plays a central role in Th2 cell development by inducing expression
of the Th2 cytokines IL-4, IL-5, and IL-13 [69, 70] (Fig. 18.1-7).
Fig. 18.1-6 T helper cell differentiation. Th1 and Th2 cells develop from a common
precursor, the naïve CD4+ T cell. Naïve CD4+ T cells differentiate into T helper type
1 and T helper type 2 (Th1 and Th2) cells that protect against microbial pathogens by
producing cytokines that mobilize appropriate defence mechanisms. The
differentiation process is initiated by stimulation of the T-cell receptor (TCR) on
the naïve CD4+ T cell with a peptide-major histocompatibility complex (MHC) complex
on an antigen-presenting cell (APC). Differentiation of naïve precursor T cells
into Th1 or Th2 cells depends mainly on the cytokine environment at the time of priming.
IL-4 promotes Th2 development, whereas IL-12 plays a central role in controlling the
development of Th1 cells. IL-12 is produced by dendritic cells (DC), which are the most
potent APC for naïve CD4+ T cells. Th1 cells secrete IFN-γ and are important effectors of
cell-mediated immunity, whereas Th2 cells secrete IL-4, IL-5, and IL-13 (the so-called
Th2 cytokines) and are important mediators of humoral immunity.
Fig. 18.1-7 Control of T helper cell differentiation. Following the identification
of T-bet as the master transcription factor inducing Th1 development, a model of T
helper cell differentiation has been proposed [68]. According to this model, IL-12 signals
through high-affinity IL-12 receptors via STAT4 to activate expression of T-bet.
Subsequently, T-bet activates expression of IFN-γ and represses expression of the Th2
cytokines IL-4, IL-5, and IL-13. Consistent with previous findings from several
laboratories (reviewed in Refs. 65, 70), IL-4 directs Th2 differentiation by a mechanism
that involves STAT6-dependent activation of GATA-3 expression. GATA-3 is the “mirror
image” of T-bet in that it activates expression of Th2 cytokines and represses
the Th1 cytokine, IFN-γ. The main feature of this model is that cytokine receptor
signaling and STAT activation are placed upstream of the master T helper
lineage-determining transcription factors T-bet and GATA-3. This model also infers
that T-bet and GATA-3 antagonize each other. Subsequent studies have shown that
following stimulation of naïve CD4+ T cells, expression of T-bet is strongly induced by
IFN-γ signaling and STAT1 activation [71, 72], indicating a positive feedback loop
similar to Th2 cell differentiation. This figure also indicates that in addition to TCR and
cytokine receptor signaling, costimulatory molecules (such as CD28), adhesion
molecules (such as LFA-1), and signaling through other cell surface receptors (e.g.,
CD40-CD40 ligand interactions) can influence T helper cell differentiation.

To learn more about the differentiation and functional properties of human
Th1 and Th2 cells and also to possibly identify molecules which could be of
interest for pharmacological intervention in chronic inflammatory diseases,
we decided to take an independent approach to study human Th1 and Th2
cells by analyzing their gene expression profiles [73]. We generated human
Th1 and Th2 cells from cord blood leukocytes and analyzed samples 3 days
after stimulation to detect changes of gene expression that occurred early in
the differentiation process. In this study, we used Affymetrix high-density
oligonucleotide arrays with the capacity to display transcript levels of 6000
human genes. The analysis of the chip data was performed using software
developed in house. After analyzing gene expression data from Th1 and Th2
cells derived from two independent donors, we realized that it was difficult
to discriminate between subset-specific and donor-specific changes in gene
expression. We therefore decided to analyze gene expression in Th1 and Th2
cells generated from three additional donors and to analyze the dataset using
a statistical algorithm (paired t-test). We found 215 genes to be differentially
expressed at a confidence level of 95% with a change in expression level
of at least twofold. To confirm the results obtained with oligonucleotide
arrays with an independent technique, we also analyzed mRNA expression of
a selected set of genes in Th1 and Th2 RNA samples using kinetic RT-PCR
[74]. As expected, we noticed variability in gene expression changes in cell lines
derived from different subjects, but we could confirm differential expression of
28 of 29 genes in Th1 and Th2 cells generated from two independent donors.
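The two selection criteria described above (a paired t-test at the 95% confidence level and an at least twofold change) can be sketched as follows. This is a minimal illustration and not the in-house software used in the study: it assumes log2-transformed expression values, donor-matched Th1 and Th2 samples, and a two-sided test against tabulated critical values of Student's t distribution. The gene names and values are invented.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for small degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def paired_t(th1, th2):
    """Return (t statistic, mean paired difference) for donor-matched values."""
    diffs = [a - b for a, b in zip(th1, th2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # All donors show exactly the same difference (degenerate case).
        t = math.copysign(math.inf, mean_diff) if mean_diff != 0 else 0.0
    else:
        t = mean_diff / (sd / math.sqrt(len(diffs)))
    return t, mean_diff


def select_genes(log2_data, min_log2_change=1.0):
    """Keep genes that pass both the t-test and the fold-change threshold.

    A mean log2 difference of 1.0 corresponds to a twofold change.
    """
    selected = []
    for gene, (th1, th2) in log2_data.items():
        t, mean_diff = paired_t(th1, th2)
        df = len(th1) - 1
        if abs(t) > T_CRIT_95[df] and abs(mean_diff) >= min_log2_change:
            selected.append(gene)
    return selected


# Invented log2 expression values: (Th1 per donor, Th2 per donor), five donors.
example = {
    "gene_a": ([10.1, 9.8, 10.4, 10.0, 10.2], [7.9, 8.1, 8.0, 7.7, 8.3]),
    "gene_b": ([8.0, 8.2, 7.9, 8.1, 8.0], [7.9, 8.0, 8.0, 8.2, 7.8]),
    "gene_c": ([9.0, 6.0, 9.5, 7.2, 8.5], [6.0, 7.0, 7.0, 7.0, 7.0]),
}

print(select_genes(example))
```

In this toy dataset, gene_c changes more than twofold on average but varies strongly between donors, so it fails the t-test; this is the kind of donor-specific variability that motivated the use of additional donors and a paired design.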
[Fig. 18.1-8: bar diagram of the fold change in mRNA level (Th1 versus Th2), listing gene name, database accession number, and P value for each gene. Functional groups shown: cytokines, growth factors and receptors; apoptosis and proteolytic systems; transcriptional regulation; enzymes and other signaling molecules; adhesion and migration; metabolic pathways; ion channels and transporters. Expression level legend: >1000; 200-1000; <200.]
Fig. 18.1-8 Gene expression profiles of human Th1 and Th2 cells generated from
five independent donors were analyzed using high-density oligonucleotide arrays.
Genes were selected if differential expression between Th1 and Th2 cells was
determined at a confidence level of 95% on the basis of t-test statistics performed on a
dataset derived from five independent experiments and if at least a twofold change
in expression level was observed. Bars represent “fold change” of the mRNA level
of a particular gene when comparing Th1 versus Th2 cells (mean of five experiments).
Positive values indicate that the transcript is more abundant in Th1 than in Th2 cells and
negative values indicate the opposite. Colors indicate the “absolute” expression level of a
gene (arbitrary fluorescence units). Black: high level of expression (>1000); grey:
medium level of expression (200-1000); white: low transcript abundance (<200). The
column next to the bar diagram indicates the P value obtained from the result of a
paired t-test performed with the data from independently derived Th1 and Th2 cell lines
from five donors. Genes were grouped according to their presumed function, based
on information available in public databases or in the literature (from Ref. 73).
Well-established marker genes for Th1 cells, such as IFN-γ and IL-12Rβ2,
were found at much higher levels in Th1 than in Th2 cells (Fig. 18.1-8). In
addition, some genes that had previously not been implicated in the process
of T helper cell differentiation, such as oncostatin M (OSM), were found to
be overexpressed in Th1 cells (Fig. 18.1-8). The gene expression profiles of
Th1 and Th2 cells also revealed differential expression of genes encoding
transcription factors, some of which (GATA-3 and IRF-1) had previously been
characterized in the context of T helper cell differentiation [69, 75-77]. In
addition, several transcription factors that had not been associated with T
helper cell polarization were also identified, including ETS-1, RORα2,
IRF-7A, and c-fos. Although the target genes of these factors in regulating the
gene expression patterns specific to each T helper cell subset are not known,
it is possible that some of these factors may control individual cytokine
gene expression as GATA-3 and T-bet control IL-4 and IFN-γ production,
respectively. In fact, the recent analysis of Ets-1-deficient mice demonstrated
that this transcription factor is an important cofactor of T-bet to promote IFN-γ
production and is essential for the efficient development of Th1 responses [78].
Th1 cells are more susceptible to activation-induced cell death (AICD), a
mechanism for downregulation of an immune response and maintenance of
T-cell tolerance, and are important mediators of tissue damage in inflammatory
and autoimmune diseases. Results from our gene expression analysis
suggested a potential mechanism for increased susceptibility of Th1 cells
to AICD and their cytotoxic effects [73]. Compared with Th2 cells, Th1 cells
expressed higher levels of TRAIL, an apoptosis-inducing molecule; BAK, a proapoptotic
Bcl-2 family member; and proapoptotic caspase-8, perforin, and granzyme B
(Fig. 18.1-8). The functional program of Th1 and Th2 lymphocytes requires
these cells to home to different sites. Th1 cells have been shown to
preferentially express the chemokine receptors CCR5 and CXCR3, whereas
Th2 cells were reported to preferentially express CCR3, CCR4, CCR8, and
the chemoattractant receptor CRTh2 [79]. Other gene expression changes
identified in our study were consistent with previous experiments defining
differential recruitment of Th1 and Th2 cells to sites of inflammation. We
reported an increased expression of mRNA for fucosyltransferase VII (FucT-VII),
which codes for an enzyme that mediates the fucosylation of selectin
ligands on the surface of T cells (Fig. 18.1-8). This fucosylation is required for
the first step of lymphocyte adhesion to endothelial cells, “rolling”. Recent
in vivo observations have validated the biological relevance of this finding:
FucT-VII was in fact found to be upregulated in Th1 cells infiltrating the
inflamed joints of patients affected by either RA or juvenile idiopathic arthritis
(JIA) [73, 80]. Moreover, FucT-VII expression and increased P-selectin binding
capacity of T cells were associated with a more severe course of the disease
[80]. These data indicate a critical role of FucT-VII in the enhanced homing
of T cells to the inflamed synovium and suggest that inhibitors of FucT-VII
enzyme activity may be of significant therapeutic value in the treatment of
chronic arthritis.
IL-12 also induced two chemokine receptors, CCR5 and CCR1, both of
which promote increased responsiveness of Th1, but not Th2, cells to MIP-1α
or RANTES. The activity of RANTES and other chemokines is regulated
by CD26 (dipeptidyl-peptidase IV)-mediated cleavage. The DPP4 (encoding
CD26) mRNA was found upregulated in Th1 cells compared to Th2 cells
(Fig. 18.1-8). The inactivation of chemokines by CD26 may contribute to the
fine control of chemotactic migration of Th1 cells by providing a stop signal
that keeps cells at the site of inflammation. Finally, higher expression of
integrin α6β1 on Th1 cells suggested that adhesion and extravasation of Th1
cells into tissues triggered by inflammatory chemokines might be mediated by
higher surface levels of integrin α6β1 binding to laminin in basal membranes
and extracellular matrix.
Of the 215 genes which we found differentially expressed in Th1 and Th2
cells, 157 genes were expressed at higher levels in Th1 cells and 58 genes
were overexpressed in Th2 cells. There are several possible explanations for
the apparent Th1 bias of our study. Previous studies have demonstrated that
Th2 cells may require more time to acquire their effector functions than Th1
cells [81, 82].
Hamalainen et al. have used an oligonucleotide microarray specifically
designed to screen for 250 inflammation-related genes to identify those
differentially expressed in human, cord blood-derived Th1 and Th2 lines,
2 weeks after initial stimulation [83]. Although the experimental protocol
to generate Th1 and Th2 cells used in the study by Hamalainen et al. was
quite distinct from our protocol [73], there was a large overlap of the genes
identified in both studies. In addition to the Th1/Th2 signature cytokines,
several chemokines (MIP-1α, MIP-1β, RANTES) and chemokine receptors
(CCR1, CCR2, CCR4, CCR5) were found differentially expressed in human
Th1 and Th2 cells [83]. These results further emphasize the importance of
correct homing of polarized effector T cells to eradicate pathogens.
In a subsequent study, Chtanova et al. used high-density oligonucleotide
microarrays to analyze gene expression in mouse CD4+ Th1 and Th2 cells, as
well as CD8+ type 1 and type 2 T cells (Tc1 and Tc2) [84]. In contrast to our
study in which Th1-overexpressed genes predominated [73], Chtanova et al.
identified more type 2-biased genes [84]. It is important to note that different
protocols were used to generate polarized T-cell subsets in the two studies.
Chtanova et al. stimulated purified naïve mouse CD4+ and CD8+ T cells with
anti-CD3/CD28 antibodies, IL-2, and IL-6 plus the polarizing cytokine cocktail.
Cells were cultured for 7 days and then restimulated for 24 h with anti-CD3
before extracting RNA. A previous report has demonstrated that IL-6 is able to
polarize naïve CD4+ T cells into Th2 cells by inducing the initial production
of IL-4 in CD4+ T cells [85]. In addition, it has been shown that IL-6 inhibits
Th1 differentiation in an IL-4-independent manner through the induction
of SOCS1 [86]. The addition of IL-6 to the cultures could therefore be a
possible explanation for the Th2 bias observed in this study [84]. Genes that
showed a change of at least twofold in at least two separate experiments were
considered as differentially expressed. An interesting finding of this study
was that STAT4 (the signal transducer relaying IL-12 signals from the cell
surface to the nucleus) was expressed at higher levels in mouse Th1 than in
Th2 cells. We did not observe differential expression of STAT4 mRNAs in
human Th1 and Th2 cells [73] and Hamalainen et al. found a rather modest
downregulation of STAT4 transcripts in human Th1 cells [83]. Protein data
from our lab and from others did not indicate differential expression of STAT4
in human Th1 and Th2 cells [87-90]. However, a study by Usui et al. confirmed
downregulation of STAT4 mRNA and protein during differentiation of mouse
Th2 cells [91]. This study also showed that downregulation of STAT4 in Th2
cells is mediated by GATA-3, the “master” regulator of Th2 development
[91]. It is at present not clear whether the observed differences of STAT4
regulation in human and mouse Th1 and Th2 cells reflect a species-specific
differentiation program or result from the experimental conditions in which
the cell populations were generated. Chtanova et al. also found two members
of the tumor necrosis factor receptor-associated factor (TRAF) family to be
differentially expressed in Th1 and Th2 cells. TRAF4 was expressed at a
higher level in type 1 cells while TRAF5 was preferentially expressed in type 2
cells. Members of this family serve as adapter proteins that mediate cytokine
signaling; in particular, they seem to play a role in tumor necrosis factor (TNF)
and Toll/IL-1 signaling, resulting in activation of transcription factors NF-κB
and AP-1. Clearly, additional studies are required to address the biological
relevance of these findings.
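The replicate-based selection rule mentioned above for the mouse study (a change of at least twofold in at least two separate experiments) can be expressed in a few lines. The sketch below is only an illustration: it assumes that each experiment yields a type 1/type 2 expression ratio per gene and that the qualifying changes must point in the same direction, which the text does not state explicitly. All values are invented.

```python
def consistent_twofold(ratios_by_gene, min_experiments=2, threshold=2.0):
    """Classify genes whose type 1/type 2 ratio changes at least `threshold`-fold
    in the same direction in at least `min_experiments` experiments."""
    selected = {}
    for gene, ratios in ratios_by_gene.items():
        up = sum(r >= threshold for r in ratios)          # higher in type 1 cells
        down = sum(r <= 1.0 / threshold for r in ratios)  # higher in type 2 cells
        if up >= min_experiments:
            selected[gene] = "type 1"
        elif down >= min_experiments:
            selected[gene] = "type 2"
    return selected


# Invented type 1/type 2 ratios from three separate experiments.
example_ratios = {
    "gene_a": [2.5, 3.1, 1.4],
    "gene_b": [0.4, 0.3, 0.9],
    "gene_c": [2.2, 0.8, 1.1],
}

print(consistent_twofold(example_ratios))
```

Unlike a t-test, a rule of this kind makes no statement about statistical significance; it only requires that the change be reproduced across experiments.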
A more recent study addressed specifically the kinetics of gene expression
during mouse T helper cell differentiation. Lu et al. analyzed gene expression
in unstimulated naïve CD4+ T cells and in cells stimulated for 1, 2, 3, and
4 days in Th1- or Th2-inducing conditions [92]. In addition, the authors
analyzed the gene expression profiles in Th1 and Th2 cells that had been
restimulated for 4 h with anti-CD3 antibodies, a procedure that induces, in
particular, expression of genes that are associated with the effector functions of
T helper cells. Two independent experiments were performed and genes that
showed greater than twofold changes in both were chosen for further analysis.
A global hierarchical clustering analysis revealed that the expression pattern
of day 1 or day 2 Th1 cells is closer to that of day 1 or day 2 Th2 cells than to
that of Th1 cells
harvested on day 3 or 4. A similar relationship was also observed for Th2 cells,
indicating that at the global gene expression level, Th1 and Th2 cells begin
to diverge at day 3 after primary stimulation [92]. These findings correlate
with previous studies that analyzed the kinetics of changes in the chromatin
structure at the IFN-γ and IL-4 cytokine loci. Histone hyperacetylation at the
IL-4 locus was observed in both Th1 and Th2 cultures during the first 2 days
of T helper cell differentiation. However, at later time points of T helper cell
differentiation, histone acetylation was selectively detected at the IL-4 locus in
Th2 cells and at the IFN-γ locus in Th1 cells [93].
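To make the clustering result described above more concrete, the following sketch applies agglomerative hierarchical clustering with average linkage to four toy expression profiles. It is an assumption-laden illustration: the distance metric (Euclidean) and linkage rule are chosen for simplicity and are not reported in the text, and the sample names and values are invented so that the early Th1 and Th2 samples resemble each other.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    Each merge is (cluster_1, cluster_2, distance), where a cluster is a tuple
    of sample names and the distance is the mean pairwise sample distance.
    """
    def dist(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented log-scale expression values for four genes per sample.
toy_profiles = {
    "Th1_d1": [5.0, 5.2, 8.0, 8.1],
    "Th2_d1": [5.1, 5.0, 8.2, 7.9],
    "Th1_d4": [9.0, 5.0, 8.0, 4.0],
    "Th2_d4": [4.0, 9.0, 5.0, 8.0],
}

for c1, c2, d in average_linkage(toy_profiles):
    print(c1, c2, round(d, 2))
```

With these toy values the two day 1 samples are joined first, mirroring the observation that early Th1 and Th2 cultures resemble each other more than they resemble later cultures of the same lineage.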
The above studies have provided insight into the mechanisms that control
the development of polarized helper T-cell subsets and have given important
information about previously unknown effector functions of these cells.
However, the in vitro systems used to generate polarized Th1 and Th2 cells
might not reproduce the conditions that lead to the differentiation of these
subsets in vivo. In addition, a critical issue that could not be addressed in these
studies concerns the interaction of differentiated Th1 and Th2 cells with the
tissues, during an infection or in the setting of an inflammatory disease.
Infection with the parasite Schistosoma mansoni is a well-established model
to study Th2 responses in vivo [64]. Intravenous injection of S. mansoni
eggs, which are retained in the lung, results in a strong Th2 response and
granuloma formation in the lung. This model has been widely used to study
basic mechanisms of asthma, allergy, and other Th2-mediated inflammatory
diseases. Neutralization of IL-4 in this model results in a reduced granuloma
size and a diminished Th2 response, whereas neutralization of IL-12 results
in increased granuloma size and Th2 cytokine production. In the absence of
the immunoregulatory cytokine IL-10, enhanced levels of IL-4 and IL-12 are
secreted, compared to wild-type mice [94]. IL-4/IL-10 and IL-10/IL-12 double
knockout mice develop highly polarized Th1 and Th2 responses, respectively,
after infection with S. mansoni eggs [95]. Sandler et al. have recently analyzed
gene expression profiles of lung tissue from wild-type, IL-4/IL-10, and
IL-10/IL-12 double-deficient mice at several time points after challenge with
S. mansoni eggs [96]. They found that Th1-polarized mice developed only small
granulomas and expressed genes that are characteristic of tissue damage. In
addition to genes known to be associated with Th1 responses (IFN-γ-induced
genes and TNF-α-induced protein 2), Th1-polarized mice expressed several
chemokines (IFN-γ-inducible protein 10 and RANTES), as well as Natural Killer
(NK) cell ligands. Activation of macrophages, a hallmark of Th1 responses,
was reflected by the upregulation of MIP-3α, macrophage-expressed gene 1,
macrosialin, and macrophage C-type lectin. Th1-polarized mice also showed
features of the acute-phase response, as levels of both IL-1β and its activator
caspase 1 were increased, as well as serum amyloids A2 and A3. A particularly
striking observation was the upregulation of cytotoxic genes such as granzymes
A, B, and K, caspases 1 and 3, and the programmed cell death 1 ligand. Finally,
genes responsible for intracellular protein degradation, including ubiquitin
D, ubiquitin-conjugating enzyme 8, and cathepsin D, were also upregulated
in Th1-polarized mice [96]. In contrast, Th2-polarized mice formed large
granulomas with massive collagen deposition and demonstrated upregulation
of genes associated with wound healing [96]. In particular, expression of IL-13,
an important mediator of fibrosis, chemokines that recruit Th2 effector cells,
such as MCP-2, and genes induced by Th2 cytokines, such as TGF-β-induced
and IL-4-induced gene 1, were found to be upregulated in Th2-polarized
mice. Eosinophilia in the lung tissue correlated with the expression of several
eotaxins (chemokines that recruit eosinophils) and with increased expression
of eosinophil-specific genes such as eosinophil-associated ribonucleases 1, 2,
and 5. Furthermore, the presence of alternatively activated macrophages was
indicated by their markers, arginase and leukotriene. Thromboxane synthesis
was suggested by the induction of arachidonate 15-lipoxygenase and platelet
thromboxane A synthase 1. As noted above, a variety of genes involved in
wound healing is found in the lung tissue of Th2-polarized mice. These include
the matrix metalloproteinases (MMPs) 12 and 13, and the gene encoding tissue
inhibitor of matrix metalloproteinases (TIMP)-1, the protein that inhibits the
majority of MMPs. The time-course analysis revealed maximum expression
of MMP-9, MMP-13, and TIMP-1 at day 4, followed by TIMP-2 at day 8, and
MMP-12 peaking at day 14. Of note, precursors of elements of the extracellular
matrix, the procollagens, followed a similar pattern. Procollagen types I, III,
and XVIII were expressed early at day 4, followed by procollagen types XIV
and XV, peaking at day 8 [96]. In conclusion, this study demonstrated that
Th1 responses to S. mansoni eggs are characterized by the expression of genes
crucial for cytotoxicity and tissue damage, whereas Th2 responses direct tissue
remodeling and wound healing [96].
All these studies have shown the impact of large-scale gene expression
profiling on the analysis of polarized helper T-cell populations. The analyses of
the expression of 6000 genes in human Th1 and Th2 cells and of 11 000 genes
in mouse Th1, Tc1, Th2, and Tc2 cells were first attempts to understand the
molecular mechanisms underlying the functional diversity of distinct CD4+
T-cell subsets. The finding that genes regulating key steps in the process
of leukocyte extravasation into inflamed tissues are coregulated in human
T-cell subsets sheds light on the importance of the correct homing of T cells
within tissues to eliminate pathogens. Moreover, the analysis of global gene
expression profiles during lung inflammation in an infectious disease model
has revealed important information about the divergent effects of polarized
Th1 and Th2 responses on tissues. These large-scale studies have furthered
the understanding of the genetic program that controls the differentiation and
functional properties of polarized helper T-cell subsets and may have impact
on the development of more advanced therapies for inflammatory diseases.
18.1.4.2 Uncovering the Mysteries of Regulatory CD4+ CD25+ T Lymphocytes by Gene Expression Profiling
One of the central problems in immunology is to understand how the
immune system can discriminate between self and nonself, thereby inhibiting
autoimmunity but mounting efficient immune responses to eradicate
infectious microorganisms. The evolution of the adaptive immune system
in higher vertebrates allows a more efficient and specific elimination of
invading pathogens than the ancestral innate immune system. The adaptive
immune system is characterized by the random generation of antigen
receptors in lymphocyte clones with essentially unlimited specificities. This
system, however, also generates self-reactive lymphocytes and therefore poses
the threat of autoimmunity. Work over the past decades has unraveled
the cellular and molecular mechanisms that lead to the elimination or
functional inactivation of autoreactive T and B cells in the thymus and bone
marrow, respectively. The clonal deletion of self-reactive T cells in the thymus
by apoptosis is called recessive tolerance because elimination of individual
autoreactive T-cell clones does not affect other autoreactive clones. However,
it soon became clear that not all autoreactive T cells are deleted in the
thymus and that additional mechanisms must exist to maintain tolerance.
Work over the past 10 years has identified a subpopulation of CD4+ T cells
that acts in a dominant way to actively suppress immune activation and plays
a critical role in the maintenance of self-tolerance and homeostasis. These
cells are characterized by high-level expression of the a chain of the IL-2
receptor (CD25) and have been called natural CD4+ CD25+ T regulatory
(Treg) cells [97] (Table 18.1-2). The identification of CD25 as a cell surface
marker of Treg and the development of an in vitro T-cell suppression assay
have greatly facilitated the analysis of the mechanisms of T-cell-mediated
dominant tolerance [98, 99].

Table 18.1-2 An overview of CD4+ T-cell subsets

CD4+ T-cell  Function                     Transcription factor  Cell surface  Cytokines secreted
subset                                    involved in lineage   marker        following
                                          specification                       stimulation
Treg         Maintenance of tolerance     FOXP3                 CD25          IL-10 (?)
             Prevention of autoimmunity                         CTLA-4
             Homeostasis                                        GITR
Th1          Cell-mediated immunity       T-bet                 IL-12Rβ2      IFN-γ
             Protection against           STAT4
             intracellular pathogens
Th2          Humoral immunity             GATA-3                CRTH2         IL-4
             Protection against           STAT6                               IL-5
             extracellular pathogens                                          IL-13

Only main features of the individual subsets are shown.
Molecules in bold are unique for the respective subset.

Although this is still a matter of some discussion,
there is now good evidence that CD4+ CD25+ Treg constitute a separate
lineage that develops in the thymus (see Refs. 100-102 for recent reviews).
The recent identification of FOXP3 as a transcription factor essential for the
development and function of CD4+ CD25+ Treg has provided an important
breakthrough for the analysis of this subpopulation of peripheral CD4+ T cells
[103-105]. Evidence that this forkhead/winged-helix transcription factor is
essential for Treg development comes from the analysis of scurfy mice. These
mice carry a mutated Foxp3 gene and are characterized by a massive activation
and expansion of CD4+ T cells resulting in gross enlargement of secondary
lymphoid organs, severe dermatitis, lymphocytic infiltration of multiple
organs, hypergammaglobulinemia, and autoimmune hemolytic anemia [106].
The analysis of scurfy mice demonstrated that the disease is mediated by CD4+
T cells. This finding was confirmed by the analysis of FOXP3-deficient mice,
which display polyclonal activation of CD4+ T cells already 7 days after birth
[103]. By knock-in of a GFP-FOXP3 reporter allele into the murine FOXP3
locus, the Rudensky laboratory has now provided compelling evidence that
Treg constitute a separate lineage that develops in the thymus and that FOXP3
is in fact the lineage-specification factor of these cells [107].
Importantly, FOXP3 mutations are also responsible for the pathogenesis
of immune dysregulation, polyendocrinopathy, enteropathy, X-linked (IPEX),
a fatal human X-linked disorder characterized by extensive multiorgan
lymphocyte infiltration and abnormal activation of effector CD4+ T cells. At a
very young age, IPEX patients present with massive lymphoproliferation,
early-onset insulin-dependent diabetes mellitus (IDDM), thyroiditis, eczema,
severe enteropathy, and food allergies preventing normal food intake, as well
as additional autoimmune pathologies such as autoimmune hemolytic anemia
and thrombocytopenia, and severe infection [108-110]. Affected males
succumb to the IPEX syndrome between
3 and 4 weeks of age. Altogether, there is compelling evidence that FOXP3
is necessary for development of CD4+ CD25+ Treg in mice, and the
identification of FOXP3 mutations in IPEX patients suggests that this transcription
factor plays a similar role in humans. Although the identification of FOXP3
as the lineage-specification factor of Treg has provided a precious tool to understand
the ontogeny and function of this lineage, the important question of how
Treg acquire and exert their suppressive action remains unresolved [111]. In
particular, the target genes of Foxp3 have not been identified and nothing
is known about the molecular mechanism by which this transcription factor
downregulates the activity of CD4+ T cells. Given the accumulating evidence
that the immunosuppressive potential of Treg could be used therapeutically
to treat autoimmune diseases and facilitate transplant tolerance, or could be
targeted to elicit tumor immunotherapy, it is not surprising that many
laboratories are currently trying to unravel the molecular basis of Treg-mediated
immunosuppression. Several labs have performed large-scale gene expression
studies to identify molecules mediating the suppressive effects of Treg. Most
of these studies have been performed in mice and, given the current pace of
the field, human studies are sure to follow soon.
Gavin et al. have purified resting CD4+ CD25+ and CD4+ CD25- T cells
from normal B6 mice by cell sorting and have analyzed their gene expression
profiles using Affymetrix Mu11K and Mu19K oligonucleotide arrays [112].
In the first experiment, biotinylated cRNA was amplified directly from
cDNA, whereas in the second experiment two sequential rounds of in
vitro transcription were used to obtain enough cRNA for analysis. With a
few exceptions, only transcripts that were differentially expressed in both
experiments were considered for confirmation by real-time RT-PCR. A
comforting finding was the strong upregulation of CD25 in Treg when
compared to CD4+ CD25- T cells. Additional cell surface receptors that were
upregulated in Treg included cytotoxic T lymphocyte-associated protein 4
(CTLA4), a molecule that has been implicated in the suppressive effects
of Treg, and several members of the TNF receptor superfamily, including
glucocorticoid-induced tumor necrosis factor receptor (GITR, also called
Tnfrsf18), OX40 (also called Tnfrsf4), 4-1BB (also called Tnfrsf9), and TNFR2
(also called Tnfrsf1b, the p75 chain of the TNF receptor). Together with the
overexpression of FAS-associated phosphatase 1 (FAP-1), these data point
to a prolonged survival of Treg by restriction of TCR-induced apoptosis.
Furthermore, the authors found higher transcript levels of TGF-βRI, the
signal-transducing receptor subunit for TGF-β, an important negative regulator of
cell growth and inflammation. Additional transcripts that were found
overexpressed in Treg include the suppressors of cytokine signaling SOCS1
and SOCS2, as well as RGS1, a molecule that inhibits chemokine-induced
signaling through heterotrimeric G proteins [112]. The authors concluded that
the interplay of several pathways, such as increased T-cell survival and blockage
of TCR and cytokine signaling, may account for the unique characteristics of
Treg [112].
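
The selection rule described above, in which only transcripts that change in both independent experiments are carried forward for confirmation, can be illustrated with a short sketch. The gene symbols, signal intensities, and the twofold (log2 ratio of 1) threshold below are hypothetical values chosen for illustration; they are not data or parameters from the study by Gavin et al.

```python
import math


def log2_ratios(treg, conv):
    """Return log2(Treg / CD25-) for every gene measured in both samples."""
    return {g: math.log2(treg[g] / conv[g]) for g in treg if g in conv}


def consistent_hits(exp1, exp2, min_abs_log2=1.0):
    """Keep genes that pass the threshold in BOTH experiments, same direction."""
    hits = {}
    for gene in exp1.keys() & exp2.keys():
        a, b = exp1[gene], exp2[gene]
        same_direction = (a > 0) == (b > 0)
        if abs(a) >= min_abs_log2 and abs(b) >= min_abs_log2 and same_direction:
            hits[gene] = (a, b)
    return hits


# Hypothetical signal intensities (arbitrary units), two experiments.
exp1 = log2_ratios(
    {"Il2ra": 800, "Ctla4": 400, "Actb": 1000, "GeneX": 300},
    {"Il2ra": 100, "Ctla4": 100, "Actb": 1000, "GeneX": 100},
)
exp2 = log2_ratios(
    {"Il2ra": 640, "Ctla4": 300, "Actb": 900, "GeneX": 110},
    {"Il2ra": 80, "Ctla4": 100, "Actb": 1000, "GeneX": 100},
)

selected = consistent_hits(exp1, exp2)
print(sorted(selected))
```

In this toy data set the hypothetical transcript GeneX changes in the first experiment only and is therefore discarded, which is the conservative behavior the intersection rule is meant to provide when few replicates are available.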
The characteristics of mouse Treg and CD4+ CD25- T cells were also
analyzed in a similar study by McHugh et al. [113]. As in the previous report,
only two biological replicates were performed; however, this study also analyzed
the gene expression profiles of Treg and CD4+ CD25- T cells that had been
stimulated for 12 and 48 h with anti-CD3 antibodies. Gene expression profiling
was performed using Affymetrix Mu11K oligonucleotide arrays. Only 29 genes
were found to be differentially expressed when comparing resting Treg and
CD4+ CD25- T cells. For unknown reasons, in this study, the “positive control”
of this experiment, CD25, was not detected in Treg [113]. Although the use of
only two replicate experiments in both studies certainly does not allow major
conclusions to be drawn, the fact that 50% of the genes found by McHugh et al.
were also detected in the study by Gavin et al. provides some cross-validation
of the results [112, 113]. McHugh et al. focused their study on the functional
role of GITR for the suppressive functions of Treg and demonstrated that
agonistic antibodies against GITR could abrogate Treg-mediated suppression
in in vitro T-cell suppression assays [113]. Additional microarray-based studies
have identified neuropilin-1 (Nrp1) [114] and lymphocyte activation gene-3
(Lag-3) [115] as Treg-specific cell surface molecules. With respect to Lag-3, it
should be noted that this receptor is also highly expressed on activated Th1
cells [116].
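
The kind of cross-validation mentioned above, asking what fraction of the genes reported in one study is also found in another, amounts to a simple set comparison once both lists use the same gene nomenclature. The sketch below is a minimal illustration with hypothetical gene lists; the lists are not the published results of either study, and the function name is an assumption made for this example.

```python
def overlap_fraction(query, reference):
    """Fraction of genes in `query` that also appear in `reference`."""
    query, reference = set(query), set(reference)
    if not query:
        raise ValueError("query gene list is empty")
    return len(query & reference) / len(query)


# Hypothetical gene lists using one common nomenclature.
study_a = ["Ctla4", "Tnfrsf18", "Tnfrsf4", "Socs2", "Rgs1", "Il2ra"]
study_b = ["Tnfrsf18", "Ctla4", "Nrp1", "Lag3"]

shared = sorted(set(study_a) & set(study_b))
frac = overlap_fraction(study_b, study_a)
print(shared, frac)
```

The comparison is only meaningful if both lists have been mapped to the same identifiers beforehand; a gene listed as GITR in one study and as Tnfrsf18 in the other would otherwise be counted as a mismatch.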
Herman et al. have recently analyzed the function of Treg in a type 1
diabetes model in mice [117]. Type 1 diabetes models are particularly useful for
the study of autoimmune diseases because mice spontaneously develop the
disease and their pathology is very similar to the human counterpart, IDDM.
The disease develops in two stages: in the BDC2.5 model, cells invade the
pancreas and set up a massive infiltrate in the islets at 15-18 days of age
(insulitis). Subsequently, only 10-20% of animals develop diabetes resulting
from the massive destruction of pancreatic β-cells at around 20 weeks of age.
The authors studied whether the relatively long prediabetic period and low
incidence of diabetes in this model may be explained by the presence of Treg
in the pancreas. They show that both Treg and effector T cells coexist within
the pancreatic lesion before the onset of diabetes. To assess the potential
roles of Treg within the lesion, they sorted CD4+ CD25+ CD69- Treg cells
from the pancreas of prediabetic mice and compared their gene expression
profile to effector T-cell populations, also isolated by cell sorting from pancreas
preparations. Since only small cell numbers could be obtained with these
procedures, the authors used commercial kits to amplify RNA. Three to
five independent experiments were performed for each cell population and
statistical algorithms were used for data analysis [117]. In addition to genes
overexpressed in Treg, such as GITR, CD103, Nrp-1, IL-10, and CTLA-4, the
authors identified several molecules that had previously not been associated
with Treg functions [117]. One of these molecules, inducible costimulator
(ICOS), was shown to be specifically upregulated on Treg purified from
pancreas but not on Treg that had been purified from peripheral lymph nodes.
The authors showed that blockade of ICOS results in a rapid progression
from insulitis to diabetes, giving a strong indication that this molecule may
play an important role in the maintenance of the prediabetic stage [117]. This
study provides an excellent example of how increased understanding of the
molecular and cellular basis of regulatory events in the pancreatic islets could
lead to the development of therapies that promote long-term tolerance even
after an immune response has been established in the lesion.
18.1.5
Future Development
Genome-wide gene expression analysis has become a tool that is widely used
in biology and biomedical research. Technological improvements are likely to
occur with respect to reduced sample input and/or more robust protocols for
the preamplification of RNA, an increase of sensitivity, a better signal-to-noise
ratio, the development of exon-specific probes to tackle the important issue
of differentially spliced transcripts and of probes allowing the analysis of
micro-RNAs. An equally important, although certainly more difficult, issue
concerns standardization. Some of the efforts in this direction have been
discussed in this review but we would like to reemphasize the importance of
a common language for gene annotation and standardized information about
experimental setups. Initially, microarray technology was used to answer very
basic biological questions and usually the analysis focused on the identification
of differentially expressed genes. Now, as microarray technology is becoming
more widely used, it is possible to address more and more sophisticated
questions by reanalyzing gene expression data present in the databases.
Finally, the development of whole-genome microarrays will allow the study of
genome-wide regulation of gene expression. The recently developed, so-called
ChIP-on-chip technology analyzes the association of a specific factor to a
particular region of the genome, at a given time. Very recent examples for this
technology include the genome-wide analysis of RNA polymerase II association
with the yeast genome [118], the global mapping of histone acetylation patterns
to gene expression in yeast [119], and the genome-wide analysis of transcription
factor binding to the yeast genome [120].
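
As a rough illustration of how ChIP-on-chip data relate a factor to genomic regions, the following sketch computes a per-probe enrichment of immunoprecipitated (IP) signal over input signal and flags probes above a cutoff. The probe names, intensities, median centering, and the cutoff of 1 log2 unit are assumptions made for this example and do not describe the procedures used in the cited studies.

```python
import math
from statistics import median


def log2_enrichment(ip, inp):
    """Per-probe log2(IP / input), centred on the array-wide median."""
    raw = {p: math.log2(ip[p] / inp[p]) for p in ip if p in inp}
    centre = median(raw.values())
    return {p: value - centre for p, value in raw.items()}


def bound_probes(enrichment, cutoff=1.0):
    """Probes whose centred log2 enrichment reaches the cutoff."""
    return sorted(p for p, value in enrichment.items() if value >= cutoff)


# Hypothetical probe intensities (arbitrary units).
ip_signal = {"p1": 200, "p2": 210, "p3": 1600, "p4": 190, "p5": 205}
input_signal = {"p1": 200, "p2": 200, "p3": 200, "p4": 200, "p5": 200}

enrichment = log2_enrichment(ip_signal, input_signal)
bound = bound_probes(enrichment)
print(bound)
```

Centering on the median assumes that most probes are not bound by the factor, so that the bulk of the array can serve as the background level.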
References
1. D.J. Lockhart, E.A. Winzeler, Genomics, gene expression and DNA arrays, Nature 2000, 405, 827-836.
2. M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14863-14868.
3. J. Quackenbush, Computational analysis of microarray data, Nat. Rev. Genet. 2001, 2, 418-427.
4. H.C. Causton, J. Quackenbush, A. Brazma, Microarray Gene Expression Data Analysis. A Beginner’s Guide, Blackwell Publishing, Oxford, 2003.
5. M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 1995, 270, 467-470.
6. R.J. Lipshutz, D. Morris, M. Chee, E. Hubbell, M.J. Kozal, N. Shah, N. Shen, R. Yang, S.P. Fodor, Using oligonucleotide probe arrays to access genetic diversity, Biotechniques 1995, 19, 442-447.
7. P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 1999, 21, 33-37.
8. R.J. Lipshutz, S.P. Fodor, T.R. Gingeras, D.J. Lockhart, High density synthetic oligonucleotide arrays, Nat. Genet. 1999, 21, 20-24.
9. S.P. Fodor, J.L. Read, M.C. Pirrung, L. Stryer, A.T. Lu, D. Solas, Light-directed, spatially addressable parallel chemical synthesis, Science 1991, 251, 767-773.
10. A.C. Pease, D. Solas, E.J. Sullivan, M.T. Cronin, C.P. Holmes, S.P. Fodor, Light-generated oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 5022-5026.
11. D.J. Lockhart, H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, E.L. Brown, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nat. Biotechnol. 1996, 14, 1675-1680.
12. T.H.G.S. Consortium, Finishing the euchromatic sequence of the human genome, Nature 2004, 431, 931-945.
13. M.K. Kerr, G.A. Churchill, Experimental design for gene expression microarrays, Biostatistics 2001, 2, 183-201.
14. B.A. Williams, R.M. Gwirtz, B.J. Wold, Genomic DNA as a cohybridization standard for mammalian microarray measurements, Nucleic Acids Res. 2004, 32, e81-e81.
15. N. Novoradovskaya, M. Whitfield, L. Basehore, A. Novoradovsky, R. Pesich, J. Usary, M. Karaca, W. Wong, O. Aprelikova, M. Fero, C. Perou, D. Botstein, J. Braman, Universal reference RNA as a standard for microarray experiments, BMC Genomics 2004, 5, 20.
16. P.J. Park, Y.A. Cao, S.Y. Lee, J.-W. Kim, M.S. Chang, R. Hart, S. Choi, Current issues for DNA microarrays: platform comparison, double linear amplification, and universal RNA reference, J. Biotechnol. 2004, 112, 225-245.
17. W. Gregory Cox, M.P. Beaudet, J.Y. Agnew, J.L. Ruth, Possible sources of dye-related signal correlation bias in two-color DNA microarray assays, Anal. Biochem. 2004, 331, 243-254.
18. A.A. Dombkowski, B.J. Thibodeau, S.L. Starcevic, R.F. Novak, Gene-specific dye bias in microarray reference designs, FEBS Lett. 2004, 560, 120-124.
19. M.-L. Martin-Magniette, J. Aubert, E. Cabannes, J.-J. Daudin, Evaluation of the gene-specific dye bias in cDNA microarray experiments, Bioinformatics 2005, 21, 1995-2000.
20. B.H. Mecham, G.T. Klus, J. Strovel, M. Augustus, D. Byrne, P. Bozso, D.Z. Wetmore, T.J. Mariani, I.S. Kohane, Z. Szallasi, Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements, Nucleic Acids Res. 2004, 32, e74.
21. M. Bakay, Y.-W. Chen, R. Borup, P. Zhao, K. Nagaraju, E. Hoffman, Sources of variability and effect of experimental approach on expression profiling data interpretation, BMC Bioinformatics 2002, 3, 4.
22. J.P. Novak, R. Sladek, T.J. Hudson, Characterization of variability in large-scale gene expression data: Implications for study design, Genomics 2002, 79, 104-113.
23. C.C. Pritchard, L. Hsu, J. Delrow, P.S. Nelson, Project normal: Defining normal variance in mouse gene expression, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 13266-13271.
24. M.C.K. Yang, J.J. Yang, R.A. McIndoe, J.X. She, Microarray experimental design: power and sample size considerations, Physiol. Genomics 2003, 16, 24-28.
25. C. Wei, J. Li, R. Bumgarner, Sample size for detecting differentially expressed genes in microarray experiments, BMC Genomics 2004, 5, 87.
26. S.-H. Jung, H. Bang, S. Young, Sample size calculation for multiple testing in microarray data analysis, Biostatistics 2005, 6, 157-169.
27. K. Dobbin, R. Simon, Sample size determination in microarray experiments for class comparison and prognostic classification, Biostatistics 2005, 6, 27-38.
28. C.-A. Tsai, S.-J. Wang, D.-T. Chen, J.J. Chen, Sample size for gene expression microarray experiments, Bioinformatics 2005, 21, 1502-1508.
29. X. Peng, C. Wood, E. Blalock, K. Chen, P. Landfield, A. Stromberg, Statistical implications of pooling RNA samples for microarray experiments, BMC Bioinformatics 2003, 4, 26.
30. C. Kendziorski, R.A. Irizarry, K.-S. Chen, J.D. Haag, M.N. Gould, On the utility of pooling biological samples in microarray experiments, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4252-4257.
31. J.H. Shih, A.M. Michalowska, K. Dobbin, Y. Ye, T.H. Qiu, J.E. Green, Effects of pooling mRNA in microarray class comparisons, Bioinformatics 2004, 20, 3318-3325.
32. L. Luo, R.C. Salunga, H. Guo, A. Bittner, K.C. Joy, J.E. Galindo, H. Xiao, K.E. Rogers, J.S. Wan, M.R. Jackson, M.G. Erlander, et al. Gene expression profiles of laser-captured adjacent neuronal subtypes, Nat. Med. 1999, 5, 117-122.
33. C. Leethanakul, V. Patel, J. Gillespie, M. Pallente, J.F. Ensley, S. Koontongkaew, L.A. Liotta, M. Emmert-Buck, J.S. Gutkind, Distinct pattern of expression of differentiation and growth-related genes in squamous cell carcinomas of the head and neck revealed by the use of laser capture microdissection and cDNA arrays, Oncogene 2000, 19, 3220-3224.
34. L.V. Hooper, M.H. Wong, A. Thelin, L. Hansson, P.G. Falk, J.I. Gordon, Molecular analysis of commensal host-microbial relationships in the intestine, Science 2001, 291, 881-884.
35. V. Luzzi, M. Mahadevappa, R. Raja, J.A. Warrington, M.A. Watson, Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis, J. Mol. Diagn. 2003, 5, 9-14.
36. C. King, N. Guo, G.M. Frampton, N.P. Gerry, M.E. Lenburg, C.L. Rosenberg, Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays, J. Mol. Diagn. 2005, 7, 57-64.
37. T. Ernst, M. Hergenhahn, M. Kenzelmann, C.D. Cohen, M. Bonrouhi, A. Weninger, R. Klaren, E.F. Grone, M. Wiesel, C. Gudemann, J. Kuster, W. Schott, G. Staehler, M. Kretzler, M. Hollstein, H.-J. Grone, Decrease and gain of gene expression are equally discriminatory markers for prostate carcinoma: A gene expression analysis on total and microdissected prostate tissue, Am. J. Pathol. 2002, 160, 2169-2180.
38. D.J. Kelly, S. Ghosh, RNA profiling for biomarker discovery: practical considerations for limiting sample sizes, Dis. Markers 2005, 21, 43-48.
39. S. Klur, K. Toy, M.P. Williams, U. Certa, Evaluation of procedures for amplification of small-size samples for hybridization on microarrays, Genomics 2004, 83, 508-517.
40. J. McClintick, R. Jerome, C. Nicholson, D. Crabb, H. Edenberg, Reproducibility of oligonucleotide arrays using small samples, BMC Genomics 2003, 4, 4.
41. R. Singh, R.J. Maganti, S.V. Jabba, M. Wang, G. Deng, J.D. Heath, N. Kurn, P. Wangemann, Microarray based comparison of three amplification methods for nanogram amounts of total RNA, Am. J. Physiol. Cell Physiol. 2005, 288, 1179-1189.
42. L. Li, J. Roden, B.E. Shapiro, B.J. Wold, S. Bhatia, S.J. Forman, R. Bhatia, Reproducibility, fidelity, and discriminant validity of mRNA amplification for microarray analysis from primary hematopoietic cells, J. Mol. Diagn. 2005, 7, 48-56.
43. J.J. Upson, R. Stoyanova, H.S. Cooper, C. Patriotis, E.A. Ross, B. Boman, M.L. Clapper, A.G. Knudson, A. Bellacosa, Optimized procedures for microarray analysis of histological specimens processed by laser capture microdissection, J. Cell. Physiol. 2004, 201, 366-373.
44. B.M. Bolstad, F. Collin, K.M. Simpson, R.A. Irizarry, T.P. Speed, Experimental Design and Low-Level Analysis of Microarray Data, Int. Rev. Neurobiol. 2004, 60, 25-58.
45. N.J. Armstrong, M.A. van de Wiel, Microarray data analysis: From hypotheses to conclusions using gene expression data, Cell. Oncol. 2004, 26, 279-290.
46. D.K. Slonim, From patterns to pathways: gene expression data analysis comes of age, Nat. Genet. 2002, 32, 502-508.
47. P. Khatri, P. Bhavsar, G. Bawa, S. Draghici, Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments, Nucleic Acids Res. 2004, 32, W449-W456.
48. P.K. Tan, T.J. Downey, E.L. Spitznagel Jr, P. Xu, D. Fu, D.S. Dimitrov, R.A. Lempicki, B.M. Raaka, M.C. Cam, Evaluation of gene expression measurements from commercial microarray platforms, Nucleic Acids Res. 2003, 31, 5676-5684.
49. R. Shippy, T. Sendera, R. Lockner, C. Palaniappan, T. Kaysser-Kranich, G. Watts, J. Alsobrook, Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations, BMC Genomics 2004, 5, 61.
50. D. Hollingshead, D.A. Lewis, K. Mirnics, Platform influence on DNA microarray data in postmortem brain research, Neurobiol. Dis. 2005, 18, 649-655.
51. L.W. Jurata, Y.V. Bukhman, V. Charles, F. Capriglione, J. Bullard, A.L. Lemire, A. Mohammed, Q. Pham, P. Laeng, J.A. Brockman, C.A. Altar, Comparison of microarray-based mRNA profiling technologies for identification of psychiatric disease and drug signatures, J. Neurosci. Methods 2004, 138, 173-188.
52. A.-K. Jarvinen, S. Hautaniemi, H. Edgren, P. Auvinen, J. Saarela, O.-P. Kallioniemi, O. Monni, Are data from different gene expression microarray platforms comparable? Genomics 2004, 83, 1164-1168.
53. J. Lee, K. Bussey, F. Gwadry, W. Reinhold, G. Riddick, S. Pelletier, S. Nishizuka, G. Szakacs, J.-P. Annereau, U. Shankavaram, S. Lababidi, L. Smith, M. Gottesman, J. Weinstein, Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI-60 cancer cells, Genome Biol. 2003, 4, R82.
54. S. Mitchell, K. Brown, M. Henry, M. Mintz, D. Catchpoole, B. LaFleur, D. Stephan, Inter-Platform comparability of microarrays in acute lymphoblastic leukemia, BMC Genomics 2004, 5, 71.
55. A. Brazma, P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J. Aach, W. Ansorge, C.A. Ball, H.C. Causton, T. Gaasterland, P. Glenisson, F.C. Holstege, I.F. Kim, V. Markowitz, J.C. Matese, H. Parkinson, A. Robinson, U. Sarkans, S. Schulze-Kremer, J. Stewart, R. Taylor, J. Vilo, M. Vingron, Minimum information about a microarray experiment (MIAME)-toward standards for microarray data, Nat. Genet. 2001, 29, 365-371.
56. C.J. Penkett, J. Baehler, Navigating public microarray databases, Comp. Funct. Genomics 2004, 5, 471-479.
57. E. Manduchi, G.R. Grant, H. He, J. Liu, M.D. Mailman, A.D. Pizarro, P.L. Whetzel, C.J. Stoeckert Jr, RAD and the RAD study-annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies, Bioinformatics 2004, 20, 452-459.
58. H. Bono, T. Kasukawa, Y. Hayashizaki, Y. Okazaki, READ: RIKEN expression array database, Nucleic Acids Res. 2002, 30, 211-213.
59. C. Wiederkehr, R. Basavaraj, C. Sarrauste de Menthiere, L. Hermida, R. Koch, U. Schlecht, A. Amon, S. Brachat, M. Breitenbach, P. Briza, S. Caburet, M. Cherry, R. Davis, A. Deutschbauer, H.G. Dickinson, T. Dumitrescu, M. Fellous, A. Goldman, J.A. Grootegoed, R. Hawley, R. Ishii, B. Jegou, R.J. Kaufman, F. Klein, N. Lamb, B. Maro, K. Nasmyth, A. Nicolas, T. Orr-Weaver, P. Philippsen, C. Pineau, K.P. Rabitsch, V. Reinke, H. Roest, W. Saunders, M. Schroder, T. Schedl, M. Siep, A. Villeneuve, D.J. Wolgemuth, M. Yamamoto, D. Zickler, R.E. Esposito, M. Primig, Germonline, a cross-species community knowledgebase on germ cell differentiation, Nucleic Acids Res. 2004, 32, D560-D567.
60. V. Praz, V. Jagannathan, P. Bucher, CleanEx: a database of heterogeneous gene expression data based on a consistent gene nomenclature, Nucleic Acids Res. 2004, 32, D542-D547.
61. M. Ringwald, J.T. Eppig, J.E. Richardson, GXD: integrated access to gene expression data for the laboratory mouse, Trends Genet. 2000, 16, 188-190.
62. D.P. Hill, D.A. Begley, J.H. Finger, T.F. Hayamizu, I.J. McCright, C.M. Smith, J.S. Beal, L.E. Corbani, J.A. Blake, J.T. Eppig, J.A. Kadin, J.E. Richardson, M. Ringwald, The mouse Gene Expression Database (GXD): updates and enhancements, Nucleic Acids Res. 2004, 32, D568-D571.
63. T.R. Mosmann, R.L. Coffman, Th1 and Th2 cells: Different patterns of lymphokine secretion lead to different functional properties, Annu. Rev. Immunol. 1989, 7, 145-173.
64. A.K. Abbas, K.M. Murphy, A. Sher, Functional diversity of helper T lymphocytes, Nature 1996, 383, 787-793.
65. K.M. Murphy, S.L. Reiner, The lineage decisions of helper T cells, Nat. Rev. Immunol. 2002, 2, 933-944.
66. G. Trinchieri, Interleukin-12 and the regulation of innate resistance and adaptive immunity, Nat. Rev. Immunol. 2003, 3, 133-146.
67. S.J. Szabo, B.M. Sullivan, S.L. Peng, L.H. Glimcher, Molecular mechanisms regulating Th1 immune responses, Annu. Rev. Immunol. 2003, 21, 713-758.
68. S.J. Szabo, S.T. Kim, G.L. Costa, X. Zhang, C.G. Fathman, L.H. Glimcher, A novel transcription factor, T-bet, directs Th1 lineage commitment, Cell 2000, 100, 655-669.
69. W. Zheng, R.A. Flavell, The transcription factor GATA-3 is necessary and sufficient for Th2 cytokine gene expression in CD4 T cells, Cell 1997, 89, 587-596.
70. K.A. Mowen, L.H. Glimcher, Signaling pathways in Th2 development, Immunol. Rev. 2004, 202, 203-222.
71. A.A. Lighvani, D.M. Frucht, D. Jankovic, H. Yamane, J. Aliberti, B.D. Hissong, B.V. Nguyen, M. Gadina, A. Sher, W.E. Paul, J.J. O’Shea, T-bet is rapidly induced by interferon-γ in lymphoid and myeloid cells, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 15137-15142.
72. M. Afkarian, J.R. Sedy, J. Yang, N.G. Jacobson, N. Cereb, S.Y. Yang, T.L. Murphy, K.M. Murphy, T-bet is a STAT1-induced regulator of IL-12R expression in naive CD4+ T cells, Nat. Immunol. 2002, 3, 549-557.
73. L. Rogge, E. Bianchi, M. Biffi, E. Bono, S.Y. Chang, H. Alexander, C. Santini, G. Ferrari, L. Sinigaglia, M. Seiler, M. Neeb, J. Mous, F. Sinigaglia, U. Certa, Transcript imaging of the development of human T helper cells using oligonucleotide arrays, Nat. Genet. 2000, 25, 96-101.
74. R. Higuchi, R. Watson, Kinetic PCR analysis using a CCD camera and without using oligonucleotide probes. In PCR Applications (Eds.: M.A. Innis, D.H. Gelfand, J.J. Sninsky), Academic Press, San Diego, 1999, pp. 263-284.
75. W. Ouyang, S.H. Ranganath, K. Weindel, D. Bhattacharya, T.L. Murphy, W.C. Sha, K.M. Murphy, Inhibition of Th1 development mediated by GATA-3 through an IL-4-independent mechanism, Immunity 1998, 9, 745-755.
76. S. Taki, T. Sato, K. Ogasawara, T. Fukuda, M. Sato, S. Hida, G. Suzuki, M. Mitsuyama, E.-H. Shin, S. Kojima, T. Taniguchi, Y. Asano, Multistage regulation of Th1-type immune responses by the transcription factor IRF-1, Immunity 1997, 6, 673-679.
77. E.M. Coccia, N. Passini, A. Battistini, C. Pini, F. Sinigaglia, L. Rogge, IL-12 induces expression of interferon regulatory factor-1 via signal transducer and activator of transcription-4 in human T helper type 1 cells, J. Biol. Chem. 1999, 274, 6698-6703.
78. R. Grenningloh, B.Y. Kang, I.C. Ho, Ets-1, a functional cofactor of T-bet, is essential for Th1 inflammatory responses, J. Exp. Med. 2005, 201, 615-626.
79. F. Sallusto, C.R. Mackay, A. Lanzavecchia, The role of chemokine receptors in primary, effector, and memory immune responses, Annu. Rev. Immunol. 2000, 18, 593-620.
80. F. De Benedetti, P. Pignatti, M. Biffi, E. Bono, S. Wahid, F. Ingegnoli, S.Y. Chang, H. Alexander, M. Massa, A. Pistorio, A. Martini, C. Pitzalis, F. Sinigaglia, L. Rogge, Increased expression of alpha(1,3)-fucosyltransferase-VII and P-selectin binding of synovial fluid T cells in juvenile idiopathic arthritis, J. Rheumatol. 2003, 30, 1611-1615.
81. J.J. Bird, D.R. Brown, A.C. Mullen, N.H. Moskowitz, M.A. Mahowald, J.R. Sider, T.F. Gajewski, C.-R. Wang, S.L. Reiner, Helper T cell differentiation is controlled by the cell cycle, Immunity 1998, 9, 229-237.
82. J.A. Lederer, J.S. Liou, S. Kim, N. Rice, A. Lichtman, Regulation of NF-kB activation in T helper 1 and T helper 2 cells, J. Immunol. 1996, 156, 56-63.
83. H. Hamalainen, H. Zhou, W. Chou, H. Hashizume, R. Heller, R. Lahesmaa, Distinct gene expression profiles of human type 1 and type 2 T helper cells, Genome Biol. 2001, 2, research0022.1-0022.11.
84. T. Chtanova, R.A. Kemp, A.P. Sutherland, F. Ronchese, C.R. Mackay, Gene microarrays reveal extensive differential gene expression in both CD4(+) and CD8(+) type 1 and type 2 T cells, J. Immunol. 2001, 167, 3057-3063.
85. M. Rincon, J. Anguita, T. Nakamura, E. Fikrig, R.A. Flavell, Interleukin (IL)-6 directs the differentiation of IL-4-producing CD4+ T cells, J. Exp. Med. 1997, 185, 461-469.
86. S. Diehl, J. Anguita, A. Hoffmeyer, T. Zapton, J.N. Ihle, E. Fikrig, M. Rincon, Inhibition of Th1 differentiation by IL-6 is mediated by SOCS1, Immunity 2000, 13, 805-815.
87. C.M.U. Hilkins, G. Messer, K. Tesselaar, A.G.I. van Rietschoten, M.L. Kapsenberg, E.A. Wierenga, Lack of IL-12 signaling in human allergen-specific Th2 cells, J. Immunol. 1996, 157, 4316-4321.
88. L. Rogge, L. Barberis-Maino, M. Biffi, N. Passini, D.H. Presky, U. Gubler, F. Sinigaglia, Selective expression of an interleukin-12 receptor component by human T helper 1 cells, J. Exp. Med. 1997, 185, 825-831.
89. L. Rogge, D. D’Ambrosio, M. Biffi, G. Penna, L.J. Minetti, D.H. Presky, L. Adorini, F. Sinigaglia, The role of Stat4 in species-specific regulation of Th cell development by type I IFNs, J. Immunol. 1998, 161, 6567-6574.
90. V. Athie-Morales, H.H. Smits, D.A. Cantrell, C.M. Hilkens, Sustained IL-12 signaling is required for Th1 development, J. Immunol. 2004, 172, 61-69.
91. T. Usui, R. Nishikomori, A. Kitani, W. Strober, GATA-3 suppresses Th1 development by downregulation of Stat4 and not through effects on IL-12Rbeta2 chain or T-bet, Immunity 2003, 18, 415-428.
92. B. Lu, P. Zagouras, J.E. Fischer, J. Lu, B. Li, R.A. Flavell, Kinetic analysis of genomewide gene expression reveals molecule circuitries that control T cell activation and Th1/2 differentiation, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 3023-3028.
93. O. Avni, D. Lee, F. Macian, S.J. Szabo, L.H. Glimcher, A. Rao, T(H) cell differentiation is accompanied by dynamic changes in histone acetylation of cytokine genes, Nat. Immunol. 2002, 3, 643-651.
94. T.A. Wynn, R. Morawetz, T. Scharton-Kersten, S. Hieny, H.C. Morse III, R. Kuhn, W. Muller, A.W. Cheever, A. Sher, Analysis of granuloma formation in double cytokine-deficient mice reveals a central role for IL-10 in polarizing both T helper cell 1- and T helper cell 2-type cytokine responses in vivo, J. Immunol. 1997, 159, 5014-5023.
95. K.F. Hoffmann, S.L. James, A.W. Cheever, T.A. Wynn, Studies with double cytokine-deficient mice reveal that highly polarized Th1- and Th2-Type cytokine and antibody responses contribute equally to vaccine-induced immunity to schistosoma mansoni, J. Immunol. 1999, 163, 927-938.
96. N.G. Sandler, M.M. Mentink-Kane, A.W. Cheever, T.A. Wynn, Global gene expression profiles during acute pathogen-induced pulmonary inflammation reveal divergent roles for Th1 and Th2 responses in tissue repair, J. Immunol. 2003, 171, 3655-3667.
97. S. Sakaguchi, N. Sakaguchi, M. Asano, M. Itoh, M. Toda, Immunologic self-tolerance maintained by activated T cells expressing IL-2 receptor alpha-chains (CD25). Breakdown of a single mechanism of self-tolerance causes various autoimmune diseases, J. Immunol. 1995, 155, 1151-1164.
98. E.M. Shevach, CD4+ CD25+ suppressor T cells: more questions than answers, Nat. Rev. Immunol. 2002, 2, 389-400.
99. S. Sakaguchi, Naturally arising CD4+ regulatory t cells for immunologic self-tolerance and negative control of immune responses, Annu. Rev. Immunol. 2004, 22, 531-562.
100. R.H. Schwartz, Natural regulatory T cells and self-tolerance, Nat. Immunol. 2005, 6, 327-330.
101. S. Sakaguchi, Naturally arising Foxp3-expressing CD25+CD4+ regulatory T cells in immunological tolerance to self and non-self, Nat. Immunol. 2005, 6, 345-352.
103. J.D. Fontenot, M.A. Gavin, A.Y. Rudensky, Foxp3 programs the development and function of CD4+CD25+ regulatory T cells, Nat. Immunol. 2003, 4, 330-336.
104. R. Khattri, T. Cox, S.A. Yasayko, F. Ramsdell, An essential role for Scurfin in CD4+CD25+ T regulatory cells, Nat. Immunol. 2003, 4, 337-342.
105. S. Hori, T. Nomura, S. Sakaguchi, Control of regulatory T cell development by the transcription factor Foxp3, Science 2003, 299, 1057-1061.
106. M.E. Brunkow, E.W. Jeffery, K.A. Hjerrild, B. Paeper, L.B. Clark, S.A. Yasayko, J.E. Wilkinson, D. Galas, S.F. Ziegler, F. Ramsdell, Disruption of a new forkhead/winged-helix protein, scurfin, results in the fatal lymphoproliferative disorder of the scurfy mouse, Nat. Genet. 2001, 27, 68-73.
107. J.D. Fontenot, J.P. Rasmussen, L.M. Williams, J.L. Dooley, A.G. Farr, A.Y. Rudensky, Regulatory T cell lineage specification by the forkhead transcription factor foxp3, Immunity 2005, 22, 329-341.
108. T.A. Chatila, F. Blaeser, N. Ho, H.M. Lederman, C. Voulgaropoulos, C. Helms, A.M. Bowcock, JM2, encoding a fork head-related protein, is mutated in X-linked autoimmunity-allergic disregulation syndrome, J. Clin. Invest. 2000, 106, R75-R81.
109. R.S. Wildin, F. Ramsdell, J. Peake, F. Faravelli, J.L. Casanova, N. Buist, E. Levy-Lahad, M. Mazzella, O. Goulet, L. Perroni, F.D. Bricarelli, G. Byrne, M. McEuen, S. Proll, M. Appleby, M.E. Brunkow, X-linked neonatal diabetes mellitus, enteropathy and endocrinopathy syndrome is the human equivalent of mouse scurfy, Nat. Genet. 2001, 27,
102. J.D. Fontenot, A.Y. Rudensky, A well 18-20.
adapted regulatory contrivance: 110. C.L. Bennett, J. Christie, F. Ramsdell,
regulatory T cell development and M.E. Bmnkow, P.J. Ferguson,
the forkhead family transcription L. Whitesell, T.E. Kelly, F.T.
factor Foxp3, Nut. Immunol. 2005, 6, Saulsbury, P.F. Chance, H.D. Ochs,
331-337. The immune dysregulation,
References I 1 1 1 7
polyendocrinopathy, enteropathy, role for interleukin-4 and

X-linked syndrome (IPEX) is caused interferon-gamma on CD30 and
by mutations of FOXP3, Nut. Genet. lymphocyte activation gene-3 (LAG-3)
2001,27,20-21. expression by activated naive T cells,
111. H. von Boehmer, Mechanisms of Eur. J. Immunol. 1997, 27,
suppression by suppressor T cells, 2239- 2244.
Nat. Immunol. 2005, 6, 338-344. 117. A.E. Herman, G.J. Freeman,
112. M.A. Gavin, S.R. Clarke, E. Negrou, D. Mathis, C. Benoist, CD4+CD25+
A. Gallegos, A. Rudensky, T regulatory cells dependent on
Homeostasis and anergy of ICOS promote regulation of effector
CD4(+)CD25(+) suppressor T cells cells in the prediabetic lesion, J . Exp.
in vivo, Nut. Immunol. 2002, 3, Med. 2004, 199, 1479-1489.
33-41. 118. M. Radonjic, 7.-C. Andrau,
113. R.S. McHugh, M.J. Whitters, C.A. P. Lijnzaad, P. Kemmeren, T.T.J.P.
Piccirillo, D.A. Young, E.M. Shevach, Kockelkorn, D. van Leenen, N.L. van
M. Collins, M.C. Byrne, Berkum, F.C.P. Holstege,
CD4(+)CD25 (+)immunoregulatory Genome-wide analyses reveal RNA
T cells: gene expression analysis polymerase 11 located upstream of
reveals a functional role for the genes poised for rapid response upon
glucocorticoid-induced TNF receptor, S. cerevisiae stationary phase exit,
Immunity 2002, 16,311-323. Mol. Cells 2005, 18, 171-183.
114. D. Bruder, M. Probst-Kepper, A.M. 119. S.K. Kurdistani, S. Tavazoie,
Westendorf, R. Geffers, S. Beissert, M. Grunstein, Mapping global
K. Loser, H. von Boehmer, J. Buer, histone acetylation patterns to gene
W. Hansen, Neuropilin-1: a surface expression, Cell 2004, 117, 721-733.
marker of regulatory T cells, Eur. J . 120. C.T. Harbison, D.B. Gordon, T.I. Lee,
lmmunol. 2004,34,623-630. N.J. Rinaldi, K.D. Macisaac, T.W.
115. C.T. Huang, C.J. Workman, D. Flies, Danford, N.M. Hannett, J.B. Tagne,
X. Pan, A.L. Marson, G. Zhou, E.L. D.B. Reynolds, J. Yoo, E.G. Jennings,
Hipkiss, S. Ravi, J. Kowalski, H.I. J . Zeitlinger, D.K. Pokholok,
Levitsky, J.D. Powell, D.M. Pardoll, M. Kellis, P.A. Rolfe, K.T.
C.G. Drake, D.A. Vignali, Role of Takusagawa, E.S. Lander, D.K.
LAG-3 in regulatory T cells, Gifford, E. Fraenkel, R.A. Young,
Immunity 2004, 21,503-513. Transcriptional regulatory code of a
116. F. Annunziato, R. Manetti, L. Cosmi, eukaryotic genome, Nature 2004, 431,
G. Galli, C.H. Heusser, 99-104.
S. Romagnani, E. Maggi, Opposite
Chemical Biology
1118
18.2
Scanning the Proteome for Targets of Organic Small Molecules Using
Bifunctional Receptor Ligands
Nikolai Kley
Outlook
The terms chemical genomics and chemical proteomics refer to a systematic
analysis of the effects of organic small molecules on genomic and proteomic
activity (i.e., a chemical approach to systems biology). The goal of this type
of analysis is to improve our understanding of the cellular targets and
signaling mechanisms that underlie or could predict drug effects. Recent
chemical proteomic initiatives have resulted in the emergence of several
novel, complementary methods for the characterization and identification
of molecular targets of organic small molecules. This chapter reviews
the evolution, development, and applications of three-hybrid-based (3H)
technologies that utilize chemically engineered bifunctional ligands and
facilitate proteome-wide small molecule target discovery. 3H approaches may
prove particularly useful in tracing an observed therapeutic/physiological
effect of a small molecule to one or more molecular targets or, alternatively, in
revealing novel molecular targets that could suggest an alternative therapeutic
potential for a particular drug, drug candidate, or chemical class.
18.2.1
Introduction
Organic small molecules embody an important class of therapeutic agents.
They are also increasingly being used in chemical biological studies as
molecular probes to study the cellular functions of proteins, signaling
pathways, and processes associated with disease pathogenesis. The usefulness
of a small molecule as a molecular probe is, however, critically dependent
on an understanding of its target spectrum, and its specificity and selectivity
profiles. Similarly, when a small molecule is selected as a probe to unravel the
molecular basis for an observed phenotypic effect in cultured cells or in vivo
model systems, the identification of its molecular targets is of fundamental
importance.
In drug discovery research, an understanding of the molecular targets
of a drug or drug candidate can shed important light on activities that
are either positively or negatively associated with its therapeutic efficacy,
as well as on activities that may raise concern with respect to potential
adverse side effects. Whatever the individual scenario, target identification
represents an important element in rational lead optimization that strives to
Chemical Biology. From Small Molecules to Systems Biology and Drug Design
ISBN: 978-3-527-31150-7
achieve an optimal target spectrum and a therapeutic index for a given drug
candidate. Alternatively, the identification of proteins with known function
as novel molecular targets could reveal a previously unrecognized therapeutic
application(s) for a drug candidate or a marketed drug. In some instances,
this could also present an opportunity to resurrect drug candidates that failed
to progress in the discovery or development process due to the lack of a
good understanding of their mechanisms of action (MoA). With regard to
drug development, target discovery may also lead to the identification of
novel surrogate markers for therapeutic efficacy, permitting an assessment
of the extent to which a putative therapeutic drug might yield a satisfactory
clinical result. This is particularly important for the development of the new
generation of mechanism-based drugs. Thus, the identification of protein
targets of organic small molecules is of fundamental importance in many
areas of biomedical research.
Recent chemical proteomic initiatives have resulted in the emergence of
various alternatives to classical protein activity profiling (e.g., in vitro kinase
assays using purified enzymes) for small molecule target identification. One
such alternative method utilizes a variety of chemically reactive probes to
profile and identify enzymes or other protein targets in complex mixtures
based on their catalytic or ligand-binding activities. This approach, known as
activity-based protein profiling (ABPP), is designed to address subproteomes,
such as a discrete enzyme family [1-7]. Depending on the spectrum of targets
recognized by a “pan-active” chemical probe, competitive profiling provides
information on the selectivity profile of a compound. Because the number of
suitable reactive probes is steadily growing, ABPP promises to become a more
widely used methodology in chemical proteomics [7].
Another alternative that has been recently described is based on monitoring
the interaction of a small molecule with proteins expressed as fusions to T7
bacteriophage [8, 9]. This approach has been applied successfully in target and
selectivity profiling of kinase inhibitors [9]. It is conceivable that this approach
could be adapted to support cDNA library screening, which would expand
its application to proteins other than kinases - although it would be limited
to proteins that function as monomers. Several other methods for detecting
small molecule-protein interactions have been described, including ribosome
display, drug-far western, and protein or small molecule microarray-based
methods [10-14], but these reports were primarily proof-of-principle
studies using known interaction partners. In contrast, the other alternatives
noted above have already been successfully applied to the profiling of specific
subproteomes, and have resulted in the discovery of many novel molecular
interactions.
Traditionally, the identification of protein targets of small molecules has
relied on in vitro biochemical methods, such as photocross-linking, radiolabeled
ligand binding, and affinity chromatography. Affinity chromatography is still a
widely used method and can be used to identify targets present in any cell extract
of choice. Therefore, it is, in principle, not restricted to an analysis of specific
protein classes or subproteomes. Recent advances in the fabrication of solid
supports with improved physical properties for protein affinity purification
[15], improvements in experimental design and purification schemes [16-22],
as well as advances in mass spectrometry methods have improved the success
rate of affinity-based purification of small molecule targets [23]. However,
despite some past and more recent successes, the affinity-based approach does
not always deliver results. Most successful examples involve a combination of
a high-affinity small molecule with a fairly abundant protein receptor. Such a
scenario is more of an exception than the rule. Furthermore, most bioactive
synthetic small molecules are somewhat hydrophobic, which predisposes
them to nonspecific protein binding when coupled at high density to solid
supports. This requires stringent wash conditions, which may be unfavorable
for many interactions. An unfavorable signal-to-noise ratio is indeed the
most common cause of failure in the identification of specific interactions
[24]. Another drawback of affinity purification is that it does not directly
deliver a cDNA clone that encodes the candidate protein target, which is
required for subsequent validation of any putative interaction. cDNA cloning
can be laborious and time consuming, especially if the number of candidate
targets is large and prioritization of these targets based on some rationale is
difficult. Three-hybrid (3H) technologies, which enable the detection of small
molecule-protein interactions in intact cells, lack some of the drawbacks
encountered with methods such as those outlined above and will be described
in more detail here.
18.2.2
The yeast three-hybrid (Y3H) system is a cellular assay system designed for
the identification and characterization of small molecule-protein interactions
in intact cells [25]. It uses yeast Saccharomyces cerevisiae as a host system
and combines aspects of the yeast two-hybrid (Y2H) system [26] with recent
developments in chemical dimerizer technology [27, 28].
The discovery of the MoA of the immunosuppressive macrocyclic lactone
lactams FK506 and rapamycin marked the beginning of our current
understanding of chemical dimerizers [29]. These bifunctional molecules
are able to simultaneously interact with two different proteins through
distinct structural elements, promoting the formation of a ternary complex
(Fig. 18.2-1). In the case of FK506, the ternary complex consists of
FKBP12-FK506-calcineurin. Recruitment of calcineurin, a Ca2+/calmodulin-dependent
protein phosphatase, to the FKBP12-FK506 complex inhibits
its function. This results in impaired signaling of the T-cell antigen
receptor (TCR) and subsequent immunosuppression [30]. Rapamycin forms
an FKBP12-rapamycin-FRAP ternary complex (FRAP: FKBP12-rapamycin-associated
protein, also named RAFT1, RAPT1, or TOR). Recruitment of
FRAP inhibits its function in T lymphocytes, which results in impaired
interleukin-2 receptor signaling [31-34], and this is thought to be the basis
for the immunosuppressive actions of rapamycin [30]. Another example of an
immunosuppressant that acts in this manner is cyclosporin A, which interacts
with cyclophilin to form a complex that then binds calcineurin [30].
Fig. 18.2-1 (a) Chemical structures of the immunosuppressants FK506 and
rapamycin. (b) Ribbon diagram of the FKBP-FK506-calcineurin complex (adapted
from Griffith et al., Cell 1995, 82, 507-522, with permission from Elsevier).
Color coding is as follows: calcineurin A (blue), calcineurin B (green),
FKBP12 (red), FK506 (white). (c) Schematic representation of how rapamycin
may be used to induce signal transduction through dimerization of two fusion
proteins containing FKBP and FRB (FKBP12-rapamycin binding domain of FRAP)
fused to specific signaling domains. DD - "docking domain", which could be a
DNA-binding domain or a sequence causing membrane localization of the FKBP
fusion protein. ED - "effector domain", which could be a transcription
activation domain or some other signaling domain (e.g., kinase).
FK506 and rapamycin, and various analogs thereof, have been widely used
to cross-link at will hybrid proteins that have been designed to contain
appropriate binding sites for these molecules, thereby controlling intracellular
signaling events that are naturally or otherwise regulated by protein-protein
interactions [27, 29, 35] (Fig. 18.2-1). Chemically engineered analogs include
dimeric versions of FK506 (FK506-FK506, appropriately termed FK1012)
[36], FK506-cyclosporin [37], and analogs of FK506 and rapamycin that
recognize only mutated forms of FRAP or FKBP12 (and are therefore devoid
of the cytotoxic activities associated with FK506 and rapamycin) [38-40]. The
synthesis and use of the hybrid ligand/dimerizer FK1012 [36] marked the
beginning of 3H systems, in which a synthetic bifunctional molecule is used
to induce the homo- or heterodimerization of chimeric proteins. The use of this
hybrid ligand and other dimerizers was initially strictly focused on promoting
cellular signaling events in a temporally controllable and dynamic fashion.
However, the synthesis of hybrid ligands incorporating a small molecule
test compound for the purpose of de novo identification of protein targets
of that molecule followed shortly thereafter [25]. This marked the beginning
of the development of 3H systems for proteome-wide screening of small
molecule-protein targets.
Liu and colleagues [25] took advantage of the concept of compound-induced
protein-protein interaction and modified the previously developed Y2H
system [26] to create a Y3H system. Y2H is arguably the most widely used
technology for the detection and identification of hybrid proteins on a large
scale [41-43]. A logical next step was to adapt it to a Y3H system that could,
in principle, support the screening of complex cDNA libraries for identifying
targets of small molecules. The basic elements of the Y2H system and their
interactions are depicted and described in Fig. 18.2-2. The basic components
of a Y3H system are shown and described in Fig. 18.2-3. Both the Y2H and
Y3H systems make use of fusion proteins that contain a DNA-binding domain
(DBD) or a transcription activation domain (AD). In Y2H, the interaction of
the fusion proteins is a direct interaction of proteins or protein domains that
are fused to the DBD and AD domains. In Y3H, the interaction of DBD-
and AD-fusion proteins is mediated by a hybrid ligand (chemical dimerizer).
The chemical dimerizer consists of an anchor moiety with known binding
affinity for a ligand-binding domain (LBD) that is fused to the DBD domain.
Recruitment of the AD-fusion protein to the promoter region of a reporter
gene is induced by its interaction with a small molecule that is linked to
the anchor moiety of the hybrid ligand. A productive interaction generates a
ternary complex that promotes the transcriptional activation of the resident
reporter gene. The Y3H system described by Liu and colleagues made use of
a dexamethasone (DEX)-FK506 heterodimer. The fusion proteins consisted
of the glucocorticoid receptor (GR, the LBD) fused to LexA (the DBD), and
FKBP12 fused to a transcription AD derived from the bacterial protein B42 [25].
The use of mutant forms of GR, which displayed higher affinity for DEX than
wild-type GR, was necessary for the detection of the interaction. These findings
suggest that affinities in at least the nanomolar range are most likely required
for successful display of a synthetic hybrid ligand by the DBD-fusion protein.
Importantly, using DEX-FK506, FKBP12 could be identified in a screen using
a cDNA library encoding a complex mixture of AD-fusion proteins, suggesting
that Y3H could, in principle, be used to identify novel drug receptors. This
was confirmed by the use of a DEX-methotrexate (MTX) hybrid ligand in
cDNA library screening, which resulted in the identification of its known
target dihydrofolate reductase (DHFR) [44]. These studies, however, involved
molecules (FK506 and MTX) that exhibit high affinity for their respective
receptors, leaving unanswered the important question of whether Y3H is also
suitable for the detection of lower affinity small molecule-protein interactions
and for the identification of novel protein targets.
Fig. 18.2-2 The Y2H system: interaction of bait and prey fusion proteins
activates the expression of a reporter gene. DBD - DNA-binding domain.
AD - transcription activation domain. RE - promoter response element.
Reporters: HIS3 (an auxotrophic marker, the induction of which enables yeast
cells to grow in the absence of histidine in the culture medium), LacZ (can be
detected in a colorimetric assay). Inset shows an array of yeast cells that
has been generated using an appropriate robot. As shown, LacZ reporter
induction (blue/green colored yeast cells) reflects a productive
protein-protein interaction.
Since the first report on Y3H, several other hybrid ligands have been
described, all of which incorporate anchor moieties with high affinity for a
particular receptor protein. These include DEX, FK506, estradiol analogs, and
MTX [25, 36, 45-50]. MTX-based hybrid ligands (also referred to as MTX-fusion
compounds or MFCs) appear particularly promising, as recently demonstrated
by their use in the screening of cDNA libraries and the identification of known
as well as novel targets of ATP-competitive small molecule kinase inhibitors
[51] (see also Figs. 18.2-3 and 18.2-4). This work addressed a number of
previously unanswered questions, providing evidence that cDNA library screens
can be performed at high complexity and redundancy; the emergence of false
positives can be easily controlled and deselected for, using appropriate genetic
counterscreens and a combination of different hybrid ligands; interactions with
affinities in the low micromolar range can still be detected; and interactions
can be detected with a high degree of specificity.
Fig. 18.2-3 Y3H system. (a) Components of the Y3H system. A MTX-based hybrid
ligand associates with a DBD-fusion protein and AD-fusion protein. Formation
of a complex induces activation of a reporter gene. In the example shown here,
the MTX-fusion compound (MFC) incorporates a PEG linker and the small molecule
kinase inhibitor purvalanol B (PurvB). (b) Example of a Y3H interaction. The
DNA-binding domain fusion protein is a LexA (DBD)-DHFR fusion. The AD-fusion
protein is a GR*-Gal4 (AD) fusion. GR* represents a mutant form of
glucocorticoid receptor (GR) with high affinity for dexamethasone (DEX).
Activation of gene expression is reflected in positive yeast growth (HIS3
marker). Alternatively, induction of the LacZ reporter is detected by a
colorimetric assay. (c) Example of outgrowth of yeast cells in which a positive
interaction has taken place in the presence of a MFC. Such yeast colonies,
typically formed during cDNA library screens, can be picked and subjected to
subsequent analysis, as described in Fig. 18.2-4.
Fig. 18.2-4 Y3H-cDNA library screening workflow, as recently described (adapted
from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission from
Elsevier). Screening involves transformation of yeast cells with a cDNA
library, selection of yeast colonies (HIS3 selection), picking of yeast cells,
rearraying of these yeast cells, interrogation of arrays with test MFC and
other hybrid ligands (and MTX-PEG), picking of positives, isolation of plasmid
DNA, and sequencing. Plasmids are then retransformed into yeast cells and
arrays are interrogated once again with the test MFC and control compounds
(96-well format assay). Each 96-well plate represents the effects of one
particular compound. Images from each array screen are then clustered to yield
a composite image, as shown. The composite image shows an example of the
interaction of MFCs of kinase inhibitors, and variants thereof, with their
respective protein kinase targets (adapted from Becker et al., Chem. Biol.
2005, 11, 211-223, with permission from Elsevier).
Kley and colleagues have also described the development of array-based
screening approaches for the rapid profiling of small molecule-protein
interactions with Y3H [51]. In this screening paradigm, yeast cells are
transformed with a specific cDNA encoding a candidate target protein and are
subsequently spotted with an appropriate robot on agar plates to generate a
yeast cell array (96-well format, see Fig. 18.2-4). Prior to spotting the yeast cells,
hybrid ligand is deposited at the same location. Positive concentric outgrowth
of yeast cells at a particular coordinate indicates that an interaction of a
candidate target protein with the small molecule of interest has occurred. The
implementation of yeast cell arrays and automation of the spotting process was
found critical in performing controlled cDNA library screens and appropriate
quality control tests (to ensure a high signal/noise ratio). Array screening also
enables a direct interaction analysis of any cloned open reading frame (ORF)
of interest, as has been described for the screening of a defined set of kinases
[51]. It should be noted that array screening is inherently more sensitive
than complex cDNA library screening. In an array screen, each potential
interaction is tested separately; therefore, no competitive growth selection is
taking place and weaker interactions are easier to detect. The application of
such an approach to the scanning of the kinome for small molecule-protein
interactions is described below (see Section 18.2.4).
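The deselection logic described above, in which a clone is retained only if it grows with the test hybrid ligand and not with control ligands, can be sketched in a few lines. This is an illustrative simplification, not the published analysis pipeline: the function name, the clone and ligand labels, and the strict "no growth with any control" rule are all assumptions made for the example. In practice a genuine target may also respond to a closely related hybrid ligand, so the control set would need to be chosen accordingly.

```python
from typing import Dict, List


def specific_hits(
    growth: Dict[str, Dict[str, bool]],
    test_ligand: str,
    control_ligands: List[str],
) -> List[str]:
    """Return clones that grow with the test ligand but with none of the controls.

    growth maps clone name -> {ligand name -> True if yeast outgrowth was scored}.
    Missing entries are treated as "no growth" (an assumption of this sketch).
    """
    hits = []
    for clone, by_ligand in growth.items():
        grows_with_test = by_ligand.get(test_ligand, False)
        grows_with_control = any(by_ligand.get(c, False) for c in control_ligands)
        if grows_with_test and not grows_with_control:
            hits.append(clone)
    return sorted(hits)


# Hypothetical array readout: three clones scored against three ligands.
example = {
    "clone_A": {"MFC_test": True, "MTX-PEG": False, "MFC_other": False},
    "clone_B": {"MFC_test": True, "MTX-PEG": True, "MFC_other": True},
    "clone_C": {"MFC_test": False, "MTX-PEG": False, "MFC_other": False},
}

print(specific_hits(example, "MFC_test", ["MTX-PEG", "MFC_other"]))
```

Here clone_B would be discarded as a likely false positive because it also grows with the linker-only control, while clone_A is kept as a candidate specific interactor.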
To successfully perform Y3H screens, the choice of the anchor moiety of the
hybrid ligand is important. MTX, as already indicated, shows much promise.
It exhibits high affinity (low nanomolar to picomolar) for the monomeric
form of E. coli dihydrofolate reductase (eDHFR), which is a small, compact
molecule that can be easily expressed as a fusion protein in yeast cells
[46]. Furthermore, contrary to what is often observed with nonhybrid small
molecules, MTX-hybrid molecules appear to generally permeate yeast cells
quite readily. At GPC Biotech we have screened over 50 hybrid ligands in
which MTX was coupled to various small molecule chemotypes. To date we
have not encountered difficulties with cellular uptake of these molecules.
Cellular uptake can readily be determined using appropriate competition
experiments, as outlined in Fig. 18.2-5.
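The expected shape of such a competition experiment can be illustrated with a textbook single-site competitive binding model. This is a sketch under stated assumptions only: it supposes that the reference and test hybrid ligands compete for one site at equilibrium, that reporter activation is proportional to occupancy by the reference ligand, and that intracellular concentrations track the applied ones (which is precisely what the assay is meant to probe). The function name, parameter names, and numerical values are hypothetical and are not taken from the original work.

```python
def reporter_fraction(
    ref_conc: float, test_conc: float, kd_ref: float, kd_test: float
) -> float:
    """Fraction of sites occupied by the reference ligand when a test ligand competes.

    Standard single-site competitive occupancy:
        f = ([R]/Kd_R) / (1 + [R]/Kd_R + [T]/Kd_T)
    All concentrations and dissociation constants must share the same unit.
    """
    ref_term = ref_conc / kd_ref
    test_term = test_conc / kd_test
    return ref_term / (1.0 + ref_term + test_term)


# Hypothetical titration: fixed reference ligand, increasing test ligand.
for test_conc in (0.0, 0.1, 1.0, 10.0, 100.0):
    fraction = reporter_fraction(ref_conc=1.0, test_conc=test_conc,
                                 kd_ref=1.0, kd_test=1.0)
    print(f"test ligand = {test_conc:6.1f}  reference occupancy = {fraction:.3f}")
```

Under these assumptions, a test ligand that enters the cell and binds should lower the reference signal in a dose-dependent manner, whereas a ligand that fails to permeate would leave it unchanged.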
For practical purposes, the choice of linker and strategy for the chemical
synthesis of the hybrid ligands is also important. MTX-based hybrid ligands that
include polyethylene glycol (PEG) as a linker have proven quite successful [51].
PEG linkers of variable length have been used, and generally PEG repeats of
n = 3-6 generate suitable hybrid ligands. A PEGylated test compound, which
is generated as an intermediate in the synthesis of the MTX-hybrid ligand, also
provides a suitable probe for coupling to solid phase and for biochemical
validation of any interactions that might be identified with Y3H. A general strategy
for the synthesis of MTX-based hybrid ligands is described in Fig. 18.2-6.
In summary, the development of MTX-based hybrid ligands and array-based
screening approaches has led to the “reemergence” of Y3H as a chemical
proteomics technology that can be successfully deployed for the scanning of
the proteome or subproteomes with organic small molecules. Thus, although
Y3H may not be suitable for lead discovery, it could prove particularly useful
in tracing an observed therapeutic/physiological effect of a small molecule to
one or more molecular targets or, alternatively, in revealing molecular targets that
could suggest an alternative therapeutic potential for a particular drug, drug
candidate, or chemical class.
Fig. 18.2-5 A Y3H competition assay. The competition assay provides a measure
of cellular uptake/functionality of a test MFC. Also shown is an example of
experimental results showing a dose-dependent competitive inhibition of HIS3
reporter activation induced by a “reference” MFC (reflected in the decrease in
yeast growth in response to increasing concentration of test MFCs) (adapted
from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission from
Elsevier).
18.2.3
As outlined above, Y3H offers a promising alternative to other methods for the
identification and characterization of small molecule-target interactions. It
provides a means to rapidly screen complex cDNA libraries encoding candidate
target proteins. The identification of an interaction is directly associated with
the availability of a cDNA clone encoding a target protein, which enables
rapid secondary validation experiments. Furthermore, once a clone has been
identified, it becomes a permanent resource that can be interrogated in a
reiterative fashion with any small molecule hybrid ligand of interest. Another
advantage of Y3H is that it is a binding assay that does not require a priori
knowledge of the biochemical activity of candidate target proteins. Thus, it
also makes possible the identification and characterization of targets whose
biological functions are unknown.
Compared to Y2H, Y3H boasts the advantage that the DBD-fusion protein
for a given system (e.g., LexA-DHFR, see Fig. 18.2-3) remains invariant. Many
false positives in Y2H screens emerge due to "stickiness" of a particular
DBD-bait fusion protein and its nonspecific binding to AD-fusion proteins.
This is not an issue with Y3H. Furthermore, multiple hybrid ligands can easily
be used to assess the specificity of a particular interaction [51]. However, Y3H
also shares some limitations with Y2H. For instance, it is limited to proteins
that can be expressed as fusion proteins in yeast and that translocate into the
nucleus. Thus, it is not suitable for the analysis of membrane proteins, unless
specific domains of such proteins are expressed (e.g., cytoplasmic domains
of receptor tyrosine kinases). Interactions that require accessory proteins may
also not be detected.
Fig. 18.2-6 A strategy for the synthesis of MFCs (adapted from Kley, Chem.
Biol. 2005, 11, 599-608, with permission from Elsevier). A probe that can be
immobilized on solid phase for biochemical studies is generated as an
intermediate in the synthesis of the MFC. Various chemical reactions can be
applied when using different functional groups for coupling reactions.
The need for the use of hybrid ligands in Y3H may also limit its application.
For example, coupling of the PEG linker and MTX to a test molecule
may perturb its binding affinity to certain target proteins. In the event
that structure-activity relationship (SAR) information, which can provide
a rational basis for the positioning of the PEG linker in the test molecule, is
not available, positional scans may have to be performed. In that respect,
Y3H has constraints similar to those seen with affinity purification methods,
which require modification and solid-phase immobilization of a test molecule.
MTX-based hybrid ligands that cause growth inhibition or cell death in yeast
cells would also be unsuitable, although we have not yet encountered such
a case. One complication, which we encountered once, involved a MTX
ligand that autoactivated the Y3H system. This appeared to be due to the
interaction of the test molecule with a yeast protein that, when recruited
to the promoter region of the reporter gene, causes transcription activation
(manuscript in preparation). This supposition is based on the findings that
the same hybrid ligand was not autoactivating in a yeast strain that was
made deficient in the gene encoding that particular yeast protein (which was
identified by screening of a yeast cDNA library). Alternatively, autoactivation
could be suppressed by adding 3-amino-1,2,4-triazole (3AT) to the culture
medium (as frequently done in Y2H experiments that utilize baits that
are autoactivating [41, 431). Another arguable limitation of the Y3H system
is that robust screening requires, ideally, robotic handling of yeast cells
and the generation of yeast cell arrays. This technical capability may not
be available to every laboratory, in which case more labor intensive and
error prone manual handling and spotting of yeast cells would have to be
performed.
In summary, although the application of Y3H may be limited in some
scenarios, most of these are likely to be rare events. The most limiting
factor is likely the requirement for expression of fusion proteins that
are able to translocate into the nucleus of yeast cells while retaining a
properly folded small molecule binding domain. This may, however, not
be an issue with many proteins, because of their modular structure. A
modular structure favors proper folding of a binding domain, even when
it is expressed in isolation or as part of a hybrid fusion protein. Thus, the
use of complex cDNA libraries, which contain multiple fusion variants of
a particular protein, is preferable and will decrease the occurrence of false
negatives.
18.2.4
As outlined in the previous section, the emergence of a Y3H system that
supports cDNA library screening and the identification of novel interactions is
fairly recent. However, its potential is clearly demonstrated by its application
to the identification of targets of small molecule kinase inhibitors [51, 52].
Protein kinases have been implicated as pivotal signal transducers in many
cell signaling networks, and have emerged as an attractive class of drug targets
for many disease indications, in particular, cancer and inflammation [53].
The realization that small molecule inhibitors of protein kinases might be
of therapeutic use, as exemplified by the phenyl-aminopyrimidine STI571
(also known as imatinib mesylate or Gleevec) in the treatment of myelogenous
leukemia [54], has led to intensive drug discovery efforts involving multiple
disease-relevant kinases. These include the cyclin-dependent kinases (CDKs)
and CDK-related kinases (CRKs). Protein kinase inhibitors from a large
number of different chemotypes have emerged in recent years. Most of these
interact with the ATP-binding domain (active site) of kinases, thereby inhibiting
catalysis and substrate phosphorylation. Because of the structural similarity
of the active sites of different kinases, such compounds have the potential
for cross-reacting with kinases other than the intended target kinase(s) [23].
Sequence similarity per se is not a good predictor of cross-reactivity, which often
occurs with phylogenetically distantly related kinases [9, 23]. Cross-reactivity
with other proteins, such as purine-binding proteins, may also occur. Thus,
extensive target screening is an important factor in the characterization of the
MoA of kinase inhibitors.
Assessing the effects of kinase inhibitors on the in vitro kinase activity
of purified kinases has been critical in determining their selectivity profiles.
However, screening of a large number of purified kinases is costly and assays
are available for only part of the kinome (approx. 200 kinase assays; the
human kinome comprises >500 kinases [55]). The functions of many kinases
are unknown, as are their substrates, and no standard assays are available
to probe the effects of a small molecule on their activity. Y3H provides an
opportunity to simultaneously assay any kinase or kinase domains that can
be expressed as a fusion protein in yeast. A recent study successfully made
use of a hybrid ligand incorporating the potent CDK inhibitor purvalanol B,
a purine analog, suggesting that many different kinases, or their modular
ATP-binding domains, can be assayed with Y3H [51]. Thus, a significant
coverage of the kinome might be achieved. That study also revealed that
purvalanol B, deployed as a CDK inhibitor in a large number of biological
studies, actually “sees” many more kinases than previously known, including
tyrosine kinases. Roscovitine, a closely related purine analog, appeared to be
more specific. However, this compound is also a far less potent CDK inhibitor.
Similar observations were obtained with other kinase inhibitor chemotypes.
For example, indenopyrazoles, which are potent inhibitors of CDK1/2/4, were
found through Y3H screening to be much more promiscuous than one might
have anticipated [51]. This was recently confirmed using in vitro kinase activity
profiling (unpublished results).
In contrast to the previous examples, potent CDK inhibitors that are based
on a [1,3,6]-trisubstituted-pyrazolo-[3,4-d]-pyrimidine-4-one kinase inhibitor
scaffold [56] have recently been found (using Y3H) to exhibit a remarkable
proteome-wide specificity for a relatively small number of CDKs/CRKs [52].
These included kinases other than the known targets CDK1/2, some of which
have been implicated in cellular processes associated with cellular proliferation
or, alternatively, the pathogenesis of diseases other than cancer. Thus, a
compound derived from the [1,3,6]-trisubstituted-pyrazolo-[3,4-d]-pyrimidine-4-one
scaffold could possibly be optimized for enhanced or decreased affinity
for one or the other target(s), making it more suitable for one or the other
therapeutic application. We have indeed recently identified such compounds
(unpublished results). This latter study [52] provides a good example of how
Y3H-based target profiling can be used to gain a more detailed understanding
of targets that could underlie the biological effects of a small molecule, as well as
the range of potential therapeutic applications of the compound class/inhibitor
scaffold from which it was derived. Furthermore, the biological functions
of some of the newly identified CRK targets are only poorly understood.
The availability of chemical probes for these kinases should facilitate their
functional characterization.
We have also used Y3H to profile a number of different kinase inhibitors
that are in clinical trials or on the market. Consistent with results recently
published [9], many of these were found to interact with kinases other than
their intended targets. These findings strongly emphasize the importance
of kinome-wide selectivity profiling of kinase inhibitors. Y3H-based kinase
inhibitor profiling, using yeast cell arrays that display many kinases, should
facilitate such studies. We have recently assembled such a resource (Ref. 52,
manuscript in preparation) and will integrate it into Y3H for standard kinome
profiling of putative kinase inhibitors.
Although the Y3H studies reported by our laboratory have focused on the use
of kinase inhibitors, a growing number of studies indicate that Y3H is equally
suitable for use with other types of small molecules. For example, we have
detected bona fide interactions of small molecules with phosphodiesterases
(PDEs), histone deacetylases (HDACs), sirtuins (SIRTs), carbonic anhydrase,
and various other proteins (manuscript in preparation).
In addition to being broadly applicable to the de novo identification of
targets of small molecules, 3H systems may be used to further characterize
their interactions and to investigate SAR parameters. For example, one may
rapidly investigate the effects of particular mutations or naturally occurring
polymorphisms on the interaction of a small molecule with its target protein.
Additionally, mutagenesis screens may be performed to identify protein
variants that display altered characteristics in their ligand-binding properties.
This kind of functional cloning approach has been used to identify FKBP or
FRAP mutants that bind specific analogs of FK506 and rapamycin, which have
reduced affinity for the naturally occurring forms of these proteins [38]. This
has led to the development of chemical dimerizers with higher affinity for
their target proteins, along with reduced cytotoxicity. A similar approach could
be used to identify mutant variants of a target protein that have decreased
affinity for a particular compound while retaining biological activity. Such
drug-resistant mutants could be used to explore the relative importance of
that target in the pharmacological effects of that compound [57]. Yet another
functional cloning application of Y3H has recently been described by Cornish
and colleagues [58], in which Y3H was used to assay for an enzymatic activity
of a protein expressed in yeast cells that could cleave the linker moiety of a
specific dimerizer. These examples emphasize the broad range of the possible
applications of Y3H.
18.2.5
Future Developments
Y3H is the first 3H system that has been successfully applied to large scale
screening for small molecule targets. Future developments of 3H systems that
operate in mammalian cells rather than in yeast cells should further expand
the range of applications of the 3H concept. As already discussed, Y3H relies
on the expression of hybrid proteins in yeast cells and their translocation into
the nucleus. Furthermore, yeast cells are generally less permeable to small
molecules than mammalian cells, with the previously noted exception of MTX
heterodimers. These drawbacks render it difficult to perform competition
experiments, in which the ability of a test compound to compete with a hybrid
ligand for binding to a specific target protein is determined. This would be less
of an issue in a mammalian 3H (M3H) system. Furthermore, an M3H system
may facilitate the detection of interactions that require accessory proteins or
posttranslational modifications of the target protein.
Several 2H systems that enable the detection of protein-protein interactions
in mammalian cells have been described, for example: (a) the ubiquitin-split-
protein-sensor (USPS) technology [59], (b) two-component protein fragment
complementation assays (PCAs) [60, 61] (e.g., systems based on reconstitution
of split-DHFR, split-β-lactamase, and split-GFP), and (c) interaction
technologies based on resonance energy transfer between reporter proteins with
either fluorescent or bioluminescent properties (FRET: fluorescent resonance
energy transfer and BRET: bioluminescent resonance energy transfer). These
systems have been used to monitor specific known protein-protein interactions
in intact cells or to determine whether one protein would be able to interact
with another protein (direct interaction tests). They have not been applied to
random screening of protein-protein interactions using cDNA library screening
paradigms, with the exception of a recent report on the use of split-GFP [62].
How broadly applicable this system is remains to be determined. One potential
drawback of PCA assays is susceptibility to steric constraints imposed on the
assembly of two reporter protein fragments when these are fused to other
proteins or protein fragments of varying sizes and properties. Limited
sensitivity and dynamic range might also be an issue in some instances. Thus,
even if these 2H systems could be adapted to a 3H version for the detection and
characterization of defined small molecule-protein interactions (as has been
described for some of these [60, 61]), it remains uncertain whether they would
be suitable for random, large scale cDNA library screening and for de novo
target identification. On the other hand, a recently described M2H method,
termed mammalian protein-protein interaction trap (MAPPIT) [63], has already
provided a novel opportunity for the development of an M3H system with
broader applications.
MAPPIT has been successfully used by Tavernier and colleagues [63], as
well as in our laboratory (unpublished observations), in the identification of
novel protein-protein interactions using cDNA library screening. Its basic
components and their mode of action are described in Fig. 18.2-7. It operates
according to the concept of a “protein recruitment” system. In this instance, the
bait protein (the “docking station”) recruits a prey protein to the cytoplasmic
domain of a cytokine receptor, which triggers a signal transduction event that
can be easily monitored. In that respect, MAPPIT displays similarities to the
Y2H system, in which an AD-fusion protein (the prey) is recruited to DNA
through its association with a DBD-fusion protein (the bait). Such protein
recruitment systems are arguably less susceptible than PCA-based systems to
the occurrence of false negatives due to steric constraints encountered during
protein fragment assembly.
We have recently been successful in developing a 3H version of the
MAPPIT technology, termed mammalian small molecule-protein interaction
trap (MASPIT), which, similar to Y3H, is suitable for the detection of the
interaction of MTX-based hybrid ligands with their target proteins [64]. The
concept and components of this system are described in Fig. 18.2-7. In contrast
to Y3H, MASPIT can be readily used to perform competition experiments with
hybrid ligands and nonmodified parent molecules. Thus, the interaction of the
parent molecule with a candidate target protein can be directly validated in this
fashion. Additionally, dose-response experiments can provide a measure for
the targeting potency of a compound for a target protein in the context of an
intact cell [64]. Such measurements could lend some important insights into
how effective a compound might be in inhibiting the activity of a target protein
in the context of other competing interactions. For instance, if a competing
protein was expressed at high levels, higher doses of the compound might
be required to inhibit the intended target(s) as effectively as might otherwise
be the case (as, for instance, with purified target protein). For a number
of reasons, monitoring the interaction of a small molecule with its target
protein(s) in intact cells could reflect a more realistic setting in which to study
a compound’s cellular MoA. It would simultaneously address variables that
may influence the cellular potency of a compound, such as cell permeability,
posttranslational modifications of target proteins, competitive interactions,
intracellular concentrations of molecules such as ATP, and so on. A cell-based
assay would also enable the analysis of the interaction of a target protein with
a drug that is presented to cells in the form of a prodrug and which requires
intracellular conversion to an active ligand (unpublished observations).
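The competition and dose-response reasoning above can be illustrated with a standard single-site competitive binding model. The following is a minimal sketch and not the analysis used in [64]: it assumes binding at equilibrium, ligand concentrations in large excess over the target protein, and a reporter signal proportional to the fraction of target occupied by the hybrid ligand. The function names and all numerical values are hypothetical and serve only as illustration.

```python
def hybrid_occupancy(hybrid_conc, kd_hybrid, parent_conc=0.0, kd_parent=1.0):
    """Fraction of target bound by the hybrid ligand.

    The unmodified parent molecule is assumed to compete for the same site.
    Concentrations and Kd values must share one unit (e.g. micromolar).
    """
    h = hybrid_conc / kd_hybrid   # hybrid ligand, scaled by its Kd
    p = parent_conc / kd_parent   # competing parent molecule, scaled by its Kd
    return h / (1.0 + h + p)


def parent_ic50(hybrid_conc, kd_hybrid, kd_parent):
    """Parent concentration that halves hybrid-ligand occupancy.

    This is the Cheng-Prusoff relation for a competitive binder.
    """
    return kd_parent * (1.0 + hybrid_conc / kd_hybrid)


if __name__ == "__main__":
    # Hypothetical values: 1 uM hybrid ligand, Kd 0.1 uM; parent Kd 0.05 uM.
    baseline = hybrid_occupancy(1.0, 0.1)
    ic50 = parent_ic50(1.0, 0.1, 0.05)
    print("occupancy without competitor:", round(baseline, 3))
    print("parent IC50 (uM):", round(ic50, 3))
    for dose in (0.01, 0.1, 1.0, 10.0):
        occ = hybrid_occupancy(1.0, 0.1, dose, 0.05)
        print(dose, round(occ / baseline, 3))
```

In this simplified picture the apparent potency of the parent molecule depends on the hybrid ligand concentration, so a measured IC50 reflects relative rather than absolute affinity. Effects such as sequestration of the compound by abundant competing proteins are not modelled here.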
Since MASPIT is a “simple” binding assay, it could potentially also be used to
screen small molecule libraries for compounds that interfere with or compete
for binding of a known molecule with its target protein. Therefore, MASPIT
provides an opportunity for small molecule discovery that is not possible with
Y3H (due to the less favorable permeability of yeast cells to small molecules).
Fig. 18.2-7 The MAPPIT and MASPIT systems. (a) Events occurring in response to
ligand-induced activation of a type 1 cytokine receptor. Ligand-binding results in
conformational changes in the receptor complex, ultimately leading to juxtaposition
and activation of a receptor-associated Janus kinase (JAK). JAK then phosphorylates
the cytoplasmic part of the receptor, leading to recruitment of signaling molecules,
including signal transducers and activators of transcription (STATs). JAK subsequently
phosphorylates STAT, which causes STAT to dissociate from the receptor, form a
homodimer, translocate to the nucleus and activate transcription of a STAT-response
gene (or reporter gene). STAT3-activation can be monitored using a STAT3-responsive
reporter gene, which uses the pancreatitis associated protein 1 (rPAP1) promoter.
(b) MAPPIT. This 2H system is based on the concept described in (a). It employs a
signaling-deficient leptin receptor F3 (lepRF3) variant that cannot recruit STAT3.
An interaction of the bait and prey proteins results in the recruitment of a gp130
protein fragment containing STAT3 recruitment sites. STAT3 can now be recruited and
phosphorylated by JAK2, leading to its activation. (c) MASPIT. In this system, the
recruitment of the gp130 protein fragment is triggered by the interaction of a prey
protein with the test compound moiety of an MFC.
Finally, we have recently successfully applied MASPIT to the screening of
cDNA libraries and to the identification of novel small molecule-protein
interactions [64]. These studies mark the beginning of the development of a
broadly applicable M3H system that holds promise for future use in target
identification and drug discovery.
18.2.6
Conclusions
A detailed understanding of the MoA of organic small molecules is equally
important in chemical biology and drug discovery. In chemical biology,
mapping of the target spectrum and selectivity profile of a small molecule is
critical for its meaningful use as a probe to study protein function, as well as
in tracing molecular targets to its observed therapeutic/physiological effects.
In drug discovery, an understanding of the MoA of small molecules can have
an impact on the discovery process at multiple stages, particularly in the lead
optimization and the assessment of the therapeutic potential of drugs or drug
candidates [52, 65].
Thus, recent advances in the development of 3H systems hold promise for
their more widespread use in biomedical research and drug discovery. Y3H has
already provided a powerful approach in the identification of novel molecular
targets of small molecules, as exemplified by the studies with protein kinase
inhibitors, and by a method to study the effects of mutations or polymorphisms
on small molecule-protein interactions. The emergence of mammalian-based
systems promises to further expand the range of 3H applications, such as a
determination of relative targeting potencies of small molecules for protein
targets in intact cells, and pending a successful adaptation to higher throughput
analysis, even for limited compound screening and hit/lead identification.
Acknowledgments
I thank Dr. Margaret Lee Kley for a critical reading of the manuscript and
many helpful comments.
References
1. Y. Liu, M.P. Patricelli, B.F. Cravatt, Activity-based protein profiling: the serine hydrolases, Proc. Natl. Acad. Sci. U.S.A. 1999, 96(26), 14694-14699.
2. D.C. Greenbaum, W.D. Arnold, F. Lu, L. Hayrapetian, A. Baruch, J. Krumrine, S. Toba, K. Chehade, D. Bromme, I.D. Kuntz, M. Bogyo, Small molecule affinity fingerprinting. A tool for enzyme family subclassification, target identification, and inhibitor design, Chem. Biol. 2002, 9(10), 1085-1094.
3. A. Borodovsky, H. Ovaa, N. Kolli, T. Gan-Erdene, K.D. Wilkinson, H.L. Ploegh, B.M. Kessler, Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family, Chem. Biol. 2002, 9(10), 1149-1159.
4. D. Leung, C. Hardouin, D.L. Boger, B.F. Cravatt, Discovering potent and selective reversible inhibitors of enzymes in complex proteomes, Nat. Biotechnol. 2003, 21(6), 687-691.
5. D.A. Campbell, A.K. Szardenings, Functional profiling of the proteome with affinity labels, Curr. Opin. Chem. Biol. 2003, 7(2), 296-303.
6. A.E. Speers, B.F. Cravatt, Profiling enzyme activities in vivo using click chemistry methods, Chem. Biol. 2004, 11(4), 535-546.
7. N. Jessani, B.F. Cravatt, The development and application of methods for activity-based protein profiling, Curr. Opin. Chem. Biol. 2004, 8(1), 54-59.
8. P.P. Sche, K.M. McKenzie, J.D. White, D.J. Austin, Display cloning: functional identification of natural product receptors using cDNA-phage display, Chem. Biol. 1999, 6(10), 707-716.
9. M.A. Fabian, W.H. Biggs, D.K. Treiber, C.E. Atteridge, M.D. Azimioara, M.G. Benedetti, T.A. Carter, P. Ciceri, P.T. Edeen, M. Floyd, J.M. Ford, M. Galvin, J.L. Gerlach, R.M. Grotzfeld, S. Herrgard, D.E. Insko, M.A. Insko, A.G. Lai, J.M. Lelias, S.A. Mehta, Z.V. Milanov, A.M. Velasco, L.M. Wodicka, H.K. Patel, P.P. Zarrinkar, D.J. Lockhart, A small molecule-kinase interaction map for clinical kinase inhibitors, Nat. Biotechnol. 2005, 23(3), 329-336.
10. M. McPherson, Y. Yang, P.W. Hammond, B.L. Kreider, Drug receptor identification from multiple tissues using cellular-derived mRNA display libraries, Chem. Biol. 2002, 9(6), 691-698.
11. H. Tanaka, N. Ohshima, H. Hidaka, Isolation of cDNAs encoding cellular drug-binding proteins using a novel expression cloning procedure: drug-western, Mol. Pharmacol. 1999, 55(2), 356-363.
12. G. MacBeath, S.L. Schreiber, Printing proteins as microarrays for high-throughput function determination, Science 2000, 289(5485), 1760-1763.
13. F.G. Kuruvilla, A.F. Shamji, S.M. Sternson, P.J. Hergenrother, S.L. Schreiber, Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays, Nature 2002, 416(6881), 653-657.
14. N. Winssinger, S. Ficarro, P.G. Schultz, J.L. Harris, Profiling protein function with small molecule microarrays, Proc. Natl. Acad. Sci. U.S.A. 2002, 99(17), 11139-11144.
15. N. Shimizu, K. Sugimoto, J. Tang, T. Nishi, I. Sato, M. Hiramoto, S. Aizawa, M. Hatakeyama, R. Ohba, H. Hatori, T. Yoshikawa, F. Suzuki, A. Oomori, H. Tanaka, H. Kawaguchi, H. Watanabe, H. Handa, High-performance affinity beads for identifying drug receptors, Nat. Biotechnol. 2000, 18(8), 877-881.
16. M. Knockaert, N. Gray, E. Damiens, Y.T. Chang, P. Grellier, K. Grant, D. Fergusson, J. Mottram, M. Soete, J.F. Dubremetz, K. Le Roch, C. Doerig, P. Schultz, L. Meijer, Intracellular targets of cyclin-dependent kinase inhibitors: identification by affinity chromatography using immobilised inhibitors, Chem. Biol. 2000, 7(6), 411-422.
17. M. Knockaert, K. Wieking, S. Schmitt, M. Leost, K.M. Grant, J.C. Mottram, C. Kunick, L. Meijer, Intracellular targets of paullones. Identification following affinity purification on immobilized inhibitor, J. Biol. Chem. 2002, 277(28), 25493-25501.
18. P.R. Graves, J.J. Kwiek, P. Fadden, R. Ray, K. Hardeman, A.M. Coley, M. Foley, T.A. Haystead, Discovery of novel targets of quinoline drugs in the human purine binding proteome, Mol. Pharmacol. 2002, 62(6), 1364-1372.
19. G. Lolli, F. Thaler, B. Valsasina, F. Roletto, S. Knapp, M. Uggeri, A. Bachi, V. Matafora, P. Storici, A. Stewart, H.M. Kalisz, A. Isacchi, Inhibitor affinity chromatography: profiling the specific reactivity of the proteome with immobilized molecules, Proteomics 2003, 3(7), 1287-1298.
20. K. Godl, J. Wissing, A. Kurtenbach, P. Habenberger, S. Blencke, H. Gutbrod, K. Salassidis, M. Stein-Gerlach, A. Missio, M. Cotten, H. Daub, An efficient proteomics method to identify the cellular targets of protein kinase inhibitors, Proc. Natl. Acad. Sci. U.S.A. 2003, 100(26), 15434-15439.
21. J. Wissing, K. Godl, D. Brehmer, S. Blencke, M. Weber, P. Habenberger, M. Stein-Gerlach, A. Missio, M. Cotten, S. Muller, H. Daub, Chemical proteomic analysis reveals alternative modes of action for pyrido[2,3-d]pyrimidine kinase inhibitors, Mol. Cell Proteomics 2004, 3(12), 1181-1193.
22. Y. Liu, K.R. Shreder, W. Gai, S. Corral, D.K. Ferris, J.S. Rosenblum, Wortmannin, a widely used phosphoinositide 3-kinase inhibitor, also potently inhibits mammalian polo-like kinase, Chem. Biol. 2005, 12(1), 99-107.
23. H. Daub, K. Godl, D. Brehmer, B. Klebl, G. Muller, Evaluation of kinase inhibitor selectivity by chemical proteomics, Assay Drug Dev. Technol. 2004, 2(2), 215-224.
24. L. Burdine, T. Kodadek, Target identification in chemical genetics: the (often) missing link, Chem. Biol. 2004, 11(5), 593-597.
25. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93(23), 12817-12821.
26. S. Fields, O. Song, A novel genetic system to detect protein-protein interactions, Nature 1989, 340(6230), 245-246.
27. N. Kley, Chemical dimerizers and three-hybrid systems: scanning the proteome for targets of organic small molecules, Chem. Biol. 2004, 11(5), 599-608.
28. S. Lefurgy, V. Cornish, Finding Cinderella after the ball: a three-hybrid approach to drug target identification, Chem. Biol. 2004, 11(2), 151-153.
29. S.L. Schreiber, Chemical genetics resulting from a passion for synthetic organic chemistry, Bioorg. Med. Chem. 1998, 6(8), 1127-1152.
30. J. Liu, J.D. Farmer, Jr., W.S. Lane, J. Friedman, I. Weissman, S.L. Schreiber, Calcineurin is a common target of cyclophilin-cyclosporin A and FKBP-FK506 complexes, Cell 1991, 66(4), 807-815.
31. J. Heitman, N.R. Movva, M.N. Hall, Targets for cell cycle arrest by the immunosuppressant rapamycin in yeast, Science 1991, 253(5022), 905-909.
32. E.J. Brown, M.W. Albers, T.B. Shin, K. Ichikawa, C.T. Keith, W.S. Lane, S.L. Schreiber, A mammalian protein targeted by G1-arresting rapamycin-receptor complex, Nature 1994, 369(6483), 756-758.
33. D.M. Sabatini, H. Erdjument-Bromage, M. Lui, P. Tempst, S.H. Snyder, RAFT1: a mammalian protein that binds to FKBP12 in a rapamycin-dependent fashion and is homologous to yeast TORs, Cell 1994, 78(1), 35-43.
34. M.I. Chiu, H. Katz, V. Berlin, RAPT1, a mammalian homolog of yeast Tor, interacts with the FKBP12/rapamycin complex, Proc. Natl. Acad. Sci. U.S.A. 1994, 91(26), 12574-12578.
35. R. Pollock, T. Clackson, Dimerizer-regulated gene expression, Curr. Opin. Biotechnol. 2002, 13(5), 459-467.
36. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262(5136), 1019-1024.
37. P.J. Belshaw, D.M. Spencer, G.R. Crabtree, S.L. Schreiber, Controlling programmed cell death with a cyclophilin-cyclosporin-based chemical inducer of dimerization, Chem. Biol. 1996, 3(9), 731-738.
38. S.D. Liberles, S.T. Diver, D.J. Austin, S.L. Schreiber, Inducible gene expression and protein translocation using nontoxic ligands identified by a mammalian three-hybrid screen, Proc. Natl. Acad. Sci. U.S.A. 1997, 94(15), 7825-7830.
39. T. Clackson, W. Yang, L.W. Rozamus, M. Hatada, J.F. Amara, C.T. Rollins, L.F. Stevenson, S.R. Magari, S.A. Wood, N.L. Courage, X. Lu, F. Cerasoli, Jr., M. Gilman, D.A. Holt, Redesigning an FKBP-ligand interface to generate chemical dimerizers with novel specificity, Proc. Natl. Acad. Sci. U.S.A. 1998, 95(18), 10437-10442.
40. T. Clackson, Redesigning small molecule-protein interfaces, Curr. Opin. Struct. Biol. 1998, 8(4), 451-458.
41. P. Uetz, L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, J.M. Rothberg, A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae, Nature 2000, 403(6770), 623-627.
42. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki, A comprehensive two-hybrid analysis to explore the yeast protein interactome, Proc. Natl. Acad. Sci. U.S.A. 2001, 98(8), 4569-4574.
43. A.J. Walhout, M. Vidal, Protein interaction maps for model organisms, Nat. Rev. Mol. Cell Biol. 2001, 2(1), 55-62.
44. D.C. Henthorn, A.A. Jaxa-Chamiec, E. Meldrum, A GAL4-based yeast three-hybrid system for the identification of small molecule-target protein interactions, Biochem. Pharmacol. 2002, 63(9), 1619-1628.
45. H. Lin, W. Abida, R.C. Sauer, V.W. Cornish, Dexamethasone-methotrexate: an efficient chemical inducer of protein dimerization in vivo, J. Am. Chem. Soc. 2000, 122, 4247-4248.
46. W.M. Abida, B.T. Carter, E.A. Althoff, H. Lin, V.W. Cornish, Receptor-dependence of the transcription read-out in a small-molecule three-hybrid system, Chembiochem 2002, 3(9), 887-895.
47. K. Baker, D. Sengupta, G. Salazar-Jimenez, V.W. Cornish, An optimized dexamethasone-methotrexate yeast 3-hybrid system for high-throughput screening of small molecule-protein interactions, Anal. Biochem. 2003, 315(1), 134-137.
48. K.S. De Felipe, B.T. Carter, E.A. Althoff, V.W. Cornish, Correlation between ligand-receptor affinity and the transcription readout in a yeast three-hybrid system, Biochemistry 2004, 43(32), 10353-10363.
49. S.L. Hussey, S.S. Muddana, B.R. Peterson, Synthesis of a beta-estradiol-biotin chimera that potently heterodimerizes estrogen receptor and streptavidin proteins in a yeast three-hybrid system, J. Am. Chem. Soc. 2003, 125(13), 3692-3693.
50. S.S. Muddana, B.R. Peterson, Facile synthesis of cids: biotinylated estrone oximes efficiently heterodimerize estrogen receptor and streptavidin proteins in yeast three hybrid systems, Org. Lett. 2004, 6(9), 1409-1412.
51. F. Becker, K. Murthi, C. Smith, J. Come, N. Costa-Roldan, C. Kaufmann, U. Hanke, C. Degenhart, S. Baumann, W. Wallner, A. Huber, S. Dedier, S. Dill, D. Kinsman, M. Hediger, N. Bockovich, S. Meier-Ewert, A.F. Kluge, N. Kley, A three-hybrid approach to scanning the proteome for targets of small molecule kinase inhibitors, Chem. Biol. 2004, 11(2), 211-223.
52. M. Caligiuri, F. Becker, K. Murthi, F. Kaplan, S. Dedier, C. Kaufmann, G. Zybarth, J. Richard, N. Bockovich, A.F. Kluge, N. Kley, A proteome-wide CDK/CRK-specific kinase inhibitor promotes tumor cell death in the absence of cell cycle progression, Chem. Biol. 2005, 12, 1103-1115 in press.
53. P. Cohen, Protein kinases-the major drug targets of the twenty-first century? Nat. Rev. Drug Discov. 2002, 1(4), 309-315.
54. R. Capdeville, E. Buchdunger, J. Zimmermann, A. Matter, Glivec (STI571, imatinib), a rationally developed, targeted anticancer drug, Nat. Rev. Drug Discov. 2002, 1(7), 493-502.
55. G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, The protein kinase complement of the human genome, Science 2002, 298(5600), 1912-1934.
56. J.A. Markwalder, M.R. Arnone, P.A. Benfield, M. Boisclair, C.R. Burton, C.H. Chang, S.S. Cox, P.M. Czerniak, C.L. Dean, D. Doleniak, R. Grafstrom, B.A. Harrison, R.F. Kaltenbach, 3rd, D.A. Nugiel, K.A. Rossi, S.R. Sherk, L.M. Sisk, P. Stouten, G.L. Trainor, P. Worland, S.P. Seitz, Synthesis and biological evaluation of 1-aryl-4,5-dihydro-1H-pyrazolo[3,4-d]pyrimidin-4-one inhibitors of cyclin-dependent kinases, J. Med. Chem. 2004, 47(24), 5894-5911.
57. P.A. Eyers, I.P. van den, R.A. Quinlan, M. Goedert, P. Cohen, Use of a drug-resistant mutant of stress-activated protein kinase 2a/p38 to validate the in vivo specificity of SB 203580, FEBS Lett. 1999, 451(2), 191-196.
58. K. Baker, C. Bleczinski, H. Lin, G. Salazar-Jimenez, D. Sengupta, S. Krane, V.W. Cornish, Chemical complementation: a reaction-independent genetic assay for enzyme catalysis, Proc. Natl. Acad. Sci. U.S.A. 2002, 99(26), 16537-16542.
59. N. Johnsson, A. Varshavsky, Split ubiquitin as a sensor of protein interactions in vivo, Proc. Natl. Acad. Sci. U.S.A. 1994, 91(22), 10340-10344.
60. S.W. Michnick, I. Remy, F.X. Campbell-Valois, A. Vallee-Belisle, J.N. Pelletier, Detection of protein-protein interactions by protein fragment complementation strategies, Methods Enzymol. 2000, 328, 208-230.
61. I. Remy, S.W. Michnick, Mapping biochemical networks with protein-fragment complementation assays, Methods Mol. Biol. 2004, 261, 411-426.
62. I. Remy, S.W. Michnick, Regulation of apoptosis by the Ft1 protein, a new modulator of protein kinase B/Akt, Mol. Cell. Biol. 2004, 24(4), 1493-1504.
63. S. Eyckerman, A. Verhee, J.V. der Heyden, I. Lemmens, X.V. Ostade, J. Vandekerckhove, J. Tavernier, Design and application of a cytokine-receptor-based interaction trap, Nat. Cell Biol. 2001, 3(12), 1114-1119.
64. M. Caligiuri, L. Molz, Q. Liu, F. Kaplan, J.P. Xu, J.Z. Majeti, R. Ramos-Kelsey, K. Murthi, S. Lievens, J. Tavernier, N. Kley, MASPIT: Three-hybrid trap for quantitative proteome fingerprinting of small molecule-protein interactions in mammalian cells, Chem. Biol. 2006, 13, 711-722.
65. T.A. Carter, L.M. Wodicka, N.P. Shah, A.M. Velasco, M.A. Fabian, D.K. Treiber, Z.V. Milanov, C.E. Atteridge, W.H. Biggs, 3rd, P.T. Edeen, M. Floyd, J.M. Ford, R.M. Grotzfeld, S. Herrgard, D.E. Insko, S.A. Mehta, H.K. Patel, W. Pao, C.L. Sawyers, H. Varmus, P.P. Zarrinkar, D.J. Lockhart, Inhibition of drug-resistant mutants of ABL, KIT, and EGF receptor kinases, Proc. Natl. Acad. Sci. U.S.A. 2005, 102(31), 11011-11016.
PART VIII
Outlook
19
Chemical Biology - An Outlook
Günther Wess
Outlook
Chemical Biology has evolved into a strong driving force in biomedical science. It
is a paradigm change and will enable scientists to approach grand challenges.
Chemical Biology is not limited to academia. It will contribute to a wide range
of industrial applications, in particular in the field of drug discovery. Systems
biology as well as translational medicine might also benefit from several
elements of Chemical Biology. In this article the wide range of applications and
impacts will be highlighted.
19.1
The Evolving Concept of Chemical Biology
Almost 20 years ago Arthur Kornberg stated in his famous article “The Two
Cultures: Chemistry and Biology” the following: “. . . we now have the paradox
of the two cultures, Chemistry and Biology, growing farther apart even as they
discover more common ground . . .” [1].
This statement was made at a time when it had already become apparent that the
1980s had ushered in a new era in biomedical research with new technologies
providing previously undreamed opportunities. Ten years later S.L. Schreiber
and KC Nicolaou commented on the emerging concept of Chemical Biology as
“. . . the perhaps most exciting development. . .”, “. . . that biological problems
are increasingly well defined from a chemist’s point of view . . .” and . . .
“while Molecular Biology allows the function of biological molecules such
as proteins and nucleic acids to be altered by mutation, Chemical Biology
directly alters the function of biological molecules by chemical means . . .”.
Finally they defined the core of the field of chemical biology as “. . . using
small molecules or designed molecules as ligands to directly alter the function
of biological molecules . . [2]. The next milestone happened in 2005: The
.I’
Nature Publishing Group launched the new journal Nature Chemical Biology
with the statement that “. . . Chemical Biology has emerged as a field grounded
in technical advances brought about by the close collaborations of Chemists
and Biologists . . .” and “. . . Chemical Biologists have tackled challenging
problems in Biology, ranging from cellular signaling to drug development and
Neurobiology . . . the field is connected by a common desire to understand and
manipulate living systems at the molecular level with increasing precision . . .”
[3]. Where are we today and what does the future hold? Is chemical biology the
bridge or the common ground between both disciplines? What are the great
challenges ahead of us that can be met in the next 20 years? How will
the field evolve?
In my view the previous chapters of this book have convincingly
demonstrated that chemical biology is not simply a new scientific discipline.
It is a paradigm change in the way scientists approach biomedical questions.
In addition, it is a kind of mindset change across different scientific cultures
facilitating seamless interactions and collaborations. This is required to be able
to approach grand challenges in biomedical science. If Arthur Kornberg was
right in his time, chemical biology will bring scientific disciplines and research
areas closer together, and enable them to discover more common ground,
sharing a common vision, setting common goals, and launching joint efforts.
19.2
Chemical Biology in Academia
Although there is not yet a precise definition of chemical biology, the common
understanding among many scientists is that chemical biology directly alters,
activates, perturbs or inhibits the function of biological macromolecules by
chemical means, that is, small-molecule ligands. In future, this leitmotiv
should be extended to higher levels of complexity and should also include
biological systems and pathways, regulatory networks, cellular processes, and
even whole organisms. The scientific questions will range from basic science,
purely academic in nature, to questions of life science, drug discovery, and
future medicine. They will also include plant biology and even ecosystems and
their evolution.
Chemical biology brings small molecules into play. It will yield significant
new insight into how things function at various levels. Needless to say,
this will require the fruitful interplay of many disciplines and technologies,
such as biology, chemistry, medicine, and mathematics, as well as screening, in vivo
models, and metabolomics. Such an approach will not only give new insight
into fundamental biological processes but will also create new opportunities
for new products and businesses.
At this point, some remarks on the future role of chemistry in the context
of chemical biology seem to be required. With some oversimplification,
chemistry was traditionally concerned with structure and synthesis, and biology
more with function (with the exception of structural biology of biological
macromolecules). Research into structure-activity relationships was always
an interdisciplinary affair and was therefore fairly underdeveloped in view of
the actual opportunities. In the world of chemical biology, structure-activity
relationships would be extended to a broader understanding of how to induce a
particular biological response in a biological system through a small molecule.
It is quite compelling that in addition to the three elements of structure,
synthesis, and function the paradigm of chemical biology requires a fourth
element: this is selection. It addresses unambiguously the critical question of
WHAT is the chemical structure needed to get the desired biological response
and how does one get there. Therefore, selection is a key element of chemical
biology approaches. Eschenmoser has differentiated presynthesis selection
from postsynthesis selection [4]. In his view, presynthesis selection is clearly a
design process in which the chemist has the knowledge to define one molecule
that will exhibit the biological properties. In contrast, postsynthesis selection
means discovery, that is, finding the molecule in a typical high-throughput
screening approach. As biological function is the ultimate goal, the chemist is
challenged by the question: WHAT is the structure I need and how do I get
there? This strongly depends on the information that is available about the
biological system, in particular, the biological space that needs to be occupied
by a small molecule to get the expected biological response. Therefore, the
central theme is how to generate and accumulate knowledge that enables
identification of the biologically relevant regions of the chemical space
spanned by small molecules.
Every day we learn more about the complexity of biological systems and find that
our reductionistic models are becoming less useful for explaining our experimental
results. Therefore, we are far from predicting de novo the chemical structures
that are biologically relevant. A combination of design and discovery processes
is still required. There is a very long way to go, and the accumulation of knowledge
on the structure and biological function of biological macromolecules in whole
systems is on the critical path for the future. Regarding the biological systems,
we need very reliable experimental data to make correlations. Meaningful
high-content screening systems as well as phenotypic screening and in vivo
systems with smart readouts that allow quantification are required. These
capabilities will also significantly contribute to projects in systems biology.
One can even go one step further and say that chemical biology will become a driver
of systems biology.
As structure-function correlations are a central theme of chemical biology
approaches and chemical biologists will define WHAT needs to be synthesized,
they need excellent synthetic organic chemists as their partners who can
rapidly synthesize, in reasonable quantity, what is really required. This
includes single small molecules as well as small-molecule libraries. I also refer to
the categories of diversity-oriented synthesis (DOS) and target-oriented synthesis
(TOS), which have been introduced by S.L. Schreiber [5].
In conclusion, the study of biological systems at higher levels of complexity
through small molecules, and finding out the rules behind how things function,
will be the greatest challenge and a tantalizing opportunity. A typical example
could be the understanding of stem cell biology in health and disease and
stimulating the body’s own regenerative mechanisms through small-molecule
treatment for promoting survival, migration and homing, proliferation, and
differentiation [6].
19.3
Chemical Biology in Industry
Chemical biology is by no means limited to academic projects. It has the
potential to contribute significantly to bringing industrial research, in particular
drug discovery, to the next level and to help improve innovation and productivity.
Currently, the pharmaceutical industry is challenged by a decline in its
R&D productivity, in particular in delivering innovative products that are real
breakthroughs. Many multifaceted reasons can be identified. In summary,
they fall under three main categories:
identifying relevant disease approaches based on druggable targets;
generating a molecule that has the properties to become a drug (druglike molecules);
demonstrating a real therapeutic advantage over existing therapies, which justifies a competitive label.
With regard to the identification of druggable targets, chemical biology can,
as described in the previous section, play an important role in target or
pathway validation to better understand the biological systems or get an idea
of potential side effects. In this context, it might provide valuable tools and
probes for experiments to validate hypotheses.
It should be mentioned here that several efforts are ongoing across the
industry to improve the target identification/validation output not only
by introducing new technologies into the value chain or through new
organizational models and processes but also by introducing new scientific
strategies dealing with genomics and disease biology. Such an effort has
recently been described as “a new grammar for drug discovery” [7]. One might
also speculate that the interplay between chemical biology and systems biology
opens new opportunities.
However, the most important contribution of chemical biology is in the area
of generating drug-like molecules. This can simply be summarized as “finding
better compounds faster”, that is, compounds that are not only high-affinity binders
of bio-macromolecules but that can also be optimized into drugs
with reasonable effort.
Two aspects have to be considered and distinguished:
Finding a molecule with the right biological profile, interacting
with the defined target(s) and/or exhibiting the required
pleiotropic effects in the biological system, and in addition having
the required selectivity and lack of activity against antitargets
that would diminish the therapeutic effect and/or create
unwanted side effects.
Finding a molecule that has the right profile with regard to
Absorption, Distribution, Metabolism, Excretion, and Toxicity
(ADMET) as well as physicochemical properties.
Both areas comprise very complex challenges. The first one deals with the
question of what the molecule does to the biological system with regard to
activity and specificity, for example, inhibiting an enzyme or activating a
receptor. The second one deals with the question of what the system does to
the molecule, for example, getting metabolized by an enzyme of the liver or
being transported through a membrane.
Although pharmaceutical companies will optimize these areas by
applying new technologies and management processes [8], there are typical
critical success elements that chemical biology can contribute. These elements
are primarily based on knowledge of targets and molecules, and particularly
of target families and privileged molecular scaffolds, recognition patterns,
and binding motifs. This knowledge has to be accumulated over time and
needs validation in vivo to become more valuable. In addition, this knowledge
on target classes and privileged drug-like molecules will be complemented
by further insight into the ADMET rules and the correlation to the human
system.
Chemical biology in drug discovery would also address how drugs really work
in interdependent systems, including pleiotropic effects of drugs [9]. Emphasis
would also be laid on the characterization of compounds in distinguished
transgenic cellular and in vivo models to get a comprehensive set of data on
the whole biological profile. Such a systematic, science-driven strategy would
lead to a new science of drug discovery. New types of targets require new
approaches that are much more knowledge-based and see the molecules in
their complex environment of interdependent biological networks. Needless
to say, the intention is definitely not to replace the classical pharmacology
approach. The question is simply how to reach the next level and get the most
relevant success-critical information as soon as possible (Fig. 19-1).
Mechanisms of health and diseases and the complex interaction with the
environment at macroscopic and microscopic levels will become another
central theme in the context of future medicine that will be much more
focused on the question of prevention rather than classical treatment
and “polypharmacy” strategies. Other aspects are how to induce repair
mechanisms and how to cope with the question of personalized medicine.
It is apparent that these complex future questions will require much more
interaction between academic research and industry. The grand challenges
in drug discovery require new types of interaction, networks, and clusters of
knowledge. Chemical biology will not only be a major contributor but also a
key driver.
Fig. 19-1 Reaching the next level.
19.4
Chemical Biology and Translational Medicine
Finally, some comments are needed regarding the interaction of chemical
biology and translational medicine. The leitmotiv of translational medicine
has been “from bench to bedside and bedside to bench”. Chemical
biologists need validation of their hypotheses, and also a learning loop from
clinical studies that feeds back clinical observations and builds them into
new hypotheses. This is true for academic research as well as industry research.
It concerns not only new compounds but also already known drugs on the market
and their biological profiles, including side effects. In the long run, this will
lead to future medicine with a strong focus on individualized prevention. Key
milestones and achievements will be the better use of already existing drugs,
and drugs for the individual needs of the patients.
This will require a battery of diagnostic tools, which characterizes the patient
in such a way that personalized treatment becomes a reality. Chemical biology
will also make valuable contributions by dealing with the biological systems
and supporting the development of new diagnostic tools.
19.5
Knowledge and Networks, Education and Training
Integration and leverage of knowledge across disciplines and working in
teams and networks are critical success factors. Therefore, it must be assured
that knowledge can flow and that there are no hierarchical or bureaucratic
boundaries. There is also a component that has to do with values and behavior:
sharing of knowledge across organizations and disciplines. Networks should
have in place mechanisms that encourage and reward knowledge sharing.
The networks should not be limited to academia. They should also include
partners from industry. This is a great chance to approach new fields with
grand challenges and to use the complementary capabilities of academia and
industry. In the precompetitive area, it is just a question of commitment and
real interest. In the competitive area, it should be possible to find adequate
legal frames that respect the interest of the different stakeholders. In addition,
by performing joint efforts these partners will find more common ground, as
previously expected.
How should chemical biologists be trained and educated? Through training
on the job, a new curriculum or branch at the chemistry departments, or a
graduate program? Currently, there are all kinds of approaches and a clear
answer cannot be given at present. As the field emerges, the requirements
and necessary skills will become defined. In the end, there might perhaps be
fewer traditional chemistry departments but more chemical biologists working
in different places.
19.6
Conclusion
There is already one common denominator or even a leitmotiv of future
chemical biology: chemical structures of small molecules and the biological
function in health and disease at the level of biological systems. What do
the structures look like that induce the desired biological response profiles?
Although we are far from predicting chemical structures and biological
function in whole organisms and do not yet understand the rules behind them,
we feel very much encouraged by the chemical biology approaches. We
are looking forward with excitement to reaching the next milestones. We can
define them and approach them in interdisciplinary projects. Some might be
at the level of grand projects and need significant resources. They will all
be based on knowledge. Knowledge is the key driver. The chemical biology
approach is a new paradigm. It will guide us in the biomedical research of
the twenty-first century. Currently, we are becoming more and more aware
of how complex biological systems function. Even the question of what
a gene really is has been asked recently [10]. Therefore, the realization of our
vision requires even more joint efforts across disciplines, organizations, and
institutions.
Chemical biology has been the answer to Arthur Kornberg’s provocative
statement. It is the common ground from which new directions will evolve
and grand challenges will be approached. This will bring science to the next
level.
Chemical biology will contribute significantly to systems biology and,
to some extent, to translational medicine.
Today chemical biology still means different things to different people.
Nevertheless, this is more a strength than a weakness. It is a unique opportunity
to become defined and positioned over time by the scientists and their
invaluable scientific achievements.
Acknowledgment
I am very grateful to a number of colleagues with whom I had the privilege to work
and who have stimulated and encouraged me to develop Chemical Biology
approaches in industry: Frank Douglas, Birgit Konig, Hildegard Nimmesgern,
Daniel Schirlin, Andreas Batzer, Hans-Peter Nestler, Heiner Glombik, and
Bruce Baron. They all contributed significantly not only to developing a great
concept but also to implementing it and making it a success.
References
1. A. Kornberg, The Two Cultures: Chemistry and Biology, Biochemistry 1987, 26, 6888-6891.
2. S.L. Schreiber, K.C. Nicolaou, What’s in a name? Chem. Biol. 1996, 3, 1-2.
3. A community of chemists and biologists, Nat. Chem. Biol. 2005, 1, 3, Editorial.
4. A. Eschenmoser, One Hundred Years Lock-and-Key Principle, Angew. Chem. Int. Ed. Engl. 1994, 33, 2363.
5. S.L. Schreiber, Target-Oriented and Diversity-Oriented Organic Synthesis in Drug Discovery, Science 2000, 287, 1964-1969.
6. S. Ding, P. Schultz, A role for chemistry in stem cell biology, Nat. Biotechnol. 2004, 22, 833-840.
7. M.C. Fishman, J.A. Porter, A new grammar for drug discovery, Nature 2005, 437, 491-493.
8. G. Wess, M. Urmann, B. Sickenberger, Medicinal Chemistry: Challenges and Opportunities, Angew. Chem. Int. Ed. Engl. 2001, 40, 3341-3350.
9. G. Drews, Case histories, magic bullets and state of drug discovery, Nature Reviews Drug Discovery, 2006, 5, 635-640.
10. H. Pearson, Whats a Gene, Nature 2006, 441, 399-401.
Index
a ACE, Angiotensin converting enzyme (ACE)

AANAT, Arylalkylamine N-acetyltransferase 699
(AANAT) 385 N-Acetyl Galactosamine (GalNAc) 551
ABPP, Activity-based protein profiling N-acetyllactosamine
(ABPP) 403, 1119 natural substrate 643
Absorption, distribution, metabolism, 2’-acetyltransferase (AAC(2’)) 681
elimination and toxicology (ADMET) 6’-acetyltransferase (AAC(6’)) 681
801, 1147 ACM, Atom Connectivity Matrix (ACM)
Absorption, distribution, metabolism, 729
elimination/excretion, and toxicity ACP, Acyl carrier protein (ACP) 463, 472,
(ADMET) properties 521
applicability domain, estimation of AcpS, Acyl carrier protein synthase (AcpS)
1015f 472
applications and examples of 1018ff Acridonylalanine (acdAla) 289
datasets 1010f Actin, see Cytochalasin
pretreatment of 1013 Actinorhodin 525
descriptors, calculation of 1016ff Activated sugar-nucleotide substrates
development of 1008f 636
drug solubility 1007 Activation domain (AD) 1122
future developments in 1035f Activation function 1 (AF1) 895
general considerations for 1009ff Activation function 2 (AF2) 892
Activation-induced cell death (AICD) 1101
history of 1008f
Activator protein 1 (AP-1) 895
in silico toxicity models 1033ff
activities
intestinal permeability 1007f
like depudecin 99
model validation 1013f
Activity identifier (AID) 769
models 1010f
Activity-based protein profiling (ABPP)
multivariate methods 403,406, 1119
linear 1011f disease-associated enzymes
nonlinear 1013 parallel discovery of 423
outlier compound, labeling of 1015f human disease, diagnostic markers and
Mahalanobis distance 1015 therapeutic targets for 423
prediction of 1003ff small-molecule probes, active
statistical tools 1011ff site-directed
toxicity 1008 measuring protein activity 403
training and test set selection 1014f Acyl carrier protein (ACP) 463,472, 521
acdAla, Acridonylalanine (acdAla) 289 fusion proteins
ACDName 771 fluorescence labeling of 474
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
Acyl transferase (AT) 521 Aminoglycoside arrays

Acyl-carrier protein synthase (AcpS) 472 hybridization to 681
AD, Activation domain (AD) 1122 Aminooxypentane (AOP) 583
Adenine triphosphate (ATP) 826 AMPA,
ADMET, Absorption, distribution, α-amino-3-hydroxy-5-methyl-4-isoxazole-
metabolism, elimination and toxicology propionate (AMPA) 460
(ADMET) 801,1147 Ampholyte 1020
ADMET properties, Absorption, Analog-specific Kinases 127
distribution, metabolism, kinase-signaling pathways 128
elimination/excretion, and toxicity peptide substrates
(ADMET) properties 1003 combinatorial 128
Adrenoceptor 938 phosphoproteomics 128
β2-adrenoceptor protein 941 targets
cloning of 941 in the genome 128
AF1, Activation function 1 (AF1) 895 of each kinase 128
AF2, Activation function 2 (AF2) 892 Analysis of variance (ANOVA) 1087
Affinity chromatography 941 Androgen receptor (AR) 903
Affinity labeling 941 ANF, Atrial natriuretic factor (ANF) 374,
Agonist 939 714
full 939 Angiogenesis 104
inverse 939 blood vessels
partial 939 from preexisting 104
AGT, O6-Alkylguanine-DNA alkyltransferase new 104
(AGT) 428,463 Curcuminoids 105
fusion proteins Fumagillin 105
application of 465 Inhibitors 104
immobilization, scheme for 468 Angiotensin converting enzyme (ACE)
labeling of 463ff 699
Aib, a-amino isobutyric acid (Aib) 995 1,2-anhydrosugar 671
AICD, Activation-induced cell death (AICD) Animal Models 239
1101 degenerative diseases 240
AID, Activity identifier (AID) 769 of Disease 239
Aldehyde dehydrogenase-1 (ALDH-1) 411 study of
ALFUC, α-L-Fucosidase (ALFUC) 369 study of
Alkene-containing linker 671 in vivo 239
Allosteric (allotopic) modulator 939 protein 239
Amide ligation transgenic mice 239
using auxiliaries 577f ANN, Artificial neural network (ANN)
Amine-containing linkers 673 1023
α-amino-3-hydroxy-5-methyl-4-isoxazole- ANOVA, Analysis of variance (ANOVA)
propionate (AMPA) 460 1087
α-amino isobutyric acid (Aib) 995 Antagonist 939
Amino acid Antibodies 52
FlAsH approach catalytic antibodies 53
small molecule modification, reliance molecules
on 612 clonal expansion 53
Amino acid side chains designed 52
synthesis of functionalized 578 guide 53
Amino group somatic mutation 53
lysine acylation 595 Antithrombin III (AT III) 683
secondary bioconjugation AOP, Aminooxypentane (AOP) 583
oxidative coupling reactions 623 AP-1, Activator protein 1 (AP-1) 895
Aminoacyl tRNA synthases 386 Apicidin 98
Aminoglycoside 668,679,681,682 cyclic tetrapeptide 98
Depudecin 98 B-Arrestin 942
structural similarity Artificial neural network (ANN) 1023
to TPX 99 Aryl carrier proteins (ArCPs) 472
Apoptotic pathways 1046 Arylalkylamine N-acetyltransferase
Applications 96, 216, 237f, 255 (AANAT) 394
Angiogenesis 104 melatonin production 394
Animal Models 239 nonphosphorylated 395
Capsaicin 108 phosphonate-containing 394
Catalysis 220 role of phosphorylation of 394
Cell Therapies 240 semisynthetic, stabilities of 395
DNA-Protein Interactions 218 Ascomycin 558
Helical Mimetics 260 AT, Acyl transferase (AT) 521
Immunosuppressant 106 AT III, Antithrombin III (AT III) 683
Mechanism of action 97 Atom Connectivity Matrix (ACM) 729
modulators ATP, Adenine triphosphate (ATP) 826
bioavailability 255 ATP-binding site 3%
peptide-based 255 ATPyS-acetyl-kemptide 400
Parthenolide 109 Atrial natriuretic factor (ANF) 374, 714
Practical Examples 96, 255 Automated
Proteasome 101 carbohydrate synthesis 670
Protein Function 239 oligosaccharide synthesis
Protein-Protein Interactions 216 with glycosyl phosphates 673
Regulated Transcription and Gene oligosaccharide synthesizer 668
Therapies 241 Aventis
RNA-Protein Interactions 219 traditional research and development
Small Molecule-Protein Interactions organization
220 organizational design, of three
β-Turns/Strands 256 principles 790
two-hybrid assay relevant selected target, critical in
for biology research 216 disease 791
integral 216 Azides and alkynes
Aqueous solution dipolar cycloaddition
native chemical ligation in 575 Click reactions, use of “spring-loaded’’
AR, Androgen receptor (AR) 903 reactive components 619
ArCPs, Aryl carrier proteins (ArCPs) 472 enumerating stereospecific chemical
Array experiments reactions 619ff
experimental designs, issues of 1085ff
global gene expression studies b
high-density oligonucleotide arrays, Bacteria, pathogenic
biotin-labeled cRNA target 1086 detection of 684
microarray technology Bacteriorhodopsin 941, 949
amplification protocol, choice of Bafilomycin and Concanamycin 103
1089 biological activities in vitro 103
messenger RNA, and pooling of RNA regulators of organelle pH 104
samples 1088 BAL, British anti-lewisite (BAL) 435
replication and sample size 1087f BCS, Biopharmaceutics classijcation system
RNA amplification 1088 (BCS) 1032
replicate microarray experiments BCUT descriptors 1038
natural differences of, gene expression Beadle and Tatum’s
in inbred mouse strains 1087 original tenets of “one gene-one enzyme”
spotted complementary DNA (cDNA) hypothesis 302
microarrays Benzamide
and oligonucleotide microarrays HDAC inhibitors, fourth class of 701
1086 O6-Benzylguanine (BG) 463
O6-Benzylguanine-Methotrexate (BGMtx) NHS esters, reaction of
467 widely used strategy 595
Benzylguanine-SNARF (BGSF) 466 Bioconjugation proteins
BG, O6-Benzylguanine (BG) 463 metal-free bioconjugation
BG, Binding group (BG) 409, 463 using strain-promoted [3 + 2] dipolar
BGMtx, O6-Benzylguanine-Methotrexate cycloaddition reaction 622
(BGMtx) 467 Bioconjugation reaction
BGSF, Benzylguanine-SNARF (BGSF) 466 bioconjugate purification 624f
Biarsenical mass spectrometry, advances in 627
for tetracysteine peptide new transition metal-based methods,
picomolar affinity of 452f availability of 627
photoinduced generation of Bioconjugation technique
ReAsH-tetracysteinecomplex 449 targeting native functionality
singlet oxygen 449ff countless new strategies, provision of
Biarsenical dye 594
sequence-specific protein labeling Bioinformatics 959, 1048
with FlAsH dyes 612 Biological Analysis
Biarsenical-tetracysteine Screening 20
FlAsH-tetracysteineanisotropy Biological field
monitoring proteolysis 447 gene profiling, molecular basis of 1083
Biarsenical-tetracysteine complex Biological networks
biarsenical ligand 432 connectivity of 302
SDS-polyacrylamideGel Electrophoresis Biological Problems 18, 19, 21, 23, 25, 27,
(PAGE) analysis 453 29, 31, 33, 35, 37, 39, 41, 43, 53
Biarsenical-tetracysteine method Biological Analysis 20
dithiol arsenic antidotes Chemical Synthesis 20
EDT 437 designed
Biarsenical-tetracysteine system biological functions 54
application DNA modules
BarNile-EDT2,synthesis of 446 predefined 54
smaller fluorescent reporter, genome
constructing 441 fully synthetic 54
Bicyclohexyl mimetics 646 man-made cell 53
Bile salt export pump (BSEP) 367 by Nature directly 53
Binding energy 396 synthetic biology 53
Binding group (BG) 409 Biological research
Biochemical mechanisms cell
pathway activation complete protein repertoire
kinetics of, magnitudes and timing of (proteome) 404
signals 1061 functional proteomics
Biochemical networks 1045 chemical strategies for 405ff
Biochemical pathways protein expression and protein
downstream signaling cascades and function 405
networks 1072ff history and development of 404f
Biochemical signaling mechanisms molecular mechanisms
evaluation of, quantitative models and focus on 1061
quantitative experimentation 1077 introduction to 1061ff
Bioconjugates novel genes, identification of 405
chemical synthesis of large 567ff postgenome era
Bioconjugation global approaches for 404f
history and development of 595ff Biological Solutions to 45,47,49, 51
new methods Biological space
targeting of, unnatural functional changing scaffolds, to scaffold morphing
groups 616ff 841
chemical and biological space new diagnostic markers and drug
concepts, schematic visualization of targets, identification of 404

835 transcript profiling, standard tool in
chemical space 404
focused libraries and scaffold hopping, Biomolecular Interfaces 135
journey through 837ff biological specificity
kinase inhibitors: competitors of ATP of larger interfaces 135
837 Engineering 135
kinome maps 837 Extended 135
combinatorial chemistry interfaces
building on established - privileged large regions of protein 135
scaffolds 835ff redesign 135
building on established - privileged Biomolecules
scaffolds, relation to target families unnatural functional groups
839 “Amber” codon 613
in silico scaffold hopping, and methods for, biosynthetic
biological scaffold morphing 840 incorporation of 612ff
molecular diversity, advent of 838 posttranslational protein
combinatorial libraries modifications using metabolic
chemical space, around proven machinery for 613
starting points 836 successful for, N-acetylglucosamine
exploring of 834ff derivatives 614
putting pieces together - fragment Biopharmaceutics classification system
approaches 842ff (BCS)
selected fragment screening experiment classes of 1032
application to, proteases and kinases computer-based 1032ff
845 Biopolymers
Biological studies classes of 669
gene function interactions of classes of 670
chemical probes with, specificity of Bisubstrate analog 395
genetic methods 365 for serine/threonine kinase 399
Biological systems Bisubstrate tyrosine kinase inhibitors 396
different strategies Black, James
comparison of 363 alkyl-substituted histamine analogs
global response of 379 beta-blockers, development of 359
levels of hierarchy, probing of 355, 356 antihistamines
protein function and two histamine receptors 794
deeper understanding of 355 Blood group determinant oligosaccharides
modulation by small molecules 355 671
reverse chemical genetics Bovine rhodopsin 941
agonists of α- and β-adrenergic Branched oligosaccharides 671
receptors 359 Breast cancer cells
chemical biology, probe tools gene expression profiling
identification 357 identification of desired molecular
concept of 356ff fingerprints 922
general considerations 361ff Brefeldin A (BFA) 84
Biology and biomedical research 110-kD protein 85
genome-wide gene expression analysis BFA action
widely used tool 1109f biochemical 87
Bioluminescent resonance energy transfer mechanism 87
(BRET) 1132 cycle
Biomedical research GTP-GDP 87
proteomics Golgi
methods, need for 405 ARF binding to 87
Brefeldin A (BFA) (continued) Caged Proteins 156
Golgi-ER channel activation
recycling pathway 85 kinetics of 159
membrane transport methodology
from the Golgi 85 biosynthetic 156
BRET, Bioluminescent resonance energy Mutagenesis
transfer (BRET) 1132 Amino Acid 156
British anti-lewisite (BAL) 435 Site Directed 156
BSEP, Bile salt export pump (BSEP) 367 Unnatural 156
BTK, Bruton’s Tyrosine Kinase (BTK) 858
Photoactive Residues
Bumps and Holes 231 Introduction of 156
photoirradiation
after 157
C before 157
C-Abl protein kinase 549 replacing
C-Crk-II 550 natural ones 157
C-Crk-II signaling protein 549 trans-cis
C-terminal thioester 387 photoisomerization of the azobenzene
C-terminal tyrosine phosphorylated tail moiety 158
391 Caged Tyrosine Residues 160
C-type lectin-like domains (CTLDs) 643 Caged Cysteine and Thiophosphoryl
C-type lectins 643ff Residues 162
C2 hydroxyl group 671 caged version
α-CA, α-Chloroacetamide (α-CA) 411 in vitro 161
Ca2+-sensing receptor (CaR) 953 in vivo 161
CaBP, Calcium-binding protein (CaBP) 369 LMS-1 161
CADD, Computer-assisted drug design RS-20 161
(CADD) 958 nitrobenzyl group
cage as a cage 160
cyclic nucleotides 147 signaling pathway 162
Caged Compounds 140 Calcitonin receptor-like receptor
Caged Proteins 156 (CRLR) 948
Controlling Protein Function 140 Calcium-binding protein (CaBP)
modulate protein function 140 369
Multiresidue Protein Caging 150 CALI, Chromophore-assisted light
Photoactivatable Groups 140 inactivation (CALI) 428
Single Residue Protein Caging 152 Calmodulin (CaM) 446
Small Caged Molecules 159 Ca2+ activation
small molecule 140 protein dynamics of 446
Caged Cysteine and Thiophosphoryl single FlAsH-labeled CaM molecules
Residues 162 protein motions of 448
on serine residues 162 CaMKI, Calmodulin-dependent kinase I
peptide 163 (CaMKI) 870
GRTGRRNAI 164 cAMP, Cyclic adenosine monophosphate
inhibitory behavior 164 (cAMP) 312, 938
thiophosphotyrosyl 163 cAMP response element binding (CREB)
protein kinase A 163 313
thiophosphoryl-Ser residue cAMP-response Element Binding Protein
over a cysteine residue 162 (CBP) 914
Caged Peptides 159 Cancer chemotherapy
Caged lysine 160 multiple HDAC inhibitors, in clinical
Caged Tyrosine Residues 160 trials for 696
Phosphorylation Sites and Candidate drugs (CDs) 1004
Phosphopeptides 165 selection of 1010
Capsaicin 108, 133 CBP, cAMP-response Element Binding
biochemical change in mammal versus Protein (CBP) 914

avian VR1 134 CBP, Chemical biology platforms (CBP)
cation channel 789, 914
avian VR1 133 CC, Computational Chemistry (CC)
VR1 133 1003
channel's response to heat and acid CCD, Charge-coupled device (CCD) 448
134 CCK, Cholecystokinin (CCK) 955
component of hot chili 133 CDCA, Chenodeoxycholic acid (CDCA)
pungent ingredient of hot pepper 108 367
Sensitivity 133 CDG, Congenital disorders of glycosylation
VR1 108 (CDG) 635
CaR, Ca2+-sensing receptor (CaR) 953 CDK-related kinases (CRKs) 1130
Carbodiimide coupling reagents 485 CDK2, Cyclin-dependent Kinase 2 (CDK2)
Carbohydrate 567, 635, 668 845
branched 671 CDKs, Cyclin-dependent kinases (CDKs)
cell-surface 681 1130
function of, in biologically important cDNA, Complementary DNA (cDNA)
recognition processes 669 1084
interactions of, in biological systems CDs, Candidate drugs (CDs) 1004
672 Cell
as vaccines 677 living cells
Carbohydrate affinity screening 637, 677 designing protein tags for 454
Carbohydrate-functionalized fluorescent Cell biology
polymer 668,684 regulatory processes in 1045
Carbohydrate microarrays 674, 676 Cell culture
preparation of 676 isoprenoid biosynthesis, halting of
Carbohydrate-modifying enzymes 638f with addition of lovastatin 615
Carbohydrate-nucleic acid interactions Cell cycle 1046
aminoglycosides 679 Cell decision making
Carbohydrate-processing enzymes in context-dependent manner, tightly
inhibitors of 657f controlled 1061
Carbohydrate-protein interactions Cell function
selectins and heparin 681 cytosolic signaling enzymes and adaptor
Carbohydrate recognition domains (CRDs) proteins
641 association of 1066
CARM1, Coactivator-associated arginine growth factor and cytokine receptors
methyltransferase 1 (CARM1) 914 of more complex situations 1065
Carrier protein (CP) 471 intracellular signal transduction
CART, Constitutively activating receptor processes
technology (CART) 948 modelling of 1061ff
Catalysis 206, 220 intracellular signaling 1065
bond formation normal and diseased cell function
acceptor 222 ability of control 1061
donor 222 receptor phosphorylation
glycosidic 221 general purpose of 1066
cephalosporin receptor-mediated covalent
hydrolysis 207 modifications
enzyme 206 and molecular interactions 1065ff
as a fourth component 206 signal transduction
three-hybrid system 206 biochemical integration of 1061
Quest 208 Cell lines
S. cerevisiae 222 human cancer cell lines, behavior of
CBD, Chitin binding domain (CBD) 545 416
Cell lines (continued) Cellular compartments

xenograft-derived breast cancer cells cellular and subcellular length scales
secreted protease activities, dramatic concentration gradient, concept of
elevations in 416 1069
Cell-permeable inhibitors 640 cytosol, diffusive transport in
Cell receptor complexes 1069
kinetic proof reading 1067 spatial organization and gradients on
ligands with fast off-rates 1068 1069ff
receptor phosphorylation Cellular functions
and binding states 1066f spatial gradient sensing
significant challenges, standpoint of ability of localizing, intracellular
modeling 1066 second messenger(s) 1070
slow versus rapid exchange and chemotaxis 1070
determination of, substrate spatial gradient sensing, in eukaryotic
phosphorylation rates 1069 cells
sub-nanomolar ligands adhesion processes, driving cell
functional receptor complexes, crawling 1070
forming of 1067ff Cellular gene products
T-cell receptor engagement target identification problem 308
of peptide-MHC complexes 1067 Cellular processes
Cell regulation and function multiparametric considerations
molecular mechanisms dosage effects 318
underlying cell function 1061 dose and time 318
Cell surface multidimensional 318
receptor dimerization Cellular retinoic acid binding protein
forming dimers, or higher oligomers (CRABP-I) 442
on cell 1063ff Cellular retinoic acid binding protein II
receptor trafficking (CRABP-II) 369
non static receptors 1065 Cellulose 669
Cell-surface carbohydrate 681 Central nervous system (CNS) 379
Cell-surface carbohydrate recognition CFP, Cyan fluorescent protein (CFP)
interactions 641ff 428
Cell surface receptors cGMP, Cyclic guanosine monophosphate
binding of (cGMP) 373
signaling pathways 1062ff Chain length factor (CLF) 520
Michaelis-Menten kinetics Charge-coupled device (CCD) 448
hyperbolic binding 1064 Selvin and coworkers
receptor dimerization single ReAsH-tetracysteine complexes
receptor dimerization mechanisms 448
and dose response 1064 single ReAsH-tetracysteine complexes
Cell Therapies 240 nanometer localization of 448
cell growth switch 240 Chemical Abstracts via SciFinder 760
death switch 241 Chemical and biological data
Regulated 240 other organizational and knowledge
signaling proteins 240 challenges 801f
vaccine Chemical biological studies
cellular cancer 241 molecular probes
Cell-based assays 361 to study, cellular functions of proteins
Cell-based reporter assays 1118
FK228 studied by Yoshida group Chemical biology 1143ff
712 altering landscape
spiruchostatin A, biological with new chemical tools 628
characterization of 712ff array synthesis, starting points for
Cell-cell recognition 668 libraries 835
biological space “molecular toolkit”, expanding of
charting biological space - structural 300
biology and informatics 829ff nonnatural amino acids
homology modeling, understanding transfection method 288
structural space 830 nonnatural mutagenesis
membership of, protein to protein application of 289ff
family 831 basic strategy of 291
orphan GPCRs, receptors without fluorescence labeling 289
agonistic or antagonistic ligands polarity-sensitive fluorescent amino
832 acids 289
understanding biological machines, position-specific fluorescence labeling
from structure to function 832ff 289
understanding of 828ff nonnatural mutants
and drug discovery engineered aaRSs 287
understanding of, MoA of organic in vivo aminoacylation 287
small molecules 1135 microinjection method 288
and target family approach 847 synthesis of 287ff
and polar/hydrophobic balance 805 novel molecular entities
chemical-genetic approaches modulating biological processes 825
high-throughput phenotypic assays pathways and networks
307 screens to reveal connections between
chemical-genetic maps, creation of 307
307 PNA-assisted aminoacylation method
chemical-genetic modifier, use of 307 for amino acids and tRNAs 281
combining structural information reshaping methods of, drug discovery
biological process modulation 825 846
concept of 1143ff role of chemistry in 1144
drug discovery single molecular spectroscopic analysis
synergizing structural relationships of 289
proteins 826 small-molecule modulators
drug-like molecules, generation of charge of identifying 423
1146 structure function correlations 1145
drugable targets, identification of structure-activity relationships
1146 1145
education and training of chemical synthetic codons
biologists 1149 containing nonnatural nucleobases
genomic tools 286f
for identifying candidate targets 832 Schultz’s group, nonnatural base pairs
green fluorescent protein (GFP) 287
alternative methods, variety of 427 system biology 1145
Hecht method target family approach 825
for isolated tRNAs in test tube 281 foundations of 825
micelle-mediated method, for target family oriented concepts
aminoacylation in test tube 281 discovery paradigm in, pharmaceutical
in academia 1144ff industry 826
in biomedical science 1143 translational medicine 1148
in drug discovery 1143,1147 tRNA aminoacylated with nonnatural
in industry 1146f amino acids, import of 288
in vitro cellular experiments Y3H-cDNA library screening workflow
compound within range of, solubility interaction of MFCs, of kinase
knowledge and networks 1148 inhibitors 1125
medicinal chemist Chemical biology and drug discovery
chemical tools and leads, learning protein function
from experience 804 important strategy for 355ff
Chemical biology platforms (CBP) 789 centrosome-duplication assay

core team chemical-genetic modifier screens
appointment of knowledge 315
management specialist 798 chemical space
drug innovation and approval (DIdA) dimensionality reduction and
797 visualization of 330ff
drug metabolism and pharmacokinetics dimensionality-reduction and
(DMPK) pattern-finding techniques 331
human studies and “rapid overview of 331
prototyping” feedback information classical genetics
796 development and refinement of 344
Kinase Chemical Biology Platform general considerations of 307ff
first of four CBPs 798ff genetically encoded probes, use of
lead optimization organization 345
areas of drug metabolism and molecular recognition code(s) 346
pharmacokinetics (DMPK) 796 cluster analysis of
management multidimensional, chemical-genetic
mergers, additional complexity of data 332
789 computational framework
management challenge dendrogram showing clustering of,
discovery and development cycle small molecules 332
789ff mapping chemical space 327f
in implementation 789ff cytoskeleton
knowledge-driven S curve 790 and cell division 305
modern day version of, “drug forward and reverse chemical genetics
discoverer” 789 308
organizational structures for forward chemical genetics
establishment of 796ff important role in 314
Chemical Complementation 199, 201, molecular tool box, development of
203,205,207, 209, 211, 213, 215, 217, 299
219,221 forward chemical-genetic discovery
Power of Genetics 199, 201, 203, probes of biological mechanisms
205,207, 209, 211, 213, 215, 217, 346
219,221 gene products
Chemical databases 760 probes of biological mechanisms
Chemical Dimerization Technology 301
228 targeting small molecules 301
Development of 228 historical and conceptual developments
Dimerization Systems 229 of 299ff
FK1012 history/development of 302ff
homodimerizer 229 image-based phenotypic screen
Rapamycin 229 inhibiting PI3K/Akt signaling 316
immunosuppressive drug localization of GFP-tagged FOXO1a
FK506 228 315
interaction PI3K/PTEN/Akt signal transduction
FK506-FKBP 229 pathway, importance of 316
Chemical genetics mapping chemical space
and classical genetics adjacency matrix 329
perturbations, nonheritable and using forward chemical genetics
combinations of 316f 326ff
applications and practical examples small molecules as chemical graphs
336ff 329
biological mechanisms mRNA profiling
small-molecule probes of 299 chemical-genomic profiling 333
multidimensional phenotypic hybrid carbohydrate/glycoprotein
descriptors 330 microarrays 676

chemical-genetic data array 330 microsphere arrays 676
Neurospora work surface plasmon resonance (SPR)
one gene-one enzyme 299ff 676f
Pearson correlation matrix 332 Chemical graph
phenotypic assays concept of 727
neural stem-cell differentiation 314 Chemical Inducer of Dimerization (CID)
phenotypic assays for 312 208,466
forward chemical-genetic screening pairs
311ff high-affinity 210
protein function, study of 371ff ligand/receptor 209
protein targets to dimerize
biologically active small molecules, in vivo 208
examples of 303 transcriptional activator 208
reverse chemical genetics Chemical Industry 54
applications and practical examples CO and H2, to hydrocarbons 55
366 Fossil Fuel Dilemma 54
Schreiber group Nuclear energy 55
immunophilins and histone Present 54
deacetylases, chemical biology of Chemical library synthesis
366 conceptual development in 319
small molecules, assaying of 347 Chemical ligation
small-molecule libraries future directions of 586ff
appropriate cell-based assays 304 Chemical ligation reactions
small organic molecules conditions, selection of 580f
screening of 308 native 580f
T-cell signaling rates, enhancement of 581
role of calcineurin in 307 requirements for 574
using “forward” chemical genetics 299 site, selection of 580
using signaling pathway Chemical ligation theme
characterizing of 304 native
Chemical genomics variations on 576
and chemical proteomics Chemical probes
scanning proteome for 1118ff search to illuminate carbohydrate
scanning proteome function 635ff
using bifunctional receptor ligands, development of 636ff
outlook of 1118 history of 636ff
Chemical glycomics Chemical Problems 10, 11, 13, 15, 17,45,
automated carbohydrate synthesis 47, 49, 51
670ff Antibodies 52
carbohydrate-nucleic acid interactions artificial models
679ff of living systems 12
carbohydrate-protein interactions Biological Problems 18, 19, 21, 23,
681ff 25, 27, 29, 31, 33, 35, 37, 39, 41, 43,
for drug discovery 668ff 53
oligosaccharide conjugate vaccines chemical sciences
677ff chemical biology 10
pathogenic bacteria, detection of 684ff Diels-Alder Reaction 16
tools for 672ff Historical Periods 12
carbohydrate affinity screening 677 ideal synthesis 11
Carbohydrate Microarrays 674ff industry
fluorescent carbohydrate conjugates efficiency 11
677 expediency 11
Chemical Problems (continued) Chemical Synthesis 20

nanotechnology lock-and-key metaphor 20
chemical sciences 10 modify target structure 21
Organic synthesis Multicomponent 28
bottom-up strategies 10 Preparation 20
perfect reaction 11 Sequence 20
Proteins 45 Single-component 21
synthetic chemist target molecule
as a practicing technologist 12 synthesizing 21
Chemical proteomics with particular properties 21
affinity chromatography 378 Chemical synthesis 538
widely used method 1119 Chemical topology 730
cellular assay system Chemical-genetic modifier screens
strategy for, synthesis of MFCs small-molecule suppressors and
1128 enhancers
yeast three-hybrid (Y3H) 1120ff identification of 317
chemical proteomic initiatives Chemical-genetic network
alternatives to, classical protein activity chemical-genetic modifier screens
profiling 1119 graph-theoretic framework 336
compound-induced protein-protein forward chemical-genetic screen for
interaction inhibitors of mitosis 337
concept of 1122 Chemical-genetic screens
interaction of, small molecule with discrete methods of
proteins analysis of forward chemical-genetic
supporting cDNA library screening data 334ff
1119 Cheminformatics 723, 958
new cheminformatic approaches 379 chemical space 724f
organic small molecules chemical structure graphs 725ff
embodying therapeutic agents, computable representations of structure
important class of 1118ff 729ff
small molecule targets molecular descriptor spaces 746ff
and future development of 1132ff multidimensional outcome metrics
three-hybrid-based (3H) technologies 750ff
evolution, development, and Chemistry
applications of 1118 complex biochemical milieu, compatible
understanding, cellular targets and with
signaling mechanisms 1118 high reactivity and selectivity 454
Y2H system future development
interaction of bait and prey fusion discipline of 421f
proteins 1123 functional analysis of proteome 421
Chemical research Chemistry and biological applications
computer assistance to 724 biarsenical-tetracysteine method 427
Chemical shift perturbations (CSPs) 866 biarsenical-tetracysteine protein tag
Chemical Solutions 10,11, 13, 15, 17 427ff
for the construction of molecular protein trafficking 427f
skeletons 10 novel applications, development of 427
trusted reactions 10 Chemistry and Biology 3
Chemical space 723,725 analysis
cheminformatics and 724f Top-down 3
concept of 726 biochemistry 4
Chemical structure Biological Solutions 45, 47,49, 51
basic principle of 725 Chemical Industry 54
encoding 729 Chemical Solutions 10, 11, 13, 15, 17
properties of 725 Darwinian evolution 4
interdisciplinary 3 CLF, Chain length factor (CLF) 520
Lessons 55 CMC, Comprehensive Medicinal Chemistry

living cell as a model 3 (CMC) 760
molecular biology 4 CNS, Central nervous system (CNS) 379
protein synthesis 4 CoA, Coenzyme A (CoA) 694
synthesis Coactivator-associated arginine
bottom-up 3 methyltransferase 1 (CARM1) 914
Chemoattractant Receptor-Homologous Colchicine and Tubulin 72
Molecule Expressed on T Helper Type 2 aneuploidy 72
(CRTH2) 960 chromosome
Chemokine 581 movements 72
structure-function analysis of %Iff colchicine
Chemokine receptor 948 binding activity 74
Chemoselective coupling reaction 540 labeled with H3 74
Chemoselective ligation 539 microtubules 74
Chemoselective transthioesterification mitosis 72
reaction 540 spindle
Chenodeoxycholic acid (CDCA) 367 fiber dynamics 73
Chinese Hamster Ovarian (CHO) 395 mitotic 72
Chitin binding domain (CBD) 545 taxol 74
CHO, Chinese Hamster Ovarian 385,465 vinca alkaloids 74
Cholecystokinin (CCK) 955 Column chromatography 484
Chromophore-assisted light inactivation combinatorial approach
(CALI) 428 large variations of related molecules 33
Chromophore-labeled proteins Combinatorial chemistry
purification of building blocks
host/guest interaction 626 growing accessibility of 378
chromosome 77 compound libraries
genes 79 natural product guided compound
genetic screens library development 362
Mad/Bub 80 in silico scaffold hopping, and biological
nocodazole 80 scaffold morphing
spindle kinase-directed drug discovery 840
assembly 78 isoform selective inhibitor
checkpoint 78 roles of isoforms 370
mitotic 78 privileged fragments
CID, Chemical Inducers of Dimerization DFG-out conformation 838
(CID) 466 peptide-binding GPCR antagonists
Classical genetics 839
central dogma (DNA-to-RNA-to-protein) target family oriented libraries, design of
tenets of 300 838
chemical genetics Combinatorial library system
mapping “chemical space” using using CDK2 protein crystals 845
phenotypic descriptors 299 Combinatorial synthesis 487ff
vs. chemical genetics 301 Combinatorialization
genetic maps, creation of 299 power of 487f
Cleavage Plane 80 CoMFA, Comparative molecular field
in Cytokinesis 80 analysis (CoMFA) 950
Mad2 81 Competitive antagonism 939
model 81 Complementary DNA (cDNA) 1084
Monastrol Complex proteomes
cytokinesis 80 ABPP strategies for
inhibitor 80 in vivo analysis of enzyme activities
Positioning 80 418f
Complex proteomes (continued) small molecule

activity-based probes protein target, identification of 362
functional role of, cysteine proteases small molecule probes
416 computer-assisted drug design 362
activity-based protein profiling (ABPP) Computer chemistry 724
comparative and competitive ABPP, Computer-assisted drug design (CADD)
applications and practical examples 958
415ff Computer-encodable structure
general considerations of 407ff representation
schematic of, representative protease classes of 730
posttranslational regulation Concanamycin, see Bafilomycin
mechanisms 407 Conditional protein splicing (CPS) 557,
activity-based protein profiling (ABPP), 559
expanding scope of 419ff Congenital disorders of glycosylation
bio-orthogonal chemical reactions (CDG) 635,649
enabling ligation of, reporter tags onto Conklin
proteins 419 receptors activated solely by synthetic
comparative profiling for ligands (RASSL) approach 365
discovery of enzyme activities Connection tables 730
415ff Conotoxins
competitive ABPP for nAChRs, chemical biological study of
potent and selective reversible enzyme 376
inhibitors 417f Constitutively activating receptor
1DE gel-based methods for technology (CART) 948
employing gel-based or gel-free Core team and strategy teams (CBP
strategies 422 strategy teams)
probe-labeled proteomes 422 responsibility of
enzyme activities downstream implications 799
global profiling of 407 Corepressor
general method for, performing ABPP activity diminishing accessory proteins,
419 role of 914f
in vivo model of, human cancer-breast interference in NF-κB and AP-1
cancer xenografts 416 pathways 915
inhibitor discovery by ABPP CoRNR, Corepressor nuclear receptor
reversible inhibitor library, and (CoRNR) 914
activity-based probe 418 Correcting Errors 81
papain-directed ABPP probes anaphase 84
inhibitor screening 418 attachment
probe-enzyme reactions errors 83
molecular basis for 421 syntelic 83
SH superfamily, of enzymes 415 Aurora kinase
Complex signaling networks inhibitors 81
molecular composition of 1049 Reversible 81
Compound libraries small molecule 81
synthesis of 378 dynamics
Comprehensive Medicinal Chemistry microtubule fibers 83
(CMC) 760 mitosis
Computational chemistry 724 timescales 84
Computational Chemistry (CC) 1003 oncogenesis 83
Computational permeability models Corticotrophin releasing factor (CRF)
accuracy, factors influencing 1030ff 955
Computational tools Cowpea mosaic virus (CPMV) 620
3D-pharmacophore searches COX, Cyclooxygenase (COX) 792
and high-throughput docking 362 CP, Carrier protein (CP) 471
CP-fusion proteins different potencies 107
labeling of structurally different 107

as tool to study cell surface proteins Cyclosporine A (CsA) 304
470ff Cys residue 547
CPMV, Cowpea mosaic virus (CPMV) 620 Cysteine
CPS, Conditional protein splicing (CPS) modification of 597
557,559 uniquely reactive cysteine group
CRABP-I, Cellular retinoic acid binding using site-directed mutagenesis 596
protein (CRABP-I) 442 Cysteine protection 546
CRABP-II, Cellular retinoic acid binding Cysteine residue
protein II (CRABP-II) 369 chemical modification of 386
CRDs, Carbohydrate recognition domains Cytochalasin and Actin 74
(CRDs) 641 actin filaments 75
CREB, cAMP response element binding cytochalasin
(CREB) 313 phenotype 75
CRF, Corticotrophin releasing factor (CRF) direct link 75
955 microfilaments 75
Critical circadian rhythm hormone 394 Cytochrome P450 interactions 1005
CRKs, CDK-related kinases (CRKs) 1130 Cytoplasm
CRLR, Calcitonin receptor-like receptor apoptosis, programmed cell death
(CRLR) 948 release of, mitochondrial cytochrome
Cross-reactive sensor analysis 685 441f
Cross-validation 1013 Cytotoxic T lymphocyte-associated protein
CRTH2, Chemoattractant 4 (CTLA-4) 1108
Receptor-Homologous Molecule
Expressed on T Helper Type 2 (CRTH2)
960 d
Crystallography 583 DAB, Diaminobenzidine (DAB) 449
binding modes, investigation of 844 Darwinian Era 18
CsA, Cyclosporine A (CsA) 304 genotype 19
CSPs, Chemical shift perturbations (CSPs) natural selection rested on analogy 18
866 Origin of Species 18
CTLA-4, Cytotoxic T lymphocyte-associated phenotype 19
protein 4 (CTLA-4) 1108 DBD, DNA binding domain (DBD) 895,
CTLDs, C-type lectin-like domains (CTLDs) 1122
643 DC-SIGN, Dendritic cell-specific intracellular
Curcuminoids 105 adhesion molecule-3-grabbing-non-integrin
isolated (DC-SIGN) 643
from turmeric 105 2DE, Two-dimensional electrophoresis (2DE)
Current Patents Fast Alert 760 405
Cyan fluorescent protein (CFP) 428 DEBS, 6-Deoxyerythronolide B Synthase
Cyclic adenosine monophosphate (cAMP) (DEBS) 523
312, 938 Deciphering human genome
Cyclic guanosine monophosphate (cGMP) challenges of 801
373 Dehydratase (DH) 522
Cyclic peptides 556 Dendritic cell-specific intracellular
Cyclin-dependent Kinase 2 (CDK2) 845 adhesion
Cyclin-dependent kinases (CDKs) 1130 molecule-3-grabbing-non-integrin
Inhibitors 99 (DC-SIGN) 643
Purine Analogs 99 Deorphanization 947ff
Cyclooxygenase (COX) 792 6-Deoxyerythronolide B Synthase (DEBS)
Cyclosporin A (CsA) and FK506 107 523
biological activity schematic diagram of 524
same phenotypic 107 system, manipulation of 529
Deoxyribonucleic acid (DNA) 300, 576 Diels-Alder Reaction 16
Depsipeptide HDAC inhibitors Prototype of a Synthetically Useful
completion of, total syntheses of Reaction 16
FK228 and FR901,375 by Mitsunobu steroid synthesis 17
macrolactonization 710 in the synthesis of
total synthesis of steroids 16
macrocyclizations, and completion of structurally complex natural products
synthesis 709ff 16
Derived from Natural Repressors 175 Diethylstilbestrol (DES) 905
IPTG Diffusion ordered spectroscopy (DOSY)
stable synthetic analog 175 860
lac Difluoromethylene 389
binds to operons 175 Dihydrofolate reductase (DHFR) 460,
LacR-VP16 chimera 176 556,1123
Ligand-dependent 175 Dihydroneopterin aldolase (DHNA) 844
activators 176 2,3-dimercaptopropanesulfonate (DMPS)
repressors 176 453
Tet-On 176 Dimerization Systems 229
tetracycline 175 Homodimerization 229
DES, Diethylstilbestrol (DES) 905 Reverse Dimerization 235
Descriptors 1030 Transcription 235
1-D 1017 Dimethyl dioxirane (DMDO) 671
2-D 1017 Dimethylformamide (DMF) 539, 569
3-D based 1017 Dimethylsulfoxide (DMSO) 572
biological 501 Discoverygate 760
hydrophobic 1026 Disease biology
physicochemical 501 complete human-genome sequence
structural 501 single-gene Mendelian disorders
used for permeability predictions 300
1026ff Disulfide bonds
Desensitization 939 modification of
Desogestrel using metallocarbenoids 605ff
total synthesis 25 Dithiothreitol (DTT) 438, 602, 704
Dess-Martin Periodinane (DMP) 607 Divalent ligands 955
Desulfination 547 Diversity-oriented synthesis (DOS) 483ff
Desulfurization reaction 546 applications and examples for 502ff
DEX, Dexamethasone (DEX) 1122 assessing library diversity 501f
DH, Dehydratase (DH) 522 chemical and biological space 496
DHFR, Dihydrofolate reductase (DHFR) chemical methodologies for 502
460, 1123 of combinatorial libraries
DHNA, Dihydroneopterin aldolase (DHNA) early efforts in 495
844 development of 484ff
Diaminobenzidine (DAB) 449 early efforts in 492f
Diarylpropionitrile (DPN) 368 future development of 514
Diazonium salt general considerations in 496ff
coupling reactions history of 484ff
introduction of, new functional groups libraries
598 design strategies 496ff
tyrosine residues, modification of screening of 502
using electron-deficient 599 separation techniques in 487
Dictyostelium discoideum synthetic strategies 499ff
amoeboid migration 1070 planning 499ff
DIdA, Drug innovation and approval (DIdA) DMDO, Dimethyl dioxirane (DMDO) 671
796
DMF, Dimethylformamide (DMF) 569
DMP, Dess-Martin Periodinane (DMP) biomolecular NMR spectroscopy
607 855
DMPK, drug metabolism and chemical glycomics for 668ff
pharmacokinetics (DMPK) 796 COX-2 inhibitors
DMPS, 2,3-dimercaptopropanesulfonate development of, celecoxib (Celebrex)
(DMPS) 453 792
DMSO, Dimethylsulfoxide (DMSO) 572 enzyme, identification of 792
DNA, Deoxyribonucleic acid (DNA) 300, drugs target
576, 668 NR account, in pharmaceutical sales
DNA binding domain (DBD) 895 901
DNA-Protein Interaction 204, 218 gene-family approach
AD-cDNA fusion 205 for protein classes 852
genes histone deacetylases (HDACs)
olfactory-specific 205 outstripping histone acetyltransferases
one-hybrid assay 204 (HATS) 696
phage display 219 isolating and synthesizing active
transcriptional activators 218 ingredient
two-hybrid assay and pharmacological experiments in
into one-hybrid system 218 parallel 793
zinc-finger evolution 219 mechanism-based discovery background
DOS, Diversity-oriented synthesis (DOS) 793f
483 propranolol, interesting development
DOSY, Diffusion ordered spectroscopy of 793
(DOSY) 860 new rules for 379
DPN, Diarylpropionitrile (DPN) 368 NMR spectroscopy
DRIP, Vitamin D receptor-interacting protein different stages of, pharmaceutical
(DRIP) 914 research 855f
Drosophila phenotypes 937 NR drug discovery
Drospirenone tissue-selective benefits 916
combinatorial acceleration of tissue-selective benefits, examples of
preparation 28 917
screening 28 NR drugs, brief history of 901ff
leading position NR function
in hormonal contraception 27 binding druglike small molecules
synthesis 27 895
unnatural NR LBD fold, of three stacked α-helical
biologically 27 sheets 892
Drug delivery applications NR superfamily
chemical groups on entrance of reverse endocrinology approach 903
protein into reducing environments NR-targeted drug discovery
597 history of 901
Drug development nuclear receptor structure/function,
inhibition of HDACs features of 891
beneficial effect in, repressing nuclear receptor superfamily
hypertrophy 698 classic steroid receptors 897
reasons for attrition in 1005 domain organization of 893
Drug discovery features of 891ff
approaches to general mechanisms of, NR function
C-terminal 891 896
biological models key methodologies, for nuclear
discovery of, penicillin-resistant receptor-targeted drugs 891
Streptococcus pneumoniae 794 representative structures of, NR
novel anti-infective drug 794f functional modules 895
Drug discovery (continued) medicinal chemists and chemical

observation-based discovery background biologists
791ff predicting molecular basis of 804ff
organic acids, ibuprofen and predictions
diclofenac 792 using nuclear magnetic resonance
penicillin discovery (NMR) 808
in historical approach 792 Druggability prediction method
recent NR drugs human genome, accessible to protein
and novel drug candidates 916ff therapeutics 819
small molecules predictions of, human druggable
new protein discovery, role in 360 genome size 818
target validation, critical factor in 355 Druggable genome
traditional approach draft human genome
differences between 802f systematic survey of 809
traditional drug discovery drug targets
differences between 802 feature-based druggability prediction
validated disease target 816
“common mechanism” target 790 initial estimates of 809
Drug discovery research druggable-binding sites
understanding of structure-based druggability analysis
molecular targets of, drug or drug of, PDB Structures 816f
candidate 1118 Drugstore and StARLITe 811
Drug innovation and approval (DIdA) 796 estimating size of 808ff
organization of Aventis gene family distributions
centers of expertise in 791 small-molecule druggable genome,
units of innovation 796 and protein therapeutics 820
Drug metabolism and pharmacokinetics homology-based analysis of, drug targets
(DMPK) 810ff
sharing of knowledge Hopkins and Groom’s method
and improved attrition rate 797 systematic survey of 809f
Drug molecule Orth
binding energy druggable gene families, Interpro
affinity of 806 domain assignments 810
hydrophobic surface to, binding energy protein sequence
“magic methyl” 806 uncompetitive allosteric-binding sites
Drug targets 808
accessible to, protein therapeutics Russ and Lampel’s Update 2005 810
817ff sequence and structural levels 808
approved drugs Druglike compounds
molecular targets of 811 fast Ertl method
COX-2 inhibitors with 2D approximation 807
withdrawal of drugs 355 relationship between, molecular weight
NCE approvals, antibody taking over and molecular surface area 807
818 Drugs
physicochemical constraints of 807 discovery 979
whole genomes proposed decision tree 984ff
sequencing of 355 Drugs and leads
Drug-like libraries 496f feature-based probabilistic druggability
Drugbank 760 analysis 809
Druggability homology-based analysis,
druggability argument 804 comprehensive survey of 808
druggability hypothesis structure-based amenability analysis
molecular recognition, basis of 805ff 809
DTT, Dithiothreitol (DTT) 438, 602, substitution equilibria
704 conjugates exchanged 38

Dynamic Variation 34 substitutions
activity (inhibition) 40 binary 39
affinity (binding) 40 pathways 39
activity of substrate S
a conjugate triplet 44 fluorescence-labeled 43
single molecular species 44 ternary complexes R:A:B, R:A:C, and
Base-pairing dynamics of R:B:C 39
single strands a, b, and c 35 Dyslipidemia 949
binary complexes
R:A, R:B, and R:C 39
conjugates e
A, B, and C 37 e-NOS, endothelial Nitric Oxide Synthase
equilibria 37 (e-NOS) 368
three sets 37 E. coli 211
dynamic system assays
heterobifunctional character 45 alternate 211
receptor profiling 45 transcription-based 211
enzyme-binding experiment 40 bacterial
exchangeability of three-hybrid 213
effectors 40 two-hybrid 212
receptor 40 doubling rate 211
experiment pathway
enzyme inhibition 43 lytic/lysogenic 212
screening 43 proteins
inhibition heat shock 213
competitive (ACB:R) 41 Transcription Activation Assays 211
mixed (ACB:R+ACB:R:S) 41 yeast proteins
uncompetitive (ACB:R:S) 41 Gal11 212
inhibitory activity Gal4 212
color coding 43 interacting 212
degree of 43 E. coli dihydrofolate reductase (eDHFR)
interactions 1126
equilibria 38 Ebola virus
receptor R 37 viral coat proteins
specific 37 trafficking of 439
triple peptide combinations 37 EDG, Endothelial differentiation gene (EDG)
nonbiogenic substance 942
dendrimers 44 eDHFR, E. coli dihydrofolate reductase
in place of the peptides 44 (eDHFR) 1126
pairing Edman sequencing 488
equilibrium constants 36 EDT, 1,2-Ethanedithiol (EDT) 429
ternary complexes acb 36 EF-Tu, Elongation factor (EF-Tu)
Preparation 34 271
pyranosyl-RNA (p-RNA) single strands EGF, Epidermal growth factor (EGF)
a, b, and c 35 938, 1065
into binary 35 EGFP, Enhanced Green Fluorescent Protein
into ternary supermolecules 35 (EGFP) 466
self-assembly 35 Ehlers-Danlos syndrome
quaternary complex R:A:C:B 39
Screening 34 Elan pharmaceuticals
stoichiometry MVIIa Ziconotide (PrialtTM)
for maximum activity 43 novel nonopioid drug 376
Electron microscopy (EM) Transcription Control by Small

fluorescently labeled proteins, imaging Molecules 174, 175, 177, 179, 181,
of 451 183,185,187,189
gap junctions of Transcriptional Regulators 175
connexin43-tetracysteine 451 Enhanced Green Fluorescent Protein
ReAsH-mediated photoconversion of (EGFP) 466
diaminobenzidine for correlated Enol reductase (ER) 522
fluorescence 451f Enolpyruvyl uridine diphosphate
Electron paramagnetic resonance (EPR) N-acetylglucosamine (EP-UNAG) 655
454 Enzyme activity
Electrophoretic mobility shift assays enzyme-catalyzed reactions
(EMSA) 513 protein-protein and protein-lipid
Electrospray ionization 670 complexes, assembly of 1061
Electrospray ionization mass spectrometry signal transduction
(ESI-MS) 569 modeling intracellular processes
Electrotopological indices 1027 1061
ELISA, Enzyme-linked immunosorbent assays outlook of 1061
(ELISA) 513,637,989 Enzyme classes
Elongation factor (EF-Tu) 271 cysteine proteases
Electron microscopy (EM) 451 useful pharmacological agents 417
EMSA, Electrophoretic mobility shift assays nondirected ABPP - probe design for
(EMSA) 513 410ff
Enabled VASP homology type 1 (EVH1) Enzyme families
nondirected strategies
969
bona fide activity-based probes for
Encephalopsin 944
411
Endocrinology
Enzyme inhibitors 979
controlling activities and processes, act
Enzyme mechanisms
of 891-901
domain folds, on molecular level 826
controlling activities and processes
Enzyme recruitment
NR superfamily, a phylogeny plot
slow diffusion of, membrane-associated
892
substrates
ligand-bound NR relays, and ligand and gradients on, molecular scale
celltype 891 1070ff
Endoplasmic reticulum (ER) 465 Enzyme-linked immunosorbent assay
Endothelial differentiation gene (EDG) (ELISA) 513,637,989
942 Enzymes 385
endothelial Nitric Oxide Synthase (e-NOS) ABPP, proteome coverage of
368 probe-labeled 422
Engineered Nuclear Receptor 185 complex physiological and pathological
Potential 185 processes 421
Engineering enzyme classes
Uniquely Inhibitable Kinases 126 whole proteomes, active site profiling
Engineering Control 174, 175, 177, 179, in 421
181,183,185,187,189 enzyme superfamily
ligand cryptic members, of enzyme classes
naturally occurring 174 421
ligand-dependent database (BLAST) searches 420
multiple 174 sequence-unrelated members, class
transcription 174 assignment of 420f
Over Protein Function 174, 175, 177, histone deacetylases
179,181,183,185,187,189 conserved group of 696f
proteins individual human HDAC enzymes
de novo 174 696
histone modifying enzymes specificity
nonhistone proteins, regulated by gene targeting 177
acetylation status 697 Eukaryotic HDACs
history and outlook of 693 difficulty of expressing 699
EP-UNAG, Enolpyruvyl uridine diphosphate EVH1, Enabled VASP homology type 1
N-acetylglucosamine (EP-UNAG) 655 (EVH1) 969
Epidermal growth factor (EGF) 938, 1065 Evolutionary Thinking 18
Epigenetic mechanisms Darwinian Era 18
histone acetylation, schematic Darwinian evolution
representation of accepted as a reality 19
model for transcriptional control post-Darwinian Era 19
695 pre-Darwinian 18
EPL, Expressed protein ligation 385 quasispecies 19
Epothilone 519 Role of 18
α,β-Epoxyketones 102 Shaping Biology 18
Bafilomycins and Concanamycins 103 Expanding
chemokines 103 By Design 51
chemotaxis 103 By Natural Selection 50
covalent inhibitors 102 Experimental design
downmodulation mechanism 103 and purification schemes
eponemycin 102 affinity-based purification of, small
Epoxomicin 102 molecule targets 1120
EPR, Electron paramagnetic resonance (EPR) issues of general considerations 1085ff
454 Exploit fusion proteins
ER, Endoplasmic reticulum (ER) 465, 522, chemical approaches to 458ff
902 applications and examples of 463ff
ER, Enol reductase (ER) 522 future developments of 476f
ER, Estrogen receptor (ER) 559,902 general considerations of 459ff
Erythroid progenitor cells 1049
Expressed protein ligation (EPL) 387, 390,
Erythromycin 519
537ff
ESI-MS, Electrospray ionization mass
applications of 548ff
spectrometry (ESI-MS) 569
bottleneck of 542
EST, Expressed sequence tags (EST) 378,
general considerations in 542ff
902,944,1084
genesis of 538ff
Ester-containing linker 671
and ligation reaction 545
Estrogen receptor (ER) 559,902
and protein transsplicing 556
1,2-Ethanedithiol (EDT) 429
reactions, one-pot 548
Eukaryotes 648
segmental isotope labeling 555
examples of, posttranslational
semisynthetic nature of 550
modifications
use of, in future developments 560
at histone tails 695
Expressed Sequence Tags (EST) 378, 902,
gene-silencing mechanism
944,1084
CpG residues, methylation at 694
Exteins 540
genomic DNA of 694
Eukaryotic 177
heat-shock protein 178 f
hormone FACS, Fluorescence activated cell sorter
Steroid 178 (FACS) 435
receptors FAD, Flavin adenine dinucleotide (FAD)
ecdysone 179 655
endogenous 179 FAP-1, FAS-associated phosphatase 1
reprogram (FAP-1) 1108
ligand-binding 177 Farnesoid X receptor (FXR) 366, 511, 903
Reprogramming 177 FAS, Fatty acid synthesis (FAS) 471
FAS-associated phosphatase 1 (FAP-1) Fluorescence labeling 465

1108 Fluorescence microscopy 677
Fatty acid synthesis (FAS) 471 Fluorescence polarization (FP) 361
FCS, Fluorescence correlation spectroscopy Fluorescence resonance energy transfer
(FCS) 361 (FRET) 291, 361,428,466,511,549,
FDA drugs 596,685,871,1132
molecular targets of Fluorescent carbohydrate 668
drug substances and drug targets, in Fluorescent carbohydrate conjugates 677
gene family 812 Fluorescent imaging plate reader (FLIPR)
FDG-PET, Fluorodeoxyglucose 947
positron-emission tomography (FDG-PET) Fluorescent Probes 548ff
304 Fluorescent proteins 548
Fetal liver kinase-1 (Flk-1) 771 Fluorescent spectroscopy 548
Fexaramine 511-512 Fluorodeoxyglucose positron-emission
FITC, Fluorescein isothiocyanate (FITC) tomography (FDG-PET) 304
446 9-fluorenylmethoxycarbonyl (Fmoc)-based
FKBP, FK506-binding protein (FKBP) 470 SPPS 543
FKBP12-rapamycin-associated protein Fluorophore-labeled carbohydrate-binding
(FRAP) 303,1120 protein 676
FlAsH-tetracysteine complex Fluorophores 549
fluorescence anisotropy of biarsenical derivatives of 432
four arsenic-sulfur bonds 446 tetracysteine motifs, requiring 433
FlAsH-tetracysteine complexes Fluorophosphonate (FP) 409,410
fluorescent properties, and stability of Fluorous tags 485
FlAsH bound to, peptide with higher FLV, Flavopiridol (FLV) 100
affinity 434 Fmoc (fluorenylmethoxycarbonyl) 671
Flavin adenine dinucleotide (FAD) 655 Forward chemical genetics
Flavopiridol (FLV) 100 chemical-genetic screens
mechanisms 100 overlapping distance measurements
rohitukine 100 326
semisynthetic 100 computational framework
Fleming, Alexander chemical-genetic screens 326
lysozyme discovery 793 Morgan and Sturtevant, legacy of
FLIPR, Fluorescent imaging plate reader 325f
(FLIPR) 312,947 small-molecule probes for, biological
FLIPR duplex calcium mobilization assays mechanisms 348
963 target identification problem 319ff
Flk-1, Fetal liver kinase-1 (Flk-1) 771 Fosfomycin 652,653
Flow cytometry 677 FP, Fluorescence polarization (FP) 361, 409
Fluorescein isothiocyanate (FITC) 446 FP, Fluorophosphonate (FP) 409,410
Fluorescence activated cell sorter (FACS) Fragmentation codes 730
435 FRAP, FKBP12-rapamycin-associated protein
FRET or ReAsH fluorescence (FRAP) 303,1120
with pooling or single-cell collection Frenolicin 525
options 436 FRET, Fluorescence resonance energy transfer
Fluorescence and Electron microscopy (FRET) 291,361,428,466,511,596,
(EM) 871,1132
ReAsH-mediated photoconversion α-L-Fucosidase (ALFUC) 369
diaminobenzidine, for correlated FucT-VII, Fucosyltransferase VII (FucT-VII)
fluorescence 452 1102
Fluorescence correlation spectroscopy Fumagillin
(FCS) 361 A. fumigatus 105
Fluorescence imaging plate reader (FLIPR) drug candidate
312 TNP-470 105
mechanism chemical biology
of action 106 molecular informatics, contribution of

p21CIP/WAF 106 959f
TNP-470 106 deorphanization
Functional genomics strategies for 947ff
central aim of 302 designing compound libraries 954ff
Functional Orthogonality 180 endo- 943
ligand-receptor pair family A 937
modified 180 family B 937
Requirement of 180 family C 937
Functional proteomics future developments of 968ff
activity-based probes glycoprotein hormone 937
enzyme activity profiles 408 HTS, advantages in 961ff
activity-based protein profiling (ABPP) human
chemical ABPP probes 408 classification of 937
directed ABPP - probe design for families of 935
enzyme classes 409 and other genomes 943ff
directed versus nondirected strategies monoamine ligands 957
408 monoamine-related
general strategy for 409 combinatorial library for 966ff
integrity of, enzyme active sites 408 ligand binding sites model for 951
chemical probes olfactory 937
activity-based probes 408 reporter gene
chemical proteomic strategy easy-to-measure surrogate for gene
active site-directed chemical probes product 313
404 signaling of 940
click chemistry-based ABPP 419 small molecule/peptide hormone 937
second bio-orthogonal reaction, structural biology of 949ff
Staudinger ligation 419 thematic analysis 956
covalent inhibitors top selling drugs
combinatorial, or nondirected strategy chemical structures of 934, 935
for ABPP 410 Venus flytrap module (VFTM) 937
serine hydrolase (SH) G protein-coupled receptor 4 (GPR4) 949
fluorophosphonate labeling of 410 G-protein-coupled receptor interacting
Fusion proteins proteins (GIPs) 943
CP-based labeling of 473 G-protein-coupled receptor kinase (GRK)
942
Future Development 222
G-protein transducin 941
dynamics
GABAB, γ-aminobutyric acid type B
analyzing 223
(GABAB) 944
in living cells 223
Galectin-3
total protein 223
bound to N-acetyllactosamine 642
genetics 223
structure of 642
FXR, Farnesoid X Receptor (FXR) 366,
Galectins 641ff
511,903
multivalency 643
γ-aminobutyric acid type B (GABAB) 944
g γ-lactone aminolysis 499
G protein-coupled receptor (GPCR) 312, Ganesan and Doi-Takahashi
428,471, 647, 796, 809, 826, 852, procedures for
933 enantioselective acetate aldol
active compounds reactions, with aldehyde 707
examples of 956 syntheses of
applications and examples of 960ff spiruchostatin A seco acids 709
biological expression of 960f Gastrointestinal (GI) absorption 1005
GE-HTS, Gene expression-based high-throughput screening (GE-HTS) 313
Gene therapy
targeted nuclear acid repair
Gene expression assay for 442
selected putative target, based on Genes
differential gene expression 795 chemical events
Gene expression omnibus (GEO) 1096 regulation of 300
Gene expression profiling genes 79
using microarrays Bub 78
new technology, history and Mad 78
development of 1084f Genetic approaches
Gene expression-based high-throughput forward chemical genetics
screening (GE-HTS) 313 phenotype of interest, relies on
Gene family 309
molecular targets with, chemical leads protein targets and genetic pathways,
and tools 813 identification of 310
redundant ortholog targets 813 forward genetics
Gene microarrays classical genetic approach 309
complementary oligonucleotide novel gene products, identification of
hybridization 309
inherent specificity of 405 use of, phenotype-based screening
Gene ontology (GO) 818 308
Gene profiling forward versus reverse chemical genetics
genome-wide gene expression analysis small molecules and phenotypic
outlook of 1083 assays 310
practical considerations and new small-molecule modulator of
application to 1083ff gene product 311
microarray analysis reverse chemical-genetic approach
data analysis, principles of 1089ff for dissecting biological systems
delineating of, biological pathways 311
involved in a process 1090 reverse chemical-genetic screen
pattern-recognition algorithms, starting point, protein of interest
identifying gene expression profiles 311
1091 reverse genetics
supervised methods, using “training phenotypic consequences of,
set” 1092 mutations in known gene 309
support vector machines (SVMs), use Genetic Code
of 1092 Cracking 50
public databases for Expanding 50
gene expression data 1095f Genetic Disease 186
T-cell subsets Complementation/Rescue 186
application and practical examples of compounds
1097ff Computer-aided design 188
unsupervised learning approach that rescue mutations 188
K-means clustering 1091 hormone
Gene profiling T helper cell differentiation analog-specific forms 186
Th1 and Th2 cells, developing from nuclear/steroid 186
common precursor 1098 receptors 186
Gene regulation hormone analogs
altered patterns of, protein expression designed 187
694 interface
epigenetic mechanisms of 694ff receptor-hormone 187
and role of, activity enhancing accessory mutations
proteins 913f genetic disease 186
Gene regulatory networks 1046 in nuclear receptors 186
Genetic diversity Glycoconjugate biosynthesis 635
chemical mutagens importance of 649

ethylnitrosourea capable of, inducing Glycoconjugates 636, 658, 668, 669
point mutations 318 N-linked 649
genetic vs. chemical diversity Glycogen Synthase Kinase-3β (GSK-3β)
phenotypic variation, sources of 509
318f Glycomimetics 641, 647
Herman J. Muller carbohydrate-derived 639
heritable mutations, in Drosophila strategies for 640
318 Glycoprotein microarrays 676
Genomic age p-Glycoprotein protein (pgp)-1 714
generating information Glycoproteins
and approximate upper or common Hedgehog 937
mechanism curve 796 Wnt 937
Genomic approach Glycosidic linkages 639
mRNA transcript levels, reliance on Glycosyl phosphate monomers 671
404 Glycosyl phosphates 671
Genomics unified schema (GUS) 1096 Glycosyl trichloroacetimidates 671
GEO, Gene expression omnibus (GEO) Glycosylating agents 671
1096 Glycosylation 550
GFP, Green fluorescent protein (GFP) 314, Glycosylation reactions 671
458,612 Glycosylphosphatidylinositols (GPI) 678
see Green fluorescent protein (GFP) 427 Glycosyltransferase
GHRF, Growth hormone releasing factor loss of 635
(GHRF) 955 Glycosyltransferases 668
GHS, Growth hormone secretagogue (GHS) GO, Gene ontology (GO) 818
950 Golgi-ER 85
GIPs, G-protein-coupled receptor interacting dynamic nature of 87
proteins (GIPs) 943 in vitro 87
GITR, Glucocorticoid-induced tumor necrosis transport 87
factor receptor (GITR) 1108 GPCR, G-protein coupled receptor (GPCR)
Global organizations 312,428,471,796,809,826,852,933
CBP project GPR4, G protein-coupled receptor 4 (GPR4)
scenario for 795 949
observation summary and future GR, Glucocorticoid receptor (GR) 467, 902,
application 795f 1122
Glucocorticoid receptor (GR) 467,902, Graves’ disease 969
1122 GRE, Glucocorticoid response element (GRE)
Glucocorticoid response element (GRE) 913
913 Green fluorescent protein (GFP) 314,
Glucocorticoid-induced tumor necrosis 427,458,548,612
factor receptor (GITR) 1108 FRET sensors of biochemical pathways
Glucose signaling 505ff replacing CFP with FlAsH 440f
Glutathione S-transferase (GST) 446 relative sizes of
Glutathione S-transferase fusion protein and biarsenical-tetracysteine complex
859 428
GlyCAM-1 551 GRK, G-protein-coupled receptor kinase
Glycan biosynthesis (GRK) 942
inhibitors of 651 Growth hormone releasing factor (GHRF)
Glycine 554 955
Glycoarrays 636 Growth hormone secretagogue (GHS)
Glycobiology 950
tools for 674 GSK-3β, Glycogen Synthase Kinase-3β
Glycocalix 669 (GSK-3β) 509
GST, Glutathione S-transferase (GST) 446, side chains

859 alkyl or aryl 260
GTPases to XTPases 128 terphenyl derivatives
mutation cylindrical shape 262
aspartate to the asparagine 129 with side chains 261
D138N 130 staggered conformation 262
nucleotides structural mimetics 261
radiolabeled 130 synthetic inhibitor 261
orthogonal nucleotide Terphenyl-based 260
specificity 129 Heparin 681,683
translation experiments Heparin-protein interactions 684
in vitro 129 Hepatocyte nuclear factors 4 (HNF4s) 906
Guanidinoglycosides 681 hERG, Human Ether-a-Go-Go-Related Gene
GUS, Genomics unified schema (GUS) (hERG) 1005
1096 Hetero-oligomers 981
Heterodimerization 230, 949
Ligand-Protein Pairs 231
h rapamycin
H1 histamine receptor 778 heterodimerizer 230
Halobacterium halobium 941 Heterodimerizers 233
HATs, Histone acetyltransferases (HATs) bump- hole
694 solutions 234
694 HDAC, Histone deacetylase (HDAC) 505,
693f, 914,1131 Ma-rap
Heat shock proteins (hsps) 896 in vivo 235
Hedgehog signaling pathway 509 preclude 235
HeLa cells Rapalogs 233
FlAsH fluorescence C16-substituted 234
specificity of FlAsH staining 444 rapamycin
turnover of, Connexin43 in gap C16 methoxy 234
junctions C20-methallyl 234
two-color pulse chase 443f Heterodimers 944, 948
Helical Mimetics 260 HF, Hydrofluoric acid (HF) 569
α-helix mimetics Hidden Markov Model (HMM) 959
BH3 domain 261 High performance liquid chromatography
of the Bak protein 261 (HPLC) 369,569
assay orexin-A and orexin-B, existence of 369
fluorescence polarization 261 High-throughput screening (HTS) 355,
that Disrupt the Bcl-xL/Bak Interaction 484,724,760,933,947,1003
260 Histacin 505, 508f
HEK293 cells 262 Histone acetyltransferases (HATs) 694
pathway Histone deacetylase (HDAC) 96, 505,
apoptotic 261 505f, 694,914,1131
blocking 261 Apicidin 98
protein Inhibitors 96, 508
p53 263 Modifications 96
tumor suppressor 263 Trapoxin 98
protein surface Trichostatin A (TSA) 97
shallow cleft 261 Historical Periods 12
scaffold advancements
synthetic agents 260 discontinuities 12
terphenyl 260 of Chemical Synthesis 12
secondary structures first phase 12
α-helical 260 pre-Woodwardian 12
scientific potentially druggable proteins
technological 12 in druggable gene families 809

Woodwardian 14 HPLC, High performance liquid
HIV, Human immunodeficiency virus (HIV) chromatography (HPLC) 369,434, 569,
583 954
HIV Protease (HIV PR) 116 hsps, Heat shock proteins (hsps) 896
drugs HTRF, Homogeneous time resolved
indinavir 116 fluorescence (HTRF) 361
nelfinavir 116 HTS, High-throughput screening (HTS)
Inhibition 116 355,484,933,947,1003
mutants Human enzymes
HIV PR 118 human histone deacetylase (HDAC)
inhibitor resistant 118 inhibitors
V82A 118 depsipeptide HDAC inhibitors 703f
mutation Human Ether-a-Go-Go-Related Gene
alanine-to-valine 118 (hERG) 1005
coevolve 119 Human genome
in the enzyme 119 computer-aided drug design methods
at the NC-p1 cleavage site 118 docking compounds into binding
at P2 118 pockets 368
in the substrate 119 deorphanizing receptors
Substrate Selectivity 116 by reverse pharmacology 369f
HIV-1, Human immunodeficiency virus type finished euchromatic sequence of
1 (HIV-1) 445 1084
HIV-1 matrix protein high-throughput synthesis and
synthesis with an N-terminal myristoyl screening, and structure-driven drug
584 design 825
HMM, Hidden Markov Model (HMM) Hopkins and Groom
959 druggable target, estimating size of
HNF4s, Hepatocyte nuclear factors 4 808
(HNF4s) 906 isotype-selective small molecule probes
HOBT, Hydroxybenzotriazole (HOBT) computational design of 367ff
595 isotype-selective probes for ERα and
Homer scaffolding proteins 969 ERβ 368
Homo-oligomers 981 methodologies and approaches
Homodimerization 229 for druggable portions of targets 808
clustering order 230 orphan nuclear receptors
FK1012 design 230 isotype-selective small molecule
Heterodimerization 230 probes for 366f
Homodimerizers 233 reverse chemical genetics
AP1903 sequencing of 378
in vivo studies 233 reverse pharmacology
affinity 233 strategy of 370
selectivity 233 selective tool compounds for
Bumped 233 farnesoid X receptor 367
Homogeneous time resolved fluorescence sequencing of 825ff
(HTRF) 361 sequencing of, protein kinases 853
Hopkins and Groom target families, drug candidates of 827
Investigational Drugs Database and target validation
Pharma Projects database pharmacological approach of 376ff
399 nonredundant molecular targets, Human histone deacetylase (HDAC) 693
identification of 810 depsipeptide HDAC inhibitors
identification of, 399 nonredundant Evans’ chiral auxiliary, with
molecular targets 809 chloroacetate 705
Human histone deacetylase (HDAC) relative expression levels of

(continued) by qPCR in series of, cancer cell lines
drug discovery targets 702
class I and class II HDACs 697f selectivity in, classical metal-binding
HDAC inhibitors, in infectious HDAC inhibitor 703
diseases 698 sequence homology between
investigations into, HDAC inhibitors mammalian HDACs, and bacterial
698 HDAC-like protein (HDLP) 700
small molecule HDAC inhibitors, simplest HDAC inhibitors
study of 697 in clinical trials, anticancer agents
function in, eukaryotic cell regulation 701
693 short chain carboxylic acids 700
growing set of, therapeutic indications total synthesis of
693 depsipeptide HDAC
histone acetylation inhibitors - routes to, β-hydroxy
immunoblotting analysis 716 acid fragment 704ff
in spiruchostatin A-, or TSA- treated X-ray structure of, bacterial histone
cells 715 deacetylase-like protein
induction of, pgp-1 RNA expression homologous to human class I HDACs
and expression of pgp-1 699
RNA, analyzed using Q-RT-PCR Human immunodeficiency virus (HIV)
715 583
natural product, bicyclic depsipeptide Human immunodeficiency virus type 1
family of 693 (HIV-1) 445
natural products, FK228 synthesis, intracellular site of
in advanced clinical trials for cancer probing of 445
693 Human nuclear receptor superfamily
Parkinson’s and Huntington’s disease classic RXR-heterodimer receptors
HDAC inhibitors for, thyroid hormone receptor (TR)
neurodegenerative ailment 898
treatment 698 classical receptors
transient histone acetylation to more recently discovered family
associated with, “pulse” treatment of members 900
cells 716 ligands and therapeutic utilities,
Human histone deacetylase (HDAC) examples of 897
inhibitors role in, neuronal development
bicyclic depsipeptide HDAC inhibitors (COUP-TFI)
703 and vascular development
depsipeptide HDAC inhibitors (COUP-TFII) 899
Simon’s aldol reaction 706 Huuskonen aqueous solubility dataset
Wentworth-Janda synthesis 1026
705 Huuskonen dataset 1023, 1037
HDAC inhibitors, third family of HxBP, Hydroxamate-benzophenone (HxBP)
cyclic tetrapeptide natural products 420
701 Hybrid carbohydrate 676
hydroxamic acids Hydrofluoric acid (HF) 569
excellent metal-binding chelators Hydrogen-suppressed molecular graphs
700 727
lead small molecule inhibitors of Hydrophobic descriptors 1026
zinc-dependent class I and class II Hydroxamate-benzophenone (HxBP)
HDACs 698ff 420
peptide synthesis Hydroxybenzotriazole (HOBT) 595
and formation of seco-hydroxy acid Hypothesis generation 724
706ff Hypothesis testing 724
I Insulin-dependent diabetes mellitus
ICAT, Isotope-coded affinity tagging (ICAT) (IDDM) 1097

406 Intein 540
ICOS, Inducible costimulator (ICOS) 1109 Interferon-γ (IFN-γ) 1097
IDDM, Insulin-dependent diabetes mellitus Interleukin (IL) 1097
(IDDM) 1097 Interleukin 2 (IL-2) 1063
IFN-γ, Interferon-γ (IFN-γ) 1097 Interleukin-8 (IL-8) 582
IL, Interleukin (IL) 1097 International Union of Pure and Applied
IL-2, Interleukin 2 (IL-2) 1063 Chemistry (IUPAC) 770
IL-8, Interleukin-8 (IL-8) 582 Intestinal drug absorption
Immune dysregulation, factors influencing 1008
polyendocrinopathy, enteropathy, fraction absorbed 1021f
X-linked (IPEX) 1107 in silico models 1026ff
Immunological response 668 permeability 1020f
Immunology vs. human fraction absorbed 1032
regulatory CD4+ CD25+ T lymphocytes in silico models 1021, 1026ff
by gene expression profiling 1106ff prediction of
T-cell subsets physiological factors and experimental
Rudensky laboratory findings 1107 parameters influencing 1018ff
T-cell subsets, overview of solubility 1018ff
by gene expression profiling 1106 in silico models 1020, 1022ff
Immunosuppressant 106 salting-in effect 1020
Cyclosporin A (CsA) and FK 506 107 Intestinal permeability 1007f
pathways IPEX, Immune dysregulation,
signal transduction 107 polyendocrinopathy, enteropathy, X-linked
in T lymphocytes 107 (IPEX) 1107
Rapamycin 108 IRK, Insulin receptor kinase 385,855
IMPACT (intein-mediated purification IS, Inhibitory switch (IS) 855
with an affinity chitin binding tag) Isotope-coded affinity tagging (ICAT) 406
system 544, 545 Isotopes
in the synthesis of 16 stable 555
estrone 17
Inducible costimulator (ICOS) 1109
Inflammatory diseases J
James Black
transcriptional networks in gene alkyl-substituted histamine analogs
profiling beta-blockers, development of 359
of T-cell subsets 1097 Janus kinase-signal transduction and
Informatic tools activator of transcription (JAK-STAT)
development of 1009 pathway 1046,1049
Inhibitory switch (IS) 855 JIA, Juvenile idiopathic arthritis (JIA) 1102
Inpharmatica’s Drugstore Joshua Lederberg
relational database genetic recombination
FDA approved drugs 811 discovery of 300
Inpharmatica’s Drugstore database Journal of Medicinal Chemistry (JMC)
predicting druggability on, protein drug
761
targets 817 Jurkat cell surfaces
Inpharmatica’s StARLITe database
chemospecific labeling of 618
gene family distribution of, human
Juvenile idiopathic arthritis (JIA) 1102
proteins with small-molecule chemical
leads 814
Insulin receptor kinase (IRK) 397,855 k
Insulin receptor kinase (IRK) inhibitors κ Opioid Receptor (KOR) 365
398 Kaposi’s sarcomagenesis 947
Insulin receptor tyrosine kinase 399 Kenograms 727-
Ketones and azides l
unnatural functional groups L-type Calcium Channel Signaling
through posttranslational modification 130
614 assay
Ketoreductase (KR) 522 radioligand-binding 132
Ketosynthase (KS) 520 calcium channel
Kinase DHP-resistant 133
amenable kinases to, NMR-guided drug
discovery 852 T1006Y mutant 133
cancer patients calcium channels
antineoplastic drugs 122 Voltage-gated 131
as drug targets 856 calcium signal
imatinib targets act locally 131
Bcr-Abl 123 chimeric channels 132
c-Abl 123 photoaffinity labels 132
c-Kit 123 Resistance Mutations 130
kinases 123 single protein
PDGFR 123 uniquely resistant to a general
inhibitor inhibitor 131
BAY43-9006 125 Lactacystin 101
Bcr-Abl tyrosine kinase 123 α,β-Epoxyketones 102
imatinib 123 analog 101
of (VEGFR) 125 nonspecific
Inhibitors 122 inhibitor 101
ligand binding TMC-95A 103
binding mechanisms by lineshape Lag-3, Lymphocyte activation gene-3 (Lag-3)
analysis 874f 1108
mechanism LBD, Ligand-binding domain (LBD) 366,
imatinib resistance 123 559,892,1122
mutation LC-MS, Liquid chromatography-mass
control ligand selectivity 124 spectrometry (LC-MS) 408
T315I 124 Lex-Ley nonasaccharide 672
Philadelphia chromosome 123 Ley-Lex nonasaccharide 671
protein NMR spectroscopy 856ff, Lead identification (LI) 795
875 Leptomicin B 1056
Bruton’s Tyrosine Kinase (BTK) Lessons
858 From 55
Resistance 122 Patchouli Alcohol 55
single kinase Published Total Syntheses 55
cancers 125 Quinine 56
catalytic activity of 125 Lewis antigens 671
tumour-specific kinase inhibitors dimeric combinations of 671
cancer patients, therapeutic Lewis hexasaccharide 672
opportunities for 852 Lewis X pentasaccharide 671,672
Kinase CBP Lewis Y hexasaccharide 671
establishment of, core panel kinases LI, Lead identification (LI) 795
799 Library synthesis
kinase insert domain-containing receptor guidelines for 493
(KDR) 771 Ligand
Kinase-substrate interactions 388 binding energy potential of 806
KOR, κ Opioid Receptor (KOR) physicochemical characteristics of,
365 binding site 806
KR, Ketoreductase (KR) 522 small molecule ligand-binding sites
KS, Ketosynthase (KS) 520 808
thermodynamic argument light-activated
thermodynamics and selection transcription 189

pressure, for ligand interactions translation 189
806 nuclear receptor agonists
Ligand binding photocaging 190
ER ligand discovery small molecules
ER-directed drug discovery 918 gene expression 190
ER-selective molecule 918 photocaged 189
ligand on NR LBD conformation, Line notations 730
influence of 909ff Lipinski
LXRβ LBD Derwent World Drug Index
structure and features of 909 concept of, physicochemical property
multitude of, ligand-induced NR actions limits to drugs 805
913ff Lipinski’s rule-of-five 805
Ligand Selectivity of Ion Channels 130 “rule-of-five” (Ro5) 766
Capsaicin 133 commonly used guidelines of 826
Engineering 130 Lipophilicity 1026
L-type Calcium Channel Signaling 130 Liquid chromatography-mass spectrometry
Ligand-binding domain (LBD) 366, 559, (LC-MS) 408
892,1122 Low-molecular-weight compounds
Ligand-binding Pocket synthesis of 99Gff
de novo LXRs, Liver X receptors (LXRs) 905
binding sites 189 Lymphocyte activation gene-3 (Lag-3)
De Novo Design 188 1108
into proteins 188 Lymphocytes 681
zinc finger domains Lysine
inducible 189 residues
Ligand-binding Pockets 188 modification through, reductive
Ligand-dependent Activators 177 alkylation 595
Exploiting 177 Lysozyme 385
Prokaryotic 177
receptors
quorum-sensing 177 m
Ligand-Protein Pairs 231 M3H, Mammalian 3H (M3H) 1132
Bumps and Holes 231 mAb, Monoclonal antibody (mAb) 337
modified ligand 231 MAGE-ML, Microarray gene expression
steric clash 231 markup language (MAGE-ML) 1094
Heterodimerizers 233 Magnetic resonance imaging (MRI) 438
Homodimerizers 233 Major histocompatibility complex (MHC)
Refining 231 1098
Ligand-receptor interactions MALDI, Matrix assisted laser
molecular modeling of 949ff desorption/ionization spectrometry
Ligation (MALDI) 569
sequential 545 Maltose binding protein (MBP) 558
single 545 Mammalian 3H (M3H) 1132
strategies of 547f Mammalian protein-protein interaction
Ligation reaction 546 trap (MAPPIT) 1132
Light-activated Gene Expression 189 Mammalian small molecule-protein
cell interaction trap (MASPIT) 1133
cultured 190 Mammalian target of rapamycin (mTOR)
monolayer 190 303
duration of Mannich reaction
reporter gene response 190 not targeting cysteine, or lysine residues
from Small Molecules 189 601
Mannose-binding bacteria 685 Melatonin

Mannose-binding proteins (MBPs) 643 pineal gland biosynthesis of 394
MAP, Mitogen-activated protein 861, 943, Members of Later Generations 24
1073 Desogestrel 24
MAP kinase activation 393 Drospirenone 25
MAP, Multiantigenic peptide (MAP) 585, exogenous gestagen
861, 943, 1073 new 24
MAPKAP-2, Mitogen-activated protein Gestoden 24
kinase-activated protein norethindrone 28
kinase-2 (MAPKAP-2) 859 trial and error approach 24
MAPPIT, Mammalian protein-protein Members of the First Generation 22
interaction trap (MAPPIT) 1132 Norethindrone
MASPIT, Mammalian small from estrone-methylether by partial
molecule-protein interaction trap synthesis 22
(MASPIT) 1133 gestagenic component 22
Mass spectrometry (MS) 405 Members of the Second Generation 23
Mathematical biology 1048 ethyl group in C(13) 23
Mathematical modeling 1045 gestagen (-)-norgestrel 31b 23
Mathematical models total synthesis 23
in silico biology 1047 Mendel, Gregor
Matrix assisted laser desorption/ionization discovery of “heritable factors” 300
spectrometry (MALDI) 569 genetic maps
Matrix metalloproteases (MMPs) 420, law of independent assortment 326
1105 2-Mercaptoethane sulfonate (MES) 434
Maximum recommended therapeutic dose 2-Mercaptoethanesulfonic Acid (MESNA)
(MRTD) 776 545
MBP, Maltose binding protein (MBP) 558 2-(2-(2-Mercaptoethoxy)ethoxy)ethanol
MBPs, Mannose-binding proteins (MBPs) 674, 675
643 Merrifield’s resin 671
MC4, Melanocortin-4 (MC4) 950 MES, 2-Mercaptoethane sulfonate (MES)
MCF7 cells 771 434
MCH2, Melanin-concentrating hormone Messenger Ribonucleic Acid (mRNA) 299
subtype 2 (MCH2) 943 Metabolic pathways
MDL Drug Data Report (MDDR) 760 amplified sensitivity to stimulus
Mechanisms of action (MoA) 1119 enzyme-mediated covalent
Medicinal chemistry modifications 1073
ligand-NR recognition enzyme/substrate
structure of, GR LBD and ligand compartmentalization, effects of
binding features 904 1073
ligand-NR recognition, basic principles Metabolic systems 1046
of 903ff connectivity theorems 1046
RXR-heterodimer receptors control theory for 1046
PPARs, RXR, LXR, FXR 905ff robustness of 1046
small-molecule modulator summation 1046
biological target of interest 804 Metabotropic Glutamate Receptor (mGluR)
steroid and RXR-heterodimer receptors 935
“orphan” receptors 906ff Metalloproteases (MPs) 419
steroid receptors activity-based probes for
ligand-binding pockets of 903ff proteomic profiling of 419f
Melanin-concentrating hormone subtype 2 Metastasis 668
(MCH2) 943 Methotrexate (MTX) 460,1123
Melanine stimulating factor (MSF) 955 Methylene 389
Melanocortin-4 (MC4) 950 MFCs, MTX-fusion compounds (MFCs)
Melanopsin 944 1123
MGED, Microarray gene expression data antagonists
(MGED) 1094 potency 265

mGluR, Metabotropic Glutamate Receptor Applications 255
(mGluR) 935 complexation
MHC, Major histocompatibility complex receptor-ligand 265
(MHC) 1098 drug design
MIAME, Minimum information about a computer-aided 264
microarray experiment (MIAME) structure-based 253, 264
1094 hotspot 251
Microarray data interactions
MGED Ontology protein-peptide 254
MGED guidelines, compliance with protein-protein 254
1094 thermodynamic 254
standard terms for, annotation of interface
microarray experiments 1094 barnase-barstar 254
Nature and Cell protein-protein 254
requiring authors to submit interfaces
microarray data, for public analysis 255
repository 1094 interfacial residues 252
Microarray data analysis as Modulators of Protein-Protein 250
mathematicians generating, dedicated nonpeptide agents 252
algorithms and tools 1084 protein
Microarray experiments clefts or cavities 250
context-dependent Protein Secondary Structure 250
standardization toward 1094f as Protein-Ligand Interactions 250
experimental designs protein-protein
gene expression levels, estimation of association 253
1085 disrupters 253
loop design, of Kerr and Churchill mechanism 254
1085 screening methods
reference sample 1085ff mass spectrometry 264
use of, common reference sample NMR 264
1085 small molecule 250
gene expression small molecules
interplatform comparison of results druglike 251
1091ff structural mimetics of
Microarray gene expression data (MGED) α-helices 251
1094 β-turns 251
Microarray gene expression markup strands 251
language (MAGE-ML) 1094 synthetic agents
Microarray technology in drug discovery 250
transcriptome (cDNA sequences) synthetic inhibitors 251
knowledge of 1084 Mineralocorticoid receptor (MR) 903
Microarrays 668 Minimum information about a microarray
and binding events 674 experiment (MIAME) 1094
ordered array of DNA sequences Mitogen-activated protein (MAP) 861,
technology revealing, physiology of 943,1073
cells and tissues 1083 linear picture of
Microsequencing signal transmission 1073
of small peptide 941 Mitogen-activated protein kinase-activated
Microsphere arrays 676 protein kinase-2 (MAPKAP-2) 859
Mimetics 250 Mitogen-activated protein (MAP)-kinase
anchor pathways 1046
low-affinity 265 Mixture synthesis 488f
MLR, Multiple linear regression (MLR) mRNA, Messenger Ribonucleic Acid (mRNA)
1011 299
MMPs, Matrix metalloproteases (MMPs) MRTD, Maximum recommended therapeutic
420, 1105 dose (MRTD) 776
MoA, Mechanisms of action (MoA) 1119 MS, Mass spectrometry (MS) 405
MOBILE, Modeling binding sites including MSF, Melanine stimulating factor (MSF)
ligand information explicitly (MOBILE) 955
952 mTOR, Mammalian target of rapamycin
Molecular biology (mTOR) 303
new techniques MTX, Methotrexate (MTX) 460, 1123
emergence of 360 MTX-fusion compounds (MFCs) 1123
Molecular cloning 935,941 hybrid ligand
Molecular connection table 730 DBD-fusion protein and AD-fusion
Molecular encoding protein, associating with 1124
molecular tags 33 MudPIT, Multidimensional protein
Molecular genetics identification technology (MudPIT) 406
biological systems, understanding of Multiantigenic peptide (MAP) 585
300 Multicomponent 28
Molecular graph 727 asthmatic
types of 727 controlling 29
Molecular information systems 959 inflammation 29
Molecular Libraries Initiative (MLI) 760 Dynamic Variation 34
Molecular mechanisms focused variation
chemical-genomic profiling 340ff cluster of points 31
small-molecule perturbagens (SMPs) combinatorial approach 31
344 natural products 29
WT strain of the budding yeast 342 non-natural ligands
mitosis and spindle assembly 336ff action on the immune system 30
chemical-genetic screens for, collection of 30
inhibitors of mitosis 336 synthesized independently 30
molecular toolbox signal carriers
intracellular protein acetylation cascade of 29
338ff, 343 immunosuppressants 29
selective inhibitors of, α-tubulin initiated by allergens 29
(tubacin) and histone deacetylation T-cell overproduction 29
342 signaling pathways
Molecular properties pharmacological treatment 29
for solubility and permeability 1006 Simultaneous Procedure 28
Molecules Static Variation 31
assessing druglike properties 806 variant
quantitative approach collective screening 28
"rule-of-five" index 807 population 28
assessing druglike properties of 806 restricted 28
Monoclonal antibody (mAb) 337 Multidimensional protein identification
Monomeric red-fluorescent protein technology (MudPIT) 406
(mRFP) Multiple linear regression (MLR) 1011,
ReAsH-mediated CALI of Connexin43 1036
and L-type calcium channels 450f Multiresidue Protein Caging 150
Monomeric sugar mimics dynamics
use of 639 in actin filament 151
MPs, Metalloproteases (MPs) 419 local perturbation 151
MR, Mineralocorticoid receptor (MR) 903 G-actin conjugates 151
MRI, Magnetic resonance imaging (MRI) o-nitrobenzyl group
438 toward specific residues 150
Multiscaffold libraries Natural product-like libraries 497ff
early efforts toward 495 Natural Products 95

MurA 651 bioassay screening
MurB inhibitors 656 cell-based 109
MurG inhibitors 653 natural products 109
Mutagenesis cell systems
site-directed 567,988 model 96
Mutagenic analysis 386 perturbing 96
Mutant bacteria 685 chemical genetics 95
Mutant inteins 542 protein
Mutants inhibit 95
classes of 389 knockout 95
mutation 118 Small molecules
Mutation genetics conditional alleles 95
forward chemical genetics 356 to Unravel Biological Mechanisms 71
phenotypes or biomarkers 356 to Unravel Cell Biology 95
Mycobacterial cell wall NBEs, New biological entities (NBEs) 811
components of 651 NCBI, National center for biotechnology
information (NCBI) 1096
NCEs, New chemical entities (NCEs) 811
n NCL, Native chemical ligation (NCL) 601
N-hydroxy succinimidyl ester (NHS) 453 NCoR, Nuclear receptor corepressor (NCoR)
N-myristoylated HIV-1 matrix protein 914
synthesis from three peptide segments Nerve growth factor-induced B (NGFIB)
583f 906
N-terminal Cys 387 Nestler, Hans Peter
N-terminal cysteine chemical biology Book of Knowledge
alternative to 546 recommendations from 800
N-terminal cysteine residues Network connectivity
protecting groups for 546 FOXO1a nuclear export
Na+/H+ Exchanger Regulatory Factor nucleocytoplasmic transport 324
(NHERF) 943 small-molecule probes
NAD+, Nicotinamide adenine dinucleotide relationship between 323ff
(NAD+) 696 Neural networks (NNs) 1013, 1037
Narcolepsy backpropagation 1013
orexin Neurons
sleep and wakefulness, regulation of glutamate receptors
370 activity-dependent turnover and
National center for biotechnology trafficking of 443ff
information (NCBI) 1096 Neuropeptide Y (NPY) 955
Native chemical ligation 387 Neuropilin-1 (Nrp1) 1108
auxiliary mediated 577 New biological entities (NBEs) 811
to yield noncysteine ligation products New chemical entities (NCEs) 811
577 New Ligand Specificities 179
Native chemical ligation (NCL) 540, 601 bump and hole 179
mechanism of 541 chemical inducers of dimerization (CID)
protein α-thioesters 542 179
for protein semisynthesis 540 Engineering 179
Native peptide bonds into NHRs 179
chemoselective ligation to form 574ff New molecular entities (NMEs) 811
Natural amino acids NF-κB, Nuclear factor kappa B (NF-κB)
new bioconjugation methods 895
targeting of 597ff NF-AT, Nuclear factor of activated T cell
Natural Killer (NK) 370, 1104 (NF-AT) 304
NGFIB, Nerve growth factor-induced B
statistics of amino acids 869

(NGFIB) 906 ribbon representation of, protein kinase
NHERF, Na+/H+ Exchanger Regulatory PKA
Factor (NHERF) 943 p38 MAP kinase, and N-lobe, C-lobe,
NHRs 185 ATP-binding site 869
actions of NHRs NMR methods
extranuclear 185 activation and substrate binding
nongenomic 185 protein phosphorylation 873
Chemical Biology 185 eight kinase-targeted oncology drugs
pathways 852
cellular signaling 186 kinases
Vitamin D activation and substrate binding
analogs 186 871ff
NHS, N-hydroxy succinimidyl ester (NHS) kinases, chemical biology outlook 852
453 NMR-based screening trials 852
Niacin 949 applicable tool (LIGDOCK) 852
Nicotinamide adenine dinucleotide protein kinases
(NAD+) 696 structure-guided drug design 852ff
Nitric oxide (NO) 373 NMR, Nuclear magnetic resonance (NMR)
Nitrilotriacetate (NTA) 471 362, 583, 808, 954, 990
2-nitrobenzyl 141 NMR spectroscopy
kinetics of chemical biology of kinases, studies of
muscle contraction 141 852ff
Nitrobenzyl and Nitrophenyl 140
fragment approach
o-nitrobenzyl 141 fragment linking, building scaffolds of
2-nitrobenzyl 141
complex compound 877ff
applications
fragment-based hits
in vivo 145
M detected NMR fragment approach
cage
880
coumarin-based 146
peptides 146 NMR-based fragment approach 881
proteins 146 fragment-based hits, strategy of 879ff
derivatives kinases
alcohol 141 NMR-based screening 876,877
aldehyde 141 screening techniques/strategies 875
electron-donating groups titration curves, indicating different
to the aromatic moiety 143 binding mechanisms 875
formation of kinases, screening of 875ff, 882
diastereomers 144 ligand-detected NMR screening
isomeric NMR reporter screening 878f
nitroaromatic 145 NNs, Neural networks (NNs) 1013, 1037
photo-by-product 145 NO, Nitric oxide (NO) 373
protecting groups Nonlinear protein structures
photolabile 144 synthesis of 584ff
o-nitrobenzyl 141 nonpolar surface area (NPSA) 766,1027
effect of nonribosomal peptide synthesis 471
electronic nature 144 nonribosomal peptide synthetase (NRPS)
release kinetics 143 522
Nitrocellulose coated slides 676 Nonsteroidal anti-inflammatory drugs
NK, Natural Killer (NK) 370, 1104 (NSAIDs) 792
NMEs, New molecular entities (NMEs) 811 Noonan syndrome 391
NMR investigations Novartis TAM
kinases combinatorial libraries
protein-based results of 867ff prototype structures of 967
NPSA, Nonpolar surface area (NPSA) Oligomerization
1027 of GPCRs 954

NPY, Neuropeptide Y (NPY) 955 Oligomers 981
NR Chemical biology Oligonucleotides 567
human NRs Oligosaccharide conjugate vaccines
structural class 923 malaria and HIV 677
NR modulation Oligosaccharide sequencing 669
concept of 919f Oligosaccharides 550,636,637,669
NR, Nuclear hormone receptor (NR) 891 automated assembly of 670
NR research and drug discovery chain length of 669
new approaches to 920ff Oncostatin M (OSM) 1101
microarray technology 921 One-pot EPL reactions 548
Nrp1, Neuropilin-1 (Nrp1) 1108 Ontology working group (OGW) 1094
NRPS, Nonribosomal peptide synthesis Open reading frame (ORF) 1126
(NRPS) 471,522 Opsins 937,944
NSAIDs, Nonsteroidal anti-inflammatory Oral Contraceptives 21
drugs (NSAIDs) 792 estrogenic 19-nor-steroid
NTA, Nitrilotriacetate (NTA) 471 Binding of a gestagen 22
Nuclear factor kappa B (NF-κB) 895 hand-and-glove metaphor 22
Nuclear factor of activated T cell (NF-AT) Members of Later Generations 24
304 Members of the First Generation 22
Nuclear hormone receptor (NR) 891 Members of the Second Generation 23
nonnuclear functions and interactions, ORF, Open reading frame (ORF) 1126
with other cellular proteins 915 Organic chemistry
NR drugs and novel drug candidates synthetic organic chemistry
examples of 916ff strategies for, construction of complex
NR genes, identification in humans natural products 593
891 Organic solvent
Nuclear magnetic resonance (NMR) 362, auxiliary mediated segment
583, 808, 954, 990 condensation 571
Nuclear receptor corepressor (NCoR) 914 Organic synthesis
Nuclear Receptor Engineering 183 sophisticated tools of 567
by Selection 183 Orphan receptors 949
NHR mutants OSM, Oncostatin M (OSM) 1101
screening 183 Ovarian cancer G protein-coupled receptor
selecting 183 1 (OGR1) 949
selectivities 184 Oxidative coupling
Nucleic acid-nucleic acid interactions reactions, aniline functionalization
669 623ff
Nucleophilic groups Oxocarbenium ions 638
ketone functionalization Oxyethanethiol group 546
through hydrazone and oxime
formation 616
Nucleotide-binding site 396 P
P-selectin
Nucleotide-sugar substrates 649 potent inhibitor of 647
p21-activated protein kinase 1 (PAK1) 855
o p53-hdm2 interaction
OGR1, Ovarian cancer G protein-coupled inhibitors of 991ff
receptor 1 (OGR1) 949 biological background of 991
OGW, Ontology working group (OGW) interface, characterization of 992f
1094 pharmacophore model, establishment
Olfactory receptor genes 944 and validation of 993ff
Olfactory receptors 944 P450 datasets 1034
PAGE, Polyacrylamide gel electrophoresis Peptide carrier protein (PCP) 615
(PAGE) 447 Peptide moiety-kinase interaction
PAI-1, Plasminogen activator inhibitor 399
(PAI-1) 704 Peptide nucleic acid (PNA) 272, 576
PAK1, p21-activated protein kinase 1 (PAK1) Peptide thioesters
855 production of 543
Pancreatic trypsin inhibitor 539 solid-phase peptide synthesis 543
Parallel synthesis 489 tert-butyloxycarbonyl (Boc)-based
Parathyroid Hormone/Parathyroid peptide synthesis 543
Hormone Related Protein (PTH/PTHrP) Peptides 567, 989
942 C-terminal thioester
Parthenolide 109 synthesis of 579f
Feverfew 109 C-terminally modified
nuclear translocation solid phase synthesis of 579
NF-κB 109 synthesis of 578-579
phosphorylation chemical synthesis of 568
IκB 109 fragment condensation of 570
Partial least squares (PLS) 1011, 1036 thioester method for 570
Partitioned total surface areas (PTSAs) fully unprotected 572
1027 intermolecular linking of 571
Patchouli Alcohol N-alkyl 568
accepted N-terminal modification of 578
X-ray 55 solid phase synthesis of 578
proof of structure N-terminally functionalized
total synthesis 55 synthesis of 578
Structural Proof 55 partially protected 570f
structure coupling of 571
wrong 55 synthesis of 988
Synthetic Lesson 55 unprotected
Trouble with 55 chemoselective ligation of 572ff
Patient population hydrazone ligation in aqueous
target validation solution 572
proof of principle, in phase IIa clinical thioester ligation in aqueous solution
trials 791 573
PCA, Principal component analysis (PCA) Peptidoglycan 650
333,501 synthesis 652
PCAs, Protein fragment complementation Peptidyl carrier protein (PCP) 472, 522
assays (PCAs) 1132 Peropsin 944
PCP, Peptidyl carrier protein (PCP) 472, Peroxisome proliferator activated receptor
522, 615 gamma (PPARγ) 902
PCR, Polymerase chain reaction (PCR) PET, Positron emission tomography (PET)
405,436,941, 1086 438
PDB, Protein Data Bank (PDB) 949 PGIS, Prostacyclin synthase (PGIS) 369
PDE, Phosphodiesterases (PDE) 374, 1131 pgp-1, p-Glycoprotein protein (pgp-1)
PDGF, Platelet-derived growth factor (PDGF) 714
1065 Pharmaceutical industry
PEG, Poly(ethylene glycol) (PEG) 607, medicinal chemists
1126 screening campaigns for 804
PEP, Phosphoenolpyruvate (PEP) 651 Pharmaceutical research
Peptide combination strategy of, ligand-detected
optimal peptides and protein-detected NMR 880
library approach 435 fragment-based NMR approach
Peptide α-thioesters 543 Jun N-terminal Kinase 3 (JNK3)
Peptide binding 953 881
Pharmacological literature Photoactivatable Groups 140
Drews Applications 140
identification of, 483 known drug cinnamate cage
targets 809 E → Z photoisomerization 147
ligand-binding domains, estimation of Nitrobenzyl and Nitrophenyl 140
809 nucleophilic group
Phenol sulphuric acid test 685 alcohol 148
Phenylalanine phosphonates 390 amino 147
Pheromone receptors 944 in proteins and peptides 147
PhK, Phosphorylase kinase (PhK) 871 thiol 147
Phosphatidylinositol-3-OH kinase (PI3K) Photocleavable Groups 147
915 thiophosphates 149
Phosphodiesterases (PDE) 374, 1131 via diazo compounds 149
Phosphoenolpyruvate (PEP) 651 Photocleavable Groups 147
Phosphoinositide (PI) 1067 Vinylogenic 147
Phospholamban pentamer Photoreceptor cell-specific receptor (PNR)
biarsenical-tetracysteine complex 902
structure of 447 Photoremovable Groups
Phospholipase C (PLC) 1067 photoremovable protecting groups 146
Phospholipase Cβ (PLCβ) 947 Physical chemistry 725
Phosphonates 389 Physician Desk Reference (PDR) 760
Phosphonomethylene alanine (Pma) 390 PI, Phosphoinositide (PI) 1067
Phosphonomethylene phenylalanine PI3K, Phosphatidylinositol-3-OH kinase
(Pmp) 390 (PI3K) 915
Phosphonomethylphenylalanine (Pmp) PKA, Protein kinase A 385, 855, 942
995 PKB, Protein kinase B (PKB) 859
Phosphopantetheine transferase (PPTase) PKS, Polyketide synthesis (PKS) 471
463 Plasma membrane (PM) 439,445
Phosphorylase kinase (PhK) 871 Plasminogen activator inhibitor (PAI-1)
Phosphorylated STAT-5 704
in cytoplasm 1051 Platelet-derived growth factor (PDGF)
Phosphorylation Sites and 1065
Phosphopeptides 165 PLC, Phospholipase C (PLC) 1067
cage PLCβ, Phospholipase Cβ (PLCβ) 947
to the phosphate 166 Plerograms 727
Caged 165 PLP, Pyridoxal phosphate (PLP) 610
caged phosphoserine PLS, Partial least squares (PLS) 1011, 1036
containing phosphopeptides 166 PM, Plasma membrane (PM) 439,445
efficiency of Pma, Phosphonomethylene alanine (Pma)
photoactivation 166 390
peptide probe Pma-32 AANAT 395
activity 165 Pmp, Phosphonomethylene phenylalanine
monitors protein kinase C 165 (Pmp) 390
photoactivatable fluorescent 165 Pmp, Phosphonomethylphenylalanine (Pmp)
Ser-caged 165 995
phosphoproteins PNA, Peptide nucleic acid (PNA) 272, 576
on the phosphate moiety 167 PNR, Photoreceptor cell-specific receptor
with cages 167 (PNR) 902
phosphoserine polar surface area (PSA) 766, 1026
2-nitrophenylethyl-caged 166 Poly(ethylene glycol) (PEG) 607
tripeptide Poly(p-Phenylene Ethynylene) (PPE) 685
N-formyl-(L) Met-(L) Leu-(L) Phe 168 Polyacrylamide gel electrophoresis (PAGE)
Caged versions 168 447
Phosphoserine/threonine 389 Polyethylene glycol (PEG) 1126
Polyethylene glycol-derived polyamide pre-Woodwardian 12
(PPO) 585 Emil Fischer
Polyfluorocarbon chains 485 synthetic chemistry in biology 13
Polyhistidine-containing sequence (HIS) Estrone
558 Dane strategy 14
Polyketide synthases (PKSs) 520 Robert Robinson
Polyketide synthesis (PKS) 471 employ mechanistic considerations
Polyketides 14
aromatic 525,533 modifications in a pathway 13
analog production 526 steroid synthesis 13
combinatorial biosynthesis of 529 Precipitation tags 485
classes of 520 Predicted residual error sum of squares
formation of 521 (PRESS) 1013
Polyketides and nonribosomal peptides Pregnane X receptor (PXR) 902
combinatorial biosynthesis of 519ff preparative chemistry 9
applications and examples of 529ff Preparative Chemistry - Synthetic
development of 523ff Chemistry 9
future development of 531ff preparative chemistry 9
general considerations for 527 PRESS, Predicted residual error sum of
history of 523ff squares (PRESS) 1013
Polymerase chain reaction (PCR) 405, Principal component analysis (PCA) 333,
436,675,941,1086 501
Polymers Euclidean distance-preserving rotation
classes of 668 333
non-cross-linked 485 Pearson correlation coefficients 333
Polypeptides 567 linear dimensionality reduction 334
chemoselective ligation for 573 Probability of success (POS) 790
POS, Probability of success (POS) 790 Probe 77
Positron emission tomography (PET) Brefeldin A
438 Principles of Membrane Transport
post-Darwinian Era 19 84
genetic mutation 20 Correcting Errors
Modern Synthesis 20 in Chromosome-spindle Attachments
multidimensional sequence space 20 81
natural selection 20 Progression
New Synthesis 20 through Mitosis 77
Postsynaptic density (PSD-95) 969 Ribosomal RNA 88
Posttranslational modifications 550 Progesterone receptor (PR) 903
Power of Genetics 199, 201, 203, 205, Progression 77
207,209,211,213,215,217,219,221 chromosome
Chemistry 199, 201, 203, 205, 207, into two daughter cells 78
209,211,213,215,217,219,221 movements 77
PPARγ, Peroxisome proliferator activated segregation 78
receptor gamma (PPARγ) 902 sister 77
PPO, Polyethylene glycol-derived polyamide Cleavage Plane 80
(PPO) 585 Prokaryotes 635,648
PPT, Propyl pyrazole triol (PPT) 368 Prokaryotic and eukaryotic organisms
PPTase, Phosphopantetheine transferase complete genome sequences
(PPTase) 463 availability of 403f
PR, Progesterone receptor (PR) 903 genomic and proteomic methods
pre-Darwinian 18 mRNA and protein abundance,
anatomical function 18 measurements of 403
anatomical structure 18 Propyl pyrazole triol (PPT) 368
Cuvier-Geoffroy debate 18 Prostacyclin synthase (PGIS) 369
Prostaglandins lysine, cysteine, and glutamic acid
markers for residues
inflammatory and thrombotic diseases strategies for 596
792 molecules and materials, attached to
role in, inflammation and platelet proteins
function 792 survey of 594
Prostate-specific gene receptor (PSGR) new chemical methods
944 attachment of, synthetic molecules to
Protease Chemical Biology Platform, proteins 593ff
launching of outlook of 593
by Hans Peter 801 Protein biosynthetic system
Proteasome 101 Central Dogma
700kDa 101 micelle-mediated aminoacylation
Inhibitors 101 275ff
Lactacystin 101 synthetic expansion of 271ff
proteolysis directed evolution of, existing
of intracellular 101 aaRS/tRNA Pair to accept nonnatural
regulator 101 amino acids 278ff
Protecting groups four-base codons
for N-terminal cysteine 546 CGGG and AGGU 285f
orthogonal 671 complementary four-base anticodons
strategies of 546 285
Protein frame-shift suppressor tRNA 285
fluorescein bis(arsenical) (FlAsH) dyes nonnatural base pairs, orthogonal to
binding of, tetracysteine motifs to 287
611f principle of, four-base codon strategy
lysine residues 285
reductive alkylation using transfer stop codons for, multiple
hydrogenation 607 incorporations 286
modification of genetic codes
transition metal catalyzed reactions, amber suppression method 284
using 601ff expansion of 284f
N-termini of stop-codon suppression method,
site-selective modification of 607ff drawbacks of 285
posttranslational modifications of three stop codons (UAG, UAA, UGA)
387ff 285
Protein a-thioesters 542 nonnatural amino acids
Protein assemblies adaptability of EF-Tu to
functionalization of aminoacyl-tRNAs 283
diazonium-coupling strategies 599 adaptability of, E. coli ribosome 283
Protein bioconjugation adaptability of ribosome 283f
activity based protein profiling biomolecules optimized for 281f
cycloaddition reaction, detecting EF-Tu molecule 283
probes attached protein reactive incorporation of, proteins and
sites 620 small-sized ones 284
central role in, Chemical biology using puromycin analogs 283
593ff variety of 271
field of nonnatural aminoacylation
unique reactivity attributes 593 alternative approach to 278
future development of 625ff Methanococcus jannaschii, mutation of
ketone groups tRNA structure 278
using primary bioconjugation negative selection for, eliminating
reactions 616 TyrRS 279
Protein biosynthetic system (continued) Protein - Ligand Interactions 117,

nonnatural amino acid as 21st amino 119, 121, 123, 125, 127, 129, 131, 133,
acid 280 135
selection of, tRNAs not aminoacylated Protein-Ligand Interactions 115
279 Using Chemistry 115
TyrRS mutants, positive selection for Protein interfaces
280 analysis of 987
orthogonal aaRS/tRNA pair Protein kinase A (PKA) 394, 399,855,
in mammalian cells 281 942
Schultz and Yokoyama, elegant Protein kinase B (PKB) 859
Protein kinase inhibitors 388
approaches of 281
Protein kinase-bisubstrate analog
orthogonal tRNAs
inhibitors 396f
nonnatural amino acids 282
Protein kinases 388
outlook of 271
catalytic domain fold
PNA-assisted aminoacylation 277f construct and condition optimization
in vitro translation system 278 859
Nielsen-type PNA, obstacle of 278 characterizing kinase-ligand
yeast phenylalanine tRNA, 9-mer PNA interactions
by NMR 882ff
protein synthesis, mechanism of 273 construct and condition optimization
ribozyme-mediated aminoacylation [¹H,¹⁵N]-TROSY spectra of, protein
276f kinase catalytic domains of 862
flexizyme 277 as drug targets 852f
Protein catenane implicated as, pivotal signal transducers
synthesis of 587 in cell signaling networks 1129
Protein circularization 556 inhibition of
Protein Complementation Assay 213 signal transduction pathways, study of
interactions 853
detection 214 kinase - ligand interactions
protein-small molecule 214 chemical shift perturbations 883
protein interactions simulation of NMR spectra, of two
detect 213 state DFG-in/DFG-out model 885
in cell 213 kinase-ligand interactions
in vitro 213 DFG-in/DFG-out 884ff
in vivo 213 LIGDOCK procedure 886
Protein cyclization 585 mapping of, chemical shift
Protein Data Bank (PDB) 949 perturbations 882f
Protein Engineering 134 NMR resonance assignment
[¹H,¹⁵N]-TROSY spectra 868
Challenges 134
[¹H,¹⁵N]-TROSY spectrum of, active
mutant proteins
murine protein kinase A (PKA)
compromised function 134
863
mutations
chemical shift matching procedure
engineered 135 864f
impact on the activity of the protein paramagnetic spin labels 867
135 use of, paramagnetic spin labels
Protein engineering 556 866ff
Protein fragment complementation assays using, triple-resonance experiments
(PCAs) 1132 861ff
Protein Function 115, 239 optimization of, buffer conditions
Analysis of 239 unfolded or aggregated protein state,
Engineering Control 115 folded protein suitable for NMR
pathway it controls 239 860
protein dynamic behavior Protein tyrosine phosphatase (PTP) 385;
Index
I 1193
solution-state N M R 873 388,391

protein dynamic behavior, study of Protein-based catalysts 385
873f Protein-DNA interactions
protein kinase catalytic domain 853 antagonist 511ff
protein-based bisubstrate analogs of Protein-Ligand Interactions 115
385 Biomolecular Interfaces 135
ribbon diagram of Engineering 115
murine protein kinase A (PKA) in Genetic approaches 115
complex with Mg/ATP, catalytic Ligand Selectivity of Ion Channels 130
domain of 854 mutations 115
signal transduction phenotype 116
biochemical reactions, succession of protein
853 alter ligand specificity 116
structural biology of 853ff mutated 116
Yeast three-hybrid (Y3H) protein engineering 116, 134
applications and practical examples Resistance Mutations 116
1129ff Revealing Biological Specificity 115
using in vitro kinase activity profiling Sensitizing Mutations 126
1130 Protein - Ligand Interactions 117, 119,
Protein ligation 544 121, 123,125, 127, 129, 131, 133,
Protein lipidation 583 135
Protein Medicinal Chemistry 582
Protein network analysis 127,129,131,133,135
proteome analysis Revealing Biological Specificity 117,
119, 121, 123, 125, 127, 129, 131, 133,
position-specific fluorescence labeling
135
289
Protein-carbohydrate interactions 636f
Protein phosphatases 388
inhibition 648
Protein phosphorylation 388
strategies for 639ff
Protein semisynthesis 386f, 390, 539
inhibitors, identification of 645ff
in living cells 558
Protein-nucleic acid interactions 669
and proteolytic enzymes 539
Protein-protein Interactions 199, 216,
scope of 539 227, 388, 511ff, 669
Protein splicing 540ff activators
in living cell fully synthetic 245
conditional protein splicing 557 transcriptional 245
control of 557 aptamer
in living cells 557ff peptide 217
Protein substrate sites selections 217
advantage of 396 Applications 216, 237
Protein synthesis Catalysis 206
and protein folding Chemical Dimerization Technology
bacteria with FlAsH, monitoring of 228
442 Chemical Inducer of Dimerization (CID)
using peptide fragments 208
from solid phase peptide synthesis CID anchor 215
569 compound libraries 989ff
Protein target Controlling 199, 227
isoform selective inhibitor cyclin-dependent kinase (CDK) Cdc2O
new clinical aspect 373f 204
Protein transduction domain (PTD) 557, Development 202
558 dimerization
Protein transsplicing 542, 556, 560 reverse 227
Protein-protein Interactions (continued) variant 215

dimerizer Proteins 45,668
cell-permeant organic molecule 227 Ala scanning mutagenesis of 572
diversity of 980ff amino acids
DNA-Protein Interactions 204 ordered arrangement 48
drugs targeting 979ff in proteins 48
E.coli 211 azide modification
transcription assays 210 using Staudinger ligation 616ff
Future Development 222 bio-macromolecules 46
genetic assays 210 biochemist
pathway-specific 201 bottom-up view 46
traditional 201 biomimetic strategy for
History 202 N-terminal modification 610
n-hybrid assays 202 carboxylate residues of 595
independent domains chemical orthogonality
DBD 202 preparation of 598
functionally 202 chemical synthesis of 567ff
transcription AD 202 chemically synthesized 572
inhibitors of 979 common strategies for
K~cutoff 215 N-terminus, modification of 609
medium competitive inhibition 987
lacking histidine 205 complementarity 983
molecules complexes of 981
chemical discovery 200 different binding sites of
in the cell 200 conotoxins and nicotinic acetylcholine
nucleic acids 200 receptors 375f
small molecules 200 expressed in, prokaryotes
Myc - Max 513 strategies targeting N-terminal serine
protein residues 610
evolution 216 function of 458
protein chimera Generation 45
DNA-binding 203 Genetic Code 50
transcription activation 203 human genes and proteins
Protein Complementation Assay 213 potentially druggable 808
receptors intein-based labeling of 460
activate 245 labeling of 459
cytokine 245 messenger-RNAs (mRNAs)
RNA-Protein Interactions 205 template-RNA 49
S. cerevisiae 208 unstable intermediates 49
screening techniques 989ff modification of, C-terminus using native
selected interface chemical ligation 611
experimental validation of 988f modification of, cowpea mosaic virus
Small molecule-Protein Interactions (CPMV)
206 using “Click” chemistry 621
targets 989ff modulation of enzymatic activity
three-hybrid assay Briggs-Haldane mechanism, of
small molecule 208 enzyme action 1067
Transient 227 (molecular) biologist
two-hybrid assay 199 topdown attitude 46
Using Chemical Inducers and Using Molecular Biologist’s Look 48
Disrupters of Dimerization 227 N-terminal modification strategies
Yeast 210 critical consideration 609
zinc-finger nucleophilic groups
protein 215 number of 603
pharmaceuticals, development of 581ff PTD, Protein transduction domain (PTD)
phosphorylated proteins 557,558

receptor binding of 1067 PSD-95, Postsynaptic density (PSD-95) 969
plasma membrane Pseudooligosaccharides 679
receptor association 1067 PSGR, Prostate-specific gene receptor (PSGR)
Polypeptide synthesis 944
polymer supports 47 PTH/PTHrP, Parathyroid
protein target cysteine residues Hormone/Parathyroid Hormone Related
site-specific modification of 596 Protein (PTH/PTHrP) 942
ReAsH-mediated photoconversion PTP, Protein tyrosine phosphatase 385
practical for 452 PTSAs, Partitioned total surface areas
reductive alkylation of (PTSAs) 1027
using iridium catalyzed transfer Pubchem database 760
hydrogenation 608 Pulmonary fibrosis 391
self-assembly Purine Analogs 99
due to codon-anticodon interaction CDK inhibition 99
49 Flavopiridol (FLV) 100
during translation 49 inhibitors
mRNA and tRNA 49
selective kinase 99
STAT transcription factors
PXR, Pregnane X receptor (PXR) 902
phosphorylation and dimerization of
pyranosyl-RNA (p-RNA) single strands
1067
with nucleobases 36
Structure 45
Pyridoxal phosphate (PLP) 610
hydrogen bonding 47
polypeptide chains 47
synthesis 9
automated solid-phase 47 qPCR, Quantitative, polymerase chain
protecting group technology 47 reaction (qPCR) 702
targeting of QSAR, Quantitative structure-activity
other functional groups 597 relationship (QSAR) 310, 1008
The Chemist’s Look 47 Qualitative roadmap
unwanted disulfide bond formation intracellular molecules
or scrambling 597 signal transduction pathways,
Proteomes organizing to form 1061
candidate inhibitors, library of 417 Quantitative, polymerase chain reaction
enzyme target (qPCR) 702
ABPP probe structures, and target Quantitative Structure-Activity
enzyme classes 412 Relationship (QSAR) 310, 731, 1008
SE probe library, reactivity profile with Quinine 56
411 partial synthesis
probe library from quinotoxine 56
screening libraries of 411 Rabe and Kindler 56
Proteomics Synthetic Lesson 56
activity-based proteomics total synthesis
and activity-based methods 403 formal 58
chemical strategies for 403ff Stork 58
complex biological proteomes Woodward and Doering 58
functional analysis of 403
Trouble with Total Syntheses 56
prokaryotic and eukaryotic genomes
assignment of, molecular and cellular
functions 403 R
proteins R1128 525
functional characterization of 422f RA, Rheumatoid arthritis (RA) 1097
PSA, Polar surface area (PSA) 1026 Rab escort protein (REP) 549
Rab geranylgeranyl transferase general considerations for 943ff

(RabGGTase) 549 history of 938ff
Rab GTPase Receptor tyrosine kinases (RTKs) 1063
effect of prenylation on 550 Receptors activated solely by synthetic
RAC3, Receptor associated coactivator ligands (RASSL) 365
(RAC3) 914 Receptosomes 935,943
RAD, RNA abundance database (RAD) Recombinase 184
1096 Conditional 184
Radio Immune Assay (RIA) 368 Cre-ER system 184
RAMPs, Receptor activity modulating Engineered Nuclear Receptors 185
proteins (RAMPS) 948 Ligand-dependent 184
Rapamycin 108,519 NHRs 185
to FK506 receptor
different activity 108 antagonists 185
structurally similar 108 synthetic 185
RASSL, Receptors activated solely by synthetic site-specific 184
ligands (RASSL) 365 Recursive deconvolution 491f
RDCs, Residual dipolar couplings (RDCs) Recursive partitioning (RP) 1034,
866 1038
Reaction constant 731 Regulated Transcription and Gene
Reactive group (RG) 408 Therapies 241
Reagents activation
carbodiimide coupling 485 allosteric 242
solid-supported 485 diphtheria toxin 242
Receptor 939 genes
Receptor activity modulating proteins control of 242
(RAMPS) 948 endogenous 242
Receptor associated coactivator (RAC3) tetracycline-inducible 241
914 Three-hybrid Approaches
Receptor Plasticity 180 chemical complementation 243
arginine residue 181 REP, Rab escort protein (REP) 549
estrogen analogs 182 Research and development
estrogen receptor 181 clinical knowledge for
functionalized next generation projects 790
carboxylate 183 successful phase III clinical studies
ligands 183 790
hormone-binding Residual dipolar couplings (RDCs) 866
selectivity 181 Resistance Mutations 116
hormones HIV Protease 116
bumped 180 Kinase 122
mutation to Small-molecule Agents 116
Glu353 182 Target of Rapamycin 119
near drugs The Selection 116
9-cis retinoic acid 180 Resistance-causing enzymes
Overcoming 180 inhibitors of 681
polar group exchange 183 Retinal G protein-coupled receptor (RGR)
receptor 944
RAR 180 Retinoid X receptor (RXR) 905
retinoid 181 Reverse chemical genetics
salt bridge proteins
ligand-receptor 181 biological function of, full control of
Receptor target family 380
GPCR - 7TM 933ff target validation, necessary tools in
development of 938ff 379f
Reverse Dimerization 235
RT-PCR, Reverse transcriptase-polymerase chain

Inducible Disaggregation 235 reaction (RT-PCR) 961

ligand RTKs, Receptor tyrosine kinases (RTKs)
analogous 236 1063
bumped 236 RXR, Retinoid X receptor (RXR) 905
to one half of AP1903 236
two-hybrid assay 236
S
Reverse transcriptase-polymerase chain
S-type lectins 641ff
reaction (RT-PCR) 961 structure of 642
RG, Reactive group (RG) 408 S1P, Sphingosine-1-phosphate (S1P) 959
RGR, Retinal G protein-coupled receptor Saccharides 635
(RGR) 944 SAE, Sialic acid 9-O-acetylesterase (SAE)
Rhamnose biosynthesis 420
probe identification 656 SAGE, Serial Analysis Of Expression (SAGE)
Rhamnose biosynthetic pathway 655 1096
inhibitors of 656 SAHA, Suberoylanilide hydroxamic acid
Rheumatoid arthritis (RA) 1097 (SAHA) 701
Rhodium carbenoids SAR, Structure-activity relationship (SAR)
in disulfide modification 606 792, 811, 828, 876, 950, 1008, 1128
using, tryptophan modification 605 Saturation transfer detection (STD) 873
Rhodopsin 935, 949, 953 SCAM, Substituted-cysteine accessibility
RIA, Radio Immune Assay (RIA) 368 method (SCAM) 949
Ribonucleic acid (RNA) 300, 576 Scavengers 485
Ribonucleic acid-based interference (RNAi) Scintillation proximity assay (SPA) 361
307 Screening campaigns
Ribosomal RNA 88 failure of
aminoacyl-tRNA druglike leads or chemical tools,
mimic 88 discovery of 804
Catalysis 88 Scytovirin-N 679
model 88 SE, Sulfonate ester (SE) 411
Puromycin 88 Segmental isotopic labeling 555
ribosome 88 Selectins 643ff, 681
Yarus inhibitor 88 features of 644
Ribosome 668 Selective estrogen receptor modulators
Ribosome-synthesized proteins 554 (SERMs) 916
RMSD, Root mean square difference (RMSD) tamoxifen
865 and second generation SERM,
RNA abundance database (RAD) 1096 raloxifene 916ff
RNA-Protein Interactions 205, 219 Selective GR modulators (SGRMs) 918
in vitro methods 219 drugs for variety of, debilitating diseases
specificity 220 918f
switch Selective nuclear receptor modulators
sperm/oocyte 220 (SNuRMs) 916
third component Selective optimization of side activities
hybrid RNA 205 (SOSA) 958
three-hybrid assay 205 Selective peroxisome proliferator activated
to the two-hybrid system 205 receptor gamma modulators (SPPARMs)
RNA, Ribonucleic acids (RNA) 300, 576, 919
668 Selenocysteine 576
RNAi, Ribonucleic acid-based interference Self-assembled monolayers 676
(RNAi) 307 Semantics 4, 5, 7, 9
Root mean square difference (RMSD) 865 Preparative Chemistry - Synthetic
RP, Recursive partitioning (RP) 1034, 1038 Chemistry 9
multiple pathways and cell stimuli
Semantics (continued )
Synthetic Design 8 model generality 1075
Sensitizing Mutations 126 signaling module, compression of
to Engineer Nucleotide Binding Pockets activation of PDGF receptor 1076
126 Signal transduction pathways
Exploiting 126 mathematical modeling
GTPases to XTPases 128 emergence of, powerful tool 1061
Uniquely Inhibitable Kinases 126 Signaling cascades and networks
Sequential ligation 545 bistability, existence and functional
Serial Analysis Of Expression (SAGE) significance of 1073
1096 general considerations and
series identifier (SID) 767 pathway-specific models 1073f
Serine hydrolase (SH) 409 multiple signaling pathways 1074
Serine/threonine kinase 385, 399 Erk activation, by Raf-MEK-Erk
SERMs, Selective estrogen receptor cascade 1074
modulators (SERMs) 916 pathway crosstalk interactions
Serotonin N-acetyltransferase 394 in positive feedback loops 1074
Serpentine receptors 933 Signaling literature
7TM, Seven transmembrane (7TM) 933 conceptual models
SGRMs, Selective GR modulators (SGRMs) invoked in 1062
918 Signaling pathways 1046
SH, Serine hydrolase (SH) 409 binding, cell surface receptors 1062
Shikimic acid 647, 648 Signaling processes
Shokat novel experiments, outcomes of
kinases, allele-specific chemical generating hypothesis-driven research
intervention of 365 1062
Short synthetic peptides 400 quantitative models of 1062
SHP, Small heterodimer partner (SHP) Silencing mediator of retinoid and thyroid
367 (SMRT) 914
SHP-1 391, 392 Similog descriptor 958
mutations of, in mice 391 Simplified Molecular Input Line Entry
SHP-2 391, 392 System (SMILE) 761
Sialic acid 9-O-acetylesterase (SAE) 420 Single gene mutations 1045
Sialyl Le' 682 Single nuclear polymorphism (SNP)
Sialyl Le" 682 970
Signal transducer and activator of Single nucleotide polymorphism (SNP)
transcription 6 (STAT6) 1097 378
Signal transducers and activators of Single Residue Protein Caging 152
transcriptions (STATs) 1134 alkyl halides
Signal transduction photolabile 152
intracellular signaling, modeling amino acid
fundamentals of 1063 different from lysine or cysteine 155
modeling of 1062 residues 155
Signal transduction mechanisms specific 155
complex kinetic models BChE
control of cytoskeleton 1074 catalytic activity 155
gene regulatory networks and mechanistic properties of 156
genomic data, interface with 1074 cysteine residues
limitations of 1074ff essential 152
model compression and integration, modification 152
issues of 1075 in vitro
prospects and challenges 1074ff F-actin filaments 153
sequence of, signaling complex motility assay 153
assembly 1074 motility models 154
in vivo protein function 71
role of cofilin 154 short timescales 71
kinase Small-molecule interaction database
protein 153 (SMID) 348
phenacyl groups 154 SMD, Stanford microarray database (SMD)
Single-component 21 1096
Consecutive Procedure 21 SMDLID 767
example SMID, Small-molecule interaction database
total synthesis of estrone 21 (SMID) 348
Oral Contraceptives 21 SMILE, Simplified Molecular Input Line
Singlet oxygen Entry System (SMILE) 761
CALI, alternative methods of SMPs, Small molecule perturbagens (SMPs)
transgenic knockouts 450 318
chromophore, or fluorophore assisted SMRT, Silencing mediator of retinoid and
laser or light inactivation 450 thyroid (SMRT) 914
SIRT, Sirtuin (SIRT) 696, 1131 SNF, Sucrose nonfermenting (SNF)
Site-directed mutagenesis 386, 988 694
Skeletal diversity SNP, Single nuclear polymorphism (SNP)
approaches to 501 970
Smad2 553, 555 SNP, Single nucleotide polymorphism (SNP)
Small Caged Molecules 159
378, 970
Caged Peptides 159
SNuRMs, Selective nuclear receptor
to Control Protein Activity 159
modulators (SNuRMs) 916
ligand
SOD, Superoxide dismutase (SOD) 621
activating 159
Solid phase peptide synthesis (SPPS) 543,
inhibiting 159
568-569
synthesis
ability of 569
obstacles 150
restrictions of 545
Small heterodimer partner (SHP) 367
Solid-phase reactions
Small molecule perturbagens (SMPs) 318
heterogeneous nature of 485
Small Molecule-Protein Interactions
Solid-phase synthesis 484f, 487, 670
206, 220
chemical inducers of dimerization advantages of 670
in a small molecule 206 Solid-supported reagents 485
drug discovery research 220 SOS-NMR, Structural information using
enzyme 220 overhauser effects and selective labeling
in vivo 221 (SOS-NMR) 887
three-hybrid assay SOSA, Selective optimization of side activities
yeast 206 (SOSA) 958
Small Molecules 71, 73, 75, 77, 79, 81, 83, SPA, Scintillation proximity assay (SPA)
85, 87, 89 361
inhibitor Sphingosine-1-phosphate (S1P) 959
design strategies 89 spindle 72
Discovery 89 Spiruchostatin
specificity 89 epimer of
probes investigating, saturable transporters
fluorescence-based 90 716
as Probes Split inteins 542, 559
for Biological Processes 77 Split-pool synthesis 489ff
proteome encoding 492
small fraction 90 solid-phase 493
targeted 90 SPPARMs, Selective peroxisome proliferator
to perturb activated receptor gamma modulators
designing strategies 71 (SPPARMs) 919
SPPS, Solid phase peptide synthesis (SPPS) Structural biology
568 and application of knowledge
SPR, Surface plasmon resonance (SPR) management
361, 843, 855 families of, targets kinases, proteases,
SRC1, Steroid receptor coactivator 1 (SRC1) ion channels 796
511, 914 Structural information using overhauser
Stanford microarray database (SMD) effects and selective labeling (SOS-NMR)
1096 887
STAT6, Signal transducer and activator of Structure activity relationship (SAR) 505,
transcription 6 (STAT6) 1097 792, 811, 828, 876, 950, 1008, 1128
Static Variation 31 Suberoylanilide hydroxamic acid (SAHA)
Clark Still’s encoding-decoding 701
alternation 32 Substituent constant 731
combinatorial approach Substituted-cysteine accessibility method
antiasthma drug 31 (SCAM) 949
split-and-mix strategy 31 Subtiligase 539, 574
Molecular decoding 33 Sucrose nonfermenting (SNF) 694
Molecular encoding 32 Sugar-nucleotide-binding enzymes
molecular tags effective probes, design of 655ff
cleaved photochemically 33 high-throughput screening
on-bead selection test probe identification through 651ff
specified 33 inhibitors of 648ff
Preparation 31 identification of 651ff
Screening 31 Sulfhydryl-reactive affinity reagents 949
variants Sulfonate ester (SE) 411
identified 32 Superoxide dismutase (SOD) 621
removed 32 Support vector machines (SVMs) 958
with affinity for the receptor 32 Surface plasmon resonance (SPR) 361,
variation 668, 673, 676, 843, 855
preparative rounds 32 SVMs, Support vector machines (SVMs)
on resin-beads 32 958
screened 32 SWI, Yeast mating type switching (SWI)
STATs, Signal transducers and activators of 694
transcriptions (STATs) 1134 SwissProt ID 771
Staudinger ligation 546, 547 Synaptotagmin
first bioconjugation reaction 617 FlAsH-FALI inactivation of
generation of, fluorescent Staudinger Davis and coworkers, using 450
ligation products 619 Synthesis - Genesis - Preparation 4
powerful tool artificial indigo 6
study of, glycosylation pathways artificial urea 5
618 biological
quenching process indigo 7
enhancement in, dye quantum yield chemical
618 indigo 7
STD, Saturation transfer detection (STD) construction
873 anabolic pathway 5
Stem cells degradation
differentiation modulators 509f catabolic pathway 5
differentiation of example
small molecule modulators 510 indigo 7
pluripotent embryonic 509 genesis
Steroid receptor coactivator 1 (SRC1) 511, programmed 8
914 indigo 6f
Stimuli 1045 N-phenylglycine 5
preparation of metabolic systems 1046
intuitive 8 vs. one Postdoc - one protein 1049
synthesis organism function
planned 8 concept of 1083f
synthetic chemist protein function of cell
as designer 8 multicell organisms, complex
as molecule maker 8 interplay in 355
Synthetic Execution 8 vs. proteomics, genomics, metabolomics
target molecule and high-throughput technologies
constitution of 6 1048
degradation products 6 signal pathways
Urea 5 turning into signal networks 379
Wöhler 5
Synthetic carbohydrates 668
Synthetic chemistry T
Hecht method, chemical aminoacylation T cell death-associated gene 8 (TDAG8)
of isolated tRNAs 274 949
nonnatural amino acids T Helper Type 1 (Th1) 1097
aminoacylation of tRNA 274f T Helper Type 2 (Th2) 1097
progress of 272f T Regulatory (Treg) 1106
Synthetic Design 8 T-cell receptor (TCR) 1097, 1120
Design 8 cell differentiation
execution control of 1099
bottom-up-oriented 8 Tail tyrosine residue of Src
R. B. Woodward phosphorylation of, by Csk 400
art of organic synthesis 9 TAM, Tertiary amine (TAM) 966
synthetic planning 9 Tamoxifen
top-down event 8 first synthetic NR small molecule
Synthetic drugs 496 with differential tissue effects 916ff
and natural products first-line treatment for
structures of 499 ER-positive breast cancer 916
vs. natural products 503 Tanimoto metrics
Synthetic Execution 8 mean and standard deviation of, the
Synthetic organic chemistry 725 distribution 332
Systematic nomenclature 730 Target family
Systems biology 1045 kinases
vs. bioinformatics 1048 prototype of 826
biological signals and actions 826 Target family approach
interactions of proteins, and pathways foundations of 825
of transferring 826 proteins, clustering of 825
chemical biology and 1145 Target of Rapamycin 119
chemical genomics FKBP-rapamycin complex 120
and chemical proteomics, chemical Identification 119
approach to 1118 immunosuppressants
definition of 1048 cyclosporin A 121
general considerations in 1047ff FK506 121
goal of 1045 Mechanism 121
history of 1046 mechanism of action 121
holistic approach of proteins
biological networks and experimental target of rapamycin 120
data 379 TOR1 120
impact on medicine 1058 TOR2 120
limiting factor in 1057 rapamycin
vs. mathematical biology 1048 cellular targets of 120
Target of Rapamycin (continued) red-fluorescent dye resorufin (ReAsH)

immunosuppressant 119 important biarsenical besides FlAsH
natural product 119 431
resistance mutations single-molecule studies
from genome-wide screens 122 using biarsenical-tetracysteines 448
isolating 122 small-molecule labeling systems
targets comparison with 438f
phenotypically 119 specificity of, biarsenical-tetracysteine
relevant 119 method
TASP, Template assembled synthetic protein optimized tetracysteine sequences
(TASP) 585 435
Taste receptors 944 tetracysteine motif 433ff
TCEP, Tris(carboxyethylphosphine) (TCEP) toxicity and antidotes 437f
620 two-color method
TCR, T-cell receptor (TCR) 1097, 1120 continuous imaging of single cells
TDAG8, T cell death-associated gene 8 443
(TDAG8) 949 TFA, Trifluoroacetic acid (TFA) 569
TE, Thioesterase (TE) 522 TGFB, Transforming growth factor β (TGFB)
Temperature-Sensitive Glycoprotein of 552
Vesicular Stomatitis TGFB signaling 552
Virus-O6-alkylguanine-DNA Th1, T Helper Type 1 (Th1) 1097
Alkyltransferase (tsVSVG-AGT) 465 Th2, T Helper Type 2 (Th2) 1097
multicolor analysis of 466 Thermal Sensation 76
Template assembled synthetic protein capsaicin
(TASP) 585 cellular phenotype 76
tert-Butyloxycarbonyl (Boc)-based natural product 76
peptide synthesis 543 cation channel 77
Tertiary amine (TAM) 966 cloned receptor
Tetracenomycin 525 VR1 (vanilloid receptor subtype 1) 77
Tetracysteine-biarsenical system cloning strategy 77
biarsenicals, chemistry of 430ff Thiazolidinediones (TZDs) 902
environment-sensitive fluorescent Thioesterase (TE) 522
biarsenicals 445f Thioesters 542ff
FlAsH-EDT2, synthesis of 431 C-terminus
fluorescence anisotropy of intein-based methods 611
FlAsH-tetracysteine complex 446ff Thiolate anions
future developments, and applications of sulfur-carbon bond
453f lysine cross-reactivity 597
general considerations of 430ff Three-hybrid (3H) 1120
genetically encoded fluorescence tag Thyroid-stimulating hormone (TSH) 969
small size of 439ff TIF2, Transcription intermediary factor 2
history (TIF2) 914
and design concepts of 429f TIMP-1, Tissue inhibitor of matrix
multicolor pulse-chase labeling 443ff metalloproteinase (TIMP)-1 1105
nonspecific staining TIPS, Triisopropylsilyl (TIPS) 709
limitation of 454 Tissue-specific progenitor cells
peptide libraries dedifferentiation of 509
optimizing tetracysteine sequence TMC-95A 103
with 435ff TMSOTf 671
practical applications of 439ff TMV, Tobacco mosaic virus (TMV) 600
protein-lipoates cofactors and enzyme TNF, Tumor necrosis factor (TNF) 794,
thiols 1103
regeneration of 429f to Link a Protein Target 72
regeneration of, to arsenic 429 capsaicin and menthol 76
Colchicine and Tubulin 72 nanomolar 97
Cytochalasin and Actin 74 Trifluoroacetic acid (TFA) 569

phenotypes Triisopropylsilyl (TIPS) 709
inhibition 72 Trimethylsilyl triflate 671
Thermal Sensation 76 Tris(carboxyethylphosphine) (TCEP) 620
to a Cellular Phenotype 72 Tryptophan
Tobacco mosaic virus (TMV) 600 residues, modification of
Toxicology 1033 using metallocarbenoids 604ff
TPX, Trapoxin (TPX) 98 TSA, Trichostatin A (TSA) 508, 701
TR-associated protein (TRAP) 914 TSH, Thyroid-stimulating hormone (TSH)
TRAF, Tumor necrosis factor 969
receptor-associated factor (TRAF) 1103 tsVSVG-AGT, Temperature-Sensitive
Transcription 235 Glycoprotein of Vesicular Stomatitis
Regulated 235 Virus-O6-alkylguanine-DNA
transcription Alkyltransferase (tsVSVG-AGT) 465
activate 235 Tubacin 505, 508f
Transcription intermediary factor 2 (TIF2) Tumor necrosis factor (TNF) 794, 1103
914 cytokine
Transcriptional Regulators 175 synovial proliferation, critical role in
Derived from Natural Repressors 175 794
Eukaryotic 177 Tumor necrosis factor receptor-associated
Functional Orthogonality 180 factor (TRAF) 1103
Genetic Disease 186 Tunicamycin 649
Ligand-binding Pockets 188 β-Turns/Strands
Ligand-dependent Activators 177 bilayer
Light-activated Gene Expression 189 lipid 258
New Ligand Specificities 179 Computational modeling
Nuclear Receptor Engineering 183 Macromodel program 257
Receptor Plasticity 180 computer-simulated
Recombinases 184 conformational search 257
Role of Ligand-dependent 175 HIV-1 protease
Transducers 939 inhibitors 258
Transforming growth factor B (TGFB) surface 258
552 pyrrolinone
Translational medicine derivatives 258
chemical biology and 1148 inhibitory effects 258
Transthioesterification 522 scaffold
TRAP, TR-associated protein (TRAP) 914 β-D-glucose 256
Trapoxin (TPX) 98 nonpeptide 257
affinity reagent scaffolds
synthesized 98 de novo 259
fungal metabolite 98 designed 259
Treg, T Regulatory (Treg) 1106 structures
Triantennary N-linked mannoside mimic 257
(Man)9(GlcNAc)2 679, 681 protein 257
Triarylphosphines and azides secondary 257
Staudinger ligation synthetic
reacting to form, iminophosphorane scaffolds 259
intermediate 617 β-Turns/Strands 256
Trichostatin A (TSA) 97, 508, 701 Peptidomimetics 256
antifungal Two-dimensional electrophoresis (2DE)
from a Streptomyces 97 405
concentrations two-hybrid assay 199
low 97 biased toward proteins 201
two-hybrid assay (continued) USPS, Ubiquitin-split-protein-sensor (USPS)

eukaryotic transcription factor 200 1132
genetic 199 UWL, Unstirred water layer (UWL) 1021
key modifications 202
libraries V
DNA 201 Vaccines
exact cDNA-AD 201 for malaria and HIV 677ff
screen entire genome 204 Vacuolar ATPases (V-ATPases) 103
selection strain 200 enzymes 103
yeast 199 function as
Tyrocidine 527 proton pumps 103
Tyrosine 385 Inhibitors 103
bioconjugation van der Waals components
protein surface residues, as targets for affinity of binding 805
598 Vancomycin 519
electrophilic aromatic substitution Vasoactive intestinal peptide (VIP) 955
method for 598 VEGFR-2, Vascular endothelial growth factor
modification of receptor subtype 2 (VEGFR-2) 771
commercially available lysine-reactive VFTM, Venus flytrap module (VFTM) 937
probes 602 VIP, Vasoactive intestinal peptide (VIP)
three component Mannich-type 955
reaction 600 Viral membrane proteins 585
using palladium π-allyl chemistry Vitamin D receptor-interacting protein
603 (DRIP) 914
residues VLP, Virus-like particle (VLP)s 439
native chemical ligations, using 602
residues, modification of W
new chemical tools 597ff Wild-type O6-alkylguanine-DNA
TZDs, Thiazolidinediones (TZDs) 902 alkyltransferase (wtAGT) 464
Wild-type (WT) 317
U
Wnt/β-Catenin 1046
WOMBAT 760f
Ubiquitin-split-protein-sensor (USPS)
activity identifier (AID) 769
1132
bioactivity summary panel 766
UGM, Uridine
computed chemical properties panel
5’-diphosphate-galactopyranose mutase
766, 768
(UGM) 639
database structure of 767ff
uHTS, Ultra high-throughput screening datamining with 779ff
(uHTS) 361 and errors 772ff
Uniquely Inhibitable Kinases 126 quality control 769ff
Analog-specific Kinases 127 reference database 766, 768
gatekeeper residue rule-of-three compliant molecules 782
mutation 126 SMDLID 767
inhibitor target and biological information panel
designed 127 766, 767
pyrazolopyrimidine-based 127 target types in 763
uniquely sensitive kinase allele 126 WOMBAT 2006.1 761ff
Unstirred water layer (UWL) 1021 Bioactivity distribution pie charts in
Ure2p 505ff 763
Uretupamines 505ff enzyme inhibitors 763
Uridine 5’-diphosphate-galactopyranose estrogen receptor 771
mutase (UGM) 639, 653 most populated oncology-related targets
inhibitors of 654 in 781
target type distribution pie charts in Yeast 210
764 GFP and tetracysteine tags

vascular endothelial growth factor to β-tubulin 440
receptor subtype 2 (VEGFR-2) 771 hybrid systems
WOMBAT-Pharmacokinetics, reverse 210
WOMBAT-PK 755ff n-Hybrid System 210
Woodwardian 14 split
beginning in 1937 14 hybrid systems 210
Case Study transcriptional strength 211
(±)-Estrone 15 yeast chromosome 211
chemical reactions Yeast mating type switching (SWI) 694
by diastereoselection 14 Yeast three-hybrid (Y3H) 1120
second phase 14 chemical structures of
World Drug Index (WDI) 760 immunosuppressants FK506 and
World of Molecular BioAcTivity, rapamycin 1121
WOMBAT 761 competition assay
WT, Wild-type (WT) 317 measure of, cellular
wtAGT, Wild-type O6-alkylguanine-DNA uptake/functionality of test MFC
alkyltransferase (wtAGT) 464 1127
promising alternative methods
general considerations 1127ff
X
Yeast two-hybrid (Y2H) 1120
X-ray crystallography 641, 646, 652
Yeast cloning 214
Xenopus melanocytes 948
Yersinia bacteria
mammalian cells
Y infection of 440
Y2H, Yeast two-hybrid (Y2H) 1120 YFP, Yellow fluorescent protein (YFP)
Y3H, Yeast three-hybrid (Y3H) 1120 441
