The Pharmacogenomics Journal (2004) 4, 88–90

The HapMap project and its application to genetic studies of drug response

P Deloukas and D Bentley

Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK

The Pharmacogenomics Journal (2004) 4, 88–90. doi:10.1038/sj.tpj.6500226
Published online 16 December 2003

A central goal in the study of human biology is to understand the molecular basis of common disease, and of variable sensitivity to drugs and other environmental factors. Adverse drug effects are a major cause of hospitalisation [1]. The development of more effective, safer medicines requires understanding of the genetic factors which govern variable drug response in different individuals. Recent advances in genetics and genomics are paving the way to develop diagnostic tests that will enable the administration of drugs to be tailored to groups of individuals, and may in future help to define appropriate individual dosages and drug combinations in pharmacological treatment. These hopes are reflected in the growth of pharmacogenetic and pharmacogenomic research. Here we review the potential impact of current research in human genetic variation on our understanding and management of variable drug responses.

In the past 10 years, there has been great success in identifying the genetic basis of rare Mendelian disorders through linkage studies in large affected families. The genes and underlying mutations, which cause over 1400 such conditions, are reported in OMIM (http://www.ncbi.nlm.nih.gov). However, similar approaches have yielded much more modest success when applied to common, polygenic diseases and other phenotypes in which multiple genetic and environmental factors contribute to an individual's risk of developing disease. It is now widely accepted that association studies offer greater statistical power than linkage in detecting genetic effects underlying complex traits [2] when the causative variant is common in the population. This power is determined by the allele frequency and risk ratio of the causative variant in relation to the sample size. A key element in undertaking such studies is the establishment of a comprehensive catalogue of common variants in the human population.

Detecting the clinically important genetic factors in determining variable drug response poses a similar problem. Variability in drug response is governed both by genetic variants and by nongenetic factors such as diet, age and gender. The contribution of genetic factors may be classified into two groups. Pharmacokinetic variants affect uptake, absorption, metabolism or clearance of a drug, and occur, for example, in genes that encode liver enzymes or drug transporters. The same genetic variants may affect the response to a range of drugs, because multiple drugs are metabolised via the same route (as in the well-known case of cytochrome P450 enzyme CYP2D6, which metabolises a quarter of all commonly used drugs). Pharmacodynamic variants affect the biological function of the drug at the site of action, for example by altering the binding of a drug to a neurotransmitter receptor (e.g. variants in the GABAA receptor subunits alter responsiveness to the antiepileptic benzodiazepines).

Making effective use of genetic and genomic information in pharmacological applications relies on being able to discover the biologically and medically important genetic variants for each phenotype, and to apply them to improve the use of drugs in healthcare. Studies to date have mainly focused on candidate genes, each one chosen on the basis of a prior hypothesis that it encodes a protein that is involved in a particular drug response. Such studies are now greatly enhanced by the wealth of information on new genes and variants that is available in the public domain as a result of the Human Genome Project and associated research. The more ambitious approach would be to scan the entire genome for important new variants, an approach which is not limited by any prior hypothesis, but which requires effective resources and technology for genome-wide analysis. But what is the best way forward, how do we make the best use of the available information, and what resources do we need to develop?

HUMAN VARIATION
Modern humans are a young species in evolutionary terms, with a limited amount of sequence variation. It has been estimated that there are 11–15 million variants with a minor allele frequency >1%. Over 90% of these variants are single-nucleotide polymorphisms (SNPs). Through large-scale efforts, over 4.5 million SNPs are currently available in the public domain (dbSNP; build 116). However, only a small fraction of these SNPs has been characterised in terms of allele frequency and ethnic distribution. This year has seen the completion of a highly accurate and contiguous reference sequence of the euchromatic (gene-coding) part of the
human genome. A comprehensive set of annotated human genes is available, and comparative analysis of the human genome sequence to those of other species is highlighting additional evolutionarily conserved regions, which are likely to be enriched in functional elements such as transcriptional regulators. As a result, we can begin to systematically explore the sequence variation content of the functional part of the genome, which recent studies estimate not to exceed 5%, in the human population.

An association study can be undertaken using either a direct or an indirect approach. The former relies on the availability of a complete list of functional variants, which are then tested for association with the trait of interest in a number of phenotypically matched cases and controls. The latter involves testing a series of genomic variants across a region (or all) of the genome for association, and relies on the assumption that the causative variant will be in linkage disequilibrium (LD) with one of the variants tested. Thus, characterisation and understanding of the dynamics of LD across the genome is necessary for enabling whole-genome association studies.

Recent studies conducted at various levels of resolution have all shown that the extent of LD in different parts of the genome is highly variable, averaging 5–20 kb but extending up to hundreds of kilobases [3–5]. Variability in average LD has also been observed between ethnic groups, owing to differences in their demographic history, for example, population bottlenecks, admixture, genetic drift and natural selection. The key observation is that present day chromosomes consist mainly of short segments that have undergone very limited or no historic recombination, and that for each of these segments a few common haplotypes represent the majority in a population [3,6]. For each region of high LD and low haplotypic diversity (also termed a haplotype block), only a few variants are then needed to tag the common haplotypes within it. These are referred to as haplotype tag (ht) SNPs [7]. Based on this rationale, an international effort, the HapMap project, was launched in October 2002, with the aim of constructing a genome-wide map of LD and common haplotypes in four populations from Africa, East Asia and Europe.

THE HAPMAP PROJECT
The study design of the HapMap project includes four population samples, namely 30 trios from CEPH/Utah families (North European descent), 30 Yoruban (Nigeria) trios, 48 unrelated Japanese and 48 unrelated Han Chinese. The International Consortium (members listed at http://www.hapmap.org) adopted a hierarchical mapping strategy. In phase 1 of the project, a map of evenly spaced SNPs (1 per 5 kb) with minor allele frequency greater than or equal to 0.05 is being generated. Analysis of local LD patterns in each population will identify ‘haplotype blocks’, and also identify the intervening regions that require data from additional SNPs to try to detect LD. Such regions will be the main focus of phase 2, which may include multiple rounds of SNP selection and genotyping. Some additional SNPs will also be typed within the emerging ‘haplotype blocks’ to corroborate the findings. Work is in progress to develop an optimal statistical approach that describes accurately the highly variable nature of LD, and to provide precise parameters to assess completion of the project in each region of the genome. It is estimated that over 1.6 million SNPs will be tested in the course of the project. Data will be released regularly into the public domain, initially via the consortium’s Data Coordination Center (DCC) (http://www.hapmap.org), and from there to dbSNP and other public databases.

What are the HapMap deliverables, and how may they contribute to the advancement of human genetics in general and pharmacogenomics in particular?

At the outset of the HapMap project, it was recognised that the collection of 2.4 million publicly available SNPs (October 2002) was distributed unevenly throughout the human genome, resulting in sparse SNP coverage in many regions. The first deliverable of the HapMap project is therefore the discovery of new SNPs. New shotgun sequence data have been generated across the whole genome, using libraries made from DNA samples of a range of individuals. The sequence data are being aligned to the finished human sequence, and candidate SNPs are being detected using the program SSAHA-SNP. The new data, coupled with improvements to the SNP-calling algorithms, have already made a significant contribution to the current total of 4.5 million SNPs; it is anticipated that this figure will exceed 5.5 million by the end of 2003. Furthermore, the accumulation of sequence data through these efforts made it possible to devise a filter for selecting SNPs that are likely to be common (minor allele frequency >0.05). Over 1.3 million of the currently available SNPs have each allele ascertained by two individual sequence reads. Empirical data confirm that these ‘double-hit’ SNPs convert to working assays at a much higher rate than random SNPs.

The second deliverable of the HapMap project will be a genome-wide set of SNP assays validated in four populations. SNP assay development and validation remain a costly and labour-intensive exercise; currently, such work needs to be carried out upfront for almost every genetic study undertaken. Like other large-scale genomic projects, the HapMap consortium has set up continuous quality control and assessment of the production pipeline. High-throughput genotyping is carried out on five different platforms: MassExtend (Sequenom), Invader (Third Wave), AcycloPrime-FP (Perkin-Elmer), Golden Gate-BeadArray (Illumina) and ParAllele. Cross-checking has enabled independent assessment of the performance of the different platforms. Raw genotype data conform to a consistent high standard (>99.5% accuracy), based on reproducibility of results in duplicate samples, concordance using independent assays for the same SNP and checks for consistency of allelic
inheritance patterns in pedigrees. The HapMap effort to assess genotyping platforms will be of direct relevance to the use of large-scale genotyping for genetic studies in the future.

The third deliverable of the project will be an LD map of the human genome and the determination of the underlying common haplotypes in regions of strong LD. Characterisation of local patterns of LD at high resolution across the genome is a challenging task, owing to both the highly variable nature of LD and the lack of knowledge of the true demographic history of population samples. Current statistical methods, such as various haplotype block definitions, plots of average LD vs physical distance, LDU maps and statistical estimates of recombination rates, all generate different views of LD, which often do not fully overlap. Appropriate training sets for evaluating these methods are being developed, for example, deep re-sequencing of genomic regions and [...]

FUTURE BENEFITS/APPLICATIONS
In the course of the next 2 years, the HapMap project will result in a validated set of SNP-based markers that describe local patterns of LD and capture common haplotypes across most of the genome in four populations. Clearly, the applicability of the HapMap in studying variation in populations other than those included in the study will need to be thoroughly tested. As a tool, the HapMap is designed to enable the study of common variants for association with disease and drug response, but it should not be seen in isolation from other resources, for example, a catalogue of all functional variants. Overlaid on the annotated genome sequence, a haplotype map that integrates functional variants will become a very powerful tool for pharmacogenetic and pharmacogenomic studies. Both candidate gene and whole-genome approaches will make use of the HapMap resource to screen common haplotypes for association [...]

[...] evaluation of new drugs during development and in clinical trials. The importance of concerted international efforts to produce public resources is underlined by the HapMap project, which follows the principles adopted for the human genome sequence itself.

The authors declare that they have no competing financial interests; both are members of the International HapMap consortium.

Correspondence should be sent to:
P Deloukas, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
Tel: +44 0 1223 834 244
Fax: +44 01223 494919
E-mail: panos@sanger.ac.uk
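
The indirect association approach discussed in this article depends on LD between a genotyped SNP and the causative variant. The article does not define a particular LD statistic, so the following is only a minimal sketch, using the standard textbook definitions of D, D' and r^2 for two biallelic SNPs. The function name `ld_stats` and the input format (counts of the four phased two-SNP haplotypes) are assumptions made for illustration, not part of the HapMap pipeline.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    n_AB, n_Ab, n_aB and n_ab are the numbers of chromosomes carrying each of
    the four possible two-SNP haplotypes (hypothetical input format).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")

    # Haplotype and allele frequencies estimated from the counts.
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n

    # Product of the two allele-frequency variances; zero if a SNP is monomorphic.
    var = p_A * (1 - p_A) * p_B * (1 - p_B)
    if var == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    # D: deviation of the observed haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    return d, abs(d) / d_max, d * d / var


if __name__ == "__main__":
    # Illustrative counts only, not HapMap data.
    for counts in [(50, 0, 0, 50), (25, 25, 25, 25), (60, 0, 20, 20)]:
        d, d_prime, r2 = ld_stats(*counts)
        print(counts, f"D={d:.3f} D'={d_prime:.3f} r2={r2:.3f}")
```

In the third set of illustrative counts only three of the four haplotypes are present, so D' is 1 while r^2 is about 0.375. This is one reason tag SNP selection usually considers how well a marker predicts another variant, not only whether historical recombination is evident between them.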
