© 2004 Nature Publishing Group All rights reserved 1470-269X/04 $25.00
www.nature.com/tpj
PERSPECTIVE
human genome. A comprehensive set of annotated human genes is available, and comparative analysis of the human genome sequence to those of other species is highlighting additional evolutionarily conserved regions, which are likely to be enriched in functional elements such as transcriptional regulators. As a result, we can begin to systematically explore the sequence variation content of the functional part of the genome, which recent studies estimate not to exceed 5%, in the human population.

An association study can be undertaken using either a direct or an indirect approach. The former relies on the availability of a complete list of functional variants, which are then tested for association to the trait of interest in a number of phenotypically matched cases and controls. The latter involves testing a series of genomic variants across a region (or all) of the genome for association, and relies on the assumption that the causative variant will be in linkage disequilibrium (LD) with one of the variants tested. Thus, characterisation and understanding of the dynamics of LD across the genome is necessary for enabling whole-genome association studies.

Recent studies conducted at various levels of resolution have all shown that the extent of LD in different parts of the genome is highly variable, averaging 5–20 kb but extending up to hundreds of kilobases.3–5 Variability in average LD has also been observed between ethnic groups, owing to differences in their demographic history, for example, population bottlenecks, admixture, genetic drift and natural selection. The key observation is that present-day chromosomes consist mainly of short segments that have undergone very limited or no historic recombination, and that for each of these segments a few common haplotypes represent the majority in a population.3,6 For each region of high LD and low haplotypic diversity (also termed a haplotype block), only a few variants are then needed to tag the common haplotypes within it. These are referred to as haplotype tag (ht) SNPs.7 Based on this rationale, an international effort, the HapMap project, was launched in October 2002, with the aim of constructing a genome-wide map of LD and common haplotypes in four populations from Africa, East-Asia and Europe.

THE HAPMAP PROJECT
The study design of the HapMap project includes four population samples, namely 30 trios from CEPH/Utah families (North European descent), 30 Yoruban (Nigeria) trios, 48 unrelated Japanese and 48 unrelated Han Chinese. The International Consortium (members listed at http://www.hapmap.org) adopted a hierarchical mapping strategy. In phase 1 of the project, a map of evenly spaced SNPs (1 per 5 kb) with minor allele frequency greater than or equal to 0.05 is being generated. Analysis of local LD patterns in each population will identify ‘haplotype blocks’, and also identify the intervening regions that require data from additional SNPs to try to detect LD. Such regions will be the main focus of phase 2, which may include multiple rounds of SNP selection and genotyping. Some additional SNPs will also be typed within the emerging ‘haplotype blocks’ to corroborate the findings. Work is in progress to develop an optimal statistical approach that describes accurately the highly variable nature of LD, and to provide precise parameters to assess completion of the project in each region of the genome. It is estimated that over 1.6 million SNPs will be tested in the course of the project. Data will be released regularly into the public domain, initially via the consortium’s Data Coordination Center (DCC) (http://www.hapmap.org), and from there to dbSNP and other public databases.

What are the HapMap deliverables, and how may they contribute to the advancement of human genetics in general and pharmacogenomics in particular?

At the outset of the HapMap project, it was recognised that the collection of 2.4 million publicly available SNPs (October 2002) was distributed unevenly throughout the human genome, resulting in sparse SNP coverage in many regions. The first deliverable of the HapMap project is therefore the discovery of new SNPs. New shotgun sequence data have been generated across the whole genome, using libraries made from DNA samples of a range of individuals. The sequence data are being aligned to the finished human sequence, and candidate SNPs are being detected using the program SSAHA-SNP. The new data, coupled with improvements to the SNP-calling algorithms, have already made a significant contribution to the current total of 4.5 million SNPs; it is anticipated that this figure will exceed 5.5 million by the end of 2003. Furthermore, the accumulation of sequence data through these efforts made it possible to devise a filter for selecting SNPs that are likely to be common (minor allele frequency >0.05). Over 1.3 million of the currently available SNPs have each allele ascertained by two individual sequence reads. Empirical data confirm that these ‘double-hit’ SNPs convert to working assays at a much higher rate than random SNPs.

The second deliverable of the HapMap project will be a genome-wide set of SNP assays validated in four populations. SNP assay development and validation remain a costly and labour-intensive exercise; currently, such work needs to be carried out upfront for almost every genetic study undertaken. Like other large-scale genomic projects, the HapMap consortium has set up continuous quality control and assessment of the production pipeline. High throughput genotyping is carried out on five different platforms: MassExtend (Sequenom), Invader (Third Wave), AcycloPrime-FP (Perkin-Elmer), Golden Gate-BeadArray (Illumina) and Parallele. Cross-checking has enabled independent assessment of the performance of the different platforms. Raw genotype data conform to a consistent high standard (>99.5% accuracy), based on reproducibility of results in duplicate samples, concordance using independent assays for the same SNP and checks for consistency of allelic
HapMap project and its application
P Deloukas and D Bentley
inheritance patterns in pedigrees. The HapMap effort to assess genotyping platforms will be of direct relevance to the use of large-scale genotyping for genetic studies in the future.

The third deliverable of the project will be an LD map of the human genome and the determination of the underlying common haplotypes in regions of strong LD. Characterisation of local patterns of LD at high resolution across the genome is a challenging task, owing to both the highly variable nature of LD and the lack of knowledge of the true demographic history of population samples. Current statistical methods, such as various haplotype block definitions, plots of average LD vs physical distance, LDU maps and statistical estimates of recombination rates, all generate different views of LD, which often do not fully overlap. Appropriate training sets for evaluating these methods are being developed, for example, deep re-sequencing of genomic regions and genotyping of all identified variants. However, new tools are likely to be needed and their development, although a priority for the consortium, will be open to the entire field with the prompt release of all raw genotype data. Tools to determine the underlying common haplotypes in each defined region of strong LD are already available. The final product, the haplotype map, will be a valuable tool for choosing optimal marker sets in any study undertaking LD mapping of common complex traits. A fourth deliverable of the HapMap project will thus be a minimal set of reference markers that tag each of the common haplotypes (hence ‘haplotype tag SNPs’, or htSNPs).

FUTURE BENEFITS/APPLICATIONS
In the course of the next 2 years, the HapMap project will result in a validated set of SNP-based markers that describe local patterns of LD and capture common haplotypes across most of the genome in four populations. Clearly, the applicability of the HapMap in studying variation in populations other than those included in the study will need to be thoroughly tested. As a tool, the HapMap is designed to enable the study of common variants for association to disease and drug response, but it should not be seen in isolation of other resources, for example, a catalogue of all functional variants. Overlaid on the annotated genome sequence, a haplotype map that integrates functional variants will become a very powerful tool for pharmacogenetic and pharmacogenomic studies. Both candidate gene and whole-genome approaches will make use of the HapMap resource to screen common haplotypes for association to a complex genetic trait. The candidate gene approach relies on our knowledge of biochemical pathways and gene interactions, and can be complemented by targeted sequencing to discover rarer variants. As the picture for the major candidates becomes unravelled, the whole-genome approaches will come into their own to find new targets of clinical importance that do not rely on prior hypotheses of gene or protein function. As a result of the concerted use of both approaches, the identification of human sequence variants which alter gene expression and protein function will become a driving force for the development of suitable genotype-based screening methods to support evaluation of new drugs during development and in clinical trials. The importance of concerted international efforts to produce public resources is underlined by the HapMap project, which follows the principles adopted for the human genome sequence itself.

DUALITY OF INTEREST
The authors declare that they have no competing financial interests; both are members of the International HapMap consortium.

Correspondence should be sent to:
P Deloukas, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
Tel: +44 0 1223 834 244
Fax: +44 01223 494919
E-mail: panos@sanger.ac.uk

REFERENCES
1 Lazarou J et al. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA 1998; 279: 1200–1205.
2 Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000; 405: 847–856.
3 Patil N et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 2001; 294: 1719–1723.
4 Gabriel SB et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
5 Dawson E et al. A first generation linkage disequilibrium map of human chromosome 22. Nature 2002; 418: 544–548.
6 Daly MJ et al. High-resolution haplotype structure in the human genome. Nat Genet 2001; 29: 229–232.
7 Johnson GC et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001; 29: 233–237.
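
The tagging idea described in the text, that only a few variants are needed to tag the common haplotypes within a block, can be illustrated with a minimal Python sketch. This is not the consortium's method: the haplotype strings are made up, `select_tag_snps` is a hypothetical helper, and the exhaustive subset search is an assumption chosen for clarity rather than a published htSNP selection algorithm.

```python
from itertools import combinations


def select_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that tells all haplotypes apart.

    `haplotypes` is a list of equal-length strings, one allele character per
    SNP. Subsets are tried in order of increasing size, so the first subset
    that separates every distinct haplotype is a minimal tag set. This is
    only practical for the handful of SNPs in a toy block.
    """
    n_snps = len(haplotypes[0])
    distinct = set(haplotypes)
    for size in range(n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # Project each haplotype onto the candidate positions only.
            signatures = {tuple(h[p] for p in positions) for h in distinct}
            if len(signatures) == len(distinct):
                return positions


# Hypothetical block: four common haplotypes typed at six SNPs (invented data).
common_haplotypes = ["ACGTAC", "ACGTGT", "GTATAC", "GTGTGT"]
tags = select_tag_snps(common_haplotypes)
print(tags)  # (0, 4)
```

In this invented block, two of the six SNPs (positions 0 and 4, counting from zero) are enough to distinguish all four haplotypes, so the remaining four would not need to be genotyped. Real tag selection would also have to weigh haplotype frequencies, genotyping error and unphased data, none of which this sketch attempts.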