Life with 6000 Genes
A, Gofieau; B. G, Barrell; H. Bussey; R. W. Davis
J.D. Hoheisel; C. Jacq; M. Johnston; E. J. Louis; H. W. Mewes; Y. Murakami; P.
Philippsen; H. Tettelin; S. G. Oliver
Science, New Series, Vol. 274, No. 5287, Genome Issue (Oct. 25, 1996), 546+563-567.
Stable URL
hp: flinks,jstor-org/siisici=0036-8075% 2819961025%293%3A274%3A5287%3CS46%SALW6G%3
CO%3B:
Science is currently published by American Association for the Advancement of Science.
‘Your use of the ISTOR archive indicates your acceptance of JSTOR’s Terms and Conditions of Use, available at
hup:/www stor orglabout/terms.html. ISTOR’s Terms and Conditions of Use provides, in par, that unless you
have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and
you may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www jstor.org/joumnals/aaas.himl.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the sercen or
printed page of such transmission.
STOR is an independent not-for-profit organization dedicated to creating and preserving a digital archive of
scholarly journals. For more information regarding JSTOR, please contact support @ jstor.org,
hupulwww jstor.org/
Wed Aug 17 18:32:45 2005by Fi across ho whoa goremo. A subst othe
‘eu wore sotced wich threo Me
‘eno a the VAC hang boan cher, sna ho
poston was resoked toa cyogenat rere fs
pose to acon nih ooh) & Canon
‘ar wae von foreach of tae YA, whch
“lowed tes postion In cantmorgira tobe
etomined Bacau ths td comand es data
{or hvomesomos 10,21, and 28, an atomatie
‘Seaton was vec Cones at ro norracomtinant
‘stretoct to Ganstnon meron tia MCD
fr aly Aatre 380 (upc), (1990) and havo
Siracun extoganot locates [Orine Nao.
Wn Ienerarco in an {OM pene
nezunmin gor wee teed oes hs
28, KLE Merton Proc Net Acad, So US.A 88,7474
(000.
Dette netneion and examples or provid cn
ten eto
90. GID Shuler, JA Epa H. Cra, J Kans,
ters Enya 206, 4 (15)
‘1. Fe ado sso ded ambiguous resus
Bresutetopresenas an eararng masse band
poor epee,
2. Teco be thereoutf otra vil per tb
Leing nor ora saqyoos cuserrg over wich
twa separate genes wee orenety are 10
freee unsere ety
Typing arora na tameviors marker woul alow a
(eco EST to map wth sonar ld sores hoo
fan of to os ro for nk} ea como ro.
Irozema ean it nu Te to locale the
Trang iar, ned torneo.
brome camod by tw enero png tthe
amon mare,
4M rag and WA, Bekmere, oases 16, S49
irons.
8, V, Romano eal, Cytaganet. Cat Gane. 48, 148,
(hose
28, § Batra, M. esha, 9. Garaahy, TS,
Pre: Nat Acad Sat USA 81,6259 0038).
8
7. G0, Blingohy ta, Ar. Har. Gane. 62,288
{ioasi-8'8 Schnee: Natt Acad Sa
Usa ea, 3147 (098)
58. L Strengon of st, Muze 378,754 (19955 0.
Loitanand Gren 977, 2 (1099, 8
Hane et ats Somce 271, 300 (1900,
99, 5 Tugndtic et at, Hum. Mal Gano. 8, 1500
(iodsy,0.€ Basso, MS Bogut, P Het,
etre 379, 59 (1006; Phir, O Basco se
ao, Are Gane 13, 253 (106)
40. Tho socing sees wer ain acid eubetuon
‘ratoes base on the Abarth
{en modal of exlulonay dstanco, Pam mates
‘maybe generat fran rane t Pads oe
{relation of sbsanedmitnionfequencos ft 0
Dayhctot l,m aioe of Porn Sequence and
Shictre M0, Dayo Ea Naor Becca!
soar Foundation, Washing, OC, 1979) 9
Sravpc 3.pp 386-352.6.F Absa MOL Eo!
36,390 1083, Pad mation we iors or
Storng macs bacon Soquoncs cach oe
1S epae pete pool a emafing rcs
(Otter organ in Tab he PARDO mate
‘naa wed bacauoeitha een soon abe good ot
(rerapurpose saecting (SF asc, Mol
Bor'219, S06 1001), The BLAST crear
Gen and. J States, Nature Gene 3,268 (1935),
taics a rocoto soqwonco aus (EST or goroh
tale to as costa! OF, ad hon
‘Compare ese with pees saoe he da
Sua, TELASTN perms 9 smi rcton bat Fe
‘Sous couchos a eran query semeece agarst
‘Stare Waneatens of ach ey ha nasuctoo
‘Stouron ctabce Senos wey postr wih
ES tesserace™ tes astro rman aed secon
‘aryexpoctten paar
41, MA, Perea Vance Arm, Ham, Gane 4,
oot W001; We. Sweater eta, Pec At
esa, Sei USA 90,1077 (1095) Fil oa.
{Cat78, 1027 (0u8) N.Papaccpouls ot at. Se!
‘on 69,1825 1004; Sharer al C78 335,
Life with 6000 Genes
A. Goffeau,” B. G. Barrell, H. Bussey, R. W. Davis, B. Dujon,
H. Feldmann, F. Galibert, J. D. Hoheisel, C. Jacq, M. Johnston,
E. J. Louis, H. W. Mewes, Y. Murakami, P. Philippsen,
H. Tettelin, S. G. Oliver
‘The genome of the yeast Saccharomyces cerevisiae has been completely sequenced
through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885
potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA,
40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the
complete sequence provides information about the higher order organization of yeasts
16 chromosomes and allows some insight into their evolutionary history. The genome
shows a considerable amount of apparent genetic redundancy, and one of the major
problems to be tackled during the next stage ofthe yeast genome project isto elucidate
the biological functions of all of these genes.
Tle genome of the yeast Saccharomyces
cerevisiae has been completely sequenced
through an intemational effore involving
some 600 scientist in Europe, North Amer
tea, and Japan. [tithe largest genome to be
completely sequenced s0 for (a record that
‘we hope will soon be bettered) and is the
ete genome sequence of a et
karyote, A number of public data libraries
compiling the mapping. information and
546
nucleotide and protein sequence data from
teach of the 16 yeast chromosomes (1-16)
have been established (Table 1).
“The position of S. cereisiae as a model
eukaryote owes much to its intrinsic advan-
tages as an experimental system, [tis a
unicellular organism that (unlike many
‘more complex eukaryotes) can be grown on
defined media, which gives the experiment
cer complete control aver its chemical and
SCIENCE + VOL.274 + 25 OCTOBER 1996
95S Manet ate 85,4881
Py t ayt 967. 70 (1086)
42, Tepotowta let fa camprehenave ge ap it
at sates by to fat tat 82% cf pos that
Fave ‘been posiony coned. 10 ‘dat ae
‘opresoridby eno ormera ESTsin GenBank ith!
Dura. nesimh govsBESTISCEST gros
49, KH Fama A, ui, OT Koga, A
(Sa As ce 22 Ste (1958)
{4M Bora fa, Ganone Fe 6,791 (1998.
45, Coreonsus eoguaneos fom at ae he oven
fing ESTs wore gored ten 294 Union ca.
toe oF ho 29975) eased er pars, 138
(52%) of tooo ytd axccost FOR asa
46, We terk M. . Aodereon A J. Cobre, DF
Coury, Dono, D. ey, Tern J
Keayourfan, Tan, W. Ye. ard |S. Zorstova
trom the Wiha ratte fo tia ast.
thom We tn. Mle, Eyes, Lan,
fa A. Scharf eoaetalconrcuten foward
tho development of UniGene. Supt by NH
fnarce HOODOWD fo ESL. NOOO to RIMM
Fes to INS anc HGDDTST to TG and
by tho Wha for Soma sarch
fed the Weleore Tt 34 sa report ota
{initan Sort ord om to Made esearch
Couret ef Coreda, GP Sanat estan.
ter of the Howard Hughes Meal tte, The
‘Stefan Genre Cae aeths Vina
Fretito-MIT Goreme Carr ar thanks fore
‘lporucioaldce muchas with rds donated by
Sandor Pharmocoieat. Goran eapptedby
tha Resconton encase corsineMyopatioe nd
tre Growpanent Suces suf Goroma. The
Sanger Cont, Ganehon and Ooo gratohd
Terauppat tom te Ewopaan Union VRHEST Eo
{farmer Human Genome Grganzaten andthe
Wotcors Trust sporereda srs ofranings rom
(tober 12040 November 095 tho yee
‘slabraton vous t hae boo poste
physical environment. S. cerevisiae has. life
‘cycle that is ideally suited to classical ge
netic analysis, and this has permitted con-
struction of a detailed genetic map that
dlefines the haploid set of 16 chromosomes.
Moreover, very efficient techniques have
been developed that permit any of the 6000
genes to be replaced with a mutane allele, or
completely deleted from the genome, with
absolute accuracy (17-19). The combina
tion ofa large number of chromosomes and
a small genome size meant that it was pos-
sible to divide sequencing responsibilities
‘conveniently among the different interna
tional groups involved in the project.
Old Questions and New Answers
The genome. At the heginning of the se-
quencing project, perhaps 1000 genes en-
coding either RNA or protein products had
been defined by genetic analysis (20). The
complete genome sequence defines some
5885 open reading frames (ORFs) that are
likely to specify protein products in the
yeast cell. This means that a protein-encod-
Ing gene is found for every 2 kb of the yeast
genome, with almost 70% of the total se-
‘quence consisting of ORFs (21). The yeast
enome is much more compact than those
ofits more complex relatives in the eukary=‘tic world. By contrast, the genoine of the
nematode worm contains a potential pro-
tein-encosing gene every 6 kb (22), and in
the human genome, some 30 kb or more of
sequence must be ‘examined in order to
uncover such a gene. Analysis of the yeast
genome reveals the existence of 6275 ORFs
that theoretically could encode proteins
longer than 99 amino acids. However, 390
ORFs are unlikely to be translated into
proteins. Thus, only 5885 protein-encoding.
genes are believed to exist. In addition, the
yeast genome contains some 140 ribosomal
RNA genes in a large tandem array on
chromosome XII and 40_genes encoding
small nuclear RNAs (spRNAs) scattered
throughout the 16 chromosomes; 275 trans-
fer RNA (tRNA) genes (belonging to 43
families) are aso widely distributed. Table
2, which provides details ofthe distribution
of genes and other sequence elements
among yeast’s 16 chromosomes, shows that
the genome has been completely se-
‘quenced, with the exception of a set of
identical genes repeated in tandem.
‘The compact nature of the S. cerevisiae
senome is remarkable even when compared
with the genomes of other yeasts and fungi
Current data from the systematic sequence
analysis of the genome of the fision yeast
K Gafay apa Tataln, Uaversis Cache do Ls
“an, Unto do Bost Pryaelogauo, Pace rch a
‘a, 22D, 138 Laas Now, Balu
Baral Songer Con, Hrscon a Hon Car
trig COTO 188 UK
tse, Omarion ology, McGlunsosy, 1205,
Doctir Pena ver, Meet HT, Can,
FLW. Dass Deport of Boch, Stanford Un
‘ery, Bockman Carr, Room BON Stato, CA
‘taps sa07, USA.
Dun, Unt de Géngteue Makcuare des Lowes
[URAT 42 CNRS UPra2? Urverate Pot Mate
‘nett actu, 28 Pus Dostour Fu, 70724
Pare Coaoe 16, France
Feldman, Urtorett Minehen, nati i Pro
ge ane, Sear 2 S08 Mae,
F Galan, Lapoatore do Sccimie Molcouste, UPR
‘1i-Facunt a Midece, CNRS, 2 Arun du Boe
fear Léen Borar, 22043 Rares Coden, rao.
4°. fcneea, Wieser Genet Genome Sra
Dontachos Kebserecrangszonrun, in Neusrhoret
Fel, £06, 99120 Nedaberg. Goan.
Cdaeg, Gide Maceo, Ecole Normaio Sy
tres CMS UPA, 46 Ps Ub 75200 Pars
asx, Franco.
1 Jofesion, Deparment Goats, Box 252, West
lon Uneraty Medal Stoo, se Sct Avera, St
tue MO er 10, USA.
ES Lu, Yost Gonsts, latte of Moco Mak
re, Jn Roce Hospi, Oxford OX ODU, Uk
HOW Mees, Mae Panett Soon Nari:
Foxit or Proton Scorers MIPS, Am Kept
1 2189 Morro, Goma
Huan Genome seach FEN, Kojo Tous
Seenes Oy 3-1 05 era, Japan
P.Priposn, het or Apple Merbiobgy, ies
fat Saee ngobemerse 0 ale Boa, Sntzerana
5.6 Gh Deparmant of Bocsemsty an Aptos
okeular Bioby, Unity of Manchest te o
SScence art Technology (MIT), Pet Ofce Box 8,
Machete 0 160.U
“To wram areqpanderce spo be SaaS.
Schizosaccharomyces pombe indicate that the
density of protein-encoding genes is ap-
proximately one per 23 kb (23). The dit
ference between these two yeast genomes
can be ascribed co the pauety of introns in
S. cerevisiae. In the fission yeast, approxi-
rately 40% of genes contain introns (21),
whereas only 4% of protein-encoding genes
in S. cerevisiae ae similarly interrupted (Ta.
ble 2). The Saccharomyces genes that do
contain introns [notably, those encoding
ribosomal proteins (24)] usually have only
fone small intron close to the start of the
coding sequence (often interrupting the i
tiator codon) (25). It has even been sug-
gested (26) that many yeast genes represent
DNA copies thae have been generated by
the action of reverse transcripases specified
by retrotransposons (Ty elements)
The chromosomes. A complete genome
sequence provides more information than
the sum of all che genes (or ORFS) that it
contains. In particular, it permits an is
vestigation of the higher order onganiza-
tion of the S. cerevisiae genome. An ex-
ample is the longerange variation in base
Composition. Many yeast chromosomes
consist of altemating large domains of
GCerich and GC-poor DNA (2!, 27),
generally correlating with che variation in
gene density along these chromosomes.
the case of chromosome Ill, it has been
demonstrated that the periowiciey in base
‘composition is paralleled by a variation in
recombination frequeney along the eh
mosome arms, with the GC-rich peaks
coinciding with regions of high recom
ration in the midklle of each arm of the
chromosome and AT-rich troughs eoin-
ciding with the recombination-poor cen-
tromerie and telomeric sequences, Sim
chen and co-workers (28) have demo
strated that the relative incidence of dou i
ble-srand breaks, which are tho
initiate genetic recombination in yeast
(29), coreelates directly with the GC-rich
regions of this chromosome,
‘The four smallest chromosomes (I, Ill
Vi, and IX) exhibit average recombination
frequencies some 1.3 to L.8 times greater
than the average for the genome asa whole
Kahack (30) has suggested that high levels
of recombination have been selected for on
these very small chromosomes to ensure at
least one erossover pet meiosis and so per
mit them to segregate corectly. Its known
that areicial chromosomes (or chromo-
some fragments) of approximately 150 kb in
size are mitotically unstable (31, 32), which
raises the related questions of whether there
{sa minimal size for yeast chromosomes and
how the. smallest chromosomes have
achieved their current sie (see Table 2),
‘The organisation of chromosome I is very
uunusuak The 31 kb at each ofits ends are
very gene-poor, and Bussey etal. (5) have
suggested that these terminal domains may
act as “fillers” to increase the size, and
hhence the stability, of this smallest yeast
chromosome
‘Genetic redundancy is the rule at the
ends of yeast chromosomes, For instance,
the two terminal domains of chromosome
I show considerable nucleotide sequence
homology both to one another and to the
terminal domains of other chromosomes (V
and XI), The right terminal region of ehro-
smosome | is duplicated at the left end of
chromosome | (5) and at the right end of
chromosome VIII (3). The sugar fermenta-
tion genes MAL, SUC, and MEL all have @
ruber of telomere-atsoeiated copies, not
all of which are expressed (33-35). An
nteresting feature ofthe distebution of the
Table 4. Finding yeast genome formation on the Iniemet. A fuller descrpton of these and other
resouress may be found in (35,
Intemet acresees
FFP ites for he complete S. cares gonome sequence
{tpsmips.emtinst.org (cractoryyeast)
‘pb ac uk (drectory/pubidatzbases/yeast)
genome. stanford eau (deecteryyeastgename-seq)
‘Other S.cereusine ta leas
MIPS, Marnsied, Germany Iii /we mipsilocnem mo deyeast)
‘Sanger Center, Finxon, UK ftp wien sanger.ac ul/yeast ome im)
‘Saccharomyoes Genome Database (S60), Stanford Unversity, USA tpl/genome-
"wht stanford 2au/Saccharomyees)
‘SWISS. PROT, Uniersty of Geneva, Switzeand (ht: expasy heuge chisprt/sp-doou hin)
‘Yaast Protsin Database (YPD}, Proteome ne, avery, MA, USA
(htipw: proteome corYEDhome ht
‘Genedz, Europeen Molecular Biology Laboratory, Heidelberg, Germany
{hip:/srawrembt-heidetberg.de/~genequ)
IEF, National Cener for Bid.ogeal formation, Baltenore, MD, USA
{tstpv/wavened im nin. govXREFA)
{Edtor's note: For readers who would Ike a user-fendly uid to the yeast databases, please see
the specal feature atthe Sciorea Web st tp iw sclencemag org/scencevfeature/data