
Part 1

The Structure of
Genes and Genomes

[ Chapter 2 in Griffiths et al. ]


What’s on the Menu
What is the structure of DNA?
What is the organization of a gene?
What are chromosomes? What is chromatin?
What’s in a genome?
How are genes organized in the genome?
How are genomes different among living organisms?
1. The Structure of DNA
Nature of DNA
• Transformation (uptake of foreign DNA) in
prokaryotes and eukaryotes has repeatedly
shown that DNA is the hereditary material.
• DNA is accurately replicated prior to each
cell division.
• DNA encodes proteins needed by the cell
and the organism.
• DNA is capable of mutation, providing raw
material for evolutionary change.
The Griffith and Avery experiments (1928-1944)
[Figure: the Griffith and Avery transformation experiments]

Negative control experiment: degrading the DNA with DNase abolishes the transmission of virulence.

DNA is the hereditary material.


The nucleotide
• Building block of DNA (and RNA)
• Deoxyribose (pentose sugar), with 3’ –OH
• Phosphate (on 5’ carbon)
• Nitrogenous base
– purine
• Adenine: A
• Guanine: G
– pyrimidine
• Thymine: T
• Cytosine: C
[Diagram: a nucleotide, showing the phosphate (P) on the 5’ carbon of the sugar, the 3’ –OH, and the nitrogenous base]
The structure of DNA:
The Double Helix
WATSON, J.D. & CRICK, F.H.C. A Structure for
Deoxyribose Nucleic Acid. Nature 171, 737-738 (1953)

“We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest.”

James Watson, Francis Crick and Maurice Wilkins, Nobel Prize 1962
The double helix
• DNA normally consists of two
antiparallel polynucleotide chains
– sugar–phosphate backbone
• phosphodiester bonds
• 5’ to 3’ connection
– complementary base pairs
• A–T
• G–C
• hydrogen bonds
– 2 per A – T
– 3 per G – C

• 5’ → 3’ chain polarity
• Major and minor grooves (see
model)

5’-AATTGGCCGATC-3’
3’-TTAACCGGCTAG-5’
Figure 1.9 Genomes 3 (© Garland Science 2007)
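The pairing rules above (A–T, G–C, antiparallel strands) can be sketched in a few lines of Python. This is only an illustration using the 12-base example from the slide; the function names are made up for this sketch.

```python
# Watson-Crick pairing rules from the slide: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per base pair: 2 for A-T, 3 for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Paired strand, base by base (reads 3'->5' if the input reads 5'->3')."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Paired strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding the duplex together."""
    return sum(H_BONDS[base] for base in strand)


top = "AATTGGCCGATC"            # 5'-AATTGGCCGATC-3' from the slide
print(complement(top))          # TTAACCGGCTAG (read 3'->5', as on the slide)
print(reverse_complement(top))  # GATCGGCCAATT (same strand, read 5'->3')
print(hydrogen_bonds(top))      # 30 (6 A-T pairs x 2 + 6 G-C pairs x 3)
```

Because the strands are antiparallel, the complement is reversed when both strands are written 5’ → 3’.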
Rosalind Franklin
“The Dark Lady of DNA”
1920-1958

X-ray diffraction photograph of DNA


Franklin R & Gosling RG. Evidence for a 2-chain Helix in the Crystalline Structure of Sodium Deoxyribonucleate. Nature 172: 156 (1953)

"The instant I saw the picture my mouth fell open and my pulse began
to race.... the black cross of reflections which dominated the picture
could arise only from a helical structure... mere inspection of the X-ray
picture gave several of the vital helical parameters.” -JD Watson
Franklin’s and Wilkins’s X-ray diffraction studies revealed that DNA is helical, with two distinctive regularities of 0.34 nm and 3.4 nm along the axis of the molecule. They also showed that DNA has a uniform thickness of 2 nm.
[Figure: portraits of Rosalind Franklin and Maurice Wilkins; helix model annotated with a 2 nm width and 3.4 nm per 10 bp]
The DNA double helix is 2 nm wide.
A stack of 10 base pairs (= one turn of the helix) has a linear length of 3.4 nm.
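The two regularities fit together as simple arithmetic: 0.34 nm per base pair times 10 base pairs per turn gives 3.4 nm per turn. Below is a minimal sketch using only the slide values; the helper names are hypothetical.

```python
RISE_PER_BP_NM = 0.34   # distance between stacked base pairs (slide value)
BP_PER_TURN = 10        # base pairs per helical turn (slide value)


def helix_length_nm(n_bp: int) -> float:
    """Linear length of a double helix of n_bp base pairs, in nm."""
    return n_bp * RISE_PER_BP_NM


def helix_turns(n_bp: int) -> float:
    """Number of helical turns in n_bp base pairs."""
    return n_bp / BP_PER_TURN


print(f"{helix_length_nm(10):.2f} nm per turn")    # 3.40 nm per turn
print(f"{helix_length_nm(1000):.1f} nm for 1 kb")  # 340.0 nm for 1 kb
print(f"{helix_turns(1000):.0f} turns in 1 kb")    # 100 turns in 1 kb
```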
Erwin Chargaff’s rules
(early 1950s)

1. The composition of DNA may vary from one species to another in the relative amounts of A, T, C, and G.
2. But for any DNA:
%A = %T
%C = %G
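Chargaff’s second rule follows directly from base pairing: counting bases over both strands of a duplex always gives %A = %T and %C = %G, even when a single strand is skewed. A small sketch, with a made-up 6-base strand as the example:

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def duplex_composition(top: str) -> dict:
    """Composition counted over both strands of the duplex built on `top`."""
    bottom = "".join(PAIR[b] for b in top.upper())
    return base_composition(top + bottom)


strand = "AAAAGC"                      # hypothetical, deliberately A-rich strand
single = base_composition(strand)
duplex = duplex_composition(strand)

print(single["A"] == single["T"])      # False: one strand alone need not obey the rule
print(duplex["A"] == duplex["T"])      # True
print(duplex["G"] == duplex["C"])      # True
```

Rule 1 is untouched by this: the ratio of A+T to G+C is free to vary between species.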
The double helix model of Watson & Crick: conclusions

The model fitted:
(1) The X-ray diffraction data produced by Franklin
(2) Chargaff’s rules

The structure also fulfilled the requirements for a hereditary molecule:
(1) The ability to store information (coding capacity)
(2) The ability to self-replicate (strand separation and specificity of base pairing)
(3) The ability to change over time, i.e. to mutate (base substitution)

DNA: summary
• Units of measurement
– base pair (bp)
– kilobase (kb)
– megabase (Mb)
• Replication: each strand serves as template
for synthesis of complement, using rules of
base pairing
• Information: specified by sequence of
nucleotides; may be copied into RNA
• Mutation: replacement, insertion, deletion
of nucleotide results in altered sequence
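The three kinds of mutation listed above can be pictured as simple string edits. This is only a toy sketch on the 12-base example sequence used earlier; the function names and positions are arbitrary choices for illustration.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Replacement: swap the nucleotide at `pos` for `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq: str, pos: int, base: str) -> str:
    """Insertion: add `base` in front of position `pos`."""
    return seq[:pos] + base + seq[pos:]


def delete(seq: str, pos: int) -> str:
    """Deletion: remove the nucleotide at `pos`."""
    return seq[:pos] + seq[pos + 1:]


original = "AATTGGCCGATC"
print(substitute(original, 0, "G"))  # GATTGGCCGATC (same length)
print(insert(original, 2, "C"))      # AACTTGGCCGATC (one base longer)
print(delete(original, 0))           # ATTGGCCGATC (one base shorter)
```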
2. The Structure of Genes
Structure of genes
• Gene = transcriptional unit
• Gene may encode coding RNA (mRNA) OR
non-coding RNA (tRNA, rRNA, miRNA...)
[Diagram: a gene drawn as a promoter followed by DNA encoding a functional RNA; the RNA primary transcript runs from the TSS (transcription start site) to the TTS (transcription termination site)]

• A gene is a functional element of the chromosome and is transcribed into RNA at the correct time and place in development or the cell cycle
• To some researchers, a gene also includes its adjacent regulatory region(s), such as the promoter (remember, the definition of a gene may vary)
Eukaryotic genes: introns and exons
• Intron: noncoding region of gene, excised from primary RNA transcript (= intron splicing)
• zero to many introns per eukaryotic gene
• variable length, may represent most of gene length
• Function of introns poorly understood (no general function known, but they often contain functional regulatory sequences)
• Exon: coding region of gene (sequence is included in mature transcript)

Primary transcript: E1 I1 E2 I2 E3 I3 E4
(nuclear processing steps, including splicing)
Mature transcript: E1 E2 E3 E4
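The E1 I1 E2 I2 E3 I3 E4 diagram can be mimicked with a list of labeled segments: splicing drops the introns and joins the exons in their original order. The RNA sequences below are invented placeholders, not real gene sequences.

```python
# Primary transcript as ordered (kind, name, sequence) segments.
# All sequences are hypothetical placeholders for illustration.
primary_transcript = [
    ("exon", "E1", "AUGGCU"),
    ("intron", "I1", "GUAAGUCCAG"),
    ("exon", "E2", "GAUCCA"),
    ("intron", "I2", "GUGAGUUUAG"),
    ("exon", "E3", "UUGGCA"),
    ("intron", "I3", "GUAUGUACAG"),
    ("exon", "E4", "CCGUAA"),
]


def splice(segments):
    """Remove introns and keep exons in their original order."""
    return [seg for seg in segments if seg[0] == "exon"]


mature = splice(primary_transcript)
print([name for _, name, _ in mature])         # ['E1', 'E2', 'E3', 'E4']
print("".join(seq for _, _, seq in mature))    # AUGGCUGAUCCAUUGGCACCGUAA
```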
Spliceosomal introns are only present in eukaryotic genes
(but they may be absent in some eukaryotic genes)
Because of the abundance and large size of introns,
some eukaryotic genes can attain huge sizes

An extreme example: human dystrophin gene, 2.5 Mb long (1.5% of entire chr. X), 78 introns!
3. The Structure of
Genomes
The nature of genomes
Genomics: study of structure and function of genomes

• Nuclear Genome (very variable size, especially in eukaryotes)
• Organellar genomes
– chloroplast, mitochondrion
– derived by endosymbiosis from bacterial ancestors
• Plasmids
– symbiotic DNA molecules, not essential but often useful to
the organism (antibiotic resistance)
– mostly circular in prokaryotes
The Prokaryotic genome
• Usually circular DNA, often a single
molecule per cell (= 1 single chromosome)
• Gene-dense: genes are close together with
little intergenic spacer
• Genes are often organized in operons
– tandem cluster of coordinately regulated genes
– Several genes transcribed as single mRNA
• No spliceosomal introns
The Lactose Operon
E. coli
A ‘simple’ genome:
the bacterial genome
§ Unicellular organism
§ Single, circular
chromosome
§ Compact genome, gene
dense, 90% coding DNA
§ ~500 to ~5,000 genes,
depending on species
Mycoplasma: one of the smallest genomes
- Extremely streamlined: 580 kb
- ~500 genes
- Gene-dense: 90% is coding DNA
- Very short intergenic regions
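A back-of-the-envelope calculation shows what these Mycoplasma numbers imply per gene. The sketch below uses only the three slide values (580 kb, ~500 genes, 90% coding); the averages are rough, since real genes vary in length.

```python
def gene_density_summary(genome_bp: int, n_genes: int, coding_percent: int) -> dict:
    """Rough per-gene averages from genome size, gene count and coding fraction."""
    coding_bp = genome_bp * coding_percent // 100
    spacing = genome_bp // n_genes          # genome length available per gene
    coding = coding_bp // n_genes           # average coding length per gene
    return {
        "bp_per_gene": spacing,
        "coding_bp_per_gene": coding,
        "intergenic_bp_per_gene": spacing - coding,
    }


print(gene_density_summary(580_000, 500, 90))
# {'bp_per_gene': 1160, 'coding_bp_per_gene': 1044, 'intergenic_bp_per_gene': 116}
```

So on average a gene of roughly 1 kb is followed by only about 100 bp of intergenic DNA, which is what "very short intergenic regions" means in practice.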
Viral genomes
• Virus = replicating, infectious but nonliving particle that can only replicate within a host cell (and can move from cell to cell)
– Genome = nucleic acid
– Encodes multiple proteins
Many viruses cause infectious diseases
Note: viruses that infect prokaryotes are referred to as bacteriophages.
• Viral genome = DNA or RNA
– single-stranded or double-stranded
– linear or circular
• Generally compact genomes with little spacer DNA, containing from a few to many genes (note: some viral genes contain introns, like eukaryotic genes; the recently discovered giant ‘mimivirus’ has a 1.1 Mb genome)
• The origin of viruses is unknown, but some appear to have evolved from mobile genetic elements. These are normally contained within the host genome, but they can acquire infectious capacity, allowing them to escape the cell (e.g. retroviruses originate from retrotransposons)
The flu virus, influenza: a single-stranded RNA virus

[Figure labels: Neuraminidase, Hemagglutinin]

The 1918 influenza pandemic caused 20-30 million deaths

The swine flu virus, H1N1 (ssRNA virus)
Conceptual structure of HIV, an ssRNA virus that uses reverse transcription
The replicative cycle of HIV-1
Retrotransposon: widespread in eukaryotes (example: Ty1 element in yeast)
Retrovirus: restricted to vertebrates (example: HIV-1)
By acquiring an envelope gene, a retrotransposon can gain infectious capacity and become a retrovirus (example of an intermediate: gypsy in fruit flies)
Eukaryotic genome: 2 or 3 genomes per cell
The Mitochondrial Genome

- Circular
- Resembles a reduced
prokaryotic genome in
terms of organization and
gene numbers
- Encodes genes mostly involved in the production of energy (oxidative phosphorylation) and in translation (tRNAs, rRNA)
- Maternally inherited
Mitochondrial genomes can vary in
size between species

Human (mammal): 16.5 kb
Yeast (fungus): 75 kb
Marchantia (liverwort): 186 kb
The Chloroplast Genome (plants)
- Circular
- Resembles a reduced
prokaryotic genome in
terms of organization and
gene numbers
- Encodes genes mostly
involved in photosynthesis
and electron transport
- ‘Maternally’ inherited (transmitted through the seed)
- Do not vary much in size
Marchantia (liverwort) cpDNA: 121 kb
4. The structure of
eukaryotic chromosomes
Eukaryotic nuclear genome: chromosomes (1)
• Linear structure
• Chromosome number is conserved within species but varies greatly between species
• Ploidy refers to number of complete sets of chromosomes
– haploid (1n): one complete set of chromosomes (e.g. yeast)
– diploid (2n): two sets of chromosomes (e.g. most animals)
– polyploid (≥3n): more than two sets (e.g. many plants, a few animals)
• In diploids, chromosomes come in homologous pairs
(homologs)
– structurally similar (i.e. size and position of centromere)
– same assortment of genes (homologous genes)
– may contain different alleles for each gene: each gene exists in either a homozygous state (two identical alleles) or a heterozygous state (two different alleles)
In humans, somatic cells have 2n = 46 chromosomes
Remy’s
karyotype
Human chromosomes 11 and 17
Eukaryotic nuclear genome:
chromosomes (2)
• Cytogenetics: microscopic study of chromosomes
• Variable centromere position
– telocentric: centromere at end
– acrocentric: centromere close to end
– metacentric: centromere in middle
– For human chromosomes: p arm is shortest, q arm is
longest
• Telomere: end of chromosome
• Nucleolar organizer region (NOR): The chromosomal region
around which the nucleolus forms (contains rRNA gene
tandem array)
• Chromomere (or knob): small bead-like region of condensed
chromatin visible during meiosis and mitosis
Maize chromosomes (2n = 20)
Eukaryotic nuclear genome:
chromosomes (3)
• Considerable difference in size and in the
number of genes carried on chromosomes, both
between and within species
• Genes may occupy only a minor fraction of a
chromosome (extreme case is human Y)
Human chromosomes: size and gene density
[Bar chart: chromosome size (Mb) and gene density (per 10 Mb), on a 0 to 300 scale, for human chromosomes 1 to 22, X, and Y]
Eukaryotic nuclear genome:
chromosomes (4)
• Heterochromatin
– densely stained regions of highly condensed
DNA
– mostly made of non-coding repetitive DNA, low
gene density and transcription activity
• Euchromatin:
– poorly stained, less compact chromatin
– contains most transcribed genes

Note: Polytene chromosomes
– replicated, unseparated chromosomes
– present in certain tissues of dipteran insects (salivary glands)
The chromosomes of maize (corn)
Microscopic view of a
tomato chromosome
Organization of Nuclear DNA
• Highly organized, various degrees of coiling
• Nucleosome
– fundamental unit of chromatin
– 146 bp of DNA wrapped twice around histone core (histone octamer)
• histones are highly conserved proteins
• H2A, H2B, H3, H4
• Chemical modifications of histones underlie changes in chromatin compaction
– Nucleosomes form a 10 nm fiber
– 6 nucleosomes coil to form Solenoid, 30 nm fiber
• Higher order coiling
– solenoid loops attach to scaffold (SAR, MAR)
– form larger diameter fibers
A haploid set of human chromosomes consists of about 3 feet of DNA!
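The "about 3 feet of DNA" figure for a haploid human chromosome set can be checked roughly from numbers given elsewhere in these slides: 0.34 nm per base pair and a genome of ~3.2 pg at ~1000 Mb per pg, i.e. about 3,200 Mb. The sketch below is an order-of-magnitude estimate under those assumptions, not a measurement.

```python
RISE_PER_BP_NM = 0.34            # nm per base pair (double helix slide)
HAPLOID_GENOME_MB = 3200         # ~3.2 pg x ~1000 Mb/pg (genome size slide)
METERS_PER_FOOT = 0.3048


def dna_length_m(genome_mb: float) -> float:
    """Stretched-out length of a genome in meters."""
    n_bp = genome_mb * 1_000_000
    return n_bp * RISE_PER_BP_NM * 1e-9   # nm -> m


length_m = dna_length_m(HAPLOID_GENOME_MB)
print(f"{length_m:.2f} m")                    # 1.09 m
print(f"{length_m / METERS_PER_FOOT:.2f} ft") # 3.57 ft
```

The estimate lands at roughly one meter, the same order as the slide’s "about 3 feet", which is why several levels of coiling are needed to fit this DNA into a nucleus.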

Chromatin is a highly dynamic structure


Illustration of DNA wrapped around histones, forming a nucleosome
[Figure labels: H1; H3, H4; H2A, H2B]
Solenoid
Electron micrograph of a chromosome showing long DNA loops emanating from the protein scaffold (at the bottom of the picture). Note that there are only loops, no ends, at the top of the picture.
5. Genome landscapes and
Comparative Genomics
Prokaryotes and eukaryotes have very
different genome landscapes
• In prokaryotes, genes are compactly
arranged, with little or no spacer sequences
in between (short intergenic regions) = most
of the genome is coding DNA
• In eukaryotes, there is considerable spacer
DNA between genes (large intergenic
regions) and within genes (introns) = most
of the genome is ‘non-coding’ DNA
Eukaryote genomes:
A whole lotta non-coding DNA
– Where is non-coding DNA? In introns, intergenic
regions, centromeric regions, telomeric regions
– the majority of non-coding DNA is repetitive DNA
= identical or nearly identical repeated units
- two types of repetitive DNA:
• Tandem repeats (e.g. DNA at centromeres and
telomeres)
• Interspersed repeats
– Most interspersed repeats are derived from
mobile genetic elements (aka transposable elements)
Genome Size
• In eukaryotes, most of the cell DNA is
from the nuclear genome
• Genome size is measured in pg or Mb (1 pg ≈ 1000 Mb); the human genome is ~3.2 pg
• Nuclear genome size is extremely
variable among eukaryote species
• ‘C-value paradox’ : no obvious
correlation between genome size and
organism complexity
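Converting between the two units of genome size is a single multiplication. The sketch below uses the slide’s round figure of 1000 Mb per pg; a more precise factor often quoted outside these slides is about 978 Mb per pg, which can be passed in as `mb_per_pg`.

```python
MB_PER_PG = 1000.0   # slide approximation: 1 pg ~ 1000 Mb


def pg_to_mb(pg: float, mb_per_pg: float = MB_PER_PG) -> float:
    """Convert a DNA mass in picograms to megabases."""
    return pg * mb_per_pg


def mb_to_pg(mb: float, mb_per_pg: float = MB_PER_PG) -> float:
    """Convert a genome size in megabases to picograms."""
    return mb / mb_per_pg


print(f"{pg_to_mb(3.2):.0f} Mb")   # 3200 Mb (human genome, slide value)
print(f"{mb_to_pg(2.8):.4f} pg")   # 0.0028 pg (a 2.8 Mb genome)
```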
The ‘C-value paradox’

Genome size does NOT correlate with organismal complexity
Extensive variation in genome size (= C-value)
within and among the main groups of life

Gregory 2005
C-value of eukaryotic nuclei varies ~200,000-fold, but
there is only ~20 fold variation in the number of
protein-coding genes

Encephalitozoon cuniculi: 2.8 Mb, 2,000 genes
Navicula pelliculosa (diatom): >690,000 Mb (probably less than 40,000 genes)

-> Variation in gene numbers cannot explain variation in genome size among eukaryotes
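The two extreme examples make the mismatch easy to quantify. The sketch below just divides the slide’s numbers, treating the diatom values (">690,000 Mb", "less than 40,000 genes") as if they were exact, so the results are rough bounds only.

```python
# (genome size in Mb, approximate gene count), taken from the slide
small = {"name": "E. cuniculi", "size_mb": 2.8, "genes": 2_000}
large = {"name": "diatom", "size_mb": 690_000, "genes": 40_000}

size_fold = large["size_mb"] / small["size_mb"]
gene_fold = large["genes"] / small["genes"]

print(f"genome size: {size_fold:,.0f}-fold")   # genome size: 246,429-fold
print(f"gene number: {gene_fold:.0f}-fold")    # gene number: 20-fold

for g in (small, large):
    density = g["genes"] / g["size_mb"]        # genes per Mb
    print(f"{g['name']}: {density:.3f} genes per Mb")
# E. cuniculi: 714.286 genes per Mb
# diatom: 0.058 genes per Mb
```

A genome-size ratio in the hundreds of thousands against a gene-number ratio of about 20 is the quantitative core of the C-value paradox.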
Transposable Elements and
genome size

• Most of variation in genome size is due to variation in the amount of repetitive DNA (mostly derived from TEs)
• TEs accumulate in intergenic and intronic regions
The amount of TE DNA correlates positively with genome size
[Bar chart: genomic DNA, TE DNA, and protein-coding DNA (Mb, 0 to 3000) for Plasmodium, slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, Brassica, rice, maize, nematode, Drosophila, mosquito, sea squirt, zebrafish, Fugu, mouse, and human]

Feschotte & Pritham 2006


The proportion of protein-coding genes decreases with genome
size, while the proportion of TEs increases with genome size

[Plot: proportions of TEs and of protein-coding genes as a function of genome size]

Gregory, Nat Rev Genet 2005


Repetitive DNA and genome size
• Variation in gene numbers cannot explain
variation in genome size among eukaryotes
• Most of variation in genome size is due to
variation in the amount of non-coding, repetitive
DNA (mostly transposable elements, TEs)
• TEs accumulate in intergenic and intronic
regions
Contrasted Genome Landscapes

Transposable Element
What have we learned from the human genome sequence?
2001: first draft of the human genome sequence
Most of the Human Genome does not code
for proteins

[Pie chart: Coding (1.5%) versus Non-coding]
Half of the Human Genome is derived from
Transposable Elements (TEs)

[Pie chart slices: TE-derived DNA, Non-coding, Coding (1.5%); the chart also carries a 48.5% label]
The human Genome Browser at UCSC
A snapshot of the Human Genome

Genes
Conservation in other species

TEs
TEs are the most rapidly changing components of the
genome

[Genome browser tracks: Genes (conserved exon); TEs (human-specific TE, ape-specific TE, primate-specific TE)]
Rapid changes in genome size in the grasses

~50 myr

~10 myr

Genome size:
4800 Mb 430 Mb 750 Mb 2500 Mb

Figure adapted from Sue Wessler


The maize genome: tiny gene islands floating on
an ocean of repetitive DNA
A typical maize chromosome

Cluster of Repetitive DNA


RIP RIP RIP RIP

gene A gene B genes C & D

Nested LTR retrotransposons


Expansion of intergenic regions in maize by
accumulation of LTR-retrotransposons

San Miguel et al. (1996) Nested Retrotransposons in the Intergenic Regions of the
Maize Genome. Science 274: 765-768
(+ other studies from Bennetzen lab)
Retrotransposon amplification has resulted in the
doubling of the maize genome in the last ~6 myr

San Miguel et al. (1998) The paleontology of intergene retrotransposons of maize, Nature Genet. 20:43-45
Variation in TE activity triggers rapid changes
in genome size in grasses

~50 myr

~10 myr

Genome size:
4800 Mb 430 Mb 750 Mb 2500 Mb
Genes
TEs
Comparative genomics
• Study of similarities and differences among
genomes
• Many genes are shared among all living things or
between related groups
• Study of genes in model organisms provides useful
information regarding genes in other organisms
• Large genome projects produce considerable
amount of information
– Requires computer analysis and development of new
software to analyze the avalanche of data (bioinformatics)
1996: S. cerevisiae
1998: C. elegans
2000: D. melanogaster, A. thaliana
2001: H. sapiens
2002: S. pombe, F. rubripes, P. falciparum, A. gambiae, O. sativa, C. intestinalis, M. musculus
2003: C. familiaris, N. crassa
2004: T. nigroviridis, R. norvegicus, B. mori, T. pseudonana
2005: P. troglodytes, E. histolytica
Genome sequences can be aligned and compared
Human-Mouse genome comparison
A snapshot of the human genome browser at UCSC

Genes
THIS WEEK’S MENU
What is the structure of DNA?
What is the organization of a gene?
What are chromosomes? What is chromatin?
How is DNA organized at the chromosome and chromatin level?
How are genes organized in the genome?
What makes the genome of a prokaryote and a eukaryote different?
What’s in a genome?
How are genomes different among eukaryotes?
Overview
• Each species has a unique, fundamental set of genetic information, its genome.
• The genome is composed of one or more DNA
molecules, each organized as a chromosome.
• The prokaryotic genome is generally compact and
made of a single circular chromosome.
• The eukaryotic nuclear genome consists of one or more sets of linear chromosomes confined to the nucleus.
• A gene is a segment of DNA that is transcribed
into a ‘functional’ RNA molecule.
• Introns interrupt many eukaryotic genes.
• Eukaryotic genomes are littered with repetitive
DNA (mostly derived from transposable elements)
