S.I.D. __________________________
MCB 104 MIDTERM #2 March 14, 2013
***IMPORTANT REMINDERS***
Print your name and ID# on every page of the exam.
You will lose 0.5 point/page if you forget to do this.
If you need more space than is available on a page, continue your answer on the back
of the same page. The pages will be separated for grading, so no points will be given
for answers continued on the back of a different page.
This is a closed book, closed note exam. No calculators, phones, or other electronic
devices are allowed.
Look through the entire exam before starting. You should have 9 numbered pages,
including this cover page. You do not have to start with Question 1. Read each
question entirely before beginning. Write legibly. Show all of your work for
formulas/math questions.
(Do not write below this line)
PAGE 2 ______/15 total
PAGE 3 ______/13 total
PAGE 4 ______/14 total
PAGE 5 ______/11 total
PAGE 6 ______ /12 total
PAGE 7 ______ /10 total
PAGE 8 ______ /12 total
PAGE 9 ______ /13 total
Total Score _________/100
Name __________KEY________________
1. (1 point) What does SNP stand for? (Grading note: underlined terms are required for points.)
single nucleotide polymorphism
2. (4 points) In no more than one sentence, and without drawing any pictures, explain in
words the difference between a contig and scaffold.
(2 points for accurate contig definition, 2 points for scaffold definition that explains
difference)
A contig is sequence assembled from contiguous or overlapping DNA sequence, while a
scaffold is sequence assembled from contigs connected with paired‐end reads.
3. (4 points) State what GWAS stands for and outline the three key steps, as explained in lecture,
for carrying one out. (1 pt for correct definition, 1 pt for each step)
Genome‐wide association study
1. Collect DNA from a large number of cases and controls (or phenotyped, or diseased and
non‐diseased, individuals).
2. Genotype SNPs genome‐wide.
3. Test for association between the genotype at each SNP and the disease/trait/phenotype.
4. (4 points) You want to sequence the genome of an organism that turns out to have a
common class of repetitive elements that are 10 kb long. In one sentence, how could you
sequence and assemble the genome so that you could connect the unique sequences flanking
the repetitive sequence?
Make genomic DNA libraries with inserts larger than 10 kb (2 pts), sequence the paired ends
of clones from this library, then look for paired end reads that jump over the repetitive
element to the unique sequence flanking it (2 pts).
5. (2 points) In one sentence, why are regions of linkage disequilibrium much larger in a
two generation QTL mapping cross than in a GWAS?
In a two generation mapping cross, only two rounds of recombination have shuffled genetic
variation, while many more generations and rounds of recombination have shuffled the
variation sampled in a GWAS.
6. a. (4 points) Cystic fibrosis is an autosomal recessive disease caused by loss‐of‐function
mutations in the gene CFTR. In Algerians, the allele frequency of the Phe508 deletion
mutant allele of CFTR is approximately 20%. Assuming Hardy‐Weinberg equilibrium,
calculate the percentage of Algerians that are carriers (heterozygotes) for the disease allele,
and the percentage of affected newborns. You must show all of your work for full credit.
1 pt each for q, p, 2pq, and q²
Allele frequency = q = 0.2
p = 1 ‐ q = 0.8
2pq = 2(0.8)(0.2) = 0.32, or 32% heterozygous carriers
q² = 0.2 × 0.2 = 0.04, or 4% of newborns are homozygous affected
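As a quick check of the Hardy‐Weinberg arithmetic, this Python sketch evaluates p², 2pq, and q² from the disease allele frequency q. The function name `hardy_weinberg` is illustrative and not part of the exam; like the answer above, it assumes Hardy‐Weinberg equilibrium.

```python
def hardy_weinberg(q):
    """Return (p^2, 2pq, q^2) for a biallelic locus, given the frequency q of the disease allele."""
    p = 1 - q                      # frequency of the wild-type allele
    return p * p, 2 * p * q, q * q

# Phe508 deletion allele frequency in Algerians, as given in the question
homozygous_wt, carriers, affected = hardy_weinberg(0.2)
print(f"carriers: {carriers:.0%}, affected newborns: {affected:.0%}")
# carriers: 32%, affected newborns: 4%
```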
b. (3 points) One percent of the global human population is homozygous for a deletion
allele of CCR5 (CCR5del32) that has been clearly associated with resistance to HIV.
Calculate the allele frequencies of the wild‐type and deletion alleles, and the frequency of
heterozygous carriers, assuming Hardy‐Weinberg equilibrium.
1 pt each for q, p, and 2pq
q² = 0.01, so q = √0.01 = 0.1, or 10% for CCR5del32
p = 1 ‐ q = 0.9, or 90% for the wild‐type allele
2pq = 2 × 0.9 × 0.1 = 0.18, or 18% frequency of heterozygotes
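Going the other direction, from the frequency of recessive homozygotes (q²) back to allele and carrier frequencies, can be sketched the same way. `hw_from_homozygote_frequency` is a hypothetical helper, and it assumes Hardy‐Weinberg equilibrium just as the answer does.

```python
import math

def hw_from_homozygote_frequency(q_squared):
    """Infer allele and carrier frequencies from the frequency of recessive homozygotes (q^2)."""
    q = math.sqrt(q_squared)       # frequency of the deletion allele
    p = 1 - q                      # frequency of the wild-type allele
    return q, p, 2 * p * q         # 2pq = frequency of heterozygous carriers

q, p, het = hw_from_homozygote_frequency(0.01)   # 1% CCR5del32 homozygotes
print(f"CCR5del32: {q:.0%}, wild type: {p:.0%}, heterozygotes: {het:.0%}")
# CCR5del32: 10%, wild type: 90%, heterozygotes: 18%
```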
7. (4 points) What are paired end reads, genomic libraries, cDNA, and ESTs? Define each in
no more than one sentence. Your definition of cDNA should explain how cDNA is made.
(1 pt/definition)
Paired end reads are sequencing reads of both ends of a DNA molecule. Genomic libraries are
collections of genomic DNA fragments cloned into vectors. cDNA is complementary DNA made
from mRNA by reverse transcriptase. ESTs are expressed sequence tags (0.5 pt), i.e., paired
end reads from a cDNA library (0.5 pt).
8. (2 points) In no more than two or three words, why are haplotypes on average smaller
in older populations?
Only 1 point if more than 3 words are used.
More historical recombination (or more meioses, more recombination, more historical
meioses)
9. (8 points) Kernel color in wheat is a simple quantitative trait controlled by two unlinked
loci, A and B, each with two alleles; the A and B alleles make purple pigment and the a and b
alleles do not. Plants homozygous for the purple alleles at both loci (AABB) have purple
kernels. Plants homozygous for the white alleles at both loci (aabb) have white kernels. The
total number of A and B alleles determines kernel color in a simple additive way: one A or B
allele makes kernels light red, two make kernels red, and three make kernels dark red. You
cross AABB purple plants to aabb white plants and get all red AaBb F1 hybrids. You then
intercross these AaBb F1 hybrids.
What are the expected ratios of F2 genotypes and phenotypes, using the Aa Bb convention
above?
(4 points for correct genotype ratios, 4 points for correct phenotype ratios)
AaBb x AaBb
F2 phenotype and genotype ratios:
1/16 purple, AABB
4/16 dark red, (1/8 or 2/16 AaBB + 1/8 or 2/16 AABb)
6/16 red, (1/16 AAbb, 1/4 or 4/16 AaBb, 1/16 aaBB)
4/16 light red (1/8 or 2/16 Aabb + 1/8 or 2/16 aaBb)
1/16 white, aabb
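The F2 ratios can be verified by enumerating all 16 gamete combinations from the AaBb × AaBb intercross. This is a minimal Python sketch assuming, as the question states, that the loci are unlinked and that color depends only on the number of A and B alleles; the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Number of pigment (uppercase) alleles -> kernel color, as defined in the question.
COLORS = {4: "purple", 3: "dark red", 2: "red", 1: "light red", 0: "white"}

def gametes(genotype):
    """Four equally likely gametes from a two-locus genotype such as 'AaBb' (unlinked loci)."""
    return [a + b for a, b in product(genotype[:2], genotype[2:])]

def f2_ratios(parent1="AaBb", parent2="AaBb"):
    """Return F2 genotype and phenotype frequencies as exact fractions."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        locus_a = "".join(sorted(g1[0] + g2[0]))   # 'A' sorts before 'a', giving 'Aa'
        locus_b = "".join(sorted(g1[1] + g2[1]))
        counts[locus_a + locus_b] += 1
    total = sum(counts.values())
    genotypes = {g: Fraction(n, total) for g, n in counts.items()}
    phenotypes = Counter()
    for g, freq in genotypes.items():
        phenotypes[COLORS[sum(ch.isupper() for ch in g)]] += freq
    return genotypes, phenotypes

genotypes, phenotypes = f2_ratios()
for color in COLORS.values():
    print(f"{color}: {phenotypes[color] * 16}/16")
```

Using `itertools.product` over gametes mirrors a Punnett square, and `Fraction` keeps the ratios exact; the printed phenotype classes are 1/16, 4/16, 6/16, 4/16, and 1/16.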
10. a. (3 points) Two SNPs are in absolute linkage disequilibrium. SNP1 has allele
frequencies of 40% G and 60% C, and SNP2 has allele frequencies of 40% A and 60% C.
Calculate the observed frequencies of haplotypes in this population.
1.5 pts/each haplotype frequency
40% GA
60% CC. The SNPs are in complete linkage disequilibrium, so the GC and CA haplotypes are never observed.
b. (3 points) If SNP1 has allele frequencies of 40% G and 60% C and SNP2 has allele
frequencies of 40% A and 60% C, calculate the observed frequencies of haplotypes in the
absence of linkage disequilibrium.
0.5 pts/each haplotype frequency, 1 pt. for showing work
If allele frequencies are 0.4 G and 0.6 C at SNP1 and 0.4 A and 0.6 C at SNP2, haplotype
frequencies in the absence of linkage disequilibrium would be:
0.4*0.4 = 0.16 GA
0.4*0.6 = 0.24 GC
0.6*0.4 = 0.24 CA
0.6*0.6 = 0.36 CC
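The contrast between parts a and b can be expressed with the linkage disequilibrium coefficient D = f(GA) − p(G)·p(A), which is zero when haplotype frequencies are simply products of allele frequencies. A short Python sketch with hypothetical helper names, using exact fractions to avoid rounding:

```python
from fractions import Fraction
from itertools import product

def equilibrium_haplotypes(snp1, snp2):
    """Haplotype frequencies expected with no LD: the product of the two allele frequencies."""
    return {a + b: pa * pb for (a, pa), (b, pb) in product(snp1.items(), snp2.items())}

def ld_d(haplotypes, snp1, snp2, a, b):
    """Linkage disequilibrium coefficient D = f(ab) - p(a) * p(b)."""
    return haplotypes.get(a + b, 0) - snp1[a] * snp2[b]

snp1 = {"G": Fraction(2, 5), "C": Fraction(3, 5)}   # 40% G, 60% C
snp2 = {"A": Fraction(2, 5), "C": Fraction(3, 5)}   # 40% A, 60% C

no_ld = equilibrium_haplotypes(snp1, snp2)                     # part b
complete_ld = {"GA": Fraction(2, 5), "CC": Fraction(3, 5)}     # part a: GC and CA never observed

for hap, freq in no_ld.items():
    print(hap, float(freq))
print("D (part a):", float(ld_d(complete_ld, snp1, snp2, "G", "A")))
print("D (part b):", float(ld_d(no_ld, snp1, snp2, "G", "A")))
```

With complete LD (part a), D for the GA haplotype is 0.40 − 0.16 = 0.24; in the absence of LD (part b), D is 0.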
11. (4 points) Shown below is a matrix of human SNP genotypes. Each row is a haploid
human chromosome, and each column is a specific SNP position, aligned in physical order.
G A T C G G C A A C G A
A T C A C A T G T T T C
G A T C G G C A A T T C
A T C A C A T G T C G A
G A T C G G C A A T T C
G A T C G G C A A T T C
G A T C G G C A A C G A
G A T C G G C A A C G A
A T C A C A T G T T T C
A T C A C A T G T C G A
Do the observed genotypes suggest a recombination hotspot? Answer yes or no, and
explain your answer in one sentence. If you answered yes, indicate with a line where the
hotspot is.
Yes (1 pt). The hotspot is a vertical line between the 4th‐from‐right and 3rd‐from‐right
columns (1 pt). Here linkage disequilibrium breaks down: all 4 possible two‐SNP haplotypes
(AC, TT, AT, TC) are observed at their expected frequencies (2 pts).
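One way to locate the hotspot programmatically is a four‐gamete style check: for each pair of adjacent SNP columns, count the distinct two‐SNP haplotypes and compute D. Only the boundary where all four haplotypes appear, with D near zero, points to historical recombination. This Python sketch hard‐codes the Question 11 matrix as strings; `pair_stats` is a hypothetical helper.

```python
from collections import Counter

# Haploid chromosomes from Question 11 (one string per row, SNPs in physical order).
ROWS = [
    "GATCGGCAACGA",
    "ATCACATGTTTC",
    "GATCGGCAATTC",
    "ATCACATGTCGA",
    "GATCGGCAATTC",
    "GATCGGCAATTC",
    "GATCGGCAACGA",
    "GATCGGCAACGA",
    "ATCACATGTTTC",
    "ATCACATGTCGA",
]

def pair_stats(rows, i, j):
    """Number of distinct two-SNP haplotypes and LD coefficient D for columns i and j (0-based)."""
    n = len(rows)
    haplotypes = Counter((r[i], r[j]) for r in rows)
    a, b = rows[0][i], rows[0][j]      # alleles on the first chromosome serve as reference alleles
    p_a = sum(r[i] == a for r in rows) / n
    p_b = sum(r[j] == b for r in rows) / n
    d = haplotypes[(a, b)] / n - p_a * p_b
    return len(haplotypes), d

for i in range(len(ROWS[0]) - 1):
    k, d = pair_stats(ROWS, i, i + 1)
    flag = "  <- all four haplotypes: LD broken" if k == 4 else ""
    print(f"SNPs {i + 1}-{i + 2}: {k} haplotypes, D = {d:+.2f}{flag}")
```

Only the boundary between SNPs 9 and 10 (4th and 3rd from the right) is flagged; every other adjacent pair shows just two haplotypes.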
12. (4 points) In dogs, regions of linkage disequilibrium are about 100 times larger than
they are in humans. For the next two questions, consider only the relative size of linkage
disequilibrium in dogs and humans. Compared to a human GWAS, would a dog GWAS on
average be more or less likely to identify a linked SNP? Once a linked SNP was found,
compared to a human GWAS, would a dog GWAS offer better or worse resolution? Answer
both questions in no more than one sentence each.
Given the larger blocks of LD, a linked SNP would be more likely to be found in dogs than
humans (2 pts). However, once a linked SNP was found, the resolution would be better in
humans than dogs (again, given the larger blocks of LD) (2 pts).
13. (3 points) You pay to have your own genome genotyped for 300,000 tagSNPs and are
sent a long list of disease risks based on a subset of your SNP genotypes and many
published human GWAS. Are any of these SNPs causative for diseases you might have or
get? Answer yes, no, or maybe, and explain why in no more than two sentences.
Maybe (1 pt). The SNPs simply tag blocks of genetic variation, so they could just be linked
to the causative mutation (1 pt). However, a SNP could itself be causative (1 pt).
14. In a recent GWAS of heart disease, the SNP rs1333040 was genotyped in disease cases
and controls. The results are shown in the following table:
Genotype   Disease   Controls
TT         350       250
TC         300       500
CC         110       270
A. (4 points) For both disease and controls, calculate the total number of T and C alleles,
showing your work: For each cell, 0.5 pt for correctly setting up equation, 0.5 pt for
correct number
Allele   Heart disease                      Controls
T        350*2 + 300 = 700 + 300 = 1000     250*2 + 500 = 500 + 500 = 1000
C        110*2 + 300 = 220 + 300 = 520      270*2 + 500 = 540 + 500 = 1040
B. (6 points) Set up the equations to calculate the allelic odds ratio (for T against C) for
heart disease risk. Show all of your work in setting up all equations for full credit.
Odds of disease with T = T alleles in cases / T alleles in controls = 1000/1000 = 1 (2 pts)
Odds of disease with C = C alleles in cases / C alleles in controls = 520/1040 = 0.5 (2 pts)
Odds ratio for T against C = (1000/1000)/(520/1040) = 1/0.5 = 2 (2 pts; full credit if all
equations are set up correctly)
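The allele counting and odds ratio from parts A and B can be reproduced in a few lines of Python. This sketch uses the genotype counts from the Question 14 table; the function names are illustrative.

```python
# Genotype counts for rs1333040 from the Question 14 table.
CASES = {"TT": 350, "TC": 300, "CC": 110}
CONTROLS = {"TT": 250, "TC": 500, "CC": 270}

def allele_counts(genotypes):
    """Each homozygote carries two copies of its allele; each heterozygote carries one of each."""
    t = 2 * genotypes["TT"] + genotypes["TC"]
    c = 2 * genotypes["CC"] + genotypes["TC"]
    return t, c

def allelic_odds_ratio(cases, controls):
    """Allelic odds ratio for T against C: (T_cases / T_controls) / (C_cases / C_controls)."""
    t_case, c_case = allele_counts(cases)
    t_ctrl, c_ctrl = allele_counts(controls)
    return (t_case / t_ctrl) / (c_case / c_ctrl)

print(allele_counts(CASES), allele_counts(CONTROLS))   # (1000, 520) (1000, 1040)
print(allelic_odds_ratio(CASES, CONTROLS))              # 2.0
```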
C. (2 points) One of the 317,503 SNPs genotyped in this study gave a p‐value of 0.03 for
association with heart disease. Is this association significant? Answer yes or no, and justify
your answer in one sentence.
No. Multiple hypotheses are being tested, so a Bonferroni correction is needed, and the
p‐value threshold for significance would be 0.05/317,503 (about 1.6 × 10⁻⁷). For full credit,
answers must either state that a Bonferroni correction is needed or set up the fraction for
the significance threshold.
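The Bonferroni threshold in part C is a single division. This small Python sketch (the helper name is hypothetical) computes it and compares the reported p‐value against it.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 317_503)
p_value = 0.03
print(f"threshold = {threshold:.2e}; significant: {p_value < threshold}")
# threshold = 1.57e-07; significant: False
```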
15. In trying to determine the genetic basis of a human disease, you genotype the human
pedigree shown below, in which an autosomal dominant phenotype present in one parent is
transmitted to four of eight offspring. Your molecular genotyping assay is a microsatellite
known to be tightly linked to the disease locus. You amplify the microsatellite with PCR and
size‐separate the products by electrophoresis. Molecular genotypes are shown beneath the
pedigree. In the following questions, the lanes on the far left and far right of the pedigrees
are DNA size markers of (from top to bottom) 200 bp, 150 bp, and 100 bp.
A. (2 points) In this pedigree, circle the disease‐linked allele beneath the affected parent.
The 200 bp band beneath the top black square. Mom’s 200 bp allele is the same size, but it is
not linked to the disease‐causing allele.
B. (2 points) What are the odds that a child of the last unaffected male (last white square to
the right) and a homozygous wild‐type mother will be affected?
Zero. This male carries mom’s 200 bp allele, which is not linked to the disease allele (since he
got the 100 bp paternal allele, the 200 bp allele in this male must be from mom).
In a different human pedigree, a simple Mendelian autosomal recessive disease is
segregating and you have genotyped a tightly linked microsatellite. Again molecular
genotypes are shown beneath the pedigree.
C. (2 points) In this pedigree, circle all of the alleles linked to the disease allele in both
parents and offspring.
All eight lower 100 bp bands must be circled for full credit.
D. (4 points) In a third pedigree, you genotype a molecular marker that is a SNP altering a
restriction enzyme recognition site. This marker consists of a pair of PCR primers that
amplify a 150 base pair band. Within this 150 bp region, the wild‐type allele does not have
the cut site but the mutant allele does, and cutting yields 100 bp and 50 bp fragments. If this
marker is tightly linked to the dominant disease in the pedigree below, fill in the box below
to show what these molecular markers would look like on a gel. In this case the DNA size
markers on the far left and far right are 150, 100, and 50 base pairs.
1 pt per parent, 1 pt each for affected and unaffected kids
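The expected band pattern in part D can be sketched under an explicit assumption: the mutant (cut) marker allele lies on the disease chromosome, so affected individuals are heterozygous mut/wt and unaffected individuals are wt/wt. The Python helper `gel_bands` and the allele labels are hypothetical.

```python
UNCUT = (150,)      # wild-type allele: no restriction site, one 150 bp band
CUT = (100, 50)     # mutant allele: site present, digested into 100 bp + 50 bp

def gel_bands(genotype):
    """Band sizes (largest first) after PCR and digestion for a pair of marker alleles."""
    sizes = set()
    for allele in genotype:
        sizes.update(CUT if allele == "mut" else UNCUT)
    return sorted(sizes, reverse=True)

print(gel_bands(("mut", "wt")))   # assumed affected genotype: [150, 100, 50]
print(gel_bands(("wt", "wt")))    # assumed unaffected genotype: [150]
```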
16. (4 points) In what linear order would these five sequences assemble into a contig?
1: TGTGACGTAGCAATCTTGGTTGCTGGAA
2: TTCCCGGTTTCCCCCCTCGTTGGTAAGG
3: GTTTTTATAGTTTATGTGACGTAGCAA
4: CTTGGTTGCTGGAATGGCTGTATCATAGC
5: CTCGTTGGTAAGGCGTTTTTATAGTTTA
Answer just with the linear order of sequence reads from 5’ to 3’.
2‐5‐3‐1‐4 (partial credit if part of sequence is correct, 0 pts for 1 bp overlaps)
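The read order can be recovered with a simple greedy overlap layout: for each read, find the read whose prefix best matches its suffix, then follow the chain. This Python sketch assumes exact matches with no sequencing errors and an arbitrary 5 bp minimum overlap (in the spirit of giving no credit for 1 bp overlaps); real assemblers must also handle errors, repeats, and reverse complements.

```python
READS = {
    1: "TGTGACGTAGCAATCTTGGTTGCTGGAA",
    2: "TTCCCGGTTTCCCCCCTCGTTGGTAAGG",
    3: "GTTTTTATAGTTTATGTGACGTAGCAA",
    4: "CTTGGTTGCTGGAATGGCTGTATCATAGC",
    5: "CTCGTTGGTAAGGCGTTTTTATAGTTTA",
}
MIN_OVERLAP = 5   # assumed cutoff so that chance matches of 1-4 bp are ignored

def overlap(a, b, min_len=MIN_OVERLAP):
    """Length of the longest suffix of read a that equals a prefix of read b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def order_reads(reads):
    """Greedy layout: link each read to the read it overlaps best, then walk the chain from its start."""
    successor = {}
    for i, a in reads.items():
        best = max((j for j in reads if j != i), key=lambda j: overlap(a, reads[j]))
        if overlap(a, reads[best]) > 0:
            successor[i] = best
    start = (set(reads) - set(successor.values())).pop()   # no read overlaps into the first read
    order = [start]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order

print(order_reads(READS))   # [2, 5, 3, 1, 4]
```

The overlaps found are 13, 14, 13, and 14 bp, far above the cutoff, which is why the greedy choice is unambiguous here.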
17. You want to test a non‐coding genomic sequence containing tagSNPs linked to a human
disease for enhancer activity. You make a reporter construct with this non‐coding human
sequence, a minimal promoter, and the lacZ reporter gene, then inject this construct into
many one‐celled mouse embryos. You raise these embryos until embryonic day 11, then
stain them for lacZ expression. In total, you observe ten embryos with blue domains
indicating lacZ expression. In all ten embryos, the embryonic heart is blue. Two of these
embryos also show lacZ expression in other domains: one has an additional blue expression
domain in the brain, and another has an additional blue expression domain in the limb.
A.(3 points) In one sentence, what can you conclude about the non‐coding sequence of
interest?
It is a heart enhancer.
B.(3 points) In one sentence, how can you explain the single embryos with additional brain
or limb expression domains?
These ectopic expression domains are likely due to other enhancers (1.5 pts) near the
integration site of the transgene (1.5 pts).
18. (2 points) In one sentence, why do some stretches of SNPs covary?
Because they have not been separated by historical meiotic recombination. (Partial credit for
“linkage disequilibrium”, but must mention no recombination between them for full credit.)
19. (3 points) If you used Sanger sequencing with two vector primers to sequence the paired
ends of one 10 kb clone from a genomic library, would you get the entire 10 kb of sequence,
or would you have a gap in the sequence of the clone? Answer this question and explain your
answer in no more than one sentence.
You would have a gap in the sequence (1 pt), because Sanger sequencing yields only about 1 kb
of sequence per read (1 pt), so sequencing 1 kb from each end would leave a gap of about 8 kb
between the two reads (1 pt).
20. a. (3 points) Some human genes have ESTs in all human cDNA libraries, while other
genes have ESTs present only in certain tissues. Explain this observation in no more than
one sentence.
Some genes are only expressed in certain tissues (1 pt), due to tissue‐specific enhancers (1 pt),
while other genes are ubiquitously expressed (i.e. transcribed in all cells) (1 pt).
b. (3 points) Many human genes have 3’ ESTs that align to different genomic regions. Many
human genes also show variation in the number of cDNA segments that align to the
genome. Explain these two observations in one sentence each.
Many genes have multiple polyadenylation sites (alternative 3’ ends), so 3’ ESTs can align to
different genomic regions (1.5 pts). Many genes are alternatively spliced, so cDNAs from those
genes can vary in which exons are included in the mRNA (1.5 pts).
21. (4 points) Recent transcriptome sequencing studies have surprisingly revealed a class
of circular RNA molecules that are transcribed as linear mRNAs whose 5’ and 3’ ends are then
ligated to make a circular mRNA. The evidence for these circular mRNAs
comes in part from two observations made from cDNA sequencing, and how cDNA
sequences align to the genomic DNA sequence. What two observations in cDNA sequences
are predicted if mRNA molecules are circularized after being transcribed? (Hint: the first
observation is made from a subset of single end transcriptome reads, and the second
observation is made from comparing the pattern and orientation of paired end read
alignments to the human genome.)
Some single end transcriptome reads align to two distinct genomic locations, with the
circularization junction spanned by the read (i.e. sequencing reads that span the 5’‐3’ ligation
junction would have two parts that align to distinct regions of the genome – one 5’ and one 3’)
(2 pts). The second prediction is that some paired end reads would map with their 5’ and 3’
reads pointing away from each other (i.e. sequence of paired end reads starts at two positions,
and heads outwards from both points, instead of inwards towards each other, like most 5’ and
3’ EST sequences coming from linear transcripts) (2 pts).
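The second prediction, outward‐facing paired end reads, can be made concrete with a small classifier. This Python sketch assumes each pair has one mate aligned to the + strand and one to the − strand, and uses hypothetical genomic coordinates for a single circularized exon spanning positions 1000 to 2000; `pair_orientation` is a hypothetical helper.

```python
def pair_orientation(plus_mate_start, minus_mate_start):
    """Classify a read pair from its genomic alignment.

    plus_mate_start: leftmost coordinate of the mate on the + strand (it reads rightward).
    minus_mate_start: rightmost coordinate of the mate on the - strand (it reads leftward).
    """
    if plus_mate_start < minus_mate_start:
        return "inward"    # mates point toward each other: expected for a linear transcript
    return "outward"       # mates point away from each other: consistent with a circularized RNA

# Hypothetical exon at genomic positions 1000-2000.
print(pair_orientation(1100, 1400))   # fragment inside the linear exon -> inward
print(pair_orientation(1800, 1200))   # fragment crossing the 3'-to-5' ligation junction -> outward
```

A fragment that runs from position 1800 through the ligation junction and on to position 1200 places its + strand mate downstream of its − strand mate, which is exactly the outward pattern described in the answer.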