http://genome.ucsc.edu/
Louis Tang
Bioinformatics R&D
National Genotyping Center
Academia Sinica.
Quick Overview of features on
genomic regions
Loads of Data
Visual Correlation
Customization
The Rise
Craig Venter vs. Francis Collins
Regulation
Variation
Expression
More…
Annotation Tracks
Coordinate
Everlasting Assembly
Organism Code + version
chr3:1,000,000-2,000,000
chr3:1,000,000+2000
chr3:1,000,000-1,001,999
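The three position forms above can be normalized with a few lines of Python. This is a minimal sketch, not the browser's own parser: `parse_position` and `format_position` are hypothetical helper names, and the sketch assumes `+N` means "N bases starting at the given base", which is what the slide's example implies.

```python
import re

# Hypothetical helper: accepts "chrom:start-end" or "chrom:start+length".
# Coordinates are 1-based and inclusive, as typed into the position box.
_POSITION = re.compile(r"^(\w+):([\d,]+)([-+])([\d,]+)$")


def parse_position(text):
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    chrom, start, separator, value = match.groups()
    start = int(start.replace(",", ""))
    value = int(value.replace(",", ""))
    # "+" gives a length, so the last base is start + length - 1.
    end = value if separator == "-" else start + value - 1
    if end < start:
        raise ValueError(f"end before start in {text!r}")
    return chrom, start, end


def format_position(chrom, start, end):
    return f"{chrom}:{start:,}-{end:,}"


print(format_position(*parse_position("chr3:1,000,000+2000")))
# chr3:1,000,000-1,001,999
```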
Landmark
Types: Chromosome, Gene, SNP, STS, EST, Cytogenetic band

Landmark    Type               Position
chr7        Chromosome         chr7:1-158,821,424
20q12       Cytogenetic band   chr20:37,100,001-41,100,000
apoe        Gene               chr19:50,100,879-50,104,490
rs328       SNP                chr8:19864004, shown with ±250 as chr8:19,863,754-19,864,254
D16S2837    STS                shown with ±100,000 as chr16:81,020,163-81,220,399
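The padding applied to small landmarks is plain arithmetic on the landmark's coordinates. A minimal sketch, using the rs328 numbers from the slide; `pad_position` is a hypothetical helper, not a browser function.

```python
def pad_position(chrom, start, end, padding):
    """Widen a 1-based inclusive range by `padding` bases on each side."""
    # Clamp at 1 so a landmark near the chromosome start stays valid.
    return chrom, max(1, start - padding), end + padding


chrom, start, end = pad_position("chr8", 19864004, 19864004, 250)
print(f"{chrom}:{start:,}-{end:,}")
# chr8:19,863,754-19,864,254
```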
Landmark1;Landmark2
Landmark1 Landmark2
rs328;rs316
chr8:19,864,004 chr8:19,862,716
Sort
chr8:19,862,716-19,864,004
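Searching for two landmarks at once yields the range that spans both, whichever order they are typed in. A minimal sketch of that sort-and-span step, assuming both landmarks are on the same chromosome; `span_of_landmarks` is a hypothetical name.

```python
def span_of_landmarks(first, second):
    """Return the 1-based range covering two (chrom, start, end) landmarks."""
    if first[0] != second[0]:
        raise ValueError("landmarks are on different chromosomes")
    # Sorting is implicit: take the smallest start and the largest end.
    return first[0], min(first[1], second[1]), max(first[2], second[2])


rs328 = ("chr8", 19864004, 19864004)
rs316 = ("chr8", 19862716, 19862716)
chrom, start, end = span_of_landmarks(rs328, rs316)
print(f"{chrom}:{start:,}-{end:,}")
# chr8:19,862,716-19,864,004
```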
Author
McAndrew,P.E.
Practice
Track Control
Mark
Gene
PDB
Reviewed
RefSeq Provisional
Non-RefSeq
Alignment
TAACCAGCTGCCCAA--------TAGAAACTACGAGAGACAACAGGGAGT
||||| ||||||||| ||||||| | ||||||||||
TAACCAGCTGCCCAACTGTAGAAACTACCAACTCATTTCGAACAGGGAGT
Wiggle
Mapping and Sequencing
Phenotype and Disease Association (OMIM)
Expression
Comparative Genomics
ENCODE Pilot
ENCODE Production
Display Mode
Full
Pack
Squish
Dense
Hide
Configuration
Description
Display Convention
Method
Credit
References
Practice
hg18, tp53
UCSC Genes: full
RefSeq: dense
SNP: squish
RepeatMasker: pack
Different Display Mode
Exhibits Different Behavior
Stroll Along The Genome
Move End >
Click
Drag
Ctrl + Click
Practice
http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
Where is the Sequence?
RefSeq Gene
SNP
Practice
rs328
McAndrew,P.E.
?
AACTAGAAATCAGTCAACAAATTGGATGCTTAGGATAAATTCAAGAACTG
AGTAGAGAAATAAAGCTTAATGAATGACCTTTTGGGCTCCTTCCAGTTCC
AAGGTTTTAGTATTCTAAAATTTTCGGCACAGAACAACTCCAAATGCTCA
GGAAATAAGAATGAGGTCTGTTTTTAAAAGGTGCAGTTTGGAGCATGTTG
GGTGGATGAGGCTATAAAAAGTGAAGTACGATTTTCAAGGAAAGGAAGCT
GACCAATCAAAGTCTTTTGGGCAGCCCCTCCAGAAATCCAGGTGAAGCCC
GGCTCCAGGCTGAGTTGCTGTTACTCTACACGAAAGCCAGGCCGCTACTT
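BLAT
Speed (compared with earlier alignment tools):
DNA & RNA: about 500x faster
Protein: about 50x faster
Sensitivity:
DNA & RNA: 95% or greater similarity, 25 bases or longer
Protein: 80% or greater similarity, 20 amino acids or longer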
DNA
mRNA
BLAT’s Guess
Query type       Query                                    Genome (database)
DNA (RNA)        DNA                                      DNA
Protein          Protein                                  DNA, translated in 6 frames to protein
Translated RNA   DNA, translated in 3 frames to protein   DNA, translated in 6 frames to protein
Translated DNA   DNA, translated in 6 frames to protein   DNA, translated in 6 frames to protein
Usage Restriction
DNA Query: 25,000 bases
Protein Query: 10,000 bases
Translated Query: 10,000 bases
BEFORE
Total: 25 sequences, 50,000 letters
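A batch of sequences can be checked against these limits before it is pasted into the BLAT form. This is only a sketch: the numbers are copied from the slide and the live server may enforce different values, and `check_blat_submission` is a hypothetical helper rather than part of any UCSC tool.

```python
# Limits as listed on the slide (assumption: they still apply).
MAX_LETTERS_PER_QUERY = {"dna": 25_000, "protein": 10_000, "translated": 10_000}
MAX_SEQUENCES = 25
MAX_TOTAL_LETTERS = 50_000


def check_blat_submission(sequences, query_type):
    """Return a list of human-readable problems; empty means within limits."""
    per_query_limit = MAX_LETTERS_PER_QUERY[query_type]
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(f"{len(sequences)} sequences exceeds {MAX_SEQUENCES}")
    total = 0
    for number, sequence in enumerate(sequences, start=1):
        total += len(sequence)
        if len(sequence) > per_query_limit:
            problems.append(
                f"sequence {number} has {len(sequence)} letters, "
                f"limit is {per_query_limit}"
            )
    if total > MAX_TOTAL_LETTERS:
        problems.append(f"{total} letters in total exceeds {MAX_TOTAL_LETTERS}")
    return problems


print(check_blat_submission(["ACGT" * 10], "dna"))
# []
```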
Practice
http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
Pretty graphic is good,
but…
I want raw data
Table Browser
Text-based access to features
on genomic regions
Mission
(Hg18, SNP130)
Find all single SNPs on tp53
Genome Browser and Table Browser
[Diagram: both tools read the same database. Each annotation track in the Genome Browser is backed by one or more tables, and the Table Browser gives direct access to those tables.]
Positional
Coordinate Mismatch
Base      C A C C T C A G A
Biology   1 2 3 4 5 6 7 8 9
Computer  0 1 2 3 4 5 6 7 8
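The mismatch comes down to an offset of one. The sketch below uses the nine-base sequence from the slide; the helper names are hypothetical, and the half-open range form (zero-based start, end not included) is an assumption that matches how Python slices work and how BED-style tables are usually stored.

```python
sequence = "CACCTCAGA"


def one_based_to_zero_based(position):
    """Biology position (first base is 1) to computer index (first base is 0)."""
    return position - 1


def zero_based_to_one_based(index):
    return index + 1


def one_based_range_to_half_open(start, end):
    """Inclusive 1-based range to a zero-based start and exclusive end."""
    return start - 1, end


# The 5th base in biology counting sits at index 4.
print(sequence[one_based_to_zero_based(5)])
# T

# Bases 2 through 4 become the slice [1:4].
start, end = one_based_range_to_half_open(2, 4)
print(sequence[start:end])
# ACC
```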
3 Questions
Table?
Output Format?
Filter Criteria?
Table
snp130CodingDbSnp.name (via snp130.name)
snp130
name chrom observed
rs1642789 chr17 A/T
… … …
snp130CodingDbSnp
name transcript alleles codons
rs1642789 NM_001126113 TGT,AGT,
… … …
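The link between the two tables is an ordinary join on the shared `name` field. The sketch below rebuilds the two sample rows in an in-memory SQLite database purely for illustration; it does not contact the UCSC database, and the `alleles` column is left out because the sample row on the slide does not show a value for it.

```python
import sqlite3

# In-memory stand-ins for the two tables, holding only the slide's sample rows.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE snp130 (name TEXT, chrom TEXT, observed TEXT);
    CREATE TABLE snp130CodingDbSnp (name TEXT, transcript TEXT, codons TEXT);
    INSERT INTO snp130 VALUES ('rs1642789', 'chr17', 'A/T');
    INSERT INTO snp130CodingDbSnp
        VALUES ('rs1642789', 'NM_001126113', 'TGT,AGT,');
    """
)

# Connect the tables through snp130.name = snp130CodingDbSnp.name.
rows = connection.execute(
    """
    SELECT s.name, s.chrom, s.observed, c.transcript, c.codons
    FROM snp130 AS s
    JOIN snp130CodingDbSnp AS c ON c.name = s.name
    """
).fetchall()
print(rows)
# [('rs1642789', 'chr17', 'A/T', 'NM_001126113', 'TGT,AGT,')]
```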
Connected Table
Table Schema
Connected Table
Sample Rows
Track Description
Output
Practice
hg18
snp130
chr6:1,000,000-1,001,000
Sequence Output
100 extra bases upstream and downstream
Filter Criteria
Positional
Non-Positional
Coordinate
Landmark
Landmark;Landmark
Author
One-Based (the positional inputs above)
Zero-Based start (BED-style regions):
chrX 151073054 151173000
chrX 151183000 151190000
chrX 151283000 151290000
One-Based (position-style regions):
chrX:151,073,055-151,173,000
chrX:151,183,001-151,190,000
chrX:151,283,001-151,290,000
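The two region notations describe the same bases: only the start shifts by one, and the end stays the same. A minimal sketch of the conversion in both directions, using the chrX regions from the slide; both function names are hypothetical.

```python
def bed_to_position(line):
    """'chrX 151073054 151173000' (zero-based start) to a one-based position."""
    chrom, start, end = line.split()[:3]
    return f"{chrom}:{int(start) + 1:,}-{int(end):,}"


def position_to_bed(position):
    """'chrX:151,073,055-151,173,000' (one-based) to a zero-based BED line."""
    chrom, span = position.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    return f"{chrom} {start - 1} {end}"


print(bed_to_position("chrX 151073054 151173000"))
# chrX:151,073,055-151,173,000
print(position_to_bed("chrX:151,183,001-151,190,000"))
# chrX 151183000 151190000
```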
Limit
1,000 regions
Practice Time
hg18
snp130
chr6:1,000,000-1,500,000
chr6:2,000,000-2,500,000
TP53 = TP53
100,200 or or
100 200 100, 200
Enumeration
where
Free-form Query
Item Count: 47
Intersect: SNP track against Gene track
Intersect Items
Base-Pair-Wise Intersect
Base-Pair-Wise Union
Complement
[Diagrams: each option drawn on a 5'-to-3' strand.]
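The four operations can be sketched on plain lists of intervals. This is an illustration of the set logic only, not the Table Browser's implementation: intervals are assumed to be zero-based, half-open `(start, end)` pairs on a single chromosome, and all function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlapping_items(items, others):
    """Item-level intersect: keep whole items that overlap anything in others."""
    return [a for a in items if any(a[0] < b[1] and b[0] < a[1] for b in others)]


def intersect(a, b):
    """Base-pair-wise intersect: bases covered by both sets."""
    a, b = merge(a), merge(b)
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def union(a, b):
    """Base-pair-wise union: bases covered by either set."""
    return merge(list(a) + list(b))


def complement(intervals, region_start, region_end):
    """Bases of the region that are not covered by any interval."""
    result, cursor = [], region_start
    for start, end in merge(intervals):
        start, end = max(start, region_start), min(end, region_end)
        if start >= end:
            continue
        if start > cursor:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < region_end:
        result.append((cursor, region_end))
    return result


genes = [(100, 200), (300, 400)]
snps = [(150, 151), (250, 251), (399, 400)]
print(overlapping_items(snps, genes))
# [(150, 151), (399, 400)]
```

The item-level form returns whole SNP records, which is what a question like "which SNPs fall on a gene" needs; the base-pair-wise forms cut features at the overlap boundaries.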
Practice Time
hg18
chr1:1,000,000-2,000,000
Find SNPs (v.130) on sno/miRNA
Zero-Based
chr4 1000000 1005000 myData 200 + 1000100 1004900 100,150,200
name: myData
score (0 ~ 1000): 200
strand: +
thickStart, thickEnd: 1000100, 1004900
color (Red,Green,Blue; each 0~255): 100,150,200
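The nine fields of the example line can be read into a small record and checked against the ranges given above. A minimal sketch; `Bed9` and `parse_bed9` are hypothetical names, and the field names follow the usual BED column order (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb).

```python
from typing import NamedTuple, Tuple


class Bed9(NamedTuple):
    chrom: str
    chrom_start: int  # zero-based
    chrom_end: int    # not included in the feature
    name: str
    score: int
    strand: str
    thick_start: int
    thick_end: int
    item_rgb: Tuple[int, int, int]


def parse_bed9(line):
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    chrom, start, end, name, score, strand, thick_start, thick_end, rgb = fields
    red, green, blue = (int(value) for value in rgb.split(","))
    record = Bed9(chrom, int(start), int(end), name, int(score), strand,
                  int(thick_start), int(thick_end), (red, green, blue))
    if record.chrom_start > record.chrom_end:
        raise ValueError("chromStart is after chromEnd")
    if not 0 <= record.score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    if record.strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    if not all(0 <= value <= 255 for value in record.item_rgb):
        raise ValueError("each color value must be between 0 and 255")
    return record


record = parse_bed9("chr4 1000000 1005000 myData 200 + 1000100 1004900 100,150,200")
print(record.name, record.score, record.item_rgb)
# myData 200 (100, 150, 200)
```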
Track
visibility: full, pack, squish, dense (Default), hide
useScore (shading): 1 = Shading, 0 = No Shading (Default)
itemRgb (color): on = Color, off = No Color (Default)
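The display mode, shading and color settings above are written on the `track` line that heads a custom track file. A minimal sketch that assembles such a line, assuming the standard attribute names `visibility`, `useScore` and `itemRgb`; `track_line` itself is a hypothetical helper.

```python
ALLOWED_VISIBILITY = ("hide", "dense", "full", "pack", "squish")


def track_line(name, visibility="dense", use_score=0, item_rgb=False):
    """Build a custom-track header line from the settings on the slide."""
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"unknown visibility: {visibility!r}")
    if use_score not in (0, 1):
        raise ValueError("use_score must be 0 or 1")
    parts = [
        f'track name="{name}"',
        f"visibility={visibility}",
        f"useScore={use_score}",
    ]
    # Color from the BED file is only used when itemRgb is switched on.
    if item_rgb:
        parts.append('itemRgb="On"')
    return " ".join(parts)


print(track_line("myData", visibility="pack", use_score=1, item_rgb=True))
# track name="myData" visibility=pack useScore=1 itemRgb="On"
```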
Track
Flip
LiftOver
Mission
hg18 chr1:1,000,000-1,100,000
hg19 ?
Zero-Based
chr1 1010136 1020137
Zero-Based
DNA Duster
Mission
Hi,
On the in-silico PCR input page there are two options: min perfect match
& min good match. There are some explanations on the same page:
…
But how can these two options be specified simultaneously? Will one
option be overridden by another?
Louis
Subject: Re: [Genome] in-Silico PCR Min Perfect/Good Match confuses me
From: Galt Barber galt@soe.ucsc.edu
To: genome@lists.soe.ucsc.edu
Date: 04/23/2010 02:50:01 AM
Hi, Louis!
-Galt
"It's been a wonderful stone soup, where
other people have contributed bits,"
- James Kent