A Field Guide
Eric Sayers (sayers@ncbi.nlm.nih.gov) and Medha Bhagwat (bhagwat@ncbi.nlm.nih.gov)
Revised 11/30/06
Online Resources
Course Resources
Course home page; PowerPoint slides; Workshop I exercises; Workshop II exercises; Alignment Guide; Amino Acid Explorer; PSSM Viewer
General NCBI
NCBI home page; About NCBI; NCBI news
NCBI Structure Sites
Structure main page: http://www.ncbi.nlm.nih.gov/Structure
MMDB main page: http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
CDD main page: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
VAST page: http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
PubChem page: http://pubchem.ncbi.nlm.nih.gov
NCBI Handbook: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch3
NCBI Structure Tools
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
CD-Search: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
VAST Search: http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html
CDART: http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps
NCBI Threader: http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/threading.shtml
Entrez Structure: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
Entrez 3D Domains: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Domains
Entrez Domains: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd
COG: http://www.ncbi.nlm.nih.gov/COG/
MMDB URLs: http://www.ncbi.nlm.nih.gov/Structure/MISC/linking.html#mmdbsrv
NCBI Structure FTP Resources
CDD: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/
Cn3D: ftp://ftp.ncbi.nih.gov/cn3d/
CDART: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdart/
RPS-BLAST: ftp://ftp.ncbi.nih.gov/blast/
Literature
Structural Biology
1. Bourne, P.E. and Weissig, H., eds. Structural Bioinformatics. John Wiley & Sons, Hoboken, NJ. 2003.
2. McPherson, A. Introduction to Macromolecular Crystallography. John Wiley & Sons, Hoboken, NJ. 2002.
3. Cavanagh, J., Fairbrother, W.J., Palmer, A.G., and Skelton, N.J. Protein NMR Spectroscopy. Academic Press, San Diego. 1996.
NCBI Databases
1. Chen, J. et al. (2003) MMDB: Entrez's 3D-structure database. Nucleic Acids Res 31, 474-477.
2. Marchler-Bauer, A. et al. (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 31, 383-387.
VAST
1. Gibrat, J.-F. et al. (1996) Surprising similarities in structure comparison. Curr Opin Struct Biol 6, 377-385.
2. Madej, T. et al. (1995) Threading a database of protein cores. Proteins 23, 356-369.
CDART
1. Geer, L.Y. et al. (2002) CDART: protein homology by domain architecture. Genome Res 12, 1619-1623.
Other relevant NCBI publications are linked from http://www.ncbi.nih.gov/Structure/.
Lecture Topics
Exploring 3D Molecular Structures Using NCBI Tools
Lecture 1: Structures in Cn3D
Lecture 1
Overview of Structural Informatics at NCBI
Indexing Structural Data at NCBI: Entrez Protein, Entrez Structure, Entrez 3D Domains, Entrez Conserved Domains (CDD)
Finding Structures by text queries (Entrez)
Finding Structures by sequence (BLAST, CD-Search)
Lecture 2
Finding Structures by 3D and Functional Homology
[Figure: NCBI record counts for Domains (function) and Structure (3D conformation)]
Solving Structures
X-Ray Crystallography
Resolution; disorder
[Figure: Pro-Phe-Ile viewed in Cn3D at 5, 3, and 1 Å resolution]
NMR Spectroscopy
RMSD (Å)
[Figure: typical bond lengths for C-S, C-C, C-N, C-O, S-H, C=O, C-H, N-H, and O-H bonds]
[Figure: Cn3D views (Temperature coloring; protons shown)]
PDB
[Figure: excerpt of a PDB file showing the HEADER, TITLE, COMPND, SOURCE, and KEYWDS records, with the Occupancy and Temperature Factor columns of the coordinate records highlighted]
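The Occupancy and Temperature Factor values occupy fixed columns of each ATOM/HETATM record in a classic PDB-format file (columns 55-60 and 61-66, per the wwPDB format description). A minimal Python sketch of reading them, assuming fixed-width PDB records rather than mmCIF; the function name and the example record are illustrative, not taken from a specific entry:

```python
def parse_atom_record(line):
    """Parse the fixed-width fields of a PDB ATOM/HETATM record.

    Slices are 0-based Python indices for the 1-based PDB columns.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "atom_name": line[12:16].strip(),   # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21],                  # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": float(line[54:60]),    # columns 55-60
        "temp_factor": float(line[60:66]),  # columns 61-66 (B-factor)
    }


# Illustrative record; the two string pieces join to exactly 66 characters.
EXAMPLE = ("ATOM      1  N   MET A   1    "
           "  38.198  19.582  28.100  1.00 42.61")

record = parse_atom_record(EXAMPLE)
print(record["occupancy"], record["temp_factor"])
```

An occupancy below 1.0 usually marks alternate conformations, and a high temperature factor marks a poorly ordered atom; both are worth checking before interpreting fine structural detail.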
Issues: justification, nomenclature
[Figure: Entrez Structure records linked to Protein, Nucleotide, 3D Domains, and Conserved Domains]
Entrez 3D Domains
3D Domain 0: 1EJ9A0 = entire polypeptide
[Figure: structure 1EJ9 with protein chain 1EJ9A and nucleotide chains 1EJ9C and 1EJ9D]
Creating 3D Domains
[Figure: 3D domain 1EJ9A2 viewed in Cn3D]
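The 3D domain identifiers above (1EJ9A0 for the entire polypeptide, 1EJ9A2 for a single domain) appear to combine a PDB code, a chain ID, and a domain number. A minimal Python sketch that splits such an ID, assuming (from these examples only) a 4-character PDB code, a 1-character chain ID, and a trailing number in which 0 means the whole chain; the pattern and function name are hypothetical:

```python
import re

# Assumed layout, inferred from IDs such as 1EJ9A0 and 1EJ9A2:
# 4-character PDB code + 1-character chain ID + domain number (0 = whole chain).
DOMAIN_ID = re.compile(r"([0-9][A-Za-z0-9]{3})([A-Za-z0-9])(\d+)")


def parse_3d_domain(domain_id):
    """Split a 3D domain ID into PDB code, chain, and domain number."""
    match = DOMAIN_ID.fullmatch(domain_id)
    if match is None:
        raise ValueError(f"unrecognized 3D domain ID: {domain_id!r}")
    pdb_code, chain, number = match.groups()
    domain = int(number)
    return {"pdb": pdb_code, "chain": chain, "domain": domain,
            "whole_chain": domain == 0}


print(parse_3d_domain("1EJ9A0"))
print(parse_3d_domain("1EJ9A2"))
```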
Pfam (Sanger): Pfam-A seeds; HMM-based models representing a wide variety of functional domains, derived from SWISS-PROT
SMART (EMBL): HMM-based models, originally concentrating on eukaryotic signaling domains, now expanding
CD (NCBI): NCBI-curated domains based on sequence and structural alignments
Protein Families
COG (NCBI)
CD-Search Summary
[Figure: CD-Search summary with the catalytic arginine marked]
Entrez Queries
Structure: Find human topoisomerases complexed with dsDNA
topoisomerase AND 2[dnachaincount] AND human[organism]
Structure: Find all fungal structures with bound calcium at 1-2 Å resolution
calcium[ligname] AND fungi[organism] AND 1.0:2.0[resolution]
57 structures
15 structures
Protein: Find all 50-100 kDa proteins with structures published in 2004
2004[pdat] AND 50000:100000[molwt] AND protein structure[filter]
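The queries above are built from three pieces of Entrez syntax: a term tagged with a field (human[organism]), a range using ":" (1.0:2.0[resolution]), and Boolean AND. A small Python sketch that composes them, plus a URL for NCBI's E-utilities ESearch service; the helper names are hypothetical, and using ESearch (rather than the query.fcgi web pages listed under Online Resources) is an assumption about how one might run such queries programmatically. The code only builds strings and fetches nothing:

```python
from urllib.parse import urlencode


def tagged(term, field):
    """Restrict a term to one Entrez search field, e.g. human[organism]."""
    return f"{term}[{field}]"


def ranged(low, high, field):
    """Entrez range syntax using ':', e.g. 1.0:2.0[resolution]."""
    return f"{low}:{high}[{field}]"


def all_of(*terms):
    """Combine terms with the Boolean AND operator."""
    return " AND ".join(terms)


# Assumption: the NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, query):
    """Build (but do not fetch) an ESearch URL for an Entrez database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


calcium = all_of(tagged("calcium", "ligname"),
                 tagged("fungi", "organism"),
                 ranged("1.0", "2.0", "resolution"))
print(calcium)  # calcium[ligname] AND fungi[organism] AND 1.0:2.0[resolution]
print(esearch_url("structure", calcium))
```

Keeping each field term in its own helper call makes it easy to swap one constraint (for example, the organism) while leaving the rest of the query intact.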
GVKWKYLEHKGPVFAPPYDPLP
Goal: Find the most sequence-similar structure
BLAST Related Structures: displays a graphical and text alignment between a query sequence and a similar sequence with a known structure
CD-Search (RPS-BLAST): displays a graphical alignment of all CDs that match a query sequence
[Figure: a RefSeq protein (NP_) record, its Related Structures link, and the matching structure in Cn3D]
Structures in CDs
CDD v2.09: A database of Position Specific Score Matrices (PSSMs)
Single Domains
Pfam (Sanger), e.g. pfam01234: 29.0% have structures (1523 of 5252)
SMART (EMBL), e.g. smart00123: 60.3% have structures (347 of 575)
CD (NCBI), e.g. cd01234: 57.9% have structures directly (1443 of 2494); 100% have structures in their parent (or self)
Protein Families
COG (NCBI), e.g. COG0123
NP_003277
CD-Search Output
pfam02919: A Non-curated CD
[Figure: CD-Search output showing curated CDs, the aligned query, and added structures]
CD-Search
[Figure: CD-Search compared with BLAST against MMDB (827; example structure 1D3Y)]
69.4% hit CDD; 48.0% hit MMDB by BLAST; 45.2% hit both; 27.8% hit neither
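The four percentages are mutually consistent under inclusion-exclusion, assuming they are all fractions of the same set of query sequences:

```latex
\begin{aligned}
P(\text{CDD} \cup \text{MMDB}) &= P(\text{CDD}) + P(\text{MMDB}) - P(\text{CDD} \cap \text{MMDB}) \\
  &= 69.4\% + 48.0\% - 45.2\% = 72.2\% \\
P(\text{neither}) &= 100\% - 72.2\% = 27.8\%
\end{aligned}
```

So CD-Search finds a hit for 69.4 - 45.2 = 24.2% of queries that BLAST against MMDB misses, while only 48.0 - 45.2 = 2.8% are found by BLAST alone.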
Summary
Structure Summary Page
Entrez Structure (MMDB)
Entrez Protein (protein sequences)
Entrez 3D Domains (structural domains for VAST)
Entrez CDD (conserved functional domains)
Searching by text
Use indexing fields, e.g., [organism] and [resolution]
Searching by sequence
Related Structures (uses simple BLASTp with BLOSUM62)
CD-Search (uses PSSMs from CDD): more sensitive!
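CD-Search is more sensitive because each CDD model is a PSSM: every alignment column carries its own score for each amino acid, whereas BLOSUM62 scores a given residue pair the same way at every position. A toy Python sketch of ungapped PSSM scoring; the 3-column matrix, its scores, and the default penalty are hypothetical, and real RPS-BLAST performs gapped alignment and reports E-values:

```python
# Hypothetical 3-column PSSM: per-position scores for selected residues.
# Residues not listed in a column receive DEFAULT_SCORE (an illustrative assumption).
DEFAULT_SCORE = -1

TOY_PSSM = [
    {"R": 6, "K": 2},   # column 1 strongly prefers arginine
    {"G": 5, "A": 1},   # column 2 prefers glycine
    {"D": 4, "E": 3},   # column 3 prefers an acidic residue
]


def score_window(window, pssm):
    """Sum the position-specific score of each residue in the window."""
    return sum(column.get(aa, DEFAULT_SCORE) for aa, column in zip(window, pssm))


def best_hit(seq, pssm):
    """Ungapped scan: return (best score, 1-based start), or None if seq is too short."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        score = score_window(seq[start:start + width], pssm)
        if best is None or score > best[0]:
            best = (score, start + 1)
    return best


print(best_hit("MKGERGDA", TOY_PSSM))
```

Here both KGE and RGD fit the profile, but RGD scores higher because column 1 rewards arginine far more than lysine; a single substitution matrix cannot express that kind of position dependence.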
NCBI Course
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/dna_photo.html
Workshop I
Course Main Page Session I Exercises
6/19/2006 5:25 PM
10. Display your annotation
In the structure window, choose Show/Hide / Show Selected Residues. If you still see helix or strand objects, choose Style / Edit Global Style and uncheck the boxes for Helix Object and Strand Object. Click anywhere in the sequence window to remove the highlighting.
11. Save your work
You can now use File / Save As to save the annotated file for future use.
Revised January 4, 2006
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/nos.html
You can now save the annotated file for future use.
Revised January 16, 2007
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/arg_kin.html
click the highlighted residues that are NOT arginine to deselect them.
9. Create a custom annotation
Choose Style / Annotate, click "New", and give the annotation the name "Arg". Click Edit Style, check the protein side chain box, and render the side chains as "Tubes" with a color of your choice.
10. Label residues in the annotation
Click the Labels tab and set the protein backbone spacing to 1 and the type to One Letter. Click Done, OK, and then Done. Now deselect the residues to see the annotated color.
11. Add a second annotation
The catalytic residue is thought to be either E225 or E314. Using Style / Annotate, create a new annotation in which the side chains of these residues are rendered as ball-and-stick models colored by charge. Label these with one-letter labels as before.
12. Save your work
Experiment with turning the annotations on and off in the Annotation panel. You can now save the annotated file for future use.
Revised January 16, 2007
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/rnaseh.html
9. Create a custom annotation
In the structure window, choose Style / Annotate. Click "New" and enter a name for the annotation. Click "Edit Style" and uncheck the boxes next to Helix Objects and Strand Objects.
10. Display and label the side chains
Check the box next to Protein side chains, select "Tubes" from the menu to the right, and select a color of your choice. Click the Labels tab, and set the spacing to 1 under Protein Backbone. Click Done, then OK, and then Done.
11. Annotate a metal ion
Double-click the nearby magnesium ion. Create a second annotation using Style / Annotate, and color the magnesium ion as you like, using the Heterogen row of controls.
12. View your annotations
In the structure window, choose Show/Hide / Show Selected Residues. Double-click in any white area of the sequence window to remove the highlighting. Using the File menu, you can now save the file for future use.
Revised June 19, 2006
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/hfe.html
7. Interpret your results
What structural role does the residue that aligns to C282 in NP_000401 play? What could be the consequence of mutating this residue to a tyrosine? Click here for a hint.
Revised November 22, 2005
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/Rtrna.html
9. Explore charges at the molecular interface
Choose Style / Edit Global Style, and render the protein side chains as space-filling models colored by charge. What charges are present at the interface?
10. Explore the quality of the structural data at the molecular interface
Choose Style / Coloring Shortcuts / Temperature. Which areas of the protein and RNA are the least well determined (more red)? Which are the most well determined (more blue)? Where are these portions relative to the interface?
11. Label residues in the interface
Choose Style / Edit Global Style, and render the protein side chains as ball-and-stick models. Click the Labels tab and change the protein backbone spacing to 1. Click Done.
12. Examine contacts in the interface
Choose Style / Coloring Shortcuts / Molecule. Zoom in and double-click a labeled residue. Use Show/Hide / Search by distance, Other Molecules to find RNA atoms within 3.5 Angstroms of this residue. Repeat for other residues as you like.
13. Draw conclusions about contacts in the interface
What are the most common amino acids involved in the contacts? What part of the RNA is involved in most of the contacts?
Revised November 22, 2005
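Searching by distance (steps 12 and 13) comes down to a Euclidean distance test between atoms. A minimal Python sketch that finds partner atoms within 3.5 Å of a selected residue and tallies the distinct residues they belong to; the atom tuples and coordinates are hypothetical, and this is not Cn3D's own code (Cn3D may, for example, treat the cutoff boundary differently):

```python
import math
from collections import Counter

# Each atom is a hypothetical tuple:
# (molecule, residue_name, residue_number, atom_name, (x, y, z))


def atoms_within(target_atoms, other_atoms, cutoff=3.5):
    """Atoms from other_atoms within `cutoff` angstroms of any atom in target_atoms."""
    return [
        atom for atom in other_atoms
        if any(math.dist(atom[4], t[4]) <= cutoff for t in target_atoms)
    ]


def residue_tally(atoms):
    """Count distinct residues among the atoms, keyed by residue name."""
    residues = {(mol, name, num) for mol, name, num, _, _ in atoms}
    return Counter(name for _, name, _ in residues)


arg = [("protein", "ARG", 45, "NH1", (0.0, 0.0, 0.0))]
rna = [
    ("RNA", "G", 10, "N7", (2.0, 2.0, 0.0)),   # about 2.83 A away
    ("RNA", "G", 10, "O6", (3.0, 0.0, 0.0)),   # 3.0 A away
    ("RNA", "C", 11, "N4", (5.0, 0.0, 0.0)),   # 5.0 A away, outside the cutoff
]
contacts = atoms_within(arg, rna)
print(len(contacts), residue_tally(contacts))
```

Tallying distinct residues rather than atoms keeps a single large side chain with many nearby atoms from dominating the count of "most common" contacting residues.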
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/entrez_structure...
1. Begin with Preview/Index in Entrez Structure
From the NCBI home page, click the "All Databases" link on the top bar. Click Structure in the list of databases. Click Preview/Index to view the list of available fields.
2. Search 1
Find all protein structures from primates. Use the fields [Organism] and [ProteinChainCount]. Add a term to your search to find all structures of protein-DNA complexes from primates containing only double-stranded DNA. Use the field [DnaChainCount]. Go to the next step to check your answer.
3. Answer 1
Click here to check your answer.
4. Search 2
Using the [LigName] field, extend Search 1 to only those complexes containing a zinc ion. Go to the next step to check your answer.
5. Answer 2
Click here to check your answer.
6. Search 3
Find all complexes containing at least one protein chain, one DNA chain, and one RNA chain. First use "all[Filter]" to find all structures, then use terms such as "NOT 0[DnaChainCount]".
7. Answer 3
Click here to check your answer.
8. Search 4
Find all structures determined by X-ray diffraction from Archaea at resolutions better than 2.0 Angstroms. Use the ":" range operator as follows: 0.0:2.0[resolution]. Use the [Filter] field to limit the results to structures linked to SNP records, which indicates that they are structural models for a known SNP. Using Index in Preview/Index, look for terms in the [Filter] field such as "structure abc", which find links between Structure and database abc.
9. Answer 4
Click here to check your answer.
10. Search 5
Find all fungal structures that contain only a single RNA chain and one bound ligand.
11. Answer 5
Click here to check your answer.
Revised January 4, 2006
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/entrez_3Ddom...
1. Begin with Preview/Index in Entrez 3D Domains
From the NCBI home page, click the "All Databases" link on the top bar. Click 3D Domains in the list of databases. Click Preview/Index to view the list of available fields.
2. Search 1
Find all structures of mammalian proteins containing only 2 helices and 2 strands. Use the fields [helixcount] and [strandcount], and remember to limit [domainno] to 0.
3. Answer 1
Click here to check your answer.
4. Search 2
Find all protein structures from green plants with molecular weights greater than 100 kDa. Use a term such as 100000:1000000[molweight].
5. Answer 2
Click here to check your answer.
6. Search 3
Find all polypeptide chains containing between 10 and 20 3D domains. Use the [domainno] field and the range operator ":". For example, 5:8[domainno] finds chains with 5-8 3D domains. Find the polypeptide with the largest number of 3D domains. How many does it have?
7. Answer 3
Click here to check your answer.
8. Search 4
Find all human 3D Domains published between October 15, 2004 and November 15, 2004. Use the [pdat] field with the date format YYYY/MM/DD.
9. Answer 4
Click here to check your answer.
Revised January 4, 2006
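Searches 3 and 4 both rely on the ":" range operator, once with counts and once with dates written as YYYY/MM/DD. A short Python sketch that formats such range terms; the helper names are hypothetical, and the dates in the example are arbitrary rather than the answer to Search 4:

```python
from datetime import date


def entrez_range(low, high, field):
    """Entrez range term using the ':' operator, e.g. 5:8[domainno]."""
    return f"{low}:{high}[{field}]"


def pdat_range(start, end):
    """Publication-date range with both dates written as YYYY/MM/DD."""
    return entrez_range(f"{start:%Y/%m/%d}", f"{end:%Y/%m/%d}", "pdat")


print(entrez_range(5, 8, "domainno"))                     # 5:8[domainno]
print(pdat_range(date(2004, 1, 1), date(2004, 3, 31)))  # 2004/01/01:2004/03/31[pdat]
```

Formatting from `date` objects guarantees the zero-padded month and day that the YYYY/MM/DD format requires.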
For each 3D domain, locate SSEs (secondary structure elements), and represent them as individual vectors.
[Figure: SSE vectors of human IL-4]
VAST: Calculate [ij]_k (vector position about the z axis)
For both the query and target structures, calculate the midpoint of each SSE. For each SSE k, align k along z and project the midpoints onto the xy plane. Then calculate [ij]_k for i ≠ k, j ≠ k.
For both the query and target structures: for each SSE k, set the origin at the midpoint of k.
[Figure: position of SSE 3 relative to SSE 1, given by r13 and z13, with midpoints projected onto the xy plane]
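The symbol for the quantity computed about each SSE k was lost from the slides, so the following Python sketch rests on an assumption: that [ij]_k is the angle between the xy-plane projections of the midpoints of SSEs i and j in a frame whose origin is the midpoint of k and whose z axis runs along k. The same frame yields the r (distance from the axis) and z (offset along the axis) values labeled r13 and z13 above. SSEs are hypothetical (start, end) coordinate pairs; this is not VAST's implementation:

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def midpoint(start, end):
    return tuple((s + e) / 2 for s, e in zip(start, end))


def frame_coordinates(sse_k, point):
    """Coordinates of `point` with the origin at the midpoint of SSE k and z along k.

    Returns (r, z, radial): radial is the component perpendicular to z,
    i.e. the projection of the point onto the xy plane.
    """
    start, end = sse_k
    axis = sub(end, start)
    unit = tuple(c / norm(axis) for c in axis)
    d = sub(point, midpoint(start, end))
    z = dot(d, unit)
    radial = tuple(dc - z * uc for dc, uc in zip(d, unit))
    return norm(radial), z, radial


def angle_about_k(sse_k, sse_i, sse_j):
    """Unsigned angle (degrees) between the projected midpoints of SSEs i and j."""
    _, _, p_i = frame_coordinates(sse_k, midpoint(*sse_i))
    _, _, p_j = frame_coordinates(sse_k, midpoint(*sse_j))
    cos_t = dot(p_i, p_j) / (norm(p_i) * norm(p_j))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


k = ((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))   # SSE k along the z axis
i = ((1.0, 0.0, 0.0), (1.0, 0.0, 2.0))
j = ((0.0, 2.0, -1.0), (0.0, 2.0, 1.0))
print(frame_coordinates(k, midpoint(*i))[:2], angle_about_k(k, i, j))
```

Because every quantity is measured relative to SSE k, it does not change when the whole structure is rotated or translated, which is what allows the query and target to be compared without first superimposing their coordinates. A midpoint lying exactly on k's axis has no defined projection and would cause a division by zero here.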
VAST: Refinement
Aligned residues are red
Cα atoms are added to the aligned SSEs.
Alignments are allowed to extend beyond SSE boundaries.
All atoms are added to the models, and the detailed backbone and side-chain positions are refined.
[Figure: graph of SSE pairings between IL-4 and IL-6 (arcs: 16<>15)]
Arcs must follow sequence order. Select the path with the highest weights.
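The rule that arcs must follow sequence order, with the highest-weight path selected, can be illustrated with a simplified dynamic-programming sketch in Python. This is a named substitute for illustration only, not VAST's actual procedure; the SSE pairings and weights below are hypothetical:

```python
def best_ordered_path(pairs):
    """Highest-weight chain of SSE pairings that follows sequence order.

    pairs: list of (query_sse, target_sse, weight). A pairing may follow another
    only if both its query and its target SSE indices are larger.
    Returns (total_weight, chosen_pairs).
    """
    ordered = sorted(pairs)
    if not ordered:
        return 0.0, []
    best = [w for _, _, w in ordered]   # best total of a chain ending at each pairing
    prev = [None] * len(ordered)        # back-pointers for rebuilding the chain
    for b, (qb, tb, wb) in enumerate(ordered):
        for a in range(b):
            qa, ta, _ = ordered[a]
            if qa < qb and ta < tb and best[a] + wb > best[b]:
                best[b] = best[a] + wb
                prev[b] = a
    end = max(range(len(ordered)), key=best.__getitem__)
    path, idx = [], end
    while idx is not None:
        path.append(ordered[idx])
        idx = prev[idx]
    return best[end], path[::-1]


# Hypothetical pairings (query SSE, target SSE, weight).
pairs = [(1, 1, 2.0), (2, 3, 1.5), (3, 2, 3.0), (4, 4, 1.0)]
print(best_ordered_path(pairs))
```

The pairings (2, 3) and (3, 2) cross each other, so no order-preserving path can use both; the sketch keeps the heavier (3, 2).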
VAST: Summary
Secondary structure elements are represented as vectors and are aligned based on their relative orientations
VAST ignores loops and tolerates variation in SSE length.
The initial alignment is wholly ignorant of atomic coordinates.
[Figure: Helix 2 and Helix 3 with their N and C termini marked]
VAST: Scoring
$p = d \cdot P(s > s_0, n) \cdot c(n, P_1, P_2)$
The probability that the VAST alignment occurred by chance.
Search space: Number of possible alignments of n SSEs between vector sets P1 and P2.
Query by whole chain: not found using the whole chain query!
Query by domain
[Figure: Cn3D view]
Structure + Function
VAST finds proteins that have similar 3D folds.
CD-Search finds proteins that have similar sequences and similar functions.
Curated CDs = VAST + CD-Search: proteins that have similar 3D folds, similar sequences, and similar functions.
[Figure: Cn3D views and a plot of model alignment RMS (0-12) along residue positions 10-100 for VAST and cd00203 (A. Marchler-Bauer)]
cd00659: A Curated CD
CD Family Values
Residues aligned in the parent must be aligned in the child
Parent: cd00397, C-terminal catalytic domain of DNA breaking-rejoining enzymes
[Figure: parent and child alignments (164 and 218 columns)]
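The family rule that residues aligned in the parent must be aligned in the child can be checked mechanically. A minimal Python sketch, assuming a hypothetical representation in which each CD maps a sequence ID to the set of residue positions it aligns, and comparing only sequences present in both alignments:

```python
def parent_child_violations(parent, child):
    """Report residues aligned in the parent CD but not in the child CD.

    parent, child: dict mapping sequence ID -> set of aligned residue positions.
    Only sequences present in both alignments are compared.
    Returns {sequence_id: sorted positions missing from the child}.
    """
    problems = {}
    for seq_id in parent.keys() & child.keys():
        missing = parent[seq_id] - child[seq_id]
        if missing:
            problems[seq_id] = sorted(missing)
    return problems


# Hypothetical alignments: seqB's residue 5 is aligned in the parent only.
parent = {"seqA": {10, 11, 12}, "seqB": {5, 6}}
child = {"seqA": {10, 11, 12, 13}, "seqB": {6, 7}}
print(parent_child_violations(parent, child))
```

The child may align extra residues (it has more columns than a shared core requires), so only positions missing from the child count as violations.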
Curated CD Summary
[Figure: curated CD in Cn3D with the catalytic residue marked]
structure to the query
B. VAST: Find the structural neighbors of the most sequence-similar structure
C. Cn3D: Import the sequence and align it to the VAST alignment using the algorithms in Cn3D
[Figure: importing a sequence and aligning it with BLAST in Cn3D]
[Figure: curated CDs; a geometry violation]
Fix these problems by adjusting the block lengths!
Geometry violations: indicated by green shading. These result when a loop between aligned blocks in the import window is too short to span the distance between the block ends, based on the master structure in the template.
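The geometry check described above can be approximated with one distance comparison. A Python sketch, assuming that consecutive Cα atoms lie about 3.8 Å apart, so a loop of n unaligned residues can bridge at most (n + 1) × 3.8 Å between the Cα atoms at the ends of the flanking blocks; the exact criterion Cn3D uses is not given here, and the names are hypothetical:

```python
import math

# Approximate spacing of consecutive C-alpha atoms in angstroms (assumed upper bound per step).
CA_CA_DISTANCE = 3.8


def is_geometry_violation(block_a_end, block_b_start, loop_residues):
    """True if `loop_residues` unaligned residues cannot bridge the gap.

    block_a_end, block_b_start: (x, y, z) C-alpha coordinates, taken from the
    master structure, of the residues at the ends of two adjacent aligned blocks.
    A loop of n residues contributes n + 1 C-alpha-to-C-alpha steps.
    """
    gap = math.dist(block_a_end, block_b_start)
    return gap > (loop_residues + 1) * CA_CA_DISTANCE


print(is_geometry_violation((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 1))  # True
print(is_geometry_violation((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 2))  # False
```

This is why shortening a block fixes the problem: each residue removed from a block end is added to the loop, raising the distance the loop can span.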
[Figure: pfam02518 viewed in Cn3D]
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/plasmodium.html
Workshop II
Course Main Page Workshop II Exercises Amino Acid Explorer
In the panel, select the structure evidence in the right-hand window and click Show. Make a note of the comment displayed and the highlighted region of the structure. Verify the comment by highlighting the active site residue.
9. Investigate a possible binding surface
Study the distinctive shape of the domain by rendering it as a space-filling model (use Style / Edit Global Style). Color the molecule by charge (use Style / Coloring Shortcuts). Where do you think the DNA might bind? Quit Cn3D when you are finished.
Revised June 19, 2006
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/lhx3.html
In the row to the right of the List button, change Graphics to Table, and click List.
9. View the VAST alignment
Check the boxes to the left of the top three structure neighbors. Click View 3D Structure to launch Cn3D.
10. Import NP_055379
In the alignment window, choose Edit / Enable Editor. Choose Imports / Show Imports. In the Imports window, choose Edit / Import Sequences. Choose Network via accession and click OK. Type NP_055379 in the box and click OK.
11. Align NP_055379
Choose Algorithms / BLAST/PSSM and click anywhere on the pair of sequences. If there are red-shaded regions, try the Block Aligner; otherwise, skip the next two instructions. Choose Algorithms / Block Align Single and click the sequences. In the dialog, uncheck Global Alignment, then click OK. Choose Alignments / Merge All to merge your new alignment. Close the Imports window.
12. Find the SNP position in the alignment
Find the SNP position in NP_055379 (the bottom row) in the alignment window by moving your mouse across the sequence and monitoring the location in the lower left corner of the alignment window. In the alignment window, choose Mouse Mode / Select Columns. Click the alignment column at the SNP position.
13. Locate the SNP position in the structure
Locate the wild-type residue in the structure. Which metal binding site is it near? Clear your highlighting by clicking in whitespace in the alignment window. Discover which amino acids contact the metal ion by double-clicking the metal ion in any of the structures and searching by distance with a radius of 4 Angstroms. What residues are they?
14. Investigate the possible changes caused by the mutation
Click here to launch the Amino Acid Explorer. Using the Compare menus in the left bar of the page, compare the wild-type (Tyr) and mutant (Cys) residues using Text. What are the significant differences between these residues? Compare them using Graphics to assess the relative sizes and shapes of the side chains.
15. Find residues in contact with the SNP position
Return to Cn3D. Again select the column at the SNP position as you did before. In the structure window, choose Show/Hide / Select by Distance / Residues Only. Set the radius to 3.0 Angstroms.
16. Predict the consequences of the SNP
Are any of the residues that contact the ligand within 3.0 Angstroms of the SNP position? Knowing what the mutant residue is, what additional interactions might occur in the mutant protein? What might be the biochemical and biological consequences of these interactions?
Revised January 17, 2007
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/RNaseH.html
9. Analyze the conserved domains in the structure
What is this functional domain? What kind of CD record is this (from which source database)? Do any of the other aligned structures look familiar? Find the features menu in the "Show Alignment" row. What functional features are annotated on this domain?
10. View the CD in Cn3D
Launch Cn3D by clicking "Show Structure." Click "Show Annotations Panel" in the CDD Descriptions panel, and highlight each functional feature. Do any of them occur near the long, internal disordered region in 1TFR? What might this region do?
11. Locate structural features unique to 1TFR
Select Style / Coloring Shortcuts / Object. Can you find a region, close in space to the disordered region analyzed in the previous step, that is unique to 1TFR? Can you design a deletion mutant of 1TFR that may reveal specific interactions between RNase H and other proteins?
12. EXTRA CREDIT (optional): Use BLAST to find sequence-similar structures to 1TFR
Keep Cn3D open, and open a new browser window. From the NCBI home page, retrieve 1TFR in Entrez Protein. Click "Related Structures" in the Links menu to the right of the accession.
13. EXTRA CREDIT (optional): View the BLAST results
Do you find any structures? Do any of the structures look familiar from your VAST search? Compare the BLAST alignment to the VAST alignment between these structures and 1TFR. How did BLAST do?
Revised January 17, 2007
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/photolyase.html
Make a list of the residues that contact FAD in E. coli. Based on the BLAST alignment, which of these residues are conserved in both proteins? Quit Cn3D when you are finished.
8. Use VAST to find structure neighbors to 1IQR
Return to the NCBI home page, and click Structure on the top toolbar. Enter 1IQR in the search box and click Go. Click the accession to load the structure summary page. Click the gray bar labeled Chain A to load the VAST neighbors.
9. View the VAST alignment in Cn3D
Type the PDB code (1DNP) for the E. coli structure in the Find box and click Find. Check the box to the left of chain A of the E. coli structure (make sure you choose the entire chain, the row with the longest alignment). Click "View 3D Structure".
10. Analyze the co-factor binding site
Referring to your list of residues contacting FAD in E. coli, determine whether these residues are conserved in 1IQR according to the VAST alignment. Select Style / Coloring Shortcuts / Object. Highlight the FAD in 1IQR and find the residues within 3.0 Angstroms of the ligand. Do the same residues contact FAD in both structures? Are the residues that contact FAD in both structures conserved?
11. Locate potential features conferring thermostability
Choose Style / Coloring Shortcuts / Secondary Structure (helices are green, strands are tan, and loops/coils are blue). Which sequence has more gaps (represented as ~ symbols)? In what kind of secondary structure element do they occur most frequently?
12. Locate potential features conferring thermostability
In the sequence window, choose View / Find Pattern, and search for PPP (a proline triplet). Locate any matches. In which sequence is it? Are there other prolines nearby? View these residues in the structure. Proline-rich sequences tend to form polyproline helices. Do you see any evidence of that happening here?
13. Develop a hypothesis
Using the evidence gathered in the previous steps, form a hypothesis about how the T. thermophilus structure may be more stable at high temperature than the E. coli protein.
Revised June 19, 2006
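The pattern search in step 12 and the follow-up question about nearby prolines can be sketched in a few lines of Python. The sequence below is hypothetical, the `~` gap symbols are stripped first so positions refer to the ungapped sequence, and the lookahead regex reports overlapping matches (PPPP contains two PPP matches), which Cn3D's Find Pattern may or may not do:

```python
import re


def find_pattern(seq, pattern="PPP"):
    """1-based start positions of every (possibly overlapping) match, ignoring '~' gaps."""
    ungapped = seq.replace("~", "")
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(pattern)})", ungapped)]


def prolines_near(seq, position, window=5):
    """Count prolines within `window` residues on either side of a 1-based position."""
    ungapped = seq.replace("~", "")
    i = position - 1
    low, high = max(0, i - window), min(len(ungapped), i + window + 1)
    return ungapped[low:high].count("P")


seq = "MK~PPPPGAP"  # hypothetical aligned row with one gap
print(find_pattern(seq), prolines_near(seq, 3))
```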
http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/challenge.html
master. Click the accession of the most similar structure. Click the gray chain A bar to load the VAST neighbors.
8. Analyze the VAST neighbors
Change the List subset menu to "High redundancy" and click List. Compare the three best VAST neighbors to the sequences found by BLAST. How do the VAST alignments compare to the BLAST alignments? Which are better?
9. Import a VAST alignment into Cn3D
Change the menu from 'Graphics' to 'Table' and click List. Check the boxes to the left of the three best VAST hits that are photolyases and that are not identical to the query (check the %id column). Click "View 3D Alignment".
10. Clean up the VAST alignment
In the sequence window, select Edit / Enable Editor. In the structure window, select Style / Coloring Shortcuts / Secondary Structure. Using the scroll bar in the sequence window, carefully scan the alignment for blocks of fewer than 4 residues. Delete these blocks by selecting Edit / Delete Block and then clicking the block to delete. In what kind of secondary structure elements were these little blocks?
11. Import AAB32328 into Cn3D
Select Imports / Show Imports. In the Imports window, select Edit / Import Sequences. Select Network and enter AAB32328. You should see AAB32328 (gi 688103) paired, but unaligned, to the master structure.
12. Alignment trial 1: PSI-BLAST
Select Algorithms / BLAST/PSSM Single. Position the pointer over the sequence pair, and click. Position the pointer over the aligned pair, and find the E-value of the alignment shown at the bottom of the window.
13. Alignment trial 1: PSI-BLAST
Look for red-shaded areas, which indicate portions of the PSI-BLAST alignment that do not match the VAST alignment, and for green-shaded areas, which indicate geometry violations. What portion(s) of the sequence did not align?
14. Alignment trial 2: BLOCK Aligner
Select Algorithms / Block Align Single. Click the paired sequences. In the options window, accept all defaults except uncheck "Global Alignment."
15. Fixing a geometry violation
Find the two green-shaded residues in the bottom row. The shading indicates a geometry violation, usually meaning that the blocks are too tight and need to be relaxed (shortened). In the alignment window, choose Mouse Mode / Horizontal drag.
16. Fixing a geometry violation
Find the block on the left side of the geometry violation. Note that this block ends with random coil. Click the block end (->) and drag it one residue to the left. Repeat the BLOCK Aligner. The green should disappear!
17. Now for the rest of the sequence...
What portion of the sequence still does not match? At this point we will focus mainly on this problematic region.
18. Alignment trial 3: Threader
Choose Algorithms / Thread Single. Click the paired sequences. Accept all defaults, but uncheck "Merge results after each row is threaded". Click OK to begin threading. The Threader will now attempt to align only the portions of the sequence not already in blocks. This may take a few minutes.
19. Merge the trial alignment
When blocks appear, the Threader is finished. Position the pointer on the aligned sequences to see the score at the bottom of the window. Select Alignments / Merge All to merge the new alignment into the VAST alignment. AAB32328 now appears as the bottom row.
20. Analyze the new alignment
You can now evaluate the alignment by coloring by sequence conservation or by hydrophobicity (looking for conserved hydrophobic cores), or by finding the residues that contact the FAD cofactor. This is a very preliminary example of using the Threader, which in general should be run several times using different block alignment structures and combinations of frozen blocks, followed by analysis of the threading scores and resulting alignments. This preliminary alignment is only the beginning!
Revised January 17, 2007
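Step 10 of this exercise asks you to scan the VAST alignment for blocks shorter than 4 residues before importing a new sequence. A trivial Python sketch of that filter, assuming a hypothetical representation of each aligned block as inclusive (first, last) residue positions on the master sequence rather than Cn3D's actual file format:

```python
def short_blocks(blocks, min_length=4):
    """Return aligned blocks shorter than `min_length` residues.

    blocks: list of (first, last) residue positions on the master sequence, inclusive.
    """
    return [(first, last) for first, last in blocks if last - first + 1 < min_length]


print(short_blocks([(1, 10), (15, 17), (20, 23)]))
```

Very short blocks often sit in loops or at the edges of secondary structure elements, where structural superposition is least reliable, which is one reason to remove them before aligning a new sequence.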