
Exploring 3D Molecular Structures Using NCBI Resources

A Field Guide
Eric Sayers (sayers@ncbi.nlm.nih.gov)
Medha Bhagwat (bhagwat@ncbi.nlm.nih.gov)
Revised 11/30/06

Online Resources
Course Resources
Course home page: http://www.ncbi.nlm.nih.gov/Class/Structure/course.html
PowerPoint slides: ftp://ftp.ncbi.nih.gov/pub/sayers/Structure/
Workshop I exercises: http://www.ncbi.nlm.nih.gov/Class/Structure/handson_I.html
Workshop II exercises: http://www.ncbi.nlm.nih.gov/Class/Structure/handson_II.html
Alignment Guide: http://www.ncbi.nlm.nih.gov/Class/Structure/align_guide.html
Amino Acid Explorer: http://www.ncbi.nlm.nih.gov/Class/Structure/aa/aa_explorer.cgi
PSSM Viewer: http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi

General NCBI
NCBI home page: http://www.ncbi.nlm.nih.gov
About NCBI: http://www.ncbi.nlm.nih.gov/About
NCBI news: http://www.ncbi.nlm.nih.gov/About/newsletter.html

NCBI Structure Sites
Structure main page: http://www.ncbi.nlm.nih.gov/Structure
MMDB main page: http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
CDD main page: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
VAST page: http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
PubChem page: http://pubchem.ncbi.nlm.nih.gov
NCBI Handbook: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch3

NCBI Structure Tools
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
CD-Search: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
VAST Search: http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html
CDART: http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps
NCBI Threader: http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/threading.shtml
Entrez Structure: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
Entrez 3D Domains: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Domains
Entrez Domains: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd
COG: http://www.ncbi.nlm.nih.gov/COG/
MMDB URLs: http://www.ncbi.nlm.nih.gov/Structure/MISC/linking.html#mmdbsrv

Other Databases
PDB: http://www.rcsb.org/pdb/
Pfam: http://pfam.wustl.edu/
SMART: http://smart.embl-heidelberg.de/

NCBI Structure FTP Resources
CDD: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/
Cn3D: ftp://ftp.ncbi.nih.gov/cn3d/
CDART: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdart/
RPS-BLAST: ftp://ftp.ncbi.nih.gov/blast/

Literature
Structural Biology
1. Bourne, P.E. and Weissig, H., eds. Structural Bioinformatics. John Wiley & Sons, Hoboken, NJ. 2003.
2. McPherson, A. Introduction to Macromolecular Crystallography. John Wiley & Sons, Hoboken, NJ. 2002.
3. Cavanagh, J., Fairbrother, W.J., Palmer, A.G., and Skelton, N.J. Protein NMR Spectroscopy. Academic Press, San Diego. 1996.

NCBI Databases
1. Chen, J. et al. (2003) MMDB: Entrez's 3D-structure database. Nucleic Acids Res 31, 474-477.
2. Marchler-Bauer, A. et al. (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 31, 383-387.

VAST
1. Gibrat, J.-F. et al. (1996) Surprising similarities in structure comparison. Curr Opin Struct Biol 6, 377-385.
2. Madej, T. et al. (1995) Threading a database of protein cores. Proteins 23, 356-369.

CDART
1. Geer, L.Y. et al. (2002) CDART: protein homology by domain architecture. Genome Res 12, 1619-1623.

Other relevant NCBI publications are linked from http://www.ncbi.nih.gov/Structure/.

Lecture Topics
Exploring 3D Molecular Structures Using NCBI Tools
Lecture 1: Structures in Cn3D

Lecture 1
Overview of Structural Informatics at NCBI
Indexing Structural Data at NCBI
Entrez Protein
Entrez Structure
Entrez 3D Domains
Entrez Conserved Domains (CDD)
Finding Structures by text queries (Entrez)
Finding Structures by sequence (BLAST, CD-Search)

Lecture 2
Finding Structures by 3D and Functional Homology

March 30, 2007

Structural Informatics in Entrez


[Figure: Entrez databases and the concept each one indexes. 3D Domains: folding units. Domains: function. Structure: 3D conformation. Protein: protein sequence. PubChem Compound: chemical formula. PubChem Substance: bioactive ligand. PubChem BioAssay: biological activity. Record counts printed in the figure, in extraction order: 186,138; 12,589; 41,527; 10,129,516; 10,197,719; 434; 17,234,522.]

Growth of Entrez Structure

Solving Structures
X-Ray Crystallography
Resolution
Disorder
Temperature
NMR Spectroscopy
RMSD (Å)

More About Resolution
1EJG: Crambin at 0.54 Å (protons!!)
2TMA: Tropomyosin at 15 Å (only alpha carbons!!)
[Figure: Pro-Phe-Ile rendered in Cn3D at 5 Å, 3 Å, and 1 Å resolution]

Bond    r (Å)
C-S     1.82
C-C     1.54
C-N     1.47
C-O     1.43
S-H     1.34
C=O     1.20
C-H     1.09
N-H     1.01
O-H     0.96
PDB File: Header

HEADER    ISOMERASE/DNA                           01-MAR-00   1EJ9
TITLE     CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: DNA TOPOISOMERASE I;
COMPND   3 CHAIN: A;
COMPND   4 FRAGMENT: C-TERMINAL DOMAIN, RESIDUES 203-765;
COMPND   5 EC: 5.99.1.2;
COMPND   6 ENGINEERED: YES;
COMPND   7 MUTATION: YES;
COMPND   8 MOL_ID: 2;
COMPND   9 MOLECULE: DNA (5'-
COMPND  10 D(*C*AP*AP*AP*AP*AP*GP*AP*CP*TP*CP*AP*GP*AP*AP*AP*AP*AP*TP*
COMPND  11 TP*TP*TP*T)-3');
COMPND  12 CHAIN: C;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: DNA (5'-
COMPND  16 D(*C*AP*AP*AP*AP*AP*TP*TP*TP*TP*TP*CP*TP*GP*AP*GP*TP*CP*TP*
COMPND  17 TP*TP*TP*T)-3');
COMPND  18 CHAIN: D;
COMPND  19 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 EXPRESSION_SYSTEM_COMMON: BACULOVIRUS EXPRESSION SYSTEM;
SOURCE   4 EXPRESSION_SYSTEM_CELL: SF9 INSECT CELLS;
SOURCE   5 MOL_ID: 2;
SOURCE   6 SYNTHETIC: YES;
SOURCE   7 MOL_ID: 3;
SOURCE   8 SYNTHETIC: YES
KEYWDS    PROTEIN-DNA COMPLEX, TYPE I TOPOISOMERASE, HUMAN
...
REMARK   1
REMARK   2
REMARK   2 RESOLUTION. 2.60 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : X-PLOR 3.1
REMARK   3   AUTHORS     : BRUNGER
REMARK 280
REMARK 280 CRYSTALLIZATION CONDITIONS: 27% PEG 400, 145 MM MGCL2, 20
REMARK 280 MM MES PH 6.8, 5 MM TRIS PH 8.0, 30 MM DTT
REMARK 290

PDB File: Data

ATOM      1  N   TRP A 203      30.156  -4.908  37.767  1.00 50.81           N
ATOM      2  CA  TRP A 203      30.797  -4.667  36.431  1.00 49.96           C
ATOM      3  C   TRP A 203      30.369  -3.337  35.766  1.00 49.18           C
ATOM      4  O   TRP A 203      29.315  -3.238  35.147  1.00 49.27           O
ATOM      5  CB  TRP A 203      30.518  -5.863  35.513  1.00 46.77           C
ATOM      6  CG  TRP A 203      30.847  -5.651  34.081  1.00 44.60           C
ATOM      7  CD1 TRP A 203      32.028  -5.234  33.553  1.00 49.72           C
ATOM      8  CD2 TRP A 203      29.980  -5.876  32.984  1.00 43.73           C
ATOM      9  NE1 TRP A 203      31.956  -5.191  32.177  1.00 45.45           N
ATOM     10  CE2 TRP A 203      30.704  -5.582  31.805  1.00 45.23           C
ATOM     11  CE3 TRP A 203      28.657  -6.305  32.877  1.00 46.48           C
ATOM     12  CZ2 TRP A 203      30.149  -5.705  30.539  1.00 46.06           C
ATOM     13  CZ3 TRP A 203      28.101  -6.431  31.622  1.00 43.08           C
ATOM     14  CH2 TRP A 203      28.849  -6.131  30.463  1.00 45.77           C

Fields in the first ATOM record:
Number: 1
Atom Name: N
Residue Name: TRP
Chain ID: A
Residue Number: 203
Coordinates (x, y, z): 30.156, -4.908, 37.767
Occupancy: 1.00
Temperature Factor: 50.81

Issues:
Justification
Nomenclature

From Coordinates to Models
1EJ9: Human topoisomerase I
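The field layout above can be read programmatically by slicing fixed-width columns. The following is a minimal sketch, assuming the standard PDB column positions for ATOM records (serial in columns 7-11, atom name in 13-16, and so on); the `Atom` type and `parse_atom_line` function are illustrative names, not part of any NCBI tool. The record is assembled piece by piece so that the column widths are explicit, which also illustrates the "justification" issue: a field shifted by one column is misread.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-width ATOM record (comments give 1-based PDB columns)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    line = line.ljust(80)
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )


# First ATOM record of 1EJ9, built field by field to make the widths explicit.
record = (
    "ATOM  " + f"{1:5d}" + " " + f"{' N':<4s}" + " " + "TRP" + " " + "A"
    + f"{203:4d}" + " " * 4
    + f"{30.156:8.3f}{-4.908:8.3f}{37.767:8.3f}{1.00:6.2f}{50.81:6.2f}"
    + " " * 10 + f"{'N':>2s}"
)

atom = parse_atom_line(record)
print(atom)
```

Fixed-column slicing is used rather than splitting on whitespace because adjacent PDB fields can run together (for example, a four-character atom name followed directly by the residue name).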

Entrez Structure (MMDB)


Pubmed PDB Description Taxonomy VAST Neighbors

Indexing into MMDB


Structure
Import only experimentally determined structures
Create backbone model (Cα, P only)
Convert to ASN.1
Create single-conformer model
Verify sequences
Add secondary structure

id 1 ,
name "helix 1" ,
type helix ,
location subgraph residues interval {
  { molecule-id 1 , from 49 , to 61 } } } ,

Add chemical bonds

inter-residue-bonds {
  { atom-id-1 { molecule-id 1 , residue-id 1 , atom-id 1 } ,
    atom-id-2 { molecule-id 1 , residue-id 2 , atom-id 9 } } ,

Linked Entrez databases: Protein, Nucleotide, 3D Domains, Conserved Domains

Creating Sequence Records

One record per chain
Protein: 1EJ9A
Nucleotide: 1EJ9C
Nucleotide: 1EJ9D

Creating 3D Domains

Entrez 3D Domains
3D Domain 0: 1EJ9A0 = entire polypeptide
3D Domains: 1EJ9A1, 1EJ9A2, 1EJ9A3, 1EJ9A4, 1EJ9A5
< 3 Secondary Structure Elements

Building the Structure Summary

Entrez Conserved Domains (CDD)


CDD v2.10: A database of Position Specific Score Matrices (PSSMs)

Single Domains
Pfam (Sanger), pfam01234: 5252 (42%)
Pfam-A seeds: HMM based models representing a wide variety of functional domains derived from SWISS-PROT
SMART (EMBL), smart00123: 575 (5%)
HMM based models originally concentrating on eukaryotic signaling domains, now expanding
CD (NCBI), cd01234: 2661 (21%)
NCBI curated domains based on sequence and structural alignments

Protein Families
COG (NCBI), COG0123: 4101 (32%)
BLAST based alignments derived from complete proteomes of prokaryotes

CD-Search Summary

Linking Sequence to Function


The PSSM
Position Specific Score Matrix
Arginine scored differently in these two positions

Catalytic arginine


cd00659 Topo I catalytic domain
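The point that the same residue can score differently at different positions can be illustrated with a toy matrix. This is a minimal sketch only: the scores, the two-column `toy_pssm`, and the `score` function are hypothetical and are not taken from cd00659 or any real CDD model.

```python
# Hypothetical position-specific scores: one dict per alignment column.
# Column 0 stands in for a conserved catalytic position where arginine is
# strongly favored; column 1 stands in for a variable position.
toy_pssm = [
    {"R": 8, "K": 2, "A": -3},
    {"R": 1, "K": 1, "A": 0},
]


def score(sequence, pssm, default=-4):
    """Sum the per-column scores of an ungapped sequence against a PSSM."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    return sum(column.get(residue, default)
               for residue, column in zip(sequence, pssm))


print(score("RR", toy_pssm))  # arginine in both columns: 8 + 1
print(score("KR", toy_pssm))  # lysine replaces the conserved arginine: 2 + 1
print(score("RK", toy_pssm))  # lysine in the variable column: 8 + 1
```

A single substitution matrix such as BLOSUM62 would give arginine the same score in both columns; the position-specific matrix is what lets a conserved catalytic residue count for more.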

Finding Structures with Entrez


term1[field1] AND/OR/NOT term2[field2] AND/OR/NOT

1. Use field limits and Boolean operators
2. Put phrases in quotes

Query: bacteria topoisomerase I
Translated as: (Bacteria[Organism] OR bacteria[All Fields]) AND topoisomerase[All Fields]
57 structures

Query: bacteria[organism] AND topoisomerase I
Translated as: Bacteria[Organism] AND topoisomerase I[All Fields]
15 structures

Entrez Queries

Structure: Find human topoisomerases complexed with dsDNA
topoisomerase AND 2[dnachaincount] AND human[organism]

Structure: Find all fungal structures with bound calcium at 1-2 Å resolution
calcium[ligname] AND fungi[organism] AND 1.0:2.0[resolution]

3D Domains: Find all viral four helix bundles
4[helixcount] AND 0[strandcount] AND 0[domainno] AND viruses[organism]

3D Domains: Find all 50-100 kDa strand-only domains published in 2004
0[helixcount] AND 2004[pdat] AND 50000:100000[molwt]

Protein: Find all 50-100 kDa proteins with structures published in 2004
2004[pdat] AND 50000:100000[molwt] AND protein structure[filter]
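The term[field] syntax and the ":" range operator shown above can be assembled with a few string helpers. This is a minimal sketch: the helper names are illustrative, and the E-utilities ESearch endpoint and the database name "structure" are assumptions that do not appear in this guide, which uses the Entrez web interface. The code only builds the query and URL; it does not contact NCBI.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value, field):
    """Limit a term to one index field, e.g. fungi[organism]."""
    return f"{value}[{field}]"


def range_term(low, high, field):
    """Build a range with the ':' operator, e.g. 1.0:2.0[resolution]."""
    return f"{low}:{high}[{field}]"


def all_of(*terms):
    """Join terms with the Boolean AND operator."""
    return " AND ".join(terms)


def esearch_url(db, term):
    """URL-encode a query for ESearch (db name is an assumption)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


query = all_of(
    field_term("calcium", "ligname"),
    field_term("fungi", "organism"),
    range_term("1.0", "2.0", "resolution"),
)
url = esearch_url("structure", query)
print(query)
print(url)
```

Building the query from parts keeps the field limits explicit, which avoids the automatic term translation illustrated by the "bacteria topoisomerase I" example.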

Finding Structures with BLAST

GIKWKFLEHKGPVFAPPYEPLP
GVKWKYLEHKGPVFAPPYDPLP

Goal: Find the most sequence-similar structure
BLAST Related Structures: Displays a graphical and text alignment between a query sequence and a similar sequence with structure
CD-Search (RPS-BLAST): Displays a graphical alignment of all CDs that match a query sequence

Linking Genes to Structures

RefSeqs
NC: genomic DNA (exons)
NM: mRNA
NP: protein

Find the most sequence-similar structure to the NP
Related Structure Link
Related Structures
Cn3D: Coordinates begin at residue 174
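The two 22-residue peptides shown under Finding Structures with BLAST differ at only a few positions. As a simple illustration of what "sequence-similar" means, the sketch below counts identical positions in an ungapped, equal-length comparison. This is not how BLAST computes alignments or scores (BLAST performs local alignment with a substitution matrix); the function name is illustrative.

```python
def compare_ungapped(a, b):
    """Return (identities, percent identity, mismatches) for equal-length sequences.

    Mismatches are reported as (1-based position, residue in a, residue in b).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    mismatches = [(i, x, y)
                  for i, (x, y) in enumerate(zip(a, b), start=1)
                  if x != y]
    identities = len(a) - len(mismatches)
    return identities, 100.0 * identities / len(a), mismatches


query = "GIKWKFLEHKGPVFAPPYEPLP"
subject = "GVKWKYLEHKGPVFAPPYDPLP"

identities, percent, mismatches = compare_ungapped(query, subject)
print(f"{identities}/{len(query)} identical ({percent:.1f}%)")
for position, q_res, s_res in mismatches:
    print(f"position {position}: {q_res} -> {s_res}")
```

All three differences (I/V, F/Y, E/D) are conservative substitutions, which a scoring matrix such as BLOSUM62 rewards with positive scores even though they are not identities.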

Structures in CDs

CDD v2.09: A database of Position Specific Score Matrices (PSSMs)

Single Domains
Pfam (Sanger), pfam01234: 29.0% have structures (1523 of 5252)
SMART (EMBL), smart00123: 60.3% have structures (347 of 575)
CD (NCBI), cd01234: 57.9% have structures directly (1443 of 2494); 100% have structures in their parent (or self)

Protein Families
COG (NCBI), COG0123: 9.6% have structures (392 of 4101)

CD-Search: Functional Homology

CD-Search / RPS-BLAST
Query: protein sequence
Database: PSSMs
pre-computed in Entrez Protein

NP_003277
Enter accession, GI, or FASTA sequence into RPS-BLAST

CD-Search Output

pfam02919: A Non-curated CD

curated CDs

Click on a colored bar to align your query to the CD

Aligned query

Structures added!

Related Structure vs CD-Search


NP_701529 hypothetical protein PFL0825c [Plasmodium falciparum 3D7]
BLASTp against pdb
CDSearch
1D3Y

Available Sequence Homology

Human RefSeq proteins (29490)
Hit CDD only: 7136
Hit CDD and BLAST MMDB: 13343
BLAST MMDB only: 827
Hit neither: 8184

69.4% hit CDD
48.0% BLAST MMDB
45.2% hit both
27.8% hit neither
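The four counts in the Available Sequence Homology slide partition the 29490 human RefSeq proteins, and the printed percentages follow from them. The short check below is a sketch of that arithmetic; the assignment of each count to a region (CDD only, both, BLAST MMDB only, neither) is inferred from the totals rather than stated explicitly on the slide.

```python
total = 29490  # human RefSeq proteins

# Region assignment inferred from the arithmetic (an assumption).
cdd_only = 7136
both = 13343
mmdb_only = 827
neither = 8184


def pct(count):
    """Percentage of the total, unrounded."""
    return 100.0 * count / total


# The four regions account for every protein exactly once.
assert cdd_only + both + mmdb_only + neither == total

hit_cdd = pct(cdd_only + both)
hit_mmdb = pct(both + mmdb_only)
hit_both = pct(both)
hit_neither = pct(neither)

for label, value in [("hit CDD", hit_cdd), ("BLAST MMDB", hit_mmdb),
                     ("hit both", hit_both), ("hit neither", hit_neither)]:
    print(f"{label}: {value:.2f}%")
```

The computed values agree with the slide's 69.4%, 48.0%, 45.2%, and 27.8% to within about 0.1 percentage point.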

Summary
Structure Summary Page
Entrez Structure (MMDB)
Entrez Protein (protein sequences)
Entrez 3D Domains (structural domains for VAST)
Entrez CDD (conserved functional domains)

Searching by text
Use indexing fields, e.g. [organism], [resolution]

Searching by sequence
Related Structures (uses simple BLASTp, BLOSUM62)
CD-Search (uses PSSMs from CDD): more sensitive!

NCBI Course

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/dna_photo.html

Workshop I
Working with Bound Substrates and Cofactors: DNA photolyase

Goals
Find and view a structure record
Locate a bound co-factor and its binding site
Annotate the binding site residues

Steps
1. Find a structure record using Entrez
On the NCBI home page, enter the following query into the search box and click Go:
dna photolyase AND thermus thermophilus[organism]
Click on the number of Structure hits to view them.
2. Open the Structure Summary page
Click on the accession of the record that is not a complex, 1IQR.
3. View the structure in Cn3D
Click the "View 3D Structure" button.
4. Locate a bound co-factor
In the structure window, choose Style / Coloring Shortcuts / Molecule. Find the bound FAD co-factor. It will appear as a brown "ball and sticks" model in the middle of the structure.
5. Select the co-factor
Zoom in on the FAD co-factor, and double click on any atom of the FAD to highlight it. It will turn yellow when you have succeeded.
6. Find residues within 3 Angstroms of the FAD
In the structure window, choose Show/Hide / Select by Distance / Residues only. Enter a radius of 3.0 Angstroms in the box. Click OK. All residues within 3.0 Angstroms of the FAD should now be highlighted in yellow.
7. Create a custom annotation
In the structure window, choose Style / Annotate. Click "New" and enter a name for the annotation.
8. Annotate side chains
Click "Edit Style" and uncheck the boxes next to Helix Objects and Strand Objects. Check the box next to Protein sidechains, and choose a rendering style and color of your choice from the menus to the right.
9. Annotate and label the backbone
Select Tubes from the rendering menu next to protein backbone, and then select Complete under Show. Color the backbone as you like. Click on the Labels tab, and set the spacing to 1 under Protein Backbone, and select one-letter type. Click Done, then OK, and Done.
10. Display your annotation
In the structure window, choose Show/Hide / Show Selected Residues. If you still see helix or strand objects, choose Style / Edit Global Style and uncheck the boxes for Helix Object and Strand Object. Click anywhere in the sequence window to remove the highlighting.
11. Save your work
You can now use File / Save As to save the annotated file for future use.

Revised January 4, 2006
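The select-by-distance step in the exercise above (find every residue with at least one atom within a given radius of the selected cofactor) can be sketched in a few lines. This is an illustration of the idea only, not Cn3D's implementation; the function name, residue labels, and coordinates are hypothetical and are not taken from 1IQR.

```python
import math


def residues_within(protein_atoms, ligand_atoms, radius):
    """Return sorted labels of residues with any atom within `radius`
    (in Angstroms) of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    hits = set()
    for residue_label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= radius for lig in ligand_atoms):
            hits.add(residue_label)
    return sorted(hits)


# Hypothetical coordinates for illustration.
protein = [
    ("TRP 10", (0.0, 0.0, 0.0)),
    ("TRP 10", (1.5, 0.0, 0.0)),
    ("GLY 11", (5.0, 0.0, 0.0)),
    ("ARG 12", (0.0, 2.5, 0.0)),
]
ligand = [(0.0, 0.0, 1.0)]

print(residues_within(protein, ligand, 3.0))
print(residues_within(protein, ligand, 2.0))
```

A residue is selected as a whole as soon as one of its atoms falls inside the radius, which matches the "Residues only" behavior described in the step.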

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/nos.html

Workshop I

Working with bound substrates and cofactors: Nitric oxide synthase

Goals
Find information about heterogens from a PDB file
Find an amino acid pattern in a protein sequence
Display a side chain for a residue in the pattern
Create a custom annotation of the heterogen binding site

Steps
1. Find a structure record using Entrez
On the NCBI home page, choose Structure from the Search menu. Enter PDB code 2NOS and click Go.
2. Identify heterogens in the structure
Click on the accession number, 2NOS, to open the structure summary page. Scroll down to the bottom of the page. What heterogen groups are linked to this record? When was this structure submitted?
3. View the structure in Cn3D
Click the "View 3D Structure" button on the structure summary page.
4. Find an amino acid pattern
In the Sequence/Alignment Viewer, choose View / Find Pattern. Search for the nitric oxide synthase (NOS) signature, RCIGRIQW, in which the cysteine is the heme iron ligand.
5. Select a single residue in the pattern
Find the (yellow) highlighted amino acids in the structure window. Highlight only the cysteine in the pattern by clicking on it in the sequence viewer.
6. Create a custom annotation
In the structure window, choose Style / Annotate. Click "New" and enter a name for the annotation. Click "Edit Style" and uncheck the boxes next to Helix Objects and Strand Objects.
7. Display and label the side chains
Check the box next to protein side chains, and render them as "Tubes" using a color of your choice. Click the Labels tab, and set the spacing to 1 under Protein Backbone. Click Done, then OK, and Done.
8. View your annotation
In the structure, double click on the heterogen, protoporphyrin IX containing Fe. Choose Show/Hide / Show Selected Residues. Click in any white space in the sequence window to remove the highlighting.
9. Save your work
You can now save the annotated file for future use.

Revised January 16, 2007
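The Find Pattern step above amounts to locating a short signature inside a longer sequence. The sketch below does this with exact string matching and reports 1-based residue positions. It is an illustration only: Cn3D's own pattern syntax may support more than exact matches, and the toy sequence is hypothetical, not the 2NOS sequence.

```python
def find_pattern(sequence, pattern):
    """Return the 1-based start position of every exact occurrence."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(pattern, start + 1)
    return positions


NOS_SIGNATURE = "RCIGRIQW"               # signature quoted in the exercise
toy_sequence = "MAAAGSRCIGRIQWSNLT"      # hypothetical sequence for illustration

starts = find_pattern(toy_sequence, NOS_SIGNATURE)
print(starts)

# The cysteine is the second residue of the signature.
cys_position = starts[0] + NOS_SIGNATURE.index("C")
print(cys_position, toy_sequence[cys_position - 1])
```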

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/arg_kin.html

Workshop I

Working with bound substrates and cofactors: Arginine kinase

Goals
Find structure records using the Taxonomy Browser
Locate and identify heterogens in a structure
Locate and annotate binding site residues

Steps
1. Find a species using the Taxonomy Browser
On the NCBI home page, click TaxBrowser on the black bar at the top. Type "horseshoe crab" in the box and set the menu to "token set". Click Go.
2. Find structure records for a single species
In the results list, click on the common name "Atlantic horseshoe crab". In the table, click on the number of structure records.
3. Refine the set of structure records
Add the following to the current query in the search box and click Go:
AND 1.0:1.5[resolution]
Find the record in the list labeled as a transition state and click on the accession number to open the structure summary page.
4. Find ligands linked to the structure record
Scroll down to the bottom of the page and make a note of the heterogens present.
5. Locate relevant literature about a structure
Click on the "All References" link at the top of the page, and then open the 1998 abstract by Zhou et al. Make a note of the comments about the positioning of the bound ligands.
6. Locate and identify heterogens in a structure
Return to the structure summary page by going back twice in your browser, and click "View 3D Structure". Choose Style / Coloring Shortcuts / Molecule. Locate the three heterogens next to the magnesium (Mg) ion. They will each be colored differently. Note their relative locations.
7. Locate a bound co-factor
Zoom in and locate the nitrate ion bound to the Mg ion. (Nitrate has four atoms, and is directly bound to Mg.) Double click on the nitrate ion (it should turn yellow).
8. Locate binding site residues
Choose Show/Hide / Select by Distance / Residues Only and set the distance to 4.5 Angstroms. Find the five arginines among the highlighted residues. Holding down the Control key, click on the highlighted residues that are NOT arginine to deselect them.
9. Create a custom annotation
Choose Style / Annotate, click "New", and give the annotation the name "Arg". Click Edit Style, check the protein side chain box, and render them as "Tubes" with a color of your choice.
10. Label residues in the annotation
Click the Labels tab and set the protein backbone spacing to 1 and the type to One Letter. Click Done, OK, and then Done. Now deselect the residues to see the annotated color.
11. Add a second annotation
The catalytic residue is thought to be either E225 or E314. Using Style / Annotate, create a new annotation where the sidechains of these residues are rendered as ball-and-stick models colored by charge. Label these with one letter labels as before.
12. Save your work
Experiment with turning the annotations on and off in the Annotation panel. You can now save the annotated file for future use.

Revised January 16, 2007

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/rnaseh.html

Workshop I

Locating Disordered Regions: RNase H

Goals
Locate disordered residues in a PDB file
Locate disordered residues in Cn3D
Select and annotate metal-binding side chains

Steps
1. Retrieve the structure record
On the NCBI home page, click on the Structure link on the top tool bar. Enter 1TFR in the search box and click Go.
2. View the Structure Summary page
Click on the accession number to open the structure summary page. When was this structure submitted? How many amino acids does this protein contain?
3. View the original PDB file
Click on the PDB code to view the PDB summary. In the upper right portion of the page, click on the icon to the right of the PDB code (the icon looks like a paper document).
4. Locate disordered residues in the PDB file
Scroll down to REMARK 465 listing the missing residues. Confirm the list by scrolling down to the ATOM records where the coordinates are listed. The residue number appears in the 5th column.
5. View the structure in Cn3D
Go back in your browser to the structure summary page for 1TFR. Click the "View 3D Structure" button.
6. Locate the N- and C-termini using Cn3D
In the structure window, choose Style / Edit Global Style. Uncheck the Strand and Helix Object boxes. Click on the Labels tab, and check the termini box for Protein Backbone. Click Done.
7. Locate the disordered residues using Cn3D
Double click on the ends of the discontinuous parts of the structure (they should turn yellow). Locate the highlighted residues in the Sequence/Alignment Viewer. What color are the disordered (missing) residues in the sequence?
8. Select metal-binding residues
Position the mouse pointer over any residue in the sequence window, and find its number in the lower left panel of the window. Select residues 19, 71, 132 and 155 by shift-clicking them in the sequence window. Find the residues on the structure. What heterogen is near these residues?
9. Create a custom annotation
In the structure window, choose Style / Annotate. Click "New" and enter a name for the annotation. Click "Edit Style" and uncheck the boxes next to Helix Objects and Strand Objects.
10. Display and label the side chains
Check the box next to Protein side chains, select "Tubes" from the menu to the right, and select a color of your choice. Click on the Labels tab, and set the spacing to 1 under Protein Backbone. Click Done, then OK, and Done.
11. Annotate a metal ion
Double click the nearby magnesium ion. Create a second annotation using Style / Annotate, and color the magnesium ion as you like. Use the Heterogen row of controls.
12. View your annotations
In the structure window, choose Show/Hide / Show Selected Residues. Double click in any white area of the sequence window to remove the highlighting. Using the File menu, you can now save the file for future use.

Revised June 19, 2006
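Confirming the missing residues against the ATOM records, as in step 4 of the exercise above, amounts to looking for breaks in the residue numbering. The sketch below finds such internal gaps from a list of residue numbers. It is an illustration under stated assumptions: the numbers are hypothetical (not read from 1TFR), insertion codes are ignored, and residues missing at the N- or C-terminus cannot be detected this way, so REMARK 465 remains the authoritative list.

```python
def missing_ranges(residue_numbers):
    """Return (first, last) ranges of residue numbers absent between
    consecutive observed residues of one chain."""
    observed = sorted(set(residue_numbers))
    gaps = []
    for previous, current in zip(observed, observed[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


# Hypothetical residue numbers, one entry per atom as they would appear
# when read from successive ATOM records.
numbers_from_atom_records = [1, 1, 1, 2, 2, 3, 7, 7, 8, 12]

print(missing_ranges(numbers_from_atom_records))
```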

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/hfe.html

Workshop I

Annotating Single Nucleotide Polymorphism (SNP) Locations: HFE

Goals
Use BLink to find a structure corresponding to a protein sequence
Locate the position of a SNP in the structure using Cn3D
Create an annotation of the region near the SNP

Steps
1. Retrieve the protein sequence for human HFE
On the NCBI home page, enter NP_000401 in the search box and click Go. Click on the number of protein hits. What is this protein?
2. Find a sequence-similar structure
Click on "Related Structure" in the Links menu to the right of the accession. Change the List menu from 'All MMDB' to 'Non-identical' and click List. What is the most sequence-similar structure (the one with the lowest e-value)? Is it a full or partial sequence match?
3. View the related structure in Cn3D
Click on the red bar representing the most sequence-similar structure (to the right of its accession). On the Related Structures page, click 'Get 3D Structure data'. Cn3D should launch. Compare the two sequences (red = identical residues, blue = non-identical residues).
4. Locate a SNP in Cn3D
Find C282 in NP_000401 (a position of a known SNP) by pointing your mouse over the residues in the second sequence. The residue numbers will be shown in the lower left of the sequence window. In the sequence window, click on the residue in the structure aligned to C282 (the top sequence). Find the residue in the structure. In what structural feature is the residue involved?
5. Annotate the region near the SNP
In the structure window, double click the residue connected to the residue that you highlighted in the previous step. What residue is this? Choose Style / Annotate, click "New", and give the annotation a name. Click Edit Style, and check the box next to protein side chains. Render them as ball and sticks, and color them as you choose.
6. Label the residues near the SNP
Click the Labels tab and change the spacing to 1 under protein backbone. If labels don't appear, click Apply. Click Done, OK, and Done. If the virtual bond between the residues is still visible, select Style / Edit Global Style and uncheck "Virtual Disulfides".
7. Interpret your results
What structural role does the residue play that aligns to C282 in NP_000401? What could be the consequence of mutating this residue to a tyrosine? Click here for a hint.

Revised November 22, 2005

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/Rtrna.html

Workshop I

Investigating a Molecular Interface: Arginyl tRNA Synthetase

Goals
View a molecular complex in Cn3D
Find and annotate a molecular interface in the complex
Identify residues involved in the interface

Steps
1. Navigate to Entrez Structure
On the NCBI home page, click the All Databases link on the black bar at the top. Click on the word Structure, then click on Preview/Index.
2. Find structures using an Entrez query
Find all structures containing bound arginine from the genus Saccharomyces using the [ligname] and [organism] fields:
arginine[ligname] AND saccharomyces[organism]
Type this in the search box and click Preview to see how many structures match the query.
3. Refine the Entrez query
Now add the following to the query to limit to those structures containing one RNA chain:
AND 1[rnachaincount]
Click Preview, and then click on the number of returned structures.
4. View the structure in Cn3D
Click the accession to open the structure summary page for this record. Click "View 3D Structure" to launch Cn3D.
5. Locate the molecular interface
In the sequence window, select Mouse Mode / Select Rows. Select the entire RNA (chain B) by clicking anywhere in the sequence. In the structure window, choose Show/Hide / Select by Distance / Residues only and use a radius of 3.5 Angstroms.
6. Display the molecular interface
Choose Show/Hide / Show Selected Residues. Choose Style / Edit Global Style, and uncheck the boxes next to Helix Objects and Strand Objects. Click Done.
7. Display the protein side chains
Click anywhere in white space in the sequence window to remove the highlighting. Choose Style / Edit Global Style, check the box next to protein side chains, and render them as ball and stick models with a color of your choice.
8. Display the nucleotide side chains
Check the box next to nucleotide side chains, and also render them as ball and stick. Render the nucleotide backbone as "Tubes", then choose Complete from the menu. Click Done.
9. Explore charges at the molecular interface
Choose Style / Edit Global Style, and render the protein side chains as space filling models colored by charge. What charges are present at the interface?
10. Explore the quality of the structural data at the molecular interface
Choose Style / Coloring Shortcuts / Temperature. What areas of the protein and RNA are the least well determined (more red)? The most well determined (more blue)? Where are these portions relative to the interface?
11. Label residues in the interface
Choose Style / Edit Global Style, and render the protein side chains as ball and stick models. Click on the Labels tab and change the protein backbone spacing to 1. Click Done.
12. Examine contacts in the interface
Choose Style / Coloring Shortcuts / Molecule. Zoom in and double click a labeled residue. Use Show/Hide / Select by Distance / Other Molecules to find RNA atoms within 3.5 Angstroms of this residue. Repeat for other residues as you like.
13. Draw conclusions about contacts in the interface
What are the most common amino acids involved in the contacts? What part of the RNA is involved in most of the contacts?

Revised November 22, 2005

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/entrez_structure...

Workshop I

Entrez Searching: Entrez Structure

Goals
Find structures using Entrez queries
Construct advanced queries using Preview/Index

Steps
1. Begin with Preview/Index in Entrez Structure
From the NCBI home page, click on the "All Databases" link on the top bar. Click on Structure in the list of databases. Click on Preview/Index to view the list of available fields.
2. Search 1
Find all protein structures from primates. Use the fields [Organism] and [ProteinChainCount]. Add a term to your search to find all structures of protein-DNA complexes from primates containing only double-stranded DNA. Use the field [DnaChainCount]. Go to the next step to check your answer.
3. Answer 1
Click here to check your answer.
4. Search 2
Using the [LigName] field, extend Search 1 to limit to only those complexes containing a zinc ion. Go to the next step to check your answer.
5. Answer 2
Click here to check your answer.
6. Search 3
Find all complexes containing at least one protein chain, one DNA chain, and one RNA chain. First use "all[Filter]" to find all structures, then use terms such as "NOT 0[DnaChainCount]".
7. Answer 3
Click here to check your answer.
8. Search 4
Find all structures determined by x-ray diffraction from Archaea at resolutions less than 2.0 Angstroms. Use the ":" range operator as follows: 0.0:2.0[resolution]. Use the [Filter] field to limit the results to those structures linked to SNP records, indicating that they are a structural model for a known SNP. Using Index in Preview/Index, look for terms in the [Filter] field such as "structure abc", which find links between structure and database abc.
9. Answer 4
Click here to check your answer.
10. Search 5
Find all fungal structures of single RNA chains only with one bound ligand.
11. Answer 5
Click here to check your answer.

Revised January 4, 2006

NCBI Course

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopI/entrez_3Ddom...

Workshop I
Entrez Searching: Entrez 3D Domains

Goals
   Find 3D domain records using Entrez 3D Domains
   Construct advanced queries using Preview/Index

Steps
1. Begin with Preview/Index in Entrez 3D Domains
   From the NCBI home page, click on the "All Databases" link on the top bar. Click on 3D Domains in the list of databases. Click on Preview/Index to view the list of available fields.
2. Search 1
   Find all structures of mammalian proteins containing only 2 helices and 2 strands. Use the fields [helixcount] and [strandcount], and remember to limit [domainno] to 0.
3. Answer 1
   Click here to check your answer.
4. Search 2
   Find all protein structures from green plants with molecular weights greater than 100 kDa. Use a term such as 100000:1000000[molweight].
5. Answer 2
   Click here to check your answer.
6. Search 3
   Find all polypeptide chains containing between 10 and 20 3D domains. First use the [domainno] field and the range operator ":". Example: 5:8[domainno] gives 5-8 3D domains. Find the polypeptide with the largest number of 3D domains. How many does it have?
7. Answer 3
   Click here to check your answer.
8. Search 4
   Find all human 3D Domains published between October 15, 2004 and November 15, 2004. Use the [pdat] field with date format YYYY/MM/DD.
9. Answer 4
   Click here to check your answer.
Revised January 4, 2006
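
Date ranges follow the same ":" pattern as numeric ranges, with both ends written as YYYY/MM/DD. The sketch below formats such a term with Python's standard `datetime` module. The function name `pdat_range` is hypothetical, and the combined query is only an illustration of the syntax.

```python
from datetime import date


def pdat_range(start, end):
    """Format a publication-date range as YYYY/MM/DD:YYYY/MM/DD[pdat]."""
    if end < start:
        raise ValueError("end date precedes start date")
    fmt = "%Y/%m/%d"
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}[pdat]"


# Combine the date range with an organism restriction.
query = "human[orgn] AND " + pdat_range(date(2004, 10, 15), date(2004, 11, 15))
print(query)
```

Formatting from `date` objects avoids hand-typed dates with missing zero padding, which would not match the YYYY/MM/DD format.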


Exploring 3D Molecular Structures Using NCBI Tools
Lecture 2: Alignments in Cn3D
March 30, 2007

The Cn3D Alignment Model

Each sequence is aligned pairwise to the master PDB sequence
Aligned blocks represent secondary structure elements
Aligned blocks have no internal gaps
Aligned sequences have a residue at each column in the block
Residues in the same column occupy the same position in space

[Figure: an alignment divided into Block 1, Block 2, and Block 3]
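
One way to picture the block model is as a small data structure: each block has a single width and one start position per row, so gaps can occur only between blocks. The following Python sketch is an illustration under that assumption and is not Cn3D's internal representation. The names `Block`, `validate`, and `column`, and the two toy sequences, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A gapless aligned block: one shared width, one start index per row."""
    width: int
    starts: tuple  # starts[0] belongs to the master row; 0-based indices


def validate(blocks, n_rows):
    """Check that blocks are ordered and non-overlapping in every row."""
    for row in range(n_rows):
        previous_end = 0
        for block in blocks:
            if len(block.starts) != n_rows or block.width < 1:
                raise ValueError("malformed block")
            if block.starts[row] < previous_end:
                raise ValueError(f"blocks overlap or are out of order in row {row}")
            previous_end = block.starts[row] + block.width


def column(blocks, sequences, block_index, offset):
    """Return the residues that share one alignment column."""
    block = blocks[block_index]
    if not 0 <= offset < block.width:
        raise IndexError("offset outside block")
    return [seq[start + offset] for seq, start in zip(sequences, block.starts)]


# Toy example: the first sequence plays the role of the master.
sequences = ["MKTAYIAKQR", "GSMKTAYLAKQ"]
blocks = [Block(4, (0, 2)), Block(3, (6, 8))]
validate(blocks, n_rows=2)
print(column(blocks, sequences, block_index=0, offset=0))
```

Because every row contributes exactly one residue per column, looking up a column never has to deal with gap characters.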

What Cn3D Can Do

Render, rotate, and annotate multiple structure models
Create and edit multiple sequence alignments
Import and align sequences and structures based on an existing alignment

What Cn3D Can't Do

Alter the structural coordinate data
Create a theoretical structure or run MD simulations
Read or write PDB files directly

Finding Structures by Homology

Use simple sequence homology (BLASTp)
    Finds pairwise alignment based on sequence similarity
Use sequence and functional homology (CD-Search)
    Finds multiple sequence alignment based on sequence similarity
Use structural homology (VAST)
    Finds multiple sequence alignment based on structural similarity
Use sequence, structure and function (Curated CDs)
    Finds multiple sequence alignment based on sequence and structural similarity

VAST: Creating Structural Alignments

Why search for similar structures?
    To superimpose structures
    To find homologs that sequence searches cannot: distant protein homologs often conserve structure more strongly than sequence
    To explore protein evolution: similar protein folds can be used to support different functions
    To identify conserved core elements of a protein fold that can be used to model related proteins of unknown structure

VAST: Structural Neighbors

Vector Alignment Search Tool
For each 3D domain, locate SSEs (secondary structure elements), and represent them as individual vectors.

[Figure: Human IL-4 with its SSEs drawn as numbered vectors]

VAST: Calculate θij
Vector position about the z axis

For both the query and target structures,
    Calculate the midpoint of each SSE.
    For each SSE k, align k along z and project midpoints onto the xy plane.
    Then calculate [θij]k for i ≠ k, j ≠ k.

VAST: Calculate (rik, zik)
Vector position relative to the xy plane

For both the query and target structures,
    For each SSE k, set the origin at the midpoint of k.
    Then calculate rik and zik for the endpoints of SSEs i ≠ k.

[Figure: numbered SSE vectors, with r13, z13, and the xy plane labeled]
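
The geometry on these two slides can be sketched in a few lines of Python: place the origin at the midpoint of a reference SSE k, take k as the z axis, and describe every other SSE by the cylindrical coordinates (r, z) of its endpoints and by the angle between projected midpoints about z. This is an illustration of the coordinate construction only, not the VAST implementation. All function names and the three toy SSEs are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def midpoint(sse):
    start, end = sse
    return tuple((s + e) / 2 for s, e in zip(start, end))


def frame(sse):
    """Origin at the SSE midpoint, unit z axis along the SSE vector."""
    start, end = sse
    v = sub(end, start)
    length = math.sqrt(dot(v, v))
    return midpoint(sse), tuple(x / length for x in v)


def r_z(point, origin, axis):
    """Cylindrical (r, z) of a point in the frame of the reference SSE."""
    v = sub(point, origin)
    z = dot(v, axis)
    r = math.sqrt(max(dot(v, v) - z * z, 0.0))
    return r, z


def angle_about_z(p, q, origin, axis):
    """Signed angle from p to q after projecting both onto the xy plane."""
    vp, vq = sub(p, origin), sub(q, origin)
    # Remove the z components to project onto the xy plane.
    pp = sub(vp, tuple(dot(vp, axis) * a for a in axis))
    qq = sub(vq, tuple(dot(vq, axis) * a for a in axis))
    return math.atan2(dot(axis, cross(pp, qq)), dot(pp, qq))


# Toy SSEs given as (start, end) points; k is the reference element.
k = ((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
i = ((1.0, 0.0, 0.0), (1.0, 0.0, 2.0))
j = ((0.0, 2.0, -1.0), (0.0, 2.0, 1.0))

origin, axis = frame(k)
print([r_z(end, origin, axis) for end in i])
print(math.degrees(angle_about_z(midpoint(i), midpoint(j), origin, axis)))
```

These quantities do not change when the whole structure is rotated or translated, so two structures can be compared without first superimposing them.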

VAST: Create Comparison Graph

Nodes: r13<>r12 z13<>z12
Arcs: θ16<>θ15
    must follow sequence order
Select path with highest weights

[Figure: comparison graph pairing the SSEs of IL-4 with those of IL-6]

VAST: Refinement

Cα atoms are added to the aligned SSEs
Alignments are allowed to extend beyond SSE boundaries
All atoms are added to the models, and the detailed backbone and sidechain positions are refined

[Figure: superimposed structures in which aligned residues are red; one alignment is extended to the end of a strand]
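
The idea of selecting the heaviest path that follows sequence order can be illustrated with a small dynamic program. The sketch below is a toy, not the VAST algorithm: it assumes each node is a (query SSE, target SSE, weight) triple, takes a caller-supplied `compatible` test in place of the arc criteria, and checks compatibility only between consecutive nodes on the path. The function name and the example weights are hypothetical.

```python
def best_path(nodes, compatible):
    """Find the heaviest chain of nodes that respects sequence order.

    nodes: list of (query_sse, target_sse, weight) triples.
    compatible: function(node_a, node_b) -> bool for consecutive nodes.
    Returns (total_weight, [(query_sse, target_sse), ...]).
    """
    ordered = sorted(nodes)
    best = []  # best[i] = (total weight, predecessor index) of chain ending at i
    for i, (q, t, w) in enumerate(ordered):
        total, prev = w, None
        for j in range(i):
            qj, tj, _ = ordered[j]
            # Both SSE indices must increase, so the path follows sequence order.
            if qj < q and tj < t and compatible(ordered[j], ordered[i]):
                if best[j][0] + w > total:
                    total, prev = best[j][0] + w, j
        best.append((total, prev))
    if not best:
        return 0.0, []
    end = max(range(len(best)), key=lambda idx: best[idx][0])
    path, idx = [], end
    while idx is not None:
        path.append(ordered[idx][:2])
        idx = best[idx][1]
    return best[end][0], path[::-1]


# Toy graph: pairing query SSE 2 with target SSE 3 conflicts with
# pairing query SSE 3 with target SSE 2, so only one can be on the path.
nodes = [(1, 1, 2.0), (2, 3, 1.5), (3, 2, 3.0), (4, 4, 1.0)]
print(best_path(nodes, lambda a, b: True))
```

The order constraint is what makes such a search sensitive to topology: two folds with the same SSEs arranged in a different sequence order cannot be joined by a single path.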

VAST: Alignment of Sequence

Aligned blocks represent structural core elements
Aligned blocks have no internal gaps
Aligned residues occupy the same position in space
Aligned residues are shown in CAPITAL letters

[Figure: aligned structures with Helix 1 to Helix 4 and the N and C termini labeled]

VAST: Summary

Secondary structure elements are represented as vectors and are aligned based on their relative orientations
    VAST ignores loops and tolerates variation in SSE length
    The initial alignment is wholly ignorant of atomic coordinates
Pathways through aligned SSEs respect sequence order
    VAST is sensitive to topology
Alignments are extended and optimized using all-atom models
    Aligned blocks may extend across or into loops or other SSEs

VAST: Scoring

$$p = d \cdot P(s > s_0, n) \cdot c(n, P_1, P_2)$$

$p$: The probability that the VAST alignment occurred by chance.
$d$: Number of structures searched (set to 500).
$P(s > s_0, n)$: Probability of observing an alignment of $n$ SSEs with a score greater than $s_0$ by chance.
$c(n, P_1, P_2)$: Search space: number of possible alignments of $n$ SSEs between vector sets $P_1$ and $P_2$.

Accessing VAST Neighbors

[Figure: structure summary page showing the links to VAST neighbors]

VAST Neighbor View

[Figure: VAST neighbor page showing the links to structure-based sequence alignments]

Query by Chain vs 3D Domain

$c(n, P_1, P_2)$ is smaller for a 3D domain!

Query by whole chain
Query by domain
    Not found using whole chain query!
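
A short worked calculation shows why a smaller search space can turn a miss into a hit. The sketch below multiplies the three factors of the VAST p-value as defined on the scoring slide. The numeric inputs are invented for illustration, the same chance probability is assumed for both queries, and the cap at 1.0 is an addition for readability rather than part of the slide's formula.

```python
def vast_p_value(p_score, search_space, n_structures=500):
    """Return d * P(s > s0, n) * c(n, P1, P2), capped at 1.0.

    p_score: chance probability of the observed score for n aligned SSEs.
    search_space: number of possible alignments of n SSEs.
    n_structures: number of structures searched (500 on the slide).
    """
    return min(1.0, n_structures * p_score * search_space)


# Hypothetical numbers: identical score probability, different search space.
p_chain = vast_p_value(p_score=1e-9, search_space=5000)
p_domain = vast_p_value(p_score=1e-9, search_space=200)
print(f"whole chain: {p_chain:.1e}   3D domain: {p_domain:.1e}")
```

With everything else equal, the domain query yields the smaller p-value, so an alignment that falls below the reporting threshold for a domain may not be reported for the whole chain.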

VAST: Multiple Alignments

[Figure: a VAST multiple alignment displayed in Cn3D]

Entrez Links to VAST Neighbors

Limiting VAST results by an Entrez query in 3D Domains

#3 AND human[orgn] AND 4[helixcount] AND 0[domainno]

173 VAST neighbors

Submitting a PDB File to VAST

New! Redesigned interface!
This is the best way to convert PDB into MMDB format!

Structure + Function

VAST finds proteins that have similar 3D folds
CD-Search finds proteins that have similar sequences and similar functions
Curated CDs = VAST + CD-Search
    Proteins that have similar 3D folds, similar sequences and similar functions

Curating CDs with VAST

[Figure: smart00235 and cd00203 shown in Cn3D]

CD-Curation: Effect on model alignment accuracy

[Figure: three plots of model alignment RMS (0 to 12) against %id in structure alignment (10 to 100), for VAST, RPS-BLAST before curation, and RPS-BLAST after curation. Credit: A. Marchler-Bauer]

cd00659: A Curated CD

[Figure: hierarchy showing a parent CD and the child CD cd00659]

CD Family Values

Residues aligned in the parent must be aligned in the child
Parent: cd00397, C-term catalytic domain of DNA breaking-rejoining enzymes
    164 columns
Child: cd00659, C-term catalytic domain of DNA Topo IB
    218 columns
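
The parent and child rule amounts to a subset test. The sketch below assumes both alignments are expressed as residue positions on one shared sequence, which is a simplification. The function name and the position ranges are hypothetical and do not come from cd00397 or cd00659.

```python
def missing_in_child(parent_aligned, child_aligned):
    """Return parent-aligned positions that the child fails to align.

    An empty list means the child satisfies the rule.
    """
    return sorted(set(parent_aligned) - set(child_aligned))


# Hypothetical residue positions on a shared sequence.
parent = range(10, 20)
good_child = range(5, 25)   # aligns everything the parent does, and more
bad_child = range(12, 25)   # drops positions 10 and 11

print(missing_in_child(parent, good_child))
print(missing_in_child(parent, bad_child))
```

A child may add aligned columns, as the larger column count of the child CD suggests, but it may not drop any column that its parent aligns.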

Curated CD Summary

[Figure: Cn3D view of a curated CD with a catalytic residue and the aligned query catalytic residues marked]

A Path to a Structural Template

1. Look for a curated CD
   A. CD-Search: You're done if you find one; otherwise continue.
2. Construct a structural alignment
   A. BLASTp (Related Structure): Find the most sequence-similar structure to the query
   B. VAST: Find the structural neighbors to the most sequence-similar structure
   C. Cn3D: Import and align the sequence to the VAST alignment using algorithms in Cn3D

Importing into Cn3D

Sequences are initially unaligned, with red regions indicating blocks

[Figure: Cn3D import window with the master sequence and the Import menu marked]

Cn3D Alignment Algorithms

Algorithm      Input
BLAST          Query, master, BLOSUM62
BLAST/PSSM     Query, master, PSSM
Block Align    Query, master, PSSM, blocks
Thread         Query, master, PSSM, blocks, structure

Identifying Alignment Problems

Block errors: Indicated by red shading
    These result when the extent of an aligned block in the import window differs from that in the template
Geometry violations: Indicated by green shading
    These result when a loop between aligned blocks in the import window is too short to span the distance between the block ends based on the master structure in the template
Fix these problems by adjusting the block lengths!

[Figure: an import alignment with a block error and a geometry violation marked]

Trial with NP_001058

Human topoisomerase IIα

[Figure: CD-Search result showing pfam02518 and the curated CDs]
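
A geometry violation can be thought of as a simple distance test: a loop of a given length can only stretch so far. The sketch below is an illustration, not Cn3D's actual check. It assumes a spacing of about 3.8 Angstroms between consecutive C-alpha atoms, and the function name and coordinates are hypothetical.

```python
import math

# Assumption: roughly 3.8 Angstroms between consecutive C-alpha atoms.
CA_CA_DISTANCE = 3.8


def loop_can_span(block_end_xyz, next_block_start_xyz, n_loop_residues):
    """Return True if a loop of n residues could bridge two block ends.

    The coordinates are those of the master structure residues aligned to
    the last residue of one block and the first residue of the next.
    """
    distance = math.dist(block_end_xyz, next_block_start_xyz)
    # n loop residues give n + 1 C-alpha to C-alpha steps between the ends.
    return distance <= (n_loop_residues + 1) * CA_CA_DISTANCE


# Hypothetical block ends 20 Angstroms apart in the master structure.
end_a, start_b = (0.0, 0.0, 0.0), (20.0, 0.0, 0.0)
print(loop_can_span(end_a, start_b, n_loop_residues=2))
print(loop_can_span(end_a, start_b, n_loop_residues=5))
```

Shortening a block moves residues from the block into the loop, which lengthens the loop and is why adjusting block lengths can clear a violation.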

Step 1: Related Structures

1ZXM: the most sequence-similar structure

[Figure: Related Structures results, with pfam02518 marked]

Step 2: VAST Neighbors of 1ZXM

[Figure: VAST neighbors of 1ZXM shown in Cn3D, with pfam02518 marked]

For more information


Course web pages
info@ncbi.nlm.nih.gov
NCBI Handbook, Ch. 3

http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/plasmodium.html

Workshop II
Finding a structural template for an unannotated protein

Goals
   Locate conserved domains in a protein
   Align the protein to a curated CD
   Predict functional and structural sites in the protein based on this alignment

Steps
1. Retrieve a protein sequence using Entrez
   Find accession NP_701529 in Entrez protein. Open the record and review its annotation. What is known about the function of this protein?
2. Locate Conserved Domains (CDs) in the sequence
   Click the Conserved Domains link to search for CDs in the protein sequence. Click the Full Result button to see all aligned domains.
3. Analyze the CD-Search results
   What CD is the best hit? Is there a curated CD (cd*****)? What function might this protein perform, and where in the sequence is this functional domain located?
4. Align the query sequence to a CD
   Click the red bar corresponding to the curated CD. Find the alignment row labeled query, representing NP_701529. What residues of NP_701529 align to this CD? Residue numbers are in green at the ends of each aligned row.
5. Locate functional residues in the domain
   Locate the Feature Display menu in the Show Alignment row (the rightmost menu). The first feature is currently annotated in the alignment in the top row labeled "Feature 1". The residues involved in this feature are indicated by a "#" sign above them. What residues are they? Does NP_701529 conserve these residues?
6. View the alignment in Cn3D
   Click the "Show Structure" button to launch Cn3D. Locate the query sequence and the master sequence (top sequence with structure) in the alignment.
7. Highlight functional features on the structure
   In the CDD Descriptive Items window, click "Show Annotations Panel". Select the first feature and click Highlight. Does the query conserve these residues? Do the other sequences conserve them? Use the annotations panel to locate the active site residue. Is the active site residue conserved?
8. View structure evidence for an annotation
   Use the annotations panel to highlight the homodimer interface. Click the structure evidence from the right window in the panel and click Show. Make a note of the comment displayed and the highlighted region of the structure. Verify the comment by highlighting the active site residue.
9. Investigate a possible binding surface
   Study the distinctive shape of the domain by rendering it as a space filling model (use Style / Edit Global Style). Color the molecule by charge (use Style / Coloring shortcuts). Where do you think the DNA might bind? Quit Cn3D when you are finished.
Revised June 19, 2006


http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/lhx3.html

Workshop II
Finding a structural template for a gene product

Goals
   Find protein sequences associated with a given gene
   Find the most closely related structure to these proteins
   Use the structure and CD to explore a SNP known to cause a disease phenotype
   Import and align a transcript variant to the template alignment

Steps
1. Retrieve a gene record using Entrez
   From the NCBI home page, search for "LHX3[sym] AND human[orgn]". Click on the Gene hits to view them.
2. Review the gene record
   Click the accession for human LHX3 and review its annotation. Scroll down to the graphic under Genomic regions, transcripts, and products. How many proteins is this gene expected to produce? Make a note of the NP accession numbers of these proteins.
3. Load a related OMIM record describing a disease phenotype
   Follow the link to OMIM in the Links menu. Click on the accession of the record labeled "LIM Homeobox Gene 3".
4. Find a SNP in LHX3 resulting in a disease phenotype
   Click on "Allelic Variants" in the left bar of the page. Concentrate on variant .0001, Y116C. What are the clinical consequences of this SNP?
5. Find the SNP site in an annotated protein sequence
   Scroll up to the top of the OMIM page, and follow the link to Gene in the Links menu. Click the accession to open the gene record. Click on the accession of NP_055379 (isoform b) to the right of the graphic. Open the Genpept format and confirm that the sequence has a tyrosine at position 116 (Y116).
6. Find a structural template for the region of the protein including the SNP
   Click 'Related Structure' in the Links menu in the upper right of the protein record. Use the ruler above the graphic to find the best-matching structures that contain a sequence match at the position of the SNP (116).
7. Locate the SNP position in the sequence-similar structure
   Click on the red bar representing a best-matching structure containing position 116. Find the SNP position in the text alignment. The green numbers indicate residue positions in each sequence. What position is the SNP in the structure record?
8. Build a VAST alignment
   Click on the MMDB link (to the right of Reference) in the top section of the page. On the structure summary page, click anywhere in the blue bar (domain 2) that contains the SNP position. In the row to the right of the List button, change Graphics to Table, and click List.
9. View the VAST alignment
   Check the boxes to the left of the top three structure neighbors. Click View 3D Structure to launch Cn3D.
10. Import NP_055379
    In the alignment window, choose Edit / Enable Editor. Choose Imports / Show Imports. In the Imports window, choose Edit / Import Sequences. Choose Network via accession and click OK. Type NP_055379 in the box and click OK.
11. Align NP_055379
    Choose Algorithms / BLAST/PSSM and click anywhere on the pair of sequences. If there are red-shaded regions, try the Block Aligner. Otherwise, skip the next two steps. Choose Algorithms / Block Align Single and click on the sequences. In the dialog, uncheck Global Alignment, then click OK. Choose Alignments / Merge All to merge your new alignment. Close the Imports window.
12. Find the SNP position in the alignment
    Find the SNP position in NP_055379 (the bottom row) in the alignment window by moving your mouse across the sequence and monitoring the location in the lower left corner of the alignment window. In the alignment window, choose Mouse Mode / Select Columns. Click on the alignment column at the SNP position.
13. Locate the SNP position in the structure
    Locate the wild type residue in the structure. What metal binding site is it near? Clear your highlighting by clicking in whitespace in the alignment window. Discover what amino acids contact the metal ion by double-clicking the metal ion from any of the structures and searching by distance with a radius of 4 Angstroms. What residues are they?
14. Investigate the possible changes caused by the mutation
    Click here to launch the Amino Acid Explorer. Using the Compare menus in the left bar of the page, compare the wild type (Tyr) and mutant (Cys) residues using Text. What are the significant differences between these residues? Compare them using Graphics to assess the relative sizes and shapes of the side chains.
15. Find residues in contact with the SNP position
    Return to Cn3D. Again select the column at the SNP position as you did before. In the structure window, choose Show/Hide / Select by Distance / Residues Only. Set the radius to 3.0 Angstroms.
16. Predict the consequences of the SNP
    Are any of the residues that contact the ligand within 3.0 Angstroms of the SNP position? Knowing what the mutant residue is, what additional interactions might occur in the mutant protein? What might be the biochemical and biological consequences of these interactions?
Revised January 17, 2007
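
Selecting by distance, as in the steps above, reduces to measuring the distance from every atom to a chosen point and collecting the residues that own the atoms inside the radius. The sketch below shows that calculation on invented coordinates. It is not Cn3D's code, and the residue labels, coordinates, and function name are hypothetical rather than taken from any structure record.

```python
import math


def residues_within(atoms, center, radius):
    """Return the residues that have at least one atom within radius.

    atoms: iterable of (residue_label, (x, y, z)) pairs.
    center: (x, y, z) of the selected atom, for example a metal ion.
    """
    hits = {label for label, xyz in atoms if math.dist(xyz, center) <= radius}
    return sorted(hits)


# Invented coordinates in Angstroms, with the metal ion at the origin.
metal = (0.0, 0.0, 0.0)
atoms = [
    ("CYS 10", (2.3, 0.0, 0.0)),
    ("CYS 10", (5.0, 0.0, 0.0)),
    ("HIS 30", (0.0, 2.1, 0.0)),
    ("TYR 50", (6.0, 6.0, 0.0)),
]
print(residues_within(atoms, metal, radius=4.0))
```

A residue counts as a contact as soon as one of its atoms falls inside the sphere, so a larger radius can only add residues to the selection.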


http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/RNaseH.html

Workshop II
Finding a Structural Template for a Disordered Region

Goals
   Locate a disordered region in the structure of RNase H from enterobacteria phage T4.
   Use VAST to find a structural template for this region.
   Use a curated CD to predict the function of the disordered region.

Steps
1. Retrieve the structure in Entrez
   Retrieve accession 1TFR from Entrez Structure. Click on the accession to load the structure summary page.
2. Locate the disordered regions using Cn3D
   Click "View 3D Structure". Make a note of the positions of the three disordered regions (shown as gray letters).
3. Retrieve the VAST neighbors for 1TFR
   Quit Cn3D and return to the structure summary page for 1TFR. Click the gray Chain bar to retrieve the VAST neighbors for the entire chain.
4. Analyze the VAST neighbors for 1TFR
   Change the List setting from "Graphics" to "Table" (rightmost menu in the List row). Click List to update the view. What is the most similar structure? What is the sequence identity and RMSD between this protein and 1TFR?
5. View a VAST alignment in Cn3D
   Check the box to the left of the most similar VAST neighbor to 1TFR. Click "View 3D Alignment." Color the structures by Object (use Style / Coloring shortcuts).
6. Analyze the VAST alignment
   Compare the disordered regions in 1TFR to the corresponding regions in the VAST neighbor using the list of their locations you prepared earlier. Are there regions that have coordinate data in one structure but not the other? (Hint: Use the sequence window. Disordered residues will be gray, and residues with coordinates will be colored.) Make a note of these regions.
7. Assess the uncertainty in the structural data
   Color the structures by temperature (use Style / Coloring shortcuts). How well determined are the regions you identified in the previous step? (more red = more error)
8. Find conserved domains in the structure
   Quit Cn3D. Return to the structure summary by clicking on the MMDB ID 5678 at the very top of the VAST neighbor page. Click on the aligned CD record to view that multiple alignment.
9. Analyze the conserved domains in the structure
   What is this functional domain? What kind of CD record is this (what source database)? Do any of the other aligned structures look familiar? Find the features menu in the "Show Alignment" row. What functional features are annotated on this domain?
10. View the CD in Cn3D
    Launch Cn3D by clicking "Show Structure." Click "Show Annotations Panel" in the CDD Descriptions panel, and highlight each functional feature. Do any of them occur near the long, internal disordered region in 1TFR? What might this region do?
11. Locate structural features unique to 1TFR
    Select Style/Coloring Shortcuts/Object. Can you find a region close in space to the disordered region analyzed in the previous step that is unique to 1TFR? Can you design a deletion mutant of 1TFR that may reveal specific interactions between RNase H and other proteins?
12. EXTRA CREDIT (optional): Use BLAST to find sequence-similar structures to 1TFR
    Keep Cn3D open, and open a new browser window. From the NCBI home page, retrieve 1TFR in Entrez Protein. Click on "Related Structure" in the Links menu to the right of the accession.
13. EXTRA CREDIT (optional): View the BLAST results
    Do you find any structures? Do any of the structures look familiar from your VAST search? Compare the BLAST alignment to the VAST alignment between these structures and 1TFR. How did BLAST do?
Revised January 17, 2007


http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/photolyase.html

Workshop II
Structural Features Conferring Enzyme Thermostability

Goals
   Use BLAST to find structures in a mesophilic species that are sequence-similar to a query structure in a thermophilic species
   Analyze the sequence conservation of a binding site between the two structures
   Use VAST to build a structural alignment between these structures
   Analyze the VAST alignment to discover structural differences that may confer thermostability

Steps
1. Find sequence-similar structures with BLAST
   From the NCBI home page, click the BLAST link on the top tool bar. Click Standard Protein-protein BLAST.
2. Set up the BLAST search
   Enter 1IQRA in the Search box (a T. thermophilus DNA photolyase). Select PDB from the database menu. In the Options section, select Escherichia coli[orgn] from the menu to the right of Limit by Entrez query. Click the BLAST button to begin your search.
3. View the BLAST results
   Click the Format button to retrieve your results. If they are not ready, close the window, and click Format again after a few moments. Scroll down beneath the graphic summary, and make a note of the PDB code of the best E. coli homolog. Click on the Related Structures link just below the graphical summary.
4. Analyze your results using BLAST Related Structures
   The T. thermophilus protein is represented by the top bar, with the conserved FAD binding and photolyase domains shown beneath this (the phrB domains are COGs). Where are the regions of sequence similarity between these proteins? In which conserved domain is the better match?
5. View the sequence alignment in Cn3D
   Click on the red bar of the best hit to the FAD binding domain. Click "Get 3D Structure Data". The E. coli structure is shown, and is the top sequence in the alignment. Residues that are identical between the two proteins will be colored red.
6. Locate residues that bind a co-factor
   Find the FAD co-factor in the structure (it will be the one in the center of the protein). Highlight the FAD by double clicking it. Find all residues within 3.5 Angstroms of the FAD using Show/Hide / Select by Distance.
7. Analyze the sequence conservation of these residues
   Point the mouse over the highlighted E. coli residues and find their number in the lower left corner of the window. Make a list of the residues that contact FAD in E. coli. Based on the BLAST alignment, which of these residues are conserved in both proteins? Quit Cn3D when you are finished.
8. Use VAST to find structure neighbors to 1IQR
   Return to the NCBI home page, and click Structure on the top tool bar. Enter 1IQR in the search box and click Go. Click on the accession to load the structure summary page. Click on the gray bar labeled Chain A to load the VAST neighbors.
9. View the VAST alignment in Cn3D
   Type the PDB code (1DNP) for the E. coli structure in the Find box and click Find. Check the box to the left of chain A of the E. coli structure (make sure you choose the entire chain, the row with the longest alignment). Click "View 3D Structure".
10. Analyze the co-factor binding site
    Referring to your list of residues contacting FAD in E. coli, determine if these residues are conserved in 1IQR according to the VAST alignment. Select Style / Coloring Shortcuts / Object. Highlight the FAD in 1IQR and find the residues within 3.0 Angstroms of the ligand. Do the same residues contact FAD in both structures? Are the residues that do contact FAD in both structures conserved?
11. Locate potential features conferring thermostability
    Choose Style / Coloring Shortcuts / Secondary Structure (helices are green, strands are tan, and loop/coils are blue). Which sequence has more gaps (represented as ~ symbols)? In what kind of secondary structure element do they occur most frequently?
12. Locate potential features conferring thermostability
    In the sequence window, choose View / Find pattern, and search for PPP (proline triplet). Locate any matches. In which sequence is it? Are there other prolines nearby? View these residues in the structure. Proline-rich sequences tend to form polyproline helices. Do you see any evidence of that happening here?
13. Develop a hypothesis
    Using the evidence gathered in the last step, form a hypothesis about how the T. thermophilus structure may be more stable at high temperature than the E. coli protein.
Revised June 19, 2006
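
The pattern search for a proline triplet can also be reproduced outside Cn3D with a regular expression. The sketch below uses Python's standard `re` module with a lookahead so that overlapping matches are reported. The sequence is invented for illustration, the function name is hypothetical, and the pattern is treated as a Python regular expression, which may differ from the pattern syntax Cn3D accepts.

```python
import re


def find_pattern(sequence, pattern):
    """Return the 1-based start positions of all matches, overlaps included."""
    # A zero-width lookahead lets matches that share residues be counted.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]


# Invented sequence with a run of four prolines and a separate triplet.
sequence = "MAPPPPLGKPPPA"
print(find_pattern(sequence, "PPP"))
```

Reporting 1-based positions keeps the output comparable with the residue numbers shown in the lower left corner of the Cn3D sequence window.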


http://www.ncbi.nlm.nih.gov/Class/Structure/workshopII/challenge.html

Workshop II
The Alignment Challenge: An enzyme of unknown structure

Goals
   Find the most sequence-similar structure to a protein of unknown structure
   Build a VAST alignment to this structure
   Use the VAST alignment as a structural template for the query protein
   Align the query sequence to the VAST alignment using Cn3D

Introduction
In this exercise we will attempt to find a structural template for a DNA photolyase from D. melanogaster, accession AAB32328, using a well-studied group of bacterial photolyases. This exercise has no real end, but rather is an example of how one might begin a modeling project!

Steps
1. Retrieve the protein sequence using Entrez
   Use Entrez Protein to retrieve AAB32328. Click on the accession and review the record.
2. Locate conserved domains in the protein
   Click the Conserved Domains link to view the matching CDs, and then click the Full Report button. What two major functional domains does the protein contain (look for single-domain CDs, not COGs)? From what database were these CDs derived?
3. Find a sequence-similar structure to the query
   Click on the link containing the protein accession to the right of "Query sequence" at the top of the page. Click 'Related Structure' in the Links menu. Change the menu from 'All MMDB' to 'Non-identical', and click List.
4. View the BLAST results
   Change the display menu from 'Table' to 'Graphic' and click List. Based on their names, do the top few hits seem to be reasonable candidates for homologs?
5. Analyze the BLAST results
   Make a note of the most similar structure. What is its e-value? Make a note of the next three most sequence-similar structures. What are their e-values?
6. Analyze the locations of matching CDs
   Change 'Graphic' back to 'Table' and click List. Which CD does the top structure match best? Do any of the structures fully match both CDs?
7. Retrieve VAST neighbors to the target structure
   Since the two single-domain CDs in AAB32328 are not curated records, we will build a VAST multiple alignment as a potential template, using the most similar structure as the master. Click on the accession of the most similar structure. Click on the grey chain A bar to load the VAST neighbors.
8. Analyze the VAST neighbors
   Change the List subset menu to "High redundancy" and click List. Compare the three best VAST neighbors to the sequences found by BLAST. How do the VAST alignments compare to the BLAST alignments? Which are better?
9. Import a VAST alignment into Cn3D
   Change the menu from 'Graphics' to 'Table' and click List. Check the boxes to the left of the three best VAST hits that are photolyases and that are not identical to the query (check the %id column). Click "View 3D Alignment".
10. Clean up the VAST alignment
    In the sequence window, select Edit / Enable Editor. In the structure window, select Style / Coloring Shortcuts / Secondary Structure. Using the scroll bar in the sequence window, carefully scan the alignment for blocks of fewer than 4 residues. Delete these blocks by selecting Edit / Delete Block, and then clicking on the block to delete. What kind of secondary structure elements were these little blocks in?
11. Import AAB32328 into Cn3D
    Select Imports / Show Imports. In the Imports window, select Edit / Import Sequences. Select Network and enter AAB32328. You should see AAB32328 (gi 688103) paired, but unaligned, to the master structure.
12. Alignment trial 1: PSI-BLAST
    Select Algorithms / BLAST/PSSM Single. Position the pointer over the sequence pair, and click. Position the pointer over the aligned pair, and find the e-value of the alignment shown in the bottom of the window.
13. Alignment trial 1: PSI-BLAST
    Look for red shaded areas (which indicate portions of the PSI-BLAST alignment that do not match the VAST alignment), and also for green shaded areas that indicate geometry violations. What portion(s) of the sequence did not align?
14. Alignment trial 2: BLOCK Aligner
    Select Algorithms / Block Align Single. Click on the paired sequences. In the options window, accept all defaults except uncheck "Global Alignment."
15. Fixing a geometry violation
    Find the two green-shaded residues in the bottom row. The shading indicates a geometry violation, usually meaning that the blocks are too tight and need to be relaxed (shortened). In the alignment window, choose Mouse Mode / Horizontal drag.
16. Fixing a geometry violation
    Find the block on the left side of the geometry violation. Note that this block ends with random coil. Click on the block end (->) and drag it one residue to the left. Repeat the BLOCK Aligner. The green should disappear!
17. Now for the rest of the sequence...
    What portion of the sequence is still not matching? At this point we will focus mainly on this problematic region.
18. Alignment trial 3: Threader
    Choose Algorithms / Thread Single. Click on the paired sequences. Accept all defaults, but uncheck "Merge results after each row is threaded". Click OK to begin threading. The Threader will now attempt to align only the portions of the sequence not already in blocks. It may take a few minutes.
19. Merge the trial alignment
    When you see blocks appear, the threader is finished. Position the pointer on the aligned sequences to see the score in the bottom of the window. Select Alignments / Merge All to merge the new alignment into the VAST alignment. AAB32328 now appears as the bottom row.
20. Analyze the new alignment
    You can now evaluate the alignment by coloring by sequence conservation, hydrophobicity (looking for conserved hydrophobic cores), or by finding the residues that contact the FAD cofactor. This is a very preliminary example of using the threader, which in general should be run several times using different block alignment structures and combinations of frozen blocks, and then analyzing the threading scores and resulting alignments. This preliminary alignment is only the beginning!
Revised January 17, 2007
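
The clean-up step that scans for blocks of fewer than 4 residues can be expressed as a simple filter. The sketch below assumes each block is given as an inclusive (start, end) range of alignment columns. The block coordinates and the function name are hypothetical, and the code only mimics the manual Edit / Delete Block procedure rather than calling any Cn3D function.

```python
def drop_short_blocks(blocks, min_length=4):
    """Split blocks into those to keep and those to delete.

    blocks: list of (start, end) inclusive column ranges.
    Returns (kept, dropped), where dropped blocks have fewer than
    min_length columns.
    """
    kept, dropped = [], []
    for start, end in blocks:
        length = end - start + 1
        (kept if length >= min_length else dropped).append((start, end))
    return kept, dropped


# Hypothetical block layout with two very short blocks.
blocks = [(1, 12), (20, 22), (30, 41), (50, 50)]
kept, dropped = drop_short_blocks(blocks)
print("keep:", kept)
print("delete:", dropped)
```

Listing the dropped blocks separately makes it easy to inspect which secondary structure elements they fell in before deleting them.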
