Introduction
To understand and map the universe of proteins, it is
necessary to know their structures, which provide
important information about their function and
mechanism of action. Predicting the native
conformation of a protein is one of the most challenging
problems in molecular biology. Secondary structure
prediction is an important intermediate step in this
process.
Basics of Protein
Structure
Primary (amino acid sequence over the 20-letter alphabet ACDEFGHIKLMNPQRSTVWY)
Secondary
Tertiary
Quaternary
…the way
Traditional experimental methods:
X-ray crystallography or NMR is used to solve three-dimensional structures;
both require relatively large amounts of pure protein
(generally greater than milligram quantities). The
structures of many proteins will therefore remain out of reach.
Thus 3D structure prediction is a major open problem
(with more than three decades of history).
Strong demand for structure prediction:
continual advances in molecular biology provide
protein sequence information (primary structures) at
a pace that far exceeds the speed with which higher-
order protein structures can be determined.
Reasons for Predicting Secondary
Structure
Secondary structure is largely local, so only the amino
acid sequence is needed
Accurate secondary structure prediction provides
important information for tertiary structure
prediction
Protein function prediction
Predicting structural change
Protein classification
To gain insights into the protein folding process
Facilitate alignment for homology modeling of
distantly related proteins
To assist 3D structure modeling from NMR data
Secondary Structure Prediction
methods:
1st-generation method
Calculate propensities for each amino acid
Chou-Fasman method (P.Y. Chou, G.D. Fasman, 1974)
2nd-generation method
Calculate propensities for segments of 3-51 amino acids
GOR method (Garnier et al, 1978)
3rd-generation method
Use evolutionary information from multiple sequence
alignments
Neural Network method (Qian & Sejnowski 1988, Karplus 1996)
Nearest neighbor methods (Yi & Lander, 1993)
PHD algorithm (Rost & Sander, 1993)
Homology or nearest neighbor comparisons (Levin, 1993)
Evolutionary methods (Barton, Niemann)
Combined approaches (Rost 1994, Levin, Argos)
Chou-Fasman Algorithm
Empirical (statistical) method for secondary structure
prediction [α-helix, β-strand, or coil].
Several prediction methods developed in recent years
have improved prediction accuracy
relative to the original CF method. Nevertheless, many
authors still use amino acid propensities or the CF method for
secondary (2D) structure prediction, for evolution studies, and
for developing or evaluating new prediction methods.
Uses a small amount of known structural data.
Based on two parameters:
Frequency
Propensity
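As a rough illustration of how per-residue propensities might be turned into a prediction, the sketch below marks every window whose mean helix propensity exceeds a cutoff. It is a deliberately simplified stand-in and not the full Chou-Fasman rule set. The function name `helix_windows`, the window length of 6, the cutoff of 1.0, and the propensity values are all assumptions made for this example.

```python
def helix_windows(sequence, propensity, window=6, cutoff=1.0):
    """Mark residues lying in any high-propensity window as 'H', others as 'C'."""
    marks = ["C"] * len(sequence)
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Residues missing from the table are treated as neutral (1.0).
        mean = sum(propensity.get(aa, 1.0) for aa in segment) / window
        if mean > cutoff:
            for k in range(start, start + window):
                marks[k] = "H"
    return "".join(marks)

# Placeholder propensities, made up for this example (not published values).
toy = {"A": 1.4, "E": 1.5, "G": 0.5, "P": 0.5}
prediction = helix_windows("AEAEAEGPGPGP", toy)
print(prediction)
```

Because whole windows are marked, the predicted helix can spill over into a few neighboring low-propensity residues; real implementations refine segment boundaries with additional rules.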
Methods
The analyses were performed using PDBselect, a
set of experimentally determined, non-redundant
protein structures from the PDB. The PDBselect list with
<25% sequence homology, released in Oct 2007,
contained 3693 protein chains. All analyses
were performed on human proteins.
The extracted protein sequences (321 human proteins) were classified into
four structural classes (all-alpha, all-beta,
alpha+beta and alpha/beta) using
the structural information provided by the SCOP database.
The secondary (2D) structure of every PDBselect entry was
assigned by the DSSP algorithm.
Secondary structure
alphabets
Standard 3-state alphabet:
H: α-helix
E: β-strand (extended structure)
C: coil (any other structure)
DSSP alphabet by Kabsch & Sander (1983):
H: α-helix
G: 3₁₀ helix
I: π-helix
E: extended strand (β-strand)
B: residue in isolated β-bridge
T: H-bonded turn
S: bend
CASP convention Standard:
H = (H, G, I)
E = (E, B)
C = (T, S)
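The reduction from the DSSP alphabet to the 3-state alphabet can be sketched as a simple lookup. This is a minimal example; the names `DSSP_TO_3STATE` and `reduce_dssp` are hypothetical, and treating any unlisted symbol (such as '-' for an unassigned residue) as coil is an assumption of this sketch.

```python
# Map DSSP symbols onto the 3-state alphabet (H, E, C).
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and isolated bridges
    "T": "C", "S": "C",             # turns and bends
}

def reduce_dssp(dssp: str) -> str:
    """Reduce a DSSP string to H/E/C; unlisted symbols become coil (assumption)."""
    return "".join(DSSP_TO_3STATE.get(symbol, "C") for symbol in dssp)

reduced = reduce_dssp("HHHGGTSEEB-I")
print(reduced)  # HHHHHCCEEECH
```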
Propensity
$$P_{ij} = \frac{n_{ij}/n_i}{N_j/N_T}$$
where:
$n_{ij}$ = number of residues of amino acid $i$ that occur in α-helix ($j$).
$n_i$ = total number of residues of amino acid $i$ in the database.
$N_j$ = total number of residues of all amino acids in α-helix ($j$).
$N_T$ = total number of residues of all amino acids in the database.
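The propensity calculation can be sketched in a few lines, taking n_i as the total count of amino acid i in the data set and N_j / N_T as the background fraction of residues in state j. The function name `propensities` and the toy input are invented for illustration and are not real protein data.

```python
from collections import Counter

def propensities(sequences, structures, state="H"):
    """Return {amino_acid: P_ij} for one secondary-structure state j.

    sequences  -- list of amino acid strings
    structures -- list of equally long 3-state strings (H/E/C)
    """
    n_i = Counter()    # occurrences of each amino acid in the data set
    n_ij = Counter()   # occurrences of each amino acid in `state`
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, s in zip(seq, ss):
            n_i[aa] += 1
            if s == state:
                n_ij[aa] += 1
    N_T = sum(n_i.values())   # all residues in the data set
    N_j = sum(n_ij.values())  # all residues in `state`
    if N_j == 0:
        raise ValueError(f"state {state!r} does not occur in the data")
    background = N_j / N_T
    return {aa: (n_ij[aa] / n_i[aa]) / background for aa in n_i}

# Toy input, invented for illustration only.
seqs = ["AAGA", "GAGG"]
sss = ["HHCH", "CHCC"]
p = propensities(seqs, sss, "H")
print(p)
```

A value above 1 means the amino acid occurs in the state more often than the background rate; a value below 1 means it occurs less often.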
http://symatlas.gnf.org/Symatlas/