
Presented by

Introduction
To understand and map the protein universe, it is
necessary to know protein structures, which provide
important information about protein function and
mechanism of action. Predicting the native
conformation of proteins is one of the most challenging
problems in molecular biology. Secondary structure
prediction is an important intermediate step in this
process.
Basics of Protein
Structure
Primary structure: the amino acid sequence, written in
the 20-letter alphabet ACDEFGHIKLMNPQRSTVWY
Secondary structure
Tertiary structure
Quaternary structure
…the way
Traditional experimental methods:
X-ray crystallography or NMR to solve three-dimensional
structures; these require relatively large amounts of
pure protein (generally more than milligram quantities).
The structures of many proteins will therefore remain
out of reach. Thus 3D structure prediction is a major
problem (with more than three decades of history).
Strong demand for structure prediction:
continual advances in molecular biology provide
protein sequence information (primary structures) at
a pace that far exceeds the speed with which higher-
order protein structures can be determined.
Reasons for Predicting Secondary
Structure
Since secondary structure is local, just need amino
acid sequence
Accurate secondary structure prediction can provide
important information for tertiary structure
prediction
Protein function prediction
Predicting structural change
Protein classification
To gain insights into the protein folding process
Facilitate alignment for homology modeling of
distantly related proteins
To assist 3D structure modeling from NMR data
Secondary Structure Prediction
methods:
1st-generation method
Calculate propensities for each amino acid
 Chou-Fasman method (P.Y. Chou, G.D. Fasman, 1974)

2nd-generation method
Calculate propensities for segments of 3-51 amino acids
 GOR method (Garnier et al, 1978)

3rd-generation method
Use evolutionary information from multiple sequence
alignments
 Neural Network method (Qian & Sejnowski 1988, Karplus 1996)
 Nearest neighbour methods (Yi & Lander, 1993)
 PHD algorithm (Rost & Sander, 1993)
 Homology or nearest neighbor comparisons (Levin, 1993)
 Evolutionary methods (Barton, Niemann)
 Combined approaches (Rost 1994, Levin, Argos)
Chou-Fasman Algorithm
Empirical (statistical) method for secondary structure
prediction [α-helix, β-strand, or coil].
Several prediction methods developed in recent years have
improved prediction accuracy considerably compared with
the original CF method. Nevertheless, many authors still
use amino acid propensities or the CF method for secondary
(2D) structure prediction, for evolutionary studies, and
for developing or evaluating new prediction methods.
Uses a small set of known structural data.
Based on two parameters:
 Frequency

 Propensity
Methods
The analyses used PDBselect, a set of experimentally
determined, non-redundant protein structures from the
PDB. The PDBselect list with <25% sequence homology,
released in Oct 2007, contained 3693 protein chains.
All analyses were performed on human proteins.
The extracted protein sequences [321 human proteins]
were classified into four secondary structural classes
(all-alpha, all-beta, alpha+beta and alpha/beta) using
the structural information provided by the SCOP database.
The secondary (2D) structure of every PDBselect entry
was assigned by the DSSP algorithm.
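DSSP reports eight states, while the propensities in this analysis are computed for helix, sheet and coil. A minimal sketch of that reduction, using the CASP-style grouping given on the alphabets slide (H, G, I to H; E, B to E; T, S to C), might look like the following. The names `DSSP_TO_3STATE` and `reduce_dssp` are hypothetical, and mapping blank or unrecognised codes to coil is an assumption of this sketch, not something the slides state.

```python
# Reduction of the 8-state DSSP alphabet to 3 states (H, E, C).
# Assumption: any code not listed below (including a blank) is treated as coil.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",  # alpha-, 3-10 and pi-helix
    "E": "E", "B": "E",            # extended strand and isolated beta-bridge
    "T": "C", "S": "C",            # H-bonded turn and bend
}


def reduce_dssp(dssp_string):
    """Return the 3-state string for an 8-state DSSP assignment string."""
    return "".join(DSSP_TO_3STATE.get(code, "C") for code in dssp_string)
```

For example, `reduce_dssp("HGIEBTS")` gives `"HHHEECC"`.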
Secondary structure
alphabets
Standard 3-state alphabet:
 H: α-helix
 E: β-strand (extended structure)
 C: coil (any other structure)
DSSP alphabet by Kabsch & Sander (1983):
 H: α-helix
 G: 3₁₀ helix
 I: π-helix
 E: extended strand (β-strand)
 B: residue in isolated β-bridge
 T: H-bonded turn
 S: bend
CASP standard convention:
 H = (H, G, I)
 E = (E, B)
 C = (T, S)
Propensity
$$P_{ij} = \frac{n_{ij} / n_i}{N_j / N_T}$$
 where:
 $n_{ij}$ = number of residues of amino acid $i$ that occur in secondary structure $j$ (e.g. α-helix).
 $n_i$ = total number of residues of amino acid $i$ in the database.
 $N_j$ = total number of residues of all amino acids in secondary structure $j$.
 $N_T$ = total number of residues of all amino acids in the database.

 Propensity values for all amino acids were calculated with a Perl
 program in a Linux environment.
 The t-test was used to evaluate the significance of the pairwise
 differences in intra- and inter-amino-acid propensities among the four
 secondary structural classes of human proteins.
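The propensity formula on this slide can be illustrated with a short sketch. The original calculation was done in Perl; the Python below is only an illustration, and the function name `propensities` and the input layout (parallel sequence and 3-state structure strings) are assumptions rather than the authors' program.

```python
from collections import Counter


def propensities(sequences, structures, state="H"):
    """Compute P_ij = (n_ij / n_i) / (N_j / N_T) for one structure state j.

    sequences  -- amino acid strings
    structures -- secondary structure strings of the same lengths
    state      -- the structure code j to score (e.g. "H", "E" or "C")
    """
    n_i = Counter()   # count of each amino acid in the whole data set
    n_ij = Counter()  # count of each amino acid inside state j
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, code in zip(seq, ss):
            n_i[aa] += 1
            if code == state:
                n_ij[aa] += 1
    N_T = sum(n_i.values())   # all residues in the data set
    N_j = sum(n_ij.values())  # all residues found in state j
    if N_j == 0:
        raise ValueError("state does not occur in the data set")
    background = N_j / N_T
    return {aa: (n_ij[aa] / n_i[aa]) / background for aa in n_i}
```

A value above 1 means the amino acid occurs in that state more often than expected from the overall composition; a value below 1 means it occurs less often.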
Analysis of GC3
GC3:
G+C base composition at the third codon position.
TBLASTN was used to retrieve the mRNA sequence
corresponding to each protein sequence.
GC3 values were calculated with the SingleFasta program
(developed at the Bioinformatics Center, Bose Institute).
Sequences were divided into two categories: GC3 low
(<45%) and GC3 high (>60%).
A t-test was performed to evaluate the significance of
GC3 and the corresponding amino acid propensities in
helix, sheet and coil for the four structural classes.
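As an illustration of the GC3 measure and the two categories, here is a minimal sketch. It is not the SingleFasta program, whose behaviour is not described here; the function names `gc3` and `gc3_category` are hypothetical, and the sketch assumes an in-frame coding sequence starting at the first codon position, counting every complete codon, including a stop codon if one is present.

```python
def gc3(cds):
    """Fraction of G or C at the third position of each complete codon."""
    thirds = cds.upper()[2::3]  # third base of every complete codon
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc3_category(value, low=0.45, high=0.60):
    """Classify a GC3 fraction using the thresholds from the slide."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "intermediate"  # between the two cut-offs, not used in the analysis
```

For `ATGGCCTAA` the third positions are G, C and A, so `gc3` returns 2/3 and the sequence falls into the GC3 high category.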
Expression level
Expression level:
Expression data were collected from the GNF SymAtlas database.
Divided into two categories: low expression (<20%) and
high expression (>80%).
A t-test was performed to evaluate the significance of
expression values and the corresponding amino acid
propensities in helix, sheet and coil for the four
structural classes (all-a, all-b, a+b, and a/b).
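The t-test comparisons used for structural class, GC3 and expression level can be sketched as below. The slides do not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and `welch_t` is a hypothetical helper. It returns only the statistic and the approximate degrees of freedom; a p-value would still have to be read from a t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample is a list of propensity values for one group,
    for example the 20 amino acid propensities of one structural class.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of group a
    vb = variance(sample_b) / nb  # squared standard error of group b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Each group needs at least two values with non-zero spread, otherwise the variance or the division is undefined.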
Result
Calculation of propensities:
 Significance of the pairwise differences between classes:
 all-a vs all-b: strong in helix and sheet, average in coil.
 all-a vs a+b: average in helix and coil, strong in sheet.
 all-a vs a/b: weak in helix, strong in sheet and coil.
 all-b vs a+b: strong in helix, average in sheet, weak in coil.
 all-b vs a/b: significant in helix, weak in sheet and coil.
 a+b vs a/b: weak in helix, sheet, and coil.
GC3 vs propensities:
Differences between structural classes (inter-class) are more
significant than differences within a structural class (intra-class).
This holds for both the GC3 high and the GC3 low groups.
Expression vs propensity:
Sheet structure gives significant results in all protein classes.
Low-expression proteins show more significant results.
Val shows a significant result in random match.
Conclusion
The propensity of amino acids for secondary structure
is influenced by sequence context and structural
organization. This suggests that propensity for
secondary structure should not be considered a truly
intrinsic property of each amino acid, but one that
depends on context.
Amino acid propensities for secondary structures are
influenced not only by protein structural class, but also
by genomic GC3 and expression level.
Although other predictive approaches exist and give better
results than statistical methods, these results indicate
that improvements to statistical methods are still possible.
References:
URL:
PDBselect: http://bioinfo.tg.fh-giessen.de/pdbselect
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
DSSP: http://www.embl-heidelberg.de/dssp/
GNF SymAtlas: http://symatlas.gnf.org/Symatlas/
