AIM: To browse human genome data and the OMIM and SNP databases in order to understand genetic and
metabolic disorders
Genomes: A genome is all of a living thing's genetic material. It is the entire set of hereditary
instructions for building, running, and maintaining an organism, and passing life on to the
next generation. In short, it is the complete set of chromosomes with all the genes (for diploid
organisms, often it is given as a haploid genome) for that species.
Genomic Resources: The organism’s genomic resources are stored as Genome databases
and these are a collection of complete and incomplete large-scale sequencing, assembly,
annotation and mapping projects for cellular organisms. The genome database provides views
for a variety of genomes, complete chromosomes, sequence maps and integrated genetic and
physical maps, organelles, plasmids as well as genome assemblies.
Human Genome: It is the complete genetic information carried on the 22 pairs of autosomes and the pair
of sex chromosomes, X and Y. It includes the location and the sequence of genes along the
length of each chromosome, the distance between adjacent genes, and the entire nucleotide
sequence of each gene (with its allelic forms) across the full chromosome complement of
46 chromosomes.
Genetic Disorders: These are disorders caused by changes in gene structure that lead to
malfunction of the gene, thereby affecting the phenotype associated with it (the phenotypic
expression of that gene). Genetic disorders are studied by examining inheritance patterns and
gene structure and function.
Metabolic Disorders: Metabolic disorders are disorders of metabolism and may be due to
infection or other causes. By using drugs, these disorders can be treated and, in some
cases, the patient cured.
PROCEDURE:
http://genes.mit.edu/GENSCAN.html
http://www-bimas.cit.nih.gov/molbio/proscan/
All of these principles can be used to help locate the position of genes in DNA and are often
known as “searching by signal,” “searching by content,” and “homology inference,”
respectively. Homology inference can be especially helpful, but it fails when there are no
similar proteins in the databases, and even when homologues can be found, discovering
exon-intron borders and UTRs (5' and 3' untranslated regions) can be very difficult. If you
have cDNA available, you can align it to the genomic sequence to ascertain where the genes
lie, but even this can be quite difficult, and cDNA libraries are not always available. No one
method is absolutely reliable. The only approach that would be 100% positive is to know the
complete amino acid sequence of the protein of interest and simply translate all of the DNA
until the correct pieces fall out, but one seldom has that luxury. Since we are usually forced
to discover just where these pieces are, especially with genomic DNA, computerized analysis
becomes essential.
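As a rough illustration of the simplest kind of signal-based search, the sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is only a toy under stated assumptions: the function name `find_orfs`, the `min_codons` threshold and the example sequence are made up for illustration, introns and the reverse strand are ignored, and this is not how GENSCAN or any other production gene finder works.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop stretches on the forward strand.

    start is the 0-based index of the A in ATG; end is one past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next start codon
    return orfs

# Hypothetical toy sequence containing one short open reading frame.
example = "CCATGAAATTTGGGTAACC"
print(find_orfs(example))  # [(2, 17, 2)]
```

Real genomic sequence contains many such open reading frames that are not genes, which is one reason content statistics and homology evidence are combined with signals in practice.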
GENE PREDICTION
Gene finding typically refers to the area of computational biology that is concerned with
algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically
functional. This especially includes protein-coding genes, but may also include other
functional elements such as RNA genes and regulatory regions. Gene finding is one of the
first and most important steps in understanding the genome of a species once it has been
sequenced. In its earliest days, "gene finding" was based on painstaking experimentation on
living cells and organisms. Statistical analysis of the rates of homologous recombination of
several different genes could determine their order on a certain chromosome, and information
from many such experiments could be combined to create a genetic map specifying the rough
location of known genes relative to each other. Today, with comprehensive genome sequence
and powerful computational resources at the disposal of the research community, gene
finding has been redefined as a largely computational problem. Determining that a sequence
is functional should be distinguished from determining the function of the gene or its product.
The latter still demands in vivo experimentation through gene knockout and other assays,
although frontiers of bioinformatics research are making it increasingly possible to predict the
function of a gene based on its sequence alone.
PROMOTER PREDICTION
PROCEDURE:
Experiment No: 3
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/msa/clustalw2/
Given the nucleotide or amino acid sequence of a biological molecule, what can we know
about that molecule? We can find biologically relevant information in sequences by searching
for particular patterns that may reflect some function of the molecule. These can be
catalogued motifs and domains, secondary structure predictions, physical attributes such as
hydrophobicity, or even the content of DNA itself as in some of the gene finding techniques.
But, what about comparisons with other sequences? Can we learn about one molecule by
comparing it to another? Yes, naturally we can; inference through homology is a fundamental
principle to all the biological sciences. We can learn a tremendous amount by comparing our
sequence against others.
The math can be generalized thus: for any two sequences of length m and n, the best local
alignments are identified as HSPs (high-scoring segment pairs). HSPs are stretches of sequence
pairs that cannot be further improved by extension or trimming, as described above. For
ungapped alignments, the number of expected HSPs with a score of at least S is given by the
formula $E = K m n \, e^{-\lambda S}$. This is called the E-value for the score S. In a
database search n is the size of the database in residues, so N = mn is the search space size.
K and λ are supplied by statistical theory and can be calculated by comparison to
precomputed, simulated distributions. Together, these two parameters determine how a raw
score S translates into an E-value.
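The E-value formula is easy to evaluate directly. The sketch below is a minimal illustration; the function name `expected_hsps` and all numerical values (query length, database size, K and λ) are assumptions chosen for the example, not parameters taken from any particular BLAST run.

```python
import math

def expected_hsps(score, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S) for an ungapped alignment score S."""
    return K * m * n * math.exp(-lam * score)

# Illustrative (assumed) values: a 250-residue query against a database
# of 50 million residues, with made-up K and lambda.
m, n = 250, 50_000_000
K, lam = 0.13, 0.32

for S in (40, 60, 80):
    print(S, expected_hsps(S, m, n, K, lam))
```

Because S sits in the exponent, each additional unit of score divides E by a constant factor, while doubling the database size n only doubles E.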
PHYLOGENETIC ANALYSIS
Every living organism contains DNA, RNA, and proteins. Closely related organisms
generally have a high degree of agreement in the molecular structure of these substances,
while the molecules of organisms distantly related usually show a pattern of dissimilarity.
Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations
over time, and assuming a constant rate of mutation provides a molecular clock for dating
divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the
probable evolution of various organisms. Not until recent decades, however, has it been
possible to isolate and identify these molecular structures. The most common approach is the
comparison of homologous sequences for genes using sequence alignment techniques to
identify similarity. Another application of molecular phylogeny is in DNA barcoding, where
the species of an individual organism is identified using small sections of mitochondrial
DNA. Another application of the techniques that make this possible can be seen in the very
limited field of human genetics, such as the ever more popular use of genetic testing to
determine a child's paternity, as well as the emergence of a new branch of criminal forensics
focused on evidence known as genetic fingerprinting.
ClustalW (Thompson, Higgins & Gibson, 1994) is one of the standard programs
implementing one variant of the progressive method in wide use today for multiple sequence
alignment. The W denotes a specific version that has been developed from the original
Clustal program.
1. Compute the pairwise alignments for all against all sequences. The similarities are
stored in a matrix (sequences versus sequences).
2. Convert the sequence similarity matrix values to distance measures, reflecting
evolutionary distance between each pair of sequences.
3. Construct a tree (the so-called guide tree) for the order in which pairs of sequences
are to be aligned and combined with previous alignments. This is done using a
neighbour-joining clustering algorithm. In the case of ClustalW, a method by Saitou
& Nei is used.
4. Progressively align the sequences/alignments together into each branch point of the
guide tree, starting with the least distant pairs of sequences.
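The first three steps above can be sketched on toy data. This is a deliberately simplified illustration, not ClustalW itself: it assumes the sequences are already of equal length so that similarity can be taken as the fraction of identical positions, it converts similarity to distance as 1 - identity, and it replaces the neighbour-joining method of Saitou & Nei with plain average-linkage clustering to obtain a joining order. All names and sequences are hypothetical.

```python
from itertools import combinations

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Steps 1-2: all-against-all similarity, converted to distance = 1 - identity."""
    return {frozenset((p, q)): 1.0 - identity(seqs[p], seqs[q])
            for p, q in combinations(seqs, 2)}

def join_order(seqs):
    """Step 3 (simplified): repeatedly merge the two closest clusters."""
    dist = distance_matrix(seqs)

    def average_distance(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in seqs]
    order = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        order.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return order

# Hypothetical toy sequences.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTACCA"}
print(join_order(toy))  # [(('A',), ('B',)), (('C',), ('A', 'B'))]
```

The joining order plays the role of the guide tree: the least distant pair (A and B here) would be aligned first, and C would then be aligned to that alignment.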
PROCEDURE:
Experiment No: 4
REQUIREMENTS:
A computer system with licensed software and an Internet connection, preferably fast
broadband.
WEB RESOURCES:
www.expasy.ch
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1
http://nihserver.mbi.ucla.edu/SAVES/
PRINCIPLE:
Homology modeling, also known as comparative modeling of proteins, refers to constructing
an atomic-resolution model of the "target" protein from its amino acid sequence and an
experimental three-dimensional structure of a related homologous protein (the "template").
Homology modeling relies on the identification of one or more known protein structures
likely to resemble the structure of the query sequence, and on the production of an alignment
that maps residues in the query sequence to residues in the template sequence. It has been
shown that protein structures are more conserved than protein sequences amongst
homologues, but sequences falling below 20% sequence identity can have very different
structures. Evolutionarily related proteins have similar sequences, and naturally occurring
homologous proteins have similar protein structures. It has been shown that three-dimensional
protein structure is evolutionarily more conserved than would be expected on the basis of
sequence conservation alone. The sequence alignment and template structure are then used to
produce a structural model of the target. Because protein structures are more conserved than
DNA sequences, detectable levels of sequence similarity usually imply significant structural
similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment gaps
(commonly called indels) that indicate a structural region present in the target but not in the
template, and by structure gaps in the template that arise from poor resolution in the
experimental procedure (usually X-ray crystallography) used to solve the structure. Model
quality declines with decreasing sequence identity; a typical model has ~1-2 Å root mean
square deviation between the matched Cα atoms at 70% sequence identity but only 2-4 Å
agreement at 25% sequence identity. However, the errors are significantly higher in the loop
regions, where the amino acid sequences of the target and template proteins may be
completely different.
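The root mean square deviation used above to express model quality can be computed as follows. This is a minimal sketch that assumes the two structures have already been superposed and that the Cα atoms are matched one-to-one in the same order; the function name `rmsd` and the coordinates are hypothetical, and no optimal superposition step is performed.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. angstroms)
    between two equally long lists of matched (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical matched C-alpha positions from a model and its template.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, template))  # about 1.414
```

Because every atom pair contributes its squared distance, a few badly modelled loop residues can dominate the value even when the core of the model is accurate.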
5. Model optimization.
6. Model validation.
PROCEDURE:
3. Open BLAST and select the Protein BLAST option to find a template structure for your
protein of interest.
4. Submit the target sequence together with the suitable template structure to the
SWISS-MODEL server (automated mode) for homology modeling.
5. After obtaining the model, visualize it in RasMol and then assess the quality of the
model with the PROCHECK and Verify3D tools.
Experiment No: 5
AIM: To retrieve drug molecule information from a database and calculate the drug-like
properties of the molecule.
WEB RESOURCES:
http://pubchem.ncbi.nlm.nih.gov/
http://www.molinspiration.com/cgi-bin/properties
PRINCIPLE:
Pubchem
Molinspiration server
The server calculates the following properties of drug molecules that are drawn in
its interface.
Molecular Volume
Number of Rotatable Bonds
This simple topological parameter is a measure of molecular flexibility. It has been shown to
be a very good descriptor of oral bioavailability of drugs. A rotatable bond is defined as any
single non-ring bond, bonded to a nonterminal heavy (i.e., non-hydrogen) atom. Amide C-N
bonds are not considered because of their high rotational energy barrier.
logP <= 5,
Molecular weight <= 500,
Number of hydrogen bond acceptors <= 10,
Number of hydrogen bond donors <= 5.
Molecules violating more than one of these rules may have problems with
bioavailability. The rule is called "Rule of 5", because the border values
are 5, 500, 2*5, and 5.
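The four thresholds above translate directly into a small check. The sketch below is only an illustration: the function name `rule_of_five_violations` is made up, and the property values are for hypothetical molecules, standing in for numbers that would be read off a property calculator such as the Molinspiration server.

```python
def rule_of_five_violations(logp, mol_weight, h_acceptors, h_donors):
    """Return a list naming each Rule of 5 threshold that the molecule exceeds."""
    violations = []
    if logp > 5:
        violations.append("logP > 5")
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if h_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    if h_donors > 5:
        violations.append("hydrogen bond donors > 5")
    return violations

# Hypothetical property values for two made-up molecules.
small = rule_of_five_violations(logp=2.0, mol_weight=300.0, h_acceptors=4, h_donors=1)
large = rule_of_five_violations(logp=6.1, mol_weight=650.0, h_acceptors=4, h_donors=1)

print(small)           # []
print(large)           # ['logP > 5', 'molecular weight > 500']
print(len(large) > 1)  # True: more than one violation, possible bioavailability problems
```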
PROCEDURE:
1. Start the computer and establish Internet connection.
2. Open the PubChem database from the NCBI resources.
3. Retrieve the structural features of the drug molecule of interest by using specific keywords.
4. Open the Molinspiration server and draw the structure of the drug molecule by using the
functional groups in the molecular editor.
6. Use the Calculate Properties button to compute the various properties.
Experiment No: 6
WEB RESOURCES:
http://hex.loria.fr/
PRINCIPLE:
In the field of molecular modeling, docking is a method which predicts the preferred
orientation of one molecule to a second when bound to each other to form a stable complex.
Knowledge of the preferred orientation may in turn be used to predict the strength of
association or binding affinity between two molecules using, for example, scoring functions. The
associations between biologically relevant molecules such as proteins, nucleic acids,
carbohydrates, and lipids play a central role in signal transduction. Furthermore, the relative
orientation of the two interacting partners may affect the type of signal produced (e.g.,
agonism vs antagonism). Therefore docking is useful for predicting both the strength and type
of signal produced. Docking is frequently used to predict the binding orientation of
small-molecule drug candidates to their protein targets and, in turn, the affinity and
activity of the small molecule. Hence docking plays an important role in the rational design
of drugs. Given the biological and pharmaceutical significance of molecular docking,
considerable efforts have been directed towards improving the methods used to predict
docking.
Hex is an interactive molecular graphics program for calculating and displaying feasible
docking modes of pairs of protein and DNA molecules. Hex can also calculate protein-ligand
docking, assuming the ligand is rigid, and it can superpose pairs of molecules using only
knowledge of their 3D shapes. Hex has been available for about 12 years, and it is described
as still the only docking and superposition program to use spherical polar Fourier (SPF)
correlations to accelerate the calculations, and as one of the few docking programs with
built-in graphics to view the results. It is also reportedly the first protein docking program
able to use modern graphics processor units (GPUs) to accelerate the calculations.
PROCEDURE: