
Experiment No: 1

AIM: To browse human genome data and the OMIM and SNP databases in order to understand genetic
and metabolic disorders.

REQUIREMENT: Computer system (with legal software) equipped with an Internet connection,
preferably fast broadband.

WEB RESOURCES USED:

http://www.ncbi.nlm.nih.gov/

THEORY AND PRINCIPLE:

Genomes: A genome is all of a living thing's genetic material. It is the entire set of hereditary
instructions for building, running, and maintaining an organism, and for passing life on to the
next generation. In short, it is the complete set of chromosomes, with all their genes, of a
species (for diploid organisms, the genome is usually given as the haploid set).

Genomic Resources: An organism's genomic resources are stored in genome databases, which
collect complete and incomplete large-scale sequencing, assembly, annotation and mapping
projects for cellular organisms. A genome database provides views of a variety of genomes,
complete chromosomes, sequence maps, integrated genetic and physical maps, organelles and
plasmids, as well as genome assemblies.

Human Genome: The human genome is the complete genetic information carried on the 22 pairs of
autosomes and the pair of sex chromosomes (X and Y). It includes the location and sequence of
the genes along each chromosome, the distances between adjacent genes, and the entire
nucleotide sequence of each gene (with its allelic forms) across the full complement of 46
chromosomes.

OMIM Databases: OMIM (Online Mendelian Inheritance in Man) is a comprehensive, authoritative
and timely compendium of human genes and genetic phenotypes. The full-text, referenced
overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes.
OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and its
entries contain copious links to other genetics resources.

SNP Databases: A single nucleotide polymorphism (SNP) is a form of genomic variation in a
population that may occur anywhere in the genome. SNPs are point mutations, i.e., single-base
alterations between alleles. The International SNP Map Working Group is a consortium that
stores data on the major and minor alleles of SNPs, which are used as genetic markers.
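Since SNP databases record the major and minor allele at each site, a short worked example may help. The sketch below is a minimal illustration, assuming a biallelic SNP and made-up genotype counts; the function name is hypothetical. Each diploid individual contributes two alleles, and the less frequent allele is reported as the minor allele together with its frequency (the minor allele frequency, MAF).

```python
def allele_frequencies(genotype_counts):
    """Return (major_allele, minor_allele, minor_allele_frequency) for a biallelic SNP.

    genotype_counts maps two-letter genotypes such as "AA", "AG", "GG" to the
    number of individuals with that genotype. Each diploid individual carries
    two alleles, so every genotype contributes one count per letter.
    """
    allele_counts = {}
    for genotype, count in genotype_counts.items():
        for allele in genotype:
            allele_counts[allele] = allele_counts.get(allele, 0) + count
    if len(allele_counts) != 2:
        raise ValueError("expected exactly two alleles (biallelic SNP)")
    total = sum(allele_counts.values())
    # Sort alleles by count, most frequent first.
    major, minor = sorted(allele_counts, key=allele_counts.get, reverse=True)
    return major, minor, allele_counts[minor] / total


print(allele_frequencies({"AA": 50, "AG": 40, "GG": 10}))  # ('A', 'G', 0.3)
```

With these hypothetical counts, allele A appears 140 times and G 60 times out of 200 alleles, so G is the minor allele.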

Genetic Disorders: These are disorders in which an altered gene structure causes the gene to
malfunction, thereby affecting the phenotype controlled by that gene (the phenotypic expression
of that gene). Genetic disorders are studied through their inheritance patterns and through gene
structure and function.

Metabolic Disorders: Metabolic disorders are disorders of metabolism; they may be caused by
infection or other factors. Many of these disorders can be treated with drugs.

PROCEDURE:

1. Start the computer and establish an Internet connection.
2. Use any search engine such as Yahoo or Google, or open the NCBI web page directly.
3. Click on the genome-specific resources and browse for specific data.

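The browsing steps above are done by hand in the NCBI web pages. As a hedged sketch, the same kind of keyword search can be scripted against NCBI's Entrez E-utilities ESearch service. The database names `omim` and `snp`, the JSON field names `esearchresult`, `count` and `idlist`, and the example search term are assumptions to verify against the E-utilities documentation; the function names are hypothetical, and the network call is left as a comment because it needs Internet access.

```python
import json
from urllib.parse import urlencode

# Base URL of NCBI's Entrez ESearch service (E-utilities).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for an Entrez database such as "omim" or "snp"."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def parse_esearch(json_text):
    """Return (total number of hits, list of record IDs) from an ESearch JSON reply."""
    result = json.loads(json_text)["esearchresult"]
    return int(result["count"]), result["idlist"]


# Fetching requires Internet access, so it is shown only as a comment:
# from urllib.request import urlopen
# with urlopen(esearch_url("omim", "phenylketonuria")) as reply:
#     count, ids = parse_esearch(reply.read().decode())
```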
Experiment No: 2

AIM: To predict genes and promoter sequences.

REQUIREMENT: Computer system (with legal software) equipped with an Internet connection,
preferably fast broadband.

WEB RESOURCES USED:

http://genes.mit.edu/GENSCAN.html
http://www-bimas.cit.nih.gov/molbio/proscan/

THEORY AND PRINCIPLE:

How are encoded proteins recognized in uncharacterized eukaryotic genomic DNA?

Translating in every frame from each translational start codon to the next 'nonsense'
(chain-terminating) stop codon produces a list of ORFs (Open Reading Frames), but which of them,
if any, actually code for proteins? Moreover, this approach works only in organisms without
introns, or on processed mRNAs. Three general solutions to the gene-finding problem can be
imagined:
- All genes have certain regulatory signals positioned in or around them;
- All genes, by definition, contain specific coding patterns; and
- Many genes have already been sequenced and recognized in other organisms, so we can infer
function and location by homology if our new sequence is similar enough to an existing
sequence.
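The ORF-listing approach described above can be sketched in a few lines. This is a deliberately naive illustration; the function names and the 30-codon minimum are assumptions. It scans all three frames on both strands for an ATG followed by an in-frame stop codon and, as the text warns, it is meaningful only for sequences without introns, such as prokaryotic DNA or processed mRNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T/N DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Naive six-frame ORF scan.

    Returns (strand, frame, start, end) tuples; start/end are 0-based,
    end-exclusive positions on the scanned strand ("-" = reverse complement).
    Each ORF runs from the first ATG after the previous stop to the next
    in-frame stop codon (included). min_codons counts codons including the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [('+', 2, 2, 14)]
```

Real genomic DNA yields many short, spurious ORFs, which is why the minimum length matters and why the three strategies in the text are needed.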

All of these principles can be used to help locate genes in DNA; they are often known as
“searching by signal,” “searching by content,” and “homology inference,” respectively. Homology
inference can be especially helpful, but it fails when no similar proteins exist in the
databases, and even when homologues can be found, discovering exon-intron borders and UTRs
(5' and 3' untranslated regions) can be very difficult. If cDNA is available, it can be aligned to
the genomic sequence to determine where the genes lie, but even this can be quite difficult, and
cDNA libraries are not always available. No single method is absolutely reliable. The only 100%
certain method would be to know the complete amino acid sequence of the protein of interest and
simply translate all of the DNA until the matching pieces fall out, but one seldom has that
luxury. Since we usually have to discover where these pieces are, especially in genomic DNA,
computerized analysis becomes essential.
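As a rough illustration of “searching by content”, the sketch below computes GC content in sliding windows, one simple compositional statistic that can differ between coding and non-coding DNA. Real gene finders such as GENSCAN rely on far richer statistical models of coding sequence; the function name and the window and step sizes here are arbitrary assumptions.

```python
def gc_windows(seq, window=100, step=50):
    """Yield (window_start, GC fraction) for each complete window along seq."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, gc / window


print(list(gc_windows("GGGGAAAA", window=4, step=2)))  # [(0, 1.0), (2, 0.5), (4, 0.0)]
```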

GENE PREDICTION

Gene finding typically refers to the area of computational biology that is concerned with
algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically
functional. This especially includes protein-coding genes, but may also include other
functional elements such as RNA genes and regulatory regions. Gene finding is one of the
first and most important steps in understanding the genome of a species once it has been
sequenced. In its earliest days, "gene finding" was based on painstaking experimentation on
living cells and organisms. Statistical analysis of the rates of homologous recombination of
several different genes could determine their order on a certain chromosome, and information
from many such experiments could be combined to create a genetic map specifying the rough
location of known genes relative to each other. Today, with comprehensive genome sequence
and powerful computational resources at the disposal of the research community, gene
finding has been redefined as a largely computational problem. Determining that a sequence
is functional should be distinguished from determining the function of the gene or its product.
The latter still demands in vivo experimentation through gene knockout and other assays,
although frontiers of bioinformatics research are making it increasingly possible to predict the
function of a gene based on its sequence alone.

PROMOTER PREDICTION

In genetics, a promoter is a region of DNA that facilitates the transcription of a particular
gene. Promoters are typically located near the genes they regulate, on the same strand and
upstream (towards the 5' region of the sense strand). For transcription to take place, the
enzyme that synthesizes RNA, known as RNA polymerase, must attach to the DNA near a gene.
Promoters contain specific DNA sequences and response elements that provide a secure initial
binding site for RNA polymerase and for proteins called transcription factors, which recruit RNA
polymerase. Transcription factors recognize specific nucleotide sequences in particular
promoters and act as activators or repressors to regulate gene expression.
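As a toy example of “searching by signal”, the sketch below scans a sequence for the TATA-box consensus TATAWAW (W = A or T), a core promoter element found roughly 25-30 bp upstream of the transcription start site in many eukaryotic genes. Exact consensus matching is a simplifying assumption and the function name is hypothetical; tools such as PROSCAN score combinations of transcription-factor binding sites with weighted models, so a plain match like this misses real promoters and reports many false positives.

```python
import re

# TATA-box consensus written with IUPAC codes as TATAWAW, where W = A or T.
# The zero-width lookahead (?=...) lets overlapping matches be reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq):
    """Return 0-based positions where the TATAWAW consensus matches exactly."""
    return [match.start() for match in TATA_BOX.finditer(seq.upper())]


print(find_tata_boxes("gcTATAAAAgg"))  # [2]
```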

PROCEDURE:

1. Start the computer and establish an Internet connection.
2. Use any search engine such as Yahoo or Google, or go directly to the given websites to use
these tools.
3. Input your sequence data in the proper format and note down the results obtained.

Experiment No: 3

AIM: To perform multiple sequence alignment and phylogenetic analysis.

REQUIREMENT: Computer system (with legal software) equipped with an Internet connection,
preferably fast broadband.

WEB RESOURCES USED:

http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/msa/clustalw2/

THEORY AND PRINCIPLE:

Given the nucleotide or amino acid sequence of a biological molecule, what can we know
about that molecule? We can find biologically relevant information in sequences by searching
for particular patterns that may reflect some function of the molecule. These can be
catalogued motifs and domains, secondary structure predictions, physical attributes such as
hydrophobicity, or even the content of DNA itself as in some of the gene finding techniques.
But, what about comparisons with other sequences? Can we learn about one molecule by
comparing it to another? Yes, naturally we can; inference through homology is a fundamental
principle in all the biological sciences. We can learn a tremendous amount by comparing our
sequence against others.

IDENTIFICATION OF HOMOLOGOUS SEQUENCES

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing
primary biological sequence information, such as the amino acid sequences of proteins or the
nucleotide sequences of DNA. A BLAST search enables a researcher to compare a query sequence
with a library or database of sequences and to identify library sequences that resemble the
query above a certain threshold. Different BLAST programs are available depending on the type of
query and database sequences, such as blastn for nucleotide sequences and blastp for protein
sequences. For example, following the discovery of a previously unknown gene in the mouse, a
scientist will typically perform a BLAST search of the human genome to see whether humans carry a
similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene. The
BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and
Webb Miller at the NIH and was published in J. Mol. Biol. in 1990.

The mathematics can be stated generally as follows. For two sequences of lengths m and n, the
best local alignments are identified as HSPs (high-scoring segment pairs): aligned stretches that
cannot be improved further by extension or trimming. For ungapped alignments, the expected
number of HSPs with a score of at least S is

$$E = K\,m\,n\,e^{-\lambda S}$$

This is called the E-value for the score S. In a database search, n is the size of the database in
residues, so $N = mn$ is the size of the search space. K and λ are supplied by statistical theory
and can be calculated by comparison with precomputed, simulated score distributions. These two
parameters determine how a raw score S translates into an E-value.

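As a worked illustration of the formula, the sketch below evaluates E for given K, λ, m, n and S, and also computes the normalized (bit) score S' = (λS - ln K)/ln 2, for which E = mn·2^(-S'). The function names are hypothetical, and the K and λ values in the usage line are illustrative assumptions, not constants for any particular scoring matrix.

```python
import math


def evalue(score, m, n, K, lam):
    """Expected number of ungapped HSPs with score >= S: E = K m n exp(-lambda S)."""
    return K * m * n * math.exp(-lam * score)


def bit_score(score, K, lam):
    """Normalized score S' = (lambda S - ln K) / ln 2, so that E = m n 2^(-S')."""
    return (lam * score - math.log(K)) / math.log(2)


# Illustrative values only: a 300-residue query against a 1,000,000-residue database.
print(evalue(50, 300, 1_000_000, K=0.13, lam=0.32))
print(bit_score(50, K=0.13, lam=0.32))
```

Doubling the database size n doubles E, while each additional 1/λ of raw score divides E by e; this is why the same raw score is less significant in a larger database.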
PHYLOGENETIC ANALYSIS

Every living organism contains DNA, RNA, and proteins. Closely related organisms
generally have a high degree of agreement in the molecular structure of these substances,
while the molecules of organisms distantly related usually show a pattern of dissimilarity.
Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations
over time, and assuming a constant rate of mutation provides a molecular clock for dating
divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the
probable evolution of various organisms. Not until recent decades, however, has it been
possible to isolate and identify these molecular structures. The most common approach is the
comparison of homologous sequences for genes using sequence alignment techniques to
identify similarity. Another application of molecular phylogeny is DNA barcoding, in which the
species of an individual organism is identified using short sections of mitochondrial DNA. The
same techniques are also applied in human genetics, for example in the increasingly popular use
of genetic testing to determine a child's paternity and in the branch of criminal forensics based
on evidence known as genetic fingerprinting.

ClustalW (Thompson, Higgins & Gibson, 1994) is one of the standard programs
implementing one variant of the progressive method in wide use today for multiple sequence
alignment. The W denotes a specific version that has been developed from the original
Clustal program.

The basic steps of the algorithm implemented in ClustalW are:

1. Compute pairwise alignments of every sequence against every other sequence. The
similarities are stored in a matrix (sequences versus sequences).
2. Convert the sequence similarity matrix values to distance measures, reflecting
evolutionary distance between each pair of sequences.
3. Construct a tree (the so-called guide tree) for the order in which pairs of sequences
are to be aligned and combined with previous alignments. This is done using a
neighbour-joining clustering algorithm. In the case of ClustalW, a method by Saitou
& Nei is used.
4. Progressively align the sequences/alignments together into each branch point of the
guide tree, starting with the least distant pairs of sequences.
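Steps 1 and 2 can be sketched in a few dozen lines. The code below is a simplified illustration, not ClustalW's implementation: it aligns every pair of sequences with the Needleman-Wunsch global alignment algorithm using assumed scores (match +1, mismatch -1, linear gap penalty -2), whereas ClustalW uses substitution matrices and separate gap-opening and gap-extension penalties. Distance is taken as 1 minus the fraction of identical residues among aligned positions without gaps, one common choice. The function names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical residues among aligned, gap-free positions."""
    aln_a, aln_b = global_align(a, b)
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return 1 - sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """All-against-all distances for a {name: sequence} dictionary (steps 1-2)."""
    names = list(seqs)
    return names, [[distance(seqs[p], seqs[q]) for q in names] for p in names]


print(global_align("ACGT", "AGT"))  # ('ACGT', 'A-GT')
```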
PROCEDURE:

1. Start the computer and establish an Internet connection.
2. Use any search engine such as Yahoo or Google, or open the home page directly using the given
website.
3. Retrieve the sequence information of a protein of interest.
4. Run a BLAST analysis to find good homologous sequences of the protein.
5. Collect the homologous sequences, perform a multiple sequence alignment using the ClustalW
server, and determine the evolutionary status of the protein from the phylogenetic tree.
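The tree-building step can be illustrated with the neighbour-joining method of Saitou and Nei mentioned in the ClustalW steps. The sketch below takes a distance matrix (for example, one produced from pairwise alignments) and returns an unrooted tree in Newick format, a text format most tree viewers accept. The function name, the three-decimal branch-length formatting and the small example matrix are choices made for this illustration. With non-additive real data, neighbour-joining can produce slightly negative branch lengths, which this sketch does not correct.

```python
def neighbor_joining(names, dist):
    """Saitou-Nei neighbour-joining on a symmetric distance matrix.

    names: taxon labels; dist: rows where dist[i][j] is the distance between
    names[i] and names[j]. Returns an unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    # Distances keyed by node label; internal nodes are labelled by their Newick text.
    d = {(a, b): float(dist[i][j]) for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]],
        )
        len_a = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        rest = [c for c in nodes if c not in (a, b)]
        for c in rest:
            d[u, c] = d[c, u] = (d[a, c] + d[b, c] - d[a, b]) / 2
        d[u, u] = 0.0
        nodes = rest + [u]
    # Three nodes remain: join them at one central node.
    x, y, z = nodes
    lx = (d[x, y] + d[x, z] - d[y, z]) / 2
    ly = (d[x, y] + d[y, z] - d[x, z]) / 2
    lz = (d[x, z] + d[y, z] - d[x, y]) / 2
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


names = ["A", "B", "C", "D", "E"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(neighbor_joining(names, dist))
# (D:2.000,E:1.000,(C:4.000,(A:2.000,B:3.000):3.000):2.000);
```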

Experiment No: 4

AIM: To obtain a three-dimensional model of a given protein sequence by the homology modeling
method.

REQUIREMENTS: Computer system (with legal software) equipped with an Internet connection,
preferably fast broadband.

WEB RESOURCES:

www.expasy.ch
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1
http://nihserver.mbi.ucla.edu/SAVES/

PRINCIPLE:
Homology modeling, also known as comparative modeling of proteins, refers to constructing an
atomic-resolution model of the "target" protein from its amino acid sequence and an experimental
three-dimensional structure of a related homologous protein (the "template"). Homology modeling
relies on identifying one or more known protein structures likely to resemble the structure of
the query sequence, and on producing an alignment that maps residues in the query sequence to
residues in the template sequence. Evolutionarily related proteins have similar sequences, and
naturally occurring homologous proteins have similar structures. Protein structures are more
conserved than protein sequences amongst homologues; in fact, three-dimensional structure is
evolutionarily more conserved than would be expected from sequence conservation alone, so
detectable levels of sequence similarity usually imply significant structural similarity.
However, sequences falling below 20% sequence identity can have very different structures. The
sequence alignment and template structure are then used to produce a structural model of the
target.

The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment gaps
(commonly called indels) that indicate a structural region present in the target but not in the
template, and by structure gaps in the template that arise from poor resolution in the
experimental procedure (usually X-ray crystallography) used to solve the structure. Model
quality declines with decreasing sequence identity: a typical model has about 1-2 Å root mean
square deviation (RMSD) between the matched Cα atoms at 70% sequence identity, but only 2-4 Å
agreement at 25% sequence identity. Errors are significantly higher in the loop regions, where
the amino acid sequences of the target and template proteins may be completely different.
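The RMSD figures quoted above can be made concrete with a small function. This sketch assumes the model and reference coordinates are already optimally superposed (in practice a superposition step such as the Kabsch algorithm comes first) and that both lists contain matched Cα atoms in the same order; the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinates.

    Assumes both lists hold the same atoms (e.g. C-alpha atoms of aligned
    residues) in the same order, already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```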

The method comprises the following steps:

1. Template recognition and initial alignment.

2. Select the template sequences of known structure.

3. Align the template and target sequence.

4. Build the model.

5. Model optimization.

6. Model validation.

SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the
ExPASy web server or from the program DeepView (Swiss-PdbViewer). The purpose of this server is
to make protein modeling accessible to biochemists and molecular biologists worldwide. Homology
modeling combines sequence analysis and molecular modeling to predict three-dimensional
structures. You will choose a remote homologue of your project protein whose structure has not
yet been solved, and use the SWISS-MODEL web resource to model the molecule. The theoretical
structure will then be visualized with Swiss-PdbViewer and RasMol to gain insight into how its
structure relates to its function. Color-coding different physical attributes (such as residue
charge, hydrophobicity and secondary structure elements), using different representations (such
as alpha-carbon traces, 'cartoon' graphics and space-filling models), and superposing the model
on an actual structure all assist in the interpretation.

PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Retrieve the sequence information from ExPASy/NCBI.

3. Open BLAST and select the Protein BLAST option to find a template structure for your protein
of interest.

4. With the suitable template structure and the sequence, use the SWISS-MODEL server (automated
mode) for homology modeling.

5. After obtaining the model, visualize it in RasMol and then assess the quality of the model with
the PROCHECK and Verify3D tools.

6. Record the results.

Experiment No: 5

AIM: To retrieve drug molecule information from a database and calculate the drug-like
properties of the molecule.

REQUIREMENTS: Computer system (with legal software) equipped with an Internet connection,
preferably fast broadband.

WEB RESOURCES:
http://pubchem.ncbi.nlm.nih.gov/
http://www.molinspiration.com/cgi-bin/properties

PRINCIPLE:

PubChem

PubChem, released in 2004, provides information on the biological activities of small
molecules. It is a component of the NIH's Molecular Libraries Roadmap Initiative. PubChem is
organized as three linked databases within NCBI's Entrez information retrieval system: PubChem
Substance, PubChem Compound and PubChem BioAssay. PubChem also provides a fast chemical
structure similarity search tool.
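PubChem records can also be retrieved programmatically. The sketch below builds a request URL for PubChem's PUG REST interface and parses the JSON reply. The function names are hypothetical; the URL layout, the property names (MolecularWeight, XLogP, HBondDonorCount, HBondAcceptorCount) and the `PropertyTable`/`Properties` JSON keys should be verified against the PUG REST documentation, and the download is left as a comment because it needs Internet access. PubChem's computed XLogP comes from a different method than Molinspiration's logP, so the two values may disagree.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
DEFAULT_PROPERTIES = ("MolecularWeight", "XLogP", "HBondDonorCount", "HBondAcceptorCount")


def property_url(name, properties=DEFAULT_PROPERTIES):
    """URL requesting selected computed properties of a compound looked up by name."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def parse_properties(json_text):
    """Return a list with one property dictionary per matching compound."""
    return json.loads(json_text)["PropertyTable"]["Properties"]


# Fetching requires Internet access, so it is shown only as a comment:
# from urllib.request import urlopen
# with urlopen(property_url("aspirin")) as reply:
#     records = parse_properties(reply.read().decode())
```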

Molinspiration server
The server calculates the following properties of drug molecules drawn in its interface.

LogP (octanol/water partition coefficient)

LogP is calculated by the methodology developed by Molinspiration as a sum of fragment-based
contributions and correction factors. The method is very robust and is able to process
practically all organic molecules and most organometallic molecules.

Molecular Polar Surface Area

It is calculated as a sum of fragment contributions, considering O- and N-centered polar
fragments. Polar surface area (PSA) has been shown to be a very good descriptor characterizing
drug absorption, including intestinal absorption, bioavailability, Caco-2 permeability and
blood-brain barrier penetration.

Molecular Volume

The method for calculating molecular volume developed at Molinspiration is based on group
contributions. These were obtained by fitting the sum of fragment contributions to "real" 3D
volumes for a training set of about twelve thousand, mostly drug-like, molecules. The 3D
molecular geometries of the training set were fully optimized by the semi-empirical AM1 method.

Number of Rotatable Bonds - nrotb

This simple topological parameter is a measure of molecular flexibility. It has been shown to
be a very good descriptor of the oral bioavailability of drugs. A rotatable bond is defined as
any single non-ring bond bonded to a non-terminal heavy (i.e., non-hydrogen) atom. Amide C-N
bonds are not counted because of their high rotational energy barrier.

"Rule of 5" Properties

The "Rule of 5" properties are a set of simple molecular descriptors used by Lipinski in
formulating his "Rule of 5". The rule states that most “drug-like” molecules have:

- logP <= 5,
- molecular weight <= 500,
- number of hydrogen bond acceptors <= 10,
- number of hydrogen bond donors <= 5.

Molecules violating more than one of these rules may have problems with bioavailability. The
rule is called the "Rule of 5" because the border values are 5, 500, 2×5 and 5.
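The four criteria above translate directly into a check. This is a minimal sketch: the function names are hypothetical, the property values would come from a server such as Molinspiration or PubChem, and, following the text, a molecule is flagged only when it violates more than one rule.

```python
def lipinski_violations(logp, mol_weight, h_acceptors, h_donors):
    """Return the list of Rule-of-5 criteria that a molecule fails."""
    checks = {
        "logP > 5": logp > 5,
        "molecular weight > 500": mol_weight > 500,
        "H-bond acceptors > 10": h_acceptors > 10,
        "H-bond donors > 5": h_donors > 5,
    }
    return [rule for rule, failed in checks.items() if failed]


def is_drug_like(logp, mol_weight, h_acceptors, h_donors):
    """True unless more than one Rule-of-5 criterion is violated."""
    return len(lipinski_violations(logp, mol_weight, h_acceptors, h_donors)) <= 1


# Illustrative (made-up) property values:
print(lipinski_violations(6.0, 600.0, 4, 1))  # ['logP > 5', 'molecular weight > 500']
print(is_drug_like(6.0, 600.0, 4, 1))         # False
```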

PROCEDURE:
1. Start the computer and establish an Internet connection.
2. Open the PubChem database from the NCBI resources.
3. Retrieve the structural features of the drug molecule of interest by searching with specific
keywords.
4. Open the Molinspiration server and draw the structure of the drug molecule using the
functional groups in the molecular editor.
5. Use the "calculate properties" button to compute the various properties.

Experiment No: 6

AIM: To analyze ligand-receptor binding affinity by molecular docking analysis.

REQUIREMENTS: Computer system (with legal software) equipped with an Internet connection,
preferably fast broadband.

WEB RESOURCES:

http://hex.loria.fr/

PRINCIPLE:

In the field of molecular modeling, docking is a method which predicts the preferred
orientation of one molecule to a second when bound to each other to form a stable complex.
Knowledge of the preferred orientation may in turn be used to predict the strength of
association, or binding affinity, between two molecules using, for example, scoring functions. The
associations between biologically relevant molecules such as proteins, nucleic acids,
carbohydrates, and lipids play a central role in signal transduction. Furthermore, the relative
orientation of the two interacting partners may affect the type of signal produced (e.g.,
agonism vs antagonism). Therefore docking is useful for predicting both the strength and type
of signal produced. Docking is frequently used to predict the binding orientation of small
molecule drug candidates to their protein targets in order to in turn predict the affinity and
activity of the small molecule. Hence docking plays an important role in the rational design
of drugs. Given the biological and pharmaceutical significance of molecular docking,
considerable efforts have been directed towards improving the methods used to predict
docking.
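To make the idea of a scoring function concrete, the sketch below sums a 12-6 Lennard-Jones term and a Coulomb term over receptor-ligand atom pairs, with lower (more negative) energies taken to mean stronger binding. This is not Hex's method (Hex uses spherical polar Fourier correlations, as described in the next paragraph). The function names, the single ε and σ shared by all atoms, the dielectric constant and the cutoff are illustrative assumptions rather than parameters of any published force field.

```python
import math

# Approximately 332 kcal·Å/(mol·e²): converts q_i q_j / r (charges in e, r in Å) to kcal/mol.
COULOMB_CONSTANT = 332.0636


def pair_energy(r, q_i, q_j, epsilon=0.2, sigma=3.5, dielectric=4.0):
    """12-6 Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair r Å apart.
    epsilon, sigma and dielectric are illustrative values, not a real force field."""
    lj = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
    return lj + coulomb


def interaction_energy(receptor, ligand, cutoff=8.0):
    """Sum pair energies over receptor-ligand atom pairs closer than `cutoff` Å.
    Each atom is given as (x, y, z, partial_charge)."""
    total = 0.0
    for x1, y1, z1, q1 in receptor:
        for x2, y2, z2, q2 in ligand:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            if 0.0 < r < cutoff:  # skip coincident atoms and distant pairs
                total += pair_energy(r, q1, q2)
    return total
```

At the Lennard-Jones minimum, r = 2^(1/6)·σ, the pair energy of two uncharged atoms equals -ε, which is a quick sanity check on the functional form.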

Hex is an interactive molecular graphics program for calculating and displaying feasible
docking modes of pairs of protein and DNA molecules. Hex can also calculate protein-ligand
docking, assuming the ligand is rigid, and it can superpose pairs of molecules using only
knowledge of their 3D shapes. Hex has been available for about 12 years; it is still the only
docking and superposition program to use spherical polar Fourier (SPF) correlations to accelerate
the calculations, and it is still one of the few docking programs with built-in graphics to view
the results. According to its author, it was also the first protein docking program able to use
modern graphics processing units (GPUs) to accelerate the calculations.

PROCEDURE:

1. Start the computer and establish an Internet connection.
2. Select suitable ligand and receptor molecules in PDB format.
3. Load the ligand and receptor molecules into the Hex tool.
4. Start docking by choosing the appropriate button, keeping the default parameters.
5. Docking usually takes a few minutes to complete, depending on the system configuration.
6. After docking finishes, note down the binding energy it reports, save the ligand-receptor
complex in PDB format, and observe the ligand binding site with a visualization tool.
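The final step, locating the binding site in the saved complex, can also be approximated in code. The sketch below reads ATOM/HETATM records from PDB text using the fixed column positions of the PDB format and lists receptor residues having any atom within 4 Å of a ligand atom. The function names, the 4 Å cutoff and the assumption that receptor and ligand atoms have already been separated (for example, receptor as ATOM records and ligand as HETATM records) are illustrative choices.

```python
import math


def read_atoms(pdb_text):
    """Parse ATOM/HETATM records of a PDB file using its fixed column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "res_name": line[17:20].strip(),  # columns 18-20
                "chain": line[21],                # column 22
                "res_seq": int(line[22:26]),      # columns 23-26
                "xyz": (float(line[30:38]),       # columns 31-38
                        float(line[38:46]),       # columns 39-46
                        float(line[46:54])),      # columns 47-54
            })
    return atoms


def binding_site_residues(receptor_atoms, ligand_atoms, cutoff=4.0):
    """Receptor residues having any atom within `cutoff` Å of any ligand atom."""
    site = set()
    for atom in receptor_atoms:
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand_atoms):
            site.add((atom["chain"], atom["res_seq"], atom["res_name"]))
    return sorted(site)
```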
