
Welcome

Hey Dr. Computer. What am I suffering from?

GENE MINING
Gene mining is the process of exploiting the deoxyribonucleic acid (DNA) sequence of one genotype to isolate useful genes from related genotypes.

TARGET-SPECIFIC GENE MINING
HISTORICAL BACKGROUND
EVENT                                                         YEAR
Sequencing of the first plant genome (Arabidopsis thaliana)   2000
Completion of the Human Genome Project                        2003
First work on gene mining                                     2003

SOME ORGANISMS WHOSE GENOMES HAVE BEEN COMPLETELY SEQUENCED (genome size in Mb)
EUKARYOTES                          PROKARYOTES
Arabidopsis thaliana (114.5)        Bacillus subtilis (4.20)
Saccharomyces cerevisiae (12)       Haemophilus influenzae (1.83)
Oryza sativa (466)                  Escherichia coli (4.6)
Homo sapiens (3200)                 Vibrio cholerae (4.0)
Drosophila melanogaster (120)       Mycobacterium tuberculosis (4.40)
Plasmodium falciparum (23)          Treponema pallidum (1.14)


INTRODUCTION
• Introduction to gene mining
• Where is the gene located?
• Central dogma
WHERE IS THE GENE LOCATED?
CENTRAL DOGMA:-
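The central dogma describes the flow of genetic information from DNA to RNA (transcription) and from RNA to protein (translation). As a minimal Python sketch, the functions below mimic both steps on a DNA coding strand using the standard genetic code; the function names and the toy sequence are illustrative assumptions, and real genes also involve introns, splicing and regulatory regions that are ignored here.

```python
# Standard genetic code, built in TCAG order (first, second, third codon base).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon


def transcribe(coding_strand):
    """Transcription: the mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(coding_strand):
    """Translation: read codons (written here in the DNA alphabet) from the
    first base until a stop codon is reached."""
    dna = coding_strand.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTTGGTAA"  # hypothetical toy gene: Met-Ala-Trp-stop
print(transcribe(gene))  # AUGGCUUGGUAA
print(translate(gene))   # MAW
```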
INTRODUCTION TO GENE MINING
The use of molecular biological techniques has allowed the rapid determination and identification of nucleic acid sequences.

Gene mining aims to identify and isolate genes that confer essential traits.
With the availability of laboratory equipment and advanced computer software, researchers are able to run computational algorithms to search for and identify gene sequences.
Genetic databases for many organisms, such as Escherichia coli, Haemophilus influenzae and Mycoplasma pneumoniae, to name a few, are publicly available.

The Internet is readily accessible to scientists worldwide.
These biological databases store searchable information from which biological data may be retrieved.

Gene mining exploits publicly available sequence databases on the Internet for the identification of useful genes.
The main resources used are GenBank at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) and the A. thaliana database at TIGR (http://www.tigr.org/tdb/tgi/agi/).
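Retrieval from GenBank can also be scripted. The Python sketch below assumes NCBI's public E-utilities `efetch` endpoint, which returns a record as FASTA text when called with `rettype=fasta` and `retmode=text`; it builds the request URL and parses FASTA text into a dictionary. The helper names are hypothetical, the accession is a placeholder, and the network call is kept in its own function so the parsing can be tried offline. NCBI asks users to keep request rates low.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nucleotide"):
    """Build an E-utilities efetch URL that returns one record as FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Split FASTA text into {header: sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_fasta(accession):
    """Download and parse one record (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode())


# records = fetch_fasta("ACCESSION")  # placeholder: substitute a real GenBank accession
```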

This work is carried out using an Internet connection, a DOS-based sequence analysis software package, and the facilities of a basic molecular biology laboratory for PCR verification.
VARIOUS METHODS ADOPTED FOR GENE MINING:
1. DNA extraction and PCR-based gene mining
2. Data mining
3. Using genetic algorithm
4. Peptide mass fingerprinting
5. From biomedical literature
6. With the help of GenoWatch
7. ORIEL
8. DNA chip analysis (microarray)



PCR Analysis:-
• Gene-specific primers amplify the DNA of each accession, and the amplified product represents either the entire gene or some functional component of the allele, such as the promoter or the coding sequence.
PROBLEMS:-
• Amplification of more than one gene (lack of specificity).
• Failure to amplify alleles in distantly related genera.
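The specificity problem can be illustrated with a toy in-silico PCR. The Python sketch below (hypothetical helpers, not the protocol of the cited work) looks for exact matches of the forward primer and of the reverse complement of the reverse primer on one strand of a template, and reports the product size for every compatible pair, so a template carrying two copies of the target yields extra products. Real primers also anneal with mismatches, which exact matching ignores, and the 4-nt primers here are far shorter than real ones.

```python
def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(haystack, needle):
    """Start positions of every (possibly overlapping) exact match."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def in_silico_pcr(template, forward, reverse):
    """Sizes of all predicted products (exact primer matches only)."""
    template = template.upper()
    reverse_site = reverse_complement(reverse)
    products = []
    for f in find_all(template, forward.upper()):
        for r in find_all(template, reverse_site):
            if r >= f:  # reverse-primer site must lie downstream of the forward site
                products.append(r + len(reverse_site) - f)
    return sorted(products)


target = "AAAA" + "ATGC" + "GGGGGG" + "TTCC" + "AAAA"   # hypothetical template
print(in_silico_pcr(target, "ATGC", "GGAA"))           # [14]
print(in_silico_pcr(target + target, "ATGC", "GGAA"))  # [14, 14, 36]: extra products
```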
 
2. Data Mining
1. Data mining is essentially about extracting information and knowledge from text and data.
2. Data mining is the process of compiling, organizing and analyzing large document collections to support the delivery of information to analysts.
Data Mining PROBLEMS
• Data mining systems induce knowledge from datasets that are huge, noisy (incorrect), incomplete, inconsistent, imprecise (fuzzy) and uncertain.
• The problem is that existing systems use a limiting attribute-value language for representing induced knowledge.
• Furthermore, some important patterns are ignored because they are statistically insignificant.
3. Using Genetic Algorithm
The rapid growth of data available in digital format increases the need for methods to analyze it. Research on topics such as text classification, information retrieval and automatic text summarization has therefore become an important field.
Automatic summarization helps users obtain the main pieces of information in a text, but with a much shorter reading time.

Researchers have provided new tools, based on term frequency, for analyzing and accessing data in databases, and these tools are used in text processing.
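As a hedged illustration of combining a genetic algorithm with term frequency for text summarization, the Python sketch below evolves bit masks (one bit per sentence) and rewards masks whose sentences cover frequent terms, up to a hypothetical limit on summary length. It is a generic GA (tournament selection, one-point crossover, bit-flip mutation, elitism), not the concept-distribution method of the cited paper; all parameter values and example sentences are arbitrary.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(sentences):
    """Term frequency of every word across the whole text."""
    return Counter(t for s in sentences for t in tokenize(s))


def fitness(mask, sentences, tf, max_sentences):
    """Sum of TF weights of distinct terms covered by the chosen sentences;
    0 if the selection is empty or longer than allowed."""
    chosen = [s for bit, s in zip(mask, sentences) if bit]
    if not chosen or len(chosen) > max_sentences:
        return 0
    covered = {t for s in chosen for t in tokenize(s)}
    return sum(tf[t] for t in covered)


def ga_summarize(sentences, max_sentences=2, pop_size=20, generations=40,
                 mutation_rate=0.1, seed=0):
    """Evolve sentence-selection masks towards a high-fitness summary."""
    rng = random.Random(seed)
    n = len(sentences)
    tf = term_frequencies(sentences)

    def score(mask):
        return fitness(mask, sentences, tf, max_sentences)

    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    population[0] = [i == 0 for i in range(n)]  # guarantee one valid individual
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best masks
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [not b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return [s for bit, s in zip(best, sentences) if bit]


text = [
    "Gene mining identifies useful genes.",
    "Databases store gene sequences.",
    "The weather is nice.",
    "PCR verifies mined genes.",
]
print(ga_summarize(text, max_sentences=2))
```

Elitism keeps the best valid selection across generations, so the returned summary never exceeds `max_sentences`.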
4. Peptide Mass Fingerprinting
Public protein sequence databases such as SWISS-PROT are used in practice for protein identification.
However, because these databases hold little protein information for specific plant species, a private protein database containing sufficient protein information needs to be constructed.
… made the protein database by translating enormous coding regions …

Workers have also built individual systems based on data analysis of chromosomal mapping and microarray data.
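The core of peptide mass fingerprinting is comparing measured peptide masses with masses predicted from an in-silico digest of each protein in a (public or private) database. The Python sketch below assumes complete tryptic digestion (cleavage after K or R, except before P), neutral monoisotopic masses, no modifications and a hypothetical 0.02 Da tolerance; real search engines also handle missed cleavages, charge states and statistical scoring. The protein sequences and helper names are toy examples.

```python
# Monoisotopic amino acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def pmf_score(observed_masses, protein, tolerance=0.02):
    """Number of observed masses matching any theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in trypsin_digest(protein)]
    return sum(any(abs(m - t) <= tolerance for t in theoretical)
               for m in observed_masses)


def identify(observed_masses, database, tolerance=0.02):
    """Name of the database protein with the most matching peptide masses."""
    return max(database,
               key=lambda name: pmf_score(observed_masses, database[name], tolerance))


database = {"protein_1": "MAKGRPLRK", "protein_2": "WWWYYY"}  # hypothetical entries
observed = [peptide_mass("MAK"), peptide_mass("GRPLR")]
print(identify(observed, database))  # protein_1
```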

5. From Biomedical Literature: weighting protein-protein interactions and connectivity
An initial set of genes and proteins is obtained from gene-disease relationships extracted from PubMed abstracts using natural language processing.
Interactions involving the corresponding proteins are similarly extracted and integrated with interactions from curated databases (such as BIND and DIP).
… the literature with data from curated sources in order to uncover …

We can apply the method to various diseases to assess its effectiveness.
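The integration step can be pictured as a small weighted graph problem. In the Python sketch below, seed genes come from the literature, each interaction carries its source, and every non-seed gene is scored by the weighted number of its interactions with seed genes (a simple connectivity measure). The gene names, source labels and weight values are hypothetical, and the weighting scheme of the cited study is not reproduced.

```python
from collections import defaultdict


def rank_candidates(seed_genes, interactions, source_weight):
    """Score each non-seed gene by the weighted number of interactions it has
    with seed genes; return (gene, score) pairs, highest score first."""
    seeds = set(seed_genes)
    scores = defaultdict(float)
    for a, b, source in interactions:
        weight = source_weight.get(source, 0.0)
        if a in seeds and b not in seeds:
            scores[b] += weight
        elif b in seeds and a not in seeds:
            scores[a] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


interactions = [
    ("A", "B", "curated"),  # e.g. from a curated database
    ("C", "A", "text"),     # e.g. extracted from abstracts
    ("B", "C", "curated"),  # ignored: neither partner is a seed gene
    ("A", "D", "text"),
    ("D", "A", "curated"),
]
weights = {"curated": 1.0, "text": 0.5}  # hypothetical source weights
print(rank_candidates(["A"], interactions, weights))
# [('D', 1.5), ('B', 1.0), ('C', 0.5)]
```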
6. With the help of GenoWatch
The comprehensive gene information retrieved includes gene ontology, function, pathway, disease, related articles in PubMed and so on.
Subsequent SNP functional impact analysis and primer design of a target gene for re-sequencing can also be done in a few clicks.
The presentation of results is carefully designed to be as intuitive as possible for all users.

GenoWatch is available at www.genepipe.ngc.sinica.edu.tw/genowatch.
7. ORIEL
This European project will develop tools and procedures to promote access to a wide range of information resources in the life sciences.
The ORIEL project aims to provide research communities with tools to manage large, complex, multimedia datasets and to navigate through a potentially confusing information landscape.

The ORIEL project will develop tools to:
• Make navigation easy, thereby encouraging creative exploration of the information landscape.
• Facilitate communication by making data presentation and information visualisation user-friendly.
• Enable effective linking of different types of biological information (literature, factual and multimedia databases).
MILESTONES:-
• The development and optimization of interactive and adaptive user interfaces to promote intelligent access to, and retrieval and analysis of, data stored in digital form.
• The development of new concepts that will enhance the efficiency of integrating different types of biological data currently maintained in a wide spectrum of digital collections and resources across Europe.
8. DNA Chip Analysis (Microarray)
1. This is a recently developed technique for the analysis of gene expression.
2. It requires the availability of many cloned genes.
3. It allows the elucidation of complex responses.
4. The expression of many genes can be investigated at the same time (i.e. in one experiment).
5. It uses a confocal laser scanner to elucidate the results.
6. It is based on two RNA samples, a control and a sample of interest (e.g. …).
   Yellow: no difference
   Red: "test" overexpressing
   Green: "test" underexpressing
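The colour code corresponds to the ratio of the two channel intensities at each spot. A minimal Python sketch, assuming the red channel carries the test sample and the green channel the control, that intensities are already background-corrected and normalised, and using a hypothetical two-fold (log2 ratio of 1) cut-off:

```python
import math


def log2_ratio(red, green):
    """log2(red / green) for one spot; both intensities must be positive."""
    return math.log2(red / green)


def spot_colour(red, green, threshold=1.0):
    """Map a spot to the colour code; threshold is in log2 units (1.0 = two-fold)."""
    ratio = log2_ratio(red, green)
    if ratio >= threshold:
        return "red"     # test sample overexpressing
    if ratio <= -threshold:
        return "green"   # test sample underexpressing
    return "yellow"      # no (large) difference


spots = {"gene_a": (400, 100), "gene_b": (50, 200), "gene_c": (120, 100)}  # hypothetical
for gene, (red, green) in spots.items():
    print(gene, round(log2_ratio(red, green), 2), spot_colour(red, green))
```

Working on the log2 scale makes two-fold up- and down-regulation symmetric (+1 and -1).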
[Figures: instrument used; output]
DRAWBACKS
• High tech: available in only a few places.
• Expensive: requires fancy equipment and expensive reagents.
• Analysis is not straightforward and is still under development.
APPLICATIONS OF GENE MINING
1. Allele mining for stress tolerance genes in Oryza species and related germplasm
2. Allele mining and sequence diversity at the wheat powdery mildew resistance locus Pm3
3. Gene mining of the Arabidopsis thaliana genome: applications for biotechnology in Africa
4. Isolation of a known gene to validate the system
5. Gene mining strategies of drug discovery
6. Isolation of nucleic acid molecules
7. Mining colon tumor relevant genes
8. Mining molecular signatures for leukemia subtypes
9. Metagenomics
10. Genomic industries
11. Mining the epigenome for methylated genes in lung cancer
1. Gene Mining for Stress Tolerance Genes in Oryza Species
The international project to sequence the genome of Oryza sativa has made gene mining possible for all genes of rice.
Scientists used the calmodulin gene, a gene encoding a late embryogenesis abundant (LEA) protein, and a salt-inducible rice gene for gene mining of stress tolerance genes in identified accessions of rice and related germplasm.
However, primers based on the adjacent amino (N) and carboxy (C) termini amplify additional loci.
The primers were found to be sufficiently conserved to be effective over the entire range of rice germplasm.

(Contd.)
PCR Primers Used for Amplification of Genes
Gene/primer     Sequences of PCR primer pair (5' to 3')    Size of PCR product (bp)
1. Calmod 5'3'  CGC GCG CGC CTG CGT CGC CAA TGG            1254
                CGA TGC TTC AAC TTA CTT GGC C
2. Calmod NC    ATG GCG GAC CAG CTC ACC GAC GA             1178
                CAC CAT CAA CAT CGG CCT GAC CG
3. LEA3 5'3'    GCT TAG GAT CAA TGG CTT CCC ACC            941
                CCA AAG GGA AAT CAT TCA CGG CGT C
4. LEA3 NC      CTA CCG CGC CGG CGA GAC CA                 838
                TCC CTC GCC GTC GTC TCC GT
5. SalT 5'3'    CCA CGA AGA CTA TGA CGC TGG TG             574
                CTT TGA CCA CTG GGA ATC AAG G
6. SalT NC      ATG ACG CTG GTG AAG ATT GGC C              534
                GGT GGA CGT AGA TGC CAA TTG C
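A quick sanity check on primers like these is their GC content and an approximate melting temperature (Tm). The Python sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough approximation intended for short oligonucleotides and tends to overestimate Tm for primers of 20 nt or more; nearest-neighbour methods are more accurate. The function names are hypothetical.

```python
def clean(primer):
    """Drop the spaces used in the table and upper-case the bases."""
    return primer.replace(" ", "").upper()


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    seq = clean(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Wallace rule: Tm = 2(A+T) + 4(G+C), in degrees C (rough estimate)."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primers = {
    "Calmod NC (first primer)": "ATG GCG GAC CAG CTC ACC GAC GA",
    "SalT NC (first primer)": "ATG ACG CTG GTG AAG ATT GGC C",
}
for name, primer in primers.items():
    print(f"{name}: length={len(clean(primer))}, "
          f"GC={gc_content(primer):.0%}, Tm~{wallace_tm(primer)} C")
```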
HOW TO FIND A NOVEL GENE
2. Gene Mining and Sequence Diversity at the Wheat Powdery Mildew Resistance Locus Pm3
…ome genetic resistance, the identification and im…
The gene mining approach is used to characterize and utilize the naturally occurring resistance diversity in wheat.
New functional genes of interest can be transferred as single genes to susceptible but economically important wheat varieties to achieve efficient control of mildew.
This study contributes to the targeted use of genetic diversity resources for research and breeding.
Figure: Geographic origin of the 30 Pm3b lines detected in the 'FIGS powdery mildew set'. The collection sites are indicated by red triangles.
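Sequence diversity at a locus such as Pm3 is often summarised from aligned allele sequences. The Python sketch below computes three standard descriptors: the number of distinct haplotypes, the number of segregating (polymorphic) sites and nucleotide diversity (π, the mean number of pairwise differences per site). The toy alleles are hypothetical, gaps are treated as ordinary characters, at least two alleles are required, and this is not the analysis pipeline of the cited study.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def diversity_summary(alleles):
    """Haplotype count, segregating sites and nucleotide diversity (pi)
    for aligned sequences of equal length."""
    length = len(alleles[0])
    if any(len(s) != length for s in alleles):
        raise ValueError("alleles must be aligned to the same length")
    pairs = list(combinations(alleles, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    segregating = sum(len({s[i] for s in alleles}) > 1 for i in range(length))
    return {
        "haplotypes": len(set(alleles)),
        "segregating_sites": segregating,
        "nucleotide_diversity": mean_diff / length,  # pi, per site
    }


print(diversity_summary(["AAAA", "AAAT", "AATT"]))  # hypothetical aligned alleles
```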
3. Gene Mining of the Arabidopsis thaliana Genome
The genome sequence of the model dicotyledonous plant Arabidopsis thaliana is available in the public domain through the Internet (www.arabidopsis.org).
The identification of the anti-fungal polygalacturonase-inhibiting protein (pgip) gene lays the groundwork for functional studies.
It is relevant as a model system for protein-protein interactions, as well as for the practical engineering of fungus-resistant crop plants.
This showed that gene 'mining' is the first step in a so-called 'reverse genetics' approach, in which an investigator first identifies a gene sequence and then uses this information to determine the gene's function and role in the biology of the plant.
4. Isolation of a Known Gene to Validate the System
• To validate the system, it is used to isolate a known gene, e.g. the aminopeptidase gene of Manduca sexta.
• Aminopeptidase is involved in the modulation of various cellular responses, especially cell-cell adhesion and signal transduction.
• Aminopeptidase is directly involved in insect resistance to the insecticidal toxins of Bacillus thuringiensis.
• The M. sexta aminopeptidase gene was mined based on nucleotide and amino acid sequence alignment with existing aminopeptidase-related sequences.
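Mining a gene by alignment relies on scoring how well two sequences match. The Python sketch below computes a Needleman-Wunsch global alignment score with simple, hypothetical scoring (match +1, mismatch -1, gap -1) and returns only the score, not the alignment itself. Database searches in practice use tools such as BLAST with substitution matrices (e.g. BLOSUM62 for proteins) and affine gap penalties; the sequences below are toy examples, not M. sexta data.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, keeping only two DP rows."""
    cols = len(b) + 1
    prev = [j * gap for j in range(cols)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,              # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 2 (A-GT: three matches, one gap)
```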

5. Gene Mining Strategies of Drug Discovery
The strategies for identifying the limited number of genes that will be relevant to any given disease have been evolving at a rapid pace.
A more systematic approach to pharmacogenomics can now benefit from gene mining to identify disease- and drug-specific patterns of gene expression.
Microarrays are used to analyze the samples, and the resulting data are loaded into a database.
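One way to picture loading microarray results into a database is an SQL table of log2 expression ratios. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout, gene names, condition labels and the two-fold (|log2| ≥ 1) cut-off are all hypothetical. The second query keeps genes that respond to a drug but not in the untreated control, a simple stand-in for a drug-specific expression pattern.

```python
import sqlite3


def build_expression_db(rows):
    """rows: (gene, condition, log2_ratio) tuples from processed microarrays."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expression (gene TEXT, condition TEXT, log2_ratio REAL)")
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    return conn


def responsive_genes(conn, condition, min_abs_log2=1.0):
    """Genes whose |log2 ratio| under `condition` meets the cut-off."""
    cur = conn.execute(
        "SELECT gene, log2_ratio FROM expression "
        "WHERE condition = ? AND ABS(log2_ratio) >= ? ORDER BY gene",
        (condition, min_abs_log2),
    )
    return cur.fetchall()


def drug_specific_genes(conn, drug, control, min_abs_log2=1.0):
    """Genes responsive under `drug` but not under `control`."""
    cur = conn.execute(
        "SELECT d.gene FROM expression AS d JOIN expression AS c ON d.gene = c.gene "
        "WHERE d.condition = ? AND c.condition = ? "
        "AND ABS(d.log2_ratio) >= ? AND ABS(c.log2_ratio) < ? ORDER BY d.gene",
        (drug, control, min_abs_log2, min_abs_log2),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_expression_db([  # hypothetical processed results
    ("GENE1", "drug_X", 2.0), ("GENE1", "untreated", 0.1),
    ("GENE2", "drug_X", -1.5), ("GENE2", "untreated", -1.4),
    ("GENE3", "drug_X", 0.2), ("GENE3", "untreated", 0.0),
])
print(responsive_genes(conn, "drug_X"))                   # [('GENE1', 2.0), ('GENE2', -1.5)]
print(drug_specific_genes(conn, "drug_X", "untreated"))   # ['GENE1']
```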


6. Isolation of Nucleic Acid Molecules Related to Integrin
…s plays a fundamental role in the processes involved in…
The specific function identified was that the target be an integral membrane protein involved in cytoskeletal formation.
These structural-functional parameters were then used to target potential genes, based on the function identified from the PubMed database, across all organisms.
The primer design software used was MacVector, and the primer design was improved after an initial round of sequence determination.
7. Mining Colon Tumor Relevant Genes
A cross-validation (CV) resampling approach is used to construct the t…
First, colon tumor and normal samples are randomly divided into five non-overlapping subsets of roughly equal size, i.e. tumor subsets D_i (i = 1, 2, ..., 5) and normal subsets N_i (i = 1, 2, ..., 5).
The resampling is repeated 20 times to obtain 500 pairs of training and test sets.
In order to obtain a statistical measure of significance for each gene, a null distribution FV0 is constructed, as described previously.
The proposed gene extraction approach is then applied to each pair.
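The count of 500 pairs is consistent with pairing every tumor subset D_i with every normal subset N_j as a test set (5 × 5 = 25 pairs per round, times 20 rounds); that pairing scheme is an assumption of the Python sketch below, which only generates the training/test splits and does not implement the gene extraction approach or the null distribution FV0. Sample IDs and group sizes are hypothetical.

```python
import random


def split_into_subsets(samples, k, rng):
    """Shuffle and deal samples into k non-overlapping subsets of roughly equal size."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def resampled_pairs(tumor, normal, k=5, repeats=20, seed=0):
    """Return (training, test) pairs; each test set is one tumor subset D_i plus
    one normal subset N_j, and the training set is every other sample."""
    rng = random.Random(seed)
    all_samples = list(tumor) + list(normal)
    pairs = []
    for _ in range(repeats):
        tumor_subsets = split_into_subsets(tumor, k, rng)
        normal_subsets = split_into_subsets(normal, k, rng)
        for d_i in tumor_subsets:
            for n_j in normal_subsets:
                test = set(d_i) | set(n_j)
                training = [s for s in all_samples if s not in test]
                pairs.append((training, sorted(test)))
    return pairs


tumor = [f"T{i}" for i in range(40)]   # hypothetical sample IDs
normal = [f"N{i}" for i in range(22)]
pairs = resampled_pairs(tumor, normal)
print(len(pairs))  # 500 = 20 repeats x 5 tumor subsets x 5 normal subsets
```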
8. Mining Molecular Signatures for Leukemia Subtypes
Here, the target phenotypes are two distinct leukemia subtypes, AML and ALL.
Thus, an ensemble decision analysis is conducted to identify the significant molecular signatures (subtype-relevant genes) that underpin the complex molecular mechanisms distinguishing the two subtypes.
These data contain measurements corresponding to ALL and AML samples from bone marrow and peripheral blood.
Figure: Leukemia, acute lymphoblastic (ALL) vs acute myeloid (AML): visually similar, but genetically very different.
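A simple way to rank candidate subtype-relevant genes is a per-gene signal-to-noise statistic, (mean_ALL - mean_AML) / (sd_ALL + sd_AML). It is swapped in here as a single-gene filter for illustration and is not the ensemble decision analysis described above; the gene names and expression values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(all_values, aml_values):
    """(mean_ALL - mean_AML) / (sd_ALL + sd_AML); the sign shows which subtype is higher."""
    spread = stdev(all_values) + stdev(aml_values)
    return (mean(all_values) - mean(aml_values)) / (spread or 1e-9)


def rank_genes(expression):
    """expression: {gene: (ALL values, AML values)}; rank genes by |SNR|."""
    scores = {g: signal_to_noise(a, m) for g, (a, m) in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


data = {  # hypothetical expression levels
    "g1": ([5.0, 6.0, 5.5], [1.0, 1.5, 1.0]),
    "g2": ([3.0, 3.2, 2.9], [3.1, 3.0, 3.0]),
}
for gene, snr in rank_genes(data):
    print(gene, round(snr, 2))
```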


9. Metagenomics (Uncultivable Microbes & Novel Genes)
Modern biotechnology has a steadily increasing demand for novel genes for application in various industrial processes and in the development of genetically modified organisms.
Identification, isolation and cloning of novel genes at a reasonable pace is the main driving force behind the development of scheduled experimental approaches.
Current bottlenecks in metagenomics include insufficient functional characterization of proteins in databases.
Metagenomics is one such novel approach for discovering novel genes. The metagenomes of complex microbial communities are a rich source of novel genes for biotechnological purposes.
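Because metagenomic genes often come from uncultivable microbes, candidate novel genes are usually first predicted computationally from assembled DNA. A minimal Python sketch of one such step is shown below, assuming a simple ATG-to-stop definition of an open reading frame (ORF) on both strands and a hypothetical 90-nt minimum length; real metagenomic gene finders use richer models (alternative start codons, codon usage, partial genes at contig ends).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Return (strand, start, end, orf) for ATG...stop ORFs of at least min_len nt.
    Coordinates on the '-' strand refer to the reverse-complement sequence."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # no in-frame stop: open-ended, skip the rest of this frame
                end = j + 3
                if end - i >= min_len:
                    orfs.append((strand, i, end, s[i:end]))
                i = end  # resume scanning after this ORF's stop codon
    return orfs


print(find_orfs("CCATGAAATAGCC", min_len=9))  # [('+', 2, 11, 'ATGAAATAG')]
```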
10. GENOMIC INDUSTRY & GENE MINING COMPANIES
• Sequence data are now available and placed in the public domain.
• Companies have been created to "mine" the data, that is, to analyze the genomic sequences to identify genes, their function, and their relationships to health and disease processes.
• Companies pioneering in this area included Sequana and Millennium Pharmaceuticals.
11. Mining the Epigenome for Methylated Genes in Lung Cancer
Lung cancer has become a global public health burden, with 1.5 million deaths expected by 2010. This further substantiates the need for early diagnosis.
The key to accomplishing this goal is a better understanding of the genes and pathways disrupted during the initiation and progression of this disease.
This has stimulated the development of screening approaches to identify additional genes and pathways that are disrupted within the epigenome.
Gene promoter hypermethylation is a major mechanism for silencing genes in lung cancer.
Therefore, mining the epigenome for methylated genes in lung cancer helps in the detailed study of the disease and contributes to its cure.
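Promoter hypermethylation typically affects CpG islands. As a hedged sketch, the Python code below flags a promoter sequence as a CpG island using the commonly cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). It screens sequence composition only and says nothing about actual methylation status, which must be measured experimentally; the function names and test sequences are hypothetical.

```python
def cpg_stats(seq):
    """GC fraction and observed/expected CpG ratio, CpG * length / (C * G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c + g) / len(seq)
    obs_exp = cpg * len(seq) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Composition-based CpG island test for one whole sequence."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CGCGCGCG"))      # (1.0, 2.0)
print(is_cpg_island("CG" * 100))  # True
print(is_cpg_island("AT" * 100))  # False
```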
FUTURE SCENARIO:-
1. Advancement in gene mining companies
2. Metagenomics for mining new genetic resources of microbial communities
3. Gene mining with the hierarchical clustering algorithm
4. "Gene-mining" strategies of drug discovery development
5. Mining the mouse genome for novel genes
Global gene mining
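Hierarchical clustering groups genes whose expression profiles are similar. The Python sketch below is a naive agglomerative version with average linkage on Euclidean distance that stops at a chosen number of clusters k; the gene names, profiles and the choice of distance and linkage are illustrative assumptions (correlation-based distances are also common for expression data).

```python
import math
from itertools import combinations


def average_linkage(c1, c2, profiles):
    """Mean Euclidean distance between all members of two clusters."""
    total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, k):
    """Start with singleton clusters and repeatedly merge the closest pair
    until k clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], profiles),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


profiles = {  # hypothetical expression profiles across three conditions
    "g1": [1, 2, 3], "g2": [1, 2, 3.1],
    "g3": [10, 10, 10], "g4": [10, 10.2, 10],
}
print(hierarchical_clusters(profiles, 2))
```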


CONCLUSION
WORK IS BEING DONE ON AN EXPONENTIAL SCALE WORLDWIDE.
Indian scientists are also working hard, but due to some bottlenecks they are not able to keep pace.
Gene mining is a boon not only for plant biotechnology but equally for animal sciences.
Gene mining has provided molecular biologists with a powerful and usable tool for extracting disease-relevant genes, a major theme in the post-genomic era.
This technique still leaves a question mark over target-driven studies of gene function.
[Figure: GENE MINING linked to drug discovery, tissue engineering, metagenomics and nanotechnology]
PROGRAMMES/SOFTWARE/DATABASES USED FOR GENE MINING
• BLAST: BLASTp, BLASTn, BLASTx, tBLASTn, tBLASTx
• BioMart & Ensembl
BLASTp - Compares an amino acid query sequence against a protein sequence database.
BLASTn - Compares a nucleotide query sequence against a nucleotide sequence database.
BLASTx - Compares the six-frame conceptual translation products of a nucleotide query sequence against a protein sequence database.
tBLASTn - Compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames.
tBLASTx - Compares the six-frame translation of a nucleotide query sequence against the six-frame translation of a nucleotide sequence database.
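BLASTx and tBLASTx both start from the six-frame conceptual translation of a nucleotide sequence (three frames on each strand). The Python sketch below shows only that translation step, using the standard genetic code; the frame labels (+1 to +3, -1 to -3) and the use of "X" for codons containing ambiguous bases are assumptions of this example, and the database search and scoring done by BLAST are not reproduced.

```python
# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq, offset):
    """Translate complete codons starting at `offset`; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Frames +1..+3 on the given strand, -1..-3 on its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(seq, offset)
        frames[f"-{offset + 1}"] = translate_frame(rc, offset)
    return frames


for frame, protein in six_frame_translation("ATGGCTTGGTAA").items():
    print(frame, protein)  # e.g. +1 MAW*, -1 LPSH
```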
RESEARCH PAPERS PUBLISHED:
1. R. Latha, L. Rubia, J. Bennett and M. S. Swaminathan. 2004. Allele Mining for Stress Tolerance Genes in Oryza Species and Related Germplasm. Molecular Biotechnology 27: 101-108.
2. N. Kaur, K. Street, M. Mackay, N. Yahiaoui and B. Keller. Allele Mining and Sequence Diversity at the Wheat Powdery Mildew Resistance Locus Pm3. Plant Molecular Biology (65): 93-106.
3. D. K. Berger. 2004. Gene-Mining the Arabidopsis thaliana Genome: Applications for Biotechnology in Africa. South African Journal of Botany 70(1): 173-180.
4. Seokkyung Chung, Jongeun Jun and Dennis McLeod. 2004. Mining Gene Expression Datasets using Density-based Clustering. CIKM '04: 8-13.
5. Gerard R. Lazo, Debbie Laudencia-Chingcuanco, Yong Q. Gu and Olin D. Anderson. 2004. Gene Mining for Conserved cis Elements in Model Genomes Using Gene Expression Patterns. In: The NCBI Handbook: 106-109.
6. S. M. Khalessizadeh, R. Zaefarian, S. H. Nasseri and E. Ardil. 2006. Genetic Mining: Using Genetic Algorithm for Topic based on Concept Distribution. World Academy of Science, Engineering and Technology 13: 144-147.
7. Graciela Gonzalez, Juan C. Uribe, Luis Tari, Colleen Brophy and Chitta Baral. 2007. Mining Gene-Disease Relationships from Biomedical Literature: Weighting Protein-Protein Interactions and Connectivity Measures. Pacific Symposium on Biocomputing 12: 28-39.
 
Patents:-
1. Peptide Mass Fingerprinting Database Management Program Using AMWISE and fBIND Technique. Registration Number: 2004-01-12-835
2. Peptide Mass Fingerprinting Program Using AMWISE and fBIND Technique. Registration Number: 2004-01-12-836
3. cDNA Microarray Data Classification Tool. Registration Number: 2004-01-22-839
4. cDNA Microarray Data Clustering Tool. Registration Number: 2004-01-22-840
