
http://studentsidea.blogspot.com

DEPARTMENT OF INFORMATION TECHNOLOGY

ABSTRACT

Bioinformatics is the application of statistics and computer science to
the field of molecular biology.

The term bioinformatics was coined by Paulien Hogeweg in 1979 for the
study of informatic processes in biotic systems. Its primary use since
at least the late 1980s has been in genomics and genetics, particularly
in those areas of genomics involving large-scale DNA sequencing.

Bioinformatics now entails the creation and advancement of databases,
algorithms, computational and statistical techniques and theory to
solve formal and practical problems arising from the management and
analysis of biological data.

Over the past few decades rapid developments in genomic and other
molecular research technologies and developments in information
technologies have combined to produce a tremendous amount of
information related to molecular biology. Bioinformatics is the name
given to the mathematical and computing approaches used to glean
understanding of biological processes.

Common activities in bioinformatics include mapping and analyzing DNA
and protein sequences, aligning different DNA and protein sequences to
compare them, and creating and viewing 3-D models of protein
structures.

The primary goal of bioinformatics is to increase the understanding of
biological processes. What sets it apart from other approaches,
however, is its focus on developing and applying computationally
intensive techniques (e.g., pattern recognition, data mining, machine
learning algorithms, and visualization) to achieve this goal. Major
research efforts in the field include sequence alignment, gene finding,
genome assembly, drug design, drug discovery, protein structure
alignment, protein structure prediction, prediction of gene expression
and protein-protein interactions, genome-wide association studies and
the modeling of evolution.

Introduction:

At the beginning of the "genomic revolution", bioinformatics was
applied in the creation and maintenance of databases to store
biological information such as nucleotide and amino acid sequences.
Development of this type of database involved not only design issues
but also the development of complex interfaces whereby researchers
could both access existing data and submit new or revised data.
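
As a rough illustration of the kind of database interface described in
the Introduction, the following sketch keeps sequence records in memory
and lets a caller submit a new record, submit a revision, and fetch a
stored version. The class name SequenceStore, its methods, and the
accession "X00001" are hypothetical; they are not modeled on the API of
any real sequence database.

```python
from dataclasses import dataclass

# Allowed symbols per record kind (a simplified assumption for this sketch).
ALPHABETS = {
    "nucleotide": set("ACGTUN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


@dataclass(frozen=True)
class Record:
    accession: str
    kind: str       # "nucleotide" or "protein"
    sequence: str
    version: int    # 1 for the first submission, 2 for the first revision, ...


class SequenceStore:
    """Hypothetical in-memory store that keeps every version of a record."""

    def __init__(self):
        self._records = {}  # accession -> list of Record, oldest first

    def submit(self, accession, kind, sequence):
        """Store a new record, or a revision if the accession already exists."""
        if kind not in ALPHABETS:
            raise ValueError(f"unknown record kind: {kind}")
        sequence = sequence.upper()
        if not sequence or set(sequence) - ALPHABETS[kind]:
            raise ValueError(f"invalid {kind} sequence")
        history = self._records.setdefault(accession, [])
        record = Record(accession, kind, sequence, len(history) + 1)
        history.append(record)
        return record

    def fetch(self, accession, version=None):
        """Return the latest version, or a specific 1-based version."""
        history = self._records[accession]
        return history[-1] if version is None else history[version - 1]


store = SequenceStore()
store.submit("X00001", "nucleotide", "atgcc")   # original submission
store.submit("X00001", "nucleotide", "ATGCA")   # revised submission
latest = store.fetch("X00001")
```

Keeping the full history rather than overwriting a record is one possible
design choice; it lets earlier versions remain retrievable after a
revision is submitted.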
In order to study how normal cellular activities are altered in
different disease states, the biological data must be combined to form
a comprehensive picture of these activities. Therefore, the field of
bioinformatics has evolved such that the most pressing task now
involves the analysis and interpretation of various types of data,
including nucleotide and amino acid sequences, protein domains, and
protein structures. The actual process of analyzing and interpreting
data is referred to as computational biology. Important sub-disciplines
within bioinformatics and computational biology include:

• the development and implementation of tools that enable efficient
  access to, and use and management of, various types of information.
• the development of new algorithms (mathematical formulas) and
  statistics with which to assess relationships among members of large
  data sets, such as methods to locate a gene within a sequence,
  predict protein structure and/or function, and cluster protein
  sequences into families of related sequences.

Major research areas:

Since the Phage Φ-X174 was sequenced in 1977, the DNA sequences of
thousands of organisms have been decoded and stored in databases. This
sequence information is analyzed to determine genes that encode
polypeptides (proteins), RNA genes, regulatory sequences, structural
motifs, and repetitive sequences. A comparison of genes within a
species or between different species can show similarities between
protein functions, or relations between species (the use of molecular
systematics to construct phylogenetic trees). With the growing amount
of data, it long ago became impractical to analyze DNA sequences
manually. Today, computer programs such as BLAST are used daily to
search the genomes of thousands of organisms, containing billions of
nucleotides. These programs can compensate for mutations (exchanged,
deleted or inserted bases) in the DNA sequence, in order to identify
sequences that are related, but not identical. A variant of this sequence
alignment is used in the sequencing process itself. The so-called
shotgun sequencing technique (which was used, for example, by The
Institute for Genomic Research to sequence the first bacterial genome,
Haemophilus influenzae) does not produce entire chromosomes, but
instead generates the sequences of many thousands of small DNA
fragments (ranging from 35 to 900 nucleotides long, depending on the
sequencing technology). The ends of these fragments overlap and, when
aligned properly by a genome assembly program, can be used to
reconstruct the complete genome. Shotgun sequencing yields sequence
data quickly, but the task of assembling the fragments can be quite
complicated for larger genomes.

For a genome as large as the human genome, it may take many days of CPU
time on large-memory, multiprocessor computers to assemble the
fragments, and the resulting assembly will usually contain numerous
gaps that have to be filled in later. Shotgun sequencing is the method
of choice for virtually all genomes sequenced today, and genome
assembly algorithms are a critical area of bioinformatics research.

Another aspect of bioinformatics in sequence analysis is annotation,
which involves computational gene finding to search for protein-coding
genes, RNA genes, and other functional sequences within a genome. Not
all of the nucleotides within a genome are part of genes. Within the
genome of higher organisms, large parts of the DNA do not serve any
obvious purpose. This so-called junk DNA may, however, contain
unrecognized functional elements. Bioinformatics helps to bridge the
gap between genome and proteome projects, for example in the use of DNA
sequences for protein identification.

Genome annotation:

In the context of genomics, annotation is the process of marking the
genes and other biological features in a DNA sequence. The first genome
annotation software system was designed in 1995 by Dr. Owen White, who
was part of the team at The Institute for Genomic Research that
sequenced and analyzed the first genome of a free-living organism to be
decoded, the bacterium Haemophilus influenzae. Dr.
White built a software system to find the genes (places in the DNA
sequence that encode a protein), the transfer RNA, and other features,
and to make initial assignments of function to those genes. Most
current genome annotation systems work similarly, but the programs
available for analysis of genomic DNA are constantly changing and
improving.

Computational evolutionary biology:

Evolutionary biology is the study of the origin and descent of species,
as well as their change over time. Informatics has assisted
evolutionary biologists in several key ways; it has enabled researchers
to:

• trace the evolution of a large number of organisms by measuring
  changes in their DNA, rather than through physical taxonomy or
  physiological observations alone,
• more recently, compare entire genomes, which permits the study of
  more complex evolutionary events, such as gene duplication,
  horizontal gene transfer, and the prediction of factors important in
  bacterial speciation,
• build complex computational models of populations to predict the
  outcome of the system over time,
• track and share information on an increasingly large number of
  species and organisms.

Future work endeavours to reconstruct the now more complex tree of
life.

The area of research within computer science that uses genetic
algorithms is sometimes confused with computational evolutionary
biology, but the two areas are not necessarily related.

Analysis of gene expression:

The expression of many genes can be determined by measuring mRNA levels
with multiple techniques including microarrays, expressed cDNA sequence
tag (EST) sequencing, serial analysis of gene expression (SAGE) tag
sequencing, massively parallel signature sequencing (MPSS), or various
applications of multiplexed in-situ hybridization. All of these
techniques are extremely noise-prone and/or subject to bias in the
biological measurement, and a major research area in computational biology
involves developing statistical tools to separate signal from noise in
high-throughput gene expression studies. Such studies are often used to
determine the genes implicated in a disorder: one might compare
microarray data from cancerous epithelial cells to data from
non-cancerous cells to determine the transcripts that are up-regulated
and down-regulated in a particular population of cancer cells.

Analysis of regulation:

Regulation is the complex orchestration of events starting with an
extracellular signal such as a hormone and leading to an increase or
decrease in the activity of one or more proteins. Bioinformatics
techniques have been applied to explore various steps in this process.
For example, promoter analysis involves the identification and study of
sequence motifs in the DNA surrounding the coding region of a gene.
These motifs influence the extent to which that region is transcribed
into mRNA. Expression data can be used to infer gene regulation: one
might compare microarray data from a wide variety of states of an
organism to form hypotheses about the genes involved in each state. In
a single-cell organism, one might compare stages of the cell cycle,
along with various stress conditions (heat shock, starvation, etc.).
One can then apply clustering algorithms to that expression data to
determine which genes are co-expressed. For example, the upstream
regions (promoters) of co-expressed genes can be searched for
over-represented regulatory elements.

Analysis of protein expression:

Protein microarrays and high throughput (HT) mass spectrometry (MS) can
provide a snapshot of the proteins present in a biological sample.
Bioinformatics is very much involved in making sense of protein
microarray and HT MS data. The former approach faces problems similar
to those of microarrays targeted at mRNA, while the latter involves the
problem of matching large amounts of mass data against predicted masses
from protein sequence databases, and the complicated statistical
analysis of samples where multiple, but incomplete, peptides from each
protein are detected.
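
The mass-matching problem mentioned for HT MS data can be illustrated
with a small sketch. It assumes a simplified trypsin-like digestion
(cutting after every K or R and ignoring the usual proline exception),
standard monoisotopic residue masses, and an arbitrary tolerance of
0.01 Da. The function names digest, peptide_mass and match_masses, the
two toy proteins, and the observed mass list are hypothetical and exist
only for this illustration; real search tools use far more elaborate
scoring and statistics.

```python
# Monoisotopic residue masses in daltons (standard values, 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def digest(protein):
    """Split a protein after every K or R (simplified cleavage rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Predicted monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def match_masses(observed, proteins, tolerance=0.01):
    """Map each protein name to the (observed mass, peptide) pairs it explains."""
    hits = {}
    for name, sequence in proteins.items():
        for peptide in digest(sequence):
            predicted = peptide_mass(peptide)
            for mass in observed:
                if abs(mass - predicted) <= tolerance:
                    hits.setdefault(name, []).append((mass, peptide))
    return hits


# Toy "database" and toy observations (both made up for the example).
proteins = {"protA": "GASKVLR", "protB": "MKWW"}
observed = [peptide_mass("GASK"), 500.0]
hits = match_masses(observed, proteins)
```

In this toy run only one peptide of protA is matched, which mirrors the
point made above: a protein is usually supported by several, but not
all, of its peptides, so deciding which proteins are really present
becomes a statistical question.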
Analysis of mutations in cancer:

In cancer, the genomes of affected cells are rearranged in complex or
even unpredictable ways. Massive sequencing efforts are used to
identify previously unknown point mutations in a variety of genes in
cancer. Bioinformaticians continue to produce specialized automated
systems to manage the sheer volume of sequence data produced, and they
create new algorithms and software to compare the sequencing results to
the growing collection of human genome sequences and germline
polymorphisms. New physical detection technologies are employed, such
as oligonucleotide microarrays to identify chromosomal gains and losses
(called comparative genomic hybridization), and single-nucleotide
polymorphism arrays to detect known point mutations. These detection
methods simultaneously measure several hundred thousand sites
throughout the genome, and when used in high-throughput to measure
thousands of samples, generate terabytes of data per experiment. Again
the massive amounts and new types of data generate new opportunities
for bioinformaticians. The data is often found to contain considerable
variability, or noise, and thus Hidden Markov model and change-point
analysis methods are being developed to infer real copy number changes.

Another type of data that requires novel informatics development is the
analysis of lesions found to be recurrent among many tumors.

Prediction of protein structure:

Protein structure prediction is another important application of
bioinformatics. The amino acid sequence of a protein, the so-called
primary structure, can be easily determined from the sequence on the
gene that codes for it. In the vast majority of cases, this primary
structure uniquely determines a structure in its native environment.
(Of course, there are exceptions, such as the bovine spongiform
encephalopathy - aka Mad Cow Disease - prion.) Knowledge of this
structure is vital in understanding the function of the protein. For
lack of better terms, structural information is usually classified as
one of secondary, tertiary and quaternary structure. A
viable general solution to such predictions remains an open problem. As
of now, most efforts have been directed towards heuristics that work
most of the time.

One of the key ideas in bioinformatics is the notion of homology. In
the genomic branch of bioinformatics, homology is used to predict the
function of a gene: if the sequence of gene A, whose function is known,
is homologous to the sequence of gene B, whose function is unknown, one
could infer that B may share A's function. In the structural branch of
bioinformatics, homology is used to determine which parts of a protein
are important in structure formation and interaction with other
proteins. In a technique called homology modeling, this information is
used to predict the structure of a protein once the structure of a
homologous protein is known. This currently remains the only way to
predict protein structures reliably.

One example of this is the homology between hemoglobin in humans and
the hemoglobin in legumes (leghemoglobin). Both serve the same purpose
of transporting oxygen in the organism. Although these two proteins
have completely different amino acid sequences, their protein
structures are virtually identical, which reflects their near identical
purposes.

Other techniques for predicting protein structure include protein
threading and de novo (from scratch) physics-based modeling.

Comparative genomics:

The core of comparative genome analysis is the establishment of the
correspondence between genes (orthology analysis) or other genomic
features in different organisms. It is these intergenomic maps that
make it possible to trace the evolutionary processes responsible for
the divergence of two genomes. A multitude of evolutionary events
acting at various organizational levels shape genome evolution. At the
lowest level, point mutations affect individual nucleotides. At a
higher level, large chromosomal segments undergo duplication, lateral
transfer, inversion, transposition, deletion and insertion. Ultimately,
whole genomes are involved in processes of hybridization,
polyploidization and endosymbiosis, often leading to rapid
speciation. The complexity of genome evolution poses many exciting
challenges to developers of mathematical models and algorithms, who
have recourse to a spectrum of algorithmic, statistical and
mathematical techniques, ranging from exact, heuristic, fixed-parameter
and approximation algorithms for problems based on parsimony models to
Markov Chain Monte Carlo algorithms for Bayesian analysis of problems
based on probabilistic models.

Many of these studies are based on homology detection and the
computation of protein families.

Modeling biological systems:

Systems biology involves the use of computer simulations of cellular
subsystems (such as the networks of metabolites and enzymes which
comprise metabolism, signal transduction pathways and gene regulatory
networks) to both analyze and visualize the complex connections of
these cellular processes. Artificial life or virtual evolution attempts
to understand evolutionary processes via the computer simulation of
simple (artificial) life forms.

High-throughput image analysis:

Computational technologies are used to accelerate or fully automate the
processing, quantification and analysis of large amounts of
high-information-content biomedical imagery. Modern image analysis
systems augment an observer's ability to make measurements from a large
or complex set of images, by improving accuracy, objectivity, or speed.
A fully developed analysis system may completely replace the observer.
Although these systems are not unique to biomedical imagery, biomedical
imaging is becoming more important for both diagnostics and research.
Some examples are:

• high-throughput and high-fidelity quantification and sub-cellular
  localization (high-content screening, cytohistopathology, bioimage
  informatics)
• morphometrics
• clinical image analysis and visualization
• determining the real-time air-flow patterns in breathing lungs of
  living animals
• quantifying occlusion size in real-time imagery from the development
  of and recovery during arterial injury
• making behavioral observations from extended video recordings of
  laboratory animals
• infrared measurements for metabolic activity determination
• inferring clone overlaps in DNA mapping, e.g. the Sulston score

Software and tools:

Software tools for bioinformatics range from simple command-line tools
to more complex graphical programs and standalone web services
available from various bioinformatics companies or public institutions.

Web services in bioinformatics:

SOAP and REST-based interfaces have been developed for a wide variety
of bioinformatics applications, allowing an application running on one
computer in one part of the world to use algorithms, data and computing
resources on servers in other parts of the world. The main advantages
derive from the fact that end users do not have to deal with software
and database maintenance overheads.

Basic bioinformatics services are classified by the EBI into three
categories: SSS (Sequence Search Services), MSA (Multiple Sequence
Alignment) and BSA (Biological Sequence Analysis). The availability of
these service-oriented bioinformatics resources demonstrates the
applicability of web based bioinformatics solutions, which range from a
collection of standalone tools with a common data format under a
single, standalone or web-based interface, to integrative, distributed
and extensible bioinformatics workflow management systems.

References:

• Achuthsankar S Nair, Computational Biology & Bioinformatics - A
  Gentle Overview, Communications of Computer Society of India,
  January 2007
• Aluru, Srinivas, ed. Handbook of Computational Molecular
  Biology. Chapman & Hall/CRC, 2006. ISBN 1584884061 (Chapman &
  Hall/CRC Computer and Information Science Series)
• Baldi, P and Brunak, S, Bioinformatics: The Machine Learning
  Approach, 2nd edition. MIT Press, 2001. ISBN 0-262-02506-X
• Barnes, M.R. and Gray, I.C., eds., Bioinformatics for Geneticists,
  first edition. Wiley, 2003. ISBN 0-470-84394-2
• Baxevanis, A.D. and Ouellette, B.F.F., eds., Bioinformatics: A
  Practical Guide to the Analysis of Genes and Proteins, third edition.
  Wiley, 2005. ISBN 0-471-47878-4
• Baxevanis, A.D., Petsko, G.A., Stein, L.D., and Stormo, G.D., eds.,
  Current Protocols in Bioinformatics. Wiley, 2007.
  ISBN 0-471-25093-7.

http://studentsidea.blogspot.com
or
http://studentsidea.co.cc
