
Bioinformatics

Lecture 1
What is bioinformatics?
Why bioinformatics?
The major molecular biology facts
Brief history of bioinformatics
Typical problems of bioinformatics:
collection and retrieval of data
alignment and similarity search
prediction and classification

Expectations and the level of requirements

What is Bioinformatics?

[Diagram: bioinformatics at the overlap of Computer Science, Mathematics and Statistics, and Biology]

What is bioinformatics?
A working definition is that of the House of Representatives Standing Committee on Primary Industries and Regional Services Inquiry:
"All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts of biological information in databases. The information involved includes gene sequences, biological activity/function, pharmacological activity, biological structure, molecular structure, protein-protein interactions, and gene expression. Bioinformatics uses powerful computers and statistical techniques to accomplish research objectives, for example, to discover a new pharmaceutical or herbicide."

Areas of current and future development of bioinformatics

Molecular biology and genetics
Phylogenetic and evolutionary sciences
Different aspects of biotechnology including
pharmaceutical and microbiological industries
Medicine
Agriculture
Eco-management

Why bioinformatics?
Exponential growth of investments
Constant deficit of trained professionals
Diversification of bioinformatics applications
Need for different types of bioinformaticians

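Central Dogma of Molecular Biology

[Diagram: flow of genetic information]
GENE (DNA): ATGCAAGTCCACTGTATTCCA (copied by replication)
transcription (DNA to RNA); reverse transcription (RNA to DNA)
MESSENGER (RNA): UACGUUCAGGUGACAUAAGGG
translation (RNA to protein)
PROTEIN
GENOTYPE (e.g. Aa) is expressed as PHENOTYPE (pink), a TRAIT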

DNA (double helix)

5' A C G T C A T G 3'
3' T G C A G T A C 5'  (template)

RNA

5' A C G U C A U G 3'

Symbol   Meaning        Explanation
G        G              Guanine
A        A              Adenine
T        T              Thymine
C        C              Cytosine
U        U              Uracil (RNA)
R        A or G         puRine
Y        C or T         pYrimidine
N        A, C, G or T   Any base
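
The base pairing and symbol conventions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the function names (`complement`, `reverse_complement`, `transcribe`, `matches`) are hypothetical, and the `IUPAC` dictionary covers only the symbols listed on the slide.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the ambiguity symbols listed on the slide (assumption: no others are needed).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def complement(dna):
    """Return the base-by-base complement, read in the same left-to-right order."""
    return dna.translate(DNA_COMPLEMENT)


def reverse_complement(dna):
    """Return the opposite strand written in its own 5'-to-3' direction."""
    return complement(dna)[::-1]


def transcribe(coding_strand):
    """RNA has the sequence of the non-template strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def matches(pattern, dna):
    """Check whether a DNA string fits a pattern written with the symbols above."""
    return len(pattern) == len(dna) and all(
        base in IUPAC[symbol] for symbol, base in zip(pattern, dna)
    )


top = "ACGTCATG"
print(complement(top))   # the template strand as drawn under the top strand
print(transcribe(top))   # the RNA copy
```

Applied to the top strand of the slide, `complement` reproduces the template strand T G C A G T A C and `transcribe` reproduces the RNA strand A C G U C A U G.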

Genetic Code
1. Amino acids are coded by codons: triplets of nucleotides, e.g. |ACG|TAT|.
2. There are 4^3 = 64 codons for ~20 amino acids, so the code is degenerate.
3. Codons do not overlap.
4. Deletions or insertions of one or a few nucleotides (not a multiple of 3) usually destroy the message by shifting the reading frame.
5. Three specific codons (stop codons) do not code for any amino acid and are always located at the very end of the protein-coding part of a gene.
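
The rules above can be illustrated with a short Python sketch. It assumes the standard genetic code, written as a compact 64-letter string in T, C, A, G codon order, and reads codons from the first base of the input; the names `CODON_TABLE` and `translate` are hypothetical helpers, not from the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate non-overlapping codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":          # stop codons code for no amino acid
            break
        protein.append(aa)
    return "".join(protein)


gene = "ATGCAAGTCCACTGTATTCCA"        # DNA example from the Central Dogma slide
print(len(CODON_TABLE))               # 64 codons
print(translate(gene))                # MQVHCIP

# Deleting a single nucleotide shifts the reading frame of everything after it.
shifted = gene[:3] + gene[4:]
print(translate(shifted))             # MKSTVF
```

The table has 64 entries for 20 amino acids plus 3 stop codons, which is the degeneracy stated in rule 2, and the one-base deletion changes every amino acid downstream of it, as rule 4 describes.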

The genetic code

The 20 amino acids common in living organisms

PROTEINS

Green Fluorescent Protein (GFP)

  1 mcgkkfelkidnvrfvghptllqpphtiqasktdpspkrelptmilfsvvfalranadas
 61 viscmhnlsrriaialqheerrcqyltreaklmlamqdevttiidsdgspqspfrqilpk
121 cklardlkeaydslcttgvvrlhinnwlevsfclphkihrvggkhiplealerslkairp

Genomic Hierarchy in Eukaryotes

Nuclear genome (1)

Chromosomes (23x2)

DNA molecules (23x2)

Genes (~30,000); only a small fraction of the genome

Nucleotides (~3 x 10^9)

Eukaryotic genes are complex

[Diagram: Promoter - Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3 - Intron 3 - Exon 4]
Labels: start codon, protein coding regions, stop codon

Brief history of bioinformatics: Databases


The first biological database, the Protein Identification Resource, was established in 1972 by Margaret Dayhoff
Dayhoff and co-workers organized the proteins into families and superfamilies based on the degree of sequence similarity
The idea of sequence alignment was introduced, along with special tables that reflected the frequency of changes observed in the sequences of a group of closely related proteins
Currently there are several huge protein databases: SwissProt, PIR International, etc.
The first DNA database was established in 1979. Currently there are several powerful databases: GenBank, EMBL, DDBJ, etc.

Brief history of bioinformatics: evolutionary reconstructions

Brief history of bioinformatics: other important steps

Development of sequence retrieval methods (1970s-80s)
Development of principles of sequence alignment (1980s)
Prediction of RNA secondary structure (1980s)
Prediction of protein secondary and 3D structure (1980s-90s)
The FASTA and BLAST methods for database search (1980s-90s)
Prediction of genes (1990s)
Studies of complete genome sequences (late 1990s-2000s)

Collection and retrieval of data. Alignment methods.

Sequencing (DNA, proteins)
Submission of sequences to the databases
Computer storage of sequences
Development of sequence formats
Conversion of one sequence format to another
Development of retrieval and alignment methods
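
As a small illustration of sequence formats and format conversion, the sketch below turns a numbered sequence listing (lines that start with a position number, as in database flat files) into FASTA text and reads it back. The function names `numbered_block_to_fasta` and `parse_fasta` and the 60-character line width are assumptions made for this example; real formats carry more fields than are handled here.

```python
import re


def numbered_block_to_fasta(lines, header, width=60):
    """Strip position numbers and spaces, then emit a FASTA record."""
    # Keep letters only: digits are position labels, not part of the sequence.
    sequence = "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).upper()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + body)


def parse_fasta(text):
    """Read FASTA text into a dict that maps each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


listing = ["  1 acgtcatg", "  9 ttgacc"]
fasta = numbered_block_to_fasta(listing, "example_sequence", width=8)
print(fasta)
print(parse_fasta(fasta))
```

Keeping the conversion in two small steps (normalize to a plain sequence, then write the target format) makes it easy to add further formats later without touching the parser.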

Prediction, reconstruction and classification

Prediction of secondary and 3D structure of RNA and proteins
Gene prediction in prokaryotes and eukaryotes
Prediction of promoters and other functional sites
Reconstruction of phylogeny
Genome analysis
Classification of proteins and genes

Prediction of RNA secondary structure: an example

[Diagram: an RNA strand drawn from its 5' end to its 3' end in two forms]
A. Single-stranded RNA
B. Stem and loop, or hairpin loop
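
A hairpin forms when one stretch of an RNA strand is the reverse complement of a later stretch, so the two can pair into a stem while the bases between them form the loop. The sketch below searches for such a pair. It is a deliberately simplified illustration, assuming perfect Watson-Crick pairs only (no G-U wobble pairs, no energy model); `find_hairpin`, `stem_len` and `min_loop` are hypothetical names and parameters, and real structure prediction methods are considerably more involved.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that would pair with `rna`, read 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_hairpin(rna, stem_len=4, min_loop=3):
    """Return (i, j): start indices of the 5' and 3' arms of the first
    perfectly paired stem of length `stem_len`, or None if there is none."""
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        # The 3' arm must start at least `min_loop` bases after the 5' arm ends.
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if rna[j:j + stem_len] == target:
                return i, j
    return None


rna = "GGGCAAAAGCCC"
print(find_hairpin(rna))   # (0, 8): GGGC pairs with GCCC, AAAA is the loop
```

For the example strand, bases 0-3 (GGGC) pair with bases 8-11 (GCCC) and the four A's in between form the loop.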

Expectations of student performance

Basic understanding of the general principles of molecular biology
Some mathematical and computer science background
Focus on using computational methods and understanding the general ideas of analysis used in bioinformatics
Formal description of algorithms and complex methodology will not be the core elements of this unit
The core requirement is an understanding of the foundations of bioinformatics and a hands-on approach
