
Defining Proteomics

A branch of discovery science that focuses on proteins.
In 1994 the proteome was defined as the complete set of proteins that is expressed, and modified following expression, by the entire genome in the lifetime of a cell.
For a whole organism, this means looking at the proteome of 3 trillion cells and ~1000 different cell types, each with a different protein profile.
The definition can be more specific, such as the complement of proteins expressed by a cell at any one time.
Today proteomics is a scientific discipline that aims to bridge the gap between our understanding of genome sequences and cellular behavior.

Genomics and Proteomics: a new field with a new vocabulary


-omics: suffix denoting an area of research

DNA: genome (genomics)
RNA: transcriptome (transcriptomics)
Proteins: proteome (proteomics)
Metabolites: metabolome (metabolomics)
Protein-protein, protein-DNA and protein-RNA interactions: interactome (interactomics)

Genomics, transcriptomics, proteomics and metabolomics can each be global or targeted.
Umbrella terms: functional genomics, systems biology.

Knowledge from proteomics studies is limited by our inability to analyze large data sets efficiently.
[Figure labels: gene name, interaction]

Proteomics studies highlight the extreme complexity of interactions on a genomic scale.


Proteomics faces the challenge of analyzing large, highly complex and very noisy data sets. Bioinformatics is integrated into proteomics projects to mine the data and is becoming increasingly important.

Proteomics
Uses protein science to develop high-throughput technologies to study the whole proteome. Combined with microarray technology and bioinformatics, proteomics can explore systems biology and is becoming increasingly powerful.

Proteomics is currently directed towards protein profiling and protein discovery.


Proteomics can help elucidate important biological mechanisms in combination with other approaches such as molecular and cell biology, genetics, etc.

How does proteomics help to identify genes involved in important diseases?


An example in Human Genetics

Genomics databases contain the information needed to identify the candidate genes involved in human diseases
More than one candidate gene

NCBI: National Center for Biotechnology Information

SNPs: single nucleotide polymorphisms

Analysis of genomics, microarray gene expression and proteomics data contained in public databases can identify the gene involved in a particular human disease.

[Diagram labels: computer search; microarray gene expression data; 2D gel; only one candidate gene; disease gene identified with mutations]

Automated DNA Sequencing


In the 1970s, the sequencing efforts of Gilbert and Sanger led to the decoding of DNA stretches a few hundred bases long.
The first sequence of a viral genome, about 5000 base pairs, in 1978 highlighted the unique insights into gene structure, function and genome organization that can be obtained when sequencing generates a vast amount of genetic information.
In 1985 Gilbert and others launched the genomic era by pushing the existing DNA sequencing technology towards intensive automation.
In 1998 full automation was achieved in an integrated machine that could produce DNA sequences in a factory-like manner.
The latest sequencing machines can decode 1.5 million bases in 24 hours, 6000 times the throughput of the prototype.

The Human Genome Project

Started officially in 1990, following discussions about DNA sequencing technologies that began in 1985.
The objective was to obtain the genome sequence within 15 years.
In 2001 two versions of the draft, each consisting of about 3 billion bases, were made available by the biotech company Celera and by the Human Genome Sequencing Consortium.
In the process, tools and methodology were developed to sequence other genomes (100 genomes to date).
The entire RNA and protein output encoded by the genome can be made available in public databases to facilitate hypothesis-driven science and global analysis.
The HGP pushed the development of high-throughput sequencing tools, which are now driving the creation of related methodologies, such as microarrays for gene expression and mass spectrometry for proteomics, to analyze other biological information such as RNA, proteins and molecular interactions.

Digital Nature of Biological Information


The value of the genome sequence is that we can study a biological system with a precise digital core of information.
The challenge is to find out which information is encoded within the digital code.
The genome encodes the protein and RNA machines of life and the regulatory networks that specify how these genes are expressed in time, space and amplitude.
The evolution of the regulatory networks, not of the genes themselves, plays a critical role in making organisms different from one another.

Digital Nature of Biological Information


Digital information operates over three different time spans:
Evolution: tens to millions of years
Development: hours to tens of years
Physiology: milliseconds to weeks
Regulatory networks are composed of two components: transcription factors and their DNA binding sites, which represent the control regions of genes.
Control regions serve as information processors that integrate the concentrations of different transcription factors into signals that mediate gene expression to carry out developmental or physiological functions.

Digital Nature of Biological Information


Biology has evolved several different types of hierarchical information structure.
First, a regulatory hierarchy: a gene network that defines the relationships among a set of transcription factors and the regulatory elements controlling a particular aspect of development.
Second, an evolutionary hierarchy: an ordered set of relationships arising from the duplication of genes, for example the duplication of a gene to generate a gene family.
Third, molecular machines may be assembled into structural hierarchies by an ordered assembly process. The ribosome is assembled from more than 50 different proteins.
Finally, an informational hierarchy describes the flow of information from gene to environment.

Digital Nature of Biological Information


The flow of information from gene to environment follows this scheme:

Gene -> RNA -> Protein -> Protein interactions -> Protein complexes -> Network of protein complexes -> Tissue -> Organs -> Organism -> Ecosystem

Systems approaches to biology


A human starts as a single fertilized cell and develops into an adult made of trillions of cells and thousands of cell types.
During this process two types of digital information are used:
genome information
environmental information, such as metabolite concentrations, secreted or cell-surface signals from other cells, chemical agents, etc.
Information can be predetermined (deterministic) or random (stochastic).
Example: antibody diversity is generated by a stochastic process. Following exposure to an antigen, the expansion in the number of B cells secreting an antibody is directly related to the affinity of that antibody for the antigen. The higher the affinity, the more strongly the cells producing the antibody are selected for survival and proliferation.

What is Proteomics?
Proteomics: a newly emerging field of life science research that uses high-throughput (HT) technologies to display, identify and/or characterize all the proteins in a given cell, tissue or organism (i.e. the proteome).


3 Kinds of Proteomics
Expressional proteomics: electrophoresis, protein chips, DNA chips, SAGE, mass spectrometry, microsequencing
Functional proteomics: HT functional assays, ligand chips, yeast 2-hybrid, deletion analysis, motif analysis
Structural proteomics: high-throughput X-ray crystallography/modelling, high-throughput NMR spectroscopy/modelling


Expressional Proteomics

[Figure labels: 2-D gel; QTOF mass spectrometry]

[Figure panels: prostate tumor; normal]

Why Expressional Proteomics?


Concerned with the display, measurement and analysis of global changes in protein expression
Monitors global changes arising from application of drugs, pathogens or toxins
Monitors changes arising from developmental, environmental or disease perturbations
Applications in medical diagnostics and therapeutic drug monitoring

Functional Proteomics


Functional Proteomics (in silico)


AHGQSDFILDEADGMMKSTVPN
HGFDSAAVLDEADHILQWERTY
GGGNDEYIVDEADSVIASDFGH
*[LIVM][LIVM]DEAD*[LIVM][LIVM]*
(eIF-4A, ATP-dependent helicase)
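
The motif search above can be sketched as a regular expression. This is a minimal illustration, not a full pattern-scanning tool: it assumes that each `*` in the slide's pattern stands for one arbitrary residue, and the names `DEAD_BOX` and `find_motif` are hypothetical.

```python
import re

# The three aligned sequences shown on the slide.
SEQUENCES = [
    "AHGQSDFILDEADGMMKSTVPN",
    "HGFDSAAVLDEADHILQWERTY",
    "GGGNDEYIVDEADSVIASDFGH",
]

# Assumption: '*' means "any single residue". The leading and trailing '*'
# are dropped because re.search already scans along the whole sequence.
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD.[LIVM]{2}")


def find_motif(sequence):
    """Return (start_index, matched_text) for the first hit, or None."""
    match = DEAD_BOX.search(sequence)
    if match is None:
        return None
    return match.start(), match.group()


hits = [find_motif(seq) for seq in SEQUENCES]
for seq, hit in zip(SEQUENCES, hits):
    print(seq, hit)
```

All three sequences match at the same offset, which is what makes the shared motif stand out in the alignment.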

Functional Proteomics (in vitro)


Multi-well plate readers
Full automation/robotics
Fluorescent and/or chemiluminescent detection
Small volumes (µL)
Up to 1536 wells/plate
Up to 200,000 tests/day
Mbytes of data/day
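
As a rough sanity check on these figures, the sketch below works out how many plates per day the quoted throughput implies. It assumes one test per well and round-the-clock operation, neither of which is stated on the slide; the function name `plates_needed` is hypothetical.

```python
import math

WELLS_PER_PLATE = 1536   # upper limit quoted on the slide
TESTS_PER_DAY = 200_000  # upper limit quoted on the slide


def plates_needed(tests, wells_per_plate):
    """Number of full or partial plates needed, assuming one test per well."""
    return math.ceil(tests / wells_per_plate)


plates = plates_needed(TESTS_PER_DAY, WELLS_PER_PLATE)
# Assumption: the reader runs 24 hours a day.
minutes_per_plate = 24 * 60 / plates

print(plates)                       # 131
print(round(minutes_per_plate, 1))  # roughly 11 minutes per plate
```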



Functional Proteomics
In silico methods (bioinformatics)
Genome-wide protein tagging
Genome-wide gene deletion or knockouts
Random tagged mutagenesis or transposon insertion
Yeast two-hybrid methods
Protein (ligand) chips

Why Functional Proteomics?


Concerned with the identification and classification of protein functions, activities, locations and interactions at a global level
To compare organisms at a global level so as to extract phylogenetic information
To understand the network of interactions that take place in a cell at a molecular level
To predict the phenotypic response of a cell or organism to perturbations or mutations


From Genotype to Phenotype


Structural Proteomics
High-throughput protein structure determination via X-ray crystallography, NMR spectroscopy or comparative molecular modeling

Structural Proteomics: The Goal


Structural Proteomics: The Motivation


[Figure: number of known sequences (axis 0 to 2,000,000) and structures (axis 0 to 200,000), 1980 to 2005]

The Protein Fold Universe


How Big Is It???

500? 2000? 10000?

Protein Structure Initiative


Organize all known protein sequences into sequence families
Select family representatives as targets

Solve the 3D structures of these targets by X-ray or NMR


Build models for the remaining proteins via comparative (homology) modeling
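
The target-selection steps above can be sketched as a greedy clustering. This is a toy illustration under stated assumptions, not the procedure the initiative actually used: `difflib.SequenceMatcher` stands in for a real alignment-based identity measure, and the sequences, names and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy sequences, for illustration only.
SEQUENCES = {
    "helicase_a": "AHGQSDFILDEADGMMKSTVPN",
    "helicase_b": "AHGQSDFILDEADGMMKSTVPA",
    "unrelated": "WWWWWCCCCCWWWWWCCCCC",
}


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def group_into_families(sequences, threshold=0.6):
    """Greedy clustering: a sequence joins the first family whose
    representative is similar enough, otherwise it founds a new family."""
    families = []  # list of (representative_name, member_names)
    # Longest sequences are considered first, so they become representatives.
    for name, seq in sorted(sequences.items(), key=lambda item: -len(item[1])):
        for rep_name, members in families:
            if similarity(sequences[rep_name], seq) >= threshold:
                members.append(name)
                break
        else:
            families.append((name, [name]))
    return families


families = group_into_families(SEQUENCES)
# Representatives are the targets for experimental structure determination.
targets = [rep for rep, _ in families]
# Remaining family members are left for comparative (homology) modeling.
to_model = [m for rep, members in families for m in members if m != rep]

print(targets)
print(to_model)
```

Only one structure per family has to be solved experimentally; every other member is modeled from its family representative.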

Protein Structure Initiative


Organize and recruit interested structural biologists and structural biology centres from around the world

Coordinate target selection


Develop new kinds of high-throughput techniques
Solve, solve, solve, solve.

Why Structural Proteomics?


Structure -> Function
Structure -> Mechanism

Structure-based Drug Design


Solving the Protein Folding Problem
Keeps Structural Biologists Employed

Bioinformatics & Proteomics


[Diagram labels: Genomics, Proteomics, Bioinformatics, Medicine, Agriculture]

Bioinformatics & Functional Proteomics


How to classify proteins into functional classes?
How to compare one proteome with another?
How to include functional/activity/pathway information in databases?
How to extract functional motifs from sequence data?
How to predict phenotype from proteotype?


Bioinformatics & Expressional Proteomics


How to correlate changes in protein expression with disease?
How to distinguish important from unimportant changes in expression?
How to compare, archive, retrieve gel data?
How to rapidly, accurately identify proteins from MS and 2D gel data?
How to include expression info in databases?

Bioinformatics & Structural Proteomics


How to predict 3D structure from 1D sequence?
How to determine function from structure?
How to classify proteins on basis of structure?
How to recognize 3D motifs and patterns?
How to use bioinformatics databases to help in 3D structure determination?
How to predict which proteins will express well or produce stable, folded molecules?

