1) Bioinformatics is the application of computer science and information technology to the field of biology and medicine.
2) It involves the use of techniques for organizing and analyzing large amounts of genomic, biochemical, and pharmacological data.
3) Some of the topics covered in bioinformatics include genome sequencing, protein structure prediction, database design, and analyzing gene and protein expression patterns.
Copyright:
Attribution Non-Commercial (BY-NC)
(c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu
Mark Gerstein, Yale University, bioinfo.mbb.yale.edu/mbb452a

What is Bioinformatics?
• (Molecular) Bio-informatics
• One idea for a definition? Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large scale.
• Bioinformatics is “MIS” for Molecular Biology Information. It is a practical discipline with many applications.

Organizing Molecular Biology Information: Redundancy and Multiplicity
• Different Sequences Have the Same Structure
• Organism has many similar genes
• Single Gene May Have Multiple Functions
• Genes are grouped into Pathways
• Genomic Sequence Redundancy due to the Genetic Code
• How do we find the similarities?
Integrative Genomics: genes ↔ structures ↔ functions ↔ pathways ↔ expression levels ↔ regulatory systems ↔ …
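The redundancy slide lists genomic sequence redundancy due to the genetic code. As a small illustration (a sketch, not part of the lecture), the standard genetic code can be tabulated to show that 64 codons map onto 20 amino acids plus stop, so different DNA sequences can encode the same peptide. The amino-acid string is NCBI translation table 1 in T, C, A, G codon order; the helper names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest and the third fastest, each in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_amino_acid(table: dict[str, str]) -> Counter:
    """Count how many codons encode each amino acid ('*' marks stop codons)."""
    return Counter(table.values())


def translate(dna: str) -> str:
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3)
    )


counts = codons_per_amino_acid(CODON_TABLE)
print(counts["L"], counts["M"], counts["*"])  # 6 1 3

# Two different DNA sequences encoding the same peptide
print(translate("TTACTTGGA"), translate("CTGCTCGGC"))  # LLG LLG
```

Leucine, for example, has six synonymous codons while methionine has one, which is why sequence comparison at the DNA level can understate similarity at the protein level.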
A Parts List Approach to Bike Maintenance
How many roles can these play? How flexible and adaptable are they mechanically?
What are the shared parts (bolt, nut, washer, spring, bearing) and the unique parts (cogs, levers)?
What are the common parts, i.e. the types of parts (nuts & washers)?
Where are the parts located?
General Types of “Informatics” techniques in Bioinformatics
• Databases
  ◊ Building, Querying
  ◊ Object DB
• Text String Comparison
  ◊ Text Search
  ◊ 1D Alignment
  ◊ Significance Statistics
  ◊ AltaVista, grep
• Finding Patterns
  ◊ AI / Machine Learning
  ◊ Clustering
  ◊ Datamining
• Geometry
  ◊ Robotics
  ◊ Graphics (Surfaces, Volumes)
  ◊ Comparison and 3D Matching (Vision, recognition)
• Physical Simulation
  ◊ Newtonian Mechanics
  ◊ Electrostatics
  ◊ Numerical Algorithms
  ◊ Simulation

New Paradigm for Scientific Computing
• Because of the increase in data and the improvement in computers, new calculations become possible
• But Bioinformatics has a new style of calculation...
  ◊ Two Paradigms
• Physics
  ◊ Prediction based on physical principles
  ◊ Exact Determination of Rocket Trajectory
  ◊ Supercomputer, CPU
• Biology
  ◊ Classifying information and discovering unexpected relationships
  ◊ globin ~ colicin ~ plastocyanin ~ repressor
  ◊ networks, “federated” database

Bioinformatics Topics -- Genome Sequence
• Finding Genes in Genomic DNA
  ◊ introns
  ◊ exons
  ◊ promoters
• Characterizing Repeats in Genomic DNA
  ◊ Statistics
  ◊ Patterns
• Duplications in the Genome

Bioinformatics Topics -- Protein Sequence
• Sequence Alignment
  ◊ non-exact string matching, gaps
  ◊ How to align two strings optimally via Dynamic Programming
  ◊ Local vs Global Alignment
  ◊ Suboptimal Alignment
  ◊ Hashing to increase speed (BLAST, FASTA)
  ◊ Amino acid substitution scoring matrices
• Multiple Alignment and Consensus Patterns
  ◊ How to align more than one sequence and then fuse the result in a consensus representation
  ◊ Transitive Comparisons
  ◊ HMMs, Profiles
  ◊ Motifs
• Scoring schemes and Matching statistics
  ◊ How to tell if a given alignment or match is statistically significant
  ◊ A P-value (or an e-value)?
  ◊ Score Distributions (extreme val. dist.)
  ◊ Low Complexity Sequences

Bioinformatics Topics -- Sequence / Structure
• Secondary Structure “Prediction”
  ◊ via Propensities
  ◊ Neural Networks, Genetic Alg.
  ◊ Simple Statistics
  ◊ TM-helix finding
  ◊ Assessing Secondary Structure Prediction
• Tertiary Structure Prediction
  ◊ Fold Recognition
  ◊ Threading
  ◊ Ab initio
• Function Prediction
  ◊ Active site identification
• Relation of Sequence Similarity to Structural Similarity

Topics -- Structures
• Basic Protein Geometry and Least-Squares Fitting
  ◊ Distances, Angles, Axes, Rotations
• Calculating a helix axis in 3D via fitting a line
  ◊ LSQ fit of 2 structures
  ◊ Molecular Graphics
• Calculation of Volume and Surface
  ◊ How to represent a plane
  ◊ How to represent a solid
  ◊ How to calculate an area
  ◊ Docking and Drug Design as Surface Matching
  ◊ Packing Measurement
• Structural Alignment
  ◊ Aligning sequences on the basis of 3D structure.
  ◊ DP does not converge, unlike sequences; what to do?
  ◊ Other Approaches: Distance Matrices, Hashing
  ◊ Fold Library

Topics -- Databases
• Relational Database Concepts
  ◊ Keys, Foreign Keys
  ◊ SQL, OODBMS, views, forms, transactions, reports, indexes
  ◊ Joining Tables, Normalization
    • Natural Join as "where" selection on cross product
    • Array Referencing (perl/dbm)
  ◊ Forms and Reports
  ◊ Cross-tabulation
• Protein Units?
  ◊ What are the units of biological information?
    • sequence, structure
    • motifs, modules, domains
  ◊ How classified: folds, motions, pathways, functions?
• Clustering and Trees
  ◊ Basic clustering
    • UPGMA
    • single-linkage
    • multiple linkage
  ◊ Other Methods
    • Parsimony, Maximum likelihood
  ◊ Evolutionary implications
• The Bias Problem
  ◊ sequence weighting
  ◊ sampling

Topics -- Genomics
• Expression Analysis
  ◊ Time Courses clustering
  ◊ Measuring differences
  ◊ Identifying Regulatory Regions
• Large scale cross referencing of information
• Function Classification and Orthologs
• The Genomic vs. Single-molecule Perspective
• Genome Comparisons
  ◊ Ortholog Families, pathways
  ◊ Large-scale censuses
  ◊ Frequent Words Analysis
  ◊ Genome Annotation
  ◊ Trees from Genomes
  ◊ Identification of interacting proteins
• Structural Genomics
  ◊ Folds in Genomes, shared & common folds
  ◊ Bulk Structure Prediction
• Genome Trees

Topics -- Simulation
• Molecular Simulation
  ◊ Geometry -> Energy -> Forces
  ◊ Basic interactions, potential energy functions
  ◊ Electrostatics
  ◊ VDW Forces
  ◊ Bonds as Springs
  ◊ How structure changes over time?
    • How to measure the change in a vector (gradient)
  ◊ Molecular Dynamics & MC
  ◊ Energy Minimization
• Parameter Sets
• Number Density
• Poisson-Boltzmann Equation
• Lattice Models and Simplification
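The Simulation topics trace Geometry -> Energy -> Forces, treat bonds as springs, and list the gradient and energy minimization. Those steps can be illustrated with a deliberately tiny model: a sketch that keeps only a harmonic bond term E = k(r - r0)^2 and minimizes it by steepest descent with a fixed step. The force constant, equilibrium length, step size, and starting coordinates are arbitrary assumptions; real force fields add angle, torsion, electrostatic, and VDW terms with published parameter sets.

```python
import math

# Minimal sketch: harmonic bond terms only ("bonds as springs"); no angles,
# torsions, electrostatics, or VDW terms. k and r0 are in arbitrary units.


def bond_energy_and_gradient(coords, bonds, k=1.0, r0=1.0):
    """Return E = sum over bonds of k * (r - r0)**2, plus dE/dx for every atom."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j in bonds:
        delta = [coords[i][a] - coords[j][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in delta))  # assumes bonded atoms never coincide
        energy += k * (r - r0) ** 2
        # Chain rule: dE/dr = 2k(r - r0) and dr/dx_i = delta / r = -dr/dx_j
        coeff = 2.0 * k * (r - r0) / r
        for a in range(3):
            grad[i][a] += coeff * delta[a]
            grad[j][a] -= coeff * delta[a]
    return energy, grad


def steepest_descent(coords, bonds, step=0.1, n_steps=200, k=1.0, r0=1.0):
    """Energy minimization: move every atom a small step against the gradient,
    i.e. along the force F = -dE/dx, for a fixed number of steps."""
    coords = [list(c) for c in coords]
    for _ in range(n_steps):
        _, grad = bond_energy_and_gradient(coords, bonds, k, r0)
        for atom, g in zip(coords, grad):
            for a in range(3):
                atom[a] -= step * g[a]
    energy, _ = bond_energy_and_gradient(coords, bonds, k, r0)
    return coords, energy


# Three atoms in a chain, with both bonds initially stretched beyond r0 = 1.0
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.2, 0.3, 0.0)]
bonds = [(0, 1), (1, 2)]
final, energy = steepest_descent(start, bonds)
print(f"final energy: {energy:.2e}")
```

Steepest descent is the simplest minimizer; molecular dynamics would instead integrate Newton's equations using the same forces.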
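The Protein Sequence slide lists aligning two strings optimally via dynamic programming, and local vs. global alignment. A compact sketch of both follows, assuming a flat match/mismatch score and a linear gap penalty (the substitution matrices and affine gaps used for real protein alignment are omitted). The same recurrence gives Needleman-Wunsch when the first row and column accumulate gap penalties, and Smith-Waterman when every cell is floored at zero and the traceback starts from the best-scoring cell. The function name and default scores are illustrative choices.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2,
          local: bool = False) -> tuple[int, str, str]:
    """Global (Needleman-Wunsch) or local (Smith-Waterman) alignment by dynamic
    programming, with a flat match/mismatch score and a linear gap penalty."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # S[i][j] = best score aligning a[:i] with b[:j] (ending at i, j if local)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:  # global: leading gaps are penalized
        for i in range(1, n + 1):
            S[i][0] = i * gap
        for j in range(1, m + 1):
            S[0][j] = j * gap

    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                       S[i - 1][j] + gap,                            # gap in b
                       S[i][j - 1] + gap)                            # gap in a
            if local:
                cell = max(cell, 0)  # a local alignment may start anywhere
                if cell > best:
                    best, end = cell, (i, j)
            S[i][j] = cell
    if not local:
        best, end = S[n][m], (n, m)

    # Traceback: re-derive which move produced each cell's score
    i, j = end
    top, bottom = [], []
    while i > 0 or j > 0:
        if local and S[i][j] == 0:
            break
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(align("ACGT", "AGT"))                       # (1, 'ACGT', 'A-GT')
print(align("TTACGTTT", "GGACGTGG", local=True))  # (4, 'ACGT', 'ACGT')
```

Both variants fill an (n+1) x (m+1) table, so time and memory grow as O(nm); that cost is what the hashing heuristics on the same slide try to avoid.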
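The Protein Sequence slide also mentions hashing to increase speed (BLAST, FASTA). The underlying idea can be sketched with a word index: hash every length-k word of the database once, then look up each query word to find exact "seed" matches. Seeds sharing a diagonal (database position minus query position) mark regions worth extending into a full alignment. This is a simplification assuming exact word matches only; BLAST also uses scored neighborhood words and extension steps not shown here, and the sequence and k are arbitrary.

```python
from collections import defaultdict


def build_word_index(database: str, k: int) -> dict[str, list[int]]:
    """Hash every overlapping length-k word of the database to its start positions."""
    index = defaultdict(list)
    for pos in range(len(database) - k + 1):
        index[database[pos:pos + k]].append(pos)
    return dict(index)


def seed_hits(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Look up each query word; every exact match is a (query_pos, db_pos) seed."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


db = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
index = build_word_index(db, k=3)
hits = seed_hits("AYIAK", index, k=3)
print(hits)                      # [(0, 3), (1, 4), (2, 5)]
print({d - q for q, d in hits})  # {3}: all seeds lie on one diagonal
```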
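The Sequence / Structure slide lists secondary structure "prediction" via propensities. A minimal sketch in the spirit of propensity-based methods such as Chou-Fasman, but much simpler: average a per-residue helix propensity over a sliding window and call residues above a threshold helical. The propensity numbers below are placeholders chosen only to make the example readable, not a published scale; a real predictor would use measured propensities and nucleation/extension rules.

```python
def helix_propensity_profile(seq: str, propensity: dict[str, float],
                             window: int = 5) -> list[float]:
    """Average per-residue helix propensity over a window centred on each residue.
    Windows are truncated at the chain ends; residues missing from the table
    count as 1.0 (neutral)."""
    half = window // 2
    profile = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half):i + half + 1]
        profile.append(sum(propensity.get(r, 1.0) for r in segment) / len(segment))
    return profile


def predict_helix(seq: str, propensity: dict[str, float], window: int = 5,
                  threshold: float = 1.0) -> str:
    """Mark residues whose windowed propensity exceeds the threshold as helix ('H')."""
    return "".join("H" if p > threshold else "-"
                   for p in helix_propensity_profile(seq, propensity, window))


# Placeholder values for illustration only (NOT a published propensity scale):
# > 1.0 favours helix, < 1.0 disfavours it.
TOY_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.5}

print(predict_helix("AELAEGPGPG", TOY_PROPENSITY, window=3))  # HHHHH-----
```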
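The Structures slide lists an LSQ fit of 2 structures and calculating a helix axis by fitting a line. One standard way to do both is with a singular value decomposition; the sketch below uses the Kabsch algorithm for superposition and a principal-axis line fit, assuming NumPy, N x 3 coordinate arrays, and (for the fit) known atom-to-atom correspondence. The test coordinates are synthetic, and the helix uses idealized, roughly alpha-helix-like values (radius 2.3, rise 1.5, 100 degrees per point).

```python
import numpy as np


def lsq_fit(P: np.ndarray, Q: np.ndarray) -> tuple[float, np.ndarray]:
    """Kabsch least-squares superposition of two N x 3 coordinate sets whose rows
    correspond (atom i in P matches atom i in Q). Returns (RMSD, rotation R)."""
    P_c = P - P.mean(axis=0)  # centring removes the translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c  # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det +1), not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P_c @ R.T - Q_c
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return rmsd, R


def fit_axis(points: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares line through N x 3 points: the centroid plus the direction of
    largest variance (first right singular vector of the centred coordinates)."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    return centroid, Vt[0]


# Synthetic check: Q is P rotated 36 degrees about z and translated, so the fit is exact
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = np.pi / 5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
rmsd, R = lsq_fit(P, Q)
print(f"RMSD after fit: {rmsd:.6f}")  # RMSD after fit: 0.000000

# Idealized helix trace along z; the fitted axis should point (almost) along z
t = np.arange(18)
helix = np.column_stack([2.3 * np.cos(np.radians(100 * t)),
                         2.3 * np.sin(np.radians(100 * t)),
                         1.5 * t])
center, axis = fit_axis(helix)
print(np.round(np.abs(axis), 2))
```

For short helices the line fit is only approximate, because an incomplete final turn shifts the centroid slightly off the true axis.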
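The Databases slide describes a natural join as a "where" selection on the cross product. Using Python's built-in sqlite3 module and a hypothetical two-table schema (the table and column names are invented for illustration), the equivalence can be checked directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Toy schema (hypothetical): each protein row references a fold row via a foreign key
cur.executescript("""
CREATE TABLE fold (fold_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE protein (prot_id TEXT PRIMARY KEY,
                      fold_id INTEGER REFERENCES fold(fold_id));
INSERT INTO fold VALUES (1, 'globin'), (2, 'TIM barrel');
INSERT INTO protein VALUES ('myoglobin', 1), ('hemoglobin', 1),
                           ('triosephosphate_isomerase', 2);
""")

# The cross product pairs every protein with every fold (3 x 2 = 6 rows) ...
n_pairs = cur.execute("SELECT COUNT(*) FROM protein, fold").fetchone()[0]

# ... and the WHERE clause keeps only the pairs whose shared key agrees
where_on_cross_product = cur.execute("""
    SELECT protein.prot_id, fold.name FROM protein, fold
    WHERE protein.fold_id = fold.fold_id
    ORDER BY protein.prot_id
""").fetchall()

# NATURAL JOIN matches on the identically named column (fold_id) automatically
natural_join = cur.execute(
    "SELECT prot_id, name FROM protein NATURAL JOIN fold ORDER BY prot_id"
).fetchall()

print(n_pairs)                                 # 6
print(where_on_cross_product == natural_join)  # True
conn.close()
```

Database engines do not actually materialize the full cross product; the SELECT-over-cross-product form is the logical definition, which the query planner turns into an efficient join.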
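Under Clustering and Trees, the slide lists UPGMA as a basic clustering method. A short sketch, assuming a symmetric distance matrix and returning only the tree topology as nested parentheses (branch lengths omitted): repeatedly merge the closest pair of clusters and set the new cluster's distance to every other cluster as the size-weighted average of the two merged distances. The labels and distances in the example are toy values.

```python
def upgma(labels: list[str], dist: list[list[float]]) -> str:
    """UPGMA clustering of a symmetric distance matrix into a nested-parentheses
    tree (branch lengths omitted). Repeatedly merges the closest pair of clusters;
    the merged cluster's distance to each other cluster is the size-weighted average."""
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])  # closest pair, a < b
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[next_id] = (f"({tree_a},{tree_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy distances (illustrative only): A and B are closest, C is next, D is the outgroup
labels = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(labels, dist))  # (D,(C,(A,B)))
```

Because UPGMA averages distances, it implicitly assumes a constant rate of change (a molecular clock); single-linkage would instead use the minimum distance between members.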
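The Genomics slide lists Frequent Words Analysis (and the Genome Sequence slide lists characterizing repeats). The simplest version of that analysis, sketched here under the assumption of exact, overlapping k-mers on a single strand, counts every length-k word and reports the most common ones; the input is a short example string.

```python
from collections import Counter


def frequent_words(seq: str, k: int) -> tuple[int, list[str]]:
    """Count every overlapping k-mer; return the top count and all k-mers reaching it."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    top = max(counts.values())
    return top, sorted(word for word, c in counts.items() if c == top)


print(frequent_words("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # (3, ['CATG', 'GCAT'])
```

Judging whether a count is surprising needs the statistics mentioned on the slides, for example comparing against the count expected from base composition alone.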
Major Application I: Designing Drugs
• Understanding How Structures Bind Other Molecules (Function)
• Designing Inhibitors
• Docking, Structure Modeling
(From left to right, figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center.)

Major Application II: Finding Homologs
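Finding homologs usually comes down to the question raised on the Protein Sequence slide: is a given alignment score statistically significant, as a P-value or an e-value? For ungapped local alignments, Karlin-Altschul statistics model the best chance score with an extreme value distribution, giving E = K m n e^(-λS) for a query of length m searched against a database of total length n. A sketch of the resulting conversions follows; λ and K are placeholder numbers, not values for any real scoring matrix, and gapped alignments need empirically fitted parameters.

```python
import math


def evalue(score: float, m: int, n: int, lam: float, K: float) -> float:
    """Karlin-Altschul expected number of chance local alignments with score >= S:
    E = K m n exp(-lambda S)."""
    return K * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, K: float) -> float:
    """Normalized score S' = (lambda S - ln K) / ln 2, so that E = m n 2^(-S')."""
    return (lam * score - math.log(K)) / math.log(2)


def pvalue(e: float) -> float:
    """Probability of at least one chance hit, treating the hit count as Poisson(E).
    expm1 keeps precision when E is tiny."""
    return -math.expm1(-e)


# lambda and K depend on the scoring matrix and residue composition; placeholders here
LAM, K = 0.3, 0.1
m, n = 300, 1_000_000
for s in (40, 60, 80):
    e = evalue(s, m, n, LAM, K)
    print(f"S={s}: bits={bit_score(s, LAM, K):.1f}  E={e:.3g}  P={pvalue(e):.3g}")
```

For small E the P-value and E-value are nearly equal, which is why search tools often report only E.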
Major Application III: Overall Genome Characterization
• Overall Occurrence of a Certain Feature in the Genome
  ◊ e.g. how many kinases in Yeast
• Compare Organisms and Tissues
  ◊ Expression levels in Cancerous vs Normal Tissues
• Databases, Statistics
(Clock figures, yeast v. Synechocystis, adapted from GeneQuiz Web Page, Sander Group, EBI)

Bioinformatics - History (schematic; timeline marked 1980, 1985)
• Single Structures
  ◊ Modeling & Geometry
  ◊ Forces & Simulation
  ◊ Docking