
Genomic Pathway Visualizer : Research : Artificial Intelligence Laboratory...






Genomic Pathway Visualizer

Research Goal
To develop text mining and data mining techniques to support automated extraction and inference of regulatory pathways from biomedical literature and experimental data.

Technological developments in genomic and proteomic research have led to an explosion of data available for biomedical research. The sheer quantity of data generated by high-throughput technologies such as DNA microarrays has exceeded the capacity of traditional data analysis techniques to extract useful information. Meanwhile, the rapid accumulation of research publications makes it difficult to keep abreast of new developments in the area. The research goal of Arizona BioPathway is to develop novel machine learning and Natural Language Processing (NLP) techniques to support efficient and effective data and text analysis in biomedical fields, particularly the analysis of genetic regulatory pathways, which are crucial for biological processes such as gene regulation and cancer development. Arizona BioPathway also aims to create a framework for pathway-related knowledge integration and visualization using a combination of approaches. The ultimate goal of Arizona BioPathway is to provide biomedical researchers with a platform for pathway-related literature abstraction, data analysis, and knowledge integration, and thus to support the development of scientific hypotheses and the discovery of new knowledge.


Funding for this research was received from the following sources:

1 R33 LM07299-01, National Institutes of Health/National Library of Medicine
GeneScene: A toolkit for gene pathway analysis
05/01/2002 - 04/30/2005, $1,320,000

1R01 LM06919-01A1, National Institutes of Health/National Library of Medicine
UMLS Enhanced Dynamic Agents to Manage Medical Knowledge
2/15/2001 - 2/14/2004, $500,000

IIS-9817473, National Science Foundation
DLI Phase 2: High Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management
5/1/99 - 4/31/2002, $500,000

9/15/2012 1:14 AM



Acknowledgements

Arizona Cancer Center researchers, staff, and students, for providing genomic data and helping with user evaluation of our applications.
Department of Plant Sciences, University of Arizona, for providing domain expertise in the evaluation of our applications.
Arizona Health Sciences Library, for their support and assistance.
National Library of Medicine, for providing the Unified Medical Language System (UMLS).

Approach & Methodology

Current focuses of the Arizona BioPathway research include automatic extraction of regulatory pathway relations from biomedical literature using NLP techniques, inference of genetic networks from genomic data using data mining approaches, and the integration of existing knowledge and text/data mining results on regulatory pathways using a variety of biomedical ontologies.

The text mining component of Arizona BioPathway is designed to extract genetic regulatory pathway relations from biomedical literature. We have experimented with two natural language processing (NLP) approaches to extracting pathway relations: shallow parsing and full parsing. The shallow parser uses templates that are based on closed-class words (e.g., prepositions) and model generic relations, capturing relations between noun phrases, while the full parser uses a broad-coverage syntactic-semantic hybrid grammar to identify grammatical verb relations. To increase precision, both approaches use relevant biomedical lexicons such as Gene Ontology (GO), HUGO Gene Nomenclature, and the Specialist Lexicon of UMLS to filter the extracted relations. We are also studying various statistical learning techniques for biomedical entity recognition and relation extraction from biomedical text.

The data mining component is designed to extract gene regulatory relations from genomic and proteomic data, including DNA microarray data, using machine learning techniques such as Bayesian networks. We are experimenting with various techniques to learn regulatory networks from microarray data, either with existing prior knowledge or in combination with other types of biological experimental data, e.g., DNA methylation arrays or protein expression. This so-called joint learning approach promises to learn the network more accurately, avoiding the bias and incompleteness inherent in any single type of data. Linkages extracted from heterogeneous genomic data sources provide different evidence about gene functional relations. In a recent study, we developed a Bayesian framework for integrating relations extracted from multiple sources, such as gene expression, biomedical literature, and genomic sequence information, into a genome-wide functional network. In addition, we conduct studies on cancer classification using gene array data. We are adopting and developing various feature selection techniques to identify marker genes and their interactions for cancer diagnosis and drug discovery.

The knowledge integration component leverages a variety of biomedical ontologies and knowledge sources to form an integrated framework for pathway-related knowledge organization. We have developed a feature decomposition approach to aggregating extracted pathway relations and resolving the redundancy, ambiguity, and inconsistency among them, using existing lexicons and ontologies such as Entrez Gene, RefSeq, HomoloGene, MeSH, UMLS, and GO. Pathway relations extracted from text and learned from data, as well as known relations from existing knowledge sources, will eventually be integrated into a consolidated knowledge base. All these pathway relations can be combined to construct regulatory networks and visualized by automatic graph drawing algorithms implemented in the Arizona BioPathway Visualizer (see the demo).
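To make the idea of closed-class-word templates with lexicon filtering more concrete, here is a minimal sketch. It is not the Arizona BioPathway shallow parser: the single regular-expression template, the tiny GENE_LEXICON set, and the function name extract_relations are all hypothetical stand-ins chosen for illustration. The template is anchored on the prepositions "of" and "by", and a relation is kept only when both noun phrases appear in the lexicon, mirroring the precision filter described above.

```python
import re

# Hypothetical mini-lexicon standing in for the GO / HUGO / UMLS filters.
GENE_LEXICON = {"p53", "mdm2", "ap1", "bax"}

# One illustrative template anchored on the closed-class words "of" and "by":
#   <nominalized verb> of <noun phrase> by <noun phrase>
TEMPLATE = re.compile(
    r"(?P<rel>\w+(?:tion|sion|ment)) of (?P<target>[\w-]+) by (?P<agent>[\w-]+)",
    re.IGNORECASE,
)

def extract_relations(sentence, lexicon=GENE_LEXICON):
    """Return (agent, relation, target) triples whose entities are in the lexicon."""
    relations = []
    for match in TEMPLATE.finditer(sentence):
        agent = match.group("agent")
        target = match.group("target")
        # Lexicon filter: discard relations between unknown entities.
        if agent.lower() in lexicon and target.lower() in lexicon:
            relations.append((agent, match.group("rel").lower(), target))
    return relations

if __name__ == "__main__":
    print(extract_relations("Inhibition of p53 by MDM2 was observed."))
```

A real template set would cover many more prepositions and multi-word noun phrases; the sketch only shows how a closed-class anchor and a lexicon check fit together.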

Text mining (PubMed, 2003)

P53 Text Collection
Content: All abstracts with p53 or related genes in title or abstract
Abstracts: 20,360
Linguistic Parser Relations: 194,384
Co-occurrence Relations: 2,724,099

AP1 Text Collection
Content: All abstracts with ap1 or related genes in title or abstract
Abstracts: 23,339
Linguistic Parser Relations: 258,142
Co-occurrence Relations: 3,265,524

Yeast Text Collection
Content: All abstracts with yeast in title or abstract
Abstracts: 66,197
Linguistic Parser Relations: 584,502
Co-occurrence Relations: 6,535,737

Arabidopsis Text Collection
Content: All abstracts with MeSH terms of Arabidopsis or Arabidopsis Proteins
Abstracts: 10,548
Linguistic Parser Relations: 222
Co-occurrence Relations: 1,291
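The co-occurrence relation counts above come from pairing terms that appear in the same abstract. The sketch below shows one simple way such asymmetric relations could be scored. It is an assumption for illustration, not the Concept Space weighting used by the project: the strength of a relation from term a to term b is taken to be the conditional document frequency P(b | a), and the function name cooccurrence_strengths is hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_strengths(docs):
    """Score directed term pairs by conditional document frequency.

    docs is an iterable of term collections, one per abstract.
    Returns {(a, b): P(b | a)}, so (a, b) and (b, a) may differ.
    """
    doc_freq = Counter()   # number of abstracts containing each term
    pair_freq = Counter()  # number of abstracts containing both terms
    for terms in docs:
        unique_terms = set(terms)
        doc_freq.update(unique_terms)
        for a, b in combinations(sorted(unique_terms), 2):
            pair_freq[(a, b)] += 1

    strengths = {}
    for (a, b), together in pair_freq.items():
        strengths[(a, b)] = together / doc_freq[a]
        strengths[(b, a)] = together / doc_freq[b]
    return strengths

if __name__ == "__main__":
    abstracts = [{"p53", "mdm2"}, {"p53", "bax"}, {"p53", "mdm2"}, {"mdm2"}]
    ranked = sorted(cooccurrence_strengths(abstracts).items(),
                    key=lambda item: item[1], reverse=True)
    for pair, strength in ranked:
        print(pair, round(strength, 3))
```

Because the score is normalized by the frequency of the first term, a rare term can point strongly at a common one while the reverse relation stays weak, which is what makes the relations asymmetric.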



Data Mining

P53 Microarray Data
Content: Gene expression measurements of p53 mutant cell lines (provided by AZCC)
Gene expression measurements: 33
Genes (Homo sapiens ORFs): 5,306
Genes with greatest variations: 200

Yeast Microarray Data
Content: Microarray data of the yeast cell cycle (Spellman et al. 1998)
Gene expression measurements: 77
Time series: 6
Genes (S. cerevisiae ORFs): 6,177
Genes whose expression varied over the different cell-cycle stages: 800

Arabidopsis Microarray Data
Content: Two high-quality microarray series of Arabidopsis at http://www.weigelworld.org
Gene expression measurements: 237 for development and 298 for abiotic stress
Genes (Arabidopsis): 22,810

Arabidopsis Genome Sequence Relations
Content: Gene relations extracted from genome sequence using four different methods in ProLink (http://dip.doe-mbi.ucla.edu/pronav)
Relations:
Phylogenetic profiling (PP): 132,637
Rosetta Stone (RS): 989,795
Gene neighbor (GN): 18,823
Gene cluster (GC): 11,586

MDS Microarray Data
Content: DNA methylation arrays from the Arizona Cancer Center, derived from the epigenomic analysis of bone marrow specimens from healthy donors and individuals with myelodysplastic syndrome (MDS)
Measurements: 55 (10 normal and 45 tumor samples)
Genes: 678

Ovarian Cancer Microarray Data
Content: Microarray-based measurements of DNA methylation from the Gynecologic Oncology tumor bank at the University of Iowa, made available through the Arizona Cancer Center
Measurements: 114 (25 normal and 89 tumor samples)
Genes: 6,560
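Several of the datasets above are reduced to a subset of genes with the greatest variation before analysis (for example, 200 of 5,306 genes in the p53 data). The page does not say how variation was measured, so the following sketch simply assumes population variance across measurements; the function name top_variable_genes and the toy expression values are hypothetical.

```python
from statistics import pvariance

def top_variable_genes(expression, k):
    """Return the k gene names whose measurements have the largest variance.

    expression maps a gene name to its list of expression measurements.
    Variance is an assumed stand-in for the unspecified variation criterion.
    """
    ranked = sorted(expression,
                    key=lambda gene: pvariance(expression[gene]),
                    reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    toy = {
        "geneA": [1.0, 1.0, 1.0],   # flat profile
        "geneB": [0.0, 5.0, 10.0],  # strongly varying profile
        "geneC": [2.0, 3.0, 4.0],   # mildly varying profile
    }
    print(top_variable_genes(toy, 2))
```

Filtering of this kind keeps network learning tractable, since structure learning cost grows quickly with the number of genes.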

A shallow parser based on closed-class English words, extracting noun phrase relations
A full parser using a syntax-semantic hybrid grammar, extracting verb relations
Co-occurrence analysis based on Concept Space, which generates asymmetric relations between phrases ordered according to the strength of their relation
Conditional Random Field (CRF) methods for entity recognition
Kernel-based learning methods for relation extraction and classification
Feature decomposition for entity and relation aggregation
Bayesian Network frameworks for integrating gene functional relations from multiple data sources
Optimal search-based feature subset selection methods for identifying marker genes for cancer classification
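As a rough illustration of how evidence about a gene functional relation from multiple data sources might be combined in a Bayesian way, the sketch below uses a naive Bayes update: prior odds are multiplied by one likelihood ratio per source. This is a deliberate simplification that assumes the sources are conditionally independent; it is not the project's published framework, and the function name, the prior, and the likelihood ratios are all hypothetical.

```python
import math

def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence sources with a naive Bayes update.

    prior is the prior probability (strictly between 0 and 1) that two genes
    are functionally related. Each likelihood ratio is
    P(evidence | related) / P(evidence | unrelated) for one data source.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

if __name__ == "__main__":
    # Hypothetical ratios for expression, literature, and sequence evidence.
    print(posterior_probability(0.01, [8.0, 5.0, 3.0]))
```

Working in log-odds keeps the arithmetic stable when many sources are multiplied together, and a ratio below 1 naturally counts as evidence against the relation.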

Team Members

Dr. Hsinchun Chen
Dr. Zhu Zhang
Dr. Jesse Martinez
Cathy Larson
Jiexun Li
Hua Su
Chun-Ju Tseng
Siddharth Kaza
Xin Li
Nichalin Suakkaphong
Yulei Zhang (Gavin)
Shailesh Joshi

hchen@eller.arizona.edu


Publications

Text Mining Publications and Presentations

1. K. D. Quiñones, H. Su, B. Marshall, S. Eggers, and H. Chen. "User-centered evaluation of Arizona BioPathway: an information extraction, integration, and visualization system." IEEE Transactions on Information Technology in Biomedicine, 11(5):527-536, 2007.
2. B. Marshall, H. Su, D. McDonald, S. Eggers, and H. Chen. "Aggregating Automatically Extracted Regulatory Pathway Relations." IEEE Transactions on Information Technology in Biomedicine, 10:100-108, 2006.
3. B. Marshall, H. Su, D. McDonald, and H. Chen. "Linking ontological resources using aggregatable substance identifiers to organize extracted relations." In Proceedings of Pacific Symposium on Biocomputing, pp. 162-173, 2005.
4. G. Leroy and H. Chen. "GeneScene: An Ontology-Enhanced Integration of Linguistic and Co-Occurrence Based Relations in Biomedical Texts." Journal of the American Society for Information Science and Technology (JASIST), 56:457-468, 2005.
5. D. McDonald, H. Chen, H. Su, and B. Marshall. "Extracting Gene Pathway Relations Using a Hybrid Grammar: The Arizona Relation Parser." Bioinformatics, 20:3370-3378, 2004.
6. D. M. McDonald, H. Chen, G. Leroy, and H. Su. "Combining Ontologies and Grammatical Relations to Yield Diverse Semantic Relations from Biomedical Texts." Poster presentation at Pacific Symposium on Biocomputing, January 2004.
7. G. Leroy, H. Chen, and J. D. Martinez. "A Shallow Parser Based on Closed-class Words to Capture Relations in Biomedical Text." Journal of Biomedical Informatics (JBI), 36:145-158, 2003.
8. G. Leroy, H. Chen, J. Martinez, S. Eggers, R. Falsey, K. Kislin, Z. Huang, J. Li, J. Xu, D. McDonald, and G. Ng. "GeneScene: Biomedical Text and Data Mining." Presented at the Third ACM and IEEE Joint Conference on Digital Libraries (JCDL), May 27-31, 2003, Houston, Texas.
9. G. Leroy and H. Chen. "Filling preposition-based templates to capture information for medical abstracts." In Proceedings of Pacific Symposium on Biocomputing, pp. 350-361, 2002.

Data Mining Publications and Presentations

1. J. Li, H. Su, H. Chen, and B. W. Futscher. "Optimal search-based gene subset selection from gene array data for cancer classification." IEEE Transactions on Information Technology in Biomedicine, accepted, 2006.
2. Z. Huang, J. Li, H. Su, G. S. Watts, and H. Chen. "Large-scale regulatory network analysis from microarray data: modified Bayesian Network learning and association rule mining." Decision Support Systems: Special Issue on Decision Support in Medicine, forthcoming, 2006.
3. J. Li, X. Li, H. Su, H. Chen, and D. W. Galbraith. "A framework of integrating gene relations from heterogeneous data sources: an experiment on Arabidopsis thaliana." Bioinformatics, 22:2037-2043, 2006.
4. Z. Huang, H. Su, and H. Chen. "Joint learning using multiple types of data and knowledge." In H. Chen, S. Fuller, C. Friedman, and W. Hersh (Eds.), Medical Informatics: Knowledge Management and Data Mining in Biomedicine, Springer, pp. 593-624, 2005.
5. Z. Huang, H. Chen, H. Su, B. Marshall, B. L. Smith, G. W. Watts, and J. D. Martinez. "Learning Genetic Pathways Using Bayesian Networks and Qualitative Probabilistic Networks." Poster presentation at Pacific Symposium on Biocomputing, January 2005.
6. Z. Huang, H. Chen, H. Su, B. Marshall, B. L. Smith, G. W. Watts, and J. D. Martinez. "Learning Genetic Pathways Using Bayesian Networks and Qualitative Probabilistic Networks." Poster presentation at Pacific Symposium on Biocomputing, January 2004.


For additional information, please contact us.


Eller College of Management : McClelland Hall 430 : 1130 E. Helen St. : P.O. Box 210108 : Tucson, Arizona 85721-0108 : 520.621.6219 Copyright 2012 The University of Arizona. All rights reserved.
