The volume of data generated by microarray experiments is driving a revolution in the mining of unstructured scientific natural language text. Researchers perform experiments and document the results in publications. Other researchers search those publications for nuggets of insight relevant to their own areas of research. Advances in one area often come from orthogonal research areas, yet researchers normally stay within their sphere of expertise and rarely venture outside it to examine other areas of science. It is this gap that is fueling an explosive interest in a field known as literature-based discovery (LBD).

The goal of this research is to examine the utility of a process that automatically suggests new gene-function relationships, either as targeted areas of investigation or as additional evidence in Gene Ontology (GO) projects. The primary motivation is to identify unknown or latent gene-function relationships within the Gene Ontology and to use LBD methods to find supporting evidence for these candidate gene-function pairs. This talk will describe a methodology for extracting latent relationships between genes and proteins represented in the GO database. Entrez PubMed abstracts associated with the latent genes are then used to drive an LBD process that mines evidence supporting each prospective gene-function relationship.