-----------------------------------------------------------------------
                       BIOINFORMATICS COLLOQUIUM
                   School of Computational Sciences
                        George Mason University
-----------------------------------------------------------------------

                    From Text Mining to Gene Mining

                              Jeff Solka
                        George Mason University

                  Tuesday, February 15, 2005, 4:30 pm
              Verizon Auditorium, Prince William Campus

This talk will discuss recent work on applying the bipartite
bipartition methodology of (Dhillon 2001) to text and gene expression
data. The methodology has the useful property that it simultaneously
clusters documents (samples) and words (genes).

The talk will first describe an extension of Dhillon's proposed
methodology that performs a tree-based clustering of documents and
words. This work will be illustrated using two small corpora: a
collection of Science News articles and a collection of Office of
Naval Research project abstracts.

Dhillon's original methodology was also modified to process gene
expression data. The modified methodology was applied to the Golub
ALL/AML gene expression dataset, and the ability of this technique to
reveal disease-relevant genes will be discussed.

The text mining work is joint with Avory Bryant of NSWCDD and Ed
Wegman of GMU. The gene expression work is joint with Brandon Higgs of
Mitre, who is also a Ph.D. student in the bioinformatics program here
at GMU.

-----------------------------------------------------------------------
Refreshments are served at 4:00 pm.

Find the schedule and directions at
http://www.binf.gmu.edu/colloq.html