BINF 733/CSI 739
The Study of Gene Expression Using Microarrays: Methods and Data Analysis
Spring 2003


Instructors:
Jennifer Weller (PWI room 441, 993-8329, jweller@gmu.edu)
Jeff Solka (STI room 115, 993-1991 (GMU), 540-653-1982 (D), 540-371-3961 (H), solkajl@nswc.navy.mil, jlsolka@yahoo.com)

Course Time: Thursdays 4:30 - 7:10 pm

Location: Prince William II, Room 185, Prince William Campus

Textbooks: "A Biologist's Guide to Analysis of DNA Microarray Data" by Steen Knudsen
"Microarray Analysis" by Mark Schena
Another useful text: "DNA Microarrays and Gene Expression" by Pierre Baldi and Wesley Hatfield.

Course Description: The student will learn concepts underlying the manufacture of microarrays, the technologies employed in performing the experiments, and the fundamentals of assessing the quality of the resulting primary data, preprocessing and normalization of the data, methods for performing meta-analyses on these data sets, and techniques for visualizing these data.

Grading Policy:
Homework: 30%
Presentations: 30%. Each student will make 2 presentations.
Final project: 40%. A written report and a presentation are required.

There will be three homework assignments. Each will be due two weeks after it is assigned. Late assignments will not be accepted (late means after midnight on the due date). You may write a small program or use an existing tool to analyze gene expression datasets.

Presentations are to cover one or more papers from the published literature, chosen either from a pool suggested by the instructors or from other relevant papers that the instructors have approved. The paper must be made available to the class one week before the scheduled presentation.

Projects are to involve independent analysis or visualization of a public dataset, with a discussion of the methods chosen, validation tests verifying the suitability of the data for the technique you are displaying, a summary of the results and a discussion of the implications and possible interpretations.

Lecture Schedule

Date Instructor Topic

1/23/03 Jennifer Weller Microarray Technology: platforms and manufacture

1/30/03 Jennifer Weller Experimental Design (Biophysical and sequence similarity constraints)

2/6/03 Jennifer Weller Gene Expression Databases

2/13/03 Preprocessing Gene Expression Data

2/20/03 Jeff Solka Dimensionality Reduction

Dimensionality Reduction Homework

Golub data

Golub all/aml class file

Golub B/T class file

2/27/03 Jennifer Weller Biological Assessment of Cluster Validity

3/6/03 Jeff Solka Pattern Recognition

Pattern Recognition Homework

3/13/03 None Spring Break

3/20/03 Jeff Solka Visualization of High-Dimensional Data

3/27/03 Jeff Solka Clustering

Clustering/Visualization Homework

4/5/03 HW Tutoring Session

Homework # 1 Tutorial

Homework # 2 Tutorial

Homework # 3 Tutorial

4/10/03 TBD Student presentations of papers
1 - Diggans
S. Bicciato, A. Luchini, and C. Di Bello
PCA disjoint models for multiclass cancer
analysis using gene expression data, Bioinformatics 2003 19: 571-578.

2 - Higgs
A multivariate approach applied to microarray data
for identification of genes with cell cycle-coupled
transcription, Daniel Johansson, Petter Lindgren, and Anders Berglund


3 - Johnson
Tumor classification by partial least squares using microarray
gene expression data. Danh V. Nguyen & David M. Rocke (University of California),
Bioinformatics Vol. 18 no. 1 2002 (Pages 39-50)

4/17/03 TBD Student presentations of papers
1 - Porter
Identification of toxicologically predictive gene sets using cDNA
microarrays RUSSELL S. THOMAS, DAVID R. RANK, SHARRON G. PENN,
GINA M. ZASTROW, KEVIN R. HAYES, KALYAN PANDE, EDWARD GLOVER,
TOMI SILANDER, MARK W. CRAVEN, JANARDAN K. REDDY,
STEVAN B. JOVANOVICH, and CHRISTOPHER A. BRADFIELD

2 - Raman
Computational Analysis of Leukemia Microarray Expression Data Using
the GA/KNN Method, Leping Li, Lee G. Pedersen, Thomas A. Darden,
and Clarice R. Weinberg


3 - Tibriwa
Class Cover Catch Digraphs for Latent Class Discovery in Gene Expression
Monitoring by DNA Microarrays, Priebe, Solka, Marchette, and Clarke

4 - Chiluku
A Hierarchical Latent Variable Model for Data Visualization,
Bishop and Tipping

4/24/03 TBD Student presentations of projects
1 - Diggans
Generalized Association Plots in MATLAB/R and Their Application
to Gene Expression Analysis
 
Generalized Association Plots (GAPs) have previously been used to analyze
numerous data types including categorical and other general association data.
Initial implementations of this methodology. This approach has recently been
subjected to theoretical convergence analysis and has been shown to be useful in
the identification of hidden structures within moderately high-dimensional data
sets.
 
The investigator will implement GAPs in Matlab or R and apply them to the
analysis of gene expression data. The developed software will be tested on toy
problems and the Golub gene dataset.


Project Presentation


Project Report


2 - Higgs
This paper will focus on the application and extensions of
"Recursive partitioning for tumor classification with gene expression microarray data"


Project Presentation


Project Report


3 - Johnson
Title:
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling

Authors:
Alizadeh, Eisen, Davis, et al.

Journal:
Nature Vol. 403 February 3, 2000

Link:

http://rana.lbl.gov/papers/Alizadeh_Nature_2000.pdf

The authors of this paper report their findings on
the basis of one type of analysis; I will use other techniques
to investigate and interrogate their claims (PCA, MDS, k-means,
divisive clustering (DIANA), and LDA-based predictive modeling).
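As a rough illustration of one of the techniques listed above, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. The toy points, the choice of initial centers, and the function names are assumptions made for illustration only; they are not taken from the Alizadeh data or from the student's project.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, n_iter=20):
    """Lloyd's algorithm: alternate assignment and center update.

    points  -- list of samples, each a list of floats
    centers -- list of initial cluster centers (same dimension as points)
    Returns (assignments, centers).
    """
    assignments = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return assignments, centers


# Hypothetical toy data: two well-separated groups in two dimensions.
toy_points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
toy_assignments, toy_centers = kmeans(toy_points, [toy_points[0], toy_points[3]])
```

On real expression data the samples would be rows of the expression matrix, and the result depends on the initial centers, so several random restarts are usually compared.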


Project Presentation


Project Report


5/01/03 TBD Student presentations of projects
1 - Porter
This project will focus on the identification of a pharmacological response in the liver
induced by an anti-diabetic drug.  This is an exercise
in taking microarray data where animals have been treated with an agent,
and characterizing the biological effect (based on what was seen previously
in the public literature and how it relates to the data set).  Therefore,
I will show some examples of specific genes in metabolic pathways that
are regulated, etc.  In addition, a variety of statistical tests and
visualizations will be shown to verify the effect, show its "intensity,"
and relate the effect to a reference database using discriminant techniques.


Project Presentation


Project Report



2 - Raman
(w,p) Adaptation Through Genetic Algorithms and k Nearest Neighbors

Previous work by Li et al. has demonstrated feature selection based on the use
of genetic algorithms and k nearest neighbors. The investigator of this project
will attempt to extend this work to perform not only feature selection
but also metric or distance selection. The code, developed in R or Matlab, will use the
Li approach both to select the best features based on discriminatory power
and to provide the best choice of the Minkowski p parameter. The developed
software will be tested on toy problems and the Golub gene expression data set.
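The idea of choosing features and the Minkowski p parameter together can be sketched as follows. This is only an illustration under stated assumptions: it replaces the genetic algorithm with an exhaustive search over a tiny candidate list, scores each candidate by leave-one-out k-NN accuracy, and uses made-up toy data and function names rather than the Golub data or the Li et al. code.

```python
from itertools import combinations


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_loo_accuracy(samples, labels, features, p, k=3):
    """Leave-one-out accuracy of k-NN using only the given feature indices."""
    correct = 0
    for i, xi in enumerate(samples):
        query = [xi[f] for f in features]
        neighbours = []
        for j, xj in enumerate(samples):
            if j != i:  # hold out sample i
                d = minkowski(query, [xj[f] for f in features], p)
                neighbours.append((d, labels[j]))
        neighbours.sort(key=lambda t: t[0])
        votes = [lab for _, lab in neighbours[:k]]
        # Majority vote; sorted() makes tie-breaking deterministic.
        predicted = max(sorted(set(votes)), key=votes.count)
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)


def search_features_and_p(samples, labels, p_values=(1, 2, 3), k=3):
    """Exhaustive stand-in for the GA: try every feature subset and every p."""
    n_features = len(samples[0])
    best = (None, None, -1.0)
    for size in range(1, n_features + 1):
        for features in combinations(range(n_features), size):
            for p in p_values:
                acc = knn_loo_accuracy(samples, labels, features, p, k)
                if acc > best[2]:  # keep the first candidate with the top score
                    best = (features, p, acc)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_samples = [[0.0, 3.0], [0.2, -2.0], [0.1, 4.0], [0.3, -3.0],
               [5.0, 3.5], [5.2, -2.5], [5.1, 4.2], [5.3, -3.2]]
toy_labels = ["ALL"] * 4 + ["AML"] * 4
best_features, best_p, best_acc = search_features_and_p(toy_samples, toy_labels)
```

Exhaustive search is only feasible for a handful of features; with thousands of genes a stochastic search such as a genetic algorithm is the practical option, which is the motivation for the GA/KNN approach.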


Project Report





3 - Chiluku
Visualization Analysis of Gene Expression Data Via Hierarchical
Latent Variable Models

Hierarchical latent variable models are a technique recently developed by
Bishop and Tipping that allows for the visualization of cluster structure in data
of moderately high dimension (p < 40). The code is currently available from
www.ncrg.aston.ac.uk/PhiVis. The investigator in this effort will apply
this software to the Golub gene expression data set. It is expected that the
dimensionality of the data will first need to be reduced prior to applying the
technique. One can use a subset of the genes as features, for example the
ones identified in the GA/KNN work of Li et al., or some other set of
features, for example one obtained via ISOMAP or LLE. The developed code will be applied to
the Golub gene expression dataset and toy problems.
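A minimal sketch of the reduction step described above, assuming a simple variance filter as the gene-selection rule. The filter is a stand-in chosen for brevity, not the GA/KNN, ISOMAP, or LLE selection mentioned in the abstract, and the toy matrix and function names are hypothetical.

```python
from statistics import pvariance


def top_variance_genes(expr, n_keep):
    """Return the indices of the n_keep genes with the largest variance.

    expr -- list of samples, each a list of expression values (one per gene)
    """
    n_genes = len(expr[0])
    variances = [pvariance([sample[g] for sample in expr]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_keep])  # report kept genes in their original order


def select_genes(expr, gene_indices):
    """Keep only the chosen gene columns in every sample."""
    return [[sample[g] for g in gene_indices] for sample in expr]


# Hypothetical toy matrix: 4 samples by 4 genes.
toy_expr = [[1.0, 10.0, 5.0, 0.0],
            [1.0, 20.0, 5.0, 1.0],
            [1.0, 30.0, 6.0, 0.0],
            [1.0, 40.0, 5.0, 1.0]]
kept = top_variance_genes(toy_expr, n_keep=2)
reduced = select_genes(toy_expr, kept)
```

For the Golub data, n_keep would be set below 40 so that the reduced matrix falls within the range quoted above for the visualization technique.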



Project Presentation


Project Report


Papers

Minimizing the secondary structure of DNA targets by incorporation of a modified deoxynucleoside: implications for nucleic acid analysis by hybridisation

Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes

 

A link to Jeff's course addendum page is given below.

BINF733/CSI739 2003-Jeff's Course Addendum Page

A link to Jennifer's course addendum page is given below.

BINF733/CSI739 2003-Jennifer's Course Addendum Page

A link to last year's course is given below.

CSI739 2002-Topics in Bioinformatics: Multidimensional Genome Analysis