BINF 733/CSI 739
The Study of Gene Expression Using Microarrays: Methods and Data
Analysis
Spring 2003
Instructors:
Jennifer Weller (PWI room 441, 993-8329, jweller@gmu.edu)
Jeff Solka (STI room 115, 993-1991(GMU), 540-653-1982(D),
540-371-3961(H),solkajl@nswc.navy.mil,jlsolka@yahoo.com)
Location: Prince William II, Room 185, Prince William Campus
Textbooks: "A Biologist's Guide to Analysis of DNA Microarray Data" by Steen Knudsen
"Microarray Analysis" by Mark Schena
Another useful text: "DNA Microarrays and Gene Expression" by Pierre Baldi and Wesley Hatfield.
Course Description: The student will learn concepts underlying the manufacture of microarrays, the technologies employed in performing the experiments, and the fundamentals of assessing the quality of the resulting primary data, preprocessing and normalization of the data, methods for performing meta-analyses on these data sets, and techniques for visualizing these data.
Grading Policy:
Homework: 30%.
Presentations: 30%. Each student will make 2 presentations.
Final project: 40%. A written report and a presentation are required.
There will be three homework assignments. Each is due two weeks after it is assigned. Late assignments will not be accepted (an assignment is late if it is submitted after midnight on the due date). You may write a small program or use an existing tool to analyze gene expression datasets.
Presentations are to cover a paper or papers from the published literature, chosen either from a suggested pool or from other relevant papers that the instructors have approved. The paper must be made available to the class one week before the scheduled presentation.
Projects are to involve independent analysis or visualization of a public dataset, with a discussion of the methods chosen, validation tests verifying the suitability of the data for the technique you are displaying, a summary of the results and a discussion of the implications and possible interpretations.
Lecture Schedule
Lecture Date, Instructor, Topic
1/23/03 Jennifer Weller Microarray Technology: platforms and manufacture
1/30/03 Jennifer Weller Experimental Design (Biophysical and sequence similarity constraints)
2/6/03 Jennifer Weller Gene Expression Databases
2/13/03 Preprocessing Gene Expression Data
2/20/03 Jeff Solka Dimensionality Reduction
Dimensionality Reduction Homework
2/27/03 Jennifer Weller Biological Assessment of Cluster Validity
3/6/03 Jeff Solka Pattern Recognition
3/13/03 None Spring Break
3/20/03 Jeff Solka Visualization of High-Dimensional Data
Clustering/Visualization Homework
4/5/03 HW Tutoring Session
4/10/03 TBD Student presentations of papers
1 - Diggans
S. Bicciato, A. Luchini, and C. Di Bello. PCA disjoint models for multiclass cancer analysis using gene expression data. Bioinformatics 2003 19: 571-578.
2 - Higgs
A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription. Daniel Johansson, Petter Lindgren, and Anders Berglund.
3 - Johnson
Tumor classification by partial least squares using microarray gene expression data. Danh V. Nguyen & David M. Rocke (University of California), Bioinformatics Vol. 18 no. 1 2002 (Pages 39-50).
4/17/03 TBD Student presentations of papers
1 - Porter
Identification of toxicologically predictive gene sets using cDNA microarrays. Russell S. Thomas, David R. Rank, Sharron G. Penn, Gina M. Zastrow, Kevin R. Hayes, Kalyan Pande, Edward Glover, Tomi Silander, Mark W. Craven, Janardan K. Reddy, Stevan B. Jovanovich, and Christopher A. Bradfield.
2 - Raman
Computational Analysis of Leukemia Microarray Expression Data Using the GA/KNN Method. Leping Li, Lee G. Pedersen, Thomas A. Darden, and Clarice R. Weinberg.
3 - Tibriwa
Class Cover Catch Digraphs for Latent Class Discovery in Gene Expression Monitoring by DNA Microarrays. Priebe, Solka, Marchette, and Clarke.
4 - Chiluku
A Hierarchical Latent Variable Model for Data Visualization. Bishop and Tipping.
4/24/03 TBD Student presentations of projects
1 - Diggans
Generalized Association Plots in MATLAB/R and Their Application to Gene Expression Analysis
Generalized Association Plots (GAPs) have previously been used to analyze numerous data types including categorical and other general association data. Initial implementations of this methodology. This approach has recently been subjected to theoretical convergence analysis and has been shown to be useful in the identification of hidden structures within moderately high-dimensional data sets. The investigator will implement GAPs in MATLAB or R and apply them to the analysis of gene expression data. The developed software will be tested on toy problems and the Golub gene dataset.
5/01/03 TBD Student presentations of projects
1 - Porter
The project I will be focusing on is the identification of a pharmacological response in the liver induced by an anti-diabetic drug. This is an exercise in taking microarray data where animals have been treated with an agent, and characterizing the biological effect (based on what was seen previously in the public literature and how it relates to the data set). Therefore, I will show some examples of specific genes in metabolic pathways that are regulated, etc. In addition, a variety of statistical tests and visualizations will be shown to verify the effect, show its "intensity," and relate the effect to a reference database using discriminant techniques.
3 - Chiluku
Visualization Analysis of Gene Expression Data Via Hierarchical Latent Variable Models
Hierarchical latent variable models are a technique recently developed by Bishop and Tipping that allows for the visualization of cluster structure in data of moderately high dimension (p < 40). The code is currently available from www.ncrg.aston.ac.uk/PhiVis. The investigator in this effort will apply this software to the Golub gene expression data set. It is expected that the dimensionality of the data will first need to be reduced before applying the technique. One can use a subset of the genes as features, for example the ones identified in the GA/KNN work of Li et al., or some other subset of features, for example one obtained via ISOMAP or LLE. The developed code will be applied to the Golub gene expression dataset and toy problems.
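The feature-reduction step described in this project can be illustrated with a minimal sketch. It assumes a simple variance ranking as the selection rule, which is not one of the methods named above (GA/KNN, ISOMAP, LLE) and is used here only because it is short. The function names and the synthetic data are hypothetical stand-ins, not the Golub dataset or the PhiVis software.

```python
import random
import statistics


def top_variance_genes(expression, k):
    """Return sorted indices of the k genes (columns) with the largest sample variance.

    `expression` is a list of samples, each a list of per-gene values.
    Variance ranking is an illustrative assumption, not the course's method.
    """
    n_genes = len(expression[0])
    variances = [
        statistics.variance([sample[g] for sample in expression])
        for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:k])


def select_columns(expression, gene_indices):
    """Keep only the chosen gene columns in every sample."""
    return [[sample[g] for g in gene_indices] for sample in expression]


# Synthetic stand-in data: 10 samples x 200 genes of low-level noise.
random.seed(0)
n_samples, n_genes = 10, 200
data = [[random.gauss(0.0, 0.1) for _ in range(n_genes)] for _ in range(n_samples)]

# Make genes 0-4 strongly variable across samples so they should be selected.
for i, sample in enumerate(data):
    for g in range(5):
        sample[g] += 5.0 if i % 2 == 0 else -5.0

# Reduce to 30 features, below the p < 40 range mentioned above.
selected = top_variance_genes(data, k=30)
reduced = select_columns(data, selected)
print(len(reduced), len(reduced[0]))  # 10 30
```

The reduced matrix keeps every sample but only 30 gene columns, so it falls within the moderate dimensionality the visualization technique is described as handling. Any other feature-selection rule could be substituted for `top_variance_genes` without changing the rest of the sketch.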
Papers
Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes
A link to Jeff's course addendum page is given below.
BINF733/CSI739 2003-Jeff's Course Addendum Page
A link to Jennifer's course addendum page is given below.
BINF733/CSI739 2003-Jennifer's Course Addendum Page
A link to last year's course is given below.
CSI739 2002-Topics in Bioinformatics: Multidimensional Genome Analysis