BINF 733 Spring 2006 Microarray Analysis

Instructors - Dr. Jeff Solka (primary) and Dr. Jennifer Weller
Meeting Place - Occoquan 328
Meeting Time - 4:30 pm - 7:10 pm Thursdays


Dr. Solka Information:
Office: Occoquan 312D
email: jsolka@gmail.com
Office Hours: By appointment (walk-ins are discouraged).

Dr. Weller Information:
Office: Occoquan 328E
Telephone: (703) 993-8329
email: jweller@gmu.edu
Office Hours: 2-4pm Tuesdays or by appointment (walk-ins are discouraged).

Course Description - Students will learn the concepts needed for sound experimental design of gene expression experiments in which measurements are made on microarray platforms. Students will be introduced to relevant data analysis techniques, including preprocessing (data cleansing and handling of missing values); appropriate data standardization and normalization methods within and between arrays; and higher-level analyses such as dimensionality reduction, discriminant analysis, a variety of clustering methods, annotation tools, and visualization methods. These methods will be illustrated using the R computing environment and, in particular, the BioConductor packages developed for microarray analysis.


Course Goals - Students will be instructed in the theoretical basis of a number of methods commonly used for handling and analyzing large datasets of molecular biological origin. These concepts will be illustrated with practical examples in class and reinforced with homework problems. Homework problems are selected to help students gain facility with the R/BioConductor analysis packages and to develop the analytical skill needed to select an appropriate method from among the large number available. In some cases stand-alone tools will be discussed, since there is a large research literature on this topic. In addition to chapters from the course texts, relevant research articles will be assigned, and students will be encouraged to critically dissect these papers for both methodology and results.


Grading Policy - Students will be evaluated for mastery of course material through five quizzes (25%), five homework assignments (25%), an in-class presentation of a critical evaluation of a journal article (20%), and an in-class presentation (20%) with an accompanying formal written report (10%) of an analysis project (either replicating a published method in full or modifying such a method and comparing the student's results to those of the original report).


Course Textbooks -

  1. "Bioinformatics and Computational Biology Solutions using R and BioConductor" by Robert Gentleman, V. Carey, W. Huber, Raphael Irizarry, and Sandrine Dudoit. (2005). The Springer Series in Statistics for Biology and Health.
  2. "Microarray Bioinformatics" by Dov Stekel (2003). Cambridge University Press.


Tentative Course Schedule - Check back often; this will be updated frequently (TBP = To be Posted)

Date | Instructors | Topics | Text Assignments | Other Reading/Homework
Jan. 26, 2006 (Class # 1) | Weller/Solka | Manufacturing microarrays & probe design; Intro to R | Stekel Chpts. 1, 3 |
Feb. 2, 2006 (Class # 2) | Weller/Solka | Quiz 1; Data standards and public databases; Introduction to BioConductor | Stekel Chpt. 3; GHCID Chpt. 1 | HW 1 assigned
Feb. 9, 2006 (Class # 3) | Solka | Data cleansing and preprocessing | Stekel Chpts. 4, 5; GHCID Chpts. 2, 3, 4 |
Feb. 16, 2006 (Class # 4) | Weller | Quiz 2; Data cleansing and preprocessing | GHCID Chpts. 5, 6 | HW 2 assigned
Feb. 23, 2006 (Class # 5) | Solka | Meta-data: biological annotation and visualization | Stekel Chpt. 2; GHCID Chpts. 7, 8, and 9 |
March 2, 2006 (Class # 6) | Solka | Quiz 3; Statistical Analysis Overview and Distance Measures | Stekel Chpt. 6; GHCID Chpts. 11, 12 |
March 9, 2006 (Class # 7) | Solka | Methods for defining and determining significant levels of differential expression | Stekel Chpt. 7; GHCID Chpts. 14, 15 |
March 16, 2006 (Week 8) | NA | Spring Break Week | NA |
March 23, 2006 (Class # 8) | Solka | Quiz 4; Clustering | Stekel Chpt. 8; GHCID Chpt. 13 | HW 1, 2 due; HW 3 assigned
March 30, 2006 (Class # 9) | Solka | Discriminant analysis | Stekel Chpt. 9; GHCID Chpts. 16, 17 |
April 6, 2006 (Class # 10) | Weller/Solka | Quiz 5; Resources for obtaining meaningful annotation of genes; Browser-based annotation | MDA Handouts in class; GHCID Chpt. 18 | HW 4 assigned
April 13, 2006 (Class # 11) | Weller/Solka | Student Presentation Work day | Optional attendance, professors available |
April 20, 2006 (Class # 12) | Weller/Solka | Student Presentations of papers (slides due 5pm Tuesday 19th) | | HW 5 assigned
April 27, 2006 (Class # 13) | Weller/Solka | Student Project work day (slides will be due 5pm Wednesday 3rd) | Optional attendance, professors available |
May 4, 2006 (Class # 14) | Weller/Solka | Student Project presentations (slides due at 5pm Wed 3rd) | |
May 5, 2006 (Reports due) | | Project Reports | | Project reports due from all students (hard copy) by 5pm to Prof. Weller
May 11, 2006 | NA | No Final Exam given - all late assignments due | |

Dr. Solka's course sites include copies of his lecture notes and examples of previous presentations and projects.

The URLs are: http://www.scs.gmu.edu/~jsolka and http://binf.gmu.edu/jsolka/s2005/binf733/binf733_spg05.html


Lecture Notes will be posted here by the Friday following class lectures.

  • Lecture # 1 - Introduction to Arrays (Weller) One Slide Per Page
  • Lecture # 2 - Introduction to R (Solka) Two Slides Per Page
  • Lecture # 3 - Introduction to Bioconductor (Solka) Two Slides Per Page
  • Lecture # 4 - Data Sources and Data Models: MIAME and MAGE (Weller) One Slide Per Page
  • Lecture # 5 - Data Extraction, Cleansing and Preprocessing (Weller) One Slide Per Page
  • Lecture # 6 - Visualization (Solka) One Slide Per Page
  • Lecture # 7 - Distance Measures (Solka) Two Slides Per Page

    Information needed for homework assignments (data, scripts, references) will be posted here


    Student Paper Presentations


  • "A Graph-theoretic approach to testing associations between disparate sources of functional genomics data" by Balasubramanian et al (2004) Bioinformatics 20(18) 3353-3362. This paper will be presented by Kavita Tanksale.


  • "Outcome signature genes in breast cancer: is there a unique set?" by Ein-Dor et al (2005) Bioinformatics 21(2) 171-178. This paper will be presented by James Clark.


  • "A rapid method for computationally inferring transcriptome coverage and microarray sensitivity" by Reverter et al (2005) Bioinformatics 21(1) 80-89. This paper will be presented by Tugba Suzek.


  • "Missing value estimation for DNA microarray gene expression data: local least squares imputation" by Kim, Golub and Park (2005) Bioinformatics 21(2) 187-198. This paper will be presented by Baris Suzek.


  • "Gene selection using a two-level hierarchical Bayesian model" by Bae and Mallick (2004) Bioinformatics 20(18) 3423-3430. This paper will be presented by Colin Sherrill.


  • Suggested Reading for presentations and projects: Check back often; this will be updated throughout the term.

    1. "A Graph-theoretic approach to testing associations between disparate sources of functional genomics data" by Balasubramanian et al (2004) Bioinformatics 20(18) 3353-3362.
    2. "A semi-parametric approach for marker gene selection based on gene expression data" by Guan and Zhao (2005) Bioinformatics 21(4) 529-536.
    3. "Construction of robust prognostic predictors by using projective adaptive resonance theory as a gene filtering method" by Takahashi, Kobayashi and Honda (2005) Bioinformatics 21(2) 179-186.
    4. "VarMixt: efficient variance modeling for the differential analysis of replicated gene expression data" by Delmar, Robin and Daudin (2005) Bioinformatics 21(4) 502-508.
    5. "Identifying time-lagged gene clusters using gene expression data" by Ji and Tan (2005) Bioinformatics 21(4) 509-516.
    6. "Outcome signature genes in breast cancer: is there a unique set?" by Ein-Dor et al (2005) Bioinformatics 21(2) 171-178.
    7. "Gene selection using a two-level hierarchical Bayesian model" by Bae and Mallick (2004) Bioinformatics 20(18) 3423-3430.
    8. "BagBoosting for tumor classification with gene expression data" by Marcel Dettling (2004) Bioinformatics 20(18) 3583-3593.
    9. "How many samples are needed to build a classifier: a general sequential approach" by Fu et al (2005) Bioinformatics 21(1) 63-70.
    10. "A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data" by Zou and Conzen (2005) Bioinformatics 21(1) 71-79.
    11. "A rapid method for computationally inferring transcriptome coverage and microarray sensitivity" by Reverter et al (2005) Bioinformatics 21(1) 80-89.
    12. "Missing value estimation for DNA microarray gene expression data: local least squares imputation" by Kim, Golub and Park (2005) Bioinformatics 21(2) 187-198.
    13. "Dimension reduction methods for microarrays with application to censored survival data" by Li and Li (2004) Bioinformatics 20(18) 3406-3412.
    14. "The 'subsequent artificial network' (SANN) approach might bring more classificatory power to ANN-based DNA microarray analyses" by Linder et al (2004) Bioinformatics 20(18) 3544-3552.
    15. "Approximate geodesic distances reveal biologically relevant structures in microarray data" by Nilsson, Fioretos, Hoglund, and Fontes (2004) Bioinformatics 20(6) 874-880.
    16. "Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle" by Chen et al (2004) Bioinformatics 20(12) 1914-1927.