Cornell Math - MATH 675, Fall 2006

MATH 675: Statistical Theory Applicable to Genomics (Fall 2006)

Instructor: J. T. Gene Hwang

Meeting Time & Room

There are many statistical concepts that are useful in genomics. One particular problem in genomics (e.g., microarray data analysis) is that the number of populations, or genes, is large. As a result, there is a huge number of hypotheses. How should so many hypotheses be tested simultaneously? We will discuss concepts such as the family-wise error rate (FWER), the false discovery rate (FDR) of Benjamini and Hochberg (1995, JRSS B), and Storey's papers on the positive false discovery rate (pFDR). We will also discuss the fundamental cornerstone of multiple testing, the closed testing method, and a shortcut algorithm for it known as stepdown testing; see Westfall and Young (1993).
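To make the FDR and stepdown ideas concrete, here is a minimal sketch in plain Python. The function names are our own and hypothetical. The first function is the Benjamini-Hochberg step-up rule, which controls the FDR at level q. The second is Holm's (1979) procedure, which controls the FWER. We use Holm because it is the simplest stepdown shortcut to closed testing, with Bonferroni tests for each intersection hypothesis. It is swapped in for illustration and is not the resampling-based stepdown method of Westfall and Young (1993).

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling FDR at level q.

    Sort p_(1) <= ... <= p_(m), find the largest k with p_(k) <= k*q/m,
    and reject the hypotheses with the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # step-up: keep the LARGEST rank that passes
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def holm_stepdown(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure controlling FWER at level alpha.

    A shortcut for closed testing with Bonferroni intersection tests:
    compare p_(j) with alpha/(m - j + 1) and stop at the first failure.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for j, i in enumerate(order, start=1):
        if pvals[i] > alpha / (m - j + 1):
            break  # step-down: every remaining hypothesis is retained
        reject[i] = True
    return reject
```

For the ten p-values `[0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]` at level 0.05, `benjamini_hochberg` rejects the two smallest. `holm_stepdown` rejects only the smallest, because 0.008 exceeds 0.05/9. This illustrates how the FDR criterion is less conservative than the FWER.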

What other statistical inferential techniques may be useful for a large number of populations or genes? The traditional approach of analyzing one population at a time, as if all populations were different, is inefficient. It seems interesting and important to have techniques that combine the observations from all populations. When the populations are similar, they should "borrow strength" from each other, and when they are very different, they should go their separate ways. Shrinkage (or empirical Bayes) techniques, or equivalently the best linear unbiased predictor (BLUP) in a mixed model, can do this, so the course will spend some time discussing these techniques, covering both point estimation and confidence interval construction. A new approach, called the selected mean approach, shows promise and will also be discussed.
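As a rough illustration of "borrowing strength", here is a sketch of one standard shrinkage estimator: the positive-part James-Stein estimator, shrinking toward the grand mean. It is not necessarily the estimator used in the course. It assumes independent observations X_i ~ N(theta_i, sigma^2), i = 1, ..., p, with sigma^2 known and p >= 4. Under a normal prior theta_i ~ N(mu, tau^2), the Bayes rule shrinks X_i toward mu by the factor sigma^2/(sigma^2 + tau^2). The empirical Bayes step replaces mu with the grand mean and replaces the factor with (p - 3) sigma^2 / S, where S is the sum of squared deviations. The function name is hypothetical.

```python
from typing import List, Sequence


def james_stein_to_mean(x: Sequence[float], sigma2: float) -> List[float]:
    """Positive-part James-Stein estimator shrinking toward the grand mean.

    Assumes X_i ~ N(theta_i, sigma2) independently, sigma2 known, p >= 4.
    """
    p = len(x)
    if p < 4:
        raise ValueError("need at least 4 populations to shrink toward the mean")
    xbar = sum(x) / p
    s = sum((xi - xbar) ** 2 for xi in x)
    # Empirical Bayes estimate of the shrinkage factor sigma2 / (sigma2 + tau2).
    b = (p - 3) * sigma2 / s if s > 0 else 1.0
    # Positive part: never shrink past the grand mean.
    weight = max(0.0, 1.0 - b)
    return [xbar + weight * (xi - xbar) for xi in x]
```

Take `x = [1, 2, 3, 4, 5]` with `sigma2 = 1`. Here S = 10, and each observation is pulled 20% of the way toward the mean 3, giving approximately `[1.4, 2.2, 3.0, 3.8, 4.6]`. If the observations are tightly clustered relative to sigma^2, the weight drops to zero and every estimate equals the grand mean. This matches the "similar populations borrow strength" behaviour described above.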

Other topics may include permutation tests and QTL (quantitative trait locus) identification, if time allows. This course is mainly about (mathematical) statistical theory, and hence the focus of many lectures is on proving theorems. It is recommended that students have taken a statistics course such as OR&IE 670 or MATH 674.
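As a small sketch of the permutation-test idea, the following computes an exact two-sided permutation p-value for a difference in group means. For example, it could compare one gene's expression under two conditions. It enumerates every relabelling of the pooled data, so it is only practical for small samples. Real analyses typically sample random permutations instead. The function name and the choice of test statistic are our own assumptions.

```python
from itertools import combinations
from typing import Sequence


def perm_test_mean_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided permutation p-value for a difference in group means.

    Under the null hypothesis the group labels are exchangeable, so every
    choice of len(a) pooled observations for group A is equally likely.
    """
    pooled = list(a) + list(b)
    n, na = len(pooled), len(a)
    total = sum(pooled)

    def stat(idx) -> float:
        # |mean of group A - mean of group B| for the labelling given by idx
        sa = sum(pooled[i] for i in idx)
        return abs(sa / na - (total - sa) / (n - na))

    observed = stat(range(na))
    count = 0
    num = 0
    for idx in combinations(range(n), na):
        num += 1
        if stat(idx) >= observed - 1e-12:  # small tolerance guards float ties
            count += 1
    return count / num
```

With groups `[1, 2, 3]` and `[4, 5, 6]`, there are 20 relabellings. Only the observed split and its mirror image reach a mean difference of 3, so the p-value is 2/20 = 0.1.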