%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\bibliographystyle{plainnat}

\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{Parsing, preprocessing, and interpreting Illumina WG-8 expression data}
\author{VJ Carey \copyright 2008}
\maketitle

\section{Introduction}

We are going to look at some data obtained from GEO (accession GSE5415).
The raw data were downloaded from GEO as a tar archive and are
available to you in folder C:long08:illuFGF.

We wish to verify some of the assertions made in
<<>>=
library(annotate)
pmid2MIAME("17038665")
@
particularly

\setkeys{Gin}{width=1.05\textwidth}
\includegraphics{stemCellRes}

\section{Basic tasks}

Review and execute the following code:
<<>>=
library(lumi)
# read one file per sample
starv1 = lumiR("GSM124298.txt")
restim1 = lumiR("GSM124299.txt")
starv2 = lumiR("GSM124300.txt")
restim2 = lumiR("GSM124301.txt")
# sample-level covariates, in the order the samples are combined below
type = factor(rep( "hESC", 4))
trt = factor(c("FGF-starved", "FGF-restim", "FGF-starved", "FGF-restim"))
#
# merge the four single-sample objects into one
restim = combine(starv1, restim1)
restim = combine(restim, starv2)
restim = combine(restim, restim2)
#
#
sampleNames(restim) = c("g298", "g299", "g300", "g301")
phenoData(restim)$type = type
phenoData(restim)$trt = trt
library(annotate)
mi = pmid2MIAME("17038665")
#experimentData(restim) = mi
#
# transform, then normalize
nrestim = lumiN(lumiT(restim))
@

\setkeys{Gin}{width=0.85\textwidth}
<<>>=
par(mfrow=c(1,2))
boxplot(restim, main="prenorm")
boxplot(nrestim, main="post Tx/Norm")
par(mfrow=c(1,1))
@

Now use the \Rpackage{illuminaHumanv1} annotation to obtain identifiers for
genes mentioned in the excerpt. Verify the claims graphically. For example:
<<>>=
library(illuminaHumanv1)
alls = as.list(illuminaHumanv1SYMBOL)
# probes whose gene symbol contains "FOS"
grep("FOS", unlist(alls), value=TRUE)
plot( exprs(nrestim)["GI_6552332-S",])
@

\section{A hypergeometric test}

We will use \Rpackage{limma} to test for differential expression:
<<>>=
library(limma)
des = model.matrix(~trt, data=pData(nrestim))
f1 = lmFit(nrestim, des)
ef1 = eBayes(f1)
# coefficient 2 is the treatment effect
topTable(ef1,2)
tt = topTable(ef1,2,n=100)
@
We will regard as differentially expressed those genes with
adjusted P-value $< 0.05$.
<<>>=
# column 1 of tt is taken to hold the probe identifiers
difg = tt[ tt$adj.P.Val < 0.05, 1 ]
@

We will also get the names of genes in the cell cycle regulation (CCR)
GO category.
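
As a reminder of what the test at the end of this section computes, here is a
sketch in notation introduced only for this note (the symbols $N$, $K$, $n$
and $k$ do not appear in the code). Let $N$ be the number of probes on the
chip, $K$ the number annotated to CCR, $n$ the number declared differentially
expressed, and $k$ the number in both sets. If the two classifications are
independent, the overlap $X$ follows a hypergeometric distribution, and the
one-sided enrichment p-value is
\[
P(X \ge k) = \sum_{i=k}^{\min(n,K)}
  \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}} .
\]
Fisher's exact test on the $2 \times 2$ table is based on this distribution.
Note that \Rfunction{fisher.test} reports a two-sided p-value by default;
\Rfunarg{alternative="greater"} should correspond to the one-sided form above
for a table laid out as in this section. The CCR probe set is retrieved as
follows: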
<<>>=
library(illuminaHumanv1)
inccr = illuminaHumanv1GO2ALLPROBES[["GO:0007049"]]
@
Cross-classify all genes on the chip according to presence in CCR and
differential expression under restimulation with FGF:
<<>>=
fn = featureNames(nrestim)
# rows: in CCR or not; columns: differentially expressed or not
tab = table( fn %in% inccr, fn %in% difg )
tab
fisher.test(tab)
@

Questions:
\begin{enumerate}
\item Obtain the p-value for enrichment of the genes differentially
expressed under FGF restimulation for genes annotated to transcription
factors in GO.
\item Note that the names component of \Robject{inccr} gives us the
`evidence codes' for GO associations. What happens when you confine
attention to genes for which association with CCR is documented by
a TAS (Traceable Author Statement)?
\item Can we avoid the discretization of genes into `differentially'/
`non-differentially' expressed and still make related inferences?
\end{enumerate}

\end{document}