% % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % \documentclass[12pt]{article} \usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \textwidth=6.2in \bibliographystyle{plainnat} \begin{document} %\setkeys{Gin}{width=0.55\textwidth} \title{Day 3 Bioconductor 2008 Longwood course} \maketitle \tableofcontents \section{Data resources} \section{Affy SNP chips} \begin{verbatim} Archive: gliPack.zip Length Date Time Name -------- ---- ---- ---- 0 03-04-08 13:07 HIND/ 114 03-04-08 13:06 HIND/doc.hist 25641293 02-15-08 14:29 HIND/GSM248457.CEL 25658564 02-15-08 14:29 HIND/GSM248458.CEL 25664717 02-15-08 14:29 HIND/GSM248468.CEL 25643596 02-15-08 14:29 HIND/GSM248469.CEL 25637357 02-15-08 14:29 HIND/GSM248522.CEL 25661493 02-15-08 14:29 HIND/GSM248524.CEL 13473013 03-02-08 21:53 HIND/hindCRLMM.rda 0 03-04-08 13:07 XBA/ 116 03-04-08 13:07 XBA/dox.hist 25651694 02-15-08 14:28 XBA/GSM248244.CEL 25655151 02-15-08 14:28 XBA/GSM248245.CEL 25657892 02-15-08 14:29 XBA/GSM248255.CEL 25641899 02-15-08 14:29 XBA/GSM248256.CEL 25649886 02-15-08 14:29 XBA/GSM248309.CEL 25651094 02-15-08 14:29 XBA/GSM248311.CEL 13562608 03-02-08 21:53 XBA/xbaCRLMM.rda 27028881 03-02-08 22:12 gli100k.rda 2194 03-04-08 13:05 workComb.hist -------- ------- 361881562 20 files \end{verbatim} It takes a long time on most laptops, but you could try to read some CEL files with code like: <>= NSAMP = 2 pd = new("AnnotatedDataFrame", data=data.frame(gender=rep("male", NSAMP))) hindCRLMM = justCRLMM(dir(patt="CEL")[1:NSAMP], phenoData=pd) @ This generates an instance of SnpSetCallPlus. I have gone through the process of reading, preprocessing and summarizing the SNP data for both XBA and HIND chips yielding the gli100k.rda object. <>= library(Biobase) library(oligo) library(pd.mapping50k.xba240) load("gli100k.rda") calls(gli100k)[1:5,1:5] @ To resolve the AFFY SNP identifiers, you need <>= con = pd.mapping50k.xba240@getdb() con dbListTables(con) dbGetQuery(con, "select * from featureSet limit 1000,5") @ The sample phenoData (in text) are \begin{verbatim} primary glioblastoma 1: 244 X, 457 H primary glioblastoma 2: 245 X, 458 H anaplastic oligodendroglioma 1: 255 X, 468 H anaplastic oligodendroglioma 2: 256 X, 469 H anaplastic astrocytoma 5: 309 X, 522 H anaplastic astrocytoma 6: 311 X, 524 H \end{verbatim} where nnn X denotes the suffix of the GSM number for Xba element of 100k set, nnn H is the suffix of the GSM number for Hind element. Exercise: make the phenoData structure and attach. Is there a SNP among the first 1000 on the chips that distinguishes primary glio from anaplastic astro in the sense that subjects with one of the tumor types are consistently homozygous rare and the subjects with the other one are consistently homozygous common? What is the SNP? Note: <>= sampleNames(gli100k) @ \begin{verbatim} > table(as.numeric(calls(gli100k))) 1 2 3 259481 180787 256956 > distt = function(x) all((x[1:2] - x[5:6]) == 2) > apply(calls(gli100k)[1:1000,], 1, distt) -> chk > any(chk) [1] TRUE > which(chk) SNP_A-1642215 236 > calls(gli100k)[236,] GSM248244.CEL GSM248245.CEL GSM248255.CEL GSM248256.CEL GSM248309.CEL 3 3 3 1 1 GSM248311.CEL 1 \end{verbatim} \section{Illumina -- raw MAQC data} Use lumiR to capture data in \verb+ILM_2_ALL_raw_gene_profile.csv+. Perform background correction (use bgAdjust.affy) and normalization, and view the boxplots at all three stages: raw, background-corrected, and (default) normalized. Show the effect of normalization on expression measures of a selected gene. <>= library(lumi) library(genefilter) library(limma) ilm2 = lumiR(dir(patt="ILM")) annotation(ilm2) = "illuminaHumanv1.db" plot(ilm2) <>= boxplot(ilm2) @ <>= args(lumiN) boxplot(lumiN(ilm2)) @ \begin{enumerate} \item install new AnnotationDbi 1.1.24 and illuminaHumanv1.db (in illuminaHumanv1.zip, sic) \item Create phenoData with pctBrain correctly assigned to the samples. \item Find genes that are differentially expressed in the A vs B comparison. \item Find the GO category and KEGG pathway assignments of the top 10 genes. Build an HTML table for these genes. \end{enumerate} <>= library(illuminaHumanv1.db) <>= bilm2 = lumiB(ilm2, method="bgAdjust.affy") nilm2 = lumiN(lumiT(bilm2)) mads = apply(exprs(nilm2),1,mad) bm = which(mads > quantile(mads, .8)) isAB = which(substr(sampleNames(nilm2),1,1) %in% c("A", "B")) f2 = nilm2[bm,isAB] isB = 1*(substr(sampleNames(f2),1,1)=="B") f2$isBrain = factor(isB) mm = model.matrix(~isBrain, data=pData(f2)) l1 = lmFit(f2, mm) el1 = eBayes(l1) <>= tt = topTable(el1, 2, 10) tt library(annaffy) nn = tt[,1] myt = aafTableAnn(nn, "illuminaHumanv1.db") saveHTML(myt, file="lkbr.html") @ \end{document}