library(DT)
library(limma)
# source('http://bioconductor.org/biocLite.R') # Import biocLite() function into R environment biocLite('limma')
library(limma)
# limmaUsersGuide() # Opens pdf manual for limma biocLite('samr')
library(samr)  # Load the library
library(affy)  # Load affy package 
library(GEOquery)
# Use biomaRt for annotation biocLite('biomaRt')
library(biomaRt)

Limma

Limma is a software package for the analysis of gene expression microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression. The package includes pre-processing capabilities for two-color spotted arrays. The differential expression methods apply to all array platforms and treat Affymetrix, single channel and two channel experiments in a unified way. The methods are described in Smyth 2004 and in the limma manual. An illustrated introduction for the GUI packages can be found at Walter+Eliza Hall Bioinformatics Institute of Medical Research (WEHI).

We will be analyzing tissue-specific differences from Su et al., 2002. Datasets for human and mouse transcriptomes in all tissues are available from Gene Expression Omnibus. We will be analyzing a subset of brain-liver samples, download here.

Prepare the data:

eset.rma <- justRMA(celfile.path = "../data/Su_CELs/")  # RMA summarization of the CEL files
pData(eset.rma)  # Check what samples we have
##                   sample
## Brain_1.CEL            1
## Brain_2.CEL            2
## Fetal_brain_1.CEL      3
## Fetal_brain_2.CEL      4
## Fetal_liver_1.CEL      5
## Fetal_liver_2.CEL      6
## Liver_1.CEL            7
## Liver_2.CEL            8

There are two different ways to form the design matrix. We can either

  1. create a design matrix that includes a contrast coefficient for the treated vs. wild type difference, or
  2. create a design matrix that includes separate coefficients for wild type and mutant mice and then extract the differences we are interested in as contrasts.

For the first approach, the treatment-contrasts parametrization, the design matrix should be as follows:

# Design matrix: Treatment-constrast parametrization
a <- rep(0, length(pData(eset.rma)$sample))  # Create a vector of 0
a[grep("liver", rownames(pData(eset.rma)), ignore.case = T)] <- 1  # Mark 'liver' conditions as '1'
a  # Check your work
## [1] 0 0 0 0 1 1 1 1
design <- cbind(Brain = 1, LiverVsBrain = a)  # Columnwise bind
design  # Check your work
##      Brain LiverVsBrain
## [1,]     1            0
## [2,]     1            0
## [3,]     1            0
## [4,]     1            0
## [5,]     1            1
## [6,]     1            1
## [7,]     1            1
## [8,]     1            1

Here, the first coefficient estimates the mean log-expression for brain tissue and plays a role of an intercept. The second coefficient estimates the difference between brain and liver cells. Differentially expressed genes can be found by:

# Identifying differentially expressed genes
fit <- lmFit(eset.rma, design)
fit <- eBayes(fit)
result <- topTable(fit, number = 100, adjust = "BH", p.value = 0.05, lfc = 1, coef = "LiverVsBrain")  # Get top 100 differentially expressed genes
datatable(result)  # Check your work

For the second approach, the group-means parametrization, the design matrix can be computed by:

# Design matrix: separate group coefficients
design <- cbind(Brain = c(rep(1, 4), rep(0, 4)), Liver = c(rep(0, 4), rep(1, 4)))  # Manually create design matrix
design  # Check
##      Brain Liver
## [1,]     1     0
## [2,]     1     0
## [3,]     1     0
## [4,]     1     0
## [5,]     0     1
## [6,]     0     1
## [7,]     0     1
## [8,]     0     1
design <- model.matrix(~0 + factor(a))  # Another way - factor makes two levels, one for each group
colnames(design) <- c("Brain", "Liver")  # Label columns properly
design  # Check your work
##   Brain Liver
## 1     1     0
## 2     1     0
## 3     1     0
## 4     1     0
## 5     0     1
## 6     0     1
## 7     0     1
## 8     0     1
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$`factor(a)`
## [1] "contr.treatment"

To find differentially expressed genes, group-means parametrization should be converted into contrast matrix:

fit <- lmFit(eset.rma, design)
cont.matrix <- makeContrasts(LivervsBrain = Liver - Brain, levels = design)  # Make matrix of contrasts
cont.matrix  # See what's inside
##        Contrasts
## Levels  LivervsBrain
##   Brain           -1
##   Liver            1
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
result2 <- topTable(fit2, number = 100, adjust = "BH", p.value = 0.05, lfc = 1, coef = "LivervsBrain")
datatable(result2)

The above approaches for two groups extend easily to any number of groups. Suppose that we want to pairwise compare all four conditions. An appropriate design matrix can be created using:

# Several groups
a <- c(1, 1, 2, 2, 3, 3, 4, 4)  # Four conditions, two replicates per condition
design <- model.matrix(~0 + factor(a))  # Now we have four levels for design matrix
colnames(design) <- c("B", "fB", "fL", "L")  # label columns
design  # Check your work
##   B fB fL L
## 1 1  0  0 0
## 2 1  0  0 0
## 3 0  1  0 0
## 4 0  1  0 0
## 5 0  0  1 0
## 6 0  0  1 0
## 7 0  0  0 1
## 8 0  0  0 1
## attr(,"assign")
## [1] 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$`factor(a)`
## [1] "contr.treatment"

We create contrast matrix for three pairwise comparisons, for the sake of visualizing the results in a form of Venn diagram (3 comparisons at a time). Finding differentially expressed genes are the same:

contrast.matrix <- makeContrasts(B - fB, L - fL, B - L, levels = design)  # Make three contrasts
contrast.matrix  # Check your work
##       Contrasts
## Levels B - fB L - fL B - L
##     B       1      0     1
##     fB     -1      0     0
##     fL      0     -1     0
##     L       0      1    -1
fit <- lmFit(eset.rma, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
result3 <- decideTests(fit2, adjust = "BH", p.value = 0.05, lfc = log2(2))

Use decideTests function to have a summary of the results for Venn diagram, and visualize it:

vennDiagram(result3)  # How genes differentially expressed in different conditions

vennDiagram(result3, include = "up")  # Only upregulated

vennDiagram(result3, include = "down")  # Or downregulated

We can save our results into a file:

write.table(topTable(fit2, coef = "B - L", number = 1000, adjust.method = "BH", p.value = 0.05, lfc = log2(2)), "results/filename.txt", sep = "\t", quote = FALSE)  # vary coefficient to write corresponding results to a tab-separated file