I am trying to figure out how to annotate my set of genes so that I can run a pathway analysis on the annotated differentially expressed genes.
This is the line of code I am having trouble with:
ann <- select(org.Mm.eg.db, keys = rownames(fit.cont2), columns = c("ENTREZID","SYMBOL","GENENAME"))
I am trying to retrieve Entrez IDs, symbols, and gene names for each of the genes in my final fit object.
I started with a matrix, 'readcounts_g', which holds RNA-seq read counts for 10 samples and 20k genes, with ENSEMBL IDs as row names.
Next, I built a design matrix specifying that the first 5 samples are KO and the last 5 are WT for our experimental condition:
design.mat <- cbind(c(1,1,1,1,1,0,0,0,0,0), c(0,0,0,0,0,1,1,1,1,1))
colnames(design.mat) <- c("KO_GCC", "WT_GCC")
design.mat
I then executed the following lines of code:
library(edgeR)          # also attaches limma
library(org.Mm.eg.db)
z <- DGEList(readcounts_g)
z <- calcNormFactors(z)
v2 <- voom(z, design.mat, plot = TRUE)
fit2 <- lmFit(v2)       # design is taken from the voom output
cont.matrix2 <- makeContrasts(A.WTvsKO2 = KO_GCC - WT_GCC, levels = design.mat)
fit.cont2 <- contrasts.fit(fit2, cont.matrix2)
fit.cont2 <- eBayes(fit.cont2)
summa.fit2 <- decideTests(fit.cont2)
ann <- select(org.Mm.eg.db, keys = rownames(fit.cont2), columns = c("ENTREZID","SYMBOL","GENENAME"))
Ultimately, I want to be able to use:
topTable(fit.cont2, coef = "A.WTvsKO2", sort.by = "p")
- I want this so that I can produce a table of the top differentially expressed genes along with their gene names, etc.
- I also want to draw volcano plots and other graphics labeled with gene names rather than the ID numbers that I currently have.
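A minimal sketch of how I imagine the annotation would be lined up with the fit, in base R only. It assumes the annotation comes back as a data frame with an ENSEMBL column and possibly several rows per gene. The ENTREZID, SYMBOL, and GENENAME values below are invented placeholders, not real annotations. As far as I understand, topTable() carries along whatever is stored in the `genes` component of the fit, so the last (commented) line is where this would plug in.

```r
# Ensembl IDs in the same order as the rows of the fit object
ids <- c("ENSMUSG00000042638", "ENSMUSG00000030214", "ENSMUSG00000030222")

# Placeholder annotation table: all values except ENSEMBL are invented.
# The first ID appears twice to mimic a one-to-many mapping.
ann <- data.frame(
  ENSEMBL  = ids[c(2, 1, 1, 3)],
  ENTREZID = c("id_B", "id_A1", "id_A2", "id_C"),
  SYMBOL   = c("geneB", "geneA1", "geneA2", "geneC"),
  GENENAME = c("gene B", "gene A1", "gene A2", "gene C"),
  stringsAsFactors = FALSE
)

# match() returns the first hit per ID, so this yields exactly one row
# per gene, in the same order as the fit
genes <- ann[match(ids, ann$ENSEMBL), ]
rownames(genes) <- ids
print(genes)

# In the real analysis this would be attached to the fit (assumption, not run here):
# fit.cont2$genes <- genes
```

Keeping only the first match is a simplification; genes with several Entrez IDs would need a deliberate choice.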
Another concern is whether the goana function will correctly recognize the genes in my dataset, given that I only have ENSEMBL IDs rather than Entrez IDs:
go <- goana(fit.cont2, coef = "A.WTvsKO2", species = "Mm")
Here is an example of the set that I am working with:
library(dplyr)
# Toy counts: 3 genes x 4 samples
data_g <- tibble(geneID = sample(1:3),
                 s1 = rpois(3,10), s2 = rpois(3,15),
                 s3 = rpois(3,20), s4 = rpois(3,25))
data_g$gene_name <- c("ENSMUSG00000042638","ENSMUSG00000030214","ENSMUSG00000030222")
# Keep only the count columns and convert to a matrix before setting row names
# (tibbles do not support row names)
data_1 <- as.matrix(data_g[, c("s1","s2","s3","s4")])
rownames(data_1) <- data_g$gene_name
Here, data_1 is essentially a truncated version of the matrix that I am working with in my real data.
I am trying to annotate this matrix with Entrez gene IDs so I can use the goana function in limma.
I thought the line below would work, where I specify the key type as ENSEMBL IDs and it returns Entrez IDs:
ann <- select(org.Mm.eg.db,keys = rownames(fit.cont2),keytypes="ENSEMBL", columns=c("ENTREZID"))
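For comparison, a minimal sketch of the same lookup under two assumptions: that the argument is spelled `keytype` (singular), as in the AnnotationDbi documentation, and that dplyr's select() may be masking the AnnotationDbi one because dplyr is attached in my session. The three IDs are the ones from my toy example; the real call would use rownames(fit.cont2) instead.

```r
library(org.Mm.eg.db)   # also loads AnnotationDbi

# Ensembl IDs from the toy example
ens_ids <- c("ENSMUSG00000042638", "ENSMUSG00000030214", "ENSMUSG00000030222")

# Call select() through its namespace so dplyr::select() cannot mask it,
# and state explicitly that the keys are Ensembl gene IDs
ann <- AnnotationDbi::select(org.Mm.eg.db,
                             keys    = ens_ids,
                             keytype = "ENSEMBL",
                             columns = c("ENTREZID", "SYMBOL", "GENENAME"))
head(ann)
```

The result may contain more than one row per Ensembl ID, or NA where no Entrez ID exists, so it would still need to be reduced to one row per gene before use.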