Channel: Active questions tagged r - Stack Overflow

How can I annotate my genes in R using the select(org.Mm.eg.db) function if the original information I have for each gene is their ENSEMBL ID?


I am trying to figure out how to annotate my set of genes using the approach below, so that I can run a pathway analysis on my annotated differentially expressed genes.

This is the line of code I am having trouble with:

ann <- select(org.Mm.eg.db,keys=rownames(fit.cont),columns=c("ENTREZID","SYMBOL","GENENAME"))

I am trying to generate Entrez IDs, symbols, and gene names for each of the genes in my final object.

I started with a matrix, 'readcounts_g', which holds RNA-seq read counts for 10 samples and about 20k genes, with rownames corresponding to ENSEMBL IDs.

Next, I executed the following lines of code:

z <- DGEList(readcounts_g)
z <- calcNormFactors(z)
v2 <- voom(z,design.mat,plot = TRUE)
fit2 <- lmFit(v2)

The design matrix used above specifies that the first 5 samples are KO and the last 5 are WT for our experimental condition:

design.mat <- cbind(c(1,1,1,1,1,0,0,0,0,0), c(0,0,0,0,0,1,1,1,1,1))
colnames(design.mat) <- c("KO_GCC", "WT_GCC")
design.mat

cont.matrix2 <- makeContrasts(A.WTvsKO2=KO_GCC - WT_GCC,levels = design.mat)
fit.cont2 <- contrasts.fit(fit2, cont.matrix2)
fit.cont2 <- eBayes(fit.cont2)
summa.fit2 <- decideTests(fit.cont2)
ann <- select(org.Mm.eg.db,keys=rownames(fit.cont),columns=c("ENTREZID","SYMBOL","GENENAME"))

Ultimately, I want to be able to use:

topTable(fit.cont2,coef="A.WTvsKO2",sort.by="p")
  1. I want this so that I can spit out a table of the top differentially expressed genes along with their gene names etc.
  2. I also want to be able to plot volcano plots and other graphics with the gene names rather than the ID numbers that I currently have
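
For both points, one common pattern (a sketch, not necessarily the only way) is to collapse the annotation to one row per ENSEMBL ID and align it to the row order of the fit. It can then be stored in the fit's `genes` component, which topTable includes in its output. The helper name `align_annotation` is hypothetical, and the annotation values below are made-up placeholders, not real Entrez IDs or symbols.

```r
# Hypothetical helper: returns one annotation row per id, in the order of `ids`.
align_annotation <- function(ids, ann, key = "ENSEMBL") {
  ann <- ann[!duplicated(ann[[key]]), , drop = FALSE]  # keep first hit per key
  out <- ann[match(ids, ann[[key]]), , drop = FALSE]   # NA rows for unmapped ids
  out[[key]] <- ids
  rownames(out) <- ids
  out
}

# Placeholder data standing in for rownames(fit.cont2) and the select() output.
ids <- c("ENSMUSG00000042638", "ENSMUSG00000030214", "ENSMUSG00000030222")
ann <- data.frame(
  ENSEMBL  = c(ids[2], ids[2], ids[1]),      # ids[2] maps twice, ids[3] is unmapped
  ENTREZID = c("id_b1", "id_b2", "id_a"),    # made-up values
  SYMBOL   = c("SymB", "SymB_alt", "SymA"),  # made-up values
  stringsAsFactors = FALSE
)

genes <- align_annotation(ids, ann)
print(genes)

# With a real fit one would then assign, for example:
#   fit.cont2$genes <- genes
```

Once the `genes` component is set, the symbols should also be usable as labels, for instance through the `names` argument of limma's volcanoplot.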

Another concern is whether the goana function will correctly recognize the genes in my dataset, given that I only have ENSEMBL IDs rather than Entrez IDs:

go <- goana(fit.cont2, coef="A.WTvsKO2",species = "Mm")

Here is an example of the set that I am working with:

library(dplyr)

data_g = tibble(geneID=sample(1:3),
              s1=rpois(3,10),s2=rpois(3,15),
                s3=rpois(3,20),s4=rpois(3,25))

data_g$gene_name = c("ENSMUSG00000042638","ENSMUSG00000030214","ENSMUSG00000030222")
rownames(data_g) = data_g$gene_name

data = data_g[,-1]
data_1 = data[,-5]
rownames(data_1) = data$gene_name
data_1 = as.matrix(data_1)

From here, data_1 is essentially a truncated version of the matrix that I am working with for my real data.

I am trying to annotate this matrix with Entrez gene IDs so I can use the goana function in limma.

I thought the line below would work, where I specify the key type as ENSEMBL IDs and it returns Entrez IDs:

ann <- select(org.Mm.eg.db,keys = rownames(fit.cont2),keytypes="ENSEMBL", columns=c("ENTREZID"))
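
Two things may be worth checking in the line above. As far as I can tell, the argument to AnnotationDbi's select() is `keytype` (singular). Also, because dplyr is loaded in the example, its own `select` can mask the AnnotationDbi one, so a namespace-qualified call avoids the ambiguity. Below is a minimal sketch, assuming AnnotationDbi and org.Mm.eg.db are installed (the guard simply skips the lookup otherwise). `ens_ids` is a stand-in for rownames(fit.cont2).

```r
# Sketch only: ens_ids stands in for rownames(fit.cont2).
ens_ids <- c("ENSMUSG00000042638", "ENSMUSG00000030214", "ENSMUSG00000030222")

if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  db <- org.Mm.eg.db::org.Mm.eg.db

  # Namespace-qualified call, so dplyr::select() cannot shadow it.
  ann <- AnnotationDbi::select(db,
                               keys    = ens_ids,
                               keytype = "ENSEMBL",
                               columns = c("ENTREZID", "SYMBOL", "GENENAME"))

  # select() can return several rows per key (1:many mappings);
  # mapIds() returns one value per key instead.
  entrez <- AnnotationDbi::mapIds(db,
                                  keys      = ens_ids,
                                  keytype   = "ENSEMBL",
                                  column    = "ENTREZID",
                                  multiVals = "first")
  print(ann)
  print(entrez)
} else {
  message("AnnotationDbi / org.Mm.eg.db not installed; skipping lookup")
}
```

goana also takes a `geneid` argument, so the Entrez IDs obtained this way could presumably be passed there instead of relying on the ENSEMBL rownames of the fit.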

