I am working with large matrices (~100k x 10k), some dense and some sparse. I am trying to use matrices from the Matrix package throughout, since it implements both kinds. However, one operation, selecting a single row or column, seems to be orders of magnitude slower for dgeMatrix objects (the standard dense matrices in the Matrix package) than for base R matrices:
library(Matrix)
library(microbenchmark)
A <- matrix(runif(10000*1000), 10000, 1000)
B <- Matrix(runif(10000*1000), 10000, 1000)
i <- 1234
j <- 987
microbenchmark(A[i,], A[,j], B[i,], B[,j])
This gives me:
Unit: microseconds
expr min lq mean median uq max neval
A[i, ] 12.036 32.4635 48.46218 55.520 58.2860 96.707 100
A[, j] 28.681 36.2045 41.96735 43.147 45.7985 68.484 100
B[i, ] 35911.152 36444.2845 40367.11707 36637.658 37673.4705 63981.955 100
B[, j] 36011.814 36417.5715 41399.72686 36731.589 37991.5010 60347.724 100
As you can see, indexing B (a dgeMatrix) is on average almost 1000x slower than indexing A (a base R matrix).
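For reference, here is a small sanity check of the classes involved. It is only a minimal sketch: the 100 x 10 size and the shared vector x are arbitrary choices for illustration, not part of the benchmark above. It confirms which class Matrix() picks for this kind of input and that both objects return the same values, so the difference appears to be purely one of speed.

```r
library(Matrix)

# Illustrative data: the same values stored in both containers
# (the 100 x 10 size is an arbitrary choice to keep the check fast)
x <- runif(100 * 10)
A <- matrix(x, 100, 10)   # base R matrix
B <- Matrix(x, 100, 10)   # Matrix() chooses a dense class for this input

# Class chosen by Matrix() for a dense, non-square numeric input
b_class <- class(B)[1]
print(b_class)

# Row and column extraction drop to plain numeric vectors in both cases
same_row <- isTRUE(all.equal(A[12, ], B[12, ]))
same_col <- isTRUE(all.equal(A[, 7], B[, 7]))
print(c(same_row = same_row, same_col = same_col))
```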
Am I doing something wrong? Am I using an inefficient class? Are there faster ways to mix sparse and dense matrices?