Channel: Active questions tagged r - Stack Overflow

Speeding up linear model fitting on complete pairwise observations in a large sparse matrix in R


I have a numeric data.frame df with 134946 rows and 1938 columns; 99.82% of its entries are NA.
For each pair of distinct columns "P1" and "P2", I need to find the rows where both columns are non-NA and then fit a linear model on those rows.
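A minimal base-R sketch of the "which rows are non-NA in both columns" step, assuming the data are held as a numeric matrix (for example via as.matrix(df)). The toy matrix m and the names present, rows_13, n_pairs and pairs_to_fit are illustrative only. The cross-product of the 0/1 non-NA indicator matrix gives the number of shared complete rows for every pair in a single call, so pairs with fewer than 5 shared rows can be discarded before any subsetting is done:

```r
# Toy stand-in for as.matrix(df): 100 rows, 4 columns, half the entries NA
set.seed(1)
m <- matrix(runif(100 * 4), nrow = 100)
m[sample(length(m), 200)] <- NA

# 1 where a value is present, 0 where it is NA
present <- 1 * (!is.na(m))

# Rows usable for the pair (column 1, column 3): non-NA in both
rows_13 <- which(!is.na(m[, 1]) & !is.na(m[, 3]))

# Entry [i, j] = number of rows where columns i and j are both non-NA
n_pairs <- crossprod(present)

# Distinct pairs (i < j) with at least 5 shared rows; one row per pair
pairs_to_fit <- which(upper.tri(n_pairs) & n_pairs >= 5, arr.ind = TRUE)
head(pairs_to_fit)
```

The diagonal of n_pairs is simply the number of non-NA values per column, and the matrix is symmetric, which is why only the upper triangle is kept.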

I wrote a script that does this, but it seems quite slow.

This post seems to discuss a related task, but I can't immediately see if or how it can be adapted to my case.

Borrowing the example from that post:

```r
set.seed(54321)
nr <- 1000
nc <- 900
dat <- matrix(runif(nr * nc), nrow = nr)
rownames(dat) <- paste(1:nr)
colnames(dat) <- paste("time", 1:nc)
dat[sample(nr * nc, nr * nc * 0.9)] <- NA  # make 90% of the entries NA

df <- as.data.frame(dat)
df_ps <- names(df)
N_ps <- length(df_ps)
```

My script is:

```r
tic <- proc.time()

out <- do.call(rbind, lapply(1:(N_ps - 1), function(i) {
  # progress report every 10 outer iterations
  if (i %% 10 == 0) {
    cat("\ni = ", i, "\n")
    print(proc.time() - tic)
  }
  do.call(rbind, lapply((i + 1):N_ps, function(j) {
    # rows where both column i and column j are non-NA
    w <- which(complete.cases(df[, i], df[, j]))
    N <- length(w)
    if (N < 5) return(NULL)  # too few complete pairs: skip this pair
    xw <- df[w, i]
    yw <- df[w, j]
    if (diff(range(xw)) != 0 && diff(range(yw)) != 0) {
      s <- summary(lm(yw ~ xw))
      cf <- s$coefficients  # 2 x 4: rows (Intercept), xw; cols Estimate, Std. Error, t value, Pr(>|t|)
      # i, j, N, adj. R^2, slope, slope SE, slope p, intercept, intercept SE, intercept p
      c(i, j, N, s$adj.r.squared, cf[2], cf[4], cf[8], cf[1], cf[3], cf[7])
    } else {
      # constant x or y: 7 NAs so this row has the same 10 columns as above
      c(i, j, N, rep(NA, 7))
    }
  }))
}))

print(proc.time() - tic)
```

This takes about 10 minutes on my machine.
My real data matrix is much larger (although even sparser), and on it I have never managed to finish the calculation.
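One likely cost is calling summary(lm()) once per column pair (about 400,000 pairs for the 900-column example): lm() builds a model frame and does a QR decomposition that a single-predictor fit does not need. A sketch, assuming only the statistics the script extracts are wanted: the hypothetical helper fast_lm_stats() computes them from the textbook closed-form formulas for simple linear regression (slope = Sxy / Sxx, residual variance RSS / (n - 2), two-sided t-based p-values), returned in the same order as the script's output after i and j:

```r
# Hypothetical helper: closed-form simple linear regression y ~ x.
# Returns N, adj. R^2, slope, slope SE, slope p, intercept, intercept SE,
# intercept p (the same quantities the script pulls from summary(lm())).
# Assumes x and y are non-constant numeric vectors of equal length >= 3.
fast_lm_stats <- function(x, y) {
  n <- length(x)
  xbar <- mean(x)
  ybar <- mean(y)
  sxx <- sum((x - xbar)^2)               # centred sum of squares of x
  syy <- sum((y - ybar)^2)               # centred sum of squares of y
  sxy <- sum((x - xbar) * (y - ybar))    # centred cross-product
  slope <- sxy / sxx
  intercept <- ybar - slope * xbar
  rss <- sum((y - intercept - slope * x)^2)  # residual sum of squares
  df_res <- n - 2
  sigma2 <- rss / df_res                 # residual variance
  se_slope <- sqrt(sigma2 / sxx)
  se_int <- sqrt(sigma2 * (1 / n + xbar^2 / sxx))
  p_slope <- 2 * pt(-abs(slope / se_slope), df_res)
  p_int <- 2 * pt(-abs(intercept / se_int), df_res)
  r2 <- 1 - rss / syy
  adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
  c(n, adj_r2, slope, se_slope, p_slope, intercept, se_int, p_int)
}
```

Inside the inner function, the summary(lm(yw ~ xw)) call and the c(...) after it could then become c(i, j, fast_lm_stats(xw, yw)). The diff(range()) check still has to run first, because the formulas divide by Sxx.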

Question: do you think this could be done more efficiently?
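A more aggressive sketch, assuming the data fit in memory as a numeric matrix: every sum the simple-regression formulas need (count, sum of x, sum of y, sums of squares, sum of x*y over the shared rows) can be obtained for all column pairs at once from cross-products of the NA-zeroed data and the 0/1 non-NA indicator, which removes the R-level double loop. The function name pairwise_lm_stats(), its min_n argument and the 1e-12 relative tolerance used to mimic the diff(range()) check are illustrative assumptions. Because it starts from uncentred sums, it can lose precision when a column's mean is large relative to its spread:

```r
# Hypothetical sketch: all pairwise simple regressions from cross-products.
# For a pair (i, j) with i < j, column i is x and column j is y, using only
# rows where both are non-NA. Output columns match the question's `out`.
pairwise_lm_stats <- function(X, min_n = 5) {
  stopifnot(is.matrix(X), min_n >= 3)    # need n - 2 >= 1 residual df
  M  <- 1 * (!is.na(X))                  # 1 where a value is present, 0 where NA
  X0 <- X
  X0[is.na(X0)] <- 0                     # NA -> 0 so missing cells drop out of the sums
  X2 <- X0^2

  # Entry [i, j]: sums over rows where columns i and j are both non-NA
  n   <- crossprod(M)                    # number of shared rows
  sx  <- crossprod(X0, M)                # sum of column i over shared rows
  sxx <- crossprod(X2, M)                # sum of squares of column i
  sxy <- crossprod(X0)                   # sum of column i * column j
  sy  <- t(sx)                           # sum of column j over shared rows
  syy <- t(sxx)                          # sum of squares of column j

  # Distinct pairs i < j with enough shared rows
  idx <- which(upper.tri(n) & n >= min_n, arr.ind = TRUE)
  i <- idx[, 1]
  j <- idx[, 2]
  N <- n[idx]
  Sx <- sx[idx]
  Sy <- sy[idx]
  Sxx <- sxx[idx]
  Syy <- syy[idx]

  # Centred sums of squares and cross-products
  cxx <- Sxx - Sx^2 / N
  cyy <- Syy - Sy^2 / N
  cxy <- sxy[idx] - Sx * Sy / N

  # Mimic the diff(range()) check: (near-)constant x or y gives NA statistics
  flat <- cxx <= 1e-12 * Sxx | cyy <= 1e-12 * Syy
  cxx[flat] <- NA
  cyy[flat] <- NA

  slope <- cxy / cxx
  intercept <- (Sy - slope * Sx) / N
  df_res <- N - 2
  rss <- pmax(cyy - slope * cxy, 0)      # residual sum of squares
  sigma2 <- rss / df_res
  se_slope <- sqrt(sigma2 / cxx)
  se_int <- sqrt(sigma2 * (1 / N + (Sx / N)^2 / cxx))
  p_slope <- 2 * pt(-abs(slope / se_slope), df_res)
  p_int <- 2 * pt(-abs(intercept / se_int), df_res)
  adj_r2 <- 1 - (rss / cyy) * (N - 1) / df_res

  cbind(i, j, N, adj_r2, slope, se_slope, p_slope, intercept, se_int, p_int)
}
```

Applied to as.matrix(df), this would return one row per pair with at least 5 shared rows, in the same column layout as out but ordered by j rather than by i. For the real 134946 × 1938 data, each dense double matrix takes roughly 2 GB, so sparse matrices from the Matrix package, which also support crossprod(), may be worth trying there.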

The problem is that I don't know which operations take the most time. Is it subsetting df (in which case I would try to avoid repeating the same subsetting)? Or appending rows to the result (in which case I would fill a flat vector and convert it to a matrix at the end)?
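Rather than guessing, R's sampling profiler (Rprof() together with summaryRprof()) and system.time() can show where the time goes. A rough sketch on the example data; the loop bounds and repetition counts are arbitrary choices, and the grow() and prealloc() helpers are hypothetical illustrations of the two ways of collecting results:

```r
set.seed(54321)
nr <- 1000
nc <- 900
dat <- matrix(runif(nr * nc), nrow = nr)
dat[sample(nr * nc, nr * nc * 0.9)] <- NA
df <- as.data.frame(dat)

# 1. Subsetting: data.frame vs. plain matrix (same values either way)
rows <- which(!is.na(dat[, 7]))
t_df  <- system.time(for (k in 1:2000) df[rows, 7])
t_mat <- system.time(for (k in 1:2000) dat[rows, 7])

# 2. Growing a result with rbind() vs. filling a preallocated matrix
grow <- function(n) {
  out <- NULL
  for (k in seq_len(n)) out <- rbind(out, c(k, k^2))  # copies `out` every time
  out
}
prealloc <- function(n) {
  out <- matrix(NA_real_, nrow = n, ncol = 2)
  for (k in seq_len(n)) out[k, ] <- c(k, k^2)         # fills in place
  out
}
t_grow <- system.time(grow(5000))
t_pre  <- system.time(prealloc(5000))

elapsed <- c(df_subset = t_df[["elapsed"]], matrix_subset = t_mat[["elapsed"]],
             rbind_grow = t_grow[["elapsed"]], preallocated = t_pre[["elapsed"]])
print(elapsed)

# 3. Sampling profiler on a slice of the original pairwise loop
prof_file <- tempfile()
Rprof(prof_file)
for (i in 1:20) for (j in (i + 1):80) {
  w <- which(complete.cases(df[, i], df[, j]))
  if (length(w) >= 5) summary(lm(df[w, j] ~ df[w, i]))
}
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))
```

The by.self table lists the functions in which the sampled time was actually spent, which indicates whether data.frame subsetting, lm() internals, or something else dominates.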

Thanks!
