Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 201839

Fit many glm models: improve speed

$
0
0

I am writing a function to fit many glm models. To just give you some ideas about the function, I include a small section of my code. With the help of several SO users, the function works for my analysis purpose now. However, sometimes, particularly when the sample size is relatively small, it can take quite long time to finish the whole process. To reduce the time, I am considering changing some details of iterative maximization, such as maximum number of iterations. I have not found a way to do it, maybe because I am still not familiar with R terminology. Any suggestions to do this or other ways to reduce time would be appreciated.

all_glm <- function(crude, xlist, data, family = "binomial", ...) {
  # md_lst include formula for many models to be fitted  
  comb_lst <- unlist(lapply(1:n, function(x) combn(xlist, x, simplify=F)), recursive=F)
  md_lst   <- lapply(comb_lst,function(x) paste(crude, "+", paste(x, collapse = "+")))
  models  <- lapply(md_lst, function(x) glm(as.formula(x), family = family, data = data))
  OR      <- unlist(lapply(models, function(x) broom::tidy(x, exponentiate = TRUE)$estimate[2]))

}    

EDIT Thanks to @BenBolker who directed me to the package fastglm, I end up with several r packages which could provide faster alternatives to glm. I have tried fastglm and speedglm. It appears than both are faster than glm on my machine.

library(fastglm)
library(speedglm)
# from 
set.seed(1)
n <- 25000
k <- 500
y <- rbinom(n, size = 1, prob = 0.5)
x <- round( matrix(rnorm(n*k),n,k),digits=3)
colnames(x) <-paste("s",1:k,sep = "")
df <- data.frame(y,x)
fo <- as.formula(paste("y~",paste(paste("s",1:k,sep=""),collapse="+")))   

# Fit three models: 
system.time(m_glm <- glm(fo, data=df, family = binomial))
system.time(m_speedglm <- speedglm(fo, data= df, family = binomial()))
system.time(m_fastglm <- fastglm(x, y, family = binomial()))

> system.time(m_glm <- glm(fo, data=df, family = binomial))
   user  system elapsed 
  56.51    0.22   58.73 
> system.time(m_speedglm <- speedglm(fo, data= df, family = binomial()))
   user  system elapsed 
  17.28    0.04   17.55 
> system.time(m_fastglm <- fastglm(x, y, family = binomial()))
   user  system elapsed 
  23.87    0.09   24.12 

Viewing all articles
Browse latest Browse all 201839

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>