I am writing a function to fit many glm
models. To just give you some ideas about the function, I include a small section of my code. With the help of several SO users, the function works for my analysis purpose now. However, sometimes, particularly when the sample size is relatively small, it can take quite long time to finish the whole process.
To reduce the time, I am considering changing some details of iterative maximization, such as maximum number of iterations. I have not found a way to do it, maybe because I am still not familiar with R
terminology. Any suggestions to do this or other ways to reduce time would be appreciated.
all_glm <- function(crude, xlist, data, family = "binomial", ...) {
# md_lst include formula for many models to be fitted
comb_lst <- unlist(lapply(1:n, function(x) combn(xlist, x, simplify=F)), recursive=F)
md_lst <- lapply(comb_lst,function(x) paste(crude, "+", paste(x, collapse = "+")))
models <- lapply(md_lst, function(x) glm(as.formula(x), family = family, data = data))
OR <- unlist(lapply(models, function(x) broom::tidy(x, exponentiate = TRUE)$estimate[2]))
}
EDIT
Thanks to @BenBolker who directed me to the package fastglm
, I end up with several r
packages which could provide faster alternatives to glm
. I have tried fastglm
and speedglm
. It appears than both are faster than glm
on my machine.
library(fastglm)
library(speedglm)
# from
set.seed(1)
n <- 25000
k <- 500
y <- rbinom(n, size = 1, prob = 0.5)
x <- round( matrix(rnorm(n*k),n,k),digits=3)
colnames(x) <-paste("s",1:k,sep = "")
df <- data.frame(y,x)
fo <- as.formula(paste("y~",paste(paste("s",1:k,sep=""),collapse="+")))
# Fit three models:
system.time(m_glm <- glm(fo, data=df, family = binomial))
system.time(m_speedglm <- speedglm(fo, data= df, family = binomial()))
system.time(m_fastglm <- fastglm(x, y, family = binomial()))
> system.time(m_glm <- glm(fo, data=df, family = binomial))
user system elapsed
56.51 0.22 58.73
> system.time(m_speedglm <- speedglm(fo, data= df, family = binomial()))
user system elapsed
17.28 0.04 17.55
> system.time(m_fastglm <- fastglm(x, y, family = binomial()))
user system elapsed
23.87 0.09 24.12