I am working on a project where I used LASSO for variable selection on a dataset that required multiple imputation to account for missingness in several variables. I combined the LASSO results across the imputations using a threshold (i.e. if a variable is selected by LASSO in at least 3 of the 5 imputations, it is retained as a selected variable). The selected variables were then stored in a character vector (named "threshold.variable.names") and pasted into a formula using the following code:
formula <- as.formula(paste0('OUTCOME','~',paste0(threshold.variable.names,collapse = "+")))
This yields exactly what I'm looking for (i.e. "OUTCOME~Var1+Var2+...").
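For reference, a minimal sketch of the threshold step, assuming the per-imputation LASSO selections are held in a list. The name selected.by.imputation and its contents are made up for illustration and are not my real data:

```r
# Hypothetical input: one character vector of LASSO-selected variables
# per imputed dataset (5 imputations here).
selected.by.imputation <- list(
  c("Var1", "Var2", "Var3"),
  c("Var1", "Var2"),
  c("Var1", "Var3", "Var4"),
  c("Var1", "Var2", "Var4"),
  c("Var1", "Var3")
)

# Keep a variable if it was selected in at least `threshold` imputations.
threshold <- 3

# unique() guards against a variable being counted twice within one imputation.
selection.counts <- table(unlist(lapply(selected.by.imputation, unique)))
threshold.variable.names <- names(selection.counts)[selection.counts >= threshold]

# Build the model formula from the retained variable names.
formula <- as.formula(
  paste0("OUTCOME", "~", paste0(threshold.variable.names, collapse = "+"))
)
```

With this toy input, Var4 is selected in only 2 of the 5 imputations, so it is dropped.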
However, when I try to run my regression on each of the imputed datasets using the mice package's with() function:
fit <- with(my.mids.object,glm(formula,family=binomial(link="logit")))
I get the following error:
Error in eval(predvars, data, env) : object 'OUTCOME' not found
In fact, none of the variables are recognized, not just OUTCOME.
Furthermore, I do not get any error when I run my pre-specified formula on one of the individual completed datasets:
fit <- glm(formula,family = binomial,data = complete(my.mids.object,1))
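In case it helps, here is a minimal sketch that appears to reproduce the same behaviour without mice, using with() on a plain data frame. The toy data frame and its values are invented for illustration, and I am only assuming that with() on a mids object evaluates the call in an analogous way:

```r
# Toy data, invented for illustration; the variables exist only as columns
# of `toy`, not as objects in the workspace.
set.seed(1)
toy <- data.frame(
  OUTCOME = rbinom(50, 1, 0.5),
  Var1 = rnorm(50),
  Var2 = rnorm(50)
)

threshold.variable.names <- c("Var1", "Var2")
formula <- as.formula(
  paste0("OUTCOME", "~", paste0(threshold.variable.names, collapse = "+"))
)

# Passing the data explicitly works, as with complete(my.mids.object, 1).
fit.ok <- glm(formula, family = binomial, data = toy)

# Relying on with() to supply the variables fails; capture the message
# instead of stopping the script.
with.result <- tryCatch(
  with(toy, glm(formula, family = binomial(link = "logit"))),
  error = function(e) conditionMessage(e)
)
print(with.result)
```

So the formula itself seems fine; the failure only shows up when the variables are supposed to be found through with() rather than through the data argument.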
Does anyone have some insight as to why this error is occurring and how I might be able to fix it? Thanks in advance.