I am trying to learn how to automate running three or more regression models over subsets of a dataset using the purrr and broom packages in R. I have the nest() %>% mutate(map()) %>% unnest() flow in mind.
I am able to replicate examples online when there is only one regression model that is applied to several data subsets. However, I am running into problems when I have more than one regression model in my function.
What I tried to do
    library(tidyverse)
    library(broom)

    estimate_model <- function(df) {
      model1 <- lm(mpg ~ wt, data = df)
      model2 <- lm(mpg ~ wt + gear, data = df)
      model3 <- lm(mpg ~ wt + gear + vs, data = df)
    }

    ols_1dep_3specs <- mtcars %>%
      nest(-cyl) %>%
      mutate(
        estimates = map(data, estimate_model), # want to run several models at once
        coef_wt = map(estimates, ~pluck(coef(.), "wt")), # coefficient of wt only
        se_wt = map(estimates, ~pluck(tidy(.), "std.error")[[2]]), # se of wt only
        rsq = map(estimates, ~pluck(glance(.), "r.squared")),
        arsq = map(estimates, ~pluck(glance(.), "adj.r.squared"))
      ) %>%
      unnest(coef_wt, se_wt, rsq, arsq)
Unfortunately, this seems to work only when the function estimate_model contains a single regression model. Any advice on how to write this when there are several specifications? I am open to suggestions outside the nest() %>% mutate(map()) %>% unnest() framework.
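A likely reason for this behaviour, sketched here in base R: an R function returns the value of its last evaluated expression, so a body made of three assignments hands back only the third model. The names estimate_last, estimate_models, and four_cyl are hypothetical and used only for illustration; returning a named list is one possible workaround, not necessarily the best one.

```r
# Same shape as estimate_model: three assignments, so only the value of
# the last one (model3) is returned, and invisibly at that.
estimate_last <- function(df) {
  model1 <- lm(mpg ~ wt, data = df)
  model2 <- lm(mpg ~ wt + gear, data = df)
  model3 <- lm(mpg ~ wt + gear + vs, data = df)
}

# Hypothetical variant that returns all three fits as a named list.
estimate_models <- function(df) {
  list(
    model1 = lm(mpg ~ wt, data = df),
    model2 = lm(mpg ~ wt + gear, data = df),
    model3 = lm(mpg ~ wt + gear + vs, data = df)
  )
}

four_cyl <- subset(mtcars, cyl == 4)

last_only <- estimate_last(four_cyl)   # a single lm object
all_fits <- estimate_models(four_cyl)  # a list of three lm objects

class(last_only)
names(all_fits)

# The wt coefficient from each specification
sapply(all_fits, function(m) coef(m)[["wt"]])
```

With the list-returning variant, each element of the mapped column is itself a list of models, so later steps such as coef() or glance() would need a second level of mapping.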
The following code sort of gets at what I am hoping to achieve but it involves a lot of repetition.
    estimate_model1 <- function(df) {
      model1 <- lm(mpg ~ wt, data = df)
    }

    estimate_model2 <- function(df) {
      model2 <- lm(mpg ~ wt + gear, data = df)
    }

    estimate_model3 <- function(df) {
      model3 <- lm(mpg ~ wt + gear + vs, data = df)
    }

    ols_1dep_3specs <- mtcars %>%
      nest(-cyl) %>%
      mutate(model_1 = map(data, estimate_model1),
             model_2 = map(data, estimate_model2),
             model_3 = map(data, estimate_model3)) %>%
      mutate(coef_wt_1 = map_dbl(model_1, ~pluck(coef(.), "wt")),
             coef_wt_2 = map_dbl(model_2, ~pluck(coef(.), "wt")),
             coef_wt_3 = map_dbl(model_3, ~pluck(coef(.), "wt")),
             rsq_1 = map_dbl(model_1, ~pluck(glance(.), "r.squared")),
             rsq_2 = map_dbl(model_2, ~pluck(glance(.), "r.squared")),
             rsq_3 = map_dbl(model_3, ~pluck(glance(.), "r.squared"))) %>%
      dplyr::select(starts_with("coef_wt"), starts_with("rsq"))
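One direction that might cut the repetition, offered as a rough sketch rather than a definitive answer: keep the formulas in a named list and apply one helper to each of them, stacking the results into a long table. The names specs, fit_spec, and results are hypothetical, and the output has one row per cyl and specification instead of one column per model.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical named list of specifications; the names end up in `spec`.
specs <- list(
  spec_1 = mpg ~ wt,
  spec_2 = mpg ~ wt + gear,
  spec_3 = mpg ~ wt + gear + vs
)

# Hypothetical helper: fit one formula to every cyl subset of mtcars and
# keep the wt coefficient, its standard error, and the two R-squared values.
fit_spec <- function(fml, spec_name) {
  mtcars %>%
    group_by(cyl) %>%
    nest() %>%
    ungroup() %>%
    mutate(
      spec = spec_name,
      model = map(data, function(d) lm(fml, data = d)),
      coef_wt = map_dbl(model, function(m) coef(m)[["wt"]]),
      se_wt = map_dbl(model, function(m) {
        td <- tidy(m)
        td$std.error[td$term == "wt"]  # look wt up by name, not position
      }),
      rsq = map_dbl(model, function(m) glance(m)$r.squared),
      arsq = map_dbl(model, function(m) glance(m)$adj.r.squared)
    ) %>%
    select(cyl, spec, coef_wt, se_wt, rsq, arsq)
}

# imap() passes each formula together with its name; bind_rows() stacks
# the per-specification tables into one.
results <- bind_rows(imap(specs, fit_spec))
results
```

Adding a fourth specification then only means adding one entry to specs, at the cost of refitting the grouping step once per formula.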