In this SO question, bootstrapping by several groups and subgroups seemed easy using the broom::bootstrap function with the by_group argument set to TRUE.
My desired output is a nested tibble with n rows, one per bootstrap replicate, where the data column contains the bootstrapped data generated by each bootstrap call (and each group and subgroup has the same number of cases as in the original data).
With broom, I did the following:
# packages
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)
library(broom)
# some data to bootstrap
set.seed(123)
data <- tibble(
  group = rep(c('group1', 'group2', 'group3', 'group4'), 25),
  subgroup = rep(c('subgroup1', 'subgroup2', 'subgroup3', 'subgroup4'), 25),
  v1 = rnorm(100),
  v2 = rnorm(100)
)
# the actual approach using broom::bootstrap
tibble(id = 1:100) %>%
  mutate(data = map(id, ~ {
    data %>%
      group_by(group, subgroup) %>%
      broom::bootstrap(100, by_group = TRUE)
  }))
Since the broom::bootstrap function is deprecated, I rebuilt my approach to produce the desired output using rsample::bootstraps. It seems much more complicated to get there. Am I doing something wrong, or have grouped bootstraps become more complicated in the tidyverse?
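data %>%
  # copy the grouping columns so they survive inside the nested data
  dplyr::mutate(group2 = group,
                subgroup2 = subgroup) %>%
  # one row per group/subgroup combination
  tidyr::nest(data = -c(group2, subgroup2)) %>%
  # 100 bootstrap resamples within each combination
  dplyr::mutate(boot = map(data, ~ rsample::bootstraps(., 100))) %>%
  pull(boot) %>%
  purrr::map(., "splits") %>%
  # flip from "per group" to "per replicate"
  purrr::transpose() %>%
  # bind the resampled groups of each replicate into one data frame
  purrr::map(., ~ purrr::map_dfr(., rsample::analysis)) %>%
  tibble(id = seq_along(.), data = .)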
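To make the target structure concrete, here is a minimal sketch of the output described above: one row per replicate, with every group/subgroup cell resampled at its original size. It does not use broom or rsample. It assumes dplyr >= 1.0.0 for slice_sample(), and the names toy, boot_once and n_boot are hypothetical, introduced only for this illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(123)

# Small toy data set (hypothetical): 6 group/subgroup cells, 2 rows each
toy <- tibble(
  group    = rep(c("group1", "group2"), each = 6),
  subgroup = rep(c("subgroup1", "subgroup2", "subgroup3"), times = 4),
  v1       = rnorm(12)
)

# Resample rows with replacement inside each group/subgroup cell,
# keeping every cell at its original size (prop = 1)
boot_once <- function(df) {
  df %>%
    group_by(group, subgroup) %>%
    slice_sample(prop = 1, replace = TRUE) %>%
    ungroup()
}

# Nested tibble: one row per bootstrap replicate
n_boot <- 5
boots <- tibble(id = seq_len(n_boot)) %>%
  mutate(data = map(id, ~ boot_once(toy)))

# Check that the cell sizes of the first replicate match the original data
same_sizes <- identical(
  count(boots$data[[1]], group, subgroup)$n,
  count(toy, group, subgroup)$n
)
```

The check at the end is the property asked for: per-cell counts in a replicate equal those in the original data. The same comparison can be run on the output of the rsample pipeline to confirm both approaches produce the same shape.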