Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 202012

Bootstrapping by multiple groups in the tidyverse: rsample vs. broom

$
0
0

In this SO Question bootstrapping by several groups and subgroups seemed to be easy using the broom::bootstrap function specifying the by_group argument with TRUE.

My desired output is a nested tibble with n rows where the data column contains the bootstrapped data generated by each bootstrap call (and each group and subgroup has the same amount of cases as in the original data).

In broom I did the following:

# packages
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)
library(broom)

# some data to bootstrap
set.seed(123)
data <- tibble(
  group=rep(c('group1','group2','group3','group4'), 25),
  subgroup=rep(c('subgroup1','subgroup2','subgroup3','subgroup4'), 25),
  v1=rnorm(100),
  v2=rnorm(100)
)

# the actual approach using broom::bootstrap
tibble(id = 1:100) %>% 
  mutate(data = map(id, ~ {data %>%
      group_by(group,subgroup) %>% 
      broom::bootstrap(100, by_group=TRUE)}))

Since the broom::bootstrap function is deprecated, I rebuild my approach with the desired output using rsample::bootstraps. It seems to be much more complicated to get my desired output. Am I doing something wrong or have things gotten more complicated in the tidyverse when generating grouped bootstraps?

data %>%
  dplyr::mutate(group2 = group,
                subgroup2 = subgroup) %>% 
  tidyr::nest(-group2, -subgroup2) %>% 
  dplyr::mutate(boot  = map(data, ~ rsample::bootstraps(., 100))) %>% 
  pull(boot) %>% 
  purrr::map(., "splits") %>% 
  transpose %>% 
  purrr::map(., ~ purrr::map_dfr(., rsample::analysis)) %>% 
  tibble(id = 1:length(.), data = .)

Viewing all articles
Browse latest Browse all 202012

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>