I am trying to draw a stratified sample from a data set that contains a variable indicating how large the sample should be for each group.
library(dplyr)
# example data
df <- data.frame(id = 1:15,
                 grp = rep(1:3, each = 5),
                 frq = rep(c(3, 2, 4), each = 5))
In this example, grp refers to the group I want to sample by, and frq is the sample size specified for that group.
Using split, I came up with this possible solution, which gives the desired result but seems rather inefficient:
# split into one data frame per group, sample each one, then recombine
s <- split(df, df$grp)
lapply(s, function(x) sample_n(x, size = unique(x$frq))) %>%
  do.call(what = rbind)
Is there a way to do this using just dplyr's group_by and sample_n?
My first thought was:
df %>% group_by(grp) %>% sample_n(size = frq)
but this gives the error:
Error in is_scalar_integerish(size) : object 'frq' not found
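One direction I have considered, as a rough sketch only: since the error suggests that sample_n does not look up frq inside the grouped data, the draw could instead be done with slice, which evaluates its arguments per group. This assumes frq is constant within each group (as in the example data), so taking its first element is enough. It sidesteps sample_n rather than fixing the call above, and I have not benchmarked it against the split version.

```r
library(dplyr)

# same example data as above
df <- data.frame(id = 1:15,
                 grp = rep(1:3, each = 5),
                 frq = rep(c(3, 2, 4), each = 5))

set.seed(1)  # only to make the draw reproducible

res <- df %>%
  group_by(grp) %>%
  # n() is the number of rows in the current group;
  # frq[1] is that group's requested sample size (assumed constant per group)
  slice(sample(n(), frq[1])) %>%
  ungroup()

# number of rows drawn per group
count(res, grp)
```

The count at the end is just a quick check that each group contributes as many rows as its frq value asks for.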