I am trying to select all rows in a repeated-measures dataset that belong to a randomly selected group of people. I would like to do it entirely in the tidyverse (for my own edification) but find myself having to fall back on base R functions. Here is how I do it with a combination of base R and dplyr commands:
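library(dplyr)  # provides %>%, filter(), distinct(), sample_n()
set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))
# base R: draw 3 of the unique ids as a plain vector
ids <- sample(unique(df$id), 3)
# dplyr: keep every row belonging to a sampled id
smallDF <- df %>% dplyr::filter(id %in% ids)
smallDF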
# id score
# 1 a 0.6869129
# 2 a 1.0663631
# 3 a 0.5367006
# 4 a 1.9060287
# 5 c 1.1677516
# 6 c 0.7926794
# 7 c -1.2135038
# 8 c -1.0056141
# 9 d 0.2085696
# 10 d 0.4461776
# 11 d -0.6208060
# 12 d 0.4413429
I can sample randomly from the id identifier using dplyr...
df %>% distinct(id) %>% sample_n(3)
# id
# 1 e
# 2 c
# 3 b
...but the fact that the output is a dataframe/tibble is making it difficult for me to get to that next step where I then filter the original df by the randomly selected id identifiers.
Can anyone help?
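For reference, here is a minimal sketch of the kind of all-dplyr pipeline I am after, assuming dplyr is attached. It shows two possible ways to bridge the gap: pull() to turn the sampled one-column data frame into a vector, or semi_join() to filter against the sampled data frame directly. The names sampled, smallDF1 and smallDF2 are only illustrative, and the ids drawn will not necessarily match the base R sample() call above.

```r
library(dplyr)

set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))

# Option 1: pull() extracts the id column of the sampled data frame as a vector,
# which can then be used with %in% inside filter()
ids <- df %>% distinct(id) %>% sample_n(3) %>% pull(id)
smallDF1 <- df %>% filter(id %in% ids)

# Option 2: keep the sample as a data frame and use a filtering join;
# semi_join() returns the rows of df that have a matching id in sampled
sampled <- df %>% distinct(id) %>% sample_n(3)
smallDF2 <- df %>% semi_join(sampled, by = "id")
```

Either way, each result should contain all 4 rows for each of the 3 sampled ids.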