I am trying to merge two data frames based on a subset of common IDs. Let me demonstrate:
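library(tidyverse)
set.seed(42)
# df: ids with a group label and a value
df = list(id = c(1,2,3,4,1,2,2,2,1,1),
          group = c("A","A","A","A","B","B","B","B","C","C"),
          val = c(round(rnorm(10,6,6),0))
          ) %>%
  as_tibble()  # as_tibble() replaces the deprecated tbl_df()
# df_na: ids only; group and val are all NA
df_na = list(id = c(1,1,1,2,3,3,4,5,5,5),
             group = c(rep(NA,10)),
             val = c(rep(NA,10))
             ) %>%
  as_tibble()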
df contains data and ids, while df_na only contains ids and NAs. I would like to create a combined data frame that contains all the information of df and adds the NAs by group and id, i.e. for each group in df, find which ids are present in both df and df_na and merge.
If I were doing this manually, i.e. group by group, I would use something like this:
# distinct ids that occur in group A
A_dist = df %>% filter(group=="A") %>%
  distinct(id) %>%
  pull()
# NA rows for those ids, followed by the group A rows
df_A_comb = df_na %>%
  filter(id %in% A_dist) %>%
  bind_rows(filter(df, group=="A"))
# A tibble: 11 x 3
      id group   val
   <dbl> <chr> <dbl>
 1     1 NA       NA
 2     1 NA       NA
 3     1 NA       NA
 4     2 NA       NA
 5     3 NA       NA
 6     3 NA       NA
 7     4 NA       NA
 8     1 A        14
 9     2 A         3
10     3 A         8
11     4 A        10
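The manual steps above could be wrapped in a small helper so they can be repeated for any group. This is only a minimal sketch: combine_group() is a hypothetical name I made up, and the data are rebuilt with tibble() so that the block runs on its own.

```r
library(dplyr)

set.seed(42)
df <- tibble(
  id    = c(1, 2, 3, 4, 1, 2, 2, 2, 1, 1),
  group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C"),
  val   = round(rnorm(10, 6, 6), 0)
)
df_na <- tibble(
  id    = c(1, 1, 1, 2, 3, 3, 4, 5, 5, 5),
  group = NA_character_,  # length-1 values are recycled by tibble()
  val   = NA_real_
)

# combine_group() is a hypothetical helper, not part of dplyr.
# It returns the df_na rows whose id occurs in group g,
# followed by the rows of group g itself.
combine_group <- function(g, data = df, na_data = df_na) {
  rows_g <- filter(data, group == g)
  ids_g  <- unique(rows_g$id)
  na_data %>%
    filter(id %in% ids_g) %>%
    bind_rows(rows_g)
}

df_A_comb <- combine_group("A")
```

Calling combine_group("B") or combine_group("C") would repeat the same steps for the other groups, but that still means one call per group.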
But obviously, I would rather automate this. As an emerging fan of the tidyverse, I'm trying to get my head around purrr::map. I can create a vector of ids for each group.
# one vector of distinct ids per group
df_dist = df %>%
  split(.$group) %>%
  map(distinct, id) %>%
  map("id")
> df_dist
$A
[1] 1 2 3 4
$B
[1] 1 2
$C
[1] 1
But translating my dplyr approach is more complicated and already produces an error at the first step.
###this approach doesn't work...
df_comb = df_na %>%
map(filter, id %in% df_dist)# %>%
...
Error in UseMethod("filter_") :
no applicable method for 'filter_' applied to an object of class "c('double', 'numeric')"
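My current reading of the error, which may be incomplete: a data frame is a list of columns, so df_na %>% map(filter, ...) calls filter() on each column vector (starting with the numeric id column) rather than on the data frame itself. In addition, df_dist is a list of id vectors, so id %in% df_dist would not compare against a single group's ids. Below is a minimal sketch of what iterating over the groups instead might look like. It is one possible approach under those assumptions, not necessarily the idiomatic one, and the data are rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)
library(purrr)

set.seed(42)
df <- tibble(
  id    = c(1, 2, 3, 4, 1, 2, 2, 2, 1, 1),
  group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C"),
  val   = round(rnorm(10, 6, 6), 0)
)
df_na <- tibble(
  id    = c(1, 1, 1, 2, 3, 3, 4, 5, 5, 5),
  group = NA_character_,
  val   = NA_real_
)

# Split df into one tibble per group, then for each group keep the
# df_na rows with a matching id and append the group's own rows.
df_comb <- df %>%
  split(.$group) %>%
  map(function(rows_g) {
    matched <- filter(df_na, id %in% rows_g$id)
    bind_rows(matched, rows_g)
  })

# Optional: stack the per-group results into one tibble; the
# source_group column (a name chosen here) records the origin group.
df_comb_all <- bind_rows(df_comb, .id = "source_group")
```

Here df_comb is a named list with one combined tibble per group, so df_comb$A should correspond to the manual df_A_comb result.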
Any help will be much appreciated!