I am searching for a simpler solution to the following problem. Here is my setup:
test <- tibble::tribble(
~group_name, ~id_name, ~varA, ~varB,
"groupA", "id_1", 1, "a",
"groupA", "id_2", 4, "f",
"groupA", "id_3", 5, "g",
"groupA", "id_4", 6, "x",
"groupA", "id_4", 6, "h",
"groupB", "id_1", 2, "s",
"groupB", "id_2", 13, "y",
"groupB", "id_4", 14, "t",
"groupC", "id_1", 3, "d",
"groupC", "id_2", 7, "j",
"groupC", "id_3", 8, "k",
"groupC", "id_4", 9, "l",
"groupC", "id_5", 0, "o",
"groupC", "id_6", 12, "u"
)
I want to select only those elements in id_name
that are common to all groups in group_name
- i.e. drop the rows for ids that are not present in all the groups. My actual data is large (200k rows) with anywhere between 4-20 groups (I don't know the number of groups beforehand so the solution must work for any number of groups). The id_name
in each group is NOT unique. The desired result would be:
test_result <- tibble::tribble(
~group_name, ~id_name, ~varA, ~varB,
"groupA", "id_1", 1, "a",
"groupA", "id_2", 4, "f",
"groupA", "id_4", 6, "x",
"groupA", "id_4", 6, "h",
"groupB", "id_1", 2, "s",
"groupB", "id_2", 13, "y",
"groupB", "id_4", 14, "t",
"groupC", "id_1", 3, "d",
"groupC", "id_2", 7, "j",
"groupC", "id_4", 9, "l",
)
(the rows with ids absent in at least one group are dropped). Ideally I do not want my output to have the columns joined at the end. I want "simply" to drop the rows missing in any one group but maintain the shape of the dataframe.
And I know that I can extract all the ids from each group, then intersect them all to obtain the list of unique ids present in all groups and then filter the main dataframe for just these IDs. But that sounds like a lot of work ;-)
Any hints would be very much appreciated.