Channel: Active questions tagged r - Stack Overflow

Select rows with common ids in grouped data frame


I am searching for a simpler solution to the following problem. Here is my setup:

test <- tibble::tribble(
  ~group_name, ~id_name, ~varA, ~varB,
     "groupA",   "id_1",     1,   "a",
     "groupA",   "id_2",     4,   "f",
     "groupA",   "id_3",     5,   "g",
     "groupA",   "id_4",     6,   "x",
     "groupA",   "id_4",     6,   "h",
     "groupB",   "id_1",     2,   "s",
     "groupB",   "id_2",    13,   "y",
     "groupB",   "id_4",    14,   "t",
     "groupC",   "id_1",     3,   "d",
     "groupC",   "id_2",     7,   "j",
     "groupC",   "id_3",     8,   "k",
     "groupC",   "id_4",     9,   "l",
     "groupC",   "id_5",     0,   "o",
     "groupC",   "id_6",    12,   "u"
  )

I want to keep only those values of id_name that are common to all groups in group_name, i.e. drop the rows for ids that are not present in every group. My actual data is large (200k rows) with anywhere between 4 and 20 groups (I don't know the number of groups beforehand, so the solution must work for any number of groups). The id_name values within a group are NOT unique. The desired result would be:

test_result <- tibble::tribble(
  ~group_name, ~id_name, ~varA, ~varB,
     "groupA",   "id_1",     1,   "a",
     "groupA",   "id_2",     4,   "f",
     "groupA",   "id_4",     6,   "x",
     "groupA",   "id_4",     6,   "h",
     "groupB",   "id_1",     2,   "s",
     "groupB",   "id_2",    13,   "y",
     "groupB",   "id_4",    14,   "t",
     "groupC",   "id_1",     3,   "d",
     "groupC",   "id_2",     7,   "j",
     "groupC",   "id_4",     9,   "l"
  )

(the rows with ids absent in at least one group are dropped). Ideally I do not want my output to have the columns joined at the end. I want "simply" to drop the rows missing in any one group but maintain the shape of the dataframe.
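To make the requirement concrete, here is one possible sketch rather than a definitive answer. It assumes the dplyr package is available (not mentioned above) and counts, for each id, how many distinct groups it appears in. The names n_groups and result are only illustrative.

```r
library(dplyr)

# Same example data as in the setup above
test <- tibble::tribble(
  ~group_name, ~id_name, ~varA, ~varB,
     "groupA",   "id_1",     1,   "a",
     "groupA",   "id_2",     4,   "f",
     "groupA",   "id_3",     5,   "g",
     "groupA",   "id_4",     6,   "x",
     "groupA",   "id_4",     6,   "h",
     "groupB",   "id_1",     2,   "s",
     "groupB",   "id_2",    13,   "y",
     "groupB",   "id_4",    14,   "t",
     "groupC",   "id_1",     3,   "d",
     "groupC",   "id_2",     7,   "j",
     "groupC",   "id_3",     8,   "k",
     "groupC",   "id_4",     9,   "l",
     "groupC",   "id_5",     0,   "o",
     "groupC",   "id_6",    12,   "u"
)

# Total number of groups, computed from the data so it works for any count
n_groups <- n_distinct(test$group_name)

# Keep an id only if it occurs in every group; duplicated ids within a
# group do not matter because distinct groups are counted, not rows
result <- test %>%
  group_by(id_name) %>%
  filter(n_distinct(group_name) == n_groups) %>%
  ungroup()
```

This only drops rows, so the columns stay exactly as they were and nothing is joined onto the data.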

And I know that I can extract all the ids from each group, then intersect them all to obtain the list of unique ids present in all groups and then filter the main dataframe for just these IDs. But that sounds like a lot of work ;-)
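For reference, the intersect approach described above can be fairly short in base R. This is a minimal sketch under the assumption that the data fits in memory; it rebuilds the example as a plain data.frame so it runs without extra packages, and ids_by_group, common_ids and result are illustrative names.

```r
# Example data rebuilt as a base data.frame (same values as the tibble above)
test <- data.frame(
  group_name = c(rep("groupA", 5), rep("groupB", 3), rep("groupC", 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  varA = c(1, 4, 5, 6, 6, 2, 13, 14, 3, 7, 8, 9, 0, 12),
  varB = c("a", "f", "g", "x", "h", "s", "y", "t", "d", "j", "k", "l", "o", "u"),
  stringsAsFactors = FALSE
)

# One character vector of ids per group, however many groups there are
ids_by_group <- split(test$id_name, test$group_name)

# Fold intersect() over the list to get the ids present in every group
common_ids <- Reduce(intersect, ids_by_group)

# Filter the original rows; columns and row order are left untouched
result <- test[test$id_name %in% common_ids, ]
```

Reduce() applies intersect() pairwise across the list, so the number of groups does not need to be known in advance.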

Any hints would be very much appreciated.

