Complex dplyr group_by filtering


I cannot figure out how to write complex filtering criteria on groups in the tidyverse. Consider the example data frame below:

library(dplyr)

df <- tibble(
    a = c(1, 1, 1, 2, 2, 2, 2, 3),
    b = c(1, 2, 3, 1, 4, 50, 5, 3),
    c = c("PIZZA", "HAM", NA, "COKE", "LOBSTER", "LOBSTER", NA, NA)
)

I want a to be my grouping variable and then, within each group, select the record with the largest value of b among those whose entry in c is not NA. Desired output:

tibble(
    a = c(1, 2, 3),
    b = c(2, 50, 3),
    c = c("HAM", "LOBSTER", NA)
)

I can of course do

df %>% group_by(a) %>% filter(b == max(b))

but then I don't satisfy the column c criterion. Some complications:

  1. The number of records varies between groups.
  2. If c is NA for every record in a group, then choose the record with the largest b anyway (its c will be NA).
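
One possible approach, offered as a sketch rather than a definitive answer: sort each group so that rows with a non-NA c come first and, within those, the largest b comes first, then keep the first row of each group. This assumes dplyr is loaded and that exactly one row per group is wanted (if two rows tie on b, only one of them is kept). The name `result` is introduced here for illustration only.

```r
library(dplyr)

df <- tibble(
  a = c(1, 1, 1, 2, 2, 2, 2, 3),
  b = c(1, 2, 3, 1, 4, 50, 5, 3),
  c = c("PIZZA", "HAM", NA, "COKE", "LOBSTER", "LOBSTER", NA, NA)
)

result <- df %>%
  group_by(a) %>%
  # is.na(c) is FALSE for non-NA rows, and FALSE sorts before TRUE,
  # so rows with a usable c come first; ties are broken by descending b.
  arrange(is.na(c), desc(b), .by_group = TRUE) %>%
  # Keep the first row of each group after sorting.
  slice(1) %>%
  ungroup()
```

Because the NA rows are only pushed to the end of each group and not dropped, a group in which every c is NA still returns its largest-b row, which covers the second complication. Groups of different sizes need no special handling.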
