I cannot figure out how to write complex filtering criteria on groups in tidyverse. Consider the example dataframe below:
library(tidyverse)

df <- tibble(
  a = c(1, 1, 1, 2, 2, 2, 2, 3),
  b = c(1, 2, 3, 1, 4, 50, 5, 3),
  c = c("PIZZA", "HAM", NA, "COKE", "LOBSTER", "LOBSTER", NA, NA)
)
I want a to be my grouping variable, and then, within each group, select the record with the largest value in b such that the entry in c is not NA.
Desired output:
tibble(
  a = c(1, 2, 3),
  b = c(2, 50, 3),
  c = c("HAM", "LOBSTER", NA)
)
I can of course do
df %>% group_by(a) %>% filter(b == max(b))
but then I don't satisfy the column c criterion. Some complications:
- The number of records between groups is not consistent.
- If c is NA for all records in the group, then choose the record with the largest b, even though its c is NA.
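For reference, here is a minimal sketch of the rule described above, written under a few assumptions rather than as a known-good solution: it assumes dplyr 1.0 or later, and it assumes that exactly one row per group is wanted even when several rows tie on b. Within each group it sorts the rows so that those with a non-NA c come first, then by descending b, and keeps the first row. The name result is only illustrative.

```r
library(dplyr)

df <- tibble(
  a = c(1, 1, 1, 2, 2, 2, 2, 3),
  b = c(1, 2, 3, 1, 4, 50, 5, 3),
  c = c("PIZZA", "HAM", NA, "COKE", "LOBSTER", "LOBSTER", NA, NA)
)

result <- df %>%
  group_by(a) %>%
  # is.na(c) is FALSE for non-missing rows, and FALSE sorts before TRUE,
  # so rows with a usable c come first; ties are broken by largest b.
  arrange(is.na(c), desc(b), .by_group = TRUE) %>%
  # Keep only the first row of each group after sorting.
  slice(1) %>%
  ungroup()

print(result)
```

Because the NA rows are only pushed to the end of each group rather than dropped, a group whose c values are all NA still yields its largest-b row, and the approach does not depend on the groups having the same number of records.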