I have a data frame of phone numbers, emails, and names. Some emails are duplicated, with different spellings of the name. I don't care which name is kept, so I group by email and summarize, taking the first name and phone number in each group. However, some email addresses are missing (NA), and I want to keep those rows from being grouped together so that I don't lose their unique phone numbers. A simplified example of my data, where x stands in for the phone number and y for the email:
library(dplyr)
data <- data.frame(x=c(1,2,3,4,5,5,5,6), y=c("a","b","c",NA,"d","d","d",NA))
# group by email (y) and keep the first phone number (x) in each group
data %>% group_by(y) %>% summarize(x=first(x))
I lose the number 6 when I do this, because both NA rows are collapsed into a single group and only the first x (4) is kept. How do I keep the NAs from being grouped together and summarized?
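
One direction that might work, as a rough sketch rather than a verified solution: build a temporary grouping key that equals the email when it is present and a unique placeholder for each NA row, then group by that key instead of y. The column name `key` and the `"missing_"` prefix are hypothetical names chosen for illustration, and the sketch assumes no real email value starts with that prefix.

```r
library(dplyr)

data <- data.frame(x = c(1, 2, 3, 4, 5, 5, 5, 6),
                   y = c("a", "b", "c", NA, "d", "d", "d", NA))

result <- data %>%
  # `key` is a hypothetical helper column: the email itself when present,
  # otherwise a placeholder made unique by the row number, so that each
  # NA row ends up in its own group.
  mutate(key = if_else(is.na(y),
                       paste0("missing_", row_number()),
                       as.character(y))) %>%
  group_by(key) %>%
  # keep the first phone number and the original (possibly NA) email
  summarize(x = first(x), y = first(y)) %>%
  ungroup() %>%
  # drop the helper column again
  select(y, x)

print(result)
```

With the example data this should leave six rows: one per distinct email plus one for each NA row, so both 4 and 6 survive.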