Channel: Active questions tagged r - Stack Overflow

How to not include NA observations in grouping when using group_by() followed by summarize() with dplyr?


I have a data frame of phone numbers, emails, and names. Some emails are duplicated with different spellings of the name. I don't care which name is kept, so I group by email and summarize, taking the first observation of the name and phone number. However, some email addresses are missing, and I don't want those rows grouped together, because each of them has a unique phone number that I need to keep. A simplified example of my data and what I tried:

library(dplyr)
data <- data.frame(x=c(1,2,3,4,5,5,5,6), y=c("a","b","c",NA,"d","d","d",NA))
# Group by y and keep the first x in each group
data %>% group_by(y) %>% summarize(x=first(x))

I lose the number 6 when I do this, because the two NA rows are treated as a single group and only the first x (4) is kept. How do I keep the NA rows from being grouped together and summarized?
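
One possible approach, offered as a minimal sketch rather than the definitive answer: build a temporary grouping key that equals y for ordinary rows but is unique for each NA row, group by that key, and drop it afterwards. The column name grp and the prefix "NA_row_" are made up for illustration, and the sketch assumes no real value of y already starts with that prefix.

```r
library(dplyr)

data <- data.frame(
  x = c(1, 2, 3, 4, 5, 5, 5, 6),
  y = c("a", "b", "c", NA, "d", "d", "d", NA)
)

result <- data %>%
  # Hypothetical helper column: NA rows get a key based on their row number,
  # so no two NA rows share a group. as.character() guards against y being
  # a factor (the default in R versions before 4.0).
  mutate(grp = ifelse(is.na(y),
                      paste0("NA_row_", row_number()),
                      as.character(y))) %>%
  group_by(grp) %>%
  # Keep the first x per group and carry y along (still NA for the NA rows)
  summarize(x = first(x), y = first(y), .groups = "drop") %>%
  # Remove the temporary key
  select(-grp)

print(result)
```

With the example data this should keep both 4 and 6, each with y still NA, while the three "d" rows collapse into one. The row order of the result depends on how the temporary keys sort, so sort explicitly afterwards if order matters.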

