I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions.
Using the mtcars dataset, if I want to look at the 25th, 50th, 75th percentiles and the mean and count of miles per gallon ("mpg") by the number of cylinders ("cyl"), I use the following code:
library(dplyr)
library(tidyr)
# load data
data("mtcars")
# Percentiles used in calculation
p <- c(.25,.5,.75)
# old dplyr solution
mtcars %>% group_by(cyl) %>%
do(data.frame(p=p, stats=quantile(.$mpg, probs=p),
n = length(.$mpg), avg = mean(.$mpg))) %>%
spread(p, stats) %>%
select(1, 4:6, 3, 2)
# note: the select and spread statements are just to get the data into
# the format in which I'd like to see it, but are not critical
Is there a way I can do this more cleanly with dplyr using some of the summary functions (n_tiles, percent_rank, etc.)? By cleanly, I mean without the "do" statement.
Thank you