I've read many posts on passing column names to a data.table function, but I haven't found one that deals with passing multiple variables to "by". I commonly use code like this to calculate summary statistics by group:
# Data
library(data.table)
dt=mtcars
setDT(dt)
# Summary Stats Example
dt[cyl==4,.(Count=.N,
            Mean=mean(hp),
            Median=median(hp)),
   by=.(am,vs)]
#    am vs Count   Mean Median
# 1:  1  1     7 80.571     66
# 2:  0  1     3 84.667     95
# 3:  1  0     1 91.000     91
I can't get the following function to work:
# Function
myFun <- function(df,i,j,by){
  df[i==4,.(Count=.N,
            Mean=mean(j),
            Median=median(j)),
     by=.(am,by)]
}
myFun(dt,i='cyl',j='hp',by='vs')
Note that I hard-coded "4" and "am" into the function for this example. Using get() worked with a single grouping variable in "by", but failed once multiple grouping variables were involved. Guidance on how to properly use get/quote/eval/substitute/parse/as.name/etc. when writing data.table functions would be appreciated.
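For context, here is a minimal sketch of what a working single-grouping-variable version might look like. It is only an illustration under stated assumptions: the function name myFunOneBy and the argument names filter_col, value_col and by_col are made up for this example, get() is used to look up the filter and value columns from strings, and the grouping column is passed to "by" as a plain character string.

```r
library(data.table)

# Work on a copy so the built-in mtcars data set is left untouched
dt <- as.data.table(mtcars)

# Hypothetical single-"by" variant (names are illustrative only).
# filter_col, value_col and by_col are column names given as strings.
myFunOneBy <- function(df, filter_col, value_col, by_col) {
  df[get(filter_col) == 4,                 # "4" is hard-coded, as in the question
     .(Count  = .N,
       Mean   = mean(get(value_col)),      # get() resolves the string to a column
       Median = median(get(value_col))),
     by = by_col]                          # a character string naming one column
}

res <- myFunOneBy(dt, filter_col = "cyl", value_col = "hp", by_col = "vs")
print(res)
```

This covers only the one-variable case; it does not show how to combine the hard-coded "am" with a second grouping variable passed in as an argument, which is the part I am asking about.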