Passing multiple column names to "by" in a data.table function

I've read many posts on passing column names to a data.table function, but I did not see a post dealing with passing multiple variables to "by". I commonly use code like this to calculate summary statistics by group.

# Data
library(data.table)
dt=mtcars
setDT(dt)

# Summary Stats Example
dt[cyl==4,.(Count=.N,
    Mean=mean(hp),
    Median=median(hp)),
    by=.(am,vs)]

#    am vs Count   Mean Median
# 1:  1  1     7 80.571     66
# 2:  0  1     3 84.667     95
# 3:  1  0     1 91.000     91

I can't get the following function to work:

# Function
myFun <- function(df,i,j,by){
    df[i==4,.(Count=.N,
      Mean=mean(j),
      Median=median(j)),
      by=.(am,by)]
}
myFun(dt,i='cyl',j='hp',by='vs')

Note that I hard-coded "4" and "am" into the function for this example. get() worked with a single grouping variable in "by", but failed once multiple grouping variables were used. Guidance on how to properly use get/quote/eval/substitute/parse/as.name/etc. when writing data.table functions is appreciated.
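For reference, here is a minimal sketch of one way this kind of function is often written, assuming the column names are passed as character strings. The function name myFun2 and the argument names filter_col, value_col, and group_col are hypothetical, chosen so they cannot be confused with the i, j, and by arguments of data.table itself. It uses get() for the filter and value columns and builds the grouping columns as a character vector before passing it to by, rather than mixing a bare name and a string inside .().

```r
library(data.table)

dt <- as.data.table(mtcars)

# Hypothetical helper: all three column arguments are character strings.
myFun2 <- function(df, filter_col, value_col, group_col) {
  # Build the grouping columns as a character vector; "am" stays hard-coded
  # as in the original example.
  grp <- c("am", group_col)
  df[get(filter_col) == 4,
     .(Count  = .N,
       Mean   = mean(get(value_col)),
       Median = median(get(value_col))),
     by = grp]
}

res <- myFun2(dt, filter_col = "cyl", value_col = "hp", group_col = "vs")
print(res)
```

Under these assumptions the result should match the summary table from the first example. Because group_col is simply concatenated into a character vector, it may also hold more than one column name, e.g. group_col = c("vs", "gear").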

