I have a large data.table. I require:
- the rolling quantile of a numeric column
- the rolling mean of the numeric column applied on values above the (moving) quantile threshold
library(data.table)
set.seed(101)
data <- data.table(group=c(rep("A",10),rep("B",7)), value=rnorm(17))
> data
group value
1: A -0.3260365
2: A 0.5524619
3: A -0.6749438
4: A 0.2143595
5: A 0.3107692
6: A 1.1739663
7: A 0.6187899
8: A -0.1127343
9: A 0.9170283
10: A -0.2232594
11: B 0.5264481
12: B -0.7948444
13: B 1.4277555
14: B -1.4668197
15: B -0.2366834
16: B -0.1933380
17: B -0.8497547
To determine the rolling quantile of the 'value' column, I am using the function runquantile() from the package caTools, which takes ~1 minute to execute on my data set.
For the same rolling window (here k=4), how can I obtain the rolling mean of the values above the moving quantile when computation time is a concern? In the example, the result should look something like the column 'mean_above_q'.
library(caTools)
data[, rolling_q := c(rep(NA_real_, 3), runquantile(value, k = 4, probs = 0.4, endrule = "trim")), by = group]
> data
group value rolling_q mean_above_q
1: A -0.3260365 NA NA
2: A 0.5524619 NA NA
3: A -0.6749438 NA NA
4: A 0.2143595 -0.21795730 -0.50049020
5: A 0.3107692 0.23364141 -0.23029220
6: A 1.1739663 0.23364141 -0.23029220
7: A 0.6187899 0.37237334 0.26256430
8: A -0.1127343 0.37237334 0.09901745
9: A 0.9170283 0.67843754 0.25302780
10: A -0.2232594 0.03357052 -0.16799680
11: B 0.5264481 NA NA
12: B -0.7948444 NA NA
13: B 1.4277555 NA NA
14: B -1.4668197 -0.53058593 -1.13083200
15: B -0.2366834 -0.68321222 -1.13083200
16: B -0.1933380 -0.22801430 -0.85175150
17: B -0.8497547 -0.72714047 -1.15828700
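For reference, here is a minimal base-R sketch of the computation I have in mind, not tuned for speed. The helper name roll_mean_above is hypothetical, and the sketch assumes a finite numeric vector without NAs, right-aligned windows, and quantile()'s default type 7 (which I believe is also the runquantile() default).

```r
# Hypothetical helper: rolling mean of the values strictly above the
# rolling p-quantile, over right-aligned windows of width k.
# Assumes x is a finite numeric vector with no NAs.
roll_mean_above <- function(x, k, p) {
  n <- length(x)
  if (n < k) return(rep(NA_real_, n))
  # embed() puts one window per row (in reversed order, which does not
  # matter for a quantile or a mean)
  w <- embed(x, k)
  # quantile of each window; quantile() defaults to type 7
  q <- apply(w, 1, quantile, probs = p, names = FALSE)
  # q has one entry per row, so it recycles down each column of w
  above <- w > q
  # mean of the selected values per window (NaN if nothing is above)
  m <- rowSums(w * above) / rowSums(above)
  # pad the first k - 1 positions, where the window is incomplete
  c(rep(NA_real_, k - 1), m)
}

# First five values of the example data
x <- c(-0.3260365, 0.5524619, -0.6749438, 0.2143595, 0.3107692)
res <- roll_mean_above(x, k = 4, p = 0.4)
```

On the example table this could be applied per group with data[, mean_above_q := roll_mean_above(value, 4, 0.4), by = group]. Changing the comparison to w <= q would instead average the values at or below the threshold.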
Thank you very much.