this question is a modification of a problem I posted here where I have occurrences of a specific type on different days, but this time they are assigned to multiple users, for example:
df = data.frame(user_id = c(rep(1:2, each=5)),
cancelled_order = c(rep(c(0,1,1,0,0), 2)),
order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23', '2015-03-23',
'2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21', '2015-03-26')))
user_id cancelled_order order_date
1 0 2015-01-28
1 1 2015-01-31
1 1 2015-02-08
1 0 2015-02-23
1 0 2015-03-23
2 0 2015-01-25
2 1 2015-01-28
2 1 2015-02-06
2 0 2015-02-21
2 0 2015-03-26
I'd like to calculate
1) the number of cancelled orders that each customer is going to have in the next x days (e.g. 7, 14), excluding the current one and
1) the number of cancelled orders that each customer had in the past x days (e.g. 7, 14) , excluding the current one.
The desired output would look like this:
solution
user_id cancelled_order order_date plus14 minus14
1 0 2015-01-28 2 0
1 1 2015-01-31 1 0
1 1 2015-02-08 0 1
1 0 2015-02-23 0 0
1 0 2015-03-23 0 0
2 0 2015-01-25 2 0
2 1 2015-01-28 1 0
2 1 2015-02-06 0 1
2 0 2015-02-21 0 0
2 0 2015-03-26 0 0
The solution that is perfectly fit for this purpose was presented by @joel.wilson using data.table
library(data.table)
vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) :=
lapply(vec, function(i) sum(df$cancelled_order[between(df$order_date,
order_date,
order_date + i, # this part can be changed to reflect the past date ranges
incbounds = FALSE)])),
by = order_date]
However, it does not take into account grouping by user_id
. When I tried to modify the formula by adding this grouping as by = c("user_id", "order_date")
or by = list(user_id, order_date)
, it did not work. It seems it is something very basic, any hints on how to get around this detail?
Also, keep in mind that I'm after a solution that works, even if it is not based on the above code or data.table
at all!
Thanks!