Channel: Active questions tagged r - Stack Overflow

Calculate the number of occurrences of a specific event in the past AND future with groupings


This question is a modification of a problem I posted here. I have occurrences of a specific type on different days, but this time they are assigned to multiple users. For example:

df = data.frame(user_id = c(rep(1:2, each=5)),
            cancelled_order = c(rep(c(0,1,1,0,0), 2)),
            order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23',  '2015-03-23',
                                   '2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21',  '2015-03-26')))


user_id cancelled_order order_date
      1               0 2015-01-28
      1               1 2015-01-31
      1               1 2015-02-08
      1               0 2015-02-23
      1               0 2015-03-23
      2               0 2015-01-25
      2               1 2015-01-28
      2               1 2015-02-06
      2               0 2015-02-21
      2               0 2015-03-26

I'd like to calculate

1) the number of cancelled orders that each customer is going to have in the next x days (e.g. 7, 14), excluding the current one, and

2) the number of cancelled orders that each customer had in the past x days (e.g. 7, 14), excluding the current one.

The desired output would look like this:

solution
user_id cancelled_order order_date plus14 minus14
      1               0 2015-01-28      2       0
      1               1 2015-01-31      1       0
      1               1 2015-02-08      0       1
      1               0 2015-02-23      0       0
      1               0 2015-03-23      0       0
      2               0 2015-01-25      2       0
      2               1 2015-01-28      1       0
      2               1 2015-02-06      0       1
      2               0 2015-02-21      0       0
      2               0 2015-03-26      0       0

A solution that fit the original problem perfectly was presented by @joel.wilson, using data.table:

library(data.table)
vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) := 
        lapply(vec, function(i) sum(df$cancelled_order[between(df$order_date, 
                                                 order_date, 
                                                 order_date + i, # this part can be changed to reflect the past date ranges
                                                 incbounds = FALSE)])),
        by = order_date]

However, it does not take grouping by user_id into account. When I tried to modify the code by adding this grouping, as by = c("user_id", "order_date") or by = list(user_id, order_date), it did not work. This seems like something very basic. Any hints on how to get around it?
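
One likely reason the by change has no effect: inside the call, df$cancelled_order and df$order_date always refer to the whole table, so the sum is never restricted to one user, whatever by says. Below is a minimal sketch of a per-user variant, assuming the same exclusive window bounds as incbounds = FALSE. The helper name count_between and its offset arguments are made up for illustration and are not part of data.table.

```r
library(data.table)

dt <- data.table(
  user_id = rep(1:2, each = 5),
  cancelled_order = rep(c(0, 1, 1, 0, 0), 2),
  order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23', '2015-03-23',
                         '2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21', '2015-03-26'))
)

days <- 14  # window length in days; both bounds are exclusive

# Hypothetical helper: for each order i, sum the cancellations whose date lies
# strictly between dates[i] + offset_lo and dates[i] + offset_hi.
count_between <- function(dates, cancelled, offset_lo, offset_hi) {
  vapply(seq_along(dates), function(i) {
    sum(cancelled[dates > dates[i] + offset_lo & dates < dates[i] + offset_hi])
  }, numeric(1))
}

# Group by user only; the bare column names (no df$) then refer to
# that user's rows, so the windows never cross users.
dt[, c("plus14", "minus14") := list(
  count_between(order_date, cancelled_order, 0, days),
  count_between(order_date, cancelled_order, -days, 0)
), by = user_id]

print(dt)
```

Because the bounds are strict, another order by the same user on the same date as the current one would also be left out of both counts.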

Also, keep in mind that I'm after any solution that works, even if it is not based on the code above or on data.table at all!

Thanks!
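
For reference, here is a rough base R sketch of the same calculation that does not depend on data.table. It assumes the window excludes both endpoints (strictly more than 0 and strictly fewer than x days away); count_cancelled is a hypothetical helper name introduced only for this example.

```r
df <- data.frame(
  user_id = rep(1:2, each = 5),
  cancelled_order = rep(c(0, 1, 1, 0, 0), 2),
  order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23', '2015-03-23',
                         '2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21', '2015-03-26'))
)

# Hypothetical helper: for every order of one user, count that user's
# cancelled orders in the following ("future") or preceding ("past") window.
count_cancelled <- function(dates, cancelled, days, direction = c("future", "past")) {
  direction <- match.arg(direction)
  vapply(seq_along(dates), function(i) {
    gap <- as.numeric(dates - dates[i])  # signed distance in days from order i
    in_window <- if (direction == "future") {
      gap > 0 & gap < days
    } else {
      gap < 0 & gap > -days
    }
    sum(cancelled[in_window])
  }, numeric(1))
}

# Create the result columns first, then fill them one user at a time.
df$plus14 <- 0
df$minus14 <- 0
for (rows in split(seq_len(nrow(df)), df$user_id)) {
  df$plus14[rows]  <- count_cancelled(df$order_date[rows], df$cancelled_order[rows], 14, "future")
  df$minus14[rows] <- count_cancelled(df$order_date[rows], df$cancelled_order[rows], 14, "past")
}

print(df)
```

Each order is compared against all of the same user's orders, so the work grows quadratically with the number of orders per user. That is fine for small data like the example, but it may be slow for long order histories.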

