I have a dataset, df, with 10,000 rows. Here is a sample:
DateA
9/9/2019 7:52:16 PM
9/9/2019 7:52:16 PM
9/9/2019 7:52:17 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:19 PM
9/10/2019 1:02:23 AM
9/10/2019 1:02:25 AM
9/10/2019 1:02:26 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:29 AM
9/10/2019 1:02:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:31 AM
9/10/2019 1:03:32 AM
9/10/2019 4:18:48 AM
9/10/2019 4:18:50 AM
9/10/2019 4:18:51 AM
I would like this output:
Group Duration
a 3 sec
b 6 sec
c 3 sec
d 3 sec
I would like the threshold to be 1 minute (60 seconds). If a lapse of more than 60 seconds is detected between consecutive timestamps, a new group should start, and each group's duration should be reported.
dput:
structure(list(DateA = structure(c(12L, 12L, 13L, 14L, 14L, 15L,
1L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("9/10/2019 1:02:23 AM",
"9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM", "9/10/2019 1:02:27 AM",
"9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM", "9/10/2019 1:03:31 AM",
"9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM", "9/10/2019 4:18:50 AM",
"9/10/2019 4:18:51 AM", "9/9/2019 7:52:16 PM", "9/9/2019 7:52:17 PM",
"9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
I have tried:
thresh1 <- 60
library(data.table)
setDT(df)[, DateA := as.ITime(as.character(DateA))][
  , .(Duration = difftime(max(as.POSIXct(DateA)), min(as.POSIXct(DateA)), units = "secs")),
  by = .(group = letters[cumsum(c(TRUE, diff(DateA) > thresh1))])]
However, something is wrong, because the result has only one row:
group Duration
a 0
I am not sure what I am doing wrong. Any suggestions are appreciated.
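For reference, here is a minimal base R sketch of the grouping rule described above. It is not a fix for the data.table attempt. It is one possible approach under stated assumptions: the timestamps are parsed with an explicit format, `tz = "UTC"` is chosen arbitrarily to sidestep daylight-saving shifts, and `%p` assumes a locale that recognises AM/PM. The names `thresh`, `ts`, `gap`, `grp` and `res` are made up for illustration. Two things stand out in the sample. `ITime` holds only the time of day, so the date part is not kept, whereas `POSIXct` keeps both. Also, the gap between 1:02:29 and 1:03:29 is exactly 60 seconds, so the desired four-group output appears to need `>=` rather than `>`.

```r
# Sample timestamps copied from the question
df <- data.frame(DateA = c(
  "9/9/2019 7:52:16 PM", "9/9/2019 7:52:16 PM", "9/9/2019 7:52:17 PM",
  "9/9/2019 7:52:18 PM", "9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM",
  "9/10/2019 1:02:23 AM", "9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM",
  "9/10/2019 1:02:27 AM", "9/10/2019 1:02:27 AM", "9/10/2019 1:02:29 AM",
  "9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM", "9/10/2019 1:03:29 AM",
  "9/10/2019 1:03:31 AM", "9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM",
  "9/10/2019 4:18:50 AM", "9/10/2019 4:18:51 AM"
), stringsAsFactors = FALSE)

thresh <- 60  # threshold in seconds (hypothetical name)

# Parse date AND time; the time zone is an assumption
ts <- as.POSIXct(as.character(df$DateA),
                 format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Seconds elapsed since the previous row (0 for the first row)
gap <- c(0, diff(as.numeric(ts)))

# Start a new group whenever the gap reaches the threshold.
# letters only covers 26 groups; use the integer id for larger data.
grp <- letters[cumsum(gap >= thresh) + 1]

# Duration of each group = last timestamp minus first, in seconds
dur <- tapply(as.numeric(ts), grp, function(x) max(x) - min(x))
res <- data.frame(Group = names(dur), Duration = as.vector(dur),
                  stringsAsFactors = FALSE)
print(res)
#   Group Duration
# 1     a        3
# 2     b        6
# 3     c        3
# 4     d        3
```

With `gap > thresh` instead, the second and third groups merge, because their boundary is a gap of exactly 60 seconds.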