I have a dataframe df1 that summarizes different observations of an individual ID over time, rounded to fixed 45-minute intervals starting at 00:00:00 (00:00:00, 00:45:00, etc.). As an example:
df1<- data.frame(DateTime45=c("2017-07-09 00:00:00","2017-07-09 00:45:00","2017-07-09 02:15:00","2017-07-09 03:45:00"),
ID=c("A","A","A","A"),
VariableX=c(0,2,0,4))
df1
DateTime45 ID VariableX
1 2017-07-09 00:00:00 A 0
2 2017-07-09 00:45:00 A 2
3 2017-07-09 02:15:00 A 0
4 2017-07-09 03:45:00 A 4
I have another dataframe df2 with other information (vedba) about this individual, also over time, but in this case not at 45-minute intervals. As an example:
df2<- data.frame(DateTime= c("2017-07-08 23:40:57.245","2017-07-08 23:58:12.945","2017-07-09 00:01:00.345","2017-07-09 00:07:12.845","2017-07-09 00:28:34.845","2017-07-09 00:31:46.567","2017-07-09 00:53:21.345","2017-07-09 01:01:34.545","2017-07-09 01:09:12.246","2017-07-09 01:23:12.321","2017-07-09 01:34:26.687","2017-07-09 01:57:08.687","2017-07-09 02:05:23.789","2017-07-09 02:32:24.789","2017-07-09 02:42:34.536","2017-07-09 02:59:00.098","2017-07-09 03:03:01.434","2017-07-09 03:11:38.987","2017-07-09 03:23:31.345","2017-07-09 03:28:21.345","2017-07-09 03:42:53.345"),
ID=c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A"),
vedba=c(1.87,2.3,0.3,0.67,1.3,2.1,3.6,0.1,0.8,1.3,2.4,1.5,1.23,2.02,1.89,0.78,1.11,2.13,1.20,0.34,0.94))
df2$DateTime<- as.POSIXct(df2$DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")
df2
DateTime ID vedba
1 2017-07-08 23:40:57.244 A 1.87
2 2017-07-08 23:58:12.944 A 2.30
3 2017-07-09 00:01:00.345 A 0.30
4 2017-07-09 00:07:12.845 A 0.67
. . . . .
. . . . .
I want to calculate, for each row in df1, the mean vedba using the values from df2. The key point is that the window for each time in df1 extends from 22 minutes 30 seconds before to 22 minutes 30 seconds after that time (that is, df1$DateTime45 is the centre of the range). For instance, the time range for df1[1,1] (2017-07-09 00:00:00) is from 2017-07-08 23:37:30 to 2017-07-09 00:22:30.
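A minimal sketch of the windowing described above, in base R with no extra packages: for each df1 row it keeps the df2 rows with the same ID whose DateTime lies within 22.5 minutes (1350 s) of DateTime45, then averages their vedba. It assumes both time columns are POSIXct in UTC and that the window is closed at both ends; the helper name window_mean is hypothetical. It loops over df1 rows, so it is meant as a correctness check on small data rather than a fast solution.

```r
# Example data from the question; both time columns parsed as POSIXct in UTC
df1 <- data.frame(
  DateTime45 = as.POSIXct(c("2017-07-09 00:00:00", "2017-07-09 00:45:00",
                            "2017-07-09 02:15:00", "2017-07-09 03:45:00"),
                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  ID = "A",
  VariableX = c(0, 2, 0, 4)
)
df2 <- data.frame(
  DateTime = as.POSIXct(c("2017-07-08 23:40:57.245", "2017-07-08 23:58:12.945",
                          "2017-07-09 00:01:00.345", "2017-07-09 00:07:12.845",
                          "2017-07-09 00:28:34.845", "2017-07-09 00:31:46.567",
                          "2017-07-09 00:53:21.345", "2017-07-09 01:01:34.545",
                          "2017-07-09 01:09:12.246", "2017-07-09 01:23:12.321",
                          "2017-07-09 01:34:26.687", "2017-07-09 01:57:08.687",
                          "2017-07-09 02:05:23.789", "2017-07-09 02:32:24.789",
                          "2017-07-09 02:42:34.536", "2017-07-09 02:59:00.098",
                          "2017-07-09 03:03:01.434", "2017-07-09 03:11:38.987",
                          "2017-07-09 03:23:31.345", "2017-07-09 03:28:21.345",
                          "2017-07-09 03:42:53.345"),
                        format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
  ID = "A",
  vedba = c(1.87, 2.3, 0.3, 0.67, 1.3, 2.1, 3.6, 0.1, 0.8, 1.3, 2.4, 1.5,
            1.23, 2.02, 1.89, 0.78, 1.11, 2.13, 1.20, 0.34, 0.94)
)

half_window <- 22.5 * 60  # 22 min 30 s, in seconds

# Hypothetical helper: mean and count of vedba for one centre time and one ID,
# using the closed window [centre - 22:30, centre + 22:30]
window_mean <- function(centre, id) {
  in_window <- df2$ID == id &
    df2$DateTime >= centre - half_window &
    df2$DateTime <= centre + half_window
  c(meanVedba = mean(df2$vedba[in_window]), n_vedba = sum(in_window))
}

# One call per df1 row; sapply returns a 2 x nrow(df1) matrix, t() makes it one row per df1 row
stats <- t(sapply(seq_len(nrow(df1)),
                  function(i) window_mean(df1$DateTime45[i], df1$ID[i])))
df3 <- cbind(df1, stats)
df3
```

With the example data this gives means of 1.285, 1.775, 1.583 and 0.827 from 4, 4, 3 and 3 rows. A df1 time with no df2 rows in its window would get n_vedba = 0 and meanVedba = NaN (the mean of an empty vector).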
In this example, I would expect to get this:
df3
DateTime45 ID VariableX meanVedba n_vedba
1 2017-07-09 00:00:00 A 0 1.2850000 4
2 2017-07-09 00:45:00 A 2 1.7750000 4
3 2017-07-09 02:15:00 A 0 1.5833333 3
4 2017-07-09 03:45:00 A 4 0.8266667 3
*Note: I include an n_vedba variable to check whether the code is taking the right number of rows from df2.
My attempt was this code:
library(data.table)
library(lubridate)
setDT(df1)[, DateTime45 := ymd_hms(DateTime45)]
setDT(df2)[, dt_floor := round_date(ymd_hms(DateTime), unit = "45 mins")]
df3<- df2[df1, .(meanVedba = mean(vedba),
n_vedba=.N),
on = .(ID, dt_floor = DateTime45), by = .EACHI]
df3
ID dt_floor meanVedba n_vedba
1: A 2017-07-09 00:00:00 0.4850000 2
2: A 2017-07-09 00:45:00 2.3333333 3
3: A 2017-07-09 02:15:00 NA 0
4: A 2017-07-09 03:45:00 0.8266667 3
However, as you can see, I don't get what I expected: only the 03:45:00 row matches df3 above, and the 02:15:00 row gets no rows from df2 at all.
Does anyone know why this happens, and how to change the code to accomplish what I want?
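If the bins should stay on a 45-minute grid that starts at midnight, one option, sketched here under stated assumptions, is to skip round_date() and instead round the number of seconds since the POSIXct origin to the nearest multiple of 2700 s. A day (86,400 s) holds exactly 32 slots of 2700 s and the origin is midnight UTC, so the slots fall on 00:00, 00:45, 01:30, 02:15, ... every day, which is the grid df1 uses. Rounding to the nearest slot is equivalent to the ±22.5-minute window, because consecutive windows tile the timeline without overlap. This assumes times are in UTC; the helper round_45 is hypothetical, and the matching is done in base R (not data.table) so the block runs without extra packages.

```r
# Example data from the question; both time columns parsed as POSIXct in UTC
df1 <- data.frame(
  DateTime45 = as.POSIXct(c("2017-07-09 00:00:00", "2017-07-09 00:45:00",
                            "2017-07-09 02:15:00", "2017-07-09 03:45:00"),
                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  ID = "A",
  VariableX = c(0, 2, 0, 4)
)
df2 <- data.frame(
  DateTime = as.POSIXct(c("2017-07-08 23:40:57.245", "2017-07-08 23:58:12.945",
                          "2017-07-09 00:01:00.345", "2017-07-09 00:07:12.845",
                          "2017-07-09 00:28:34.845", "2017-07-09 00:31:46.567",
                          "2017-07-09 00:53:21.345", "2017-07-09 01:01:34.545",
                          "2017-07-09 01:09:12.246", "2017-07-09 01:23:12.321",
                          "2017-07-09 01:34:26.687", "2017-07-09 01:57:08.687",
                          "2017-07-09 02:05:23.789", "2017-07-09 02:32:24.789",
                          "2017-07-09 02:42:34.536", "2017-07-09 02:59:00.098",
                          "2017-07-09 03:03:01.434", "2017-07-09 03:11:38.987",
                          "2017-07-09 03:23:31.345", "2017-07-09 03:28:21.345",
                          "2017-07-09 03:42:53.345"),
                        format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
  ID = "A",
  vedba = c(1.87, 2.3, 0.3, 0.67, 1.3, 2.1, 3.6, 0.1, 0.8, 1.3, 2.4, 1.5,
            1.23, 2.02, 1.89, 0.78, 1.11, 2.13, 1.20, 0.34, 0.94)
)

# Hypothetical helper: round POSIXct times to the nearest 45-minute slot, counting
# slots from the POSIXct origin (1970-01-01 00:00:00 UTC), i.e. from midnight UTC
round_45 <- function(x, slot = 45 * 60) {
  as.POSIXct(round(as.numeric(x) / slot) * slot, origin = "1970-01-01", tz = "UTC")
}

df2$dt_slot <- round_45(df2$DateTime)

# Match df2 rows to df1 rows on ID and slot time (compared as seconds since the origin)
key1 <- paste(df1$ID, as.numeric(df1$DateTime45))
key2 <- paste(df2$ID, as.numeric(df2$dt_slot))
df1$meanVedba <- vapply(key1, function(k) mean(df2$vedba[key2 == k]), numeric(1),
                        USE.NAMES = FALSE)
df1$n_vedba <- vapply(key1, function(k) sum(key2 == k), integer(1),
                      USE.NAMES = FALSE)
df1
```

In the data.table code above, the analogous change would be to compute dt_floor with round_45(DateTime) instead of round_date(ymd_hms(DateTime), unit = "45 mins") and keep the join on dt_floor = DateTime45; that variant is an untested assumption. As in the window version, a slot with no df2 rows gets n_vedba = 0 and meanVedba = NaN.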
Extra comment
When I use hourly intervals instead of 45-minute intervals, the code I showed works.
- I create the dataframes
df1<- data.frame(DateTime=c("2017-07-09 00:00:00","2017-07-09 01:00:00","2017-07-09 02:00:00","2017-07-09 03:00:00","2017-07-09 04:00:00"),
ID=c("A","A","A","A","A"),
VariableX=c(0,2,0,4,7))
df1$DateTime<- as.POSIXct(df1$DateTime, format="%Y-%m-%d %H:%M:%S",tz="UTC")
df1
DateTime ID VariableX
1 2017-07-09 00:00:00 A 0
2 2017-07-09 01:00:00 A 2
3 2017-07-09 02:00:00 A 0
4 2017-07-09 03:00:00 A 4
5 2017-07-09 04:00:00 A 7
df2<- data.frame(DateTime= c("2017-07-08 23:40:57.245","2017-07-08 23:58:12.945","2017-07-09 00:01:00.345","2017-07-09 00:07:12.845","2017-07-09 00:28:34.845","2017-07-09 00:31:46.567","2017-07-09 00:53:21.345","2017-07-09 01:01:34.545","2017-07-09 01:09:12.246","2017-07-09 01:23:12.321","2017-07-09 01:34:26.687","2017-07-09 01:57:08.687","2017-07-09 02:05:23.789","2017-07-09 02:32:24.789","2017-07-09 02:42:34.536","2017-07-09 02:59:00.098","2017-07-09 03:03:01.434","2017-07-09 03:11:38.987","2017-07-09 03:23:31.345","2017-07-09 03:28:21.345","2017-07-09 03:42:53.345"),
ID=c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A"),
vedba=c(1.87,2.3,0.3,0.67,1.3,2.1,3.6,0.1,0.8,1.3,2.4,1.5,1.23,2.02,1.89,0.78,1.11,2.13,1.20,0.34,0.94))
df2$DateTime<- as.POSIXct(df2$DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")
df2
DateTime ID vedba dt_floor
1: 2017-07-08 23:40:57 A 1.87 2017-07-09 00:00:00
2: 2017-07-08 23:58:12 A 2.30 2017-07-09 00:00:00
3: 2017-07-09 00:01:00 A 0.30 2017-07-09 00:00:00
4: 2017-07-09 00:07:12 A 0.67 2017-07-09 00:00:00
. . . . .
. . . . .
- I calculate the mean vedba for hourly bins:
setDT(df1)[, DateTime45 := ymd_hms(DateTime)]
setDT(df2)[, dt_floor := round_date(ymd_hms(DateTime), unit = "hour")]
df3<- df2[df1, .(meanVedba = mean(vedba),
n_vedba=.N),
on = .(ID, dt_floor = DateTime), by = .EACHI]
df3
ID dt_floor meanVedba n_vedba
1: A 2017-07-09 00:00:00 1.288000 5
2: A 2017-07-09 01:00:00 1.580000 5
3: A 2017-07-09 02:00:00 1.710000 3
4: A 2017-07-09 03:00:00 1.352857 7
5: A 2017-07-09 04:00:00 0.940000 1
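Listing the 45-minute grid that df1 uses shows why the two cases may behave differently. Counting from midnight, only every fourth 45-minute slot falls on a full hour, so slots such as 01:30 and 02:15 sit mid-hour. In the 45-minute attempt, the 02:15:00 row got no matches while 00:00, 00:45 and 03:45 did, which is consistent with round_date() restarting its 45-minute steps at each hour and so producing only :00 and :45. That reading of lubridate's behaviour is an assumption inferred from the output, not something confirmed here. Hourly bins avoid the issue because every df1 time is itself on the hour. The sketch uses only base R.

```r
# The 45-minute grid used by df1, starting at midnight UTC
grid <- seq(as.POSIXct("2017-07-09 00:00:00", tz = "UTC"),
            by = "45 min", length.out = 8)
format(grid, "%H:%M")
# "00:00" "00:45" "01:30" "02:15" "03:00" "03:45" "04:30" "05:15"

# Grid points on a full hour: every fourth slot, since lcm(45, 60) = 180 minutes
on_hour <- format(grid, "%M") == "00"

# Grid points that a grid restarting each hour at :00 and :45 could also produce
# (assumed round_date() behaviour, inferred from the question's 45-minute output)
hour_reset_ok <- format(grid, "%M") %in% c("00", "45")
format(grid[hour_reset_ok], "%H:%M")
# "00:00" "00:45" "03:00" "03:45"
```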