I have a large data set that I want to subsample. My goal is to keep one measurement (X) for each individual (ID) per day, where the day is taken from datetime.
My original df looks like this:
```
> df
     ID        X            datetime
555 287 450767.0 2018-03-02 15:00:00
556 287 450769.4 2018-03-02 18:00:00
557 287 450672.8 2018-03-03 00:00:00
558 287 450686.0 2018-03-03 03:00:00
559 287 450678.9 2018-03-03 09:00:00
560 287 450678.9 2018-03-03 12:00:00
561 287 450277.6 2018-03-03 21:00:00
562 287 450255.8 2018-03-04 00:00:00
563 287 450916.5 2018-03-04 21:00:00
564 287 450802.1 2018-03-05 21:00:00
565 287 450780.0 2018-03-06 00:00:00
566 287 451074.5 2018-03-06 21:00:00
567 287 450279.3 2018-03-07 00:00:00
568 287 450899.6 2018-03-07 21:00:00
569 287 450685.7 2018-03-03 03:00:00
570 287 450678.6 2018-03-03 09:00:00
571 287 450678.6 2018-03-03 12:00:00
572 287 450277.6 2018-03-03 21:00:00
573 287 450255.8 2018-03-04 00:00:00
574 287 450916.5 2018-03-04 21:00:00
575 287 450802.4 2018-03-05 21:00:00
576 287 450780.0 2018-03-06 00:00:00
577 287 451074.8 2018-03-06 21:00:00
578 287 450279.1 2018-03-07 00:00:00
805  41 450911.1 2018-03-07 12:00:00
806  41 450891.1 2018-03-07 15:00:00
807  41 450883.9 2018-03-07 18:00:00
```
I have tried the following:
```r
df.thin <- df %>%
  group_by(ID) %>%
  group_by(day = floor_date(datetime, "day")) %>%
  sample_n(size = 1)
```
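However, this returns one measurement per day rather than one measurement per individual per day. For example, on 2018-03-07 only the row for ID 41 is kept, even though ID 287 was also measured that day.
Example output: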
```
> df.thin
# A tibble: 6 x 4
# Groups:   day [6]
  ID          X datetime            day
  <fct>   <dbl> <dttm>              <dttm>
1 287   450767. 2018-03-02 15:00:00 2018-03-02 00:00:00
2 287   450673. 2018-03-03 00:00:00 2018-03-03 00:00:00
3 287   450916. 2018-03-04 21:00:00 2018-03-04 00:00:00
4 287   450802. 2018-03-05 21:00:00 2018-03-05 00:00:00
5 287   451075. 2018-03-06 21:00:00 2018-03-06 00:00:00
6 41    450891. 2018-03-07 15:00:00 2018-03-07 00:00:00
```
What I want instead is one row per individual per day, like this:
```
> df.thin.goal
# A tibble: 7 x 4
# Groups:   day [6]
  ID          X datetime            day
  <fct>   <dbl> <dttm>              <dttm>
1 287   450769. 2018-03-02 18:00:00 2018-03-02 00:00:00
2 287   450686. 2018-03-03 03:00:00 2018-03-03 00:00:00
3 287   450916. 2018-03-04 21:00:00 2018-03-04 00:00:00
4 287   450802. 2018-03-05 21:00:00 2018-03-05 00:00:00
5 287   451075. 2018-03-06 21:00:00 2018-03-06 00:00:00
6 287   450279. 2018-03-07 00:00:00 2018-03-07 00:00:00
7 41    450884. 2018-03-07 18:00:00 2018-03-07 00:00:00
```
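A likely cause of the behaviour above is that in dplyr a second `group_by()` call replaces the existing grouping unless `.add = TRUE` is passed, so the `ID` grouping is dropped and only `day` remains. A minimal sketch of one possible fix, assuming dplyr >= 1.0.0 (for `slice_sample()`) and lubridate, groups by both columns in a single call. The small `df` below reuses a few rows from the data above; `set.seed(1)` is only there to make the random pick reproducible.

```r
library(dplyr)
library(lubridate)

# A few rows copied from the example data above
df <- data.frame(
  ID = factor(c(287, 287, 287, 287, 41, 41, 41)),
  X = c(450767.0, 450769.4, 450279.3, 450899.6, 450911.1, 450891.1, 450883.9),
  datetime = as.POSIXct(c("2018-03-02 15:00:00", "2018-03-02 18:00:00",
                          "2018-03-07 00:00:00", "2018-03-07 21:00:00",
                          "2018-03-07 12:00:00", "2018-03-07 15:00:00",
                          "2018-03-07 18:00:00"), tz = "UTC")
)

set.seed(1)  # reproducible random sampling

df.thin <- df %>%
  # Group by individual AND calendar day in one call,
  # so the day grouping does not replace the ID grouping
  group_by(ID, day = floor_date(datetime, "day")) %>%
  # Keep one randomly chosen row per (ID, day) group
  slice_sample(n = 1) %>%
  ungroup()

df.thin
```

With these rows the result should contain three rows: one for ID 287 on 2018-03-02, one for ID 287 on 2018-03-07 and one for ID 41 on 2018-03-07. Keeping the original two-step form should also work if the second call is written as `group_by(day = floor_date(datetime, "day"), .add = TRUE)`, and `sample_n(size = 1)` still works in place of `slice_sample(n = 1)`, although it is superseded.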