Channel: Active questions tagged r - Stack Overflow

Sub-sampling by ID and Day

I have a large data set that I want to sub-sample. My goal is to have one measurement (X) for each individual (ID) per day (datetime).

My original df looks like this:

> df
     ID        X            datetime
555 287 450767.0 2018-03-02 15:00:00
556 287 450769.4 2018-03-02 18:00:00
557 287 450672.8 2018-03-03 00:00:00
558 287 450686.0 2018-03-03 03:00:00
559 287 450678.9 2018-03-03 09:00:00
560 287 450678.9 2018-03-03 12:00:00
561 287 450277.6 2018-03-03 21:00:00
562 287 450255.8 2018-03-04 00:00:00
563 287 450916.5 2018-03-04 21:00:00
564 287 450802.1 2018-03-05 21:00:00
565 287 450780.0 2018-03-06 00:00:00
566 287 451074.5 2018-03-06 21:00:00
567 287 450279.3 2018-03-07 00:00:00
568 287 450899.6 2018-03-07 21:00:00
569 287 450685.7 2018-03-03 03:00:00
570 287 450678.6 2018-03-03 09:00:00
571 287 450678.6 2018-03-03 12:00:00
572 287 450277.6 2018-03-03 21:00:00
573 287 450255.8 2018-03-04 00:00:00
574 287 450916.5 2018-03-04 21:00:00
575 287 450802.4 2018-03-05 21:00:00
576 287 450780.0 2018-03-06 00:00:00
577 287 451074.8 2018-03-06 21:00:00
578 287 450279.1 2018-03-07 00:00:00
805  41 450911.1 2018-03-07 12:00:00
806  41 450891.1 2018-03-07 15:00:00
807  41 450883.9 2018-03-07 18:00:00

I have tried the following:

df.thin<-df %>% group_by(ID) %>% group_by(day=floor_date(datetime, "day")) %>% sample_n(size = 1)

However, this returns one measurement per day across all individuals, not one measurement for each individual per day.

For example:

> df.thin
# A tibble: 6 x 4
# Groups:   day [6]
  ID          X datetime            day                
  <fct>   <dbl> <dttm>              <dttm>             
1 287   450767. 2018-03-02 15:00:00 2018-03-02 00:00:00
2 287   450673. 2018-03-03 00:00:00 2018-03-03 00:00:00
3 287   450916. 2018-03-04 21:00:00 2018-03-04 00:00:00
4 287   450802. 2018-03-05 21:00:00 2018-03-05 00:00:00
5 287   451075. 2018-03-06 21:00:00 2018-03-06 00:00:00
6 41    450891. 2018-03-07 15:00:00 2018-03-07 00:00:00

My goal is the following:

> df.thin.goal
# A tibble: 7 x 4
# Groups:   day [6]
  ID          X datetime            day                
  <fct>   <dbl> <dttm>              <dttm>             
1 287   450769. 2018-03-02 18:00:00 2018-03-02 00:00:00
2 287   450686. 2018-03-03 03:00:00 2018-03-03 00:00:00
3 287   450916. 2018-03-04 21:00:00 2018-03-04 00:00:00
4 287   450802. 2018-03-05 21:00:00 2018-03-05 00:00:00
5 287   451075. 2018-03-06 21:00:00 2018-03-06 00:00:00
6 287   450279. 2018-03-07 00:00:00 2018-03-07 00:00:00
7 41    450884. 2018-03-07 18:00:00 2018-03-07 00:00:00
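
A likely cause of the result above: by default, a second group_by() call replaces the existing grouping instead of adding to it, so the data ends up grouped by day only (the "Groups: day [6]" header is consistent with this). A minimal sketch of one possible fix is to put both variables in a single group_by() call. The small data frame below is a stand-in built from a few rows of the question's df, and the seed is only there to make the random draw repeatable; both are assumptions for illustration.

```r
library(dplyr)
library(lubridate)

# Stand-in for the question's df (a handful of its rows, for illustration only).
df <- data.frame(
  ID = factor(c(287, 287, 287, 287, 41, 41)),
  X = c(450767.0, 450769.4, 450672.8, 450279.3, 450911.1, 450891.1),
  datetime = as.POSIXct(
    c("2018-03-02 15:00:00", "2018-03-02 18:00:00", "2018-03-03 00:00:00",
      "2018-03-07 00:00:00", "2018-03-07 12:00:00", "2018-03-07 15:00:00"),
    tz = "UTC"
  )
)

set.seed(1)  # assumption: fixed seed so the sampled rows are reproducible

# Group by ID and day together, then draw one random row per ID/day group.
df.thin <- df %>%
  group_by(ID, day = floor_date(datetime, "day")) %>%
  sample_n(size = 1) %>%
  ungroup()

print(df.thin)
```

With this grouping, 2018-03-07 should yield two rows, one for ID 287 and one for ID 41. In dplyr 1.0.0 and later, slice_sample(n = 1) is the newer equivalent of sample_n(size = 1).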
