Channel: Active questions tagged r - Stack Overflow

How to lag dates based on multiple conditions and reset the lag after it occurs


I have a dataframe of station repairs.

The workflow is like this: mechanics go to a station and press a button, which records an action named Release. After they fix the station, they press the button again, which records an action named Return.

You can see below that rows 1 and 2 form a completed task that took Jane Jetson 10 seconds.

                   dt        name foo_id foo_role bikeId station_name station_id  action
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic  12345   FooStation    1234.89 Release
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic  12345   FooStation    1234.89  Return
3 2019-12-12 13:06:16    John Doe 106338 Mechanic  12345   FooStation    1234.89 Release
4 2019-12-12 13:06:19    John Doe 106338 Mechanic  12345   FooStation    1234.89  Return
5 2019-12-12 13:07:16    John Doe 106338 Mechanic  12345   FooStation    1234.89 Release
6 2019-12-12 14:07:16    John Doe 106338 Mechanic  56789 Some Station    4567.12 Release

What I want to happen:

  • I want to know how long each mechanic took to repair the station, measured from a Release action to the Return that follows it.
  • If a Release does not have a Return, I want to take Sys.time() and subtract dt from it. You'll see this in row 5 and row 6.
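
The two rules above can be sketched with dplyr's lead(): for each Release, look at the next row in the same mechanic/bike/station group and use its timestamp if it is a Return, otherwise fall back to the current time. This is only a sketch under assumptions: the data frame is rebuilt here with a subset of the columns, the grouping uses foo_id, bikeId and station_id, and `now` is a hypothetical fixed stand-in for Sys.time() so the numbers are reproducible.

```r
library(dplyr)

# Sample rows from the question (subset of columns; times built in UTC).
foo <- data.frame(
  dt = as.POSIXct(c(1576173947, 1576173957, 1576173976,
                    1576173979, 1576174036, 1576177636),
                  origin = "1970-01-01", tz = "UTC"),
  foo_id = c(106337L, 106337L, 106338L, 106338L, 106338L, 106338L),
  bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L),
  station_id = c(1234.89, 1234.89, 1234.89, 1234.89, 1234.89, 4567.12),
  action = c("Release", "Return", "Release", "Return", "Release", "Release"),
  stringsAsFactors = FALSE
)

# Assumption: a fixed "current time" instead of Sys.time(), for reproducibility.
now <- as.POSIXct(1576180000, origin = "1970-01-01", tz = "UTC")

repairs <- foo %>%
  arrange(foo_id, bikeId, station_id, dt) %>%
  group_by(foo_id, bikeId, station_id) %>%
  mutate(
    next_action = lead(action),   # action on the following row of the group
    next_dt     = lead(dt),       # timestamp of the following row
    # A Release is closed only if the very next action is a Return.
    closed      = !is.na(next_action) & next_action == "Return",
    timediffsecs = if_else(
      closed,
      as.numeric(difftime(next_dt, dt, units = "secs")),
      as.numeric(difftime(now, dt, units = "secs"))
    )
  ) %>%
  ungroup() %>%
  filter(action == "Release")     # one row per repair cycle
```

With that fixed `now`, the four Release rows get 10, 3, 5964 and 2364 seconds; in real use, `now` would be replaced by Sys.time().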

I did this (I'm not 100% sure I need the previous action, but I included it in case it's needed):

library(dplyr)
library(tidyr)
foo = arrange(foo, foo_id, name, foo_role, bikeId, station_id) %>% 
  group_by(foo_id,name, foo_role, bikeId, station_name,station_id) %>%
  mutate(prev_dt = lag(dt, order_by = foo_id), 
         prev_action = lag(action, order_by=foo_id, default = 'NaN'))
foo$timediffsecs = as.numeric(difftime(foo$dt,foo$prev_dt,units='secs'))
> foo
# A tibble: 6 x 11
# Groups:   foo_id, name, foo_role, bikeId, station_name, station_id [3]
  dt                  name        foo_id foo_role bikeId station_name station_id action  prev_dt             prev_action timediffsecs
  <dttm>              <fct>        <int> <fct>     <int> <fct>             <dbl> <chr>   <dttm>              <chr>              <dbl>
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Release NA                  NaN                   NA
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Return  2019-12-12 13:05:47 Release               10
3 2019-12-12 13:06:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release NA                  NaN                   NA
4 2019-12-12 13:06:19 John Doe    106338 Mechanic  12345 FooStation        1235. Return  2019-12-12 13:06:16 Release                3
5 2019-12-12 13:07:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release 2019-12-12 13:06:19 Return                57
6 2019-12-12 14:07:16 John Doe    106338 Mechanic  56789 Some Station      4567. Release NA                  NaN                   NA

The Problem:

  1. Row 5 starts a new cycle, because a Release and a Return have already happened before it, but timediffsecs recorded 57 secs. In row 5, prev_dt and prev_action should be NA and timediffsecs should be Sys.time() - dt.

  2. Row 6 should have timediffsecs = Sys.time() - dt.

What I'm thinking could work:

I changed the default for prev_action from NA to NaN so I could write some if/else statements, but I am not quite sure how to construct one for this. I also wanted the NA in prev_dt to default to dt, but I ran into problems doing that. The only reason for these changes is so I can use a conditional statement; if that's not needed, there's no need to change the NAs.
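
One way to make the lag reset without a hand-written if/else chain, sketched here under assumptions rather than as a definitive fix, is to number the cycles with cumsum(action == "Release") and lag within each cycle, so a Release never sees the Return of the previous cycle. The names `cycle`, `is_open` and `now` are hypothetical, `now` is a fixed stand-in for Sys.time(), and the sketch assumes each cycle holds a Release followed by at most one Return.

```r
library(dplyr)

# Sample rows from the question (subset of columns; times built in UTC).
foo <- data.frame(
  dt = as.POSIXct(c(1576173947, 1576173957, 1576173976,
                    1576173979, 1576174036, 1576177636),
                  origin = "1970-01-01", tz = "UTC"),
  foo_id = c(106337L, 106337L, 106338L, 106338L, 106338L, 106338L),
  bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L),
  station_id = c(1234.89, 1234.89, 1234.89, 1234.89, 1234.89, 4567.12),
  action = c("Release", "Return", "Release", "Return", "Release", "Release"),
  stringsAsFactors = FALSE
)

# Assumption: a fixed "current time" instead of Sys.time(), for reproducibility.
now <- as.POSIXct(1576180000, origin = "1970-01-01", tz = "UTC")

foo_cycles <- foo %>%
  arrange(foo_id, bikeId, station_id, dt) %>%
  group_by(foo_id, bikeId, station_id) %>%
  # Every Release opens a new cycle within its mechanic/bike/station group.
  mutate(cycle = cumsum(action == "Release")) %>%
  group_by(foo_id, bikeId, station_id, cycle) %>%
  mutate(
    prev_dt     = lag(dt),        # NA on the first row of each cycle
    prev_action = lag(action),
    is_open     = !any(action == "Return"),  # cycle has no Return yet
    timediffsecs = if_else(
      is_open,
      as.numeric(difftime(now, dt, units = "secs")),
      as.numeric(difftime(dt, prev_dt, units = "secs"))
    )
  ) %>%
  ungroup()
```

In this sketch the Release row of a completed cycle keeps NA, its Return row carries the repair time, and an unmatched Release gets `now - dt`; prev_dt and prev_action are NA on row 5 because it is the first row of its own cycle.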

tl;dr: I want timediffsecs to record the correct seconds. Rows 5 and 6 have problems: both should be Sys.time() - dt.

Data:

structure(list(dt = structure(c(1576173947, 1576173957, 1576173976, 
1576173979, 1576174036, 1576177636), class = c("POSIXct", "POSIXt"
), tzone = ""), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jane Jetson", 
"John Doe"), class = "factor"), foo_id = c(106337L, 106337L, 
106338L, 106338L, 106338L, 106338L), foo_role = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = "Mechanic", class = "factor"), 
    bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L
    ), station_name = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("FooStation", 
    "Some Station"), class = "factor"), station_id = c(1234.89, 
    1234.89, 1234.89, 1234.89, 1234.89, 4567.12), action = c("Release", 
    "Return", "Release", "Return", "Release", "Release")), row.names = c(NA, 
-6L), class = "data.frame")
