I have a dataframe of station repairs.
The workflow is like this: mechanics go to a station and press a button that records an action named Release. After they fix the station, they press the button again and the action is now Return.
You can see below that row 1 and row 2 form a completed task that took Jane Jetson 10 seconds to do.
                   dt        name foo_id foo_role bikeId station_name station_id  action
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic  12345   FooStation    1234.89 Release
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic  12345   FooStation    1234.89  Return
3 2019-12-12 13:06:16    John Doe 106338 Mechanic  12345   FooStation    1234.89 Release
4 2019-12-12 13:06:19    John Doe 106338 Mechanic  12345   FooStation    1234.89  Return
5 2019-12-12 13:07:16    John Doe 106338 Mechanic  12345   FooStation    1234.89 Release
6 2019-12-12 14:07:16    John Doe 106338 Mechanic  56789 Some Station    4567.12 Release
What I want to happen:
- I want to know how long each mechanic took to repair the station, using the action Release and then the following Return.
- If a Release does not have a Return, I want to take Sys.time() and subtract dt from it. You'll see this case in row 5 and row 6.
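A minimal sketch of the rule described in the list above, assuming dplyr's lead() is used to look at the next row within each mechanic/bike/station group. The reduced toy data frame, the fixed `now` value (standing in for Sys.time() so the numbers are reproducible) and the column names next_dt, next_action, closed and end_dt are assumptions made for illustration only.

```r
library(dplyr)

# Toy data mirroring the question's table, reduced to the columns used here.
foo <- data.frame(
  dt = as.POSIXct(c("2019-12-12 13:05:47", "2019-12-12 13:05:57",
                    "2019-12-12 13:06:16", "2019-12-12 13:06:19",
                    "2019-12-12 13:07:16", "2019-12-12 14:07:16"), tz = "UTC"),
  foo_id = c(106337L, 106337L, 106338L, 106338L, 106338L, 106338L),
  bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L),
  station_id = c(1234.89, 1234.89, 1234.89, 1234.89, 1234.89, 4567.12),
  action = c("Release", "Return", "Release", "Return", "Release", "Release"),
  stringsAsFactors = FALSE
)

# Assumption: a fixed "now" keeps the example reproducible; use Sys.time() in practice.
now <- as.POSIXct("2019-12-12 15:00:00", tz = "UTC")

cycles <- foo %>%
  arrange(foo_id, bikeId, station_id, dt) %>%
  group_by(foo_id, bikeId, station_id) %>%
  mutate(
    next_dt = lead(dt),                 # timestamp of the following row in the group
    next_action = lead(action),         # action of the following row in the group
    closed = !is.na(next_action) & next_action == "Return",
    end_dt = if_else(closed, next_dt, now),  # open cycles run until "now"
    timediffsecs = as.numeric(difftime(end_dt, dt, units = "secs"))
  ) %>%
  ungroup() %>%
  filter(action == "Release")           # keep one row per repair cycle

print(cycles[, c("dt", "action", "closed", "timediffsecs")])
```

Looking forward from each Release, rather than backward from each Return, means an unmatched Release needs no special previous-row handling: it simply has no Return after it.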
I did this: (I'm not 100% sure I need the previous action, but I included it in case it's needed.)
library(dplyr)
library(tidyr)

# Sort, group by mechanic/bike/station, then pull the previous row's values
foo = arrange(foo, foo_id, name, foo_role, bikeId, station_id) %>%
  group_by(foo_id, name, foo_role, bikeId, station_name, station_id) %>%
  mutate(prev_dt = lag(dt, order_by = foo_id),
         prev_action = lag(action, order_by = foo_id, default = 'NaN'))

# Seconds elapsed since the previous row in the same group
foo$timediffsecs = as.numeric(difftime(foo$dt, foo$prev_dt, units = 'secs'))
> foo
# A tibble: 6 x 11
# Groups: foo_id, name, foo_role, bikeId, station_name, station_id [3]
  dt                  name        foo_id foo_role bikeId station_name station_id action  prev_dt             prev_action timediffsecs
  <dttm>              <fct>        <int> <fct>     <int> <fct>             <dbl> <chr>   <dttm>              <chr>              <dbl>
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Release NA                  NaN                   NA
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Return  2019-12-12 13:05:47 Release               10
3 2019-12-12 13:06:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release NA                  NaN                   NA
4 2019-12-12 13:06:19 John Doe    106338 Mechanic  12345 FooStation        1235. Return  2019-12-12 13:06:16 Release                3
5 2019-12-12 13:07:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release 2019-12-12 13:06:19 Return                57
6 2019-12-12 14:07:16 John Doe    106338 Mechanic  56789 Some Station      4567. Release NA                  NaN                   NA
The Problem:
row 5 starts a new cycle, because a Release and its Return have already happened before it, yet timediffsecs recorded 57 secs. In row 5, prev_dt and prev_action should be NA and timediffsecs = Sys.time() - dt. row 6 should also have timediffsecs = Sys.time() - dt.
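One possible way to stop lag() from reaching back across a finished cycle is to number the cycles first. The sketch below assumes that every Release opens a new cycle, so a running count of Release rows can serve as a cycle id. The data frame `john` is a hypothetical miniature of John Doe's FooStation rows, and `cycle` is a made-up column name.

```r
library(dplyr)

# Hypothetical miniature: John Doe's three rows at FooStation (rows 3 to 5).
john <- data.frame(
  dt = as.POSIXct(c("2019-12-12 13:06:16", "2019-12-12 13:06:19",
                    "2019-12-12 13:07:16"), tz = "UTC"),
  action = c("Release", "Return", "Release"),
  stringsAsFactors = FALSE
)

with_cycles <- john %>%
  arrange(dt) %>%
  mutate(cycle = cumsum(action == "Release")) %>%  # assumption: each Release opens a cycle
  group_by(cycle) %>%
  mutate(prev_dt = lag(dt),            # lag() now only sees rows of the same cycle
         prev_action = lag(action)) %>%
  ungroup()

print(with_cycles)
```

With the cycle id in the grouping, the last Release gets NA for both prev_dt and prev_action instead of inheriting the earlier Return.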
What I'm thinking could work:
I changed the prev_action NA to NaN so I could do some if/else statements, but I am not quite sure how to construct one for this. I also want the NA in prev_dt to default to dt, but there were problems doing that. The reason I want to try this is so I can use a conditional statement, but if that's not needed, there's no need to change the NAs.
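For the two ideas in this paragraph, a small sketch under stated assumptions: dplyr's coalesce() can fill the NA that lag() leaves in the first slot with dt itself, and is.na() can be used directly inside a condition, so the NA does not have to be turned into the string 'NaN'. The vectors below and the name starts_cycle are hypothetical and only mimic one mechanic's rows.

```r
library(dplyr)

# Hypothetical miniature of one mechanic's rows, already sorted by time.
dt <- as.POSIXct(c("2019-12-12 13:06:16", "2019-12-12 13:06:19",
                   "2019-12-12 13:07:16"), tz = "UTC")
action <- c("Release", "Return", "Release")

# lag() leaves NA in the first slot; coalesce() replaces that NA with dt.
prev_dt <- coalesce(lag(dt), dt)

# Keep the NA as a real NA; no 'NaN' default is needed for a conditional.
prev_action <- lag(action)

# A Release starts a cycle when nothing precedes it or the previous action was a Return.
starts_cycle <- action == "Release" &
  (is.na(prev_action) | prev_action == "Return")

print(data.frame(dt, action, prev_dt, prev_action, starts_cycle))
```

Because `TRUE | NA` evaluates to TRUE in R, the is.na() check is enough to keep the condition from returning NA on the first row.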
tl;dr: I want timediffsecs to record the correct seconds. row 5 and row 6 have problems: both should return Sys.time() - dt.
Data:
structure(list(dt = structure(c(1576173947, 1576173957, 1576173976,
1576173979, 1576174036, 1576177636), class = c("POSIXct", "POSIXt"
), tzone = ""), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jane Jetson",
"John Doe"), class = "factor"), foo_id = c(106337L, 106337L,
106338L, 106338L, 106338L, 106338L), foo_role = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "Mechanic", class = "factor"),
bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L
), station_name = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("FooStation",
"Some Station"), class = "factor"), station_id = c(1234.89,
1234.89, 1234.89, 1234.89, 1234.89, 4567.12), action = c("Release",
"Return", "Release", "Return", "Release", "Release")), row.names = c(NA,
-6L), class = "data.frame")