I've got the following data frame in Python:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'measurement_id': np.repeat([1, 2], [6, 6]),
                             'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]),
                                                    np.repeat([1, 2, 3], [2, 2, 2])]),
                             'obj': list('AB' * 6),
                             'var': [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1]})
First, within each group defined by the object column (obj), I'd like to assign an id to each unique run of the measurement_id and var columns. If any value of those columns changes, a new run starts and it should be assigned a new id. So the rleid_output column would be:
df['rleid_output'] = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 3]
Then, for each group defined by obj and rleid_output, I'd like to check how many minutes (the min column) the run lasted, giving me the expected_output column:
df['expected_output'] = [2, 2, 2, 2, 1, 1, 2, 3, 2, 3, 1, 3]
If this were R, I'd proceed as follows:
library(dplyr)
df <- data.frame(measurement_id = rep(1:2, each = 6),
                 min = rep(rep(1:3, each = 2), 2),
                 object = rep(LETTERS[1:2], 6),
                 var = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1))
df %>%
  group_by(object) %>%
  mutate(rleid = data.table::rleid(measurement_id, var)) %>%
  group_by(object, rleid) %>%
  mutate(expected_output = last(min) - first(min) + 1)
So the main thing I need is an equivalent of R's data.table::rleid that works with a pandas pd.DataFrame.groupby clause. Any ideas how to solve this?
@Edit: a new, updated example of the data frame:
df = pd.DataFrame.from_dict({'measurement_id': np.repeat([1, 2], [6, 6]),
                             'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]),
                                                    np.repeat([1, 2, 3], [2, 2, 2])]),
                             'obj': list('AB' * 6),
                             'var': [1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1]})
df['rleid_output'] = [1, 1, 2, 1, 3, 2, 4, 3, 4, 3, 5, 3]
df['expected_output'] = [1, 2, 1, 2, 1, 1, 2, 3, 2, 3, 1, 3]
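For reference, here is a rough pandas sketch of what the R pipeline above does, in case it helps frame an answer. It is only one possible approach, not necessarily the idiomatic one: the helper name add_run_duration and the output column names rleid and duration are made up for illustration, and it assumes the run columns contain no missing values (a NaN would always start a new run, which may differ from data.table::rleid).

```python
import numpy as np
import pandas as pd


def add_run_duration(frame, group_col="obj",
                     run_cols=("measurement_id", "var"), time_col="min"):
    """Hypothetical helper: return a copy with 'rleid' and 'duration' columns."""
    out = frame.copy()
    cols = list(run_cols)
    # Previous row *within the same group*; the first row of each group gets NaN.
    prev = out.groupby(group_col)[cols].shift()
    # A run starts wherever any run column differs from the previous row.
    # Assumption: no NaN in the run columns (NaN != NaN would start a new run).
    starts = out[cols].ne(prev).any(axis=1)
    # Counting run starts cumulatively per group gives a 1-based run id.
    out["rleid"] = starts.astype(int).groupby(out[group_col]).cumsum()
    # Duration of each run: last minute - first minute + 1, as in the R code.
    minutes = out.groupby([group_col, "rleid"])[time_col]
    out["duration"] = minutes.transform("last") - minutes.transform("first") + 1
    return out


# The updated example data frame from the edit above.
df = pd.DataFrame({
    "measurement_id": np.repeat([1, 2], [6, 6]),
    "min": np.tile(np.repeat([1, 2, 3], 2), 2),
    "obj": list("AB" * 6),
    "var": [1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1],
})
result = add_run_duration(df)
print(result[["obj", "rleid", "duration"]])
```

The sketch relies on the rows already being in time order within each obj group, as they are in both examples; otherwise the frame would need sorting by min first.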