Channel: Active questions tagged r - Stack Overflow
R group_by() + rleid() equivalent in Python

I have the following data frame in Python:

import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({'measurement_id': np.repeat([1, 2], [6, 6]),
                         'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]),
                                                np.repeat([1, 2, 3], [2, 2, 2])]),
                         'obj': list('AB' * 6),
                         'var': [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1]})

First, within each group defined by obj, I'd like to assign an id to each run of consecutive rows that share the same measurement_id and var values. Whenever the value in either of those columns changes, a new run starts and should be assigned a new id. So the expected result is:

df['rleid_output'] = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 3]
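One possible way to get such per-group run ids in pandas is to compare each row with the previous row of the same group and take a cumulative sum of the change flags. This is only a sketch: `rleid_within` is a hypothetical helper name, not a pandas or data.table function, and it assumes the key columns contain no missing values (NaN never compares equal to itself, so every NaN row would start a new run).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'measurement_id': np.repeat([1, 2], [6, 6]),
    'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]),
                           np.repeat([1, 2, 3], [2, 2, 2])]),
    'obj': list('AB' * 6),
    'var': [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1],
})


def rleid_within(frame, by, cols):
    """Hypothetical helper: run-length id of `cols`, restarted for each `by` group."""
    # Previous row's values within the same group (NaN on each group's first row).
    previous = frame.groupby(by)[cols].shift()
    # A run starts wherever any key column differs from the previous row.
    starts_run = frame[cols].ne(previous).any(axis=1)
    # Counting the run starts within each group yields ids 1, 2, 3, ...
    return starts_run.astype(int).groupby(frame[by]).cumsum()


df['rleid'] = rleid_within(df, 'obj', ['measurement_id', 'var'])
print(df['rleid'].tolist())
# [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 3]
```

The first row of every group compares against NaN, so it always counts as a run start and the ids begin at 1 in each group, as with rleid under group_by in R.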

Then, for each run (that is, each combination of obj and rleid_output), I'd like to compute how many minutes (the min column) the run lasted, which gives the expected_output column:

df['expected_output'] = [2, 2, 2, 2, 1, 1, 2, 3, 2, 3, 1, 3]
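Given the run ids, the duration step could be sketched with `groupby(...).transform`, mirroring the `last(min) - first(min) + 1` idea. The column name `run_minutes` is an assumption chosen for illustration, and the run ids are taken directly from the rleid_output values listed above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'measurement_id': np.repeat([1, 2], [6, 6]),
    'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]),
                           np.repeat([1, 2, 3], [2, 2, 2])]),
    'obj': list('AB' * 6),
    'var': [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1],
    # Run ids copied from the question's rleid_output column.
    'rleid_output': [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 3],
})

# One group per run: the run id is only unique within an obj group,
# so both columns are needed as grouping keys.
runs = df.groupby(['obj', 'rleid_output'])['min']

# transform() broadcasts the per-run value back to every row of the run.
df['run_minutes'] = runs.transform('last') - runs.transform('first') + 1
print(df['run_minutes'].tolist())
# [2, 2, 2, 2, 1, 1, 2, 3, 2, 3, 1, 3]
```

This relies on the rows being ordered by min within each run, which holds for the data shown here.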

If it was R, I'd proceed as follows:

library(dplyr)

df <- data.frame(measurement_id = rep(1:2, each = 6),
           min = rep(rep(1:3, each = 2), 2),
           object = rep(LETTERS[1:2], 6),
           var = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1))
df %>%
  group_by(object) %>%
  mutate(rleid = data.table::rleid(measurement_id, var)) %>%
  group_by(object, rleid) %>%
  mutate(expected_output = last(min) - first(min) + 1)

So the main thing I need is an equivalent of R's data.table::rleid that works together with pandas' pd.DataFrame.groupby. Any ideas how to solve this?

Edit: a new, updated example of the data frame:

df = pd.DataFrame.from_dict({'measurement_id': np.repeat([1, 2], [6, 6]),
                         'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]), 
                                                np.repeat([1, 2, 3], [2, 2, 2])]),
                         'obj': list('AB' * 6),
                         'var': [1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1]})
df['rleid_output'] = [1, 1, 2, 1, 3, 2, 4, 3, 4, 3, 5, 3]
df['expected_output'] = [1, 2, 1, 2, 1, 1, 2, 3, 2, 3, 1, 3]
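As a sanity check on this updated example, both steps can be combined and compared against the columns given above. Again this is a sketch under assumptions: `add_run_columns`, `rleid` and `run_minutes` are hypothetical names introduced here, and the key columns are assumed to be free of missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'measurement_id': np.repeat([1, 2], [6, 6]),
    'min': np.concatenate([np.repeat([1, 2, 3], [2, 2, 2]),
                           np.repeat([1, 2, 3], [2, 2, 2])]),
    'obj': list('AB' * 6),
    'var': [1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1],
    'rleid_output': [1, 1, 2, 1, 3, 2, 4, 3, 4, 3, 5, 3],
    'expected_output': [1, 2, 1, 2, 1, 1, 2, 3, 2, 3, 1, 3],
})


def add_run_columns(frame, by, cols, time_col):
    """Hypothetical helper: add a per-group run id and the span of each run."""
    out = frame.copy()
    # Flag rows where any key column differs from the previous row in the group.
    starts_run = out[cols].ne(out.groupby(by)[cols].shift()).any(axis=1)
    out['rleid'] = starts_run.astype(int).groupby(out[by]).cumsum()
    # Span of each run: last minus first value of time_col, inclusive.
    runs = out.groupby([by, 'rleid'])[time_col]
    out['run_minutes'] = runs.transform('last') - runs.transform('first') + 1
    return out


result = add_run_columns(df, 'obj', ['measurement_id', 'var'], 'min')
rleid_matches = bool(result['rleid'].eq(result['rleid_output']).all())
minutes_match = bool(result['run_minutes'].eq(result['expected_output']).all())
print(rleid_matches, minutes_match)
# True True
```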
