Channel: Active questions tagged r - Stack Overflow

Mutate Next unique values in one hour and expand and aggregate

I am trying to implement a sliding-window aggregation. I put something together with tidyr functions, but I am sure there are better and faster ways to do it.

Let me explain what I want to achieve:

I have an input dataframe dat:

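library(dplyr)  # tibble(), %>%, mutate(), rowwise(), ...
library(tidyr)  # separate(), pivot_longer()

dat <- tibble(timestamp = seq.POSIXt(as.POSIXct("2019-01-01 00:00:00"), as.POSIXct("2019-01-01 02:00:00"), by = "15 min"))
set.seed(42)
dat$value <- sample(1:5, nrow(dat), replace = TRUE)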
dat
# A tibble: 9 x 2
  timestamp           value
  <dttm>              <int>
1 2019-01-01 00:00:00     5
2 2019-01-01 00:15:00     5
3 2019-01-01 00:30:00     2
4 2019-01-01 00:45:00     5
5 2019-01-01 01:00:00     4
6 2019-01-01 01:15:00     3
7 2019-01-01 01:30:00     4
8 2019-01-01 01:45:00     1
9 2019-01-01 02:00:00     4

For every row, I want to find the list of unique values from the value field that appear in the next 60 minutes, ignoring the row's own value if it is present. Let's call that list nextvalue. Then I want to expand each row into pairs of value and nextvalue, group by value and nextvalue, summarise the counts, and sort in descending order of count.
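To make the window concrete, here is a minimal base-R sketch of the lookup for a single row. The helper name next_unique and its window_secs argument are made up for illustration, the data is copied from the printed tibble, and it assumes the window excludes the row itself and includes the row exactly 60 minutes later.

```r
# Timestamps and values copied from the tibble printed above.
ts <- seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
          as.POSIXct("2019-01-01 02:00:00", tz = "UTC"), by = "15 min")
value <- c(5L, 5L, 2L, 5L, 4L, 3L, 4L, 1L, 4L)

# Hypothetical helper: unique values seen in the window_secs seconds after
# row i, with the row's own value removed.
next_unique <- function(i, window_secs = 3600) {
  in_window <- ts > ts[i] & ts <= ts[i] + window_secs
  setdiff(unique(value[in_window]), value[i])
}

next_unique(1)  # rows 2 to 5 hold 5, 2, 5, 4, so this gives 2 4
next_unique(9)  # nothing follows the last row, so this is empty
```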

I read the docs and put together the code below.

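ts <- dat$timestamp   # named ts to avoid masking base::t()
value <- dat$value

# Collapse the values in rows (start + 1)..end into one "|"-separated string.
# For the last row, (start + 1):end counts down (10:9), so value[10] contributes
# the literal string "NA"; this is the source of the "NA" row in the output.
getCI <- function(start, end) {
  paste(value[(start+1):end], collapse = "|")
}

# Column names for separate(); a new name avoids overwriting the built-in LETTERS.
cols <- LETTERS[1:(length(unique(value)) - 1)]

dat %>%
  mutate(time_next = timestamp + 60*60) %>%        # end of the 60-minute window
  rowwise() %>%
  mutate(flag = max(which(time_next >= ts))) %>%   # index of the last row inside the window
  ungroup() %>%
  mutate(row = row_number()) %>%
  rowwise() %>%
  mutate(nextvalue = getCI(row, flag)) %>%
  select(value, nextvalue) %>%
  separate(nextvalue, cols, extra = "warn", fill = "right") %>%
  pivot_longer(all_of(cols), names_to = "Letter", values_to = "nextvalue") %>%
  filter(!is.na(nextvalue)) %>%
  filter(value != nextvalue) %>%                   # drop pairs of a value with itself
  select(value, nextvalue) %>%
  group_by(value, nextvalue) %>%
  summarise(count = n()) %>%
  arrange(desc(count))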
# A tibble: 13 x 3
# Groups:   value [5]
   value nextvalue count
   <int> <chr>     <int>
 1     5 4             4
 2     2 4             2
 3     3 4             2
 4     4 1             2
 5     5 2             2
 6     5 3             2
 7     1 4             1
 8     2 3             1
 9     2 5             1
10     3 1             1
11     4 3             1
12     4 NA            1
13     5 1             1

I would like to see ways to achieve this with much less code and in a much simpler way. I would also be interested in seeing how multicore approaches can be applied to this problem to speed up the whole computation. Please comment.
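As a starting point for discussion, here is one possible base-R sketch that uses parallel::mclapply to process rows on several cores. It is not benchmarked, and the helper name pairs_for_row and the core count are assumptions for illustration. It follows the description above (unique next values per row), so its counts are lower than those of the pipeline output, which counts a next value once per occurrence inside a window.

```r
library(parallel)

# Timestamps and values copied from the tibble printed above.
ts <- seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
          as.POSIXct("2019-01-01 02:00:00", tz = "UTC"), by = "15 min")
value <- c(5L, 5L, 2L, 5L, 4L, 3L, 4L, 1L, 4L)

# Hypothetical helper: all (value, nextvalue) pairs for row i, where nextvalue
# ranges over the unique values in the following 60 minutes, minus the row's own.
pairs_for_row <- function(i, window_secs = 3600) {
  in_window <- ts > ts[i] & ts <= ts[i] + window_secs
  nxt <- setdiff(unique(value[in_window]), value[i])
  data.frame(value = rep(value[i], length(nxt)), nextvalue = nxt)
}

# mclapply() relies on forking, which Windows lacks, so use one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
pairs <- do.call(rbind,
                 mclapply(seq_along(value), pairs_for_row, mc.cores = n_cores))

# Count each (value, nextvalue) pair and sort by descending count.
counts <- aggregate(count ~ value + nextvalue,
                    data = transform(pairs, count = 1L), FUN = sum)
counts <- counts[order(-counts$count), ]
counts
```

With only nine rows the cost of forking will probably outweigh any gain; splitting the row indices into larger chunks per worker would be the natural next step on real data.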

