I have a large integer vector, for example:
set.seed(1)
in_vec <- sample(1:10000, 5000, replace = FALSE)
How can I efficiently collapse this into a data.table that gives the start and end values of every run of consecutive integers? I am currently using the following code:
in_vec <- sort(in_vec) # sort by sequence
library(data.table)
interval_id <- findInterval(in_vec, in_vec[which(c(1, diff(in_vec)) > 1)]) # add unique IDs for sequences
dt <- data.table(vec = in_vec, # make data.table
int_id = interval_id)
long_to_short <- function(sub){ data.table(start = sub$vec[1], end = sub$vec[nrow(sub)]) } # custom function
library(plyr)
output <- ddply(dt, "int_id", long_to_short)
output$int_id <- NULL
However, the vector I am applying this to is very large, so I need to maximise performance. Is there a data.table method for this? Any help would be greatly appreciated!
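For reference, here is a minimal sketch of one vectorised way to get the same start/end table without grouping at all. It is not benchmarked against the code above. The helper name `collapse_runs` is hypothetical, and it assumes an integer vector where a gap of more than 1 between sorted neighbours marks the start of a new run.

```r
library(data.table)

# Hypothetical helper: collapse a vector of integers into runs of
# consecutive values, returning one row per run with its start and end.
collapse_runs <- function(x) {
  x <- sort(x)                 # sort() also drops NAs by default
  is_break <- diff(x) > 1L     # TRUE where the next value starts a new run
  data.table(
    start = x[c(TRUE, is_break)],  # first element of each run
    end   = x[c(is_break, TRUE)]   # last element of each run
  )
}

# Small check: runs are 1-3, 7-8 and 12-12
small <- collapse_runs(c(1L, 2L, 3L, 7L, 8L, 12L))

# Applied to the example vector from above
set.seed(1)
in_vec <- sample(1:10000, 5000, replace = FALSE)
output <- collapse_runs(in_vec)
```

Because the starts and ends are picked out by logical indexing on the sorted vector, no per-group function call is needed, which is where the `ddply` version is likely to spend most of its time.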