Channel: Active questions tagged r - Stack Overflow

How to efficiently collapse a vector of integers into a data.table of sequences, using R?


Suppose I have a large vector of integers, for example:

set.seed(1)
in_vec <- sample(1:10000, 5000, replace = FALSE)

How can I efficiently collapse this into a data.table that gives the start and end values of every run of consecutive integers? I am currently using the following code:

library(data.table)
library(plyr)

in_vec <- sort(in_vec) # sort so that consecutive integers are adjacent
# label each run of consecutive integers with an ID (0 for the first run)
run_starts <- in_vec[which(c(1, diff(in_vec)) > 1)]
interval_id <- findInterval(in_vec, run_starts)
dt <- data.table(vec = in_vec, int_id = interval_id)
# reduce one run to its first and last value
long_to_short <- function(sub) {
  data.table(start = sub$vec[1], end = sub$vec[nrow(sub)])
}
output <- ddply(dt, "int_id", long_to_short)
output$int_id <- NULL

However, the vector I am applying this to is very large, and I therefore need to maximise performance. Is there a data.table method? Any help will be greatly appreciated!
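For reference, one possible direction is sketched below. It is a sketch under assumptions, not a benchmarked solution: it finds run boundaries with `diff()` on the sorted vector and builds the result in one step, so no per-group function call is needed. The helper name `collapse_runs` is hypothetical, and calling `unique()` assumes duplicate values should be ignored (the example vector has none).

```r
library(data.table)

set.seed(1)
in_vec <- sample(1:10000, 5000, replace = FALSE)

# Hypothetical helper: collapse integers into start/end pairs of consecutive runs.
collapse_runs <- function(x) {
  # Assumption: duplicates carry no information, so drop them before sorting.
  x <- sort(unique(x))
  if (length(x) == 0L) {
    return(data.table(start = integer(0), end = integer(0)))
  }
  # TRUE wherever the gap to the next value is not exactly 1, i.e. a run ends there.
  brk <- diff(x) != 1L
  data.table(
    start = x[c(TRUE, brk)],  # first element, plus every element following a break
    end   = x[c(brk, TRUE)]   # every element preceding a break, plus the last element
  )
}

output <- collapse_runs(in_vec)
```

Because the work is a single sort plus vectorised indexing, the cost should be dominated by `sort()`. Whether this is fast enough for a very large vector would need to be measured on the real data.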

