Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 206305

Find Islands in GenomicRanges

$
0
0

In GenomicRanges one interesting problem is the identification of gene islands.

I am trying to find the largest subset of ranges in which neighboring ranges dont exceed a certain distance. To solve the issue I have tried to assign groups based on the difference between individual ranges.

I searched within IRanges package for suitable methods but I was not successful so far.

daf <- data.frame(seqnames="ConA",start=c(10,39,56,1000,5000),end=c(19,44,87,1100,5050),ID=paste0("Range",LETTERS[1:5]))
gr <- makeGRangesFromDataFrame(daf,keep.extra.columns=TRUE)
## Order the data frame by start column
oo <- order(daf$start)
daf <- daf[oo,]

# Calculate the distance
dd <- c(0,diff(daf$start))
daf$diff <- dd
daf$group <- rep(1,nrow(daf))

group <- 1
for(i in 1:nrow(daf)){
  if(daf$diff[i] > 500){
    group <- group + 1
  }
  daf$group[i] <- group
}

Based on the assigned group one can find the largest one. Do you know any better solution, avoiding the for loop ?


Viewing all articles
Browse latest Browse all 206305

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>