I have two data frames, one called strain_1 and the other called strain_2. Each data frame has 4 columns (st_A, ed_A, st_B, ed_B) but a different number of rows. st_A, ed_A and st_B, ed_B are the "start" and "end" positions of block_A and block_B, respectively (see image 1 and the example below).
I am looking to identify the common overlapping blocks between strain_1 and strain_2.
Taking an example from image 1:
strain_1:
st_A ed_A st_B ed_B
7 9 123 127
25 28 97 98
35 38 140 145
strain_2:
st_A ed_A st_B ed_B
5 8 124 129
20 25 95 100
36 39 141 147
.. .. .. ..
.. .. .. ..
From this example, we see three overlapping regions (image 1):
The overlapping region is defined by the min value of st_A (or st_B) and the max value of ed_A (or ed_B) for block_A and block_B, respectively (see image 2: green box = common region).
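As a minimal sketch of this definition for a single pair of blocks: the helper names blocks_overlap and common_region are hypothetical (not from any package), and the sketch assumes that positions are inclusive, so two blocks that share only one position still count as overlapping.

```r
# Hypothetical helpers, written only to illustrate the definition above.

# Two inclusive intervals [st1, ed1] and [st2, ed2] overlap when each one
# starts no later than the other one ends (assumption: inclusive positions).
blocks_overlap <- function(st1, ed1, st2, ed2) {
  st1 <= ed2 & st2 <= ed1
}

# Common region as defined above: smallest start and largest end.
common_region <- function(st1, ed1, st2, ed2) {
  c(st = min(st1, st2), ed = max(ed1, ed2))
}

# First row of strain_1 against first row of strain_2 from the example.
blocks_overlap(7, 9, 5, 8)          # block_A: TRUE
common_region(7, 9, 5, 8)           # block_A: st = 5, ed = 9
common_region(123, 127, 124, 129)   # block_B: st = 123, ed = 129
```

Because min and max do not care which strain has the smaller start or the larger end, no branching on the relative order of the four positions is needed for a single pair.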
The objective is to create a new data frame with these common regions (pairs of blocks):
result_desired:
st_A ed_A st_B ed_B
5 9 123 129
20 28 95 100
35 39 140 147
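For reproducibility, here is a base R sketch that rebuilds the three example rows and produces the table above. It assumes that a pair of rows counts as a match only when block_A and block_B both overlap (inclusive positions), and it compares every row of strain_1 with every row of strain_2, so the two data frames do not need the same number of rows. The variable names idx, s1, s2 and hit are only illustrative.

```r
strain_1 <- data.frame(
  st_A = c(7, 25, 35),   ed_A = c(9, 28, 38),
  st_B = c(123, 97, 140), ed_B = c(127, 98, 145)
)
strain_2 <- data.frame(
  st_A = c(5, 20, 36),   ed_A = c(8, 25, 39),
  st_B = c(124, 95, 141), ed_B = c(129, 100, 147)
)

# All (i, j) row pairs: i indexes strain_1, j indexes strain_2.
idx <- expand.grid(i = seq_len(nrow(strain_1)),
                   j = seq_len(nrow(strain_2)))
s1 <- strain_1[idx$i, ]
s2 <- strain_2[idx$j, ]

# Assumption: a pair matches only if block_A AND block_B both overlap.
hit <- s1$st_A <= s2$ed_A & s2$st_A <= s1$ed_A &
       s1$st_B <= s2$ed_B & s2$st_B <= s1$ed_B

# Common region: smallest start and largest end, computed element-wise.
result <- data.frame(
  st_A = pmin(s1$st_A, s2$st_A)[hit],
  ed_A = pmax(s1$ed_A, s2$ed_A)[hit],
  st_B = pmin(s1$st_B, s2$st_B)[hit],
  ed_B = pmax(s1$ed_B, s2$ed_B)[hit]
)
result
```

The number of pairs grows as nrow(strain_1) * nrow(strain_2), so with several thousand rows on each side this can need a lot of memory. An interval join such as data.table::foverlaps may be worth a look in that case, but I have not tested it on this data.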
There are 16 possible combinations (see image 3), depending on the size of each block.
Is there a fast way to do this, given that my data has several thousand rows?
I'm testing with an if/else chain inside a loop (based on image 3), but the two data frames do not have the same number of rows:
for (i in seq_len(nrow(strain_1))) {  # loop over rows; seq_along() on a data frame would loop over its columns
  # case 1
  if (strain_1[i,1] <= strain_2[i,1] && strain_1[i,2] <= strain_2[i,2] && strain_1[i,3] <= strain_2[i,3] && strain_1[i,4] <= strain_2[i,4]) {
    res[i,1] <- paste("start_b1:", strain_1[i,1], "end_b1:", strain_2[i,2], "start_b2 :", strain_1[i,3], "end_b2 :", strain_2[i,4])
  # case 2
  } else if (strain_1[i,1] <= strain_2[i,1] && strain_1[i,2] <= strain_2[i,2] && strain_1[i,3] >= strain_2[i,3] && strain_1[i,4] <= strain_2[i,4]) {
    res[i,1] <- paste("start_b1:", strain_1[i,1], "end_b1:", strain_2[i,2], "start_b2 :", strain_2[i,3], "end_b2 :", strain_1[i,4])
  # . case 3
  # . case 4
  # .
  # .
  # .
  } else if (case 16) {  # placeholder: the condition for case 16 is not written out here
    res[i,1] <- paste("start_b1:", strain_2[i,1], "end_b1:", strain_2[i,2], "start_b2:", strain_2[i,3], "end_b2:", strain_2[i,4])
  } else {
    res[i,1] <- ""
  }
}
Thanks for your help.