I have a vector of 10+ million elements.
I need to find all elements satisfying a given condition A (e.g. X < 2, at rows i %in% c(6, 10)).
From each of these elements I need to scan the vector backwards and flag all preceding elements for as long as they satisfy condition B (e.g. X < 4, for i %in% c(8:10) and c(5:6)).
For example, given the following X column, I would like the final result to be the flag2 column. I am not interested in elements where B is true unless they lead up to an element satisfying A through an unbroken run of B elements; therefore row i == 2 has flag2 == 0.
i | X | flag1 | flag2
---------------------------
1 | 4 | 0 | 0
2 | 3 | 0 | 0
3 | 6 | 0 | 0
4 | 9 | 0 | 0
5 | 3 | 0 | 1
6 | 1 | 1 | 1
7 | 9 | 0 | 0
8 | 3 | 0 | 1
9 | 2 | 0 | 1
10 | 1 | 1 | 1
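For reproducibility, the example table can be rebuilt as a data frame. The name my_data matches the code in this question; expected_flag2 is a hypothetical helper name that simply holds the flag2 column copied from the table, so any candidate solution can be checked against it.

```r
# Example data copied from the table above
my_data <- data.frame(
  i = 1:10,
  X = c(4, 3, 6, 9, 3, 1, 9, 3, 2, 1)
)

# Desired final result (the flag2 column of the table);
# `expected_flag2` is a hypothetical name used only for checking
expected_flag2 <- c(0, 0, 0, 0, 1, 1, 0, 1, 1, 1)
```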
The first operation to produce flag1 is simple and very fast:
# locate all occurrences of X < 2
# (TRUE rather than T, since T can be reassigned)
my_data$flag1 = dplyr::case_when(my_data$X < 2 ~ 1, TRUE ~ 0)
I have implemented the second operation with the following for loop, which gives the desired result but is prohibitively time-consuming given the amount of data.
# flag all elements preceding the ones already flagged while they satisfy `X < 4`
my_data$flag2 = my_data$flag1
for(i in nrow(my_data):2){
  if((my_data[i,]$flag2 == 1) & (my_data[i-1,]$X < 4)){
    my_data[i-1,]$flag2 = 1
  }
}
Is there any way I could do this more efficiently?
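One direction that might avoid the row-by-row loop, offered as a sketch rather than a definitive answer: for each position, find the index of the next element satisfying A and the index of the next element failing B (both at or after that position), then flag the position when the former comes no later than the latter. Both indices can be obtained with a reversed cummin, so the whole thing stays vectorized in base R. The function name flag_backward and its arguments a_max and b_max are hypothetical, and the sketch assumes X contains no NA values.

```r
# Sketch: vectorized backward flagging (hypothetical helper, assumes no NA in x)
flag_backward <- function(x, a_max = 2, b_max = 4) {
  n <- length(x)
  i <- seq_len(n)
  is_a <- x < a_max   # condition A
  is_b <- x < b_max   # condition B

  # index of the nearest A element at or after each position (Inf if none)
  next_a <- rev(cummin(rev(ifelse(is_a, i, Inf))))

  # index of the nearest element failing B at or after each position
  # (n + 1 acts as a sentinel when there is none)
  next_not_b <- rev(cummin(rev(ifelse(is_b, n + 1, i))))

  # flagged when an A element is reached before B is broken
  as.integer(next_a <= next_not_b)
}

X <- c(4, 3, 6, 9, 3, 1, 9, 3, 2, 1)
flag_backward(X)
# 0 0 0 0 1 1 0 1 1 1
```

On the data frame from this question it would be applied as my_data$flag2 <- flag_backward(my_data$X). It works on the plain column vector rather than on data frame rows, which is usually where most of the time goes in a loop like the one above.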