Home > r - Finding `n1` TRUEs wrapped in between two `n2` FALSEs, the whole thing wrapped in between `n3` TRUEs, etc

## r - Finding `n1` TRUEs wrapped in between two `n2` FALSEs, the whole thing wrapped in between `n3` TRUEs, etc

From a sequence of TRUEs and falses, I wanted to make a function that returns TRUE whether there is a series of at least `n1` TRUEs somewhere in the sequence. Here is that function:

``````fun_1 = function(TFvec, n1){
nbT = 0
solution = -1
for (i in 1:length(x)){
if (x[i]){
nbT = nbT + 1
if (nbT == n1){
return(T)
break
}
} else {
nbT = 0
}
}
return (F)
}
``````

Test:

``````x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_1(x,3) # TRUE
fun_1(x,4) # FALSE
``````

Then, I needed a function that returns TRUE if in a given list boolean vector, there is a series of at least `n1` TRUEs wrapped by at least two series (one on each side) of `n2` falses. Here the function:

``````fun_2 = function(TFvec, n1, n2){
if (n2 == 0){
fun_1(TFvec, n2)
}
nbFB = 0
nbFA = 0
nbT = 0
solution = -1
last = F
for (i in 1:length(TFvec)){
if(TFvec[i]){
nbT = nbT + 1
if (nbT == n1 & nbFB >= n2){
solution = i-n1+1
}
last = T
} else {
if (last){
nbFB = 0
nbFA = 0
}
nbFB = nbFB + 1
nbFA = nbFA + 1
nbT = 0
if (nbFA == n2 & solution!=-1){
return(T)
}
last = F
}
}
return(F)
}
``````

It is maybe not a very efficient function though! And I haven't tested it 100 times but it looks like it works fine!

Test:

``````x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_2(x, 3, 2) # TRUE
fun_2(x, 3, 3) # FALSE
``````

Now, believe it or not, I'd like to make a function (`fun_3`) that returns TRUE if in the boolean vector there is a (at least) series of at least `n1` TRUEs wrapped in between (at least) two (one on each side) series of `n2` falses where the whole thing (the three series) are wrapped in between (at least) two (one on each side) series of `n3` TRUEs. And as I am afraid to have to bring this problem even further, I am asking here for help to create a function `fun_n` in which we enter two arguments `TFvec` and `list_n` where `list_n` is a list of `n` of any length.

Can you help me to create the function `fun_n`?

For convenience, record the length of the number of thresholds

``````n = length(list_n)
``````

Represent the vector of TRUE and FALSE as a run-length encoding, remembering the length of each run for convenience

``````r = rle(TFvec); l = r\$length
``````

Find possible starting locations

``````idx = which(l >= list_n[1] & r\$value)
``````

Make sure the starting locations are embedded enough to satisfy all tests

``````idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
``````

Then check that lengths of successively remote runs are consistent with the condition, keeping only those starting points that are

``````for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break     # no solution
thresh = list_n[i + 1]
test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
idx = idx[test]
}
``````

If there are any values left in `idx`, then these are the indexes into the rle satisfying the condition; the starting point(s) in the initial vector are `cumsum(l)[idx - 1] + 1`.

Combined:

``````runfun = function(TFvec, list_n) {
## setup
n = length(list_n)
r = rle(TFvec); l = r\$length

## initial condition
idx = which(l >= list_n[1] & r\$value)
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break     # no solution
thresh = list_n[i + 1]
test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
idx = idx[test]
}

## starts = cumsum(l)[idx - 1] + 1
## any luck?
length(idx) != 0
}
``````

This is fast and allows for runs >= the threshold, as stipulated in the question; for example

``````x = sample(c(TRUE, FALSE), 1000000, TRUE)
system.time(runfun(x, rep(2, 5)))
``````

completes in less than 1/5th of a second.

A fun generalization allows for flexible condition, e.g., runs of exactly `list_n`, as in the rollapply solution

``````runfun = function(TFvec, list_n, cond=`>=`) {
## setup
n = length(list_n)
r = rle(TFvec); l = r\$length

## initial condition
idx = which(cond(l, list_n[1]) & r\$value)
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break     # no solution
thresh = list_n[i + 1]
test = cond(l[idx + i], thresh) & cond(l[idx - i], thresh)
idx = idx[test]
}

## starts = cumsum(l)[idx - 1] + 1
## any luck?
length(idx) != 0
}
``````

------splitte line----------------------------