ItGo.me Focus on IT Recommend

Home > r - Finding `n1` TRUEs wrapped in between two `n2` FALSEs, the whole thing wrapped in between `n3` TRUEs, etc

r - Finding `n1` TRUEs wrapped in between two `n2` FALSEs, the whole thing wrapped in between `n3` TRUEs, etc

2021腾讯云限时秒杀,爆款1核2G云服务器298元/3年!(领取2860元代金券),
地址https://cloud.tencent.com/act/cps/redirect?redirect=1062

2021阿里云最低价产品入口+领取代金券(老用户3折起),
入口地址https://www.aliyun.com/minisite/goods

From a sequence of TRUEs and falses, I wanted to make a function that returns TRUE whether there is a series of at least n1 TRUEs somewhere in the sequence. Here is that function:

fun_1 = function(TFvec, n1){
    nbT = 0
    solution = -1
    for (i in 1:length(x)){
            if (x[i]){
            nbT = nbT + 1
               if (nbT == n1){
                return(T)
                break
               }
            } else {
                nbT = 0
            }
        }
        return (F) 
}

Test:

x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_1(x,3) # TRUE
fun_1(x,4) # FALSE

Then, I needed a function that returns TRUE if in a given list boolean vector, there is a series of at least n1 TRUEs wrapped by at least two series (one on each side) of n2 falses. Here the function:

fun_2 = function(TFvec, n1, n2){
    if (n2 == 0){
        fun_1(TFvec, n2)        
    }
    nbFB = 0
    nbFA = 0
    nbT = 0
    solution = -1
    last = F
    for (i in 1:length(TFvec)){
        if(TFvec[i]){           
            nbT = nbT + 1
            if (nbT == n1 & nbFB >= n2){
                solution = i-n1+1
            }
            last = T
        } else {
            if (last){
                nbFB = 0
                nbFA = 0        
            }
            nbFB = nbFB + 1
            nbFA = nbFA + 1
            nbT = 0
            if (nbFA == n2 & solution!=-1){
                return(T)
            }
            last = F
        }
    }
    return(F)
}

It is maybe not a very efficient function though! And I haven't tested it 100 times but it looks like it works fine!

Test:

x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_2(x, 3, 2) # TRUE
fun_2(x, 3, 3) # FALSE

Now, believe it or not, I'd like to make a function (fun_3) that returns TRUE if in the boolean vector there is a (at least) series of at least n1 TRUEs wrapped in between (at least) two (one on each side) series of n2 falses where the whole thing (the three series) are wrapped in between (at least) two (one on each side) series of n3 TRUEs. And as I am afraid to have to bring this problem even further, I am asking here for help to create a function fun_n in which we enter two arguments TFvec and list_n where list_n is a list of n of any length.

Can you help me to create the function fun_n?

answer 1 >>---Accepted---Accepted---Accepted---

For convenience, record the length of the number of thresholds

n = length(list_n)

Represent the vector of TRUE and FALSE as a run-length encoding, remembering the length of each run for convenience

r = rle(TFvec); l = r$length

Find possible starting locations

idx = which(l >= list_n[1] & r$value)

Make sure the starting locations are embedded enough to satisfy all tests

idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

Then check that lengths of successively remote runs are consistent with the condition, keeping only those starting points that are

for (i in seq_len(n - 1)) {
    if (length(idx) == 0)
        break     # no solution
    thresh = list_n[i + 1]
    test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
    idx = idx[test]
}

If there are any values left in idx, then these are the indexes into the rle satisfying the condition; the starting point(s) in the initial vector are cumsum(l)[idx - 1] + 1.

Combined:

runfun = function(TFvec, list_n) {
    ## setup
    n = length(list_n)
    r = rle(TFvec); l = r$length

    ## initial condition
    idx = which(l >= list_n[1] & r$value)
    idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

    ## adjacent conditions
    for (i in seq_len(n - 1)) {
        if (length(idx) == 0)
            break     # no solution
        thresh = list_n[i + 1]
        test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
        idx = idx[test]
    }

    ## starts = cumsum(l)[idx - 1] + 1
    ## any luck?
    length(idx) != 0
}

This is fast and allows for runs >= the threshold, as stipulated in the question; for example

x = sample(c(TRUE, FALSE), 1000000, TRUE)
system.time(runfun(x, rep(2, 5)))

completes in less than 1/5th of a second.

A fun generalization allows for flexible condition, e.g., runs of exactly list_n, as in the rollapply solution

runfun = function(TFvec, list_n, cond=`>=`) {
    ## setup
    n = length(list_n)
    r = rle(TFvec); l = r$length

    ## initial condition
    idx = which(cond(l, list_n[1]) & r$value)
    idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

    ## adjacent conditions
    for (i in seq_len(n - 1)) {
        if (length(idx) == 0)
            break     # no solution
        thresh = list_n[i + 1]
        test = cond(l[idx + i], thresh) & cond(l[idx - i], thresh)
        idx = idx[test]
    }

    ## starts = cumsum(l)[idx - 1] + 1
    ## any luck?
    length(idx) != 0
}

------splitte line----------------------------
Related