I'm in the habit of clumping similar tasks together into a single line. For example, if I need to filter on a, b, and c in a data table, I'll put them together in one [] with ANDs. Yesterday, I noticed that in my particular case this was incredibly slow and tested chaining filters instead. I've included an example below.
First, I seed the random number generator, load data.table, and create a dummy data set.
# Set RNG seed
set.seed(-1)
# Load libraries
library(data.table)
# Create data table
dt <- data.table(a = sample(1:1000, 1e7, replace = TRUE),
                 b = sample(1:1000, 1e7, replace = TRUE),
                 c = sample(1:1000, 1e7, replace = TRUE),
                 d = runif(1e7))
Next, I define my methods. The first approach chains filters together. The second ANDs the filters together.
# Chaining method
chain_filter <- function(){
  dt[a %between% c(1, 10)
     ][b %between% c(100, 110)
       ][c %between% c(750, 760)]
}
# Anding method
and_filter <- function(){
  dt[a %between% c(1, 10) & b %between% c(100, 110) & c %between% c(750, 760)]
}
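For clarity, chaining here is just repeated subsetting: each [] returns a new, smaller data table that the next filter runs on. An unrolled sketch of what chain_filter() does (the tmp names are purely illustrative):
# Unrolled equivalent of chain_filter(); tmp1/tmp2 are illustrative names only
tmp1 <- dt[a %between% c(1, 10)]        # first filter scans the full table
tmp2 <- tmp1[b %between% c(100, 110)]   # second filter scans only the rows that passed the first
res  <- tmp2[c %between% c(750, 760)]   # third filter scans only the rows that passed both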
Here, I check they give the same results.
# Check both give same result
identical(chain_filter(), and_filter())
#> [1] TRUE
Finally, I benchmark them.
# Benchmark
microbenchmark::microbenchmark(chain_filter(), and_filter())
#> Unit: milliseconds
#>            expr       min        lq      mean    median        uq       max neval cld
#>  chain_filter()  25.17734  31.24489  39.44092  37.53919  43.51588  78.12492   100  a
#>    and_filter()  92.66411 112.06136 130.92834 127.64009 149.17320 206.61777   100   b
Created on 2019-10-25 by the reprex package (v0.3.0)
In this case, chaining reduces run time by about 70%. Why is this the case? I mean, what's going on under the hood in data table? I haven't seen any warnings against using &, so I was surprised that the difference is so big. In both cases they evaluate the same conditions, so that shouldn't account for the difference. If anything, I expected the AND case to be faster: & is a quick operator, and the data table only has to be filtered once (i.e., using the logical vector resulting from the ANDs), as opposed to three times in the chaining case.
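One way to make that intuition concrete is to look at how many rows each chained step actually has to scan. Since a is sampled uniformly from 1:1000, only about 1% of the 1e7 rows should survive the first filter on average, so the later filters operate on much smaller tables. A quick check (assuming the same dt as above):
# Row counts seen by each successive filter in the chain (same dt as above)
nrow(dt)                                                  # 10,000,000 rows scanned by the first filter
nrow(dt[a %between% c(1, 10)])                            # rows scanned by the second filter
nrow(dt[a %between% c(1, 10)][b %between% c(100, 110)])   # rows scanned by the third filter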
Bonus question
Does this principle hold for data table operations in general? Is modularising tasks always a better strategy?