The "quantum entanglement" phrasing is obviously a little tongue-in-cheek, but you'll see what I mean.
If one creates multiple data.frame columns using chain assignment, those columns all behave as one later, even though they (and their parent df) appear normal. Observe, columns created independently:
library(data.table)
testdf <- data.frame(a = 1:10)
testdf$b <- as.numeric(rep(NA, nrow(testdf)))
testdf$c <- as.numeric(rep(NA, nrow(testdf)))
mergedf <- data.frame(a = 5:7, b = 1:3, c = 8:10)
setDT(testdf)
setDT(mergedf)
testdf[mergedf, on = "a", b := i.b]
testdf[mergedf, on = "a", c := i.c]
# works as expected
Columns created by chain assignment, which the assignOps help and this suggest should be fine (or at least don't warn against it):
testdf2 <- data.frame(a = 1:10)
testdf2$c <- testdf2$b <- as.numeric(rep(NA, nrow(testdf2)))
mergedf2 <- data.frame(a = 5:7, b = 1:3, c = 8:10)
setDT(testdf2)
setDT(mergedf2)
testdf2[mergedf2, on = "a", b := i.b]
# testdf2$b and $c both get mergedf2's b values
testdf2[mergedf2, on = "a", c := i.c]
# testdf2$b and $c both get mergedf2's c values, overwriting the b values
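One way to probe what might be going on, assuming the cause is that chain assignment hands the very same vector to both columns (my guess, not something the help pages state), is to compare the memory addresses of the columns with data.table's address(). The names indep and chained below are just illustrative stand-ins for testdf and testdf2:

```r
library(data.table)  # used here only for address()

# Columns created independently: two separate vectors
indep <- data.frame(a = 1:10)
indep$b <- as.numeric(rep(NA, nrow(indep)))
indep$c <- as.numeric(rep(NA, nrow(indep)))

# Columns created by chain assignment: one vector assigned twice
chained <- data.frame(a = 1:10)
chained$c <- chained$b <- as.numeric(rep(NA, nrow(chained)))

# address() reports where each column vector lives in memory;
# equal addresses would mean b and c are the same underlying object
indep_shared   <- address(indep$b) == address(indep$c)
chained_shared <- address(chained$b) == address(chained$c)

print(c(independent = indep_shared, chained = chained_shared))
```

If that guess is right, the chained pair should report a shared address while the independent pair does not, which would fit with a by-reference update to one column showing up in the other.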
This doesn't seem to be documented in assignOps, and I've not seen it mentioned anywhere. It seems highly unintuitive that columns created this way, which appears to be a simple space-saving shortcut, become bound together by a secretive pact, potentially to turn against you when your guard is down. To further underline how secret this is, if you just run the creation lines:
testdf <- data.frame(a = 1:10)
testdf$b <- as.numeric(rep(NA, nrow(testdf)))
testdf$c <- as.numeric(rep(NA, nrow(testdf)))
testdf2 <- data.frame(a = 1:10)
testdf2$c <- testdf2$b <- as.numeric(rep(NA, nrow(testdf2)))
all.equal(testdf, testdf2) #TRUE
It's also TRUE if you setDT() both dfs.
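For what it's worth, here is a sketch of one possible way to sidestep the problem, on the assumption that the two columns are linked only because they share a single vector: take a deep copy with data.table's copy() before calling setDT(). The names chained, lookup and fixed are hypothetical and stand in for testdf2, mergedf2 and the repaired table:

```r
library(data.table)

# Recreate the chain-assigned data.frame and the lookup table
chained <- data.frame(a = 1:10)
chained$c <- chained$b <- as.numeric(rep(NA, nrow(chained)))
lookup <- data.frame(a = 5:7, b = 1:3, c = 8:10)

# copy() makes a deep copy, so b and c should each get their own vector
fixed <- copy(chained)
setDT(fixed)
setDT(lookup)

# Update only b by reference via the join
fixed[lookup, on = "a", b := i.b]

# Check that the columns are now separate and that c was left alone
separate_now <- address(fixed$b) != address(fixed$c)
c_untouched  <- all(is.na(fixed$c))
print(c(separate = separate_now, c_still_NA = c_untouched))
```

This only treats the symptom, and creating the columns in separate statements, as in the first example, avoids the need for it.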