I want to create several new columns from a formula using R data.table. I have a character vector of column names, and for each one I want to perform a calculation and store the result in a new column whose name is the original name with a fixed suffix appended. I can get it to work for one variable at a time, but not inside lapply or a for loop. I suspect I am missing something about how data.table treats quoted strings versus unquoted column names. Do I need to use ".." or wrap something in eval()? A dplyr (or any tidyverse) solution would work for me too.
Here is example code with mtcars:
library(data.table)
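# as.data.table() makes a copy; setDT(mtcars) can fail in recent data.table
# versions because the binding of the built-in dataset is locked
mtcars.dt <- as.data.table(mtcars)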
myVars <- c("mpg", "hp", "qsec")
# Doesn't work:
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp]
}
# Doesn't work:
lapply(myVars, function(myVar) mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp])
# Works:
mtcars.dt[, mpg.disp.ratio := mpg / disp]
# Doesn't work:
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") :=
    myVar -
    lm(data = .SD, formula = myVar ~ disp)$coefficients[2] * (disp - mean(disp))]
}
# Doesn't work:
lapply(myVars, function(x) mtcars.dt[, paste0(x, ".disp.lm.adj") :=
  x -
  lm(data = .SD, formula = x ~ disp)$coefficients[2] * (disp - mean(disp))])
# Works:
mtcars.dt[, mpg.disp.lm.adj :=
  mpg -
  lm(data = .SD, formula = mpg ~ disp)$coefficients[2] * (disp - mean(disp))]
For the ratio calculation, I get the following error:
Error in myVar/disp : non-numeric argument to binary operator
For the lm adjustment, I get the following error:
Error in model.frame.default(formula = myVar ~ disp, data = .SD, drop.unused.levels = TRUE) :
variable lengths differ (found for 'disp')
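Both errors seem consistent with myVar being a length-one character string inside j rather than a reference to the column: "mpg" / disp is non-numeric, and in myVar ~ disp the left-hand side has length 1 while disp has 32 rows. A minimal sketch of one possible direction, assuming that diagnosis is right: get() looks the column up by its name, and reformulate() (from base R's stats package) builds the formula from strings. This is an illustration under those assumptions, not a confirmed or canonical fix.

```r
library(data.table)

# Copy the built-in dataset so mtcars itself is left untouched
mtcars.dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

for (myVar in myVars) {
  # Ratio: get() resolves the string held in myVar to the column of that name
  mtcars.dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]

  # lm adjustment: reformulate() builds the formula (e.g. mpg ~ disp) from strings
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") :=
    get(myVar) -
    coef(lm(reformulate("disp", response = myVar), data = .SD))[2] *
    (disp - mean(disp))]
}

# Inspect a few of the new columns
head(mtcars.dt[, .(mpg.disp.ratio, hp.disp.ratio, qsec.disp.lm.adj)])
```

The same loop body should also work inside lapply(), since := modifies mtcars.dt by reference; wrapping the call in invisible() would suppress the printed return values.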