I am using the following code (which was developed in a previous post) to perform all possible linear regressions between the first variable and the other variables and to save the results in a new data frame:
library(broom)
library(dplyr)
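## candidate predictors: every column name except the first column
x <- names(data[,-1])
## build one formula string per predictor, e.g. "tlv ~ form20"
out <- unlist(lapply(1, function(n) combn(x, 1, FUN=function(row)
  paste0("tlv ~ ", paste0(row, collapse = "+")))))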
## get the regression coefficients
tmp1 = bind_rows(lapply(out, function(frml) {
  a = tidy(lm(frml, data=data))
  a$frml = frml
  return(a)
}))
reg_coeff2 <- tmp1
## Get regression results i.e. R2, AIC, BIC
tmp2 = bind_rows(lapply(out, function(frml) {
  a = glance(lm(frml, data=data))
  a$frml = frml
  return(a)
}))
reg_results2 <- tmp2
reg_results2$frml <- sub("tlv ~ ", "", reg_results2$frml)
The code works very well, but I would like to adapt it to do the following.
I have the following data frame (data):
structure(list(id = c(5309039, 5284969, 5300279, 5270289, 5259957,
5267086, 5173196), var1 = c(0, 0, 0, 0, 0, 0, 0), var2 = c(23,
24, 20, 32, 31, 37, 43), var3 = c(162, 154, 156, 154, 151.5,
171, 154), var4 = c(62.8, 52.7, 64.5, 70.9, 63, 66.2, 60.3),
tlv = c(1049, 978, 1131, 1292, 1228, 1593, 1265), form20 = c(1674.12110392683,
1517.06018080512, 1666.03606715029, 1726.99450999549, 1627.94506984781,
1754.74878787639, 1608.54623766777), form19 = c(1062.84280028848,
902.364998653641, 1054.58187260355, 1116.8664734097, 1015.66220125765,
1145.22454880977, 995.841345244203), form18 = c(1050.91941325579,
891.3634649201, 1026.84722464179, 1073.58291322486, 980.997498562542,
1147.23019335865, 971.271632531001), form17 = c(1404.10436829839,
1220.98291088203, 1419.72032143583, 1517.11065788694, 1386.31581471687,
1477.21675910098, 1347.52393410332), form16 = c(1248.12292187059,
1126.73082253566, 1229.80850901466, 1265.36558733196, 1194.92548170827,
1321.39733067342, 1187.52592495257), form15 = c(990.132,
866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407),
form14 = c(1590.6052, 1436.4718, 1582.993, 1830.3706, 1688.692,
1812.3808, 1786.5202), form13 = c(1300.81321145176, 1130.23869905075,
1292.03253463863, 1358.23586808642, 1250.66417156907, 1388.37813595599,
1277.89625553694), form12 = c(1329.6, 1104.4, 1272, 1322.8,
1195.5, 1487.4, 1195.6)), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
I need to perform a linear regression between the variable tlv and each of the variables whose names start with the prefix "form", excluding the other variables (i.e. var1, var2, var3, ...).
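To make the goal concrete, here is a minimal sketch of the kind of selection I have in mind. It assumes that matching the column names with base R's `startsWith()` is an acceptable way to pick the predictors. The data frame `toy` and the result name `coefs` are made up for this illustration and are not my real data, and the sketch uses only base R rather than broom, so it is not a drop-in replacement for the code above.

```r
# Toy data, invented for illustration only (not the real data frame)
toy <- data.frame(
  id     = 1:6,
  var1   = c(0, 0, 0, 0, 0, 0),
  tlv    = c(10, 12, 15, 11, 18, 20),
  form20 = c(11, 13, 14, 12, 17, 21),
  form19 = c(5, 7, 6, 9, 8, 10)
)

# Keep only the column names that start with the prefix "form"
x <- names(toy)[startsWith(names(toy), "form")]

# Build one formula string per selected predictor
out <- paste0("tlv ~ ", x)

# Fit each model and collect its coefficients in a single data frame
coefs <- do.call(rbind, lapply(out, function(frml) {
  fit <- lm(as.formula(frml), data = toy)
  data.frame(frml = frml,
             term = names(coef(fit)),
             estimate = unname(coef(fit)))
}))
```

If this selection is reasonable, the same `x` could presumably replace `names(data[,-1])` in the code at the top, leaving the `tidy()` and `glance()` steps unchanged.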