Channel: Active questions tagged r - Stack Overflow

Performing all possible linear regressions between 1 variable and a list of variables


I am using the following code (which was developed in a previous post) to perform all possible linear regressions between the first variable and the other variables and to save the results in a new data frame.

library(broom)
library(dplyr)
x <- names(data[,-1])
out <- unlist(lapply(1, function(n) combn(x, 1, FUN=function(row) 
          paste0("tlv ~ ", paste0(row, collapse = "+")))))
## get the regression coefficients
tmp1 = bind_rows(lapply(out, function(frml) {
      a = tidy(lm(frml, data=data))
      a$frml = frml
      return(a)
    }))
reg_coeff2 <- tmp1
## get the regression results, i.e. R2, AIC, BIC
tmp2 = bind_rows(lapply(out, function(frml) {
      a = glance(lm(frml, data=data))
      a$frml = frml
      return(a)
    }))
reg_results2 <- tmp2
reg_results2$frml <- sub("tlv ~ ", "", reg_results2$frml)

The code works very well, but I would like to adapt it to do the following.

I have the following data frame (data):

structure(list(id = c(5309039, 5284969, 5300279, 5270289, 5259957, 
5267086, 5173196), var1 = c(0, 0, 0, 0, 0, 0, 0), var2 = c(23, 
24, 20, 32, 31, 37, 43), var3 = c(162, 154, 156, 154, 151.5, 
171, 154), var4 = c(62.8, 52.7, 64.5, 70.9, 63, 66.2, 60.3), 
    tlv = c(1049, 978, 1131, 1292, 1228, 1593, 1265), form20 = c(1674.12110392683, 
    1517.06018080512, 1666.03606715029, 1726.99450999549, 1627.94506984781, 
    1754.74878787639, 1608.54623766777), form19 = c(1062.84280028848, 
    902.364998653641, 1054.58187260355, 1116.8664734097, 1015.66220125765, 
    1145.22454880977, 995.841345244203), form18 = c(1050.91941325579, 
    891.3634649201, 1026.84722464179, 1073.58291322486, 980.997498562542, 
    1147.23019335865, 971.271632531001), form17 = c(1404.10436829839, 
    1220.98291088203, 1419.72032143583, 1517.11065788694, 1386.31581471687, 
    1477.21675910098, 1347.52393410332), form16 = c(1248.12292187059, 
    1126.73082253566, 1229.80850901466, 1265.36558733196, 1194.92548170827, 
    1321.39733067342, 1187.52592495257), form15 = c(990.132, 
    866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407), 
    form14 = c(1590.6052, 1436.4718, 1582.993, 1830.3706, 1688.692, 
    1812.3808, 1786.5202), form13 = c(1300.81321145176, 1130.23869905075, 
    1292.03253463863, 1358.23586808642, 1250.66417156907, 1388.37813595599, 
    1277.89625553694), form12 = c(1329.6, 1104.4, 1272, 1322.8, 
    1195.5, 1487.4, 1195.6)), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame"))

I need to perform a linear regression between the variable tlv and each of the variables whose names start with the prefix "form", excluding the other variables (i.e. var1, var2, var3, ...).
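
One possible way to restrict the predictors, sketched here in base R only: select the column names with a regular expression anchored at the start of the name, then build the formulas from that vector. The small data frame below is a cut-down subset of the posted data, and reg_results is a hypothetical name used only for this illustration. The same vector x could presumably replace names(data[,-1]) in the code above.

```r
# Cut-down subset of the posted data, kept short for illustration
data <- data.frame(
  var2   = c(23, 24, 20, 32, 31, 37, 43),
  tlv    = c(1049, 978, 1131, 1292, 1228, 1593, 1265),
  form15 = c(990.132, 866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407),
  form12 = c(1329.6, 1104.4, 1272, 1322.8, 1195.5, 1487.4, 1195.6)
)

# Keep only the column names that begin with "form"
x <- grep("^form", names(data), value = TRUE)

# One formula string per selected predictor, e.g. "tlv ~ form15"
out <- paste0("tlv ~ ", x)

# Fit each model and collect its R-squared in one data frame
# (reg_results is a hypothetical name, not part of the original code)
reg_results <- do.call(rbind, lapply(out, function(frml) {
  fit <- lm(as.formula(frml), data = data)
  data.frame(frml = frml, r.squared = summary(fit)$r.squared)
}))
```

With dplyr already loaded, names(dplyr::select(data, starts_with("form"))) should give the same vector of names.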

