I am working with different linear regression models in R. I used the DATASET, which has 21263 rows and 82 columns.
All of the regression models run in an acceptable time except the MM-estimate regression fitted with the R function lmrob (from the robustbase package).
I waited more than 10 hours for the first for loop (#Block A) to finish, and it did not complete. By "does not work", I mean that it might only produce output after two days. I tried the same code with a smaller DATASET, which has 9568 rows and 5 columns, and it ran in one minute.
I am using a standard laptop.
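To gauge how the fitting time grows before committing to the full loop, a single fit can be timed on row subsamples of increasing size with system.time(). The following is only a minimal sketch under stated assumptions: the data are synthetic, the helper time_one_fit is a hypothetical name, and lm() stands in for lmrob() so that the snippet runs with base R alone.

```r
# Synthetic stand-in data (assumption: this is not the real train.csv)
set.seed(1)
n <- 2000
p <- 20
X <- as.data.frame(matrix(rnorm(n * p), n, p))
X$y <- rnorm(n)

# Hypothetical helper: elapsed seconds for one fit on `rows` sampled rows.
# lm() is a stand-in here; swap in the model of interest to time it instead.
time_one_fit <- function(rows) {
  sub <- X[sample(nrow(X), rows), ]
  system.time(lm(y ~ ., data = sub))[["elapsed"]]
}

sizes <- c(500, 1000, 2000)
timings <- sapply(sizes, time_one_fit)
timing_df <- data.frame(rows = sizes, seconds = timings)
print(timing_df)
```

Multiplying the time of one full-size fit by k gives a rough lower bound on the duration of the whole loop.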
The steps of my analysis are as follows:
I load and scale the dataset, then use a k-fold split with k = 30, because I want to calculate the variance of each variable's coefficient across the k splits.
Could you please provide me with any guidance?
library(caret)      # createFolds()
library(robustbase) # lmrob()
library(magrittr)   # %>%
wdbc = read.csv("train.csv") # critical_temp is the dependent variable
wdbcc = as.data.frame(scale(wdbc)) # scale the variables
### k-folds split ###
set.seed(12345)
k = 30
folds <- createFolds(wdbcc$critical_temp, k = k, list = TRUE, returnTrain = TRUE)
############ Start of MM Regression Model #################
#Block A
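# Note: this list shares its name with the lmrob() function;
# R still resolves lmrob(...) to the function when it is called.
lmrob = list()
for (i in 1:k) {
  # fit the MM-estimator on the training rows of fold i
  lmrob[[i]] = lmrob(critical_temp ~ .,
                     data = wdbcc[folds[[i]], ], setting = "KS2014")
}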
#Block B
lmrob_coef = list()
lmrob_coef_var = list()
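for (j in seq_along(lmrob[[1]]$coefficients)) {
  # collect coefficient j from each of the k fits
  for (i in 1:k) {
    lmrob_coef[[i]] = lmrob[[i]]$coefficients[j]
  }
  # variance of coefficient j across the k folds (computed once per j)
  lmrob_coef_var[[j]] = lmrob_coef %>% unlist() %>% var()
}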
#Block C
lmrob_var = unlist(lmrob_coef_var)
lmrob_df = cbind(coefficients = lmrob[[1]]$coefficients %>% names() %>% as.data.frame()
, variance = lmrob_var %>% as.data.frame())
colnames(lmrob_df) = c("coefficients", "variance_lmrob")
#Block D
lmrob_var_sum = sum(lmrob_var)
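For reference, the bookkeeping in Blocks B to D (the variance of each coefficient across the folds, then the sum of those variances) can also be written with a coefficient matrix. This is only a sketch under stated assumptions: the data are synthetic, the folds are built with base R instead of createFolds, lm() stands in for lmrob(), and the names dat, fold_id, coef_mat, coef_var, coef_df and coef_var_sum are illustrative. It does not address the running time of Block A.

```r
# Synthetic stand-in data (assumption: this is not the real train.csv)
set.seed(12345)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Base-R k-fold split: every row gets a fold label in 1..k
k <- 30
fold_id <- sample(rep(seq_len(k), length.out = n))

# Fit on the k - 1 training folds each time; lm() is a stand-in for lmrob()
coef_mat <- sapply(seq_len(k), function(i) {
  coef(lm(y ~ ., data = dat[fold_id != i, ]))
})

# coef_mat has one row per coefficient and one column per fold,
# so the row-wise variance is the variance of each coefficient across folds
coef_var <- apply(coef_mat, 1, var)
coef_df <- data.frame(coefficients = rownames(coef_mat),
                      variance = unname(coef_var))
coef_var_sum <- sum(coef_var)
print(coef_df)
```

Keeping the coefficients in one matrix avoids the nested loop and the intermediate lists, and the same pattern applies to any model that supports coef().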