
Efficiently populating rows given possible values for each variable in R


I have a dataframe with 42 variables, each of which has its own set of possible values. I am aiming to create a much larger dataframe that contains one row for every possible combination of values across the variables.

This will be millions of rows long and too large to hold in RAM. I have therefore been trying to write a script that appends each combination to an existing file. The following code works, but far too slowly to be practical: it takes just under 5 minutes to run on my machine, even though it covers only a few of the 42 variables (V5 is defined but not yet looped over).

V1 <- c(seq(0, 30, 1), NA)
V2 <- c(seq(20, 55, 1), NA)
V3 <- c(0, 1, NA)
V4 <- c(seq(1, 16, 1), NA)
V5 <- c(seq(15, 170, 1), NA)


df_empty <- data.frame(V1 = NA, V2 = NA, V3 = NA, V4 = NA)
write.csv(df_empty, "table_out.csv", row.names = FALSE)

start <- Sys.time()
for(v1 in 1:length(V1)){
  V1_val <- V1[v1]

  for(v2 in 1:length(V2)){
    V2_val <- V2[v2]

    for(v3 in 1:length(V3)){
      V3_val <- V3[v3]

      for(v4 in 1:length(V4)){
        V4_val <- V4[v4]

        row <- cbind(V1_val, V2_val, V3_val, V4_val)
        write.table(as.matrix(row), file = "table_out.csv", sep = ",",
                    append = TRUE, quote = FALSE,
                    col.names = FALSE, row.names = FALSE)
      }
    }
  }
}

print(abs(Sys.time() - start)) # 4.8 minutes
print(paste(nrow(read.csv("table_out.csv")), "rows in file"))
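
For a sense of scale, the size of the full table can be worked out without building it, since the row count is just the product of the number of possible values per variable. The following is a minimal sketch using only the five value sets shown above; `value_sets`, `n_values` and `n_rows` are names made up for illustration.

```r
# The five value sets from the question (NA counts as one possible value)
value_sets <- list(
  V1 = c(seq(0, 30, 1), NA),
  V2 = c(seq(20, 55, 1), NA),
  V3 = c(0, 1, NA),
  V4 = c(seq(1, 16, 1), NA),
  V5 = c(seq(15, 170, 1), NA)
)

# Number of possible values per variable: 32, 37, 3, 17, 157
n_values <- lengths(value_sets)

# Rows in the full grid = product of the per-variable counts
n_rows <- prod(n_values)
print(n_rows)  # 9480288 for these five variables
```

With only the four variables used in the loops the product is 60384, so each extra variable multiplies the file size by its number of possible values.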

I have tested data.table::fwrite(), but it was no faster than write.table(as.matrix(x)). I'm sure the issue is the number of nested for loops, but I am unsure how to translate this into a more efficient approach.
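
One direction that might help is writing a block of rows per call instead of one row per call, since every `write.table()` call has to open and close the file. The sketch below is untested against the real 42-variable data and rests on some assumptions: the combinations of the inner variables (V2 to V4) fit in memory, the output path is a temporary file rather than `table_out.csv`, and `out_file`, `inner` and `chunk` are illustrative names. It uses base R's `expand.grid()` to build the inner combinations once and then appends one chunk per value of V1.

```r
# Value sets copied from the question
V1 <- c(seq(0, 30, 1), NA)
V2 <- c(seq(20, 55, 1), NA)
V3 <- c(0, 1, NA)
V4 <- c(seq(1, 16, 1), NA)

# Assumption: any writable path will do; a temp file keeps the sketch tidy
out_file <- tempfile(fileext = ".csv")

# Write the header line once, with no placeholder data row
writeLines("V1,V2,V3,V4", out_file)

# expand.grid() varies its first argument fastest, so passing V4 first and
# then reordering the columns keeps the same row order as the nested loops
inner <- expand.grid(V4 = V4, V3 = V3, V2 = V2)[, c("V2", "V3", "V4")]

# One write per value of V1 instead of one write per row
for (v1 in V1) {
  chunk <- cbind(V1 = v1, inner)  # v1 is recycled down the whole chunk
  write.table(chunk, file = out_file, sep = ",", append = TRUE,
              quote = FALSE, col.names = FALSE, row.names = FALSE)
}

n_written <- nrow(read.csv(out_file))
print(paste(n_written, "rows in file"))
```

This makes 32 write calls rather than one per row. For the full set of variables the split between the looped variables and the `expand.grid()` variables would need to be chosen so that each chunk still fits in RAM.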

Thanks

