Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 201919

Handling java.lang.OutOfMemoryError when writing to Excel from R

$
0
0

The xlsx package can be used to read and write Excel spreadsheets from R. Unfortunately, even for moderately large spreadsheets, java.lang.OutOfMemoryError can occur. In particular,

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.OutOfMemoryError: Java heap space

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "newInstance", .jfindClass(class), :
java.lang.OutOfMemoryError: GC overhead limit exceeded

(Other related exceptions are also possible but rarer.)

A similar question was asked regarding this error when reading spreadsheets.

Importing a big xlsx file into R?

The main advantage of using Excel spreadsheets as a data storage medium over CSV is that you can store multiple sheets in the same file, so here we consider a list of data frames to be written one data frame per worksheet. This example dataset contains 40 data frames, each with two columns of up to 200k rows. It is designed to be big enough to be problematic, but you can change the size by altering n_sheets and n_rows.

library(xlsx)
set.seed(19790801)
n_sheets <- 40
the_data <- replicate(
  n_sheets,
  {
    n_rows <- sample(2e5, 1)
    data.frame(
      x = runif(n_rows),
      y = sample(letters, n_rows, replace = TRUE)
    )
  },
  simplify = FALSE
)
names(the_data) <- paste("Sheet", seq_len(n_sheets))

The natural method of writing this to file is to create a workbook using createWorkbook, then loop over each data frame calling createSheet and addDataFrame. Finally the workbook can be written to file using saveWorkbook. I've added messages to the loop to make it easier to see where it falls over.

wb <- createWorkbook()  
for(i in seq_along(the_data))
{
  message("Creating sheet", i)
  sheet <- createSheet(wb, sheetName = names(the_data)[i])
  message("Adding data frame", i)
  addDataFrame(the_data[[i]], sheet)
}
saveWorkbook(wb, "test.xlsx")  

Running this in 64-bit on a machine with 8GB RAM, it throws the GC overhead limit exceeded error while running addDataFrame for the first time.

How do I write large datasets to Excel spreadsheets using xlsx?


Viewing all articles
Browse latest Browse all 201919

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>