Channel: Active questions tagged r - Stack Overflow

Working with large matrix field in R6 object is slow


I'm building a matrix that holds a value for each combination of customers (rows) and features (columns). Let's call this matrix NBA. Based on data received through an API, the matrix needs to be updated many times per second, inserting a new value on each call with m[x, y] <- new_value. Afterwards, some matrix operations are carried out (not important here). The matrix is a private field of an R6 object, and an update_matrix method updates a single cell of the matrix. However, this operation is very slow compared to updating a plain matrix outside R6: on the order of milliseconds instead of nanoseconds.

Reprex:

library(bench)
library(ggplot2)
library(tidyr)

# Create NBA matrix: an ordinary dense matrix whose entries are mostly zero
no_customers <- 1e6
no_features <- 30

NBA_matrix <-
  matrix(
    sample(c(rep(0, 1000), 1), size = no_customers * no_features, replace = TRUE),
    nrow = no_customers,
    ncol = no_features
  )

# Create NBA_like R6 object with matrix
library(R6)
NBA_lite <- R6Class("NBA_lite",
              public = list(
                mm = NULL,
                initialize = function(input_matrix) self$mm <- input_matrix,
                get_matrix = function() self$mm,
                modify = function(row, col, value) self$mm[row,col] <- value
              )
)
new_NBA_lite <- NBA_lite$new(input_matrix = NBA_matrix)

# Benchmark modifying single value, matrix vs R6 field
bench::mark(matrix     = NBA_matrix[234123, 10] <- 2,
            R6_field   = new_NBA_lite$modify(row = 234123, col = 10, value = 2))
expression                                              median  total_time
NBA_matrix[234123, 10] <- 2                             804ns   8.84ms
new_NBA_lite$modify(row = 234123, col = 10, value = 2)   126ms  125.66ms

From bench::mark, it seems that using the R6 method triggers garbage collection each time, which could explain the added ~120ms.
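To back up the garbage-collection observation, the mem_alloc and n_gc columns that bench::mark returns can be inspected directly. The following is only a sketch under a few assumptions: the matrix is a smaller, hypothetical stand-in (1e5 rows) so it runs quickly, Holder is a hypothetical copy of NBA_lite, and mem_alloc is only populated when capabilities("profmem") is TRUE. For scale, a full copy of the original 1e6 x 30 double matrix would be 1e6 * 30 * 8 bytes, roughly 240 MB.

```r
library(bench)
library(R6)

# Hypothetical, smaller stand-in for NBA_matrix (1e5 x 30 doubles, about 24 MB)
m <- matrix(0, nrow = 1e5, ncol = 30)

# Hypothetical stand-in for NBA_lite with the same modify() body
Holder <- R6Class("Holder",
  public = list(
    mm = NULL,
    initialize = function(input_matrix) self$mm <- input_matrix,
    modify = function(row, col, value) self$mm[row, col] <- value
  )
)
h <- Holder$new(m)

res <- bench::mark(
  matrix   = m[1, 1] <- 2,
  R6_field = h$modify(row = 1, col = 1, value = 2),
  iterations = 10,
  check = FALSE,      # only timings and allocations matter here
  filter_gc = FALSE   # keep iterations in which a GC ran
)

# Memory allocated per expression and number of GCs observed
print(res$mem_alloc)
print(res$n_gc)
```

If mem_alloc for the R6 call is close to the size of the whole matrix, that would point to a full copy on each update rather than to garbage collection as the root cause.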

I have a hard time understanding this, since I think of R6 objects as pass-by-reference environments, which should not incur copy-on-modify or the need for garbage collection.
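One way to test the copy-on-modify suspicion directly is tracemem(), which prints a line each time the traced object is duplicated. This is a minimal sketch, assuming an R build with memory profiling enabled (checked with capabilities("profmem")). The Holder class and the small matrix are hypothetical stand-ins for NBA_lite and NBA_matrix. Whether a line is printed on the second call may depend on the R version, so no expected output is shown.

```r
library(R6)

# Hypothetical small stand-in for NBA_matrix, to keep the check fast
m <- matrix(0, nrow = 1000, ncol = 30)

# Hypothetical stand-in for NBA_lite with the same modify() body
Holder <- R6Class("Holder",
  public = list(
    mm = NULL,
    initialize = function(input_matrix) self$mm <- input_matrix,
    modify = function(row, col, value) self$mm[row, col] <- value
  )
)
h <- Holder$new(m)

if (capabilities("profmem")) {
  tracemem(h$mm)      # any later duplication prints a "tracemem[...]" line
  h$modify(1, 1, 2)   # first call: m and h$mm still share the same memory
  h$modify(2, 1, 2)   # later calls: watch whether another line is printed
  untracemem(h$mm)
}

# The field was updated, while the global matrix keeps its old value
print(h$mm[1:2, 1])
print(m[1:2, 1])
```

Because the global matrix and the field initially point at the same data, a copy on the first modification is expected in any case. The interesting part is whether the later calls still duplicate the matrix.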

PS: The same gc delay occurs using S4 OOP in R.

Thanks in advance

