I have a strong use case for parallelizing a flavor of the SGD algorithm. In this use case I need to update the matrices P and Q with the gradient delta updates for a random batch of samples. Each process will update mutually exclusive indices of both matrices.
A simple illustration of what I intend to do would be something like this:
library(parallel)

# create a "big" matrix
A <- matrix(rnorm(10000), 100, 100)
system.time(
  # update each row vector independently using all my cores
  r <- mclapply(1:100, mc.cores = 6, function(i) {
    # updating (note this writes to the worker's copy of A) ...
    A[i, ] <- A[i, ] - 0.01
    # return something, e.g. here I'd return the RMSE of this batch instead
    sqrt(mean(A[i, ]^2))
  })
)
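One thing I already noticed while testing (a minimal check, assuming a Unix-alike system, since mclapply falls back to serial execution on Windows): because mclapply forks the workers, each child gets a copy-on-write snapshot of A, so the in-place assignment never reaches the parent; only the returned values come back:

library(parallel)

A <- matrix(rnorm(10000), 100, 100)
a_before <- A[1, 1]

r <- mclapply(1:100, mc.cores = 6, function(i) {
  A[i, ] <- A[i, ] - 0.01   # modifies the forked child's copy only
  sqrt(mean(A[i, ]^2))
})

identical(A[1, 1], a_before)  # TRUE: the parent's A is unchanged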
Are there any drawbacks to using this approach? Are there more R-idiomatic alternatives?
For example, to keep the computation clean (i.e. no side effects, immutable data), returning the updated row A[i,] - 0.01
instead of the RMSE
would be more complex to program and would raise peak memory usage, possibly even running out of memory.
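A sketch of what I mean by that functional alternative (hypothetical, just to make the trade-off concrete): each worker returns its updated row and the parent reassembles the matrix, which means all the rows are serialized back and the old and new matrices briefly coexist in memory:

library(parallel)

A <- matrix(rnorm(10000), 100, 100)

# each worker returns its updated row instead of mutating A
rows <- mclapply(1:100, mc.cores = 6, function(i) {
  A[i, ] - 0.01
})

# reassemble in the parent: no side effects, but peak memory
# now holds both the old A and all the returned rows at once
A <- do.call(rbind, rows)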