R - terrible parallelisation performance within a function due to pointless serialization, how to improve?


Once a function's environment contains some large objects, serializing all of them (even when they are not needed) adds a big overhead to parallelisation. Is there an effective way to use parallelisation within a function? I've tried the future library, but I need persistent workers and would rather stick with base R if feasible. Example:

test <- function() {
  clct <- parallel::makeCluster(4)

  # Baseline: the function's frame is still empty
  a <- Sys.time()
  parallel::clusterCall(clct, function(x) 1)
  print(Sys.time() - a)

  # Put a large (~64 MB) object into the function's environment
  big <- matrix(rnorm(8000000))

  # The same trivial call is now slow: the anonymous function's
  # environment, including `big`, is serialized to every worker
  a <- Sys.time()
  parallel::clusterCall(clct, function(x) 1)
  print(Sys.time() - a)

  parallel::stopCluster(clct)
}

test()

Time difference of 0.0009980202 secs

Time difference of 0.8078392 secs
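One workaround that seems to avoid this for the no-argument case (a minimal sketch; test_fixed is just an illustrative name) is to define the worker function first and then point its environment at globalenv(), so serializing it no longer drags the enclosing frame along:

test_fixed <- function() {
  clct <- parallel::makeCluster(4)
  on.exit(parallel::stopCluster(clct))

  big <- matrix(rnorm(8000000))

  fun <- function(x) 1
  # Rehome the closure: with globalenv() as its environment,
  # serializing `fun` no longer captures `big`
  environment(fun) <- globalenv()

  a <- Sys.time()
  parallel::clusterCall(clct, fun)
  print(Sys.time() - a)  # fast again
}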

If I simply put the lines that call the cluster into their own function defined in the global environment, this works fine. But as soon as I pass anything into it from the test function (in this case, y = 4), it's broken again:

f1 <- function(x, y) {
  a <- Sys.time()
  parallel::clusterCall(x, function(x) y)  # the closure now captures y
  print(Sys.time() - a)
}

test2 <- function() {
  clct <- parallel::makeCluster(4)
  f1(clct, 4)  # fast: nothing big in scope yet

  big <- matrix(rnorm(8000000))
  f1(clct, 4)  # slow again, even though f1 is defined in the global environment

  parallel::stopCluster(clct)
}

test2()
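
For the argument-passing case, the best I've come up with (again just a sketch; f1_fixed is a made-up name, and my understanding of the promise behaviour may be off) is to force y and rehome the closure in a tiny environment of its own, parented to globalenv(). The force(y) matters because an unevaluated promise for y would still reference the caller's frame:

f1_fixed <- function(x, y) {
  force(y)  # evaluate the promise so it stops referencing the caller's frame
  fun <- function(x) y
  # Give the closure a minimal environment holding only `y`,
  # parented to globalenv(), so nothing else gets serialized
  environment(fun) <- list2env(list(y = y), envir = new.env(parent = globalenv()))
  a <- Sys.time()
  parallel::clusterCall(x, fun)
  print(Sys.time() - a)
}

Is there a cleaner base-R way to do this?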
