Channel: Active questions tagged r - Stack Overflow

foreach slows down dramatically


I'm running a number of serial, non-threaded processes using foreach. The intent is to run 300, but after the first 30 or so, these processes slow to a crawl. The processes happen to be lasso regressions using glinternet, so at first I thought that some of the (different) data frames were proving computationally hard to fit. However, when I restore the data from the hanging jobs and run it in an R session, it flies, so the slowdown is not a property of the regressions but of foreach. The data frame is not overly large (800 samples by 1200 variables), and glinternet.cv normally takes about 1 hour to run 10x, but some jobs don't finish in days (and can be reproducibly run to completion by hand in an hour!). Has anyone seen this behavior? Does foreach have some sort of cache that needs to be flushed, or something of that nature? I've used foreach for years and have never run into this problem.
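A minimal sketch of the "run it by hand" comparison described above, using base R only. `fit_one` is a hypothetical stand-in for the real glinternet.cv call, and the small random data set is an assumption so that the example runs quickly; the real inputs would be the data restored from a hanging job.

```r
# Hypothetical stand-in for the real glinternet.cv fit (assumption).
fit_one <- function(dat) lm.fit(dat$x, dat$y)$coefficients

# Save the inputs of one task, as a worker could do just before fitting.
set.seed(1)
job_file <- tempfile(fileext = ".rds")
saveRDS(list(x = matrix(rnorm(200 * 20), nrow = 200), y = rnorm(200)), job_file)

# Later, in a plain R session: restore the same inputs and time the same fit.
dat <- readRDS(job_file)
by_hand <- system.time(fit <- fit_one(dat))[["elapsed"]]
cat("elapsed by hand (s):", by_hand, "\n")
```

Comparing this elapsed time with the time the same task takes inside the foreach loop shows whether the data or the execution environment is responsible for the slowdown.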

As an extra detail, the machine is an AWS instance with 36 virtual cores, and I set up a virtual cluster of 30 nodes. No one else is using the machine, and the load is below 100%. Also, the initial cluster was a PSOCK cluster. I am now trying a fork cluster to rule out network issues.
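For reference, a setup of this kind could look roughly like the sketch below. It assumes the doParallel backend, which is not named above. `fit_one` is again a hypothetical stand-in for the glinternet.cv call, and the worker and task counts are cut down so that it runs on any machine. Each task records its own elapsed time and the PID of the worker that ran it, so slow tasks can be tied to a specific worker.

```r
library(foreach)
library(doParallel)  # assumption: the question does not name the backend

# Hypothetical stand-in for the real glinternet.cv fit (assumption).
fit_one <- function(i) {
  x <- matrix(rnorm(200 * 20), nrow = 200)
  y <- rnorm(200)
  lm.fit(x, y)$coefficients
}

n_workers <- 2  # 30 on the machine described above
n_tasks   <- 6  # 300 in the real run

# type = "FORK" is the Unix-only alternative mentioned above.
cl <- parallel::makeCluster(n_workers, type = "PSOCK")
registerDoParallel(cl)

# One row per task: task id, worker PID, elapsed seconds, result size.
timings <- foreach(i = seq_len(n_tasks), .combine = rbind) %dopar% {
  elapsed <- system.time(fit <- fit_one(i))[["elapsed"]]
  data.frame(task = i, pid = Sys.getpid(), elapsed = elapsed,
             n_coef = length(fit))
}

parallel::stopCluster(cl)
print(timings)
```

If the elapsed times grow with the task index, or cluster on particular PIDs, that points at the workers rather than at the individual data frames.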

