Minimalist example of what I'm trying to do:
dX_i <- rnorm(100, 0, 0.0002540362)
p_vec <- seq(0, 1, 0.25)
gamma_vec <- seq(1, 2, 0.25)
a_vec <- seq(2, 6, 1)
sigma_hat_vec <- c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560)
delta_j_vec <- c(0.0000005850109, 0.0000011700217, 0.0000017550326, 0.0000035100651, 0.0000052650977)
parameters <- expand.grid("p" = p_vec, "gamma" = gamma_vec, "a" = a_vec, "sigma_hat" = sigma_hat_vec, "delta_j" = delta_j_vec)
result <- sapply(1:nrow(parameters), function(x) {
  tmp <- parameters[x, ]
  p <- tmp$p
  a <- tmp$a
  gamma <- tmp$gamma
  sigma_hat <- tmp$sigma_hat
  delta_j <- tmp$delta_j
  B <- sum((abs(dX_i)^p) * (abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2)))
  return(B)
})
Goal: I need to calculate B on the vector dX_i for every combination of p, gamma, a, sigma_hat, and delta_j. In reality, however, the grid parameters has ~600k rows and dX_i has length ~80k. Moreover, I have a list of ~1000 such dX_i vectors, so I want to make this calculation as efficient as possible. Other approaches, e.g. converting parameters to a data.table and running sapply within that data.table, did not seem to give a speedup.
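For reference, the biggest per-iteration cost in the sapply above seems to be the row-wise parameters[x, ] subsetting. A self-contained sketch (reusing the example data from above, with a seed added only for reproducibility) that precomputes |dX_i| and all thresholds once and computes the same B:

```r
set.seed(42)  # for reproducibility only; the original example has no seed
dX_i <- rnorm(100, 0, 0.0002540362)
parameters <- expand.grid(
  p         = seq(0, 1, 0.25),
  gamma     = seq(1, 2, 0.25),
  a         = seq(2, 6, 1),
  sigma_hat = c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560),
  delta_j   = c(5.850109e-07, 1.1700217e-06, 1.7550326e-06, 3.5100651e-06, 5.2650977e-06)
)

# Precompute |dX_i| and the per-row thresholds in one vectorized pass,
# instead of subsetting the data.frame on every iteration.
abs_dX <- abs(dX_i)
thr <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
result2 <- vapply(seq_len(nrow(parameters)),
                  function(i) sum(abs_dX[abs_dX < thr[i]]^parameters$p[i]),
                  numeric(1))
```

This computes the same quantity: sum(abs^p * (abs < thr)) equals sum(abs[abs < thr]^p), since the indicator just selects the terms.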
I tried parallelizing the function (I am limited to running the script on a virtual Windows machine):
library(parallel)
numCores <- detectCores() - 1  # numCores was set elsewhere in my script; e.g. leave one core free
cl <- makePSOCKcluster(numCores)
num.iter <- 1:nrow(parameters)
result <- parSapply(cl, num.iter, function(x, parameters, dX_i) {
  tmp <- parameters[x, ]
  p <- tmp$p
  a <- tmp$a
  gamma <- tmp$gamma
  sigma_hat <- tmp$sigma_hat
  delta_j <- tmp$delta_j
  sum((abs(dX_i)^p) * (abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2)))
}, parameters, dX_i)
stopCluster(cl)
While this gave me a speedup, I still feel like I'm not really solving this problem in the most efficient way and would appreciate any suggestions.
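One direction I have sketched but not fully benchmarked: since p takes only a few distinct values, I can sort |dX_i| once, precompute cumulative sums of |dX_i|^p per unique p, and reduce each B to a single indexed lookup via findInterval. Note that findInterval counts sorted values <= thr, whereas the original uses a strict <; this differs only on exact ties, which have probability zero for continuous data. Self-contained sketch (with the example data and an added seed for reproducibility):

```r
set.seed(42)  # for reproducibility only; the original example has no seed
dX_i <- rnorm(100, 0, 0.0002540362)
parameters <- expand.grid(
  p         = seq(0, 1, 0.25),
  gamma     = seq(1, 2, 0.25),
  a         = seq(2, 6, 1),
  sigma_hat = c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560),
  delta_j   = c(5.850109e-07, 1.1700217e-06, 1.7550326e-06, 3.5100651e-06, 5.2650977e-06)
)

abs_sorted <- sort(abs(dX_i))
thr <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
# idx[i] = number of sorted |dX_i| values <= thr[i] (ties: see note above)
idx <- findInterval(thr, abs_sorted)

B <- numeric(nrow(parameters))
for (p in unique(parameters$p)) {
  sel <- parameters$p == p
  csp <- c(0, cumsum(abs_sorted^p))  # prepend 0 so idx == 0 maps to B = 0
  B[sel] <- csp[idx[sel] + 1]
}
```

This turns the O(length(dX_i)) inner sum into an O(1) lookup per grid row, so the total work is roughly one sort plus one cumulative sum per unique p, instead of 600k full passes over an 80k-long vector.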