Allocating tasks to parallel workers so that expected cost is roughly equal

I have an assignment problem where I'm trying to allocate a number of tasks with a known expected cost (runtime in seconds) to X parallel workers, subject to the constraint that each worker receives the same number of tasks (save for remainders), so that the total expected runtime per worker is roughly equal.

I'm using a data frame that defines the tasks to be executed, and for each task I can calculate a pretty accurate expected cost (runtime in seconds). E.g. something like this:

library("tibble")

set.seed(1232)
tasks <- tibble(task = 1:20, cost = runif(20, min = 1, max = 5)^2)
head(tasks)
#> # A tibble: 6 x 2
#>    task  cost
#>   <int> <dbl>
#> 1     1 22.5 
#> 2     2 20.0 
#> 3     3 21.3 
#> 4     4  8.13
#> 5     5 18.3 
#> 6     6 19.6

^{Created on 2019-11-21 by the reprex package (v0.3.0)}

This is then used with foreach::foreach(...) %dopar% ... to execute the tasks in parallel. foreach() splits the tasks into roughly equal sized groups with size nrow(tasks)/X where X is the number of parallel workers (cores).

I'm currently shuffling the task list so that the cost is roughly equal for each worker, but there can still be substantial deviations, i.e. some workers get finished much earlier than others and thus it would have been better if they had had some more costly tasks. E.g.:

# shuffle tasks (in the original application cost is not random initially)
tasks <- tasks[sample(1:nrow(tasks)), ]

# number of workers
X <- 4
tasks$worker <- rep(1:X, each = nrow(tasks)/X)

# expected total cost (runtime in s) per worker
sapply(split(tasks$cost, tasks$worker), sum)
#>        1        2        3        4 
#> 77.25278 35.25026 66.09959 64.05435

^{Created on 2019-11-21 by the reprex package (v0.3.0)}

The second worker finishes in half the time as the other workers, so its capacity is wasted and the thing overall takes longer to finish.

What I'd like to do instead is have a way of re-ordering the task data frame so that when foreach splits it into X groups the total expected cost per group is more even.

I imagine this is a super-well known kind of problem and I just don't know the right verbiage to google (nor how to do it in R). Thanks for any help.

(EDIT) Mostly better alternative

For now, a relatively simple alternative that seems to do better than random shuffling. This orders the tasks by cost, assigns the first X tasks to workers 1 to X, then assigns the next chunk of X tasks in reverse order to workers X to 1, etc (this is "alt1" below).

library("tibble")
library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library("ggplot2")
library("tidyr")

race <- function() {
  N_TASKS = 100
  X = 4
  tasks <- tibble(task = 1:N_TASKS, cost = runif(N_TASKS, min = 1, max = 10)^2)

  # random shuffle
  tasks$worker <- rep(1:X, each = nrow(tasks)/X)
  rando <- max(sapply(split(tasks$cost, tasks$worker), sum))

  # alternative 1
  tasks <- tasks[order(tasks$cost), ]
  tasks$worker <- rep(c(1:X, X:1), length.out = nrow(tasks))
  alt1 <- max(sapply(split(tasks$cost, tasks$worker), sum))

  # modified version of ivan100sic's answer
  # sort by descending cost, after initial allocation, allocate costly tasks
  # first to workers with lowest total cost so far
  group <- factor(rep(1:(ceiling(nrow(tasks)/4)), each = X))
  tasks <- tasks[order(tasks$cost, decreasing = TRUE), ]
  tasks$worker <- c(1:X, rep(NA, length.out = nrow(tasks) - X))
  task_sets <- split(tasks, group)
  task_sets[[1]]$worker <- 1:X
  for (i in 2:length(task_sets)) {
    # get current total cost by worker
    total <- task_sets %>% 
      bind_rows() %>%
      filter(!is.na(worker)) %>%
      group_by(worker) %>%
      summarize(cost = sum(cost)) %>%
      arrange(cost)
    task_sets[[i]]$worker <- total[["worker"]]
  }
  tasks <- bind_rows(task_sets)
  alt2  <- max(sapply(split(tasks$cost, tasks$worker), sum))

  c(rando = rando, alt1 = alt1, alt2 = alt2)
}

set.seed(24332)
sims <- replicate(1e3, race())
sims %>%
  t() %>%
  as_tibble() %>%
  pivot_longer(rando:alt2, names_to = "Method") %>%
  ggplot(aes(x = value, color = Method)) + 
  geom_density() +
  scale_x_continuous(limits = c(0, max(sims))) +
  labs(x = "Total runtime (s)")

^{Created on 2019-11-28 by the reprex package (v0.3.0)}

"rando": randomly shuffle the task list
"alt1": sort tasks by cost and alternate assigning to worker 1 to X, X to 1, etc.
"alt2": based on ivan100sic's answer below, after the first allocation to workers 1 to X, allocate based on total cost per worker so far

Allocating tasks to parallel workers so that expected cost is roughly equal

(EDIT) Mostly better alternative

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112