Can timing or memory usage illustrate the hazard of growing objects in a loop?

Occasionally, novice R programmers build data frames in a for loop, usually by initializing an empty data frame and then iteratively calling rbind. In response to this inefficient approach, we often cite Patrick Burns' R Inferno (Circle 2: Growing Objects), which emphasizes the hazard of this pattern.

In Python pandas (the other open-source data science tool), experts have asserted that this pattern involves quadratic copying, i.e. O(N^2) cost (see @unutbu and @Alexander). Additionally, the docs (see the section note) stress the copying problem for datasets, and the wiki explains that Python's list.append does not have the copy problem. I wonder if similar constructs apply to R.
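To make the O(N^2) claim concrete, here is a back-of-the-envelope sketch in R. It rests on a simplifying assumption rather than a measurement: every rbind call inside a loop copies all rows accumulated so far plus the new chunk, while a single combining step copies each row once. The function names and the chunk size argument k are illustrative only.

```r
# Simplified cost model (an assumption, not a measurement of R internals).

# Growing in a loop: iteration i copies i * k rows, so the total is
# k * n * (n + 1) / 2.
rows_copied_loop <- function(n, k = 500) {
  sum(as.numeric(seq_len(n)) * k)
}

# Combining once at the end: each of the n * k rows is copied once.
rows_copied_once <- function(n, k = 500) {
  as.numeric(n) * k
}

n <- c(50, 100, 500)
model <- data.frame(
  n    = n,
  loop = sapply(n, rows_copied_loop),
  once = sapply(n, rows_copied_once)
)
model$ratio <- model$loop / model$once   # equals (n + 1) / 2
print(model)
```

Under this model, a tenfold increase in n multiplies the copying done by the loop roughly a hundredfold, but the copying done by a single combine only tenfold.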

Specifically, my questions:

Can timing alone illustrate or quantify the growing-object-in-a-loop problem? See the microbenchmark results below. Burns shows timings to illustrate the computational cost of creating a sequence.
Or does memory usage illustrate or quantify the problem? See the Rprof results below. Burns cites using Rprof to show memory consumption within code.
Or is the growing-object problem context-specific, with only a general rule of thumb to avoid loops when building objects?

Consider the following examples, which build a data frame from random 500-row chunks, first by growing it in a loop and then by collecting the chunks in a list:

grow_df_loop <- function(n) {
  final_df <- data.frame()

  for(i in 1:n) {
    df <- data.frame(
      group = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
      int = sample(1:15, 500, replace=TRUE),
      num = rnorm(500),
      char = replicate(500, paste(sample(c(LETTERS, letters, c(0:9)), 3, replace=TRUE), collapse="")),
      bool = sample(c(TRUE, FALSE), 500, replace=TRUE),
      date = as.Date(sample(10957:as.integer(Sys.Date()), 500, replace=TRUE), origin="1970-01-01")
    )        
    final_df <- rbind(final_df, df)
  }

  return(final_df)
}

grow_df_list <- function(n) {
  df_list <- lapply(1:n, function(i)
    data.frame(
      group = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
      int = sample(1:15, 500, replace=TRUE),
      num = rnorm(500),
      char = replicate(500, paste(sample(c(LETTERS, letters, c(0:9)), 3, replace=TRUE), collapse="")),
      bool = sample(c(TRUE, FALSE), 500, replace=TRUE),
      date = as.Date(sample(10957:as.integer(Sys.Date()), 500, replace=TRUE), origin="1970-01-01")
    )
  )

  final_df <- do.call(rbind, df_list)
  return(final_df)
}
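Note that grow_df_list avoids repeated rbind calls rather than avoiding iteration as such. As a minimal sketch of that distinction, a for loop that fills a preallocated list and combines once at the end follows the same pattern. The names make_chunk and grow_df_prealloc are hypothetical, and make_chunk is a reduced stand-in for the data.frame() call above, not the original columns.

```r
# Hypothetical stand-in for the data.frame() call used in the functions above
make_chunk <- function(k = 500) {
  data.frame(
    int  = sample(1:15, k, replace = TRUE),
    num  = rnorm(k),
    bool = sample(c(TRUE, FALSE), k, replace = TRUE)
  )
}

grow_df_prealloc <- function(n, k = 500) {
  df_list <- vector("list", n)       # allocate all n slots up front
  for (i in seq_len(n)) {
    df_list[[i]] <- make_chunk(k)    # fill one slot; no rbind inside the loop
  }
  do.call(rbind, df_list)            # combine once at the end
}

out <- grow_df_prealloc(10, k = 20)
dim(out)
```

The loop is still there; what changes is that the accumulated data frame is no longer rebuilt on every iteration.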

Timing

Benchmarking by timing confirms that the list approach is more efficient across the different numbers of iterations. But given reproducible, uniform data examples, can timing results capture the difference in object growth?

library(microbenchmark)

microbenchmark(grow_df_loop(50), grow_df_list(50), times = 5L)
# Unit: milliseconds
#              expr      min       lq     mean   median       uq      max neval cld
#  grow_df_loop(50) 758.2412 762.3489 809.8988 793.3590 806.4191 929.1256     5   b
#  grow_df_list(50) 554.3722 562.1949 577.6891 568.7658 589.8565 613.2560     5  a 

microbenchmark(grow_df_loop(100), grow_df_list(100), times = 5L)
# Unit: seconds
#               expr      min       lq     mean   median       uq      max neval cld
#  grow_df_loop(100) 2.223617 2.225441 2.425668 2.233529 2.677309 2.768447     5   b
#  grow_df_list(100) 1.211181 1.255191 1.325670 1.287821 1.396905 1.477252     5  a 

microbenchmark(grow_df_loop(500), grow_df_list(500), times = 5L)
# Unit: seconds
#               expr      min       lq     mean   median       uq      max neval cld
#  grow_df_loop(500) 38.78245 39.74367 41.54976 40.10221 44.36565 44.75483     5   b
#  grow_df_list(500) 13.37076 13.90227 14.67498 14.53042 15.49942 16.07203     5  a
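One way to turn these timings into a statement about growth is to estimate a scaling exponent from the medians reported above. The sketch below fits log(time) against log(n), so the slope approximates p in time ~ n^p. The helper name scaling_exponent is illustrative, and with only three sizes and five runs each the estimate is rough at best.

```r
# Median timings in seconds, copied from the microbenchmark output above
timings <- data.frame(
  n    = c(50, 100, 500),
  loop = c(0.7933590, 2.233529, 40.10221),
  list = c(0.5687658, 1.287821, 14.53042)
)

# Slope of log(seconds) on log(n) estimates p in seconds ~ n^p
scaling_exponent <- function(n, seconds) {
  unname(coef(lm(log(seconds) ~ log(n)))[2])
}

exp_loop <- scaling_exponent(timings$n, timings$loop)
exp_list <- scaling_exponent(timings$n, timings$list)
round(c(loop = exp_loop, list = exp_list), 2)
```

On these numbers the loop exponent comes out at roughly 1.7 and the list exponent at roughly 1.4. Both runs also include the cost of generating the random data, which grows linearly with n and pulls the estimates toward 1.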

Memory Usage

Additionally, memory profiling shows the "rbind" memory totals growing sizeably with the number of iterations, more markedly with the loop approach than with the list approach. Given a reproducible, uniform example, can mem.total results capture the difference in object growth? Is there any other approach to use?

Loop Approach

n = 50

utils::Rprof(tmp <- tempfile(), memory.profiling = TRUE)
output_df1 <- grow_df_loop(50)
utils::Rprof(NULL)
summaryRprof(tmp, memory="both")
unlink(tmp)

# $by.total
#                           total.time total.pct mem.total self.time self.pct
# "grow_df_loop"                  0.58    100.00     349.1      0.00     0.00
# "data.frame"                    0.38     65.52     209.4      0.00     0.00
# "paste"                         0.28     48.28     186.4      0.06    10.34
# "FUN"                           0.26     44.83     150.8      0.02     3.45
# "lapply"                        0.26     44.83     150.8      0.00     0.00
# "replicate"                     0.26     44.83     150.8      0.00     0.00
# "sapply"                        0.26     44.83     150.8      0.00     0.00
# "sample"                        0.20     34.48     131.4      0.08    13.79
# "rbind"                         0.20     34.48     139.7      0.00     0.00
# "[<-.factor"                    0.12     20.69      66.0      0.10    17.24
# "[<-"                           0.12     20.69      66.0      0.00     0.00
# "factor"                        0.10     17.24      47.8      0.04     6.90
# "as.data.frame"                 0.10     17.24      48.5      0.00     0.00
# "as.data.frame.character"       0.10     17.24      48.5      0.00     0.00
# "order"                         0.06     10.34      12.9      0.06    10.34
# "as.vector"                     0.04      6.90      38.7      0.04     6.90
# "sample.int"                    0.04      6.90      18.7      0.02     3.45
# "as.vector.factor"              0.04      6.90      38.7      0.00     0.00
# "deparse"                       0.04      6.90      35.6      0.00     0.00
# "!"                             0.02      3.45      18.7      0.02     3.45
# ":"                             0.02      3.45       0.0      0.02     3.45
# "anyNA"                         0.02      3.45      19.0      0.02     3.45
# "as.POSIXlt.POSIXct"            0.02      3.45      10.1      0.02     3.45
# "c"                             0.02      3.45      19.8      0.02     3.45
# "is.na"                         0.02      3.45      18.9      0.02     3.45
# "length"                        0.02      3.45      13.8      0.02     3.45
# "mode"                          0.02      3.45      16.6      0.02     3.45
# "%in%"                          0.02      3.45      16.6      0.00     0.00
# ".deparseOpts"                  0.02      3.45      19.0      0.00     0.00
# "as.Date"                       0.02      3.45      10.1      0.00     0.00
# "as.POSIXlt"                    0.02      3.45      10.1      0.00     0.00
# "Sys.Date"                      0.02      3.45      10.1      0.00     0.00
# 
# $sample.interval
# [1] 0.02
# 
# $sampling.time
# [1] 0.58
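The results for the other sizes repeat the same Rprof calls with a different n. A minimal sketch of wrapping those calls in a reusable helper is shown here; profile_by_total and toy_grow are hypothetical names, not part of the original code, and the helper returns NULL when the profiled call is too short to collect any samples.

```r
# Hypothetical helper: profile fun(n) with memory profiling enabled and
# return the $by.total table, or NULL if summaryRprof() cannot summarise it.
profile_by_total <- function(fun, n, interval = 0.02) {
  tmp <- tempfile()
  on.exit({
    utils::Rprof(NULL)   # make sure profiling is switched off on exit
    unlink(tmp)
  }, add = TRUE)
  utils::Rprof(tmp, interval = interval, memory.profiling = TRUE)
  fun(n)
  utils::Rprof(NULL)
  # summaryRprof() signals an error when the profile holds no samples
  tryCatch(
    utils::summaryRprof(tmp, memory = "both")$by.total,
    error = function(e) NULL
  )
}

# Toy workload (illustrative only): deliberately grows a vector in a loop
toy_grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i)
  x
}

by_total <- profile_by_total(toy_grow, 20000)
```

The same helper could be called as profile_by_total(grow_df_loop, 100) or profile_by_total(grow_df_list, 100) to collect one table per approach and size.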

n = 100

# $by.total
#                           total.time total.pct mem.total self.time self.pct
# "grow_df_loop"                  1.74     98.86     963.0      0.00     0.00
# "rbind"                         1.06     60.23     599.3      0.06     3.41
# "data.frame"                    0.68     38.64     363.7      0.02     1.14
# "lapply"                        0.50     28.41     239.0      0.04     2.27
# "replicate"                     0.50     28.41     239.0      0.00     0.00
# "sapply"                        0.50     28.41     239.0      0.00     0.00
# "paste"                         0.46     26.14     218.4      0.06     3.41
# "FUN"                           0.46     26.14     218.4      0.00     0.00
# "factor"                        0.44     25.00     249.2      0.24    13.64
# "sample"                        0.40     22.73     179.2      0.10     5.68
# "[<-"                           0.38     21.59     244.3      0.00     0.00
# "[<-.factor"                    0.34     19.32     229.5      0.30    17.05
# "c"                             0.26     14.77     136.6      0.26    14.77
# "as.vector"                     0.24     13.64     101.2      0.24    13.64
# "as.vector.factor"              0.24     13.64     101.2      0.00     0.00
# "order"                         0.14      7.95      87.3      0.14     7.95
# "as.data.frame"                 0.14      7.95      87.3      0.00     0.00
# "as.data.frame.character"       0.14      7.95      87.3      0.00     0.00
# "sample.int"                    0.10      5.68      28.2      0.10     5.68
# "unique"                        0.10      5.68      64.9      0.00     0.00
# "is.na"                         0.06      3.41      62.4      0.06     3.41
# "unique.default"                0.04      2.27      42.4      0.04     2.27
# "[<-.Date"                      0.04      2.27      14.9      0.00     0.00
# ".Call"                         0.02      1.14       0.0      0.02     1.14
# "Make.row.names"                0.02      1.14       0.0      0.02     1.14
# "NextMethod"                    0.02      1.14       0.0      0.02     1.14
# "structure"                     0.02      1.14      10.3      0.02     1.14
# "unclass"                       0.02      1.14      14.9      0.02     1.14
# ".Date"                         0.02      1.14       0.0      0.00     0.00
# ".rs.enqueClientEvent"          0.02      1.14       0.0      0.00     0.00
# "as.Date"                       0.02      1.14      23.2      0.00     0.00
# "as.Date.character"             0.02      1.14      23.2      0.00     0.00
# "as.Date.numeric"               0.02      1.14      23.2      0.00     0.00
# "charToDate"                    0.02      1.14      23.2      0.00     0.00
# "hook"                          0.02      1.14       0.0      0.00     0.00
# "is.na.POSIXlt"                 0.02      1.14      23.2      0.00     0.00
# "utils::Rprof"                  0.02      1.14       0.0      0.00     0.00
# 
# $sample.interval
# [1] 0.02
# 
# $sampling.time
# [1] 1.76

n = 500

# $by.total
#                           total.time total.pct mem.total self.time self.pct
# "grow_df_loop"                 28.12    100.00   15557.7      0.00     0.00
# "rbind"                        25.30     89.97   13418.5      3.06    10.88
# "factor"                        8.94     31.79    5026.5      6.98    24.82
# "[<-"                           8.72     31.01    4486.9      0.02     0.07
# "[<-.factor"                    7.62     27.10    3915.5      7.32    26.03
# "unique"                        3.06     10.88    2060.9      0.00     0.00
# "as.vector"                     2.96     10.53    1250.1      2.96    10.53
# "as.vector.factor"              2.96     10.53    1250.1      0.00     0.00
# "data.frame"                    2.82     10.03    2139.1      0.02     0.07
# "unique.default"                2.30      8.18    1657.9      2.30     8.18
# "replicate"                     1.88      6.69    1364.7      0.00     0.00
# "sapply"                        1.88      6.69    1364.7      0.00     0.00
# "FUN"                           1.84      6.54    1367.2      0.18     0.64
# "lapply"                        1.84      6.54    1338.8      0.02     0.07
# "paste"                         1.70      6.05    1281.3      0.38     1.35
# "sample"                        1.36      4.84    1089.2      0.20     0.71
# "[<-.Date"                      1.08      3.84     571.4      0.00     0.00
# "c"                             1.04      3.70     688.7      1.04     3.70
# ".Date"                         0.96      3.41     488.0      0.34     1.21
# "sample.int"                    0.76      2.70     584.2      0.74     2.63
# "as.data.frame"                 0.70      2.49     533.6      0.00     0.00
# "as.data.frame.character"       0.64      2.28     476.0      0.00     0.00
# "NextMethod"                    0.62      2.20     424.7      0.62     2.20
# "order"                         0.60      2.13     475.5      0.50     1.78
# "structure"                     0.32      1.14     155.5      0.32     1.14
# "is.na"                         0.28      1.00     150.5      0.26     0.92
# "Make.row.names"                0.12      0.43     153.8      0.12     0.43
# "unclass"                       0.12      0.43      83.3      0.12     0.43
# "as.Date"                       0.10      0.36     120.1      0.02     0.07
# "length"                        0.06      0.21      79.2      0.06     0.21
# "seq.int"                       0.06      0.21      57.0      0.06     0.21
# "vapply"                        0.06      0.21      84.6      0.02     0.07
# ":"                             0.04      0.14       1.1      0.04     0.14
# "as.POSIXlt.POSIXct"            0.04      0.14      57.7      0.04     0.14
# "is.factor"                     0.04      0.14       0.0      0.04     0.14
# "deparse"                       0.04      0.14      55.0      0.02     0.07
# "eval"                          0.04      0.14      36.2      0.02     0.07
# "match.arg"                     0.04      0.14      25.2      0.02     0.07
# "match.fun"                     0.04      0.14      32.4      0.02     0.07
# "as.data.frame.integer"         0.04      0.14      55.0      0.00     0.00
# "as.POSIXlt"                    0.04      0.14      57.7      0.00     0.00
# "force"                         0.04      0.14      55.0      0.00     0.00
# "make.names"                    0.04      0.14      42.1      0.00     0.00
# "Sys.Date"                      0.04      0.14      57.7      0.00     0.00
# "!"                             0.02      0.07      29.6      0.02     0.07
# "$"                             0.02      0.07       2.6      0.02     0.07
# "any"                           0.02      0.07      18.3      0.02     0.07
# "as.data.frame.numeric"         0.02      0.07       2.6      0.02     0.07
# "as.data.frame.vector"          0.02      0.07      21.6      0.02     0.07
# "as.list"                       0.02      0.07      26.6      0.02     0.07
# "baseenv"                       0.02      0.07      25.2      0.02     0.07
# "is.ordered"                    0.02      0.07      14.5      0.02     0.07
# "lengths"                       0.02      0.07      14.9      0.02     0.07
# "levels"                        0.02      0.07       0.0      0.02     0.07
# "mode"                          0.02      0.07      30.7      0.02     0.07
# "names"                         0.02      0.07       0.0      0.02     0.07
# "rnorm"                         0.02      0.07      29.6      0.02     0.07
# "%in%"                          0.02      0.07      30.7      0.00     0.00
# "as.Date.character"             0.02      0.07       2.6      0.00     0.00
# "as.Date.numeric"               0.02      0.07       2.6      0.00     0.00
# "as.POSIXct"                    0.02      0.07       2.6      0.00     0.00
# "as.POSIXct.POSIXlt"            0.02      0.07       2.6      0.00     0.00
# "charToDate"                    0.02      0.07       2.6      0.00     0.00
# "eval.parent"                   0.02      0.07      11.0      0.00     0.00
# "is.na.POSIXlt"                 0.02      0.07       2.6      0.00     0.00
# "simplify2array"                0.02      0.07      14.9      0.00     0.00
# 
# $sample.interval
# [1] 0.02
# 
# $sampling.time
# [1] 28.12

List Approach

n = 50

# $by.total
#                           total.time total.pct mem.total self.time self.pct
# "grow_df_list"                  0.40       100     257.0      0.00        0
# "data.frame"                    0.32        80     175.6      0.02        5
# "lapply"                        0.32        80     175.6      0.02        5
# "FUN"                           0.32        80     175.6      0.00        0
# "replicate"                     0.24        60     129.6      0.00        0
# "sapply"                        0.24        60     129.6      0.00        0
# "paste"                         0.22        55     119.2      0.10       25
# "sample"                        0.12        30      49.4      0.00        0
# "sample.int"                    0.08        20      39.1      0.08       20
# "<Anonymous>"                   0.08        20      81.4      0.00        0
# "do.call"                       0.08        20      81.4      0.00        0
# "rbind"                         0.08        20      81.4      0.00        0
# "factor"                        0.06        15      29.7      0.02        5
# "as.data.frame"                 0.06        15      29.7      0.00        0
# "as.data.frame.character"       0.06        15      29.7      0.00        0
# "c"                             0.04        10      10.3      0.04       10
# "order"                         0.04        10      17.3      0.04       10
# "unique.default"                0.04        10      31.1      0.04       10
# "[<-"                           0.04        10      50.3      0.00        0
# "unique"                        0.04        10      31.1      0.00        0
# ".Date"                         0.02         5      27.9      0.02        5
# "[<-.factor"                    0.02         5      22.4      0.02        5
# "[<-.Date"                      0.02         5      27.9      0.00        0
# 
# $sample.interval
# [1] 0.02
# 
# $sampling.time
# [1] 0.4

n = 100

# $by.total
#                           total.time total.pct mem.total self.time self.pct
# "grow_df_list"                  1.00       100     620.4      0.00        0
# "data.frame"                    0.66        66     401.8      0.00        0
# "FUN"                           0.66        66     401.8      0.00        0
# "lapply"                        0.66        66     401.8      0.00        0
# "paste"                         0.42        42     275.3      0.14       14
# "replicate"                     0.42        42     275.3      0.00        0
# "sapply"                        0.42        42     275.3      0.00        0
# "rbind"                         0.34        34     218.6      0.02        2
# "<Anonymous>"                   0.34        34     218.6      0.00        0
# "do.call"                       0.34        34     218.6      0.00        0
# "sample"                        0.28        28     188.6      0.08        8
# "unique.default"                0.20        20      90.1      0.20       20
# "unique"                        0.20        20      90.1      0.00        0
# "as.data.frame"                 0.18        18      81.2      0.00        0
# "factor"                        0.16        16      81.2      0.02        2
# "as.data.frame.character"       0.16        16      81.2      0.00        0
# "[<-.factor"                    0.14        14     112.0      0.14       14
# "sample.int"                    0.14        14      96.8      0.14       14
# "[<-"                           0.14        14     112.0      0.00        0
# "order"                         0.12        12      51.2      0.12       12
# "c"                             0.06         6      45.8      0.06        6
# "as.Date"                       0.04         4      28.3      0.02        2
# "length"                        0.02         2      17.0      0.02        2
# "strptime"                      0.02         2      11.2      0.02        2
# "structure"                     0.02         2       0.0      0.02        2
# "as.data.frame.integer"         0.02         2       0.0      0.00        0
# "as.Date.character"             0.02         2      11.2      0.00        0
# "as.Date.numeric"               0.02         2      11.2      0.00        0
# "charToDate"                    0.02         2      11.2      0.00        0
# 
# $sample.interval
# [1] 0.02
# 
# $sampling.time
# [1] 1

n = 500

# $by.total
#                           total.time total.pct mem.total self.time self.pct
# "grow_df_list"                  9.40    100.00    5621.8      0.00     0.00
# "rbind"                         6.12     65.11    3633.5      0.44     4.68
# "<Anonymous>"                   6.12     65.11    3633.5      0.00     0.00
# "do.call"                       6.12     65.11    3633.5      0.00     0.00
# "lapply"                        3.28     34.89    1988.3      0.34     3.62
# "FUN"                           3.28     34.89    1988.3      0.10     1.06
# "data.frame"                    3.28     34.89    1988.3      0.02     0.21
# "[<-"                           3.28     34.89    2118.4      0.00     0.00
# "[<-.factor"                    3.00     31.91    1829.1      3.00    31.91
# "replicate"                     2.36     25.11    1422.9      0.00     0.00
# "sapply"                        2.36     25.11    1422.9      0.00     0.00
# "unique"                        2.32     24.68    1189.9      0.00     0.00
# "paste"                         1.98     21.06    1194.2      0.70     7.45
# "unique.default"                1.96     20.85    1017.8      1.96    20.85
# "sample"                        1.20     12.77     707.4      0.44     4.68
# "as.data.frame"                 0.88      9.36     540.5      0.02     0.21
# "as.data.frame.character"       0.78      8.30     496.2      0.00     0.00
# "factor"                        0.72      7.66     444.2      0.06     0.64
# "c"                             0.68      7.23     379.6      0.68     7.23
# "order"                         0.64      6.81     385.1      0.64     6.81
# "sample.int"                    0.40      4.26     233.0      0.38     4.04
# ".Date"                         0.28      2.98     289.3      0.10     1.06
# "[<-.Date"                      0.28      2.98     289.3      0.00     0.00
# "NextMethod"                    0.18      1.91     171.2      0.18     1.91
# "deparse"                       0.08      0.85      54.6      0.02     0.21
# "%in%"                          0.08      0.85      54.6      0.00     0.00
# "mode"                          0.08      0.85      54.6      0.00     0.00
# "length"                        0.06      0.64      10.4      0.06     0.64
# "structure"                     0.06      0.64      30.8      0.04     0.43
# ".deparseOpts"                  0.06      0.64      49.1      0.02     0.21
# "[["                            0.06      0.64      34.2      0.02     0.21
# ":"                             0.04      0.43      33.6      0.04     0.43
# "[[.data.frame"                 0.04      0.43      22.6      0.04     0.43
# "force"                         0.04      0.43      20.0      0.00     0.00
# "as.vector"                     0.02      0.21       0.0      0.02     0.21
# "is.na"                         0.02      0.21       0.0      0.02     0.21
# "levels"                        0.02      0.21      14.6      0.02     0.21
# "make.names"                    0.02      0.21       9.4      0.02     0.21
# "pmatch"                        0.02      0.21      17.3      0.02     0.21
# "as.data.frame.Date"            0.02      0.21       5.5      0.00     0.00
# "as.data.frame.integer"         0.02      0.21       0.0      0.00     0.00
# "as.data.frame.logical"         0.02      0.21      14.5      0.00     0.00
# "as.data.frame.numeric"         0.02      0.21      13.5      0.00     0.00
# "as.data.frame.vector"          0.02      0.21      17.3      0.00     0.00
# "simplify2array"                0.02      0.21       0.0      0.00     0.00
# 
# $sample.interval
# [1] 0.02
# 
# $sampling.time
# [1] 9.4
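To compare the two approaches on a single number, one option is the share of mem.total attributed to "rbind" in each run. The sketch below only rearranges values copied from the tables above; the column names are illustrative, and because Rprof samples at intervals the shares are descriptive rather than exact.

```r
# mem.total values copied from the $by.total tables above:
# "total" is the top-level function row, "rbind" is the "rbind" row.
mem <- data.frame(
  approach = rep(c("loop", "list"), each = 3),
  n        = rep(c(50, 100, 500), times = 2),
  total    = c(349.1, 963.0, 15557.7, 257.0, 620.4, 5621.8),
  rbind    = c(139.7, 599.3, 13418.5, 81.4, 218.6, 3633.5)
)

# Fraction of each run's memory total that falls under rbind
mem$rbind_share <- round(mem$rbind / mem$total, 2)
print(mem)
```

On these figures, rbind accounts for about 40%, 62%, and 86% of mem.total in the loop runs, against about 32%, 35%, and 65% in the list runs.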

Graphs (using a different call to save $by.total results)

[Figure: Loop Approach]

[Figure: List Approach]
