Channel: Active questions tagged r - Stack Overflow

How to demonstrate the power of dplyr's join verbs?

I wanted to demonstrate to a friend the elegance and speed of dplyr's join verbs (e.g. inner_join()) compared with base R and simple subsetting. I took a large dataset (from the nycflights13 package) and started with a simple task. To my surprise, base R with simple subsetting was up to 10 times faster, so I could only really demonstrate the elegance, not the speed.

The question is: what am I missing? When do dplyr's join verbs surpass base R and simple subsetting in performance? Do they ever?

(P.S.: I know about data.table's excellent performance; I am asking specifically about dplyr.)

My Demo:

library(tidyverse)
library(nycflights13)
library(microbenchmark)

dim(flights)

[1] 336776 19

dim(airports)

[1] 1458 8

The task: get the unique tail numbers (tailnum) of all planes in flights whose destination airport's tzone is "America/New_York":

base_no_join <- function() {
  unique(flights$tailnum[flights$dest %in% airports$faa[airports$tzone == "America/New_York"]])
}

dplyr_no_join <- function() {
  flights %>%
    filter(dest %in% (airports %>%
                           filter(tzone=="America/New_York") %>%
                           pull(faa))) %>%
    pull(tailnum) %>%
    unique()
}

dplyr_join <- function() {
  flights %>%
    inner_join(airports, by = c("dest" = "faa")) %>%
    filter(tzone == "America/New_York") %>%
    pull(tailnum) %>%
    unique()
}
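
A further variant that might be worth comparing is a filtering join. The sketch below is not part of the benchmark in this post: the name dplyr_semi_join is hypothetical, and no timing is claimed for it. It assumes the same flights and airports tables from nycflights13. It filters the small airports table first and then uses semi_join(), which keeps the matching rows of flights without copying any airport columns onto them.

```r
library(dplyr)
library(nycflights13)

# Hypothetical fourth variant (an assumption, not benchmarked in the post):
# filter the small table first, then use a filtering join.
dplyr_semi_join <- function() {
  # Keep only the airports in the New York time zone (rows with NA tzone are dropped).
  ny_airports <- airports %>%
    filter(tzone == "America/New_York")

  # semi_join() keeps rows of flights that have a match in ny_airports,
  # preserving the row order of flights and adding no columns.
  flights %>%
    semi_join(ny_airports, by = c("dest" = "faa")) %>%
    pull(tailnum) %>%
    unique()
}
```

Because semi_join() preserves the row order of flights, the result should be comparable with all.equal() against the three functions above, and the function could be added to the microbenchmark() call to see how it fares.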

Check that they give the same results:

all.equal(dplyr_join(), dplyr_no_join())

[1] TRUE

all.equal(dplyr_join(), base_no_join())

[1] TRUE

Now benchmark:

microbenchmark(base_no_join(), dplyr_no_join(), dplyr_join(), times = 10)
Unit: milliseconds
            expr     min      lq     mean   median       uq      max neval
  base_no_join()  9.7198 10.1067 13.16934 11.19465  13.4736  24.2831    10
 dplyr_no_join() 21.2810 22.9710 36.04867 26.59595  34.4221 108.0677    10
    dplyr_join() 60.7753 64.5726 93.86220 91.10475 119.1546 137.1721    10

Please help me find an example that shows the join's superiority, if one exists.

