Channel: Active questions tagged r - Stack Overflow

Select after a join with conflicting columns with dtplyr

If I run the following trivial example, I get the expected output:

library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table()
>     b a.x a.y
> 1:  6   1   a
> 2:  7   2   b
> 3:  8   3   c
> 4:  9   4   d
> 5: 10   5   e

Note that the conflicting a columns are handled properly: following the standard dplyr convention, they get the .x and .y suffixes.
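For comparison, here is a minimal sketch of the same join on ordinary (non-lazy) data frames, assuming only dplyr is installed. The names df1, df2, joined and res are illustrative. In plain dplyr the suffixes come from left_join()'s suffix argument, whose default is c(".x", ".y"), and the suffixed names can then be used directly in select().

```r
library(dplyr)

# Plain data frames with the same shape as dt1 and dt2 (illustrative names)
df1 <- data.frame(a = 1:5, b = 6:10)
df2 <- data.frame(a = letters[1:5], b = 6:10)

# suffix = c(".x", ".y") is the default; it is spelled out here only for clarity
joined <- left_join(df1, df2, by = "b", suffix = c(".x", ".y"))
names(joined)
# "a.x" "b"   "a.y"

# With plain dplyr, the suffixed names are valid inside select()
res <- select(joined, -a.y)
names(res)
# "a.x" "b"
```

This suggests the suffixed names themselves are fine and that the problem is specific to how the lazy dtplyr pipeline handles them.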

However, if I now try to drop one of the columns:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a.y
  ) %>%
  as.data.table()
> Error in is_character(x) : object 'a.y' not found

Interestingly, selecting one of the a columns with select(a.x) gives the same error. However, select(a), which refers to a column that should no longer exist after the join, returns the following output:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    a
  ) %>%
  as.data.table()
>    a.b
> 1:   1
> 2:   2
> 3:   3
> 4:   4
> 5:   5

The selected column is clearly dt1$a, but for some reason it is named a.b. (Trying select(a.b) gives the same "object not found" error.)

Meanwhile, dropping a with select(-a) removes both a columns:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a
  ) %>%
  as.data.table()
>     b
> 1:  6
> 2:  7
> 3:  8
> 4:  9
> 5: 10

So, how can I use select() after a join when the two tables share conflicting columns that are not part of the by key?
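Two possible workarounds, sketched under the assumption (not verified against any particular dtplyr version) that the problem lies in how dtplyr translates a select() step that follows a join with suffixed columns. The first materializes the join with as.data.table() before calling select(), so select() runs eagerly on an ordinary data.table. The second renames the clashing column before the join so that no .x/.y suffixes are generated at all; the name a_y is a hypothetical choice.

```r
library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

# Workaround 1 (sketch): evaluate the lazy join first, then call select()
# on the resulting data.table, so dtplyr never translates the select step
res1 <- dt1 %>%
  left_join(dt2, by = "b") %>%
  as.data.table() %>%
  select(-a.y)

# Workaround 2 (sketch): rename the clashing column before joining so the
# join produces no suffixed names; "a_y" is a hypothetical name
res2 <- dt1 %>%
  left_join(rename(dt2, a_y = a), by = "b") %>%
  select(-a_y) %>%
  as.data.table()
```

The first option gives up laziness for everything after the join; the second keeps the whole pipeline lazy but requires knowing the conflicting column names in advance.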
