If I run the following trivial example, I get the expected output:
library(dplyr)
library(dtplyr)
library(data.table)
dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table()
>     b a.x a.y
> 1:  6   1   a
> 2:  7   2   b
> 3:  8   3   c
> 4:  9   4   d
> 5: 10   5   e
Note that the conflicting `a` columns are handled properly, using the standard dplyr convention of adding `.x` and `.y` suffixes.
However, if I now try to drop one of the columns:
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a.y
  ) %>%
  as.data.table()
> Error in is_character(x) : object 'a.y' not found
Interestingly, if I try to select one of the `a` columns (`select(a.x)`), I get the same error. But if I instead try `select(a)` (selecting a column which shouldn't really exist anymore), I get the following output:
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    a
  ) %>%
  as.data.table()
>    a.b
> 1:   1
> 2:   2
> 3:   3
> 4:   4
> 5:   5
where the selected column is clearly `dt1$a`, but for some reason the column is named `a.b`. (If I try `select(a.b)`, I get the same `object not found` error.)
Meanwhile, if I try to drop `a`, both `a` columns are dropped:
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a
  ) %>%
  as.data.table()
>     b
> 1:  6
> 2:  7
> 3:  8
> 4:  9
> 5: 10
So, how can I use `select` with joins where the tables have conflicting columns that are not join keys?
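For reference, one fallback I can think of is to rename the clashing column before the join, so that no suffixes are ever generated. This is only a sketch of a workaround, not an explanation of the behaviour above: the name `a_right` is made up for illustration, and I am assuming `rename()` on a `lazy_dt` behaves as it does in regular dplyr.

```r
library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

# Rename the conflicting column up front so the join has no name clash
# to resolve. `a_right` is a hypothetical name chosen for this example.
res <- dt1 %>%
  left_join(
    rename(dt2, a_right = a),
    by = "b"
  ) %>%
  select(
    -a_right
  ) %>%
  as.data.table()
```

This sidesteps the suffixed names entirely, but it does not tell me how to refer to `a.x` or `a.y` after the join, which is what I would really like to do.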