If I run the following trivial example, I get the expected output:
library(dplyr)
library(dtplyr)
library(data.table)
dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table()
>     b a.x a.y
> 1:  6   1   a
> 2:  7   2   b
> 3:  8   3   c
> 4:  9   4   d
> 5: 10   5   e
Note that the conflicting columns a are properly managed, using the standard dplyr format of adding .x and .y suffixes.
However, if I now try to drop one of the columns:
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a.y
  ) %>%
  as.data.table()
> Error in is_character(x) : object 'a.y' not found
Interestingly, selecting one of the suffixed columns (select(a.x)) gives the same error. But if I instead try select(a), a column which shouldn't really exist anymore, I get the following output:
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    a
  ) %>%
  as.data.table()
>    a.b
> 1:   1
> 2:   2
> 3:   3
> 4:   4
> 5:   5
Here the selected column is clearly dt1$a, but for some reason the column is named a.b. (If I try select(a.b), I get the same "object not found" error.)
Meanwhile, if I try to drop a, both a columns are dropped:
dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a
  ) %>%
  as.data.table()
>     b
> 1:  6
> 2:  7
> 3:  8
> 4:  9
> 5: 10
So, how can I use select with joins where the tables have conflicting (and not joined-by) columns?
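For reference, the only workaround I can sketch is to avoid the name clash altogether by renaming the conflicting column before the join, so that no .x/.y suffixes are ever generated. This assumes that renaming is acceptable and that rename() is translated correctly on a lazy_dt; a_letter is a made-up column name used only for illustration:

```r
library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

# Rename the clashing column on the right-hand table first, so the join
# never has to disambiguate two columns called "a".
# "a_letter" is a hypothetical name chosen for this example.
result <- dt1 %>%
  left_join(
    dt2 %>% rename(a_letter = a),
    by = "b"
  ) %>%
  # The renamed column can now be referred to by an unambiguous name.
  select(-a_letter) %>%
  as.data.table()

print(result)
```

This sidesteps the suffixes rather than explaining why select() fails on them, so the question above still stands.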