Is there an R function to get the rows that are in one data.frame but not in another, if the data.frames contain list-columns? I know dplyr::setdiff will work on regular data.frames, but if I apply it to a data.frame with a list-column, I get an error.
list_df1 <- data.frame(x = c(1, 2, 2))
list_df1$y <- list(c("A", "B"), c("C"), c("B", "C"))
list_df2 <- data.frame(x = c(2, 3))
list_df2$y <- list(c("C"), c("D", "E"))
dplyr::setdiff(list_df1, list_df2)
#> Error: Can't join on 'y' x 'y' because of incompatible types (list / list)
Currently I've been using a loop over the rows in both data.frames and directly comparing if the rows are equal:
in_df2 <- rep(FALSE, nrow(list_df1))
for (row_ind1 in seq_len(nrow(list_df1))) {
  for (row_ind2 in seq_len(nrow(list_df2))) {
    rows_equal <- all.equal(list_df1[row_ind1, ],
                            list_df2[row_ind2, ],
                            check.attributes = FALSE)
    if (isTRUE(rows_equal)) {
      in_df2[row_ind1] <- TRUE
      break
    }
  }
}
list_df1[!in_df2, ]
#>   x    y
#> 1 1 A, B
#> 3 2 B, C
And while this gives the result I'm looking for, I'm sure there must be a better or more efficient solution.
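For reference, the same all-columns comparison can be written without the explicit nested loop. This is only a sketch in base R, not a known library solution: `row_values` is a hypothetical helper I made up for illustration, and it uses `identical()` rather than `all.equal()`, so it is stricter about types (for example, integer 2L would not match numeric 2). It still compares every row of one data.frame against every row of the other, so it is tidier but not asymptotically faster.

```r
# Example data: list_df1 has three rows, matching the printed result above
list_df1 <- data.frame(x = c(1, 2, 2))
list_df1$y <- list(c("A", "B"), c("C"), c("B", "C"))
list_df2 <- data.frame(x = c(2, 3))
list_df2$y <- list(c("C"), c("D", "E"))

# Hypothetical helper: turn a data.frame into a list with one element per row,
# where each element is a named list of that row's values (list-columns included)
row_values <- function(df) {
  lapply(seq_len(nrow(df)), function(i) lapply(df, `[[`, i))
}

rows1 <- row_values(list_df1)
rows2 <- row_values(list_df2)

# For each row of list_df1, check whether an identical row exists in list_df2
in_df2 <- vapply(
  rows1,
  function(r1) any(vapply(rows2, identical, logical(1), r1)),
  logical(1)
)

# Keep only the rows of list_df1 that have no exact match in list_df2
result <- list_df1[!in_df2, ]
```

Here `result` should contain the first and third rows of `list_df1`, the same rows the loop keeps.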
dplyr::anti_join is also a possible solution, if the non-list columns uniquely identify the rows. But in this case, I want to remove rows only if all entries are identical between the two data.frames. If we apply anti_join on just column x, it also drops rows that differ only in y, which is not the result I'm looking for:
dplyr::anti_join(list_df1, list_df2, by = "x")
#>   x    y
#> 1 1 A, B
And applying it to all columns gives an error, just like setdiff:
dplyr::anti_join(list_df1, list_df2)
#> Error: Can't join on 'y' x 'y' because of incompatible types (list / list)