Channel: Active questions tagged r - Stack Overflow

Using setdiff on dataframes with list columns

Is there an R function to get the rows that are in one data.frame but not in another, if the data.frames contain list-columns? I know dplyr::setdiff will work on regular data.frames, but if I apply it to a data.frame with a list-column, I get an error.

list_df1 <- data.frame(x = c(1, 2, 2))
list_df1$y <- list(c("A", "B"), c("C"), c("B", "C"))
list_df2 <- data.frame(x = c(2, 3))
list_df2$y <- list(c("C"), c("D", "E"))
dplyr::setdiff(list_df1, list_df2)
#> Error: Can't join on 'y' x 'y' because of incompatible types (list / list)

Currently I loop over the rows of both data.frames and directly compare each pair of rows for equality:

in_df2 <- rep(FALSE, nrow(list_df1))
for (row_ind1 in seq_len(nrow(list_df1))) {
  for (row_ind2 in seq_len(nrow(list_df2))) {
    rows_equal <- all.equal(list_df1[row_ind1, ], 
                            list_df2[row_ind2, ], 
                            check.attributes = FALSE)
    if (isTRUE(rows_equal)) {
      in_df2[row_ind1] <- TRUE
      break
    }
  }
}
list_df1[!in_df2, ]
#>   x    y
#> 1 1 A, B
#> 3 2 B, C

While this gives the result I'm looking for, I'm sure there must be a better or more efficient solution.
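One possible way to avoid the nested loop, as a sketch only: build one string key per row and compare the keys with %in%. The helper name row_keys is made up for illustration, and the data below assumes a three-row list_df1 so that the result matches the output shown above (rows 1 and 3). Because the keys come from deparse, the comparison is exact, so 2 and 2L would count as different values, which is stricter than all.equal.

```r
# Example data (assumption: list_df1 has three rows, matching the output above)
list_df1 <- data.frame(x = c(1, 2, 2))
list_df1$y <- list(c("A", "B"), c("C"), c("B", "C"))
list_df2 <- data.frame(x = c(2, 3))
list_df2$y <- list(c("C"), c("D", "E"))

# Hypothetical helper: turn every row into a single string key.
# lapply(df, `[[`, i) pulls the i-th cell of each column, so list-column
# cells are kept as whole vectors and row names are ignored.
row_keys <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    row <- lapply(df, `[[`, i)
    paste(deparse(row), collapse = "")
  }, character(1))
}

# Keep the rows of list_df1 whose key does not occur in list_df2
result <- list_df1[!(row_keys(list_df1) %in% row_keys(list_df2)), ]
print(result)
#>   x    y
#> 1 1 A, B
#> 3 2 B, C
```

This builds each key once per row instead of comparing every pair of rows, but it does assume both data.frames have the same column names in the same order. The same key could presumably be stored as an ordinary character column and used as a join key alongside x.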

dplyr::anti_join is also a possible solution if the non-list columns uniquely identify the rows. But in this case, I want to remove a row only if all of its entries are identical between the two data.frames. If we apply anti_join on just column x, we don't get the result I'm looking for:

dplyr::anti_join(list_df1, list_df2, by = "x")
#>   x    y
#> 1 1 A, B

And applying it to all columns gives an error, just like setdiff:

dplyr::anti_join(list_df1, list_df2)
#> Error: Can't join on 'y' x 'y' because of incompatible types (list / list)
