I want to remove duplicated values in each column of an uneven data.table. For instance, suppose the original data looks like this (the real data.table has many more rows and columns):
library(data.table)
dt <- data.table(A = c("5p", "3p", "3p", "6y", NA), B = c("1c", "4r", "1c", NA, NA), C = c("4f", "5", "5", "5", "4m"))
> dt
      A    B  C
1:   5p   1c 4f
2:   3p   4r  5
3:   3p   1c  5
4:   6y <NA>  5
5: <NA> <NA> 4m
After removing the duplicated values in each column (keeping only the first occurrence), it should look like this:
      A    B    C
1:   5p   1c   4f
2:   3p   4r    5
3: <NA> <NA> <NA>
4:   6y <NA> <NA>
5: <NA> <NA>   4m
I am trying to adapt, with data.table, a solution proposed in another thread (replace duplicate values with NA in time series data using dplyr). However, only the first duplicated value in each column is replaced with NA, not the subsequent ones.
cols <- colnames(dt)
dt[, lapply(.SD, function(x) replace(x, anyDuplicated(x), NA)), .SDcols = cols]
> dt
      A    B    C
1:   5p   1c   4f
2:   3p   4r    5
3: <NA> <NA> <NA>
4:   6y <NA>    5
5: <NA> <NA>   4m
How should I modify the code to get all duplicates replaced?
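For reference, a minimal sketch of what seems to be going on, assuming the goal is to keep the first occurrence of each value and blank out every later repeat. In base R, anyDuplicated() returns a single integer (the index of the first duplicate, or 0 if there is none), so replace() only touches that one position. duplicated() returns a logical vector that flags every repeat, which may be closer to what is needed. The name `res` below is only illustrative.

```r
library(data.table)

dt <- data.table(
  A = c("5p", "3p", "3p", "6y", NA),
  B = c("1c", "4r", "1c", NA, NA),
  C = c("4f", "5", "5", "5", "4m")
)

# anyDuplicated() gives one index only: the position of the first duplicate
first_dup <- anyDuplicated(dt$C)

# duplicated() gives a logical flag for every element that repeats an earlier one
all_dups <- duplicated(dt$C)

cols <- colnames(dt)

# Same approach as in the question, with duplicated() in place of anyDuplicated()
res <- dt[, lapply(.SD, function(x) replace(x, duplicated(x), NA)), .SDcols = cols]
print(res)
```

Note that duplicated() also flags a repeated NA, which is harmless here because it is replaced by NA anyway. The expression above returns a new data.table and leaves dt unchanged; to update dt by reference instead, something like dt[, (cols) := lapply(.SD, function(x) replace(x, duplicated(x), NA)), .SDcols = cols] should work.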