I would like to identify and mark duplicate rows based on two columns. I want a unique identifier for each duplicate, so that I know not just that a row is a duplicate but also which row it duplicates. I have a dataframe like the one below, with some duplicated item pairs (on fit and sit, in either order) and other pairs that are not duplicated. Although the item pairs are duplicated, the information the rows contain is unique (e.g., one row will have a value in value1 but not in value2 and value3, while the second, 'duplicate' row will have values for value2 and value3 but not value1).
current dataframe
     value1 value2 value3 fit   sit
[1,] "1"    NA     NA     "it1" "it2"
[2,] NA     "3"    "2"    "it2" "it1"
[3,] "2"    "3"    "4"    "it3" "it4"
[4,] NA     NA     NA     "it4" "it3"
[5,] "5"    NA     NA     "it5" "it6"
[6,] NA     NA     "2"    "it6" "it5"
[7,] NA     "4"    NA     "it7" "it9"
code to generate example dataframe
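value1 <- c(1, NA, 2, NA, 5, NA, NA)
value2 <- c(NA, 3, 3, NA, NA, NA, 4)
value3 <- c(NA, 2, 4, NA, NA, 2, NA)
fit <- c("it1", "it2", "it3", "it4", "it5", "it6", "it7")
sit <- c("it2", "it1", "it4", "it3", "it6", "it5", "it9")
# note: cbind() on mixed numeric/character vectors returns a character matrix,
# which is why the values print in quotes above
df.now <- cbind(value1, value2, value3, fit, sit)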
What I want is to convert it to a dataframe that looks like this:
desired dataframe
     val1 val2 val3 it1   it2
[1,] "1"  "3"  "2"  "it1" "it2"
[2,] "2"  "3"  "4"  "it3" "it4"
[3,] "5"  NA   "2"  "it5" "it6"
[4,] NA   "4"  NA   "it7" "it9"
I was thinking of doing the following steps:
1. Create new variables from fit and sit holding the lowest and highest item of each pair, to identify duplicate pairs.
2. Identify duplicated item pairs.
3. Use ifelse to select and fill in the unique information.
I know how to do steps 1 and 3, but am stuck on step 2. I think what I need is not just a TRUE/FALSE duplicate flag, but a column with a unique identifier for each item pair, like this (there are 2 extra columns, lit and hit, because of my step 1):
     value1 value2 value3 fit   sit   lit   hit   dup
[1,] "1"    NA     NA     "it1" "it2" "it1" "it2" 1
[2,] NA     "3"    "2"    "it2" "it1" "it1" "it2" 1
[3,] "2"    "3"    "4"    "it3" "it4" "it3" "it4" 2
[4,] NA     NA     NA     "it4" "it3" "it3" "it4" 2
[5,] "5"    NA     NA     "it5" "it6" "it5" "it6" 3
[6,] NA     NA     "2"    "it6" "it5" "it5" "it6" 3
[7,] NA     "4"    NA     "it7" "it9" "it7" "it9" NA
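For reference, a minimal base R sketch of how the lit, hit and dup columns in the table above could be built. It assumes the data is held in a data.frame rather than the character matrix that cbind() produces; the names df, key and n are only illustrative. pmin() and pmax() order each pair, and match() against the unique keys numbers the pairs in order of first appearance.

```r
# Assumption: rebuild the example as a data.frame so numbers stay numeric
df <- data.frame(
  value1 = c(1, NA, 2, NA, 5, NA, NA),
  value2 = c(NA, 3, 3, NA, NA, NA, 4),
  value3 = c(NA, 2, 4, NA, NA, 2, NA),
  fit = c("it1", "it2", "it3", "it4", "it5", "it6", "it7"),
  sit = c("it2", "it1", "it4", "it3", "it6", "it5", "it9"),
  stringsAsFactors = FALSE
)

# Step 1: order each pair so that (it2, it1) and (it1, it2) look the same
df$lit <- pmin(df$fit, df$sit)
df$hit <- pmax(df$fit, df$sit)

# Step 2: one key per unordered pair, numbered in order of first appearance
key <- paste(df$lit, df$hit, sep = "_")
df$dup <- match(key, unique(key))

# Set the identifier to NA for pairs that occur only once, as in the table
n <- ave(seq_along(key), key, FUN = length)
df$dup[n == 1] <- NA

df$dup
# 1 1 2 2 3 3 NA
```

Comparing strings with pmin() and pmax() depends on the locale's sort order, which is fine for names like it1 and it2 but worth checking for other item labels.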
I am not sure how to do this.
What I am asking for is either help with step 2 or, if there is one, a better way to solve this than the steps I outlined.
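As one possible alternative to the three steps, here is a hedged base R sketch that goes straight to the desired result by grouping on the ordered pair and keeping the first non-missing value in each column. It assumes a data.frame (not the cbind() matrix) and at most one non-missing value per column within a pair; df, first_non_na and res are illustrative names.

```r
# Assumption: the example data as a data.frame
df <- data.frame(
  value1 = c(1, NA, 2, NA, 5, NA, NA),
  value2 = c(NA, 3, 3, NA, NA, NA, 4),
  value3 = c(NA, 2, 4, NA, NA, 2, NA),
  fit = c("it1", "it2", "it3", "it4", "it5", "it6", "it7"),
  sit = c("it2", "it1", "it4", "it3", "it6", "it5", "it9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value of a group, or NA if none
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA_real_
}

# Group on the ordered pair (lowest item, highest item) and collapse each column
res <- aggregate(
  df[c("value1", "value2", "value3")],
  by = list(it1 = pmin(df$fit, df$sit), it2 = pmax(df$fit, df$sit)),
  FUN = first_non_na
)
res
```

If a pair could carry two different non-missing values in the same column, this sketch would silently keep the first one, so that case would need its own rule.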