I hope I can explain myself clearly.
I have a dataset like this:
dataset <- data.frame(ID = c(1,1,1,2,2,2,3,3,3),
Invoice = c(1,2,3,1,2,3,1,2,3),
Invoice_Date = c('09/30/2019','10/30/2019','11/30/2019',
'10/31/2019','11/30/2019','12/31/2019',
'7/31/2019','9/30/2019','12/31/2019'),
paid_unpaid = c('no','yes','yes','yes','no','no','no','yes','no'),
stringsAsFactors = FALSE)
# %Y (four-digit year) is needed here; %y would read only "20" from "2019"
dataset$Invoice_Date <- as.Date(dataset$Invoice_Date, '%m/%d/%Y')
ID  Invoice  Date of Invoice  paid or not
1   1        09/30/2019       no
1   2        10/30/2019       yes
1   3        11/30/2019       yes
2   1        10/31/2019       yes
2   2        11/30/2019       no
2   3        12/31/2019       no
3   1        7/31/2019        no
3   2        9/30/2019        yes
3   3        12/31/2019       no
I want to select the customers who have more than one unpaid invoice, that is, the customers for whom the value 'no' appears more than once in the variable "paid or not" (paid_unpaid).
After selecting, my ideal data looks like this:
ID  Invoice  Date of Invoice  paid or not
2   1        10/31/2019       yes
2   2        11/30/2019       no
2   3        12/31/2019       no
3   1        7/31/2019        no
3   2        9/30/2019        yes
3   3        12/31/2019       no
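To make the selection rule concrete, here is a minimal base R sketch of one way it might be expressed: count the 'no' values per ID, then keep every row of the customers whose count is greater than one. The names unpaid_count and result are made up for illustration, and this is only one possible approach, not necessarily the best one.

```r
# Rebuild the example data from the question.
dataset <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Invoice = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Invoice_Date = c("09/30/2019", "10/30/2019", "11/30/2019",
                   "10/31/2019", "11/30/2019", "12/31/2019",
                   "7/31/2019", "9/30/2019", "12/31/2019"),
  paid_unpaid = c("no", "yes", "yes", "yes", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)
dataset$Invoice_Date <- as.Date(dataset$Invoice_Date, "%m/%d/%Y")

# Number of unpaid invoices per customer, repeated on every row of that customer.
# unpaid_count is a hypothetical helper name, not part of the original data.
unpaid_count <- ave(as.integer(dataset$paid_unpaid == "no"), dataset$ID, FUN = sum)

# Keep all rows (paid and unpaid) of customers with more than one unpaid invoice.
result <- dataset[unpaid_count > 1, ]
print(result)
```

With the example data, customer 1 has a single unpaid invoice and is dropped, while customers 2 and 3 each have two and are kept with all of their rows.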