I have a dataframe filled with names.
For a given row in the dataframe, I'd like to compare that row to every row above it in the df and determine if the number of matching names is less than or equal to 4 for every row.
Toy Example where row 3 is the row of interest
"Jim","Dwight","Michael","Andy","Stanley","Creed"
"Jim","Dwight","Angela","Pam","Ryan","Jan"
"Jim","Dwight","Angela","Pam","Creed","Ryan"<--- row of interest
So first we'd compare row 3 to row 1 and see that the name overlap is 3, which meets the <= 4 criteria.
Then we'd compare row 3 to row 2 and see that the name overlap is 5 which fails the <= 4 criteria, ultimately returning a failed condition for being <=4 for every row above it.
Right now I am doing this operation using a for loop but the speed is much too slow for the dataframe size I am working with.