I have a table of date ranges with corresponding IDs. I want to group the IDs based on whether their start/end range overlaps with the date range of another ID. If the date range for one ID lies partially or completely within that of another ID, the two should belong to the same group. I want to add a column indicating this grouping, alongside the group's start/end dates, given by the smallest and largest dates within the group.
The data:
"ID" "start" "end"
1 2018-10-02 2019-01-15
2 2019-01-13 2019-02-01
3 2018-10-01 2018-11-01
4 2018-10-05 2018-10-06
5 2019-09-09 2019-10-08
6 2019-02-06 2019-02-07
7 2019-03-24 2019-04-17
8 2019-03-21 2019-04-14
9 2019-03-27 2019-04-16
10 2019-04-30 2019-05-08
The ideal result:
"ID" "start" "end" "group_ID" "group_start" "group_end"
1 2018-10-02 2019-01-15 1 2018-10-01 2019-02-01
2 2019-01-13 2019-02-01 1 2018-10-01 2019-02-01
3 2018-10-01 2018-11-01 1 2018-10-01 2019-02-01
4 2018-10-05 2018-10-06 1 2018-10-01 2019-02-01
5 2019-09-09 2019-10-08 2 2019-09-09 2019-10-08
6 2019-02-06 2019-02-07 3 2019-02-06 2019-05-08
7 2019-03-24 2019-04-17 3 2019-02-06 2019-05-08
8 2019-03-21 2019-04-14 3 2019-02-06 2019-05-08
9 2019-03-27 2019-04-16 3 2019-02-06 2019-05-08
10 2019-04-30 2019-05-08 3 2019-02-06 2019-05-08
What I've been thinking of that may work is creating a matrix of IDs (i.e., rows and columns spanning from ID 1 to ID 10) and filling each cell according to whether the date ranges for the given pair of IDs overlap. I would then bin the IDs into groups and find the min/max dates for each group, but this seems really complicated. There must be an easier solution that does not involve looking at edges in a matrix to create clusters.
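One simpler alternative to the matrix idea might be a sort-and-sweep: sort the rows by start date, then walk through them once, extending the current group while the next range starts on or before the group's running end. The Python sketch below is only a rough illustration under my own assumptions: the function name `group_overlapping` is made up, ranges that merely touch at an endpoint are treated as overlapping, and groups are numbered in order of first appearance in the table.

```python
from datetime import date

# Rows copied from the table above: (ID, start, end).
rows = [
    (1, date(2018, 10, 2), date(2019, 1, 15)),
    (2, date(2019, 1, 13), date(2019, 2, 1)),
    (3, date(2018, 10, 1), date(2018, 11, 1)),
    (4, date(2018, 10, 5), date(2018, 10, 6)),
    (5, date(2019, 9, 9), date(2019, 10, 8)),
    (6, date(2019, 2, 6), date(2019, 2, 7)),
    (7, date(2019, 3, 24), date(2019, 4, 17)),
    (8, date(2019, 3, 21), date(2019, 4, 14)),
    (9, date(2019, 3, 27), date(2019, 4, 16)),
    (10, date(2019, 4, 30), date(2019, 5, 8)),
]


def group_overlapping(rows):
    """Hypothetical helper: append (group_ID, group_start, group_end) to each row."""
    # Sweep over the ranges in order of start date.
    clusters = []  # each entry: [cluster_start, cluster_end, [ids]]
    for id_, start, end in sorted(rows, key=lambda r: r[1]):
        # Assumption: touching endpoints (start == running end) count as overlap.
        if clusters and start <= clusters[-1][1]:
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2].append(id_)
        else:
            clusters.append([start, end, [id_]])

    # Map each ID to the index of its cluster.
    cluster_of = {id_: i for i, (_, _, ids) in enumerate(clusters) for id_ in ids}

    # Number the groups in order of first appearance in the original table.
    labels = {}
    out = []
    for id_, start, end in rows:
        i = cluster_of[id_]
        label = labels.setdefault(i, len(labels) + 1)
        out.append((id_, start, end, label, clusters[i][0], clusters[i][1]))
    return out


for row in group_overlapping(rows):
    print(*row)
```

With a strict overlap rule like this, IDs 1 to 4 end up in one group spanning 2018-10-01 to 2019-02-01 and ID 5 is on its own, as in the ideal result. However, IDs 6 and 10 do not actually overlap with IDs 7 to 9, so the sketch puts them in separate groups rather than in one group spanning 2019-02-06 to 2019-05-08. Reproducing that last group would presumably need some tolerance for gaps between ranges, which is not part of the rule as stated.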