I have a dataframe like this:
+------+-----+
| from | to |
+------+-----+
| 1 | 3 |
| 1 | 5 |
| 2 | 1 |
| 2 | 3 |
| 2 | 6 |
| 3 | 6 |
| 4 | 5 |
| 4 | 8 |
| 5 | 9 |
| 6 | 10 |
| 6 | 2 |
| 6 | 4 |
| 7 | 5 |
| 7 | 4 |
| 8 | 7 |
| 9 | 8 |
| 10 | 9 |
+------+-----+
In the first iteration, I want to group the `to` values by `from` in such a way that once a value has appeared in `to`, it is not repeated. The table above would then look like this:
+--------+--------+
| from_1 | to_1 |
+--------+--------+
| 1 | 3,5 |
| 2 | 1,6 |
| 3 | |
| 4 | 8 |
| 5 | 9 |
| 6 | 10,2,4 |
| 7 | |
+--------+--------+
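To make the rule concrete, here is a minimal sketch of one possible reading of this first step, written in plain Python rather than any particular dataframe library (the question does not name one). The names `edges` and `group_first_seen` are hypothetical. It assumes rows are scanned top to bottom and a `to` value is kept only the first time it appears. On the sample data it gives the rows shown above for `from` 1 through 7; it also produces entries for 8, 9 and 10, which the table above does not list.

```python
# Sample rows (from, to), in the same order as the input table.
edges = [
    (1, 3), (1, 5), (2, 1), (2, 3), (2, 6), (3, 6), (4, 5), (4, 8), (5, 9),
    (6, 10), (6, 2), (6, 4), (7, 5), (7, 4), (8, 7), (9, 8), (10, 9),
]


def group_first_seen(pairs):
    """Group `to` by `from`, keeping each `to` value only on first sight."""
    seen = set()   # `to` values already assigned to some `from`
    grouped = {}   # from -> list of kept `to` values (insertion ordered)
    for src, dst in pairs:
        grouped.setdefault(src, [])  # keep the row even if it ends up empty
        if dst not in seen:
            seen.add(dst)
            grouped[src].append(dst)
    return grouped


step1 = group_first_seen(edges)
for src, dsts in step1.items():
    print(src, ",".join(map(str, dsts)))
```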
In the next iteration, if a value in `to_1` also appears in `from_1` and its corresponding `to_1` is empty (null), then delete that row, so the result looks like this:
+--------+--------+
| from_2 | to_2 |
+--------+--------+
| 1 | 3,5 |
| 2 | 1,6 |
| 4 | 8 |
| 5 | 9 |
| 6 | 10,2,4 |
| 7 | |
+--------+--------+
Here, the number 3 already appears in `to` for `from` = 1, so we delete the row where 3 appears in `from`, since its corresponding `to` is null.
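A minimal sketch of this second step as I read it, again in plain Python with hypothetical names (`step1_table`, `drop_empty_referenced`). It takes the first-iteration table from the question as a literal input and drops every row whose `to` list is empty and whose `from` value appears in some other row's `to` list. This is an assumption about the intended rule, not a confirmed specification.

```python
# First-iteration table copied from the question: from_1 -> to_1 values.
step1_table = {
    1: [3, 5],
    2: [1, 6],
    3: [],
    4: [8],
    5: [9],
    6: [10, 2, 4],
    7: [],
}


def drop_empty_referenced(grouped):
    """Drop rows with an empty `to` list whose `from` occurs in any `to` list."""
    referenced = {dst for dsts in grouped.values() for dst in dsts}
    return {
        src: dsts
        for src, dsts in grouped.items()
        if dsts or src not in referenced
    }


step2 = drop_empty_referenced(step1_table)
for src, dsts in step2.items():
    print(src, ",".join(map(str, dsts)))
```

Row 3 is removed because it is empty and 3 appears under `from` 1, while row 7 stays because 7 does not appear in any `to` list.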
In the last iteration, if a value in `to_2` is present in `from_2`, then remove that value from `to_2`, unless the `to_2` of the corresponding `from_2` is empty. In our case this applies to the second row, 2 (1,6), so we still keep the 6 (10,2) pair. The final table would look like this:
+--------+------+
| from_3 | to_3 |
+--------+------+
| 1 | 3 |
| 4 | 8 |
| 5 | 9 |
| 6 | 10,2 |
| 7 | |
+--------+------+