Channel: Active questions tagged r - Stack Overflow

Conditionally remove duplicates and collapse a data frame

I have a data frame as follows:

+------+-----+
| from | to  |
+------+-----+
|    1 |   3 |
|    1 |   5 |
|    2 |   1 |
|    2 |   3 |
|    2 |   6 |
|    3 |   6 |
|    4 |   5 |
|    4 |   8 |
|    5 |   9 |
|    6 |  10 |
|    6 |   2 |
|    6 |   4 |
|    7 |   5 |
|    7 |   4 |
|    8 |   7 |
|    9 |   8 |
|   10 |   9 |
+------+-----+

In the first iteration, I want to group the to values by from in such a way that once a value has appeared in to, it is not repeated. The table above would then look like this:

+--------+--------+
| from_1 |  to_1  |
+--------+--------+
|      1 |    3,5 |
|      2 |    1,6 |
|      3 |        |
|      4 |      8 |
|      5 |      9 |
|      6 | 10,2,4 |
|      7 |        |
+--------+--------+
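The first iteration can be sketched in base R under one assumption that the question does not state outright: a to value is kept only at its first occurrence in row order, and later repeats are dropped. The names df, first_seen, kept and step1 are illustrative. With this rule, the rows for from 1 through 7 match the table above; the sketch also produces rows for from 8, 9 and 10, which the table does not show.

```r
# Edge list from the question, in its original row order.
df <- data.frame(
  from = c(1, 1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 7, 8, 9, 10),
  to   = c(3, 5, 1, 3, 6, 6, 5, 8, 9, 10, 2, 4, 5, 4, 7, 8, 9)
)

# Assumed rule: keep only the first occurrence of each `to` value.
first_seen <- !duplicated(df$to)

# Group the surviving `to` values by `from`; fixing the factor levels keeps
# a (possibly empty) group for every `from` value.
kept <- split(
  df$to[first_seen],
  factor(df$from[first_seen], levels = unique(df$from))
)

# Collapse each group into a comma-separated string; empty groups become "".
step1 <- data.frame(
  from_1 = unique(df$from),
  to_1   = unname(vapply(kept, paste, character(1), collapse = ",")),
  stringsAsFactors = FALSE
)

print(step1)
```

Whether the rows for 8, 9 and 10 should be dropped, and by what rule, is not clear from the question, so the sketch leaves them in.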

In the next iteration, if a value in to_1 also appears in from_1 and the to_1 of that from_1 row is empty, then delete that row. The result will look as follows:

+--------+--------+
| from_2 |  to_2  |
+--------+--------+
|      1 |    3,5 |
|      2 |    1,6 |
|      4 |      8 |
|      5 |      9 |
|      6 | 10,2,4 |
|      7 |        |
+--------+--------+

Here, the number 3 already appears in to for from = 1, so we delete the row with from = 3, whose corresponding to is empty.
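A minimal base R sketch of this second iteration, taking the to_1 table exactly as it is printed in the question as input. It assumes the rule is: drop a row when its to_1 is empty and its from_1 value occurs in any row's to_1. The names step1, all_to, drop_row and step2 are illustrative.

```r
# The to_1 table as printed in the question (empty string = no values left).
step1 <- data.frame(
  from_1 = 1:7,
  to_1   = c("3,5", "1,6", "", "8", "9", "10,2,4", ""),
  stringsAsFactors = FALSE
)

# Every value that still appears somewhere in to_1.
all_to <- unlist(strsplit(step1$to_1, ",", fixed = TRUE))

# Assumed rule: a row goes when its to_1 is empty and its from_1 is
# already listed in some to_1.
drop_row <- step1$to_1 == "" & as.character(step1$from_1) %in% all_to

step2 <- step1[!drop_row, ]
names(step2) <- c("from_2", "to_2")

print(step2)
```

Row 3 is removed because 3 is listed under from_1 = 1, while row 7 stays because 7 does not occur in any to_1 of this table.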

In the last iteration, if a value in to_2 is present in from_2, then remove that value from to_2, unless the to_2 of the corresponding from_2 is empty. In our case, for the second row, 2 (1,6), we still keep the 6 (10,2) pair, so the final table would look as follows:

+--------+------+
| from_3 | to_3 |
+--------+------+
|      1 |    3 |
|      4 |    8 |
|      5 |    9 |
|      6 | 10,2 |
|      7 |      |
+--------+------+
