Channel: Active questions tagged r - Stack Overflow

R - Identical values in columns of dataframe in one row


I have a data frame containing 3 columns of non-integer values. Values in one column are often identical to values in one or both of the other columns of the same data frame. If a value matches across columns, I would like the matching values to be on the same row.

See subset_df vs expected_subset_df below for clarification.

Notice that the values ending in "348:-" are in the same row in expected_subset_df but not in subset_df.

Summary: values in col1 can also be in col2 and/or col3. If the values between columns do match I want them on the same row.

> subset_df
         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31724051:- 20:31724051:-
3         FALSE 20:31722348:- 20:31722348:-
> expected_subset_df
         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31722348:- 20:31722348:-
3         FALSE 20:31724051:- 20:31724051:-

What I have attempted

library(dplyr)
subset_df %>%
  mutate_all(as.character) %>%
  mutate(col1 = subset_df$col1[match(subset_df$col2, subset_df$col1)],
         col3 = subset_df$col3[match(subset_df$col2, subset_df$col3)])

Yields:

         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2          <NA> 20:31724051:- 20:31724051:-
3 20:31722348:- 20:31722348:- 20:31722348:-

Is this method robust? Is there a better alternative?
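
For comparison, here is a minimal base R sketch of one possible alternative. The match() attempt leaves col2 in its original order and returns NA wherever a col2 value is absent from col1, so rows are never realigned. The sketch instead builds one row per distinct value found anywhere in the data frame and fills each column with that value only where the column contains it. The helper name align_columns, the treatment of FALSE as the string "FALSE", and the use of stringsAsFactors = FALSE are assumptions made for illustration.

```r
# Reconstruction of subset_df (assumes values are stored as character strings).
subset_df <- data.frame(
  col1 = c("20:31722330:-", "20:31722348:-", "FALSE"),
  col2 = c("20:31722330:-", "20:31724051:-", "20:31722348:-"),
  col3 = c("20:31722330:-", "20:31724051:-", "20:31722348:-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: put values that match across columns on the same row.
align_columns <- function(df, missing = "FALSE") {
  # Work on character vectors so factor levels do not interfere.
  cols <- lapply(df, as.character)
  # Treat placeholder strings such as "FALSE" as missing (assumption).
  cols <- lapply(cols, function(x) { x[x %in% missing] <- NA; x })
  # One output row per distinct value; sort() drops NA by default.
  # Note: the sort is lexical, not numeric by genomic position.
  keys <- sort(unique(unlist(cols, use.names = FALSE)))
  # For each column, keep the key where the column contains it, else NA.
  out <- lapply(cols, function(x) ifelse(keys %in% x, keys, NA_character_))
  as.data.frame(out, stringsAsFactors = FALSE)
}

aligned <- align_columns(subset_df)
print(aligned)
```

With this sketch the FALSE in col1 comes back as NA rather than FALSE, and a value repeated within a single column is collapsed to one row, so it only fits if those two behaviours are acceptable.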

Edit:

Suppose dataframe breakpoint looks like this:

> breakpoint
         col1           col2            col3
1 20:31722330:- 20:31722344:-            FALSE
2 21:15014555:- 21:15014555:-            FALSE
3 21:15014767:- 21:15014767:-    21:15014767:-

How can I turn dataframe breakpoint into this:

> expected_breakpoint
         col1           col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA>  20:31722344:-         <NA>
3 21:15014555:-  21:15014555:-         <NA>
4          <NA>          <NA>         FALSE
5          <NA>          <NA>         FALSE
6 21:15014767:-  21:15014767:-  21:15014767:-

Edit 2: FALSE converted to <NA> before analysis

Suppose dataframe breakpoint_new looks like this:

> breakpoint_new
         col1           col2            col3
1 20:31722330:- 20:31722344:-            <NA>
2 21:15014555:- 21:15014555:-            <NA>
3 21:15014767:- 21:15014767:-    21:15014767:-
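
The conversion from breakpoint to breakpoint_new could be done along these lines. This is a sketch that assumes FALSE is stored as the character string "FALSE" and that the columns are character vectors rather than factors.

```r
# Reconstruction of breakpoint (assumes character columns).
breakpoint <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c("FALSE", "FALSE", "21:15014767:-"),
  stringsAsFactors = FALSE
)

# Copy the data frame and replace every "FALSE" cell with NA.
breakpoint_new <- breakpoint
breakpoint_new[breakpoint_new == "FALSE"] <- NA

print(breakpoint_new)
```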

How can I turn dataframe breakpoint_new into this:

> expected_breakpoint_new
         col1           col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA>  20:31722344:-         <NA>
3 21:15014555:-  21:15014555:-         <NA>
4 21:15014767:-  21:15014767:-  21:15014767:-
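
One way this transformation might be expressed in base R is as a full outer join of the columns on their own values. The sketch below is not a tested general solution: the names per_column, merged and aligned are made up for illustration, and it assumes character columns with at least one non-missing value each.

```r
# Reconstruction of breakpoint_new (assumes character columns).
breakpoint_new <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c(NA, NA, "21:15014767:-"),
  stringsAsFactors = FALSE
)

# One small lookup table per column: a join key plus the value itself.
per_column <- lapply(names(breakpoint_new), function(nm) {
  x <- breakpoint_new[[nm]]
  vals <- unique(x[!is.na(x)])
  setNames(data.frame(vals, vals, stringsAsFactors = FALSE), c("key", nm))
})

# Full outer join on the key; merge() sorts the result by key by default.
merged <- Reduce(function(a, b) merge(a, b, by = "key", all = TRUE), per_column)

# Drop the helper key column and reset the row names.
aligned <- merged[, names(breakpoint_new), drop = FALSE]
rownames(aligned) <- NULL

print(aligned)
```

Because merge() orders rows by the key as text, the row order follows lexical order of the strings, which may differ from numeric order of the positions when they have different lengths.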
