I have a data frame containing 3 columns of non-numeric values (strings such as 20:31722330:-). A value in one column will often also appear in one or both of the other columns of the same data frame, though not necessarily on the same row. If there are matches between columns, I would like to have them on the same row.
See subset_df vs expected_subset_df below for clarification.
Notice that the values ending in "348:-" are on the same row in expected_subset_df but not in subset_df.
Summary: values in col1 can also be in col2 and/or col3. If the values between columns do match I want them on the same row.
> subset_df
           col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31724051:- 20:31724051:-
3         FALSE 20:31722348:- 20:31722348:-
> expected_subset_df
           col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31722348:- 20:31722348:-
3         FALSE 20:31724051:- 20:31724051:-
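To make the example reproducible, subset_df could be built like this. This is a minimal sketch that assumes all three columns are plain character vectors, so FALSE is stored as the string "FALSE" rather than as a logical value.

```r
# Assumption: every column is character, so FALSE is the string "FALSE"
subset_df <- data.frame(
  col1 = c("20:31722330:-", "20:31722348:-", "FALSE"),
  col2 = c("20:31722330:-", "20:31724051:-", "20:31722348:-"),
  col3 = c("20:31722330:-", "20:31724051:-", "20:31722348:-"),
  stringsAsFactors = FALSE
)
```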
What I have attempted
library(dplyr)
subset_df %>%
  mutate_all(as.character) %>%
  mutate(col1 = subset_df$col1[match(subset_df$col2, subset_df$col1)],
         col3 = subset_df$col3[match(subset_df$col2, subset_df$col3)])
Yields:
           col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2          <NA> 20:31724051:- 20:31724051:-
3 20:31722348:- 20:31722348:- 20:31722348:-
Is this method robust? Is there a better alternative?
Edit:
Suppose dataframe breakpoint looks like this:
> breakpoint
           col1          col2          col3
1 20:31722330:- 20:31722344:-         FALSE
2 21:15014555:- 21:15014555:-         FALSE
3 21:15014767:- 21:15014767:- 21:15014767:-
How can I turn dataframe breakpoint into this:
> expected_breakpoint
           col1          col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA> 20:31722344:-          <NA>
3 21:15014555:- 21:15014555:-          <NA>
4          <NA>          <NA>         FALSE
5          <NA>          <NA>         FALSE
6 21:15014767:- 21:15014767:- 21:15014767:-
Edit 2: FALSE is replaced with <NA> before analysis
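For reference, the FALSE-to-<NA> replacement could be done roughly as follows. This sketch assumes the columns are character vectors (FALSE stored as the string "FALSE") and that there are no missing values before the replacement.

```r
# Assumption: character columns, with FALSE stored as the string "FALSE"
breakpoint <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c("FALSE", "FALSE", "21:15014767:-"),
  stringsAsFactors = FALSE
)

# Replace every "FALSE" cell with NA; the columns stay character
breakpoint_new <- breakpoint
breakpoint_new[breakpoint_new == "FALSE"] <- NA
```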
Suppose dataframe breakpoint_new looks like this:
> breakpoint_new
           col1          col2          col3
1 20:31722330:- 20:31722344:-          <NA>
2 21:15014555:- 21:15014555:-          <NA>
3 21:15014767:- 21:15014767:- 21:15014767:-
How can I turn dataframe breakpoint_new into this:
> expected_breakpoint_new
           col1          col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA> 20:31722344:-          <NA>
3 21:15014555:- 21:15014555:-          <NA>
4 21:15014767:- 21:15014767:- 21:15014767:-
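One direction that seems to give this layout, as a rough base R sketch rather than a tested solution: collect every distinct non-missing value across the columns, sort them, and give each value its own row, leaving NA in any column that does not contain it. The names keys and aligned are made up for illustration, and the sketch assumes character columns with no value repeated within a column.

```r
breakpoint_new <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c(NA, NA, "21:15014767:-"),
  stringsAsFactors = FALSE
)

# All distinct non-missing values across the three columns, sorted
vals <- unlist(breakpoint_new, use.names = FALSE)
keys <- sort(unique(vals[!is.na(vals)]))

# One row per key: keep the key where the column contains it, NA otherwise
aligned <- as.data.frame(
  lapply(breakpoint_new, function(x) ifelse(keys %in% x, keys, NA_character_)),
  stringsAsFactors = FALSE
)
```

Because each distinct value gets exactly one row, duplicates within a column would collapse, so this would not reproduce the repeated FALSE rows of expected_breakpoint.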