Channel: Active questions tagged r - Stack Overflow

Create new column based on if rows from 2 data frames match

This seems simple enough, but I can't figure it out. I'd like to create a new column in df2, impute_id, that identifies whether each value (measurement) was imputed or is the raw, observed value from df1. If a row in df2 matches a row in df1, impute_id should be the string observed; if it does not match, impute_id should be the string imputed. I'd like to do this with dplyr if possible. Note that the rows in the two data frames may not be in the same order, even though they are in the example.


Example

Raw data

df1
   time protocol     measurement_type sample measurement
1     0     HPLC cis,cis-Muconic acid      a     0.57561
2     0     HPLC            D-Glucose      a          NA
3     0     HPLC cis,cis-Muconic acid      a          NA
4     0     HPLC            D-Glucose      b          NA
5     0    OD600      Optical Density      b     0.14430
6    22     HPLC cis,cis-Muconic acid      b          NA
7    22     HPLC            D-Glucose      a          NA
8    22    OD600      Optical Density      a          NA
9    24     HPLC cis,cis-Muconic acid      a          NA
10   24     HPLC            D-Glucose      b    33.95529

Imputed Data

df2
   time protocol     measurement_type sample measurement
1     0     HPLC cis,cis-Muconic acid      a     0.57561
2     0     HPLC            D-Glucose      a    33.95529
3     0     HPLC cis,cis-Muconic acid      a     0.57561
4     0     HPLC            D-Glucose      b    33.95529
5     0    OD600      Optical Density      b     0.14430
6    22     HPLC cis,cis-Muconic acid      b     0.57561
7    22     HPLC            D-Glucose      a    33.95529
8    22    OD600      Optical Density      a     0.14430
9    24     HPLC cis,cis-Muconic acid      a     0.57561
10   24     HPLC            D-Glucose      b    33.95529

Desired Output

df2
   time protocol     measurement_type sample measurement  impute_id
1     0     HPLC cis,cis-Muconic acid      a     0.57561   observed
2     0     HPLC            D-Glucose      a    33.95529    imputed
3     0     HPLC cis,cis-Muconic acid      a     0.57561    imputed
4     0     HPLC            D-Glucose      b    33.95529    imputed
5     0    OD600      Optical Density      b     0.14430   observed
6    22     HPLC cis,cis-Muconic acid      b     0.57561    imputed
7    22     HPLC            D-Glucose      a    33.95529    imputed
8    22    OD600      Optical Density      a     0.14430    imputed
9    24     HPLC cis,cis-Muconic acid      a     0.57561    imputed
10   24     HPLC            D-Glucose      b    33.95529   observed

Reproducible Data

Raw Data

df1 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L, 
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid", 
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a", 
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561, 
NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)), row.names = c(NA, 
-10L), class = "data.frame")

Imputed Data

df2 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L, 
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid", 
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a", 
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561, 
33.95529, 0.57561, 33.95529, 0.1443, 0.57561, 33.95529, 0.1443, 
0.57561, 33.95529)), row.names = c(NA, -10L), class = "data.frame")
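
One possible approach, offered as a sketch rather than a definitive answer: flag the non-NA rows of df1 as observed, left-join them onto df2 on every column, and fill the unmatched rows with imputed. The example contains repeated rows (rows 1 and 3 of df2 are identical, but only one such row is observed in df1), so a plain join on the five columns would mark both as observed. The helper column dup_id below is an assumption introduced for illustration: it numbers repeated rows within each table so that each observed row in df1 accounts for at most one row in df2. The data is rebuilt here in condensed form only so the snippet runs on its own; it is meant to be equivalent to the dput output above. It assumes dplyr 1.0.0 or later (for across()).

```r
library(dplyr)

# Condensed rebuild of the example data (same values as the dput output).
df1 <- data.frame(
  time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 24L),
  protocol = factor(c("HPLC", "HPLC", "HPLC", "HPLC", "OD600",
                      "HPLC", "HPLC", "OD600", "HPLC", "HPLC")),
  measurement_type = factor(rep(
    c("cis,cis-Muconic acid", "D-Glucose", "Optical Density"),
    times = c(1, 1, 1))[c(1, 2, 1, 2, 3, 1, 2, 3, 1, 2)]),
  sample = c("a", "a", "a", "b", "b", "b", "a", "a", "a", "b"),
  measurement = c(0.57561, NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)
)
df2 <- df1
df2$measurement <- c(0.57561, 33.95529, 0.57561, 33.95529, 0.1443,
                     0.57561, 33.95529, 0.1443, 0.57561, 33.95529)

# Columns that must all agree for a df2 row to count as observed.
key_cols <- c("time", "protocol", "measurement_type", "sample", "measurement")

# Rows of df1 that carry a real measurement, numbered within each set of
# identical rows (dup_id is a hypothetical helper, not part of the data).
observed <- df1 %>%
  filter(!is.na(measurement)) %>%
  group_by(across(all_of(key_cols))) %>%
  mutate(dup_id = row_number()) %>%
  ungroup() %>%
  mutate(impute_id = "observed")

# Number identical rows in df2 the same way, join, and label the rest.
result <- df2 %>%
  group_by(across(all_of(key_cols))) %>%
  mutate(dup_id = row_number()) %>%
  ungroup() %>%
  left_join(observed, by = c(key_cols, "dup_id")) %>%
  mutate(impute_id = coalesce(impute_id, "imputed")) %>%
  select(-dup_id)

print(result)
```

Two caveats. The join compares measurement as a floating-point number, so it only matches when the imputation step leaves observed values bit-for-bit unchanged; if that is not guaranteed, rounding both columns before the join may be needed. Also, when df2 holds several identical rows and df1 observed fewer of them, this marks the first ones in df2's row order as observed, which is an arbitrary choice if the two tables are ordered differently.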
