This seems simple enough, but I can't figure it out. I'd like to create a new column in df2 (impute_id) that identifies whether each value (measurement) was imputed or is the raw, observed value from df1. If a row in df2 matches a row in df1, impute_id should be the string "observed"; if it does not match, it should be the string "imputed". I'd like to do this using dplyr if possible. Note that the rows in the two data frames may not be in the same order, even though they are in the example.
Example
Raw data
df1
   time protocol     measurement_type sample measurement
1     0     HPLC cis,cis-Muconic acid      a     0.57561
2     0     HPLC            D-Glucose      a          NA
3     0     HPLC cis,cis-Muconic acid      a          NA
4     0     HPLC            D-Glucose      b          NA
5     0    OD600      Optical Density      b     0.14430
6    22     HPLC cis,cis-Muconic acid      b          NA
7    22     HPLC            D-Glucose      a          NA
8    22    OD600      Optical Density      a          NA
9    24     HPLC cis,cis-Muconic acid      a          NA
10   24     HPLC            D-Glucose      b    33.95529
Imputed Data
df2
   time protocol     measurement_type sample measurement
1     0     HPLC cis,cis-Muconic acid      a     0.57561
2     0     HPLC            D-Glucose      a    33.95529
3     0     HPLC cis,cis-Muconic acid      a     0.57561
4     0     HPLC            D-Glucose      b    33.95529
5     0    OD600      Optical Density      b     0.14430
6    22     HPLC cis,cis-Muconic acid      b     0.57561
7    22     HPLC            D-Glucose      a    33.95529
8    22    OD600      Optical Density      a     0.14430
9    24     HPLC cis,cis-Muconic acid      a     0.57561
10   24     HPLC            D-Glucose      b    33.95529
Desired Output
df2
   time protocol     measurement_type sample measurement impute_id
1     0     HPLC cis,cis-Muconic acid      a     0.57561  observed
2     0     HPLC            D-Glucose      a    33.95529   imputed
3     0     HPLC cis,cis-Muconic acid      a     0.57561   imputed
4     0     HPLC            D-Glucose      b    33.95529   imputed
5     0    OD600      Optical Density      b     0.14430  observed
6    22     HPLC cis,cis-Muconic acid      b     0.57561   imputed
7    22     HPLC            D-Glucose      a    33.95529   imputed
8    22    OD600      Optical Density      a     0.14430   imputed
9    24     HPLC cis,cis-Muconic acid      a     0.57561   imputed
10   24     HPLC            D-Glucose      b    33.95529  observed
Reproducible Data
Raw Data
df1 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L,
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L,
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid",
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a",
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561,
NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)), row.names = c(NA,
-10L), class = "data.frame")
Imputed Data
df2 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L,
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L,
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid",
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a",
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561,
33.95529, 0.57561, 33.95529, 0.1443, 0.57561, 33.95529, 0.1443,
0.57561, 33.95529)), row.names = c(NA, -10L), class = "data.frame")
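For illustration, here is a minimal sketch of the row-matching logic described at the top, not a definitive solution. The helper name flag_imputed, the temporary column n_obs, and the keys argument are hypothetical names introduced for this sketch. It assumes a row counts as "observed" only when an identical row with a non-NA measurement exists in the raw data, and that when identical rows are duplicated (rows 1 and 3 of df2), only as many copies as appear in the raw data are labelled "observed", in order of appearance.

```r
library(dplyr)

# Hypothetical helper (not from the original post): labels each row of
# `imputed` as "observed" if an identical non-NA row exists in `raw`,
# otherwise as "imputed". Matching is by value, not by row position.
flag_imputed <- function(raw, imputed,
                         keys = c("time", "protocol", "measurement_type",
                                  "sample", "measurement")) {
  # Count how many times each fully observed row occurs in the raw data.
  observed_counts <- raw %>%
    filter(!is.na(measurement)) %>%
    group_by(across(all_of(keys))) %>%
    summarise(n_obs = n(), .groups = "drop")

  imputed %>%
    # Exact join on every column; doubles must be exactly equal to match.
    left_join(observed_counts, by = keys) %>%
    group_by(across(all_of(keys))) %>%
    # Among identical rows, only the first n_obs copies count as observed.
    mutate(impute_id = if_else(!is.na(n_obs) & row_number() <= n_obs,
                               "observed", "imputed")) %>%
    ungroup() %>%
    select(-n_obs)  # the result is a tibble with the original row order
}
```

With the example data, flag_imputed(df1, df2) should give the impute_id column shown in the desired output, including "observed" for row 1 and "imputed" for its duplicate in row 3. If the imputation changes values by floating-point rounding, the exact join on measurement would no longer match, so a stable row identifier carried through the imputation would be a safer key.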