Channel: Active questions tagged r - Stack Overflow

Matching and filling in blanks of a data frame in R


I have data with duplicate entries that looks like this:

+-----+-------+-----------+-----------+--------+
| id  | first |   last    | birthyear | father |
+-----+-------+-----------+-----------+--------+
| a12 | linda | john      | 1991      | NA     |
| 3n8 | max   | well      | 1915      | NA     |
| 15z | linda | NA        | 1991      | dan    |
| 1y9 | pam   | degeneres | 1855      | NA     |
| 84z | NA    | degeneres | 1950      | hank   |
| 9i5 | max   | well      | NA        | mike   |
+-----+-------+-----------+-----------+--------+

There are multiple entries for a single person, and each entry holds unique data that needs to be preserved. I want to merge these entries, keeping all information. The "id" column does not have to match; I want to keep the first "id" in the list as the final "id". My final data frame would look like this:

+-----+-------+-----------+-----------+--------+
| id  | first |   last    | birthyear | father |
+-----+-------+-----------+-----------+--------+
| a12 | linda | john      | 1991      | dan    |
| 3n8 | max   | well      | 1915      | mike   |
| 1y9 | pam   | degeneres | 1855      | NA     |
| 84z | NA    | degeneres | 1950      | hank   |
+-----+-------+-----------+-----------+--------+

In this example, the two entries with last name "degeneres" were not merged because their birthyear values differ. Entries whose values all match (NAs aside) were merged.
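The merge rule described above can be sketched in base R as follows. This is only one possible reading of the rule, not a definitive solution: the helper names `compatible` and `merge_partial` are hypothetical, and the sketch assumes that two rows belong together when they agree in every non-id column where both are non-NA and share at least one non-NA value. It is greedy and order-dependent, so a row that fits several earlier records joins the first one.

```r
df <- data.frame(
  id = c("a12", "3n8", "15z", "1y9", "84z", "9i5"),
  first = c("linda", "max", "linda", "pam", NA, "max"),
  last = c("john", "well", NA, "degeneres", "degeneres", "well"),
  birthyear = c("1991", "1915", "1991", "1855", "1950", NA),
  father = c(NA, NA, "dan", NA, "hank", "mike"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if two one-row records agree wherever both are
# non-NA and have at least one non-NA value in common (an assumption).
compatible <- function(a, b, cols) {
  x <- unlist(a[cols])
  y <- unlist(b[cols])
  both <- !is.na(x) & !is.na(y)
  any(both) && all(x[both] == y[both])
}

# Hypothetical helper: walk the rows in order, folding each row into the
# first compatible record seen so far, or starting a new record.
merge_partial <- function(df, key_cols = setdiff(names(df), "id")) {
  merged <- df[0, ]
  for (i in seq_len(nrow(df))) {
    row <- df[i, ]
    hit <- 0
    for (j in seq_len(nrow(merged))) {
      if (compatible(merged[j, ], row, key_cols)) {
        hit <- j
        break
      }
    }
    if (hit == 0) {
      merged <- rbind(merged, row)  # no match: keep as a new record
    } else {
      # match: fill only the blanks, so the first "id" and values win
      for (col in key_cols) {
        if (is.na(merged[hit, col])) merged[hit, col] <- row[[col]]
      }
    }
  }
  rownames(merged) <- NULL
  merged
}

result <- merge_partial(df)
print(result)
```

On the sample data this keeps the ids a12, 3n8, 1y9 and 84z, with "dan" and "mike" filled in as fathers for the first two. The nested loops are quadratic in the number of rows, so for large data a grouped approach would probably be preferable.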

So far, the furthest I have got is generating a list grouped by matching first names:

df <- data.frame(
  id = c("a12", "3n8", "15z", "1y9", "84z", "9i5"),
  first = c("linda", "max", "linda", "pam", NA, "max"),
  last = c("john", "well", NA, "degeneres", "degeneres", "well"),
  birthyear = c("1991", "1915", "1991", "1855", "1950", NA),
  father = c(NA, NA, "dan", NA, "hank", "mike"),
  stringsAsFactors = FALSE
)

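# Collect the rows for each distinct, non-NA first name
name_list <- list()
i <- 1
for (n in unique(df$first[!is.na(df$first)])) {
  # which() drops NA comparisons, which would otherwise add all-NA rows
  name_list[[i]] <- df[which(df$first == n), ]
  i <- i + 1  # plain assignment; <<- is not needed inside a for loop
}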
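The same grouping can also be sketched with base R's `split()`, which avoids the manual counter. This is a minimal illustration, not a step toward the full merge; `by_first` is a name introduced here. Note that `split()` drops rows whose grouping value is NA by default.

```r
df <- data.frame(
  id = c("a12", "3n8", "15z", "1y9", "84z", "9i5"),
  first = c("linda", "max", "linda", "pam", NA, "max"),
  last = c("john", "well", NA, "degeneres", "degeneres", "well"),
  birthyear = c("1991", "1915", "1991", "1855", "1950", NA),
  father = c(NA, NA, "dan", NA, "hank", "mike"),
  stringsAsFactors = FALSE
)

# One data frame per distinct first name; the row with first = NA is dropped
by_first <- split(df, df$first)

names(by_first)
by_first$linda
```

Because the NA row is left out, grouping on first name alone cannot place the "84z" entry, which is one reason a single-column grouping is not enough for this problem.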

I also tried merge(), but joining the data frame to itself on every non-id column only pairs each row with itself, which is not the result I want:

merge(x = df, y = df, by = c("first", "last", "birthyear", "father"))

+---------+-----------+-----------+--------+------+------+
|   first |   last    | birthyear | father | id.x | id.y |
+---------+-----------+-----------+--------+------+------+
| linda   | john      | 1991      | NA     | a12  | a12  |
| linda   | NA        | 1991      | dan    | 15z  | 15z  |
| max     | well      | 1915      | NA     | 3n8  | 3n8  |
| max     | well      | NA        | mike   | 9i5  | 9i5  |
| NA      | degeneres | 1950      | hank   | 84z  | 84z  |
| pam     | degeneres | 1855      | NA     | 1y9  | 1y9  |
+---------+-----------+-----------+--------+------+------+

How could I best proceed?

