Channel: Active questions tagged r - Stack Overflow

Replace NA with value within group for subset


I need to replace missing values in all columns of a data frame, within each ID and time point, for the subgroup of rows that have data from several sources. If it is not too complicated, it would be best to prioritize data from source B. For example, for id 2 and variable Y in the data below, the NA should be filled with 0.5 from source B rather than 0.4 from source A.

The code below currently works (without prioritizing) for one column at a time, but since it's a large data frame with millions of rows, it needs to be automated further. I would also like to keep it within the data.table framework if possible. Any advice?

# Data
id  time  X  Y   Source
1   2005  67 NA  A
1   2005  NA 1.1 B
1   2005  NA 1.1 B
2   2003  85 NA  B
2   2003  NA 0.4 A
2   2003  85 0.5 B

# Desired output
id  time  X  Y   Source
1   2005  67 1.1 A
1   2005  67 1.1 B
1   2005  67 1.1 B
2   2003  85 0.5 B
2   2003  85 0.4 A
2   2003  85 0.5 B
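One possible way to get the desired output for every column at once is a grouped `lapply(.SD, ...)` that fills each NA from the same id/time group, preferring source B. This is only a sketch: the sample data is typed in from the tables above, `cols` and the helper `pick_fill` are hypothetical names, and the choice of the *last* non-NA value mirrors `mult = "last"` in the code below. It has not been benchmarked on millions of rows.

```r
library(data.table)

# Sample data from the question
dat <- data.table(
  id     = c(1, 1, 1, 2, 2, 2),
  time   = c(2005, 2005, 2005, 2003, 2003, 2003),
  X      = c(67, NA, NA, 85, NA, 85),
  Y      = c(NA, 1.1, 1.1, NA, 0.4, 0.5),
  Source = c("A", "B", "B", "B", "A", "B")
)

# Value columns to fill: everything except the keys and the source label
cols <- setdiff(names(dat), c("id", "time", "Source"))

# Hypothetical helper: fill value for one column within one id/time group.
# Uses the last non-NA value from source B if there is one, otherwise the
# last non-NA value from any source, otherwise an NA of the same type.
pick_fill <- function(v, src) {
  ok_b <- which(!is.na(v) & src == "B")
  if (length(ok_b) > 0) return(v[ok_b[length(ok_b)]])
  ok <- which(!is.na(v))
  if (length(ok) > 0) v[ok[length(ok)]] else v[NA_integer_]
}

# Fill NAs in every value column group by group, updating dat by reference.
# Single-row groups have nothing to borrow from, so they stay unchanged,
# which restricts the filling to duplicated id/time combinations.
dat[, (cols) := lapply(.SD, function(v) {
  v[is.na(v)] <- pick_fill(v, Source)
  v
}), by = .(id, time), .SDcols = cols]

print(dat)
```

Because the function runs once per group and column, it may be slower than a join on very large tables; the trade-off is that the B-priority rule is explicit in one place.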

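library(data.table)
setDT(dat)  # make sure dat is a data.table (no-op if it already is)

# Flag rows whose id/time combination occurs more than once
dup <- duplicated(dat[, c('id','time')]) | duplicated(dat[, c('id','time')], fromLast = TRUE)

# Replace NA in column X with the last non-NA X from the same id and time
dat[dup & is.na(X), X := dat[!is.na(X)][.SD, on = .(id, time), mult = "last", X]]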
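The single-column join above could be generalized by looping over the value columns with `set()`, and source B could be prioritized by sorting the donor rows so that B rows come last before taking `mult = "last"`. A minimal sketch, assuming the value columns are everything except `id`, `time` and `Source`; `cols`, `ord` and `donors` are hypothetical names, and the sample data is typed in from the tables above.

```r
library(data.table)

# Sample data from the question
dat <- data.table(
  id     = c(1, 1, 1, 2, 2, 2),
  time   = c(2005, 2005, 2005, 2003, 2003, 2003),
  X      = c(67, NA, NA, 85, NA, 85),
  Y      = c(NA, 1.1, 1.1, NA, 0.4, 0.5),
  Source = c("A", "B", "B", "B", "A", "B")
)

cols <- setdiff(names(dat), c("id", "time", "Source"))

# Rows whose id/time combination occurs more than once
dup <- duplicated(dat, by = c("id", "time")) |
  duplicated(dat, by = c("id", "time"), fromLast = TRUE)

# Donor order: non-B rows first, B rows last (order() is stable, so the
# original order is kept within each source). With mult = "last" the join
# then prefers a B value whenever one exists for that id/time.
ord <- order(dat$Source == "B")

for (col in cols) {
  rows <- which(dup & is.na(dat[[col]]))
  if (length(rows) == 0) next
  # Non-NA donors for this column, in donor order
  donors <- dat[ord][!is.na(get(col)), c("id", "time", col), with = FALSE]
  # For each NA row, take the last matching donor (NA if there is none)
  fill <- donors[dat[rows], on = .(id, time), mult = "last"][[col]]
  set(dat, i = rows, j = col, value = fill)
}

print(dat)
```

`set()` inside a `for` loop avoids the overhead of `[.data.table` for each assignment, and only the rows that actually need filling are joined.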

