I have a data frame with two issues that I am trying to correct. Here is a toy example.
library(data.table)
tempdt <- data.table(ID1  = rep(1:6, each = 2),
                     ID2  = rep(letters[1:2], 6),
                     name = c('john', 'john', NA, 'mike', 'steve', NA,
                              'bob', NA, NA, 'henry', 'joe', 'frank'))
    ID1 ID2  name
 1:   1   a  john
 2:   1   b  john
 3:   2   a  <NA>
 4:   2   b  mike
 5:   3   a steve
 6:   3   b  <NA>
 7:   4   a   bob
 8:   4   b  <NA>
 9:   5   a  <NA>
10:   5   b henry
11:   6   a   joe
12:   6   b frank
There are two sequential grouping variables (ID1 is the primary sequence and ID2 is the secondary sequence within ID1) and a name assignment. Sometimes the name is missing, and I need to fill it in from what is assigned elsewhere within that ID1. Other times I have two (or more) different names for the same ID1, but there should only be one. Whichever non-missing name comes first in the order of ID2 within ID1 should be the assigned name for every row of that ID1.
Ultimately the name field should read c('john','john','mike','mike','steve','steve','bob','bob','henry','henry','joe','joe')
I could approach this by ordering the data frame (table) on the two sequential variables and then looping over ID1 to make the corrections. However, it seems like there should be a cleaner, more efficient way to work along ID1, compare the sequence of ID2 within each ID1, and make the corrections without a loop.
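To make the rule concrete, here is a minimal sketch of what a loop-free version might look like. It assumes the rule is "the first non-missing name in ID2 order within each ID1", and it has only been reasoned through against the toy data above, so treat it as a starting point rather than a settled answer.

```r
library(data.table)

# Same toy data as above.
tempdt <- data.table(ID1  = rep(1:6, each = 2),
                     ID2  = rep(letters[1:2], 6),
                     name = c('john', 'john', NA, 'mike', 'steve', NA,
                              'bob', NA, NA, 'henry', 'joe', 'frank'))

# Sort by the primary and secondary sequence so that "first" is well defined.
setorder(tempdt, ID1, ID2)

# Within each ID1, overwrite every name with the first non-missing one.
# If a group has no name at all, the subset is empty and [1] gives NA,
# so that group simply stays NA.
tempdt[, name := name[!is.na(name)][1], by = ID1]
```

The grouped assignment modifies tempdt by reference, so no copy of the table is made and no explicit loop over ID1 is needed.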
Any thoughts? I have it as a data.table because that is what I usually work with, but it isn't a necessity.
Will