Translating nested Stata foreach loops into R tidyverse/purrr syntax
I am trying to do this using foreach or purrr, but I keep getting stuck.
Stata code:
foreach v in zip income child {
    g `v'_agg = "" // generate an empty column with the specified name (e.g. zip_agg or income_agg)
    foreach l in 20190601 20180401 20171001 20160801 {
        replace `v'_agg = `v'_`l' if missing(`v'_agg) // replace agg variable with the latest version if missing
    }
}
Here is the data example:
zip_20190601 zip_20180401 zip_20171001 zip_20160801 income_20190601 income_20180401 income_20171001 income_20160801 child_20160801 child_20171001 child_20180401 child_20190601
1 NA 11440 12016 15686 75038 63573 82391 47517 0 1 1 2
2 13089 12626 13670 16155 89494 64984 62603 47252 0 1 1 2
3 13258 12249 13333 16819 NA NA 48231 45729 0 1 1 2
4 NA NA 18480 18611 89480 67348 55516 45863 0 1 1 2
5 13990 10497 12573 13406 70053 63850 87833 48332 1 2 2 3
6 17005 11491 15227 17518 78087 70741 46318 47823 1 2 2 3
7 17174 17006 13461 11189 76780 66649 54578 46196 1 2 2 3
8 12452 15317 18049 14284 76654 73583 70090 48281 0 1 1 2
9 18449 14262 11013 17810 91422 79722 53948 45986 0 1 1 2
10 11429 11731 13564 14603 84282 60190 45133 46956 0 1 1 2
my.df <- structure(list(zip_20190601 = c(NA, 13089L, 13258L, NA,
13990L, 17005L, 17174L, 12452L, 18449L, 11429L), zip_20180401 = c(11440L,
12626L, 12249L, NA, 10497L, 11491L, 17006L, 15317L, 14262L,
11731L), zip_20171001 = c(12016L, 13670L, 13333L, 18480L, 12573L,
15227L, 13461L, 18049L, 11013L, 13564L), zip_20160801 = c(15686L,
16155L, 16819L, 18611L, 13406L, 17518L, 11189L, 14284L, 17810L,
14603L), income_20190601 = c(75038L, 89494L, NA, 89480L,
70053L, 78087L, 76780L, 76654L, 91422L, 84282L), income_20180401 = c(63573L,
64984L, NA, 67348L, 63850L, 70741L, 66649L, 73583L, 79722L,
60190L), income_20171001 = c(82391L, 62603L, 48231L, 55516L,
87833L, 46318L, 54578L, 70090L, 53948L, 45133L), income_20160801 = c(47517L,
47252L, 45729L, 45863L, 48332L, 47823L, 46196L, 48281L, 45986L,
46956L), child_20160801 = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0), child_20171001 = c(1,
1, 1, 1, 2, 2, 2, 1, 1, 1), child_20180401 = c(1, 1, 1, 1, 2,
2, 2, 1, 1, 1), child_20190601 = c(2, 2, 2, 2, 3, 3, 3, 2, 2,
2)), .Names = c("zip_20190601", "zip_20180401", "zip_20171001",
"zip_20160801", "income_20190601", "income_20180401", "income_20171001",
"income_20160801", "child_20160801", "child_20171001", "child_20180401",
"child_20190601"), class = "data.frame", row.names = c(NA, -10L))
Goal:
I want to create a current "agg" variable for each of the variables (zip_agg, income_agg, child_agg) by looping over the dated columns from newest to oldest and taking the most recent non-missing value.
If the most recent version is missing, it should fall back to the next most recent date. I have started to code this in R, but I know the attempt below is incorrect.
library(foreach)
library(dplyr)
library(stringr)
library(magrittr) # for %<>%
variable_date <- c("20190601", "20180401", "20171001", "20160801")
variable_list <- c("zip", "income", "child")
# using foreach package
foreach(x = variable_list, .combine = 'cbind') %:%
  foreach(y = variable_date, .combine = 'c') %do%
  {
    var_agg <- str_c(x, "_agg") # create variable name
    my.df %<>%
      mutate(var_agg = NA,
             var_agg = ifelse(is.na({{var_agg}}) == T, my.df[str_c(x, y)], {{var_agg}}))
  }
Expected output
Any help would be appreciated!
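One possible approach, offered as a sketch rather than a definitive answer: the inner Stata loop ("fill from the next date wherever the value is still missing") matches what dplyr::coalesce() does, so the nested loop can be reduced to one purrr::map() over the variable names with a purrr::reduce() over the dated columns. The sketch assumes the columns are named <variable>_<date> as in the data example and that variable_date is ordered newest first. The three-row my.df is a cut-down stand-in built from rows 1, 3 and 4 of the data above, and agg_cols and result are names chosen here for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Stand-in data: rows 1, 3 and 4 of the example data frame.
my.df <- data.frame(
  zip_20190601    = c(NA, 13258L, NA),
  zip_20180401    = c(11440L, 12249L, NA),
  zip_20171001    = c(12016L, 13333L, 18480L),
  zip_20160801    = c(15686L, 16819L, 18611L),
  income_20190601 = c(75038L, NA, 89480L),
  income_20180401 = c(63573L, NA, 67348L),
  income_20171001 = c(82391L, 48231L, 55516L),
  income_20160801 = c(47517L, 45729L, 45863L),
  child_20160801  = c(0, 0, 0),
  child_20171001  = c(1, 1, 1),
  child_20180401  = c(1, 1, 1),
  child_20190601  = c(2, 2, 2)
)

variable_date <- c("20190601", "20180401", "20171001", "20160801") # newest first
variable_list <- c("zip", "income", "child")

# Outer loop: one element per variable, named zip_agg, income_agg, child_agg.
agg_cols <- variable_list %>%
  set_names(str_c(variable_list, "_agg")) %>%
  map(function(v) {
    # Inner loop: dated columns for this variable, newest to oldest.
    cols <- str_c(v, "_", variable_date)
    # coalesce() keeps the first non-missing value, row by row.
    reduce(my.df[cols], coalesce)
  })

# Attach the new columns to a copy of the original data.
result <- my.df
result[names(agg_cols)] <- agg_cols

print(result[names(agg_cols)])
# zip_agg:    11440 13258 18480
# income_agg: 75038 48231 89480
# child_agg:  2 2 2
```

Because the column names are built from variable_date, the order of that vector decides which date wins, regardless of how the columns are ordered in the data frame (the child columns are stored oldest first, for example).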