Channel: Active questions tagged r - Stack Overflow

Replace column with the most recently available data (replace nested for loop with purrr)


Translating nested Stata foreach loops into R tidyverse/purrr syntax

I am trying to do this using foreach or purrr, but I keep getting stuck.

Stata code:

foreach v in zip income child {
    // generate an empty (missing) numeric column with the specified name (e.g. zip_agg or income_agg)
    gen `v'_agg = .
    foreach l in 20190601 20180401 20171001 20160801 {
        // fill the agg variable from the latest dated version while it is still missing
        replace `v'_agg = `v'_`l' if missing(`v'_agg)
    }
}
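
For readers who do not use Stata, the nested loop above can be read as the following base R sketch. The data frame `toy` and its values are made up for illustration; only the `name_date` column naming follows the question. This is one possible literal translation, not necessarily the idiomatic tidyverse version being asked for.

```r
# Toy data with hypothetical values; only the column naming follows the question.
toy <- data.frame(
  zip_20190601    = c(NA, 13089L, NA),
  zip_20180401    = c(11440L, 12626L, NA),
  income_20190601 = c(75038L, NA, NA),
  income_20180401 = c(63573L, 64984L, 50000L)
)

vars  <- c("zip", "income")
dates <- c("20190601", "20180401")  # newest first, same order as the Stata loop

for (v in vars) {
  agg <- paste0(v, "_agg")
  toy[[agg]] <- NA                    # start with an all-missing column
  for (l in dates) {
    src  <- toy[[paste0(v, "_", l)]]  # dated source column, e.g. zip_20190601
    fill <- is.na(toy[[agg]])         # rows that are still missing
    toy[[agg]][fill] <- src[fill]     # fill them from this date
  }
}

print(toy[, c("zip_agg", "income_agg")])
```

A row stays NA only when every dated column for that variable is missing, as in the third row of `zip_agg` here.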

Here is the data example:

   zip_20190601 zip_20180401 zip_20171001 zip_20160801 income_20190601 income_20180401 income_20171001 income_20160801 child_20160801 child_20171001 child_20180401 child_20190601
1            NA        11440        12016        15686           75038           63573           82391           47517              0              1              1              2
2         13089        12626        13670        16155           89494           64984           62603           47252              0              1              1              2
3         13258        12249        13333        16819              NA              NA           48231           45729              0              1              1              2
4            NA           NA        18480        18611           89480           67348           55516           45863              0              1              1              2
5         13990        10497        12573        13406           70053           63850           87833           48332              1              2              2              3
6         17005        11491        15227        17518           78087           70741           46318           47823              1              2              2              3
7         17174        17006        13461        11189           76780           66649           54578           46196              1              2              2              3
8         12452        15317        18049        14284           76654           73583           70090           48281              0              1              1              2
9         18449        14262        11013        17810           91422           79722           53948           45986              0              1              1              2
10        11429        11731        13564        14603           84282           60190           45133           46956              0              1              1              2


my.df <- structure(list(zip_20190601 = c(NA, 13089L, 13258L, NA, 
13990L, 17005L, 17174L, 12452L, 18449L, 11429L), zip_20180401 = c(11440L, 
12626L, 12249L, NA, 10497L, 11491L, 17006L, 15317L, 14262L, 
11731L), zip_20171001 = c(12016L, 13670L, 13333L, 18480L, 12573L, 
15227L, 13461L, 18049L, 11013L, 13564L), zip_20160801 = c(15686L, 
16155L, 16819L, 18611L, 13406L, 17518L, 11189L, 14284L, 17810L, 
14603L), income_20190601 = c(75038L, 89494L, NA, 89480L, 
70053L, 78087L, 76780L, 76654L, 91422L, 84282L), income_20180401 = c(63573L, 
64984L, NA, 67348L, 63850L, 70741L, 66649L, 73583L, 79722L, 
60190L), income_20171001 = c(82391L, 62603L, 48231L, 55516L, 
87833L, 46318L, 54578L, 70090L, 53948L, 45133L), income_20160801 = c(47517L, 
47252L, 45729L, 45863L, 48332L, 47823L, 46196L, 48281L, 45986L, 
46956L), child_20160801 = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0), child_20171001 = c(1, 
1, 1, 1, 2, 2, 2, 1, 1, 1), child_20180401 = c(1, 1, 1, 1, 2, 
2, 2, 1, 1, 1), child_20190601 = c(2, 2, 2, 2, 3, 3, 3, 2, 2, 
2)), .Names = c("zip_20190601", "zip_20180401", "zip_20171001", 
"zip_20160801", "income_20190601", "income_20180401", "income_20171001", 
"income_20160801", "child_20160801", "child_20171001", "child_20180401", 
"child_20190601"), class = "data.frame", row.names = c(NA, -10L))

Goal:

I am trying to create a current "agg" variable for each of the variables (zip_agg, income_agg, child_agg) by looping over the dated columns from newest to oldest and taking the value from the most recent date that has data.

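If the most recent version is missing, it should fall back to the next most recent date. Here is my attempt with foreach, though I am not sure it is the right approach:

library(foreach)
library(dplyr)    # coalesce()
library(stringr)  # str_c()

variable_date<-c("20190601", "20180401", "20171001", "20160801")  # newest first
variable_list<-c("zip", "income", "child")

# using foreach package: the inner loop walks the dates newest-first and
# coalesce() keeps the first non-missing value per row; the outer loop
# binds one aggregated column per variable
agg <- foreach(x=variable_list, .combine = 'cbind') %:%
  foreach(y=variable_date, .combine = coalesce) %do%
  {
    my.df[[str_c(x, "_", y)]]  # dated column, e.g. zip_20190601
  }

colnames(agg) <- str_c(variable_list, "_agg") #create variable names
my.df <- cbind(my.df, agg)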

Expected output

(image of the expected output table; not reproduced here)

Any help would be appreciated!
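
One possible direction with purrr, offered as a sketch rather than a definitive answer: for each variable, gather its dated columns newest-first and reduce them with `dplyr::coalesce()`, which keeps the first non-missing value per row. The helper name `first_available` is hypothetical. To keep the block self-contained, the data below are only the first four rows of the zip and income columns from the question.

```r
library(dplyr)  # coalesce()
library(purrr)  # map(), reduce()

# First four rows of the zip and income columns from the question's data.
my.df <- data.frame(
  zip_20190601    = c(NA, 13089L, 13258L, NA),
  zip_20180401    = c(11440L, 12626L, 12249L, NA),
  zip_20171001    = c(12016L, 13670L, 13333L, 18480L),
  income_20190601 = c(75038L, 89494L, NA, 89480L),
  income_20180401 = c(63573L, 64984L, NA, 67348L),
  income_20171001 = c(82391L, 62603L, 48231L, 55516L)
)

variable_date <- c("20190601", "20180401", "20171001")  # newest first
variable_list <- c("zip", "income")

# Hypothetical helper: first non-missing value per row across the dated
# columns of one variable, checked from newest to oldest.
first_available <- function(v) {
  cols <- as.list(my.df[paste0(v, "_", variable_date)])
  reduce(cols, coalesce)
}

# One aggregated column per variable, named zip_agg, income_agg, ...
agg <- map(setNames(variable_list, paste0(variable_list, "_agg")),
           first_available)

result <- cbind(my.df, as.data.frame(agg))
print(result[, c("zip_agg", "income_agg")])
```

On these four rows, `zip_agg` comes out as 11440, 13089, 13258, 18480 and `income_agg` as 75038, 89494, 48231, 89480. The order of `variable_date` matters, because `coalesce()` takes values from the earlier columns first.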

