I generate reports in which I update/replace certain details using gsub and the %in% operator. The issue is that after replacing certain strings, the associated numerical values don't aggregate: when I use match, it picks up only the first occurrence of a repeated string and ignores the others.
Sample Code:
o <- data.frame(branch = c('MDB','PMP','MWC'), val = c(1.1,0.9,0.75), stringsAsFactors = FALSE)
o$branch <- gsub('MDB','Others',o$branch)
o$branch <- gsub('PMP','Others',o$branch)
# o$branch[o$branch %in% c('MDB','PMP')] <- 'Others'
o
#>   branch  val
#> 1 Others 1.10
#> 2 Others 0.90
#> 3    MWC 0.75
p <- data.frame(branch = c('Others','MWC'), rev = c(1,1.25), stringsAsFactors = FALSE)
p
#>   branch  rev
#> 1 Others 1.00
#> 2    MWC 1.25
p$rev <- o$val[match(p$branch,o$branch)]
p
#>   branch  rev
#> 1 Others 1.10
#> 2    MWC 0.75
As shown above, after I use gsub on the "o" data frame there are two "Others" rows, whereas I need only one "Others" row with the corresponding "val" values summed: (1.10 + 0.90) = 2.00. My final "p" data frame should show 2.00 for "Others" instead of 1.10, because match returns only the position of the first matching row. I ran the report a few times and got a deflated value each time. Could someone let me know how to correct this?
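
One direction that might work, sketched here under the assumption that a plain sum of "val" per branch is the aggregation wanted: collapse the duplicate labels with base R's aggregate() first, then run match against the collapsed table. The name o_sum is hypothetical and only used for illustration.

```r
# Rebuild the sample data from the question
o <- data.frame(branch = c('MDB', 'PMP', 'MWC'),
                val = c(1.1, 0.9, 0.75),
                stringsAsFactors = FALSE)
p <- data.frame(branch = c('Others', 'MWC'),
                rev = c(1, 1.25),
                stringsAsFactors = FALSE)

# Relabel the branches that should be grouped together
o$branch[o$branch %in% c('MDB', 'PMP')] <- 'Others'

# Collapse duplicate labels: one row per branch, val summed
# (o_sum is a hypothetical name, not from the original code)
o_sum <- aggregate(val ~ branch, data = o, FUN = sum)

# match() now sees each branch exactly once, so no values are dropped
p$rev <- o_sum$val[match(p$branch, o_sum$branch)]
p
```

With the sample data, p$rev should come out as 2.00 for "Others" and 0.75 for "MWC". Aggregating before the lookup keeps match doing what it is designed for, a one-to-one lookup, rather than trying to make it sum repeated keys.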