I'm fairly new to running parallel processes in R because most of the data I work with just isn't that large. However, I am now working with a larger data set where I am attempting to 'find and replace' a set of about 2000 names in 9000 survey comments. I've written a for loop using gsub that gets the job done, but it takes quite a long time:
completed <- 0
for (name in names) {
    # Replace each whole-word, case-insensitive occurrence of this name
    text_df$text <- sapply(text_df$text, gsub,
                           pattern = paste0("(?<=\\W|^)", name, "(?=\\W|$)"),
                           replacement = "RemovedLeader",
                           ignore.case = TRUE, perl = TRUE)
    completed <- completed + 1
    print(paste0("Completed ", completed, " out of ", length(names)))
}
From what I understand, this should be a fairly simple process to run in parallel, yet I'm having a bit of trouble. I've tried running this with parSapply, but I'm having a hard time rewriting the gsub call (which is itself inside an sapply within the for loop) so that it works outside of the for loop. Thanks for the help.
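For reference, here is a rough sketch of the kind of thing I'm aiming for, not a tested solution. It assumes text_df$text is a character vector, and the helper names build_pattern and replace_names and the n_workers argument are made up for illustration. Instead of looping over names, it combines them into one alternation pattern and splits the comments into one chunk per worker, so each worker makes a single vectorized gsub call. Because it replaces everything in a single pass, it may not behave identically to the sequential loop when names overlap, and a very long pattern might run into regex engine limits.

```r
library(parallel)

# Hypothetical helper: build one whole-word pattern covering every name.
# \Q...\E quotes each name literally; longer names come first so that
# "Anna Smith" is tried before "Anna".
build_pattern <- function(name_vec) {
  name_vec <- name_vec[order(nchar(name_vec), decreasing = TRUE)]
  alternation <- paste0("\\Q", name_vec, "\\E", collapse = "|")
  paste0("(?<=\\W|^)(?:", alternation, ")(?=\\W|$)")
}

# Hypothetical helper: split the text into one chunk per worker and run a
# single vectorized gsub on each chunk in parallel.
replace_names <- function(text, name_vec, n_workers = 2) {
  pattern <- build_pattern(name_vec)
  # Consecutive index blocks, so the original order is preserved.
  idx <- splitIndices(length(text), n_workers)
  chunks <- lapply(idx, function(i) text[i])
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  out <- parLapply(cl, chunks, gsub,
                   pattern = pattern,
                   replacement = "RemovedLeader",
                   ignore.case = TRUE, perl = TRUE)
  unlist(out, use.names = FALSE)
}

# Intended usage with the objects from the question:
# text_df$text <- replace_names(text_df$text, names, n_workers = 4)
```

With the whole name list in one pattern, even the non-parallel call gsub(build_pattern(names), "RemovedLeader", text_df$text, ignore.case = TRUE, perl = TRUE) might already be fast enough, since it scans each comment once instead of 2000 times.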