Hello!
My goal is to compare two character vectors: the main one, synonyms, and another, mixNames. The strings in mixNames do not match the elements of synonyms exactly, so some kind of partial string matching is required. My objective is to find the elements of synonyms that contain something that looks like an element of mixNames, so that I can remove them. I tried to do this using only the tidyverse but failed. I found a solution in base R that works. I suspect there is a cleaner tidyverse way, but I can't figure it out...
library(tidyverse)
#Acetaminophen
synonyms <- c("Pediatrix","Percocet-5","Percocet-Demi","Perdolan Mono","Perfalgan",
"Phenaphen","Phenaphen W/Codeine","Phenipirin","Phogoglandin","Pinex",
"Piramin","Pirinasol","Plicet","Polmofen","Predimol","Predualito",
"Prodol","Prontina","Puernol","Pulmofen", "Pyregesic-C")
mixNames <- c("Liquiprin","Midol Maximum Strength","Midol PM Night Time Formula",
"Midol Regular Strength" ,"Midol Teen Formula","Naldegesic",
"Ornex Severe Cold Formula","Percocet","Percogesic with Codeine",
"Propacet" )
Failed attempts:
#####STUFF THAT DIDN'T WORK!!!!
# cross2(
# .x = synonyms, .y = mixNames #returns a list of pairs; each pair is a list of two length-1 character vectors
# ) %>%
# map_dfc(lift(str_detect)) #lift() modifies a function so that it takes a single list of arguments
#this returns a data frame, just like the sapply() version below
# mix_syn_lgl_df <- map_dfc(
# mixNames,
# ~ map_lgl(synonyms, str_detect, pattern = .x)
# )
# colnames(mix_syn_lgl_df) <- mixNames
#
# mix_syn_lgl_df$synonyms <- synonyms
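One possible purrr version of the second attempt is sketched below. Instead of building a data frame of logicals, it asks for each synonym whether it contains any of the mixture names and then subsets synonyms directly. It assumes literal substring matching is what is wanted (hence fixed()); the names has_mix and pureSyn_tidy are hypothetical, introduced only for illustration.

```r
library(purrr)
library(stringr)

synonyms <- c("Pediatrix", "Percocet-5", "Percocet-Demi", "Perdolan Mono", "Perfalgan",
              "Phenaphen", "Phenaphen W/Codeine", "Phenipirin", "Phogoglandin", "Pinex",
              "Piramin", "Pirinasol", "Plicet", "Polmofen", "Predimol", "Predualito",
              "Prodol", "Prontina", "Puernol", "Pulmofen", "Pyregesic-C")
mixNames <- c("Liquiprin", "Midol Maximum Strength", "Midol PM Night Time Formula",
              "Midol Regular Strength", "Midol Teen Formula", "Naldegesic",
              "Ornex Severe Cold Formula", "Percocet", "Percogesic with Codeine",
              "Propacet")

# For one synonym (.x), str_detect() is vectorised over the pattern argument,
# so it returns one TRUE/FALSE per mixture name; any() collapses that to
# "does this synonym contain at least one mixture name?".
# fixed() (an assumption) treats each mixture name as a literal string, not a regex.
has_mix <- map_lgl(synonyms, ~ any(str_detect(.x, fixed(mixNames))))

# Keep only the synonyms that matched no mixture name.
pureSyn_tidy <- synonyms[!has_mix]
pureSyn_tidy
```

If the mixture names contain no regex metacharacters, as here, a single call such as str_subset(synonyms, str_c(mixNames, collapse = "|"), negate = TRUE) should give the same result; the negate argument requires stringr 1.4.0 or later.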
This actually worked:
#remove synonyms that contain a mixture name
mix_syn_lgl_mat <- sapply(mixNames, function(x){
str_detect(string = synonyms, pattern = x)
}) #returns a 21 x 10 logical matrix (synonyms x mixNames); sapply() uses mixNames as the column names
rownames(mix_syn_lgl_mat) <- synonyms #add synonyms as rownames
#create a new object with a new col of sum of TRUES in row
mix_syn_lgl_mat2 <- cbind(mix_syn_lgl_mat, rowSums(mix_syn_lgl_mat))
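#take the numerical matrix mix_syn_lgl_mat2 and return the row names where the last col (rowsums) > 0
#drop = FALSE keeps the subset a matrix (with its row names) even if only one row matches
badNames <- row.names(mix_syn_lgl_mat2[mix_syn_lgl_mat2[, ncol(mix_syn_lgl_mat2)] > 0, , drop = FALSE])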
#filter out those names from the synonyms vector
pureSyn <- synonyms[!(synonyms %in% badNames)]
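The base R steps above can be condensed into a sketch that skips the matrix, the extra rowSums column and the row-name lookup. It uses grepl() with fixed = TRUE, which assumes literal matching, whereas str_detect() above treats each mixture name as a regular expression; for these mixture names, which contain no regex metacharacters, both should select the same synonyms. The names hits_per_mix, any_hit and pureSyn_base are hypothetical.

```r
synonyms <- c("Pediatrix", "Percocet-5", "Percocet-Demi", "Perdolan Mono", "Perfalgan",
              "Phenaphen", "Phenaphen W/Codeine", "Phenipirin", "Phogoglandin", "Pinex",
              "Piramin", "Pirinasol", "Plicet", "Polmofen", "Predimol", "Predualito",
              "Prodol", "Prontina", "Puernol", "Pulmofen", "Pyregesic-C")
mixNames <- c("Liquiprin", "Midol Maximum Strength", "Midol PM Night Time Formula",
              "Midol Regular Strength", "Midol Teen Formula", "Naldegesic",
              "Ornex Severe Cold Formula", "Percocet", "Percogesic with Codeine",
              "Propacet")

# grepl(pattern, x, fixed = TRUE) tests one mixture name against every synonym.
# Because x is supplied by name, lapply() passes each mixture name as the pattern,
# giving a list of 10 logical vectors, each of length 21.
hits_per_mix <- lapply(mixNames, grepl, x = synonyms, fixed = TRUE)

# Reduce(`|`, ...) ORs those vectors element-wise: TRUE for every synonym that
# contains at least one mixture name.
any_hit <- Reduce(`|`, hits_per_mix)

# Keep only the synonyms with no match.
pureSyn_base <- synonyms[!any_hit]
pureSyn_base
```

With the mix_syn_lgl_mat matrix already built above, the same filter can also be written as synonyms[rowSums(mix_syn_lgl_mat) == 0], since the rows of that matrix are in the same order as synonyms.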
Created on 2019-10-29 by the reprex package (v0.3.0)