Channel: Active questions tagged r - Stack Overflow

Better solution to check elements of one character vector with another character vector using the tidyverse?

Hello!
My goal is to compare two character vectors: the main one, synonyms, and a second one, mixNames. The strings in mixNames do not match the elements of synonyms exactly, so some string comparison is required. My objective is to identify the elements of synonyms that contain something that looks like an entry in mixNames (and then drop them). I tried to do this using only the tidyverse but failed. I found a solution that works using base R. I know there is a better way, but I can't figure it out.

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.6.1
#> Warning: package 'tidyr' was built under R version 3.6.1
#> Warning: package 'dplyr' was built under R version 3.6.1

#Acetaminophen

synonyms <- c("Pediatrix","Percocet-5","Percocet-Demi","Perdolan Mono","Perfalgan", 
              "Phenaphen","Phenaphen W/Codeine","Phenipirin","Phogoglandin","Pinex", 
              "Piramin","Pirinasol","Plicet","Polmofen","Predimol","Predualito",
              "Prodol","Prontina","Puernol","Pulmofen", "Pyregesic-C")

mixNames <- c("Liquiprin","Midol Maximum Strength","Midol PM Night Time Formula",
              "Midol Regular Strength" ,"Midol Teen Formula","Naldegesic",
              "Ornex Severe Cold Formula","Percocet","Percogesic with Codeine",
              "Propacet" )

Failed attempt:

#####STUFF THAT DIDNT WORK!!!!

# cross2(
#   .x = synonyms, .y = mixNames  #lists - each list has 2 lists - each of those is an atomic vector of 1
# ) %>% 
#   map_dfc(lift(str_detect)) #lift - modifies function to take a list of arguments - works for nested lists 

#this returns a data frame of logicals, much like the sapply() call further down

# mix_syn_lgl_df <- map_dfc(
#   mixNames,
#   ~ map_lgl(synonyms, str_detect, pattern = .x)
# )

# colnames(mix_syn_lgl_df) <- mixNames
# 
# mix_syn_lgl_df$synonyms <- synonyms

This actually worked:


#remove mixture names from synonyms

mix_syn_lgl_mat <- sapply(mixNames, function(x){
  str_detect(string = synonyms, pattern = x)
}) #returns a matrix 21x10 of logicals while preserving colnames

rownames(mix_syn_lgl_mat) <- synonyms #add synonyms as rownames
#create a new object with an extra column holding the number of TRUEs in each row
mix_syn_lgl_mat2 <- cbind(mix_syn_lgl_mat, rowSums(mix_syn_lgl_mat)) 
#take the numeric matrix mix_syn_lgl_mat2 and return the row names where the last col (rowSums) > 0
#drop = FALSE keeps the subset a matrix (with row names) even when only one row matches
badNames <- row.names(mix_syn_lgl_mat2[mix_syn_lgl_mat2[, ncol(mix_syn_lgl_mat2)] > 0, , drop = FALSE])
#filter out those names from the synonyms vector
pureSyn <- synonyms[!(synonyms %in% badNames)]

Created on 2019-10-29 by the reprex package (v0.3.0)
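
For comparison, here is a minimal sketch of how the same filter might be written with purrr and stringr alone, skipping the intermediate matrix. It is only one possible approach, not a definitive answer. It assumes a plain substring match is what is wanted, so each mixture name is wrapped in fixed() to be treated as a literal string rather than a regular expression. The vectors are shortened subsets of the ones above, and is_mixture is a name introduced just for this illustration.

```r
library(stringr)
library(purrr)

# Shortened subsets of the vectors in the question (illustration only)
synonyms <- c("Pediatrix", "Percocet-5", "Percocet-Demi", "Perfalgan", "Pyregesic-C")
mixNames <- c("Midol Regular Strength", "Percocet", "Propacet")

# For each synonym, check whether it contains ANY of the mixture names.
# fixed() makes the match literal, so regex metacharacters in a name are harmless.
is_mixture <- map_lgl(synonyms, function(s) any(str_detect(s, fixed(mixNames))))

badNames <- synonyms[is_mixture]   # synonyms that contain a mixture name
pureSyn  <- synonyms[!is_mixture]  # everything else

print(badNames)
print(pureSyn)
```

Because the check is done one synonym at a time, there is no matrix to subset and no row names to keep track of; the logical vector lines up with synonyms directly.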

