Channel: Active questions tagged r - Stack Overflow

Group by partial string matches

I have a table of categories, each with a count, and I'd like to collapse categories based on name similarity. For example, Mariner-1_AMel and Mariner-10 should become a single Mariner category, and anything with 'Jockey' or 'hAT' in the name should be collapsed in the same way.

I'm struggling to find a solution that handles every naming variant in the list. Is there an easy dplyr solution?
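One possible approach, sketched in base R under the assumption that the families can be listed by hand as keywords. `family_keywords` and `collapse_by_keyword()` are hypothetical names, not part of any package. The sketch uses only base R so it runs without dplyr; the equivalent dplyr pipeline is given in the trailing comments.

```r
# Hypothetical keyword list: names containing one of these collapse to it.
family_keywords <- c("Mariner", "Jockey", "hAT")

# Hypothetical helper: returns the leftmost keyword found in each name
# (case-sensitive), or the original name when no keyword matches.
collapse_by_keyword <- function(type, keywords) {
  pattern <- paste(keywords, collapse = "|")  # e.g. "Mariner|Jockey|hAT"
  hit <- regexpr(pattern, type)               # -1 where there is no match
  family <- type
  family[hit > 0] <- regmatches(type, hit)    # matched text, in order
  family
}

# A few rows taken from the question's data
types  <- c("Mariner-1_AMel", "Mariner-10_HSal", "DNA/hAT-Charlie",
            "hAT-16_SM", "LINE/I-Jockey", "Jockey-5_DTa", "LINE/R1")
counts <- c(5L, 1L, 18L, 2L, 2L, 1L, 5L)

collapsed <- collapse_by_keyword(types, family_keywords)
totals <- sapply(split(counts, collapsed), sum)  # named vector of summed counts
print(totals)

# dplyr equivalent, for the full `tibs` tibble from the question:
# library(dplyr)
# tibs %>%
#   mutate(family = collapse_by_keyword(type, family_keywords)) %>%
#   group_by(family) %>%
#   summarise(n = sum(n))
```

Because `regexpr()` reports the leftmost match, a name containing two keywords is assigned to whichever appears first in the name, not first in the keyword list.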

Reproducible data:

> dput(tibs)
structure(list(type = c("(TTAAG)n_1", "AMARI_1", "Copia-4_LH-I", 
"DNA", "DNA-1_CQ", "DNA/hAT-Charlie", "DNA/hAT-Tip100", "DNA/MULE-MuDR", 
"DNA/P", "DNA/PiggyBac", "DNA/TcMar-Mariner", "DNA/TcMar-Tc1", 
"DNA/TcMar-Tigger", "G3_DM", "Gypsy-10_CFl-I", "hAT-1_DAn", "hAT-16_SM", 
"hAT-N4_RPr", "HELITRON7_CB", "Jockey-1_DAn", "Jockey-1_DEl", 
"Jockey-12_DF", "Jockey-5_DTa", "Jockey-6_DYa", "Jockey-6_Hmel", 
"Jockey-7_HMM", "Jockey-8_Hmel", "LINE/Dong-R4", "LINE/I", "LINE/I-Jockey", 
"LINE/I-Nimb", "LINE/Jockey", "LINE/L1", "LINE/L2", "LINE/R1", 
"LINE/R2", "LINE/R2-NeSL", "LINE/Tad1", "LTR/Gypsy", "Mariner_CA", 
"Mariner-1_AMel", "Mariner-10_HSal", "Mariner-13_ACe", "Mariner-15_HSal", 
"Mariner-16_DAn", "Mariner-19_RPr", "Mariner-30_SM", "Mariner-39_SM", 
"Mariner-42_HSal", "Mariner-46_HSal", "Mariner-49_HSal", "TE-5_EL", 
"Unknown", "Utopia-1_Crp"), n = c(1L, 1L, 1L, 2L, 1L, 18L, 3L, 
9L, 2L, 8L, 21L, 12L, 18L, 1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, 2L, 1L, 2L, 1L, 2L, 7L, 2L, 7L, 24L, 1L, 1L, 5L, 3L, 1L, 
1L, 7L, 1L, 5L, 1L, 1L, 5L, 5L, 1L, 1L, 3L, 5L, 5L, 2L, 1L, 190L, 
1L)), row.names = c(NA, -54L), class = c("tbl_df", "tbl", "data.frame"
))
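For names that match no keyword, a generic fallback could strip the class prefix (`DNA/`, `LINE/`, `LTR/`) and everything from the first `-` or `_` onward. This assumes a "Class/Family-number_species" naming scheme, which is a guess from the sample rows rather than a documented format. The sketch uses a 13-row subset of the dput values above, and `collapse_family()` is a hypothetical helper that applies keyword matches first and the fallback otherwise.

```r
# 13 rows copied from the question's dput (a subset, not the full table)
tibs <- data.frame(
  type = c("DNA/hAT-Charlie", "hAT-1_DAn", "Jockey-12_DF", "LINE/Jockey",
           "LINE/I-Jockey", "LINE/I", "LINE/I-Nimb", "DNA/TcMar-Mariner",
           "Mariner-1_AMel", "Mariner_CA", "Gypsy-10_CFl-I", "LTR/Gypsy",
           "Unknown"),
  n = c(18L, 1L, 1L, 24L, 2L, 7L, 7L, 21L, 5L, 1L, 3L, 7L, 190L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keyword match wins; otherwise use the generic rule.
collapse_family <- function(type, keywords) {
  # Fallback: drop "Class/" prefix, then drop from the first "-" or "_",
  # e.g. "Gypsy-10_CFl-I" -> "Gypsy", "LTR/Gypsy" -> "Gypsy"
  family <- sub("[-_].*$", "", sub("^[^/]*/", "", type))
  # Override with the leftmost keyword where one is present
  hit <- regexpr(paste(keywords, collapse = "|"), type)
  family[hit > 0] <- regmatches(type, hit)
  family
}

tibs$family <- collapse_family(tibs$type, c("Mariner", "Jockey", "hAT"))
collapsed <- aggregate(n ~ family, data = tibs, FUN = sum)  # sum n per family
print(collapsed[order(-collapsed$n), ])
```

The fallback is only a heuristic: on the full table it would, for example, merge `DNA/TcMar-Tc1` and `DNA/TcMar-Tigger` into `TcMar` and `LINE/R2-NeSL` into `R2`, so the result is worth checking against the intended grouping and adding keywords where it disagrees.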
