Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 201839

How to combine two regexes with if/else using stringr


Problem: Hi all, I have this sample data frame containing institution names that I need to extract:

library(stringr)

mydf <- data.frame(
  ID = c('1', '2', '3'),
  Institution = c(
    'Univ of Space, TX, US',
    '[Bloggs, J., Smith, T.] Univ of Time, CA, US',
    '[Windz, P., Lol, D.] College of the World, CA, US'
  )
)

I need to extract only the institution names, so that the result looks like this:

1 Univ of Space
2 Univ of Time
3 College of the World

I don't care about any of the other characters in the institution string, only the name itself, which runs up to the first comma that follows it. The issue is that the institution name is sometimes preceded by a bracketed list of authors (rows 2 and 3) and sometimes appears on its own (row 1).

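I've written the following to handle these two cases separately:

# Case 1: text after a closing bracket, up to the next comma
ex_inst <- str_extract_all(mydf$Institution, "(?<=])(.+?)(?=,)", simplify = TRUE)
# Case 2: text from the start of the string, up to the first comma
ex_inst2 <- str_extract_all(mydf$Institution, "^(.+?)(?=,)", simplify = TRUE)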

I'm struggling to combine them. I have looked into alternation and tried this:

ex_inst3 <- str_extract_all(mydf$Institution, "^(.+?)(?=,)|(?<=])(.+?)(?=,)", simplify = TRUE)

But I'm not experienced with regex and am confused by what it's outputting:

[1,] "Univ of Space" ""
[2,] "[Bloggs"       " Univ of Time"
[3,] "[Windz"        " College of the World"

What's the best way to combine these with stringr? Can I use some sort of if/else statement? Thanks.
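One possible approach, offered as a sketch rather than a definitive answer: the alternation returns two columns because str_extract_all() keeps every match, and in the bracketed rows the `^(.+?)(?=,)` branch matches "[Bloggs" before the second branch matches the name. Instead of an if/else, a single pattern can treat the bracketed prefix as optional and capture what follows it. The names `pattern` and `Name` below are illustrative, and the sketch assumes each string has at most one leading `[...]` block.

```r
library(stringr)

mydf <- data.frame(
  ID = c("1", "2", "3"),
  Institution = c(
    "Univ of Space, TX, US",
    "[Bloggs, J., Smith, T.] Univ of Time, CA, US",
    "[Windz, P., Lol, D.] College of the World, CA, US"
  )
)

# Optional "[...]" prefix plus trailing whitespace (non-capturing),
# then capture everything up to, but not including, the first comma.
pattern <- "^(?:\\[[^\\]]*\\]\\s*)?([^,]+)"

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group.
mydf$Name <- str_match(mydf$Institution, pattern)[, 2]

print(mydf[, c("ID", "Name")])
```

Because the capture group starts after the optional prefix and its whitespace, the extracted names carry no leading space, unlike the lookbehind version.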

