I can't seem to figure out how to match identical characters in regex in R. Suppose I have this data:
dt <- c("12345", "asdf", "#*§", "AAAA", ";;;;", "9999", "%:=+")
I'm able to extract every run of 4 non-whitespace characters, for example like this:
pattern <- "\\S{4}"
extract <- function(x) unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
extract(dt)
[1] "1234" "asdf" "AAAA" ";;;;" "9999" "%:=+"
But what I really want to match are those strings in which the same character is repeated 4 times, giving this output:
[1] "AAAA" ";;;;" "9999"
Any ideas?
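One direction that might work is a backreference: capture a single non-whitespace character and then require that same captured character three more times. The sketch below is only an illustration under that assumption; the names pattern_same, extract_same and whole_only are made up for this example and are not part of the original code.

```r
dt <- c("12345", "asdf", "#*§", "AAAA", ";;;;", "9999", "%:=+")

# (\\S) captures one non-whitespace character;
# \\1{3} requires that same captured character three more times.
pattern_same <- "(\\S)\\1{3}"

# Same extraction approach as in the question, with the new pattern.
extract_same <- function(x) {
  unlist(regmatches(x, gregexpr(pattern_same, x, perl = TRUE)))
}

extract_same(dt)
# [1] "AAAA" ";;;;" "9999"

# Variant: keep only elements that consist entirely of one character
# repeated 4 times, by anchoring the pattern at both ends.
whole_only <- dt[grepl("^(\\S)\\1{3}$", dt, perl = TRUE)]
```

Note that the unanchored pattern would also pick up a run of 4 inside a longer string (for example "AAAA" out of "AAAAA"), so the anchored variant is probably the safer choice if only whole strings should qualify.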