Channel: Active questions tagged r - Stack Overflow

Regular Expression R: Select the above or below lines of a regexp selection while meeting another regexp criteria

I am working with a text document similar to the examples below.

File <- c("Location  Name                               Code and Label                            Frequency  Percentage", 
"                  During the past 30 days, on how many days did you carry a weapon", 
"44-44     Q13     such as a gun, knife, or club on school property?", 
"                  1                  0 days                                               1,610        94.5", 
"                  2                  1 day                                                   71         4.3", 
"                  3                  2 or 3 days                                              6         0.4", 
"                  4                  4 or 5 days                                              3         0.2", 
"                  5                  6 or more days                                          12         0.7", 
"                                     Missing                                                 48", 
"45-45     Q14     During the past 12 months, on how many days did you carry a gun?", 
"                  1                  0 days                                               1,602        91.3", 
"                  2                  1 day                                                   84         5.0", 
"                  3                  2 or 3 days                                             17         1.2", 
"                  4                  4 or 5 days                                              6         0.3", 
"                  5                  6 or more days                                          38         2.2", 
"                                     Missing                                                  3", 
"                  During the past 30 days, on how many days did you not go to school", 
"46-46     Q15     because you felt you would be unsafe at school or on your way to or", 
"                  from school?", "                  1                  0 days                                               1,407        80.4", 
"                  2                  1 day                                                  180        10.9", 
"                  3                  2 or 3 days                                             97         5.4", 
"                  4                  4 or 5 days                                             31         1.8", 
"                  5                  6 or more days                                          26         1.5", 
"                                     Missing                                                  9", 
"                  During the past 12 months, how many times has someone threatened", 
"47-47     Q16     or injured you with a weapon such as a gun, knife, or club on school", 
"                  property?", "                  1                  0 times                                              1,590        92.5", 
"                  2                  1 time                                                  93         5.7", 
"                  3                  2 or 3 times                                            10         0.7", 
"                  4                  4 or 5 times                                             9         0.4", 
"                  5                  6 or 7 times                                             6         0.3", 
"                  6                  8 or 9 times                                             0         0.0", 
"                  7                  10 or 11 times                                           3         0.2", 
"                  8                  12 or more times                                         2         0.1", 
"                                     Missing                                                 37", 
"                                                                                                             4", 
"")


From the text above I want to create another document like the result below:

Desired_Result <- c(
"q13: such as a gun, knife, or club on school property?" =  "q13",
"q14: During the past 12 months, on how many days did you carry a gun?" =  "q14",
"q15: because you felt you would be unsafe at school or on your way to or" =  "q15",
"q16: or injured you with a weapon such as a gun, knife, or club on school" =  "q16"
)

However, q13, q15 and q16 are not complete questions: the rest of each question's text sits on the line above or below the line that the regular expression selects.
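To make the problem concrete, here is a minimal sketch in base R that checks, for each Q line, whether the line directly above or below it is a text line. It runs on a shortened excerpt of the data, and it assumes that continuation lines start with exactly 18 spaces followed by a letter, while answer rows start with 18 spaces followed by a digit. The names `excerpt`, `padded` and `neighbours` are only illustrative.

```r
# Shortened excerpt of the layout above (answer rows abbreviated).
pad <- strrep(" ", 18)
excerpt <- c(
  paste0(pad, "During the past 30 days, on how many days did you carry a weapon"),
  "44-44     Q13     such as a gun, knife, or club on school property?",
  paste0(pad, "1                  0 days"),
  "45-45     Q14     During the past 12 months, on how many days did you carry a gun?",
  paste0(pad, "1                  0 days"),
  paste0(pad, "During the past 30 days, on how many days did you not go to school"),
  "46-46     Q15     because you felt you would be unsafe at school or on your way to or",
  paste0(pad, "from school?"),
  paste0(pad, "1                  0 days")
)

# Lines that carry a question code, e.g. "44-44     Q13".
is_q <- grepl("^\\d+-\\d+\\s+Q\\d+", excerpt, perl = TRUE)
# Assumed shape of a continuation line: 18 spaces, then a letter.
is_cont <- grepl("^\\s{18}[A-Za-z]", excerpt, perl = TRUE)

q_idx <- which(is_q)
# Pad with FALSE on both ends so that looking above the first line or below
# the last line is safe; padded[k + 1] corresponds to is_cont[k].
padded <- c(FALSE, is_cont, FALSE)

neighbours <- data.frame(
  code       = sub("^\\d+-\\d+\\s+(Q\\d+).*$", "\\1", excerpt[q_idx], perl = TRUE),
  text_above = padded[q_idx],      # is_cont[q_idx - 1]
  text_below = padded[q_idx + 2],  # is_cont[q_idx + 1]
  stringsAsFactors = FALSE
)
print(neighbours)
```

On this excerpt, Q13 has text only above it, Q14 has none on either side, and Q15 has text both above and below.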

QUESTION:

How can I select the lines above or below a line matched by one regular expression, provided those lines meet a second regular expression, and then concatenate them in the correct order?

I accomplished the Desired_Result above using the following code:

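library(stringr)  # provides str_trim(), str_extract(), str_subset(), str_detect() and %>%

# Keep only the lines that contain a question code such as Q13
Qs_Lines <- grep("[a-zA-Z]*Q[0-9][0-9]?", File, perl = TRUE, value = TRUE)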
Qs_Lines <- str_trim(Qs_Lines)
Qs_Lines

# Extract Q ----
Qs <- Qs_Lines %>% str_extract("Q([0-9]){1,2}")
Qs

# Extract text after the Q[0-9][0-9]
Info_Lines <- str_extract(Qs_Lines, "[:blank:]+[a-zA-Z][a-zA-Z].*") %>% str_trim
Info_Lines

# Select lines before Qs if the sentence in Q lines is not complete

# Line_Before_Qs <-  str_subset(File, "^\\s{18,19}[A-Z][a-z]") %>% str_trim()
# Line_Before_Qs <-  Line_Before_Qs[1:100]


# Paste the question codes and the question text together
# (paste0() has no sep argument, so none is passed)
Final <- paste0("\"", tolower(Qs), ": ", Info_Lines, "\"", " = ", " \"", tolower(Qs), "\"", ",")

# Wrap the result in c( ... ) ----------------------------------------------

Final <- c("c(", Final, ")")

# writeLines() prints one element per line, which makes the end result easier to see ----

writeLines(
Final
)

Below I include two unsuccessful attempts. I think they may help in getting to the correct result.

Thanks a lot for your help

And the best in this New Year 2020

############# For loop with if #################
line_count <- length(File)
q_Line <- ""
before_q_Line <- ""
question <- ""

# For loop
for (i in 1:line_count) {

  if (str_detect(File[i], "\\d*-\\d*\\s*Q.\\s*") | str_detect(File[i], "\\d*-\\d*\\s*QN.\\s*")) {
    q_Line[i] <- File[i]
  }

  if (str_detect(File[i], pattern = "^\\s{18,19}[A-Z][a-z]")) {
    before_q_Line[i] <- File[i]
  }
}

question <- paste(before_q_Line, q_Line)

question
###############End of For loop with if ####################

Another try

############ for loop with if and while #############
for (i in 1:line_count) {

  if (str_detect(File[i], "\\d*-\\d*\\s*Q.\\s*")) {
    q_line[i] <- File[i]
  }

  prior <- i - 1
  while (str_detect(File[prior], pattern = "^\\s{18,19}[A-Z][a-z]")) {
    before_question[i] <- File[i - 1]
  }

  question[i] <- str_glue(question[i], File[prior], sep = "")
}
################ End of for loop with if and while ######################
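
One possible direction, sketched here in base R rather than stringr, is to avoid an explicit loop: mark every line that is either a Q line or a text line, treat each run of consecutive marked lines as one question, and paste each run together. This is only a sketch under assumptions: text lines are taken to start with exactly 18 spaces followed by a letter, and at least one answer row is assumed to sit between two questions so that the runs never merge. The names `excerpt`, `q_pattern` and `build_label` are hypothetical, and the excerpt is a shortened version of the data.

```r
# Shortened excerpt of the layout above (answer rows abbreviated).
pad <- strrep(" ", 18)
excerpt <- c(
  paste0(pad, "During the past 30 days, on how many days did you carry a weapon"),
  "44-44     Q13     such as a gun, knife, or club on school property?",
  paste0(pad, "1                  0 days          1,610        94.5"),
  paste0(strrep(" ", 37), "Missing            48"),
  paste0(pad, "During the past 30 days, on how many days did you not go to school"),
  "46-46     Q15     because you felt you would be unsafe at school or on your way to or",
  paste0(pad, "from school?"),
  paste0(pad, "1                  0 days          1,407        80.4")
)

# Location, question code (captured) and the spaces before the question text.
q_pattern <- "^\\d+-\\d+\\s+(QN?\\d+)\\s+"

is_q    <- grepl(q_pattern, excerpt, perl = TRUE)
is_cont <- grepl("^\\s{18}[A-Za-z]", excerpt, perl = TRUE)  # assumed text-line shape
is_text <- is_q | is_cont

# A new block starts wherever a text line follows a non-text line.
starts <- is_text & !c(FALSE, head(is_text, -1))
block  <- cumsum(starts)

# Split the text lines into blocks and keep only blocks that contain a Q line.
blocks <- split(excerpt[is_text], block[is_text])
blocks <- Filter(function(lines) any(grepl(q_pattern, lines, perl = TRUE)), blocks)

# Build "q13: full question text" from the lines of one block.
build_label <- function(lines) {
  q_line <- grep(q_pattern, lines, perl = TRUE, value = TRUE)[1]
  code   <- tolower(sub(paste0(q_pattern, ".*$"), "\\1", q_line, perl = TRUE))
  text   <- paste(trimws(sub(q_pattern, "", lines, perl = TRUE)), collapse = " ")
  paste0(code, ": ", text)
}

labels <- unname(vapply(blocks, build_label, character(1)))
Result <- setNames(sub(":.*$", "", labels), labels)

# Print in the same c("label" = "code", ...) shape as Desired_Result.
writeLines(c("c(", paste0("\"", names(Result), "\" = \"", Result, "\"", collapse = ",\n"), ")"))
```

Grouping by runs keeps the lines in document order, so text above and below a Q line is concatenated without checking each neighbour separately. If two questions could ever follow each other with no answer rows in between, the block boundaries would need an extra rule.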

