I am working with a text document similar to the example below.
File <- c("Location Name Code and Label Frequency Percentage",
" During the past 30 days, on how many days did you carry a weapon",
"44-44 Q13 such as a gun, knife, or club on school property?",
" 1 0 days 1,610 94.5",
" 2 1 day 71 4.3",
" 3 2 or 3 days 6 0.4",
" 4 4 or 5 days 3 0.2",
" 5 6 or more days 12 0.7",
" Missing 48",
"45-45 Q14 During the past 12 months, on how many days did you carry a gun?",
" 1 0 days 1,602 91.3",
" 2 1 day 84 5.0",
" 3 2 or 3 days 17 1.2",
" 4 4 or 5 days 6 0.3",
" 5 6 or more days 38 2.2",
" Missing 3",
" During the past 30 days, on how many days did you not go to school",
"46-46 Q15 because you felt you would be unsafe at school or on your way to or",
" from school?", " 1 0 days 1,407 80.4",
" 2 1 day 180 10.9",
" 3 2 or 3 days 97 5.4",
" 4 4 or 5 days 31 1.8",
" 5 6 or more days 26 1.5",
" Missing 9",
" During the past 12 months, how many times has someone threatened",
"47-47 Q16 or injured you with a weapon such as a gun, knife, or club on school",
" property?", " 1 0 times 1,590 92.5",
" 2 1 time 93 5.7",
" 3 2 or 3 times 10 0.7",
" 4 4 or 5 times 9 0.4",
" 5 6 or 7 times 6 0.3",
" 6 8 or 9 times 0 0.0",
" 7 10 or 11 times 3 0.2",
" 8 12 or more times 2 0.1",
" Missing 37",
" 4",
"")
From the above text, I want to create another document like the result below:
Desired_Result <- c(
"q13: such as a gun, knife, or club on school property?" = "q13",
"q14: During the past 12 months, on how many days did you carry a gun?" = "q14",
"q15: because you felt you would be unsafe at school or on your way to or" = "q15",
"q16: or injured you with a weapon such as a gun, knife, or club on school" = "q16"
)
However, q13, q15 and q16 are incomplete questions, because the rest of each question sits on the line above or below the line that the regular expression selects.
QUESTION:
How can I select the lines above or below a line matched by a regular expression, provided those lines meet a second regular-expression criterion, and then concatenate them in the right order?
I accomplished the Desired_Result above using the following code:
library(stringr) # str_trim(), str_extract(), str_detect(); also re-exports %>%
Qs_Lines <- grep("[a-zA-Z]*Q[0-9][0-9]?", File, perl = TRUE, value = TRUE)
Qs_Lines <- str_trim(Qs_Lines)
Qs_Lines
# Extract Q ----
Qs <- Qs_Lines %>% str_extract("Q([0-9]){1,2}")
Qs
# Extract text after the Q[0-9][0-9]
Info_Lines <- str_extract(Qs_Lines, "[:blank:]+[a-zA-Z][a-zA-Z].*") %>% str_trim
Info_Lines
# Select lines before Qs if the sentence in Q lines is not complete
# Line_Before_Qs <- str_subset(File, "^\\s{18,19}[A-Z][a-z]") %>% str_trim()
# Line_Before_Qs <- Line_Before_Qs[1:100]
# Paste expression results and text
Final <- paste0("\"", tolower(Qs), ": ", Info_Lines, "\" = \"", tolower(Qs), "\"")
# Enclose the entries in c( ... ), comma-separated with no comma after the last one
Final <- c("c(", paste(Final, collapse = ",\n"), ")")
# writeLines() prints the end result ----------------------
writeLines(
Final
)
Below I include two unsuccessful attempts. I think they may help in getting to the correct result.
Thanks a lot for your help
And the best in this New Year 2020
############# For loop with if #################
line_count <- length(File)
q_Line <- ""
before_q_Line <- ""
question <- ""
# For loop
for (i in 1:line_count){
if (str_detect(File[i], "\\d*-\\d*\\s*Q.\\s*") == TRUE | str_detect(File[i], "\\d*-\\d*\\s*QN.\\s*") == TRUE ) {
q_Line[i] <- File[i]
}
if(str_detect(File[i], pattern = "^\\s{18,19}[A-Z][a-z]") == TRUE){
before_q_Line[i] <- File[i]
}
}
question <- paste(before_q_Line, q_Line)
question
###############End of For loop with if ####################
Another try
############ for loop with if and while #############
for (i in 1:line_count){
if (str_detect(File[i], "\\d*-\\d*\\s*Q.\\s*") == TRUE ) {
q_Line[i] <- File[i]
}
prior <- i-1
while(str_detect(File[prior], pattern = "^\\s{18,19}[A-Z][a-z]") == TRUE){
before_question [i]<- File[i-1]
}
question[i] <- str_glue(question[i], File[prior], sep = "")
}
################ End of for loop with if and while ######################
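One direction that might work is sketched below, using base R only. It rests on an assumption about the layout rather than on anything confirmed: every question is a run of consecutive text lines, and the runs are separated by answer rows, `Missing` rows, the header, blank lines, or bare page numbers. The function name `collect_questions`, the shortened `sample_lines` vector, and all the patterns are hypothetical and would need adjusting to the real file.

```r
# Hypothetical, shortened sample in the same layout as `File`
sample_lines <- c(
  "Location Name Code and Label Frequency Percentage",
  " During the past 30 days, on how many days did you carry a weapon",
  "44-44 Q13 such as a gun, knife, or club on school property?",
  " 1 0 days 1,610 94.5",
  " Missing 48",
  "45-45 Q14 During the past 12 months, on how many days did you carry a gun?",
  " 1 0 days 1,602 91.3",
  " Missing 3",
  " During the past 30 days, on how many days did you not go to school",
  "46-46 Q15 because you felt you would be unsafe at school or on your way to or",
  " from school?",
  " 1 0 days 1,407 80.4",
  " Missing 9",
  " 4",
  ""
)

collect_questions <- function(lines) {
  lines <- trimws(lines)
  # Rows assumed to separate one question from the next
  is_break <-
    grepl("^[0-9]+[[:space:]]+.*[[:space:]][0-9,]+[[:space:]]+[0-9.]+$", lines) | # answer row
    grepl("^Missing[[:space:]]+[0-9,]+$", lines) |                               # Missing row
    grepl("^Location[[:space:]]+Name", lines) |                                  # table header
    grepl("^[0-9]*$", lines)                                                     # blank or page number
  # Consecutive text lines between two breaks share one run id
  run_id <- cumsum(is_break)
  runs <- split(lines[!is_break], run_id[!is_break])
  joined <- vapply(runs, paste, character(1), collapse = " ")
  # "44-44 Q13 " style tag that marks the line carrying the question code
  tag <- "[0-9]+-[0-9]+[[:space:]]+QN?[0-9]+[[:space:]]*"
  joined <- joined[grepl(tag, joined)]
  tags <- regmatches(joined, regexpr(tag, joined))
  codes <- tolower(trimws(sub("^[0-9]+-[0-9]+", "", tags)))
  text <- sub(tag, "", joined) # drop the tag, keep the full question text
  setNames(codes, paste0(codes, ": ", text))
}

result <- collect_questions(sample_lines)
writeLines(names(result))
```

Grouping by runs avoids having to decide, line by line, whether a continuation belongs above or below the `Q` line: whatever text sits between two non-text rows is pasted together in its original order.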