I want to scrape the text of every `<p>` tag with the class `content-text__container`. When I scrape it, some paragraphs get split across several rows, with part of a sentence continuing in the row below. I want to collapse them so that each paragraph sits in a single cell. To do that, I need to concatenate the rows that come before each empty cell in the `checagem` column.
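```r
library(rvest)    # read_html(), html_nodes(), html_text()
library(dplyr)    # mutate() and the %>% pipe
library(stringr)  # str_trim(), str_detect()
library(abjutils) # rm_accent() (assumed to come from this package)

url <- "https://g1.globo.com/fato-ou-fake/noticia/2018/10/05/veja-o-que-e-fato-ou-fake-nas-falas-dos-presidenciaveis-no-debate-da-globo.ghtml"

texto <- url %>%
  read_html() %>%
  html_nodes("p.content-text__container") %>%  # every <p> with this class
  html_text() %>%
  str_trim() %>%
  as.data.frame() %>%
  `colnames<-`("checagem") %>%
  mutate(checagem = toupper(checagem),
         checagem = rm_accent(checagem)) %>%
  # Keep the text as a label only in rows containing "VEJA O PORQUE:"
  mutate(rotulo = ifelse(str_detect(checagem, "VEJA O PORQUE:"),
                         checagem, NA))
```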
For example, the content in rows 121, 122, and 123 (image below) should end up in a single row.
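A minimal sketch of the collapsing step, assuming the split paragraphs really are separated by empty (or `NA`) cells in `checagem`, as described above. `collapse_paragraphs()` is a hypothetical base R helper, not part of any package: every blank cell bumps a running counter, so `cumsum()` gives all rows of one paragraph the same group id, and `split()` plus `paste(collapse = " ")` joins each group into one string.

```r
# Hypothetical helper: joins runs of non-blank strings that are
# separated by blank cells ("" or NA) into one string per run.
collapse_paragraphs <- function(x, sep = " ") {
  # A blank (empty, whitespace-only, or NA) cell marks a paragraph boundary
  blank <- is.na(x) | trimws(x) == ""
  # cumsum() increases at every blank cell, so all rows of the same
  # paragraph share one group id
  group <- cumsum(blank)
  keep <- !blank
  pieces <- split(x[keep], group[keep])
  # Join the pieces of each paragraph; unname() drops the group ids
  unname(vapply(pieces, paste, character(1), collapse = sep))
}

# Toy input: rows 3 to 5 form one paragraph, delimited by blank cells
checagem <- c("PRIMEIRO PARAGRAFO.", "", "INICIO DA FRASE",
              "CONTINUA AQUI", "E TERMINA AQUI.", "", "ULTIMO.")
print(collapse_paragraphs(checagem))
# → "PRIMEIRO PARAGRAFO." "INICIO DA FRASE CONTINUA AQUI E TERMINA AQUI." "ULTIMO."

# With the scraped data (not run here):
# texto_colapsado <- data.frame(checagem = collapse_paragraphs(texto$checagem))
```

Since `rotulo` is derived from `checagem` row by row, it would need to be recomputed on the collapsed data frame. If the scraped page yields no blank cells between paragraphs, this rule will not split anything, and a different boundary marker would be needed.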