While web scraping, some of the retrieved text came back garbled, much like foreign-language text read with the wrong encoding. The puzzling part is that the encoding appears to be correct: "UTF-8". Is there any way to fix the text, even though it is supposedly already in the correct encoding? The code below reproduces the problem. RStudio is configured to use "UTF-8", and every function I tried that changes the encoding returns even more gibberish. Thank you all in advance.
library(rvest)
url <- "https://www1.folha.uol.com.br/poder/2020/01/folhas-da-manha-da-tarde-e-da-noite-se-uniram-sob-um-so-titulo-folha-de-spaulo-ha-60-anos.shtml"
# Walk down to the headline node, then extract its text
title.news <- read_html(url) %>%
  html_nodes('body') %>%
  html_nodes('main') %>%
  html_nodes('article') %>%
  html_nodes('.block') %>%
  html_nodes('h1') %>%
  html_text()
# Collapse runs of whitespace into a single space, then trim both ends
title.news <- trimws(gsub(pattern = '\\s+', replacement = ' ', x = title.news))
Encoding(title.news)
[1] "UTF-8"
title.news
[1] "Folhas da Manhã, da Tarde e da Noite se uniram sob um só tÃtulo, Folha de S.Paulo, há 60 anos"
# Desired output: Folhas da Manhã, da Tarde e da Noite se uniram sob um só título, Folha de S.Paulo, há 60 anos
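
For reference, here is a minimal sketch of how the broken fragment can be inspected. It assumes (this is a guess, not something confirmed from the page) that the "í" was encoded to UTF-8 twice somewhere upstream, so that its two bytes C3 AD were read as the Latin-1 characters "Ã" plus an invisible soft hyphen. The strings `broken` and `sample.title` are hypothetical stand-ins typed with escapes, not the scraped value. Under that assumption, a Latin-1 round trip repairs the fragment on its own, but applied to the whole title it would likely damage the characters that are already correct (such as the "ã" in "Manhã"), which may be why whole-string conversions produce more gibberish. A targeted replacement is one possible workaround.

```r
# Hypothetical stand-in for the broken fragment: "Ã" (U+00C3) followed by
# a soft hyphen (U+00AD), which is what "í" looks like after double encoding.
broken <- "t\u00c3\u00adtulo"

# Inspect the bytes actually stored in the string.
charToRaw(broken)
# 74 c3 83 c2 ad 74 75 6c 6f

# Round trip: write the characters out as Latin-1 bytes, then declare
# those bytes to be UTF-8. This only works if the whole string is double-encoded.
fixed <- iconv(broken, from = "UTF-8", to = "latin1")
Encoding(fixed) <- "UTF-8"
fixed

# Targeted alternative for a string that mixes correct and broken characters:
# replace only the known broken sequence and leave everything else untouched.
sample.title <- "um s\u00f3 t\u00c3\u00adtulo"
repaired <- gsub("\u00c3\u00ad", "\u00ed", sample.title, fixed = TRUE)
repaired
```

The targeted `gsub()` only covers "í"; other double-encoded characters, if any appear, would each need their own replacement pair.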