Channel: Active questions tagged r - Stack Overflow

How to download a PDF file from the web with R (encoding issue)

I am trying to download a PDF file from a website using R. When I tried the function browseURL, it only worked with the argument encodeIfNeeded = TRUE. When I pass the same URL to the function download.file, it returns "cannot open destfile 'downloaded/teste.pdf', reason 'No such file or directory'", which I take to mean that it can't find the correct URL.

How do I fix the encoding so that I can download the file programmatically? I need to automate this because there are more than a thousand files to download.

Here's a minimal reproducible example:

library(tidyverse)
library(rvest)


url <- "http://www.ouvidoriageral.sp.gov.br/decisoesLAI.html"
webpage <- read_html(url)

# scraping hyperlinks
links_decisoes <- html_nodes(webpage,".borderTD a") %>%
  html_attr("href")

# creating full/correct url
full_links <- paste("http://www.ouvidoriageral.sp.gov.br/", links_decisoes, sep="" )

# browseURL only works with encodeIfNeeded = TRUE
browseURL(full_links[1], encodeIfNeeded = TRUE,
          browser = "C://Program Files//Mozilla Firefox//firefox.exe")
# download.file fails with the "cannot open destfile" error quoted above
download.file(full_links[1], "downloaded/teste.pdf")
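
For anyone landing here with the same symptom, below is a minimal sketch of one way to approach it, not a confirmed fix for this particular site. It rests on two assumptions: that the links contain characters (spaces, for example) that need percent-encoding, which utils::URLencode can apply, and that the "cannot open destfile" message is at least partly caused by the "downloaded" folder not existing yet, since download.file does not create folders. The helper names prepare_download and download_pdf are hypothetical and do not come from the question.

```r
# Hypothetical helper: create the destination folder if needed and
# percent-encode the link. Returns both pieces so they can be inspected
# before any network request is made.
prepare_download <- function(link, destfile) {
  # download.file() does not create missing folders, so create it first.
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)

  # URLencode() escapes characters such as spaces; with the default
  # reserved = FALSE, characters like "/" and ":" are left untouched.
  list(url = utils::URLencode(link), destfile = destfile)
}

# Hypothetical wrapper: download one file using the prepared values.
download_pdf <- function(link, destfile) {
  target <- prepare_download(link, destfile)
  # mode = "wb" writes the file as binary, which matters for PDFs on Windows.
  utils::download.file(target$url, target$destfile, mode = "wb")
}

# Usage sketch for many links (full_links as built in the question;
# the "decisao_" file names are made up for illustration):
# for (i in seq_along(full_links)) {
#   download_pdf(full_links[i],
#                file.path("downloaded", paste0("decisao_", i, ".pdf")))
# }
```

Keeping the preparation step separate from the download makes it easy to print the encoded URL and compare it with what the browser opens. If the encoded URL still differs from the one Firefox ends up using, the links may need a different encoding step than the one assumed here.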
