Trying to download PDFs in R

I am trying to get the links to the PDFs on a site in R, but rvest's `read_html()` function just sits there, seemingly making no progress.

Here is my code:

```r
# Load required libraries
library(tidyverse)
library(rvest)

# Define the URL
url <- "https://providers.anthem.com/new-york-provider/claims/reimbursement-policies/"

# Read and process the HTML
links <- try({
  read_html(url) %>%
    html_node(xpath = "/html/body/main/div/div/div/section[3]/div/section/div[1]/section/div[1]/div/div[2]/div/p/a") %>%
    html_attr("href") %>%
    as_tibble() %>%
    rename(url = value)
})

# Display the results with error handling
if (!inherits(links, "try-error")) {
  print(links)
} else {
  message("Unable to scrape the URL. This might be due to:")
  message("- Website requires authentication")
  message("- Website blocks automated scraping")
  message("- The XPath structure has changed")
  message("- Network connectivity issues")
}
```
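The absolute XPath in this code is tied to one exact position in the page layout, and `html_node()` returns only the first matching element, so even a successful read would yield at most one link. Below is a sketch of a less brittle alternative. It selects every `<a>` whose `href` ends in `.pdf` and resolves relative links against the page URL. It runs on a small inline HTML string (hypothetical markup, not the real Anthem page), so no network access is needed. `extract_pdf_links()` is a hypothetical helper name:

```r
library(rvest)  # also re-exports %>%

# Hypothetical helper: collect every <a> whose href ends in ".pdf" (or ".PDF")
# and convert relative hrefs into absolute URLs using the page's URL.
extract_pdf_links <- function(page, base_url) {
  hrefs <- page %>%
    html_elements("a[href$='.pdf'], a[href$='.PDF']") %>%
    html_attr("href")
  unique(xml2::url_absolute(hrefs, base_url))
}

# Hypothetical markup standing in for the real page
html <- '<html><body>
  <p><a href="/docs/policy-a.pdf">Policy A</a></p>
  <p><a href="https://example.com/policy-b.pdf">Policy B</a></p>
  <p><a href="/contact">Contact</a></p>
</body></html>'

page <- read_html(html)  # a string containing "<" is parsed as literal HTML
pdf_links <- extract_pdf_links(page, "https://providers.anthem.com/new-york-provider/")
print(pdf_links)
```

The selector approach still depends on the live page returning HTML that contains the links. It does not address the connection error itself.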

Maybe I should do this via httr2?
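httr2 is a reasonable route because it lets you set a timeout and a User-Agent, and it reports the HTTP status instead of a bare connection error. The sketch below assumes two things, neither of which is confirmed: that the server rejects or stalls R's default client, and that the PDF links are present in the static HTML rather than added by JavaScript. The browser-like User-Agent string and `build_request()` are illustrative choices:

```r
library(httr2)
library(rvest)  # also provides %>%

url <- "https://providers.anthem.com/new-york-provider/claims/reimbursement-policies/"

# Hypothetical helper: build the request with an arbitrary browser-like
# User-Agent (an assumption) and a timeout so it fails instead of hanging.
build_request <- function(url) {
  request(url) %>%
    req_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") %>%
    req_timeout(30)
}

req <- build_request(url)

# req_perform() turns HTTP error statuses (e.g. 403) and network failures
# into R errors; catch them so the reason is printed instead of stopping.
resp <- tryCatch(req_perform(req), error = function(e) e)

if (inherits(resp, "httr2_response")) {
  page <- read_html(resp_body_string(resp))
  pdf_links <- page %>%
    html_elements("a[href$='.pdf']") %>%
    html_attr("href") %>%
    xml2::url_absolute(url)
  print(pdf_links)
} else {
  message("Request failed: ", conditionMessage(resp))
}
```

If the request succeeds but `pdf_links` is empty, one likely cause is that the links are inserted by JavaScript, which neither `read_html()` nor httr2 executes.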

Here is the error message from `xml2::read_html()`:

```
> read_html(url)
Error in `open.connection()`:
! cannot open the connection
    ▆
 1. ├─xml2::read_html(url)
 2. └─xml2:::read_html.default(url)
 3.   ├─base::suppressWarnings(...)
 4.   │ └─base::withCallingHandlers(...)
 5.   ├─xml2::read_xml(x, encoding = encoding, ..., as_html = TRUE, options = options)
 6.   └─xml2:::read_xml.character(...)
 7.     └─xml2:::read_xml.connection(...)
 8.       ├─base::open(x, "rb")
 9.       └─base::open.connection(x, "rb")
```
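The traceback shows `read_html()` failing while opening the URL as a connection, so no HTML was ever received. The message alone does not say why: an HTTP error status, a refused connection, and a timeout can all end this way. Once a vector of PDF URLs is available, the files can be downloaded one by one. The sketch below writes each httr2 response straight to disk and assumes the same browser-like User-Agent is accepted for the PDFs. `pdf_file_name()` and `download_pdfs()` are hypothetical helpers, and the one-second pause is an arbitrary politeness choice:

```r
library(httr2)

# Hypothetical helper: derive a local file name from a PDF URL,
# dropping any query string such as "?v=2".
pdf_file_name <- function(pdf_url) {
  basename(sub("\\?.*$", "", pdf_url))
}

# Hypothetical helper: download each PDF into `dest_dir` and return the
# local paths of the requests that completed without an error.
download_pdfs <- function(pdf_urls, dest_dir = "pdfs") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(0)
  for (pdf_url in pdf_urls) {
    dest <- file.path(dest_dir, pdf_file_name(pdf_url))
    ok <- tryCatch({
      req <- request(pdf_url)
      req <- req_user_agent(req, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
      req <- req_timeout(req, 60)
      req_perform(req, path = dest)  # stream the response body into `dest`
      TRUE
    }, error = function(e) {
      message("Failed: ", pdf_url, " (", conditionMessage(e), ")")
      FALSE
    })
    if (ok) paths <- c(paths, dest)
    Sys.sleep(1)  # pause between requests
  }
  paths
}
```

For example, `download_pdfs(pdf_links)` would save each file under `pdfs/`, where `pdf_links` is a vector of absolute PDF URLs.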
