I want to scrape all the names from this page. With the result of one tibble of three columns. My code only works if all the data is there hence my error:
Error: Tibble columns must have consistent lengths, only values of length one are recycled:
* Length 20: Columns `huisarts`, `url`
* Length 21: Column `praktijk`
How can I let my code run but fill with Na
's in tibble
if the data isn't there.
My code for a pauzing robot later used in scraper function:
pauzing_robot <- function (periods = c(0, 1)) {
tictoc <- runif(1, periods[1], periods[2])
cat(paste0(Sys.time()),
"- Sleeping for ", round(tictoc, 2), "seconds\n")
Sys.sleep(tictoc)
}
Scraper:
library(tidyverse)
library(rvest)
scrape_page <- function(pagina_nummer) {
page <- read_html(paste0("https://www.zorgkaartnederland.nl/huisarts/pagina", pagina_nummer))
pauzing_robot(periods = c(0, 1.5))
tibble(
huisarts = page %>%
html_nodes(".media-heading.title.orange") %>%
html_text() %>%
str_trim(),
praktijk = page %>%
html_nodes(".location") %>%
html_text() %>%
str_trim(),
url = page %>%
html_nodes(".media-heading.title.orange") %>%
html_nodes("a") %>%
html_attr("href") %>%
str_trim() %>%
paste0("https://www.zorgkaartnederland.nl", .)
)
}
Total number of pages 445, but for example sake only scraping three:
huisartsen <- map_df(sample(1:3), scrape_page)
Page 2 seems to be the problem with inconsistent lengths because this code works:
huisartsen <- map_df(3:4, scrape_page)
If possible with tidyverse
code. Thanks in advance.