I am trying to scrape data from the OpenSecrets website for the 2018 US congressional elections. There is a different URL for each district, and calling readLines on a single URL gives me exactly the output I'm looking for. For example, this returns the right output for Arizona's first district:
dollars <- readLines("https://www.opensecrets.org/races/summary?cycle=2018&id=AZ01&spec=N", encoding ='UTF-8')
However, I want the data for all 435 districts. I know I could manually create a vector of URLs, but that seems extremely inefficient. Instead, I want to loop over all the URLs, which share the same base. This is the code I'm using:
library(stringr)  # str_replace_all
library(rvest)    # read_html, html_table

Parl_all <- list()
for (i in 1:435) {
  df <- readLines(paste("https://www.opensecrets.org/races/summary?cycle=2018&id=",
                        i, "&spec=N", sep = ""), encoding = 'UTF-8')
  df <- str_replace_all(df, "<img src='images/check.gif'>",
                        "<font size='2'>Yes</font>")
  df <- read_html(toString(df))
  df <- as.data.frame(html_table(df, fill = TRUE))
  df$district <- i
  Parl_all[[i]] <- df
}
But it gives me the following error:
Error in `$<-.data.frame`(`*tmp*`, "session", value = 1L) :
  replacement has 1 row, data has 0
Obviously the data is not being scraped. Any ideas?
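One detail worth noting: the working URL uses id=AZ01, i.e. a two-letter state abbreviation plus a two-digit, zero-padded district number, while the loop substitutes a bare integer (id=1), so the requested pages presumably come back without the expected tables. A minimal sketch of generating ids in the working URL's format is below; the per-state district counts shown are illustrative placeholders, not a verified 2018 apportionment table.

```r
# Sketch only: build ids like "AZ01" from a named vector of district counts.
# The two entries below are examples -- extend to every state with House seats.
districts_per_state <- c(AZ = 9, CA = 53)

ids <- unlist(lapply(names(districts_per_state), function(st) {
  # zero-pad each district number to two digits, e.g. "AZ01" ... "AZ09"
  sprintf("%s%02d", st, seq_len(districts_per_state[[st]]))
}))

urls <- paste0("https://www.opensecrets.org/races/summary?cycle=2018&id=",
               ids, "&spec=N")

ids[1]   # "AZ01"
```

Looping over such a urls vector (rather than over 1:435 directly) would keep the request format identical to the single-district call that already works.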