Channel: Active questions tagged r - Stack Overflow

How can I get info from multiple (435) webpages using readLines and a loop in R?

I am trying to scrape data from the OpenSecrets webpage regarding the 2018 US congressional election. There is a different URL for each district. Using readLines on a single URL gives me exactly the output I'm looking for. This gives me the right output for the Arizona first district:

dollars <- readLines("https://www.opensecrets.org/races/summary?cycle=2018&id=AZ01&spec=N", encoding ='UTF-8')

However, I want to get the info for all 435 districts. I know I could manually create a vector of URLs, but that seems extremely inefficient. Instead, I want to loop through all the URLs that share the same base. I am trying to use this code:

library(stringr)
library(rvest)

Parl_all <- list()
for (i in 1:435) {
  df <- readLines(paste0("https://www.opensecrets.org/races/summary?cycle=2018&id=", i, "&spec=N"),
                  encoding = 'UTF-8')
  df <- str_replace_all(df, "<img src='images/check.gif'>", "<font size='2'>Yes</font>")
  df <- read_html(toString(df))
  df <- as.data.frame(html_table(df, fill = TRUE))
  df$district <- i
  Parl_all[[i]] <- df
}

But it gives me the following error:

Error in `$<-.data.frame`(`*tmp*`, "session", value = 1L) :
  replacement has 1 row, data has 0

Obviously the data is not being scraped. Any ideas?
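For context, the single working URL above uses id=AZ01, i.e. a two-letter state abbreviation plus a two-digit district number, so a bare integer i in the query string presumably returns a page with no tables. A minimal sketch of building ids in that format (the per-state district counts here are illustrative placeholders, not the real 2018 apportionment):

```r
# Assumption: ids are state abbreviation + zero-padded district number, e.g. "AZ01".
# Only two states are shown; a real run would need the full district-count lookup.
districts <- c(AZ = 9, AL = 7)

ids <- unlist(lapply(names(districts), function(st) {
  sprintf("%s%02d", st, seq_len(districts[[st]]))
}))

urls <- paste0("https://www.opensecrets.org/races/summary?cycle=2018&id=",
               ids, "&spec=N")
```

The loop could then iterate over urls (or ids) instead of 1:435, tagging each result with its id rather than a bare index.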
