After receiving an answer from Montgomery Clift in another post (see here), I tried writing a function that loops over the days of a month to collect data from Baseball Prospectus (example page here). The code successfully downloads each day's file, but then I get the following error:
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
  Results must be all atomic, or all data frames
Here is the function, followed by the call I'm running to collect all the data:
library(XML)   # htmlParse, xmlRoot, getNodeSet, readHTMLTable
library(plyr)  # ldply
fetch_adjusted <- function(day) {
  fname <- paste0("standings201909", day, ".html")
  download.file(url = paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-", day),
                destfile = fname)
  doc0 <- htmlParse(file = fname, encoding = "UTF-8")
  doc1 <- xmlRoot(doc0)
  doc2 <- getNodeSet(doc1, "//table[@id='content']")
  standings <- readHTMLTable(doc2[[1]], header = TRUE, skip.rows = 1,
                             stringsAsFactors = FALSE)
  standings <- standings[[1]]
  standings$day <- day
  standings
}
Sept <- ldply(1:29, fetch_adjusted, .progress="text")
Can anyone help figure out how to adjust my current code so I can avoid any errors? Thank you!
UPDATE:
I'm now able to successfully download xls files from multiple dates within a span doing the following:
dates <- seq(as.Date("2019-09-01"), as.Date("2019-09-30"), by = 1)
fetch_adjusted <- function(dates) {
  url <- paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=", dates, "&otype=xls")
  destfile <- "test.xls"
  download.file(url, destfile, mode = "wb")
}
But now, no matter what mode I use ("w", "wb", "a"), the files are not appended. I end up with only the very last file (in this case, 2019-09-30), which is an empty spreadsheet. My guess is that each download simply overwrites the previous file with the most recent one. Is there a solution for this?
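To make the suspected overwrite concrete: every download in the code above is written to the same "test.xls", and appending the raw bytes of several .xls files would not generally produce one readable workbook anyway. One possible direction, as a minimal sketch only, is to give each date its own destination file. The helper names (build_url, build_destfile, fetch_all) and the "standings_<date>.xls" naming scheme are assumptions made for illustration; the URL pattern is copied from the code above.

```r
# Hypothetical helpers for illustration; only the URL pattern comes from the post.
dates <- seq(as.Date("2019-09-01"), as.Date("2019-09-30"), by = 1)

# Build the download URL for one or more dates (vectorised via paste0).
build_url <- function(date) {
  paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=",
         format(date, "%Y-%m-%d"), "&otype=xls")
}

# Build a distinct local file name per date (assumed naming scheme).
build_destfile <- function(date) {
  paste0("standings_", format(date, "%Y-%m-%d"), ".xls")
}

# Download each date to its own file. Indexing with dates[i] keeps the
# Date class, which a plain `for (d in dates)` loop would drop.
fetch_all <- function(dates) {
  for (i in seq_along(dates)) {
    d <- dates[i]
    download.file(build_url(d), build_destfile(d), mode = "wb")
  }
  invisible(build_destfile(dates))
}

# One file name per date, so no download replaces another.
destfiles <- build_destfile(dates)
```

Calling fetch_all(dates) would then perform the downloads. Whether each file actually contains data still depends on what the site returns for that date.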