Channel: Active questions tagged r - Stack Overflow

How can I scrape a table from a PHP website using R?


I'm looking to import data into R from a table on this page:

https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10

I've tried several approaches using the XML and httr packages, with no luck. I have already looked at past posts, including:

Read data from a php website with R

and

Scraping html tables into R data frames using the XML package

I'm wondering whether I'm using the wrong table ID from the page source, or whether the table isn't in a format that the tools I'm using can handle.

Any and all help is much appreciated! Thanks in advance!

UPDATE:

After receiving the answer from Montgomery Clift (see below), I tried writing a function to loop through multiple days within a month. The code successfully downloads each day's file, but then I receive the following error:

Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
  Results must be all atomic, or all data frames
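
One plausible reading of this error, not verified against the live page: if readHTMLTable() is handed a single table node it returns one data frame rather than a list of tables, so indexing the result with `[[1]]` pulls out the first column as a plain vector. Anything built on top of that is then neither purely atomic nor a data frame, which is what the message complains about. The sketch below only illustrates that shape problem in base R; the data frame and its values are made up.

```r
# Hypothetical stand-in for what readHTMLTable() might return for one table node.
# Column names and values are invented for illustration only.
standings <- data.frame(team = c("AAA", "BBB"),
                        wins = c(90, 85),
                        stringsAsFactors = FALSE)

# `[[1]]` on a data frame extracts the first column, not "the first table".
first_col <- standings[[1]]
is.data.frame(first_col)   # FALSE: this is now a character vector

# Attaching an extra named element to that vector gives a plain list,
# which is neither atomic nor a data frame.
as_list <- c(as.list(first_col), list(day = 1))
is.atomic(as_list) || is.data.frame(as_list)
```

If this is what is happening, every result returned to ldply would be a plain list, which would match the "all atomic, or all data frames" wording.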

Here is the function, followed by the call I'm running to collect all the data:

library(XML)   # htmlParse, xmlRoot, getNodeSet, readHTMLTable
library(plyr)  # ldply

fetch_adjusted <- function(day) {
    fname <- paste0("standings201909", day, ".html")
    # Download the standings page for the given day of September 2019
    download.file(url = paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-", day),
                  destfile = fname)
    doc0 <- htmlParse(file = fname, encoding = "UTF-8")
    doc1 <- xmlRoot(doc0)
    doc2 <- getNodeSet(doc1, "//table[@id='content']")
    standings <- readHTMLTable(doc2[[1]], header = TRUE, skip.rows = 1, stringsAsFactors = FALSE)
    standings <- standings[[1]]
    standings$day <- day
    standings
}

Sept <- ldply(1:29, fetch_adjusted, .progress="text")

Can anyone help me figure out how to adjust my code to avoid this error? Thank you!
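
A possible adjustment, offered as a sketch rather than a confirmed fix. It rests on two assumptions: that readHTMLTable() returns a single data frame when given one table node (so the `standings[[1]]` step can be dropped), and that the site expects a zero-padded day such as 2019-09-01. The helper standings_url() is a name introduced here for illustration; the table id "content" and skip.rows = 1 are carried over unchanged from the code above.

```r
# Hypothetical helper: build the URL for one day of September 2019.
# Zero-padding the day ("01" rather than "1") is an assumption about the site.
standings_url <- function(day) {
  sprintf("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-%02d",
          day)
}

fetch_adjusted <- function(day) {
  fname <- sprintf("standings201909%02d.html", day)
  download.file(url = standings_url(day), destfile = fname, quiet = TRUE)

  doc <- XML::htmlParse(file = fname, encoding = "UTF-8")
  nodes <- XML::getNodeSet(doc, "//table[@id='content']")

  # No matching table on this page: return NULL instead of failing on nodes[[1]].
  if (length(nodes) == 0) return(NULL)

  # Assumed to be a data frame already, so no `[[1]]` afterwards.
  standings <- XML::readHTMLTable(nodes[[1]], header = TRUE, skip.rows = 1,
                                  stringsAsFactors = FALSE)
  standings$day <- day   # adds a column, keeping the result a data frame
  standings
}
```

The call itself would stay as above, Sept <- ldply(1:29, fetch_adjusted, .progress = "text"), with plyr loaded. Returning a data frame from every call is what lets ldply stack the results.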

