Channel: Active questions tagged r - Stack Overflow

Cannot save and reload an xml_document generated by rvest in R


The read_html function generates an xml_document, which I would like to save to disk and load again later for parsing.

The problem is that after loading the xml_document, it no longer contains any HTML.

library(rvest)
library(magrittr)
doc <- read_html("http://www.example.com/")
doc %>% html_node("h1") %>% html_text

I get: [1] "Example Domain"

But when I first save the xml_document object doc and then load it again, everything seems to have been cleared.

save(doc, file=paste0(getwd(), "/example.RData"))
rm(doc)

load(file=paste0(getwd(), "/example.RData"))
doc %>% html_node("h1") %>% html_text

I get: Error: No matches

Or, when I run doc, I get: {xml_document}, an empty xml_document.

In addition, when I run doc after loading it, I get a message that RStudio has stopped working.

I have tried this on two different Windows machines and got the same problem on both.
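
A likely explanation is that an xml_document only wraps external pointers to in-memory libxml2 structures, and save() stores the pointers rather than the data behind them. A minimal sketch of one possible workaround follows: save the document as a plain character string and re-parse it after loading. It assumes a reasonably recent xml2 in which as.character() works on an xml_document; the literal HTML string, the doc_text and h1_text names, and the temporary file path are illustrative stand-ins, not part of the original code.

```r
library(xml2)
library(rvest)

# Parse a literal HTML string so the sketch does not depend on network access
# (stand-in for read_html("http://www.example.com/")).
doc <- read_html("<html><body><h1>Example Domain</h1></body></html>")

# Serialize the parsed document to a character string. Unlike the
# xml_document itself, a plain string survives save()/load() intact.
doc_text <- as.character(doc)

# Hypothetical location for the saved file.
path <- file.path(tempdir(), "example.RData")
save(doc_text, file = path)
rm(doc, doc_text)

# Reload the string and re-parse it to get a fresh, valid xml_document.
load(path)
doc <- read_html(doc_text)
h1_text <- html_text(html_node(doc, "h1"))
print(h1_text)
```

Writing the page to an .html file with xml2's write_xml() and reading that file back with read_html() should work along the same lines.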

sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_1.5     rvest_0.3.1.9000 xml2_0.1.2      

loaded via a namespace (and not attached):
[1] httr_1.1.0  R6_2.1.2    tools_3.3.0 Rcpp_0.12.5
