Channel: Active questions tagged r - Stack Overflow

How to make selectors return the same field on every page when web scraping in R?

I am extracting information on several topics from a set of pages, one page per object. When I first wrote the code, the selectors returned exactly the information I was looking for. However, each page lists more or fewer items than the others, so the same selector points to different information depending on the page. The end result is a matrix with many NA values and with values in the wrong order. I checked the pages and confirmed this: a given selector returns one piece of information for some objects and a different piece for others.

Is there a way to select each value by the name of the variable shown above it (its label) rather than by its position, so that I always get the same information regardless of changes to the website or of how many items a page lists?
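One possible direction, shown only as a minimal sketch: match the list item by the text of its label with XPath instead of counting `li:nth-child(...)` positions. The HTML fragment, the `trait-pred` class, the label strings, and the helper name `get_trait` are all hypothetical stand-ins; only `div.trait-data` and `div.trait-val` come from the selectors in the question, so the real page has to be inspected before adapting this.

```r
library(rvest)

# Hypothetical stand-in for a trait list; the real page markup may differ.
page <- read_html('
<ul>
  <li><div class="trait-data"><div class="trait-pred">depth range (min)</div>
      <div class="trait-val">12 m</div></div></li>
  <li><div class="trait-data"><div class="trait-pred">water temperature (max)</div>
      <div class="trait-val">27.5 degrees Celsius</div></div></li>
</ul>')

# Return the value of the first list item whose text mentions `label`,
# or NA when the page has no such item.
# Assumes `label` contains no single quote (it is pasted into an XPath string).
get_trait <- function(page, label) {
  xpath <- sprintf(
    "//li[contains(normalize-space(.), '%s')]//div[contains(@class, 'trait-val')]",
    label
  )
  node <- html_node(page, xpath = xpath)  # a "missing" node if nothing matches
  trimws(html_text(node))                 # html_text() gives NA for a missing node
}

get_trait(page, "depth range (min)")  # "12 m"
get_trait(page, "salinity")           # NA
```

Because the lookup depends on the label text and not on how many items precede it, a page with extra or missing traits would still return the intended field, or NA when that field is absent.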

The first part (the vector Var) was obtained in a previous step:

Var <- c("https://eol.org/pages/401504/data",  "https://eol.org/pages/3089826/data",
         "https://eol.org/pages/52361/data",   "https://eol.org/pages/2967667/data",
         "https://eol.org/pages/587416/data",  "https://eol.org/pages/3096662/data",
         "https://eol.org/pages/3096667/data", "https://eol.org/pages/18009694/data", 
         "https://eol.org/pages/2967662/data", "https://eol.org/pages/2967669/data")

This is my code:

library(rvest)  # provides read_html(), html_node() and html_text()

GiveMeData<-function(url){ 

  furl<-read_html(url)

  iden<-"body > div.l-basic-main > div.l-tabs > div > div > div.names-wrapper > div.names > h1 > i"
  Iden<-html_node(furl,iden)
  Identext<-html_text(Iden)

  dmin<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(25) > div.trait-data > div.trait-val"
  Dmin<-html_node(furl,dmin)
  Dmintext<-html_text(Dmin)

  dmax<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(24) > div.trait-data > div.trait-val"
  Dmax<-html_node(furl,dmax)
  Dmaxtext<-html_text(Dmax)

  dminextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(24) > div.trait-data > div.trait-val"
  Dminextra<-html_node(furl,dminextra)
  Dminextratext<-html_text(Dminextra)  # read the node selected by dminextra, not Dmin

  dmaxextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(27) > div.trait-data > div.trait-val"
  Dmaxextra<-html_node(furl,dmaxextra)
  Dmaxextratext<-html_text(Dmaxextra)  # read the node selected by dmaxextra, not Dmax

  #Temperature
  tmin<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(38) > div.trait-data > div.trait-val"
  Tmin<-html_node(furl,tmin)
  Tmintext<-html_text(Tmin)

  tmax<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(54) > div.trait-data > div.trait-val"
  Tmax<-html_node(furl,tmax)
  Tmaxtext<-html_text(Tmax)

  tminextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(53) > div.trait-data > div.trait-val"
  Tminextra<-html_node(furl,tminextra)
  Tminextratext<-html_text(Tminextra)

  tmaxextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(52) > div.trait-data > div.trait-val"
  Tmaxextra<-html_node(furl,tmaxextra)
  Tmaxextratext<-html_text(Tmaxextra)

  Identext

  Tmaxtext<-gsub("degrees Celsius\n","",Tmaxtext)
  Tmaxtext<-gsub("\n","", Tmaxtext)


  Tmintext<-gsub("degrees Celsius\n","",Tmintext)
  Tmintext<-gsub("\n","", Tmintext)


  Tmaxextratext<-gsub("degrees Celsius\n","",Tmaxextratext)
  Tmaxextratext<-gsub("\n","", Tmaxextratext)

  Tminextratext<-gsub("degrees Celsius\n","",Tminextratext)
  Tminextratext<-gsub("\n","",Tminextratext)

  Dmaxtext<-gsub(" m\n","",Dmaxtext)
  Dmaxtext<-gsub("\n","",Dmaxtext)

  Dmintext<-gsub(" m\n","",Dmintext)
  Dmintext<-gsub("\n","",Dmintext)

  Dmaxextratext<-gsub(" m\n","",Dmaxextratext)
  Dmaxextratext<-gsub("\n","",Dmaxextratext)

  Dminextratext<-gsub(" m\n","",Dminextratext)
  Dminextratext<-gsub("\n","",Dminextratext)

  # Return the cleaned values as one character vector
  # (html_text() and gsub() already return character, so as.character() is not needed)
  c(Identext, Tmaxtext, Tmintext, Tminextratext,
    Dmaxtext, Dmaxextratext, Dminextratext)

}


output2<- lapply(c(Var), function(x) tryCatch(GiveMeData(x), error = function(e){}))
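The misaligned matrix described above could also be avoided by having every call return a named vector with a fixed set of fields, so that rows line up even when a page fails or lacks a field. The following is a sketch under stated assumptions: `fake_scrape`, the field names in `fields`, and the example.org URLs are hypothetical placeholders for a real scraper and real pages, and no network access is involved.

```r
# Hypothetical stand-in for a scraper that returns a named character vector.
fake_scrape <- function(url) {
  if (grepl("bad", url)) stop("page not available")
  c(species = "Example species", depth_min = "12 m",
    temp_max = "27.5 degrees Celsius")
}

# The fields we want for every page, in a fixed order (names are illustrative).
fields <- c("species", "depth_min", "depth_max", "temp_max")

# Always return one value per field; absent fields and failed pages become NA.
safe_row <- function(url) {
  res <- tryCatch(fake_scrape(url), error = function(e) character(0))
  out <- res[fields]   # indexing by name yields NA for names that are absent
  names(out) <- fields
  out
}

urls <- c("https://example.org/good", "https://example.org/bad")

# One row per URL, one column per field.
result <- as.data.frame(do.call(rbind, lapply(urls, safe_row)),
                        stringsAsFactors = FALSE)
result$url <- urls
result
```

Returning a row of NA values on error, instead of NULL, keeps one row per URL, which makes it easier to see which pages failed.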

