I am scraping several pieces of information from a set of pages, one page per object. When I first wrote the code, the selectors returned exactly the information I was looking for. However, each page lists more or fewer entries than the others, so the end result is a matrix with many NA values and with information in the wrong order. I checked the pages: the same selector returns one piece of information for some objects and a different one for others.
Is there a way to use the name of the variable that appears above the desired value, instead of a position-based selector, so that I always get the same piece of information regardless of changes to the website or differences in how much information each page holds?
The first part (the vector Var) was obtained with a previous process:
Var <- c("https://eol.org/pages/401504/data", "https://eol.org/pages/3089826/data",
"https://eol.org/pages/52361/data", "https://eol.org/pages/2967667/data",
"https://eol.org/pages/587416/data", "https://eol.org/pages/3096662/data",
"https://eol.org/pages/3096667/data", "https://eol.org/pages/18009694/data",
"https://eol.org/pages/2967662/data", "https://eol.org/pages/2967669/data")
This is my code:
library(rvest)
GiveMeData<-function(url){
furl<-read_html(url)
iden<-"body > div.l-basic-main > div.l-tabs > div > div > div.names-wrapper > div.names > h1 > i"
Iden<-html_node(furl,iden)
Identext<-html_text(Iden)
dmin<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(25) > div.trait-data > div.trait-val"
Dmin<-html_node(furl,dmin)
Dmintext<-html_text(Dmin)
dmax<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(24) > div.trait-data > div.trait-val"
Dmax<-html_node(furl,dmax)
Dmaxtext<-html_text(Dmax)
dminextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(24) > div.trait-data > div.trait-val"
Dminextra<-html_node(furl,dminextra)
Dminextratext<-html_text(Dminextra)
dmaxextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(27) > div.trait-data > div.trait-val"
Dmaxextra<-html_node(furl,dmaxextra)
Dmaxextratext<-html_text(Dmaxextra)
#Temperature
tmin<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(38) > div.trait-data > div.trait-val"
Tmin<-html_node(furl,tmin)
Tmintext<-html_text(Tmin)
tmax<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(54) > div.trait-data > div.trait-val"
Tmax<-html_node(furl,tmax)
Tmaxtext<-html_text(Tmax)
tminextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(53) > div.trait-data > div.trait-val"
Tminextra<-html_node(furl,tminextra)
Tminextratext<-html_text(Tminextra)
tmaxextra<-"body > div.l-basic-main > div.l-content > div > div.l-below-filters > ul > li:nth-child(52) > div.trait-data > div.trait-val"
Tmaxextra<-html_node(furl,tmaxextra)
Tmaxextratext<-html_text(Tmaxextra)
Identext
Tmaxtext<-gsub("degrees Celsius\n","",Tmaxtext)
Tmaxtext<-gsub("\n","", Tmaxtext)
Tmintext<-gsub("degrees Celsius\n","",Tmintext)
Tmintext<-gsub("\n","", Tmintext)
Tmaxextratext<-gsub("degrees Celsius\n","",Tmaxextratext)
Tmaxextratext<-gsub("\n","", Tmaxextratext)
Tminextratext<-gsub("degrees Celsius\n","",Tminextratext)
Tminextratext<-gsub("\n","",Tminextratext)
Dmaxtext<-gsub(" m\n","",Dmaxtext)
Dmaxtext<-gsub("\n","",Dmaxtext)
Dmintext<-gsub(" m\n","",Dmintext)
Dmintext<-gsub("\n","",Dmintext)
Dmaxextratext<-gsub(" m\n","",Dmaxextratext)
Dmaxextratext<-gsub("\n","",Dmaxextratext)
Dminextratext<-gsub(" m\n","",Dminextratext)
Dminextratext<-gsub("\n","",Dminextratext)
info=(c(as.character(Identext), as.character(Tmaxtext), as.character(Tmintext), as.character(Tminextratext), as.character(Dmaxtext), as.character(Dmaxextratext), as.character(Dminextratext)))
}
output2<- lapply(c(Var), function(x) tryCatch(GiveMeData(x), error = function(e){}))
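To make the idea of selecting by variable name concrete, here is a minimal sketch that looks a value up by the label shown next to it rather than by its position in the list. The HTML string is a hypothetical, simplified stand-in for one page. The class name "trait-pred" for the label, the label texts, and the helper get_trait are all assumptions made for illustration, so the real page source would need to be checked before reusing it.

```r
library(rvest)

# Hypothetical, simplified stand-in for one trait list. The class name
# "trait-pred" and the label texts are assumptions, not taken from the site.
html <- '<ul>
  <li><div class="trait-pred">depth range (max)</div>
      <div class="trait-data"><div class="trait-val">250 m</div></div></li>
  <li><div class="trait-pred">depth range (min)</div>
      <div class="trait-data"><div class="trait-val">10 m</div></div></li>
  <li><div class="trait-pred">water temperature (max)</div>
      <div class="trait-data"><div class="trait-val">18.5 degrees Celsius</div></div></li>
</ul>'
page <- read_html(html)

# Hypothetical helper: return the value whose label equals `label`,
# or NA when the page has no such entry.
get_trait <- function(page, label) {
  items  <- html_nodes(page, "ul > li")
  # html_node() on a node set returns one (possibly missing) node per item,
  # so labels and values stay aligned element by element.
  labels <- trimws(html_text(html_node(items, "div.trait-pred")))
  values <- trimws(html_text(html_node(items, "div.trait-val")))
  hit <- values[!is.na(labels) & labels == label]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Always request the same labels in the same order, so every page
# yields a vector of the same length and column order.
traits <- c("depth range (min)", "depth range (max)",
            "water temperature (min)", "water temperature (max)")
result <- vapply(traits, function(l) get_trait(page, l), character(1))
```

Inside GiveMeData, page would be furl. Because every URL then returns a vector of the same length in the same order, the rows should line up when combined, and a missing entry shows up as NA in its own column instead of shifting the others.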