Channel: Active questions tagged r - Stack Overflow

How to scrape data from a website and write it to a CSV in a specified format in R


I am new to web scraping and R. I am trying to scrape the data from https://www.booking.com/country.html

My Problem Statement:

The idea is to extract the number of listings for every kind of accommodation in each country. The output needs to list all countries in column A of an Excel file, with the number of listings for each property type (e.g. Apartments, Hostels, Resorts) in that country in separate columns next to the country name. I need to capture all property types for every country.

The below image describes the output format required.
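In table form, the layout described above amounts to one row per country and one column per property type. A minimal sketch of that reshaping in base R, assuming the scraped values have already been collected in a long table; the data frame `long`, its column names, and the counts are made-up placeholders, not real Booking.com figures:

```r
# Hypothetical long-format data: one row per (country, property type) pair.
# The counts are placeholders for illustration only.
long <- data.frame(
  country = c("Albania", "Albania", "Austria", "Austria", "Austria"),
  type    = c("apartments", "hostels", "apartments", "hostels", "resorts"),
  count   = c(120, 15, 900, 40, 25),
  stringsAsFactors = FALSE
)

# Pivot to one row per country and one column per property type.
# Combinations that do not occur in the data become NA.
wide_matrix <- tapply(long$count, list(long$country, long$type), sum)

wide <- data.frame(
  country = rownames(wide_matrix),
  wide_matrix,
  row.names = NULL,
  check.names = FALSE
)

# Write the result; tempfile() is used so the sketch runs on any machine.
out_file <- tempfile(fileext = ".csv")
write.csv(wide, out_file, row.names = FALSE)
print(wide)
```

`reshape2::dcast(long, country ~ type, value.var = "count")` should give the same shape, since reshape2 is already loaded in the code further down.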

I am able to get the countries using the code below, but not the property types and their respective counts. How can I get the data iteratively, in a function, for all the countries and write it to a CSV? Any help with this code would be appreciated. Thanks in advance.
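One way to keep each property link tied to its country is to select the enclosing block for each country first and then search only inside that block. The sketch below runs on a small inline HTML string instead of the live page. The `div.country-block` wrapper and the link texts are assumptions made for illustration; the real markup on booking.com has to be checked in the browser's inspector and the selectors adjusted.

```r
library(rvest)

# Hypothetical markup standing in for the real page (an assumption).
html <- '
<div class="country-block">
  <h2><a href="#">Albania</a></h2>
  <ul><li><a href="#">120 apartments</a></li><li><a href="#">15 hostels</a></li></ul>
</div>
<div class="country-block">
  <h2><a href="#">Austria</a></h2>
  <ul><li><a href="#">900 apartments</a></li></ul>
</div>'

page <- read_html(html)

# One node per country block; the selector is specific to the markup above.
blocks <- html_nodes(page, "div.country-block")

# For each block, read the country name and only the links inside that block.
rows <- lapply(blocks, function(block) {
  country <- html_text(html_node(block, "h2 > a"), trim = TRUE)
  entries <- html_text(html_nodes(block, "ul > li > a"), trim = TRUE)
  if (length(entries) == 0) return(NULL)
  data.frame(country = country, entry = entries, stringsAsFactors = FALSE)
})

# Stack the per-country pieces into one long table.
long <- do.call(rbind, rows)
print(long)
```

The difference from a page-wide `html_nodes()` call is that the second selector is evaluated relative to `block`, so links from other countries are never returned.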

library(rvest)     # read_html(), html_nodes(), html_text(); also provides %>%
library(reshape2)
library(stringr)

url <- "https://www.booking.com/country.html"

bookingdata <- read_html(url)

#extracting the country
country <- html_nodes(bookingdata, "h2 > a") %>% 
  html_text()
write.csv(country, 'D:\\web scraping\\country.csv' ,row.names = FALSE)
print(country)

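#extracting the data inside the inner div
html_nodes(bookingdata, "div > div > div > ul > li > a") %>%
  html_text()
for (i in country) {
  print(i)
  # note: this selects every matching link on the page, not only those for country i
  accomodation <- html_nodes(bookingdata, "ul > li > a") %>%
    html_text()
  print(accomodation)
}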

#getting all the data
accomodation <- html_nodes(bookingdata, "ul > li > a") %>%
  html_text()

#separating the numbers
accomodation.num <- str_extract(accomodation, "[0-9]+")
#separating the characters ("[aA-zZ]" would also match punctuation between Z and a)
accomodation.char <- str_extract(accomodation, "[A-Za-z]+")
#separating unique characters
unique(accomodation.char)
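The number/label split above can lose information when a count contains a thousands separator or a property type has more than one word. A minimal base R sketch of a more tolerant split, assuming each link text has the form "count, then type"; the sample strings in `accommodation` are hypothetical and the real page's text may be formatted differently:

```r
# Hypothetical link texts; the format on the real page is an assumption.
accommodation <- c("1,234 apartments", "56 hostels", "7 holiday homes", "All hotels")

# Keep only the entries that start with a number.
has_count <- grepl("^[[:space:]]*[0-9]", accommodation)
entries <- accommodation[has_count]

# Take the leading run of digits and separators, strip the separators,
# and convert what is left to an integer.
count_text <- sub("^[[:space:]]*([0-9][0-9,.]*).*$", "\\1", entries)
counts <- as.integer(gsub("[^0-9]", "", count_text))

# Everything after the number is treated as the property type.
types <- trimws(sub("^[[:space:]]*[0-9][0-9,.]*[[:space:]]*", "", entries))

parsed <- data.frame(type = types, count = counts, stringsAsFactors = FALSE)
print(parsed)
```

Entries without a leading number, such as "All hotels" here, are dropped by the `has_count` filter and do not turn into `NA` rows.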
