Channel: Active questions tagged r - Stack Overflow

How to read json data from list of URLs and tidy it into a dataframe?

I am using https://ipstack.com to geocode IP addresses, and I'm having trouble geocoding all 1200 addresses in a short amount of time.

With R, I've collected the URLs into a list (e.g. http://api.ipstack.com/[IP address]?access_key=[access key]) and can use read_json to read the JSON data from a single URL. But I haven't been able to write a loop that extracts the data from every URL.

library(RCurl)
library(jsonlite)

x <- c("http://api.ipstack.com/178.140.119.217?access_key=[access_key]", "http://api.ipstack.com/68.37.21.125?access_key=[access_key]", "http://api.ipstack.com/68.10.255.89?access_key=[access_key]")

read_json(x)
Error in file(path) : invalid 'description' argument

I'm looking for a solution that reads multiple IP addresses and attaches the returned information to a dataframe.
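
The error above is what read_json gives when it is handed a whole character vector, since it expects a single path or URL. A minimal sketch of reading the URLs one at a time follows. The helper name read_each and its reader parameter are assumptions for illustration, not part of jsonlite; reader exists only so the function can be tried offline with a stand-in.

```r
# Hypothetical helper: call the reader once per URL and return a named list
# of parsed responses. The default reader is jsonlite::read_json.
read_each <- function(urls, reader = jsonlite::read_json) {
  responses <- lapply(urls, reader)  # one call per URL, never the whole vector
  names(responses) <- urls           # keep track of which response is which
  responses
}
```

With the real API this would be called as read_each(x), and each element of the result is the parsed list for one IP address.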

*Edit 1: Still stuck, but I'm making some progress with the loop:

library(RCurl)
library(jsonlite)

url_lst = as.character(df$URL)

output = NULL
for (i in url_lst) { 
  x = as.data.frame(read_json(i))
  output = rbind(output,x)
}

However, this results in an error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : arguments imply differing number of rows: 1, 0

Also, the code only produces 8 observations rather than 1200.
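
One plausible reading of the 8 observations is that the loop stops at the first URL whose response cannot be turned into a data frame, so output only holds the rows gathered before that point. The sketch below keeps going after a failure and records which URLs failed. The names collect_rows and row_fun are hypothetical, and row_fun stands for whatever turns one URL into a one-row data frame.

```r
# Hypothetical helper: apply row_fun to every URL, skip the ones that raise
# an error, and report them separately instead of aborting the whole loop.
collect_rows <- function(urls, row_fun) {
  rows <- list()
  failed <- character(0)
  for (u in urls) {
    result <- tryCatch(row_fun(u), error = function(e) NULL)
    if (is.null(result)) {
      failed <- c(failed, u)               # remember the URL that failed
    } else {
      rows[[length(rows) + 1]] <- result   # collect rows in a list
    }
  }
  # Bind once at the end rather than growing a data frame inside the loop
  list(data = do.call(rbind, rows), failed = failed)
}
```

With the loop from Edit 1 this could be called as collect_rows(url_lst, function(u) as.data.frame(read_json(u))), and the failed element then shows which responses need a closer look.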

*Edit 2: Bill Ash's answer got me further than I was, but it looks like some values in the JSON data are causing the code to fail.

Bill Ash's code:

library(httr)
library(tibble)
library(purrr)
library(jsonlite)

ip_addresses <- core_members$ip_address

# a simple function
ip_locate <- function(your_vector_of_ip_addresses, access_key) {

  ip <- your_vector_of_ip_addresses

  map_df(ip, ~{
    out <- httr::GET(url = paste0("http://api.ipstack.com/", .,
                                  "?access_key=", access_key))
    resp <- fromJSON(httr::content(out, "text"), flatten = TRUE)
    tibble::tibble(ip = resp$ip, 
                   country = resp$country_name, 
                   region = resp$region_name, 
                   city = resp$city, 
                   zip = resp$zip, 
                   lat = resp$latitude, 
                   lng = resp$longitude)

  })

}


ip_info <- ip_locate(your_vector_of_ip_addresses = ip_addresses, 
                     access_key = "[access_key]")

# output

ip_info %>% 
  head()

Where the error occurs:

ip_info <- ip_locate(your_vector_of_ip_addresses = ip_addresses, 
                     access_key = "[access_key]")

Error: All columns in a tibble must be 1d or 2d objects:
* Column `zip` is NULL
9. stop(cnd)
8. abort(error_column_must_be_vector(names_x[is_xd], classes))
7. check_valid_cols(x)
6. lst_to_tibble(xlq$output, .rows, .name_repair, lengths = xlq$lengths)
5. tibble::tibble(ip = resp$ip, country = resp$country_name, region = resp$region_name,
   city = resp$city, zip = resp$zip, lat = resp$latitude, lng = resp$longitude)
4. .f(.x[[i]], ...)
3. map(.x, .f, ...)
2. map_df(ip, ~{
   out <- httr::GET(url = paste0("http://api.ipstack.com/",
       ., "?access_key=", access_key))
   resp <- fromJSON(httr::content(out, "text"), flatten = TRUE) ...
1. ip_locate(your_vector_of_ip_addresses = ip_addresses, access_key = "[access_key]")
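
The message says that zip is NULL for at least one address, which is what fromJSON returns for a field that is null in the JSON. A minimal sketch of one way around it is to swap NULL for NA before building the row. The helper names na_if_null and response_to_row are hypothetical, and base data.frame is used here only to keep the example free of dependencies; the field names are the ones from Bill Ash's code.

```r
# Hypothetical helper: turn a NULL or zero-length value into NA so that
# every column of the row has exactly one element.
na_if_null <- function(x) if (is.null(x) || length(x) == 0) NA else x

# Build a one-row data frame from one parsed response
response_to_row <- function(resp) {
  data.frame(
    ip      = na_if_null(resp$ip),
    country = na_if_null(resp$country_name),
    region  = na_if_null(resp$region_name),
    city    = na_if_null(resp$city),
    zip     = na_if_null(resp$zip),
    lat     = na_if_null(resp$latitude),
    lng     = na_if_null(resp$longitude),
    stringsAsFactors = FALSE
  )
}
```

The same idea should carry over to the tibble version by writing zip = na_if_null(resp$zip) and likewise for the other columns, so that a missing value becomes NA in that row instead of stopping the whole run.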

Because I only need the coordinates from these IP addresses, I consider this resolved for my purposes. Hopefully, someone will be willing to continue advising on this issue, but I won't be updating this any further.