Channel: Active questions tagged r - Stack Overflow

What is the best way to validate international geographic data?


I have a dataset of ~600 observations that contains geographic data. The data are international: an address may be in the United States or in any other country in the world.

While I am confident in the address and city data, I am not confident in my country data or my state data (states are only relevant for addresses located in the United States). What I am wondering is this:

What is the best way for me to identify the country and, subsequently, the state for each address in my dataset?

My initial thought is to do this:

  1. In R (specifically, using RStudio), use the ggmap package to geocode only the address and city information, preserving the address+city strings.
  2. Reverse geocode the resulting lat/long pairs to get the state and country, and build state+country strings. Concatenate those with my original address+city strings to get all the data I want for each observation.
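
To make step 2 concrete, here is a minimal sketch of how the country and state could be pulled out of a reverse geocoding result. It assumes the Google backend of ggmap, where `revgeocode(..., output = "all")` returns the parsed API response and each entry of `address_components` carries `long_name`, `short_name` and `types`. The helper names `extract_component` and `lookup_country_state` are hypothetical, and the `mock` list is hand-written for illustration, not real API output.

```r
# Pull one component (e.g. "country") out of a Google-style address_components list.
# Each element is assumed to look like list(long_name = , short_name = , types = ).
extract_component <- function(components, type, field = "long_name") {
  for (comp in components) {
    if (type %in% unlist(comp$types)) {
      return(comp[[field]])
    }
  }
  NA_character_  # no component of that type was found
}

# Hypothetical wrapper: reverse geocode one lon/lat pair, return country and state.
# Needs ggmap and a registered Google API key (ggmap::register_google()).
lookup_country_state <- function(lon, lat) {
  res <- ggmap::revgeocode(c(lon = lon, lat = lat), output = "all")
  if (length(res$results) == 0) {
    return(data.frame(country = NA_character_, state = NA_character_,
                      stringsAsFactors = FALSE))
  }
  comps <- res$results[[1]]$address_components
  data.frame(
    country = extract_component(comps, "country"),
    state   = extract_component(comps, "administrative_area_level_1"),
    stringsAsFactors = FALSE
  )
}

# Offline check with a hand-written component list (no API call involved)
mock <- list(
  list(long_name = "Austin", short_name = "Austin",
       types = list("locality", "political")),
  list(long_name = "Texas", short_name = "TX",
       types = list("administrative_area_level_1", "political")),
  list(long_name = "United States", short_name = "US",
       types = list("country", "political"))
)
country <- extract_component(mock, "country")
state   <- extract_component(mock, "administrative_area_level_1", field = "short_name")
```

Outside the United States, `administrative_area_level_1` would hold a province or region rather than a state, so it may make sense to keep it only when the country comes back as the United States.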

Here is the code that I have tried to run so far:

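library(readr)  # read_csv()
library(dplyr)  # %>% and mutate()
library(ggmap)  # mutate_geocode()

addresses <- read_csv("~data/data_addresses_MMDDYYYY.csv")
addresses_concat <- addresses %>%
  mutate(location = paste0(ORIG_ADDRESS1, ", ", ORIG_CITY))
location <- addresses_concat$location
geo <- mutate_geocode(location)  # fails with the error quoted below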

This was only meant as a rough first pass to see at least some data, but when I ran geo <- mutate_geocode(location) I got the following error message: "Error in data[[deparse(substitute(location))]] : subscript out of bounds". My guess is that this is due to the various languages and character encodings in the strings contained in location.
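
One thing worth checking before blaming encodings: the expression in the error message, `data[[deparse(substitute(location))]]`, looks like a column lookup on a data frame, and `mutate_geocode()` is documented as taking a data frame plus a column name. The sketch below uses a made-up two-row stand-in for `addresses_concat` to show that the same kind of lookup fails on a bare character vector but works on the data frame. The final call is left commented out because it needs ggmap and a registered API key.

```r
# Two-row stand-in for addresses_concat (made-up values, not the real data)
addresses_concat <- data.frame(
  ORIG_ADDRESS1 = c("1600 Pennsylvania Ave NW", "10 Downing St"),
  ORIG_CITY     = c("Washington", "London"),
  stringsAsFactors = FALSE
)
addresses_concat$location <- paste0(
  addresses_concat$ORIG_ADDRESS1, ", ", addresses_concat$ORIG_CITY
)

# Mimic the lookup from the error message on a bare character vector,
# capturing the error text instead of stopping
location <- addresses_concat$location
msg <- tryCatch(location[[""]], error = function(e) conditionMessage(e))
msg

# The same style of lookup works when given the data frame and a column name
looked_up <- addresses_concat[["location"]]

# So the geocoding call would presumably become (not run here):
# geo <- ggmap::mutate_geocode(addresses_concat, location)
```

If that is the cause, the strings themselves may be fine and the result would be the original data frame with `lon` and `lat` columns appended.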

While I know I can go into location and fix the language/encoding issues, I first want to see if there are better ways to get at what I'm trying to do. I would appreciate it if anyone can put me on the right track. Thanks!
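
For the encoding cleanup mentioned above, here is a rough base R sketch of how problem strings could be found and converted. The two example strings are made up, and the assumption that the invalid entries are Latin-1 is only an illustration; the real source encoding would have to be checked.

```r
# Made-up examples: one plain ASCII string, one string stored as Latin-1 bytes
good <- "10 Downing St, London"
bad  <- rawToChar(as.raw(c(0x4d, 0xfc, 0x6e, 0x63, 0x68, 0x65, 0x6e)))  # "München" in Latin-1
location <- c(good, bad)

# Flag entries that are not valid UTF-8
invalid <- !validUTF8(location)

# Assumption: the invalid entries are Latin-1, so convert only those
location[invalid] <- iconv(location[invalid], from = "latin1", to = "UTF-8")
```

Converting only the flagged entries leaves strings that are already valid UTF-8 untouched.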

