I am trying to match the values of a set of points at fixed locations across multiple years to their nearest nodes that are also at fixed points. To do this I have chosen to use a Voronoi tesselation where the score of each point is associated entirely with the nearest node. I am using the sf package which I don't understand very well. I have made two ordering assumptions, these assumptions appear to be incorrect as the results are not as expected. The assumptions are shown on line 37 and 38 of the reprex.
In the reprex below I set the coordinates such that I can calculate the node scores using either the Voronoi method or a simple join. As can be seen, the two values are not identical
My question is: How do I correctly match my points to the correct cell in the Voronoi tesselation?
library(sf); library(dplyr); library(tibble)
#Create the data set of node locations
set.seed(256)
node_locations <- expand.grid(x = 1:10, y = 1:10) %>%
as_tibble() %>%
mutate(Node = expand.grid(x = LETTERS[1:10], y = LETTERS[1:10], stringsAsFactors = FALSE) %>%
{ paste0(.$x,.$y)})
#create the score spread across the geographical area of the points for multiple years
score_by_year <- expand.grid(x = 1:10, y = 1:10, year = 2001:2010) %>% as_tibble %>%
mutate(score = sample(1:1000, size = nrow(.), replace = TRUE),
ID = 1:n())
#convert to spatial data
node_locations_sf <- node_locations %>%
st_as_sf(., coords = c("x", "y"))
#add in the correct projection data to align with the map
st_crs(node_locations_sf) <- "+proj=longlat +datum=WGS84 +no_defs"
#convert to spatial data
score_by_year_sf <- score_by_year %>%
st_as_sf(., coords = c("x", "y"))
#add in the correct projection data to align with the map
st_crs(score_by_year_sf) <- "+proj=longlat +datum=WGS84 +no_defs"
#create voronoi tesselation
node_v <- node_locations_sf %>% st_union() %>% st_voronoi()
#this is what it looks like
plot(node_v, col = "0")
#find which scores are associated with the nodes
voronoi_intersection <- st_intersects(st_cast(node_v), score_by_year_sf, sparse = FALSE)
#create a dictionary to match nodes to score IDs
Node_data_dictionary <- voronoi_intersection %>% as_tibble(.) %>%
bind_cols(node_locations %>% select(Node)) %>% #I assume that the row order is the same as the node_locations df
set_names(c(score_by_year$ID, "Node")) %>% #I assume that the columns are the same order as the date_by_year df
gather(key = ID, value = value, -Node) %>%
filter(value) %>% #remove values that that show a point is NOT within a cell, this is the majority of values
select(-value) %>%
mutate(ID = as.integer(ID))
#join scores to nodes
Node_score_year <- left_join(Node_data_dictionary, score_by_year)
#create df of the sum of scores across all years for the voronoi matched df
score_across_years_voronoi <- Node_score_year %>%
group_by(Node) %>%
summarise(score = sum(score),
counts = n())
#create df of sum of scores just by joining the original two dfs together
score_across_years_join <- left_join(node_locations,score_by_year) %>%
group_by(Node) %>%
summarise(score = sum(score),
counts = n())
#Calculating the score using the two different methods does not produce the same result
score_diffs <- left_join(
score_across_years_voronoi %>% select(Node, score_voronoi = score),
score_across_years_join %>% select(Node, score_join = score)
) %>%
mutate(diffs = score_voronoi-score_join)