I need to extract information from several shapefiles for ~4 million grid cells of 1 ha each. Currently I am running st_crop on each layer in a for-loop over all cells, but this takes forever. I thought I could speed things up by using a data.table (DT) style approach to crop the shapefiles by coordinates. Consider the example below, where I am looking for the total length of polygon edges within an area of interest:
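For context, the per-cell loop I am currently running looks roughly like the sketch below ('grid' and 'layer' are placeholder names, not my actual objects):
res <- vector("list", nrow(grid))
for (i in seq_len(nrow(grid))) {
  cell_bb <- st_bbox(grid[i, ])        # bounding box of the i-th 1-ha cell
  res[[i]] <- st_crop(layer, cell_bb)  # crop the layer to that cell
}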
require(sf)
require(data.table)
require(ggplot2)
require(tidyverse)
# load shapefile
nc = st_read(system.file("shape/nc.shp", package="sf"))
# Define a bounding box that mimics a moving window or area of interest
bb <- st_bbox(c(xmin = -79, xmax = -78, ymin = 34.5, ymax = 35.5))
# Convert 'nc' into a data.table for fast subsetting, while preserving the object's integrity (i.e. the same id for all points of a given polygon)
nobs <- mapview::npts(nc, by_feature = TRUE)
NC <- data.table::data.table(id = rep(1:nrow(nc), nobs), st_coordinates(nc)[, 1:2])
head(NC)
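Side note: st_coordinates() also returns structural indices (L1, L2, L3 for MULTIPOLYGON layers such as nc), which could be kept instead of rebuilding an id from npts(); a possible variant:
# Keep the indices returned by st_coordinates(): for MULTIPOLYGON layers these
# are L1 (ring), L2 (polygon part) and L3 (feature)
NC2 <- data.table::data.table(st_coordinates(nc))
head(NC2)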
# Compare the two cropping methods
library(microbenchmark)
test <- microbenchmark(
  crop_nc <- st_crop(nc, bb),
  crop_NC <- NC[X >= bb[1] & X < bb[3] & Y >= bb[2] & Y < bb[4]]
)
print(test)
Unit: microseconds
expr min lq mean median uq max neval cld
crop_nc 5205.051 5675.807 6837.9472 5903.219 6829.0865 16046.654 100 b
crop_NC 405.334 528.356 624.8398 576.996 656.9245 1295.361 100 a
There were 50 or more warnings (use warnings() to see the first 50)
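(The warnings are presumably just the usual st_crop() message that attribute variables are assumed to be spatially constant, repeated once per benchmark iteration. If they instead come from the bounding box lacking a CRS, the CRS of nc can be attached directly; a small sketch:)
bb <- st_bbox(c(xmin = -79, xmax = -78, ymin = 34.5, ymax = 35.5), crs = st_crs(nc))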
As expected, the DT-style subsetting is faster. Let's now go from our DT object back to an sf object as follows:
crop_NC_sf <- st_as_sf(crop_NC, coords = c("X", "Y"), crs = st_crs(nc)) %>%
  group_by(id) %>%              # one group per original feature
  summarise(i = mean(id)) %>%   # unions the points of each group into a MULTIPOINT
  st_cast("POLYGON")            # cast each MULTIPOINT back to a POLYGON
Now compare the perimeter of the polygons included in our study area:
sum(st_length(crop_nc),na.rm=T)
1307555 [m]
sum(st_length(crop_NC_sf),na.rm=T)
2610959 [m]
Obviously this is not working very well: the points of each group are presumably unioned into a MULTIPOINT and then cast to a POLYGON, so the original ring structure and vertex order are lost.
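For reference, a quick visual check (a sketch using the already-loaded ggplot2) makes the mismatch obvious:
# Plot the st_crop() result (grey) against the rebuilt polygons (red outline)
ggplot() +
  geom_sf(data = crop_nc, fill = "grey80") +
  geom_sf(data = crop_NC_sf, fill = NA, colour = "red") +
  theme_minimal()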
Questions:
Is there another way to speed up st_crop()?
Is there a way to recreate a polygon from points while preserving the original order in which the points are connected to each other?