Channel: Active questions tagged r - Stack Overflow

Method(s) to speed up st_crop (sf package) on large datasets


I need to extract information from several shapefiles for ~4 million grid cells of 1 ha each. Currently I call st_crop() on each layer inside a for-loop over all cells, but this takes far too long. My idea was to speed things up by cropping the shapefiles by coordinates in a 'data.table' (DT) style. Consider the example below, where I look at the total length of polygon edges inside an area of interest:

library(sf)
library(data.table)
library(ggplot2)
library(tidyverse)

# Load the North Carolina example shapefile shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"))


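# Define a bounding box that mimics a moving window or area of interest
# (given the same CRS as nc so both objects are in the same coordinate system)
bb <- st_bbox(c(xmin = -79, xmax = -78, ymin = 34.5, ymax = 35.5), crs = st_crs(nc))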


# Convert 'nc' into a data.table of vertex coordinates for fast subsetting, keeping
# the object's integrity (i.e. the same id for all vertices of a given polygon)
nobs <- mapview::npts(nc, by_feature = TRUE)
NC <- data.table::data.table(id = rep(seq_len(nrow(nc)), nobs), st_coordinates(nc)[, 1:2])
head(NC)

# Compare the two cropping methods
library(microbenchmark)
test <- microbenchmark(
  crop_nc <- st_crop(nc, bb),
  crop_NC <- NC[X >= bb["xmin"] & X < bb["xmax"] & Y >= bb["ymin"] & Y < bb["ymax"]]
)

print(test)
Unit: microseconds
  expr      min       lq      mean   median        uq       max neval cld
 crop_nc  5205.051 5675.807 6837.9472 5903.219 6829.0865 16046.654   100   b
 crop_NC   405.334  528.356  624.8398  576.996  656.9245  1295.361   100  a 
There were 50 or more warnings (use warnings() to see the first 50)

As expected, the DT-style subset is faster (median of about 0.58 ms versus 5.9 ms, roughly ten times). Now let's convert the DT object back to an sf object as follows:

crop_NC_sf <- st_as_sf(crop_NC, coords = c("X", "Y"), crs = st_crs(nc)) %>%
  group_by(id) %>%
  summarise(i = mean(id)) %>%
  st_cast("POLYGON")
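By default, sf's summarise() method unions the points in each group, and a union is not guaranteed to keep the original vertex order. A minimal sketch of an order-preserving rebuild, assuming sf's summarise() method and its do_union argument, could look like the block below. It uses the full nc layer (not the cropped subset) so the round trip is easy to check, and it takes ring/part/feature indices from st_coordinates() instead of mapview::npts(); the ring_id column is a hypothetical helper introduced here. Note that this only addresses vertex order: polygons rebuilt from a vertex subset still won't match st_crop(), because no new vertices are created where edges cross the window.

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# For MULTIPOLYGON input, st_coordinates() returns X, Y plus L1 (ring),
# L2 (polygon part) and L3 (feature) indices, with vertices in stored order
xy <- as.data.frame(st_coordinates(nc))

# Hypothetical helper: one running id per ring (rows of a ring are contiguous)
xy$ring_id <- cumsum(!duplicated(xy[, c("L1", "L2", "L3")]))

rings <- xy %>%
  st_as_sf(coords = c("X", "Y"), crs = st_crs(nc)) %>%
  group_by(ring_id) %>%
  # do_union = FALSE combines the points with st_combine() instead of a union,
  # so each MULTIPOINT keeps the original vertex order
  summarise(do_union = FALSE) %>%
  # each ordered MULTIPOINT becomes one POLYGON ring
  st_cast("POLYGON")
```

Grouping by a ring id rather than the feature id avoids stitching separate parts of a multipolygon (or a hole and its outer ring) into one ring.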

Now compare the perimeter of the polygons included in the study area:

sum(st_length(crop_nc),na.rm=T)
1307555 [m]

sum(st_length(crop_NC_sf),na.rm=T)
2610959 [m]

The second total is roughly twice the first, so this approach clearly does not work.
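One reason the totals disagree is that filtering vertices by a bounding box is not the same as clipping: the subset creates no vertices where edges cross the window, and reconnecting the survivors draws chords that are not part of the original boundary. The toy example below is a made-up square and window in base R (not the NC data) that shows the effect on a single ring.

```r
# Length of a path through the given vertices, in order
path_length <- function(x, y) sum(sqrt(diff(x)^2 + diff(y)^2))

# Closed square ring with corners (0,0) and (4,4), one vertex per unit of length
ring <- data.frame(
  X = c(0:4, rep(4, 3), 4:0, rep(0, 3), 0),
  Y = c(rep(0, 5), 1:3, rep(4, 5), 3:1, 0)
)
full_ring <- path_length(ring$X, ring$Y)       # 16

# Same half-open test as the data.table subset, window [0, 2.5) x [0, 2.5)
kept <- ring[ring$X >= 0 & ring$X < 2.5 & ring$Y >= 0 & ring$Y < 2.5, ]

# Reconnecting the surviving vertices jumps straight from (2,0) to (0,2)
vertex_subset <- path_length(kept$X, kept$Y)   # 4 + sqrt(8), about 6.83

# The true clip of the square to the window is the square [0, 2.5]^2
true_clip <- 4 * 2.5                           # 10
```

The vertex subset misses the crossing points (2.5, 0) and (0, 2.5) and replaces the boundary path with a diagonal, so its length matches neither the clipped polygon nor the original ring.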

[Figure: Result]

Questions:

  • Is there another way to speed up st_crop()?

  • Is there a way to recreate a polygon from points while preserving the original order in which the points are connected to each other?
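Instead of calling st_crop() once per cell in a loop, one option is to intersect the whole grid with a layer in a single st_intersection() call, which returns one row per (cell, polygon) piece with both sets of attributes. The sketch below is not benchmarked; the 20 km cell size, the projected CRS EPSG:32119 (NAD83 / North Carolina) and the names grid, cells and pieces are assumptions chosen for illustration. A real grid of ~4 million 1-ha cells may need to be processed in chunks to fit in memory.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Work in a projected CRS so cell sizes are in metres
# (EPSG:32119 is an illustrative choice for North Carolina)
nc_m <- st_transform(nc, 32119)

# Hypothetical stand-in for the real 1-ha grid: 20 km square cells over the state
grid <- st_make_grid(nc_m, cellsize = 20000)
cells <- st_sf(cell_id = seq_along(grid), geometry = grid)

# One vectorised call clips every county against every cell it overlaps;
# each output row is one (cell, county) piece carrying both attribute sets
pieces <- st_intersection(cells, nc_m[, "NAME"])

# Example per-cell summary: county area falling inside each cell, in m^2
pieces$area_m2 <- as.numeric(st_area(pieces))
area_by_cell <- tapply(pieces$area_m2, pieces$cell_id, sum)
head(area_by_cell)
```

Because the grid cells partition the bounding box, the piece areas should add up to the total county area, which is a cheap sanity check before scaling up.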

