This is a problem I'm just beginning to approach as part of a large piece of spatial data analysis on the effectiveness of protected conservation areas, so I have no data or code yet.
I am comfortable with a good portion of my methodology, but as a relative newcomer to R there is one step I have no idea how to script, and I'm looking for any advice that points me in the right direction.
I will have a large data set of spatial point data, brought in from ArcGIS (using the Geospatial Modelling Environment), consisting of 9 million+ data points. Each point will have values for 5-7 variables, plus values of the Mahalanobis distance between it and the other data points for each variable (to be calculated using the 'vegan' package in R). Some of these pixels will be from protected areas and the rest from unprotected areas.
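For reference, the Mahalanobis distance between covariate vectors x and y is sqrt((x - y)^T S^-1 (x - y)), where S is the covariance matrix of the covariates. The minimal sketch below computes it in plain Python (standard library only, so the arithmetic is visible); the function names are hypothetical and it assumes S is non-singular. One caution for the planned workflow: a full pairwise distance matrix for 9 million points would hold roughly 4 x 10^13 distances, so in practice distances would be computed per focal pixel rather than stored for every pair. In R, stats::mahalanobis(x, center, cov) returns the squared distance from one centre to every row of x, which fits that per-focal-pixel pattern.

```python
"""Mahalanobis distance between pixels, sketched in pure Python (hypothetical helpers)."""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def covariance(rows: List[Vector]) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of the covariate columns."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (assumes m is non-singular)."""
    p = len(m)
    # Augment m with the identity matrix: [m | I]
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [rv - factor * cv for rv, cv in zip(a[r], a[col])]
    # The right-hand half now holds m^-1
    return [row[p:] for row in a]


def mahalanobis(x: Vector, y: Vector, inv_cov: Matrix) -> float:
    """sqrt((x - y)^T S^-1 (x - y)) for covariate vectors x and y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    p = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(p) for j in range(p)) ** 0.5
```

With an identity covariance matrix the distance reduces to ordinary Euclidean distance, which is a handy sanity check.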
My goal is, for every pixel in a protected area (the focal pixel), to identify its 500 nearest neighbours across all of the variables, drawing on both protected and unprotected pixels, to form a 'similarity set' for each one.
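The similarity-set step is a k-nearest-neighbour search. A brute-force sketch, again in plain Python and assuming the inverse covariance matrix has already been computed, ranks every other pixel (protected or not) by Mahalanobis distance to each protected focal pixel and keeps the k closest. The Pixel record and its field names are hypothetical, and excluding the focal pixel from its own set is an assumption. This costs one distance per (focal pixel, pixel) pair, which is why it becomes impractical at 9 million points.

```python
"""Brute-force 'similarity sets': k nearest pixels for every protected pixel."""
import heapq
from typing import Dict, List, NamedTuple, Sequence


class Pixel(NamedTuple):
    pid: int                 # hypothetical pixel identifier
    protected: bool          # True if the pixel lies in a protected area
    covariates: List[float]  # the 5-7 covariate values


def mahalanobis(x: Sequence[float], y: Sequence[float], inv_cov) -> float:
    """sqrt((x - y)^T S^-1 (x - y)), given a precomputed inverse covariance matrix."""
    d = [a - b for a, b in zip(x, y)]
    p = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(p) for j in range(p)) ** 0.5


def similarity_sets(pixels: List[Pixel], inv_cov, k: int = 500) -> Dict[int, List[int]]:
    """Map each protected pixel id to the ids of its k nearest pixels (any status)."""
    result: Dict[int, List[int]] = {}
    for focal in pixels:
        if not focal.protected:
            continue  # only protected pixels get a similarity set
        candidates = (p for p in pixels if p.pid != focal.pid)  # exclude the focal pixel
        # nsmallest keeps the k candidates with the smallest distance, nearest first
        nearest = heapq.nsmallest(
            k, candidates,
            key=lambda p: mahalanobis(focal.covariates, p.covariates, inv_cov))
        result[focal.pid] = [p.pid for p in nearest]
    return result
```

Because Mahalanobis distance equals ordinary Euclidean distance after the covariates are linearly transformed by a Cholesky factor of the inverse covariance matrix, another option may be to transform the data once and pass it to a general-purpose nearest-neighbour package; in R, RANN::nn2 and FNN::get.knnx both take a data matrix, a query matrix and k.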
To speed up this step, a methodology I am adapting recommends first establishing a boundary of similarity around each focal pixel, so that only the pixels falling within it, rather than the entire dataset, need to be tested as candidates for the 500 nearest neighbours. In practical terms, this means filtering the whole dataset down to a subset of data points (pixels) that fall within a range of values for each covariate, and then extracting the 500 closest matches from among these.
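The boundary-of-similarity pre-filter can be sketched the same way: for each focal pixel, keep only pixels whose covariates all fall within plus or minus a half-width of the focal values, then rank just those by Mahalanobis distance. The half-widths, the doubling rule used when the box holds fewer than k pixels, and all names are hypothetical choices, not part of the methodology described above. So that the filter is actually cheaper than a full scan, the sketch sorts the pixels once by the first covariate and uses binary search (bisect) to jump to the slice inside the box; only that slice is checked against the other covariates. The result matches the brute-force answer only if the box is wide enough to contain all of the true k nearest neighbours.

```python
"""Similarity sets with a per-covariate 'boundary of similarity' pre-filter."""
import bisect
import heapq
from typing import Dict, List, NamedTuple, Sequence


class Pixel(NamedTuple):
    pid: int                 # hypothetical pixel identifier
    protected: bool          # True if the pixel lies in a protected area
    covariates: List[float]  # the 5-7 covariate values


def mahalanobis(x: Sequence[float], y: Sequence[float], inv_cov) -> float:
    d = [a - b for a, b in zip(x, y)]
    p = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(p) for j in range(p)) ** 0.5


def in_box(focal: Pixel, other: Pixel, half_widths: Sequence[float]) -> bool:
    """True if every covariate of `other` is within +/- half_width of the focal value."""
    return all(abs(o - f) <= h
               for f, o, h in zip(focal.covariates, other.covariates, half_widths))


def similarity_sets_boxed(pixels: List[Pixel], inv_cov, half_widths: Sequence[float],
                          k: int = 500, max_widenings: int = 10) -> Dict[int, List[int]]:
    """Map each protected pixel id to its k nearest pixels, searching only inside a box."""
    order = sorted(pixels, key=lambda p: p.covariates[0])  # sort once by covariate 0
    keys = [p.covariates[0] for p in order]
    needed = min(k, len(pixels) - 1)  # cannot find more neighbours than other pixels
    result: Dict[int, List[int]] = {}
    for focal in pixels:
        if not focal.protected:
            continue
        widths = list(half_widths)
        for _ in range(max_widenings + 1):
            f0, w0 = focal.covariates[0], widths[0]
            lo = bisect.bisect_left(keys, f0 - w0)   # binary search for the slice of
            hi = bisect.bisect_right(keys, f0 + w0)  # pixels inside the box on covariate 0
            candidates = [p for p in order[lo:hi]
                          if p.pid != focal.pid and in_box(focal, p, widths)]
            if len(candidates) >= needed:
                break
            widths = [2 * w for w in widths]  # too few candidates: double every half-width
        # If the box is still too small after max_widenings, use whatever was found
        nearest = heapq.nsmallest(
            k, candidates,
            key=lambda p: mahalanobis(focal.covariates, p.covariates, inv_cov))
        result[focal.pid] = [p.pid for p in nearest]
    return result
```

Starting with narrow half-widths keeps each candidate set small, at the cost of a few extra passes for focal pixels in sparse regions of covariate space.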
Right now I don't know how to even begin writing a script to do this, so any help at all, even pointers to similar examples, would be very useful.
Thanks, Ben.