I have a data frame with 3 columns:
OBJECTID: the unique identifier of a polygon (one row per polygon)
AvgWTRisk: probability (0-1) of a disturbance in a forest; the highest value is ~0.11
HA: area of a polygon in the forest
I want to develop a function to create a random sample from the data frame, based on the probability value. Here's an example of the data structure:
data
OBJECTID AvgWTRisk HA
32697 32697 0.0008456 7.7465000
36480 36480 0.0050852 7.9329797
13805 13805 0.0173463 0.7154995
38796 38796 0.0026580 0.2882192
8494 8494 0.0089310 6.4686595
23609 23609 0.0090647 6.1246000
dput() output:
structure(list(OBJECTID = c(32697L, 36480L, 13805L, 38796L, 8494L,
23609L), AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.002658,
0.008931, 0.0090647), HA = c(7.7465, 7.9329797, 0.7154995, 0.2882192,
6.4686595, 6.1246)), row.names = c(32697L, 36480L, 13805L, 38796L,
8494L, 23609L), class = "data.frame")
I am attempting to do this using the sample() function in R.
Is there a way to use a total area as my 'size = ' target rather than a number of rows, like this:
Landscape_WTDisturbed <- Landscape_WTRisk[sample(1:nrow(Landscape_WTRisk),
                                                 size = sum(HA >= 100*0.95 && HA <= 100*1.05),
                                                 prob = WTProb, replace = FALSE),]
where WTProb is a vector of AvgWTRisk, i.e. 'WTProb <- as.vector(Landscape_WTRisk$AvgWTRisk)', and HA is the area column from the data frame.
The call above returns a data frame with all of the columns but no rows, as opposed to:
Landscape_WTDisturbed <- Landscape_WTRisk[sample(1:nrow(Landscape_WTRisk),
                                                 size = 10,
                                                 prob = WTProb, replace = FALSE),]
This works and returns a sample of 10 rows, but I have no control over the total area being selected.
Should I try to achieve this with a while loop, where the summed area of the selected rows is the criterion, and small batches of rows are added incrementally until the target is reached?
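To make the incremental idea concrete, here is a minimal sketch of what I have in mind. It is not a tested solution. The function name sample_to_area and its arguments target_ha, prob_col and area_col are hypothetical names of my own, not existing functions. It assumes every AvgWTRisk value is positive, and it replaces an explicit while loop with one weighted ordering of all rows plus a running sum of area, which should be equivalent to drawing rows one at a time without replacement.

```r
# Example data from the dput() output in the question
Landscape_WTRisk <- structure(list(
  OBJECTID  = c(32697L, 36480L, 13805L, 38796L, 8494L, 23609L),
  AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.002658, 0.008931, 0.0090647),
  HA        = c(7.7465, 7.9329797, 0.7154995, 0.2882192, 6.4686595, 6.1246)),
  row.names = c(32697L, 36480L, 13805L, 38796L, 8494L, 23609L),
  class = "data.frame")

# Hypothetical helper (my own name): keep drawing rows, weighted by prob_col,
# until the cumulative area in area_col first reaches target_ha.
sample_to_area <- function(df, target_ha,
                           prob_col = "AvgWTRisk", area_col = "HA") {
  # Weighted random ordering of all row positions, without replacement
  ord <- sample(seq_len(nrow(df)), size = nrow(df),
                prob = df[[prob_col]], replace = FALSE)
  # Running total of area in the order the rows were drawn
  cum_area <- cumsum(df[[area_col]][ord])
  # Position of the first draw at which the running total meets the target
  n_keep <- which(cum_area >= target_ha)[1]
  # If the target is never reached, keep every row
  if (is.na(n_keep)) n_keep <- length(ord)
  df[ord[seq_len(n_keep)], , drop = FALSE]
}

set.seed(1)  # only so the draw is reproducible
Landscape_WTDisturbed <- sample_to_area(Landscape_WTRisk, target_ha = 10)
sum(Landscape_WTDisturbed$HA)  # total area selected
```

Because the last polygon added can overshoot the target, this sketch does not guarantee a total within 95-105% of the target; one option might be to redraw until the total falls inside that range.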
Thank you in advance!