Channel: Active questions tagged r - Stack Overflow

R function/method to sample a data frame by probability until a condition is reached


I have a data frame with 3 columns:

OBJECTID: the unique identifier of a polygon (one row per polygon)
AvgWTRisk: the probability (0 to 1) of a disturbance in the forest; the highest value is about 0.11
HA: the area of the polygon

I want to write a function that draws a random sample of rows from the data frame, weighted by the AvgWTRisk probability. Here's an example of the data structure:

data

      OBJECTID AvgWTRisk        HA
32697    32697 0.0008456 7.7465000
36480    36480 0.0050852 7.9329797
13805    13805 0.0173463 0.7154995
38796    38796 0.0026580 0.2882192
8494      8494 0.0089310 6.4686595
23609    23609 0.0090647 6.1246000

dput() output:

structure(list(OBJECTID = c(32697L, 36480L, 13805L, 38796L, 8494L, 
23609L), AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.002658, 
0.008931, 0.0090647), HA = c(7.7465, 7.9329797, 0.7154995, 0.2882192, 
6.4686595, 6.1246)), row.names = c(32697L, 36480L, 13805L, 38796L, 
8494L, 23609L), class = "data.frame")

I am attempting to do this using the sample() function in R.

Is there a way to use a target total area, rather than a number of rows, as the 'size = ' argument? For example:

Landscape_WTDisturbed <- Landscape_WTRisk[sample(1:nrow(Landscape_WTRisk),
                                                 size = sum(HA >= 100*0.95 && HA <= 100*1.05),
                                                 prob = WTProb, replace = FALSE),]

where WTProb is a vector of the AvgWTRisk values, i.e. 'WTProb <- as.vector(Landscape_WTRisk$AvgWTRisk)', and HA is the area column of the data frame. The intended target is a total area within 5% of 100.

The selection above returns a data frame with all of the columns but no rows.
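A likely reason for the empty result, sketched on the example data: sum() of a logical vector counts its TRUE values, so `size` ends up as the number of rows whose individual area lies between 95 and 105, not a total area. No polygon in the example is that large, so `size` is 0. The sketch uses `&` because `&&` expects length-one operands (older R versions silently used only the first element; recent versions signal an error), and it assumes `data` is the example data frame from the dput() output.

```r
# Example data, copied from the dput() output above
data <- structure(list(OBJECTID = c(32697L, 36480L, 13805L, 38796L, 8494L,
23609L), AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.002658,
0.008931, 0.0090647), HA = c(7.7465, 7.9329797, 0.7154995, 0.2882192,
6.4686595, 6.1246)), row.names = c(32697L, 36480L, 13805L, 38796L,
8494L, 23609L), class = "data.frame")

HA <- data$HA

# sum() of a logical vector counts the TRUE values: this is a number of
# rows whose *individual* area is between 95 and 105, not a total area
n_rows <- sum(HA >= 100 * 0.95 & HA <= 100 * 1.05)
print(n_rows)  # 0: no single polygon is that large

# With size = 0, sample() returns integer(0) ...
idx <- sample(seq_len(nrow(data)), size = n_rows, prob = data$AvgWTRisk)

# ... and indexing a data frame by integer(0) keeps every column, no rows
empty <- data[idx, ]
print(dim(empty))  # 0 3
```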

As opposed to:

Landscape_WTDisturbed <- Landscape_WTRisk[sample(1:nrow(Landscape_WTRisk),
                                                 size = 10,
                                                 prob = WTProb, replace = FALSE),]

This works and returns a sample of 10 rows, but it gives me no control over the total area selected.
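One way to put the area in control, sketched under some assumptions: because `size =` can only be a row count, the sketch draws a weighted ordering of all rows with sample.int() (without replacement, so each position is filled in proportion to the weights of the rows not yet drawn), then uses cumsum() to cut that ordering at the first row where the running area reaches the target. `sample_until_area()` and its arguments are hypothetical names, not part of base R; the column names `AvgWTRisk` and `HA` come from the question, and the target of 15 is chosen only because the example rows total about 29.

```r
# Example data, copied from the dput() output above
data <- structure(list(OBJECTID = c(32697L, 36480L, 13805L, 38796L, 8494L,
23609L), AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.002658,
0.008931, 0.0090647), HA = c(7.7465, 7.9329797, 0.7154995, 0.2882192,
6.4686595, 6.1246)), row.names = c(32697L, 36480L, 13805L, 38796L,
8494L, 23609L), class = "data.frame")

# Hypothetical helper: draw rows without replacement, weighted by prob_col,
# until the summed area_col reaches target_area
sample_until_area <- function(df, target_area,
                              prob_col = "AvgWTRisk", area_col = "HA") {
  p <- df[[prob_col]]
  # Only rows with a positive probability can be drawn
  cand <- which(p > 0)
  stopifnot(length(cand) > 0)
  # Weighted random ordering of all candidate rows; sample.int() avoids
  # sample()'s special case when given a single number
  ord <- cand[sample.int(length(cand), size = length(cand), prob = p[cand])]
  cum_area <- cumsum(df[[area_col]][ord])
  # First position where the running total reaches the target; if the
  # candidates' total area is below the target, keep all of them
  stop_at <- which(cum_area >= target_area)[1]
  if (is.na(stop_at)) stop_at <- length(ord)
  df[ord[seq_len(stop_at)], , drop = FALSE]
}

set.seed(1)
disturbed <- sample_until_area(data, target_area = 15)
print(disturbed)
print(sum(disturbed$HA))
```

The selected area overshoots the target by less than the area of the last polygon drawn, so with a ±5% tolerance around 100 it may still be worth checking the total and redrawing if it falls outside the band.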

Should I do this with a while loop instead, where the criterion is the summed area of the selected rows, and rows are added one (or a few) at a time until the target is reached?
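A minimal sketch of the while-loop idea, assuming rows are added one at a time: each iteration draws one of the remaining rows with probability proportional to AvgWTRisk, adds its HA to a running total, and stops once the total reaches the target. `sample_area_loop()` is a hypothetical name, and the target of 15 is illustrative only, since the example rows total about 29.

```r
# Example data, copied from the dput() output above
data <- structure(list(OBJECTID = c(32697L, 36480L, 13805L, 38796L, 8494L,
23609L), AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.002658,
0.008931, 0.0090647), HA = c(7.7465, 7.9329797, 0.7154995, 0.2882192,
6.4686595, 6.1246)), row.names = c(32697L, 36480L, 13805L, 38796L,
8494L, 23609L), class = "data.frame")

# Hypothetical helper: add one weighted, not-yet-picked row per iteration
# until the summed area reaches target_area
sample_area_loop <- function(df, target_area,
                             prob_col = "AvgWTRisk", area_col = "HA") {
  remaining <- seq_len(nrow(df))
  picked <- integer(0)
  total <- 0
  while (total < target_area && length(remaining) > 0) {
    w <- df[[prob_col]][remaining]
    if (all(w == 0)) break  # no remaining row can be drawn
    # sample.int() is safe even when only one row remains
    i <- remaining[sample.int(length(remaining), size = 1, prob = w)]
    picked <- c(picked, i)
    total <- total + df[[area_col]][i]
    remaining <- remaining[remaining != i]  # sample without replacement
  }
  df[picked, , drop = FALSE]
}

set.seed(42)
disturbed <- sample_area_loop(data, target_area = 15)
print(sum(disturbed$HA))
```

The loop makes it easy to add further stopping rules (for example, the ±5% band around 100); for large selections, drawing a full weighted ordering once with sample() and cutting it with cumsum() avoids growing `picked` row by row.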

Thank you in advance!

