Channel: Active questions tagged r - Stack Overflow

How to create unique samples (every element appears in one sample only) using R?


I am working with the BTYD model to forecast customers' future transactions. Unfortunately, because the model uses MCMC methods, I cannot run the forecast on my whole customer base (hundreds of thousands of customers), so I have to split the base into many random samples and perform a separate run of the model on each of them to retrieve the forecast.

My idea was to use a loop to do the following:

  1. retrieve a random sample of 10,000 customers from the whole base (let's call the base data frame "data")
  2. store the result in an object called "sample1"
  3. go back to "data", exclude the customers who are in "sample1", and store the result in "data" again
  4. get a new random sample ("sample2") from the new "data"
  5. create a new version of "data" excluding all customers included in "sample2" (and "sample1")
  6. ... continue this cycle until the base is exhausted and we have created N samples that together contain the whole base.

(Every ID must be in one sample only).
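The steps above can be sketched as a loop that keeps a shrinking copy of the base and collects the samples in a named list instead of creating separate variables. This is a minimal sketch, assuming a mock data frame with the same `id`/`value` layout as the CSV used below; the `set.seed` value and the names `remaining` and `samples` are illustrative, not part of the original code.

```r
# Hypothetical stand-in for the customer base: 1,000 unique IDs plus a value column
set.seed(42)
data <- data.frame(id = 1:1000, value = runif(1000))

sample_size <- 100
remaining <- data   # customers not yet assigned to any sample
samples <- list()   # samples[["sample_i"]] will hold the i-th sample

i <- 0
while (nrow(remaining) > 0) {
  i <- i + 1
  # Steps 1 and 4: draw rows from what is left (the last sample may be smaller)
  n_draw <- min(sample_size, nrow(remaining))
  current <- remaining[sample(nrow(remaining), n_draw), ]
  # Step 2: store the sample under the name "sample_i"
  samples[[paste0("sample_", i)]] <- current
  # Steps 3 and 5: drop the customers just sampled from the remaining base
  remaining <- remaining[!(remaining$id %in% current$id), ]
}
```

Because every customer is drawn from `remaining` exactly once, the samples are disjoint and together cover the base; `samples$sample_1`, `samples$sample_2`, and so on give the individual data frames.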

Unfortunately, my code doesn't work the way I want (I am not very good with loops at the moment):


getwd()

data <- read.csv("MOCK_DATA (1).csv")
# This is a fake dataset of 1,000 rows that contains only 2 columns:
# customer ID (column name "id") and a random number (column name "value").
# Every customer ID appears only once in the dataset.

head(data)

set.sample.size <- 100
num.cycles <- ceiling(nrow(data) / set.sample.size)

for (i in 1:num.cycles) {
  nam <- paste("sample_", i, sep = "")                       # object name, e.g. "sample_1"
  assign(nam, data[sample(nrow(data), set.sample.size), ])   # create sample_i from the remaining rows
  data <- data[!(data$id %in% nam$id), ]                     # intended: drop the IDs just sampled
}

This code generates the following error: Error in nam$id : $ operator is invalid for atomic vectors
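The error arises because `nam` holds only the character string "sample_1", not the data frame that `assign()` created under that name, so `nam$id` applies `$` to a character vector. One minimal fix, sketched below on an assumed mock of the CSV, is to keep the drawn rows in a local variable (the name `drawn` is illustrative), pass it to `assign()`, and filter on that variable; `get(nam)$id` would also retrieve the right object.

```r
# Hypothetical mock of the question's data: 1,000 unique IDs and a value column
set.seed(1)
data <- data.frame(id = 1:1000, value = runif(1000))

set.sample.size <- 100
num.cycles <- ceiling(nrow(data) / set.sample.size)

for (i in 1:num.cycles) {
  nam <- paste("sample_", i, sep = "")
  # Keep the drawn rows in a variable; min() guards a smaller final sample
  drawn <- data[sample(nrow(data), min(set.sample.size, nrow(data))), ]
  assign(nam, drawn)  # creates sample_i, as in the original code
  # Filter on the data frame itself, not on the name string `nam`
  data <- data[!(data$id %in% drawn$id), ]
}
```

After the loop, `data` is empty and `sample_1` to `sample_10` each hold 100 distinct IDs.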

What I expect is 10 objects called "sample_1" to "sample_10", each made of 100 random IDs from the original data, with no ID shared between the 10 samples.
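Since every customer must land in exactly one sample, a loop-free alternative is to shuffle the rows once and cut them into consecutive blocks of `set.sample.size`. This is a sketch of that swapped-in technique (a random permutation followed by `split()`), not the step-by-step method described in the question; the mock data and `set.seed` value are assumptions.

```r
# Hypothetical mock of the question's data: 1,000 unique IDs and a value column
set.seed(7)
data <- data.frame(id = 1:1000, value = runif(1000))

set.sample.size <- 100
n <- nrow(data)

# Shuffle the row order once, then cut the shuffled rows into consecutive blocks
shuffled <- data[sample(n), ]
group <- ceiling(seq_len(n) / set.sample.size)  # 1,...,1,2,...,2,... (last block may be smaller)
samples <- split(shuffled, group)
names(samples) <- paste0("sample_", seq_along(samples))

# Optional: create sample_1 ... sample_10 as separate objects, as the question asks
invisible(list2env(samples, envir = globalenv()))
```

Keeping the samples in a named list makes it straightforward to run the model on each one with `lapply(samples, ...)`; the `list2env()` line is only needed if separate variables are really required.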

