I am working with the BTYD model to forecast customers' future transactions. Unfortunately, because the model relies on MCMC methods, I cannot run the forecast on my whole customer base (hundreds of thousands of customers) at once, so I have to split the base into many random samples and run the model on each of them separately to retrieve the forecast.
My idea was to use a loop to do the following:
- retrieve a random sample of 10,000 customers from the whole base (a data frame called "data")
- store the result in an object called "sample1"
- go back to "data", exclude the customers who are in "sample1", and store the result back in "data"
- get a new random sample ("sample2") from the new "data"
- create a new version of "data" excluding all customers included in "sample2" (and "sample1").
- ... continue this cycle until the base is exhausted and the N samples together contain the whole base.
(Every ID must be in one sample only).
Unfortunately, my code doesn't work the way I want (I am not very good with loops at the moment).
getwd()
data<-read.csv("MOCK_DATA (1).csv")
# this is a fake dataset of 1000 rows that contains only 2 columns:
# customer ID (column name: "id") and a random number (column name "value").
# Every customer ID appears only once in the dataset.
head(data)
set.sample.size<-100
num.cycles<-ceiling(nrow(data)/set.sample.size)
for(i in 1:(num.cycles)) {
  nam <- paste("sample_", i, sep = "")
  assign(nam, data[sample(nrow(data), set.sample.size), ])
  data<-data[!(data$id %in% nam$id),]
}
This code generates the following error: Error in nam$id : $ operator is invalid for atomic vectors
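The error appears to come from `nam` holding only the character string "sample_1", not the data frame stored under that name, so `nam$id` applies `$` to an atomic vector. Below is a minimal sketch of the same loop that looks the object up with `get()` instead. The `data` built here is a made-up stand-in for the CSV (an assumption, not the real file), and the `min()` cap is an extra guard for the case where the row count is not a multiple of the sample size.

```r
# Stand-in for the CSV: 1000 unique ids and a random value (assumed data)
set.seed(1)
data <- data.frame(id = 1:1000, value = runif(1000))

set.sample.size <- 100
num.cycles <- ceiling(nrow(data) / set.sample.size)

for (i in seq_len(num.cycles)) {
  nam <- paste("sample_", i, sep = "")
  # Cap the draw so the last cycle cannot ask for more rows than remain
  n <- min(set.sample.size, nrow(data))
  assign(nam, data[sample(nrow(data), n), ])
  # get(nam) returns the data frame stored under that name;
  # nam itself is only the character string "sample_1", "sample_2", ...
  data <- data[!(data$id %in% get(nam)$id), ]
}
```

After the loop, `data` is empty and `sample_1` .. `sample_10` exist in the workspace.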
What I expect is to get 10 objects called "sample_1" .. "sample_10", each containing 100 random IDs from the original data, with every ID unique across the samples (no ID is shared between the 10 samples).
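The same expected result can also be described without a loop: shuffle the rows once and cut them into consecutive blocks. This is a sketch under assumptions, not necessarily the best approach for BTYD: the `data` frame is a made-up stand-in for the CSV, and `shuffled`, `group` and `samples` are names introduced here for illustration. The samples end up in a named list rather than as separate workspace objects.

```r
# Stand-in for the CSV: 1000 unique ids and a random value (assumed data)
set.seed(1)
data <- data.frame(id = 1:1000, value = runif(1000))
set.sample.size <- 100

# Shuffle the row order once; every row appears exactly one time
shuffled <- data[sample(nrow(data)), ]

# Label rows 1..100 as group 1, rows 101..200 as group 2, and so on
group <- ceiling(seq_len(nrow(shuffled)) / set.sample.size)

# Split into a list of data frames, one per group
samples <- split(shuffled, group)
names(samples) <- paste0("sample_", names(samples))
```

Each sample is then available as `samples$sample_1`, `samples[["sample_2"]]`, etc., which can be passed to the model with `lapply()`. If the row count is not a multiple of the sample size, the last element simply holds the remaining rows.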