I have a dataset covering about 50 contiguous days. I want to split it into training and test sets so that, within each week, 5 days go to the training set and 2 days go to the test set.
The 2 test days should be chosen at random within each week, not always, e.g., the first 2 days.
How could I do that?
Is there a function for this in R? The code below is how I currently split the data into training and test sets. It samples individual rows, so the training and test timestamps are probably very close to each other, and I always get a very high MSR value.
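set.seed(100)
# Draw 70% of the row indices at random (row-level split, ignores the day)
train <- sample(nrow(dataset1), floor(0.7 * nrow(dataset1)), replace = FALSE)
TrainSet <- dataset1[train,]
#scale (TrainSet, center = TRUE, scale = TRUE)
# The remaining 30% of the rows form the validation set
ValidSet <- dataset1[-train,]
#scale (ValidSet, center = TRUE, scale = TRUE)
summary(TrainSet)
summary(ValidSet)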
Example Data:
data
# timestamp var1 var2 var3 var5
#1 2018-07-20 13:40:00 12 0.00 30.12 10
#2 2018-07-20 13:45:00 12 0.10 10.15 10
#3 2018-07-20 13:50:00 2 11.00 19.22 17
#4 2018-07-20 13:55:00 22 3.05 23.31 3
dput(data)
structure(list(timestamp = c("2018-07-20 13:50:00", "2018-07-20 13:52:00",
"2018-07-20 13:54:00", "2018-07-20 13:56:00"), var1 = c(12, 12,
2, 22), var2 = c(0, 0.1, 11, 3.05), var3 = c(30.12, 10.15, 19.22,
23.31), var5 = c(10L, 10L, 17L, 3L)), class = "data.frame", row.names = c(NA,
-4L))
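To make the desired split concrete, here is a minimal sketch in base R of what I have in mind. It rests on several assumptions: a "week" is taken to be a block of 7 consecutive days counted from the first day in the data (not a calendar week), the `timestamp` column is a character string starting with `YYYY-MM-DD` as in the example, and a leftover block shorter than 7 days keeps at least one day in the training set. The simulated `dataset1` and the helper `pick_test_days` are hypothetical and only there so the snippet runs on its own.

```r
set.seed(100)

# Hypothetical stand-in for dataset1: 50 contiguous days of 5-minute readings
timestamps <- seq(as.POSIXct("2018-07-20 00:00:00", tz = "UTC"),
                  by = "5 min", length.out = 50 * 288)
dataset1 <- data.frame(
  timestamp = format(timestamps, "%Y-%m-%d %H:%M:%S"),
  var1 = rnorm(length(timestamps)),
  stringsAsFactors = FALSE
)

# Calendar day of each row, as an offset (0, 1, 2, ...) from the first day
day <- as.Date(substr(dataset1$timestamp, 1, 10))
day_idx <- as.integer(day - min(day))

# Hypothetical helper: pick up to 2 random days from one block of days.
# Indexing via sample.int() avoids the sample(x) pitfall when x has length 1.
pick_test_days <- function(d) {
  n_test <- min(2, length(d) - 1)  # keep at least one training day per block
  d[sample.int(length(d), n_test)]
}

# Group the distinct days into 7-day blocks and draw the test days per block
unique_days <- sort(unique(day_idx))
days_by_block <- split(unique_days, unique_days %/% 7)
test_days <- unlist(lapply(days_by_block, pick_test_days), use.names = FALSE)

# Whole days go to one set or the other, never individual rows
is_test <- day_idx %in% test_days
TrainSet <- dataset1[!is_test, ]
ValidSet <- dataset1[is_test, ]
```

With 50 days this gives 7 full blocks with 2 test days each, plus one leftover day that stays in the training set. Whether entire days should be held out like this, rather than individual rows, is exactly what I am unsure about.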