Channel: Active questions tagged r - Stack Overflow

R caret package: data partition into training / test sets before trainControl?

I see a lot of R code in which a full dataset is first split into a training set and a test set:

library(caret)
library(klaR)

# load the iris dataset
data(iris)

# define a 80%/20% train/test split of the dataset
trainIndex <- createDataPartition(iris$Species, p=0.8, list=FALSE)
data_train <- iris[trainIndex,]
data_test <- iris[-trainIndex,]

Next, a resampling method is defined, such as repeated k-fold cross-validation:

train_control <- trainControl(method="repeatedcv", number=10, repeats=3)

Then a model is trained using the training set:

my_model <- train(Species~., data=data_train, trControl=train_control, method="nb")

Finally, predictions are performed on the test set:

pred_results <- predict(my_model, newdata=data_test)
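To judge these predictions, one option is to compare them with the true labels of the test set and set the result next to the accuracy that caret reports from resampling. The following is a minimal sketch, assuming the same iris pipeline as above; the seed and the names `cm`, `holdout_acc` and `cv_acc` are illustrative and not part of the original code.

```r
library(caret)
library(klaR)  # provides the naive Bayes implementation behind method = "nb"

data(iris)
set.seed(123)  # assumption: arbitrary seed, used only for reproducibility

# Same 80%/20% split and repeated 10-fold CV setup as in the question
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
data_train <- iris[trainIndex, ]
data_test  <- iris[-trainIndex, ]

train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
my_model <- train(Species ~ ., data = data_train,
                  trControl = train_control, method = "nb")

pred_results <- predict(my_model, newdata = data_test)

# Hold-out estimate: predictions vs. true labels on rows never used in train()
cm <- confusionMatrix(pred_results, data_test$Species)
holdout_acc <- unname(cm$overall["Accuracy"])

# Resampling estimate: accuracy averaged over the held-out folds,
# taken for the best-scoring tuning setting in my_model$results
cv_acc <- max(my_model$results$Accuracy)

print(cm$table)
cat("Cross-validated accuracy:", cv_acc, "\n")
cat("Hold-out accuracy:       ", holdout_acc, "\n")
```

Both numbers are estimates of the same quantity obtained from different rows: the first from folds inside `data_train`, the second from `data_test`.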

With (repeated) k-fold cross-validation specifically, it seems to me that the training set (k-1 folds) and the test set (1 fold) are already inherently defined in each iteration.

In that case, why add an extra layer of partitioning by first splitting the full dataset into 80% training and 20% test sets? Is it necessary?
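To make the two layers of partitioning concrete, here is a small sketch that builds 10 folds by hand with caret's `createFolds()`. It is only an illustration: `train()` generates its own resamples internally, and the seed and the names `folds` and `held_out_1` are my assumptions. It shows that every fold, including the held-out one, is drawn from `data_train`, so the rows in `data_test` never take part in the cross-validation.

```r
library(caret)

data(iris)
set.seed(42)  # assumption: arbitrary seed, used only for reproducibility

# Outer layer: 80%/20% split of the full dataset
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
data_train <- iris[trainIndex, ]
data_test  <- iris[-trainIndex, ]

# Inner layer: 10 folds built from data_train only.
# With returnTrain = TRUE each element holds the row positions (within
# data_train) used for fitting; the remaining rows form the held-out fold.
folds <- createFolds(data_train$Species, k = 10, returnTrain = TRUE)
held_out_1 <- setdiff(seq_len(nrow(data_train)), folds[[1]])

# Original iris row names of the first held-out fold
held_out_rows <- rownames(data_train)[held_out_1]

cat("Rows in data_train:", nrow(data_train), "\n")
cat("Rows in data_test: ", nrow(data_test), "\n")
cat("Rows held out in fold 1:", length(held_out_1), "\n")
cat("Fold 1 rows that also appear in data_test:",
    length(intersect(held_out_rows, rownames(data_test))), "\n")
```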

