I have a simple linear regression model as follows:
Y = Mean_energy, X = A + B
My dataset consists of only 20 rows.
Therefore, to obtain the R2 of the model, I did a 5-fold cross-validation (cv).
To do cv in Python, I used the cross_validate function in scikit-learn: cross_validate(model, X, Y, cv=5, scoring='r2').
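For reference, the Python side can be sketched roughly as below. This is only a minimal reproduction under assumptions: the data are synthetic stand-ins generated with NumPy (not my real dataset), the coefficients and noise level are made up, and the name fold_r2 is just for illustration. Only the cross_validate call matches what I actually ran.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Hypothetical stand-in data: 20 rows, two predictors (A, B) and one response.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # columns play the role of A and B
Y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=20)  # stands in for Mean_energy

model = LinearRegression()

# With an integer cv and a regressor, scikit-learn uses KFold without shuffling,
# so the folds are contiguous blocks of rows in their original order.
results = cross_validate(model, X, Y, cv=5, scoring="r2")

# One R2 value per held-out fold (4 test rows each for 20 rows and 5 folds).
fold_r2 = results["test_score"]
print(fold_r2)
```

Note that scoring='r2' here is the coefficient of determination computed on each held-out fold, and with only 4 test rows per fold the values can vary quite a bit from fold to fold.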
To do cv in R, I used model <- train(Y ~ A + B, data = df, method = "lm", trControl = train.control), with train.control set to trainControl(method = "cv", number = 5). I then used model$resample to check the cv R2.
The R2 results in R seem to fluctuate a lot more than those in Python. Any idea why? I have a feeling that the way I do cv in R is wrong.
cv R2 in R:
Fold 1 = 0.6686680
Fold 2 = 0.3571826
Fold 3 = 0.8858084
Fold 4 = 0.7081766
Fold 5 = 0.3101449
cv R2 in Python:
Fold 1 = 0.29353287
Fold 2 = 0.24257606
Fold 3 = 0.38664367
Fold 4 = 0.26943862
Fold 5 = 0.24531835
FYI, for the R cross-validation I followed the tutorial at https://quantdev.ssri.psu.edu/tutorials/cross-validation-tutorial