I'm using the package randomForest in R to create a model to classify cases into disease (1) or disease-free (0):
library(randomForest)
classify_BV_100t <- randomForest(bv.disease ~ ., data = RF_input_BV_clean, ntree = 100, localImp = TRUE)
print(classify_BV_100t)
Call:
randomForest(formula = bv.disease ~ ., data = RF_input_BV_clean, ntree = 100, localImp = TRUE)
Type of random forest: classification
Number of trees: 100
No. of variables tried at each split: 53
OOB estimate of error rate: 8.04%
Confusion matrix:
    0  1 class.error
0 510  7  0.01353965
1  39 16  0.70909091
My confusion matrix shows that the model is good at classifying 0 (no disease), but is very bad at classifying 1 (disease).
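As far as I understand, this is the out-of-bag confusion matrix, i.e. each case is classified by the majority of its OOB votes (a 0.5 cutoff on the vote fraction). A minimal sketch of recomputing it by hand, using the objects from the code above:

# Sketch: recompute the OOB confusion matrix and per-class error by hand
oob_pred <- predict(classify_BV_100t)   # with no newdata this returns the OOB class predictions
cm <- table(actual = RF_input_BV_clean$bv.disease, predicted = oob_pred)
cm
1 - diag(cm) / rowSums(cm)               # should reproduce the class.error column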
But when I plot ROC curves, they give the impression that the model is pretty good.
Here are the 2 different ways I plotted ROC:
library(pROC)
rf.roc <- roc(RF_input_BV_clean$bv.disease, classify_BV_100t$votes[, 2])
plot(rf.roc)
auc(rf.roc)
(Using How to compute ROC and AUC under ROC after training using caret in R?)
library(ROCR)
predictions <- as.vector(classify_BV_100t$votes[, 2])
pred <- prediction(predictions, RF_input_BV_clean$bv.disease)

perf_AUC <- performance(pred, "auc")          # calculate the AUC value
AUC <- perf_AUC@y.values[[1]]

perf_ROC <- performance(pred, "tpr", "fpr")   # plot the actual ROC curve
plot(perf_ROC, main = "ROC plot")
text(0.5, 0.5, paste("AUC = ", format(AUC, digits = 5, scientific = FALSE)))
These are the ROC plots from methods 1 and 2:
Both methods give me an AUC of 0.8621593.
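To see where the confusion matrix's single operating point sits on the curve, I also looked at the sensitivity and specificity at the 0.5 vote cutoff (a sketch using pROC's coords on the rf.roc object from method 1):

# Sketch: the operating point at the default 0.5 vote cutoff,
# i.e. the cutoff the confusion matrix above corresponds to
coords(rf.roc, x = 0.5, input = "threshold",
       ret = c("threshold", "specificity", "sensitivity"))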
Does anyone know why the results from the random forest confusion matrix don't seem to add up with the ROC/AUC?