I'm working on an academic problem and am trying to compute a confusion matrix for a decision tree's predictions on my test set. When I run the confusion matrix, I get the following error:
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
Here is a sample from my code:
# load rpart and create a basic decision tree on the training set
library(rpart)
tree1 <- rpart(working_well ~ amount_tsh + funder + basin,
               data = pump.train, method = 'class')
# predict using this tree, still on the training set. Run the confusion matrix
table(predict(tree1, type = 'class'), pump.train$working_well)
# this works fine
cm1 <- caret::confusionMatrix(predict(tree1, type = "class"),
                              pump.train$working_well, positive = '1')
# now predict the tree on the TEST set
tree1_pred <- predict(tree1, pump.test, type="class")
# run the confusion matrix
predict_cm1 <- caret::confusionMatrix(predict(tree1, type = "class"),
                                      pump.test$working_well, positive = '1')
That last call is where I get the error. I searched the boards and learned that the structures of the two objects are not the same. Both are factors with 2 levels. However, tree1_pred has the following structure:
Factor w/ 2 levels "0","1": 2 2 1 1 2 1 2 2 1 1 ...
- attr(*, "names")= chr [1:11879] "1" "4" "8" "19" ...
The structure for pump.test$working_well is as follows:
Factor w/ 2 levels "0","1": 1 2 2 1 2 1 2 1 1 2 ...
I assume the two factors need exactly the same structure for the confusion matrix to run, but I am at a loss for how to do this, and I'm not sure that assumption is even correct.
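For reference, here is a minimal sketch of how the two objects could be compared side by side. The vectors pred and truth are hypothetical stand-ins made up to mimic the two structures shown above (they are not the real pump data), and only base R functions are used:

```r
# Hypothetical stand-in for tree1_pred: a factor carrying a "names" attribute
pred <- factor(c("1", "1", "0", "0", "1"), levels = c("0", "1"))
names(pred) <- c("1", "4", "8", "19", "23")

# Hypothetical stand-in for pump.test$working_well: a plain factor
truth <- factor(c("0", "1", "1", "0", "1"), levels = c("0", "1"))

# Compare the properties that could differ between the two objects
same_length <- length(pred) == length(truth)
same_levels <- identical(levels(pred), levels(truth))

# Drop the names attribute so both are plain factors
pred_clean <- unname(pred)

# Cross-tabulate predictions against the reference values
cm <- table(predicted = pred_clean, actual = truth)
print(same_length)
print(same_levels)
print(cm)
```

On the real data, the same checks would be run on tree1_pred and pump.test$working_well in place of the stand-ins.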
Any advice on how to fix this? I know it must be easy but I'm stuck.
If more info is needed, let me know. Thanks!!