Edited Question:
I have a data set of 699 rows, and the exercise I am working on asks for a training set of 300 observations; the remaining rows form the test set. I have included as much information as possible to make the situation clear.
# First part of the code and preprocessing
attach(Cancer_data)
# Rename the first 11 columns in one assignment
names(Cancer_data)[1:11] <- c("id", "thickness", "unif.size", "unif.shape",
                              "adhesion", "size", "nuclei", "chromatin",
                              "nucleoli", "mitoses", "Prognosis")
# Prognosis holds the class labels: 2 = benign, 4 = malignant
Prognosis <- as.factor(Cancer_data$Prognosis)
Cancer_data <- Cancer_data %>% dplyr::select(-id)
I skip the data-splitting code, which is straightforward, and go straight to the classification tree, which I fit with rpart:
rpart_model <- rpart(Prognosis ~ ., method = "class", data = train_set)
# train_set was created earlier with caret::createDataPartition()
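For readers who want to reproduce the setup, a split of this kind can be sketched as follows. This is only an illustration on made-up data, not the exact code from the exercise: the synthetic `Cancer_data` stand-in, the seed, and the `train_idx` name are assumptions. Note that `createDataPartition()` samples within each class, so the training set has approximately (not necessarily exactly) 300 rows.

```r
library(caret)  # provides createDataPartition()

set.seed(1)  # arbitrary seed so the split is reproducible

# Synthetic stand-in for Cancer_data (699 rows, made-up values)
Cancer_data <- data.frame(
  thickness = sample(1:10, 699, replace = TRUE),
  unif.size = sample(1:10, 699, replace = TRUE),
  Prognosis = factor(sample(c(2, 4), 699, replace = TRUE))
)

# Stratified split on the class label; p is the training fraction
train_idx <- createDataPartition(Cancer_data$Prognosis,
                                 p = 300 / 699, list = FALSE)

train_set <- Cancer_data[train_idx, ]   # roughly 300 rows
test_set  <- Cancer_data[-train_idx, ]  # the remaining rows

c(train = nrow(train_set), test = nrow(test_set))
```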
This is where the main issue arises: when I predict on test_set and try to compute the confusionMatrix, R returns this error:
Error: `data` and `reference` should be factors with the same levels.
Here is the code I used:
y_hat <- predict(rpart_model,test_set)
confusionMatrix(Cancer_data$Prognosis,y_hat)
I also tried:
y_hat <- predict(rpart_model,type ='class')
as suggested in a previous post.
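To make the problem easier to reproduce without my data, here is a minimal sketch that inspects what each predict() call returns. The `toy` data frame, the seed, and the names `p_default`, `p_class`, and `p_train` are made up for illustration; only rpart is needed. It checks the two properties that confusionMatrix requires of its arguments: both must be factors with the same levels, and they must describe the same observations.

```r
library(rpart)

set.seed(1)  # arbitrary seed for reproducibility
n <- 699

# Made-up stand-in data: the label depends on thickness so the tree can split
thickness <- sample(1:10, n, replace = TRUE)
toy <- data.frame(
  thickness = thickness,
  unif.size = sample(1:10, n, replace = TRUE),
  Prognosis = factor(ifelse(thickness > 5, 4, 2))
)

train_rows <- sample(n, 300)
train_set  <- toy[train_rows, ]
test_set   <- toy[-train_rows, ]

rpart_model <- rpart(Prognosis ~ ., method = "class", data = train_set)

p_default <- predict(rpart_model, test_set)                  # no type given
p_class   <- predict(rpart_model, test_set, type = "class")  # newdata and type
p_train   <- predict(rpart_model, type = "class")            # no newdata

# Inspect the kind and size of each result
str(p_default)
str(p_class)
c(test_rows = nrow(test_set), class_preds = length(p_class),
  train_rows = nrow(train_set), train_preds = length(p_train))

# Do the predicted classes and the test labels share the same levels?
identical(levels(p_class), levels(test_set$Prognosis))
```

In this sketch, the call without `type` gives a matrix of class probabilities rather than a factor, and the call without new data gives one prediction per training row, so neither lines up with the labels of the test rows.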
I apologize for the length of the question, but I wanted to be as precise as possible. Thank you in advance.