I am having trouble plotting several curves. I have built some predictive models with caret that predict probabilities (logistic regression and XGBoost) and evaluated them using ROC. When I try to plot the predictions against whether my outcome is 0 or 1, ggplot behaves differently from how it does without caret (but here I need caret). One thing I found odd about caret is that, because my outcome is binary, I had to convert the variable to a factor and label the responses No and Yes. I expect a plot with a curve whose incline gets steeper and steeper as you move to the right.
I have tried converting the outcome variable back to numeric after modelling, but then it is coded 1/2 instead of 0/1, and instead of an exponentially increasing curve of my predictions against whether event 1 occurred, I get two lines formed by the observations.
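To illustrate the 1/2 coding, here is a minimal sketch with a made-up factor. The `outcome` vector is hypothetical (not my data), and I am assuming the levels are ordered No, Yes. As far as I can tell, `as.numeric()` on a factor returns the internal level codes rather than the original 0/1 values:

```r
# Hypothetical outcome, labelled No/Yes as caret requires when classProbs = TRUE
outcome <- factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes"))

# as.numeric() on a factor gives the level codes (1 for "No", 2 for "Yes")
codes <- as.numeric(outcome)

# Two ways to get a 0/1 coding back, assuming "Yes" is the event of interest
zero_one_a <- as.numeric(outcome) - 1
zero_one_b <- as.integer(outcome == "Yes")
```

Either of the last two lines gives 0/1 instead of 1/2, but that only changes the coding of the outcome, so I still get the two lines of points.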
My code is:
library(caret)    # trainControl(), train()
library(car)      # vif(); assuming vif() comes from the car package
library(dplyr)    # %>%, mutate()
library(ggplot2)  # ggplot()

# 10-fold cross-validation with class probabilities, scored by ROC
log_control <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                            summaryFunction = twoClassSummary)
logistic_model <- train(default ~ profit_margin + interest_coverage_ratio + age_of_company +
                          liquidity_ratio_2 + adverse_audit_opinion + amount_unpaid_debt +
                          payment_reminders + industry_3 + industry_5 + industry_11 +
                          total_assets + revenue + equity,
                        data = pd_train, trControl = log_control,
                        method = "glm", family = "binomial", metric = "ROC")
vif(logistic_model$finalModel)
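# Predictions
log_prediction <- predict(logistic_model, pd_test, type = "prob")
# Drop the first column, keeping the probability of the second class
# ("Yes", assuming the levels are ordered No, Yes)
log_prediction <- log_prediction[, -1]
pd_test <- pd_test %>%
  mutate(log_prob_predictions = log_prediction)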
# Visual evaluation
# pd_test$default <- as.numeric(pd_test$default)
ggplot(data = pd_test, aes(x = default, y = log_prob_predictions)) +
  geom_point(aes(color = default), alpha = 1, shape = 4, stroke = 2) +
  xlab("Index") +
  ylab("Predicted probability of default")
Any help with fixing this is appreciated.