I've trained an artificial neural network in R using caret and nnet. I am trying to produce meaningful output, ideally a confusion matrix, but I keep getting errors such as "data and reference should be factors with the same levels" or "arguments must have the same length".
```r
library(caret)  # createDataPartition(), train(), confusionMatrix()
library(nnet)   # backend used by method = "nnet"

pitchData <- read.csv(file.choose(), header = TRUE)
summary(pitchData)
# (the cleaning step that produces cleanPitch from pitchData is not shown)

# 75/25 train/test split, stratified on the outcome `type`
set.seed(75)
DataSplit <- createDataPartition(cleanPitch$type, p = 0.75, list = FALSE)
trainData <- cleanPitch[DataSplit, ]
testData  <- cleanPitch[-DataSplit, ]

# ANN for pitcher's case -- physical description variables only
set.seed(2713)
ANNscout <- train(type ~ code + pitch_type + b_score + b_count + s_count + outs +
                    pitch_num + on_1b + on_2b + on_3b,
                  data = trainData, method = "nnet", trace = FALSE)
summary(ANNscout)

predictScout <- predict(ANNscout, newData = testData)
confusionMatrix(testData$type, ANNscout)
```
The error occurs at `confusionMatrix(testData$type, ANNscout)`. I have also tried `confusionMatrix(predictScout, testData$type)`, because when summarized the two give these outputs:
```
> summary(testData$type)
     B      S      X
 65126  82996  31456
> summary(predictScout)
     B      S      X
195279 248965  94492
```
Both factors have the same levels (B, S, X), so I expected them to be compatible.
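One way to test that expectation is to total the class counts from the two `summary()` outputs. The sketch below re-uses only the numbers posted above: the levels match, but the lengths do not (179,578 test rows vs 538,736 predictions), and 538,736 is about 75% of all rows, i.e. roughly the size of the training split. A possibility worth checking, not a confirmed diagnosis: caret's `predict.train()` names its argument `newdata` (lowercase), R argument matching is case-sensitive, and per the caret documentation the original training data are used when `newdata` is not supplied, so `newData = testData` may be predicting on `trainData` instead.

```r
# Class counts copied from the summary() output in this question
test_counts <- c(B = 65126, S = 82996, X = 31456)    # summary(testData$type)
pred_counts <- c(B = 195279, S = 248965, X = 94492)  # summary(predictScout)

n_test <- sum(test_counts)  # number of rows in testData
n_pred <- sum(pred_counts)  # number of predictions in predictScout

# Same levels, in the same order ...
print(identical(names(test_counts), names(pred_counts)))  # TRUE
# ... but different lengths, so the two vectors cannot be paired row by row
print(c(test = n_test, predicted = n_pred))

# Fraction of all rows (train + test) that the predictions cover
print(round(n_pred / (n_pred + n_test), 3))
```

On the real objects, `length(predictScout) == nrow(testData)` and `identical(levels(predictScout), levels(testData$type))` should both be `TRUE` before `confusionMatrix()` is called.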
I have also tried the `table()` function, as suggested elsewhere, but that does not fix the underlying problem.
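`table()` cross-tabulates its inputs element by element, so it also needs one prediction per observed label; with unequal lengths it stops with "all arguments must have the same length", which is likely the second error quoted above. The sketch below uses small made-up factors (hypothetical data, not from the pitch dataset) to show the shape both `table()` and `confusionMatrix()` expect: predicted and observed labels of equal length with identical levels.

```r
# Hypothetical toy labels sharing the levels B, S, X
lv <- c("B", "S", "X")
observed  <- factor(c("B", "S", "S", "X", "B", "S"), levels = lv)
predicted <- factor(c("B", "S", "X", "X", "S", "S"), levels = lv)

# table() pairs element i of each vector, so the lengths must match
stopifnot(length(predicted) == length(observed))
stopifnot(identical(levels(predicted), levels(observed)))

# Rows = predictions, columns = observed (reference) labels
tab <- table(Prediction = predicted, Reference = observed)
print(tab)

# Overall accuracy: correctly classified (diagonal) over all cases
accuracy <- sum(diag(tab)) / sum(tab)
print(round(accuracy, 3))
```

With caret loaded, `confusionMatrix(predicted, observed)` (predictions as `data`, observed labels as `reference`) or `confusionMatrix(tab)` should report the same counts along with accuracy and per-class statistics.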
Link to dataset: https://www.kaggle.com/pschale/mlb-pitch-data-20152018#pitches.csv