I am trying to compute the accuracy of a decision tree on the seeds dataset (Link to the seeds dataset) using leave-one-out cross-validation, repeated over 20 iterations. However, the overall accuracy I get is very low (30%-35%). This is what I've done so far:
library(rpart)

seed <- read.csv("seeds_dataset.txt", header = FALSE, sep = "\t")
colnames(seed) <- c("area", "per.", "comp.", "l.kernel", "w.kernel",
                    "asy_coeff", "lenkernel", "type")

sampleSize <- nrow(seed)
# One row per held-out observation, one column per iteration
mat <- matrix(nrow = sampleSize, ncol = 20)

for (t in 1:20) {
    # Shuffle the rows (a permutation of the full dataset)
    testSampleIdx <- sample(nrow(seed), size = sampleSize)
    data <- seed[testSampleIdx, ]

    # Leave-one-out: train on all rows except i, test on row i
    for (i in 1:nrow(data)) {
        training <- data[-i, ]
        test <- data[i, ]
        classification <- rpart(type ~ ., data = training, method = "class")
        prediction <- predict(classification, newdata = test, type = "class")
        cm <- table(test$type, prediction)
        accuracy <- sum(diag(cm)) / sum(cm)
        mat[i, t] <- accuracy
    }
}

for (i in 1:ncol(mat)) {
    print(paste("accuracy for ", i, " iteration ", round(mean(mat[, i]) * 100, 1), "%", sep = ""))
}
print(paste("overall accuracy ", round(mean(mat) * 100, 1), "%", sep = ""))
Can anyone explain what is causing this low accuracy? Thank you.
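
One possible explanation, offered as a sketch rather than a confirmed diagnosis: with a single held-out row, `table(test$type, prediction)` is a 1 x 3 table (one true label against the three predicted levels), so `diag(cm)` only ever reads the first cell, that is, whether the prediction equals the first level. With three roughly balanced classes that would give about one third. The values below (`truth`, `prediction`) are made up for illustration and are not taken from the seeds data.

```r
# Hypothetical single held-out row: true class 2, predicted class 2 (a correct prediction).
truth <- 2L
prediction <- factor("2", levels = c("1", "2", "3"))  # predict(..., type = "class") returns a factor

# The confusion "matrix" has one row and three columns, not 3 x 3.
cm <- table(truth, prediction)
print(dim(cm))

# diag() on a 1 x 3 table returns only cm[1, 1], the count of predictions equal to "1".
naive_accuracy <- sum(diag(cm)) / sum(cm)
print(naive_accuracy)    # 0, although the prediction is correct

# Comparing the labels directly avoids the problem.
direct_accuracy <- as.numeric(as.character(prediction) == as.character(truth))
print(direct_accuracy)   # 1
```

If this is the cause, replacing the two accuracy lines in the inner loop with a direct comparison such as `mat[i, t] <- as.numeric(as.character(prediction) == as.character(test$type))` should report the per-row hit rate instead.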