I used RTextTools to train and validate models on a sentiment data set, and create_analytics() reports precision, recall and F1 score as expected. However, when I apply the trained models to a new, unseen data set, I can retrieve the predicted labels (positive, negative and neutral), but I cannot get the precision, recall and F1 score the way I did with the create_analytics() function.
The code is shown below, followed by a small sample of the output for the unseen data set.
#Read in data
sentiment_data = read.csv("PostiveNegativeNeutral_Results.csv", header = FALSE)
# build dtm
matrix= create_matrix(sentiment_data[,1:19])
# convert the document-term matrix to a plain matrix
mat = as.matrix(matrix)
# build the data to specify response variable, training set, testing set.
container = create_container(mat, as.numeric(as.factor(sentiment_data[,20])),
trainSize=1:1200, testSize=1201:1500,virgin=FALSE)
# train the models
models = train_models(container, algorithms=c("MAXENT", "SVM", "RF",
"BAGGING", "TREE"), set_heldout = 300)
###container1
results = classify_models(container, models)
analyticsResults = create_analytics(container, results)
# summarises with precision, recall and F1 results
summary(analyticsResults)
UnseenSentimentData <- read.csv("unseenSentimentData.csv", header = FALSE)
matrix2= create_matrix(UnseenSentimentData[,1:19])
mat2 = as.matrix(matrix2)
container2 = create_container(mat2, labels=NULL,
trainSize=1:29420,testSize=NULL, virgin=TRUE)
# Classify the unseen data with the models trained above
results2 = classify_models(container2, models)
# Shows the predicted labels (positive, negative, neutral) but no
# precision/recall/F1 score, because create_analytics() cannot be used here. How else can I get them?
summary(results2)
#Small sample output from the summary of results2
MAXENTROPY_LABEL  MAXENTROPY_PROB
9005              Min.   :0.3577
14851             1st Qu.:0.5448
5564              Median :0.6954
                  Mean   :0.6863
                  3rd Qu.:0.8244
                  Max.   :0.9769
How do I use this package to calculate the precision, recall and F1 score for the unseen results (results2)? Or does the package not support this, so that I need to calculate them manually? If more information is required, please let me know.
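For reference, the manual calculation can be sketched in base R as follows. Precision, recall and F1 all compare predictions against known labels, so this only works if true labels exist for the unseen rows. With virgin=TRUE and labels=NULL there is nothing to score against. The helper prf_by_class() and the toy vectors are hypothetical and not part of RTextTools; this is a minimal sketch, not the package's own method.

```r
# Hypothetical helper (not part of RTextTools): per-class precision,
# recall and F1 from two equal-length vectors of labels.
prf_by_class <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  # Use a shared set of levels so the confusion matrix is square.
  classes <- sort(unique(c(as.character(actual), as.character(predicted))))
  actual <- factor(as.character(actual), levels = classes)
  predicted <- factor(as.character(predicted), levels = classes)
  cm <- table(actual = actual, predicted = predicted)  # rows: truth, cols: prediction
  tp <- diag(cm)                   # correctly predicted count per class
  precision <- tp / colSums(cm)    # TP / all rows predicted as the class
  recall <- tp / rowSums(cm)       # TP / all rows truly in the class
  f1 <- 2 * precision * recall / (precision + recall)
  # A class that is never predicted (or never occurs) yields NaN.
  data.frame(class = classes,
             precision = as.numeric(precision),
             recall = as.numeric(recall),
             f1 = as.numeric(f1),
             row.names = NULL)
}

# Toy data standing in for hand-coded labels and model predictions.
actual    <- c(1, 1, 2, 2, 3, 3)
predicted <- c(1, 2, 2, 2, 3, 1)
scores <- prf_by_class(actual, predicted)
print(scores)
```

With real data the call would look something like prf_by_class(true_labels, results2$MAXENTROPY_LABEL), where true_labels is an assumed vector of hand-coded labels for the unseen rows, encoded the same way as the training labels (as.numeric(as.factor(...)) above).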