Channel: Active questions tagged r - Stack Overflow

When comparing a model with new/unseen data using RTextTools, how to extract precision and recall?


I used RTextTools to train and validate a sentiment data set, and it produces the output fine with precision, recall and F1 score. However, when I apply this model to a new/unseen data set, I can retrieve the positive, negative and neutral predictions, but I cannot extract precision, recall and F1 score results as I did with the create_analytics() function.

The code is shown below, along with a small sample of the output for the unseen data set.

# load the package and read in the data
library(RTextTools)
sentiment_data = read.csv("PostiveNegativeNeutral_Results.csv", header = FALSE)

# build the document-term matrix
matrix = create_matrix(sentiment_data[, 1:19])
mat = as.matrix(matrix)

# build the container to specify the response variable, training set and testing set
container = create_container(mat, as.numeric(as.factor(sentiment_data[, 20])),
                             trainSize = 1:1200, testSize = 1201:1500, virgin = FALSE)

# train the models
models = train_models(container, algorithms = c("MAXENT", "SVM", "RF",
                                                "BAGGING", "TREE"))

# classify the held-out test set
results = classify_models(container, models)

analyticsResults = create_analytics(container, results)
# summarises precision, recall and F1 for the test set
summary(analyticsResults)

UnseenSentimentData <- read.csv("unseenSentimentData.csv", header = FALSE) 

# build the matrix and container for the unseen data (no labels, virgin = TRUE)
matrix2 = create_matrix(UnseenSentimentData[, 1:19])
mat2 = as.matrix(matrix2)
container2 = create_container(mat2, labels = NULL,
                              trainSize = 1:29420, testSize = NULL, virgin = TRUE)

#Unseen put to test against the model created above
results2 = classify_models(container2, models)

# Shows the positive, negative and neutral predictions, but no
# precision/recall/F1 score, since create_analytics() cannot be used here —
# so how else do I do this?
summary(results2)

#Small sample output from the summary of results2
MAXENTROPY_LABEL MAXENTROPY_PROB
9005             Min.   :0.3577
14851            1st Qu.:0.5448
5564             Median :0.6954  
                 Mean   :0.6863 
                 3rd Qu.:0.8244 
                 Max.   :0.9769

How do I use this package to calculate the precision, recall and F1 score for the unseen results (results2)? Or does the package not support this, so that I need to calculate them manually? Let me know if more information is required.
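For context, create_analytics() compares predictions against known labels, so these metrics only exist if the unseen data's true labels are known. If they are, one option is computing the metrics by hand from a confusion table; a minimal sketch follows, where `true_labels` and `predicted` are made-up placeholder vectors (in practice `predicted` would come from `results2$MAXENTROPY_LABEL`):

```r
# Hypothetical ground-truth and predicted class labels; substitute
# the real annotations and results2$MAXENTROPY_LABEL in practice.
true_labels <- c(1, 2, 2, 2, 3, 1, 1, 2)
predicted   <- c(1, 1, 2, 2, 3, 3, 1, 2)

# confusion table: rows = truth, columns = prediction
conf <- table(true = true_labels, pred = predicted)

# per-class precision, recall and F1 from the table's diagonal
precision <- diag(conf) / colSums(conf)
recall    <- diag(conf) / rowSums(conf)
f1        <- 2 * precision * recall / (precision + recall)
```

This assumes every class appears in both vectors so the table is square; otherwise, coerce both vectors to factors with the same levels before calling table().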

