Channel: Active questions tagged r - Stack Overflow

Lemmatization in R - Problem with PlainTextDocument function

I get the following error when I try to inspect the TermDocumentMatrix after lemmatizing the corpus in R: no applicable method for 'meta' applied to an object of class "character"

I tried applying the PlainTextDocument function to fix this, but it strips the metadata from the corpus documents, which leads to a different error: Error in [.simple_triplet_matrix(x, terms, docs) : Repeated indices currently not allowed.

This is my code:

library("tm")
library("textstem")

# Read the PDF files into a corpus
corp9 <- Corpus(URISource(files),
               readerControl = list(reader = readPDF))
# Standard cleaning steps
corp9 <- tm_map(corp9, removePunctuation, ucp = TRUE)
corp9 <- tm_map(corp9, removeNumbers)
corp9 <- tm_map(corp9, content_transformer(tolower))
corp9 <- tm_map(corp9, removeWords, stopwords("en"))
corp9 <- tm_map(corp9, stripWhitespace)
# Lemmatization
corp9 <- tm_map(corp9, lemmatize_strings)

corp9 <- tm_map(corp9, PlainTextDocument)

corp.tdm9 <- TermDocumentMatrix(corp9)
inspect(corp.tdm9) 

I would be glad if someone could help me! :)
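A likely explanation, offered as a sketch rather than a confirmed answer: `tm_map()` stores whatever the mapped function returns, and `lemmatize_strings()` returns a bare character vector. Each corpus element therefore stops being a document, so `meta()` has no method to dispatch on. Calling `PlainTextDocument` afterwards rebuilds the documents with default metadata, which leaves them all with the same empty ID. That would account for the "Repeated indices" error. Wrapping the lemmatizer in `content_transformer()` should replace only the text and keep each document and its ID intact. The snippet below assumes the tm and textstem packages are installed. It substitutes a small hypothetical `VectorSource` corpus for the PDF files so it can run on its own:

```r
# Sketch: assumes tm and textstem are installed; the two example texts
# are hypothetical stand-ins for the PDF corpus.
library(tm)
library(textstem)

docs <- c("The cats were running quickly.",
          "A dog runs faster than the cats ran.")
corp <- VCorpus(VectorSource(docs))

# Same cleaning steps as in the question
corp <- tm_map(corp, removePunctuation, ucp = TRUE)
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, stopwords("en"))
corp <- tm_map(corp, stripWhitespace)

# content_transformer() applies lemmatize_strings to the text only,
# so every element stays a PlainTextDocument with its metadata (incl. id)
corp <- tm_map(corp, content_transformer(lemmatize_strings))

# No PlainTextDocument step: the document IDs are still unique
tdm <- TermDocumentMatrix(corp)
inspect(tdm)
```

For the PDF corpus, the equivalent change would be to replace `tm_map(corp9, lemmatize_strings)` with `tm_map(corp9, content_transformer(lemmatize_strings))` and drop the `tm_map(corp9, PlainTextDocument)` line.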
