Channel: Active questions tagged r - Stack Overflow

Initialization of topic-word distribution in R using topicmodels package

  • I am estimating a topic model for two different time periods, period_1 and period_2, using Gibbs sampling in the R package topicmodels, with the number of topics k = 4;
  • I want to use the topic-word distribution from the period_1 model as the initial values for the Gibbs sampler in period_2, i.e. I want the words in the initial count matrix to be distributed over the four topics accordingly;
  • From the topicmodels documentation, I think this can be done with the initialize option in TopicModelcontrol-class, by supplying the period_2 model with the model from period_1 and setting initialize to "beta". Is this correct?
  • I included a reprex below of what I programmed.

My question: is the reprex below doing what I am aiming for?

# LOAD LIBRARY
library(topicmodels)

# LOAD DATA
data("AssociatedPress", package = "topicmodels")

# CREATE PERIOD_1 AND PERIOD_2
set.seed(123)
folding <- sample(rep(seq_len(10), ceiling(nrow(AssociatedPress) / 10))[seq_len(nrow(AssociatedPress))])
period_1 <- which(folding == 1)
period_2 <- which(folding != 1)

# ESTIMATE LDA PERIOD 1
set.seed(123)
train_Gibbs <- LDA(AssociatedPress[period_1,],
  k = 4, 
  method = "Gibbs", 
  control = list(estimate.beta = TRUE, best = TRUE))

# ESTIMATE LDA PERIOD 2 USING WORD-TOPIC DISTRIBUTION PERIOD 1
set.seed(456)
retrain_Gibbs <- LDA(AssociatedPress[period_2,], 
  model = train_Gibbs,
  control = list(estimate.beta = TRUE, initialize = "beta"))
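
To make the goal in the second bullet concrete, here is a toy base-R sketch of what I mean by distributing the words of the initial count matrix over the four topics. Everything in it is hypothetical (the five-word vocabulary, phi_1, word_counts_2, init_counts), and it is only an illustration of the intended behaviour. It is not taken from topicmodels, and I do not know whether initialize = "beta" does exactly this internally.

```r
# Toy illustration in base R. All names and numbers are made up;
# this is NOT necessarily how topicmodels initializes the sampler.
set.seed(1)

k <- 4                                    # number of topics, as in the question
vocab <- c("market", "court", "police", "soviet", "percent")
V <- length(vocab)

# Stand-in for the period_1 topic-word distribution: k x V, rows sum to 1.
# (A fitted topicmodels object stores the log of this in its beta slot.)
phi_1 <- matrix(runif(k * V), nrow = k)
phi_1 <- phi_1 / rowSums(phi_1)

# Stand-in for the corpus-wide word counts in period_2.
word_counts_2 <- c(market = 30, court = 12, police = 8, soviet = 20, percent = 25)

# For each word, spread its tokens over the topics with probability
# proportional to that word's column in phi_1.
init_counts <- sapply(seq_len(V), function(w) {
  p <- phi_1[, w] / sum(phi_1[, w])
  as.vector(rmultinom(1, size = word_counts_2[w], prob = p))
})
dimnames(init_counts) <- list(paste0("topic_", seq_len(k)), vocab)

# k x V matrix of initial topic-word counts; each column sums to the
# word's period_2 count.
init_counts
```

In this sketch the period_1 distribution only determines the starting topic assignments; the sampler would then update them from the period_2 data.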
