- I am estimating a topic model for two different time periods, period_1 and period_2, using Gibbs sampling in the R package topicmodels, with the number of topics k = 4.
- I want to use the topic-word distribution from the period_1 model as the initial values for the Gibbs sampler in period_2, i.e. I want the words in the initial count matrix to be distributed over the four topics.
- Reading the documentation of the topicmodels package, I think this can be done with the initialize option in the TopicModelcontrol-class, by supplying the period_1 model to the period_2 fit and setting initialize to "beta". Is this correct?
- I have included a reprex below of what I programmed.
My question: is the reprex below doing what I am aiming for?
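To make the goal in the second bullet concrete before the reprex itself, here is a toy base-R sketch of the kind of initialization I have in mind. It is only an illustration under my own assumptions: the vocabulary, `beta_1`, and `tokens` are made up, and it is not meant to describe how topicmodels implements this internally.

```r
# Conceptual sketch only: NOT the topicmodels implementation.
set.seed(1)
k <- 4                                    # number of topics
vocab <- c("bank", "court", "game", "rain", "vote")   # made-up vocabulary

# Hypothetical period-1 topic-word probabilities (k x V, each row sums to 1)
beta_1 <- matrix(runif(k * length(vocab)), nrow = k,
                 dimnames = list(NULL, vocab))
beta_1 <- beta_1 / rowSums(beta_1)

# Hypothetical period-2 tokens (one entry per word occurrence)
tokens <- sample(vocab, 200, replace = TRUE)

# Draw an initial topic for every token with probability proportional to
# beta_1[, word] (sample.int() normalizes the weights itself)
init_topic <- vapply(tokens, function(w) {
  sample.int(k, 1, prob = beta_1[, w])
}, integer(1))

# Initial topic-by-word count matrix implied by those assignments
init_counts <- table(factor(init_topic, levels = seq_len(k)),
                     factor(tokens, levels = vocab))
```

In this toy version, `init_counts` plays the role of the initial count matrix: every word occurrence starts in one of the four topics, with topics that gave the word a high probability in period 1 being favored.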
# LOAD LIBRARY
library(topicmodels)
# LOAD DATA
data("AssociatedPress", package = "topicmodels")
# CREATE PERIOD_1 AND PERIOD_2
set.seed(123)
folding <- sample(rep(seq_len(10), ceiling(nrow(AssociatedPress) / 10))[seq_len(nrow(AssociatedPress))])
period_1 <- which(folding == 1)
period_2 <- which(folding != 1)
# ESTIMATE LDA PERIOD 1
set.seed(123)
train_Gibbs <- LDA(AssociatedPress[period_1, ],
                   k = 4,
                   method = "Gibbs",
                   control = list(estimate.beta = TRUE, best = TRUE))
# ESTIMATE LDA PERIOD 2 USING WORD-TOPIC DISTRIBUTION PERIOD 1
set.seed(456)
retrain_Gibbs <- LDA(AssociatedPress[period_2, ],
                     model = train_Gibbs,
                     control = list(estimate.beta = TRUE, initialize = "beta"))
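One sanity check I could imagine running afterwards is to compare the topic-word distributions of the two fits topic by topic. The sketch below assumes only that the `@beta` slot of a fitted model holds a k x V matrix of log word probabilities; `topic_alignment` is a hypothetical helper of my own, not a topicmodels function, and the demonstration uses synthetic matrices rather than the models above.

```r
# Hypothetical helper (not part of topicmodels): correlate the topic-word
# distributions of two fits. Both inputs are k x V matrices of log
# probabilities over the same vocabulary, in the same column order.
topic_alignment <- function(log_beta_a, log_beta_b) {
  stopifnot(ncol(log_beta_a) == ncol(log_beta_b))
  # Rows are topics, so transpose to correlate topic against topic
  cor(t(exp(log_beta_a)), t(exp(log_beta_b)))
}

# Toy demonstration with synthetic data: 4 topics, 50 words
set.seed(1)
p <- matrix(rexp(4 * 50), nrow = 4)
p <- p / rowSums(p)

# A slightly perturbed copy, standing in for a second fit
noisy <- p * matrix(runif(4 * 50, 0.9, 1.1), nrow = 4)
noisy <- noisy / rowSums(noisy)

align <- topic_alignment(log(p), log(noisy))
# For each topic in the first matrix, the best-correlated topic in the second
best_match <- apply(align, 1, which.max)
```

On the reprex objects the call would be `topic_alignment(train_Gibbs@beta, retrain_Gibbs@beta)`. A strong diagonal would be consistent with the period_1 topics having carried over in the same order, but it would not prove that the initialization was used, since both periods are drawn from the same corpus.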