A wrapper for the CTM function from the topicmodels package (itself based on Blei's original code) that returns a nicely formatted topic model.

FitCtmModel(
  dtm,
  k,
  calc_coherence = TRUE,
  calc_r2 = FALSE,
  return_all = TRUE,
  ...
)

Arguments

dtm

A document term matrix of class dgCMatrix

k

Number of topics

calc_coherence

Logical. Do you want to calculate the probabilistic coherence of topics after the model is trained? Defaults to TRUE.

calc_r2

Logical. Do you want to calculate R-squared after the model is trained? Defaults to FALSE.

return_all

Logical. Do you want the raw results of the underlying function returned along with the formatted results? Defaults to TRUE.

...

Other arguments to pass to CTM or TmParallelApply. See note below.

Value

Returns a list with a minimum of two objects, phi and theta. The rows of phi index topics and the columns index tokens. The rows of theta index documents and the columns index topics.
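
As a minimal sketch of that layout, the snippet below builds a toy list with hand-made phi and theta matrices and reads off the top tokens per topic and the dominant topic per document. The values, token names, topic names, and document names are invented for illustration; they are not output from FitCtmModel.

```r
# Toy stand-in for a fitted model; every value and name here is made up.
phi <- matrix(
  c(0.5, 0.3, 0.1, 0.1,
    0.1, 0.1, 0.3, 0.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("t_1", "t_2"), c("gene", "cell", "trial", "patient"))
)
theta <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.6, 0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc_1", "doc_2", "doc_3"), c("t_1", "t_2"))
)
model <- list(phi = phi, theta = theta)

# Rows of phi index topics, so sorting each row gives that topic's top tokens.
# The result has one column per topic.
top_tokens <- apply(model$phi, 1, function(p) {
  names(sort(p, decreasing = TRUE))[1:2]
})

# Rows of theta index documents, so the row-wise maximum picks the
# dominant topic of each document.
dominant_topic <- colnames(model$theta)[apply(model$theta, 1, which.max)]
names(dominant_topic) <- rownames(model$theta)

top_tokens
dominant_topic
```

The same indexing should carry over to a real model object, assuming its phi and theta carry row and column names as in this toy example.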

Note

When passing additional arguments to CTM, you must unlist the elements in the control argument and pass them one by one. See examples for how to do this correctly.
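
If the control settings already live in a list, one possible way to pass them one by one is do.call(). The sketch below is an illustration only: fit_stub is a hypothetical stand-in that merely records its arguments, so the snippet runs without textmineR or topicmodels installed, and the control values are arbitrary. With the real packages, FitCtmModel would presumably take the place of fit_stub.

```r
# A control list as it might be written for CTM's control argument
# (values are illustrative).
ctm_control <- list(
  estimate.beta = TRUE,
  verbose = 0,
  var = list(iter.max = 500, tol = 10^-6),
  em = list(iter.max = 1000, tol = 10^-4)
)

# Hypothetical stand-in for FitCtmModel: it only captures what arrives
# through `...` so the argument passing can be inspected.
fit_stub <- function(dtm, k, ...) {
  list(k = k, extra = list(...))
}

# c() concatenates the two lists one level deep, so each top-level control
# element becomes its own named argument while var and em remain lists.
args <- c(list(dtm = matrix(1, 2, 2), k = 3), ctm_control)  # placeholder dtm
res <- do.call(fit_stub, args)

names(res$extra)
```

Only the top level of the control list is flattened here; nested elements such as var and em are still passed as lists, which matches how they appear in the examples.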

Examples

# Load a pre-formatted dtm 
data(nih_sample_dtm) 

# Fit a CTM model on a sample of documents
model <- FitCtmModel(dtm = nih_sample_dtm[ sample(1:nrow(nih_sample_dtm) , 10) , ], 
                     k = 3, return_all = FALSE)
                     
# the correct way to pass control arguments to CTM
if (FALSE) {
topics_CTM <- FitCtmModel(
    dtm = nih_sample_dtm[ sample(1:nrow(nih_sample_dtm) , 10) , ],
    k = 10,
    calc_coherence = TRUE,
    calc_r2 = TRUE,
    return_all = TRUE,
    estimate.beta = TRUE,
    verbose = 0,
    prefix = tempfile(),
    save = 0,
    keep = 0,
    seed = as.integer(Sys.time()),
    nstart = 1L,
    best = TRUE,
    var = list(iter.max = 500, tol = 10^-6),
    em = list(iter.max = 1000, tol = 10^-4),
    initialize = "random",
    cg = list(iter.max = 500, tol = 10^-5)
)
}