Calculate the log likelihood of a document term matrix given a topic model

This function takes a document term matrix (DTM), a phi matrix (P(word|topic)), and a theta matrix (P(topic|document)) and returns a single value: the log likelihood of the data given the model.

CalcLikelihood(dtm, phi, theta, ...)

Arguments

dtm: The document term matrix of class dgCMatrix.
phi: The phi matrix whose rows index topics and columns index words. The i, j entries are P(word_j | topic_i)
theta: The theta matrix whose rows index documents and columns index topics. The i, j entries are P(topic_j | document_i)
...: Other arguments to pass to TmParallelApply. See note, below.
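
The orientation of phi and theta described above can be checked with a small base-R sketch. All names and values below are hypothetical toy data, and a dense base matrix stands in for the dgCMatrix the function expects. The last line computes only the data-dependent term of a multinomial log likelihood; that form is an assumption for illustration, not necessarily the exact calculation CalcLikelihood performs.

```r
# Toy inputs (hypothetical): 2 documents, 3 words, 2 topics
dtm_toy <- matrix(c(2, 0, 1,
                    0, 3, 1), nrow = 2, byrow = TRUE,
                  dimnames = list(c("d1", "d2"), c("w1", "w2", "w3")))

# phi: rows index topics, columns index words; each row sums to 1
phi_toy <- matrix(c(0.6, 0.1, 0.3,
                    0.1, 0.7, 0.2), nrow = 2, byrow = TRUE,
                  dimnames = list(c("t1", "t2"), colnames(dtm_toy)))

# theta: rows index documents, columns index topics; each row sums to 1
theta_toy <- matrix(c(0.9, 0.1,
                      0.2, 0.8), nrow = 2, byrow = TRUE,
                    dimnames = list(rownames(dtm_toy), rownames(phi_toy)))

# The three matrices must agree on documents, topics, and words
stopifnot(nrow(theta_toy) == nrow(dtm_toy),
          ncol(theta_toy) == nrow(phi_toy),
          ncol(phi_toy) == ncol(dtm_toy),
          all(abs(rowSums(phi_toy) - 1) < 1e-8),
          all(abs(rowSums(theta_toy) - 1) < 1e-8))

# P(word | document): a documents x words matrix
p_word_doc <- theta_toy %*% phi_toy

# Assumed form: sum of word counts times log P(word | document)
ll_term <- sum(dtm_toy * log(p_word_doc))
```

If the shape checks fail on real inputs, phi or theta is probably transposed relative to what the function expects.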

Value

Returns a single value of class numeric: the log likelihood of dtm given phi and theta.

Note

This function performs parallel computation if dtm has more than 3,000 rows. By default it uses all available cores, as reported by detectCores. To use a different number of cores, pass the cpus argument when calling this function.
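
As a minimal sketch of the note above, the call below passes cpus through the ... argument. It assumes textmineR is installed; the variable cpus_to_use and its value are hypothetical. For a dtm with 3,000 rows or fewer, the note implies the setting has no effect, because no parallel computation takes place.

```r
# Hypothetical core count; choose a value suited to your machine
cpus_to_use <- 2

if (requireNamespace("textmineR", quietly = TRUE)) {
  library(textmineR)

  data(nih_sample_dtm)
  data(nih_sample_topic_model)

  # cpus is forwarded through ... to TmParallelApply (per the note above)
  ll_limited <- CalcLikelihood(dtm = nih_sample_dtm,
                               phi = nih_sample_topic_model$phi,
                               theta = nih_sample_topic_model$theta,
                               cpus = cpus_to_use)
}
```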

Examples

# Load a pre-formatted dtm and topic model
data(nih_sample_dtm) 
data(nih_sample_topic_model)

# Get the likelihood of the data given the fitted model parameters
ll <- CalcLikelihood(dtm = nih_sample_dtm, 
                     phi = nih_sample_topic_model$phi, 
                     theta = nih_sample_topic_model$theta)

ll
#> [1] -57416.55