Calculates R-squared for a topic model. It uses a geometric interpretation of R-squared: the proportion of the total distance between each document and the center of all documents that is explained by the model.

CalcTopicModelR2(dtm, phi, theta, ...)

Arguments

dtm

A document term matrix with documents as rows and terms as columns, of class dgCMatrix or matrix.

phi

A matrix with topics as rows and terms as columns, where each entry is p(term_i | topic_j).

theta

A matrix with documents as rows and topics as columns, where each entry is p(topic_j | document_d).

...

Other arguments to be passed to TmParallelApply. See the Note below.
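
The shapes described above have to agree with one another: dtm and theta share a row count (documents), dtm and phi share a column count (terms), and the columns of theta match the rows of phi (topics). A minimal sketch in base R, using small made-up matrices; the sizes and the names n_docs, n_terms, n_topics and dims_ok are illustrative assumptions, not part of the package:

```r
# Toy sizes (hypothetical): 4 documents, 5 terms, 2 topics
set.seed(1)
n_docs <- 4
n_terms <- 5
n_topics <- 2

# Document term matrix of made-up counts: documents by terms
dtm <- matrix(rpois(n_docs * n_terms, lambda = 2), nrow = n_docs)

# phi: topics by terms, each row normalized to sum to 1, like p(term | topic)
phi <- matrix(runif(n_topics * n_terms), nrow = n_topics)
phi <- phi / rowSums(phi)

# theta: documents by topics, each row normalized to sum to 1, like p(topic | document)
theta <- matrix(runif(n_docs * n_topics), nrow = n_docs)
theta <- theta / rowSums(theta)

# The three matrices conform when these dimensions line up
dims_ok <- nrow(dtm) == nrow(theta) &&
  ncol(dtm) == ncol(phi) &&
  ncol(theta) == nrow(phi)
```

If dims_ok is FALSE, the product theta %*% phi cannot be compared with dtm entry by entry.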

Value

Returns an object of class numeric representing the proportion of variability in the data that is explained by the topic model.
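
To make the geometric interpretation concrete, here is a rough base R sketch of one way such a value could be computed. It assumes the fitted counts for a document are its length times the corresponding row of theta %*% phi, and that the "center" is the vector of column means of dtm; neither detail is stated on this page. The function name calc_r2_sketch and the toy data are hypothetical and are not part of the package.

```r
# Sketch only: R-squared as 1 - SSE / SST using squared Euclidean distances.
# Assumption: fitted counts = document length * p(term | document).
calc_r2_sketch <- function(dtm, phi, theta) {
  dtm <- as.matrix(dtm)
  # Row i of (theta %*% phi) is scaled by the length of document i
  yhat <- (theta %*% phi) * rowSums(dtm)
  # Center of all documents: the mean count of each term
  ybar <- colMeans(dtm)
  # Squared distance of each document from its fitted values, summed
  sse <- sum((dtm - yhat)^2)
  # Squared distance of each document from the center, summed
  sst <- sum(sweep(dtm, 2, ybar)^2)
  1 - sse / sst
}

# Toy data (made up): 3 documents, 3 terms, every document has length 3
toy_dtm <- matrix(c(2, 0, 1,
                    0, 3, 0,
                    1, 1, 1), nrow = 3, byrow = TRUE)

# A "perfect" model: one topic per document, reproducing each row exactly
perfect_theta <- diag(3)
perfect_phi <- toy_dtm / rowSums(toy_dtm)
r2_perfect <- calc_r2_sketch(toy_dtm, perfect_phi, perfect_theta)

# A one-topic model that only knows overall term frequencies
flat_theta <- matrix(1, nrow = 3, ncol = 1)
flat_phi <- matrix(colSums(toy_dtm) / sum(toy_dtm), nrow = 1)
r2_flat <- calc_r2_sketch(toy_dtm, flat_phi, flat_theta)
```

In this toy setup r2_perfect equals 1 up to floating-point error, and r2_flat equals 0 because the documents all have the same length, so the one-topic fit coincides with the center. Under this formulation the value can also be negative when a model fits worse than the center does.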

Note

This function performs parallel computation if dtm has more than 3,000 rows. By default it uses all available cores, as reported by detectCores. To change this, pass the cpus argument when calling the function.

Examples

library(textmineR)

# Load a pre-formatted dtm and topic model
data(nih_sample_dtm)
data(nih_sample_topic_model)

# Get the R-squared of the model
r2 <- CalcTopicModelR2(dtm = nih_sample_dtm,
                       phi = nih_sample_topic_model$phi,
                       theta = nih_sample_topic_model$theta)

r2
#> [1] 0.3723717