Calculates R-squared for a topic model. It uses a geometric interpretation of R-squared: the proportion of the total distance of the documents from the center of all documents that is explained by the model.

CalcTopicModelR2(dtm, phi, theta, ...)



dtm: A documents-by-terms document term matrix of class dgCMatrix or of class matrix.


phi: A topics-by-terms matrix where each entry is p(term_i | topic_j).


theta: A documents-by-topics matrix where each entry is p(topic_j | document_d).


...: Other arguments to be passed to TmParallelApply. See the note below.


Returns an object of class numeric representing the proportion of variability in the data that is explained by the topic model.
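The geometric interpretation can be sketched in base R on a toy example. This is a minimal sketch, not necessarily the function's internal implementation: it assumes that a document's fitted value is its length (row sum) times the term distribution implied by theta %*% phi, and that distances are squared Euclidean distances. The toy matrices and the names yhat, ybar, sse, sst and r2_sketch are hypothetical.

```r
# Toy inputs (hypothetical values, for illustration only)
# dtm: documents by terms
dtm <- matrix(c(2, 0, 1,
                0, 3, 1,
                1, 1, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("d", 1:3), paste0("w", 1:3)))

# phi: topics by terms, each row sums to 1
phi <- matrix(c(0.6, 0.1, 0.3,
                0.1, 0.6, 0.3),
              nrow = 2, byrow = TRUE)

# theta: documents by topics, each row sums to 1
theta <- matrix(c(0.9, 0.1,
                  0.1, 0.9,
                  0.5, 0.5),
                nrow = 3, byrow = TRUE)

# Assumed fitted values: document length times the modeled term distribution
yhat <- rowSums(dtm) * (theta %*% phi)

# Center of all documents (mean term-count vector)
ybar <- colMeans(dtm)

# Squared distance from each document to its fitted value (unexplained)
sse <- sum((dtm - yhat)^2)

# Squared distance from each document to the center (total)
sst <- sum(sweep(dtm, 2, ybar)^2)

# Proportion of the total distance explained by the model
r2_sketch <- 1 - sse / sst
r2_sketch
```

Under these assumptions, a model whose fitted values sit closer to the documents than the center does yields a value between 0 and 1.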


This function performs parallel computation if dtm has more than 3,000 rows. By default it uses all available cores, as reported by detectCores. To change this, pass the cpus argument when calling this function.
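As a hedged illustration of the cpus argument, the call below caps the number of cores at two. It assumes that the function and the sample data ship with the textmineR package and that the package is installed; the name r2_two_cores is hypothetical. The sample dtm is small, so the parallel path described above should not be triggered and the argument is expected to have no visible effect here.

```r
# Assumption: CalcTopicModelR2 and the sample data come from textmineR.
if (requireNamespace("textmineR", quietly = TRUE)) {
  data("nih_sample_dtm", package = "textmineR")
  data("nih_sample_topic_model", package = "textmineR")

  # cpus is passed through ... and limits the number of cores used
  r2_two_cores <- textmineR::CalcTopicModelR2(
    dtm = nih_sample_dtm,
    phi = nih_sample_topic_model$phi,
    theta = nih_sample_topic_model$theta,
    cpus = 2
  )
}
```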


# Load a pre-formatted dtm and topic model
data(nih_sample_dtm)
data(nih_sample_topic_model)

# Get the R-squared of the model
r2 <- CalcTopicModelR2(dtm = nih_sample_dtm,
                       phi = nih_sample_topic_model$phi,
                       theta = nih_sample_topic_model$theta)

r2
#> [1] 0.3723717