This function takes a phi matrix, P(token|topic), and a theta matrix, P(topic|document), and returns the gamma matrix (also called phi prime), P(topic|token). The result can be used for classifying new documents and for generating alternative topic labels.
CalcGamma(phi, theta, p_docs = NULL, correct = TRUE)
phi: The phi matrix, whose rows index topics and whose columns index tokens. The i, j entry is P(token_j | topic_i).
theta: The theta matrix, whose rows index documents and whose columns index topics. The i, j entry is P(topic_j | document_i).
p_docs: A numeric vector of length nrow(theta) that is proportional to the number of terms in each document. This argument is optional and defaults to NULL.
correct: Logical. Should NA or NaN values in the final result be set to zero? Useful when hitting computational underflow. Defaults to TRUE; set to FALSE for troubleshooting or diagnostics.
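As a rough illustration of the p_docs argument, the sketch below derives document weights from a small, made-up count matrix and shows how they can shift the implied topic probabilities. The names dtm_toy, theta_toy and doc_lengths are hypothetical, and the weighted average is an assumption about how such weights enter the calculation, not a description of the function's internals.

```r
# Hypothetical document-term counts: 3 documents, 4 tokens
dtm_toy <- matrix(c(2, 0, 1,  0,
                    0, 5, 5, 10,
                    1, 1, 0,  0),
                  nrow = 3, byrow = TRUE)

# Hypothetical theta: rows are documents, columns are topics
theta_toy <- matrix(c(0.9, 0.1,
                      0.2, 0.8,
                      0.5, 0.5),
                    nrow = 3, byrow = TRUE)

# Document lengths: one value per row of theta
doc_lengths <- rowSums(dtm_toy)

# Normalised here for clarity; the argument only needs to be proportional
p_docs <- doc_lengths / sum(doc_lengths)

# Topic probabilities with equal document weights ...
p_uniform <- rep(1 / nrow(theta_toy), nrow(theta_toy))
p_topics_uniform <- as.numeric(p_uniform %*% theta_toy)

# ... and with length-based weights, where the long second document dominates
p_topics_weighted <- as.numeric(p_docs %*% theta_toy)
```

In this toy case the second document holds most of the tokens, so the weighted topic probabilities lean towards its dominant topic. A vector such as doc_lengths is the kind of value p_docs expects.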
Returns a matrix whose rows correspond to topics and whose columns correspond to tokens. The i, j entry is P(topic_i | token_j).
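One way to see what the returned matrix contains is to apply Bayes' rule by hand in base R. The sketch below is a minimal illustration under stated assumptions (uniform document weights, toy values, hypothetical object names such as phi_toy and gamma_sketch); it is not necessarily how CalcGamma is implemented.

```r
# Toy phi: 2 topics x 4 tokens; the last token has zero probability everywhere
phi_toy <- matrix(c(0.5, 0.3, 0.2, 0,
                    0.1, 0.1, 0.8, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("t_1", "t_2"),
                                  c("apple", "bank", "cell", "unseen")))

# Toy theta: 4 documents x 2 topics
theta_toy <- matrix(c(0.9, 0.1,
                      0.5, 0.5,
                      0.2, 0.8,
                      0.4, 0.6),
                    nrow = 4, byrow = TRUE)

# Assumption: every document carries equal weight
p_docs <- rep(1 / nrow(theta_toy), nrow(theta_toy))

# P(topic) = sum over documents of P(topic | document) * P(document)
p_topics <- as.numeric(p_docs %*% theta_toy)

# joint[k, v] = P(token_v | topic_k) * P(topic_k); the vector recycles down rows
joint <- phi_toy * p_topics

# Divide each column by P(token_v) to obtain P(topic_k | token_v)
gamma_sketch <- sweep(joint, 2, colSums(joint), "/")

# The all-zero token yields 0 / 0 = NaN; set it to zero, as correct = TRUE describes
gamma_sketch[is.na(gamma_sketch)] <- 0
```

Each column of gamma_sketch for a token with nonzero probability sums to one, while the column for the unseen token is all zeros after the cleanup step.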
# Load a pre-formatted dtm and topic model
data(nih_sample_topic_model)

# Make a gamma matrix, P(topic|words)
gamma <- CalcGamma(phi = nih_sample_topic_model$phi,
                   theta = nih_sample_topic_model$theta)
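The description mentions classifying new documents. A minimal sketch of one possible approach follows: weight each token's topic probabilities by that token's share of the new document. The matrix gamma_toy and the counts in new_counts are made up for illustration, and this dot-product scoring is an assumption rather than the package's documented prediction method.

```r
# Hypothetical P(topic | token) matrix: 2 topics x 3 tokens, columns sum to 1
gamma_toy <- matrix(c(0.8, 0.6, 0.2,
                      0.2, 0.4, 0.8),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("t_1", "t_2"),
                                    c("apple", "bank", "cell")))

# Token counts for a new, unseen document (same token order as gamma_toy)
new_counts <- c(apple = 3, bank = 1, cell = 0)

# Share of the document taken up by each token
token_share <- new_counts / sum(new_counts)

# Topic score = sum over tokens of P(topic | token) * share of that token
topic_scores <- as.numeric(gamma_toy %*% token_share)
names(topic_scores) <- rownames(gamma_toy)

# The highest-scoring topic serves as the predicted label
predicted_topic <- names(which.max(topic_scores))
```

Because each column of gamma_toy sums to one and the token shares sum to one, the scores form a probability distribution over topics.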