Calculates the probabilistic coherence of a topic or topics. This approximates semantic coherence or human understandability of a topic.

CalcProbCoherence(phi, dtm, M = 5)

Arguments

phi

A numeric matrix or a numeric vector. The vector, or each row of the matrix, represents the numeric relationship between a topic and the terms. For example, this relationship may be p(word|topic) or p(topic|word).

dtm

A document-term matrix or term co-occurrence matrix, either of class matrix or of a class that inherits from the Matrix package. Columns must index terms.

M

An integer giving the number of top-ranked words per topic to use in the calculation. Defaults to 5.

Value

Returns a numeric vector with one value per input topic, giving the probabilistic coherence of each topic.

Examples

# Load a pre-formatted dtm and topic model
data(nih_sample_topic_model)
data(nih_sample_dtm) 

CalcProbCoherence(phi = nih_sample_topic_model$phi, dtm = nih_sample_dtm, M = 5)
#>        t_1        t_2        t_3        t_4        t_5        t_6        t_7 
#> 0.05409345 0.41333333 0.16700000 0.19807143 0.15443478 0.26428571 0.24666667 
#>        t_8        t_9       t_10       t_11       t_12       t_13       t_14 
#> 0.26169048 0.16490323 0.09089027 0.05856349 0.21644444 0.17644444 0.11154266 
#>       t_15       t_16       t_17       t_18       t_19       t_20       t_21 
#> 0.35659259 0.20107407 0.24287931 0.48583598 0.15270529 0.37666667 0.38766667 
#>       t_22       t_23       t_24       t_25       t_26       t_27       t_28 
#> 0.03250000 0.04352301 0.23952941 0.05400000 0.12155796 0.16772038 0.08503759 
#>       t_29       t_30 
#> 0.04866667 0.33266667
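
To give a feel for what these numbers mean, here is a minimal base R sketch on toy data. It assumes the definition given in the textmineR vignette: for each pair of words {a, b} among a topic's top M words, with a ranked above b, take P(b|a) - P(b) over documents and average the differences. The helper `sketch_coherence` and the objects `toy_dtm` and `toy_phi` are hypothetical names made up for this illustration. This is not the package's implementation, and it may differ from `CalcProbCoherence` in edge cases.

```r
# Hypothetical helper for illustration only; not part of textmineR.
# Scores one topic (a named numeric vector over terms) against a
# document-term matrix whose columns are named by term.
sketch_coherence <- function(topic, dtm, M = 5) {
  # Top M terms of the topic, most probable first
  top <- names(sort(topic, decreasing = TRUE))[seq_len(min(M, length(topic)))]
  # TRUE where a document contains the term at least once
  present <- dtm[, top, drop = FALSE] > 0
  n_docs <- nrow(present)
  diffs <- numeric(0)
  for (i in seq_len(length(top) - 1)) {
    for (j in (i + 1):length(top)) {
      a <- present[, i]  # higher-ranked word
      b <- present[, j]  # lower-ranked word
      p_b <- sum(b) / n_docs
      p_b_given_a <- if (sum(a) > 0) sum(a & b) / sum(a) else 0
      diffs <- c(diffs, p_b_given_a - p_b)
    }
  }
  mean(diffs)
}

# Toy corpus: "gene" and "cell" always co-occur; "tax" appears elsewhere
toy_dtm <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    0, 0, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("d", 1:4), c("gene", "cell", "tax"))
)

# Two toy topics: t_1 favors co-occurring words, t_2 mixes unrelated ones
toy_phi <- rbind(
  t_1 = c(gene = 0.6, cell = 0.3, tax = 0.1),
  t_2 = c(gene = 0.5, cell = 0.1, tax = 0.4)
)

toy_coherence <- sapply(
  rownames(toy_phi),
  function(k) sketch_coherence(toy_phi[k, ], toy_dtm, M = 2)
)
toy_coherence
#>  t_1  t_2 
#>  0.5 -0.5 
```

Under this sketch, a topic whose top words tend to appear in the same documents scores above zero, while a topic whose top words avoid each other scores below zero. A score near zero indicates the words co-occur about as often as statistical independence would predict.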