TermDocFreq.Rd
TermDocFreq takes a document term matrix as input and returns a data frame with columns for term frequency, document frequency, and inverse document frequency.
TermDocFreq(dtm)
Argument | Description
---|---
dtm | A document term matrix of class
Returns a data.frame with 4 columns. The first column, term, is a vector of token labels. The second column, term_freq, is the number of times term appears in the entire corpus. The third column, doc_freq, is the number of documents in which term appears. The fourth column, idf, is the log-weighted inverse document frequency of term.
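The four columns can be illustrated on a small toy matrix. This is a minimal sketch, not the package's own implementation, and it rests on assumptions not stated above: documents are rows, terms are columns with their labels in colnames(), the matrix is a sparse dgCMatrix from the Matrix package, and idf is the natural log of N / doc_freq, where N is the number of documents. The helper term_doc_freq_sketch is a hypothetical name used only for illustration.

```r
# Minimal sketch of the documented columns (not TermDocFreq itself).
# Assumptions: rows = documents, columns = terms labelled by colnames(),
# and idf = log(N / doc_freq) using the natural logarithm.
library(Matrix)

# Toy dtm: 3 documents x 4 terms, stored as a sparse dgCMatrix
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3),
  j = c(1, 2, 1, 3, 1, 4),
  x = c(2, 1, 1, 3, 1, 1),
  dims = c(3, 4),
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("cell", "gene", "mouse", "tumor"))
)

# Hypothetical helper mirroring the four documented output columns
term_doc_freq_sketch <- function(dtm) {
  n_docs    <- nrow(dtm)                          # N: number of documents
  term_freq <- as.numeric(colSums(dtm))           # total count of each term in the corpus
  doc_freq  <- as.integer(colSums(dtm > 0))       # number of documents containing each term
  data.frame(
    term      = colnames(dtm),
    term_freq = term_freq,
    doc_freq  = doc_freq,
    idf       = log(n_docs / doc_freq),           # assumed natural-log idf
    stringsAsFactors = FALSE
  )
}

term_doc_freq_sketch(dtm)
```

In the toy matrix, cell appears in all three documents, so its idf is log(3/3) = 0, while gene, mouse, and tumor each appear in one document and get log(3), about 1.10. Rare terms therefore receive the largest idf values.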
```r
# TermDocFreq and the nih_sample data sets come from the textmineR package
library(textmineR)

# Load a pre-formatted dtm and topic model
data(nih_sample_dtm)
data(nih_sample_topic_model)

# Get the term frequencies
term_freq_mat <- TermDocFreq(nih_sample_dtm)

str(term_freq_mat)
#> 'data.frame': 5210 obs. of 4 variables:
#> $ term : chr "folding" "tosuprttedprtmnt" "importation" "hd" ...
#> $ term_freq: num 1 1 1 1 1 1 1 1 1 1 ...
#> $ doc_freq : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ idf : num 4.61 4.61 4.61 4.61 4.61 ...
```
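The idf value of 4.61 shown for terms that occur in a single document can be checked by hand. This worked example assumes that idf = log(N / doc_freq) with the natural logarithm and that nih_sample_dtm contains N = 100 documents; neither assumption is stated in the reference above.

```r
# Assumptions: idf = log(N / doc_freq) with the natural log,
# and nih_sample_dtm has N = 100 documents (assumed, not stated above)
n_docs   <- 100   # assumed number of rows (documents) in nih_sample_dtm
doc_freq <- 1     # a term that appears in exactly one document
idf <- log(n_docs / doc_freq)
round(idf, 2)     # 4.61
```

With the package data loaded, nrow(nih_sample_dtm) gives the actual number of documents to substitute for the assumed N.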