This function takes a sparse document term matrix (DTM) as input and returns a character vector whose length equals the number of rows of the DTM.

Dtm2Docs(dtm, ...)

Arguments

dtm

A sparse Matrix from the Matrix package whose rownames correspond to documents and colnames correspond to words.

...

Other arguments to be passed to TmParallelApply. See note, below.

Value

Returns a character vector. Each entry of this vector corresponds to one row of dtm.
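
As a rough illustration of this mapping, the sketch below rebuilds one string per row of a toy DTM by repeating each column name as many times as its count. The helper rebuild_docs and the toy data are hypothetical and are not part of textmineR; the real function's word order and spacing may differ, and the original word order of a document cannot be recovered from a DTM.

```r
library(Matrix)

# Toy DTM: 2 documents (rows) x 3 words (columns); the counts are made up.
toy_dtm <- sparseMatrix(
  i = c(1, 1, 2, 2),
  j = c(1, 2, 2, 3),
  x = c(2, 1, 1, 3),
  dims = c(2, 3),
  dimnames = list(c("doc_a", "doc_b"), c("gene", "mice", "risk"))
)

# Hypothetical helper (an assumption for illustration, not the package's code):
# repeat each word by its count and paste the result into one string per row.
rebuild_docs <- function(dtm) {
  vocab <- colnames(dtm)
  docs <- vapply(seq_len(nrow(dtm)), function(r) {
    counts <- as.numeric(dtm[r, ])
    paste(rep(vocab, times = counts), collapse = " ")
  }, character(1))
  names(docs) <- rownames(dtm)
  docs
}

toy_docs <- rebuild_docs(toy_dtm)
toy_docs
```

The result has one entry per row of toy_dtm, named by the row names, which mirrors the shape of the value described above.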

Note

This function performs parallel computation if dtm has more than 3,000 rows. By default it uses all available cores, as reported by detectCores. To change this, pass the cpus argument when calling this function.
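
A minimal sketch of choosing a core count and passing it through cpus is shown here. The variable names and the choice to leave one core free are assumptions for illustration. The call to Dtm2Docs only runs if textmineR is installed, and for a DTM of 3,000 rows or fewer the note above implies no parallel computation takes place, so cpus would have no effect in that case.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to one core.
available <- detectCores()
cpus_to_use <- if (is.na(available)) 1L else max(1L, available - 1L)

# Only attempt the call when textmineR is available on this machine.
if (requireNamespace("textmineR", quietly = TRUE)) {
  data("nih_sample_dtm", package = "textmineR")
  new_docs <- textmineR::Dtm2Docs(dtm = nih_sample_dtm, cpus = cpus_to_use)
}
```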

Examples

# Load a sample of documents and a pre-formatted DTM
data(nih_sample)
data(nih_sample_dtm)

# see the original documents
nih_sample$ABSTRACT_TEXT[ 1:3 ]
#> [1] "Methamphetamine (MA) is remarkably addictive and relapse to excessive use is highly probable and poses serious health concerns. Genetic factors have been little studied with regard to their role in susceptibility to MA addiction or relapse. A key goal will be to utilize a validated animal model of genetically-determined high and low susceptibility to MA use to improve genetic mapping resolution and to study an already identified neuroimmune gene network that influences MA response in this genetic model. In Aim 1, in coordination with Animal Core 3, replicated sets of selected mouse lines bred for high and low voluntary consumption of MA will be produced and QTL mapping will be performed by the Biostatistics and Genetics Core 2. These mice will be used for studies proposed in Components 7, 8 and 9. In Aim 2, neurocircuitry will be examined in the selected lines, using cFos mapping after acute and repeated MA treatment, and these data will be used to identify brain regions for immune factor analysis, and will be compared to imaging results from Scientific Component 7 for brains from the MA consumption selected lines. In Aim 3, qPCR immunology arrays will be used to examine brain and peripheral blood mononuclear cell gene expression for immune specific genes using samples from selectively bred MA drinking line mice that have been acutely or repeatedly treated with MA or with saline, or are in \"remission\". These data will be used in additional network analysis by the Biostatistics and Genetics Core 2 and compared to human peripheral gene expression results for controls, chronic MA users and user in remission. In Aim 4, cognitive, anxiety-like, and impulsivity-like traits will be examined in drug naive, acute, and repeated MA-exposed MADR mice, as well as mice in remission from MA exposure. 
Tissue from mice treated in the same way will be transferred to Translational Service Core 5 for analysis of immune factors and to Component 7 for imaging;half of each brain will be sent for each purpose to allow individual animal correlations to be performed. These data will also be examined for correspondence between behavior, neurocircuitry and immune system alterations. In addition, impulsivity-like measures in mice will be compared to similar measures in humans from Component 7. Finally, data collected in Component 9 will inform Component 8, with regard to traits and which of the selected lines to be studied for immunotherapeutic intervention. Cross-species analyses across components will identify key immune factors associated with chronic MA exposure (and remission) and MA-induced neuropsychiatric impairments, with the goal of ultimately identifying novel immunotherapeutic interventions."                                                                                                                                                                                                                                                                                                                             
#> [2] " Project Summary Risk bimarkers have become increasingly important in clinical decision making, guiding patients and their clinicians in choosing the most appropriate course of therapy or surveillance after treatment. Constructing accurate and individualized prediction rules and conducting rigorous validation are critical to the cancer biomarker field. Prospective cohort studies are crucial for such evaluation as time to event carries more information about a marker's value on early detection and prognosis than a simple measure of disease status. But prospective biomarker evaluation is challenging. Until now there has been little guidance pro- vided for statistical design and analysis of these studies. We propose to extend our previously funded effort to address several new challenges in prospective marker evaluation. The proposal will emphasize three unique aspects in prospective biomarker evaluation. First, for many cancers disease outcome may be heterogeneous due to the biological nature of the disease or selection of treatments. Constructing and validating prognostic and treatment selection rules based on more specific prediction of the risk of develop- ing aggressive cancer as opposed to indolent cancer is of great clinical interest yet analytically challenging. In Aim 1 we will provide statistical tools for developing and validating risk markers in a population with an unknown mixture of indolent and aggressive cancers. We propose statistical methods that facilitate the development and evaluation of prognostic markers for risk stratification. Methods for deriving and evalu- ating individualized treatment rules in the presence of a mixture of indolent and aggressive cancers will be considered. 
Second, among patients diagnosed with cancer who chose to be on active surveillance, developing monitoring tools to make adaptive monitoring or intervention recommendations with longitudinal biomarkers may alleviate overtreatment without missing signs of progression. In Aim 2 we will consider flexible procedures to quantify the updated predictive accuracy of longitudinal markers. In addition, we will develop and evaluate decision rules on the basis of risk, incorporating both cross-sectional and longi- tudinal marker information. The ascertainment of marker information in a large cohort requires enormous resources. Cost-effective cohort sampling is therefore highly desirable. In Aim 3 we will develop procedures to improve the efficiency of estimating risk and accuracy parameters and rigorously evaluate and compare different choices of matching/stratification rules and identify optimal pairs of analyses and sampling strate- gies. We will also develop estimation procedures for evaluating longitudinal markers in two-phase studies. Applications in cancer biomarker development provide a context for our research. Data from the Early De- tection Research Network and from several large cohort studies will be analyzed. Programs and algorithms developed in this proposal will be made available to public."
#> [3] "    DESCRIPTION (provided by applicant): Despite enormous efforts, no effective vaccine is currently available against HIV/AIDS. This is largely due to the virus's ability to evade neutralizing antibodies. The trimeric HIV envelope glycoprotein gp160 (HIV Env), which is exposed to the host immune response on the surface of the virion, effectively conceals epitopes from antibodies while maintaining its ability to promote viral entry. The few exposed antibody-binding sites are structurally unstable and sustain substantial sequence variability. Consequently, few broadly neutralizing antibodies have been identified. Here we will develop single molecule Fluorescence Resonance Energy Transfer (smFRET) imaging to monitor the conformational landscape of individual native Env molecules on the surface of an HIV virus. We will apply new enzymatic methods to introduce organic fluorophores into HIV Env at positions that don't interfere with infectivity. The placement of two dyes within one Env molecule will allow the application of smFRET to report conformational changes of the unliganded HIV Env from various angles. Following the characterization of the unliganded HIV Env, we will determine how its conformation dynamics change in response to receptor and antibody binding. Our approach will identify critical conformational states of HIV Env that should be targeted for vaccine development, reveal the molecular mechanism underlying immune evasion, and understand the potency of existing neutralizing antibodies.        
"

# see the new documents re-structured from the DTM
new_docs <- Dtm2Docs(dtm = nih_sample_dtm)

new_docs[ 1:3 ]
#> 8693991
#>                                                                                                                                                                                                                                                                                                                                                                                                      " concerns  acutely  sets  addictive  qpcr  transferred  repeatedly  qtl  poses  mononuclear  neuroimmune  cfos  saline  replicated  madr  voluntary  neuropsychiatric  methamphetamine  correspondence  probable  impairments  ultimately  collected  neurocircuitry  neurocircuitry  impulsivity  impulsivity  anxiety  coordination  arrays  correlations  immunotherapeutic  immunotherapeutic  exposed  bred  bred  remarkably  traits  traits  purpose  naive  half  line  brains  regions  immunology  excessive  selectively  resolution  user  produced  regard  regard  genetics  genetics  treated  treated  influences  remission  remission  remission  remission  service  peripheral  peripheral  controls  similar  repeated  repeated  utilize  finally  validated  biostatistics  biostatistics  samples  studied  studied  susceptibility  susceptibility  relapse  relapse  addiction  identifying  lines  lines  lines  lines  compared  compared  compared  measures  measures  inform  performed  performed  drinking  examined  examined  examined  selected  selected  selected  selected  determined  analyses  mapping  mapping  mapping  species  genetically  additional  examine  acute  acute  network  network  chronic  chronic  components  components  cognitive  users  ma  ma  ma  ma  ma  ma  ma  ma  ma  ma  ma  ma  ma  ma  low  low  alterations  translational  cross  animal  animal  animal  humans  component  component  component  component  component  individual  consumption  consumption  highly  induced  genes  factor  scientific  blood  behavior  identified  mouse  drug  exposure  exposure  interventions  
key  key  mice  mice  mice  mice  mice  mice  gene  gene  gene  system  improve  addition  expression  expression  results  results  intervention  tissue  imaging  imaging  analysis  analysis  analysis  identify  identify  immune  immune  immune  immune  immune  model  model  proposed  goal  goal  high  high  genetic  genetic  genetic  response  role  treatment  factors  factors  factors  core  core  core  core  human  brain  brain  brain  study  specific  cell  aim  aim  aim  aim  data  data  data  data  health  studies " 
#> 8693362
#> " missing  choices  ascertainment  tection  strate  ating  tudinal  bimarkers  emphasize  vided  evalu  overtreatment  alleviate  chose  signs  gies  deriving  choosing  desirable  longi  heterogeneous  validation  updated  carries  ing  analytically  efficiency  diagnosed  constructing  constructing  de  flexible  challenging  challenging  surveillance  surveillance  accuracy  accuracy  estimation  recommendations  sectional  stratification  stratification  opposed  estimating  sampling  sampling  mixture  mixture  matching  rigorous  algorithms  challenges  parameters  increasingly  pairs  indolent  indolent  indolent  guiding  rigorously  guidance  conducting  analyzed  prognosis  validating  validating  predictive  incorporating  decision  decision  event  enormous  rules  rules  rules  rules  rules  procedures  procedures  procedures  making  crucial  simple  selection  selection  prediction  prediction  individualized  individualized  applications  accurate  unknown  aggressive  aggressive  aggressive  quantify  clinicians  pro  evaluating  considered  effort  aspects  prospective  prospective  prospective  prospective  measure  biomarker  biomarker  biomarker  biomarker  great  extend  status  marker  marker  marker  marker  prognostic  prognostic  requires  nature  previously  field  cohort  cohort  cohort  cohort  made  interest  treatments  funded  cancers  cancers  cancers  compare  presence  active  optimal  summary  longitudinal  longitudinal  longitudinal  facilitate  analyses  monitoring  monitoring  address  network  statistical  statistical  statistical  detection  make  outcome  context  developing  developing  resources  tools  tools  cross  adaptive  biomarkers  phase  markers  markers  markers  markers  cost  highly  biological  progression  basis  unique  evaluation  evaluation  evaluation  evaluation  evaluation  evaluate  evaluate  large  large  public  due  therapy  information  information  information  developed  propose  propose  
programs  improve  methods  methods  population  effective  addition  early  early  intervention  important  analysis  design  identify  patients  patients  critical  proposal  proposal  time  treatment  treatment  treatment  risk  risk  risk  risk  risk  risk  based  provide  provide  develop  develop  develop  develop  development  development  disease  disease  disease  cancer  cancer  cancer  cancer  cancer  project  specific  clinical  clinical  aim  aim  aim  data  studies  studies  studies  studies  research  research " 
#> 8607498
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         " conceals  landscape  dyes  conformation  angles  introduce  trimeric  interfere  virion  fluorophores  placement  epitopes  structurally  smfret  smfret  evade  sustain  gp  resonance  evasion  aids  maintaining  potency  unstable  envelope  unliganded  unliganded  sequence  exposed  exposed  entry  conformational  conformational  conformational  report  transfer  infectivity  glycoprotein  monitor  organic  reveal  molecule  molecule  broadly  native  enormous  promote  fluorescence  molecules  variability  enzymatic  neutralizing  neutralizing  neutralizing  effectively  largely  antibody  antibody  positions  apply  underlying  single  surface  surface  substantial  dynamics  
characterization  antibodies  antibodies  antibodies  antibodies  energy  virus  virus  env  env  env  env  env  env  env  states  efforts  viral  binding  binding  existing  receptor  targeted  individual  understand  mechanism  change  sites  ability  ability  vaccine  vaccine  host  identified  due  methods  effective  approach  imaging  application  identify  immune  immune  critical  molecular  hiv  hiv  hiv  hiv  hiv  hiv  hiv  hiv  response  response  develop  development  description  determine  applicant  provided "