This function takes a sparse document-term matrix (DTM) as input and returns a character vector whose length equals the number of rows of the input DTM.
Dtm2Docs(dtm, ...)
dtm: A sparse matrix from the Matrix package whose rownames correspond to documents and whose colnames correspond to words.
...: Other arguments to be passed to TmParallelApply. See the note below.
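To make the expected input concrete, the sketch below builds a tiny DTM with the Matrix package's `sparseMatrix()`. The corpus, document IDs, and words are made up for illustration. What matters is the layout: one row per document (named by rownames), one column per word (named by colnames), and term counts as the entries.

```r
library(Matrix)

# Hypothetical toy corpus: 2 documents, 3 vocabulary words.
# Row i is document i, column j is word j, x holds the counts.
toy_dtm <- sparseMatrix(
  i = c(1, 1, 2, 2),                        # document (row) indices
  j = c(1, 3, 2, 3),                        # word (column) indices
  x = c(2, 1, 1, 3),                        # count of word j in document i
  dims = c(2, 3),
  dimnames = list(c("doc_1", "doc_2"),              # rownames: document IDs
                  c("alpha", "beta", "gamma"))      # colnames: vocabulary
)

toy_dtm
```

Unlisted (i, j) pairs are stored implicitly as zeros, which is what keeps a real DTM with a large vocabulary compact.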
Returns a character vector. Each entry of this vector corresponds to one row of dtm.
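Conceptually, each entry can be rebuilt by repeating every word in a row as many times as its count and pasting the words together. The helper `dtm_rows_to_docs()` below is a hypothetical, minimal sketch of that idea, not textmineR's actual implementation. It processes rows sequentially and, unlike the real output shown in the example below, it does not add leading or trailing spaces.

```r
# Hypothetical helper (not textmineR's code): rebuild a bag-of-words
# "document" from each row of a document-term matrix.
dtm_rows_to_docs <- function(dtm) {
  vocab <- colnames(dtm)
  docs <- vapply(seq_len(nrow(dtm)), function(i) {
    counts <- as.numeric(dtm[i, ])          # term counts for document i
    words  <- rep(vocab, times = counts)    # repeat each word by its count
    paste(words, collapse = " ")            # an all-zero row becomes ""
  }, character(1))
  names(docs) <- rownames(dtm)              # keep document IDs as names
  docs
}

# Toy DTM: 2 documents x 3 words (a dense base matrix keeps this self-contained)
dtm <- matrix(c(2, 0,
                0, 1,
                1, 3),
              nrow = 2,
              dimnames = list(c("doc_1", "doc_2"),
                              c("alpha", "beta", "gamma")))

dtm_rows_to_docs(dtm)
```

For a sparse Matrix input, `dtm[i, ]` should return row i as a plain numeric vector, so the same helper ought to work on a `dgCMatrix` too, at the cost of densifying one row at a time.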
This function performs parallel computation if dtm has more than 3,000 rows. The default is to use all available cores according to detectCores. However, this can be modified by passing the cpus argument when calling this function.
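The 3,000-row rule amounts to a dispatch between sequential and parallel application. `apply_over_rows()` below is a hypothetical sketch of that pattern using base R's parallel package (`detectCores()` and `mclapply()`); it is not how TmParallelApply is implemented. Because `mclapply()` relies on forking, the sketch falls back to `lapply()` on Windows.

```r
# Hypothetical dispatcher (not textmineR's TmParallelApply): apply FUN to
# each row index, in parallel only when there are more than `threshold` rows.
apply_over_rows <- function(n_rows, FUN, cpus = parallel::detectCores(),
                            threshold = 3000) {
  idx <- seq_len(n_rows)
  use_parallel <- n_rows > threshold &&
    isTRUE(cpus > 1) &&                     # detectCores() may return NA
    .Platform$OS.type != "windows"          # mclapply() forks; not on Windows
  if (use_parallel) {
    parallel::mclapply(idx, FUN, mc.cores = cpus)
  } else {
    lapply(idx, FUN)
  }
}

# 10 rows is below the 3,000-row threshold, so this runs sequentially
squares <- unlist(apply_over_rows(10, function(i) i^2, cpus = 2))
squares
```

With the real function, the documented way to limit the number of cores is the cpus argument, for example `Dtm2Docs(dtm = nih_sample_dtm, cpus = 2)`.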
# Load the sample documents and a pre-formatted dtm
data(nih_sample)
data(nih_sample_dtm)
# see the original documents
nih_sample$ABSTRACT_TEXT[ 1:3 ]
#> [1] "Methamphetamine (MA) is remarkably addictive and relapse to excessive use is highly probable and poses serious health concerns. Genetic factors have been little studied with regard to their role in susceptibility to MA addiction or relapse. A key goal will be to utilize a validated animal model of genetically-determined high and low susceptibility to MA use to improve genetic mapping resolution and to study an already identified neuroimmune gene network that influences MA response in this genetic model. In Aim 1, in coordination with Animal Core 3, replicated sets of selected mouse lines bred for high and low voluntary consumption of MA will be produced and QTL mapping will be performed by the Biostatistics and Genetics Core 2. These mice will be used for studies proposed in Components 7, 8 and 9. In Aim 2, neurocircuitry will be examined in the selected lines, using cFos mapping after acute and repeated MA treatment, and these data will be used to identify brain regions for immune factor analysis, and will be compared to imaging results from Scientific Component 7 for brains from the MA consumption selected lines. In Aim 3, qPCR immunology arrays will be used to examine brain and peripheral blood mononuclear cell gene expression for immune specific genes using samples from selectively bred MA drinking line mice that have been acutely or repeatedly treated with MA or with saline, or are in \"remission\". These data will be used in additional network analysis by the Biostatistics and Genetics Core 2 and compared to human peripheral gene expression results for controls, chronic MA users and user in remission. In Aim 4, cognitive, anxiety-like, and impulsivity-like traits will be examined in drug naive, acute, and repeated MA-exposed MADR mice, as well as mice in remission from MA exposure. 
Tissue from mice treated in the same way will be transferred to Translational Service Core 5 for analysis of immune factors and to Component 7 for imaging;half of each brain will be sent for each purpose to allow individual animal correlations to be performed. These data will also be examined for correspondence between behavior, neurocircuitry and immune system alterations. In addition, impulsivity-like measures in mice will be compared to similar measures in humans from Component 7. Finally, data collected in Component 9 will inform Component 8, with regard to traits and which of the selected lines to be studied for immunotherapeutic intervention. Cross-species analyses across components will identify key immune factors associated with chronic MA exposure (and remission) and MA-induced neuropsychiatric impairments, with the goal of ultimately identifying novel immunotherapeutic interventions."
#> [2] " Project Summary Risk bimarkers have become increasingly important in clinical decision making, guiding patients and their clinicians in choosing the most appropriate course of therapy or surveillance after treatment. Constructing accurate and individualized prediction rules and conducting rigorous validation are critical to the cancer biomarker field. Prospective cohort studies are crucial for such evaluation as time to event carries more information about a marker's value on early detection and prognosis than a simple measure of disease status. But prospective biomarker evaluation is challenging. Until now there has been little guidance pro- vided for statistical design and analysis of these studies. We propose to extend our previously funded effort to address several new challenges in prospective marker evaluation. The proposal will emphasize three unique aspects in prospective biomarker evaluation. First, for many cancers disease outcome may be heterogeneous due to the biological nature of the disease or selection of treatments. Constructing and validating prognostic and treatment selection rules based on more specific prediction of the risk of develop- ing aggressive cancer as opposed to indolent cancer is of great clinical interest yet analytically challenging. In Aim 1 we will provide statistical tools for developing and validating risk markers in a population with an unknown mixture of indolent and aggressive cancers. We propose statistical methods that facilitate the development and evaluation of prognostic markers for risk stratification. Methods for deriving and evalu- ating individualized treatment rules in the presence of a mixture of indolent and aggressive cancers will be considered. 
Second, among patients diagnosed with cancer who chose to be on active surveillance, developing monitoring tools to make adaptive monitoring or intervention recommendations with longitudinal biomarkers may alleviate overtreatment without missing signs of progression. In Aim 2 we will consider flexible procedures to quantify the updated predictive accuracy of longitudinal markers. In addition, we will develop and evaluate decision rules on the basis of risk, incorporating both cross-sectional and longi- tudinal marker information. The ascertainment of marker information in a large cohort requires enormous resources. Cost-effective cohort sampling is therefore highly desirable. In Aim 3 we will develop procedures to improve the efficiency of estimating risk and accuracy parameters and rigorously evaluate and compare different choices of matching/stratification rules and identify optimal pairs of analyses and sampling strate- gies. We will also develop estimation procedures for evaluating longitudinal markers in two-phase studies. Applications in cancer biomarker development provide a context for our research. Data from the Early De- tection Research Network and from several large cohort studies will be analyzed. Programs and algorithms developed in this proposal will be made available to public."
#> [3] " DESCRIPTION (provided by applicant): Despite enormous efforts, no effective vaccine is currently available against HIV/AIDS. This is largely due to the virus's ability to evade neutralizing antibodies. The trimeric HIV envelope glycoprotein gp160 (HIV Env), which is exposed to the host immune response on the surface of the virion, effectively conceals epitopes from antibodies while maintaining its ability to promote viral entry. The few exposed antibody-binding sites are structurally unstable and sustain substantial sequence variability. Consequently, few broadly neutralizing antibodies have been identified. Here we will develop single molecule Fluorescence Resonance Energy Transfer (smFRET) imaging to monitor the conformational landscape of individual native Env molecules on the surface of an HIV virus. We will apply new enzymatic methods to introduce organic fluorophores into HIV Env at positions that don't interfere with infectivity. The placement of two dyes within one Env molecule will allow the application of smFRET to report conformational changes of the unliganded HIV Env from various angles. Following the characterization of the unliganded HIV Env, we will determine how its conformation dynamics change in response to receptor and antibody binding. Our approach will identify critical conformational states of HIV Env that should be targeted for vaccine development, reveal the molecular mechanism underlying immune evasion, and understand the potency of existing neutralizing antibodies. "
# see the new documents re-structured from the DTM
new_docs <- Dtm2Docs(dtm = nih_sample_dtm)
new_docs[ 1:3 ]
#> 8693991
#> " concerns acutely sets addictive qpcr transferred repeatedly qtl poses mononuclear neuroimmune cfos saline replicated madr voluntary neuropsychiatric methamphetamine correspondence probable impairments ultimately collected neurocircuitry neurocircuitry impulsivity impulsivity anxiety coordination arrays correlations immunotherapeutic immunotherapeutic exposed bred bred remarkably traits traits purpose naive half line brains regions immunology excessive selectively resolution user produced regard regard genetics genetics treated treated influences remission remission remission remission service peripheral peripheral controls similar repeated repeated utilize finally validated biostatistics biostatistics samples studied studied susceptibility susceptibility relapse relapse addiction identifying lines lines lines lines compared compared compared measures measures inform performed performed drinking examined examined examined selected selected selected selected determined analyses mapping mapping mapping species genetically additional examine acute acute network network chronic chronic components components cognitive users ma ma ma ma ma ma ma ma ma ma ma ma ma ma low low alterations translational cross animal animal animal humans component component component component component individual consumption consumption highly induced genes factor scientific blood behavior identified mouse drug exposure exposure interventions key key mice mice mice mice mice mice gene gene gene system improve addition expression expression results results intervention tissue imaging imaging analysis analysis analysis identify identify immune immune immune immune immune model model proposed goal goal high high genetic genetic genetic response role treatment factors factors factors core core core core human brain brain brain study specific cell aim aim aim aim data data data data health studies "
#> 8693362
#> " missing choices ascertainment tection strate ating tudinal bimarkers emphasize vided evalu overtreatment alleviate chose signs gies deriving choosing desirable longi heterogeneous validation updated carries ing analytically efficiency diagnosed constructing constructing de flexible challenging challenging surveillance surveillance accuracy accuracy estimation recommendations sectional stratification stratification opposed estimating sampling sampling mixture mixture matching rigorous algorithms challenges parameters increasingly pairs indolent indolent indolent guiding rigorously guidance conducting analyzed prognosis validating validating predictive incorporating decision decision event enormous rules rules rules rules rules procedures procedures procedures making crucial simple selection selection prediction prediction individualized individualized applications accurate unknown aggressive aggressive aggressive quantify clinicians pro evaluating considered effort aspects prospective prospective prospective prospective measure biomarker biomarker biomarker biomarker great extend status marker marker marker marker prognostic prognostic requires nature previously field cohort cohort cohort cohort made interest treatments funded cancers cancers cancers compare presence active optimal summary longitudinal longitudinal longitudinal facilitate analyses monitoring monitoring address network statistical statistical statistical detection make outcome context developing developing resources tools tools cross adaptive biomarkers phase markers markers markers markers cost highly biological progression basis unique evaluation evaluation evaluation evaluation evaluation evaluate evaluate large large public due therapy information information information developed propose propose programs improve methods methods population effective addition early early intervention important analysis design identify patients patients critical proposal proposal time treatment treatment 
treatment risk risk risk risk risk risk based provide provide develop develop develop develop development development disease disease disease cancer cancer cancer cancer cancer project specific clinical clinical aim aim aim data studies studies studies studies research research "
#> 8607498
#> " conceals landscape dyes conformation angles introduce trimeric interfere virion fluorophores placement epitopes structurally smfret smfret evade sustain gp resonance evasion aids maintaining potency unstable envelope unliganded unliganded sequence exposed exposed entry conformational conformational conformational report transfer infectivity glycoprotein monitor organic reveal molecule molecule broadly native enormous promote fluorescence molecules variability enzymatic neutralizing neutralizing neutralizing effectively largely antibody antibody positions apply underlying single surface surface substantial dynamics characterization antibodies antibodies antibodies antibodies energy virus virus env env env env env env env states efforts viral binding binding existing receptor targeted individual understand mechanism change sites ability ability vaccine vaccine host identified due methods effective approach imaging application identify immune immune critical molecular hiv hiv hiv hiv hiv hiv hiv hiv response response develop development description determine applicant provided "
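As the output shows, the rebuilt documents are bags of words. Each word appears as many times as its count in the DTM (for example, "ma" repeated in the first document), but the original word order is lost, and words outside the DTM's vocabulary (such as "the") do not come back. A quick sanity check is to confirm that the token counts in each rebuilt document match the corresponding DTM row. `docs_match_dtm()` is a hypothetical helper written for this purpose. It trims the leading and trailing spaces seen above and splits on whitespace.

```r
# Hypothetical check: does each rebuilt document contain exactly the
# term counts stored in the corresponding DTM row (order ignored)?
docs_match_dtm <- function(docs, dtm) {
  stopifnot(length(docs) == nrow(dtm))
  vocab <- colnames(dtm)
  all(vapply(seq_along(docs), function(i) {
    tokens   <- strsplit(trimws(docs[[i]]), "\\s+")[[1]]
    observed <- table(factor(tokens, levels = vocab))  # count each vocab word
    all(tokens %in% vocab) &&                          # no out-of-vocab words
      all(as.numeric(observed) == as.numeric(dtm[i, ]))
  }, logical(1)))
}

dtm <- matrix(c(2, 0, 0, 1, 1, 3), nrow = 2,
              dimnames = list(c("doc_1", "doc_2"), c("alpha", "beta", "gamma")))

# Word order differs from the column order, but the counts agree
docs_match_dtm(c(doc_1 = " gamma alpha alpha ", doc_2 = " beta gamma gamma gamma "), dtm)
```

If Dtm2Docs preserves counts as the output above suggests, `docs_match_dtm(new_docs, nih_sample_dtm)` would be expected to return TRUE.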