Topic detection in online chat
Durham, Jonathan S.
Martell, Craig H.
Schein, Andrew I.
The ubiquity of Internet chat applications has benefited many segments of society, but it also creates opportunities for criminal enterprise, terrorism, and espionage. This thesis proposes statistical Natural Language Processing (NLP) methods for building systems that detect the topic of chat, in support of larger NLP goals such as information retrieval, text classification, and illicit activity detection. Our novel method trains Latent Dirichlet Allocation (LDA) models on source documents and then uses the inferred topic distributions as feature vectors for a Support Vector Machine (SVM) classifier. We constructed LDA models in three ways. First, we treated the collective posts of each author as a document, hypothesizing that we could detect the topic "physics" given only one side of the conversation; the resulting classifiers achieved an F-score of 0.906. Second, we treated individual posts as documents, hypothesizing that we could detect individual physics posts; the resulting classifiers achieved an F-score of 0.481. Finally, we treated physics textbook paragraphs as documents, hypothesizing that we could determine the topic of an author or of a post using an LDA model built from a textbook and a sample of noisy chat; the resulting classifiers achieved F-scores of 0.848 for authors and 0.536 for posts.
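The pipeline in the abstract (fit LDA, infer a topic mixture per document, feed that mixture to an SVM, score with an F-score) can be sketched in plain Python as below. This is a minimal, stdlib-only illustration under stated assumptions, not the thesis's implementation: the abstract does not name its LDA or SVM software, so the collapsed Gibbs sampler, the fold-in inference, the Pegasos sub-gradient solver used as a stand-in linear SVM, the hyperparameters (k, alpha, beta, lam, iteration counts), the hypothetical function names train_lda, infer_theta, train_svm, predict, and f1_score, and the toy chat lines are all assumptions. The F-score is assumed to be the balanced F1 (harmonic mean of precision and recall).

```python
import random
from collections import defaultdict


def train_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return the counts needed for inference."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})          # vocabulary size
    nkw = [defaultdict(int) for _ in range(k)]          # topic -> word -> count
    nk = [0] * k                                        # tokens assigned to each topic
    ndk = [[0] * k for _ in docs]                       # document -> topic -> count

    def update(d, w, t, delta):
        nkw[t][w] += delta
        nk[t] += delta
        ndk[d][t] += delta

    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            update(d, w, t, 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)               # forget the current assignment
                weights = [(ndk[d][j] + alpha) * (nkw[j].get(w, 0) + beta)
                           / (nk[j] + v * beta) for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], 1)
    return {"k": k, "alpha": alpha, "beta": beta, "v": v, "nkw": nkw, "nk": nk}


def infer_theta(model, doc, iters=50, seed=0):
    """Fold in a document with topic-word counts held fixed; return its topic mixture."""
    rng = random.Random(seed)
    k, alpha, beta, v = model["k"], model["alpha"], model["beta"], model["v"]
    nkw, nk = model["nkw"], model["nk"]
    z = [rng.randrange(k) for _ in doc]
    ndk = [z.count(j) for j in range(k)]
    for _ in range(iters):
        for i, w in enumerate(doc):
            ndk[z[i]] -= 1
            weights = [(ndk[j] + alpha) * (nkw[j].get(w, 0) + beta)
                       / (nk[j] + v * beta) for j in range(k)]
            z[i] = rng.choices(range(k), weights=weights)[0]
            ndk[z[i]] += 1
    return [(ndk[j] + alpha) / (len(doc) + k * alpha) for j in range(k)]


def train_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient on the hinge loss); ys in {-1, +1}."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]  # constant 1.0 feature acts as bias
    w = [0.0] * len(data[0][0])
    order, t = list(range(len(data))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]    # shrink: L2 regularization
            if margin < 1.0:                            # hinge loss active: step toward y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def f1_score(y_true, y_pred, positive=1):
    """Balanced F-score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy chat lines: label 1 = physics, -1 = other.
    chat = ["force equals mass times acceleration",
            "the electron charge creates an electric field",
            "anyone watching the game tonight",
            "pizza or tacos for dinner tonight"]
    labels = [1, 1, -1, -1]
    docs = [line.lower().split() for line in chat]
    model = train_lda(docs, k=2)
    features = [infer_theta(model, doc) for doc in docs]
    w = train_svm(features, labels)
    print("training-set F-score:", f1_score(labels, [predict(w, x) for x in features]))
```

Scoring the training lines, as the demo does, only checks that the pieces fit together. An evaluation in the spirit of the abstract would fit LDA on author-level documents, individual posts, or textbook paragraphs and report the F-score on held-out chat.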
Approved for public release, distribution unlimited
Showing items related by title, author, creator and subject.
Van Orman, Brian Lee; Renard, Robert Joseph (1977-06); Diagnostic model output parameters, provided by the Fleet Numerical Weather Central, Monterey, California (FNWC), and the marine fog frequency climatology developed at the Naval Postgraduate School, Monterey, California, ...
An analysis of performance at the basic school as a predictor of officer performance in the operating forces. Hurndon, Nicholas A.; Wiler, Darby (Monterey, California. Naval Postgraduate School, 2008-03); The purpose of this thesis is to identify and assess factors that predict the performance of junior officers in the operating forces of the U.S. Marine Corps. In this analysis, fitness report scores are used as indicators ...
An analysis of the effect of lowered basic test battery selection scores on rephasals and disenrollments at selected Coast Guard class A schools. Kalletta, Daniel E. (Monterey, California. Naval Postgraduate School, 1978-12); This thesis investigates and evaluates the effect of lowered Basic Test Battery (BTB) selection scores on rephasals and disenrollments at selected Coast Guard Class A schools. It analyzes the differences in rephasal ...