Improving automated lexical and discourse analysis of online chat dialog

dc.contributor.advisor: Martell, Craig H.
dc.contributor.author: Forsyth, Eric N.
dc.contributor.corporate: Naval Postgraduate School
dc.contributor.secondreader: Squire, Kevin M.
dc.date.accessioned: 2012-03-14T17:37:52Z
dc.date.available: 2012-03-14T17:37:52Z
dc.date.issued: 2007-09
dc.description.abstract: One of the goals of natural language processing (NLP) systems is determining the meaning of what is being transmitted. Although much work has been accomplished in traditional written and spoken language domains, little has been performed in the newer computer-mediated communication domain enabled by the Internet, to include text-based chat. This is due in part to the fact that there are no annotated chat corpora available to the broader research community. The purpose of our research is to build a chat corpus, initially tagged with lexical and discourse information. Such a corpus could be used to develop stochastic NLP applications that perform tasks such as conversation thread topic detection, author profiling, entity identification, and social network analysis. During the course of our research, we preserved 477,835 chat posts and associated user profiles in an XML format for future investigation. We privacy-masked 10,567 of those posts and part-of-speech tagged a total of 45,068 tokens. Using the Penn Treebank and annotated chat data, we achieved part-of-speech tagging accuracy of 90.8%. We also annotated each of the privacy-masked corpus's 10,567 posts with a chat dialog act. Using a neural network with 23 input features, we achieved 83.2% dialog act classification accuracy. [en_US]
dc.description.distributionstatement: Approved for public release; distribution is unlimited.
dc.description.service: US Air Force (USAF) author. [en_US]
dc.description.uri: http://archive.org/details/improvingutomate109453281
dc.format.extent: xiv, 111 p.; [en_US]
dc.identifier.oclc: 176637337
dc.identifier.uri: https://hdl.handle.net/10945/3281
dc.publisher: Monterey, California. Naval Postgraduate School [en_US]
dc.subject.lcsh: Computer science [en_US]
dc.title: Improving automated lexical and discourse analysis of online chat dialog [en_US]
dc.type: Thesis [en_US]
dspace.entity.type: Publication
etd.thesisdegree.discipline: Computer Science [en_US]
etd.thesisdegree.grantor: Naval Postgraduate School [en_US]
etd.thesisdegree.level: Masters [en_US]
etd.thesisdegree.name: M.S. [en_US]
etd.verified: no [en_US]
Files
Original bundle
Name: 07Sep_Forsyth.pdf
Size: 487.54 KB
Format: Adobe Portable Document Format