Novel topic impact on authorship attribution

Authors
Caver, Johnnie F.
Advisors
Schein, Andrew I.
Martell, Craig H.
Date of Issue
2009-12
Publisher
Monterey, California. Naval Postgraduate School
Abstract
Several authorship attribution studies have speculated about a link between topic cues and author style features. This research presents a novel experimental protocol for measuring the impact of topic features on predictive models for authorship attribution. We call our technique "novel topic cross-validation": a single topic is held out as the test set, and the procedure iterates over each choice of held-out topic to compute an average performance score. Using the New York Times Annotated Corpus, we apply a subsetting procedure to build a sub-corpus of 18,862 documents, 15 authors, and 23 topics, and we perform novel topic cross-validation on this sub-corpus. Our experiments differ in scope from previous attempts to model topic/author influence, which were limited to three or fewer topics or authors. A larger set of topics and authors should give researchers a greater opportunity to explore the variability of style cues across authors, as well as the confounding influence of topic. For this reason, we supply document/author/topic identifications so that researchers can build upon our work in a reproducible fashion.
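The hold-out loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy word-count classifier, and the four-document sample are all assumptions made for illustration, and the record does not state which features or models the thesis used.

```python
from collections import Counter, defaultdict

def train_word_counts(docs):
    """Toy stand-in classifier (hypothetical): per-author word counts."""
    counts = defaultdict(Counter)
    for text, author, _topic in docs:
        counts[author].update(text.lower().split())
    return counts

def predict_author(model, text):
    """Pick the author whose training vocabulary best matches the text."""
    words = text.lower().split()

    def score(author):
        total = sum(model[author].values())
        return sum(model[author][w] for w in words) / total if total else 0.0

    # sorted() makes tie-breaking deterministic (alphabetical by author).
    return max(sorted(model), key=score)

def novel_topic_cross_validation(docs):
    """Hold out one topic at a time; train on the rest; average the accuracies."""
    topics = sorted({topic for _, _, topic in docs})
    per_topic = {}
    for held_out in topics:
        train = [d for d in docs if d[2] != held_out]
        test = [d for d in docs if d[2] == held_out]
        model = train_word_counts(train)
        correct = sum(predict_author(model, text) == author
                      for text, author, _ in test)
        per_topic[held_out] = correct / len(test)
    mean_accuracy = sum(per_topic.values()) / len(per_topic)
    return per_topic, mean_accuracy

# Invented (text, author, topic) triples; real input would come from the sub-corpus.
toy_docs = [
    ("indeed the markets rose indeed", "A", "business"),
    ("perhaps the markets fell perhaps", "B", "business"),
    ("indeed the team won indeed", "A", "sports"),
    ("perhaps the team lost perhaps", "B", "sports"),
]
per_topic, mean_accuracy = novel_topic_cross_validation(toy_docs)
```

Because every test document belongs to a topic the model never saw in training, a classifier that scores well here must be relying on cues that carry across topics, which is the point of the protocol. The sketch needs at least two topics, since holding out the only topic would leave nothing to train on.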
Type
Thesis
Organization
Naval Postgraduate School (U.S.)
Format
xiv, 65 p. : ill. ;
Distribution Statement
Approved for public release; distribution is unlimited.