Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification
Authors
Levy-Minzie, Kori.
Advisors
Martell, Craig
Date of Issue
2011-03
Publisher
Monterey, CA; Naval Postgraduate School
Abstract
We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails, and vice versa. These results are unique since they simulate two different e-mail addresses belonging to the same person, where the topics of the e-mails from the two addresses do not intersect. As we used only one classification technique, these results are preliminary and may serve as a baseline for future work in this area. The data consisted of the entire Enron corpus as well as a subset of hand-annotated work and personal e-mails. We discovered that there is enough author signal in each class to identify an author in a sea of noise. We included suggestions for future work in the areas of expanding feature selection, increasing corpus size, and including more classification methods. Advancement in this area will contribute to increasing cyber security by identifying the senders of anonymous derogatory e-mails and reducing cyber bullying.
Type
Thesis
Organization
Naval Postgraduate School (U.S.)
Format
xiv, 61 p.
Distribution Statement
Approved for public release; distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
