Authorship discovery in blogs using Bayesian classification with corrective scaling
Gehrke, Grant T.
Martell, Craig H.
Squire, Kevin M.
Widespread availability of free, public blog platforms has facilitated growth in the amount of individually written electronic text available online. Our research leverages an extremely large blog corpus for a study in authorship discovery, both to evaluate a traditional technique as applied to blogs and to demonstrate the implications of authorship discovery in blogs for intelligence and forensic purposes. Our study uses a Bayesian classifier with two important extensions. First, we introduce a post-classification corrective scaling technique to mitigate the over-classification of many samples to a few authors. Second, we propose an n-percent-correct threshold metric, under which a result counts as "correct" when the true author falls within some small subset of the original search space, rather than requiring that he or she be the single most probable author. Using this technique, we are able to reduce a search space of 2000 authors to 1% of its original size (20 authors) with 91% accuracy when 1000 bigrams are present, or to 10% of its original size (200 authors) with 94% accuracy when only 500 bigrams are present.
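The n-percent-correct threshold metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the classifier outputs one score per candidate author for each sample (higher meaning more probable), and the names `shortlist_size` and `n_percent_correct` are hypothetical, chosen only for this example.

```python
import math


def shortlist_size(num_authors, n_percent):
    """Number of top-ranked authors that make up n percent of the search space."""
    # Assumption: round up, and always keep at least one candidate.
    return max(1, math.ceil(num_authors * n_percent / 100))


def n_percent_correct(scores, true_authors, n_percent):
    """Fraction of samples whose true author lies in the top n percent.

    scores: list of dicts, one per sample, mapping author -> score
            (assumed: a higher score means a more probable author).
    true_authors: list of the actual author of each sample.
    """
    hits = 0
    for sample_scores, true_author in zip(scores, true_authors):
        k = shortlist_size(len(sample_scores), n_percent)
        # Rank candidate authors from most to least probable.
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if true_author in ranked[:k]:
            hits += 1
    return hits / len(true_authors)


# Toy data: two samples scored against four candidate authors.
toy_scores = [
    {"a": -1.0, "b": -2.0, "c": -3.0, "d": -4.0},  # true author "b" ranks 2nd
    {"a": -5.0, "b": -1.0, "c": -2.0, "d": -3.0},  # true author "a" ranks 4th
]
toy_truth = ["b", "a"]
```

With a 25% threshold the shortlist holds a single author, so the metric reduces to ordinary top-1 accuracy; widening the threshold trades shortlist size for a higher chance of containing the true author. For the 2000-author search space in the abstract, `shortlist_size(2000, 1)` gives 20 candidates and `shortlist_size(2000, 10)` gives 200.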