Age Detection in Chat
Abstract
This paper presents the results of using statistical analysis and automatic text
categorization to identify an author’s age group from the author’s online chat posts.
Two classifiers were used: a Naive Bayes classifier and a Support Vector Machine (SVM) model.
In experiments distinguishing teens from adults, the SVM model achieved an F-score of 0.996
on test data. We also introduce an alternative method for generating “stop words” that chooses
n-grams based on their relative distribution across the classes.
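
The idea of choosing stop words by their distribution across classes can be illustrated with a minimal sketch. The abstract does not specify the selection criterion, so everything here is an assumption for illustration only: the function names, the whitespace tokenization, and the rule that an n-gram is treated as a stop word when its relative frequency is nearly the same in every class (a log-ratio below a hypothetical threshold `max_log_ratio`). It is not the method described in the paper.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(posts, n):
    """Relative frequency of each n-gram within one class's posts."""
    counts = Counter()
    for post in posts:
        # Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
        counts.update(ngrams(post.lower().split(), n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def distribution_stop_words(posts_by_class, n=1, max_log_ratio=0.25):
    """Pick n-grams whose relative frequency is similar in every class.

    Hypothetical criterion: an n-gram seen in all classes is a stop word
    when log(max_freq / min_freq) <= max_log_ratio, i.e. it does little
    to separate the classes.
    """
    if not posts_by_class:
        return set()
    freqs = {label: relative_frequencies(posts, n)
             for label, posts in posts_by_class.items()}
    shared = set.intersection(*(set(f) for f in freqs.values()))
    stop_words = set()
    for gram in shared:
        values = [f[gram] for f in freqs.values()]
        if math.log(max(values) / min(values)) <= max_log_ratio:
            stop_words.add(gram)
    return stop_words


# Toy data, invented for this sketch.
posts_by_class = {
    "teen": ["lol the game was fun", "lol the test was hard"],
    "adult": ["the long meeting was dull", "the quarterly report was late"],
}
print(sorted(distribution_stop_words(posts_by_class)))  # → [('the',), ('was',)]
```

In this toy example, “the” and “was” occur at the same rate in both classes and are selected, while “lol” appears in only one class and is kept as a potentially discriminative feature. Unlike a fixed stop-word list, a criterion of this kind is derived from the training corpus itself.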