A Bayesian beta kernel model for binary classification and online learning problems
Authors
MacKenzie, Cameron A.
Trafalis, Theodore B.
Barker, Kash
Subjects
data mining
kernel
Bayesian
beta distribution
online learning
Advisors
Date of Issue
2014-01
Date
Publisher
Language
Abstract
Recent advances in data mining have integrated kernel functions with Bayesian
probabilistic analysis of Gaussian distributions. These machine learning approaches
can incorporate prior information with new data to calculate probabilistic
rather than deterministic values for unknown parameters. This paper
extensively analyzes a specific Bayesian kernel model that uses a kernel function
to calculate a posterior beta distribution that is conjugate to the prior beta distribution.
Numerical testing of the beta kernel model on several benchmark data
sets reveals that this model's accuracy is comparable with those of the support
vector machine, relevance vector machine, naive Bayes, and logistic regression,
and the model runs more quickly than other algorithms. When one class occurs
much more frequently than the other class, the beta kernel model often
outperforms other strategies to handle imbalanced data sets, including under-sampling,
over-sampling, and the Synthetic Minority Over-Sampling Technique.
If data arrive sequentially over time, the beta kernel model easily and quickly
updates the probability distribution, and this model is more accurate than an
incremental support vector machine algorithm for online learning.
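As a rough illustration of the idea summarized above, the following Python sketch shows one way a kernel function could feed a conjugate beta update, and why sequential data are cheap to absorb. The update rule (adding kernel weights to the two beta parameters), the Gaussian kernel, the uniform Beta(1, 1) prior, and every name in the code (`rbf_kernel`, `BetaKernelSketch`, `gamma`) are assumptions made for illustration. The abstract does not specify them, so the article of record should be consulted for the authors' actual formulation.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel. An illustrative choice, not taken from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


class BetaKernelSketch:
    """Hypothetical kernel-weighted conjugate beta update for binary labels."""

    def __init__(self, alpha0=1.0, beta0=1.0, gamma=1.0):
        self.alpha0 = alpha0  # prior pseudo-count for the positive class (assumed)
        self.beta0 = beta0    # prior pseudo-count for the negative class (assumed)
        self.gamma = gamma    # kernel width parameter (assumed)
        self.points = []      # stored (features, label) pairs, label in {0, 1}

    def update(self, x, y):
        """Online step: store one labeled observation as it arrives."""
        self.points.append((tuple(x), int(y)))

    def posterior(self, x):
        """Return (alpha, beta) of the posterior beta distribution at query x."""
        alpha, beta = self.alpha0, self.beta0
        for xi, yi in self.points:
            weight = rbf_kernel(x, xi, self.gamma)
            if yi == 1:
                alpha += weight  # nearby positive points raise alpha
            else:
                beta += weight   # nearby negative points raise beta
        return alpha, beta

    def prob_positive(self, x):
        """Posterior mean of the beta distribution, alpha / (alpha + beta)."""
        alpha, beta = self.posterior(x)
        return alpha / (alpha + beta)


# Toy data arriving one observation at a time.
model = BetaKernelSketch()
stream = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
for features, label in stream:
    model.update(features, label)
```

In this sketch the posterior remains a beta distribution because the kernel weights are simply added to the prior parameters, so no retraining step is needed when a new observation arrives. Calling `model.prob_positive((1.0, 1.0))` gives a value above 0.5 for the toy data, and `model.prob_positive((0.0, 0.0))` gives a value below 0.5.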
Type
Article
Description
Statistical Analysis and Data Mining, 7(6), 434-449. Author's accepted manuscript
The article of record may be found at http://dx.doi.org/10.1002/sam.11241
Series/Report No
Department
Organization
Identifiers
NPS Report Number
Sponsors
Funder
This work was funded in part by the U.S. Army Research, Development and Engineering Command, Army Research Office, Mathematical Science Division, under proposal no. 61414-MA-II.
Format
Citation
Preprint submitted to Statistical Analysis and Data Mining, January 21, 2014
Distribution Statement
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.