Tropical Support Vector Machine and its Applications to Phylogenomics
Abstract
Most data in genome-wide phylogenetic analysis (phylogenomics) is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Moreover, we cannot directly apply statistical learning models from data science to a set of phylogenetic trees, since the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic analysis, we propose tropical support vector machines (SVMs). Like a classical SVM, a tropical SVM is a discriminative classifier defined by a tropical hyperplane that maximizes the minimum tropical distance from the data points to itself, in order to separate these data points into sectors (half-spaces) in the tropical projective torus. Both hard margin tropical SVMs and soft margin tropical SVMs can be formulated as linear programming problems. We focus on classifying two categories of data, and we study a simpler case by assuming that the data points from the same category ideally stay in the same sector of a tropical separating hyperplane. For hard margin tropical SVMs, we prove necessary and sufficient conditions for two categories of data points to be separated, and we show an explicit formula for the optimal value of the feasible linear programming problem. For soft margin tropical SVMs, we develop novel methods to compute an optimal tropical separating hyperplane. Computational experiments show our methods work well. We end this paper with open problems.
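The abstract does not spell out the formulas behind these notions, so the following is only a minimal sketch under stated assumptions. It assumes the max-plus convention, the usual tropical metric on the tropical projective torus (largest minus smallest coordinate difference), and the commonly cited fact that the tropical distance from a point x to the tropical hyperplane with normal vector omega equals the gap between the largest and second-largest values of omega_i + x_i. The function names are hypothetical and are not taken from the paper.

```python
# Sketch only: all names here are illustrative assumptions, not the paper's code.

def tropical_distance(x, y):
    """Tropical metric: max_i(x_i - y_i) - min_i(x_i - y_i)."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def sector_index(omega, x):
    """Index i of the sector containing x, i.e. where omega_i + x_i is largest.

    Ties (points on the hyperplane itself) resolve to the smallest index.
    """
    sums = [w + a for w, a in zip(omega, x)]
    return max(range(len(sums)), key=sums.__getitem__)


def distance_to_hyperplane(omega, x):
    """Gap between the largest and second-largest values of omega_i + x_i.

    Assumed formula for the tropical distance from x to the max-plus
    hyperplane defined by omega; it is zero exactly when the maximum
    is attained at least twice, i.e. when x lies on the hyperplane.
    """
    sums = sorted((w + a for w, a in zip(omega, x)), reverse=True)
    return sums[0] - sums[1]


if __name__ == "__main__":
    omega = [0, 0, 0]
    point = [3, 1, 0]
    print(sector_index(omega, point))            # sector containing the point
    print(distance_to_hyperplane(omega, point))  # margin of the point
```

In this picture, a hard margin tropical SVM would look for an omega that places the two categories in different sectors while making the smallest value of `distance_to_hyperplane` over all data points as large as possible.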
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
Related items
Showing items related by title, author, creator and subject.
- Tropical Fermat–Weber Points
  Lin, Bo; Yoshida, Ruriko (Society for Industrial and Applied Mathematics (SIAM), 2018)
  In a metric space, the Fermat–Weber points of a sample are statistics to measure the central tendency of the sample and it is well known that the Fermat–Weber point of a sample is not necessarily unique in the metric space. ...
- Tropical principal component analysis on the space of phylogenetic trees
  Page, Robert; Yoshida, Ruriko; Zhang, Leon (Oxford University Press, 2020-11)
  Motivation: Due to new technology for efficiently generating genome data, machine learning methods are urgently needed to analyze large sets of gene trees over the space of phylogenetic trees. However, the space of ...
- Tropical principal component analysis and its application to phylogenetics
  Yoshida, Ruriko; Zhang, Leon; Zhang, Xu (ArXiv, 2017-10-15)
  Principal component analysis is a widely-used method for the dimensionality reduction of a given data set in a high-dimensional Euclidean space. Here we define and analyze two analogues of principal component analysis in ...