Tropical principal component analysis on the space of phylogenetic trees
Authors
Page, Robert
Yoshida, Ruriko
Zhang, Leon
Subjects
Advisors
Date of Issue
2020-11
Date
2020-11
Publisher
Oxford University Press
Language
en_US
Abstract
Motivation: Due to new technology for efficiently generating genome data, machine learning methods are urgently needed to analyze large sets of gene trees over the space of phylogenetic trees. However, the space of phylogenetic trees is not Euclidean, so ordinary machine learning methods cannot be directly applied. In 2019, Yoshida et al. introduced the notion of tropical principal component analysis (PCA), a statistical method for visualization and dimensionality reduction. It fits a tropical polytope with a fixed number of vertices that minimizes the sum of tropical distances between each data point and its tropical projection onto the polytope. However, their work focused on the tropical projective space rather than the space of phylogenetic trees. We focus here on tropical PCA for dimension reduction and visualization over the space of phylogenetic trees.
Results: Our main results are twofold: (i) theoretical interpretations of the tropical principal components (PCs) over the space of phylogenetic trees, namely, the existence of a tropical cell decomposition into regions of fixed tree topology; and (ii) the development of a stochastic optimization method to estimate tropical PCs over the space of phylogenetic trees using a Markov chain Monte Carlo (MCMC) approach. This method performs well in simulation studies, and it is applied to three empirical datasets: Apicomplexa and African coelacanth genomes, as well as sequences of hemagglutinin for influenza from New York.
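The objective described in the abstract can be illustrated with a minimal sketch. It assumes the max-plus convention, in which the tropical distance between two points is the largest coordinate difference minus the smallest, and the projection onto a tropical polytope is the max-plus combination of the vertices with coefficients lambda_k = min_i (x_i - v_k,i). The function names and the example vertices are hypothetical, chosen only for illustration; this is not the authors' released code, and it does not include the MCMC search over vertex sets.

```python
import numpy as np

def tropical_distance(u, v):
    """Tropical distance: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return d.max() - d.min()

def tropical_projection(x, vertices):
    """Project x onto the max-plus tropical polytope spanned by `vertices`.

    For each vertex v_k, lambda_k = min_i (x_i - v_k[i]); the projection is
    the coordinatewise maximum over k of (lambda_k + v_k).
    """
    x = np.asarray(x, dtype=float)
    V = np.asarray(vertices, dtype=float)   # shape (s, n): s vertices in R^n
    lam = (x - V).min(axis=1)               # shape (s,)
    return (lam[:, None] + V).max(axis=0)   # shape (n,)

def sum_of_tropical_distances(points, vertices):
    """Objective that tropical PCA minimizes over the choice of vertices."""
    return sum(
        tropical_distance(p, tropical_projection(p, vertices)) for p in points
    )

# Hypothetical example: three vertices in R^3 and two data points.
vertices = [[0, 0, 0], [0, 2, 5], [0, 3, 1]]
points = [[0, 0, 0], [0, 10, 0]]
projected = tropical_projection(points[1], vertices)
objective = sum_of_tropical_distances(points, vertices)
```

A point that is already a vertex projects to itself and contributes zero to the objective, so only points outside the polytope add to the sum. Under these assumptions, a stochastic search such as the MCMC approach named in the abstract would propose new vertex sets and compare them by this objective value.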
Type
Article
Description
Availability and implementation: Dataset: http://polytopes.net/Data.tar.gz. Code: http://polytopes.net/tropica_MCMC_codes.tar.gz. Supplementary information: Supplementary data are available at Bioinformatics online.
17 USC 105 interim-entered record; under review.
The article of record as published may be found at https://doi.org/10.1093/bioinformatics/btaa564
Series/Report No
Department
Operations Research (OR)
Organization
Naval Postgraduate School
Identifiers
NPS Report Number
Sponsors
Funder
Format
9 p.
Citation
Page, Robert, Ruriko Yoshida, and Leon Zhang. "Tropical principal component analysis on the space of phylogenetic trees." Bioinformatics 36.17 (2020): 4590-4598.
Distribution Statement
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.