CURatio: Genome-wide phylogenomic analysis method using ratios of total branch lengths
Authors
Kang, Qiwen
Moore, Neil
Schardl, Christopher L.
Yoshida, Ruriko
Subjects
Evolutionary models
Gene trees
Likelihood functions
Outliers
Phylogenomics
Species trees
Date of Issue
2018
Publisher
IEEE
Abstract
Evolutionary hypotheses provide important underpinnings of biological and medical sciences, and a comprehensive, genome-wide understanding of evolutionary relationships among organisms is needed to test and refine such hypotheses. Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matching topologies. The main reason for such phylogenetic incongruence is the reticulated evolutionary history of most species, due to meiotic sexual recombination in eukaryotes or horizontal transfer of genetic material in prokaryotes. Nevertheless, many genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in poly-dimensional "tree space". Unusual evolutionary histories or effects of selection may result in "outlier" genes whose phylogenies fall outside the main distribution(s) of trees in tree space. We present a new phylogenomic method, CURatio, which uses ratios of total branch lengths in gene trees to help identify phylogenetic outliers in a given set of ortholog groups from multiple genomes. An advantage of CURatio over other methods is that genes absent from and/or duplicated in some genomes can be included in the analysis. We conducted a simulation study under the coalescent model and showed that, given sufficient species depth and topological difference, these ratios are significantly higher for the "outlier" gene phylogenies. We also applied CURatio to a set of annotated genomes of the fungal family Clavicipitaceae and identified alkaloid biosynthesis genes as outliers, probably due to a history of duplication and loss. The source code is available at https://github.com/QiwenKang/CURatio, and the empirical data set on Clavicipitaceae and the simulated data set are available at Mendeley https://data.mendeley.com/datasets/mrxts7wjrr/1.
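The abstract does not say which trees enter the numerator and denominator of the ratio, so the Python sketch below only illustrates the underlying arithmetic under stated assumptions: it sums the branch lengths in a Newick string and divides a gene tree's total by that of a reference tree, then ranks genes so that the highest ratios (candidate outliers) come first. The reference tree, the helper names total_branch_length, branch_length_ratio and rank_by_ratio, and the ranking step are all hypothetical; they are not taken from the CURatio code at the GitHub repository above.

```python
from __future__ import annotations

import re

# In Newick, a branch length follows a colon, e.g. "(A:0.1,B:0.2):0.05;".
# Assumption: no quoted taxon labels containing ':' characters.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def total_branch_length(newick: str) -> float:
    """Sum every branch length in a Newick string (includes a root branch if given)."""
    return float(sum(float(x) for x in _BRANCH_LENGTH.findall(newick)))


def branch_length_ratio(gene_tree: str, reference_tree: str) -> float:
    """Hypothetical ratio: total branch length of gene_tree over that of reference_tree."""
    ref_total = total_branch_length(reference_tree)
    if ref_total == 0:
        raise ValueError("reference tree has zero total branch length")
    return total_branch_length(gene_tree) / ref_total


def rank_by_ratio(gene_trees: dict[str, str], reference_tree: str) -> list[tuple[str, float]]:
    """Return (gene, ratio) pairs sorted from highest ratio to lowest."""
    scores = {gene: branch_length_ratio(tree, reference_tree)
              for gene, tree in gene_trees.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    reference = "((A:1,B:1):2,C:4);"  # hypothetical reference tree, total length 8
    genes = {
        "g1": "((A:1,C:1):2,B:4);",   # total 8  -> ratio 1.0
        "g2": "((B:2,C:2):4,A:8);",   # total 16 -> ratio 2.0
    }
    for gene, ratio in rank_by_ratio(genes, reference):
        print(gene, ratio)
```

Running the script prints g2 (ratio 2.0) before g1 (ratio 1.0). A regular expression is used instead of a full Newick parser only to keep the sketch dependency-free; real gene trees with quoted labels, or a definition of the ratio that differs from the one assumed here, would need a proper tree library and the CURatio source itself.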
Type
Article
Department
Operations Research (OR)
Format
9 p.
Citation
Kang, Qiwen, et al. "CURatio: Genome-wide phylogenomic analysis method using ratios of total branch lengths." IEEE/ACM Transactions on Computational Biology and Bioinformatics (2018).
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.