treeClust: an R package for tree-based clustering dissimilarities
Authors
Buttrey, Samuel E.
Whitaker, Lyn R.
Date of Issue
2015
Abstract
This paper describes treeClust, an R package that produces dissimilarities useful for clustering.
These dissimilarities arise from a set of classification or regression trees, one with each variable in
the data acting in turn as the response, and all others as predictors. This use of trees produces dissimilarities that are insensitive to scaling, benefit from automatic variable selection, and appear to perform
well. The software allows a number of options to be set, affecting the set of objects returned in the call;
the user can also specify a clustering algorithm and, optionally, return only the clustering vector. The
package can also generate a numeric data set whose inter-point distances relate to the treeClust ones;
such a numeric data set can be much smaller than the vector of inter-point dissimilarities, a useful
feature in big data sets.
Type
Article
Department
Operations Research
Organization
Naval Postgraduate School (U.S.)
Format
10 p.
Citation
Samuel E. Buttrey, Lyn R. Whitaker, "treeClust: an R package for tree-based clustering dissimilarities"
The R Journal, v. 7, no. 2, December 2015, pp. 227-236
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.