treeClust: an R package for tree-based clustering dissimilarities
Buttrey, Samuel E.
Whitaker, Lyn R.
This paper describes treeClust, an R package that produces dissimilarities useful for clustering. These dissimilarities arise from a set of classification or regression trees, one for each variable in the data, with that variable acting in turn as the response and all others as predictors. This use of trees produces dissimilarities that are insensitive to scaling, benefit from automatic variable selection, and appear to perform well. The software allows a number of options to be set, which affect the set of objects returned by the call; the user can also specify a clustering algorithm and, optionally, return only the clustering vector. The package can also generate a numeric data set whose inter-point distances relate to the treeClust ones; such a numeric data set can be much smaller than the vector of inter-point dissimilarities, a useful feature for large data sets.
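The abstract does not spell out how the trees yield a dissimilarity, so the following is only a minimal sketch under a stated assumption: two observations count as "apart" in a tree when they fall in different leaves, and the dissimilarity is the fraction of trees in which that happens. The leaf assignments, the function names `leaf_dissimilarity` and `leaf_indicators`, and the Python setting are all hypothetical illustrations, not the package's interface. The second function shows one way a numeric data set can stand in for the pairwise dissimilarities: one 0/1 column per leaf, so storage grows with the number of observations times the number of leaves rather than with the number of pairs.

```python
from itertools import combinations

# Hypothetical leaf assignments: leaves[t][i] is the leaf of tree t that
# observation i falls into. In practice these would come from trees fitted
# with each variable in turn as the response.
leaves = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [3, 3, 3, 4],
]


def leaf_dissimilarity(leaves):
    """Fraction of trees in which two observations land in different leaves.

    Assumed definition for illustration only.
    """
    n_trees = len(leaves)
    n_obs = len(leaves[0])
    d = [[0.0] * n_obs for _ in range(n_obs)]
    for i, j in combinations(range(n_obs), 2):
        differ = sum(1 for tree in leaves if tree[i] != tree[j])
        d[i][j] = d[j][i] = differ / n_trees
    return d


def leaf_indicators(leaves):
    """One 0/1 column per (tree, leaf) pair, one row per observation."""
    columns = [
        (t, leaf)
        for t, tree in enumerate(leaves)
        for leaf in sorted(set(tree))
    ]
    n_obs = len(leaves[0])
    return [
        [1 if leaves[t][i] == leaf else 0 for t, leaf in columns]
        for i in range(n_obs)
    ]


dissim = leaf_dissimilarity(leaves)
indicators = leaf_indicators(leaves)
```

Under this assumed definition, the squared Euclidean distance between two indicator rows equals twice the number of trees in which the observations fall in different leaves, so ordinary distance-based clustering on the indicator rows reflects the same ordering of pairs as the dissimilarity matrix.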
Showing items related by title, author, creator and subject.
Buttrey, Samuel E.; Whitaker, Lyn R. (2016-04-19); NPS-OR-16-003. Clustering techniques divide observations into groups. Current techniques usually rely on measurements of dissimilarities between pairs of observations, between pairs of clusters, and between an observation and a cluster. For ...
Lee, Suyoung (Monterey, California: Naval Postgraduate School, 2016-03). Modern data sets often consist of unstructured data and mixed data; that is, they include both numerical and categorical variables. Often, these data sets will include noise, redundancy, missing values and outliers. ...
Shaham, Yoav (Monterey, California: Naval Postgraduate School, 2015-09). This research explores the use of the tree distances of Buttrey and Whitaker to visualize multidimensional data of mixed-variable types, having both numerical and categorical data. Tree distances measure dissimilarities ...