
dc.contributor.author: Buttrey, Samuel E.
dc.contributor.author: Whitaker, Lyn R.
dc.date.accessioned: 2016-05-03T15:40:36Z
dc.date.available: 2016-05-03T15:40:36Z
dc.date.issued: 2016-04-19
dc.identifier.uri: http://hdl.handle.net/10945/48615
dc.description: Approved for public release; distribution is unlimited. [en_US]
dc.description.abstract: Clustering techniques divide observations into groups. Current techniques usually rely on measurements of dissimilarities between pairs of observations, between pairs of clusters, and between an observation and a cluster. For numeric variables, these dissimilarity measurements often depend on the scaling of the variables, are changed by monotonic transformations, and do not provide for selection of “important” variables. In our scheme, we fit a set of regression or classification trees with each variable acting in turn as the “response” variable. Points are “close” to one another if they tend to appear in the same leaves of these trees. Trees with poor predictive power are discarded. Therefore, “noise” variables will often appear in none of the trees and have no effect on the clustering. Because our technique uses trees, the dissimilarities are unaffected by linear transformations of the numeric variables and resistant to monotonic ones and to outliers. Categorical variables are included automatically and missing values handled in a natural way. We demonstrate the performance of this technique by using these dissimilarities to cluster some well-known data sets to which noise has been added. [en_US]
dc.rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States. [en_US]
dc.title: A scale-independent, noise-resistant dissimilarity for tree-based clustering of mixed data [en_US]
dc.type: Technical Report [en_US]
dc.contributor.department: Operations Research [en_US]
dc.subject.author: inter-point distance [en_US]
dc.subject.author: mixed data [en_US]
dc.subject.author: clustering [en_US]
dc.identifier.npsreport: NPS-OR-16-003 [en_US]
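
The abstract's central idea, that two points are "close" when they tend to land in the same leaves and that trees with poor predictive power are dropped, can be illustrated with a minimal sketch. This is not the report's implementation: the function name `leaf_dissimilarity`, the `tree_quality` scores, the `min_quality` threshold, and the choice of "fraction of retained trees in which two points fall in different leaves" as the dissimilarity are all assumptions made for illustration. The leaf assignments are supplied by hand rather than produced by fitted trees.

```python
from itertools import combinations


def leaf_dissimilarity(leaf_ids, tree_quality, min_quality=0.0):
    """Pairwise dissimilarity from leaf membership (illustrative sketch).

    leaf_ids[t][i]  -- leaf that observation i falls into in tree t
    tree_quality[t] -- hypothetical predictive-power score for tree t
    min_quality     -- hypothetical cutoff; trees at or below it are discarded
    """
    # Keep only the trees whose score exceeds the cutoff.
    kept = [leaves for leaves, q in zip(leaf_ids, tree_quality) if q > min_quality]
    if not kept:
        raise ValueError("no tree passed the quality threshold")

    n = len(kept[0])
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Count the retained trees that separate observations i and j.
        differ = sum(1 for leaves in kept if leaves[i] != leaves[j])
        d[i][j] = d[j][i] = differ / len(kept)
    return d


# Made-up leaf assignments for four observations in three trees.
leaf_ids = [
    [0, 0, 1, 1],  # tree with a first variable as the response
    [0, 0, 0, 1],  # tree with a second variable as the response
    [0, 1, 0, 1],  # tree for a noise variable
]
tree_quality = [0.8, 0.6, 0.0]  # the noise tree has no predictive power

D = leaf_dissimilarity(leaf_ids, tree_quality)
for row in D:
    print(row)
```

Because the third tree is discarded, the noise variable it represents has no influence on `D`. The resulting matrix could then be passed to any clustering routine that accepts precomputed dissimilarities.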

