A scale-independent clustering method with automatic variable selection based on trees

Author
Lynch, Sarah K.
Date
2014-03
Advisor
Buttrey, Samuel E.
Second Reader
Whitaker, Lyn R.
Abstract
Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. Determining dissimilarity when observations have both continuous and categorical measurements can be difficult because each type of measurement must be approached differently. We introduce a new clustering method that uses one of three new distance metrics. In a dataset with p variables, we create p trees, one with each variable as the response. Distance is measured by determining on which leaf an observation falls in each tree. Two observations are similar if they tend to fall on the same leaf and dissimilar if they are usually on different leaves. The distance metrics are not affected by scaling or transformations of the variables and easily determine distances in datasets with both continuous and categorical variables. This method is tested on several well-known datasets, both with and without added noise variables, and performs very well in the presence of noise due in part to automatic variable selection. The new distance metrics outperform several existing clustering methods in a large number of scenarios.
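The leaf-agreement idea in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation. The helper names (`leaf_ids`, `tree_distance`), the split rule (reduce the response's sum of squared errors), the stopping rule `min_gain` (a crude stand-in for pruning) and the defaults `max_depth=2` and `min_leaf=2` are all hypothetical. Only numeric columns are handled. The distance shown is the plain fraction of trees in which two observations land on different leaves, which may or may not match any of the thesis's three metrics. As in the abstract, one tree is grown per column, with that column as the response.

```python
from itertools import combinations


def _sse(values):
    """Sum of squared errors of a numeric response around its mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def leaf_ids(data, target, max_depth=2, min_leaf=2, min_gain=0.5):
    """Grow one small regression tree (hypothetical rules) that predicts
    column `target` from every other column; return each row's leaf number."""
    predictors = [j for j in range(len(data[0])) if j != target]
    labels = [0] * len(data)
    next_leaf = [0]  # mutable counter shared by the recursive calls

    def grow(rows, depth):
        parent = _sse([data[i][target] for i in rows])
        best = None  # (cost, left_rows, right_rows)
        if depth < max_depth and len(rows) >= 2 * min_leaf and parent > 0:
            for j in predictors:
                order = sorted(rows, key=lambda i: data[i][j])
                for k in range(min_leaf, len(order) - min_leaf + 1):
                    # A threshold can only fall between distinct values,
                    # so splits depend on the rank order of column j alone.
                    if data[order[k - 1]][j] == data[order[k]][j]:
                        continue
                    left, right = order[:k], order[k:]
                    cost = (_sse([data[i][target] for i in left])
                            + _sse([data[i][target] for i in right]))
                    if best is None or cost < best[0]:
                        best = (cost, left, right)
        # Keep the split only if it removes at least min_gain of the
        # node's SSE (a stand-in for pruning); otherwise make a leaf.
        if best is not None and parent - best[0] >= min_gain * parent:
            grow(best[1], depth + 1)
            grow(best[2], depth + 1)
        else:
            for i in rows:
                labels[i] = next_leaf[0]
            next_leaf[0] += 1

    grow(list(range(len(data))), 0)
    return labels


def tree_distance(data, **tree_args):
    """Distance = fraction of the p trees in which two rows fall on
    different leaves (0 = always together, 1 = never together)."""
    p = len(data[0])
    leaves = [leaf_ids(data, t, **tree_args) for t in range(p)]
    n = len(data)
    dist = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        dist[a][b] = dist[b][a] = sum(lv[a] != lv[b] for lv in leaves) / p
    return dist


if __name__ == "__main__":
    # Columns 0 and 1 separate two groups of three rows; column 2 is noise.
    rows = [
        [1.0, 2.0, 5.0], [1.2, 2.1, 3.0], [0.9, 1.9, 4.0],
        [5.0, 8.0, 4.5], [5.1, 8.2, 3.5], [4.9, 7.9, 5.5],
    ]
    D = tree_distance(rows)
    print(round(D[0][1], 3), round(D[0][3], 3))  # prints 0.0 0.667
```

Each split is a threshold between sorted predictor values. As a result, a monotone increasing transformation of a column leaves unchanged every tree in which that column is a predictor. Multiplying any column by a positive constant leaves all the trees unchanged, up to floating-point ties. A nonlinear transformation can still change the tree in which that column is the response, so this sketch captures only part of the invariance the abstract describes. A tree that never splits, like the one for the noise column in the example, puts every observation on one leaf and contributes zero disagreement to every pair. This loosely mirrors the automatic variable selection mentioned in the abstract.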
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
Related items
Showing items related by title, author, creator and subject.
- Case study of the Naval Postgraduate School's Distance Learning Program
Sanders, Donald R. (Monterey, California. Naval Postgraduate School, 2001-12); Amidst growing pressures of budgetary constraints and an era of downsizing, the Naval Postgraduate School must seek alternative ways of delivering quality education to its customers. NPS has turned to various forms of ...
- An optimization technique using the finite element method and orthogonal arrays
Young, Stuart H. (Monterey, California. Naval Postgraduate School, 1996-09); The objective of this research was to develop an optimization technique that can be used interactively by design engineers to approach an optimal design with minimal computational effort. The technique can be applied to ...
- Hamming, Learning to Learn: Foundations of Digital Revolution, 30 March 1995 [video]
Hamming, Richard W. (Monterey, California: Naval Postgraduate School, 1995-03-30); Foundations of the Digital (Discrete) Revolution. We are approaching the end of the revolution of going from signaling with continuous signals to signaling with discrete pulses, and we are now probably moving from using ...