Visualizing mixed variable-type multidimensional data using tree distances

Author
Shaham, Yoav
Date
2015-09
Advisor
Whitaker, Lyn R.
Second Reader
Buttrey, Samuel E.
Abstract
This research explores the use of the tree distances of Buttrey and Whitaker to visualize multidimensional data of mixed variable types, that is, data containing both numerical and categorical variables. Tree distances measure dissimilarities among observations in a data set while exploiting desirable properties of classification and regression trees: ease of handling most variable types, indifference to variable scaling, resistance to noise and outliers, accommodation of missing values, and computational ease. We map the dissimilarities to a lower-dimensional Euclidean space using Classical Multidimensional Scaling, giving the analyst a familiar framework whose visual cues help reveal patterns and yield insights about the data. We offer several algorithms for coloring observations in the lower-dimensional mappings to focus the analyst’s attention on the most important and interesting relationships in the data set. In addition, through our visualization, we gain a deeper understanding of the properties of tree distances and propose a modification. Our framework can be used on any military data set that involves mixed or non-mixed variables and is valuable for analysts who wish to shed light on data during the exploratory phase of analysis.
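The mapping step named in the abstract, Classical Multidimensional Scaling, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `classical_mds` and the small dissimilarity matrix `D` are hypothetical, and `D` holds ordinary Euclidean distances rather than tree distances, which would be computed separately. The sketch double-centers the squared dissimilarities and keeps the leading eigenvectors as coordinates.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n observations in k dimensions from an n x n dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    top = np.clip(eigvals[order], 0.0, None)  # guard against small negative values
    return eigvecs[:, order] * np.sqrt(top)   # coordinates, one row per observation

# Hypothetical dissimilarities: corners of a 3-by-4 rectangle (not tree distances).
D = np.array([[0, 3, 4, 5],
              [3, 0, 5, 4],
              [4, 5, 0, 3],
              [5, 4, 3, 0]])
X = classical_mds(D, k=2)
```

Each row of `X` is a point that could be plotted and colored. Because this toy matrix is exactly Euclidean in two dimensions, the pairwise distances among the rows of `X` reproduce `D`; for general dissimilarities the embedding is only an approximation.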
Rights
Copyright is reserved by the copyright owner.
Related items
Showing items related by title, author, creator and subject.
- Improving cluster analysis with automatic variable selection based on trees
  Orr, Anton D. (Monterey, California: Naval Postgraduate School, 2014-12); Clustering is an algorithmic technique that aims to group similar objects together in order to give users better understanding of the underlying structure of their data. It can be thought of as a two-step process. The first ...
- Skill assessment of HF radar-derived products for Lagrangian simulations in the Bay of Biscay
  Solabarrieta, Lohitzune; Forlov, Sergey; Cook, Mike; Paduan, Jeff; Rubio, Anna; González, Manuel; Mader, Julien; Charria, Guillaume (American Meteorological Society, 2016-12); Since January 2009, two long-range high-frequency (HF) radar systems have been collecting hourly high-spatial-resolution surface current data in the southeastern corner of the Bay of Biscay. The temporal resolution of ...
- A scale-independent clustering method with automatic variable selection based on trees
  Lynch, Sarah K. (Monterey, California: Naval Postgraduate School, 2014-03); Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. ...