Visualizing mixed variable-type multidimensional data using tree distances
Whitaker, Lyn R.
Buttrey, Samuel E.
This research explores the use of the tree distances of Buttrey and Whitaker to visualize multidimensional data of mixed variable types, that is, data containing both numerical and categorical variables. Tree distances measure dissimilarities among observations in a data set while exploiting desirable properties of classification and regression trees: ease of handling most variable types, indifference to variable scaling, resistance to noise and outliers, accommodation of missing values, and computational ease. In this research, we use Classical Multidimensional Scaling to map the dissimilarities into a lower-dimensional Euclidean space. This gives the analyst a familiar framework whose visual cues help reveal patterns and yield insights about the data. In this thesis we offer several algorithms for coloring observations in the lower-dimensional mappings so as to focus the analyst's attention on the most important and interesting relationships in the data set. In addition, our visualizations give us a deeper understanding of the properties of tree distances, and we propose a modification to them. Our framework can be applied to any military data set, whether its variables are of mixed or a single type, and is valuable to analysts who wish to shed light on their data during the exploratory phase of analysis.
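The abstract describes a two-stage pipeline: compute tree-based dissimilarities, then embed them with Classical Multidimensional Scaling. The sketch below illustrates that pipeline under stated assumptions. It assumes each observation has already been dropped through a set of fitted trees, so that `leaves[t, i]` holds the leaf that observation `i` reaches in tree `t`. The dissimilarity used here (the fraction of trees in which two observations land in different leaves) is a simplified stand-in chosen for illustration, not Buttrey and Whitaker's exact tree distance. The function names `leaf_disagreement` and `classical_mds` are hypothetical.

```python
import numpy as np


def leaf_disagreement(leaves: np.ndarray) -> np.ndarray:
    """Simplified tree-style dissimilarity (an illustrative assumption).

    leaves[t, i] is the leaf index that observation i reaches in tree t.
    Returns D[i, j] = fraction of trees in which i and j fall in different leaves.
    """
    leaves = np.asarray(leaves)
    n_trees = leaves.shape[0]
    # Compare every pair of observations within each tree, then average over trees.
    diff = leaves[:, :, None] != leaves[:, None, :]
    return diff.sum(axis=0) / n_trees


def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed a dissimilarity matrix in k Euclidean dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)      # B is symmetric; eigenvalues ascend
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)  # negative eigenvalues (non-Euclidean D) -> 0
    return eigvecs[:, order] * np.sqrt(lam)   # coordinates X with X @ X.T ~ B


# Toy example: 3 hypothetical trees, 4 observations.
leaves = np.array([[0, 0, 1, 1],
                   [0, 0, 0, 1],
                   [2, 2, 3, 3]])
D = leaf_disagreement(leaves)
X = classical_mds(D, k=2)
```

Each row of `X` is a point that could be plotted and colored, for example by a categorical variable of interest. In this toy case observations 0 and 1 share a leaf in every tree, so they map to the same point. Classical MDS reproduces the dissimilarities exactly only when they are Euclidean. Otherwise the clipped negative eigenvalues measure how much structure the two-dimensional picture discards.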
Approved for public release; distribution is unlimited
Related items (by title, author, creator and subject):
Orr, Anton D. (Monterey, California: Naval Postgraduate School, 2014-12);Clustering is an algorithmic technique that aims to group similar objects together in order to give users a better understanding of the underlying structure of their data. It can be thought of as a two-step process. The first ...
Solabarrieta, Lohitzune; Forlov, Sergey; Cook, Mike; Paduan, Jeff; Rubio, Anna; González, Manuel; Mader, Julien; Charria, Guillaume (American Meteorological Society, 2016-12);Since January 2009, two long-range high-frequency (HF) radar systems have been collecting hourly high-spatial-resolution surface current data in the southeastern corner of the Bay of Biscay. The temporal resolution of ...
Lynch, Sarah K. (Monterey, California: Naval Postgraduate School, 2014-03);Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. ...