Naval Postgraduate School
Dudley Knox Library
Applications and benefits for big data sets using tree distances and the t-SNE algorithm

Author
Lee, Suyoung
Date
2016-03
Advisor
Buttrey, Samuel E.
Second Reader
Whitaker, Lyn R.
Abstract
Modern data sets often consist of unstructured data and mixed data; that is, they include both numerical and categorical variables. Often, these data sets also include noise, redundancy, missing values and outliers. Clustering is one of the most important and widely used data analytic methods. However, clustering requires the ability to measure distances or dissimilarities, which are not defined in an obvious way for mixed data. Practitioners often use the Gower dissimilarity for this task. In this work we use tree distances, computed with Buttrey's treeClust package in R as discussed by Buttrey and Whitaker in 2015, to process mixed data while also handling missing values and outliers. Visualization is also an important method for big data. We use the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm, introduced by van der Maaten and Hinton in 2008, which visualizes high-dimensional data by assigning each data point a location in a two- or three-dimensional map. We also use the popular visualization techniques grouped under the name multidimensional scaling. We compare the results obtained with tree distances and the t-SNE algorithm to those obtained with Gower dissimilarity and multidimensional scaling. Unlike established dimensionality reduction techniques, which generally map from high dimensions directly to two (or three) dimensions, we explore a new approach in which the dimensionality reduction takes place in several separate steps. Our experiments show that our new techniques can outperform the established techniques in producing visualizations of high-dimensional mixed data.
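For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the Gower dissimilarity for mixed data. It is not the thesis's method (the thesis works with tree distances from the treeClust package in R); it only illustrates the standard definition: range-scaled absolute differences for numeric variables, simple matching for categorical ones, averaged over the variables observed in both rows. The function name `gower_dissimilarity`, the use of `None` for missing values, and the toy data are assumptions made for illustration.

```python
from typing import Optional, Sequence, Union

# None marks a missing value (an assumption of this sketch).
Value = Optional[Union[float, str]]


def gower_dissimilarity(rows: Sequence[Sequence[Value]],
                        is_numeric: Sequence[bool]) -> list[list[float]]:
    """Pairwise Gower dissimilarity matrix for mixed numeric/categorical rows."""
    n_vars = len(is_numeric)
    # Range of each numeric column over its observed (non-missing) values.
    ranges = []
    for k in range(n_vars):
        if is_numeric[k]:
            observed = [row[k] for row in rows if row[k] is not None]
            ranges.append(max(observed) - min(observed) if observed else 0.0)
        else:
            ranges.append(None)

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total, used = 0.0, 0
            for k in range(n_vars):
                a, b = rows[i][k], rows[j][k]
                if a is None or b is None:
                    continue  # skip variables missing in either row
                if is_numeric[k]:
                    # Range-scaled absolute difference, between 0 and 1.
                    total += abs(a - b) / ranges[k] if ranges[k] > 0 else 0.0
                else:
                    # Simple matching: 0 if the categories agree, 1 otherwise.
                    total += 0.0 if a == b else 1.0
                used += 1
            # Average over the variables observed in both rows.
            d = total / used if used else float("nan")
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical toy data: one numeric and one categorical variable.
rows = [(1.0, "a"), (3.0, "b"), (5.0, "a"), (None, "b")]
d = gower_dissimilarity(rows, is_numeric=[True, False])
```

A matrix like `d` is the kind of input that multidimensional scaling, or t-SNE run on precomputed distances, would take to produce a two-dimensional map. Rows with a missing value are compared only on the variables they share, which is how the Gower approach accommodates incomplete records.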
Rights
Copyright is reserved by the copyright owner.
URI
http://hdl.handle.net/10945/48546
Collections
  • 1. Thesis and Dissertation Collection, all items

Related items

Showing items related by title, author, creator and subject.

  • 3D visualization of an invariant display strategy for hyperspecteral imagery 

    Kim, Kang Suk (Monterey, California. Naval Postgraduate School, 2002-12);
    Spectral Imagery provides multi-dimensional data, which are difficult to display in standard three-color image formats. Tyo, et al. (2001) propose an invariant display strategy to address this problem. This approach is to ...
  • A SYSTEMS ENGINEERING APPROACH TO COMPARING MIXED REALITY GAMING ENGINES WITHIN THE DOD 

    Cha, Ted L.; Davis, Blake A.; Shutte, Zachariah R.; Snodgrass, Douglas J.; Wimsatt, Christopher J.; Ybarra, Rene V. (Monterey, CA; Naval Postgraduate School, 2020-12);
    Joint Special Operations Command (JSOC), the primary stakeholder of this report, identified a need to visualize the operating environment prior to mission execution. Historically, JSOC performed visualization by two-dimensional ...
  • Scenario Authoring and Visualization for Advanced Graphical Environments (SAVAGE) 

    Nicklaus, Shane D. (Monterey, California. Naval Postgraduate School, 2001-09);
    Today's planning and modeling systems use two-dimensional (2D) representations of the three-dimensional (3D) battlespace. This presents a challenge for planners, commanders, and troops to understand the true nature of the ...