MACHINE LEARNING OF EXTREMELY LARGE SETS OF SIGNAL COLLECTIONS USING CLUSTER COMPUTING
Authors
Ferris, Christopher L.
Advisors
Kragh, Frank E.
Scrofani, James W.
Subjects
machine learning
cluster computing
signal collection
signal analysis
Date of Issue
2019-12
Publisher
Monterey, CA; Naval Postgraduate School
Abstract
At any given moment, a multitude of signals is transmitted over the airwaves, creating both a large intelligence opportunity and a difficult reconnaissance problem. As technology advances, cluster computing methods must be explored to close the intelligence gap created by an ever-larger volume of data and a limited number of human analysts. In this thesis, Apache HBase, Phoenix, and Spark are employed on an Amazon Web Services (AWS) Elastic MapReduce (EMR) cluster to store and query a large-scale signals database and to run the K-means machine learning algorithm on it. The databases tested contain up to 100 million randomly generated signals, each with nine feature columns of metadata. The signal data set is first bulk-loaded into HBase, and a Phoenix layer is implemented on top of it. The data is then queried from Spark into a DataFrame for machine learning. Additionally, the K-means implementations are run on several cluster configurations to measure performance as a function of the number of computers in the cluster. This thesis demonstrates the capabilities and benefits of using open-source software and cluster computing to implement large-scale machine learning on signal metadata.
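As a rough illustration of the clustering step named in the abstract, the following is a minimal single-machine sketch of K-means (Lloyd's algorithm) on randomly generated nine-feature records. It is not the thesis's implementation, which runs on Spark against data held in HBase and exposed through Phoenix. The function names, the uniform feature distribution, the record count, and the choice of k = 4 are all assumptions made for this example.

```python
import random


def make_signals(n, n_features=9, seed=0):
    """Generate n hypothetical signal records with uniform random features.

    The uniform [0, 1) distribution is an assumption for illustration only.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_features)] for _ in range(n)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Run a fixed number of Lloyd iterations.

    Returns the final centroids and the cluster labels computed in the
    last assignment step.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct records chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each record with its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


signals = make_signals(1000)
centroids, assignments = kmeans(signals, k=4)
```

On a cluster, the assignment step is the part that parallelizes naturally, since each record's nearest centroid can be found independently of every other record. That is one reason runtime can be studied as a function of the number of machines.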
Type
Thesis
Department
Electrical and Computer Engineering (ECE)
Distribution Statement
Approved for public release; distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
