CLUSTER COMPUTING FOR AUTOMATED NETWORK ANALYSIS AT SCALE

Loading...
Thumbnail Image
Authors
Brida, Benjamin J.
Subjects
big data
MapReduce
Hadoop
packet capture
traffic analysis
network analysis
Spark
HBase
Advisors
Kragh, Frank E.
Scrofani, James W.
Date of Issue
2018-06
Date
Publisher
Monterey, CA; Naval Postgraduate School
Language
Abstract
Conventional single node packet analyzers are unable to monitor network traffic at scale. In this thesis, elements of the Apache Hadoop ecosystem, including HBase, Spark, and MapReduce, are employed to conduct network traffic analysis on a large collection of network traffic. Limited analysis is conducted directly on packet capture next generation (pcapng) files on the Hadoop Distributed File System (HDFS) using MapReduce. Next, to allow for repeated analysis on the same dataset without reading all source files in their entirety for every calculation, pcapng files are parsed and relevant meta-data is bulk loaded into HBase, a Not Only Structured Query Language (NoSQL) database employing the HDFS for parallelization. This NoSQL database is then accessed via Apache Spark where pertinent data is loaded into DataFrames and additional analysis on the network traffic takes place. This research demonstrates the viability of custom, modular, automated analytics, employing open-source software to enable parallelization, to conduct traffic analysis at scale.
Type
Thesis
Description
Series/Report No
Department
Electrical and Computer Engineering (ECE)
Organization
Identifiers
NPS Report Number
Sponsors
Funder
Format
Citation
Distribution Statement
Approved for public release; distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
Collections