USING DCGAN TO GENERATE SYNTHETIC PACKET FLOWS FOR THREAT DETECTION
Authors
Tan, Swee Khoon
Subjects
synthetic dataset
machine learning
deep convolutional generative adversarial networks
convolutional neural network
random forest
cyber threat detection
network traffic
packet flow
Advisors
Barton, Armon C.
Date of Issue
2024-09
Publisher
Monterey, CA; Naval Postgraduate School
Abstract
Labeled network traffic data is needed to train machine learning models used in threat detection. Such data is scarce and often imbalanced because labeling is labor-intensive and requires domain expertise. Deep Convolutional Generative Adversarial Networks (DCGAN) are known for their image recognition and generation capabilities, which they achieve by learning the inherent features of images. In this thesis, we examined how a DCGAN could be trained to learn and generate synthetic packet flow data to supplement an imbalanced dataset and thereby improve the performance of Random Forest and Convolutional Neural Network (CNN) models. The study assessed the correctness and quality of the generated data and investigated its effect on classifier models at various levels of scarcity of the malicious data available for training. We found that although the DCGAN had problems generating valid data, post-processing the generated data improved its validity. The resulting generated data improved the performance of both classifier model types but fell behind the improvements achieved by up-sampling on Random Forest models. The study later showed that synthetic data could outperform up-sampling at certain scarcity levels by significantly reducing the number of false negatives at the cost of more false positives. This provides not only a solution for practitioners facing data scarcity, but also insights into enhancing threat detection using a combination of DCGAN synthetic data and CNN-based threat detection models.
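The up-sampling baseline and the false negative / false positive trade-off mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual up-sampling procedure or feature layout, so this assumes plain random oversampling with replacement, a binary label where 1 means malicious, and hypothetical helper names (`upsample_minority`, `false_negatives_and_positives`).

```python
import random


def upsample_minority(samples, labels, minority_label=1, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumption: simple random oversampling with replacement; the thesis may
    have used a different up-sampling scheme.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority samples to up-sample")
    majority_count = sum(1 for y in labels if y != minority_label)
    extra_needed = max(0, majority_count - len(minority))
    extras = [rng.choice(minority) for _ in range(extra_needed)]
    return list(samples) + extras, list(labels) + [minority_label] * len(extras)


def false_negatives_and_positives(y_true, y_pred, positive=1):
    """Count missed malicious flows (FN) and benign flows flagged as malicious (FP)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return fn, fp


# Hypothetical toy data: four benign flows and one malicious flow.
flows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
labels = [0, 0, 0, 0, 1]
balanced_flows, balanced_labels = upsample_minority(flows, labels)
```

Synthetic flows from a generator would take the place of `extras` in the same workflow, and comparing the two false negative / false positive pairs for a given classifier is one simple way to see the trade-off the abstract describes.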
Type
Thesis
Distribution Statement
Distribution Statement A. Approved for public release: Distribution is unlimited.
Rights
Copyright is reserved by the copyright owner.