A MACHINE LEARNING APPROACH FOR CLASSIFYING JAVASCRIPT USING STATIC CODE ANALYSIS
Authors
Miller, Michael D.
Subjects
JavaScript
machine learning
static
behavior
large data sets
Jupyter Notebook
real world data
Advisors
McEachen, John C.
Tummala, Murali
Date of Issue
2022-03
Publisher
Monterey, CA; Naval Postgraduate School
Abstract
This thesis develops a machine learning approach to classify normal and anomalous JavaScript based on a static analysis of select features derived from the top 30 000 webpages on the internet. A dataset of 136 features was extracted from 100 000 raw JavaScript files. Nine test groups were created and tested using 10 subsets of features. K-means clustering was used to group the data, and the resulting clusters were manually translated into a binary classification. The K-means results show moderate performance, with distortions less than 1.0 from elbow plot analysis and average silhouette scores between 0.3 and 0.8 from silhouette analysis. The classification of each JavaScript file was then examined using a naïve Bayes algorithm to re-create the highest-performing classifiers, and to evaluate their performance, using a less processing-intensive method. Naïve Bayes was not a good model for re-creating the K-means classifier. The best-performing classifiers had a Matthews correlation coefficient of 0.75 when examining small JavaScript files and less than 0.38 when examining medium or large JavaScript files. The results show that most JavaScript files were small, and file size was the only defining feature. No other feature tested effectively categorized the vast majority of JavaScript. Further research is needed to find features that more accurately encompass the majority of JavaScript in order to define normal JavaScript.
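The abstract reports classifier quality as a Matthews correlation coefficient (MCC). As a reading aid, the following is a minimal sketch of how that metric is computed from binary labels. It is not the thesis code: the function name, the 1 = anomalous / 0 = normal label encoding, and the sample labels are assumptions made purely for illustration.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Return the MCC for two equal-length sequences of 0/1 labels.

    Assumed encoding (illustrative only): 1 = anomalous, 0 = normal.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the binary confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up labels, not data from the thesis: 3 TP, 3 TN, 1 FP, 1 FN.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
score = matthews_corrcoef(reference, predicted)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement). Because it uses all four confusion-matrix cells, it remains informative when one class dominates, which matters for a dataset in which most files fall into a single group.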
Type
Thesis
Department
Electrical and Computer Engineering (ECE)
Sponsors
National Security Agency, Ft. Meade, MD, 20755
Distribution Statement
Approved for public release. Distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.