FREQUENCY-BASED FEATURE EXTRACTION FOR MALWARE CLASSIFICATION
Erwert, Jonathan P.
Rowe, Neil C.
Traditional signature-based malware detection is effective, but it can only identify known malicious programs. This thesis applies machine-learning techniques to identify previously unknown malware in a set of Windows executable programs. We computed histograms of the 4-, 8-, and 16-bit sequence values contained in each program. We then analyzed the effectiveness of using these histograms, in part or in full, as feature vectors for machine-learning experiments. We also explored the effect of skipping an offset at the beginning of each program on classifier performance. We show that a machine-learning classifier can be learned from these features, with an F-measure in excess of 90% attained in one of our experiments. Using only part of the histogram as the feature vector did not significantly affect classifier performance up to a point, nor did including an offset. Our results also suggest that features derived from histograms are better suited to tree-based algorithms than to Bayesian methods.
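As an illustration of the kind of feature described in the abstract, the following is a minimal Python sketch of a normalized histogram over 4-, 8-, or 16-bit sequence values, with an optional offset. It is not the thesis's implementation. The function name `bit_sequence_histogram`, the use of non-overlapping big-endian 16-bit words, the interpretation of the offset as a number of leading bytes to skip, and the normalization to relative frequencies are all assumptions made for this example.

```python
from collections import Counter


def bit_sequence_histogram(data: bytes, bits: int, offset: int = 0) -> list[float]:
    """Return relative frequencies of each `bits`-wide value in `data`.

    Hypothetical helper: the sequence boundaries, byte order, and offset
    semantics are assumptions, not details taken from the thesis.
    """
    if bits not in (4, 8, 16):
        raise ValueError("bits must be 4, 8, or 16")

    # Assumption: the offset skips this many bytes at the start of the file.
    payload = data[offset:]

    if bits == 4:
        # Split every byte into its high and low nibble.
        values = [n for b in payload for n in (b >> 4, b & 0x0F)]
    elif bits == 8:
        values = list(payload)
    else:
        # Assumption: non-overlapping big-endian 16-bit words;
        # a trailing odd byte is ignored.
        usable = len(payload) - len(payload) % 2
        values = [(payload[i] << 8) | payload[i + 1] for i in range(0, usable, 2)]

    size = 1 << bits  # 16, 256, or 65536 bins
    total = len(values)
    if total == 0:
        return [0.0] * size

    counts = Counter(values)
    return [counts.get(v, 0) / total for v in range(size)]


# Usage on a few illustrative bytes (not a real executable).
sample = bytes([0x4D, 0x5A, 0x90, 0x00])
h4 = bit_sequence_histogram(sample, 4)
h8 = bit_sequence_histogram(sample, 8)
h16 = bit_sequence_histogram(sample, 16)
h8_offset = bit_sequence_histogram(sample, 8, offset=2)
```

Each returned list could serve as a full feature vector for one program. How a partial histogram would be selected is not stated in the abstract, so it is left out of this sketch.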
Approved for public release; distribution is unlimited.