USING TEXTURE VECTOR ANALYSIS TO MEASURE COMPUTER AND DEVICE FILE SIMILARITY

Authors
Allen, Bruce D.
Subjects
file similarity
fingerprint
perceptual hashing
common subsequence
binary file comparison
Advisors
Rowe, Neil C.
Michael, James B.
Date of Issue
2020-03
Publisher
Monterey, CA; Naval Postgraduate School
Abstract
Computers and digital devices run executable programs. These programs are stored as executable files on storage media within the device, such as disk drives or solid-state drives, from which they are opened and run. Some executable files are preinstalled by the device vendor. Others may be installed by downloading them from the internet or by copying them from external storage media such as a memory stick or CD. Studying the similarity between executable files is useful for verifying valid updates, identifying potential copyright infringement, identifying malware, and detecting other abuse of purchased software. An alternative to simplistic methods of file comparison, such as comparing hash codes to see whether files are identical, is to identify the “texture” of files and then assess how similar that texture is between files. To test this idea, we experimented with a sample of 23 Windows executable file families and 1,386 files. We identify points of similarity between files by comparing sections of data by their standard deviations, means, modes, mode counts, and entropies, which together form a texture vector for each section. When vectors are sufficiently similar, we calculate the offsets (shifts) needed to align the corresponding sections. Using a histogram, we find the most likely offsets for blocks of similar code. Results of the experiments indicate that this approach can measure file similarity efficiently. By plotting similarity versus time, we track the progression of similarity between files.
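The steps named in the abstract (per-section statistics, vector comparison, and an offset histogram) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the 64-byte section size, the step between sections, and the per-component tolerance test are all assumptions made for illustration, since the record does not state the actual parameters or similarity threshold.

```python
import math
from collections import Counter


def texture_vector(block: bytes) -> tuple:
    """Return (std dev, mean, mode, mode count, entropy) for one section of bytes."""
    n = len(block)
    counts = Counter(block)
    mean = sum(block) / n
    std = math.sqrt(sum((b - mean) ** 2 for b in block) / n)
    # Most frequent byte value; ties are broken toward the smaller value (assumption).
    mode, mode_count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    # Shannon entropy of the byte distribution, in bits per byte.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (std, mean, mode, mode_count, entropy)


def texture_vectors(data: bytes, block_size: int = 64, step: int = 64) -> list:
    """Return (offset, vector) pairs for each full section of the data."""
    last_start = len(data) - block_size
    return [(i, texture_vector(data[i:i + block_size]))
            for i in range(0, last_start + 1, step)]


def likely_offsets(a: bytes, b: bytes, block_size: int = 64, step: int = 64,
                   tolerance: float = 0.5, top: int = 3) -> list:
    """Histogram the shifts between similar sections; return the most common ones.

    Two sections count as similar when every vector component differs by at
    most `tolerance` (a hypothetical criterion chosen for this sketch).
    """
    histogram = Counter()
    vectors_b = texture_vectors(b, block_size, step)
    for offset_a, vec_a in texture_vectors(a, block_size, step):
        for offset_b, vec_b in vectors_b:
            if all(abs(x - y) <= tolerance for x, y in zip(vec_a, vec_b)):
                histogram[offset_b - offset_a] += 1
    return histogram.most_common(top)
```

With a step equal to the section size, this sketch only detects shifts that are multiples of the step; a smaller step in one of the two files would detect finer shifts at the cost of more comparisons. A peak in the returned histogram suggests that a run of sections in one file reappears in the other at that shift.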
Type
Thesis
Department
Computer Science (CS)
Distribution Statement
Approved for public release; distribution is unlimited.
Rights
Copyright is reserved by the copyright owner.