Large scale cross-drive correlation of digital media
Authors
Bruaene, Joseph Van
Subjects
Digital Forensics
Similarity Detection
Automated Correlation
Digital Fingerprinting
Approximate Matching
Bulk Analysis
Advisors
McCarrin, Michael
Gondree, Mark
Date of Issue
2016-03
Date
Mar-16
Publisher
Monterey, California: Naval Postgraduate School
Abstract
Traditional digital forensic practices have focused on individual hard disk analysis. As the digital universe continues to grow, and cyber crimes become more prevalent, the ability to make large-scale cross-drive correlations among a large corpus of digital media becomes increasingly important. We propose a methodology that builds on bulk-analysis techniques to avoid operating system- and file system-specific parsing. In addition, we apply document similarity methods to forensic artifact correlation. By representing each disk image as a set of hash values corresponding to the 512-byte sectors on the disk, and calculating pairwise similarity scores between hard disk images, we analyze a collection of disk images taken from various storage devices purchased from the secondary market. We conclude that sector-based matching is sufficient to identify images in our dataset that share common DLLs, indicating similarity in their operating systems. We present a visualization of our results as an undirected graph with similarity scores represented as edge weights, and observe that disk images with common operating systems tend to align with graph clusters. Though no common set of sectors is present on all drives, even among the large fully connected component in our graph, we find that grouping our dataset into subsets with the same operating system version does reveal sizable collections of common sectors, and achieves the best correlation between sector matches and high-level similarities in our dataset. Extending this technique to a larger dataset and continuing our investigation of the cause of sector-level matches could yield an automated method of profiling new disk images during the triage process. Moreover, this technique could be used to corroborate deductions regarding characteristics of information systems associated with target media.
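The pipeline described in the abstract (sector hashing, pairwise scoring, weighted undirected graph) can be illustrated with a minimal Python sketch. The abstract does not name the hash function or the similarity measure, so SHA-256 and the Jaccard index are assumptions made here for illustration, and the function names `sector_hashes`, `jaccard`, and `similarity_graph` are hypothetical rather than taken from the thesis.

```python
import hashlib
from itertools import combinations

SECTOR_SIZE = 512  # bytes per sector, as stated in the abstract


def sector_hashes(image: bytes, sector_size: int = SECTOR_SIZE) -> set[str]:
    """Return the set of hashes of every complete sector in a raw image.

    SHA-256 is an assumption; the abstract does not specify the hash.
    A trailing partial sector, if any, is ignored.
    """
    hashes = set()
    for offset in range(0, len(image) - sector_size + 1, sector_size):
        sector = image[offset:offset + sector_size]
        hashes.add(hashlib.sha256(sector).hexdigest())
    return hashes


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a & b| / |a | b| (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_graph(images: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """Build an undirected weighted graph as a mapping from a sorted
    (name, name) pair to its similarity score. Pairs that share no
    sectors get no edge."""
    fingerprints = {name: sector_hashes(data) for name, data in images.items()}
    edges = {}
    for left, right in combinations(sorted(fingerprints), 2):
        score = jaccard(fingerprints[left], fingerprints[right])
        if score > 0:
            edges[(left, right)] = score
    return edges
```

This sketch holds each image in memory and does not filter out trivially common sectors such as all-zero blocks; a practical tool would likely need to stream the images and handle such sectors separately.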
Type
Thesis
Department
Computer Science
Distribution Statement
Approved for public release; distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.