Associating Drives Based on Their Artifact and Metadata Distributions
Abstract
Associations between drive images can be important in many forensic investigations, particularly those involving organizations, conspiracies, or contraband. This work investigated metrics for comparing drives based on the distributions of 18 types of clues. The clues were email addresses, phone numbers, personal names, street addresses, possible bank-card numbers, GPS data, files in zip archives, files in rar archives, IP addresses, keyword searches, hash values on files, words in file names, words in file names of Web sites, file extensions, immediate directories of files, file sizes, weeks of file creation times, and minutes within weeks of file creation. Using a large corpus of drives, we computed distributions of document association using the cosine similarity TF/IDF formula and Kullback-Leibler divergence formula. We provide significance criteria for similarity based on our tests that are well above those obtained from random distributions. We also compared similarity and divergence values, investigated the benefits of filtering and sampling the data before measuring association, examined the similarities of the same drive at different times, and developed useful visualization techniques for the associations.
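As a rough illustration of the two measures named in the abstract, the sketch below compares two drives by the counts of one clue type (here, email addresses). It uses a common textbook form of TF/IDF cosine similarity and a smoothed Kullback-Leibler divergence; the exact weighting and smoothing used in the paper are not given in this record, so the function names, the log-based weights, and the `smoothing` parameter are all assumptions made for illustration.

```python
import math


def tfidf_cosine(counts_a, counts_b, doc_freq, num_drives):
    """Cosine similarity of two clue-count dicts under a log TF/IDF weighting.

    doc_freq maps each clue to the number of drives it appears on
    (assumed weighting; not necessarily the paper's formula).
    """
    def weights(counts):
        return {
            clue: math.log(1 + n) * math.log(num_drives / doc_freq.get(clue, 1))
            for clue, n in counts.items()
        }

    wa, wb = weights(counts_a), weights(counts_b)
    dot = sum(w * wb[clue] for clue, w in wa.items() if clue in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no informative clues on at least one drive
    return dot / (norm_a * norm_b)


def kl_divergence(counts_p, counts_q, smoothing=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two clue distributions.

    Additive smoothing (an assumption) keeps clues missing from one drive
    from producing a division by zero.
    """
    vocab = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values()) + smoothing * len(vocab)
    total_q = sum(counts_q.values()) + smoothing * len(vocab)
    divergence = 0.0
    for clue in vocab:
        p = (counts_p.get(clue, 0) + smoothing) / total_p
        q = (counts_q.get(clue, 0) + smoothing) / total_q
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical email-address counts for three drives in a 10-drive corpus.
drive_a = {"a@x.org": 3, "b@y.org": 1}
drive_b = {"a@x.org": 1, "b@y.org": 2}
drive_c = {"c@z.org": 2}
doc_freq = {"a@x.org": 2, "b@y.org": 2, "c@z.org": 1}

sim_ab = tfidf_cosine(drive_a, drive_b, doc_freq, num_drives=10)
sim_ac = tfidf_cosine(drive_a, drive_c, doc_freq, num_drives=10)
div_ab = kl_divergence(drive_a, drive_b)
```

Similarity is symmetric and bounded by 1 for non-negative weights, while the divergence is asymmetric and unbounded, which is one reason the two measures can rank drive pairs differently.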
Description
The article of record as published may be found at https://doi.org/10.1007/978-3-030-05487-8_9
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
Related items
Showing items related by title, author, creator and subject.
- Evaluating Atlantic tropical cyclone track error distributions based on forecast confidence
Hauke, Matthew D. (Monterey, California. Naval Postgraduate School, 2006-06); A new Tropical Cyclone (TC) surface wind speed probability product from the National Hurricane Center (NHC) takes into account uncertainty in track, maximum wind speed, and wind radii. A Monte Carlo (MC) model is used ...
- The distribution of wave heights and periods for seas with unimodal and bimodal power density spectra
Sharpe, Matthew Michael (Monterey California. Naval Postgraduate School, 1990); Observed distributions of wave heights and periods taken from one year of surface wave monitoring near Martha's Vineyard are compared to distributions based on narrow-band theory. The joint distributions of wave heights ...
- Computer simulation of random and non-random second-phase distributions in two-phase materials
Pas, Michael E. (Monterey, California: Naval Postgraduate School, 1990-12); The mechanical properties of any material with a discontinuous second phase dispersed in a matrix are recognized to be influenced by the distribution of the second-phase particles. Current models for the prediction of ...