Testing the National Software Reference Library
Abstract
The National Software Reference Library (NSRL) is an essential data source for forensic investigators, providing in its Reference Data Set
(RDS) a set of hash values of known software. However, the NSRL RDS has not previously been tested against a broad spectrum of real-world
data. The current work did this using a corpus of 36 million files on 2337 drives from 21 countries. These experiments answered a number of
important questions about the NSRL RDS, including what fraction of files of different types it recognizes. NSRL coverage by vendor/product
was also tested, finding that 51% of the vendor/product names in our corpus had no hash values at all in NSRL. It is shown that coverage or “recall”
of the NSRL can be improved with additions from our corpus such as frequently-occurring files and files whose paths were found previously in
NSRL with a different hash value. This provided 937,570 new hash values which should be uncontroversial additions to NSRL. Several
additional tests investigated the accuracy of the NSRL data. Experiments testing the hash values saw no evidence of errors. Tests of file sizes
showed them to be consistent except for a few cases. On the other hand, the product types assigned by NSRL can be disputed, and it failed to
recognize any of a sample of virus-infected files. The file names provided by NSRL had numerous discrepancies with the file names found in the
corpus, so the discrepancies were categorized; among other things, there were apparent spelling and punctuation errors. Some file names suggest
that NSRL hash values were computed on deleted files, which is not a safe practice. The tests had the secondary benefit of helping identify occasional
errors in the metadata obtained from drive imaging on deleted files in our corpus. This research has provided much data useful in improving
NSRL and the forensic tools that depend upon it. It also provides a general methodology and software for testing hash sets against corpora.
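As an illustration of the kind of test the abstract describes, the sketch below computes the coverage ("recall") of a hash set against a corpus and lists unrecognized hash values that recur across drives as candidate additions. It is a minimal sketch under stated assumptions, not the software from the paper: the function names, the (drive, hash) record layout, and the min_drives threshold are all hypothetical.

```python
from collections import defaultdict


def coverage(corpus, known_hashes):
    """Return the fraction of corpus files whose hash is in known_hashes.

    corpus is a list of (drive_id, file_hash) pairs (an assumed layout).
    """
    if not corpus:
        return 0.0
    hits = sum(1 for _, file_hash in corpus if file_hash in known_hashes)
    return hits / len(corpus)


def frequent_unknown(corpus, known_hashes, min_drives=2):
    """Return unknown hashes seen on at least min_drives distinct drives.

    min_drives is an illustrative threshold, not a criterion from the paper.
    """
    drives_by_hash = defaultdict(set)
    for drive_id, file_hash in corpus:
        if file_hash not in known_hashes:
            drives_by_hash[file_hash].add(drive_id)
    return {h for h, drives in drives_by_hash.items() if len(drives) >= min_drives}


# Toy data: five files on three drives, one hash already in the reference set.
corpus = [("d1", "aa"), ("d1", "bb"), ("d2", "bb"), ("d2", "cc"), ("d3", "aa")]
known = {"aa"}
recall = coverage(corpus, known)
candidates = frequent_unknown(corpus, known)
```

Counting distinct drives rather than raw occurrences keeps a file that is duplicated many times on one drive from looking widespread.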
Description
This paper appeared in the 2012 Digital Forensics Research Workshop (DFRWS 2012), Washington, DC, August 2012.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.