Testing the National Software Reference Library
Rowe, Neil C.
The National Software Reference Library (NSRL) is an essential data source for forensic investigators, providing in its Reference Data Set (RDS) a set of hash values of known software. However, the NSRL RDS has not previously been tested against a broad spectrum of real-world data. This work performed such a test using a corpus of 36 million files on 2337 drives from 21 countries. The experiments answered a number of important questions about the NSRL RDS, including what fraction of files of different types it recognizes. NSRL coverage by vendor/product was also tested, finding that 51% of the vendor/product names in our corpus had no hash values at all in the NSRL. It is shown that the coverage or “recall” of the NSRL can be improved with additions from our corpus, such as frequently occurring files and files whose paths were previously found in the NSRL with a different hash value. This provided 937,570 new hash values which should be uncontroversial additions to the NSRL. Several additional tests investigated the accuracy of the NSRL data. Experiments testing the hash values saw no evidence of errors. Tests of file sizes showed them to be consistent except in a few cases. On the other hand, the product types assigned by the NSRL can be disputed, and the NSRL failed to recognize any of a sample of virus-infected files. The file names provided by the NSRL had numerous discrepancies with the file names found in the corpus, so the discrepancies were categorized; among other things, there were apparent spelling and punctuation errors. Some file names suggest that NSRL hash values were computed on deleted files, which is not a safe practice. The tests had the secondary benefit of helping identify occasional errors in the metadata obtained from drive imaging of deleted files in our corpus. This research has provided much data useful for improving the NSRL and the forensic tools that depend upon it. It also provides a general methodology and software for testing hash sets against corpora.
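The recall measurement and the two kinds of proposed additions described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's software: the record layout `(drive_id, path, hash)`, the function names, and the `min_drives` threshold are all assumptions made for illustration, and the abstract does not state the actual criteria used to select the 937,570 new hash values.

```python
from collections import defaultdict


def extension_of(path):
    """Return the lowercased text after the last dot of the file name, or ''."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def recall_by_extension(records, known_hashes):
    """Fraction of corpus files whose hash is in the known set, per extension.

    records: iterable of (drive_id, path, hash) tuples (hypothetical layout).
    known_hashes: set of hash strings standing in for a reference hash set.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _drive, path, digest in records:
        ext = extension_of(path)
        totals[ext] += 1
        if digest in known_hashes:
            hits[ext] += 1
    return {ext: hits[ext] / totals[ext] for ext in totals}


def candidate_additions(records, known_hashes, known_paths, min_drives=2):
    """Unknown hashes that look like plausible additions to the hash set.

    A hash qualifies if it is seen on at least min_drives distinct drives
    (an assumed stand-in for "frequently occurring"), or if it appears at a
    path listed in known_paths, meaning the reference set knows that path
    but with a different hash value.
    """
    drives_seen = defaultdict(set)
    path_matches = set()
    for drive, path, digest in records:
        if digest in known_hashes:
            continue  # already covered by the reference set
        drives_seen[digest].add(drive)
        if path in known_paths:
            path_matches.add(digest)
    frequent = {h for h, drives in drives_seen.items() if len(drives) >= min_drives}
    return frequent | path_matches
```

Counting distinct drives rather than raw occurrences is one possible design choice: it keeps a file that is copied many times on a single drive from being treated as widespread.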
This paper appeared in the 2012 Digital Forensics Research Workshop (DFRWS 2012), Washington, DC, August 2012.
Rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.