dc.contributor.author: Rowe, Neil C.
dc.date: August 2012
dc.date.accessioned: 2013-10-08T18:22:10Z
dc.date.available: 2013-10-08T18:22:10Z
dc.date.issued: 2012-08
dc.identifier.citation: 2012 Digital Forensics Research Workshop (DFRWS 2012), Washington, DC, August 2012.
dc.identifier.uri: http://hdl.handle.net/10945/36827
dc.description: This paper appeared in the 2012 Digital Forensics Research Workshop (DFRWS 2012), Washington, DC, August 2012. [en_US]
dc.description.abstract: The National Software Reference Library (NSRL) is an essential data source for forensic investigators, providing in its Reference Data Set (RDS) a set of hash values of known software. However, the NSRL RDS has not previously been tested against a broad spectrum of real-world data. The current work did this using a corpus of 36 million files on 2337 drives from 21 countries. These experiments answered a number of important questions about the NSRL RDS, including what fraction of files it recognizes of different types. NSRL coverage by vendor/product was also tested, finding 51% of the vendor/product names in our corpus had no hash values at all in NSRL. It is shown that coverage or “recall” of the NSRL can be improved with additions from our corpus such as frequently-occurring files and files whose paths were found previously in NSRL with a different hash value. This provided 937,570 new hash values which should be uncontroversial additions to NSRL. Several additional tests investigated the accuracy of the NSRL data. Experiments testing the hash values saw no evidence of errors. Tests of file sizes showed them to be consistent except for a few cases. On the other hand, the product types assigned by NSRL can be disputed, and it failed to recognize any of a sample of virus-infected files. The file names provided by NSRL had numerous discrepancies with the file names found in the corpus, so the discrepancies were categorized; among other things, there were apparent spelling and punctuation errors. Some file names suggest that NSRL hash values were computed on deleted files, not a safe practice. The tests had the secondary benefit of helping identify occasional errors in the metadata obtained from drive imaging on deleted files in our corpus. This research has provided much data useful in improving NSRL and the forensic tools that depend upon it. It also provides a general methodology and software for testing hash sets against corpora. [en_US]
dc.publisher: Monterey, California. Naval Postgraduate School [en_US]
dc.rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States. [en_US]
dc.title: Testing the National Software Reference Library [en_US]
dc.type: Conference Paper [en_US]
dc.subject.author: NSRL [en_US]
dc.subject.author: forensics [en_US]
dc.subject.author: files [en_US]
dc.subject.author: hash values [en_US]
dc.subject.author: coverage [en_US]
dc.subject.author: accuracy [en_US]
dc.subject.author: extensions [en_US]
dc.subject.author: directories [en_US]