Fabricating synthetic data in support of training for domestic terrorist activity data mining research

Author
Lavelle, Stephen J.
Date
2010-09
Advisor
Garfinkel, Simson
Second Reader
Dinolt, George
Abstract
Data mining is a mature technology, widespread in both government and industry. The proliferation of data storage in public and private sectors has provided more information than can be expediently processed. Data mining provides a means to extract meaningful conclusions from this growing store of data. In the interests of countering criminal and terrorist activity, data mining has become a focus of law enforcement and government agencies. The use of databases containing information on persons may conflict with privacy rights and laws. Growing public awareness of government data mining programs and databases has been accompanied by concern about, and investigation of, these programs. Following a review of data mining and privacy issues, in 2008 the National Research Council (NRC) recommended that any training in development of data mining programs involving personal data be conducted using synthesized data. This thesis seeks to present an underlying discussion of these issues, to include data mining use, a simple data synthesis model for analysis to support the validity of the NRC recommendation, and the associated difficulties encountered in the process. Included is an analysis of the inherent difficulty in creating realistic and useful data.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
Related items
Showing items related by title, author, creator and subject.
- Integrated decision technology for acquisition and contracting
Garrison, Roy; Dolk, Daniel; Barreto, Albert (Monterey, California. Naval Postgraduate School, 2008); NPS-AM-08-107
Decision technologies in the form of decision-oriented software systems have proliferated dramatically over the past two decades. Most of these systems tend to be stand-alone systems which are focused on a relatively ...
- Randomization testing of machine induced rules
Berry, Eric Dean (Monterey, California. Naval Postgraduate School, 1995-03)
The Department of Defense (DOD) possesses tremendous amounts of data stored in many large databases. Given the size of these databases large scale data analysis tools are required to find previously unknown and ...
- Agent-Based Support for Collaborative Data Mining in Systems Management
Bordetsky, Alex (IEEE, 2001)
This paper addresses the issues of structuring and supporting the collaborative data mining process. It extends the technology of multiparticipant decision making support into the data mining process and describes perspective ...