Publication:
The Islamic State battle plan: press release natural language processing

Authors
Friedlein, James R
Subjects
Islamic State Movement
Islamic State of Iraq
ISIS
Islamic State
Natural Language Processing
text mining
corpus
generalized linear model
cascade
R Shiny
leaflet
data visualization
Advisors
Whitaker, Lyn R.
Date of Issue
2016-06
Publisher
Monterey, California: Naval Postgraduate School
Abstract
The purpose of this study is to develop methods to accelerate and enhance the analysis of Islamic State Movement text documents. We analyze a unique database of nearly 3,000 open-source translated press releases from 2003 to 2014, collected by Dr. Craig Whiteside. Using Natural Language Processing tools, we aggregate the text data into a corpus and process it based on document-term structure and frequency. To reduce analyst workload, we validate Whiteside's manual analysis and construct cross-validated generalized linear models that automatically classify documents into one of seven types. A cascade classification model outperforms all other models, with a mean cross-validated misclassification rate of 5.71 percent. Islamic State Movement operational summaries are classified as type Celebrate. We develop a layered algorithm based on regular expressions and location searches that extracts critical information from each attack event and displays the details on a map in a web-based interactive R Shiny application. With the ability to automatically classify Islamic State Movement text documents and visually interact with the data in those classified as type Celebrate, analysts and decision makers can process and understand large amounts of text data more quickly and effectively.
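The classification step can be illustrated with a minimal sketch. The thesis models were built in R, and the abstract does not specify the term weighting, the features, or the order of the cascade stages. The Python below is therefore an illustration under stated assumptions, not the author's method. It builds raw term-frequency rows and fits one logistic-regression stage (a binomial GLM with a logit link) per document type by gradient descent. At prediction time it returns the first type whose stage probability reaches 0.5. Only the type name Celebrate comes from the source. `TypeB` and `TypeC` are hypothetical placeholders for the remaining types, and the toy documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a stand-in for a fuller NLP pipeline."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Sparse document-term rows: one Counter of raw term counts per document."""
    return [Counter(tokenize(d)) for d in docs]


class BinaryGLM:
    """Logistic regression (binomial GLM, logit link) fit by batch gradient descent."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs
        self.w, self.b = {}, 0.0

    def predict_proba(self, x):
        z = self.b + sum(self.w.get(t, 0.0) * c for t, c in x.items())
        z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.epochs):
            grad_w, grad_b = Counter(), 0.0
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target  # d(log-loss)/dz
                grad_b += err
                for t, c in x.items():
                    grad_w[t] += err * c
            self.b -= self.lr * grad_b / n
            for t, g in grad_w.items():
                self.w[t] = self.w.get(t, 0.0) - self.lr * g / n
        return self


class Cascade:
    """Stage k asks 'is this type order[k]?'; later stages train only on the types left."""

    def __init__(self, order):
        self.order = order  # last entry is the fallback type with no stage of its own
        self.stages = []

    def fit(self, X, labels):
        remaining = list(zip(X, labels))
        for cls in self.order[:-1]:
            if not remaining:
                break
            xs = [x for x, _ in remaining]
            ys = [1 if lab == cls else 0 for _, lab in remaining]
            self.stages.append((cls, BinaryGLM().fit(xs, ys)))
            remaining = [(x, lab) for x, lab in remaining if lab != cls]
        return self

    def predict(self, x):
        for cls, model in self.stages:
            if model.predict_proba(x) >= 0.5:
                return cls
        return self.order[-1]


def misclassification_rate(model, X, labels):
    """Fraction of documents whose predicted type differs from the given label."""
    wrong = sum(model.predict(x) != lab for x, lab in zip(X, labels))
    return wrong / len(labels)


if __name__ == "__main__":
    # Invented toy releases; "TypeB" and "TypeC" are hypothetical placeholder types.
    docs = [
        ("the brothers attacked the checkpoint and killed soldiers", "Celebrate"),
        ("an attack on the patrol killed several soldiers", "Celebrate"),
        ("we warn the tribes to repent", "TypeB"),
        ("a warning to those who cooperate", "TypeB"),
        ("a statement explaining the new council", "TypeC"),
        ("this statement explains the council decision", "TypeC"),
    ]
    X = term_frequencies([d for d, _ in docs])
    labels = [lab for _, lab in docs]
    model = Cascade(["Celebrate", "TypeB", "TypeC"]).fit(X, labels)
    print(misclassification_rate(model, X, labels))
```

The toy run measures training error only. A figure comparable to the mean cross-validated misclassification rate in the abstract would come from calling `fit` and `misclassification_rate` inside k-fold cross-validation and averaging over folds. In a cascade, each later stage sees only the documents not claimed by earlier stages, so the stage order is a modeling choice in its own right.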
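The extraction layer can likewise be sketched, assuming simple English patterns. The abstract says only that the algorithm is layered and combines regular expressions with location searches. The sentence-splitting rule, the casualty patterns, and the `GAZETTEER` lookup table below are hypothetical stand-ins. The `GAZETTEER` coordinates are approximate and purely illustrative.

```python
import re

# Hypothetical gazetteer with approximate (lat, lon) coordinates. A real system
# would query a place-name database; this table is an illustrative assumption.
GAZETTEER = {
    "mosul": (36.34, 43.13),
    "ramadi": (33.42, 43.30),
}

# Hypothetical casualty patterns, e.g. "killing 3" or "wounded 4".
KILLED = re.compile(r"\b(?:killing|killed)\s+(\d+)", re.IGNORECASE)
WOUNDED = re.compile(r"\b(?:wounding|wounded)\s+(\d+)", re.IGNORECASE)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def find_places(sentence):
    """Location layer: gazetteer names that appear as whole words in the sentence."""
    return [name for name in GAZETTEER
            if re.search(rf"\b{re.escape(name)}\b", sentence, re.IGNORECASE)]


def extract_events(release):
    """Split a release into sentences, then apply the regex and location layers."""
    events = []
    for sentence in SENTENCE_END.split(release.strip()):
        killed = KILLED.search(sentence)
        wounded = WOUNDED.search(sentence)
        places = find_places(sentence)
        if not (killed or wounded or places):
            continue  # no extractable event detail in this sentence
        place = places[0] if places else None
        events.append({
            "killed": int(killed.group(1)) if killed else None,
            "wounded": int(wounded.group(1)) if wounded else None,
            "location": place,
            "coords": GAZETTEER[place] if place else None,
        })
    return events


if __name__ == "__main__":
    release = ("Fighters attacked a patrol in Mosul, killing 3 and wounding 4. "
               "The statement ends here. A second attack near Ramadi killed 2.")
    for event in extract_events(release):
        print(event)
```

Each event dictionary carries a coordinate pair. This is the kind of input a map front end needs to place a marker; the thesis used an R Shiny application for that display, and leaflet appears among its subjects. Patterns this literal miss spelled-out numbers and transliteration variants of place names, so a fuller version would need further layers for those cases.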
Type
Thesis
Department
Operations Research (OR)
Distribution Statement
Approved for public release; distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.