PREDICTING THE AUTHENTICITY OF CODE-SWITCHED TEXT GENERATED BY A LARGE LANGUAGE MODEL

Authors
Horan, Lucas J.
Subjects
large language models
natural language processing
ChatGPT
Indo-Pacific
Japan
Japanese
code-switching
Advisors
Yoshida, Ruriko
Huang, Jefferson
Date of Issue
2023-09
Publisher
Monterey, CA; Naval Postgraduate School
Abstract
Japan is a crucial partner in the U.S. Navy’s effort to remain the premier naval force in an increasingly contested Indo-Pacific region. However, in the current era of generative technologies such as the large language model (LLM) Chat Generative Pre-trained Transformer (ChatGPT), malevolent actors worldwide possess an unprecedented capability to generate text-based synthetic media that can sow disarray among allies. Consequently, alliances between the United States and its non-English-speaking allies, like Japan, can be tested by text-based deepfakes that seek to reinforce their credibility by using the native languages of both countries; fabricated bilingual diplomatic statements, military communiqués, or news articles all have the potential to upend U.S. global partnerships. Employing the tools of natural language processing (NLP), our research examines whether we can detect if bilingual text that may have been created to undermine the relationship between the United States and Japan is “authentic” (that is, human-made) or “inauthentic” (that is, generated by an LLM, namely ChatGPT). In our limited trials, logistic regression achieved 96% accuracy; support vector machine (SVM), k-nearest neighbor (KNN), and naive Bayes models produced similar results, each with slightly different misclassifications.
Type
Thesis
Department
Operations Research (OR)
Distribution Statement
Approved for public release. Distribution is unlimited.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.