Understanding of Navy Technical Language via Statistical Parsing
Abstract
A key problem in indexing technical information is the interpretation of technical words and word senses, expressions not used in everyday language. This is important for captions on technical images, whose often pithy descriptions can be valuable to decipher. We describe the natural-language processing for MARIE-2, a natural-language information retrieval system for multimedia captions. Our approach is to provide general tools for lexicon enhancement with the specialized words and word senses, and to learn word usage information (both on word senses and word-sense pairs) from a training corpus with a statistical parser. Innovations of our approach are in statistical inheritance of binary co-occurrence probabilities and in weighting of sentence subsequences. MARIE-2 was trained and tested on 616 captions (with 1009 distinct sentences) from the photograph library of a Navy laboratory. The captions had extensive nominal compounds, code phrases, abbreviations, and acronyms, but few verbs, abstract nouns, conjunctions, and pronouns. Experimental results fit a processing time in seconds of 0.0858 n^2.876 and a number of tries before finding the best interpretation of 1.809 n^1.668, where n is the number of words in the sentence. Use of statistics from previous parses definitely helped in reparsing the same sentences, helped accuracy in parsing of new sentences, and did not hurt time to parse new sentences. Word-sense statistics helped dramatically; statistics on word-sense pairs generally helped but not always.
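To make the two fitted curves in the abstract concrete, the short sketch below evaluates them for a few sentence lengths. The coefficients and exponents are taken from the abstract, read as power laws in n; the function names and the sample lengths are illustrative assumptions and do not come from MARIE-2 itself.

```python
# Illustrative sketch only: evaluates the two power-law fits reported in the
# abstract. Function names and sample sentence lengths are hypothetical.

def fitted_parse_time_seconds(n: int) -> float:
    """Fitted processing time in seconds for a sentence of n words."""
    return 0.0858 * n ** 2.876


def fitted_tries(n: int) -> float:
    """Fitted number of tries before the best interpretation is found."""
    return 1.809 * n ** 1.668


if __name__ == "__main__":
    for n in (5, 10, 20):  # sample sentence lengths, chosen arbitrarily
        print(f"n={n:2d}  time={fitted_parse_time_seconds(n):9.2f} s  "
              f"tries={fitted_tries(n):8.1f}")
```

Under these fits, doubling the sentence length multiplies the fitted time by 2^2.876 (a bit over seven) and the fitted number of tries by 2^1.668 (a bit over three), so fitted time grows noticeably faster than the fitted number of tries.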
Related items
Showing items related by title, author, creator and subject.
- Intelligent information retrieval for a multimedia database using captions.
  Guglielmo, Eugene J. (Monterey, California. Naval Postgraduate School, 1992-09)
  This report describes an intelligent information retrieval system, MARIE, that employs natural language processing techniques for indexing and retrieving multimedia data. Captions describe photographs from the Naval Air ...
- Exploiting captions in retrieval of multimedia data
  Rowe, Neil C.; Guglielmo, Eugene J. (Monterey, California. Naval Postgraduate School, 1992-07); NPS-CS-92-011
  Descriptive natural-language captions can help organize multimedia data. We described our MARIE system that interprets English queries directing the fetch of media objects. It is novel in the extent to which it exploits ...
- Exploiting captions for access to multimedia databases
  Rowe, Neil C.; Guglielmo, Eugene J. (Monterey, California. Naval Postgraduate School, 1991-04); NPS-CS-91-012
  Descriptive captions help organize noncompetitive media. But automated use of captions in retrieval from computerized multimedia databases has not been much examined because it would seem to require significant natural ...