Multi-frame convolutional neural networks for object detection in temporal data
Existing neural-network solutions for detecting objects in video rely on a post-processing step to combine information across frames and strengthen conclusions. This technique has been successful for videos with simple, dominant objects, but it cannot detect objects when a single frame does not contain enough information to distinguish the object from its background. This problem is especially relevant in the maritime environment, where a whitecap and a human survivor may look identical except for their movement through the scene. To evaluate a neural network's ability to combine information across multiple frames, we developed two versions of a convolutional neural network: one version was given multiple frames as input, while the other was given only a single frame. We measured the performance of both versions on the benchmark 3DPeS Dataset and observed a significant increase in both recall and precision when the network was given 10 frames instead of just one. We also developed our own noisy dataset consisting of small birds flying across the Monterey Bay. This dataset contained many instances where, in a single frame, the objects to be detected were indistinguishable from the surrounding waves and debris. For this dataset, multiple frames were essential for reliable detections. We also observed a greater improvement in the false negative rate than in the false positive rate on this noisier dataset, suggesting that the additional frames were especially useful for detecting hard-to-detect objects.
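The difference between the single-frame and multi-frame inputs can be illustrated with a minimal sketch. The abstract does not say how the frames were presented to the network, so the channel-stacking approach below is an assumption, and the function name `stack_frames`, the padding rule, and the toy video are hypothetical. The sketch only shows how 10 consecutive grayscale frames might be packed into one input array so that a convolutional layer sees motion as well as appearance.

```python
import numpy as np

def stack_frames(video, end_index, n_frames):
    """Return the n_frames frames ending at end_index as one (H, W, n_frames) array.

    video: array of shape (T, H, W) holding grayscale frames in time order.
    Assumption: frames are stacked along the channel axis. Where fewer than
    n_frames precede end_index, the first frame is repeated as padding.
    """
    indices = np.arange(end_index - n_frames + 1, end_index + 1)
    indices = np.clip(indices, 0, video.shape[0] - 1)  # pad by repeating frame 0
    window = video[indices]                            # (n_frames, H, W)
    return np.moveaxis(window, 0, -1)                  # (H, W, n_frames)

# Hypothetical toy video: 12 frames of 4x4 pixels with a one-pixel object moving right.
video = np.zeros((12, 4, 4), dtype=np.float32)
for t in range(12):
    video[t, 1, t % 4] = 1.0

single = stack_frames(video, end_index=11, n_frames=1)   # single-frame input
multi = stack_frames(video, end_index=11, n_frames=10)   # multi-frame input
padded = stack_frames(video, end_index=2, n_frames=10)   # early frames, padded
```

In the single-frame array the object is one bright pixel with no motion cue; in the 10-channel array the same pixel position changes from channel to channel, which is the kind of information a whitecap-versus-survivor distinction would depend on.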
Rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
Showing items related by title, author, creator and subject.
Masek, Theodore (Monterey, California. Naval Postgraduate School, 2008-12);Cost and miniaturization of autonomous unmanned vehicles (AUV) drive component reuse and better sensor data analysis. One such component is the forward looking sonar (FLS) which can be used for obstacle avoidance and to ...
Laielli, Michael J. (Monterey, California. Naval Postgraduate School, 2012-09);This thesis describes an object detection system that extracts and combines appearance information over multiple consecutive video frames, inherently gaining and analyzing information related to motion. Objects that exhibit ...
Lombardo, Charles P. (Monterey, California. Naval Postgraduate School, 1993-09);As virtual world systems continue to evolve, the need exists to embed multimedia information into the world so users can query objects for additional information while maintaining frame rates greater than 15 frames per ...