Data mining of extremely large ad hoc data sets to produce inverted indices
Coudray, Aaron D.
The purpose of this study is to leverage existing Internet-sized ad hoc data sets by creating an inverted index that will enable a robust search capability. In particular, this study is focused on the Common Crawl web corpus. This involves exploring the tools and techniques necessary to effectively traverse this data set, as well as producing the tools to create an inverted index relationship between the terms and websites found within web archive files. The primary tools utilized in this process are Apache Hadoop, Apache MapReduce, Amazon Web Services, and Java. Additionally, methods to enhance this relationship with other information of interest are investigated in this thesis. Specifically, an index was developed that contains the added component of term relative location. This inverted index relationship is an essential component of, and the first step in, creating a robust search capability for a very large ad hoc data set.
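As an illustration of the term-to-website relationship described above, the following is a minimal in-memory sketch in Java. It is not the thesis code: the class name `PositionalIndexSketch`, the method `build`, the sample site names, and the tokenization rule are all hypothetical. It also assumes that "term relative location" means a term's token offset divided by the number of tokens in the page, which the abstract does not specify. The thesis itself works with Hadoop MapReduce over web archive files; here a plain loop stands in for the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PositionalIndexSketch {

    /**
     * Builds term -> (site -> relative positions) from site -> page text.
     * Relative position is an ASSUMED definition: token offset / token count,
     * so values fall in [0, 1).
     */
    static Map<String, Map<String, List<Double>>> build(Map<String, String> pages) {
        Map<String, Map<String, List<Double>>> index = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            // Hypothetical tokenizer: lowercase, split on anything that is not a-z or 0-9.
            String[] raw = page.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            List<String> terms = new ArrayList<>();
            for (String t : raw) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
            // Roughly the "map" step (emit term, site, position) and the
            // "reduce" step (group by term) collapsed into one pass.
            for (int i = 0; i < terms.size(); i++) {
                double relative = (double) i / terms.size();
                index.computeIfAbsent(terms.get(i), k -> new TreeMap<>())
                     .computeIfAbsent(page.getKey(), k -> new ArrayList<>())
                     .add(relative);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new TreeMap<>();
        pages.put("a.example", "Hadoop builds an index");
        pages.put("b.example", "An inverted index maps terms to sites");
        // Print each term with the sites that contain it and where it occurs.
        build(pages).forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

Looking up a term in the returned map yields every site that contains it, together with where in each page the term occurs, which is the kind of lookup a search capability would build on.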
Includes supplementary material
Rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.