Naval Postgraduate School
Dudley Knox Library
Data mining of extremely large ad-hoc data sets to produce reverse web-link graphs

Download
17Mar_Chang_Tao-hsiang.pdf (1.481Mb)
Author
Chang, Tao-hsiang
Date
2017-03
Advisor
Kragh, Frank
Xie, Geoffrey
Abstract
Data mining can be a valuable tool, particularly in the acquisition of military intelligence. As the second study within a larger Naval Postgraduate School research project using Amazon Web Services (AWS), this thesis focuses on mining a very large data set (32 TB) drawn from the open web crawler data set Common Crawl. Like previous studies, this research employs MapReduce (MR) to sort and categorize output key-value pairs. Our research, however, is the first to implement the basic Reverse Web-Link Graph (RWLG) algorithm as a search capability for web sites and to validate that it works correctly. A second goal is to extend the RWLG algorithm so that a full Common Crawl archive can be processed as input to a single MR job. To mitigate out-of-memory errors, we relate several environment variables to the Yet Another Resource Negotiator (YARN) architecture and provide sample error-tracking methods. As a further contribution, this study considers limitations associated with using AWS, which inform our recommendations for future work.
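
For readers unfamiliar with the RWLG idea named in the abstract, the following is a minimal in-memory sketch of the general textbook technique, not the thesis's implementation. It assumes the input is already a mapping from each source page to the links found on it. The function names (`map_links`, `shuffle`, `reduce_sources`, `reverse_web_link_graph`) and the example URLs are hypothetical. A real MR job over Common Crawl would instead read crawl archive records and run on a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_links(source_url: str, outgoing: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map step: for every link on a page, emit a (target, source) pair."""
    for target in outgoing:
        yield target, source_url


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle step: group all emitted values by key (the target URL)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for target, source in pairs:
        grouped[target].append(source)
    return grouped


def reduce_sources(target: str, sources: Iterable[str]) -> Tuple[str, List[str]]:
    """Reduce step: collapse the sources for one target into a sorted, unique list."""
    return target, sorted(set(sources))


def reverse_web_link_graph(pages: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Run map, shuffle, and reduce in memory over a {source: [targets]} mapping."""
    pairs = (
        pair
        for source, links in pages.items()
        for pair in map_links(source, links)
    )
    grouped = shuffle(pairs)
    return dict(reduce_sources(t, s) for t, s in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy input: each key is a page, each value its outgoing links.
    pages = {
        "a.example": ["b.example", "c.example"],
        "b.example": ["c.example"],
        "c.example": ["a.example", "a.example"],
    }
    for target, sources in sorted(reverse_web_link_graph(pages).items()):
        print(target, "<-", sources)
```

The result maps each target to the pages that link to it, which is what makes the graph usable as a "who links to this site" lookup. Deduplicating in the reduce step is a design choice of this sketch; a job that needs link counts would keep the duplicates.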
Rights
Copyright is reserved by the copyright owner.
URI
http://hdl.handle.net/10945/52962
Collections
  • 1. Thesis and Dissertation Collection, all items