dc.contributor.advisor: Kragh, Frank
dc.contributor.advisor: Xie, Geoffrey
dc.contributor.author: Chang, Tao-hsiang
dc.date: Mar-17
dc.date.accessioned: 2017-05-10T16:31:27Z
dc.date.available: 2017-05-10T16:31:27Z
dc.date.issued: 2017-03
dc.identifier.uri: http://hdl.handle.net/10945/52962
dc.description: Approved for public release; distribution is unlimited [en_US]
dc.description.abstract: Data mining can be a valuable tool, particularly in the acquisition of military intelligence. As the second study within a larger Naval Postgraduate School research project using Amazon Web Services (AWS), this thesis focuses on data mining on a very large data set (32 TB) with the open web crawler data set Common Crawl. Similar to previous studies, this research employs MapReduce (MR) for sorting and categorizing output value pairs. Our research, however, is the first to implement the basic Reverse Web-Link Graph (RWLG) algorithm as a search capability for web sites, with validation that it works correctly. A second goal is to extend the RWLG algorithm using a full Common Crawl archive as input for processing as a single MR job. To mitigate the out-of-memory error, we relate some environment variables with the Yet Another Resource Negotiator (YARN) architecture and provide some sample error tracking methods. As a further contribution, this study considers limitations associated with using AWS, which inform our recommendations for future work. [en_US]
dc.description.uri: http://archive.org/details/datminingofextre1094552962
dc.publisher: Monterey, California: Naval Postgraduate School [en_US]
dc.rights: Copyright is reserved by the copyright owner. [en_US]
dc.title: Data mining of extremely large ad-hoc data sets to produce reverse web-link graphs [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Electrical and Computer Engineering (ECE)
dc.subject.author: Amazon Web Services [en_US]
dc.subject.author: cluster computing [en_US]
dc.subject.author: data mining [en_US]
dc.subject.author: Hadoop MapReduce [en_US]
dc.subject.author: the Common Crawl [en_US]
dc.description.service: Lieutenant, Taiwan Navy [en_US]
etd.thesisdegree.name: Master of Science in Electrical Engineering [en_US]
etd.thesisdegree.name: Master of Science in Computer Science [en_US]
etd.thesisdegree.level: Masters [en_US]
etd.thesisdegree.discipline: Electrical Engineering and Computer Science [en_US]
etd.thesisdegree.grantor: Naval Postgraduate School [en_US]

