Log analysis using Splunk Hadoop Connect
Rowe, Neil C.
The purpose of this research is to use Splunk and Hadoop to perform timestamp analysis on computer logs. Splunk is a commercial data analytics tool; Hadoop is a system for large-scale distributed storage and processing. This research ingested computer logs from two kinds of forensic data in the Real Data Corpus to establish a baseline and find anomalies. We analyzed timestamps and EventIDs in more than two thousand logs across hundreds of drives. Additionally, we used packet captures from the Center for Applied Internet Data Analysis (CAIDA) to test Hadoop's ability to store data and transfer it between the Hadoop Distributed File System (HDFS) and Splunk, using Splunk Hadoop Connect to move data between a Splunk server and a Hadoop cluster. Splunk was able to effectively identify and represent statistical anomalies in log files. These anomalies could reveal misconfigurations, security concerns, or unusual but harmless traffic. Splunk could also easily transfer data to relatively inexpensive commodity servers using Splunk Hadoop Connect.
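The abstract does not spell out how the baseline was built or how anomalies were scored; that analysis was done in Splunk. Purely as an illustration, and not the thesis's actual procedure, the following minimal Python sketch shows two simple checks of the kind the abstract names: flagging EventIDs that are rare across the whole corpus of logs, and flagging timestamps that run backward or jump forward sharply within one log. The `(event_id, timestamp)` record format, the function names, the `threshold` rarity cutoff, and the 30-day `max_gap` are all hypothetical choices made for this sketch.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical simplified record: one parsed log entry as (EventID, timestamp).
Event = tuple[int, datetime]


def eventid_baseline(logs: list[list[Event]]) -> dict[int, float]:
    """Baseline: for each EventID, the fraction of logs that contain it."""
    presence: Counter = Counter()
    for events in logs:
        # Count each EventID at most once per log.
        presence.update({eid for eid, _ in events})
    return {eid: n / len(logs) for eid, n in presence.items()}


def rare_eventids(events: list[Event], baseline: dict[int, float],
                  threshold: float = 0.05) -> list[int]:
    """EventIDs in one log that appear in fewer than `threshold` of all logs."""
    return sorted({eid for eid, _ in events if baseline.get(eid, 0.0) < threshold})


def timestamp_anomalies(events: list[Event],
                        max_gap: timedelta = timedelta(days=30)) -> list[int]:
    """Indices where time runs backward or jumps forward by more than max_gap."""
    flagged = []
    for i in range(1, len(events)):
        delta = events[i][1] - events[i - 1][1]
        if delta < timedelta(0) or delta > max_gap:
            flagged.append(i)
    return flagged
```

For example, with nineteen logs containing only Windows Security EventIDs 4624 (logon) and 4634 (logoff) and one log that also contains 1102 (audit log cleared), `rare_eventids(..., threshold=0.1)` flags 1102 in the odd log only. Rarity across logs is used here because it needs no assumptions about per-log volume; a real analysis would likely also weigh event counts and time of day.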
Approved for public release; distribution is unlimited
Related items (matched by title, author, creator, and subject):
Nguyen, Thuy D.; Gondree, Mark A.; Khosalim, Jean; Irvine, Cynthia E. (2013); The Apache™ Hadoop® framework provides parallel processing and distributed data storage capabilities that data analytics applications can utilize to process massive sets of raw data. These Big Data applications ...
George, Johnu; Chen, Chien-An; Stoleru, Radu; Xie, Geoffrey (2016); The new generations of mobile devices have high processing power and storage, but they lag behind in terms of software systems for big data storage and processing. Hadoop is a scalable platform that provides distributed ...
Brida, Benjamin J. (Monterey, CA; Naval Postgraduate School, 2018-06); Conventional single node packet analyzers are unable to monitor network traffic at scale. In this thesis, elements of the Apache Hadoop ecosystem, including HBase, Spark, and MapReduce, are employed to conduct network ...