Log analysis using Splunk Hadoop Connect
Rowe, Neil C.
The purpose of this research is to use Splunk and Hadoop to perform timestamp analysis on computer logs. Splunk is a commercial data analytics tool; Hadoop is a system for large-scale distributed storage and processing. This research ingested computer logs from two kinds of forensic data in the Real Data Corpus to establish a baseline and find anomalies. We analyzed timestamps and EventIDs in more than two thousand logs across hundreds of drives. Additionally, we used packet captures from the Center for Applied Internet Data Analysis to test Hadoop's ability to store data and transfer it between the Hadoop Distributed File System (HDFS) and Splunk. We used Splunk Hadoop Connect for data transfer between a Splunk server and a Hadoop cluster. Splunk effectively identified and represented statistical anomalies in the log files. These anomalies could reveal misconfigurations, security concerns, or unusual but harmless traffic. Splunk could also easily transfer data to relatively inexpensive commodity servers using Splunk Hadoop Connect.
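To make the idea of "establishing a baseline and finding anomalies" in timestamps and EventIDs concrete, here is a minimal, hypothetical sketch in plain Python. It is not the thesis's method and does not use Splunk or Hadoop. It assumes log records have already been parsed into EventID values and `datetime` timestamps. It flags EventIDs whose share of all records falls at or below an assumed threshold (`max_fraction`), and timestamps that fall outside an assumed plausible window. The function names, the threshold, and the window are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime


def rare_event_ids(event_ids, max_fraction=0.01):
    """Hypothetical helper: return EventIDs whose share of all records
    is at most max_fraction (the baseline here is simply overall frequency)."""
    counts = Counter(event_ids)
    total = sum(counts.values())
    # An empty input gives an empty Counter, so the division never runs.
    return sorted(eid for eid, n in counts.items() if n / total <= max_fraction)


def out_of_range_timestamps(timestamps, earliest, latest):
    """Hypothetical helper: return timestamps outside the assumed
    plausible window [earliest, latest], e.g. clock errors or tampering."""
    return [t for t in timestamps if t < earliest or t > latest]


if __name__ == "__main__":
    # Illustrative EventID values only; not data from the Real Data Corpus.
    ids = [4624] * 60 + [4634] * 39 + [1102]
    print(rare_event_ids(ids, max_fraction=0.05))

    stamps = [datetime(1980, 1, 1), datetime(2010, 5, 3)]
    print(out_of_range_timestamps(stamps, datetime(2000, 1, 1), datetime(2015, 12, 31)))
```

Running the example prints `[1102]` for the rare EventID and a list containing only the 1980 timestamp. A real analysis would compute the baseline across many drives rather than within a single log, and would tune the thresholds to that corpus.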
Approved for public release; distribution is unlimited
Showing items related by title, author, creator and subject.
Nguyen, Thuy D.; Gondree, Mark A.; Khosalim, Jean; Irvine, Cynthia E. (2013); The Apache™ Hadoop® framework provides parallel processing and distributed data storage capabilities that data analytics applications can utilize to process massive sets of raw data. These Big Data applications ...
George, Johnu; Chen, Chien-An; Stoleru, Radu; Xie, Geoffrey (2016); The new generations of mobile devices have high processing power and storage, but they lag behind in terms of software systems for big data storage and processing. Hadoop is a scalable platform that provides distributed ...
George, Johnu; Chen, Chien-An; Stoleru, Radu; Xie, Geoffrey G.; Sookoor, Tamim; Bruno, David (IEEE, 2014-12-01); We envision a future where real-time computation on the battlefield provides the tactical advantage to an Army over its adversary. The ability to collect and process large amounts of data to provide actionable information ...