Splunk

I am three weeks into working with Splunk, and this post gathers my thoughts on the experience of learning and using this tool so far.

There are times when you really want to make sense of the data, and that’s when reporting and dashboarding utilities come into the picture. Everything is literally rainbows and sunshine as long as the data to be processed fits within a normal computer's limits. But what if you are assigned the task of interpreting a huge amount of data generated from many different sources? Casually think about it: there is machine data, which alone makes the volume huge. Then there is data generated in the form of system logs, some of it structured and some unstructured. And structured data in various formats from various sources becomes effectively unstructured again the moment you try to combine it. Forget interpretation and visualisation; the first question is, how will you even segregate it?

Data is the new oil, they say. That should be obvious by now, given the emergence of so many data analytics and processing tools and technologies. Over time, customers have realised the value of this verbose, so-called garbage data, because it is generated by the IT infrastructure that supports their businesses, and that IT investment is often the cornerstone of their business operations. Analysing the logs gives critical insights that enable the technology to better align with business goals and vision.

Splunk does just that. In my naive experience, I have come to know it as a tool with the power to process structured and unstructured data to generate interesting statistics and feeds, which can be used to take proactive action before the next blunder happens. For the blunders that have already happened, it can very well accelerate the process of remediation. Splunk acts on the data in a smart and innovative way: it gives data a unified structure by indexing it, which helps it return results for our queries very quickly. Based on this indexed data, it is intelligent enough to identify the hosts, sources and source types, along with several other parameters, which it prioritizes based on their occurrences, even without you asking for it. Thus, I think it gives you a great starting point to deep dive into it.
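To illustrate, here is a minimal SPL sketch (the index name `main` is just the common default, a hypothetical choice here) that leans on those automatically extracted fields, counting events the same way Splunk prioritizes them, by occurrence:

```
index=main
| stats count by host, sourcetype
| sort -count
```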

Splunk has its own query language known as SPL (Search Processing Language), which, by the way, is great. There is a good-sized dev community that takes pride in being expert in it. And why shouldn’t they be? They are the translators who listen to the server farm garbage and whisper the trade secrets in your ears. Not everyone can do that. There is also a dev community that builds absolutely beautiful visualisation templates to represent the data in a manner that makes sense to our brains without much processing. Splunk has a huge app store (Splunkbase) containing variously themed add-ons, built for the many kinds of tools out there, to collect data in better ways.
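To give a flavour of SPL, here is a hedged sketch of a query that could feed one of those visualisations, assuming a hypothetical index of web access logs:

```
index=web sourcetype=access_combined status>=500
| timechart span=1h count by host
```

A query like this would render as a chart of server errors per hour, broken down by host, which is exactly the kind of at-a-glance view a dashboard panel is made of.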

At its very basic, Splunk has 3 main components -

  1. Indexers - they parse the incoming data and index it into searchable events.
  2. Forwarders - they are the agents which can be installed on any machine and forward the data to the Splunk indexers.
  3. Search heads - for us apes to run search queries.

A real-world Splunk deployment may contain multiple indexers and forwarders to address load balancing and DR (disaster recovery) concerns.
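As a hedged sketch of what that looks like in practice, a universal forwarder can be pointed at several indexers in its outputs.conf, and it will automatically load-balance between them (the hostnames and paths below are hypothetical):

```
# outputs.conf on the forwarder: auto load balancing across two indexers
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997

# inputs.conf on the same forwarder: watch a log directory and label the data
[monitor:///var/log/myapp]
sourcetype = myapp_logs
index = main
```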

“And above all, watch with glittering eyes the whole world around you because the greatest secrets are always hidden in the most unlikely places. Those who don’t believe in magic will never find it.” - Roald Dahl

This tool has piqued my curiosity and I can’t wait to explore the possibilities of using it in my day-to-day life.