Splunk is a very robust tool for digging into data. You can customize it to search a variety of data formats, and using the results you can accomplish many tasks, from producing pie chart reports to generating email alerts.
During a recent project I had the task of building a reporting dashboard reflecting server status. As I did not want to impact the production Splunk system, I spun up a test instance on a QA box.
For those who do not know, Splunk allows you to use the software for free as long as the amount of data being indexed (flowing in) is less than 500 MB per day. See the Splunk downloads page for more information.
Perfect, I had my own environment. I built my reports, dashboards, alerts, etc., with no impact on the production systems.
Due to unforeseen circumstances, I needed to keep this development instance running longer than expected. As I didn’t want to chance hitting the 500 MB limit, I decided to block anyone from “accidentally” placing data into it.
Splunk does provide a mechanism for blocking incoming data, but the documentation is not straightforward in explaining how to achieve this. Think: “This thing reads like stereo instructions” (YouTube).
The solution is not truly blocking data, but ignoring incoming data.
Ignoring the incoming data is a two-step process. First you need to tell Splunk which data you're interested in, and then you need to tell Splunk what to do with it.
To accomplish step one (which data), I updated the props.conf file. Within this file I added a ‘stanza’ (or rule) identifying the source of the data. Once this was accomplished, I had to tell Splunk what to do with the data. I did this with a TRANSFORMS-<class> setting inside the matching stanza, which names the transform to apply.
Here is what my props.conf looked like. The first stanza defined my data, and if a match occurred the data would be transformed by ‘setparsing’. The second stanza matches any other data and transforms it by ‘setdevnull’.
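A minimal sketch of that two-stanza layout, not the original file. The source path is a hypothetical placeholder for "my data", `[source::...]` is assumed as the catch-all pattern, and both stanzas are assumed to use the same setting name (`TRANSFORMS-set`) so that only one of them can win when both match.

```ini
# props.conf (sketch under the assumptions stated above)

# "My data": the path is a hypothetical placeholder.
[source::/var/log/myapp/status.log]
TRANSFORMS-set = setparsing

# "Everything else": assumed catch-all pattern matching any source.
[source::...]
TRANSFORMS-set = setdevnull
```

If the two stanzas used different class names (for example `TRANSFORMS-keep` and `TRANSFORMS-drop`), they would be separate settings and both transforms would apply to matching data, so the shared name matters here.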
Now for the second step: ignoring the data. This is done within the transforms.conf file, where you tell Splunk how to manipulate (or transform) data. By default, Splunk will index data, but you can tell it to ignore the data instead. To ignore data, you send it to Splunk's equivalent of /dev/null, which Splunk calls ‘nullQueue’.
Here is what my transforms.conf file looked like:
Notice the DEST_KEY setting. It tells Splunk that we want to control the ‘queue’, that is, where data waiting to be processed/indexed goes next. The FORMAT setting is then set to either ‘indexQueue’ (send it to the indexer) or ‘nullQueue’ (ignore it).
I tested this by sending in the access log from a web server, and that data was not indexed. WOOT! But my data was not indexed either. OOPS! What did I do wrong?
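The two transforms described above might look roughly like this. It is a sketch rather than the original file: the stanza names come from the text, but the `REGEX = .` lines (match any non-empty event) are an assumption, added because an index-time transform normally needs a REGEX to decide which events it applies to.

```ini
# transforms.conf (sketch; REGEX values are assumptions)

# Route matching events to the index queue (keep them).
[setparsing]
REGEX = .
DEST_KEY = queue
FORMAT = indexQueue

# Route matching events to the null queue (discard them).
[setdevnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
```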
As I understood it, Splunk processes incoming data against the props.conf file linearly, one stanza at a time, and the last stanza matched wins. The last stanza to match was of course “Everything Else”, so all data was sent to /dev/null.
The easy fix would be to move my stanza to the end, but that is too easy. Looking through the documentation for props.conf, I found that Splunk has a ‘priority’ setting, which lets you control which matching stanza takes precedence. I updated my stanza to contain priority=100 for my data.
Still, my data was getting dropped. The reason: the default priority is 100, so the “Everything” rule had a priority equal to that of “My Data”, and Splunk was still matching it. By adding an explicit priority to all the stanzas, I got the desired result.
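A sketch of what explicit priorities on every stanza could look like. Only the value 100 for "my data" comes from the text; the source path, the catch-all pattern, and the lower value of 10 are illustrative assumptions. The stanza with the higher number takes precedence when both match.

```ini
# props.conf (sketch with explicit priorities; values other than 100 are assumptions)

# "My data": higher priority, so this stanza wins for its own source.
[source::/var/log/myapp/status.log]
priority = 100
TRANSFORMS-set = setparsing

# "Everything else": lower priority, applies only when nothing above wins.
[source::...]
priority = 10
TRANSFORMS-set = setdevnull
```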
Great! Splunk is now indexing my data, and ignoring everything else. I just had to properly prioritize the stanzas.
If you're running into data issues with your Splunk instance, one solution is to ignore some of the incoming data. Remember: identify the data, properly prioritize it, then transform it.