Introduction
Splunk is a very robust tool for digging into data. You can customize it to search a variety of data formats, and using the results you can accomplish many tasks, from producing pie chart reports to generating email alerts.
During a recent project I had the task of building a reporting dashboard reflecting server status. As I did not want to impact the production Splunk system, I spun up a test instance on a QA box.
For those who do not know, Splunk allows you to use the software for free as long as the amount of data being indexed (flowing in) is less than 500 MB per day. See the Splunk downloads page for more information.
Perfect, I had my own environment. I built my reports, dashboards, alerts, etc., with no impact on the production systems.
The Situation
Due to unforeseen circumstances, I needed to keep this development instance running longer than expected. As I didn’t want to chance hitting the 500 MB limit, I decided to block anyone from “accidentally” placing data into it.
Splunk does provide a mechanism for blocking incoming data, but the documentation is not straightforward in explaining how to achieve this. Think: “This thing reads like stereo instructions.”
The solution is not truly blocking data, but ignoring incoming data.
The Steps
Ignoring the incoming data is a two-step process. First you need to tell Splunk which data you’re interested in, and then you need to tell Splunk what to do with that data.
To accomplish step one (which data), I updated the props.conf file. Within this file I added a ‘stanza’ (or rule) identifying the source of the data. Once this was done, I had to tell Splunk what to do with the data, which I did with a ‘TRANSFORMS’ setting inside the matching stanza.
Here is what my props.conf looked like. The first stanza defined my data, and if a match occurred the data would be transformed by ‘setparsing’. The second stanza matches any other data and transforms it by ‘setdevnull’.
props.conf
# My Data -- regular expression to match my data
[source::.\/mydata/foo/bar*]
TRANSFORMS-set = setparsing
# Everything Else -- the source is a catch all
[source::.*]
TRANSFORMS-set = setdevnull
Now for the second step: ignoring the data. This is done within the transforms.conf file, where you tell Splunk how to manipulate (or transform) any data. By default Splunk indexes incoming data, but you can tell it to ignore data instead. To ignore data, you send it to Splunk’s equivalent of /dev/null, which it calls ‘nullQueue’.
Here is what my transforms.conf file looked like:
transforms.conf
# Set Parsing, Index the data
[setparsing]
REGEX = .
DEST_KEY = queue
FORMAT = indexQueue
# Set Dev Null
[setdevnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
Notice DEST_KEY: it tells Splunk that we want to set the ‘queue’, which determines where the data goes next for processing and indexing. The FORMAT keyword is then set to either ‘indexQueue’ (send it to the indexer) or ‘nullQueue’ (ignore it).
I tested this by sending in the access log from a web server, and that data was not indexed. WOOT! But my data was not indexed either. OOPS! What did I do wrong?
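To make the REGEX / DEST_KEY / FORMAT relationship concrete, here is a small Python sketch. It is a toy model of the two transforms above, not how Splunk is actually implemented; the `apply_transform` and `is_indexed` helpers and the `metadata` dictionary are names I made up for illustration.

```python
import re

# Toy copies of the two transforms.conf stanzas from this post.
TRANSFORMS = {
    "setparsing": {"REGEX": r".", "DEST_KEY": "queue", "FORMAT": "indexQueue"},
    "setdevnull": {"REGEX": r".", "DEST_KEY": "queue", "FORMAT": "nullQueue"},
}


def apply_transform(name, event_text, metadata):
    """Return a copy of metadata with DEST_KEY set to FORMAT when REGEX matches."""
    rule = TRANSFORMS[name]
    updated = dict(metadata)
    if re.search(rule["REGEX"], event_text):
        updated[rule["DEST_KEY"]] = rule["FORMAT"]
    return updated


def is_indexed(metadata):
    # Hypothetical helper: anything routed to nullQueue is treated as discarded.
    return metadata.get("queue") != "nullQueue"
```

In this model, calling `apply_transform("setdevnull", line, {"queue": "indexQueue"})` on any non-empty line flips the queue to `nullQueue`, which is the "ignore it" behavior described above. Because the REGEX is a single `.`, every event with at least one character matches.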
As I understood it, Splunk processes incoming data against the props.conf file linearly, one stanza at a time, and the last stanza “matched” wins. The last stanza to match was of course “Everything Else”, so all data was sent to /dev/null.
The easy fix would be to move my stanza to the end, but that is too easy. Looking through the documentation for props.conf, I found that Splunk has a ‘priority’ setting, which allows you to prioritize the data matches. I updated my stanza to contain priority=100 for my data.
props.conf
# My Data -- regular expression to match my data
[source::.\/mydata/foo/bar*]
TRANSFORMS-set = setparsing
priority = 100
# Everything -- the source is a catch all
[source::.*]
TRANSFORMS-set = setdevnull
Still, my data was getting dropped. The reason, as far as I could tell, was that the default priority is 100, so the “Everything” rule had a priority equal to that of “My Data”. By adding a priority to all the stanzas, I got the desired result.
props.conf
# My Data -- regular expression to match my data
[source::.\/mydata/foo/bar*]
TRANSFORMS-set = setparsing
priority = 100
# Everything -- the source is a catch all
[source::.*]
TRANSFORMS-set = setdevnull
priority = 1
Great! Splunk is now indexing my data, and ignoring everything else. I just had to properly prioritize the stanzas.
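The prioritization idea can be sketched in a few lines of Python. This is only a simplified model of "the highest-priority matching stanza decides the transform", not Splunk's real resolution logic. The patterns are ordinary Python regexes standing in for my source stanzas, and `pick_transform` and the sample paths are hypothetical.

```python
import re

# Hypothetical stand-ins for the two props.conf stanzas in this post.
# The patterns are plain Python regexes, not Splunk's source-matching syntax.
STANZAS = [
    {"pattern": r".*/mydata/foo/bar.*", "transform": "setparsing", "priority": 100},
    {"pattern": r".*", "transform": "setdevnull", "priority": 1},
]


def pick_transform(source, stanzas=STANZAS):
    """Return the transform of the highest-priority stanza that matches source."""
    matching = [s for s in stanzas if re.fullmatch(s["pattern"], source)]
    if not matching:
        return None
    # Ties are deliberately not modeled; give every stanza a distinct priority.
    return max(matching, key=lambda s: s["priority"])["transform"]
```

With these toy stanzas, a source such as `/opt/mydata/foo/bar.log` resolves to `setparsing`, while something like `/var/log/httpd/access_log` only matches the catch-all and resolves to `setdevnull`. The sketch sidesteps equal priorities on purpose, since that was exactly the case that tripped me up.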
Conclusion
If you’re running into data issues with your Splunk instance, one solution is to ignore some of the incoming data. Remember: after you identify the data, properly prioritize it, then transform it.
Happy Splunking.