AS DIRECTED BY CONTEST MANAGER I AM PLACING THIS NOTE HERE SO THAT THIS APP WILL BE ENTERED INTO BOTH OF THE FOLLOWING CATEGORIES:
- Insider Threat and Fraud
"ZIPIT" stands for (Z)ombie, (I)dle, (P)otential-(I)mpersonating, and (T)respassing forwarders. We had a disastrous upgrade from Splunk v5 to 6.0.1 which zombified hundreds of forwarders, bypassing their deploymentclient.conf settings and severing C&C from the Deployment Servers (DS). In the process of identifying these Zombies and reattaching them, we uncovered a plethora of forwarder problems that we had no idea existed!
How it works
ZIPIT is a mashup of the following 2 sets of logs that should be searchable from any global Search Head:
- The phonehome events (splunkd_access.log on the DS) which tell us what forwarders are successfully checking in with the DS.
- The metrics log events (splunk_metrics.log on the Indexers) which tell us what forwarders are sending data into the indexers.
Custom searches compare and contrast certain kinds of Forwarder and Indexer activities to identify the following categories of problematic Forwarders:
- (Z)ombies who are properly authorized to be forwarding (and are doing so) but who are no longer being controlled by the DS.
- (I)dle servers who are properly authorized to be forwarding but are not doing so.
- (P)otential-(I)mpersonators who are using the same IP address as another forwarder but with a different hostname (sometimes this is a valid configuration).
- (T)respassers who are not authorized to be forwarding but are.
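The four categories above boil down to set comparisons between the two log sources. The sketch below is only an illustration in Python, not the app's actual searches: the function name, the input shapes, and especially the `authorized` allowlist are assumptions, since this write-up does not say how authorization is determined.

```python
from collections import defaultdict


def classify_forwarders(authorized, phonehome, sending):
    """Sort forwarders into the four ZIPIT categories.

    authorized: set of hostnames allowed to forward (hypothetical allowlist).
    phonehome:  set of hostnames seen checking in with the DS.
    sending:    iterable of (hostname, ip) pairs seen sending to the indexers.
    """
    sending = list(sending)
    sending_hosts = {host for host, _ip in sending}

    # Group the hostnames observed behind each source IP address.
    hosts_by_ip = defaultdict(set)
    for host, ip in sending:
        hosts_by_ip[ip].add(host)

    return {
        # Authorized and sending data, but no longer checking in with the DS.
        "zombies": (authorized & sending_hosts) - phonehome,
        # Authorized, but no data is arriving at the indexers.
        "idle": authorized - sending_hosts,
        # One IP address reporting more than one hostname (may be legitimate).
        "potential_impersonators": {
            ip: hosts for ip, hosts in hosts_by_ip.items() if len(hosts) > 1
        },
        # Sending data without being on the allowlist.
        "trespassers": sending_hosts - authorized,
    }
```

A host can land in more than one category; for example, an unknown hostname that shares an IP address with a known forwarder is reported both as a potential impersonator and as a trespasser.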
Challenges I ran into
ZIPIT will probably only work with Splunk forwarders on version 3.3 and later because of changes in the way forwarders log check-ins to the DS. We have to make several reasonable assumptions for this to work; as a result, false positives are possible but false negatives are not.
Accomplishments that I'm proud of
We have improved our best practices to prevent these kinds of problems from recurring, and we have added alerts that automatically notify us if any of them ever does. In the process we have created an elegant app that anyone who uses Splunk can use to get the same benefits and security.
What I learned
Most companies are not monitoring their Splunk forwarders and Indexers as closely as they ought to. As a result, there is not only the potential for very troublesome accidents like the one we experienced, but also a widespread threat of internally inflicted Denial-of-Service, Data-Loss and Data-Tainting mistakes/attacks against Splunk Indexers.
What's next for ZIPIT
At Splunxter.com, the ZIPIT technology is a key ingredient of our flagship product: the guaranteed Splunk Health-Check & Tune-Up program. In this program, Splunxter's engineers use our proprietary Splunk app, combined with in-house expertise, to detect, catalog, and prioritize a multitude of possible errors, search inefficiencies, and other "unbest practices" that are degrading your Splunk indexer performance or even mishandling your data, which leads directly to incorrect results and erroneous analysis. We then work with your engineers to correct these errors and to create "best-practices" documentation and training in order to reduce the likelihood of any recurrence. Most customers schedule periodic follow-ups to ensure that things continue to run correctly and efficiently in the future. Any company that has been using Splunk for any length of time probably has so many of these problems that this service can pay for itself by delaying or even eliminating large CapEx costs through the rejuvenation of its existing infrastructure as-is.