Presentation for the Splunk User Group in London (June 2018), highlighting the importance of time extraction and parsing within a Splunk deployment, answering questions such as
"Why is it relevant to have the correct timestamp in my Splunk events?", "How can I troubleshoot latency in my environment?" and "What can I do if my events are arriving late?"
1. Splunk User Group London
Daniel Hernandez
Dealing with delayed events
2. Agenda
1. Housekeeping rules & Introductions
2. Time extraction and parsing in Splunk
3. Monitoring delayed events
4. Impact on Splunk workflow
5. Potential risks deriving from delayed events
3. Housekeeping
Feel free to stand up and grab a refill. Splunk brought us here, pizza keeps us here.
Ask away! You're not interrupting unless you're asking where the exit is.
Join the community, connect, share. Reach out if you'd like to contribute.
4. Introduction – Daniel Hernandez
• Background in Networks and Security.
• Splunk SCC1, working with ECS for about a year and a half in Security.
• Currently leveraging Splunk for a SIEM replacement project in the banking sector.
6. "Time is what keeps everything from happening at once"
1. Timezones: time values from different locations may differ – a lot.
2. Realtime/batch processing: sometimes logs are collected in hourly/daily chunks.
3. Correlation searches (rules) and forensic investigations rely on the extracted time.
_time: log generation time, extracted from the log itself (Extracted Time).
_indextime: event indexing time, generated by an indexer (Index Time).
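A quick way to see the two timestamps side by side (the index name here is just a placeholder):

index=my_index earliest=-1h
| eval event_time=strftime(_time, "%F %T"), index_time=strftime(_indextime, "%F %T")
| table event_time index_time host source

Any sizeable gap between the two columns is delay; a negative gap is clock skew.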
7. First things first!
First of all >
1. Make sure the _time field is extracted correctly!
2. You don't want to use Splunk to report on internal network metrics: if _time falls back to arrival time, your searches reflect when data arrived, not when events happened.
3. Time extraction should be transparent for dashboards, alerts, and reports.
Second of all >
Check your clock skew to monitor any potential delays within the Splunk infrastructure.
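A quick skew sanity check, sketched here with an arbitrary 60-second threshold: events whose _time sits well ahead of their _indextime usually point at a clock or time zone problem on the source.

index=* earliest=-15m latest=+1d
| eval skew_sec=_time-_indextime
| where skew_sec>60
| stats count by host, sourcetype

Setting latest=+1d matters: with the default latest=now, events carrying future timestamps would be filtered out before you ever saw them.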
9. Monitoring clock skew >
Code openly available on GitHub.
Based on Simple XML and tstats.
Uses a moving average of the delay to display clock-skew violations.
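Not the exact code from the repo, but a sketch of the kind of tstats search it builds on; tstats can aggregate _indextime because it is an indexed field, which keeps the dashboard fast:

| tstats max(_indextime) AS index_time WHERE index=* earliest=-4h BY index, sourcetype, _time span=1s
| eval delay_sec=index_time-_time
| timechart span=5m avg(delay_sec) BY sourcetype

From there the dashboard smooths delay_sec with a moving average (e.g. trendline) and flags anything that crosses a threshold.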
10. Symptoms
Events collected from a forwarder or from a log file are not yet searchable in Splunk.
Even though the timestamps of the events are within the search time range, a search does not return the events.
Later, a search over the same time range returns the events.
12. Narrowing down the issue:
source=mysource
| eval delay_sec=_indextime-_time
| timechart min(delay_sec) avg(delay_sec) max(delay_sec) by host
source=mysource
| eval delay_sec=_indextime-_time
| timechart min(delay_sec) avg(delay_sec) max(delay_sec) by source
Determine the common denominator between them. For example, all of the
delayed events might be from the same log file or the same host or source type.
Also, compare the delay from your events with the delay from the internal
Splunk logs.
index=_internal source=*splunkd.log*
| eval delay_sec=_indextime-_time
| timechart min(delay_sec) avg(delay_sec) max(delay_sec) by host
13. Finding the root cause
If some sources are delayed but not others, this indicates a problem with the input:
• Thruput limits (quick check below)
• Network limits
• Time zone issues
• Windows event log delays
If all the logs are delayed, including the internal logs, then the delay is a forwarding issue.
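For the thruput case specifically, a forwarder logs a message when it hits the maxKBps limit in limits.conf; the exact wording may vary between versions, but a search along these lines will surface throttled hosts:

index=_internal source=*splunkd.log* "Current data throughput" "has reached maxKBps"
| timechart count by host

Hosts that show up here regularly need their maxKBps raised or their inputs split.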
14. Data Pipeline
At a very high level:
• Parsing queue/pipeline: responsible for source typing, line breaking, timestamping, event boundaries, regex.
• Indexing queue/pipeline: event segmentation and indexing, index building.
Splunk Admin 101:
There's a lot that can go wrong. You'll find A LOT of creative ways to trash your data pipeline in a shared environment.
15. Avoid (trouble)shooting yourself in the foot >
Make sure your Monitoring Console is set up appropriately!
You want to keep a close eye on your queues and spot any potential bottlenecks.
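The Monitoring Console charts are driven by queue metrics in metrics.log; if you want the raw view, something like this (queue names assumed to match your version's pipeline) shows how full each queue is running:

index=_internal source=*metrics.log* group=queue (name=parsingqueue OR name=aggqueue OR name=typingqueue OR name=indexqueue)
| eval fill_pct=round(current_size_kb/max_size_kb*100, 2)
| timechart avg(fill_pct) by name

A queue that sits near 100% is your bottleneck: everything upstream of it backs up and shows as delay.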
16. What will go wrong?
Congratulations, you've found severe delays in your Splunk infrastructure. What can you expect?
Inconsistent dashboards. Inconsistent reports. And different results each time they're run.
You know that saved searches can't be re-run: once a scheduled window has passed, the scheduler won't go back over it, so late-arriving events are silently skipped.
So your funky real-time correlation searches are going to miss events.
17. What can you do?
• Focus on _indextime when writing real-time correlation searches:
index=funky_index _index_earliest=-1h@h _index_latest=now
| <my_funky_correlation_search>
• The scheduled saved search will capture events as they're indexed.
• Events will appear delayed, but they won't be missed by the alert.
• Chase other teams to get it fixed ASAP! Teamwork brings it home.
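A sketch of how the scheduled version might look, run hourly over the previous hour of index time (the index name and output fields are placeholders):

index=funky_index _index_earliest=-1h@h _index_latest=@h
| eval indexed_at=strftime(_indextime, "%F %T"), delay_sec=_indextime-_time
| table _time indexed_at delay_sec host source

Snapping both boundaries to @h makes consecutive runs tile with no gaps and no overlap: every event is examined exactly once, when it is indexed, no matter how late it arrives.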