Splunk User Group London
Daniel Hernandez
Dealing with delayed events
Agenda
1. Housekeeping rules & Introductions
2. Time extraction and parsing in Splunk
3. Monitoring delayed events
4. Impact on Splunk workflow
5. Potential risks deriving from delayed events
Housekeeping
Feel free to stand up and grab a refill.
Splunk brought us here, pizza keeps us here.
Ask away! You’re not interrupting unless you’re asking where the exit is.
Join the community, connect, share. Reach out if you’d like to contribute.
Introduction – Daniel Hernandez
• Background in Networks and Security.
• Splunk SCC1, working with ECS for about a year and a half in Security.
• Currently leveraging Splunk for a SIEM replacement project in the banking sector.
Time extraction and parsing in Splunk
“Time is what keeps everything from happening at once”
1. Timezones: Time values from different locations may differ – a lot.
2. Realtime/Batch processing: Sometimes logs are collected in hourly/daily chunks.
3. Correlation Searches (Rules) and forensic investigations rely on the Extracted Time.
_time (Extracted Time): log generation time, extracted from the log itself.
_indextime (Index Time): event indexing time, generated by an indexer.
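The gap between the two fields is the quantity the rest of the talk monitors. A minimal Python sketch, using made-up timestamps and a hypothetical helper name (`delay_seconds`), shows how a timestamp parsed with the wrong timezone looks like a large indexing delay even when the event arrived promptly:

```python
from datetime import datetime, timedelta, timezone


def delay_seconds(event_time: float, index_time: float) -> float:
    """Indexing delay: _indextime minus _time, both as epoch seconds."""
    return index_time - event_time


# Hypothetical event: written at 12:00:00 UTC and indexed 5 seconds later.
written = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
indexed = written + timedelta(seconds=5)
correct = delay_seconds(written.timestamp(), indexed.timestamp())

# The same wall-clock string "12:00:00" misread as UTC+2 lands two hours
# earlier in epoch terms, so the apparent delay grows by 7200 seconds.
misread = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
skewed = delay_seconds(misread.timestamp(), indexed.timestamp())

print(correct, skewed)
```

A misread in the other direction would give a negative delay, which is usually the quickest hint that the problem is time extraction rather than transport.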
First things first!
First of all >
1. Make sure the _time field is extracted correctly!
2. You don’t want to use Splunk to report on internal network metrics.
3. Time extraction should be transparent for dashboards, alerts, and reports.
Second of all >
Check your clock skew to monitor any potential delays within the Splunk infrastructure.
Monitoring delayed events
Monitoring clock skew >
Code openly available on GitHub.
Based on SimpleXML and tstats.
Uses a moving average of the delay to display clock skew violations.
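The moving-average idea can be sketched in a few lines of Python. This is not the dashboard code from GitHub; the function name, the window of 3 samples and the 60-second threshold are assumptions chosen only for illustration:

```python
from collections import deque


def moving_average_violations(delays, window=3, threshold=60.0):
    """Return (index, average) pairs where the moving average of the
    delay over the last `window` samples exceeds `threshold` seconds."""
    recent = deque(maxlen=window)
    violations = []
    for i, delay in enumerate(delays):
        recent.append(delay)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                violations.append((i, average))
    return violations


# Hypothetical per-interval delays in seconds, with a burst in the middle.
delays = [5, 7, 6, 120, 180, 240, 8, 6]
violations = moving_average_violations(delays)
print(violations)
```

Averaging smooths out a single slow event, so a violation is only flagged once the delay is sustained across the window.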
Symptoms
Events collected from a forwarder or from a log file are not yet searchable on Splunk.
Even though the time stamps of the events are within the search time range, a search does not return the events.
Later, a search over the same time range returns the events.
Narrowing down the issue:
source=mysource
| eval delay_sec=_indextime-_time
| timechart min(delay_sec) avg(delay_sec) max(delay_sec) by host
source=mysource
| eval delay_sec=_indextime-_time
| timechart min(delay_sec) avg(delay_sec) max(delay_sec) by source
Determine the common denominator between them. For example, all of the
delayed events might be from the same log file or the same host or source type.
Also, compare the delay from your events with the delay from the internal
Splunk logs.
index=_internal source=*splunkd.log*
| eval delay_sec=_indextime-_time
| timechart min(delay_sec) avg(delay_sec) max(delay_sec) by host
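For readers less familiar with SPL, the grouping these searches perform can be approximated in Python. This is a rough sketch with invented events and a hypothetical function name; unlike `timechart`, it does not bucket results by time:

```python
from collections import defaultdict


def delay_stats_by(events, key):
    """Group events by `key` and return {group: (min, avg, max)} of
    _indextime - _time, in seconds."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event["_indextime"] - event["_time"])
    return {g: (min(d), sum(d) / len(d), max(d)) for g, d in groups.items()}


# Hypothetical events with epoch-second timestamps.
events = [
    {"host": "web01", "_time": 1000, "_indextime": 1002},
    {"host": "web01", "_time": 1010, "_indextime": 1014},
    {"host": "db01", "_time": 1000, "_indextime": 1600},
    {"host": "db01", "_time": 1020, "_indextime": 1920},
]
stats = delay_stats_by(events, "host")
print(stats)
```

In this made-up data `db01` stands out immediately, which is the kind of common denominator the slide asks you to look for.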
Finding the root cause
If some sources are delayed but not others, this indicates a problem with the input.
Thruput limits
Network limits
Time zone issue
Windows event logs delay
If all the logs are delayed, including the internal logs, then the delay is a forwarding issue.
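The triage rule above can be written down as a small decision function. This is a sketch only: the function name, the input shape (average delay per source plus the average delay of the internal splunkd logs) and the 60-second threshold are all assumptions, not something the talk prescribes:

```python
def classify_delay(source_delays, internal_delay, threshold=60.0):
    """Apply the triage rule to average delays given in seconds.

    source_delays: {source_name: average delay}
    internal_delay: average delay of the internal Splunk logs
    """
    delayed = [s for s, d in source_delays.items() if d > threshold]
    if not delayed:
        return "no significant delay", []
    if internal_delay > threshold and len(delayed) == len(source_delays):
        # Everything is late, including the internal logs.
        return "forwarding issue", delayed
    # Only some sources are late: look at the input for those sources.
    return "input issue", delayed


some_late = classify_delay({"app.log": 5.0, "audit.log": 900.0}, internal_delay=3.0)
all_late = classify_delay({"app.log": 800.0, "audit.log": 900.0}, internal_delay=700.0)
print(some_late, all_late)
```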
Data Pipeline
At a very high level:
• Parsing Queue/Pipeline
Responsible for source typing, line breaking, timestamping, event boundaries, regex.
• Indexing Queue/Pipeline
Event segmentation and indexing, index building.
Splunk Admin 101:
There’s a lot that can go wrong.
You’ll find A LOT of creative ways to trash your data pipeline in a shared environment.
Avoid (trouble)shooting yourself in the foot >
Make sure your Management Console is set up appropriately!
You want to keep a close eye on your queues, and spot any potential bottlenecks.
What will go wrong?
Congratulations, you’ve found severe delays in your Splunk infrastructure.
What can you expect?
Inconsistent dashboards.
Inconsistent reports.
And different results each time they’re run.
You know that saved searches can’t be re-run.
So your funky real-time correlation searches are going to miss events.
What can you do?
• Focus on _indextime when writing real-time correlation searches:
index=funky_index _index_earliest=-1h@h _index_latest=now
| <my_funky_correlation_search>
• The search’s normal event-time range still applies, so keep it wide enough to cover the longest delay you expect.
• The scheduled saved search will capture events as they’re indexed.
• Events will appear delayed but they won’t be missed by the alert.
• Chase other teams to get it fixed ASAP! Teamwork brings it home.
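Why the index-time window helps can be seen in a toy simulation. The Python sketch below uses two invented events and hypothetical function names to model an hourly scheduled search. Each run sees only events already indexed at run time and matches those whose chosen field falls inside the previous hour:

```python
def scheduled_run(events, run_time, field, span=3600):
    """One scheduled run: match searchable events whose `field` lies in
    [run_time - span, run_time)."""
    searchable = [e for e in events if e["_indextime"] <= run_time]
    return [e["id"] for e in searchable if run_time - span <= e[field] < run_time]


def all_runs(events, field, run_times):
    """Collect the ids matched across every scheduled run."""
    seen = []
    for run_time in run_times:
        seen.extend(scheduled_run(events, run_time, field))
    return seen


# Hypothetical events: "late" was generated at t=3500 but indexed at t=7300.
events = [
    {"id": "on_time", "_time": 3700, "_indextime": 3705},
    {"id": "late", "_time": 3500, "_indextime": 7300},
]
run_times = [3600, 7200, 10800]

by_event_time = all_runs(events, "_time", run_times)
by_index_time = all_runs(events, "_indextime", run_times)
print(by_event_time, by_index_time)
```

With event-time windows the late event is never matched: it is not yet indexed when its hour is searched, and it falls outside every later window. With index-time windows it is picked up by the first run after it arrives.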
What does this mean in a SIEM?
Question time: fire away!
