The Anatomy of an Action
Mining the event storm
Vladik Romanovsky, Engineer
Gordon Chung, Engineer
OpenStack is a wonderful place
when you use OpenStack you might see this
WTF???
if you’re lucky, you might find the real error!
[instance: e7933ceb-d1e7-42fe-9f37-d275ebd375bd] Instance failed to spawn
Traceback (most recent call last):
...
...
ProcessExecutionError: Unexpected error while running command.
Command: qemu-img convert -O raw /opt/stack/data/nova/instances/_base/7434c85f2968d2cfb05b07d8c769d7d938cec5e8.part /opt/stack/data/nova/instances/_base/7434c85f2968d2cfb05b07d8c769d7d938cec5e8.converted
Exit code: 1
Stdout: u''
Stderr: u'qemu-img: error while reading sector 0: Input/output error\n'
Debugging be Hard
• actions consist of multiple steps
• asynchronous calls that can cause timing issues
• distributed nature of OpenStack can make it difficult to debug
• parsing log files is easy -- if you’re a robot
Use Case: Creating an Instance
Creating an Instance
[Diagram: api, conductor, scheduler, compute manager, build network, build storage, start guest]
[Next three slides: the same diagram, each with a "FAIL HERE" marker at a different point]
[Next slide: the same diagram with a notification bus added]
[Next slide: the same diagram with the notification bus and a "FAIL HERE" marker]
OpenStack Events
• most services emit notifications for some discrete events
• the content of a notification represents the state of the environment, resource, etc. at that point in time
• notifications are defined by a type that describes their content
  • nova: compute.instance.create.*, scheduler.create_volume
  • neutron: port.create.*, network.create.*
  • cinder: volume.detach.*, volume.create.*
  • keystone: identity.user.*, identity.project.*
  • and a lot more...
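A minimal sketch of how a consumer might match incoming event types against wildcard patterns like the ones listed above. The patterns are copied from the slide; the `services_for` helper, the grouping into a dict, and the sample notification are assumptions made for illustration, not part of any OpenStack API.

```python
from fnmatch import fnmatchcase

# Wildcard patterns taken from the slide; the dict layout is illustrative only.
PATTERNS = {
    "nova": ["compute.instance.create.*"],
    "neutron": ["port.create.*", "network.create.*"],
    "cinder": ["volume.detach.*", "volume.create.*"],
    "keystone": ["identity.user.*", "identity.project.*"],
}


def services_for(event_type):
    """Return the services whose patterns match the given event type."""
    return sorted(
        service
        for service, patterns in PATTERNS.items()
        if any(fnmatchcase(event_type, pattern) for pattern in patterns)
    )


# Hypothetical notification, trimmed to the fields used here.
notification = {
    "event_type": "compute.instance.create.end",
    "payload": {"instance_id": "e7933ceb-d1e7-42fe-9f37-d275ebd375bd"},
}
print(services_for(notification["event_type"]))
```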
Creating an Instance
[Diagram: api, conductor, scheduler, host manager, build network, build storage, start guest; notification bus; consumer?]
Ceilometer
• telemetry project in OpenStack
• notification agent which consumes messages
• listens to the queues of each OpenStack service
• picks specific measurement values from notifications and builds meters
but wait, there’s more!
every notification is also captured as an Event
Creating an Instance
[Diagram: api, conductor, scheduler, host manager, build network, build storage, start guest; notification bus; ceilometer notification agent; Meters, Events]
Ceilometer Events
• initially implemented in Icehouse (part of StackTach integration)
• an Event represents the state of an object in an OpenStack service at a point in time
• built from INFO and ERROR level notifications emitted by all services
• ability to normalise messages by mapping key attributes from notification messages to a common name
Ceilometer Event Model
• message id
• event type
• timestamp
• traits
  • queryable, indexed attributes
  • i.e. payload.x.y.z => attr1
• raw
  • full notification
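The event model above can be sketched roughly as follows. This is an illustration under stated assumptions, not Ceilometer's implementation: the `Event` class, the `lookup` and `to_event` helpers, the `TRAIT_PATHS` mapping, and the sample notification are all hypothetical. It shows the `payload.x.y.z => attr1` idea: a dotted path into the raw notification is mapped to a flat, common trait name.

```python
from dataclasses import dataclass, field


def lookup(doc, dotted_path):
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


@dataclass
class Event:
    message_id: str
    event_type: str
    timestamp: str
    traits: dict = field(default_factory=dict)  # queryable attributes
    raw: dict = field(default_factory=dict)     # full notification


# Hypothetical trait definitions: common name -> path in the notification.
TRAIT_PATHS = {
    "instance_id": "payload.instance_id",
    "state": "payload.state",
}


def to_event(notification):
    """Build an Event, keeping only the traits whose paths resolve."""
    traits = {}
    for name, path in TRAIT_PATHS.items():
        value = lookup(notification, path)
        if value is not None:
            traits[name] = value
    return Event(
        message_id=notification["message_id"],
        event_type=notification["event_type"],
        timestamp=notification["timestamp"],
        traits=traits,
        raw=notification,
    )


# Hypothetical notification used only to exercise the sketch.
sample = {
    "message_id": "hypothetical-message-id",
    "event_type": "compute.instance.create.error",
    "timestamp": "2015-05-19T10:00:03Z",
    "payload": {"instance_id": "e7933ceb-d1e7-42fe-9f37-d275ebd375bd"},
}
event = to_event(sample)
print(event.event_type, event.traits)
```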
Ceilometer Event Processing
• all events are forced through pipelines
• events can be published to multiple targets
  • database
  • file
  • queue
  • http
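As a rough illustration of one event fanning out to several targets, here is a small sketch. The `Pipeline`, `ListPublisher`, and `JsonLinesPublisher` classes are hypothetical stand-ins written for this example; they do not reflect Ceilometer's actual pipeline or publisher classes.

```python
import io
import json


class ListPublisher:
    """Stand-in for a database or queue target: keeps events in memory."""

    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)


class JsonLinesPublisher:
    """Stand-in for a file target: writes one JSON document per line."""

    def __init__(self, stream):
        self.stream = stream

    def publish(self, event):
        self.stream.write(json.dumps(event, sort_keys=True) + "\n")


class Pipeline:
    """Hands every event to each configured publisher in turn."""

    def __init__(self, publishers):
        self.publishers = list(publishers)

    def process(self, event):
        for publisher in self.publishers:
            publisher.publish(event)


memory = ListPublisher()
log = io.StringIO()
pipeline = Pipeline([memory, JsonLinesPublisher(log)])
pipeline.process({"event_type": "compute.instance.create.end",
                  "message_id": "hypothetical-message-id"})
print(len(memory.events), log.getvalue().strip())
```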
Benefits of Centralised Events
• potential loss of data if logging locally
• normalisation of data
• event flow across services gives context
  • individual events mean nothing
  • end-to-end flow means something
connecting the dots…
Debugging be Easier
• we wanted a view to show all the events of a given action by a resource
• be able to see any errors
• temporally aware -- order of events
• show the flow and context of events
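A minimal sketch of the kind of view described above: group events by the request that caused them, order each group in time, and surface the first error. The field names (`request_id`, `timestamp`, `event_type`), the `.error` suffix check, and the sample events are assumptions made for illustration. Timestamps are assumed to be ISO 8601 strings in one consistent format, so they sort correctly as plain strings.

```python
from collections import defaultdict


def timeline_by_request(events):
    """Group events by request id and order each group by timestamp."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["request_id"]].append(event)
    for request_events in grouped.values():
        request_events.sort(key=lambda e: e["timestamp"])
    return dict(grouped)


def first_error(request_events):
    """Return the earliest event whose type ends in '.error', or None."""
    for event in request_events:
        if event["event_type"].endswith(".error"):
            return event
    return None


# Hypothetical events, deliberately out of order.
events = [
    {"request_id": "req-1", "timestamp": "2015-05-19T10:00:03Z",
     "event_type": "compute.instance.create.error"},
    {"request_id": "req-1", "timestamp": "2015-05-19T10:00:01Z",
     "event_type": "compute.instance.create.start"},
    {"request_id": "req-2", "timestamp": "2015-05-19T10:00:02Z",
     "event_type": "port.create.end"},
]

timelines = timeline_by_request(events)
for request_id, request_events in timelines.items():
    flow = [e["event_type"] for e in request_events]
    print(request_id, flow, "error:", first_error(request_events) is not None)
```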
postmortem analysis using Elasticsearch
Elasticsearch
• document-oriented, schema free database
• built on top of Apache Lucene
• focused on providing full-text search capabilities
• distributed, highly available, real time db
• Kibana - GUI interface to database
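For a sense of what a postmortem query could look like, the sketch below builds an Elasticsearch query DSL body that fetches every event for one request in time order. The field names `traits.request_id` and `timestamp`, and the `events_for_request` helper, are assumptions about how the events might be indexed; the talk does not specify a mapping. The body would be sent to an index's `_search` endpoint by whatever client is in use.

```python
import json


def events_for_request(request_id, size=100):
    """Build a search body: exact match on the request id, oldest event first."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Field name is hypothetical; it depends on the index mapping.
                "filter": [{"term": {"traits.request_id": request_id}}]
            }
        },
        "sort": [{"timestamp": {"order": "asc"}}],
    }


body = events_for_request("req-hypothetical-id")
print(json.dumps(body, indent=2))
```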
KIBANA!!!
HORIZON!!!
Extending Events
• there is a lot of data that isn’t published
• the data that is published is disorganised
• extending support in horizon
  • drilling down into event to view full raw data
  • filter options - time range, events for a specific request
• ceilometer
  • alarm on events
  • build metrics from events
thank you
BACKUP
Horizon Events Prototype, by George Peristerakis
https://github.com/enovance/horizon/tree/event-prototype
