Building a Monitoring
Framework Using DTrace
and MongoDB
Dan Kimmel
Software Engineer, Delphix
dan.kimmel@delphix.com
Background
● Building a performance monitoring
framework on illumos using DTrace
● It's monitoring our data virtualization engine
○ That means "database storage virtualization and
rigorous administration automation" for those who
didn't have time to study up on our marketing lingo
● Our users are mostly DBAs
● The monitoring framework itself is not
released yet
What to collect?
● DBAs have one performance metric they
care about for their database storage
○ I/O latency, because it translates to database I/O
latency, which translates to end-user happiness
● But to make the performance data
actionable, they usually need more than that
single measurement
○ Luckily, DTrace always has more data
Virtualized Database Storage*
Database I/O path:
● Database Process (Oracle, SQLServer, others on the way)
● Storage Appliance (the Delphix Engine)
* as most people imagine it
Virtualized Database Storage
Database I/O path:
● Database Process (Oracle, SQLServer, others on the way)
● Network-Mounted Storage Layer (NFS/iSCSI)
● Network
● Delphix FS
● Storage
Platform layers:
● Database Host OS (Windows, Linux, Solaris, *BSD, HP-UX, AIX)
● Delphix OS
● Hypervisor*
* Sometimes the DB host is running on a hypervisor too, or even on the same hypervisor
Latency can come from anywhere
Database I/O path:
● Database Process (Oracle, SQLServer, others on the way)
● Network-Mounted Storage Layer (NFS/iSCSI)
● Network
● Delphix FS
● Storage
Platform layers:
● Database Host OS (Windows, Linux, Solaris, *BSD, HP-UX, AIX)
● Delphix OS
● Hypervisor
Bottlenecks (left side of the diagram):
● Out of memory? Out of CPU?
● Out of bandwidth?
● Out of memory? Out of CPU?
● Out of memory? Out of CPU?
● Out of IOPS? Out of bandwidth?
Sources of latency (right side of the diagram):
● NFS client latency
● Network latency
● Queuing latency
● FS latency
● Device latency
Investigation Requirements
Want users to be able to dig deeper during a
performance investigation.
● Show many different sources of latency and
show many possible bottlenecks
○ i.e. collect data from all levels of the I/O stack
○ This is something that we're still working on, and
sadly, not all levels of the stack have DTrace
● Allow users to narrow down the cause within
one layer
○ Concepts were inspired by other DTrace-based
analytics tools from Sun and Joyent
Narrowing down the cause
After looking at a high level view of the layers, a
user sees NFS server latency has some slow
outliers.
1. NFS latency by client IP address
○ The client at 187.124.26.12 looks slowest
2. NFS latency for 187... by operation
○ Writes look like the slow operation
3. NFS write latency for 187... by synchronous vs. asynchronous
○ Synchronous writes are slower than normal
How that exercise helped
● The user just learned a lot about the problem
○ The user might be able to solve it themselves by (for
instance) upgrading or expanding the storage we sit
on top of to handle synchronous writes better
○ They can also submit a much more useful bug report
or speak effectively to our support staff
● Saves them time, saves us time!
DTrace is the perfect tool
● To split results on a variable, collect the
variable and use it as an additional key in
your aggregations.
● To narrow down a variable, add a condition.
// Pseudocode alert!
// Latency is the current time minus the recorded start time.
0. probe {@latency = quantize(timestamp - start)}
1. probe {@latency[ip] = quantize(timestamp - start)}
2. probe /ip == "187..."/ {
@latency[operation] = quantize(timestamp - start);
}
3. probe /ip == "187..." && operation == "write"/ {
@latency[synchronous] = quantize(timestamp - start);
}
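The two rules above (an extra key splits the results, a condition narrows them) can also be tried outside DTrace. The following is a minimal Python sketch over already-recorded samples, not the framework's code: the sample records, the 10.0.0.5 address, and the names quantize_bucket and aggregate are assumptions made up for illustration. The bucketing imitates DTrace's power-of-two quantize() for non-negative values only.

```python
from collections import Counter, defaultdict

# Hypothetical samples; only 187.124.26.12 comes from the slides.
SAMPLES = [
    {"ip": "187.124.26.12", "operation": "write", "synchronous": True, "latency_ns": 9000},
    {"ip": "187.124.26.12", "operation": "write", "synchronous": False, "latency_ns": 1200},
    {"ip": "187.124.26.12", "operation": "read", "synchronous": False, "latency_ns": 700},
    {"ip": "10.0.0.5", "operation": "read", "synchronous": False, "latency_ns": 600},
]


def quantize_bucket(value):
    """Return the largest power of two <= value (0 for values <= 0)."""
    if value <= 0:
        return 0
    return 1 << (value.bit_length() - 1)


def aggregate(samples, key=None, where=None):
    """Build power-of-two latency histograms.

    key:   variable to split on (the extra aggregation key), or None.
    where: dict of variable -> required value (the condition).
    """
    where = where or {}
    result = defaultdict(Counter)
    for sample in samples:
        # Narrow down: skip samples that fail any condition.
        if any(sample.get(var) != val for var, val in where.items()):
            continue
        # Split: group by the chosen variable, or into one group if none.
        group = sample.get(key) if key is not None else None
        result[group][quantize_bucket(sample["latency_ns"])] += 1
    return {group: dict(hist) for group, hist in result.items()}
```

Step 3 of the drill-down then corresponds to aggregate(SAMPLES, key="synchronous", where={"ip": "187.124.26.12", "operation": "write"}).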
How we built "narrowing down"
● Templated D scripts for collecting data
internal to Delphix OS
● Allow the user to specify constraints on
variables in each template
○ Translate these into DTrace conditions
● Allow the user to specify which variables
they want to display
● Fill out a template and run the resulting
script
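One way the template-filling step could look is sketched below in Python. This is an assumption-laden illustration, not the Delphix implementation: render_clause, d_literal, and ALLOWED_VARIABLES are hypothetical names, and the output reuses the placeholder "probe" and "start" from the pseudocode rather than real provider probes.

```python
# Hypothetical set of variables that a template exposes to the user.
ALLOWED_VARIABLES = {"ip", "operation", "synchronous"}


def d_literal(value):
    """Render a Python value as a D literal (strings quoted, numbers bare)."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_clause(probe, constraints=None, display=None):
    """Fill out one probe clause of a D script template.

    constraints: dict of variable -> required value, becomes the predicate.
    display:     list of variables to show, becomes the aggregation keys.
    """
    constraints = constraints or {}
    display = display or []
    # Reject anything the template does not know about, so user input
    # can never inject arbitrary D code through a variable name.
    for var in list(constraints) + list(display):
        if var not in ALLOWED_VARIABLES:
            raise ValueError(f"unknown variable: {var}")
    lines = [probe]
    if constraints:
        conds = " && ".join(
            f"{var} == {d_literal(val)}" for var, val in constraints.items()
        )
        lines.append(f"/{conds}/")
    keys = f"[{', '.join(display)}]" if display else ""
    lines.append("{")
    lines.append(f"    @latency{keys} = quantize(timestamp - start);")
    lines.append("}")
    return "\n".join(lines)
```

Calling render_clause("probe", {"ip": "187.124.26.12", "operation": "write"}, ["synchronous"]) produces a clause shaped like step 3 of the pseudocode. Validating names against a fixed list is a design choice of this sketch; the slides do not say how user input is checked.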
Enhancing Supportability
Our support staff hears this question frequently:
We got reports of slow DB accesses last
Friday, but now everything is back to normal.
Can you help us debug what went wrong?
Historical data is important too
● We always read a few system-wide statistics
● We store all readings into MongoDB
○ We're not really concerned about ACID guarantees
○ We don't know exactly what variables we will be
collecting for each collector ahead of time
○ MongoDB has a couple of features that are
specifically made for logging that we use
○ It was easy to configure and use
Storing (lots of) historical data
The collected data piles up quickly!
● Don't collect data too frequently
● Compress readings into larger and larger
time intervals as the readings age
○ We implemented this in the caller, but could have
used MongoDB's MapReduce as well
● Eventually, delete them (after ~2 weeks)
○ We used MongoDB's "time-to-live indexes" to handle
this automatically; they work nicely
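The compression step can be pictured as merging histograms whose timestamps fall into the same, progressively wider, interval. The Python sketch below shows that idea only. The tier boundaries (1 hour, 1 day) and interval widths are invented for illustration, and a reading is assumed to be a (timestamp in seconds, histogram dict) pair, which may differ from the real storage format. Deletion is left to the time-to-live index and is not modeled here.

```python
from collections import Counter

# Hypothetical tiers, oldest first: (minimum age in seconds, interval width).
TIERS = [
    (86400, 3600),  # older than a day: one reading per hour
    (3600, 60),     # older than an hour: one reading per minute
    (0, 1),         # recent: keep per-second readings
]


def interval_for_age(age):
    """Return the interval width (seconds) for a reading of the given age."""
    for min_age, width in TIERS:
        if age >= min_age:
            return width
    return 1  # readings stamped in the future are left at full resolution


def compress(readings, now):
    """Merge readings that share an interval by summing their histograms.

    readings: iterable of (timestamp, {bucket: count}) pairs.
    Returns a list of (interval_start, interval_width, histogram),
    sorted by interval start.
    """
    merged = {}
    for ts, hist in readings:
        width = interval_for_age(now - ts)
        start = ts - ts % width  # align to the start of the interval
        merged.setdefault((start, width), Counter()).update(hist)
    return [
        (start, width, dict(hist))
        for (start, width), hist in sorted(merged.items())
    ]
```

Summing bucket counts keeps the latency distribution intact at coarser time resolution, which is why histograms compress more gracefully than averages.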
Dealing with the Edge Cases
● If an investigation takes too long, the
performance data it relies on could be
compressed or deleted while it is still ongoing
● Users can prevent data from being
compressed or deleted by explicitly saving it
Summary
● We used DTrace to allow customers to dig
deeper on performance issues
○ Customers will love it*
○ Our support staff will love it*
* at least, that's the hope!
Thanks!

Generative AI for Technical Writer or Information Developers
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

#lspe Building a Monitoring Framework using DTrace and MongoDB

  • 1. Building a Monitoring Framework Using DTrace and MongoDB Dan Kimmel Software Engineer, Delphix dan.kimmel@delphix.com
  • 2. Background ● Building a performance monitoring framework on illumos using DTrace ● It's monitoring our data virtualization engine ○ That means "database storage virtualization and rigorous administration automation" for those who didn't have time to study up on our marketing lingo ● Our users are mostly DBAs ● The monitoring framework itself is not released yet
  • 3. What to collect? ● DBAs have one performance metric they care about for their database storage ○ I/O latency, because it translates to database I/O latency, which translates to end-user happiness ● But to make the performance data actionable, they usually need more than that single measurement ○ Luckily, DTrace always has more data
  • 4. Virtualized Database Storage (as most people imagine it) [Diagram: the database I/O path runs from the Database Process (Oracle, SQLServer, others on the way) across the Network to the Storage Appliance (the Delphix Engine)]
  • 5. Virtualized Database Storage [Diagram: the database I/O path runs through the Database Process (Oracle, SQLServer, others on the way), the Network-Mounted Storage Layer (NFS/iSCSI), the Network, Delphix FS, and Storage. Platform labels: Database Host OS (Windows, Linux, Solaris, *BSD, HP-UX, AIX), Delphix OS, Hypervisor*] * Sometimes the DB host is running on a hypervisor too, or even on the same hypervisor
  • 6. Latency can come from anywhere [Diagram: the same database I/O path (Database Process, Network-Mounted Storage Layer (NFS/iSCSI), Network, Delphix FS, Storage) on the Database Host OS, Delphix OS, and Hypervisor. Bottlenecks on the left: Out of memory? Out of CPU? Out of bandwidth? Out of IOPS? Sources of latency on the right: NFS client latency, Network latency, Queuing latency, FS latency, Device latency]
  • 7. Investigation Requirements Want users to be able to dig deeper during a performance investigation. ● Show many different sources of latency and show many possible bottlenecks ○ i.e. collect data from all levels of the I/O stack ○ This is something that we're still working on, and sadly, not all levels of the stack have DTrace ● Allow users to narrow down the cause within one layer ○ Concepts were inspired by other DTrace-based analytics tools from Sun and Joyent
  • 8. Narrowing down the cause After looking at a high-level view of the layers, a user sees that NFS server latency has some slow outliers. 1. NFS latency by client IP address ○ The client at 187.124.26.12 looks slowest 2. NFS latency for 187... by operation ○ Writes look like the slow operation 3. NFS write latency for 187... by whether the write is synchronous ○ Synchronous writes are slower than normal
  • 9. How that exercise helped ● The user just learned a lot about the problem ○ The user might be able to solve it themselves by (for instance) upgrading or expanding the storage we sit on top of to handle synchronous writes better ○ They can also submit a much more useful bug report or speak effectively to our support staff ● Saves them time, saves us time!
  • 10. DTrace is the perfect tool ● To split results on a variable, collect the variable and use it as an additional key in your aggregations. ● To narrow down a variable, add a condition.
      // Pseudocode alert! (assuming start holds the timestamp saved when the operation began)
      0. probe { @latency = quantize(timestamp - start); }
      1. probe { @latency[ip] = quantize(timestamp - start); }
      2. probe /ip == "187..."/ { @latency[operation] = quantize(timestamp - start); }
      3. probe /ip == "187..." && operation == "write"/ { @latency[synchronous] = quantize(timestamp - start); }
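The two rules on slide 10 (an extra aggregation key splits results, a condition narrows them) can be mimicked outside DTrace to see what the output looks like. The following is a minimal Python sketch, not the Delphix implementation: the event fields (`ip`, `operation`, `latency_ns`), the second client address, and the helper names are assumptions made for illustration. `quantize_bucket` approximates DTrace's power-of-two `quantize()` buckets for positive integers only.

```python
from collections import defaultdict


def quantize_bucket(value):
    """Lower bound of the power-of-two bucket holding a positive integer.

    Approximates DTrace's quantize() for positive values; non-positive
    values are lumped into bucket 0 (a simplification).
    """
    if value <= 0:
        return 0
    return 1 << (value.bit_length() - 1)


def aggregate(events, key=None, where=None):
    """Build latency histograms, optionally split by `key` and filtered by `where`.

    key   -- event field used as the aggregation key (like @latency[ip])
    where -- function acting as the probe condition (like /ip == "..."/)
    """
    histograms = defaultdict(lambda: defaultdict(int))
    for event in events:
        if where is not None and not where(event):
            continue  # the condition failed, so the event is not recorded
        group = event[key] if key is not None else None
        histograms[group][quantize_bucket(event["latency_ns"])] += 1
    return {group: dict(buckets) for group, buckets in histograms.items()}


# Hypothetical sample events; only 187.124.26.12 comes from the slides.
events = [
    {"ip": "187.124.26.12", "operation": "write", "latency_ns": 9000},
    {"ip": "187.124.26.12", "operation": "read", "latency_ns": 700},
    {"ip": "10.0.0.5", "operation": "read", "latency_ns": 600},
]

# Step 1: split by client IP.
by_ip = aggregate(events, key="ip")

# Step 2: narrow to one client, then split by operation.
by_operation = aggregate(
    events, key="operation", where=lambda e: e["ip"] == "187.124.26.12"
)
```

Each drill-down step changes only the `key` and `where` arguments, which is the same property that makes the D scripts easy to template.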
  • 11. How we built "narrowing down" ● Templated D scripts for collecting data internal to Delphix OS ● Allow the user to specify constraints on variables in each template ○ Translate these into DTrace conditions ● Allow the user to specify which variables they want to display ● Fill out a template and run the resulting script
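Filling out a template from user-supplied constraints and display variables could look roughly like the Python sketch below. It is an assumption-laden illustration rather than the framework's actual code: the function names, the allowlist of variables, and the output layout are hypothetical, and the generated text follows the slide's pseudocode (`probe`, `ip`, `start`) rather than a real, runnable D script.

```python
# Variables a template is willing to expose (hypothetical allowlist).
ALLOWED_VARIABLES = {"ip", "operation", "synchronous"}


def d_literal(value):
    """Render a Python value as a literal for the generated condition."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Refuse characters that could break out of the quoted string.
        if any(ch in value for ch in '"\\\n'):
            raise ValueError(f"unsafe string constraint: {value!r}")
        return f'"{value}"'
    raise TypeError(f"unsupported constraint type: {type(value).__name__}")


def render_script(constraints, display):
    """Return script text for the given constraints and display variables.

    constraints -- dict of variable name -> required value (the condition)
    display     -- list of variable names used as aggregation keys
    """
    for name in list(constraints) + list(display):
        if name not in ALLOWED_VARIABLES:
            raise ValueError(f"unknown variable: {name}")
    condition = " && ".join(
        f"{name} == {d_literal(value)}" for name, value in constraints.items()
    )
    condition_line = f"/{condition}/\n" if condition else ""
    keys = f"[{', '.join(display)}]" if display else ""
    return (
        f"probe\n{condition_line}{{\n"
        f"    @latency{keys} = quantize(timestamp - start);\n}}\n"
    )


# Step 3 of the investigation on slide 8, expressed as template input.
script = render_script(
    {"ip": "187.124.26.12", "operation": "write"}, ["synchronous"]
)
```

Validating names against an allowlist and rejecting quote characters keeps user input from injecting arbitrary D into the script, which matters because the result runs with tracing privileges.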
  • 12. Enhancing Supportability Our support staff hears this question frequently: "We got reports of slow DB accesses last Friday, but now everything is back to normal. Can you help us debug what went wrong?"
  • 13. Historical data is important too ● We always read a few system-wide statistics ● We store all readings into MongoDB ○ We're not really concerned about ACID guarantees ○ We don't know exactly what variables we will be collecting for each collector ahead of time ○ MongoDB has a couple of features that are specifically made for logging that we use ○ It was easy to configure and use
  • 14. Storing (lots of) historical data The collected data piles up quickly! ● Don't collect data too frequently ● Compress readings into larger and larger time intervals as the readings age ○ We implemented this in the caller, but could have used MongoDB's MapReduce as well ● Eventually, delete them (after ~2 weeks) ○ We used MongoDB's "time-to-live indexes" to handle this automatically; they work nicely
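Compressing readings into larger intervals as they age, done in the caller as slide 14 describes, might be sketched in Python as follows. The tier boundaries, field names (`ts`, `count`, `total_ns`), and the `compress` function are assumptions for illustration; only the roughly two-week retention comes from the slides. In the framework itself deletion is handled by MongoDB's time-to-live indexes, so the expiry check here only stands in for that behaviour.

```python
from collections import defaultdict

# (minimum age in seconds, bucket width in seconds). Hypothetical tiers:
# 1-second readings for the first hour, 1-minute up to a day, then 1-hour.
TIERS = [(0, 1), (3600, 60), (86400, 3600)]
MAX_AGE = 14 * 86400  # about two weeks, as on the slide


def bucket_width(age):
    """Return the bucket width for a reading of the given age in seconds."""
    width = TIERS[0][1]
    for min_age, tier_width in TIERS:
        if age >= min_age:
            width = tier_width
    return width


def compress(readings, now):
    """Merge readings into coarser time buckets the older they are.

    Each reading is a dict with `ts` (epoch seconds), `count` (number of
    I/Os) and `total_ns` (summed latency), so merged buckets still yield
    a correct average latency.
    """
    merged = defaultdict(lambda: {"count": 0, "total_ns": 0})
    for reading in readings:
        age = now - reading["ts"]
        if age > MAX_AGE:
            continue  # expired; a TTL index would delete this in MongoDB
        width = bucket_width(age)
        start = reading["ts"] - reading["ts"] % width  # align to the bucket
        slot = merged[(start, width)]
        slot["count"] += reading["count"]
        slot["total_ns"] += reading["total_ns"]
    return [
        {"ts": start, "width": width, **totals}
        for (start, width), totals in sorted(merged.items())
    ]


now = 3_600_000
readings = [
    {"ts": now - 10, "count": 1, "total_ns": 100},         # recent: kept as is
    {"ts": now - 7200, "count": 1, "total_ns": 300},       # merged per minute
    {"ts": now - 7190, "count": 1, "total_ns": 500},       # same minute
    {"ts": now - 15 * 86400, "count": 1, "total_ns": 900}, # past retention
]
compressed = compress(readings, now)
```

Storing counts and sums rather than averages is a deliberate choice in this sketch: averages cannot be merged correctly once the per-bucket sample counts are lost.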
  • 15. Dealing with the Edge Cases ● If an investigation is ongoing, performance data could be compressed or deleted if the investigation takes too long ● Users can prevent data from being compressed or deleted by explicitly saving it
  • 16. Summary ● We used DTrace to allow customers to dig deeper on performance issues ○ Customers will love it* ○ Our support staff will love it* * at least, that's the hope!