Open source tools for optimizing
your peering infrastructure
@ DE-CIX TechMeeting 2018-06-06
by Daniel Czerwonk
• Software / Network Engineer at Mauve Mailorder Software
• Head of Network Freifunk Essen e.V.
• AS44821 (Mauve), AS206356 (Freifunk Essen e.V.),
AS202739 (routing-rocks)
• birdwatcher and bio-routing contributor
• Twitter: @dan_nrw
• Github: https://github.com/czerwonk
• LinkedIn: https://www.linkedin.com/in/czerwonk/
Who is this guy? About me…
Our journey starts late 2016
A new networking setup is about to be built
But before that:
Let’s talk about monitoring…
• Very small operations team
• Freifunk Essen should demand even less operations effort
• Identify trends/anomalies early
• Capacity planning (beware of retention)
• Source for alerting
• Starting point for traffic engineering, etc.
• Basis for post-mortems (in case of an outage)
• Dashboard to give a quick overview when needed
Why is monitoring important for me?
So, let’s build a monitoring system…
• Prometheus to collect metrics
• Grafana to visualize metrics
• Alertmanager with Pushover integration for alerting
• Everything Ansible managed
What I wanted…
• Bird routing daemon
• JunOS running on a few EX series switches
• Host metrics from bare metal software router machines (statistics, resources)
• External network latencies (RIPE ATLAS, etc.)
What did I want to scrape?
What I found…
In 2016…
Metric             Solution                                   Problem
bird               -                                          no exporter available
JunOS              snmp_exporter                              complex configuration, bad performance
Host metrics       node_exporter                              -
Network latencies  blackbox_exporter with external probe VMs  bad coverage, only one request per scrape
• Official Prometheus project
• On Linux hosts (e.g. Routers)
• Network interface metrics
• Resource consumption: CPU load, RAM usage, Disk space
• Interrupts / context switches
• License: Apache 2.0
• Source: https://github.com/prometheus/node_exporter
node_exporter
At least we got the host metrics covered.
And the rest?
I had to solve that…
So I started to write some
exporters…
• Performance is a key feature
• Need for concurrent processing
• Single binary / no dependencies
• Easy installation via go get …
• Existing client API for Prometheus
• Love writing code in golang in my spare time
Which programming language?
I chose golang:
Milestones to an exporter suite (timeline 2016 to 2018)
• bird_exporter: Bird 1.x
• atlas_exporter: RIPE ATLAS
• RIPE Labs article
• bird_exporter: support for bird 2.x
• junos_exporter: Juniper JunOS using SNMP
• junos_exporter: replaced SNMP by SSH
• ping_exporter: ICMP probing
• mikrotik-exporter: RouterOS
• Started late 2016
• Communicates with bird via socket
• Bird 1.x and 2.x supported
• Protocols: BGP, OSPFv2, OSPFv3, Kernel, Static, Device, Direct
• License: MIT
• Source: https://github.com/czerwonk/bird_exporter
bird_exporter
bird_exporter
bird_protocol_prefix_import_count{proto=~"BGP|OSPFv3",ip_version="6"}
count(bird_protocol_up{proto="BGP"} == 1)
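To make the two queries above concrete, here is a small Python sketch of how the label matchers, the `== 1` filter and `count()` behave on a handful of samples. The sample values and the `name` label are made up for illustration; only the metric names and the `proto` and `ip_version` labels come from the slides. It mimics the query semantics in miniature and is not how Prometheus evaluates queries internally.

```python
import re

# Hypothetical scraped samples: (metric name, labels, value).
SAMPLES = [
    ("bird_protocol_up", {"name": "upstream_a", "proto": "BGP", "ip_version": "4"}, 1.0),
    ("bird_protocol_up", {"name": "upstream_b", "proto": "BGP", "ip_version": "6"}, 0.0),
    ("bird_protocol_up", {"name": "peer_c", "proto": "BGP", "ip_version": "6"}, 1.0),
    ("bird_protocol_up", {"name": "core", "proto": "OSPFv3", "ip_version": "6"}, 1.0),
    ("bird_protocol_prefix_import_count", {"name": "peer_c", "proto": "BGP", "ip_version": "6"}, 120.0),
    ("bird_protocol_prefix_import_count", {"name": "core", "proto": "OSPFv3", "ip_version": "6"}, 8.0),
    ("bird_protocol_prefix_import_count", {"name": "upstream_a", "proto": "BGP", "ip_version": "4"}, 700000.0),
]


def select(metric, equal=None, regex=None):
    """Return (labels, value) pairs of `metric` that satisfy all matchers.

    `equal` mimics label="value", `regex` mimics label=~"pattern".
    PromQL regex matchers are fully anchored, hence re.fullmatch.
    """
    equal = equal or {}
    regex = regex or {}
    result = []
    for name, labels, value in SAMPLES:
        if name != metric:
            continue
        if any(labels.get(key) != wanted for key, wanted in equal.items()):
            continue
        if any(re.fullmatch(pattern, labels.get(key, "")) is None
               for key, pattern in regex.items()):
            continue
        result.append((labels, value))
    return result


# bird_protocol_prefix_import_count{proto=~"BGP|OSPFv3",ip_version="6"}
imported_v6 = select(
    "bird_protocol_prefix_import_count",
    equal={"ip_version": "6"},
    regex={"proto": "BGP|OSPFv3"},
)

# count(bird_protocol_up{proto="BGP"} == 1)
bgp_up = [pair for pair in select("bird_protocol_up", equal={"proto": "BGP"}) if pair[1] == 1]
bgp_sessions_up = len(bgp_up)
```

The first query keeps one series per matching protocol instance; the second collapses all established BGP sessions into a single number, which is handy for a dashboard tile.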
• BGP session state metrics
• BGP message counts (received, sent, withdrawn, etc.)
• Prefix counts for all supported protocols (imported, exported, filtered, etc.)
• OSPFv2/OSPFv3 neighbour counts
• Protocol uptime
bird_exporter - Features
• Started early 2018
• Replacement for RRD based smokeping
• For ICMP, also a replacement for blackbox_exporter, which lacks loss detection
• Based on go-ping by Digineo: https://github.com/digineo/go-ping
• License: MIT
• Source: https://github.com/czerwonk/ping_exporter
ping_exporter
ping_exporter
ping_rtt_mean_ms{ip_version="6"}
ping_loss_percent{ip_version="4"}
• Sends and aggregates multiple ICMP ECHO requests
• Roundtrip metrics (current, best, worst)
• Simple way to detect loss
• Supports multiple targets
• DNS refresh ensures the correct IP is measured when DNS is changed
• Only ICMP support at the moment
• Warning: ICMP is not user traffic, so keep that in mind when interpreting these metrics
ping_exporter - Features
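Why several requests per scrape matter: a single probe can only say "reply" or "no reply", while a batch yields a loss ratio plus best, worst and mean round-trip time. The Python sketch below shows that aggregation step only. The function name, the result keys and the use of `None` for a lost reply are assumptions for illustration and not taken from ping_exporter itself.

```python
from statistics import mean


def aggregate_probes(rtts_ms):
    """Aggregate one batch of ICMP echo results.

    `rtts_ms` holds one entry per request sent: the round-trip time in
    milliseconds, or None if no reply arrived (counted as loss).
    """
    if not rtts_ms:
        raise ValueError("no probes sent")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "best_ms": min(replies) if replies else None,
        "worst_ms": max(replies) if replies else None,
        "mean_ms": mean(replies) if replies else None,
        "loss_percent": 100.0 * lost / len(rtts_ms),
    }


# Four requests sent, the third one got no reply.
result = aggregate_probes([10.0, 20.0, None, 30.0])
```

With total loss the round-trip values stay empty instead of reporting a misleading zero.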
• Started early 2017
• Metrics by requesting measurement results from RIPE ATLAS
• Useful to get an outside view from various other networks
• License: LGPL3 (since the binding used is under this license)
• Source: https://github.com/czerwonk/atlas_exporter
• More info: https://labs.ripe.net/Members/daniel_czerwonk/using-ripe-atlas-measurement-results-in-prometheus-with-atlas_exporter
atlas_exporter
atlas_exporter
avg(atlas_ping_avg_latency{ip_version="4"}) by (asn)
avg(atlas_traceroute_hops{ip_version="4"}) by (asn)
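What `avg(...) by (asn)` does with the per-probe series can be sketched in a few lines of Python: filter by label, group by `asn`, average each group. The sample values, the `probe_id` label and the AS numbers (taken from the documentation range) are made up for illustration; only the metric name and the `ip_version` and `asn` labels appear on the slide.

```python
from collections import defaultdict

# Hypothetical samples of atlas_ping_avg_latency: (labels, value in ms).
samples = [
    ({"asn": "64500", "ip_version": "4", "probe_id": "1"}, 12.0),
    ({"asn": "64500", "ip_version": "4", "probe_id": "2"}, 18.0),
    ({"asn": "64501", "ip_version": "4", "probe_id": "3"}, 40.0),
    ({"asn": "64501", "ip_version": "6", "probe_id": "3"}, 35.0),
]


def avg_by(samples, label, **matchers):
    """Average sample values per distinct value of `label`.

    `matchers` are exact label matches applied before grouping,
    like the {ip_version="4"} selector in the query.
    """
    groups = defaultdict(list)
    for labels, value in samples:
        if all(labels.get(key) == wanted for key, wanted in matchers.items()):
            groups[labels.get(label, "")].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# avg(atlas_ping_avg_latency{ip_version="4"}) by (asn)
latency_by_asn = avg_by(samples, "asn", ip_version="4")
```

Grouping by origin network turns many individual probes into one line per AS, which is the outside view the dashboards are after.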
• Ping (success, min/max/avg latency, dups, size)
• Traceroute (success, hop count, rtt)
• NTP (delay, derivation, ntp version)
• DNS (success, rtt)
• HTTP (return code, rtt, http version, header size, body size)
• SSL Certificates (alert, rtt)
atlas_exporter - Features
• Started late 2017
• snmp_exporter did not perform as required
• First implementation using a simple set of SNMP OIDs
• Early 2018: reimplementation using SSH and XML RPC representation
• Alternative to Juniper's OpenNTI, since telemetry is only supported on newer versions of JunOS and hardware
• License: MIT
• Source: https://github.com/czerwonk/junos_exporter
junos_exporter
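The SSH approach boils down to running a show command, asking for its XML representation and mapping elements to metrics. The Python sketch below covers only the mapping step. The XML document is a simplified, namespace-free stand-in and its element names are assumptions, as are the `example_` metric names; real device output and the metric names used by junos_exporter will differ.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an XML reply (assumed structure, no namespaces).
REPLY = """<rpc-reply>
  <interface-information>
    <physical-interface>
      <name>ge-0/0/0</name>
      <traffic-statistics>
        <input-bytes>1000</input-bytes>
        <output-bytes>2500</output-bytes>
      </traffic-statistics>
    </physical-interface>
    <physical-interface>
      <name>ge-0/0/1</name>
      <traffic-statistics>
        <input-bytes>42</input-bytes>
        <output-bytes>7</output-bytes>
      </traffic-statistics>
    </physical-interface>
  </interface-information>
</rpc-reply>"""


def interface_metrics(xml_text):
    """Turn the XML reply into Prometheus text-format sample lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for iface in root.iter("physical-interface"):
        name = iface.findtext("name", default="").strip()
        stats = iface.find("traffic-statistics")
        if stats is None:
            continue  # interface without counters, nothing to export
        rx = int(stats.findtext("input-bytes", default="0"))
        tx = int(stats.findtext("output-bytes", default="0"))
        # Hypothetical metric names, one sample line per counter.
        lines.append(f'example_interface_receive_bytes{{name="{name}"}} {rx}')
        lines.append(f'example_interface_transmit_bytes{{name="{name}"}} {tx}')
    return lines


metric_lines = interface_metrics(REPLY)
```

One structured reply per command replaces many individual SNMP lookups, which fits the performance motivation stated above.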
• Interfaces (bytes transmitted/received, errors, drops)
• Routes (per table, by protocol)
• Alarms (count)
• BGP (message count, prefix counts per peer, session state)
• OSPFv2, OSPFv3 (number of neighbours)
• Interface diagnostics (optical signals)
• ISIS (number of adjacencies, total number of routers)
• Environment (temperatures)
• Routing engine statistics
junos_exporter - Features
• Contribution to existing project
• Only interface and resource metrics at this point
• Added several other features
• License: BSD3
• Source: https://github.com/nshttpd/mikrotik-exporter
mikrotik-exporter
• Interface metrics (RX bytes, TX bytes, drops, errors, etc.)
• BGP session states
• BGP message counts (updates, withdraws)
• DHCP leases
• DHCPv6 bindings
• Optical diagnostics
• IPv4/IPv6 pool counts
• System resources (memory, CPU load, etc.)
• Prefix counts per protocol (in RIB)
mikrotik-exporter - Features
Dashboard examples
How to combine several exporters?
Mauve Network Overview
Mauve Routing
Alerting
When and how?
How to alert?
What the SRE book has taught us:
https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html
How to alert? A few examples…
Port saturation:
Upstream session down:
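The rules shown on these slides did not survive as text. As a rough idea of the logic behind the two alerts, here is a Python sketch: utilisation derived from two byte-counter samples compared against a threshold, and a check for any upstream session reporting 0. The function names, the 80 % threshold and the counter-reset handling are assumptions for illustration, not the rules used in the talk.

```python
def link_utilization(bytes_before, bytes_after, interval_s, link_speed_bps):
    """Fraction of link capacity used between two byte-counter samples."""
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # A counter that went backwards is treated as a reset to zero.
    if bytes_after >= bytes_before:
        increase = bytes_after - bytes_before
    else:
        increase = bytes_after
    return (increase * 8 / interval_s) / link_speed_bps


def port_saturated(utilization, threshold=0.8):
    """Assumed threshold: alert at 80 % of link capacity."""
    return utilization >= threshold


def upstream_down(session_up_values):
    """True if any upstream session reports 0, as bird_protocol_up would."""
    return any(value == 0 for value in session_up_values)


# 6.75 GB transmitted within 60 s on a 1 Gbit/s port.
util = link_utilization(0, 6_750_000_000, 60, 1_000_000_000)
```

In a real setup both conditions would usually have to hold for a few minutes before paging, so that a short burst or a session flap does not wake anyone up.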
Thank you for your attention.
Special thanks to everyone who contributed to my projects!

Open source tools for optimizing your peering infrastructure @ DE-CIX TechMeeting 2018

  • 1. Open source tools for optimizing your peering infrastructure @ DE-CIX TechMeeting 2018-06-06 by Daniel Czerwonk
  • 2. • Software / Network Engineer at Mauve Mailorder Software • Head of Network Freifunk Essen e.V. • AS44821 (Mauve), AS206356 (Freifunk Essen e.V.), AS202739 (routing-rocks) • birdwatcher and bio-routing contributor • Twitter: @dan_nrw • Github: https://github.com/czerwonk • LinkedIn: https://www.linkedin.com/in/czerwonk/ Who is this guy? About me…
  • 3. Our journey starts late 2016 A new networking setup is about to be build
  • 4. But before that: Let’s talk about monitoring…
  • 5. • Very small operations team • Freifunk Essen should be even less ops demanding • Identify trends/anomalies early • Capacity planing (beware of retention) • Source for alerting • Start point for traffic engineering, etc. • Source to build post mortem on (in case of outage) • Dashboard to give a quick overview when needed Why is monitoring important for me?
  • 6. So, let’s build a monitoring system…
  • 7. • Prometheus to collect metrics • Grafana to visualize metrics • Alertmanager with Pushover integration for alerting • Everything Ansible managed What I wanted… + +
  • 8. • Bird routing daemon • JunOS running on a few EX series switches • Host metrics from bare metal software router machines (statistics, resources) • External network latencies (RIPE ATLAS, etc.) What I wanted to scrape?
  • 10. In 2016… Metric Solution Problem bird no exporter available JunOS snmp_exporter complex configuration, bad performance Host metrics node_exporter Network latencies blackbox_exporter with external probe VMs bad coverage, only one request per scrape
  • 11. node_exporter • Official Prometheus project • On Linux hosts (e.g. routers) • Network interface metrics • Resource consumption: CPU load, RAM usage, disk space • Interrupts / context switches • License: Apache 2.0 • Source: https://github.com/prometheus/node_exporter
  • 12. At least we got the host metrics covered. And the rest? I had to solve that…
  • 13. So I started to write some exporters…
  • 14. Which programming language? I chose golang: • Performance is a key feature • Need for concurrent processing • Single binary / no dependencies • Easy installation via go get … • Existing client API for Prometheus • Love writing code in golang in my spare time
  • 15. Milestones to an exporter suite (timeline 2016 to 2018) • bird_exporter (Bird 1.x) • atlas_exporter (RIPE ATLAS) • RIPE LABS article • junos_exporter (Juniper JunOS using SNMP) • Support for bird 2.x • Replaced SNMP by SSH • ping_exporter (ICMP probing) • mikrotik-exporter (RouterOS)
  • 16. bird_exporter • Started late 2016 • Communicates with bird via socket • Bird 1.x and 2.x supported • Protocols: BGP, OSPFv2, OSPFv3, Kernel, Static, Device, Direct • License: MIT • Source: https://github.com/czerwonk/bird_exporter
  • 18. bird_exporter - Features • BGP session state metrics • BGP message counts (received, sent, withdrawn, etc.) • Prefix counts for all supported protocols (imported, exported, filtered, etc.) • OSPFv2/OSPFv3 neighbour counts • Protocol uptime
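To make the idea of turning bird's protocol table into session state metrics more concrete, here is a minimal sketch in Go. It is not the code of bird_exporter: it parses a captured `show protocols` listing (as printed by birdc) instead of talking to the socket, it assumes the column order name, proto, table, state, and the names `ParseProtocols`, `Up` and `bird_protocol_up` are chosen for illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Protocol holds the columns of one row that this sketch cares about.
type Protocol struct {
	Name, Proto, State string
}

// ParseProtocols reads a "show protocols" style listing.
// Assumed column order: name proto table state since [info].
func ParseProtocols(output string) []Protocol {
	var protocols []Protocol
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Skip the header row and anything too short to be a protocol row.
		if len(f) < 5 || f[0] == "name" {
			continue
		}
		protocols = append(protocols, Protocol{Name: f[0], Proto: f[1], State: f[3]})
	}
	return protocols
}

// Up maps the state column to a gauge value: 1 for "up", 0 otherwise.
func Up(p Protocol) int {
	if p.State == "up" {
		return 1
	}
	return 0
}

func main() {
	// Hypothetical sample output, not taken from a real router.
	sample := `name     proto    table    state  since       info
bgp1     BGP      master   up     2018-05-01  Established
bgp2     BGP      master   start  2018-05-01  Active
ospf1    OSPF     master   up     2018-05-01  Running`

	// Print one line per protocol in the Prometheus text exposition style.
	for _, p := range ParseProtocols(sample) {
		fmt.Printf("bird_protocol_up{name=%q,proto=%q} %d\n", p.Name, p.Proto, Up(p))
	}
}
```

A real exporter would register such values with the Prometheus client library and read the listing from the daemon on every scrape; the parsing step stays the same in spirit.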
  • 19. ping_exporter • Started early 2018 • Replacement for RRD based smokeping • For ICMP, also a replacement for blackbox_exporter, which lacks loss detection • Based on go-ping by Digineo: https://github.com/digineo/go-ping • License: MIT • Source: https://github.com/czerwonk/ping_exporter
  • 21. ping_exporter - Features • Sends and aggregates multiple ICMP ECHO requests • Round-trip metrics (current, best, worst) • Simple way to detect loss • Supports multiple targets • DNS refresh ensures the correct IP is measured when DNS changes • Only ICMP support at the moment • Warning: ICMP is not user traffic, so keep that in mind when interpreting these metrics
  • 22. atlas_exporter • Started early 2017 • Metrics by requesting measurement results from RIPE ATLAS • Useful to get an outside view from other networks • License: LGPL3 (since the binding used is under this license) • Source: https://github.com/czerwonk/atlas_exporter • More info: https://labs.ripe.net/Members/daniel_czerwonk/using-ripe-atlas-measurement-results-in-prometheus-with-atlas_exporter
  • 24. atlas_exporter - Features • Ping (success, min/max/avg latency, dups, size) • Traceroute (success, hop count, rtt) • NTP (delay, derivation, ntp version) • DNS (success, rtt) • HTTP (return code, rtt, http version, header size, body size) • SSL Certificates (alert, rtt)
  • 25. junos_exporter • Started late 2017 • snmp_exporter did not perform as required • First implementation using a simple set of SNMP OIDs • Early 2018: reimplementation using SSH and XML RPC representation • Alternative to Juniper's OpenNTI since telemetry is only supported on newer versions of JunOS and hardware • License: MIT • Source: https://github.com/czerwonk/junos_exporter
  • 26. junos_exporter - Features • Interfaces (bytes transmitted/received, errors, drops) • Routes (per table, by protocol) • Alarms (count) • BGP (message count, prefix counts per peer, session state) • OSPFv2, OSPFv3 (number of neighbours) • Interface diagnostics (optical signals) • ISIS (number of adjacencies, total number of routers) • Environment (temperatures) • Routing engine statistics
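As a rough illustration of the SSH plus XML approach, the Go sketch below decodes an XML document into interface byte counters using only the standard library. It is not the junos_exporter implementation: the SSH transport is left out, the sample document is hand-written and heavily simplified, and the element names (`physical-interface`, `traffic-statistics`, `input-bytes`, `output-bytes`) are assumed to match what the device returns for an interface query.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// InterfaceInformation mirrors the assumed, simplified document structure.
type InterfaceInformation struct {
	Interfaces []PhysicalInterface `xml:"physical-interface"`
}

// PhysicalInterface carries the name and byte counters of one port.
type PhysicalInterface struct {
	Name  string `xml:"name"`
	Stats struct {
		InputBytes  uint64 `xml:"input-bytes"`
		OutputBytes uint64 `xml:"output-bytes"`
	} `xml:"traffic-statistics"`
}

// ParseInterfaces decodes the XML text into typed interface records.
func ParseInterfaces(data []byte) ([]PhysicalInterface, error) {
	var info InterfaceInformation
	if err := xml.Unmarshal(data, &info); err != nil {
		return nil, err
	}
	return info.Interfaces, nil
}

// sample is a hypothetical, trimmed-down reply used for illustration only.
const sample = `<interface-information>
  <physical-interface>
    <name>ge-0/0/0</name>
    <traffic-statistics>
      <input-bytes>1200</input-bytes>
      <output-bytes>3400</output-bytes>
    </traffic-statistics>
  </physical-interface>
</interface-information>`

func main() {
	interfaces, err := ParseInterfaces([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, i := range interfaces {
		fmt.Printf("%s rx=%d tx=%d\n", i.Name, i.Stats.InputBytes, i.Stats.OutputBytes)
	}
}
```

Decoding into typed structs keeps each feature (interfaces, BGP, alarms and so on) a matter of adding one struct and one command, which is one plausible reason structured XML output scales better than walking individual SNMP OIDs.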
  • 27. mikrotik-exporter • Contribution to existing project • Only interface and resource metrics at this point • Added several other features • License: BSD3 • Source: https://github.com/nshttpd/mikrotik-exporter
  • 28. mikrotik-exporter - Features • Interface metrics (RX bytes, TX bytes, drops, errors, etc.) • BGP session states • BGP message counts (updates, withdraws) • DHCP leases • DHCPv6 bindings • Optical diagnostics • IPv4/IPv6 pool counts • System resources (memory, CPU load, etc.) • Prefix counts per protocol (in RIB)
  • 29. Dashboard examples How to combine several exporters?
  • 33. How to alert? What the SRE book has taught us: https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html
  • 34. How to alert? A few examples… Port saturation: Upstream session down:
  • 35. Thank you for your attention. Special thanks to everyone who contributed to my projects!