SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Downloaden Sie, um offline zu lesen
The ELK Stack @ Inbot
Jilles van Gurp - Inbot Inc.
Who is Jilles?
www.jillesvangurp.com, and @jillesvangurpon everything I've signed up for
Java (J)Ruby Python Javascript/node.js
Servers reluctant Devops guy Software Architecture
Universities of Utrecht (NL), Blekinge (SE), and Groningen (NL)
GX (NL),Nokia Research (FI), Nokia/Here (DE),Localstream (DE),
Inbot(DE).
Inbot app - available for Android & IOS
ELK Stack?
Elasticsearch
Logstash
Kibana
Recent trends
Clustered/scalable time series DBs
Other people than sysadmins looking at graphs
Databases do some funky stuff these days: aggregations, search
Serverless, Docker, Amazon Lambda, Microservices etc. - where do the logs go?
More moving parts = more logs than ever
Logging
Kind of a boring topic ...
Stuff runs on servers, cloud, whatever
Produces errors, warnings, debug, telemetry, analytics, kpis, ux events, ...
Where does all this go and how do you make sense of it?
WHAT IS HAPPENING??!?!
Old school: Cat, grep, awk, cut, ….
Good luck with that on 200GB of unstructured logs from a gazillion microservices
on 40 virtual machines, docker images, etc.
That doesn't really work anymore ...
If you are doing this: you are doing it wrong!
Hadoop ecosystem?
Works great for structured data, if you know what you are looking for.
Requires a lot of infrastructure and hassle.
Not really real-time, tedious to explore data
Some hipster with a Ph.D. will fix it or ...
I’m not a data scientist, are you?
Monitoring/graphing ecosystem
Mostly geared at measuring stuff like cpu load, IO, memory, etc.
Intended for system administrators
What about the higher level stuff?
You probably should do monitoring but it’s not really what we need either ...
So, ELK ….
Logging
Most languages/servers ship with awful logging defaults, you can fix this
Log enough but not too much or too little.
Log at the right log level ⇒ Turn off DEBUG log. Use ERROR sparingly.
Log metadata so you can pick your logs apart ⇒ Metadata == json fields.
Log opportunistically, it's cheap
Too much logging
Your Elasticsearch cluster dies/you pay a fortune to keep data around that you
don’t need.
Not enough logging
Something happened, you don’t know what because there’s nothing in the logs;
you can't find back relevant events because metadata is missing.
You are going to waste what you saved in cost on finding out WTF is going on,
probably more.
Log entries in ELK
{
"message": "[3017772.750979] device-mapper: thin: 252:0: unable to service pool target messages
in READ_ONLY or FAIL mode",
"@timestamp": "2016-08-16T09:50:01.000Z",
"type": "syslog",
"host": "10.1.6.7",
"priority": 3,
"timestamp": "Aug 16 09:50:01",
"logsource": "ip-10-1-6-7",
"program": "kernel",
"severity": 3,
"facility": 0,
"facility_label": "kernel",
"severity_label": "Error"
}
Plumbing your logs
Simple problem: given some logs, convert it into json and shove it into
Elasticsearch.
Lots of components to help you do that: Logstash, Docker Gelf driver, Beats, etc.
If you can, log json natively: e.g. Logback logstash driver, http://jsonlines.org/
Ca. 40 Amazon EC2 instances, most of which have docker containers
VPC with several subnets and dmz.
Testing, production, and dev environments + dev infrastructure.
AWS comes with monitoring & alerts for basic stuff.
Everything logs to http://logs-internal.inbot.io/
Elasticsearch 2.2.0, logstash 2.2.1, kibana 4.4.1
1 week data retention, 14M events/day
Inbot technical setup
Demo time
Things to watch out for
Avoid split brains and other nasty ES failure modes -> RTFM & configure ...
Data retention policies are not optional
Use curator https://github.com/elastic/curator
Customise your mappings, changing them sucks on a live logstash cluster.
Dynamic mappings on fields that sometimes look like a number will break shit.
Running out of CPU credits in Amazon can kill your ES cluster
ES Rolling restarts take time when you have 6 months of logs
Mapped Diagnostic Context (MDC)
Common in java logging fws - log4j, slf4j, logback, etc.
Great for adding context to your logs
E.g. user_id, request url, host name, environment, headers, user agent, etc.
Makes it easy to slice and dice your logs
{
MDC.put("user_id","123");
LOG.info("some message");
MDC.remove("user_id");
}
MDC for node.js: our log4js fork
https://github.com/joona/log4js-node
Allows for MDC style attributes
Sorry: works for us but not in shape for pull request; maybe later.
But: this was an easy hack.
MdcContext
https://github.com/Inbot/inbot-utils/blob/master/src/main/java/io/inbot/utils/MdcCont
ext.java
try(MdcContext ctx=MdcContext.create()){
ctx.put("user_id","123");
LOG.info("some message");
}
Application Metrics
http://metrics.dropwizard.io/
Add counters, timers, gauges, etc. to your business logic.
metrics.register("httpclient_leased", new Gauge<Integer>() {
@Override
public Integer getValue() {
return connectionManager.getTotalStats().getLeased();
}
});
Reporter uses MDC to log once per minute: giant json blob but it works.
Docker Gelf driver
Configure your docker hosts to log the output of any docker containers using the
log driver.
command, container id, etc. become fields in log entry
nice as a fallback when you don't control the logging
/usr/bin/docker daemon --log-driver=gelf --log-opt gelf-address=udp://logs-internal.inbot.io:12201
Thanks
@jillesvangurp, @inbotapp

Weitere ähnliche Inhalte

Was ist angesagt?

Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES SystemsChris Birchall
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaPublicis Sapient Engineering
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...ForgeRock
 
Elk with Openstack
Elk with OpenstackElk with Openstack
Elk with OpenstackArun prasath
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupStartit
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppet
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performanceForthscale
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and VisualizationSurasak Sanguanpong
 
OpenStack Log Mining
OpenStack Log MiningOpenStack Log Mining
OpenStack Log MiningJohn Stanford
 
Side by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and SolrSide by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and SolrSematext Group, Inc.
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourPeter Friese
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...Sematext Group, Inc.
 
Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?inovex GmbH
 

Was ist angesagt? (20)

Using Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibanaUsing Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibana
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES Systems
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
 
Elk with Openstack
Elk with OpenstackElk with Openstack
Elk with Openstack
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performance
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and Visualization
 
LogStash in action
LogStash in actionLogStash in action
LogStash in action
 
OpenStack Log Mining
OpenStack Log MiningOpenStack Log Mining
OpenStack Log Mining
 
elk_stack_alexander_szalonnas
elk_stack_alexander_szalonnaselk_stack_alexander_szalonnas
elk_stack_alexander_szalonnas
 
Side by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and SolrSide by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and Solr
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
Introducing CouchDB
Introducing CouchDBIntroducing CouchDB
Introducing CouchDB
 
Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?
 

Ähnlich wie The ELK Stack @ Inbot: Logging and Analyzing Logs at Scale

Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stackVikrant Chauhan
 
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...Maarten Balliauw
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at nightMichael Yarichuk
 
Search and analyze data in real time
Search and analyze data in real timeSearch and analyze data in real time
Search and analyze data in real timeRohit Kalsarpe
 
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management....NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...NETFest
 
Migrating the elastic stack to the cloud, or application logging @ travix
 Migrating the elastic stack to the cloud, or application logging @ travix Migrating the elastic stack to the cloud, or application logging @ travix
Migrating the elastic stack to the cloud, or application logging @ travixRuslan Lutsenko
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2ice799
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyTim Bunce
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
Smarter internet of things with stream and event processing virtual io_t_meet...
Smarter internet of things with stream and event processing virtual io_t_meet...Smarter internet of things with stream and event processing virtual io_t_meet...
Smarter internet of things with stream and event processing virtual io_t_meet...Istvan Rath
 
Low level java programming
Low level java programmingLow level java programming
Low level java programmingPeter Lawrey
 
DotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NETDotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NETMaarten Balliauw
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Hernan Costante
 
Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...
Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...
Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...Maarten Balliauw
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the Worldjhugg
 
Kibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod serversKibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod serversHYS Enterprise
 
ConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory laneConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory laneMaarten Balliauw
 
Redis - for duplicate detection on real time stream
Redis - for duplicate detection on real time streamRedis - for duplicate detection on real time stream
Redis - for duplicate detection on real time streamCodemotion
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRoberto Franchini
 

Ähnlich wie The ELK Stack @ Inbot: Logging and Analyzing Logs at Scale (20)

Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 
Search and analyze data in real time
Search and analyze data in real timeSearch and analyze data in real time
Search and analyze data in real time
 
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management....NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
.NET Fest 2018. Maarten Balliauw. Let’s refresh our memory! Memory management...
 
Migrating the elastic stack to the cloud, or application logging @ travix
 Migrating the elastic stack to the cloud, or application logging @ travix Migrating the elastic stack to the cloud, or application logging @ travix
Migrating the elastic stack to the cloud, or application logging @ travix
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.key
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
Smarter internet of things with stream and event processing virtual io_t_meet...
Smarter internet of things with stream and event processing virtual io_t_meet...Smarter internet of things with stream and event processing virtual io_t_meet...
Smarter internet of things with stream and event processing virtual io_t_meet...
 
Low level java programming
Low level java programmingLow level java programming
Low level java programming
 
DotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NETDotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NET
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...
Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...
Exploring .NET memory management - A trip down memory lane - Copenhagen .NET ...
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Kibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod serversKibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod servers
 
ConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory laneConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory lane
 
Redis - for duplicate detection on real time stream
Redis - for duplicate detection on real time streamRedis - for duplicate detection on real time stream
Redis - for duplicate detection on real time stream
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
 

Kürzlich hochgeladen

Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 

Kürzlich hochgeladen (20)

Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 

The ELK Stack @ Inbot: Logging and Analyzing Logs at Scale

  • 1. The ELK Stack @ Inbot Jilles van Gurp - Inbot Inc.
  • 2. Who is Jilles? www.jillesvangurp.com, and @jillesvangurpon everything I've signed up for Java (J)Ruby Python Javascript/node.js Servers reluctant Devops guy Software Architecture Universities of Utrecht (NL), Blekinge (SE), and Groningen (NL) GX (NL),Nokia Research (FI), Nokia/Here (DE),Localstream (DE), Inbot(DE).
  • 3. Inbot app - available for Android & IOS
  • 5. Recent trends Clustered/scalable time series DBs Other people than sysadmins looking at graphs Databases do some funky stuff these days: aggregations, search Serverless, Docker, Amazon Lambda, Microservices etc. - where do the logs go? More moving parts = more logs than ever
  • 6. Logging Kind of a boring topic ... Stuff runs on servers, cloud, whatever Produces errors, warnings, debug, telemetry, analytics, kpis, ux events, ... Where does all this go and how do you make sense of it? WHAT IS HAPPENING??!?!
  • 7. Old school: Cat, grep, awk, cut, …. Good luck with that on 200GB of unstructured logs from a gazillion microservices on 40 virtual machines, docker images, etc. That doesn't really work anymore ... If you are doing this: you are doing it wrong!
  • 8. Hadoop ecosystem? Works great for structured data, if you know what you are looking for. Requires a lot of infrastructure and hassle. Not really real-time, tedious to explore data Some hipster with a Ph.D. will fix it or ... I’m not a data scientist, are you?
  • 9. Monitoring/graphing ecosystem Mostly geared at measuring stuff like cpu load, IO, memory, etc. Intended for system administrators What about the higher level stuff? You probably should do monitoring but it’s not really what we need either ...
  • 11. Logging Most languages/servers ship with awful logging defaults, you can fix this Log enough but not too much or too little. Log at the right log level ⇒ Turn off DEBUG log. Use ERROR sparingly. Log metadata so you can pick your logs apart ⇒ Metadata == json fields. Log opportunistically, it's cheap
  • 12. Too much logging Your Elasticsearch cluster dies/you pay a fortune to keep data around that you don’t need. Not enough logging Something happened, you don’t know what because there’s nothing in the logs; you can't find back relevant events because metadata is missing. You are going to waste what you saved in cost on finding out WTF is going on, probably more.
  • 13. Log entries in ELK { "message": "[3017772.750979] device-mapper: thin: 252:0: unable to service pool target messages in READ_ONLY or FAIL mode", "@timestamp": "2016-08-16T09:50:01.000Z", "type": "syslog", "host": "10.1.6.7", "priority": 3, "timestamp": "Aug 16 09:50:01", "logsource": "ip-10-1-6-7", "program": "kernel", "severity": 3, "facility": 0, "facility_label": "kernel", "severity_label": "Error" }
  • 14. Plumbing your logs Simple problem: given some logs, convert it into json and shove it into Elasticsearch. Lots of components to help you do that: Logstash, Docker Gelf driver, Beats, etc. If you can, log json natively: e.g. Logback logstash driver, http://jsonlines.org/
  • 15. Ca. 40 Amazon EC2 instances, most of which have docker containers VPC with several subnets and dmz. Testing, production, and dev environments + dev infrastructure. AWS comes with monitoring & alerts for basic stuff. Everything logs to http://logs-internal.inbot.io/ Elasticsearch 2.2.0, logstash 2.2.1, kibana 4.4.1 1 week data retention, 14M events/day Inbot technical setup
  • 17. Things to watch out for Avoid split brains and other nasty ES failure modes -> RTFM & configure ... Data retention policies are not optional Use curator https://github.com/elastic/curator Customise your mappings, changing them sucks on a live logstash cluster. Dynamic mappings on fields that sometimes look like a number will break shit. Running out of CPU credits in Amazon can kill your ES cluster ES Rolling restarts take time when you have 6 months of logs
  • 18. Mapped Diagnostic Context (MDC) Common in java logging fws - log4j, slf4j, logback, etc. Great for adding context to your logs E.g. user_id, request url, host name, environment, headers, user agent, etc. Makes it easy to slice and dice your logs { MDC.put("user_id","123"); LOG.info("some message"); MDC.remove("user_id"); }
  • 19. MDC for node.js: our log4js fork https://github.com/joona/log4js-node Allows for MDC style attributes Sorry: works for us but not in shape for pull request; maybe later. But: this was an easy hack.
  • 21. Application Metrics http://metrics.dropwizard.io/ Add counters, timers, gauges, etc. to your business logic. metrics.register("httpclient_leased", new Gauge<Integer>() { @Override public Integer getValue() { return connectionManager.getTotalStats().getLeased(); } }); Reporter uses MDC to log once per minute: giant json blob but it works.
  • 22. Docker Gelf driver Configure your docker hosts to log the output of any docker containers using the log driver. command, container id, etc. become fields in log entry nice as a fallback when you don't control the logging /usr/bin/docker daemon --log-driver=gelf --log-opt gelf-address=udp://logs-internal.inbot.io:12201