SlideShare ist ein Scribd-Unternehmen logo
1 von 20
ELK @ LinkedIn
Scaling ELK with Kafka
Introduction
Tin Le (tinle@linkedin.com)
Senior Site Reliability Engineer
Formerly part of Mobile SRE team, responsible for servers
handling mobile apps (IOS, Android, Windows, RIM, etc.)
traffic.
Now responsible for guiding ELK @ LinkedIn as a whole
Problems
● Multiple data centers, ten of thousands of servers,
hundreds of billions of log records
● Logging, indexing, searching, storing, visualizing and
analysing all of those logs all day every day
● Security (access control, storage, transport)
● Scaling to more DCs, more servers, and even more
logs…
● ARRGGGGHHH!!!!!
Solutions
● Commercial
o Splunk, Sumo Logic, HP ArcSight Logger, Tibco,
XpoLog, Loggly, etc.
● Open Source
o Syslog + Grep
o Graylog
o Elasticsearch
o etc.
Criterias
● Scalable - horizontally, by adding more nodes
● Fast - as close to real time as possible
● Inexpensive
● Flexible
● Large user community (support)
● Open source
ELK!
The winner is...
Splunk ???
ELK at LinkedIn
● 100+ ELK clusters across 20+ teams and 6
data centers
● Some of our larger clusters have:
o Greater than 32+ billion docs (30+TB)
o Daily indices average 3.0 billion docs (~3TB)
ELK + Kafka
Summary: ELK is a popular open sourced application stack for
visualizing and analyzing logs. ELK is currently being used across
many teams within LinkedIn. The architecture we use is made up of
four components: Elasticsearch, Logstash, Kibana and Kafka.
● Elasticsearch: Distributed real-time search and analytics engine
● Logstash: Collect and parse all data sources into an easy-to-read
JSON format
● Kibana: Elasticsearch data visualization engine
● Kafka: Data transport, queue, buffer and short term storage
What is Kafka?
● Apache Kafka is a high-throughput distributed
messaging system
o Invented at LinkedIn and Open Sourced in 2011
o Fast, Scalable, Durable, and Distributed by Design
o Links for more:
 http://kafka.apache.org
 http://data.linkedin.com/opensource/kafka
Kafka at LinkedIn
● Common data transport
● Available and supported by dedicated team
o 875 Billion messages per day
o 200 TB/day In
o 700 TB/day Out
o Peak Load
 10.5 Million messages/s
 18.5 Gigabits/s Inbound
 70.5 Gigabits/s Outbound
Logging using Kafka at LinkedIn
● Dedicated cluster for logs in each data center
● Individual topics per application
● Defaults to 4 days of transport level retention
● Not currently replicating between data centers
● Common logging transport for all services, languages
and frameworks
ELK Architectural Concerns
● Network Concerns
o Bandwidth
o Network partitioning
o Latency
● Security Concerns
o Firewalls and ACLs
o Encrypting data in transit
● Resource Concerns
o A misbehaving application can swamp production resources
Multi-colo ELK Architecture
ELK Dashboard
13
Services
ELK Search
Clusters
Log
Transport
Kafka
ELK Search
Clusters
LinkedIn
Services
DC1
Services
Kafka
ELK Search
Clusters
DC2
Services
Kafka
ELK Search
Clusters
DC3
Tribes
Corp Data Centers
ELK Search Architecture
Kibana
Elasticsearch
(tribe)
Kafka
Elasticsearch
(master)
Logstash
Elasticsearch
(data node)
Logstash
Elasticsearch
(data node)
Users
Operational Challenges
● Data, lots of it.
o Transporting, queueing, storing, securing,
reliability…
o Ingesting & Indexing fast enough
o Scaling infrastructure
o Which data? (right data needed?)
o Formats, mapping, transformation
 Data from many sources: Java, Scala, Python, Node.js, Go
Operational Challenges...
● Centralized vs Siloed Cluster Management
● Aggregated views of data across the entire
infrastructure
● Consistent view (trace up/down app stack)
● Scaling - horizontally or vertically?
● Monitoring, alerting, auto-remediating
The future of ELK at LinkedIn
● More ELK clusters being used by even more teams
● Clusters with 300+ billion docs (300+TB)
● Daily indices average 10+ billion docs, 10TB - move to
hourly indices
● ~5,000 shards per cluster
Extra slides
Next two slides contain example logstash
configs to show how we use input pipe plugin
with Kafka Console Consumer, and how to
monitor logstash using metrics filter.
KCC pipe input config
pipe {
type => "mobile"
command => "/opt/bin/kafka-console-consumer/kafka-console-consumer.sh 
--formatter com.linkedin.avro.KafkaMessageJsonWithHexFormatter 
--property schema.registry.url=http://schema-
server.example.com:12250/schemaRegistry/schemas 
--autocommit.interval.ms=60000 
--zookeeper zk.example.com:12913/kafka-metrics 
--topic log_stash_event 
--group logstash1"
codec => “json”
}
Monitoring Logstash metrics
filter {
metrics {
meter => "events"
add_tag => "metric"
}
}
output {
if “metric” in [tags] [
stdout {
codec => line {
format => “Rate: %{events.rate_1m}”
}
}
}

Weitere ähnliche Inhalte

Was ist angesagt?

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019confluent
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistentconfluent
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDatabricks
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackMichel Tricot
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginnersNeil Baker
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Software architecture for high traffic website
Software architecture for high traffic websiteSoftware architecture for high traffic website
Software architecture for high traffic websiteTung Nguyen Thanh
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergWalaa Eldin Moustafa
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 

Was ist angesagt? (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Software architecture for high traffic website
Software architecture for high traffic websiteSoftware architecture for high traffic website
Software architecture for high traffic website
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 

Ähnlich wie ELK at LinkedIn - Kafka, scaling, lessons learned

Centralization of all log (application, docker, security, ...)
Centralization of all log (application, docker, security, ...)Centralization of all log (application, docker, security, ...)
Centralization of all log (application, docker, security, ...)Thierry Gayet
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudRick Bilodeau
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudStreamsets Inc.
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack IntroductionVikram Shinde
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction abenyeung1
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackRohit Sharma
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibDavid Nzoputa Ofili
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overviewABC Talks
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h basehdhappy001
 
Elasticsearch features and ecosystem
Elasticsearch features and ecosystemElasticsearch features and ecosystem
Elasticsearch features and ecosystemPavel Alexeev
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problemsAbhishek Gupta
 
OSOM Operations in the Cloud
OSOM Operations in the CloudOSOM Operations in the Cloud
OSOM Operations in the Cloudmstuparu
 

Ähnlich wie ELK at LinkedIn - Kafka, scaling, lessons learned (20)

Centralization of all log (application, docker, security, ...)
Centralization of all log (application, docker, security, ...)Centralization of all log (application, docker, security, ...)
Centralization of all log (application, docker, security, ...)
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
 
Prashant_Agrawal_CV
Prashant_Agrawal_CVPrashant_Agrawal_CV
Prashant_Agrawal_CV
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK Stack
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLib
 
DevOps, Yet Another IT Revolution
DevOps, Yet Another IT RevolutionDevOps, Yet Another IT Revolution
DevOps, Yet Another IT Revolution
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
 
Elasticsearch features and ecosystem
Elasticsearch features and ecosystemElasticsearch features and ecosystem
Elasticsearch features and ecosystem
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 
OSOM Operations in the Cloud
OSOM Operations in the CloudOSOM Operations in the Cloud
OSOM Operations in the Cloud
 

Kürzlich hochgeladen

Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoilmeghakumariji156
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsMonica Sydney
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查ydyuyu
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsMonica Sydney
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasDigicorns Technologies
 
一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理F
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查ydyuyu
 
Call girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsCall girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsMonica Sydney
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样ayvbos
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Roommeghakumariji156
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.krishnachandrapal52
 
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiAbu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiMonica Sydney
 

Kürzlich hochgeladen (20)

Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls in Anand Vihar (delhi) call me [🔝9953056974🔝] escort service 24X7
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Call girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsCall girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girls
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiAbu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
 

ELK at LinkedIn - Kafka, scaling, lessons learned

  • 1. ELK @ LinkedIn Scaling ELK with Kafka
  • 2. Introduction Tin Le (tinle@linkedin.com) Senior Site Reliability Engineer Formerly part of Mobile SRE team, responsible for servers handling mobile apps (IOS, Android, Windows, RIM, etc.) traffic. Now responsible for guiding ELK @ LinkedIn as a whole
  • 3. Problems ● Multiple data centers, ten of thousands of servers, hundreds of billions of log records ● Logging, indexing, searching, storing, visualizing and analysing all of those logs all day every day ● Security (access control, storage, transport) ● Scaling to more DCs, more servers, and even more logs… ● ARRGGGGHHH!!!!!
  • 4. Solutions ● Commercial o Splunk, Sumo Logic, HP ArcSight Logger, Tibco, XpoLog, Loggly, etc. ● Open Source o Syslog + Grep o Graylog o Elasticsearch o etc.
  • 5. Criterias ● Scalable - horizontally, by adding more nodes ● Fast - as close to real time as possible ● Inexpensive ● Flexible ● Large user community (support) ● Open source
  • 7. ELK at LinkedIn ● 100+ ELK clusters across 20+ teams and 6 data centers ● Some of our larger clusters have: o Greater than 32+ billion docs (30+TB) o Daily indices average 3.0 billion docs (~3TB)
  • 8. ELK + Kafka Summary: ELK is a popular open sourced application stack for visualizing and analyzing logs. ELK is currently being used across many teams within LinkedIn. The architecture we use is made up of four components: Elasticsearch, Logstash, Kibana and Kafka. ● Elasticsearch: Distributed real-time search and analytics engine ● Logstash: Collect and parse all data sources into an easy-to-read JSON format ● Kibana: Elasticsearch data visualization engine ● Kafka: Data transport, queue, buffer and short term storage
  • 9. What is Kafka? ● Apache Kafka is a high-throughput distributed messaging system o Invented at LinkedIn and Open Sourced in 2011 o Fast, Scalable, Durable, and Distributed by Design o Links for more:  http://kafka.apache.org  http://data.linkedin.com/opensource/kafka
  • 10. Kafka at LinkedIn ● Common data transport ● Available and supported by dedicated team o 875 Billion messages per day o 200 TB/day In o 700 TB/day Out o Peak Load  10.5 Million messages/s  18.5 Gigabits/s Inbound  70.5 Gigabits/s Outbound
  • 11. Logging using Kafka at LinkedIn ● Dedicated cluster for logs in each data center ● Individual topics per application ● Defaults to 4 days of transport level retention ● Not currently replicating between data centers ● Common logging transport for all services, languages and frameworks
  • 12. ELK Architectural Concerns ● Network Concerns o Bandwidth o Network partitioning o Latency ● Security Concerns o Firewalls and ACLs o Encrypting data in transit ● Resource Concerns o A misbehaving application can swamp production resources
  • 13. Multi-colo ELK Architecture ELK Dashboard 13 Services ELK Search Clusters Log Transport Kafka ELK Search Clusters LinkedIn Services DC1 Services Kafka ELK Search Clusters DC2 Services Kafka ELK Search Clusters DC3 Tribes Corp Data Centers
  • 15. Operational Challenges ● Data, lots of it. o Transporting, queueing, storing, securing, reliability… o Ingesting & Indexing fast enough o Scaling infrastructure o Which data? (right data needed?) o Formats, mapping, transformation  Data from many sources: Java, Scala, Python, Node.js, Go
  • 16. Operational Challenges... ● Centralized vs Siloed Cluster Management ● Aggregated views of data across the entire infrastructure ● Consistent view (trace up/down app stack) ● Scaling - horizontally or vertically? ● Monitoring, alerting, auto-remediating
  • 17. The future of ELK at LinkedIn ● More ELK clusters being used by even more teams ● Clusters with 300+ billion docs (300+TB) ● Daily indices average 10+ billion docs, 10TB - move to hourly indices ● ~5,000 shards per cluster
  • 18. Extra slides Next two slides contain example logstash configs to show how we use input pipe plugin with Kafka Console Consumer, and how to monitor logstash using metrics filter.
  • 19. KCC pipe input config pipe { type => "mobile" command => "/opt/bin/kafka-console-consumer/kafka-console-consumer.sh --formatter com.linkedin.avro.KafkaMessageJsonWithHexFormatter --property schema.registry.url=http://schema- server.example.com:12250/schemaRegistry/schemas --autocommit.interval.ms=60000 --zookeeper zk.example.com:12913/kafka-metrics --topic log_stash_event --group logstash1" codec => “json” }
  • 20. Monitoring Logstash metrics filter { metrics { meter => "events" add_tag => "metric" } } output { if “metric” in [tags] [ stdout { codec => line { format => “Rate: %{events.rate_1m}” } } }

Hinweis der Redaktion

  1. 50+ % of site traffic come in via mobile
  2. Many applications. Mobile frontend logs: average 2.4TB size (3+ billion docs).
  3. Evaluated, used by some teams. Some Acquisitions use commercial solutions. Commercial solutions cost prohibitive.
  4. Expect storage size to increase as we migrate to using doc_values
  5. Leveraging open source stack. Large community. Leveraging common data transport. Rock solid, proven, dedicated support team.
  6. Fast : a single Kafka broker can handle hundreds of megabytes of reads and writes per sec from thousands of clients. Scaleable: Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers. Durable: Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages w/o performance impact. Distributed by Design: Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  7. These numbers are from LinkedIn Kafka presentation at Apache Con 2015. Over 1100 brokers over 50+ clusters Over 32000 topics Over 350 thousand partitions, not including replication.
  8. Log retention long enough for 3 days weekend. So we can re-index from Kafka if encounter issues. Dedicated log clusters to isolate traffic from other clusters. Two years ago, used our own kafka input plugin. Then switched to using KCC via pipe input for performance reason. Monitoring: . It’s important to monitor logstash nodes. LS has a bug where an error in any of its input, filter or output will stop the entire LS process. See metrics filter config file at end of these slides.
  9. We’ve chosen to keep all of our clients local to the clusters and use a tiered architecture due to several major concerns. The primary concern is around the networking itself. Kafka enables multiple consumers to read the same topic, which means if we are reading remotely, we are copying messages over expensive inter-datacenter connections multiple times. We also have to handle problems like network partitioning in every client. Granted, you can have a partition even within a single datacenter, but it happens much more frequently when you are dealing with large distances. There’s also the concern of latency in connections – distance increases latency. Latency can cause interesting problems in client applications, and I like life to be boring. There are also security concerns around talking across datacenters. If we keep all of our clients local, we do not have to worry about ACL problems between the clients and the brokers (and Zookeeper as well). We can also deal with the problem of encrypting data in transit much more easily. This is one problem we have not worried about as much, but it is becoming a big concern now. The last concern is over resource usage. Everything at LinkedIn talks to Kafka, and a problem that takes out a production cluster is a major event. It could mean we have to shift traffic out of the datacenter until we resolve it, or it could result in inconsistent behavior in applications. Any application could overwhelm a cluster, but there are some, such as applications that run in Hadoop, that are more prone to this. By keeping those clients talking to a cluster that is separate from the front end, we mitigate resource contention.
  10. For security reasons, data/logs generated in each DC stays there. Indexed by local ELK cluster. Aggregated views via Tribe nodes. All logstash use common filters to catch most common data leakage. How services log to Kafka - imposed common logging library. All services use common library, which automatically log to Kafka, WARN or above.
  11. General architecture for each ELK cluster. Dedicated masters. Tribe client node (HTTP services).
  12. Data - reliable transport, storing, queueing, consuming, indexing. Some data (java service logs for example) not in right format. Solutions to Data Kafka as transport, storage queue, backbone. More logstash instances, more Kafka partitions. Using KCC we can consume faster than ES can index. To increase indexing speed More ES nodes (horizontal). More shards (distribute work) Customized templates
  13. Using local Kafka log clusters instead of aggregated metrics. Tribe to aggregate clusters. Use internal tool call Nurse to monitor and auto-remediate (restarts) hung/dead instances of LS and ES.
  14. These numbers are estimate based on growth rate and plans. Beside logs, we have other application use cases internally.
  15. This is how we use Logstash pipe input plugin to call out to Kafka Console Consumer. This currently give us the highest ingestion throughput.
  16. It’s important to monitor logstash nodes. LS has a bug where an error in any of its input, filter or output will stop the entire LS process. You can use Logstash metrics filter to make sure that LS is still processing data. Sometime LS runs but no data goes through. This will let you know when that happens.