2021/11/14
Hojin Shim / Site Reliability Engineer
ELK Stack - Improving Log Processing Speed
With requests averaging about 1 million per minute, logs started to fall behind.
Various Logging Pipeline


Architecture Patterns
Logging Patterns
Well-known patterns
• Remote logging 

• File Logging & Cron backup

• Logging pipeline without stream

• Logging pipeline with stream
Logging Patterns
Remote Logging
[Diagram: App -> logging over network (e.g. Logback / Log4j in Java) -> somewhere remote (DB, storage, etc.)]
• Low risk of losing records
• High risk of lag / throughput problems
Logging Patterns
File Logging & Cron Backup
[Diagram: App -> disk volume -> cron -> PutObject -> S3]
• High risk of losing records
• Depends on the deployment pattern
• Difficult to analyse
• Simple
Logging Patterns
Logging Pipeline Patterns (w/o stream)
[Diagram: App -> disk volume -> forwarder (pre-processor) -> forwarder (post-processor) -> search engine]
• Risk of high throughput
• Risk of losing records
Logging Patterns
Logging Pipeline Patterns (w/ stream)
[Diagram: App -> disk volume -> forwarder (pre-processor) -> stream -> forwarder (post-processor) -> search engine]
• Low risk of high throughput
• Low risk of losing records
• High cost
Logging Patterns
ELK Stack (Elastic Stack)
[Diagram: App -> disk volume -> Filebeat -> MSK (Kafka) -> Logstash -> Elasticsearch & Kibana, with "$$$" cost markers]
• Low risk of high throughput & losing records
• High cost
• Requires deep & wide technical knowledge
Logging Lag
Increase logging
[Diagram: multiple App / disk volume / Filebeat nodes -> MSK (Kafka) -> Logstash -> Elasticsearch]
[Chart: requests over time, with the indexed logs lagging behind "now"]
What is the problem?
So many things could be a reason
• Filebeat I/O problem

• Kafka performance problem

• Logstash slow ingestion / processing problem

• Elasticsearch performance problem

• etc
Measurement
Measurement
What to measure?
• Filebeat
  • Basic system metrics
  • Etc.
• MSK (Kafka)
  • Basic system metrics
  • Burst balance
  • Bandwidth throttling
  • Lag per topic
  • Etc.
• Logstash
  • Basic system metrics
  • Number of events processed
  • Etc.
• Elasticsearch
  • Basic system metrics
  • Indexing rate / latency
  • Etc.
Measurement
How to measure? (Based on my experience)
• Filebeat
  • Telegraf
  • InfluxDB
  • Grafana
• MSK (Kafka)
  • CloudWatch
  • Burrow / Prometheus
  • Elasticsearch
  • Grafana
• Logstash
  • Telegraf
  • Elasticsearch
  • Grafana
• Elasticsearch
  • CloudWatch
  • Grafana
Measurement
How to measure? (Based on my experience)
Highlighted in this talk: consumer lag monitoring (MSK / Kafka) and Logstash processing rate monitoring.
Measurement
Consumer Lag
Measurement
Consumer-lag
https://www.lightbend.com/blog/monitor-kafka-consumer-group-latency-with-kafka-lag-exporter
Measurement
Consumer-lag measurement
• Kubernetes-friendly way
  • Open Monitoring with Prometheus
• Always-available way (demo in this session)
  • Burrow / Telegraf
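Whichever tool collects it, consumer lag comes down to one subtraction per partition: the latest offset written to the partition minus the last offset the consumer group committed. A minimal Python sketch of that arithmetic follows; the function name and the sample offsets are hypothetical and do not come from Burrow or from the talk.

```python
from typing import Dict


def partition_lag(log_end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return lag per partition: latest produced offset minus committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition with no commit yet has consumed nothing.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag[partition] = max(end_offset - committed, 0)
    return lag


if __name__ == "__main__":
    end = {0: 1500, 1: 900, 2: 1200}      # hypothetical log-end offsets
    committed = {0: 1400, 1: 900}         # partition 2 has no commit yet
    lag = partition_lag(end, committed)
    print(lag)
    print("total lag:", sum(lag.values()))
```

Summing the per-partition values gives the lag for a topic or for the whole consumer group.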
Measurement
Burrow / Telegraf
• Burrow
  • Open source, developed by LinkedIn
  • Apache Kafka monitoring tool
  • HTTP endpoint for information
• Telegraf
  • Open source, developed by InfluxData
  • All-purpose metrics gathering
  • Plugin system
Measurement
Consumer-lag measurement with Burrow
[Diagram: MSK (Kafka) -> Burrow -> Telegraf -> Elasticsearch -> Grafana]
Measurement
Burrow config code snippet
Burrow configuration - /etc/burrow/burrow.toml

    ...

    [zookeeper]
    servers=[ "z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181", "z-2.elk.kafka.ap-northeast-2.amazonaws.com:2181", "z-1.product-elk-msk-abc.kafka.ap-northeast-2.amazonaws.com:2181" ]
    timeout=6
    root-path="/burrow"

    [consumer.product-elk]
    class-name="kafka"
    cluster="product-elk"
    servers=[ "b-2.elk.kafka.ap-northeast-2.amazonaws.com:9094", "b-1.elk.kafka.ap-northeast-2.amazonaws.com:9094" ]
    client-profile="your_profile"
    group-denylist="^(some-group-|python-kafka-consumer-|quick-).*$"
    group-allowlist=""

    [cluster.product-elk]
    class-name="kafka"
    servers=[ "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094", "b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094" ]
    client-profile="test"
    topic-refresh=60
    offset-refresh=30

    [tls.msk-mTLS]
    cafile="/etc/burrow/truststore.pem"
    noverify=true

    ...

Notes on the snippet:
• [zookeeper] servers: your ZooKeeper endpoint
• [consumer.*] / [cluster.*] servers: your bootstrap server endpoint
• client-profile and [tls.*]: needed if you use client / broker encryption
Measurement
Telegraf config code snippet
Telegraf configuration - /etc/telegraf/telegraf.d/burrow.conf

    [[inputs.burrow]]
      servers = ["https://your.burrow-endpoint.com"]
      topics_exclude = [ "__consumer_offsets" ]
      groups_exclude = ["console-*"]
      [inputs.burrow.tags]
        burrow = "burrow"

    [[outputs.elasticsearch]]
      urls = [ "http://your-elasticsearch-endpoint:9200" ]
      timeout = "5s"
      enable_sniffer = false
      health_check_interval = "10s"
      index_name = "burrow-%Y.%m.%d"
      manage_template = true
      template_name = "telegraf-burrow"
      [outputs.elasticsearch.tagpass]
        burrow = ["burrow"]

Notes on the snippet:
• [inputs.burrow.tags]: use a tag if you collect other metrics as well
• [outputs.elasticsearch.tagpass]: filter metrics by tag
Measurement
Data from burrow index
[Screenshot: documents in the burrow index, showing topic name, partition, and lag information]
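Each document in the index carries one partition's lag, so a per-topic view needs an aggregation step. The sketch below shows what that roll-up could look like in Python; the field names `topic`, `partition`, and `lag` are assumptions chosen for illustration and may not match the fields Telegraf actually writes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def lag_by_topic(docs: Iterable[Mapping]) -> Dict[str, Dict[str, int]]:
    """Roll per-partition lag documents up into per-topic totals."""
    totals = defaultdict(lambda: {"total_lag": 0, "max_lag": 0, "partitions": 0})
    for doc in docs:
        entry = totals[doc["topic"]]
        entry["total_lag"] += doc["lag"]
        entry["max_lag"] = max(entry["max_lag"], doc["lag"])
        entry["partitions"] += 1
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical documents, one per topic partition.
    sample = [
        {"topic": "order", "partition": 0, "lag": 1200},
        {"topic": "order", "partition": 1, "lag": 300},
        {"topic": "auth", "partition": 0, "lag": 5},
    ]
    for topic, stats in lag_by_topic(sample).items():
        print(topic, stats)
```

A large gap between `max_lag` and the average per partition hints that one partition is hot rather than the whole topic being slow.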
Measurement
Visualization with Grafana
[Screenshot: Grafana panels showing lag per topic]
Measurement
Logstash Processing Rate
Measurement
Visualization with Timelion
Add logstash metric
Logstash pipeline configuration - ./logstash/pipeline/logstash.conf

    input {
      kafka {
        bootstrap_servers => "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094,b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094"
        topics_pattern => "*"
        consumer_threads => 1
        codec => "json"
        decorate_events => true
        group_id => "logstash"
        security_protocol => "SSL"
        ssl_truststore_location => "/logstash/kafka.client.truststore.jks"
        enable_auto_commit => "true"
      }
    }
    ...
    filter {
      ...
      metrics {
        meter => "events"
        add_tag => "metric"
        add_field => {
          "lsname" => "some-logstash"
        }
      }
    }
    ...
    output {
      ...
      else if "metric" in [tags] {
        elasticsearch {
          hosts => ["eskibana.prd.in.musinsa.com:9200"]
          index => "logstash-metric-%{+yyyy.MM.dd}"
        }
      }
      ...
    }
Measurement
Data from burrow index
[Screenshot: metric documents showing the Logstash name and the 1-minute event processing rate]
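Read together, processing rate and consumer lag give a rough estimate of when a backlog will clear. The Python sketch below illustrates that reasoning. The formula is a simplification that assumes steady rates, the function name is hypothetical, and apart from the roughly 1 million events per minute quoted on the title slide the numbers are made up.

```python
import math


def minutes_to_drain(lag_events: float,
                     consume_per_min: float,
                     produce_per_min: float) -> float:
    """Estimate minutes until lag reaches zero, assuming steady rates."""
    headroom = consume_per_min - produce_per_min
    if headroom <= 0:
        # Consumers are not faster than producers: lag never shrinks.
        return math.inf
    return lag_events / headroom


if __name__ == "__main__":
    # Hypothetical backlog and processing rate; incoming rate from the title slide.
    print(minutes_to_drain(3_000_000, 1_200_000, 1_000_000))
    # Processing slower than incoming traffic: the backlog only grows.
    print(minutes_to_drain(3_000_000, 900_000, 1_000_000))
```

If the result is infinite or keeps growing between measurements, the pipeline needs more processing capacity rather than more waiting.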
Measurement
Visualization with Timelion
Problems & Solutions
Logstash grok performance
Logstash filter performance
grok grok grok!
• Some log messages can cause parsing problems
  • Some special characters
  • Long log messages
  • Etc.
Example:
http://some-domain/app/product/goodsview_stats/1474978/0?utm_source=naver_jisicshopping&utm_medium=sh&source=NVSH&NaPm=ct%3Dkvyxfobc%7Cci%3Dd4151183d55ce2828c56f84eb392eab7338b2026%7Ctr%3Dslct%7Csn%3D204973%7Chkab6de6182e50b01b182e15ae740bcb84ce&menu=view&3Dcee524ab6de6182e50b01b182e15ae740bcb84ce&q=b3Dcee524ab6de6182e50b01b182e15ae740bcb84ce.....................
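One cheap defence against such lines is to refuse to run an expensive pattern on input that is suspiciously long. The Python sketch below illustrates that guard. It is a different technique from the grok timeout used later in this talk; the length limit and the simplified access-log pattern are assumptions for illustration, not the author's grok expression.

```python
import re
from typing import Dict, Optional

MAX_LINE_LENGTH = 2048  # hypothetical limit; tune to your own log format

# Simplified, anchored access-log pattern for illustration only.
ACCESS_RE = re.compile(
    r'^(?P<remote_ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one access-log line, skipping lines that are too long to be worth it."""
    if len(line) > MAX_LINE_LENGTH:
        return None  # caller can tag the event instead of blocking the pipeline
    match = ACCESS_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    ok = '203.0.113.7 - - [03/Sep/2021:17:26:25 +0900] "GET /app HTTP/1.1" 200 512'
    print(parse_access_line(ok))
    print(parse_access_line("x" * 5000))
```

Anchoring the pattern with `^` and preferring negated character classes such as `[^"]*` over `.*` also limits backtracking when a line does not match.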
Logstash filter performance
grok grok grok!
[2021-09-03T17:26:25,923][WARN ][logstash.filters.grok ][main][8c1ed634e6ffe7026b0a684399b6a4893634d376554d997095836bd11d71a1c7] Timeout executing grok '%{IPORHOST:[nginx][access][remote_ip]} ......................'
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-timeout_millis
Logstash filter performance
grok grok grok!
Add short grok parsing timeout
Logstash pipeline configuration - ./logstash/pipeline/logstash.conf

    ...
    filter {
      if [event][dataset] == "nginx.access" {
        grok {
          match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - ................"] }
          remove_field => "message"
          timeout_millis => 300
        }
        ...
Problems & Solutions
Logstash pipeline & batch
Logstash pipeline & batch
Too many topics to ingest
• The number of workers and CPU cores
• How many messages to fetch each time
• How long to wait for an undersized batch
https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file
Logstash pipeline & batch
Too many topics to ingest
• The number of workers and CPU cores (pipeline.workers)
  • Same as the number of CPU cores, or a little more
• How many messages to fetch each time (pipeline.batch.size)
  • Default value is 125, new value is 1000
• How long to wait for an undersized batch (pipeline.batch.delay, in milliseconds)
https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file
Logstash configuration - logstash.yaml

    - pipeline.id: main
      path.config: "/usr/share/logstash/pipeline"
      pipeline.workers: 4
      pipeline.batch.size: 1000
      pipeline.batch.delay: 50
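Raising the batch size has a memory cost, because the number of events a pipeline holds in flight is the product of its worker count and batch size. The short Python sketch below works through that arithmetic with the values from this slide; the function name is hypothetical.

```python
def inflight_events(workers: int, batch_size: int) -> int:
    """Upper bound on events held in memory by one pipeline at a time."""
    return workers * batch_size


if __name__ == "__main__":
    workers = 4  # pipeline.workers from the slide
    print("default batch size:", inflight_events(workers, 125))
    print("tuned batch size:  ", inflight_events(workers, 1000))
```

With 4 workers this moves from 500 to 4000 in-flight events, so it is worth watching JVM heap usage after the change.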
Problems & Solutions
Kafka Partitions
Kafka Partitions
Unbalanced input messages. It's natural.
[Diagram: Order Service -> Order Topic, Auth Service -> Auth Topic, Inventory Service -> Inventory Topic; some topics receive few log messages, others receive heavy log traffic]
• Same amount of log ingestion for each topic
• High possibility of consumer lag
• Increase the number of partitions
Kafka Partitions
Wait. What are partitions?
https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
[Diagram: topic with one partition: writes -> partition 0 -> ingest]
Kafka Partitions
Wait. What are partitions?
https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
[Diagram: topic with multiple partitions: writes -> partitions 0, 1, 2 -> ingest]
Kafka Partitions
Wait. What are partitions?
https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
• Increase partitions of all existing topics

    #!/bin/bash
    ## get topics
    ZOOKEEPER=z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181
    bin/kafka-topics.sh --list --zookeeper "$ZOOKEEPER" > topiclist.txt

    ## increase partitions
    while read -r line; do
      echo "$line"
      bin/kafka-topics.sh --zookeeper "$ZOOKEEPER" --alter --topic "$line" --partitions 3
      sleep 1
    done < topiclist.txt

• Increase partitions in the Kafka default settings (this has no effect on existing topics)

    ...
    default.replication.factor=2
    num.partitions=3
    log.retention.hours=48
    delete.topic.enable=true
    ...
Kafka Partitions
Partitions / Consumers
[Diagram: topic with multiple partitions: writes -> partitions 0, 1, 2 -> sequential ingest by one consumer thread]

    input {
      kafka {
        ...
        bootstrap_servers => "..."
        topics_pattern => "*"
        consumer_threads => 1
        ...
      }
    }

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
Kafka Partitions
Partitions / Consumers
[Diagram: topic with multiple partitions: writes -> partitions 0, 1, 2 -> parallel ingest by three consumer threads]

    input {
      kafka {
        ...
        bootstrap_servers => "..."
        topics_pattern => "*"
        consumer_threads => 3
        ...
      }
    }

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
Kafka Partitions
Partitions / Consumers
[Diagram: topic with multiple partitions: writes -> partitions 0, 1, 2 -> ingest]

    input {
      kafka {
        ...
        bootstrap_servers => "..."
        topics_pattern => "*"
        consumer_threads => 1
        ...
      }
    }

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
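The relationship between partitions and consumer threads can be illustrated with a toy assignment in Python. This sketch uses a plain round-robin rule rather than Kafka's real assignors, and the function name is hypothetical; it only shows why one thread reads every partition in turn while threads beyond the partition count sit idle.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_consumers: int) -> Dict[int, List[int]]:
    """Spread partitions over consumers round-robin (simplified, not Kafka's assignor)."""
    if num_consumers < 1:
        raise ValueError("need at least one consumer")
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(3, 1))  # one thread reads all three partitions
    print(assign_partitions(3, 3))  # one partition per thread
    print(assign_partitions(3, 4))  # the fourth thread has nothing to read
```

In practice this means the total number of consumer threads across all Logstash instances in a group is only useful up to the number of partitions.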
Live demo
My architecture
ELK Stack (Elastic Stack)
[Diagram: multiple App / disk volume / Filebeat nodes -> MSK (Kafka) -> Logstash -> Elasticsearch, plus S3]
• Improve partition settings
• Improve grok parser
• Increase consumers
Wrap-up
Wrap-up
• First of all, measure it!
• Log forwarder (in my case Logstash)
  • Improve parsing performance (grok)
  • Increase the number of forwarders
• Message stream (in my case Kafka)
  • Partitioning
How to improve ELK log pipeline performance

  • 1. 2021/11/14 Hojin Shim / Site Reliabilty Engineer ELK Stack - Log 처리 속도 개선 요청량 평균 약 100만건/분, Log 가 밀리기 시작했다.
  • 3. Logging Patterns Well-known patterns • Remote logging • File Logging & Cron backup • Logging pipeline without stream • Logging pipeline with stream
  • 4. Logging Patterns Remote Logging App Somewhere Logging over network Ex)
 Logback / log4j of java DB, Storage, etc. • Low risk of losing records • High risk of lag / throughput
  • 5. Logging Patterns File Logging & Cron Backup App PutObject S3 • High risk of losing records • It’s depends on deployment patterns • Di ffi cult to analyse • It’s simple Cron Disk volume
  • 6. Logging Patterns Logging Pipeline Patterns (w/o stream) App • Risk of high throughput • Risk of losing records Forwarder 
 (pre- processor) Disk volume Forwarder 
 (Post- processor) Search Engine
  • 7. Logging Patterns Logging Pipeline Patterns (w/ stream) App • Low risk of high throughput • Low risk of losing records • High cost Forwarder 
 (pre- processor) Disk volume Forwarder 
 (post- processor) Search Engine Stream
  • 8. Logging Patterns ELK Stack (Elastic Stack) App • Low risk of high throughput & losing records • High cost • Requires deep & wide technical knowledge Disk volume Elasticsearch MSK (Kafka) Filebeat Logstash Kibana & $$$ $$$
  • 12. What is the problem? So many things could be a reason • Filebeat I/O problem • Kafka performance problem • Logstash slow ingestion / processing problem • Elasticsearch performance problem • etc
  • 14. Measurement What to measure? • Basic system metrics • Etc • Basic system metrics • Burst balance • Bandwidth throttling • Lag per topics • Etc • Basic system metrics • Num of events processed • Etc • Basic system metrics • Indexing rate / latency • Etc Filebeat MSK (Kafka) Logstash Elasticsearch
  • 15. Measurement How to measure? (Based on my experience) • Telegraf • In fl uxDB • Grafana • Cloudwatch • Burrow / Prometheus • Elasticsearch • Grafana • Telegraf • Elasticsearch • Grafana • Cloudwatch • Grafana Filebeat MSK (Kafka) Logstash Elasticsearch
  • 16. Measurement How to measure? (Based on my experience) • Telegraf • In fl uxDB • Grafana • Cloudwatch • Burrow / Prometheus • Elasticsearch • Grafana • Telegraf • Elasticsearch • Grafana • Cloudwatch • Grafana Filebeat MSK (Kafka) Logstash Elasticsearch Consumer Lag monitoring Logstash processing rate monitoring
  • 19. Measurement Consumer-lag measurement • Kubernetes friendly way • Open Monitoring with Prometheus 
 
 • All the time available way (demo in this session) • Burrow / Telegraf
  • 20. Measurement Burrow / Telegraf • Burrow • Open source developed by Linkedin • Apache Kafka monitoring tool • HTTP endpoint for information
 • Telegraf • Open source developed by In fl uxdata • All purpose gathering metrics • Plugin systems
  • 21. Measurement Consumer-lag measurement with Burrow MSK (Kafka) Burrow / Telegraf Elasticsearch Grafana Burrow Telegraf
  • 22. Measurement Burrow con fi g code snippet .. . .. . .. . [zookeeper ] servers=[ "z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181","z-2.elk.kafka.ap-northeast-2.amazonaws.com:2181",
 "z-1.product-elk-msk-abc.kafka.ap-northeast-2.amazonaws.com:2181" ] timeout= 6 root-path="/burrow " [consumer.product-elk ] class-name="kafka " cluster="product-elk " servers=[ "b-2.elk.kafka.ap-northeast-2.amazonaws.com:9094","b-1.elk.kafka.ap-northeast-2.amazonaws.com:9094" ] client-profile=“your_prpfile ” group-denylist=“^(some-group-|python-kafka-consumer-|quick-).*$ " group-allowlist=" " [cluster.product-elk ] class-name="kafka " servers=[ “b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094”,"b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094" ] client-profile="test " topic-refresh=6 0 offset-refresh=3 0 [tls.msk-mTLS ] cafile="/etc/burrow/truststore.pem " noverify=tru e .. . .. . .. . If you use clients / brokers encryption Your zookeeper endpoint Your bootstrap server endpoint Burrow con fi guration - /etc/burrow/burrow.toml
  • 23. Measurement Telegraf con fi g code snippet [[inputs.burrow] ] servers = [“https://your.burrow-endpoint.com” ] topics_exclude = [ "__consumer_offsets" ] groups_exclude = ["console-*" ] [inputs.burrow.tags ] burrow = "burrow " [[outputs.elasticsearch] ] urls = [ “http://your-elasticsearch-endpoint:9200” ] timeout = "5s " enable_sniffer = fals e health_check_interval = "10s " index_name = "burrow-%Y.%m.%d " manage_template = tru e template_name = "telegraf-burrow " [outputs.elasticsearch.tagpass ] burrow = ["burrow"] Use tag if you have another metrics Filter metric by tags telegraf con fi guration - /etc/telegraf/telegraf.d/burrow.conf
  • 24. Measurement Data from burrow index Some Topic Name Lag Information Partition
  • 25. Measurement Visualization with Grafana Some Topic Lag Some topic Some topic
  • 27. Measurement Visualizatoin with Timelion input { kafka { bootstrap_servers => "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094,b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094 " topics_pattern => "* " consumer_threads => 1 codec => "json " decorate_events => tru e group_id => "logstash " security_protocol => "SSL " ssl_truststore_location => "/logstash/kafka.client.truststore.jks " enable_auto_commit => "true " } } .. . filter { .. . metrics { meter => "events " add_tag => "metric " add_field => { "lsname" => “some-logstash ” } } } ...
 output { else if "metric" in [tags] { elasticsearch { hosts => ["eskibana.prd.in.musinsa.com:9200" ] index => "logstash-metric-%{+yyyy.MM.dd} " } .. . } Add logstash metric logstash pipeline con fi guration - ./logstash/pipeline/logstash.conf
  • 28. Measurement Data from burrow index Some Logstash Name Event processing rate 1m
  • 30. Problems & Solves Logstash grok performance
  • 31. Logstash filter performance grok grok grok! • Some log message might cause parsing problem • Some special characters • Long log messages • Etc http://some-domain/app/product/goodsview_stats/1474978/0? utm_source=naver_jisicshopping&utm_medium=sh&source=NVSH&NaPm=ct%3Dkvyxfobc%7Cci%3Dd4151183d55ce2828c56f84eb392eab7338b2026%7Ctr%3Dslct%7Csn%3D204973%7Chk ab6de6182e50b01b182e15ae740bcb84ce&menu=view&3Dcee524ab6de6182e50b01b182e15ae740bcb84ce&q=b3Dcee524ab6de6182e50b01b182e15ae740bcb84ce.....................
  • 32. Logstash filter performance: grok grok grok!
        [2021-09-03T17:26:25,923][WARN ][logstash.filters.grok ][main] [8c1ed634e6ffe7026b0a684399b6a4893634d376554d997095836bd11d71a1c7]
        Timeout executing grok
        '%{IPORHOST:[nginx][access][remote_ip]} ......................'
    https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-timeout_millis
  • 33. Logstash filter performance: grok grok grok!
    Add a short grok parsing timeout (300 ms here; the plugin default is 30000 ms)
    Logstash pipeline configuration - ./logstash/pipeline/logstash.conf

        ...
        filter {
          if [event][dataset] == "nginx.access" {
            grok {
              match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - ................"] }
              remove_field => "message"
              # Give up on a single message after 300 ms instead of blocking the worker
              timeout_millis => 300
            }
            ...
  • 34. Problems & Solutions: Logstash pipeline & batch
  • 35. Logstash pipeline & batch: Too many topics to ingest
    • The number of workers and CPU cores
    • How many messages to fetch each time
    • How long to wait for an undersized batch
    https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file
  • 36. Logstash pipeline & batch: Too many topics to ingest
    • The number of workers and CPU cores
      • Same as the number of CPU cores, or a little more
    • How many messages to fetch each time
      • Default value is 125; new value is 1000
    • How long to wait for an undersized batch
    https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file
    Logstash configuration - logstash.yaml

        - pipeline.id: main
          path.config: "/usr/share/logstash/pipeline"
          pipeline.workers: 4
          pipeline.batch.size: 1000
          pipeline.batch.delay: 50
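    Each worker collects up to pipeline.batch.size events before running filters and outputs, so the upper bound on events held in memory at once is roughly workers × batch size. The Python sketch below only does that arithmetic for the default and the tuned values from the slide; the function name is hypothetical, and the result is an estimate, not a measurement.

```python
def inflight_events(workers: int, batch_size: int) -> int:
    """Upper bound on events held in memory by the pipeline workers at once."""
    return workers * batch_size


default_setting = inflight_events(workers=4, batch_size=125)
tuned_setting = inflight_events(workers=4, batch_size=1000)
print(default_setting, tuned_setting)
```

    A larger batch means fewer, bigger bulk requests but more heap per worker, so the JVM heap size is worth checking after raising the batch size.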
  • 38. Kafka Partitions: Unbalanced input messages. It's natural.
    [Diagram: Order Service, Auth Service, and Inventory Service write to the Order, Auth, and Inventory topics; some topics receive few log messages, others receive heavy log traffic]
    • Same amount of log ingestion per topic
    • High possibility of consumer lag
    • Increase the number of partitions
  • 39. Kafka Partitions: Wait. What are partitions?
    https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
    [Diagram: topic with one partition (Partition 0); writes and ingest]
  • 40. Kafka Partitions: Wait. What are partitions?
    https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
    [Diagram: topic with multiple partitions (Partition 0, Partition 1, Partition 2); writes and ingest]
  • 41. Kafka Partitions: Wait. What are partitions?
    https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
    • Increase partitions of all existing topics

        #!/bin/bash
        ## get topics
        ZOOKEEPER=z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181
        bin/kafka-topics.sh --list --zookeeper "$ZOOKEEPER" > topiclist.txt

        ## increase partitions, one topic per line of topiclist.txt
        while read -r line; do
          echo "$line"
          bin/kafka-topics.sh --zookeeper "$ZOOKEEPER" --alter --topic "$line" --partitions 3
          sleep 1
        done < topiclist.txt

    • Increase partitions in the Kafka default settings (this has no effect on existing topics)

        ...
        default.replication.factor=2
        num.partitions=3
        log.retention.hours=48
        delete.topic.enable=true
        ...
  • 42. Kafka Partitions: Partitions / Consumers
    [Diagram: topic with multiple partitions (Partition 0, Partition 1, Partition 2); writes; sequential ingest]

        input {
          kafka {
            ...
            bootstrap_servers => "..."
            # topics_pattern is a regular expression; ".*" subscribes to every topic
            topics_pattern => ".*"
            consumer_threads => 1
            ...
          }
        }

    https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
  • 43. Kafka Partitions: Partitions / Consumers
    [Diagram: topic with multiple partitions (Partition 0, Partition 1, Partition 2); writes; parallel ingest]

        input {
          kafka {
            ...
            bootstrap_servers => "..."
            # topics_pattern is a regular expression; ".*" subscribes to every topic
            topics_pattern => ".*"
            consumer_threads => 3
            ...
          }
        }

    https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
  • 44. Kafka Partitions: Partitions / Consumers
    [Diagram: topic with multiple partitions (Partition 0, Partition 1, Partition 2); writes; ingest]

        input {
          kafka {
            ...
            bootstrap_servers => "..."
            # topics_pattern is a regular expression; ".*" subscribes to every topic
            topics_pattern => ".*"
            consumer_threads => 1
            ...
          }
        }

    https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
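    Within one consumer group, a partition is read by at most one consumer at a time, so the partition count caps how many consumer threads can work in parallel. The Python sketch below illustrates that with a simple round-robin split. The function name is hypothetical, and the split is a simplification for illustration, not Kafka's actual assignment strategy.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]
one_thread = assign_partitions(partitions, ["t0"])
three_threads = assign_partitions(partitions, ["t0", "t1", "t2"])
four_threads = assign_partitions(partitions, ["t0", "t1", "t2", "t3"])
print(one_thread)
print(three_threads)
print(four_threads)
```

    With one thread, all three partitions are drained sequentially; with three, each thread owns one partition; a fourth thread would sit idle, which is why partitions and consumers are best increased together.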
  • 46. My architecture: ELK Stack (Elastic Stack)
    [Diagram: five App / Disk volume / Filebeat hosts, MSK (Kafka), Logstash, Elasticsearch, and S3]
    • Improve partition settings
    • Improve grok parser
    • Increase consumers
  • 48. Wrap-up
    • First of all, measure it!
    • Log forwarder (in my case, Logstash)
      • Improve parsing performance (grok)
      • Increase the number of forwarders
    • Message stream (in my case, Kafka)
      • Partitioning