SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Go runtime related problems in TiDB
in production
The Dark Side Of Go
Ed Huang, CTO, PingCAP
What's TiDB?
Open source, distributed SQL database
written in Go (and Rust).
pingcap/tidb
● Latency in scheduler
■ case study: latency jitter under high load due to fair
scheduling policy
● Memory control
■ case study: Abnormal memory usage due to
Transparent Huge Pages
● GC Related
■ case study: GC sweep caused latency jitter
■ case study: Lock and NUMA aware
Agenda
Part I - Latency in scheduler
● The client consists of a goroutine and a channel
○ The channel batch the request
○ A goroutine run read-send-recv loop
○ Typical MPSC
Background
Description
● When the machine load and CPU usage is high, but the network IO pressure is reasonable
● But the RPC metrics show a high latency
Network
Ready Goroutine
Wakeup
4.368ms
● Network IO is ready => goroutine wake up == 4.368ms
○ Sometime even 10ms+ latency here!
○ The time spend on runtime schedule is not negligible
● When CPU is overload, which goroutine should be given priority?
Analysis
● The goroutine is special, it block all the callers
● The scheduler treat them equally
Analysis
● Under heavy workload, goroutines get longer to be scheduled
● The runtime scheduling does not consider priority
● CPU dense workload could affect IO latency
Conclusion
Part II - Memory control
● Go Runtime
○ Allocated from OS (mmaped)
○ Managed Memory
■ Should the memory be returned to the OS?
○ Inuse Memory
■ How much memory is *really* used?
○ GC
■ After each GC, the unused objects are released
○ Idle Memory
■ the unused objects are returned to Managed Memory, not OS
Background: What are we talking about when we
talk about Memory
Background: What are we talking about when we
talk about Memory
● TLB (translation lookaside buffer)
○ A CPU cache for virtual memory to physical
addresses mapping
○ Size limited, when cache miss the memory
access become slower
● Default page size 4K
● THP, increase page size to 2M or more
○ Page size larger, so the page entries for a
process is less
○ Less page entries means better cache hit rate
○ Result in better memory access performance
Background: THP (transparent huge pages)
Case study: THP (transparent huge pages)
TiDB memory usage is abnormal, and heap memory usage is incorrect.
Description
● The TiDB server memory footprint is abnormal
● The memory available on this node is not too much
Description
● The Go Runtime thinks it does not use much memory
● The OS does not release the memory (RSS is high)
Investigate
Alloc from OS
Reserved by Go Runtime
Check /proc/{PID}/smaps to see the memory
mapping of the TiDB server processor
AnonHugePages 1980416 kB
The memory is occupied by THP
Investigate
Compare with other nodes in the cluster with THP disabled
Investigate
● So, the root cause must be related to THP (transparent huge pages)
● But … why?
Analysis
● Go Runtime manage memory at 8K size granularity
● Go Runtime give hint to OS about the use of the memory block
● With THP enabled, the memory block address is round up to huge
page boundary
● Fragment!!!
● Fragments make the memory difficult to be reclamed by the OS
Analysis
● And more confusing behavior by the OS: merging
pages into huge pages
○ The user program return 4K pages to OS
○ khugepaged turn a single 4K page to 2M
○ bloating the process’s RSS by as much as 512x
● Some clues from the Go source code
○ https://github.com/golang/go/blob/release-branch.go1.11/src
/runtime/mem_linux.go#L38
Analysis
Follow up
● TiDB has now suggest user to disabled the use of THP
○ Just like MySQL / Oracle / Redis / … did
Part III - GC Related
A brief introduction about GC
What happen when collector can not catch up allocation rate?
● How the garbage recycle interrupt the user program in memory allocation?
○ 25% CPU is dedicated to GC
○ MARK ASSIST
○ …
○
A brief introduction about GC
Case study: GC Sweep caused latency jitter
weird latency for simple requests (primary key updates)
weird latency from 30ms to 800ms
CPU Network Disk load is normal
● The query duration is unstable, varies in a wide range from 30ms to 800ms
● No problem with the IO, CPU, network
Description
So, where does the latency come from?
Investigation
curl http://0.0.0.0:10080/debug/pprof/trace?seconds=60
Get the trace information
Observation:
The query is blocked by GC sweep
Investigation
● The garbage collector can stop individual goroutines, even without STW
○ eg. A goroutine that needs to allocate memory right after a GC cycle is required to help with
the sweep work
● But why … so long?
○ Let’s dig the rabbit hole
Analysis
Analysis
● Wait, scan 1G memory?
● So, that’s why it takes so long!
● for only releasing 8K?
Analysis
And we find a similar issue latency in sweep assists when there's no garbage, and also the fix
● Verified with the fix, the latency jitter gone
○ yeah, that’s the problem
Analysis
● One more thing … how TiDB trigger that bug?
○ statistics module uses a lot of long-lived 8K
objects
○ all these objects are all alive after GC
Analysis
● Sweep, until it finds a free slot ...
● … or there are no more spans to sweep
● a lot of inuse objects after the mark phase
● Although the STW duration is low enough for Go GC
● Unexpected latency jitter caused by GC may still exist
Conclusion
Case study: Lock and NUMA aware
QPS bottleneck, but CPU usage ~60%
● TiDB server CPU usage is around 60%
● Networking and IO are both OK
● Increase the benchmark client concurrency, throughput does not increase
● It seems the resources are not exhausted, but where is the bottleneck?
Description
● Something make the CPU poorly used
● So the bottleneck should be one of them?
○ Mutex
○ GC
○ NUMA
Hypothesis
curl -G "172.16.5.2:10080/debug/pprof/mutex" > mutex.profile
go tool pprof -http 0.0.0.0:4002 mutex.profile
Is it caused by mutex?
We found some useless mutex in modules in :
● GRPC
● log slow query
● log slow cop task
● query feedback
After optimization:
QPS 4.3K => 5.2K, CPU 58.2% => 64%
Bind CPUs to NUMA nodes
Is it caused by NUMA?
● QPS 5.2K => 7.59K
● P999 3.06s => 2s
● CPU usage 64% => 88.49%
● Minor page fault 229.8K => 8.68K
This single change get the biggest improvement!
RAW Optimized Mutex Bind CPU Core +
Optimized Mutex
● Off CPU analysis with linuxki
○ From the scheduler activity report, 67% of time is spend on sleeping
Is it caused by GC?
runtime.(*mheap).freeSpan!!! it’s about 14%
Is it caused by GC?
● There is a lock on the page allocator, see this issue make the page allocator scale
● A high allocation rate combined with a high degree of parallelism leads to significant contention
● The performance is not scalable on a machine with lots of CPUs
○ PS: The Go folks have already done a lot of work on it
Analysis
● NUMA (non-uniform memory access)
○ sockets
○ CPUs
○ local memory access
● OS allocation strategy
○ balanced
○ local
● Minor Page Faults might be related to
the page migration caused by NUMA
balancing
Background: a bit about NUMA
● Unfortunately, Go's GC is not NUMA aware
Analysis: NUMA aware
● Those factors make Go runtime less scalable on large number of cores
○ Lock contention in the allocation path
○ GC is not NUMA aware
○ ...
Conclusion
Recap
● CASE1: Batching client requests
■ latency in scheduler
● CASE2: THP(transparent huge pages)
■ memory control
● CASE3: GC sweep caused latency jitter
● CASE4: Lock and NUMA aware
■ scalability on multiple cores
Thank you!
Twitter: @dxhuang
pingcap.com
github.com/pingcap/tidb

Weitere ähnliche Inhalte

Was ist angesagt?

Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafkaconfluent
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB ClusterMongoDB
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight OverviewJacques Nadeau
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Thousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/OThousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/OGeorge Cao
 
Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Ryan Cuprak
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databasesguestdfd1ec
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveYingjun Wu
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
Intro to Telegraf
Intro to TelegrafIntro to Telegraf
Intro to TelegrafInfluxData
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Ontico
 
Realtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured DataRealtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured DataScyllaDB
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQGeorge Teo
 

Was ist angesagt? (20)

Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafka
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB Cluster
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight Overview
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
 
Thousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/OThousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/O
 
Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Intro to Telegraf
Intro to TelegrafIntro to Telegraf
Intro to Telegraf
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
 
Realtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured DataRealtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured Data
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
 

Ähnlich wie The Dark Side Of Go -- Go runtime related problems in TiDB in production

Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberYing Zheng
 
Data Lessons Learned at Scale
Data Lessons Learned at ScaleData Lessons Learned at Scale
Data Lessons Learned at ScaleCharlie Reverte
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokesGagan Bajpai
 
Raft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdfRaft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdffengxun
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingChinaNetCloud
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streamingdatamantra
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Migrating to Apache Spark at Netflix
Migrating to Apache Spark at NetflixMigrating to Apache Spark at Netflix
Migrating to Apache Spark at NetflixDatabricks
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseGruter
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseJihoon Son
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidImply
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioAlluxio, Inc.
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 

Ähnlich wie The Dark Side Of Go -- Go runtime related problems in TiDB in production (20)

Mongodb meetup
Mongodb meetupMongodb meetup
Mongodb meetup
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at Uber
 
Data Lessons Learned at Scale
Data Lessons Learned at ScaleData Lessons Learned at Scale
Data Lessons Learned at Scale
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
Raft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdfRaft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdf
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Migrating to Apache Spark at Netflix
Migrating to Apache Spark at NetflixMigrating to Apache Spark at Netflix
Migrating to Apache Spark at Netflix
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Caching in
Caching inCaching in
Caching in
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 

Mehr von PingCAP

[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...PingCAP
 
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big DataPingCAP
 
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...PingCAP
 
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-TreePingCAP
 
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware PlatformsPingCAP
 
[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databasesPingCAP
 
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...PingCAP
 
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...PingCAP
 
[Paperreading] Paxos made easy (by sen han)
[Paperreading]  Paxos made easy (by sen han)[Paperreading]  Paxos made easy (by sen han)
[Paperreading] Paxos made easy (by sen han)PingCAP
 
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...PingCAP
 
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...PingCAP
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote PingCAP
 
Finding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management SystemsFinding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management SystemsPingCAP
 
Chaos Practice in PingCAP
Chaos Practice in PingCAPChaos Practice in PingCAP
Chaos Practice in PingCAPPingCAP
 
TiDB at PayPay
TiDB at PayPayTiDB at PayPay
TiDB at PayPayPingCAP
 
Paper Reading: FPTree
Paper Reading: FPTreePaper Reading: FPTree
Paper Reading: FPTreePingCAP
 
Paper Reading: Smooth Scan
Paper Reading: Smooth ScanPaper Reading: Smooth Scan
Paper Reading: Smooth ScanPingCAP
 
Paper Reading: Flexible Paxos
Paper Reading: Flexible PaxosPaper Reading: Flexible Paxos
Paper Reading: Flexible PaxosPingCAP
 
Paper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in OraclePaper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in OraclePingCAP
 
Paper reading: HashKV and beyond
Paper reading: HashKV and beyondPaper reading: HashKV and beyond
Paper reading: HashKV and beyondPingCAP
 

Mehr von PingCAP (20)

[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
 
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
 
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
 
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
 
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
 
[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases
 
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
 
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
 
[Paperreading] Paxos made easy (by sen han)
[Paperreading]  Paxos made easy (by sen han)[Paperreading]  Paxos made easy (by sen han)
[Paperreading] Paxos made easy (by sen han)
 
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
 
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
 
Finding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management SystemsFinding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management Systems
 
Chaos Practice in PingCAP
Chaos Practice in PingCAPChaos Practice in PingCAP
Chaos Practice in PingCAP
 
TiDB at PayPay
TiDB at PayPayTiDB at PayPay
TiDB at PayPay
 
Paper Reading: FPTree
Paper Reading: FPTreePaper Reading: FPTree
Paper Reading: FPTree
 
Paper Reading: Smooth Scan
Paper Reading: Smooth ScanPaper Reading: Smooth Scan
Paper Reading: Smooth Scan
 
Paper Reading: Flexible Paxos
Paper Reading: Flexible PaxosPaper Reading: Flexible Paxos
Paper Reading: Flexible Paxos
 
Paper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in OraclePaper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in Oracle
 
Paper reading: HashKV and beyond
Paper reading: HashKV and beyondPaper reading: HashKV and beyond
Paper reading: HashKV and beyond
 

Kürzlich hochgeladen

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Kürzlich hochgeladen (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

The Dark Side Of Go -- Go runtime related problems in TiDB in production

  • 1. Go runtime related problems in TiDB in production The Dark Side Of Go Ed Huang, CTO, PingCAP
  • 2. What's TiDB? Open source, distributed SQL database written in Go (and Rust). pingcap/tidb
  • 3. ● Latency in scheduler ■ case study: latency jitter under high load due to fair scheduling policy ● Memory control ■ case study: Abnormal memory usage due to Transparent Huge Pages ● GC Related ■ case study: GC sweep caused latency jitter ■ case study: Lock and NUMA aware Agenda
  • 4. Part I - Latency in scheduler
  • 5. ● The client consists of a goroutine and a channel ○ The channel batch the request ○ A goroutine run read-send-recv loop ○ Typical MPSC Background
  • 6. Description ● When the machine load and CPU usage is high, but the network IO pressure is reasonable ● But the RPC metrics show a high latency Network Ready Goroutine Wakeup 4.368ms
  • 7. ● Network IO is ready => goroutine wake up == 4.368ms ○ Sometime even 10ms+ latency here! ○ The time spend on runtime schedule is not negligible ● When CPU is overload, which goroutine should be given priority? Analysis
  • 8. ● The goroutine is special, it block all the callers ● The scheduler treat them equally Analysis
  • 9. ● Under heavy workload, goroutines get longer to be scheduled ● The runtime scheduling does not consider priority ● CPU dense workload could affect IO latency Conclusion
  • 10. Part II - Memory control
  • 11. ● Go Runtime ○ Allocated from OS (mmaped) ○ Managed Memory ■ Should the memory be returned to the OS? ○ Inuse Memory ■ How much memory is *really* used? ○ GC ■ After each GC, the unused objects are released ○ Idle Memory ■ the unused objects are returned to Managed Memory, not OS Background: What are we talking about when we talk about Memory
  • 12. Background: What are we talking about when we talk about Memory
  • 13. ● TLB (translation lookaside buffer) ○ A CPU cache for virtual memory to physical addresses mapping ○ Size limited, when cache miss the memory access become slower ● Default page size 4K ● THP, increase page size to 2M or more ○ Page size larger, so the page entries for a process is less ○ Less page entries means better cache hit rate ○ Result in better memory access performance Background: THP (transparent huge pages)
  • 14. Case study: THP (transparent huge pages) TiDB memory usage is abnormal, and heap memory usage is incorrect.
  • 15. Description ● The TiDB server memory footprint is abnormal
  • 16. ● The memory available on this node is not too much Description
  • 17. ● The Go Runtime thinks it does not use much memory ● The OS does not release the memory (RSS is high) Investigate Alloc from OS Reserved by Go Runtime
  • 18. Check /proc/{PID}/smaps to see the memory mapping of the TiDB server processor AnonHugePages 1980416 kB The memory is occupied by THP Investigate
  • 19. Compare with other nodes in the cluster with THP disabled Investigate
  • 20. ● So, the root cause must be related to THP (transparent huge pages) ● But … why? Analysis
  • 21. ● Go Runtime manage memory at 8K size granularity ● Go Runtime give hint to OS about the use of the memory block ● With THP enabled, the memory block address is round up to huge page boundary ● Fragment!!! ● Fragments make the memory difficult to be reclamed by the OS Analysis
  • 22. ● And more confusing behavior by the OS: merging pages into huge pages ○ The user program return 4K pages to OS ○ khugepaged turn a single 4K page to 2M ○ bloating the process’s RSS by as much as 512x ● Some clues from the Go source code ○ https://github.com/golang/go/blob/release-branch.go1.11/src /runtime/mem_linux.go#L38 Analysis
  • 23. Follow up ● TiDB has now suggest user to disabled the use of THP ○ Just like MySQL / Oracle / Redis / … did
  • 24. Part III - GC Related
  • 26. What happen when collector can not catch up allocation rate? ● How the garbage recycle interrupt the user program in memory allocation? ○ 25% CPU is dedicated to GC ○ MARK ASSIST ○ … ○ A brief introduction about GC
  • 27. Case study: GC Sweep caused latency jitter weird latency for simple requests (primary key updates) weird latency from 30ms to 800ms CPU Network Disk load is normal
  • 28. ● The query duration is unstable, varies in a wide range from 30ms to 800ms ● No problem with the IO, CPU, network Description So, where does the latency come from?
  • 29. Investigation curl http://0.0.0.0:10080/debug/pprof/trace?seconds=60 Get the trace information Observation: The query is blocked by GC sweep
  • 31. ● The garbage collector can stop individual goroutines, even without STW ○ eg. A goroutine that needs to allocate memory right after a GC cycle is required to help with the sweep work ● But why … so long? ○ Let’s dig the rabbit hole Analysis
  • 32. Analysis ● Wait, scan 1G memory? ● So, that’s why it takes so long! ● for only releasing 8K?
  • 33. Analysis And we find a similar issue latency in sweep assists when there's no garbage, and also the fix
  • 34. ● Verified with the fix, the latency jitter gone ○ yeah, that’s the problem Analysis ● One more thing … how TiDB trigger that bug? ○ statistics module uses a lot of long-lived 8K objects ○ all these objects are all alive after GC
  • 35. Analysis ● Sweep, until it finds a free slot ... ● … or there are no more spans to sweep ● a lot of inuse objects after the mark phase
  • 36. ● Although the STW duration is low enough for Go GC ● Unexpected latency jitter caused by GC may still exist Conclusion
  • 37. Case study: Lock and NUMA aware QPS bottleneck, but CPU usage ~60%
  • 38. ● TiDB server CPU usage is around 60% ● Networking and IO are both OK ● Increase the benchmark client concurrency, throughput does not increase ● It seems the resources are not exhausted, but where is the bottleneck? Description
  • 39. ● Something make the CPU poorly used ● So the bottleneck should be one of them? ○ Mutex ○ GC ○ NUMA Hypothesis
  • 40. curl -G "172.16.5.2:10080/debug/pprof/mutex" > mutex.profile go tool pprof -http 0.0.0.0:4002 mutex.profile Is it caused by mutex? We found some useless mutex in modules in : ● GRPC ● log slow query ● log slow cop task ● query feedback After optimization: QPS 4.3K => 5.2K, CPU 58.2% => 64%
  • 41. Bind CPUs to NUMA nodes Is it caused by NUMA? ● QPS 5.2K => 7.59K ● P999 3.06s => 2s ● CPU usage 64% => 88.49% ● Minor page fault 229.8K => 8.68K This single change get the biggest improvement!
  • 42. RAW Optimized Mutex Bind CPU Core + Optimized Mutex
  • 43. ● Off CPU analysis with linuxki ○ From the scheduler activity report, 67% of time is spend on sleeping Is it caused by GC?
  • 44. runtime.(*mheap).freeSpan!!! it’s about 14% Is it caused by GC?
  • 45. ● There is a lock on the page allocator, see this issue make the page allocator scale ● A high allocation rate combined with a high degree of parallelism leads to significant contention ● The performance is not scalable on a machine with lots of CPUs ○ PS: The Go folks have already done a lot of work on it Analysis
  • 46. ● NUMA (non-uniform memory access) ○ sockets ○ CPUs ○ local memory access ● OS allocation strategy ○ balanced ○ local ● Minor Page Faults might be related to the page migration caused by NUMA balancing Background: a bit about NUMA
  • 47. ● Unfortunately, Go's GC is not NUMA aware Analysis: NUMA aware
  • 48. ● Those factors make Go runtime less scalable on large number of cores ○ Lock contention in the allocation path ○ GC is not NUMA aware ○ ... Conclusion
  • 49. Recap ● CASE1: Batching client requests ■ latency in scheduler ● CASE2: THP(transparent huge pages) ■ memory control ● CASE3: GC sweep caused latency jitter ● CASE4: Lock and NUMA aware ■ scalability on multiple cores