PFC306 
Brendan Gregg, Performance Engineering, Netflix 
November 12, 2014 | Las Vegas, NV
[Figure labels: S3; EC2; Cassandra; Applications (Services); EVCache; ELB; Elasticsearch; SES; SQS]
[Figure labels: Start; i2; Select memory to cache working set; Find best balance]
[Figure labels: ELB; ASG Cluster prod1; ASG-v010 (Instance, Instance, Instance, …); ASG-v011 (Instance, Instance, Instance, …); Canary]
[Figure labels: Select instance families; Select resources; From any desired resource, see types & cost]
[Figure labels: eg, 8 vCPU; Acceptable; Headroom; Unacceptable; Cost per hour; Services]
# schedtool -B PID
vm.swappiness = 0 # from 60
# echo never > /sys/kernel/mm/transparent_hugepage/enabled # from madvise
vm.dirty_ratio = 80 # from 40 
vm.dirty_background_ratio = 5 # from 10 
vm.dirty_expire_centisecs = 12000 # from 3000 
mount -o defaults,noatime,discard,nobarrier …
/sys/block/*/queue/rq_affinity 2
/sys/block/*/queue/scheduler noop
/sys/block/*/queue/nr_requests 256
/sys/block/*/queue/read_ahead_kb 256
mdadm --chunk=64 ...
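Before changing any of the queue tunables listed above, it can help to record what each block device currently uses. The following is a minimal read-only sketch; the function name show_queue_settings and its optional root-directory argument are illustrative assumptions, not part of the talk.

```shell
# Print the current value of each queue tunable named on the slide.
# Read-only: nothing is written. An alternative root directory can be
# passed as $1 (assumed here only to make the function easy to test).
show_queue_settings() (
    root=${1:-/sys/block}
    for q in "$root"/*/queue; do
        [ -d "$q" ] || continue
        dev=$(basename "$(dirname "$q")")
        for f in rq_affinity scheduler nr_requests read_ahead_kb; do
            if [ -r "$q/$f" ]; then
                printf '%s %s %s\n' "$dev" "$f" "$(cat "$q/$f")"
            fi
        done
    done
)

show_queue_settings
```

Each output line has the form "device tunable value", so a before/after comparison is a plain diff of two runs.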
net.core.somaxconn = 1000 
net.core.netdev_max_backlog = 5000 
net.core.rmem_max = 16777216 
net.core.wmem_max = 16777216 
net.ipv4.tcp_wmem = 4096 12582912 16777216 
net.ipv4.tcp_rmem = 4096 12582912 16777216 
net.ipv4.tcp_max_syn_backlog = 8096 
net.ipv4.tcp_slow_start_after_idle = 0 
net.ipv4.tcp_tw_reuse = 1 
net.ipv4.ip_local_port_range = 10240 65535 
net.ipv4.tcp_abort_on_overflow = 1 # maybe
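One way to see how far a running instance is from the sysctl values listed above is to compare them against /proc/sys without changing anything. This is a sketch under assumptions: the helper names sysctl_path and sysctl_diff are hypothetical, and the three keys fed in at the end are only a sample taken from the slides.

```shell
# Map a sysctl key such as net.core.somaxconn to its /proc/sys path.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Read "key = value  # comment" lines on stdin and report whether the
# running kernel already has each value. Read-only: nothing is changed.
sysctl_diff() (
    while IFS= read -r line; do
        line=${line%%#*}                       # drop trailing comment
        case $line in *=*) ;; *) continue ;; esac
        key=$(echo "${line%%=*}" | xargs)      # trim whitespace
        want=$(echo "${line#*=}" | xargs)      # trim and collapse whitespace
        path=$(sysctl_path "$key")
        if [ -r "$path" ]; then
            cur=$(xargs < "$path")             # collapse tabs to spaces
        else
            cur="(unavailable)"
        fi
        if [ "$cur" = "$want" ]; then
            echo "ok   $key = $cur"
        else
            echo "diff $key: current=$cur wanted=$want"
        fi
    done
)

sysctl_diff <<'EOF'
vm.swappiness = 0              # from 60
net.core.somaxconn = 1000
net.ipv4.tcp_rmem = 4096 12582912 16777216
EOF
```

Keys that do not exist on the running kernel are reported as "(unavailable)" rather than treated as errors.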
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
[Figure labels: Resource Utilization X (%)]
[Figure labels: Application; System Libraries; System Calls; Kernel; Devices]
$ sar -n TCP,ETCP,DEV 1 
Linux 3.2.55 (test-e4f1a80b) 08/18/2014 _x86_64_ (8 CPU) 
09:10:43 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s 
09:10:44 PM lo 14.00 14.00 1.34 1.34 0.00 0.00 0.00 
09:10:44 PM eth0 4114.00 4186.00 4537.46 28513.24 0.00 0.00 0.00 
09:10:43 PM active/s passive/s iseg/s oseg/s 
09:10:44 PM 21.00 4.00 4107.00 22511.00 
09:10:43 PM atmptf/s estres/s retrans/s isegerr/s orsts/s 
09:10:44 PM 0.00 0.00 36.00 0.00 1.00 
[…]
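A rough health indicator that can be derived from the sar output above is the share of sent TCP segments that were retransmitted, taken here as retrans/s divided by oseg/s. The ratio and the function name retrans_pct are assumptions of this sketch and are not stated in the talk.

```shell
# Percentage of sent TCP segments that were retransmitted.
# Arguments: retrans/s and oseg/s, as printed by sar -n TCP,ETCP.
retrans_pct() {
    awk -v r="$1" -v o="$2" 'BEGIN {
        if (o > 0) { printf "%.2f\n", 100 * r / o }
        else       { print "n/a" }
    }'
}

# Values from the 09:10:44 PM sample above.
retrans_pct 36.00 22511.00
```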
[Figure labels: Stack frame; Mouse-over frames to quantify; Ancestry]
# git clone https://github.com/brendangregg/FlameGraph 
# cd FlameGraph 
# perf record -F 99 -ag -- sleep 60 
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
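The intermediate format in the pipeline above is the "folded" form: one line per unique stack, with frames joined by semicolons and followed by a sample count. The sketch below imitates only that counting step on already-joined stacks, to show what flamegraph.pl receives; fold_counts and the sample frame names are hypothetical, and this is not a replacement for stackcollapse-perf.pl.

```shell
# Count identical semicolon-joined stacks read from stdin and print
# "stack count" lines, sorted for stable output.
fold_counts() {
    awk '{ count[$0]++ } END { for (s in count) print s, count[s] }' | sort
}

# Three samples, two of which share the same stack.
printf '%s\n' \
    'java;read;sys_read' \
    'java;read;sys_read' \
    'java;epoll_wait' | fold_counts
```

Because the folded form is plain text, it can be filtered with grep or sed before it is passed to flamegraph.pl.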
[Figure labels: Broken Java stacks (missing frame pointer); Kernel TCP/IP; GC; Idle thread; Time; Locks; epoll]
# ./iosnoop -ts
Tracing block I/O. Ctrl-C to end. 
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms 
5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62 
5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42 
5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48 
5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43 
[…] 
# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
-d device # device string (eg, "202,1")
-i iotype # match type (eg, '*R*' for all reads) 
-n name # process name to match on I/O issue 
-p PID # PID to match on I/O issue 
-Q # include queueing time in LATms 
-s # include start time of I/O (s) 
-t # include completion time of I/O (s) 
[…]
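Per-I/O output like the iosnoop trace above is easier to compare across runs once it is reduced to a few numbers. The following sketch summarizes the LATms column (the last field of each data line); the function name iosnoop_lat_summary and the choice of count, mean and maximum are assumptions made for illustration.

```shell
# Summarize the LATms column (last field) of iosnoop output on stdin.
# Header, banner and usage lines are skipped because their last field
# is not a plain number.
iosnoop_lat_summary() {
    awk '$NF ~ /^[0-9]+(\.[0-9]+)?$/ && NF >= 7 {
        n++; sum += $NF
        if ($NF + 0 > max) max = $NF + 0
    }
    END {
        if (n) { printf "ios=%d avg_ms=%.2f max_ms=%.2f\n", n, sum / n, max }
        else   { print "ios=0" }
    }'
}

# Two of the data lines shown above, preceded by the header line.
iosnoop_lat_summary <<'EOF'
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62
5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42
EOF
```

A mean hides outliers, so the maximum is printed alongside it; a latency histogram would show more, but is beyond this sketch.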
# perf record -e skb:consume_skb -ag -- sleep 10
# perf report 
[...] 
74.42% swapper [kernel.kallsyms] [k] consume_skb 
| 
--- consume_skb 
arp_process 
arp_rcv 
__netif_receive_skb_core 
__netif_receive_skb 
netif_receive_skb 
virtnet_poll 
net_rx_action 
__do_softirq 
irq_exit 
do_IRQ 
ret_from_intr 
[…] 
Summarizing stack traces for a tracepoint.
perf_events can do many things; it is hard to pick just one example.
ec2-guest# ./showboost 
CPU MHz : 2500 
Turbo MHz : 2900 (10 active) 
Turbo Ratio : 116% (10 active) 
CPU 0 summary every 5 seconds... 
Real CPU MHz 
TIME C0_MCYC C0_ACYC UTIL RATIO MHz 
06:11:35 6428553166 7457384521 51% 116% 2900 
06:11:40 6349881107 7365764152 50% 115% 2899 
06:11:45 6240610655 7239046277 49% 115% 2899 
[...]
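The RATIO and MHz columns in the showboost output above are consistent with RATIO = C0_ACYC / C0_MCYC and MHz = base MHz × that ratio, both truncated to integers. The sketch below reproduces that arithmetic from the printed figures; boost_mhz is a hypothetical helper, and the formula is inferred from the numbers shown rather than taken from the tool's source.

```shell
# Arguments: base MHz, C0_MCYC, C0_ACYC.
# Prints the truncated ratio (percent) and the implied real MHz.
boost_mhz() {
    awk -v base="$1" -v mcyc="$2" -v acyc="$3" 'BEGIN {
        printf "%d%% %d\n", int(100 * acyc / mcyc), int(base * acyc / mcyc)
    }'
}

# First two rows of the table above (base clock 2500 MHz).
boost_mhz 2500 6428553166 7457384521
boost_mhz 2500 6349881107 7365764152
```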
[Figure labels: Region; App; Breakdowns; Metrics; Options; Interactive Graph; Summary Statistics]
[Figure labels: Utilization; Saturation; Errors; Per device; Breakdowns]
http://aws.amazon.com/ec2/instance-types/ 
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html 
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html 
http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance 
http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html 
http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html 
http://www.brendangregg.com/linuxperf.html 
http://www.slideshare.net/brendangregg/linux-performance-tools-2014 
http://www.brendangregg.com/USEmethod/use-linux.html 
http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html 
https://github.com/brendangregg/FlameGraph
https://github.com/brendangregg/perf-tools
Talk Time Title 
PFC-305 Wednesday, 1:15pm Embracing Failure: Fault Injection and Service Reliability 
BDT-403 Wednesday, 2:15pm Next Generation Big Data Platform at Netflix 
PFC-306 Wednesday, 3:30pm Performance Tuning EC2 
DEV-309 Wednesday, 3:30pm From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
ARC-317 Wednesday, 4:30pm Maintaining a Resilient Front-Door at Massive Scale 
PFC-304 Wednesday, 4:30pm Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209 Wednesday, 4:30pm Cloud Migration, Dev-Ops and Distributed Systems 
APP-310 Friday, 9:00am Scheduling using Apache Mesos in the Cloud
Performance Tuning EC2 Instances
