GC-stall and Page Scan Attacks by Linux
Cuong Tran
LinkedIn Performance Group
Agenda
• GC attacks by Linux
• Page scan attacks by Linux
• Recommendations
Examples of GC attacks by Linux
• 2013-10-05T05:01:04.179+0000:…. : 216982K->9328K(256000K), 0.0666320 secs] 377835K->170188K(768000K), 0.0675850 secs] [Times: user=0.17 sys=0.00, real=3.18 secs]
• 2013-09-19T06:14:03.632+0000: 44372.834: [GC [1 CMS-initial-mark: 703914K(921600K)] 718372K(1433600K), 126.1196340 secs] [Times: user=0.00 sys=127.31, real=126.10 secs]
• GC stopped the world for seconds to minutes but:
– Did almost no real work (CPU time in user mode ≈ 0)
– In the second example, burned cycles in the Linux kernel (sys=127.31 s)
GC attacks by Linux
• IO starvation
– Symptom: GC log shows “low user time, low system time, long GC pause”
– Cause: GC threads stuck in the kernel waiting for IO, usually due to journal commits or file system flushes of changes written by gzip during log rolling

• Memory starvation
– Symptom: GC log shows “low user time, high system time, long GC pause”
– Cause: Memory pressure triggers swapping or scanning for free memory
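
The two symptoms above can be checked mechanically against the `[Times: ...]` part of a GC log entry. The following is a minimal sketch, not a tool from the talk: the function name `classify_gc_times`, the 1-second pause threshold, and the 50% CPU-share cut-offs are all illustrative assumptions. It expects each GC entry on a single line.

```shell
# classify_gc_times: read GC log lines on stdin and label long pauses.
# Assumed thresholds (illustrative only):
#   real < 1 s                 -> ok
#   sys  >= 50% of real        -> memory-starvation (high system time)
#   user + sys < 50% of real   -> io-starvation (threads mostly waiting)
#   otherwise                  -> busy-gc (GC really was using the CPU)
classify_gc_times() {
  awk '
    {
      # Skip lines that do not carry a complete Times record.
      if (!match($0, /user=[0-9.]+/)) next
      user = substr($0, RSTART + 5, RLENGTH - 5) + 0
      if (!match($0, /sys=[0-9.]+/)) next
      sys = substr($0, RSTART + 4, RLENGTH - 4) + 0
      if (!match($0, /real=[0-9.]+/)) next
      real = substr($0, RSTART + 5, RLENGTH - 5) + 0

      if (real < 1.0)                   verdict = "ok"
      else if (sys >= 0.5 * real)       verdict = "memory-starvation"
      else if (user + sys < 0.5 * real) verdict = "io-starvation"
      else                              verdict = "busy-gc"

      printf "%s user=%.2f sys=%.2f real=%.2f\n", verdict, user, sys, real
    }'
}
```

Usage would look like `classify_gc_times < gc.log`, where `gc.log` is a placeholder file name. Note that user time is summed over all GC threads, so a healthy parallel collection can show user time larger than real time.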

Solutions for GC-attacks
• IO Starvation
– Strategy: Even out the workload to disk drives (flush every 5 s rather than 30 s)
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500

– In progress: Direct IO with gzip or gzip as-you-go

• Memory Starvation
– Strategy: Pre-allocate memory to the JVM heap and protect it against swapping or scanning
– Turn on the -XX:+AlwaysPreTouch option in the JVM
– sysctl -w vm.swappiness=0 to protect heap and anonymous memory
– JVM startup is delayed by 2 seconds to allocate all memory (17 GB)
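
Before changing anything, it can help to compare the running values with the ones proposed above. The sketch below only reads `/proc/sys` and prints a report; the variable `GC_SYSCTLS` and the function `report_sysctls` are hypothetical names introduced for illustration.

```shell
# Settings proposed on this slide; 500 centiseconds = 5 seconds.
GC_SYSCTLS='vm.dirty_writeback_centisecs=500
vm.dirty_expire_centisecs=500
vm.swappiness=0'

# Print one line per setting: the wanted value and the value currently
# visible under /proc/sys (or "unreadable" if the file cannot be read).
report_sysctls() {
  printf '%s\n' "$GC_SYSCTLS" | while IFS='=' read -r key want; do
    # vm.swappiness -> /proc/sys/vm/swappiness
    path="/proc/sys/$(printf '%s' "$key" | tr . /)"
    have=$(cat "$path" 2>/dev/null) || have="unreadable"
    printf '%s want=%s have=%s\n' "$key" "$want" "$have"
  done
}

# To apply a value (root required), the slide's command form is:
#   sysctl -w vm.swappiness=0
```

Calling `report_sysctls` makes no changes to the system. Values set with `sysctl -w` do not survive a reboot unless they are also written to the sysctl configuration (for example `/etc/sysctl.conf`).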

Page scan attacks by Linux
Measured: 7,000,000 scans/sec
Stall: 2+ minutes
Goal: 0 scans/sec
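
A scan rate like the one above can be approximated from two snapshots of `/proc/vmstat`. This is a rough sketch under assumptions: the helper names `sum_direct_scans` and `scan_rate` are made up for illustration, and the direct-scan counter names differ between kernel versions (per-zone `pgscan_direct_*` on older kernels, a single `pgscan_direct` on newer ones), so the sketch sums whatever matches and skips the unrelated throttle counter.

```shell
# Sum all direct page-scan counters from vmstat-formatted text on stdin.
# pgscan_direct_throttle counts throttling events, not scanned pages,
# so it is excluded.
sum_direct_scans() {
  awk '/^pgscan_direct/ && $1 != "pgscan_direct_throttle" { s += $2 }
       END { printf "%.0f\n", s }'
}

# scan_rate BEFORE AFTER SECONDS -> direct scans per second (integer).
scan_rate() {
  echo $(( ($2 - $1) / $3 ))
}

# Example measurement over a 10-second window:
#   before=$(sum_direct_scans < /proc/vmstat)
#   sleep 10
#   after=$(sum_direct_scans < /proc/vmstat)
#   scan_rate "$before" "$after" 10
```

Where the sysstat package is installed, `sar -B` reports a comparable figure in its pgscand/s column.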

Cause: Page Scan Attacks

Transparent Huge Page (THP)
• A Red Hat enhancement for performance
– 2 MB huge pages vs. 4 KB regular pages
– Fewer TLB misses and page table walks
– Only works for anonymous memory (malloc)
– Improves performance by 10% for SPECjbb and app server workloads

• But THP can degrade performance severely
– Collapsing, Compacting, Splitting, Migration
– Very high pgscand/s
– Very busy khugepaged
– Very high system time when process compacts memory or
khugepaged runs

• THP optimization can increase GC stall time by minutes
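
Whether THP is active on a host can be read from sysfs, where the current mode is the bracketed word in the `enabled` file. The sketch below is illustrative: `thp_mode` and `report_thp` are hypothetical helper names, and it checks both the upstream path and the Red Hat specific path used elsewhere in these slides.

```shell
# Extract the active THP mode from the contents of an "enabled" file,
# e.g. "[always] madvise never" -> "always".
thp_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Report the mode for whichever THP sysfs file exists on this host.
report_thp() {
  for f in /sys/kernel/mm/transparent_hugepage/enabled \
           /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    [ -r "$f" ] && printf '%s: %s\n' "$f" "$(thp_mode "$(cat "$f")")"
  done
  return 0
}
```

A result of `always` means anonymous memory such as the JVM heap is eligible for huge pages and for the khugepaged activity described above.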
Cause: Page Scan Attacks

NUMA Optimization
• A Linux optimization for NUMA
– 2 CPU sockets, each having 12 cores and local memory.
– Memory accessible by all 24 cores but local memory is faster
– Linux tries to allocate local memory to application
threads, i.e., from local zone
– Best suited for applications that can fit in one local zone

• NUMA optimization can degrade performance severely
– Very high pgscand/s
– Linux zone-reclaim insists on finding memory on local
zone although memory is plentiful on the other zone
– Linux migrates memory including THP, creating a vicious cycle of breaking up 2 MB pages, scanning for 4 KB free pages, and reassembling 4 KB pages into 2 MB pages
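
The zone-reclaim behavior described above is controlled by `vm.zone_reclaim_mode`, which the kernel treats as a bitmask (1 = reclaim from the local zone, 2 = also write out dirty pages, 4 = also swap pages). The sketch below decodes a value; `describe_zone_reclaim` is a hypothetical helper name used only for illustration.

```shell
# describe_zone_reclaim MODE: decode a vm.zone_reclaim_mode bitmask.
describe_zone_reclaim() {
  mode=$1
  if [ "$mode" -eq 0 ]; then
    echo "off"
    return 0
  fi
  out=""
  [ $(( mode & 1 )) -ne 0 ] && out="$out reclaim"
  [ $(( mode & 2 )) -ne 0 ] && out="$out write-dirty"
  [ $(( mode & 4 )) -ne 0 ] && out="$out swap"
  echo "on:$out"
}

# Example on a live host:
#   describe_zone_reclaim "$(cat /proc/sys/vm/zone_reclaim_mode)"
```

Any result other than `off` means an allocation may trigger reclaim in the local zone even when another zone still has free memory.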
Cause: Page Scan Attacks

Solutions
• Turn off THP optimization and thus khugepaged
– echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
– Will not affect file IO or memory-mapped files
– Red Hat, Oracle, and Hadoop recommend disabling THP

• Turn off zone-reclaim optimization
– sysctl -w vm.zone_reclaim_mode=0

– Twitter recommends NUMA interleaving
Recommendations
• Gate keepers: SRE and SysOps
• Safe to roll out fixes for GC attacks now
– Linux: Flush changes more frequently and protect heap
• sysctl -w vm.dirty_writeback_centisecs=500
• sysctl -w vm.dirty_expire_centisecs=500
• sysctl -w vm.swappiness=0
– JVM: Give the JVM heap all the memory it needs at startup
• -XX:+AlwaysPreTouch
• Heap size per AutoTune

• Roll out fixes for page scan attacks gradually
– Best for back-end servers
– Linux: Turn off THP and NUMA optimization
• echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• sysctl -w vm.zone_reclaim_mode=0

– Work with product groups to test on small group of servers before
applying changes to the rest

Weitere ähnliche Inhalte

Was ist angesagt?

Autonomic nervous system testing arfa sulthana
Autonomic nervous system testing arfa sulthanaAutonomic nervous system testing arfa sulthana
Autonomic nervous system testing arfa sulthanavrkv2007
 
Why MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackWhy MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackSveta Smirnova
 
Scalenus anterior muscle
Scalenus anterior muscleScalenus anterior muscle
Scalenus anterior muscleIdris Siddiqui
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 
Weaponizing Recon - Smashing Applications for Security Vulnerabilities & Profits
Weaponizing Recon - Smashing Applications for Security Vulnerabilities & ProfitsWeaponizing Recon - Smashing Applications for Security Vulnerabilities & Profits
Weaponizing Recon - Smashing Applications for Security Vulnerabilities & ProfitsHarsh Bothra
 
Master Canary Forging by Yuki Koike - CODE BLUE 2015
Master Canary Forging by Yuki Koike - CODE BLUE 2015Master Canary Forging by Yuki Koike - CODE BLUE 2015
Master Canary Forging by Yuki Koike - CODE BLUE 2015CODE BLUE
 
Linux SMEP bypass techniques
Linux SMEP bypass techniquesLinux SMEP bypass techniques
Linux SMEP bypass techniquesVitaly Nikolenko
 
Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9Poonam Bajaj Parhar
 
Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup Divyanshu
 
Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)frogd
 
motor nervous system : Stretch reflex
motor nervous system : Stretch reflexmotor nervous system : Stretch reflex
motor nervous system : Stretch reflexdina merzeban
 

Was ist angesagt? (20)

Autonomic nervous system testing arfa sulthana
Autonomic nervous system testing arfa sulthanaAutonomic nervous system testing arfa sulthana
Autonomic nervous system testing arfa sulthana
 
Alfresco tuning part2
Alfresco tuning part2Alfresco tuning part2
Alfresco tuning part2
 
Balance.pptx
Balance.pptxBalance.pptx
Balance.pptx
 
Why MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackWhy MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it Back
 
Scalenus anterior muscle
Scalenus anterior muscleScalenus anterior muscle
Scalenus anterior muscle
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Limbic system final
Limbic system finalLimbic system final
Limbic system final
 
Head and neck osteology
Head and neck osteologyHead and neck osteology
Head and neck osteology
 
Weaponizing Recon - Smashing Applications for Security Vulnerabilities & Profits
Weaponizing Recon - Smashing Applications for Security Vulnerabilities & ProfitsWeaponizing Recon - Smashing Applications for Security Vulnerabilities & Profits
Weaponizing Recon - Smashing Applications for Security Vulnerabilities & Profits
 
Master Canary Forging by Yuki Koike - CODE BLUE 2015
Master Canary Forging by Yuki Koike - CODE BLUE 2015Master Canary Forging by Yuki Koike - CODE BLUE 2015
Master Canary Forging by Yuki Koike - CODE BLUE 2015
 
Linux SMEP bypass techniques
Linux SMEP bypass techniquesLinux SMEP bypass techniques
Linux SMEP bypass techniques
 
Everything about Blind xss
Everything about Blind xssEverything about Blind xss
Everything about Blind xss
 
Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9
 
Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup Pentest Application With GraphQL | Null Bangalore Meetup
Pentest Application With GraphQL | Null Bangalore Meetup
 
Triangles of the neck
Triangles of the neckTriangles of the neck
Triangles of the neck
 
Hard and soft palate
Hard and soft palateHard and soft palate
Hard and soft palate
 
Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)
 
motor nervous system : Stretch reflex
motor nervous system : Stretch reflexmotor nervous system : Stretch reflex
motor nervous system : Stretch reflex
 
Reticular formation
Reticular formationReticular formation
Reticular formation
 
Fig 9-02
Fig 9-02Fig 9-02
Fig 9-02
 

Andere mochten auch

Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State DrivesVinoth Chandar
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State DrivesDataStax Academy
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restoreSaied Kazemi
 
OS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutionsOS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutionsZhenyun Zhuang
 
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsSpecification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsAlexander Kamkin
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingAnne Nicolas
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
 
Reverse engineering for_beginners-en
Reverse engineering for_beginners-enReverse engineering for_beginners-en
Reverse engineering for_beginners-enAndri Yabu
 
BKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingBKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingLinaro
 
Virtualization overheads
Virtualization overheadsVirtualization overheads
Virtualization overheadsSandeep Joshi
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Praguetomasbart
 
Linux numa evolution
Linux numa evolutionLinux numa evolution
Linux numa evolutionLukas Pirl
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freqLinaro
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1sprdd
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionJoel Koshy
 
Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Nakul Manchanda
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV FeaturesRaul Leite
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)Linaro
 

Andere mochten auch (20)

Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State Drives
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
 
OS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutionsOS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutions
 
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsSpecification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
 
Dulloor xen-summit
Dulloor xen-summitDulloor xen-summit
Dulloor xen-summit
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
 
Reverse engineering for_beginners-en
Reverse engineering for_beginners-enReverse engineering for_beginners-en
Reverse engineering for_beginners-en
 
BKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingBKK16-404A PCI Development Meeting
BKK16-404A PCI Development Meeting
 
Virtualization overheads
Virtualization overheadsVirtualization overheads
Virtualization overheads
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
 
Linux numa evolution
Linux numa evolutionLinux numa evolution
Linux numa evolution
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freq
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
 
Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV Features
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
 

Ähnlich wie Gc and-pagescan-attacks-by-linux

Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Tier1 App
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutionsJean-Philippe BEMPEL
 
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorJava Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorGurpreet Sachdeva
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbageTier1 App
 
How (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructureHow (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructureMiklos Szel
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxMiklos Szel
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceTier1app
 
G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?C2B2 Consulting
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problemsAlexey Kovyazin
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDaehyeok Kim
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection HeroTier1app
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsSrinath Perera
 
Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Alex Rasmussen
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesSergey Podolsky
 

Ähnlich wie Gc and-pagescan-attacks-by-linux (20)

Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutions
 
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorJava Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbage
 
How (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructureHow (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructure
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo Conference
 
G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problems
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection Hero
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common Patterns
 
Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)
 
OpenDS_Jazoon2010
OpenDS_Jazoon2010OpenDS_Jazoon2010
OpenDS_Jazoon2010
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issues
 

Kürzlich hochgeladen

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Kürzlich hochgeladen (20)

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
UiPath Community: Communication Mining from Zero to Hero

GC and Page Scan Attacks by Linux

  • 1. GC-stall and Page Scan Attacks by Linux
      Cuong Tran, LinkedIn Performance Group
  • 2. Agenda
      • GC attacks by Linux
      • Page scan attacks by Linux
      • Recommendations
  • 3. Examples of GC attacks by Linux
      • 2013-10-05T05:01:04.179+0000:…. : 216982K->9328K(256000K), 0.0666320 secs] 377835K->170188K(768000K), 0.0675850 secs] [Times: user=0.17 sys=0.00, real=3.18 secs]
      • 2013-09-19T06:14:03.632+0000: 44372.834: [GC [1 CMS-initial-mark: 703914K(921600K)] 718372K(1433600K), 126.1196340 secs] [Times: user=0.00 sys=127.31, real=126.10 secs]
      • GC stopped the world for seconds to minutes but:
          – Did little or no real work (CPU time in user mode near 0)
          – Waited or burned cycles in the Linux kernel
  • 4. GC attacks by Linux
      • IO starvation
          – Symptom: GC log shows "low user time, low system time, long GC pause"
          – Cause: GC threads are stuck in the kernel waiting for IO, usually because of journal commits or file system flushes of changes written when gzip compresses rolled logs
      • Memory starvation
          – Symptom: GC log shows "low user time, high system time, long GC pause"
          – Cause: Memory pressure triggers swapping or scanning for free memory
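The two symptoms above can be told apart mechanically from the `[Times: ...]` field of each GC log line. The following is a minimal sketch, not the author's tooling: the function name `classify_gc_pause` and both thresholds are hypothetical and would need tuning for a real service.

```python
import re

# Matches the "[Times: user=... sys=..., real=... secs]" suffix of a GC log line.
TIMES_RE = re.compile(
    r"user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+),\s+real=(?P<real>[\d.]+)\s+secs"
)

# Hypothetical thresholds, chosen only for illustration.
LONG_PAUSE_SECS = 1.0     # pauses at least this long are treated as stalls
LOW_CPU_FRACTION = 0.25   # CPU time below this fraction of the pause is "low"


def classify_gc_pause(line):
    """Classify one GC log line by its user/sys/real times.

    Returns 'io-starvation', 'memory-starvation', 'other',
    'short' for pauses under the stall threshold,
    or None when the line has no Times field.
    """
    m = TIMES_RE.search(line)
    if m is None:
        return None
    user, sys_time, real = (float(m.group(k)) for k in ("user", "sys", "real"))
    if real < LONG_PAUSE_SECS:
        return "short"
    low_user = user < LOW_CPU_FRACTION * real
    low_sys = sys_time < LOW_CPU_FRACTION * real
    if low_user and low_sys:
        # Long pause with almost no CPU used: threads were blocked, e.g. on IO.
        return "io-starvation"
    if low_user:
        # Long pause spent in the kernel: swapping or scanning for free pages.
        return "memory-starvation"
    return "other"
```

Applied to the two log lines from slide 3, this sketch labels the 3.18-second pause as IO starvation and the 126-second pause as memory starvation.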
  • 5. Solutions for GC attacks
      • IO starvation
          – Strategy: Even out the workload to disk drives (flush every 5 s rather than 30 s)
              sysctl -w vm.dirty_writeback_centisecs=500
              sysctl -w vm.dirty_expire_centisecs=500
          – In progress: Direct IO with gzip, or gzip as-you-go
      • Memory starvation
          – Strategy: Pre-allocate memory to the JVM heap and protect it against swapping or scanning
          – Turn on the -XX:+AlwaysPreTouch option in the JVM
          – sysctl -w vm.swappiness=0 to protect heap and anonymous memory
          – JVM start-up is delayed by 2 seconds while all memory (17 GB) is allocated
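Before rolling these settings out, it may help to see which hosts already have them. The sketch below compares current values against the three settings from the slide; it assumes the usual Linux mapping of `vm.swappiness` to `/proc/sys/vm/swappiness`, and the helper names `read_sysctl` and `find_mismatches` are illustrative, not part of any tool mentioned in the talk.

```python
from pathlib import Path

# Values taken from the slide above.
RECOMMENDED = {
    "vm.dirty_writeback_centisecs": 500,
    "vm.dirty_expire_centisecs": 500,
    "vm.swappiness": 0,
}


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl, e.g. 'vm.swappiness' -> <root>/vm/swappiness."""
    return int(Path(root, *name.split(".")).read_text().strip())


def find_mismatches(recommended=RECOMMENDED, read=read_sysctl):
    """Return {name: (current, wanted)} for each setting that differs.

    `read` is injectable so the comparison can be exercised without
    touching the real /proc/sys tree.
    """
    mismatches = {}
    for name, wanted in recommended.items():
        current = read(name)
        if current != wanted:
            mismatches[name] = (current, wanted)
    return mismatches
```

On a Linux host, calling `find_mismatches()` with no arguments reads the live values; an empty result means all three settings already match the slide.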
  • 6. Page scan attacks by Linux
      Measured: 7,000,000 scans/sec
      Stall: 2+ minutes
      Goal: 0 scans/sec
  • 7. Cause: Page Scan Attacks, Transparent Huge Page (THP)
      • A Red Hat enhancement for performance
          – 2 MB huge pages vs. 4 KB regular pages
          – Fewer TLB misses and page table walks
          – Only works for anonymous memory (malloc)
          – Improves performance by 10% for SPECjbb and app server workloads
      • But THP can degrade performance severely
          – Collapsing, compacting, splitting, migration
          – Very high pgscand/s
          – Very busy khugepaged
          – Very high system time when a process compacts memory or khugepaged runs
      • THP optimization can increase GC stall time by minutes
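The `pgscand/s` figure reported by `sar -B` is the rate of pages scanned directly by processes. A rough equivalent can be sampled from `/proc/vmstat`, as in the sketch below. The counter names are an assumption: kernels expose either a single `pgscan_direct` counter or per-zone `pgscan_direct_<zone>` counters, so the sketch sums whatever it finds, and the function names are hypothetical.

```python
import time


def direct_scan_count(vmstat_text):
    """Sum the direct-reclaim page scan counters found in /proc/vmstat text.

    Assumption: counters are named pgscan_direct or pgscan_direct_<zone>.
    pgscan_direct_throttle counts throttling events, not pages, so it is skipped.
    """
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        if name.startswith("pgscan_direct") and name != "pgscan_direct_throttle":
            total += int(value)
    return total


def scan_rate(before_text, after_text, interval_secs):
    """Direct page scans per second between two /proc/vmstat snapshots."""
    delta = direct_scan_count(after_text) - direct_scan_count(before_text)
    return delta / interval_secs


def sample_scan_rate(interval_secs=1.0, path="/proc/vmstat"):
    """Take two snapshots interval_secs apart and return scans per second."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_secs)
    with open(path) as f:
        after = f.read()
    return scan_rate(before, after, interval_secs)
```

On a healthy host `sample_scan_rate()` should stay at or near zero, which is the goal stated on slide 6.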
  • 8. Cause: Page Scan Attacks, NUMA Optimization
      • A Linux optimization for NUMA
          – 2 CPU sockets, each with 12 cores and local memory
          – Memory is accessible by all 24 cores, but local memory is faster
          – Linux tries to allocate local memory to application threads, i.e., from the local zone
          – Best suited for applications that fit in one local zone
      • NUMA optimization can degrade performance severely
          – Very high pgscand/s
          – Linux zone reclaim insists on finding memory in the local zone even though memory is plentiful in the other zone
          – Linux migrates memory, including THP, creating a vicious cycle of breaking up 2 MB pages, scanning for free 4 KB pages, and reassembling 4 KB pages into 2 MB pages
  • 9. Cause: Page Scan Attacks, Solutions
      • Turn off THP optimization and thus khugepaged
          – echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
          – Will not affect file IO or memory-mapped files
          – Red Hat, Oracle, and Hadoop recommend disabling THP
      • Turn off zone-reclaim optimization
          – sysctl -w vm.zone_reclaim_mode=0
          – Twitter recommends NUMA interleaving
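Whether THP is actually off can be checked by reading the same `enabled` file, where the kernel marks the active mode with square brackets (for example `always madvise [never]`). The sketch below is illustrative only: the helper names are hypothetical, and it tries the Red Hat path from the slide first, then the mainline path `/sys/kernel/mm/transparent_hugepage/enabled`.

```python
import re

THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # path used on the slide
    "/sys/kernel/mm/transparent_hugepage/enabled",         # mainline kernels
)


def active_thp_mode(text):
    """Return the bracketed mode from text such as 'always madvise [never]'."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


def thp_disabled(paths=THP_PATHS):
    """Return True or False for the first readable path, or None if none exists."""
    for path in paths:
        try:
            with open(path) as f:
                return active_thp_mode(f.read()) == "never"
        except OSError:
            continue
    return None
```

A `None` result means neither file was readable, which is expected on non-Linux hosts or on kernels built without THP.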
  • 10. Recommendations
      • Gatekeepers: SRE and SysOps
      • Safe to roll out fixes for GC attacks now
          – Linux: Flush changes more frequently and protect the heap
              • sysctl -w vm.dirty_writeback_centisecs=500
              • sysctl -w vm.dirty_expire_centisecs=500
              • sysctl -w vm.swappiness=0
          – JVM: Give the JVM heap all the memory it needs at start-up
              • -XX:+AlwaysPreTouch
              • Heap size per AutoTune
      • Roll out fixes for page scan attacks gradually
          – Best for back-end servers
          – Linux: Turn off THP and NUMA optimization
              • echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
              • sysctl -w vm.zone_reclaim_mode=0
          – Work with product groups to test on a small group of servers before applying changes to the rest