SlideShare ist ein Scribd-Unternehmen logo
1 von 10
The lesser known GCC optimizations
Mark Veltzer
mark@veltzer.net
Using profile information for optimization
The CPU pipe line
â—Ź Most CPUs today are built using pipelines
â—Ź This enables the CPU to clock at a higher rate
â—Ź This also enables the CPU to use more of it's
infrastructure at the same time and so utilizing
the hardware better.
â—Ź It also speed things up because computation is
done in parallel.
â—Ź This is even more so in multi-core CPUs.
The problems with pipe lines
â—Ź The pipe line idea is heavily tied to the idea of branch
prediction
â—Ź If the CPU has not yet finished certain instructions
and sees a branch following them then it needs to be
able to guess where the branch will go
â—Ź Otherwise the pipeline idea itself becomes
problematic.
â—Ź That is because the execution of instructions in
parallel at the CPU level is halted every time there is
a branch.
Hardware prediction
â—Ź The hardware itself does a rudimentary form of
prediction.
â—Ź The assumption of the hardware is that what
happened before will happen again (order)
â—Ź This is OK and hardware does a good job even
without assistance.
â—Ź This means that a random program will make the
hardware miss-predict and will cause execution
speed to go down.
â—Ź Example: "if(random()<0.5) {"
Software prediction
â—Ź The hardware manufacturers allow the software layer
to tell the hardware which way a branch could go using
hints left inside the assembly.
â—Ź Special instructions were created for this by the
hardware manufacturers
â—Ź The compiler decides whether to use hinted branches
or non hinted ones.
â—Ź It will use the hinted ones only when it can guess where
the branch will go.
â—Ź For instance: in a loop the branch will tend to go back
to the loop.
The problem of branching
â—Ź When the compiler sees a branch not as part of
a loop (if(condition)) it does not know what are
the chances that the condition will evaluate to
true.
â—Ź Therefor it will usually use a hint-less branch
statement.
â—Ź Unless you tell it otherwise.
â—Ź There are two ways to tell the compiler which
way the branch will go.
First way - explicit hinting in the
software
â—Ź You can use the __builtin_expect construct to
hint at the right path.
â—Ź Instead of writing "if(x) {" you write:
"if(__builtin_expect((x),1)) {"
â—Ź You can wrap this in a nice macro like the Linux
kernel folk did.
â—Ź The compiler will plant a hint to the CPU telling
it that the branch is likely to succeed.
â—Ź See example
Second way - using profile
information
â—Ź You leave your code alone.
â—Ź You compile the code with -fprofile-arcs.
â—Ź Then you run it on a typical scenario creating files which
show which way branches went (auxname.gcda for each
file).
â—Ź Then you compile your code again with -fbranch-
probabilities which uses the gathered data to plant the
right hints.
â—Ź In GCC you must compile the phases using the exact
same flags (otherwise expect problems)
â—Ź See example
Second way - PGO
â—Ź This whole approach is a subset of a bigger
concept called PGO – Profile Generated
Optimizations
â—Ź This includes the branch prediction we saw
before but also other types of optimization
(reordering of switch cases as an example).
â—Ź This is why you should use the more general
flags -fprofile-generate and -fprofile-use which
imply the previous flags and add even more
profile generated optimization.

Weitere ähnliche Inhalte

Was ist angesagt?

Trace kernel code tips
Trace kernel code tipsTrace kernel code tips
Trace kernel code tipsViller Hsiao
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationgjcross
 
[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtoolingDouglas Chen
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniquesTuomas Suutari
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & ProfilingIsuru Perera
 
GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?Saket Pathak
 
Flux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance ServersFlux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance ServersEmery Berger
 
Compiling Under Linux
Compiling Under LinuxCompiling Under Linux
Compiling Under LinuxPierreMASURE
 
Gatling workshop lets test17
Gatling workshop lets test17Gatling workshop lets test17
Gatling workshop lets test17Gerald Muecke
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPANdaoswald
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingScyllaDB
 
Emulation Error Recovery
Emulation Error RecoveryEmulation Error Recovery
Emulation Error Recoverysomnathb1
 
OpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideOpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideTim Burks
 
Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...Srinivasan Venkataramanan
 
Three Lessons about Gatling and Microservices
Three Lessons about Gatling and MicroservicesThree Lessons about Gatling and Microservices
Three Lessons about Gatling and MicroservicesDragos Manolescu
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUGslandelle
 

Was ist angesagt? (20)

Trace kernel code tips
Trace kernel code tipsTrace kernel code tips
Trace kernel code tips
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
 
[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniques
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & Profiling
 
C under Linux
C under LinuxC under Linux
C under Linux
 
GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?
 
Flux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance ServersFlux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance Servers
 
Test driving QML
Test driving QMLTest driving QML
Test driving QML
 
Concurrency patterns
Concurrency patternsConcurrency patterns
Concurrency patterns
 
Compiling Under Linux
Compiling Under LinuxCompiling Under Linux
Compiling Under Linux
 
Gatling workshop lets test17
Gatling workshop lets test17Gatling workshop lets test17
Gatling workshop lets test17
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPAN
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
Emulation Error Recovery
Emulation Error RecoveryEmulation Error Recovery
Emulation Error Recovery
 
OpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideOpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-Side
 
Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...
 
Gatling
GatlingGatling
Gatling
 
Three Lessons about Gatling and Microservices
Three Lessons about Gatling and MicroservicesThree Lessons about Gatling and Microservices
Three Lessons about Gatling and Microservices
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
 

Andere mochten auch

NetBeans para Java, C, C++
NetBeans para Java, C, C++NetBeans para Java, C, C++
NetBeans para Java, C, C++Manuel Antonio
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler designJanani Parthiban
 
Introduction to Perl - Day 1
Introduction to Perl - Day 1Introduction to Perl - Day 1
Introduction to Perl - Day 1Dave Cross
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)SĹ‚awomir Zborowski
 
MinGw Compiler
MinGw CompilerMinGw Compiler
MinGw CompilerAvnish Patel
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkAlexey Smirnov
 

Andere mochten auch (10)

HRM - PM in GCC
HRM - PM in GCCHRM - PM in GCC
HRM - PM in GCC
 
NetBeans para Java, C, C++
NetBeans para Java, C, C++NetBeans para Java, C, C++
NetBeans para Java, C, C++
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler design
 
Introduction to Perl - Day 1
Introduction to Perl - Day 1Introduction to Perl - Day 1
Introduction to Perl - Day 1
 
GCC, GNU compiler collection
GCC, GNU compiler collectionGCC, GNU compiler collection
GCC, GNU compiler collection
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)
 
MinGw Compiler
MinGw CompilerMinGw Compiler
MinGw Compiler
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions Framework
 
Deep C
Deep CDeep C
Deep C
 
GCC
GCCGCC
GCC
 

Ă„hnlich wie Gcc opt

AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORScscpconf
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processorscsandit
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming Iđź’» Anton Gerdelan
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskipGerard Klijs
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmAnne Nicolas
 
On component interface
On component interfaceOn component interface
On component interfaceLaurence Chen
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdownAditya Kamat
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus cagingBertrandDrouvot
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsMatthew Dennis
 

Ă„hnlich wie Gcc opt (20)

Multicore
MulticoreMulticore
Multicore
 
Computer architecture
Computer architectureComputer architecture
Computer architecture
 
Oversimplified CA
Oversimplified CAOversimplified CA
Oversimplified CA
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
Java under the hood
Java under the hoodJava under the hood
Java under the hood
 
Realtime
RealtimeRealtime
Realtime
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
On component interface
On component interfaceOn component interface
On component interface
 
Introduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimizationIntroduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimization
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdown
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus caging
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 
Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
 

Mehr von Mark Veltzer

Mehr von Mark Veltzer (10)

Gcc
GccGcc
Gcc
 
Linux io
Linux ioLinux io
Linux io
 
Linux logging
Linux loggingLinux logging
Linux logging
 
Linux monitoring
Linux monitoringLinux monitoring
Linux monitoring
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
 
Linux tmpfs
Linux tmpfsLinux tmpfs
Linux tmpfs
 
Pthreads linux
Pthreads linuxPthreads linux
Pthreads linux
 
Streams
StreamsStreams
Streams
 
Volatile
VolatileVolatile
Volatile
 
Effective cplusplus
Effective cplusplusEffective cplusplus
Effective cplusplus
 

KĂĽrzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

KĂĽrzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Gcc opt

  • 1. The lesser known GCC optimizations Mark Veltzer mark@veltzer.net
  • 2. Using profile information for optimization
  • 3. The CPU pipe line â—Ź Most CPUs today are built using pipelines â—Ź This enables the CPU to clock at a higher rate â—Ź This also enables the CPU to use more of it's infrastructure at the same time and so utilizing the hardware better. â—Ź It also speed things up because computation is done in parallel. â—Ź This is even more so in multi-core CPUs.
  • 4. The problems with pipe lines â—Ź The pipe line idea is heavily tied to the idea of branch prediction â—Ź If the CPU has not yet finished certain instructions and sees a branch following them then it needs to be able to guess where the branch will go â—Ź Otherwise the pipeline idea itself becomes problematic. â—Ź That is because the execution of instructions in parallel at the CPU level is halted every time there is a branch.
  • 5. Hardware prediction â—Ź The hardware itself does a rudimentary form of prediction. â—Ź The assumption of the hardware is that what happened before will happen again (order) â—Ź This is OK and hardware does a good job even without assistance. â—Ź This means that a random program will make the hardware miss-predict and will cause execution speed to go down. â—Ź Example: "if(random()<0.5) {"
  • 6. Software prediction â—Ź The hardware manufacturers allow the software layer to tell the hardware which way a branch could go using hints left inside the assembly. â—Ź Special instructions were created for this by the hardware manufacturers â—Ź The compiler decides whether to use hinted branches or non hinted ones. â—Ź It will use the hinted ones only when it can guess where the branch will go. â—Ź For instance: in a loop the branch will tend to go back to the loop.
  • 7. The problem of branching â—Ź When the compiler sees a branch not as part of a loop (if(condition)) it does not know what are the chances that the condition will evaluate to true. â—Ź Therefor it will usually use a hint-less branch statement. â—Ź Unless you tell it otherwise. â—Ź There are two ways to tell the compiler which way the branch will go.
  • 8. First way - explicit hinting in the software â—Ź You can use the __builtin_expect construct to hint at the right path. â—Ź Instead of writing "if(x) {" you write: "if(__builtin_expect((x),1)) {" â—Ź You can wrap this in a nice macro like the Linux kernel folk did. â—Ź The compiler will plant a hint to the CPU telling it that the branch is likely to succeed. â—Ź See example
  • 9. Second way - using profile information â—Ź You leave your code alone. â—Ź You compile the code with -fprofile-arcs. â—Ź Then you run it on a typical scenario creating files which show which way branches went (auxname.gcda for each file). â—Ź Then you compile your code again with -fbranch- probabilities which uses the gathered data to plant the right hints. â—Ź In GCC you must compile the phases using the exact same flags (otherwise expect problems) â—Ź See example
  • 10. Second way - PGO â—Ź This whole approach is a subset of a bigger concept called PGO – Profile Generated Optimizations â—Ź This includes the branch prediction we saw before but also other types of optimization (reordering of switch cases as an example). â—Ź This is why you should use the more general flags -fprofile-generate and -fprofile-use which imply the previous flags and add even more profile generated optimization.