3. The CPU pipeline
● Most CPUs today are built using pipelines.
● This enables the CPU to clock at a higher rate, because each pipeline stage does less work per cycle.
● It also enables the CPU to use more of its infrastructure at the same time, and so to utilize the hardware better.
● It also speeds things up, because several instructions are processed in parallel, each at a different stage.
● Multi-core CPUs add yet more parallelism on top of this, since each core has its own pipeline.
4. The problems with pipelines
● The pipeline idea is heavily tied to the idea of branch prediction.
● If the CPU has not yet finished certain instructions and sees a branch following them, it needs to guess where the branch will go so it can keep feeding the pipeline.
● Otherwise the pipeline idea itself becomes problematic.
● That is because, without a guess, parallel execution of instructions inside the CPU stalls at every branch until the branch condition is known. A wrong guess is costly too: the instructions fetched along the wrong path have to be thrown away.
5. Hardware prediction
● The hardware itself does a rudimentary form of prediction.
● The hardware assumes that what happened before will happen again: a branch will tend to go the way it went in the past.
● This is OK, and the hardware does a good job even without assistance.
● It also means that a program whose branches are random will make the hardware mispredict, which slows execution down.
● Example: "if(random() & 1) {" (random() returns an integer, so this tests its lowest bit: a coin flip)
6. Software prediction
● Hardware manufacturers allow the software layer to tell the hardware which way a branch is likely to go, using hints left inside the assembly.
● The hardware manufacturers created special instructions for this.
● The compiler decides whether to use hinted or non-hinted branches.
● It will use the hinted ones only when it can guess where the branch will go.
● For instance: the branch at the end of a loop will tend to jump back to the start of the loop.
7. The problem of branching
● When the compiler sees a branch that is not part of a loop (if(condition)), it does not know the chances that the condition will evaluate to true.
● Therefore it will usually use a hint-less branch instruction.
● Unless you tell it otherwise.
● There are two ways to tell the compiler which way the branch will go.
8. First way - explicit hinting in the software
● You can use the __builtin_expect construct to hint at the right path.
● Instead of writing "if(x) {" you write: "if(__builtin_expect(!!(x), 1)) {" (the !! turns any nonzero value into 1, which is the value being expected).
● You can wrap this in a nice macro, like the Linux kernel's likely() and unlikely().
● The compiler will then lay out the code so that the likely path is the fall-through path and, on CPUs that support it, plant a hint telling the CPU that the condition is likely to be true.
● See example
9. Second way - using profile information
● You leave your code alone.
● You compile the code with -fprofile-arcs.
● Then you run it on a typical scenario, which creates files recording which way the branches went (an auxname.gcda file for each source file).
● Then you compile your code again with -fbranch-probabilities, which uses the gathered data to plant the right hints.
● In GCC you must compile both phases using exactly the same flags (otherwise expect problems).
● See example
10. Second way - PGO
● This whole approach is a subset of a bigger concept called PGO (Profile-Guided Optimization).
● This includes the branch prediction we saw before, but also other types of optimization (reordering of switch cases, for example).
● This is why you should use the more general flags -fprofile-generate and -fprofile-use, which imply the previous flags and add even more profile-guided optimizations.