Mon-3-Mar, 11:15am, Mathieu Poirier
LCA14-104: GTS - A solution to support ARM’s big.LITTLE technology
• Things to know about Global Task Scheduling (GTS).
• MP patchset description and how the solution works.
• Configuration parameters at various levels.
• Continuous integration at Linaro.
Today’s Presentation:
• This presentation is the lighter of two presentations Linaro has on GTS.
• The other runs for about 75 minutes and goes much deeper into the solution.
• If you are interested in the in-depth version please
contact Joe Bates: joe.bates@linaro.org
Other Presentations on GTS:
• A set of patches enacting Global Task Scheduling (GTS).
• Developed by ARM Ltd.
• GTS modifies the Linux scheduler in order to place tasks
on the best possible CPU.
• Advantages:
• Take full advantage of the asynchronous nature of the b.L architecture.
• Maximum performance
• Minimum power consumption
• Better benchmark scores for thread-intensive benchmarks.
• Increased responsiveness by spinning off new tasks on big CPUs.
• Decreases power consumption, specifically with small-task packing.
What is the MP Patchset?
• In a tarball from the release page:
• Always look for the latest “vexpress-lsk” release on releases.linaro.org
- e.g., for January:
http://releases.linaro.org/14.01/android/vexpress-lsk
• February should look like:
http://releases.linaro.org/14.02/android/vexpress-lsk
• In the Linaro Stable Kernel:
https://git.linaro.org/gitweb?p=kernel/linux-linaro-stable.git;a=summary
Where to get it
• In the ARM big LITTLE MP tree:
https://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git;a=summary
** Linaro doesn’t rebase the MP patchset on kernels other than the Linaro Stable Kernel.
Where to get it (continued)
• General Overview:
• The Linux kernel builds a hierarchy of scheduling domains at boot
time. The order is (Linux convention):
• Sibling (for Hyperthreading)
• MC - multi-core
• CPU - between clusters
• NUMA
• To understand how the kernel does this (see the sketch after this slide):
• Enable CONFIG_SCHED_DEBUG and
• set “sched_debug=1” on the kernel command line
• In a pure SMP context, load balancing is done by spreading tasks evenly among all processors.
• Maximisation of CPU resources
• Run-to-completion model
MP Patchset Description
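As a quick way to see which domain levels the kernel actually built on a given board, the small user-space sketch below (my own illustration, not part of the patchset) prints CPU0’s domain hierarchy as exposed under /proc/sys/kernel/sched_domain when CONFIG_SCHED_DEBUG is enabled; the exact layout of that directory depends on the kernel version, so treat the path as an assumption.

    /* Print CPU0's scheduling-domain hierarchy (requires CONFIG_SCHED_DEBUG). */
    #include <stdio.h>

    int main(void)
    {
        char path[128], name[64];

        for (int level = 0; level < 8; level++) {
            snprintf(path, sizeof(path),
                     "/proc/sys/kernel/sched_domain/cpu0/domain%d/name", level);
            FILE *f = fopen(path, "r");
            if (!f)
                break;                               /* no more domain levels */
            if (fgets(name, sizeof(name), f))
                printf("domain%d: %s", level, name); /* e.g. MC, CPU */
            fclose(f);
        }
        return 0;
    }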
Domain Load Balancing - no GTS
[Diagram: Vexpress (3x A7 + 2x A15). CFS balances within each cluster at the MC level and between the two clusters at the CPU level.]
• Classic load balancing between CPU domains (i.e. big and LITTLE) is disabled.
• A derivative of Paul Turner’s “load_avg_contrib” metric is
used to decide if a task should be moved to another
HMP domain.
Paul’s work: http://lwn.net/Articles/513135/
• Migration of tasks among the CPU domains is done by
comparing their loads with migration thresholds.
• By default, all new user tasks are placed on the big
cluster.
How MP Works
Domain Load Balancing - with GTS
[Diagram: same Vexpress (3x A7 + 2x A15) topology. CFS still balances within each cluster at the MC level, while GTS moves tasks between the two clusters.]
Load Average Contribution and Decay
[Plot of the “runnable_avg_sum” metric introduced by Paul Turner.]
• Paul Turner introduced the load average contribution
metric in his work on per-entity load tracking:
load_avg_contrib = task->weight * runnable_average
where runnable_average is:
runnable_average = runnable_avg_sum / runnable_avg_period
• runnable_avg_sum and runnable_avg_period are decayed geometric series (written out after this slide).
• load_avg_contrib is good for scheduling decisions but bad for task migration, i.e., weight scaling doesn’t reflect the true time a task spent in the runnable state.
Per Entity Load Tracking
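For reference, the two decayed sums can be written out explicitly (this restates Paul Turner’s per-entity load tracking design from the LWN article above; the roughly 1 ms accounting period and the y^32 = 1/2 decay constant are the mainline choices, recalled here rather than taken from this deck):

    runnable_avg_sum    = \sum_{i \ge 0} u_i \, y^i
    runnable_avg_period = \sum_{i \ge 0} p_i \, y^i ,  \quad\text{with } y^{32} = \tfrac{1}{2}

Here u_i is the time the entity was runnable during the i-th most recent period, p_i is that period’s length, and every contribution halves after about 32 periods (roughly 32 ms), so runnable_average = runnable_avg_sum / runnable_avg_period always lies between 0 and 1.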
• The MP patchset introduces the load average ratio:
load_avg_ratio = NICE_0_LOAD * runnable_average
• The load average ratio allows tasks to be compared without their weight factor, giving the same perspective for all of them.
• At migration time the load average ratio is compared against two thresholds (see the sketch after this slide):
• hmp_up_threshold
• hmp_down_threshold
Load Average Ratio
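To make the comparison concrete, here is a minimal, self-contained C sketch of the ratio and the threshold check (illustrative only; the real code lives inside the scheduler, and the threshold values used here are placeholders, not the patchset defaults):

    #include <stdio.h>

    #define NICE_0_LOAD 1024u

    /* Placeholder values; the real defaults ship with the MP patchset and
     * can be changed at run time through /sys/kernel/hmp. */
    static unsigned int hmp_up_threshold   = 700;
    static unsigned int hmp_down_threshold = 512;

    /* 0..1024: fraction of its recent (decayed) window the task was runnable. */
    static unsigned int load_avg_ratio(unsigned int runnable_avg_sum,
                                       unsigned int runnable_avg_period)
    {
        return NICE_0_LOAD * runnable_avg_sum / runnable_avg_period;
    }

    int main(void)
    {
        unsigned int ratio = load_avg_ratio(615, 1024);   /* ~60% runnable */

        if (ratio > hmp_up_threshold)
            printf("%u: candidate for up-migration to the big cluster\n", ratio);
        else if (ratio < hmp_down_threshold)
            printf("%u: candidate for down-migration to the LITTLE cluster\n", ratio);
        else
            printf("%u: stays in its current HMP domain\n", ratio);
        return 0;
    }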
Up and Down Migration Thresholds
A task’s load is compared to the up and down migration thresholds during the MP domain balancing process.
* Source: ARM Ltd.
• The Linux scheduler will separate CPUs into domains.
• Tasks are spread out among the domains as equally as
possible.
• For GTS, load balancing at the CPU domain level is disabled.
• GTS will move tasks between CPU domains using a
derivative of the load average contribution and a couple
of thresholds.
• But when does GTS move tasks between the CPU domains?
What We’ve Learned So Far
• 4 task migration points:
• When tasks are created (fork migration).
• At wakeup time (wakeup migration).
• With every scheduler tick (forced migration).
• When a CPU is about to become idle (idle pull).
Task Migration Points
• When tasks are created (fork migration):
• Done by setting the task’s load statistics to their maximum value.
• Tasks are placed on big CPUs unless they are:
• Kernel Threads
• Forked from init, e.g., Android services.
• Android apps are forked from Zygote, hence go on big CPUs.
• Tasks are eventually migrated down if they aren’t heavy enough (see the sketch after this slide).
Fork Migration
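A minimal sketch of the fork-time policy described above, with made-up types and helper names (the patchset implements this inside the scheduler; this only shows the decision logic):

    /* Illustrative fork-migration policy: new user tasks start "hot". */
    enum cluster { LITTLE_CLUSTER, BIG_CLUSTER };

    struct new_task {
        int is_kernel_thread;
        int parent_is_init;          /* e.g. Android services forked from init */
        unsigned int load_avg_ratio; /* tracked load, 0..1024 */
    };

    static enum cluster fork_migration(struct new_task *t)
    {
        if (t->is_kernel_thread || t->parent_is_init)
            return LITTLE_CLUSTER;

        /* Everything else (e.g. apps forked from Zygote) is assumed heavy:
         * seed the load statistics at the maximum so the task lands on a big
         * CPU and gets demoted later if it turns out to be light. */
        t->load_avg_ratio = 1024;
        return BIG_CLUSTER;
    }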
• At wakeup time (wakeup migration):
• When a task is to be placed on a CPU, the scheduler will normally
prefer:
• The previous CPU the task ran on
• Or one in the same package.
• For GTS, the decision is based on the load a task had before it was suspended (see the sketch after this slide):
• if load(task) > hmp_up_threshold, select more potent HMP domain
• if load(task) < hmp_down_threshold, select less powerful HMP
domain
• What happened in the past is likely to happen again.
Wakeup Migration
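The wakeup decision can be sketched the same way (again with invented names); the only GTS-specific part is the threshold check, everything else falls back to the scheduler’s usual preference for the previous CPU or one in the same package:

    enum hmp_domain { LITTLE_DOMAIN, BIG_DOMAIN };

    /* Pick an HMP domain at wakeup based on the load the task had before it
     * was suspended; otherwise keep the scheduler's normal choice. */
    static enum hmp_domain wakeup_migration(unsigned int prev_load_avg_ratio,
                                            enum hmp_domain prev_domain,
                                            unsigned int up_threshold,
                                            unsigned int down_threshold)
    {
        if (prev_load_avg_ratio > up_threshold)
            return BIG_DOMAIN;        /* more potent HMP domain */
        if (prev_load_avg_ratio < down_threshold)
            return LITTLE_DOMAIN;     /* less powerful HMP domain */
        return prev_domain;           /* previous CPU / same package as usual */
    }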
• With every scheduler tick (forced migration):
• Every CPU in the system has a scheduler tick.
• With each tick (minimum interval of one jiffy) a CPU’s runqueue is rebalanced if a balancing event is due.
• Each time the load balancer runs, the MP code will inspect the
runqueue of all CPUs in the system:
• If LITTLE CPU → can a task be moved to big cluster?
• if ((big CPU ) && (CPU overloaded)) → offload lightest task.
• When offloading, always select an idle CPU to ensure CPU availability
for the task.
• This ensures tasks are migrated as quickly as possible, since the domains can otherwise stay balanced for a long time (see the sketch after this slide).
Forced Migration
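A sketch of the per-tick pass, reduced to the two checks named above (the struct and helper names are invented for the illustration; the real code walks the actual runqueues):

    /* Illustrative forced-migration decision for one runqueue at tick time. */
    struct rq_view {
        int on_big_cluster;
        int nr_running;
        unsigned int heaviest_task_load;  /* highest load_avg_ratio queued */
    };

    enum forced_action { DO_NOTHING, MOVE_HEAVIEST_UP, OFFLOAD_LIGHTEST };

    static enum forced_action forced_migration(const struct rq_view *rq,
                                               unsigned int up_threshold,
                                               int idle_cpu_in_other_cluster)
    {
        if (!rq->on_big_cluster && rq->heaviest_task_load > up_threshold)
            return MOVE_HEAVIEST_UP;      /* LITTLE -> big */

        /* "Overloaded" is reduced to more than one runnable task in this
         * sketch. Offload only onto an idle CPU so the task keeps running. */
        if (rq->on_big_cluster && rq->nr_running > 1 && idle_cpu_in_other_cluster)
            return OFFLOAD_LIGHTEST;      /* big -> idle LITTLE CPU */

        return DO_NOTHING;
    }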
• When a CPU is about to become idle (idle pull):
• When a CPU is about to go idle the scheduler will attempt to pull
tasks away from other CPUs in the same domain.
• Happens only if the CPU average idle time is more than the estimated
migration cost.
• Balancing within a domain is left to normal scheduler operation.
• If the scheduler didn’t find any task to pull and the CPU is in the big cluster:
• Go through the runqueues of all online CPUs in the LITTLE cluster.
• If a task’s load is above threshold, move it to a CPU in the big cluster.
• When moving a task, always look for the least loaded CPU (see the sketch after this slide).
Idle Pull
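The idle-pull path, reduced to its cross-cluster part (a sketch with invented types; within-domain balancing stays with the normal scheduler, as the slide says):

    /* A big CPU about to go idle scans the LITTLE runqueues for a task whose
     * load is above the up threshold and pulls from the first one it finds. */
    struct little_rq_view {
        unsigned int heaviest_task_load;  /* highest load_avg_ratio queued */
    };

    static int idle_pull_candidate(const struct little_rq_view *little_rqs,
                                   int nr_little_cpus,
                                   unsigned int up_threshold)
    {
        for (int cpu = 0; cpu < nr_little_cpus; cpu++)
            if (little_rqs[cpu].heaviest_task_load > up_threshold)
                return cpu;   /* pull this CPU's heaviest task to the big CPU */
        return -1;            /* nothing worth pulling; go idle */
    }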
MP Migration Types
* Source: ARM Ltd.
• The scheduler will try to fit as many small tasks on a single CPU as possible.
• A small task is <= 90% of NICE_0_LOAD, i.e., 921.
• Done on the LITTLE cluster only to make sure tasks on the big cluster have all the CPU time they need.
• Takes place when a task is waking up (see the sketch after this slide):
• Uses the tracked load of CPU runqueues and tasks.
• A saturation threshold makes sure tasks offloaded from the big domain can keep being serviced.
• Effects of enabling small task packing:
• CPU operating point may increase → CPUfreq governor will kick in.
• Wakeup latency of task may increase → more tasks to run.
Small Task Packing
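A sketch of the wakeup-time packing test (the names and the exact shape of the check are mine; the saturation threshold itself is a compile-time tunable in the patchset, as noted on the GTS Tuning slide):

    #define NICE_0_LOAD      1024u
    #define SMALL_TASK_LIMIT ((90u * NICE_0_LOAD) / 100u)   /* 921 */

    /* Can this waking task be packed onto the given LITTLE CPU? */
    static int can_pack_here(unsigned int task_load_avg_ratio,
                             unsigned int rq_tracked_load,
                             unsigned int rq_saturation_threshold,
                             int cpu_is_little)
    {
        if (!cpu_is_little)
            return 0;                             /* pack on LITTLE only */
        if (task_load_avg_ratio > SMALL_TASK_LIMIT)
            return 0;                             /* not a small task */
        /* Leave headroom so tasks offloaded from big keep being serviced. */
        return rq_tracked_load + task_load_avg_ratio <= rq_saturation_threshold;
    }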
• Load balancing at the CPU domain level is disabled to
favour the GTS scheme.
• GTS works by comparing a task’s load average ratio against the migration thresholds and migrating it to a different HMP domain if need be.
• There are 4 migration points:
• At creation time.
• At wakeup time.
• Every rebalance.
• When a CPU is about to become idle.
• Small task packing when CPU gating is possible.
Key Things to Remember
• GTS doesn’t hotplug CPUs and is not concerned at all with hotplugging.
• When hotplugging:
• It takes too long to bring a CPU in and out of service:
• All smpboot threads need to be stopped.
• “stop_machine” threads suspend interrupts on all online CPUs.
• IRQs on the swapped CPU are diverted to another CPU.
• All processes in swapped CPU’s runqueue are migrated.
• CPU is taken out of coherency.
• More CPUs means longer hotplug time per CPU.
• Very expensive to make a CPU coherent with the domain hierarchy
again.
• The system needs intelligence to determine when CPUs will be
swapped in and out.
One Last Remark
• The GTS solution itself has a number of parameters that
can be tuned. Examples:
• From /sys/kernel/hmp (a user-space example follows this slide):
• up_threshold, down_threshold for task migration limits
• load_avg_period_ms and frequency_invariant_load_scale
• From the code:
• runqueue saturation when doing small task packing
• Number of tasks on a runqueue to search when force-migrating between domains
GTS Tuning
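For completeness, the sysfs knobs listed above can be driven from user space; the snippet below is just the C equivalent of "echo 700 > /sys/kernel/hmp/up_threshold" (700 is an arbitrary example value, the write needs root, and the valid range depends on the patchset version):

    /* Set the HMP up-migration threshold; run as root. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/kernel/hmp/up_threshold", "w");

        if (!f) {
            perror("/sys/kernel/hmp/up_threshold");
            return 1;
        }
        fprintf(f, "700\n");    /* arbitrary example value */
        fclose(f);
        return 0;
    }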
• Linaro and ARM have been using the “interactive”
governor in their testing of the solution.
• Any governor can be used.
• The b.L CPUfreq driver makes the architecture transparent to the governor.
• Example of interactive governor tuneables:
• hispeed_freq and go_hispeed_load
• target_loads
• timer_rate and min_sample_time
• above_hispeed_delay
• Governors will have tuneable parameters.
• Regardless of the governor used, there are parameters to adjust in order to yield the right behavior.
• Default values are usually not what you want.
CPUFreq Governor Tuning
• As Linaro assimilates MP patches into the LSK, continuous integration testing is done daily to catch possible regressions.
• We run bbench with an audio track in the background - a good average test case.
• exercises both big and LITTLE clusters
• All automated in our LAVA environment and results
verified each day.
• Full WA (Workload Automation) regression tests with each monthly release.
• TC2 is the only b.L platform being tested at Linaro - we’d
welcome other platforms.
MP Testing at Linaro
Questions and Acknowledgements
Special thanks to:
Chris Redpath (ARM)
Robin Randhawa (ARM)
Vincent Guittot (Linaro)
More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members