SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Analysis of Multithreading Capabilities
of Current High-Performance
Processors
Kamil Kędzierski, Francisco J. Cazorla, Mateo Valero
kkedzier@ac.upc.edu, francisco.cazorla@bsc.es, mateo@ac.upc.edu
Outline
• Introduction
• Implementations
• Performance
• Conclusions
• References
Introduction
Implementation
Performance
Conclusions
Do we need multithreading?
• Problem:
resources not used efficiently
in a super-scalar processor
• Goal:
increase efficiency – do more
computation with similar
resource amount
• One possible solution:
SMT – Simultaneous
Multithreading
[ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ]
Introduction
Implementation
Performance
Conclusions
Motivation
History
How did we get to SMT?
• Multithreading first appeared in
the 1950s, and Simultaneous
Multithreading was investigated
by IBM in 1968
• Super-scalar architectures: only
one thread is running at a given
time
• Multiprocessor: each thread uses
a different set of resources
[ http://www.cs.clemson.edu/~mark/multithreading.html ]
[ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ]
Introduction
Implementation
Performance
Conclusions
Motivation
History
How did we get to SMT?
• Coarse-grain MT: thread switch on a long-latency operations
– a.k.a. "concurrent multithreading (CMT)“, "switch-on-event multithreading (SOEMT)“,
"dynamic blocked multithreading (BMT)"
• Fine-grain MT: thread switch
not only on a long-latency ops
– a.k.a. "vertical multithreading“,
"interleaved multithreading (IMT)“,
"time-sharing multithreading“,
"hardware multiprogramming" (1958)
• Simultaneous MT:
ops issue from different threads
at the same clock cycle
[ http://www.cs.clemson.edu/~mark/multithreading.html ]
[ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ]
Introduction
Implementation
Performance
Conclusions
Motivation
History
Outline
• Introduction
• Implementations
• Performance
• Conclusions
• References
Introduction
Implementations
Performance
Conclusions
Studied implementations
• IBM’s POWER5
• Intel’s Xeon (Pentium 4)
POWER5 die photo
[B. Sinharoy et al,“POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
SMT implementation in POWER5
• SMT added to super-scalar architecture
– Second PC
– Return stack in branch prediction logic duplicated
– GPR/FPR rename mapper expanded for second set of registers
– Completion logic duplicated
– Thread bit added to most address/tag busses
[Ron Kalla et al,”Simultaneous Multi-threading Implementation in POWER5”, HotChips, 2003 ]
[B. Sinharoy et al “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
POWER5 architecture
[B. Sinharoy et al “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
SMT implementation in Xeon
• SMT added to super-scalar architecture
– Second PC
– ITLB duplicated
– Microcode queue duplicated
– Return stack buffer duplicated
– Rename logic duplicated
– Thread bit added to most address/tag busses
[Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
Xeon architecture
Additional stages in case of
cache miss – used to fill the
Trace Cache
[Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ]
[David Burns “Pre-Silicon Validation of Hyper-Threading Technology”, Intel Technology Journal, Volume 6, Issue 1, 2002 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
Simplified pipeline’s views
• IBM’s POWER5
• Intel’s Xeon (Pentium 4)
[ Kamil Kedzierski et al, “Analysis of Simultaneous Multithreading Implementations in Current High-Performance Processors”,
ACACES Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems
2006 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
SMT implementations comparison
[ Kamil Kedzierski et al, “Analysis of Simultaneous Multithreading Implementations in Current High-Performance Processors”,
ACACES Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems
2006 ]
Resource POWER5 Intel Xeon
PC duplicated duplicated
Instruction Cache shared shared
ITLB shared duplicated
DTLB shared shared
BHT shared shared
Return Stack Buffer duplicated duplicated
Decode shared shared
Instruction Buffer/
uCode Queue
duplicated duplicated
Group Formation shared -
Mapping/Rename shared duplicated
IQ shared partitioned
Scheduler - shared
Register Read shared shared
FUs shared shared
Store Queues duplicated partitioned
Register Write shared shared
GCT/Commit duplicated partitioned
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
Outline
• Introduction
• Implementations
• Performance
• Conclusions
• References
Introduction
Implementations
Performance
Conclusions
POWER5 performance
0,97 + 0,97 = 1,94
[ R. Kalla et al ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004 ]
[ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
POWER5 performance
0,85 + 1 = 1,85
[ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ]
[ R. Kalla et al ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
POWER5 performance
0,85 + 1 = 1,85
[ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ]
[ R. Kalla et al ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004 ]
Shows only similar
trends!
Pictures for two different
architectures
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
Xeon performance
Varying processors in a system Varying server workloads
[Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
Average performance comparison
• POWER5’s performance is reported to increase up to 41%
• Xeon’s performance is reported to increase up to 28%
• BUT POWER5 is dual-core architecture 
[Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ]
[B. Sinharoy et al “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ]
Introduction
Implementations
Performance
Conclusions
IBM’s POWER5
Intel’s Xeon
Comparison
Outline
• Introduction
• Implementations
• Performance
• Conclusions
• References
Introduction
Implementations
Performance
Conclusions
Conclusions
• Two architectures have been studied (POWER5 and Xeon)
• Adding second thread usually increases performance
– Heavy depends on a given workload
• Most of the resources are shared
• Only the crucial ones are duplicated/partitioned:
– Return Stack Buffer, Instruction Buffer/uCode Queue, Store Queues
and GCT/Commit logic
• In addition Xeon more often duplicates/partitiones than shares:
– ITLB, Rename and IQ
• The thread level parallelism is a good answer on how to increase
performance without drastically increasing the resources
Introduction
Implementations
Performance
Conclusions
Conclusions
References
References
1. R. Kalla, B. Sinharoy, J. M. Tendler, ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO
2004, pages 40-47
2. B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner “POWER5 system
microarchitecture”, IBM Journal of Research and Development 2005, pages 505-521
3. H. M. Mathis, A. E. Mericas, J. D. McCalpin, R. J. Eickemeyer, and S. R. Kunkel “Characterization of
simultaneous multithreading (SMT) efficiency in POWER5”, IBM Journal of Research and Development 2005,
pages 555-564
4. Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton
“Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Volume 6, Issue 1,
2002
5. David Burns “Pre-Silicon Validation of Hyper-Threading Technology”, Intel Technology Journal, Volume 6, Issue
1, 2002
6. Ron Kalla, BalaramSinharoy, Joel Tendler, ”Simultaneous Multi-threading Implementation in POWER5”, A
Symposium on High Performance Chips (HotChips), 19th August 2003,
7. G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. “The microarchitecture of the
Pentium 4 processor”, Intel Technology Journal, Volume 5, Issue 1, 2001
8. Joel Emer, “EV8:The Post-Ultimate Alpha”, International Conference on Parallel Architectures and Compilation
Techniques, page 155, 2001
9. Shubu Mukherjee, “The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond
Y2K”, 1999, http://www.cs.wisc.edu/~shubu/talks/alpha.ppt
10. Rick Merritt, “Designers cut fresh paths to parallelism”, EETimes
http://www.eetimes.com/story/OEG19991008S0014
11. Francisco J. Cazorla Almeida, “Quality of Service for Simultaneous Multithreading Processors (QoS for SMT
Processors)”, PhD Thesis, DAC, UPC, 2005
12. Kamil Kędzierski, Francisco J. Cazorla and Mateo Valero, “Analysis of Simultaneous Multithreading
Implementations in Current High-Performance Processors”, ACACES Second International Summer School on
Advanced Computer Architecture and Compilation for Embedded Systems, Poster Session, July 26th
, 2006
pages 113-116
13. http://www.cs.clemson.edu/~mark/multithreading.html
Introduction
Implementations
Performance
Conclusions
Conclusions
References

Weitere ähnliche Inhalte

Ähnlich wie Analysis of Multithreading Capabilities of Current High–Performance Processors 2005

Computer Architechture and Organization
Computer Architechture and OrganizationComputer Architechture and Organization
Computer Architechture and OrganizationAiman Hafeez
 
Computer Architecture
Computer ArchitectureComputer Architecture
Computer ArchitectureHaris456
 
29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technologySindhu Nathan
 
Loyd Baker: MBSE - connecting the dots process with loyd baker
Loyd Baker: MBSE - connecting the dots process with loyd bakerLoyd Baker: MBSE - connecting the dots process with loyd baker
Loyd Baker: MBSE - connecting the dots process with loyd bakerEnergyTech2015
 
Hardware and Software Architectures for the CELL BROADBAND ENGINE processor
Hardware and Software Architectures for the CELL BROADBAND ENGINE processorHardware and Software Architectures for the CELL BROADBAND ENGINE processor
Hardware and Software Architectures for the CELL BROADBAND ENGINE processorSlide_N
 
Embedded system Design
Embedded system DesignEmbedded system Design
Embedded system DesignAJAL A J
 
The Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCThe Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCinside-BigData.com
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scale Up Performance with Intel® Development
Scale Up Performance with Intel® DevelopmentScale Up Performance with Intel® Development
Scale Up Performance with Intel® DevelopmentIntel IT Center
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureTalbott Crowell
 
PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko Neotys
 
Comparative Study of RISC AND CISC Architectures
Comparative Study of RISC AND CISC ArchitecturesComparative Study of RISC AND CISC Architectures
Comparative Study of RISC AND CISC ArchitecturesEditor IJCATR
 
FYP Presentation MAREI
FYP Presentation MAREIFYP Presentation MAREI
FYP Presentation MAREIMohamed Marei
 
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...Kento Aoyama
 
UNIT I_Introduction.pptx
UNIT I_Introduction.pptxUNIT I_Introduction.pptx
UNIT I_Introduction.pptxssuser4ca1eb
 

Ähnlich wie Analysis of Multithreading Capabilities of Current High–Performance Processors 2005 (20)

Computer Architechture and Organization
Computer Architechture and OrganizationComputer Architechture and Organization
Computer Architechture and Organization
 
Computer Architecture
Computer ArchitectureComputer Architecture
Computer Architecture
 
Embedded System-design technology
Embedded System-design technologyEmbedded System-design technology
Embedded System-design technology
 
29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology
 
Loyd Baker: MBSE - connecting the dots process with loyd baker
Loyd Baker: MBSE - connecting the dots process with loyd bakerLoyd Baker: MBSE - connecting the dots process with loyd baker
Loyd Baker: MBSE - connecting the dots process with loyd baker
 
Hardware and Software Architectures for the CELL BROADBAND ENGINE processor
Hardware and Software Architectures for the CELL BROADBAND ENGINE processorHardware and Software Architectures for the CELL BROADBAND ENGINE processor
Hardware and Software Architectures for the CELL BROADBAND ENGINE processor
 
Embedded system Design
Embedded system DesignEmbedded system Design
Embedded system Design
 
The Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCThe Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPC
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 
Scale Up Performance with Intel® Development
Scale Up Performance with Intel® DevelopmentScale Up Performance with Intel® Development
Scale Up Performance with Intel® Development
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore Future
 
Interview
InterviewInterview
Interview
 
PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko
 
Construct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoTConstruct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoT
 
Comparative Study of RISC AND CISC Architectures
Comparative Study of RISC AND CISC ArchitecturesComparative Study of RISC AND CISC Architectures
Comparative Study of RISC AND CISC Architectures
 
FYP Presentation MAREI
FYP Presentation MAREIFYP Presentation MAREI
FYP Presentation MAREI
 
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
 
Developing Digital Twins
Developing Digital TwinsDeveloping Digital Twins
Developing Digital Twins
 
UNIT I_Introduction.pptx
UNIT I_Introduction.pptxUNIT I_Introduction.pptx
UNIT I_Introduction.pptx
 

Kürzlich hochgeladen

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Kürzlich hochgeladen (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Analysis of Multithreading Capabilities of Current High–Performance Processors 2005

  • 1. Analysis of Multithreading Capabilities of Current High-Performance Processors Kamil Kędzierski, Francisco J. Cazorla, Mateo Valero kkedzier@ac.upc.edu, francisco.cazorla@bsc.es, mateo@ac.upc.edu
  • 2. Outline • Introduction • Implementations • Performance • Conclusions • References Introduction Implementation Performance Conclusions
  • 3. Do we need multithreading? • Problem: resources not used efficiently in a super-scalar processor • Goal: increase efficiency – do more computation with similar resource amount • One possible solution: SMT – Simultaneous Multithreading [ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ] Introduction Implementation Performance Conclusions Motivation History
  • 4. How did we get to SMT? • Multithreading first appeared in the 1950s, and Simultaneous Multithreading was investigated by IBM in 1968 • Super-scalar architectures: only one thread is running at a given time • Multiprocessor: each thread uses a different set of resources [ http://www.cs.clemson.edu/~mark/multithreading.html ] [ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ] Introduction Implementation Performance Conclusions Motivation History
  • 5. How did we get to SMT? • Coarse-grain MT: thread switch on a long-latency operations – a.k.a. "concurrent multithreading (CMT)“, "switch-on-event multithreading (SOEMT)“, "dynamic blocked multithreading (BMT)" • Fine-grain MT: thread switch not only on a long-latency ops – a.k.a. "vertical multithreading“, "interleaved multithreading (IMT)“, "time-sharing multithreading“, "hardware multiprogramming" (1958) • Simultaneous MT: ops issue from different threads at the same clock cycle [ http://www.cs.clemson.edu/~mark/multithreading.html ] [ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ] Introduction Implementation Performance Conclusions Motivation History
  • 6. Outline • Introduction • Implementations • Performance • Conclusions • References Introduction Implementations Performance Conclusions
  • 7. Studied implementations • IBM’s POWER5 • Intel’s Xeon (Pentium 4) POWER5 die photo [B. Sinharoy et al,“POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 8. SMT implementation in POWER5 • SMT added to super-scalar architecture – Second PC – Return stack in branch prediction logic duplicated – GPR/FPR rename mapper expanded for second set of registers – Completion logic duplicated – Thread bit added to most address/tag busses [Ron Kalla et al,”Simultaneous Multi-threading Implementation in POWER5”, HotChips, 2003 ] [B. Sinharoy et al “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 9. POWER5 architecture [B. Sinharoy et al “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 10. SMT implementation in Xeon • SMT added to super-scalar architecture – Second PC – ITLB duplicated – Microcode queue duplicated – Return stack buffer duplicated – Rename logic duplicated – Thread bit added to most address/tag busses [Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 11. Xeon architecture Additional stages in case of cache miss – used to fill the Trace Cache [Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ] [David Burns “Pre-Silicon Validation of Hyper-Threading Technology”, Intel Technology Journal, Volume 6, Issue 1, 2002 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 12. Simplified pipeline’s views • IBM’s POWER5 • Intel’s Xeon (Pentium 4) [ Kamil Kedzierski et al, “Analysis of Simultaneous Multithreading Implementations in Current High-Performance Processors”, ACACES Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems 2006 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 13. SMT implementations comparison [ Kamil Kedzierski et al, “Analysis of Simultaneous Multithreading Implementations in Current High-Performance Processors”, ACACES Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems 2006 ] Resource POWER5 Intel Xeon PC duplicated duplicated Instruction Cache shared shared ITLB shared duplicated DTLB shared shared BHT shared shared Return Stack Buffer duplicated duplicated Decode shared shared Instruction Buffer/ uCode Queue duplicated duplicated Group Formation shared - Mapping/Rename shared duplicated IQ shared partitioned Scheduler - shared Register Read shared shared FUs shared shared Store Queues duplicated partitioned Register Write shared shared GCT/Commit duplicated partitioned Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 14. Outline • Introduction • Implementations • Performance • Conclusions • References Introduction Implementations Performance Conclusions
  • 15. POWER5 performance 0,97 + 0,97 = 1,94 [ R. Kalla et al ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004 ] [ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 16. POWER5 performance 0,85 + 1 = 1,85 [ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ] [ R. Kalla et al ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 17. POWER5 performance 0,85 + 1 = 1,85 [ Francisco J. Cazorla, “QoS for SMT Processors”, PhD Thesis, DAC, UPC, 2005 ] [ R. Kalla et al ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004 ] Shows only similar trends! Pictures for two different architectures Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 18. Xeon performance Varying processors in a system Varying server workloads [Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 19. Average performance comparison • POWER5’s performance is reported to increase up to 41% • Xeon’s performance is reported to increase up to 28% • BUT POWER5 is dual-core architecture  [Deborah T. Marr et al “Hyper-Threading Technology Arch. and Microarchitecture”, Intel Technology Journal, Vol. 6, Issue 1, 2002 ] [B. Sinharoy et al “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005 ] Introduction Implementations Performance Conclusions IBM’s POWER5 Intel’s Xeon Comparison
  • 20. Outline • Introduction • Implementations • Performance • Conclusions • References Introduction Implementations Performance Conclusions
  • 21. Conclusions • Two architectures have been studied (POWER5 and Xeon) • Adding second thread usually increases performance – Heavy depends on a given workload • Most of the resources are shared • Only the crucial ones are duplicated/partitioned: – Return Stack Buffer, Instruction Buffer/uCode Queue, Store Queues and GCT/Commit logic • In addition Xeon more often duplicates/partitiones than shares: – ITLB, Rename and IQ • The thread level parallelism is a good answer on how to increase performance without drastically increasing the resources Introduction Implementations Performance Conclusions Conclusions References
  • 22. References 1. R. Kalla, B. Sinharoy, J. M. Tendler, ”IBM POWER5 Chip: A Dual-Core Multithreaded Processor”, IEEE MICRO 2004, pages 40-47 2. B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner “POWER5 system microarchitecture”, IBM Journal of Research and Development 2005, pages 505-521 3. H. M. Mathis, A. E. Mericas, J. D. McCalpin, R. J. Eickemeyer, and S. R. Kunkel “Characterization of simultaneous multithreading (SMT) efficiency in POWER5”, IBM Journal of Research and Development 2005, pages 555-564 4. Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton “Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Volume 6, Issue 1, 2002 5. David Burns “Pre-Silicon Validation of Hyper-Threading Technology”, Intel Technology Journal, Volume 6, Issue 1, 2002 6. Ron Kalla, BalaramSinharoy, Joel Tendler, ”Simultaneous Multi-threading Implementation in POWER5”, A Symposium on High Performance Chips (HotChips), 19th August 2003, 7. G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. “The microarchitecture of the Pentium 4 processor”, Intel Technology Journal, Volume 5, Issue 1, 2001 8. Joel Emer, “EV8:The Post-Ultimate Alpha”, International Conference on Parallel Architectures and Compilation Techniques, page 155, 2001 9. Shubu Mukherjee, “The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K”, 1999, http://www.cs.wisc.edu/~shubu/talks/alpha.ppt 10. Rick Merritt, “Designers cut fresh paths to parallelism”, EETimes http://www.eetimes.com/story/OEG19991008S0014 11. Francisco J. Cazorla Almeida, “Quality of Service for Simultaneous Multithreading Processors (QoS for SMT Processors)”, PhD Thesis, DAC, UPC, 2005 12. Kamil Kędzierski, Francisco J. Cazorla and Mateo Valero, “Analysis of Simultaneous Multithreading Implementations in Current High-Performance Processors”, ACACES Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems, Poster Session, July 26th , 2006 pages 113-116 13. http://www.cs.clemson.edu/~mark/multithreading.html Introduction Implementations Performance Conclusions Conclusions References