01 intel processor architecture core
sssuhas
1.
Intel® Core™ Microarchitecture
Intel® Software College
2.
Objectives
After completion of this module you will be able to describe:
• Components of an IA processor
• Working flow of the instruction pipeline
• Notable features of the architecture
Intel® Processor Micro-architecture - Core® microarchitecture
Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
3.
Agenda
• Introduction
• Knowledge preparation
• Notable features
• Micro-architecture tour
• Coding considerations
4.
Agenda
• Introduction
• Knowledge preparation
• Notable features
• Micro-architecture tour
• Coding considerations
5.
Industrial Recognition
• PC Format, May 2006: “Intel Strikes Back! Conroe is the name. Pistol-whipping Athlon 64s into burger meat is the game.”
• “Intel's Next Generation Microarchitecture Unveiled”, Real World Tech: “Just as important as the technical innovations in Core MPUs, this microarchitecture will have a profound impact on the industry.”
• “Intel Dishes the Knockout Punch to AMD with Conroe”, GD Hardware.com: “…the results were far more than we could hope for and it'll be amusing to see AMD's response to this beat-down session”
• “Intel Regains Performance Crown”, Anandtech: “… At 2.8 or 3.0GHz, a Conroe EE would offer even stronger performance than what we’ve seen here.”
• “Intel Reveals Conroe Architecture”, Extremetech: “… And not only was the Intel system running at 2.66GHz, a slower clock rate than the top Pentium 4, it was outpacing an overclocked Athlon 64 FX-60. Wrap your brain around that idea for a bit…”
• “Conroe Benchmarks - Intel Showing Big Strength”, Hot Hardware.com: “… Intel is poised to change the face of the desktop computing landscape…”
6.
Performance Summary
Intel® Core™ Microarchitecture dramatically boosts Intel platform performance
• Conroe & Woodcrest drive clear Desktop/Server performance leadership
• Merom extends Intel Mobile performance leadership
Intel® Core™ Microarchitecture-based platforms set the bar in Performance and Energy Efficiency for the Multi-Core era
• Intel’s 3rd generation dual-core (while competition stuck on 1st generation)
• New Intel high-performance ‘engine’: Wider, Smarter, Faster, More Efficient
Best Processor on the Planet: Energy-Efficient Performance
The “Core™ Effect”: Intel® Core™ Microarchitecture ramp fuels broad roadmap accelerations!
Performance boosts (1): 20% (Merom), 40% (Conroe), 80% (Woodcrest)
(1) Based on SPECint*_rate_base2000
7.
Agenda
• Introduction
• Knowledge preparation
  • Architecture vs. Microarchitecture
  • CISC vs. RISC
  • Performance Measurements
  • Pipeline Design
  • Power and Energy
  • Chip Multi-Processing
• Notable features
• Micro-architecture tour
• Coding considerations
8.
Architecture and Micro-architecture
What is Computer Architecture?
• Architecture is the set of features which are externally visible:
  • Instruction set
  • Registers
  • Addressing modes
  • Bus protocols
Intel Architectures (IA)
• IA32/X86 (8-bit, 16-bit and 32-bit integer architecture)
  • X87 (floating-point extension)
  • MMX (multimedia extension)
  • SSE, SSE2, SSE3 (Streaming SIMD Extensions)
  • Intel® 64/EM64T (64-bit integer extension of IA32)
• IA64 (Intel's new 64-bit architecture)
  • Itanium/Itanium 2 processor family
9.
Architecture and Micro-architecture (cont.)
What is Micro-architecture?
• Same as µ-Architecture or u-Architecture
• “Invisible” features that provide meaningful value to the end user (whatever makes you buy a new compatible PC)
  • Programs run faster: improved performance
  • Reduced power consumption: extended battery life
  • H/W fits into a smaller form factor
10.
Intel® Architecture History
• Architecture: instruction set definition and compatibility
  • Examples: EPIC* (Itanium®), IA-32, IXA* (XScale)
• Microarchitecture: hardware implementation maintaining instruction set compatibility with the high-level architecture
  • Examples: P5, P6, Intel NetBurst®, Banias
• Processors: productized implementation of a microarchitecture
  • Examples: Pentium® 4, Pentium® Pro, Pentium®, Pentium® D, Pentium® M, Pentium® II/III, Xeon®
* IXA: Intel Internet Exchange Architecture; EPIC: Explicitly Parallel Instruction Computing
11.
Intel® Core™ Microarchitecture Processors
[Figure: Intel® NetBurst® + Mobile Microarchitecture + New Innovations, leading to Intel® Core™ 2 Duo/Quad/Extreme processors]
12.
RISC Approach to CPU Design (RISC = Reduced Instruction Set Computers)
Optimize H/W for common basic operations
• Fixed instruction length
  • Shorter execution pipeline
  • Ease of instruction-level parallelism
• Large number of registers
  • Fewer memory accesses
• ‘Load/Store’ architecture
  • Shorter execution pipeline
  • Ease of advancing loads
• Branch hints
  • Reduce pipeline flush events
‘Exotic’ stuff to be implemented in S/W with minimal H/W support
• No ‘complex’ H/W instructions
• Handle exceptional conditions in S/W
Examples: MIPS, IBM Power and PowerPC, Sun Sparc
Achieve maximum performance by the right partitioning between H/W and S/W
13.
CISC Approach to CPU Design (CISC = Complex Instruction Set Computers)
Rich architecture
• Variable length instructions
• Complex addressing modes
On-chip H/W / S/W partitioning required
• H/W keeps executing ‘simple’ stuff
• Complex instructions are ‘emulated’ using u-code routines from ROM
• More instructions are treated as ‘simple’ as more H/W becomes available
COMPATIBILITY has some major advantages:
• Large (and forever increasing) software base
• Code development tools
• Expertise
• H/W - S/W spiral
Examples: Intel IA32, Motorola 680X0
Maximize information passed to the H/W
14.
Performance Measurement
Performance is the reciprocal of the “Time of execution”:
\[ \text{Performance} \approx \frac{1}{\text{Time of Execution}} = \frac{1}{L \cdot CPI \cdot T_C} \]
Where:
• L = Code length (# of machine instructions)
• CPI = Clock cycles Per Instruction
• T_C = Clock period (ns)
Substitute:
• IPC = Instructions Per Cycle = 1/CPI
• F = Frequency = 1/T_C
\[ \text{Performance} \approx \frac{IPC \cdot F}{L} \]
• Improve ILP (raises IPC)
• Improve timing (raises F)
• Arch enhancements (reduce L)
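The relation on this slide can be checked numerically. The following is a minimal sketch, not part of the original deck; the function names and the workload figures (instruction count, CPI, clock rate) are hypothetical values chosen only for illustration.

```python
import math


def execution_time_ns(instr_count: int, cpi: float, clock_period_ns: float) -> float:
    """Time of execution = L * CPI * Tc, in nanoseconds."""
    return instr_count * cpi * clock_period_ns


def performance(instr_count: int, ipc: float, freq_ghz: float) -> float:
    """Performance ~ IPC * F / L. With F in GHz the unit is runs per nanosecond."""
    return ipc * freq_ghz / instr_count


# Hypothetical workload: one billion instructions, CPI = 0.5, 2.5 GHz clock.
L = 1_000_000_000
CPI = 0.5
FREQ_GHZ = 2.5
TC_NS = 1.0 / FREQ_GHZ  # clock period in ns

time_ns = execution_time_ns(L, CPI, TC_NS)
perf = performance(L, ipc=1.0 / CPI, freq_ghz=FREQ_GHZ)

# Both forms of the formula describe the same quantity.
assert math.isclose(perf, 1.0 / time_ns)
print(f"execution time: {time_ns / 1e9:.3f} s")
```

Doubling IPC or F, or halving L, each doubles the computed performance, which is why the slide lists three separate levers.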
15.
Performance Measurement (cont.)
Benchmark examples
• Industry standard
  • Spec (ISPEC, FSPEC)
  • TPC
• Commercial
  • SysMark
  • MobileMark
  • PCMark
  • Sandra
  • ScienceMark
• Applications
  • Video (Windows Media encoder, DivX)
  • Audio (Lame MP3)
  • Compression (RAR)
  • Content creation (3DSM, Photoshop, Premiere)
  • Latest games (Doom III, FarCry, but changes fast)
• Specific industries use specific benchmarks
  • Linux compilation, POVRay, LinPack, lmbench
Performance considerations:
• Which code/application to run?
• Which OS?
• Which other components in the platform?
• Under which thermal conditions?
• Multithreading? Multiprocessing?
16.
Design Considerations for Different Market Segments
Constraints:
• Desktop: thermally and area constrained
• Extreme: unconstrained
• Value: very area constrained
• Mobile: thermally, energy and area constrained
• Servers: thermally and energy constrained
Micro-architecture is the Art of Tradeoffs between:
• Schedule
• Requirements / Standards
• Performance
• Features
• Power / Energy
• Area / Cost
17.
Design Metrics
IPC = Instructions per Cycle
• The more the better
Latency (same as Response Time)
• The time interval between
  • when any request for data is made and
  • when the data transfer completes
• The less the better
Throughput
• The amount of work completed by the system per unit of time
• The more the better
• ops/sec
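Latency and throughput are easy to confuse, so here is a small sketch that computes both from a list of request start/end times. The helper names and the two timelines are hypothetical and only illustrate the definitions given on this slide.

```python
def mean_latency(requests):
    """Average time between a request being made and its completion."""
    return sum(end - start for start, end in requests) / len(requests)


def throughput(requests):
    """Completed requests per unit of time over the whole observation window."""
    first_start = min(start for start, _ in requests)
    last_end = max(end for _, end in requests)
    return len(requests) / (last_end - first_start)


# Hypothetical timelines as (start, end) pairs, in clock cycles.
serial = [(0, 4), (4, 8), (8, 12)]      # each request waits for the previous one
overlapped = [(0, 4), (1, 5), (2, 6)]   # requests overlap in time

print(mean_latency(serial), throughput(serial))          # same latency...
print(mean_latency(overlapped), throughput(overlapped))  # ...higher throughput
```

Each request takes four cycles in both timelines, yet the overlapped one completes twice as much work per cycle. Latency and throughput can therefore move independently.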
18.
CPU Pipeline
Break the work into smaller pieces
• Four basic stages of instruction life
  • Fetch: bring instruction to core
  • Decode: read operands from register
  • Execute: perform the operation
  • Writeback: save result to register
• Execution timing of simple instructions (legend: “op src1,src2 -> dst”)
  add eax, ebx -> eax: F D E W
  sub ecx, edx -> ecx: F D E W
Increased throughput
• Increased number of completed instructions per cycle
19.
Pipeline Design - Explore Parallelism
A new instruction does not always depend on the previous one
• Can start a new instruction before the previous one is finished
• ...if different stages use different H/W resources
Run instructions in parallel (pipeline)
  add eax, ebx -> eax: F D E W
  sub ecx, edx -> ecx:   F D E W
  or  edi, esi -> edi:     F D E W
Need to balance pipe stages
• Each stage should take the same time for best throughput and utilization
Clock cycle is determined by the longest path!
[Figure: four overlapping Fetch / Decode / Exec / WB pipelines]
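The benefit of overlapping the four stages can be sketched with a few lines of arithmetic. This is an idealized model, not taken from the deck: it assumes no stalls, one instruction entering the pipe per cycle, and hypothetical stage delays.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Writeback


def unpipelined_cycles(n_instr, n_stages=len(STAGES)):
    """Each instruction runs all stages before the next one starts."""
    return n_instr * n_stages


def pipelined_cycles(n_instr, n_stages=len(STAGES)):
    """The first instruction fills the pipe, then one completes per cycle."""
    return n_stages + (n_instr - 1) if n_instr > 0 else 0


def timing_diagram(instrs):
    """Text diagram with each instruction shifted right by one stage."""
    rows = []
    for i, text in enumerate(instrs):
        rows.append(f"{text:<14}" + "  " * i + " ".join(STAGES))
    return "\n".join(rows)


def clock_period(stage_delays_ns):
    """The clock must be slow enough for the longest stage."""
    return max(stage_delays_ns)


program = ["add eax, ebx", "sub ecx, edx", "or  edi, esi"]
print(timing_diagram(program))
print(unpipelined_cycles(len(program)), "cycles without pipelining")
print(pipelined_cycles(len(program)), "cycles with pipelining")
# Hypothetical, unbalanced stage delays: the slowest stage sets the clock.
print(clock_period([1.0, 1.5, 2.0, 1.0]), "ns clock period")
```

With unbalanced stages, the three faster stages sit idle for part of every cycle. This is the reason for the advice to balance pipe stages.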
20.
Pipeline Design: Fighting Stalls
Data flow dependency (instructions output/input)
• Solved by bypasses, renaming etc.
Control flow dependencies
• Solved by branch prediction
Others (cache misses, long latency instructions)
• Solved by other dynamic scheduling techniques
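A data flow dependency arises when one instruction reads a register that an earlier instruction writes. The sketch below finds such read-after-write pairs in a short program. The `Instr` type and the example program are hypothetical, and real hardware tracks dependencies on micro-operations rather than on tuples like these.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dst: str                # register written
    srcs: Tuple[str, ...]   # registers read


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each
    read-after-write dependency on the most recent writer."""
    deps = []
    last_writer = {}
    for i, ins in enumerate(program):
        for reg in ins.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        last_writer[ins.dst] = i
    return deps


# x86 two-operand form: "add eax, ebx" reads eax and ebx, writes eax.
program = [
    Instr("add", "eax", ("eax", "ebx")),
    Instr("sub", "ecx", ("ecx", "edx")),
    Instr("or", "edi", ("edi", "eax")),   # reads eax produced by the add
]
print(raw_dependencies(program))
```

Only the `or` depends on an earlier result here, so the `sub` can proceed without waiting. A bypass can forward the `add` result to the `or` before it is written back.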
21.
Race of CISC vs. RISC
In modern CPUs, advanced µ-architecture techniques minimize the advantages of RISC over CISC
• Branch prediction
  • Reduces the effect of extra pipeline stages
• Register renaming
  • Effectively increases the number of registers
• Out of order
  • Reduces the number of stalls caused by shortage of registers
• Speculative execution
  • Further reduces the number of stalls
• Power saving features
  • Reduce the overhead when not needed
22.
µop – Intel’s Take on the CISC/RISC Race
(CISC) instructions are translated into one or more (RISC) uops (micro-operations)
• Fixed format
• Wide and simple
• Temp registers
Usually one uop per instruction
A complex instruction can be thousands of uops
Stores are divided into two uops (STA and STD)
Fusion plays games here
23.
Power and Energy
Maximum power (TDP):
• Cooling requirements
• Cooling solution
• Computer form factor and acoustic noise
Average power
• Battery life
• Electricity bill
General calculation:
• P = frequency * voltage^2 * activity factor * capacitance + leakage
Reducing TDP
• Fewer transistors and wires
• Smaller transistors and wires
• Power features: less activity
• Low-leakage transistors
Reducing average power
• Energy efficiency
• Power states
• Lower leakage
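The general calculation on this slide can be tried out directly. The sketch below simply evaluates the slide's formula; the function names and every number in the usage note are hypothetical values chosen for illustration, not figures for any real processor.

```python
def total_power(frequency_hz, voltage_v, activity_factor, capacitance_f, leakage_w):
    """P = frequency * voltage^2 * activity factor * capacitance + leakage."""
    dynamic = frequency_hz * voltage_v ** 2 * activity_factor * capacitance_f
    return dynamic + leakage_w


def dynamic_power_scale(frequency_scale, voltage_scale):
    """Relative change of the dynamic term when frequency and voltage are scaled."""
    return frequency_scale * voltage_scale ** 2
```

With the made-up inputs `total_power(2e9, 1.2, 0.2, 1e-8, 5.0)` the result is about 10.76 W (5.76 W dynamic plus 5 W leakage). `dynamic_power_scale(0.8, 0.8)` is 0.512: lowering frequency and voltage together by 20% roughly halves the dynamic term, which is why voltage-frequency scaling matters so much for average power.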
24.
Dual/Multi Core and SMT
Put more than one core per package
Architectural change:
• Software must be multi-threaded or multi-process
• …but backward compatible with multiprocessor systems (MP)
Several ways of implementing it
• All of them being used
[Figure: package layouts combining cores, last-level caches (LLC) and I/O in different ways]
SMT: Run two (or more) threads on the same core, simultaneously
25.
Intel Approach
[Figure: timeline of Intel processors and their hardware thread counts, Q4 2000 to Q4 2006 (Q4 2000, Q2 2003, Q2 2005, Q3 2006, Q4 2006). Processors shown: Intel® Pentium®, Intel® Pentium® with HT, Intel® Pentium® D, Intel® Core 2 Duo®, Intel® XQ6700*. Thread labels: 1 thread, 2 threads, 4 threads, 80 threads. Legend: State, Execution Units, Cache, Bus.]
While single core performance has increased due to clock speed, increased cache and improved ILP, the biggest performance increases have come from thread level parallelism.
26.
An “Acronym Cheat Sheet” of Parallel Computing
CMP: Chip Multi Processor (two or more cores per package)
• Dual Core: two cores in the same package
• Quad Core: four cores in the same package
DP: Dual Processor (two packages)
MP: Multi Processor (four or more packages)
SMT: Simultaneous Multi Threading (virtual multi core: HyperThreading)
27.
Agenda
Introduction
Knowledge preparation
Notable features
• Wide Dynamic Execution
• Smart Memory Access
• Advanced Smart Cache
• Advanced Digital Media Boost
• Intelligent Power Capability
Micro-architecture tour
Coding considerations
28.
Intel® Core® Micro-architecture Notable Features
Intel® Wide Dynamic Execution
• 14-stage efficient pipeline
• Wider execution path
• Advanced branch prediction
• Macro-fusion
  • Roughly 15% of all instructions are conditional branches
  • Macro-fusion fuses a comparison and jump to reduce the micro-ops running down the pipeline
• Micro-fusion
  • Merges the load and operation micro-ops into one micro-op
• 64-Bit Support
  • Merom, Conroe, and Woodcrest support EM64T
[Block diagram: Instruction Fetch and PreDecode; Instruction Queue; Decode with uCode ROM; Rename/Alloc; Retirement Unit (ReOrder Buffer); Schedulers; execution ports (ALU + Branch + MMX/SSE + FPmove, ALU + FAdd + MMX/SSE + FPmove, ALU + FMul + MMX/SSE + FPmove, Load, Store); L1 D-Cache and D-TLB; 2M/4M shared L2 cache; FSB up to 10.4 Gb/s; path widths labeled 5, 4 and 4]
29.
Intel® Core® Micro-architecture Notable Features (cont.)
Intel® Advanced Memory Access
• Improved prefetching
• Memory disambiguation
  • Advance a load before a possible data dependency (pointer conflict)
  • Earlier loads hide memory latencies
30.
Intel® Core® Micro-architecture Notable Features (cont.)
Intel® Advanced Smart Cache
• Multi-core optimization
  • Shared between the two cores
  • Advanced Transfer Cache architecture
  • Reduced bus traffic
  • Both cores have full access to the entire cache
• Dynamic cache sizing
31.
Intel® Core® Micro-architecture Notable Features (cont.)
Advantages of Shared Cache
[Figure: CPU1 and CPU2 with separate L2 caches, connected to Memory over the Front Side Bus (FSB). A cache line is shipped from one L2 to the other. Label: "Shipping L2 cache line: ~half access to memory".]
32.
Intel® Core® Micro-architecture Notable Features (cont.)
Advantages of Shared Cache (cont.)
[Figure: CPU1 and CPU2 sharing one L2 that holds the cache line, connected to Memory over the Front Side Bus (FSB).]
L2 is shared: no need to ship the cache line
33.
Intel® Core® Micro-architecture Notable Features (cont.)
Intel® Advanced Digital Media Boost
SIMD Operation (SSE/SSE2/SSE3/SSSE3)
• Single Cycle SIMD Operation
  • 8 Single Precision Flops/cycle
  • 4 Double Precision Flops/cycle
• Wide Operations
  • 128-bit packed Add
  • 128-bit packed Multiply
  • 128-bit packed Load
  • 128-bit packed Store
• Support for Intel® EM64T instructions
[Figure: SSE/2/3 OP on 128-bit operands (bits 127 to 0). SOURCE: X4 X3 X2 X1; DEST: Y4 Y3 Y2 Y1.
  Core™ µarch, clock cycle 1: X4opY4 X3opY3 X2opY2 X1opY1
  Previous, clock cycle 1: X2opY2 X1opY1
  Previous, clock cycle 2: X4opY4 X3opY3]
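The flops-per-cycle figures on this slide follow from simple lane arithmetic. The sketch below is a worked example under two assumptions that the slide implies but does not spell out: a 128-bit register holds four single-precision or two double-precision elements, and one packed add plus one packed multiply can start in the same cycle. The function names are made up for illustration.

```python
import operator

REGISTER_BITS = 128


def lanes(element_bits):
    """Number of elements that fit in one 128-bit SSE register."""
    return REGISTER_BITS // element_bits


def peak_flops_per_cycle(element_bits, packed_ops_per_cycle=2):
    """Assumes one packed add and one packed multiply issue every cycle."""
    return lanes(element_bits) * packed_ops_per_cycle


def cycles_per_packed_op(datapath_bits):
    """A 64-bit datapath needs two passes for one 128-bit operation."""
    return REGISTER_BITS // datapath_bits


def packed_op(x, y, op):
    """Apply op lane by lane, as in the X4opY4 ... X1opY1 figure."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of lanes")
    return [op(a, b) for a, b in zip(x, y)]
```

`peak_flops_per_cycle(32)` gives 8 and `peak_flops_per_cycle(64)` gives 4, matching the slide. `cycles_per_packed_op(128)` is 1 for the full-width datapath and `cycles_per_packed_op(64)` is 2, which corresponds to the two-cycle "Previous" case in the figure.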
34.
Intel® Core® Micro-architecture Notable Features
Intel® Advanced Digital Media Boost
• Additional Media Instructions - Supplemental Streaming SIMD Extensions 3 (SSSE3)
  • 16 new packed integer instructions
  • Targeting video encode/decode
• Significantly improved strings
  • REP MOVS and REP STOS
  • ~8 bytes / cycle throughput
  • Mileage may vary
35.
Intel® Core® Micro-architecture Notable Features
Intel® Advanced Digital Media Boost
• Supplemental SSE-3 (SSSE-3)
  • Horizontal Addition/Subtraction: PHADDW, PHADDSW, PHADDD, PHSUBW, PHSUBSW, PHSUBD
  • Packed Absolute Values: PABSB, PABSW, PABSD
  • Multiply and Add Packed Signed/Unsigned Bytes: PMADDUBSW
  • Packed Multiply High with Round and Scale: PMULHRSW
  • Packed Shuffle Bytes: PSHUFB
  • Packed Sign: PSIGNB/W/D
  • Packed Align Right: PALIGNR
36.
Intel® Core® Micro-architecture Notable Features (cont.)
Intelligent Power Capability
• Advanced power gating & dynamic power coordination
  • Multi-point demand-based switching
  • Voltage-frequency switching separation
  • Supports transitions to deeper sleep modes
  • Event blocking
  • Clock partitioning and recovery
  • Dynamic Bus Parking
  • During periods of high performance execution, many parts of the chip core can be shut off
37.
Agenda
Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations
38.
Intel® Core® Micro-architecture Drill-down
[Block diagram. Blocks shown: icache, branch prediction unit, predecode, instruction queue, instruction decode, MS, register alias table, ALLOC, Reservation Station, Re-Order Buffer, integer / FP / SIMD execution (3x), store address, store data, load, memory order buffer, data cache unit, page miss handler]
39.
Agenda
Introduction
Knowledge refreshment
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations
40.
Core® Micro-architecture Front End
Prepares instructions before they are executed
• Instruction Fetch Unit
• Instruction Queue
• Instruction Decode Unit
• Branch Prediction Unit
[Block diagram: icache, branch prediction unit, predecode, instruction queue, instruction decode, MS]
41.
Intel® Core™ Microarchitecture – Front End
Instruction Queue
Buffer between the instruction pre-decode unit and the decoder
• Up to six predecoded instructions written per cycle
• 18 instructions contained in the IQ
• Up to 5 instructions read from the IQ
Potential loop cache: Loop Stream Detector (LSD) support
• Re-use of decoded instructions
• Potential power saving
42.
Intel® Core™ Microarchitecture – Front End
Macro-Fusion
• Roughly 15% of all instructions are conditional branches.
• Macro-fusion merges two instructions into a single micro-op, as if the two instructions were a single long instruction.
• Enhanced Arithmetic Logic Unit (ALU) for macro-fusion. Each macro-fused instruction executes with a single dispatch.
• Not supported in EM64T long mode
[Figure: the fused micro-op "cmpjae eax, [mem], label" goes from the Scheduler to Execution and Branch Eval, then flags and target go to Write back]
43.
Intel® Core™ Microarchitecture – Front End
Macro-Fusion Absent
• Read four instructions from the Instruction Queue
• Each instruction gets decoded into separate uops
Enabling example: for (int i=0; i<100000; i++) { … }
Instruction Queue:
  addps xmm0, [EAX+16]
  mulps xmm0, xmm0
  movps [EAX+240], xmm0
  cmp eax, 100000
  jge label
Decode:
  Cycle 1: addps xmm0, [EAX+16] → dec0
           mulps xmm0, xmm0 → dec1
           movps [EAX+240], xmm0 → dec2
           cmp eax, 100000 → dec3
  Cycle 2: jge label → dec0
44.
Intel® Core™ Microarchitecture – Front End
Macro-Fusion Presented
• Read five instructions from the Instruction Queue
• Send the fusable pair to a single decoder
• A single uop represents two instructions
Enabling example: for (unsigned int i=0; i<100000; i++) { … }
Instruction Queue:
  addps xmm0, [EAX+16]
  mulps xmm0, xmm0
  movps [EAX+240], xmm0
  cmp eax, 100000
  jae label
Decode:
  Cycle 1: addps xmm0, [EAX+16] → dec0
           mulps xmm0, xmm0 → dec1
           movps [EAX+240], xmm0 → dec2
           cmpjae eax, 100000, label → dec3
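The difference between the "absent" and "presented" cases can be counted with a toy model. This is a sketch only: it assumes four decoders that each accept one decode slot per cycle, and it assumes a `cmp` fuses with the jumps listed in `FUSABLE_JUMPS`. The slides themselves only show `jae` fusing and `jge` not fusing, so that set and all function names are illustrative.

```python
DECODERS = 4

# Assumed for illustration: unsigned and equality jumps fuse with CMP.
# The slides only demonstrate jae (fuses) versus jge (does not).
FUSABLE_JUMPS = {"ja", "jae", "jb", "jbe", "je", "jne"}


def decode_slots(instructions, fusion=True):
    """Group instructions into decode slots, pairing cmp with a fusable jump."""
    slots = []
    i = 0
    while i < len(instructions):
        mnemonic = instructions[i].split()[0]
        following = instructions[i + 1].split()[0] if i + 1 < len(instructions) else None
        if fusion and mnemonic == "cmp" and following in FUSABLE_JUMPS:
            slots.append(instructions[i] + " + " + instructions[i + 1])
            i += 2  # the pair occupies a single decoder
        else:
            slots.append(instructions[i])
            i += 1
    return slots


def decode_cycles(slots, decoders=DECODERS):
    """Ceiling division: how many cycles the slots need on the decoders."""
    return -(-len(slots) // decoders)


BODY = ["addps xmm0, [eax+16]", "mulps xmm0, xmm0", "movps [eax+240], xmm0"]
SIGNED_LOOP = BODY + ["cmp eax, 100000", "jge label"]
UNSIGNED_LOOP = BODY + ["cmp eax, 100000", "jae label"]
```

`decode_cycles(decode_slots(SIGNED_LOOP))` is 2 and `decode_cycles(decode_slots(UNSIGNED_LOOP))` is 1, matching the two slides. The model ignores per-decoder uop limits, so it illustrates only the slot count, not the full decode rules.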
45.
Intel® Core™ Microarchitecture – Front End
Instruction Decode / Micro-Op Fusion
• Frequent pairs of micro-operations derived from the same macro instruction can be fused into a single micro-operation
• Micro-op fusion effectively widens the pipeline
46.
Intel® Core™ Microarchitecture – Front End
Instruction Decode / Micro-Fusion (cont.)
u-ops of a store “movps [EAX+240], xmm0”:
  sta eax+240
  std xmm0, [eax+240]
[Figure: the two u-ops combined as "st xmm0, [eax+240]"]
47.
Intel® Core™ Microarchitecture – Front End
Branch Prediction Improvements
Intel® Pentium® 4 Processor branch prediction PLUS the following two improvements:
• Indirect Branch Predictor
• Loop Detector
Branch mispredictions reduced by >20%
48.
Agenda
Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations
49.
Core® Micro-architecture Execution Core
Accepts decoded u-ops, assigns resources, executes and retires u-ops
• Renamer
• Reservation station (RS)
• Issue ports
• Execution Unit
[Block diagram: register alias table, ALLOC, Reservation Station, Re-Order Buffer, integer / FP / SIMD execution (3x), store address, store data, load]
50.
Intel® Core™ Microarchitecture – Execution Core
Execution Core Building Blocks
[Block diagram: Renamer, RS and ROB feeding the Execution Unit through numbered ports. Ports 0, 1, 5: Integer, SIMD/Integer, SIMD Integer MUL, Floating Point. Port 2: Load. Ports 3, 4: Store. Load and Store connect to the Memory Sub-system.]
51.
Intel® Core™ Microarchitecture – Execution Core
Issue Ports and Execution Units
6 dispatch ports from the RS
• 3 execution ports (shared for integer / fp / simd)
• load
• store (address)
• store (data)
128-bit SSE implementation
• Port 0 has packed multiply (4 cycles SP, 5 cycles DP, pipelined)
• Port 1 has packed add (3 cycles, all precisions)
52.
Intel® Core™ Microarchitecture – Execution Core
Retirement Unit
ReOrder Buffer (ROB)
• Holds micro-ops in various stages of completion
• Buffers completed micro-ops
• Updates the architectural state in order
• Manages ordering of exceptions
[Block diagram: register alias table, ALLOC, Reservation Station, Re-Order Buffer]
53.
Agenda
Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations
54.
Core® Micro-architecture Memory Sub-System
Memory Ordering Buffer
• Store Address Buffer (SAB)
  • Stores the address of each store not yet actually performed
  • Loads compare their address to any store older than themselves
  • If it finds a hole…
• Store Data Buffer
  • Stores the data of each store not yet actually performed
  • If a load hits in the SAB, it forwards the data from here
• Load Buffer
  • Stores the addresses of non-retired loads
  • For snoops and re-dispatch
• One 128-bit load and one 128-bit store per cycle to different memory locations
• Out-of-order memory operations
55.
Intel® Core™ Microarchitecture – Memory Sub-system
Core® Micro-architecture Memory Sub-System (cont.)
32k D-Cache (8-way, 64 byte line size)
Shared second level (L2) 2MB 8-way or 4MB 16-way instruction and data cache
Cache to cache transfer
• Improves producer / consumer style MP
Wider interface to L2
• Reduced interference
• Processor line fill is 2 cycles
Higher bandwidth from the L2 cache to the core
• ~14 clock latency and 2 clock throughput
Load & Store access order
1. L1 cache of immediate core
2. L1 cache of the other core
3. L2 cache
4. Memory
[Figure: Core1 and Core2 sharing a 2 MB L2 Cache, connected to the Bus]
56.
Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Enhanced Data Pre-fetch Logic
Speculates the next needed data and loads it into cache by HW and/or SW
[Figure: parking analogy. Door Valet = L1 Cache, Parking Area = L2 Cache, Main Parking Lot = External Memory]
57.
Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Enhanced Data Pre-fetch Logic (cont.)
• L1D cache prefetching
  • Data Cache Unit Prefetcher
    • Known as the streaming prefetcher
    • Recognizes ascending access patterns in recently loaded data
    • Prefetches the next line into the processor's cache
  • Instruction Based Stride Prefetcher
    • Prefetches based upon a load having a regular stride
    • Can prefetch forward or backward 2 Kbytes
    • 1/2 default page size
• L2 cache prefetching: Data Prefetch Logic (DPL)
  • Prefetches data to the 2nd level cache before the DCU requests the data
  • Maintains 2 tables for tracking loads
    • Upstream – 16 entries
    • Downstream – 4 entries
  • Every load is either found in the DPL or generates a new entry
  • Upon recognition of the 2nd load of a “stream” the DPL will prefetch the next load
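The idea behind the instruction based stride prefetcher can be sketched in a few lines. This is a toy model, not the hardware algorithm: it assumes a table keyed by the address of the load instruction, triggers once the same non-zero stride has been seen twice in a row, and uses the slide's 2 Kbyte figure as the maximum stride. The class name, table layout and trigger rule are all assumptions made for illustration.

```python
PREFETCH_WINDOW = 2048  # bytes; from the slide's "forward or backward 2 Kbytes"


class StridePrefetcher:
    """Toy per-instruction stride detector (illustrative, not Intel's design)."""

    def __init__(self):
        # load instruction address -> (last data address, last observed stride)
        self.table = {}

    def observe(self, load_ip, address):
        """Record one executed load; return an address to prefetch, or None."""
        previous = self.table.get(load_ip)
        if previous is None:
            self.table[load_ip] = (address, None)
            return None
        last_address, last_stride = previous
        stride = address - last_address
        self.table[load_ip] = (address, stride)
        regular = stride != 0 and stride == last_stride
        if regular and abs(stride) <= PREFETCH_WINDOW:
            return address + stride  # negative strides prefetch backward
        return None
```

For a load at a hypothetical instruction address `0x401000` that touches 1000, 1064 and 1128, the first two calls return `None` and the third returns 1192. A descending walk over 5000, 4900, 4800 yields 4700, and a 4096-byte stride never triggers because it exceeds the window.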
58.
Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Memory Disambiguation
Memory disambiguation predictor
• Loads that are predicted NOT to forward from a preceding store are allowed to schedule as early as possible
  • Increasing the performance of OOO memory pipelines
Disambiguated loads are checked at retirement
• Extension to the existing coherency mechanism
• Invisible to software and system
59.
Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Memory Disambiguation Absent
Load4 must WAIT until previous stores complete
[Figure: Memory holds Data W, Data Z, Data Y, Data X. Operation order: Store1 Y, Load2 Y, Store3 W, Load4 X]
60.
Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Memory Disambiguation Presented
Loads can decouple from stores
Load4 can get its data WITHOUT waiting for stores
[Figure: Memory holds Data W, Data Z, Data Y, Data X. Operation order: Load4 X, Store1 Y, Load2 Y, Store3 W]
61.
Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Stores Forwarding
If a load follows a store and reloads the data that the store writes to memory, the micro-architecture can forward the data directly from the store to the load
[Figure: Store1 Y places Data Y in internal buffers; Load2 Y takes Data Y from the internal buffers instead of Memory]
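Store forwarding can be pictured as a lookup in the buffer of stores that have not yet been written to memory. The sketch below is a simplified model under stated assumptions: one value per address, exact address matches only, and no notion of access size or alignment. The class and method names are hypothetical and do not describe the actual buffer organization.

```python
class StoreBuffer:
    """Toy model of pending stores sitting in front of memory (illustrative)."""

    def __init__(self, memory):
        self.memory = memory   # committed state: address -> value
        self.pending = []      # (address, value), oldest first, not yet written

    def store(self, address, value):
        """Buffer a store; memory is not updated until drain()."""
        self.pending.append((address, value))

    def load(self, address):
        """Return (value, forwarded). The youngest matching pending store wins."""
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value, True
        return self.memory.get(address, 0), False

    def drain(self):
        """Write all pending stores to memory in program order."""
        for address, value in self.pending:
            self.memory[address] = value
        self.pending.clear()
```

With `memory = {"X": 7, "Y": 1}`, a `store("Y", 42)` followed by `load("Y")` returns `(42, True)` while `memory["Y"]` is still 1; `load("X")` returns `(7, False)` because no pending store matches. After `drain()`, the same load of Y returns `(42, False)`.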
62.
Advanced Memory Access / Stores Forwarding: Aligned Store Cases
[Figure: for each aligned store, the aligned loads drawn beneath it]
  Store           Loads shown beneath the store
  store 16        1 × load 16; 2 × ld 8
  store 32 bit    1 × load 32 bit; 2 × load 16; 4 × ld 8
  store 64 bit    1 × load 64 bit; 2 × load 32 bit; 4 × load 16; 8 × ld 8
  store 128 bit   1 × load 128 bit; 2 × load 64 bit; 4 × load 32 bit; 8 × load 16; 16 × ld 8
63.
Advanced Memory Access / Stores Forwarding: Unaligned Cases
Note that unaligned store forwarding does not occur when the load crosses a cache line boundary
[Figure: for each unaligned store, the loads drawn beneath it; the legend distinguishes "Store forwarded to load" from "No forwarding"]
  Store           Loads shown beneath the store
  store 16        load 16‡; 2 × ld 8
  store 32 bit    load 32 bit‡; load 16‡, load 16; 4 × ld 8
  store 64 bit    load 64 bit; load 32 bit‡, load 32 bit; load 16‡, 3 × load 16; 8 × ld 8
Note: Unaligned 128-bit stores are issued as two 64-bit stores. This provides two alignments for store forwarding.
‡: No forwarding if the load crosses a cache line boundary
64.
Agenda
Introduction
Knowledge preparation
Notable features
Micro-architecture tour
Coding considerations
65.
Optimizing for Instruction Fetch and PreDecode
Avoid “Length Changing Prefixes” (LCPs)
• Affects instructions with immediate data or offset
  • Operand Size Override (66H)
  • Address Size Override (67H) [obsolete]
• LCPs change the length decoding algorithm, increasing the processing time from one cycle to six cycles (or eleven cycles when the instruction spans a 16-byte boundary)
• The REX (EM64T) prefix (4xH) is not an LCP
  • The REX prefix does lengthen the instruction by one byte, so use of the first eight general registers in EM64T is preferred
66.
Optimizing for Instruction Queue
Includes a “Loop Stream Detector” (LSD)
• Potentially very high bandwidth instruction streaming
• A number of requirements to make use of the LSD
  • Maximum of 18 instructions in up to four 16-byte packets
  • No RET instructions (hence, little practical use for CALLs)
  • Up to four taken branches allowed
  • Most effective at 70+ iterations
• The LSD is after PreDecode, so there is no added cost for LCPs
• Trade off the LSD against conventional loop unrolling
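The LSD requirements listed on this slide amount to a checklist, sketched below. The `Loop` record, its field names and both functions are hypothetical; the thresholds are the ones on the slide, and how a real tool would measure these properties of a loop is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    """Hypothetical summary of a loop body, filled in by hand or by a tool."""
    instructions: int      # instructions in the loop body
    fetch_packets: int     # 16-byte packets the body spans
    taken_branches: int    # taken branches per iteration
    has_ret: bool          # body contains a RET
    iterations: int        # expected trip count


def lsd_problems(loop):
    """Return the slide's LSD requirements that the loop violates."""
    problems = []
    if loop.instructions > 18:
        problems.append("more than 18 instructions")
    if loop.fetch_packets > 4:
        problems.append("spans more than four 16-byte packets")
    if loop.has_ret:
        problems.append("contains a RET instruction")
    if loop.taken_branches > 4:
        problems.append("more than four taken branches")
    return problems


def lsd_worthwhile(loop):
    """Eligible and long-running enough (the slide suggests 70+ iterations)."""
    return not lsd_problems(loop) and loop.iterations >= 70
```

A 12-instruction loop in three packets with one taken branch and 1000 iterations passes both checks. Unrolling that loop until it exceeds 18 instructions makes `lsd_problems` non-empty, which is the trade-off against conventional loop unrolling that the slide mentions.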
67.
Optimizing for Decode
Decoder issues up to 4 uOps for renaming/allocation per clock
• This creates a trade-off between more complex instruction uOps and multiple simple instruction uOps
  • For example, a single four-uOp instruction is all that can be renamed/allocated in a single clock
  • In some cases, multiple simple instructions may be a better choice than a single complex instruction
• Single-uOp instructions allow more decoder flexibility
  • For example, 4-1-1-1 can be decoded in one clock
  • However, 2-2-2-1 takes three clocks to decode
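The 4-1-1-1 and 2-2-2-1 examples can be reproduced with a small decoder model. The sketch assumes four decoders that work in program order, where the first accepts instructions of up to four uOps and the other three accept single-uOp instructions only; an instruction that does not fit the next decoder waits for the next clock. That rule set is inferred from the slide's two examples, and the function name is illustrative.

```python
DECODER_LIMITS = (4, 1, 1, 1)  # max uOps each decoder accepts per clock (assumed)


def decode_clocks(uop_counts, limits=DECODER_LIMITS):
    """Clocks needed to decode instructions given their uOp counts, in order."""
    clocks = 0
    i = 0
    while i < len(uop_counts):
        if uop_counts[i] > limits[0]:
            raise ValueError("instructions above 4 uOps are not modeled here")
        clocks += 1
        # Fill decoders in order; stop at the first instruction that does not fit.
        for limit in limits:
            if i < len(uop_counts) and uop_counts[i] <= limit:
                i += 1
            else:
                break
    return clocks
```

`decode_clocks([4, 1, 1, 1])` is 1 and `decode_clocks([2, 2, 2, 1])` is 3, as stated on the slide: each two-uOp instruction has to wait for the first decoder. The model covers decode only; the separate limit of 4 uOps per clock for renaming/allocation is not included.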