Confidential
Esperanto Accelerates Machine Learning With 1000+
Low-Power RISC-V Cores on a Single Chip
RISC-V Summit 2020
8 December 2020
CEO: art.swift@esperanto.ai
2020 RISC-V Summit
Esperanto: Doubling Down on RISC Principles and RISC-V
David R. Ditzel and David A. Patterson, the co-authors of "The Case for the Reduced Instruction Set Computer," together at the 7th RISC-V Workshop
Using RISC-V as the basis for our AI processor strategy was key for us!
SIMPLE INSTRUCTION SET USES FEWER GATES
- Less complex designs
- Smaller die size and lower costs
- Reduced dynamic and static power consumption
RISC-V ENABLES HARDWARE INNOVATION
- Machine learning specific instruction set extensions
- Custom microarchitecture
- Proprietary low-power design techniques
BROAD ECOSYSTEM EASES DEVELOPMENT TASKS
- Development tools
- Operating systems and software stacks
- 3rd party IP
Esperanto: Highly Scalable RISC-V AI Chip Solutions
Esperanto ET-SoC-1 die plot: 1000+ RISC-V custom cores with 23.8B transistors using the TSMC 7nm manufacturing node. The initial product is targeted at datacenter inferencing.
Esperanto's Tiled AI Solution Is Designed to Scale from Hundreds to Thousands of CPU Cores
SUPERIOR PERFORMANCE(1)
- Up to 50x better performance on recommendation networks
- Up to 30x better performance for image classification
BEST-IN-CLASS EFFICIENCY
- 100x better energy efficiency (inferences/watt) on key workloads
- Huge reduction in energy costs for datacenter customers
FUTURE-PROOF SOLUTION
- Fully programmable to handle future AI models
- Leverages the large, open programming and software ecosystem
- Industry-leading roadmap of hardware and software solutions
(1) Comparing Esperanto full-chip emulation results with measured inference benchmark results for
incumbent competitors. Characterized silicon results coming soon.
[Figure: ET-Minion die plot showing data-cache banks DC bank 0 to DC bank 3, data cache control (including D-tags), front end, Trans ROMs, 32-bit & 16-bit FMA units, bypass, TIMA blocks, and VPU register files (T0/T1).]
ET-Minion Is an Energy-Efficient RISC-V CPU with a Vector/Tensor Unit
ET-MINION IS A CUSTOM-BUILT 64-BIT RISC-V PROCESSOR
- In-order pipeline with few gates per stage to improve MHz at low voltages
- Architecture and circuits optimized to enable low-voltage operation
- 2 hardware threads of execution
- Software-configurable L1 data cache and/or scratchpad
VECTOR/TENSOR UNIT OPTIMIZED FOR MACHINE LEARNING
- New multi-cycle tensor instructions
- 256-bit wide floating point per cycle
  - 16 32-bit single-precision operations per cycle
  - 32 16-bit half-precision operations per cycle
- 512-bit wide integer per cycle
  - 128 8-bit integer operations per cycle
- Vector transcendental instructions
[Figure: ET-Minion RISC-V core and tensor/vector unit, optimized for low-voltage operation to improve energy efficiency: RISC-V integer pipeline, 256b floating-point vector register file, 512b Int8 datapath, and L1 data cache/scratchpad.]
Optimized for energy-efficient ML operations. Each ET-Minion can deliver a peak of 128 GOPS (8-bit) per GHz.
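The per-cycle figures listed for the vector/tensor unit can be reproduced from the datapath widths. The sketch below assumes each lane performs one fused multiply-add per cycle and counts it as two operations (multiply + add); that convention is an assumption, but it is the one that makes the 16/32/128 numbers and the 128 GOPS-per-GHz peak line up.

```python
# Sketch: derive ET-Minion per-cycle operation counts from datapath widths.
# Assumption: each lane does one FMA per cycle, counted as 2 operations.

OPS_PER_FMA = 2  # assumption: one FMA = one multiply + one add


def ops_per_cycle(datapath_bits: int, element_bits: int) -> int:
    """Operations per cycle for a SIMD datapath of the given width."""
    lanes = datapath_bits // element_bits
    return lanes * OPS_PER_FMA


fp32 = ops_per_cycle(256, 32)  # 8 lanes of single precision
fp16 = ops_per_cycle(256, 16)  # 16 lanes of half precision
int8 = ops_per_cycle(512, 8)   # 64 lanes of 8-bit integer


def peak_int8_gops(clock_ghz: float) -> float:
    """Peak 8-bit throughput of one ET-Minion in GOPS at the given clock."""
    return int8 * clock_ghz


print(fp32, fp16, int8, peak_int8_gops(1.0))  # 16 32 128 128.0
```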
32 ET-Minion CPUs and 4 MB of Memory Form a "Minion Shire"
32 ET-MINION RISC-V CORES PER MINION SHIRE
- Arranged in four 8-core neighborhoods
MEMORY HIERARCHY IS SOFTWARE CONFIGURABLE
- L1 SRAM can be configured as data cache or scratchpad
- 4 MB L2 SRAM can be configured as private L2, shared L3, or scratchpad
MESH-CONNECTED SHIRES
MULTIPLE SYNCHRONIZATION PRIMITIVES
- Fast Local Atomics
- Fast Local Barriers
- Fast Local Credit Counter
- IPI Support
[Figure: Minion Shire block diagram: ET-Minions m0 to m31 in four 8-core neighborhoods; a 4x4 crossbar to a 4 MB banked SRAM cache/scratchpad (Bank0 to Bank3, 1 MB each); local sync primitives; and a mesh stop onto the mesh interconnect, with low-voltage and nominal-voltage domains marked.]
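To make the Shire hierarchy concrete, here is a small sketch that maps a chip-wide Minion index to its Shire, neighborhood, and core. The numbering is hypothetical: it assumes Minions are numbered consecutively (m0 to m31 within a Shire, as in the figure labels), that each neighborhood holds 8 consecutive IDs, and that Shires are numbered consecutively across the chip. It is not Esperanto's documented hart-ID scheme.

```python
# Hypothetical numbering sketch, not Esperanto's documented hart-ID scheme.
# Assumes consecutive Minion IDs per Shire (m0..m31), 8 consecutive IDs per
# neighborhood, and consecutively numbered Shires.

MINIONS_PER_NEIGHBORHOOD = 8
NEIGHBORHOODS_PER_SHIRE = 4
THREADS_PER_MINION = 2  # "2 hardware threads of execution"

MINIONS_PER_SHIRE = MINIONS_PER_NEIGHBORHOOD * NEIGHBORHOODS_PER_SHIRE  # 32
THREADS_PER_SHIRE = MINIONS_PER_SHIRE * THREADS_PER_MINION              # 64


def locate(global_minion_id: int) -> tuple[int, int, int]:
    """Map a hypothetical global Minion ID to (shire, neighborhood, core)."""
    if global_minion_id < 0:
        raise ValueError("Minion ID must be non-negative")
    shire, local_id = divmod(global_minion_id, MINIONS_PER_SHIRE)
    neighborhood, core = divmod(local_id, MINIONS_PER_NEIGHBORHOOD)
    return shire, neighborhood, core


print(locate(0), locate(31), locate(40))  # (0, 0, 0) (0, 3, 7) (1, 1, 0)
```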
More RISC-Vs on a Chip: 1089 ET-Minions & 4 ET-Maxions in 7nm
[Figure: ET-SoC-1 floor plan showing two LPDDR4x DRAM controller blocks, PCIe, the four Maxions, and the array of Minion Shires.]
34 Minion Shires
- 1088 ET-Minion processors
- 136 MB of on-die memory, software configurable as L2, L3, or scratchpad
- Shared global address space
Service Processor
- 1 ET-Minion processor
4 ET-Maxion Processors
- High-performance out-of-order (OOO) CPU
- Up to 5 RV64GC instructions issued per clock
- 4 MB private L2
x8 PCIe Gen4
Secure Root of Trust
LPDDR4x DRAM Controllers
- Up to 32 GB DRAM
- 137 GB/s memory bandwidth
- 256-bit wide interface
Block diagram of Esperanto's energy-efficient ET-SoC-1 chip. Typical operating point under 20 watts.
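The chip-level totals are internally consistent, and the bandwidth figure follows from the bus width. This sketch assumes the Maxions' "4 MB private L2" is 4 MB in total (consistent with the 140 MB of SRAM per chip quoted for the board) and that the LPDDR4x interface runs at 4266 MT/s; the slide itself gives only the 256-bit width and ~137 GB/s.

```python
# Cross-check of the ET-SoC-1 totals listed above.
# Assumptions: Maxion L2 is 4 MB in total; LPDDR4x runs at 4266 MT/s.

MINION_SHIRES = 34
MINIONS_PER_SHIRE = 32
SRAM_MB_PER_SHIRE = 4
SERVICE_PROCESSOR_MINIONS = 1
MAXIONS = 4
MAXION_L2_MB = 4  # assumption: total across all four Maxions

compute_minions = MINION_SHIRES * MINIONS_PER_SHIRE               # 1088
all_minions = compute_minions + SERVICE_PROCESSOR_MINIONS         # 1089
total_cores = all_minions + MAXIONS                               # 1093
total_sram_mb = MINION_SHIRES * SRAM_MB_PER_SHIRE + MAXION_L2_MB  # 140


def dram_bandwidth_gbs(bus_bits: int, megatransfers_per_s: int) -> float:
    """Peak DRAM bandwidth in GB/s: bytes per transfer x transfers per second."""
    return (bus_bits / 8) * megatransfers_per_s / 1000


bandwidth = dram_bandwidth_gbs(256, 4266)  # assumed LPDDR4x-4266
print(all_minions, total_cores, total_sram_mb, round(bandwidth, 1))
# 1089 1093 140 136.5
```

About 136.5 GB/s under this assumption, in line with the ~137 GB/s on the slide.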
6558 RISC-V Cores on a Board with Esperanto's Energy-Efficient Chip
1536-BIT WIDE MEMORY SYSTEM DELIVERS UP TO 822 GB/S OF ENERGY-EFFICIENT BANDWIDTH
[Figure: six ET-SoC-1 chips (each 1093 RISC-V cores, 140 MB SRAM) connected through a PCIe switch to the PCIe card interface; each chip attaches to four 64-bit LPDDR4x DRAM chips, for 24 DRAM chips and 192 GB in total.]
Energy efficiency enables Esperanto to put multiple chips per board, instead of one big hot chip.
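The card-level numbers are simply six times the per-chip figures from the ET-SoC-1 block diagram. The sketch assumes a fully populated card: six chips, each with four 64-bit LPDDR4x devices.

```python
# Board-level totals for six ET-SoC-1 chips on one card.
# Assumption: fully populated card, four 64-bit LPDDR4x devices per chip.

CHIPS_PER_CARD = 6
CORES_PER_CHIP = 1093
SRAM_MB_PER_CHIP = 140
DRAM_DEVICES_PER_CHIP = 4
DRAM_DEVICE_BITS = 64
DRAM_GB_PER_CHIP = 32
DRAM_GBS_PER_CHIP = 137

card = {
    "cores": CHIPS_PER_CARD * CORES_PER_CHIP,
    "sram_mb": CHIPS_PER_CARD * SRAM_MB_PER_CHIP,
    "dram_devices": CHIPS_PER_CARD * DRAM_DEVICES_PER_CHIP,
    "dram_bus_bits": CHIPS_PER_CARD * DRAM_DEVICES_PER_CHIP * DRAM_DEVICE_BITS,
    "dram_gb": CHIPS_PER_CARD * DRAM_GB_PER_CHIP,
    "dram_gbs": CHIPS_PER_CARD * DRAM_GBS_PER_CHIP,
}
print(card)
```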
Up to Six ET-SoC-1 Chips on a Glacier Point v2 Card
Note: The Glacier Point v2 board design has been open sourced through the Open Compute Project and is available for purchase. Three Esperanto Dual M.2 modules can mount on the top side and three on the bottom.
Peak performance of > 800 tera-ops (8-bit) per second with ET-Minions operating at 1 GHz
ONE CARD WITH UP TO:
- 6558 RISC-V cores
- 192 GB of DRAM
- 822 GB/s DRAM bandwidth
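The > 800 tera-ops figure follows from the per-Minion peak. This sketch assumes only the 1088 compute ET-Minions per chip are counted (the service-processor Minion is left out), each delivering 128 8-bit operations per cycle at 1 GHz.

```python
# Peak 8-bit throughput for one Glacier Point v2 card.
# Assumption: only the 1088 compute ET-Minions per chip are counted.

CHIPS_PER_CARD = 6
COMPUTE_MINIONS_PER_CHIP = 1088
INT8_OPS_PER_CYCLE = 128
CLOCK_GHZ = 1.0


def peak_tops(chips: int, minions: int, ops_per_cycle: int, clock_ghz: float) -> float:
    """cores x ops/cycle x clock (GHz) gives GOPS; divide by 1000 for TOPS."""
    return chips * minions * ops_per_cycle * clock_ghz / 1000


card_tops = peak_tops(CHIPS_PER_CARD, COMPUTE_MINIONS_PER_CHIP, INT8_OPS_PER_CYCLE, CLOCK_GHZ)
chip_tops = peak_tops(1, COMPUTE_MINIONS_PER_CHIP, INT8_OPS_PER_CYCLE, CLOCK_GHZ)
print(round(chip_tops, 1), round(card_tops, 1))  # 139.3 835.6
```

Counting the service-processor Minion as well would add less than 1 TOPS per card, so the "> 800" claim holds either way.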
Glacier Point v2 Accelerator Fits in Existing OCP Infrastructure
OCP Glacier Point v2 accelerator card holds:
• 6 Esperanto AI chips
• 192 GB DRAM
Yosemite v2 sled holds:
• 1 or 2 Glacier Point accelerator cards
• 12 Esperanto AI chips
Yosemite v2 cubby holds:
• 4 Yosemite sleds
• 48 Esperanto AI chips
Rack with Yosemite v2 holds:
• 8 Yosemite v2 cubbies
• 384 Esperanto AI chips
Example OCP data center:
• At 30 sq. ft. per OCP rack(1)
• Estimated 4,000-20,000 racks per data center
[Figure: OCP rack diagram with top-of-rack switch, two power shelves, and Yosemite v2 chassis; multiplier labels x2, x4, x8.]
Note (1): "The Case for the Infinite Data Center," Gartner. Source: Gartner, Data Center Frontier.
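The chip counts multiply up the OCP hierarchy. This sketch assumes two Glacier Point v2 cards per sled (the configuration behind the 12-chip figure) and, for the data-center range, that every rack is a fully populated accelerator rack with eight Yosemite v2 cubbies; that last point is an assumption, so the data-center totals are an upper bound for the quoted rack counts.

```python
# Chip counts up the OCP hierarchy, from the per-level figures above.
# Assumptions: 2 cards per sled; every rack fully populated with accelerators.

CHIPS_PER_CARD = 6
CARDS_PER_SLED = 2
SLEDS_PER_CUBBY = 4
CUBBIES_PER_RACK = 8

chips_per_sled = CHIPS_PER_CARD * CARDS_PER_SLED     # 12
chips_per_cubby = chips_per_sled * SLEDS_PER_CUBBY   # 48
chips_per_rack = chips_per_cubby * CUBBIES_PER_RACK  # 384


def chips_per_data_center(racks: int) -> int:
    """Accelerator chips in a data center of fully populated racks."""
    return racks * chips_per_rack


low, high = chips_per_data_center(4_000), chips_per_data_center(20_000)
print(chips_per_rack, low, high)  # 384 1536000 7680000
```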
Software: Esperanto Supports C++ / PyTorch and Common ML Frameworks
[Figure: software stack. ML model frameworks (C++ ….., ONNX models, MS CNTK) feed the GLOW compiler (Facebook open source project), which runs on the x86 host. The ET GLOW backend turns GLOW IR (intermediate representation) into ET-SoC-1 instructions, and ML models run across multiple ET-SoC-1 chips. Other components shown: ET runtime, ET device driver, development tools (console/debugger, performance monitor), and management utilities (diagnostics, firmware updater).]
GLOW Frontend:
• GLOW = Graph LOWering
• Open sourced by Facebook
• Hardware-independent optimizations
• Divides work across n chips
GLOW Backend:
• Does hardware-dependent optimizations
• Backend modified by ET to generate instructions for the ET-SoC-1 chip
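As a rough illustration of "divides work across n chips," here is a contiguous, near-even split of a batch of work items across chips. This is a generic data-parallel scheme, not how Glow's partitioner actually assigns graph work, and the function name is hypothetical.

```python
# Illustrative only: generic contiguous split of work items across n chips.
# Not Glow's partitioner; `split_across_chips` is a hypothetical name.


def split_across_chips(num_items: int, num_chips: int) -> list[range]:
    """Return one contiguous index range per chip; sizes differ by at most 1."""
    if num_chips <= 0:
        raise ValueError("num_chips must be positive")
    base, extra = divmod(num_items, num_chips)
    ranges = []
    start = 0
    for chip in range(num_chips):
        size = base + (1 if chip < extra else 0)  # first `extra` chips take one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


parts = split_across_chips(100, 6)  # e.g. 100 inputs over six ET-SoC-1 chips
print([len(p) for p in parts])  # [17, 17, 17, 17, 16, 16]
```

Contiguous ranges keep each chip's inputs adjacent in memory; a real compiler would also weigh per-chip memory capacity and inter-chip traffic.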
Balanced Architecture for Evolving Machine Learning Workloads
- Models range from compute intensive to memory intensive, with both dense and sparse matrix representations
- "Should not over-design hardware for GEMMs and Convolutions" *
Key Hyperscaler ML Workload Categories (workload use case: model examples; current approach attributes)
- Recommendation: DLRM, Wide&Deep, NCF; large embedding tables, MLP-based compute, mix of memory-intensive and compute-intensive work
- Computer Vision: ResNets, ResNeXt, YOLO, M2Det; CNN, convolution
- Natural Language Processing: BERT, GPT-3; multi-headed self-attention, matrix compute
Relative importance* (shown on a 1X / 10X / 100X scale in the original figure)
*Misha Smelyanskiy, Facebook, Linley Fall Processor Conference 2019, "Challenges and Opportunities of Architecting AI Systems at Datacenter Scale"
Esperanto provides a balanced solution for both dense compute and large, sparsely accessed memory
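The compute-intensive vs. memory-intensive split can be quantified with arithmetic intensity (FLOPs per byte moved). This textbook-style estimate is not Esperanto's analysis: it assumes fp32 data, each GEMM matrix moved to or from DRAM exactly once, and an embedding-bag lookup that sums randomly gathered table rows. The sizes (1024-cubed GEMM, 80 lookups of width 64) are arbitrary illustrative choices.

```python
# Arithmetic intensity (FLOPs per byte) for a dense GEMM vs. an embedding-bag
# lookup. Assumptions: fp32 data, each operand moved once, no cache reuse.

BYTES_FP32 = 4


def gemm_intensity(m: int, n: int, k: int) -> float:
    """C[m,n] = A[m,k] @ B[k,n]: 2*m*n*k FLOPs over A, B and C traffic."""
    flops = 2 * m * n * k
    bytes_moved = BYTES_FP32 * (m * k + k * n + m * n)
    return flops / bytes_moved


def embedding_bag_intensity(lookups: int, dim: int) -> float:
    """Sum `lookups` gathered rows of width `dim` into one output row."""
    flops = (lookups - 1) * dim  # additions needed to pool the rows
    bytes_moved = BYTES_FP32 * (lookups * dim + dim)  # rows read + result written
    return flops / bytes_moved


print(round(gemm_intensity(1024, 1024, 1024), 1))  # 170.7
print(round(embedding_bag_intensity(80, 64), 3))   # 0.244
```

Under these assumptions the GEMM does hundreds of times more arithmetic per byte than the embedding lookup, which is why recommendation models stress memory bandwidth and capacity rather than peak FLOPS.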
Esperanto Meets Hyperscaler AI Inferencing Challenges
Today most hyperscaler AI inferencing workloads run on chips with legacy architectures.
The performance, energy use, and programmability of these solutions do not meet demanding hyperscaler requirements.
AI processing challenges include delivering AI-based services while reducing cost and complexity.
Esperanto's custom RISC-V based solutions deliver the required performance and power efficiency, are "future proof," and don't lock hyperscalers into legacy suppliers.
Esperanto's energy-efficient, high-performance architecture will scale from hyperscale datacenters to Edge AI!
Some of Our Key Development Partners
Thanks to all our partners for their help in bringing our vision into reality! Sorry we can't name everyone!
Thank You

More Related Content

What's hot

What's hot (20)

Easily emulating full systems on amazon fpg as
Easily emulating full systems on amazon fpg asEasily emulating full systems on amazon fpg as
Easily emulating full systems on amazon fpg as
 
Educating the computer architects of tomorrow's critical systems with RISC-V
Educating the computer architects of tomorrow's critical systems with RISC-VEducating the computer architects of tomorrow's critical systems with RISC-V
Educating the computer architects of tomorrow's critical systems with RISC-V
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overview
 
Andes andes clarity for risc-v vector processor
Andes andes clarity for risc-v vector processorAndes andes clarity for risc-v vector processor
Andes andes clarity for risc-v vector processor
 
Tech talk with lampro mellon an open source solution for accelerating verific...
Tech talk with lampro mellon an open source solution for accelerating verific...Tech talk with lampro mellon an open source solution for accelerating verific...
Tech talk with lampro mellon an open source solution for accelerating verific...
 
RISC-V 30906 hex five multi_zone iot firmware
RISC-V 30906 hex five multi_zone iot firmwareRISC-V 30906 hex five multi_zone iot firmware
RISC-V 30906 hex five multi_zone iot firmware
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp us
 
RISC-V: The Open Era of Computing
RISC-V: The Open Era of ComputingRISC-V: The Open Era of Computing
RISC-V: The Open Era of Computing
 
RISC-V Foundation Overview
RISC-V Foundation OverviewRISC-V Foundation Overview
RISC-V Foundation Overview
 
Andes building a secure platform with the enhanced iopmp
Andes building a secure platform with the enhanced iopmpAndes building a secure platform with the enhanced iopmp
Andes building a secure platform with the enhanced iopmp
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
 
An Automatic Generation of NoC Architectures: An Application-Mapping Approach
An Automatic Generation of NoC Architectures: An Application-Mapping ApproachAn Automatic Generation of NoC Architectures: An Application-Mapping Approach
An Automatic Generation of NoC Architectures: An Application-Mapping Approach
 
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
 
An open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V coresAn open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V cores
 
Data on the move a RISC-V opportunity
Data on the move   a RISC-V opportunityData on the move   a RISC-V opportunity
Data on the move a RISC-V opportunity
 
Secure IoT Firmware for RISC-V
Secure IoT Firmware for RISC-VSecure IoT Firmware for RISC-V
Secure IoT Firmware for RISC-V
 
Andes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorialAndes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorial
 
RISC-V 30946 manuel_offenberg_v3_notes
RISC-V 30946 manuel_offenberg_v3_notesRISC-V 30946 manuel_offenberg_v3_notes
RISC-V 30946 manuel_offenberg_v3_notes
 
Reverse Engineering of Rocket Chip
Reverse Engineering of Rocket ChipReverse Engineering of Rocket Chip
Reverse Engineering of Rocket Chip
 
Ripes: Teaching Computer Architecture Through Visual and Interactive Simulators
Ripes: Teaching Computer Architecture Through Visual and Interactive SimulatorsRipes: Teaching Computer Architecture Through Visual and Interactive Simulators
Ripes: Teaching Computer Architecture Through Visual and Interactive Simulators
 

Similar to Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a single chip

“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
Edge AI and Vision Alliance
 
Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)
Jonah McLeod
 
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
Edge AI and Vision Alliance
 

Similar to Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a single chip (20)

Assignmentdsp
AssignmentdspAssignmentdsp
Assignmentdsp
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
 
HiPEAC 2022-DL4IoT workshop_René Griessl presentation
HiPEAC 2022-DL4IoT workshop_René Griessl presentationHiPEAC 2022-DL4IoT workshop_René Griessl presentation
HiPEAC 2022-DL4IoT workshop_René Griessl presentation
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)
 
NGIoT Sustainability Workshop 2023_Rene Griessl presentation
NGIoT Sustainability Workshop 2023_Rene Griessl presentationNGIoT Sustainability Workshop 2023_Rene Griessl presentation
NGIoT Sustainability Workshop 2023_Rene Griessl presentation
 
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mãoWebinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
 
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdfNVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
 
Crypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsCrypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M Processors
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous Machines
 
ODSA Use Case - SmartNIC
ODSA Use Case - SmartNICODSA Use Case - SmartNIC
ODSA Use Case - SmartNIC
 
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
 
Arm Neoverse market update_05122020.pdf
Arm Neoverse market update_05122020.pdfArm Neoverse market update_05122020.pdf
Arm Neoverse market update_05122020.pdf
 
AMD K6
AMD K6AMD K6
AMD K6
 
HiPEAC-CSW 2022_Kevin Mika presentation
HiPEAC-CSW 2022_Kevin Mika presentationHiPEAC-CSW 2022_Kevin Mika presentation
HiPEAC-CSW 2022_Kevin Mika presentation
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
 
Cyclone II FPGA Overview
Cyclone II FPGA OverviewCyclone II FPGA Overview
Cyclone II FPGA Overview
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
 
Piccolo F2806x Microcontrollers
Piccolo F2806x MicrocontrollersPiccolo F2806x Microcontrollers
Piccolo F2806x Microcontrollers
 

More from RISC-V International

More from RISC-V International (20)

WD RISC-V inliner work effort
WD RISC-V inliner work effortWD RISC-V inliner work effort
WD RISC-V inliner work effort
 
RISC-V Zce Extension
RISC-V Zce ExtensionRISC-V Zce Extension
RISC-V Zce Extension
 
RISC-V Online Tutor
RISC-V Online TutorRISC-V Online Tutor
RISC-V Online Tutor
 
London Open Source Meetup for RISC-V
London Open Source Meetup for RISC-VLondon Open Source Meetup for RISC-V
London Open Source Meetup for RISC-V
 
RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V Introduction
 
Ziptillion boosting RISC-V with an efficient and os transparent memory comp...
Ziptillion   boosting RISC-V with an efficient and os transparent memory comp...Ziptillion   boosting RISC-V with an efficient and os transparent memory comp...
Ziptillion boosting RISC-V with an efficient and os transparent memory comp...
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-V
 
Standardizing the tee with global platform and RISC-V
Standardizing the tee with global platform and RISC-VStandardizing the tee with global platform and RISC-V
Standardizing the tee with global platform and RISC-V
 
Security and functional safety
Security and functional safetySecurity and functional safety
Security and functional safety
 
RISC-V NOEL-V - A new high performance RISC-V Processor Family
RISC-V NOEL-V - A new high performance RISC-V Processor FamilyRISC-V NOEL-V - A new high performance RISC-V Processor Family
RISC-V NOEL-V - A new high performance RISC-V Processor Family
 
RISC-V 30910 kassem_ summit 2020 - so_c_gen
RISC-V 30910 kassem_ summit 2020 - so_c_genRISC-V 30910 kassem_ summit 2020 - so_c_gen
RISC-V 30910 kassem_ summit 2020 - so_c_gen
 
RISC-V 30908 patra
RISC-V 30908 patraRISC-V 30908 patra
RISC-V 30908 patra
 
RISC-V 30907 summit 2020 joint picocom_mentor
RISC-V 30907 summit 2020 joint picocom_mentorRISC-V 30907 summit 2020 joint picocom_mentor
RISC-V 30907 summit 2020 joint picocom_mentor
 
RISC-V software state of the union
RISC-V software state of the unionRISC-V software state of the union
RISC-V software state of the union
 
Ripes tracking computer architecture throught visual and interactive simula...
Ripes   tracking computer architecture throught visual and interactive simula...Ripes   tracking computer architecture throught visual and interactive simula...
Ripes tracking computer architecture throught visual and interactive simula...
 
Porting tock to open titan
Porting tock to open titanPorting tock to open titan
Porting tock to open titan
 
Open j9 jdk on RISC-V
Open j9 jdk on RISC-VOpen j9 jdk on RISC-V
Open j9 jdk on RISC-V
 
Open source manufacturable pdk for sky water 130nm process node
Open source manufacturable pdk for sky water 130nm process nodeOpen source manufacturable pdk for sky water 130nm process node
Open source manufacturable pdk for sky water 130nm process node
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processors
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a single chip

  • 1. Confidential Esperanto Accelerates Machine Learning With 1000+ Low-Power RISC-V Cores on a Single Chip RISC-V Summit 2020 8 December 2020 CEO: art.swift@esperanto.ai
  • 2. 2 2020 RISC-V Summit Esperanto: Doubling Down on RISC Principles and RISC-V DavidR. Ditzel and DavidA. Patterson, the co- authors of“TheCase for the ReducedInstruction SetComputer” together at the 7th RISC-V Workshop Using RISC-Vasthe basisforourAI Processorstrategywaskey forus! RISC-VENABLESHARDWAREINNOVATION BROADECOSYSTEMEASESDEVELOPMENTTASKS SIMPLEINSTRUCTIONSETUSESFEWER GATES - Less complex designs - Smaller die size andlower costs - Reduced dynamic andstatic power consumption - Machine learning specific Instruction Set extensions - Custommicroarchitecture - Proprietarylow-power design techniques - Development tools - Operating systems andsoftware stacks - 3rd party IP
  • 3. 3 2020 RISC-V Summit Esperanto: Highly Scalable RISC-V AI Chip Solutions EsperantoET-SoC-1dieplot: 1000+RISC-VCustomCoreswith23.8Btransistors usingTSMC7nm manufacturingnode. Initialproductistargetedat datacenterinferencing. Esperanto’sTiledAI Solutionis Designed toScalefromHundreds toThousandsofCPU Cores BEST-IN-CLASSEFFICIENCY FUTURE-PROOFSOLUTION SUPERIORPERFORMANCE(1) - Up to50x better performance on Recommendation Networks - Up to30x better performance for Image Classification - 100x better energy efficiency (Inferences / Watt) on key workloads - Huge reduction in energy costs for datacenter customers - Fully programmable tohandle future AImodels - Leverages large, open programmingsoftware ecosystem - Industry-leading roadmapofhardwareandsoftware solutions (1) Comparing Esperanto full-chip emulation results with measured inference benchmark results for incumbent competitors. Characterized silicon results coming soon.
  • 4. 4 2020 RISC-V Summit DC bank 0 DC bank 1 DC bank 2 DC bank 3 Data Cache Control (including D-tags) Front End Trans ROMs 32-bit & 16-bit FMA Bypass TIMA TIMA VPU RF T0/T1 VPU RF T0/T1 32-bit & 16-bit FMA Bypass TIMA TIMA VPU RF T0/T1 32-bit & 16-bit FMA Bypass TIMA TIMA VPU RF T0/T1 Trans ROMs 32-bit & 16-bit FMA Bypass TIMA TIMA Trans ROMs 32-bit & 16-bit FMA Bypass TIMA TIMA VPU RF T0/T1 VPU RF T0/T1 32-bit & 16-bit FMA Bypass TIMA TIMA VPU RF T0/T1 32-bit & 16-bit FMA Bypass TIMA TIMA VPU RF T0/T1 Trans ROMs 32-bit & 16-bit FMA Bypass TIMA TIMA ET-Minionis an Energy-efficientRISC-V CPU with Vector/Tensor Unit ET-MINIONISACUSTOMBUILT 64-BITRISC-VPROCESSOR - In-order pipeline with low gates/stage toimproveMHz at low voltages - Architecture andCircuitsoptimized toenable low-voltage operation - 2 Hardwarethreads ofexecution - Softwareconfigurable L1 data-cache and/or scratchpad VECTOR/TENSORUNITOPTIMIZEDFORMACHINELEARNING - New multi-cycle Tensor Instructions - 256-bit wide Floating Point per cycle - 16 32-bit Single Precision operations per cycle - 32 16-bit Half Precision operations per cycle - 512-bit wide Integer per cycle - 128 8-bit integer operations per cycle - Vector transcendental instructions 512b Int8 RISC-V Integer Pipeline Vector/Tensor Unit 256b Floating Point Vector RF ET-Minion RISC-V Core and Tensor/Vector unit optimized for low-voltage operation to improve energy-efficiency RISC-V Integer L1 Data-Cache/Scratchpad Optimizedforenergy-efficient MLoperations.EachET-Minioncandeliver peakof128GOPs8 perGHz.
  • 5. 5 2020 RISC-V Summit 32 ET-Minion CPU’s and 4 MB memory form a “MinionShire” 32ET-MINIONRISC-VCORESPERMINIONSHIRE - Arrangedinfour 8-coreneighborhoods MEMORYHIERARCHYISSOFTWARECONFIGURABLE - L1 SRAM can be configured as data cache orscratchpad - 4MBL2 SRAM canbe configuredas PrivateL2, SharedL3 or scratchpad MESHCONNECTEDSHIRES MULTIPLESYNCHRONIZATIONPRIMITIVES - Fast Local Atomics - Fast Local Barriers - Fast Local Credit Counter - IPISupport 4x4 xbar Mesh stop m0 m3 m4 m7 m8 m11 m12 m15 m16 m19 m20 m23 m24 m27 m28 m31 Minion Shire Bank0 (1MB) Bank1 (1MB) Bank2 (1MB) Bank3 (1MB) Four 8-Core Neighborhoods 4MB Banked SRAM Cache/Scratchpad Local Sync Primitives Mesh Interconnect Low Voltage Nominal Voltage
  • 6. 6 2020 RISC-V Summit More RISC-V’s on a Chip: 1089 ET-Minions & 4 ET-Maxionsin 7nm LPDDR4x DRAM Ctrl LPDDR4x DRAM Ctrl PCIe 4 Maxions 34Minion Shires - 1088ET-MinionProcessors - 136MB on-diememory software configurableasL2, L3 orScratchpad - Sharedglobaladdressspace ServiceProcessor - 1 ET-MinionProcessor 4ET-MaxionProcessors - High PerformanceOOO CPU - Up to 5 RV64GC instructionissue/clock - 4 MB PrivateL2 x8PCIe Gen4 SecureRootof Trust LPDDR4xDRAMControllers - Up to 32 GB DRAM - 137GB/sec memory bandwidth - 256-bitwideinterface BlockdiagramofEsperanto’sEnergy-EfficientET-SoC-1Chip. Typicaloperatingpointunder20Watts.
• 7. 6558 RISC-V Cores on a Board with Esperanto's Energy-Efficient Chip
1536-BIT WIDE MEMORY SYSTEM DELIVERS UP TO 822 GB/S OF ENERGY-EFFICIENT BANDWIDTH
[Figure: six ET-SoC-1 chips (each with 1093 RISC-V cores and 140 MB SRAM) behind a PCIe switch and PCIe card interface; each chip drives four 64-bit LPDDR4x devices, for 24 DRAM chips and 192 GB of LPDDR4x in total]
Energy efficiency enables Esperanto to put multiple chips on each board instead of one big, hot chip.
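The board-level numbers are the per-chip figures multiplied by six. A short sketch of that scaling: every input comes from the slides, and the 8 GB per DRAM device is derived here rather than quoted.

```python
# Scale the per-chip figures to the six-chip board.
# All inputs come from the slides; the per-device capacity is derived.
CHIPS_PER_BOARD = 6
CORES_PER_CHIP = 1093
BUS_BITS_PER_CHIP = 256
BANDWIDTH_PER_CHIP_GB_S = 137
DRAM_PER_CHIP_GB = 32
DRAM_DEVICES = 24
DEVICE_WIDTH_BITS = 64

board = {
    "cores": CHIPS_PER_BOARD * CORES_PER_CHIP,                    # 6558
    "bus_bits": CHIPS_PER_BOARD * BUS_BITS_PER_CHIP,              # 1536
    "bandwidth_gb_s": CHIPS_PER_BOARD * BANDWIDTH_PER_CHIP_GB_S,  # 822
    "dram_gb": CHIPS_PER_BOARD * DRAM_PER_CHIP_GB,                # 192
}

# Cross-check: 24 devices x 64 bits should equal the 1536-bit system width.
assert DRAM_DEVICES * DEVICE_WIDTH_BITS == board["bus_bits"]

gb_per_device = board["dram_gb"] / DRAM_DEVICES  # 8.0
print(board, gb_per_device)
```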
• 8. Up to Six ET-SoC-1 Chips on a Glacier Point v2 Card
Three Esperanto Dual M.2 modules can mount on the top side and three on the bottom.
Peak performance of more than 800 tera-operations per second (8-bit) with ET-Minions operating at 1 GHz.
ONE CARD WITH UP TO:
- 6558 RISC-V cores
- 192 GB of DRAM
- 822 GB/s DRAM bandwidth
Note: The Glacier Point v2 board design has been open-sourced through the Open Compute Project and is available for purchase.
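The ">800 Tera-Ops8" figure can be reproduced from the per-Minion peak of 128 int8 operations per cycle. A sketch, assuming that only the 1088 compute Minions per chip are counted (the service-processor Minion and the ET-Maxions are left out):

```python
# Reproduce the card-level peak from the per-Minion peak.
# ASSUMPTION: only the 1088 compute ET-Minions per chip are counted, each at
# the 128 int8 ops/cycle peak quoted for the vector/tensor unit.
CHIPS = 6
COMPUTE_MINIONS_PER_CHIP = 1088
INT8_OPS_PER_CYCLE = 128
CLOCK_GHZ = 1.0

gops = CHIPS * COMPUTE_MINIONS_PER_CHIP * INT8_OPS_PER_CYCLE * CLOCK_GHZ
tops = gops / 1000
print(f"{tops:.1f} TOPS (int8, peak)")  # 835.6 TOPS (int8, peak)
```

This is a peak figure: it assumes every compute Minion completes a full-width int8 operation on every cycle, which real workloads will not sustain.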
• 9. Fits in Existing OCP Infrastructure
OCP Glacier Point v2 accelerator card holds:
- 6 Esperanto AI chips
- 192 GB DRAM
Yosemite v2 sled holds:
- 1 or 2 Glacier Point accelerator cards
- Up to 12 Esperanto AI chips
Yosemite v2 cubby holds:
- 4 Yosemite sleds
- 48 Esperanto AI chips
Rack with Yosemite v2 holds:
- 8 Yosemite v2 cubbies
- 384 Esperanto AI chips
Example OCP data center:
- About 30 sq. ft. per OCP rack (1)
- Estimated 4,000 to 20,000 racks per data center
[Figure: rack elevation with a top-of-rack switch, two power shelves, and 8 Yosemite v2 cubbies (x8), each holding 4 sleds (x4), each holding up to 2 Glacier Point v2 accelerator cards (x2)]
Note (1): "The Case for the Infinite Data Center", Gartner. Sources: Gartner, Data Center Frontier.
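The chip counts at each level are the product of the fan-outs in the OCP packaging hierarchy. A sketch multiplying them out for the maximum configuration (two cards per sled); the per-data-center range is plain arithmetic on the slide's 4,000 to 20,000 rack estimate, an upper bound that assumes every rack is fully populated, and is not a figure Esperanto quotes.

```python
# Multiply out the OCP packaging hierarchy (maximum configuration).
# Fan-outs are from the slide; the data-center range is derived arithmetic
# that assumes every rack is fully populated with accelerators.
import math

HIERARCHY = [
    ("chips per card", 6),
    ("cards per sled", 2),
    ("sleds per cubby", 4),
    ("cubbies per rack", 8),
]


def chips_per_rack(levels: list[tuple[str, int]]) -> int:
    """Product of the fan-out at every packaging level."""
    return math.prod(fan_out for _name, fan_out in levels)


per_rack = chips_per_rack(HIERARCHY)               # 6 * 2 * 4 * 8 = 384
dc_range = (4_000 * per_rack, 20_000 * per_rack)   # (1536000, 7680000)
print(per_rack, dc_range)
```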
• 10. Software: Esperanto Supports C++/PyTorch and Common ML Frameworks
[Figure: software stack. ML model frameworks (C++, ….., ONNX models, MS CNTK) feed the GLOW compiler (Facebook open-source project), which runs on the x86 host: GLOW frontend, GLOW IR (intermediate representation), and the ET GLOW backend emitting ET-SoC-1 instructions. Below sit the ET runtime and ET device driver; ML models run across multiple ET-SoC-1 chips. Development tools: console/debugger, performance monitor. Management utilities: diagnostics, firmware updater]
GLOW frontend:
- GLOW = Graph LOWering
- Open-sourced by Facebook
- Hardware-independent optimizations
- Divides work across n chips
GLOW backend:
- Performs hardware-dependent optimizations
- Backend modified by Esperanto to generate instructions for the ET-SoC-1 chip
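To illustrate what "divides work across n chips" can mean, here is a sketch of one classic approach: split a chain of layers into n contiguous stages so that the busiest chip carries as little work as possible (linear partitioning via binary search on the maximum load). This is a swapped-in textbook technique, not GLOW's actual partitioner, and the per-layer costs are hypothetical.

```python
# Illustrative linear partitioning: split a chain of layers into at most
# n_chips contiguous stages, minimizing the largest per-chip load.
# NOT GLOW's partitioner; the layer costs below are hypothetical.


def min_max_load(costs: list[int], n_chips: int) -> int:
    """Smallest achievable maximum per-chip load with contiguous stages."""
    def chips_needed(limit: int) -> int:
        chips, load = 1, 0
        for c in costs:
            if load + c > limit:
                chips, load = chips + 1, c
            else:
                load += c
        return chips

    lo, hi = max(costs), sum(costs)
    while lo < hi:  # binary search on the load limit
        mid = (lo + hi) // 2
        if chips_needed(mid) <= n_chips:
            hi = mid
        else:
            lo = mid + 1
    return lo


def partition(costs: list[int], n_chips: int) -> list[list[int]]:
    """Group layer indices into at most n_chips contiguous stages."""
    if not costs:
        return []
    limit = min_max_load(costs, n_chips)
    stages, current, load = [], [], 0
    for i, c in enumerate(costs):
        if current and load + c > limit:
            stages.append(current)
            current, load = [], 0
        current.append(i)
        load += c
    stages.append(current)
    return stages


layer_costs = [4, 8, 2, 6, 5, 3, 7, 1]  # hypothetical per-layer work units
print(partition(layer_costs, 3))        # [[0, 1], [2, 3, 4], [5, 6, 7]]
```

Here the three stages carry loads of 12, 13, and 11; a real compiler would also weigh inter-chip transfer costs and per-chip memory capacity, which this sketch ignores.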
• 11. Balanced Architecture for Evolving Machine Learning Workloads
- Models range from compute-intensive to memory-intensive, with both dense and sparse matrix representations
- "Should not over-design hardware for GEMMs and convolutions" *
Key hyperscaler ML workload categories (use case: model examples; current approach attributes):
- Recommendation: DLRM, Wide&Deep, NCF; large embedding tables, MLP-based compute, mix of memory-intensive and compute-intensive work
- Computer vision: ResNets, ResNeXt, YOLO, M2Det; CNN, convolution
- Natural language processing: BERT, GPT-3; multi-headed self-attention, matrix compute
[Chart: relative importance* of each category on a log scale (1X, 10X, 100X)]
* Misha Smelyanskiy, Facebook, Linley Fall Processor Conference 2019, "Challenges and Opportunities of Architecting AI Systems at Datacenter Scale"
Esperanto provides a balanced solution for both dense compute and large, sparsely accessed memory.
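The contrast between compute-intensive and memory-intensive models can be made concrete with arithmetic intensity (operations per byte moved). A rough sketch comparing a recommendation-style embedding-bag lookup with a dense GEMM, under a simplified model (an assumption) in which every operand is read once, every output is written once, and cache reuse is ignored; the lookup count and matrix sizes are hypothetical.

```python
# Rough arithmetic intensity (ops per byte moved).
# ASSUMPTION: each operand is read from memory once and each output written
# once; caching and reuse are ignored. Sizes below are hypothetical.


def embedding_bag_intensity(lookups: int, dim: int, bytes_per_elem: int = 4) -> float:
    """Sum `lookups` embedding rows of width `dim` into one pooled vector."""
    ops = (lookups - 1) * dim                            # element-wise adds
    bytes_moved = (lookups + 1) * dim * bytes_per_elem   # rows read + result written
    return ops / bytes_moved


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """C[m,n] = A[m,k] @ B[k,n], counting a multiply-add as 2 ops."""
    ops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return ops / bytes_moved


print(round(embedding_bag_intensity(lookups=80, dim=64), 3))  # 0.244
print(round(gemm_intensity(1024, 1024, 1024), 1))             # 170.7
```

Under these assumptions the GEMM performs several hundred times more work per byte, which is why embedding-heavy recommendation models stress memory capacity and bandwidth while CNNs stress compute.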
• 12. Esperanto Meets Hyperscaler AI Inferencing Challenges
AI processing challenges include delivering AI-based services while reducing cost and complexity.
Today most hyperscaler AI inferencing workloads run on chips with legacy architectures. The performance, energy use, and programmability of these solutions do not meet demanding hyperscaler requirements.
Esperanto's custom RISC-V-based solutions deliver the required performance and power efficiency, are "future-proof," and don't lock hyperscalers into legacy suppliers.
Esperanto's energy-efficient, high-performance architecture will scale from hyperscale data centers to edge AI!
• 13. Some of Our Key Development Partners
Thanks to all our partners for their help in bringing our vision into reality! Sorry we can't name everyone!