HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the Datacenter

Session ID: HKG18-500K1
Session Name: HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the Datacenter
Speaker: Dileep Bhandarkar (Qualcomm Datacenter Technologies)
Track: Keynote


★ Session Summary ★
For decades we have been able to take advantage of Moore’s Law to improve single-thread performance and to reduce power and cost with each generation of semiconductor technology. Although technology has continued to advance since Dennard scaling ended more than 10 years ago, the pace has slowed. Server performance gains have relied on increasing core counts and power budgets.
At the same time, workloads have changed in the era of cloud computing. Scale-out is becoming more important than scale-up. Domain-specific architectures have started to emerge to improve the energy efficiency of emerging workloads such as deep learning.
This talk will provide a historical perspective and discuss the emerging trends driving the development of modern server processors.


---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-500k1/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-500k1.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-500k1.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong

---------------------------------------------------
Keyword: Keynote
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961


  1. Emerging Computing Trends in the Datacenter. Dileep Bhandarkar, Ph.D., Vice President, Technology, Qualcomm Datacenter Technologies, Inc. Linaro Connect Keynote, 23 March 2018, Hong Kong
  2. Outline • Historical Perspective on 40 Years of Moore’s Law: Single-Core Era Enabled by Dennard Scaling • Post-Dennard Scaling Drives the Multi-Core Era • The Shift to Energy-Efficient Multi-Core Designs for the Cloud • Heterogeneous Computing Era with Application-Specific Accelerators
  3. The First 50 Years after Shockley’s Transistor Invention
  4. 1958: Jack Kilby’s Integrated Circuit • Bob Noyce’s Integrated Circuit • Video, “My 40+ Year Journey From Mainframes to Smartphones”: https://www.youtube.com/watch?v=7ptXpNFY3XM
  5. From 2,300 to >1 Billion Transistors. Moore’s Law video: http://www.cs.ucr.edu/~gupta/hpca9/HPCA-PDFs/Moores_Law_Video_HPCA9.wmv
  6. Dennard Scaling
     Device or circuit parameter              Scaling factor
     Device dimension (t_ox, L, W)            1/K
     Doping concentration (N_a)               K
     Voltage (V)                              1/K
     Current (I)                              1/K
     Capacitance (εA/t)                       1/K
     Delay time per circuit (VC/I)            1/K
     Power dissipation per circuit (VI)       1/K²
     Power density (VI/A)                     1
     The benefits of scaling: as transistors get smaller, they can switch faster and use less power. Each new generation of process technology was expected to reduce minimum feature size by approximately 0.7x (K ≈ 1.4). A 0.7x reduction in linear feature size provided roughly a 2x increase in transistor density. Dennard scaling broke down around 2004 because of unscaled interconnect delays and our inability to keep scaling voltage and current due to reliability concerns. However, increasing transistor density (Moore’s Law) has continued to enable multicore designs.
  7. The Multicore Era (Post-Dennard Scaling): single-thread performance improvement is slowing down; performance is driven by higher core counts
  8. [Chart] Annotations: transistor count increasing; slower improvement; no improvement; power going up with performance; core count increasing to drive performance. Now performance improvement comes from higher core counts at similar frequency with each new process node.
  9. The Last 5 Generations of ~135 W Xeon Processors: slow improvement in IPC, but per-thread performance is constrained by power. Generations shown: 8 cores (Mar 2012), 10 cores (Sep 2013), 12 cores (Sep 2014), 14 cores (Apr 2016), 18 cores (Jul 2017). Performance data from www.spec.org.
  10. No Improvement in Perf/Watt per Core, Even with Higher Power. Performance data from www.spec.org.
  11. Era of Energy-Efficient Cores
  12. Looking Ahead from Edge to Cloud: the future requires a new approach to CPU design. Safe and autonomous • Hyper-efficient • Secure private compute • Cortex beyond mobile • Mixed reality. (Slide © 2017 Arm Limited, presented by Peter Greenhalgh at Hot Chips 2017.)
  13. Server Industry Is Shifting to the Cloud. [Chart: cloud vs. traditional enterprise IT as a percentage of total datacenter server revenue, 2013 to 2020]
  14. Disruptions Come from Below! [Chart: performance vs. volume for mainframes, minicomputers, RISC systems, desktop PCs, notebooks, and smartphones.] Bell’s Law: advances in hardware technology, networks, and interfaces allow new, smaller, more specialized computing devices to be introduced to serve a computing need.
  15. Mobile Technology Disrupting the Cloud Datacenter. Qualcomm Datacenter Technologies: uniquely positioned to leverage mobile growth and drive datacenter process leadership. A new world in the datacenter: manufacturing process. [Chart, 2008 to 2018: fab process technology was once driven by PCs (45nm, 32nm, 22nm, 14nm, 10nm) and is now driven by mobile phones (65nm, 45nm, 28nm, 20nm, 14nm, 10nm, with 10nm marked “1st in the industry”); 1.5B smartphone units vs. 256M PC units.]
  16. What Cloud Means for Processor Architecture (Qualcomm Centriq™ 2400): throughput performance, thread density, quality of service, energy efficiency. Key metrics: • Perf/thread • Perf/Watt • Perf/mm². The future requires a new approach to CPU design.
  17. Computational and server growth fuel datacenter energy-efficiency considerations • 2014: US datacenters consumed 70 billion kilowatt-hours of electricity • Datacenters can cost between $10M and $20M per megawatt • Unused datacenter capacity can be expensive • 1 W of server power can cost about $1 per year in energy costs at 10 cents per kWh • Server-power-related costs can be 30-50% of overall datacenter operating costs • Servers need to be designed for average power consumption (not just maximum peak output) • Hyper-efficient designs are necessary to improve server energy efficiency
  18. Qualcomm Centriq 2400: Built for the Cloud. [Block diagram: 24 Falkor duplexes and L3 cache slices on a coherent segmented ring interconnect; six memory controllers (MC) with DDR channels; PCIe and SATA controllers with 8-SerDes blocks; HDMA, EMAC, OCMEM, QGIC, USB, and QFPROM blocks.] • 48 custom Armv8 cores at 2.6 GHz peak frequency • Large 60 MB L3 cache • 6 DDR4 memory channels at 2667 MT/s • High-bandwidth coherent ring • Low average power under typical load • Ultra-low idle power • Cache Quality of Service • Inline memory bandwidth compression • Security rooted in hardware • Leading performance and energy efficiency. Details at https://www.qualcomm.com/products/qualcomm-centriq-2400-processor
  19. Qualcomm Centriq 2400 Drives Perf/W and Perf/Thread Leadership (values normalized to QDF 2460 = 1)
     Metric             QDF 2460   Platinum 8180   Gold 6138   Platinum 8160   Platinum 8170
     Power              1          1.71            1.04        1.25            1.38
     SPECint_rate2006   1          1.18            0.77        0.93            0.99
     Perf/Watt          1          0.69            0.74        0.75            0.72
     Perf/Core          1          2.02            1.84        1.86            1.70
     Perf/Thread        1          1.01            0.92        0.93            0.85
     Perf/$             1          0.24            0.59        0.40            0.27
     SKU details: QDF 2460: 48 cores, 120 W TDP, 657 SIR2006, $1,995 • Platinum 8180: 28 cores, 205 W TDP, 775 SIR2006, $10,009 (top-bin E7 price) • Gold 6138: 20 cores, 125 W TDP, 504 SIR2006, $2,612 • Platinum 8160: 24 cores, 150 W TDP, 612 SIR2006, $4,702 (top-bin E5 price) • Platinum 8170: 26 cores, 165 W TDP, 653 SIR2006, $7,405. Chart annotations: IsoPower, IsoPerf.
     Performance based on internal tests for SPECint_rate2006 (SIR) estimates using gcc -O2.
  20. Qualcomm Centriq 2460 Lowers Average and Idle Power to Improve Cloud Server Density in Datacenters. [Chart: average power (W, 0 to 120) for each SPECint®_rate2006 subtest: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 458.sjeng, 483.xalancbmk.] 120 W TDP; median = 65 W; 8 W idle power.
  21. Many Questions to Ponder • Are we really serious about energy efficiency? • What should the cost and power constraints be? • How many instruction sets is too many? x86, ARM, MIPS, Power, RISC-V • Have we reached the limit of high core counts? What about software scalability? • Do we need to improve single-thread general-purpose performance? • What should the power limit be for a single socket? • How much performance are we willing to sacrifice for better security? • Is there a fundamental conflict between multi-tenancy and security? • Cost and convenience vs. extreme security? • When does device scaling end? Will there be a sub-nanometer (picometer) era?
  22. Heterogeneous Computing Era
  23. Lessons from Mobile Computing • Energy efficiency must be an implicit design target • Desktop PC CPU cores are too power-hungry and not energy-efficient • Wimpy cores are not good enough for servers • Servers can be designed by scaling up the energy-efficient mobile core design philosophy • Many workloads run best on different kinds of specialized processing engines • Each processing engine has its own strengths
  24. The Age of Application-Specific Accelerators • An order of magnitude higher computational efficiency than general-purpose processors • Can accept an inefficient implementation to reduce time to market • Many potential applications: machine learning, encryption, data compression, video processing • Need reasonable volume for a business case • Algorithms need to be stable • Can they be programmable? Where do FPGAs fit?
  25. The Emergence of Deep Neural Networks: Deep Neural Networks Are Becoming Pervasive • Before the emergence of DNNs, algorithms and rule-based systems were laboriously hand-coded • By 2012, the ingredients for change were available: sufficiently powerful GPUs and readily available large data sets on the internet • The turning point: the 2012 ImageNet competition (“ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information Processing Systems Conference, NIPS 2012) • Deep neural nets enabled a performance breakthrough • Now DNNs are simpler to develop and deploy, ushering in radical change in many fields and entire industries
  26. Deep Learning Is Growing Exponentially (source: Google)
  27. Devices, machines, and things are becoming more intelligent
  28. Offering New Capabilities to Enrich Our Lives • Perception: hear, see, monitor, observe • Reasoning: learn, infer context, anticipate • Action: act intuitively, interact naturally, protect privacy
  29. Where Does Compute Need to Be, and Why? Devices, edge cloud, or central cloud, weighed against: • Bandwidth/backhaul traffic • Compute resources • Power/thermal envelope • Privacy and security • Latency • Reliability
  30. What Is the “Edge”? • Customer devices: smartphones, connected cars, drones, IoT sensors/devices; < 2 ms latency; millions of devices • Customer premises: enterprises, homes, stadiums, cars; < 5 ms latency; 1000s of devices • Cloudlets / edge nodes / edge gateways: 5-20 ms latency; optionally co-located with access networks; a few server racks per site • Centralized clouds: > 100 ms latency; 5-100 per operator or cloud service provider; 100s-1000s of server racks per site
  31. AI Is Increasingly Everywhere. Server/cloud: training and execution/inference. Devices: execution/inference. Inference runs on the device, on the edge cloud, or in the centralized cloud, depending on use-case characteristics (latency, bandwidth, context).
  32. • CPU: free cycles available; ISA enhancements; complementary with other accelerators • GPU: over-designed (cost, power) for AI • FPGA: offers flexibility; typically hard to program and expensive • ASIC: purpose-built; energy- and cost-efficient; expensive to design; least flexible
  33. Training tends toward concentrated, centralized computation; inference tends toward wide distribution. [Diagram labels: GPUs and large DPUs (higher cost); CPUs and small DPUs (low cost).]
  34. Thoughts on Future Silicon for Deep Learning • CPUs are not powerful enough for training but have free cycles available for inference: an opportunity for add-in accelerator cards • Instruction set enhancements can improve performance • GPUs have too much “extra baggage” that adds cost and power for features not needed for AI: an opportunity for domain-specific accelerators • FPGAs offer more flexibility but are difficult to program and expensive • ASICs are energy- and product-cost-efficient but less flexible • Deep neural networks are making significant strides in many areas: speech, vision, language, search, robotics, medical imaging and treatment, drug discovery … • We have an opportunity to dramatically reshape our computing devices to better serve this emerging and growing market • Expect to see lots of innovation and excitement in the years to come
  35. Concluding Remarks • Single-thread general-purpose performance improvement is slowing down • Energy efficiency is extremely important in datacenters • ARM architecture enables energy-efficient designs with good performance • Typical-use efficiency is becoming more important than peak-output efficiency in enterprise datacenters • Idle-mode power will become more important for servers • Smart power management can dynamically optimize server operation to improve efficiency in normal use • Security improvements are needed even if they cost performance • There is plenty of opportunity for innovation in new application-specific architectures targeted at specific workloads. Speculation Can Lead to a Meltdown!
  36. Thank you. For more information, visit us at www.qualcomm.com and www.qualcomm.com/blog. Nothing in these materials is an offer to sell any of the components or devices referenced herein. ©2018 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries. Qualcomm Centriq and Falkor are trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners. References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.
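The Dennard scaling table on slide 6 can be checked with a few lines of arithmetic. This is a minimal sketch. It assumes ideal scaling by a single linear factor K, using the slide's K ≈ 1.4 per generation. It derives each row of the table from the rows above it and confirms that power density stays constant. The `constant_voltage_power_density` helper is an illustrative addition that is not on the slide. It uses the simplified dynamic-power model P = C·V²·f with voltage held fixed, to show why power density climbs once voltage stops scaling.

```python
"""Per-generation multipliers implied by the Dennard scaling table (slide 6)."""


def dennard_factors(k: float) -> dict[str, float]:
    """Return each parameter's multiplier for ideal scaling by linear factor k."""
    dimension = 1 / k                     # t_ox, L, W all shrink by 1/K
    voltage = 1 / k                       # V scales with dimension
    current = 1 / k                       # I scales with dimension
    area = dimension ** 2                 # circuit area shrinks by 1/K^2
    capacitance = area / dimension        # eps*A/t -> (1/K^2) / (1/K) = 1/K
    delay = voltage * capacitance / current   # VC/I -> 1/K (faster switching)
    power = voltage * current                 # VI -> 1/K^2 per circuit
    density = 1 / area                        # circuits per unit area -> K^2
    power_density = power / area              # VI/A -> 1 (unchanged)
    return {
        "dimension": dimension,
        "capacitance": capacitance,
        "delay": delay,
        "power": power,
        "density": density,
        "power_density": power_density,
    }


def constant_voltage_power_density(k: float) -> float:
    """Illustrative post-Dennard case (assumption): V fixed, P = C * V^2 * f.

    Capacitance still scales by 1/K and frequency is assumed to rise by K,
    so per-circuit power stays flat while circuit density grows by K^2.
    """
    capacitance, voltage, frequency = 1 / k, 1.0, k
    power_per_circuit = capacitance * voltage ** 2 * frequency
    return power_per_circuit * k ** 2     # multiply by circuits per unit area


if __name__ == "__main__":
    for name, factor in dennard_factors(1.4).items():
        print(f"{name:>14}: {factor:.3f}")
    print(f"constant-V power density: {constant_voltage_power_density(1.4):.2f}")
```

With K = 1.4 the ideal case gives about 0.71x delay, 0.51x power per circuit, and 1.96x density, which is the slide's "roughly 2x". Power density is unchanged. Under the constant-voltage assumption, power density instead grows by K², about 1.96x per generation.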
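Slide 17's rule of thumb, that 1 W of server power costs about $1 per year, follows from simple arithmetic. A constant 1 W load draws 8.76 kWh per year, or about $0.88 at 10 cents per kWh. The sketch below reproduces that figure. It assumes a load that runs around the clock and ignores cooling and power-distribution overheads. It also converts the slide's $10M to $20M per megawatt build cost into a capacity cost for a single load. The 120 W load is a hypothetical example value, chosen because it matches the Centriq 2460 TDP on slide 19.

```python
"""Back-of-the-envelope datacenter power costs using the figures on slide 17."""

HOURS_PER_YEAR = 24 * 365  # 8,760 h; ignores leap years


def annual_energy_cost(watts: float, usd_per_kwh: float = 0.10) -> float:
    """Electricity cost (USD/year) of a constant load running all year."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * usd_per_kwh


def capacity_cost(watts: float, usd_per_megawatt: float) -> float:
    """One-time cost (USD) of building datacenter capacity for `watts`."""
    return watts / 1_000_000 * usd_per_megawatt


if __name__ == "__main__":
    # 1 W at 10 cents/kWh: roughly $0.88/year, which the slide rounds to ~$1.
    print(f"1 W for a year: ${annual_energy_cost(1):.2f}")
    # Hypothetical 120 W load at the slide's $10M and $20M per MW build costs.
    for usd_per_mw in (10e6, 20e6):
        print(f"120 W of capacity at ${usd_per_mw / 1e6:.0f}M/MW: "
              f"${capacity_cost(120, usd_per_mw):,.0f}")
```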
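The normalized values in slide 19's table can be recomputed from the raw per-SKU figures the slide lists: TDP, SPECint_rate2006 estimate, and price. The sketch below is a hypothetical helper, not Qualcomm's methodology. It divides each SKU's power, performance, perf/W, and perf/$ by the QDF 2460 baseline. Using TDP as the power figure is an assumption, though it is consistent with the slide's Power row. Rounded to two decimals, the output matches the slide's Power, SPECint_rate2006, Perf/Watt, and Perf/$ rows.

```python
"""Normalize slide 19's raw SKU data against the Centriq 2460 (QDF 2460 = 1.0)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    tdp_watts: float
    sir2006: float      # SPECint_rate2006 estimate from the slide
    price_usd: float


SKUS = [
    Sku("QDF 2460", 120, 657, 1995),
    Sku("Platinum 8180", 205, 775, 10009),
    Sku("Gold 6138", 125, 504, 2612),
    Sku("Platinum 8160", 150, 612, 4702),
    Sku("Platinum 8170", 165, 653, 7405),
]


def normalized(skus: list[Sku], baseline: Sku) -> dict[str, dict[str, float]]:
    """Return power, performance, perf/W, and perf/$ relative to `baseline`."""
    def metrics(s: Sku) -> dict[str, float]:
        return {
            "power": s.tdp_watts,
            "perf": s.sir2006,
            "perf_per_watt": s.sir2006 / s.tdp_watts,
            "perf_per_dollar": s.sir2006 / s.price_usd,
        }

    base = metrics(baseline)
    return {
        s.name: {key: value / base[key] for key, value in metrics(s).items()}
        for s in skus
    }


if __name__ == "__main__":
    table = normalized(SKUS, SKUS[0])
    for name, row in table.items():
        cells = "  ".join(f"{k}={v:.2f}" for k, v in row.items())
        print(f"{name:<14} {cells}")
```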
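Slide 20 argues that lower average power improves server density. A simple way to illustrate this is to count how many processors fit under a fixed power budget. The sketch below assumes a hypothetical 10 kW budget and counts only processor power, ignoring memory, storage, fans, and power-supply losses. It compares provisioning at the 120 W TDP with provisioning at the 65 W median average power shown on the slide.

```python
"""How many processors fit in a power budget: provisioning by TDP vs. typical power."""


def sockets_per_budget(budget_watts: float, watts_per_socket: float) -> int:
    """Whole number of processors whose combined power fits within the budget."""
    if watts_per_socket <= 0:
        raise ValueError("watts_per_socket must be positive")
    return int(budget_watts // watts_per_socket)


if __name__ == "__main__":
    BUDGET_W = 10_000    # hypothetical budget, processor power only
    TDP_W = 120          # Centriq 2460 TDP from slides 19 and 20
    MEDIAN_AVG_W = 65    # median average power across SPECint_rate2006 subtests (slide 20)

    print(f"provisioned at TDP:            {sockets_per_budget(BUDGET_W, TDP_W)} processors")
    print(f"provisioned at median average: {sockets_per_budget(BUDGET_W, MEDIAN_AVG_W)} processors")
```

Under these assumptions, the count rises from 83 to 153 processors. Provisioning below TDP still needs headroom for workloads that run above the median. The smart power management mentioned on slide 35, such as power capping, would have to cover that gap.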
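Slide 21 asks whether we have reached the limit of high core counts and software scalability. Amdahl's law, which is not on the slide, is one conventional way to frame that question. Suppose a fraction p of a workload parallelizes perfectly across n cores and the rest stays serial. The speedup is then 1 / ((1 - p) + p / n). The sketch below evaluates this for the core counts that appear on slide 19. The 95% parallel fraction is a hypothetical value chosen for illustration.

```python
"""Amdahl's law: speedup of a partly parallel workload on n cores."""


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when `parallel_fraction` of the work scales perfectly."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    P = 0.95  # hypothetical: 95% of the work parallelizes
    for n in (20, 28, 48):
        print(f"{n:>2} cores: {amdahl_speedup(P, n):5.2f}x")
    print(f"upper bound as cores grow: {1 / (1 - P):.0f}x")
```

With p = 0.95, 48 cores yield only about a 14x speedup, and no core count can exceed 20x. Throughput benchmarks such as SPECint_rate2006 largely sidestep this limit because they run independent copies of a benchmark. The limit bites hardest for a single latency-sensitive task.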
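Slides 30 and 31 suggest placing inference on the device, at the edge, or in a centralized cloud according to latency and other use-case characteristics. The sketch below encodes only the latency dimension. It picks the most centralized tier (the one with the most compute resources) whose latency fits a request's budget. The tier latencies come from slide 30's ranges, taking the upper end of each range. The centralized cloud's "> 100 ms" is treated as 100 ms. Those simplifications, and the `place` function itself, are illustrative assumptions. A real placement decision would also weigh bandwidth, power, privacy, and reliability.

```python
"""Pick a compute tier for a latency budget, using slide 30's latency figures."""

from typing import Optional

# Ordered from most centralized (most compute) to closest to the user.
# Latencies are illustrative: upper end of each slide-30 range, with the
# centralized cloud's "> 100 ms" treated as 100 ms.
TIERS: list[tuple[str, float]] = [
    ("centralized cloud", 100.0),
    ("edge cloudlet", 20.0),
    ("customer premises", 5.0),
    ("customer device", 2.0),
]


def place(latency_budget_ms: float) -> Optional[str]:
    """Return the most centralized tier that meets the budget, or None if none does."""
    for tier, latency_ms in TIERS:
        if latency_ms <= latency_budget_ms:
            return tier
    return None


if __name__ == "__main__":
    for budget in (1, 3, 10, 50, 200):
        print(f"{budget:>4} ms budget -> {place(budget)}")
```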
