NST 121
Computer Systems
Fundamentals
INTRODUCTION TO COMPUTERS
Gary Tarolli - 3dfx and Nvidia
3D Graphics Engineer
Monday, April 27
3D Graphics from my career perspective
1974-1978 BS. Math RPI (minor in CS)
1979-1980 MS CS Caltech
1980-1983 Digital Equipment Corp
1984-1992 Silicon Graphics, Inc
1992-1993 consulting
1993-2000 3dfx
2000- nvidia
or “Moore’s Law viewed from my career”
Moore’s law at 50 (years) publication came in the mail last week …
Various articles in the news too … should we throw a party or a wake ?
Moore’s law in action over 4 decades
Moore’s Law : http://www.mooreslaw.org
The most popular formulation is :
the number of transistors on an integrated circuit
doubles about every two years. (same size chip)
e.g. 500nm to 350nm is sqrt(2) shrink on one side
of a chip, so square = 2x as dense (# transistors)
Note: in addition the clock speed increases
and the chip area increases (better manufacturing)
Cost per transistor (or per unit of performance) drops!
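The shrink arithmetic on this slide can be checked in a few lines. This is a minimal sketch: the function name `density_gain` is made up for illustration, and it assumes transistor density scales with the inverse square of the feature size, which is the idealization the slide uses.

```python
import math

def density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized density gain from a linear shrink: area scales with the square."""
    return (old_nm / new_nm) ** 2

linear_shrink = 500 / 350        # about 1.43, close to sqrt(2)
gain = density_gain(500, 350)    # about 2.04, i.e. roughly 2x as dense

print(f"linear shrink: {linear_shrink:.2f} (sqrt(2) = {math.sqrt(2):.2f})")
print(f"density gain:  {gain:.2f}x")
```

The same function gives exactly 4x for a 500 nm to 250 nm shrink, since the linear dimension halves.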
Result: trends over 4 decades …
Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx)
The rise of importance of 3D graphics and hence graphics chips
Consolidation in the 3d graphics industry
◦ ~40 3d graphics chip startups in 1994
◦ Only a few independent companies left : nvidia, Imagination Technologies (Power VR)
◦ CPU/system companies: Intel, AMD, Apple
Surprise: graphics chips power supercomputers
Surprise: cars
◦ 8 million cars with nvidia chips in them, many more coming
◦ Self driving cars are coming: enabled by supercomputing power in cheap chips
Surprise: deep neural net learning enabled by this computing power is exploding
Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
http://en.wikipedia.org/wiki/The_Singularity_Is_Near
You probably don’t believe this now,
see if you do in an hour …
So let’s begin the journey …
1974-1978 : BS. Math & CS RPI
1974 – my first calculator : HP-35 purchased for college ($270? – a few weeks' salary)
1975 – my first computer program on an IBM 360 mainframe
(using my friend's engineering account)
1979-1980 : MS CS Caltech
1979 – played networked Star Trek on Xerox Alto : black and white bit-mapped graphics
until 4am , living off of $.25 ice cream sandwiches
1979-1980 : MS CS Caltech …
Worked on VLSI CAD tools for custom chips, where humans drew every single wire for every single
transistor on a chip
[Figures: two images, each labeled “inverter”]
1979-1980 : MS CS Caltech …
MIT class projects in 1978
1980-1983 : DEC (minicomputer) #93246
CPUs were still many boards of logic
I worked on VLSI CAD tools so we could design a single-chip VAX, called MicroVAX
And go from this:
A refrigerator filled with boards …
1980-1983 : DEC (minicomputer) …
To this …
1984-1992 : SGI (workstation) #36
IRIS 1000 workstation (1984) : $10,000 to $30,000 - 8 MHz Motorola 68010
IRIS 1400 workstation: ran at 10 MHz , had 1.5 MB of RAM and a 73 MB disk drive
My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight
1984-1992 : SGI, Silicon Graphics, Inc …
IRIS Indigo (1992) : $6000 - 33 MHz MIPS R3000
◦ 100k lines/sec, 10k triangles/sec
◦ Almost all of the SGI GL library was implemented in software on the MIPS CPU
1984-1992 : SGI, Silicon Graphics, Inc …
1991: IrisVision: $4000 board set for the PC, ISA and Micro Channel
◦ http://en.wikipedia.org/wiki/IrisVision
The Intel 486 and the bus architecture were just too slow, so it died in obscurity …
But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming
… faster Pentiums, Moore’s law (smaller, denser chips) , PCI bus ….
and that SGI would be out of business some day if it didn’t transform itself
But going from 80% margins to 20% margins is not easy to swallow. They did not …
we voted with our feet and left (along with others who went to Nvidia and elsewhere)
and they paid the price…by 2000 SGI was in decline … died in 2009 … about 20 years later …
$0 to $5 billion back to $0
1984-1992 : SGI, Silicon Graphics, Inc …
Onyx Reality Engine (1992) : $50,000 to $80,000 – 100 MHz R4400
Beautiful real-time texture mapped graphics (divide per pixel)
◦ 1M triangles/sec, 100 Mpixels/sec
1993-2000 : 3Dfx (PC) employee #1
Why:
◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process)
◦ We saw a problem within SGI, and an opportunity in 3d PC graphics
◦ Engineers – we saw a cool problem and wanted to solve it
◦ We realized the gaming market was a lot bigger than anyone knew
◦ ~$5B at the time, almost as big as the movie industry
◦ Today it is MUCH larger, over $100B worldwide for all games, dwarfs the movie industry
Goal:
◦ Produce images similar to Reality Engine’s for $500, in real time, i.e. 30 fps
◦ Similar means reduced quality (less bit depth) but still excellent
Activation energy: Caroline said “Just do it” one day
1993-2000 : 3Dfx (PC) …
How:
◦ Make maximum use of just-arriving technology
◦ Aim high – don’t sacrifice quality, do the entire Reality Engine pipeline at full speed
◦ Make it easy to program: no difficult choices, e.g. trading off speed for quality
◦ Included ALL the important features of Reality Engine: shading, zbuffering, alpha-blending, fog, quality texturing and filtering
◦ Listened to game developers and professionals – tech. advisory board
◦ John Carmack (id)
◦ Tim Sweeney (Epic)
◦ Tom Porter (Pixar)
A bit of luck, ok a lot?
◦ $500 too costly for consumer market, so we targeted the arcades
◦ And 3dfx ended up in various arcade machines, SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc.
◦ Memory prices fell dramatically, resulting in a $300 board and enabling the consumer market
1993-2000 : 3Dfx (PC) …
Key to quality texture mapping is per-pixel divide
◦ Very costly
◦ Key is to be just good enough
◦ We didn’t need 32 bit results, only about 18-20 bits
◦ Just enough to not be visually distracting
◦ So we used a table lookup, and then linear interpolation (which helped a lot)
◦ Remember those sin/cos/tan tables in high school trig? Same basic idea
◦ 6 bit index (64 entries, 15 bits wide, ends up in a PLA optimized ROM)
◦ 4 bit interpolation, adds another 3-4 bits
◦ Input is float, so shift result by exponent since log(1/x) = -log(x) = -exponent(x) in float representation
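The bullets above describe a reciprocal built from a small table, linear interpolation, and an exponent shift. The following is a minimal software sketch of that idea, not the actual Voodoo circuit. The 6-bit index and 4-bit interpolation fraction come from the slide; everything else is an assumption made for illustration: the names `TABLE`, `END` and `approx_reciprocal`, the choice to round entries to 15 fractional bits (one reading of "15 bits wide"), and the use of a constant for the right-hand endpoint of the last table slice.

```python
import math

INDEX_BITS = 6     # 64 table entries, as on the slide
INTERP_BITS = 4    # 4-bit interpolation fraction, as on the slide
ENTRY_BITS = 15    # assumption: entries rounded to 15 fractional bits

N = 1 << INDEX_BITS

def _quantize(v: float) -> float:
    """Round a table value to ENTRY_BITS fractional bits."""
    scale = 1 << ENTRY_BITS
    return round(v * scale) / scale

# Entry i holds 1/m at the left edge of the i-th slice of m in [0.5, 1).
TABLE = [_quantize(1.0 / (0.5 + i / (2 * N))) for i in range(N)]
END = 1.0  # 1/m at m = 1.0, the right edge of the last slice

def approx_reciprocal(x: float) -> float:
    """Approximate 1/x for finite x > 0 by table lookup plus linear interpolation."""
    if not x > 0.0 or math.isinf(x):
        raise ValueError("this sketch handles finite x > 0 only")
    m, e = math.frexp(x)          # x = m * 2**e, with 0.5 <= m < 1
    t = (m - 0.5) * 2 * N         # position within the table, 0 <= t < 64
    i = int(t)                    # top 6 bits of the mantissa: table index
    steps = 1 << INTERP_BITS
    frac = int((t - i) * steps) / steps   # next 4 bits: interpolation fraction
    lo = TABLE[i]
    hi = TABLE[i + 1] if i + 1 < N else END
    r = lo + (hi - lo) * frac     # linear interpolation between neighbours
    return math.ldexp(r, -e)      # undo the exponent: 1/x = (1/m) * 2**-e

for x in (3.0, 0.1, 1000.0):
    approx = approx_reciprocal(x)
    print(f"1/{x}: approx {approx:.6f}, exact {1.0 / x:.6f}")
```

Because only the mantissa goes through the table, the same 64 entries serve every magnitude of input; the exponent is handled by a shift, which is the point of the last bullet.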
Simplify full equations using math, e.g. LOD = Log2 ( sqrt(dsdx^2 + dsdy^2)) = .5 * Log2 (dsdx^2 + dsdy^2)
◦ Log2 (sqrt(x)) = .5 * Log2 (x)
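The identity Log2(sqrt(x)) = .5 * Log2(x) removes the square root from the level-of-detail computation. A minimal sketch, assuming `dsdx` and `dsdy` are the texture-coordinate derivatives across the screen and that the un-simplified LOD is Log2 of the gradient length (my reading of the slide); both function names are made up for illustration:

```python
import math

def lod_with_sqrt(dsdx: float, dsdy: float) -> float:
    """Direct form: log2 of the gradient length, needs a square root."""
    return math.log2(math.sqrt(dsdx ** 2 + dsdy ** 2))

def lod_simplified(dsdx: float, dsdy: float) -> float:
    """Same value without the square root: half the log2 of the squared length."""
    return 0.5 * math.log2(dsdx ** 2 + dsdy ** 2)

for dsdx, dsdy in [(1.0, 0.0), (3.0, 4.0), (0.25, 0.5)]:
    print(dsdx, dsdy, lod_with_sqrt(dsdx, dsdy), lod_simplified(dsdx, dsdy))
```

In hardware the multiply by .5 is a one-bit shift, so the simplified form trades a square-root unit for wiring.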
1993-2000 : 3Dfx
1993-2000 : 3Dfx …
C simulator
◦ Very fast bit accurate simulator for the chip
◦ 10k to 50k lines of C code
◦ Can research algorithms quickly
◦ Up and running well before RTL simulator
◦ You can develop software and hardware tests on C simulator
RTL simulator
◦ Verilog
Before tapeout, we compare C vs Verilog results for chip functional tests that we write
Story time : code then test, vs test then code
1993-2000 : 3Dfx… debugging
Yogi Berra: In theory there is no difference between theory and practice. In practice there is.
From Bandits? : Always expect the unexpected, except of course the truly unexpected …
Me: If you cannot believe there is a bug (in your code), then you will never find it.
1993-2000 : 3Dfx Voodoo 1
Voodoo 1 – 50 MHz chip, 500 nm process, 50 MHz memory (4 MB), 50 Mpixels/sec
◦ Each chip was ~1 million transistors, 250k gates
1993-2000 : 3Dfx Voodoo 1
System architecture – perhaps my best work ever (along with Scott Sellers)
1993-2000 : 3Dfx Voodoo 1 results
Images tell the story … compared to Reality Engine …
1993-2000 : 3Dfx Voodoo 2
1993-2000 : 3Dfx Voodoo 2 , 3
Voodoo 3 : ~4 years after Voodoo 1
1 chip vs 2-3 chips
Density: 250 nm vs 500 nm = 4x more logic (2x went to reduce the chip count)
Clock rate: 50 MHz to 200 MHz
Memory: 50 MHz to 166 MHz, 4 MB to 16 MB
https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units
2000-now : nvidia
We goofed, missed a product cycle/schedule, tactical and strategic mistakes and poof!
◦ Another one bites the dust
One strategic mistake – we did not put T&L on a chip until too late
◦ our next product had T&L , but it was still in the lab
◦ I thought CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did
◦ They peaked out at 8-16 cores, and IEEE float performance was not their #1 priority
◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe ourselves?)
◦ Enabled high $$$ investment in GPU floating point, where I thought it would end up on CPU
◦ Supercomputer speed floating point is basically for free on a GPU
◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer
◦ Many times more powerful than the early CRAY supercomputers
2000-now : nvidia Titan X
Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9
4x8 green dots = one SM (SIMD cpu)
3072 of them on the die
Each is ~Voodoo 2 or more
2000-now : 1995 + 20 years = 2015
Over 20 years, Moore’s law says we should expect a 2**10 increase, or about 1000x
                Voodoo 1              Titan X                   x increase
Transistors     2 M (2 chips)         8000 M                    4000
Cores           1                     2000-3000                 2500
Technology      500 nm                28 nm                     300
Area            100 mm^2              600 mm^2                  6
Triangles/sec   1 M                   6000 M                    6000
Mpixels/sec     100 M                 100,000 M                 1000
Ops/sec         5 B (8b)              7000 B (32b IEEE)         1000
Memory b/w      < 1 GB/sec            340 GB/sec                400
Power           4 watts               250 watts                 (the price you pay)
Frequency       50 MHz                1000 MHz                  20
Memory          4 MB                  12,000 MB                 3000
Cost            $500                  $1000                     2
Design          5 man years ($5M)     >500 man years ($500M)    100
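The 2**10 expectation and a few of the ratios in the table above can be reproduced directly. This is a small sketch that assumes the "doubling every two years" formulation from earlier in the talk; the dictionary keys are my own labels, and the number pairs are copied from the table.

```python
years = 20
doubling_period_years = 2   # "doubles about every two years"
expected = 2 ** (years // doubling_period_years)   # 2**10 = 1024

# (Voodoo 1, Titan X) pairs copied from the table, in matching units.
rows = {
    "Transistors (M)": (2, 8000),
    "Area (mm^2)": (100, 600),
    "Triangles/sec (M)": (1, 6000),
    "Mpixels/sec": (100, 100_000),
    "Frequency (MHz)": (50, 1000),
    "Memory (MB)": (4, 12_000),
    "Cost ($)": (500, 1000),
}

ratios = {name: new / old for name, (old, new) in rows.items()}
for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:8.0f}x  ({ratio / expected:.2f} of 2**10)")
```

Transistor count, triangle rate and memory size land above the 2**10 line, while clock frequency and die area grew far more slowly.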
CPUs vs GPUs
Graphics is embarrassingly parallel ! (millions of pixels on the screen)
◦ Which is why 1000-3000 cores can be efficient
◦ If your PC has 1000-3000 cores, what would they do?
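"Embarrassingly parallel" means each pixel can be computed without looking at any other pixel, so the work splits across any number of workers with no coordination. The sketch below illustrates only that independence: the `shade` function and the tiny screen size are made up, and Python threads are used for brevity rather than speed (CPython will not run this faster than the serial loop).

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48   # tiny made-up "screen"

def shade(x: int, y: int) -> int:
    """Hypothetical per-pixel shader: depends only on its own coordinates."""
    return (x * 255 // (WIDTH - 1) + y * 255 // (HEIGHT - 1)) // 2

def render_row(y: int) -> list:
    """Render one scanline; no row needs data from any other row."""
    return [shade(x, y) for x in range(WIDTH)]

# Serial reference: one worker walks every row.
serial = [render_row(y) for y in range(HEIGHT)]

# Parallel version: rows are handed out to 8 workers in any order.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(render_row, range(HEIGHT)))

assert parallel == serial   # same image, regardless of how work was split
```

A GPU applies the same split at much finer grain, with one hardware thread per pixel rather than per row.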
PIXAR field trip (while at 3dfx)
◦ Server room full of Sun workstations
◦ Limit is how much computing power you can fit in that physical room (and A/C)
Supercomputers
◦ Supercomputers are often limited by a power budget, in megawatts, for CPUs and A/C
◦ Once GPUs were general enough and supported 32b and 64b IEEE floating point ….
2000-now : 3dfx + nvidia … looking back
Need I say more:
1995: 0% of consumer PCs have 3d graphics accelerators
2015: 100% penetration (embedded accelerator in all Intel and AMD chips)
Deep neural net analysis, deep learning
Is this the key to Artificial Intelligence becoming real?
Intel 16 core XEON = 43 days to train a DNN problem
Titan-X = 1.5 days
Next year < 1 day
5 years … 1 hour (with software advances)
20 years … 1 sec to 1 minute ?
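The training-time figures above imply a speedup and a rate of improvement that are easy to work out. A small sketch using only the numbers on the slide; the variable names are mine, and the "doubling time" is simply what the 5-year projection would require, not a measured trend.

```python
import math

xeon_days = 43.0      # Intel 16-core Xeon, from the slide
titan_x_days = 1.5    # Titan X, from the slide

speedup = xeon_days / titan_x_days        # about 28.7x

# Projection on the slide: 1.5 days today, 1 hour in 5 years.
hours_now = titan_x_days * 24             # 36 hours
projected_gain = hours_now / 1.0          # 36x over 5 years
doublings = math.log2(projected_gain)     # about 5.2 doublings
years_per_doubling = 5 / doublings        # a bit under 1 year

print(f"GPU vs CPU speedup: {speedup:.1f}x")
print(f"implied doubling time: {years_per_doubling:.2f} years")
```

A doubling time under one year is faster than the two-year Moore's law cadence, which is consistent with the slide's note that software advances are part of the projection.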
Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
Now do you believe?
Is Artificial Intelligence really almost here?
GPU Fanatic (last week this came in my nvidia email)
Ray Kurzweil, a renowned futurist and the director of engineering at Google:
“…the hardware needed to emulate the human brain may be ready even
sooner than he predicted — in around 2020 — using technologies such as
graphics processing units (GPUs), which are ideal for brain-software
algorithms.” (Washington Post, 4/23/14)
Self-promoting links:
http://www.thedodgegarage.com/3dfx/
https://en.wikipedia.org/wiki/3dfx_Interactive
simply google everything else, e.g. deep learning
(that’s what I did)

3dfx, nvidia, Moore's Law and more...

  • 2. Gary Tarolli - 3dfx and Nvidia 3D Graphics Engineer Monday, April 27
  • 3. 3D Graphics from my career perspective 1974-1978 BS. Math RPI (minor in CS) 1979-1980 MS CS Caltech 1980-1983 Digital Equipment Corp 1984-1992 Silicon Graphics, Inc 1992-1993 consulting 1993-2000 3dfx 2000- nvidia
  • 4. or “Moore’s Law viewed from my career” Moore’s law at 50 (years) publication came in the mail last week … Various articles in the news too … should we throw a party or a wake ?
  • 5. Moore’s law in action over 4 decades Moore’s Law : http://www.mooreslaw.org The most popular formulation is : the number of transistors on and integrated circuit doubles about every two years. (same size chip) e.g. 500nm to 350nm is sqrt(2) shrink on one side of a chip, so square = 2x as dense (# transistors) Note: in addition the clock speed increases and the chip area increases (better manufacturing) Cost per transistor or performance drops!
  • 6. Result: trends over 4 decades … Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx) The rise of importance of 3D graphics and hence graphics chips Consolidation in the 3d graphics industry ◦ ~40 3d graphics chip startups in 1994 ◦ Only a few independent companies left : nvidia, Imagination Technologies (Power VR) ◦ 2 cpu/system companies : Intel, AMD , Apple Surprise: graphics chips power supercomputers Surprise: cars ◦ 8 million cars with nvidia chips in them, many more coming ◦ Self driving cars are coming: enabled by supercomputing power in cheap chips Surprise: deep neural net learning enabled by this computing power is exploding
  • 7. Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil http://en.wikipedia.org/wiki/The_Singularity_Is_Near You probably don’t believe this now, see if you do in an hour … So let’s begin the journey …
  • 8. 1974-1978 : BS. Math & CS RPI 1974 – my first calculator : HP-35 purchased for college ($270? – a few weeks salary) 1975 – my first computer program on an IBM 360 mainframe (using my friends engineering account)
  • 9. 1979-1980 : MS CS Caltech 1979 – played networked Star Trek on Xerox Alto : black and white bit-mapped graphics until 4am , living off of $.25 ice cream sandwiches
  • 10. 1979-1980 : MS CS Caltech … Worked on VLSI CAD tools for custom chips, humans draw every single wire for every single transistor on a chip inverter inverter
  • 11. 1979-1980 : MS CS Caltech … MIT class projects in 1978
  • 12. 1980-1983 : DEC (minicomputer) #93246 CPUS were still many boards of logic I worked on VLSI CAD tools so we could design a single chip VAX, called microVAX And go from this : A refrigerator filled with boards …
  • 13. 1980-1983 : DEC (minicomputer) … To this …
  • 14. 1984-1992 : SGI (workstation) #36 IRIS 1000 workstation (1984) : $10,000 to $30,000 - 8 MHz Motorola 68010 IRIS 1400 workstation: ran at 10 MHz, had 1.5 MB of RAM and a 73 MB disk drive My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight
  • 15. 1984-1992 : SGI, Silicon Graphics, Inc … IRIS Indigo (1992) : $6000 - 33 MHz MIPS R3000 ◦ 100k lines/sec, 10k triangles/sec ◦ Almost all of SGI GL library implemented in software on MIPS
  • 16. 1984-1992 : SGI, Silicon Graphics, Inc … 1991: IrisVision: $4000 board set for the PC, ISA and Micro Channel ◦ http://en.wikipedia.org/wiki/IrisVision The Intel 486 and the bus architecture were just too slow, so it died in obscurity … But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming … faster Pentiums, Moore’s law (smaller, denser chips), the PCI bus … and that SGI would be out of business some day if it didn’t transform itself. But going from 80% margins to 20% margins is not easy to swallow. They did not transform … we voted with our feet and left (along with others who went to Nvidia and elsewhere), and they paid the price … by 2000 SGI was in decline … died in 2009 … about 20 years later … $0 to $5 billion back to $0
  • 17. 1984-1992 : SGI, Silicon Graphics, Inc … Onyx Reality Engine (1992) : $50,000 to $80,000 – 100 MHz R4400 Beautiful real-time texture mapped graphics (divide per pixel) ◦ 1M triangles/sec, 100 Mpixels/sec
  • 18. 1993-2000 : 3Dfx (PC) employee #1 Why: ◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process) ◦ We saw a problem within SGI, and an opportunity in 3d PC graphics ◦ Engineers – we saw a cool problem and wanted to solve it ◦ We realized the gaming market was a lot bigger than anyone knew ◦ ~$5B at the time, almost as big as the movie industry ◦ Today it is MUCH larger, over $100B worldwide for all games, dwarfs the movie industry Goal: ◦ Produce images similar to Reality Engine's for $500 in real time, i.e. 30 fps ◦ Similar means reduced quality (less bit depth) but still excellent Activation energy: Caroline said “Just do it” one day
  • 19. 1993-2000 : 3Dfx (PC) … How: ◦ Make maximum use of just-arriving technology ◦ Aim high – don’t sacrifice quality, do the entire Reality Engine pipeline at full speed ◦ Make it easy to program, no difficult choices : e.g. trading off speed for quality ◦ Included ALL the important features of Reality Engine: shading, z-buffering, alpha-blending, fog, quality texturing and filtering ◦ Listened to game developers and professionals – tech. advisory board ◦ John Carmack (id) ◦ Tim Sweeney (Epic) ◦ Tom Porter (Pixar) A bit of luck, ok a lot? ◦ $500 too costly for consumer market, so we targeted the arcades ◦ And 3dfx ended up in various arcade machines, SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc. ◦ Memory prices fell dramatically, resulting in a $300 board, and enabled the consumer market
  • 20. 1993-2000 : 3Dfx (PC) … Key to quality texture mapping is per-pixel divide ◦ Very costly ◦ Key is to be just good enough ◦ We didn’t need 32 bit results, only about 18-20 bits ◦ Just enough to not be visually distracting ◦ So we used a table lookup, and then linear interpolation (which helped a lot) ◦ Remember those sin/cos/tan tables in high school trig? Same basic idea ◦ 6 bit index (64 entries, 15 bits wide, ends up in a PLA optimized ROM) ◦ 4 bit interpolation, adds another 3-4 bits ◦ Input is float, so shift result by exponent, since log2(1/x) = -log2(x) and the exponent of a float is approximately log2(x) Simplify full equations using math, e.g. LOD = Log2 ( sqrt(dsdx^2 + dsdy^2)) ◦ Log2 (sqrt(x)) = .5 * Log2 (x), so LOD = .5 * Log2 (dsdx^2 + dsdy^2)
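A minimal sketch of the table-lookup-plus-interpolation reciprocal described on this slide, written in Python for readability and not as a model of the actual chip. The 6 bit index and 4 bit interpolation fraction come from the slide; the table contents, the floating-point table entries (the 15 bit entry width is not modeled), and the names `RECIP_TABLE`, `approx_reciprocal`, `lod_with_sqrt` and `lod_without_sqrt` are assumptions made for illustration. Because this version only looks at the top 10 mantissa bits, it is good to roughly 10 bits and does not try to reproduce the precision figures quoted above.

```python
import math

INDEX_BITS = 6                     # 64-entry table (from the slide)
INTERP_BITS = 4                    # 4-bit interpolation fraction (from the slide)
TABLE_SIZE = 1 << INDEX_BITS

# Assumed table layout: 1/m sampled at m = 1 + i/64, with one extra entry
# so the last segment also has an upper endpoint to interpolate toward.
RECIP_TABLE = [1.0 / (1.0 + i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]


def approx_reciprocal(x: float) -> float:
    """Approximate 1/x for positive finite x using lookup + linear interpolation."""
    if not math.isfinite(x) or x <= 0.0:
        raise ValueError("x must be positive and finite")

    # Split x into mantissa and exponent: x = m * 2**e with m in [1, 2).
    mantissa, exponent = math.frexp(x)      # mantissa in [0.5, 1)
    m = mantissa * 2.0
    e = exponent - 1

    # Keep the top 10 fractional mantissa bits: 6 for the index, 4 for the blend.
    fixed = int((m - 1.0) * (1 << (INDEX_BITS + INTERP_BITS)))
    index = fixed >> INTERP_BITS
    frac = (fixed & ((1 << INTERP_BITS) - 1)) / (1 << INTERP_BITS)

    # Linear interpolation between two neighbouring table entries.
    low = RECIP_TABLE[index]
    high = RECIP_TABLE[index + 1]
    recip_m = low + (high - low) * frac

    # 1/(m * 2**e) = (1/m) * 2**(-e): the exponent is handled by a shift.
    return math.ldexp(recip_m, -e)


def lod_with_sqrt(dsdx: float, dsdy: float) -> float:
    """LOD computed directly, with a square root."""
    return math.log2(math.sqrt(dsdx * dsdx + dsdy * dsdy))


def lod_without_sqrt(dsdx: float, dsdy: float) -> float:
    """Same LOD using log2(sqrt(x)) = 0.5 * log2(x), so no square root is needed."""
    return 0.5 * math.log2(dsdx * dsdx + dsdy * dsdy)
```

For example, `approx_reciprocal(3.7) * 3.7` stays within about 0.1% of 1.0 in this sketch, and the two LOD functions agree to floating-point rounding.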
  • 22. 1993-2000 : 3Dfx … C simulator ◦ Very fast bit accurate simulator for the chip ◦ 10k to 50k lines of C code ◦ Can research algorithms quickly ◦ Up and running well before RTL simulator ◦ You can develop software and hardware tests on C simulator RTL simulator ◦ Verilog Before tapeout, we compare C vs Verilog results for chip functional tests that we write Story time : code then test, vs test then code
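The C-versus-Verilog comparison step could be sketched roughly as follows. This is an illustration only: it assumes each simulator can dump its results as a sequence of integers (for example one value per output pixel), and the function name `first_mismatch` is hypothetical, not taken from the 3dfx tools.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

Mismatch = Tuple[int, Optional[int], Optional[int]]


def first_mismatch(reference: Iterable[int],
                   candidate: Iterable[int]) -> Optional[Mismatch]:
    """Return (index, reference_value, candidate_value) for the first
    difference between two result streams, or None if they are identical.

    A stream that ends early shows up as None at the position where it ran out.
    """
    pairs = zip_longest(reference, candidate, fillvalue=None)
    for index, (ref, cand) in enumerate(pairs):
        if ref != cand:
            return index, ref, cand
    return None
```

A bit-accurate flow treats any non-`None` result as a failure: there is no tolerance, so a single differing bit in one pixel is reported with its position.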
  • 23. 1993-2000 : 3Dfx… debugging Yogi Berra: In theory there is no difference between theory and practice. In practice there is. From Bandits? : Always expect the unexpected, except of course the truly unexpected … Me: If you cannot believe there is a bug (in your code), then you will never find it.
  • 24. 1993-2000 : 3Dfx Voodoo 1 Voodoo 1 – 50 MHz chip, 500 nm chip, 50 MHz mem (4MB), 50 Mpixels/sec ◦ Each chip was ~1 million transistors, 250k gates
  • 25. 1993-2000 : 3Dfx Voodoo 1 System architecture – perhaps my best work ever (along with Scott Sellers)
  • 26. 1993-2000 : 3Dfx Voodoo 1 results Images tell the story … compared to Reality Engine …
  • 27. 1993-2000 : 3Dfx Voodoo 2
  • 28. 1993-2000 : 3Dfx Voodoo 2 , 3 Voodoo 3 : ~4 years after Voodoo 1 1 chip vs 2-3 chips Density: 250 nm vs 500 nm = 4x more logic (2x went to reduce the chip count) Clock rate: 50 MHz to 200 MHz Memory: 50 MHz to 166 MHz, 4 MB to 16 MB https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units
  • 29. 2000-now : nvidia We goofed, missed a product cycle/schedule, tactical and strategic mistakes and poof! ◦ Another one bites the dust One strategic mistake – we did not put T&L on a chip until too late ◦ our next product had T&L , but it was still in the lab ◦ I thought CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did ◦ They peaked out at 8-16 cores, and IEEE float performance was not their #1 priority ◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe ourselves?) ◦ Enabled high $$$ investment in GPU floating point, where I thought it would end up on CPU ◦ Supercomputer speed floating point is basically for free on a GPU ◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer ◦ Many times more powerful than the early CRAY supercomputers
  • 30. 2000-now : nvidia Titan X Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9 4x8 green dots = one SM (SIMD cpu) 3072 of them on the die Each is ~Voodoo 2 or more
  • 31. 2000-now : 1995 + 20 years = 2015 Over 20 years Moore’s law says we should expect 2**10 increase or 1000x
                       Voodoo 1              Titan X                     x increase
      Transistors      2 M (2 chips)         8000 M                      4000
      Cores            1                     2000-3000                   2500
      Technology       500 nm                28 nm                       300
      Area             100 mm2               600 mm2                     6
      Triangles/sec    1 M                   6000 M                      6000
      Pixels/sec       100 M                 100,000 M                   1000
      Ops/sec          5 B (8b)              7000 B (32b ieee)           1000
      Memory b/w       < 1 GB/sec            340 GB/sec                  400
      Power            4 watts               250 watts                   (the price you pay)
      Frequency        50 MHz                1000 MHz                    20
      Memory           4 MB                  12,000 MB                   3000
      Cost             $500                  $1000                       2
      Design           5 man years ($5M)     >500 man years ($500M)      100
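The arithmetic behind the "2**10 or 1000x" expectation and the density figures can be checked with a short sketch. It assumes the two-year doubling period from the popular formulation of Moore's law and treats density as scaling with the square of the feature-size ratio; the function names are illustrative only.

```python
def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Expected growth factor after `years`, doubling once per period."""
    return 2.0 ** (years / doubling_period_years)


def density_gain(old_nm: float, new_nm: float) -> float:
    """Transistor-density gain from a linear shrink: the area per transistor
    scales with the square of the feature size."""
    return (old_nm / new_nm) ** 2


expected = moores_law_factor(20)          # 1995 -> 2015: 2**10 = 1024.0
voodoo3_gain = density_gain(500, 250)     # 500 nm -> 250 nm: 4.0
titan_x_gain = density_gain(500, 28)      # 500 nm -> 28 nm: about 319
```

On these assumptions the shrink from 500 nm to 28 nm alone accounts for roughly the 300x listed in the Technology row, and the remaining transistor-count gain (4000x in total) lines up with the larger die area plus the move from counting two chips to one.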
  • 32. CPUs vs GPUs Graphics is embarrassingly parallel ! (millions of pixels on the screen) ◦ Which is why 1000-3000 cores can be efficient ◦ If your PC has 1000-3000 cores, what would they do? PIXAR field trip (while at 3dfx) ◦ Server room full of Sun workstations ◦ Limit is how much computing power you can fit in that physical room (and A/C) Supercomputers ◦ Supercomputers are often limited to a power budget in megawatts for CPUs and A/C ◦ Once GPUs were general enough and supported 32b and 64b IEEE floating point ….
  • 33. 2000-now : 3dfx + nvidia … looking back Need I say more: 1995: 0% of consumer PCs have 3d graphics accelerators 2015: 100% penetration (embedded accelerator in all Intel and AMD chips)
  • 34. Deep neural net analysis, deep learning Is this the key to Artificial Intelligence becoming real? Intel 16 core XEON = 43 days to train a DNN problem Titan-X = 1.5 days Next year < 1 day 5 years … 1 hour (with software advances) 20 years … 1 sec to 1 minute ?
  • 35. Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil Now do you believe? Is Artificial Intelligence really almost here? GPU Fanatic (last week this came in my nvidia email) Ray Kurzweil, a renowned futurist and the director of engineering at Google: “…the hardware needed to emulate the human brain may be ready even sooner than he predicted — in around 2020 — using technologies such as graphics processing units (GPUs), which are ideal for brain-software algorithms.” (Washington Post, 4/23/14)

Editor's notes

  1. Moore’s law explains a lot, but not why I went from looking like this to that …
  2. Ugh, that makes me feel old
  3. I did NOT work on this at SGI but this was our target at 3dfx, built something close for $500
  4. When people say, "Well, in theory this should work fine, I don’t understand why it's not…", I use these quotes… I make observations on my code's behavior, and then can often predict where the bug is, regardless of what I think about the quality of my code. And I often find it very quickly. Ego-less debugging.