3dfx, nvidia, Moore's Law and more...

NST 121
Computer Systems Fundamentals
INTRODUCTION TO COMPUTERS

Gary Tarolli - 3dfx and Nvidia
3D Graphics Engineer
Monday, April 27

3D Graphics from my career perspective
1974-1978 BS Math, RPI (minor in CS)
1979-1980 MS CS Caltech
1980-1983 Digital Equipment Corp
1984-1992 Silicon Graphics, Inc
1992-1993 consulting
1993-2000 3dfx
2000- nvidia

or “Moore’s Law viewed from my career”
A “Moore’s Law at 50” publication came in the mail last week …
Various articles in the news too … should we throw a party or a wake?

Moore’s law in action over 4 decades
Moore’s Law: http://www.mooreslaw.org
The most popular formulation is:
the number of transistors on an integrated circuit
doubles about every two years (for the same size chip)
e.g. 500 nm to 350 nm is roughly a sqrt(2) shrink on each side
of a chip, so the square = 2x as dense (# transistors)
Note: in addition, the clock speed increases
and the chip area increases (better manufacturing)
Cost per transistor, and per unit of performance, drops!
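The shrink arithmetic above can be checked in a few lines. This is a minimal sketch, assuming density scales with the square of the linear feature-size shrink and that transistor count doubles every two years; the function names are hypothetical.

```python
import math


def density_gain(old_nm: float, new_nm: float) -> float:
    """Transistor-density gain from shrinking the feature size old_nm -> new_nm.

    A shrink scales both sides of every transistor, so density grows with the
    square of the linear shrink factor (an idealized assumption).
    """
    linear_shrink = old_nm / new_nm
    return linear_shrink ** 2


def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Transistor-count growth if the count doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # 500 nm -> 350 nm is close to a sqrt(2) linear shrink ...
    print(f"linear shrink 500->350 nm: {500 / 350:.3f} vs sqrt(2) = {math.sqrt(2):.3f}")
    # ... which makes the chip roughly 2x as dense.
    print(f"density gain 500->350 nm: {density_gain(500, 350):.2f}x")
    # Voodoo 1 (500 nm) -> Voodoo 3 (250 nm), as on a later slide.
    print(f"density gain 500->250 nm: {density_gain(500, 250):.2f}x")
    print(f"20 years of Moore's law:  {moores_law_factor(20):.0f}x")
```

The same squaring explains the later Voodoo 3 slide: halving the feature size (500 nm to 250 nm) gives 4x the logic in the same area.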

Result: trends over 4 decades …
Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx)
The rising importance of 3D graphics, and hence of graphics chips
Consolidation in the 3d graphics industry
◦ ~40 3d graphics chip startups in 1994
◦ Only a few independent companies left: nvidia, Imagination Technologies (PowerVR)
◦ CPU/system companies: Intel, AMD, Apple
Surprise: graphics chips power supercomputers
Surprise: cars
◦ 8 million cars with nvidia chips in them, many more coming
◦ Self driving cars are coming: enabled by supercomputing power in cheap chips
Surprise: deep neural network learning, enabled by this computing power, is exploding

Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
http://en.wikipedia.org/wiki/The_Singularity_Is_Near
You probably don’t believe this now,
see if you do in an hour …
So let’s begin the journey …

1974-1978 : BS Math & CS, RPI
1974 – my first calculator: HP-35, purchased for college ($270? – a few weeks’ salary)
1975 – my first computer program on an IBM 360 mainframe
(using my friend’s engineering account)

1979-1980 : MS CS Caltech
1979 – played networked Star Trek on a Xerox Alto (black-and-white bit-mapped graphics)
until 4 am, living off $.25 ice cream sandwiches

1979-1980 : MS CS Caltech …
Worked on VLSI CAD tools for custom chips, where humans drew every single wire for every single
transistor on a chip
[Figure: two inverter layouts]

1979-1980 : MS CS Caltech …
MIT class projects in 1978

1980-1983 : DEC (minicomputer) #93246
CPUs were still built from many boards of logic
I worked on VLSI CAD tools so we could design a single-chip VAX, called the MicroVAX,
and go from this:
a refrigerator filled with boards …

1980-1983 : DEC (minicomputer) …
To this …

1984-1992 : SGI (workstation) #36
IRIS 1000 workstation (1984) : $10,000 to $30,000 - 8 MHz Motorola 68010
IRIS 1400 workstation: ran at 10 MHz, had 1.5 MB of RAM and a 73 MB disk drive
My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight

1984-1992 : SGI, Silicon Graphics, Inc …
IRIS Indigo (1992) : $6000 - 33 MHz MIPS R3000
◦ 100k lines/sec, 10k triangles/sec
◦ Almost all of SGI’s GL library was implemented in software on the MIPS CPU

1991: IrisVision: $4000 board set for the PC (ISA and Micro Channel)
◦ http://en.wikipedia.org/wiki/IrisVision
The Intel 486 and the PC bus architecture were just too slow, so it died in obscurity …
But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming
… faster Pentiums, Moore’s law (smaller, denser chips), the PCI bus …
and that SGI would be out of business some day if it didn’t transform itself
But going from 80% margins to 20% margins is not easy to swallow. They did not …
we voted with our feet and left (along with others who went to Nvidia and elsewhere)
and they paid the price … by 2000 SGI was in decline … died in 2009 … about 20 years later …
$0 to $5 billion back to $0

Onyx Reality Engine (1992) : $50,000 to $80,000 – 100 MHz R4400
Beautiful real-time texture mapped graphics (divide per pixel)
◦ 1M triangles/sec, 100 Mpixels/sec
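Why texture mapping needs a divide per pixel: under perspective, texture coordinates do not vary linearly across the screen, but s/w and 1/w do. The sketch below assumes that standard formulation (w being the homogeneous coordinate, proportional to eye-space depth) and uses hypothetical names; it is not Reality Engine’s actual hardware algorithm.

```python
def lerp(a: float, b: float, f: float) -> float:
    """Linear interpolation from a (f = 0) to b (f = 1)."""
    return a + (b - a) * f


def perspective_correct_s(s0: float, w0: float, s1: float, w1: float, f: float) -> float:
    """Texture coordinate s at screen-space fraction f along a span.

    s/w and 1/w vary linearly across the screen, but s does not, so s is
    recovered with one divide per pixel.
    """
    s_over_w = lerp(s0 / w0, s1 / w1, f)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, f)
    return s_over_w / one_over_w  # the per-pixel divide


def affine_s(s0: float, s1: float, f: float) -> float:
    """Cheaper but wrong under perspective: interpolate s directly."""
    return lerp(s0, s1, f)


if __name__ == "__main__":
    # Span from a near vertex (w = 1) to a far vertex (w = 4), s from 0 to 1.
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"f={f:.2f}  correct s={perspective_correct_s(0.0, 1.0, 1.0, 4.0, f):.3f}"
              f"  affine s={affine_s(0.0, 1.0, f):.3f}")
```

At the screen midpoint the correct s is 0.2, not 0.5: the near part of the texture covers more of the screen. Skipping the divide (affine interpolation) makes textures visibly warp and swim.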

1993-2000 : 3Dfx (PC) employee #1
Why:
◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process)
◦ We saw a problem within SGI, and an opportunity in 3d PC graphics
◦ Engineers – we saw a cool problem and wanted to solve it
◦ We realized the gaming market was a lot bigger than anyone knew
◦ ~$5B at the time, almost as big as the movie industry
◦ Today it is MUCH larger, over $100B worldwide for all games, dwarfs the movie industry
Goal:
◦ Produce images similar to Reality Engine’s for $500, in real time, i.e. 30 fps
◦ Similar means reduced quality (less bit depth) but still excellent
Activation energy: Caroline said “Just do it” one day

1993-2000 : 3Dfx (PC) …
How:
◦ Make maximum use of just-arriving technology
◦ Aim high – don’t sacrifice quality, do the entire Reality Engine pipeline at full speed
◦ Make it easy to program: no difficult choices, e.g. trading off speed for quality
◦ Included ALL the important features of Reality Engine: shading, z-buffering, alpha blending, fog, quality texturing and filtering
◦ Listened to game developers and professionals – technical advisory board:
◦ John Carmack (id)
◦ Tim Sweeney (Epic)
◦ Tom Porter (Pixar)
A bit of luck (OK, a lot?)
◦ $500 too costly for consumer market, so we targeted the arcades
◦ And 3dfx ended up in various arcade machines, SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc.
◦ Memory prices fell dramatically resulting in a $300 board and enabled the consumer market

1993-2000 : 3Dfx (PC) …
Key to quality texture mapping is per-pixel divide
◦ Very costly
◦ Key is to be just good enough
◦ We didn’t need 32 bit results, only about 18-20 bits
◦ Just enough to not be visually distracting
◦ So we used a table lookup, and then linear interpolation (which helped a lot)
◦ Remember those sin/cos/tan tables in high school trig? Same basic idea
◦ 6-bit index (64 entries, 15 bits wide, ends up in a PLA-optimized ROM)
◦ 4-bit interpolation adds another 3-4 bits
◦ Input is float (x = m * 2^e), so 1/x = (1/m) * 2^-e: look up 1/m, then shift the result by the negated exponent, since log2(1/x) = -log2(x)
Simplify full equations using math, e.g. LOD = Log2 (sqrt(dsdx^2 + dsdy^2)) = .5 * Log2 (dsdx^2 + dsdy^2), so no square root is needed
◦ Log2 (sqrt(x)) = .5 * Log2 (x)
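A software model of the reciprocal lookup and the LOD simplification might look like the following. It is a minimal sketch, not 3dfx’s actual circuit. It assumes a 64-entry table of 1/m for mantissas m in [1, 2), entries rounded to 15 fractional bits, a 65th endpoint entry (1/2) added to simplify interpolation, and a 4-bit interpolation weight; all names are hypothetical.

```python
import math

INDEX_BITS = 6        # 64-entry table, as on the slide
WEIGHT_BITS = 4       # 4-bit interpolation weight, as on the slide
ENTRY_FRAC_BITS = 15  # assumption: entries rounded to 15 fractional bits

# Hypothetical table of 1/m for mantissas m = 1 + i/64, i = 0..63, plus a
# 65th endpoint entry (1/2 at m = 2) so the last segment can interpolate too.
TABLE = [
    round((1.0 / (1.0 + i / 2**INDEX_BITS)) * 2**ENTRY_FRAC_BITS) / 2**ENTRY_FRAC_BITS
    for i in range(2**INDEX_BITS + 1)
]


def approx_reciprocal(x: float) -> float:
    """Approximate 1/x for x > 0 with a table lookup plus linear interpolation."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1         # renormalize so that 1 <= m < 2
    pos = (m - 1.0) * 2**INDEX_BITS                                # table position in [0, 64)
    i = int(pos)                                                   # 6-bit table index
    w = math.floor((pos - i) * 2**WEIGHT_BITS) / 2**WEIGHT_BITS    # 4-bit weight
    r = TABLE[i] + (TABLE[i + 1] - TABLE[i]) * w                   # approximately 1/m
    # 1/x = (1/m) * 2**(-e): apply the negated exponent as a shift.
    return math.ldexp(r, -e)


def texture_lod(dsdx: float, dsdy: float) -> float:
    """LOD = log2(sqrt(dsdx^2 + dsdy^2)), computed without the square root."""
    return 0.5 * math.log2(dsdx * dsdx + dsdy * dsdy)
```

Because the interpolation weight here is only 4 bits, this sketch is accurate to roughly 10 bits (relative error around 0.1%); it illustrates the structure of the technique, not the exact precision of the hardware.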

1993-2000 : 3Dfx …
C simulator
◦ Very fast bit accurate simulator for the chip
◦ 10k to 50k lines of C code
◦ Can research algorithms quickly
◦ Up and running well before RTL simulator
◦ You can develop software and hardware tests on the C simulator
RTL simulator
◦ Verilog
Before tapeout, we compare C and Verilog results on the chip functional tests that we write
Story time: code then test vs. test then code
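The C-vs-Verilog check amounts to running the same functional test on both simulators and diffing their outputs. A minimal sketch, assuming each simulator dumps its result (e.g. frame-buffer contents) as a flat list of integers; `first_mismatch`, `report` and the dump format are hypothetical, not 3dfx’s actual tooling. Because the C model is bit accurate, the comparison is exact equality with no tolerance.

```python
from typing import Optional, Sequence, Tuple

Mismatch = Tuple[int, Optional[int], Optional[int]]


def first_mismatch(c_model: Sequence[int], rtl: Sequence[int]) -> Optional[Mismatch]:
    """Return (index, C value, RTL value) of the first difference, or None.

    Bit-accurate models must match exactly. If one dump is shorter, the
    missing value is reported as None.
    """
    for i, (c, r) in enumerate(zip(c_model, rtl)):
        if c != r:
            return i, c, r
    if len(c_model) != len(rtl):
        i = min(len(c_model), len(rtl))
        c = c_model[i] if i < len(c_model) else None
        r = rtl[i] if i < len(rtl) else None
        return i, c, r
    return None


def report(test_name: str, c_model: Sequence[int], rtl: Sequence[int]) -> str:
    """One-line PASS/FAIL summary for a functional test."""
    mismatch = first_mismatch(c_model, rtl)
    if mismatch is None:
        return f"PASS {test_name}"
    i, c, r = mismatch
    return f"FAIL {test_name}: word {i}: C model={c}, RTL={r}"


if __name__ == "__main__":
    # Hypothetical 16-bit pixel dumps from the two simulators.
    c_dump = [0x0000, 0xF800, 0x07E0, 0x001F]
    rtl_dump = [0x0000, 0xF800, 0x07E1, 0x001F]
    print(report("smoke_triangle", c_dump, rtl_dump))
```

Reporting the first differing word, rather than just pass/fail, points straight at the pixel to trace back through both simulators.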

1993-2000 : 3Dfx… debugging
Yogi Berra: In theory there is no difference between theory and practice. In practice there is.
From Bandits? : Always expect the unexpected, except of course the truly unexpected …
Me: If you cannot believe there is a bug (in your code), then you will never find it.

1993-2000 : 3Dfx Voodoo 1
Voodoo 1 – 50 MHz, 500 nm chip; 50 MHz memory (4 MB); 50 Mpixels/sec
◦ Each chip was ~1 million transistors, 250k gates

1993-2000 : 3Dfx Voodoo 1
System architecture – perhaps my best work ever (along with Scott Sellers)

1993-2000 : 3Dfx Voodoo 1 results
Images tell the story … compared to Reality Engine …

1993-2000 : 3Dfx Voodoo 2 , 3
Voodoo 3 : ~4 years after Voodoo 1
1 chip vs 2-3 chips
Density: 250 nm vs 500 nm = 4x more logic (2x went to reduce the chip count)
Clock rate: 50 MHz to 200 MHz
Memory: 50 MHz to 166 MHz, 4 MB to 16 MB
https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units

2000-now : nvidia
We goofed: missed a product cycle/schedule, made tactical and strategic mistakes, and poof!
◦ Another one bites the dust
One strategic mistake – we did not put T&L (transform and lighting) on a chip until too late
◦ Our next product had T&L, but it was still in the lab
◦ I thought CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did
◦ They peaked out at 8-16 cores, and IEEE float performance was not their #1 priority
◦ GPUs became more important than I think anyone ever imagined (did we truly believe it ourselves?)
◦ That enabled heavy ($$$) investment in GPU floating point, which I thought would end up on the CPU
◦ Supercomputer speed floating point is basically for free on a GPU
◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer
◦ Many times more powerful than the early CRAY supercomputers

2000-now : nvidia Titan X
Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9
4x8 green dots = one SM (SIMD cpu)
3072 green dots (cores) on the die
Each is ~Voodoo 2 or more

2000-now : 1995 + 20 years = 2015
Over 20 years, Moore’s law says we should expect a 2**10 increase, or about 1000x (one doubling every two years)
Metric          Voodoo 1            Titan X                         Increase (x)
Transistors     2 M (2 chips)       8000 M                          4000
Cores           1                   2000-3000                       2500
Technology      500 nm              28 nm                           300 (density)
Area            100 mm²             600 mm²                         6
Triangles/sec   1 M                 6000 M                          6000
Pixels/sec      100 M               100,000 M                       1000
Ops/sec         5 B (8b)            7000 B (32b IEEE)               1000
Memory b/w      < 1 GB/sec          340 GB/sec                      400
Power           4 watts             250 watts (the price you pay)
Frequency       50 MHz              1000 MHz                        20
Memory          4 MB                12,000 MB                       3000
Cost            $500                $1000                           2
Design          5 man years ($5M)   >500 man years ($500M)          100
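The increase column can be recomputed from the two spec columns and compared with the 2**10 expectation. A minimal sketch, using only the rows whose values the table gives as single numbers; `SPECS` and `growth` are hypothetical names.

```python
import math
from typing import List, Tuple

MOORE_20_YEARS = 2 ** 10  # one doubling every two years for 20 years

# (metric, Voodoo 1, Titan X) for rows the table gives as single numbers.
# Units only need to agree within a row.
SPECS: List[Tuple[str, float, float]] = [
    ("Transistors (M)", 2, 8000),
    ("Triangles/sec (M)", 1, 6000),
    ("Pixels/sec (M)", 100, 100_000),
    ("Frequency (MHz)", 50, 1000),
    ("Memory (MB)", 4, 12_000),
    ("Area (mm^2)", 100, 600),
    ("Power (W)", 4, 250),
    ("Cost ($)", 500, 1000),
]


def growth(old: float, new: float) -> Tuple[float, float]:
    """Return (increase factor, equivalent number of doublings)."""
    factor = new / old
    return factor, math.log2(factor)


if __name__ == "__main__":
    print(f"Moore's law, 20 years: {MOORE_20_YEARS}x = 10 doublings")
    for name, old, new in SPECS:
        factor, doublings = growth(old, new)
        print(f"{name:<18} {factor:>8.1f}x {doublings:>5.1f} doublings")
```

Expressed as doublings, transistor count grew by about 12 (4000x), somewhat more than the 10 Moore’s law predicts, while clock frequency contributed only about 4 (20x).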

CPUs vs GPUs
Graphics is embarrassingly parallel! (millions of pixels on the screen)
◦ Which is why 1000-3000 cores can be efficient
◦ If your PC had 1000-3000 cores, what would they do?
Pixar field trip (while at 3dfx)
◦ Server room full of Sun workstations
◦ The limit is how much computing power you can fit in that physical room (and A/C)
Supercomputers
◦ Supercomputers are often limited by a power budget, in megawatts, for CPUs and A/C
◦ Once GPUs were general enough and supported 32b and 64b IEEE floating point …
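“Embarrassingly parallel” means every pixel can be computed without looking at any other pixel, so the work can be split up any way you like. A minimal sketch with a hypothetical `shade()` function and a tiny hypothetical screen: splitting the rows into bands and shading them concurrently yields exactly the same image as the serial loop. Python threads are used only to show the independence and will not run faster here; a GPU runs thousands of such pixel computations at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

WIDTH, HEIGHT = 64, 48  # hypothetical tiny screen

Image = List[List[int]]


def shade(x: int, y: int) -> int:
    """Hypothetical per-pixel shader: depends only on its own (x, y)."""
    return (x * 255 // (WIDTH - 1) + y * 255 // (HEIGHT - 1)) // 2


def shade_rows(rows: range) -> Image:
    return [[shade(x, y) for x in range(WIDTH)] for y in rows]


def render_serial() -> Image:
    return shade_rows(range(HEIGHT))


def render_parallel(workers: int) -> Image:
    """Split the screen into bands of rows and shade the bands concurrently."""
    band = -(-HEIGHT // workers)  # ceiling division
    bands = [range(y, min(y + band, HEIGHT)) for y in range(0, HEIGHT, band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the bands reassemble correctly.
        results = pool.map(shade_rows, bands)
        return [row for rows in results for row in rows]
```

Any split works (4 bands, 7 bands, one pixel per worker) precisely because no pixel reads another pixel’s result.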

2000-now : 3dfx + nvidia … looking back
Need I say more:
1995: 0% of consumer PCs have 3d graphics accelerators
2015: 100% penetration (embedded accelerator in all Intel and AMD chips)

Deep neural net analysis, deep learning
Is this the key to Artificial Intelligence becoming real?
Intel 16-core Xeon = 43 days to train a DNN problem
Titan X = 1.5 days
Next year < 1 day
5 years … 1 hour (with software advances)
20 years … 1 sec to 1 minute ?
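The projection can be sanity-checked with simple arithmetic. A sketch using only the numbers on the slide: the GPU speedup, and the constant yearly improvement that the 5-year “1 hour” projection would require. The function names are hypothetical.

```python
def speedup(old_time: float, new_time: float) -> float:
    """How many times faster the new system finishes the same job."""
    return old_time / new_time


def required_yearly_gain(start_hours: float, target_hours: float, years: float) -> float:
    """Constant per-year speedup needed to go from start_hours to target_hours."""
    return (start_hours / target_hours) ** (1.0 / years)


if __name__ == "__main__":
    xeon_days, titan_x_days = 43.0, 1.5  # training times from the slide
    print(f"Titan X vs 16-core Xeon: {speedup(xeon_days, titan_x_days):.1f}x")
    # Slide's 5-year projection: 1.5 days (36 hours) down to 1 hour.
    print(f"Yearly gain needed: {required_yearly_gain(36.0, 1.0, 5.0):.2f}x")
```

Going from 1.5 days to 1 hour in 5 years needs roughly a 2x gain every year, faster than Moore’s law’s 2x every two years, which is why the slide credits software advances as well.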

Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
Now do you believe?
Is Artificial Intelligence really almost here?
GPU Fanatic (last week this came in my nvidia email)
Ray Kurzweil, a renowned futurist and the director of engineering at Google:
“…the hardware needed to emulate the human brain may be ready even
sooner than he predicted — in around 2020 — using technologies such as
graphics processing units (GPUs), which are ideal for brain-software
algorithms.” (Washington Post, 4/23/14)

Self-promoting links:
http://www.thedodgegarage.com/3dfx/
https://en.wikipedia.org/wiki/3dfx_Interactive
simply google everything else, e.g. deep learning
(that’s what I did)
