HIS 2015: Prof. Ian Phillips - Stronger than its weakest link
1. 1
Stronger than its weakest link
High Integrity Software Conference (HIS'15)
5 Nov 2015: Bristol.
Pdf & SlideCast @ http://ianp24.blogspot.com
Opinions expressed are my own ...
Prof. Ian Phillips
Principal Staff Engineer
ARM Ltd
ian.phillips@arm.com
Visiting Prof. at ...
Contribution to
Industry Award 2008
2v0
2. 2
High Integrity Software !?
..Or..
"The scientific method assumes that a system with perfect integrity yields a singular
extrapolation within its domain that one can test against observed results" (Wikipedia)
§ Is Software the weakest link in High Integrity Systems?
§ Such that improving it is all that's necessary to produce High Integrity Systems?
§ When we say Software, are we actually thinking Computation?
§ But Computation is about results, not about implementation technologies!
3. 3
We know what Proper Computing is...
§ HPC and Mainframe ... maybe Workstation
§ But not really Laptop or ... (Heaven forbid) a Pocketable?
4. 4
Graham's Orrery - c1700
§ A machine to Compute the position of the planets
§ Single-Task, Continuous Time, Analogue, Mechanical, Computer (With backlash!)
George Graham. Clock-Maker (1674-1751)
5. 5
Amsler’s Planimeter - c1856
Planimeter 2015 !
§ A Machine for Computing the Area of an arbitrary 2D shape
§ Technology: Precision Mechanics, Analogue
§ Available today ... Electronically enhanced
Jakob Amsler-Laffon. Mathematician,
physicist, engineer (1823-1912)
6. 6
IN (x)
Enumerated
Phenomena
OUT (y)
Processed Data/
Information
y=F(x)
§ State (s) and Time (t) are implicit or explicit variables in this
§ And so are Accuracy (a), Reliability (r) and Cost ($)
§ All of which can be balanced (Architected) to meet End-Customer needs
§ Exceeding needs almost always 'costs' more!
... Technologies and Methodologies just offer 'star$' options over basic functionality
... Not all of which will be commercially valuable
Computing is solving a Model of a Subset of Reality ...
Fast enough to be useful and affordable by its customer
y=F(x,s,t,a,r,$)
10. 10
The Invisible Face of Computing Today
Unrecognised but Vital ... All need to be Dependable
11. 11
The Visible Face of Computing Today
Essential but not Vital ... But BIG-BIG-BIG $
12. 12
§ Digital Electronics
§ Software
§ Memory
§ Optics
§ Analogue Electronic
§ Sensors/Transducers
§ Mechanics
§ Micro-Motors
§ Displays
§ Discharge Tube
§ Robotic Assembly
§ Plastic, Metal, Glass
Input: Image(Light) => Compute (Process Image) => Output: SD Card (Electrons)
... Many Technologies seamlessly cooperating, to Enhance Human Memory
... Traditional siloes (inc. SW and HW) are just a means to this end!
Electronic System (Cyber-physical System) - c2015
Incorporating DIGIC5+ (ARM)
System-Level
Computation
‘Classic’
Computer
13. 13
Human Population
Computing for the Masses ...
... Technology Products are Increasingly ‘Intelligent’
1970 1980 1990 2000 2010 2020 2030
Main Frame
Mini Computer
Personal Computer
Desktop Internet
Mobile Internet
Millions of
Units
1st Era
Select work-tasks
2nd Era
Broad-based computing
for specific tasks
3rd Era
Computing as part
of our lives
Technology is the Driver
Consumer is the Driver
... Old Markets are still there; but don't drive the Technology today!
15. 15
Typical 2015 Computing Platform
Exynos 5422
Eight 32 bit CPUs (big.LITTLE):
• Four big (2.1GHz ARM A15) for
heavy tasks;
• Four small (1.5GHz ARM A7)
for lighter tasks.
+ Nine Mali GPU cores ...
... A ~30 Core Heterogeneous Multi-Processor ... In your Shirt Pocket!
One Board ...
21 significant ‘Chips’
16. 16
2010: Apple’s A4 SIP Package (Cross-section)
IC Packaging Technology
§ The processor is the centre rectangle. The silver circles beneath it are solder balls.
§ Two rectangles above are RAM die, offset to make room for the wirebonds.
§ Putting the RAM close to the processor reduces latency, making the RAM
faster, and reduces power consumption ... But increases cost.
§ Memory: Unknown
§ Processor: Samsung/Apple (ARM Processor)
§ Packaging: Unknown (SIP Technology)
Source ... http://www.ifixit.com
Processor SOC Die
2 Memory Dies
Glue
Memory
‘Package’
4-Layer Platform
‘Package’
Steve Jobs WWDC 2010
17. 17
2013: Samsung Solid-State Memory
§ Smart Memory (eMMC)
§ 16-128Gb in a single package
§ 8Gb/die. Stacked 2-16 die/package
§ Handles errors in the API (Smart Interface)
§ Package just 1.4mm thick! (11.5x13x1.4mm)
... Smaller than a postage stamp
19. 19
§ They sell things that Their Customers desire and can afford
§ To satisfy the End-Customer's needs ... In an End-Product which may be several ‘layers’ above them.
§ Focus on their Core Competencies as a Component Provider in a Global Market
§ Avoid Commoditisation by Differentiation
§ Improved Cost and Quality (by improving Process) ..and..
§ Improved Business-Models (which make the Money) ..and..
§ Improved Functionality (by new Technology and Methods)
§ But New Product Development is a Cost and a Risk to be Minimised
§ Technology (HW, SW, Mechanics, Optics, Graphene, etc) just enables Options!
§ New-Technology may cost more (including risk) than it delivers in Product Value!
§ Over-Design costs ... Business can't afford the Precautionary Principle!
... Because successful End-Products fund their entire (RD&I) Value-Chains
... Reuse of their Technologies becomes an economic necessity in other markets!
Computing Technologies in Business Context
Businesses have to be Competitive, Money Making Machines today ...
20. 20
Component and Sub-Systems from Global Enterprise ...
... Global Teams contributing Specialist Knowledge & Knowhow
§ Apple ID’d 159 Tier-1 Suppliers ...
§ Thousands of Engineers Globally
§ Est. 10x Tier-2 Suppliers ...
§ Including Virtual Components [1] and Sub-Systems (ARM and other IP Providers)
§ Multiple Technologies ...
§ Hardware, Software, Optics, Mechanics, Acoustics, RF, Plastics, etc
§ Manufacturing, Test, Qualification, etc.
§ Methods, Tools, Training, etc
§ Tens of thousands of Engineers Globally
... More than 90% of Technology and Methods are Reused (productivity)!
1: Virtual Components do not appear on BOM
21. 21
§ But the only way to economically realise this potential is by product evolution; reusing and reusing again the work of our technical predecessors ...
§ Hardware, Software and other Technologies; Methods and Tools; and throughout the stack
§ In-Company: Sourced and Evolved from Predecessor Products
§ Ex-Company: Sourced from businesses with Specialist Knowledge/Experience
§ Reuse Improves Quality; as objects are designed more carefully, and bug-fixes are incremental
§ Reuse Improves Productivity; as objects can be deployed without understanding their implementation technology (or its limitations)
... It delivers working systems quickly with finite teams; but the dependability cannot be quantified!
... Despite this, Commercial Technologies will be used in Systems on which people Depend
§ The cost of alternatives will be several orders of magnitude too great
§ The issue is (just) making dependable systems using undependable components
Designer Productivity has become the Limiting Factor
The Customer Expectation of the Billions of available Transistors is irresistible!
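As an illustration of "dependable systems using undependable components", a common textbook picture is majority voting over three redundant units. This is not a method taken from the talk; the function names below are hypothetical, and the arithmetic assumes the three units fail independently, which real reused components may not.

```python
def two_of_three(r: float) -> float:
    """Probability that at least two of three independent units work,
    when each unit works with probability r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")
    # P(exactly 3 work) + P(exactly 2 work) = r^3 + 3*r^2*(1 - r)
    return 3 * r**2 - 2 * r**3


def vote(a, b, c):
    """Return the value that at least two of the three results agree on."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: all three results differ")


if __name__ == "__main__":
    for r in (0.4, 0.9, 0.99):  # hypothetical per-unit reliabilities
        print(r, two_of_three(r))
    print(vote(7, 7, 3))
```

Under these assumptions voting only helps when each unit is already better than 50% reliable; below that, the voted system is worse than a single unit.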
22. 22
ARM: Delivers Reuse-Based Productivity ...
.... 24 Processors in 6 Families for different Application Domains
About 50MTr
About 50KTr
23. 23
... Tools to create optimal Heterogeneous Multi-Processors ...
[Block diagram: CoreLink™ CCN-504 Cache Coherent Network linking Quad Cortex-A15 clusters (each with L2 cache, attached via ACE), 8-16MB L3 cache with Snoop Filter, two CoreLink™ DMC-520 memory controllers (x72 DDR4-3200 PHY), NIC-400 Network Interconnects (Flash, GPIO, USB, SATA, AHB), PCIe, 10-40 GbE, DPI, Crypto, DSPs, Interrupt Control, and IO Virtualisation with System MMU]
§ Up to 4 cores per cluster
§ Up to 4 coherent clusters
§ Integrated L3 cache
§ Dual channel DDR3/4 x72
§ Up to 18 AMBA interfaces for I/O coherent accelerators and IO
§ Heterogeneous processors: CPU, GPU, DSP and accelerators
§ Virtualized Interrupts
§ Peripheral address space
§ Uniform System memory
24. 24
… Other Tools, Libraries and Partners to Realize the Potential
§ Technology to build Electronic System solutions:
§ Software, Drivers, OS-Ports, Tools, Utilities to create efficient systems with optimized software solutions
§ Diverse Physical Components, including CPU and GPU processors designed for specific tasks
§ Interconnect System IP delivering coherency and the quality of service required for lowest memory bandwidth
§ Optimised Cell-Libraries for highly optimized SoC implementations
§ Well Connected to Partners in the Life-Cycle:
§ For complementary tools and methods required by System Developers
§ Global Technology, Global Partners:
§ >900 Licences; Millions of Developers
25. 25
Are the Outcomes of this 'chain' Dependable?
Evidently so: They are Functional and Dependable enough to satisfy Billions/yr!
(2Q2015)
Smart-Phone shipments 2Q15 - 185 million (~0.75B/yr)
... The probability of a 'fairly reliable' system failing, when you need to use it
for an 'improbable' event, is 'highly improbable' ... And mostly this is enough
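The arithmetic behind this slide can be sketched as follows. Only the 185 million quarterly figure comes from the slide; the two probabilities are hypothetical placeholders, and multiplying them assumes the failure and the event are independent.

```python
QUARTERLY_SHIPMENTS = 185_000_000  # 2Q15 smart-phone shipments, from the slide


def annualised(quarterly_units: int) -> int:
    """Scale one quarter's shipments to a yearly run-rate."""
    return quarterly_units * 4


def p_fail_when_needed(p_unavailable: float, p_event: float) -> float:
    """Chance that the system is down at the moment the rare event occurs,
    assuming the two are independent."""
    return p_unavailable * p_event


if __name__ == "__main__":
    units = annualised(QUARTERLY_SHIPMENTS)   # 740 million, the slide's ~0.75B/yr
    p = p_fail_when_needed(0.01, 0.001)       # hypothetical: 1% downtime, 1-in-1000 event
    print(units)
    print(p)
    print(units * p)  # expected number of units caught out, under these guesses
```

The per-user probability is tiny, but multiplied across a yearly shipment volume it still leaves a non-trivial number of affected units, which is why "mostly" matters in the slide's wording.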
26. 26
‘Optimal’ Platform
[Diagram labels: HW1-HW4; Hardware Interface; RTOS/Drivers; Thread; Bus(es); Processor(s); functions F1-F5]
Create Functional-Model [1] on a 'Generic' Platform
Evolving the Model (& Platform) until Functional and Non-Functional Performance is Adequate.
NOTE: 'Final SW' is still a Model of Behaviour!
Design is Transforming a Model of Behaviour ...
... evolving a Mathematical Model to meet Non-Functional Constraints
Transform to a Functional-Model on an 'Optimal' (HW/SW) Platform
1: This includes a Model of Execution such as a Java VM.
27. 27
§ All models are a simplification of reality; therefore they all have limitations
§ "All models are wrong, but some are useful" (G.E.P. Box)
§ Normal Software Design Methods are create-it-wrong, test-it-right ...
§ Quality is established by Test; and bug-fixes/patches in the field (An inherently poor method)
§ Software Reuse offers hugely improved Productivity (Not using it is not an option)
§ Software Reuse offers improved Quality (But over what?)
§ Examination shows that all code has high residual errors ...
§ Well structured and tested Source-Code has ~5 errors per 1,000 lines of code (E-KLOC)
§ Commercial code is typically ~5x worse than this
§ Most errors are harmless; but there is no useful correlation
§ Formal-Methods are better; but cost is high if you can't utilise (normal) legacy code.
§ But even 'Perfect-Software' still has to execute on an Imperfect-Platform
... "YES!": But Good-Enough satisfies the Commercial Imperative for most applications
Is Software (Logic) Inherently Undependable?
Software is a Model of Reality, executing on a Hardware and Software Platform
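The E-KLOC figures quoted on this slide scale linearly with code size, which a few lines make concrete. The two rates are the slide's own; the one-million-line code base is a hypothetical size chosen only for the example, and the function name is mine.

```python
WELL_TESTED_ERRORS_PER_KLOC = 5.0  # slide: well structured and tested source
COMMERCIAL_FACTOR = 5.0            # slide: commercial code is ~5x worse


def residual_errors(lines_of_code: int,
                    errors_per_kloc: float = WELL_TESTED_ERRORS_PER_KLOC) -> float:
    """Estimate residual errors from a size in lines and a rate per 1,000 lines."""
    return lines_of_code / 1000 * errors_per_kloc


if __name__ == "__main__":
    loc = 1_000_000  # hypothetical code base size
    print(residual_errors(loc))
    print(residual_errors(loc, WELL_TESTED_ERRORS_PER_KLOC * COMMERCIAL_FACTOR))
```

These are rough estimates of how many errors remain, not of how many matter; the slide's point is that there is no useful way to tell the harmless ones from the rest in advance.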
28. 28
Open Source is Dependable?
"Somebody will see the bugs!" (But only if they look!)
1: http://www.wired.com/2014/04/heartbleedslesson/
2: http://veridicalsystems.com/blog/of-money-responsibility-and-pride/
“It is now very clear that
OpenSSL development could
benefit from dedicated full-time,
properly funded developers”
“OSF typically receives only
$2,000 a year in donations”
§ OpenSSL HeartBleed bug (2014) 1
§ Update was received just before a Public Holiday
§ Editor was a known and high-quality source
§ Code was reviewed informally and released
§ Editor was conflicted with day-job, family and holiday pressure 2
§ Too few resources to do a proper job.
§ This was a classic E-KLOC error ...
§ Not a Coding, Formatting, or Functional error
§ It was a System error (an omission in a non-functional aspect of the code).
... Was the ‘fault’ with the software Source (OpenSSL Software Foundation (OSF))?
... Or a User Community too ready to believe in the Myth of Open Source software?
30. 30
Mitigating this we have ...
§ Weak Transistors: Not all ...
§ Are at 70 degC even if the die is (But some will be higher)
§ Are Minimum Size (Larger ‘area’ reduces variability)
§ Are on Critical Paths; and the probability of there being more than one on a path is low!
§ CMOS Logic: Is very robust and will continue to function with out-of-spec transistors
§ Leaky Gates and Faster Transitions are seldom functional failures (but they do hit reliability!)
§ Speed variations on a path average out (on average!)
§ Errors are frequently difficult to detect (and thus correct!)
§ Memory: Analogue Circuits are much more sensitive to transistor variation. But ...
§ Failures are easier to detect (and work around)
§ Spare rows/columns are included to fix manufacturing (static) defects ... but not dynamic (use)
§ NV-M limited write-cycles and bit failures are shielded by their smart API ... to some degree.
... Hardware failure is not always easily spotted at the functional level!
So is Hardware (Logic) Dependable? 2/3
31. 31
§ And we haven't included imponderables ...
§ Internally and Externally generated noise? (Greater susceptibility at lower voltages)
§ High-energy particles? (Greater susceptibility at smaller geometries)
§ Wear-out: Vt/Gain drift and Electro-Migration? (Greater susceptibility at smaller geometries)
§ Local Hot-Spots? (140C is not uncommon on chip)
§ Limitations of Verification and Test (State-Space exploration is always a sub-set)
§ We are repeatedly multiplying tiny-improbables, by ever larger-numbers ...
§ And many of the values are only guesses!
§ We have no real idea about the reliability/dependability of modern Systems or Components
§ But we know that as process geometries shrink, Susceptibility will get worse ...
§ Chips will get ever more complex (and more chips will be used in more complex Systems)
§ Transistors will get smaller and Designers will erode safety margins to get performance
... Despite this; Chips and Systems do Yield more than we would rightly expect ...
... So we must be utilising Unknown Safety Margins!
So is Hardware (Logic) Dependable? 3/3
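"Multiplying tiny improbables by ever larger numbers" can be made concrete with the standard formula for at least one failure among n independent elements, 1 - (1 - p)^n. The sketch below is an illustration under that independence assumption; the per-element probability used in the example is a made-up value, in keeping with the slide's warning that many such values are only guesses.

```python
import math


def p_any_failure(p_each: float, n: int) -> float:
    """P(at least one of n independent elements fails) = 1 - (1 - p_each)**n.

    Computed via log1p/expm1 so that very small p_each and very large n
    do not lose precision to floating-point rounding.
    """
    if not 0.0 <= p_each < 1.0:
        raise ValueError("p_each must be in [0, 1)")
    return -math.expm1(n * math.log1p(-p_each))


if __name__ == "__main__":
    p = 1e-12  # hypothetical per-element failure probability
    for n in (10**6, 10**9, 10**12):
        print(n, p_any_failure(p, n))
```

With this hypothetical p, a billion elements give roughly a 0.1% chance of at least one failure and a trillion give roughly 63%, so the same "tiny improbable" goes from negligible to dominant purely through scale.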
32. 32
Killing a Sacred Cow: SW and HW Logic are the Same
...They have different characteristics, so choice is a System Architectural decision!
'Hardware' Language (Verilog)
// A master-slave type D-Flip Flop
module flop (data, clock, clear, q, qb);
  input data, clock, clear;
  output q, qb;
  // primitive #delay instance-name
  // (output, input1, input2, .....),
  nand #10 nd1 (a, data, clock, clear),
           nd2 (b, ndata, clock),
           nd4 (d, c, b, clear),
           nd5 (e, c, nclock),
           nd6 (f, d, nclock),
           nd8 (qb, q, f, clear);
  nand #9  nd3 (c, a, d),
           nd7 (q, e, qb);
  not #10  inv1 (ndata, data),
           inv2 (nclock, clock);
endmodule
'Software' Language (C)
#include<time.h>
/* Use the PC's timer to check */
/* processing time */
main()
{
clock_t time,deltime;
long junk,i;
float secs;
LOOP:
printf("input loop count: ");
scanf("%ld",&junk);
time = clock();
for(i=0;i<junk;i++)
deltime = clock() - time;
secs = (float) deltime/CLOCKS_PER
printf("for %ld loops, #tics = %
%fn",junk,deltime,secs);
goto LOOP;
...
Target Platform
CMOS -------- CPU
Target Architecture Info
Compilers
HW ----------- SW
Configuration Files
HW -------------- SW
35. 35
§ System-Level Dependability is what matters ...
§ Component and Sub-System dependability is inherently poor (and will get worse).
§ Productivity demands that Dependable Systems must Reuse Components and Sub-Systems (Physical and Virtual); and the affordable ones are of Commercial quality!
§ Clean-Sheet design is not an option for almost all complex products!
... the cost-is-no-object customer is an endangered species
§ Increasing the Dependability of Components and Sub-Systems helps; but can never be enough
§ ARM's product is really 'Enhanced Reuse for Electronic System Design and Manufacture'
... The Only Place to implement System-Level Dependability on an Undependable
Platform, is at the System-Layer!
§ Reliable components and sub-systems will help, but cannot ever be enough
§ Predominantly a 'Software' challenge; but not alone (Don't forget the simple Watch-Dog)
Dependable on Undependable
Any Methods that are based on perfection in HW or SW are untenable ...
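The "simple Watch-Dog" mentioned above can be sketched in a few lines. This is a software model for illustration only: the class and parameter names are hypothetical, and a real watchdog is normally a hardware timer that runs independently of the software it supervises.

```python
import time


class Watchdog:
    """Trips once if kick() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._on_expire = on_expire  # system-level corrective action, e.g. a reset
        self._clock = clock          # injectable so the sketch can be tested
        self._last_kick = clock()
        self.tripped = False

    def kick(self):
        """Called by healthy application code to show it is still making progress."""
        self._last_kick = self._clock()

    def check(self):
        """Called periodically by the supervisor; fires on_expire at most once."""
        overdue = self._clock() - self._last_kick > self._timeout_s
        if overdue and not self.tripped:
            self.tripped = True
            self._on_expire()
        return self.tripped


if __name__ == "__main__":
    wd = Watchdog(0.05, lambda: print("watchdog expired: taking corrective action"))
    wd.kick()
    time.sleep(0.1)  # simulate a hung task that stops kicking
    wd.check()
```

The point of the pattern is the one the slide makes: the watchdog needs no knowledge of why the component stalled, only a system-level policy for what to do when it does.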
36. 36
The Real Conclusions
§ Systems are what End-Customers buy; they expect them to be Dependable Enough
§ A subjective concept; which is Application, State and Context dependent (& Technology independent)
§ Commercial Components (HW/SW) will be the building blocks of Dependable Systems
§ Commercial use gives us the Technologies which we are economically bound to use today
§ Though they work better than we would rightly expect, we cannot quantify their quality
§ Improving their Quality/Reliability/Dependability helps; but 100% is an asymptotic goal!
§ The System Knows what the System Wants
§ So: System behaviour and robustness must be handled at the System-Level (Top-Level);
only it can know the expected action and appropriate corrective action for its domain.
§ And: Because of the size of the Functional and Non-Functional Space, conformance cannot be
measured; so it will require a Policy Based approach.
... Meanwhile systems that people depend on will be produced
... The Commercial Imperative can’t/won't wait for the 'right methodology'
37. 37
The END Is Very Nigh ...
Pdf & SlideCast through http://ianp24.blogspot.com