Formal Verification of Programming Languages
1. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Formal Verification of Programming Language
Implementations
Ph.D. Literature Seminar
Jason S. Reich
<jason@cs.york.ac.uk>
University of York
December 8, 2009
4. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Compiling an arithmetic language
Compile from a simple arithmetic language to machine code for a
simple register machine.
Source language
Numeric constants
Variables
Addition
e.g. (x + 3) + (x + (y + 2))
Target language
Load Immediate into ac
LOAD into ac from address/register
STOre ac value to address/register
ADD register value to ac
Example taken from [McCart67]
5. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Compiling an arithmetic language
Arithmetic expression compiler in Haskell
compile :: Source -> Int -> Target
compile (Const v) t = [Li v]
compile (Var x) t = [Load x]
compile (Sum e1 e2) t =
  compile e1 t
  ++ [Sto ("t + " ++ show t)]
  ++ compile e2 (t + 1)
  ++ [Add ("t + " ++ show t)]
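The slide shows only the compile function, so it cannot be loaded on its own. The following is a minimal sketch of a self-contained module, assuming data types that the slide does not give: the constructors of Source and Instr are inferred from the patterns and instructions used above, and the name example is introduced here for illustration.

```haskell
-- Assumed source syntax: constants, variables and addition.
data Source
  = Const Int
  | Var String
  | Sum Source Source

-- Assumed target instructions, named after the constructors on the slide.
data Instr
  = Li Int      -- load an immediate value into the accumulator
  | Load String -- load the accumulator from a named location
  | Sto String  -- store the accumulator to a named location
  | Add String  -- add the value at a named location to the accumulator
  deriving (Eq, Show)

type Target = [Instr]

-- The second argument is the index of the first free temporary register.
compile :: Source -> Int -> Target
compile (Const v) _ = [Li v]
compile (Var x) _ = [Load x]
compile (Sum e1 e2) t =
  compile e1 t
    ++ [Sto ("t + " ++ show t)]
    ++ compile e2 (t + 1)
    ++ [Add ("t + " ++ show t)]

-- The running example: (x + 3) + (x + (y + 2))
example :: Source
example =
  Sum
    (Sum (Var "x") (Const 3))
    (Sum (Var "x") (Sum (Var "y") (Const 2)))
```

Evaluating compile example 0 in GHCi yields a list of thirteen instructions, starting with Load "x" and ending with Add "t + 0".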
6. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Compiling an arithmetic language
When compiled and executed, is the value in the accumulator the
result of the source arithmetic expression?
(x + 3) + (x + (y + 2)) compiled to machine code:
LOAD x
STO t
LI 3
ADD t
STO t
LOAD x
STO t + 1
LOAD y
STO t + 2
LI 2
ADD t + 2
ADD t + 1
ADD t
n.b. Where x and y are known memory locations and t + k are registers.
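For one concrete input, the question can be answered by running the listing. The following is a sketch of a small emulator, assuming a machine state made of an accumulator and a list of named cells. The names State, step, fetch, emulate, listing and result are hypothetical and do not come from the slides, and the register written "t" in the listing is spelled "t + 0" here.

```haskell
data Instr
  = Li Int
  | Load String
  | Sto String
  | Add String

-- Assumed machine state: the accumulator plus named storage cells.
-- The most recent store for a name sits nearest the head of the list.
data State = State
  { ac :: Int
  , cells :: [(String, Int)]
  }

-- Read a cell; unset cells are assumed to hold 0.
fetch :: String -> State -> Int
fetch a s = maybe 0 id (lookup a (cells s))

-- Execute a single instruction.
step :: State -> Instr -> State
step s (Li n) = s { ac = n }
step s (Load a) = s { ac = fetch a s }
step s (Sto a) = s { cells = (a, ac s) : cells s }
step s (Add a) = s { ac = ac s + fetch a s }

-- Execute a whole program from left to right.
emulate :: [Instr] -> State -> State
emulate prog s0 = foldl step s0 prog

-- The thirteen-instruction listing for (x + 3) + (x + (y + 2)).
listing :: [Instr]
listing =
  [ Load "x", Sto "t + 0", Li 3, Add "t + 0", Sto "t + 0"
  , Load "x", Sto "t + 1", Load "y", Sto "t + 2", Li 2
  , Add "t + 2", Add "t + 1", Add "t + 0" ]

-- Accumulator after running the listing with x = 5 and y = 7.
result :: Int
result = ac (emulate listing (State 0 [("x", 5), ("y", 7)]))
```

With x = 5 and y = 7, result is 22, which equals (5 + 3) + (5 + (7 + 2)). A single run like this is only a test of one case, not a proof for all expressions and variable mappings.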
7. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Why use high-level languages?
Rapid development
Easier to understand, maintain and modify
Less likely to make mistakes
Easier to reason about and infer properties
Architecture portability
But...
8. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Can you trust your compiler?
Use a compiler to translate from a high-level language to a
low-level one
Compilers are programs (generally) written by people
People make mistakes
Can silently turn “a correct program into an incorrect
executable” [Leroy09]
GHC 6.10.x is ≈ 800,000 lines of code and has had 737 bugs
reported in the bug tracker as of 04/12/2009 [GHC]
Can we formally verify a compiler?
11. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
McCarthy and Painter, 1967
“Correctness of a compiler for arithmetic expressions”
[McCart67]
Describe, in first-order predicate logic:
Source language semantics
Target language semantics
A compilation process
Reason that the compiler maintains semantic equivalence
12. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
McCarthy and Painter, 1967
Semantic equivalence in [McCart67]
∀e ∈ Expressions, ∀µ : Variable Mappings •
interpret(e, µ) ≡ acValue(emulate(compile(e), mkState(µ)))
Very limited, small toy source and target language
Proof performed by hand
Logical framework and proof presented in under ten pages
Shows that proving a compiler correct is possible
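One way to see why the equivalence holds is structural induction on e with a strengthened invariant. The following is a sketch under stated assumptions, not the proof as presented in [McCart67]: η ranges over machine states that store µ in the variable locations, t is the index of the first free temporary register (so compile(e) stands for compile(e, 0)), and η′ =ₜ η is shorthand introduced here for "η′ and η agree on every location other than ac and the temporaries t + k with k ≥ 0".

```latex
\textbf{Invariant.} For all $e$, $t$ and $\eta$, if
$\eta' = \mathit{emulate}(\mathit{compile}(e, t), \eta)$ then
\[
  \mathit{acValue}(\eta') = \mathit{interpret}(e, \mu)
  \quad\text{and}\quad
  \eta' =_t \eta .
\]

\textbf{Case $e = e_1 + e_2$.} Write $v_i = \mathit{interpret}(e_i, \mu)$.
\begin{align*}
  \eta_1 &= \mathit{emulate}(\mathit{compile}(e_1, t), \eta)
         && \mathit{ac} = v_1,\ \eta_1 =_t \eta
         && \text{(induction hypothesis)} \\
  \eta_2 &= \mathit{emulate}([\mathrm{STO}\ t], \eta_1)
         && \text{register } t \text{ holds } v_1 \\
  \eta_3 &= \mathit{emulate}(\mathit{compile}(e_2, t + 1), \eta_2)
         && \mathit{ac} = v_2,\ \eta_3 =_{t+1} \eta_2
         && \text{(induction hypothesis)} \\
  \eta_4 &= \mathit{emulate}([\mathrm{ADD}\ t], \eta_3)
         && \mathit{ac} = v_2 + v_1 = \mathit{interpret}(e_1 + e_2, \mu)
\end{align*}
```

The second half of the invariant carries the argument: because the code for e₁ leaves the variable locations untouched, e₂ is evaluated under the same µ, and because the code for e₂ only touches temporaries from t + 1 upwards, register t still holds v₁ when ADD t runs. The constant and variable cases follow directly from the meaning of LI and LOAD.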
13. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Milner and Weyhrauch, 1972
“Proving compiler correctness in a mechanised logic”
[Milner72]
Provide an LCF machine-checked proof of the
McCarthy-Painter example
Proceed towards mechanically proving a compiler for a more
complex language to a stack machine
Claim to have “no significant doubt that the remainder of the
proof can be done on machine” [Milner72]
14. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Morris, 1973
“Advice on structuring compilers and proving them correct”
[Morris73]
Proves by hand the correctness of a compiler for a source
language that contains assignment, conditionals, loops,
arithmetic, boolean operations and local definitions
“Essence” of the advice presented in [Morris73]
                    compile
Source language ------------> Target language
       |                             |
Source semantics              Target semantics
       |                             |
       v                             v
Source meanings <------------ Target meanings
                    decode
15. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Thatcher, Wagner and Wright, 1980
Advice presented in [Thatch80]
                    compile
Source language ------------> Target language
       |                             |
Source semantics              Target semantics
       |                             |
       v                             v
Source meanings ------------> Target meanings
                    encode
“More on advice on structuring compilers and proving them
correct” [Thatch80]
Provides a correct compiler for a more advanced target
language than [Morris73]
Claim that mechanised theorem proving tools required further
development
16. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
The “structuring compilers” series
Discuss constructing algebras to describe languages
How to move from one algebra to another
Encode abstract state to concrete or decode to abstract?
“there is not enough information in the [abstract] state to
recover the [concrete] state completely” [Moore89]
Further paper “Even more on advice on structuring compilers
and proving them correct: changing an arrow” [Orejas81]
[Moore89] discusses this issue from a practical perspective
19. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Meijer, 1994
“More advice on proving a compiler correct: Improve a correct
compiler” [Meijer94]
Given an interpreter for a source language, can we transform it
into a compiler to, and a residual interpreter for, the target
language?
A functional decomposition problem (i.e.
interpreter = emulator ◦ compiler)
Demonstrate this technique for a first-order imperative
language compiling to a three-address code machine
While quite feasible for first-order languages, becomes far
more difficult for higher-order languages
20. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Berghofer and Stecker, 2003
“Extracting a formally verified, fully executable compiler from
a proof assistant” [Bergho03]
Proves correct a compiler from a subset of the Java source
language to Java bytecode
Includes typechecking, abstract syntax tree annotation and
bytecode translation
Isabelle/HOL used to prove properties about an abstract
compiler
Isabelle code extraction to produce an executable compiler
21. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Dave, 2003
Maulik A. Dave’s bibliography for “Compiler Verification” [Dave03]
Ninety-nine papers listed
Ninety-one of those listed were published after 1990
Interestingly, neither the Milner and Weyhrauch paper nor the
Meijer paper is included
[Figure: papers listed against decade published]
24. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Recent work
Leroy’s “A formally verified compiler back-end” [Leroy09]
Proves correct a compiler from Cminor to PowerPC assembly
Chlipala’s “A verified compiler for an impure functional
language” [Chlipa10]
From a toy (but still quite feature-rich) functional source
language to instructions for a register-based machine
Both use the Coq proof assistant and code extraction
Both decompose the problem into compilation to several
intermediate languages
Both express worries that the proof assistant itself may contain
bugs that would invalidate correctness
25. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Conclusions
Compilers have been proved correct for progressively larger
source languages
It rapidly became apparent that some kind of proof assistant is
required
Decomposition of large compilers is a key factor for success
Programs are only verified when all surrounding elements are
verified
26. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
Open questions
What about compilers for larger target languages and more
advanced compilation facilities?
Are our mechanised assistants producing valid proofs?
Are there other ways to decompose the problem?
Are particular language paradigms more amenable to compiler
verification?
Why haven’t the concepts of [Meijer94] been more widely
used?
What other ways are there of decomposing the compiler
verification problem?
27. Motivation 1960s 1970s 1980s 1990s 2000s Conclusions
More information
Slides and bibliography will be made available at:
http://www-users.cs.york.ac.uk/~jason/
Jason S. Reich
<jason@cs.york.ac.uk>