The document discusses generating efficient VHDL implementations from Cryptol specifications. It covers translating Cryptol to VHDL, using formal methods to check safety and equivalence, and optimizing performance through techniques like lifting functions to a stream model, using block RAM, and adding pragmas for parallelism and pipelining. The toolset allows quickly generating verified and optimized hardware from mathematical descriptions in Cryptol.
Cryptol VHDL Guide: Efficient and Equivalent Implementation
1. The Cryptol Epilogue:
Swift and Bulletproof VHDL
Pedro Pereira Ulisses Costa
Formal Methods in Software Engineering
June 18, 2009
Pedro Pereira, Ulisses Costa The Cryptol Epilogue: Swift and Bulletproof VHDL
2. Last milestone’s recap!
We had to
Generate an efficient and equivalent C implementation
We showed you
The first part of the user’s guide to the toolset
Cryptol → C conversion
An introduction to the Formal Methods’ subset
3. This time
We had to
Generate an efficient and equivalent VHDL implementation
We will show you
The last part of the user’s guide to the toolset ⇒ remaining
interpreter modes
Cryptol → VHDL conversion
Hardware performance analysis
Real application of the Formal Methods’ suite
4. Intermediate Representation
IR is what Cryptol generates after parsing + type-checking
Format between the Abstract Syntax Tree and all the other
backends
Explicitly annotated with types ⇒ allows for type-directed
evaluation/translation in backends
Can be viewed using the :def command
5. Relevant Interpreter modes for Hardware design
Symbolic
Performs symbolic interpretation on the IR
LLSPIR
Compiles to LLSPIR, optimizes the circuit, and provides
rough profiling information for the final circuit
VHDL
Compiles to LLSPIR and then translates to VHDL, useful for
generating VHDL that is manually integrated into another design
FPGA
Compiles to LLSPIR, translates to VHDL and uses external tools
to synthesize the VHDL to an architecture dependent netlist
6. Cryptol → VHDL conversion
Step 1
Remove from the specialized Cryptol implementation any constructs
that the FPGA compiler does not support
Step 2
Convert top-level function to stream model for performance
analysis
Step 3
Adjust implementation according to space and time requirements
Step 4
Use reg pragma to pipeline the implementation
7. Step 1: FPGA backend limitations
The following are not supported
Division by anything other than a power of 2 (hardware limitation)
Recursive functions (recursive streams are fine)
Higher-order functions (partially: functions may be passed as
parameters but cannot be returned)
These limitations are rarely a problem; in fact, only the second one
applied to our specification and was easily resolved.
8. Formal Methods to the rescue!
Let’s continue, but first...
Is our implementation
Safe ?
Correct ?
Equivalent ?
9. Safety Checking
:safe command
No evil zeroes (division by zero)
No illegal index accesses
There are more checks, but these are sufficient here
snow3g v0.95> :set sbv
snow3g v0.95> :safe encrypt
"encrypt" is safe; no safety violations exist.
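To make the two violation classes concrete, here is a minimal sketch in Python (not Cryptol). It is not how :safe works internally: the real command reasons about the function symbolically, whereas this sketch simply enumerates a small input space. The names is_safe, unsafe_div, unsafe_lookup and safe_lookup are hypothetical.

```python
from itertools import product

def is_safe(f, width=4, arity=2):
    """Try f on every combination of `arity` inputs of `width` bits.

    Returns (True, None) if no division by zero or out-of-range index
    occurs, otherwise (False, first_offending_inputs).
    """
    for args in product(range(2 ** width), repeat=arity):
        try:
            f(*args)
        except (ZeroDivisionError, IndexError):
            return False, args
    return True, None

TABLE = list(range(16))  # 16-entry table, valid indices are 0..15

def unsafe_div(a, b):
    return a // b                 # violates safety when b == 0

def unsafe_lookup(a, b):
    return TABLE[a + b]           # a + b can reach 30, past the table

def safe_lookup(a, b):
    return TABLE[(a + b) % 16]    # index is wrapped back into range

print(is_safe(unsafe_div))        # reports the first failing inputs
print(is_safe(safe_lookup))
```

Exhaustive enumeration only scales to toy widths, which is why a symbolic backend is needed for 32-bit words.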
10. Theorem Proving
:prove command
Theorems are boolean functions
Proves that the theorem is equivalent to the function that always
returns True, regardless of its inputs
plaintext ⇔ decrypt . encrypt
theorem EncDec : { pt k i }. pt == decrypt ( encrypt ( pt , k , i ) , k , i );
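The idea that a theorem is a boolean function which must equal the constant True function can be sketched in Python (not Cryptol). Everything below is an assumption made for illustration: the keystream is a hypothetical toy generator, not SNOW 3G, the word size is shrunk to 4 bits, and the check is brute force rather than the symbolic proof that :prove performs.

```python
from itertools import product

WIDTH = 4                      # toy word size; the slides use 32-bit words
MASK = (1 << WIDTH) - 1

def keystream(key, iv, n):
    """Hypothetical toy keystream (NOT SNOW 3G): n words from key and iv."""
    state = (key ^ iv) & MASK
    out = []
    for _ in range(n):
        state = (5 * state + key + 1) & MASK
        out.append(state)
    return out

def encrypt(pt, key, iv):
    """XOR each plaintext word with the matching keystream word."""
    return [p ^ k for p, k in zip(pt, keystream(key, iv, len(pt)))]

# XOR with the same keystream is its own inverse.
decrypt = encrypt

def enc_dec(pt, key, iv):
    """The theorem body: a function from inputs to a boolean."""
    return pt == decrypt(encrypt(pt, key, iv), key, iv)

def holds_everywhere():
    """True iff enc_dec agrees with the constant-True function on all inputs."""
    words = range(1 << WIDTH)
    return all(enc_dec([p0, p1], k, i)
               for p0, p1, k, i in product(words, repeat=4))

print(holds_everywhere())
```

With 32-bit words and four-word blocks the input space is far too large to enumerate, which is where the choice of proof backend in the following slides matters.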
11. Theorem Proving
JAIG
snow3g v0.95> :prove EncDec
Generating formal model of EncDec
Generating formal model of f where f : ([4][32],[4][32],[4][32]) ->
Bit; f x = True;
37.519% (01:19:16 ETA)
JAIG eventually froze and crashed.
12. Theorem Proving
ABC
snow3g v0.95> :set abc
snow3g v0.95> :set symbolic +v
snow3g v0.95> :prove EncDec
Generating formal model of EncDec
Generating formal model of f where f : ([4][32],[4][32],[4][32]) ->
Bit; f x = True;
Q.E.D.
ABC took 2 minutes to finish the proof.
13. Equivalence Checking
:eq command
Works with an incremental development model: successive
versions of an algorithm can be proven equivalent to a
previous specification ⇒ stepwise-refinement approach
Checks whether Cryptol’s translation to another language
remains formally equivalent ⇒ Cryptol → VHDL for instance
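Stepwise refinement can be illustrated with a small Python model (not Cryptol): a readable reference function, a refined version, and a checker that either confirms equivalence or returns a counterexample. The 8-bit MULx definition used here (shift left, then XOR a constant if the top bit was set) is an assumption about the helper the slides rely on, and the checker enumerates inputs instead of comparing formal models as :eq does.

```python
from itertools import product

def mulx_spec(v, c):
    """Reference: 8-bit left shift, XOR in c if the top bit of v was set."""
    shifted = (v << 1) & 0xFF
    return shifted ^ c if v & 0x80 else shifted

def mulx_refined(v, c):
    """Branch-free refinement: select c with an all-ones or all-zeros mask."""
    mask = -(v >> 7) & 0xFF        # 0xFF if the top bit is set, else 0x00
    return ((v << 1) & 0xFF) ^ (c & mask)

def mulx_buggy(v, c):
    """Deliberately wrong: forgets to truncate the shift to 8 bits."""
    return (v << 1) ^ (c if v & 0x80 else 0)

def equivalent(f, g, width=8, arity=2):
    """Return (True, None) or (False, first inputs where f and g differ)."""
    for args in product(range(1 << width), repeat=arity):
        if f(*args) != g(*args):
            return False, args
    return True, None

print(equivalent(mulx_spec, mulx_refined))
print(equivalent(mulx_spec, mulx_buggy))
```

A returned counterexample is the useful part in practice: it points at the exact input on which a refinement went wrong.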
14. Equivalence Checking
Step 1 - :fm command
snow3g v0.95> :set abc
snow3g v0.95> :set symbolic +v
snow3g v0.95> :fm encrypt "./enc.aig"
Generating formal model of encrypt: ./enc.aig
Step 2 - :eq command
snow3g v0.95> :set LLSPIR
snow3g v0.95> :eq encrypt "./enc.aig"
True
Took less than 5 minutes to finish.
15. Checkpoint
Our implementation is
Safe
Correct
Equivalent
What about efficiency?
16. Technical Jargon
Clockrate
Number of clock cycles per second on the FPGA, measured in Hz
Latency or Propagation delay
Time between feeding inputs to the circuit and receiving the
corresponding outputs, measured in clock cycles (latency) or
seconds (propagation delay)
Output rate
How often input can be fed into the circuit to produce output,
measured in inverse clock cycles (for example, one element per cycle)
Throughput
Amount of information that is output from the circuit per unit of
time measured in bits/second
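These four quantities are tied together by simple arithmetic, sketched below in Python. The 100 MHz clock is a hypothetical figure chosen only for the example; the 128-bit element size follows from the [4][32] type used in the slides, and 25 cycles at one element per cycle are the figures the deck reports for the final circuit.

```python
def latency_seconds(latency_cycles, clockrate_hz):
    """Convert a latency in clock cycles to seconds."""
    return latency_cycles / clockrate_hz

def throughput_bits_per_second(bits_per_element, elements_per_cycle, clockrate_hz):
    """Throughput = element size x output rate x clockrate."""
    return bits_per_element * elements_per_cycle * clockrate_hz

CLOCK_HZ = 100_000_000        # hypothetical 100 MHz clock
BITS = 4 * 32                 # one output element: four 32-bit words

print(latency_seconds(25, CLOCK_HZ))                 # seconds until first output
print(throughput_bits_per_second(BITS, 1, CLOCK_HZ)) # bits per second
```

Note that latency and throughput are independent: a circuit can take many cycles to produce its first output and still deliver one element per cycle afterwards.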
17. Circuit representations: Combinatorial vs Sequential
Combinatorial circuit
Output is a pure function of present input and has no state
Unclocked
Sequential circuit
Output depends on past inputs or state
Clocked or Unclocked
Practical computer circuits contain a mixture of combinational and
sequential logic
18. Circuit representations: Combinatorial vs Sequential
Combinatorial circuit
adderC : ([8] ,[8]) -> [255][8];
adderC (a , b ) = [| ( a + b ) || (a , b ) <- [0..254] |];
Sequential circuit
adderS : [8] -> [255][8];
adderS b = take (255 , outs )
where outs = [ b ] # [| ( a + b ) || a <- outs |];
Cryptol’s generated circuits must be clocked, otherwise it’s not
possible to make use of clock constraints to produce useful timing
analysis
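The difference between the two styles can be modelled in Python (not Cryptol). In this sketch adder_c is a pure function of its present inputs, while adder_s mirrors the self-referential stream outs = [b] # [| a + b || a <- outs |]: each output depends on the previous one, which is the state a register would hold. The function names are hypothetical and arithmetic wraps at 8 bits to match the [8] type.

```python
from itertools import islice

def adder_c(a, b):
    """Combinatorial: the output depends only on the present inputs."""
    return (a + b) & 0xFF

def adder_s(b):
    """Sequential: model of outs = [b] # [| a + b || a <- outs |].

    The accumulator plays the role of the register that feeds each
    output back in as the next input.
    """
    acc = b & 0xFF
    while True:
        yield acc
        acc = (acc + b) & 0xFF

print(list(islice(adder_s(3), 5)))      # first five elements of the stream
print(len(list(islice(adder_s(1), 255))))  # 'take (255, outs)' yields 255 elements
```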
19. Modelling Sequential Circuits
Step Model
Models circuits that are later lifted into the stream model
Unclocked
Variation of type: (input, state) → (output, state)
Stream Model
Model uses infinite sequences over time
Each element in the input or output corresponds to some
number of clock cycles ⇒ latency of the circuit
Variation of type: [inf]input → [inf]output
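The relationship between the two models can be sketched in Python (not Cryptol): a step function of type (input, state) -> (output, state) is lifted into a function from an unbounded input stream to an unbounded output stream. The running-sum circuit and the names lift and running_sum_step are hypothetical examples, not taken from the slides.

```python
from itertools import count, islice

def running_sum_step(x, state):
    """Step model: (input, state) -> (output, state), 8-bit accumulator."""
    new_state = (state + x) & 0xFF
    return new_state, new_state

def lift(step, initial_state):
    """Lift a step function into the stream model.

    The returned function consumes an (possibly infinite) iterator of
    inputs and lazily yields one output per input, threading the state.
    """
    def stream(inputs):
        state = initial_state
        for x in inputs:
            out, state = step(x, state)
            yield out
    return stream

running_sum = lift(running_sum_step, 0)
print(list(islice(running_sum(count(1)), 5)))   # inputs 1, 2, 3, ... forever
```

Keeping the step function separate makes it easy to test on single inputs before worrying about timing.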
20. Performance Analysis
LLSPIR and FPGA modes report estimates of circuit latency,
clockrate, space utilization and the longest path in a circuit
Guides towards a more efficient (faster and/or smaller)
implementation
Cryptol expects the top-level function to be defined in the stream
model and will forcibly lift it otherwise
21. Performance Analysis: LLSPIR
Underestimates clockrate and provides rough estimate of
space utilization
Users are encouraged to refine an implementation as much as
possible in this mode before beginning synthesis in FPGA
mode
Translation from LLSPIR to VHDL is trivial and takes less
time than synthesis ⇒ if implementation is correct in LLSPIR,
its correctness is highly probable in VHDL
Use :translate to compile a function to LLSPIR, producing a
.dot file and :set +v to print the performance information
22. Performance Analysis: FPGA
FSIM mode reports space utilization accurately but reported
clockrate is overestimated (theoretical maximum)
TSIM mode reports the exact obtainable clockrate for a
particular place-and-route attempt
fpga clockrate and fpga optlevel settings can significantly
influence the place-and-route tool ⇒ experimentation is
advised to obtain maximum possible clockrate
External profiling tools
23. Step 2: Lift top level functions
encrypt
encrypt : ([4][ wsize ] , [4][ wsize ] , [4][ wsize ])
-> [4][ wsize ];
encrypt ( pt , key , iv )
= [| k ^ p || k <- GenKS ( key , iv ) || p <- pt |];
enc_lifted
enc_lifted : [ inf ]([4][ wsize ] , [4][ wsize ] , [4][ wsize ])
-> [ inf ][4][ wsize ];
enc_lifted ins = [| encrypt in || in <- ins |];
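The lifting itself is just a map over the input stream, which the following Python model (not Cryptol) makes explicit. The gen_ks function is a hypothetical stand-in with the same shape as GenKS; it is not the SNOW 3G keystream generator.

```python
from itertools import islice, repeat

def gen_ks(key, iv):
    """Hypothetical stand-in for GenKS (NOT SNOW 3G): four 32-bit words."""
    return [(k + i + n) & 0xFFFFFFFF
            for n, (k, i) in enumerate(zip(key, iv))]

def encrypt(pt, key, iv):
    """Model of [| k ^ p || k <- GenKS (key, iv) || p <- pt |]."""
    return [k ^ p for k, p in zip(gen_ks(key, iv), pt)]

def enc_lifted(ins):
    """Model of [| encrypt in || in <- ins |] over an unbounded stream."""
    for pt, key, iv in ins:
        yield encrypt(pt, key, iv)

# An infinite stream that repeats the same (pt, key, iv) triple.
ins = repeat(([1, 2, 3, 4], [0, 0, 0, 0], [0, 0, 0, 0]))
print(list(islice(enc_lifted(ins), 2)))
```

Because encrypt carries no state between elements, the lifted version has no feedback between consecutive inputs.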
24. Performance Analysis: LLSPIR
enc_lifted
snow3g v0.94> :set LLSPIR +v
snow3g v0.94> :translate enc_lifted
Sorry, not implemented: timing dependencies are too complicated.
LLSPIR is not in canonical form.
Some serious optimization is required!
25. Step 3: Space/Time Tradeoffs
Block RAM
par and seq pragmas
26. Space/Time Tradeoffs: Block RAM
FPGA implementation of constant sequences such as S-Boxes
Simplifies design effort and reduces computational logic
The compiler tries the conversion by default
Doesn’t work if there are dynamic elements
It’s really fast!
27. Space/Time Tradeoffs: Block RAM
MULxPOW
MULxPOW : ([8] , [8] , [8]) -> [8];
MULxPOW (v , i , c ) = res @ i
where res = [ v ] # [| MULx (e , c ) || e <- res |];
The latency of this implementation is 2^8 cycles, because Cryptol
implements synchronous circuits whose latency must be
known statically ⇒ latency of this circuit is equal to the
worst-case latency
We can be more efficient by implementing it as 8 static
256-element lookup tables ⇒ Block RAMs
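The table transformation can be sketched in Python (not Cryptol). The iterative MULxPOW applies MULx up to i times, while a table precomputed for a fixed exponent answers with a single lookup. The MULx definition, the exponents (23, 245, 48, 239) and the constant 0xA9 are assumptions taken from the SNOW 3G definition of MULa as recalled here; only the MULa half is shown, on the assumption that the remaining four tables serve the companion DIVa function.

```python
def mulx(v, c):
    """8-bit left shift, XOR in c if the top bit of v was set (assumed)."""
    shifted = (v << 1) & 0xFF
    return shifted ^ c if v & 0x80 else shifted

def mulxpow(v, i, c):
    """Iterative form: apply mulx i times, so the cost grows with i."""
    for _ in range(i):
        v = mulx(v, c)
    return v

def make_table(i, c):
    """Precompute mulxpow(v, i, c) for every 8-bit v: one 256-entry ROM."""
    return [mulxpow(v, i, c) for v in range(256)]

# Assumed MULa parameters: four exponents, one shared constant.
MULA_EXPONENTS = (23, 245, 48, 239)
MULA_TABLES = [make_table(i, 0xA9) for i in MULA_EXPONENTS]

def mula_iterative(c):
    """Concatenate four iteratively computed bytes into a 32-bit word."""
    out = 0
    for i in MULA_EXPONENTS:
        out = (out << 8) | mulxpow(c, i, 0xA9)
    return out

def mula_tables(c):
    """Same result from four constant-time table lookups."""
    out = 0
    for table in MULA_TABLES:
        out = (out << 8) | table[c]
    return out

# The tables are only valid if they agree with the iterative form everywhere.
print(all(mula_iterative(c) == mula_tables(c) for c in range(256)))
```

The tables are constants known at compile time, which is the property that lets them be mapped to Block RAM.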
28. Space/Time Tradeoffs: Block RAM
MULa before static tables
=== Circuit Timing ===
circuit latency: 246 cycles (245 cycles plus propagation delay)
circuit rate: N/A
output length: one element
total time: 246 cycles (245 cycles plus propagation delay)
MULa after static tables
=== Circuit Timing ===
circuit latency: 3 cycles (2 cycles plus propagation delay)
circuit rate: N/A
output length: one element
total time: 3 cycles (2 cycles plus propagation delay)
29. Space/Time Tradeoffs: par and seq
par
Forces parallelization
Replicates circuitry
Faster but consumes more space
seq
Forces sequentialization
Reuses circuitry over multiple clock cycles
Slower but consumes less space
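The tradeoff can be put into numbers with a deliberately idealized Python cost model. It is an assumption-laden sketch: it ignores the control logic, multiplexers and registers that a real sequentialized circuit needs, and the area and cycle figures are hypothetical units, not compiler output.

```python
def par_cost(n_elements, unit_area, unit_cycles):
    """par: n copies of the unit run side by side."""
    return {"area": n_elements * unit_area, "cycles": unit_cycles}

def seq_cost(n_elements, unit_area, unit_cycles):
    """seq: one copy of the unit is reused n times in a row."""
    return {"area": unit_area, "cycles": n_elements * unit_cycles}

# Mapping f over a 4-element sequence, with f costing 10 area units
# and 2 cycles (hypothetical figures).
print(par_cost(4, 10, 2))
print(seq_cost(4, 10, 2))
```

In this model the product of area and cycles is the same for both choices; the pragma only decides which of the two resources is spent.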
30. par pragma
Example
map : { a b } ( a -> b , [4] a ) -> [4] b ;
map (f , xs ) = [| ( f x ) || x <- xs |];
There’s no need to use par because it’s the compiler’s default
action in order to improve overall performance
31. seq pragma
Example
map : { a b } ( a -> b , [4] a ) -> [4] b ;
map (f , xs ) = seq [| ( f x ) || x <- xs |];
32. Step 4: Pipelining
reg pragma
Sequential circuits in the stream model can be pipelined
Separation of a function into several smaller computational
units
Each unit is a stage in the pipeline consuming output from
previous stage and producing output to the next
Typically increases area and latency of circuit but can
dramatically increase clockrate and throughput
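Why pipelining raises clockrate while lengthening latency can be seen in a small Python timing model. It is a sketch under simplifying assumptions: register setup and routing overhead are ignored, each stage delay is a hypothetical figure in nanoseconds, and both circuits are assumed to accept one input per cycle.

```python
def unpipelined(stage_delays_ns):
    """All logic must settle within a single clock period."""
    period = sum(stage_delays_ns)
    return {"clock_mhz": 1000 / period,
            "latency_cycles": 1,
            "latency_ns": period}

def pipelined(stage_delays_ns):
    """A register after each stage: the slowest stage sets the clock."""
    period = max(stage_delays_ns)
    stages = len(stage_delays_ns)
    return {"clock_mhz": 1000 / period,
            "latency_cycles": stages,
            "latency_ns": stages * period}

delays = [4, 5, 3]            # hypothetical stage delays in ns
print(unpipelined(delays))    # slower clock, shorter latency
print(pipelined(delays))      # faster clock, longer latency
```

With one result per cycle in both cases, throughput scales with the clock, so the pipelined version delivers more results per second even though each individual result takes longer to emerge.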
33. Performance Analysis - LLSPIR
enc_lifted
snow3g v0.95> :translate enc_lifted
=== Circuit Timing ===
circuit latency: 25 cycles (24 cycles plus propagation delay)
circuit rate: one element per cycle
output length: unbounded
total time: unbounded
34. Conclusions
Language
Combination of arithmetic and sequence manipulation ⇒
compact syntax that is easy to learn
Infinite sequences
Size and shape polymorphism
Really captures the elegance and abstract mathematical
essence of ciphers’ specifications
35. Conclusions
Formal Methods’ tools
Possible to check if implementations are safe to execute,
correct and formally identical to their specifications
They work in real scenarios
Push-button package ⇒ no specific annotations required and no
effort spent learning external languages
36. Conclusions
FPGA synthesis
Performance analysis
Compiler pragmas are provided to make simple and effective
space/time tradeoffs
Can generate implementations that are more efficient than
hand-made ones ⇒ saving loads of time
37. Questions
?