The document is a presentation about writing compilers using Python and LLVM. It provides an overview of compilers, including definitions of key terms like source code, intermediate representations, passes, optimizations, and code generation. It also describes the frontend and backend of compilers, and how LLVM can be used to build compiler backends with its libraries. Specifically, it discusses the Python bindings for LLVM called llvm-py that allow building compiler backends using Python.
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
llvm-py: Writing Compilers In Python
1. llvm-py: Writing Compilers Using Python
PyCon India 2010
Mahadevan R
mdevan@mdevan.org
September 25, 2010
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 1 / 14
2. About This Talk
Outline
Compilers overview, some terminology
LLVM about the “Low Level Virtual Machine”
llvm-py Python bindings for LLVM
About me:
Open source developer
Day job is as a software architect (medical imaging)
Started my career in 1999; mostly C, C++, a little Python!
..also the author of llvm-py.
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 2 / 14
3. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
4. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
5. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
6. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
7. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
8. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
What is “something else?”
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
9. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
What is “something else?”
Assembly language
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
10. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
What is “something else?”
Assembly language
Another well-structured textual representation (e.g. cfront)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
11. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
What is “something else?”
Assembly language
Another well-structured textual representation (e.g. cfront)
Intermediate, binary representation (AOT compilers)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
12. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
What is “something else?”
Assembly language
Another well-structured textual representation (e.g. cfront)
Intermediate, binary representation (AOT compilers)
In-memory executable code (JIT compilers)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
13. Compilers What is a Compiler?
What is a Compiler?
What is a compiler?
Something that transforms “source code” into “something else”
What is “source code”?
A well-structured, textual representation of a program
Not: preprocessors, assemblers
What is “something else?”
Assembly language
Another well-structured textual representation (e.g. cfront)
Intermediate, binary representation (AOT compilers)
In-memory executable code (JIT compilers)
Executable images
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
14. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
15. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Parser eats tokens, produces AST
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
16. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Parser eats tokens, produces AST
What is an AST? And what is abstract about it?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
17. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Parser eats tokens, produces AST
What is an AST? And what is abstract about it?
What next after an AST?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
18. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Parser eats tokens, produces AST
What is an AST? And what is abstract about it?
What next after an AST?
What is an intermediate form (IR)?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
19. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Parser eats tokens, produces AST
What is an AST? And what is abstract about it?
What next after an AST?
What is an intermediate form (IR)?
example: IRs in gcc: source → GENERIC → GIMPLE → RTL →
backend
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
20. Compilers Inside a Compiler
Inside a Compiler
What does the compiler do with the source code?
Lexical analysis produces tokens
Parser eats tokens, produces AST
What is an AST? And what is abstract about it?
What next after an AST?
What is an intermediate form (IR)?
example: IRs in gcc: source → GENERIC → GIMPLE → RTL →
backend
What is Static Single Assignment (SSA) form?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
21. Compilers Passes, Optimization and Code Generation
Passes, Optimization and Code Generation
What is a “pass”?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
22. Compilers Passes, Optimization and Code Generation
Passes, Optimization and Code Generation
What is a “pass”?
What is a “optimization”?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
23. Compilers Passes, Optimization and Code Generation
Passes, Optimization and Code Generation
What is a “pass”?
What is a “optimization”?
What is a “code generator”?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
24. Compilers Passes, Optimization and Code Generation
Passes, Optimization and Code Generation
What is a “pass”?
What is a “optimization”?
What is a “code generator”?
What is a “code generator generator”?
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
25. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
26. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
27. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
28. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
29. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
30. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
To build a front end in Python:
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
31. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
To build a front end in Python:
hand-coded lexer, (rec-desc) parser
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
32. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
To build a front end in Python:
hand-coded lexer, (rec-desc) parser
Antlr (generates Python)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
33. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
To build a front end in Python:
hand-coded lexer, (rec-desc) parser
Antlr (generates Python)
Yacc, Bison, Lemon (generates C, wrap to Python)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
34. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
To build a front end in Python:
hand-coded lexer, (rec-desc) parser
Antlr (generates Python)
Yacc, Bison, Lemon (generates C, wrap to Python)
Sphinx (C++, wrap to Python)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
35. Compilers The Frontend and The Backend
The Frontend and The Backend
Everything before the first IR is usually called the frontend
And everything after as the backend
LLVM and llvm-py deal only with backends
Frontends tend to be simpler
Many frontends for a backend is common (Scala, Groovy, Clojure:
Java; C#, IronPython, etc: .NET)
To build a front end in Python:
hand-coded lexer, (rec-desc) parser
Antlr (generates Python)
Yacc, Bison, Lemon (generates C, wrap to Python)
Sphinx (C++, wrap to Python)
various Python parser generators
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
36. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
37. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
38. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
39. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
40. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
IR data structure, with text and binary representations
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
41. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
IR data structure, with text and binary representations
wide range of optimizations
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
42. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
IR data structure, with text and binary representations
wide range of optimizations
code generator (partly description-based) for many platforms
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
43. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
IR data structure, with text and binary representations
wide range of optimizations
code generator (partly description-based) for many platforms
JIT-compile-execute from IR
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
44. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
IR data structure, with text and binary representations
wide range of optimizations
code generator (partly description-based) for many platforms
JIT-compile-execute from IR
link-time optimization (LTO)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
45. LLVM What is LLVM?
What is LLVM?
Primarily a set of libraries..
..with which you can make a compiler backend..
..or a VM with JIT support (but hard to get this right)
It provides:
IR data structure, with text and binary representations
wide range of optimizations
code generator (partly description-based) for many platforms
JIT-compile-execute from IR
link-time optimization (LTO)
support for accurate garbage collection
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
46. LLVM LLVM Highlights
LLVM Highlights
Written in readable C++
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
47. LLVM LLVM Highlights
LLVM Highlights
Written in readable C++
Reasonable documentation, helpful and mature community
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
48. LLVM LLVM Highlights
LLVM Highlights
Written in readable C++
Reasonable documentation, helpful and mature community
clang is a now-famous subproject of LLVM
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
49. LLVM LLVM Highlights
LLVM Highlights
Written in readable C++
Reasonable documentation, helpful and mature community
clang is a now-famous subproject of LLVM
llvm-gcc is gcc frontend (C, C++, Java, Ada, Fortran, ObjC) +
LLVM backend
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
50. LLVM LLVM Highlights
LLVM Highlights
Written in readable C++
Reasonable documentation, helpful and mature community
clang is a now-famous subproject of LLVM
llvm-gcc is gcc frontend (C, C++, Java, Ada, Fortran, ObjC) +
LLVM backend
Many users: Unladen Swallow, Iced Tea, Rubinus, llvm-lua, GHC,
LDC
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
51. LLVM The LLVM IR
The LLVM IR
The IR looks like this (textual representation):
“Hello world” in LLVM Assembly
@msg = private constant [15 x i8 ] c " Hello , world !0 A 00"
declare i32 @puts ( i8 *)
define i32 @main ( i32 % argc , i8 ** % argv ) {
entry :
%0 = getelementptr [15 x i8 ]* @msg , i64 0 , i64 0
%1 = tail call i32 @puts ( i8 * %0)
ret i32 undef
}
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 9 / 14
52. LLVM A Word About clang
A Word About clang
A sub-project of the LLVM team
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
53. LLVM A Word About clang
A Word About clang
A sub-project of the LLVM team
clang is a C, Objective C and C++ frontend
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
54. LLVM A Word About clang
A Word About clang
A sub-project of the LLVM team
clang is a C, Objective C and C++ frontend
works with the LLVM backend (obviously!)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
55. LLVM A Word About clang
A Word About clang
A sub-project of the LLVM team
clang is a C, Objective C and C++ frontend
works with the LLVM backend (obviously!)
It can convert C/ObjC/C++ code to LLVM IR
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
56. LLVM A Word About clang
A Word About clang
A sub-project of the LLVM team
clang is a C, Objective C and C++ frontend
works with the LLVM backend (obviously!)
It can convert C/ObjC/C++ code to LLVM IR
which can be played around with using llvm-py
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
57. LLVM A Word About clang
A Word About clang
A sub-project of the LLVM team
clang is a C, Objective C and C++ frontend
works with the LLVM backend (obviously!)
It can convert C/ObjC/C++ code to LLVM IR
which can be played around with using llvm-py
and then compiled “normally”
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
58. llvm-py What is llvm-py?
What is llvm-py?
Python bindings for LLVM
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
59. llvm-py What is llvm-py?
What is llvm-py?
Python bindings for LLVM
Including JIT
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
60. llvm-py What is llvm-py?
What is llvm-py?
Python bindings for LLVM
Including JIT
Excellent for experimenting and prototyping
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
61. llvm-py What is llvm-py?
What is llvm-py?
Python bindings for LLVM
Including JIT
Excellent for experimenting and prototyping
Available in Ubuntu, Debian, MacPorts (but may be out of date)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
62. llvm-py What is llvm-py?
What is llvm-py?
Python bindings for LLVM
Including JIT
Excellent for experimenting and prototyping
Available in Ubuntu, Debian, MacPorts (but may be out of date)
Current version (0.6) works with LLVM 2.7
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
63. llvm-py More on llvm-py
More on llvm-py
Covers most of the LLVM APIs, esp. IR
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
64. llvm-py More on llvm-py
More on llvm-py
Covers most of the LLVM APIs, esp. IR
See examples to get started
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
65. llvm-py More on llvm-py
More on llvm-py
Covers most of the LLVM APIs, esp. IR
See examples to get started
User guide at http://www.mdevan.org/userguide.html
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
66. llvm-py More on llvm-py
More on llvm-py
Covers most of the LLVM APIs, esp. IR
See examples to get started
User guide at http://www.mdevan.org/userguide.html
Some LLVM knowledge required: Modules, IR instructions, ...
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
67. llvm-py More on llvm-py
More on llvm-py
Covers most of the LLVM APIs, esp. IR
See examples to get started
User guide at http://www.mdevan.org/userguide.html
Some LLVM knowledge required: Modules, IR instructions, ...
Learning curve of LLVM is steep, llvm-py/Python helps!
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
68. llvm-py More on llvm-py
More on llvm-py
Covers most of the LLVM APIs, esp. IR
See examples to get started
User guide at http://www.mdevan.org/userguide.html
Some LLVM knowledge required: Modules, IR instructions, ...
Learning curve of LLVM is steep, llvm-py/Python helps!
Users group: http://groups.google.com/group/llvm-py
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
69. llvm-py llvm-py Internals
llvm-py Internals
Wraps LLVM’s C API as a Python extension module
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
70. llvm-py llvm-py Internals
llvm-py Internals
Wraps LLVM’s C API as a Python extension module
Extends LLVM C API as it is incomplete
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
71. llvm-py llvm-py Internals
llvm-py Internals
Wraps LLVM’s C API as a Python extension module
Extends LLVM C API as it is incomplete
Does not use binding generators (Boost.Python, swig etc)
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
72. llvm-py llvm-py Internals
llvm-py Internals
Wraps LLVM’s C API as a Python extension module
Extends LLVM C API as it is incomplete
Does not use binding generators (Boost.Python, swig etc)
Public APIs are in Python, use extension module internally
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
73. llvm-py llvm-py Internals
llvm-py Internals
Wraps LLVM’s C API as a Python extension module
Extends LLVM C API as it is incomplete
Does not use binding generators (Boost.Python, swig etc)
Public APIs are in Python, use extension module internally
Mostly straightforward Python, a little metaclass magic
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
74. Thanks References and Thanks
Thanks!
Thanks for listening!
Must read: Steven S. Muchnick. Advanced Compiler Design &
Implementation.
Me: Mahadevan R, mdevan@mdevan.org, http://www.mdevan.org/
Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 14 / 14