2. Introduction To LLVM
• a collection of modular and reusable compiler and toolchain technologies
• formerly Low Level Virtual Machine
• spawned a wide variety of front ends like ActionScript,
Ada, D, Fortran, GLSL, Haskell, Java bytecode, Julia,
Objective-C, Python, Ruby, Rust, Scala and C#.
• supports several backends like ARM, Hexagon,
MBlaze, MIPS, Nvidia PTX, PowerPC, SPARC,
z/Architecture, x86/x86-64, XCore
Wednesday, 16 October 13
3. Overview of tools
• bugpoint is the automatic test case reduction tool.
• clang is the Clang C, C++, and Objective-C compiler.
• llc is the LLVM static compiler.
• llvm-as is the LLVM assembler.
• llvm-bcanalyzer is the LLVM bitcode analyzer.
• llvm-dis is the LLVM disassembler.
• llvm-link is the LLVM linker.
• llvm-nm lists the symbol table of LLVM bitcode and
object files.
4. LLVM Feature
Link Time Optimization
• intermodular optimizations that can be applied at link
time
• treats LLVM bitcode files like native object files and
allows mixing and matching between them
• lets the developer take advantage of intermodular
optimizations without making any significant changes
to the developer’s makefiles or build system
• libLTO, a shared object, is used by the linker to handle LLVM bitcode files
5. Example
--- a.h ---
extern int foo1(void);
extern void foo2(void);
extern void foo4(void);
--- a.c ---
#include "a.h"
static signed int i = 0;

void foo2(void) {
    i = -1;
}

static int foo3() {
    foo4();
    return 10;
}

int foo1(void) {
    int data = 0;
    if (i < 0)
        data = foo3();
    data = data + 42;
    return data;
}
--- main.c ---
#include <stdio.h>
#include "a.h"
void foo4(void) {
    printf("Hi\n");
}

int main() {
    return foo1();
}
% clang -emit-llvm -c a.c -o a.o # <-- a.o is LLVM bitcode file
% clang -c main.c -o main.o # <-- main.o is native object file
% clang a.o main.o -o main # <-- link command without modifications
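To see what link-time optimization can do with this example, note that foo2() is the only function that writes i, and nothing in the program calls it. Once the optimizer sees the whole program, it can conclude that i is never negative, drop the call to foo3(), and with it the only use of foo4(). The following is a hand-written sketch of what a.c effectively reduces to. It illustrates the reasoning and is not actual compiler output.

```c
/* Hand-written equivalent of a.c after whole-program reasoning.
   Assumption: no code outside a.c and main.c calls foo2(). */
int foo1(void) {
    /* i stays 0, so the branch that calls foo3() is never taken
       and data is always 0 + 42. */
    return 42;
}
```

With foo1() in this form, main() simply returns 42 and foo4() is no longer referenced from a.o.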
6. How it works
Phase 1: Read LLVM Bitcode Files
The linker first reads all object files and collects symbol information.
For each file that is not a native object file, the linker calls lto_module_create().
If the file is LLVM bitcode, the linker reads its symbols with lto_module_get_symbol_name() and
lto_module_get_symbol_attribute().
The results are added to the linker’s global symbol table.
The lto_* functions are implemented in libLTO, a shared object. This allows the LLVM LTO code to be updated independently of the linker tool. On platforms that support it, the shared object is lazily loaded.
Phase 2: Symbol Resolution
The linker resolves symbols using the global symbol table.
It reports errors such as undefined symbols at this stage.
The linker is able to do this seamlessly even though it does not know the exact content of the input
LLVM bitcode files.
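Phases 1 and 2 can be pictured with a toy symbol table. This is a minimal sketch and not how any real linker is implemented: every toy_* name is hypothetical, and the table ignores weak symbols, archives, and duplicate definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; a real linker's tables are far richer. */
enum { TOY_MAX_SYMBOLS = 64 };

typedef struct {
    const char *name;
    int defined;      /* 1 once some input file defines the symbol */
} toy_symbol;

typedef struct {
    toy_symbol entries[TOY_MAX_SYMBOLS];
    size_t count;
} toy_symtab;

/* Find a symbol by name, adding an undefined entry if it is new.
   Returns NULL when the table is full. */
toy_symbol *toy_lookup(toy_symtab *t, const char *name) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    if (t->count == TOY_MAX_SYMBOLS)
        return NULL;
    toy_symbol *s = &t->entries[t->count++];
    s->name = name;
    s->defined = 0;
    return s;
}

/* Phase 1: record one symbol seen in an input file, whether the file
   is native or bitcode. Returns 0 on success, -1 if the table is full. */
int toy_record(toy_symtab *t, const char *name, int is_definition) {
    toy_symbol *s = toy_lookup(t, name);
    if (s == NULL)
        return -1;
    if (is_definition)
        s->defined = 1;
    return 0;
}

/* Phase 2: count symbols that no input file defined. */
size_t toy_count_undefined(const toy_symtab *t) {
    size_t n = 0;
    for (size_t i = 0; i < t->count; i++)
        if (!t->entries[i].defined)
            n++;
    return n;
}
```

Recording the symbols of a.o alone leaves foo4 undefined. Recording main.o as well resolves it, and neither step depends on whether the file was bitcode or native.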
7. How it works (continued)
Phase 3: Optimize Bitcode Files
The linker tells the LTO shared object which symbols are needed by native object files
using lto_codegen_add_must_preserve_symbol().
Next, the linker invokes the LLVM optimizer and code generators using
lto_codegen_compile(), which returns a native object file created by merging the
LLVM bitcode files and applying various optimization passes.
Phase 4: Symbol Resolution after Optimization
The linker reads the optimized native object file and updates its internal global symbol table.
The linker also collects information about any changes in the use of external symbols by the LLVM bitcode
files. In the example above, the linker notes that foo4() is no longer used,
so it can perform dead code stripping.
After this phase, the linker continues linking as if it never saw LLVM bitcode files.
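The dead code stripping step in Phase 4 amounts to a reachability check over symbol references. The sketch below is a toy model under stated assumptions: the toy_* names are hypothetical, references are stored in a small fixed-size matrix, and the roots passed to toy_mark_live() stand in for the entry point and any symbols the linker must preserve.

```c
#include <stddef.h>

enum { TOY_MAX = 16 };

typedef struct {
    size_t count;                 /* number of symbols in use */
    int refs[TOY_MAX][TOY_MAX];   /* refs[a][b] != 0: symbol a references b */
    int live[TOY_MAX];            /* filled in by toy_mark_live() */
} toy_graph;

/* Mark `sym` and everything reachable from it as live. */
void toy_mark_live(toy_graph *g, size_t sym) {
    if (sym >= g->count || g->live[sym])
        return;
    g->live[sym] = 1;
    for (size_t next = 0; next < g->count; next++)
        if (g->refs[sym][next])
            toy_mark_live(g, next);
}

/* Count symbols that a dead-stripping pass could drop. */
size_t toy_count_dead(const toy_graph *g) {
    size_t dead = 0;
    for (size_t i = 0; i < g->count; i++)
        if (!g->live[i])
            dead++;
    return dead;
}
```

In the running example, foo1 reaches foo4 (through foo3) before optimization, so nothing is dead. After optimization that reference disappears, and marking from main leaves foo4 unmarked.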
8. libLTO
libLTO is a shared object that is part of the LLVM tools, and is intended for use by a linker.
A non-native object file is handled via an lto_module_t. The following functions allow the linker to check
if a file (on disk or in a memory buffer) is a file which libLTO can process:
lto_module_is_object_file(const char*)
lto_module_is_object_file_for_target(const char*, const char*)
lto_module_is_object_file_in_memory(const void*, size_t)
lto_module_is_object_file_in_memory_for_target(const void*, size_t, const char*)
If the object file can be processed by libLTO, the linker creates an lto_module_t by using one of:
lto_module_create(const char*)
lto_module_create_from_memory(const void*, size_t)
and when done, the handle is released via
lto_module_dispose(lto_module_t)
The linker can introspect the non-native object file by getting the number of symbols and getting the
name and attributes of each symbol via:
lto_module_get_num_symbols(lto_module_t)
lto_module_get_symbol_name(lto_module_t, unsigned int)
lto_module_get_symbol_attribute(lto_module_t, unsigned int)
The attributes of a symbol include the alignment, visibility, and kind.
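The value returned by lto_module_get_symbol_attribute() packs these properties into a single bitmask. The sketch below shows how a linker might unpack such a value. The TOY_* constants and their bit positions are assumptions made for this illustration. The real masks are the LTO_SYMBOL_* constants in llvm-c/lto.h, and that header is the authority on their values.

```c
/* Hypothetical layout for this sketch: the low 5 bits hold
   log2(alignment), bits 8-10 hold the definition kind, and
   bits 11-13 hold the scope (visibility). Check llvm-c/lto.h
   for the real LTO_SYMBOL_* masks rather than relying on these. */
enum {
    TOY_ALIGNMENT_MASK       = 0x0000001F,
    TOY_DEFINITION_MASK      = 0x00000700,
    TOY_DEFINITION_REGULAR   = 0x00000100,
    TOY_DEFINITION_UNDEFINED = 0x00000400,
    TOY_SCOPE_MASK           = 0x00003800,
    TOY_SCOPE_INTERNAL       = 0x00000800
};

/* Alignment in bytes, decoded from the log2 field. */
unsigned toy_alignment_bytes(unsigned attrs) {
    return 1u << (attrs & TOY_ALIGNMENT_MASK);
}

/* Nonzero if the symbol is only referenced, not defined, by the module. */
int toy_is_undefined(unsigned attrs) {
    return (attrs & TOY_DEFINITION_MASK) == TOY_DEFINITION_UNDEFINED;
}

/* Nonzero if the symbol is not visible outside its module. */
int toy_is_internal(unsigned attrs) {
    return (attrs & TOY_SCOPE_MASK) == TOY_SCOPE_INTERNAL;
}
```

A linker would apply checks like these to every index from 0 up to lto_module_get_num_symbols() when filling its global symbol table.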
9. lto_code_gen_t
Once the linker has loaded each non-native object file into an lto_module_t, it can request libLTO to process
them all and generate a native object file. This is done in a couple of steps. First, a code generator is created
with:
lto_codegen_create()
Then, each non-native object file is added to the code generator with:
lto_codegen_add_module(lto_code_gen_t, lto_module_t)
The linker then has the option of setting some codegen options. Whether or not to generate DWARF debug
info is set with:
lto_codegen_set_debug_model(lto_code_gen_t, lto_debug_model)
The kind of position independence is set with:
lto_codegen_set_pic_model(lto_code_gen_t, lto_codegen_model)
And each symbol that is referenced by a native object file or otherwise must not be optimized away is set with:
lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*)
After all these settings are done, the linker requests that a native object file be created from the modules with
the settings using:
lto_codegen_compile(lto_code_gen_t, size_t*)
which returns a pointer to a buffer containing the generated native object file. The linker then parses that and
links it with the rest of the native object files.