Improving static code review using AST-based code analysis
Christophe Alladoum (Twitter: @_hugsy_, GitHub: hugsy)
Who am I ?
➔ Christophe Alladoum
➔ IOActive pirate
➔ blah blah blah
What is this about?
➔ I read a LOT of code
◆ mostly for fun (possibly for work)
● just to know how it works
● occasionally to find bugs
◆ most of the time, C code
● sometimes C++
● occasionally higher-level stuff: PHP (lol), Java, Python, ...
What is this about?
➔ C code is tricky and far from trivial
● many standards (ANSI C/C89, C99, C11, etc.)
● many bad coding practices
● MANY subtleties in the language
➔ Ergo, many places for flaws
● logic errors
● programming errors
● missing bounds and range checks (buffers, integers)
Existing automated tools
● Many open-source and licensed ($$$) tools use regular expressions to find weak patterns
● Insufficient approach:
○ Example using the latest flawfinder (output not reproduced here)
○ Basically as clever as running `grep` (which is one of the best vulnerability finders, by the way)
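To make the limitation concrete, here is a minimal sketch of a grep-like scanner. The pattern, the function list and the C snippet are all made up for illustration and are not taken from flawfinder. The scanner flags a comment and a string literal just as readily as the one real call, because it never parses the code.

```python
import re

# Hypothetical pattern in the spirit of regexp-based scanners: flag any
# "dangerous" function name followed by an opening parenthesis.
RISKY_CALL = re.compile(r"\b(strcpy|sprintf|gets)\s*\(")

# Illustrative C source: only line 4 contains an actual call to strcpy.
C_SOURCE = '''\
/* do not use strcpy() here, see coding guidelines */
void greet(char *dst, const char *src) {
    puts("strcpy(dst, src) is unsafe");
    strcpy(dst, src);
}
'''


def grep_like_scan(source):
    """Return (line_number, function_name) for every textual match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in RISKY_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    for lineno, name in grep_like_scan(C_SOURCE):
        print(f"line {lineno}: possible unsafe call to {name}")
```

Three lines are reported (1, 3 and 4) although only the last one is a call. An AST-based tool would see a comment, a string literal and a call expression as three different things.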
Existing automated tools
○ and (too) often, the results are “strange”
○ It is usually a very *bad* idea to paste the output of those tools straight into a (serious) code review report
*PLUS* splint fails to see vulnerable calls
A smarter approach
➔ C-based projects are ultimately meant to be compiled and linked
◆ Compilers are the best code reviewers!
● Code is parsed and transformed into another format
● Code is validated
● Some additional checks for programming errors are even provided by default (type checks, unused variables, invalid format strings, uninitialized values, etc.)
Quick reminder on compilers
● Compiler, noun: a set of programs that transforms source code written in a programming language into another computer language (Wikipedia).
■ Examples: GCC, as, Python (which embeds a bytecode compiler), etc.
● Abstract representation of compiler behavior: source code → front end (lexing, parsing) → optimizer → back end (code generation) → target code
LLVM Specifics
● What makes LLVM so special?
○ LLVM (originally "Low-Level Virtual Machine"): a 13-year-old project (as of 2013)
○ Many different projects built around this architecture
○ The LLVM structure *truly* isolates each part (lexing/optimizing/generating)
○ Totally plug-and-play
● you can easily write a lexer to generate Python .pyc files ...
● … or you can use the optimizer API to help runtime bug detection (heard of Google's AddressSanitizer?) …
● … or you can take an existing parser (for instance GCC’s) and bind it to the rest of the LLVM architecture (llvm-gcc)
→ really cool features! Go hack it!
LLVM Specifics
● Clang
○ Default C/C++/Objective-C compiler front end for the LLVM architecture
○ The parser takes .c, .cpp, .m files as input and generates an Intermediate Representation (IR) of the code
→ this is achieved through an Abstract Syntax Tree (AST) built while “reading” each source file
○ An API is provided to interact with the generated AST
→ in native C++
→ or in higher-level languages, like Python
■ Clang already parses the code for us, so why not use it to analyze code in a smart way (and ultimately find vulnerabilities)?
Clang Python API
● Relatively easy to use...
○ … but not documented thoroughly enough (only automatically generated documentation)
→ pydoc works fairly well on it
○ Many blog posts on the topic (but sometimes outdated)
○ Fairly intuitive namespace
Demo
● clang-draw-ast.py is a 70-line Python script that parses a C source file and renders the corresponding AST as a PNG image.
(This is the expected result if live demo fails)
Let’s have a look...
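The script itself is not reproduced in this transcript. As a rough idea of what such a tool has to do, here is a minimal sketch (not the author's clang-draw-ast.py) that turns a cursor-like tree into Graphviz DOT text. It only relies on `.kind.name`, `.displayname` and `.get_children()`, which `clang.cindex.Cursor` provides. The `fake_node` helper and `DEMO_TREE` are hypothetical stand-ins so the sketch runs without libclang.

```python
from itertools import count
from types import SimpleNamespace


def ast_to_dot(root):
    """Render a cursor-like tree as Graphviz DOT text (recursive, so only
    suitable for reasonably shallow trees)."""
    ids = count()
    lines = ["digraph ast {"]

    def visit(node, parent_id=None):
        node_id = next(ids)
        # "\n" inside a DOT label is a line break; double quotes are escaped.
        label = f"{node.kind.name}\\n{node.displayname}".replace('"', '\\"')
        lines.append(f'  n{node_id} [label="{label}"];')
        if parent_id is not None:
            lines.append(f"  n{parent_id} -> n{node_id};")
        for child in node.get_children():
            visit(child, node_id)

    visit(root)
    lines.append("}")
    return "\n".join(lines)


def fake_node(kind, name, *children):
    """Hypothetical stand-in for clang.cindex.Cursor, for illustration only."""
    return SimpleNamespace(
        kind=SimpleNamespace(name=kind),
        displayname=name,
        get_children=lambda: iter(children),
    )


DEMO_TREE = fake_node(
    "TRANSLATION_UNIT", "demo.c",
    fake_node("FUNCTION_DECL", "main()", fake_node("CALL_EXPR", "puts")),
)

if __name__ == "__main__":
    print(ast_to_dot(DEMO_TREE))
```

The resulting text can be saved to a file and rendered with Graphviz, for example `dot -Tpng ast.dot -o ast.png`.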
The magic inside
The indexing API is exposed by the `clang.cindex` package.
● Index
○ Top-level object that manages some global library state.
● TranslationUnit
○ High-level object encapsulating the AST for a single translation unit (parsed on the fly)
● SourceRange, SourceLocation, and File
○ Objects representing information about the input source.
Clang internals voodoo
The routines in this group provide the ability to create and destroy translation units from files, either by parsing the contents of the files or by reading in a serialized representation of a translation unit.
● Once the index is created, its parse() method returns a TranslationUnit object
○ The most important object
● A Cursor object is used to walk through all the nodes
○ kind: the kind (type) of the current node
○ displayname: the display name of the entity referenced
○ location: the source location (the starting character)
○ get_children(): returns an iterator over the children of this cursor
○ get_arguments(): returns an iterator over the arguments of this cursor
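The objects above fit together roughly as follows. This is a minimal sketch, assuming the `clang` Python bindings and a matching libclang library are installed. The `clang.cindex` import is kept inside `parse_c_file` so that the traversal itself can be tried on plain stand-in objects. The `fake_cursor` helper and `DEMO_TREE` are hypothetical and exist only for that purpose.

```python
from types import SimpleNamespace


def parse_c_file(path, clang_args=()):
    """Parse `path` with libclang and return the root cursor of its AST."""
    from clang.cindex import Index  # needs the clang bindings + libclang

    index = Index.create()
    translation_unit = index.parse(path, args=list(clang_args))
    return translation_unit.cursor


def dump(cursor, depth=0):
    """Yield one indented line per node: kind, display name, location."""
    loc = cursor.location
    yield (f"{'  ' * depth}{cursor.kind.name} "
           f"{cursor.displayname!r} @{loc.line}:{loc.column}")
    for child in cursor.get_children():
        yield from dump(child, depth + 1)


def argument_names(cursor):
    """Display names of the arguments of a call or function cursor."""
    return [arg.displayname for arg in cursor.get_arguments()]


def fake_cursor(kind, name, line, column, *children):
    """Hypothetical stand-in for clang.cindex.Cursor, for illustration only."""
    return SimpleNamespace(
        kind=SimpleNamespace(name=kind),
        displayname=name,
        location=SimpleNamespace(line=line, column=column),
        get_children=lambda: iter(children),
        get_arguments=lambda: iter(children),
    )


DEMO_TREE = fake_cursor(
    "TRANSLATION_UNIT", "demo.c", 0, 0,
    fake_cursor("FUNCTION_DECL", "main()", 1, 5,
                fake_cursor("CALL_EXPR", "puts", 2, 5)),
)

if __name__ == "__main__":
    for line in dump(DEMO_TREE):
        print(line)
```

With the bindings installed, `for line in dump(parse_c_file("demo.c")): print(line)` would print the same kind of listing for a real file (`demo.c` being any C source of your choice).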
Clang internals voodoo
Now we can better understand the previous script
Easy, right ?
Pros / Cons
Pros
● simple and intuitive Python bindings
● full control over all the code being audited
● parsing and browsing are fast
● can be extended with LLVM extra modules
Cons
● bindings are built on Python ctypes: the approach may not carry over as well to other high-level languages (Ruby, Java, etc.)
Limitations?
● Under heavy development: the API keeps improving and the docs are becoming more complete
Introducing CodeBro!
● Built as a Proof-of-Concept around this idea
○ Meaning : you can use it but don’t rely on it
● Underlying idea: create a web-based tool that acts as an interface between the AST and the code reviewer
○ The code reviewer can smartly analyse/navigate through the code and possibly add modules to detect basic (or advanced) vulnerabilities
CodeBro!
● 100% Open-Source
○ Beer-Ware License
● 100% Python
● (Hopefully) easy to install (pip)
● Django-based application (compatible with 1.5+)
○ combines many cool Python-based technologies
■ PyDot
■ PyCharm
■ Pygments
■ etc.
○ Keeps things simple
■ 1 project to audit = 1 specific database (default : SQLite)
CodeBro!
● Uses the Clang parsing module to dynamically interact with code
○ Cross-referencing feature similar to IDA Pro
■ only between functions (caller/callee)
○ Call graph generation: visual understanding of code
■ graph generated as SVG → can be navigated in a web browser
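One possible way to build such caller/callee cross-references from the AST is sketched below. This is not CodeBro's actual implementation. It assumes cursor-like nodes exposing `.kind.name`, `.spelling` and `.get_children()` (as `clang.cindex.Cursor` does) and only looks at FUNCTION_DECL and CALL_EXPR nodes. The `fake_cursor` helper and `DEMO_TREE` are hypothetical.

```python
from types import SimpleNamespace


def collect_xrefs(cursor, xrefs=None, current=None):
    """Map each declared function name to the set of names it calls."""
    if xrefs is None:
        xrefs = {}
    kind = cursor.kind.name
    if kind == "FUNCTION_DECL":
        current = cursor.spelling
        xrefs.setdefault(current, set())  # keep functions that call nothing
    elif kind == "CALL_EXPR" and current is not None and cursor.spelling:
        xrefs[current].add(cursor.spelling)
    for child in cursor.get_children():
        collect_xrefs(child, xrefs, current)
    return xrefs


def callers_of(xrefs, name):
    """Reverse lookup: which functions call `name`?"""
    return sorted(f for f, callees in xrefs.items() if name in callees)


def call_graph_dot(xrefs):
    """Emit the call graph as Graphviz DOT text."""
    lines = ["digraph calls {"]
    for caller in sorted(xrefs):
        for callee in sorted(xrefs[caller]):
            lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


def fake_cursor(kind, spelling, *children):
    """Hypothetical stand-in for clang.cindex.Cursor, for illustration only."""
    return SimpleNamespace(
        kind=SimpleNamespace(name=kind),
        spelling=spelling,
        get_children=lambda: iter(children),
    )


DEMO_TREE = fake_cursor(
    "TRANSLATION_UNIT", "demo.c",
    fake_cursor("FUNCTION_DECL", "helper", fake_cursor("CALL_EXPR", "puts")),
    fake_cursor("FUNCTION_DECL", "main",
                fake_cursor("CALL_EXPR", "helper"),
                fake_cursor("CALL_EXPR", "puts")),
)

if __name__ == "__main__":
    print(call_graph_dot(collect_xrefs(DEMO_TREE)))
```

The DOT output can be rendered with `dot -Tsvg` to get a graph that opens directly in a browser.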
CodeBro!
● “Analysis” module
○ reports all default diagnostics provided by Clang
○ provides a “Plugin” API
■ some modules implemented
■ … some more to come
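The plugin API itself is not shown in the slides. The sketch below is an assumption about what a very small version could look like: a registry of check functions, with one check that simply relays the diagnostics clang already produced. `PLUGINS`, `plugin` and `run_analysis` are invented names. The only libclang facts used are that `TranslationUnit.diagnostics` yields objects with `severity`, `spelling` and `location`, and that severities range from 0 (ignored) to 4 (fatal).

```python
from types import SimpleNamespace

# libclang diagnostic severities, as integers.
SEVERITY_NAMES = {0: "ignored", 1: "note", 2: "warning", 3: "error", 4: "fatal"}

PLUGINS = []  # hypothetical registry of analysis checks


def plugin(func):
    """Decorator registering `func` as an analysis check."""
    PLUGINS.append(func)
    return func


def format_diagnostic(diag):
    loc = diag.location
    level = SEVERITY_NAMES.get(diag.severity, "unknown")
    return f"{loc.line}:{loc.column}: {level}: {diag.spelling}"


@plugin
def clang_diagnostics(translation_unit):
    """Relay clang's own diagnostics, keeping warnings and above."""
    return [format_diagnostic(d)
            for d in translation_unit.diagnostics if d.severity >= 2]


def run_analysis(translation_unit):
    """Run every registered check and collect findings per check name."""
    return {check.__name__: check(translation_unit) for check in PLUGINS}


def fake_diag(severity, spelling, line, column):
    """Hypothetical stand-in for clang.cindex.Diagnostic."""
    return SimpleNamespace(severity=severity, spelling=spelling,
                           location=SimpleNamespace(line=line, column=column))


FAKE_TU = SimpleNamespace(diagnostics=[
    fake_diag(2, "unused variable 'x'", 3, 9),
    fake_diag(1, "declared here", 1, 5),
])

if __name__ == "__main__":
    for name, findings in run_analysis(FAKE_TU).items():
        print(name, findings)
```

A new check is added by decorating another function with `@plugin`; it receives the same translation unit and returns a list of findings.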
CodeBro!
● Extensible through plugins
○ can use AST and/or already existing references
○ Examples :
■ detecting dead code
● find all functions never called (i.e. no down Xref to it)
■ improving detection of format string flaws
● “count” the number of arguments for known functions (printf, sprintf, etc.) and parse the arguments
● detect functions that wrap format-string functions (based on the previous calls)
■ (to a limited extent) detecting use-after-free like this →
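Two of the plugin ideas above can be sketched in a few lines. These are simplified illustrations, not CodeBro's plugins: the dead-code check works on a hypothetical `{function: set_of_callees}` cross-reference mapping, and the format-string check counts the arguments a printf-style format expects using a regular expression that covers the common conversion specifiers only. Use-after-free detection is not covered here.

```python
import re

# One printf-style conversion specification: "%%" takes no argument,
# a "*" width or precision takes one extra int argument each.
_SPEC = re.compile(
    r"%(?:%|[-+ #0]*(\*|\d+)?(?:\.(\*|\d+))?"
    r"(?:hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn])"
)


def expected_printf_args(fmt):
    """Number of arguments a printf-style format string consumes."""
    total = 0
    for match in _SPEC.finditer(fmt):
        if match.group(0) == "%%":
            continue
        total += 1
        total += int(match.group(1) == "*")  # "*" width
        total += int(match.group(2) == "*")  # "*" precision
    return total


def format_arg_mismatch(fmt, supplied):
    """Return (expected, supplied) when the counts differ, else None."""
    expected = expected_printf_args(fmt)
    return None if expected == supplied else (expected, supplied)


def never_called(xrefs, entry_points=("main",)):
    """Functions that appear in no caller's callee set (dead-code candidates)."""
    called = set().union(*xrefs.values())
    return sorted(f for f in xrefs
                  if f not in called and f not in entry_points)


if __name__ == "__main__":
    # Illustrative data only.
    xrefs = {"main": {"parse"}, "parse": set(), "legacy_dump": {"parse"}}
    print(never_called(xrefs))
    print(format_arg_mismatch("%s %s", 1))
```

In an AST-based plugin, `supplied` would come from counting the arguments of a CALL_EXPR node after the format string, and `xrefs` from the caller/callee references already collected.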
Demo time
(More screenshots if demo still fails)
Code project listing
Code browsing - unparsed
then parsed
Call graph generation : SVG generation (href linking)
← Functions listing
Future enhancements
● Still a work in progress
● Fix bugs
● Index all components of source files (instead of just CALL_EXPR and
FUNCTION_DECL)
● Improve search engine
● Add macro parsing
● Integrate more source code input sources (Git, as soon as there is a decent Python Git bindings package)
● Improve C++ and Objective-C analysis
● Add moar modulez !!
The end
QUESTIONS ?
Links :
● https://github.com/hugsy/codebro
● https://twitter.com/_hugsy_
● http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang
● http://llvm.org/devmtg/2010-11/Gregor-libclang.pdf
● https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer

Ruxmon.2013-08.-.CodeBro!
