Static Analysis
   for PHP
          PHPDay Verona, Italy 2012
 Nick Galbreath @ngalbreath nickg@etsy.com
http://slidesha.re/KzTfLy
github.com/client9/hphp-tools

    Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
Static Analysis
• Typically analyzes source code “at rest” for
  bugs, security problems, leaks, threading
  problems.
• We’ll cover simple checks and HpHp
• Some commercial tools exist too.
  Veracode works from PHP byte code
  http://www.veracode.com/products/static

Dynamic Analysis
        Analysis of code while running


• valgrind http://valgrind.org/
• xdebug http://xdebug.org/
• xhprof http://pecl.php.net/package/xhprof

      Great tools, but not for this talk.
Simple Static Analysis
The Littlest Static Analysis
                  php -l
  • Syntax errors should never be committed.
  • Syntax errors should never go to prod!
  • Make sure dev and prod versions of PHP
    are identical
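The `php -l` check above can be wired into a pre-commit hook. This is a minimal sketch, not the hook used in the talk: `lintFile()` and `lintFiles()` are hypothetical helper names, and the script lints with whatever interpreter runs it (`PHP_BINARY`), which should be the same PHP version used in production.

```php
<?php
// Pre-commit lint sketch: run `php -l` on each PHP file and collect failures.

/**
 * Returns true if `php -l` accepts the file, false on a syntax error.
 * PHP_BINARY is the interpreter running this script; it should match prod.
 */
function lintFile(string $path): bool
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -l ' . escapeshellarg($path) . ' 2>&1';
    exec($cmd, $output, $exitCode);   // `php -l` exits non-zero on a parse error
    return $exitCode === 0;
}

/** Lints every path ending in .php and returns the paths that failed. */
function lintFiles(array $paths): array
{
    $failed = [];
    foreach ($paths as $path) {
        if (substr($path, -4) === '.php' && !lintFile($path)) {
            $failed[] = $path;
        }
    }
    return $failed;
}
```

In a git pre-commit hook, the paths could come from `git diff --cached --name-only --diff-filter=ACM`, with the hook exiting non-zero whenever `lintFiles()` returns anything. Note that this lints the working-tree copy of each file, which can differ from what is actually staged.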




PHP Leading Whitespace

pre-commit check that every file starts with
either #! or <?php exactly
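The leading-whitespace rule is a plain byte comparison. A sketch, assuming a hypothetical helper `hasCleanStart()` that receives the file's contents: anything before `#!` or `<?php`, such as a blank line or a UTF-8 byte-order mark, is sent as output before the code runs, which can cause "headers already sent" problems.

```php
<?php
// Leading-whitespace sketch: a file must begin with exactly "#!" or "<?php".

function hasCleanStart(string $contents): bool
{
    return strncmp($contents, '#!', 2) === 0       // shebang line of a CLI script
        || strncmp($contents, '<?php', 5) === 0;   // opening tag as the very first bytes
}
```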




PHP Trailing Whitespace
 Check that each file ends exactly with ?>, or make
 sure it doesn’t have a closing tag at all.
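The trailing check can be sketched the same way; `hasCleanEnd()` is again a hypothetical name. It accepts files that end exactly with `?>` and files whose last content is not a closing tag, and rejects whitespace after a final `?>`. PHP itself swallows one newline directly after a closing tag, so this check is stricter than the interpreter requires.

```php
<?php
/*
 * Trailing-whitespace sketch. Passes when the file ends exactly with the
 * closing tag, or when its last non-whitespace content is not a closing
 * tag (e.g. the tag is omitted). Fails when whitespace follows a final
 * closing tag.
 */
function hasCleanEnd(string $contents): bool
{
    $trimmed = rtrim($contents);
    if (substr($trimmed, -2) !== '?>') {
        return true;                  // no closing tag at the end of the file
    }
    return $trimmed === $contents;    // the closing tag must be the very last bytes
}
```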




Anti-Virus
      On Source Code

• It’s static analysis too!
• Not so concerned with PHP, but do you
  have JavaScript, Flash, Word, PowerPoint,
  PDFs, ZIPs in your source tree?



ClamAV
• http://www.clamav.net/
• Free anti-virus.
• Available on every OS.
ClamAV Performance




   1 GB of Source Code / Minute
      Why not do it?
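Scanning the source tree can be one more step in the same hook. This sketch shells out to ClamAV's `clamscan` command-line scanner with `-r` (recurse) and `-i` (print only infected files) and maps its exit codes (0 no virus found, 1 virus found, 2 error). The wrapper function names are hypothetical.

```php
<?php
// Sketch: scan a source tree with ClamAV's `clamscan` command-line scanner.

function clamscanCommand(string $dir): string
{
    // -r: recurse into subdirectories; -i: print only infected files
    return 'clamscan -r -i ' . escapeshellarg($dir);
}

function describeClamscanExit(int $code): string
{
    // clamscan exit codes: 0 = no virus found, 1 = virus(es) found, 2 = error
    switch ($code) {
        case 0:
            return 'clean';
        case 1:
            return 'infected';
        default:
            return 'error';   // also covers "clamscan not installed" (shell exit 127)
    }
}

function scanSourceTree(string $dir): string
{
    exec(clamscanCommand($dir), $output, $code);
    foreach ($output as $line) {
        echo $line, PHP_EOL;   // infected files plus clamscan's scan summary
    }
    return describeClamscanExit($code);
}
```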
Advanced Static Analysis
Why not use... AST?
http://docs.php.net/manual/en/function.token-get-all.php

    •    token_get_all($source) takes a string of PHP
         source and returns a flat array of tokens
         (not a true Abstract Syntax Tree).
    • Orders of magnitude slower -- can’t use for
         pre-commit check on large code bases
    • Too low level -- need to turn it into an
         intermediate representation.
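To see why `token_get_all()` is "too low level", it helps to look at what it returns. A small sketch with a hypothetical helper `tokenNames()`: each token is either a single-character string or an array of `[token id, text, line]`, and `token_name()` turns the id into a readable name.

```php
<?php
// Sketch: list what token_get_all() returns, one name per token.

function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Array tokens are [token id, text, line]; simple ones like ";" are plain strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}
```

For `'<?php $a = 1;'` this yields `T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, =, T_WHITESPACE, T_LNUMBER, ;`: a flat stream with no notion of statements, scopes, or calls, which is why an intermediate representation has to be built on top of it.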


Why not use...
         CodeSniffer?
 http://pear.php.net/package/PHP_CodeSniffer

• Excellent tool, but...
• Based on token_get_all
• SAX-style API
• Too slow for pre/post commit hooks

Why not use...
• php-SAT: orphaned 2009
• php-AST: orphaned 2008
• phc: active, but doesn’t support... OBJECTS
• Every other PHP to Java translator or
  converter is orphaned or has other
  problems


Facebook’s HpHp
• A full re-implementation of Apache+PHP
• Compiles PHP to its own byte code format
  and executes it in its own runtime.
• May also translate to C++ for native
  compilation, or use a JIT
• Does type inference for speed-ups
• Also includes an HTTP web server
Bad News #1
      No public commits since
        2011-12-06
Facebook appears to use “code drops”
instead of a true “streaming” open source
model. BOO.




Bad News #2
Missing Many Common
       Modules
• Has: apc, array, bcmath, bzip2, ctype,
  curl, iconv, gd, imap, ipc, json, ldap, math, mb,
  mcrypt, memcache, mysql, network,
  openssl, pdo, posix, preg, process, session,
  simplexml, soap, socket, sqlite3, stream,
  string, thread, thrift, url, xml*, zlib.
• That’s it!   (No filter_var, no ftp, no ..)
Bad News #3
Doesn’t Track PHP 5.3
                PHP 5.4? No way!
•   Some function signatures aren’t quite
    right, e.g. debug_print_backtrace:
    • HpHp 2 arguments
    • PHP 5.3.6 3 arguments
    • PHP 5.4 4 arguments
• (You end up needing to whitelist this to
    ignore false positives)
Bad News #4
    Seriously #*$%&!#
     annoying to build
• My crappy CentOS build script
  https://github.com/client9/hphp-tools
• Ubuntu users are slightly better off (see
  HpHp wiki)
• Takes hours.
Bad News #5
    Won’t help with
   Dynamic Evaluation
$fn = "foo";
$fn(1,2,3); // function not found
eval("foo(1,2,3)"); // no

  • This is more for runtime dynamic analysis.
  • Try to avoid this anyway.
Conclusion

• You aren’t going to run your application
  under HpHp (at least not as is)
• But, it has a great static analyzer that works
  and finds real bugs really fast.
• Scans thousands of files in a few seconds

Using HpHp
Step 1: Make a
         constants file
• HpHp doesn’t know about hardwired constants
• A nifty script generates the constants file
• May need to hand edit
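The script from the talk is not reproduced here. As a rough sketch of the idea, PHP's `get_defined_constants(true)` returns every known constant grouped by extension, and each one can be written out as a `define()` call. The function names are hypothetical. Constants that HpHp already knows may then be reported as declared twice, which is one reason hand editing may be needed.

```php
<?php
// Sketch of a constants-file generator (not the script from the talk).

/** Every constant known to this PHP build, minus those defined by user code. */
function builtinConstants(): array
{
    $byExtension = get_defined_constants(true);   // grouped by extension name
    unset($byExtension['user']);
    return array_merge(...array_values($byExtension));
}

/** Renders constants as define() calls the analyzer can read. */
function constantsToPhp(array $constants): string
{
    $php = '';
    foreach ($constants as $name => $value) {
        if (is_resource($value)) {
            continue;   // e.g. STDIN/STDOUT in the CLI; var_export() cannot express these
        }
        $php .= 'define(' . var_export($name, true) . ', ' . var_export($value, true) . ");\n";
    }
    return $php;
}

// Example: file_put_contents('constants.php', "<?php\n" . constantsToPhp(builtinConstants()));
```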



Step 2: Make a stubs file

• HpHp doesn’t have many binary extensions
• But... the analyzer doesn’t care. Just make a
  stub function.
  // http://php.net/manual/en/function.filter-var.php
  function filter_var($var, $filter=0, $options=NULL) {
     return $var;
  }

  You can make stub classes as well.

Step 3: Create the file list
 • Create a list of all php files to be analyzed
   and include your constants and stubs file.
 • Ignore phpunit and other tests
 • HpHp implements much of PHP base
   functionality as PHP code. (e.g. the
   Exception class is written in PHP). You
   need to add these system PHP files as well
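A file list along these lines can be built with SPL's `RecursiveDirectoryIterator`. In this sketch, `buildFileList()` is hypothetical, the test-directory pattern is an assumption to adapt, and the location of HpHp's own system PHP files depends on your build, so they are passed in through `$extraFiles` together with the constants and stubs files.

```php
<?php
// Sketch of building the analyzer's file list.

/**
 * Collects .php files under $root, skipping paths that match $skipPattern
 * (checked against the path relative to $root), then appends $extraFiles
 * such as your constants and stubs files and HpHp's own system PHP files.
 */
function buildFileList(string $root, array $extraFiles, string $skipPattern = '#/(tests?|phpunit)/#i'): array
{
    $root = rtrim($root, '/');
    $files = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        $path = $file->getPathname();
        $relative = substr($path, strlen($root));   // starts with "/"
        if ($file->isFile() && substr($path, -4) === '.php' && !preg_match($skipPattern, $relative)) {
            $files[] = $path;
        }
    }
    sort($files);   // a stable order makes runs easy to compare
    return array_merge($files, $extraFiles);
}
```

Write the result out in whatever input-list format your HpHp build expects.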


correction:
       grep -v helper.idl.php | grep -v constants.php >> $JUNK
Step 4: Do it




Include paths are a bit mysterious.
You’ll have to play around to get it right.
Step 5: Analyze it

•   /tmp/hphp/Stats.js contains some...
    statistics in JSON format.
•   /tmp/hphp/CodeError.js is where the
    good stuff is.
•    JSON format, includes:
    Error type, file, line number, code snippet


UseUndeclaredVariable

 • #1 bug.
 • Typically typos, scoping or cut-n-paste
   errors
 • Found frequently in error handling cases
if (!$ok) {
    error_log("$user_id has a problem"); // $user_id never set in this scope
}


TooManyArgument
TooFewArgument
Too Many Arguments typically indicates the
caller is confused and has logic errors (bug).
Too Few Arguments is frequently a serious
bug, as PHP 5 only warns and carries on with null.

hash_hmac('sha1', 'foo'); // oops, no key
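Both argument-count checks are plain bookkeeping: compare the number of arguments at a call site with the callee's signature. HpHp does this statically; this sketch does the same comparison at runtime with PHP's Reflection API instead, using a hypothetical `checkArgCount()` that reuses HpHp's error names only as labels.

```php
<?php
// Sketch: argument-count bookkeeping via Reflection (a runtime stand-in for
// what HpHp checks statically). Returns an HpHp-style label or null.

function checkArgCount(string $function, int $given): ?string
{
    $fn = new ReflectionFunction($function);
    if ($given < $fn->getNumberOfRequiredParameters()) {
        return 'TooFewArgument';
    }
    if (!$fn->isVariadic() && $given > $fn->getNumberOfParameters()) {
        return 'TooManyArgument';
    }
    return null;   // count is within the signature's bounds
}
```

Declared variadic functions are exempt from the too-many check. Older code that reads extra arguments through `func_get_args()` without declaring them will still be flagged, so such functions may need whitelisting.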



BadDefine
    UsesEvaluation
define($k, $v);
eval("1+1");


 • “Bad” since HpHp can’t compile it, but likely
   legal PHP.
 • Avoid using dynamic constant generation.
   Use a configuration file instead.
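One way to follow the configuration-file advice: keep the settings in a single array and read them through a function, instead of generating constants with `define($k, $v)`. The `config()` function and its keys are hypothetical; in practice the array would usually live in its own file that does `return [...];`.

```php
<?php
// Sketch: settings read through one function instead of dynamic define().

function config(string $key)
{
    static $settings = null;
    if ($settings === null) {
        $settings = [              // hypothetical keys and values
            'db_host'   => 'localhost',
            'cache_ttl' => 300,
        ];
    }
    if (!array_key_exists($key, $settings)) {
        throw new InvalidArgumentException("Unknown config key: $key");
    }
    return $settings[$key];
}
```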

UseUndeclaredGlobalVariable


  • HpHp only defines certain globals.
  • Used only by Smarty?
   •   $GLOBALS['HTTP_SERVER_VARS']
   •   $GLOBALS['HTTP_SESSION_VARS']
   •   $GLOBALS['HTTP_COOKIE_VARS']
  • These are the old long-form aliases of
    $_SERVER, $_SESSION and $_COOKIE, removed in PHP 5.4.




UseVoidReturn
A function sometimes returns “nothing”, but
the caller uses the value
function foo() {
   if (time() % 60 == 0) { return true; }
   // oops void
}

$now = foo(); // error




RequiredAfterOptionalParam
   function foo($first, $second=2, $third) { /* ... */ }

   • IMHO should be a PHP syntax error
   • Confusing: the default for $second can never
     be used, since every caller must pass $third
   • (Oops, I haven’t investigated behavior)


DeclaredConstantTwice


• Legal PHP (a second define() just raises a
  notice and is ignored), but HpHp analyzes
  all files at once.
• Best to have one file that defines constants
  or just not use them.



UnknownFunction
 UnknownObjectMethod
    UnknownClass
  UnknownBaseClass

• Is your file list complete?
• Do you need to make stubs?

BadPHPIncludeFile


 Likely a PHP file trying to include/require
 itself, an invalid file name, or an ambiguous
 autoloader.




PHPIncludeFileNotFound


 • Really common
 • Probably unique to your autoloader.
 • Not sure I quite understand how HpHp
   computes file names and loads includes,
   requires, require_once... yet



HpHp at Etsy
Every Commit
• Every commit gets checked in real-time
• “try-server” also allows developers to test
  before committing.
• Every day, it finds and prevents bugs
  before they go live.
• Almost no false positives (!!)
• Developers love it (especially the Java
  groups)
Analysis
•   CodeError.js is processed through a
    custom script.
• Has a large blacklist of checks or files we
    don’t care about (3rd party, known bad,
    etc).
• File and line info is passed to git blame
    to find the author and date/time.
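The post-processing step might look roughly like this. It is a sketch, not Etsy's script: it assumes the entries of CodeError.js have already been converted into `['type', 'file', 'line']` records (the JSON layout depends on your HpHp build, so that step is left out), drops blacklisted checks and path prefixes, and asks `git blame --porcelain` for the author and time of a single line. `filterErrors()` and `blameLine()` are hypothetical names.

```php
<?php
// Sketch of post-processing analyzer output (not Etsy's script).

/** Drops records whose type or file path is blacklisted. */
function filterErrors(array $errors, array $ignoredTypes, array $ignoredPrefixes): array
{
    return array_values(array_filter($errors, function (array $e) use ($ignoredTypes, $ignoredPrefixes) {
        if (in_array($e['type'], $ignoredTypes, true)) {
            return false;   // a check we don't care about
        }
        foreach ($ignoredPrefixes as $prefix) {
            if (strncmp($e['file'], $prefix, strlen($prefix)) === 0) {
                return false;   // e.g. third-party or known-bad code
            }
        }
        return true;
    }));
}

/** Asks git who last touched one line; returns [author, unix time] or null. */
function blameLine(string $file, int $line): ?array
{
    $cmd = sprintf('git blame --porcelain -L %d,%d -- %s 2>/dev/null', $line, $line, escapeshellarg($file));
    exec($cmd, $out, $code);
    if ($code !== 0) {
        return null;
    }
    $author = null;
    $time = null;
    foreach ($out as $row) {
        if (strncmp($row, 'author ', 7) === 0) {
            $author = substr($row, 7);
        } elseif (strncmp($row, 'author-time ', 12) === 0) {
            $time = (int) substr($row, 12);
        }
    }
    return $author === null ? null : [$author, $time];
}
```

`blameLine()` has to run inside the repository; it returns null whenever git fails.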

hphp-try runs in Jenkins

[Screenshot: a failed hphp-try build (“oops”); the Console Output gives details]




Work in Progress
• It took a lot of work to get the code base in
  shape so we could add pre-commit hook.
• Over 200 real problems first identified.
• We still blacklist some checks, since
  we are still cleaning up legacy code (and
  figuring out how HpHp works)


Can We Do Better?
Checks aren’t that
      complicated
• HpHp’s runtime type-inference isn’t used for
  static analysis (good since type-inference is
  hard)
• All checks are fairly simple book-keeping.
• All could be done in CodeSniffer/AST, but
  that would be too slow
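As an illustration of how simple these checks are, here is a sketch of DeclaredConstantTwice-style bookkeeping built directly on `token_get_all()`. The helpers are hypothetical and deliberately naive: only literal names passed to an unqualified `define(` are seen, and dynamic calls such as `define($k, $v)` are skipped because a static check cannot resolve them.

```php
<?php
// Sketch: find constants passed to define() more than once across files.

/** Literal constant names passed to define() in one file's source. */
function definedNames(string $code): array
{
    // Drop whitespace and comments so the pattern match sees only real tokens.
    $tokens = array_values(array_filter(token_get_all($code), function ($t) {
        return !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
    }));
    $names = [];
    for ($i = 0; $i + 2 < count($tokens); $i++) {
        $prev = $i > 0 ? $tokens[$i - 1] : null;
        // Skip $obj->define(...), Foo::define(...) and "function define(".
        $isMemberOrDecl = is_array($prev) && in_array($prev[0], [T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION], true);
        if (is_array($tokens[$i]) && $tokens[$i][0] === T_STRING
            && strtolower($tokens[$i][1]) === 'define' && !$isMemberOrDecl
            && $tokens[$i + 1] === '('
            && is_array($tokens[$i + 2]) && $tokens[$i + 2][0] === T_CONSTANT_ENCAPSED_STRING) {
            $names[] = substr($tokens[$i + 2][1], 1, -1);   // strip the surrounding quotes
        }
    }
    return $names;
}

/** Maps each constant defined more than once to the files that define it. */
function duplicateConstants(array $sources): array
{
    $seen = [];
    foreach ($sources as $file => $code) {
        foreach (definedNames($code) as $name) {
            $seen[$name][] = $file;
        }
    }
    return array_filter($seen, function ($files) {
        return count($files) > 1;
    });
}
```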


Slice off HpHp?

• The HpHp Runtime is nice, but really
  complicated and a moving target.
• Can we slice out the analysis part of HpHp?
• Much simpler to build, easier to hack on.

Or Build New?

• Can this run off “byte code” or hook into
  the parsing step of PHP?
• Exec a snippet of PHP to load the script
  files?
• Seems feasible

Acknowledgments
 and References
Thanks
• The Facebook Team!
• Sebastian Bergmann who first blogged about
  using HpHp for static analysis
• Rasmus who first hacked up a version of
  HpHp in house at Etsy
• The QA and DevTools teams at Etsy
• All the Etsy developers who had some
  painful weeks getting the code in shape!
Facebook References
• https://github.com/facebook/hiphop-php
  Main source repo + wiki
• http://developers.facebook.com/blog/post/2010/02/02/hiphop-for-php--move-fast/
  Main announcement, 2010-02-02
• https://www.facebook.com/note.php?note_id=416880943919
  Update 2012-08-13

Notes from
    Sebastian Bergmann
• http://sebastian-bergmann.de/archives/894-Using-HipHop-for-Static-Analysis.html
  Static Analysis Intro, 2010-07-27
• http://sebastian-bergmann.de/archives/918-Static-Analysis-with-HipHop-for-PHP.html
  Tool to help process output, 2012-01-27


Misc References
• http://arstechnica.com/business/2011/12/facebook-looks-to-fix-php-performance-with-hiphop-virtual-machine/
  ArsTechnica overview, 2011-12-13
• http://www.serversidemagazine.com/news/10-questions-with-facebook-research-engineer-andrei-alexandrescu/
  Lots of good stuff in here, 2012-01-29

This Talk
• These slides are posted at
  http://slidesha.re/KzTfLy
• Tools for building on CentOS
  https://github.com/client9/hphp-tools
• More about Nick Galbreath
  http://client9.com/



Nick Galbreath nickg@etsy.com @ngalbreath
      PHPDay Verona Italy May 19, 2012
            http://2012.phpday.it/

Weitere ähnliche Inhalte

Andere mochten auch

Static code analysis
Static code analysisStatic code analysis
Static code analysis
Rune Sundling
 
libinjection: from SQLi to XSS  by Nick Galbreath
libinjection: from SQLi to XSS  by Nick Galbreathlibinjection: from SQLi to XSS  by Nick Galbreath
libinjection: from SQLi to XSS  by Nick Galbreath
CODE BLUE
 

Andere mochten auch (6)

Static code analysis
Static code analysisStatic code analysis
Static code analysis
 
Static code analysis
Static code analysisStatic code analysis
Static code analysis
 
libinjection: from SQLi to XSS  by Nick Galbreath
libinjection: from SQLi to XSS  by Nick Galbreathlibinjection: from SQLi to XSS  by Nick Galbreath
libinjection: from SQLi to XSS  by Nick Galbreath
 
How To Detect Xss
How To Detect XssHow To Detect Xss
How To Detect Xss
 
Content security policy
Content security policyContent security policy
Content security policy
 
The promise of asynchronous PHP
The promise of asynchronous PHPThe promise of asynchronous PHP
The promise of asynchronous PHP
 

Mehr von Nick Galbreath

Making operations visible - devopsdays tokyo 2013
Making operations visible  - devopsdays tokyo 2013Making operations visible  - devopsdays tokyo 2013
Making operations visible - devopsdays tokyo 2013
Nick Galbreath
 
Faster Secure Software Development with Continuous Deployment - PH Days 2013
Faster Secure Software Development with Continuous Deployment - PH Days 2013Faster Secure Software Development with Continuous Deployment - PH Days 2013
Faster Secure Software Development with Continuous Deployment - PH Days 2013
Nick Galbreath
 
Data Driven Security, from Gartner Security Summit 2012
Data Driven Security, from Gartner Security Summit 2012Data Driven Security, from Gartner Security Summit 2012
Data Driven Security, from Gartner Security Summit 2012
Nick Galbreath
 
Fraud Engineering, from Merchant Risk Council Annual Meeting 2012
Fraud Engineering, from Merchant Risk Council Annual Meeting 2012Fraud Engineering, from Merchant Risk Council Annual Meeting 2012
Fraud Engineering, from Merchant Risk Council Annual Meeting 2012
Nick Galbreath
 

Mehr von Nick Galbreath (15)

Making operations visible - devopsdays tokyo 2013
Making operations visible  - devopsdays tokyo 2013Making operations visible  - devopsdays tokyo 2013
Making operations visible - devopsdays tokyo 2013
 
Faster Secure Software Development with Continuous Deployment - PH Days 2013
Faster Secure Software Development with Continuous Deployment - PH Days 2013Faster Secure Software Development with Continuous Deployment - PH Days 2013
Faster Secure Software Development with Continuous Deployment - PH Days 2013
 
Fixing security by fixing software development
Fixing security by fixing software developmentFixing security by fixing software development
Fixing security by fixing software development
 
DevOpsDays Austin 2013 Reading List
DevOpsDays Austin 2013 Reading ListDevOpsDays Austin 2013 Reading List
DevOpsDays Austin 2013 Reading List
 
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
 
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
 
Rebooting Software Development - OWASP AppSecUSA
Rebooting Software Development - OWASP AppSecUSA Rebooting Software Development - OWASP AppSecUSA
Rebooting Software Development - OWASP AppSecUSA
 
libinjection and sqli obfuscation, presented at OWASP NYC
libinjection and sqli obfuscation, presented at OWASP NYClibinjection and sqli obfuscation, presented at OWASP NYC
libinjection and sqli obfuscation, presented at OWASP NYC
 
Continuous Deployment - The New #1 Security Feature, from BSildesLA 2012
Continuous Deployment - The New #1 Security Feature, from BSildesLA 2012Continuous Deployment - The New #1 Security Feature, from BSildesLA 2012
Continuous Deployment - The New #1 Security Feature, from BSildesLA 2012
 
New techniques in sql obfuscation, from DEFCON 20
New techniques in sql obfuscation, from DEFCON 20New techniques in sql obfuscation, from DEFCON 20
New techniques in sql obfuscation, from DEFCON 20
 
Data Driven Security, from Gartner Security Summit 2012
Data Driven Security, from Gartner Security Summit 2012Data Driven Security, from Gartner Security Summit 2012
Data Driven Security, from Gartner Security Summit 2012
 
Slide show font sampler, black on white
Slide show font sampler, black on whiteSlide show font sampler, black on white
Slide show font sampler, black on white
 
Fraud Engineering, from Merchant Risk Council Annual Meeting 2012
Fraud Engineering, from Merchant Risk Council Annual Meeting 2012Fraud Engineering, from Merchant Risk Council Annual Meeting 2012
Fraud Engineering, from Merchant Risk Council Annual Meeting 2012
 
Rate Limiting at Scale, from SANS AppSec Las Vegas 2012
Rate Limiting at Scale, from SANS AppSec Las Vegas 2012Rate Limiting at Scale, from SANS AppSec Las Vegas 2012
Rate Limiting at Scale, from SANS AppSec Las Vegas 2012
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
 

Kürzlich hochgeladen

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Static Analysis for PHP, from PHPDay Italy 2012

  • 1. Static Analysis for PHP PHPDay Verona, Italy 2012 Nick Galbreath @ngalbreath nickg@etsy.com
  • 2. http://slidesha.re/ KzTfLy github.com/client9/hphp-tools Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 3. Static Analysis • Typically analyzes source code “at rest” for bugs, security problems, leaks, threading problems. • We’ll cover simple checks and HpHp • Some commercial tools exists too. Veracode runs off of PHP byte code http://www.veracode.com/products/static Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 4. Dynamic Analysis Analysis of code while running • valgrind http://valgrind.org/ • xdebug http://xdebug.org/ • xhprof http://pecl.php.net/package/xhprof Great tools, but not for this talk. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 6. The Littlest Static Analysis php -l • Syntax errors should never be committed. • Syntax errors should never go to prod! • Make sure dev and prod versions of PHP are identical Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 7. PHP Leading Whitespace pre-commit check that every file starts with either #! or <?php exactly Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 8. PHP Trailing Whitespace Check that file ends exactly with ?> or make sure it doesn’t have a closing tag. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 9. Anti-Virus On Source Code • It’s static analysis too! • Not so concerned with PHP but do you have Javascript, Flash, Word, PowerPoint, PDFs, ZIPs in your source tree? Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 10. ClamAV • http://www.clamav.net/ • Free anti-virus. • Available on every OS.
  • 11. ClamAV Performance 1G of Source Code / Minute Why not do it?
  • 13. Why not use... AST? http://docs.php.net/manual/en/function.token-get-all.php • token_get_all($file) takes a file and returns an Abstract Syntax Tree in php. • Orders of magnitude slower -- can’t use for pre-commit check on large code bases • Too low level -- need to turn it into an intermediate representation. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 14. Why not use... CodeSniffer? http://pear.php.net/package/PHP_CodeSniffer • Excellent tool, but... • Based on token_get_all • SAX-style API • Too slow for pre/post commit hooks Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 15. Why not use... • php-SAT:Orphaned 2009 • php-AST: Orphaned 2008 • phc: active but doesn’t support... OBJECTS • Every other PHP to Java translator or converter is orphaned or has other problems Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 16. Facebook’s HpHp • A full re-implementation of Apache+PHP • Compiles PHP to it’s own byte code format and executes in own runtime. • May also translate to C++ for other compilation or use JIT • Does type-inference for speed-ups • Also includes a HTTP web server Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 17. Bad News #1 No action since 2011-12-06 Facebook appears to use “code drops” instead of true “streaming” open source model. BOO. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 18. Bad News #2 Missing Many Common Modules • Has: apc, array, bcmath, bzip2, ctype, curl,iconv, gd, imap, ipc, json, ldap,math,mb, mcrypt,memcache, mysql, network, openssl, pdo, posix, preg, process, session, simplexml, soap, socket, slqite3, stream, string, thread, thrift, url, xml*, zlib. • That’s it! (No filter_var, no ftp, no ..) Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 19. Bad News #3 Doesn’t Track PHP 5.3 PHP 5.4? No way! • Some functions signatures aren’t quite right. e.g. debug_print_backtrace • HpHp 2 arguments • PHP 5.3.6 3 arguments • PHP 5.4 4 arguments • (End up needing to whitelist this to ignore false positives) Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 20. Bad News #4 Seriously #*$%&!# annoying to build • My crappy CentOS build script https://github.com/client9/hphp-tools • Ubuntu users are slightly better off (see HpHp wiki) • Takes hours. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 21. Bad News #5 Won’t help with Dynamic Evaluation $fn = “foo”; $fn(1,2,3); // function not found eval(“foo(1,2,3)”); // no • This is more for runtime dynamic analysis. • Try to avoid this anyways. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 22. Conclusion • You aren’t going to run your application under HpHp (at least not as is) • But, it has a great static analyzer that works and finds real bugs really fast. • Scans thousands of files in a few seconds Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 24. Step 1: Make a constants file • HpHp doesn’t know about hardwired constants • Nifty script generates the constants • May need to hand edit Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 25. Step 2: Make a stubs file • HpHp doesn’t have many binary extensions • But... the analyzer doesn’t care. Just make a stub function. // http://php.net/manual/en/function.filter-var.php function filter_var($var, $filter=0, $options=NULL) { return $var; } You can make stub classes as well. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 26. Step 3: Create the file list • Create a list of all php files to be analyzed and include your constants and stubs file. • Ignore phpunit and other tests • HpHp implements much of PHP base functionality as PHP code. (e.g. the Exception class is written in PHP). You need to add these system PHP as well Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 27. correction: grep -v helper.idl.php | grep -v constants.php >> $JUNK Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 28. Step 4: Do it Include paths are a bit mysterious. You’ll have to play around to get it right. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 29. Step 5: Analyze it • /tmp/hphp/Stats.js contains some... statistics in JSON format. • /tmp/hphp/CodeError.js is were the good stuff is. • JSON format, includes: Error type, file, line number, code snippet Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 30. UseUndeclaredVariable • #1 bug. • Typically typos, scoping or cut-n-paste errors • Found frequently in error handling cases if (!$ok) { error_log(“$user_id has a problem”); Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 31. TooManyArgument TooFewArgument Too Many Arguments typically indicates the caller is confused and has logic errors (bug). Too Few Arguments is frequently a serious bug as PHP silently fails and defaults to null. hash_hmac(‘sha1’, ‘foo’); // ooops no key Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 32. BadDefine UsesEvaluation define($k, $v); eval(“1+1”); • “Bad” since HpHp can’t compile it, but likely legal PHP. • Avoid using dynamic constant generation. Use configuration file instead. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 33. UseUndeclaredGlobalVariable • HpHp only defines certain globals. • Used only by Smarty? • $GLOBALS['HTTP_SERVER_VARS'] • $GLOBALS['HTTP_SESSION_VARS'] • $GLOBALS['HTTP_COOKIE_VARS'] Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 34. UseVoidReturn Some function returns “nothing” but the value is used function foo() { if (time() % 60 == 0) { return true; } // oops void } $now = foo(); // error Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 35. RequiredAfterOptionalParam function foo($first, $second=2, $third) { • IMHO should be a PHP syntax error • Confusing • (Oops, I haven’t investigated behavior) Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 36. DeclaredConstantTwice • Probably not invalid PHP, but HpHp analyzes all files at once. • Best to have one file that defines constants or just not use them. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 37. UnknownFunction UnknownObjectMethod UnknownClass UnknownBaseClass • Is your file list complete? • Do you need to make stubs? Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 38. BadPHPIncludeFile Likely a PHP file trying to include/require itself or invalid file name or your autoloader is ambiguous. Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 39. PHPIncludeFileNotFound • Really common • Probably unique to your autoloader. • Not sure I quite understand how HpHp computes file names and loads includes, requires, require_once... yet Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 41. Every Commit • Every commit gets checked in real-time • “try-server” also allows developers to test before committing. • Finds and prevents bugs before they go live every day. • Almost no false positives (!!) • Developers love it (especially the Java groups) Nick Galbreath @ngalbreath PHPDay Verona, Italy 2012
  • 42. Analysis
    • CodeError.js is processed through a custom script.
    • The script has a large blacklist of checks and files we don’t care about (3rd party, known bad, etc.).
    • File and line info are passed through to git blame to find the author and date/time.
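The custom script itself isn’t shown, so this is only a sketch of the two steps the slide describes, under stated assumptions. Each finding is assumed to have already been extracted from CodeError.js into a `check`/`file`/`line` record (the file’s exact layout isn’t covered here), and the blacklisted check names and path prefixes are hypothetical. `git blame --porcelain -L <n>,<n>` restricts blame to one line and prints `author` and `author-time` fields for it.

```php
<?php
// Hypothetical post-processing of HpHp findings. Assumes each finding has
// already been extracted from CodeError.js as ['check' => ..., 'file' => ..., 'line' => ...].
function filter_findings(array $findings, array $blacklistedChecks, array $blacklistedPrefixes): array
{
    $kept = [];
    foreach ($findings as $f) {
        if (in_array($f['check'], $blacklistedChecks, true)) {
            continue; // a check we don't care about
        }
        foreach ($blacklistedPrefixes as $prefix) {
            if (strpos($f['file'], $prefix) === 0) {
                continue 2; // 3rd-party or known-bad file
            }
        }
        $kept[] = $f;
    }
    return $kept;
}

// Build a `git blame` command limited to the reported line; its --porcelain
// output includes "author" and "author-time" lines for that line.
function blame_command(array $finding): string
{
    $line = (int) $finding['line'];
    return sprintf('git blame --porcelain -L %d,%d -- %s', $line, $line, escapeshellarg($finding['file']));
}
```

Running each command (for example with `shell_exec()`) and reading its `author` and `author-time` lines gives the who and when for every remaining finding.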
  • 43. hphp-try runs in Jenkins
    • When a run fails (“oops”), its Console Output gives the details.
  • 44. Work in Progress
    • It took a lot of work to get the code base in shape so we could add a pre-commit hook.
    • Over 200 real problems were identified at first.
    • We still blacklist some checks, since we are still cleaning up legacy code (and figuring out how HpHp works).
  • 45. Can We Do Better?
  • 46. Checks aren’t that complicated
    • HpHp’s runtime type inference isn’t used for static analysis (good, since type inference is hard).
    • All checks are fairly simple bookkeeping.
    • All could be done in CodeSniffer/AST, but that would be too slow.
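To illustrate what “simple bookkeeping” means, here is a sketch of a DeclaredConstantTwice-style check built on `token_get_all()`. It is not HpHp’s implementation. It records every `define('NAME', ...)` whose name is a string literal and reports names declared more than once across the given files. Dynamic names such as `define($k, $v)` are skipped. The helper `skip_whitespace()` is a name introduced for this sketch.

```php
<?php
// Record every define('NAME', ...) with a literal name and report names
// declared more than once across all files. Returns NAME => ["file:line", ...].
function find_duplicate_defines(array $files): array
{
    $seen = [];
    foreach ($files as $file) {
        $tokens = token_get_all(file_get_contents($file));
        $n = count($tokens);
        for ($i = 0; $i < $n; $i++) {
            $t = $tokens[$i];
            if (!is_array($t) || $t[0] !== T_STRING || strcasecmp($t[1], 'define') !== 0) {
                continue;
            }
            // Skip method calls and declarations such as $o->define( or function define(.
            $p = skip_whitespace($tokens, $i - 1, -1);
            if ($p >= 0 && is_array($tokens[$p])
                && in_array($tokens[$p][0], [T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION], true)) {
                continue;
            }
            $j = skip_whitespace($tokens, $i + 1, 1);
            if ($j >= $n || $tokens[$j] !== '(') {
                continue;
            }
            $j = skip_whitespace($tokens, $j + 1, 1);
            if ($j >= $n || !is_array($tokens[$j]) || $tokens[$j][0] !== T_CONSTANT_ENCAPSED_STRING) {
                continue; // dynamic name such as define($k, $v): not tracked here
            }
            $name = substr($tokens[$j][1], 1, -1);
            $seen[$name][] = $file . ':' . $t[2];
        }
    }
    return array_filter($seen, function (array $locations) {
        return count($locations) > 1;
    });
}

// Move from index $i in direction $step (+1 or -1) past whitespace and comments.
function skip_whitespace(array $tokens, int $i, int $step): int
{
    $n = count($tokens);
    while ($i >= 0 && $i < $n && is_array($tokens[$i])
        && in_array($tokens[$i][0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
        $i += $step;
    }
    return $i;
}
```

The whole check is one pass over the tokens plus a map from names to locations. That is the kind of bookkeeping the slide describes, and it is also why a pure-PHP tokenizer approach gets slow across a large code base.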
  • 47. Slice off HpHp?
    • The HpHp runtime is nice, but really complicated and a moving target.
    • Can we slice out the analysis part of HpHp?
    • Much simpler to build, easier to hack on.
  • 48. Or Build New?
    • Can this run off “byte code” or hook into the parsing step of PHP?
    • Exec a snippet of PHP to load the script files?
    • Seems feasible.
  • 50. Thanks
    • The Facebook Team!
    • Sebastian Bergmann, who first blogged about using HpHp for static analysis
    • Rasmus, who first hacked up a version of HpHp in-house at Etsy
    • The QA and DevTools teams at Etsy
    • All the Etsy developers who had some painful weeks getting the code in shape!
  • 51. Facebook References
    • https://github.com/facebook/hiphop-php
      Main source repo + wiki
    • http://developers.facebook.com/blog/post/2010/02/02/hiphop-for-php--move-fast/
      Main announcement, 2010-02-02
    • https://www.facebook.com/note.php?note_id=416880943919
      Update 2012-08-13
  • 52. Notes from Sebastian Bergmann
    • http://sebastian-bergmann.de/archives/894-Using-HipHop-for-Static-Analysis.html
      Static analysis intro, 2010-07-27
    • http://sebastian-bergmann.de/archives/918-Static-Analysis-with-HipHop-for-PHP.html
      Tool to help process output, 2012-01-27
  • 53. Misc References
    • http://arstechnica.com/business/2011/12/facebook-looks-to-fix-php-performance-with-hiphop-virtual-machine/
      Ars Technica overview, 2011-12-13
    • http://www.serversidemagazine.com/news/10-questions-with-facebook-research-engineer-andrei-alexandrescu/
      Lots of good stuff in here, 2012-01-29
  • 54. This Talk
    • These slides are posted at http://slidesha.re/KzTfLy
    • Tools for building on CentOS: https://github.com/client9/hphp-tools
    • More about Nick Galbreath: http://client9.com/
  • 55. Nick Galbreath
    nickg@etsy.com
    @ngalbreath
    PHPDay Verona, Italy, May 19, 2012
    http://2012.phpday.it/
