21. cwe.mitre.org
250 internally mined common entries
+ 200 entries from other sources
ECG
• Template for issue description
• Catalog of 400 entries
applicable to PHP and
Magento code
Describing Issues
28. Trends
• Most popular issues
• Issues breakdown by location, impact, time of
introduction
• Overall code quality
• Better understanding of the nature of the issues
41. Issues outside PHP code
XML files (configuration & layout updates)
DB schema (indexes, non-optimal field types)
Wrong file placement & naming
JavaScript, CSS & HTML issues
42. Working on compound sniffers
1. Many different approaches
which should be used together
2. Calculations redundancy
Each sniffer tokenizes the same code again and again
A typical Magento application has over 8,000 files consisting of code,
templates, JavaScript and CSS
Difficulties
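The redundancy described above can be reduced by tokenizing each file once and handing the same token stream to every sniffer. The following is a minimal sketch of that idea, assuming PHP 7.4+ and the built-in `token_get_all()`; the `TokenCache` class and the two toy sniffer functions are hypothetical names used only for illustration and are not part of PHP_CodeSniffer or of the tool described in these slides.

```php
<?php
// Hypothetical helper: tokenizes each source once and shares the result
// between sniffers, instead of every sniffer running the tokenizer itself.
final class TokenCache
{
    /** @var array<string, array> tokens keyed by a hash of the source */
    private array $cache = [];

    /** @var int number of real tokenizer runs, kept only for illustration */
    public int $tokenizerRuns = 0;

    public function tokensFor(string $source): array
    {
        $key = md5($source);
        if (!isset($this->cache[$key])) {
            $this->cache[$key] = token_get_all($source);
            $this->tokenizerRuns++;
        }
        return $this->cache[$key];
    }
}

// Toy "sniffer" 1: counts loop keywords.
function countLoops(array $tokens): int
{
    $loops = [T_WHILE, T_FOR, T_FOREACH, T_DO];
    $n = 0;
    foreach ($tokens as $t) {
        if (is_array($t) && in_array($t[0], $loops, true)) {
            $n++;
        }
    }
    return $n;
}

// Toy "sniffer" 2: counts identifiers named "load".
function countLoadCalls(array $tokens): int
{
    $n = 0;
    foreach ($tokens as $t) {
        if (is_array($t) && $t[0] === T_STRING && $t[1] === 'load') {
            $n++;
        }
    }
    return $n;
}

$source = '<?php foreach ($ids as $id) { $model->load($id); }';
$cache  = new TokenCache();
$loops  = countLoops($cache->tokensFor($source));
$loads  = countLoadCalls($cache->tokensFor($source));
```

Both sniffers see the same tokens, but the tokenizer runs only once for the file. With thousands of files and dozens of sniffers, that difference is the motivation for a shared representation.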
48. Software graph
1. File system as part of graph
2. PHP Reflection as part of graph
(TokenReflection)
3. PHP lexical tree inside
methods & functions as part of graph
(PHP_Parser)
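To make the three layers concrete, here is a minimal sketch of one graph that holds file system, reflection and statement-level nodes together. It assumes PHP 7.4+; the `GraphNode` class, the relation names (`declares`, `hasMethod`, `contains`) and the `Acme_*` names are hypothetical, since the slides do not show the real schema.

```php
<?php
// Hypothetical node type; the real tool's schema is not shown in the slides.
final class GraphNode
{
    public string $type;   // e.g. "File", "Class", "Method", "MethodCall"
    public string $name;

    /** @var array<string, GraphNode[]> outgoing edges grouped by relation type */
    public array $edges = [];

    public function __construct(string $type, string $name)
    {
        $this->type = $type;
        $this->name = $name;
    }

    /** Adds a typed edge and returns the target so calls can be chained. */
    public function link(string $relation, GraphNode $target): GraphNode
    {
        $this->edges[$relation][] = $target;
        return $target;
    }

    /** Depth-first search for all descendants of the given type (assumes no cycles). */
    public function find(string $type): array
    {
        $found = [];
        foreach ($this->edges as $targets) {
            foreach ($targets as $target) {
                if ($target->type === $type) {
                    $found[] = $target;
                }
                $found = array_merge($found, $target->find($type));
            }
        }
        return $found;
    }
}

// Layer 1: file system. Layer 2: reflection. Layer 3: statements inside a method.
$file   = new GraphNode('File', 'Acme/Catalog/Model/Product.php');
$class  = $file->link('declares', new GraphNode('Class', 'Acme_Catalog_Model_Product'));
$method = $class->link('hasMethod', new GraphNode('Method', 'refresh'));
$loop   = $method->link('contains', new GraphNode('LoopStatement', 'foreach'));
$loop->link('contains', new GraphNode('MethodCall', 'load'));

$calls = $file->find('MethodCall');
```

Because all layers live in one structure, a single search can start at a file and end at a method call deep inside a loop.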
51. Software Graph’s API
• Visitor
• Direct querying
search methods, fluent interface, state monad
• Query language
just syntactic sugar
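The "direct querying" style with a fluent interface might look like the sketch below. It assumes PHP 7.4+ and a flat list of nodes represented as arrays; the `NodeQuery` class and its method names are hypothetical and only illustrate the chaining idea, not the tool's actual API.

```php
<?php
// Hypothetical fluent query object over a flat list of nodes.
final class NodeQuery
{
    /** @var array[] each node is ['type' => string, 'name' => string, 'line' => int] */
    private array $nodes;

    public function __construct(array $nodes)
    {
        $this->nodes = $nodes;
    }

    public function ofType(string $type): self
    {
        return $this->filter(fn(array $n): bool => $n['type'] === $type);
    }

    public function named(string $name): self
    {
        return $this->filter(fn(array $n): bool => $n['name'] === $name);
    }

    public function filter(callable $predicate): self
    {
        // Return a new query so intermediate results stay reusable.
        return new self(array_values(array_filter($this->nodes, $predicate)));
    }

    /** Line numbers of the nodes that are still selected. */
    public function lines(): array
    {
        return array_column($this->nodes, 'line');
    }
}

$nodes = [
    ['type' => 'MethodCall', 'name' => 'load', 'line' => 12],
    ['type' => 'MethodCall', 'name' => 'save', 'line' => 14],
    ['type' => 'OutputStatement', 'name' => 'echo', 'line' => 20],
];

$loadLines = (new NodeQuery($nodes))
    ->ofType('MethodCall')
    ->named('load')
    ->lines();
```

A query language on top of such an API only needs to translate each expression into the equivalent chain of calls, which is why the slide calls it syntactic sugar.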
52. Software graph: additional benefits
1. Query caching, lazy loading
2. Intelligent node search,
traversal algorithms based on relation types
3. Easy way to get path (issue location)
File, class name, method name, line numbers
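Getting the issue location from the graph can be as simple as walking from the offending node up to the root. The sketch below assumes PHP 7.4+ and that every node keeps a reference to its parent; the `PathNode` class and the sample names are hypothetical.

```php
<?php
// Hypothetical node shape: each node knows its parent, so an issue location
// can be rebuilt by walking up to the root of the graph.
final class PathNode
{
    public string $type;
    public string $label;
    public ?PathNode $parent;

    public function __construct(string $type, string $label, ?PathNode $parent = null)
    {
        $this->type   = $type;
        $this->label  = $label;
        $this->parent = $parent;
    }

    /** Returns the path from the root down to this node. */
    public function path(): array
    {
        $parts = [];
        for ($node = $this; $node !== null; $node = $node->parent) {
            array_unshift($parts, $node->type . ': ' . $node->label);
        }
        return $parts;
    }
}

$file   = new PathNode('File', 'Model/Product.php');
$class  = new PathNode('Class', 'Acme_Catalog_Model_Product', $file);
$method = new PathNode('Method', 'refresh', $class);
$issue  = new PathNode('Lines', '42-44', $method);

$location = implode(' > ', $issue->path());
```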
53. Query Language Implementation
Parser:
Built with Loco, a parser combinator library for PHP
Interpreter:
State monad wrapper for the graph traversal API
+
1. Simple boolean operators
2. Tunneling to native PHP functions
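The interpreter side can be pictured with a very small state monad, where the state is the set of nodes currently selected and each step narrows it. This is a sketch under assumptions, not the tool's implementation: it assumes PHP 7.4+, and the `State` class and the `narrow()` helper are hypothetical names. The Loco-based parser is deliberately left out.

```php
<?php
// A minimal state monad sketch. A State wraps a function
// state -> [value, newState]; here the state is the current node set.
final class State
{
    /** @var callable state -> [value, newState] */
    private $step;

    public function __construct(callable $step)
    {
        $this->step = $step;
    }

    /** Wraps a plain value without touching the state. */
    public static function of($value): self
    {
        return new self(fn($state) => [$value, $state]);
    }

    /** Runs this step, then feeds its value into the next step. */
    public function bind(callable $next): self
    {
        return new self(function ($state) use ($next) {
            [$value, $newState] = $this->run($state);
            return $next($value)->run($newState);
        });
    }

    public function run($state): array
    {
        return ($this->step)($state);
    }
}

// Hypothetical traversal step: narrow the node set, yield how many nodes remain.
function narrow(callable $predicate): State
{
    return new State(function (array $nodes) use ($predicate) {
        $kept = array_values(array_filter($nodes, $predicate));
        return [count($kept), $kept];
    });
}

$nodes = [
    ['type' => 'MethodCall', 'name' => 'load'],
    ['type' => 'MethodCall', 'name' => 'save'],
    ['type' => 'OutputStatement', 'name' => 'echo'],
];

// Roughly: MethodCall[name = "load"]
$query = narrow(fn(array $n): bool => $n['type'] === 'MethodCall')
    ->bind(fn(int $count): State => narrow(fn(array $n): bool => $n['name'] === 'load'));

[$matches, $remaining] = $query->run($nodes);
```

Since predicates are ordinary PHP callables, a query step can also delegate to a native function, which is one plausible reading of "tunneling".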
55. Example 1
Find model load in loops
LoopStatement.body MethodCall[name = "load"]
class Ecg_Sniffs_Performance_LoopModelLoadSniff implements PHP_CodeSniffer_Sniff
{
    // Listen for every loop construct.
    public function register()
    {
        return array(T_WHILE, T_FOR, T_FOREACH, T_DO);
    }
    public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
    {
        $tokens = $phpcsFile->getTokens();
        // Loops without a braced body carry no scope markers; skip them.
        if (!isset($tokens[$stackPtr]['scope_opener'], $tokens[$stackPtr]['scope_closer'])) {
            return;
        }
        $opener = $tokens[$stackPtr]['scope_opener'];
        $closer = $tokens[$stackPtr]['scope_closer'];
        // Flag every "load" identifier inside the loop body.
        for ($ptr = $opener + 1; $ptr < $closer; $ptr++) {
            $content = $tokens[$ptr]['content'];
            if ($tokens[$ptr]['code'] === T_STRING && $content == 'load') {
                $phpcsFile->addError('Model load in loop detected', $ptr,
                    'ModelLoad', array($content));
            }
        }
    }
}
//*[
name()="node:Stmt_Foreach" or
name()="node:Stmt_Do" or
name()="node:Stmt_For" or
name()="node:Stmt_While"
]//node:Expr_MethodCall/subNode:name[
scalar:string = "load"
]
56. Example 2
Find all methods whose docBlock @return annotation
is inconsistent with the value actually returned
Method [
    DocBlock.returnAnnotation.types as $types,
    Statement [
        name="return",
        !(expression.returnedType in $types)
    ]
]
57. Example 3
Find direct output in models
(MageModel or MageResourceModel) OutputStatement
58. Rule Examples
1. A DB query outside a resource model or install/upgrade script is perhaps an issue
2. A DB query inside a block or controller is definitely an issue
Next concept: confidence
Perhaps? Definitely?
Two types of confidence
1. Confidence based on accuracy of sniffs
Every rule has exceptions
2. Confidence based on accuracy of observations
The technologies used are not ideal
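One simple way to combine the two kinds of confidence is to treat them as independent and multiply them. The slides do not say how the tool combines them, so the formula, the function name `issueConfidence()` and the sample values below are all assumptions made for illustration (PHP 7.4+).

```php
<?php
// Hypothetical scoring: treat rule confidence and observation confidence
// as independent values in [0, 1] and multiply them.
function issueConfidence(float $ruleConfidence, float $observationConfidence): float
{
    foreach ([$ruleConfidence, $observationConfidence] as $c) {
        if ($c < 0.0 || $c > 1.0) {
            throw new InvalidArgumentException('Confidence must be within [0, 1]');
        }
    }
    return $ruleConfidence * $observationConfidence;
}

// "Definitely": a DB query inside a block, rule confidence 1.0 (assumed value).
$definitely = issueConfidence(1.0, 0.5);

// "Perhaps": a DB query outside a resource model, rule confidence 0.5 (assumed value).
$perhaps = issueConfidence(0.5, 0.5);
```

With the same detector accuracy, the "perhaps" rule ends up with a lower overall score than the "definitely" rule, which lets a report sort or filter issues by how much they can be trusted.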
59. Code Bases
1. Target codebase
Concrete module, local code pool
2. Auxiliary codebase
PEAR libs, whole Magento application
Example:
The analyzed class is in the target codebase and
its parent class is in the auxiliary codebase. We
search for copy-pasted code in overridden methods
that do not call the parent's method.
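The first half of that example, finding overridden methods that never call the parent's method, can be sketched with the built-in tokenizer. The sketch assumes PHP 7.4+, one class per source string, and that the parent's method names are already known from the auxiliary codebase (here passed in as a plain array). The function name `overridesWithoutParentCall()` and the `Acme_*` class are hypothetical, and comparing bodies for copy-pasted code is not covered.

```php
<?php
/**
 * Returns the names of methods in $source that override one of $parentMethods
 * without calling parent::<method>(). Hypothetical helper for illustration.
 */
function overridesWithoutParentCall(string $source, array $parentMethods): array
{
    $tokens = token_get_all($source);
    $count  = count($tokens);
    $result = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The method name is the first T_STRING after the "function" keyword.
        $name = null;
        for ($j = $i + 1; $j < $count && $tokens[$j] !== '{' && $tokens[$j] !== ';'; $j++) {
            if ($name === null && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $name = $tokens[$j][1];
            }
        }
        if ($name === null || $j >= $count || $tokens[$j] !== '{') {
            continue; // closure, abstract method, or truncated source
        }
        // Collect the body text (without comments) up to the matching brace.
        $depth = 0;
        $body  = '';
        for (; $j < $count; $j++) {
            $t = $tokens[$j];
            if (is_array($t) && in_array($t[0], [T_COMMENT, T_DOC_COMMENT], true)) {
                continue;
            }
            $body .= is_array($t) ? $t[1] : $t;
            $opensInString = is_array($t)
                && in_array($t[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true);
            if ($t === '{' || $opensInString) {
                $depth++;
            } elseif ($t === '}' && --$depth === 0) {
                break;
            }
        }
        $i = $j; // continue scanning after this method body

        $callsParent = preg_match('/parent\s*::\s*' . preg_quote($name, '/') . '\s*\(/i', $body) === 1;
        if (in_array($name, $parentMethods, true) && !$callsParent) {
            $result[] = $name;
        }
    }
    return $result;
}

// Target codebase: a local override of a core model (class name is made up).
$target = <<<'PHP'
<?php
class Acme_Catalog_Model_Product extends Mage_Catalog_Model_Product
{
    protected function _beforeSave()
    {
        $this->setUpdatedAt(time());
        return $this;
    }

    protected function _afterLoad()
    {
        parent::_afterLoad();
        return $this;
    }
}
PHP;

// Auxiliary codebase: method names assumed to exist on the parent class.
$suspects = overridesWithoutParentCall($target, ['_beforeSave', '_afterLoad']);
```

Only `_beforeSave` is reported, because `_afterLoad` delegates to the parent. The reported methods are the candidates whose bodies would then be compared against the parent's implementation.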