3. “One night only” Curriculum.
Focus on concepts that will allow you to build things.
Lexical analysis
Syntax analysis [PHP7]
Opcode
Extensions
And some interesting facts about PHP
4. Terminology
1. Zvals are the data structures holding PHP variable data. A variable points to a zval
2. A Reference !== Passing by reference. Reference === `Points`
3. Variables reference zvals
4. Heap is the memory where zvals live
5. PHP Execution
Interpreted, compiled, or both?
These are the typical steps of a multipass compiler
- Lexical analysis
- Syntax analysis
- Some other steps maybe….
- Opcode [Final compilation]
14. Opcode
VLD / Bytekit / Parsekit
I used VLD, easiest to get to work.
Vulcan Logic Disassembler
Bytekit doesn’t seem to be supported; Parsekit results are the same as VLD’s.
17. ZVALS
How PHP represents data and keeps count of references for instance.
Represented by a C struct whose value field is a union.
PHP5
18. Zvals
The important difference between PHP 5 and PHP 7: variables share the same array, regardless of whether they were assigned by value or by reference.
Only once some kind of modification is performed will the array be separated.
20. Simulation of the copy-on-write behavior
Time to run some scripts…..
[Script can be found in github]
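This is not the script from the talk, only a minimal sketch of the same idea, assuming PHP 7.1 or later. It uses memory_get_usage() to show that assigning an array to a second variable costs almost nothing, and that the real copy only happens on the first write. The variable names and the array size are my own choices.

```php
<?php
// Measure memory around an assignment and around the first write.
$before = memory_get_usage();

$a = range(1, 100000);          // build a large array at runtime
$afterCreate = memory_get_usage();

$b = $a;                        // no copy yet: both variables share the array
$afterAssign = memory_get_usage();

$b[0] = 0;                      // first write: the array is separated (copied)
$afterWrite = memory_get_usage();

printf("create: %d bytes\n", $afterCreate - $before);
printf("assign: %d bytes\n", $afterAssign - $afterCreate);
printf("write:  %d bytes\n", $afterWrite - $afterAssign);
```

The "assign" figure should be tiny compared with the "write" figure, which is the copy-on-write behavior in action.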
22. Extensions
I used the PHP-CPP skeleton framework. http://www.php-cpp.com/
You will need sudo apt-get install php5-dev
In Makefile
Assign your extension name to the NAME parameter, and then create an ini file with the same name.
Put this in the ini file
extension=quintillion.so
make && make install
Run php -i | grep quintillion
/etc/php5/cli/conf.d/quintillion.ini
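Besides grepping php -i, the same check can be done from PHP itself. A minimal sketch, assuming the extension is called quintillion as in the ini line above; it only uses the built-in extension_loaded(), get_extension_funcs() and ini_get() functions.

```php
<?php
// Name of the extension, as assigned to NAME in the Makefile.
$name = 'quintillion';

if (extension_loaded($name)) {
    // get_extension_funcs() returns false when the extension exports nothing.
    $functions = get_extension_funcs($name) ?: [];
    echo "$name is loaded; functions: ", implode(', ', $functions), "\n";
} else {
    // Point at the places worth checking when the extension is missing.
    echo "$name is not loaded; check the ini file and ", ini_get('extension_dir'), "\n";
}
```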
24. Live long and prosper
Internals is a very interesting topic.
Knowledge of internals helps with memory use, speed, data structures, and native implementations that interface with something.
nicoloubser@gmail.com
@Nico_Loubser
Editor's notes
My name is Nico and I am a PHP backend developer at...
Payfast - where every 2nd Friday they stop the free head massages, they lock the beer fridge, they take away our kittens, but they allow us to work on a project of our own liking (this is the best of the three).
So over the course of a few of those Fridays, and some evenings, I have studied PHP internals quite a bit and decided to share with you what I have learned. I am by no means an expert, and the questions I cannot answer now, I will answer in the Meetup group.
I had to decide what topics I would talk about. I figured the best topics would be the ones with more tangible integration possibilities, so I decided I will focus on concepts that you can use to build tools with. I may mix PHP 5 and 7, but I will make it clear when doing so. I will also cover some memory aspects of PHP.
Most of us think of PHP as an interpreted language. PHP hasn’t been purely interpreted since PHP 3. PHP 4 introduced the Zend engine, which precompiles your syntax and produces opcode.
[ It is still interpreted but Interpretation does not replace compilation completely, it only hides it from the user. ]
This engine splits the processing of PHP code into several phases. This is part of any compilation process. These are the typical steps of a multipass compiler
Lex analysis
Syntax analysis
opcode
PHP gets compiled and, as all compilers do, it changes syntax into a target format, in this case a format that can be interpreted.
Opcode caching lets PHP skip the lexical and syntax steps as well on subsequent runs of the same script.
No need for APC and similar caching mechanisms as of PHP 5.5 and later. The PHP developers directly integrated what they call OPCache into the core of the product. Not only does this provide greater overall product stability, it is officially supported by the PHP developers.
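To see whether OPcache is actually active for the script you are running, something like the following can be used. This is a minimal sketch; it assumes the extension registers itself under the name "Zend OPcache" and that the relevant switches are the opcache.enable and opcache.enable_cli ini settings.

```php
<?php
// OPcache is a Zend extension; it has to be loaded before it can be enabled.
$loaded = extension_loaded('Zend OPcache');

// The CLI has its own switch, separate from the web SAPIs.
$directive = PHP_SAPI === 'cli' ? 'opcache.enable_cli' : 'opcache.enable';
$enabled = $loaded && filter_var(ini_get($directive), FILTER_VALIDATE_BOOLEAN);

echo 'OPcache loaded:  ', $loaded ? 'yes' : 'no', "\n";
echo 'OPcache enabled: ', $enabled ? 'yes' : 'no', " ($directive)\n";
```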
The very first step is a lexical analysis.
We can use the exact same function that the Zend engine uses to do the lexical analysis, using token_get_all and token_name.
token_get_all is declared as a PHP function, giving us entry into the system. In orange you can see it calls the built-in tokenize function. On the right is the tokenize function, and in orange you can see it uses the lex_scan method.
The token_name function is then used to map the token numbers to their symbolic names.
Tokenisation can generate a lot of data, so I am keeping my examples short.
On the left hand side I am displaying the lexical analysis for a one-liner: $number = 10 + 10; In the first column you can see the token id. This token id is returned from the function token_get_all, and we have to use token_name to get the symbolic name, which is displayed in the second column. The 3rd column is the bit of syntax that was analysed, and the line is the line number in the script.
On the right hand side we have the exact same line, but commented out. As you can see the lexer stopped and didn’t analyse anything further, but even comments have a lexical tag.
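The table described above can be reproduced with a few lines of PHP. This is a minimal sketch, assuming PHP 7.1 or later for the array destructuring; the 'CHAR' label for single-character tokens is my own placeholder, not something the engine returns.

```php
<?php
// Tokenise the one-liner and collect id, symbolic name, text and line number.
$source = '<?php $number = 10 + 10;';

$rows = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens are [id, text, line number].
        [$id, $text, $line] = $token;
        $rows[] = [$id, token_name($id), $text, $line];
    } else {
        // Single characters such as "=", "+" and ";" come back as plain strings.
        $rows[] = [null, 'CHAR', $token, null];
    }
}

foreach ($rows as [$id, $name, $text, $line]) {
    printf("%-5s %-14s %-12s %s\n", $id ?? '-', $name, json_encode($text), $line ?? '-');
}
```

Changing $source to '<?php // $number = 10 + 10;' shows the commented-out case: the whole line comes back as a single comment token.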
In the following snippet I am passing a piece of code into token_get_all. token_get_all returns the token ids. Not shown here are my CSS classes linked to the token ids.
By iterating through the returned array and applying the CSS classes, I managed to make a very primitive source code highlighter.
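The original snippet is not reproduced in these notes, so here is a rough sketch of how such a highlighter could look. The class names (tok-variable and so on) and the highlight() function are hypothetical; only token_get_all and htmlspecialchars are real PHP functions.

```php
<?php
// Hypothetical mapping from token ids to CSS class names.
$classes = [
    T_OPEN_TAG                 => 'tok-tag',
    T_VARIABLE                 => 'tok-variable',
    T_LNUMBER                  => 'tok-number',
    T_COMMENT                  => 'tok-comment',
    T_CONSTANT_ENCAPSED_STRING => 'tok-string',
];

function highlight(string $source, array $classes): string
{
    $html = '';
    foreach (token_get_all($source) as $token) {
        // Plain string tokens ("=", ";", ...) have no id.
        [$id, $text] = is_array($token) ? $token : [null, $token];
        $escaped = htmlspecialchars($text);
        if ($id !== null && isset($classes[$id])) {
            $html .= '<span class="' . $classes[$id] . '">' . $escaped . '</span>';
        } else {
            $html .= $escaped;
        }
    }
    return $html;
}

echo highlight('<?php $number = 10 + 10; // sum', $classes), "\n";
```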
What does the
PHP 7 introduced the AST. The RFC was written by Nikita Popov, whose name features a lot in PHP internals.
Abstract syntax tree -
More maintainable parser and compiler
Decoupling syntax decisions from technical issues
In this step the tokens are analysed for grammatical correctness, which also allows static analysis of the code. It allows dealing with code in an abstract and robust way, and tools can be built on it to check the correctness of code. As far as I know there isn’t a built-in tool we can use for this, so I used a tool called PHP-Parser to create the syntax tree for me.
What is really cool about ASTs is that you can generate normal syntax code from them. You can traverse and edit the AST, and then change it back into normal syntax. You can in fact write code that changes itself, should you so desire. One of the uses for this has been preparing code for porting to older systems, which I assume gives a much better rewrite of the code in cases where mass find-and-replace isn’t possible.
This is the tree that the previous code snippet generates
One kind of problem one can solve using the above two techniques is the missing brackets problem. If you have ever worked in a structured block of code, or even a function, of let’s say 1000 lines (yes, they exist, although they shouldn’t) you may have stumbled across the issue of a missing bracket. You do a count for left braces and you get 301, then you do a count for right braces and you get 300. The rest of your day is basically gone.
So I believe that the previous two techniques can be used to solve that problem and it is something I am thinking about working on.
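As a rough illustration of the idea, and not the tool the speaker has in mind, the token stream alone is enough to track where braces were opened. The sketch below assumes PHP 7.1 or later; findUnclosedBraces() is a hypothetical name. Single-character tokens carry no line number, so the line is tracked from the surrounding array tokens.

```php
<?php
// Return the line numbers of opening braces that are never closed.
function findUnclosedBraces(string $code): array
{
    $open = [];   // stack of line numbers where a brace was opened
    $line = 1;    // line at the end of the most recent array token

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            [$id, $text, $startLine] = $token;
            // "{$var}" and "${var}" inside strings open a brace as an array token.
            if ($id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
                $open[] = $startLine;
            }
            $line = $startLine + substr_count($text, "\n");
            continue;
        }
        if ($token === '{') {
            $open[] = $line;
        } elseif ($token === '}') {
            array_pop($open);   // a stray "}" on an empty stack is ignored here
        }
    }
    return $open;
}

$code = "<?php\nfunction a() {\n    if (true) {\n        echo 1;\n}\n";
print_r(findUnclosedBraces($code));
```

For the sample above, the brace opened on line 2 is the one reported as unclosed. Because the lexer does the work, braces inside comments and plain strings are not counted, which is what a naive character count gets wrong.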
Opcode is generated based on the correctness of the previous two steps.
There are numerous tools available. I decided to use VLD, as I couldn’t install Bytekit. Comparing Parsekit and Bytekit examples on the internet, they both look very similar.
You can learn a lot about the code by looking at the opcode.
Understanding the behaviour of zvals is very important and, especially in PHP 5, can make a difference in your code. Zvals are allocated on the heap, and referenced by PHP variables.
Each zval has a counter that counts the number of references to it.
It also has an is_ref field, which tells us whether the value is part of a reference (&).
PHP uses copy-on-write when assigning and editing variables, and I will show that to you in a second.
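The reference counter can be watched from userland with debug_zval_dump(). A minimal sketch, assuming PHP 7 or later; extractRefcount() is a hypothetical helper that pulls the first refcount(n) out of the dump. Note that passing the variable to debug_zval_dump() itself adds one to the count, so the absolute numbers matter less than the change.

```php
<?php
// Pull the first "refcount(n)" figure out of a debug_zval_dump() output.
function extractRefcount(string $dump): int
{
    return preg_match('/refcount\((\d+)\)/', $dump, $m) ? (int) $m[1] : 0;
}

$a = range(1, 3);           // array built at runtime, so it is refcounted

ob_start();
debug_zval_dump($a);
$first = ob_get_clean();

$b = $a;                    // second variable pointing at the same array

ob_start();
debug_zval_dump($a);
$second = ob_get_clean();

printf("before assignment: %d\n", extractRefcount($first));
printf("after assignment:  %d\n", extractRefcount($second));
printf("elements in \$b:    %d\n", count($b));
```

On PHP 7 and later I would expect the second count to be one higher than the first, since $b = $a only bumps the counter instead of copying the array.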
In PHP 7, zvals are not each independently allocated on the heap, but stored in a hashmap.
The hashmap, however, is still on the heap.
Non-complex types like int and long are no longer stored in the Zend union but directly in the zval.
This way there are fewer pointers and lookups than you get in PHP 5.
In PHP7 if you do a $a = 1; debug_zval_dump($a) you only see long(1), and not
In PHP data is passed by reference, but references
There are a few options available to create your own extensions
The hard way. Most probably done in C. This is hard for good reason and I will not go into it now, but I will say that you basically do everything yourself.
The PHP way. There are PHP-based libraries that simplify this task for you. A very good website exists for it, but I cannot find it again.
The C++ way is the way I decided to go.
I used the PHP-CPP skeleton framework and of course php5-dev. The skeleton provides everything you need: the correct folders and config files.
make and make install placed it in the correct directories for me.
PHP does not really distinguish a number type and a string type.
This behavior complies with the PHP standard.
Also due to this behavior, we didn’t and can’t use C++ internal types for the temporary variable used in the function (temp), but used Php::Value as the variable type.
Using int and string is internal only
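The loose boundary between numbers and strings is easy to see from the PHP side. A small illustration in plain PHP (not PHP-CPP code); it gives some idea of why a wrapper type such as Php::Value has to be able to hold either kind of value.

```php
<?php
// A numeric string takes part in arithmetic as if it were a number.
$sum = "10" + 5;
var_dump($sum);               // int(15)

// Loose comparison converts; strict comparison also checks the type.
var_dump("10" == 10);         // bool(true)
var_dump("10" === 10);        // bool(false)

// The value is numeric, but its type is still string.
var_dump(is_numeric("10"));   // bool(true)
var_dump(is_int("10"));       // bool(false)
```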