5. SAPIs
Server API
CLI, CGI, mod_php, phpdbg, embed
input args
output, flushing, file descriptors, interruptions, system user info
input filtering and, optionally, headers, POST data, HTTP-specific stuff
6. Extensions
Talk to a C library
do stuff faster than PHP
make the engine act funny
8. Who Cares?
Pick the right SAPI
Fewer extensions = better
Static extensions = better
Lifecycle is important for sharing stuff
Newer PHP = better faster stronger
9. turn on the “go_fast” ini setting
Thread, fork, async, very wow
10. Threading
Thread Safe != reentrant
Thread safe != parallel
Thread safe != async
Thread safe != concurrent
Thread safe == two threads running at the same time won’t stomp on each other’s data
yes really, that’s all it means
11. Reentrant
Interrupt this partway through, call it again
and both calls still finish correctly, like the other never happened
12. Async
I’m gonna work on this stuff
But I’m not going to block you if you have important stuff to do
13. Parallel … Concurrent
Concurrent – two things at the same time that need communication
Parallel – two things at the same time
14. TSRM
Thread safe resource manager
global data in extensions
making some C re-entrant
thread safety
15. Why do I care?
react-php (parallel)
pecl event (async)
pthreads (concurrent)
pcntl (fork and pray)
proc_open/popen (subprocessing)
queues and jobs and workers
native TLS (thread-local storage) RFC
16. Welcome to the Engine
Lexers and Parsers and Opcodes OH MY!
17. Lexer
checks PHP’s spelling
turns your source into tokens
see token_get_all for what PHP sees
26. Numbers
Booleans are unsigned char
Integers are really signed long integers
Longs are platform dependent
Floats and doubles are doubles not floats
27. 64-Bit Madness (sizes in bits)
LLP64 (Windows)
short = 16
int = 32
long = 32
long long = 64
pointer = 64
LP64 (Unices)
short = 16
int = 32
long = 64
long long = 64
pointer = 64
28. Strings
char *
Translated into what we see by an encoding
ASCII, UTF-8, binary – EVERYTHING has a codepage
wchar_t? screw you
32. Why do I care?
Know the limitations of your data types
Remember that arrays aren’t arrays
Beware of many many resources
Beware of many many objects
64 bit can be broken in strange ways
34. Stack? Heap?
Stack = scratch space for thread of execution
can overflow!
slightly faster
size determined at thread start
Heap = space for dynamic allocation
managed by program
can fragment
leaky!
35. Zend Memory Manager
Internal Heap Allocator
frees yo memory (leak management)
preallocates blocks in set sizes that PHP uses
caches allocations to avoid fragmentation
allows monitoring of memory usage
36. COW (not moo)
Copy On Write
1 zval, many variables
each variable increases refcount
destroy when the refcount hits 0
Oh no, a change! Copy (separate) first
37. Refcounts, GC, and PHPNG
Sometimes you have a refcount but no var to reference it
This is a circular reference, this sucks (ask doctrine)
GC checks for this periodically and cleans up
PHPNG
38. References are not Pointers
PHP is smarter than you are
access the same variable content by different names
using symbol table aliases
variable name != variable content
39. Side Track – Objects are not References
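$a = new stdClass;
$b = $a;            // $b gets a copy of the object handle, not a reference to $a
$a->foo = 'bar';
var_dump($b);       // foo => "bar": both handles point at the same object
$a = 'baz';
var_dump($b);       // still the object: rebinding $a doesn't touch $b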
40. Places to Learn More
http://www.phpinternalsbook.com
http://php.net
http://lxr.php.net
http://wiki.php.net
http://nikic.github.io/
http://blog.krakjoe.ninja/
41. About Me
http://emsmith.net
@auroraeosrose
That’s Aurora Eos Rose
auroraeosrose@gmail.com
freenode in #phpmentoring #phpwomen #phpinternals
Editor’s Notes
story of how I got into internals in the first place
and how each new discovery (extensions, SAPIs, engine, oh look now I can do it all) led me down the Alice-in-Wonderland rabbit hole
but it also made me a better PHP programmer
because I knew all the WTFs
So PHP gets the architecture of its system right – it’s as big as it needs to be, and no bigger – but all the important components are pluggable and extendable, which makes it awesome glue
take a side track about learning more about programming, and how going down or up the stack is usually more valuable than going across stacks
we’ll talk more about this later, but this is the part that actually looks at and analyzes your source code
and makes it actually – like – talk to your CPU and run
yes, this isn’t really different from, say, C# or Java – the difference is WHAT it compiles to
C compiles to machine code your system can immediately use
C# compiles to MSIL, which runs on the .NET runtime
Java compiles to bytecode that runs on the Java VM
Smarty compiles a template to a (horrible) PHP file
so the core functionality of PHP (in main/) is kind of a mishmash – but generally it’s I/O
the PHP manual lies though – if you look up “core” functionality, what is actually IN core is not nearly so much
instead, what you’re seeing is extensions you can’t “turn off” – well, you can if you’re nuts
try it sometime; PHP is really boring without its “standard lib”
SAPIs provide the glue for interfacing PHP into an application. They define the ways in which data is passed between an application and PHP
this is really what sets PHP apart
this can also make or break you if you choose poorly
a lot of SAPI choice is dictated by server choice, although most have gone to FastCGI at this point; it’s an old protocol, but it works well, is stable, and is “shared nothing”
for example, Python invented its own interface (WSGI), which requires a separate server that speaks something your web server actually speaks (FastCGI, SCGI) instead of using a pluggable model
there are those who whine that PHP doesn’t have this “middleware” – it’s actually easily doable though – you could write a dedicated SAPI or just use the embed SAPI
SAPIs take care of setting up the interpreter context and, optionally, dealing with headers and input args (a SAPI doesn’t have to do any of that; it’s entirely optional if you want your PHP code itself to do the work)
we could use more SAPIs! people run away from writing these, which is sad – I recruit – I have a list of ones I’d love to mentor you into writing
almost everything is an extension
there are two types, regular and “zend” extensions
zend extensions can “hook” engine behavior using opcodes
99.9% of the functionality you use comes from this
SO – threading makes this a little weird – because MINIT is not run in new threads
so GINIT is called right before RINIT if ZTS is on (annoying)
so why is it important to know this stuff
you need to know what extensions are available
why you would compile your own PHP with all static extensions
sharing can be limited when requests aren’t shared
This is going to annoy some people because they go on and on about how PHP is “thread safe” – but really it’s not
it’s kind of almost able to be threaded when compiled right most of the time
others blame things on libraries
no – no there are some very bad things in core that totally prevent this
ah “thread safety”
Parallelism is the act of taking a large job, splitting it up into smaller ones, and doing them at once. People often use "parallel" and "concurrent" interchangeably, but there is a subtle difference. Concurrency is necessary for parallelism but not the other way around. If I alternate between cooking eggs and pancakes, I'm doing both concurrently. If I'm cooking eggs while you are cooking pancakes, we are cooking concurrently and in parallel. Technically, if I'm cooking eggs and you are mowing the lawn, we are also working in parallel, but since no coordination is needed in that case there's nothing to talk about.
talk about what each of these means
Lexical Analysis: converts the source from a sequence of characters into a sequence of tokens
Syntax Analysis: analyzes a sequence of tokens to determine their grammatical structure
5.6 and 7+
Generate bytecode based on the information gathered by analyzing the source code
abstract syntax tree (AST) – decouples the compiler and parser steps – even though we compile to opcodes, it’s still a compile
before PHP 7 we emitted opcodes directly from parsing (a bit eww); now we can do better, cooler stuff
so zend is actually a “virtual machine”
it interprets OPCODES and does stuff with them
reads each opcode and does a specific action – like a giant state machine
underlying it all PHP just has some basic types
every zval stores some value and the type this value has
A union defines multiple members of different types, but only one of them can ever be used at a time
unions store all their members at the same memory location and just interpret the value located there differently depending on which member you access. The size of the union is the size of its largest member
so why do we care that this is how PHP stores stuff? at the end of the day those are actual C types underneath that we’re just “mapping” to, with many of the conversion rules that go along with them
char = smallest addressable unit of the machine
IEEE 754 single-precision binary floating-point format = float
IEEE 754 double-precision binary floating-point format = double
this is more precision – but remember to use GMP for real math
The disadvantage of the LP64 model is that storing a long into an int may overflow
converting a pointer to a long will “work” in LP64
useful for – the lazy
LLP64 is generally the “safer” route – you can’t convert a pointer to a long (WTF WOULD YOU?), and a long to an int won’t overflow (BC considerations), so it’s the logical choice for Windows
handling of strings >= 2^31
handling of 64 bit integers
large file support
handling of numeric 64 bit hash keys
Fixed in PHP7
null terminated array of chars
Another way of accessing a contiguous chunk of memory, instead of with an array, is with a pointer.
the character array containing the string must already exist (having been either statically- or dynamically-allocated)
C is a programming language that was developed in an environment where the dominant character set was 7-bit ASCII, which is why the 8-bit byte has been the most common unit of encoding ever since. However, when software is developed for international use, it has to be able to represent many more characters; for example, encodings for the Indian, Chinese, and Japanese writing systems should be available. The inconvenience of handling such varied multibyte characters can be eliminated by using characters that are all a uniform number of bytes. ANSI C provides a type for manipulating variable-width characters as uniform-sized data objects, called wide characters (wchar_t).
story of Microsoft’s early adoption, UTF-8 on a napkin
Arrays in C are just regions of memory that can be accessed by offset
offsets must be continuous integers
complex key becomes integer via hash function
for hash collisions – PHP stores all the items with the same hash in a linked list
so – the advantage of resources is that they’re smaller than objects in PHP
the disadvantages are very numerous
they’re slow – depending on what you’re doing, much slower than objects
they’re limited – you can literally run out of resources
and they get shoved into a giant list in the executor
no seriously
this is why they suck
sadpanda
like resources, these are stored in executor globals
like resources eventually some day you might run out
but you can sure do a lot more with them
store opaque data, deal with opaque data, etc
The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.
The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).
things get crazy when you add dynamically loaded modules – they may have their OWN heap
fewer calls to malloc – less cpu usage, less kernel madness
less fragmentation
LIBRARIES do not use it!
you can turn it off, but 99.9% of the time it’s better with it
The behavior is very straightforward: When a reference is added, increment the refcount, if a reference is removed, decrement it. If the refcount reaches 0, the zval is destroyed.
In the pre-PHPNG Zend Engine, all values were allocated on the heap and were subject to reference counting and garbage collection. The Zend Engine mostly operated on pointers to zvals.
PHPNG stores data in a totally different way, and even though I saw an interesting talk on it at ZendCon, I’m still wrapping my head around the code
basically it separates scalar from non-scalar values and uses flags to carry information about them (type, etc.)
Assigning values by references when you don't need to (in order to later modify the original value through a different label) is NOT a case of you outsmarting the silly engine and gaining speed and performance. It's the opposite, it's you TRYING to outsmart the engine and failing, because the engine is already doing a better job than you think.
in other words, references are basically useful for – digging into internal nested arrays and input/output parameters – that’s about it
objects do behave like references
but remember a variable that is assigned to an object just holds a pointer to the actual value of the object – which is elsewhere
if you later assign that