Perf: the difference is in the number of instructions (and L1 misses)
Fast, middle, and slow paths of LTTng:
- Write
- Switch page
- Switch sub-buffer (sub-buffers are aligned to the page size)
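
The three paths above can be sketched as a toy ring buffer in C. This is only an illustration, assuming the bullets map in order to the fast, middle, and slow paths; it is not LTTng's actual code. All names and sizes (rb_write, PAGE_SIZE_BYTES, PAGES_PER_SUBBUF, NUM_SUBBUFS) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define PAGE_SIZE_BYTES  4096u
#define PAGES_PER_SUBBUF 4u
#define SUBBUF_SIZE      (PAGE_SIZE_BYTES * PAGES_PER_SUBBUF) /* page aligned */
#define NUM_SUBBUFS      2u
#define BUF_SIZE         (SUBBUF_SIZE * NUM_SUBBUFS)

enum write_path { PATH_FAST, PATH_MIDDLE, PATH_SLOW, PATH_REJECTED };

struct ring_buffer {
    uint8_t data[BUF_SIZE];
    size_t offset;                 /* next free byte, always < BUF_SIZE */
    unsigned long page_switches;   /* times the middle path was taken */
    unsigned long subbuf_switches; /* times the slow path was taken */
};

/* Copy one event into the buffer and report which path it took. */
enum write_path rb_write(struct ring_buffer *rb, const void *ev, size_t len)
{
    if (len == 0 || len > SUBBUF_SIZE)
        return PATH_REJECTED;

    size_t start = rb->offset;
    size_t end = start + len;
    enum write_path path = PATH_FAST;

    if (start / SUBBUF_SIZE != (end - 1) / SUBBUF_SIZE) {
        /* Slow path: the event does not fit in the current sub-buffer.
         * Skip the remaining space and restart at the next sub-buffer,
         * wrapping around to the first one after the last. */
        start = ((start / SUBBUF_SIZE + 1) % NUM_SUBBUFS) * SUBBUF_SIZE;
        end = start + len;
        rb->subbuf_switches++;
        path = PATH_SLOW;
    } else if (start / PAGE_SIZE_BYTES != (end - 1) / PAGE_SIZE_BYTES) {
        /* Middle path: the event spills into the next page of the
         * same sub-buffer. */
        rb->page_switches++;
        path = PATH_MIDDLE;
    }
    /* Simplification: an event that ends exactly on a page or sub-buffer
     * boundary is not counted as a switch. */

    memcpy(&rb->data[start], ev, len);
    rb->offset = end % BUF_SIZE;
    return path;
}
```

Most writes only touch the current page, so the two boundary checks are the only overhead on the fast path; the page and sub-buffer switches are paid only occasionally.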
Ftrace:
- Write
- Switch page (event can't be bigger than a page)
- Extra work by the kernel
- Extra work by LTTng: testing the arguments, etc.
Extrae: fast, but not flexible
Event: 2 fields
1st field: event type
2nd field: entering or leaving
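
A fixed two-field event like the one described above could look roughly like this in C. This is a sketch, not Extrae's real record layout; the names two_field_event, event_phase, and make_event, and the 32-bit field widths, are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical encoding of the second field. */
enum event_phase { EVENT_LEAVE = 0, EVENT_ENTER = 1 };

/* A fixed-size record: cheap to write, but it cannot carry extra payload. */
struct two_field_event {
    uint32_t type;   /* 1st field: event type */
    uint32_t phase;  /* 2nd field: entering or leaving */
};

/* Build one record; with a fixed layout there is nothing to serialise. */
struct two_field_event make_event(uint32_t type, enum event_phase phase)
{
    struct two_field_event e = { type, (uint32_t)phase };
    return e;
}
```

A fixed layout like this keeps each write down to a couple of stores, which fits the "fast, but not flexible" remark: there is no room for per-event arguments.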
LTTng: signal-safe and reentrant; atomic instructions for per-CPU variables
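
One way to get a signal-safe, reentrant reservation without locks is a compare-and-swap loop on the write offset. The sketch below uses C11 atomics on a single counter as a stand-in; LTTng's own per-CPU primitives are not shown here, and the names reserve_ctx and rb_reserve are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reservation state for one buffer. */
struct reserve_ctx {
    atomic_size_t write_offset; /* bytes reserved so far */
    size_t capacity;            /* total bytes available */
};

/* Reserve len bytes. On success, store the start offset in *start and
 * return 0; return -1 if there is not enough room left.
 * No lock is taken, so a signal handler that interrupts this function and
 * calls it again cannot deadlock: the interrupted caller's CAS simply
 * fails afterwards and it retries with the updated offset. */
int rb_reserve(struct reserve_ctx *ctx, size_t len, size_t *start)
{
    size_t old = atomic_load_explicit(&ctx->write_offset,
                                      memory_order_relaxed);
    for (;;) {
        if (len > ctx->capacity - old)
            return -1;
        if (atomic_compare_exchange_weak_explicit(&ctx->write_offset,
                                                  &old, old + len,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            *start = old;
            return 0;
        }
        /* CAS failed: old now holds the current offset, so try again. */
    }
}
```

Each successful caller owns a disjoint byte range, so nested or concurrent writers never overwrite each other's events.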
Extrae: doesn't seem to be very optimised
Not to take anything away from Extrae, but it seems that not much
effort was put into optimising the fast path
- One of those tracers is not scalable
- Many optimisations still to be made for LTTng (getting the CPU ID,
  restartable sequences for critical sections)