Debugging Node.js in Production
Yunong Xiao
@yunongx
Software Engineer
Node Platform
Node.js @ Netflix
❖ 65+ Million Subscribers
❖ Website (netflix.com)
❖ Dynamic asset packager
❖ PaaS on Node
❖ Internal Services
“Let's work the problem, people. Let's not make things any worse by guessing.”
– Gene Kranz, Flight Director, Apollo 13
Apply the Scientific Method
1. Construct a Hypothesis
2. Collect data
3. Analyze data and draw a conclusion
4. Repeat
Production Crisis
❖ Runtime Performance
❖ Runtime Crashes
❖ Memory Leaks
Netflix is “Slow”
Gather Request Data
http://restify.com
http://github.com/restify/node-restify
Observable REST Framework
to the Rescue
[2014-12-09T14:07:26.293Z] INFO: shakti/restify-audit/20067: handled:
200, latency=1402
(req_id=b3fa3820-7fac-11e4-8908-a5c7b70d676f, latency=1435)
GET / HTTP/1.1
host: www.netflix.com
--
HTTP/1.1 200 OK
x-netflix.client.instance: i-057e47ef
x-frame-options: DENY
content-type: text/html
--
req.timers: {
"parseBody": 700123,
"apiRpc": 301911,
"render": 400031
}
req.timers: {
"parseBody": 700123,
"apiRPC": 301911,
"render": 400031
}
On CPU
CPU is Critical
❖ Node is essentially “single threaded”
❖ Cascading effect on ALL requests in process
req.timers: {
"parseBody": 700123,
"apiRPC": 301911,
"render": 400031
}
Can’t process ANY other request for 1.1 seconds
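The 1.1 seconds figure can be reproduced from the timers above. This is a minimal sketch, assuming the timer values are microseconds and that parseBody and render run on the CPU while apiRPC mostly waits on I/O. That classification and the helper name `onCpuMicros` are assumptions made for illustration and are not part of restify.

```javascript
'use strict';

// Timer values copied from the audit log above; assumed to be microseconds.
const timers = { parseBody: 700123, apiRPC: 301911, render: 400031 };

// Assumption: these handlers run synchronously on the event loop,
// whereas apiRPC mostly waits on network I/O.
const cpuBoundHandlers = ['parseBody', 'render'];

// Sum the time spent in handlers that keep the single JS thread busy.
function onCpuMicros(reqTimers, handlers) {
  return handlers.reduce((total, name) => total + (reqTimers[name] || 0), 0);
}

const blockedMicros = onCpuMicros(timers, cpuBoundHandlers);
console.log((blockedMicros / 1e6).toFixed(1) + ' seconds on CPU');
```

While those handlers run, no other request in the same process makes progress, which is why the on-CPU portion matters more than the total latency.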
On CPU
How Much Code?
$ find . -name "*.js*" | xargs cat | wc -l
6 042 301
Statistically Sample Stack Traces
Snapshot What’s Currently Executing
Stacktrace: A stack trace is a report of the active stack frames
at a certain point in time during the execution of a program.
> console.log(ex, ex.stack.split("\n"))
ReferenceError: ex is not defined
at repl:1:13
at REPLServer.defaultEval (repl.js:132:27)
at bound (domain.js:254:14)
at REPLServer.runBound [as eval] (domain.js:267:12)
at REPLServer.<anonymous> (repl.js:279:12)
at REPLServer.emit (events.js:107:17)
at REPLServer.Interface._onLine (readline.js:214:10)
at REPLServer.Interface._line (readline.js:553:8)
at REPLServer.Interface._ttyWrite (readline.js:830:14)
at ReadStream.onkeypress (readline.js:109:10)
Two Problems
1) How to sample stack traces from a running
process?
2) How to do 1) without affecting the process?
Linux Perf Events
PERF(1) perf Manual PERF(1)
NAME
perf - Performance analysis tools for Linux
SYNOPSIS
perf [--version] [--help] COMMAND [ARGS]
DESCRIPTION
Performance counters for Linux are a new kernel-based subsystem
that provide a framework for all things performance analysis.
It covers hardware level (CPU/PMU, Performance Monitoring Unit)
features and software features (software counters, tracepoints)
as well.
Sample Stack Traces w/ perf(1)
# perf record -F 99 -p `pgrep -n node` -g -- sleep 30
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.524 MB perf.data
(~22912 samples) ]
Sample Stack Trace
ab2fee v8::internal::Heap::DeoptMarkedAllocationSites() (/apps/node/bin/
a69754 v8::internal::StackGuard::HandleInterrupts() (/apps/node/bin/node)
c9f13b v8::internal::Runtime_StackGuard(int, v8::internal::Object**
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map) (repeated 30 more lines)
8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>,
int, v8::Local<v8::Value>*) (/apps/node/bin/node)
8f2281 v8::Function::Call(v8::Local<v8::Value>, int,
v8::Local<v8::Value>*) (/apps/node/bin/node)
df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>,...
df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node)
fb1597 uv__run_check (/apps/node/bin/node)
fabcee uv_run (/apps/node/bin/node)
dfaa50 node::Start(int, char**) (/apps/node/bin/node)
7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)
Missing JS Frames
Why? V8 compiles JavaScript just in time (JIT), so perf(1) has no symbol table for the generated code.
node --perf_basic_prof_only_functions
“outputs the files in a format that the existing perf tool can consume.”
node --perf_basic_prof_only_functions
Available right now in Node v5.x
Coming soon to Node v4.3:
https://github.com/nodejs/node/pull/3609
Results
node 5382 cpu-clock:
3c793e38b0c1 LazyCompile:DELETE native runtime.js:349 (/tmp/perf-5382.map)
3c793e31981d Builtin:JSConstructStubGeneric (/tmp/perf-5382.map)
3c793ff2ca94 (/tmp/perf-5382.map)
3c793e98a10f LazyCompile:~AtlasClient._run /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:85 (/tmp/perf-5382.map)
3c793f47de29 LazyCompile:*AtlasClient.timer /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:70 (/tmp/perf-5382.map)
3c793e9eee38 LazyCompile:~fetchSingleGetCallback /apps/node/webapp/singletons/ShaktiFetcher.js:120 (/tmp/perf-5382.map)
3c793f6cffee LazyCompile:*Model.get /apps/node/webapp/node_modules/nf-models/lib/Model.js:90 (/tmp/perf-5382.map)
3c793ed3e2ad (/tmp/perf-5382.map)
3c7940e4357b Handler:ca (/tmp/perf-5382.map)
3c793f060e3c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:134 (/tmp/perf-5382.map)
3c79404edbfa (/tmp/perf-5382.map)
3c79401fd3f7 (/tmp/perf-5382.map)
3c79400e307b LazyCompile:*fetchMulti /apps/node/webapp/singletons/ShaktiFetcher.js:50 (/tmp/perf-5382.map)
3c793fb9a59f LazyCompile:*fetch /apps/node/webapp/singletons/ShaktiFetcher.js:32 (/tmp/perf-5382.map)
3c793e896697 (/tmp/perf-5382.map)
3c7943aaabbe (/tmp/perf-5382.map)
3c793ef4c53c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:245 (/tmp/perf-5382.map)
3c793eaf4f01 LazyCompile:* /apps/node/webapp/node_modules/nf-packager/lib/index.js:194 (/tmp/perf-5382.map)
3c793eab130a LazyCompile:processImmediate timers.js:352 (/tmp/perf-5382.map)
3c793e319f7d Builtin:JSEntryTrampoline (/tmp/perf-5382.map)
3c793e3189e2 Stub:JSEntryStub (/tmp/perf-5382.map)
a65baf v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/apps/node/bin/node)
8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node)
fb1597 uv__run_check (/apps/node/bin/node)
fabcee uv_run (/apps/node/bin/node)
dfaa50 node::Start(int, char**) (/apps/node/bin/node)
7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)
JS Frames
Native Frames
Problem: Too Many Traces
$ cat out.nodestacks01 | grep cpu-clock | wc -l
744
$ wc -l out.nodestacks01
58116
Too Many Traces
Solution: Flame Graphs
Flamegraph
❖ Each box represents a function in the stack (stack frame)
❖ x-axis: percent of time on CPU
❖ y-axis: stack depth
❖ colors: random, or can be a dimension
❖ https://github.com/brendangregg/FlameGraph
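Going from tens of thousands of trace lines to one picture starts by folding identical stacks together. The sketch below shows that folding step only: it counts identical stacks and emits the "frame;frame;frame count" lines that flamegraph.pl reads. The sample stacks and the name `foldStacks` are hypothetical, and parsing real `perf script` output is left out.

```javascript
'use strict';

// Hypothetical sampled stacks, root frame first. Real input would come from
// parsing `perf script` output, which is not shown here.
const samples = [
  ['a', 'b', 'c'],
  ['a', 'b', 'c'],
  ['a', 'h'],
  ['a', 'b'],
];

// Count identical stacks and emit one "frame;frame;frame count" line per
// unique stack, the folded format that flamegraph.pl reads.
function foldStacks(stacks) {
  const counts = new Map();
  for (const frames of stacks) {
    const key = frames.join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([key, count]) => key + ' ' + count).sort();
}

const folded = foldStacks(samples);
console.log(folded.join('\n'));
```

Each folded line becomes one tower in the graph, and its count sets the width of the top box.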
v8
libc
JS
built ins
Flame Graph Interpretation
[Figure: example flame graph built from the frames a(), b(), c(), d(), e(), f(), g(), h(), and i()]
❖ Top edge shows who is running on-CPU, and how much (width)
❖ Top-down shows ancestry, e.g., from g()
❖ Widths are proportional to presence in samples, e.g., comparing b() to h() (incl. children)
> 50% time on CPU
lodash!
function merge(object) {
var args = arguments,
length = 2;
...
Use _.assign() Instead
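Why a shallow assign is cheaper than a merge can be seen in a small sketch. It uses the built-in `Object.assign` as a stand-in for `_.assign`, and a naive `deepMerge` written only for this illustration. `deepMerge` is not lodash's implementation, and the sample objects are hypothetical.

```javascript
'use strict';

// Naive recursive merge, written only to illustrate the extra work a deep
// merge does. This is NOT lodash's implementation.
function deepMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      const existing = target[key];
      const base = existing !== null && typeof existing === 'object' ? existing : {};
      target[key] = deepMerge(base, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

const defaults = { headers: { accept: 'text/html' }, retries: 3 };
const overrides = { headers: { 'x-debug': '1' } };

// Shallow copy: one pass over top-level keys; nested objects are replaced.
const shallow = Object.assign({}, defaults, overrides);

// Deep merge: walks every nested object; nested keys are combined.
const deep = deepMerge(deepMerge({}, defaults), overrides);

console.log(Object.keys(shallow.headers));
console.log(Object.keys(deep.headers));
```

The swap is only safe where nested objects do not need to be combined, so each call site has to be checked before replacing a merge with an assign.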
Before
After
Flame Graphs
Help you find 1 LoC out of 6 Million
Results
❖ Dramatically reduced request latency
❖ Reduced CPU utilization
❖ Increased throughput
Runtime Performance Technique
❖ Sample stack traces via perf(1)
❖ Visualize code distribution with CPU flame graphs
❖ Identify candidate code paths for performance improvement
❖ Repeat
Runtime Crashes
“The method described in this article was designed to provide a core dump 
 with a minimal impact on the spacecraft 
 as the resumption of data acquisition from the spacecraft is the highest priority.”
- Chafin, R. "Pioneer F & G Telemetry and Command Processor Core Dump Program." JPL Technical Report XVI, no. 32-1526 (1971): 174.
Core Dumps: A Brief History
❖ Magnetic core memory
❖ Dump out the contents of “core” memory for debugging
❖ “Core dump” was born
❖ Initially printed on paper!
❖ Postmortem debugging was born!
Production Constraints
❖ Uptime is critical
❖ Not easily reproducible
❖ Can’t simulate environment
❖ Resume normal operations ASAP
Postmortem Debugging
Take core dump
Restart app
Load core dump elsewhere
Engineer
Fix
Debug
Continue serving traffic
Configure Node to Dump Core on Error
$ node --abort_on_uncaught_exception throw.js
Uncaught Error
FROM
Object.<anonymous> (/Users/yunong/throw.js:1:63)
Module._compile (module.js:435:26)
Object.Module._extensions..js (module.js:442:10)
Module.load (module.js:356:32)
Function.Module._load (module.js:311:12)
Function.Module.runMain (module.js:467:10)
startup (node.js:134:18)
node.js:961:3
[1] 4131 illegal hardware instruction (core dumped) node --abort_on_uncaught_exception throw.js
Node Post Mortem Tooling
❖ Netflix uses Linux in Prod
❖ Linux: Work in progress
  ❖ https://github.com/tjfontaine/lldb-v8
  ❖ https://github.com/indutny/llnode
❖ Solaris: Full featured, compatible with Linux cores
  ❖ https://github.com/joyent/mdb_v8
Socks & Duct Tape: Set Up a Debug Solaris Instance
EC2: http://omnios.omniti.com/wiki.php/Installation#IntheCloud
VM: http://omnios.omniti.com/wiki.php/Installation#Quickstart
Post Mortem Methodology
❖ Where: Inspect stack trace
❖ Why: Inspect heap and stack variable state
mdb(1) JS commands
❖ ::help <cmd>
❖ ::jsstack
❖ ::jsprint
❖ ::jssource
❖ ::jsconstructor
❖ ::findjsobjects
❖ ::jsfunctions
Load the Core Dump
# mdb ./node-v4.2.2-linux/node-v4.2.2-linux-x64/bin/node ./core.7186
> ::load ./mdb_v8_amd64.so
mdb_v8 version: 1.1.1 (release, from 28cedf2)
V8 version: 143.156.132.195
Autoconfigured V8 support from target
C++ symbol demangling enabled
linux node binary
core dump
load mdb_v8 module
::jsstack
> ::jsstack
js: test
js: storeHeader
js: <anonymous> (as OutgoingMessage._storeHeader)
js: <anonymous> (as ServerResponse.writeHead)
js: restifyWriteHead
js: _cb
js: send
js: <anonymous> (as <anon>)
js: <anonymous> (as ReactRenderer._renderLayout)
js: <anonymous> (as <anon>)
js: <anonymous> (as <anon>)
js: <anonymous> (as dispatchHandler)
js: <anonymous> (as <anon>)
js: runHooks
js: runTransitionToHooks
js: <anonymous> (as assign.to)
js: <anonymous> (as <anon>)
js: runHooks
js: runTransitionFromHooks
js: <anonymous> (as assign.from)
js: <anonymous> (as React.createClass.statics.dispatch)
native: _ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0...
native: v8::internal::Execution::Call+0xc8
native: v8::internal::Runtime_Apply+0x1ce
frame type
func name
Always name your functions!
var foo = function foo() {};
Foo.prototype.bar = function bar() {};
foo(function bar() {});
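The effect of naming shows up in ordinary stack traces as well as in ::jsstack. The sketch below captures `err.stack` from an anonymous callback and from a named one; `captureStack` and `parseHeaders` are hypothetical names used only for illustration.

```javascript
'use strict';

// Run a callback that throws and return the captured stack trace text.
function captureStack(callback) {
  try {
    callback();
  } catch (err) {
    return err.stack;
  }
  return '';
}

// Anonymous callback: the frame has no function name to show.
const anonymousStack = captureStack(function () {
  throw new Error('boom');
});

// Named callback: the name appears in the frame.
const namedStack = captureStack(function parseHeaders() {
  throw new Error('boom');
});

// Print the innermost frame of each trace for comparison.
console.log(anonymousStack.split('\n')[1]);
console.log(namedStack.split('\n')[1]);
```

Only the second trace carries a function name in its innermost frame, which is the difference between an `<anonymous>` line and a useful one in a core dump.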
::jsstack -v Frame Source
> ::jsstack -v
js: storeHeader
file: http.js
posn: position 18774
this: 2ad561306c91 (<unknown>)
arg1: 3bd67e0669b9 (JSObject: ServerResponse)
arg2: 3dfe966ae299 (JSObject: Object)
arg3: 34d5391d8859 (SeqAsciiString)
arg4: 34d5391d8881 (SeqAsciiString)
652
653 function storeHeader(self, state, field, value) {
654 // Protect against response splitting. The if statement is there to
655 // minimize the performance impact in the common case.
656 if (/[\r\n]/.test(value))
657 value = value.replace(/[\r\n]+[ \t]*/g, '');
658
659 state.messageHeader += field + ': ' + value + CRLF;
660
661 if (connectionExpression.test(field)) {
662 state.sentConnectionHeader = true;
663 if (closeExpression.test(value)) {
664 self._last = true;
665 } else {
666 self.shouldKeepAlive = true;
667 }
668
::jsstack -vn0 Frame and Function Args
> ::jsstack -vn0
js: test
file: native regexp.js
posn: position 2677
this: 2421205bd4d9 (JSRegExp)
arg1: 34d5391d8859 (SeqAsciiString)
js: storeHeader
file: http.js
posn: position 18774
this: 2ad561306c91 (<unknown>)
arg1: 3bd67e0669b9 (JSObject: ServerResponse)
arg2: 3dfe966ae299 (JSObject: Object)
arg3: 34d5391d8859 (SeqAsciiString)
arg4: 34d5391d8881 (SeqAsciiString)
js: <anonymous> (as OutgoingMessage._storeHeader)
file: http.js
posn: position 15652
this: 3bd67e0669b9 (JSObject: ServerResponse)
arg1: 3dfe966ae271 (ConsString)
arg2: 3dfe966add99 (JSObject: Object)
js: restifyWriteHead
file: /apps/node/webapp/node_modules/restify/lib/response.js
posn: position 6964
this: 3bd67e0669b9 (JSObject: ServerResponse)
(1 internal frame elided)
js: _cb
Func Name
JS File
Line #
Func Args
::jsstack Function Args
Memory Address of Var
Var Type
::jsprint Print JS Objects
> 3bd67e0669b9::jsprint
{
"_time": 1437690472539,
"_headers": {
"content-type": "text/html",
"req_id": "5b7f18f2-7f12-4c68-b07f-3cd75698ba65",
"set-cookie": "CENSORED; Domain=.netflix.com; Expires=Fri, 24 Jul 2015 10:27:52 GMT",
"x-frame-options": "DENY",
"x-ua-compatible": "IE=edge",
"x-netflix.client.instance": "i-c420596c",
},
"output": [],
"_last": false,
"_hangupClose": false,
"_hasBody": true,
"socket": {
"_connecting": false,
"_handle": [...],
"_readableState": [...],
"readable": true,
"domain": null,
"_events": [...],
"_maxListeners": 10,
"_writableState": [...],
"writable": true,
"allowHalfOpen": true,
Actual JS Object Instance
::jsconstructor Show Object Constructor
> 3bd67e0669b9::jsconstructor -v
ServerResponse (JSFunction: 2421205bced9)
::jssource Print f() Source
> 2421205bced9::jssource
file: http.js
1066 function ServerResponse(req) {
1067 OutgoingMessage.call(this);
1068
1069 if (req.method === 'HEAD') this._hasBody = false;
1070
1071 this.sendDate = true;
1072
1073 if (req.httpVersionMajor < 1 || req.httpVersionMinor < 1) {
1074 this.useChunkedEncodingByDefault = chunkExpression.test(req.headers.te);
1075 this.shouldKeepAlive = false;
1076 }
1077 }
1078 util.inherits(ServerResponse, OutgoingMessage);
Core Dump === Complete Process State
Memory Leaks
Generate Core Dump Ad-hoc
gcore(1) GNU Tools gcore(1)
NAME
gcore - Generate a core file for a running process
SYNOPSIS
gcore [-o filename] pid
Take a Core Dump!
root@demo:~# gcore `pgrep node`
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7facaeffd700 (LWP 5650)]
[New Thread 0x7facaf7fe700 (LWP 5649)]
[New Thread 0x7facaffff700 (LWP 5648)]
[New Thread 0x7facbc967700 (LWP 5647)]
[New Thread 0x7facbd168700 (LWP 5617)]
[New Thread 0x7facbd969700 (LWP 5616)]
[New Thread 0x7facbe16a700 (LWP 5615)]
[New Thread 0x7facbe96b700 (LWP 5614)]
0x00007facbea5b5a9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
Saved corefile core.5602
Problem: Find Leaking Objects
::findjsobjects
NAME
findjsobjects - find JavaScript objects
SYNOPSIS
[ addr ] ::findjsobjects [-vb] [-r | -c cons | -p prop]
::findjsobjects Find ALL JS Objects on Heap
> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
3dfe97453121        18    6721  Array
157a020e01        1304     101  <anonymous> (as Constructor): ...
8f1a53211        13879      12  ReactDOMComponent: _tag, tagName, props, ...
8f1a05691        85776       2  Array
3dfe97451a99        36    5589  Array
23e5d7d44351         1  218020  Object: .2f5hpw2hgjk.1.0.3, ...
8f1a05f31        40533       6  <anonymous> (as ReactElement): type, ...
8f1a04da1       252133       1  Array
8f1a04dc1       125869       7  Array
8f1a04f01       114914       8  Array
8f1a04d39       230924       7  Module: id, exports, parent, filename, ...
Memory Leak Strategy
❖ Look at objects on heap for suspicious objects
❖ Take successive core dumps and compare object counts
❖ Growing object counts are likely leaking
❖ Inspect object for more context
❖ Walk reverse references to find root object
Look at Object Delta Between Successive Core Dumps
Uptime = 45 mins
> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
8f1a04d39       230924       7  Module: id, exports, parent, filename, ...
Uptime = 90 mins
> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
8f1a04d39       323454       7  Module: id, exports, parent, filename, ...
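Comparing two dumps by eye works for one row; a script helps with thousands. The sketch below assumes the ::findjsobjects output of each dump was saved as text. It keys rows by property count plus constructor signature rather than by address, which is a choice made here for illustration. `parseCounts` and `growth` are hypothetical helpers, not mdb_v8 features.

```javascript
'use strict';

// Parse lines shaped like "<addr> <#objects> <#props> <constructor>[: props]".
// Header lines, "..." lines, and anything else that does not match are skipped.
function parseCounts(output) {
  const counts = new Map();
  for (const line of output.split('\n')) {
    const match = /^\s*([0-9a-f]+)\s+(\d+)\s+(\d+)\s+(.+)$/.exec(line);
    if (!match) continue;
    const signature = match[3] + ' ' + match[4].trim();
    counts.set(signature, (counts.get(signature) || 0) + Number(match[2]));
  }
  return counts;
}

// List signatures whose object count grew, largest growth first.
function growth(before, after) {
  const rows = [];
  for (const [signature, count] of after) {
    const delta = count - (before.get(signature) || 0);
    if (delta > 0) rows.push({ signature, delta });
  }
  return rows.sort((x, y) => y.delta - x.delta);
}

// Rows taken from the two dumps above (uptime 45 and 90 minutes).
const at45 = '8f1a04d39 230924 7 Module: id, exports, parent, filename, ...';
const at90 = '8f1a04d39 323454 7 Module: id, exports, parent, filename, ...';

const report = growth(parseCounts(at45), parseCounts(at90));
console.log(report);
```

Signatures at the top of the report are the first candidates to inspect with ::jsprint.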
Analyze Leaked Objects
Representative Object
> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
8f1a04d39       323454       7  Module: id, exports, parent, filename, ...
Representative Object, 1 of 323454
Look Closer
> 8f1a04d39::jsprint
{
"id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
"exports": {},
"parent": {
"id": "/apps/node/webapp/middleware/autoClientStrings.js",
"exports": function autoExposeClientStrings,
"parent": [...],
"filename": "/apps/node/webapp/middleware/autoClientStrings.js",
"loaded": true,
"children": [...],
"paths": [...],
},
"filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
Use ::findjsobjects to Find All “Module” Objects
> 8f1a04d39::findjsobjects
8f1a04d39
3fd996bffb39
3fd996bfcff1
3fd996bfbac1
3fd996bf8a19
3fd996bf7949
3fd996bf3ce9
3fd996bf0f19
3fd996bead71
3fd996bea821
3fd996bea001
3fd996be92b1
3fd996be73d1
3fd996be58d1
3fd996bd88b1
3fd996bcb459
3fd996bcaa41
3fd996bc7009
Analyze All 320K+ Objects?
Custom Querying With Pipes and Unix Tools
8f1a04d39::findjsobjects | ::jsprint ! grep filename | sort | uniq -c
Results
...
     1 "filename": "/apps/node/webapp/ui/js/akira/components/messaging/paymentHold.js",
     2 "filename": "/apps/node/webapp/ui/js/common/commonCore.js",
     1 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js",
     3 "filename": "/apps/node/webapp/ui/js/common/presentationTracking/presentationTracking.js",
111061 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js",
  7103 "filename": "/apps/node/webapp/ui/js/pages/reactClientRender.js",
111061 "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
118257 "filename": "/apps/node/webapp/middleware/autoClientStrings.js",
...
Client Side Modules
What’s holding on to these modules?
Aim: Find Root Object
Walk Reverse Refs with ::findjsobjects -r
> 8f1a04d39::findjsobjects -r
8f1a04d39 referred to by 14fd6c5b13c1.parent
Root Object
> 1f313791bb41::jsprint
[
{
"id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
"exports": [...],
"parent": [...],
"filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
"loaded": false,
"children": [...],
"paths": [...],
},
{
"id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
"exports": [...],
"parent": [...],
"filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
"loaded": false,
"children": [...],
"paths": [...],
},
{
"id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
"exports": [...],
"parent": [...],
Spot the Leak
var cache = {};
function checkCache(someModule) {
  var mod = cache[someModule];
  if (!mod) {
    try {
      mod = require(someModule);
      cache[someModule] = mod;
      return mod;
    } catch (e) {
      return {};
    }
  }
  return mod;
}
Module could be client only, must catch
Should cache the fact we caught an exception here
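One way to act on the second annotation is to remember failed lookups as well as successful ones. This is a sketch under assumptions, not the fix that was shipped: the `MISSING` placeholder, the injectable `load` parameter, and the fake `failingLoader` are all hypothetical and exist only so the idea can be exercised without a real client-only module.

```javascript
'use strict';

const cache = new Map();
// Shared placeholder for modules that failed to load (hypothetical).
const MISSING = Object.freeze({});

// `load` defaults to require; it is a parameter only so the sketch can be
// exercised with a fake loader.
function checkCache(someModule, load = require) {
  if (cache.has(someModule)) {
    return cache.get(someModule);
  }
  let mod;
  try {
    mod = load(someModule);
  } catch (e) {
    // Module could be client only: remember the failure so the next
    // request does not call the loader and throw again.
    mod = MISSING;
  }
  cache.set(someModule, mod);
  return mod;
}

// Fake loader that always throws, standing in for a client-only module.
let attempts = 0;
function failingLoader() {
  attempts += 1;
  throw new Error('client-only module');
}

checkCache('./clientOnly.js', failingLoader);
checkCache('./clientOnly.js', failingLoader);
console.log(attempts);
```

The loader runs once however many requests ask for the module, so repeated requests no longer pay for a throwing require each time.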
Root Cause
❖ Node caches metadata for each module
❖ If require process throws an exception, the module metadata is leaked (bug?)
❖ Client side module meant we were throwing during every request, and not caching the fact we tried to require it
❖ Each request leaks 3+ module metadata objects
Memory Leaks
❖ Take successive core dumps (gcore(1))
❖ Compare object counts (::findjsobjects)
❖ Growing objects are likely leaking
❖ Inspect object for more context (::jsprint)
❖ Walk reverse references to find root obj (::findjsobjects -r)
Post Mortem Debugging is Critical to Large Scale Prod Node Deployments
More State than Just Logs
❖ Detailed stack trace (::jsstack)
❖ Function args for each frame (::jsstack -vn0)
❖ Get state of any object and its provenance (::jsprint, ::jsconstructor)
❖ Get source code of any function (::jssource)
❖ Find arbitrary JS objects (::findjsobjects)
❖ Unmodified Node binary!
Production Failures are Inevitable
But We Can Learn From Them
Production Debugging
❖ Runtime Performance
  ❖ CPU profiling/flame graphs
❖ Runtime Crashes
  ❖ Inspect program state with core dumps and mdb
❖ Memory leaks
  ❖ Analyze objects and references with core dumps and mdb
Use the Scientific Method
Epilogue: State of Tooling
❖ Join Working Group https://github.com/nodejs/post-mortem
❖ Help make mdb_v8 cross platform https://github.com/joyent/mdb_v8
❖ Contribute to https://github.com/tjfontaine/lldb-v8 and https://github.com/indutny/llnode
Acknowledgements
❖ mdb_v8
  ❖ Dave Pacheco, TJ Fontaine, Julien Gilli, Bryan Cantrill
❖ CPU Profiling/Flamegraphs
  ❖ Brendan Gregg, Google V8 team, Ali Ijaz Sheikh
❖ Linux Perf
  ❖ Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Peter Zijlstra
❖ lldb-v8
  ❖ TJ Fontaine
❖ llnode
  ❖ Fedor Indutny
Get Involved!
THANKS
❖ Questions? We’re Hiring!
❖ yunong@netflix.com
❖ @yunongx
Citations
❖ Slides 29-32 used with permission from “Java Mixed-Mode Flame Graphs”, Brendan Gregg, Oct 2015
❖ Slide 26 used with permission from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

Weitere Àhnliche Inhalte

Was ist angesagt?

20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Ferran GalĂ­ Reniu
 
ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–
ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–
ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–gree_tech
 
物䜓怜ć‡șăźè©±Up甹
物䜓怜ć‡șăźè©±Up甚物䜓怜ć‡șăźè©±Up甹
物䜓怜ć‡șăźè©±Up甹Takashi Abe
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Anubhav Jain
 
Fantiaから歊ぶgcp運甹ぼノォハォ
Fantiaから歊ぶgcp運甹ぼノォハォFantiaから歊ぶgcp運甹ぼノォハォ
Fantiaから歊ぶgcpé‹ç”šăźăƒŽă‚Šăƒă‚Šè™Žăźç©Ž 開ç™ș柀
 
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark Summit
 
[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger
[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger
[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, StrongerDeep Learning JP
 
Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…
Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…
Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…Hirokatsu Kataoka
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSAmazon Web Services
 
Argoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăš
Argoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăšArgoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăš
Argoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăšShinsaku Kono
 
Open source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하Ʞ
Open source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하ꞰOpen source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하Ʞ
Open source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하ꞰGunHee Lee
 
PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†
PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†
PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†kasaharatt
 
æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„é–€ă€€2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€
æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„é–€ă€€2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„門 2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€
æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„é–€ă€€2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€Tatsuya Matsushima
 
MongoDBăźç›ŁèŠ–
MongoDBăźç›ŁèŠ–MongoDBăźç›ŁèŠ–
MongoDBăźç›ŁèŠ–Tetsutaro Watanabe
 
OpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–č
OpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–čOpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–č
OpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–čToru Makabe
 
Javaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·š
Javaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·šJavaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·š
Javaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·šYuichi Sakuraba
 
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...JaeJun Yoo
 

Was ist angesagt? (20)

20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–
ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–
ă‚°ăƒ©ăƒ•ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čNeo4Jă§ă‚ąă‚»ăƒƒăƒˆăƒ€ă‚Šăƒłăƒ­ăƒŒăƒ‰ăźæ§‹æˆçźĄç†ăšæœ€é©ćŒ–
 
物䜓怜ć‡șăźè©±Up甹
物䜓怜ć‡șăźè©±Up甚物䜓怜ć‡șăźè©±Up甹
物䜓怜ć‡șăźè©±Up甹
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Fantiaから歊ぶgcp運甹ぼノォハォ
Fantiaから歊ぶgcp運甹ぼノォハォFantiaから歊ぶgcp運甹ぼノォハォ
Fantiaから歊ぶgcp運甹ぼノォハォ
 
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
 
[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger
[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger
[DLèŒȘèȘ­äŒš]YOLO9000: Better, Faster, Stronger
 
Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…
Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…
Pythonă«ă‚ˆă‚‹CVă‚ąăƒ«ă‚ŽăƒȘă‚șăƒ ćźŸèŁ…
 
Apache Solr æ€œçŽąă‚šăƒłă‚žăƒłć…„é–€
Apache Solr æ€œçŽąă‚šăƒłă‚žăƒłć…„é–€Apache Solr æ€œçŽąă‚šăƒłă‚žăƒłć…„é–€
Apache Solr æ€œçŽąă‚šăƒłă‚žăƒłć…„é–€
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
Argoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăš
Argoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăšArgoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăš
Argoă«ă‚ˆă‚‹æ©Ÿæą°ć­Šçż’ćźŸèĄŒćŸșç›€ăźæ§‹çŻ‰ăƒ»é‹ç”šă‹ă‚‰ăżăˆăŠăăŸă“ăš
 
Open source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하Ʞ
Open source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하ꞰOpen source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하Ʞ
Open source APM Scouter로 ëȘšë‹ˆí„°ë§ 잘 하Ʞ
 
PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†
PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†
PostgreSQLăźé–ąæ•°ć±žæ€§ă‚’çŸ„ă‚ă†
 
æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„é–€ă€€2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€
æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„é–€ă€€2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„門 2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€
æ·±ć±€ćŒ·ćŒ–ć­Šçż’ć…„é–€ă€€2020ćčŽćșŠDeep LearningćŸșç€ŽèŹ›ćș§ă€ŒćŒ·ćŒ–ć­Šçż’ă€
 
MongoDBăźç›ŁèŠ–
MongoDBăźç›ŁèŠ–MongoDBăźç›ŁèŠ–
MongoDBăźç›ŁèŠ–
 
OpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–č
OpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–čOpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–č
OpenStackè¶…ć…„é–€ă‚·ăƒȘăƒŒă‚ș ă„ăŸă•ă‚‰èžă‘ăȘいSwiftăźäœżă„æ–č
 
Javaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·š
Javaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·šJavaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·š
Javaăźăƒ—ăƒ­ă‚°ăƒ©ăƒ ăŻă©ă†ă‚„ăŁăŠć‹•ă„ăŠă„ă‚‹ăź? JVMç·š
 
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Anal...
 
GPT
GPTGPT
GPT
 

Andere mochten auch

[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike BackAlex Liu
 
Observable Node.js Applications - EnterpriseJS
Observable Node.js Applications - EnterpriseJSObservable Node.js Applications - EnterpriseJS
Observable Node.js Applications - EnterpriseJSYunong Xiao
 
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsNetflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsChris Saint-Amant
 
To Err Is Human
To Err Is HumanTo Err Is Human
To Err Is HumanAlex Liu
 
Slaying Monoliths with Node and Docker
Slaying Monoliths with Node and DockerSlaying Monoliths with Node and Docker
Slaying Monoliths with Node and DockerYunong Xiao
 
Debugging node in prod
Debugging node in prodDebugging node in prod
Debugging node in prodYunong Xiao
 

Node Interactive Debugging Node.js In Production

  • 1. Debugging Node.js in Production Yunong Xiao @yunongx Software Engineer Node Platform
  • 2. Node.js @ Netflix ❖ 65+ Million Subscribers ❖ Website (netflix.com) ❖ Dynamic asset packager ❖ PaaS on Node ❖ Internal Services
  • 3.
  • 4. –Gene Kranz, Flight Director, Apollo 13 “Let's work the problem, people. Let's not make things any worse by guessing”
  • 5. Apply the Scientific Method 1. Construct a Hypothesis 2. Collect data 3. Analyze data and draw a conclusion 4. Repeat
  • 6. Production Crisis ❖ Runtime Performance ❖ Runtime Crashes ❖ Memory Leaks
  • 9. to the Rescue [2014-12-09T14:07:26.293Z] INFO: shakti/restify-audit/20067: handled: 200, latency=1402 (req_id=b3fa3820-7fac-11e4-8908-a5c7b70d676f, latency=1435) GET / HTTP/1.1 host: www.netflix.com -- HTTP/1.1 200 OK x-netflix.client.instance: i-057e47ef x-frame-options: DENY content-type: text/html -- req.timers: { "parseBody": 700123, "apiRpc": 701911, "render": 400031 }
  • 10. req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031, } On CPU
  • 11. CPU is Critical ❖ Node is essentially “single threaded” ❖ Cascading effect on ALL requests in process
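The cascading effect can be reproduced in a few lines. This is a minimal sketch, not code from the talk: `busyWait` and `measureTimerDelay` are hypothetical helper names, and the 200 ms figure is arbitrary. It schedules a 0 ms timer, then keeps the CPU busy, and reports how late the timer actually fired.

```javascript
// Hypothetical helper: spin on the CPU for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Synchronous work: nothing else in this process can run meanwhile.
  }
}

// Schedule a 0 ms timer, then block the event loop for `blockMs`.
// Resolves with how long the timer really took to fire.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ~${delay} ms`);
});
```

Any request callback queued behind the busy handler waits the same way the timer does.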
  • 12. req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031, } Can’t process ANY other request for 1.1 seconds On CPU
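The 1.1 second figure follows from adding up the handlers that run on CPU. The sketch below assumes the timer values are microseconds (which is what makes the slide's number add up) and assumes `parseBody` and `render` are the CPU-bound handlers while `apiRPC` mostly waits on I/O. `onCpuMicros` is a hypothetical helper name.

```javascript
// Timer values copied from the slide; assumed to be microseconds.
const timers = { parseBody: 700123, apiRPC: 301911, render: 400031 };

// Assumption: these handlers are CPU-bound; apiRPC is mostly I/O wait.
const cpuBoundHandlers = ['parseBody', 'render'];

// Sum the time spent in the handlers that keep the event loop busy.
function onCpuMicros(allTimers, cpuBound) {
  return cpuBound.reduce((sum, name) => sum + (allTimers[name] || 0), 0);
}

const blockedMicros = onCpuMicros(timers, cpuBoundHandlers);
console.log((blockedMicros / 1e6).toFixed(1) + ' s on CPU');
```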
  • 13. How Much Code? $ find . -name "*.js*" | xargs cat | wc -l 6 042 301
  • 15. Snapshot What’s Currently Executing Stacktrace: A stack trace is a report of the active stack frames at a certain point in time during the execution of a program. > console.log(ex, ex.stack.split("\n")) ReferenceError: ex is not defined at repl:1:13 at REPLServer.defaultEval (repl.js:132:27) at bound (domain.js:254:14) at REPLServer.runBound [as eval] (domain.js:267:12) at REPLServer.<anonymous> (repl.js:279:12) at REPLServer.emit (events.js:107:17) at REPLServer.Interface._onLine (readline.js:214:10) at REPLServer.Interface._line (readline.js:553:8) at REPLServer.Interface._ttyWrite (readline.js:830:14) at ReadStream.onkeypress (readline.js:109:10)
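From inside a program, the same kind of snapshot can be taken by constructing an `Error` and reading its V8-specific `stack` property. This is a minimal sketch; `captureStack`, `outer`, and `inner` are hypothetical names used only for illustration.

```javascript
// Capture the frames that are active at the moment of the call.
function captureStack() {
  // V8 records the current call stack when the Error is constructed.
  const lines = new Error('snapshot').stack.split('\n');
  // Drop the first line ("Error: snapshot") and keep the "at ..." frames.
  return lines.slice(1).map((line) => line.trim());
}

function inner() {
  return captureStack();
}

function outer() {
  return inner();
}

// Innermost frame first: captureStack, then inner, then outer.
console.log(outer().slice(0, 3));
```

This only shows one stack at one instant, and only when the program asks for it, which is why the talk goes on to sample stacks from outside the process.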
  • 16. Two Problems 1) How to sample stack traces from a running process? 2) How to do 1) without affecting the process?
  • 17. Linux Perf Events PERF(1) perf Manual PERF(1) NAME perf - Performance analysis tools for Linux SYNOPSIS perf [--version] [--help] COMMAND [ARGS] DESCRIPTION Performance counters for Linux are a new kernel-based subsystem that provide a framework for all things performance analysis. It covers hardware level (CPU/PMU, Performance Monitoring Unit) features and software features (software counters, tracepoints) as well.
  • 18. Sample Stack Traces w/ perf(1) # perf record -F 99 -p `pgrep -n node` -g -- sleep 30 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.524 MB perf.data (~22912 samples) ]
  • 19. Sample Stack Trace ab2fee v8::internal::Heap::DeoptMarkedAllocationSites() (/apps/node/bin/ a69754 v8::internal::StackGuard::HandleInterrupts() (/apps/node/bin/node) c9f13b v8::internal::Runtime_StackGuard(int, v8::internal::Object** 3c793e3060bb (/tmp/perf-5382.map) 3c793e3060bb (/tmp/perf-5382.map) 3c793e3060bb (/tmp/perf-5382.map) 3c793e3060bb (/tmp/perf-5382.map) (repeated 30 more lines) 8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) 8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>,... df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node) fb1597 uv__run_check (/apps/node/bin/node) fabcee uv_run (/apps/node/bin/node) dfaa50 node::Start(int, char**) (/apps/node/bin/node) 7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so) Missing JS Frames
  • 20. Why? V8 compiles JavaScript JIT (Just in Time), so symbols for JS functions are not in the node binary where perf looks for them
  • 21. node --perf_basic_prof_only_functions “outputs the files in a format that the existing perf tool can consume.”
  • 22. node --perf_basic_prof_only_functions Available right now in Node v5.x Coming soon to Node v4.3: https://github.com/nodejs/node/pull/3609
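To make the role of the map file concrete, here is a sketch of how an address from a sample could be resolved against it. It assumes the usual perf map layout of one entry per line, `START SIZE NAME`, with start and size in hexadecimal. The sample entries, paths, and the names `parsePerfMap` and `resolveSymbol` are hypothetical, not taken from the talk.

```javascript
// Parse perf map text: each line is "<hex start> <hex size> <symbol name>".
function parsePerfMap(text) {
  const entries = [];
  for (const line of text.split('\n')) {
    const m = /^([0-9a-f]+) ([0-9a-f]+) (.*)$/i.exec(line);
    if (!m) continue; // skip blank or malformed lines
    entries.push({
      start: parseInt(m[1], 16),
      size: parseInt(m[2], 16),
      name: m[3],
    });
  }
  return entries;
}

// Find the JIT-compiled function whose address range contains `addr`.
function resolveSymbol(entries, addr) {
  const hit = entries.find((e) => addr >= e.start && addr < e.start + e.size);
  return hit ? hit.name : null;
}

// Hypothetical map contents, for illustration only.
const sampleMap = [
  '3c793e300000 100 LazyCompile:*hypotheticalHandler /app/example.js:10',
  '3c793e300100 80 Function:~ /app/other.js:5',
].join('\n');

const entries = parsePerfMap(sampleMap);
console.log(resolveSymbol(entries, 0x3c793e300040));
```

perf performs this lookup itself when it finds `/tmp/perf-<pid>.map`; the sketch only illustrates why the file turns bare addresses into function names.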
  • 23. Results node 5382 cpu-clock: JS Frames: 3c793e38b0c1 LazyCompile:DELETE native runtime.js:349 (/tmp/perf-5382.map) 3c793e31981d Builtin:JSConstructStubGeneric (/tmp/perf-5382.map) 3c793ff2ca94 (/tmp/perf-5382.map) 3c793e98a10f LazyCompile:~AtlasClient._run /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:85 (/tmp/perf-5382.map) 3c793f47de29 LazyCompile:*AtlasClient.timer /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:70 (/tmp/perf-5382.map) 3c793e9eee38 LazyCompile:~fetchSingleGetCallback /apps/node/webapp/singletons/ShaktiFetcher.js:120 (/tmp/perf-5382.map) 3c793f6cffee LazyCompile:*Model.get /apps/node/webapp/node_modules/nf-models/lib/Model.js:90 (/tmp/perf-5382.map) 3c793ed3e2ad (/tmp/perf-5382.map) 3c7940e4357b Handler:ca (/tmp/perf-5382.map) 3c793f060e3c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:134 (/tmp/perf-5382.map) 3c79404edbfa (/tmp/perf-5382.map) 3c79401fd3f7 (/tmp/perf-5382.map) 3c79400e307b LazyCompile:*fetchMulti /apps/node/webapp/singletons/ShaktiFetcher.js:50 (/tmp/perf-5382.map) 3c793fb9a59f LazyCompile:*fetch /apps/node/webapp/singletons/ShaktiFetcher.js:32 (/tmp/perf-5382.map) 3c793e896697 (/tmp/perf-5382.map) 3c7943aaabbe (/tmp/perf-5382.map) 3c793ef4c53c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:245 (/tmp/perf-5382.map) 3c793eaf4f01 LazyCompile:* /apps/node/webapp/node_modules/nf-packager/lib/index.js:194 (/tmp/perf-5382.map) 3c793eab130a LazyCompile:processImmediate timers.js:352 (/tmp/perf-5382.map) 3c793e319f7d Builtin:JSEntryTrampoline (/tmp/perf-5382.map) 3c793e3189e2 Stub:JSEntryStub (/tmp/perf-5382.map) Native Frames: a65baf v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/apps/node/bin/node) 8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) 8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node) fb1597 uv__run_check (/apps/node/bin/node) fabcee uv_run (/apps/node/bin/node) dfaa50 node::Start(int, char**) (/apps/node/bin/node) 7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)
  • 24. Problem: Too Many Traces $ cat out.nodestacks01 | grep cpu-clock | wc -l 744 $ wc -l out.nodestacks01 58116
  • 27. Flamegraph ❖ Each box represents a function in the stack (stack frame) ❖ x-axis: percent of time on CPU ❖ y-axis: stack depth ❖ colors: random, or can be a dimension ❖ https://github.com/brendangregg/FlameGraph Labels: v8, libc, JS, built-ins
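Before a flame graph can be drawn, the hundreds of sampled stacks are collapsed so that identical stacks become a single line with a count. The sketch below shows that folding step in plain JavaScript. It is an illustration of the idea rather than the FlameGraph project's own script; the sample stacks and the name `foldStacks` are hypothetical.

```javascript
// Collapse sampled stacks (root frame first) into "a;b;c" -> count.
function foldStacks(samples) {
  const counts = new Map();
  for (const frames of samples) {
    const key = frames.join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical samples: each array is one sampled stack.
const samples = [
  ['a', 'b', 'c'],
  ['a', 'b', 'c'],
  ['a', 'b', 'd'],
  ['a', 'h'],
];

const folded = foldStacks(samples);
for (const [stack, count] of folded) {
  console.log(`${stack} ${count}`);
}
```

Four samples fold into three lines here; on real data the reduction is what makes tens of thousands of lines of perf output manageable.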
  • 28. Flame Graph Interpretation a() b() h() c() d() e() f() g() i()
  • 29. Flame Graph Interpretation Top edge shows who is running on-CPU, and how much (width) a() b() h() c() d() e() f() g() i()
  • 30. Flame Graph Interpretation Top-down shows ancestry e.g., from g(): h() d() e() i() a() b() c() f() g()
  • 31. Flame Graph Interpretation a() b() h() c() d() e() f() g() i() Widths are proportional to presence in samples e.g., comparing b() to h() (incl. children)
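The width rule can be written down directly: a frame's width is the share of samples in which it appears anywhere in the stack, so children count toward their parents. The sketch below uses hypothetical sample counts with the slide's function names; `inclusiveShare` is an illustrative helper, not part of any flame graph tool.

```javascript
// Share of all samples whose stack contains `frame` (children included).
function inclusiveShare(foldedCounts, frame) {
  let total = 0;
  let hits = 0;
  for (const [stack, count] of Object.entries(foldedCounts)) {
    total += count;
    if (stack.split(';').includes(frame)) hits += count;
  }
  return total === 0 ? 0 : hits / total;
}

// Hypothetical folded stacks: "root;...;leaf" -> number of samples.
const foldedCounts = {
  'a;b;c': 6,
  'a;b;d': 2,
  'a;h;i': 2,
};

console.log('b():', inclusiveShare(foldedCounts, 'b')); // wider box
console.log('h():', inclusiveShare(foldedCounts, 'h')); // narrower box
```

With these counts b() covers four times as many samples as h(), so its box would be four times as wide.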
  • 32.
  • 33.
  • 34. > 50% time on CPU
  • 36. function merge(object) { var args = arguments, length = 2; ...
  • 39. After
  • 40. Flame Graphs Help You Find 1 LoC out of 6 Million
  • 41. Results ❖ Dramatically reduced request latency ❖ Reduced CPU utilization ❖ Increased throughput
  • 42. Runtime Performance Technique ❖ Sample stack traces via perf(1) ❖ Visualize code distribution with CPU flame graphs ❖ Identify candidate code paths for performance improvement ❖ Repeat
  • 44. - ChaïŹn, R. "Pioneer F & G Telemetry and Command Processor Core Dump Program." JPL Technical Report XVI, no. 32-1526 (1971): 174. “The method described in this article was designed to provide a core dump
 with a minimal impact on the spacecraft
 as the resumption of data acquisition from the spacecraft is the highest priority.”
  • 45. Core Dumps — A Brief History ❖ Magnetic core memory ❖ Dump out the contents of “core” memory for debugging ❖ “Core dump” was born ❖ Initially printed on paper! ❖ Postmortem debugging was born!
  • 46.
  • 47. Production Constraints ❖ Uptime is critical ❖ Not easily reproducible ❖ Can’t simulate environment ❖ Resume normal operations ASAP
  • 48. Postmortem Debugging Take core dump Restart app Load core dump elsewhere Engineer Fix Debug Continue serving traffic
  • 49. Configure Node to Dump Core on Error $ node --abort_on_uncaught_exception throw.js Uncaught Error FROM Object.<anonymous> (/Users/yunong/throw.js:1:63) Module._compile (module.js:435:26) Object.Module._extensions..js (module.js:442:10) Module.load (module.js:356:32) Function.Module._load (module.js:311:12) Function.Module.runMain (module.js:467:10) startup (node.js:134:18) node.js:961:3 [1] 4131 illegal hardware instruction (core dumped) node --abort_on_uncaught_exception throw.js
  • 50. Node Post Mortem Tooling ❖ Netflix uses Linux in Prod ❖ Linux: Work in progress ❖ https://github.com/tjfontaine/lldb-v8 ❖ https://github.com/indutny/llnode ❖ Solaris: Full featured, compatible with Linux cores ❖ https://github.com/joyent/mdb_v8
  • 51.
  • 52. Socks & Duct Tape: Set Up a Debug Solaris Instance EC2: http://omnios.omniti.com/wiki.php/Installation#IntheCloud VM: http://omnios.omniti.com/wiki.php/Installation#Quickstart
  • 53. Post Mortem Methodology ❖ Where: Inspect stack trace ❖ Why: Inspect heap and stack variable state
  • 54. mdb(1) JS commands ❖ ::help <cmd> ❖ ::jsstack ❖ ::jsprint ❖ ::jssource ❖ ::jsconstructor ❖ ::findjsobjects ❖ ::jsfunctions
  • 55. Load the Core Dump # mdb ./node-v4.2.2-linux/node-v4.2.2-linux-x64/bin/node ./core.7186 > ::load ./mdb_v8_amd64.so mdb_v8 version: 1.1.1 (release, from 28cedf2) V8 version: 143.156.132.195 Autoconfigured V8 support from target C++ symbol demangling enabled Labels: linux node binary, core dump, load mdb_v8 module
  • 56. ::jsstack > ::jsstack js: test js: storeHeader js: <anonymous> (as OutgoingMessage._storeHeader) js: <anonymous> (as ServerResponse.writeHead) js: restifyWriteHead js: _cb js: send js: <anonymous> (as <anon>) js: <anonymous> (as ReactRenderer._renderLayout) js: <anonymous> (as <anon>) js: <anonymous> (as <anon>) js: <anonymous> (as dispatchHandler) js: <anonymous> (as <anon>) js: runHooks js: runTransitionToHooks js: <anonymous> (as assign.to) js: <anonymous> (as <anon>) js: runHooks js: runTransitionFromHooks js: <anonymous> (as assign.from) js: <anonymous> (as React.createClass.statics.dispatch) native: _ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0... native: v8::internal::Execution::Call+0xc8 native: v8::internal::Runtime_Apply+0x1ce frame type func name
  • 57. Always name your functions! var foo = function foo() {}; Foo.prototype.bar = function bar() {}; foo(function bar() {});
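The payoff of naming shows up in any stack trace, not only in `::jsstack`. In this small sketch, `stackOf` and `loadUser` are hypothetical names. It throws from a named and from an anonymous function expression and prints the top frame of each.

```javascript
// Run `fn`, and return the stack of whatever error it throws.
function stackOf(fn) {
  try {
    fn();
  } catch (err) {
    return err.stack;
  }
  return '';
}

// Named function expression: the name appears in the top frame.
const namedStack = stackOf(function loadUser() {
  throw new Error('boom');
});

// Anonymous function expression: the top frame carries no function name.
const anonymousStack = stackOf(function () {
  throw new Error('boom');
});

console.log(namedStack.split('\n')[1].trim());
console.log(anonymousStack.split('\n')[1].trim());
```

The first printed frame reads `at loadUser (...)`, while the second shows only a location, which is the same difference that separates `js: storeHeader` from `js: <anonymous>` in the postmortem output.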
  • 58. ::jsstack -v Frame Source > ::jsstack -v js: storeHeader file: http.js posn: position 18774 this: 2ad561306c91 (<unknown>) arg1: 3bd67e0669b9 (JSObject: ServerResponse) arg2: 3dfe966ae299 (JSObject: Object) arg3: 34d5391d8859 (SeqAsciiString) arg4: 34d5391d8881 (SeqAsciiString) 652 653 function storeHeader(self, state, field, value) { 654 // Protect against response splitting. The if statement is there to 655 // minimize the performance impact in the common case. 656 if (/[\r\n]/.test(value)) 657 value = value.replace(/[\r\n]+[ \t]*/g, ''); 658 659 state.messageHeader += field + ': ' + value + CRLF; 660 661 if (connectionExpression.test(field)) { 662 state.sentConnectionHeader = true; 663 if (closeExpression.test(value)) { 664 self._last = true; 665 } else { 666 self.shouldKeepAlive = true; 667 } 668
  • 59. ::jsstack -vn0 Frame and Function Args > ::jsstack -vn0 js: test file: native regexp.js posn: position 2677 this: 2421205bd4d9 (JSRegExp) arg1: 34d5391d8859 (SeqAsciiString) js: storeHeader file: http.js posn: position 18774 this: 2ad561306c91 (<unknown>) arg1: 3bd67e0669b9 (JSObject: ServerResponse) arg2: 3dfe966ae299 (JSObject: Object) arg3: 34d5391d8859 (SeqAsciiString) arg4: 34d5391d8881 (SeqAsciiString) js: <anonymous> (as OutgoingMessage._storeHeader) file: http.js posn: position 15652 this: 3bd67e0669b9 (JSObject: ServerResponse) arg1: 3dfe966ae271 (ConsString) arg2: 3dfe966add99 (JSObject: Object) js: restifyWriteHead file: /apps/node/webapp/node_modules/restify/lib/response.js posn: position 6964 this: 3bd67e0669b9 (JSObject: ServerResponse) (1 internal frame elided) js: _cb Func Name JS File Line # Func Args
  • 60. ::jsstack Function Args > ::jsstack -vn0 js: test file: native regexp.js posn: position 2677 this: 2421205bd4d9 (JSRegExp) arg1: 34d5391d8859 (SeqAsciiString) js: storeHeader file: http.js posn: position 18774 this: 2ad561306c91 (<unknown>) arg1: 3bd67e0669b9 (JSObject: ServerResponse) arg2: 3dfe966ae299 (JSObject: Object) arg3: 34d5391d8859 (SeqAsciiString) arg4: 34d5391d8881 (SeqAsciiString) js: <anonymous> (as OutgoingMessage._storeHeader) file: http.js posn: position 15652 this: 3bd67e0669b9 (JSObject: ServerResponse) arg1: 3dfe966ae271 (ConsString) arg2: 3dfe966add99 (JSObject: Object) js: restifyWriteHead file: /apps/node/webapp/node_modules/restify/lib/response.js posn: position 6964 this: 3bd67e0669b9 (JSObject: ServerResponse) (1 internal frame elided) js: _cb Memory Address of Var Var Type
  • 61. ::jsprint Print JS Objects > 3bd67e0669b9::jsprint { "_time": 1437690472539, "_headers": { "content-type": "text/html", "req_id": "5b7f18f2-7f12-4c68-b07f-3cd75698ba65", "set-cookie": "CENSORED; Domain=.netflix.com; Expires=Fri, 24 Jul 2015 10:27:52 GMT", "x-frame-options": "DENY", "x-ua-compatible": "IE=edge", "x-netflix.client.instance": "i-c420596c", }, "output": [], "_last": false, "_hangupClose": false, "_hasBody": true, "socket": { "_connecting": false, "_handle": [...], "_readableState": [...], "readable": true, "domain": null, "_events": [...], "_maxListeners": 10, "_writableState": [...], "writable": true, "allowHalfOpen": true, Actual JS Object Instance
  • 62. ::jsconstructor Show Object Constructor > 3bd67e0669b9::jsconstructor -v ServerResponse (JSFunction: 2421205bced9)
  • 63. ::jssource Print f() Source > 2421205bced9::jssource file: http.js 1066 function ServerResponse(req) { 1067 OutgoingMessage.call(this); 1068 1069 if (req.method === 'HEAD') this._hasBody = false; 1070 1071 this.sendDate = true; 1072 1073 if (req.httpVersionMajor < 1 || req.httpVersionMinor < 1) { 1074 this.useChunkedEncodingByDefault = chunkExpression.test(req.headers.te); 1075 this.shouldKeepAlive = false; 1076 } 1077 } 1078 util.inherits(ServerResponse, OutgoingMessage);
  • 64. Core Dump === Complete Process State
  • 70. gcore(1) GNU Tools gcore(1) NAME gcore - Generate a core file for a running process SYNOPSIS gcore [-o filename] pid
  • 71. Take a Core Dump! root@demo:~# gcore `pgrep node` [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New Thread 0x7facaeffd700 (LWP 5650)] [New Thread 0x7facaf7fe700 (LWP 5649)] [New Thread 0x7facaffff700 (LWP 5648)] [New Thread 0x7facbc967700 (LWP 5647)] [New Thread 0x7facbd168700 (LWP 5617)] [New Thread 0x7facbd969700 (LWP 5616)] [New Thread 0x7facbe16a700 (LWP 5615)] [New Thread 0x7facbe96b700 (LWP 5614)] 0x00007facbea5b5a9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6 Saved corefile core.5602
  • 73. ::findjsobjects NAME findjsobjects - find JavaScript objects SYNOPSIS [ addr ] ::findjsobjects [-vb] [-r | -c cons | -p prop]
  • 74. ::findjsobjects Find ALL JS Objects on Heap > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 3dfe97453121 18 6721 Array 157a020e01 1304 101 <anonymous> (as Constructor): ... 8f1a53211 13879 12 ReactDOMComponent: _tag, tagName, props, ... 8f1a05691 85776 2 Array 3dfe97451a99 36 5589 Array 23e5d7d44351 1 218020 Object: .2f5hpw2hgjk.1.0.3, ... 8f1a05f31 40533 6 <anonymous> (as ReactElement): type, ... 8f1a04da1 252133 1 Array 8f1a04dc1 125869 7 Array 8f1a04f01 114914 8 Array 8f1a04d39 230924 7 Module: id, exports, parent, filename, ...
  • 75. Memory Leak Strategy ❖ Look at objects on heap for suspicious objects ❖ Take successive core dumps and compare object counts ❖ Growing object counts are likely leaking ❖ Inspect object for more context ❖ Walk reverse references to find root object
  • 76. Look at Object Delta Between Successive Core Dumps
  • 77. Uptime = 45 mins > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 8f1a04d39 230924 7 Module: id, exports, parent, filename, ...
  • 78. Uptime = 90 mins > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 8f1a04d39 323454 7 Module: id, exports, parent, filename, ...
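The comparison between the two dumps above can be automated. The following is a minimal sketch, assuming the `::findjsobjects` output of each core dump has been saved as text and that its columns follow the layout shown on the slides (OBJECT, #OBJECTS, #PROPS, CONSTRUCTOR). The function names `parseFindjsobjects` and `diffCounts` are hypothetical helpers and are not part of mdb_v8.

```javascript
// Hypothetical helper: turn saved ::findjsobjects output into a Map of
// constructor name -> total object count. Rows that share a constructor
// (for example several "Array" rows) are summed together.
function parseFindjsobjects(text) {
  const counts = new Map();
  for (const line of text.split("\n")) {
    // Assumed columns: OBJECT  #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
    const m = line.trim().match(/^([0-9a-f]+)\s+(\d+)\s+(\d+)\s+(.+)$/);
    if (!m) continue; // skips the header row and blank lines
    const ctor = m[4].split(":")[0].trim();
    counts.set(ctor, (counts.get(ctor) || 0) + Number(m[2]));
  }
  return counts;
}

// Hypothetical helper: list constructors whose object count grew between
// two dumps, largest growth first.
function diffCounts(before, after) {
  const growth = [];
  for (const [ctor, afterCount] of after) {
    const beforeCount = before.get(ctor) || 0;
    const delta = afterCount - beforeCount;
    if (delta > 0) {
      growth.push({ ctor, before: beforeCount, after: afterCount, delta });
    }
  }
  return growth.sort((a, b) => b.delta - a.delta);
}
```

Fed the two `Module` rows from the 45 minute and 90 minute dumps, `diffCounts` would report `Module` as growing, which is the signal the strategy looks for. Constructors with a stable count drop out of the result.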
  • 80. Representative Object > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 8f1a04d39 323454 7 Module: id, exports, parent, filename, ... Representative Object, 1 of 323454
  • 81. Look Closer > 8f1a04d39::jsprint { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": {}, "parent": { "id": "/apps/node/webapp/middleware/autoClientStrings.js", "exports": function autoExposeClientStrings, "parent": [...], "filename": "/apps/node/webapp/middleware/autoClientStrings.js", "loaded": true, "children": [...], "paths": [...], }, "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
  • 82. Use ::findjsobjects to Find All “Module” Objects > 8f1a04d39::findjsobjects 8f1a04d39 3fd996bffb39 3fd996bfcff1 3fd996bfbac1 3fd996bf8a19 3fd996bf7949 3fd996bf3ce9 3fd996bf0f19 3fd996bead71 3fd996bea821 3fd996bea001 3fd996be92b1 3fd996be73d1 3fd996be58d1 3fd996bd88b1 3fd996bcb459 3fd996bcaa41 3fd996bc7009
  • 83. Analyze All 320K+ Objects?
  • 84. Custom Querying With Pipes and Unix Tools 8f1a04d39::findjsobjects | ::jsprint ! grep filename | sort | uniq -c
  • 85. Results ... 1 "filename": "/apps/node/webapp/ui/js/akira/components/messaging/paymentHold.js", 2 "filename": "/apps/node/webapp/ui/js/common/commonCore.js", 1 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js", 3 "filename": "/apps/node/webapp/ui/js/common/presentationTracking/presentationTracking.js", 111061 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js", 7103 "filename": "/apps/node/webapp/ui/js/pages/reactClientRender.js", 111061 "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js", 118257 "filename": "/apps/node/webapp/middleware/autoClientStrings.js", ... Client Side Modules
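The same tally can be produced in Node when the `::jsprint` output has been saved to a file rather than piped through shell tools. This is a minimal sketch under that assumption. `countFilenames` is a hypothetical helper, and it goes one step beyond `sort | uniq -c` by ordering the result with the most frequent filename first.

```javascript
// Hypothetical helper: count every "filename": "..." entry in saved
// ::jsprint output. Like the grep in the pipeline, it also counts the
// filename fields of nested "parent" objects.
function countFilenames(jsprintText) {
  const counts = new Map();
  for (const m of jsprintText.matchAll(/"filename":\s*"([^"]+)"/g)) {
    counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  // Array of [filename, count] pairs, highest count first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

Sorting by count puts the heavily duplicated modules at the top, so the suspicious entries do not have to be picked out of a long alphabetical listing.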
  • 86. What’s holding on to these modules?
  • 87. Aim: Find Root Object
  • 88. Walk Reverse Refs with ::findjsobjects -r > 8f1a04d39::findjsobjects -r 8f1a04d39 referred to by 14fd6c5b13c1.parent
  • 89. Root Object > 1f313791bb41::jsprint [ { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": [...], "parent": [...], "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js", "loaded": false, "children": [...], "paths": [...], }, { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": [...], "parent": [...], "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js", "loaded": false, "children": [...], "paths": [...], }, { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": [...], "parent": [...],
  • 90. Spot the Leak var cache = {}; function checkCache(someModule) { var mod = cache[someModule]; if (!mod) { try { mod = require(someModule); cache[someModule] = mod; return mod; } catch (e) { return {}; } } return mod; } Annotations: the module could be client only, so the require must be wrapped in a try/catch. The catch branch should also cache the fact that an exception was caught.
  • 91. Root Cause ❖ Node caches metadata for each module ❖ If require process throws an exception, the module metadata is leaked (bug?) ❖ Client side module meant we were throwing during every request, and not caching the fact we tried to require it ❖ Each request leaks 3+ module metadata objects
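One way to act on this root cause is to cache failed loads as well as successful ones, so `require` is attempted at most once per module name. The following is a sketch of that idea and not necessarily the fix that was shipped. The `load` parameter is a hypothetical addition that makes the function testable without real modules, and `moduleCache` is an illustrative name.

```javascript
// Cache of module name -> loaded module, or {} when loading failed.
const moduleCache = new Map();

// Sketch of a leak-free checkCache: the result of a failed require is
// remembered, so a client-only module throws once instead of on every
// request. `load` defaults to require and is an assumption for testing.
function checkCache(someModule, load = require) {
  if (moduleCache.has(someModule)) {
    return moduleCache.get(someModule);
  }
  let mod;
  try {
    mod = load(someModule);
  } catch (e) {
    // Module could be client only; fall back to an empty object.
    mod = {};
  }
  moduleCache.set(someModule, mod);
  return mod;
}
```

Using `Map.prototype.has` instead of a truthiness check means the cached empty object is still a cache hit, so the throwing `require` path is not taken again for the same name.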
  • 92. Memory Leaks ❖ Take successive core dumps (gcore(1)) ❖ Compare object counts (::findjsobjects) ❖ Growing objects are likely leaking ❖ Inspect object for more context (::jsprint) ❖ Walk reverse references to find root obj (::findjsobjects -r)
  • 93. Post Mortem Debugging is Critical to Large Scale Prod Node Deployments
  • 94. More State than Just Logs ❖ Detailed stack trace (::jsstack) ❖ Function args for each frame (::jsstack -vn0) ❖ Get state of any object and its provenance (::jsprint, ::jsconstructor) ❖ Get source code of any function (::jssource) ❖ Find arbitrary JS objects (::findjsobjects) ❖ Unmodified Node binary!
  • 96. But We Can Learn From Them
  • 97. Production Debugging ❖ Runtime Performance ❖ CPU profiling/flame graphs ❖ Runtime Crashes ❖ Inspect program state with core dumps and mdb ❖ Memory leaks ❖ Analyze objects and references with core dumps and mdb
  • 99. Epilogue: State of Tooling ❖ Join Working Group https://github.com/nodejs/post-mortem ❖ Help make mdb_v8 cross platform https://github.com/joyent/mdb_v8 ❖ Contribute to https://github.com/tjfontaine/lldb-v8 and https://github.com/indutny/llnode
  • 100. Acknowledgements ❖ mdb_v8 ❖ Dave Pacheco, TJ Fontaine, Julien Gilli, Bryan Cantrill ❖ CPU Profiling/Flamegraphs ❖ Brendan Gregg, Google V8 team, Ali Ijaz Sheikh ❖ Linux Perf ❖ Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Peter Zijlstra ❖ lldb-v8 ❖ TJ Fontaine ❖ llnode ❖ Fedor Indutny
  • 102. THANKS ❖ Questions? We’re Hiring! ❖ yunong@netflix.com ❖ @yunongx
  • 103. Citations ❖ Slides 29-32 used with permission from “Java Mixed-Mode Flame Graphs”, Brendan Gregg, Oct 2015 ❖ Slide 26 used with permission from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html