Times have changed. Multi-core CPUs have become the norm and multi-threading has been replaced by asynchronous programming. You think you know everything about async/await... until something goes wrong. While debugging synchronous code can be straightforward, investigating an asynchronous deadlock or race condition proves to be surprisingly tricky.
In this talk, follow us through real-life investigations that cover the main asynchronous code patterns that can go wrong. You will stumble upon deadlocks and understand the reasons behind ThreadPool thread starvation.
In addition to WinDbg magic for following async/await chains, we will not forget the Visual Studio goodies that let you quickly analyze hundreds of call stacks or task states.
NYAN Conference: Debugging asynchronous scenarios in .NET
1. Organized by Donating to
R&Devents@criteo.com
criteo.com
Medium.com/criteo-labs
@CriteoEng #NYANconf
Debugging asynchronous scenarios
by Christophe Nasarre and Kevin Gosse
NYAN conference
2. First case: a service refuses to stop
• Still shown as Running in the Windows Services panel
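One reason a process refuses to exit: a .NET process only ends once all of its foreground threads have ended. Threads created with new Thread are foreground by default, whereas ThreadPool threads are background threads. The C# sketch below is a standalone illustration, not the service's code; ForegroundThreads and its methods are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ForegroundThreads
{
    // Threads created with 'new Thread' are foreground threads by default:
    // the process cannot exit while one of them is still running.
    public static bool IsNewThreadBackground()
    {
        var thread = new Thread(() => { });
        return thread.IsBackground;
    }

    // ThreadPool threads (used by Task.Run) are background threads:
    // they never keep the process alive on their own.
    public static bool IsThreadPoolThreadBackground()
    {
        return Task.Run(() => Thread.CurrentThread.IsBackground).GetAwaiter().GetResult();
    }

    // Runs 'work' on a dedicated thread. With isBackground = false, a 'work'
    // that blocks forever (e.g. a Dispose() waiting on something that never
    // completes) keeps the whole process alive.
    public static Thread StartDedicated(ThreadStart work, bool isBackground)
    {
        var thread = new Thread(work) { IsBackground = isBackground };
        thread.Start();
        return thread;
    }
}
```

Hence the first question of the investigation: which foreground thread is still running?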
4. Parallel Stacks in Visual Studio
• Yes: VS can load a memory dump
• A nice way to see visually what is going on
→ We are waiting for ClusterClient.Dispose() to end
5. In production → take a memory snapshot
procdump -ma <pid>
Which foreground thread is still running?
What is ClusterClient.Dispose() waiting for?
Look at the code, Luke!
Look at the _agent state
→ An exception broke the responses ActionBlock
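This failure mode can be reproduced in miniature. In the sketch below, a hypothetical ResponseAgent plays the role of the responses ActionBlock; to stay dependency-free it swaps TPL Dataflow for a BlockingCollection consumed by a Task.Run loop. When the handler throws, the loop ends, the completion signal is never set, and a Dispose() that waits without a timeout blocks forever (a faulted ActionBlock likewise stops processing messages). ResponseAgent, Post, Loop and TryShutdown are illustrative names, not ClusterClient's actual API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical agent: a loop consumes queued responses and signals
// _drained once the queue has been completed and fully processed.
public sealed class ResponseAgent : IDisposable
{
    private readonly BlockingCollection<string> _responses = new BlockingCollection<string>();
    private readonly TaskCompletionSource<bool> _drained =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    public ResponseAgent(Action<string> handler)
    {
        Loop = Task.Run(() =>
        {
            foreach (string response in _responses.GetConsumingEnumerable())
            {
                handler(response); // an exception here ends the loop...
            }
            _drained.SetResult(true); // ...so this line is never reached
        });
    }

    // The consumer loop; it becomes Faulted if the handler throws.
    public Task Loop { get; }

    public void Post(string response) => _responses.Add(response);

    // Stops accepting responses, then waits for the loop to drain the queue.
    // Returns false if the signal did not arrive within the timeout.
    public bool TryShutdown(TimeSpan timeout)
    {
        _responses.CompleteAdding();
        return _drained.Task.Wait(timeout);
    }

    // Like the stuck service: waits with no timeout, so it blocks forever
    // once the loop has died.
    public void Dispose() => TryShutdown(Timeout.InfiniteTimeSpan);
}
```

Calling TryShutdown with a finite timeout on an agent whose handler threw returns false instead of hanging, which is a cheap way to detect the problem before Dispose() is reached.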
14. BONUS: more continuations
• A few other continuation scenarios that you may encounter
✓ Task.Delay
✓ Task.WhenAny
✓ Special cases
15. Why a List<object> as continuation?
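Task DoStuffAsync()
{
    var task = SendAsync();
    // 1st continuation on SendAsync's task, registered with ContinueWith
    task.ContinueWith(t => LogStuff(t));
    return task;
}

// user code: await registers a 2nd continuation on the same task
await DoStuffAsync();
DoSomethingSynchronously();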
Task.m_continuationObject as continuations are added:
null → StandardTaskContinuation (ContinueWith) → List<object> { StandardTaskContinuation, *TaskContinuation } (await)
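This can be observed by peeking at the private field through reflection. Treat it as a learning aid only: m_continuationObject is an undocumented implementation detail, and the concrete continuation types differ between .NET Framework and .NET Core. The sketch below therefore only checks that two continuations on a pending task (one from ContinueWith, one from await) turn the field into a List<object>. ContinuationPeek is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Reflection;
using System.Threading.Tasks;

public static class ContinuationPeek
{
    // Undocumented private field of Task (the one visible when dumping a
    // Task in WinDbg); reading it through reflection is for exploration only.
    private static readonly FieldInfo ContinuationField = typeof(Task).GetField(
        "m_continuationObject", BindingFlags.NonPublic | BindingFlags.Instance);

    public static object GetContinuationObject(Task task) => ContinuationField?.GetValue(task);

    private static async Task AwaitAsync(Task task) => await task;

    // Mirrors DoStuffAsync(): a pending task gets one continuation from
    // ContinueWith and a second one from await.
    public static (bool IsList, int Count) InspectTwoContinuations()
    {
        var tcs = new TaskCompletionSource<bool>();
        Task task = tcs.Task;

        Task logged = task.ContinueWith(t => { }); // 1st continuation
        Task awaited = AwaitAsync(task);           // 2nd continuation (await)

        // Capture the field before completion: once the task completes,
        // the runtime replaces it with an internal sentinel.
        var list = GetContinuationObject(task) as List<object>;
        (bool, int) result = (list != null, list?.Count ?? 0);

        tcs.SetResult(true);
        Task.WaitAll(logged, awaited);
        return result;
    }
}
```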
16. Why an empty List<object> as continuation?
async Task DoStuffAsync()
{
var T1 = Task.Run(…);
var T2 = Task.Run(…);
await Task.WhenAny(T1, T2);
… // T2 ends first
}
T1.m_continuationObject: null → CompleteOnInvokePromise → empty List<object> (the WhenAny continuation is removed once T2 completes)
T2.m_continuationObject: null → CompleteOnInvokePromise
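The same reflection trick shows what Task.WhenAny leaves behind on the task that did not win. The sketch uses TaskCompletionSource instead of Task.Run so that T2 deterministically ends first; WhenAnyLeftover is a hypothetical helper. It only reports the type stored in T1's m_continuationObject before and after WhenAny completes, without asserting it, because the promise type (CompleteOnInvokePromise on .NET Framework) and the leftover value depend on the runtime version.

```csharp
using System.Reflection;
using System.Threading.Tasks;

public static class WhenAnyLeftover
{
    // Undocumented private field of Task; exploration only.
    private static readonly FieldInfo ContinuationField = typeof(Task).GetField(
        "m_continuationObject", BindingFlags.NonPublic | BindingFlags.Instance);

    // Type name of whatever task.m_continuationObject currently holds.
    public static string Describe(Task task)
    {
        object value = ContinuationField?.GetValue(task);
        return value == null ? "null" : value.GetType().Name;
    }

    public static async Task<(Task Winner, Task T1, Task T2, string Before, string After)> RunAsync()
    {
        var t1Source = new TaskCompletionSource<bool>(); // T1: still running
        var t2Source = new TaskCompletionSource<bool>(); // T2: ends first
        Task t1 = t1Source.Task;
        Task t2 = t2Source.Task;

        // WhenAny registers its promise as a continuation on both tasks.
        Task<Task> whenAny = Task.WhenAny(t1, t2);
        string before = Describe(t1);

        t2Source.SetResult(true);
        Task winner = await whenAny;

        // What WhenAny left behind on the task that did not win.
        string after = Describe(t1);
        return (winner, t1, t2, before, after);
    }
}
```

Printing Before and After on the runtime you are debugging tells you what to expect when you meet such a leftover in a memory dump.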
17. Investigation 1 - key takeaways
1. Thread call stacks do not give the full picture
• Even Visual Studio's Parallel Stacks window is not enough
2. Requires a clear understanding of Task internals
• m_continuationObject and state machines
3. Start from the blocked task and follow the chain of reverse references
• sosex !refs is your friend
19. In production → take a memory snapshot
procdump -ma <pid>
Look at call stacks in Visual Studio
→ What are those tasks (the ones we are waiting for) doing?
Look at tasks in WinDbg
→ No deadlock, but everything is blocked…
→ ThreadPool is starved
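The starvation pattern is easy to reproduce: each work item blocks a ThreadPool thread with Wait() on an operation whose continuation itself needs a ThreadPool thread. The hypothetical StarvationDemo below compares such a sync-over-async handler with an async one; absolute timings depend on the machine, the runtime version and the ThreadPool's thread injection policy, so no figures are given here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class StarvationDemo
{
    // Sync-over-async: holds a ThreadPool thread while waiting for work
    // that itself needs another ThreadPool thread to make progress.
    private static void HandleRequestBlocking()
    {
        Task.Run(async () => await Task.Delay(10)).Wait();
    }

    // Async version: the thread goes back to the pool during the delay.
    private static async Task HandleRequestAsync()
    {
        await Task.Delay(10);
    }

    // Queues 'count' requests to the ThreadPool and returns how long they took.
    public static TimeSpan Measure(int count, bool blocking)
    {
        var stopwatch = Stopwatch.StartNew();
        Task[] requests = Enumerable.Range(0, count)
            .Select(_ => blocking
                ? Task.Run(() => HandleRequestBlocking())
                : Task.Run(() => HandleRequestAsync()))
            .ToArray();
        Task.WaitAll(requests);
        return stopwatch.Elapsed;
    }
}
```

For example, compare StarvationDemo.Measure(Environment.ProcessorCount * 8, blocking: true) with the same call using blocking: false. The blocking version typically takes much longer, because its requests can only progress as fast as the ThreadPool injects new threads, while the CPU stays mostly idle.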
31. Investigation 2 - key takeaways
1. Waiting synchronously on a Task is dangerous
2. ThreadPool scheduling is unfair
3. 0% CPU + increasing thread count = sign of ThreadPool starvation
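Takeaway 3 can partly be checked from inside the process. Since .NET Core 3.0, ThreadPool.ThreadCount and ThreadPool.PendingWorkItemCount expose the number of pool threads and the number of queued work items; the hypothetical ThreadPoolMonitor below samples them and flags a growing thread count with work still queued. CPU usage is not measured here and has to be checked separately, for instance with Task Manager or dotnet-counters.

```csharp
using System.Threading;

public static class ThreadPoolMonitor
{
    // One sample of ThreadPool health.
    public readonly struct Sample
    {
        public Sample(int threadCount, long pendingWorkItems)
        {
            ThreadCount = threadCount;
            PendingWorkItems = pendingWorkItems;
        }

        public int ThreadCount { get; }
        public long PendingWorkItems { get; }

        public override string ToString() =>
            $"threads={ThreadCount} pending={PendingWorkItems}";
    }

    // Reads the current values (APIs available since .NET Core 3.0).
    public static Sample Take() =>
        new Sample(ThreadPool.ThreadCount, ThreadPool.PendingWorkItemCount);

    // Heuristic: the thread count keeps growing while work items are still
    // waiting in the queues. Combine it with low CPU usage before concluding.
    public static bool LooksStarved(Sample earlier, Sample later) =>
        later.ThreadCount > earlier.ThreadCount && later.PendingWorkItems > 0;
}
```

Taking a sample every second or so and logging it gives an early warning well before anyone has to capture a memory dump.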
32. Conclusion
• Understand the underlying data structures
• Think of causality chains instead of thread call stacks
• Visual Studio is your friend
• Parallel Stacks to get the big picture
• WinDbg is your true ally
• Use and abuse sosex !refs
• You knew that waiting on tasks is bad
• Now you know why
33. Resources
Criteo blog series
• http://labs.criteo.com/
• https://medium.com/@kevingosse
• https://medium.com/@chnasarre
Debugging extensions
• https://github.com/chrisnas/DebuggingExtensions (aka Grand Son Of Strike)
Contacts
• Kevin Gosse @kookiz
• Christophe Nasarre @chnasarre