How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOpsDays Tel Aviv 2015


  1. How We Analyzed 1000 Dumps in One Day
     DINA GOLDSHTEIN
     EMBEDDED TEAM LEADER, BRIGHTSOURCE ENERGY
     BLOGS.MICROSOFT.CO.IL/DINAZIL/
     @DINAGOZIL
  2. Agenda
     • What we do and why we need dumps
     • Manual analysis process
     • The holy grail: automatic dump analysis
     • Our automatic triage workflow
  3. About Us
     • BrightSource Energy builds solar power plants
     • Power plants have control software
     • Control software crashes
  4. Our Production Environment
     • The office (development) network is connected to the Internet
     • The production (power plant) network is isolated
     • There is a (very slow) one-way link from production to development
  5. In the Beginning…
     • Mask all crashes with a nice error dialog and an “orderly” shut-down
     • Analyze errors using very extensive log files from all components
     • Alas, the last error in the log doesn’t always point to the real culprit
     • We need to know the exact exception, when it occurred, and where!
  6. Crash Dumps
     • A dump is a snapshot of a process’s memory: threads, heap, exceptions, locks, etc.
     • Various tools can open dump files and see what’s inside
  7. How???
     • An executable can be compiled with debug information: the symbols
     • Symbol files (.PDB) contain information that lets debuggers match addresses and other data in the dump file to names of DLLs, functions, variables, lines of code, etc.
  9. Symbol Server
     • Symbols can be provided to the debugger explicitly
     • But they can also reside in a Symbol Server (stored by name and hash)
     • The debugger can download debugging symbols automatically for the right product version
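
Debuggers such as WinDbg and Visual Studio are usually pointed at a symbol server through a symbol path of the form `srv*<local cache>*<server>`, with several entries separated by semicolons. The helper below is a minimal sketch of building such a path. The class name, the cache directory and the internal share `\\symsrv\symbols` are made up for illustration; the slides do not say how the team's symbol server was laid out.

```csharp
using System;
using System.Linq;

public static class SymbolPath
{
    // Builds one "srv*<cache>*<server>" entry per server and joins them with ';'.
    // Entries are tried in order, so list the internal server first.
    public static string Build(string localCache, params string[] servers) =>
        string.Join(";", servers.Select(s => $"srv*{localCache}*{s}"));

    // Publishes the path through _NT_SYMBOL_PATH for the current process,
    // where debugging tools started from this process can pick it up.
    public static void Apply(string symbolPath) =>
        Environment.SetEnvironmentVariable("_NT_SYMBOL_PATH", symbolPath);
}
```

For example, `SymbolPath.Build(@"C:\symbols", @"\\symsrv\symbols", "https://msdl.microsoft.com/download/symbols")` would list a hypothetical internal share first and the public Microsoft symbol server second.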
  10. Production Crashes
     • We can’t attach a debugger or do remote analysis of production errors
     • Windows can be configured to automatically save a dump when a process crashes
     • When crashes occur, dump files are generated and transmitted to a central location, and from there to the office network
  11. Manual Dump Analysis
     • With high failure rates, we’re talking dozens of dumps per day from a single facility
     • Many errors are exact duplicates
     • Manual analysis means:
       ◦ Copy dump to my machine (it’s not uncommon for a dump to be 2-3GB)
       ◦ Copy debugger support files and symbols (if no symbol server is present)
       ◦ Open dump in debugger (Visual Studio/WinDbg)
       ◦ Locate the exception and call stack
       ◦ Triage and open a bug for the relevant developer
       ◦ Probably around 10 minutes per dump…
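
Because many crashes are exact duplicates, a triage tool can collapse them before anyone looks at a dump. The sketch below groups already-analyzed crashes by a signature made of the exception type plus the top few stack frames. `CrashSummary`, `CrashTriage` and the choice of three frames are assumptions for illustration, not the team's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result of analyzing one dump; field names are illustrative only.
public sealed record CrashSummary(
    string DumpPath, string ExceptionType, IReadOnlyList<string> StackFrames);

public static class CrashTriage
{
    // Two crashes count as duplicates when the exception type and the
    // top few frames of the call stack are identical.
    public static string Signature(CrashSummary crash, int topFrames = 3) =>
        crash.ExceptionType + "|" + string.Join(">", crash.StackFrames.Take(topFrames));

    // Groups dumps by signature, most frequent crash first.
    public static List<IGrouping<string, CrashSummary>> GroupDuplicates(
        IEnumerable<CrashSummary> crashes) =>
        crashes.GroupBy(c => Signature(c))
               .OrderByDescending(g => g.Count())
               .ToList();
}
```

With groups in hand, one ticket per group (listing every dump path in it) is enough, rather than one ticket per dump.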
  12. Automatic Dump Analysis
     • ClrMD is a NuGet package which provides a debugger API for dumps and live processes
       ◦ Works with both native and managed code
     • The core of our automatic solution uses ClrMD for automatic dump analysis and triage:
       ◦ Exception information
       ◦ Call stack
       ◦ Likely faulting component
     • Recently became open source on GitHub
  13. Some Code…
      // Load the dump; if the process hosted the CLR, create a runtime object for it.
      target = DataTarget.LoadCrashDump(dumpPath);
      if (target.ClrVersions.Count > 0) {
          ClrInfo dacVersion = target.ClrVersions[0];
          string dacLocation = dacVersion.TryDownloadDac();
          runtime = target.CreateRuntime(dacLocation);
      }

      // Ask the debugger engine for the last recorded event (the crash) and its thread.
      var dc = (IDebugControl)target.DebuggerInterface;
      dc.GetLastEventInformation(out eventType, out processId, out threadIndex,
          extraInformation, extraInformationSize, out extraInformationUsed,
          description, descriptionSize, out descriptionUsed);

      // Translate the engine's thread index into an OS thread ID.
      var dso = (IDebugSystemObjects)target.DebuggerInterface;
      var sysIds = new uint[count];
      dso.GetThreadIdsByIndex(threadIndex, count, null, sysIds);

      // If the faulting thread is managed, read its current CLR exception.
      // (IsThreadManaged is a helper that is not shown on the slide.)
      if (IsThreadManaged(sysIds[0])) {
          var td = runtime.Threads.First(t => t.OSThreadId == sysIds[0]);
          clrException = td.CurrentException;
      }
  14. Our Dump Analysis Workflow
     • At the end of a shift, operators copy dumps to a network share in the office network
     • A script goes over the dumps one by one and uses ClrMD to find the root cause of the error
     • According to a configuration file, the faulting module’s owner is alerted and a ticket is opened in Redmine
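
The owner lookup in the last step can be sketched as follows. The slides do not show the configuration file, so the `Module=owner` line format, the `#` comment syntax and the names `OwnerLookup`, `Parse` and `FindOwner` are all assumptions. The idea is to walk the crashing call stack from the top and pick the first module that has a configured owner, which skips framework modules nobody on the team owns.

```csharp
using System;
using System.Collections.Generic;

public static class OwnerLookup
{
    // Parses lines of the assumed form "ModuleName=owner".
    // Blank lines, '#' comments and malformed lines are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        // Module file names are case-insensitive on Windows.
        var owners = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue;
            int idx = line.IndexOf('=');
            if (idx <= 0) continue; // no '=' or empty module name
            owners[line.Substring(0, idx).Trim()] = line.Substring(idx + 1).Trim();
        }
        return owners;
    }

    // Returns the owner of the first module (top of stack first) that appears
    // in the configuration, or the fallback when no module matches.
    public static string FindOwner(
        IReadOnlyDictionary<string, string> owners,
        IEnumerable<string> stackModules,
        string fallback)
    {
        foreach (var module in stackModules)
            if (owners.TryGetValue(module, out var owner)) return owner;
        return fallback;
    }
}
```

The returned owner would then be used to send the alert and assign the Redmine ticket; a fallback owner keeps crashes in unlisted modules from being dropped silently.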
  15. From Hours to Seconds
     • Manual, tedious, error-prone dump analysis by red-eyed developers…
     • …Automatic, happy, untiring ninja script :)
  16. DEMO
     ANALYZE 74 DUMPS IN A FEW MINUTES
  17. Summary
     • What we do and why we need dumps
     • Manual analysis process
     • The holy grail: automatic dump analysis
     • Our automatic triage workflow
     • Resources:
       ◦ The slides: http://tinyurl.com/dumpstlv
       ◦ ClrMD on GitHub
       ◦ DumpAnalyzer on GitHub
       ◦ msos on GitHub
  18. Questions? Thank You!
     DINA GOLDSHTEIN
     EMBEDDED TEAM LEADER, BRIGHTSOURCE ENERGY
     BLOGS.MICROSOFT.CO.IL/DINAZIL/
     @DINAGOZIL
     "Retouched Kitty" by Ozan Kilic is licensed under Creative Commons Attribution 2.0
