Logging and exception handling are among the easiest tools to use when debugging, but how can you take those massive logs and thousands of errors and effortlessly use them to build a better product? This presentation shares our development team's lessons learned to expedite releases and fix app issues faster. It discusses best practices that will help your dev team build a culture of logging, such as what to log, how to log it, and how to proactively put it to use.
3. Logging: not hard, but not perfect
The Easy
• Libraries: log4net, NLog, Elmah, Enterprise Library
• Very easy to integrate and use
• Great for debug
The Not Easy
• Production log file access
• Volume of data × number of servers
• Rotation and retention
• Flat and hard to query
• Getting enough context
• Proactively consuming / monitoring
4. Today’s topics
Log All The Things
Building a “culture of logging” so that you always have relevant, contextual logs that don’t add overhead. Make the most of your logging framework.
Work Smarter, Not Harder
Using proper tooling to consolidate and aggregate all of your logging so that it is available to everyone who needs it, can be consumed quickly, and can be put to use proactively.
5. What things?
• Timestamp
• Thread ID
• Transaction ID
• Server
• Environment
• User Data
• Post Data
• Cookies
• Querystring
• Etc, etc
If you’re going to take the time to log something, log enough to give you a play-by-play picture of events.
When?
• For every important business transaction
6. Start Simple – handle all exceptions
Elmah
Pros:
• Easy
• Lots of storage possibilities
• Web page to view errors
Cons:
• Limited email & notification possibilities
• Unhandled only
• Advanced usage requires a lot of configuration
• If MVC, don’t forget to grab Elmah.MVC
• Don’t leave your errors web page exposed in production!
Global.asax (Application_Error)
Pros:
• Use the framework of your choice
• Captures all exceptions
• Fully customizable
Cons:
• You are responsible for returning state back to the method that called (e.g. a controller action) or redirecting
Try/Catch blocks
Pros:
• Try/Catch/Finally generally perceived as a good pattern
• You will catch 100% of the exceptions that you “catch”
Cons:
• You will miss 100% of the exceptions that you don’t “catch”
• Can be “error prone” if you are only catching a certain type that doesn’t happen
• Time consuming to “retrofit” into an app
Exception Filters (MVC)
Pros:
• Like using global.asax, very customizable
Cons:
• You have to implement everything, including errors that happen globally, how to log, etc.
7. Adding Context
Treat logs like “debug when you can’t attach”
• Use verbosity levels appropriately: this allows you to “turn up” and “turn down” the amount of logging as necessary
• Identify key processes and functions that need to be audited and logged
• Capture the key steps, measure success / failure
• Capture contextual data that helps you solve an issue or answer a question (e.g. logged-in user, HTTP request data)
Don’t forget about logging in your client code
• Choose a library that is smart enough to bridge server and client code while keeping context
9. Adding Context - Advanced
• Logged the input parameters
• Logged the object created in my DbContext
• Had to use a custom appender for log4net that knows how to serialize the objects
Alternatively, would have to serialize to a string:
log.Debug(string.Format("Creating a foo: {0}", JsonConvert.SerializeObject(foo)));
10. Adding Context – Power User!!
Wouldn’t it be nice to:
• Always capture the user who generated the log or exception?
• Always capture basic request details?
This would give us deeper, more meaningful data to search through in our logs.
14. Tips & Tricks
• Use your verbosity levels
• Make sure your logging framework is asynchronous
• Log heavily in core / shared services & other high fan-in areas
• Always log the “who”
• Use patterns like AOP to write entry / exit logging once, apply everywhere
• Make it a part of your development culture (don’t just write the logs: USE them)
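One way to picture the "write entry / exit logging once" tip is a small wrapper that every call can go through. This is only a sketch and not true AOP: a real setup would more likely use an interception or weaving library, whereas here a plain higher-order method stands in for the interceptor. The Logged class and its Call method are hypothetical names, and log4net is assumed as the logging framework.

```csharp
using System;
using System.Diagnostics;
using log4net;

// Hypothetical helper: wraps any call with entry / exit / failure logging.
public static class Logged
{
    public static T Call<T>(ILog log, string name, Func<T> body)
    {
        log.Debug("Entering " + name);
        Stopwatch timer = Stopwatch.StartNew();
        try
        {
            T result = body();
            log.Debug("Exiting " + name + " after " + timer.ElapsedMilliseconds + " ms");
            return result;
        }
        catch (Exception ex)
        {
            // Log once with the exception attached, then let the caller decide what to do.
            log.Error("Failed in " + name + " after " + timer.ElapsedMilliseconds + " ms", ex);
            throw;
        }
    }
}
```

A call site would then look like `int total = Logged.Call(log, "Add", () => 2 + 3);`, so the entry / exit statements live in exactly one place.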
15. Working Smarter
At Stackify, we set out to make our logging and error monitoring
smarter…
…it became a core part of our product offering.
16. Working Smarter
Logs & Errors Tell (Some of) the Story
• Logs as “breadcrumb trail” are good for:
  • Capturing errors in raw form
  • Capturing key events (depending on logging level)
  • Piecing together the story (if you logged it) on one server at a time
• Error monitoring is good for:
  • “After the fact” notification
  • Seeing just the error in question
  • Pointing you back at log files for more info
17. Working Smarter
The Trouble With Your Current Logs:
• Working with all those log files is tedious and time consuming
• No cross-server / cross-app correlation
• No trends / telemetry
• File retention & rotation
• Intermixed with Exception details
18. Working Smarter
The Trouble With Your Errors:
• Logged errors usually are no more than exception and stack trace
• No cross-server/cross-app tracking for errors
• Missing/hard to get request details
• No trends / telemetry
• Need intelligent notifications
• In Error Monitoring apps, disconnected from Logs!
19. Indexing:
• Search across apps and servers w/o managing log files
• Fully indexed “Context” details
• Real-time log tailing
• Narrow scope by time range
Telemetry
• Visualize trends in logging and exceptions
• Across all levels of granularity
• Contextual telemetry – for any app or server, by default
Alerts
• Notify when new or “first chance” exceptions occur
• Notify on error regression
• Notify when error rates are abnormal
• Notify on any log query we can think of
Engage
• Provide access to those who can help solve issues
• Give everyone the same picture, with no extra effort
Architecture
• Single platform solution
• Relate exceptions to relevant logs for maximum context
• Contribute to picture of “application health”
26. Benefits of an integrated logging / error platform
• Get real telemetry, know how well your app behaves from release to release
• In context to overall performance
• Use in QA environments, look for new exceptions and abnormal rates before a release goes to prod
• Proactively hunt down bugs & improve your codebase
• Know about an issue before your customers tell you
• Quicker, more efficient debugging via logs
Editor's Notes
Hello, I’m Jason Taylor, and I’m the CTO of Stackify. I’m here today to talk to you about making the most of your logging and exception data.
So why am I talking about logging and exceptions? No one has ever claimed that logging is difficult or advanced. Everyone has some form of logging and exception handling.
So as much as it’s not hard, it’s not perfect either. In order to explain why, I’m going to tell you a little bit about Stackify. Full disclosure: our lessons learned became a core part of our product, and I’ll be showing that a bit today, but more importantly I’m going to talk about the *why* of what we did, and what the gains were.
Stackify is an application performance monitoring service. It’s a space that I’m sure everyone has heard a lot about. Let me tell you, building a SaaS offering that handles billions of data points per day, and that others rely on to gauge *their* application performance, is hard.
We have a highly distributed, asynchronous architecture built to handle this high volume of transactions. As such, one of the most critical aspects of our architecture is how we capture log messages and exceptions across our many service boundaries. In order to measure our own performance, we need great visibility into these messages.
Our own logging and exception handling was failing us. Some things were easy: for .NET developers, just grab your library of choice, usually via NuGet, and start adding log messages. Done. But there are some things that are difficult as well, especially in a highly distributed architecture operating at scale.
And that brings us to today’s topics, split into two buckets. We could really talk about these in any order, but I’m going to discuss “Log All The Things” first. It’s about building a culture of logging: logging the right things in the right places.
Then we’ll talk about how to capitalize on your logging improvements and proactively put them to use so you can Work Smarter, Not Harder.
So… LOG ALL THE THINGS! What things?
You need to think of your log files as a playback of a transaction. Why bother going to the trouble of logging if it doesn’t provide any value?
For any important, critical, business transaction, you need to include critical data such as server name, environment, user data, post data, cookies, querystring, etc, etc. It’s all about adding context.
(Example: querying a flat log file finds lots of hits, but they are not necessarily related. You need a thread ID or transaction ID to string together the relevant timeline of events in a process.)
If you don’t have enough data to know the who and what around a key transaction, you’ve missed the point.
This was certainly the first hurdle for us to get over at Stackify. Data would come in through our web services, receive some validation, move onto a service bus, get picked up from the service bus by workers, be processed and dispatched to any number of storage mechanisms, and possibly go back to the UI. Without this sort of context, debugging an issue was difficult. By adding it, we can instantly answer “Who is this impacting? What type of message is this impacting?” and so on.
Let’s talk about how we’ve added that context. The very first thing was to make sure all exceptions are being handled. If you aren’t handling your exceptions, and all of them, you’re missing critical performance data and critical debug data that can help you quickly isolate the issue. And, unfortunately, while you are missing those, your clients or customers are probably seeing them.
Remember when you were a junior dev and someone – or a book – told you to wrap everything with a try / catch / finally? Yeah, you alwaaaaays do that, don’t you? And you certainly wouldn’t just be swallowing any of those exceptions either? Good, didn’t think so. Kudos to all of you.
There are a number of ways you can quickly add in exception handling throughout your app. They all have pros and cons. Elmah is super easy to start with, but it’s only going to capture errors that aren’t already handled.
Exception Filters work well for MVC, but you are responsible for the entire implementation and it’s only going to grab controller exceptions by default. You still need to provide an implementation for more global errors.
And lastly, if you just override Application_Error in the global.asax, you can catch everything. You are still, however, responsible for all of the implementation details.
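As a rough sketch of the Application_Error approach, the handler below logs whatever reached the top of the pipeline and then takes responsibility for the response. It assumes a classic ASP.NET (System.Web) application with log4net as the logging framework; the "~/Error" redirect target is a hypothetical page, not something from the talk.

```csharp
using System;
using System.Web;
using log4net;

// Code-behind for Global.asax in a classic ASP.NET (System.Web) application.
public class Global : HttpApplication
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(Global));

    protected void Application_Error(object sender, EventArgs e)
    {
        // The exception that bubbled up unhandled, if any.
        Exception ex = Server.GetLastError();
        if (ex == null)
        {
            return;
        }

        // Guard against there being no current request context.
        HttpContext context = HttpContext.Current;
        string url = context != null ? context.Request.RawUrl : "(no request)";

        Log.Error("Unhandled exception while processing " + url, ex);

        // You own the response from here: clear the error and redirect
        // ("~/Error" is a hypothetical page), or leave the error for customErrors.
        Server.ClearError();
        if (context != null)
        {
            context.Response.Redirect("~/Error");
        }
    }
}
```

The "implementation details" mentioned above are exactly the last few lines: deciding whether to clear the error, where to send the user, and what to record along the way.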
Once you’ve handled all of your exceptions, it’s time to add your logging around key business transactions. Some things you should consider as you do this:
Now, let’s look at some examples of adding context. For demo purposes I’m using log4net, but the theory is the same no matter how you log.
In the example on the left, all I’ve done is implement a try / catch. When this fails – and it can: the method takes a nullable int as a parameter, but the entity field doesn’t allow this – all I am going to get is a generic exception.
On the right is a typical log pattern I see for debug. It’s really no better than setting breakpoints in code, and the developer is only going to know they entered the function, and then get the above exception. If they refer back to the code, they’ll see a log statement that was skipped, but this really doesn’t provide anything better than just capturing the exception.
Why? Because we have no context. We don’t know WHO tried to create this and we don’t know what values were passed in. Even if it succeeds, we don’t get any of the output from that successful database save (such as the primary key that would have been created), which can come in handy when trying to do further debugging in a high-volume production system.
This, however, is a much better pattern.
We start by logging the input parameters, and then we log the resulting object after it is saved. Now, whether it succeeds or fails, I have enough data to know WHAT failed, and why. By adding similar logging upstream and down, I can begin to paint a broad picture of this transaction.
One thing to note is that log4net’s logging methods take an “object debugData” parameter. It’s up to *you* to make sure you serialize this properly. In my case, I’m using our own custom appender which then uses JSON.Net to serialize objects I pass in. Otherwise, if using a file appender, I’d have to first serialize the object to a string to actually output it.
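The slide code itself is not reproduced in these notes, so the following is only a sketch of the pattern described: log the inputs, do the work, log the resulting object, and log failures with the same inputs attached. Foo, FooService and the in-memory ID counter are hypothetical stand-ins for the entity and DbContext save in the demo. It assumes log4net plus Json.NET and uses the "serialize to a string first" route rather than a custom appender.

```csharp
using System;
using log4net;
using Newtonsoft.Json;

public class Foo
{
    public int Id { get; set; }
    public int Quantity { get; set; }
}

public class FooService
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(FooService));

    // Stand-in for the database-generated primary key (hypothetical).
    private int _nextId = 1;

    public Foo CreateFoo(int? quantity)
    {
        // Log the inputs first, so a failure still tells us what was passed in.
        string input = quantity.HasValue ? quantity.Value.ToString() : "null";
        Log.Debug("CreateFoo called with quantity=" + input);

        try
        {
            if (!quantity.HasValue)
            {
                // Mirrors the failure case in the talk: the entity field is not nullable.
                throw new ArgumentNullException("quantity");
            }

            var foo = new Foo { Id = _nextId++, Quantity = quantity.Value };

            // Guard the serialization so it costs nothing when DEBUG is turned down.
            if (Log.IsDebugEnabled)
            {
                Log.Debug("Created foo: " + JsonConvert.SerializeObject(foo));
            }
            return foo;
        }
        catch (Exception ex)
        {
            Log.Error("CreateFoo failed with quantity=" + input, ex);
            throw;
        }
    }
}
```

Whether the call succeeds or fails, the log now carries both the input and, on success, the generated key, which is the "what failed, and why" described above.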
But we aren’t done yet: read slide.
Fortunately for us, log4net provides the ability to add context properties to your logging statements. There are a couple of types. We are going to use the logical thread context to grab user and request details on every single log statement.
All it requires is a little configuration. I tell log4net the key names I wish to use.
Then, I simply set them - they will be available in the logical thread context properties array.
In my case, I want to log the user, which is an IPrincipal. The best place to get this is in the AuthenticateRequest handler (Application_AuthenticateRequest) in my global.asax.
Now, with every log statement on this thread, log4net will automatically add this data in. I’ll get the identity, roles, and claims for the user.
I can do something similar with the request details. I wouldn’t recommend logging the entire HTTP request with each statement, but some useful details such as form values, query string values, and the requestor IP address are worth capturing.
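A minimal sketch of the context-property idea, assuming log4net: the helper below stores a few per-request values in the logical thread context so that later log statements can pick them up. RequestLogContext and the key names are hypothetical. Unlike the demo, which passes the whole principal to a custom appender, this version keeps only simple strings so that a stock layout can print them.

```csharp
using log4net;

// Hypothetical helper that stamps per-request values onto log4net's logical thread context.
public static class RequestLogContext
{
    // Key names are assumptions; they must match the %property{...} names in your layout.
    public const string UserKey = "user";
    public const string UrlKey = "requestUrl";
    public const string ClientIpKey = "clientIp";

    public static void Set(string userName, string rawUrl, string clientIp)
    {
        LogicalThreadContext.Properties[UserKey] = userName ?? "anonymous";
        LogicalThreadContext.Properties[UrlKey] = rawUrl ?? string.Empty;
        LogicalThreadContext.Properties[ClientIpKey] = clientIp ?? string.Empty;
    }
}

// In Global.asax (classic ASP.NET), this could be called once per request:
//
//   protected void Application_AuthenticateRequest(object sender, EventArgs e)
//   {
//       var ctx = HttpContext.Current;
//       RequestLogContext.Set(
//           ctx.User != null ? ctx.User.Identity.Name : null,
//           ctx.Request.RawUrl,
//           ctx.Request.UserHostAddress);
//   }
//
// A PatternLayout can then print the values, for example:
//   %date [%thread] %-5level %logger [%property{user}] %message%newline
```

Keeping the values small (a name, a URL, an IP address) follows the advice above about not attaching the entire HTTP request to every statement.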
And the result is rich errors and logs with deeply nested contextual business objects, with very little effort. The view you see here is a custom log viewer that we created, and I’ll talk more about that in a couple of minutes.
Before we move on, just a couple of tips, tricks, and notes on Logging All the Things.