APIs in production - we built it, can we fix it?

TEAM
Oh, there is. I've found it:
It's in the A-hole.

we built it, but
can we fix it too?

#apisummit @MartinGoodwell
About me
Passionate about life,
technology, and the people
behind both of them.
Trying not to be an A-hole.
• Started with Commodore 8-bit (VC-20 and C-64)
• Built Null-modem connections for playing Doom and WarCraft
• Built IPX/SPX networks between MS-DOS 5.0 and Windows 3.1
• Did DevOps before it was called that (mainly Java and web)
for about 10 years
• Now at Dynatrace Innovation Lab
• Tech Lead for Microsoft Technologies
and Software Architecture
• Talking, blogging, webinaring, and innovating
• Find me on Twitter: @MartinGoodwell

The Rules
• Please ask or interrupt at any time.
• Feel free to come and talk to me anywhere around the venue.

Warm up
• What's your occupation?
  • Dev, Ops, non-technical
• What's your technology stack?
  • Java, .NET, Node.js, Go, PHP, Python
• Which of you work with
  • Cloud
  • APIs
  • Application monitoring
• Level of automation
  • Version control (also for stored procedures?)
  • Build server
  • Automated deployment
• Which of you build your own troubleshooting tools?

[Diagram: API calls routed through an API gateway, which in turn makes further API calls]

Two "real" problems of every project
(with APIs, it's even more complicated)
1) It's not working.
2) It's too slow.
One "developer" problem
1) It's crap. We need to redo it.

Technical problem solving

Monitoring

Host metrics
• CPU usage
• Memory usage
• Disk I/O
• Network performance
• No insight into the app's problems and performance

Application metrics

In your code

Use StatsD

StatsD, real quick
http://www.slideshare.net/DatadogSlides/dev-opsdays-tokyo2013effectivestatsdmonitoring
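StatsD receives metrics as short plain-text UDP datagrams of the form `<name>:<value>|<type>`, e.g. `c` for counters and `ms` for timers. A minimal sketch of emitting such metrics from application code with only Python's standard library, assuming a StatsD daemon on `localhost:8125` (the conventional default port); the `StatsdClient` class and the metric names are hypothetical, not an existing client library.

```python
import socket
import time
from contextlib import contextmanager


class StatsdClient:
    """Hypothetical minimal StatsD client: formats lines and sends them over UDP."""

    def __init__(self, host: str = "localhost", port: int = 8125, prefix: str = ""):
        self.addr = (host, port)
        self.prefix = prefix
        # UDP is fire-and-forget: a missing StatsD daemon never blocks the application.
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _format(self, name: str, value, metric_type: str) -> str:
        # StatsD line protocol: <name>:<value>|<type>
        full_name = f"{self.prefix}.{name}" if self.prefix else name
        return f"{full_name}:{value}|{metric_type}"

    def _send(self, line: str) -> None:
        try:
            self.sock.sendto(line.encode("utf-8"), self.addr)
        except OSError:
            pass  # metrics must never break the request path

    def incr(self, name: str, count: int = 1) -> str:
        """Counter: number of calls, errors, ..."""
        line = self._format(name, count, "c")
        self._send(line)
        return line

    def timing(self, name: str, millis: int) -> str:
        """Timer: response time in milliseconds."""
        line = self._format(name, millis, "ms")
        self._send(line)
        return line

    @contextmanager
    def timer(self, name: str):
        """Measure the wrapped block and report it as a timer."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timing(name, int((time.perf_counter() - start) * 1000))
```

Usage: `stats = StatsdClient(prefix="orders_api")`, then `stats.incr("requests")` per call and `with stats.timer("db.query"): ...` around slow work. StatsD aggregates these per flush interval, so the application only ships cheap numbers.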

Logging

[Diagram: API calls routed through an API gateway, each service writing to its own logfile]

https://www.elastic.co/blog/elastic-stack-primer
Elastic Stack (formerly known as ELK Stack)
Elasticsearch
• the datastore (search and analytics engine)
Logstash
• the log ingestion and processing pipeline
Kibana
• the dashboard

http://theburningmonk.com/2015/05/a-consistent-approach-to-track-correlation-ids-through-microservices/
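The linked article passes one correlation ID through every service involved in a request. A minimal sketch of that idea, assuming the ID travels in an HTTP header named `X-Correlation-ID` (a common convention, not a standard) and that headers arrive as a plain dict; `handle_request` and `outgoing_headers` are hypothetical helpers, not part of any framework.

```python
import uuid
from contextvars import ContextVar
from typing import Dict

# Assumed header name; any name works as long as every service uses the same one.
CORRELATION_HEADER = "X-Correlation-ID"

# ID of the request currently being handled (isolated per thread and per asyncio task).
_correlation_id = ContextVar("correlation_id", default="")


def extract_or_create(headers: Dict[str, str]) -> str:
    """Reuse the caller's correlation ID, or start a new one at the edge of the system."""
    for key, value in headers.items():
        if key.lower() == CORRELATION_HEADER.lower() and value:  # headers are case-insensitive
            return value
    return str(uuid.uuid4())


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach to every downstream call so the next service sees the same ID."""
    return {CORRELATION_HEADER: _correlation_id.get()}


def handle_request(headers: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical entry point: bind the ID while the request is processed."""
    token = _correlation_id.set(extract_or_create(headers))
    try:
        # ... business logic and logging would run here ...
        return outgoing_headers()  # what a call to the next API would send along
    finally:
        _correlation_id.reset(token)  # never leak the ID into the next request
```

Only the first service (e.g., the API gateway) creates an ID; every later hop reuses it, so all log lines of one request can be found by a single value.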

Anatomy of an arbitrary log file
• Usage information: charts about feature usage
• Metrics: charts about statistical data, charts about performance
• Errors: details about exceptions (like stack traces)
• Miscellaneous: whatever

It does not matter
what we want to get out of our logfiles.
Whatever it is,
we have to filter out lots of noise.

What do we log?
• Information about exceptions
What do we not log?
• Metrics
How do we log?
• in JSON
• including a correlation id
Where do we log?
• to a central logging server
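With Python's standard `logging` module, "log exceptions, in JSON, including a correlation id" can be sketched as a custom `Formatter`. The field names (`ts`, `level`, `logger`, `msg`, `correlation_id`, `exception`) and the `JsonFormatter`/`make_logger` names are assumptions; in production the handler would ship these lines to the central logging server (e.g., via Logstash) rather than to stderr.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical field names)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # Passed per call via extra={"correlation_id": ...}; empty if missing.
            "correlation_id": getattr(record, "correlation_id", ""),
        }
        if record.exc_info:
            # Exceptions are what we log: keep the full stack trace.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Logger that writes JSON lines; stream defaults to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage: `log = make_logger("orders_api")`, then inside an `except` block `log.exception("order lookup failed", extra={"correlation_id": cid})` emits a single JSON line that includes the stack trace. Metrics deliberately stay out of the log and go to the monitoring system instead.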

Logging learnings
• Use a logging server (e.g., ELK stack)
• Log directly as JSON
  • or at least store as JSON
• Using logging for monitoring is expensive
  • log analysis is a real resource hog
• Logging works great for troubleshooting
  • works great with a limited problem scope
• For Java, use Logback via SLF4J
  • to local logfiles
  • to Logstash
  • to syslog
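Troubleshooting with logs works best when the scope is narrowed first: parse each line and keep only the entries of the one request under investigation. A sketch, assuming one JSON object per line with `correlation_id` and `level` fields (hypothetical field names); lines that are not valid JSON are treated as noise and skipped. `entries_for` and `errors_for` are illustrative helpers, not a library API.

```python
import json
from typing import Iterable, Iterator, List


def entries_for(lines: Iterable[str], correlation_id: str) -> Iterator[dict]:
    """Yield parsed log entries that belong to one request; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray plain-text line from a third-party library
        if isinstance(entry, dict) and entry.get("correlation_id") == correlation_id:
            yield entry


def errors_for(lines: Iterable[str], correlation_id: str) -> List[dict]:
    """Only the error-level entries of one request, in log order."""
    return [e for e in entries_for(lines, correlation_id) if e.get("level") == "ERROR"]
```

Because `entries_for` is a generator, it can stream a large logfile (`with open(path) as f: errors_for(f, cid)`) without loading it into memory; the scan itself is still linear, which is why this approach suits troubleshooting rather than continuous monitoring.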

Logging vs Monitoring
Monitoring
• Numeric only
• Analysis and aggregation are much cheaper
  • perfect for charting
  • and long-term reporting
Logging
• Text or numeric
• Analysis and aggregation are expensive
  • because of lots of noise
  • only for a limited timeframe
• Can contain text with detailed descriptions

Call Tracing

Google Dapper paper
• The Dapper paper (2010)
http://research.google.com/pubs/archive/36356.pdf
• OpenTracing (for Go, JavaScript, Java, Python, Objective-C, C++)
http://opentracing.io/documentation/
• OpenZipkin (by Twitter)
http://zipkin.io/
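The data model behind these projects comes from the Dapper paper: every request gets a trace ID, each unit of work is a span with its own span ID, and each span records its parent's span ID, so the calls of one request form a tree. A minimal in-process sketch of that model; the `Span` and `Tracer` classes and the ID format (random 64-bit values as hex) are assumptions for illustration, not the OpenTracing or Zipkin API.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import List, Optional


def new_id() -> str:
    return secrets.token_hex(8)  # random 64-bit ID as 16 hex characters


@dataclass
class Span:
    name: str
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per unit of work
    parent_id: Optional[str]  # None for the root span
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    @property
    def duration_ms(self) -> float:
        return 0.0 if self.end is None else (self.end - self.start) * 1000


class Tracer:
    """Collects finished spans in memory; a real tracer would report them to a collector."""

    def __init__(self) -> None:
        self.finished: List[Span] = []

    def start_span(self, name: str, parent: Optional[Span] = None) -> Span:
        # A root span starts a new trace; a child inherits the trace ID from its parent.
        trace_id = parent.trace_id if parent else new_id()
        return Span(name, trace_id, new_id(), parent.span_id if parent else None)

    def finish(self, span: Span) -> None:
        span.end = time.time()
        self.finished.append(span)
```

Usage: a gateway opens `root = tracer.start_span("GET /orders")`, the service call becomes `tracer.start_span("orders-service", parent=root)`, and its SQL query a grandchild of that; rebuilding the tree from `parent_id` shows where the time of a slow request went.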

https://github.com/openzipkin/zipkin

http://zipkin.io/

https://github.com/spring-cloud/spring-cloud-sleuth
Spring Cloud Sleuth is a distributed tracing solution on top of Spring Cloud
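Across process boundaries, Zipkin-compatible tracers (Spring Cloud Sleuth uses them by default) carry the trace context in the B3 HTTP headers `X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId` and `X-B3-Sampled`. A sketch of injecting and extracting them; `TraceContext`, `inject` and `extract_child` are hypothetical names, and treating a missing `X-B3-Sampled` as "sampled" is an assumption of this sketch.

```python
import secrets
from typing import Dict, NamedTuple, Optional

# B3 header names used by Zipkin-compatible tracers.
TRACE_ID = "X-B3-TraceId"
SPAN_ID = "X-B3-SpanId"
PARENT_SPAN_ID = "X-B3-ParentSpanId"
SAMPLED = "X-B3-Sampled"


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    sampled: bool


def inject(ctx: TraceContext) -> Dict[str, str]:
    """Headers for an outgoing call that carries the current span."""
    headers = {TRACE_ID: ctx.trace_id, SPAN_ID: ctx.span_id, SAMPLED: "1" if ctx.sampled else "0"}
    if ctx.parent_span_id:
        headers[PARENT_SPAN_ID] = ctx.parent_span_id
    return headers


def extract_child(headers: Dict[str, str]) -> TraceContext:
    """Receiving side: continue the caller's trace with a new child span."""
    h = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    trace_id = h.get(TRACE_ID.lower())
    if not trace_id:
        # No incoming context: this service is the entry point, so start a new trace.
        return TraceContext(secrets.token_hex(8), secrets.token_hex(8), None, True)
    return TraceContext(
        trace_id=trace_id,
        span_id=secrets.token_hex(8),          # new span for the work done here
        parent_span_id=h.get(SPAN_ID.lower()),  # the caller's span becomes our parent
        sampled=h.get(SAMPLED.lower(), "1") == "1",
    )
```

Every service in the chain only has to run `extract_child` on incoming requests and `inject` on outgoing ones; the tracing backend then stitches the spans together by trace ID.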

http://trace.risingstack.com

Databases

Getting database insight
• Database automation
  • e.g., DbMaintain
    https://dbmaintain.github.io/
• Database performance logging
  • log4jdbc
    https://github.com/arthurblake/log4jdbc
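log4jdbc works by wrapping the JDBC driver so that every SQL statement is logged together with its execution time. The same idea as a Python sketch, swapped onto the standard-library `sqlite3` module; `TimedConnection` and its slow-query threshold are hypothetical, and this is not log4jdbc's API.

```python
import logging
import sqlite3
import time
from typing import Any, List, Sequence, Tuple

log = logging.getLogger("sql.timing")


class TimedConnection:
    """Wraps a sqlite3 connection and records (sql, milliseconds) for every execute()."""

    def __init__(self, conn: sqlite3.Connection, slow_ms: float = 100.0):
        self.conn = conn
        self.slow_ms = slow_ms  # hypothetical threshold for warning-level logs
        self.timings: List[Tuple[str, float]] = []

    def execute(self, sql: str, params: Sequence[Any] = ()) -> sqlite3.Cursor:
        start = time.perf_counter()
        try:
            return self.conn.execute(sql, params)
        finally:
            # Measures statement execution only; fetching rows afterwards is not included.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.timings.append((sql, elapsed_ms))
            level = logging.WARNING if elapsed_ms >= self.slow_ms else logging.DEBUG
            log.log(level, "%.2f ms  %s  params=%r", elapsed_ms, sql, tuple(params))

    def slowest(self, n: int = 5) -> List[Tuple[str, float]]:
        """The n slowest statements seen so far, slowest first."""
        return sorted(self.timings, key=lambda t: t[1], reverse=True)[:n]
```

Logging the parameters alongside the SQL is what makes a slow query reproducible, but it may also capture personal data, so a real setup would mask them.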

3rd party calls

Monitoring external API calls
• Monitor
  • number of calls
  • response times
  • errors
• Netflix OSS Hystrix
  • circuit breaker
  • trip count
• If you create a public API, please echo the request headers back in the response
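Hystrix wraps each external call, counts calls, errors and latency, and trips a circuit breaker when too many calls fail, so the application fails fast instead of piling up requests on a broken dependency. A simplified sketch of that pattern: it trips after a number of consecutive failures and retries after a fixed cool-down, whereas Hystrix itself decides on an error percentage within a rolling time window. `CircuitBreaker`, its parameters and `CircuitOpenError` are hypothetical names.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed
        # The numbers worth monitoring for every external API:
        self.calls = 0
        self.errors = 0
        self.trips = 0
        self.response_times_ms: List[float] = []

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            raise CircuitOpenError("failing fast: dependency marked unhealthy")
        # Closed, or cool-down over ("half-open"): let the call through.
        half_open = self.opened_at is not None
        self.calls += 1
        start = self.clock()
        try:
            result = fn()
        except Exception:
            self.errors += 1
            self.consecutive_failures += 1
            if half_open or self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
                self.trips += 1
            raise
        finally:
            self.response_times_ms.append((self.clock() - start) * 1000)
        self.consecutive_failures = 0
        self.opened_at = None  # a successful call closes the circuit again
        return result
```

Usage: wrap each outbound request, e.g. `breaker.call(lambda: fetch_exchange_rates())` where `fetch_exchange_rates` stands for any hypothetical client call, and export `calls`, `errors`, `trips` and `response_times_ms` to the monitoring system.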

Human problem solving
- or -
The Dev/Ops dilemma

What open-source offerings still lack
• No project that takes care of automating Ops use cases
• No single umbrella project for
  • Monitoring
  • Log analysis
  • Call tracing
  • DB analysis

You can't fight in here, Gentlemen.
This is the war room!

The commercial hood

Broad technology support

Zero-conf, ready-to-run dashboards

Method level insight for code and database

Host, process and network metrics

Call tracing across technologies

Including log analytics

Full Docker insight (zero-conf)

Dedicated support for the most important technologies

Automated baselining, root-cause-analysis, and problem correlation

DevOps is about collaboration.
Collaboration requires documentation.
Automation is implicit documentation.
But there is no automation for
supporting Ops with troubleshooting.

What did you learn?
@MartinGoodwell
martin.gutenbrunner@dynatrace.com

TEAM
And don't forget
about the A-hole.

Thank you!