Management and exposure of application programming interfaces (APIs) are hot topics in the world of the Internet of Things. But how do you make sure that your APIs are always reachable, scalable, and capable of processing high volumes of requests with zero downtime? Share common patterns and best practices, and gain insights into building the most powerful, robust APIs that can serve millions of things concurrently.
-- SAP TechEd && d-code Berlin (November 13 2014)
4. “If you add up all the smartphones and the tablets
and the digital televisions and the PCs... we see a
large opportunity of perhaps 3 billion to 4 billion
units per annum, but we see an embedded market
that’s maybe 30 billion to 40 billion units per
annum”
- ARM CEO Warren East
5. Problem definition
For example, an application that depends on 30
services, each with 99.99% uptime, gets:
99.99%^30 ≈ 99.7% uptime
0.3% of 1 million requests = 3,000 failures
2+ hours downtime/month even if all dependencies have excellent
uptime.
Reality is generally worse.
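The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the 30 services fail independently and that a month has 30 days; the function name `compound_uptime` is made up for illustration.

```python
def compound_uptime(per_service_uptime: float, n_services: int) -> float:
    """Uptime of a system that needs all n services up at once.

    Assumes failures are independent, which is optimistic in practice.
    """
    return per_service_uptime ** n_services


uptime = compound_uptime(0.9999, 30)
failures_per_million = (1 - uptime) * 1_000_000
downtime_hours_per_month = (1 - uptime) * 30 * 24  # assumes a 30-day month

print(f"uptime: {uptime:.1%}")
print(f"failures per 1M requests: {failures_per_million:.0f}")
print(f"downtime: {downtime_hours_per_month:.1f} hours/month")
```

The results land close to the rounded figures above: roughly 99.7% uptime, about 3,000 failed requests per million, and a little over 2 hours of downtime per month.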
8. Design principles
• Restrict any single dependency from using up all user threads.
• Shed load and fail fast instead of queueing.
• Provide fallbacks wherever feasible to protect users from failure
• Use isolation techniques (such as bulkhead, swimlane and circuit breaker
patterns) to limit impact of any one dependency.
• Optimize for time-to-discovery through near real-time metrics, monitoring
and alerting
• Optimize for time-to-recovery with low-latency propagation of configuration
changes and support for dynamic property changes in virtually all aspects of
Hystrix, so that operators can make real-time modifications with low-latency
feedback loops.
• Protect against failures in the entire dependency client execution, not just
in the network traffic.
9. Use timeouts
Time out calls that take longer than a defined threshold. A
default exists, but for most dependencies the timeout is
custom-set via properties to be just slightly higher than the
measured 99.5th percentile latency of that dependency.
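One way to derive and apply such a timeout is sketched below in plain Python, not Hystrix itself. The helper names (`percentile`, `timeout_for`, `call_with_timeout`), the 10% margin and the pool size are assumptions made for illustration.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def timeout_for(latencies_ms, margin=1.1):
    """Timeout slightly above the measured 99.5th percentile (assumed +10%)."""
    return percentile(latencies_ms, 99.5) * margin


_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s):
    """Run fn on a worker thread and give up waiting after timeout_s seconds."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a Python thread that is already running
        # cannot be interrupted, so the caller just stops waiting.
        future.cancel()
        raise
```

The caller is released after the timeout even though the worker thread may still be busy, which is why timeouts are usually combined with a bounded pool per dependency.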
10. Bulkheads
Maintain a small thread pool (or semaphore) for
each dependency; if it becomes full, commands
are rejected immediately instead of being queued.
Dependencies with clogged thread pools shouldn’t
hinder access to other dependencies.
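The semaphore variant of a bulkhead can be sketched as follows. This is an illustrative Python version, not the Hystrix implementation; `Bulkhead` and `BulkheadFullError` are hypothetical names.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the dependency already has its maximum calls in flight."""


class Bulkhead:
    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject at once instead of queueing the caller.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("dependency is saturated")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

Giving each dependency its own `Bulkhead` instance means a slow dependency can exhaust only its own slots, not the threads that serve other dependencies.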
11. Circuit breakers
Trip a circuit breaker, automatically or manually,
to stop all requests to a service for a period of
time if its error percentage passes a threshold.
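A deliberately simplified circuit breaker is sketched below. It counts all calls since the last reset rather than using a rolling window, lets all traffic through again once the open period ends, and is not thread-safe. The class name, parameter names and default values are assumptions for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is short-circuited because the breaker is open."""


class CircuitBreaker:
    def __init__(self, error_threshold_pct=50.0, min_requests=20,
                 open_seconds=5.0, clock=time.monotonic):
        self._threshold = error_threshold_pct
        self._min_requests = min_requests
        self._open_seconds = open_seconds
        self._clock = clock
        self._successes = 0
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def trip(self):
        """Open the circuit manually."""
        self._opened_at = self._clock()

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._open_seconds:
                raise CircuitOpenError("circuit is open")
            # Open period elapsed: close the circuit and start counting afresh.
            self._opened_at = None
            self._successes = 0
            self._failures = 0
        try:
            result = fn()
        except Exception:
            self._failures += 1
            total = self._successes + self._failures
            error_pct = 100.0 * self._failures / total
            if total >= self._min_requests and error_pct >= self._threshold:
                self.trip()
            raise
        self._successes += 1
        return result
```

The `min_requests` guard keeps a single early failure from opening the circuit before there is enough traffic to judge the error percentage.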
12. Fallback logic
Perform fallback logic when a request
fails, is rejected, times out or is short-circuited.
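In code, a fallback can be as small as the wrapper below. It is a sketch that assumes every kind of failure (error, rejection, timeout, short-circuit) surfaces as an exception; `fetch_recommendations` and `default_recommendations` are hypothetical stand-ins for a real dependency call and its degraded answer.

```python
def with_fallback(primary, fallback):
    """Return primary(); if it raises, return fallback() instead."""
    try:
        return primary()
    except Exception:
        # Failure, rejection, timeout and short-circuit are all handled
        # the same way here, as long as each is signalled by an exception.
        return fallback()


def fetch_recommendations(user_id):
    # Hypothetical dependency call that is currently timing out.
    raise TimeoutError("recommendation service timed out")


def default_recommendations():
    # Hypothetical static answer that does not need the backend.
    return ["top-10 titles"]


result = with_fallback(lambda: fetch_recommendations("u1"),
                       default_recommendations)
print(result)
```

The fallback itself should avoid network calls where possible, otherwise it can fail for the same reason as the primary call.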
13. Measure
Measure success, failures
(exceptions thrown by client),
timeouts, and thread
rejections.
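A minimal in-memory counter for these four outcomes might look like this. The outcome labels and the `CommandMetrics` class are assumptions for illustration; a production system would typically use a rolling time window and export the numbers to a monitoring backend.

```python
from collections import Counter

OUTCOMES = ("success", "failure", "timeout", "thread_rejection")


class CommandMetrics:
    def __init__(self):
        self._counts = Counter()

    def record(self, outcome):
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self._counts[outcome] += 1

    def count(self, outcome):
        return self._counts[outcome]

    def error_percentage(self):
        """Share of recorded calls that did not succeed, in percent."""
        total = sum(self._counts.values())
        if total == 0:
            return 0.0
        return 100.0 * (total - self._counts["success"]) / total
```

An error percentage computed this way is the kind of signal a circuit breaker can compare against its threshold.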
14. Request collapsing
Collapse multiple concurrent user requests
into a single backend dependency call
(within a short time window of, e.g., 10 ms).
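Request collapsing can be sketched with asyncio as below. This is an illustrative version, not the Hystrix collapser: `Collapser` is a hypothetical class, and `batch_fn` is assumed to be an async function that takes a list of keys and returns a dict mapping each key to its value.

```python
import asyncio


class Collapser:
    def __init__(self, batch_fn, window_s=0.010):
        self._batch_fn = batch_fn
        self._window_s = window_s
        self._pending = {}         # key -> Future shared by all callers
        self._flush_scheduled = False
        self._tasks = set()        # strong references to running flush tasks

    async def get(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        future = self._pending[key]
        if not self._flush_scheduled:
            self._flush_scheduled = True
            task = loop.create_task(self._flush_later())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        return await future

    async def _flush_later(self):
        # Wait for the collapsing window, then send one batched call.
        await asyncio.sleep(self._window_s)
        batch, self._pending = self._pending, {}
        self._flush_scheduled = False
        try:
            results = await self._batch_fn(list(batch))
        except Exception as exc:
            for future in batch.values():
                if not future.done():
                    future.set_exception(exc)
            return
        for key, future in batch.items():
            if not future.done():
                future.set_result(results.get(key))
```

Callers pay up to one window of extra latency in exchange for fewer backend calls, so the window should stay small relative to the dependency's normal response time.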
15. Request caching
Reduce the number of requests sent to the
backend dependencies by caching and de-duplicating
requests.
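A request-scoped cache is one simple way to de-duplicate calls. The sketch below assumes one `RequestCache` instance is created per incoming user request and discarded afterwards; the class and the `load_title` loader are hypothetical.

```python
class RequestCache:
    """Remembers results for the lifetime of one incoming request."""

    def __init__(self):
        self._values = {}

    def get_or_load(self, key, loader):
        # Call the backend only the first time a key is seen.
        if key not in self._values:
            self._values[key] = loader(key)
        return self._values[key]


backend_calls = []


def load_title(title_id):
    # Hypothetical backend call; the list records how often it runs.
    backend_calls.append(title_id)
    return {"id": title_id}


cache = RequestCache()
first = cache.get_or_load("t1", load_title)
second = cache.get_or_load("t1", load_title)
print(len(backend_calls))
```

Scoping the cache to a single request avoids invalidation problems, because nothing outlives the request that loaded it.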
16. Define a pipeline and context
Many services share base functionality such as
authentication. Defining a clear request pipeline and
context consolidates shared logic and prevents
repeated calls (e.g. getCustomer).
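The idea can be sketched as a list of stages that all receive the same context object. Everything here is hypothetical (`RequestContext`, `get_customer`, the stage names); the point is only that the customer is looked up once and later stages read it from the context.

```python
from dataclasses import dataclass, field
from typing import Optional

customer_lookups = []


def get_customer(token):
    # Hypothetical backend call; the list records how often it runs.
    customer_lookups.append(token)
    return {"id": "c42", "token": token}


@dataclass
class RequestContext:
    token: str
    customer: Optional[dict] = None
    response: dict = field(default_factory=dict)


def authenticate(ctx):
    if not ctx.token:
        raise PermissionError("missing token")
    ctx.customer = get_customer(ctx.token)  # the only lookup per request


def load_queue(ctx):
    # Reuses the customer from the context instead of fetching it again.
    ctx.response["queue_owner"] = ctx.customer["id"]


def load_recommendations(ctx):
    ctx.response["recommendations_for"] = ctx.customer["id"]


def run_pipeline(ctx, stages):
    for stage in stages:
        stage(ctx)
    return ctx


ctx = run_pipeline(RequestContext(token="abc"),
                   [authenticate, load_queue, load_recommendations])
print(ctx.response)
```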
17. Don’t lock the bonnet
Make it possible to switch on logging and direct certain
traffic to a specific node
18. REST vs Experience API
/users/<id>/ratings/title
/users/<id>/queues
/users/<id>/queues/instant
/users/<id>/recommendations
/catalog/titles/movie
/catalog/titles/series
/catalog/people
VS