JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
1. The Economies of Scaling Software
Abdelmonaim Remani
@PolymathicCoder
2. Creative Commons Attribution-NonCommercial 3.0 Unported License
The graphics and logos in this presentation belong to their rightful owners
3. About Me
• Platform Architect at just.me Inc.
• JavaOne RockStar and frequent speaker at many developer events and conferences, including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc.
• Open-source advocate and contributor
• Active community member
• The NorCal Java User Group
• The Silicon Valley Dart Meetup
Bio: http://about.me/PolymathicCoder
Twitter: @PolymathicCoder
Email: abdelmonaim.remani@gmail.com
SlideShare: http://www.slideshare.net/PolymathicCoder/
|
@PolymathicCoder
5. The Title of the Talk
• The Economies of Scale
• “In microeconomics, economies of scale are the cost
advantages that enterprises obtain due to size [...] often
operational efficiency is [...] greater with increasing scale [...]” Wikipedia
7. Blurred Lines…
• It used to be that only the enterprise worried about scalability
• The rise of social media and the ubiquity of mobile
• Exponential growth of internet traffic
• The creation of a spoiled user base
• “I want to see the closest Moroccan restaurants to my current location on a map, along with consumer ratings and whether any of my friends have checked in within the last 30 days”
• The lines between consumer applications and enterprise applications are blurred
8. The Bar Is Higher!
Scalability is everyone’s problem…
10. The Common Definition
• The ability of an application to handle an increasing amount of work without performance degradation
• Not a good definition! It implies that:
• You’ll need to scale forever
• Scalability is relative: it is bounded by one’s specific needs
• You’ll need to be fully scalable from day one
• Scalability is evolutionary: it is a gradual process
• There are no external constraints
• Unrealistic
11. A Better Definition
• The ability of an application to gracefully evolve within
the constraints of its ecosystem in order to handle the
maximum potential amount of work without
performance degradation
• Work?
• Simultaneous requests
• Performance degradation?
• Increased latency or decreased throughput
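One standard way to tie "simultaneous requests", latency, and throughput together (this relation is not on the original slide) is Little’s Law for a system in a steady state, where L is the average number of requests in flight, λ the throughput (requests per second), and W the average latency:

```latex
L = \lambda W
\qquad\text{e.g.}\qquad
\lambda = 200\ \text{req/s},\; W = 0.5\ \text{s} \;\Rightarrow\; L = 100
```

At the same arrival rate, if latency degrades to W = 1 s, the system has to hold 200 requests in flight at once, which is exactly the kind of load increase the definition above is concerned with.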
12. A Black Art!
• Don’t be surprised if
• Your application supports one million users
• You add one more feature
• Suddenly a load of 500,000 users crashes your system or renders it unusable
14. Syllogism
• To scale is to reduce latency
• To reduce latency is to address bottlenecks
• Therefore, to scale is to address bottlenecks
• The usual suspects
• The CPU
• Storage I/O
• Network I/O
• All three are interrelated
16. Overcoming the CPU Bottleneck
• Nothing affects the CPU more than the instructions it is asked to execute
• This is about your application
• How it is written (architecture, code base, etc.)
• How it is deployed
18. Architecture?
• “Things that people perceive as hard-to-change” –Martin Fowler
• http://martinfowler.com/ieeeSoftware/whoNeedsArchitect.pdf
• Decisions you commit to: the ones that will stay with you forever
19. Be Wise… Think Twice…
• Choose the right technologies
• Platform
• Languages
• Frameworks
• Libraries
• Make the right abstractions
• Loosely coupled components
• Functional abstractions
• Technical abstractions
• Make sure that technical abstractions are subordinate to functional abstractions, not the other way around
21. Write Good Code
• Think your algorithms through and mind their complexity
(Asymptotic Complexity, Cyclomatic Complexity, etc…)
• SOLIDify your design
• Single Responsibility, Open-Closed, Liskov Substitution,
Interface Segregation, and Dependency Inversion
• Understand the limitations of your technology and leverage its strengths
25. You do all that…
You’ll end up with…
At best…
The fading tradition of making cow dung piles
http://news.ukpha.org/2011/01/the-fading-tradition-of-making-cow-dung-piles/
27. Technical Debt
• What is it?
• The quick-and-dirty code you are not proud of
• What you would have done differently had you had the time
• It’s only a matter of time before it starts to smell really bad
• What to do?
• Recognizing it as debt is a good thing in itself
• Keep tabs and refactor often
• Cut the right corners
• Don’t mortgage the architecture (don’t lock yourself out)
29. Vertical Scaling
• Vertical Scaling (Scaling Up)
• On a single-node system
• Adding more computing resources to the node (Getting a beefier
machine)
• Writing code to harness the full power of the one node
30. Parallelism At The Node Level
• Writing concurrent code (code that executes simultaneously)
• Simple business logic running inside a container is already multithreaded, since the container handles requests concurrently
• Executing complex business logic within a reasonable
time
• Break it into smaller steps
• Execute them in parallel
• Aggregate data back
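The break-up / run-in-parallel / aggregate steps above can be sketched with Java 8’s `CompletableFuture`. This is a minimal sketch, not the author’s code: `fetchOrders`, `fetchRatings`, `fetchCheckins` and `dashboardScore` are hypothetical stand-ins for slow, independent steps of some complex business logic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical request whose three steps are independent of one another.
class ParallelSteps {
    // Stand-ins for slow, independent pieces of business logic.
    static int fetchOrders(String user)   { return user.length() * 10; }
    static int fetchRatings(String user)  { return user.length() * 2; }
    static int fetchCheckins(String user) { return user.length(); }

    static int dashboardScore(String user, ExecutorService pool) {
        // Break the work into smaller steps and run them in parallel on the pool.
        CompletableFuture<Integer> orders =
                CompletableFuture.supplyAsync(() -> fetchOrders(user), pool);
        CompletableFuture<Integer> ratings =
                CompletableFuture.supplyAsync(() -> fetchRatings(user), pool);
        CompletableFuture<Integer> checkins =
                CompletableFuture.supplyAsync(() -> fetchCheckins(user), pool);
        // Aggregate the partial results once every step has completed.
        return orders.thenCombine(ratings, Integer::sum)
                     .thenCombine(checkins, Integer::sum)
                     .join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(dashboardScore("alice", pool)); // 50 + 10 + 5 = 65
        } finally {
            pool.shutdown();
        }
    }
}
```

The total latency is roughly that of the slowest step rather than the sum of all three, as long as the pool has enough threads.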
31. Easier Said Than Done…
• Moore’s Law
• Performance gains were automatically realized by software (code ran faster on faster hardware)
• Nothing is forever…
• The era of the multi-core chip
• We need to write code to take advantage of all cores
32. Easier Said Than Done…
• Synchronizing state across threads running on multiple cores
• Good luck!
• Rely on frameworks and libraries (Fork/Join, Akka, etc…)
etc…)
• Go immutable
• Not always straightforward or possible
• Go functional (Scala, Clojure, etc…)
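As an illustration of leaning on a framework and keeping data immutable, here is a minimal sketch using the JDK’s Fork/Join framework (`RecursiveTask`, `ForkJoinPool`, Java 7+; `commonPool()` needs Java 8). The summing task and the `THRESHOLD` value are illustrative assumptions, not taken from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a read-only array by recursively splitting it in half. The array is never
// written, so the subtasks share no mutable state and need no locking.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, just loop (arbitrary)
    private final long[] data;
    private final int from;
    private final int to; // half-open range [from, to)

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for the left half, then combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }
}
```

The framework’s work-stealing pool keeps all cores busy; the code itself never touches a lock.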
33. It Gets More Interesting…
• Amdahl’s Law
• Throwing more cores at a problem does not necessarily result in a performance gain
• Returns diminish at some point no matter how many cores you throw in, because the serial part of the work does not speed up
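Amdahl’s Law makes the diminishing return concrete. If a fraction p of the work can be parallelized and the rest is serial, the speedup on N cores is:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

p = 0.95:\quad S(8) = \frac{1}{0.05 + 0.11875} \approx 5.9, \qquad \lim_{N \to \infty} S(N) = 20
```

Even with 95% of the work parallelized, no number of cores gets you past a 20× speedup.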
34. Miscellaneous
• Leverage probabilistic data structures and algorithms
• Bloom filters, quotient filters, etc…
• Go Reactive
• http://www.reactivemanifesto.org/
• RxJava, Spring Reactor, etc…
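To make the Bloom filter bullet concrete, here is a minimal sketch. The bit-array size, the number of hash functions, and the hash choice (`Arrays.hashCode` plus a hand-rolled 32-bit FNV-1a, combined by double hashing) are illustrative assumptions; in practice you would more likely use an existing implementation such as Guava’s `BloomFilter`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Answers "definitely not added" or "possibly added" using a fixed-size bit array.
class BloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits, m
    private final int hashCount; // number of hash functions, k

    BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing: h1 + i * h2).
    private int position(byte[] item, int i) {
        int h1 = Arrays.hashCode(item);
        int h2 = fnv1a(item) | 1; // force it odd so it is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    void add(String item) {
        byte[] b = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) bits.set(position(b, i));
    }

    boolean mightContain(String item) {
        byte[] b = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(b, i))) return false; // definitely not added
        }
        return true; // possibly added: false positives can occur, false negatives cannot
    }
}
```

A typical use is a cheap in-memory check before an expensive lookup: if `mightContain` returns false, the storage or network round trip can be skipped entirely.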
36. Horizontal Scaling
• Horizontal Scaling (Scaling Out)
• On a distributed system (A cluster)
• Adding more nodes
• Writing code to harness the full power of the cluster
37. Topology
• A typical cluster consists of
• A number of identical application server nodes behind a load
balancer
38. Topology
A number?
• It depends on how many you actually need and can
afford
• Elastic Scaling / Auto-Scaling
• The number of live nodes within the cluster shrinks and grows
depending on the load
• Nodes are provisioned or terminated as needed
39. Topology
Identical?
• Application nodes are cloned from image files (Ex. AWS EC2 AMIs)
• Configuration management tools (Chef, Puppet, Salt, etc.)
40. Topology
Load balancer?
• Load is evenly distributed across live nodes according to
some algorithm (Round-Robin typically)
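Round-robin itself is simple; the sketch below shows the selection logic a load balancer applies. In a real deployment the load balancer (hardware, AWS ELB, HAProxy, etc.) does this for you; the class and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin: hand each new request to the next live node in turn (thread-safe).
class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no live nodes");
        this.nodes = new ArrayList<>(nodes);
    }

    String pick() {
        // floorMod keeps the index in range even after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}
```

With nodes `app-1`, `app-2`, `app-3`, successive calls to `pick()` return `app-1`, `app-2`, `app-3`, `app-1`, and so on. This spreads load evenly only when requests cost about the same; that is one reason least-connections and similar algorithms also exist.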
41. Managing State
• Session data
• Session Replication
• Session Affinity / Sticky Session
• Requests from the same client are routed to the same node
• When the node dies, the session data dies with it
• Shared Session / Distributed Session
• Session data is in a “centralized” location
• Go Stateless
• No session data (Any node would do)
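Session affinity can be sketched as hashing the session ID onto the list of live nodes. This is one possible approach, assumed here for illustration; many load balancers use a cookie instead. The sketch also shows the downside noted above: when the pinned node goes away, that session lands somewhere its data does not exist. `StickyRouter` is a hypothetical, non-thread-safe helper.

```java
import java.util.ArrayList;
import java.util.List;

// Session affinity: the same session ID always maps to the same node,
// as long as the set of live nodes does not change. Not thread-safe.
class StickyRouter {
    private final List<String> nodes;

    StickyRouter(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    String route(String sessionId) {
        return nodes.get(Math.floorMod(sessionId.hashCode(), nodes.size()));
    }

    // When a node dies, sessions pinned to it lose their data, and because the
    // modulus changes, many other sessions may be remapped to new nodes as well.
    void remove(String node) {
        nodes.remove(node);
    }
}
```

Going stateless removes the problem altogether: if no node holds session data, any node would do.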
42. Parallelism At The Cluster Level
• Leverage Map/Reduce
• “A programming model for processing large data sets
with a parallel, distributed algorithm on a cluster”
• Apache Hadoop
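The map/reduce programming model is easiest to see with the classic word count. The sketch below is not Hadoop: it runs the same map, group (shuffle), and reduce steps on a single JVM with a parallel stream, purely to illustrate the model that Hadoop distributes across a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Word count in the map/reduce style, run on one JVM with a parallel stream.
class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                // map: each line -> the words it contains
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // shuffle + reduce: group identical words and count each group
                .collect(Collectors.groupingByConcurrent(w -> w, Collectors.counting()));
    }
}
```

For example, `count(Arrays.asList("The cat", "the dog!"))` yields `the=2`, `cat=1`, `dog=1`.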
43. Miscellaneous
• How to handle HTTPS?
• Terminate SSL at the load balancer
• Use a wildcard SSL certificate
• Distributed Lock Manager (DLM)
• Synchronizes access to shared resources
• Google Chubby, Apache ZooKeeper, etc…
• Distributed Transactions
• X/Open XA
49. What Datastore to Use?
• Relational, of course!
• Normalized schema guaranteeing data integrity
• ACID transactions
• No bias toward specific access patterns
• Flexible query language
• As datasets grow
• Scale up (buy beefier machines)
• Database tuning / query optimization
• Create materialized views
• De-normalize
• Etc…
50. Mucho Data!
• No choice but to scale out the RDBMS
• Master/slave clusters
• Sharding
• Failed big time!
• An RDBMS is designed to run on one machine
• Eric Brewer’s CAP theorem of distributed systems
• Pick 2 out of 3: Consistency, Availability, and Partition Tolerance
• The relational model is designed to favor CA, hence it can never support P
51. NoSQL
• A wide range of specialized datastores with the goal of
addressing the challenges of the relational model
• “The whole point of seeking alternatives is that you need
to solve a problem that relational databases are a bad fit
for” –Eric Evans
• A wide variety
• Key-value datastores
• Columnar datastores
• Document datastores
• Graph datastores
52. Polyglot Persistence
• Within the application
• Data is complex and accessed in many different ways
• Why should we fit it into one storage model?
• Polyglot Persistence is about
• Leveraging multiple data stores based on the specific way the
data is stored and accessed
• For more info:
• Check out my talk on YouTube from JAX Conf 2012
• “The Rise of NoSQL and Polyglot Persistence”
• http://bit.ly/PCWtWi
54. Caching
• A cache is typically a simple key-value data structure
• Instead of incurring the overhead of data retrieval or
computation every time, you check the cache first
• You can’t cache everything; caches can be configured with different eviction algorithms depending on the use case (LRU, LFU, Bélády's algorithm, etc...)
• Use aggressively!
• What to cache?
• Frequently accessed data (Session data, feeds, etc…)
• Results of intensive computations
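LRU eviction is a good default for "you can’t cache everything". In Java, a minimal single-threaded LRU cache falls out of `LinkedHashMap`’s access-order mode and its `removeEldestEntry` hook. The `LruCache` class and its capacity are illustrative; it is not thread-safe, so concurrent use would need external synchronization or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU eviction: LinkedHashMap in access order drops the least recently used entry.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry when over capacity
    }
}
```

With a capacity of 2, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was used more recently.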
55. Caching
• Where to cache?
• On disk
• File System: Slow and sequential access
• DB: A bit better (data is arranged in structures designed for efficient access, such as indexes)
• Generally a terrible idea (SSDs make things a bit better)
• In-Memory: Fast and random access, but volatile
• Something in between: Persistent caches (Redis, etc…)
• What type of cache?
• Local, Replicated, Distributed, and Clustered
56. Caching
• How to cache?
• Most caches implement a very simple interface
• Always attempt to get from cache first using a key
• If it is a hit, you saved yourself the overhead
• If it is a miss, compute or read from the data store then put in
cache for subsequent gets
• When you update you can evict stale data
• You can set a TTL when you put
• Many other common operations...
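The get / miss / put / evict / TTL flow above is often called the cache-aside pattern. Here is a minimal in-process sketch; `TtlCache`, its `loader` parameter, and the injectable clock are assumptions made for illustration, not a specific caching product’s API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Cache-aside with a per-entry TTL: look in the cache first, load on a miss.
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt; // clock time after which the entry is stale

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis; injectable for tests

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e != null && now < e.expiresAt) {
            return e.value;                                     // hit: overhead saved
        }
        V value = loader.apply(key);                            // miss or expired: compute / read
        entries.put(key, new Entry<>(value, now + ttlMillis));  // put for subsequent gets
        return value;
    }

    // Call when the underlying data is updated, so the next get reloads it.
    void evict(K key) {
        entries.remove(key);
    }
}
```

Two threads that miss on the same key at the same moment may both call the loader; that is acceptable for a sketch, but a production cache would usually coalesce such loads.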
57. Caching Patterns
• Caching Query Results
• Key: Hash of the query itself
• How about parameterized queries?
• Key: Hash of the query itself + Hash of parameter values
• Method/Function Memoization
• Key: Method name
• How about methods with parameters?
• Key: Hash of the method name + Hash of parameter values
• Caching Objects
• Key: Identity of the object
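The key schemes above can be sketched as a small key class plus a memoizer. One deliberate refinement over keying purely on hashes: this sketch keeps the name and parameter values inside the key, so `hashCode()` combines their hashes while `equals()` still compares the actual values and two different calls that happen to share a hash cannot return each other’s results. `CacheKey` and `Memo` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache key = query text (or method name) + parameter values.
final class CacheKey {
    private final String name;
    private final List<Object> params;

    CacheKey(String name, Object... params) {
        this.name = name;
        this.params = Arrays.asList(params.clone()); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) return false;
        CacheKey k = (CacheKey) o;
        return name.equals(k.name) && params.equals(k.params);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, params); // combines the name hash and the parameter hashes
    }
}

// Memoizes one-argument functions, keyed by method name + argument value.
class Memo {
    private final Map<CacheKey, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <A, R> R call(String methodName, A arg, Function<A, R> fn) {
        return (R) cache.computeIfAbsent(new CacheKey(methodName, arg), k -> fn.apply(arg));
    }
}
```

For a parameterized query, `new CacheKey("SELECT * FROM t WHERE id = ?", 42)` plays the same role as "hash of the query + hash of parameter values".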
58. Caching Patterns
• Time-series datasets (Ex. Real-time feed)
• Most of the time pseudo/near real-time is enough
• Use caching to throttle access to resources
• Cache the query result with an expiry of t
• Fresh data is read at most once every t
59. Caching Gotchas
• Profile your code to assess what to cache, and whether you need caching to begin with
• Stale state might bite you hard
• Incoherence: Inconsistent copies of objects cached under multiple keys
• Stale nested aggregates
• The network overhead of misses might outweigh the performance gain of hits
• Consider writing/updating the cache when writing/updating the persistence store
64. Asynchronous Processing
• Resource-intensive tasks cannot practically be handled within a single HTTP request
• Synchronous processing is overused and not necessary most of the
time
65. Asynchronous Processing Patterns
• Pseudo-Asynchronous Processing
• Flow
• Process data / operations in advance
• User requests data or operation
• Respond synchronously with pre-processed result
• Sometimes not possible (Dynamic content, etc...)
66. Asynchronous Processing Patterns
• True Asynchronous Processing
• Flow
• User requests data or an operation
• Acknowledge
• Ex. A REST endpoint that returns a “202 Accepted” HTTP status code
• Do the processing at your own convenience
• Allow the user to check progress
• Optionally notify when processing is completed
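The acknowledge / process later / check progress flow can be sketched in plain Java. `JobService` is hypothetical: the HTTP layer that would answer "202 Accepted" with the job ID is not shown, and jobs live in an in-memory map, so they would be lost on restart; a real system would typically use a durable queue or store. Optional completion notifications could hang off the returned future (for example with `thenAccept`).

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

// "Acknowledge now, process later": submit() returns a job ID immediately and the
// work runs on a background pool; status() lets the client poll for progress.
class JobService {
    enum Status { PENDING, DONE, FAILED }

    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();
    private final ExecutorService workers;

    JobService(ExecutorService workers) {
        this.workers = workers;
    }

    // An HTTP layer (not shown) would answer "202 Accepted" and point the client
    // at a status resource for this ID.
    String submit(Supplier<String> work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(work, workers));
        return id;
    }

    Status status(String id) {
        CompletableFuture<String> job = jobs.get(id);
        if (job == null) throw new IllegalArgumentException("unknown job " + id);
        if (!job.isDone()) return Status.PENDING;
        return job.isCompletedExceptionally() ? Status.FAILED : Status.DONE;
    }

    // Blocks until the job finishes; a real service would return a stored result instead.
    String result(String id) {
        return jobs.get(id).join();
    }
}
```

The request thread only pays for creating the job record; the expensive work runs on `workers` at whatever pace the system can sustain.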
69. Content Delivery Network (CDN)
• Static content
• Binary (Video, Audio, etc…)
• Web objects (HTML, JavaScript, CSS, etc…)
• Do NOT serve through your application server
• Use a CDN
• “A large distributed system of servers deployed in multiple data
centers across the internet”
• Akamai
• AWS CloudFront
70. CDN Gotchas
• Dirty Caches
• script.js is a script file deployed on CDN
• Multiple copies of script.js will be replicated across all edge
nodes of the CDN
• Clients/browsers will keep their own copies of script.js locally
• We update script.js
• Since the new and old versions have the same URI
• New clients will be served the old version by the CDN
• Old clients will continue to use the old version from their
local cache
71. CDN Gotchas
• Dirty Caches
• What to do?
• Simply append a version number to file names
• script-v1.js, script-v2.js, etc…
• Force invalidation of all copies on edge nodes
• Set HTTP caching headers properly
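Versioned file names and caching headers work together: once every release gets a new URI, each URI’s content never changes, so it can safely be cached for a long time. Below is a minimal sketch of a hypothetical build-step helper; the one-year `max-age` is an assumption, and the pages that reference the asset must of course be updated to point at the new name.

```java
// Hypothetical build-step helper: each release gets a new URI (script.js -> script-v2.js),
// so neither CDN edge nodes nor browsers can keep serving a stale copy under the new name.
class AssetVersioning {
    // A versioned URI never changes content, so it can be cached for a long time
    // (one year here; "immutable" is a Cache-Control extension defined in RFC 8246).
    static final String LONG_LIVED_CACHE_CONTROL = "public, max-age=31536000, immutable";

    static String versioned(String fileName, int version) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return fileName + "-v" + version; // no extension (or a dotfile)
        }
        return fileName.substring(0, dot) + "-v" + version + fileName.substring(dot);
    }
}
```

`versioned("script.js", 2)` gives `script-v2.js`, and `versioned("app.min.js", 3)` gives `app.min-v3.js`.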
73. Domain Name System (DNS)
• Do NOT rely on your domain name registrar’s free DNS
• Use a scalable DNS solution
• AWS Route 53
• DynECT
• UltraDNS
• Etc…
• Domain Sharding
• Browsers limit the number of connections per host (usually a maximum of 6)
• Creating multiple subdomains (CNAME entries) allows more resources to be downloaded in parallel
• Watch out for: DNS lookup overhead, HTTPS cost, the browser’s Same-Origin Policy, etc…
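One practical detail of domain sharding: a given asset should always be served from the same shard. If `logo.png` alternated between hosts, browsers would download and cache it twice. Hashing the path is one simple way to get that stability; `DomainSharder` and the `staticN.example.com` hostnames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Domain sharding: spread static assets across several hostnames (CNAMEs), always
// mapping a given path to the same host so cached copies stay reusable.
class DomainSharder {
    private final List<String> hosts;

    DomainSharder(List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("need at least one host");
        this.hosts = new ArrayList<>(hosts);
    }

    String urlFor(String path) {
        // Same path -> same shard; floorMod keeps negative hash codes in range.
        int shard = Math.floorMod(path.hashCode(), hosts.size());
        return "https://" + hosts.get(shard) + "/" + path;
    }
}
```

Each additional hostname costs a DNS lookup (and, over HTTPS, a certificate that covers it), which is why the number of shards is usually kept small.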
75. Remoting
• In a SOA (Service Oriented Architecture)
• RPC calls to multiple services
• Data Exchange (Text vs. Binary)
• SOAP / REST with XML or JSON
• Google Protocol Buffers, Apache Thrift, Apache Avro, etc…
• Protocol
• JMS
• HTTP
• SPDY
79. When Disaster Hits…
• Goal
• Fault-tolerant system
• Restore service and recover data ASAP in case of a disaster
• Be proactive
• Develop a Disaster Recovery Plan (DRP)
• Practice and test your DRP by doing failure drills
81. Scaling Teams
• Hiring
• Always hire top talent
• You are only as strong as your weakest link
• Develop a process to bring people in
• Turnkey Hardware/Software Setup (Vagrant, etc...)
• Arrange for proper access/accounts
• Develop a knowledge base (Architecture documentation,
FAQs, etc...)
• Development Process
• Be Agile
• Refine in the spirit of Six Sigma
82. Scaling Teams
• Team Structure
• Small is good
• Form ad-hoc teams from pools of Agile roles
• Product Owners
• Team Members
• Team Lead (Scrum Master)
• Engineers
• QAs
• Architecture Owners
• Give them ownership of their DevOps
84. The Take-home Message
• The early bird gets the worm
• Design to scale from day one
• Plan for capacity early
• Your needs determine how scalable “your scalable” needs to be
• Do not over-engineer
• Do not bite off more than you can chew
• Building a scalable system is a process
• Commit to a road map around bottlenecks
• Guided by planned business features
• Learn from others’ experiences (Twitter, Netflix, etc...)
85. Take it slow… You’ll get there…
Work smarter not harder…