The Economies of Scaling
Software
Abdelmonaim Remani
@PolymathicCoder
About Me
• Platform Architect at just.me Inc.
• JavaOne RockStar and frequent speaker at many developer events and conferences
including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc...
• Open-source advocate and contributor
• Active Community member
• The NorCal Java User Group
• The Silicon Valley Spring User Group
• The Silicon Valley Dart Meetup
• Bio: http://about.me/PolymathicCoder
• Twitter: @PolymathicCoder
• Email: abdelmonaim.remani@gmail.com
• SlideShare: http://www.slideshare.net/PolymathicCoder/
License
• Creative Commons Attribution Non-Commercial
License 3.0 Unported
• The graphics and logos in this presentation belong to
their rightful owners
• http://speakerscore.com/jaxconf-scalability
• @PolymathicCoder
Let’s Go!
What’s up with the title?
• The Economies of Scale
• “In microeconomics, economies of scale
are the cost advantages that enterprises
obtain due to size [...] Often operational
efficiency is [...] greater with increasing
scale [...]” -Wikipedia
The line is blurred!
• There was a time when only the enterprise worried about issues
like scalability
• The rise of social and the abundance of mobile are responsible
for
• Not only an exponential growth of internet traffic
• But the creation of a spoiled user-base that wants answers
to questions like
• I want to see the closest Moroccan restaurants to my
current location on a map, along with consumer
ratings and whether any of my friends has
checked in within the last 30 days
The bar is high!
• Scalability is everyone’s problem
What is Scalability?
The Common Definition
• The ability of a software to handle an
increasing amount of work without
performance degradation
I have a problem with that definition...
• It implies that a scalable system is one that is
capable of sustaining its scalability forever
• Not realistic; it fails to recognize the external
constraints imposed on the system
• It fails to acknowledge that scalability is relative
• It does not take into account that a system
• Need not be capable of handling the work today
• But simply capable of evolving to handle the work
A better definition
• The ability of an application to gracefully
evolve within the constraints of its
ecosystem in order to handle the
maximum potential amount of work
without performance degradation
Easier said than done!
• A black art
• No surprise here!
• An application that supports 1 million
users
• You add one new feature
• 500,000 users crash your system
The Bottlenecks
• Scaling is about relieving or managing these limitations or
constraints that we call the bottlenecks
• When we talk about bottlenecks in computing, we talk about the
usual suspects
• The CPU
• Storage or I/O
• The Network
• Inter-related
• The rest of this talk is structured around these bottlenecks to make
the case that one’s scalability needs are to be addressed in that
fashion
The CPU Bottleneck
• Nothing affects the CPU more than the
instructions it is summoned to execute
• In other words, this is about the very code
of your application
A Scalable Architecture
Architecture?
• Architecture
• “Things that people perceive as hard-to-
change” - Martin Fowler
• http://martinfowler.com/ieeeSoftware/whoNee
• Decisions you commit to; the ones that will
be stuck with you forever
Be wise...Think twice...
Choosing the right technologies
• Platform
• Languages
• Frameworks
• Libraries
Making the right abstractions
• Technical Abstractions
• Functional Abstractions
• Make sure that the former is subordinate to the latter and not the other way
around
Write Good Code
• Think your algorithms through and mind their complexity (Asymptotic Complexity,
Cyclomatic Complexity, etc...)
• SOLIDify your Design
• Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and
Dependency Inversion
• Understand the limitations of your technology and leverage its strengths
• Don’t be afraid to be Polyglot
• Obsess with testing
• TDD/BDD
• Tools
• Static code analyzers (PMD, FindBugs, Etc...)
• Profilers (Detect memory leaks, bottlenecks, etc...)
• Etc...
Know Your S#!t
• Read
• The classics: The Mythical Man-Month
• GoF’s “Design Patterns”
• Eric Evans’ “Domain-Driven Design”
• Every book by Martin Fowler
• Uncle Bob’s “Clean Code”
• Josh Bloch’s “Effective Java”
• Brian Goetz’s “Java Concurrency in Practice”
• Etc...
• The list is long ...
We do all that... and still end up with this...
• The fading tradition of making cow dung piles
Still much better than this...
Technical Debt is a Reality
• It is inevitable... You will incur it one way or another, deliberately or not
• The quick-and-dirty you are not proud of
• Things you would/should do differently
• Anyways, after a while it starts to smell...
• The bright side
• The fact that it is recognized as a debt is good
• Keep track and refactor
• For the fearless... Be wise and think twice before you do it
• Cut the right corners
• Don’t lock yourself out
• Don’t make it a part of your architecture
Scaling Up Your
Application
Parallelism
• Parallelism?
• Writing concurrent code or simultaneously executing code
• Most write code that runs within web containers by extending
framework classes that are already multi-threaded
• Sometimes the complexity of the business logic demands that we
break it into smaller steps, execute them in parallel, then
aggregate data back to get a result within a reasonable amount of
time
• This is not easy!
• Often requires synchronizing state, which is a nightmare
Vertical Scaling
• Vertical Scaling (Scaling up)
• A single-node system
• Adding more computing resources to the
node (Get a beefier machine)
• Writing code to harness the full power of
the one node
Easier said than done...
• On the one machine, we have been reaping the benefit of Moore’s Law
• Performance gain is automatically realized by software (In other
words, code is faster on faster hardware)
• The end of Moore’s Law: the birth of the multi-core chip
• We actually need to write code to take advantage of this
• Good news! There are frameworks and libraries that make it a lot easier
• Fork/Join in Java
• Akka
• Etc...
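As an illustration of the Fork/Join style mentioned above, here is a minimal sketch that sums an array by splitting it into halves until the pieces are small enough to add up sequentially. The class name ParallelSum and the THRESHOLD value are assumptions made for this example; the right cut-off depends on the workload and should be found by profiling.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a long[] by divide and conquer.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cut-off, tune by profiling
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum sequentially, no shared mutable state involved.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                      // schedule the left half on the pool
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(ParallelSum.sum(data)); // prints 500000500000
    }
}
```

Each subtask only reads its own slice of the array and returns a value, so there is nothing to synchronize; that is what keeps this kind of parallelism manageable.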
Easier said than done...
• Challenges
• What about dependencies and 3rd Party code?
• Synchronizing state just got HARDER across cores! Too
many cooks!
• Frankly, this shared state deal is a real pain
• Get a life and do without
• Go immutable (Not always straightforward, and
sometimes not even possible)
• Go “Functional” (No guts... no glory...)
It gets more interesting...
• Amdahl’s Law
• Throwing more cores at a problem does not necessarily result in a
performance gain
• We end up with diminishing returns at some point, no
matter how many cores we throw in
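Amdahl’s Law can be written as speedup(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The sketch below just evaluates that formula; the class name Amdahl and the value p = 0.90 are assumptions chosen for illustration.

```java
// Evaluates Amdahl's Law for a given parallel fraction and core count.
class Amdahl {
    // p: fraction of the work that can be parallelized (0.0 to 1.0)
    // n: number of cores
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Assumed workload: 90% parallelizable, 10% strictly sequential.
        for (int n : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("%4d cores -> %.2fx%n", n, speedup(0.90, n));
        }
    }
}
```

With 10% of the work sequential, 8 cores give roughly a 4.7x speedup, and no number of cores can ever reach 10x, since the limit is 1 / (1 - p).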
Scaling Out Your
Application
Horizontal Scaling
• Horizontal Scaling (Scaling out)
• A distributed system (A cluster)
• Adding more nodes
• Writing code to harness the full power of
the cluster
Topology
• A typical cluster consists of
• A number of identical application server nodes behind a load balancer
A number?
• It depends on how many you actually need and can afford
• Elastic Scaling / Auto-Scaling
• The number of live nodes within the cluster shrinks and grows depending on the load
• New ones are provisioned or terminated as needed
Identical?
• Application nodes are cloned off of image files (Ex. AWS EC2 AMIs, etc...)
• Configuration Management tool (Chef, Puppet, Salt, etc...)
Load balancer?
• Load is evenly distributed across live nodes according to some algorithm (Round-Robin typically)
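To make the Round-Robin idea concrete, here is a minimal sketch of a selector that hands out nodes in rotation. The class name RoundRobinBalancer and the node names are hypothetical; a real load balancer also tracks node health and removes dead nodes from the rotation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a fixed list of node addresses.
class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    // Returns the next node in rotation; safe to call from many threads.
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
                new RoundRobinBalancer(Arrays.asList("app-1", "app-2", "app-3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next()); // app-1, app-2, app-3, then app-1 again
        }
    }
}
```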
Topology
Managing State
• Session data
• Session Replication
• Session Affinity/Sticky Session
• Requests from the same client always get routed back to the
same server
• When the node dies, session data die with it
• Shared/Distributed Session
• Session is in a centralized location
• Do yourself a favor and go stateless!
• No session data
• Any server would do
Parallelism
• Leverage MapReduce
• “A programming model for processing
large data sets with a parallel, distributed
algorithm on a cluster”
• Hadoop
Misc
• Distributed Lock Manager (DLM)
• Synchronized access to shared resources
• Google Chubby
• Zookeeper
• Hazelcast
• Terracotta
• Etc...
• Distributed Transactions
• X/Open XA
• HTTPS
• End at the load balancer
• Wildcard SSL
• Leverage probabilistic data structures and algorithms
• Bloom filters
• Quotient filters
• Etc...
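As a sketch of what a probabilistic data structure looks like, here is a tiny Bloom filter: it can answer "definitely not present" or "possibly present" using a fixed number of bits. The class name BloomFilter and its hash derivation are assumptions made for this example; the hashing is deliberately simplistic, and a production filter would use a stronger hash function and size the bit array from the expected item count and target false-positive rate.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch for strings (illustrative hashing only).
class BloomFilter {
    private final BitSet bits;
    private final int size;   // number of bits
    private final int hashes; // number of bit positions set per item

    BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive positions differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String item) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(item, i));
        }
    }

    // false: the item was never added. true: it may have been added.
    boolean mightContain(String item) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A typical use is to put the filter in front of an expensive lookup: if mightContain returns false, the lookup can be skipped entirely.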
Deployment
• Environments
• Multiple Development, Test, Stage, and
Production
• Automatic Configuration Management
• Practice Continuous Delivery
• Leverage The Cloud
• IaaS, PaaS, SaaS, and NaaS
The Storage Bottleneck
• Storage or I/O is usually the most
significant bottleneck
The Persistent
Datastore
What datastore to use?
What kind of question is that?
• There was a time when the obvious choice was the relational model
• Schema that guarantees data integrity
• Data Normalized (minimized redundancies, no modification anomalies, etc...)
• ACIDity (Atomicity, Consistency, Isolation, and Durability)
• Data is stored in a way that is independent of how the data is to be accessed (No bias
towards any particular query pattern)
• Flexible query language
• As our datasets grew, we scaled vertically
• Buying beefier machines
• Database tuning / Query Optimization
• Creating Materialized Views
• De-normalizing
• Etc...
Mucho Data!
• We hit the limit of the one machine
• Attempted to scale the RDBMS horizontally
• Master/Slave clusters
• Data Sharding
• We failed...Why?
• Eric Brewer’s CAP Theorem on distributed systems
• Pick 2 out of 3
• Consistency
• Availability
• Partition Tolerance
• The relational model is designed to favor CA over P
• It cannot be scaled horizontally
NoSQL
• A wide range of specialized data stores with the goal of
addressing the challenges of the relational model
• “The whole point of seeking alternatives is that you need to
solve a problem that relational databases are a bad fit for” -Eric
Evans
• A wide variety
• Key-Value Data stores
• Columnar Data stores
• Document Data stores
• Graph Data stores
Polyglot Persistence
• Acknowledging
• The complexity and variety of data and data access
patterns within the one application
• The absurdity of the idea that all data should be
fitted into one storage model
• Proposing a solution that
• Leverages multiple data stores within the one
application based on the specific way the data is
stored and accessed
For more details...
• Check out my talk from JAX Conf 2012
• The Rise of NoSQL and Polyglot
Persistence
• YouTube Video:
• http://bit.ly/PCWtWi
Caching
• A cache is typically a simple key-value data structure
• Instead of incurring the overhead of data retrieval or
computation every time, you check the cache first
• Since we can’t cache everything, caches can be configured to
use multiple algorithms depending on the use cases (LRU,
Bélády's Algorithm, Etc...)
• Use aggressively!
• What to cache?
• Frequently accessed data (Session data, feeds, etc...)
• Long computation results
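As an example of one eviction policy named above, here is a minimal LRU (least recently used) cache built on the JDK’s LinkedHashMap in access order. The class name LruCache is an assumption for this sketch, and the class is not thread-safe as written; wrap it or use a dedicated caching library for concurrent use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: evicts the least recently accessed entry
// once the capacity is exceeded. Not thread-safe.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, oldest first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after every insertion.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" is now the most recently used entry
        cache.put("c", "3"); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```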
Caching
• Where to cache?
• On disk
• File System: Slow and sequential access
• DB: A little bit better (Data is arranged in structures
designed for efficient access, indexes, etc...)
• Generally a terrible idea
• SSDs make things a little better
• In-Memory: Fast and random access, but volatile
• Something in between: Persistent caches (Redis, etc...)
Caching
• Types of Caches
• Local
• Replicated
• Distributed
• Clustered
Caching
• How to cache?
• Most caches implement a very simple interface
• Always attempt to get from cache first using a key
• If it is a hit, you saved yourself the overhead
• If it is a miss, compute or read from the data
store then put in cache for subsequent gets
• When you update you can evict stale data
• You can set a TTL when you put
• Many other common operations...
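The get / miss / put / evict cycle described above can be sketched as follows. The class name CacheAside and the loader function are hypothetical stand-ins for a real cache client and a real data store read; the in-memory map is an assumption that keeps the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the "check the cache first" flow around a slow data source.
class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical read from the data store

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = cache.get(key);
        if (value != null) {
            return value; // hit: the overhead of the load is saved
        }
        value = loader.apply(key); // miss: compute or read from the data store
        if (value != null) {
            cache.put(key, value); // keep it for subsequent gets
        }
        return value;
    }

    // Call after updating the data store so stale data is not served.
    void evict(K key) {
        cache.remove(key);
    }
}
```

Two threads that miss on the same key at the same time will both call the loader here; that is usually acceptable, but Map.computeIfAbsent is an option when the load must happen only once.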
Caching Patterns
• Caching Query Results
• Key: hash of the query itself
• How about parametrized complex queries?
• Key: hash of the query itself + hash of parameter values
• Method/Function Memoization
• Key: method name
• How about parametrized methods?
• Key: hash of the method name + hash of parameter values
• Caching Objects
• Key: Identity of the object
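The memoization key scheme above (method name plus a hash of the parameter values) can be sketched like this. The class name Memoizer and its methods are assumptions for illustration. Note that a plain 32-bit hash can collide, so two different argument lists could share a key; a real implementation would use the full argument values or a stronger hash.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of method memoization keyed by method name + hash of parameters.
class Memoizer {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Builds the cache key; collisions are possible with a 32-bit hash.
    static String key(String methodName, Object... params) {
        return methodName + ":" + Arrays.deepHashCode(params);
    }

    // Runs body only if no result is cached for this method and these params.
    // The body must not call memoize recursively on the same Memoizer.
    @SuppressWarnings("unchecked")
    <T> T memoize(String methodName, Supplier<T> body, Object... params) {
        return (T) cache.computeIfAbsent(key(methodName, params), k -> body.get());
    }
}
```

A call such as memoizer.memoize("distance", () -> slowDistance(a, b), a, b) would then run slowDistance once per distinct pair of arguments, where slowDistance is whatever expensive method is being wrapped.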
Caching Patterns
• Time-series datasets (Ex. Realtime feed)
• Sometimes pseudo/near realtime is
enough
• Use caching to throttle access to the
source
• Cache query result with a t expiry
• Fresh data is only read every t
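A minimal sketch of throttling with an expiry of t: the source is read at most once per t, and every caller in between gets the cached result. The class name ExpiringValue is hypothetical, and the clock is injected as an assumption so the behavior can be exercised without waiting; in practice it would be System::currentTimeMillis.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Caches a single query result and refreshes it at most once per TTL.
class ExpiringValue<T> {
    private final Supplier<T> source;  // hypothetical expensive feed query
    private final long ttlMillis;      // the "t" expiry
    private final LongSupplier clock;  // e.g. System::currentTimeMillis
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringValue(Supplier<T> source, long ttlMillis, LongSupplier clock) {
        this.source = source;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = source.get(); // only reached once per TTL window
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

With a TTL of one second, a feed requested thousands of times per second still hits the source only about once per second, at the cost of data that can be up to one second old.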
Caching Gotchas
• Profile your code to assess what to cache, and whether you
need to begin with
• Stale state might bite you hard
• Incoherence: Inconsistent copies of objects cached with
multiple keys
• Stale nested aggregates
• Network overhead of misses might outweigh the
performance gain of the hits
• Consider writing/updating to cache when you write to the
data store
Featured Solutions
• EhCache
• Memcached
• Oracle Coherence
• Redis
• A Persistent NoSQL store
• Supports built-in data structures like sets and lists
• Supports intelligent keys and namespaces
The Network Bottleneck
Asynchronous Processing
• Resource-intensive tasks are not practical
to handle during an HTTP request window
• Synchronous processing is overused and unnecessary
most of the time
Asynchronous Processing Patterns
• Pseudo-Asynchronous Processing
• Flow
• Preprocessing data / operations in advance
• Request data or operation
• Responding synchronously with preprocessed
result
• Sometimes not possible (Dynamic content,
etc...)
Asynchronous Processing Patterns
• True Asynchronous Processing
• Flow
• Request data or operation
• Acknowledge
• Ex. A REST endpoint that returns a “202 Accepted” HTTP
status code
• Do processing at your own convenience
• Allow the user to check progress
• Optionally notify when processing is complete
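The accept-now, process-later flow above can be sketched in-process with an executor. The class name JobService, the status strings, and the pool size are assumptions for this example; an HTTP layer (not shown) would answer the submit call with 202 Accepted plus the job id, and expose status as a polling endpoint.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of true asynchronous processing: acknowledge first, work later.
class JobService {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    // Returns immediately with a job id; the task runs in the background.
    String submit(Callable<String> task) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, workers.submit(task));
        return id;
    }

    // Lets the client check progress. DONE also covers failed tasks here.
    String status(String id) {
        Future<String> job = jobs.get(id);
        if (job == null) {
            return "UNKNOWN";
        }
        return job.isDone() ? "DONE" : "PENDING";
    }

    // Blocks until the job finishes and returns its result.
    String result(String id) {
        try {
            return jobs.get(id).get();
        } catch (Exception e) {
            throw new IllegalStateException("job " + id + " failed", e);
        }
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

In a cluster the in-memory map and thread pool would be replaced by a shared queue and job store, so that any node can accept the request and any worker can process it.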
Techniques
• Job/Work/Task Queues
• JMS
• AMQP (RabbitMQ, ActiveMQ, Etc...)
• AWS SQS
• Redis Lists
• Etc...
• Task Scheduling
• Jobs triggered periodically (Cron, Quartz, Etc...)
• Batch Processing
Content Delivery Network (CDN)
CDN
• Static Content
• Binary (Video, Audio, Etc...)
• Web objects (HTML, JavaScript, CSS, Etc...)
• Do not serve through your application server
• Use a CDN
• “A large distributed system of servers deployed in
multiple data centers across the internet”
• Akamai
• AWS CloudFront
CDN Gotchas
• Versioning and caching
• Assume that you have a script file named
script.js deployed on a CDN
• Copies of the file script.js will be
replicated across all edge nodes
• Clients will cache copies of the script file
script.js as well in their local cache
CDN Gotchas
• Versioning and caching
• When script.js is updated sharing the same URI
with the old version
• The new content is NOT propagated across
the edge nodes
• New clients end up being served with the
old version, now dirty state
• Old clients continue to use their local cache
containing the old version, now dirty state
CDN Gotchas
• Versioning and caching
• What to do?
• Simply append version numbers to file
names
• script-v1.js, script-v2.js, Etc...
• Force invalidation of the file on edge nodes
• Set HTTP caching headers properly
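One way to automate the versioned file names above is to derive the version from the file’s content, so that the URI changes exactly when the content does. This is a variation on the manual script-v1.js numbering, not something the slides prescribe; the class name AssetVersioner and the 8-character hash prefix are assumptions for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: builds a file name that embeds a short hash of the file content.
class AssetVersioner {
    static String versionedName(String fileName, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) { // first 4 bytes = 8 hex characters
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            int dot = fileName.lastIndexOf('.');
            if (dot < 0) {
                return fileName + "-" + hex;
            }
            // script.js becomes script-<hash>.js
            return fileName.substring(0, dot) + "-" + hex + fileName.substring(dot);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] v1 = "console.log('v1');".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "console.log('v2');".getBytes(StandardCharsets.UTF_8);
        System.out.println(versionedName("script.js", v1));
        System.out.println(versionedName("script.js", v2)); // a different name
    }
}
```

Because every change produces a new URI, the files can be served with long-lived caching headers, and neither edge nodes nor clients keep serving the old version under the new name.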
Domain Name System (DNS)
DNS
• DNS
• Do not rely on your domain name registrar’s free
DNS service
• Use a scalable DNS solution
• AWS Route 53
• DynECT
• UltraDNS
• Etc...
Quantifying Scalability
• Instrumentation
• Bake it into the code early
• Monitoring
• Application health
• Cluster
• Individual node
• System resources
• JVM
• Track Key Performance Indicators (KPIs)
• Number of requests handled
• Throughput
• Latency
• Apdex score
• Etc ...
• Logs
• Testing
• Load/Stress testing
Disaster Recovery
When disaster hits...
• Goal:
• Fault tolerant system
• In case of disaster, recover and restore service ASAP
• Be proactive
• Develop a Disaster Recovery Plan (DRP)
• Test DRP in failure drills
Scaling Teams
• Hiring
• Always hire top talent
• You are as strong as your weakest link
• Develop a process to bring people in
• Turnkey Hardware/Software Set up (Tools like Vagrant, etc...)
• Arrange for proper access/accounts
• Develop a knowledge base (Architecture documentation, FAQs, etc...)
• Development Process
• Be Agile
• Refine in the spirit of Six Sigma
Scaling Teams
• Teams
• Form small ad-hoc teams from pools of Agile breeds
• Product Owners
• Team Members
• Team Lead (Scrum Master)
• Engineers
• QAs
• Architecture Owners
• Keep them small
• Give them ownership of their DevOps
The Take-home Message
• The early-bird gets the worm
• Design to scale from day one
• Plan for capacity early
• Your needs determine how scalable is scalable
• Do not over-engineer
• Do not bite off more than you can chew
• Building a scalable system is a process
• Commit to a road map around bottlenecks
• Guided by planned business features
• Learn from others’ experiences (Twitter, Netflix, etc...)
Take it slow...You’ll get there...
• Work smarter not harder
Questions?
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Advanced java
Advanced java Advanced java
Advanced java NA
 
Hibernate introduction
Hibernate introductionHibernate introduction
Hibernate introductionSagar Verma
 
basic core java up to operator
basic core java up to operatorbasic core java up to operator
basic core java up to operatorkamal kotecha
 
JavaClassPresentation
JavaClassPresentationJavaClassPresentation
JavaClassPresentationjuliasceasor
 
The State of Managed Runtimes 2013, by Attila Szegedi
The State of Managed Runtimes 2013, by Attila SzegediThe State of Managed Runtimes 2013, by Attila Szegedi
The State of Managed Runtimes 2013, by Attila SzegediZeroTurnaround
 
Java introduction
Java introductionJava introduction
Java introductionKuppusamy P
 
Java 8 selected updates
Java 8 selected updatesJava 8 selected updates
Java 8 selected updatesVinay H G
 
Advance java prasentation
Advance java prasentationAdvance java prasentation
Advance java prasentationdhananajay95
 
java training in jaipur|java training|core java training|java training compa...
 java training in jaipur|java training|core java training|java training compa... java training in jaipur|java training|core java training|java training compa...
java training in jaipur|java training|core java training|java training compa...infojaipurinfo Jaipur
 
Core java lessons
Core java lessonsCore java lessons
Core java lessonsvivek shah
 
Java Presentation
Java PresentationJava Presentation
Java PresentationAmr Salah
 
Java Tutorial to Learn Java Programming
Java Tutorial to Learn Java ProgrammingJava Tutorial to Learn Java Programming
Java Tutorial to Learn Java Programmingbusiness Corporate
 
Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Martijn Verburg
 
A begineers guide of JAVA - Getting Started
 A begineers guide of JAVA - Getting Started A begineers guide of JAVA - Getting Started
A begineers guide of JAVA - Getting StartedRakesh Madugula
 
Java and internet fundamentals.
Java and internet fundamentals.Java and internet fundamentals.
Java and internet fundamentals.mali yogesh kumar
 
Java 101 Intro to Java Programming
Java 101 Intro to Java ProgrammingJava 101 Intro to Java Programming
Java 101 Intro to Java Programmingagorolabs
 

Was ist angesagt? (20)

Core Java
Core JavaCore Java
Core Java
 
Advanced java
Advanced java Advanced java
Advanced java
 
Java
JavaJava
Java
 
Hibernate introduction
Hibernate introductionHibernate introduction
Hibernate introduction
 
Introduction To Java.
Introduction To Java.Introduction To Java.
Introduction To Java.
 
basic core java up to operator
basic core java up to operatorbasic core java up to operator
basic core java up to operator
 
JavaClassPresentation
JavaClassPresentationJavaClassPresentation
JavaClassPresentation
 
The State of Managed Runtimes 2013, by Attila Szegedi
The State of Managed Runtimes 2013, by Attila SzegediThe State of Managed Runtimes 2013, by Attila Szegedi
The State of Managed Runtimes 2013, by Attila Szegedi
 
Java introduction
Java introductionJava introduction
Java introduction
 
Java 8 selected updates
Java 8 selected updatesJava 8 selected updates
Java 8 selected updates
 
Advance java prasentation
Advance java prasentationAdvance java prasentation
Advance java prasentation
 
Core Java
Core JavaCore Java
Core Java
 
java training in jaipur|java training|core java training|java training compa...
 java training in jaipur|java training|core java training|java training compa... java training in jaipur|java training|core java training|java training compa...
java training in jaipur|java training|core java training|java training compa...
 
Core java lessons
Core java lessonsCore java lessons
Core java lessons
 
Java Presentation
Java PresentationJava Presentation
Java Presentation
 
Java Tutorial to Learn Java Programming
Java Tutorial to Learn Java ProgrammingJava Tutorial to Learn Java Programming
Java Tutorial to Learn Java Programming
 
Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)
 
A begineers guide of JAVA - Getting Started
 A begineers guide of JAVA - Getting Started A begineers guide of JAVA - Getting Started
A begineers guide of JAVA - Getting Started
 
Java and internet fundamentals.
Java and internet fundamentals.Java and internet fundamentals.
Java and internet fundamentals.
 
Java 101 Intro to Java Programming
Java 101 Intro to Java ProgrammingJava 101 Intro to Java Programming
Java 101 Intro to Java Programming
 

Ähnlich wie The Economies of Scaling Software

JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling SoftwareJAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Softwarejazoon13
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Ricard Clau
 
Design for Scale / Surge 2010
Design for Scale / Surge 2010Design for Scale / Surge 2010
Design for Scale / Surge 2010Christopher Brown
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Bob Pusateri
 
Docker-N-Beyond
Docker-N-BeyondDocker-N-Beyond
Docker-N-Beyondsantosh007
 
Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16allingeek
 
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does ItRightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does ItRightScale
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?CQD
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012Eonblast
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that growGibraltar Software
 
Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014Ricard Clau
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Bob Pusateri
 
Understanding container security
Understanding container securityUnderstanding container security
Understanding container securityJohn Kinsella
 
Computer system organization
Computer system organizationComputer system organization
Computer system organizationSyed Zaid Irshad
 
Session on scalability - by isaka traore - 19 may 2016 - rockstart
Session on scalability - by isaka traore - 19 may 2016 - rockstartSession on scalability - by isaka traore - 19 may 2016 - rockstart
Session on scalability - by isaka traore - 19 may 2016 - rockstartIsaka Traore
 
Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...
Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...
Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...Marina Peregud
 

Ähnlich wie The Economies of Scaling Software (20)

JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling SoftwareJAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015
 
Design for Scale / Surge 2010
Design for Scale / Surge 2010Design for Scale / Surge 2010
Design for Scale / Surge 2010
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
 
Docker-N-Beyond
Docker-N-BeyondDocker-N-Beyond
Docker-N-Beyond
 
Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16
 
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does ItRightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
 
Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
 
Understanding container security
Understanding container securityUnderstanding container security
Understanding container security
 
Stackato v2
Stackato v2Stackato v2
Stackato v2
 
Computer system organization
Computer system organizationComputer system organization
Computer system organization
 
Session on scalability - by isaka traore - 19 may 2016 - rockstart
Session on scalability - by isaka traore - 19 may 2016 - rockstartSession on scalability - by isaka traore - 19 may 2016 - rockstart
Session on scalability - by isaka traore - 19 may 2016 - rockstart
 
Lost with data consistency
Lost with data consistencyLost with data consistency
Lost with data consistency
 
Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...
Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...
Антон Бойко "Разделяй и властвуй — набор практик для построения масштабируемо...
 

Mehr von Abdelmonaim Remani

The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceAbdelmonaim Remani
 
Building enterprise web applications with spring 3
Building enterprise web applications with spring 3Building enterprise web applications with spring 3
Building enterprise web applications with spring 3Abdelmonaim Remani
 
Introduction To Building Enterprise Web Application With Spring Mvc
Introduction To Building Enterprise Web Application With Spring MvcIntroduction To Building Enterprise Web Application With Spring Mvc
Introduction To Building Enterprise Web Application With Spring MvcAbdelmonaim Remani
 
Introduction To Rich Internet Applications
Introduction To Rich Internet ApplicationsIntroduction To Rich Internet Applications
Introduction To Rich Internet ApplicationsAbdelmonaim Remani
 

Mehr von Abdelmonaim Remani (7)

The Eschatology of Java
The Eschatology of JavaThe Eschatology of Java
The Eschatology of Java
 
How RESTful Is Your REST?
How RESTful Is Your REST?How RESTful Is Your REST?
How RESTful Is Your REST?
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot Persistence
 
Le Tour de xUnit
Le Tour de xUnitLe Tour de xUnit
Le Tour de xUnit
 
Building enterprise web applications with spring 3
Building enterprise web applications with spring 3Building enterprise web applications with spring 3
Building enterprise web applications with spring 3
 
Introduction To Building Enterprise Web Application With Spring Mvc
Introduction To Building Enterprise Web Application With Spring MvcIntroduction To Building Enterprise Web Application With Spring Mvc
Introduction To Building Enterprise Web Application With Spring Mvc
 
Introduction To Rich Internet Applications
Introduction To Rich Internet ApplicationsIntroduction To Rich Internet Applications
Introduction To Rich Internet Applications
 

The Economies of Scaling Software

  • 1. The Economies of Scaling Software Abdelmonaim Remani @PolymathicCoder
  • 2. About Me • Platform Architect at just.me Inc. • JavaOne RockStar and frequent speaker at many developer events and conferences including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc... • Open-source advocate and contributor • Active Community member • The NorCal Java User Group • The Silicon Valley Spring User Group • The Silicon Valley Dart Meetup • Bio: http://about.me/PolymathicCoder • Twitter: @PolymathicCoder • Email: abdelmonaim.remani@gmail.com • SlideShare: http://www.slideshare.net/PolymathicCoder/
  • 3. License • Creative Commons Attribution Non-Commercial License 3.0 Unported • The graphics and logos in this presentation belong to their rightful owners
  • 6. What’s up with the title? • The Economies of Scale • “In microeconomics, economies of scale are the cost advantages that enterprises obtain due to size [...] Often operational efficiency is [...] greater with increasing scale [...]” -Wikipedia
  • 7. The line is blurred! • There was a time when only the enterprise worried about issues like scalability • The rise of social and the abundance of mobile are responsible for • Not only an exponential growth of internet traffic • But the creation of a spoiled user-base that wants answers to questions like • I want to see the closest Moroccan restaurants to my current location on a map along with consumer ratings and whether any of my friends has checked in within the last 30 days
  • 8. The bar is high! • Scalability is everyone’s problem
  • 10. The Common Definition • The ability of a software to handle an increasing amount of work without performance degradation
  • 11. I have a problem with that definition... • It implies that a scalable system is one that is capable of sustaining its scalability forever • Not realistic; it fails to recognize the external constraints imposed • It fails to acknowledge that scalability is relative • It does not take into account that a system • Need not be capable of handling the work • But simply capable of evolving to handle the work
  • 12. A better definition • The ability of an application to gracefully evolve within the constraints of its ecosystem in order to handle the maximum potential amount of work without performance degradation
  • 13. Easier said than done! • A black art • No surprise here! • An application that supports 1 million users • You add one new feature • 500,000 users crash your system
  • 15. The Bottlenecks • Scaling is about relieving or managing these limitations or constraints that we call the bottlenecks • When we talk about bottlenecks in computing, we talk about the usual suspects • The CPU • Storage or I/O • The Network • Inter-related • The rest of this talk is structured around these bottlenecks to make the case that one’s scalability needs are to be addressed in that fashion
  • 17. The CPU Bottleneck • Nothing affects the CPU more than the instructions it is summoned to execute • In other words, this is about the very code of your application
  • 19. Architecture? • Architecture • “Things that people perceive as hard-to- change” - Martin Fowler • http://martinfowler.com/ieeeSoftware/whoNee • Decisions you commit to; the ones that will be stuck with you forever
  • 20. Be wise... Think twice... • Choosing the right technologies • Platform • Languages • Frameworks • Libraries • Making the right abstractions • Technical Abstractions • Functional Abstractions • Make sure that the former is subordinate to the latter and not the other way around
  • 22. Write Good Code • Think your algorithms through and mind their complexity (Asymptotic Complexity, Cyclomatic Complexity, etc...) • SOLIDify your Design • Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion • Understand the limitations of your technology and leverage its strengths • Don’t be afraid to be Polyglot • Obsess with testing • TDD/BDD • Tools • Static code analyzers (PMD, FindBugs, etc...) • Profilers (Detect memory leaks, bottlenecks, etc...) • Etc...
  • 23. Know Your S#!t • Read • The classics: The Mythical Man-Month • GoF’s “Design Patterns” • Eric Evans’ “Domain-Driven Design” • Every book by Martin Fowler • Uncle Bob’s “Clean Code” • Josh Bloch’s “Effective Java” • Brian Goetz’s “Java Concurrency in Practice” • Etc... • The list is long ...
  • 24. We do all that... and still end up with this... • The fading tradition of making cow dung piles
  • 25. Still much better than this...
  • 26. Technical Debt is a Reality • It is inevitable... You will incur it one way or another, deliberately or not • The quick-and-dirty you are not proud of • Things you would/should do differently • Anyway, after a while it starts to smell... • The bright side • The fact that it is recognized as a debt is good • Keep track and refactor • For the fearless... Be wise and think twice before you do it • Cut the right corners • Don’t lock yourself out • Don’t make it a part of your architecture
  • 28. Parallelism • Parallelism? • Writing concurrent code or simultaneously executing code • Most write code that runs within web containers by extending framework classes that are already multi-threaded • Sometimes the complexity of the business logic demands that we break it into smaller steps, execute them in parallel, then aggregate data back to get a result within a reasonable amount of time • This is not easy! • Often requires synchronizing state, which is a nightmare
  • 29. Vertical Scaling • Vertical Scaling (Scaling up) • A single-node system • Adding more computing resources to the node (Get a beefier machine) • Writing code to harness the full power of the one node
  • 30. Easier said than done... • On the one machine, we have been reaping the benefit of Moore’s Law • Performance gain is automatically realized by software (In other words, code is faster on faster hardware) • The End of Moore’s Law: The birth of the multi-core chip • We actually need to write code to take advantage of this • Good news! There are frameworks and libraries that make it a lot easier • Fork/Join in Java • Akka • Etc...
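To make the Fork/Join bullet concrete, here is a minimal sketch of a divide-and-conquer sum over an array. The class name `ParallelSum` and the `THRESHOLD` cutoff are assumptions made for illustration, not something from the talk; the right cutoff depends on the workload and should be measured.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums a long[] by recursively splitting the range
// until each piece is small enough to add up sequentially.
final class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cutoff, tune by measurement

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        ParallelSum left = new ParallelSum(values, from, mid);
        ParallelSum right = new ParallelSum(values, mid, to);
        left.fork();                     // schedule the left half on the pool
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long sum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(values, 0, values.length));
    }
}
```

The array is only read, never written, so the subtasks share no mutable state and need no synchronization.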
  • 31. Easier said than done... • Challenges • What about dependencies and 3rd Party code? • Synchronizing state just got HARDER across cores! Too many cooks! • Frankly, this shared state deal is a real pain • Get a life and do without • Go immutable (Not always straightforward, and sometimes not even possible) • Go “Functional” (No guts... no glory...)
  • 32. It gets more interesting... • Amdahl’s Law • Throwing more cores at the problem does not necessarily result in performance gain • We actually end up with diminishing returns at some point no matter how many cores you throw in
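Amdahl's Law can be written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The small sketch below just evaluates that formula; the class and method names are hypothetical.

```java
// Hypothetical helper that evaluates Amdahl's Law.
final class Amdahl {
    // p: parallelizable fraction of the work, in [0, 1]
    // n: number of cores, at least 1
    static double speedup(double p, int n) {
        if (p < 0.0 || p > 1.0 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }
}
```

With p = 0.9, for example, 16 cores give a speedup of 6.4, and no number of cores can push it past 1 / (1 - p) = 10, because the serial 10% always remains.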
  • 34. Horizontal Scaling • Horizontal Scaling (Scaling out) • A distributed system (A cluster) • Adding more nodes • Writing code to harness the full power of the cluster
  • 35. Topology • A typical cluster consists of • A number of identical application server nodes behind a load balancer • A number? • It depends on how many you actually need and can afford • Elastic Scaling / Auto-Scaling • The number of live nodes within the cluster shrinks and grows depending on the load • New ones are provisioned or terminated as needed • Identical? • Application nodes are cloned off of image files (Ex. AWS EC2 AMIs, etc...) • Configuration Management tool (Chef, Puppet, Salt, etc...) • Load balancer? • Load is evenly distributed across live nodes according to some algorithm (Round-Robin typically)
  • 37. Managing State • Session data • Session Replication • Session Affinity/Sticky Session • Requests from the same client always get routed back to the same server • When the node dies, session data die with it • Shared/Distributed Session • Session is in a centralized location • Do yourself a favor and go stateless! • No session data • Any server would do
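As an illustration of the round-robin distribution mentioned above, here is a minimal sketch of a selector that hands out nodes in rotation. The class name and the use of plain strings for node addresses are assumptions; a real load balancer would also track node health and react to nodes joining or leaving.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a fixed list of node addresses.
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes); // defensive, immutable copy
    }

    // Returns the next node in rotation; safe to call from many threads.
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```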
  • 38. Parallelism • Leverage MapReduce • “A programming model for processing large data sets with a parallel, distributed algorithm on a cluster” • Hadoop
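The shape of the MapReduce model can be sketched in-process with Java streams. This is only an analogy running on one JVM, not Hadoop code: the "map" step emits words, and the grouping collector plays the role of the shuffle and reduce steps. The class name `WordCount` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-JVM word count that mirrors the map and reduce phases.
final class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                // "Map": split every line into lower-cased words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "Shuffle" and "reduce": group equal words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

On a real cluster the same two phases run on different nodes, and the framework moves the grouped keys across the network between them.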
  • 39. Misc • Distributed Lock Manager (DLM) • Synchronized access to shared resources • Google Chubby • Zookeeper • Hazelcast • Terracotta • Etc... • Distributed Transactions • X/Open XA • HTTPS • End at the load balancer • Wildcard SSL • Leverage probabilistic data structures and algorithms • Bloom filters • Quotient filters • Etc...
  • 41. Deployment • Environments • Multiple Development, Test, Stage, and Production • Automatic Configuration Management • Practice Continuous Delivery • Leverage The Cloud • IaaS, PaaS, SaaS, and NaaS
  • 43. The Storage Bottleneck • Storage or I/O is usually the most significant
  • 45. What datastore to use? What kind of question is that? • There was a time when the obvious choice was the relational model • Schema that guarantees data integrity • Data Normalized (minimized redundancies, no modification anomalies, etc...) • ACIDity (Atomicity, Consistency, Isolation, and Durability) • Data is stored in a way that is independent from how the data is to be accessed (Not biased towards any particular query patterns) • Flexible query language • As our datasets grew, we scaled vertically • Buying beefier machines • Database tuning / Query Optimization • Creating Materialized Views • De-normalizing • Etc...
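To show what a probabilistic data structure looks like, here is a minimal Bloom filter sketch. It can answer "definitely not present" or "possibly present", but never gives a false negative. The class name, the sizing parameters, and the way the hash functions are derived from `String.hashCode` are all simplifying assumptions; production implementations use stronger hashes and compute the size from a target false-positive rate.

```java
import java.util.BitSet;

// Hypothetical, minimal Bloom filter for strings.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits
    private final int hashCount; // number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    // false: the item was definitely never added.
    // true: the item was probably added (false positives are possible).
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th hash is h1 + i * h2 (an assumption made for brevity).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // forced odd, so never zero
        return Math.floorMod(h1 + i * h2, size);
    }
}
```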
  • 46. Mucho Data! • We hit the limit of the one machine • Attempted to scale the RDBMS horizontally • Master/Slave clusters • Data Sharding • We failed...Why? • Eric Brewer’s CAP Theorem on distributed systems • Pick 2 out of 3 • Consistency • Availability • Partition Tolerance • The relational model is designed to favor CA over P • It cannot be scaled horizontally
  • 47. NoSQL • A wide range of specialized data stores with the goal of addressing the challenges of the relational model • “The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for” -Eric Evans • A wide variety • Key-Value Data stores • Columnar Data stores • Document Data stores • Graph Data stores
  • 48. Polyglot Persistence • Acknowledging • The complexity and variety of data and data access patterns within the one application • The absurdity of the idea that all data should be fitted into one storage model • Proposing a solution that • Leverages multiple data stores within the one application based on the specific way the data is stored and accessed
  • 49. For more details... • Check out my talk from JAX Conf 2012 • The Rise of NoSQL and Polyglot Persistence • YouTube Video: • http://bit.ly/PCWtWi
  • 51. Caching • A cache is typically a simple key-value data structure • Instead of incurring the overhead of data retrieval or computation every time, you check the cache first • Since we can’t cache everything, caches can be configured to use different eviction algorithms depending on the use case (LRU, Bélády's Algorithm, etc...) • Use aggressively! • What to cache? • Frequently accessed data (Session data, feeds, etc...) • Long computation results
  • 52. Caching • Where to cache? • On disk • File System: Slow and sequential access • DB: A little bit better (Data is arranged in structures designed for efficient access, indexes, etc...) • Generally a terrible idea • SSDs make things a little better • In-Memory: Fast and random access, but volatile • Something in between: Persistent caches (Redis, etc...)
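As one example of an eviction algorithm, LRU (least recently used) can be sketched on top of `java.util.LinkedHashMap`, which supports access ordering and an eviction hook. The class name `LruCache` is hypothetical, and this version is not thread-safe; wrap it or use a caching library when several threads share it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently accessed entry when full.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```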
  • 53. Caching • Types of Caches • Local • Replicated • Distributed • Clustered
  • 54. Caching • How to cache? • Most caches implement a very simple interface • Always attempt to get from cache first using a key • If it is a hit, you saved yourself the overhead • If it is a miss, compute or read from the data store then put in cache for subsequent gets • When you update you can evict stale data • You can set a TTL when you put • Many other common operations...
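The get, miss, put, evict, and TTL steps above can be sketched as a small read-through cache. The class name `TtlCache` and the `loader` callback are assumptions made for illustration; the loader stands in for whatever reads from the data store or performs the computation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache where every entry expires a fixed time after it was stored.
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Always try the cache first; on a miss, load from the source and store the result.
    V get(K key, Function<K, V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> entry = entries.get(key);
        if (entry != null && entry.expiresAtMillis > now) {
            return entry.value; // hit: the overhead of loading is saved
        }
        V value = loader.apply(key); // miss or expired: compute or read from the store
        entries.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    // Call this when the underlying data is updated, so stale data is not served.
    void evict(K key) {
        entries.remove(key);
    }
}
```

Two threads that miss on the same key at the same moment will both call the loader here. That is acceptable for a sketch, but a busy system may want to coalesce such loads.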
  • 55. Caching Patterns • Caching Query Results • Key: hash of the query itself • How about parametrized complex queries? • Key: hash of the query itself + hash of parameter values • Method/Function Memoization • Key: method name • How about with parametrized? • Key: hash of the method name + hash of parameter values • Caching Objects • Key: Identity of the object
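The memoization key scheme above (method name plus a hash of the parameter values) might look like the following sketch. The `Memoizer` class and its method names are hypothetical. Because the key uses only a hash of the arguments, two different argument lists can collide and share a cached result; a stricter version would put the argument values themselves into the key.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical memoizer keyed by method name and a hash of the arguments.
final class Memoizer {
    private final Map<String, Object> results = new ConcurrentHashMap<>();

    // Builds the cache key: method name + hash of parameter values.
    static String key(String methodName, Object... args) {
        return methodName + ":" + Arrays.deepHashCode(args);
    }

    // Runs the computation only if no result is cached under the derived key.
    @SuppressWarnings("unchecked")
    <T> T memoize(String methodName, Supplier<T> computation, Object... args) {
        return (T) results.computeIfAbsent(key(methodName, args), k -> computation.get());
    }
}
```

A call could look like `memoizer.memoize("square", () -> n * n, n)`, where the trailing arguments are the values that identify the call.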
  • 56. Caching Pattern • Time-series datasets (Ex. Realtime feed) • Sometimes pseudo/near realtime is enough • Use caching to throttle access to the source • Cache query result with a t expiry • Fresh data is only read every t
  • 57. Caching Gotchas • Profile your code to assess what to cache, and whether you need to cache to begin with • Stale state might bite you hard • Incoherence: Inconsistent copies of objects cached with multiple keys • Stale nested aggregates • The network overhead of misses might outweigh the performance gain of the hits • Consider writing/updating to cache when you write to the data store
  • 58. Featured Solutions • EhCache • Memcached • Oracle Coherence • Redis • A Persistent NoSQL store • Supports built-in data structures like sets and lists • Supports intelligent keys and namespaces
  • 61. Asynchronous Processing • Resource-intensive tasks are not practical to handle during an HTTP request window • Synchronous is overused and not necessary most of the time
  • 62. Asynchronous Processing Patterns • Pseudo-Asynchronous Processing • Flow • Preprocessing data / operations in advance • Request data or operation • Responding synchronously with preprocessed result • Sometimes not possible (Dynamic content, etc...)
  • 63. Asynchronous Processing Patterns • True Asynchronous Processing • Flow • Request data or operation • Acknowledge • Ex. A REST API that returns a “202 Accepted” HTTP status code • Do processing at your own convenience • Allow the user to check progress • Optionally notify when processing is complete
  • 64. Techniques • Job/Work/Task Queues • JMS • AMQP (RabbitMQ, ActiveMQ, etc...) • AWS SQS • Redis Lists • Etc... • Task Scheduling • Jobs triggered periodically (Cron, Quartz, etc...) • Batch Processing
  • 66. CDN • Static Content • Binary (Video, Audio, etc...) • Web objects (HTML, JavaScript, CSS, etc...) • Do not serve through your application server • Use a CDN • “A large distributed system of servers deployed in multiple data centers across the internet” • Akamai • AWS CloudFront
  • 67. CDN Gotchas • Versioning and caching • Assume that you have a script file named script.js deployed on a CDN • Copies of the file script.js will be replicated across all edge nodes • Clients will cache copies of the script file script.js as well in their local cache
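The accept-then-process flow above can be sketched without any web framework: submitting a job returns an id immediately (what an endpoint would send back with its "202 Accepted"), the work runs on a background pool, and the caller polls the status by id. The class name, the status values, and the pool size of 4 are assumptions made for illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory job tracker behind an "accept now, process later" endpoint.
final class AsyncJobs {
    enum Status { ACCEPTED, RUNNING, DONE, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4); // assumed size

    // Acknowledges the request right away and returns an id the client can poll.
    String submit(Runnable task) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.ACCEPTED);
        workers.submit(() -> {
            statuses.put(id, Status.RUNNING);
            try {
                task.run();
                statuses.put(id, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);
            }
        });
        return id;
    }

    // Lets the client check progress; returns null for an unknown id.
    Status status(String id) {
        return statuses.get(id);
    }

    // Stops accepting jobs and waits for the running ones to finish.
    boolean shutdownAndWait(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a clustered deployment the status map and the queue would live outside the application node, for instance in one of the queueing systems from the techniques slide, so that any node can answer a status request.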
  • 68. CDN Gotchas • Versioning and caching • When script.js is updated sharing the same URI with the old version • The new content is NOT propagated across the edge nodes • New clients end up being served with the old version, now dirty state • Old clients continue to use their local cache containing the old version, now dirty state
  • 69. CDN Gotchas • Versioning and caching • What to do? • Simply append version numbers to file names • script-v1.js, script-v2.js, Etc... • Force invalidation of the file on edge nodes • Set HTTP caching headers properly
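Appending a version number to the file name, as suggested above, is simple enough to automate in a build step. The helper below is a hypothetical sketch that turns `script.js` and version 2 into `script-v2.js`; many build tools use a content hash in place of the counter, which is a variation not covered in the talk.

```java
// Hypothetical helper that inserts a version marker before the file extension.
final class AssetNames {
    static String versioned(String fileName, int version) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            // No extension (or a leading-dot file): append the marker at the end.
            return fileName + "-v" + version;
        }
        return fileName.substring(0, dot) + "-v" + version + fileName.substring(dot);
    }
}
```

Because every release gets a new URI, edge nodes and browser caches treat it as a new object and the old cached copies simply stop being requested.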
  • 71. DNS • DNS • Do not rely on your free domain name registrar DNS services • Use a scalable DNS solution • AWS Route 53 • DynECT • UltraDNS • Etc...
  • 73. Quantifying Scalability • Instrumentation • Bake it into the code early • Monitoring • Application health • Cluster • Individual node • System resources • JVM • Track Key Performance Indicators (KPIs) • Number of requests handled • Throughput • Latency • Apdex Index • Etc... • Logs • Testing • Load/Stress testing
  • 75. When disaster hits... • Goal: • Fault tolerant system • In case of disaster, recover and restore service ASAP • Be proactive • Develop a Disaster Recovery Plan (DRP) • Test DRP in failure drills
  • 77. Scaling Teams • Hiring • Always hire top talent • You are as strong as your weakest link • Develop a process to bring people in • Turnkey Hardware/Software Set up (Tools like Vagrant, etc...) • Arrange for proper access/accounts • Develop a knowledge base (Architecture documentation, FAQs, etc...) • Development Process • Be Agile • Refine in the spirit of Six Sigma
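Of the KPIs listed, the Apdex index is the least self-explanatory. Given a target response time T, a request is "satisfied" if it completes within T, "tolerating" if it completes within 4T, and "frustrated" otherwise; the score is (satisfied + tolerating / 2) / total. The sketch below computes it from raw response times; the class and parameter names are hypothetical.

```java
// Hypothetical Apdex calculator over a batch of response times.
final class Apdex {
    static double score(long[] responseTimesMillis, long thresholdMillis) {
        if (responseTimesMillis.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        int satisfied = 0;
        int tolerating = 0;
        for (long t : responseTimesMillis) {
            if (t <= thresholdMillis) {
                satisfied++;            // within the target T
            } else if (t <= 4 * thresholdMillis) {
                tolerating++;           // slower than T but within 4T
            }                           // anything slower counts as frustrated
        }
        return (satisfied + tolerating / 2.0) / responseTimesMillis.length;
    }
}
```

For samples of 100, 200, 600, and 3000 ms with T = 500 ms, two requests are satisfied, one is tolerating, and one is frustrated, so the score is (2 + 0.5) / 4 = 0.625.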
  • 78. Scaling Teams • Teams • Form small ad-hoc teams from pools of Agile breeds • Product Owners • Team Members • Team Lead (Scrum Master) • Engineers • QAs • Architecture Owners • Keep them small • Give them ownership of their DevOps
  • 80. The Take-home Message • The early bird gets the worm • Design to scale from day one • Plan for capacity early • Your needs determine how scalable is scalable • Do not over-engineer • Do not bite off more than you can chew • Building a scalable system is a process • Commit to a road map around bottlenecks • Guided by planned business features • Learn from others’ experiences (Twitter, Netflix, etc...)
  • 81. Take it slow...You’ll get there... • Work smarter not harder