Scalability rules for web sites

Scalability Rules for
Web Sites
Minh Tran – 12/2011

Distribute Your Work
• Design to Clone Things (X axis)
• Design to Split Different Things (Y Axis)
• Design to Split Similar Things (Z Axis)

Design to Clone Things (X axis)
• When:
– Databases with a VERY HIGH READ to write
ratio (5:1 or greater—the higher the better).
– Any system where transaction growth exceeds
data growth.
• How to use:
– Simply clone services and implement a load
balancer.
– For databases, ensure the accessing code
understands the difference between a read and a
write.

Example
• Reservation system has
– 400 searches (read) for 1 booking (write)
• How to scale??
Scaled by creating read-only copies (or replicas)
One way is to use a caching tier in front of the
database (high recommended 1st step)
Most RDBMS allows replication ―out of the box‖
Master ~ primary transactional database (write)
Slave ~ read-only copies of the master

Design to Split Different Things(Y Axis)
• When:
– Very large data sets where relations between
data are not necessary.
– Large, complex systems where scaling
engineering resources requires specialization.
• How to use:
– Split up actions by using verbs or resources
by using nouns or use a mix.
– Split both the services and the data along the
lines defined by the verb/noun approach.

Example – Split up by verbs

Login
Signup

Search

Ecommerce Browse
system
View

Purchase /
Buy Add-to-cart

Example – Split up by nouns
User
Information

Product

SKU

Ecommerce
system
Catalog

Inventory

Design to Split Similar Things(Z Axis)
• When:
– Very large, similar data sets such as large and
rapidly growing customer bases.
• How to use:
– Identify something about the customer
• Customer ID
• Last name
• Geography
• Device
– Split or partition both data and services based on
that attribute.
• Often referred to as sharding or horizontal
partitioning

Use the Right Tool
• Use Databases Appropriately
• Actively Use Log Files

Use Databases Appropriately
RDBMS File System
Example • Oracle, MySQL… • GFS, MogileFS, Ceph

Storage • Transactional integrity (ACID) • No transactional
Structure • Relational structure within • No relationships
tables

Advantages • Minimize data redundancy • Handle very large amount of
• Improve transaction files and data
processing

Limitation • Scalability (ACID) • conflicting reads and writes
• Sharding or Partitioning over time
(Relational structure)

Use Databases Appropriately (cont)
NoSQL
Example • Memcached, • Google Big Table, • CouchDB, Amazon ‗s
Tokyo Tyrant, Cassandra SimpleDB, Yahoo‘s
Voldemort PNUTS, MongoDB,…

Storage • Key-value stores • Extensible record stores • Document stores
Structure • Single key-value • Row and column data • Multi-indexed object
index for data model model
• Stored in memory • Documents can be
aggregated into
collection of documents
Advantages • Significant • Rows are sharded on • Can be queried based
scaling and primary keys (automatic) on many different
performance • Columns are broken into attributes
groups (user definitions)

Limitation • Kind of data can • Asynchronous replication • Asynchronous
be stored replication
• Synchronous • ACID
replication

Actively Use Log Files
• When
– Put a process in place that monitors log files
– Forces people to take action on issues
identified.
• How to use
– Use any number of monitoring tools from
custom scripts to Splunk to watch your
application logs for errors
– Export these and assign resources for
identifying and solving the issue.

Use Caching Aggressively
• Leverage Content Delivery Networks
• Use Expires Headers
• Leverage Page Caches
• Utilize Application Caches
• Make Use of Object Caches
• Put Object Caches on Their Own ―Tier‖

Leverage Content Delivery
Networks
• When
– Ensure it is cost justified and then choose
which content is most suitable.
• How to use
– Most CDNs leverage DNS (Domain Name
Services or Domain Name Servers) to serve
content on your site‘s behalf.

Use Expires Headers
• When HTTP Status Code: HTTP/1.1 200
OK
– All object types Date: Thu, 21 Oct 2010 20:03:38
GMT
need to be Server: Apache/2.2.9 (Fedora)
X-Powered-By: PHP/5.2.6
considered. Expires: Mon, 26 Jul 2011 05:00:00
• How to use GMT
Last-Modified: Thu, 21 Oct 2010
– Headers can be set 20:03:38 GMT
Cache-Control: no-cache
on Web servers or Vary: Accept-Encoding, User-Agent
through application Transfer-Encoding: chunked
Content-Type: text/html;
code. charset=UTF-8

Leverage Page Caches
• When
– Always
• How to use
– Choose a
caching
system and
deploy.

Use Caching Aggressively
• Leverage Content Delivery Networks
• Use Expires Headers
• Leverage Page Caches
• Make Use of Object Caches
• Put Object Caches on Their Own ―Tier‖

Make Use of Object Caches
• When:
– Any time you have repetitive queries or
computations.
• How to use:
– Select any one of the many open source or
vendor supported solutions
– Implement the calls in your application code.
• Some popular caches: Memcached,
Ehcache, Apache OJB, NCache

Put Object Caches on Their Own
―Tier‖

Learn Aggressively
• Take every opportunity to learn.
• Be constantly learning from your mistakes
as well as successes.
• Watch your customers or use A/B testing
to determine what works.
• Use postmortems to learn from incidents
and problems in production.

Reference
• Scalability Rules: 50 Principles for Scaling
Web Sites
• The Art of Scalability: Scalable Web
Architecture, Processes, and
Organizations for the Modern Enterprise
• http://www.codefutures.com/database-
sharding/
• http://highscalability.com/unorthodox-
approach-database-design-coming-shard

Scalability rules for web sites

Scalability rules for web sites

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (18)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Scalability rules for web sites

Hinweis der Redaktion