1. http://www.egeniq.com
info@egeniq.com
@egeniq
The Art of Scalability
Ivo Jansch / @ijansch
Tweakers.net Developer Summit, March 24 2011
2. What is Scalability?
“[...] Scalability is the ability of a system,
network, or process, to handle growing
amounts of work in a graceful manner or its
ability to be enlarged to accommodate that
growth.” - Wikipedia
3. Different levels of scale
Site       Pageviews/month   # Servers   Views / server
Facebook   200B              30K         6.6M
Netlog     4B                600         6.6M
Hyves      6.5B              3000        2.1M
Tweakers   60M               15          4M
Nu.nl      340M              14          24M
(See end of slide deck for sources)
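The last column is simply monthly pageviews divided by the number of servers. A minimal sketch of that arithmetic, using the figures from the slide with the B/M/K suffixes expanded by hand (the names `SITES`, `views_per_server` and `report` are made up for this illustration):

```python
# Figures copied from the slide: (pageviews per month, number of servers).
SITES = {
    "Facebook": (200e9, 30_000),
    "Netlog": (4e9, 600),
    "Hyves": (6.5e9, 3_000),
    "Tweakers": (60e6, 15),
    "Nu.nl": (340e6, 14),
}


def views_per_server(pageviews_per_month, servers):
    """Average monthly pageviews handled by a single server."""
    return pageviews_per_month / servers


def report():
    """Map each site to its average monthly pageviews per server."""
    return {site: views_per_server(pv, n) for site, (pv, n) in SITES.items()}


if __name__ == "__main__":
    for site, value in report().items():
        print(f"{site:<10}{value / 1e6:6.1f}M views per server per month")
```

The ratio says nothing about how heavy a pageview is, so it compares workloads only roughly.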
4. Some of the tools at Netlog
‣ Content Delivery Network
• http://www.akamai.com/
‣ Search Engine
• http://sphinxsearch.com/
• (Also have a look at http://lucene.apache.org/solr/)
‣ Database Replication
5. Some of the tools at Hyves
‣ Caching
• http://memcached.org/
‣ Lightweight webserver
• http://nginx.org/
‣ Automated deployment
• http://puppetlabs.com/
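To illustrate the caching bullet: a common way to use a memcached-style cache is the cache-aside pattern, where the application checks the cache first and only hits the slow backend on a miss. The sketch below is an assumption about typical usage, not a description of the Hyves setup. `TTLCache` is a hypothetical in-process stand-in for a real memcached client, and `cached_fetch` and `load` are illustrative names.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached-style get/set cache (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def cached_fetch(cache, key, load, ttl=60):
    """Cache-aside: try the cache first, fall back to the slow loader.

    A loader result of None is treated as a miss and is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value, ttl)
    return value
```

With a real cache server the same `get`/`set` shape applies, but the cache is shared between all web servers instead of living in one process.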
6. Some of the tools at Nu.nl
‣ Content pre-generation
‣ Varnish Edge Side Includes
• http://www.varnish-software.com/
‣ Distributed Job Queues
• http://gearman.org/ *
* nu.nl uses a custom job queue but this is one I can recommend
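The idea behind a job queue is that the code handling a request only enqueues work, and separate workers pick it up. The sketch below shows that shape with Python's standard library only: threads stand in for workers that would run on other machines, and it uses neither Gearman's API nor nu.nl's custom queue. `run_jobs` and `handler` are names invented for this example.

```python
import queue
import threading


def run_jobs(jobs, handler, workers=3):
    """Push jobs onto a queue and let worker threads process them."""
    pending = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this worker
                pending.task_done()
                return
            try:
                outcome = handler(job)
            except Exception as exc:
                # Record the failure instead of killing the worker.
                outcome = exc
            with lock:
                results.append((job, outcome))
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    pending.join()
    for thread in threads:
        thread.join()
    return results
```

Results arrive in completion order, not submission order, which is why callers of a queue like this should not rely on ordering.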
7. Some of the tools at Facebook
‣ PHP -> C++ compiler
• https://github.com/facebook/hiphop-php
‣ Distributed Logging
• https://github.com/facebook/scribe
‣ NoSQL Storage
• http://cassandra.apache.org/
11. A scalable funnel
Load balancer
Web Server   Web Server   Web Server   Web Server   Web Server
Load balancer
App Server   App Server   App Server
Load balancer
Database Services
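The funnel can be sketched as a chain of balancers, each spreading requests over the tier below it. This is a toy model under stated assumptions: round-robin is only one possible balancing policy (the slide does not name one), the server names are invented, and the third balancer in front of the database and services tier is left out for brevity.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, wrapping around at the end of the list."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


def build_funnel(web=5, app=3):
    """Build one balancer per tier; tier sizes mirror the slide."""
    return {
        "web": RoundRobinBalancer(f"web-{i}" for i in range(1, web + 1)),
        "app": RoundRobinBalancer(f"app-{i}" for i in range(1, app + 1)),
    }


def route(funnel):
    """Return the (web server, app server) pair that serves one request."""
    return funnel["web"].pick(), funnel["app"].pick()
```

Because each tier has its own balancer, a tier can be enlarged by adding backends to its list without touching the tiers above or below it.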
18. But this is Art, too
“Composition
with Blue”
Piet Mondriaan
19. Moral of this story
Scalability is about the ability to scale.
Don’t scale prematurely.
Keep it simple.
20. “You’re doing it wrong!”
“Scalability is the ability of a system, network,
or process, to handle growing large amounts
of work in a graceful manner or its ability to be
enlarged to accommodate that growth.”
22. Sources
The slide with servers and pageviews was based on:
‣ http://cns.ucsd.edu/lecturearchive09.shtml#Roth (Facebook)
‣ http://technologie.hyves.nl/ (Hyves)
‣ http://www.slideshare.net/folke/netlog-what-we-learned-about-scalability-high-availability-430211 (Netlog)
‣ http://www.slideshare.net/peter_ibuildings/surviving-a-plane-crash (Nu.nl)
‣ http://tweakers.net/reviews/331/5/tweakers-punt-net-faq-plans-stats-servers-en-site-software.html (Tweakers)
23. Credits
The following Creative Commons pictures were used in this presentation:
‣ ‘Datacenter Work’ by Leonardo Rizzi - http://www.flickr.com/photos/stars6/4381851322/
‣ ‘Rubber Band Man’ by Abe Novy - http://flickr.com/photos/thenovys/3791884189/
‣ ‘You’re doing it wrong’ by Adam Swank - http://www.flickr.com/photos/adeepbreath/3952587062/