Scalable Drupal Infrastructure

Designing, Scoping, and Configuring Scalable Drupal Infrastructure

Presented 2009-05-30 by David Strauss

Understanding Load Distribution

Predicting peak traffic
Traffic over the day can be highly irregular. To plan
for peak loads, design as if all traffic were as heavy
as the peak hour of load in a typical month -- and
then plan for some growth.
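
The peak-hour rule above can be sketched as a small calculation. This is only an illustration: the function name, the `growth` parameter, and its 25% default are assumptions, not figures from the talk.

```python
def design_load(hourly_hits, growth=0.25):
    """Return the requests-per-second figure to design for.

    hourly_hits: hit counts per hour over a typical month.
    growth: assumed headroom for future growth (hypothetical default).
    """
    peak_hour = max(hourly_hits)          # busiest single hour of the month
    return peak_hour * (1 + growth) / 3600  # treat every hour like the peak


# Example: a month whose busiest hour saw 36,000 hits, with 50% headroom.
print(design_load([3600, 7200, 36000], growth=0.5))
```

Sizing for the busiest hour rather than the daily average is what keeps the site up when traffic is irregular.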

Analyzing hit distribution
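[Figure: chart breaking hits down by visitor type (Human, Web Crawler), content type (Static Content, Dynamic Pages), session type (Anonymous, Authenticated), and treatment (Special Treatment, No Special Treatment, "Pay Wall" Bypass). Percentages shown on the chart: 100%, 70%, 50%, 40%, 30%, 20%, 10%, 7%, 3%. The mapping of percentages to labels did not survive extraction.]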

Throughput vs. Delivery Methods

                                   Green        Yellow                 Red
                                   (Static)     (Dynamic, Cacheable)   (Dynamic)
Content Delivery Network           ●●●●●●●●●●   ✖ [2]                  ✖
Reverse Proxy Cache                ●●●●●●●      ●●●●●●●                ✖
Drupal + Page Cache + memcached    ●●● [1]      ●●●                    ✖
Drupal + Page Cache                ●●● [1]      ●●                     ✖
Drupal                             ●●● [1]      ●                      ●

More dots = more throughput. The scale runs from about 10 req/s (Drupal) to about 1000 req/s (reverse proxy cache).
[1] Delivered by Apache without Drupal.
[2] Some actually can do this.

Objective

Deliver hits using the
fastest, most scalable
method available.

Layering: Less Traffic at Each Step

[Diagram: Traffic -> Load Balancer -> Reverse Proxy Cache -> Application Server -> Database, all inside "Your Datacenter". Also labeled: DNS Round Robin, CDN.]

Offload from the master database
Your master database is the single
greatest limitation on scalability.

[Diagram: the Application Server offloads work from the Master Database to Search, a Slave Database, and a Memory Cache.]

Tools to use
‣ Apache Solr for search.
(Acquia offers hosting of this now.)
‣ Squid or Varnish for reverse proxy caching.
‣ Any third-party service for CDN.

Do the math
‣ All non-CDN traffic travels through your load
balancers and reverse proxy caches. Even traffic
passed through to application servers must run
through the initial layers.

[Diagram: Traffic -> Load Balancer -> Reverse Proxy Cache -> Application Server]

What hit rate is each layer getting?
How many servers share the load?
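
The two questions above can be worked through numerically. The following is a rough sketch under stated assumptions: the function name, its parameters, and the example shares are hypothetical, and it assumes load is spread evenly across the servers in each layer.

```python
def layer_load(total_rps, cdn_share, proxy_hit_rate,
               n_balancers, n_proxies, n_app_servers):
    """Estimate requests per second per server at each layer.

    cdn_share: fraction of all hits served by the CDN.
    proxy_hit_rate: fraction of non-CDN hits answered by the proxy cache.
    """
    non_cdn = total_rps * (1 - cdn_share)      # everything the CDN does not absorb
    to_app = non_cdn * (1 - proxy_hit_rate)    # cache misses passed through
    return {
        # All non-CDN traffic crosses the first two layers, hits and misses alike.
        "load_balancer": non_cdn / n_balancers,
        "reverse_proxy": non_cdn / n_proxies,
        "app_server": to_app / n_app_servers,
    }


print(layer_load(1000, cdn_share=0.5, proxy_hit_rate=0.75,
                 n_balancers=2, n_proxies=2, n_app_servers=5))
```

Each layer's per-server figure can then be compared against what one server of that type is expected to handle.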

Get a management/monitoring box
(maybe two or three, and have them specialized or redundant)

[Diagram: the Management box connects to the Load Balancer, Reverse Proxy Cache, Application Server, and Database.]

Infrastructure goals
‣ Redundancy
‣ Scalability
‣ Performance
‣ Manageability

Redundancy
‣ When one server fails, the website should
be able to recover without taking too long.
‣ This requires N+1 capacity (one more server than
the load itself needs), putting a floor on system
requirements.
‣ How long can your site be down?
‣ Automatic versus manual failover

Performance
‣ Find the “sweet spot” for hardware. This is the
best price/performance point.
‣ Avoid overspending on any type of component
‣ Yet, avoid creating bottlenecks
‣ Swapping memory to disk is very dangerous

Relative importance

                       Processors/Cores   Memory   Disk Speed
Reverse Proxy Cache    ●                  ●●●      ●●
Web Server             ●●●●●              ●●       ●
Database Server        ●●                 ●●●●     ●●●●
Monitoring             ●                  ●        ●

Reverse proxy caches
‣ Squid makes poor use of multiple cores. Focus on
getting the highest per-core performance. The
best per-core performance is often on dual-core
processors with high clock rates and lots of cache.
‣ Varnish is much more multithreaded.
‣ 4-8 GB memory, total
‣ Expect 1000 requests per second, per Squid
‣ 64-bit operating system if more than 2 GB RAM

Web servers
‣ Apache 2.2 + mod_php + memcached
‣ Many processors + many cores is best
‣ 25 Apache threads per core
‣ 50 MB memory per thread, system-wide
‣ 1 GB memory for system
‣ 1 GB memory for memcached
‣ Configure MaxClients in Apache to maximum
system-wide thread count
‣ Expect 1 request per thread, per second
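
The rules of thumb in this list combine into a simple sizing calculation. This is a minimal sketch: the function name and return shape are assumptions, while the constants (25 threads per core, 50 MB per thread, 1 GB each for the system and memcached) come from the bullets above.

```python
def size_web_server(cores, ram_mb):
    """Estimate MaxClients and throughput for one Apache + mod_php box."""
    threads_by_cpu = 25 * cores                  # 25 Apache threads per core
    threads_by_ram = (ram_mb - 1024 - 1024) // 50  # minus system and memcached, 50 MB each
    max_clients = max(0, min(threads_by_cpu, threads_by_ram))
    return {
        "max_clients": max_clients,     # value to use for MaxClients
        "expected_rps": max_clients,    # 1 request per thread, per second
    }


# An 8-core box with 8 GB RAM is memory-bound at 122 threads.
print(size_web_server(cores=8, ram_mb=8192))
```

Taking the smaller of the CPU and memory limits keeps the box from swapping, which the Performance slide flags as very dangerous.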

Database servers
‣ MySQL 5.0 cannot use more than eight cores
effectively but gets good gains from at least
quad-core processors.
‣ Plan for one connection per Apache thread across
all web servers, and add another 50.
‣ Each MySQL connection needs around 6 MB.
‣ MySQL with InnoDB needs a buffer pool large
enough to cache all indexes. Start by giving the
pool most remaining database server memory and
working from there.
‣ 64-bit operating system if more than 2 GB RAM
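
As a starting point, the connection and buffer pool guidance above can be turned into arithmetic. This is a sketch only: the function name is hypothetical, and the 1 GB `system_mb` reserve is an assumption borrowed from the web server slide, not a figure given for database servers.

```python
def size_database(total_apache_threads, ram_mb, system_mb=1024):
    """Estimate MySQL connection count and an initial InnoDB buffer pool size."""
    max_connections = total_apache_threads + 50   # one per Apache thread, plus 50
    connection_mb = max_connections * 6           # around 6 MB per connection
    # Give the buffer pool most of what remains, then tune from there.
    buffer_pool_mb = max(0, ram_mb - system_mb - connection_mb)
    return {
        "max_connections": max_connections,
        "connection_mb": connection_mb,
        "buffer_pool_mb": buffer_pool_mb,
    }


# 250 Apache threads cluster-wide against a 16 GB database server.
print(size_database(total_apache_threads=250, ram_mb=16384))
```

The result is a first guess for tuning; whether the pool actually holds all indexes has to be checked against the real data set.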

Monitoring server
‣ Very low hardware requirements
‣ Choose hardware that is inexpensive but
essentially similar to the rest of the cluster to
reduce management overhead
‣ Reliability and fast failover are typically low
priorities for monitoring services

Assembling the numbers
‣ Start with an architecture providing redundancy.
‣ Two servers, each running the whole stack
‣ Increase the number of proxy caches based on
anonymous and search engine traffic.
‣ Increase the number of web servers based on
authenticated traffic.
‣ Databases are harder to predict, but large sites
should run them on at least two separate boxes
with replication.
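
One way to put these steps together is sketched below. The function names and the example traffic figures are hypothetical; the 1000 req/s per proxy comes from the Squid estimate earlier, and "N+1" is read here as one spare server beyond what the load needs.

```python
import math

SQUID_RPS = 1000  # expected requests per second, per Squid (from the slides)


def servers_needed(load_rps, per_server_rps, minimum=2):
    """Servers required for the load, plus one spare, never fewer than two."""
    needed = math.ceil(load_rps / per_server_rps)
    return max(minimum, needed + 1)


def plan_cluster(cacheable_rps, authenticated_rps, web_server_rps):
    """cacheable_rps: anonymous plus search engine traffic.
    web_server_rps: throughput of one web server (its Apache thread count)."""
    return {
        "proxy_caches": servers_needed(cacheable_rps, SQUID_RPS),
        "web_servers": servers_needed(authenticated_rps, web_server_rps),
        "database_servers": 2,  # floor for large sites: two boxes with replication
    }


print(plan_cluster(cacheable_rps=2500, authenticated_rps=300, web_server_rps=122))
```

Database counts are left at the floor because, as noted above, database load is harder to predict from traffic alone.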

Pressflow
Make Drupal sites scale by upgrading core
with a compatible, powerful replacement.

Common large-site issues
‣ Drupal core requires patching to effectively
support the advanced scalability techniques
discussed here.
‣ Patches often conflict and have to be reapplied
with each Drupal upgrade.
‣ The original patches are often unmaintained.
‣ Sites stagnate, running old, insecure versions of
Drupal core because updating is too difficult.

What is Pressflow?
‣ Pressflow is a derivative of Drupal core that
integrates the most popular performance and
scalability enhancements.
‣ Pressflow is completely compatible with existing
Drupal 5 and 6 modules, both standard and
custom.
‣ Pressflow installs as a drop-in replacement for
standard Drupal.
‣ Pressflow is free as long as the matching version of
Drupal is also supported by the community.

What are the enhancements?
‣ Reverse proxy support
‣ Database replication support
‣ Lower database and session management load
‣ More efficient queries
‣ Testing and optimization by Four Kitchens
with standard high-performance software
and hardware configuration
‣ Industry-leading scalability support
by Four Kitchens and Tag1 Consulting

Four Kitchens + Tag1
‣ Provide the development, support, scalability, and
performance services behind Pressflow
‣ Comprise most members of the Drupal.org
infrastructure team
‣ Have the most experience scaling Drupal sites
of all sizes and all types

Ready to scale?
‣ Learn more about Pressflow:
‣ Pick up pamphlets in the lobby
‣ Request Pressflow releases at fourkitchens.com
‣ Get the help you need to make it happen:
‣ Talk to me (David) or Todd here at DrupalCamp
‣ Email shout@fourkitchens.com

Software and Configuration

The problem
[Diagram: five Application Servers.]

Objectives:
Fast, atomic deployment and rollback
Minimize single points of failure and contention
Restart services
Integrate with version control systems

Manual updates and deployment

[Diagram: one human updating each of the five application servers.]

Why not: slow deployment,
non-atomic/difficult rollbacks

Shared storage

NFS

Why not: single point of contention and failure

rsync
[Diagram: application servers synchronized with rsync.]


Why not: non-atomic, does not manage services

Capistrano
[Diagram: application servers deployed with Capistrano.]


Capistrano provides near-atomic deployment,
service restarts, automated rollback, test automation, and
version control integration (tagged releases).
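
Near-atomic deployment of this kind is commonly built on release directories plus a `current` symlink that is switched in one step. The sketch below illustrates that general technique in Python; it is not Capistrano's code, and the function name and directory layout are assumptions for illustration.

```python
import os
import tempfile


def activate_release(deploy_root, release_name):
    """Point <deploy_root>/current at releases/<release_name> in one step."""
    target = os.path.join("releases", release_name)
    if not os.path.isdir(os.path.join(deploy_root, target)):
        raise FileNotFoundError(target)
    current = os.path.join(deploy_root, "current")
    tmp_link = os.path.join(deploy_root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)    # build the new link off to the side
    os.replace(tmp_link, current)   # rename over the old link (atomic on POSIX)


# Deploy r1, then r2, then roll back to r1.
root = tempfile.mkdtemp()
for name in ("r1", "r2"):
    os.makedirs(os.path.join(root, "releases", name))
activate_release(root, "r1")
activate_release(root, "r2")
activate_release(root, "r1")  # rollback is the same operation
print(os.readlink(os.path.join(root, "current")))
```

Because the web server only ever sees the old or the new release, rollback is as fast as deployment.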

Multistage deployment
Deployments can be staged.
cap staging deploy
cap production deploy

[Diagram: Development/Integration and Staging environments, each step deployed with Capistrano.]


But your application isn’t the only
thing to manage.

Beneath the application
[Diagram: cluster-level configuration spans the Reverse Proxy Cache and Database layers.]

Cluster management applies to package management,
updates, and software configuration.

cfengine and bcfg2 are popular
cluster-level system configuration tools.

System configuration management
‣ Deploys and updates packages, cluster-wide or
selectively.
‣ Manages arbitrary text configuration files
‣ Analyzes inconsistent configurations (and
converges them)
‣ Manages device classes (app. servers, database
servers, etc.)
‣ Allows confident configuration testing on a
staging server.
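
The idea of device classes and convergence can be illustrated with a toy model. This sketch is not how cfengine or bcfg2 work internally; the data shapes, setting names, and function name are all hypothetical.

```python
def plan_convergence(desired_by_class, hosts):
    """Return, per host, the settings that differ from its class's desired state."""
    plan = {}
    for host, info in hosts.items():
        desired = desired_by_class[info["device_class"]]
        changes = {key: value for key, value in desired.items()
                   if info["config"].get(key) != value}
        if changes:
            plan[host] = changes   # only inconsistent hosts need work
    return plan


desired = {"app": {"php": "5.2", "MaxClients": 122}}
hosts = {
    "app1": {"device_class": "app", "config": {"php": "5.2", "MaxClients": 122}},
    "app2": {"device_class": "app", "config": {"php": "5.1", "MaxClients": 122}},
}
print(plan_convergence(desired, hosts))
```

Declaring the desired state per device class, rather than per machine, is what lets one change roll out cluster-wide or selectively.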

All on the management box

[Diagram: the Management box hosts Development/Integration, Staging, and Deployment Tools.]

Monitoring

Types of monitoring
Failure               Capacity/Load

Analyzing Downtime    Analyzing Trends
Viewing Failover      Predicting Load
Troubleshooting       Checking Results of Configuration and Software Changes
Notification

What to use

Failure/Uptime Capacity/Load

Nagios Cacti

Hyperic Munin

Nagios
‣ Highly recommended.
‣ Used by Four Kitchens and Tag1 Consulting for
client work, Drupal.org, Wikipedia, etc.
‣ Easy to install on CentOS 5 using EPEL packages.
‣ Easy to install nrpe agents to monitor diverse
services.
‣ Can notify administrators on failure.
‣ We use this on Drupal.org

Hyperic
‣ I haven’t used this much, but it’s fairly popular.
‣ More difficult to set up than Nagios.

Cacti
‣ Highly annoying to set up.
‣ One instance generally collects all statistics.
(No “agents” on the systems being monitored.)
‣ Provides flexible graphs that can be customized on
demand.
‣ Optimized database for perpetual statistics collection.
‣ We use this on Drupal.org and for client sites.

Munin
‣ Fairly easy to set up.
‣ One instance generally collects all statistics.
(A lightweight munin-node agent runs on each system being monitored.)
‣ Provides static graphs that cannot be
customized.

Cache/session coherency
‣ Systems that run properly on single boxes may
lose coherency when run on a networked cluster.
‣ Some caches, like APC’s object cache, have no
ability to handle network-level coherency. (APC’s
opcode cache is safe to use on clusters.)
‣ memcached, if misconfigured, can hash values
inconsistently across the cluster, resulting in
different servers using different memcached
instances for the same keys.
‣ Session coherency can be helped with load
balancer affinity.
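
The memcached pitfall above is easiest to see with a toy key-to-server mapping. The sketch assumes a naive modulo scheme chosen for brevity; real memcached clients use their own distribution strategies, and the host names are made up.

```python
import zlib


def pick_server(key, servers):
    """Map a cache key to one server by hashing modulo the list length."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


# Two web servers configured with the same instances in a different order.
web1_servers = ["mc1:11211", "mc2:11211"]
web2_servers = ["mc2:11211", "mc1:11211"]

key = "cache_page:/node/1"
print(pick_server(key, web1_servers))  # web1 writes the entry here
print(pick_server(key, web2_servers))  # web2 looks for it on the other instance

# Using an identical, identically ordered list restores agreement.
print(pick_server(key, sorted(web1_servers)) == pick_server(key, sorted(web2_servers)))
```

With mismatched lists, each web server sees its own copy of the cache, so an invalidation on one is invisible to the other.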

Cache regeneration races
‣ Downside to network cache coherency: synchronized
expiration
‣ Hard to solve

[Diagram: timeline. The old cached item expires; until the new cached item is stored, all servers are regenerating the item.]
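
One commonly used mitigation, not something the slides prescribe, is to let only one server rebuild an expired item while the others serve a stale copy. The sketch below uses an in-memory stand-in for a shared cache; `FakeCache`, `fetch`, and the `:lock` and `:stale` key suffixes are hypothetical, and `add` mimics memcached's set-if-absent command.

```python
class FakeCache:
    """In-memory stand-in for a shared cache such as memcached."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def add(self, key, value):
        """Store only if the key is absent; report whether it was stored."""
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


def fetch(cache, key, rebuild):
    value = cache.get(key)
    if value is not None:
        return value
    if cache.add(key + ":lock", 1):        # only one caller wins the lock
        try:
            value = rebuild()
            cache.set(key, value)
            cache.set(key + ":stale", value)  # copy that outlives expiration
        finally:
            cache.delete(key + ":lock")
        return value
    return cache.get(key + ":stale")       # everyone else serves the old copy


cache = FakeCache()
print(fetch(cache, "front_page", lambda: "<html>v1</html>"))
```

A production version would also need a timeout on the lock so a crashed rebuilder cannot block regeneration forever.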

Broken replication
‣ MySQL slave servers get out of sync and fall
further behind
‣ No means of automated recovery
‣ Only solvable with good monitoring and recovery
procedures
‣ Can automate removal from use, but requires
cluster management tools
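
Automated removal from use can be as simple as filtering on replication status. This sketch assumes each slave's `SHOW SLAVE STATUS` row has already been collected into a dict; the function name, the host names, and the 30-second lag threshold are assumptions.

```python
def usable_slaves(statuses, max_lag_seconds=30):
    """Return the slaves that are replicating and not too far behind."""
    healthy = []
    for host, status in statuses.items():
        lag = status.get("Seconds_Behind_Master")  # None when replication is stopped
        if (status.get("Slave_IO_Running") == "Yes"
                and status.get("Slave_SQL_Running") == "Yes"
                and lag is not None
                and lag <= max_lag_seconds):
            healthy.append(host)
    return sorted(healthy)


statuses = {
    "db2": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 2},
    "db3": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No", "Seconds_Behind_Master": None},
    "db4": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 900},
}
print(usable_slaves(statuses))
```

A check like this only takes a broken slave out of rotation; repairing it still needs the monitoring and recovery procedures noted above.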

Server failure
‣ Load balancers can remove broken or overloaded
reverse proxy caches.
‣ Reverse proxy caches like Varnish can automatically
use only functional application servers.
‣ Cluster management tools like heartbeat2 can manage
service IPs on MySQL servers to automate failover.
‣ Conclusion: Each layer intelligently monitors and uses
the servers beneath it.

All content in this presentation, except where noted otherwise, is Creative Commons Attribution-
ShareAlike 3.0 licensed and copyright 2009 Four Kitchen Studios, LLC.
