3. SO… FIRST THINGS FIRST
• Prerequisites that you’ll need (or at least want).
– A good, reliable network with plenty of capacity
– At least one expert Systems Administrator
• You don’t necessarily need this:
5. FIRST, DON’T USE DRUPAL
(WELL, SORTA)
Pressflow
[Stack diagram: Apache, PHP, Drupal, MySQL]
• Drop-in replacement for Drupal 6.x
• Support for database replication
• Support for reverse proxy caching
• Optimization for MySQL
• Optimization for PHP 5
• Available at:
http://fourkitchens.com/pressflow-makes-drupal-scale
6. AFTER PRESSFLOW, IT’S ALL ABOUT CACHE
Varnish
[Stack diagram: Varnish, Apache, PHP, Pressflow, MySQL]
• Varnish is a reverse proxy cache
• Caches content based on HTTP headers
• Uses kernel-based virtual memory
• Watch out for cookies and authenticated users
• Great config: http://lb.cm/ZyR
• Thanks, quicksketch!
• Available at http://varnish-cache.org/
8. HTTP LOGGING
• The varnishncsa daemon handles Varnish logging
• Default Apache logs will always show 127.0.0.1, because every request reaches Apache from Varnish on the same host
• Define a new log format to use X-Forwarded-For
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_proxy
CustomLog /var/log/apache2/access.log combined_proxy
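As a rough illustration of what this log format produces, the following Python sketch parses one access-log line and pulls out the client address recorded from X-Forwarded-For. The sample line, the `LOG_LINE` pattern, and the `client_ip` helper are hypothetical and not part of the original slides. The first entry of X-Forwarded-For is supplied by the client side of the chain, so treat it as informational rather than trusted.

```python
import re

# Mirrors the field order of the combined_proxy LogFormat:
# X-Forwarded-For, ident, user, time, request, status, bytes, referer, user agent.
LOG_LINE = re.compile(
    r'^(?P<forwarded_for>.+?) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def client_ip(line: str) -> str:
    """Return the first address listed in the X-Forwarded-For field."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError("line does not match the combined_proxy format")
    # The header may hold a comma-separated chain of proxies;
    # the first entry is the address reported for the original client.
    return match.group("forwarded_for").split(",")[0].strip()


# Hypothetical log line, using a documentation-range IP address.
sample = (
    '203.0.113.7 - - [10/Oct/2010:13:55:36 -0700] '
    '"GET /node/1 HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
)
print(client_ip(sample))  # 203.0.113.7
```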
9. CACHING WITH COOKIES
sub vcl_recv {
  // Remove has_js and Google Analytics __* cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
  // Remove empty cookies.
  if (req.http.Cookie ~ "^\s*$") {
    unset req.http.Cookie;
  }
}
sub vcl_hash {
  // Include cookie in cache hash (Varnish 2.x syntax; later versions use hash_data()).
  if (req.http.Cookie) {
    set req.hash += req.http.Cookie;
  }
}
10. BASIC SECURITY
// Define the internal network subnets
acl internal {
  "127.0.0.0"/8;
  "10.0.0.0"/8;
}
sub vcl_recv {
  […]
  // Do not allow outside access to cron.php
  if (req.url ~ "^/cron\.php(\?.*)?$" && !client.ip ~ internal) {
    set req.url = "/404-cron.php";
  }
}
12. APACHE OPTIMIZATIONS
Apache
[Stack diagram: Varnish, Apache, PHP, Pressflow, MySQL]
• Tune Apache to match your hardware
• Setting MaxClients too high is asking for trouble
• Every application is different
• A good starting point for MaxClients is the total amount of memory allocated to Apache divided by 40 MB
• This is one of the areas that will need to be monitored and updated on an ongoing basis
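The rule of thumb above is simple arithmetic, sketched here in Python. The function name `estimate_max_clients` and the 4096 MB figure are illustrative assumptions. The 40 MB per-process footprint is the slide's starting value, and real footprints should be measured on your own servers.

```python
def estimate_max_clients(apache_memory_mb: int, per_process_mb: int = 40) -> int:
    """Starting-point MaxClients: memory reserved for Apache / per-process footprint."""
    if apache_memory_mb <= 0 or per_process_mb <= 0:
        raise ValueError("memory figures must be positive")
    # Round down so the worker pool never needs more memory than was set aside.
    return max(1, apache_memory_mb // per_process_mb)


# Hypothetical server with 4096 MB set aside for Apache.
print(estimate_max_clients(4096))  # 102
```

Rounding down is deliberate: overshooting pushes the server into swap, which is the "asking for trouble" case from the slide.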
13. STILL ALL ABOUT CACHE
APC Opcode Cache
[Stack diagram: Varnish, Apache, APC, PHP, Pressflow, MySQL]
• APC is an opcode cache
• Officially supported by PHP
• Prevents unnecessary PHP parsing and compiling
• Reduces load on memory and CPU
15. EVEN MORE CACHE
Memcached
[Stack diagram: Varnish, Apache, APC, PHP, Pressflow, Memcached, MySQL]
• Memcached is a distributed memory object caching system
• Reduces load on the database
• Simple key/value datastore
18. BUT WHAT ABOUT SEARCH?
Solr
[Stack diagram: Varnish, Apache, APC, PHP, Pressflow, Memcached, Solr, MySQL]
• Better than native Drupal search
• Built on a standard application server
• You can decide which J2EE server to use
• Flexibility allows fault tolerance
• Configurable through the Drupal Solr module
19. CAN OTHER STACKS WORK TOO?
• Yes, there are other technologies and strategies that do the same thing (nginx, Cassandra, eAccelerator, etc.).
• Arguments can be made both for and against them
• This stack is what we have used in production and feel is the most stable and enterprise-ready
• We’re always refining our stack too. So far, this is what we like best. (Currently testing the Comanche web server)
• Project Mercury uses the same stack
20. BACK TO STACKS
• The beauty of this performance stack (Varnish, Apache, APC, PHP, Pressflow, Memcached, Solr, MySQL) is that it can be installed entirely on a single server, and that server will perform well.
But what if one server isn’t good enough?
21. SCALE APART
• Because these services are modular, we can separate server roles
[Diagram: the stack (Varnish, Apache, APC, PHP, Pressflow, Memcached, Solr, MySQL) split across a Varnish Server, a Web/App Server, and a Database Server]
22. YOU CAN DO THIS A COUPLE WAYS
• Another example:
[Diagram: the stack (Varnish, Apache, APC, PHP, Pressflow, Memcached, Solr, MySQL) split across a Web/App Server, a Memcached/Solr Server, and a Database Server]
23. HOW WE LIKE TO DO IT
• This is our standard separation
[Diagram: the stack (Varnish, Apache, APC, PHP, Pressflow, Memcached, Solr, MySQL) split across a Web/App Server and a Database Server]
24. SCALE FURTHER: LOAD-BALANCING
• Multiple web servers can be load-balanced for greater capacity, using the same database
• Single points of failure are apparent
• Load balancing uses LVS
[Diagram: Load-balanced Architecture. Load Balancer(s), two Web/App Servers, one Database Server]
25. FILE SYNCHRONIZATION
NFS
[Stack diagram: Varnish, Apache, APC, PHP, Pressflow, Memcached, Solr, MySQL, NFS]
• NFS allows multiple web/app servers to seamlessly serve the same content
• User-uploaded content is instantly available to all web servers
• Any code changes only need to be made in one location
27. FAULT TOLERANCE IS IMPORTANT NOW
High Availability Architecture
• Now that we’re scaling out with more capacity, we’re probably really scared of the DB failing
• MySQL circular replication
• NFS-HA
• Solr fault tolerance
• All managed by Heartbeat
[Diagram: Load Balancer(s), two Web/App Servers, two MySQL / NFS Servers]
28. MYSQL CIRCULAR REPLICATION
• Circular replication is the method by which we synchronize data
• There are 2 IP addresses (master and slave)
• Heartbeat is used to automatically fail over the addresses when necessary
[Diagram: MySQL Server 1, MySQL Server 2]
29. NFS HA USING DRBD
• Data synchronization handled with DRBD
• Distributed Replicated Block Device (DRBD)
– Essentially RAID1 over the network
– Only one NFS server is able to access the data at a time, which is why we have the IP management
• IP management is handled by Heartbeat automatically
[Diagram: NFS Server 1, NFS Server 2]
30. SOLR
• Data synchronization handled with DRBD
• Distributed Replicated Block Device (DRBD)
– Essentially RAID1 over the network
– Only one Solr server is able to access the data at a time, which is why we have the IP management
• IP management is handled by Heartbeat automatically
[Diagram: Solr Server 1, Solr Server 2]
32. OTHER THINGS TO CONSIDER
• Drush
• Monitoring
– Availability
– Core updates
– Module updates
– Munin
• CDN
33. LESSONS WE’VE LEARNED
…things we’ve picked up from experience
• Conntrack tables
– Disable all the iptables connection tracking modules unless you need them
• NTP
– Time synchronization is extremely important on any system that uses Heartbeat
• Load testing
– Load test your solutions and make sure you can achieve your goal
Welcome to our presentation on designing enterprise Drupal environments. This presentation is given from an infrastructure viewpoint. There are many other aspects to consider from a development perspective, and there are a lot more qualified folks here who can talk about them. Today we are going to walk you through the software that we have found offers the best balance of stability and speed for high-traffic applications that have to be up all the time. All slides will be available online once the presentation is complete, and we will be here and available for questions.
A good network is an absolute requirement for running a big Drupal environment. If you plan on getting any press or decent traffic at all, setting requirements for your hosting provider is crucial. The main points to require are: 1) a dedicated gigabit connection to their core network; 2) a network with a guaranteed SLA; 3) a guarantee that they have the capacity to support your growth.
This is the semi-unofficial standard Drupal stack. Many of the packages here can be replaced by other packages, some better, some worse. In our experience, this is currently the most scalable and reliable stack we have deployed. That does not mean we have a problem with Cassandra or nginx, just that we have not gone through enough deployments with those packages to call them “standard” yet. That could change very soon. Quick descriptions:
Varnish is the outermost-facing cache
APC is our opcode cache
Apache/PHP is our web server and interpreter
Pressflow is the distribution of Drupal that we currently use
Memcached is our backend caching mechanism
MySQL is our database system
(Directly from fourkitchens.com)
Support for database replication
Replication in MySQL takes the data from a designated master server and copies it out to any number of other MySQL servers. Pressflow can then use these replicated MySQL servers for time-consuming query operations that would otherwise slow down the central database. Pressflow’s replication model (and much of the same software) is currently in use on Drupal.org and numerous large Drupal sites, but similar support for replication won’t be available in standard Drupal releases until version 7, which includes a new database abstraction layer.
Support for Squid and Varnish reverse proxy caching
A reverse proxy cache takes most of the load of anonymous browsing off of Drupal, PHP, Apache, and MySQL by placing a high-performance cache in front of the rest of the web application stack. This cache accelerates static content, like CSS, Javascript, and image files, as well as full web pages for anonymous users. Squid and Varnish are the most popular free, open-source reverse proxy caches, and are used to deliver sites as high-traffic as Wikipedia. The key to effective reverse proxy cache deployment is having the content management system inform the cache what it can and cannot cache. Currently, only Pressflow supports this capability.
Optimization for MySQL
Drupal is designed to run on MySQL, PostgreSQL, and (beginning with Drupal 7) SQLite. While this gives Drupal broad storage support, the vast majority of Drupal sites run on MySQL, especially the largest ones. Pressflow only supports MySQL, which allows it to quickly integrate optimizations that, when proposed for Drupal, often face delays resulting from the need to support multiple database engines.
Optimization for PHP 5
Drupal includes “wrapper” functions to provide PHP 5-style operations on PHP 4. These wrapper functions increase the size of the code and are often dramatically slower than their native PHP 5 equivalents. Because Pressflow only supports PHP 5, it can replace these wrapper functions with their high-performance PHP 5 equivalents.
Varnish is our reverse proxy cache. This is the biggest win available when trying to scale a site to accommodate large traffic increases.
Here we are telling Apache to stop listening on port 80 and listen on port 8080 instead. We also tell Varnish to accept all incoming port 80 requests and to use Apache running on port 8080 to populate its cache and serve those requests.
Apache logs are really only helpful if you want to compare how much traffic is being handled by Varnish with how much is making it through to Apache.
Varnish cannot cache as effectively when cookies are present. By default, it does not cache anything if a cookie is present in the request headers. It creates a hash from the HTTP headers to look up cached content, and if anything is different (e.g. cookie content), a different hash is generated and the benefits of caching are negated.
vcl_recv is executed on every incoming request.
Google Analytics continues to function even without the cookies, so we strip these. Any other cookies that are not actually processed by your application are also safe to strip.
The rest of this function is just cleanup, so that as many requests as possible produce the same hash.
vcl_hash is called when Varnish calculates the hash. This code instructs it to include the Cookie header in the hash instead of just skipping it.
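To make the effect of the cookie cleanup concrete, here is a small Python sketch that applies the same regular expressions and then derives a cache key. It is a model of the idea, not Varnish's implementation: `normalize_cookie` and `cache_key` are hypothetical helpers, and SHA-256 over URL, host, and cookie stands in for whatever Varnish hashes internally.

```python
import hashlib
import re
from typing import Optional

# Same patterns as the VCL on the slide, written for Python's re module.
TRACKING_COOKIES = re.compile(r"(^|;\s*)(__[a-z]+|has_js)=[^;]*")
LEADING_SEPARATOR = re.compile(r"^;\s*")


def normalize_cookie(cookie: Optional[str]) -> Optional[str]:
    """Strip has_js and Google Analytics cookies; return None if nothing is left."""
    if cookie is None:
        return None
    cookie = TRACKING_COOKIES.sub("", cookie)
    cookie = LEADING_SEPARATOR.sub("", cookie, count=1)
    if re.fullmatch(r"\s*", cookie):
        return None  # equivalent to "unset req.http.Cookie"
    return cookie


def cache_key(url: str, host: str, cookie: Optional[str]) -> str:
    """Stand-in for the cache hash: URL, host, and any remaining cookie."""
    digest = hashlib.sha256()
    digest.update(url.encode())
    digest.update(host.encode())
    remaining = normalize_cookie(cookie)
    if remaining:
        digest.update(remaining.encode())
    return digest.hexdigest()


print(normalize_cookie("__utma=1.2.3; has_js=1"))     # None
print(normalize_cookie("__utma=1.2.3; SESSabc=xyz"))  # SESSabc=xyz
```

With this cleanup, an anonymous visitor carrying only analytics cookies maps to the same key as a visitor with no cookies at all, while a request with a session cookie gets its own key.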
Protect yourself against DoS attacks by only allowing specific network segments to access cron.php. 404-cron.php does not need to exist: Drupal will handle it with its own 404 error handling, and the request will be logged so that you can be aware of any unwanted access attempts to cron.php.
This same procedure can be used to protect other sensitive URLs, such as update.php or install.php.
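The access rule can be modelled outside Varnish to check the pattern and the subnets before deploying. This Python sketch mirrors the ACL and the URL test. `rewrite_url` is a hypothetical helper, and the addresses in the example are documentation-range values, not real clients.

```python
import ipaddress
import re

# Same subnets as the "internal" ACL in the VCL.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
]
# Same pattern as the VCL: cron.php with an optional query string.
CRON_URL = re.compile(r"^/cron\.php(\?.*)?$")


def rewrite_url(url: str, client_ip: str) -> str:
    """Send outside requests for cron.php to a URL that Drupal answers with a 404."""
    address = ipaddress.ip_address(client_ip)
    is_internal = any(address in network for network in INTERNAL_NETWORKS)
    if CRON_URL.match(url) and not is_internal:
        return "/404-cron.php"
    return url


print(rewrite_url("/cron.php", "203.0.113.7"))  # /404-cron.php
print(rewrite_url("/cron.php", "10.1.2.3"))     # /cron.php
```

Extending the pattern to an alternation such as `(cron|update|install)` would cover the other URLs mentioned above. That variant is an assumption and is not shown in the slides.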
Because Varnish can serve pages from cache so quickly, it can handle a much larger number of incoming HTTP connections than a standard Apache server. As such, it needs to be able to open and maintain a large number of file handles to manage its HTTP connections and cache content. Here we are just telling the kernel to allow more file handles than are enabled by default.
Again, every application is different, so these values will need to be tuned accordingly, but a good starting point is 120 MB.
In addition to the php.ini configuration, the kernel needs to be configured to provide access to enough shared memory.
Every table in Drupal’s MySQL database that starts with cache_ can go into memcache, and it’s common to set up a separate bucket for each table. A bucket (also called a bin) is an allocation of memory that is set aside for a portion of the cache. Each bucket is actually a separate memcache instance.
Multiple bins require separate instances on different ports.
Setting up the Drupal memcache module to use our newly created bins is pretty straightforward. You provide the memcache servers in the memcache_servers array, assign each of the memcache instances to a Drupal table in the foreach loop, then assign the tables to buckets in the memcache_bins array.
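The relationship between tables, bins, and instances can be pictured with a short Python model. This is not the Drupal memcache module's configuration syntax (that lives in PHP in settings.php). The host, the table list, and the helper names `build_bins` and `server_for` are assumptions chosen for illustration; 11211 is memcached's default port.

```python
from typing import Dict, List, Tuple

MEMCACHED_HOST = "127.0.0.1"  # assumed: instances run on the local host
FIRST_PORT = 11211            # memcached's default port


def build_bins(
    tables: List[str],
    host: str = MEMCACHED_HOST,
    first_port: int = FIRST_PORT,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Give every cache table its own bin, and every bin its own instance and port."""
    servers: Dict[str, str] = {}  # "host:port" -> bin name
    bins: Dict[str, str] = {}     # cache table -> bin name
    for offset, table in enumerate(tables):
        servers[f"{host}:{first_port + offset}"] = table
        bins[table] = table
    return servers, bins


def server_for(table: str, servers: Dict[str, str], bins: Dict[str, str]) -> str:
    """Look up which memcached instance holds a given cache table."""
    bin_name = bins[table]
    for address, name in servers.items():
        if name == bin_name:
            return address
    raise KeyError(table)


servers, bins = build_bins(["cache_menu", "cache_page", "cache_filter"])
print(server_for("cache_page", servers, bins))  # 127.0.0.1:11212
```

One instance per bin means a busy table such as the page cache cannot evict entries belonging to another table, at the cost of managing more ports.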
This is what we will assume for the rest of the presentation
Varnish can run on its own server, depending on your configuration.