4. Introduction
• I work for Future Publishing Plc
• We are based in Bath, UK and have offices
in London, San Francisco, New York and
Sydney
• We publish over 180 special-interest
publications
5. Publications
To name a few:
• In the UK we publish SFX, Cycling Plus,
TotalFilm, MBUK, Simply Knitting, Official
PlayStation Magazine, .Net
• In the US we publish Mac|Life, World of
Warcraft Official Magazine
• In Australia we publish Guitarist, T3, Official
Nintendo, Official Xbox 360
7. Websites
• Most websites contain: news, features,
reviews, products, forums
• Ad funded
• Popular sites, lots of traffic, often market
leaders
8. Website Platforms
We try to use the most appropriate
website platform for the job
• WordPress for small builds
http://www.futureplc.com
• Drupal for medium builds
http://www.photoradar.com
• CakePHP for large custom builds
http://www.totalfilm.com
http://www.cyclingnews.com
10. Our first big CakePHP build
• The first CakePHP build had 6 developers,
2 front-end developers and most hadn’t
used CakePHP before
• Design change halfway through site build
• But the site was completed with time to
spare
11. We learnt some lessons
• Most developers thought they knew better
• They didn’t really embrace CakePHP
conventions - myself included
• They didn’t know better
12. We learnt some lessons
• We ended up with at least four different ways of
doing everything
• Our future site builds with CakePHP really
should use developers that “get” the
framework
13. Launch day
• Expecting 30k page views
• Got 100k page views
• Site held up nicely
• We all went to the pub
14. The day after launch day
• Marketing and PR departments announced
the new site and started pimping links
• IMDb liked one of the launch stories and
put a link on their homepage
• We got 500k page views
• CakePHP view caching did not hold up
because the server just couldn’t run
enough copies of PHP
15. Oh crap! It broke
• How did we fix it?
• JavaScript used for dynamic aspects such as
user login/out
• Cached full pages in Memcache via a helper
• Served directly from Memcache with Nginx
• More info:
http://andy-gale.com/cakephp-view-memcache.html
16. Quick fix cache solution wasn’t perfect!
• Hub pages - i.e. the homepage, features -
weren’t immediately updated as content
changed
• Simple changes to content meant entire
sections of the site should be rebuilt
• Susceptible to the “thundering herd”
problem
18. Thundering Herd
A cache issue
• When a cached item expires, an unlucky user
has to rebuild the item in the cache
• But we have many requests every second
• That’s a lot of concurrent requests trying
to rebuild the item in cache
• And then we have an unlucky server
19. Thundering Herd
Solutions - Soft expire
• A single unlucky user rebuilds that cache
item
• And creates a lock in the cache so no
other users try to recreate it
• Lock must expire quickly in case rebuild
fails
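The soft-expire idea above can be sketched roughly as follows. This is an illustration in Python, not the code used on the site: `FakeCache`, `get_with_soft_expire`, `LOCK_TTL` and `HARD_TTL` are hypothetical names, and the in-memory class only stands in for Memcache (whose `add` command stores a key only if it is absent, which is what makes it usable as a lock).

```python
import time

LOCK_TTL = 10      # seconds; short, so a failed rebuild cannot block for long
HARD_TTL = 3600    # seconds; how long a stale copy remains available


class FakeCache:
    """In-memory stand-in for Memcache (illustrative only)."""

    def __init__(self):
        self._data = {}

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def add(self, key, value, ttl, now):
        """Store only if the key is absent, mirroring Memcache's add."""
        if self.get(key, now) is not None:
            return False
        self.set(key, value, ttl, now)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_with_soft_expire(cache, key, soft_ttl, rebuild, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key, now)              # (value, soft_deadline) or None
    if entry is not None and entry[1] > now:
        return entry[0]                      # still fresh
    if cache.add(key + ":lock", 1, LOCK_TTL, now):
        value = rebuild()                    # only the lock holder rebuilds
        cache.set(key, (value, now + soft_ttl), HARD_TTL, now)
        cache.delete(key + ":lock")
        return value
    # Another request holds the lock: serve the stale copy if there is one.
    return entry[0] if entry is not None else None
```

If `rebuild()` raises, the lock is never deleted and simply times out after `LOCK_TTL`, which is why the lock has to expire quickly.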
20. Thundering Herd
Solutions - Update the cache!!!
• The best way to prevent the thundering
herd problem is positive cache expiry
• Update the cache when things change!
• Sometimes easier said than done
22. Our next project
• Recently acquired by Future Publishing
• World’s number one cycling site
• Flat HTML website
• Editors used a text editor to hand-code
HTML and FTP to update the website
• Laborious to edit but fast to serve
23. Our next project
• CMS driven website needed
• De-skill editorial requirements
• More modern design
• Still needs to handle 4 million page views in
a day during the Tour de France
• Massive peak towards the end of a stage
25. Our next project
• It’s a news site and needs to update
instantly
• High traffic peaks during the Tour de
France or a doping story
• Couldn’t tolerate the caching issues that
TotalFilm had
• Don’t have Facebook’s hardware budget
26. How would we cache it?
• Instead of caching whole pages, cache
individual elements
• We called them “panels”
• The data in each panel relates to a model
so when that model is saved we can update
the cache of panels related to that model
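A minimal sketch of tying panels to a model, assuming a plain dictionary in place of Memcache. The names `register_panel`, `after_save` and the key format are invented for illustration and are not from the talk; in CakePHP the equivalent hook would sit in the model's afterSave callback.

```python
from collections import defaultdict

cache = {}                              # stand-in for Memcache
panels_by_model = defaultdict(list)     # model name -> [(key_fn, render_fn)]


def register_panel(model_name, key_fn, render_fn):
    """Declare that a panel depends on records of the given model."""
    panels_by_model[model_name].append((key_fn, render_fn))


def after_save(model_name, record):
    """Re-render every panel related to the model that was just saved."""
    for key_fn, render_fn in panels_by_model[model_name]:
        cache[key_fn(record)] = render_fn(record)


# Hypothetical article panel, keyed by the article's slug.
register_panel(
    "Article",
    lambda r: "panel:article:%s" % r["slug"],
    lambda r: "<h1>%s</h1>" % r["title"],
)
after_save("Article", {"slug": "lance-is-amazing", "title": "Lance is amazing"})
```

Because the panel is rewritten at save time, readers never trigger a rebuild, which is the "update the cache when things change" approach described earlier.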
28. 1000 requests per second
• At peak times (i.e. at the end of a stage)
• Not the CSS, JS and other static elements;
they are served via a CDN
• That’s serving just the HTML
29. Can we handle that with CakePHP?
• After the success of TotalFilm we wanted
to use CakePHP again
• But we were worried it wouldn’t scale to
1000 requests a second
• So we benchmarked it
30. Can we handle that with CakePHP?
• CakePHP with requestAction was very
slow
• CakePHP with elements cached was
actually pretty quick but still too slow
31. So what did we do?
• CakePHP for the CMS
• CakePHP generating the HTML separated
into panels
• CakePHP for publishing HTML panels into
Memcache
32. So what did we do?
• We used a very simple PHP script to
compile the HTML into full pages
• Works out which panels are required from
the URL, fetches those parts of the page from
Memcache and compiles the page
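The page compiler described above might look something like this sketch. It is written in Python for illustration (the real script was PHP), and the routing table, panel key names and the `panels_for` / `compile_page` functions are all assumptions rather than the site's actual scheme.

```python
import re

# Hypothetical routing table: URL pattern -> ordered panel key templates.
ROUTES = [
    (re.compile(r"^/news/view/(?P<slug>[\w-]+)$"),
     ["header", "news:view:{slug}", "footer"]),
    (re.compile(r"^/news$"),
     ["header", "news:index", "footer"]),
]


def panels_for(url):
    """Work out which panels a URL needs, or None if no route matches."""
    for pattern, templates in ROUTES:
        match = pattern.match(url)
        if match:
            return [t.format(**match.groupdict()) for t in templates]
    return None


def compile_page(url, cache):
    """Join cached panel HTML into a full page; None if anything is missing."""
    keys = panels_for(url)
    if keys is None:
        return None
    parts = [cache.get(key) for key in keys]
    if any(part is None for part in parts):
        return None          # caller falls back to regenerating the panel
    return "".join(parts)
```

The point of the design is that this path touches no framework code at all: one lookup table, a handful of cache reads and a string join.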
33. Benchmarks (average with all panels cached)
• Using CakePHP with cached elements: 0.1521 seconds
• Pagecompiler: 0.0488 seconds
• Pagecompiler with optimisation: 0.0031 seconds
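To put those averages in perspective, the short calculation below derives the relative speedups and a rough pages-per-second figure. The throughput number assumes a single process handling requests one after another with no other overhead, which is a simplification and not something measured in the talk.

```python
# Average page generation times quoted above, in seconds.
timings = {
    "CakePHP with cached elements": 0.1521,
    "Pagecompiler": 0.0488,
    "Pagecompiler with optimisation": 0.0031,
}

baseline = timings["CakePHP with cached elements"]


def speedup(name):
    """How many times faster than the cached-elements CakePHP build."""
    return baseline / timings[name]


def pages_per_second(name):
    """Rough serial throughput of one process (assumption: no other overhead)."""
    return 1 / timings[name]


for name in timings:
    print("%s: %.1fx faster, about %d pages/s per process"
          % (name, speedup(name), pages_per_second(name)))
```

On these figures the optimised Pagecompiler is roughly 49 times faster than the cached-elements build, so a 1000 requests per second peak needs only a handful of processes rather than well over a hundred.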
34. How do we update the panels?
• View panel - an article
• When article changes, panel needs to be
updated
35. Two types of panel
• Hub panel - often a listing
• Difficult to update
36. Hub panels - Need to update?
• We stored the find params for each panel
• Check beforeSave and afterSave, compare
the resulting arrays and if they’ve changed
the HTML panel needs to be regenerated
37. Model beforeSave and afterSave
• Model beforeSave: Model->find() for each panel
associated with the model
• Model afterSave: Model->find() for each panel
associated with the model (ensure the find query isn’t
cached!)
• Update the HTML panel in the cache
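The before/after comparison for hub panels could be sketched like this. The `find` function is only a toy stand-in for Model->find(), and the panel keys, parameter names and record fields are hypothetical.

```python
def find(articles, params):
    """Toy stand-in for Model->find(): filter by section, newest first, limit."""
    rows = [a for a in articles if a["section"] == params["section"]]
    rows.sort(key=lambda a: a["published"], reverse=True)
    return [(a["id"], a["title"]) for a in rows[: params["limit"]]]


def save(articles, record, hub_panels):
    """Save a record and return the hub panels whose listing changed."""
    # beforeSave: run the stored find params for every hub panel.
    before = {key: find(articles, p) for key, p in hub_panels.items()}
    # The save itself: replace any existing record with the same id.
    articles[:] = [a for a in articles if a["id"] != record["id"]] + [record]
    # afterSave: run the same finds again, uncached, and compare.
    after = {key: find(articles, p) for key, p in hub_panels.items()}
    return [key for key in hub_panels if before[key] != after[key]]


hub_panels = {
    "hub:news": {"section": "news", "limit": 5},
    "hub:features": {"section": "features", "limit": 5},
}
```

Only panels in the returned list need regenerating, so editing a field that no listing displays costs nothing on the hub pages.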
38. Lots of panels
• To avoid making the CMS really slow by
generating loads of HTML panels on every
save, we decided to use a queue
• And made a CakePHP shell to work
through the panels and publish them
39. Panelworker
• CakePHP shell
• Works through the panels and publishes
them
• Queue system prevents panels being
regenerated over and over again when
altered by multiple users
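One way to get the "not regenerated over and over" behaviour is a queue that ignores panels already waiting. The sketch below assumes that approach; `PanelQueue`, `run_worker` and `fetch_html` are illustrative names, and a dictionary stands in for Memcache.

```python
from collections import OrderedDict


class PanelQueue:
    """FIFO queue that holds each pending panel at most once."""

    def __init__(self):
        self._pending = OrderedDict()

    def push(self, panel_url):
        # A panel that is already waiting is not queued a second time.
        self._pending.setdefault(panel_url, True)

    def pop(self):
        if not self._pending:
            return None
        panel_url, _ = self._pending.popitem(last=False)
        return panel_url


def run_worker(queue, fetch_html, cache):
    """Drain the queue, publishing each panel's HTML; return the count."""
    published = 0
    while True:
        panel_url = queue.pop()
        if panel_url is None:
            return published
        cache[panel_url] = fetch_html(panel_url)
        published += 1
```

Three editors saving the same article in quick succession therefore cause one regeneration of `/news`, not three.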
41. CMS and Panelworker
• CMS: place changed panels in the panel queue
• Panel queue, for example:
/news
/news/view/lance-is-amazing
/races/view/tour-de-france
/features/view/some-new-bike
• Panelworker: take the next item from the queue,
fetch the panel HTML from the CMS and publish
the panel HTML to Memcache
• Web servers: www1, www2
42. What else?
• We cached generated pages for 60 seconds
in Memcache and served directly from
Nginx to give us even more head room
• The www servers also run Panelworkers to
enable them to get panels they don’t have
• Panels are also cached on disk for when
they fall out of Memcache
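The lookup order implied above (Memcache first, then disk, then fetching the panel afresh) can be sketched as follows. The function name, the file naming scheme and the `regenerate` callback are assumptions made for the example; a dictionary again stands in for Memcache.

```python
import os


def load_panel(key, memcache, disk_dir, regenerate):
    """Return panel HTML from Memcache, else disk, else by regenerating it."""
    html = memcache.get(key)
    if html is not None:
        return html
    # Simplistic file naming for the sketch; real keys would need escaping.
    path = os.path.join(disk_dir, key.replace("/", "_") + ".html")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            html = handle.read()
    else:
        html = regenerate(key)           # e.g. ask a Panelworker for the panel
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(html)
    memcache[key] = html                 # repopulate the faster tier
    return html
```

With the disk copy in place, a panel evicted from Memcache costs one file read instead of a trip back to the CMS.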
43. So the site launched
• 1181 forum posts complaining
• Lance Armstrong tweets saying he hates
the site
• But, it easily handled the traffic and we have
loads of head room
44. Alternatively
• If your site needs more interactivity with
users, replace HTML panels with data
• Use a similar system to update data in the
cache when things change
• A queue system similar to the Panelworker
could still be useful to keep your site
responsive
45. What we’d do differently
• Instead of storing find params use a model
method for fetching content for each panel
• Despite being fast, the website lacks
interactivity
• Use Membase instead of both Memcache
and files to store panels
46. But...
• Since we did our benchmarks over a year
ago, CakePHP 1.3 seems a lot quicker
• LazyModel by Frank de Graaf (and others)
http://bakery.cakephp.org/articles/view/optimizing-model-loading-with-lazymodel
• Although it still connects to MySQL when
there is no need :(
47. But...
• For the new build we’re implementing the
following caching system...