1. ENHANCE YOUR APDEX..
NATURALLY!
Proven methods to enhance your
Rails performance when traffic
increases X times
Vlad ZLOTEANU
Software Engineer - Dimelo, @vladzloteanu
#ParisRB, March 6, 2012
Copyright Dimelo SA www.dimelo.com
2. Be Warned!
Surprise coming up…
at the end of this talk! ;)
3. Dimelo
Software editor, SaaS platforms, social media CRM
Frontend platforms
Collaborative platforms, ‘forum/SO-like’,
white-labeled, for big accounts (a kind of
GetSatisfaction / UserVoice for big accounts)
Backend product
SocialAPIs
kind of tweetdeck, but multiple channel,
designed for multiple users/teams
4. Technical details (frontend product)
30+ average dynamic (Rails) req/s (web + api)
Peaks of 80 req/s
2M+ dynamic requests / day
700k+ unique visitors / day
6. Demo env
Rails 3.2
REE + Passenger + Apache (3 workers)
MySQL 5.5.x + InnoDB tables
OSX Lion - MBPro 2010 8GB RAM
class Post < ActiveRecord::Base # 500K posts
  belongs_to :author
  has_and_belongs_to_many :categories
  # state -> [ moderation, published, answered ]
  …
end
class Category < ActiveRecord::Base
  has_and_belongs_to_many :posts
end
7. A. External services: timeouts [DEMO]
# EventMachine app on port 8081
operation = proc do
sleep 2 # simulate a long running request
resp.status = 200
resp.content = "Hello World!"
end
EM.defer(operation, callback)
# AggregatesController on main site (port 8080)
uri = URI('http://127.0.0.1:8081')
http = Net::HTTP.new(uri.host, uri.port)
@rss_feeds = http.get("/").body
8. A. External services: timeouts
Problem Page depends on an external resource (e.g. RSS, Twitter API, FB API, auth servers, …)
External resource responds very slowly, or the connection hangs
In Ruby, Net::HTTP’s default read timeout is 60s!
Ruby 1.8: the Timeout library is not reliable
Solution Move it to a background request
Put timeouts EVERYWHERE!
Enable timeouts on API clients
Cache parts that involve external resources
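A minimal sketch of "put timeouts everywhere" for the demo's Net::HTTP call. The helper name `fetch_with_timeouts` and the 2s/3s values are assumptions for illustration, not taken from the talk; it returns nil on failure so the caller can pick a fallback.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch a URL, giving up quickly instead of
# blocking a worker for the default 60 seconds.
def fetch_with_timeouts(url, open_timeout = 2, read_timeout = 3)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = open_timeout # seconds allowed to establish the connection
  http.read_timeout = read_timeout # seconds allowed to wait for each read
  # Newer Rubies retry idempotent requests once; disable that where supported
  # so the worst case stays close to the timeouts above.
  http.max_retries = 0 if http.respond_to?(:max_retries=)
  http.get(uri.request_uri).body
rescue Timeout::Error, SystemCallError, SocketError
  nil # the caller decides on a fallback (cached copy, empty section, ...)
end

# Usage in the controller (illustrative):
#   @rss_feeds = fetch_with_timeouts('http://127.0.0.1:8081') || ''
```

Returning nil rather than raising keeps a slow feed from taking the whole page down; whether to show a cached copy or hide the section is left to the caller.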
9. A. Internal services (2)
Problem Same conditions, but this time 2 services from the same server/application call each other
Solution Same problems and same remedies, but with the added risk of a deadlock!
10. B. DB: Queries containing ‘OR’ conditions [Demo]
# Request: also list my posts (the current user’s posts),
# even if they are not published
# Current index: on [state, created_at]
@posts.where("state = :state OR author_id = :author_id",
             {:state => 'published',
              :author_id => params[:author_id]})
11. B. DB: Queries containing ‘OR’ conditions
Problem Queries containing “OR” conditions
E.g. ‘visible_or_mine’ (state = published OR author_id = 42)
.. make an index on [a, b, c] unusable for (a OR condition) AND b AND c
Solution Don’t use them!
Cache the result
Put the index only on the sort column
For (a OR cond) AND b AND c, put the index on [b, c]
12. C. Filtering on HABTM relations [Demo]
# Request: Filter by one (or more) categories
# Controller
@posts = @posts.joins(:categories).
  where(:categories =>
    {:id => params[:having_categories]})
# OR: Create join model, use only one join
# Model
has_many :post_categorizations
has_many :categories, :through => :post_categorizations
# Controller
@posts.joins(:post_categorizations).
  where(:post_categorizations =>
    {:category_id => params[:having_categories]})
13. C. Filtering on HABTM relations
Problem Filtering on HABTM relations creates a double join
.. which is (usually) expensive
Solution Rewrite double joins
Use an intermediary model
Join on the intermediary model
14. D. DB: Pagination/count on large tables [Demo]
# Nothing fancy, just implement pagination
# Controller
@posts = @posts.paginate(
:page => params[:page]).
order('posts.created_at desc')
# View
<%= will_paginate @posts %>
15. D. DB: Pagination/count on large tables
Problem Count queries are expensive on large tables
Each time pagination is displayed, a count query is run
Displaying a distant page (i.e. using a big OFFSET) is very expensive
MyISAM: counts LOCK the TABLE!
16. D. DB: Pagination/count on large tables (2)
Solution Cache the count result
.. and don’t display the ‘last’ pages
Limit the count: instead of
SELECT COUNT(*) FROM a_table WHERE some_conditions
run
SELECT COUNT(*) FROM (SELECT 1 FROM a_table WHERE some_conditions LIMIT x) t;
Drop the isolation level: NOLOCK / READ UNCOMMITTED
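A small sketch of the "limit count" idea in plain Ruby. The helper names `capped_count_sql` and `count_label` are hypothetical, and the table name and conditions are assumed to be trusted application strings, never raw user input.

```ruby
# Hypothetical helper: wrap a filtered query in a COUNT that stops
# scanning after `cap` matching rows.
def capped_count_sql(table, where_sql, cap)
  limit = Integer(cap) # raises on anything that is not a whole number
  "SELECT COUNT(*) FROM (SELECT 1 FROM #{table} WHERE #{where_sql} LIMIT #{limit}) t"
end

# Hypothetical helper: once the cap is reached, show "1000+" instead of
# an exact total, so the 'last' pages are never offered.
def count_label(count, cap)
  count >= cap ? "#{cap}+" : count.to_s
end

puts capped_count_sql('posts', "state = 'published'", 1000)
puts count_label(1000, 1000)
```

The capped query bounds the work done per page view; the exact total is simply unknown past the cap, which is why the label switches to an open-ended form.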
17. E. Fragment caching: Thundering herd
# Let’s implement time-expired fragment caching for
# the previous page (no pagination
# optimisations were enabled)
<% cache_key = ("posts::" +
Digest::MD5.hexdigest(params.inspect))
cache cache_key, :expires_in => 20.seconds do %>
<h1>Posts#index</h1>
….
<% end %>
18. E. Fragment caching: Thundering herd
Problem Using a time-expired fragment cache
When the cache for a resource-intensive page expires, multiple processes try to recalculate the key at once
Effects: resource consumption peaks, Passenger worker pool starvation
[Figure: timeline along t, alternating "cache unavailable; cache computation" periods with "cache validity" periods]
20. E. Fragment caching: Thundering herd (3)
Backgrounded calculation/sweeping is
hard/messy/buggy
Solution
Before expiration time is reached (t - delta),
obtain a lock and trigger cache recalculation
The next processes won’t obtain the lock and
will serve the still-valid cache
Rails 2:
github.com/nel/atomic_mem_cache_store
Rails 3: Implemented.
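The solution above can be sketched as a tiny in-memory cache. This is an illustration of the idea only, not the code of atomic_mem_cache_store or of Rails: the class `EarlyRefreshCache` and its parameters are hypothetical, and the lock is local to one process. With several Passenger workers the lock would have to live in a shared store (for instance an atomic memcached `add`). Rails 3 cache stores expose a related `:race_condition_ttl` option on `fetch`, which this sketch does not use.

```ruby
# Hypothetical in-process cache: shortly before a key expires (t - delta),
# one caller takes a lock and recomputes; everyone else keeps being served
# the still-valid value.
class EarlyRefreshCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(clock = lambda { Time.now.to_f })
    @clock = clock # injectable so the behaviour can be tested
    @entries = {}
    @locks = {}
    @mutex = Mutex.new
  end

  def fetch(key, ttl, delta)
    now = @clock.call
    entry = @mutex.synchronize { @entries[key] }
    fresh = entry && now < entry.expires_at

    # Far from expiry: serve the cached value.
    return entry.value if fresh && now < entry.expires_at - delta

    got_lock = try_lock(key)
    # Inside the early-refresh window, but another caller is already
    # recomputing: keep serving the still-valid value.
    return entry.value if fresh && !got_lock

    begin
      value = yield # cold, expired, or we won the refresh lock
      @mutex.synchronize { @entries[key] = Entry.new(value, @clock.call + ttl) }
      value
    ensure
      unlock(key) if got_lock
    end
  end

  private

  def try_lock(key)
    @mutex.synchronize do
      next false if @locks[key]
      @locks[key] = true
    end
  end

  def unlock(key)
    @mutex.synchronize { @locks.delete(key) }
  end
end

cache = EarlyRefreshCache.new
puts cache.fetch('posts::index', 20, 5) { 'rendered fragment' }
```

Note that on a completely cold or already expired key this sketch still lets every caller recompute; it only removes the herd for keys that are refreshed before they expire.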
21. F. API and Web on same server
Problem API and Web don’t have complementary usage
patterns
Web slows down APIs, which should respond fast
APIs are much more prone to peaks
Worker threads starvation
Solution Put API and WEB on different servers
Log/Throttle API calls
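One possible shape for the throttling part, as a sketch: a fixed-window counter per client. The class name `FixedWindowThrottle`, the limit, and the window length are assumptions for illustration. The counters live in one process's memory, so a real multi-worker setup would need a shared store.

```ruby
# Hypothetical fixed-window throttle: at most `limit` calls per client
# in each window of `window_seconds`.
class FixedWindowThrottle
  def initialize(limit, window_seconds, clock = lambda { Time.now.to_i })
    @limit = limit
    @window = window_seconds
    @clock = clock # injectable so the behaviour can be tested
    @counters = Hash.new(0)
    @mutex = Mutex.new
  end

  # Returns true if the call is allowed, false if the client is over quota.
  def allow?(client_id)
    window_start = (@clock.call / @window).floor * @window
    @mutex.synchronize do
      # Drop counters that belong to past windows.
      @counters.delete_if { |key, _count| key[1] < window_start }
      key = [client_id, window_start]
      @counters[key] += 1
      @counters[key] <= @limit
    end
  end
end

throttle = FixedWindowThrottle.new(100, 60)
puts throttle.allow?('client-42')
```

A rejected call is the natural place to log the client and the request, which covers the "log" half of the advice.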
22. G. API: dynamic queries
Problem REST APIs usually expose a proxy to your DB
Clients can use any available combination of filter + sort
And because they can, they will.
Solution Don’t give them too many options
Use one DB per client (prepare to shard per client)
You will then be able to add custom indexes per client
Log/Throttle API calls
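"Don't give them too many options" can be sketched as a whitelist applied before any query is built. The constant and method names are hypothetical; the filter columns are borrowed from the demo's Post model and the sort options are examples only.

```ruby
# Hypothetical whitelist: only these filters and sort orders reach the DB,
# so every allowed combination can be backed by an index.
ALLOWED_FILTERS = %w[state author_id].freeze
ALLOWED_SORTS = {
  'newest' => 'posts.created_at DESC',
  'oldest' => 'posts.created_at ASC'
}.freeze

# Takes a plain params hash with string keys; unknown filters are dropped
# and an unknown sort falls back to 'newest'.
def sanitize_query(params)
  filters = params.select { |key, _value| ALLOWED_FILTERS.include?(key.to_s) }
  order = ALLOWED_SORTS.fetch(params['sort'].to_s, ALLOWED_SORTS['newest'])
  { :filters => filters, :order => order }
end

p sanitize_query('state' => 'published', 'title' => 'x', 'sort' => 'random')
```

Mapping sort names to fixed SQL fragments, instead of passing the client's string through, also keeps user input out of the ORDER BY clause.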
23. H. Ruby GC: not adapted for Web frameworks
Problem Ruby GC is not optimized for large web
frameworks
On medium Rails apps, ~50% of time can be
spent in GC
Solution Use REE or Ruby 1.9.3
Activate GC.stats & benchmark your app
Tweak GC params
trade memory for CPU
Previous conf: 40%+ speed for 20%+ memory
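A rough way to see how much time a piece of code spends in GC, assuming MRI 1.9.3 or later, where `GC::Profiler` and `GC.count` are available (REE exposes its own statistics API, not shown here). The helper name `measure_gc` and the sample workload are hypothetical.

```ruby
# Hypothetical helper: run a block and report wall time, time spent in GC
# and number of GC runs, so GC settings can be compared before and after.
def measure_gc
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  started = Time.now
  yield
  elapsed = Time.now - started
  { :elapsed => elapsed,
    :gc_time => GC::Profiler.total_time, # seconds spent in GC while profiling
    :gc_runs => GC.count - runs_before }
ensure
  GC::Profiler.disable
end

# Allocation-heavy sample workload, standing in for a request.
stats = measure_gc { 200_000.times.map { |i| "post #{i}" } }
puts stats.inspect
```

Running the same block before and after changing GC parameters gives a first indication of the memory-for-CPU trade; numbers from a real request cycle will differ from this synthetic loop.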
24. I. Other recommendations
Solution
Use MyISAM (on MySQL) unless you really need
transactions
Design your models thinking about sharding
(and shard the DB, when it becomes the
bottleneck)
Perf refactor: improve where it matters
Benchmark (before and after)
Btw.. Don’t make perf tests on OSX :P
26. The Dimelo Contest is back!
Write a Rack middleware
to
determine the URLs accessed through Rack and
compute the number of unique visitors,
in real time, aggregated over the last 5
minutes.