This talk will feature memcached, Resque, a bit of metaprogramming, a look at caching in the wild with code that fixes some common problems, and a fairly epic SQL query with some nice Postgres features you should know about.
5. 1) NIH PRESENTATION IN 4 WEEKS!
Integrate clinicaltrials.gov into our site
Search by trial type
Search by trial phase
Search by trial conditions mapped from MeSH to MedDRA
Search by trial facility locations…
• Location search…
8. WHAT IS RIGHT
PostGIS: spatial database extensions for PostgreSQL
MongoDB: built-in support for two-dimensional geospatial indexes
9. AND WHAT IS EASY
sqrt(pow(69.1 * (clinical_trial_locations.lat - 40.948073),2)
+ pow(53.0 * (clinical_trial_locations.lng - -90.36871),2)) AS distance
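The SQL expression treats the map as flat near the query point: one degree of latitude is about 69.1 miles, and the 53.0 miles per degree of longitude is roughly 69.1 × cos(40°), which fits a query point around 40.9°N. A Ruby sketch of the same arithmetic, with two made-up locations one degree away in each direction, shows why this ranks nearby sites sensibly while drifting over long distances or far from that latitude:

```ruby
# Flat-earth distance in miles, mirroring the SQL expression above.
# Constants are copied from the SQL; 53.0 is roughly 69.1 * cos(40 degrees),
# so the approximation only holds near the query point's latitude.
MILES_PER_DEG_LAT = 69.1
MILES_PER_DEG_LNG = 53.0

def approx_distance_miles(lat1, lng1, lat2, lng2)
  dy = MILES_PER_DEG_LAT * (lat2 - lat1)
  dx = MILES_PER_DEG_LNG * (lng2 - lng1)
  Math.sqrt(dx**2 + dy**2)
end

# Query point from the SQL, plus two hypothetical trial locations.
origin = [40.948073, -90.36871]
locations = {
  "one degree north" => [41.948073, -90.36871],
  "one degree east"  => [40.948073, -89.36871]
}

# Equivalent of ORDER BY distance: nearest first.
ranked = locations.keys.sort_by do |name|
  approx_distance_miles(*origin, *locations[name])
end
# ranked == ["one degree east", "one degree north"] (about 53.0 vs 69.1 miles)
```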
11. CHOOSE THE EASY!
Who knows if location is even important?
Who knows if this project is even important?
MongoDB requires dev setup, automated staging setup, production setup, and monitoring.
16. PATIENT SEARCH RANKING
Very basic search
Plus very complex ordering
Not as many great solutions in this space
N^2 similarity matrix at 100k patients: about 4 TB
And did I mention it’s N^2?
Postgres is an amazingly viable solution.
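To put the slide's numbers in perspective, the pair count and the per-pair storage they imply work out as follows (taking 1 TB = 10^12 bytes; the 400-byte figure is simply 4 TB divided by the pair count, not a measured row size). Storing only one half of the symmetric matrix halves the total but does not change the quadratic growth:

```latex
\begin{aligned}
N &= 10^{5}\ \text{patients} \quad\Rightarrow\quad N^{2} = 10^{10}\ \text{pairs} \\
\frac{4\ \text{TB}}{10^{10}\ \text{pairs}} &= \frac{4 \times 10^{12}\ \text{bytes}}{10^{10}} = 400\ \text{bytes per pair} \\
(2N)^{2} &= 4N^{2} \quad\Rightarrow\quad \text{doubling the patients quadruples the matrix}
\end{aligned}
```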
19. BUT IT’S JUST THIS SIDE OF ‘REAL-TIME’
One-second queries just don’t fly.
And oh yeah, 16 people hitting it at the same time would clobber the servers.
20. 3) A FORWARD-LOOKING TIME MACHINE
Maybe those were aberrations?
Crazy, right?
24. STEPPING BACK
Conflict
• Relational data is most easily queried relationally.
• Relational queries don’t necessarily scale and stay in the millisecond range
• Denormalized queries & special solutions scale
• But take longer to implement
• (note) This isn’t just SQL; I’m talking about anything slow
We want to experiment/fail fast
• But we don’t want…
30. WHAT WE WANT
A trivially easy way for developers to declare that some methods are not to be run without adult supervision.
A consistent framework so that ops doesn’t need to be afraid of new, sometimes expensive experiments.
32. SOLUTION SPACE
Memoization
• Brilliant
• Functional Programming Nirvana
• No cache-key shenanigans
• But also no expiry…
• There’s just one thing…
• It only works in a single request
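In Ruby, memoization is usually the `@ivar ||=` idiom. This minimal sketch (the `PatientReport` class and its `slow_score` method are hypothetical) shows both the appeal and the catch: the cached value lives on the object, so a fresh object built for the next request does the work again.

```ruby
# Hypothetical class illustrating instance-level memoization.
class PatientReport
  attr_reader :computations

  def initialize
    @computations = 0
  end

  # The expensive work runs once per object; later calls reuse @slow_score.
  # Caveat of `||=`: a nil or false result would be recomputed every time.
  def slow_score
    @slow_score ||= begin
      @computations += 1
      42 # stand-in for an expensive query
    end
  end
end

request_one = PatientReport.new
request_one.slow_score
request_one.slow_score          # served from @slow_score, no new work

request_two = PatientReport.new # the next request builds a new object...
request_two.slow_score          # ...so the work runs again
```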
33. SOLUTION SPACE
Memcached
• Great
• Simple to set up.
• Could be simpler. Handmade cache keys feel wrong.
• But it doesn’t solve our :-C problem.
• The first request still slams the server.
• So you do some cache warming thing…
• But this is a PITA again.
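The handmade-key complaint can be addressed by deriving the key from the receiver's class, the method name, and the arguments. The sketch below assumes a plain Hash in place of memcached and uses a hypothetical `cached_call` helper whose read-through behavior resembles `Rails.cache.fetch`. It also reproduces the remaining problem: the first caller still pays for the computation.

```ruby
# In-process stand-in for memcached (assumption: a Hash is enough for the demo).
FAKE_MEMCACHE = {}

# Hypothetical helper: build the key from class, method, and arguments,
# so nobody writes cache keys by hand.
def cache_key_for(klass, method_name, args)
  ([klass.name, method_name] + args.map(&:to_s)).join("/")
end

# Read-through fetch: return the cached value, or compute and store it.
def cached_call(receiver, method_name, *args)
  key = cache_key_for(receiver.class, method_name, args)
  return FAKE_MEMCACHE[key] if FAKE_MEMCACHE.key?(key)
  FAKE_MEMCACHE[key] = receiver.public_send(method_name, *args)
end

# Hypothetical expensive search.
class TrialSearch
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def near(zip)
    @calls += 1 # the expensive part; the first request still pays for it
    "trials near #{zip}"
  end
end

search = TrialSearch.new
cached_call(search, :near, 61401) # miss: runs the search
cached_call(search, :near, 61401) # hit: key "TrialSearch/near/61401"
```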
34. WHAT COULD MAKE THIS SIMPLER?
Remove one constraint.
A basic Rails.cache.fetch guarantees you a result
• But no performance guarantee
Flip that deal around.
• Guarantee performance
• Don’t guarantee a result
• It’s OK not to know the answer!
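Flipping the deal means a read never blocks on the expensive work: it returns whatever is cached (possibly `nil`, meaning "don't know yet") and queues the computation for a background worker. A minimal sketch, assuming a Hash stands in for memcached and an Array for the job queue (Resque in the talk); the `NonBlockingCache` class and its methods are hypothetical:

```ruby
# Hypothetical cache that guarantees performance, not a result.
class NonBlockingCache
  attr_reader :queue

  def initialize
    @store = {}
    @queue = [] # stand-in for a Resque queue
  end

  # A miss returns nil immediately and leaves the expensive block
  # for a worker to run later.
  def fetch(key, &compute)
    return @store[key] if @store.key?(key)
    @queue << [key, compute]
    nil
  end

  # What a background worker would do: drain the queue and fill the cache.
  def work_off
    until @queue.empty?
      key, compute = @queue.shift
      @store[key] = compute.call
    end
  end
end

cache = NonBlockingCache.new
first  = cache.fetch(:ranking) { [3, 1, 2].sort } # => nil, job queued
cache.work_off                                    # worker runs the job
second = cache.fetch(:ranking) { [3, 1, 2].sort } # => [1, 2, 3]
```

This naive version queues another job on every miss until a worker catches up, so a burst of identical requests still produces a burst of identical jobs.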
40. SEND LATER
A super easy way to just do something later, in the same context.
Most workers are really boring.
A single worker suffices for many background jobs.
Makes testing/development easier by bypassing Resque via configuration.
ActiveRecord extension. Coordinates logging/monitoring.
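The send-later idea can be sketched as a mixin that records a receiver, a method name, and arguments, then either pushes them onto a queue or, when an inline flag is set for tests and development, calls the method right away. Everything here (the `SendLater` module, its `JOBS` array and `inline` flag, the `Trial` class) is hypothetical. A Resque-backed version would presumably pass a class name and record id instead of the object, since Resque serializes job arguments as JSON.

```ruby
# Hypothetical send_later mixin; JOBS stands in for a Resque queue.
module SendLater
  JOBS = []

  class << self
    attr_accessor :inline # set to true in tests/dev to skip the queue
  end

  def send_later(method_name, *args)
    if SendLater.inline
      public_send(method_name, *args)
    else
      # A real job would carry the class name and record id, not the object.
      JOBS << [self, method_name, args]
      nil
    end
  end

  # The single, generic worker: every job is just "call this method later".
  def self.work_off
    until JOBS.empty?
      receiver, method_name, args = JOBS.shift
      receiver.public_send(method_name, *args)
    end
  end
end

# Hypothetical model using the mixin.
class Trial
  include SendLater
  attr_reader :geocoded

  def geocode!(source)
    @geocoded = source
  end
end

trial = Trial.new
trial.send_later(:geocode!, "clinicaltrials.gov") # queued, not run yet
SendLater.work_off                                # the worker runs it
```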
57. SOFT EXPIRATION
Memcached is great, but it doesn’t tell you when something expires.
Our strategy was to add a ‘soft_expiry’ timestamp.
It gets stored along with the result.
Then we recalculate if soft_expiry < now().
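Because memcached will not report how close an entry is to expiring, the soft expiry has to travel inside the cached value. A minimal sketch, assuming a Hash in place of memcached and a hypothetical `SoftExpiringCache` class with an injectable clock so the behavior can be checked: a stale entry is still returned, and its key is recorded for recalculation instead of being recomputed inline.

```ruby
# Hypothetical cache that stores each value alongside its soft expiry.
class SoftExpiringCache
  Entry = Struct.new(:value, :soft_expiry)

  attr_reader :refreshes

  def initialize(clock: -> { Time.now })
    @store = {}
    @clock = clock
    @refreshes = [] # keys a worker should recalculate
  end

  def write(key, value, soft_ttl:)
    @store[key] = Entry.new(value, @clock.call + soft_ttl)
  end

  # Returns the cached value even if stale; when soft_expiry < now,
  # records the key for recalculation.
  def read(key)
    entry = @store[key]
    return nil unless entry
    @refreshes << key if entry.soft_expiry < @clock.call
    entry.value
  end
end

now = Time.at(0)
cache = SoftExpiringCache.new(clock: -> { now })
cache.write(:ranking, [1, 2, 3], soft_ttl: 60)
fresh = cache.read(:ranking) # within 60 s: no refresh requested
now += 120                   # pretend two minutes pass
stale = cache.read(:ranking) # still [1, 2, 3], but :ranking is flagged
```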
61. PRETTY BORING
Except that it works.
Round 1: Major Pain Points
Round 2: Magic Scaling Sprinkles
Super alpha gem here:
https://github.com/kbrock/patella
Alternative: https://github.com/csquared/rack-worker
Very REST-ish, request-based.
I have an axe to grind. Forming working teams. Makeup. Engineering described as the brake. Technique: I think it relates well to what Aaron was saying about concurrency. This needs to be easy. Dispel a little metaprogramming fear.
Going to try to ramp up from problem statement to solution description, to general features of the solution, to full-blown metaprogramming analysis. There is an alpha gem, not actually in production. But I’m going to get into the metaprogramming behind it, because honestly it’s there more for display than for you to jump out and use. Moreover, I think some people are scared of this stuff, and I want to try to clear that up a bit.
Pattern identification: three examples.
Trials have centers; every center has many recruiting locations. We have a theory that distance to the center is a big deal. Differentiator.
See! They can all live together happily. I bring you a message of joy and peace.
It’s one of those Death Star queries. It’s like your poor little server is Alderaan.
Company name, but we don’t do a great job
Clear weighted factors. Jaccard similarity. Distance search without any new installs. Easily extensible.
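Jaccard similarity between two patients' attribute sets is the size of their intersection divided by the size of their union. This sketch combines it with a second factor through explicit weights. The attribute sets, the `distance_score` input, and the 0.7/0.3 weights are illustrative assumptions, not the talk's actual formula, which lived in SQL.

```ruby
require "set"

# Jaccard similarity: size of intersection / size of union.
# Defined here as 0.0 when both sets are empty.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

# Hypothetical weighted score: each factor is in [0, 1] and weights sum to 1,
# so the weighting stays easy to read and to extend with new factors.
WEIGHTS = { conditions: 0.7, distance: 0.3 }.freeze

def similarity(patient, other, distance_score)
  WEIGHTS[:conditions] * jaccard(patient, other) +
    WEIGHTS[:distance] * distance_score
end

me    = Set["diabetes", "hypertension", "asthma"]
other = Set["diabetes", "hypertension"]
jaccard(me, other)         # about 0.667 (2 shared of 3 total)
similarity(me, other, 1.0) # weighted blend of both factors
```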
Beautiful. Makes my heart sing. Postgres CTE. A partridge in a pear tree.
Third example. What kind of pattern is here?
That’s a crazy idea, you can’t do that.
The idea here is that we’re going to be able to predict the course of your disease progression based on everyone else who’s like you. It actually did work.
I’m sure this doesn’t remind you of any of the entrepreneurs you work with.
This is our ops guy. The guy with the pager.
Who here knows what memoization is?
Note the class method. Expiry. Soft expiration. Why soft? Memcache isn’t super versatile, but it’s good enough for us. No backgrounding: this lets us use the same pattern in places where we always want a result or don’t want to write an async UI.
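A class-level declaration like this can be built with `Module#prepend` and `define_method`. In this minimal sketch the macro name `cache_method`, its `soft_expiry:` option, and the Hash store are hypothetical, not patella's actual interface. It shows the no-backgrounding variant: on a miss or a stale entry it computes inline, so callers always get a result.

```ruby
# Hypothetical declaration macro: `cache_method :name, soft_expiry: seconds`.
module CacheMethod
  STORE = {} # stand-in for memcached: key => [value, soft_expiry]

  def cache_method(name, soft_expiry:)
    store = CacheMethod::STORE
    wrapper = Module.new do
      define_method(name) do |*args|
        key = [self.class.name, name, *args].join("/")
        value, expires_at = store[key]
        if expires_at.nil? || expires_at < Time.now
          value = super(*args) # call the original, uncached method
          store[key] = [value, Time.now + soft_expiry]
        end
        value
      end
    end
    prepend wrapper # the wrapper runs first and falls through via super
  end
end

# Hypothetical model using the macro.
class PatientRanking
  extend CacheMethod
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def top_matches(patient_id)
    @runs += 1
    [patient_id + 1, patient_id + 2] # stand-in for the expensive query
  end
  cache_method :top_matches, soft_expiry: 300
end

ranking = PatientRanking.new
ranking.top_matches(10) # computes and stores
ranking.top_matches(10) # served from STORE until the soft expiry passes
```

Because the key omits the instance, a new object in the next request still hits the cache, which is the gap that per-request memoization leaves open.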
This is a nice general-purpose queuing thing. ANY method in your app gets auto-backgrounding with no setup.
NEXT: we’re going to do those two. Learn about metaprogramming. Stick with me!
This is not a good slide. So we’re just going to look at it for a long, long time. Until you get it.
Step 1: we have the interface, but what does it return?
----- Meeting Notes (4/24/12 08:05) ----- Need to look up in memcache and return patella result
This is what gets called from Resque
This is not a real
Who knows what this is?
The fetch ALWAYS returns something but doesn’t always kick off a new job.
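One way to get "always returns something, but not always a new job" is to return a small result object on every call and keep an in-flight marker per key, so repeated misses share one queued job. In this sketch the `FetchResult` struct, the `DedupingFetcher` class, the in-flight Hash, and the Array queue are assumptions standing in for the patella result, a memcached marker, and Resque:

```ruby
# Hypothetical result wrapper: always returned, even before a value exists.
FetchResult = Struct.new(:value, :loading) do
  def loading?
    loading
  end
end

class DedupingFetcher
  attr_reader :queue

  def initialize
    @store = {}
    @in_flight = {} # stand-in for a memcached "job already queued" marker
    @queue = []     # stand-in for Resque
  end

  # Always returns a FetchResult; enqueues only if no job is in flight.
  def fetch(key, &compute)
    return FetchResult.new(@store[key], false) if @store.key?(key)
    unless @in_flight[key]
      @in_flight[key] = true
      @queue << [key, compute]
    end
    FetchResult.new(nil, true)
  end

  # Worker side: compute, store, and clear the in-flight marker.
  def work_off
    until @queue.empty?
      key, compute = @queue.shift
      @store[key] = compute.call
      @in_flight.delete(key)
    end
  end
end

fetcher = DedupingFetcher.new
misses = Array.new(3) { fetcher.fetch(:ranking) { :expensive_answer } }
queued = fetcher.queue.size # several simultaneous misses, one queued job
fetcher.work_off
answer = fetcher.fetch(:ranking) { :expensive_answer }
```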