High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ahead: Presented by Darren Spehr, MapQuest
2. High Performance Solr and JVM
Tuning Strategies used for
MapQuest’s Search Ahead
Darren Spehr – System Architect
3. MapQuest
Going strong …
since 1967
• Maps
• Directions
• Routing
• Geocoding
• Mobile
• B2B
4. Every Adventure has a Beginning
Our mobile client needs an overhaul …
Oh, and we need an auto-correct feature
… well, auto-complete
… actually, search ahead
5. … Top Secret Meeting Minutes
• How do we use auto-complete today?
• What are we searching over?
• How fast can a person type?
• What are we going to say in response?
• When do we have to launch this?
6. Characteristics
• Searches march from left to right
• Expect the first term to be highly relevant
• Term order and proximity are clues
• Spaces are now really important
• Expect mixed query types
• Abbreviations and misspellings are common
• People can type really fast (but generally less than 10 keystrokes per sec)
• Users frequently want to browse
8. Methodology
Some optimizations can be planned
Others need to be discovered
Test alternatives – optimize low hanging fruit early
Finally: Take it to task
10. The Data:
Categories
Franchises
Locations
• Neighborhoods to Countries
Points of Interest
• Airports
• Businesses
• Landmarks
Addresses
• Individual
• Block (Interpolated)
In all – over 10 Billion unique documents
11. Architecture
[Diagram labels: Mobile Client, Mobile App; API (API-East, API-West); Solr Clusters (Targeted, Location, Business, Address 1, Address 2)]
• Targeted: 4 VMs, 1 shard, 283,000 docs, frequent low-volume updates
• Location: 3 VMs, 1 shard, 4.3 million docs, frequent low-volume updates
• Business: 5 VMs, 1 shard, 13.4 million docs, heavy updates
• Address: 30 VMs, 10 shards, 100 million docs, no updates
• Interpolated Address: 30 VMs, 10 shards, 10 billion docs*, no updates
12. Special Cases
Business data
Ø Complex synonyms
Ø Stemming needs
Ø The memory factor
Ø Complex query patterns
Addresses
Ø So many!
Ø Nested structure
Ø Interpolated positions
Ø Updates an issue
Airports
Ø Airport codes
Ø International issues
Locations
Ø International issues
Ø Relevance
13. Move Analysis to the ETL
A typical job includes:
• Basic text processing / cleansing
• Stemming
• Synonyms and substitution
• Cloning
• Filtering
• Various permutations
• Regionalization
• Pre-calculating relevance
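A minimal sketch of what pushing analysis into the ETL could look like: the text is cleansed, synonyms are substituted, and token-order permutations are emitted so that plain string fields can be matched with prefix queries. The synonym table, the function names, and the permutation cap are all hypothetical; the slides do not describe the actual job.

```python
import itertools
import re

# Hypothetical synonym table; a real job would load a much larger one.
SYNONYMS = {"st": "street", "ave": "avenue"}

def cleanse(text):
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()

def expand(text, max_permuted_tokens=3):
    """Return the index-time variants for one source value.

    Short values get every token ordering; longer ones are left in their
    original order to keep the number of variants bounded (an assumption).
    """
    tokens = [SYNONYMS.get(token, token) for token in cleanse(text)]
    if len(tokens) > max_permuted_tokens:
        return [" ".join(tokens)]
    return [" ".join(order) for order in itertools.permutations(tokens)]
```

Calling `expand("Main St.")` yields both token orderings of the normalized value, each of which could be indexed as its own string.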
14. Custom Doc Routing
Address data won’t fit in memory or perform well …
Both collections are sharded so the size on disk is around 6-8 GB
Initial, naïve balancing wasn’t nearly good enough
Optimization problem that accounts for:
- Size on disk
- Predicted query volumes
- FST load (entropy)
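The balancing problem above can be pictured with a simple greedy heuristic: score each routing unit on the three factors, then repeatedly place the most expensive unit on the currently cheapest shard. This is an illustrative stand-in, not the optimizer MapQuest used; the `Partition` fields, the weights, and the cost formula are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    """A hypothetical routing unit, e.g. all addresses sharing a route key."""
    key: str
    size_gb: float       # size on disk
    query_share: float   # predicted fraction of query volume (0..1)
    fst_load: float      # relative FST cost (0..1), standing in for "entropy"

@dataclass
class Shard:
    name: str
    partitions: list = field(default_factory=list)
    cost: float = 0.0

def partition_cost(p, total_gb, w_size=1.0, w_query=1.0, w_fst=1.0):
    """Weighted sum of the three factors; the weights are assumptions."""
    return w_size * p.size_gb / total_gb + w_query * p.query_share + w_fst * p.fst_load

def balance(partitions, shard_count):
    """Greedy assignment: most expensive partition goes to the cheapest shard."""
    total_gb = sum(p.size_gb for p in partitions) or 1.0
    shards = [Shard(f"shard{i + 1}") for i in range(shard_count)]
    ranked = sorted(partitions, key=lambda p: partition_cost(p, total_gb), reverse=True)
    for p in ranked:
        target = min(shards, key=lambda s: s.cost)
        target.partitions.append(p)
        target.cost += partition_cost(p, total_gb)
    return shards
```

A greedy pass like this is only a starting point; a real solution would likely add constraints such as a hard cap on per-shard disk size.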
15. Setting Up the Indexes
Clean up schema.xml and solrconfig.xml
Exact and Fuzzy queries tested – String fields WIN!
(Thank you FST and prefix queries!)
Geo-sensitivity made easy using Spatial4J
(Thank you David Smiley!)
Optimization required
No NRT functionality needed
16. Query-Time Considerations
Jetty
- <New class="java.util.concurrent.ArrayBlockingQueue">
- Limit thread pool based on projected need
Filters used judiciously
Pull in a single field from the indexes for display.
Shard/route aware clients used for Addresses
Estimate caching needs
17. The API has to be Fast Too
Pool as many resources as makes sense
A Note on connection pools:
- The DefaultHttpClient avoids key registry overhead
- Ask for keep-alive support
- Balance pool according to use
Thread level caching used to avoid ClassLoader overhead
Take out some insurance with TTLs
[Diagram labels: Solr Query, HttpClient, Executors]
18. Keep Queries Simple
Federate a larger number of queries
Break queries out by type and expectation
Use custom search handlers to move the
burden of “tough” queries to Solr
Special case:
Ø Interpolated Addresses
Ø Business Names
Collection          Query Count
Category            3
Franchise           3
Airport             1
Critical Address    1
Locations           4
Businesses          3
Addresses (both)    2 each
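Federating many simple queries can be sketched as a fan-out over a thread pool that gathers whatever completes within a time budget. The per-collection counts come from the table above; the `run_query` callable, the collection keys, and the timeout value are hypothetical, and the slides do not show how the API actually issues its requests.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Query counts per collection, taken from the slide; key names are assumed.
QUERY_PLAN = {
    "category": 3, "franchise": 3, "airport": 1, "critical_address": 1,
    "location": 4, "business": 3, "address": 2, "interpolated_address": 2,
}

def federate(term, run_query, plan=QUERY_PLAN, timeout_s=0.25):
    """Run every planned query concurrently and keep the ones that finish in time.

    run_query(collection, variant, term) is a caller-supplied function that
    performs one simple query (hypothetical signature).
    """
    pool = ThreadPoolExecutor(max_workers=sum(plan.values()))
    futures = [
        (collection, pool.submit(run_query, collection, variant, term))
        for collection, count in plan.items()
        for variant in range(count)
    ]
    done, _ = wait([future for _, future in futures], timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block the response on slow queries
    results = {collection: [] for collection in plan}
    for collection, future in futures:
        if future in done and future.exception() is None:
            results[collection].append(future.result())
    return results
```

Dropping late or failed queries keeps one slow collection from delaying the whole response, which matters when the user is still typing.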
19. At this point the service is up and running …
but the fun has only begun
20. Getting Ready to Test
Choose your tool set …
Ø Test Suite (JMeter)
Ø Application Monitoring (VisualVM)
Ø GC Monitoring (VisualGC)
Ø On Host tools (top, pidstat)
Ø Runtime exposure (JMX, jsvc)
Ø Offline analysis (JMeter, GCHisto)
21. Set Boundary Conditions
Production Query Volume
• What is the expected peak QPS?
• Estimate 50th, 75th percentiles
Know what success looks like:
• What availability are you looking for?
• What about latency?
• Caching success?
Know what failure looks like:
• When do you consider a machine maxed out?
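As a rough illustration of the volume estimates, the sketch below derives the 50th and 75th percentiles and the peak from a list of per-interval QPS samples using the nearest-rank method. The function names and the sample format are assumptions; the slides do not say how the figures were obtained.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def volume_summary(qps_samples):
    """Summarize observed query volume for use as load-test targets."""
    return {
        "p50": percentile(qps_samples, 50),
        "p75": percentile(qps_samples, 75),
        "peak": max(qps_samples),
    }
```

The same `percentile` helper can be pointed at latency samples when deciding what success and failure look like.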
23. Memory Settings
Max Heap = anticipated index size in memory + delta for new gen
Min Heap = Max Heap to limit HotSpot optimizations
-Xmx = -Xms
Sizing the new generation (-Xmn):
Ø Start with around 1/3 of your heap size
Ø Set the Survivor space (-XX:SurvivorRatio=15)
Determine the Eden Space:
eden = -Xmn - 2 * ( -Xmn / (15 + 2) )
24. Example
7 GB Index + 3 GB for new generation:
-Xms10G
-Xmx10G
-Xmn3G
-XX:SurvivorRatio=15
-XX:PermSize=64m
-XX:MaxPermSize=64m
Survivor Size: 3 GB / (15 + 2) ≈ 181 MB
Eden Space: 3 GB – 2 * 181 MB ≈ 2.65 GB
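The sizing arithmetic can be checked with a short calculation. HotSpot defines SurvivorRatio as the ratio of eden to one survivor space, so with two survivor spaces each one is the new generation divided by (SurvivorRatio + 2). The helper names below are made up for illustration, and the flag list covers only the heap and new generation settings from the example.

```python
def young_gen_layout(xmn_mb, survivor_ratio):
    """Return (eden_mb, survivor_mb) for a given -Xmn and -XX:SurvivorRatio."""
    survivor_mb = xmn_mb / (survivor_ratio + 2)  # two survivor spaces plus eden
    eden_mb = xmn_mb - 2 * survivor_mb
    return eden_mb, survivor_mb

def heap_flags(index_gb, new_gen_gb, survivor_ratio=15):
    """Build JVM flags using max heap = index size + new generation."""
    heap_gb = index_gb + new_gen_gb
    return [
        f"-Xms{heap_gb}G",
        f"-Xmx{heap_gb}G",
        f"-Xmn{new_gen_gb}G",
        f"-XX:SurvivorRatio={survivor_ratio}",
    ]

eden_mb, survivor_mb = young_gen_layout(3 * 1024, 15)
print(heap_flags(7, 3))
print(f"survivor: {survivor_mb:.0f} MB each, eden: {eden_mb:.0f} MB")
```

For the 3 GB new generation this gives survivor spaces of roughly 181 MB each and an eden of roughly 2,711 MB.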
28. Test Cycle
[Diagram: Monitor, Record, Evaluate cycle]
Monitor: JVM, Page Faults, CPU, GC Rates, Threading, Context Switches, Locks, Swapping, Network Traffic
Record: Availability, Throughput, Latency, Thread Count
Evaluate: Have I met my exit conditions? If not, add more threads and repeat
29. Monitoring the JVM
Watch your application come to life!
Memory Steady States:
• Old Generation: ⅓ to ¼ the size of your settings
• Permanent Generation: ½ its size
Tenure histogram sizes should drop off
… this is your ideal level
[Chart: Tenure Size for tenures 1 through 5; y-axis from 0 to 14000]
30. Monitoring Solr Caches
The UI is a wealth of information!
Cache Strategy
Ø Size
Ø Type
Look at the hit and eviction statistics
Use “binary sizing” to walk the sizes up
until there are diminishing returns
<filterCache class="solr.LRUCache"
size="8384"
initialSize="8384"
autowarmCount="0"/>
<documentCache class="solr.LRUCache"
size="8384"
initialSize="8384"
autowarmCount="0"/>
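One reading of "binary sizing" is to double the cache size on each round and stop once the hit ratio stops improving meaningfully. The sketch below follows that reading, which is an assumption, and uses a small LRU simulation over a recorded key trace as the measurement. In practice the hit ratio would come from the Solr admin statistics after a load test; the function names, the starting size, and the 1% gain threshold are hypothetical.

```python
from collections import OrderedDict

def lru_hit_ratio(trace, size):
    """Replay a sequence of cache keys through an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(trace) if trace else 0.0

def walk_cache_size(measure, start=512, limit=65536, min_gain=0.01):
    """Double the size while each doubling improves the hit ratio by min_gain."""
    size, ratio = start, measure(start)
    while size * 2 <= limit:
        next_ratio = measure(size * 2)
        if next_ratio - ratio < min_gain:
            break                          # diminishing returns: keep current size
        size, ratio = size * 2, next_ratio
    return size, ratio
```

A call such as `walk_cache_size(lambda n: lru_hit_ratio(trace, n))` returns the last size that still paid for itself, along with its hit ratio.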
31. JVM Tuning Strategies
Smaller eden spaces result in:
Ø more frequent minor GCs
Ø a higher probability of premature promotion
Ø the best performance
Watch out for too much eager promotion and lengthening major GCs
Mitigate major GC STW pauses by:
Ø Keeping the old generation as small as possible
Ø Maybe even a little smaller
Ø Turning off swapping
Ø Considering explicit GC
33. Planning for the Future
What we used to do predictive expansion:
1) Target max VM capacity
2) Matching QPS
3) Breakdown of traffic load
4) Scaling factor
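The four inputs can be combined into a rough VM estimate per cluster. How they were actually combined is not stated in the slides; the formula below, the parameter names, and the example numbers are assumptions meant only to show the shape of the calculation.

```python
import math

def vms_needed(projected_peak_qps, max_qps_per_vm, traffic_share, scaling_factor=1.0):
    """Estimate VMs for one cluster.

    projected_peak_qps: total peak QPS expected across the service
    max_qps_per_vm:     measured QPS at which one VM counts as maxed out
    traffic_share:      fraction of total traffic this cluster receives (0..1)
    scaling_factor:     growth or safety multiplier (assumed meaning)
    """
    cluster_qps = projected_peak_qps * traffic_share * scaling_factor
    return max(1, math.ceil(cluster_qps / max_qps_per_vm))

# Hypothetical numbers: 1,000 QPS peak, 150 QPS per VM, half the traffic, 1.5x growth.
print(vms_needed(1000, 150, 0.5, 1.5))
```

Rounding up and never returning fewer than one VM keeps the estimate conservative for small clusters.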
34. Conclusions
7 Habits of Highly Effective Tuners
1. Know where you’re going
2. Know where you’re starting from
3. Test incrementally
4. Monitor with intent
5. Make small changes
6. Know when to stop
7. Plan ahead
36. Resources
VisualVM
VisualGC
GCHisto
Java Performance – Hunt and John
The Garbage Collection Handbook – Jones, Hosking and Moss
Solr In Action – Grainger and Potter