Tuning Search Requests - Amazon CloudSearch
1. How to Tune Search Requests
Tom Hill / tomhill@amazon.com
© 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
2. Agenda
! Query Processing
! Common Issues
! Benchmarking
! Analytics
3. …query processing
4. Query Processing
[Diagram: a query passes through four stages in order: Matching, Filtering, Ranking, Sorting]
5. Query Processing
! Matching
• Text Fields
• Literal Fields
! Filtering
• Numeric terms, ranges
! Ranking
• Score computation for each document
! Sorting
• Based on rank computation, alphabetic, numeric
! Faceting
6. Matching
! The starting point for results
! Full text matching with “text” fields
! Exact matching with “literal” fields
7. Filtering
! UINT fields
• Numbers
• Numeric ranges
! After all of these you get results:
<hits found="2432" start="0">
8. Ranking
! Score can be
• Text Relevance
• Rank expression
! Done for every document that makes it past filtering
9. Sorting
! Last step
! Again, cost is a function of match set size
! Sort by
• Text Relevance
• Rank Expression
• Field Value
10. Performance: Match Set Size
[Diagram: each added restriction shrinks the match set and increases performance: <all documents>, then cat|dog, then AND color:black, then AND age:0..6]
11. …common issues
12. Literals vs Uints
! Literals will tend to improve performance
! Uints will tend to take less space
13. Query Restriction: Literals vs. Uints
Restriction  GeoMethod  TextRel  Limits   Queries  Seconds  QTimeMS  Threads  CompletedQ  AveHits
None         NONE       false    -        10       6.2255   622      1        10          8345450.00
None         CARTESIAN  false    -        10       15.6064  1560     1        10          8345450.00
None         EQUI       false    -        10       19.7106  1971     1        10          8345450.00
None         COSINES    false    -        10       27.4968  2749     1        10          8345450.00
None         HAVERSINE  false    -        10       31.2595  3125     1        10          8345450.00
UINT         NONE       false    Numeric  10       9.1758   917      1        10          3807.00
UINT         CARTESIAN  false    Numeric  10       9.0255   902      1        10          3807.00
UINT         EQUI       false    Numeric  10       9.1158   911      1        10          3807.00
UINT         COSINES    false    Numeric  10       9.8321   983      1        10          3807.00
UINT         HAVERSINE  false    Numeric  10       9.1272   912      1        10          3807.00
Literal      NONE       false    literal  10       0.8254   82       1        10          3781.00
Literal      CARTESIAN  false    literal  10       0.5936   59       1        10          3781.00
Literal      EQUI       false    literal  10       0.6173   61       1        10          3781.00
Literal      COSINES    false    literal  10       0.5916   59       1        10          3781.00
Literal      HAVERSINE  false    literal  10       0.6289   62       1        10          3781.00
14. Negative Queries
! CloudSearch supports negative queries
• &q=-amazon
! Can match all documents
• If "doesntmatchanything" matches 0 docs, then -doesntmatchanything will match all docs.
! Matching all docs means lots of computations
• Sorting, rank expressions, etc.
15. Implicit Limits
! If the user doesn't give you restrictions, add some!
! Top items for their category
! Add implicit limits to broad queries
• &bq=important:1
• &bq=population:10000..
! select the best
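A minimal sketch of adding an implicit limit in application code, assuming the `q` and `bq` request parameters used on these slides. The class name, the `isBroad` heuristic (a single term of three characters or fewer) and the choice of `important:1` as the filter are illustrative assumptions, not CloudSearch behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ImplicitLimits {
    // Hypothetical heuristic: a single, very short term counts as a "broad" query.
    static boolean isBroad(String userQuery) {
        String q = userQuery.trim();
        return !q.contains(" ") && q.length() <= 3;
    }

    // Builds the query-string portion of a search request.
    // The implicit filter is only appended when the user query looks broad.
    static String buildQueryString(String userQuery, String implicitFilter) {
        try {
            StringBuilder sb = new StringBuilder("q=");
            sb.append(URLEncoder.encode(userQuery.trim(), "UTF-8"));
            if (isBroad(userQuery)) {
                sb.append("&bq=").append(URLEncoder.encode(implicitFilter, "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("tv", "important:1"));
        System.out.println(buildQueryString("harry potter", "important:1"));
    }
}
```

What counts as "broad" depends on your data; measuring the match set size (the `found` attribute of the response) for real queries is a better guide than query length alone.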
16. Wildcard Queries
! Expands the query terms
• a* => aardvark, aaron, … azimuth
• Limited to first 2000 terms.
• But that's still 2000 terms!
• Match set is all docs that contain any one of the terms
! Match set gets big!
[Diagram: a* expands to terms, each with its own document list]
  a      doc1 doc9 doc17 doc80 doc85 doc90 do
  aaron  doc3 doc50 doc87
  after  doc99 doc110 doc117 doc111
  apple  doc3 doc5 doc18
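The expansion above can be sketched as a union of per-term document lists. This is a toy in-memory index for illustration only, not how CloudSearch is implemented; the class, the method names and the extra `bat` term are assumptions, and the document IDs are the ones legible on the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class WildcardMatchSet {
    // Toy term -> documents index, using the IDs shown on the slide.
    static SortedMap<String, List<String>> sampleIndex() {
        SortedMap<String, List<String>> index = new TreeMap<>();
        index.put("a", Arrays.asList("doc1", "doc9", "doc17", "doc80", "doc85", "doc90"));
        index.put("aaron", Arrays.asList("doc3", "doc50", "doc87"));
        index.put("after", Arrays.asList("doc99", "doc110", "doc117", "doc111"));
        index.put("apple", Arrays.asList("doc3", "doc5", "doc18"));
        index.put("bat", Arrays.asList("doc2"));
        return index;
    }

    // Expands prefix* to at most maxTerms terms and unions their document lists.
    static Set<String> matchSet(SortedMap<String, List<String>> index, String prefix, int maxTerms) {
        Set<String> docs = new TreeSet<>();
        int expanded = 0;
        for (Map.Entry<String, List<String>> e : index.tailMap(prefix).entrySet()) {
            if (expanded == maxTerms || !e.getKey().startsWith(prefix)) {
                break; // past the term limit, or past the last term with this prefix
            }
            docs.addAll(e.getValue());
            expanded++;
        }
        return docs;
    }

    public static void main(String[] args) {
        Set<String> docs = matchSet(sampleIndex(), "a", 2000);
        System.out.println(docs.size() + " documents match a*");
    }
}
```

With this sample data, `a*` matches 15 distinct documents while the single term `aaron` matches 3, which is the growth in match set size the slide warns about.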
17. Wildcard Queries
! Stemming
• cats* doesn't match cats
• cats is stemmed to "cat", but wildcards are not stemmed
! Avoid negative wildcard queries
• -cat works fine
• but may take a while to execute
• -cat123 may confuse you. It becomes:
• -cat 123
18. Boolean Queries
! Used to restrict your searches
• Faceting (e.g. category)
• Date
• Security
! &bq=(and title:'potter' author:'rowling')
! Can improve performance
! Can slow performance
19. Boolean Queries
! Effects of additional AND
! Effects of Additional OR
[Diagram: Venn diagrams of the Color, Size, and Style match sets]
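A small sketch of why AND and OR pull in opposite directions: AND intersects match sets, OR unions them. The three sets below are made-up ranges of document IDs standing in for Color, Size and Style matches; all names and numbers are illustrative assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

class BooleanMatchSets {
    // Document IDs from 'from' (inclusive) to 'to' (exclusive).
    static Set<Integer> range(int from, int to) {
        Set<Integer> s = new TreeSet<>();
        for (int i = from; i < to; i++) {
            s.add(i);
        }
        return s;
    }

    // AND: only documents present in both sets survive.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // OR: documents present in either set are kept.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> color = range(0, 100);
        Set<Integer> size = range(50, 150);
        Set<Integer> style = range(90, 200);
        System.out.println("color AND size:           " + and(color, size).size());
        System.out.println("color AND size AND style: " + and(and(color, size), style).size());
        System.out.println("color OR size:            " + or(color, size).size());
        System.out.println("color OR size OR style:   " + or(or(color, size), style).size());
    }
}
```

Every later step (ranking, sorting) runs over whatever set comes out, so each extra AND tends to make them cheaper and each extra OR tends to make them more expensive.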
20. Rank Expressions Review
! Used to enhance search results
! Include non-text factors in scoring
• Popularity (likes/upvotes)
• Distance
• Price/Profit
&rank-pop=((0.3*popularity)/10.0)+(0.7*text_relevance)&rank=pop
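To see how the `pop` expression above blends the two signals, here is the same arithmetic written out in Java. The input values are invented for illustration; the actual ranges of `popularity` and `text_relevance` depend on your data and on the service.

```java
class PopRank {
    // Same formula as the rank expression: 30% popularity (scaled down by 10), 70% text relevance.
    static double pop(double popularity, double textRelevance) {
        return ((0.3 * popularity) / 10.0) + (0.7 * textRelevance);
    }

    public static void main(String[] args) {
        // Hypothetical documents: a popular weak match versus an unpopular strong match.
        System.out.println("popular, weak match:   " + pop(1000, 100));
        System.out.println("unpopular, good match: " + pop(10, 300));
    }
}
```

Working a few documents through by hand like this is a quick way to check that the weights give the ordering you intend before deploying the expression.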
21. Rank Expressions
! Execute once for each document in match set
• Reduce your match set with text matches, literals, uints
• Done after filters applied
[Diagram: "Faster!" versus "Slower", comparing match set sizes]
22. Rank Expressions
! Avoid queries that match all (or many) documents
• -foo
• state:CA
! Precompute static parts of rank expression
• sqrt(rating)+200*log10(doc.votes) + 20*log10(doc.sales) => "precomputed"
! Add implicit limits to broad searches
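One way to precompute the static part is to evaluate it once when the document is built and store the result in a field (named `precomputed` here, following the slide). The sketch below assumes the value is rounded to a non-negative integer so it can live in a uint field, and clamps `votes` and `sales` to at least 1 so that `log10` never sees zero; both choices are assumptions, not part of the original expression.

```java
class PrecomputedRank {
    // Evaluates sqrt(rating) + 200*log10(votes) + 20*log10(sales) once, at indexing time.
    static long precompute(double rating, long votes, long sales) {
        double value = Math.sqrt(rating)
                + 200 * Math.log10(Math.max(1, votes))   // clamp: log10(0) is -Infinity
                + 20 * Math.log10(Math.max(1, sales));
        return Math.max(0, Math.round(value));           // uint fields hold unsigned integers
    }

    public static void main(String[] args) {
        // Hypothetical document: rating 4, 1000 votes, 100 sales.
        System.out.println("precomputed = " + precompute(4.0, 1000, 100));
    }
}
```

The rank expression can then refer to the stored field instead of repeating the square root and logarithms for every document in the match set on every query.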
23. Default Search Field
! Searches all text fields by default
• Usually the right thing
• If not, remove unneeded fields from it
! Changeable via the API
! Alternative: explicitly select one field to search
• title:'harry potter'
24. Target Slow Queries
! Slow searches affect other searches
! Optimizing the slowest search requests may speed up all of your searches.
! Slow queries can produce timeouts (507s)
25. …benchmarking
26. Testing Latency
! Hard to estimate
• Depends on queries, usage patterns, data…
! Build a domain & test
• It's the cloud, spin up & down as needed!
• Use your own data
• As close to real usage as you can
• Log replay is good!
! A/B Test your changes
27. Two Approaches
! Testing
• Just run some queries
! Benchmarking
• Run enough to have some statistical validity
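"Statistical validity" in practice means summarizing many measurements rather than eyeballing a few. A minimal sketch, assuming you have collected per-query latencies in milliseconds; the class name and the choice of mean, sample standard deviation and nearest-rank percentiles are illustrative, and the sample numbers are invented.

```java
import java.util.Arrays;

class LatencyStats {
    static double mean(long[] ms) {
        double sum = 0;
        for (long v : ms) {
            sum += v;
        }
        return sum / ms.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least two measurements.
    static double stdDev(long[] ms) {
        double m = mean(ms);
        double sumSq = 0;
        for (long v : ms) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / (ms.length - 1));
    }

    // Nearest-rank percentile: the smallest value with at least p percent of samples at or below it.
    static long percentile(long[] ms, double p) {
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(1, rank) - 1];
    }

    public static void main(String[] args) {
        long[] latencies = {10, 20, 30, 40, 100}; // made-up values
        System.out.println("mean   = " + mean(latencies));
        System.out.println("stddev = " + stdDev(latencies));
        System.out.println("p50    = " + percentile(latencies, 50));
        System.out.println("p95    = " + percentile(latencies, 95));
    }
}
```

High percentiles are usually more telling than the mean for search, since a handful of slow queries is exactly what this deck is about finding.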
28. Testing Approaches
! Statistics provided
! Browser tools
• Chrome
• Firebug for Firefox
29. Benchmarking
! Apache JMeter (or similar)
• Well documented
• Well tested
• General
• This is good, and bad
! Custom Code
• Usually looks a lot like your application
• So, not as much code as you might think
• Flexible
30. Custom Code
! Multithread it for realistic results
• If it's simulating more than one client
! Make sure the client (tester) isn't the bottleneck
• Benchmark it with searches stubbed
• Avoid languages with global interpreter locks
! Personally, I use:
• Java
• Apache HttpClient
31. Custom Code Example
LinkedList<Thread> threads = new LinkedList<Thread>();
int nQ = queue.size();
long time = System.nanoTime();
List<Consumer> consumers = new LinkedList<Consumer>();
for (int i = 0; i < threadCount; i++) {
    Consumer c1 = new Consumer(queue);
    consumers.add(c1);
    Thread t = new Thread(c1);
    threads.add(t);
    t.start();
}
// Wait for signal that we have processed all queries.
for (Thread thread : threads) {
    thread.join();
}
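The slide shows only the driver loop. Below is a self-contained sketch of how the rest might look, with the search call stubbed out so the harness itself can be benchmarked. The `Consumer` shown here, the `ToIntFunction` stub and the `run` method are assumptions made for illustration; the speaker's actual `Consumer` class is not shown in the deck.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.ToIntFunction;

class BenchmarkSketch {
    // Pulls queries off a shared queue until it is empty, timing each one.
    static class Consumer implements Runnable {
        private final Queue<String> queue;
        private final ToIntFunction<String> search; // returns the hit count for a query
        final List<Long> latenciesNanos = new ArrayList<>();
        long totalHits;

        Consumer(Queue<String> queue, ToIntFunction<String> search) {
            this.queue = queue;
            this.search = search;
        }

        @Override
        public void run() {
            String query;
            while ((query = queue.poll()) != null) {
                long start = System.nanoTime();
                totalHits += search.applyAsInt(query);
                latenciesNanos.add(System.nanoTime() - start);
            }
        }
    }

    // Returns { completed queries, total hits, elapsed nanoseconds }.
    static long[] run(Collection<String> queries, int threadCount, ToIntFunction<String> search) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(queries);
        List<Thread> threads = new ArrayList<>();
        List<Consumer> consumers = new ArrayList<>();
        long time = System.nanoTime();
        for (int i = 0; i < threadCount; i++) {
            Consumer c = new Consumer(queue, search);
            consumers.add(c);
            Thread t = new Thread(c);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // after join, the consumer's fields are safe to read
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        long elapsed = System.nanoTime() - time;
        long completed = 0;
        long hits = 0;
        for (Consumer c : consumers) {
            completed += c.latenciesNanos.size();
            hits += c.totalHits;
        }
        return new long[] {completed, hits, elapsed};
    }

    public static void main(String[] args) {
        // Stubbed search: no network call, pretends each query returns one hit per character.
        long[] r = run(Arrays.asList("cat", "dog", "bird", "fish"), 2, q -> q.length());
        System.out.println(r[0] + " queries, " + r[1] + " hits, " + r[2] / 1_000_000.0 + " ms");
    }
}
```

Running it with the stub first gives a ceiling on what the tester can drive; if that ceiling is close to the numbers you see against the real service, the client is the bottleneck.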
32. Sample Data & Queries
! Sample Data – must be realistic
• BAD: a123 b456 xyzzyx
• Better: use Wikipedia, Project Gutenberg
• Best: Your own data
! Queries
• BAD: random words
• Better: read words from test data
• Best: Replay log files
• Always check your number of responses in benchmarks.
• It's easy to get fast queries, if you get no hits.
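A sketch of checking the hit count in a benchmark, assuming the XML response carries a `<hits found="...">` element like the one shown on the Filtering slide. The regular expression is a shortcut for illustration; a real harness would more likely use an XML parser, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitCount {
    private static final Pattern FOUND = Pattern.compile("<hits\\s+found=\"(\\d+)\"");

    // Returns the value of the found attribute, or -1 if the response has no <hits> element.
    static long found(String responseXml) {
        Matcher m = FOUND.matcher(responseXml);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String response = "<hits found=\"2432\" start=\"0\">";
        long hits = found(response);
        if (hits <= 0) {
            System.out.println("WARNING: query returned no hits, latency is not representative");
        } else {
            System.out.println("hits = " + hits);
        }
    }
}
```

Tracking the share of zero-hit queries alongside latency makes it obvious when a fast run is fast only because nothing matched.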
33. …analytics
34. Analytics & Metrics
! You have to know what to tune
! CloudSearch Metrics
! Custom Logging
35. CloudSearch Metrics
! Top Queries
! Zero result queries
• Lack of data?
• Or query issues?
36. Custom Logging
! Log all requests on your end
! Watch
• Longest running queries
• Failed Queries
• HTTP error codes
! Track Changes over time
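A minimal in-memory sketch of client-side request logging, assuming you can wrap every search call and record the query, HTTP status and elapsed time. The class, its fields and the sample entries are hypothetical; a production version would write to your existing logging or metrics system.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class QueryLog {
    static class Entry {
        final String query;
        final int status;
        final long millis;

        Entry(String query, int status, long millis) {
            this.query = query;
            this.status = status;
            this.millis = millis;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    synchronized void record(String query, int status, long millis) {
        entries.add(new Entry(query, status, millis));
    }

    // The n longest-running requests, slowest first.
    synchronized List<Entry> slowest(int n) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingLong((Entry e) -> e.millis).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Number of requests seen per HTTP status code.
    synchronized Map<Integer, Integer> statusCounts() {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Entry e : entries) {
            counts.merge(e.status, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        QueryLog log = new QueryLog();
        log.record("harry potter", 200, 35);   // made-up entries
        log.record("-foo", 507, 5000);
        log.record("a*", 200, 900);
        for (Entry e : log.slowest(2)) {
            System.out.println(e.millis + " ms  " + e.status + "  " + e.query);
        }
        System.out.println(log.statusCounts());
    }
}
```

Keeping the raw query text in the log is what makes log replay possible later, so it doubles as benchmark input.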
37. Warning Signs
! Know your HTTP error codes
• 500 series can be retried
• May indicate server is overloaded
• Long queries can tie up threads until timeout
• More of an issue on small servers.
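A sketch of retrying 500-series responses with a growing delay, so that retries do not pile more load onto an already overloaded server. The request is abstracted as an `IntSupplier` returning the HTTP status code; that abstraction, the doubling backoff and the parameter names are assumptions for illustration.

```java
import java.util.function.IntSupplier;

class RetryOn5xx {
    // Calls request until it returns a non-5xx status or maxAttempts is reached.
    // Returns the last status seen (-1 if maxAttempts is less than 1).
    static int callWithRetry(IntSupplier request, int maxAttempts, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        int status = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = request.getAsInt();
            if (status < 500 || status > 599) {
                return status; // success or a client error: retrying will not help
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return status;
                }
                backoff *= 2; // wait longer before each further attempt
            }
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stubbed request: fails twice with 507, then succeeds.
        int status = callWithRetry(() -> ++calls[0] < 3 ? 507 : 200, 5, 10);
        System.out.println("status " + status + " after " + calls[0] + " attempts");
    }
}
```

Log every retry as well; a rising retry rate is one of the warning signs this slide is describing.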
38. …wrap up
39. Wrap Up
! 1) Limit match set size
! 2) Limit match set size :)
! Be aware of the cost of features
• Test/Benchmark
! Resources
• Slides will be on meetup group
• http://jmeter.apache.org/
• https://getfirebug.com/
• http://en.wikipedia.org/wiki/Category:Load_testing_tools
40. Thanks!
Tom Hill
tomhill@amazon.com