2. Elasticsearch and me
● At Infoscience, helped build a log management product based on ES + Hadoop
● At M3, ES evangelist (??)
○ Maintain ES cluster
○ Help dev teams integrate ES into their apps
Twitter: @cbirchall
Github: https://github.com/cb372
3. Search at M3
● Using ES for all new services
○ Search, recommendation (MoreLikeThis)
● Slowly migrating other services from Solr
● A few legacy services use Lucene directly
● Running all indices on one ES cluster
● Kuromoji for Japanese content
4. Debugging
Mostly debugging of queries
● “Why doesn’t doc X match query Y?”
● “Why does this search return no results?”
Operational issues are very rare
● ES’s clustering magic is surprisingly stable!
● No performance issues so far
5. Debugging - Step 1
Check for typos!
ES will silently ignore many typos in settings/mapping definitions
7. Typo - Example (cont’d)
Response from ES:
{"ok":true,"acknowledged":true}
OK, seems fine...
8. Typo - Example (cont’d)
Now check the mappings...
$ curl 'localhost:9200/myapp/_mappings?pretty'
Response from ES:
{
  "myapp" : { }
}
Eh?
Where are my lovingly-crafted mappings?!
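The manual check above can be scripted. A minimal Python sketch, assuming the get-mapping response has the shape shown on the slide (index name at the top level, type names beneath it). The function name missing_types and the type name article are hypothetical and do not come from the slides:

```python
import json

def missing_types(index, expected_types, response_body):
    """Return the expected type names that are absent from a get-mapping response.

    Assumes the response shape shown on the slide: {"<index>": {"<type>": {...}}}.
    """
    mappings = json.loads(response_body).get(index, {})
    return sorted(set(expected_types) - set(mappings))

# Response copied from the slide: the index exists but holds no mappings.
response = '{ "myapp" : { } }'

# "article" is a hypothetical type name used only for illustration.
print(missing_types("myapp", ["article"], response))  # prints ['article']
```

Running a check like this right after creating an index would surface a silently ignored definition immediately, instead of later when a query fails to match.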
15. Why disable Kuromoji?
Problem: occasionally weird tokenization
Phrase | Terms
特定医療法人財団 日本会 東日本病院 (document field) | 特定、医療、法人、財団、日本、会、東日本、病院
東日本 (query) | 東日、東日本、本
東日本病院 (query) | 東、東日本、日本、病院
● AND query will fail, because not all terms match
● OR query will match any document with 病院
→ low precision
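The AND/OR behaviour can be reproduced with plain set membership. A rough Python sketch, using the terms from the slide as-is. The helper names matches_and and matches_or and the extra document other_doc are assumptions for illustration, and this models only term matching, not ES scoring:

```python
# Terms copied from the slide's table.
doc_terms = {"特定", "医療", "法人", "財団", "日本", "会", "東日本", "病院"}
query_terms = {
    "東日本": ["東日", "東日本", "本"],
    "東日本病院": ["東", "東日本", "日本", "病院"],
}

def matches_and(terms, doc):
    """AND semantics: every query term must appear in the document."""
    return all(t in doc for t in terms)

def matches_or(terms, doc):
    """OR semantics: a single shared term is enough."""
    return any(t in doc for t in terms)

# Hypothetical unrelated document that merely contains 病院.
other_doc = {"病院"}

for phrase, terms in query_terms.items():
    print(phrase, "AND:", matches_and(terms, doc_terms), "OR:", matches_or(terms, doc_terms))
print("unrelated doc, OR:", matches_or(query_terms["東日本病院"], other_doc))
```

Both queries fail under AND because of the stray terms (東日 and 本 for the first, 東 for the second), while under OR the second query also matches the unrelated document through 病院 alone.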
16. Useful plugin - Head
$ bin/plugin -install mobz/elasticsearch-head
http://mobz.github.io/elasticsearch-head/
17. Testing
Main goal: Ensure that queries return the results that we expect
● Test coverage of representative queries
○ Freedom to tune for a given query without breaking other queries
Ideally, tests should:
● Run fast
● Run standalone (i.e. no need to have an ES server running)
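One way to picture "test coverage of representative queries" is a table of queries paired with expected results, run against whatever search function is under test. The Python sketch below is only an illustration: DOCS and search are a toy in-memory stand-in, not ES, and all names are hypothetical:

```python
# Toy corpus and search function: stand-ins for a real index, purely illustrative.
DOCS = {1: "tokyo east hospital", 2: "osaka clinic", 3: "tokyo dental clinic"}

def search(query):
    """Return ids of documents containing every whitespace-separated query term."""
    terms = query.split()
    return [doc_id for doc_id, text in sorted(DOCS.items())
            if all(t in text.split() for t in terms)]

# Representative queries paired with the results we expect.
EXPECTATIONS = [
    ("tokyo", [1, 3]),
    ("clinic", [2, 3]),
    ("tokyo clinic", [3]),
    ("kyoto", []),
]

def failing_queries(search_fn):
    """Return (query, expected, actual) for every expectation that does not hold."""
    failures = []
    for query, expected in EXPECTATIONS:
        actual = search_fn(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures

print(failing_queries(search))  # prints []
```

Because the expectations live in one table, tuning the search for one query and rerunning the whole table shows at once whether any other query broke.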
18. Testing - Java
elasticsearch-test is awesome
● DSL to set up/tear down ES
● Annotations + JUnit runner
● ES runs in-process
○ No need to start an external ES server
● Index is stored in-memory
○ Runs quickly
https://github.com/tlrx/elasticsearch-test