SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
Debugging and
Testing ES Systems
Chris Birchall
2013/8/29
Elasticsearch 勉強会 第1回
#elasticsearchjp
Elasticsearch and me
● At Infoscience, helped build a log
management product based on ES +
Hadoop
● At M3, ES evangelist (??)
○ Maintain ES cluster
○ Help dev teams integrate ES into their apps
Twitter: @cbirchall
Github: https://github.com/cb372
Search at M3
● Using ES for all new services
○ Search, recommendation (MoreLikeThis)
● Slowly migrating other services from Solr
● A few legacy services use Lucene directly
● Running all indices on one ES cluster
● Kuromoji for Japanese content
Debugging
Mostly debugging of queries
● “Why doesn’t doc X match query Y?”
● “Why does this search return no results?”
Operational issues are very rare
● ES’s clustering magic is surprisingly
stable!
● No performance issues so far
Debugging - Step 1
Check for typos!
ES will silently ignore many typos in
settings/mapping definitions
Typo - Example
$ curl -X PUT localhost:9200/myapp -d '{
"settings": {
"number_of_shards": 3
},
"mapping" : {
"article" : {
"_source": { "enabled": false },
"properties": {
"title": { "type": "string", "store": "true" },
"body": { "type": "string", "store": "true" },
...
}
},
...
}'
Let’s create a new index...
Typo - Example (cont’d)
{"ok":true,"acknowledged":true}
Response from ES:
OK, seems fine...
Typo - Example (cont’d)
$ curl localhost:9200/myapp/_mappings?pretty
Response from ES:
{
"myapp" : { }
}
Eh?
Where are my lovingly-crafted mappings?!
Now check the mappings...
Typo - Example (cont’d)
$ curl -X PUT localhost:9200/myapp -d '{
"settings": {
"number_of_shards": 3
},
"mappings" : {
"article" : {
"_source": { "enabled": false },
"properties": {
"title": { "type": "string", "store": "true" },
"body": { "type": "string", "store": "true" },
...
}
},
...
}'
Oops!
Debugging - Step 2
Set up a local environment
● Makes it easy to wipe & rebuild index
Setting up a local env (OSX)
# Install
$ brew install elasticsearch
# Kuromoji plugin (optional)
$ /usr/local/opt/elasticsearch/bin/plugin -install
elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
# Start
$ elasticsearch
# Create index
$ curl -X PUT localhost:9200/my_app -d '{ ... }'
# Insert some documents
$ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }'
$ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }'
# Done!
Useful commands - Analyze
$ curl 'localhost:9200/myindex/_analyze?pretty' /
-d '東京特許許可局許可局長'
{
"tokens" : [ {
"token" : "東京",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "特許",
"start_offset" : 2,
"end_offset" : 4,
"type" : "word",
...
How is my
document/query
being
tokenized?
Useful commands - Explain
$ curl 'localhost:9200/kuro/docs/123/_explain?pretty' /
-d '{ "query": { "term": { "body": "東京" } } }'
{
...
"matched" : true,
"explanation" : {
"value" : 0.375,
"description" : "weight(body:東京 in 0)
[PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.375,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
...
Why does this
document (not)
match this
query?
Specify document ID
Tuning queries
Parameters to tweak
● default_operator (AND/OR)
● auto_generate_phrase_queries
● minumum_should_match
● Stop words/tags
● Kuromoji
○ Segmentation mode
○ Reading form filter
○ Disable Kuromoji! (for some fields)
Why disable Kuromoji?
Problem: occasionally weird tokenization
● AND query will fail, because not all terms match
● OR query will match any document with 病院
→ low precision
Phrase Terms
特定医療法人財団 日本会 東日本病院
(document field)
特定、医療、法人、財団、
日本、会、東日本、病院
東日本 (query) 東日、東日本、本
東日本病院 (query) 東、東日本、日本、病院
Useful plugin - Head
$ bin/plugin -install mobz/elasticsearch-head
http://mobz.github.io/elasticsearch-head/
Testing
Main goal: Ensure that queries return the
results that we expect
● Test coverage of representative queries
○ Freedom to tune for a given query without
breaking other queries
Ideally, tests should:
● Run fast
● Run standalone (i.e. no need to have an
ES server running)
Testing - Java
elasticsearch-test is awesome
● DSL to set up/tear down ES
● Annotations + JUnit runner
● ES runs in-process
○ No need to start an external ES server
● Index is stored in-memory
○ Runs quickly
https://github.com/tlrx/elasticsearch-test
https://github.com/cb372/elasticsearch-test-example
Testing - Java
Simple elasticsearch-test example
Testing - Ruby
Simple Rails + Tire + RSpec example
https://github.com/cb372/elasticsearch-rspec-example
We’re hiring!
TODO We are hiring slide
http://bit.ly/m3jobs

Weitere ähnliche Inhalte

Was ist angesagt?

Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logsSmartLogic
 
ニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみようニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみようgenta kaneyama
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...Sematext Group, Inc.
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용종민 김
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELKYuHsuan Chen
 
Application Logging With The ELK Stack
Application Logging With The ELK StackApplication Logging With The ELK Stack
Application Logging With The ELK Stackbenwaine
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic searchmarkstory
 
Real-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet ElasticsearchReal-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet ElasticsearchAlexei Gorobets
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practiceJano Suchal
 
아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문NAVER D2
 
Elastic search 클러스터관리
Elastic search 클러스터관리Elastic search 클러스터관리
Elastic search 클러스터관리HyeonSeok Choi
 
Dcm#8 elastic search
Dcm#8  elastic searchDcm#8  elastic search
Dcm#8 elastic searchIvan Wallarm
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013Roy Russo
 
PySpark with Juypter
PySpark with JuypterPySpark with Juypter
PySpark with JuypterLi Ming Tsai
 

Was ist angesagt? (20)

Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs
 
ニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみようニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみよう
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
Application Logging With The ELK Stack
Application Logging With The ELK StackApplication Logging With The ELK Stack
Application Logging With The ELK Stack
 
LogStash in action
LogStash in actionLogStash in action
LogStash in action
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
 
Real-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet ElasticsearchReal-time search in Drupal. Meet Elasticsearch
Real-time search in Drupal. Meet Elasticsearch
 
Nodejs - A quick tour (v6)
Nodejs - A quick tour (v6)Nodejs - A quick tour (v6)
Nodejs - A quick tour (v6)
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practice
 
아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문
 
Elastic search 클러스터관리
Elastic search 클러스터관리Elastic search 클러스터관리
Elastic search 클러스터관리
 
Dcm#8 elastic search
Dcm#8  elastic searchDcm#8  elastic search
Dcm#8 elastic search
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
 
PySpark with Juypter
PySpark with JuypterPySpark with Juypter
PySpark with Juypter
 

Andere mochten auch

elasticsearchプラグイン入門
elasticsearchプラグイン入門elasticsearchプラグイン入門
elasticsearchプラグイン入門Shinsuke Sugaya
 
Elasticsearch入門 pyfes 201207
Elasticsearch入門 pyfes 201207Elasticsearch入門 pyfes 201207
Elasticsearch入門 pyfes 201207Jun Ohtani
 
Elasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライドElasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライド崇介 藤井
 
リクルート流Elasticsearchの使い方
リクルート流Elasticsearchの使い方リクルート流Elasticsearchの使い方
リクルート流Elasticsearchの使い方Recruit Technologies
 
Elasticsearchのサジェスト機能を使った話
Elasticsearchのサジェスト機能を使った話Elasticsearchのサジェスト機能を使った話
Elasticsearchのサジェスト機能を使った話ktaro_w
 
Elasticsearchで作る形態素解析サーバ
Elasticsearchで作る形態素解析サーバElasticsearchで作る形態素解析サーバ
Elasticsearchで作る形態素解析サーバShinsuke Sugaya
 
Elasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみたElasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみたRyoji Kurosawa
 
Amebaにおけるログ解析基盤Patriotの活用事例
Amebaにおけるログ解析基盤Patriotの活用事例Amebaにおけるログ解析基盤Patriotの活用事例
Amebaにおけるログ解析基盤Patriotの活用事例cyberagent
 
Flumeを活用したAmebaにおける大規模ログ収集システム
Flumeを活用したAmebaにおける大規模ログ収集システムFlumeを活用したAmebaにおける大規模ログ収集システム
Flumeを活用したAmebaにおける大規模ログ収集システムSatoshi Iijima
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAmazon Web Services
 
Elasticsearch+nodejs+dynamodbで作る全社システム基盤
Elasticsearch+nodejs+dynamodbで作る全社システム基盤Elasticsearch+nodejs+dynamodbで作る全社システム基盤
Elasticsearch+nodejs+dynamodbで作る全社システム基盤Recruit Technologies
 
[Black Belt Online Seminar] AWS上でのログ管理
[Black Belt Online Seminar] AWS上でのログ管理[Black Belt Online Seminar] AWS上でのログ管理
[Black Belt Online Seminar] AWS上でのログ管理Amazon Web Services Japan
 
Fluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターンFluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターンKentaro Yoshida
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
 

Andere mochten auch (15)

elasticsearchプラグイン入門
elasticsearchプラグイン入門elasticsearchプラグイン入門
elasticsearchプラグイン入門
 
Elasticsearch入門 pyfes 201207
Elasticsearch入門 pyfes 201207Elasticsearch入門 pyfes 201207
Elasticsearch入門 pyfes 201207
 
Logstash
LogstashLogstash
Logstash
 
Elasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライドElasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライド
 
リクルート流Elasticsearchの使い方
リクルート流Elasticsearchの使い方リクルート流Elasticsearchの使い方
リクルート流Elasticsearchの使い方
 
Elasticsearchのサジェスト機能を使った話
Elasticsearchのサジェスト機能を使った話Elasticsearchのサジェスト機能を使った話
Elasticsearchのサジェスト機能を使った話
 
Elasticsearchで作る形態素解析サーバ
Elasticsearchで作る形態素解析サーバElasticsearchで作る形態素解析サーバ
Elasticsearchで作る形態素解析サーバ
 
Elasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみたElasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみた
 
Amebaにおけるログ解析基盤Patriotの活用事例
Amebaにおけるログ解析基盤Patriotの活用事例Amebaにおけるログ解析基盤Patriotの活用事例
Amebaにおけるログ解析基盤Patriotの活用事例
 
Flumeを活用したAmebaにおける大規模ログ収集システム
Flumeを活用したAmebaにおける大規模ログ収集システムFlumeを活用したAmebaにおける大規模ログ収集システム
Flumeを活用したAmebaにおける大規模ログ収集システム
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
 
Elasticsearch+nodejs+dynamodbで作る全社システム基盤
Elasticsearch+nodejs+dynamodbで作る全社システム基盤Elasticsearch+nodejs+dynamodbで作る全社システム基盤
Elasticsearch+nodejs+dynamodbで作る全社システム基盤
 
[Black Belt Online Seminar] AWS上でのログ管理
[Black Belt Online Seminar] AWS上でのログ管理[Black Belt Online Seminar] AWS上でのログ管理
[Black Belt Online Seminar] AWS上でのログ管理
 
Fluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターンFluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターン
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 

Ähnlich wie Debugging and Testing ES Systems

2 Years of Real World FP at REA
2 Years of Real World FP at REA2 Years of Real World FP at REA
2 Years of Real World FP at REAkenbot
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools Yulia Shcherbachova
 
HFM, Workspace, and FDM – Voiding your warranty
HFM, Workspace, and FDM – Voiding your warrantyHFM, Workspace, and FDM – Voiding your warranty
HFM, Workspace, and FDM – Voiding your warrantyCharles Beyer
 
Javascript done right - Open Web Camp III
Javascript done right - Open Web Camp IIIJavascript done right - Open Web Camp III
Javascript done right - Open Web Camp IIIDirk Ginader
 
Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!mold
 
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...NoSQLmatters
 
The Evolution of Async-Programming on .NET Platform (.Net China, C#)
The Evolution of Async-Programming on .NET Platform (.Net China, C#)The Evolution of Async-Programming on .NET Platform (.Net China, C#)
The Evolution of Async-Programming on .NET Platform (.Net China, C#)jeffz
 
"ClojureScript journey: from little script, to CLI program, to AWS Lambda fun...
"ClojureScript journey: from little script, to CLI program, to AWS Lambda fun..."ClojureScript journey: from little script, to CLI program, to AWS Lambda fun...
"ClojureScript journey: from little script, to CLI program, to AWS Lambda fun...Julia Cherniak
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 
A Recovering Java Developer Learns to Go
A Recovering Java Developer Learns to GoA Recovering Java Developer Learns to Go
A Recovering Java Developer Learns to GoMatt Stine
 
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup Alberto Paro
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019Provectus
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018artgillespie
 
Remix Your Language Tooling (JSConf.eu 2012)
Remix Your Language Tooling (JSConf.eu 2012)Remix Your Language Tooling (JSConf.eu 2012)
Remix Your Language Tooling (JSConf.eu 2012)lennartkats
 
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013Nick Galbreath
 

Ähnlich wie Debugging and Testing ES Systems (20)

2 Years of Real World FP at REA
2 Years of Real World FP at REA2 Years of Real World FP at REA
2 Years of Real World FP at REA
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools
 
HFM, Workspace, and FDM – Voiding your warranty
HFM, Workspace, and FDM – Voiding your warrantyHFM, Workspace, and FDM – Voiding your warranty
HFM, Workspace, and FDM – Voiding your warranty
 
Javascript done right - Open Web Camp III
Javascript done right - Open Web Camp IIIJavascript done right - Open Web Camp III
Javascript done right - Open Web Camp III
 
Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!
 
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
 
The Evolution of Async-Programming on .NET Platform (.Net China, C#)
The Evolution of Async-Programming on .NET Platform (.Net China, C#)The Evolution of Async-Programming on .NET Platform (.Net China, C#)
The Evolution of Async-Programming on .NET Platform (.Net China, C#)
 
"ClojureScript journey: from little script, to CLI program, to AWS Lambda fun...
"ClojureScript journey: from little script, to CLI program, to AWS Lambda fun..."ClojureScript journey: from little script, to CLI program, to AWS Lambda fun...
"ClojureScript journey: from little script, to CLI program, to AWS Lambda fun...
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
A Recovering Java Developer Learns to Go
A Recovering Java Developer Learns to GoA Recovering Java Developer Learns to Go
A Recovering Java Developer Learns to Go
 
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
 
Tdd is not about testing
Tdd is not about testingTdd is not about testing
Tdd is not about testing
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
 
Api Design
Api DesignApi Design
Api Design
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018
 
Remix Your Language Tooling (JSConf.eu 2012)
Remix Your Language Tooling (JSConf.eu 2012)Remix Your Language Tooling (JSConf.eu 2012)
Remix Your Language Tooling (JSConf.eu 2012)
 
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
SQL-RISC: New Directions in SQLi Prevention - RSA USA 2013
 

Mehr von Chris Birchall

Scala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSScala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSChris Birchall
 
Tour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaTour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaChris Birchall
 
Tour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraTour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraChris Birchall
 
Guess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming APIGuess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming APIChris Birchall
 
Tour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperTour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperChris Birchall
 
ScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaChris Birchall
 
Load testing with gatling
Load testing with gatlingLoad testing with gatling
Load testing with gatlingChris Birchall
 
Phone Home: A client-side error collection system
Phone Home: A client-side error collection systemPhone Home: A client-side error collection system
Phone Home: A client-side error collection systemChris Birchall
 
Branching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionBranching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionChris Birchall
 

Mehr von Chris Birchall (11)

Scala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSScala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGS
 
Rust 超入門
Rust 超入門Rust 超入門
Rust 超入門
 
Tour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaTour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache Kafka
 
Tour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraTour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - Cassandra
 
Guess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming APIGuess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming API
 
Tour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperTour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeper
 
ScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaScalaCache: simple caching in Scala
ScalaCache: simple caching in Scala
 
Hydra
HydraHydra
Hydra
 
Load testing with gatling
Load testing with gatlingLoad testing with gatling
Load testing with gatling
 
Phone Home: A client-side error collection system
Phone Home: A client-side error collection systemPhone Home: A client-side error collection system
Phone Home: A client-side error collection system
 
Branching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionBranching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by Abstraction
 

Kürzlich hochgeladen

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Debugging and Testing ES Systems

  • 1. Debugging and Testing ES Systems Chris Birchall 2013/8/29 Elasticsearch 勉強会 第1回 #elasticsearchjp
  • 2. Elasticsearch and me ● At Infoscience, helped build a log management product based on ES + Hadoop ● At M3, ES evangelist (??) ○ Maintain ES cluster ○ Help dev teams integrate ES into their apps Twitter: @cbirchall Github: https://github.com/cb372
  • 3. Search at M3 ● Using ES for all new services ○ Search, recommendation (MoreLikeThis) ● Slowly migrating other services from Solr ● A few legacy services use Lucene directly ● Running all indices on one ES cluster ● Kuromoji for Japanese content
  • 4. Debugging Mostly debugging of queries ● “Why doesn’t doc X match query Y?” ● “Why does this search return no results?” Operational issues are very rare ● ES’s clustering magic is surprisingly stable! ● No performance issues so far
  • 5. Debugging - Step 1 Check for typos! ES will silently ignore many typos in settings/mapping definitions
  • 6. Typo - Example $ curl -X PUT localhost:9200/myapp -d '{ "settings": { "number_of_shards": 3 }, "mapping" : { "article" : { "_source": { "enabled": false }, "properties": { "title": { "type": "string", "store": "true" }, "body": { "type": "string", "store": "true" }, ... } }, ... }' Let’s create a new index...
  • 7. Typo - Example (cont’d) {"ok":true,"acknowledged":true} Response from ES: OK, seems fine...
  • 8. Typo - Example (cont’d) $ curl localhost:9200/myapp/_mappings?pretty Response from ES: { "myapp" : { } } Eh? Where are my lovingly-crafted mappings?! Now check the mappings...
  • 9. Typo - Example (cont’d) $ curl -X PUT localhost:9200/myapp -d '{ "settings": { "number_of_shards": 3 }, "mappings" : { "article" : { "_source": { "enabled": false }, "properties": { "title": { "type": "string", "store": "true" }, "body": { "type": "string", "store": "true" }, ... } }, ... }' Oops!
  • 10. Debugging - Step 2 Set up a local environment ● Makes it easy to wipe & rebuild index
  • 11. Setting up a local env (OSX) # Install $ brew install elasticsearch # Kuromoji plugin (optional) $ /usr/local/opt/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0 # Start $ elasticsearch # Create index $ curl -X PUT localhost:9200/my_app -d '{ ... }' # Insert some documents $ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }' $ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }' # Done!
  • 12. Useful commands - Analyze $ curl 'localhost:9200/myindex/_analyze?pretty' / -d '東京特許許可局許可局長' { "tokens" : [ { "token" : "東京", "start_offset" : 0, "end_offset" : 2, "type" : "word", "position" : 1 }, { "token" : "特許", "start_offset" : 2, "end_offset" : 4, "type" : "word", ... How is my document/query being tokenized?
  • 13. Useful commands - Explain $ curl 'localhost:9200/kuro/docs/123/_explain?pretty' / -d '{ "query": { "term": { "body": "東京" } } }' { ... "matched" : true, "explanation" : { "value" : 0.375, "description" : "weight(body:東京 in 0) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.375, "description" : "fieldWeight in 0, product of:", "details" : [ { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:", "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" ... Why does this document (not) match this query? Specify document ID
  • 14. Tuning queries Parameters to tweak ● default_operator (AND/OR) ● auto_generate_phrase_queries ● minumum_should_match ● Stop words/tags ● Kuromoji ○ Segmentation mode ○ Reading form filter ○ Disable Kuromoji! (for some fields)
  • 15. Why disable Kuromoji? Problem: occasionally weird tokenization ● AND query will fail, because not all terms match ● OR query will match any document with 病院 → low precision Phrase Terms 特定医療法人財団 日本会 東日本病院 (document field) 特定、医療、法人、財団、 日本、会、東日本、病院 東日本 (query) 東日、東日本、本 東日本病院 (query) 東、東日本、日本、病院
  • 16. Useful plugin - Head $ bin/plugin -install mobz/elasticsearch-head http://mobz.github.io/elasticsearch-head/
  • 17. Testing Main goal: Ensure that queries return the results that we expect ● Test coverage of representative queries ○ Freedom to tune for a given query without breaking other queries Ideally, tests should: ● Run fast ● Run standalone (i.e. no need to have an ES server running)
  • 18. Testing - Java elasticsearch-test is awesome ● DSL to set up/tear down ES ● Annotations + JUnit runner ● ES runs in-process ○ No need to start an external ES server ● Index is stored in-memory ○ Runs quickly https://github.com/tlrx/elasticsearch-test
  • 20. Testing - Ruby Simple Rails + Tire + RSpec example https://github.com/cb372/elasticsearch-rspec-example
  • 21. We’re hiring! TODO We are hiring slide http://bit.ly/m3jobs