Elasticsearch - Guide to Search

The presentation covers the concepts of full-text search and shows what Elasticsearch offers as a technology for building an intelligent search engine.

Presented at the 2nd Wrocław PHPErs Conference, which took place on 10.08.2015.

  1. Elasticsearch: Guide to search #1. Antoni Orfin, antoniorfin@gmail.com
  2. USE CASES: 1. Intelligent search engines
     …learning from users' behaviour: "Search for cats that I would love from a 3M database"
     …forgiving spelling mistakes: "Search for Mihael Jakson photos and show Michael Jackson photos"
  3. USE CASES: 2. Autocomplete. "Show the most relevant suggestions that start with search…"
  4. USE CASES: 3. Geo-search (Geospatial). "Search for restaurants that are nearest to [image]"
  5. USE CASES: 4. Search by colors (ColorSearch). "Search for flowers that are [image]"
  6. OLD SCHOOL: Searching in MySQL
     SELECT * FROM photos WHERE title LIKE '%cat%'
     SELECT * FROM photos WHERE title LIKE '%cats%'
     Id [PK] | title
     1 | Cute cat and dog
     2 | Cat plays with a dog
     3 | Cats playing piano
     … | …
     3 000 000 | Hidden cat
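
The limits of the LIKE approach can be tried out with a small sketch. It uses Python's built-in SQLite instead of MySQL (an assumption made only to keep the example self-contained), and the "Vacation at the lake" row is hypothetical, added to show that a substring pattern also matches unrelated words.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption, not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO photos (id, title) VALUES (?, ?)",
    [
        (1, "Cute cat and dog"),
        (2, "Cat plays with a dog"),
        (3, "Cats playing piano"),
        (4, "Vacation at the lake"),  # hypothetical row, not on the slide
    ],
)


def search_like(pattern):
    """Return ids of titles matching a LIKE pattern."""
    rows = conn.execute(
        "SELECT id FROM photos WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


cat_ids = search_like("%cat%")    # substring match, also hits "Vacation"
cats_ids = search_like("%cats%")  # the plural form needs its own query
print(cat_ids, cats_ids)
```

A pattern with a leading wildcard generally cannot use an ordinary index on the column, so every title has to be inspected, which is what makes this approach painful at 3 000 000 rows.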
  7. SEARCH THEORY: Building Inverted Index
     Documents: #1 Cute cat and dog; #2 Cat plays with a dog; #3 Cats playing piano
     Term [PK] | Id
     cute | 1
     cat | 1, 2, 3
     dog | 1, 2
     play | 2, 3
     … | …
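
The inverted index on this slide can be reproduced with a minimal sketch. The `BASE_FORMS` table and the stopword set are hand-written stand-ins for real text analysis (assumptions, chosen so that the output matches the table on the slide); they are not how Elasticsearch itself normalizes terms.

```python
from collections import defaultdict

# Hand-written normalization table standing in for real stemming (assumption).
BASE_FORMS = {"cats": "cat", "plays": "play", "playing": "play"}
STOPWORDS = {"and", "with", "a"}


def analyze(text):
    """Lowercase, split on whitespace, drop stopwords, map to base forms."""
    tokens = text.lower().split()
    return [BASE_FORMS.get(t, t) for t in tokens if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map every term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for term in analyze(title):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "Cute cat and dog", 2: "Cat plays with a dog", 3: "Cats playing piano"}
index = build_inverted_index(docs)
print(index["cat"])
```

Looking up a term is then a single dictionary access instead of a scan over all titles.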
  8. SEARCH THEORY: Text Analysis
     Input: Puppy and kitten with guinea pig
     1. Tokenization: [Puppy] [and] [kitten] [with] [guinea] [pig]
     2. Filtering tokens: [dog] [cat] [guinea] [pig]
     Two separate tokens? :(
  9. SEARCH THEORY: Text Analysis
     ASCII folding: róża → roza
     Lowercase: Cat → cat
     Synonyms: kitten → cat, puppy → dog
     Stopwords: common words to remove (and, what, with, or)
     Stemming: reducing inflected words to their base form (cats → cat; fishing, fisher, fished → fish)
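
The analysis steps from the two slides above can be chained in a short sketch. The stopword and synonym lists are taken from the slides; the suffix-stripping stemmer is a deliberately naive assumption that only covers the slide's examples, and the order of the filters is one reasonable choice rather than a prescribed one.

```python
import unicodedata

STOPWORDS = {"and", "what", "with", "or"}      # from the slide
SYNONYMS = {"kitten": "cat", "puppy": "dog"}   # from the slide
SUFFIXES = ("ing", "ed", "er", "s")            # naive stemmer (assumption)


def ascii_fold(token):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize on whitespace, then lowercase, fold, filter, map synonyms, stem."""
    tokens = [ascii_fold(t.lower()) for t in text.split()]
    tokens = [t for t in tokens if t not in STOPWORDS]
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [stem(t) for t in tokens]


print(analyze("Puppy and kitten with guinea pig"))
```

Note that "guinea pig" still ends up as two separate tokens, which is exactly the problem the slide points out.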
  10. SEARCH THEORY: Text Analysis
      Lekarz Chorób Wewnętrznych
      → stemming → Lekarz Choroba Wewnętrzny
      → asciifolding, lowercase → lekarz choroba wewnetrzny
      → synonyms → internista
  11. TECHNOLOGIES: Search Engines Overview
  12. SOLUTION: Elasticsearch is a flexible and powerful open-source, distributed, real-time search and analytics engine.
  13. ELASTICSEARCH: Architecture (4 shards, 1 replica)
      Node 1: Shard 1, Shard 2, Replica 3, Replica 4
      Node 2: Shard 3, Shard 4, Replica 1, Replica 2
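
The layout on the architecture slide can be modelled in a few lines. Elasticsearch picks the primary shard for a document from a hash of its routing value modulo the number of primary shards; the sketch below uses `zlib.crc32` as a stand-in hash (an assumption, not the hash Elasticsearch uses) and 1-based shard numbers to match the slide.

```python
import zlib

NUM_PRIMARY_SHARDS = 4  # "4 shards" on the slide

# Layout copied from the slide: each replica lives on the other node.
NODES = {
    "Node 1": {"primaries": [1, 2], "replicas": [3, 4]},
    "Node 2": {"primaries": [3, 4], "replicas": [1, 2]},
}


def shard_for(doc_id):
    """Pick a shard number (1-based) from a hash of the document id."""
    # crc32 is a stand-in hash (assumption).
    return zlib.crc32(str(doc_id).encode("utf-8")) % NUM_PRIMARY_SHARDS + 1


def nodes_holding(shard):
    """Return the nodes that hold a primary or replica copy of the shard."""
    return sorted(
        name
        for name, held in NODES.items()
        if shard in held["primaries"] or shard in held["replicas"]
    )


print(shard_for(1), nodes_holding(shard_for(1)))
```

With one replica per shard and two nodes, every shard has a copy on both nodes, so either node can serve any search and losing one node loses no data.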
  14. ELASTICSEARCH: Nomenclature
      Elasticsearch | MySQL
      Node | Instance
      Index | Database
      Type | Table
      Document | Row
      Attribute | Column
  15. ELASTICSEARCH: Mapping
      PUT [localhost:9200]/pixers/photos/_mapping
      { "photos" : { "properties" : { "title" : {"type" : "string", "analyzer" : "pl"}, "categories" : {"type" : "nested", ...} } } }
      Types: string, float, double, byte, short, integer, long, date, nested, geo_point, geo_shape, … etc …
  16. ELASTICSEARCH: REST API
      localhost:9200/{index}/{type}/{document id}
      PUT [localhost:9200]/pixers/photos/1
      { "title" : "Cute cat and dog sitting on books", "keywords": ["cat", "dog"] }
      GET [localhost:9200]/pixers/photos/1
      DELETE [localhost:9200]/pixers/photos/1
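
The three document requests from the REST API slide can be prepared with the Python standard library. This is a sketch under assumptions: `build_request` is a hypothetical helper, the requests are only constructed and never sent, and the `{type}` path segment follows the Elasticsearch version used in the slides (newer releases have removed mapping types).

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # host and port from the slide


def build_request(method, index, doc_type, doc_id, body=None):
    """Build (but do not send) a request for /{index}/{type}/{document id}."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


doc = {"title": "Cute cat and dog sitting on books", "keywords": ["cat", "dog"]}
put_req = build_request("PUT", "pixers", "photos", 1, doc)
get_req = build_request("GET", "pixers", "photos", 1)
delete_req = build_request("DELETE", "pixers", "photos", 1)

for req in (put_req, get_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it, which requires a node listening on localhost:9200.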
  17. ELASTICSEARCH: REST API, Searching
      GET /pixers/photos/_search
      { "query" : { "match" : { "title" : "cat" } } }
      Real life query > >
  18. ELASTICSEARCH: Searching, Query vs Filter
      Query String: "likes:[10 TO *] AND title:(+cat -dog)"
      Match: "funny cat"
      Fuzzy: "funy cad"
      More Like This
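
The idea behind the fuzzy example ("funy cad" still finding "funny cat") can be illustrated with edit distance. The sketch below implements plain Levenshtein distance and a hypothetical `fuzzy_match` helper over a tiny vocabulary; it is a simplification for illustration, not the matching code Elasticsearch runs.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=1):
    """Return vocabulary words within max_edits of the (misspelled) term."""
    return [word for word in vocabulary if levenshtein(term, word) <= max_edits]


vocabulary = ["funny", "cat", "dog"]  # hypothetical indexed terms
print(fuzzy_match("funy", vocabulary), fuzzy_match("cad", vocabulary))
```

Each misspelled word is one edit away from the intended term, so a tolerance of a single edit is enough for this example.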
  19. ELASTICSEARCH: Searching, Query vs Filter
      Terms: [some, tags]
      Range: likes > 10
      Geo Distance: Lat=50; Lon=20; Distance=200m
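
What a geo distance filter decides for each document can be sketched with the haversine formula. The centre point (Lat=50, Lon=20) and the 200 m radius come from the slide; the two places are hypothetical, and the spherical-Earth formula is an approximation chosen for brevity.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(places, lat, lon, max_m):
    """Keep only the places whose distance from (lat, lon) is at most max_m."""
    return [
        name
        for name, (plat, plon) in places.items()
        if haversine_m(lat, lon, plat, plon) <= max_m
    ]


places = {  # hypothetical documents with a location
    "near": (50.001, 20.0),  # roughly 111 m north of the centre
    "far": (50.01, 20.0),    # roughly 1.1 km north of the centre
}
nearby = within_distance(places, 50, 20, 200)
print(nearby)
```

Like the other filters on this slide, the check is a yes/no decision per document and does not contribute to relevance scoring.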
  20. ELASTICSEARCH: Searching, Query vs Filter
      Nested
      Bool: MUST / MUST NOT / SHOULD
      Function Score
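
A bool query combining a scored match with a non-scoring range filter could be assembled as follows. This is a sketch under assumptions: `build_search_body` is a hypothetical helper, the field names `title` and `likes` are taken from the earlier slides, and the `filter` clause inside `bool` belongs to Elasticsearch 2.0 and later, which may be newer than the version the slides targeted.

```python
import json


def build_search_body(text, min_likes, must_not_text=None):
    """Build a search body: match on title, filter on likes, optional exclusion."""
    bool_query = {
        "must": [{"match": {"title": text}}],                 # scored
        "filter": [{"range": {"likes": {"gt": min_likes}}}],  # yes/no only
    }
    if must_not_text:
        bool_query["must_not"] = [{"match": {"title": must_not_text}}]
    return {"query": {"bool": bool_query}}


body = build_search_body("funny cat", 10, must_not_text="dog")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a `GET /pixers/photos/_search` request like the one on the search slide.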
  21. ELASTICSEARCH: Analytics, Aggregations
      Get likes stats and a histogram of the created_at date, grouped by category.
      terms: category
        - stats: likes
        - histogram: created_at
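
What this nested aggregation computes can be shown on a handful of in-memory documents. The sample documents are hypothetical and the monthly bucket size is an assumption; the sketch mirrors the structure on the slide (a terms bucket per category, with likes stats and a created_at histogram inside each bucket) without calling Elasticsearch.

```python
from collections import Counter, defaultdict

docs = [  # hypothetical sample documents
    {"category": "animals", "likes": 10, "created_at": "2015-07-03"},
    {"category": "animals", "likes": 30, "created_at": "2015-08-01"},
    {"category": "music", "likes": 5, "created_at": "2015-08-10"},
]


def aggregate(documents):
    """Group by category, then compute likes stats and a monthly histogram."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc["category"]].append(doc)

    result = {}
    for category, items in buckets.items():
        likes = [d["likes"] for d in items]
        months = Counter(d["created_at"][:7] for d in items)  # "YYYY-MM"
        result[category] = {
            "doc_count": len(items),
            "likes_stats": {
                "count": len(likes),
                "min": min(likes),
                "max": max(likes),
                "avg": sum(likes) / len(likes),
                "sum": sum(likes),
            },
            "created_at_histogram": dict(sorted(months.items())),
        }
    return result


summary = aggregate(docs)
print(summary["animals"])
```

Each category bucket carries its own statistics and histogram, which is the "grouped by" behaviour the slide describes.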
  22. Contact me at: antoniorfin@gmail.com, linkedin.com/in/antoniorfin, twitter.com/antoniorfin, www.pixersize.com
      Thank you! Questions & Answers
