Elasticsearch in the Allegro ecosystem
Andrzej Wisłowski, Paweł Bobruk
About us:
a team that provides Elasticsearch as a service
to other teams in the Allegro Group
infrastructure
consulting
internal training
Agenda
Why Elasticsearch
Information retrieval theory
Aggregations
Use cases at Allegro
ELS cluster infrastructure at Allegro
What's new in Elasticsearch 2.0
Why Elasticsearch
Real-time data & analytics
Distributed system
Full-text search
REST API
JSON documents
Information retrieval theory
Information retrieval
Information retrieval is the process of obtaining
information relevant to the searcher (usually
documents) from a collection of data (usually
unstructured).
Inverted index
Document - the unit the system is built on
(e.g. a product in a shop, a web page)
Term - a unit of the index, usually a single word;
the set of terms forms the index dictionary
Full-text search
Term      Doc_1   Doc_2   Doc_3
------------------------------------
brown   |   X   |   X   |
dog     |   X   |       |
dogs    |       |   X   |   X
fox     |   X   |       |   X
foxes   |       |   X   |
in      |       |   X   |
jumped  |   X   |       |   X
lazy    |   X   |   X   |
leap    |       |   X   |
over    |   X   |   X   |   X
quick   |   X   |   X   |   X
summer  |       |   X   |
the     |   X   |       |   X
------------------------------------
Inverted index
Doc_1: The quick brown fox jumped over the lazy dog
Doc_2: Quick brown foxes leap over lazy dogs in summer
Doc_3: The quick fox jumped over dogs
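How such an index could be built from the three documents can be shown with a minimal Python sketch. It assumes the simplest possible analysis (lowercasing and splitting on whitespace); the function name is hypothetical and this is not how Lucene stores its index internally.

```python
from collections import defaultdict

DOCS = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
    "Doc_3": "The quick fox jumped over dogs",
}


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed analysis: lowercase, then split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


index = build_inverted_index(DOCS)
for term in sorted(index):
    print(f"{term:<8} {sorted(index[term])}")
```

Looking up a term such as `index["brown"]` then returns the matching documents without scanning any document text.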
Analyzers
- Tokenizer
  - whitespace - splits tokens on whitespace only
  - standard - splits tokens on whitespace and punctuation
  - keyword - keeps the whole input as one token, e.g. for brands such as "Hugo Boss"
  - regexp - custom definition
- Token Filter
  - lowercase
  - stopword
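The analysis chain listed above (a tokenizer followed by token filters) can be imitated in a few lines of Python. This is only a sketch: the function names and the stop word list are hypothetical, and the "standard-like" tokenizer is a crude regular-expression approximation, not Elasticsearch's actual standard tokenizer.

```python
import re

# Hypothetical stop word list, for illustration only.
STOPWORDS = {"the", "in", "over"}


def whitespace_tokenizer(text):
    """Split on whitespace only; punctuation stays attached to tokens."""
    return text.split()


def standard_like_tokenizer(text):
    """Rough approximation: split on whitespace and punctuation."""
    return re.findall(r"\w+", text)


def keyword_tokenizer(text):
    """Keep the whole input as a single token."""
    return [text]


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stopword_filter(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, tokenizer, filters=()):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The quick, brown fox!", whitespace_tokenizer))
print(analyze("The quick, brown fox!", standard_like_tokenizer,
              (lowercase_filter, stopword_filter)))
print(analyze("Hugo Boss", keyword_tokenizer))
```

The order of filters matters in this sketch: the stop word filter only matches "The" because lowercasing runs first.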
Full-text search
GET /my_index/_search
{
  "query": {
    "match": {
      "body": "brown"
    }
  }
}
Term      Doc_1   Doc_2   Doc_3
------------------------------------
brown   |   X   |   X   |
dog     |   X   |       |
dogs    |       |   X   |   X
fox     |   X   |       |   X
foxes   |       |   X   |
in      |       |   X   |
jumped  |   X   |       |   X
lazy    |   X   |   X   |
leap    |       |   X   |
over    |   X   |   X   |   X
quick   |   X   |   X   |   X
summer  |       |   X   |
the     |   X   |       |   X
------------------------------------
Inverted index
Aggregations
Term      Doc_1   Doc_2   Doc_3
------------------------------------
brown   |   X   |   X   |
dog     |   X   |       |
dogs    |       |   X   |   X
fox     |   X   |       |   X
foxes   |       |   X   |
in      |       |   X   |
jumped  |   X   |       |   X
lazy    |   X   |   X   |
leap    |       |   X   |
over    |   X   |   X   |   X
quick   |   X   |   X   |   X
summer  |       |   X   |
the     |   X   |       |   X
------------------------------------
Inverted index
GET /my_index/_search
{
  "query": {
    "match": {
      "body": "brown"
    }
  },
  "aggs": {
    "popular_terms": {
      "terms": {
        "field": "body"
      }
    }
  }
}
Field data
Doc     Terms
-----------------------------------------------------------------
Doc_1 | brown, dog, fox, jumped, lazy, over, quick, the
Doc_2 | brown, dogs, foxes, in, lazy, leap, over, quick, summer
Doc_3 | dogs, fox, jumped, over, quick, the
-----------------------------------------------------------------
doc values - field data stored on disk
Response (abridged):
{
  "hits": {
    "total": 2,
    ...
  },
  "aggregations": {
    "popular_terms": {
      "buckets": [
        { "key": "brown", "doc_count": 2 },
        { "key": "lazy", "doc_count": 2 },
        { "key": "over", "doc_count": 2 },
        { "key": "quick", "doc_count": 2 },
        { "key": "dog", "doc_count": 1 },
        ...
      ]
    }
  }
}
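The buckets in the response above can be reproduced with a small Python sketch of the idea behind field data: the query selects documents, and the per-document term lists are then counted. The analysis (lowercase, whitespace split) and the function names are assumptions for illustration; the tie-breaking order (count descending, then key ascending) is chosen to match the response shown.

```python
from collections import Counter

DOCS = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
    "Doc_3": "The quick fox jumped over dogs",
}

# Field data: the "uninverted" view, document id -> terms.
FIELD_DATA = {
    doc_id: sorted(set(text.lower().split())) for doc_id, text in DOCS.items()
}


def match(term, field_data):
    """Return ids of documents containing the term (stand-in for the query)."""
    return [doc_id for doc_id, terms in field_data.items() if term in terms]


def terms_aggregation(doc_ids, field_data, size=10):
    """Count, per term, how many of the matched documents contain it."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(field_data[doc_id])
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


hits = match("brown", FIELD_DATA)
buckets = terms_aggregation(hits, FIELD_DATA, size=5)
print(len(hits), buckets)
```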
Aggregations
Metrics
min
max
sum
avg
value_count
sum_of_squares
std_deviation
variance
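What these metrics compute can be illustrated with plain Python. The sketch below assumes population variance (dividing by the number of values) and uses a hypothetical function name and made-up prices; it illustrates the formulas and is not Elasticsearch code.

```python
import math


def extended_stats(values):
    """Compute the listed metrics for a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance (assumption); clamp tiny negative rounding errors.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    return {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "avg": avg,
        "value_count": count,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


stats = extended_stats([10.0, 20.0, 30.0, 40.0])
print(stats)
```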
terms
aggregation on a field
Most frequently searched words
{
  "aggs": {
    "mostCommonSearched": {
      "terms": {
        "field": "phrase"
      }
    }
  }
}
{
  "key": "samsung",
  "doc_count": 3321830
},
{
  "key": "galaxy",
  "doc_count": 2664985
},
{
  "key": "audi",
  "doc_count": 2343937
},
{
  "key": "nike",
  "doc_count": 2234019
},
...
nested aggregations
Most frequent searches per category
{
  "aggs": {
    "category": {
      "terms": {
        "field": "category",
        "size": 5
      },
      "aggs": {
        "mostCommonSearched": {
          "terms": {
            "field": "phrase",
            "size": 5
          }
        }
      }
    }
  }
}
Main Page   Motoryzacja   Książki i Komiksy   Komputery   Odzież, Obuwie, Dodatki
samsung     audi          testy               lenovo      nike
nike        bmw           historia            tablet      adidas
galaxy      mercedes      matematyka          office      buty
buty        opel          nauczyciela         microsoft   air
audi        honda         księga              gtx         sukienka
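The nesting above (a terms bucket per category, and inside each bucket another terms aggregation on the phrase) corresponds to a two-level group-and-count. A minimal Python sketch follows; the search log entries are made up for illustration and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical search log entries, for illustration only.
SEARCH_LOG = [
    {"category": "Motoryzacja", "phrase": "audi"},
    {"category": "Motoryzacja", "phrase": "audi"},
    {"category": "Motoryzacja", "phrase": "audi"},
    {"category": "Motoryzacja", "phrase": "bmw"},
    {"category": "Motoryzacja", "phrase": "bmw"},
    {"category": "Motoryzacja", "phrase": "opel"},
    {"category": "Komputery", "phrase": "lenovo"},
    {"category": "Komputery", "phrase": "lenovo"},
    {"category": "Komputery", "phrase": "tablet"},
]


def top_phrases_per_category(log, categories_size=5, phrases_size=5):
    """Outer level: largest categories. Inner level: top phrases in each."""
    per_category = defaultdict(Counter)
    for entry in log:
        per_category[entry["category"]][entry["phrase"]] += 1

    def by_count_then_key(item):
        key, count = item
        return (-count, key)

    category_sizes = {c: sum(p.values()) for c, p in per_category.items()}
    top_categories = sorted(category_sizes.items(), key=by_count_then_key)
    result = {}
    for category, _ in top_categories[:categories_size]:
        phrases = sorted(per_category[category].items(), key=by_count_then_key)
        result[category] = [phrase for phrase, _ in phrases[:phrases_size]]
    return result


top = top_phrases_per_category(SEARCH_LOG, phrases_size=2)
print(top)
```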
stats
count, min, max, avg, sum
Price statistics per category
{
  "aggs": {
    "category": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "stats": {
          "stats": {
            "field": "price"
          }
        }
      }
    }
  }
}
{
  "key": "Oświetlenie/Lampy",
  "stats": {
    "count": 157452,
    "min": 1,
    "max": 66479,
    "avg": 297.42634631508975,
    "sum": 46830373.080003515
  }
},
...
percentile
Prices in percentiles per category
{
  "aggs": {
    "category": {
      "terms": {
        "field": "category.path",
        "size": 10
      },
      "aggs": {
        "percentile": {
          "percentiles": {
            "field": "price",
            "percents": [50, 75, 95, 99]
          }
        }
      }
    }
  }
}
{
  "key": "Oświetlenie/Lampy",
  "percentile": {
    "values": {
      "50.0": 162.65936064485723,
      "75.0": 330.90439138409306,
      "95.0": 980.0905118431161,
      "99.0": 2147.587473335171
    }
  }
},
...
date_histogram
statistics by date
New products, Sellers
{
  "aggs": {
    "date": {
      "date_histogram": {
        "field": "createdAt",
        "interval": "month"
      }
    }
  }
}
{
  "key": "April",
  "doc_count": 13
},
{
  "key": "May",
  "doc_count": 40
},
{
  "key": "June",
  "doc_count": 28
}
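Bucketing by month, as the date_histogram above does, amounts to truncating each timestamp to its month and counting. A minimal Python sketch, with made-up timestamps and a hypothetical function name; the "YYYY-MM" bucket key is a choice made for this illustration.

```python
from collections import Counter
from datetime import datetime


def date_histogram_by_month(timestamps):
    """Count timestamps per calendar month, ordered chronologically."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]


# Hypothetical createdAt values.
created_at = [
    datetime(2015, 4, 3),
    datetime(2015, 4, 20),
    datetime(2015, 5, 1),
    datetime(2015, 6, 15),
    datetime(2015, 6, 30),
]
histogram = date_histogram_by_month(created_at)
print(histogram)
```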
script
Auction end by day of week
{
  "aggs": {
    "auctionsPerWeekDay": {
      "terms": {
        "script": "doc['endDate'].date.dayOfWeek().getAsText()"
      }
    }
  }
}
Sunday      3672219
Saturday    3144495
Monday      3090514
Friday      2823472
Wednesday   2729971
Thursday    2724953
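The script-based terms aggregation above derives the bucket key from each document at query time. The same idea in a minimal Python sketch, where the key is the weekday name of the end date; the dates are made up and the function name is hypothetical.

```python
from collections import Counter
from datetime import date

# date.weekday() returns 0 for Monday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def auctions_per_weekday(end_dates):
    """Derive a key (weekday name) per document and count documents per key."""
    return Counter(WEEKDAYS[d.weekday()] for d in end_dates)


# Hypothetical auction end dates.
end_dates = [date(2015, 11, 1), date(2015, 11, 8),
             date(2015, 11, 7), date(2015, 11, 2)]
per_day = auctions_per_weekday(end_dates)
print(per_day.most_common())
```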
geo
Pickup points within a 1.5 km radius
{
  "aggs": {
    "byPlace": {
      "geo_distance": {
        "field": "location",
        "origin": "52.2401,21.0421",
        "ranges": [
          { "to": 700 },
          { "from": 700, "to": 1000 },
          { "from": 1000, "to": 1500 }
        ],
        "unit": "m"
      }
    }
  }
}
{
  "key": "*-700.0",
  "from": 0,
  "to": 700,
  "doc_count": 1
},
{
  "key": "700.0-1000.0",
  "from": 700,
  "to": 1000,
  "doc_count": 6
},
{
  "key": "1000.0-1500.0",
  "from": 1000,
  "to": 1500,
  "doc_count": 12
}
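The distance rings above can be imitated in Python by computing the great-circle distance from the origin and assigning each point to a range. This sketch assumes the haversine formula on a spherical Earth and ranges with an inclusive lower and exclusive upper bound; the pickup point coordinates are made up, and Elasticsearch's own distance calculation may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

# Same ranges as in the query above: (from, to) in metres.
RANGES = [(None, 700), (700, 1000), (1000, 1500)]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_distance_buckets(origin, points, ranges=RANGES):
    """Count points per distance range; points beyond all ranges are ignored."""
    counts = [0] * len(ranges)
    for lat, lon in points:
        distance = haversine_m(origin[0], origin[1], lat, lon)
        for i, (low, high) in enumerate(ranges):
            if (low is None or distance >= low) and distance < high:
                counts[i] += 1
    return counts


origin = (52.2401, 21.0421)
# Hypothetical pickup points, roughly 556 m, 890 m, 1334 m and 2224 m north.
points = [(52.2451, 21.0421), (52.2481, 21.0421),
          (52.2521, 21.0421), (52.2601, 21.0421)]
ring_counts = geo_distance_buckets(origin, points)
print(ring_counts)
```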
pipeline (since v2.0)
Pipeline
statistical (min, max, avg, ...)
percentile
cumulative
moving
...
Moving
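Pipeline aggregations work on the output of other aggregations rather than on documents. A minimal Python sketch of two of them, a cumulative sum and a moving average, applied to monthly bucket counts. It assumes a trailing window that includes the current bucket; Elasticsearch's own windowing details may differ, and the function names are hypothetical.

```python
from itertools import accumulate


def cumulative_sum(bucket_counts):
    """Running total over an ordered list of bucket values."""
    return list(accumulate(bucket_counts))


def moving_average(bucket_counts, window=3):
    """Average over the current bucket and up to window-1 preceding buckets."""
    averages = []
    for i in range(len(bucket_counts)):
        chunk = bucket_counts[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Monthly doc_count values taken from the date_histogram example.
monthly_counts = [13, 40, 28]
print(cumulative_sum(monthly_counts))
print(moving_average(monthly_counts, window=2))
```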
Elasticsearch at Allegro
Recommendations
- Allegro offers are indexed together with the meta product id
and a ranking (score) that expresses the quality of the offer
{
  "name": "iphone 6, super oferta",
  "offerId": "23",
  "metaProductId": "p12",
  "score": 0.324
}
- the recommendation system selects the meta products that
are to be recommended to the user
Recommendations
{
  "aggregations": {
    "by_meta_product_id": {
      "buckets": [
        {
          "key": "p12",
          "top_n_items": {
            "hits": {
              "total": 15,
              "max_score": null,
              "hits": [
                { "_source": { "name": "iphone 6s" }, "sort": [ 0.1485 ] },
                { "_source": { "name": "iphone 6s nowość" }, "sort": [ 0.1348 ] },
                ...
              ]
            }
          }
        }
      ]
    }
  }
}
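A response shaped like the one above could come from a terms aggregation on the meta product id with a top_hits sub-aggregation sorted by score. The slides do not show the request, so the Python sketch below is an assumption: the aggregation names are taken from the response, the field names from the indexed offer, and the function name and parameters are hypothetical.

```python
import json


def build_recommendation_query(meta_product_ids, offers_per_product=2):
    """Build a search body: one bucket per meta product, best offers in each."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"terms": {"metaProductId": meta_product_ids}},
        "aggs": {
            "by_meta_product_id": {
                "terms": {
                    "field": "metaProductId",
                    "size": len(meta_product_ids),
                },
                "aggs": {
                    "top_n_items": {
                        "top_hits": {
                            "size": offers_per_product,
                            # Assumed: "score" is the offer quality field.
                            "sort": [{"score": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = build_recommendation_query(["p12", "p7"])
print(json.dumps(body, indent=2))
```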
Problems: optimize
- in Lucene, changing a document means deleting it and adding
a new one
- with frequent index changes the number of deleted documents
grows; once it exceeds a certain threshold (20% in our case)
it affects response times
- that is why we force an optimize of the index once a day
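The 20% threshold mentioned above can be expressed as a simple check on document counts. This is a sketch under assumptions: the ratio is taken as deleted documents divided by live plus deleted documents, the numbers are made up, and the function names are hypothetical. The slides describe a daily forced optimize, not a ratio-triggered one.

```python
def deleted_ratio(docs_count, docs_deleted):
    """Share of deleted documents among all documents still held in the index."""
    total = docs_count + docs_deleted
    return docs_deleted / total if total else 0.0


def exceeds_threshold(docs_count, docs_deleted, threshold=0.20):
    """True when deleted documents exceed the given share (20% assumed)."""
    return deleted_ratio(docs_count, docs_deleted) > threshold


# Hypothetical counts for a 25 million document index.
print(deleted_ratio(25_000_000, 7_000_000))
print(exceeds_threshold(25_000_000, 7_000_000))
print(exceeds_threshold(25_000_000, 1_000_000))
```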
Performance
- index size: 25 million documents
- reads: > 3000 rps
- response times: < 500 ms (p99)
We scale reads by increasing the number of replicas.
http://hermes.allegro.tech/
https://github.com/allegro/hermes
Hermes
used for auditing events: sent and published events
are stored
efficient writes, over 4000 rps
time-series indices (one per day), easy deletion
about 1.8 billion documents
bulk writes
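Per-day indices and bulk writes can be sketched together in Python. The index prefix, document type, event fields, and function names are hypothetical; the body follows the newline-delimited format of the bulk API (one action line followed by one source line per document). With one index per day, removing old data means dropping a whole index instead of deleting documents one by one.

```python
import json
from datetime import date


def daily_index_name(prefix, day):
    """One index per day, e.g. 'events-2015.11.05' (naming is an assumption)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def bulk_body(index_name, doc_type, docs):
    """Build a newline-delimited bulk request body for index operations."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


index_name = daily_index_name("events", date(2015, 11, 5))
payload = bulk_body(index_name, "event", [
    {"messageId": "1", "status": "published"},
    {"messageId": "1", "status": "sent"},
])
print(index_name)
print(payload)
```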
DWH
- Kibana for analytics; business users are able to use it
- marketing / advertising
- 4 billion documents
- 2000 indices
- time-series indices with aliases
Billing
Billing for Allegro services
about 400 million documents
indexing at 100 rps
time-series indices and aliases
routing by user
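Routing by user means that all documents of one user land on the same shard, so a query for that user can be served by a single shard. A minimal Python sketch of the principle, shard = hash(routing) modulo the number of primary shards. CRC32 is used here only as a stand-in hash and is not the function Elasticsearch uses; the function name and user ids are hypothetical.

```python
import zlib


def shard_for(routing_key, number_of_primary_shards):
    """Pick a shard from the routing key (stand-in hash, not Elasticsearch's)."""
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_primary_shards


# All billing entries of the same user map to the same shard.
for user_id in ["user-42", "user-42", "user-7"]:
    print(user_id, shard_for(user_id, 5))
```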
Pickup points - GEO
geolocation search
Cluster setup
Cloud setup
private cloud based on OpenStack
Ansible for automating cluster provisioning (easy
to extend)
backups to HDFS; we added Kerberos authorization support
monitoring based on Graphite, Cabot and PagerDuty
Reindexing tool
https://github.com/allegro/elasticsearch-reindex-tool
What's new in Elasticsearch 2.0
Elasticsearch 2.0
Pipeline aggregations
simplified query API (filters merged into queries)
configurable compression (LZ4, DEFLATE)
doc values enabled by default
Shield at field and document level (paid)
Marvel open-sourced
Sense open-sourced
Questions

allegro.tech Data Science Meetup #2: Elasticsearch in practice

Editor's notes

  1. providing infrastructure, consulting, training
  2. this image is to be replaced with another one
  3. maybe a screenshot from Sense or curl, e.g. as on the analyze API slide: one slide! (No schema is imposed!) JSON format
  4. X
  5. X
  6. Elasticsearch scales writes by increasing the number of shards.
  7. change the font
  8. hot and cold nodes; browser-based client: Sense!