Back to the future :
SQL 92 for Elasticsearch ?
@LucianPrecup
@nosqlmatters #nosql14
2014-09-04
whoami
• CTO of Adelean (http://adelean.com/, http://www.elasticsearch.com/about/partners/)
• Integrate search, NoSQL and big data technologies to support ETL, BI, data mining, data processing and data visualization use cases.
Poll - How many of you …
• Know SQL ?
• Are familiar with the NoSQL theory ?
• Are familiar with Elasticsearch ?
• Lucene ? Solr ?
• Used a NoSQL database or product ?
• Remember SQL 92 ?
SQL 92 ? NoSQL ?
SQL ? SQL 92 ? RDBMS ?
• SQL
– Structured Query Language
– Based on relational algebra
• Designed for RDBMSes
– Relational Database Management Systems
• SQL 92
– 700 pages of specification
– Standardization
– No vendor lock in ?
NoSQL ? Elasticsearch ?
• NoSQL
– At first : the name of an event
– Distributed databases
– Horizontal scaling
• Standardization ?
• Polyglot persistence
• The language
– Low level : speak the “raw data” language
• Elasticsearch Query DSL
Why this presentation ?
• The title is deliberately provocative
– Back in ‘92, the dream (or nightmare) of any database vendor was to be SQL 92 compliant
• Good occasion to do a comparison
– And who knows : history might repeat itself :-)
• Elasticsearch users often ask how to express a SQL query with Elasticsearch
– However, this talk will not be exhaustive on the subject
The "Query Optimizer"
SELECT DISTINCT offer_status FROM offer;
≡
SELECT offer_status FROM offer GROUP BY offer_status;

SELECT O.id, O.label
FROM offer O
WHERE O.offer_status IN (
  SELECT S.id FROM offer_status S)
≡
SELECT O.id, O.label
FROM offer O, offer_status S
WHERE O.offer_status = S.id
The "Query Optimizer"
• SQL/RDBMS: power to the DBA
• NoSQL: power to the developer
“With great power comes great responsibility”
• The developer has to :
– Deal with query optimization
– Deal with data storage
– Take care of data consistency
– …
• But the developer can do better than the query optimizer → by adjusting (the data) to the (very) specific needs
Great responsibility … with Elasticsearch
"fields": ["@timestamp"],
"from": 0, "size": 1,
"sort": [{ "@timestamp": { "order": "desc" }}],
"query": { "match_all": {} },
"filter": { "and": [
  {"term": {"account": "you@me.org"}},
  {"term": {"protocol": "http"}}
]}
≡
"from": 0, "size": 0,
"query": { "filtered": {"query": {"match_all": {}},
  "filter": { "bool": { "must": [
    {"term": {"account": "you@me.org"}},
    {"term": {"protocol": "http"}}
  ]}}}
},
"aggs": {"LastTimestamp": {"max": {"field": "@timestamp"}}}
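Both requests answer the same question: what is the latest timestamp for a given account and protocol? The sketch below simulates the two strategies in plain Python on a hypothetical list of documents, without calling Elasticsearch, only to show that "sort descending, keep one hit" and "max aggregation" give the same value.

```python
# Hypothetical documents standing in for an index.
docs = [
    {"account": "you@me.org", "protocol": "http", "@timestamp": "2014-09-01T10:00:00"},
    {"account": "you@me.org", "protocol": "http", "@timestamp": "2014-09-03T08:30:00"},
    {"account": "you@me.org", "protocol": "ftp", "@timestamp": "2014-09-04T12:00:00"},
    {"account": "me@you.org", "protocol": "http", "@timestamp": "2014-09-05T09:00:00"},
]

def matches(doc):
    """The two term filters shared by both requests."""
    return doc["account"] == "you@me.org" and doc["protocol"] == "http"

# Request 1: sort by @timestamp descending and keep the first hit (size 1).
hits = sorted(filter(matches, docs), key=lambda d: d["@timestamp"], reverse=True)
latest_by_sort = hits[0]["@timestamp"]

# Request 2: no hits (size 0), a max aggregation over the same filtered set.
latest_by_max = max(d["@timestamp"] for d in docs if matches(d))

print(latest_by_sort, latest_by_max)
```

Which form is cheaper on a real cluster depends on the data and the mapping; here that choice is left to the developer rather than to an optimizer.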
What would SQL 92 for Elasticsearch imply ?
• Syntax → not important
• Focus on functionality
• Take advantage of the fact that the database is no longer the center of the information system. The service layer is.
Side by side - pagination
• Statement.execute()
• do while ResultSet.next()
– ResultSet.get()
• Otherwise: no standard for pagination in SQL 92
• Pagination is at the core of search engines
• Top n results are returned fast, and use cases usually stop there
(We will rely on this difference in some of the choices that follow.)
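In Elasticsearch, pagination is expressed with the `from` and `size` parameters that appear in the earlier request bodies. A minimal sketch of a helper that builds the body for a given page follows; the function name and the default page size are assumptions made for illustration.

```python
def page_request(query, page, page_size=10):
    """Build a search body for a zero-based page number (hypothetical helper)."""
    return {"from": page * page_size, "size": page_size, "query": query}

first_page = page_request({"match_all": {}}, page=0)
third_page = page_request({"match_all": {}}, page=2)

print(first_page)
print(third_page)
```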
Side by side - decimals
(SQL 92 introduced some new types.)
• SQL:
CREATE TABLE test_decimal(
  salary_dec DECIMAL(5,2),
  salary_double DOUBLE);
INSERT INTO test_decimal(salary_dec, salary_double)
  VALUES (0.1, 0.1);   (× 10)
SELECT SUM(salary_dec) FROM test_decimal;
→ 1.00
SELECT SUM(salary_double) FROM test_decimal;
→ 0.9999999999999999
• Elasticsearch:
PUT test_index/test_decimal/_mapping
{"test_decimal" : {"properties" : {
  "salary_float" : {"type" : "float" },
  "salary_double" : {"type" : "double" },
  "salary_string" : {"type" : "string", "index": "not_analyzed" }
}}}
POST test_index/test_decimal
{"salary_float" : 0.1, "salary_double" : 0.1, "salary_string" : "0.1"}   (× 10)
POST test_index/test_decimal/_search
"size": 0, "aggs": {
  "FloatTotal": {"sum": { "field" : "salary_float" }},
  "DoubleTotal": {"sum": { "field" : "salary_double" }}
}
→ "FloatTotal": {"value": 1.0000000149011612},
  "DoubleTotal": {"value": 1}
• The double total fits here, but 0.00001 × 10 does not → 0.00010000000000000002
Decimals for Elasticsearch – the solution
• Multiply salary_dec by 100 before indexing
• Then use integers
• Divide the result by 100 when reading it back !
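The effect of this workaround can be reproduced outside Elasticsearch. The sketch below, in plain Python, sums ten values of 0.1 as doubles and then as integers scaled by 100; the use of the standard `decimal` module for the scaling step is a choice made for this illustration and is not taken from the slides.

```python
from decimal import Decimal

salaries = ["0.1"] * 10  # ten rows, as in the side-by-side example

# Binary doubles accumulate a representation error.
double_total = sum(float(s) for s in salaries)

# Workaround: scale by 100, store integers, sum, then scale back.
cents = [int(Decimal(s) * 100) for s in salaries]
cents_total = sum(cents)
decimal_total = Decimal(cents_total) / 100

print(double_total)   # 0.9999999999999999
print(cents_total)    # 100
print(decimal_total)  # 1
```

The scale factor of 100 matches DECIMAL(5,2); a column with more decimal places would need a larger factor.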
Side by side – order by
• SELECT * FROM offer ORDER BY price;
• SELECT (price_ex + price_vat) AS price FROM offer ORDER BY price;
• SELECT substring(concat(value1, value2)) AS code FROM table ORDER BY code
• "query": {"match_all": {}},
  "sort": [{"price": {"order": "asc"}}]
• "function_score": {"boost_mode": "replace",
  "script_score": {"script":
  "doc['price_ex'].value + doc['price_vat'].value"}}
• Let’s do the computations at index time !
• Watch out for ORDER BY + pagination in a distributed setting
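Doing the computation at index time means storing the derived value as an ordinary field, so that sorting needs no script. A minimal sketch of that enrichment step follows; the substring length `CODE_LENGTH`, the helper name and the sample rows are hypothetical, since the slides do not give the substring bounds.

```python
CODE_LENGTH = 8  # hypothetical: the slides do not give the substring bounds

def with_sort_code(doc):
    """Return a copy of doc with a precomputed `code` field to sort on."""
    enriched = dict(doc)
    enriched["code"] = (doc["value1"] + doc["value2"])[:CODE_LENGTH]
    return enriched

# Hypothetical source rows, enriched before being sent to the index.
source_rows = [
    {"value1": "FR-", "value2": "0042-paris"},
    {"value1": "DE-", "value2": "0007-berlin"},
]
to_index = [with_sort_code(row) for row in source_rows]

# The order a plain ascending sort on the stored `code` field would produce.
sorted_codes = sorted(doc["code"] for doc in to_index)
print(sorted_codes)
```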
Order by - computations at index time
Index substring(concat(value1, value2)) as code
"sort": [{"code": {"order": "asc"}}]
Side by side - count
• SELECT COUNT(*) FROM offer;
• SELECT COUNT(*) FROM offer WHERE price > 10;
• POST index/_count
  {"query" : {"match_all": {}}}
• POST index/_count
  "query": {"filtered": {
  "filter": {"range": {"price": {"gt": 10}}}}}
• POST index/_search
  "size": 0,
  "aggs": {"Total": {"value_count": { "field" : "price" }}}
  (value_count: the simplest aggregation)
Side by side - other aggregations
• SELECT SUM(price) FROM offer;
• SELECT AVG(price) FROM offer;
• SELECT MAX(price) FROM offer;
• POST index/_search
  "size": 0,
  "aggs": {"Total": {"sum": { "field" : "price" }}}
• POST index/_search
  "size": 0,
  "aggs": {"Average": {"avg": { "field" : "price" }}}
• POST index/_search
  "size": 0,
  "aggs": {"Maximum": {"max": { "field" : "price" }}}
Side by side – distinct and group by
• SELECT DISTINCT offer_status FROM offer;
• SELECT * FROM offer GROUP BY offer_status;
• "size": 0,
  "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }}}
Side by side – distinct and group by
• SELECT * FROM offer GROUP BY offer_status;
• "size": 0,
  "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }}}
• "query": {"filtered": {
  "filter": {"term": {"offer_status.raw": "on_line"}}}}
  "query": {"filtered": {
  "filter": {"term": {"offer_status.raw": "off_line"}}}}
• "size": 0,
  "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" },
  "aggs": {"Top hits": {"top_hits": {"size": 10}}}}}
Implementing GROUP BY
• Query 1: a terms aggregation
• Queries 2..N: several term queries (grouped with the multi-search API)
• With Elasticsearch 1.3.2: a terms aggregation with a top_hits sub-aggregation
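The first approach (one terms aggregation, then one query per term) can be sketched as a helper that turns the buckets of query 1 into a multi-search payload. The helper name, the index name `offers` and the sample buckets are assumptions for illustration; the payload follows the newline-delimited header/body layout of the multi-search API and the filtered-query syntax used elsewhere in this deck.

```python
import json

def group_queries(index, field, buckets, size=10):
    """Build a multi-search payload: one term-filtered query per bucket key."""
    lines = []
    for bucket in buckets:
        header = {"index": index}
        body = {
            "size": size,
            "query": {"filtered": {"filter": {"term": {field: bucket["key"]}}}},
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The payload is newline-delimited and ends with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical buckets, shaped like the output of the terms aggregation (query 1).
buckets = [{"key": "on_line", "doc_count": 12}, {"key": "off_line", "doc_count": 3}]
payload = group_queries("offers", "offer_status.raw", buckets)
print(payload)
```

With the top_hits sub-aggregation, the same grouping is obtained in a single request, so this second round trip is no longer needed.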
Side by side – joins
Normalized database vs. Elasticsearch document
{"film" : {
  "id" : "183070",
  "title" : "The Artist",
  "published" : "2011-10-12",
  "genre" : ["Romance", "Drama", "Comedy"],
  "language" : ["English", "French"],
  "persons" : [
    {"person" : { "id" : "5079", "name" : "Michel Hazanavicius", "role" : "director" }},
    {"person" : { "id" : "84145", "name" : "Jean Dujardin", "role" : "actor" }},
    {"person" : { "id" : "24485", "name" : "Bérénice Bejo", "role" : "actor" }},
    {"person" : { "id" : "4204", "name" : "John Goodman", "role" : "actor" }}
  ]
}}
The issue with joins :-)
• Let’s say you have two relational entities: Persons and Contracts
– A Person has zero, one or more Contracts
– A Contract is attached to one or more Persons (e.g. the Subscriber, the Grantee, …)
• Need two search services :
– S1: getPersonsDetailsByContractProperties
– S2: getContractsDetailsByPersonProperties
• Simple solution with SQL:
SELECT P.* FROM P, C WHERE P.id = C.pid AND C.a = 'A'
SELECT C.* FROM P, C WHERE P.id = C.pid AND P.a = 'A'
The issue with joins - solutions
• Solution 1
– Index Persons with Contracts together for S1
{"person" : { "details" : …, … , "contracts" : [{"contract" : {"id" : 1, …}}, …] }}
– Index Contracts with Persons together for S2
{"contract" : { "details" : …, …, "persons" : [{"person" : {"id" : 1, "role" : "S", …}}, …]}}
• Issues with solution 1:
– A lot of data duplication
– Have to get Contracts when indexing Persons and vice-versa
• Solution 2
– Elasticsearch’s Parent/Child
• Issues with solution 2:
– Works in one direction but not the other (only one parent for n children, a 1 to n relationship)
• Solution 3
– Index Persons and Contracts separately
– Launch two Elasticsearch queries to get the response
– For S1 : First get all Contract ids by Contract properties, then get Persons by Contract ids (terms query or mget)
– For S2 : First get all Person ids by Person properties, then get Contracts by Person ids (terms query or mget)
– The response to the second query can be returned “as is” to the client (pagination, etc.)
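Solution 3 is an application-side join done in the service layer. The sketch below simulates its two steps on in-memory lists instead of real Elasticsearch queries; the sample data and the `contract_ids` / `person_ids` reference fields are hypothetical, chosen so that each entity can be looked up by the ids of the other.

```python
# Hypothetical stand-ins for the two separately indexed entity types.
persons = [
    {"id": 1, "name": "Ada", "a": "A", "contract_ids": [10, 11]},
    {"id": 2, "name": "Bob", "a": "B", "contract_ids": [11, 12]},
]
contracts = [
    {"id": 10, "a": "A", "person_ids": [1]},
    {"id": 11, "a": "B", "person_ids": [1, 2]},
    {"id": 12, "a": "B", "person_ids": [2]},
]

def get_persons_by_contract_property(value):
    """S1: collect matching contract ids, then fetch persons by those ids."""
    contract_ids = {c["id"] for c in contracts if c["a"] == value}      # query 1
    return [p for p in persons if contract_ids & set(p["contract_ids"])]  # query 2

def get_contracts_by_person_property(value):
    """S2: collect matching person ids, then fetch contracts by those ids."""
    person_ids = {p["id"] for p in persons if p["a"] == value}           # query 1
    return [c for c in contracts if person_ids & set(c["person_ids"])]   # query 2

print(get_persons_by_contract_property("A"))
print(get_contracts_by_person_property("B"))
```

Only the second step returns documents to the client, which is why its response can be passed through with its own pagination.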
Side by side - having
• SELECT *, SUM(price) FROM offer GROUP BY offer_status HAVING AVG(price) > 10;
  (HAVING is also specified by SQL 92)
• "size": 0,
  "aggs": {
  "Status": {"terms": {"field": "offer_status"},
  "aggs": {
  "Average": {"avg": {"field": "price_ht"}}}}
  }
• "query": {
  "filtered": {"filter": {
  "terms": {"offer_status": ["on_line"]}}}},
  "aggs": {
  "Status": {"terms": {"field": "offer_status"},
  "aggs": {
  "Total": {"sum": {"field": "price_ht"}}}}}
Implementing HAVING
1/ Query 1: A terms aggregation and an avg sub-aggregation
2/ Pick terms that match the HAVING clause
3/ Query 2: A filtered query on previous terms + terms aggregation + sum sub-aggregation
4/ Construct the result from hits + lookup in the corresponding aggregation
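The four steps above can be sketched in plain Python on an in-memory list, with the aggregations simulated rather than sent to Elasticsearch. The sample offers, the `price` field name and the `terms_agg` helper are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical offers standing in for the index.
offers = [
    {"offer_status": "on_line", "price": 12.0},
    {"offer_status": "on_line", "price": 20.0},
    {"offer_status": "off_line", "price": 4.0},
    {"offer_status": "off_line", "price": 8.0},
]

def terms_agg(docs, metric):
    """Group docs by offer_status and apply metric to each group's prices."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["offer_status"]].append(doc["price"])
    return {status: metric(prices) for status, prices in groups.items()}

# 1/ Query 1: terms aggregation with an avg sub-aggregation.
averages = terms_agg(offers, lambda prices: sum(prices) / len(prices))
# 2/ Pick the terms that satisfy HAVING AVG(price) > 10.
kept = [status for status, avg in averages.items() if avg > 10]
# 3/ Query 2: filter on the kept terms, then terms aggregation with a sum.
hits = [doc for doc in offers if doc["offer_status"] in kept]
totals = terms_agg(hits, sum)
# 4/ Build the result: each hit plus the total looked up for its group.
result = [dict(doc, total=totals[doc["offer_status"]]) for doc in hits]
print(result)
```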
Conclusion
• The service layer is the center of the system
• The developer has the power :-)
Thank you
Q & A

Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014

  • 1. Back to the future : SQL 92 for Elasticsearch ? @LucianPrecup @nosqlmatters #nosql14 2014-09-04
  • 2. whoami • CTO of Adelean (http://adelean.com/, http://www.elasticsearch.com/about/partners/) • Integrate search, nosql and big data technologies to support ETL, BI, data mining, data processing and data visualization use cases. 2014-09-04 2@LucianPrecup @nosqlmatters #nosql14
  • 3. Poll - How many of you … • Know SQL ? • Are familiar with the NoSQL theory ? • Are familiar with Elasticsearch ? • Lucene ? Solr ? • Used a NoSQL database or product ? • Are remembering SQL 92 ? 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 3
  • 4. SQL 92 ? NoSQL ? SQL ? SQL 92 ? RDBMS ? • SQL – Structured Query Language – Based on relational algebra • Designed for RDMBSes – Relational Database Management Systems • SQL 92 – 700 pages of specification – Standardization – No vendor lock in ? NoSQL ? Elasticsearch ? • NoSQL – At first : the name of an event – Distributed databases – Horizontal scaling • Standardization ? • Polyglot persistence • The language – Low level : speak the “raw data ” language • Elasticsearch Query DSL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 4
  • 5. Why this presentation ? • The title is voluntarily provocative – Back in ‘92, the dream (or nightmare) of any database vendor was to be SQL 92 compliant • Good occasion to do a comparison – And who knows : the history might repeat :-) • Elasticsearch users often ask questions about how to express a SQL query with Elasticsearch – However this will not going to be exhaustive about the subject 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 5
  • 6. The "Query Optimizer" 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 6 SELECT DISTINCT offer_status FROM offer; SELECT offer_status FROM offer GROUP by offer_status; ≡ SELECT O.id, O.label FROM offer O WHERE O.offer_status IN ( SELECT S.id FROM offer_status S) SELECT O.id, O.label FROM offer O, offer_status S WHERE O.offer_status = S.id ≡
  • 7. The "Query Optimizer" 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 7 SQL/RDBMS Power to the DBA
  • 8. The "Query Optimizer" NoSQL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 8 SQL/RDBMS Power to the DBA
  • 9. The "Query Optimizer" NoSQL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 9 SQL/RDBMS Power to the DBA
  • 10. The "Query Optimizer" NoSQL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 10 SQL/RDBMS Power to the DBA
  • 11. The "Query Optimizer" SQL/RDBMS Power to the DBA 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 11 NoSQL
  • 12. The "Query Optimizer" SQL/RDBMS Power to the DBA NoSQL Power to the developer 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 12
  • 13. “With great power comes great responsibility” • The developer has to : – Deal with query optimization – Deal with data storage – Take care about data consistency – … • But the developer can do better than the query optimizer  adjusting (the data) to the (very) specific needs 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 13
  • 14. Great responsibility … with Elasticsearch 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 14 "fields": ["@timestamp"], "from": 0, "size": 1, "sort": [{ "@timestamp": { "order": "desc" }}], "query": { "match_all": {} }, "filter": { "and": [ {"term": {"account": "you@me.org"}}, {"term": {"protocol": "http"}} ] } "from": 0, "size": 0, "query": { "filtered": {"query": {"match_all": {}}, "filter": { "bool": { "must": [ {"term": {"account": "you@me.org"}}, {"term": {"protocol": "http"}} ]}}} }, "aggs": {"LastTimestamp": {"max": {"field": "@timestamp"}}} ≡
  • 15. What SQL 92 for Elasticsearch would imply ? • Syntax  not important • Focus on functionality • Take advantage of the fact that the database is no longer the center of the information system. The service layer is. 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 15
  • 16. Side by side - pagination • Statement.execute() • do while ResultSet.next() – ResultSet.get() • Otherwise: no standard for pagination in SQL 92 • Pagination is at the core of search engines • Top n results are returned fast and use cases usually stop to that 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 16 As we will use this difference in some choices
  • 17. Side by side - decimals CREATE TABLE test_decimal( salary_dec DECIMAL(5,2), salary_double DOUBLE); INSERT INTO test_decimal( salary_dec, salary_double) values (0.1, 0.1); X 10 SELECT SUM(salary_dec) FROM test_decimal; 1.00 SELECT SUM(salary_double) FROM test_decimal; 0.9999999999999999 PUT test_index/test_decimal/_mapping "test_decimal" : { "salary_float" : {"type" : "float" }, "salary_double" : {"type" : "double" }, "salary_string" : {"type" : "string", "index": "not_analyzed" } POST test_index/test_decimal {"salary_float" : 0.1,"salary_double" : 0.1,"salary_string" : "0.1"} X 10 POST test_index/test_decimal/_search "size": 0, "aggs": { "FloatTotal": {"sum": { "field" : "salary_float" }}, "DoubleTotal": {"sum": { "field" : "salary_double" }} }  "FloatTotal": {"value": 1.0000000149011612}, "DoubleTotal": {"value": 1} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 17 As SQL 92 introduced some new types This fits But 0.00001 X 10 does not  0.00010000000000000002
  • 18. Decimals for Elasticsearch – the solution 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 18 Multiply salary_dec by 100 Then use integers Divide salary_dec by 100 !
  • 19. Side by side – order by • SELECT * FROM offer ORDER BY price; • SELECT (price_ex + price_vat) AS price FROM offer ORDER BY price; • SELECT substring(concat( value1, value2)) AS code FROM table ORDER BY code • "query": {"match_all": {}}, "sort": [{"price": {"order": "asc"}}] • "function_score": {"boost_mode": "replace", "script_score": {"script": "doc['price_ex'].value + doc['price_vat'].value"}} • Let’s do the computations at index time ! • Watch out for order by + pagination + distributed 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 19
  • 20. Order by - computations at index time 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 20 Index substring(concat( value1, value2)) as code "sort": [{"code": {"order": "asc"}}]
  • 21. Side by side - count (the simplest aggregation)
    – SQL: SELECT COUNT(*) FROM offer;
    – SQL: SELECT COUNT(*) FROM offer WHERE price > 10;
    – Elasticsearch: POST index/_count {"query" : {"match_all": {}}}
    – Elasticsearch: POST index/_count "query": {"filtered": { "filter": {"range": {"price": {"gt": 10}}}}}
    – Elasticsearch: POST index/_search "size": 0, "aggs": {"Total": {"value_count": { "field" : "price" }}}
  • 22. Side by side - other aggregations
    – SQL: SELECT SUM(price) FROM offer;
    – SQL: SELECT AVG(price) FROM offer;
    – SQL: SELECT MAX(price) FROM offer;
    – Elasticsearch: POST index/_search "size": 0, "aggs": {"Total": {"sum": { "field" : "price" }}}
    – Elasticsearch: POST index/_search "size": 0, "aggs": {"Average": {"avg": { "field" : "price" }}}
    – Elasticsearch: POST index/_search "size": 0, "aggs": {"Maximum": {"max": { "field" : "price" }}}
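Since the three request bodies above differ only in the aggregation name and type, a service layer could generate them from one function. The sketch below is an assumption about how that might look; `metric_request` is a hypothetical helper, while `sum`, `avg`, `max`, `min` and `value_count` are existing Elasticsearch metric aggregations.

```python
SUPPORTED_METRICS = {"sum", "avg", "max", "min", "value_count"}


def metric_request(name: str, metric: str, field: str) -> dict:
    """Build a search body that returns a single metric and no hits."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    return {"size": 0, "aggs": {name: {metric: {"field": field}}}}


total_request = metric_request("Total", "sum", "price")
average_request = metric_request("Average", "avg", "price")
maximum_request = metric_request("Maximum", "max", "price")
```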
  • 23. Side by side – distinct and group by
    – SQL: SELECT DISTINCT offer_status FROM offer;
    – SQL: SELECT * FROM offer GROUP BY offer_status;
    – Elasticsearch: "size": 0, "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }}}
  • 24. Side by side – distinct and group by (continued)
    – SQL: SELECT * FROM offer GROUP BY offer_status;
    – Elasticsearch, step 1: "size": 0, "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }}}
    – Elasticsearch, step 2: "query": {"filtered": { "filter": {"term": {"offer_status.raw": "on_line"}}}} and "query": {"filtered": { "filter": {"term": {"offer_status.raw": "off_line"}}}}
    – Elasticsearch, in one request: "size": 0, "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }, "aggs": {"Top hits": {"top_hits": {"size": 10}}}}}
  • 25. Implementing GROUP BY
    – Query 1: a terms aggregation
    – Query 2..N: several term queries (grouped with the multi-search API)
    – With Elasticsearch 1.3.2: a terms aggregation with a top_hits sub-aggregation
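The single-request variant (terms aggregation plus `top_hits` sub-aggregation) can be sketched as a request builder and a response reader. The aggregation names `groups` and `top`, both helper functions, and the sample response are hand-written for illustration; only the general response shape (`buckets`, `key`, `hits.hits`, `_source`) follows what Elasticsearch returns for these aggregations.

```python
def group_by_request(field: str, hits_per_group: int = 10) -> dict:
    """Build a terms aggregation with a top_hits sub-aggregation."""
    return {
        "size": 0,
        "aggs": {
            "groups": {
                "terms": {"field": field},
                "aggs": {"top": {"top_hits": {"size": hits_per_group}}},
            }
        },
    }


def rows_by_group(response: dict) -> dict:
    """Turn the aggregation response into {group key: [documents]}."""
    grouped = {}
    for bucket in response["aggregations"]["groups"]["buckets"]:
        grouped[bucket["key"]] = [
            hit["_source"] for hit in bucket["top"]["hits"]["hits"]
        ]
    return grouped


# Hand-written sample response, trimmed to the fields the reader uses.
sample_response = {
    "aggregations": {"groups": {"buckets": [
        {"key": "on_line", "doc_count": 2, "top": {"hits": {"hits": [
            {"_source": {"id": 1}}, {"_source": {"id": 3}}]}}},
        {"key": "off_line", "doc_count": 1, "top": {"hits": {"hits": [
            {"_source": {"id": 2}}]}}},
    ]}}
}

request_body = group_by_request("offer_status.raw")
groups = rows_by_group(sample_response)
```

Note that each group only carries its top `hits_per_group` documents, so this approximates GROUP BY in the same "top n" spirit as the pagination slide.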
  • 26. Side by side – joins
    – Normalized database versus Elasticsearch document:
    – {"film" : { "id" : "183070", "title" : "The Artist", "published" : "2011-10-12", "genre" : ["Romance", "Drama", "Comedy"], "language" : ["English", "French"], "persons" : [ {"person" : { "id" : "5079", "name" : "Michel Hazanavicius", "role" : "director" }}, {"person" : { "id" : "84145", "name" : "Jean Dujardin", "role" : "actor" }}, {"person" : { "id" : "24485", "name" : "Bérénice Bejo", "role" : "actor" }}, {"person" : { "id" : "4204", "name" : "John Goodman", "role" : "actor" }} ] }}
  • 27. The issue with joins :-)
    – Let’s say you have two relational entities: Persons and Contracts
    – A Person has zero, one or more Contracts
    – A Contract is attached to one or more Persons (e.g. the Subscriber, the Grantee, …)
    – We need two search services:
    – S1: getPersonsDetailsByContractProperties
    – S2: getContractsDetailsByPersonProperties
    – Simple solution with SQL for S1: SELECT P.* FROM P, C WHERE P.id = C.pid AND C.a = 'A'
    – Simple solution with SQL for S2: SELECT C.* FROM P, C WHERE P.id = C.pid AND P.a = 'A'
  • 28. The issue with joins - solutions
    – Solution 1: index Persons together with their Contracts for S1: {"person" : { "details" : …, … , "contracts" : [{"contract" : {"id" : 1, …}}, …] }}
    – Solution 1: index Contracts together with their Persons for S2: {"contract" : { "details" : …, …, "persons" : [{"person" : {"id" : 1, "role" : "S", …}}, …] }}
    – Issues with solution 1: a lot of data duplication; you have to fetch Contracts when indexing Persons and vice versa
    – Solution 2: Elasticsearch’s Parent/Child
    – Issues with solution 2: works in one direction but not the other (only one parent for n children, a 1 to n relationship)
    – Solution 3: index Persons and Contracts separately, then launch two Elasticsearch queries to get the response
    – For S1: first get all Contract ids by Contract properties, then get Persons by Contract ids (terms query or mget)
    – For S2: first get all Person ids by Person properties, then get Contracts by Person ids (terms query or mget)
    – The response to the second query can be returned “as is” to the client (pagination, etc.)
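Solution 3 for service S1 can be sketched as two request builders and one small piece of glue code. Everything named here is an assumption made for illustration: the field names `a`, `person_ids` and `id`, the helper functions, and the sample response. The request bodies use the `filtered` query syntax of the Elasticsearch version shown on the slides.

```python
def contracts_by_property(field: str, value: str) -> dict:
    """Query 1: find the Contracts that match a Contract property."""
    return {"query": {"filtered": {"filter": {"term": {field: value}}}}}


def collect_person_ids(contract_response: dict) -> list:
    """Glue: gather the distinct Person ids referenced by the Contract hits."""
    seen, ordered = set(), []
    for hit in contract_response["hits"]["hits"]:
        for person_id in hit["_source"]["person_ids"]:
            if person_id not in seen:
                seen.add(person_id)
                ordered.append(person_id)
    return ordered


def persons_by_ids(person_ids: list, offset: int = 0, size: int = 10) -> dict:
    """Query 2: fetch the Persons; pagination applies to this query only."""
    return {
        "query": {"filtered": {"filter": {"terms": {"id": person_ids}}}},
        "from": offset,
        "size": size,
    }


# Hand-written sample of what query 1 might return.
sample_contracts = {"hits": {"hits": [
    {"_source": {"id": 10, "person_ids": [1, 2]}},
    {"_source": {"id": 11, "person_ids": [2, 3]}},
]}}

first_query = contracts_by_property("a", "A")
person_ids = collect_person_ids(sample_contracts)
second_query = persons_by_ids(person_ids)
```

The first query has to return every matching Contract, not just a page of them, otherwise Persons would silently go missing from the second query; that is the price paid for doing the join in the service layer.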
  • 29. Side by side - having (also specified by SQL 92)
    – SQL: SELECT *, SUM(price) FROM offer GROUP BY offer_status HAVING AVG(price) > 10;
    – Elasticsearch, query 1: "size": 0, "aggs": { "Status": {"terms": {"field": "offer_status"}, "aggs": { "Average": {"avg": {"field": "price_ht"}}}} }
    – Elasticsearch, query 2: "query": { "filtered": {"filter": { "terms": {"offer_status": ["on_line"]}}}}, "aggs": { "Status": {"terms": {"field": "offer_status"}, "aggs": { "Total": {"sum": {"field": "price_ht"}}}}}
  • 30. Implementing HAVING
    – 1/ Query 1: a terms aggregation and an avg sub-aggregation
    – 2/ Pick the terms that match the HAVING clause
    – 3/ Query 2: a filtered query on the previous terms + terms aggregation + sum sub-aggregation
    – 4/ Construct the result from the hits + a lookup in the corresponding aggregation
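Steps 1 to 3 above can be sketched as follows. The aggregation names `Status`, `Average` and `Total` and the fields `offer_status` and `price_ht` are taken from the previous slide; the helper functions and the sample response are assumptions written for illustration.

```python
def having_query_1() -> dict:
    """Step 1: average price per status."""
    return {"size": 0, "aggs": {"Status": {
        "terms": {"field": "offer_status"},
        "aggs": {"Average": {"avg": {"field": "price_ht"}}},
    }}}


def terms_matching_having(response: dict, threshold: float) -> list:
    """Step 2: keep the statuses whose average exceeds the threshold."""
    return [
        bucket["key"]
        for bucket in response["aggregations"]["Status"]["buckets"]
        if bucket["Average"]["value"] > threshold
    ]


def having_query_2(statuses: list) -> dict:
    """Step 3: restrict to the kept statuses and sum the price per status."""
    return {
        "query": {"filtered": {"filter": {"terms": {"offer_status": statuses}}}},
        "aggs": {"Status": {
            "terms": {"field": "offer_status"},
            "aggs": {"Total": {"sum": {"field": "price_ht"}}},
        }},
    }


# Hand-written sample response for query 1.
sample_response = {"aggregations": {"Status": {"buckets": [
    {"key": "on_line", "doc_count": 4, "Average": {"value": 12.5}},
    {"key": "off_line", "doc_count": 2, "Average": {"value": 7.0}},
]}}}

kept_statuses = terms_matching_having(sample_response, threshold=10)
second_query = having_query_2(kept_statuses)
```

Step 4 then pairs each hit of the second query with the `Total` value of its status bucket, which is again work done by the service layer rather than by the engine.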
  • 31. Conclusion
    – The service layer is the center of the system
    – The developer has the power :-)

Editor's notes

  1. TODO review this
  2. The *famous* Query Optimizer versus *the* developer