SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Downloaden Sie, um offline zu lesen
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Parallel SQL
Joel Bernstein
Search Engineer, Alfresco
jbernste@apache.org
3
03
Introduction
•  Joel Bernstein
•  Lucene/Solr Committer
•  Search Engineer at Alfresco
•  Live and work in NYC
4
03
Alfresco
•  Open source ECM (Enterprise Content Management)
•  Alfresco is a system of record for documents
•  Uses Solr for search
•  1800+ customers
•  11 million active user accounts
•  Alfresco Solr: Document level access control,
eventually consistent, transactional,
multi-master, distributed search and faceting (coming
in Alfresco 5.1)
5
01
Agenda
1.  SQL Unleashed (What can it do?)
2. SQL Under the Hood (How does it work?)
6
01
SQL Unleashed
(In Solr 6.0)
7
01
Why SQL?
•  Solr has many awesome features.
•  But all of these feature create complexity.
•  Which faceting API to use? When to
Stream? Which parameters to use for
optimal performance?
•  The complexity level increases
dramatically when distributed joins come
into play
•  With SQL we can provide an optimizer to
choose the best query plan.
8
01
The SQL Interface at Glance
•  SQL over Map/Reduce: supports high
cardinality aggregations and
distributed joins.
•  SQL over Facets: high performance on
moderate cardinality aggregations.
•  SQL with Solr Search Predicates
•  SQL is fully integrated with SolrCloud
9
01
SQL Syntax: Limited and Unlimited SELECT
•  select colA, colB from tableB
•  select colA, colB from tableB limit 100
•  Unlimited selects return the entire result
set. Return fields must be DocValues.
•  Limited selects can sort by score and
retrieve any stored field.
10
01
SQL Syntax: ORDER BY
•  select a, b from tableB order by a desc,
b desc
•  Unlimited selects sort the entire result
set
11
01
The Predicate: Phrase Searching
•  select a, b from tableB where c = ‘hello
world’
•  Searches for the phrase ‘hello world’ in
field c.
12
01
The Predicate: Boolean searching
•  select a, b from tableB where c = ‘(hello world)’
•  Adding parens searches for (hello OR world).
•  Supports Solr query syntax inside the parens.
13
01
The Predicate: Range query
•  select a, b from tableB where c = ‘[0 TO 100]’
14
01
The Predicate: Arbitrary Boolean clauses
•  select a, b from tableB where (c = ‘hello
world’ AND d = ‘[0 TO 100]’)
15
01
SQL Syntax: Select Distinct
•  select distinct a, b from tableB
•  Map/Reduce Implementation: Tuples
•  are shuffled to worker nodes where the
distinct operation is performed.
•  JSON Facet Implementation: distinct
operation is pushed down into the
search engine
•  Map/Reduce for high cardinality
•  Facet for high QPS
16
01
Shuffle vs Push Down
•  Shuffling: high cardinality and
parallel relational algebra
(Distributed Joins)
•  Pushdown (Facet): blazing fast, high
QPS, moderate cardinality
•  aggregationMode flag is available with
the JDBC driver and http interface
[map_reduce or facet]
17
01
Aggregations: Stats
•  select count(*), sum(a) from tableA
•  Uses the StatsComponent under the covers
•  Initial release supports count, sum, avg, min,
max
•  Aggregation logic is always
pushed down into the search engine.
18
01
Aggregations: GROUP BY
•  select a, b count(*), sum(c) from tableB group by
a, b having count(*) > 50 order by sum(c) desc
•  Supports complex having clause: having (count(*)
> 50 AND sum(b) < 1000)
•  Has Map/Reduce implementation (shuffle)
•  And JSON Facet implementation (push down)
•  Map/Reduce can handle high cardinality multi-
dimension aggregations.
19
01
JDBC Driver
•  Ships with Solrj
•  Poolable Connection and Statement
•  SolrCloud Aware Load Balancing
•  Connection has aggregationMode
switch [map_reduce or facet]
20
01
SQL Under the Hood
21
01
SQL Parsing
•  Presto SQL Parser handles the parsing
•  SQL Statements are compiled to
TupleStream objects
•  The TupleStream is the base interface of the
Streaming API
•  The Streaming API is a general purpose
parallel computing API for SolrCloud
22
01
Parallel Computing Framework
•  Shuffling
•  Worker Collections
•  Streaming API
•  Streaming Expressions
•  Parallel SQL
23
01
Shuffling (sorting & partitioning)
•  Shuffling is pushed down into the search engine
•  Sorting: /export handler “stream sorts”
entire result sets.
•  Partitioning: HashQParserPlugin, hash
partitioning filter. Partitions results on
arbitrary fields.
•  Tuples (search results) begin streaming
instantly to worker nodes. Shuffling
never requires a spill to disk.
•  All replicas shuffle in parallel for the same
query. Allows for massive throughput.
24
01
Shuffling (sorting & partitioning)
Worker 2Worker 1
Shard 1
Replica 1
Shard 2
Replica 1
Shard 1
Replica 2
Shard 2
Replica 2
Client
Each worker is
shuffled ½
the result set
Tuples are
sorted and
partitioned on
keys
25
01
Worker Collections
•  Are Generic SolrCloud Collections
•  Can hold data, or just perform work
•  Search results are shuffled to the
workers
•  Configured with the /stream handler
26
01
Streaming API
•  Java Programming API for the parallel
computing framework
•  Real-time Map/Reduce and Parallel
Relational Algebra
•  Abstracts search results as Streams of
tuples (TupleStream)
•  Streams are transformed in parallel by
pluggable Decorator streams.
•  Parallel transformations include: group by, roll
up, union, intersect, complement and join
27
01
Streaming Expressions
•  Contributed by Dennis Gove (Bloomberg)
•  String Query Language and Serialization
format for the Streaming API
•  Streaming Expressions compile to
TupleStreams
•  TupleStreams serialize to
Streaming Expressions
28
01
Parallel SQL
•  Compiles SQL to a TupleStream
•  The TupleStream is serialized to a
Streaming Expression and sent to
worker nodes.
•  Worker nodes translate the Streaming
Expression back into TupleStream
•  Worker nodes open() and read() the
TupleStream in parallel. Tuples are
returned from each worker
29
01
From SQL to Streaming Expression
select str_s, count(*), sum(field_i), min(field_i), max(field_i),
avg(field_i) from collection1 where text='XXXX' group by str_s
rollup(
search(collection1,
q="(text:XXXX)",
qt="/export",
fl="str_s, field_i",
partitionKeys=str_s,
sort="str_s asc",
zkHost="127.0.0.1:64149/solr"),
over=str_s,
count(*),
sum(field_i),
min(field_i),
max(field_i),
avg(field_i))
30
01
Parallel SQL Shuffle (5 workers, 5 shards, aggregationMode=map_reduce)
Client
Worker 2
Shard 3
Replica 2
Worker 3Worker 1 Worker 4 Worker 5
Shard 1
Replica 2
Shard 1
Replica 3
Shard 2
Replica 3
Shard 2
Replica 2
Shard 2
Replica 1
Shard 1
Replica 1 Shard 3
Replica 1
Shard 3
Replica 3
Shard 4
Replica 3
Shard 4
Replica 2
Shard 4
Replica 1
Shard 5
Replica 3
Shard 5
Replica 2
Shard 5
Replica 1
/SQL
handler
31
01
Jira Tickets
•  SOLR-7560: Parallel SQL Support
•  SOLR-7377: Solr Streaming Expressions
•  SOLR-7082: Streaming Aggregation for SolrCloud
•  SOLR-7441: Improve overall robustness of the
Streaming stack: Streaming API,
Streaming Expressions, Parallel SQL
32
01
Getting Involved
• SQL is in Trunk
• Releasing with Solr 6
• Streaming API and Streaming Expressions
are located in the Solrj libraries
(solrj.io)
• Patches welcome
• Testers and feedback needed
33
01
Questions
Thanks!

Weitere ähnliche Inhalte

Was ist angesagt?

Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSpark Summit
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureSpark Summit
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons Provectus
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDatabricks
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Andra Lungu
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...Databricks
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...Databricks
 
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingSolr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingLucidworks
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsApache Apex
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Martin Junghanns
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 

Was ist angesagt? (20)

Spark etl
Spark etlSpark etl
Spark etl
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSource
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative Infrastructure
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache Spark
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingSolr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 

Andere mochten auch

Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Shalin Shekhar Mangar
 
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014Grid Dynamics
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...Lucidworks
 
Webinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and BeyondWebinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and BeyondLucidworks
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From SolrRamzi Alqrainy
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Djangotow21
 
Curso Formacion Apache Solr
Curso Formacion Apache SolrCurso Formacion Apache Solr
Curso Formacion Apache SolrEmpathyBroker
 
Apache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaApache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaDropsolid
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Webinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrWebinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrLucidworks
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Lucidworks
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.ashish0x90
 

Andere mochten auch (20)

Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
 
Solr5
Solr5Solr5
Solr5
 
Apache solr
Apache solrApache solr
Apache solr
 
Webinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and BeyondWebinar: Solr's example/files: From bin/post to /browse and Beyond
Webinar: Solr's example/files: From bin/post to /browse and Beyond
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From Solr
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Curso Formacion Apache Solr
Curso Formacion Apache SolrCurso Formacion Apache Solr
Curso Formacion Apache Solr
 
Apache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 AcquiaApache Solr Search Course Drupal 7 Acquia
Apache Solr Search Course Drupal 7 Acquia
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Webinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrWebinar: Natural Language Search with Solr
Webinar: Natural Language Search with Solr
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
 
Seminario Apache Solr
Seminario Apache SolrSeminario Apache Solr
Seminario Apache Solr
 
Formación apache Solr
Formación apache SolrFormación apache Solr
Formación apache Solr
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 

Ähnlich wie Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco

Parallel SQL for SolrCloud
Parallel SQL for SolrCloudParallel SQL for SolrCloud
Parallel SQL for SolrCloudJoel Bernstein
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFlink Forward
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...aiuy
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and SparkLucidworks
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkVerverica
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Aman Sinha
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memoryJulian Hyde
 
Xia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATLXia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATLMLconf
 
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...Vyacheslav Lapin
 
Rdbms chapter 1 function
Rdbms chapter 1 functionRdbms chapter 1 function
Rdbms chapter 1 functiondipumaliy
 
Hive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupHive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupRemus Rusanu
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...Flink Forward
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryDataWorks Summit
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 

Ähnlich wie Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco (20)

Parallel SQL for SolrCloud
Parallel SQL for SolrCloudParallel SQL for SolrCloud
Parallel SQL for SolrCloud
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
Database part2-
Database part2-Database part2-
Database part2-
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Reactive Spring 5
Reactive Spring 5Reactive Spring 5
Reactive Spring 5
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Xia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATLXia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATL
 
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
 
Rdbms chapter 1 function
Rdbms chapter 1 functionRdbms chapter 1 function
Rdbms chapter 1 function
 
Hive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupHive @ Bucharest Java User Group
Hive @ Bucharest Java User Group
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
User Group3009
User Group3009User Group3009
User Group3009
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theory
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 

Mehr von Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Mehr von Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Kürzlich hochgeladen

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Kürzlich hochgeladen (20)

Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco

  • 1. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
  • 2. Parallel SQL Joel Bernstein Search Engineer, Alfresco jbernste@apache.org
  • 3. 3 03 Introduction •  Joel Bernstein •  Lucene/Solr Committer •  Search Engineer at Alfresco •  Live and work in NYC
  • 4. 4 03 Alfresco •  Open source ECM (Enterprise Content Management) •  Alfresco is a system of record for documents •  Uses Solr for search •  1800+ customers •  11 million active user accounts •  Alfresco Solr: Document level access control, eventually consistent, transactional, multi-master, distributed search and faceting (coming in Alfresco 5.1)
  • 5. 5 01 Agenda 1.  SQL Unleashed (What can it do?) 2. SQL Under the Hood (How does it work?)
  • 7. 7 01 Why SQL? •  Solr has many awesome features. •  But all of these feature create complexity. •  Which faceting API to use? When to Stream? Which parameters to use for optimal performance? •  The complexity level increases dramatically when distributed joins come into play •  With SQL we can provide an optimizer to choose the best query plan.
  • 8. 8 01 The SQL Interface at Glance •  SQL over Map/Reduce: supports high cardinality aggregations and distributed joins. •  SQL over Facets: high performance on moderate cardinality aggregations. •  SQL with Solr Search Predicates •  SQL is fully integrated with SolrCloud
  • 9. 9 01 SQL Syntax: Limited and Unlimited SELECT •  select colA, colB from tableB •  select colA, colB from tableB limit 100 •  Unlimited selects return the entire result set. Return fields must be DocValues. •  Limited selects can sort by score and retrieve any stored field.
  • 10. 10 01 SQL Syntax: ORDER BY •  select a, b from tableB order by a desc, b desc •  Unlimited selects sort the entire result set
  • 11. 11 01 The Predicate: Phrase Searching •  select a, b from tableB where c = ‘hello world’ •  Searches for the phrase ‘hello world’ in field c.
  • 12. 12 01 The Predicate: Boolean searching •  select a, b from tableB where c = ‘(hello world)’ •  Adding parens searches for (hello OR world). •  Supports Solr query syntax inside the parens.
  • 13. 13 01 The Predicate: Range query •  select a, b from tableB where c = ‘[0 TO 100]’
  • 14. 14 01 The Predicate: Arbitrary Boolean clauses •  select a, b from tableB where (c = ‘hello world’ AND d = ‘[0 TO 100]’)
  • 15. 15 01 SQL Syntax: Select Distinct •  select distinct a, b from tableB •  Map/Reduce Implementation: Tuples •  are shuffled to worker nodes where the distinct operation is performed. •  JSON Facet Implementation: distinct operation is pushed down into the search engine •  Map/Reduce for high cardinality •  Facet for high QPS
  • 16. 16 01 Shuffle vs Push Down •  Shuffling: high cardinality and parallel relational algebra (Distributed Joins) •  Pushdown (Facet): blazing fast, high QPS, moderate cardinality •  aggregationMode flag is available with the JDBC driver and http interface [map_reduce or facet]
  • 17. 17 01 Aggregations: Stats •  select count(*), sum(a) from tableA •  Uses the StatsComponent under the covers •  Initial release supports count, sum, avg, min, max •  Aggregation logic is always pushed down into the search engine.
  • 18. 18 01 Aggregations: GROUP BY •  select a, b count(*), sum(c) from tableB group by a, b having count(*) > 50 order by sum(c) desc •  Supports complex having clause: having (count(*) > 50 AND sum(b) < 1000) •  Has Map/Reduce implementation (shuffle) •  And JSON Facet implementation (push down) •  Map/Reduce can handle high cardinality multi- dimension aggregations.
  • 19. 19 01 JDBC Driver •  Ships with Solrj •  Poolable Connection and Statement •  SolrCloud Aware Load Balancing •  Connection has aggregationMode switch [map_reduce or facet]
  • 21. 21 01 SQL Parsing •  Presto SQL Parser handles the parsing •  SQL Statements are compiled to TupleStream objects •  The TupleStream is the base interface of the Streaming API •  The Streaming API is a general purpose parallel computing API for SolrCloud
  • 22. 22 01 Parallel Computing Framework •  Shuffling •  Worker Collections •  Streaming API •  Streaming Expressions •  Parallel SQL
  • 23. 23 01 Shuffling (sorting & partitioning) •  Shuffling is pushed down into the search engine •  Sorting: /export handler “stream sorts” entire result sets. •  Partitioning: HashQParserPlugin, hash partitioning filter. Partitions results on arbitrary fields. •  Tuples (search results) begin streaming instantly to worker nodes. Shuffling never requires a spill to disk. •  All replicas shuffle in parallel for the same query. Allows for massive throughput.
  • 24. 24 01 Shuffling (sorting & partitioning) Worker 2Worker 1 Shard 1 Replica 1 Shard 2 Replica 1 Shard 1 Replica 2 Shard 2 Replica 2 Client Each worker is shuffled ½ the result set Tuples are sorted and partitioned on keys
  • 25. 25 01 Worker Collections •  Are Generic SolrCloud Collections •  Can hold data, or just perform work •  Search results are shuffled to the workers •  Configured with the /stream handler
  • 26. 26 01 Streaming API •  Java Programming API for the parallel computing framework •  Real-time Map/Reduce and Parallel Relational Algebra •  Abstracts search results as Streams of tuples (TupleStream) •  Streams are transformed in parallel by pluggable Decorator streams. •  Parallel transformations include: group by, roll up, union, intersect, complement and join
  • 27. 27 01 Streaming Expressions •  Contributed by Dennis Gove (Bloomberg) •  String Query Language and Serialization format for the Streaming API •  Streaming Expressions compile to TupleStreams •  TupleStreams serialize to Streaming Expressions
  • 28. 28 01 Parallel SQL •  Compiles SQL to a TupleStream •  The TupleStream is serialized to a Streaming Expression and sent to worker nodes. •  Worker nodes translate the Streaming Expression back into TupleStream •  Worker nodes open() and read() the TupleStream in parallel. Tuples are returned from each worker
  • 29. 29 01 From SQL to Streaming Expression select str_s, count(*), sum(field_i), min(field_i), max(field_i), avg(field_i) from collection1 where text='XXXX' group by str_s rollup( search(collection1, q="(text:XXXX)", qt="/export", fl="str_s, field_i", partitionKeys=str_s, sort="str_s asc", zkHost="127.0.0.1:64149/solr"), over=str_s, count(*), sum(field_i), min(field_i), max(field_i), avg(field_i))
  • 30. 30 01 Parallel SQL Shuffle (5 workers, 5 shards, aggregationMode=map_reduce) Client Worker 2 Shard 3 Replica 2 Worker 3Worker 1 Worker 4 Worker 5 Shard 1 Replica 2 Shard 1 Replica 3 Shard 2 Replica 3 Shard 2 Replica 2 Shard 2 Replica 1 Shard 1 Replica 1 Shard 3 Replica 1 Shard 3 Replica 3 Shard 4 Replica 3 Shard 4 Replica 2 Shard 4 Replica 1 Shard 5 Replica 3 Shard 5 Replica 2 Shard 5 Replica 1 /SQL handler
  • 31. 31 01 Jira Tickets •  SOLR-7560: Parallel SQL Support •  SOLR-7377: Solr Streaming Expressions •  SOLR-7082: Streaming Aggregation for SolrCloud •  SOLR-7441: Improve overall robustness of the Streaming stack: Streaming API, Streaming Expressions, Parallel SQL
  • 32. 32 01 Getting Involved • SQL is in Trunk • Releasing with Solr 6 • Streaming API and Streaming Expressions are located in the Solrj libraries (solrj.io) • Patches welcome • Testers and feedback needed