Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch

Published in: Technology

  1. Battle of the Giants
     - Rafał Kuć, Sematext Group, Inc.
     - @kucrafal, @sematext, sematext.com
  2. Ich bin ein… (I am a…)
     - Sematext consultant & engineer
     - Solr Cookbook series author
     - "ElasticSearch Server" author
     - "Mastering ElasticSearch" author
     - Solr.pl co-founder
     - Father and husband
     - Copyright 2013 Sematext Group, Inc. All rights reserved
  3. (no transcribed text)
  4. Under the Hood
     - Lucene 4.3 (shown once for each engine)
  5. Expectations
     - Scalability
     - Fault tolerance
     - High availability
     - Features
     - Manageability
     - Ease of installation
     - Tools
     - Support
  6. Expectations vs Reality
     - ElasticSearch: only ElasticSearch nodes; single leader
     - Solr: Solr + ZooKeeper; leader per shard
     - Distributed
     - Fault tolerant
     - Automatic leader election
  7. All Time Top Committers
  8. Active Contributors
  9. The Code
  10. The Mailing Lists
  11. Trends
  12. Collection vs Index
      - Collections and indices can be spread among different nodes in the cluster
      - Solr: Collection, the main logical index
      - ElasticSearch: Index, the main logical structure
  13. Apache Solr Index Structure
      - Fields and types defined in schema
      - Automatic value copying
      - Dynamic fields
      - Custom similarity
      - Custom postings format
      - Multiple document types require a shared schema
      - Can be read using API
  14. ElasticSearch Index Structure
      - Schema-less
      - Fields and types defined with HTTP API
      - Multi-field support
      - Nested and parent-child documents
      - Custom similarity
      - Custom postings format
      - Multiple documents with different structures
      - Can be read and written using API
  15. Shards and Replicas
      - Many shards
      - 0 or more replicas
      - Replica can become leader
      - Replicas can be created on a live cluster
  16. Configuration
      - Solr: static in solrconfig.xml; can be reloaded with core reload
      - ElasticSearch: static in elasticsearch.yml; changeable at runtime
  17. Discovery
      - Zen Discovery
      - Apache ZooKeeper
  18. Solr & ZooKeeper
      - Requires additional software
      - Prevents split-brain situations
      - Holds collection configurations
      - ZooKeeper ensemble needed
  19. ElasticSearch Zen Discovery
      - Automatic node discovery
      - Multicast and unicast discovery methods
      - Automatic master detection
      - Two-way failure detection
  20. HTTP FTW
      - ElasticSearch: HTTP REST API, or query string for simple queries
      - Apache Solr: HTTP with query string
      - Both provide a specialized Java API
  21. Results Grouping
      - Group on: field value, query result, function query
  22. Prospective Search
      - Called Percolator
      - Matches documents to stored queries
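To make "matches documents to stored queries" concrete, here is a purely conceptual sketch. It is not the ElasticSearch Percolator API: the `ToyPercolator` class, its method names, and the idea of modelling a query as a set of required terms are all assumptions made for illustration.

```python
# Conceptual sketch only: NOT the ElasticSearch percolator API.
# A stored query is modelled as a set of terms that must all occur in the document.

def tokenize(text):
    """Lowercase whitespace tokenizer, a stand-in for real text analysis."""
    return set(text.lower().split())


class ToyPercolator:
    def __init__(self):
        self.queries = {}  # query name -> set of required terms

    def register(self, name, required_terms):
        """Store a query ahead of time, before any document arrives."""
        self.queries[name] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        """Return the names of all stored queries the document matches."""
        terms = tokenize(document_text)
        return sorted(name for name, required in self.queries.items()
                      if required <= terms)


percolator = ToyPercolator()
percolator.register("solr-alerts", ["solr"])
percolator.register("es-sharding", ["elasticsearch", "shards"])
matches = percolator.percolate("ElasticSearch moves shards between nodes")
print(matches)
```

The point of the sketch is the inverted workflow: queries are indexed first, and each incoming document is then run against them.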
  23. Full Text Search Capabilities
      - Variety of queries
      - Control score calculation
      - Different query parsers
      - Advanced Lucene queries
  24. Score Calculation
      - Leverage Lucene scoring
      - Control importance of: documents, queries, terms, phrases
      - Similarity configuration
  25. Apache Solr and Score Influence
      - Index-time boosting
      - Query-time: term boosts, field boosts, phrase boosts, function queries, sub-queries used for boosting
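As a rough illustration of the query-time options on the Solr slide, the sketch below assembles request parameters for the extended DisMax parser. The field names (`title`, `body`, `popularity`, `category`) and the boost values are hypothetical; the parameters themselves (`defType`, `qf`, `pf`, `bf`, `bq`) are standard DisMax/eDisMax parameters.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_query):
    """Build Solr request parameters that influence scoring at query time."""
    # All field names and boost values below are hypothetical examples.
    return {
        "defType": "edismax",
        "q": user_query,              # term boosts can be written inline, e.g. solr^2
        "qf": "title^5 body",         # field boosts
        "pf": "title^10",             # phrase boost
        "bf": "log(popularity)",      # function query added to the score
        "bq": "category:books^2",     # sub-query used for boosting
    }


query_string = urlencode(edismax_params("solr^2 cookbook"))
print("select?" + query_string)
```

`urlencode` takes care of escaping `^`, spaces and parentheses, which is easy to get wrong when pasting such parameters into a URL by hand.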
  26. ElasticSearch and Score Influence
      - Index-time
      - Query-time
      - Different queries provide different boost controls
      - Can calculate distributed term frequencies
      - Negative and positive boosting queries
      - Custom score filters
      - Scripts
  27. ElasticSearch Query Rescore
      - Reorders top N hits by using another query
      - Executed on shards before results are returned to the node handling the request
      - Not executed with scan and count
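A minimal sketch of what a rescore request body might look like, built as a Python dictionary rather than sent to a server. The field name `title`, the weights, and the window size are assumptions chosen for illustration; the overall `rescore` / `window_size` / `rescore_query` layout follows the ElasticSearch rescore feature.

```python
import json


def rescore_request(main_query, rescore_query, window_size=50):
    """Run main_query everywhere, then re-rank only the top window_size hits per shard."""
    return {
        "query": main_query,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": rescore_query,
                "query_weight": 0.7,          # weight of the original score (example value)
                "rescore_query_weight": 1.2,  # weight of the rescore query (example value)
            },
        },
    }


body = rescore_request(
    {"match": {"title": "elasticsearch server"}},
    {"match_phrase": {"title": {"query": "elasticsearch server", "slop": 2}}},
)
print(json.dumps(body, indent=2))
```

The usual motivation is cost: a cheap query selects candidates and a more expensive one (here a phrase query) only touches the top N of them.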
  28. ElasticSearch Nested Objects
      - Indexed as separate documents
      - Stored in the same part of the index as the root doc
      - Hidden from standard queries and filters
      - Need appropriate queries and filters (nested)
      - Top-level documents can be sorted on the basis of nested ones
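The sketch below shows, under stated assumptions, how a nested mapping and a matching `nested` query could be put together. The type name `book`, the `chapters` path, and the `string` field type (appropriate for 2013-era releases) are hypothetical choices, not taken from the slides.

```python
import json


def nested_mapping(doc_type, path, nested_properties):
    """Mapping that marks `path` as nested, so its objects are indexed as hidden documents."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    path: {"type": "nested", "properties": nested_properties}
                }
            }
        }
    }


def nested_query(path, inner_query):
    """Nested objects are invisible to standard queries; wrap the query in `nested`."""
    return {"query": {"nested": {"path": path, "query": inner_query}}}


mapping = nested_mapping("book", "chapters", {"title": {"type": "string"}})
query = nested_query("chapters", {"match": {"chapters.title": "scoring"}})
print(json.dumps(query))
```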
  29. Solr Parent-Child Relationship
      - Used at query time
      - Multi-core joins possible
      - select?q={!join from=parent to=id}color:Yellow
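The join query on this slide contains braces, spaces and an exclamation mark, so it needs URL encoding before it is sent. The helper below is a small sketch of that; `join_query` is a hypothetical function name, and the core name `products` in the usage note is made up. The `fromIndex` local parameter is the one Solr's join parser uses for cross-core joins.

```python
from urllib.parse import urlencode, parse_qs


def join_query(from_field, to_field, inner_query, from_index=None):
    """Build a Solr {!join} query string; from_index enables a join across cores."""
    local_params = f"from={from_field} to={to_field}"
    if from_index is not None:
        local_params += f" fromIndex={from_index}"
    return "{!join " + local_params + "}" + inner_query


q = join_query("parent", "id", "color:Yellow")
url_path = "select?" + urlencode({"q": q})
print(q)
print(url_path)
```

For a multi-core join, something like `join_query("parent", "id", "color:Yellow", from_index="products")` would add `fromIndex=products` to the local parameters.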
  30. ElasticSearch Parent-Child
      - Proper indexing required
      - Indexed as separate documents
      - Standard queries don't return child documents
      - Retrieve parent docs using queries and filters (has_child, has_parent, top_children)
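A sketch of the two halves of the parent-child workflow: indexing a child with a `parent` parameter, and retrieving parents with `has_child`. The index name `sematext` is borrowed from the later slides; the type name `variant`, the ids, and the `color` field are hypothetical.

```python
from urllib.parse import urlencode


def child_index_path(index, child_type, doc_id, parent_id):
    """Path for indexing a child document; the parent id routes it next to its parent."""
    return f"/{index}/{child_type}/{doc_id}?" + urlencode({"parent": parent_id})


def has_child_query(child_type, inner_query):
    """Return parent documents that have at least one child matching inner_query."""
    return {"query": {"has_child": {"type": child_type, "query": inner_query}}}


path = child_index_path("sematext", "variant", "v1", "12345")
query = has_child_query("variant", {"term": {"color": "yellow"}})
print(path)
print(query)
```

This is what "proper indexing required" refers to: the relationship has to be declared when the child is written, unlike the Solr join, which is resolved purely at query time.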
  31. Filters
      - Used to narrow down query results
      - Good candidates for caching and reuse
      - Solr: additive; can use different query parsers; can use local params; narrows down faceting results
      - ElasticSearch: defined using Query DSL; can be used for score calculation; doesn't narrow down faceting results
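The following sketch contrasts the two filter styles. On the Solr side, several `fq` parameters are simply repeated and their results intersect. On the ElasticSearch side, the example uses the `filtered` query form of the 2013-era releases, which is an assumption about the target version. Field names and values are hypothetical.

```python
from urllib.parse import urlencode

# Solr: each fq is an independent, separately cacheable filter; results are intersected.
solr_params = [
    ("q", "title:solr"),
    ("fq", "category:books"),
    ("fq", "price:[10 TO 50]"),
]
solr_query_string = urlencode(solr_params)

# ElasticSearch (2013-era Query DSL, assumed): the filter wraps the query.
es_body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "solr"}},
            "filter": {"term": {"category": "books"}},
        }
    }
}

print("select?" + solr_query_string)
print(es_body)
```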
  32. Faceting
      - Terms
      - Range & query
      - Terms statistics
      - Spatial distance
      - Pivot
      - Histograms
  33. Real Time Or Not?
      - Get not-yet-indexed docs from the transaction log
      - Doesn't need searcher reopening
      - ElasticSearch: separate Get and Multi Get API
      - Solr: separate Realtime Get Handler
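As a sketch of what real-time retrieval requests might look like, the helpers below build the paths and bodies without contacting a server. The index, type and id values reuse the `sematext/test/12345` example from the update slide; the helper names are hypothetical, and the Solr path assumes the handler is registered at `/get`.

```python
from urllib.parse import urlencode


def solr_realtime_get(doc_id):
    """Solr: the realtime get handler, assumed to be registered at /get."""
    return "/solr/get?" + urlencode({"id": doc_id})


def es_get(index, doc_type, doc_id):
    """ElasticSearch Get API: fetch one document by id."""
    return f"/{index}/{doc_type}/{doc_id}"


def es_multi_get(index, doc_type, ids):
    """ElasticSearch Multi Get API: path plus JSON body listing the ids."""
    return f"/{index}/{doc_type}/_mget", {"ids": list(ids)}


solr_path = solr_realtime_get("12345")
es_path = es_get("sematext", "test", "12345")
mget_path, mget_body = es_multi_get("sematext", "test", ["12345", "12346"])
print(solr_path, es_path, mget_path, mget_body)
```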
  34. Data Handling
      - Single and batch indexing supported
      - ElasticSearch: JSON in / JSON out (and YAML)
      - Solr: different formats allowed (XML, JSON, CSV, binary)
  35. Partial Document Updates
      - Not based on LUCENE-3837
      - Server-side doc reindexing
      - Both servers use versioning
      - Decreases network traffic
  36. Apache Solr Partial Doc Update
      - Sent to the standard update handler
      - Requires _version_ field
      - curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]'
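The JSON body in the curl command above can be generated programmatically. This sketch builds the same payload; `atomic_update` is a hypothetical helper, and the `views` field in the second example is made up. The modifiers `set`, `add` and `inc` are the atomic update operations Solr 4 accepts.

```python
import json

ALLOWED_OPS = ("set", "add", "inc")


def atomic_update(doc_id, **field_ops):
    """Build a Solr partial-update payload: only the listed fields are sent."""
    doc = {"id": doc_id}
    for field, (op, value) in field_ops.items():
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported update modifier: {op}")
        doc[field] = {op: value}
    return [doc]


payload = atomic_update("12345", enabled=("set", True))
print(json.dumps(payload))

# Hypothetical second example: increment a counter field.
counter_payload = atomic_update("12345", views=("inc", 1))
```

Because only the modifier and the new value travel over the wire, the server reindexes the document itself, which is the "decreases network traffic" point from the previous slide.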
  37. ElasticSearch Partial Doc Update
      - Special end-point exposed: _update
      - Supports parameters like routing, parent, replication, percolate, etc. (similar to the Index API)
      - Uses scripts to perform document updates
      - curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true } }'
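A short sketch of building the `_update` request from the slide in code. The helper names are hypothetical. The second helper uses a `doc` body for a partial merge instead of a script; whether that form is available depends on the server version, so treat it as an assumption.

```python
import json


def update_path(index, doc_type, doc_id):
    """Path of the dedicated _update end-point for one document."""
    return f"/{index}/{doc_type}/{doc_id}/_update"


def script_update(field, value):
    """Script-based update, mirroring the slide: the value is passed as a parameter."""
    return {
        "script": f"ctx._source.{field} = {field}",
        "params": {field: value},
    }


def doc_update(partial_doc):
    """Assumed alternative: merge a partial document into the stored source."""
    return {"doc": partial_doc}


path = update_path("sematext", "test", "12345")
body = script_update("enabled", True)
print(path)
print(json.dumps(body))
```

Passing the value through `params` rather than concatenating it into the script text keeps the script itself constant across requests.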
  38. Solr Collections API
      - Collection creation, reload, deletion
      - Shard splitting
  39. ElasticSearch Indices REST API
      - Index creation, deletion, closing and opening, refreshing, existence checking
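The index operations listed on the ElasticSearch slide map onto HTTP methods and paths roughly as sketched below. The lookup table and the `index_request` helper are illustrative assumptions; the index name `sematext` is reused from the later slides.

```python
# Operation name -> (HTTP method, path template); a sketch, not a client library.
INDEX_OPERATIONS = {
    "create":  ("PUT",    "/{index}"),
    "delete":  ("DELETE", "/{index}"),
    "close":   ("POST",   "/{index}/_close"),
    "open":    ("POST",   "/{index}/_open"),
    "refresh": ("POST",   "/{index}/_refresh"),
    "exists":  ("HEAD",   "/{index}"),
}


def index_request(operation, index):
    """Return the (method, path) pair for an index-level operation."""
    method, template = INDEX_OPERATIONS[operation]
    return method, template.format(index=index)


for op in INDEX_OPERATIONS:
    print(op, *index_request(op, "sematext"))
```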
  40. Apache Solr Shard Splitting
      - admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
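All Collections API calls share the same shape, an `action` plus action-specific parameters, so a single helper can produce the shard-splitting URL above as well as the other operations. `collections_api` is a hypothetical helper name, and the collection name and sizes in the `CREATE` example are made up.

```python
from urllib.parse import urlencode


def collections_api(action, **params):
    """Build a Solr Collections API path for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return "admin/collections?" + urlencode(query)


split_url = collections_api("SPLITSHARD", collection="collection1", shard="shard1")
create_url = collections_api("CREATE", name="collection2", numShards=2, replicationFactor=1)
print(split_url)
print(create_url)
```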
  41. Cluster State Monitoring
      - Multiple MBeans exposed by JMX
      - Multiple REST end-points exposed to get different statistics
  42. ElasticSearch Statistics API
      - Health and state check
      - Nodes information
      - Cache statistics
      - Segments information
      - Index information
      - Mappings information
      - SPM: "One to rule them all"
  43. ElasticSearch Cluster Settings Update
      - Control rebalancing, recovery, allocation
      - Change cluster configuration properties
  44. ElasticSearch Custom Shard Allocation
      - Cluster level: curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" } }'
      - Index level: curl -XPUT localhost:9200/sematext/_settings/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo" }'
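The two settings bodies above can be generated from lists of addresses and tags. The helper names are hypothetical; the setting keys and values are the ones shown on the slide, and `transient` is the non-persistent counterpart of `persistent` in the cluster settings API.

```python
import json


def cluster_exclude_ips(ips, persistent=True):
    """Cluster-level rule: keep shards off the nodes with the given IP addresses."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"cluster.routing.allocation.exclude._ip": ",".join(ips)}}


def index_include_tags(tags):
    """Index-level rule: allocate this index only to nodes carrying one of the tags."""
    return {"index.routing.allocation.include.tag": ",".join(tags)}


cluster_body = cluster_exclude_ips(["192.168.2.1"])
index_body = index_include_tags(["nodeOne", "nodeTwo"])
print(json.dumps(cluster_body))
print(json.dumps(index_body))
```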
  45. Moving Shards and Replicas
      - Move shards between nodes on demand
      - curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }'
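The reroute body is a list of commands, which lends itself to small builder functions. The sketch below reproduces the body from the slide; the function names are hypothetical, while the command structure follows the slide.

```python
import json


def move(index, shard, from_node, to_node):
    """Command that relocates an already allocated shard to another node."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def allocate(index, shard, node):
    """Command that assigns an unallocated shard to a node."""
    return {"allocate": {"index": index, "shard": shard, "node": node}}


def reroute_body(*commands):
    """Wrap any number of commands into one _cluster/reroute request body."""
    return {"commands": list(commands)}


body = reroute_body(
    move("sematext", 0, "node1", "node2"),
    allocate("sematext", 1, "node3"),
)
print(json.dumps(body, indent=2))
```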
  46. The Verdict
  47. And The Winner Is?
  48. We Are Hiring!
      - Dig Search?
      - Dig Analytics?
      - Dig Big Data?
      - Dig Performance?
      - Dig working with and in open source?
      - We're hiring worldwide! http://sematext.com/about/jobs.html
  49. Thank You!
      - Rafał Kuć, @kucrafal, rafal.kuc@sematext.com
      - Sematext, @sematext, http://sematext.com, http://blog.sematext.com
      - ElasticSearch Server 25% off: MREESS25
