Batch Indexing & Near Real Time,
keeping things fast
Marc Sturlese
Software engineer @ Trovit
About me...
• Marc Sturlese – @sturlese
• Software engineer @Trovit. R&D focused
• Responsible for search and scalability
Agenda
• Who we are
• Batch architecture. Hadoop & Hive
• Near real time architecture. Storm & stuff
• Putting it all together
• Alternatives and Future directions
• Questions
Who we are
Trovit, a search engine for classifieds
Batch Layer
• Hadoop based
• Documents are crunched by a pipeline of MapReduce (MR)
jobs
• Hive stores the stats of each phase
Batch Layer
Pipeline overview
[Diagram. Incoming data and external data enter the Hadoop cluster. Pipeline phases: Ad Processor, Diff, Matching, Expiration, Deduplication, Indexing. Other labels: t – 1, Hive Stats, Lucene Indexes, Deployment.]
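As a rough illustration of a phased pipeline that records stats after every phase, here is a minimal single-process sketch. It is not the MapReduce or Hive code from the talk: the phase bodies, the `expired` field and the stats layout are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]
Phase = Tuple[str, Callable[[List[Doc]], List[Doc]]]


def expire(docs: List[Doc]) -> List[Doc]:
    """Drop documents flagged as expired (hypothetical 'expired' field)."""
    return [d for d in docs if d.get("expired") != "true"]


def deduplicate(docs: List[Doc]) -> List[Doc]:
    """Keep the first document seen for each id."""
    seen = set()
    unique = []
    for doc in docs:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            unique.append(doc)
    return unique


def run_pipeline(docs: List[Doc], phases: List[Phase]) -> Tuple[List[Doc], List[Dict]]:
    """Run the phases in order, recording input/output counts for each one."""
    stats = []
    for name, phase in phases:
        before = len(docs)
        docs = phase(docs)
        # In the talk these per-phase numbers go to Hive; here they stay in a list.
        stats.append({"phase": name, "docs_in": before, "docs_out": len(docs)})
    return docs, stats
```

Calling `run_pipeline(docs, [("expiration", expire), ("deduplication", deduplicate)])` returns the surviving documents together with one stats row per phase.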
Batch Layer
The good things!
• Index always built from scratch. Small number of
big segments
• Multicast deployment sends indexes to
all slaves at the same time
• Convenient backups on HDFS
Batch Layer
That was cool but...
• Not even close to real time
• Crunching documents in batch means waiting until
everything is processed. This can take a few hours
• We want to show the user fresher results!
Near real time Layer
Storm and stuff to the rescue
Near real time Layer
Storm properties
• Distributed real time computation system
• Fault tolerance
• Horizontal scalability
• Low latency
• Reliability
Near real time Layer
Storm in action
[Diagram. Sources: XML feeds and Kafka partitions. Storm topology: Kafka spouts and XML spout, Doc Manager bolt (shuffle grouping), Indexer bolt (fields grouping). Target: Solr prod replicas (slaves).]
Near real time Layer
Storm in action
• Spouts just read and send
• Doc Manager Bolt processes and classifies
• Indexer Bolt adds documents to Solr
• Same logic as the batch layer, with a different implementation
• Careful not to overload Solr slaves...
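The two stream groupings named in the topology can be sketched outside Storm as plain routing functions. This is only an illustration of the idea, not Storm's API: the `core` field used as grouping key and the CRC32 hash are assumptions for the example.

```python
import random
import zlib
from typing import Dict, List


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    """Pick any task at random, so tuples spread evenly on average."""
    return rng.randrange(num_tasks)


def fields_grouping(value: str, num_tasks: int) -> int:
    """Route by a hash of the field value: equal values reach the same task."""
    return zlib.crc32(value.encode("utf-8")) % num_tasks


def route_to_indexers(docs: List[Dict[str, str]], num_tasks: int) -> Dict[int, List[str]]:
    """Assign each document to an indexer task based on its target core."""
    assignment: Dict[int, List[str]] = {}
    for doc in docs:
        task = fields_grouping(doc["core"], num_tasks)
        assignment.setdefault(task, []).append(doc["id"])
    return assignment
```

With fields grouping, every document for a given core lands on the same indexer task, which makes it possible to control how hard each Solr core is hit.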
Near real time Layer
Storm in action. But...
• Solr now has to handle both user queries and Storm
inserts
• Fields grouping on the Indexer Bolt for politeness
• Small bulks to reduce the number of insert requests
• Committing on many cores on the same host at the
same time can be painful
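The "small bulks" point can be illustrated with a buffer that groups documents and issues one request per bulk. This is a sketch under assumptions: `send` is a hypothetical callable standing in for whatever posts a bulk to Solr, and the bulk size is arbitrary.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]


class BulkBuffer:
    """Collects documents and sends them in one request per full bulk."""

    def __init__(self, bulk_size: int, send: Callable[[List[Doc]], None]) -> None:
        self.bulk_size = bulk_size
        self.send = send  # hypothetical: posts one bulk of docs to Solr
        self.pending: List[Doc] = []

    def add(self, doc: Doc) -> None:
        """Buffer a document and flush once the bulk is full."""
        self.pending.append(doc)
        if len(self.pending) >= self.bulk_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; a no-op when the buffer is empty."""
        if self.pending:
            self.send(self.pending)
            self.pending = []
```

Seven documents with a bulk size of three produce two requests while streaming, plus one more when `flush()` is called at the end, instead of seven single-document requests.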
Near real time Layer
Storm in action - Committing
[Diagram. Indexer Bolts (Cars US, Jobs BR) go through a ZooKeeper Locker before committing. Slave 1, Slave 2 ... Slave N host the core replicas: Real estate UK R1, Cars US R1, Cars US R2, Jobs BR R1, Jobs BR R2, Real estate ES R1.]
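One way to read the ZooKeeper locker is as a per-host lock that keeps commits on the same host from overlapping. The sketch below uses an in-process `threading.Lock` as a stand-in for a distributed ZooKeeper lock, and `do_commit` is a hypothetical callable; the real locking scheme is not described in the slides.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class HostCommitLocker:
    """One lock per host, so commits on the same host never overlap."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    def commit(self, host: str, core: str, do_commit: Callable[[str, str], None]) -> None:
        """Run do_commit(host, core) while holding that host's lock."""
        with self._guard:
            host_lock = self._locks[host]
        with host_lock:  # commits for other cores on this host wait here
            do_commit(host, core)
```

Commits aimed at different hosts still run in parallel; only cores sharing a host are serialized.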
Near real time Layer
Storm in action
• Adding documents is now fast
• Keep number of segments small
• Avoid merges on big segments
• Just add new docs (no deletes or updates)
Mixed Architecture
Putting it all together
[Diagram. The near real time diagram (XML feeds and Kafka partitions as sources, Storm topology, Solr prod replicas) extended with HBase doc info ("Exists?" check, bulk add), the MR pipeline and ZooKeeper (zk).]
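The "Exists?" check in the diagram suggests that the NRT layer only adds documents it has not seen before. A minimal sketch of that filter follows, with a plain Python set standing in for the HBase doc info table; the `id` field name is an assumption.

```python
from typing import Dict, Iterable, List, Set

Doc = Dict[str, str]


def select_new_docs(incoming: Iterable[Doc], known_ids: Set[str]) -> List[Doc]:
    """Return docs whose id is unknown, recording each id as it is accepted."""
    fresh = []
    for doc in incoming:
        if doc["id"] in known_ids:
            continue  # already known: updates are left to the batch layer
        known_ids.add(doc["id"])
        fresh.append(doc)
    return fresh
```

Only the documents returned by `select_new_docs` would then be bulk-added to Solr.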
Mixed Architecture
Swapping indexes
• NRT docs might be missing from the new
batch index (they can be fresher than the batch
index being built)
• This can lead to inconsistencies...
Mixed Architecture
Swapping indexes. Time jumps!
Mixed Architecture
Swapping indexes
[Diagram. Timeline with XML feed t, XML feed t+1 and XML feed t+2. Batch indexer: Pipeline t and Pipeline t+1, deployed as Slave t and Slave t+1. Also shown: NRT indexer and HBase.]
Mixed Architecture
Swapping indexes
[Diagram. Same timeline (XML feeds t, t+1, t+2; Pipeline t and Pipeline t+1; Slave t and Slave t+1; NRT indexer, batch indexer, HBase), now with NRT t+1 and NRT t+2 added.]
Mixed Architecture
Swapping indexes
• NRT-indexed docs must be kept in
temporary storage
• Fetch the missing docs from that storage and add
them to the new batch index before the next deploy
• This avoids time jumps
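The reconciliation step can be sketched as follows: before deploying a new batch index, look up which NRT-indexed documents it lacks and append them. A dict stands in for the temporary storage and lists stand in for indexes; both are assumptions for the example.

```python
from typing import Dict, List, Set

Doc = Dict[str, str]


def missing_from_batch(nrt_store: Dict[str, Doc], batch_ids: Set[str]) -> List[Doc]:
    """Docs indexed by the NRT layer that the new batch index does not contain."""
    return [doc for doc_id, doc in sorted(nrt_store.items()) if doc_id not in batch_ids]


def prepare_deploy(batch_docs: List[Doc], nrt_store: Dict[str, Doc]) -> List[Doc]:
    """Append the missing NRT docs so the deployed index does not go back in time."""
    batch_ids = {doc["id"] for doc in batch_docs}
    return batch_docs + missing_from_batch(nrt_store, batch_ids)
```

A user who already saw an NRT document on the old index will still find it after the swap, because the new index is topped up before it goes live.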
Mixed Architecture
Storm and Hadoop
• Near real time inserts, low latency
• Hadoop handles deletes and updates. No rush
on those
• No merges on big segments, so query
response times stay optimal
• Tolerant to human errors
• Temporary loss of accuracy on the NRT layer
Alternatives
SolrCloud - Why not?
• Good for the vast majority of use cases
• Oriented to incremental inserts/updates/deletes.
You pay for real time with segment merges
• We need to deploy full indexes fast (faster than rsync
or HTTP replication)
• Full deploys are now easier with aliases
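For context, a SolrCloud alias is created or repointed through the Collections API `CREATEALIAS` action, so a full deploy can build a new collection and then switch the alias to it. The helper below only builds the request URL; the base URL, alias and collection names are placeholders, and this is a sketch rather than how Trovit does it.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build a Collections API request that points `alias` at `collection`."""
    params = urlencode({"action": "CREATEALIAS", "name": alias, "collections": collection})
    return f"{solr_base}/admin/collections?{params}"
```

Queries keep using the alias name, so swapping the collection behind it needs no client changes.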
Future directions
Lucene real time feature
• Makes docs visible in the index before they are
committed
• Good, but not a must right now for our use case
• Very easy to integrate into the current
architecture
??
Thanks for your attention!
Marc Sturlese
marc@trovit.com
Lucene/Solr Revolution 2013, San Diego, May 1 2013
