Solr on HDFS 
Past, Present, and Future 
Mark Miller, Cloudera
About Me 
Lucene Committer, Solr Committer. 
Works for Cloudera. 
A lot of work on Lucene, Solr, and SolrCloud.
Some Basics 
Solr 
A distributed, fault-tolerant search engine that uses Lucene as its core search library.
HDFS
A distributed, fault-tolerant filesystem that is part of the Hadoop project.
Solr on HDFS 
Wouldn’t it be nice if Solr could run on HDFS?
If you are running other things on HDFS, it simplifies operations.
If you are building indexes with MapReduce, merging them into your cluster becomes easy.
You can also do some other useful things when you are using a shared filesystem.
Most attempts in the past have not really caught on.
Solr on HDFS in the Past
• Apache Blur is one of the more successful marriages of Lucene and HDFS.
• We borrowed some code from Blur to seed Solr on HDFS.
• Others have copied indexes between the local filesystem and HDFS.
• Most people felt that running Lucene or Solr straight on HDFS would be too slow.
How HDFS Writes Data 
(Diagram: Solr writes to HDFS, which stores one local copy and several remote copies.)
An attempt is made to make a local copy and as many remote copies as necessary to satisfy the replication factor configuration.
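To make the placement rule concrete, here is a minimal, rack-unaware Java sketch. It is not HDFS's actual placement policy (the real default policy also takes racks into account); PlacementSketch, chooseTargets, and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, rack-unaware sketch of the placement rule described on the slide. */
public class PlacementSketch {

    /**
     * Picks the DataNodes that receive one block: the writer's own host first
     * (if it runs a DataNode), then remote nodes until the replication factor is met.
     */
    static List<String> chooseTargets(String writerHost, List<String> dataNodes, int replication) {
        List<String> targets = new ArrayList<>();
        if (replication > 0 && dataNodes.contains(writerHost)) {
            targets.add(writerHost); // local copy: later reads from this host stay local
        }
        for (String node : dataNodes) {
            if (targets.size() >= replication) {
                break;
            }
            if (!node.equals(writerHost)) {
                targets.add(node); // remote copies
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        System.out.println(chooseTargets("node2", nodes, 3));  // [node2, node1, node3]
        System.out.println(chooseTargets("client", nodes, 3)); // [node1, node2, node3]
    }
}
```

When the writer is itself a DataNode, the first copy always lands on that host, which is why running Solr on the DataNodes keeps the common case local.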
Co-Located Solr and HDFS Data Nodes
(Diagram: each node runs both an HDFS DataNode and a Solr node.)
We recommend co-locating HDFS DataNodes and Solr nodes so that, in the default case, Solr reads fast, local data.
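Non-Local Data
• The Block Cache is the first line of defense, but it is still good to make the data local again.
• Optimizing the index rewrites it from the Solr node, creating new local copies, but it is a more painful option.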
• An HDFS affinity feature could be useful. 
• A tool that simply wrote out a copy of the index with no merging might be interesting.
HdfsDirectory 
• Fairly simple and straightforward implementation. 
• Full support required making the Directory interface a first-class citizen in Solr.
• The largest part was making replication work with non-local filesystem directories.
• With large enough buffer sizes, it works reasonably well as long as the data is local.
• It really needs some kind of cache to perform reasonably, though.
“The Block Cache” 
A replacement for the OS filesystem cache, especially when there is no local data.
Even with local data, making the cache larger reduces HDFS traffic in many cases.
(Diagram: the Block Cache sits between Solr and HDFS.)
Inside the Block Cache
ConcurrentLinkedHashMap<BlockCacheKey,BlockCacheLocation>
ByteBuffer[] banks
int numberOfBlocksPerBank
Each ByteBuffer bank holds numberOfBlocksPerBank blocks of size ‘blockSize’.
Used block locations are tracked by a ‘lock’ bitset.
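The fields above can be read as a simple slab allocator. The following is a minimal sketch of that layout, not Solr's actual BlockCache class: it substitutes java.util.concurrent.ConcurrentHashMap for the third-party ConcurrentLinkedHashMap named on the slide and therefore does no LRU eviction, and BlockCacheSketch, store, and fetch are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the V1 Block Cache layout: fixed-size blocks carved out of ByteBuffer banks. */
public class BlockCacheSketch {
    record BlockCacheKey(String fileName, long blockId) {}
    record BlockCacheLocation(int bank, int block) {}

    private final int blockSize;
    private final int numberOfBlocksPerBank;
    private final ByteBuffer[] banks;
    // One bit per block slot across all banks; a set bit means the slot is in use.
    private final BitSet lock;
    // Stand-in for ConcurrentLinkedHashMap: no LRU eviction in this sketch.
    private final Map<BlockCacheKey, BlockCacheLocation> cache = new ConcurrentHashMap<>();

    BlockCacheSketch(int numberOfBanks, int numberOfBlocksPerBank, int blockSize) {
        this.blockSize = blockSize;
        this.numberOfBlocksPerBank = numberOfBlocksPerBank;
        this.banks = new ByteBuffer[numberOfBanks];
        for (int i = 0; i < numberOfBanks; i++) {
            banks[i] = ByteBuffer.allocateDirect(numberOfBlocksPerBank * blockSize); // off-heap bank
        }
        this.lock = new BitSet(numberOfBanks * numberOfBlocksPerBank);
    }

    /** Copies one block into a free slot; returns false if the cache is full. */
    synchronized boolean store(BlockCacheKey key, byte[] data) {
        if (data.length > blockSize) {
            throw new IllegalArgumentException("block larger than blockSize");
        }
        if (cache.containsKey(key)) {
            return true; // Lucene index files are write-once, so a cached block never changes
        }
        int slot = lock.nextClearBit(0);
        if (slot >= banks.length * numberOfBlocksPerBank) {
            return false; // full; the real cache would evict instead
        }
        lock.set(slot);
        BlockCacheLocation loc = new BlockCacheLocation(slot / numberOfBlocksPerBank, slot % numberOfBlocksPerBank);
        ByteBuffer view = banks[loc.bank()].duplicate(); // private position, shared memory
        view.position(loc.block() * blockSize);
        view.put(data);
        cache.put(key, loc);
        return true;
    }

    /** Copies a cached block into dest; returns false on a cache miss. */
    boolean fetch(BlockCacheKey key, byte[] dest) {
        BlockCacheLocation loc = cache.get(key);
        if (loc == null) {
            return false;
        }
        ByteBuffer view = banks[loc.bank()].duplicate();
        view.position(loc.block() * blockSize);
        view.get(dest, 0, Math.min(dest.length, blockSize));
        return true;
    }
}
```

A real cache would also record each block's actual length and reclaim slots on eviction; both are left out to keep the mapping from key to (bank, block) visible.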
The Global Block Cache 
The initial Block Cache implementation used a separate Block Cache for every unique 
index directory used by Solr in HDFS. 
This strategy has many limitations: it hinders capacity planning, it is not very efficient, and it bites you at the worst times.
The Global Block Cache is a single Block Cache shared by all SolrCores for every directory.
This makes sizing very simple: determine how much RAM you can spare for the Block Cache, size it once, and forget it.
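The point of the Global Block Cache is one RAM budget shared by every core, which in turn requires cache keys that identify the directory as well as the file. A minimal sketch under those assumptions, using an LRU LinkedHashMap with on-heap arrays for brevity rather than the off-heap banks of the real cache; GlobalBlockCacheSketch and its Key record are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a single, globally shared block cache: every SolrCore's
 * directory draws from one RAM budget, and the key includes the directory path.
 */
public class GlobalBlockCacheSketch {
    record Key(String directory, String fileName, long blockId) {}

    private final int capacityBlocks;
    private final LinkedHashMap<Key, byte[]> lru;

    GlobalBlockCacheSketch(long ramBudgetBytes, int blockSize) {
        this.capacityBlocks = (int) (ramBudgetBytes / blockSize); // sized once, for all cores
        // accessOrder = true makes LinkedHashMap iterate least-recently-used first
        this.lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest) {
                return size() > capacityBlocks; // evict the LRU block, whichever core it belongs to
            }
        };
    }

    synchronized void put(Key key, byte[] block) {
        lru.put(key, block);
    }

    synchronized byte[] get(Key key) {
        return lru.get(key);
    }

    synchronized int cachedBlocks() {
        return lru.size();
    }
}
```

Because the directory is part of the key, two cores that both have a file named _0.cfs never collide, and no per-directory capacity has to be planned.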
Performance 
In many typical cases, performance looks really good: very comparable to local filesystem performance, though usually somewhat slower.
In other cases, adjusting Block Cache settings can help performance.
We have also recently found some changes that improve performance.
Tuning the Block Cache 
Sizing 
By default, each ‘slab’ is 128 MB. Raise the slab count to grow the cache in 128 MB increments.
Block Size (8 KB default)
Not originally configurable, but some use cases appear to work better with 4 KB blocks.
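The sizing rule is simple arithmetic. A small helper that applies it, assuming the slab size stays at 128 MB when the block size changes (the slide does not say how the two interact); BlockCacheSizing and its methods are hypothetical.

```java
/** Hypothetical sizing helper for the Block Cache numbers quoted on the slide. */
public class BlockCacheSizing {
    static final long MB = 1024L * 1024L;
    static final long SLAB_BYTES = 128 * MB; // default slab size per the slide

    /** Total cache size for a given slab count. */
    static long totalBytes(int slabCount) {
        return slabCount * SLAB_BYTES;
    }

    /** Largest slab count that fits in a RAM budget (whole slabs only). */
    static int slabsForBudget(long ramBudgetBytes) {
        return (int) (ramBudgetBytes / SLAB_BYTES);
    }

    /** Number of blocks in one 128 MB slab for a given block size. */
    static long blocksPerSlab(int blockSizeBytes) {
        return SLAB_BYTES / blockSizeBytes;
    }

    public static void main(String[] args) {
        int slabs = slabsForBudget(4096 * MB); // 4 GB budget -> 32 slabs
        System.out.println(slabs + " slabs = " + totalBytes(slabs) / MB + " MB");
        System.out.println(blocksPerSlab(8 * 1024) + " blocks of 8 KB per slab");
        System.out.println(blocksPerSlab(4 * 1024) + " blocks of 4 KB per slab");
    }
}
```

With a 4 GB budget this gives 32 slabs; each slab holds 16,384 blocks at the 8 KB default, or 32,768 blocks at 4 KB, so smaller blocks mean more (and finer-grained) cache entries for the same RAM.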
HDFS Transaction Log 
We also moved the Transaction Log to HDFS. 
The implementation has held up okay. Some improvements are still needed, and a large replay performance issue has been improved.
HdfsDirectory and the Block Cache have had a much larger impact.
HDFS has no truncate support, so we work around it by replaying the whole log in some failed-recovery cases where the local filesystem implementation simply drops the log.
The autoAddReplicas Feature 
A new feature that is currently available only when using a shared filesystem such as HDFS.
When a node goes down, the Overseer, which monitors the cluster state, fires off SolrCore create commands that point to the existing data in HDFS.
The autoAddReplicas Feature (continued)
(Diagram: four nodes, each running HDFS and Solr; the second node is crossed out as failed.)
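The slides describe this behavior only at a high level. A minimal sketch of the decision, assuming the Overseer knows each replica's node and HDFS data directory; Replica, CreateCommand, planReplacements, and the least-loaded placement rule are hypothetical and not Solr's actual Overseer code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the autoAddReplicas decision made by the Overseer. */
public class AutoAddReplicasSketch {
    record Replica(String coreName, String node, String hdfsDataDir) {}
    record CreateCommand(String coreName, String targetNode, String hdfsDataDir) {}

    /**
     * For every replica whose node is no longer live, plans a core create on the
     * least-loaded live node that points at the replica's existing HDFS data.
     */
    static List<CreateCommand> planReplacements(List<Replica> replicas, Set<String> liveNodes) {
        Map<String, Integer> load = new TreeMap<>(); // sorted, so ties break by node name
        for (String node : liveNodes) {
            load.put(node, 0);
        }
        for (Replica r : replicas) {
            load.computeIfPresent(r.node(), (k, n) -> n + 1);
        }
        List<CreateCommand> commands = new ArrayList<>();
        for (Replica r : replicas) {
            if (liveNodes.contains(r.node()) || load.isEmpty()) {
                continue; // replica is healthy, or there is nowhere to move it
            }
            String target = Collections.min(load.entrySet(), Map.Entry.comparingByValue()).getKey();
            load.merge(target, 1, Integer::sum);
            // No index data is copied: the new core opens the index already stored in HDFS.
            commands.add(new CreateCommand(r.coreName(), target, r.hdfsDataDir()));
        }
        return commands;
    }
}
```

The key property is that the create command reuses hdfsDataDir unchanged, which is only possible because every node can read the same shared filesystem.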
The Future 
At Cloudera, we are building an Enterprise Data Hub. 
In our vision, the more that runs on HDFS, the better. 
We will continue to improve and push forward HDFS support in SolrCloud.
Block Cache Improvements 
Apache Blur has a Block Cache V2. 
It uses variable-sized blocks.
It optionally uses Unsafe for direct memory management.
The V1 Block Cache has some performance limitations:
* Copying bytes from off-heap memory into the IndexInput buffer.
* Concurrent access to the cache.
* Sequential reads have to pull a lot of blocks from the cache.
* Each DirectByteBuffer has some overhead, including a Cleaner object that can affect GC and add to RAM requirements.
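To put numbers on the sequential-read and copy costs, here is a small sketch assuming fixed-size blocks as in V1: reading 1 MB sequentially with 8 KB blocks means 128 separate cache lookups, each followed by a copy from off-heap memory into an on-heap buffer. SequentialReadSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of two V1 Block Cache costs listed on the slide. */
public class SequentialReadSketch {

    /** Number of fixed-size cache blocks that a read of [offset, offset + length) touches. */
    static long blocksTouched(long offset, long length, int blockSize) {
        if (length <= 0) {
            return 0;
        }
        long first = offset / blockSize;
        long last = (offset + length - 1) / blockSize;
        return last - first + 1; // one cache lookup per block
    }

    /** Copies a cached off-heap block into an on-heap, IndexInput-style buffer. */
    static int copyBlock(ByteBuffer offHeapBlock, byte[] heapBuffer, int destOffset) {
        ByteBuffer view = offHeapBlock.duplicate(); // independent position, shared memory
        view.clear();                               // position = 0, limit = capacity
        int n = Math.min(view.remaining(), heapBuffer.length - destOffset);
        view.get(heapBuffer, destOffset, n);        // the extra off-heap to on-heap copy
        return n;
    }
}
```

Variable-sized blocks, as in Blur's V2 cache, attack the first cost by letting one lookup cover a larger sequential range.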
HDFS-Only Replication When Using Replicas
Currently, if you want to use SolrCloud replicas, data is replicated both by HDFS and by Solr.
Setting the HDFS replication factor to 1 is not a very good solution.
autoAddReplicas is one possible solution.
We will be working on another solution where only the leader writes to an index in HDFS while replicas read from it.
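The cost of replicating twice is multiplicative. A quick worked example, assuming an HDFS replication factor of 3 (the HDFS default, though the slide does not state one) and two Solr replicas per shard; CopyCountSketch and the sharedIndex flag are hypothetical.

```java
/** Hypothetical arithmetic for the double replication described on the slide. */
public class CopyCountSketch {

    /**
     * Physical copies of one shard's index stored in HDFS.
     *
     * @param sharedIndex true for the planned mode where only the leader writes
     *                    and replicas read the leader's index
     */
    static int physicalCopies(int hdfsReplicationFactor, int solrReplicasPerShard, boolean sharedIndex) {
        if (sharedIndex) {
            return hdfsReplicationFactor; // one index, replicated only by HDFS
        }
        return hdfsReplicationFactor * solrReplicasPerShard; // every Solr replica writes its own index
    }
}
```

With today's behavior that is 3 × 2 = 6 copies of the same data; in the leader-writes mode it drops to 3, while still leaving multiple Solr replicas available to serve queries.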
The End 
Mark Miller 
@heismark
