Bringing Reusability to Enterprise Search
       Using Solr to build a reusable enterprise search engine

       A Collabor Labs Technology Paper, May 2011




       This whitepaper discusses the high-level technical aspects of using Solr to bring
       reusability to enterprise search implementations.




                                                                          Brahmaji Pusuluri
                                                                       Sr. Software Engineer


www.collabor.com                                                           info@collabor.com
Using Lucene for enterprise search

Apache Lucene(TM) is a high-performance, full-featured text search engine library written
entirely in Java. It is suitable for nearly any application that requires full-text search.




Solr - the reusable enterprise search engine

Solr is a popular, fast open source enterprise search platform from the Apache Lucene
project. It wraps Lucene in a server that processes HTTP requests for indexing and
querying documents. Thus, an application anywhere can index and query documents over
the Internet via XML over HTTP, using the URL of your Solr search server. Solr is also a
highly optimized search server with caching and replication to other Solr search servers,
and it can index rich documents (e.g., Word, PDF).




Working with Solr

Once Solr is installed successfully, we need to modify the following files as per the project
requirements.

solrconfig.xml:
solrconfig.xml contains most of the parameters for configuring Solr itself.

schema.xml:
schema.xml defines which fields your documents can contain and how those fields are
handled when documents are added to the index or when the fields are queried.

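Once these settings are in place, you can send an XML file to Solr for indexing by using
curl:

curl http://localhost:8983/solr/update -F stream.file=/tmp/example.xml

Here example.xml contains documents whose fields match those defined in schema.xml.
Because stream.file names a path on the Solr server itself, remote streaming must be
enabled in solrconfig.xml for this command to work.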
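
As a rough illustration of what such a file might contain, the Python sketch below builds an update message in Solr's XML <add>/<doc>/<field> format. The field names "id" and "name" are assumptions made for the example; they would have to match fields declared in your schema.xml.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Return a Solr XML update message for a list of {field: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for field, value in doc.items():
            # Each field becomes <field name="...">value</field>.
            field_el = ET.SubElement(doc_el, "field", name=field)
            field_el.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")

# Hypothetical document; "id" and "name" are assumed schema fields.
example_xml = build_add_xml([{"id": "1", "name": "Solr & Lucene"}])
print(example_xml)
```

Writing this string to /tmp/example.xml would give a file of the shape the curl command expects. Added documents become searchable only after a commit.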




Research by: Collabor Labs                                                                         Page 2
May 2011                                                   All trademarks belong to their respective owners
Index a DB table directly into Solr using DataImportHandler


Most applications store data in relational databases or XML files and searching over such
data is a common use-case. The DataImportHandler is a Solr contrib that provides a
configuration driven way to import this data into Solr in both "full builds" and using
incremental delta imports.

Edit solrconfig.xml to register the request handler:

<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
 <str name="config">data-config.xml</str>
</lst>
</requestHandler>



The data-config.xml file contains the following.

<dataConfig>
 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
             url="jdbc:mysql://localhost/dbname" user="user-name" password="password"/>
 <document>
  <!-- desc is a reserved word in MySQL, so it is quoted with backticks -->
  <entity name="outer" query="select id, name, `desc` from mytable">
    <field column="id" name="solr_id"/>
    <field column="name" name="solr_name"/>
    <field column="desc" name="solr_desc"/>
    <!-- ${outer.id} is the id column of the current row of the enclosing entity "outer" -->
    <entity name="inner"
            query="select details from another_table where id='${outer.id}'">
        <field column="details" name="solr_details"/>
    </entity>
  </entity>
 </document>
</dataConfig>
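
In a nested entity, a placeholder of the form ${name.column} is resolved against the current row of the enclosing entity whose name attribute matches, so a mismatch between the two goes unnoticed until import time. The Python sketch below is one possible pre-flight check, not part of Solr: it walks a data-config.xml and reports placeholders that do not point at an enclosing entity. The helper name, the sample configuration, and the list of built-in namespaces are assumptions, and the list may be incomplete.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces assumed to be resolved by DIH itself rather than by an enclosing entity.
BUILTIN_NAMESPACES = {"dataimporter", "dih"}
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w-]*)\.")

def unresolved_placeholders(config_xml):
    """Return (entity, namespace) pairs whose ${namespace.column} matches no enclosing entity."""
    problems = []

    def walk(entity, ancestors):
        for namespace in PLACEHOLDER.findall(entity.get("query", "")):
            if namespace not in ancestors and namespace not in BUILTIN_NAMESPACES:
                problems.append((entity.get("name"), namespace))
        # Child entities may refer to this entity and to all of its ancestors.
        for child in entity.findall("entity"):
            walk(child, ancestors | {entity.get("name")})

    root = ET.fromstring(config_xml)
    for top in root.findall("./document/entity"):
        walk(top, set())
    return problems

# Hypothetical configuration: "inner" is fine, "broken" refers to a non-existent entity.
SAMPLE = """
<dataConfig>
  <document>
    <entity name="outer" query="select id from mytable">
      <entity name="inner"
              query="select details from another_table where id='${outer.id}'"/>
      <entity name="broken"
              query="select x from t where id='${parent.id}'"/>
    </entity>
  </document>
</dataConfig>
"""

problems = unresolved_placeholders(SAMPLE)
print(problems)
```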


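Run the full-import command to index the entire table:
http://localhost:8983/solr/dataimport?command=full-import

Run the delta-import command to index only the rows changed since the last import
(delta imports rely on a deltaQuery attribute on the entity, which the sample
configuration above does not define):
http://localhost:8983/solr/dataimport?command=delta-import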
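
The import URLs differ only in their command parameter, and the handler also accepts command=status for checking progress. The Python sketch below shows one way to build these URLs and to read the status value out of a response. The helper names are hypothetical, and the sample response is a trimmed illustration rather than a verbatim Solr reply.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def dataimport_url(base_url, command, **params):
    """Build a DataImportHandler URL such as .../dataimport?command=full-import."""
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/dataimport?{query}"

def import_status(response_xml):
    """Extract the top-level <str name="status"> value from a DIH response."""
    node = ET.fromstring(response_xml).find("./str[@name='status']")
    return node.text if node is not None else None

# Trimmed, illustrative response; a real one carries more elements.
SAMPLE_STATUS = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <str name="status">idle</str>
</response>"""

full_import = dataimport_url("http://localhost:8983/solr", "full-import")
status_url = dataimport_url("http://localhost:8983/solr", "status")
print(full_import)
print(import_status(SAMPLE_STATUS))
```

Fetching the first URL (for example with urllib.request.urlopen) starts the import. The request returns while the import continues in the background, so polling the status URL is one way to tell when it has finished.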


Building a reusable search engine with Solr using multi-core

Multiple cores let a single Solr instance host several indexes, each with its own
configuration and schema, for very different applications, while keeping the convenience
of unified administration. The individual indexes remain fairly isolated, but you can
manage them as a single application, create new indexes on the fly by spinning up new
SolrCores, and even make one SolrCore replace another without ever restarting your
servlet container.

Edit solr.xml to declare the cores, as in the example below.
<solr persistent="true" sharedLib="lib">
<cores adminPath="/admin/cores">
 <core name="application1" instanceDir="app1">
  <property name="dataDir" value="/app1/data" />
  <property name="configName" value="/app1/config.xml" />
  <property name="schemaName" value="/app1/schema.xml" />
 </core>
 <core name="application2" instanceDir="app2" />
</cores>
</solr>


Run the full-import command to index the entire database into application1:
http://localhost:8983/solr/application1/dataimport?command=full-import

Run the delta-import command to index only the changes since the last import into application1:
http://localhost:8983/solr/application1/dataimport?command=delta-import

Run the full-import command to index the entire database into application2:
http://localhost:8983/solr/application2/dataimport?command=full-import

Run the delta-import command to index only the changes since the last import into application2:
http://localhost:8983/solr/application2/dataimport?command=delta-import
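
The four URLs above follow one pattern: the core name from solr.xml becomes a path segment in front of the handler. As a small sketch of that pattern, the Python below reads the core names from a solr.xml in the format used in this paper and derives the import URLs for each core. The function names are hypothetical, and the embedded solr.xml is a shortened copy of the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the solr.xml example, without the <property> elements.
SOLR_XML = """
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="application1" instanceDir="app1"/>
    <core name="application2" instanceDir="app2"/>
  </cores>
</solr>
"""

def core_names(solr_xml):
    """List the name attribute of every <core> element."""
    return [core.get("name") for core in ET.fromstring(solr_xml).iter("core")]

def core_import_urls(base_url, solr_xml, commands=("full-import", "delta-import")):
    """Map each core name to its dataimport URLs, one per command."""
    return {
        name: [f"{base_url}/{name}/dataimport?command={cmd}" for cmd in commands]
        for name in core_names(solr_xml)
    }

urls = core_import_urls("http://localhost:8983/solr", SOLR_XML)
for name, core_urls in urls.items():
    print(name, core_urls)
```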

Searching the indexes

http://localhost:8983/solr/application1/select/?q=searchterm returns the search results
for application1 as an XML document.
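
To make the shape of that XML concrete, the Python sketch below builds a select URL for a given core and extracts the hit count and documents from a response. The helper names are hypothetical, the sample response is a hand-written illustration in Solr's XML response layout, and the parser handles only single-valued fields.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def select_url(base_url, core, query):
    """Build the search URL for one core; the query string is URL-encoded."""
    return f"{base_url}/{core}/select?{urlencode({'q': query})}"

def parse_results(response_xml):
    """Return (numFound, docs), where docs is a list of {field name: text} dicts."""
    result = ET.fromstring(response_xml).find("./result[@name='response']")
    docs = [
        {field.get("name"): field.text for field in doc}  # single-valued fields only
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs

# Hand-written sample; the field names follow the data-config.xml example.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="solr_id">42</str>
      <str name="solr_name">example</str>
    </doc>
  </result>
</response>"""

url = select_url("http://localhost:8983/solr", "application1", "search term")
num_found, docs = parse_results(SAMPLE_RESPONSE)
print(url)
print(num_found, docs)
```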

In this way, a single Solr installation can be reused across multiple enterprise search
implementations.






For more information, contact: info@collabor.com




