Solr 4.1



      Abhey Gupta
Software Engineer (Java)
Value First Digital Pvt Ltd
Outline
This presentation works through a series of questions to assess the
    usability of Solr in MIS 3:

–   What is Solr ?
–   Why use Solr ?
–   Current Scenario
–   Scope of Improvement
–   Indexing Data
–   Import MIS Data
–   Challenges
–   Demo
–   Query Example
What is Solr?
Solr is the popular, blazing fast open source enterprise search platform
  from the Apache Lucene project.

     •    Its major features include powerful full-text search, hit
         highlighting, faceted search, dynamic clustering, database
         integration, rich document (e.g., Word, PDF) handling, and
         geospatial search.

     •   Solr is highly scalable, providing distributed search and index
         replication, and it powers the search and navigation features
         of many of the world's largest internet sites.
What is Lucene?
Apache Lucene is a high-performance, full-featured text search engine library
written entirely in Java. It is a technology suitable for nearly any application
that requires full-text search, especially cross-platform.

     –   An open source Java-based IR library with best practice indexing
         and query capabilities, fast and lightweight search and indexing.
     –   100% Java (ports to .NET, Perl and other languages exist too).
     –   Stable, mature API.
     –   Continuously improved and tuned over more than 10 years.
     –   Cleanly implemented, easy to embed in an application.
     –   Compact, portable index representation.
     –   Programmable text analyzers, spell checking and highlighting.
     –   Not a crawler or a text extraction tool.
Who uses Lucene/Solr?
Here are five noteworthy public sites that use Solr to handle search:

–   WhiteHouse.gov – The Obama administration's keystone web site is
    Drupal and Solr!
–   Netflix – Solr powers basic movie searching on this extremely busy
    site.
–   Internet Archive – Search this vast repository of music, documents
    and video using Solr.
–   StubHub.com – This ticket reseller uses Solr to help visitors search
    for concerts and sporting events.
–   The Smithsonian Institution – Search the Smithsonian’s collection of
    over 4 million items.
Solr indexing options
Why use Solr?
 Assuming the user has a relational DB, why use Solr? If your use case
requires a person to type words into a search box, you want a text search
engine like Solr.

Databases and Solr have complementary strengths and weaknesses.

SQL supports very simple wildcard-based text search with some simple
normalization like matching upper case to lower case. The problem is that
these are full table scans. In Solr all searchable words are stored in an
"inverted index", which searches orders of magnitude faster.

For details, see the link below:
                   –   http://wiki.apache.org/solr/WhyUseSolr
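
The difference between scanning every row and looking a word up in an
inverted index can be sketched in a few lines. This is only an illustration
of the idea, not how Solr or Lucene is implemented; the sample rows and the
function names are made up for the example, and tokenization is a plain
lowercase split on whitespace.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a database table (id -> text).
ROWS = {
    1: "Your OTP is 4521",
    2: "Flight delayed by two hours",
    3: "Your order has shipped",
}


def scan_search(rows, word):
    """Full scan: every row is inspected for every query."""
    word = word.lower()
    return sorted(
        doc_id for doc_id, text in rows.items() if word in text.lower().split()
    )


def build_inverted_index(rows):
    """Map each lowercased token to the set of row ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in rows.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def index_search(index, word):
    """One dictionary lookup; the rows themselves are never touched."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_inverted_index(ROWS)
print(index_search(INDEX, "your"))  # same ids as scan_search(ROWS, "your")
```

The index is built once at indexing time, so the cost of a query no longer
grows with the number of rows that do not match.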
Current Scenario
Currently, MIS 3 uses MySQL FULLTEXT search for text-based search, which
lags behind Solr in terms of query speed and text search capability.



     USER --(1. Full text search)--> MIS UI (80) --(2. Full text search query)--> MYSQL
                                     MIS UI (80) <----------(4. Result)---------- MYSQL

     3. MYSQL: full table scan for the text search
Scope of Improvement
Instead of querying MySQL for text search, we can deploy Solr in between.
Solr returns results from an inverted index, so these queries are fast and
efficient.


     USER --(1. Full text search)--> MIS UI (80) --(2. Full text search query)--> SOLR
                                     MIS UI (80) <----------(4. Result)---------- SOLR

     3. SOLR: scan for the tokenized text in the inverted index
     MYSQL: not queried for text search
Indexing Data in Solr
A Solr index can accept data from many different sources, including XML
files, comma-separated value (CSV) files, data extracted from tables in a
database, and files in common file formats such as Microsoft Word or PDF.

Here are the most common ways of loading data into a Solr index:

    –    Uploading Structured Data Store Data with the Data Import Handler

    –    Using the Solr Cell framework built on Apache Tika for ingesting
         binary files or structured files such as Office, Word, PDF, and other
         proprietary formats.

    –    Uploading XML files by sending HTTP requests to the Solr server
         from any environment where such requests can be generated.

    –    Writing a custom Java application to ingest data through Solr's Java
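
As a rough illustration of the "XML over HTTP" option above, the sketch below
builds a Solr `<add>` message and prepares the POST request without sending
it. The URL, the field names `id` and `text`, and the helper names are
assumptions for the example; real field names must match the schema of the
target index, and the update path depends on how the Solr core is set up.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed local Solr URL; adjust host, port and core name for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_add_xml(docs):
    """Build a Solr <add> message from a list of {field: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) the HTTP POST carrying the XML payload."""
    return urllib.request.Request(
        url,
        data=build_add_xml(docs),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_update_request([{"id": "m-1", "text": "Your OTP is 4521"}])
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```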
Indexing MIS Data
MIS has structured data on the MIS server and structured files on the services
servers, so we can index data in two ways:

    –    Data Import Handler on the MIS database
           • This is easier to manage, as it needs to be deployed only on
               the MIS servers, which are very few.
           • Data can be imported incrementally (delta import).

    –    Script to import CSV files from services
           • This increases the effort of managing and deploying scripts
               on the services.
           • Partial import needs to be implemented for DLRLOG data.
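
The idea behind an incremental (delta) import can be sketched as follows: keep
the time of the last import and pick up only rows changed since then. In the
Data Import Handler itself this is configured with a `deltaQuery` that compares
a timestamp column against `${dataimporter.last_index_time}`; the Python below
only mimics that logic. The `updated_at` column and the sample rows are
hypothetical.

```python
from datetime import datetime

# Hypothetical rows; "updated_at" stands in for a last-modified column.
ROWS = [
    {"messageid": "m-1", "updated_at": datetime(2013, 1, 1, 10, 0)},
    {"messageid": "m-2", "updated_at": datetime(2013, 1, 1, 12, 0)},
    {"messageid": "m-3", "updated_at": datetime(2013, 1, 2, 9, 0)},
]


def delta_rows(rows, last_index_time):
    """Return only the rows changed after the previous import."""
    return [row for row in rows if row["updated_at"] > last_index_time]


def run_delta_import(rows, last_index_time, now):
    """Pick the changed rows and return the timestamp to store for next time."""
    changed = delta_rows(rows, last_index_time)
    return changed, now


changed, new_marker = run_delta_import(
    ROWS,
    last_index_time=datetime(2013, 1, 1, 11, 0),
    now=datetime(2013, 1, 2, 10, 0),
)
print([row["messageid"] for row in changed])
```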
Indexing Bean
Solr can also index beans directly. In Services we build a bean for
every MT and DLR, and we can import these directly into Solr.

This could create unnecessary load, as the API would index one bean per
message.


      API 15 --(index data call per MT and DLR)--> SOLR
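
One way to picture the load concern is to count index calls. The sketch below
compares against a buffered variant that sends documents in batches; the
`send` callable is a hypothetical stand-in for whatever client call would post
documents to Solr, and the batch size of 100 is an arbitrary choice for the
example.

```python
class BatchingIndexer:
    """Collects documents and hands them to `send` in batches."""

    def __init__(self, send, batch_size=100):
        self.send = send
        self.batch_size = batch_size
        self.pending = []

    def add(self, doc):
        """Buffer one document; flush when the batch is full."""
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered as a single call."""
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


calls = []  # each entry is one simulated index call
indexer = BatchingIndexer(send=calls.append, batch_size=100)
for i in range(250):
    indexer.add({"messageid": f"m-{i}"})
indexer.flush()
print(len(calls))  # index calls needed for 250 messages
```

Indexing one bean per message would need 250 calls for the same input.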
Data Import Handler
We can import data from MySQL into the Solr index. This can be done in two
 ways: distributed or central.
     Distributed                     Central

     MYSQL --> SOLR                  MYSQL --+
     MYSQL --> SOLR                  MYSQL --+--> SOLR
     MYSQL --> SOLR                  MYSQL --+
     MYSQL --> SOLR                  MYSQL --+
Import CSV
We can import data into the Solr index from each service. This can also be
 done in two ways: distributed or central.
     Service 1 --> SCRIPT --> SOLR
     Service 1 --> SCRIPT --> SOLR
     Service 1 --> SCRIPT --> SOLR
     …..........
     Service 1 --> SCRIPT --> SOLR
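
A per-service import script mainly has to remember which CSV files it has
already sent. The sketch below shows one possible approach using a small JSON
manifest; the manifest format, the function names and the `upload` callable
are assumptions for illustration (in practice `upload` would POST the file
contents to Solr's CSV update handler).

```python
import json
from pathlib import Path


def load_manifest(manifest_path):
    """Return the set of file names already imported (empty if no manifest)."""
    path = Path(manifest_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_manifest(manifest_path, processed):
    """Persist the processed file names as a sorted JSON list."""
    Path(manifest_path).write_text(json.dumps(sorted(processed)))


def import_new_files(csv_dir, manifest_path, upload):
    """Upload each CSV not yet in the manifest; record it once upload returns."""
    processed = load_manifest(manifest_path)
    imported = []
    for csv_file in sorted(Path(csv_dir).glob("*.csv")):
        if csv_file.name in processed:
            continue
        upload(csv_file.read_bytes())
        processed.add(csv_file.name)
        save_manifest(manifest_path, processed)  # saved per file, so a crash loses little
        imported.append(csv_file.name)
    return imported
```

Calling `import_new_files` twice on the same directory uploads each file only
once; the second call returns an empty list.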
Challenges
Every import scenario trades its advantages against some disadvantages and
challenges.

For example:

    –   DIH: The Data Import Handler requires SQL joins to import data
        from mttextlog, mtlog and dlrlog.

           •   Alternatively, we can get the message IDs from Solr and then
               query MySQL for the complete data with an IN clause on
               message ID.

    –   CSV Import: It requires scripts to be deployed on every service
        server, plus a lot of bookkeeping of which files have been
        processed and which have not.

    –   BEAN Import: It requires changes at API level and could result into
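
The "message IDs from Solr, full rows from MySQL" alternative in the DIH item
above amounts to a parameterized IN query. A minimal sketch, using an
in-memory SQLite database as a stand-in for MySQL so that it runs anywhere;
the `mttextlog` columns and the sample data are hypothetical, and a MySQL
driver would typically use `%s` placeholders instead of `?`.

```python
import sqlite3


def fetch_messages(conn, message_ids):
    """Fetch full rows for the given message ids with one IN (...) query."""
    if not message_ids:
        return []
    placeholders = ", ".join("?" for _ in message_ids)
    sql = (
        "SELECT messageid, text FROM mttextlog "
        f"WHERE messageid IN ({placeholders}) ORDER BY messageid"
    )
    return conn.execute(sql, list(message_ids)).fetchall()


# In-memory SQLite stands in for MySQL here; the columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mttextlog (messageid TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO mttextlog VALUES (?, ?)",
    [
        ("m-1", "Your OTP is 4521"),
        ("m-2", "Flight delayed"),
        ("m-3", "Order shipped"),
    ],
)

ids_from_solr = ["m-1", "m-3"]  # pretend these came back from a Solr query
print(fetch_messages(conn, ids_from_solr))
```

Only the placeholders are interpolated into the SQL string; the ID values
themselves are passed as bound parameters.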
Thank You
