SOLR 101
JavaZone 2012, Oslo, Sébastien Muller, Findwise
Agenda
 Introductions
 Enterprise search
 What is Solr, why choose it?
 Solr Terminology
 Main Solr Features
 How it works
 Anatomy of a Query
 Scalability
 Case studies
            Sparebank1
            Komplett Group
Enterprise Search
  Search has become mission critical for most enterprises
            Intranet
            Web presence
            E-commerce


  Exponential growth of data


  Cost of not finding information
            Knowledge (sharing)
            Time
            Money


  Information blackhole
What is Solr?

 Official definition:


 
       “Solr is an open source enterprise search platform based on the
       Lucene Java search library, with an HTTP interface using XML,
       JSON or other formats. It provides hit highlighting, faceted
       search, caching, replication, a web administration interface and
       many more features. It runs in a Java servlet container such as
       Apache Tomcat.”



              http://lucene.apache.org/solr
What is Solr?
  Open-source search engine with no license fees
  Uses Apache Lucene library and adds enterprise search server
   features and capabilities
  Web based application that processes requests and returns
   responses via HTTP
  Easy scalability and great performance
  Industry-tested worldwide
  Modern solution architecture based on XML and Java – easy to work
   with
  Well integrated with the ecosystem around Big Data, such as
   Hadoop (also Nutch, Tika).
Why choose Solr?
  “Buy” > Build


  Open source vs. Commercial solution
            Open source software is free
            Licensed software can be very expensive


  High quality and easily modifiable relevancy

  Very fast query and indexing performance

  Highly flexible data processing/transformation
Why choose Solr?
  Some challenges unique to open source…
            No guaranteed support or bug fixing from community
            No formal quality control or support for upgrades
            Limited support for less experienced developers



  Some benefits unique to open source…
            Widely used and tested
            Access to source code
            Access to development versions and unreleased patches



  Ultimately search is a specialised field and requires specialists
Solr Terminology
  Index(ing)
               Inverted index
  Document
  Field
               Stored and/or indexed fields
  Analysis
               Tokenization
               Filters
               Terms
  Query
               Filter
               Function
               Facet
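The terms above can be illustrated with a toy example. This is only a sketch of the idea of analysis (tokenization plus a lowercase filter) feeding an inverted index; it is not how Lucene stores its index, and the function names are made up for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    """Tokenize on word characters, then apply a lowercase filter."""
    tokens = re.findall(r"\w+", text)
    return [token.lower() for token in tokens]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Two tiny "documents", each with a single text field.
docs = {
    1: "Solr is a search server",
    2: "Lucene is a search library",
}
index = build_inverted_index(docs)
print(index["search"])  # [1, 2]
print(index["solr"])    # [1]
```

A query for a term then becomes a lookup in the index instead of a scan over every document.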
Main Solr Features
  Full text search
  Field search
  Number and date searching
  Facets
  Spelling assistance – “Did you mean…?”
  Replication
             Master/Slave architecture

  Related hits
  Query completion
  Admin GUI
How it works
  Easy configuration through XML
               schema.xml
               solrconfig.xml


  Documents are POSTed via HTTP to Solr
               Add/update
               Delete
               Commit


      Queries and responses are also sent via HTTP
               Choice of formats
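As a rough sketch, the three update operations can be expressed as small XML messages. The helper names below are hypothetical, and the code only builds the message bodies; in a real setup they would be POSTed to the update handler of the Solr instance (assumed here to live under /update).

```python
import xml.etree.ElementTree as ET


def add_message(doc):
    """Build an <add> message for one document given as a dict of fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_message(doc_id):
    """Build a <delete> message that removes a document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def commit_message():
    """Build the <commit/> message that makes pending changes searchable."""
    return "<commit/>"


print(add_message({"id": 1, "title": "Solr 101"}))
print(delete_message(1))
print(commit_message())
```

Adds and deletes are not visible to searches until a commit has been sent.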
Anatomy of a Query
  Common parameters
          start, rows, fl, fq, sort
          http://wiki.apache.org/solr/CommonQueryParameters




 ?q=*:*&start=0&rows=10&fl=title&fq=collection:popular&sort=title asc

   Slightly more advanced
          &facet
          &qf
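A small sketch of how a client might assemble the query shown on this slide. The base URL and the /select handler path are assumptions about the deployment, and build_select_url is a hypothetical helper; the point is that values such as "title asc" need URL encoding before they are sent.

```python
from urllib.parse import urlencode


def build_select_url(base_url, params):
    """Append URL-encoded query parameters to the (assumed) /select handler."""
    return base_url + "/select?" + urlencode(params)


params = [
    ("q", "*:*"),                  # match all documents
    ("start", 0),                  # offset of the first hit
    ("rows", 10),                  # number of hits to return
    ("fl", "title"),               # fields to include in the response
    ("fq", "collection:popular"),  # filter query, does not affect scoring
    ("sort", "title asc"),         # sort order
]
url = build_select_url("http://localhost:8983/solr", params)
print(url)
```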
What is Faceting?
  Navigation/discovery technique


  Tally of docs for each distinct field value


  Parameters
             &facet=true
             &facet.field=category


                                                And so much more…
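The "tally of docs for each distinct field value" can be sketched in a few lines. This only mimics the idea on an in-memory list; Solr computes facet counts from its index, and the documents and the facet_counts helper below are invented for illustration.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each distinct value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


# Hypothetical search hits; the last one has no category and is not counted.
hits = [
    {"id": 1, "category": "laptop"},
    {"id": 2, "category": "phone"},
    {"id": 3, "category": "laptop"},
    {"id": 4},
]
counts = facet_counts(hits, "category")
print(counts.most_common())  # [('laptop', 2), ('phone', 1)]
```

A search UI would typically render these counts as clickable links, each adding an fq filter such as category:laptop to the next query.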
Scalability
  Architecture goals:
            More queries per second (qps)
            Faster query execution
            Bigger indexes
            Faster indexing


  Scaling options
            Multicore
            Replication
            Sharding
Scalability - Multicore
  Running more than one Solr core (index) in one Solr webapp


          <solr persistent="true" sharedLib="lib">
            <cores adminPath="/admin/cores">
              <core name="core0" instanceDir="core0" />
              <core name="core1" instanceDir="core1" />
            </cores>
          </solr>


  http://localhost:8080/solr/admin/cores?action=...
           STATUS
           CREATE
           SWAP
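For illustration, the three actions could be addressed like this. The host and port follow the slide; the core names and the core_admin_url helper are hypothetical, and the parameters shown (core, name, instanceDir, other) are the ones these CoreAdmin actions are commonly called with.

```python
from urllib.parse import urlencode

# Admin path as configured by adminPath in the solr.xml above.
ADMIN_URL = "http://localhost:8080/solr/admin/cores"


def core_admin_url(action, **params):
    """Build a CoreAdmin request URL for the given action and parameters."""
    query = {"action": action, **params}
    return ADMIN_URL + "?" + urlencode(query)


# Report the status of one core.
print(core_admin_url("STATUS", core="core0"))
# Register a new core (core2 is a made-up name).
print(core_admin_url("CREATE", name="core2", instanceDir="core2"))
# Swap two cores, e.g. to put a freshly built index live.
print(core_admin_url("SWAP", core="core0", other="core1"))
```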
Scalability - Replication
  Basic architecture – indexing/querying handled by one instance


  1:1 Master/slave
           Indexing
           Querying


  1:N Master/slaves
           Different user groups
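One way to picture the 1:N setup from the client side: updates always go to the master, queries are spread over the slaves. The class, the host names and the round-robin policy are assumptions made for this sketch; in practice a load balancer usually plays this role.

```python
from itertools import cycle


class ReplicatedCluster:
    """Route updates to the master and rotate queries across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = cycle(slaves)  # endless round-robin iterator

    def url_for_update(self):
        """Indexing is handled by the master only."""
        return self.master + "/update"

    def url_for_query(self):
        """Each call returns the next slave in turn."""
        return next(self._slaves) + "/select"


cluster = ReplicatedCluster(
    "http://master:8983/solr",
    ["http://slave1:8983/solr", "http://slave2:8983/solr"],
)
print(cluster.url_for_update())
print(cluster.url_for_query())
print(cluster.url_for_query())
```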
Scalability - Sharding
  Distributed index
            N masters with index split between them
            Simple hashing to choose index


  Sharding + replication
             N masters with M slaves each
             More shards = faster execution time
             More slaves = higher average QPS




 
      &shards=solr1:8983/solr,solr2:8983/solr&indent=true&q=ipod+solr
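A minimal sketch of both halves of sharding, assuming the two hosts from the slide. At index time a simple hash of the document id picks the shard; at query time the shards parameter lists every shard so that one request searches them all. The function names and the choice of CRC32 as the hash are assumptions for illustration.

```python
import zlib
from urllib.parse import urlencode

SHARDS = ["solr1:8983/solr", "solr2:8983/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick the shard for a document by hashing its id.

    CRC32 is used because, unlike Python's built-in hash() on strings,
    it gives the same result on every run.
    """
    checksum = zlib.crc32(str(doc_id).encode("utf-8"))
    return shards[checksum % len(shards)]


def distributed_query_url(query, shards=SHARDS):
    """Build a query URL that fans out to all shards via the shards parameter."""
    params = {"shards": ",".join(shards), "indent": "true", "q": query}
    # Any shard can receive the request; the first one is used here.
    return "http://" + shards[0] + "/select?" + urlencode(params)


print(shard_for("product-42"))
print(distributed_query_url("ipod solr"))
```

Because the hash depends on the number of shards, adding a shard later changes where documents land, so a reindex is normally needed with this simple scheme.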
Case Studies
SpareBank1 - Background
  SpareBank1 Gruppen
            19 individual localised bank portals and one parent front page


  Boost 25 umbrella project
            Semantic URLs: https://www2.sparebank1.no/9898/3_privat?_nfpb=true&_nfls=false&_pageLabel=page_privat_innhold&pId=1233149354625&_
            New search interface
            Banking app


   CMS with no easy way of tracking individual banks’
    publications
            Mass duplicates
            Access to irrelevant data
SpareBank1 - Requirements
  Customer requirement: “bedre portal søk” (“better portal search”)
SpareBank1 - Requirements
  Basic search features include


            High quality relevance and precision


            Relevant faceting


            Query completion


            Spell check and suggestions


            Search analytics
SpareBank1 – Live Demo




    https://www2.sparebank1.no/
Komplett - Background
  Komplett NO, SE, DK… inWarehouse.se, MPX


  Existing Solr solution
             Mile long query with boosting per field


  Poor relevance
             Peripherals/accessories ranked higher than products


   Limited faceting

   No query completion or spellcheck

   Sloooooow indexing
Komplett - Requirements
  Superior and customisable relevance model


  Much more comprehensive indexing of products and specifications


  Spellcheck


  Query completion


  So much more faceting
Sébastien Muller
sebastien.muller@findwise.com

Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Solr 101

  • 1. SOLR 101 JavaZone 2012, Oslo, Sébastien Muller, Findwise
  • 2. Agenda: Introductions; Enterprise search; What is Solr, why choose it?; Solr Terminology; Main Solr Features; How it works; Anatomy of a Query; Scalability; Case studies (SpareBank1, Komplett Group)
  • 3. Enterprise Search: Search has become mission critical for most enterprises: intranet, web presence, e-commerce; Exponential growth of data; Cost of not finding information: knowledge (sharing), time, money; Information black hole
  • 4. What is Solr? Official definition: “Solr is an open source enterprise search platform based on the Lucene Java search library, with an HTTP interface using XML, JSON or other formats. It provides hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Apache Tomcat.” http://lucene.apache.org/solr
  • 5. What is Solr? Open-source search engine with no licence fees; Uses the Apache Lucene library and adds enterprise search server features and capabilities; Web-based application that processes requests and returns responses via HTTP; Easy scalability and great performance; Industry-tested worldwide; Modern solution architecture based on XML and Java, easy to work with; Well integrated with the ecosystem around Big Data, such as Hadoop (also Nutch, Tika)
  • 6. Why choose Solr? “Buy” > Build; Open source vs. commercial solution: open source software is free, licensed software can be very expensive; High quality and easily modifiable relevancy; Very fast query and indexing performance; Highly flexible data processing/transformation
  • 7. Why choose Solr? Some challenges unique to open source: no guaranteed support or bug fixing from the community, no formal quality control or support for upgrades, limited support for less experienced developers; Some benefits unique to open source: widely used and tested, access to source code, access to development versions and unreleased patches; Ultimately search is a specialised field and requires specialists
  • 8. Solr Terminology: Index(ing); Inverted index; Document; Field; Stored and/or indexed fields; Analysis; Tokenization; Filters; Terms; Query; Filter; Function; Facet
  • 9. Main Solr Features: Full text search; Field search; Number and date searching; Facets; Spelling assistance (“Did you mean…?”); Replication; Master/slave architecture; Related hits; Query completion; Admin GUI
  • 10. How it works: Easy configuration through XML: schema.xml, solrconfig.xml; Documents are POSTed via HTTP to Solr: add/update, delete, commit; Queries and responses are also sent via HTTP; Choice of formats
  • 11. Anatomy of a Query: Common parameters: start, rows, fl, fq, sort (http://wiki.apache.org/solr/CommonQueryParameters), e.g. ?q=*:*&start=0&rows=10&fl=title&fq=collection:popular&sort=title asc; Slightly more advanced: &facet, &qf
  • 12. What is Faceting? Navigation/discovery technique; Tally of docs for each distinct field value; Parameters: &facet=true, &facet.field=category; And so much more…
  • 13. Scalability: Architecture goals: more queries per second (qps), faster query execution, bigger indexes, faster indexing; Scaling options: multicore, replication, sharding
  • 14. Scalability - Multicore: Having more than one Solr core in one Solr webapp:
       <solr persistent="true" sharedLib="lib">
         <cores adminPath="/admin/cores">
           <core name="core0" instanceDir="core0" />
           <core name="core1" instanceDir="core1" />
         </cores>
       </solr>
       Core administration: http://localhost:8080/solr/admin/cores?action=... with actions STATUS, CREATE, SWAP
  • 15. Scalability - Replication: Basic architecture: indexing and querying handled by one instance; 1:1 master/slave: indexing on the master, querying on the slave; 1:N master/slaves: different user groups
  • 16. Scalability - Sharding: Distributed index: N masters with the index split between them, simple hashing to choose index; Sharding + replication: N masters with M slaves each; More shards = faster execution time; More slaves = higher average QPS; Example: &shards=solr1:8983/solr,solr2:8983/solr&indent=true&q=ipod+solr
  • 18. SpareBank1 - Background: SpareBank1 Gruppen; 19 individual localised bank portals and one parent front page; Boost 25 umbrella project; Semantic URLs: https://www2.sparebank1.no/9898/3_privat?_nfpb=true&_nfls=false&_pageLabel=page_privat_innhold&pId=1233149354625&_; New search interface; Banking app; CMS with no easy way of tracking individual banks' publications; Mass duplicates; Access to irrelevant data
  • 19. SpareBank1 - Requirements: Customer requirement: “bedre portal søk” (“better portal search”)
  • 20. SpareBank1 - Requirements: Basic search features include: high quality relevance and precision, relevant faceting, query completion, spell check and suggestions, search analytics
  • 21. SpareBank1 – Live Demo https://www2.sparebank1.no/
  • 22. Komplett - Background: Komplett NO, SE, DK…, inWarehouse.se, MPX; Existing Solr solution: mile-long query with boosting per field, poor relevance, peripherals/accessories ranked higher than products, limited faceting, no query completion or spellcheck, sloooooow indexing
  • 23. Komplett - Requirements: Superior and customisable relevance model; Much more comprehensive indexing of products and specifications; Spellcheck; Query completion; So much more faceting
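
The query parameters from slides 11, 12 and 16 can be combined into ordinary HTTP request URLs. The following is a minimal sketch in Python, not taken from the talk. It assumes a Solr instance reachable at http://localhost:8983/solr that exposes the usual /select request handler. The helper name build_select_url is hypothetical, and the field names title, collection and category are simply the ones used on the slides.

```python
from urllib.parse import urlencode


def build_select_url(base_url, params):
    """Join a Solr base URL and (name, value) pairs into a /select request URL."""
    # urlencode percent-encodes reserved characters and turns spaces into "+",
    # so values such as "title asc" or "*:*" are safe to place in the URL.
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Assumption: a local Solr instance; adjust host, port and path as needed.
BASE_URL = "http://localhost:8983/solr"

# Common parameters (slide 11) plus the faceting parameters (slide 12).
query_params = [
    ("q", "*:*"),                  # match all documents
    ("start", 0),                  # offset of the first hit to return
    ("rows", 10),                  # number of hits to return
    ("fl", "title"),               # fields to include in each hit
    ("fq", "collection:popular"),  # filter query restricting the result set
    ("sort", "title asc"),         # sort order
    ("facet", "true"),             # switch faceting on
    ("facet.field", "category"),   # tally documents per distinct category value
]
query_url = build_select_url(BASE_URL, query_params)

# Distributed query across two shards, using the shards syntax from slide 16.
sharded_url = build_select_url(
    BASE_URL,
    [
        ("shards", "solr1:8983/solr,solr2:8983/solr"),
        ("indent", "true"),
        ("q", "ipod solr"),
    ],
)

print(query_url)
print(sharded_url)
```

Passing the parameters as a list of pairs rather than a dict keeps their order stable and allows a parameter such as fq or facet.field to be repeated.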

Editor's notes

  1.
  2. Intros: me & Solr
  3. Who: Swiss, Sébastien Muller, ex Solr newbie, 1 yr w/ Solr almost daily, several projects
     What: work for Findwise -> Findwizard, information access consultant... Enterprise search!
     Where: Oslo for a year
     Why: Oslo Solr Meetup community -> semi-regular meetings at a pub in Oslo
     INTRODUCTORY TALK
  4. Internally and/or externally, both for finding information and for finding who has the information... E-commerce fails w/out search: can't find what you want to buy = no sale
     A lot of which is unstructured
     According to research performed by Google, approx. 85% of organisations can barely access less than 50% of the data they produce
  5.
  6. General description: the “sales pitch”
     Web-based app: runs in a servlet container and is deployed as a Java WAR, works with all major application servers such as Tomcat or Jetty
  7. No point in building when there are open source and licensed options readily available
     Open source = free, but you might end up spending a lot to get it to do what you want; no vendor lock-in, complete customisability
     Licensed = expensive, and likely to spend a lot still
  8. Although being open source allows for a very low barrier to adoption, there is no Service Level Agreement with an open source community
     An open source, community-based project is likely to yield better long-term QA testing, with people more personally invested in the quality of the project, but life > *
     Issue tracking is public
     Access to source code, but the documentation isn't necessarily comprehensive... google :D
  9. Inverted index: like the index of a book = list of keywords paired with locations -> makes for very fast queries, rather than searching through documents for specific terms
     Document = collection of fields with optional boosting values (a book, a page, a database entry, etc.) -> represented by a single result or hit
     Field = single or multi-valued
     Stored = original pre-analysis value is stored and returnable by queries; necessary for some features, increases index size -> store and retrieve only, unless indexed too
     Indexed = searchable and facetable; will not be returned in a search unless also stored
     Tokenization = breaking up text sequences, filtering and transforming them to generate “terms” that are tied to a specific field
     Filter = restricts the search space by creating a subset of indexed documents against which queries can be made
     Function = contributes to relevance calculations at query time, can be customised
  10.
  11. Schema = defines data and field types
      Solrconfig = search components, replication, request handlers, etc.
      DEMO schema/solrconfig in Notepad++ and DEMO POSTing from NetBeans/browser
  12. Sort on relevance score, value of fields...
      If there is a tie, docs are sorted by date added (index time)
      More shiny examples to follow
  13. Sort on relevance score, value of fields...
      If there is a tie, docs are sorted by date added (index time)
      More shiny examples to follow
  14. Sort on relevance score, value of fields...
      If there is a tie, docs are sorted by date added (index time)
      More shiny examples to follow
  15. Single Solr instance with separate configurations (schema and config) and indexes, while maintaining the convenience of unified administration
      Easy to add new cores or even replace cores with each other
      STATUS, create, swap, unload, alias, rename
      SWAP atomically swaps the names used to access two existing cores. This can be useful for replacing a "live" core with an "ondeck" core, and keeping the old "live" core running in case you decide to roll back.
  16. Basic: good starting point, fine for a small index with few updates and low query load, as all updates will slow down querying
      Master/slave: indexing on one, querying on the other; all replications slow down indexing; the replication interval can be modified; improves query speed/qps
      1:N: more qps, not necessarily more query speed -> both require the index to remain on 1 machine
  17. Updates go to one of N machines; the unique key field must be unique across all shards; a couple of features aren't supported, e.g. More Like This and joins
      Makes it easier to rebalance index operations across more servers
      shards=solr1:8983/solr,solr2:8983/solr&indent=true&q=ipod+solr -> shards parameter syntax -> can be added to a requestHandler specifically for shards
  18. A Norwegian bank and a Norwegian-based e-commerce group of sites
  19. One bank portal of about 1.5k docs, c. 50% were duplicates
      Group publications are made globally available via the CMS, but individual banks are under no obligation to publish articles and there's no indication as to whether they have or not
  20. A Norwegian bank and a Norwegian-based e-commerce group of sites
  21. One bank portal of about 1.5k docs, c. 50% were duplicates
      Group publications are made globally available via the CMS, but individual banks are under no obligation to publish articles and there's no indication as to whether they have or not
  22. Semantic FAIL
  23.
  24.
  25. One bank portal of about 1.5k docs, c. 50% were duplicates
      Group publications are made globally available via the CMS, but individual banks are under no obligation to publish articles and there's no indication as to whether they have or not
  26.
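
Note 9 describes an inverted index as a list of keywords paired with locations, built from terms produced by tokenization. The following is a toy sketch of that idea in Python. It is an illustration only and does not reflect how Lucene actually stores its index. The tokenize function (lowercasing and splitting on non-alphanumeric characters) is a simplified stand-in for Solr's analysis chain, and the sample documents and function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Break text into lowercase terms (a stand-in for a real analysis chain)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents, keyed by document id.
docs = {
    1: "Solr is an enterprise search platform",
    2: "Lucene is a Java search library",
    3: "Solr uses the Lucene library",
}
index = build_index(docs)

print(sorted(search(index, "solr search")))     # [1]
print(sorted(search(index, "Lucene library")))  # [2, 3]
```

A lookup touches only the postings of the query terms instead of scanning every document, which is why the note calls queries against an inverted index very fast.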