cominvent as
                         Enterprise Search Specialists




               Migrating FAST to Solr
                            By Jan Høydahl
               Oslo Enterprise Search MeetUp May 2010

Jan Høydahl

                  ●   IT architect - search,
                      telecom, mobile
                  ●   Helped build FAST's Global
                      Services as first engineer
                  ●   Founder of Cominvent AS
                  ●   Search consultant 10 years







Consulting

    – Cominvent delivers independent search consulting
    – Focus on Apache Lucene/Solr & Microsoft FAST ESP

               Idea –> architecture –> implementation




Commercial Support (Solr/Lucene)

    – When community & mailing list support is not enough..
    – Paid support agreement for Apache Solr/Lucene
    – In cooperation with Lucid Imagination

    – Read more: http://www.cominvent.com/support/




Training

    – Cominvent AS delivers public and on-site training
    – Certified Solr Training Partner for Lucid Imagination
    – Certified FAST ESP Training Partner

    – Read more: http://www.cominvent.com/training/




                                                       Photo: fluidpowerzone.com
Solr course




FAST & Solr are very similar...




Areas of usage




Common features




Common features




Introduction to...


                      ...for FAST people




Apache Solr - characteristics




                                    Search server


(Commercially friendly)




Apache Solr - characteristics




           Modular                  Community




  Contributions & patches
                                    Light weight

Solr-user community growth

    [Chart: Solr-user growth. Messages per month, Jan 2006 to Feb 2010; y-axis 0 to 1600 messages]

Lucene/Solr deployments




    – More: http://wiki.apache.org/solr/PublicServers
                                              Thanks to Lucid Imagination for logo collection
XML/HTTP




Solr Architecture




The Apache Software Foundation




Other ASF Lucene sub-projects

                        – Lucene Java library



                        – Rich document extraction


                        – Crawling web pages



                        – Machine learning
                           • Classification/clustering
                           • Collaborative filtering...
Introduction to...



                      ...for Solr people




FAST ESP – characteristics & key strengths




                                      Security




                       Connectors
FAST ESP – characteristics & key strengths




FAST ESP – characteristics & key strengths

    – Very strong document processing framework
    – Diagram labels: Format Conversion, Language Detection, Linguistic
      Normalization, Entities, Taxonomy, Sentiment, Custom Plug-in,
      Ontology, Search, Alert
    – Sample text shown on the slide:

          PARIS (Reuters) - Venus Williams raced into the second round of the
          $11.25 million French Open Monday, brushing aside Bianka Lamade,
          6-3, 6-3, in 65 minutes.

          The Wimbledon and U.S. Open champion, seeded second, breezed past
          the German on a blustery center court to become the first seed to
          advance at Roland Garros. "I love being here, I love the French
          Open and more than anything I'd love to do well here," the
          American said.

          A first round loser last year, Williams is hoping to progress
          beyond the quarter-finals for the first time in her career.

FAST ESP architecture




The migration...




Migration objectives

    – Possible objectives include:
        •   Lower maintenance cost
        •   Deeper in-house competency
        •   Less dependent on external consultants
        •   Ownership and visibility of source code
        •   Shorter time to market for new features
        •   Bugs fixed faster – or even fix them ourselves
        •   Larger community, mailing lists that work!
        •   More choice in external consultants
        •   Contribute back to Open Source
        •   Lower HW footprint



Migration steps

    – Knowledge gathering & Training
    – Review current features & arch
        • Want to keep all features? Add new?
    – Migration areas:
        •   Index profile
        •   Content
        •   Feeding
        •   Document Processing
        •   Querying
        •   Search middleware?
        •   Admin & Operational
    – What to do in Application space vs Search space?

Feature comparison ESP – Solr (similarities)

               Feature                         ESP                  Solr
 Full-text, boolean, range search,       Yes                 Yes
 sorting, sub-second response, facets,
 did-you-mean, synonyms
 Scaling for QPS                         Add rows            Add rows

 Scaling for document volume             Add columns         Add shards

 Synonyms                                Index/query side    Index/query side

 GEO search                              Yes                 Yes (1.5)

 Boolean query language                  Yes (FQL)           Yes (Lucene or
                                                             (e)DisMax)
 APIs                                    HTTP, Java, .NET,   HTTP, Java, .NET,
                                         C++, PHP            Ruby, Python, PHP,
                                                             Perl, JS

Feature comparison ESP – Solr (differences)

                Feature           ESP                Solr
 Admin server              Yes                No (coming 1.5)

 Processes                 Many (C++, Java,   One WAR in Java
                           Python)            app-server, 100%
                                              Java
 Navigators / Facets       Index-time         Query-time

 Did-you-mean              Dictionary based   Dictionary or
                                              index based
 Feeding                   API only           HTTP POST or API

 Document processing       Pipeline (py)      Simple pipeline
                                              (Java, JS, Groovy,
                                              Jython, JRuby..)
 Multi field querying      Composite fields   DisMax handler


Feature comparison ESP – Solr (differences)

 Feature                            ESP                            Solr
 Relevancy tuning                   Rank profiles, term boosting   Dynamic function queries and boost functions
 XRANK                              XRANK operator                 Function Queries
 Freshness boost                    Freshness in rank profile      Function Queries
 Boost GEO distance                 Rank profile and special       Function Queries
 Major schema or software updates   Cold update, use stage         Stage new content into new Solr core
                                    environment
 Pluggability                       Docprocs, QT/RP (limited),     Everything :) Request Handlers, Query Parsers,
                                    clients                        Docprocs, Rank, Spell, tokenizer++

Feature comparison ESP – Solr (differences)

 Feature                 ESP                                   Solr
 Lemmatization           Can be licensed for many languages    Can be licensed from 3rd party
 Query syntax            and(a:foo, b:bar)                     a:foo AND b:bar
                         i:range(0, 100)                       i:[0 TO 100]
                         d:range(2000-01-01T00:00:00,          d:[2000-01-01T00:00:00Z TO NOW]
                                 2010-03-03T12:00:00)
 Query params            query=                                q=
                         offset=                               start=
                         hits=                                 rows=
                         spell=1                               spellcheck=true
 What fields to return   view=viewname                         fl=title,price,body...
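
The parameter rows in the table can be turned into a small translation helper. This is a sketch only: the method and constant names are made up for illustration, it covers just the four parameters listed above, and it does not rewrite the query string itself (FQL and Lucene syntax differ) or resolve a view= name into an fl= field list.

```ruby
# Maps the FAST ESP query parameters from the table to their Solr names.
# Parameters not listed in the table are passed through unchanged, which
# is an assumption of this sketch.
ESP_TO_SOLR = {
  'query'  => 'q',
  'offset' => 'start',
  'hits'   => 'rows',
  'spell'  => 'spellcheck'
}.freeze

def esp_to_solr_params(esp_params)
  esp_params.each_with_object({}) do |(name, value), solr|
    solr_name = ESP_TO_SOLR.fetch(name, name)
    # ESP uses spell=1 where Solr expects spellcheck=true.
    value = (value.to_s == '1').to_s if name == 'spell'
    solr[solr_name] = value
  end
end

p esp_to_solr_params({ 'query' => 'oslo', 'offset' => 10, 'hits' => 20, 'spell' => 1 })
```

A front end could run its existing ESP parameter hash through such a helper while the rest of the query code is migrated.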

Feature comparison ESP – Solr (differences)

               Feature           ESP                  Solr
 Search XML hierarchy     Yes, scope search    No

 Reports                  Built in analytics   Use 3rd party log
                                               analysis such as
                                               Splunk.com




Your existing FAST system - overview

                       Your web-app


                                      Search middleware?




                                              Graphics diagram: www.microsoft.com
Migrating index profile

    – ESP index profile -> Solr schema.xml
    – Set up field types: use the defaults or create your own
    – Set up the static fields. ESP:



    – Solr equivalent:



    – No need for generic*, use dynamic fields:
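
As a rough illustration of the static and dynamic field declarations that end up in schema.xml, the Ruby sketch below prints a few of them. The field names are hypothetical and not taken from any real index profile, and the type names (string, text, float, int) assume the field types defined in Solr's example schema.

```ruby
# Hypothetical static fields; in practice these come from the index profile.
STATIC_FIELDS = [
  { name: 'id',    type: 'string', indexed: true, stored: true },
  { name: 'title', type: 'text',   indexed: true, stored: true },
  { name: 'price', type: 'float',  indexed: true, stored: true }
].freeze

# Dynamic fields match by name pattern, so new attributes need no schema edit.
DYNAMIC_FIELDS = [
  { name: '*_s', type: 'string', indexed: true, stored: true },
  { name: '*_i', type: 'int',    indexed: true, stored: true }
].freeze

# Renders one declaration, e.g. <field name="id" type="string" .../>.
def field_xml(tag, field)
  attrs = field.map { |key, value| %(#{key}="#{value}") }.join(' ')
  "<#{tag} #{attrs}/>"
end

puts STATIC_FIELDS.map { |field| field_xml('field', field) }
puts DYNAMIC_FIELDS.map { |field| field_xml('dynamicField', field) }
```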



Migrating index profile

    – Composite fields?
        • Solr can use <copyField> to copy multiple fields into
          one, e.g. as we did to map many attributes into one
          field
        • However, to rank with a different boost for each
          field, Solr does not need a composite field. Use the
          DisMax query handler instead. Very powerful!
    – No need to edit schema to add new fields. Using
      dynamic fields, it is easy to e.g. introduce a color facet
      for cars or an Mpixels facet for digital cameras
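
The color and Mpixels facets could look roughly like this from the application side. The sketch assumes dynamic field patterns such as *_s and *_i exist in schema.xml; the document values, the field names color_s and mpixels_i, and the host and port are all illustrative.

```ruby
require 'uri'

# Hypothetical documents: color_s and mpixels_i rely on the assumed
# dynamic field patterns *_s and *_i.
DOCUMENTS = [
  { id: 'car-1', title: 'Family car',     color_s: 'red' },
  { id: 'cam-1', title: 'Compact camera', mpixels_i: 10 }
].freeze

# Builds a select URL that facets on one field at query time.
# Host, port and path are assumptions matching the other examples.
def facet_url(field)
  params = { 'q' => '*:*', 'facet' => 'true', 'facet.field' => field }
  "http://localhost:8080/solr/select?#{URI.encode_www_form(params)}"
end

puts DOCUMENTS.map { |doc| doc.keys.join(', ') }
puts facet_url('color_s')
```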




DisMax query example

    – This Solr query can replace use of composite-field
        • qt=dismax
        • q=oslo
        • qf=title^0.7 highpriorityfields^1.5
          mediumpriorityfields^0.6 lowpriorityfields^0.2
          recallfields^0.0 body^0.0
        • bf=recip(rord(creationDate),1,1000,1000)
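
The same query can be assembled in application code, and the freshness boost can be checked by hand. In the sketch below the field names are the slide's placeholders, the host and port are assumed, and recip is a plain-Ruby mirror of Solr's recip(x,m,a,b) = a/(m*x+b) used only to show how the boost decays; the rord values are hypothetical.

```ruby
require 'uri'

# Field boosts copied from the slide (the field names are its placeholders).
QF_BOOSTS = {
  'title' => 0.7, 'highpriorityfields' => 1.5, 'mediumpriorityfields' => 0.6,
  'lowpriorityfields' => 0.2, 'recallfields' => 0.0, 'body' => 0.0
}.freeze

def dismax_params(query)
  {
    'qt' => 'dismax',
    'q'  => query,
    'qf' => QF_BOOSTS.map { |field, boost| "#{field}^#{boost}" }.join(' '),
    'bf' => 'recip(rord(creationDate),1,1000,1000)'
  }
end

# Plain-Ruby mirror of recip(x, m, a, b) = a / (m * x + b).
def recip(x, m, a, b)
  a.to_f / (m * x + b)
end

# Host, port and path are assumptions matching the other examples.
puts "http://localhost:8080/solr/select?#{URI.encode_www_form(dismax_params('oslo'))}"

# Hypothetical rord values, assuming 1 is the newest creationDate.
[1, 1000, 10_000].each do |rank|
  puts format('rord=%d boost=%.3f', rank, recip(rank, 1, 1000, 1000))
end
```

With these constants the boost is close to 1 for the newest documents and has halved by the time rord reaches 1000.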




Migrating content

    – If using FAST ContentAPI to push programmatically
        • Use Solr's clients (Java, .NET, Ruby, Python, PHP...)
    – If feeding FastXML using FileTraverser
        • Feed as Solr XML using HTTP POST or a POST client




    – If you feed custom XML with XMLMapper
        • Have a look at DIH's import and mapping features
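
For the Solr XML route, an update message is an add element containing one doc per document, each with named field elements. A minimal sketch of generating such a message follows; the helper names are made up, the sample document is illustrative, and multi-valued fields are not handled.

```ruby
# Minimal XML escaping for text nodes and attribute values.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Builds a Solr <add> message from an array of hashes (one per document).
def solr_add_xml(documents)
  docs = documents.map do |doc|
    fields = doc.map do |name, value|
      %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{docs.join}</add>"
end

puts solr_add_xml([{ id: 1, title: 'Fish & chips' }])
```

The resulting string can be saved to a file and posted to the update handler.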


Push Feeding example

    – Feed XML using HTTP POST:
        • curl "http://localhost:8080/solr/update?commit=true" \
               -H "Content-Type: text/xml" \
               --data-binary @mydoc.xml
    – Ruby example:
        • >gem sources -a http://gemcutter.org
          >sudo gem install rsolr
          require 'rsolr'
          solr = RSolr.connect :url=>'http://localhost:8080/solr'
          documents = [{:id=>1, :price=>1.00},
                    {:id=>2, :price=>10.50}]
          solr.add documents
          solr.commit


Pull: DataImportHandler (DIH)




Querying examples

    – http://localhost:8080/solr/select?q=car&fl=id,title




    – Ruby
        • res=solr.select :q=>'roses', :fq=>['red','white']
          res['response']['docs'].each do |doc|
            puts doc['title']
          end
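
Without a client library, the same response structure can be read from Solr's JSON output (wt=json). The sample below is hand-written to resemble such a response and is not real server output; the method name is made up for illustration.

```ruby
require 'json'

# Hand-written sample shaped like a Solr wt=json response; not real output.
SAMPLE_RESPONSE = <<~JSON
  {"responseHeader": {"status": 0},
   "response": {"numFound": 2, "start": 0, "docs": [
     {"id": "1", "title": "Red roses"},
     {"id": "2", "title": "White roses"}]}}
JSON

# Returns the hit count and the title of every returned document.
def summarize(raw_json)
  response = JSON.parse(raw_json).fetch('response')
  [response['numFound'], response['docs'].map { |doc| doc['title'] }]
end

count, titles = summarize(SAMPLE_RESPONSE)
puts "#{count} hits: #{titles.join(', ')}"
```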

Migrating document processing

    – Solr lacks a sophisticated pipeline with entity
      extraction etc. Alternatives:
        • Do extraction in Application space (Ruby)
        • Write own stage in Solr pipeline for simple cases
        • Integrate                 to do more advanced stuff
    – Matchers/extractors
        • LingPipe NamedEntityExtractor inside of OpenPipeline
    – Synonyms:
        • Use Solr's synonym handling index/query side
    – Custom stages:
        • Write a Solr UpdateProcessor (in Java, Jython etc)
    – Got a LOT of custom FAST docproc stages?
        • Have a look at SESAT's PY ProcServer for Solr (GPL)
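
Doing extraction in application space can be as simple as a dictionary lookup before feeding. The sketch below is a toy matcher, not a replacement for a real entity extractor: the dictionary entries are taken from the sample news text on the FAST slide, and the person and location field names are hypothetical.

```ruby
# Toy dictionary matcher run in the application before feeding.
KNOWN_ENTITIES = {
  'person'   => ['Venus Williams', 'Bianka Lamade'],
  'location' => ['Paris', 'Roland Garros']
}.freeze

# Returns a hash of field name => matched entity names (case-insensitive).
def extract_entities(text)
  lowered = text.downcase
  KNOWN_ENTITIES.each_with_object({}) do |(field, names), found|
    hits = names.select { |name| lowered.include?(name.downcase) }
    found[field] = hits unless hits.empty?
  end
end

doc = { 'id' => 'news-1',
        'body' => 'PARIS (Reuters) - Venus Williams raced into the second round.' }
doc = doc.merge(extract_entities(doc['body']))
puts doc.inspect
```

The enriched hash can then be fed to Solr like any other document, with the extra fields available for faceting.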
Migrating linguistics (lemmatization)

    – Solr ships with Stemming instead of Lemmatization
    – Stemming has limitations
        • Biler, bilen, bilene -> bil
          BUT
        • Bøker, bøkene -> bøk; boka, bok -> bok
    – KStem is better. Free with LucidWorks for Solr
    – If you need singular/plural handling only
        • Free dictionaries? Check lucene-hunspell
    – Lemmatization can be licensed from 3rd party
      such as Basistech, who also has language
      identification & entity extraction
    – Language identification also from Sematext
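
The stemming limitation can be reproduced with a toy suffix stripper. This is not the stemmer that ships with Solr; the suffix list is a deliberate simplification chosen to reproduce the Norwegian examples above.

```ruby
# Toy suffix stripper for a few Norwegian noun endings (illustration only).
SUFFIXES = %w[ene en er a].freeze

def toy_stem(word)
  word = word.downcase
  # Strip the first matching suffix, but keep at least three characters.
  suffix = SUFFIXES.find { |s| word.end_with?(s) && word.length - s.length >= 3 }
  suffix ? word.chomp(suffix) : word
end

%w[biler bilen bilene bøker bøkene boka bok].each do |word|
  puts "#{word} -> #{toy_stem(word)}"
end
```

All forms of "bil" collapse to one stem, while the forms of "bok" split into two stems because the vowel changes in the plural. A dictionary-based lemmatizer would map both groups to the same lemma.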

Basistech Rosette for Lucene

    – High-end linguistics capabilities for
      19 languages
    – Language Identification
    – Segmentation and tokenization
    – Lemmatization
    – Noun decompounding
    – Part-of-speech tagging
    – Entity extraction

    – Easily integrated with Lucene/Solr

    – More: http://www.basistech.com/lucene/

Migrating search middleware

    – Using FAST Unity?
        • Consider migrating middleware logic such as external
          source querying and federation to SESAT (AGPL)
    – Using Comperio Front?
        • Ask Comperio for Solr engine support
        • Or migrate custom Q&R formats
    – Or is plain Solr enough?
        • Solr has built-in support for shards
        • A shard query will query multiple shards
          and merge the results into one
        • Add custom processing as Query
          Components in Solr
        • Check contrib & patches!
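
A shard query is an ordinary select request with a shards parameter listing the cores to search. The sketch below only builds such a URL; the host names are hypothetical, and each shard is written as host:port/path without a scheme.

```ruby
require 'uri'

# Builds a distributed query URL. The request is sent to the first shard,
# which is asked to query every listed shard and merge the results.
def shard_query_url(shards, query)
  params = { 'q' => query, 'shards' => shards.join(',') }
  "http://#{shards.first}/select?#{URI.encode_www_form(params)}"
end

puts shard_query_url(['solr1:8080/solr', 'solr2:8080/solr'], 'oslo')
```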

Migrating Front ends

    – Using a middleware with Solr support? Lucky you!
    – If not, consider introducing one now. Look at (Java):




    – If you decide to migrate from FAST Java/.NET APIs
        • Choose SolrJ or SolrNET
        • Query language differences. &fq= instead of filter()
        • Solr facets do not require sessions/state, unlike FAST's
    – Migrate FAST's «views» into named ReqHandler configs
    – Multi lingual: Need to handle title_no, title_en etc... :(
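
One way a front end might cope with per-language fields and filters is to build the parameters from the user's language. This is a sketch under assumptions: the title_<lang> naming follows the title_no and title_en example above, and the method name and filter values are made up.

```ruby
require 'uri'

# Builds DisMax parameters for one language. Filters are passed as an
# array; each entry becomes its own fq parameter in the query string.
def localized_params(query, lang, filters)
  {
    'qt' => 'dismax',
    'q'  => query,
    'qf' => "title_#{lang}",
    'fq' => filters
  }
end

puts URI.encode_www_form(localized_params('bil', 'no', ['category:cars']))
```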

Migrating Web Crawler

    – Solr has no built-in web crawler
        • Instead you can choose from several integrations
    – The Apache Nutch crawler
        • Proven with hundreds of millions of pages
        • http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
    – Apache Droids
        • Still in the incubator, but aims at becoming a full crawler
        • http://incubator.apache.org/droids/
    – Heritrix + Solr (example in the Solr 1.4 book)
    – OpenPipeline has a (very) simple crawler
    – Lucene Connectors Framework
        • Preparing crawler support

Migrating Connectors

    – Solr handles these sources internally through DIH:
        • Database, RSS, Web-services, Local filesystem
    – Additionally through Lucene Connectors Framework:
        • EMC Documentum, FileNet, JDBC, LiveLink, Patriarch
          (Memex), Meridio, SharePoint, RSS
        • New connectors should be written for LCF
    – Another option:
        •
        • Sharepoint, IMAP, Documentum, Vignette, Filesystem



Operations

    –   Solr has no admin-server (coming in 1.5)
    –   Possible to run multiple Tomcat on same server
    –   Multiple cores in same Tomcat – easier migration
    –   No built-in query reports, use 3rd party tools
    –   No built-in monitoring, have a look at



    – Log analysis? Check out




More info




Thank You


               www.cominvent.com



               jh@cominvent.com


               www.twitter.com/cominvent


               linkedin.com/in/janhoy

                  This presentation is licensed under a CC BY-SA license
                  You must attribute Cominvent with name and link

Weitere ähnliche Inhalte

Was ist angesagt?

Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Lucidworks
 
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...scalaconfjp
 
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...CloudxLab
 
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...frank2
 
Installing & Configuring OpenLDAP (Hands On Lab)
Installing & Configuring OpenLDAP (Hands On Lab)Installing & Configuring OpenLDAP (Hands On Lab)
Installing & Configuring OpenLDAP (Hands On Lab)Michael Lamont
 
OpenLDAP configuration brought to Apache Directory Studio
OpenLDAP configuration brought to Apache Directory StudioOpenLDAP configuration brought to Apache Directory Studio
OpenLDAP configuration brought to Apache Directory StudioLDAPCon
 
An introduction to ROP
An introduction to ROPAn introduction to ROP
An introduction to ROPSaumil Shah
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterlucenerevolution
 
In-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkIn-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkKazuaki Ishizaki
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkKazuaki Ishizaki
 

Was ist angesagt? (13)

Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...
 
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
 
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
 
Installing & Configuring OpenLDAP (Hands On Lab)
Installing & Configuring OpenLDAP (Hands On Lab)Installing & Configuring OpenLDAP (Hands On Lab)
Installing & Configuring OpenLDAP (Hands On Lab)
 
OpenLDAP configuration brought to Apache Directory Studio
OpenLDAP configuration brought to Apache Directory StudioOpenLDAP configuration brought to Apache Directory Studio
OpenLDAP configuration brought to Apache Directory Studio
 
An introduction to ROP
An introduction to ROPAn introduction to ROP
An introduction to ROP
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
 
In-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkIn-Memory Evolution in Apache Spark
In-Memory Evolution in Apache Spark
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 

Andere mochten auch

Spirits Industry Tastings &amp; Special Events
Spirits Industry Tastings &amp; Special EventsSpirits Industry Tastings &amp; Special Events
Spirits Industry Tastings &amp; Special EventsEmilyFRyan341
 
Hcmf 2011 Rob Humphrey
Hcmf 2011 Rob Humphrey Hcmf 2011 Rob Humphrey
Hcmf 2011 Rob Humphrey Rob Humphrey
 
Is Your Income Protected?
Is Your Income Protected?Is Your Income Protected?
Is Your Income Protected?vickor
 
Performance development Program - Inenrwealth Corporate Development
Performance development Program - Inenrwealth Corporate DevelopmentPerformance development Program - Inenrwealth Corporate Development
Performance development Program - Inenrwealth Corporate DevelopmentChris Walker
 
English Home Work Oscar Tamara
English Home Work Oscar TamaraEnglish Home Work Oscar Tamara
English Home Work Oscar Tamaraoscar tamara
 
Juarez Strategic Plan Association
Juarez Strategic Plan AssociationJuarez Strategic Plan Association
Juarez Strategic Plan Associationbordertradealliance
 
Operation Al Fajr Iraq Nov 2004
Operation Al Fajr Iraq Nov 2004Operation Al Fajr Iraq Nov 2004
Operation Al Fajr Iraq Nov 2004intelcenter
 
Angielskie metro
Angielskie metroAngielskie metro
Angielskie metroAleksandra
 
Implementing ARIA for Real World Accessibility
Implementing ARIA for Real World AccessibilityImplementing ARIA for Real World Accessibility
Implementing ARIA for Real World AccessibilityJared Smith
 
Interim report Axfood Q3 2010
Interim report Axfood Q3 2010Interim report Axfood Q3 2010
Interim report Axfood Q3 2010Axfood
 
CRM AddOn Dial IT eCast
CRM AddOn Dial IT eCastCRM AddOn Dial IT eCast
CRM AddOn Dial IT eCastpatrick_m
 
Using A Video Off The Internet
Using A Video Off The InternetUsing A Video Off The Internet
Using A Video Off The Internetvera.weber
 
John Baird, General Manager, Freightwatch Mexico
John Baird, General Manager, Freightwatch MexicoJohn Baird, General Manager, Freightwatch Mexico
John Baird, General Manager, Freightwatch Mexicobordertradealliance
 
Year End Report Axfood 2011
Year End Report Axfood 2011Year End Report Axfood 2011
Year End Report Axfood 2011Axfood
 
Il Modulo Nilde Utenti E L’Automazione Di Un Servizio
Il Modulo Nilde Utenti E L’Automazione Di Un ServizioIl Modulo Nilde Utenti E L’Automazione Di Un Servizio
Il Modulo Nilde Utenti E L’Automazione Di Un ServizioBiblioteca Scientifica
 
APSU Drupal Training Personal
APSU Drupal Training PersonalAPSU Drupal Training Personal
APSU Drupal Training PersonalMark Jarrell
 
Deloitte publicatie cloud diner
Deloitte publicatie cloud dinerDeloitte publicatie cloud diner
Deloitte publicatie cloud dinerTheo Slaats
 
Group evaluation of Trapped
Group evaluation of TrappedGroup evaluation of Trapped
Group evaluation of Trappedcallison
 

Andere mochten auch (20)

Spirits Industry Tastings &amp; Special Events
Spirits Industry Tastings &amp; Special EventsSpirits Industry Tastings &amp; Special Events
Spirits Industry Tastings &amp; Special Events
 
Hcmf 2011 Rob Humphrey
Hcmf 2011 Rob Humphrey Hcmf 2011 Rob Humphrey
Hcmf 2011 Rob Humphrey
 
Is Your Income Protected?
Is Your Income Protected?Is Your Income Protected?
Is Your Income Protected?
 
Performance development Program - Inenrwealth Corporate Development
Performance development Program - Inenrwealth Corporate DevelopmentPerformance development Program - Inenrwealth Corporate Development
Performance development Program - Inenrwealth Corporate Development
 
English Home Work Oscar Tamara
English Home Work Oscar TamaraEnglish Home Work Oscar Tamara
English Home Work Oscar Tamara
 
Juarez Strategic Plan Association
Juarez Strategic Plan AssociationJuarez Strategic Plan Association
Juarez Strategic Plan Association
 
Operation Al Fajr Iraq Nov 2004
Operation Al Fajr Iraq Nov 2004Operation Al Fajr Iraq Nov 2004
Operation Al Fajr Iraq Nov 2004
 
Angielskie metro
Angielskie metroAngielskie metro
Angielskie metro
 
Cold Tundra Project Watts
Cold Tundra Project WattsCold Tundra Project Watts
Cold Tundra Project Watts
 
Implementing ARIA for Real World Accessibility
Implementing ARIA for Real World AccessibilityImplementing ARIA for Real World Accessibility
Implementing ARIA for Real World Accessibility
 
Interim report Axfood Q3 2010
Interim report Axfood Q3 2010Interim report Axfood Q3 2010
Interim report Axfood Q3 2010
 
CRM AddOn Dial IT eCast
CRM AddOn Dial IT eCastCRM AddOn Dial IT eCast
CRM AddOn Dial IT eCast
 
Using A Video Off The Internet
Using A Video Off The InternetUsing A Video Off The Internet
Using A Video Off The Internet
 
SocialMedia4SmallBiz_SpotOn
SocialMedia4SmallBiz_SpotOnSocialMedia4SmallBiz_SpotOn
SocialMedia4SmallBiz_SpotOn
 
John Baird, General Manager, Freightwatch Mexico
John Baird, General Manager, Freightwatch MexicoJohn Baird, General Manager, Freightwatch Mexico
John Baird, General Manager, Freightwatch Mexico
 
Year End Report Axfood 2011
Year End Report Axfood 2011Year End Report Axfood 2011
Year End Report Axfood 2011
 
Il Modulo Nilde Utenti E L’Automazione Di Un Servizio
Il Modulo Nilde Utenti E L’Automazione Di Un ServizioIl Modulo Nilde Utenti E L’Automazione Di Un Servizio
Il Modulo Nilde Utenti E L’Automazione Di Un Servizio
 
APSU Drupal Training Personal
APSU Drupal Training PersonalAPSU Drupal Training Personal
APSU Drupal Training Personal
 
Deloitte publicatie cloud diner
Deloitte publicatie cloud dinerDeloitte publicatie cloud diner
Deloitte publicatie cloud diner
 
Group evaluation of Trapped
Group evaluation of TrappedGroup evaluation of Trapped
Group evaluation of Trapped
 

Ähnlich wie Oslo Enterprise MeetUp May 12th 2010 - Jan Høydahl

Migrating Fast to Solr
Migrating Fast to SolrMigrating Fast to Solr
Migrating Fast to SolrCominvent AS
 
Exciting New Alfresco REST APIs
Exciting New Alfresco REST APIsExciting New Alfresco REST APIs
Exciting New Alfresco REST APIsJ V
 
E commerce Search using Apache Solr
E commerce Search using Apache SolrE commerce Search using Apache Solr
E commerce Search using Apache SolrRohan Makkar
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrLucidworks (Archived)
 
Delphi ORM SOA MVC SQL NoSQL JSON REST mORMot
Delphi ORM SOA MVC SQL NoSQL JSON REST mORMotDelphi ORM SOA MVC SQL NoSQL JSON REST mORMot
Oslo Enterprise MeetUp May 12th 2010 - Jan Høydahl

  • 1. cominvent as: Enterprise Search Specialists. Migrating FAST to Solr. By Jan Høydahl. Oslo Enterprise Search MeetUp, May 2010.
  • 2. Jan Høydahl
      ● IT architect: search, telecom, mobile
      ● Helped build FAST's Global Services as first engineer
      ● Founder of Cominvent AS
      ● Search consultant for 10 years
  • 4. Consulting
      – Cominvent delivers independent search consulting
      – Focus on Apache Lucene/Solr & Microsoft FAST ESP
      – Idea -> architecture -> implementation
  • 5. Commercial Support (Solr/Lucene)
      – When community & mailing list support is not enough
      – Paid support agreement for Apache Solr/Lucene
      – In cooperation with Lucid Imagination
      – Read more: http://www.cominvent.com/support/
  • 6. Training
      – Cominvent AS delivers public and on-site training
      – Certified Solr Training Partner for Lucid Imagination
      – Certified FAST ESP Training Partner
      – Read more: http://www.cominvent.com/training/
      – Photo: fluidpowerzone.com
  • 9. FAST & Solr are very similar...
  • 13. Introduction to... ...for FAST people
  • 14. Apache Solr characteristics: search server (commercially friendly)
  • 15. Apache Solr characteristics: modular, community, contributions & patches, lightweight
  • 16. Solr-user community growth [Chart "Solr-user growth": messages per month, January 2006 to February 2010; y-axis 0 to 1600 messages]
  • 17. Lucene/Solr deployments
      – More: http://wiki.apache.org/solr/PublicServers
      – Thanks to Lucid Imagination for logo collection
  • 20. The Apache Software Foundation
  • 21. Other ASF Lucene sub-projects
      – Lucene Java library
      – Rich document extraction
      – Crawling web pages
      – Machine learning
          • Classification/clustering
          • Collaborative filtering...
  • 22. Introduction to... ...for Solr people
  • 23. FAST ESP – characteristics & key strengths: security, connectors
  • 24. FAST ESP – characteristics & key strengths
  • 25. FAST ESP – characteristics & key strengths
      – Very strong document processing framework
      – Pipeline stages: Format Conversion, Language Detection, Linguistic Normalization, Entities, Custom Plug-in, Taxonomy, Sentiment, Ontology
      – Outputs: Search, Alert
      – Sample input: PARIS (Reuters) - Venus Williams raced into the second round of the $11.25 million French Open Monday, brushing aside Bianka Lamade, 6-3, 6-3, in 65 minutes. The Wimbledon and U.S. Open champion, seeded second, breezed past the German on a blustery center court to become the first seed to advance at Roland Garros. "I love being here, I love the French Open and more than anything I'd love to do well here," the American said. A first round loser last year, Williams is hoping to progress beyond the quarter-finals for the first time in her career.
  • 28. Migration objectives
      – Possible objectives include:
          • Lower maintenance cost
          • Deeper in-house competency
          • Less dependent on external consultants
          • Ownership and visibility of source code
          • Shorter time to market for new features
          • Bugs fixed faster, or even fixed by ourselves
          • Larger community, mailing lists that work!
          • More choice in external consultants
          • Contribute back to Open Source
          • Lower HW footprint
  • 29. Migration steps
      – Knowledge gathering & training
      – Review current features & architecture
          • Want to keep all features? Add new?
      – Migration areas:
          • Index profile
          • Content
          • Feeding
          • Document processing
          • Querying
          • Search middleware?
          • Admin & operational
      – What to do in application space vs search space?
  • 30. Feature comparison ESP – Solr (similarities)
      Feature | ESP | Solr
      Full-text, boolean, range search, sorting, sub-second, facets, did-you-mean, synonyms | Yes | Yes
      Scaling for QPS | Add rows | Add rows
      Scaling for document volume | Add columns | Add shards
      Synonyms | Index/query side | Index/query side
      GEO search | Yes | Yes (1.5)
      Boolean query language | Yes (FQL) | Yes (Lucene or (e)DisMax)
      APIs | HTTP, Java, .NET, C++, PHP | HTTP, Java, .NET, Ruby, Python, PHP, Perl, JS
  • 31. Feature comparison ESP – Solr (differences)
      Feature | ESP | Solr
      Admin server | Yes | No (coming 1.5)
      Processes | Many (C++, Java, Python) | One WAR in Java app-server, 100% Java
      Navigators / Facets | Index-time | Query-time
      Did-you-mean | Dictionary based | Dictionary or index based
      Feeding | API only | HTTP POST or API
      Document processing | Pipeline (py) | Simple pipeline (Java, JS, Groovy, Jython, JRuby..)
      Multi field querying | Composite fields | DisMax handler
  • 32. Feature comparison ESP – Solr (differences)
      Feature | ESP | Solr
      Relevancy tuning | Rank profiles, term boosting | Dynamic function queries and boost functions
      XRANK | XRANK operator | Function Queries
      Freshness boost | Freshness in rank profile | Function Queries
      Boost GEO distance | Rank profile and special | Function Queries
      Major schema or software updates | Cold update, use stage environment | Stage new content into new Solr core
      Pluggability | Docprocs, QT/RP (limited), clients | Everything :) Request Handlers, Query Parsers, Docprocs, Rank, Spell, tokenizer++
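The freshness row points to function queries on the Solr side. As a rough illustration, the sketch below evaluates recip(x,m,a,b) = a / (m*x + b), the function that appears as bf in the DisMax example of this deck, for a few reverse ordinals. The helper name and the sample ordinals are made up for illustration; rord gives the newest document the lowest ordinal, so newer documents get a boost closer to 1.

```ruby
# recip(x, m, a, b) = a / (m*x + b), the shape of Solr's recip function query.
# This is a local re-implementation for illustration, not a Solr API call.
def recip(x, m, a, b)
  a.to_f / (m * x + b)
end

# Hypothetical reverse ordinals: 1 = newest document, larger = older.
[1, 1000, 9000].each do |rord|
  puts format('rord=%-5d boost=%.3f', rord, recip(rord, 1, 1000, 1000))
end
```

With m=1, a=1000 and b=1000 the boost decays slowly: a document 1000 positions behind the newest still gets half the boost.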
  • 33. Feature comparison ESP – Solr (differences)
      Feature | ESP | Solr
      Lemmatization | Can be licensed for many languages | Can be licensed from 3rd party
      Query syntax | and(a:foo, b:bar) | a:foo AND b:bar
      Query syntax | i:range(0, 100) | i:[0 TO 100]
      Query syntax | d:range(2000-01-01T00:00:00, 2010-03-03T12:00:00) | d:[2000-01-01T00:00:00Z TO NOW]
      Query params | query= | q=
      Query params | offset= | start=
      Query params | hits= | rows=
      Query params | spell=1 | spellcheck=true
      What fields to return | view=viewname | fl=title,price,body...
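The parameter rows above can be applied mechanically in a front end. The following is a minimal sketch, assuming the incoming FAST-style parameters arrive as a Hash; the helper name fast_params_to_solr is hypothetical. It only renames the four parameters from the table and does not translate the query string itself from FQL to Lucene syntax.

```ruby
require 'uri'

# Parameter names taken from the comparison table above.
FAST_TO_SOLR = {
  'query'  => 'q',
  'offset' => 'start',
  'hits'   => 'rows'
}.freeze

# Hypothetical helper: renames FAST ESP query parameters to Solr ones.
# Parameters not listed in the table are ignored in this sketch.
def fast_params_to_solr(fast_params)
  solr = {}
  fast_params.each do |name, value|
    if name == 'spell'
      # spell=1 becomes spellcheck=true; anything else becomes spellcheck=false
      solr['spellcheck'] = (value.to_s == '1').to_s
    elsif FAST_TO_SOLR.key?(name)
      solr[FAST_TO_SOLR[name]] = value.to_s
    end
  end
  solr
end

solr_params = fast_params_to_solr('query' => 'oslo', 'offset' => 10, 'hits' => 20, 'spell' => 1)
puts URI.encode_www_form(solr_params)
```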
  • 34. Feature comparison ESP – Solr (differences)
      Feature | ESP | Solr
      Search XML hierarchy | Yes, scope search | No
      Reports | Built in analytics | Use 3rd party log analysis such as Splunk.com
  • 35. Your existing FAST system: overview [Diagram: your web-app, search middleware? Graphics: www.microsoft.com]
  • 36. Migrating index profile
      – ESP index profile -> Solr schema.xml
      – Set up field types; use the defaults or create your own
      – Set up the static fields. ESP:
      – Solr equivalent:
      – No need for generic*, use dynamic fields:
  • 37. Migrating index profile
      – Composite fields?
          • Solr can use <copyField> to copy multiple fields into one, e.g. as we did to map many attributes into one field
          • However, to rank with a different boost for each field, Solr does not need a composite field. Use the DisMax query handler instead. Very powerful!
      – No need to edit the schema to add new fields. Using dynamic fields, it is easy to e.g. introduce a color facet for cars or an Mpixels facet for digital cameras
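The dynamic-field point can be sketched from the feeding side. The example below assumes that schema.xml declares dynamicField patterns such as "*_s" (string), "*_i" (integer) and "*_f" (float), as Solr's example schema does; the helper name, document ids and titles are invented for illustration. New attributes then need no schema change, only a suffix that matches a pattern.

```ruby
# Hypothetical helper: adds product-specific attributes under dynamic field
# names, choosing the suffix from the Ruby type of each value.
# Assumes "*_s", "*_i" and "*_f" dynamicField patterns exist in schema.xml.
def with_dynamic_fields(base, attrs)
  doc = base.dup
  attrs.each do |name, value|
    suffix = case value
             when Float   then '_f'
             when Integer then '_i'
             else              '_s'
             end
    doc["#{name}#{suffix}"] = value
  end
  doc
end

car    = with_dynamic_fields({ 'id' => 'car-1', 'title' => 'Family car' }, 'color' => 'red')
camera = with_dynamic_fields({ 'id' => 'cam-1', 'title' => 'Compact camera' }, 'mpixels' => 12.1)

p car     # the color attribute lands in color_s
p camera  # the Mpixels attribute lands in mpixels_f
```

A facet on the new attribute is then requested at query time with facet=true&facet.field=color_s.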
  • 38. DisMax query example
      – This Solr query can replace the use of a composite field
          • qt=dismax
          • q=oslo
          • qf=title^0.7 highpriorityfields^1.5 mediumpriorityfields^0.6 lowpriorityfields^0.2 recallfields^0.0 body^0.0
          • bf=recip(rord(creationDate),1,1000,1000)
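Typed into a browser, the parameters above need URL encoding (spaces, ^, parentheses). A small sketch of building the request URL with Ruby's standard library follows; the base URL matches the localhost examples elsewhere in this deck, and the helper name solr_select_url is hypothetical.

```ruby
require 'uri'

# The DisMax parameters from the slide, unencoded.
DISMAX_PARAMS = {
  'qt' => 'dismax',
  'q'  => 'oslo',
  'qf' => 'title^0.7 highpriorityfields^1.5 mediumpriorityfields^0.6 ' \
          'lowpriorityfields^0.2 recallfields^0.0 body^0.0',
  'bf' => 'recip(rord(creationDate),1,1000,1000)'
}.freeze

# Hypothetical helper: appends the URL-encoded parameters to the select handler.
def solr_select_url(base, params)
  "#{base}/select?#{URI.encode_www_form(params)}"
end

url = solr_select_url('http://localhost:8080/solr', DISMAX_PARAMS)
puts url
```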
  • 39. Migrating content
      – If using FAST ContentAPI to push programmatically
          • Use Solr's clients (Java, .NET, Ruby, Python, PHP...)
      – If feeding FastXML using FileTraverser
          • Feed as Solr XML using HTTP POST or a POST client
      – If you feed custom XML with XMLMapper
          • Have a look at the import and mapping features of DIH (DataImportHandler)
  • 40. Push Feeding example
      – Feed XML using HTTP POST:
          curl "http://localhost:8080/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary @mydoc.xml
      – Ruby example:
          > gem sources -a http://gemcutter.org
          > sudo gem install rsolr

          require 'rsolr'
          # Base URL includes the /solr context path, matching the curl example
          solr = RSolr.connect :url => 'http://localhost:8080/solr'
          documents = [{:id => 1, :price => 1.00}, {:id => 2, :price => 10.50}]
          solr.add documents   # send the documents to the index
          solr.commit          # make them searchable
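The curl example posts a file called mydoc.xml, whose content is not shown on the slide. As a hedged sketch, the following builds a Solr XML add message for the same two documents as the Ruby example, using only the standard library. The helper name solr_add_xml is hypothetical, and the field names must exist in your schema.

```ruby
require 'cgi'

# Hypothetical helper: renders an array of Hashes as a Solr <add> message.
# Field names and values are XML-escaped with CGI.escapeHTML.
def solr_add_xml(documents)
  docs = documents.map do |doc|
    fields = doc.map do |name, value|
      %(<field name="#{CGI.escapeHTML(name.to_s)}">#{CGI.escapeHTML(value.to_s)}</field>)
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{docs.join}</add>"
end

xml = solr_add_xml([{ :id => 1, :price => 1.00 }, { :id => 2, :price => 10.50 }])
puts xml
```

Saving the printed output as mydoc.xml gives a payload in the format the curl command above expects.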
  • 42. Querying examples
      – http://localhost:8080/solr/select?q=car&fl=id,title
      – Ruby
          res = solr.select :q => 'roses', :fq => ['red', 'white']
          res['response']['docs'].each do |doc|
            puts doc['title']
          end
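The Ruby loop above walks res['response']['docs']. To show where that structure comes from without a running server, here is a sketch that parses a hand-written response in the shape Solr returns when wt=json is requested. The sample documents are invented, and the helper name titles_from is hypothetical.

```ruby
require 'json'

# Hand-written sample in the shape of a Solr JSON response (wt=json).
# The documents are made up for illustration.
SAMPLE_RESPONSE = <<~JSON
  {"responseHeader": {"status": 0, "QTime": 1},
   "response": {"numFound": 2, "start": 0, "docs": [
     {"id": "1", "title": "Red roses"},
     {"id": "2", "title": "White roses"}]}}
JSON

# Hypothetical helper: returns the title of every hit in a raw JSON response.
def titles_from(raw_json)
  JSON.parse(raw_json)['response']['docs'].map { |doc| doc['title'] }
end

titles_from(SAMPLE_RESPONSE).each { |title| puts title }
```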
  • 43. Migrating document processing
      – Solr lacks a sophisticated pipeline with entity extraction etc. Alternatives:
          • Do extraction in application space (Ruby)
          • Write your own stage in the Solr pipeline for simple cases
          • Integrate to do more advanced stuff
      – Matchers/extractors
          • LingPipe NamedEntityExtractor inside of OpenPipeline
      – Synonyms:
          • Use Solr's synonym handling, index or query side
      – Custom stages:
          • Write a Solr UpdateProcessor (in Java, Jython etc)
      – Got a LOT of custom FAST docproc stages?
          • Have a look at SESAT's PY ProcServer for Solr (GPL)
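The first alternative, extraction in application space, can be as simple as a dictionary matcher that runs before the document is fed. The sketch below is an assumption-laden illustration: the dictionary, the helper names and the "persons" field are all hypothetical, and a real entity extractor does far more than exact name matching.

```ruby
# Hypothetical dictionary of known person names.
PEOPLE = ['Venus Williams', 'Bianka Lamade', 'Serena Williams'].freeze

# Returns the dictionary entries that occur in the text as whole words,
# ignoring case. Order follows the dictionary, not the text.
def extract_entities(text, dictionary)
  dictionary.select { |name| text =~ /\b#{Regexp.escape(name)}\b/i }
end

# Adds the matches as a multi-valued "persons" field before feeding.
def enrich(doc)
  doc.merge('persons' => extract_entities(doc['body'], PEOPLE))
end

doc = enrich('id' => 'news-1',
             'body' => 'Venus Williams raced into the second round, brushing aside Bianka Lamade.')
p doc['persons']
```

The enriched Hash can then be passed to the Solr client as in the push feeding example.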
  • 44. Migrating linguistics (lemmatization)
      – Solr ships with stemming instead of lemmatization
      – Stemming has limitations
          • Biler, bilen, bilene -> bil BUT
          • Bøker, bøkene -> bøk; boka, bok -> bok
      – Kstem is better. Free with LucidWorks for Solr
      – If you need singular/plural handling only
          • Free dictionaries? Check lucene-hunspell
      – Lemmatization can be licensed from a 3rd party such as Basistech, who also has language identification & entity extraction
      – Language identification is also available from Sematext
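The Norwegian example shows why suffix stripping falls short: the forms of "bil" collapse nicely, but "bøker" and "bok" end up with different stems, so a search for one misses the other. The toy stemmer below reproduces that effect. It is a deliberately naive suffix stripper written for this illustration, not the stemmer that ships with Solr.

```ruby
# encoding: utf-8

# Toy suffix list, longest first. Not the algorithm Solr uses.
SUFFIXES = %w[ene en er a].freeze

# Strips the first matching suffix, keeping at least three characters of stem.
def naive_stem(word)
  SUFFIXES.each do |suffix|
    if word.end_with?(suffix) && word.length - suffix.length >= 3
      return word[0...-suffix.length]
    end
  end
  word
end

%w[biler bilen bilene bøker bøkene boka bok].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
```

A lemmatizer with a dictionary would map both "bøker" and "boka" to "bok", which is the gap the licensed products on this slide fill.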
  • 45. Basistech Rosette for Lucene
      – High-end linguistics capabilities for 19 languages
      – Language identification
      – Segmentation and tokenization
      – Lemmatization
      – Noun decompounding
      – Part-of-speech tagging
      – Entity extraction
      – Easily integrated with Lucene/Solr
      – More: http://www.basistech.com/lucene/
  • 46. Migrating search middleware
      – Using FAST Unity?
          • Consider migrating middleware logic such as external source querying and federation to SESAT (AGPL)
      – Using Comperio Front?
          • Ask Comperio for Solr engine support
          • Or migrate custom Q&R formats
      – Or is plain Solr enough?
          • Solr has built-in support for shards
          • A shard query will query multiple shards and merge the results into one
          • Add custom processing as Query Components in Solr
          • Check contrib & patches!
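A shard query is an ordinary select request with a shards parameter listing the nodes to fan out to; the node that receives the request merges the results. The sketch below builds such a URL. The host names are placeholders and the helper name sharded_select_url is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: builds a distributed query URL. Each shard is given
# as host:port/path without a scheme, joined by commas in the shards parameter.
def sharded_select_url(coordinator, shards, query)
  params = { 'q' => query, 'shards' => shards.join(',') }
  "#{coordinator}/select?#{URI.encode_www_form(params)}"
end

# Placeholder hosts for illustration only.
shards = ['search1.example.com:8080/solr', 'search2.example.com:8080/solr']
url = sharded_select_url('http://search1.example.com:8080/solr', shards, 'oslo')
puts url
```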
  • 47. Migrating Front ends
      – Using a middleware with Solr support? Lucky you!
      – If not, consider introducing one now. Look at (Java):
      – If you decide to migrate from FAST Java/.NET APIs
          • Choose SolrJ or SolrNET
          • Query language differences: &fq= instead of filter()
          • Solr facets do not require sessions/state as FAST's do
      – Migrate FAST's «views» into named request handler configs
      – Multilingual: need to handle title_no, title_en etc... :(
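Two of the points above, filters moving to fq and per-language fields such as title_no and title_en, can be combined in one small front-end sketch. The helper name, the filter field names and the choice of qt=dismax are assumptions for illustration; only the title_<lang> naming comes from the slide.

```ruby
require 'uri'

# Hypothetical helper: builds Solr parameters for a user query in a given
# language. Filters go into repeated fq parameters instead of FAST's filter().
def localized_query_params(user_query, lang, filters)
  {
    'qt' => 'dismax',
    'q'  => user_query,
    'qf' => "title_#{lang}",   # title_no, title_en, ...
    'fq' => filters            # an Array becomes one fq= per element
  }
end

params = localized_query_params('roser', 'no', ['color:red', 'instock:true'])
query_string = URI.encode_www_form(params)
puts query_string
```

Because each fq is a separate parameter, the front end keeps no facet session state between requests.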
  • 48. Migrating Web Crawler
      – Solr has no built-in web crawler
          • Instead you can choose from several integrations
      – The Apache Nutch crawler
          • Proven with hundreds of millions of pages
          • http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
      – Apache Droids
          • Still in the incubator, but aims at becoming a full crawler
          • http://incubator.apache.org/droids/
      – Heritrix + Solr (example in the Solr 1.4 book)
      – OpenPipeline has a (very) simple crawler
      – Lucene Connectors Framework
          • Preparing crawler support
  • 49. Migrating Connectors
      – Solr handles these sources internally through DIH:
          • Database, RSS, Web-services, Local filesystem
      – Additionally through Lucene Connectors Framework (LCF):
          • EMC Documentum, FileNet, JDBC, LiveLink, Patriarch (Memex), Meridio, SharePoint, RSS
          • New connectors should be written for LCF
      – Another option:
          • SharePoint, IMAP, Documentum, Vignette, Filesystem
  • 50. Operations
      – Solr has no admin-server (coming in 1.5)
      – Possible to run multiple Tomcat instances on the same server
      – Multiple cores in the same Tomcat: easier migration
      – No built-in query reports, use 3rd party tools
      – No built-in monitoring, have a look at
      – Log analysis? Check out
  • 52. Thank You
      – www.cominvent.com
      – jh@cominvent.com
      – www.twitter.com/cominvent
      – linkedin.com/in/janhoy
      – This presentation is licensed under the CC-by-sa license. You must attribute Cominvent with name and link