Ruby Day Kraków: Full Text Search with Ferret

Agnieszka Figiel

25th November 2006




Agenda



              full text search implementation options
              tools for Ruby
              Ferret and acts_as_ferret
              searching with Ferret
              overview of index options
              multi search
              more like this




Full Text Search

      A search of a document collection, which examines all of the words
      in every stored document as it tries to match search words supplied
      by the user.



              index
                     tokenize all documents
                     filter out stop words
                     apply stemming
                     apply a term weighting scheme
              search
                     use the index to find all documents matching a query
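      The index and search steps above can be sketched in a few lines of
      plain Ruby. This is a toy illustration under stated assumptions, not
      how Ferret stores its index: STOP_WORDS, the crude suffix-stripping
      stem (a stand-in, not the Porter stemmer) and the TinyIndex class are
      all hypothetical, and the weighting is a simple smoothed TF-IDF.

```ruby
# Toy full-text index: tokenize, filter stop words, stem, weight terms with TF-IDF.
# Hypothetical illustration only; Ferret's real index format is very different.
STOP_WORDS = %w[a an and the of to in is it for on with].freeze

# Lowercase and split on anything that is not a letter or digit.
def tokenize(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# Crude suffix stripping, NOT a real stemmer such as Porter's.
def stem(word)
  word.sub(/(ing|ed|es|s)\z/, "")
end

# Analysis chain: tokenize -> drop stop words -> stem -> drop empty terms.
def analyze(text)
  tokenize(text).reject { |t| STOP_WORDS.include?(t) }.map { |t| stem(t) }.reject(&:empty?)
end

class TinyIndex
  def initialize
    # term => { doc_id => frequency of the term in that document }
    @postings = Hash.new { |h, term| h[term] = Hash.new(0) }
    @doc_count = 0
  end

  def add(doc_id, text)
    @doc_count += 1
    analyze(text).each { |term| @postings[term][doc_id] += 1 }
  end

  # Smoothed inverse document frequency: rarer terms weigh more.
  def idf(term)
    df = @postings.key?(term) ? @postings[term].size : 0
    Math.log((@doc_count + 1.0) / (df + 1.0)) + 1.0
  end

  # Returns [doc_id, score] pairs, best match first (ties broken by doc_id).
  def search(query)
    scores = Hash.new(0.0)
    analyze(query).each do |term|
      next unless @postings.key?(term)
      @postings[term].each { |doc_id, tf| scores[doc_id] += tf * idf(term) }
    end
    scores.sort_by { |doc_id, score| [-score, doc_id] }
  end
end

index = TinyIndex.new
index.add(1, "Full text search with Ferret")
index.add(2, "Searching posts and comments")
index.add(3, "Ferret is inspired by Lucene")
index.search("ferret search").map(&:first) # => [1, 2, 3]
```

      Document 2 matches only because "Searching" and "search" stem to the
      same term; document 1 ranks first because it contains both query terms.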


Database Full Text Index




              MySQL
              PostgreSQL
              MS SQL
              Oracle
              DB2




Search Systems



              Google, Yahoo
              Swish-e (C, Perl API available)
              Lucene (Java, ports for C, C++, .NET, Delphi, Perl, Python,
              PHP, Common Lisp, Ruby)
              Nutch (Lucene + crawler)
              Lucene-WS (Lucene via REST)
              SOLR (Lucene via XML/HTTP and JSON)




Ruby Search Systems




              Hyper Estraier
              Ferret




Ferret




      http://rubyforge.org/projects/ferret

      a text search engine library written for Ruby, inspired by the
      Apache Lucene Java project.




acts_as_ferret


      http://projects.jkraemer.net/acts_as_ferret/wiki

      a plugin for Ruby on Rails built on top of Ferret:
              search across the contents of any Rails model class
              each model has its own index on disk
              search multiple models
              support for Rails Single Table Inheritance
              index attributes or virtual attributes of a model
              indexing can be customized by overriding the to_doc method
              find similar items ('more like this')


Installation



      ferret gem:

      gem install ferret

      acts_as_ferret:
      script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret




Example




      YASB (Yet Another Searchable Blog)
       class Post < ActiveRecord::Base
         has_many :comments
       end

       class Comment < ActiveRecord::Base
         belongs_to :post
       end




Basic post search


      Let’s add a basic search on the Post model:
       class Post < ActiveRecord::Base
         has_many :comments
         acts_as_ferret
       end


      Search posts:
       Post.find_by_contents(search_term)
      The index for the Post model is created the first time a search is
      run. If no additional options are given, ALL fields are indexed,
      including arrays of child objects (STI).



Limit indexed fields



      To limit the fields that are indexed for a given model, list them
      explicitly:
       acts_as_ferret :fields => [ 'title', 'body' ]


      NOTE: after any change to index settings, the index needs to be
      rebuilt.
       Post.rebuild_index
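      What the :fields option changes can be pictured as the shape of the
      document handed to the indexer. In this minimal sketch, document_for
      and rebuild are hypothetical helpers working on plain hashes rather
      than ActiveRecord objects. It also suggests why a rebuild is needed:
      documents already in the index keep whatever fields they were built
      with until they are re-derived.

```ruby
# Hypothetical helpers on plain hashes (no Rails or Ferret needed).
# fields == nil stands for "index every attribute" (the default described above).
def document_for(attributes, fields = nil)
  selected = fields ? attributes.slice(*fields) : attributes
  selected.transform_values(&:to_s)
end

# Re-deriving every document with the current field list is roughly what a
# rebuild amounts to; previously indexed documents do not change on their own.
def rebuild(records, fields)
  records.transform_values { |attributes| document_for(attributes, fields) }
end

posts = { 1 => { "title" => "Ferret", "body" => "Full text search", "views" => 42 } }
old_index = rebuild(posts, nil)               # built before :fields was set
new_index = rebuild(posts, ["title", "body"]) # after the settings change
```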




Index options


      Ferret's indexing can be customised with numerous options.

      Example:
         acts_as_ferret(:fields => {
                          :title => { :boost => 2 },
                          :body  => { :boost => 1 }
                        },
                        :store_class_name => true)


      This gives the title field a boost (importance) factor of 2 and the
      body field a factor of 1. The class name is stored so that the model
      can take part in searches across multiple classes.
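      To see why a boost of 2 on the title matters, here is a deliberately
      simplified scoring sketch in which every query-term hit in a field
      contributes that field's boost. Ferret's real scoring follows Lucene's
      TF-IDF model with field norms, so actual numbers differ; BOOSTS and
      boosted_score are hypothetical names.

```ruby
# Simplified field-boost scoring: each hit of a query term in a field adds
# that field's boost. Not Ferret's real formula; it only shows the ranking effect.
BOOSTS = { title: 2.0, body: 1.0 }.freeze

def boosted_score(doc, query, boosts = BOOSTS)
  terms = query.downcase.split
  boosts.sum do |field, boost|
    words = doc.fetch(field, "").downcase.split
    terms.sum { |t| words.count(t) } * boost
  end
end

posts = [
  { id: 1, title: "Ferret tips", body: "Notes on search" },
  { id: 2, title: "Rails notes", body: "Using ferret in a blog" }
]
ranked = posts.sort_by { |p| -boosted_score(p, "ferret") }
ranked.map { |p| p[:id] } # => [1, 2]
```

      Both posts mention "ferret" once, but the hit in post 1 is in the
      boosted title field, so it ranks first.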



Index options: store



             Value                  Description
             :no                    Don’t store field
             :yes                   Store field in its original format.
                                    Use this value if you want to highlight
                                    matches or print match excerpts à la Google
                                    search.
             :compressed            Store field in compressed format.




Index options: index

        Value                       Description
        :no                         Do not make this field searchable.
        :yes                        Make this field searchable and tokenize
                                    its contents.
        :untokenized                Make this field searchable but do not
                                    tokenize its contents. Use this value
                                    for fields you wish to sort by.
        :omit_norms                 Same as :yes except omit the norms
                                    file. The norms file can be omitted
                                    if you don’t boost any fields and
                                    you don’t need scoring based on field
                                    length.
        :untokenized_omit_norms     Same as :untokenized except omit the
                                    norms file.
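      The difference between :yes and :untokenized comes down to which
      terms a field value contributes to the index. A simplified sketch;
      index_terms is a hypothetical helper, and real analyzers can do more
      than this letters-and-digits split.

```ruby
# Hypothetical helper: the terms a field value would contribute per :index mode.
def index_terms(value, mode)
  case mode
  when :yes         then value.downcase.scan(/[[:alnum:]]+/) # split into words
  when :untokenized then [value]                              # whole value is one term
  when :no          then []                                   # not searchable at all
  else raise ArgumentError, "unsupported mode: #{mode.inspect}"
  end
end

titles = ["Zebra crossing", "Apple pie", "Mango lassi"]
# An untokenized value is a single term per document, so it can serve as a sort
# key; a tokenized field yields several terms and has no single value to sort on.
sorted = titles.sort_by { |t| index_terms(t, :untokenized).first }
# => ["Apple pie", "Mango lassi", "Zebra crossing"]
```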
Index options: term vector



        Value                       Description
        :no                         Don’t store term vectors.
        :yes                        Store term vectors without storing
                                    positions or offsets.
        :with_positions             Store term vectors with positions.
        :with_offsets               Store term vectors with offsets.
        :with_positions_offsets     Store term vectors with positions and
                                    offsets.
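      A term vector records, per document and field, which terms occur and
      optionally where. The sketch below builds one with both positions
      (word index) and offsets (character range); term_vector and
      TermOccurrence are hypothetical names, and the simple
      letters-and-digits tokenizer is an assumption.

```ruby
# Hypothetical term vector for one field value: for every term, each occurrence's
# token position (0-based word index) and character offsets [start, end).
TermOccurrence = Struct.new(:position, :start_offset, :end_offset)

def term_vector(text)
  vector = Hash.new { |h, term| h[term] = [] }
  position = 0
  text.scan(/[[:alnum:]]+/) do |token|
    match = Regexp.last_match # MatchData for the current token
    vector[token.downcase] << TermOccurrence.new(position, match.begin(0), match.end(0))
    position += 1
  end
  vector
end

tv = term_vector("Ferret finds ferret")
tv["ferret"].map(&:position)                                 # => [0, 2]
tv["ferret"].map { |o| [o.start_offset, o.end_offset] }      # => [[0, 6], [13, 19]]
```

      Offsets are the kind of information a highlighter can use to mark
      matches in stored text without tokenizing it again.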




Index options: boost



                  Value         Description
                  Float         The boost property is used to set the default
                                boost for a field. This boost value will be
                                used for all instances of the field in the
                                index unless otherwise specified when you
                                create the field. All values should be positive.




Search the comments


      Searching a model together with its related models can be achieved
      with virtual attributes.

      A getter in the Post class that returns all comment messages:
       def post_comments
         comments.collect { |c| c.message }.join(' ')
       end


      Add it to Ferret's field list like a normal field:
       acts_as_ferret :fields => [ 'title', 'body', 'post_comments' ]
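      The post_comments getter can be tried outside Rails. In this sketch,
      Struct-based Post and Comment classes are hypothetical stand-ins for
      the ActiveRecord models, so no database is needed.

```ruby
# Hypothetical plain-Ruby stand-ins for the ActiveRecord models.
Comment = Struct.new(:message)

Post = Struct.new(:title, :body, :comments) do
  # Same logic as the getter above: all comment messages joined into one string,
  # which the indexer can then treat like any other text field.
  def post_comments
    comments.collect { |c| c.message }.join(' ')
  end
end

post = Post.new("Ferret", "Full text search",
                [Comment.new("Great talk"), Comment.new("Thanks")])
post.post_comments # => "Great talk Thanks"
```

      Because the value is computed when the post is indexed, a new comment
      presumably becomes searchable only once its post is indexed again.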




Search in multiple models



      To search both comments and posts at once (multi search), we need to:
              create an index for both models
              set the :store_class_name flag for each of them

      After rebuilding the indices for Post and Comment, we can run a multi
      search on both:
       Post.multi_search(params[:search], [Comment])
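      Storing the class name matters because hits from several per-model
      indexes end up in one result list, and each hit has to say which model
      to load. A simplified merge sketch; merge_hits, Hit and the raw
      term-count score are hypothetical and unrelated to acts_as_ferret's
      internals.

```ruby
# Hypothetical hit record: without class_name, id 1 could be a Post or a Comment.
Hit = Struct.new(:class_name, :id, :score)

# indexes: { "Post" => { id => text }, "Comment" => { id => text } }
def merge_hits(indexes, term)
  hits = indexes.flat_map do |class_name, docs|
    docs.map do |id, text|
      score = text.downcase.split.count(term.downcase) # naive: raw term count
      Hit.new(class_name, id, score) if score > 0
    end.compact
  end
  hits.sort_by { |h| [-h.score, h.class_name, h.id] }
end

indexes = {
  "Post"    => { 1 => "ferret ferret search", 2 => "rails" },
  "Comment" => { 1 => "nice ferret talk" }
}
merge_hits(indexes, "ferret").map { |h| [h.class_name, h.id] }
# => [["Post", 1], ["Comment", 1]]
```

      Both hits have id 1, so the class name is the only thing that tells
      them apart.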




More like this


      Suppose we want to find the posts most similar to a chosen one.
      That’s pretty simple:
       post.more_like_this({:field_names => ['title', 'body', 'post_comments'],
                            :min_term_freq => 2, :min_doc_freq => 3})


      The options passed here tell the search engine two things:
              consider only terms that appear at least twice in the
              source document
              consider only terms that appear in at least 3 documents
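      The term-selection step behind these two options can be sketched as a
      filter over term and document frequencies. mlt_terms is a hypothetical
      helper that reuses the option names from the slide; it is not
      acts_as_ferret's code, and a real implementation would typically also
      rank the selected terms before building the query.

```ruby
# Keep terms that are frequent in the source document (>= min_term_freq) and
# that occur in enough documents of the collection (>= min_doc_freq).
def mlt_terms(source_text, collection, min_term_freq: 2, min_doc_freq: 3)
  tokens = ->(text) { text.downcase.scan(/[[:alnum:]]+/) }
  term_freq = tokens.call(source_text).each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
  doc_freq = Hash.new(0)
  collection.each { |doc| tokens.call(doc).uniq.each { |t| doc_freq[t] += 1 } }
  term_freq.select { |term, freq| freq >= min_term_freq && doc_freq[term] >= min_doc_freq }.keys
end

source = "ferret index ferret search search rails"
collection = [source, "ferret search", "ferret basics", "search tips", "rails"]
mlt_terms(source, collection) # => ["ferret", "search"]
```

      "index" is dropped because it appears only once in the source, and
      "rails" because it occurs in just two documents.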



Links
      Products:
              Swish-e http://swish-e.org/index.html
              Lucene http://lucene.apache.org/java/docs/index.html
              Nutch http://lucene.apache.org/nutch/
              Lucene-WS http://lucene-ws.sourceforge.net/
              SOLR http://incubator.apache.org/solr/
              Hyper Estraier http://hyperestraier.sourceforge.net/
              Ferret http://rubyforge.org/projects/ferret
              acts_as_ferret http://projects.jkraemer.net/acts_as_ferret/

      Reading:
              tutorial by Roman Mackovcak: http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret
              tutorial by Seth Fitzsimmons: http://mojodna.net/searchable/ruby/railsconf.pdf
              aaf and Unicode by Albert Ramstedt:
              http://albert.delamednoll.se/articles/2005/12/20/the-ferret-plugin-with-simple-unicode-support

Thank you!


      Good luck using ferret!




Ruby Day Kraków: Full Text Search with Ferret

Weitere ähnliche Inhalte

Ähnlich wie Ruby Day Kraków: Full Text Search with Ferret

20100622 e z_find_slides_gig_v2.1
20100622 e z_find_slides_gig_v2.120100622 e z_find_slides_gig_v2.1
20100622 e z_find_slides_gig_v2.1Gilles Guirand
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)Erik Hatcher
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"George Stathis
 
Finding Love with MongoDB
Finding Love with MongoDBFinding Love with MongoDB
Finding Love with MongoDBMongoDB
 
Compass Framework
Compass FrameworkCompass Framework
Compass FrameworkLukas Vlcek
 
Finding the right stuff, an intro to Elasticsearch (at Rug::B)
Finding the right stuff, an intro to Elasticsearch (at Rug::B) Finding the right stuff, an intro to Elasticsearch (at Rug::B)
Finding the right stuff, an intro to Elasticsearch (at Rug::B) Michael Reinsch
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Day 2 - Intro to Rails
Day 2 - Intro to RailsDay 2 - Intro to Rails
Day 2 - Intro to RailsBarry Jones
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Karel Minarik
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se....NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...Maarten Balliauw
 

Ähnlich wie Ruby Day Kraków: Full Text Search with Ferret (20)

Solr 8 interview
Solr 8 interview Solr 8 interview
Solr 8 interview
 
20100622 e z_find_slides_gig_v2.1
20100622 e z_find_slides_gig_v2.120100622 e z_find_slides_gig_v2.1
20100622 e z_find_slides_gig_v2.1
 
Vespa, A Tour
Vespa, A TourVespa, A Tour
Vespa, A Tour
 
Ruby On Rails
Ruby On RailsRuby On Rails
Ruby On Rails
 
SPIN in Five Slides
SPIN in Five SlidesSPIN in Five Slides
SPIN in Five Slides
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Ruby on rails
Ruby on railsRuby on rails
Ruby on rails
 
Ruby on Rails
Ruby on Rails Ruby on Rails
Ruby on Rails
 
Ruby on rails
Ruby on railsRuby on rails
Ruby on rails
 
Finding Love with MongoDB
Finding Love with MongoDBFinding Love with MongoDB
Finding Love with MongoDB
 
Compass Framework
Compass FrameworkCompass Framework
Compass Framework
 
Java scriptforjavadev part2a
Java scriptforjavadev part2aJava scriptforjavadev part2a
Java scriptforjavadev part2a
 
Finding the right stuff, an intro to Elasticsearch (at Rug::B)
Finding the right stuff, an intro to Elasticsearch (at Rug::B) Finding the right stuff, an intro to Elasticsearch (at Rug::B)
Finding the right stuff, an intro to Elasticsearch (at Rug::B)
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Day 2 - Intro to Rails
Day 2 - Intro to RailsDay 2 - Intro to Rails
Day 2 - Intro to Rails
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Jena Programming
Jena ProgrammingJena Programming
Jena Programming
 
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se....NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
 

Mehr von elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introductionelliando dias
 
FleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in ClojureFleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in Clojureelliando dias
 

Mehr von elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introduction
 
FleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in ClojureFleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in Clojure
 

Kürzlich hochgeladen

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Kürzlich hochgeladen (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Ruby Day Kraków: Full Text Search with Ferret

  • 1. Ruby Day Kraków: Full Text Search with Ferret Agnieszka Figiel 25th November 2006 Ruby Day Kraków: Full Text Search with Ferret
  • 2. Agenda full text search implementation options tools for ruby ferret and acts as ferret searching with ferret overview of index options multi search more like it Ruby Day Kraków: Full Text Search with Ferret
  • 3. Full Text Search A search of a document collection, which examines all of the words in every stored document as it tries to match search words supplied by the user. index tokenize all documents filter out stop words apply stemming apply a term weighting scheme search use the index to find all documents matching a query Ruby Day Kraków: Full Text Search with Ferret
  • 4. Database Full Text Index MySQL PostgreSQL MS SQL Oracle DB2 Ruby Day Kraków: Full Text Search with Ferret
  • 5. Search Systems Google, Yahoo Swish-e (C, Perl API available) Lucene (Java, ports for C, C++, .NET, Delphi, Perl, Python, PHP, Common Lisp, ruby) Nutch (Lucene + crawler) Lucene-WS (Lucene via REST) SOLR (Lucene via XML/HTTP and JSON) Ruby Day Kraków: Full Text Search with Ferret
  • 6. Ruby Search Systems Hyper Estraier Ferret Ruby Day Kraków: Full Text Search with Ferret
  • 7. Ferret http://rubyforge.org/projects/ferret a text search engine library written for Ruby. It is inspired by Apache Lucene Java project. Ruby Day Kraków: Full Text Search with Ferret
  • 8. acts as ferret http://projects.jkraemer.net/acts_as_ferret/wiki a plugin for Ruby on Rails which builds on Ferret search across the contents of any Rails model class each model has its own index on disk search multiple models support for Rails Single Table Inheritance index attributes or virtual attributes of a model indexing can be customized by overriding the to doc method find similar items (’more like this’) Ruby Day Kraków: Full Text Search with Ferret
  • 9. Installation
      Ferret gem: gem install ferret
      acts_as_ferret: script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret
  • 10. Example: YASB (Yet Another Searchable Blog)
        class Post < ActiveRecord::Base
          has_many :comments
        end
        class Comment < ActiveRecord::Base
          belongs_to :post
        end
  • 11. Basic post search
      Let's add a basic search to the Post model:
        class Post < ActiveRecord::Base
          has_many :comments
          acts_as_ferret
        end
      Search posts:
        Post.find_by_contents(search_term)
      The index for the Post model is created the first time a search is run. If no additional options are given, ALL fields are indexed, including arrays of child objects (STI).
  • 12. Limit indexed fields
      To limit the fields that are indexed for a given model, list them explicitly:
        acts_as_ferret :fields => [ 'title', 'body' ]
      NOTE: after any change to the index settings, the index needs to be rebuilt:
        Post.rebuild_index
  • 13. Index options
      There are numerous options for customising Ferret's indexing. Example:
        acts_as_ferret(:fields => { :title => { :boost => 2 }, :body => { :boost => 1 } },
                       :store_class_name => true)
      This gives the title field a boost (importance) factor of 2 and the body field a factor of 1. The class name is stored so that the model can take part in multi-class searches.
  • 14. Index options: store
      Value         Description
      :no           Don't store the field.
      :yes          Store the field in its original format. Use this value if you want to highlight matches or print match excerpts à la Google search.
      :compressed   Store the field in compressed format.
  • 15. Index options: index
      Value                     Description
      :no                       Do not make this field searchable.
      :yes                      Make this field searchable and tokenize its contents.
      :untokenized              Make this field searchable but do not tokenize its contents. Use this value for fields you wish to sort by.
      :omit_norms               Same as :yes except omit the norms file. The norms file can be omitted if you don't boost any fields and you don't need scoring based on field length.
      :untokenized_omit_norms   Same as :untokenized except omit the norms file.
  • 16. Index options: term_vector
      Value                     Description
      :no                       Don't store term vectors.
      :yes                      Store term vectors without storing positions or offsets.
      :with_positions           Store term vectors with positions.
      :with_offsets             Store term vectors with offsets.
      :with_positions_offsets   Store term vectors with positions and offsets.
  • 17. Index options: boost
      Value   Description
      Float   The boost property sets the default boost for a field. This boost value will be used for all instances of the field in the index unless otherwise specified when you create the field. All values should be positive.
  • 18. Search the comments
      A model can be searched together with its related models by means of virtual attributes. A getter returning all comment messages, defined in the Post class:
        def post_comments
          comments.collect { |c| c.message }.join(' ')
        end
      Add it to Ferret's field list like a normal field:
        acts_as_ferret :fields => [ 'title', 'body', 'post_comments' ]
  • 19. Search in multiple models
      To search both comments and posts at once (multi search) we need to:
        create an index for both models
        set the :store_class_name flag for each of them
      After rebuilding the indices for Post and Comment we can run a multi search on both:
        Post.multi_search(params[:search], [Comment])
  • 20. More like this
      We would like a feature for finding the posts most similar to a chosen one. That's pretty simple:
        post.more_like_this({:field_names => ['title', 'body', 'post_comments'],
                             :min_term_freq => 2, :min_doc_freq => 3})
      The options passed here tell the search engine two things:
        take into consideration only terms that appear at least twice in the source document
        take into consideration only terms that appear in at least 3 documents
  • 21. Links
      Products:
        Swish-e: http://swish-e.org/index.html
        Lucene: http://lucene.apache.org/java/docs/index.html
        Nutch: http://lucene.apache.org/nutch/
        Lucene-WS: http://lucene-ws.sourceforge.net/
        SOLR: http://incubator.apache.org/solr/
        Hyper Estraier: http://hyperestraier.sourceforge.net/
        Ferret: http://rubyforge.org/projects/ferret
        acts_as_ferret: http://projects.jkraemer.net/acts_as_ferret/
      Reading:
        tutorial by Roman Mackovcak: http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret
        tutorial by Seth Fitzsimmons: http://mojodna.net/searchable/ruby/railsconf.pdf
        aaf and Unicode by Albert Ramstedt: http://albert.delamednoll.se/articles/2005/12/20/the-ferret-plugin-with-simple-unicode-support
  • 22. Thank you! Good luck using Ferret!
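
The index and search steps listed on slide 3 can be illustrated with a toy in-memory inverted index in plain Ruby. This is a sketch under stated assumptions, not a description of how Ferret works internally: the stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the tf-idf weighting and the names `ToyIndex`, `analyze` and `stem` are all invented for illustration.

```ruby
# Toy inverted index following the steps listed on slide 3.
# Assumptions (not Ferret's behaviour): a tiny hard-coded stop-word list,
# a naive suffix-stripping stemmer and a plain tf-idf weight.

STOP_WORDS = %w[a an and in is of on the to with].freeze

# Tokenize: lowercase the text and split it into runs of letters and digits
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Stem: strip a few common English suffixes (far cruder than a Porter stemmer)
def stem(word)
  return word if word.length <= 3
  word.sub(/(ing|ed|es|s)\z/, "")
end

# Analyze: tokenize, filter out stop words, then stem what is left
def analyze(text)
  tokenize(text).reject { |t| STOP_WORDS.include?(t) }.map { |t| stem(t) }
end

class ToyIndex
  def initialize
    # term => { doc_id => number of occurrences of the term in that document }
    @postings = Hash.new { |hash, term| hash[term] = Hash.new(0) }
    @doc_count = 0
  end

  def add(doc_id, text)
    @doc_count += 1
    analyze(text).each { |term| @postings[term][doc_id] += 1 }
  end

  # Term weighting: term frequency times a smoothed inverse document frequency
  def weight(term, doc_id)
    docs = @postings.fetch(term, {})
    return 0.0 if docs.empty?
    idf = Math.log(1.0 + @doc_count.to_f / docs.size)
    docs.fetch(doc_id, 0) * idf
  end

  # Search: documents containing every query term, highest total weight first
  def search(query)
    terms = analyze(query)
    return [] if terms.empty?
    matches = terms.map { |t| @postings.fetch(t, {}).keys }.reduce(:&)
    matches.sort_by { |doc_id| -terms.sum { |t| weight(t, doc_id) } }
  end
end

index = ToyIndex.new
index.add(1, "Full text search with Ferret")
index.add(2, "Searching posts and comments in Rails")
index.add(3, "Ferret indexes the posts; Ferret searches fast")

p index.search("ferret")        # => [3, 1]
p index.search("ferret search") # => [3, 1]
```

Document 3 ranks first for "ferret" because the term occurs twice in it, and "searching", "searches" and "search" all reduce to the same index term, so a query for any of them matches all three documents.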
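
To get a feel for what the :boost values on slides 13 and 17 do, here is a deliberately simplified scoring sketch in which each field's matches are multiplied by that field's boost and summed. Ferret, being modelled on Lucene, presumably folds boosts into a richer score (slide 15 mentions norms, field boosts and field-length-based scoring), so the numbers are illustrative only; `boosted_score`, `BOOSTS` and the sample posts are hypothetical.

```ruby
# Sketch of what per-field boosts do to ranking (slides 13 and 17).
# Simplified on purpose: the score is just boost x number of matching words
# per field, summed over the fields. Not Ferret's actual scoring formula.

BOOSTS = { title: 2.0, body: 1.0 }.freeze # same factors as the slide 13 example

def boosted_score(doc, term, boosts = BOOSTS)
  scores = boosts.map do |field, boost|
    words = doc.fetch(field, "").downcase.scan(/[a-z0-9]+/)
    boost * words.count(term.downcase)
  end
  scores.sum
end

# Hypothetical posts: each mentions "ferret" once, in a different field
posts = [
  { id: 1, title: "Ferret in practice", body: "Notes on indexing a blog" },
  { id: 2, title: "Rails search options", body: "We picked Ferret for the blog" }
]

ranking = posts.sort_by { |post| -boosted_score(post, "ferret") }.map { |post| post[:id] }
p ranking # => [1, 2]
```

With equal boosts both posts would score the same; the factor of 2 on the title is what puts post 1 first.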
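
The virtual attribute on slide 18 flattens a post's comments into one string that the indexer can treat like any other text field. The sketch below reproduces that outside Rails: `Post` and `Comment` are hypothetical `Struct` stand-ins for the ActiveRecord models from slide 10, and `indexed_values` is an invented helper that mimics reading the fields listed in `:fields`.

```ruby
# Hypothetical plain-Ruby stand-ins for the slide 10 models (Structs, not
# ActiveRecord), used to show what the slide 18 virtual attribute produces.
Comment = Struct.new(:message)

Post = Struct.new(:title, :body, :comments) do
  # Virtual attribute: every comment message joined into one searchable string
  def post_comments
    comments.collect { |c| c.message }.join(' ')
  end

  # Hypothetical helper: the values an indexer would read for
  # :fields => ['title', 'body', 'post_comments']
  def indexed_values(field_names = %w[title body post_comments])
    field_names.map { |name| [name, public_send(name)] }.to_h
  end
end

post = Post.new("Ferret", "Full text search for Ruby",
                [Comment.new("Nice post"), Comment.new("Thanks!")])
puts post.post_comments # prints: Nice post Thanks!
```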
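
Slide 19 asks for :store_class_name on both models before running a multi search. A likely reason, shown in the sketch below under simplified assumptions: each model has its own index (slide 8) and record ids are only unique within a model, so merged hits need the class name to be mapped back to the right record. The index layout, the occurrence-count score and `multi_search` here are hypothetical and are not the acts_as_ferret API.

```ruby
# Two separate indexes, one per model (slide 8: each model has its own index).
# Hypothetical layout: each stored document carries a :class_name value,
# standing in for what :store_class_name => true keeps in the index.
POST_INDEX = [
  { class_name: "Post", id: 1, text: "ferret full text search" },
  { class_name: "Post", id: 2, text: "rails deployment notes" }
].freeze

COMMENT_INDEX = [
  { class_name: "Comment", id: 1, text: "ferret ferret is great" },
  { class_name: "Comment", id: 7, text: "thanks for the post" }
].freeze

# Query every index, merge the hits and rank them by a naive score (how often
# the term occurs). Ids alone are ambiguous across models (Post 1 vs Comment 1);
# the stored class name tells the caller which model to load.
def multi_search(term, *indexes)
  indexes.flatten
         .map { |doc| [doc, doc[:text].split.count(term)] }
         .select { |_doc, score| score.positive? }
         .sort_by { |_doc, score| -score }
         .map { |doc, _score| [doc[:class_name], doc[:id]] }
end

p multi_search("ferret", POST_INDEX, COMMENT_INDEX) # => [["Comment", 1], ["Post", 1]]
```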
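
The two thresholds on slide 20 can be seen in isolation with a small sketch: count how often each term occurs in the source document and in how many documents it occurs, keep only the terms passing both :min_term_freq and :min_doc_freq, then rank the other documents by how many of the kept terms they share. The corpus of pre-analyzed terms, the shared-term ranking and the `more_like_this` function below are assumptions for illustration; real implementations typically also weight and limit the selected terms.

```ruby
# Sketch of how :min_term_freq and :min_doc_freq (slide 20) choose the terms a
# "more like this" query is built from. Simplified: the corpus holds
# already-analyzed terms, and similar documents are ranked by the number of
# selected terms they share.
CORPUS = {
  1 => %w[ferret index ferret search index],
  2 => %w[ferret index rails],
  3 => %w[ferret search ruby],
  4 => %w[index search ferret],
  5 => %w[cooking soup]
}.freeze

# term => number of documents containing it
def doc_freqs(corpus)
  corpus.values.each_with_object(Hash.new(0)) do |terms, df|
    terms.uniq.each { |term| df[term] += 1 }
  end
end

def more_like_this(id, corpus, min_term_freq: 2, min_doc_freq: 3)
  df = doc_freqs(corpus)
  tf = corpus.fetch(id).each_with_object(Hash.new(0)) { |term, h| h[term] += 1 }
  # Keep terms frequent in the source document AND common enough in the corpus
  query = tf.select { |term, n| n >= min_term_freq && df[term] >= min_doc_freq }.keys

  corpus.reject { |doc_id, _terms| doc_id == id }
        .map { |doc_id, terms| [doc_id, (terms & query).size] }
        .select { |_doc_id, shared| shared.positive? }
        .sort_by { |doc_id, shared| [-shared, doc_id] }
        .map(&:first)
end

p more_like_this(1, CORPUS) # => [2, 4, 3]
```

With the default thresholds only ferret and index qualify for document 1 (search occurs just once in it), so document 3, which shares only ferret, ranks last.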