Ferret
A Ruby Search Engine
  Brian Sam-Bodden
Agenda

• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?

• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene search engine

• Ported to Ruby by David Balmain
What is Ferret?

• Initially a 100% pure Ruby port
• Since 0.9 many core functions are
  implemented in C

• Fast! Now Faster than Lucene ;-)
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
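The nesting of these four concepts can be sketched in plain Ruby. This is only a toy model: the Struct names and the make_field helper are hypothetical and are not Ferret classes.

```ruby
# Toy model of the four concepts; these Structs are illustrative only
# and are not part of Ferret.
Term     = Struct.new(:field, :text)  # a text string, keyed by field name
Field    = Struct.new(:name, :terms)  # a named sequence of terms
Document = Struct.new(:fields)        # a sequence of fields
Index    = Struct.new(:documents)     # a sequence of documents

# Hypothetical helper: split a string into lowercase terms for one field.
def make_field(name, text)
  terms = text.downcase.split.map { |word| Term.new(name, word) }
  Field.new(name, terms)
end

doc = Document.new([
  make_field(:title, "Ferret Search"),
  make_field(:body,  "A Ruby search engine")
])
index = Index.new([doc])

index.documents.each do |d|
  d.fields.each do |f|
    puts "#{f.name}: #{f.terms.map(&:text).join(', ')}"
  end
end
```

Because a term is keyed by its field name, "search" in the title and "search" in the body are two different terms in this model.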
Fields of a Document in an Index

• Fields are individually searchable units
  that can be:
  • Stored: The original Terms of the field are stored
  • Indexed: Inverted to rapidly find all Documents
    containing any of the Terms
  • Tokenized: Individual Terms are extracted and
    indexed
  • Vectored: Frequency and location of Terms are
    stored
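To make the four properties concrete, here is a small plain-Ruby sketch of what each one keeps for a single field value. It is a simplified illustration under my own assumptions (whitespace tokenizing, one document with id 0), not a description of Ferret's internals.

```ruby
# Toy illustration of the four field properties; not Ferret's internals.
text = "the quick brown fox jumps over the lazy fox"

# Stored: the original value is kept so it can be returned with results.
stored = text

# Tokenized: the value is split into individual terms.
tokens = text.split

# Indexed: terms are inverted into a term => document ids map.
doc_id = 0
inverted = Hash.new { |hash, term| hash[term] = [] }
tokens.uniq.each { |term| inverted[term] << doc_id }

# Vectored: term frequencies and positions are kept for the document.
term_vector = Hash.new { |hash, term| hash[term] = { :freq => 0, :positions => [] } }
tokens.each_with_index do |term, position|
  term_vector[term][:freq] += 1
  term_vector[term][:positions] << position
end

puts inverted["fox"].inspect
puts term_vector["fox"].inspect
```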
It’s all about Indexing

• Indexing is the processing of a source
  document into plain text tokens that Ferret
  can manipulate
• For any non-plaintext sources such as PDF,
  Word, Excel you need to:
  • Extract
  • Analyze
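The two steps can be sketched as a tiny pipeline. The extract and analyze helpers and the stop-word list are hypothetical names chosen for this example; real PDF, Word, or Excel extraction needs a dedicated tool, so a small HTML string stands in for the non-plaintext source here.

```ruby
# Hypothetical two-step pipeline: extract plain text, then analyze it
# into tokens. An HTML string stands in for a non-plaintext source.
stop_words = %w[a an and is the of]

# Extract: strip markup and collapse whitespace, leaving plain text.
def extract(html)
  html.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
end

# Analyze: lowercase, split into word tokens, and drop stop words.
def analyze(text, stop_words)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| stop_words.include?(token) }
end

source = "<h1>Ferret</h1><p>Ferret is a Ruby search engine.</p>"
plain  = extract(source)
tokens = analyze(plain, stop_words)

puts plain
puts tokens.inspect
```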
Installing Ferret



gem install ferret
• Pick the latest version for your platform
The Recipe

1. Create some Documents

2. Create an Index

3. Add Documents to the Index

4. Perform some Queries
Example Documents
  Create some Documents

• "Any String is a Document"
• ["This", "is also", "a document"]
Ferret::Index::Index
              Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
 ➡ index = Ferret::I.new(:path => '/somepath')
• Or completely in memory
 ➡ index = Ferret::I.new
Ferret::Index::Index
     Adding Documents to the Index

• Index provides the add_document
  method

• It also provides the << alias
• Adding documents is then as easy as:
 ➡ index << "This is a document"
 ➡ index << {:first => "Bob", :last => "Smith"}
Ferret::Index::Index
            Perform some Queries

• Index provides the search and
  search_each methods

• The search method takes a query and an
  optional set of parameters:
 ➡ search(query, options = {})
• The search_each method provides an
  iterator block
 ➡ search_each(query, options = {}) {|doc, score| ... }
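The shape of these calls can be tried without installing anything by using a stand-in class. ToyIndex below is hypothetical: it only mimics the << and search_each signatures shown above, scores by counting word occurrences, and treats :limit as an assumed option. It is not Ferret and does not reproduce Ferret's scoring.

```ruby
# ToyIndex is a hypothetical stand-in that mimics the shape of the calls
# on the slides (<< and search_each); it is not Ferret and scores naively.
class ToyIndex
  def initialize
    @docs = []
  end

  # Add a document (a String) and return self so calls can be chained.
  def <<(doc)
    @docs << doc
    self
  end

  # Fetch a stored document by its id.
  def [](id)
    @docs[id]
  end

  # Yield (doc_id, score) for each matching document, best score first.
  # The score is simply how many times the query word occurs.
  def search_each(query, options = {})
    limit = options.fetch(:limit, 10)
    word  = query.downcase
    hits  = @docs.each_with_index.map do |doc, id|
      [id, doc.downcase.split.count(word)]
    end
    hits = hits.select { |_, score| score > 0 }
               .sort_by { |id, score| [-score, id] }
               .first(limit)
    hits.each { |id, score| yield id, score }
    hits.size
  end
end

index = ToyIndex.new
index << "Ferret is a search engine" << "Ruby is fun" << "search search search"

results = []
index.search_each("search") do |id, score|
  results << [id, score]
  puts "#{index[id]} (score #{score})"
end
```

The block receives a document id rather than the document itself, so the stored text is looked up with index[id] inside the block.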
Playing with Ferret in irb
Ferret Query Language

• Ferret's own query language, FQL, is a
  powerful way to specify search queries

• FQL supports many query types,
  including:

     • Term         • Range
     • Phrase       • Wildcard
     • Field        • Fuzzy
     • Boolean
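As a rough guide, the query types listed above might look like the strings below. The syntax follows my reading of Ferret's query parser documentation and should be checked against the installed version; the field names (title, date) and search words are made up for illustration. Each string would be passed as the query argument to a search call.

```ruby
# Example FQL strings, one per query type listed above. Field names
# (title, date) and the search words are made up for illustration.
fql_examples = {
  :term     => 'ferret',
  :phrase   => '"search engine"',
  :field    => 'title:ferret',
  :boolean  => 'ruby AND rails',
  :range    => 'date:[20070101 20071231]',
  :wildcard => 'ferr*',
  :fuzzy    => 'feret~'
}

fql_examples.each do |type, query|
  puts "#{type.to_s.ljust(8)} #{query}"
end
```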
Index.explain

• The explain method of Index describes
  how a document scores against a query
 • Very useful for debugging
 • and for learning how Ferret works
Ferret in your App
[Diagram: the Application gathers data from a Database, the Web, the File System, and Manual Input, then indexes documents into a Ferret Index. It gets the User's query, searches the Index, and presents the search results to the User.]
Ferret in Rails

• Acts As Ferret is an ActiveRecord
  extension

• Available as a plugin
• Provides a simplified interface to
  Ferret
• Maintained by Jens Krämer
Ferret in Rails

• Adding an index to an ActiveRecord
  model is as simple as:
Ferret in Rails
• A simple model has two searchable
  fields, title and body:
Ferret in Rails

• After a quick rake db:migrate we now
  have some data to play with
• Fire up the Rails Console and let’s see
  what acts_as_ferret can do for our
  models
Want more?

• Ferret is improving constantly
• Acts As Ferret seems to catch up
  quickly

• Real-life usage seems to require some
  good engineering on your part

  • Background indexing
  • Hot swap of indexes?
Want more?

• We only covered the simplest
  constructs in Ferret

• Ferret’s API provides enough
  flexibility for the most demanding
  searching needs
Online Resources

• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
Thanks!

Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Ferret

  • 1. Ferret A Ruby Search Engine Brian Sam-Bodden
  • 2. Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret
  • 3. Agenda • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in your App
  • 4. Agenda • Ferret in Rails • Resources
  • 5. What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired by the Lucene Search Engine • Ported to Ruby by David Balmain
  • 6. What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)
  • 11. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
  • 17. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
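The four field properties can be sketched with plain Ruby hashes. This is a toy model written for illustration, not Ferret's implementation; the sample documents and the names stored, inverted and term_vectors are assumptions made up for this sketch.

```ruby
# Toy model of the four field properties; this is not Ferret's implementation.
# A document is a Hash of field name => text (sample data is made up).
docs = [
  { title: "Ferret intro", body: "Ferret is a search engine for Ruby" },
  { title: "Lucene notes", body: "Lucene is a search engine for Java" }
]

stored       = {}                              # Stored: original field text by doc id
inverted     = Hash.new { |h, k| h[k] = [] }   # Indexed: [field, term] => doc ids
term_vectors = Hash.new { |h, k| h[k] = Hash.new { |tv, t| tv[t] = [] } } # Vectored

docs.each_with_index do |doc, doc_id|
  stored[doc_id] = doc
  doc.each do |field, text|
    # Tokenized: split the field text into individual lowercase terms.
    terms = text.downcase.scan(/\w+/)
    terms.each_with_index do |term, position|
      key = [field, term]                      # a term is keyed by its field name
      inverted[key] << doc_id unless inverted[key].include?(doc_id)
      term_vectors[[doc_id, field]][term] << position
    end
  end
end

body_hits = inverted[[:body, "search"]]           # ids of docs whose body contains "search"
positions = term_vectors[[0, :body]]["ferret"]    # where "ferret" occurs in doc 0's body
frequency = positions.length                      # how often it occurs there
```

Keying the inverted map by field and term together mirrors the earlier definition of a Term as a text string keyed by field name.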
  • 18. It’s all about Indexing • Indexing is the processing of a source document into plain text tokens that Ferret can manipulate • For any non-plaintext sources such as PDF, Word, Excel you need to: • Extract • Analyze
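The extract and analyze steps can be illustrated with a small pure-Ruby sketch. The extract and analyze helpers, the stop-word list and the HTML-like input are hypothetical; extracting text from real PDF, Word or Excel files needs an external tool and is not shown here.

```ruby
# Extract: a hypothetical stand-in that strips markup tags from an HTML-like
# string. Real binary formats need a dedicated extraction tool.
def extract(source)
  source.gsub(/<[^>]+>/, " ")
end

# Analyze: lowercase the text, split it into tokens and drop common stop words.
# The stop-word list is an assumption chosen for this example.
def analyze(text, stop_words = %w[a an and the of is to in])
  text.downcase.scan(/[a-z0-9]+/).reject { |token| stop_words.include?(token) }
end

tokens = analyze(extract("<h1>Ferret</h1> <p>The Ruby port of Lucene</p>"))
```

The resulting tokens are the plain-text terms that an indexer would then place in its index.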
  • 24. Installing Ferret • Pick the latest version for your platform
  • 29. The Recipe 1. Create some Documents 2. Create an Index 3. Add Documents to the Index 4. Perform some Queries
  • 31. Example Documents Create some Documents "Any String is a Document"
  • 33. Example Documents Create some Documents ["This", "is also", "a document"]
  • 43. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in Memory ➡ index = Ferret::I.new()
  • 44. Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << "This is a document" ➡ index << {:first => "Bob", :last => "Smith"}
  • 50. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • The search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc, score| ... }
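To show how the recipe fits together without requiring the Ferret gem, here is a toy stand-in that mimics the shape of the calls on the slides (<<, search and search_each). ToyIndex, its raw term-count scoring and its handling of :limit are assumptions made for this sketch; Ferret's own scoring and result objects are richer.

```ruby
# Toy stand-in that mimics the shape of the calls above; it is not Ferret.
class ToyIndex
  def initialize
    @docs = []
  end

  # Accepts a String or a Hash of fields, like the examples on the slides.
  def <<(doc)
    @docs << (doc.is_a?(Hash) ? doc : { content: doc })
    self
  end

  # Returns the stored document for a document id.
  def [](doc_id)
    @docs[doc_id]
  end

  # Returns [doc_id, score] pairs, best first. The score is a raw count of
  # how often the single query term occurs in the document.
  def search(query, options = {})
    term  = query.downcase
    limit = options.fetch(:limit, 10)
    hits = @docs.each_with_index.map do |doc, id|
      words = doc.values.join(" ").downcase.scan(/\w+/)
      [id, words.count(term).to_f]
    end
    hits.select { |_, score| score > 0 }
        .sort_by { |id, score| [-score, id] }
        .first(limit)
  end

  # Yields each hit to the block instead of returning a list.
  def search_each(query, options = {})
    search(query, options).each { |id, score| yield id, score }
  end
end

index = ToyIndex.new
index << "Ferret is a Ruby search engine"
index << { first: "Bob", last: "Smith" }
index << "Ruby, Ruby, Ruby"

results = []
index.search_each("ruby") { |doc, score| results << [doc, score] }
```

The block receives a document id and a score, matching the {|doc, score| ... } form shown above; the id can then be used to fetch the stored document with index[doc].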
  • 59. Ferret Query Language • Ferret's own query language, FQL, is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wild • Field • Fuzz • Boolean
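As a rough illustration, one query string per type might look like the following. These strings reflect my understanding of FQL syntax and should be checked against the Ferret query parser documentation; the field names title and date are hypothetical, and nothing here is parsed or executed against an index.

```ruby
# Illustrative query strings only; field names are made up for this sketch.
fql_examples = {
  term:    'ruby',
  phrase:  '"search engine"',
  field:   'title:ferret',
  range:   'date:[20070101 20071231]',
  wild:    'ferr*',
  fuzz:    'feret~',
  boolean: 'ruby AND NOT java'
}

fql_examples.each { |type, query| puts format("%-8s %s", type, query) }
```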
  • 60. Index.explain • The explain method of Index describes how a document scores against a query • Very useful for debugging • and for learning how Ferret works
  • 62. Ferret in your App [Diagram. Data sources: Database, Web, File System, Manual User Input. Application steps: Gather Data, Index Documents, Get User's Query, Search Index, Present Search Results. Store: Ferret Index.]
  • 63. Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Krämer
  • 64. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
  • 66. Ferret in Rails • A simple model has two searchable fields, title and body:
  • 67. Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
  • 69. Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?
  • 70. Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs
  • 71. Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret