Finding stuff under the Couch
   with CouchDB-Lucene



           Martin Rehfeld
        @ RUG-B 01-Apr-2010
CouchDB

•   JSON document store
•   all documents in a given database reside in
    one large pool and may be retrieved using
    their ID ...
•   ... or through Map & Reduce based indexes
So how do you do full
    text search?
You potentially could
 achieve this with just
Map & Reduce functions
But that would mean
implementing an actual
   search engine ...
... and this has been done
           before.
Enter Lucene
Apache Lucene is a high-
performance, full-featured text search
engine library written entirely in Java.
It is a technology suitable for nearly
any application that requires full-text
search, especially cross-platform.
                  Courtesy of The Apache Foundation
Lucene Features
•   ranked searching
•   many powerful query types: phrase queries,
    wildcard queries, proximity queries, range
    queries and more
•   fielded searching (e.g., title, author, contents)
•   boolean operators
•   sorting by any field
•   allows simultaneous update and searching
CouchDB Integration
•   couchdb-lucene (ready to run Lucene plus CouchDB interface)
•   Search interface via http_db_handlers, usually _fti
•   Indexer interface via CouchDB update_notification facility
    and fulltext design docs
Sample design document,
          i.e., _id: "_design/search"

{
    "fulltext": {
      "by_name": {
        "defaults": { "store":"yes" },
        "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }"
      }
    }
}

•   "by_name": name of the index
•   "defaults": default options (can be overridden per field)
•   "index": index function; builds and returns documents to be put
    into Lucene's index (may return an array of multiple documents)
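
The design document can also be assembled from Ruby before it is stored. The following is a minimal sketch using only the standard library; the variable names are illustrative, and actually sending the body (for example with PUT /<database>/_design/search) is left out.

```ruby
require "json"

# The index function is JavaScript; the design document stores it as a string.
index_function = "function(doc) { var ret=new Document(); ret.add(doc.name); return ret }"

# Same structure as the sample design document on the slide.
design_doc = {
  "_id" => "_design/search",
  "fulltext" => {
    "by_name" => {
      "defaults" => { "store" => "yes" },
      "index" => index_function
    }
  }
}

# JSON body that could be stored in CouchDB as the design document.
design_doc_json = JSON.pretty_generate(design_doc)
puts design_doc_json
```

Building the hash in Ruby and serializing it avoids hand-escaping the JavaScript source inside a JSON string.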
Querying the index
http://localhost:5984/your-couch-db/_fti/your-design-document-name/your-index-name?

 q=             query string
 sort=          comma-separated fields to sort on
 limit=         max number of results to return
 skip=          offset
 include_docs=  include CouchDB documents in response
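
As a rough illustration of how these pieces combine, the sketch below assembles a search URL from the database, design document, and index names plus query parameters. The helper name fti_url and its keyword arguments are assumptions made for this example; host, port, and the _fti handler name are the slide's defaults and may differ in a given setup.

```ruby
require "uri"

# Hypothetical helper: builds a couchdb-lucene search URL from its parts.
def fti_url(db:, design_doc:, index:, params: {}, base: "http://localhost:5984")
  path = [db, "_fti", design_doc, index].join("/")
  query = URI.encode_www_form(params) # percent-encodes keys and values
  url = "#{base}/#{path}"
  query.empty? ? url : "#{url}?#{query}"
end

puts fti_url(db: "mydb", design_doc: "search", index: "global",
             params: { q: "name:rehfeld", limit: 10, skip: 20 })
```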
A full stack example
CouchDB Person Document
{
    "_id": "9db68c69726e486b811859937fbb6b09",
    "_rev": "1-c890039865e37eb8b911ff762162772e",
    "name": "Martin Rehfeld",
    "email": "martin.rehfeld@glnetworks.de",
    "notes": "Talks about CouchDB Lucene"
}
Objectives

•   Search for people by name
•   Search for people by any field's content
•   Querying from Ruby
•   Paginating results
Index Function
function(doc) {
  // first check if doc is a person document!
  ...
  var ret=new Document();
  // content added to the "default" field
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
  // content added to named fields
  ret.add(doc.name, {field:"name", store:"yes"});
  ret.add(doc.email, {field:"email", store:"yes"});
  return ret;
}
Field Options
name    description                        available options

field   the field name to index under      user-defined
type    the type of the field              date, double, float, int, long, string
store   whether the data is stored; the    yes, no
        value will be returned in the
        search result
index   whether (and how) the data is      analyzed, analyzed_no_norms, no,
        indexed                            not_analyzed, not_analyzed_no_norms
Querying the Index I
http://localhost:5984/mydb/_fti/search/global?q=couchdb
 {
     "q": "default:couchdb",
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,
     "total_rows": 1,
     "search_duration": 0,
     "fetch_duration": 8,
     "rows": [
       {
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields": {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de"
         }
       }
     ]
 }

•   The default field is queried ("q": "default:couchdb").
•   Content of fields with the store:"yes" option is returned with
    the query results.
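
A response of this shape could be unpacked in Ruby roughly as follows. This is a sketch using only the standard library; the helper name extract_hits is an assumption for illustration, and the sample body is an abridged copy of the slide's response.

```ruby
require "json"

# Abridged sample response based on the slide (valid JSON, no trailing comma).
response_body = <<~BODY
  {
    "q": "default:couchdb",
    "skip": 0,
    "limit": 25,
    "total_rows": 1,
    "rows": [
      {
        "id": "9db68c69726e486b811859937fbb6b09",
        "score": 4.520571708679199,
        "fields": {
          "name": "Martin Rehfeld",
          "email": "martin.rehfeld@glnetworks.de"
        }
      }
    ]
  }
BODY

# Hypothetical helper: flattens each row into id, score, and its stored fields.
def extract_hits(body)
  result = JSON.parse(body)
  result.fetch("rows", []).map do |row|
    { "id" => row["id"], "score" => row["score"] }.merge(row.fetch("fields", {}))
  end
end

hits = extract_hits(response_body)
hits.each { |hit| puts "#{hit['name']} <#{hit['email']}>" }
```

Only fields indexed with store:"yes" appear under "fields", so a hit for a document without stored fields would carry just its id and score here.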
Querying the Index II
http://localhost:5984/mydb/_fti/search/global?q=name:rehfeld
 {
     "q": "name:rehfeld",
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,
     "total_rows": 1,
     "search_duration": 0,
     "fetch_duration": 8,
     "rows": [
       {
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields": {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de"
         }
       }
     ]
 }

•   The name field is queried ("q": "name:rehfeld").
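
When the query string comes from user input, characters such as ":" carry meaning in Lucene's query syntax. The sketch below shows one way to build a fielded query while backslash-escaping those characters; the helper names are hypothetical, and the character list follows the Lucene query parser syntax page listed under Resources, so it should be checked against the Lucene version in use.

```ruby
# Characters and operators with special meaning in Lucene's query parser syntax.
LUCENE_SPECIAL = /([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)/

# Hypothetical helper: escapes user input so it is matched as literal text.
def escape_term(text)
  text.gsub(LUCENE_SPECIAL) { |match| "\\#{match}" }
end

# Hypothetical helper: "field:term", or a bare term for the default field.
def fielded_query(term, field: nil)
  escaped = escape_term(term)
  field ? "#{field}:#{escaped}" : escaped
end

puts fielded_query("rehfeld", field: "name") # prints name:rehfeld
puts fielded_query("couchdb")                # prints couchdb
puts fielded_query("a:b (c)", field: "notes")
```

Note that the sketch does not group multi-word input: in "name:Martin Rehfeld" only the first word is bound to the name field.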
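Querying from Ruby

require 'httparty'

# Thin wrapper around the couchdb-lucene search handler
class Search
  include HTTParty

  base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search"
  format :json

  # options: :index names the fulltext index; all remaining keys
  # (:q, :skip, :limit, ...) are passed on as query parameters
  def self.query(options = {})
    index = options.delete(:index)
    get("/#{index}", :query => options)
  end
end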
Controller / Pagination
class SearchController < ApplicationController
  HITS_PER_PAGE = 10

  def index
    # fetch one page of hits from couchdb-lucene
    result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE))
    # wrap the rows so will_paginate can render page links
    @hits = WillPaginate::Collection.create(params[:page] || 1, HITS_PER_PAGE,
                                            result['total_rows']) do |pager|
      pager.replace(result['rows'])
    end
  end

  private

  # offset of the first hit on the requested page
  def skip
    params[:page] ? (params[:page].to_i - 1) * HITS_PER_PAGE : 0
  end
end
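
The pagination arithmetic in the controller can be tried outside Rails. The following is a small standalone sketch; the names PER_PAGE, skip_for, and total_pages are made up for this example, and clamping invalid pages to page 1 is an added assumption that the controller itself does not make.

```ruby
PER_PAGE = 10

# Offset of the first hit on a 1-based page; nil or invalid pages map to page 1.
def skip_for(page, per_page = PER_PAGE)
  page_number = [page.to_i, 1].max
  (page_number - 1) * per_page
end

# Number of pages needed to show total_rows hits.
def total_pages(total_rows, per_page = PER_PAGE)
  (total_rows.to_f / per_page).ceil
end

puts skip_for(nil)   # prints 0
puts skip_for("3")   # prints 20
puts total_pages(25) # prints 3
```

The skip value maps directly onto the skip= query parameter, and total_rows from the search response supplies the page count.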
Resources

•   http://couchdb.apache.org/
•   http://lucene.apache.org/java/docs/index.html
•   http://github.com/rnewson/couchdb-lucene
•   http://lucene.apache.org/java/3_0_1/queryparsersyntax.html
Q & A

    Martin Rehfeld

    http://inside.glnetworks.de
    martin.rehfeld@glnetworks.de

    @klickmich

More Related Content

What's hot

Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with SparkMohammed Guller
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed databaseJulien Le Dem
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Data Modeling for MongoDB
Data Modeling for MongoDBData Modeling for MongoDB
Data Modeling for MongoDBMongoDB
 
Memory management in oracle
Memory management in oracleMemory management in oracle
Memory management in oracleDavin Abraham
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Spark Summit
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterDatabricks
 
Microsoft Azure Cloud Services
Microsoft Azure Cloud ServicesMicrosoft Azure Cloud Services
Microsoft Azure Cloud ServicesDavid J Rosenthal
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackMichel Tricot
 
[pgday.Seoul 2022] PostgreSQL with Google Cloud
[pgday.Seoul 2022] PostgreSQL with Google Cloud[pgday.Seoul 2022] PostgreSQL with Google Cloud
[pgday.Seoul 2022] PostgreSQL with Google CloudPgDay.Seoul
 

What's hot (20)

Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Data Modeling for MongoDB
Data Modeling for MongoDBData Modeling for MongoDB
Data Modeling for MongoDB
 
Apache CouchDB
Apache CouchDBApache CouchDB
Apache CouchDB
 
Memory management in oracle
Memory management in oracleMemory management in oracle
Memory management in oracle
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Spark
SparkSpark
Spark
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Couch db
Couch dbCouch db
Couch db
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
 
Cloud service providers
Cloud service providersCloud service providers
Cloud service providers
 
Apache spark
Apache sparkApache spark
Apache spark
 
Microsoft Azure Cloud Services
Microsoft Azure Cloud ServicesMicrosoft Azure Cloud Services
Microsoft Azure Cloud Services
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
[pgday.Seoul 2022] PostgreSQL with Google Cloud
[pgday.Seoul 2022] PostgreSQL with Google Cloud[pgday.Seoul 2022] PostgreSQL with Google Cloud
[pgday.Seoul 2022] PostgreSQL with Google Cloud
 

Viewers also liked

Real World CouchDB
Real World CouchDBReal World CouchDB
Real World CouchDBJohn Wood
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the WebKarel Minarik
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBBradley Holt
 
(R)évolutions, l'innovation entre les lignes
(R)évolutions, l'innovation entre les lignes(R)évolutions, l'innovation entre les lignes
(R)évolutions, l'innovation entre les lignes366
 
Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesSylvain Wallez
 
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...StampedeCon
 
ZendCon 2011 Learning CouchDB
ZendCon 2011 Learning CouchDBZendCon 2011 Learning CouchDB
ZendCon 2011 Learning CouchDBBradley Holt
 
Couch Db In 60 Minutes
Couch Db In 60 MinutesCouch Db In 60 Minutes
Couch Db In 60 MinutesGeorge Ang
 
Couch db@nosql+taiwan
Couch db@nosql+taiwanCouch db@nosql+taiwan
Couch db@nosql+taiwanKenzou Yeh
 
CouchDB at New York PHP
CouchDB at New York PHPCouchDB at New York PHP
CouchDB at New York PHPBradley Holt
 
CouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental ComplexityCouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental ComplexityFederico Galassi
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDBDavid Coallier
 
Search engine-optimization-starter-guide-fr
Search engine-optimization-starter-guide-frSearch engine-optimization-starter-guide-fr
Search engine-optimization-starter-guide-frNeoSting
 

Viewers also liked (19)

Real World CouchDB
Real World CouchDBReal World CouchDB
Real World CouchDB
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the Web
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
(R)évolutions, l'innovation entre les lignes
(R)évolutions, l'innovation entre les lignes(R)évolutions, l'innovation entre les lignes
(R)évolutions, l'innovation entre les lignes
 
MySQL Indexes
MySQL IndexesMySQL Indexes
MySQL Indexes
 
Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiques
 
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
 
ZendCon 2011 Learning CouchDB
ZendCon 2011 Learning CouchDBZendCon 2011 Learning CouchDB
ZendCon 2011 Learning CouchDB
 
Couch Db In 60 Minutes
Couch Db In 60 MinutesCouch Db In 60 Minutes
Couch Db In 60 Minutes
 
Couch db
Couch dbCouch db
Couch db
 
Couch db@nosql+taiwan
Couch db@nosql+taiwanCouch db@nosql+taiwan
Couch db@nosql+taiwan
 
CouchDB at New York PHP
CouchDB at New York PHPCouchDB at New York PHP
CouchDB at New York PHP
 
Couch db
Couch dbCouch db
Couch db
 
CouchDB
CouchDBCouchDB
CouchDB
 
CouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental ComplexityCouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental Complexity
 
CouchDB Vs MongoDB
CouchDB Vs MongoDBCouchDB Vs MongoDB
CouchDB Vs MongoDB
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
 
CouchDB
CouchDBCouchDB
CouchDB
 
Search engine-optimization-starter-guide-fr
Search engine-optimization-starter-guide-frSearch engine-optimization-starter-guide-fr
Search engine-optimization-starter-guide-fr
 

Similar to CouchDB-Lucene

10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data ModelingDATAVERSITY
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDBrogerbodamer
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overviewAmit Juneja
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman
 
03 form-data
03 form-data03 form-data
03 form-datasnopteck
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in BerlinToshiaki Katayama
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenchesIsmail Mayat
 
Building Your First MongoDB App
Building Your First MongoDB AppBuilding Your First MongoDB App
Building Your First MongoDB AppHenrik Ingo
 
Intro to MongoDB and datamodeling
Intro to MongoDB and datamodeling Intro to MongoDB and datamodeling
Intro to MongoDB and datamodeling rogerbodamer
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation Amit Ghosh
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring DataEric Bottard
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Javaantoinegirbal
 
Peggy elasticsearch應用
Peggy elasticsearch應用Peggy elasticsearch應用
Peggy elasticsearch應用LearningTech
 
Academy PRO: Elasticsearch. Data management
Academy PRO: Elasticsearch. Data managementAcademy PRO: Elasticsearch. Data management
Academy PRO: Elasticsearch. Data managementBinary Studio
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 MinutesKarel Minarik
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsMongoDB
 

Similar to CouchDB-Lucene (20)

10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)
 
03 form-data
03 form-data03 form-data
03 form-data
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlin
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Building Your First MongoDB App
Building Your First MongoDB AppBuilding Your First MongoDB App
Building Your First MongoDB App
 
Intro to MongoDB and datamodeling
Intro to MongoDB and datamodeling Intro to MongoDB and datamodeling
Intro to MongoDB and datamodeling
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
 
Power tools in Java
Power tools in JavaPower tools in Java
Power tools in Java
 
Peggy elasticsearch應用
Peggy elasticsearch應用Peggy elasticsearch應用
Peggy elasticsearch應用
 
Academy PRO: Elasticsearch. Data management
Academy PRO: Elasticsearch. Data managementAcademy PRO: Elasticsearch. Data management
Academy PRO: Elasticsearch. Data management
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
 
MongoDb and NoSQL
MongoDb and NoSQLMongoDb and NoSQL
MongoDb and NoSQL
 

Recently uploaded

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024

CouchDB-Lucene

  • 1. Finding stuff under the Couch with CouchDB-Lucene Martin Rehfeld @ RUG-B 01-Apr-2010
  • 2. CouchDB
      • JSON document store
      • all documents in a given database reside in one large pool and may be retrieved using their ID ...
      • ... or through Map & Reduce based indexes
  • 3. So how do you do full text search?
  • 4. You potentially could achieve this with just Map & Reduce functions
  • 5. But that would mean implementing an actual search engine ...
  • 6. ... and this has been done before.
  • 7. Enter Lucene
        Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
        Courtesy of The Apache Foundation
  • 8. Lucene Features
      • ranked searching
      • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
      • fielded searching (e.g., title, author, contents)
      • boolean operators
      • sorting by any field
      • allows simultaneous update and searching
  • 9. CouchDB Integration
      • couchdb-lucene (ready to run Lucene plus CouchDB interface)
      • Search interface via http_db_handlers, usually _fti
      • Indexer interface via CouchDB update_notification facility and fulltext design docs
  • 14. Sample design document, i.e., _id: "_design/search"
        {
          "fulltext": {
            "by_name": {
              "defaults": { "store": "yes" },
              "index": "function(doc) { var ret=new Document(); ret.add(doc.name); return ret }"
            }
          }
        }
      • "by_name": name of the index
      • "defaults": default options (can be overridden per field)
      • "index": index function; builds and returns documents to be put into Lucene's index (may return an array of multiple documents)
  • 15. Querying the index
        http://localhost:5984/your-couch-db/_fti/your-design-document-name/your-index-name?
      • q= query string
      • sort= comma-separated fields to sort on
      • limit= max number of results to return
      • skip= offset
      • include_docs= include CouchDB documents in response
  • 16. A full stack example
  • 17. CouchDB Person Document
        {
          "_id": "9db68c69726e486b811859937fbb6b09",
          "_rev": "1-c890039865e37eb8b911ff762162772e",
          "name": "Martin Rehfeld",
          "email": "martin.rehfeld@glnetworks.de",
          "notes": "Talks about CouchDB Lucene"
        }
  • 18. Objectives
      • Search for people by name
      • Search for people by any field's content
      • Querying from Ruby
      • Paginating results
  • 21. Index Function
        function(doc) {
          // first check if doc is a person document! ...
          var ret=new Document();
          // content added to "default" field
          ret.add(doc.name);
          ret.add(doc.email);
          ret.add(doc.notes);
          // content added to named fields
          ret.add(doc.name, {field:"name", store:"yes"});
          ret.add(doc.email, {field:"email", store:"yes"});
          return ret;
        }
  • 22. Field Options (name: description; available options)
      • field: the field name to index under; user-defined
      • type: the type of the field; date, double, float, int, long, string
      • store: whether the data is stored (the value will be returned in the search result); yes, no
      • index: whether (and how) the data is indexed; analyzed, analyzed_no_norms, no, not_analyzed, not_analyzed_no_norms
  • 25. Querying the Index I
        http://localhost:5984/mydb/_fti/search/global?q=couchdb
        {
          "q": "default:couchdb",
          "etag": "119e498956048ea8",
          "skip": 0,
          "limit": 25,
          "total_rows": 1,
          "search_duration": 0,
          "fetch_duration": 8,
          "rows": [
            {
              "id": "9db68c69726e486b811859937fbb6b09",
              "score": 4.520571708679199,
              "fields": {
                "name": "Martin Rehfeld",
                "email": "martin.rehfeld@glnetworks.de"
              }
            }
          ]
        }
      • "q": the default field is queried
      • "fields": content of fields with the store:"yes" option is returned with the query results
  • 27. Querying the Index II
        http://localhost:5984/mydb/_fti/search/global?q=name:rehfeld
        {
          "q": "name:rehfeld",
          "etag": "119e498956048ea8",
          "skip": 0,
          "limit": 25,
          "total_rows": 1,
          "search_duration": 0,
          "fetch_duration": 8,
          "rows": [
            {
              "id": "9db68c69726e486b811859937fbb6b09",
              "score": 4.520571708679199,
              "fields": {
                "name": "Martin Rehfeld",
                "email": "martin.rehfeld@glnetworks.de"
              }
            }
          ]
        }
      • "q": the name field is queried
  • 28. Querying from Ruby
        class Search
          include HTTParty
          base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search"
          format :json

          def self.query(options = {})
            index = options.delete(:index)
            get("/#{index}", :query => options)
          end
        end
  • 29. Controller / Pagination
        class SearchController < ApplicationController
          HITS_PER_PAGE = 10

          def index
            result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE))
            @hits = WillPaginate::Collection.create(params[:page] || 1, HITS_PER_PAGE, result['total_rows']) do |pager|
              pager.replace(result['rows'])
            end
          end

          private

          def skip
            params[:page] ? (params[:page].to_i - 1) * HITS_PER_PAGE : 0
          end
        end
  • 30. Resources
      • http://couchdb.apache.org/
      • http://lucene.apache.org/java/docs/index.html
      • http://github.com/rnewson/couchdb-lucene
      • http://lucene.apache.org/java/3_0_1/queryparsersyntax.html
  • 31. Q & A! Martin Rehfeld, http://inside.glnetworks.de, martin.rehfeld@glnetworks.de, @klickmich
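
The querying slides (15, 28 and 29) can be tied together in plain Ruby. What follows is a minimal sketch, not the talk's code: it uses only the standard library instead of HTTParty and will_paginate, the helper names fti_url, skip_for and render_hits are hypothetical, and the response is a trimmed copy of the sample from the "Querying the Index" slides rather than the result of a live request.

```ruby
require "json"
require "uri"

# Hypothetical helper: assembles the search URL from the parts named on
# slide 15 (database, design document, index, query parameters).
def fti_url(db:, design_doc:, index:, host: "localhost:5984", **params)
  "http://#{host}/#{db}/_fti/#{design_doc}/#{index}?#{URI.encode_www_form(params)}"
end

# Same offset arithmetic as the controller's private `skip` method.
def skip_for(page, per_page)
  page ? (page.to_i - 1) * per_page : 0
end

# Renders hits from the stored fields alone, without fetching the
# documents from CouchDB.
def render_hits(response)
  response["rows"].map do |row|
    fields = row["fields"]
    format("%s <%s>", fields["name"], fields["email"])
  end
end

url = fti_url(db: "mydb", design_doc: "search", index: "global",
              q: "name:rehfeld", skip: skip_for("3", 10), limit: 10)

# Trimmed copy of the sample response shown on the slides.
sample = JSON.parse(<<~RESPONSE)
  {
    "total_rows": 1,
    "rows": [
      {
        "id": "9db68c69726e486b811859937fbb6b09",
        "fields": {
          "name": "Martin Rehfeld",
          "email": "martin.rehfeld@glnetworks.de"
        }
      }
    ]
  }
RESPONSE

puts url
puts render_hits(sample)
```

URI.encode_www_form percent-encodes the colon in the fielded query, so the query string stays valid even when it contains Lucene syntax characters. Rendering from "fields" only works for values indexed with store:"yes".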

Editor's Notes

  1. short recap of what CouchDB is
  2. some (very) limited examples are actually floating around
  3. map all documents, split them into words, push the words through a stemmer, and cross-index them with the documents containing them
  4. ... multiple times, in fact
  5. add all searchable content to the default field, add fields for searching by individual field or using contents in view
  6. the stored field contents can be used to render search results without touching CouchDB
  7. the stored field contents can be used to render search results without touching CouchDB
  8. could be as simple as that (using the httparty gem & Couch Potato), sans error handling
  9. using the Search class in a controller + pagination; utilizing the will_paginate gem
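
Note 3 describes what a hand-rolled search index would have to do. A toy version in plain Ruby illustrates the steps, as a sketch under loud assumptions: it runs outside CouchDB rather than as Map & Reduce functions, the names stem and build_index are hypothetical, and the "stemmer" merely strips a trailing "s", which is far cruder than anything Lucene does.

```ruby
# Toy stemmer (assumption): strips one trailing "s" and nothing else.
def stem(word)
  word.sub(/s\z/, "")
end

# Builds an inverted index: stemmed term => ids of documents containing it.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    # split into lowercase words, stem them, count each term once per document
    terms = text.downcase.scan(/[a-z0-9]+/).map { |word| stem(word) }.uniq
    terms.each { |term| index[term] << id }
  end
  index
end

# Looks up a single search word; unknown terms yield an empty list.
def lookup(index, word)
  index.fetch(stem(word.downcase), [])
end

docs = {
  "a" => "Talks about CouchDB Lucene",
  "b" => "CouchDB talk notes"
}
index = build_index(docs)

puts lookup(index, "talk").inspect
puts lookup(index, "Lucene").inspect
```

Ranking, phrase queries, wildcards and the rest of the Lucene feature list are all missing here, which is the point the talk makes about not reimplementing a search engine.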