Finding stuff under the Couch
with CouchDB-Lucene

Martin Rehfeld
@ RUG-B 01-Apr-2010
CouchDB

•   JSON document store
•   all documents in a given database reside in
    one large pool and may be retrieved using
    their ID ...
•   ... or through Map & Reduce based indexes
So how do you do full-text search?
You could potentially achieve this with just Map & Reduce functions.
But that would mean implementing an actual search engine ...
... and this has been done before.
Enter Lucene
Apache Lucene is a high-performance, full-featured text search
engine library written entirely in Java. It is a technology suitable
for nearly any application that requires full-text search, especially
cross-platform.
                  Courtesy of The Apache Software Foundation
Lucene Features
•   ranked searching
•   many powerful query types: phrase queries,
    wildcard queries, proximity queries, range
    queries and more
•   fielded searching (e.g., title, author, contents)
•   boolean operators
•   sorting by any field
•   allows simultaneous update and searching
CouchDB Integration
•   couchdb-lucene (ready-to-run Lucene plus CouchDB interface)
•   Search interface via httpd_db_handlers, usually _fti
•   Indexer interface via the CouchDB update_notification
    facility and fulltext design docs
Sample design document,
          i.e., _id: "_design/search"

{
    "fulltext": {
      "by_name": {
        "defaults": { "store":"yes" },
        "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }"
      }
    }
}
Sample design document, annotated
•   "by_name": name of the index
•   "defaults": default options (can be overridden per field)
•   "index": index function; builds and returns documents to be put
    into Lucene's index (may return an array of multiple documents)
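A minimal Ruby sketch of how this design document could be stored, assuming CouchDB on localhost:5984; `design_doc_request` is a hypothetical helper name. It only builds the HTTP PUT request with Ruby's standard library and keeps the index function as a single-line JavaScript string, since JSON strings cannot contain raw line breaks.

```ruby
require "json"
require "net/http"

# Hypothetical helper: builds (but does not send) a PUT request that stores
# the "_design/search" document from the slide in the database `db`.
def design_doc_request(db)
  design_doc = {
    "_id" => "_design/search",
    "fulltext" => {
      "by_name" => {
        "defaults" => { "store" => "yes" },
        # The index function is stored as a plain JavaScript source string
        "index" => "function(doc) { var ret=new Document(); ret.add(doc.name); return ret }"
      }
    }
  }
  request = Net::HTTP::Put.new("/#{db}/_design/search", "Content-Type" => "application/json")
  request.body = JSON.generate(design_doc)
  request
end

# Sending it (assumes a CouchDB server on localhost:5984):
#   Net::HTTP.start("localhost", 5984) { |http| http.request(design_doc_request("mydb")) }
```

Updating an existing design document additionally requires its current `_rev` in the body; without it CouchDB rejects the PUT with a conflict.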
Querying the index
http://localhost:5984/your-couch-db/_fti/your-design-document-name/your-index-name?

•   q=              query string
•   sort=           comma-separated fields to sort on
•   limit=          max number of results to return
•   skip=           offset
•   include_docs=   include CouchDB documents in response
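The URL pattern and parameters above can be assembled with Ruby's standard library. This is a sketch: `fti_url` is a hypothetical helper, the host is assumed to be localhost:5984, and `design` is the design document name without its `_design/` prefix (as in the `_fti/search/global` URLs used later in this deck).

```ruby
require "uri"

# Hypothetical helper: builds a couchdb-lucene query URL of the form
# http://localhost:5984/<db>/_fti/<design>/<index>?q=...&sort=...&limit=...
def fti_url(db, design, index, params = {})
  base = "http://localhost:5984/#{db}/_fti/#{design}/#{index}"
  return base if params.empty?

  # encode_www_form percent-encodes reserved characters such as ':' and ','
  "#{base}?#{URI.encode_www_form(params)}"
end

puts fti_url("mydb", "search", "global", q: "name:rehfeld", limit: 10, skip: 0)
# prints http://localhost:5984/mydb/_fti/search/global?q=name%3Arehfeld&limit=10&skip=0
```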
A full stack example
CouchDB Person Document
{
    "_id": "9db68c69726e486b811859937fbb6b09",
    "_rev": "1-c890039865e37eb8b911ff762162772e",
    "name": "Martin Rehfeld",
    "email": "martin.rehfeld@glnetworks.de",
    "notes": "Talks about CouchDB Lucene"
}
Objectives

•   Search for people by name
•   Search for people by any field's content
•   Querying from Ruby
•   Paginating results
Index Function
function(doc) {
  // first check if doc is a person document!
  ...
  var ret=new Document();
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
  ret.add(doc.name, {field:"name", store:"yes"});
  ret.add(doc.email, {field:"email", store:"yes"});
  return ret;
}
Index Function, annotated
•   ret.add(doc.name); ret.add(doc.email); ret.add(doc.notes);
    content added to the "default" field
•   ret.add(doc.name, {field:"name", store:"yes"});
    ret.add(doc.email, {field:"email", store:"yes"});
    content added to named fields
Field Options
•   field: the field name to index under
    available options: user-defined
•   type: the type of the field
    available options: date, double, float, int, long, string
•   store: whether the data is stored. The value will be returned
    in the search result
    available options: yes, no
•   index: whether (and how) the data is indexed
    available options: analyzed, analyzed_no_norms, no,
    not_analyzed, not_analyzed_no_norms
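To make the split between the "default" field and named fields, and the allowed option values, concrete, here is a plain-Ruby model. It is purely illustrative and hypothetical: `SketchDocument` is not part of couchdb-lucene (the real `Document` object only exists inside the JavaScript index function); the option values are copied from the table above.

```ruby
# Hypothetical, illustration-only model of the index function's Document.
class SketchDocument
  # Allowed values per option, taken from the Field Options table
  ALLOWED = {
    type:  %w[date double float int long string],
    store: %w[yes no],
    index: %w[analyzed analyzed_no_norms no not_analyzed not_analyzed_no_norms]
  }.freeze

  attr_reader :fields, :stored

  def initialize
    @fields = Hash.new { |hash, key| hash[key] = [] } # field name => indexed values
    @stored = {}                                      # field name => value returned in results
  end

  # Mirrors ret.add(value, {field: ..., store: ...}); without a field
  # option the value lands in the "default" field.
  def add(value, options = {})
    options.each do |key, val|
      next if key == :field # field names are user-defined
      allowed = ALLOWED.fetch(key) { raise ArgumentError, "unknown option #{key}" }
      raise ArgumentError, "invalid #{key}: #{val}" unless allowed.include?(val)
    end
    name = options.fetch(:field, "default")
    @fields[name] << value
    @stored[name] = value if options[:store] == "yes"
    self
  end
end
```

Fed with the person document, the three plain `add` calls fill only the "default" field, while the two calls with `store: "yes"` are the only values that would come back under "fields" in a query result.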
Querying the Index I
http://localhost:5984/mydb/_fti/search/global?q=couchdb
{
    "q": "default:couchdb",
    "etag": "119e498956048ea8",
    "skip": 0,
    "limit": 25,
    "total_rows": 1,
    "search_duration": 0,
    "fetch_duration": 8,
    "rows": [
      {
        "id": "9db68c69726e486b811859937fbb6b09",
        "score": 4.520571708679199,
        "fields": {
          "name": "Martin Rehfeld",
          "email": "martin.rehfeld@glnetworks.de"
        }
      }
    ]
}
Querying the Index I, annotated
•   "q": "default:couchdb": the default field is queried
•   "fields": content of fields with the store:"yes" option is
    returned with the query results
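On the client side, the interesting part of such a response is the "rows" array. A minimal sketch using Ruby's bundled JSON library; `Hit` and `parse_hits` are hypothetical names. Because only fields indexed with store:"yes" appear under "fields", a row without stored fields is mapped to an empty hash here.

```ruby
require "json"

# Hypothetical value object for one search hit
Hit = Struct.new(:id, :score, :fields)

# Hypothetical helper: turns a couchdb-lucene response body into Hit objects
def parse_hits(body)
  JSON.parse(body).fetch("rows", []).map do |row|
    Hit.new(row["id"], row["score"], row.fetch("fields", {}))
  end
end
```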
Querying the Index II
http://localhost:5984/mydb/_fti/search/global?q=name:rehfeld
{
    "q": "name:rehfeld",
    "etag": "119e498956048ea8",
    "skip": 0,
    "limit": 25,
    "total_rows": 1,
    "search_duration": 0,
    "fetch_duration": 8,
    "rows": [
      {
        "id": "9db68c69726e486b811859937fbb6b09",
        "score": 4.520571708679199,
        "fields": {
          "name": "Martin Rehfeld",
          "email": "martin.rehfeld@glnetworks.de"
        }
      }
    ]
}
Querying the Index II, annotated
•   "q": "name:rehfeld": the name field is queried
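When a fielded query such as name:rehfeld is built from user input, characters that Lucene's query parser treats as syntax should be escaped first. A sketch, assuming the special characters listed on the Lucene query parser syntax page (plus `/`, which later Lucene versions also treat as special); `fielded_query` is a hypothetical helper. Multi-word values are wrapped in quotes, because a field prefix only applies to the term directly after it, so `name:Martin Rehfeld` would search "Rehfeld" in the default field.

```ruby
# Characters with special meaning in Lucene's classic query parser
LUCENE_SPECIAL = /[\\+\-!():^\[\]"{}~*?|&\/]/

# Hypothetical helper: builds "field:value" with query syntax escaped
def fielded_query(field, value)
  escaped = value.gsub(LUCENE_SPECIAL) { |char| "\\#{char}" }
  # Quote multi-word values so every word is searched as a phrase in `field`
  escaped = %("#{escaped}") if escaped.match?(/\s/)
  "#{field}:#{escaped}"
end
```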
Querying from Ruby

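require 'httparty'

# Thin HTTParty client for the couchdb-lucene _fti handler
# of the "_design/search" design document
class Search
  include HTTParty

  base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search"
  format :json

  # :index names the fulltext index; all remaining options
  # (q, sort, limit, skip, ...) are sent as query parameters
  def self.query(options = {})
    options = options.dup # don't mutate the caller's hash
    index = options.delete(:index)
    get("/#{index}", :query => options)
  end
end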
Controller / Pagination
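class SearchController < ApplicationController
  HITS_PER_PAGE = 10

  def index
    # Ask couchdb-lucene for a single page of hits only
    result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE))
    # Wrap the raw rows so will_paginate can render page links
    @hits = WillPaginate::Collection.create(current_page, HITS_PER_PAGE,
                                            result['total_rows']) do |pager|
      pager.replace(result['rows'])
    end
  end

  private

  # 1-based page number; missing or invalid values fall back to page 1
  def current_page
    [params[:page].to_i, 1].max
  end

  # Offset of the first hit on the current page
  def skip
    (current_page - 1) * HITS_PER_PAGE
  end
end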
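The paging arithmetic can be checked without Rails. A framework-free sketch with hypothetical method names, assuming 1-based page numbers and treating missing, zero, or negative page parameters as page 1; `total_pages` uses integer ceiling division on the `total_rows` value from the couchdb-lucene response.

```ruby
HITS_PER_PAGE = 10

# 1-based page number; nil, "", "0" or negative values fall back to 1
def page_number(raw)
  [raw.to_i, 1].max
end

# Offset passed to couchdb-lucene as `skip`
def skip_for(raw_page, per_page = HITS_PER_PAGE)
  (page_number(raw_page) - 1) * per_page
end

# Number of pages needed for `total_rows` hits (integer ceiling division)
def total_pages(total_rows, per_page = HITS_PER_PAGE)
  (total_rows + per_page - 1) / per_page
end
```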
Resources

•   http://couchdb.apache.org/
•   http://lucene.apache.org/java/docs/index.html
•   http://github.com/rnewson/couchdb-lucene
•   http://lucene.apache.org/java/3_0_1/queryparsersyntax.html
Q&A

    Martin Rehfeld

    http://inside.glnetworks.de
    martin.rehfeld@glnetworks.de

    @klickmich

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

CouchDB-Lucene

  • 1. Finding stuff under the Couch with CouchDB-Lucene. Martin Rehfeld @ RUG-B, 01-Apr-2010
  • 2. CouchDB
      • JSON document store
      • all documents in a given database reside in one large pool and may be retrieved using their ID ...
      • ... or through Map & Reduce based indexes
  • 3. So how do you do full text search?
  • 4. You could potentially achieve this with just Map & Reduce functions
  • 5. But that would mean implementing an actual search engine ...
  • 6. ... and this has been done before.
  • 7. Enter Lucene: "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform." (Courtesy of the Apache Software Foundation)
  • 8. Lucene Features
      • ranked searching
      • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
      • fielded searching (e.g., title, author, contents)
      • boolean operators
      • sorting by any field
      • allows simultaneous update and searching
  • 9. CouchDB Integration
      • couchdb-lucene (ready-to-run Lucene plus CouchDB interface)
      • Search interface via httpd_db_handlers, usually _fti
      • Indexer interface via CouchDB's update_notification facility and fulltext design docs
  • 10. Sample design document, i.e., _id: "_design/search"
        {
          "fulltext": {
            "by_name": {
              "defaults": { "store": "yes" },
              "index": "function(doc) { var ret = new Document(); ret.add(doc.name); return ret }"
            }
          }
        }
  • 11. Same document, annotated: "by_name" is the name of the index.
  • 12. Same document, annotated: "defaults" holds the default options (can be overridden per field).
  • 13. Same document, annotated: "index" holds the index function.
  • 14. Same document, annotated: the index function builds and returns the documents to be put into Lucene's index (it may return an array of multiple documents).
  • 15. Querying the index: http://localhost:5984/your-couch-db/_fti/your-design-document-name/your-index-name?
      • q= query string
      • sort= comma-separated fields to sort on
      • limit= max number of results to return
      • skip= offset
      • include_docs= include the CouchDB documents in the response
  • 16. A full stack example
  • 17. CouchDB Person Document
        {
          "_id": "9db68c69726e486b811859937fbb6b09",
          "_rev": "1-c890039865e37eb8b911ff762162772e",
          "name": "Martin Rehfeld",
          "email": "martin.rehfeld@glnetworks.de",
          "notes": "Talks about CouchDB Lucene"
        }
  • 18. Objectives
      • Search for people by name
      • Search for people by any field's content
      • Querying from Ruby
      • Paginating results
  • 19. Index Function
        function(doc) {
          // first check if doc is a person document!
          ...
          var ret = new Document();
          ret.add(doc.name);
          ret.add(doc.email);
          ret.add(doc.notes);
          ret.add(doc.name, {field: "name", store: "yes"});
          ret.add(doc.email, {field: "email", store: "yes"});
          return ret;
        }
  • 20. Same function, annotated: the three plain ret.add(...) calls add content to the "default" field.
  • 21. Same function, annotated: the ret.add(..., {field: ...}) calls add content to named fields.
  • 22. Field Options
      • field: the field name to index under; options: user-defined
      • type: the type of the field; options: date, double, float, int, long, string
      • store: whether the data is stored (stored values are returned in the search result); options: yes, no
      • index: whether (and how) the data is indexed; options: analyzed, analyzed_no_norms, no, not_analyzed, not_analyzed_no_norms
  • 23. Querying the Index I: http://localhost:5984/mydb/_fti/search/global?q=couchdb
        {
          "q": "default:couchdb",
          "etag": "119e498956048ea8",
          "skip": 0,
          "limit": 25,
          "total_rows": 1,
          "search_duration": 0,
          "fetch_duration": 8,
          "rows": [
            {
              "id": "9db68c69726e486b811859937fbb6b09",
              "score": 4.520571708679199,
              "fields": {
                "name": "Martin Rehfeld",
                "email": "martin.rehfeld@glnetworks.de"
              }
            }
          ]
        }
  • 24. Same query, annotated: the default field is queried.
  • 25. Same query, annotated: the content of fields with the store: "yes" option is returned with the query results.
  • 26. Querying the Index II: http://localhost:5984/mydb/_fti/search/global?q=name:rehfeld
        {
          "q": "name:rehfeld",
          "etag": "119e498956048ea8",
          "skip": 0,
          "limit": 25,
          "total_rows": 1,
          "search_duration": 0,
          "fetch_duration": 8,
          "rows": [
            {
              "id": "9db68c69726e486b811859937fbb6b09",
              "score": 4.520571708679199,
              "fields": {
                "name": "Martin Rehfeld",
                "email": "martin.rehfeld@glnetworks.de"
              }
            }
          ]
        }
  • 27. Same query, annotated: the name field is queried.
  • 28. Querying from Ruby
        require "httparty"

        class Search
          include HTTParty
          base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search"
          format :json

          # e.g. Search.query(:index => "global", :q => "name:rehfeld", :limit => 10)
          def self.query(options = {})
            index = options.delete(:index)        # index name becomes the last path segment
            get("/#{index}", :query => options)   # remaining options (q, skip, limit, ...) form the query string
          end
        end
  • 29. Controller / Pagination
        class SearchController < ApplicationController
          HITS_PER_PAGE = 10

          def index
            result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE))
            @hits = WillPaginate::Collection.create(params[:page] || 1, HITS_PER_PAGE, result['total_rows']) do |pager|
              pager.replace(result['rows'])
            end
          end

          private

          # offset of the first hit on the requested (1-based) page
          def skip
            params[:page] ? (params[:page].to_i - 1) * HITS_PER_PAGE : 0
          end
        end
  • 30. Resources
      • http://couchdb.apache.org/
      • http://lucene.apache.org/java/docs/index.html
      • http://github.com/rnewson/couchdb-lucene
      • http://lucene.apache.org/java/3_0_1/queryparsersyntax.html
  • 31. Q & A! Martin Rehfeld, http://inside.glnetworks.de, martin.rehfeld@glnetworks.de, @klickmich
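
The slides define the index in a design document but do not show how that document gets into CouchDB. Below is a minimal sketch that uploads it with a plain HTTP PUT using only Ruby's standard library, rather than Couch Potato. It combines the "by_name" index from slide 10 with the slide 19 function under the name "global", which is inferred from the query URLs on slides 23 and 26. The helper name design_document_request is hypothetical, and the person-document check elided on slide 19 is left out here too.

```ruby
require "json"
require "net/http"

# Index function from slide 19 (the elided person-document check is omitted).
GLOBAL_INDEX = <<~JS
  function(doc) {
    var ret = new Document();
    ret.add(doc.name);
    ret.add(doc.email);
    ret.add(doc.notes);
    ret.add(doc.name, {field: "name", store: "yes"});
    ret.add(doc.email, {field: "email", store: "yes"});
    return ret;
  }
JS

# Hypothetical helper: builds (but does not send) a PUT request that stores
# the fulltext design document at /<db>/_design/search.
def design_document_request(db)
  design_doc = {
    "_id" => "_design/search",
    "fulltext" => {
      # slide 10: fields stored by default, can be overridden per field
      "by_name" => {
        "defaults" => { "store" => "yes" },
        "index" => "function(doc) { var ret = new Document(); ret.add(doc.name); return ret }"
      },
      # slides 19-21: default field plus stored "name" and "email" fields
      "global" => { "index" => GLOBAL_INDEX }
    }
  }
  request = Net::HTTP::Put.new("/#{db}/_design/search", "Content-Type" => "application/json")
  request.body = JSON.generate(design_doc)
  request
end
```

To send the request, use something like `Net::HTTP.start("localhost", 5984) { |http| http.request(design_document_request("mydb")) }`. To update an existing design document, you must also include its current "_rev" in the body, or CouchDB rejects the write as a conflict.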

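Slides 15 and 28-29 rely on the HTTParty and will_paginate gems. The same request and paging arithmetic can be sketched with Ruby's standard library alone. This illustrates the mechanics under stated assumptions and is not a replacement for the gems. FulltextSearch, skip_for, page_count and stored_fields are hypothetical names. The URL layout and the parameter names (q, skip, limit) come from slide 15, and the response keys (total_rows, rows, fields) come from the sample response on slide 23.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical stdlib-only counterpart of the HTTParty-based Search class.
class FulltextSearch
  def initialize(db, design: "search", host: "localhost", port: 5984)
    @db = db
    @design = design
    @host = host
    @port = port
  end

  # /<db>/_fti/<design>/<index>?q=...&skip=...&limit=... (layout from slide 15)
  def uri_for(index, params)
    URI("http://#{@host}:#{@port}/#{@db}/_fti/#{@design}/#{index}?#{URI.encode_www_form(params)}")
  end

  # GET the results and parse the JSON body; like the slide, no error handling.
  def query(index, params)
    JSON.parse(Net::HTTP.get(uri_for(index, params)))
  end
end

# Offset of the first hit on a 1-based page, mirroring SearchController#skip.
def skip_for(page, per_page)
  page ? (page.to_i - 1) * per_page : 0
end

# Number of pages needed to show total_rows hits.
def page_count(total_rows, per_page)
  (total_rows + per_page - 1) / per_page
end

# Stored field values (store: "yes") per hit, enough to render a result list.
def stored_fields(response)
  response.fetch("rows").map { |row| row.fetch("fields", {}) }
end
```

For page 2 with 10 hits per page, `FulltextSearch.new("mydb").query("global", q: "name:rehfeld", skip: skip_for("2", 10), limit: 10)` requests skip=10. URI.encode_www_form percent-encodes the colon in name:rehfeld as %3A.
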
Editor's Notes

  1. short recap of what CouchDB is
  2. some (very) limited examples are actually floating around
  3. map all documents, split them into words, push them through a stemmer, and cross-index them with the documents containing them
  4. ... multiple times, in fact
  5. add all searchable content to the default field; add named fields for searching by an individual field or for using their contents in the view
  6. the stored field contents can be used to render search results without touching CouchDB
  7. the stored field contents can be used to render search results without touching CouchDB
  8. could be as simple as that (using the httparty gem & Couch Potato), sans error handling
  9. using the Search class in a controller, plus pagination via the will_paginate gem
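
Note 3 lists what a do-it-yourself search built on Map & Reduce would involve. Below is a toy, in-memory Ruby sketch of those steps, for illustration only. It is not CouchDB view code and not how Lucene works internally. It has no ranking and no phrase or wildcard queries. naive_stem is a deliberately crude, hypothetical stand-in for a real stemmer (Lucene ships proper ones, e.g. Porter stemming), and all names are hypothetical.

```ruby
require "set"

# Hypothetical stand-in for a stemmer: lowercase, strip one trailing "s".
def naive_stem(word)
  w = word.downcase
  w.length > 3 && w.end_with?("s") ? w.chomp("s") : w
end

# Split text into alphanumeric words and stem each one.
def tokenize(text)
  text.to_s.scan(/[[:alnum:]]+/).map { |w| naive_stem(w) }
end

# "Map" step: emit [term, doc_id] for every word of every document.
def map_terms(docs)
  docs.flat_map do |doc|
    text = doc.values_at("name", "email", "notes").compact.join(" ")
    tokenize(text).map { |term| [term, doc["_id"]] }
  end
end

# "Reduce" step: cross-index each term with the ids of the documents containing it.
def build_index(docs)
  map_terms(docs).each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |(term, id), index|
    index[term] << id
  end
end

# AND query: ids of documents containing every term of the query.
def search(index, query)
  sets = tokenize(query).map { |term| index.fetch(term, Set.new) }
  return [] if sets.empty?
  sets.reduce(:&).to_a.sort
end
```

Even this toy supports the argument of slides 5 and 6. Tokenizing, stemming, cross-indexing and query evaluation all have to be written and maintained, and Lucene already provides them, along with ranking and the richer query types listed on slide 8.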