CouchDB-Lucene
1. Finding stuff under the Couch with CouchDB-Lucene
Martin Rehfeld
@ RUG-B 01-Apr-2010
2. CouchDB
• JSON document store
• all documents in a given database reside in one large pool and may be retrieved using their ID ...
• ... or through Map & Reduce based indexes
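A Map function of the kind behind such an index could look like the sketch below. CouchDB supplies `emit` to view functions; the stand-in `emit`, the `mapByName` name and the two sample documents are assumptions made only so the snippet runs on its own.

```javascript
// Stand-in for CouchDB's built-in emit(); it collects rows so the sketch runs standalone.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// A map function as it might appear in a design document's "views" section.
// It indexes every document that has a name under that name.
function mapByName(doc) {
  if (doc.name) {
    emit(doc.name, null);
  }
}

// Hypothetical sample documents.
const docs = [
  { _id: "a1", name: "Martin Rehfeld" },
  { _id: "b2", title: "no name here" },
];
docs.forEach(mapByName);
```

Such an index is looked up by key, so it answers "which document has exactly this name" rather than "which documents mention this word".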
7. Enter Lucene
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Courtesy of The Apache Foundation
8. Lucene Features
• ranked searching
• many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
• fielded searching (e.g., title, author, contents)
• boolean operators
• sorting by any field
• allows simultaneous update and searching
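The query types listed above map onto Lucene's classic query syntax. The helpers below are a small sketch that only assembles query strings; the helper names and the sample terms are made up for illustration, and no escaping of special characters is attempted.

```javascript
// Tiny helpers that build Lucene query strings (illustrative names, no escaping).
const phrase = (text) => `"${text}"`;
const fielded = (field, term) => `${field}:${term}`;
const proximity = (text, distance) => `"${text}"~${distance}`;
const range = (field, from, to) => `${field}:[${from} TO ${to}]`;
const and = (...clauses) => clauses.join(" AND ");

const examples = {
  phrase: phrase("Martin Rehfeld"),            // words must appear next to each other
  wildcard: fielded("name", "rehf*"),          // prefix match within the name field
  proximity: proximity("talks lucene", 3),     // both words within 3 positions
  range: range("name", "a", "n"),              // inclusive range on a field
  boolean: and(fielded("name", "rehfeld"), fielded("email", "glnetworks")),
};

console.log(examples);
```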
9. CouchDB Integration
• couchdb-lucene (ready to run Lucene plus CouchDB interface)
• Search interface via http_db_handlers, usually _fti
• Indexer interface via CouchDB update_notification facility and fulltext design docs
14. Sample design document,
i.e., _id: "_design/search"
{
  "fulltext": {
    "by_name": {
      "defaults": { "store":"yes" },
      "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }"
    }
  }
}
• "by_name": name of the index
• "defaults": default options (can be overridden per field)
• "index": index function; builds and returns documents to be put into Lucene's index (may return an array of multiple documents)
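Because the design document is JSON, the index function has to be stored as a string. One way to avoid hand-escaping is to write it as ordinary JavaScript and serialize it, as in this sketch. The `Document` class here is only a stand-in for the one couchdb-lucene provides to index functions, and the `byName` variable name is an assumption; uploading the result to CouchDB is not shown.

```javascript
// Stand-in for the Document class that couchdb-lucene makes available to
// index functions. It exists here only so the function can run standalone.
class Document {
  constructor() {
    this.fields = [];
  }
  add(value, options = {}) {
    this.fields.push({ value, options });
  }
}

// The index function from the slide, written as a normal function expression.
const byName = function (doc) {
  var ret = new Document();
  ret.add(doc.name);
  return ret;
};

// Function.prototype.toString() yields the source text, which is what the
// "index" member of the design document needs to contain.
const designDoc = {
  _id: "_design/search",
  fulltext: {
    by_name: {
      defaults: { store: "yes" },
      index: byName.toString(),
    },
  },
};

const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```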
17. CouchDB Person Document
{
  "_id": "9db68c69726e486b811859937fbb6b09",
  "_rev": "1-c890039865e37eb8b911ff762162772e",
  "name": "Martin Rehfeld",
  "email": "martin.rehfeld@glnetworks.de",
  "notes": "Talks about CouchDB Lucene"
}
18. Objectives
• Search for people by name
• Search for people by any field's content
• Querying from Ruby
• Paginating results
21. Index Function
function(doc) {
  // first check if doc is a person document!
  ...
  var ret=new Document();
  // content added to "default" field
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
  // content added to named fields
  ret.add(doc.name, {field:"name", store:"yes"});
  ret.add(doc.email, {field:"email", store:"yes"});
  return ret;
}
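To see what this function hands to Lucene, it can be run outside couchdb-lucene against the sample person document. This is a sketch under assumptions: `Document` is a recording stand-in for couchdb-lucene's class, `isPerson` is a hypothetical check (the slide elides the real one), and returning `null` for documents that should not be indexed is assumed rather than taken from the slides.

```javascript
// Recording stand-in for couchdb-lucene's Document class.
class Document {
  constructor() {
    this.fields = [];
  }
  add(value, options = {}) {
    this.fields.push({ value, options });
  }
}

// Hypothetical person check; the real check is not shown on the slide.
function isPerson(doc) {
  return typeof doc.name === "string" && typeof doc.email === "string";
}

const indexPerson = function (doc) {
  if (!isPerson(doc)) return null; // assumption: null means "do not index"
  var ret = new Document();
  // no field option: content goes to the "default" field
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
  // named fields, stored so they come back with search results
  ret.add(doc.name, { field: "name", store: "yes" });
  ret.add(doc.email, { field: "email", store: "yes" });
  return ret;
};

const person = {
  _id: "9db68c69726e486b811859937fbb6b09",
  name: "Martin Rehfeld",
  email: "martin.rehfeld@glnetworks.de",
  notes: "Talks about CouchDB Lucene",
};

const indexed = indexPerson(person);
const defaultValues = indexed.fields.filter((f) => !f.options.field).map((f) => f.value);
const namedFields = indexed.fields.filter((f) => f.options.field).map((f) => f.options.field);
console.log(defaultValues, namedFields);
```

The split explains the two search styles in the objectives: an unqualified query matches the default field and therefore any of name, email or notes, while `name:...` or `email:...` targets one named field.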
22. Field Options
name    description                              available options
field   the field name to index under            user-defined
type    the type of the field                    date, double, float, int, long, string
store   whether the data is stored; the value    yes, no
        will be returned in the search result
index   whether (and how) the data is indexed    analyzed, analyzed_no_norms, no,
                                                 not_analyzed, not_analyzed_no_norms
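The interplay between a design document's "defaults" and the options passed to an individual `add` call can be pictured as a simple merge in which the per-field value wins. The sketch below illustrates that idea and validates values against the options listed above; `resolveOptions` and `ALLOWED` are illustrative names, not part of couchdb-lucene, and the real merge logic inside couchdb-lucene may differ in detail.

```javascript
// Allowed values per option, copied from the field options listed above.
const ALLOWED = {
  type: ["date", "double", "float", "int", "long", "string"],
  store: ["yes", "no"],
  index: ["analyzed", "analyzed_no_norms", "no", "not_analyzed", "not_analyzed_no_norms"],
};

// Merge index-wide defaults with per-field options; per-field options win.
function resolveOptions(defaults, fieldOptions) {
  const merged = { ...defaults, ...fieldOptions };
  for (const [name, value] of Object.entries(merged)) {
    if (name === "field") continue; // user-defined, any name is acceptable
    if (!ALLOWED[name]) throw new Error(`unknown option: ${name}`);
    if (!ALLOWED[name].includes(value)) {
      throw new Error(`invalid value for ${name}: ${value}`);
    }
  }
  return merged;
}

const defaults = { store: "yes" };
const nameOptions = resolveOptions(defaults, { field: "name" });
const notesOptions = resolveOptions(defaults, { field: "notes", store: "no" });
console.log(nameOptions, notesOptions);
```

Here `name` inherits `store: "yes"` from the defaults, while `notes` overrides it and is indexed without being returned in results.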
27. Querying the Index II
http://localhost:5984/mydb/_fti/search/global?q=name:rehfeld
The name field is queried:
{
  "q": "name:rehfeld",
  "etag": "119e498956048ea8",
  "skip": 0,
  "limit": 25,
  "total_rows": 1,
  "search_duration": 0,
  "fetch_duration": 8,
  "rows": [
    {
      "id": "9db68c69726e486b811859937fbb6b09",
      "score": 4.520571708679199,
      "fields": {
        "name": "Martin Rehfeld",
        "email": "martin.rehfeld@glnetworks.de"
      }
    }
  ]
}
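The request and response above can be handled with a few lines of client code. This sketch builds the search URL and turns the rows of a response into display strings; `searchUrl` and `summarize` are illustrative names, the response object is the one shown above (abridged), and no HTTP request is actually made.

```javascript
// Build the _fti search URL for a given index and set of query parameters.
// URLSearchParams percent-encodes the values (":" becomes "%3A").
function searchUrl(base, index, params) {
  const query = new URLSearchParams(params).toString();
  return `${base}/${index}?${query}`;
}

// Turn each hit into a short display string using its stored fields.
function summarize(response) {
  return response.rows.map(
    (row) => `${row.fields.name} <${row.fields.email}> (score ${row.score.toFixed(2)})`
  );
}

const url = searchUrl("http://localhost:5984/mydb/_fti/search", "global", {
  q: "name:rehfeld",
  skip: 0,
  limit: 25,
});

// Abridged copy of the response shown above.
const sampleResponse = {
  q: "name:rehfeld",
  skip: 0,
  limit: 25,
  total_rows: 1,
  rows: [
    {
      id: "9db68c69726e486b811859937fbb6b09",
      score: 4.520571708679199,
      fields: { name: "Martin Rehfeld", email: "martin.rehfeld@glnetworks.de" },
    },
  ],
};

console.log(url);
console.log(summarize(sampleResponse));
```

Only fields indexed with store "yes" appear under "fields", which is why the response carries name and email but not notes.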
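28. Querying from Ruby
require 'httparty'

class Search
  include HTTParty
  base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search"
  format :json

  # options[:index] names the fulltext index to query;
  # all remaining options are passed on as query parameters (q, skip, limit, ...)
  def self.query(options = {})
    index = options.delete(:index)
    get("/#{index}", :query => options)
  end
end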
29. Controller / Pagination
class SearchController < ApplicationController
  HITS_PER_PAGE = 10

  def index
    result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE))
    @hits = WillPaginate::Collection.create(params[:page] || 1, HITS_PER_PAGE,
                                            result['total_rows']) do |pager|
      pager.replace(result['rows'])
    end
  end

  private

  # number of hits to skip for the requested page (pages are 1-based)
  def skip
    params[:page] ? (params[:page].to_i - 1) * HITS_PER_PAGE : 0
  end
end
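The pagination arithmetic in the controller is independent of Rails and can be checked in isolation. The sketch below restates it; `skipFor` and `totalPages` are illustrative names. Unlike the Ruby `skip` helper, this version also falls back to 0 for page values below 1 or non-numeric input, which is an addition for the sketch and not something the slide does.

```javascript
const HITS_PER_PAGE = 10;

// Number of hits to skip for a 1-based page number (string or number).
// Missing, non-numeric, or out-of-range pages are treated as page 1.
function skipFor(page) {
  const n = Number.parseInt(page, 10);
  return Number.isInteger(n) && n > 1 ? (n - 1) * HITS_PER_PAGE : 0;
}

// Number of pages needed to show total_rows hits.
function totalPages(totalRows) {
  return Math.ceil(totalRows / HITS_PER_PAGE);
}

console.log(skipFor("3"), totalPages(25));
```

The `skip` value and `HITS_PER_PAGE` correspond to the `skip` and `limit` query parameters echoed back in the search response, and `total_rows` from that response drives the page count.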