Lucene in Action
Using Lucene to build high-performance systems
Гавриленко Евгений (Evgeny Gavrilenko)
Lead Developer, Artezio
Lucene
• What is it? An open-source full-text search library (Java, ported to .NET as Lucene.Net)
• Twitter: 1 billion queries per day
• hh.ru: 400 queries per second
• LinkedIn, FedEx…
Core indexing components
• IndexWriter
• Directory (FSDirectory, RAMDirectory)
• Analyzer
• Document
• Field / Multivalued fields
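Conceptually, what IndexWriter builds from the analyzed fields of each Document is an inverted index: for every term, the list of documents that contain it. The sketch below is a toy, in-memory version in plain C#. The class `ToyInvertedIndex` and its split-and-lowercase tokenizer are hypothetical stand-ins for Lucene's Analyzer and on-disk segment format. It shows why an AND query reduces to intersecting posting lists instead of scanning every document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index: term -> sorted set of document ids (hypothetical, not Lucene's API)
public class ToyInvertedIndex
{
    private readonly Dictionary<string, SortedSet<int>> _postings =
        new Dictionary<string, SortedSet<int>>();

    // Crude stand-in for an Analyzer: split on spaces and punctuation, then lowercase
    private static IEnumerable<string> Analyze(string text) =>
        text.Split(new[] { ' ', '.', ',', '!', '?' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant());

    public void AddDocument(int docId, string text)
    {
        foreach (var term in Analyze(text))
        {
            if (!_postings.TryGetValue(term, out var docs))
                _postings[term] = docs = new SortedSet<int>();
            docs.Add(docId);
        }
    }

    // AND query: intersect the posting lists of all query terms
    public IReadOnlyList<int> SearchAll(string query)
    {
        SortedSet<int> result = null;
        foreach (var term in Analyze(query))
        {
            if (!_postings.TryGetValue(term, out var docs))
                return Array.Empty<int>();   // one missing term means no document matches
            if (result == null) result = new SortedSet<int>(docs);
            else result.IntersectWith(docs);
        }
        return result == null ? (IReadOnlyList<int>)Array.Empty<int>() : result.ToList();
    }
}
```

For example, after indexing "Lucene in Action" as document 1 and "Action movies" as document 3, `SearchAll("lucene action")` only touches the posting lists for "lucene" and "action" and returns document 1.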
Building an index
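using System;
using Lucene.Net.Analysis.Ru;   // RussianAnalyzer, from the Lucene.Net contrib analyzers package
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// In-memory index for the demo; FSDirectory gives a persistent on-disk index
var directory = new RAMDirectory();
//var directory = FSDirectory.Open("/tmp/testindex");
var analyzer = new RussianAnalyzer(Version.LUCENE_30);
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    for (var i = 0; i < 1000000; i++)
    {
        var doc = new Document();
        // Exact-match key: indexed as a single term, without norms
        doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        // Two values under one name make "text" a multivalued field ("строка" = "line")
        doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        // Progress report every 100,000 documents
        if ((i + 1) % 100000 == 0)
            Console.WriteLine("[{1}]: {0} documents saved.", i + 1, DateTime.Now);
    }
    // Merge all segments into one: expensive, but speeds up later searches
    writer.Optimize();
}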
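For bulk loads like the one above, the loop allocates a new Document and three Field objects per iteration. A common Lucene 3.x speed-up is to reuse one Document and its Field instances, overwriting values with `Field.SetValue`, and to give the writer a larger RAM buffer before it flushes a segment. The sketch below assumes the Lucene.Net 3.0.3 NuGet package. The `BulkIndexer` class, the English field text and the 64 MB buffer size are illustrative choices, not measured recommendations.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper: indexes `count` documents while reusing one Document and its Fields
public static class BulkIndexer
{
    public static void IndexMany(Lucene.Net.Store.Directory directory, int count)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Buffer more documents in RAM before flushing a segment (illustrative value)
            writer.SetRAMBufferSizeMB(64);

            // Create the fields once; only their values change per document
            var idField = new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
            var textField = new Field("text", "", Field.Store.YES, Field.Index.ANALYZED);
            var doc = new Document();
            doc.Add(idField);
            doc.Add(textField);

            for (var i = 0; i < count; i++)
            {
                // Overwrite the values instead of allocating new objects
                idField.SetValue(i.ToString());
                textField.SetValue(string.Format("{0} line.", i));
                writer.AddDocument(doc);
            }
            writer.Commit();
        }
    }
}
```

The writer copies what it needs from each document inside `AddDocument`, which is why the same instances can be refilled for the next iteration.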
Data schema
using Lucene.Net.Documents;

int i = 1; double value = 19.99;   // hypothetical sample values
// doc1: id, text and a numeric field (indexed for range queries, not stored)
var doc1 = new Document();
doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
var field = new NumericField("numericField1", Field.Store.NO, true);
doc1.Add(field.SetDoubleValue(value));
// doc2: a different set of fields in the same index; no fixed schema is required
var doc2 = new Document();
doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));
Core search components
• IndexSearcher/MultiSearcher/ParallelMultiSearcher
• Term
• Query
• TermQuery
• TopDocs
Query
• TermQuery
• MultiFieldQueryParser
• BooleanQuery
• NumericRangeQuery
• SpanQuery
• …
• QueryParser
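Query types can be built in code instead of parsed from a string. They can also be combined. The sketch below joins a TermQuery and a NumericRangeQuery in a BooleanQuery, with both clauses required. It assumes the Lucene.Net 3.0.3 package. The class name and the field names `text` (an analyzed text field) and `price` (a NumericField holding doubles) are hypothetical.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper; assumes "text" is an analyzed field and "price" a NumericField of doubles
public static class QueryExamples
{
    public static Query TextWithPriceRange(string word, double minPrice, double maxPrice)
    {
        var query = new BooleanQuery();
        // TermQuery matches one exact indexed term, so `word` must already be in analyzed form
        query.Add(new TermQuery(new Term("text", word)), Occur.MUST);
        // NumericRangeQuery works on fields indexed with NumericField (both bounds inclusive here)
        query.Add(NumericRangeQuery.NewDoubleRange("price", minPrice, maxPrice, true, true), Occur.MUST);
        return query;
    }
}
```

Unlike QueryParser, nothing here runs through an analyzer. That is why the term must match the indexed form exactly.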
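Searching
using System;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// directory and analyzer are the ones used for indexing (RussianAnalyzer)
using (var reader = IndexReader.Open(directory, true))   // read-only reader
using (var searcher = new IndexSearcher(reader))
{
    var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
    // The Russian stemmer reduces "строку" and the indexed "строка" to a common stem
    var query = parser.Parse("20 строку");
    var hits = searcher.Search(query, 100);               // top 100 hits
    Console.WriteLine("total hits: {0}", hits.TotalHits);
    if (hits.TotalHits == 0) return;
    var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
    Console.WriteLine("value:{0}", rdoc.Get("text"));    // first value of the multivalued field
}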
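Searching with sorting
// sl: name of the field to sort by; indexDir: passed as SortField's "reverse" flag (true = descending)
Sort indexSort;
switch (sl)
{
    case "barcode":
    case "code":
        // String sort: the field should be indexed untokenized (one term per document)
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
    case "price":
        indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
        break;
    default:
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
}
...
// Still compute relevance scores when sorting by field (doTrackScores = true, doMaxScore = false)
searcher.SetDefaultFieldSortScoring(true, false);
var hits = searcher.Search(query, filter, count, indexSort);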
Paging
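The slide has only its title. A common way to page with the Lucene.Net 3.0 search methods shown above is to request the top (pageIndex + 1) * pageSize hits and skip the earlier pages. That means deep pages cost more, because every preceding hit is collected again. The helper below is a minimal, library-independent sketch of that arithmetic. The `Paging` class and its 0-based page index are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical paging helper (0-based page index)
public static class Paging
{
    // How many top hits to request so that page `pageIndex` is fully covered
    public static int TopN(int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (pageIndex + 1) * pageSize;
    }

    // Cut one page out of the ranked hits (for example TopDocs.ScoreDocs)
    public static IReadOnlyList<T> Slice<T>(IReadOnlyList<T> rankedHits, int pageIndex, int pageSize) =>
        rankedHits.Skip(pageIndex * pageSize).Take(pageSize).ToList();

    // Number of pages needed to show totalHits results, rounding up
    public static int PageCount(int totalHits, int pageSize) =>
        (totalHits + pageSize - 1) / pageSize;
}
```

With Lucene this would be used roughly as `var hits = searcher.Search(query, Paging.TopN(page, 20));` followed by `Paging.Slice(hits.ScoreDocs, page, 20)`. `hits.TotalHits` feeds `PageCount` for the pager UI.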
Analyzers
• StandardAnalyzer
• SnowballAnalyzer
• KeywordAnalyzer
• WhitespaceAnalyzer
• RussianAnalyzer
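The quickest way to see how these analyzers differ is to print the terms each one produces for the same text. The sketch below consumes an analyzer's TokenStream through the attribute API. It assumes the Lucene.Net 3.0.3 package, where the term text is exposed via `ITermAttribute` in the `Lucene.Net.Analysis.Tokenattributes` namespace. The `AnalyzerInspector` class is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper: returns the terms an analyzer produces for a piece of text
public static class AnalyzerInspector
{
    public static List<string> Tokens(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using (var stream = analyzer.TokenStream("text", new StringReader(text)))
        {
            // The attribute instance is updated in place on every IncrementToken call
            var termAttr = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
                terms.Add(termAttr.Term);
            stream.End();
        }
        return terms;
    }
}
```

For "The Quick Brown Fox", StandardAnalyzer lowercases and drops the English stop word "the". WhitespaceAnalyzer keeps "The" and "Quick" as-is. KeywordAnalyzer emits the whole input as a single term, which suits ids and codes.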
Using Lucene in e-commerce
[Diagram: Ecommerce DB, Service/Daemon, Lucene Index, search service, Search backend]
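In this setup a service or daemon keeps the Lucene index in step with the e-commerce database. The sketch below shows one possible batch step. It assumes the Lucene.Net 3.0.3 package, and the `ProductRow` shape, the `IndexSync` class and the field names are hypothetical. Changed rows are written with `IndexWriter.UpdateDocument`, which deletes any document with the same id term and adds the new version. Rows flagged as deleted are removed, and the batch ends with a commit so that newly opened readers in the search service see the changes.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical row shape read from the e-commerce database
public class ProductRow
{
    public string Id { get; set; }
    public string Name { get; set; }
    public double Price { get; set; }
    public bool Deleted { get; set; }
}

// Hypothetical sync step, called periodically by the daemon with the rows changed since the last run
public static class IndexSync
{
    public static void ApplyChanges(IndexWriter writer, IEnumerable<ProductRow> changedRows)
    {
        foreach (var row in changedRows)
        {
            var key = new Term("id", row.Id);   // "id" is indexed untokenized, so it works as a key
            if (row.Deleted)
            {
                writer.DeleteDocuments(key);
                continue;
            }
            var doc = new Document();
            doc.Add(new Field("id", row.Id, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            doc.Add(new Field("name", row.Name, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new NumericField("price", Field.Store.YES, true).SetDoubleValue(row.Price));
            // Replaces any existing document with the same id (delete + add)
            writer.UpdateDocument(key, doc);
        }
        // Make the batch durable and visible to readers opened afterwards
        writer.Commit();
    }
}
```

Keeping a single long-lived IndexWriter in the daemon and committing per batch avoids reopening the index for every change.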
Linq to Lucene
public class Article
{
    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Author { get; set; }

    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Title { get; set; }

    public DateTimeOffset PublishDate { get; set; }

    [NumericField]
    public long Id { get; set; }

    // Stored for display, but not indexed (not searchable)
    [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
    public string BodyText { get; set; }

    // Catch-all search field "text": not stored, analyzed with a stemming analyzer
    [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
    public string SearchText
    {
        get { return string.Join(" ", new[] {Author, Title, BodyText}); }
    }
}
Linq to Lucene
var directory = new RAMDirectory();
var provider = new LuceneDataProvider(directory, Version.LUCENE_30);

// Writes go through a session; pending changes are committed when it is disposed
using (var session = provider.OpenSession<Article>())
{
    session.Add(new Article {Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow});
}

var articles = provider.AsQueryable<Article>();
var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));
var articlesByJohn = from a in articles
                     where a.Author == "John Doe" && a.PublishDate > threshold
                     orderby a.Title
                     select a;
Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());

var searchResults = from a in articles
                    where a.SearchText == "some search query"
                    select a;
Console.WriteLine("Search Results: " + searchResults.Count());
Useful resources
• Lucene http://lucene.apache.org/
• Lucene.Net http://lucenenet.apache.org
• Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq
• “Lucene in Action” http://it-ebooks.info/book/2112
