Java Indexing and Searching, by Shay Sofer & Evgeny Borisov
Agenda: Motivation, Lucene Intro, Hibernate Search, Indexing, Searching, Scoring, Alternatives
Motivation: What is full text search, and why do I need it?
Use case: a “Book” table, searched for “Good practices for Gava” (sic).
Full Text Search: We'd like to index the information efficiently and answer queries using that index. This need is more common than you think.
Full Text Search Solutions: (1) A full text search engine integrated in the database, e.g. DBSight, recent versions of MySQL, MS SQL Server, Oracle Text, etc. (2) Out-of-the-box search appliances, e.g. Google Search Appliance. (3) Third-party libraries.
Lucene Intro
Apache Lucene: The most popular full text search library. Scalable and high performance. Around for about 9 years. Open source, supported by the Apache Software Foundation.
Lucene's Features: “Word-oriented” search. Powerful query syntax: wildcards, typos, proximity search. Sorting by relevance (Lucene's scoring algorithm) or by any other field. Fast searching and fast indexing (inverted index).
Inverted Index (figure). DB rows by id: 0 = Head First Java; 1 = Best of the best of the best; 2 = Chuck Norris in action; 3 = JBoss in action.
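To make the inverted index idea concrete, here is a minimal plain-Java sketch (not Lucene's actual data structure) that maps each lower-cased word to the ids of the rows containing it, using the four titles from the slide. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {

    // Maps each lower-cased word to the sorted ids of the documents containing it.
    static Map<String, SortedSet<Integer>> build(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String word : docs.get(docId).toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty first token
                }
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Head First Java",
                "Best of the best of the best",
                "Chuck Norris in action",
                "JBoss in action");
        Map<String, SortedSet<Integer>> index = build(titles);
        // A lookup is a single map access instead of a scan over every row.
        System.out.println("action -> " + index.get("action"));
        System.out.println("best   -> " + index.get("best"));
    }
}
```

Looking up a word returns the matching row ids directly, which is why searching stays fast as the table grows.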
Basic Definitions: A Field is a key + value pair; the value is always represented as a String (textual). A Document can contain as many Fields as we'd like. Lucene's index is a collection of Documents.
Using the Lucene API…
    IndexSearcher is = new IndexSearcher("BookIndex");
    QueryParser parser = new QueryParser("title", analyzer);
    Query query = parser.parse("Good practices for Gava");
    return is.search(query);
OO domain model vs. Lucene's index structure
Object vs. flat text mismatches: The structural mismatch: converting objects to strings and vice versa; no representation of relations between Documents. The synchronization mismatch: the DB must be kept in sync with the index. The retrieval mismatch: retrieving documents (= pairs of key + value) rather than objects.
Hibernate Search Emmanuel Bernard
Hibernate Search: Leverages ORM and Lucene together to solve those mismatches. Complements Hibernate Core by providing FTS on persistent domain models. It is effectively a bridge that hides the sometimes complex Lucene API usage. Open source.
Mapping: Document = Class (mapped POJO). Hibernate Search metadata can be described by annotations only. Regardless, you can still use Hibernate Core with XML descriptors (hbm files). Let's create our first mapping: Book.
    @Entity
    @Indexed
    public class Book implements Serializable {
        @Id
        private Long id;

        @Boost(2.0f)
        @Field
        private String title;

        @Field
        private String description;

        private String imageURL;

        @Field(index = Index.UN_TOKENIZED)
        private String isbn;
        …
    }
Bridges: Types are converted via a “Field Bridge”, a bridge between the Java type and its representation in Lucene (i.e. a String). Hibernate Search comes with a set of bridges for most standard types (numbers, both primitives and wrappers; Date; Class; etc.). They are extendable, of course.
Custom Bridges: We can use a field bridge…
    @FieldBridge(impl = MyPaddedFieldBridge.class,
                 params = {@Parameter(name = "padding", value = "5")})
    public Double getPrice() {
        return price;
    }
Or a class bridge, in case the data we want to index is more than just the field itself, e.g. a concatenation of 2 fields.
Custom Bridges: To create a custom bridge we need to implement the StringBridge interface, plus ParameterizedBridge to inject params.
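The slides do not show what MyPaddedFieldBridge does internally. One plausible reason to pad numbers is that the index stores strings, and strings compare character by character. The standalone sketch below (plain Java, deliberately not implementing the Hibernate Search interfaces; the class name and the restriction to non-negative whole numbers are assumptions) illustrates the effect of a padding width of 5.

```java
class PaddingSketch {

    // Left-pads a non-negative whole number with zeros up to the given width.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String raw = Long.toString(value);
        if (raw.length() > width) {
            throw new IllegalArgumentException("value is wider than the padding: " + raw);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = raw.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(raw).toString();
    }

    public static void main(String[] args) {
        // Unpadded, "42" sorts after "100" because '4' > '1'.
        System.out.println("42".compareTo("100") > 0);
        // Padded, string order agrees with numeric order.
        System.out.println(pad(42, 5).compareTo(pad(100, 5)) < 0);
    }
}
```

In a real bridge, this conversion would live in the method that turns the Java value into its String form, with the width injected as the "padding" parameter.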
Directory Providers: A Directory is where Lucene stores its index structure. Options: Filesystem Directory Provider, In-memory Directory Provider, Clustering.
Filesystem Directory Provider: The default. Most efficient. Limited only by the disk's free space. Can be easily replicated. Luke support.
In-memory Directory Provider: The index dies as soon as the SessionFactory is closed. Very useful when unit testing (alongside in-memory DBs). Data can be made persistent at any moment, if needed. Obviously, be aware of OutOfMemoryError.
Directory Providers Config Example:
    <!-- Hibernate Search Config -->
    <property name="hibernate.search.default.directory_provider">
        org.hibernate.search.store.FSDirectoryProvider
    </property>
    <property name="hibernate.search.com.alphacsp.Book.directory_provider">
        org.hibernate.search.store.RAMDirectoryProvider
    </property>
Relationships: Correlated queries: how do we navigate from one entity to another? Lucene doesn't support relationships between documents. Hibernate Search to the rescue: denormalization.
Relationships:
    @Entity
    @Indexed
    public class Book {
        @ManyToOne
        @IndexEmbedded
        private Author author;
    }

    @Entity
    @Indexed
    public class Author {
        @Field  // only indexed fields are embedded into Book's document
        private String firstName;
    }
Object navigation is easy (author.firstName).
Relationships, Denormalization Pitfall: Entities can be referenced by other entities.
Relationships, Solution: The association pointing back to the parent is marked with @ContainedIn.
    @Entity
    @Indexed
    public class Book {
        @ManyToOne
        @IndexEmbedded
        private Author author;
    }

    @Entity
    @Indexed
    public class Author {
        @OneToMany(mappedBy = "author")
        @ContainedIn
        private Set<Book> books;
    }
Analyzers: Responsible for tokenizing and filtering words. Tokenizing is not as trivial as it seems. Filtering clears the noise (case, stop words, etc.) and applies “other” operations. Creating a custom analyzer is easy. The default analyzer is the Standard Analyzer.
Standard Analyzer: StandardTokenizer splits words and removes punctuation. StandardFilter removes apostrophes and dots from acronyms. LowerCaseFilter lower-cases words. StopFilter eliminates common words.
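The tokenize, lower-case, stop-word chain can be imitated in a few lines of plain Java. This is a simplified sketch, not Lucene's StandardAnalyzer: the tokenizer here just splits on anything that is not a letter or digit, the acronym and apostrophe handling is left out, and the stop-word list is a tiny illustrative one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {

    // A tiny, illustrative stop-word list (an assumption of this sketch).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenizing: split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Lower-case filter.
            String token = raw.toLowerCase(Locale.ROOT);
            // Stop filter: drop common words.
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lord of the Rings"));
        System.out.println(analyze("Best of the best, in ACTION!"));
    }
}
```

The same chain has to run on both the indexed text and the query text, otherwise “Rings” in a query would never meet “rings” in the index.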
Other cool filters…
Approximative Search: The N-gram algorithm indexes sequences of n consecutive characters. Usually when a typo occurs, part of the word is still correct. “Encyclopedia” in 3-grams = Enc | ncy | cyc | ycl | clo | lop | ope | ped | edi | dia
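A minimal plain-Java sketch of the n-gram split, plus a count of the grams a misspelled word still shares with the correct one. The class and method names are invented for illustration; Lucene ships its own n-gram filters.

```java
import java.util.ArrayList;
import java.util.List;

class NGramSketch {

    // Returns every run of n consecutive characters, left to right.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Counts the n-grams of `a` that also occur among the n-grams of `b`.
    static int shared(String a, String b, int n) {
        List<String> other = ngrams(b, n);
        int count = 0;
        for (String gram : ngrams(a, n)) {
            if (other.contains(gram)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Encyclopedia", 3));
        // A typo ("Encyclopdia") still shares most trigrams with the correct word.
        System.out.println(shared("Encyclopedia", "Encyclopdia", 3));
    }
}
```

Because most grams survive a typo, a query for the misspelled word can still match documents indexed under the correct spelling.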
Phonetic Approximation: Algorithms for indexing words by their pronunciation. The most widely known algorithm is Soundex. Other available algorithms: RefinedSoundex, Metaphone, DoubleMetaphone.
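As an illustration of phonetic indexing, here is a compact sketch of the classic American Soundex rules in plain Java: keep the first letter, encode the following consonants as digits, collapse adjacent equal codes, and pad to four characters. It is a hand-written sketch for ASCII letters only, not the implementation a Lucene phonetic filter would use.

```java
import java.util.Locale;

class SoundexSketch {

    // Soundex digit for each letter A..Z; '0' marks letters that are not encoded.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // this sketch ignores anything but ASCII letters
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);        // the first letter is kept as-is
                lastCode = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue;             // H and W do not separate equal codes
            }
            if (code != '0' && code != lastCode) {
                out.append(code);
            }
            lastCode = code;          // a vowel resets the "previous code"
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0');          // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Words that sound alike receive the same code.
        System.out.println(encode("Robert") + " " + encode("Rupert"));
    }
}
```

Indexing the code next to the original word lets a query for one spelling find documents containing a similar-sounding one.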
Synonyms & Stemming: Synonyms: you can expand your synonym dictionary with your own rules (e.g. business-oriented words). Stemming is the process of reducing words to their stem, base or root form: “Fishing”, “Fisher”, “Fish” and “Fished” → “Fish”. Snowball stemming language: supports over 15 languages.
Additional Analyzers: Lucene is bundled with the basic analyzers, tokenizers and filters. More can be found in Lucene's contrib section and in Apache Solr.
Hebrew?: No free Hebrew analyzer for Lucene. Itamar Syn-Hershko: involved in the creation of CLucene (the C++ port of Lucene); creating a Hebrew analyzer as a side project; looking to join forces. itamar@divrei-tora.com
Hebrew example: שר הטבעות, גירסה ראשונה: אחוות הטבעת (“The Lord of the Rings, first version: The Fellowship of the Ring”)
Indexing, Transparent indexing: When has data changed? Which data has changed? When should the changed data be indexed? How to do it all efficiently? Hibernate Search will do it for you!
Indexing, on rollback (diagram, step 1): the Application starts a transaction on the Session (Entity Manager); insert/update/delete work goes to the DB and to a Queue in front of the Lucene Index.
Indexing, on rollback (diagram, step 2): the transaction failed and is rolled back.
Indexing, on commit (diagram): the transaction is committed and the queued insert/update/delete work reaches the Lucene Index (√).
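The queue in these diagrams can be pictured with a small plain-Java model: index work is only collected while the transaction is open, applied to the index on commit, and thrown away on rollback. This is a conceptual sketch with invented names, not Hibernate Search's internal implementation.

```java
import java.util.ArrayList;
import java.util.List;

class TransactionalIndexQueueSketch {

    // Work collected during the current transaction.
    private final List<String> pending = new ArrayList<>();
    // Stands in for the Lucene index in this sketch.
    private final List<String> index = new ArrayList<>();

    // Called for every insert/update/delete while the transaction is open.
    void enqueue(String work) {
        pending.add(work);
    }

    // On commit, the queued work is applied to the index.
    void commit() {
        index.addAll(pending);
        pending.clear();
    }

    // On rollback, the queued work is discarded and the index stays untouched.
    void rollback() {
        pending.clear();
    }

    List<String> indexed() {
        return index;
    }

    public static void main(String[] args) {
        TransactionalIndexQueueSketch queue = new TransactionalIndexQueueSketch();
        queue.enqueue("index Book#1");
        queue.rollback();
        queue.enqueue("index Book#2");
        queue.commit();
        System.out.println(queue.indexed());
    }
}
```

Deferring the index writes this way keeps the index from containing changes that the database never committed.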
hibernate.cfg.xml:
    <property name="org.hibernate.worker.execution">async</property>
    <property name="org.hibernate.worker.thread_pool.size">2</property>
    <property name="org.hibernate.worker.buffer_queue.max">10</property>
It's too late! I already have a database without Lucene!
Manual indexing: FullTextSession extends Hibernate Core's Session.
    Session session = sessionFactory.openSession();
    FullTextSession fts = Search.getFullTextSession(session);
Methods: index(Object entity), purge(Class entityType, Serializable id), purgeAll(Class entityType).
Manual indexing:
    tx = fullTextSession.beginTransaction();
    // read the data from the database
    Criteria criteria = fullTextSession.createCriteria(Book.class);
    List<Book> books = criteria.list();
    for (Book book : books) {
        // add (or re-add) each entity to the Lucene index
        fullTextSession.index(book);
    }
    tx.commit();
Removing objects from the Lucene index:
    tx = fullTextSession.beginTransaction();
    List<Integer> ids = getIds();
    for (Integer id : ids) {
        if (…) {
            fullTextSession.purge(Book.class, id);
        }
    }
    tx.commit();
    fullTextSession.purgeAll(Book.class);
Rrrr!!! I got an OutOfMemoryError!
    session.setFlushMode(FlushMode.MANUAL);
    session.setCacheMode(CacheMode.IGNORE);
    Transaction tx = session.beginTransaction();
    ScrollableResults results = session.createCriteria(Item.class)
            .scroll(ScrollMode.FORWARD_ONLY);
    int index = 0;
    while (results.next()) {
        index++;
        session.index(results.get(0));
        // BATCH_SIZE: the slide shows the value 100 next to this code
        if (index % BATCH_SIZE == 0) {
            session.flushToIndexes();  // apply the queued index work
            session.clear();           // free the memory held by the session
        }
    }
    tx.commit();
Searching
Lucene's Query Syntax:
    title:lord title:rings
    +title:lord +title:rings
    title:lord -author:Tolkien
    title:r?ngs
    title:r*gs
    title:"Lord of the Rings"
    title:"Lord Rings"~5
    title:rengs~0.8
    title:lord author:Tolkien^2
And more…
Querying: To build FTS queries we need to (1) create a Lucene query and (2) create a Hibernate Search query that wraps the Lucene query. Why? No need to build a framework around Lucene. Converting documents to objects happens transparently. Seamless integration with the Hibernate Core API.
Hibernate Queries Examples:
    String stringToSearch = "rings";
    Term term = new Term("title", stringToSearch);
    TermQuery query = new TermQuery(term);
    FullTextQuery hibQuery = session.createFullTextQuery(query, Book.class);
    List<Book> results = hibQuery.list();
WildcardQuery Example:
    String stringToSearch = "r??gs";
    Term term = new Term("title", stringToSearch);
    WildcardQuery query = new WildcardQuery(term);
    ...
    List<Book> results = hibQuery.list();
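To show what the wildcard characters mean, the sketch below translates a wildcard term into a java.util.regex pattern: `?` stands for exactly one character and `*` for any run of characters. This only illustrates the matching semantics; it is an assumption-free use of the JDK regex API, not how Lucene evaluates a wildcard query against its index.

```java
import java.util.regex.Pattern;

class WildcardSketch {

    // Translates a wildcard term into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');       // exactly one character
            } else if (c == '*') {
                regex.append(".*");      // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole term matches the wildcard expression.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("r??gs", "rings"));
        System.out.println(matches("r*gs", "rings"));
        System.out.println(matches("r?ngs", "rngs"));
    }
}
```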
HS Query Flowchart (diagram): the Client issues a Hibernate Search query; the query runs against the Lucene Index; the matching ids are received; the objects are loaded from the Persistence Context, with DB access if needed.
Querying tips: You can use list(), uniqueResult(), iterate() and scroll(), just like in Hibernate Core! Multistage search engine. Sorting. Explanation object.
Score
Mostly based on Salton's Vector Space Model.
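Term Rating: $w_i = \log_{10}\left(\frac{N}{n_i}\right)$, where $w_i$ is the term weight, $N$ is the number of documents in the index and $n_i$ is the total number of documents containing term $i$ (base 10 is consistent with the values in the scoring example). Example query: “best java in action books”.
Term Rating Calculation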
Scoring example. Search for: “best java in action books”. Documents: Head First Java; Best of the best of the best; Best examples from Hibernate in action; The best action of Chuck Norris. Term weights shown: 0.60206, 0.12494, 0.30103.
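The three values on the slide equal log10(4/1), log10(4/3) and log10(4/2), which fits a term weight of log10(N / n_i) over the four documents. The sketch below recomputes the weights for each query term under that reading; matching terms as whole, case-insensitive words and giving a weight of 0 to a term that appears nowhere are assumptions of this sketch, and the class name is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TermWeightSketch {

    // w_i = log10(N / n_i); a term found in no document gets weight 0 (assumption).
    static double weight(int totalDocs, int docsWithTerm) {
        if (docsWithTerm <= 0) {
            return 0.0;
        }
        return Math.log10((double) totalDocs / docsWithTerm);
    }

    // Counts the documents containing the term as a whole, case-insensitive word.
    static int docsContaining(String term, List<String> docs) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String doc : docs) {
            List<String> words = Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+"));
            if (words.contains(wanted)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Head First Java",
                "Best of the best of the best",
                "Best examples from Hibernate in action",
                "The best action of Chuck Norris");
        for (String term : "best java in action books".split(" ")) {
            int n = docsContaining(term, docs);
            System.out.printf("%-6s n=%d w=%.5f%n", term, n, weight(docs.size(), n));
        }
    }
}
```

Rare terms such as “java” weigh more than common ones such as “best”, so a document matching a rare query term is pushed up the ranking.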
Lucene's scoring approach: Conventional Boolean retrieval. Calculating the score only for matching documents. Customizing the similarity algorithm. Query boosting. Custom scoring algorithms.
Alternatives
Alternatives Shay Banon
Alternatives: Distributed. Spring support. Simple. Lucene based. Integrates with popular ORM frameworks. Configurable via XML or annotations. Local & external TX manager.
Apache Solr: Enterprise search server. Supports multiple protocols (XML, JSON, Ruby, etc.). Runs as a standalone full text search server within a servlet container, e.g. Tomcat. Heavily based on Lucene. JSA, the Java Search API (based on JPA). ODM (Object/Document Mapping). Spring integration (transactions).
Apache Solr: Powerful web administration interface. Can be tailored without any Java coding! Extensive plugin architecture. Server statistics exposed over JMX. Scalability: easily replicated.
Resources: Lucene; Lucene contrib; Hibernate Search; Hibernate Search in Action, by Emmanuel Bernard and John Griffin; Compass; Apache Solr.
Thank you! Q & A

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

JavaEdge09 : Java Indexing and Searching

  • 1. Java Indexing and Searching By: Shay Sofer & Evgeny Borisov
  • 2. Motivation Lucene Intro Hibernate Search Indexing Searching Scoring Alternatives Agenda
  • 3. Motivation What is Full Text Search and why do I need it?
  • 4. Motivation Use case “Book” table Good practices for Gava
  • 5. We’d like to: index the information efficiently; answer queries using that index. More common than you think. Full Text Search Motivation
  • 6. Integrated full text search engine in the database e.g. DBSight, Recent versions of MySQL, MS SQL Server, Oracle Text, etc Out of the box Search Appliances e.g. Google Search Appliance Third party libraries Full Text Search Solutions Motivation
  • 8. The most popular full text search library Scalable and high performance Around for about 9 years Open source Supported by the Apache Software Foundation Apache Lucene Lucene Intro
  • 10. “Word-oriented” search Powerful query syntax Wildcards, typos, proximity search. Sorting by relevance (Lucene’s scoring algorithm) or any other field Fast searching, fast indexing Inverted index. Lucene’s Features Lucene Intro
  • 11. Lucene Intro Inverted Index DB Head First Java 0 Best of the best of the best 1 Chuck Norris in action 2 JBoss in action 3
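The inverted index on this slide can be sketched in plain Java. This is only an illustration of the idea (word → ids of the documents containing it), not Lucene's actual data structure; the class name `InvertedIndexSketch` and its methods are made up for this example, and the four titles and their ids are the ones shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: maps each lower-cased word to the ids of the documents that contain it. */
final class InvertedIndexSketch {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Splits the text on non-letters and records docId under every word found. */
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(word, k -> new ArrayList<>());
            // A word may repeat inside one document ("best of the best"); store the id once.
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    /** Returns the ids of the documents containing the word, or an empty list. */
    List<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), List.of());
    }

    /** Builds the index for the four titles shown on the slide. */
    static InvertedIndexSketch slideExample() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "Head First Java");
        index.add(1, "Best of the best of the best");
        index.add(2, "Chuck Norris in action");
        index.add(3, "JBoss in action");
        return index;
    }

    public static void main(String[] args) {
        System.out.println(slideExample().lookup("action")); // [2, 3]
    }
}
```

A query for a word becomes a single map lookup instead of a scan over every title, which is why searching stays fast as the number of documents grows.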
  • 12. A Field is a key+value. Value is always represented as a String (textual). A Document can contain as many Fields as we’d like. Lucene’s index is a collection of Documents. Basic Definitions Lucene Intro
  • 13. Lucene Intro Using Lucene API… IndexSearcher is = new IndexSearcher("BookIndex"); QueryParser parser = new QueryParser("title", analyzer); Query query = parser.parse("Good practices for Gava"); return is.search(query);
  • 14. OO domain model Vs. Lucene’s Index structure Lucene Intro
  • 15. The Structural Mismatch: converting objects to strings and vice versa; no representation of relations between Documents. The Synchronization Mismatch: the DB must be kept in sync with the index. The Retrieval Mismatch: retrieving documents (= pairs of key + value) and not objects. Object vs Flat text mismatches Lucene Intro
  • 17. Leverages ORM and Lucene together to solve those mismatches Complements Hibernate Core by providing FTS on persistent domain models. It’s actually a bridge that hides the sometimes complex Lucene API usage. Open source. Hibernate Search
  • 18. Document = Class (Mapped POJO) Hibernate Search metadata can be described by Annotations only Regardless, you can still use Hibernate Core with XML descriptors (hbm files) Let’s create our first mapping – Book Mapping Hibernate Search
  • 19. @Entity @Indexed public class Book implements Serializable { @Id private Long id; @Boost(2.0f) @Field private String title; @Field private String description; private String imageURL; @Field(index=Index.UN_TOKENIZED) private String isbn; … } Hibernate Search
  • 20. Types will be converted via “Field Bridge”. It is a bridge between the Java type and its representation in Lucene (aka String) Hibernate Search comes with a set for most standard types (Numbers – primitives and wrappers, Date, Class etc) They are extendable, of course Bridges Hibernate Search
  • 21. We can use a field bridge… @FieldBridge(impl = MyPaddedFieldBridge.class, params = {@Parameter(name="padding", value="5")}) public Double getPrice() { return price; } Or a class bridge, in case the data we want to index is more than just the field itself, e.g. a concatenation of 2 fields. Custom Bridges Hibernate Search
  • 22. In order to create a custom bridge we need to implement the interface StringBridge ParameterizedBridge – to inject params Custom Bridges Hibernate Search
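Why would a bridge pad a number? Lucene compares field values as strings, so "100" sorts before "42" unless both are padded to the same width. The sketch below shows only that padding logic. It deliberately does not implement the real Hibernate Search interfaces: `PaddedBridgeSketch` is a hypothetical stand-in whose two method names mirror what `StringBridge` and `ParameterizedBridge` are understood to declare, so check the signatures against your Hibernate Search version. It also handles non-negative whole numbers only, which is a simplification of the `Double` price on the previous slide.

```java
import java.util.Map;

/**
 * Stand-in for a padded bridge. It does NOT implement the real Hibernate Search
 * interfaces; the two method names only mirror StringBridge and ParameterizedBridge.
 */
final class PaddedBridgeSketch {
    private int padding = 5; // default width, overridden by the "padding" parameter

    /** Receives the parameters declared on the annotation (here: padding). */
    void setParameterValues(Map<String, String> parameters) {
        String value = parameters.get("padding");
        if (value != null) {
            padding = Integer.parseInt(value);
        }
    }

    /** Left-pads a non-negative whole number with zeros up to the configured width. */
    String objectToString(Object object) {
        long number = ((Number) object).longValue();
        if (number < 0) {
            throw new IllegalArgumentException("negative values are out of scope for this sketch");
        }
        String raw = Long.toString(number);
        if (raw.length() > padding) {
            throw new IllegalArgumentException("value wider than padding: " + raw);
        }
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < padding; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }

    public static void main(String[] args) {
        PaddedBridgeSketch bridge = new PaddedBridgeSketch();
        bridge.setParameterValues(Map.of("padding", "5"));
        System.out.println(bridge.objectToString(42)); // 00042
    }
}
```

With padding in place, the string order of "00042" and "00100" agrees with the numeric order of 42 and 100, which is what range queries and sorting on the indexed field rely on.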
  • 23. Directory is where Lucene stores its index structure. Filesystem Directory Provider In-memory Directory Provider Clustering Directory Providers Hibernate Search
  • 24. Default Most efficient Limited only by the disk’s free space Can be easily replicated Luke support Filesystem Directory Provider Hibernate Search
  • 25. Index dies as soon as the SessionFactory is closed. Very useful when unit testing (alongside in-memory DBs). Data can be made persistent at any moment, if needed. Obviously, be aware of OutOfMemoryError. In-memory Directory Provider Hibernate Search
  • 26. <!-- Hibernate Search Config --> <property name="hibernate.search.default.directory_provider"> org.hibernate.search.store.FSDirectoryProvider </property> <property name="hibernate.search.com.alphacsp.Book.directory_provider"> org.hibernate.search.store.RAMDirectoryProvider </property> Directory Providers Config Example Hibernate Search
  • 27. Correlated queries - How do we navigate from one entity to another? Lucene doesn’t support relationships between documents Hibernate Search to the rescue - Denormalization Relationships Hibernate Search
  • 29. @Entity @Indexed public class Book { @ManyToOne @IndexEmbedded private Author author; } @Entity @Indexed public class Author { private String firstName; } Object navigation is easy (author.firstName) Relationships Hibernate Search
  • 30. Entities can be referenced by other entities. Relationships – Denormalization Pitfall Hibernate Search
  • 33. The solution: The association pointing back to the parent will be marked with @ContainedIn. @Entity @Indexed public class Book { @ManyToOne @IndexEmbedded private Author author; } @Entity @Indexed public class Author { @OneToMany(mappedBy="author") @ContainedIn private Set<Book> books; } Relationships – Solution Hibernate Search
  • 34. Responsible for tokenizing and filtering words. Tokenizing – not as trivial as it seems. Filtering – clearing the noise (case, stop words etc.) and applying “other” operations. Creating a custom analyzer is easy. The default analyzer is Standard Analyzer. Analyzers Hibernate Search
  • 35. StandardTokenizer: splits words and removes punctuation. StandardFilter: removes apostrophes and dots from acronyms. LowerCaseFilter: lower-cases words. StopFilter: eliminates common words. Standard Analyzer Hibernate Search
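The tokenize, lower-case, stop-filter chain described above can be approximated in a few lines of plain Java. This is a simplified sketch, not Lucene's StandardAnalyzer: `AnalyzerSketch` is a hypothetical name, the stop-word list is a small illustrative set, and the apostrophe and acronym handling of StandardFilter is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified analyzer chain: tokenize, lower-case, drop stop words. */
final class AnalyzerSketch {
    // Illustrative stop-word list; a real analyzer ships a longer one.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "of", "the");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenizing: split on anything that is neither a letter nor a digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Lower-casing, so that "Lord" and "lord" end up as the same term.
            String token = raw.toLowerCase(Locale.ROOT);
            // Stop filtering: common words carry little meaning for search.
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lord of the Rings")); // [lord, rings]
    }
}
```

The same chain has to run on both the indexed text and the query text; otherwise a query for "Rings" would never meet the indexed term "rings".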
  • 36. Other cool Filters…. Hibernate Search
  • 37. N-Gram algorithm – Indexing a sequence of n consecutive characters. Usually when a typo occurs, part of the word is still correct Encyclopedia in 3-grams = Enc | ncy | cyc | ycl | clo | lop | ope | ped | edi | dia Approximative Search Hibernate Search
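A minimal Java sketch of character n-grams, reproducing the "Encyclopedia" example from the slide. The class and method names are invented for this illustration, and the misspelling "Encyclopidia" is an assumed typo used to show how much of a word survives a single wrong letter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Character n-grams: every run of n consecutive characters in a word. */
final class NGramSketch {
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Counts the distinct n-grams that two words have in common. */
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = new HashSet<>(ngrams(a, n));
        shared.retainAll(new HashSet<>(ngrams(b, n)));
        return shared.size();
    }

    public static void main(String[] args) {
        // [Enc, ncy, cyc, ycl, clo, lop, ope, ped, edi, dia]
        System.out.println(ngrams("Encyclopedia", 3));
        // The typo changes one letter, yet 7 of the 10 trigrams still match.
        System.out.println(sharedGrams("Encyclopedia", "Encyclopidia", 3)); // 7
    }
}
```

Because most n-grams of a misspelled word still appear in the index, the correct document can be found even though the full word does not match.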
  • 38. Algorithms for indexing of words by their pronunciation The most widely known algorithm is Soundex Other Algorithms that are available : RefinedSoundex, Metaphone, DoubleMetaphone Phonetic Approximation Hibernate Search
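To make the idea of indexing by pronunciation concrete, here is a compact sketch of the classic American Soundex code: keep the first letter, encode the following consonants as digits, collapse repeated codes, and pad to four characters. `SoundexSketch` is a hypothetical class written for this example. For production use, a maintained library implementation is the safer choice.

```java
import java.util.Locale;

/** Classic American Soundex: first letter plus three digits. */
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not encoded (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        String upper = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < upper.length() && out.length() < 4; i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);       // the first letter is kept as-is
                lastCode = code;
            } else if (c == 'H' || c == 'W') {
                // H and W are skipped and do not separate two equal codes.
            } else if (code == '0') {
                lastCode = '0';      // a vowel separates equal codes
            } else {
                if (code != lastCode) {
                    out.append(code);
                }
                lastCode = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert")); // R163
        System.out.println(soundex("Rupert")); // R163
    }
}
```

Names that sound alike get the same code, so indexing the code next to the original word lets a search for "Rupert" also find "Robert".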
  • 39. Synonyms: You can expand your synonym dictionary with your own rules (e.g. business-oriented words). Stemming: Stemming is the process of reducing words to their stem, base or root form. “Fishing”, “Fisher”, “Fish” and “Fished” → “Fish”. Snowball stemming language – supports over 15 languages. Synonyms & Stemming Hibernate Search
  • 40. Lucene is bundled with the basic analyzers, tokenizers and filters. More can be found at Lucene’s contribution part and at Apache-Solr Additional Analyzers Hibernate Search
  • 41. No free Hebrew analyzer for Lucene. Itamar Syn-Hershko: involved in the creation of CLucene (the C++ port of Lucene); creating a Hebrew analyzer as a side project; looking to join forces. itamar@divrei-tora.com Hebrew? Hibernate Search
  • 42. Hibernate Search שר הטבעות, גירסה ראשונה:אחוות הטבעת (Hebrew: “The Lord of the Rings, first version: The Fellowship of the Ring”)
  • 43. Motivation Lucene Intro Hibernate Search Indexing Searching Scoring Alternatives Agenda
  • 44. When data has changed? Which data has changed? When to index the changing data? How to do it all efficiently? Hibernate Search will do it for you! Transparent indexing Indexing
  • 45. Indexing – On Rollback Application Queue DB Start Transaction Session (Entity Manager) Insert/update delete Lucene Index
  • 46. Indexing – On Rollback Transaction failed Application Queue DB Rollback Start Transaction Session (Entity Manager) Insert/update delete Lucene Index
  • 47. Indexing – On Commit Transaction Committed Application Queue DB Session (Entity Manager) Insert/update delete √ Lucene Index
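The three slides above describe index changes being queued during the transaction, applied to the Lucene index on commit, and thrown away on rollback. The sketch below models only that behaviour with plain collections. All names are hypothetical, and nothing here is Hibernate Search's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Index work is collected per transaction and applied only on commit. */
final class TransactionalIndexQueueSketch {
    private final List<String> pending = new ArrayList<>(); // the per-transaction queue
    private final List<String> index = new ArrayList<>();   // stands in for the Lucene index

    /** Called for every insert/update/delete seen by the session. */
    void enqueue(String work) {
        pending.add(work);
    }

    /** Transaction committed: apply the queued work to the index. */
    void commit() {
        index.addAll(pending);
        pending.clear();
    }

    /** Transaction failed: drop the queued work, the index stays untouched. */
    void rollback() {
        pending.clear();
    }

    List<String> indexContents() {
        return List.copyOf(index);
    }

    public static void main(String[] args) {
        TransactionalIndexQueueSketch queue = new TransactionalIndexQueueSketch();
        queue.enqueue("add Book#1");
        queue.rollback();
        queue.enqueue("add Book#2");
        queue.commit();
        System.out.println(queue.indexContents()); // [add Book#2]
    }
}
```

Deferring the index write until commit is what keeps the index from containing entities that never made it into the database.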
  • 48. <property name="org.hibernate.worker.execution">async</property> <property name="org.hibernate.worker.thread_pool.size">2</property> <property name="org.hibernate.worker.buffer_queue.max">10</property> hibernate.cfg.xml Indexing
  • 49. Indexing It’s too late! I already have a database without Lucene!
  • 50. FullTextSession extends the Session of Hibernate Core. Session session = sessionFactory.openSession(); FullTextSession fts = Search.getFullTextSession(session); index(Object entity) purge(Class entityType, Serializable id) purgeAll(Class entityType) Manual indexing Indexing
  • 51. tx = fullTextSession.beginTransaction(); /* read the data from the database */ Criteria criteria = fullTextSession.createCriteria(Book.class); List<Book> books = criteria.list(); for (Book book : books) { fullTextSession.index(book); } tx.commit(); Manual indexing Indexing
  • 52. tx = fullTextSession.beginTransaction(); List<Integer> ids = getIds(); for (Integer id : ids) { if(…){ fullTextSession.purge(Book.class, id ); } } tx.commit(); fullTextSession.purgeAll(Book.class); Removing objects from the Lucene index Indexing
  • 53. Indexing Rrrr!!! I got an OutOfMemoryError!
  • 54. session.setFlushMode(FlushMode.MANUAL); session.setCacheMode(CacheMode.IGNORE); Transaction tx = session.beginTransaction(); ScrollableResults results = session.createCriteria(Item.class).scroll(ScrollMode.FORWARD_ONLY); int index = 0; while (results.next()) { index++; session.index(results.get(0)); if (index % BATCH_SIZE == 0) { session.flushToIndexes(); session.clear(); } } tx.commit(); Indexing 100
  • 56. title:lord title:rings | +title:lord +title:rings | title:lord -author:Tolkien | title:r?ngs | title:r*gs | title:"Lord of the Rings" | title:"Lord Rings"~5 | title:rengs~0.8 | title:lord author:Tolkien^2 | And more… Lucene’s Query Syntax Searching
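The two wildcard forms in the list above behave as follows: `?` stands for exactly one character and `*` for any run of characters, including none. The sketch below mimics that matching by translating the pattern to a regular expression. This is only an illustration; Lucene evaluates wildcard queries against the terms in its index rather than through `java.util.regex`, and `WildcardSketch` is a made-up name.

```java
import java.util.regex.Pattern;

/** Translates a ?/* wildcard pattern into a regular expression. */
final class WildcardSketch {
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');   // exactly one character
            } else if (c == '*') {
                regex.append(".*");  // any run of characters, possibly empty
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("r?ngs", "rings")); // true
        System.out.println(matches("r?ngs", "rngs"));  // false
        System.out.println(matches("r*gs", "rings"));  // true
    }
}
```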
  • 57. To build FTS queries we need to: Create a Lucene query Create a Hibernate Search query that wraps the Lucene query Why? No need to build framework around Lucene Converting document to object happens transparently. Seamless integration with Hibernate Core API Querying Searching
  • 58. String stringToSearch = "rings"; Term term = new Term("title", stringToSearch); TermQuery query = new TermQuery(term); FullTextQuery hibQuery = session.createFullTextQuery(query, Book.class); List<Book> results = hibQuery.list(); Hibernate Queries Examples Searching
  • 59. String stringToSearch = "r??gs"; Term term = new Term("title", stringToSearch); WildcardQuery query = new WildcardQuery(term); ... List<Book> results = hibQuery.list(); WildcardQuery Example Searching
  • 60. Motivation Use case Book table Good practices for Gava
  • 61. HS Query Flowchart Searching Hibernate Search Query Query the index Lucene Index Client Receive matching ids Loads objects from the Persistence Context DB DB access (if needed) Persistence Context
  • 62. You can use list(), uniqueResult(), iterate(), scroll() – just like in Hibernate Core ! Multistage search engine Sorting Explanation object Querying tips Searching
  • 63. Score
  • 64. Most based on Vector Space Model of Salton Score
  • 66. Term Rating: w_i = log(N / n_i), where w_i is the term weight, N is the number of documents in the index, and n_i is the total number of documents containing term “i”. Query: best java in action books. Score
  • 68. Head First Java Best of the best of the best Best examples from Hibernate in action The best action of Chuck Norris Scoring example Score Search for: “best java in action books” 0.60206 0.12494 0.30103
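The three numbers on this slide are consistent with the term weight described earlier (the logarithm of the number of documents in the index divided by the number of documents containing the term), taken in base 10 over the four titles: log10(4/1) ≈ 0.60206, log10(4/3) ≈ 0.12494 and log10(4/2) ≈ 0.30103. The sketch below recomputes them. The base-10 logarithm, the whitespace tokenization, and the choice to give a term found in no document a weight of 0 are assumptions of this example, not statements about Lucene's scoring code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Term weight as log10(N / n_i) over the four titles from the slide. */
final class IdfSketch {
    static final List<String> TITLES = List.of(
            "Head First Java",
            "Best of the best of the best",
            "Best examples from Hibernate in action",
            "The best action of Chuck Norris");

    /** Number of documents that contain the term at least once. */
    static int documentFrequency(String term, List<String> docs) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String doc : docs) {
            List<String> words = Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+"));
            if (words.contains(needle)) {
                count++;
            }
        }
        return count;
    }

    /** log10(N / n_i); a term that occurs nowhere gets weight 0 (assumption of this sketch). */
    static double idf(String term, List<String> docs) {
        int df = documentFrequency(term, docs);
        return df == 0 ? 0.0 : Math.log10((double) docs.size() / df);
    }

    public static void main(String[] args) {
        for (String term : "best java in action books".split(" ")) {
            System.out.printf(Locale.ROOT, "%s -> %.5f%n", term, idf(term, TITLES));
        }
    }
}
```

Rare terms such as "java" weigh far more than "best", which appears in three of the four titles, so a title matching a rare term is pushed up in the ranking.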
  • 69. Conventional Boolean retrieval Calculating score for only matching documents Customizing similarity algorithm Query boosting Custom scoring algorithms Lucene’s scoring approach Score
  • 72. Alternatives Distributed Spring support Simple Lucene based Integrates with popular ORM frameworks Configurable via XML or annotations Local & External TX Manager
  • 74. Enterprise Search Server Supports multiple protocols (xml, json, ruby, etc...) Runs as a standalone Full Text Search server within a servlet e.g. Tomcat Heavily based on Lucene JSA – Java Search API (based on JPA) ODM (Object/Document Mapping) Spring integration (Transactions) Apache Solr Alternatives
  • 75. Powerful Web Administration Interface Can be tailored without any Java coding! Extensive plugin architecture Server statistics exposed over JMX Scalability – easily replicated Apache Solr Alternatives
  • 76. Resources: Lucene; Lucene contrib part; Hibernate Search; Hibernate Search in Action / Emmanuel Bernard, John Griffin; Compass; Apache Solr

Editor's notes

  1. JIRA – search for issues. Eclipse – search for its documentation.
  2. Talk about Hibernate and ORM very briefly, then move on to the next slide.
  3. Execution – sync or async (default: sync). Thread_pool.size (default: 1). Buffer_queue.max (default: infinite). Be aware of OutOfMemoryError.