Presented by Patrick Beaucamp | Bpm-Conseil. See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
Vanilla, an open source business intelligence application by bpm-conseil.com, offers unique features such as report indexing through an embedded Lucene integration. Using Vanilla and Lucene, developers can manage both report indexing and external document indexing, which ultimately saves end users time when they search for specific keywords such as "product code" or "customer code". Vanilla can build on an existing Solr/Lucene installation that handles all the indexing processes, while Vanilla handles report and dashboard creation. During this presentation, attendees will learn how we moved from the embedded Lucene API to a Solr/Lucene platform, and about the technical and business benefits of this architecture in terms of clustering, caching and access modes.
How to Gain Greater Business Intelligence from Lucene/Solr
1. Patrick Beaucamp
Founder of the Vanilla Project
Mail: Patrick.beaucamp@bpm-conseil.com
How to Gain Greater Business Intelligence
with Vanilla from Solr/Lucene
LuceneRevolution, Boston 1
2. Presentation Agenda
Vanilla powered by Lucene
- Report Indexing, Search Interface
- External document management
- Evolution & constraints
Steps to Solr/Lucene Adoption
- Indexing, Storage, Search
- Embedded Solr/Lucene
- External Solr/Lucene Platform
Key Benefits for Vanilla powered by Solr/Lucene
- Cluster Architecture
- Cache Mechanism
- Support for enhanced search language
3. Some Vanilla Features
Flash maps and charts: reports, cubes and dashboards
Vanilla apps: Android and iPhone
4. Vanilla Powered by Lucene (1/6)
Vanilla is a full Business Intelligence Platform that provides:
- Reporting, OLAP, Dashboards, KPIs, Map Visualisation
- ETL, Workflow, Document Management, Search Engine
5. Vanilla Powered by Lucene (2/6)
Report Indexing
- The search engine is Apache Lucene (summer 2010)
- External documents and Vanilla reports are indexed
- Different indexing strategies for documents:
– No indexing
– Real-time indexing
– Late indexing
Two modules manage the indexing strategy:
- Enterprise Services, to set document properties
- Norparena, to manage indexing
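The three strategies can be pictured with a minimal sketch. The names `IndexingStrategy`, `StrategyIndexer` and `runLateIndexing` are hypothetical (the deck does not show Vanilla's code), the "index" is only a set of document ids standing in for Lucene, and late indexing is assumed to mean a deferred batch pass.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical names: the strategy would be set as a document property.
enum IndexingStrategy { NONE, REAL_TIME, LATE }

class StrategyIndexer {
    // Stand-in for the search index: just the ids of indexed documents.
    private final Set<String> index = new LinkedHashSet<>();
    // Documents waiting for the next deferred (batch) indexing pass.
    private final Deque<String> pending = new ArrayDeque<>();

    /** Called whenever a document or report is stored. */
    void onDocumentStored(String docId, IndexingStrategy strategy) {
        switch (strategy) {
            case REAL_TIME:
                index.add(docId);   // searchable immediately
                break;
            case LATE:
                pending.add(docId); // searchable after the next batch run
                break;
            case NONE:
            default:
                break;              // never indexed, never searchable
        }
    }

    /** Batch pass (e.g. a scheduled job) that indexes every deferred document. */
    int runLateIndexing() {
        int count = 0;
        while (!pending.isEmpty()) {
            index.add(pending.poll());
            count++;
        }
        return count;
    }

    boolean isIndexed(String docId) { return index.contains(docId); }

    int pendingCount() { return pending.size(); }
}
```

Late indexing trades freshness for lower load at save time: the document becomes searchable only once the batch pass has run.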
6. Vanilla Powered by Lucene (3/6)
Search Interface
- Search interface available from the Vanilla Portal
- Search runs against the Lucene index (inside Vanilla)
- Search results are combined with document security
– The result list contains all documents
– Documents are ordered by popularity
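A minimal sketch of combining search results with document security and popularity ordering, assuming each raw hit carries the groups allowed to read it and a popularity counter. `SearchHit` and `SecuredSearch` are hypothetical names, not Vanilla classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a raw hit returned by the search index.
class SearchHit {
    final String docId;
    final int popularity;            // assumption: e.g. how often the document was opened
    final Set<String> allowedGroups; // groups allowed to read the document

    SearchHit(String docId, int popularity, String... allowedGroups) {
        this.docId = docId;
        this.popularity = popularity;
        this.allowedGroups = new HashSet<>(Arrays.asList(allowedGroups));
    }
}

class SecuredSearch {
    /** Drops hits the user may not read, then orders the rest by descending popularity. */
    static List<String> visibleIds(List<SearchHit> rawHits, Set<String> userGroups) {
        List<SearchHit> visible = new ArrayList<>();
        for (SearchHit hit : rawHits) {
            // Visible if at least one of the user's groups is allowed on the document.
            if (!Collections.disjoint(hit.allowedGroups, userGroups)) {
                visible.add(hit);
            }
        }
        visible.sort(Comparator.comparingInt((SearchHit h) -> h.popularity).reversed());
        List<String> ids = new ArrayList<>();
        for (SearchHit hit : visible) {
            ids.add(hit.docId);
        }
        return ids;
    }
}
```

Filtering after the search keeps the index free of security data, at the cost of fetching hits the user will never see.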
7. Vanilla Powered by Lucene (4/6)
External document management
- Various document formats are supported (Lucene)
- Additional properties can be set on documents for later use as search criteria
- Check-in / check-out on documents for versioning
- Search runs on the latest document version
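The versioning rules above can be sketched as follows, assuming a single-writer lock for check-out and plain substring matching in place of Lucene. `VersionedDocument` and `LatestVersionSearch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VersionedDocument {
    final String id;
    // Additional properties, usable later as search criteria.
    final Map<String, String> properties = new HashMap<>();
    // Content of every version, oldest first.
    private final List<String> versions = new ArrayList<>();
    private String checkedOutBy; // null when nobody holds the document

    VersionedDocument(String id, String initialContent) {
        this.id = id;
        versions.add(initialContent);
    }

    void checkOut(String user) {
        if (checkedOutBy != null) {
            throw new IllegalStateException(id + " is already checked out by " + checkedOutBy);
        }
        checkedOutBy = user;
    }

    /** Check-in by the user who checked out creates a new version and releases the lock. */
    void checkIn(String user, String newContent) {
        if (!user.equals(checkedOutBy)) {
            throw new IllegalStateException(user + " has not checked out " + id);
        }
        versions.add(newContent);
        checkedOutBy = null;
    }

    String latest() { return versions.get(versions.size() - 1); }

    int versionCount() { return versions.size(); }
}

class LatestVersionSearch {
    /** Keyword search on the latest version only, optionally filtered on one property. */
    static List<String> search(List<VersionedDocument> docs, String keyword,
                               String propKey, String propValue) {
        List<String> ids = new ArrayList<>();
        for (VersionedDocument d : docs) {
            boolean propOk = propKey == null || propValue.equals(d.properties.get(propKey));
            if (propOk && d.latest().contains(keyword)) {
                ids.add(d.id);
            }
        }
        return ids;
    }
}
```

Older versions stay stored but are never matched, so a keyword removed in a new version disappears from the results.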
8. Vanilla Powered by Lucene (5/6)
Evolution and constraints
- No clustering available for the search engine (embedded API), unlike Vanilla Report Services
- Limited query language and keywords (internal search)
- No cache for search result sets, unlike Vanilla datasets, which are cached with Memcached
- Customers ask for compliance with enterprise search engines → need to set up an external search architecture
9. Vanilla Powered by Lucene (6/6)
Embedded Lucene API inside the Vanilla Platform - Video
10. Steps to Solr/Lucene Adoption (1/9)
Solr/Lucene is the natural evolution of any embedded Lucene platform
Solr version: 3.5
Indexing
The Vanilla Lucene index can be transferred to and read by Solr/Lucene
(a Solr/Lucene index is not usable inside the Vanilla Platform)
Storage
Vanilla search indexes can be managed by a Solr/Lucene platform
Search
The search language is compatible
11. Steps to Solr/Lucene Adoption (2/9)
Embedded Solr/Lucene inside the Vanilla Platform
No changes needed in Vanilla code: use of the SolrJ API
Immediately provides additional features such as new keywords
Potential upgrade to Solr/Lucene Enterprise
12. Steps to Solr/Lucene Adoption (3/9)
From Embedded Lucene to Embedded Solr/Lucene inside the Vanilla Platform
13. Steps to Solr/Lucene Adoption (4/9)
Embedded Solr/Lucene inside the Vanilla Platform - Video
14. Steps to Solr/Lucene Adoption (5/9)
Solr/Lucene Platform with a Vanilla Platform
Vanilla code had to change to separate the document management, indexing
and search APIs → 10 man-days of work
Document Management API
Easy to move to CMIS compliance
Indexing & Search API
Solr/Lucene-oriented and compliant, but now open to any other search platform
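The split can be sketched as two narrow interfaces that the portal code depends on. All names here (`DocumentRepository`, `SearchService`, `DocumentPortal`) are hypothetical, and the in-memory substring search only stands in for a Solr-backed implementation, which would call SolrJ instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Document management concerns only (storage, versions, properties). */
interface DocumentRepository {
    void store(String id, String content);
    String load(String id);
}

/** Indexing and search concerns only (Solr/Lucene today, another platform later). */
interface SearchService {
    void index(String id, String content);
    List<String> search(String keyword);
}

class InMemoryRepository implements DocumentRepository {
    private final Map<String, String> docs = new LinkedHashMap<>();
    @Override public void store(String id, String content) { docs.put(id, content); }
    @Override public String load(String id) { return docs.get(id); }
}

/** Placeholder engine: case-insensitive substring match, in indexing order. */
class SubstringSearchService implements SearchService {
    private final Map<String, String> indexed = new LinkedHashMap<>();

    @Override public void index(String id, String content) {
        indexed.put(id, content.toLowerCase());
    }

    @Override public List<String> search(String keyword) {
        String k = keyword.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : indexed.entrySet()) {
            if (e.getValue().contains(k)) hits.add(e.getKey());
        }
        return hits;
    }
}

/** Portal-side code sees only the two interfaces, so either side can be swapped. */
class DocumentPortal {
    private final DocumentRepository repository;
    private final SearchService search;

    DocumentPortal(DocumentRepository repository, SearchService search) {
        this.repository = repository;
        this.search = search;
    }

    void publish(String id, String content) {
        repository.store(id, content); // document management
        search.index(id, content);     // indexing, on whatever platform is plugged in
    }

    List<String> find(String keyword) { return search.search(keyword); }

    String open(String id) { return repository.load(id); }
}
```

Moving to an external Solr server then means writing one new `SearchService` implementation; the repository side can independently move to a CMIS-based store.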
15. Steps to Solr/Lucene Adoption (6/9)
Coding Before
Example code (API) before the split
- Direct use of the Lucene API
- Parse the document content using Apache Tika
- Generate Lucene queries
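Generating Lucene queries usually means turning search criteria into classic Lucene query syntax. A minimal sketch, assuming criteria arrive as field/value pairs that must all match; `LuceneQueryStrings` is a hypothetical helper whose escaping mirrors the special characters of Lucene's classic query parser (in real code, `QueryParser.escape` does this).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LuceneQueryStrings {
    // Characters treated as operators by the classic Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every special character so the value is read literally. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    private static boolean hasWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) return true;
        }
        return false;
    }

    /** One field clause; multi-word values become phrase queries. */
    static String clause(String field, String value) {
        if (hasWhitespace(value)) {
            // Inside quotes only the backslash and the quote itself need escaping.
            return field + ":\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return field + ":" + escape(value);
    }

    /** All criteria must match: clauses joined with AND, in map iteration order. */
    static String allOf(Map<String, String> criteria) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            clauses.add(clause(c.getKey(), c.getValue()));
        }
        return String.join(" AND ", clauses);
    }
}
```

For example, `customerCode=C-42` and `content=product code` give `customerCode:C\-42 AND content:"product code"`; without escaping, the `-` in `C-42` would be read as a prohibit operator.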
16. Steps to Solr/Lucene Adoption (7/9)
Coding After
Example code (API) after the split
- Easy-to-use SolrJ API
- Distributed search
- Indexing with automatic parsing (using Apache Tika)
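Over HTTP, the two "after" features map to plain Solr request parameters: the `shards` parameter for distributed search, and the `/update/extract` handler (Solr Cell), which runs Apache Tika on the server while the file itself is sent as the POST body. The sketch below only builds the URLs, assuming a Solr 3.x instance at `http://localhost:8983/solr` and hypothetical shard hosts; with SolrJ the same requests are typically sent through `SolrQuery` and `ContentStreamUpdateRequest`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

class SolrHttpRequests {
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Distributed search: the receiving Solr fans the query out to every listed shard and merges hits. */
    static String distributedSelect(String baseUrl, String query, List<String> shards) {
        return baseUrl + "/select?q=" + encode(query)
                + "&shards=" + encode(String.join(",", shards));
    }

    /** Solr Cell: POST the raw file to this URL; Tika extracts the text server-side. */
    static String extract(String baseUrl, String docId) {
        return baseUrl + "/update/extract?literal.id=" + encode(docId) + "&commit=true";
    }
}
```

`literal.id` sets the unique key of the extracted document, and `commit=true` makes it searchable as soon as the request completes.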
17. Steps to Solr/Lucene Adoption (8/9)
Solr/Lucene Platform with Vanilla Platform - Screenshot
18. Steps to Solr/Lucene Adoption (9/9)
Solr/Lucene Platform with Vanilla Platform - Video
19. Key Benefits for Vanilla Powered
by Solr/Lucene (1/4)
Clustering Search Architecture, outside of Vanilla
The search results clustering implementation (CarrotClusteringEngine) is based on the Carrot2 framework.
20. Key Benefits for Vanilla Powered
by Solr/Lucene (2/4)
Richer query language for search
Solr uses the Lucene search library and extends it:
- A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
- Powerful Extensions to the Lucene Query Language
- Faceted Search and Filtering
- Geospatial Search
- Advanced, Configurable Text Analysis
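Faceting returns, next to the hits, a count of matching documents per value of a field. A toy in-memory illustration, assuming documents are flat field maps: `filter` plays the role of a filter query (`fq=type:report`) and `facetCounts` the role of `facet.field=region`. `FacetDemo` is a hypothetical name; Solr computes these counts from the index and sorts them by count by default, while this sketch sorts by value only for readability.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FacetDemo {
    /** Keeps documents whose field equals the value, like a filter query. */
    static List<Map<String, String>> filter(List<Map<String, String>> docs, String field, String value) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) out.add(d);
        }
        return out;
    }

    /** Counts documents per distinct value of the facet field. */
    static Map<String, Integer> facetCounts(List<Map<String, String>> docs, String facetField) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> d : docs) {
            String v = d.get(facetField);
            if (v != null) counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    /** Builds a document from alternating field names and values. */
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            m.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return m;
    }
}
```

In a BI portal, such counts let users narrow a search (for example, reports only, then one region) without typing a new query.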
21. Key Benefits for Vanilla Powered
by Solr/Lucene (3/4)
New result set formats (binary, XML, JSON)
Solr is an enterprise search server with a REST-like API.
You put documents in it (called "indexing") via
XML, JSON or binary over HTTP.
You query it via HTTP GET
and receive XML, JSON, or binary results
- Advanced Full-Text Search Capabilities
- Optimized for High Volume Web Traffic
- Standards-Based Open Interfaces - XML, JSON and HTTP
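The result format is chosen per request with the `wt` parameter. A minimal sketch that builds such GET URLs, assuming a Solr instance at `http://localhost:8983/solr`; `ResponseFormat` and `SolrSelectUrl` are hypothetical names, while `xml`, `json` and `javabin` are the names of Solr's built-in XML, JSON and binary response writers.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical enum mapping each format to its wt value.
enum ResponseFormat {
    XML("xml"), JSON("json"), JAVABIN("javabin");

    final String wt;

    ResponseFormat(String wt) { this.wt = wt; }
}

class SolrSelectUrl {
    /** GET URL for the /select handler; the response body format follows wt. */
    static String build(String baseUrl, String query, int rows, ResponseFormat format) {
        return baseUrl + "/select?q=" + encode(query) + "&rows=" + rows + "&wt=" + format.wt;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

JSON suits browser-side dashboards, while the binary `javabin` format is what SolrJ clients typically exchange with the server.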
22. Key Benefits for Vanilla Powered
by Solr/Lucene (4/4)
Cache Mechanism
Solr caches are associated with a specific Index Searcher instance
Three cache implementations:
solr.LRUCache (LRU = Least Recently Used, in memory),
solr.FastLRUCache,
solr.LFUCache (LFU = Least Frequently Used)
Many configuration parameters for cache optimisation
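Solr's caches (for example `filterCache`, `queryResultCache` and `documentCache`) are declared in `solrconfig.xml` with a `class`, a `size` and an `autowarmCount`. To show what the LRU policy means, here is a toy Java analogue built on `LinkedHashMap` in access order; it is not Solr's `LRUCache` implementation, and `LruCache` and the query-string keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU cache: evicts the least recently accessed entry once maxSize is exceeded. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most recent end
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Checked after each insertion: drop the least recently used entry when over capacity.
        return size() > maxSize;
    }
}
```

Usage: with a capacity of 2, storing results for `q=product` and `q=customer`, reading `q=product` again, then storing `q=region` evicts `q=customer`, the entry untouched for longest. An LFU policy would instead evict the entry with the fewest hits.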
23. Next Steps
Upgrade to Solr 4.0
New features for document life-cycle management
Roadmap for better internationalisation:
- 10 languages available (not Japanese)
- Search translation management
24. Documentation and tutorials are available on our websites:
www.bpm-conseil.com and forge.bpm-conseil.com
Thanks for your attention