OSS Enterprise Search EU Tour
1. OSS Enterprise Search
EU Tour
Spreading Enterprise Search Solutions around Europe
London – Amsterdam – Rome
25 – 26 – 28 October
Upayavira
Maurizio Pillitu
Tommaso Teofili
2. Summary
✓ Sourcesense is involved in many Open Source projects
✓ We continuously spot opportunities to integrate them
✓ As contributors to Apache UIMA and CMIS, we saw
opportunities to integrate them with Lucene/Solr...
✓ ...and so we did!
✓ Everything is already released as OSS or will be shortly
4. Solr - UIMA
✓ A Solr plugin to automatically extract relevant
knowledge from documents while indexing them
✓ Recognize and search on each document's language, sentences,
keywords, concepts, named entities, ...
✓ Extensible architecture provided by Apache UIMA to
extract and index more information via configuration
✓ Proposed as an Apache Solr patch (issue SOLR-2129)
5. Solr – UIMA use cases
✓ Automatically enable language-specific document search
✓ Easy sentence scoped search
✓ Full text search on concepts, keywords or other named
entities (cities, persons, companies)
✓ Semantic faceting
✓ Plug in other semantic enrichment engines (no further
architectural layers required)
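
As an illustration of the concept search and semantic faceting use cases above, here is a minimal sketch that builds a Solr query string. The `q`, `fq`, `facet` and `facet.field` parameters are standard Solr request parameters; the field names `concept` and `language` are assumptions, since the real names depend on how UIMA annotations are mapped to Solr fields in a given configuration.

```python
from urllib.parse import urlencode

# Hypothetical field names: the real ones depend on how the UIMA
# annotations are mapped to Solr fields in your configuration.
CONCEPT_FIELD = "concept"
LANGUAGE_FIELD = "language"


def build_semantic_query(text, language=None, facet_on=CONCEPT_FIELD):
    """Build a Solr query string that searches and facets on an enriched field."""
    params = [
        ("q", f"{CONCEPT_FIELD}:{text}"),  # full text search on extracted concepts
        ("facet", "true"),                 # turn faceting on
        ("facet.field", facet_on),         # facet on the enriched field
    ]
    if language:
        # Filter query restricting results to one detected language.
        params.append(("fq", f"{LANGUAGE_FIELD}:{language}"))
    return urlencode(params)
```

The resulting string can be appended to a Solr select URL; for example, `build_semantic_query("Rome", language="en")` asks for documents whose extracted concepts mention Rome, limited to English documents, with facet counts per concept.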
6. CMIS
✓ Interoperability between different Enterprise Content
Management Systems
✓ OASIS specification since May 1, 2010
✓ Standard Data Model
✓ SOAP and AtomPub (RESTful) Web Services bindings
✓ Java, JavaScript, PHP, Python, .NET implementations
7. Why CMIS
✓ Allows applications to be built against, and leveraged
across, multiple repositories
✓ Decouples Web Services from the Content
Management System
✓ Avoids yet another custom WS tier
✓ Standardized and certified interfaces
✓ Platform and language agnostic
8. Solr CMIS Integration
✓ Retrieves documents from multiple CMIS repositories
✓ Configurable mapping of cmis:document into
solr:document
✓ Leverages Solr Multicore feature
✓ Smooth integration with pre-existing data
✓ Keeps Solr indexes up-to-date with CMIS repository
changes
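
The configurable mapping mentioned above could look roughly like the following sketch. It is not the actual integration code: the CMIS property ids are standard ones from the CMIS data model, while the Solr field names and the `to_solr_document` helper are hypothetical and chosen for illustration only.

```python
# Assumed mapping from CMIS property ids to Solr field names.
# The CMIS ids are standard; the Solr field names are hypothetical.
FIELD_MAPPING = {
    "cmis:objectId": "id",
    "cmis:name": "title",
    "cmis:lastModificationDate": "last_modified",
    "cmis:contentStreamMimeType": "content_type",
}


def to_solr_document(cmis_properties, mapping=FIELD_MAPPING):
    """Translate a dict of CMIS properties into a Solr document dict.

    Properties without an entry in the mapping are dropped, and mapped
    properties missing from the input are skipped.
    """
    return {
        solr_field: cmis_properties[cmis_id]
        for cmis_id, solr_field in mapping.items()
        if cmis_id in cmis_properties
    }
```

Keeping the mapping as plain data means each repository (or each Solr core) can carry its own mapping without code changes.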
9. “From scratch Deployment”
● The need:
✓ Reliable, resilient, scalable search solution
✓ Ability to roll out new 'rows' at will
● The solution:
✓ Virtualisation
✓ Automation
✓ Tools: Capistrano, bash, potentially Puppet/Chef
11. DeployX Stages
Instantiate VM
Configure host
Deploy application
Duplicate data
Add to pool
Push button
In parallel
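
The stages above can be sketched as a small pipeline. This is an illustration only, under two assumptions not stated on the slide: that "Push button" means a single entry point triggers the whole rollout, and that "In parallel" means several hosts are processed concurrently while the stages for each host run in order. The function names are hypothetical, and `run_stage` is a placeholder for the real work (for example a Capistrano task or a bash script).

```python
from concurrent.futures import ThreadPoolExecutor

# Stage names taken from the slide, run in this order for each host.
STAGES = [
    "instantiate_vm",
    "configure_host",
    "deploy_application",
    "duplicate_data",
    "add_to_pool",
]


def run_stage(stage, host):
    """Placeholder for the real work; here it only records what ran."""
    return f"{stage}:{host}"


def deploy_host(host):
    """Run every stage for one host, strictly in order."""
    return [run_stage(stage, host) for stage in STAGES]


def deploy_row(hosts):
    """'Push button' entry point: deploy all hosts of a row in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(deploy_host, hosts)))
```

Calling `deploy_row(["search-01", "search-02"])` returns, per host, the ordered list of stages that ran, which makes it easy to check that a new row went through every step before it joined the pool.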