OSS Enterprise Search
EU Tour
Spreading Enterprise Search Solutions around Europe
London – Amsterdam – Rome
25 – 26 – 28 October
Upayavira
Maurizio Pillitu
Tommaso Teofili
Summary
✓ Sourcesense is involved in many Open Source projects
✓ We continuously spot opportunities to integrate them
✓ As contributors to Apache UIMA and CMIS, we saw
opportunities to integrate them with Lucene/Solr...
✓ ...and so we did!
✓ Everything is already released as OSS or will be shortly
Solr - UIMA
Semantic content extraction while indexing
Solr - UIMA
✓ A Solr plugin to automatically extract relevant
knowledge from documents while indexing them
✓ Recognizes each document's language, sentences, keywords,
concepts, named entities, ... and makes them searchable
✓ Extensible architecture provided by Apache UIMA to
extract and index more information via configuration
✓ Proposed as Apache Solr patch (issue SOLR-2129)
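
As a rough illustration of "extraction while indexing", the sketch below enriches a document with extra fields before it would be handed to the indexer. The toy `annotate` function only stands in for a UIMA analysis engine, and every name here (`annotate`, `enrich`, the `uima_` field prefix) is hypothetical rather than part of the SOLR-2129 patch.

```python
import re


def annotate(text):
    """Toy stand-in for a UIMA analysis engine (not the real UIMA API)."""
    # Split into sentences on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    entities = set()
    for sentence in sentences:
        words = re.findall(r"[A-Za-z]+", sentence)
        # Naive guess: capitalised words that do not start the sentence.
        entities.update(w for w in words[1:] if w[0].isupper())
    return {"sentences": sentences, "entities": sorted(entities)}


def enrich(doc, text_field="text"):
    """Return a copy of doc with the annotations added as extra fields."""
    enriched = dict(doc)
    for name, values in annotate(doc.get(text_field, "")).items():
        enriched["uima_" + name] = values  # hypothetical field naming
    return enriched


doc = {"id": "1", "text": "Sourcesense works in Rome. The team uses Solr."}
enriched = enrich(doc)
```

The enriched fields could then back sentence-scoped search or entity facets, assuming the Solr schema declares matching multi-valued fields.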
Solr – UIMA use cases
✓ Automatically enables language-specific document search
✓ Easy sentence-scoped search
✓ Full-text search on concepts, keywords or other named
entities (cities, persons, companies)
✓ Semantic faceting
✓ Plug in other semantic enrichment engines (no further
architectural layers required)
CMIS
✓ Interoperability between different Enterprise Content
Management Systems
✓ Approved as an OASIS specification on May 1, 2010
✓ Standard Data Model
✓ SOAP and AtomPub (RESTful) web service bindings
✓ Java, JavaScript, PHP, Python, .NET implementations
Why CMIS
✓ Allows building applications that work against
multiple repositories
✓ Decouples Web Services from the Content
Management System
✓ Avoids yet another custom WS tier
✓ Standardized and certified interfaces
✓ Platform and language agnostic
Solr CMIS Integration
✓ Retrieves documents from multiple CMIS repositories
✓ Configurable mapping of cmis:document into
solr:document
✓ Leverages Solr Multicore feature
✓ Smooth integration with pre-existing data
✓ Keeps Solr indexes up-to-date with CMIS repository
changes
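
A minimal sketch of what a configurable cmis:document to solr:document mapping could look like. The CMIS property names are standard ones from the specification; the Solr field names and the `to_solr_document` helper are assumptions made for illustration and not the integration's actual configuration format.

```python
# Hypothetical mapping: CMIS property name -> Solr field name.
FIELD_MAPPING = {
    "cmis:objectId": "id",
    "cmis:name": "title",
    "cmis:lastModificationDate": "last_modified",
    "cmis:contentStreamMimeType": "content_type",
}


def to_solr_document(cmis_properties, mapping=FIELD_MAPPING):
    """Copy the mapped CMIS properties into a flat Solr-style document."""
    return {
        solr_field: cmis_properties[cmis_name]
        for cmis_name, solr_field in mapping.items()
        if cmis_name in cmis_properties  # skip properties the repository omits
    }


cmis_doc = {
    "cmis:objectId": "doc-42",
    "cmis:name": "Tour agenda",
    "cmis:createdBy": "admin",  # unmapped, so it is left out
}
solr_doc = to_solr_document(cmis_doc)
```

Keeping the mapping as data rather than code is one way to let each repository (and each Solr core) carry its own field mapping.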
“From scratch Deployment”
● The need:
✓ Reliable, resilient, scalable search solution
✓ Ability to roll out new 'rows' at will
● The solution:
✓ Virtualisation
✓ Automation
✓ Tools: Capistrano, bash, potentially Puppet/Chef
Sample Solr Setup
[Diagram: a load balancer in front of two rows; each row has a co-ordinator and three shards (shard1, shard2, shard3)]
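
One way a co-ordinator can fan a query out to its shards is Solr's distributed search, where the request carries a `shards` parameter listing the shard addresses. The sketch below only builds such a request URL; the host names, port, and `/solr/select` path are assumptions, and the slides do not say how the co-ordinators in this setup are actually configured.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, query):
    """Build a select URL that asks the co-ordinator to query every shard."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"http://{coordinator}/solr/select?{urlencode(params, safe=':/,')}"


# Hypothetical hosts for one row of the setup.
row_shards = ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"]
url = distributed_query_url("coordinator:8983", row_shards, "uima")
```

With this layout the load balancer only needs to know the co-ordinators, and a new row can be added without touching the existing ones.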
DeployX Stages
[Diagram: Push button → Instantiate VM → Configure host → Deploy application → Duplicate data → Add to pool; labelled "In parallel"]
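
The staged rollout can be sketched as a small pipeline. This version assumes that "in parallel" means the hosts of a row are provisioned concurrently while the stages for each host run in order; that reading, the placeholder stage bodies, and the function names are all assumptions, not DeployX's actual design.

```python
from concurrent.futures import ThreadPoolExecutor

STAGES = [
    "instantiate VM",
    "configure host",
    "deploy application",
    "duplicate data",
    "add to pool",
]


def deploy_host(host):
    """Run every stage for one host, in order, and return a log of the steps."""
    log = []
    for stage in STAGES:
        # Placeholder: a real stage would call Capistrano, bash, etc.
        log.append(f"{host}: {stage}")
    return log


def deploy_row(hosts):
    """'Push button': deploy all hosts of a row concurrently."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(zip(hosts, pool.map(deploy_host, hosts)))


result = deploy_row(["co-ordinator", "shard1", "shard2", "shard3"])
```

Because "add to pool" is the last stage, a host only starts receiving traffic once its application and data are in place.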
Demo
✓ Extract CMIS documents
✓ Index on Solr
✓ Enrich with UIMA
