ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti
Antonio David Pérez Morales
http://es.linkedin.com/in/adperezmorales
16th April 2014
• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
• Alfresco & Ephesoft certified Platinum Partner
• Red Hat Enterprise Linux Ready Partner
• Crafter & Varnish Gold Partners
• Search Solutions Consultant
Alfresco Partner of the Year 2012 and 2013
Working effectively together
Who We Are
Antonio David Pérez Morales
- R&D Senior Engineer
- Master in Software Engineering and Technology
- Digital Identity and Security expert
- Enterprise Search Background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor
@adperezmorales
http://es.linkedin.com/in/adperezmorales/
Alessandro Benedetti
- R&D Senior Engineer
- Master in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover
@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti
Agenda
• Context
• Problem
• Solution
• Demo
• Future Work
Zaizi R&D Department
• Giving sense to the content
• Enriching it semantically
• Adding value to ECM/CMS
• More structured content, easy to manage, link and search
• Improving search
• Across different domains, data sources, User Experience
• Machine Learning applied research
• Content Organization – Recommendation Systems
Enterprise Search Problems
Challenge: Search within Big and Heterogeneous Repositories
• Heterogeneous Data Sources
• Filesystem, DB, ECM/CMS, Email, …
• Unstructured Content
• PDFs, plain text, Word, …
• Documents not linked to each other
• Federated Search needed
• Search across data sources
• Different permissions
• Centralized endpoint
Current Enterprise Search Weaknesses
• Keyword based
• Low precision
• Ambiguous terms are not interpreted in context
• Inaccurate weighting when keywords are combined in a query
Entity Driven Search
• Moves from keywords to Entities
• More understandable to a Human
• Process the unstructured text
• Enrich it
• Build specific indexes
• Use entities and concepts in searches
Sensefy
• Semantic Enterprise Search Engine
• Federated Search
• Evolved User Experience
• Based on cutting-edge Open Source Frameworks
Architecture
RedLink
• Semantic Cloud platform
• Providing Software as a Service
• Manage unstructured data
• Extract knowledge and intelligence
• Make sense of information
• Feed into business processes
• Open-Source based components
• Entity Linking using Knowledge Bases
NLP & Semantic Enrichment
• From unstructured to structured
• NLP Analysis, POS Tagging
• Named Entity Recognition
• Linked Data
• Entity Linking using Knowledge Bases
• Disambiguation
• Indexing in Solr
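The enrichment steps above can be pictured as a small pipeline. The sketch below is only an illustration under loud assumptions: a toy in-memory knowledge base (`KNOWLEDGE_BASE`), a dictionary lookup standing in for real named entity recognition, and a context-word overlap heuristic standing in for disambiguation. The field names (`entity_ids`, `entity_types`) are hypothetical and are not taken from the Sensefy schema; the returned dictionary is merely the kind of structured document that could then be sent to Solr.

```python
import re

# Hypothetical toy knowledge base: surface form -> candidate entities.
KNOWLEDGE_BASE = {
    "paris": [
        {"id": "kb:Paris_France", "type": "City",
         "context": {"france", "seine", "city"}},
        {"id": "kb:Paris_Hilton", "type": "Person",
         "context": {"hotel", "celebrity", "heiress"}},
    ],
    "london": [
        {"id": "kb:London", "type": "City",
         "context": {"england", "thames", "city"}},
    ],
}


def tokenize(text):
    """Lowercase word tokens; a stand-in for real NLP analysis."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(candidates, tokens):
    """Pick the candidate whose context words overlap most with the text."""
    token_set = set(tokens)
    return max(candidates, key=lambda c: len(c["context"] & token_set))


def enrich(doc_id, text):
    """Turn unstructured text into a structured, index-ready document."""
    tokens = tokenize(text)
    entities = []
    for token in tokens:
        candidates = KNOWLEDGE_BASE.get(token)
        if candidates:
            entity = disambiguate(candidates, tokens)
            # Keep each linked entity once, in order of first mention.
            if entity["id"] not in [e["id"] for e in entities]:
                entities.append(entity)
    return {
        "id": doc_id,
        "text": text,
        "entity_ids": [e["id"] for e in entities],
        "entity_types": sorted({e["type"] for e in entities}),
    }


doc = enrich("doc1", "Paris is a city on the Seine, two hours from London.")
```

Here the words "city" and "Seine" steer the ambiguous mention "Paris" towards the city rather than the person, which is the role disambiguation plays before indexing.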
Smart Autocomplete
• Multi Phase suggestions
• Closer to natural language query formulation
• Named Entities infix
• Entity types infix
• Multi Language entity type support
• Properties driven query approach
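As a rough illustration of infix suggestions over named entities and entity types, the sketch below matches the typed text anywhere inside a label rather than only at its start. Everything here is assumed for the example: the sample data, the `suggest` function, and the per-language label table. It covers a single suggestion phase and does not reflect how Sensefy implements autocomplete.

```python
# Hypothetical sample data; a real deployment would query dedicated indexes.
ENTITIES = [
    {"label": "Barack Obama", "type": "Person"},
    {"label": "Bank of England", "type": "Organization"},
    {"label": "New England Patriots", "type": "Organization"},
]

# Entity type labels per language (assumed structure).
TYPE_LABELS = {
    "Person": {"en": "person", "es": "persona"},
    "Organization": {"en": "organization", "es": "organización"},
}


def suggest(typed, lang="en", limit=5):
    """Return infix matches, split into entity and entity-type suggestions."""
    needle = typed.strip().lower()
    if not needle:
        return {"entities": [], "types": []}
    # Infix match: the typed text may occur anywhere in the entity label.
    entities = [e["label"] for e in ENTITIES if needle in e["label"].lower()]
    # Entity types are matched against the label in the requested language.
    types = [
        type_id
        for type_id, labels in TYPE_LABELS.items()
        if needle in labels.get(lang, "").lower()
    ]
    return {"entities": entities[:limit], "types": types[:limit]}
```

Typing "england" would surface both organizations even though neither label starts with that word, and typing "pers" with `lang="es"` would surface the `Person` type through its Spanish label.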
Smart Autocomplete Configuration
• Entity type properties
• Interesting to our use case and scenario
• Properties inheritance through type hierarchy
• Enhance type information from external resources
• Freebase, DBpedia, Custom Data Set
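Property inheritance through a type hierarchy can be sketched as a walk from a type up to its ancestors, collecting properties along the way. The hierarchy, the property names, and the `properties_for` helper below are all hypothetical; in practice the type information could come from an external resource such as those listed above.

```python
# Hypothetical type hierarchy (child -> parent) and per-type properties.
PARENT = {"Politician": "Person", "Person": "Thing", "Thing": None}
OWN_PROPERTIES = {
    "Thing": ["name"],
    "Person": ["birth date", "nationality"],
    "Politician": ["party"],
}


def properties_for(entity_type):
    """Collect a type's own properties plus those inherited from ancestors."""
    collected = []
    current = entity_type
    while current is not None:
        for prop in OWN_PROPERTIES.get(current, []):
            if prop not in collected:
                collected.append(prop)
        # Move one level up; unknown types have no parent and stop the walk.
        current = PARENT.get(current)
    return collected
```

With this data, a `Politician` offers its own `party` property first, followed by the properties inherited from `Person` and `Thing`.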
Semantic Search
• Search by Named Entity
• Search by Entity Type
• Search by Entity Type properties
• Grouping Results by Sense
• Contextualize Results Using Semantic Information
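A minimal sketch of searching by entity and by entity type, and of grouping results by sense, might look like the following. The documents, entity identifiers, and function names are invented for illustration; the sketch assumes each document already carries the entities linked to it, and it is not the actual Sensefy implementation.

```python
from collections import defaultdict

# Hypothetical documents with (entity ID, entity type) annotations.
DOCS = [
    {"id": "d1", "entities": [("kb:Jaguar_Cars", "Organization")]},
    {"id": "d2", "entities": [("kb:Jaguar_animal", "Animal")]},
    {"id": "d3", "entities": [("kb:Jaguar_Cars", "Organization"),
                              ("kb:London", "City")]},
]


def build_indexes(docs):
    """Build two inverted indexes: entity ID -> docs and entity type -> docs."""
    by_entity, by_type = defaultdict(set), defaultdict(set)
    for doc in docs:
        for entity_id, entity_type in doc["entities"]:
            by_entity[entity_id].add(doc["id"])
            by_type[entity_type].add(doc["id"])
    return by_entity, by_type


def group_by_sense(docs, candidate_entity_ids):
    """Group matching documents under each candidate sense of a term."""
    by_entity, _ = build_indexes(docs)
    return {eid: sorted(by_entity.get(eid, set())) for eid in candidate_entity_ids}


by_entity, by_type = build_indexes(DOCS)
jaguar_results = group_by_sense(DOCS, ["kb:Jaguar_Cars", "kb:Jaguar_animal"])
```

For an ambiguous query such as "jaguar", the grouped output separates documents about the car maker from documents about the animal, instead of mixing them in one keyword-ranked list.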
Semantic More Like This
• Search for Similar Documents based on Entities and Entities’ categories
• Similarity Function based on Documents’ Sense
• Not based on text tokens
• Entity Frequency / Inverse Document Frequency
• Entity Type Frequency / Inverse Document Frequency
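One way to picture an entity-based similarity is to reuse the familiar TF-IDF scheme with entity identifiers in place of text tokens. The sketch below is an assumption-laden illustration, not the function used in Sensefy: it weights each entity by its frequency in the document times a smoothed inverse document frequency, `log(1 + N / df)`, and compares documents with cosine similarity. The same functions could be fed entity type identifiers instead of entity identifiers.

```python
import math
from collections import Counter


def ef_idf_vectors(docs):
    """docs: doc_id -> list of entity IDs (repeats allowed). Returns weights."""
    n_docs = len(docs)
    doc_freq = Counter()
    for entities in docs.values():
        doc_freq.update(set(entities))  # count each entity once per document
    vectors = {}
    for doc_id, entities in docs.items():
        counts = Counter(entities)
        # Entity frequency times a smoothed inverse document frequency
        # (this particular smoothing is an assumption of the sketch).
        vectors[doc_id] = {
            e: freq * math.log(1 + n_docs / doc_freq[e])
            for e, freq in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(e, 0.0) for e, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(doc_id, docs, top_n=3):
    """Rank the other documents by entity-based similarity to doc_id."""
    vectors = ef_idf_vectors(docs)
    target = vectors[doc_id]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != doc_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Because the vectors are built from linked entities, two documents that mention the same entity under different surface forms can still be judged similar, which a token-based comparison would miss.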
Future Work
• New approach to Semantic More Like This (graph relations)
• Machine Learning components: Classification, Topic annotation, Clustering
• Semantic facets
• Secured Entity Search
• Image and Media searches
Conclusions
• Better user experience
• More precision in search results
• Closer to human language
Zaizi Headquarters
Brook House
4th Floor, North Wing
229-243 Shepherd’s Bush Road
London W6 7AN
United Kingdom
T: (+44) 20 3582 8330
Zaizi Iberia
Calle Gremios 13-15, Edificio Diseño
Planta 1, Oficina 5
41927 Mairena del Aljarafe
Sevilla
Spain
T: (+34) 666 42 43 64
Zaizi Asia
50 Flower Road
Colombo 07
Sri Lanka
T: (+94) 112 301 461
Zaizi Singapore
14 Robinson Road #13-00
Far East Finance Building
Singapore 048545
T: (+65) 3158 5886
F: (+65) 6323 1839
VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931
www.zaizi.com
Thanks!
