SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Apache UIMA &
Semantic Search
      Tommaso Teofili
   tommaso@apache.org
Apache UIMA - what is it?
Unstructered Information Management Architecture

Architectural Framework to manage (eventually large) volumes of
unstructered data

Former IBM Alphaworks project donated to ASF

Currently an Incubator podling ( http://incubator.apache.org/uima )

Apache UIMA is an Oasis standard ( http://www.oasis-open.org )
Apache UIMA - how?

Many pluggable reusable components (described via XML)
Analysis Engines (primitive or aggregates)
Asynchronous scaleout (JMS, Apache ActiveMQ)
Flow controllers
Type systems
Apache UIMA - what is NOT?

It’s not a semantic search tool inherently
the “Lucas example”
the semantic search package for UIMA is not open source!
( http://www.alphaworks.ibm.com/tech/uima/download )
UIMA & Semantic Search
Metadata generation engine for CM systems
Data enrichment
Linked data
Jeopardy (see http://www.research.ibm.com/deepqa/
faq.shtml#24 )
Let’s see...
RE Market Analysis & UIMA
Macpi: a real estate market analysis tool developed at DIA
   Webpipe (crawling and wrapping data)
   Apache UIMA
   Spring framework
Knowledge extraction
Extract metadata with Apache UIMA to build our search
Apache UIMA & AlchemyAPI
AlchemyAPI from Orchestr8 services wrapped as UIMA AEs
Named-entity recognition, word disambiguation
   “Barack Obama” is http://dbpedia.org/resource/Barack_Obama

Exploiting linked data
   enriching free text with DBpedia, GeoNames, Freebase URIs

Plugging with other UIMA AEs
   providing you with a reusable component to deal with Linked Data
UIMA & Semantic Search



  it’s demo time!

Weitere ähnliche Inhalte

Ähnlich wie Apache UIMA and Semantic Search

Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache FlinkJohn Gorman (BSc, CISSP)
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache TuscanyApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache TuscanyJean-Sebastien Delfino
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveAll Things Open
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesHao Chen
 
ImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules DevelopmentImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules DevelopmentINBOX International inc.
 
Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012Luca Ponzanelli
 
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...cresco
 
Cms an overview
Cms an overviewCms an overview
Cms an overviewkmusthu
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementHao Chen
 
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014Amazon Web Services
 
Amoeba distributed operating System
Amoeba distributed operating SystemAmoeba distributed operating System
Amoeba distributed operating SystemSaurabh Gupta
 
Graymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationGraymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationETCenter
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAmazon Web Services
 
Log Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovLog Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovSoftServe
 
Log Data Analysis Platform
Log Data Analysis PlatformLog Data Analysis Platform
Log Data Analysis PlatformValentin Kropov
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...Trieu Nguyen
 
IBM i Resources Retiring?
IBM i Resources Retiring?IBM i Resources Retiring?
IBM i Resources Retiring?HelpSystems
 

Ähnlich wie Apache UIMA and Semantic Search (20)

Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache Flink
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache TuscanyApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics Perspective
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New Features
 
Lighty
LightyLighty
Lighty
 
ImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules DevelopmentImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules Development
 
Cloud Computing 2019
Cloud Computing 2019Cloud Computing 2019
Cloud Computing 2019
 
Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012
 
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
 
Cms an overview
Cms an overviewCms an overview
Cms an overview
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture Evolvement
 
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
 
Amoeba distributed operating System
Amoeba distributed operating SystemAmoeba distributed operating System
Amoeba distributed operating System
 
Graymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationGraymeta C4 use case, Deduplication
Graymeta C4 use case, Deduplication
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache Storm
 
Log Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovLog Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin Kropov
 
Log Data Analysis Platform
Log Data Analysis PlatformLog Data Analysis Platform
Log Data Analysis Platform
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
 
IBM i Resources Retiring?
IBM i Resources Retiring?IBM i Resources Retiring?
IBM i Resources Retiring?
 

Mehr von Tommaso Teofili

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRTommaso Teofili
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakTommaso Teofili
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in SlingTommaso Teofili
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industryTommaso Teofili
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr Tommaso Teofili
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and SolrTommaso Teofili
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache HamaTommaso Teofili
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiTommaso Teofili
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaTommaso Teofili
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU TourTommaso Teofili
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesTommaso Teofili
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the WebTommaso Teofili
 

Mehr von Tommaso Teofili (15)

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IR
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit Oak
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in Sling
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industry
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache Hama
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGi
 
Oak / Solr integration
Oak / Solr integrationOak / Solr integration
Oak / Solr integration
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and Clerezza
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU Tour
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 

Kürzlich hochgeladen

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Apache UIMA and Semantic Search

  • 1. Apache UIMA & Semantic Search Tommaso Teofili tommaso@apache.org
  • 2. Apache UIMA - what is it? Unstructered Information Management Architecture Architectural Framework to manage (eventually large) volumes of unstructered data Former IBM Alphaworks project donated to ASF Currently an Incubator podling ( http://incubator.apache.org/uima ) Apache UIMA is an Oasis standard ( http://www.oasis-open.org )
  • 3. Apache UIMA - how? Many pluggable reusable components (described via XML) Analysis Engines (primitive or aggregates) Asynchronous scaleout (JMS, Apache ActiveMQ) Flow controllers Type systems
  • 4. Apache UIMA - what is NOT? It’s not a semantic search tool inherently the “Lucas example” the semantic search package for UIMA is not open source! ( http://www.alphaworks.ibm.com/tech/uima/download )
  • 5. UIMA & Semantic Search Metadata generation engine for CM systems Data enrichment Linked data Jeopardy (see http://www.research.ibm.com/deepqa/ faq.shtml#24 ) Let’s see...
  • 6. RE Market Analysis & UIMA Macpi: a real estate market analysis tool developed at DIA Webpipe (crawling and wrapping data) Apache UIMA Spring framework Knowledge extraction Extract metadata with Apache UIMA to build our search
  • 7. Apache UIMA & AlchemyAPI AlchemyAPI from Orchestr8 services wrapped as UIMA AEs Named-entity recognition, word disambiguation “Barack Obama” is http://dbpedia.org/resource/Barack_Obama Exploiting linked data enriching free text with DBpedia, GeoNames, Freebase URIs Plugging with other UIMA AEs providing you with a reusable component to deal with Linked Data
  • 8. UIMA & Semantic Search it’s demo time!

Hinweis der Redaktion

  1. Hello everybody, I am Tommaso Teofili, I am from Rome and I work in Sourcesense and I’m an Apache UIMA committer. In this talk I am going to tell you about Apache UIMA & Semantic Search.
  2. Apache UIMA stands for Unstructerd Information Management Architecture, it's an architectural framework to manage large volumes of unstructured data. It was an IBM project donated to Apache. It's currently an Incubator podling and the new 2.3.0 release is coming. It's also been approved as an Oasis standard. By the way, I became Apache UIMA committer on August.
  3. UIMA is made of components, which can be put in a pipeline to be executed. Every Apache UIMA component is described via XML descriptors. The most important of which is the Analysis Engine, that is responsible for the very analysis of documents. Each AE can be aggregated using an aggregate XML descriptor. To manage large volumes of data UIMA can scale using Asynchronous Scaleout. Each UIMA pipeline is made of AEs, one or more flow controllers and CAS consumers, the component which is responsible of treating the output (usually CASs).
  4. Apache UIMA is not inherently a semantic search tool. The semantic search tool package for UIMA is not open source. Lucas is a CAS consumer for UIMA capable of putting UIMA annotations inside Lucene indexes, a semi-semantic approach.
  5. UIMA is well suited as a metadata generation engine for CM systems. It can enrich data with annotations, entities, categories and so on, eventually linking them. For those who know the US Jeopardy game, IBM it's building a system able to answer open-domain questions against humans and it's based on UIMA-AS. We'll see two examples of the power of UIMA to boost semantic search.
  6. Here's the first one: Recently during my college studies I used UIMA to extract metadata to build semantic searches on a real estate market analysis tool called Macpi. zone statistics -> UIMA to find zones, price, filter out bad matchings frequent announces -> disambiguation, data filtering
  7. Another little demo I’m going to show is about integrating Apache UIMA and AlchemyAPI from Ochestr8. AlchemyAPI provides, you guess, services to extract Entities from text, but what we are interested in here is using the linked data, to exploit, for example DBPedia, Freebase and so on data clouds. So the power of UIMA here is again putting together in a pipeline many components, once my system learned I can easily substitute this component with a UIMA Dictionary using the ConceptMapper. I get Alchemy component then I can put the OpenCalais component, my personal dictionary of specific terms, a machine learning engine based on what UIMA extracts and so on. Here we’ll see indexing documents by concepts.
  8. Ok, let’s go to the demos!