The Road to Federated Text Mining:
Are we there yet?
II-SDV 2014
Guy Singh
What is federated search?
“Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines participating in the federation”
- Wikipedia
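The definition above can be illustrated with a minimal sketch: one query is sent concurrently to several searchable resources and the hits are collected per source. The `make_engine` and `federated_search` helpers and the toy in-memory engines are hypothetical stand-ins for real search services; no particular product API is implied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical engine type: takes a query string, returns matching document IDs.
SearchEngine = Callable[[str], List[str]]


def make_engine(docs: Dict[str, str]) -> SearchEngine:
    """Build a toy in-memory engine doing case-insensitive substring matching."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())
    return search


def federated_search(query: str, engines: Dict[str, SearchEngine]) -> Dict[str, List[str]]:
    """Distribute one query to every engine in parallel; return hits keyed by engine name."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, query) for name, engine in engines.items()}
        return {name: fut.result() for name, fut in futures.items()}


engines = {
    "medline": make_engine({"m1": "BRCA1 mutations in breast cancer", "m2": "Asthma trial"}),
    "patents": make_engine({"p1": "Breast cancer diagnostic kit"}),
}
print(federated_search("breast cancer", engines))
# {'medline': ['m1'], 'patents': ['p1']}
```

Threads are used because each real engine call would mostly wait on the network; the results are still reported per source, which is exactly the gap that the later "merging results" steps address.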
Current Situation
• Volume of data is ever increasing
• Proprietary content can reside within the enterprise
• No need for everyone to keep standard sources up to date
• Data from content providers can reside on their sites
Linguamatics Customer Confidential
[Diagram: Internal Content and External Content; sources shown: MEDLINE, Clinical Trials, Publisher Content, FDA Drug Labels, Patents]
Increasing Range of Data Sources
[Diagram: Data Sources: Scientific Literature, Social Media, News, Web Pages, Internal Documents, Patents, RSS, Clinical Trials]
Varying in Structure
How does text mining differ from keyword search?
Example: What genes affect breast cancer?
Why does document structure matter?
• Searching across documents using keywords is relatively trivial
– You do not need to know where the words occur or in what context
• Text mining documents with varying structure requires a more sophisticated approach. You need to:
– Know where words matching entities/concepts occur
– Disambiguate depending on context and location
– Find terms in particular regions/parts of a document for targeted searches
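To make the distinction concrete, here is a minimal sketch, assuming each document is represented as a mapping from region name (e.g. `title`, `references`) to text: a plain keyword search ignores regions, while a targeted search only counts matches in the requested region. The region names and the document layout are illustrative assumptions, not how any particular text-mining product stores documents.

```python
from typing import Dict, List

# A document is modelled as region name -> text (an illustrative assumption).
Document = Dict[str, str]


def keyword_search(docs: Dict[str, Document], term: str) -> List[str]:
    """Return IDs of documents containing the term anywhere, regardless of region."""
    t = term.lower()
    return sorted(
        doc_id for doc_id, regions in docs.items()
        if any(t in text.lower() for text in regions.values())
    )


def region_search(docs: Dict[str, Document], term: str, region: str) -> List[str]:
    """Return IDs of documents containing the term inside the named region only."""
    t = term.lower()
    return sorted(
        doc_id for doc_id, regions in docs.items()
        if t in regions.get(region, "").lower()
    )


docs = {
    "d1": {"title": "BRCA1 and breast cancer risk", "references": "Smith 2010"},
    "d2": {"title": "Lung cancer screening", "references": "Breast cancer review, 2011"},
}
print(keyword_search(docs, "breast cancer"))          # ['d1', 'd2']
print(region_search(docs, "breast cancer", "title"))  # ['d1']
```

Here `d2` only mentions breast cancer in its reference list, so the keyword search returns it while the region-aware query on the title does not.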
Approaches to dealing with different data sources
• Integrate the data into a data warehouse
– Extract, Transform and Load (ETL) each data source into a new database
– Multiple copies of the data
– Data normalisation can be difficult
– Time-consuming and expensive process
– Most database vendors take this approach
– Allows users to perform a single search across all the content
• Leave the data where it is: federated content
– Data remains in its original form and location
– Multiple data types
– Multiple network locations
– Single search across multiple different data sources
How do we get to Federated Text Mining?
Data Normalisation → Link the Content Servers → Merge Results → Federated Text Mining
Data Normalisation – Virtual Indexes
[Diagram: Pathology Reports Index + Journal Abstracts Index → Virtual Index]
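A virtual index presents several physical indexes as one. The sketch below, assuming each underlying index can be modelled as a simple inverted index (token → set of document IDs), answers a query by searching every member index and tagging each hit with its source. The `InvertedIndex` and `VirtualIndex` classes are hypothetical and do not describe the I2E implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class InvertedIndex:
    """Toy inverted index: lower-cased whitespace token -> set of document IDs."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        for doc_id, text in docs.items():
            for token in text.lower().split():
                self.postings[token].add(doc_id)

    def search(self, term: str) -> Set[str]:
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self.postings.get(term.lower(), set()))


class VirtualIndex:
    """Hypothetical virtual index: one query interface over several named indexes."""

    def __init__(self, members: Dict[str, InvertedIndex]) -> None:
        self.members = members

    def search(self, term: str) -> List[Tuple[str, str]]:
        # Tag every hit with the index it came from so that document IDs
        # from different collections cannot collide.
        return sorted(
            (name, doc_id)
            for name, index in self.members.items()
            for doc_id in index.search(term)
        )


pathology = InvertedIndex({"r1": "invasive ductal carcinoma", "r2": "benign tissue"})
abstracts = InvertedIndex({"a1": "carcinoma of the breast"})
virtual = VirtualIndex({"pathology": pathology, "abstracts": abstracts})
print(virtual.search("carcinoma"))
# [('abstracts', 'a1'), ('pathology', 'r1')]
```

The physical indexes are left untouched; only the query layer is shared, which is what distinguishes this from copying both collections into one new index.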
Data Normalisation – Document Structure
[Diagram: Pathology Reports; Journal Abstracts]
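Pathology reports and journal abstracts label their parts differently. One way to normalise document structure, sketched here under the assumption that each source's section labels are known in advance, is to map every source-specific label onto a shared set of region names so that a single query can target, say, `findings` in both collections. The mapping tables and region names below are invented for illustration.

```python
from typing import Dict

# Hypothetical per-source mappings from native section labels to shared region names.
REGION_MAP: Dict[str, Dict[str, str]] = {
    "pathology_report": {
        "CLINICAL HISTORY": "background",
        "MICROSCOPIC DESCRIPTION": "findings",
        "DIAGNOSIS": "conclusion",
    },
    "journal_abstract": {
        "BACKGROUND": "background",
        "RESULTS": "findings",
        "CONCLUSIONS": "conclusion",
    },
}


def normalise_structure(source: str, sections: Dict[str, str]) -> Dict[str, str]:
    """Rename a document's sections to shared region names.

    Sections with no mapping are kept under 'other' so that no text is lost.
    """
    mapping = REGION_MAP[source]
    out: Dict[str, str] = {}
    for label, text in sections.items():
        region = mapping.get(label.strip().upper(), "other")
        # Concatenate when several native sections map to the same region.
        out[region] = (out[region] + " " + text) if region in out else text
    return out


report = {"Microscopic Description": "Tumour cells infiltrate stroma.", "Addendum": "None."}
print(normalise_structure("pathology_report", report))
# {'findings': 'Tumour cells infiltrate stroma.', 'other': 'None.'}
```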
Data Normalisation – Entities
[Diagram: Journal Abstracts + Pathology Reports → Combined (Normalized)]
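Entity normalisation maps the different surface forms used in each collection (for example an abbreviation in a pathology report and the full name in an abstract) onto one identifier, so that hits from both sources can be combined. The small synonym table and the `CONCEPT:` identifiers below are made up for illustration; a real system would typically draw on a curated vocabulary or ontology.

```python
import re
from typing import Dict, List, Set

# Illustrative synonym table: surface form (lower case) -> canonical concept ID.
# The IDs are invented placeholders, not codes from a real vocabulary.
SYNONYMS: Dict[str, str] = {
    "idc": "CONCEPT:ductal_carcinoma",
    "invasive ductal carcinoma": "CONCEPT:ductal_carcinoma",
    "her2": "CONCEPT:ERBB2",
    "erbb2": "CONCEPT:ERBB2",
}

# Try longer synonyms first so multi-word names win over shorter overlapping forms.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalise_entities(text: str) -> Set[str]:
    """Return the set of canonical concept IDs mentioned in the text."""
    return {SYNONYMS[m.group(1).lower()] for m in _PATTERN.finditer(text)}


def combine(collections: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Index documents from several collections by concept ID ('collection/doc_id')."""
    combined: Dict[str, List[str]] = {}
    for coll, docs in collections.items():
        for doc_id, text in docs.items():
            for concept in normalise_entities(text):
                combined.setdefault(concept, []).append(f"{coll}/{doc_id}")
    return {k: sorted(v) for k, v in sorted(combined.items())}


print(combine({
    "pathology": {"r1": "IDC, HER2 positive"},
    "abstracts": {"a1": "ERBB2 amplification in invasive ductal carcinoma"},
}))
# {'CONCEPT:ERBB2': ['abstracts/a1', 'pathology/r1'], 'CONCEPT:ductal_carcinoma': ['abstracts/a1', 'pathology/r1']}
```

Neither document shares a single surface form with the other, yet both end up under the same two concepts once normalised.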
Linking Content Servers
Linked Servers – Development Status
• I2E 4.1 introduced a new feature: Linked Servers
• One I2E server can be linked to another I2E server
• Provides access to remote and local indexes and queries through a single I2E interface
– Indexes and queries on remote servers on the network appear the same as local indexes
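The idea that remote indexes "appear the same as local indexes" can be sketched as a common interface: the client only sees index names, and the server it talks to decides whether a name resolves to a local index or to one hosted on a linked server. `LocalIndex`, `RemoteIndex`, `LinkedServer`, and the `server/index` naming scheme are hypothetical; the remote call is simulated in-process rather than over the network, and nothing here reflects the actual I2E protocol.

```python
from typing import Dict, List, Protocol


class Index(Protocol):
    """Anything that can answer a query with a list of document IDs."""

    def query(self, text: str) -> List[str]: ...


class LocalIndex:
    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def query(self, text: str) -> List[str]:
        t = text.lower()
        return sorted(d for d, body in self.docs.items() if t in body.lower())


class RemoteIndex:
    """Stand-in for an index on a linked server; a real one would make a network call."""

    def __init__(self, server: "LinkedServer", name: str) -> None:
        self.server, self.name = server, name

    def query(self, text: str) -> List[str]:
        return self.server.indexes[self.name].query(text)


class LinkedServer:
    def __init__(self, name: str, indexes: Dict[str, Index]) -> None:
        self.name = name
        self.indexes = indexes
        self.links: Dict[str, "LinkedServer"] = {}

    def link(self, other: "LinkedServer") -> None:
        self.links[other.name] = other

    def available_indexes(self) -> Dict[str, Index]:
        """Local indexes plus every index on linked servers, behind one interface."""
        catalogue: Dict[str, Index] = dict(self.indexes)
        for remote in self.links.values():
            for idx_name in remote.indexes:
                catalogue[f"{remote.name}/{idx_name}"] = RemoteIndex(remote, idx_name)
        return catalogue


enterprise = LinkedServer("enterprise", {"internal_docs": LocalIndex({"i1": "compound X toxicity"})})
ondemand = LinkedServer("ondemand", {"medline": LocalIndex({"m1": "toxicity of compound X in rats"})})
enterprise.link(ondemand)
for name, index in sorted(enterprise.available_indexes().items()):
    print(name, index.query("compound X"))
# internal_docs ['i1']
# ondemand/medline ['m1']
```

Because both index types expose the same `query` method, client code cannot tell which one it is calling; at this stage, though, each index still returns its own result list.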
I2E 4.1 Linked Servers
[Diagram: I2E Enterprise on customer network; I2E OnDemand SaaS infrastructure; In-house Indexes; I2E OnDemand Standard Indexes; I2E Enterprise Access; Custom Indexes; access via Linked Servers; access via single UI]
Merging Results (Part I)
Single Server, Multiple Queries
I2E 3.0 (2009) – Merging Results (Part I) from one server
Profiling Individuals
• Example from news reports related to the pharmaceutical industry
• Pick up properties from one document or many
© Linguamatics 2012 - Customer Confidential
© Linguamatics 2013 - Confidential
I2E 3.0 – Merging Results (part I) from one server
[Screenshot: merged results with columns for Document Identifier, Patient information, Disease history, Patient data, Medications and dosages; hit displayed in context]
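Merging results from several queries on one server amounts to joining each query's hits on a shared key, here the document identifier, so that one row holds the disease history, medications, and other properties found in the same document. The query names and the `(document_id, value)` hit format are illustrative assumptions, not the I2E result format.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str]  # (document identifier, extracted value), an assumed format


def merge_query_results(results: Dict[str, List[Hit]]) -> Dict[str, Dict[str, List[str]]]:
    """Join hits from several queries on document ID.

    Returns {doc_id: {query_name: [values...]}}; a document appears if any
    query hit it, and queries with no hit in that document get an empty list.
    """
    merged: Dict[str, Dict[str, List[str]]] = {}
    for query_name, hits in results.items():
        for doc_id, value in hits:
            row = merged.setdefault(doc_id, {q: [] for q in results})
            row[query_name].append(value)
    return dict(sorted(merged.items()))


rows = merge_query_results({
    "disease_history": [("doc7", "type 2 diabetes")],
    "medications": [("doc7", "metformin 500 mg"), ("doc9", "aspirin 75 mg")],
})
for doc_id, row in rows.items():
    print(doc_id, row)
# doc7 {'disease_history': ['type 2 diabetes'], 'medications': ['metformin 500 mg']}
# doc9 {'disease_history': [], 'medications': ['aspirin 75 mg']}
```

This is an outer join on document ID: keeping `doc9` with an empty disease history makes it visible which properties were not found, rather than silently dropping the row.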
Merging Results (Part II)
Multiple Servers, Multiple Queries
Each server supplies a separate set of results
[Diagram: Content Servers 1 to 4 → merged into a single set of results]
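Merging per-server result sets into a single set requires at least two decisions: how to recognise the same document returned by more than one server, and how to order the combined list. The sketch assumes each hit carries a document key that is meaningful across servers (for example a PubMed ID) and a score on a comparable scale; both are assumptions, since real servers may score differently and may need score normalisation first.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hit:
    key: str      # document key assumed to be shared across servers (e.g. a PMID)
    score: float  # relevance score, assumed comparable across servers
    sources: List[str] = field(default_factory=list)


def merge_server_results(per_server: Dict[str, List[Hit]]) -> List[Hit]:
    """Combine result lists from several content servers into one ranked list.

    Duplicates (same key) are collapsed, keeping the highest score and recording
    every server that returned the document. Input hits are not modified.
    """
    merged: Dict[str, Hit] = {}
    for server, hits in per_server.items():
        for hit in hits:
            if hit.key in merged:
                existing = merged[hit.key]
                existing.score = max(existing.score, hit.score)
                existing.sources.append(server)
            else:
                merged[hit.key] = Hit(hit.key, hit.score, [server])
    # Highest score first; ties broken by key for a deterministic order.
    return sorted(merged.values(), key=lambda h: (-h.score, h.key))


results = merge_server_results({
    "server1": [Hit("pmid:1", 0.9), Hit("pmid:2", 0.4)],
    "server2": [Hit("pmid:2", 0.7)],
    "server3": [Hit("pmid:3", 0.5)],
})
for h in results:
    print(h.key, h.score, h.sources)
# pmid:1 0.9 ['server1']
# pmid:2 0.7 ['server1', 'server2']
# pmid:3 0.5 ['server3']
```

Keeping the maximum score is only one possible policy; summing or averaging scores across servers would reward documents found in several sources.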
The Road to Federated Text Mining
Linking Content Servers
I2E 4.0: Multiple Clients, Multiple Results
[Diagram: I2E Server 1 (Internal Documents) on the internal network; I2E Server 2 (FDA Drug Labels) on the external network]
I2E 4.1/4.2: Single Client, Multiple Results
[Diagram: I2E Server 1 (Internal Documents) on the internal network; I2E Server 2 (FDA Drug Labels) on the external network; Linked server]
Merging Results (Part II)
Q4 2014: Single Client, Single Result, Multiple Servers
[Diagram: I2E Server 1 (Internal Documents) on the internal network; I2E Server 2 (FDA Drug Labels) on the external network; Linked server]
Q4 2014: Federated Text Mining Example
• Single query
• Differently structured data sources on different servers
– Journal Articles (PubMed Central) on Enterprise Server
– MEDLINE on I2E OnDemand
• Single set of results
The Road to Federated – Are we there yet?
[Timeline: I2E 4.0 (Dec 2012); I2E 4.1 (October 2013); next release, in development (Q4 2014). Stages: Data Normalisation, Linking Content Servers, Merging the Results (Part II)]
Demo
[Diagram: demo setup with Cambridge and Nice connected over VPN; Linked Server; Journal Abstracts; Pathology Reports]
Thank you
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Kürzlich hochgeladen

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 

Kürzlich hochgeladen (20)

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 

II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)

  • 1. The Road to Federated Text Mining: Are we there yet? II-SDV 2014 Guy Singh
  • 2. What is federated search? "Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines participating in the federation." (Wikipedia)
  • 3. Current Situation
      – Volume of data ever increasing
      – Proprietary content can reside within the enterprise
      – No need for everyone to keep standard sources up to date
      – Data from content providers can reside on their sites
      – Diagram labels: Internal Content; External Content; MEDLINE; Clinical Trials; Publisher Content; FDA Drug Labels; Patents
  • 4. Increasing Range of Data Sources: Scientific Literature, Social Media, News, Web Pages, Internal Documents, Patents, RSS, Clinical Trials
  • 5. Varying in Structure
  • 6. How does text mining differ from keyword search? Example: What genes affect breast cancer?
  • 7. Why does document structure matter?
      – Searching across documents using keywords is relatively trivial: you do not need to be aware of where the words occur or in what context.
      – Text mining documents with varying structure requires a more sophisticated approach. You need to know where words matching entities/concepts occur, disambiguate depending on context and location, and find terms in particular regions/parts of a document for targeted searches.
  • 8. Approaches to dealing with different data sources
      – Integrate the data into a data warehouse: extract, transform and load (ETL) each data source into a new database. This creates multiple copies of the data, data normalisation can be difficult, and the process is time-consuming and expensive. Most database vendors take this approach, and it allows users to perform a single search across all the content.
      – Leave the data where it is (federated content): the data remains in its original form and location, across multiple data types and multiple network locations, with a single search across multiple different data sources.
  • 9. How do we get to Federated Text Mining? Data Normalisation, Link the Content Servers, Merge Results, leading to Federated Text Mining
  • 10. Data Normalisation: Virtual Indexes. Diagram: a Pathology Reports Index and a Journal Abstracts Index combined into a Virtual Index
  • 11. Data Normalisation: Document Structure. Panels: Pathology Reports; Journal Abstracts
  • 12. Data Normalisation: Entities. Panels: Journal Abstracts; Pathology Reports; Combined (Normalized)
  • 13. Linking Content Servers
  • 14. Linked Servers (Development Status)
      – I2E 4.1 introduced a new feature: Linked Server
      – One I2E server can be linked to another I2E server
      – Provides access to remote and local indexes and queries through a single I2E interface
      – Indexes and queries on remote servers on the network appear the same as local indexes
  • 15. I2E 4.1 Linked Servers. Diagram labels: I2E Enterprise on Customer network; I2E OnDemand SaaS Infrastructure; In-house Indexes; I2E OnDemand Standard Indexes; I2E Enterprise Access; Custom Indexes; Access via Linked Servers; Access via single UI
  • 16. Merging Results (Part I): Single Server, Multiple Queries
  • 17. I2E 3.0 (2009): Merging Results (Part I) from one server. Profiling Individuals
      – Example from news reports related to the pharmaceutical industry
      – Pick up properties from one document or many
  • 18. I2E 3.0: Merging Results (Part I) from one server. Labels: Document Identifier; Patient information; Disease history; Patient data; Medications and dosages; Hit displayed in context
  • 19. Merging Results (Part II): Multiple Servers, Multiple Queries
  • 20. Each server supplies a separate set of results (Content Servers 1 to 4); these are merged into a single set of results
  • 21. The Road to Federated Text Mining
  • 23. I2E 4.0: Multiple Clients, Multiple Results. Diagram: I2E Server 1 (Internal Documents) and I2E Server 2 (FDA Drug Labels), spanning the internal and external networks
  • 24. I2E 4.1/4.2: Single Client, Multiple Results. Diagram: I2E Server 1 (Internal Documents) and I2E Server 2 (FDA Drug Labels), spanning the internal and external networks, with a linked server
  • 26. Q4 2014: Single Client, Single Result, Multiple Servers. Diagram: I2E Server 1 (Internal Documents) and I2E Server 2 (FDA Drug Labels), spanning the internal and external networks, with a linked server
  • 27. Q4 2014: Federated Text Mining Example
      – Single query
      – Differently structured data sources on different servers: journal articles (PubMed Central) on the Enterprise Server; MEDLINE on I2E OnDemand
      – Single set of results
  • 28. The Road to Federated: Are we there yet? Timeline: I2E 4.0 (Dec 2012); I2E 4.1 (October 2013); next release in development (Q4 2014). Milestones: Merging the Results (Part II); Data Normalisation; Linking Content Servers
  • 30. Demo: Cambridge, VPN, Nice; Linked Server; Journal Abstracts; Pathology Reports
  • 31. Thank you
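Slide 2 defines federated search as a single query request that is distributed to every search engine in the federation. Below is a minimal Python sketch of that fan-out. It assumes toy in-memory engines: `make_engine` and the sample documents are hypothetical stand-ins for remote sources, not I2E components. A real federation would also need network calls, timeouts and error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Any callable that takes a query string and returns matching document ids.
SearchEngine = Callable[[str], List[str]]


def make_engine(docs: Dict[str, str]) -> SearchEngine:
    """Toy in-memory engine (hypothetical stand-in for a remote source):
    returns ids of documents containing every query word."""
    def search(query: str) -> List[str]:
        terms = query.lower().split()
        return sorted(doc_id for doc_id, text in docs.items()
                      if all(t in text.lower().split() for t in terms))
    return search


def federated_search(query: str, engines: Dict[str, SearchEngine]) -> Dict[str, List[str]]:
    """Distribute a single query to every engine concurrently; collect results per engine."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, query) for name, engine in engines.items()}
        return {name: future.result() for name, future in futures.items()}


engines = {
    "medline": make_engine({"m1": "BRCA1 mutations in breast cancer",
                            "m2": "lung cancer screening"}),
    "patents": make_engine({"p1": "assay for BRCA1 in breast tissue"}),
}
print(federated_search("brca1 breast", engines))  # {'medline': ['m1'], 'patents': ['p1']}
```

The results are still grouped per source at this stage. Merging them into one result set is the later step the slides describe.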
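Slide 7 draws a distinction between two kinds of search. Structure-blind keyword search matches a term anywhere in a document. Targeted text mining must know which region of the document a match falls in. The sketch below contrasts the two over documents split into named regions. The documents, the region names and the plain substring matching are illustrative assumptions, not how a production engine tokenises text.

```python
from typing import Dict, List

# Hypothetical documents split into named regions.
Document = Dict[str, str]

docs: Dict[str, Document] = {
    "d1": {"title": "BRCA1 and breast cancer risk",
           "abstract": "BRCA1 variants increase breast cancer risk.",
           "references": "Smith et al. Lung cancer cohorts."},
    "d2": {"title": "Lung cancer cohorts",
           "abstract": "We study smoking and lung cancer.",
           "references": "Jones et al. BRCA1 in breast cancer."},
}


def keyword_search(term: str) -> List[str]:
    """Structure-blind: match the term anywhere in the document."""
    term = term.lower()
    return sorted(d for d, regions in docs.items()
                  if any(term in text.lower() for text in regions.values()))


def region_search(term: str, region: str) -> List[str]:
    """Structure-aware: match the term only inside one named region."""
    term = term.lower()
    return sorted(d for d, regions in docs.items()
                  if term in regions.get(region, "").lower())


print(keyword_search("brca1"))             # ['d1', 'd2']
print(region_search("brca1", "abstract"))  # ['d1']
```

Here `d2` matches the keyword search only because BRCA1 appears in a cited reference. Restricting the search to the abstract removes that false hit.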
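Slide 8 contrasts two options: ETL into a data warehouse, or leaving the data where it is. The hypothetical sketch below shows one consequence of the warehouse's extra copy. The sources `a` and `b` and their field names are invented. If a source is updated, the warehouse misses the change until its next load. A federated query that transforms records at query time sees the update immediately.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]  # common shape: (document id, text)

# Two hypothetical sources that store documents with different field names.
SOURCES: Dict[str, List[dict]] = {
    "a": [{"id": "a1", "body": "BRCA1 in breast cancer"}],
    "b": [{"doc_no": "b1", "text": "TP53 and breast cancer"}],
}

# Per-source "transform" step mapping each native record to the common shape.
TRANSFORMS: Dict[str, Callable[[dict], Record]] = {
    "a": lambda r: (r["id"], r["body"]),
    "b": lambda r: (r["doc_no"], r["text"]),
}


def build_warehouse() -> List[Record]:
    """ETL: extract every record, transform it, and load a copy into one store."""
    return [TRANSFORMS[name](r) for name, recs in SOURCES.items() for r in recs]


def search_warehouse(warehouse: List[Record], term: str) -> List[str]:
    """Single search over the copied, already-normalised data."""
    return sorted(doc_id for doc_id, text in warehouse if term.lower() in text.lower())


def search_federated(term: str) -> List[str]:
    """Federated: leave data in place and transform it on the fly at query time."""
    return sorted(doc_id
                  for name, recs in SOURCES.items()
                  for doc_id, text in map(TRANSFORMS[name], recs)
                  if term.lower() in text.lower())


warehouse = build_warehouse()
# Source "b" is updated after the warehouse was loaded.
SOURCES["b"].append({"doc_no": "b2", "text": "BRCA2 breast cancer study"})
print(search_warehouse(warehouse, "breast"))  # ['a1', 'b1']  (stale copy)
print(search_federated("breast"))             # ['a1', 'b1', 'b2']
```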
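Slide 10 shows a virtual index built from a pathology reports index and a journal abstracts index. One way to picture this is sketched below. It assumes a virtual index is simply a facade that forwards each search to its member indexes and tags every hit with its origin. `Index` and `VirtualIndex` are hypothetical classes, not the I2E API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Index:
    """Hypothetical single index: document id -> text."""
    name: str
    docs: Dict[str, str] = field(default_factory=dict)

    def search(self, term: str) -> List[str]:
        return sorted(d for d, t in self.docs.items() if term.lower() in t.lower())


@dataclass
class VirtualIndex:
    """Presents several member indexes as if they were one index."""
    members: List[Index]

    def search(self, term: str) -> List[Tuple[str, str]]:
        # Tag each hit with its member index so ids from different indexes cannot collide.
        return [(idx.name, d) for idx in self.members for d in idx.search(term)]


pathology = Index("pathology_reports", {"r1": "Invasive ductal carcinoma, HER2 positive"})
abstracts = Index("journal_abstracts", {"a1": "HER2 testing in breast carcinoma",
                                        "a2": "Lung adenocarcinoma outcomes"})
virtual = VirtualIndex([pathology, abstracts])
print(virtual.search("her2"))  # [('pathology_reports', 'r1'), ('journal_abstracts', 'a1')]
```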
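Slide 11 points out that pathology reports and journal abstracts are structured differently. The sketch below shows one possible normalisation approach. It assumes each source's section names can be mapped onto shared region names, so that a single region-restricted query works across both sources. The `REGION_MAP` entries and the sample documents are invented for illustration.

```python
from typing import Dict

# Hypothetical mapping from each source's own section names to shared region names.
REGION_MAP: Dict[str, Dict[str, str]] = {
    "pathology_reports": {"CLINICAL HISTORY": "background", "FINAL DIAGNOSIS": "findings"},
    "journal_abstracts": {"BACKGROUND": "background", "RESULTS": "findings"},
}


def normalise(source: str, doc: Dict[str, str]) -> Dict[str, str]:
    """Rename a document's sections to the shared regions.
    Unmapped sections are collected under 'other'; text in the same region is joined."""
    out: Dict[str, str] = {}
    for section, text in doc.items():
        region = REGION_MAP[source].get(section, "other")
        out[region] = f"{out[region]} {text}" if region in out else text
    return out


report = {"CLINICAL HISTORY": "Palpable mass.", "FINAL DIAGNOSIS": "Invasive ductal carcinoma."}
abstract = {"BACKGROUND": "Carcinoma subtypes vary.", "RESULTS": "HER2 status predicted response."}

# One region-restricted query ("carcinoma" in findings) across both differently structured sources.
findings = [normalise(src, d).get("findings", "")
            for src, d in [("pathology_reports", report), ("journal_abstracts", abstract)]]
hits = [f for f in findings if "carcinoma" in f.lower()]
print(hits)  # ['Invasive ductal carcinoma.']
```

The abstract mentions carcinoma only in its background, so the findings-only query does not return it.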
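Slide 12 shows entities normalised across both sources. The sketch below assumes a simple synonym table that maps surface forms to one canonical concept. The table is a toy substitute for a curated vocabulary, and its entries and concept labels are illustrative only.

```python
import re
from typing import Dict, Set

# Hypothetical synonym table: lower-case surface form -> canonical concept.
SYNONYMS: Dict[str, str] = {
    "her2": "ERBB2", "erbb2": "ERBB2", "her2/neu": "ERBB2",
    "breast cancer": "breast neoplasm", "breast carcinoma": "breast neoplasm",
}


def tag_entities(text: str) -> Set[str]:
    """Return canonical concepts whose synonyms occur in text, not inside a longer word."""
    lowered = text.lower()
    return {concept for form, concept in SYNONYMS.items()
            if re.search(r"(?<!\w)" + re.escape(form) + r"(?!\w)", lowered)}


pathology_text = "Invasive breast carcinoma, HER2/neu positive."
abstract_text = "ERBB2 amplification in breast cancer."
# Different wording in each source, same normalised concepts.
print(tag_entities(pathology_text) == tag_entities(abstract_text))  # True
```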
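Slides 14 and 15 describe Linked Servers: through a single interface, indexes on a linked remote server appear the same as local ones. The sketch below models that behaviour under stated assumptions. `Server`, `link` and the sample indexes are hypothetical and do not reflect how I2E implements the feature. Only direct links are followed.

```python
from typing import Dict, List


class Server:
    """Hypothetical text-mining server holding named indexes (index name -> documents)."""

    def __init__(self, name: str, indexes: Dict[str, List[str]]):
        self.name = name
        self.indexes = indexes
        self.linked: List["Server"] = []

    def link(self, other: "Server") -> None:
        """Link another server so its indexes are reachable through this one."""
        self.linked.append(other)

    def list_indexes(self) -> List[str]:
        # Local and remote index names are listed together, with no distinction.
        # Only direct links are followed (one hop), which also avoids link cycles.
        names = list(self.indexes)
        for remote in self.linked:
            names.extend(remote.indexes)
        return names

    def search(self, index: str, term: str) -> List[str]:
        """Query a local index, or forward the query to the linked server that holds it."""
        if index in self.indexes:
            return [doc for doc in self.indexes[index] if term.lower() in doc.lower()]
        for remote in self.linked:
            if index in remote.indexes:
                return remote.search(index, term)
        raise KeyError(f"unknown index: {index}")


enterprise = Server("enterprise", {"in_house": ["Internal report on HER2 assay"]})
on_demand = Server("on_demand", {"medline": ["HER2 in breast cancer", "Lung cancer cohort"]})
enterprise.link(on_demand)
print(enterprise.list_indexes())             # ['in_house', 'medline']
print(enterprise.search("medline", "her2"))  # ['HER2 in breast cancer']
```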
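Slides 16 to 18 describe merging the results of several queries run on one server, picking up properties of an individual from one document or many. Below is a minimal sketch. It assumes each query returns `(entity, document id, value)` rows for one property, and that rows are grouped by entity to form a profile. All names and rows are fictional.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical query output: (entity, document id, value) rows for one property.
QueryResult = List[Tuple[str, str, str]]
# Profile: property -> list of (value, document id) evidence.
Profile = Dict[str, List[Tuple[str, str]]]


def merge_profiles(results: Dict[str, QueryResult]) -> Dict[str, Profile]:
    """Group rows from several queries by entity, keeping the source document of each value."""
    profiles: Dict[str, Dict[str, List[Tuple[str, str]]]] = defaultdict(lambda: defaultdict(list))
    for prop, rows in results.items():
        for entity, doc_id, value in rows:
            profiles[entity][prop].append((value, doc_id))
    return {entity: dict(props) for entity, props in profiles.items()}


results = {
    "role":    [("J. Doe", "news1", "CEO")],
    "company": [("J. Doe", "news1", "Acme Pharma"), ("J. Doe", "news7", "Acme Pharma")],
    "joined":  [("A. Roe", "news3", "2011")],
}
profiles = merge_profiles(results)
print(profiles["J. Doe"]["company"])  # [('Acme Pharma', 'news1'), ('Acme Pharma', 'news7')]
```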
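Slide 20 shows four content servers, each returning a separate result set, merged into one. Below is a sketch of one simple merge policy. It assumes documents from different servers can be matched on a shared key. Each document then appears once, along with the list of servers that returned it. The keys and server names are hypothetical. Ranking is deliberately left out because relevance scores from different servers are not necessarily comparable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Hit = Tuple[str, str]  # (shared document key, matched text)


@dataclass
class MergedHit:
    key: str
    text: str
    servers: List[str] = field(default_factory=list)


def merge_results(per_server: Dict[str, List[Hit]]) -> List[MergedHit]:
    """Merge per-server hit lists into one list with one row per document key."""
    merged: Dict[str, MergedHit] = {}
    for server, hits in per_server.items():
        for key, text in hits:
            # The first server to return a key creates the row; later ones are recorded on it.
            merged.setdefault(key, MergedHit(key, text)).servers.append(server)
    return [merged[k] for k in sorted(merged)]


per_server = {
    "server1": [("doc1", "HER2 in breast cancer")],
    "server2": [("doc1", "HER2 in breast cancer"), ("doc2", "ERBB2 amplification")],
    "server3": [],
    "server4": [("label9", "trastuzumab label")],
}
for hit in merge_results(per_server):
    print(hit.key, hit.servers)
# doc1 ['server1', 'server2']
# doc2 ['server2']
# label9 ['server4']
```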