Text and Data Mining at CCC
Solving the Content Retrieval and Licensing Conundrums for TDM
Dr. Haralambos Marmanis
CTO & VP, Engineering
Copyright Clearance Center
Introduction
4/22/2015
Making Copyright Work – CCC and RightsDirect
Rightsholders
• 600+ million rights from:
– Publishers
– Authors
– Creators
Content Users
• 35,000 companies
• Employees worldwide
• Users in 180 countries
CCC and RightsDirect
• Licensing Solutions
• Rights Management
• Content Delivery
• Copyright Education
Who Am I?
What Is Text and Data Mining?
• Automate the extraction of “Entities” from Text
• Find Relationships and Patterns
• Produce hypotheses of interest
• Drive decision making
Applications
• Biomarker discovery
• Drug repurposing
• Drug safety
• Competitive intelligence
• Sentiment analysis
• …….
The General Problem & Our Solution
Through An Example
“Drug Discovery” Process
• Goal: Develop new treatments for diseases through hypothesis formation.
• Methodology:
– Keyword/Database Searching
– Review Literature
– Find relationships
– Develop hypothesis
– Test
– Product development
Etc.
General Overview of the Process
1. Identify a set of resources that are relevant to a particular research objective
2. Analyze and extract information specific to the research objective
3. Develop and explore the various relations between extracted objects of interest
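The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the toy corpus, the entity dictionary, and the function names (`identify_resources`, `extract_entities`, `explore_relations`) are all hypothetical, and plain dictionary lookup plus co-occurrence counting stand in for real entity extraction and relation mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; a real project would use licensed full-text articles.
DOCS = {
    "doc1": "Aspirin reduces inflammation and may lower the risk of stroke.",
    "doc2": "Metformin is widely used to treat diabetes.",
    "doc3": "Aspirin and metformin were studied together in diabetes patients.",
    "doc4": "The committee discussed the annual budget.",
}

# Hypothetical entity dictionary mapping surface terms to entity types.
ENTITIES = {
    "aspirin": "DRUG",
    "metformin": "DRUG",
    "diabetes": "DISEASE",
    "stroke": "DISEASE",
    "inflammation": "DISEASE",
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [word.strip(".,;:").lower() for word in text.split()]


def identify_resources(docs, keywords):
    """Step 1: keep only documents that mention at least one keyword."""
    return {doc_id: text for doc_id, text in docs.items()
            if keywords & set(tokenize(text))}


def extract_entities(text):
    """Step 2: extract known entities by dictionary lookup."""
    return sorted({tok for tok in tokenize(text) if tok in ENTITIES})


def explore_relations(docs):
    """Step 3: count entity pairs that co-occur in the same document."""
    pairs = Counter()
    for text in docs.values():
        for a, b in combinations(extract_entities(text), 2):
            pairs[(a, b)] += 1
    return pairs


relevant = identify_resources(DOCS, {"aspirin", "metformin"})
relations = explore_relations(relevant)
for pair, count in relations.most_common(3):
    print(pair, count)
```

Pairs that co-occur often are only candidate relations; in the process described above they would feed hypothesis development rather than replace it.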
Data Processing Workflow:
Information Retrieval and Knowledge Discovery
[Figure: Software Platforms for TDM, spanning Information Retrieval and Knowledge Discovery]
*http://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
Problem: Too Much Research
• 53M Records in Scopus
• 800,000 Journal Articles published per year
More Problems…
• Many sources of content
• Many formats
• Difficult to obtain full-text in XML
• Difficult to integrate content into TDM software
• Hard to negotiate and manage licenses and feeds from all publishers
The DirectPath Solution
• Speed up time to obtain properly licensed content for text mining
• Discover and download full-text in XML, not just abstracts
• Main corpus includes Subscribed and Not-Subscribed content
• Normalize XML format across many publishers
• Provide a Web UI and RESTful API services
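To make "normalize XML format across many publishers" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It does not show DirectPath's actual schema or API: the publisher names, the element paths in `FIELD_PATHS`, the sample documents, and the common `<article>` layout are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-publisher element paths; real publisher schemas differ.
FIELD_PATHS = {
    "publisher_a": {"title": "./front/article-title", "body": "./body"},
    "publisher_b": {"title": "./head/ttl", "body": "./content/fulltext"},
}


def normalize(xml_text, publisher):
    """Map a publisher-specific article into one common <article> layout."""
    source = ET.fromstring(xml_text)
    article = ET.Element("article", {"source": publisher})
    for field, path in FIELD_PATHS[publisher].items():
        node = source.find(path)
        target = ET.SubElement(article, field)
        if node is None:
            # A missing field becomes an empty element instead of an error.
            target.text = ""
        else:
            # itertext() flattens nested markup; split/join collapses whitespace.
            target.text = " ".join("".join(node.itertext()).split())
    return article


# Two made-up inputs with different structures for the same kind of content.
SAMPLE_A = (
    "<art><front><article-title>Drug safety signals</article-title></front>"
    "<body><p>Full text A.</p></body></art>"
)
SAMPLE_B = (
    "<doc><head><ttl>Biomarker discovery</ttl></head>"
    "<content><fulltext>Full text <i>B</i>.</fulltext></content></doc>"
)

normalized = [
    normalize(SAMPLE_A, "publisher_a"),
    normalize(SAMPLE_B, "publisher_b"),
]
for article in normalized:
    print(ET.tostring(article, encoding="unicode"))
```

Once every article shares one layout, downstream TDM software needs only a single importer instead of one per publisher.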
[Diagram: Publishers, XML article corpus, Researchers, TDM Software]
1. Publishers provide content and rights
2. Researchers create content sets by using search or other discovery criteria
3. Researchers slice and dice results and identify an appropriate corpus for their project
4. XML corpus can be imported into various TDM tools
Application Walkthrough
RESTful Services Based on Open Standards
Unique Features
• Custom analysis/indexing for each Project
– Custom stop-word lists; synonyms/dictionaries
– Custom analyzers
– The finest granularity at the analysis and indexing level
• Built by design with multilingual support in mind
– Based on Lucene
• Search beyond TF-IDF (e.g. document ranking by citation)
• Retrieval beyond Search (e.g. nearest neighbors)
• Cost and Quality Optimization (roadmap/patent pending)
• Integration with text mining tools like Linguamatics I2E
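A rough sketch of three ideas from this list: per-project analysis with custom stop words and synonyms, TF-IDF weighting, and nearest-neighbor retrieval. It is plain Python, not Lucene and not the product's implementation. The stop-word list, the synonym table, the sample documents, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical per-project analysis settings.
STOP_WORDS = {"the", "a", "of", "in", "and", "for", "is"}
SYNONYMS = {"acetylsalicylic": "aspirin", "asa": "aspirin"}


def analyze(text):
    """Custom analyzer: lowercase, strip punctuation, drop stop words, map synonyms."""
    tokens = (word.strip(".,;:()").lower() for word in text.split())
    return [SYNONYMS.get(tok, tok) for tok in tokens if tok and tok not in STOP_WORDS]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors using a smoothed IDF (one common variant)."""
    counts = {doc_id: Counter(analyze(text)) for doc_id, text in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n_docs = len(docs)
    return {
        doc_id: {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        }
        for doc_id, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def nearest_neighbors(doc_id, vectors, k=1):
    """Retrieval beyond search: the k documents most similar to a given one."""
    scores = [(cosine(vectors[doc_id], vec), other)
              for other, vec in vectors.items() if other != doc_id]
    return [other for _, other in sorted(scores, reverse=True)[:k]]


DOCS = {
    "d1": "ASA lowers the risk of stroke",
    "d2": "Aspirin and stroke prevention in adults",
    "d3": "Sentiment analysis of product reviews",
}
vectors = tfidf_vectors(DOCS)
print(nearest_neighbors("d1", vectors))
```

Because the synonym table maps "ASA" to "aspirin" before weighting, d1 and d2 share two terms even though their surface text shares only one. This is the kind of effect per-project dictionaries are meant to produce.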
TDM Product Roadmap
• Augment and Enrich the Inventory
• Workflow Integrations with 3rd Party Support
• Expand and enhance Metadata Normalization
• Introduce Content Metrics for Retrieval
• Cost Optimization
• Information Content Optimization
Thank You!

II-SDV 2015, 20 - 21 April, in Nice
