Confidential
Dealing with large volumes in statistical
analysis and text mining tools
April 16-17 2012
II-SDV Nice
Laurent Hill
Questel
Introduction
“What’s hot in the semiconductor industry?”
“What’s patented in Korea but not in the US?”
“What are the trends in biofuel production?”
To answer those questions you need to
analyze 10,000s to millions of patent
records in detail
Presentation Outline
• Problem statement
• Large scale text mining
• Text mining applications in Fampat
• Statistical analysis at work on large data sets
• Converging scale-out and small corpus
techniques
Scaling out patent analysis:
Problem statement
• Some projects deal with up to ~ 10,000 patents
– Existing tools well adapted
– Classic cycle: extract, run text mining, vet data, graph results
• For some other problems, the volume is just too high for this
approach
– See introduction questions
• Last 10 years of semiconductor patents: 854,000 families, 1M+ patents
• Patented in Korea but not in the US, 2000-2010: 1,134,000 families
• Trends in biofuel production: 18,800 patent families (51,000 patents)
– Yet we need fast answers: stakeholders do not care about volume,
they just need answers.
– An entirely different approach is needed, while still having the same
user interface, and without giving up on any functionality for smaller
data-sets.
Large scale text mining
• Text mining:
– Producing a model from text analysis
– Deriving high quality information / features from
the model
• Issues
– Text mining analysis is resource intensive, cannot
be done in real time on a large corpus
– The model is often a black box to the user,
lowering user confidence in derived applications
Concept extraction
• Main idea: extract concepts rather than words
as a semantic model of patent documents,
weight concepts according to sentence type
– Weighted concepts produce a natural semantic
vector model of the patent
• To deal with large volume we need to do this
upfront when loading the data
– 3 CPU years to pre-process the whole database
Key sentences tagging
• Identify key sentences describing patent
object, advantages and drawbacks,
independent claims
• Uses morpho-syntactic analysis to spot
important sentences
– “patent writer sentiment analysis”
• Good compromise between conciseness of
bibliographic abstracts and full text
Semantic concept tagging
• « noun phrases » identification
– Part-of-speech + stemming
– Suppression of verbs and some adjectives
– Suppression of patent boilerplate terminology
• « preferred embodiment », « skilled artisan », …
– Syntactic normalization:
• « surface of screens » → « screen surface »
• Relevance score computation, based on :
– Field, key sentence morpho-syntactic detection, and
number of occurrences
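The relevance score above combines three factors: the field a concept appears in, whether it sits in a key sentence, and how often it occurs. The slides do not give the formula, so the following is only a minimal sketch under assumed values: the weights in `FIELD_WEIGHTS`, the `KEY_SENTENCE_BONUS`, and the log damping are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical weights: the slides name the factors but not their values.
FIELD_WEIGHTS = {"claims": 2.0, "abstract": 1.5, "description": 1.0}
KEY_SENTENCE_BONUS = 2.0


def relevance_scores(mentions):
    """Score concepts from (concept, field, in_key_sentence) mentions."""
    weight_sum = defaultdict(float)
    occurrences = defaultdict(int)
    for concept, field, in_key_sentence in mentions:
        weight = FIELD_WEIGHTS.get(field, 1.0)
        if in_key_sentence:
            weight *= KEY_SENTENCE_BONUS
        weight_sum[concept] += weight
        occurrences[concept] += 1
    # Average positional weight, damped by a log of the occurrence count.
    return {
        concept: (weight_sum[concept] / occurrences[concept])
        * (1.0 + math.log(occurrences[concept]))
        for concept in weight_sum
    }


mentions = [
    ("SOLVENT MIXTURE", "claims", True),
    ("SOLVENT MIXTURE", "description", False),
    ("VAPOR DEGREASING", "abstract", True),
]
scores = relevance_scores(mentions)
```

The resulting dictionary of concept weights is one possible form of the weighted semantic vector that the deck describes for each patent.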
Concept extraction example
The invention relates generally to molecular level
cleaning of parts by vapor degreasing.
More particularly, the invention relates to a solvent
mixture comprising n-propyl bromide, a [mixture
of low boiling solvents] and, …
The solvent mixture of the invention is non-
flammable, non-corrosive, non-hazardous, and has
a low ozone depletion potential.
Normalization examples
the heat conductivity
the conduction of heat
heat conductivity
heat conduction
→ HEAT CONDUCTION

the user of the cellular telephone
cellular phone users
a user of a mobile telephone
any mobile phone users
their user mobile telephones
→ MOBILE PHONE USER
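To make the grouping above concrete, here is a toy sketch of how surface variants could collapse to one concept. It is not the morpho-syntactic analysis used in the deck: the `STOPWORDS` and `LEMMAS` tables are hand-written assumptions covering only these examples, and the order-independent key is a deliberate simplification.

```python
from collections import defaultdict

# Toy lookup tables, assumed for illustration only.
STOPWORDS = {"the", "a", "an", "any", "of", "their"}
LEMMAS = {
    "users": "user",
    "telephone": "phone",
    "telephones": "phone",
    "phones": "phone",
    "cellular": "mobile",
    "conductivity": "conduction",
}


def concept_key(phrase):
    """Reduce a noun phrase to an order-independent key of normalized tokens."""
    tokens = [t for t in phrase.lower().split() if t not in STOPWORDS]
    return tuple(sorted(LEMMAS.get(t, t) for t in tokens))


def group_variants(phrases):
    """Group phrases that share the same normalized key."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[concept_key(phrase)].append(phrase)
    return dict(groups)


variants = [
    "the heat conductivity",
    "the conduction of heat",
    "heat conduction",
    "the user of the cellular telephone",
    "cellular phone users",
    "a user of a mobile telephone",
    "any mobile phone users",
    "their user mobile telephones",
]
groups = group_variants(variants)
```

With these tables the eight phrases fall into two groups, one per concept. A real system would need a proper part-of-speech tagger and stemmer, and would keep track of which word is the head of the phrase rather than sorting tokens.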
« Tag cloud » style visualization in Orbit
Vector model made user-friendly
Text mining applications at
database level
• Since text mining has been applied massively, it
can be leveraged at the database level (Fampat)
• Similarity searching
– More like this (one or more patents)
– Refine by example
• Related concept search
– “chaussure de ski” yields:
• Ski boot
• Skier foot
• Ski binding
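A "more like this" search over concept vectors can be sketched with cosine similarity. This is an assumption about the general technique, not a description of the Fampat engine: the patent identifiers, weights, and the brute-force scan below are hypothetical, and a database of this size would need an index rather than a full pass over every vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def more_like_this(seed_ids, corpus, top_n=3):
    """Rank patents by similarity to the summed vectors of the seed patents."""
    query = {}
    for pid in seed_ids:
        for concept, weight in corpus[pid].items():
            query[concept] = query.get(concept, 0.0) + weight
    candidates = [
        (cosine(query, vec), pid) for pid, vec in corpus.items() if pid not in seed_ids
    ]
    candidates.sort(reverse=True)
    return [(pid, round(score, 3)) for score, pid in candidates[:top_n]]


# Hypothetical concept vectors keyed by made-up patent identifiers.
corpus = {
    "P1": {"SKI BOOT": 3.0, "SKI BINDING": 1.0},
    "P2": {"SKI BOOT": 2.0, "SKIER FOOT": 1.0},
    "P3": {"HEAT CONDUCTION": 4.0},
    "P4": {"SKI BINDING": 2.0},
}
ranking = more_like_this({"P1"}, corpus)
```

Passing several seed identifiers sums their vectors first, which mirrors the "one or more patents" option on the slide.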
Large scale statistical analysis
• Nothing special here?
– More precisely: that’s the goal
– Be able to get all usual business graphics on
patents
– With close to instant response time on 1 million
patent families
– And exact results
Prerequisites
• All data normalized and clean in the database
– Normalized assignee names
– Legal status for all patents and European national
phases
– Normalized text mining concepts
• Everything available at the family level in the
same database
• A fast hybrid engine
– Boolean full-text
– Semantic
– Analytic (OLAP)
Results
• 1M patent families: less than 10 s for top
assignees
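The top-assignees chart is, at its core, a group-and-count over normalized assignee names. The sketch below shows only that aggregation on a handful of invented records. The record layout and company names are hypothetical, and an in-memory loop says nothing about how the engine reaches its response time on a million families.

```python
from collections import Counter


def top_assignees(families, n=3):
    """Count families per normalized assignee and return the n largest."""
    counts = Counter()
    for family in families:
        # A family can list co-assignees; count each name once per family.
        counts.update(set(family["assignees"]))
    return counts.most_common(n)


# Invented sample data; names are assumed to be already normalized.
families = [
    {"id": "F1", "assignees": ["ACME", "GLOBEX"]},
    {"id": "F2", "assignees": ["ACME"]},
    {"id": "F3", "assignees": ["ACME", "ACME"]},
    {"id": "F4", "assignees": ["GLOBEX"]},
    {"id": "F5", "assignees": ["INITECH"]},
]
ranking = top_assignees(families, n=2)
```

Counting on normalized names is what makes the result exact: without the normalization listed in the prerequisites, spelling variants of one company would be split across several bars.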
Data exploration
• On large volumes, you are often formulating
and testing hypotheses on the spot
– One-click drill-down
Semantic exploration
• Cluster of concepts, filtered on year 2010
– Drill-down on any cell
Graphing along custom axes
• Sometimes you want to produce business
graphics on your own queries
– Combining International classes with concepts
• A61Q or cosmetic/KEYW
– Defining precise patent categories by full text
queries
• (bio_diesel P Alga??)/CLMS
Custom business graphic
Crossing custom axes
• Crossing independent claim searches with concepts
[Figure: concept words crossed with independent claim content]
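Crossing custom axes amounts to a cross-tabulation: each row is a user-defined category, each column a concept, and each cell a family count. In the sketch below the row categories are plain Python predicates used as rough stand-ins for queries written in the search engine's own syntax; the record fields and sample data are hypothetical.

```python
from collections import Counter


def cross_tab(families, row_queries, col_concepts):
    """Count families matching each (row query, column concept) pair."""
    table = Counter()
    for family in families:
        for row_name, predicate in row_queries.items():
            if not predicate(family):
                continue
            for concept in col_concepts:
                if concept in family["concepts"]:
                    table[(row_name, concept)] += 1
    return table


# Invented records; field names are assumptions made for this example.
families = [
    {"ipc": ["A61Q 19/00"], "keywords": [], "claims": "a skin cream",
     "concepts": {"SKIN CARE"}},
    {"ipc": ["C12P 7/64"], "keywords": ["cosmetic"],
     "claims": "bio diesel from algae oil",
     "concepts": {"ALGAE OIL", "SKIN CARE"}},
    {"ipc": ["C10L 1/02"], "keywords": [],
     "claims": "bio diesel produced from alga biomass",
     "concepts": {"ALGAE OIL"}},
]

row_queries = {
    # Class A61Q or the keyword "cosmetic".
    "cosmetics": lambda f: any(c.startswith("A61Q") for c in f["ipc"])
    or "cosmetic" in f["keywords"],
    # Claims mentioning both bio diesel and a word starting with "alga".
    "algal biodiesel": lambda f: "bio diesel" in f["claims"]
    and "alga" in f["claims"],
}

table = cross_tab(families, row_queries, ["SKIN CARE", "ALGAE OIL"])
```

Each cell of `table` can then be plotted or used as a drill-down target, since the key identifies exactly which query and concept produced it.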
Comparing scale-out and small corpus
techniques
Traditional / small corpus
• Pros
– Ability to edit the data
– Save edited data
– Maps
• Cons
– Limited to 60,000 patent families
– Full text queries often limited
– Have to wait for data to be extracted/saved
Large scale
• Pros
– Practically unlimited in volume
– Full integration with search engine
– Full access to text mining features
• Cons
– No data vetting / editing
– No data set saving
We need to get the
best of both worlds
Converged tool:
Orbit analysis module
• 3 data sources
– Live from database
– Workfiles
– Saved analysis: edited data
• Same features, same user interface
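One way to picture "same features, same user interface" over several data sources is a shared interface that every source implements, so the analysis code never needs to know where the families come from. This is a speculative sketch, not the module's design: `PatentSource`, `Workfile`, `SavedAnalysis`, and the record layout are invented names, and the live database source is left out because it would depend on the engine's query API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Protocol


class PatentSource(Protocol):
    """Anything that can yield patent family records."""

    def families(self) -> Iterable[Dict]: ...


class Workfile:
    """A fixed, previously extracted set of families."""

    def __init__(self, rows: List[Dict]):
        self._rows = rows

    def families(self) -> Iterable[Dict]:
        return iter(self._rows)


class SavedAnalysis:
    """A source plus user edits (here: assignee renames) applied on read."""

    def __init__(self, base: PatentSource, renames: Dict[str, str]):
        self._base = base
        self._renames = renames

    def families(self) -> Iterable[Dict]:
        for row in self._base.families():
            name = row["assignee"]
            yield {**row, "assignee": self._renames.get(name, name)}


def assignee_counts(source: PatentSource) -> Counter:
    """The same analysis function works on any source."""
    return Counter(row["assignee"] for row in source.families())


rows = [{"assignee": "ACME"}, {"assignee": "ACME INC"}, {"assignee": "GLOBEX"}]
workfile = Workfile(rows)
edited = SavedAnalysis(workfile, {"ACME INC": "ACME"})
raw_counts = assignee_counts(workfile)
edited_counts = assignee_counts(edited)
```

Keeping edits as an overlay on top of a base source leaves the extracted data untouched, which is one way to combine data vetting with a source that cannot itself be modified.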
Leveraging concepts: Mapping module
Questions
• Feel free to share your horror / success stories
in analyzing hundreds of thousands of patents.
Thank you very much

AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Kürzlich hochgeladen

Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersDamian Radcliffe
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Onlineanilsa9823
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...SofiyaSharma5
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)Delhi Call girls
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Delhi Call girls
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 

Kürzlich hochgeladen (20)

Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
@9999965857 🫦 Sexy Desi Call Girls Laxmi Nagar 💓 High Profile Escorts Delhi 🫶
@9999965857 🫦 Sexy Desi Call Girls Laxmi Nagar 💓 High Profile Escorts Delhi 🫶@9999965857 🫦 Sexy Desi Call Girls Laxmi Nagar 💓 High Profile Escorts Delhi 🫶
@9999965857 🫦 Sexy Desi Call Girls Laxmi Nagar 💓 High Profile Escorts Delhi 🫶
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 

II-SDV 2012 Dealing with Large Data Volumes in Statistical Analysis and Text Mining Tools

  • 1. Confidential Dealing with large volumes in statistical analysis and text mining tools April 16-17 2012 II-SDV Nice Laurent Hill Questel
  • 2. Introduction “What’s hot in the semiconductor industry?” “What’s patented in Korea but not in the US?” “What are the trends in biofuel production?” To answer those questions you need to analyze tens of thousands to millions of patent records in detail
  • 3. Presentation Outline • Problem statement • Large scale text mining • Text mining applications in Fampat • Statistical analysis at work on large data sets • Converging scale-out and small corpus techniques
  • 4. Scaling out patent analysis: Problem statement • Some projects deal with up to ~10,000 patents – Existing tools are well adapted – Classic cycle: extract, run text mining, vet data, graph results • For some other problems, the volume is just too high for this approach – See introduction questions • Last 10 years of semiconductor patents: 854,000 families, 1M+ patents • Patented in Korea but not in the US, 2000-2010: 1,134,000 families • Trends in biofuel production: 18,800 patent families (51,000 patents) – Yet we need fast answers: stakeholders do not care about volume, they just need answers. – An entirely different approach is needed, while keeping the same user interface and without giving up any functionality for smaller data sets.
  • 5. Large scale text mining • Text mining: – Producing a model from text analysis – Deriving high quality information / features from the model • Issues – Text mining analysis is resource intensive and cannot be done in real time on a large corpus – The model is often a black box to the user, lowering user confidence in derived applications
  • 6. Concept extraction • Main idea: extract concepts rather than words as a semantic model of patent documents, and weight concepts according to sentence type – Weighted concepts produce a natural semantic vector model of the patent • To deal with large volumes, we need to do this up front when loading the data – 3 CPU years to pre-process the whole database
  • 7. Key sentence tagging • Identify key sentences describing the patent's object, advantages and drawbacks, and independent claims • Uses morpho-syntactic analysis to spot important sentences – “patent writer sentiment analysis” • Good compromise between the conciseness of bibliographic abstracts and the full text
  • 8. Semantic concept tagging • « noun phrases » identification – Part-of-speech + stemming – Suppression of verbs and some adjectives – Suppress patent boilerplate terminology • « preferred embodiment », « skilled artisan » … – Syntactic normalization: • « surface of screens » → « screen surface » • Relevance score computation, based on: – Field, key sentence morpho-syntactic detection, and number of occurrences
  • 9. Concept extraction example The invention relates generally to molecular level cleaning of parts by vapor degreasing. More particularly, the invention relates to a solvent mixture comprising n-propyl bromide, a [mixture of low boiling solvents] and, … The solvent mixture of the invention is non-flammable, non-corrosive, non-hazardous, and has a low ozone depletion potential.
  • 10. Normalization examples • the heat conductivity / the conduction of heat / heat conductivity / heat conduction → HEAT CONDUCTION • the user of the cellular telephone / cellular phone users / a user of a mobile telephone / any mobile phone users / their user mobile telephones → MOBILE PHONE USER
  • 11. « Tag cloud » style visualization in Orbit • Vector model made user-friendly
  • 12. Text mining applications at database level • Since text mining has been applied massively, it can be leveraged at the database level (Fampat) • Similarity searching – More like this (one or more patents) – Refine by example • Related concept search – “chaussure de ski” yields: • Ski boot • Skier foot • Ski binding
  • 13. Large scale statistical analysis • Nothing special here? – More precisely: that’s the goal – Be able to get all the usual business graphics on patents – With close to instant response time on 1 million patent families – And exact results
  • 14. Prerequisites • All data normalized and clean in the database – Normalized assignee names – Legal status for all patents and European national phases – Normalized text mining concepts • Everything available at the family level in the same database • A fast hybrid engine – Boolean full-text – Semantic – Analytic (OLAP)
  • 15. Results • 1M patent families, less than 10 s for top assignees
  • 16. Data exploration • On large volumes, you are often formulating and testing hypotheses on the spot – One-click drill-down
  • 17. Semantic exploration • Cluster of concepts, filtered on year 2010 – Drill-down on any cell
  • 18. Graphing along custom axes • Sometimes you want to produce business graphics on your own queries – Combining International classes with concepts • A61Q or cosmetic/KEYW – Defining precise patent categories by full text queries • (bio_diesel P Alga??)/CLMS
  • 20. Crossing custom axes • Crossing independent claim searches with concepts (chart axes: concept words × independent claim content)
  • 21. Comparing scale-out and small corpus techniques • Traditional / small corpus – Pros: ability to edit the data; save edited data; maps – Cons: limited to 60,000 patent families; full text queries often limited; have to wait for data to be extracted/saved • Large scale – Pros: practically unlimited in volume; full integration with search engine; full access to text mining features – Cons: no data vetting / editing; no data set saving • We need to get the best of both worlds
  • 22. Converged tool: Orbit analysis module • 3 data sources – Live from database – Workfiles – Saved analysis: edited data • Same features, same user interface
  • 24. Questions • Feel free to share your horror / success stories in analyzing hundreds of thousands of patents.
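
Illustrative sketch (not from the slides): slides 6, 8 and 12 describe weighting concepts by sentence type to form a semantic vector per patent, then using those vectors for "more like this" similarity searching. The Python below shows one way such a scheme could look. The weight values, the sentence-type labels, the function names and the choice of cosine similarity are all assumptions made for illustration; the presentation does not state how Fampat computes relevance scores or similarity.

```python
import math

# Hypothetical weights per sentence type; the slides give no actual values.
SENTENCE_WEIGHTS = {
    "object": 3.0,
    "advantage": 2.0,
    "independent_claim": 2.0,
    "other": 1.0,
}


def concept_vector(tagged_concepts):
    """Build a weighted concept vector from (concept, sentence_type) pairs.

    Each occurrence adds the weight of the sentence type it was found in,
    so both frequency and sentence importance raise a concept's score.
    """
    vector = {}
    for concept, sentence_type in tagged_concepts:
        weight = SENTENCE_WEIGHTS.get(sentence_type, 1.0)
        vector[concept] = vector.get(concept, 0.0) + weight
    return vector


def cosine_similarity(a, b):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query_vector, corpus, top_n=3):
    """Rank patents in `corpus` (id -> vector) by similarity to the query."""
    scored = [
        (patent_id, cosine_similarity(query_vector, vector))
        for patent_id, vector in corpus.items()
    ]
    # Highest score first; ties broken by patent id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    corpus = {
        "A": concept_vector([("heat conduction", "object"),
                             ("solvent mixture", "advantage")]),
        "B": concept_vector([("mobile phone user", "object")]),
    }
    query = concept_vector([("heat conduction", "independent_claim")])
    for patent_id, score in more_like_this(query, corpus):
        print(patent_id, round(score, 3))
```

Because the vectors are computed once per patent, the expensive text analysis can be done at load time, as slide 6 suggests, and only the cheap vector comparison has to run at query time.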