Boehringer Ingelheim Pharma GmbH & Co. KG
Research Networking - Aleksandar Kapisoda
Deep Web Search
Deep SEARCH 9 GmbH
Klaus Kater
Content
1. Intro
2. Search Approach
• Public Search Approach
• SEARCHCORPUS® Approach
3. Use Cases
• SEARCHCORPUS® for life science startups:
We find startup information we could not find in public search engines.
• Life science news SEARCHCORPUS®:
Hundreds of incoming mails and alerts are processed every day, and the websites and
articles behind the news tags are crawled automatically.
4. Technical Features
5. Outlook
1. Intro
Deep (Web) Search
2015
(Deep Web) Search
We showed that we can crawl and find content that public search engines do not find.
What we did in 2015…
Timeline: 2014, 2015, 2016
During the year we
established our internal
processes to build targeted
SEARCHCORPORA.
We built solutions and
rolled them out.
And we found more than we
bargained for.
2016
Deep (Web Search)
This year we will talk about a misconception we were confronted with
when comparing our SEARCHCORPUS®-based search results
with search results from public search engines.
2. The Public Search Approach
Public Search Misconception
Clashing with Incomplete Search Results
Let’s make up a “Weißwurst Misconception”…
…to make it easier to understand the “Public Search Misconception”.
Anyone understands that Weißwurst without Weißwurst mustard is
like Fish ’n’ Chips without Chips.
Web search is like trying to find Weißwurst mustard
in a convenience store 1)
You will find loads of local and
not so local mustards.
But if Weißwurst mustard is
located in the specialities
section, you will only find it by
chance or not at all…
1) Not a Bavarian convenience store.
No Weißwurst mustard!
So you may believe that the
store does not carry Weißwurst
mustard at all.
There are two common misperceptions that trap researchers
using public search:
• If a search has results,
we believe that these results are complete.
• If a search has no results,
we believe that there is nothing to be found.
Both perceptions are wrong and together make up the Public Search Misconception:
We believe that there is nothing to be found, even though the information may be
available.
We just don’t know where it is and need the right tools to find it.
This store doesn’t have Weißwurst mustard…
Why Results Are Missed
An explanation of why results are missed
Assume we want to monitor startup activities in the area
of CRISPR being used in the fight against diabetes type 1:
+CRISPR +diabetes type 1
To avoid getting overloaded with biotechnological research papers,
we tell the search engine that we are interested in startups by adding +startup:
+CRISPR +diabetes type 1
+startup
Only documents in which all terms
match are returned. These documents
are actually about startups,
but only if the startups were
mentioned in some press release
or report.
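The effect of requiring all terms can be illustrated with a minimal sketch. It assumes plain case-insensitive substring matching, which is far simpler than what a public search engine does; the three documents are invented for illustration only.

```python
def matches_all(text: str, required_terms: list[str]) -> bool:
    """Return True only if every required term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in required_terms)


# Hypothetical documents, invented for this illustration.
documents = {
    "press_release": "Startup Acme Bio uses CRISPR to tackle diabetes type 1.",
    "company_site": "We apply CRISPR gene editing to diabetes type 1 therapies.",
    "review_paper": "CRISPR approaches in diabetes type 1: a review.",
}

# Equivalent of: +CRISPR +diabetes type 1 +startup
query = ["CRISPR", "diabetes type 1", "startup"]
hits = [name for name, text in documents.items() if matches_all(text, query)]
```

Here `hits` contains only `press_release`. The company's own site is about exactly the right topic, but it never calls itself a startup, so the required term filters it out.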
2. SEARCHCORPUS® Approach
Documents are already put into context while the SEARCHCORPUS® is being built.
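One way to picture this idea is a corpus in which every document carries a context label assigned at build time, so the query no longer needs a term such as "startup". This is a sketch under that assumption only; the names `Doc`, `build_corpus` and `search`, and the sample page, are hypothetical and do not describe the actual Deep SEARCH 9 data model.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    source: str
    text: str
    context: str  # assigned once, when the corpus is built


def build_corpus(pages: dict[str, str], context: str) -> list[Doc]:
    """Label every crawled page with the context of the corpus it belongs to."""
    return [Doc(source=s, text=t, context=context) for s, t in pages.items()]


def search(corpus: list[Doc], terms: list[str], context: str) -> list[str]:
    """Match topic terms only; the context comes from the corpus, not from the text."""
    return [
        d.source
        for d in corpus
        if d.context == context and all(t.lower() in d.text.lower() for t in terms)
    ]


# Hypothetical page of a startup that never uses the word "startup".
startup_pages = {
    "acme-bio.example": "We apply CRISPR gene editing to diabetes type 1 therapies.",
}
corpus = build_corpus(startup_pages, context="startup")
hits = search(corpus, ["CRISPR", "diabetes type 1"], context="startup")
```

Because the label was attached when the corpus was built, the page is found even though it would fail a public query that requires the word "startup".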
3. Use Cases
Use Case 1
SEARCHCORPUS® for Life Science Startups
Situation:
Researchers manually search for startup activities and companies that are active in
specific areas of interest. These areas of interest change frequently.
Problem:
Searching for startups by scientific topics generates an enormous amount of noise that
needs to be filtered manually.
Approach:
Implementation of a startup SEARCHCORPUS® spanning global startup companies.
Status:
Existing startup SEARCHCORPUS® for targeted search
Google search results
The startup that was found in the SEARCHCORPUS®
Proximity search
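The slide only names proximity search without describing it. As a general illustration of the technique, the sketch below checks whether two single-word terms occur within a given number of tokens of each other. The tokenizer and the function `near` are assumptions for this example and say nothing about how the platform implements it.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def near(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """True if term_a and term_b occur at most max_distance tokens apart."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


sentence = "CRISPR is used to treat diabetes"
close_enough = near(sentence, "CRISPR", "diabetes", max_distance=5)
too_strict = near(sentence, "CRISPR", "diabetes", max_distance=3)
```

In this sentence the two terms are five tokens apart, so the first call matches and the second does not.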
3. Use Cases
Use Case 2
Life Science News SEARCHCORPUS®
Situation:
Researchers manually filter hundreds of websites, emails and news feeds.
• News items that are not screened immediately are lost.
Approach:
A targeted news SEARCHCORPUS® using periodic targeted crawling and extraction of
news from sources used by Boehringer Ingelheim scientists.
1. A tracker is made available to researchers on the corporate intranet
2. News archive with faceted search using ontology-based query term expansion
3. Search-profile-based email alerting whenever matching news is crawled
Status:
Existing news SEARCHCORPUS® for targeted search
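Query term expansion and profile-based alerting can be sketched together. The example assumes a tiny hand-written synonym table in place of a real ontology and uses naive substring matching; the table entries, function names and sample headlines are all hypothetical.

```python
# Hypothetical synonym table standing in for an ontology lookup.
SYNONYMS = {
    "diabetes type 1": ["type 1 diabetes", "T1D"],
    "crispr": ["CRISPR-Cas9"],
}


def expand(term: str) -> list[str]:
    """Return the term itself plus any synonyms known for it."""
    return [term] + SYNONYMS.get(term.lower(), [])


def profile_matches(news_text: str, profile_terms: list[str]) -> bool:
    """True if every profile term, or one of its expansions, occurs in the news text."""
    lowered = news_text.lower()
    return all(
        any(variant.lower() in lowered for variant in expand(term))
        for term in profile_terms
    )


profile = ["CRISPR", "diabetes type 1"]
alert = profile_matches("New T1D therapy based on CRISPR-Cas9 announced", profile)
no_alert = profile_matches("New insulin pump released", profile)
```

The first headline never contains the literal phrase "diabetes type 1", yet it triggers an alert because "T1D" is one of the expansions.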
• The viewer is updated by the minute; targets can be crawled as frequently as every 10 s.
• Crawling frequency and crawling schedule are defined per target.
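A per-target crawl frequency could be modelled with a priority queue of due times. The sketch below is an illustration only: the target names and intervals are invented, and the slides do not say how the ds9 scheduler is actually implemented.

```python
import heapq


def crawl_order(intervals: dict[str, int], horizon: int) -> list[tuple[int, str]]:
    """Return (time_in_seconds, target) for every crawl due within the horizon.

    intervals maps each target to its own crawl interval in seconds.
    """
    if any(seconds <= 0 for seconds in intervals.values()):
        raise ValueError("crawl intervals must be positive")
    # Every target is due immediately at time 0.
    queue = [(0, target) for target in sorted(intervals)]
    heapq.heapify(queue)
    schedule = []
    while queue and queue[0][0] <= horizon:
        due, target = heapq.heappop(queue)
        schedule.append((due, target))
        # Re-queue the target at its own next due time.
        heapq.heappush(queue, (due + intervals[target], target))
    return schedule


# Hypothetical targets: one crawled every 10 s, one every 30 s.
plan = crawl_order({"fast-feed.example": 10, "slow-site.example": 30}, horizon=30)
```

Within the first 30 seconds the fast target is crawled four times (at 0, 10, 20 and 30 s) and the slow target twice (at 0 and 30 s).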
4. Technical Features
Software:
Deep SEARCH 9 platform for advanced web analytics:
• Concurrent targeted crawling
• Content extraction
• Document caching
• Content annotation (RDF-based and via APIs, e.g. Luxid)
• Scheduler for periodic jobs
• Integration of ds9 search and visualization in BI Intranet through API
• News tracker GUI for real-time news monitoring
• Faceted search GUI with RDF-based query term expansion
Hardware:
3-server cluster running ds9, a JDBC database, an RDF triple store and Elasticsearch.
Currently 90 TB disk space.
5. Outlook
• SEARCHCORPORA®
• Setup of more comprehensive SEARCHCORPORA® (startup, news)
• Extending targeted SEARCHCORPORA® (Life Science domain)
• More viewers for data visualisation (results)
• Communication with other third party software via API / webservice
• Integration of Semantic Web Technologies
• Terminology
• RDF import/export
Contact Information
Aleksandar Kapisoda, Research Networking
aleksandar.kapisoda@boehringer-ingelheim.com
Klaus Kater
klaus.kater@deepsearchnine.com
Questions?
Thank You

II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 

II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search

  • 1. Boehringer Ingelheim Pharma GmbH & Co. KG Research Networking - Aleksandar Kapisoda Deep Web Search Deep SEARCH 9 GmbH Klaus Kater
  • 2. Content 1. Intro 2. Search Approach • Public Search Approach • SEARCHCORPUS® Approach 3. Use Cases • SEARCHCORPUS® for life science startups: We find startup information we could not find in public search engines. • Life science news SEARCHCORPUS®: 100s of incoming mails and alerts are processed every day and websites and articles behind the news tags are crawled automatically. 4. Technical Features 5. Outlook
  • 4. 1. Intro 2015 (Deep Web) Search We showed that we can crawl and find content that public search engines do not find.
  • 5. 1. Intro What we did in 2015… 2015 (Deep Web) Search ……2014 …………………….………2015………………….………2016….. During the year we established our internal processes to build targeted SEARCHCORPORA. We built solutions and rolled them out. And we found more than we bargained for.
  • 6. 1. Intro 2016 Deep (Web Search) This year we will talk about a misconception we were confronted with when comparing our SEARCHCORPUS® based search results with search results from public search engines.
  • 7. 2. The Public Search Approach Public Search Misconception Clashing with Incomplete Search Results
  • 8. 2. The Public Search Approach Clashing with Incomplete Search Results Let’s make up a „Weißwurst Misconception“ to make it easier to understand the “Public Search Misconception”. Everybody understands that Weißwurst without Weißwurst mustard is like Fish'n'Chips without Chips.
  • 9. 2. The Public Search Approach Clashing with Incomplete Search Results Web search is like trying to find Weißwurst mustard in a convenience store 1). You will find loads of local and not so local mustards. But if Weißwurst mustard is located in the specialities section, you will only find it by chance or not at all… 1) Not a Bavarian convenience store.
  • 10. 2. The Public Search Approach Clashing with Incomplete Search Results Web search is like trying to find Weißwurst mustard in a convenience store 1). No Weißwurst mustard! So you may believe that the store does not carry Weißwurst mustard at all. 1) Not a Bavarian convenience store.
  • 11. 2. The Public Search Approach Clashing with Incomplete Search Results This store doesn't have Weißwurst mustard… There are two common misperceptions that researchers using public search are trapped in: • If a search has results, we believe that these results are complete. • If a search doesn't have results, we believe there is nothing that can be found. Both perceptions are wrong and represent the Public Search Misconception: We believe that there is nothing to be found, even though the information may be available. We just don’t know where it is and need the right tools to find it.
  • 12. 2. The Public Search Approach Why Results Are Missed An explanation why results are missed Assume we want to monitor startup activities in the area of CRISPR being used in the fight against diabetes type 1: +CRISPR +diabetes type 1
  • 13. 2. The Public Search Approach Why Results Are Missed
  • 14. 2. The Public Search Approach Why Results Are Missed An explanation why results are missed To avoid getting overloaded with biotechnological research papers, we try to tell the search engine that we are interested in +startups.... +CRISPR +diabetes type 1 +startup
  • 15. 2. The Public Search Approach Why Results Are Missed +CRISPR +diabetes type 1 +startup Only documents in which all terms match are returned. These documents are actually on startups, but only if the startups were mentioned in some press release or report.
  • 16. 2. SEARCHCORPUS® Approach Documents are set into context already when the SEARCHCORPUS® is being built.
  • 17. 3. Use Cases Use Case 1 SEARCHCORPUS® for Life Science Startups
  • 18. 3. Use Cases SEARCHCORPUS® for Life Science Startups: Situation: Researchers manually search for startup activities and companies that are active in specific areas of interest. Interests change frequently. Problem: Searching for startups by scientific topics generates an enormous amount of noise that needs to be filtered manually. Approach: Implementation of a startup SEARCHCORPUS® spanning global startup companies. Status: Existing startup SEARCHCORPUS® for targeted search
  • 19. 3. Use Cases SEARCHCORPUS® for life science startups: Google SEARCH results
  • 20. 3. Use Cases SEARCHCORPUS® for life science startups:
  • 21. 3. Use Cases SEARCHCORPUS® for life science startups:
  • 22. 3. Use Cases SEARCHCORPUS® for life science startups:
  • 23. 3. Use Cases SEARCHCORPUS® for life science startups: The startup that was found in the SEARCHCORPUS® Proximity search
  • 24. 3. Use Cases Use Case 2 Life Science News SEARCHCORPUS®
  • 25. 3. Use Cases Life Science News SEARCHCORPUS® Situation: Researchers are manually filtering hundreds of websites, emails and news feeds • News that is not screened immediately is lost Approach: A targeted news SEARCHCORPUS® using periodic targeted crawling and extraction of news from sources used by Boehringer Ingelheim scientists. 1. Tracker is made available to researchers in the corporate Intranet 2. News archive with faceted search using ontology-based query term expansion 3. Search-profile-based email alerting whenever matching news is crawled Status: Existing news SEARCHCORPUS® for targeted search
  • 26. 3. Use Cases Life science news SEARCHCORPUS® • Viewer is updated by the minute; targets could be crawled as frequently as every 10 s. • Crawling frequency and crawling schedule are defined per target.
  • 27. 3. Use Cases Life science news SEARCHCORPUS®
  • 28. 4. Technical Features Software: Deep SEARCH 9 platform for advanced web analytics: • Concurrent targeted crawling • Content extraction • Document caching • Content annotation (RDF-based and via APIs, e.g. Luxid) • Scheduler for periodic jobs • Integration of ds9 search and visualization in BI Intranet through API • News tracker GUI for real-time news monitoring • Faceted search GUI with RDF-based query term expansion Hardware: 3-server cluster running ds9, JDBC database, RDF triple store and Elasticsearch. Currently 90 TB disk space.
  • 29. 5. Outlook • SEARCHCORPORA® • Setup of more comprehensive SEARCHCORPORA® (startup, news) • Extending targeted SEARCHCORPORA® (Life Science domain) • More viewers for data visualisation (results) • Communication with other third-party software via API / web service • Integration of Semantic Web technologies • Terminology • RDF import/export
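
The reasoning in slides 12 to 16 (why an all-terms query misses results, and how context assigned at corpus build time avoids this) can be illustrated with a minimal sketch. It is not the Deep SEARCH 9 implementation: the documents, the names `web_docs` and `startup_corpus`, and the helper `and_search` are hypothetical and invented for illustration only.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def and_search(docs, query):
    """Return ids of documents that contain every query term (all terms must match)."""
    terms = tokens(query)
    return sorted(doc_id for doc_id, text in docs.items() if terms <= tokens(text))


# Hypothetical toy documents, invented for this example.
web_docs = {
    "press_release": "Startup raises funding to use CRISPR against diabetes type 1",
    "company_site": "We develop CRISPR based cell therapies for diabetes type 1",
    "research_paper": "CRISPR screening of beta cells in diabetes type 1",
}

# Assumption: context is assigned when the corpus is built, so only pages
# already known to belong to startups are included.
startup_corpus = {k: web_docs[k] for k in ("press_release", "company_site")}

# Public-search style: the extra term "startup" removes the research paper,
# but also the startup's own website, which never calls itself a startup.
public_hits = and_search(web_docs, "CRISPR diabetes type 1 startup")

# Corpus style: the topic query alone is enough, the context is implicit.
corpus_hits = and_search(startup_corpus, "CRISPR diabetes type 1")

print(public_hits)
print(corpus_hits)
```

In this toy setup the query with the added term returns only the press release, while the topic-only query against the pre-filtered corpus also returns the company's own site.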