© 2020 Deep SEARCH 9 GmbH, https://deepsearchnine.com
Deep SEARCH 9
Using Transformer technology to build an
AI based personal News Rating system
AI-SDV 2020, October 06, Nice, France
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
https://deepsearchnine.com
Postcard from the Black Forest, D
Personal News Rating
This is what I’ll cover in the next 25 minutes
• The goal: automatically retrieve news,
map to channels, rate and distribute
• Transformer Networks / BERT for NLP
• Data feeds
• AI development in the cloud
• Project architecture
• Screenshots
• Next steps
Goal
Automatically retrieve news, map to channels, rate and distribute
Sources (about 120 feeds during the pilot phase):
• Corporate websites: mostly competitors, but also CROs, suppliers, …
• News portals: pharma news portals
• News feeds: news providers, but mostly regulatory; mailing lists, newsletters
• Licensed 3rd party content: Informa
Use some magic:
• Categorize into different “Channels”
• Rate as “important” or “out of scope”
Output: news recipients
AI development
Breakthrough in NN-based NLP: “Attention is all you need” 1)
Transformers: attention-based encoder-decoder neural networks
Sequence in, sequence out, e.g. for translation:
1. The encoder embeds the input terms with context information #1 - #3
2. The decoder uses this context information #1 - #3 (plus its own context information #1’ - #3’) to build the output
This isn’t a scientific lecture on the topic; for an “easy intro” I recommend http://bit.ly/easy-transformer 2)
1) Vaswani et al. (2017). Attention is all you need
2) Alammar, Jay (2018). The Illustrated Transformer
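To illustrate what “embedding input terms with context information” can mean, here is a minimal pure-Python sketch of scaled dot-product self-attention, the core operation described by Vaswani et al. It assumes toy two-dimensional term vectors and identity query/key/value projections (no learned weights), so it is an illustration only and not the model used in the project.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Each output vector is a weighted mix of all input vectors, so every
    term ends up carrying information about its context.
    """
    dim = len(vectors[0])
    contextual = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors))
                 for i in range(dim)]
        contextual.append(mixed)
    return contextual

# Toy embeddings for three input terms (#1 - #3); the values are made up.
terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(terms)
```

A real Transformer learns separate projection matrices for queries, keys and values and runs several attention heads in parallel; the weighting step is the same.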
AI development
BERT: We don’t need an output sequence
Input classification is what we need, forget about the decoder
BERT comes pretrained (if you are interested in a more detailed explanation of how BERT works, I recommend https://bit.ly/easy-bert 1))
Therefore, our problem is not language but “only” the fine-tuning to classify the news
Example: “Researchers are developing and testing a wearable device that can detect the presence of cancer cells in the bloodstream with greater accuracy.” → BERT → Classifier → “Important” or “Out of scope”
1) CodeEmporium (2020). BERT Neural Network - EXPLAINED!
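Without the decoder, what remains is an encoder whose sentence-level output feeds a small classification layer. The sketch below shows only that last step: a linear layer plus softmax over the two classes. It assumes the pretrained encoder has already produced a fixed-size sentence vector; the vector, the weights and the name `classify` are made up for illustration, and in practice fine-tuning would learn the weights from rated news.

```python
import math

LABELS = ["out of scope", "important"]

def classify(sentence_vector, weights, biases):
    """Linear layer plus softmax over the two classes."""
    logits = [sum(w * x for w, x in zip(row, sentence_vector)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

# Stand-in for the encoder output of one news sentence (made-up numbers).
sentence_vector = [0.2, -0.1, 0.7]
# Made-up head parameters; fine-tuning would learn these from rated news.
weights = [[-0.5, 0.3, -1.0],   # row for "out of scope"
           [0.4, -0.2, 1.2]]    # row for "important"
biases = [0.0, 0.0]

label, probability = classify(sentence_vector, weights, biases)
```

The returned probability is the kind of score that can later be used to order news items for display.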

AI development in the cloud
We chose Google Colab for development of the models in Python and for preliminary training
The first set of training data (Excel sheets) was created manually by channel managers, resembling typical user profiles
Create environment for NLP
• It is very difficult to train a classification model when some classes are heavily underrepresented and others are dominant.
• Because no more training data was available, we decided to start with two classes: “important” and “out of scope”
Result: 97% training accuracy, 82% prediction accuracy
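The class imbalance problem and the move to two classes can be sketched as follows. The fine-grained ratings (“low”, “medium”, “high”) and their counts are hypothetical, chosen only to show a skewed distribution; the deck does not name the original classes.

```python
from collections import Counter

def to_two_classes(rating, important_ratings):
    """Map a fine-grained rating onto 'important' or 'out of scope'."""
    return "important" if rating in important_ratings else "out of scope"

def accuracy(predicted, actual):
    """Share of items whose predicted label matches the manual label."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

# Hypothetical fine-grained ratings with a heavily skewed distribution.
ratings = ["low"] * 90 + ["medium"] * 7 + ["high"] * 3
print(Counter(ratings))

labels = [to_two_classes(r, {"medium", "high"}) for r in ratings]
print(Counter(labels))

# A model that always answers "out of scope" already scores 90% here,
# so accuracy should be read together with the class distribution.
baseline = ["out of scope"] * len(labels)
print(accuracy(baseline, labels))
```

Merging rare classes gives the minority side more examples, but a skewed split can still flatter a plain accuracy figure, which is one reason to compare training accuracy with accuracy on unseen news.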
AI development in the cloud
Setup News Tracker (DS9’s “bread and butter”)
Crawl / retrieve news:
• Sources: corporate websites, news portals, news feeds, licensed 3rd party content
• Connectors: crawlers, feed readers, 3rd party APIs
• The connectors deliver through {APIs} into the DS9 Runtime
• Output: unrated news
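As an illustration of the feed-reader step, the sketch below parses an RSS 2.0 document with Python's standard library and turns each item into an unrated news record. The sample feed, the field names and the function `read_feed` are assumptions for this example; how the DS9 Runtime actually reads feeds is not described in the slides.

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example pharma news</title>
  <item><title>Trial results published</title>
        <link>https://example.com/news/1</link></item>
  <item><title>New supplier announced</title>
        <link>https://example.com/news/2</link></item>
</channel></rss>"""

def read_feed(xml_text, source):
    """Extract news items from an RSS 2.0 document as unrated records."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "rating": None,  # unrated until a person or a model rates it
        })
    return items

unrated_news = read_feed(SAMPLE_FEED, source="example-feed")
```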

 

AI development in the cloud
Verifying models manually
[Diagram: unrated and rated news is imported, ratings are predicted with the trained models, and the rated news is exported]
Rating news manually (daily)
[Diagram: corporate websites, news portals, news feeds and licensed 3rd party content are crawled / retrieved by the DS9 Runtime through crawlers, feed readers and 3rd party APIs ({APIs}); the extracted news is stored in the news archive and consumed for manual rating; unrated and rated news is imported to train models, and rated news is exported]
Automating model deployment
We chose Google Colab for development of the models in Python and fine-tuning of BERT
• Share models as Docker image in GitLab repository (the GitLab repository is used for versioning and maintenance of models with the original software stack they were built on)
• Deploy and verify model on Docker image
• Publish model metadata (model metadata is available through a {standardAPI})
• Deploy Docker image with selected model
• Rate news with selected model
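The publish-and-select part of this workflow could look roughly like the sketch below. Everything in it is hypothetical: the metadata fields, the `ModelCatalog` class, the image references, and the rule that only verified models may be published are assumptions made to illustrate the idea of choosing a model by its published metadata.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelMetadata:
    """Hypothetical metadata record published for each deployed model."""
    channel: str
    version: str
    docker_image: str   # image reference in a registry (made up)
    verified: bool

class ModelCatalog:
    """In-memory stand-in for the published model metadata."""

    def __init__(self):
        self._models = []

    def publish(self, metadata):
        # Assumption: only models verified on their Docker image are published.
        if not metadata.verified:
            raise ValueError("verify the model before publishing")
        self._models.append(metadata)

    def select(self, channel):
        """Return the most recently published model for a channel."""
        for metadata in reversed(self._models):
            if metadata.channel == channel:
                return metadata
        raise KeyError(channel)

catalog = ModelCatalog()
catalog.publish(ModelMetadata("oncology", "1.0",
                              "registry.example.com/models/oncology:1.0", True))
catalog.publish(ModelMetadata("oncology", "1.1",
                              "registry.example.com/models/oncology:1.1", True))
selected = catalog.select("oncology")
```

Keeping the image reference in the metadata ties each model version to the software stack it was built on, which is the point of shipping models as Docker images.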
Final project architecture
• Crawl / retrieve news: corporate websites, news portals, news feeds and licensed 3rd party content are collected by the DS9 Runtime through crawlers, feed readers and 3rd party APIs ({APIs})
• Extracted news is stored in the news archive as (un)rated news, where it is consumed
• Request news rating for selected model; return news rated with selected model ({standardAPI}, model metadata)
• Deploy Docker image with selected model; rate news with selected model
• Share models as Docker image in GitLab repository (the GitLab repository is used for versioning and maintenance of models with the original software stack they were built on)
• Deploy and verify model on Docker image; publish model metadata
• Optimization of models, retraining, model rating
Training new channels
• Define new channel
• Select sources
• Manually rate news to create training data (manual rating of extracted news in the news archive)
• Develop model for new channel, optimize, test and deploy for production (reads the (un)rated news as training data)
• Share models as Docker image in GitLab repository (the GitLab repository is used for versioning and maintenance of models with the original software stack they were built on)
• Deploy and verify model on Docker image
• Publish model metadata
News retrieval is unchanged: corporate websites, news portals, news feeds and licensed 3rd party content are crawled / retrieved by the DS9 Runtime through crawlers, feed readers and 3rd party APIs ({APIs})
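The step from manual ratings to training data can be sketched as below: read the rated items and hold some of them back for testing. The CSV text stands in for an export of the manually rated news; the column names, the example rows and the split ratio are assumptions.

```python
import csv
import io
import random

# Stand-in for an export of manually rated news (columns are assumptions).
RATED_CSV = """title,rating
Trial results published,important
Office relocation announced,out of scope
New regulatory guidance issued,important
Company picnic photos,out of scope
Supplier signs long-term contract,important
Quarterly newsletter archive,out of scope
"""

def load_rated_news(text):
    """Read (title, rating) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["title"], row["rating"]) for row in reader]

def split(examples, test_share, seed):
    """Shuffle reproducibly and hold out a share for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_share))
    return shuffled[:cut], shuffled[cut:]

examples = load_rated_news(RATED_CSV)
train, test = split(examples, test_share=0.33, seed=42)
```

Holding out rated items that the model never sees during training is what makes a prediction accuracy figure comparable to the training accuracy.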
Goal
Automatically retrieve news, map to channels, rate and distribute
✓
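The four stages of the goal (retrieve, map to channels, rate, distribute) can be sketched end to end as follows. This is a toy under stated assumptions: the `rate` function is a keyword rule standing in for the deployed model, channels are assigned by source, only “important” items are distributed, and all names and data are invented.

```python
def rate(item):
    """Stand-in for the deployed model: a keyword rule, not a trained model."""
    keywords = ("trial", "approval", "regulatory")
    hit = any(k in item["title"].lower() for k in keywords)
    return "important" if hit else "out of scope"

def run_pipeline(retrieved, channel_of, subscribers):
    """Map each item to a channel, rate it, and queue it per recipient."""
    outbox = {}
    for item in retrieved:
        channel = channel_of(item)
        rated = dict(item, channel=channel, rating=rate(item))
        if rated["rating"] != "important":
            continue  # assumption: out-of-scope news is not distributed
        for recipient in subscribers.get(channel, []):
            outbox.setdefault(recipient, []).append(rated)
    return outbox

retrieved = [
    {"title": "Phase III trial meets endpoint", "source": "competitor-site"},
    {"title": "Office relocation announced", "source": "competitor-site"},
    {"title": "New regulatory guidance issued", "source": "agency-feed"},
]
channels_by_source = {"competitor-site": "competitors",
                      "agency-feed": "regulatory"}
subscribers = {"competitors": ["alice"], "regulatory": ["alice", "bob"]}

outbox = run_pipeline(retrieved,
                      lambda item: channels_by_source[item["source"]],
                      subscribers)
```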
User interface
• Subscribe to published news channels or build personal channels
• Filter on news sources or use tagged content for drilling down
• AI-based news rating: tagged with high or low importance
• Manually override machine rating to set rating to high / low
• Package for redistribution
• Search the archive
• Link-outs to original source
• Ordered by prediction probability
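Two of these behaviours, ordering by prediction probability and letting a manual rating override the machine rating, can be sketched as below. The `RatedNews` record, its field names and the 0.5 threshold are assumptions for illustration; the slides do not specify how the interface stores ratings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RatedNews:
    title: str
    probability_high: float              # model's probability for "high"
    manual_rating: Optional[str] = None  # "high" / "low" if a user overrode it

    @property
    def rating(self):
        """A manual override wins over the machine rating."""
        if self.manual_rating is not None:
            return self.manual_rating
        # Assumed threshold; the real cut-off is not stated in the slides.
        return "high" if self.probability_high >= 0.5 else "low"

def order_for_display(news):
    """Sort by prediction probability, most likely important first."""
    return sorted(news, key=lambda n: n.probability_high, reverse=True)

news = [
    RatedNews("Supplier signs contract", 0.41),
    RatedNews("Trial results published", 0.93),
    RatedNews("Office relocation announced", 0.08, manual_rating="high"),
]
ordered = order_for_display(news)
```

Storing the override separately from the model output keeps the user feedback available as additional training data.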
Next step
Let users maintain and publish their own channels
Channel management
• Clone and customize an existing channel or create a new channel
• If the channel supports AI-based rating, retrain the deep learning model based on user feedback
• Select only the most interesting sources
• Filter news based on (complex Lucene) queries
• Users with appropriate permissions are entitled to publish new channels for subscription by others; alternatively, publication requests need to be authorized
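A rough sketch of cloning a channel, narrowing its sources and publishing it subject to permissions follows. The `Channel` record, the function names and the permission check are hypothetical; the query is simply stored as a string in Lucene query syntax and is not evaluated here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Channel:
    name: str
    owner: str
    sources: tuple = ()
    query: str = ""          # e.g. a Lucene query string, stored as-is
    published: bool = False

def clone_channel(channel, new_name, new_owner, keep_sources=None):
    """Copy a channel, optionally keeping only the most interesting sources."""
    sources = channel.sources
    if keep_sources is not None:
        sources = tuple(s for s in sources if s in keep_sources)
    return replace(channel, name=new_name, owner=new_owner,
                   sources=sources, published=False)

def publish(channel, user, publishers):
    """Publish directly if permitted, otherwise flag for authorization."""
    if user in publishers:
        return replace(channel, published=True), "published"
    return channel, "awaiting authorization"

base = Channel("Oncology", "admin", ("site-a", "feed-b", "feed-c"),
               'title:"clinical trial" AND NOT title:webinar', True)
mine = clone_channel(base, "My oncology", "user-1",
                     keep_sources={"site-a", "feed-c"})
mine_after, status = publish(mine, "user-1", publishers={"admin"})
```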
Deep SEARCH 9
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
https://deepsearchnine.com
Thank you for your interest and attention!
Deep SEARCH 9 would love to meet you during AI-SDV 2020 to catch up and to let you know what Deep SEARCH 9 has been up to since we last met!
We have two virtual rooms, “Nice” and “Lyon”, where you can meet us
If you’d just like to pop around to see us on our stand at any time during the breaks, please go to http://bit.ly/AI-SDV-2020-Room-Nice to meet us in our virtual room “Nice”
If you’d like to book a 15-minute one-to-one slot, please go to http://bit.ly/meet_DS9_at_AI-SDV to book a time in our virtual room “Lyon”

Weitere ähnliche Inhalte

Was ist angesagt?

ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...
ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...
ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...
Dr. Haxel Consult
 
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionAI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
Dr. Haxel Consult
 
ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...
ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...
ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...
Dr. Haxel Consult
 
II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...
II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...
II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...
Dr. Haxel Consult
 

Was ist angesagt? (20)

Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
 
IC-SDV 2018: Search Technology / VanatagePoint
IC-SDV 2018: Search Technology / VanatagePointIC-SDV 2018: Search Technology / VanatagePoint
IC-SDV 2018: Search Technology / VanatagePoint
 
II-SDV 2017: Centredoc
II-SDV 2017: CentredocII-SDV 2017: Centredoc
II-SDV 2017: Centredoc
 
II-SDV 2017: How Visualisation of Open Patent Data can help with Strategic De...
II-SDV 2017: How Visualisation of Open Patent Data can help with Strategic De...II-SDV 2017: How Visualisation of Open Patent Data can help with Strategic De...
II-SDV 2017: How Visualisation of Open Patent Data can help with Strategic De...
 
II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi...
II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi...II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi...
II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi...
 
AI-SDV 2020: IPscreener
AI-SDV 2020: IPscreenerAI-SDV 2020: IPscreener
AI-SDV 2020: IPscreener
 
ICIC 2013 Conference Proceedings Tony Trippe Patinformatics
ICIC 2013 Conference Proceedings Tony Trippe PatinformaticsICIC 2013 Conference Proceedings Tony Trippe Patinformatics
ICIC 2013 Conference Proceedings Tony Trippe Patinformatics
 
II-PIC 2017: Porduct presentation minesoft
II-PIC 2017: Porduct presentation minesoftII-PIC 2017: Porduct presentation minesoft
II-PIC 2017: Porduct presentation minesoft
 
II-SV 2017: How to effectively monitor Technological Developments in IP
II-SV 2017: How to effectively monitor Technological Developments in IPII-SV 2017: How to effectively monitor Technological Developments in IP
II-SV 2017: How to effectively monitor Technological Developments in IP
 
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
 
II-SDV 2017: Gridlogics Technologies
II-SDV 2017: Gridlogics TechnologiesII-SDV 2017: Gridlogics Technologies
II-SDV 2017: Gridlogics Technologies
 
ICIC 2017: Technology Scouting: Decision Support in Strategic Analyses for Te...
ICIC 2017: Technology Scouting: Decision Support in Strategic Analyses for Te...ICIC 2017: Technology Scouting: Decision Support in Strategic Analyses for Te...
ICIC 2017: Technology Scouting: Decision Support in Strategic Analyses for Te...
 
II-SDV 2017: What is Innovation and how can we measure it?
II-SDV 2017: What is Innovation and how can we measure it?II-SDV 2017: What is Innovation and how can we measure it?
II-SDV 2017: What is Innovation and how can we measure it?
 
ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...
ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...
ICIC 2017: The Use of Patent Information for Innovation and Competitive Intel...
 
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionAI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
 
II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...
 
II-PIC 2017: Patent Information User Group PIUG
II-PIC 2017: Patent Information User Group PIUGII-PIC 2017: Patent Information User Group PIUG
II-PIC 2017: Patent Information User Group PIUG
 
II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...
 
ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...
ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...
ICIC 2014 Valuing IP in the Chemical Space – Science, Art and Special Conside...
 
II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...
II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...
II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (K...
 

Ähnlich wie AI-SDV 2020: Using Transformer technology to build an AI based personal News Rating system Klaus Kater (Deep SEARCH 9, Germany )

IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
Adrian Turcu
 

Ähnlich wie AI-SDV 2020: Using Transformer technology to build an AI based personal News Rating system Klaus Kater (Deep SEARCH 9, Germany ) (20)

Final create an app to perform intelligent searched on your data
Final  create an app to perform intelligent searched on your dataFinal  create an app to perform intelligent searched on your data
Final create an app to perform intelligent searched on your data
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
AI-SDV 2020: Seea SEARCH 9
AI-SDV 2020: Seea SEARCH 9AI-SDV 2020: Seea SEARCH 9
AI-SDV 2020: Seea SEARCH 9
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
An Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech CompanyAn Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech Company
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
How to build containerized architectures for deep learning - Data Festival 20...
How to build containerized architectures for deep learning - Data Festival 20...How to build containerized architectures for deep learning - Data Festival 20...
How to build containerized architectures for deep learning - Data Festival 20...
 
How open source empowers startups to start big, with case Double Open Oy
How open source empowers startups to start big, with case Double Open OyHow open source empowers startups to start big, with case Double Open Oy
How open source empowers startups to start big, with case Double Open Oy
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Automatisierte Provisionierung einer Data Lab Umgebung für Data Scientists
Automatisierte Provisionierung einer Data Lab Umgebung für Data ScientistsAutomatisierte Provisionierung einer Data Lab Umgebung für Data Scientists
Automatisierte Provisionierung einer Data Lab Umgebung für Data Scientists
 
Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...
Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...
Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...
 
Tech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @CriteoTech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @Criteo
 
SnapLogic At Tableau Conference - Sept 2013 #tcc13
SnapLogic At Tableau Conference - Sept 2013 #tcc13SnapLogic At Tableau Conference - Sept 2013 #tcc13
SnapLogic At Tableau Conference - Sept 2013 #tcc13
 
How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into production
 
Media and Entertainment Industry Analysis
Media and Entertainment Industry AnalysisMedia and Entertainment Industry Analysis
Media and Entertainment Industry Analysis
 
ICIC 2016: New Product Introduction Deep SEARCH 9
ICIC 2016: New Product Introduction Deep SEARCH 9ICIC 2016: New Product Introduction Deep SEARCH 9
ICIC 2016: New Product Introduction Deep SEARCH 9
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 

Mehr von Dr. Haxel Consult

AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
Dr. Haxel Consult
 

Mehr von Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Kürzlich hochgeladen

Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Monica Sydney
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
ydyuyu
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
ayvbos
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Monica Sydney
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
JOHNBEBONYAP1
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
Asmae Rabhi
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
pxcywzqs
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Monica Sydney
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 

Kürzlich hochgeladen (20)

APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf

AI-SDV 2020: Using Transformer technology to build an AI-based personal News Rating system, Klaus Kater (Deep SEARCH 9, Germany)

  • 1. © 2020 Deep SEARCH 9 GmbH, https://deepsearchnine.com. Deep SEARCH 9: Using Transformer technology to build an AI-based personal News Rating system. AI-SDV 2020, October 06, Nice, France. Klaus Kater, Managing Partner, Deep SEARCH 9 GmbH, https://deepsearchnine.com
  • 2. Postcard from the Black Forest, Germany
  • 3. Personal News Rating. This is what I’ll cover in the next 25 minutes: the goal (automatically retrieve news, map to channels, rate and distribute); Transformer Networks / BERT for NLP; data feeds; AI development in the cloud; project architecture; screenshots; next steps.
  • 4. Goal: automatically retrieve news, map to channels, rate and distribute. Sources (about 120 feeds during the pilot phase): corporate websites (mostly competitors, but also CROs, suppliers, …); news portals (pharma news portals); news feeds (news providers, but mostly regulatory; mailing lists, newsletters); licensed 3rd party content (Informa). Processing (“use some magic”): categorize into different “Channels”, rate as “important” or “out of scope”, then distribute to the news recipients.
  • 5. AI development: breakthrough in NN-based NLP. “Attention is all you need” 1). Transformers: attention-based encoder-decoder neural networks. Sequence in, sequence out, e.g. for translation: 1. The encoder embeds input terms with context information #1 - #3. 2. The decoder uses this context information #1 - #3 (plus its own context information #1’ - #3’) to build the output. This isn’t a scientific lecture on the topic; for an “easy intro” I recommend: http://bit.ly/easy-transformer 2). References: 1) Vaswani et al. (2017). Attention is all you need. 2) Alammar, Jay (2018). The Illustrated Transformer.
  • 6. AI development. BERT: we don’t need an output sequence. Input classification is what we need, forget about the decoder. BERT comes pretrained (if you are interested in a more detailed explanation of how BERT works, I recommend https://bit.ly/easy-bert 1)). Therefore, our problem is not language but “only” the fine-tuning to classify the news. Example input: “Researchers are developing and testing a wearable device that can detect the presence of cancer cells in the bloodstream with greater accuracy.” This passes through BERT and a classifier, which outputs either “Important” or “Out of scope”. Reference: 1) CodeEmporium (2020). BERT Neural Network - EXPLAINED!
  • 7. AI development in the cloud. We chose Google Colab for development of the models in Python and for preliminary training.
  • 8. AI development in the cloud. We chose Google Colab for development of the models in Python and for preliminary training. Create environment for NLP. The first set of training data (Excel sheets) was manually created by channel managers resembling typical user profiles. It is very difficult to train a classification model when some classes are so underrepresented and others so dominant. Because no more training data was available, we decided to start with two classes: “important” and “out of scope”. Result: 97% training accuracy, 82% prediction accuracy.
  • 9. AI development in the cloud. Set up News Tracker (DS9’s “bread and butter”). Diagram: corporate websites, news portals, news feeds and licensed 3rd party content are crawled/retrieved by crawlers, feed readers and 3rd party APIs {APIs}; the DS9 Runtime {APIs} holds the unrated news.
  • 10. AI development in the cloud. Predict with trained models: import unrated and rated news, export rated news. Verifying models manually.
  • 11. Rating news manually (daily). Train models: import unrated and rated news, export rated news. Diagram: corporate websites, news portals, news feeds and licensed 3rd party content are crawled/retrieved by crawlers, feed readers and 3rd party APIs {APIs}; extracted news goes to the news archive, which the DS9 Runtime {APIs} consumes.
  • 12. Automating model deployment. We chose Google Colab for development of the models in Python and fine-tuning of BERT. Steps shown: share models as Docker image in GitLab repository; deploy and verify model on Docker image; publish model metadata; deploy Docker image with selected model; rate news with selected model through the {standard API}. The GitLab repository is used for versioning and maintenance of models with the original software stack they were built on.
  • 13. Final project architecture. Diagram: corporate websites, news portals, news feeds and licensed 3rd party content are crawled/retrieved by crawlers, feed readers and 3rd party APIs {APIs}; extracted news goes to the news archive of (un)rated news, which the DS9 Runtime {APIs} consumes. Through the {standard API}: request news rating for selected model; return news rated with selected model. Model side: share models as Docker image in GitLab repository; deploy and verify model on Docker image; publish model metadata; deploy Docker image with selected model; rate news with selected model; optimization of models, retraining (model rating). The GitLab repository is used for versioning and maintenance of models with the original software stack they were built on.
  • 14. Training new channels. Steps shown: define new channel; select sources; manually rate news to create training data; develop model for new channel, optimize, test and deploy for production; share models as Docker image in GitLab repository; deploy and verify model on Docker image; publish model metadata. Diagram: corporate websites, news portals, news feeds and licensed 3rd party content are crawled/retrieved by crawlers, feed readers and 3rd party APIs {APIs}; extracted news goes to the news archive of (un)rated news; manual rating; read training data; DS9 Runtime {APIs}; model metadata. The GitLab repository is used for versioning and maintenance of models with the original software stack they were built on.
  • 15. Goal: automatically retrieve news, map to channels, rate and distribute ✓
  • 16. User interface. Subscribe to published news channels or build personal channels. Filter on news sources or use tagged content for drilling down. AI-based news rating: tagged with high or low importance, ordered by prediction probability. Manually override the machine rating to set the rating to high / low. Package for redistribution. Search the archive. Link-outs to original source.
  • 17. Next step: let users maintain and publish their own channels.
  • 18. Channel management. Clone and customize an existing channel or create a new channel. If the channel supports AI-based rating, retrain the Deep Learning model based on user feedback. Select only the most interesting sources. Filter news based on (complex Lucene) queries. Users with appropriate permissions are entitled to publish new channels for subscription by others; alternatively, publication requests need to be authorized.
  • 19. Thank you for your interest and attention! Klaus Kater, Managing Partner, Deep SEARCH 9 GmbH, https://deepsearchnine.com. Deep SEARCH 9 would love to meet you during AI-SDV 2020, to catch up and to let you know what Deep SEARCH 9 has been up to since we last met! We have two virtual rooms, “Nice” and “Lyon”, where you can meet us. If you’d just like to pop around to see us on our stand at any time during the breaks, please go to http://bit.ly/AI-SDV-2020-Room-Nice to meet us in our virtual room “Nice”. If you’d like to book a 15-minute one-to-one slot, please go to http://bit.ly/meet_DS9_at_AI-SDV to book a time in our virtual room “Lyon”.
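
The user-interface slide describes news being tagged with high or low importance, ordered by prediction probability, and open to manual override of the machine rating. The following is a minimal Python sketch of that ranking logic, not the DS9 implementation. The names `RatedNews`, `rank_channel` and `p_important`, the 0.5 threshold, and the sample titles and probabilities are all assumptions made up for illustration; in the described system the probability would come from the fine-tuned BERT classifier.

```python
from dataclasses import dataclass
from typing import List, Optional

IMPORTANT = "important"
OUT_OF_SCOPE = "out of scope"


@dataclass
class RatedNews:
    """One news item plus its rating (hypothetical structure)."""
    title: str
    p_important: float                  # assumed classifier probability in [0, 1]
    manual_label: Optional[str] = None  # set when a user overrides the machine rating

    def machine_label(self, threshold: float = 0.5) -> str:
        # The threshold is an assumption; the slides do not state one.
        return IMPORTANT if self.p_important >= threshold else OUT_OF_SCOPE

    def label(self, threshold: float = 0.5) -> str:
        # A manual override always wins over the machine rating.
        return self.manual_label or self.machine_label(threshold)


def rank_channel(items: List[RatedNews], threshold: float = 0.5) -> List[RatedNews]:
    """Important items first; within each group, highest probability first."""
    return sorted(
        items,
        key=lambda n: (n.label(threshold) != IMPORTANT, -n.p_important),
    )


# Made-up sample data, for illustration only.
news = [
    RatedNews("Wearable device detects cancer cells", 0.91),
    RatedNews("Quarterly canteen menu", 0.08),
    RatedNews("Regulatory update", 0.35, manual_label=IMPORTANT),
    RatedNews("New CRO partnership", 0.62),
]

ranked = rank_channel(news)
for item in ranked:
    print(f"{item.label():<12} {item.p_important:.2f}  {item.title}")
```

Keeping the manual label separate from the model probability means an override changes which group an item lands in without discarding the original prediction, which could later serve as feedback for retraining.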