AI-SDV 2020: Using Transformer technology to build an AI based personal News Rating system
Klaus Kater (Deep SEARCH 9, Germany)
1. © 2020 Deep SEARCH 9 GmbH, https://deepsearchnine.com
Deep SEARCH 9
Using Transformer technology to build an
AI based personal News Rating system
AI-SDV 2020, October 06, Nice, France
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
https://deepsearchnine.com
2.
Postcard from the Black Forest, Germany
3.
Personal News Rating
This is what I’ll cover in the next 25 minutes
• The goal: automatically retrieve news,
map to channels, rate and distribute
• Transformer Networks / BERT for NLP
• Data feeds
• AI development in the cloud
• Project architecture
• Screenshots
• Next steps
4.
Goal
Automatically retrieve news, map to channels, rate and distribute
Sources: corporate websites, news portals, news feeds, licensed 3rd party content
• Mostly competitors, but also CROs, suppliers, …
• Pharma news portals
• News providers, but mostly regulatory
• Mailing lists, newsletters
• Informa
About 120 feeds during the pilot phase
Use some magic: categorize into different “Channels”, rate as “important” or “out of scope”
News recipients
5.
AI development
Breakthrough in NN-based NLP: “Attention is all you need” 1)
Transformers: attention-based encoder-decoder neural networks
This isn’t a scientific lecture on the topic; for an “easy intro” I recommend http://bit.ly/easy-transformer 2)
1) Vaswani et al. (2017). Attention Is All You Need. 2) Alammar, Jay (2018). The Illustrated Transformer.
1. Encoder embeds input terms with context information #1 - #3
2. The decoder uses this context information #1 - #3
(plus its own context information #1’ - #3’) to build the output
Sequence in – sequence out, e.g. for translation:
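The attention step that gives each term its context information can be sketched in a few lines. This is a minimal, pure-Python illustration of scaled dot-product attention as defined by Vaswani et al.; it leaves out the learned projection matrices and the multiple heads of a real Transformer, and the toy vectors are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Three toy vectors standing in for embedded input terms #1 - #3 (made up).
terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(terms, terms, terms)  # self-attention: Q = K = V
```

Each row of `contextual` is a blend of all three input vectors, which is the sense in which a term is embedded "with context information".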
6.
AI development
BERT: We don’t need an output sequence
Input classification is what we need, forget about the decoder
BERT comes pretrained (if you are interested in a more detailed explanation of how BERT works, I recommend https://bit.ly/easy-bert 1))
Therefore, our problem is not language but “only” the fine-tuning to classify the news
1) CodeEmporium (2020). BERT Neural Network - EXPLAINED!
“Researchers are developing and testing a wearable device that can detect the presence of cancer cells in the bloodstream with greater accuracy.”
Diagram: news text, BERT, classifier, result “Important” or “Out of scope”
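The classification step on top of BERT can be sketched as a linear layer followed by a softmax over two classes. The sketch below is an assumption-laden illustration, not the project's code: the sentence vector, the weights and the bias are invented numbers, whereas in practice the vector would come from the pretrained BERT encoder and the head parameters would be learned during fine-tuning.

```python
import math

LABELS = ["out of scope", "important"]

def classify(pooled, weights, bias):
    """Linear layer plus softmax over a pooled sentence vector."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

# Hypothetical 4-dimensional sentence vector (BERT-base vectors have 768 dimensions).
pooled = [0.2, -0.1, 0.7, 0.4]
# Hypothetical head parameters; fine-tuning would learn these from rated news.
weights = [[-0.5, 0.3, -0.8, 0.1],
           [0.6, -0.2, 0.9, 0.3]]
bias = [0.0, 0.0]

label, probability = classify(pooled, weights, bias)
```

With these invented numbers the second class wins, so `label` is "important"; the returned probability is what a rating system could later use for ordering.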
7.
AI development in the cloud
We chose Google Colab for development of the models in Python and for preliminary training
8.
AI development in the cloud
We chose Google Colab for development of the models in Python and for preliminary training
The first set of training data (Excel sheets) was created manually by channel managers who resemble typical user profiles
Create environment for NLP
• It is very difficult to train a classification model when some classes are heavily underrepresented and others dominate.
• Because no more training data was available, we decided to start with two classes: “important” and “out of scope”
97% training accuracy
82% prediction accuracy
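A small sketch may help to see why collapsing to two classes eases the imbalance, and how an accuracy figure is computed. The label names, the counts and the mapping rule (everything that is not "out of scope" counts as "important") are assumptions made for illustration; they are not the project's actual data.

```python
from collections import Counter

def collapse(label):
    """Map any fine-grained label to the two classes used in the pilot."""
    # Assumption: every label other than "out of scope" counts as "important".
    return "out of scope" if label == "out of scope" else "important"

def accuracy(predicted, actual):
    """Share of items where the predicted label matches the manual rating."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical manual ratings with one dominant class and several rare ones.
ratings = ["out of scope"] * 6 + ["competitor"] * 2 + ["regulatory", "supplier"]
print(Counter(ratings))   # multi-class view: heavily skewed
binary = [collapse(r) for r in ratings]
print(Counter(binary))    # two-class view: closer to balanced

# Hypothetical model output for the same ten items.
predicted = ["out of scope"] * 5 + ["important"] * 5
print(accuracy(predicted, binary))
```

Running the same `accuracy` function once on the items used for training and once on items the model has not seen is one common way to arrive at two figures like the ones above; a clearly lower value on unseen items usually indicates that the model has partly memorized its training data.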
9.
AI development in the cloud
Set up News Tracker (DS9’s “bread and butter”)
Sources: corporate websites, news portals, news feeds, licensed 3rd party content
Retrieval: crawlers, feed readers, 3rd party APIs
DS9 Runtime {APIs}: crawl/retrieve news, unrated news
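What a feed reader does can be sketched with the Python standard library. The RSS snippet, the source name and the record fields below are invented for the example; the DS9 Runtime's own readers and data model are not described in the slides.

```python
import xml.etree.ElementTree as ET

# Made-up RSS document standing in for one of the news feeds.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example pharma feed</title>
  <item><title>Trial results published</title><link>https://example.com/a</link></item>
  <item><title>New supplier announced</title><link>https://example.com/b</link></item>
</channel></rss>"""

def read_feed(xml_text, source):
    """Extract title and link of each RSS item as an unrated news record."""
    root = ET.fromstring(xml_text)
    news = []
    for item in root.iter("item"):
        news.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "rating": None,  # unrated until a model or a person rates it
        })
    return news

unrated = read_feed(SAMPLE_RSS, source="example-feed")
```

Every record leaves the reader with `rating` set to `None`, which corresponds to the "unrated news" in the diagram.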
10.
AI development in the cloud
Verifying models manually
Predict with trained models
Diagram labels: unrated and rated news, rated news, import, export
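The import, predict and export round trip can be sketched as follows. CSV is used here only because it needs no extra library; the column names are assumptions, and `predict` is a keyword placeholder standing in for the trained model so that the sketch runs end to end.

```python
import csv
import io

def predict(title):
    """Placeholder for the trained model; returns (label, probability)."""
    # Hypothetical keyword rule, used only so the sketch is runnable.
    hit = "trial" in title.lower()
    return ("important", 0.9) if hit else ("out of scope", 0.8)

def rate_export(csv_text):
    """Read unrated rows, add the predicted rating, return CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["id", "title", "rating", "probability"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        label, prob = predict(row["title"])
        writer.writerow({"id": row["id"], "title": row["title"],
                         "rating": label, "probability": prob})
    return out.getvalue()

exported = rate_export("id,title\n1,Phase III trial started\n2,Office party photos\n")
```

The exported text can then be reviewed line by line, which is the kind of manual verification the slide refers to.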
11.
Rating news manually (daily)
Train models
Diagram labels: unrated and rated news, rated news, import, export
Sources: corporate websites, news portals, news feeds, licensed 3rd party content
Retrieval: crawlers, feed readers, 3rd party APIs
DS9 Runtime {APIs}: crawl/retrieve news, extracted news, news archive, consume
12.
Automating model deployment
We chose Google Colab for development of the models in Python and fine-tuning of BERT
• Share models as Docker image in GitLab repository
• Deploy and verify model on Docker image
• Publish model metadata
• Deploy Docker image with selected model
• Rate news with selected model
GitLab repository used for versioning and maintenance of models with the original software stack they were built on
Model metadata {standardAPI}
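Published model metadata might look roughly like the sketch below. All field names, version strings and image paths are hypothetical; the slides only state that metadata is published and that models are shared as Docker images in a GitLab repository.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ModelMetadata:
    # All field names are illustrative assumptions.
    channel: str
    version: str
    docker_image: str
    labels: list

def publish(models):
    """Serialize metadata so a runtime could list the available models."""
    return json.dumps([asdict(m) for m in models], indent=2)

def select(models, channel):
    """Pick the newest version registered for a channel."""
    candidates = [m for m in models if m.channel == channel]
    return max(candidates,
               key=lambda m: tuple(int(p) for p in m.version.split(".")))

# Hypothetical registry entries; the image paths are placeholders.
registry = [
    ModelMetadata("competitors", "1.0.0",
                  "registry.example.com/models/competitors:1.0.0",
                  ["important", "out of scope"]),
    ModelMetadata("competitors", "1.2.0",
                  "registry.example.com/models/competitors:1.2.0",
                  ["important", "out of scope"]),
]
chosen = select(registry, "competitors")
```

Keeping the image reference next to the version is one way to make sure a model is always run on the software stack it was built with.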
13.
Final project architecture
Sources: corporate websites, news portals, news feeds, licensed 3rd party content
Retrieval: crawlers, feed readers, 3rd party APIs
DS9 Runtime {APIs}: crawl/retrieve news, extracted news, news archive with (un)rated news, consume
Rating via {standardAPI}: request news rating for selected model, rate news with selected model, return news rated with selected model
Model lifecycle: share models as Docker image in GitLab repository, deploy and verify model on Docker image, publish model metadata, deploy Docker image with selected model
Optimization of models, retraining; model rating
Model metadata
GitLab repository used for versioning and maintenance of models with the original software stack they were built on
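The request and response pair between the runtime and a selected model can be sketched as a simple dispatch function. The model name, the record fields and the stand-in model are assumptions; the real {standardAPI} is not specified in the slides.

```python
def rate_with_model(model_name, models, news_items):
    """Route a rating request to the selected model and return rated news."""
    if model_name not in models:
        raise KeyError(f"unknown model: {model_name}")
    model = models[model_name]
    rated = []
    for item in news_items:
        label, probability = model(item["title"])
        # Copy the item and attach the rating plus the model that produced it.
        rated.append({**item, "rating": label,
                      "probability": probability, "model": model_name})
    return rated

# Hypothetical stand-in for a model running in a deployed container.
def competitor_model(title):
    if "acquires" in title.lower():
        return ("important", 0.91)
    return ("out of scope", 0.77)

models = {"competitors-v1": competitor_model}
response = rate_with_model(
    "competitors-v1", models, [{"id": 1, "title": "Acme acquires Beta"}]
)
```

Recording the model name with each rating keeps it traceable which model version produced a given result after retraining.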
14.
Training new channels
• Define new channel
• Select sources
• Manually rate news to create training data
• Develop model for new channel, optimize, test and deploy for production
• Share models as Docker image in GitLab repository
• Deploy and verify model on Docker image
• Publish model metadata
Sources: corporate websites, news portals, news feeds, licensed 3rd party content
Retrieval: crawlers, feed readers, 3rd party APIs
DS9 Runtime {APIs}: crawl/retrieve news, extracted news, news archive, manual rating, (un)rated news, read training data
Model metadata
GitLab repository used for versioning and maintenance of models with the original software stack they were built on
15.
Goal
Automatically retrieve news, map to channels, rate and distribute
✓
16.
User interface
Subscribe to published news channels or build personal channels
Filter on news sources or use tagged content for drilling down
AI-based news rating: tagged with high or low importance
Manually override machine rating to set rating to high / low
Ordered by prediction probability
Package for redistribution
Search the archive
Link-outs to original source
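Ordering by prediction probability and the manual override can be sketched like this. The field names and the descending sort order are assumptions; the slide only says that news is ordered by prediction probability and that users can override the machine rating.

```python
def effective_rating(item):
    """A manual override, when present, wins over the machine rating."""
    return item.get("override") or item["rating"]

def order_for_display(items):
    """Highest prediction probability first (assumed sort direction)."""
    return sorted(items, key=lambda item: item["probability"], reverse=True)

# Made-up news items; item C carries a manual override.
news = [
    {"title": "A", "rating": "low", "probability": 0.62},
    {"title": "B", "rating": "high", "probability": 0.97},
    {"title": "C", "rating": "high", "probability": 0.81, "override": "low"},
]
ordered = order_for_display(news)
shown = [(item["title"], effective_rating(item)) for item in ordered]
```

Storing the override next to the machine rating, rather than overwriting it, keeps both values available as feedback for later retraining.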
17.
Next step
Let users maintain and publish their own channels
18.
Channel management
• Clone and customize an existing channel or create a new channel
• If the channel supports AI-based rating, retrain the deep learning model based on user feedback
• Select only the most interesting sources
• Filter news based on (complex Lucene) queries
• Users with appropriate permissions are entitled to publish new channels for subscription by others; otherwise, publication requests need to be authorized
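Cloning a channel and the publication rule can be sketched as below. The `Channel` fields, the permission name "publish" and the example query are hypothetical, and the query is stored as plain text without being parsed or run against Lucene.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Channel:
    # Field names are illustrative assumptions.
    name: str
    owner: str
    sources: list = field(default_factory=list)
    query: str = ""          # stored as text; not parsed in this sketch
    published: bool = False

def clone_channel(channel, new_name, new_owner):
    """Copy a channel so the new owner can customize it independently."""
    clone = copy.deepcopy(channel)
    clone.name, clone.owner, clone.published = new_name, new_owner, False
    return clone

def request_publication(channel, user_permissions):
    """Publish directly with permission, otherwise wait for authorization."""
    if "publish" in user_permissions:
        channel.published = True
        return "published"
    return "pending authorization"

base = Channel("Competitors", "admin", ["feed-a", "feed-b"],
               'title:"phase III" AND company:acme', True)
mine = clone_channel(base, "My competitors", "user-1")
mine.sources.remove("feed-b")   # select only the most interesting sources
status = request_publication(mine, user_permissions={"read"})
```

The deep copy matters here: removing a source from the clone must not change the channel it was cloned from.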
19.
Deep SEARCH 9
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
https://deepsearchnine.com
Thank you for your interest and attention!
Deep SEARCH 9 would love to meet you during AI-SDV 2020 to catch up and to let you know what Deep SEARCH 9 has been up to since we last met!
We have two virtual rooms, “Nice” and “Lyon”, where you can meet us.
If you’d just like to pop around to see us on our stand at any time during the breaks, please go to http://bit.ly/AI-SDV-2020-Room-Nice to meet us in our virtual room “Nice”.
If you’d like to book a 15-minute one-to-one slot, please go to http://bit.ly/meet_DS9_at_AI-SDV to book a time in our virtual room “Lyon”.