SlideShare a Scribd company logo
1 of 41
Download to read offline
machine learning
excellence delivered
Machine Learning
(ML) and Artificial
Intelligence (AI)
for Publishers
Machine Learning, Deep Learning and Data Science become
quite popular buzzwords these days. However, question arise:
will it change your business?
We try to answer this specifically for the book publishers.
According to some reports, up to 70% of companies will
adopt Artificial Intelligence (AI) by 2020.
…Although we find these numbers greatly exaggerated, it is
undeniable that AI is gradually changing the way all
businesses operate.
Introduction
ML Rule of Thumb
If a typical person can do a mental task with less than one second of
thought, we can probably automate it using AI either now or in the near
future.
Andrew Ng, Stanford Uni,
• Fraud detection
• Recommendations
• Customer
segmentation
• Text analysis
• Text summarization
Source: What Artificial Intelligence Can and Can’t Do Right Now
2018
Differentiable
programming
Data science
Big data
Data mining
Synthetic intelligence
Cognitive computing
Affective computing
Deep tech
Medtech
Buzzwords
Wikipedia:
Outline of artificial intelligence
Image source: Jason's Machine Learning 101 presentation, Dec 2017
TL;DR
What is / Why use ML?
TL;DR:
• Software tool for getting answers and decision making
• Mainly based on data, not rules/algorithms
• Two-step:
• Automatic extraction of “knowledge” (patterns) from past data
• Prediction on new data
Which data?
• See below…
Machine Learning Principle
Easiest ML algorithm
Is my business grow and how?
Source: Wikipedia
Your own data
(incl. customer’s)
Open data
(internet)
3rd-party data
(rare)
Data is the king!
Idea from François Chollet, Deep Learning with Python, Manning, 2017
New data Model Answers
Past data &
answers
Model
Patterns found in
past data
Deep Learning: Image Recognition
Determine (automatically!)
image characteristics
DecisionCombination of
simple algorithms
Source: Jason's Machine Learning 101 presentation, Dec 2017
Nearly human performance in
• Image recognition
• Object detection:
• Images
• Video
• Translation
Deep Learning: revolution
Image source: Xavier Giro-o-Nieto
2012 contest
DL example: Google Translate …
ыво алолд фыалллл ооооооооооо
оооооооооооо фЛОЛД игишщгпр
оооооооооооо шгигигшщр оооооооооо
“Mongolian”
?
German
ооооо злолтщшолд
оооооооооооооооооооооооооооооооо
ваоифтваомщофит
оооооооооооооооооооощватмшщтощт
щт
“Mongolian”
?
English
Google Translate fault (or joke…)
ыво алолд фыалллл ооооооооооо
оооооооооооо фЛОЛД игишщгпр
оооооооооооо шгигигшщр оооооооооо
“Mongolian”
Du musst dich selbst töten
warf seinen Kopf auf seine Knie
German
ооооо злолтщшолд
оооооооооооооооооооооооооооооооо
ваоифтваомщофит
оооооооооооооооооооощватмшщтощт
щт
“Mongolian”
Have you forgotten that you have not
forgiven yourself yet?
English
Machine Learning in Publishing
What data to process?
• Structured
• Sales data
• Web search statistics, etc.
• Access records / reading stats for eBooks
• Semi-structured
• Invoices, other business documents
• Reports, editor reviews
• Free-form
• Book texts
• Customer reviews (like Amazon)
Reading statistics (averaged)
Source: internal SilkCode / LookUP! statistics
Page / Section / Magazine article
• Recommendations
• Reading profile analysis
• Improved by text analysis
• Recommend similar
material
• Search improvements
• Feedback to publisher
• Personalized ads
• Reader's notes / highlights:
• Interest / importance
• Complexity (education)
Reading statistics: applications
Future: Hits ordering
by popularity
Recommendation systems
LookUP! perspective:
“You may also be interested in …”
feature
• Recommend book/journal
• …similar to a current one
• …similar to all material
read
• …read by users with
similar interests
• Analysis of reading habits
(profile)
Recommendation systems
Image source: https://medium.com/@humansforai/recommendation-engines-e431b6b6b446
Recommendation systems
Book similarity
• Metadata
• Author
• Publisher’s descriptions
• …
• NLP-extracted keywords
• Applications:
• Editorial process
automation
• Book recommendation
Readers with different
interests within the same
material
Test data: book plots
Wikipedia ID 1166383
Freebase ID /m/04cvx9
Book title White Noise
Book author Don DeLillo
Publication date 1985-01-21
Genres
Novel, Postmodernism, Speculative fiction,
Fiction
Book metadata
Plot summary
Set at a bucolic Midwestern college known only as The-College-on-the-Hill, White Noise follows a
year in the life of Jack Gladney, a professor who has made his name by pioneering the field of
Hitler Studies (though he hasn't taken German language lessons until this year). He has been
married five times to four women and has a brood of children and stepchildren (Heinrich, Denise,
Steffie, Wilder) with his current wife, Babette.
...
Source: http://www.cs.cmu.edu/~dbamman/booksummaries.html
Text similarity explained
Document collection as a map
Source: F.V. Paulovich, R. Minghim “Text Map Explorer: a Tool to Create and Explore Document Maps”, DOI:10.1109/IV.2006.104
Key sentences
Source: https://www.publishersweekly.com/pw/by-topic/international/Frankfurt-Book-Fair/article/78118-frankfurt-book-fair-2018-preview.html
Named entities
Source: https://www.publishersweekly.com/pw/by-topic/international/Frankfurt-Book-Fair/article/78118-frankfurt-book-fair-2018-preview.html
Persons
Places
Events etc.
Named Entities
Wikipedia
LookUP! perspective:
Automatic Wikipedia
Linking
Sentiment analysis
Quickly find what readers think about your books
Source: https://text-processing.com/demo/sentiment/
Sentiment analysis
LookUP! perspective:
Notes analysis
… seltsam …
Wichtig …
Verstehe nicht
Sic!
Headquarters:
Silk AI lines LLC
Hikaly 3-7
220005 Minsk
Belarus
Partner in EU:
SilkCode GmbH
Luisenstraße 62
D-47799 Krefeld
Germany
www.silkdata.ai
hello@silkdata.ai
+49 (2151) 387 3531
EXTRA SLIDES
Multi-iterative
process
More details of ML pipeline
Data
collection
Exploration
Visualization
Data
cleaning
Dataset
splitting
Feature
selection
Model
selection
Model
training
Testing &
Evaluation
Text processing pipeline
• Input format analysis and conversion (HTML, PDF, …)
• Language identification
• Sentence segmentation
• POS-tagging and lemmatization
• Stop-word removal
• ML models
• TF-IDF
• LDA and friends
• Word embedding
• NER
• Summarization
• …
• Display of result within original text
Approaches and business values
• Structured data
• Prediction (market, customer churn)
• Book recommendation
• Semi-structured
• Document processing automation
• Document management
• Data extraction
• Unstructured data
• NLP (natural language processing)
• Information extraction
• Content similarity
Special terms
• .net, asp.net
• angular.js
• *nix (= Unix and/or Linux)
• C++ / C#
• SAP R/3, B2B, 3D, 7z
• All kinds of noise
• Quotes
• Ligatures
• Dashes
• Complex Unicode
• Always mix with English
Topic analysis: IT Vacancy ads
Preprocessing issues
Topic analysis
Documents
Words
≈
Statistical mix
Documents
Topics
Topics
Words
Technical terms
Marketing
Process
Topic modeling example
Source: http://www.princeton.edu/~achaney/tmve/wiki100k/docs/The_Hitchhiker_s_Guide_to_the_Galaxy.html
Exploratory navigation over Wikipedia
• TextRank algorithm
• Graph-based algorithm
proposed in 2004
• Capable to extract keywords
and key sentences
• Still in active development
• Heavily cited and extended
approach
• Known usage: O’Reilly
Document summarization
Data Obfuscation: Example
Allstate’s contest on Kaggle:
• https://www.kaggle.com/c/allstate-claims-severity/data
• “Each row in this dataset represents an insurance claim. You
must predict the value for the 'loss' column. Variables prefaced with
'cat' are categorical, while those prefaced with 'cont' are
continuous.”
Obfuscation
• No real names / categorical values
• Possible linear mix / rescaling of numerical values
• Patterns still remain and can be extracted by model!!!
id cat1 cat2 cat3 cat4 cat5 cat6 cat7 … cat116
1 A B A A A A B … LB
cont1 cont2 cont3 cont4 cont5 cont6 cont7 cont8 … loss
0.7263 0.2459 0.1875 0.7896 0.31006 0.71836 0.3350 0.3026 … 2213.18
Platforms and libraries
• Python
• Gensim
• SpaCy
• NLTK
• patterns
• Java
• OpenNLP
• Deep Learning
• Caffe
• Keras / Tensorflow
• Client solutions: browser or mobile

More Related Content

Similar to Silk data - machine learning

Artificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of IntelligenceArtificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of IntelligenceAbhishek Upadhyay
 
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...AILABS Academy
 
Natural language Analysis
Natural language AnalysisNatural language Analysis
Natural language AnalysisRudradeb Mitra
 
AI Technology Overview and Career Advice
AI Technology Overview and Career AdviceAI Technology Overview and Career Advice
AI Technology Overview and Career AdviceKunling Geng
 
Data Science Workshop - day 1
Data Science Workshop - day 1Data Science Workshop - day 1
Data Science Workshop - day 1Aseel Addawood
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...Stefan Dietze
 
Short URLs, Big Fun
Short URLs, Big FunShort URLs, Big Fun
Short URLs, Big FunHilary Mason
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsMarina Santini
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectbodaceacat
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSara-Jayne Terp
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Andre Freitas
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automationbenosteen
 
Real World NLP, ML, and Big Data
Real World NLP, ML, and Big DataReal World NLP, ML, and Big Data
Real World NLP, ML, and Big DataDevin Bost
 
Jdb code biology and ai final
Jdb code biology and ai finalJdb code biology and ai final
Jdb code biology and ai finalJoachim De Beule
 
NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...
NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...
NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...Maryam Farooq
 
Brave new search world
Brave new search worldBrave new search world
Brave new search worldvoginip
 
New Frontiers in IA: Design in the Era of Cognitive Computing
New Frontiers in IA: Design in the Era of Cognitive ComputingNew Frontiers in IA: Design in the Era of Cognitive Computing
New Frontiers in IA: Design in the Era of Cognitive ComputingPaul King
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextSeth Grimes
 
Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...Gerben Zaagsma
 

Similar to Silk data - machine learning (20)

Artificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of IntelligenceArtificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of Intelligence
 
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
 
Natural language Analysis
Natural language AnalysisNatural language Analysis
Natural language Analysis
 
AI Technology Overview and Career Advice
AI Technology Overview and Career AdviceAI Technology Overview and Career Advice
AI Technology Overview and Career Advice
 
Oss swot
Oss swotOss swot
Oss swot
 
Data Science Workshop - day 1
Data Science Workshop - day 1Data Science Workshop - day 1
Data Science Workshop - day 1
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
 
Short URLs, Big Fun
Short URLs, Big FunShort URLs, Big Fun
Short URLs, Big Fun
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive Datasets
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automation
 
Real World NLP, ML, and Big Data
Real World NLP, ML, and Big DataReal World NLP, ML, and Big Data
Real World NLP, ML, and Big Data
 
Jdb code biology and ai final
Jdb code biology and ai finalJdb code biology and ai final
Jdb code biology and ai final
 
NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...
NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...
NYAI #27: Cognitive Architecture & Natural Language Processing w/ Dr. Catheri...
 
Brave new search world
Brave new search worldBrave new search world
Brave new search world
 
New Frontiers in IA: Design in the Era of Cognitive Computing
New Frontiers in IA: Design in the Era of Cognitive ComputingNew Frontiers in IA: Design in the Era of Cognitive Computing
New Frontiers in IA: Design in the Era of Cognitive Computing
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
 
Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...
 

Recently uploaded

Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxraffaeleoman
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Chameera Dedduwage
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxmohammadalnahdi22
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsaqsarehman5055
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMoumonDas2
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfSenaatti-kiinteistöt
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxNikitaBankoti2
 

Recently uploaded (20)

Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 

Silk data - machine learning

  • 2. Machine Learning (ML) and Artificial Intelligence (AI) for Publishers
  • 3. Machine Learning, Deep Learning and Data Science become quite popular buzzwords these days. However, question arise: will it change your business? We try to answer this specifically for the book publishers.
  • 4. According to some reports, up to 70% of companies will adopt Artificial Intelligence (AI) by 2020. …Although we find these numbers greatly exaggerated, it is undeniable that AI is gradually changing the way all businesses operate.
  • 6. ML Rule of Thumb If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future. Andrew Ng, Stanford Uni, • Fraud detection • Recommendations • Customer segmentation • Text analysis • Text summarization Source: What Artificial Intelligence Can and Can’t Do Right Now
  • 7. 2018 Differentiable programming Data science Big data Data mining Synthetic intelligence Cognitive computing Affective computing Deep tech Medtech Buzzwords Wikipedia: Outline of artificial intelligence Image source: Jason's Machine Learning 101 presentation, Dec 2017
  • 8. TL;DR What is / Why use ML? TL;DR: • Software tool for getting answers and decision making • Mainly based on data, not rules/algorithms • Two-step: • Automatic extraction of “knowledge” (patterns) from past data • Prediction on new data Which data? • See below…
  • 10. Easiest ML algorithm Is my business grow and how? Source: Wikipedia
  • 11. Your own data (incl. customer’s) Open data (internet) 3rd-party data (rare) Data is the king! Idea from François Chollet, Deep Learning with Python, Manning, 2017 New data Model Answers Past data & answers Model Patterns found in past data
  • 12. Deep Learning: Image Recognition Determine (automatically!) image characteristics DecisionCombination of simple algorithms Source: Jason's Machine Learning 101 presentation, Dec 2017
  • 13. Nearly human performance in • Image recognition • Object detection: • Images • Video • Translation Deep Learning: revolution Image source: Xavier Giro-o-Nieto 2012 contest
  • 14. DL example: Google Translate … ыво алолд фыалллл ооооооооооо оооооооооооо фЛОЛД игишщгпр оооооооооооо шгигигшщр оооооооооо “Mongolian” ? German ооооо злолтщшолд оооооооооооооооооооооооооооооооо ваоифтваомщофит оооооооооооооооооооощватмшщтощт щт “Mongolian” ? English
  • 15. Google Translate fault (or joke…) ыво алолд фыалллл ооооооооооо оооооооооооо фЛОЛД игишщгпр оооооооооооо шгигигшщр оооооооооо “Mongolian” Du musst dich selbst töten warf seinen Kopf auf seine Knie German ооооо злолтщшолд оооооооооооооооооооооооооооооооо ваоифтваомщофит оооооооооооооооооооощватмшщтощт щт “Mongolian” Have you forgotten that you have not forgiven yourself yet? English
  • 16. Machine Learning in Publishing
  • 17. What data to process? • Structured • Sales data • Web search statistics, etc. • Access records / reading stats for eBooks • Semi-structured • Invoices, other business documents • Reports, editor reviews • Free-form • Book texts • Customer reviews (like Amazon)
  • 18. Reading statistics (averaged) Source: internal SilkCode / LookUP! statistics Page / Section / Magazine article
  • 19. • Recommendations • Reading profile analysis • Improved by text analysis • Recommend similar material • Search improvements • Feedback to publisher • Personalized ads • Reader's notes / highlights: • Interest / importance • Complexity (education) Reading statistics: applications Future: Hits ordering by popularity
  • 20. Recommendation systems LookUP! perspective: “You may also be interested in …” feature
  • 21. • Recommend book/journal • …similar to a current one • …similar to all material read • …read by users with similar interests • Analysis of reading habits (profile) Recommendation systems Image source: https://medium.com/@humansforai/recommendation-engines-e431b6b6b446
  • 22. Recommendation systems Book similarity • Metadata • Author • Publisher’s descriptions • … • NLP-extracted keywords • Applications: • Editorial process automation • Book recommendation Readers with different interests within the same material
  • 23. Test data: book plots Wikipedia ID 1166383 Freebase ID /m/04cvx9 Book title White Noise Book author Don DeLillo Publication date 1985-01-21 Genres Novel, Postmodernism, Speculative fiction, Fiction Book metadata Plot summary Set at a bucolic Midwestern college known only as The-College-on-the-Hill, White Noise follows a year in the life of Jack Gladney, a professor who has made his name by pioneering the field of Hitler Studies (though he hasn't taken German language lessons until this year). He has been married five times to four women and has a brood of children and stepchildren (Heinrich, Denise, Steffie, Wilder) with his current wife, Babette. ... Source: http://www.cs.cmu.edu/~dbamman/booksummaries.html
  • 25. Document collection as a map Source: F.V. Paulovich, R. Minghim “Text Map Explorer: a Tool to Create and Explore Document Maps”, DOI:10.1109/IV.2006.104
  • 29. Sentiment analysis Quickly find what readers think about your books Source: https://text-processing.com/demo/sentiment/
  • 30. Sentiment analysis LookUP! perspective: Notes analysis … seltsam … Wichtig … Verstehe nicht Sic!
  • 31. Headquarters: Silk AI lines LLC Hikaly 3-7 220005 Minsk Belarus Partner in EU: SilkCode GmbH Luisenstraße 62 D-47799 Krefeld Germany www.silkdata.ai hello@silkdata.ai +49 (2151) 387 3531
  • 33. Multi-iterative process More details of ML pipeline Data collection Exploration Visualization Data cleaning Dataset splitting Feature selection Model selection Model training Testing & Evaluation
  • 34. Text processing pipeline • Input format analysis and conversion (HTML, PDF, …) • Language identification • Sentence segmentation • POS-tagging and lemmatization • Stop-word removal • ML models • TF-IDF • LDA and friends • Word embedding • NER • Summarization • … • Display of result within original text
  • 35. Approaches and business values • Structured data • Prediction (market, customer churn) • Book recommendation • Semi-structured • Document processing automation • Document management • Data extraction • Unstructured data • NLP (natural language processing) • Information extraction • Content similarity
  • 36. Special terms • .net, asp.net • angular.js • *nix (= Unix and/or Linux) • C++ / C# • SAP R/3, B2B, 3D, 7z • All kinds of noise • Quotes • Ligatures • Dashes • Complex Unicode • Always mix with English Topic analysis: IT Vacancy ads Preprocessing issues
  • 38. Topic modeling example Source: http://www.princeton.edu/~achaney/tmve/wiki100k/docs/The_Hitchhiker_s_Guide_to_the_Galaxy.html Exploratory navigation over Wikipedia
  • 39. • TextRank algorithm • Graph-based algorithm proposed in 2004 • Capable to extract keywords and key sentences • Still in active development • Heavily cited and extended approach • Known usage: O’Reilly Document summarization
  • 40. Data Obfuscation: Example Allstate’s contest on Kaggle: • https://www.kaggle.com/c/allstate-claims-severity/data • “Each row in this dataset represents an insurance claim. You must predict the value for the 'loss' column. Variables prefaced with 'cat' are categorical, while those prefaced with 'cont' are continuous.” Obfuscation • No real names / categorical values • Possible linear mix / rescaling of numerical values • Patterns still remain and can be extracted by model!!! id cat1 cat2 cat3 cat4 cat5 cat6 cat7 … cat116 1 A B A A A A B … LB cont1 cont2 cont3 cont4 cont5 cont6 cont7 cont8 … loss 0.7263 0.2459 0.1875 0.7896 0.31006 0.71836 0.3350 0.3026 … 2213.18
  • 41. Platforms and libraries • Python • Gensim • SpaCy • NLTK • patterns • Java • OpenNLP • Deep Learning • Caffe • Keras / Tensorflow • Client solutions: browser or mobile