Named Entity Recognition (NER) from a business point of view :
coupling a rule-based approach with Machine Learning algorithms
Sotiria Bampatzani
NLP Data Engineer - QWAM Content Intelligence
Paris, France
CONTENTS
• Presentation
• Named Entity Recognition (NER)
• The rule-based approach
• Adding Machine Learning to the mix
• Use case example
• Not stopping there…
• Conclusion
PRESENTATION
QWAM Content Intelligence
• QWAM Content Intelligence is a software vendor that provides innovative
solutions for analyzing textual data and extracting insights using its AI and semantic
technologies.
o Search engine to manage textual and press/media content
o SaaS solution for real-time web information monitoring
o Analytics platform for extracting key information from textual data
NAMED ENTITY RECOGNITION
Brief introduction
• 1987 : first studies on information extraction (IE)
• 1991 : first study on Named Entity Recognition (NER)
• 1995 : NER becomes one of the basic Natural Language Processing (NLP) tasks
Named Entity Recognition = Entity Identification + Entity Classification
How…
• Rule-based approach : Annotation rules
• Learning approach : word embeddings, statistical models, neural networks, etc.
• Hybrid approach : combination of the rule-based and learning approaches
THE RULE-BASED APPROACH
Advantages of this approach
• Robust
• Accurate results
• Adaptable to new types of entities
Drawbacks of this approach
• Based on non-contextual grammars and lexicon (gazetteer) lists, whose maintenance
and updating are costly
• Impossible to handle every spelling variant and the ambiguity that results
• Discovering new entities is very difficult, if not impossible
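As a minimal sketch of what a gazetteer-based annotation step can look like, and of why spelling variants are a problem, the Python snippet below matches surface forms exactly. The gazetteer entries, the labels, and the `annotate` function are hypothetical illustrations, not QWAM's actual rules.

```python
import re

# Hypothetical gazetteer: surface form -> entity type (illustrative entries only).
GAZETTEER = {
    "Samsung": "ORGANIZATION",
    "Atos": "ORGANIZATION",
    "Paris": "LOCATION",
}


def annotate(text, gazetteer):
    """Return (start, end, surface, label) for every exact gazetteer match."""
    matches = []
    for surface, label in gazetteer.items():
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), surface, label))
    # Sort by position in the text.
    return sorted(matches)


text = "Samsung opens a new office in Paris, while SAMSUNG Electronics hires."
annotations = annotate(text, GAZETTEER)
for annotation in annotations:
    print(annotation)
```

The upper-case variant "SAMSUNG" is not matched because only the listed surface form is known. Every such variant has to be added or normalised by hand, which is the maintenance cost described above.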
ADDING ML TO THE MIX…
Hybrid approach
• Creation of a dataset containing over 40M news articles with the use of one of
QWAM’s solutions, Ask’n’Read
• Annotation of the aforementioned dataset with the annotation rules developed by our
Text Analytics team
• Use of this annotated dataset in order to train ML models
o RNN/LSTM, word embeddings (word2vec), BERT…
However…
Data preprocessing and filtering do not result in a 100% “clean” dataset.
The training set also contains errors or missing annotations !
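One common way to turn rule-based annotations into training data for a sequence model is to project the annotated spans onto tokens as BIO tags. The sketch below assumes whitespace tokenization and character-offset spans. Both are simplifications chosen for illustration, and the function names are hypothetical; the slides do not describe the actual format used.

```python
import re


def tokenize(text):
    """Whitespace tokenization that keeps character offsets (a simplification)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def to_bio(tokens, spans):
    """Project (start, end, label) spans onto tokens as BIO tags."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span if it lies entirely within it.
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


text = "QWAM Content Intelligence is based in Paris"
# Spans as a rule-based annotator might produce them (hypothetical output).
rule_output = [("QWAM Content Intelligence", "ORGANIZATION"), ("Paris", "LOCATION")]
spans = [(text.index(s), text.index(s) + len(s), label) for s, label in rule_output]

tokens = tokenize(text)
tags = to_bio(tokens, spans)
for (word, _, _), tag in zip(tokens, tags):
    print(word, tag)
```

Any entity the rules miss is tagged "O" here, so the model is trained on that omission as if it were correct. This is how missing annotations end up in the training set.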
DATASET EXAMPLE
source : https://www.phonandroid.com/samsung-annonce-arrivee-smartphones-ecran-enroulable-coulissant.html
ADDING ML TO THE MIX…
Evaluation
• The ML model correctly identified and classified new entities, which were then added to our
gazetteer lists.
• Statistical evaluation showed that a number of errors resulted from
specific annotation rules. These rules were subsequently improved.
• The dataset is then re-annotated with the improved annotation rules, and the cycle
starts anew…
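The feedback step of this cycle could be sketched as follows: entities predicted by the model that are absent from the gazetteer become candidate entries. The frequency threshold `min_count` and the data shapes are assumptions made for this illustration; the slides only state that new entities are added to the gazetteer lists.

```python
from collections import Counter


def candidate_entries(predictions, gazetteer, min_count=2):
    """Return (surface, label) pairs predicted at least min_count times
    that the gazetteer does not contain yet."""
    counts = Counter(
        (surface, label)
        for surface, label in predictions
        if surface not in gazetteer
    )
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Hypothetical model output: one (surface, label) pair per predicted mention.
predictions = [
    ("Samsung", "ORGANIZATION"),
    ("In Fidem", "ORGANIZATION"),
    ("In Fidem", "ORGANIZATION"),
    ("Tessi", "ORGANIZATION"),
]
gazetteer = {"Samsung": "ORGANIZATION"}

candidates = candidate_entries(predictions, gazetteer)
print(candidates)

# After review, accepted candidates extend the gazetteer for the next cycle.
for surface, label in candidates:
    gazetteer[surface] = label
```

A threshold like this is only one possible filter against one-off model errors; the same comparison run in the other direction (rule annotations the model disagrees with) can point to annotation rules that need revision.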
…And what of client data ?
USE CASE - EXAMPLE
Client data
• The need to identify new types of entities arises. Extracting key information, not
limited to predefined categories (person names, locations, organizations, etc.) is
crucial in order to thoroughly analyze the data.
• The size and often sensitive nature of the dataset, as well as the time allocated to
the project, may not allow for machine learning.
QWAM’s solution…
• Preprocessing and annotation of the data with the “standard” application.
• Identification of a priori “interesting” entities in the data, thanks to an annotation rule
used for “discovering” potentially interesting information.
• Use of these annotations to build a dedicated ontology.
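The slides do not show the "discovery" rule itself. As a rough stand-in, the sketch below uses a capitalization pattern to collect unknown multi-word expressions and ranks them by frequency, so that a person can decide which ones belong in the ontology. The pattern, the `discover` function, and the sample documents are all assumptions.

```python
import re
from collections import Counter

# Assumed heuristic: a run of capitalized words is a potentially interesting term.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][\w-]*(?: [A-Z][\w-]*)*")


def discover(documents, known):
    """Count capitalized expressions that are not already known entities."""
    counts = Counter()
    for doc in documents:
        for m in CAPITALIZED_RUN.finditer(doc):
            if m.group() not in known:
                counts[m.group()] += 1
    return counts


documents = [
    "the contract with Ontology Manager was renewed",
    "users of Ontology Manager in Paris asked for exports",
]
known = {"Paris"}  # already covered by the "standard" annotation

counts = discover(documents, known)
print(counts.most_common())
```

A heuristic this simple also picks up sentence-initial words, so its output is a list of suggestions to review rather than a set of annotations.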
USE CASE - EXAMPLE
QWAM Ontology Manager
How ML is a part of Ontology Manager
• Suggestions of a machine learning algorithm are incorporated in the platform,
and proposed to users in order to promote and facilitate ontology evolution
NOT STOPPING THERE…
Establishing relations between entities
• Once named entity recognition and concept recognition are in place, the next step is
to establish a link between them.
o “Atos finalise le rachat de la société canadienne In Fidem”
(“Atos completes the acquisition of the Canadian company In Fidem”)
Company-buys-Company
o “TESSI signe un partenariat stratégique avec NEHS DIGITAL”
(“TESSI signs a strategic partnership with NEHS DIGITAL”)
Company-partners with-Company
• The same methods as for named entity and concept recognition are applied.
o A “standard” gazetteer of these expressions already exists, which allows for an
initial annotation and recognition.
o Another annotation rule is used to discover new expressions and
relations between different types of entities.
o An ML model is trained for further exploration.
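The gazetteer-based first pass for relations might look like the sketch below: a trigger expression found between two already recognized entities yields a typed relation. The trigger list is taken from the two example sentences above; the data structures and the `extract_relations` function are hypothetical.

```python
# Hypothetical relation gazetteer: trigger expression -> relation name.
RELATION_GAZETTEER = {
    "finalise le rachat de": "buys",
    "signe un partenariat stratégique avec": "partners with",
}


def extract_relations(text, entities, relation_gazetteer):
    """entities: (surface, label) pairs in order of appearance in the text.
    Returns (left, relation_type, right) for each adjacent entity pair
    with a known trigger expression between them."""
    relations = []
    for (left, left_label), (right, right_label) in zip(entities, entities[1:]):
        start = text.find(left)
        if start < 0:
            continue
        end = text.find(right, start + len(left))
        if end < 0:
            continue
        between = text[start + len(left):end]
        for trigger, relation in relation_gazetteer.items():
            if trigger in between:
                relation_type = f"{left_label}-{relation}-{right_label}"
                relations.append((left, relation_type, right))
    return relations


examples = [
    ("Atos finalise le rachat de la société canadienne In Fidem",
     [("Atos", "Company"), ("In Fidem", "Company")]),
    ("TESSI signe un partenariat stratégique avec NEHS DIGITAL",
     [("TESSI", "Company"), ("NEHS DIGITAL", "Company")]),
]

results = [extract_relations(text, ents, RELATION_GAZETTEER) for text, ents in examples]
for relations in results:
    print(relations)
```

Exact string triggers miss inflected or paraphrased expressions (for example a different tense of "finaliser"), which is where a discovery rule and a trained model would come in.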
USE CASE - EXAMPLE
• Unlike entities or concepts, suggestions are calculated and proposed
based on the content of the whole category.
CONCLUSION
Conclusion
• A rule-based approach is not sufficient if we wish to discover new entities “on the fly”.
• The size and often sensitive nature of a client’s dataset, as well as the time allocated
to the project, may not allow for machine learning in a business setting.
• A hybrid approach seems to be the most efficient method
o Adaptable to different clients’ data
• In an NLP task such as named entity recognition, errors more often than not
accumulate at every step.
• The biggest advantage of the method we use at QWAM is an NLP engineer’s rapid
and active involvement at every step of the way.
THANK YOU
