Sotiria bampatzani wi_mlds_presentation_20200203

Sotiria presentation, WiMLDS Limassol and Paris, Feb 2021

  1. Named Entity Recognition (NER) from a business point of view: coupling a rule-based approach with Machine Learning algorithms. Sotiria Bampatzani, NLP Data Engineer - QWAM Content Intelligence, Paris, France
  2. CONTENTS
     • Presentation
     • Named Entity Recognition (NER)
     • The rule-based approach
     • Adding Machine Learning to the mix
     • Use case example
     • Not stopping there…
     • Conclusion
  3. PRESENTATION: QWAM Content Intelligence
     • QWAM Content Intelligence is a software publisher that provides innovative solutions for analyzing textual data and extracting insights with its AI and semantic technologies.
     • Search engine to manage textual and press/media content
     • SaaS solution for real-time web information monitoring
     • Analytics platform for extracting key information from textual data
  4. NAMED ENTITY RECOGNITION
     Brief introduction
     • 1987: first studies on information extraction (IE)
     • 1991: first study on Named Entity Recognition (NER)
     • 1995: NER becomes one of the basic Natural Language Processing (NLP) tasks
     Named Entity Recognition: Entity Identification, Entity Classification
     How…
     • Rule-based approach: annotation rules
     • Learning approach: word embeddings, statistical models, neural networks, etc.
     • Hybrid approach: combination of the rule-based and learning approaches
  5. THE RULE-BASED APPROACH
     Advantages of this approach
     • Robust
     • Accurate results
     • Adaptable to new types of entities
     Drawbacks of this approach
     • Based on non-contextual grammars and lexicon (gazetteer) lists, which are costly to maintain and update
     • Impossible to handle all spelling variants and the resulting ambiguity
     • Discovering new entities is very difficult, if not impossible
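To make the gazetteer idea concrete, here is a minimal sketch of a lexicon-based annotator. It is not QWAM's implementation: the `GAZETTEER` entries, the function names, and the exact-match strategy are all assumptions chosen for illustration.

```python
import re

# Hypothetical gazetteer: surface form -> entity type (illustrative entries only).
GAZETTEER = {
    "Atos": "Company",
    "In Fidem": "Company",
    "Paris": "Location",
}

def build_pattern(gazetteer):
    # Longest entries first, so multi-word names win over shorter overlapping ones.
    entries = sorted(gazetteer, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b")

def annotate(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, type) for every exact gazetteer match."""
    pattern = build_pattern(gazetteer)
    return [(m.start(), m.end(), m.group(), gazetteer[m.group()])
            for m in pattern.finditer(text)]

sentence = "Atos finalise le rachat de la société canadienne In Fidem"
spans = annotate(sentence)

# A spelling variant that is absent from the list is silently missed.
missed = annotate("ATOS finalise le rachat")
```

The second call illustrates the drawback listed above: the variant "ATOS" is not in the list, so nothing is annotated until someone adds it by hand.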
  6. ADDING ML TO THE MIX…
     Hybrid approach
     • Creation of a dataset of over 40M news articles using one of QWAM’s solutions, Ask’n’Read
     • Annotation of this dataset with the annotation rules developed by our Text Analytics team
     • Use of the annotated dataset to train ML models
       o RNN/LSTM, word embeddings (word2vec), BERT…
     However…
     Data preprocessing and filtering do not result in a 100% “clean” dataset. The training set also contains errors and missing annotations!
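One common way to turn rule annotations into training data for a sequence model is BIO tagging. The sketch below assumes that format (the slides do not say which one is used); `to_bio` and the token-level span layout are hypothetical.

```python
def to_bio(tokens, entity_spans):
    """Convert rule annotations to BIO tags.

    tokens: list of str
    entity_spans: list of (first_token, end_token_exclusive, entity_type)
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entity_spans:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return list(zip(tokens, tags))

tokens = "TESSI signe un partenariat stratégique avec NEHS DIGITAL".split()

# Both companies found by the rules.
complete = to_bio(tokens, [(0, 1, "Company"), (6, 8, "Company")])

# Assumed failure case: the rules miss the second company, so its tokens are
# labeled "O" and the model is trained on a missing annotation.
noisy = to_bio(tokens, [(0, 1, "Company")])
```

The `noisy` example shows how a gap in the rules becomes a labeling error in the training set, which is the caveat raised on this slide.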
  7. DATASET EXAMPLE. Source: https://www.phonandroid.com/samsung-annonce-arrivee-smartphones-ecran-enroulable-coulissant.html
  8. ADDING ML TO THE MIX…
     Evaluation
     • The ML model correctly identified and classified new entities, which were then added to our gazetteer lists.
     • Statistical evaluation showed that a number of errors resulted from specific annotation rules. These annotation rules were later improved.
     • The dataset is then reannotated with the enhanced annotation rules and the cycle starts anew…
     …And what of client data?
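The evaluation loop described above can be sketched as two small checks: collecting model predictions that are absent from the gazetteer, and counting errors per annotation rule. This is an illustration only; the function names, the `min_count` threshold, the rule identifiers, and the manually checked `reference` sample are all assumptions.

```python
from collections import Counter

def new_entity_candidates(predictions, gazetteer, min_count=2):
    """predictions: iterable of (surface, entity_type) pairs output by the model.

    Returns the pairs not yet in the gazetteer that were predicted at least
    min_count times, as candidates for review and addition.
    """
    counts = Counter(
        (surface, etype) for surface, etype in predictions if surface not in gazetteer
    )
    return sorted(pair for pair, n in counts.items() if n >= min_count)

def errors_per_rule(rule_annotations, reference):
    """rule_annotations: list of (rule_id, surface, entity_type).
    reference: dict surface -> expected type, e.g. a manually checked sample.
    """
    errors = Counter()
    for rule_id, surface, etype in rule_annotations:
        expected = reference.get(surface)
        if expected is not None and expected != etype:
            errors[rule_id] += 1
    return errors.most_common()

# Hypothetical data for illustration.
gazetteer = {"Atos": "Company"}
predictions = [("NEHS DIGITAL", "Company"), ("NEHS DIGITAL", "Company"),
               ("Atos", "Company"), ("TESSI", "Company")]
candidates = new_entity_candidates(predictions, gazetteer)

rule_annotations = [("rule_city", "Paris", "Location"),
                    ("rule_city", "Atos", "Location"),
                    ("rule_company", "TESSI", "Company")]
reference = {"Paris": "Location", "Atos": "Company", "TESSI": "Company"}
worst_rules = errors_per_rule(rule_annotations, reference)
```

Rules at the top of `worst_rules` would be the first to revise before the dataset is reannotated.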
  9. USE CASE - EXAMPLE
     Client data
     • The need to identify new types of entities arises. Extracting key information beyond the predefined categories (person names, locations, organizations, etc.) is crucial in order to analyze the data thoroughly.
     • The size and often sensitive nature of the dataset, as well as the time allocated to the project, may not allow for machine learning.
     QWAM’s solution…
     • Preprocessing and annotation of the data with the “standard” application.
     • Identification of a priori “interesting” entities in the data, thanks to an annotation rule used for “discovering” potentially interesting information.
     • Use of these annotations to build a dedicated ontology.
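The slides do not describe the "discovery" rule itself. As a purely illustrative stand-in, the sketch below treats runs of capitalized words that are not already known as candidates and ranks them by frequency. The pattern, the stopword list, and `discover_candidates` are assumptions, not QWAM's rule.

```python
import re
from collections import Counter

# Runs of capitalized words, e.g. "In Fidem" or "NEHS DIGITAL".
CAPITALIZED_RUN = re.compile(
    r"\b[A-ZÀ-ÖØ-Ý][\w'’-]*(?:\s+[A-ZÀ-ÖØ-Ý][\w'’-]*)*"
)

# Tiny illustrative stoplist for sentence-initial French determiners.
STOPWORDS = frozenset({"Le", "La", "Les", "Un", "Une"})

def discover_candidates(documents, known, stopwords=STOPWORDS):
    """Count capitalized runs that are neither known entities nor stopwords."""
    counts = Counter()
    for doc in documents:
        for match in CAPITALIZED_RUN.finditer(doc):
            surface = match.group()
            if surface not in known and surface not in stopwords:
                counts[surface] += 1
    return counts.most_common()

documents = [
    "Atos finalise le rachat de la société canadienne In Fidem",
    "TESSI signe un partenariat stratégique avec NEHS DIGITAL",
    "le groupe TESSI confirme le partenariat",
]
candidates = discover_candidates(documents, known={"Atos"})
```

The ranked candidates could then be reviewed by an engineer and used as a starting point for the dedicated ontology.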
  10. USE CASE - EXAMPLE: QWAM Ontology Manager
  11. USE CASE - EXAMPLE
  12. USE CASE - EXAMPLE
     How ML is a part of Ontology Manager
     • Suggestions from a machine learning algorithm are incorporated into the platform and proposed to users in order to promote and facilitate ontology evolution.
  13. NOT STOPPING THERE…
     Establishing relations between entities
     • Once named entity recognition and concept recognition are in place, the next step is to establish links between the recognized entities and concepts.
       o “Atos finalise le rachat de la société canadienne In Fidem”: Company-buys-Company
       o “TESSI signe un partenariat stratégique avec NEHS DIGITAL”: Company-partners with-Company
     • The same methods are used as for named entity and concept recognition.
       o A “standard” gazetteer of these expressions already exists, which allows for an initial annotation and recognition.
       o Another annotation rule is used to discover new expressions and relations between different types of entities.
       o An ML model is trained for further exploration.
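The gazetteer-based first pass for relations can be sketched as a lookup of trigger expressions between two already recognized entities. The two trigger entries mirror the slide's examples; `RELATION_TRIGGERS`, `extract_relations`, and the character-offset entity format are assumptions.

```python
# Hypothetical gazetteer of relation expressions -> relation label.
RELATION_TRIGGERS = {
    "finalise le rachat de": "buys",
    "signe un partenariat stratégique avec": "partners with",
}

def extract_relations(text, entities, triggers=RELATION_TRIGGERS):
    """entities: list of (start, end, entity_type), sorted by start offset.

    For each pair of consecutive entities, look for a trigger expression
    in the text between them.
    """
    relations = []
    for (s1, e1, t1), (s2, e2, t2) in zip(entities, entities[1:]):
        between = text[e1:s2]
        for expression, label in triggers.items():
            if expression in between:
                relations.append((text[s1:e1], f"{t1}-{label}-{t2}", text[s2:e2]))
    return relations

def span(text, surface, etype):
    # Helper for the example: locate an entity by its first occurrence.
    start = text.index(surface)
    return (start, start + len(surface), etype)

s1 = "Atos finalise le rachat de la société canadienne In Fidem"
r1 = extract_relations(s1, [span(s1, "Atos", "Company"), span(s1, "In Fidem", "Company")])

s2 = "TESSI signe un partenariat stratégique avec NEHS DIGITAL"
r2 = extract_relations(s2, [span(s2, "TESSI", "Company"), span(s2, "NEHS DIGITAL", "Company")])
```

A sentence whose connecting expression is not in the list yields no relation, which is where the discovery rule and the ML model mentioned above would come in.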
  14. USE CASE - EXAMPLE
     • Unlike entities or concepts, suggestions are calculated and proposed based on the content of the whole category.
  15. CONCLUSION
     • A rule-based approach is not sufficient if we wish to discover new entities “on the fly”.
     • The size and often sensitive nature of a client’s dataset, as well as the time allocated to the project, may not allow for machine learning in a business setting.
     • A hybrid approach seems to be the most efficient method
       o Adaptable to different clients’ data
     • In an NLP task such as named entity recognition, errors more often than not accumulate at every step.
     • The biggest advantage of the method we use at QWAM is an NLP Engineer’s rapid and active involvement at every step of the way.
  16. THANK YOU
