Alena Vasilevich
Computational Linguist@Coreon
ML-powered Taxonomization:
AI Lends Taxonomists a Hand
alena@coreon.com
https://www.linkedin.com/company/coreon-gmbh
15-04-2021 AI Lends Taxonomists a Hand Alena Vasilevich, Coreon GmbH
Coreon: Backbone for the Data
convert data into knowledge
incorporate terminologies, taxonomies, ontologies, thesauri, and vocabularies into one Knowledge Graph
concept-oriented and language-agnostic data model
intuitive, lightweight UI
Agenda
Structured data and IATE as a resource
Manual Taxonomization
Automatic Taxonomization
Collaborative-AI Approach
Industry use case: topic classification
Perks of Structured Data
A Powerful Resource for AI/ML projects
Cross-lingual Data Analysis
Enterprise Search
Actionable intelligence
Cross-border Interoperability
Interactive Terminology for Europe, IATE
Introduced in 2004, used by most EU institutions, covers all EU domains
Recent focus on healthcare, the financial crisis, the environment, fisheries, and migration
Uses EuroVoc as its domain classification system
Number of concepts: 961 116
Number of terms: 7 992 325
New terms last week: 1 646
Study Objective
TODO: draft a deeply structured taxonomy, _fast_ 🚀
INPUT: a flat set of COVID concepts with no relations between them
OUTPUT: a hierarchical knowledge graph, consumable via REST/SPARQL
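What “hierarchical” adds over the flat input is a set of parent links. As an illustration of how such links can be made SPARQL-consumable, the sketch below serializes a child-to-parent map as SKOS `skos:broader` triples in N-Triples syntax. The choice of SKOS, the base IRI `https://example.org/concept/`, and the helpers `concept_iri` and `to_ntriples` are assumptions for this example; the slides do not say how Coreon models or exposes its graph.

```python
from urllib.parse import quote

SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
BASE = "https://example.org/concept/"  # hypothetical namespace, not Coreon's


def concept_iri(label: str) -> str:
    """Turn a concept label into an IRI: spaces become underscores, the rest is percent-encoded."""
    return BASE + quote(label.replace(" ", "_"))


def to_ntriples(parent_of: dict[str, str]) -> list[str]:
    """Emit one skos:broader triple per child -> parent edge, sorted for stable output."""
    return [
        f"<{concept_iri(child)}> <{SKOS_BROADER}> <{concept_iri(parent)}> ."
        for child, parent in sorted(parent_of.items())
    ]


if __name__ == "__main__":
    hierarchy = {"inflammation": "immune system", "fever": "symptoms"}  # toy hierarchy
    for line in to_ntriples(hierarchy):
        print(line)
```

N-Triples was picked here only because it is line-based and loads into any triple store; the same edges could as well be served as JSON over REST.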
Two Tested Approaches
Shared setup
• export 424 COVID concepts into a TBX file
• agree on top-level nodes and maximum leaf size
Manual approach
• study the domain
• load into Coreon
• build the taxonomy from scratch
Semi-automatic approach
• study the domain
• taxonomize automatically
• name the generated concepts
• move wrongly grouped concepts
Evaluation: measure and compare
• time
• edit actions
• resulting taxonomies
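Both approaches start from the same TBX export. To show what that flat input looks like to a program, the sketch below collects the terms of each concept with Python's standard XML parser. It assumes the older TBX (ISO 30042:2008, “martif”) layout with `termEntry`, `langSet`, and `term` elements and no default XML namespace; a real IATE or Coreon export may differ (TBX v3, for example, uses `conceptEntry` and `langSec`). `read_concepts` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-concept sample in TBX 2008 ("martif") style.
SAMPLE_TBX = """<martif type="TBX" xml:lang="en">
  <text><body>
    <termEntry id="c1">
      <langSet xml:lang="en"><tig><term>fever</term></tig></langSet>
      <langSet xml:lang="de"><tig><term>Fieber</term></tig></langSet>
    </termEntry>
    <termEntry id="c2">
      <langSet xml:lang="en"><tig><term>hospital pharmacy</term></tig></langSet>
    </termEntry>
  </body></text>
</martif>"""


def read_concepts(tbx_text: str) -> dict[str, dict[str, list[str]]]:
    """Map each termEntry id to {language: [terms]}; no relations are read, as in the flat input."""
    root = ET.fromstring(tbx_text)
    concepts = {}
    for entry in root.iter("termEntry"):
        langs = {}
        for lang_set in entry.iter("langSet"):
            # iter("term") finds terms under both <tig> and <ntig> groupings.
            langs[lang_set.get(XML_LANG)] = [t.text for t in lang_set.iter("term")]
        concepts[entry.get("id")] = langs
    return concepts


if __name__ == "__main__":
    for concept_id, terms in read_concepts(SAMPLE_TBX).items():
        print(concept_id, terms)
```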
Manual Taxonomization
• Top-level nodes, temporary helper buckets, and lots to do…
• Concept card displaying important metadata
Manual Taxonomization: Editing Actions
Editing actions: drag’n’drop, pin, filter
Example: dragging ‘inflammation’ from ‘diseases’ to ‘immune system’
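Each drag’n’drop changes one broader/narrower relation, which is what the “relations created / changed” metric counts. A minimal sketch of that edit, assuming the taxonomy is held as a hypothetical child-to-parent dictionary (not Coreon’s actual data model): the move is refused if the target is the concept itself or one of its descendants, since that would create a cycle.

```python
def ancestors(parent_of: dict[str, str], concept: str) -> list[str]:
    """Walk up the child -> parent map to the root (assumes the map is acyclic)."""
    chain = []
    while concept in parent_of:
        concept = parent_of[concept]
        chain.append(concept)
    return chain


def move_concept(parent_of: dict[str, str], concept: str, new_parent: str) -> None:
    """Re-parent concept under new_parent, refusing moves that would create a cycle."""
    if new_parent == concept or concept in ancestors(parent_of, new_parent):
        raise ValueError(f"moving {concept!r} under {new_parent!r} would create a cycle")
    parent_of[concept] = new_parent


if __name__ == "__main__":
    # Hypothetical fragment; "COVID-19" is an illustrative root node.
    taxonomy = {"inflammation": "diseases", "diseases": "COVID-19", "immune system": "COVID-19"}
    move_concept(taxonomy, "inflammation", "immune system")
    print(taxonomy["inflammation"])  # immune system
```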
Manual Taxonomization: Result
load into Coreon, build taxonomy from scratch
Auto-Taxonomization:
Data + Word Embeddings (WE) + Community Detection Algorithm
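The slide names the ingredients but not the specific algorithm, embeddings, or parameters. The sketch below shows one common way to combine them, under stated assumptions: concepts become nodes; an edge joins two concepts whose vectors have a cosine similarity of at least a chosen threshold; and label propagation, swapped in here as a simple community detection algorithm, groups the nodes. The three-dimensional vectors are made-up toy values standing in for real word embeddings, and the threshold of 0.9 is arbitrary.

```python
import math
from collections import Counter

# Toy 3-d vectors standing in for real word embeddings (made-up values).
TOY_VECTORS = {
    "fever": [0.9, 0.1, 0.0],
    "cough": [0.8, 0.2, 0.1],
    "fatigue": [0.85, 0.15, 0.05],
    "ventilator": [0.1, 0.9, 0.2],
    "intensive care unit": [0.0, 0.8, 0.3],
    "oxygen mask": [0.15, 0.85, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_graph(vectors, threshold):
    """Undirected graph as adjacency sets: an edge joins concepts with cosine >= threshold."""
    names = sorted(vectors)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def label_propagation(graph, max_rounds=20):
    """Each node repeatedly adopts its neighbours' most frequent label until nothing changes."""
    labels = {node: node for node in graph}  # every concept starts as its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed visiting order keeps the result deterministic
            if not graph[node]:
                continue  # isolated concepts stay singletons
            counts = Counter(labels[nb] for nb in graph[node])
            top = max(counts.values())
            new_label = min(label for label, c in counts.items() if c == top)  # stable tie-break
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, []).append(node)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    graph = similarity_graph(TOY_VECTORS, threshold=0.9)
    for cluster in label_propagation(graph):
        print(cluster)
```

On these toy vectors the symptom terms and the intensive-care terms form two separate clusters; each cluster would then need a name from the curator, as in the “name generated concepts” step. The sketch also shows where errors like grouping ‘interstitial space’ with ‘hospital pharmacy’ come from: any two terms whose vectors sit close together get linked, whether or not they are related in the domain.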
Human Revision of AI-Drafted Taxonomy
Panels: initial situation | after automatic taxonomization
Good Clustering, Bad Clustering?
• 55 clusters, the majority fairly accurate
• some clusters are off, and we blame the word embeddings (WE):
  ‘interstitial space’ and ‘hospital pharmacy’, both spaces, appear in similar “semantic neighborhoods”
• some existing IATE concepts became parents of concept clusters
Collaborative Taxonomization: Result
automatic taxonomization using an ML algorithm, name auto-generated concepts, move wrongly grouped concepts, load into Coreon
Effort and Performance
Metric                                 Manual  Semi-automatic
Curator’s recorded time (hours)            40               8
Relations created / changed             1 147             432
Concepts created                          115              28
Intermediate structural nodes renamed     n/a              45
Overall relations                         679             470
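The relative savings follow directly from the reported figures. The short calculation below uses only the numbers in the table; `reduction_percent` is a helper introduced for this example.

```python
# Figures from the Effort and Performance table: (manual, semi-automatic).
METRICS = {
    "curator hours": (40, 8),
    "relations created / changed": (1147, 432),
    "concepts created": (115, 28),
}


def reduction_percent(manual: float, semi: float) -> float:
    """Percentage by which the semi-automatic value is lower than the manual one, to one decimal."""
    return round(100 * (manual - semi) / manual, 1)


if __name__ == "__main__":
    for name, (manual, semi) in METRICS.items():
        print(f"{name}: {reduction_percent(manual, semi)}% less")
```

This gives 80% less curator time (a factor of five), about 62% fewer relation edits, and about 76% fewer concepts created by hand. The two results are not the same size (679 vs. 470 overall relations), so these figures compare effort, not identical outputs.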
Resulting Taxonomies
Manual: load into Coreon, build taxonomy from scratch
Semi-automatic: automatic taxonomization using an ML algorithm, name auto-generated concepts, move wrongly grouped concepts, load into Coreon
Taxonomization Benefits
Effective way to add structure to data
Improve data quality
  avoid duplicates and overlapping concepts
  associative relations
Easier and safer data maintenance
Formalize multilingual knowledge, make it machine-digestible
Boost performance of AI algorithms by priming them with structured data
Text Classification
Pipeline: label training documents → learn → tune → test (F1 score) → classify production documents
Split (training / dev / test): 1500 / 300 / 800
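A fixed train/dev/test split like 1500 / 300 / 800 can be drawn reproducibly by shuffling once with a fixed seed and slicing. The slides do not say how the split was made; the seed, the `split_documents` helper, and the placeholder document IDs are assumptions for this sketch.

```python
import random


def split_documents(docs, sizes=(1500, 300, 800), seed=42):
    """Shuffle a copy of docs with a fixed seed and cut it into disjoint train/dev/test slices."""
    if sum(sizes) > len(docs):
        raise ValueError("not enough documents for the requested split")
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed, same split
    train_n, dev_n, test_n = sizes
    train = shuffled[:train_n]
    dev = shuffled[train_n:train_n + dev_n]
    test = shuffled[train_n + dev_n:train_n + dev_n + test_n]
    return train, dev, test


if __name__ == "__main__":
    doc_ids = [f"doc-{i}" for i in range(2600)]  # placeholder IDs
    train, dev, test = split_documents(doc_ids)
    print(len(train), len(dev), len(test))  # 1500 300 800
```

For multi-label data with rare topics, a stratified split would keep label frequencies closer across the three sets; plain shuffling is used here only for brevity.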
CNN Predictions and True Labels / Test Set
Document Classification: Metrics
Metrics (micro-averaged), %
Classifier                    Precision  Recall    F1
Non-initialized CNN                81.6    75.4  78.4
Initialized CNN                    82.5    78.8  80.6
🤗 Zero-shot, 0.95 threshold        15.3    37.8  21.7
🤗 Zero-shot, 0.97 threshold        15.0    26.3  19.1
🤗 Zero-shot, 0.99 threshold        12.0    10.2  11.0
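Micro-averaging pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall; F1 is their harmonic mean. The sketch below shows that computation for multi-label output, and how a score threshold, as in the zero-shot rows, turns per-label scores into predicted labels. The label names and scores are toy values; the slides do not show the underlying predictions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def labels_above(scores: dict[str, float], threshold: float) -> set[str]:
    """Keep every label whose score reaches the threshold (multi-label prediction)."""
    return {label for label, score in scores.items() if score >= threshold}


def micro_prf(gold: list[set[str]], predicted: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, pooling counts over all documents."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    gold = [{"vaccines", "symptoms"}, {"treatment"}]  # toy gold labels
    scores = [  # toy per-label scores from a hypothetical zero-shot model
        {"vaccines": 0.99, "symptoms": 0.96, "treatment": 0.40},
        {"vaccines": 0.97, "symptoms": 0.10, "treatment": 0.98},
    ]
    for threshold in (0.95, 0.97, 0.99):
        predicted = [labels_above(s, threshold) for s in scores]
        p, r, f = micro_prf(gold, predicted)
        print(f"threshold {threshold}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Plugging the table’s precision and recall into `f1_score` reproduces the reported F1 values to within rounding; for example, 81.6 and 75.4 give 78.4. Raising the threshold drops labels, which is why recall falls steadily across the zero-shot rows.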
Taxonomized Data to Enhance AI Performance
Label documents automatically
Boost CNN with taxonomy
Enjoy finer granularity in document classification
Get multilingual for free
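One way to read “label documents automatically” and “get multilingual for free” together: each concept carries terms in several languages, so matching any of its terms tags a document with the same language-independent concept, and the taxonomy adds the broader concepts on top. The sketch below is a hypothetical dictionary-matching labeler, not the pipeline used in the talk; the concepts, terms, and hierarchy are toy values.

```python
import re

# Hypothetical concepts: id -> terms in several languages.
TERMS = {
    "fever": ["fever", "Fieber", "fièvre"],
    "cough": ["cough", "Husten", "toux"],
    "symptoms": ["symptom", "Symptom", "symptôme"],
}
# Hypothetical child -> parent links from the taxonomy.
PARENT_OF = {"fever": "symptoms", "cough": "symptoms"}


def label_document(text: str) -> set[str]:
    """Tag text with every concept whose term occurs as a whole word, plus all broader concepts."""
    labels = set()
    for concept, terms in TERMS.items():
        if any(re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) for term in terms):
            labels.add(concept)
    for concept in list(labels):
        node = concept
        while node in PARENT_OF:  # climb the hierarchy so a specific match yields its ancestors
            node = PARENT_OF[node]
            labels.add(node)
    return labels


if __name__ == "__main__":
    print(sorted(label_document("Der Patient hat hohes Fieber.")))  # ['fever', 'symptoms']
    print(sorted(label_document("La toux persiste.")))  # ['cough', 'symptoms']
```

Whole-word matching keeps “cough” from firing on “coughing”; a real labeler would need lemmatization or the taxonomy’s own term variants to cover inflected forms.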
Thank You!
@coreonapp
@lennyvasilevich
https://www.linkedin.com/in/alenavasilevich
alena@coreon.com
