Slides from the one-hour session by Martin Kaltenböck (CFO and Managing Partner of Semantic Web Company / PoolParty Software Ltd) at Enterprise Data World 2019 in Boston, US, on 19 March 2019, titled: Benefiting from Semantic AI along the data life cycle.
Benefiting from Semantic AI along the data life cycle
1. Benefiting from Semantic AI
Along the Data Lifecycle
Martin Kaltenböck
CFO & Managing Partner
Semantic Web Company
PoolParty Software Ltd
www.poolparty.biz | @kalte2707
2. Software Engineers &
Expert consultants for
NLP, Semantics and
Machine Learning
Introducing Semantic Web Company
Founded in 2004
Based in Vienna, Austria
Privately held
Developer & Vendor of
PoolParty Semantic Suite
Participating in
projects with
€2.5 million
funding for R&D
SWC named to
KMWorld’s
‘100 Companies That
Matter in Knowledge
Management’ in
2016, 2017, 2018,
and 2019
50+ FTE
~30% revenue growth in the last 2 years
ISO 27001:2013 certified
4,500 followers
3. First release in 2009
Fact sheet: PoolParty Semantic Suite
Most complete and secure
Semantic Middleware on
the Global Market
Semantic AI:
Fusing Graphs, NLP,
and Machine Learning
W3C standards compliant
Named as Sample Vendor in Gartner’s Hype Cycle for AI 2018
Current version 7.1
On-premises or cloud-based
Over 200 installations world-wide
Named as Representative Vendor in Gartner’s Market Guide for Hosted AI Services 2018
KMWorld listed PoolParty as Trend-Setting Product 2015, 2016, 2017, and 2018
ISO 27001:2013 certified
4. Selected Customer References and Partners
Locations (map): SWC headquarters, US East, US West, AUS/NZL, UK
Selected Customer References
● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Fluor
● Harvard Business School
● Wolters Kluwer
● Philips
● Nestlé
● Electronic Arts
● Springer Nature
● Pearson - Always Learning
● Healthdirect Australia
● World Bank Group
● Canadian Broadcasting Corporation
● Oxford University Press
● International Atomic Energy Agency
● Siemens
● Singapore Academy of Law
● Inter-American Development Bank
● Council of the E.U.
● AT&T
Selected Partners
● Enterprise Knowledge
● Mekon Intelligent Content
Solutions
● Soitron
● Accenture
● Stardog
● BAON Enterprises
● Findwise
● Tellura Semantics
● HPC
● Minerva Intelligence
● Make it a Triple
We work with Global Fortune Companies, and
with some of the largest GOs and NGOs from
over 20 countries.
5. Agenda
▸ Current status of Artificial Intelligence (AI)
▸ Machine Learning, NLP & Knowledge Graphs
▸ Excursus: Importance of Explainable AI (XAI)
▸ Semantic AI: six core aspects
▸ How to build a Knowledge Graph
▸ Semantic AI: Use Cases
▸ Q & A
Source: www.hannahsanfordart.com
8. Motivation for / Facts of Artificial Intelligence (AI)
One important driver for the emerging AI business opportunities is the significant growth of data volume and the rates at
which it is generated. By 2020, there will be more than 16 zettabytes of useful data (16 trillion GB), reflecting a growth of
236% per year from 2013 to 2020.
In fact, according to IDC by 2020, 40% of all digital transformation initiatives, and 100% of all effective data-driven IoT
efforts will be supported by cognitive/AI capabilities.
Source: BDVA AI Positioning Paper 2018
Vernon Turner, John F. Gantz, David Reinsel and Stephen Minton, The digital universe of opportunities: rich data and the increasing value of the Internet of Things, Report from IDC
for EMC April 2014. IDC FutureScape: Worldwide IT Industry 2017 Predictions
9. Predictions and ground truths about AI
Herbert A. Simon (1965)
▸ AI pioneer Herbert A. Simon (Carnegie Mellon University) had predicted in 1965 that "machines will be
capable, within twenty years, of doing any work a man can do"
Doug Lenat (1998)
▸ “Those of us who lack shared knowledge and experience often don't understand much of what they say to
each other. In the same way, today's computers, which don't share even the common-sense knowledge we
all draw on in our everyday speech and writing, can't comprehend most of our speech or texts.”
HAL 9000 (1968) KITT (1982) Siri (2011)
10. AI suffers from a lack of common sense, but...
Google Assistant (2018)
11. … is talented in solving isolated problems
based on isolated data sets
Monte Carlo TS
Deep Learning
Deep Learning
Genetic
Algorithms
Neural networks
Case based
reasoning
Face recognition Game AI Fraud detection
12. What AI needs versus what it has (Part 1)
What it needs: Contextualized, disambiguated, highly relevant and specific
integrated data, flowing to the point of need
What it has: Single batch datasets cleaned up to be good enough by data
scientists, who spend 80% of their time on cleanup
What it needs: Knowledge engineers, and many bold Data Visionaries in addition
to big D Data Scientists, data-centric architects, pipeline engineers, specialists in
many new data niches
What it has: A growing group of tool users versed only in probability theory, neural
networks, Python and R, including small D data scientists, engineers and
architects, plus scads of entrenched application-centric developers.
Source: Allan Morrison, PWC, SEMANTiCS2018 conference, Vienna.
13. Machine Learning, NLP and
Knowledge Graphs
#HybridApproach: statistical AI and symbolic AI
Best of breed…
14. Gartner Hype Cycle for Artificial Intelligence, 2018
“The rising role of content
and context for delivering
insights with AI technologies,
as well as recent knowledge
graph offerings for AI
applications have pulled
knowledge graphs to the
surface.”
15. The fast growing Graph Database Market
Amazon Neptune Azure Cosmos DB
▸ Marklogic
▸ AllegroGraph
▸ GraphDB
▸ Oracle Spatial Graph
▸ Virtuoso
▸ Neo4j
▸ ...
Criterion | Property Graph | RDF Graph
Main use case | Traverse a graph | Query a graph
Typical applications | Path Analytics, Social Network Analysis | Data Integration, Knowledge Representation
Standards | No standards → Gremlin, Cypher, PGQL, ... | W3C Semantic Web standards → SPARQL 1.1
Additional options | Shortest path calculations | Inferencing
Stardog
16. Assessment of the current status of Artificial Intelligence
What makes someone an intelligent being?
Bloom’s Taxonomy: Classify cognitive processes
(6) Create
Example: How to convert an inefficient AI system architecture to a more efficient one by replacing your choice of components?
Questions: How would you improve …? Can you formulate a theory for …? Can you predict the outcome if …?
(5) Evaluate
Example: Which kinds of knowledge models are best for machine learning, and why?
Questions: What is your opinion of …? How would you prioritize …? What would you use to support the view …?
(4) Analyse
Example: How do a graph database and a semantic knowledge model work together?
Questions: How is ... related to ...? What is the function of ...? What conclusions can you draw ...?
(3) Apply
Example: How can taxonomies be used to enhance machine learning?
Questions: Why is … significant? How is … an example of …? What elements would you use to change …?
(2) Understand
Example: What is the difference between an ontology and a taxonomy?
Questions: What is the difference between …? What is the main idea of …? Which statements support …?
(1) Remember
Example: Who is the inventor of the World Wide Web?
Questions: Who is …? Where is …? Why did …?
17. (1) Remember
Perth: Perth is one of the most isolated major cities in the world, with a population of 2,022,044 living in Greater Perth.
Australia: Australia is a member of the OECD, United Nations, G20, ANZUS, and the World Trade Organisation.
Graph relations:
▸ Perth is a City
▸ Australia is a Country
▸ Perth is located in Australia
▸ Australia is part of Commonwealth of Nations
▸ Commonwealth of Nations is an International Organisation
Avoid illogical answers: distance between
Support complex Q&A: Which cities located in the Commonwealth of Nations have a population of more than 2 million people?
“Knowledge graphs
silently accrue ‘smart
data’ — i.e., data that
can be easily read and
‘understood’ by AI
systems.”
Gartner Hype Cycle for
Artificial Intelligence,
2018
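The Perth example can be sketched as a tiny set of subject-predicate-object triples. This is only an illustration under stated assumptions: the entity names and the population figure come from the slide, while the predicate identifiers (`is_a`, `located_in`, `part_of`, `population`) and the helper functions are hypothetical and do not represent any particular graph database or the PoolParty API.

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
# Predicate names are assumptions chosen for this sketch.
TRIPLES = {
    ("Perth", "is_a", "City"),
    ("Perth", "located_in", "Australia"),
    ("Perth", "population", 2_022_044),
    ("Australia", "is_a", "Country"),
    ("Australia", "part_of", "Commonwealth of Nations"),
    ("Commonwealth of Nations", "is_a", "International Organisation"),
}


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Return all subjects of triples matching (?, predicate, obj)."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def cities_in_organisation(organisation, min_population):
    """Cities located in a member country of `organisation` above a size threshold."""
    result = []
    for country in subjects("part_of", organisation):
        for city in subjects("located_in", country):
            if "City" not in objects(city, "is_a"):
                continue
            if any(pop > min_population for pop in objects(city, "population")):
                result.append(city)
    return sorted(result)


# "Which cities located in the Commonwealth of Nations have more than 2 million people?"
print(cities_in_organisation("Commonwealth of Nations", 2_000_000))
```

The question is answered by following two edges (part of, located in) and then filtering on a literal value, which is the kind of multi-hop lookup that a flat text index does not support directly.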
18. (1) Remember - Knowledge Graphs & Knowledge Extraction
Knowledge Graphs (KG) can cover general
knowledge (often also called cross-domain
or encyclopedic knowledge), or provide
knowledge about special domains such as
biomedicine.
In most cases KGs are based on Semantic
Web standards, and have been generated by
a mixture of automatic extraction from text
or structured data, and manual curation
work.
Examples:
▸ DBpedia https://wiki.dbpedia.org/
▸ Google Knowledge Graph
▸ YAGO
▸ OpenCyc
▸ Wikidata
Who is the inventor of the World Wide Web?
19. (2) Understand...
Google Featured
Snippets based
on Sentence
Compression
Algorithms
To train Google’s artificial Q&A brain, the
company uses old news stories, where
machines start to see how headlines
serve as short summaries of the longer
articles that follow. But for now, the
company still needs its team of PhD
linguists.
Spanning about 100 PhD linguists across
the globe, the Pygmalion team produces
“the gold data,” while the news stories
are the “silver.” The silver data is still
useful, because there’s so much of it. But
the gold data is essential.
WIRED article
What is the difference between an ontology and a taxonomy?
20. (6) Create - Example for DL-based ‘creativity’
Aiva’s compositions still require human input with regards to
orchestration and musical production. In fact, Aiva’s creators envisage
a future where man and machine will collaborate to fulfill their
creative potential, rather than replace one another.
http://www.aiva.ai/
After having listened to a large
amount of music and learned its
own models of music theory, Aiva
composes its very own sheet music.
These partitions are then played by
professional artists on real
instruments in a recording studio,
achieving the best sound quality
possible.
21. AI Methods Overview
Artificial Intelligence (AI)
Artificial Neural
Network (ANN)
Symbolic AI
(GOFAI*)
Sub-Symbolic AI Statistical AI
Knowledge graphs &
reasoning
Natural Language
Processing (NLP)
Machine Learning
* Good old-fashioned AI
Word Embedding
(Word2Vec)
Deep Learning
(DNN)
Natural Language
Understanding
Entity Recognition
& Linking
Knowledge
Extraction
Semantic enhanced
Text Classification
22. What AI needs versus what it has (Part 2)
Lack of AI Governance
Companies have concerns about validity, explainability and unintended bias of AI.
Lack of AI Strategy
Many organizations are currently undertaking POCs from a large pool of AI vendors only for tactical benefits.
Low Data Quality & Data Silos
80% of the work of data scientists is acquiring and preparing data. Data silos are a demon that can drive up
that 80% and often make initiatives impossible.
Danger of Vendor Lock-in
Use of black boxes instead of hybrid middleware approaches connecting internal training assets to third-party
machine-learning solutions.
Lack of Knowledge / AI Literacy
Only 1 in 10 enterprises feel they have a competent approach to mining data, which ultimately hampers AI
efforts. A shortage of AI skills and risk managers' lack of familiarity with the technology increase the risk.
Sources:
Gartner (March 2018): “Clarify Strategy and Tactics for Artificial Intelligence by Separating Training and Machine Learning” by Anthony Mullen, Magnus Revang, Erick Brethenoux
Harvard Business Review (2016): “Breaking Down Data Silos” by Edd Wilder-James
Gartner (April 2018): “CIOs Can Manage the Risks of AI Investments” by Jorge Lopez, Paul E. Proctor
24. Excursus: Explainable AI
That’s Explainable AI. It’s the next stage of human augmentation by machines, when AI will empower
humans to take corrective actions according to the explanations given. Within three years, we believe it
will have come to dominate the AI landscape for businesses — because it will enable people to
understand and act responsibly, as well as creating effective teaming between human and machines.
Source: Accenture Labs, Understanding Machines, Explainable AI (September 2018)
EU GDPR: “RIGHT TO EXPLANATION”
As well as being a practical imperative, explainability will also be required because of ethical or legal
requirements, such as the introduction under the EU’s forthcoming General Data Protection Regulation
(GDPR) of the “right to explanation” about algorithm-derived decisions. But most importantly,
explainability puts people in control—meaning AI augments human skills rather than trying to replace
them. For all these reasons, AI needs to go beyond machine learning to the next stage: Explainable AI.
Source: https://www.eugdpr.org/
25. Excursus: Explainable AI
While “black box” AI is clearly limited
by its inability to explain its reasoning
to human users, it can actually work
well in three types of application.
The first is simple pattern recognition
tasks where the cost of failure is low.
The second is “closed-loop” systems
where a real-time response is critical
and/or the pace of decision-making is
too fast to allow for human intervention,
such as realtime pricing (Amazon),
movie or music recommendations
(Netflix, Spotify, etc), or driverless cars.
The third is interactive response
systems like robots and chatbots.
26. Excursus: Explainable AI
If AI lacks the ability to explain itself in
these areas, then the risk of it making a
wrong decision may outweigh the
benefits it could bring in terms of the
speed, accuracy and efficiency of
decision-making. The effect would be to
severely limit its usage.
Examples
▸ Travel Expenses Analysis
▸ Project Risk Management
28. Six Core Aspects of Semantic AI
1. Data Quality
Semantically enriched data serves
as a basis for better data quality
and provides more options for
feature extraction.
2. Data as a Service
Linked data based on W3C Standards
can serve as an enterprise-wide data
platform and helps to provide
training data for machine learning in
a more cost-efficient way.
3. No black-box
Semantic AI ultimately leads to
AI governance that works on
three layers: technically,
ethically, and on the legal layer.
4. Hybrid approach
Semantic AI is the combination of
methods derived from symbolic AI
and statistical AI. It is not only
focused on process automation,
but also on intelligence
augmentation.
5. Structured data meets
text
Most machine learning algorithms
work well either with text or with
structured data. Semantic AI is based
on entity-centric data models.
6. Towards self-optimizing
machines
ML can help to extend
knowledge graphs, and in
return, knowledge graphs can
help to improve ML algorithms.
29. 1) Data Quality
PoolParty Semantic Classifier combines machine learning algorithms
(SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
Training data is
semantically
enriched with help
from semantic
knowledge models
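A minimal sketch of what "semantically enriched training data" could look like before it is handed to a classifier: concepts found in the text, plus their broader concepts, are added as extra features next to the plain words. The mini knowledge model, the `concept:` feature prefix and the `enrich` function are hypothetical and only illustrate the idea; they are not the PoolParty Semantic Classifier.

```python
import re
from collections import Counter

# Hypothetical mini knowledge model: surface label -> (concept, broader concept).
# All entries are invented for illustration.
KNOWLEDGE_MODEL = {
    "svm": ("Support Vector Machine", "Machine Learning"),
    "naive bayes": ("Naive Bayes", "Machine Learning"),
    "sparql": ("SPARQL", "Semantic Web"),
    "rdf": ("RDF", "Semantic Web"),
}


def enrich(text):
    """Return bag-of-words features plus concept features found in the text."""
    lowered = text.lower()
    features = Counter(lowered.split())
    for label, (concept, broader) in KNOWLEDGE_MODEL.items():
        # Whole-word match of the label; real systems use proper entity extraction.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            features["concept:" + concept] += 1
            features["concept:" + broader] += 1
    return features


doc_a = enrich("We trained an SVM")
doc_b = enrich("A Naive Bayes baseline")
# The two texts share no words, but they share a broader concept feature.
print(sorted(set(doc_a) & set(doc_b)))
```

The enriched feature dictionaries can be fed to any of the learning algorithms named on the slide; the shared concept feature gives the learner a signal that plain word overlap would miss.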
30. 2) Data as a Service
Unstructured Data
Structured Data
Knowledge Graphs
Machine
Learning
Semantic
Layer
Cognitive
Applications
Proposal for a
Cognitive Computing
Platform Architecture
31. 2) Data as a Service
The Semantic Layer
completes the
Four-layered Data &
Content Architecture
32. 3) No black-box
Explaining an image classification prediction made by Google’s Inception network, highlighting
positive pixels. The top 3 classes predicted are “Electric Guitar” (p = 0.32), “Acoustic guitar” (p =
0.24) and “Labrador” (p = 0.21)
From: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In
Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144). ACM.
Explainable AI is an AI whose
decision-making mechanism for
a specific problem can be
understood by humans who have
expertise in making decisions for
that specific problem.
Explainable AI has been used for years in AI systems
that are based on transparent methods.
These include Expert Systems or Symbolic
Reasoning Systems - anything that is
considered GOFAI (Good Old-Fashioned AI)
methods.
33. 3) No black-box
Explaining a text
classification
prediction made by
PoolParty Semantic
Suite, highlighting
positive concepts
and terms.
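To make the idea of "highlighting positive concepts and terms" concrete, here is a minimal sketch of an explanation for a linear text classifier: each known term contributes its weight to the class score, and terms with a positive contribution are marked in the text. The class, the weights and the function names are hypothetical; this is neither the PoolParty implementation nor the LIME method from the cited paper.

```python
import re

# Hypothetical weights of a linear text classifier for the class "Finance".
# Positive weights support the class, negative weights speak against it.
WEIGHTS = {"bank": 1.5, "loan": 1.0, "interest": 0.5, "river": -2.0}
BIAS = -0.5


def explain(text):
    """Return the class score and each known term's contribution to it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    contributions = {}
    for token in tokens:
        if token in WEIGHTS:
            contributions[token] = contributions.get(token, 0.0) + WEIGHTS[token]
    score = BIAS + sum(contributions.values())
    return score, contributions


def highlight(text, contributions):
    """Wrap terms with a positive contribution in square brackets."""
    def mark(match):
        word = match.group(0)
        return f"[{word}]" if contributions.get(word.lower(), 0.0) > 0 else word
    return re.sub(r"[A-Za-z]+", mark, text)


sentence = "The bank approved the loan"
score, contributions = explain(sentence)
print(score, contributions)
print(highlight(sentence, contributions))
```

Because the score is a plain sum of per-term contributions, a domain expert can see which terms drove the decision and correct the model or the input accordingly.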
34. 4) Hybrid Approach
Artificial Intelligence
ANN
Symbolic AI Sub-Symbolic AI Statistical AI
KR & reasoning
NLP
Machine Learning
Word Embedding Deep Learning
Natural Language
Understanding
Entity Recognition &
Linking
Knowledge Extraction
Semantic enhanced
Text Classification
In Semantic AI, various methods from
Symbolic AI are combined with
machine learning methods and/or
neural networks.
Examples:
● Semantic enrichment of
text corpora to enhance
word embeddings
● Extraction of semantic features
from text to improve ML-based
classification tasks
● Combine ML-based with Graph-based
entity extraction
● Knowledge Graphs as a Data
Model for Machine Learning
● ….
35. 5) Structured data meets text
Traditionally, when faced with heterogeneous knowledge in a machine learning context, data scientists preprocess the data and engineer feature vectors so they can be used as input for learning algorithms (e.g., for classification).
These transformations can result in loss of information and introduce bias. To solve this problem, we require machine learning methods to consume knowledge in a data model more suited to represent this heterogeneous knowledge. We argue that knowledge graphs are that data model.
Three examples for the benefits of using knowledge graphs:
▸ they allow for true end-to-end learning,
▸ they simplify the integration of heterogeneous data sources and data harmonization,
▸ they provide a natural way to seamlessly integrate different forms of background knowledge.
Wilcke X, Bloem P, De Boer V. The Knowledge Graph as the Default Data Model for Machine Learning. Data Science. 2017 Oct 17;1-19. DOI: 10.3233/DS-170007
40. Labels and basic relations:
Taxonomies and Thesauri
prefLabel
Venice
prefLabel
St. Mark’s Square
altLabel
Piazza
San Marco
Peggy
Guggenheim
Museum
prefLabel
Piazza
altLabel
Town Square
related
related
prefLabel
broader
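The labels and basic relations of such a thesaurus can be sketched as a small data structure. The labels below are taken from the slide; which concepts the "broader" and "related" edges actually connect is an assumption, since the diagram itself is not preserved here, and the `Concept` class and `find_concept` helper are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)   # pref labels of broader concepts
    related: list = field(default_factory=list)   # pref labels of related concepts


# Assumed edges: the square is a kind of piazza; square and museum relate to Venice.
THESAURUS = {c.pref_label: c for c in [
    Concept("Venice"),
    Concept("Piazza", alt_labels=["Town Square"]),
    Concept("St. Mark's Square", alt_labels=["Piazza San Marco"],
            broader=["Piazza"], related=["Venice"]),
    Concept("Peggy Guggenheim Museum", related=["Venice"]),
]}


def find_concept(label):
    """Resolve a preferred or alternative label (case-insensitive) to a concept."""
    wanted = label.casefold()
    for concept in THESAURUS.values():
        labels = [concept.pref_label, *concept.alt_labels]
        if any(wanted == candidate.casefold() for candidate in labels):
            return concept
    return None


hit = find_concept("piazza san marco")
print(hit.pref_label, hit.broader, hit.related)
```

Resolving alternative labels to one concept is what lets different spellings and synonyms in text point to the same node in the graph.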
41. Classes, specific relations, restrictions:
Ontologies and custom schemas
prefLabel
Venice
prefLabel
St. Mark’s Square
altLabel
Piazza
San Marco
http://schema.org/City
http://schema.org/TouristAttraction
http://schema.org/ArtGallery
Monday through
Sunday, all day
opening
Hours
image
http://schema.org/containedInPlace
prefLabel
Piazza
altLabel
Town Square
Peggy
Guggenheim
Museum
prefLabel
containedInPlace
containedInPlace
broader
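A rough sketch of how classes and a restriction add meaning on top of plain labels. The schema.org identifiers appear on the slide; the assignment of classes to entities, the specific restriction (containedInPlace must point to a City) and the helper functions are assumptions made for illustration only.

```python
SCHEMA = "http://schema.org/"
CONTAINED_IN_PLACE = SCHEMA + "containedInPlace"

# Entity -> class; the assignment is inferred from the labels on the slide.
TYPES = {
    "Venice": SCHEMA + "City",
    "St. Mark's Square": SCHEMA + "TouristAttraction",
    "Peggy Guggenheim Museum": SCHEMA + "ArtGallery",
}

# Hypothetical custom restriction: containedInPlace must point to a City.
ALLOWED_RANGE = {CONTAINED_IN_PLACE: {SCHEMA + "City"}}

FACTS = [
    ("St. Mark's Square", CONTAINED_IN_PLACE, "Venice"),
    ("Peggy Guggenheim Museum", CONTAINED_IN_PLACE, "Venice"),
]


def violations(facts):
    """Return facts whose object is not typed with an allowed class."""
    bad = []
    for subject, predicate, obj in facts:
        allowed = ALLOWED_RANGE.get(predicate)
        if allowed is not None and TYPES.get(obj) not in allowed:
            bad.append((subject, predicate, obj))
    return bad


def entities_in(place, cls):
    """Entities of class `cls` that are directly contained in `place`."""
    return sorted(s for s, p, o in FACTS
                  if p == CONTAINED_IN_PLACE and o == place and TYPES.get(s) == cls)


print(violations(FACTS))
print(entities_in("Venice", SCHEMA + "ArtGallery"))
```

With classes in place, a query can ask for "art galleries in Venice" instead of matching strings, and a restriction can flag facts that do not fit the schema.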
42. Metadata and Graph Annotations
prefLabel
Venice
prefLabel
St. Mark’s Square
altLabel
Piazza
San Marco
http://schema.org/City
http://schema.org/TouristAttraction
http://schema.org/ArtGallery
Monday through
Sunday, all day
opening
Hours
image
http://schema.org/containedInPlace
prefLabel
Piazza
altLabel
Town Square
Peggy
Guggenheim
Museum
prefLabel
containedInPlace
containedInPlace
CC BY-SA 3.0
broader
43. Entity Linking & schema mapping:
Links to other graphs
prefLabel
Venice
prefLabel
St. Mark’s Square
altLabel
Piazza
San Marco
http://schema.org/City
http://schema.org/TouristAttraction
http://schema.org/ArtGallery
Monday through
Sunday, all day
opening
Hours
image
http://schema.org/containedInPlace
prefLabel
Piazza
altLabel
Town Square
Peggy
Guggenheim
Museum
prefLabel
containedInPlace
containedInPlace
CC BY-SA 3.0
broader
44. Linking to metadata, data and documents
stored in other systems
prefLabel
Venice
prefLabel
St. Mark’s Square
altLabel
Piazza
San Marco
http://schema.org/City
http://schema.org/TouristAttraction
http://schema.org/ArtGallery
Monday through
Sunday, all day
opening
Hours
image
http://schema.org/containedInPlace
prefLabel
Piazza
altLabel
Town Square
broader
Peggy
Guggenheim
Museum
prefLabel
containedInPlace
containedInPlace
CC BY-SA 3.0
The Peggy
Guggenheim
Collection is
a modern art
museum on the
Grand Canal in
the Dorsoduro
sestiere of
Venice, Italy.
45. 4 Pillars of successful Knowledge Graphs
● Keep your Knowledge Graph Alive!
● Knowledge Graphs Help you Scale with Reusable Existing Data
● Use AI to Automate your Knowledge Graph
● Accelerating the Speed of DataOps with Knowledge Graphs
Source: Knowledge Graphs, Transforming Data into Knowledge (PoolParty)
46. Use Cases for Semantic AI
#Knowledge Graphs, Semantic Layer,
Machine Learning
47. 5 generic Use Cases for Semantic AI
1. Deal with hierarchical or highly connected datasets
more efficiently
2. Gain new insights based on entity-centric views (in
contrast to document-centric views)
3. Understand and calculate causalities and the effects
in a knowledge domain
4. Integrate heterogeneous data sources (structured
& unstructured) based on a “schema-late” approach
5. Create federated (unified) views across multiple
data silos within the enterprise
48. Example: Research in Life Sciences
As a researcher in the pharmaceutical industry,
I want to plan new experiments more
efficiently.
I want to know what’s already available.
I’m interested in former experiments
where
● certain genes were tested
● under specific treatment conditions
● in a target therapeutic area
● with help from categorisation systems
like ‘disease hierarchies’
UniProt, ChEMBL
Experiments
Documentation
MeSH
DrugBank
→ Linking Structured to Unstructured
Data and to Industry Knowledge Graphs
49. Making use of Knowledge Graphs
Experiments
Document
Store
→ Knowledge Graphs serve as a means to enrich
unstructured information to provide a rich set of
additional access points to document repositories
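One way to picture these additional access points is a concept index over the document store: every document is indexed under the concepts it mentions and under all broader concepts. The hierarchy, labels and documents below are invented for illustration and do not come from MeSH, DrugBank or any real experiment repository.

```python
import re
from collections import defaultdict

# Hypothetical disease hierarchy (child -> parent) and surface labels.
BROADER = {"Melanoma": "Skin Cancer", "Skin Cancer": "Cancer"}
LABELS = {"melanoma": "Melanoma", "skin cancer": "Skin Cancer"}

# Hypothetical experiment documentation.
DOCUMENTS = {
    "exp-001": "Gene expression tested in melanoma cell lines.",
    "exp-002": "Dosage study for skin cancer treatment.",
    "exp-003": "Stability test of a buffer solution.",
}


def ancestors(concept):
    """Yield the concept itself and all broader concepts up the hierarchy."""
    while concept is not None:
        yield concept
        concept = BROADER.get(concept)


def build_index(documents):
    """Map each concept (including broader ones) to the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for label, concept in LABELS.items():
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                for entry in ancestors(concept):
                    index[entry].add(doc_id)
    return index


index = build_index(DOCUMENTS)
# A search for the broad concept finds documents that never use the word "cancer" alone.
print(sorted(index["Cancer"]))
```

A researcher can then enter the repository at any level of the hierarchy, which is the "rich set of additional access points" the slide refers to.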
50. The LinkedIn Economic Graph
The LinkedIn Economic Graph is
a digital representation of the
global economy based on
▸ 560 million members,
▸ 50 thousand skills,
▸ 20 million companies,
▸ 15 million open jobs, and
▸ 60 thousand schools.
https://economicgraph.linkedin.com/
“LinkedIn has a vast quantity of data. While much of the data is
structured—graph nodes and edges, normalized fields in database
records—a great deal of it is simply natural language text.
Attaching structure and meaning to this text is essential to
LinkedIn’s overall mission of connecting its members to
opportunity.”
52. Why Data Scientists need semantic models
▸ Data Quality & Data Governance
▹ Content aboutness in a defined framework
▹ Data relationships and context within a
unified organizational model
▹ Connections across disparate datasets
▸ Improved Machine Learning
▹ Hierarchical or other mapped relationships allow
for recommending similar content when exact
matches are not found
▹ Granularity allows for more specific
recommendations
▹ Consistency across structure results in more precise
analysis and predictions
Source: Suzanne Carroll, Data Science Product Director at XO Group
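The point about recommending similar content when exact matches are not found can be sketched as a fallback along the hierarchy: if nothing is tagged with the requested concept, widen the search to its broader concept and everything below it. The taxonomy, the catalogue and the function names are hypothetical.

```python
# Hypothetical taxonomy (child -> parent) and tagged content; invented for illustration.
BROADER = {
    "Beach Wedding": "Outdoor Wedding",
    "Garden Wedding": "Outdoor Wedding",
    "Outdoor Wedding": "Wedding",
}
CATALOGUE = {
    "article-1": "Garden Wedding",
    "article-2": "Outdoor Wedding",
    "article-3": "Wedding",
}


def narrower_or_self(concept):
    """Return the concept plus everything below it in the hierarchy."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in BROADER.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def recommend(concept):
    """Return (matched_concept, items): exact matches first, else widen step by step."""
    exact = sorted(item for item, tag in CATALOGUE.items() if tag == concept)
    if exact:
        return concept, exact
    current = BROADER.get(concept)
    while current is not None:
        scope = narrower_or_self(current)
        items = sorted(item for item, tag in CATALOGUE.items() if tag in scope)
        if items:
            return current, items
        current = BROADER.get(current)
    return None, []


print(recommend("Garden Wedding"))
print(recommend("Beach Wedding"))
```

Returning the concept that was finally matched keeps the recommendation explainable: the application can tell the user that the results come from a broader category.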
54. Take home: Semantic AI
● Semantic Data Lake: interlinked data & metadata full of context & meaning
● High data quality along the whole data lifecycle
● Scale with reusable existing data for several (AI) applications
● Bridge the gap between structured and unstructured data
● From ‘black box’ to ‘glass box’: transparency, trust & better decision making
● Support data governance and data stewardship
● Ensure an agile approach and not a top-down only Semantic AI strategy
● Adapt your business models along your Semantic AI Strategy
57. Linked Data Lifecycle
PLUS:
Knowledge Graphs
support
• Data Governance
• Data Stewardship
• Reuse of Data
Further Reading: How to build Enterprise Knowledge Graphs?
58. Resources and further reading...
▸ White Paper: Explainable AI (PoolParty Semantic Suite)
▸ White Paper: Knowledge Graphs (PoolParty Semantic Suite)
▸ BDVA AI Positioning Paper 2018 (Big Data Value Association)
▸ Allan Morrison: Collapsing the IT Stack (PWC, at SEMANTiCS2018, Vienna)
▸ Clarify Strategy and Tactics for Artificial Intelligence by Separating Training and Machine Learning (Gartner)
▸ Breaking Down Data Silos (Harvard Business Review)
▸ CIOs Can Manage the Risks of AI Investments (Gartner)
▸ Understanding Machines, Explainable AI (Accenture Labs)
▸ Why should I trust you?: Explaining the predictions of any classifier (Ribeiro, M. T., Singh, S., & Guestrin, C.)
▸ The Knowledge Graph as the Default Data Model for Machine Learning (Wilcke X, Bloem P, De Boer V.)
▸ Mike Bergman