Speaker: Nir Avrahamov, Developer Relations, Neo4j
Abstract: Knowledge graphs are driving industry disruption and business transformation by bringing together previously disparate data, using connections for superior decision support, and adding context for more intelligent applications (including AI). In this session, we’ll walk through the fundamental elements of knowledge graphs including contextual relevancy, dynamic self-updating, understandability with intelligent metadata, and the combination of heterogeneous data.
Our use cases will cover the 3 main types of knowledge graphs (context-rich search, external insights sensing, and enterprise NLP), which build on each other. You’ll hear about real-world examples from organizations such as Refinitiv, a leading provider of financial information, the German Center for Diabetes Research, eBay, and NASA.
We’ll also cover how you can quickly and easily build analytical applications on top of your knowledge graph using Neo4j Solution Frameworks. Attend this session to see real-world knowledge graphs and walk away with practical approaches for building your knowledge graph and leveraging it for business applications.
2. Index-Free Adjacency - What is it?
• While any database can represent a graph, only a native graph database
makes the graph structure explicit
• In a native graph database, each node (or vertex) stores a collection of pointers to
its adjacent nodes
• This means that as the database grows in size, the cost of each hop
remains constant instead of growing with the total amount of data.
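A minimal sketch of the idea in plain Python (not Neo4j's actual storage engine): each hypothetical `Node` object keeps direct references to its neighbors, so a hop walks that node's own list rather than searching a global index.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A hypothetical node that keeps direct references (pointers) to its neighbors."""

    name: str
    # repr=False avoids infinite recursion when printing cyclic neighbor links.
    neighbors: list[Node] = field(default_factory=list, repr=False)

    def connect(self, other: Node) -> None:
        # Undirected relationship: each endpoint records the other directly.
        self.neighbors.append(other)
        other.neighbors.append(self)


def hop(node: Node) -> list[str]:
    # A hop reads only this node's own neighbor list. No global index is
    # searched, so the work depends on this node's degree, not on graph size.
    return [n.name for n in node.neighbors]


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.connect(bob)
bob.connect(carol)
print(hop(bob))  # ['alice', 'carol']
```

Adding any number of unrelated nodes elsewhere in the graph leaves the work done by `hop(bob)` unchanged, which is the property the slide describes as a constant cost per hop.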
3. Neo4j: Why use Native Graph?
Real-time Transactional and Analytic Processing
• Operational workloads
• Analytics workloads
Discovery and Visualization
• Interactive graph exploration
• Graph representation of data
Agility
• Native property graph model
• Dynamic schema
Developer Productivity
• Cypher - Declarative query language
• Procedural language extensions
• Worldwide developer community
Hardware Efficiency
• 10x less CPU with index-free adjacency
• 10x less hardware than other platforms
Performance
• Index-free adjacency
• Millions of hops per second
4. The Knowledge Graph Problem
Organizations have difficulty maintaining their corporate memory for a
variety of reasons:
• Growth, which drives the need for new and continuous education
• Digitalization / Digital Transformation initiatives to identify new markets
• Turnover, where long-term knowledge is lost
• Aging infrastructure and siloed information
5. Key Principles of a Knowledge Graph
• Related entities are connected (contextually related).
• Dynamically updating, not manual: uses intelligent labelling and ties into
the graph automatically.
• Explainable: intelligent metadata helps traverse the graph to find answers to
specific problems, even when we don’t know exactly how to ask for them.
• Usually contains heterogeneous data types: it combines and uncovers
connections across silos of information.
6. Knowledge Graph vs. Knowledge Base
“Unlike a simple knowledge base with flat structures and static
content, a knowledge graph acquires and integrates adjacent
information using data relationships to derive new knowledge.”
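One way to picture "deriving new knowledge" from relationships is rule-based inference over existing edges. The sketch below is a hypothetical example, not something the slide specifies: given ownership facts, it walks chains of `OWNS` relationships and emits `INDIRECTLY_OWNS` facts that no single record states. The names `FACTS`, `derive_indirect`, and both relationship types are made up for illustration.

```python
from collections import deque

# Hypothetical facts stored as (subject, relationship, object) triples.
FACTS = [
    ("HoldCo", "OWNS", "SubA"),
    ("SubA", "OWNS", "SubB"),
    ("SubB", "OWNS", "SubC"),
    ("Other", "OWNS", "SubX"),
]


def derive_indirect(triples, rel="OWNS"):
    """Return new (s, 'INDIRECTLY_<rel>', o) facts implied by chains of rel."""
    graph = {}
    for s, r, o in triples:
        if r == rel:
            graph.setdefault(s, []).append(o)
    direct = {(s, o) for s, r, o in triples if r == rel}

    derived = set()
    for start in graph:
        # Breadth-first walk along rel edges from each subject.
        seen, queue = set(), deque(graph[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node != start and (start, node) not in direct:
                derived.add((start, "INDIRECTLY_" + rel, node))
            queue.extend(graph.get(node, []))
    return derived


for fact in sorted(derive_indirect(FACTS)):
    print(fact)
```

A flat knowledge base holding the same four rows would only answer questions about direct ownership; the derived facts come entirely from following relationships.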
12. Strictly Confidential
Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
neo4j.com/docs/graph-algorithms/current/
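PageRank, one of the centrality algorithms listed above, scores a node by the scores of the nodes that link to it. The following is a textbook power-iteration sketch in plain Python, assuming a damping factor of 0.85 and spreading the rank of nodes without outgoing links evenly over all nodes. It is not the Neo4j library's implementation; `pagerank` and the sample `links` graph are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (an illustrative sketch).

    graph maps each node to the list of nodes it links to. Rank held by
    nodes with no outgoing links is spread evenly over all nodes.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Every node gets the teleport share plus an even share of dangling rank.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            for t in targets:
                # Each node splits its damped rank equally across its out-links.
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Hypothetical link graph: "c" is linked to by both "b" and "d".
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))
```

Because "d" has no incoming links, it keeps only the teleport share, while "c" collects rank from two sources and ends up the most influential node.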
13. Algorithms - Pathfinding & Search
Analyzing network flow
• Parallel Breadth-First Search & Depth-First Search
○ Traverses the graph by exploring nearest neighbors first (BFS) or following each branch
as deep as possible (DFS)
• Single-Source Shortest Path
○ Calculates the shortest path between a node and all other nodes
• All-Pairs Shortest Path
○ Calculates the shortest paths between all pairs of nodes
• Minimum Weight Spanning Tree
○ Calculates the set of relationships with the smallest total weight that connects all nodes
Least Cost Routing
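The traversal and shortest-path ideas above can be sketched in plain Python. This is a minimal illustration, not Neo4j's parallel implementation: `bfs` and `dfs` show the two visiting orders, and `single_source_shortest_paths` uses Dijkstra's algorithm (assuming non-negative weights) to compute least-cost distances from one node, as in least cost routing. The `routes` graph and its weights are hypothetical.

```python
import heapq
from collections import deque

# Hypothetical weighted, directed graph: node -> [(neighbor, cost), ...]
routes = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}


def bfs(graph, start):
    """Breadth-first: visit nearest neighbors before more distant nodes."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr, _ in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs(graph, start):
    """Depth-first: follow each branch as far as possible before backtracking."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbors are explored in their listed order.
        for nbr, _ in reversed(graph.get(node, [])):
            if nbr not in seen:
                stack.append(nbr)
    return order


def single_source_shortest_paths(graph, source):
    """Dijkstra's algorithm: least total cost from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry; a cheaper path was already found
        for nbr, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return dist


print(bfs(routes, "A"))  # ['A', 'B', 'C', 'D']
print(dfs(routes, "A"))  # ['A', 'B', 'D', 'C']
print(single_source_shortest_paths(routes, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Note that the cheapest route to "B" (cost 3, via "C") is not the direct edge (cost 4), which is exactly what least cost routing has to discover.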
14. Algorithms - Centrality & Community Detection
Detecting Financial Fraud
Large financial institutions have existing pipelines to identify fraud via
knowledge graphs, heuristics, and ML models.
• Connected Components to identify disjoint graphs sharing identifiers
• PageRank to measure influence and transaction volumes
• Louvain to identify communities that frequently interact
• Jaccard to measure account similarity
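Two of the fraud techniques on this slide are simple enough to sketch directly. Assuming accounts are linked to the identifiers they use (phone numbers, emails, devices; all hypothetical here), connected components via union-find groups accounts that share any identifier, and Jaccard similarity compares two accounts' identifier sets. This is an illustrative sketch, not a production fraud pipeline, and Louvain is not shown.

```python
# Hypothetical (account, identifier) links, e.g. shared phones or emails.
ACCOUNT_LINKS = [
    ("acct1", "phone:555-0100"),
    ("acct2", "phone:555-0100"),
    ("acct2", "email:x@example.com"),
    ("acct3", "email:x@example.com"),
    ("acct4", "device:abc"),
]


def connected_components(edges):
    """Group nodes joined by any path, using union-find (disjoint sets)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    combined = a | b
    return len(a & b) / len(combined) if combined else 0.0


for group in connected_components(ACCOUNT_LINKS):
    print(sorted(n for n in group if n.startswith("acct")))

identifiers = {}
for acct, ident in ACCOUNT_LINKS:
    identifiers.setdefault(acct, set()).add(ident)
print(jaccard(identifiers["acct2"], identifiers["acct3"]))  # 0.5
```

Accounts 1 and 3 never share an identifier directly, yet they land in the same component through account 2; surfacing such indirect rings is why connected components is a common first step.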
15. Background
• Brazil's largest bank, #38 on Forbes G2000
• $61B annual sales, 95K employees
• Most valuable brand in Brazil
• 28.9M credit card & 25.6M debit card accounts
• High integrity, customer-centric values
Business Problem
• Data silos made assessing credit worthiness hard
• High sensitivity to fraud activity
• 73% of all transactions made over internet and mobile
• Needed real-time detection for 2,000 analysts
• Scale to trillions of relationships
Solution and Benefits
• Credit monitoring and fraud detection application
• 4.2M nodes & 4B relationships for 100 analysts
• Grow to 93T relationships for 2000 analysts by 2021
• Real time visibility into money flow across multiple
customers
Itau Unibanco FINANCIAL SERVICES
Fraud Detection / Credit Monitoring
CE Customer since 2016 Q1
EE Customer since 2017 Q2
16. Algorithms - Link Prediction
Mining Data for Drug Discovery
het.io - HetioNet
Knowledge graph integrating 50+ years of biomedical data
Leveraged to predict new uses for drugs by using the graph topology to create
features to predict new links
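The slide does not say which topological features HetioNet used. As a hedged illustration of the general approach, the sketch below computes the link-prediction measures from the Neo4j algorithms list (Common Neighbors, Total Neighbors, Preferential Attachment, Adamic Adar, Resource Allocation) for candidate pairs in a small hypothetical compound-gene-disease graph; such scores could then serve as features for a classifier.

```python
import math

# Hypothetical compound-gene-disease edges (treated as undirected here).
BIO_EDGES = [
    ("drugA", "gene1"),
    ("drugA", "gene2"),
    ("diseaseX", "gene1"),
    ("diseaseX", "gene2"),
    ("drugB", "gene2"),
]


def adjacency(edges):
    """Build an undirected neighbor-set map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def link_features(adj, x, y):
    """Classic neighborhood-based link-prediction scores for the pair (x, y)."""
    nbrs_x, nbrs_y = adj.get(x, set()), adj.get(y, set())
    common = nbrs_x & nbrs_y
    return {
        "common_neighbors": len(common),
        "total_neighbors": len(nbrs_x | nbrs_y),
        "preferential_attachment": len(nbrs_x) * len(nbrs_y),
        # A shared neighbor of two distinct nodes has degree >= 2, so log > 0.
        "adamic_adar": sum(1 / math.log(len(adj[u])) for u in common),
        "resource_allocation": sum(1 / len(adj[u]) for u in common),
    }


adj = adjacency(BIO_EDGES)
print(link_features(adj, "drugA", "diseaseX"))
print(link_features(adj, "drugB", "diseaseX"))
```

Here "drugA" shares two genes with "diseaseX" and scores higher on every measure than "drugB", so it would be the stronger candidate for a new treats-style link.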
18. Generating Insights & Recommendations From Your Graph
[Diagram: Intelligent Recommendations Framework. Components shown: Data Sources, Click Stream Data, Session Data, Feedback, RSS Feed, Org. Feed (Graph), Data Orchestration Layer, Graph Algorithms, AI / ML, Recommendation Engines (Discovery, Exclude, Boost, Diversity, User Segmentation, Item Similarity), Scored Recommendations, CLIENT, Admin Dashboard]
• Strategic Data Modelling
• Continuous Data Capture
• Automated Tagging & Labelling (NLP)
• Real-time Scoring Pipelines & Algos
• Preserved Data Lineage
• Relevant Alerting
• Auto & Semi-auto deduplication/entity resolution
• ML integration
19. Hybrid Scoring-Based Approach is More Contextual
Graph technology enables you to make recommendations that weight multiple methods:
• Collaborative Filtering: based on user action history or product interaction
• Content Filtering: based on the user's profile or product attributes
• Rules-Based Filtering: based on predefined rules and criteria
• Business Strategy: based on promotions, margins, inventory
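A hybrid score can be sketched as a weighted sum of per-method signals combined with rule-based filtering. In this hypothetical Python example, collaborative and content scores and a margin signal are combined with made-up weights, out-of-stock items are removed by a rule, and an `exclude` set mirrors the framework's Exclude step. The fields, weights, and items are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item: str
    collaborative: float  # hypothetical co-purchase signal, scaled 0..1
    content: float        # hypothetical profile/attribute match, 0..1
    margin: float         # hypothetical business-strategy signal, 0..1
    in_stock: bool        # input to a rules-based filter


# Hypothetical weights; a real system would tune these.
WEIGHTS = {"collaborative": 0.5, "content": 0.3, "margin": 0.2}


def hybrid_score(c: Candidate) -> float:
    """Weighted sum of the per-method signals."""
    return (WEIGHTS["collaborative"] * c.collaborative
            + WEIGHTS["content"] * c.content
            + WEIGHTS["margin"] * c.margin)


def recommend(candidates, top_k=2, exclude=frozenset()):
    # Rules-based filtering first: drop out-of-stock and explicitly excluded items.
    eligible = [c for c in candidates if c.in_stock and c.item not in exclude]
    ranked = sorted(eligible, key=hybrid_score, reverse=True)
    return [(c.item, round(hybrid_score(c), 3)) for c in ranked[:top_k]]


CANDIDATES = [
    Candidate("tent", 0.9, 0.4, 0.2, True),
    Candidate("stove", 0.6, 0.8, 0.5, True),
    Candidate("boots", 0.8, 0.9, 0.9, False),
    Candidate("lamp", 0.2, 0.5, 0.9, True),
]
print(recommend(CANDIDATES))
print(recommend(CANDIDATES, exclude={"stove"}))
```

"tent" has the strongest collaborative signal alone, but "stove" wins once content match and margin are weighted in; "boots" would score highest of all but is removed by the stock rule.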
20. Query-Based Knowledge Graphs
Connecting the Dots
• Multiple graph layers of financial information
• Includes corporate data with cross-relationships, external news, and
customized weighting
Dashboards and tools
• Credit risk
• Investment risk
• Portfolio news recommendations
has become...
22. Background
• Personal shopping assistant
• Converses with buyer via text, picture and voice
to provide real-time recommendations
• Combines AI and natural language understanding
(NLU) in Neo4j Knowledge Graph
• First of many apps in eBay's AI Platform
Business Problem
• Improve personal context in online shopping
• Transform buyer-provided context into ideal
purchase recommendations over social platforms
• "Feels like talking to a friend"
Solution and Benefits
• 3 developers, 8M nodes, 20M relationships
• Needed high-performance traversals to respond
to live customer requests
• Easy to train new algorithms and grow model
• Generating revenue since launch
eBay Conversational Commerce ONLINE RETAIL
Knowledge Graph powers Real-Time Recommendations
EE Customer since 2016 Q3
33. Background
• Large global bank
• Deploying Reference Data to users and systems
• 12 data domains, 18 datasets, 400+ integrations
• Complex data management infrastructure
Business Problem
• Master data silos were inflexible and hard to
consume
• Needed simplification to reduce redundancy
• Reduce risk when data is in consumers’ hands
• Dramatically improve efficiency
Solution and Benefits
• Data distribution flows improved dramatically
• Knowledge Base improves consumer access
• Ad-hoc analytics improved
• Governance, lineage and trust improved
• Better service level from IT to data consumers
UBS FINANCIAL SERVICES
Master Data Management / Knowledge Graph
CE Customer since 2016 Q1
EE Customer since 2015
34. Background
• SF-based C2C rental platform
• Dataportal democratizes data access for
growing number of employees while improving
discoverability and trust
• Data strewn everywhere: in silos and in segmented
departments; nothing was universally accessible
Business Problem
• Data-driven culture hampered by variety and
dependability of data, tribal knowledge and
word-of-mouth distribution
• Needed visibility into information usage, context,
lineage and popularity across a company of 3,000+ employees
Solution and Benefits
• Offers search with context & metadata, user &
team-centric pages for origin & lineage
• Nodes are resources: data tables, dashboards,
reports, users, teams, business outcomes, etc.
• Relationships reflect consumption, production,
association, etc.
• Neo4j, Elasticsearch, Python
Airbnb Dataportal TRAVEL TECHNOLOGY
Knowledge Graph, Metadata Management
CE users since 2017
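The Dataportal model (resources as nodes, consumption and production as relationships) supports lineage and popularity queries. The following hypothetical sketch stores a few such relationships as triples, walks `PRODUCES`/`FEEDS` edges to find everything downstream of a table, and counts `CONSUMES` edges as a crude popularity signal. The resource names and relationship types are invented; the slide does not describe Airbnb's actual schema.

```python
from collections import defaultdict, deque

# Hypothetical (source, relationship, target) edges in a metadata graph.
LINEAGE_EDGES = [
    ("etl_job", "PRODUCES", "bookings_table"),
    ("bookings_table", "FEEDS", "revenue_dashboard"),
    ("bookings_table", "FEEDS", "host_report"),
    ("revenue_dashboard", "FEEDS", "exec_summary"),
    ("alice", "CONSUMES", "revenue_dashboard"),
    ("finance_team", "CONSUMES", "revenue_dashboard"),
    ("bob", "CONSUMES", "host_report"),
]


def downstream(edges, start):
    """All resources reachable from start via PRODUCES/FEEDS (lineage)."""
    graph = defaultdict(list)
    for src, rel, dst in edges:
        if rel in ("PRODUCES", "FEEDS"):
            graph[src].append(dst)
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def popularity(edges):
    """Count CONSUMES relationships per resource."""
    counts = defaultdict(int)
    for _, rel, dst in edges:
        if rel == "CONSUMES":
            counts[dst] += 1
    return dict(counts)


print(sorted(downstream(LINEAGE_EDGES, "bookings_table")))
print(popularity(LINEAGE_EDGES))
```

The same downstream walk doubles as an impact analysis: if `bookings_table` changes, every resource it returns may be affected.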
35. Background
• 5-year drug discovery research effort
• Parse & navigate over 25 million scientific papers
• Sourced from the National Library of Medicine and
tagged with “Medical Subject Headings” (MeSH tags)
Business Problem
• Seeking to automate phenotype, compound and
protein cell behavior research by using previously
documented research more effectively
• Text mining for research elements like DNA strings,
proteins, RNA, chemicals and diseases
Solution and Benefits
• Found ways to identify compound interaction
behavior from millions of research documents
• Relations between biological entities can be
identified and validated by biology experts
• Still very challenging to keep up-to-date, add
genomics data, and find a breakthrough
Novartis PHARMACEUTICAL RESEARCH
Content Management / Biomedical Research
CE Customer since 2016 Q1
CE Customer since 2012
36. Background
• How Neo4j is used in investigations
• Non-technical reporters manually gather data
• “Low-tech” data curation
• Journalists want to model data as a story, not
as data
Business Problem
• Identify repeated business relationships among
individuals and their holdings and accounts
• Scan documents and identify possible entities,
then create relationships between people and
documents.
• Handling variations in names and aliases
Solution and Benefits
• Uses Neo4j in “story discovery” phase
• Uncovers shortest paths for leads for reporters
• Many investigations underway now
Columbia University EDUCATION
Investigative Journalism / Fraud Detection
CE Customer since 2016 Q1
EE Customer since 2015 Q4
37. Background
• Large Nordic telecom provider
• 1M broadband routers deployed in Sweden
• Half of subscribers are over 55 years old
• Each household connects 10 devices
• Goal to improve customer experience
Business Problem
• Broadband router enhancement to improve
customer experience
• Context-based in home services
• How to build smart home platform that allows
vendors to build new “home-centric” apps
Solution and Benefits
• New features deployed to 1M homes
• API-based platform for easy apps that:
• Automatically assemble Spotify playlists
based on who is in the house
• Notify parents when children get home
• Build smart shopping lists
TELIA ZONE TELECOMMUNICATIONS
Smart Home / Internet of Things
EE Customer since 2016 Q4
38. Business Problem
• Needed new asset management backbone to
handle scheduling, ads, sales and pushing linear
streams to satellites
• Novell LDAP content hierarchy not flexible
enough to store graph-based business content
Solution and Benefits
• Neo4j selected for performance and domain fit
• Flexible, native storage of content hierarchy
• Graph includes metadata used by all systems:
TV series → Episodes → Blocks with Tags →
Linked Content, tagged with legal rights, actors,
dubbing, etc.
Background
• Nashville-based developer of lifestyle-
oriented content for TV, digital, mobile and
publishing
• Web properties generate tens of millions of
unique visitors per month
Scripps Networks MEDIA AND ENTERTAINMENT
Knowledge Graph / Asset Management
39. Business Problem
• Needed to reimagine existing system to beat
competition and provide 360-degree view of
customers
• Channel complexity necessitated move to graph
database
• Needed an enterprise-ready solution
Solution and Benefits
• Leapfrogged competition and increased digital
business by 23%
• Handles new data from mobile, social
networks, experience and governance sources
• After the launch of the new Neo4j-based MDM, Pitney Bowes
stock was rated a Buy
Background
• Connecticut-based leader in digital marketing
communications
• Helps clients provide omni-channel experience
with in-context information
Pitney Bowes MARKETING COMMUNICATIONS
Master Data Management
40. Background
• Large Public University – “U-Dub”
• IT staff for 80K+ students and employees
• Transforming IT systems from mainframe to cloud
• Providing IT & data warehousing services to 3
campuses, 6 hospitals, and 6,300 EDW users
Business Problem
• Old SharePoint metadata was too complicated
for users, not flexible and not transparent
• $1B project to migrate HR system from
mainframe to Workday needed to be smooth
• Future projects needed repeatable predictability
• Needed new glossary, impact analysis, analytics
Solution and Benefits
• Consulted with NDU peers, built simple model
• Built Visualizer with Elasticsearch, Neo4j & D3.js
• Improved predictability, lineage, and impact
understanding for over 6,300 users
University of Washington EDUCATION & RESEARCH
Metadata Management, IT & Network Operations
CE Customer since 2016 Q1
41. Background
• World's largest hospitality / hotel company
• 7th-largest website on the internet
• 1.5M hotel rooms offered online by 2018
• Revenue Management System that allows
property managers to update their pricing rates
Business Problem
• Provide the right room & price at the right time
• Old rate program was inflexible and bogged down
as they increased the pricing options per property
per day
• Lay the path to be an innovator in the future
Solution and Benefits
• 2016-era rate program embeds Neo4j as "cache"
• Created a graph per hotel for 4500 properties in
3 clusters
• 1000% increase in volume over 4 years
• 50% decrease in infrastructure costs
• "Use Neo4j Support!"
MARRIOTT TRAVEL & HOSPITALITY SERVICES
Pricing Recommendations Engine
EE Customer since 2014 Q2
42. Better Predictions with Graphs
Using the Data You Already Have
• Current data science models ignore network structure
• Graphs add highly predictive features to ML models, increasing accuracy
• Otherwise unattainable predictions based on relationships
Machine Learning Pipeline
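As a minimal sketch of how graph structure becomes ML features, the code below computes each node's degree, triangle count, and local clustering coefficient and merges them into existing per-entity feature rows. The row format, the `id` key, and the sample data are hypothetical; a real pipeline would typically compute such features in the graph database and export them.

```python
from itertools import combinations

# Hypothetical relationships between entities (e.g. accounts that transact).
TXN_EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

# Hypothetical existing feature rows for an ML model, keyed by entity id.
ROWS = [
    {"id": "a", "amount": 120.0},
    {"id": "d", "amount": 5.0},
    {"id": "e", "amount": 9.0},  # not in the graph
]


def graph_features(edges):
    """Degree, triangle count and local clustering coefficient per node."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    feats = {}
    for node, nbrs in adj.items():
        degree = len(nbrs)
        # Triangles through node = pairs of its neighbors that are linked.
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        possible = degree * (degree - 1) / 2
        feats[node] = {
            "degree": degree,
            "triangles": triangles,
            "clustering": triangles / possible if possible else 0.0,
        }
    return feats


def enrich(rows, edges):
    """Merge graph features into each row; entities absent from the graph get zeros."""
    feats = graph_features(edges)
    default = {"degree": 0, "triangles": 0, "clustering": 0.0}
    return [{**row, **feats.get(row["id"], default)} for row in rows]


for row in enrich(ROWS, TXN_EDGES):
    print(row)
```

The original columns pass through untouched, so the enriched rows can feed the same model with extra relationship-based signals.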
43. The Market Sees Strong Synergy between Graphs and Artificial Intelligence
[Chart: AI research papers focused on graphs]
New book: 20K downloads in the first 2 weeks, ⅓ net-new