1. Scaling Up Business Value
with Real-Time Operational
Graph Analytics
Connected Data London, 7 November 2018
Victor Lee, Director of Product Management
2. © 2018 TigerGraph. All Rights Reserved
Graph is Big Business
$2.5+ Trillion market cap combined
Business is GRAPH
3.
“Graph analysis is possibly the single most effective
competitive differentiator for organizations pursuing
data-driven operations and decisions after the design of
data capture.”
https://www.gartner.com/doc/2852717/it-market-clock-database-management
4.
Scaling Up with Big Data + Graph Analytics
Benefits
● Richer knowledge
o Knowledge Graph Model
● Better Analytics
o More, richer data
o Ask harder questions:
Multi-hop queries
o More valuable results
Challenges
● Storage
● Speed performance
o Searching
o Traversing
o Updating
o Exponential slowdown?
● Cost
How do we fulfill the Promise of Graph Analytics?
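The "exponential slowdown" concern can be made concrete with a small sketch. Assuming a hypothetical graph in which every vertex has the same number of outgoing edges, the number of vertices reached at each hop grows as a power of that branching factor. The function names and the toy tree below are illustrative assumptions, not part of the talk.

```python
def k_hop_frontier_sizes(adjacency, start, max_hops):
    """Return how many new vertices are reached at each hop from start."""
    visited = {start}
    frontier = [start]
    sizes = []
    for _ in range(max_hops):
        next_frontier = []
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        sizes.append(len(next_frontier))
        frontier = next_frontier
    return sizes


def build_tree(branching, depth):
    """Build a toy tree (adjacency dict) where each vertex has `branching` children."""
    adjacency = {}
    next_id = 1
    level = [0]
    for _ in range(depth):
        new_level = []
        for parent in level:
            children = list(range(next_id, next_id + branching))
            next_id += branching
            adjacency[parent] = children
            new_level.extend(children)
        level = new_level
    return adjacency


# With 3 edges per vertex, each extra hop triples the work.
print(k_hop_frontier_sizes(build_tree(branching=3, depth=4), 0, 4))  # [3, 9, 27, 81]
```

Real graphs are not regular trees, but the same growth pattern is why multi-hop queries stress storage and traversal speed.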
5.
China Mobile Detects Phone Based Fraud
with Real-Time Customer 360 Data Hub
Business Challenge: Phone scams are hard
to detect, evolve quickly, and cost billions of dollars.
● 460M users (inside & outside network)
● 10B+ phone calls every month
● 10K peak calls per second
Solution: Run a real-time scam-detection
graph query on every phone call as it occurs.
● Maintain real-time operational graph with 460M
user vertices & 2B call edges, with call details.
● Build a training set of 2M calls; run queries to
measure 118 graph features.
● Feed graph feature data to machine learning;
build a scam detection model.
● As a phone call is requested, measure graph
features in real time to detect possible scam.
Business Benefits
● If scam is detected, recipients receive a
warning on phone, BEFORE call is answered.
● Improve customer satisfaction
6.
Detecting Phone-Based Fraud by Analyzing Network or
Graph Relationship Features
Good Phone Features
(1) High call back phone
(2) Stable group
(3) Long term phone
(4) Many in-group connections
(5) 3-step friend relation
Bad Phone Features
(1) Short term call duration
(2) Empty stable group
(3) No call back phone
(4) Many rejected calls
(5) Average distance > 3
Download the solution brief at - https://info.tigergraph.com/MachineLearning
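A few of the features above can be sketched in plain Python. This is a minimal illustration, assuming call records with hypothetical fields (caller, callee, duration, answered). It is not the production GSQL queries, and the sample data is invented. It computes a call-back rate, a rejected-call ratio, and the hop distance between two phones (capped at 3 hops, matching the "3-step friend relation" feature).

```python
from collections import defaultdict, deque
from typing import NamedTuple, Optional


class Call(NamedTuple):
    caller: str
    callee: str
    duration_sec: int
    answered: bool


def phone_features(calls, phone):
    """Compute a few per-phone features from a list of Call records."""
    outgoing = [c for c in calls if c.caller == phone]
    callees = {c.callee for c in outgoing}
    called_by = {c.caller for c in calls if c.callee == phone}
    answered = [c for c in outgoing if c.answered]
    return {
        # Share of people this phone called who have also called it.
        "callback_rate": len(callees & called_by) / len(callees) if callees else 0.0,
        # Share of outgoing calls that were not answered.
        "rejected_ratio": 1 - len(answered) / len(outgoing) if outgoing else 0.0,
        "avg_duration_sec": sum(c.duration_sec for c in answered) / len(answered) if answered else 0.0,
    }


def hop_distance(calls, source, target, max_hops=3) -> Optional[int]:
    """Breadth-first search over undirected call edges; None if target is beyond max_hops."""
    neighbors = defaultdict(set)
    for call in calls:
        neighbors[call.caller].add(call.callee)
        neighbors[call.callee].add(call.caller)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        vertex, dist = queue.popleft()
        if vertex == target:
            return dist
        if dist == max_hops:
            continue
        for neighbor in neighbors[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Invented sample data: "S" behaves like a scam phone (short calls, many rejections, no call-backs).
CALLS = [
    Call("A", "B", 120, True), Call("B", "A", 300, True), Call("B", "C", 60, True),
    Call("C", "D", 45, True), Call("D", "E", 90, True), Call("E", "F", 30, True),
    Call("S", "A", 5, True), Call("S", "B", 0, False),
    Call("S", "C", 0, False), Call("S", "D", 0, False),
]
print(phone_features(CALLS, "A"))
print(phone_features(CALLS, "S"))
```

Feature vectors like these would then be fed to a machine learning model, as the previous slide describes.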
7.
Real-Time Operational Graph Analytics
Platform Requirements
● Scalability
○ Can handle the application's Big Data (Gigabytes? Terabytes? Petabytes?)
○ Can expand as data grow → distributed graph
● Real-Time Speed @Scale
○ On complex analytics
○ Fast updates
○ Throughput/Concurrency
● Query Language Support for Analytics
○ Describe common analytics patterns
○ Parallel processing
8.
Evolution of Graph Databases
● Capabilities compared: Native Graph Storage; Parallel Loading; Parallel Multi-Hop Analytics; Parallel Updates (in real-time); Scale up to Support Query Volume; Privacy for Sensitive Data
● Graph 1.0: single server, non-parallel
● Graph 2.0: NoSQL base for storage
● Graph 3.0: Native, Parallel
○ Hours to load terabytes
○ Sub-second across 10+ hops
○ Mutable/Transactional
○ 2 billion+/day in production
○ MultiGraph service
● Earlier generations
○ Days to load terabytes
○ Times out, or runs out of time/memory, after 2 hops
○ Scales for simple queries (1-2 hops)
9.
OLTP and OLAP Together in Graph
OLTP
● Real-time read and write
● ACID properties
(guarantee that transaction is correct)
● Concurrency
(many transactions at the same time)
OLAP
● Multi-dimensional Analysis
● Compute-intensive
● Aggregation
TigerGraph
+ Real-time read & write
+ Real-time, compute-intensive,
multi-dimensional analysis
+ Real-time aggregation
+ HTAP: Hybrid Transactions+Analytics
Some Graph Databases
+ ACID
+ Concurrency
All Graph Databases
o Multi-dimensional data
o Real-time read
10.
Delivering New Graph-Based Solutions with TigerGraph
[Diagram: TigerGraph as a Real-time Customer 360 Data Hub]
● Source systems: ERP (Enterprise Resource Planning); Multichannel Marketing & Sales Force Automation; CRM (Customer Relationship Management); Network and Sensor/IoT Systems
● Master Data (Master Data Management): Customer, Supplier, Device, Employee; Name, Address, Gender, Age
● Transactional / Operational Data: Orders, Payments, Shipments, Invoices, Visits, Downloads, Signals
● Connected platforms: Data Warehouse, Data Mart, Data Lake, NoSQL (Batch and Streaming; Historical Data); Machine Learning; BI & Analytics Solutions
● Graph workloads: Queries / Lookups; Complex Graph Analytics [Community Detection, Shortest Path, PageRank, ...]; Graph features
● Business Initiatives: Fraud & AML; Supply Chain Intelligence; Credit Risk Scoring & Monitoring; Product & Service Marketing; Network & IT Infrastructure Analytics; Location Analytics; Enterprise Knowledge Graph; Recommendation Engine; Real-time Customer 360 Data Hub; Faster than Real-time EMS; Cyber Security
11.
Building Real-time Customer 360 Data Hub with
TigerGraph – Case Studies
Fraud & Money Laundering
Detection
Large Multi-National
Pharma
Product & Service Marketing
Cross-sell & Up-sell
Recommendation
Credit Risk Scoring & Monitoring
Use Case Examples
12.
Using C360 for Personalized Results in e-Real Estate
• Technology & data science are transforming the US real estate market.
• Past: Agents controlled all data (Homes for sale? Comparable prices)
• Possible Use Cases:
• Recommendations for Home Sellers
• Recommendations for Home Buyers - which home to consider. Challenge: home buying is a rare event.
Harder: Find the right test.
• Matching Buyers with Sellers
• Predict selling price and buying price
• Internal operations
US leader in e-Real Estate
• Seeing unique value in TigerGraph real-time operational analytics
13.
Network & IT Infrastructure Analytics (Impact Assessment)
Business Challenge: Monitor
complex energy infrastructure to
detect and manage power outages.
• Solution:
• Model power system using real-time
Graph to accelerate power flow and
state calculation (eliminates data prep.)
• Leverage massively parallel
computing in Graph for bus ordering &
admittance graph forming to balance
load spikes
• Visualize energy computation results
in Graph for contingency analysis and
action plan
• Business Value: Completes
execution within a SCADA sample
cycle of 5 seconds
[Diagram labels, power system graph: Disconnector, Connector, Breaker, ACline, Two Port Transformer, Neutral Point, Three Port Transformer, BUS, Substation, Unit, Load, Compensator, Longitude, Latitude]
14.
Network & IT Infrastructure Analytics (Impact Assessment)
Graph schema (from diagram):
● Array: Storage Array is the container of logical pools.
● Pool: Each has a port on the storage array device and is the container of logical LUNs.
● LUN: Logical Unit Number.
● Server: Each server may be assigned multiple LUNs.
● Application: An application can run on multiple servers. Applications may depend on other applications.
● Service: Each service calls multiple applications.
● BU: A business unit uses and pays for a service.
● Vertex attributes shown: PhysicalCapacity, PromisedCapacity, Alert (on three vertex types); IsDecommissioned (on one).
● Relationship cardinalities shown: (1:n), (1:n), (n:1), (n:m), (n:m), (n:m)
Business Challenge: Assess business impact of
component outage or capacity overflow in a
complex, interconnected infrastructure.
15.
Network & IT Infrastructure Analytics (Impact Assessment)
Solution:
1. Maintain a real-time operational map of
interconnected resources – arrays, pools, LUNs,
virtual machines, servers, applications – all supporting
a specific service for a business unit.
2. Process IoT (Internet of Things) sensor signals from
physical storage, server and network equipment to
identify at-risk components in real time.
3. Assess the impact of an at-risk component such as a
LUN, server or application on business services, in order to
offload critical workloads.
4. Identify storage, server or network elements likely to
max out, based on physical and promised capacity
/ traffic, for proactive remediation measures.
• Business Value: Reduce server, disk storage and
network provisioning costs and downtime. Avoid
interruptions for critical workloads.
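Step 3 above (impact assessment) is essentially a graph traversal. A minimal sketch, assuming a hypothetical `dependents` map in which each component points to the components that rely on it (LUN → server → application → service → BU); all component names are invented for illustration:

```python
from collections import deque


def impacted_components(dependents, failed):
    """Return every component reachable from `failed` by following 'is relied on by' edges."""
    seen = {failed}
    queue = deque([failed])
    while queue:
        component = queue.popleft()
        for dependent in dependents.get(component, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(failed)
    return seen


# Hypothetical infrastructure: app-reports depends on app-billing.
DEPENDENTS = {
    "lun-1": ["server-1"],
    "lun-2": ["server-2"],
    "server-1": ["app-billing"],
    "server-2": ["app-reports"],
    "app-billing": ["app-reports", "svc-invoicing"],
    "app-reports": ["svc-analytics"],
    "svc-invoicing": ["bu-finance"],
    "svc-analytics": ["bu-finance", "bu-marketing"],
}

affected = impacted_components(DEPENDENTS, "lun-2")
print(sorted(c for c in affected if c.startswith("bu-")))  # ['bu-finance', 'bu-marketing']
```

In a graph database the same question would be a multi-hop query starting from the at-risk vertex; the sketch only shows the shape of the traversal.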
16.
AML Workflow with TigerGraph and Machine Learning at
Alipay
17.
Location Analytics Use Case 1: Matching Demand & Supply
with Predicted Traffic Flow Data For Busy Locations
Traffic Inflow (predicted)
Traffic Outflow (predicted)
Saturday, 11:15 pm
Number of Taxis needed: 105
Number of Taxis available: 65
● Scenario – Based on predicted traffic inflow and outflow for the Ginza brand street area next
Saturday at 11:15 pm Japan Standard Time, there will be a shortage of 40 taxis for about 45 minutes.
● Recommended action – Message 40 taxi drivers in nearby areas that have an excess supply of taxis to
come to the Ginza brand street area by 11:15 pm on Saturday.
● Outcome – Increase revenue by matching the expected spike in taxi demand with increased supply;
reduce idle time for drivers.
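The arithmetic behind the scenario is simple: 105 taxis needed minus 65 available gives a shortage of 40. A minimal sketch of that calculation, plus one possible (assumed) way to pick which nearby meshes to draw drivers from; the greedy rule and the mesh names are illustrative, not the method used in the project:

```python
def taxi_shortage(needed, available):
    """Shortage is never negative: a surplus counts as zero shortage."""
    return max(0, needed - available)


def plan_dispatch(shortage, surplus_by_mesh):
    """Greedily take drivers from the meshes with the largest surplus first.

    Returns (plan, unfilled), where plan maps mesh -> number of drivers to message.
    """
    plan = {}
    remaining = shortage
    for mesh, surplus in sorted(surplus_by_mesh.items(), key=lambda item: item[1], reverse=True):
        if remaining == 0:
            break
        take = min(surplus, remaining)
        if take > 0:
            plan[mesh] = take
            remaining -= take
    return plan, remaining


shortage = taxi_shortage(needed=105, available=65)
print(shortage)  # 40
# Hypothetical surplus in three neighboring meshes.
print(plan_dispatch(shortage, {"mesh_a": 25, "mesh_b": 10, "mesh_c": 15}))
```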
18.
Location Analytics Use Case 1: Matching Demand & Supply
with Predicted Traffic Flow Data For Busy Locations
[Figure: map divided into mesh cells, each labeled with the number of taxis in that mesh. Legend: Input Driver; Suggested Moving direction; In Flow (e.g. 120) and Out Flow (e.g. 40); Flow Prediction Based on Historical Data.]
19.
Solving Problems with Graph Algorithms
An Analytics Toolbox specifically for graphs
Supervised learning: look for particular patterns/features, then correlate to known cases
Unsupervised learning: look for frequent/infrequent patterns & groupings
20.
Graph Algorithms: Built-in Analytics Functions
Several categories:
● Path-related
○ What is the shortest path between two nodes?
● Centrality
○ Which nodes are the most centrally located?
● Community Detection
○ What are the natural groupings of nodes, based on connectedness?
● Similarity
○ Which nodes are very similar to one another, based on context?
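As a small illustration of the Similarity category, one common measure is Jaccard similarity of neighbor sets: two nodes are similar when they share a large fraction of their neighbors. The choice of Jaccard and the sample data are assumptions for this sketch; the slide does not name a specific measure.

```python
def jaccard_similarity(neighbors, a, b):
    """|N(a) ∩ N(b)| / |N(a) ∪ N(b)|, or 0.0 when both nodes have no neighbors."""
    set_a = neighbors.get(a, set())
    set_b = neighbors.get(b, set())
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical graph: each node maps to the set of nodes it is connected to.
NEIGHBORS = {
    "alice": {"x", "y", "z"},
    "bob": {"y", "z", "w"},
    "carol": {"q"},
}
print(jaccard_similarity(NEIGHBORS, "alice", "bob"))    # 0.5
print(jaccard_similarity(NEIGHBORS, "alice", "carol"))  # 0.0
```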
21.
Graph Algorithm Use Case:
Understanding & Leveraging Influencers in Medical Care
Given a national network of medical providers and prescribers, find connected
communities and identify the top influencers within each community.
• USA – Private Health Care System
• Very decentralized; many layers and pay-per-service lead to inefficiency
• UK/Europe – National Health Care
• Is the central authority making good decisions?
Applies to other cases with many decision makers
• Viral/influencer marketing
Based on a project by Large US Pharma
22.
Influencer Analysis as a Graph Analytics Problem
Business Benefits
• Understand care and referral
dynamics better
• Target education at the
influencers
• Identify which influencers are also
best-practice practitioners
Analysis Steps
1. Find the most influential provider in each
region, for a related group of medical codes
(Diabetes, Cardiac Care, etc.)
2. Who is influenced by these leaders
(e.g. other doctors, physical therapists, facilities)?
3. What is the community size and impact
(patients and providers) around these hubs?
23.
1. How do I find the most influential provider in each region
for a particular medical condition?
Whole-Graph Compute problem
A. Analyze claims data to create referral relationships
among providers (Time Series Analysis)
B. Create subsets of claims around each condition with a
group of healthcare codes (e.g. CPT codes) for each
region (e.g. local healthcare market)
C. Utilize PageRank to score hubs within each market
Condition: Diabetes
Healthcare Market: S. San Jose, CA
Hub Identified: Dr. Thomas
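Step C can be sketched with a basic power-iteration PageRank over a directed referral graph. This is a textbook version under assumed defaults (damping 0.85, fixed iteration count), not the GSQL library implementation, and the referral edges are invented to mirror the Dr. Thomas example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over (source, target) referral edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node with no outgoing referrals spreads its rank evenly over all nodes.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical referrals within one market and condition: (referring provider, referred-to provider).
REFERRALS = [
    ("Dr. A", "Dr. Thomas"),
    ("Dr. B", "Dr. Thomas"),
    ("Dr. C", "Dr. Thomas"),
    ("Dr. Thomas", "Dr. A"),
]
ranks = pagerank(REFERRALS)
print(max(ranks, key=ranks.get))  # Dr. Thomas
```

Running this once per (condition, market) subset, as in step B, would give one hub score table per market.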
24.
2. Who is influenced by these leaders (e.g. other doctors,
physical therapists, facilities)?
Utilize Community Detection
A. Identify communities of providers around each hub
for each region and for a specific condition
B. Track changes over time to detect significant shifts
in communities
Condition: Diabetes
Healthcare Market: S. San Jose, CA
Hub Identified: Dr. Thomas
Community Detected: Diabetes – S. San Jose – Dr. Thomas
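The slide does not say which community detection algorithm was used. As a minimal sketch of the idea, the simplest variant is connected components: providers linked by any chain of referrals fall into the same community. The provider names below are invented.

```python
def connected_communities(edges):
    """Group nodes into connected components, treating edges as undirected."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    communities = []
    seen = set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in neighbors[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(component)
    return communities


# Hypothetical referral links for one condition and market.
LINKS = [("Dr. Thomas", "Dr. A"), ("Dr. A", "Clinic X"), ("Dr. M", "Dr. N")]
print(connected_communities(LINKS))
```

Denser graphs usually call for finer-grained methods (for example label propagation or Louvain), since connected components would merge everything into one community.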
25.
3. What is the community size and impact (patients and
providers) around these hubs?
A. Compute cost of care for initial diagnosis and
follow-on treatment for each community
B. Compare with other communities with similar
patient population
C. Track changes over time to detect significant
changes in cost of care
Condition: Diabetes
Healthcare Market: S. San Jose, CA
Hub Identified: Dr. Thomas
Community Detected: Diabetes – S. San Jose – Dr. Thomas
Cost of care: initial diagnosis, follow-on care (medicine, tests, treatment)
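Steps A and B amount to aggregating claim amounts by community and normalizing so that communities of different sizes can be compared. A minimal sketch, assuming hypothetical claim tuples of (provider, patient, amount) and a provider-to-community mapping produced by the previous step; all values are invented:

```python
from collections import defaultdict


def cost_per_patient(claims, community_of):
    """Average cost of care per distinct patient, for each community."""
    totals = defaultdict(float)
    patients = defaultdict(set)
    for provider, patient, amount in claims:
        community = community_of.get(provider)
        if community is None:
            continue  # provider not assigned to any community
        totals[community] += amount
        patients[community].add(patient)
    return {community: totals[community] / len(patients[community]) for community in totals}


# Hypothetical data.
COMMUNITY_OF = {"p1": "c1", "p2": "c1", "p3": "c2"}
CLAIMS = [
    ("p1", "pat1", 100.0),
    ("p1", "pat2", 300.0),
    ("p2", "pat1", 200.0),
    ("p3", "pat3", 50.0),
]
print(cost_per_patient(CLAIMS, COMMUNITY_OF))  # {'c1': 300.0, 'c2': 50.0}
```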
26.
GSQL Graph Algorithm Library
● Well-designed, commonly used graph algorithms written in GSQL
● TigerGraph's Advantages:
○ Native MPP design means faster execution
○ GSQL was designed for analytic functions
○ Open-source and user-extensible
■ Users can see the GSQL, learn from it, and modify it,
to customize their algorithms.
■ In other platforms, the algorithms are embedded functions
which cannot be viewed or modified.
■ On Github:
https://github.com/tigergraph/ecosys/tree/master/graph_algorithms
27.
Summary
● This is the Year of Graph!
● Enterprises are ready to adopt Operational Graph Analytics, with
Big Data.
● Graph DBs need to provide scalability, speed, and query support.
● There are many use cases.
● Graph Algorithm Libraries will help to deliver the query support.
28.
Thank You!
Developer Portal
www.tigergraph.com/developers/
Free Developer Edition
Now runs on Linux, Docker and VM (for Mac, Windows)
www.tigergraph.com/download
Victor Lee
Director, Product
Development
TigerGraph
victor@tigergraph.com