Graph Data: a New Data Management Frontier

Demai Ni
Demai.Ni@huawei.com
Gauss American Lab
Huawei Central Software Institute

Contents
1. Huawei Cloud Data Management
2. Why Graph Database?
3. Architecture and Challenges
4. JanusGraph and Collaboration

An Industry-Leading Integrated Big Data Analysis Platform
Application examples:
• Finance cloud (real-time credit and risk control)
• Safe City cloud (relationship analysis, track analysis, and set collision)
• Terminal cloud (precision marketing, customer portrait, and group categorization)
• On-premise/on-cloud data warehouse and data mart
Integrated data analysis platform:
• Interface layer: unified data analysis interface (SQL + API)
• Engine layer: OLAP analysis engine, graph database engine, relationship analysis engine, GIS engine, IoT engine, real-time analysis engine
• Storage layer: unified data storage (DFV)
• Data layer: structured, semi-structured, and unstructured data in formats such as numeral, date/time, character, TXT, CSV, log, GIS, JSON, and XML; covering characteristic, relationship, track, behavior, and time-series data
• Unified cloud management: deploy, monitor, O&M, cluster, commission, track
• Unified metadata
• Data governance and integration: data integration, data quality

Objective: PB-Level Enterprise Data Warehouse Solution
MPPDB
• Full SQL support, application transparency
• Open platform with top performance
• PB-level data management with scalability
ELK
• Full SQL support (99 TPC-DS queries)
• High capability
• Transactional IUD (insert/update/delete) on HDFS
• Compatible with all Hadoop platforms
FusionInsight MPPDB: a massively parallel processing database by Huawei, capable of serving as a PB-level enterprise data warehouse for big data solutions.
FusionInsight ELK: MPPDB on Hadoop, which provides a unified SQL solution.
Architecture: a unified entrance over the FusionInsight big data platform (LFS, stream processing, machine learning, data mining, unified source management, CarbonData, HDFS), with MPPDB serving as the enterprise data warehouse and ELK providing interactive analysis. Access: No-SQL connection (SQL-like/API) and standard SQL.

FusionInsight MPPDB: PB-Level, High-Performance, Cloud and On-Premise
Architecture:
• Application layer: telecommunications, finance, and government & public security applications, e.g., centralized operation analysis, xDR query, EDW, integrated data warehouse, and public security information search
• Interface layer: standard ANSI SQL, JDBC, and ODBC interfaces
• MPP cluster: coordinator nodes (CN) and data nodes (DN), connected by an SCTP-based large-scale cluster communication network
• Hardware + OS: Linux 64-bit on universal x86 architecture (SUSE Linux and Red Hat, or cloud OS/storage)
• Comprehensive tool chain: data migration, SQL development, cluster management
Key features:
• Comprehensive SQL capabilities and smooth application migration: TPC-H/TPC-DS SQL statements run directly without modification, and transactions and stored procedures are supported.
• Best-performing open platform in the industry: based on x86 servers and an open Linux platform, Huawei MPPDB supports column storage, vectorization, fully parallel execution, and a self-learning optimizer, achieving high performance in interactive SQL queries and responding to TB-level data correlation analysis requests within seconds.
• Auto-scaling for PB-level data processing: based on the MPP architecture and unique SCTP-based large-scale cluster communication technology, Huawei MPPDB supports 256 physical nodes and 10,000+ cores, scaling automatically from TB to PB.

Contents
2. Why Graph Database?

Graph Database vs. Relational Database
Relational databases do a wonderful job managing data, except for RELATIONSHIPS.
• Graph database: hop (walk) along edges, O(1) per hop; flexible iteration
• Relational database: JOIN, O(M × N); fixed number of operations
Air-routes problem: find routes from San Francisco (SFO) to Shenzhen (SZX) with two stops.
SQL:
    SELECT a1.code, r1.dest, r2.dest, r3.dest
    FROM airports a1
    JOIN routes r1 ON a1.code = r1.src
    JOIN routes r2 ON r1.dest = r2.src
    JOIN routes r3 ON r2.dest = r3.src
    WHERE a1.code = 'SFO'
      AND r3.dest = 'SZX';
Gremlin:
    g.V().has('code', 'SFO')
         .out()
         .out()
         .out()
         .has('code', 'SZX')
         .path().by('code')
Example courtesy of Kelvin Lawrence (https://github.com/krlawrence/graph)
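The cost contrast above can be made concrete with a small in-memory sketch. It uses a handful of hypothetical routes (not real airline data) and plain Python rather than any database: `join_style` mirrors the three self-joins of the SQL query as nested loops over the whole routes list, while `walk_style` mirrors the Gremlin `out().out().out()` walk by following an adjacency list, so each hop only touches the neighbors of the current airport. Both function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical sample routes (src, dest); illustrative, not real airline data.
ROUTES = [
    ("SFO", "PEK"), ("SFO", "HKG"), ("SFO", "NRT"),
    ("PEK", "CAN"), ("HKG", "CAN"), ("NRT", "HKG"),
    ("CAN", "SZX"), ("HKG", "SZX"), ("PEK", "SZX"),
]


def join_style(routes, src, dest):
    """Mimic the SQL: three self-joins of the routes table as nested loops.

    Every join level rescans the full routes list, hence the O(M x N) shape.
    """
    paths = []
    for s1, d1 in routes:              # routes r1 WHERE r1.src = src
        if s1 != src:
            continue
        for s2, d2 in routes:          # JOIN routes r2 ON r1.dest = r2.src
            if s2 != d1:
                continue
            for s3, d3 in routes:      # JOIN routes r3 ON r2.dest = r3.src
                if s3 == d2 and d3 == dest:
                    paths.append((src, d1, d2, d3))
    return paths


def walk_style(routes, src, dest, hops=3):
    """Mimic g.V().has('code', src).out().out().out().has('code', dest).

    Builds an adjacency list once; each hop is a direct neighbor lookup.
    """
    adj = defaultdict(list)
    for s, d in routes:
        adj[s].append(d)
    frontier = [(src,)]                # partial paths, like Gremlin traversers
    for _ in range(hops):              # one out() step per hop
        frontier = [p + (nxt,) for p in frontier for nxt in adj[p[-1]]]
    return [p for p in frontier if p[-1] == dest]


if __name__ == "__main__":
    print(walk_style(ROUTES, "SFO", "SZX"))
    # [('SFO', 'PEK', 'CAN', 'SZX'), ('SFO', 'HKG', 'CAN', 'SZX'), ('SFO', 'NRT', 'HKG', 'SZX')]
```

Both functions return the same three two-stop routes here; the difference is that every join level rescans the whole routes list, whereas each hop of the walk reads only the adjacency entry of the airport at the end of each partial path.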

Key Limitations of Existing RDBMSs for Graphs
• Too many expensive joins among tables
• Too many self-references instead of walking the graph
• Unstructured and semi-structured data do not fit the RDBMS two-dimensional table model
• Flexible, frequently changing schemas
• SQL (Structured Query Language) is not a native expression of graph relationships
Graph Database vs. Relational Database
• Relation. RDBMS: among tables (or self-reference). Graph database: vertex to vertex.
• Operator. RDBMS: expensive join. Graph database: native edge/link path.
• Results. RDBMS: emphasizes accurate and exact results. Graph database: estimated results are more common, for performance.
• Model. RDBMS: entity-relationship model / relational algebra. Graph database: nodes, relations, properties, and labels.

Why Is Graph Important to Huawei?
Business areas: network provider, mobile device & IoT, cloud computing
• Network traffic
• Cyber attacks
• Social relations
• Ads/recommendation
• Spatial-temporal data
• Data center management
• Fault detection
• Logistics analysis

Contents
3. Architecture and Challenges

So, Let's Add a Graph DB
FusionInsight architecture with JanusGraph added:
• Application level: Porter, Miner, Farmer, DataFarm (data, information, knowledge, wisdom)
• Interfaces: OpenAPI/SDK, REST/SNMP/Syslog, HadoopAPI, PluginAPI
• Manager: system manager, services manager, security manager
• Hadoop: YARN/ZooKeeper, HDFS/HBase, Hive, Impala, MapReduce, Spark, Storm, Solr, Elasticsearch (ES), ELK
• MPPDB
• Graph: JanusGraph

[Figure: MPPDB (Libra) and ELK]

Top Challenges for a Super-Large Graph
• Distribution key?
Edge-cut, vertex-cut, and random-cut partitioning all work, but none of them handles data locality and data rebalancing well.
• Massive parallelism?
A graph walk is a pipelined, iterative operator.
• Incremental data/mining?
Inserting, updating, or deleting n vertices/edges may require recomputing graph patterns, with O(n²) or O(n³) complexity for incremental query answering.
• Where does the data come from?
Often flat files or relational databases, and ETL is toooooo slooooow!
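The partitioning trade-off in the first challenge can be made concrete with two standard metrics. This is a minimal sketch, assuming integer vertex IDs and simple modulo hashing (both hypothetical choices, not how any particular system partitions data): edge-cut places each vertex on one partition and counts the edges that cross partitions, while vertex-cut places each edge on one partition and counts how many partitions each vertex ends up replicated on.

```python
from collections import defaultdict


def edge_cut(edges, k):
    """Edge-cut: each vertex lives on exactly one partition (v % k).

    Returns how many edges connect vertices on different partitions;
    each such edge means cross-partition traffic during a walk.
    """
    def part(v):
        return v % k
    return sum(1 for u, v in edges if part(u) != part(v))


def vertex_cut(edges, k):
    """Vertex-cut: each edge lives on exactly one partition.

    A vertex is replicated on every partition holding one of its edges.
    Returns the average number of replicas per vertex (replication factor).
    """
    replicas = defaultdict(set)
    for u, v in edges:
        p = (31 * u + v) % k  # illustrative edge hash, not a production scheme
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


# A star graph: one high-degree hub (vertex 0) linked to eight leaves.
STAR = [(0, v) for v in range(1, 9)]

if __name__ == "__main__":
    print(edge_cut(STAR, 4))    # 6 of the 8 edges cross partitions
    print(vertex_cut(STAR, 4))  # only the hub is replicated (on all 4 partitions)
```

On this star graph with four partitions, edge-cut sends 6 of 8 edges across partitions, while vertex-cut keeps every edge local at the price of copying the hub onto all four partitions (a replication factor of 12/9). Neither static metric addresses rebalancing as the graph changes, which is the gap the slide points to.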

Contents
4. JanusGraph and Collaboration

Why JanusGraph?
“JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with
billions of vertices and edges distributed across a multi-machine cluster, … a transactional database
that can support thousands of concurrent users, complex traversals, and analytic graph queries.” –
JanusGraph README@github
Key features/issues to be considered:
• Bulk load into the backend store
Why: flat files and RDBMSs are common data sources, and current load performance is far too slow (at the GB/hour level) and not user-friendly.
• Framework for various partitioning methods and dynamic balancing
Why: current edge-cut, vertex-cut, and random partitioning work, but can we do better, and can we also rebalance the data?
• Support for visibility labels (issue 493 for HBase)
Why: security and access control
• JanusGraph + HBase + Solr (or ES, Lucene) tutorial with real use cases
Why: solve real-world problems, one at a time
• Incremental load/update
Why: periodic (daily? hourly?) data refresh with good performance
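To illustrate why incremental update matters, here is a minimal sketch (plain Python, not JanusGraph code; `TriangleCounter` and `count_triangles_from_scratch` are hypothetical names) that maintains a triangle count under edge inserts. Inserting an undirected edge (u, v) creates exactly one new triangle per common neighbor of u and v, so the count can be updated by intersecting two neighbor sets instead of re-scanning every vertex triple, which is the O(n³) recomputation mentioned in the challenges.

```python
from itertools import combinations


class TriangleCounter:
    """Maintain the triangle count of an undirected simple graph under edge inserts."""

    def __init__(self):
        self.adj = {}        # vertex -> set of neighbors
        self.triangles = 0

    def add_edge(self, u, v):
        # Ignore self-loops and edges that already exist.
        if u == v or v in self.adj.get(u, set()):
            return
        nu = self.adj.setdefault(u, set())
        nv = self.adj.setdefault(v, set())
        # Each common neighbor w closes exactly one new triangle (u, v, w).
        self.triangles += len(nu & nv)
        nu.add(v)
        nv.add(u)


def count_triangles_from_scratch(adj):
    """Full recomputation for comparison: test every vertex triple, O(n^3)."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


if __name__ == "__main__":
    tc = TriangleCounter()
    # Build the complete graph on 4 vertices one edge at a time.
    for edge in [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]:
        tc.add_edge(*edge)
    print(tc.triangles, count_triangles_from_scratch(tc.adj))  # 4 4
```

The incremental path costs one set intersection per inserted edge; deletes can be handled symmetrically by subtracting the common-neighbor count before removing the edge. Richer graph patterns rarely decompose this cleanly, which is why incremental query answering remains a challenge.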

Thanks and Call for Collaboration!
JanusGraph and the open source community
Collaboration with academia, industry, start-ups, and you!
Join us!

Copyright©2015 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation,
statements regarding the future financial and operating results, future product portfolio, new technology,
etc. There are a number of factors that could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements. Therefore, such information is provided
for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the
information at any time without notice.
