Graph Data
A New Data Management Frontier
Demai Ni
Demai.Ni@huawei.com
Gauss American Lab
Huawei Central Software Institute
Contents
1 Huawei Cloud Data Management
2 Why Graph Database?
3 Architecture and Challenges
4 JanusGraph and Collaboration
An Industry-Leading Integrated Big Data Analysis Platform
• Finance cloud (real-time credit and risk control)
• Safe City cloud (relationship analysis, track analysis, and set collision)
• Terminal cloud (precision marketing, customer portrait, and group categorization)
• On-premise/on-cloud data warehouse and data mart
Integrated data analysis platform
• Interface layer: unified data analysis interface (SQL + API)
• Engine layer: OLAP analysis engine, graph database engine, relationship analysis engine, GIS engine, IoT engine, real-time analysis engine
• Storage layer: unified data storage (DFV)
• Data layer: unstructured data (GIS, JSON, XML); semi-structured data (TXT, CSV, log); structured data (numeral, date/time, character); characteristic data, relationship data, track data, behavior data, time series data
• Unified cloud management: deploy, monitor, O&M, cluster, commission, track
• Unified metadata
• Data governance and integration: data integration, data quality
Objective: PB-level Enterprise Data Warehouse Solution
FusionInsight MPPDB: a massively parallel processing (MPP) database by Huawei that serves as a PB-scale enterprise data warehouse for big data solutions
• Full SQL support, application transparency
• Open platform with top performance
• PB-level data management with scalability
FusionInsight ELK: MPPDB on Hadoop, which provides a unified SQL solution
• Full SQL support (99 TPC-DS queries)
• High capability
• Transactional IUD (insert/update/delete) on HDFS
• Compatible with all Hadoop platforms
FusionInsight big data platform (diagram)
• Unified entrance: No-SQL connection (SQL-like/API) and standard SQL
• Components: LFS, stream processing, machine learning, data mining; MPPDB (enterprise data warehouse); ELK (interactive analysis); CarbonData, HDFS
• Unified source management
FusionInsight MPPDB: PB-Level High-performance
Architecture (diagram, top to bottom)
• Application layer: telecommunications, finance, government & public security (centralized operation analysis, xDR query, EDW, integrated data warehouse, public security information search)
• Interface layer: standard ANSI SQL, JDBC, and ODBC interfaces
• FusionInsight MPPDB: MPP cluster of CN and DN (data node) instances on multi-core servers, connected by an SCTP large-scale cluster communication network
• Hardware + OS: Linux 64-bit, universal x86 architecture (SUSE Linux and Red Hat, or Cloud OS/Storage); cloud and on premise
• Comprehensive tool chain: data migration, SQL development, cluster management
Key features:
• Comprehensive SQL capabilities and smooth application migration: TPC-H and TPC-DS SQL statements execute directly without modification, and transactions and stored procedures are supported.
• Best-performing open platform in the industry: Based on x86 servers and an open Linux platform, Huawei MPPDB supports column storage, vectorization, all-parallel execution, and a self-learning optimizer, achieving high performance in interactive SQL queries and responding to TB-level data correlation analysis requests within seconds.
• Auto-scaling supporting PB-level data processing: Based on the MPP architecture and unique SCTP-based large-scale cluster communication technology, Huawei MPPDB provides a solution supporting 256 physical nodes and 10,000+ cores, allowing auto scaling from TB to PB.
Graph Database vs. Relational Database
Relational databases do a wonderful job managing data, except for RELATIONSHIPS.
Air-routes problem: find routes from San Francisco (SFO) to Shenzhen (SZX) with two stops.
Relational Database: JOIN, O(M × N), fixed number of operations
    select a1.code, r1.dest, r2.dest, r3.dest
    from airports a1
    join routes r1 on a1.code = r1.src
    join routes r2 on r1.dest = r2.src
    join routes r3 on r2.dest = r3.src
    where a1.code = 'SFO'
    and r3.dest = 'SZX';
Graph Database: hop (walk), O(1) per hop, flexible iteration
    g.V().has('code', 'SFO')
         .out()
         .out()
         .out()
         .has('code', 'SZX')
         .path().by('code')
Example courtesy of Kelvin Lawrence (https://github.com/krlawrence/graph)
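To make the "hop" idea concrete, here is a minimal Python sketch of the same three-hop walk over an in-memory adjacency list. It is not how a graph database is implemented; it only illustrates that once outgoing routes are indexed by source airport, each hop is a lookup rather than a join. The route data and the names `build_adjacency` and `paths_with_hops` are hypothetical and are not taken from the air-routes dataset.

```python
from collections import defaultdict

# Toy route list (hypothetical data, not the real air-routes dataset).
ROUTES = [
    ("SFO", "LAX"), ("SFO", "SEA"), ("LAX", "NRT"), ("SEA", "NRT"),
    ("NRT", "SZX"), ("LAX", "HKG"), ("HKG", "SZX"), ("SFO", "SZX"),
]


def build_adjacency(routes):
    """Index outgoing routes by source airport so each hop is a dict lookup."""
    adjacency = defaultdict(list)
    for src, dest in routes:
        adjacency[src].append(dest)
    return adjacency


def paths_with_hops(adjacency, start, goal, hops):
    """Return every walk of exactly `hops` edges from start to goal.

    Mirrors g.V().has('code', start).out()...out().has('code', goal).path():
    each loop iteration plays the role of one .out() step.
    """
    frontier = [[start]]
    for _ in range(hops):
        frontier = [
            path + [nxt]
            for path in frontier
            for nxt in adjacency.get(path[-1], [])
        ]
    return [path for path in frontier if path[-1] == goal]


if __name__ == "__main__":
    adjacency = build_adjacency(ROUTES)
    for path in paths_with_hops(adjacency, "SFO", "SZX", hops=3):
        print(" -> ".join(path))
```

With this toy data, the three-hop walk finds three routes (via LAX and NRT, via LAX and HKG, and via SEA and NRT). Changing the number of stops only changes the `hops` argument, which is the "flexible iteration" point of the slide.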
Key limitations of existing RDBMSs for graph data
• Too many expensive joins among tables
• Too many self-references instead of walking the graph
• Unstructured and semi-structured data do not fit the two-dimensional (row/column) model of an RDBMS
• Flexible and frequently changing schemas
• SQL (Structured Query Language) is not a native way to express graph relationships
Graph Database vs. Relational Database
          RDBMS                                         Graph Database
Relation  Among tables (or self-reference)              Vertex to vertex
Operator  Expensive join                                Native edge/link path
Results   Emphasizes accurate and exact results         Estimated results are more common, for performance
Model     Entity-relationship model/relational algebra  Nodes, relations, properties, and labels
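The join and self-reference rows can be illustrated with a small, hedged Python sketch that uses the standard-library `sqlite3` module. It runs the relational form of the air-routes query on a toy in-memory database and generates one extra self-join on `routes` per additional hop. The table layout follows the `airports` and `routes` names used in the SQL example; the data and the helper names `k_hop_sql`, `find_routes`, and `make_toy_db` are assumptions made for illustration.

```python
import sqlite3

# Toy route list (hypothetical data).
ROUTES = [
    ("SFO", "LAX"), ("SFO", "SEA"), ("LAX", "NRT"), ("SEA", "NRT"),
    ("NRT", "SZX"), ("LAX", "HKG"), ("HKG", "SZX"), ("SFO", "SZX"),
]


def k_hop_sql(hops):
    """Build the join query for a given number of hops (hops >= 1).

    Every extra hop adds one more self-join on the routes table.
    """
    joins = ["join routes r1 on a1.code = r1.src"]
    joins += [
        f"join routes r{i} on r{i - 1}.dest = r{i}.src"
        for i in range(2, hops + 1)
    ]
    cols = ", ".join(["a1.code"] + [f"r{i}.dest" for i in range(1, hops + 1)])
    return (
        f"select {cols} from airports a1 "
        + " ".join(joins)
        + f" where a1.code = ? and r{hops}.dest = ?"
    )


def make_toy_db():
    """Create an in-memory database with airports and routes tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table airports (code text primary key)")
    conn.execute("create table routes (src text, dest text)")
    codes = sorted({code for route in ROUTES for code in route})
    conn.executemany("insert into airports values (?)", [(c,) for c in codes])
    conn.executemany("insert into routes values (?, ?)", ROUTES)
    return conn


def find_routes(conn, start, goal, hops):
    """Run the generated query and return the matching rows."""
    return conn.execute(k_hop_sql(hops), (start, goal)).fetchall()


if __name__ == "__main__":
    conn = make_toy_db()
    print(k_hop_sql(3))
    for row in find_routes(conn, "SFO", "SZX", hops=3):
        print(row)
```

The query text itself has to change with the path length, which is one way to read the "fixed number of operations" remark: the number of hops is baked into the statement rather than expressed as an iteration.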
Why Is Graph Important to Huawei?
Network Provider, Mobile Device & IoT, Cloud Computing
• Network traffic
• Cyber attack
• Social relations
• Ads/recommendation
• Spatial-temporal data
• Data center management
• Fault detection
• Logistics analysis
So, Let's Add a Graph DB
Platform components (from the diagram)
• Application level: DataFarm (Porter, Miner, Farmer): data, information, knowledge, wisdom
• Interfaces: OpenAPE/SDK, REST/SNMP/Syslog, HadoopAPI, PluginAPI
• Manager: system manager, services manager, security manager
• MPP DB
• Hadoop: Yarn/Zookeeper, HDFS/HBase, HIVE, Impala, M/R, Spark, Storm, Solr, ES, ELK
• JanusGraph
[Diagram: MPPDB (Libra) and ELK]
Top Challenges for a Super-Large Graph
• Distribution key?
Edge-cut, vertex-cut, or random-cut partitioning works, but none of them works well for data locality and data rebalancing
• Massive parallelism?
A graph walk is a pipelined, iterative operator
• Incremental data/mining?
Inserting, updating, or deleting n vertices/edges may require re-computing the graph pattern, with n² or n³ complexity for incremental query answering
• Where does the data come from?
Often flat files or a relational database, and ETL is toooooo slooooow!
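The distribution-key trade-off can be sketched in a few lines of Python. Under an edge-cut, each vertex is placed on one partition and some edges cross partitions; under a vertex-cut, each edge is placed on one partition and vertices are replicated wherever their edges land. The sketch below uses plain hash placement (effectively the random-cut case) on a hypothetical toy edge list and measures both costs. The function names and data are illustrative assumptions, not part of JanusGraph or any Huawei product.

```python
import zlib

# Toy directed edge list (hypothetical data).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("e", "a")]


def bucket(key, parts):
    """Deterministic hash bucket (zlib.crc32 is stable across runs, unlike hash())."""
    return zlib.crc32(key.encode("utf-8")) % parts


def edge_cut(edges, parts):
    """Place each vertex on one partition and count edges that cross partitions."""
    owner = {v: bucket(v, parts) for edge in edges for v in edge}
    crossing = sum(1 for u, v in edges if owner[u] != owner[v])
    return owner, crossing


def vertex_cut(edges, parts):
    """Place each edge on one partition and compute the vertex replication factor."""
    replicas = {}
    for u, v in edges:
        part = bucket(f"{u}->{v}", parts)
        replicas.setdefault(u, set()).add(part)
        replicas.setdefault(v, set()).add(part)
    factor = sum(len(p) for p in replicas.values()) / len(replicas)
    return replicas, factor


if __name__ == "__main__":
    for parts in (1, 2, 4):
        _, crossing = edge_cut(EDGES, parts)
        _, factor = vertex_cut(EDGES, parts)
        print(f"parts={parts} crossing_edges={crossing} replication={factor:.2f}")
```

Crossing edges turn a local hop into a network round trip, and replicated vertices must be kept in sync on update, which is why neither placement alone addresses locality and rebalancing.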
Why JanusGraph?
“JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with
billions of vertices and edges distributed across a multi-machine cluster, … a transactional database
that can support thousands of concurrent users, complex traversals, and analytic graph queries.” –
JanusGraph README@github
Key features/issues to be considered:
• Bulk load into the backend store
Why: Flat files and RDBMSs are common data sources, and current load performance is very slow (at the GB/hour level) and not user friendly
• Framework for various partition methods and dynamic balancing
Why: The current edge-cut, vertex-cut, and random partitioning work, but can we do better and also keep the data balanced?
• Support visibility labels (issue 493 for HBase)
Why: Security and access control
• JanusGraph + HBase + Solr (or ES, Lucene) tutorial with real use cases
Why: Resolve real-world problems one at a time
• Incremental load/update
Why: Periodic (daily, hourly?) data refresh with good performance
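As a rough illustration of the incremental load/update item, the following Python sketch compares two periodic extracts and derives the inserts, updates, and deletes that would need to be applied, instead of reloading everything. It is a plain-Python sketch and does not use or represent any JanusGraph API; the snapshot layout (a dict from vertex key to a property dict) and the name `diff_snapshots` are assumptions made for illustration.

```python
def diff_snapshots(old, new):
    """Compare two {key: properties} snapshots and return what changed.

    Returns (inserts, updates, deletes):
      inserts - keys only in the new snapshot, with their properties
      updates - keys in both snapshots whose properties differ
      deletes - sorted list of keys only in the old snapshot
    """
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    updates = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    deletes = sorted(old.keys() - new.keys())
    return inserts, updates, deletes


if __name__ == "__main__":
    # Hypothetical daily extracts of airport vertices.
    yesterday = {
        "SFO": {"city": "San Francisco"},
        "SZX": {"city": "Shenzhen"},
        "XXX": {"city": "Closed Airport"},
    }
    today = {
        "SFO": {"city": "San Francisco"},
        "SZX": {"city": "Shenzhen", "runways": 2},
        "HKG": {"city": "Hong Kong"},
    }
    inserts, updates, deletes = diff_snapshots(yesterday, today)
    print("insert:", inserts)
    print("update:", updates)
    print("delete:", deletes)
```

Only the delta is sent to the graph store in this scheme; whether derived graph patterns can then be refreshed without full recomputation is the harder problem.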
Thanks and Call for Collaboration!
JanusGraph and the open source community
Collaboration with academia, industry, start-ups, and you!
Join us!
Copyright©2015 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation,
statements regarding the future financial and operating results, future product portfolio, new technology,
etc. There are a number of factors that could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements. Therefore, such information is provided
for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the
information at any time without notice.

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Graph Data: a New Data Management Frontier

  • 1. Graph Data: A New Data Management Frontier. Demai Ni (Demai.Ni@huawei.com), Gauss American Lab, Huawei Central Software Institute
  • 2. Contents: 1. Huawei Cloud Data Management; 2. Why Graph Database?; 3. Architecture and Challenges; 4. JanusGraph and Collaboration
  • 3. An Industry-Leading Integrated Big Data Analysis Platform
      • Applications: Finance cloud (real-time credit and risk control); Safe City cloud (relationship analysis, track analysis, and set collision); Terminal cloud (precision marketing, customer portrait, and group categorization); on-premise/on-cloud data warehouse and data mart
      • Interface layer: unified data analysis interface (SQL + API)
      • Engine layer: OLAP analysis engine; graph database engine; relationship analysis engine; GIS engine; IoT engine; real-time analysis engine
      • Storage layer: unified data storage (DFV)
      • Data layer: unstructured data (GIS, JSON, XML); semi-structured data (TXT, CSV, log); structured data (numeral, date/time, character); characteristic data; relationship data; track data; behavior data; time series data
      • Management: unified cloud management; unified metadata; data governance and integration; data integration; data quality; deploy, monitor, O&M, cluster, commission, track
  • 4. Objective: PB-level Enterprise Data Warehouse Solution
      • FusionInsight MPPDB: a massively parallel processing database by Huawei, capable of serving as a PB-level enterprise data warehouse for big data solutions. Full SQL support with application transparency; open platform with top performance; PB-level data management with scalability.
      • FusionInsight ELK: MPPDB on Hadoop, which provides a unified SQL solution. Full SQL support (99 TPC-DS queries); high capability; transactional IUD (insert/update/delete) on HDFS; compatible with all Hadoop platforms.
      • Diagram labels (FusionInsight big data platform): unified entrance; LFS; stream processing; machine learning; data mining; unified source management; MPPDB (enterprise data warehouse); ELK (interactive analysis); CarbonData; HDFS; access through a No-SQL connection (SQL-like/API) and standard SQL.
  • 5. FusionInsight MPPDB: PB-Level, High-Performance, Cloud and On-Premise
      • Application layer: telecommunications, finance, and government and public security; workloads include centralized operation analysis, xDR query, EDW, integrated data warehouse, and public security information search.
      • Interface layer: standard ANSI SQL, JDBC, and ODBC interfaces.
      • MPP cluster: CNs and data nodes (DN) connected by an SCTP large-scale cluster communication network.
      • Hardware + OS: Linux 64-bit, universal x86 architecture (SUSE Linux and Red Hat, or cloud OS/storage).
      • Comprehensive tool chain: data migration, SQL development, cluster management.
      • Key feature: comprehensive SQL capabilities and smooth application migration. TPC-H and TPC-DS SQL statements can be executed directly without modification, and transactions and stored procedures are supported.
      • Key feature: best-performing open platform in the industry. Based on x86 servers and an open Linux platform, Huawei MPPDB supports column storage, vectorization, all-parallel execution, and a self-learning optimizer, achieving high performance in interactive SQL queries and responding to TB-level data correlation analysis requests within seconds.
      • Key feature: auto-scaling supporting PB-level data processing. Based on the MPP architecture and SCTP-based large-scale cluster communication technology, Huawei MPPDB supports 256 physical nodes and 10,000+ cores, allowing auto-scaling from TB to PB.
  • 6. Contents: 1. Huawei Cloud Data Management; 2. Why Graph Database?; 3. Architecture and Challenges; 4. JanusGraph and Collaboration
  • 7. Graph Database vs. Relational Database. Relational databases do a wonderful job of managing data, except for RELATIONSHIPS.
      • Air-routes problem: find routes from San Francisco (SFO) to Shenzhen (SZX) with two stops.
      • Relational database: JOIN, O(M × N), fixed number of operations. SQL: select a1.code, r1.dest, r2.dest, r3.dest from airports a1 join routes r1 on a1.code = r1.src join routes r2 on r1.dest = r2.src join routes r3 on r2.dest = r3.src where a1.code = 'SFO' and r3.dest = 'SZX';
      • Graph database: hop (walk), O(1), flexible iteration. Gremlin: g.V().has('code', 'SFO').out().out().out().has('code', 'SZX').path().by('code')
      • Example courtesy of Kelvin Lawrence (https://github.com/krlawrence/graph)
  • 8. Graph Database vs. Relational Database
      • Key limitations of existing RDBMSs for graph workloads: too many expensive joins among tables; too many self-references instead of walking the graph; unstructured and semi-structured data do not fit the RDBMS's two-dimensional tables; the schema is flexible and changes often; SQL (Structured Query Language) is not a native expression of graph relations.
      • Relation: RDBMS = among tables (or self-reference); graph database = vertex to vertex.
      • Operator: RDBMS = expensive join; graph database = native edge/link path.
      • Results: RDBMS = emphasis on accurate and exact results; graph database = estimated results are more common, for performance.
      • Model: RDBMS = entity-relationship model and relational algebra; graph database = nodes, relations, properties, and labels.
  • 9. Why is Graph important to Huawei? Business areas: network provider; mobile device and IoT; cloud computing. Graph use cases: network traffic; cyber attack; social relations; ads/recommendation; spatial-temporal data; data center management; fault detection; logistics analysis.
  • 10. Contents: 1. Huawei Cloud Data Management; 2. Why Graph Database?; 3. Architecture and Challenges; 4. JanusGraph and Collaboration
  • 11. So, let's add Graph DB. Diagram labels: application level (Porter, Miner, Farmer, DataFarm); data, information, knowledge, wisdom; OpenAPE/SDK; REST/SNMP/Syslog; Hadoop API; Plugin API; Manager (system manager, services manager, security manager); MPP DB; Hadoop (Yarn/Zookeeper, HDFS/HBase, Hive, Impala, M/R, Spark, Storm, Solr, ES, ELK); JanusGraph.
  • 13. Top challenges for a super-large graph
      • Distribution key? Edge-cut, vertex-cut, or random-cut partitioning works, but none of them works well for data locality and data rebalancing.
      • Massive parallelism? A graph walk is a pipelined, iterative operator.
      • Incremental data/mining? Inserting, updating, or deleting n vertices/edges may require re-computing the graph pattern, with complexity on the order of n² or n³ for incremental query answering.
      • Where does the data come from? Often flat files or a relational database, and ETL is far too slow!
  • 14. Contents: 1. Huawei Cloud Data Management; 2. Why Graph Database?; 3. Architecture and Challenges; 4. JanusGraph and Collaboration
  • 15. Why JanusGraph? “JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster, … a transactional database that can support thousands of concurrent users, complex traversals, and analytic graph queries.” (JanusGraph README on GitHub) Key features/issues to be considered:
      • Bulk load into the backend store. Why: flat files and RDBMSs are common data sources, and current performance is very slow (at the GB/hour level) and not user friendly.
      • Framework for various partition methods and dynamic balancing. Why: current edge-cut, vertex-cut, or random partitioning works, but can we do better, and can we rebalance the data?
      • Support visibility labels (issue 493 for HBase). Why: security and access control.
      • JanusGraph + HBase + Solr (or ES, Lucene) tutorial with real use cases. Why: resolve real-world problems one at a time.
      • Incremental load/update. Why: periodic (daily, hourly?) data refresh with good performance.
  • 16. Thanks and Call for Collaboration! JanusGraph and the open source community. Collaboration with academia, industry, start-ups, and you! Join us!
  • 17. Copyright©2015 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
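
As a rough illustration of the air-routes comparison on slide 7, the sketch below walks an adjacency list three hops out from SFO, which is what the chained `.out().out().out()` traversal expresses. The route table is made-up toy data and the helper names `build_adjacency` and `routes_with_stops` are hypothetical; this shows the idea of hopping along stored edges rather than joining tables, and is not how JanusGraph executes traversals internally.

```python
from collections import defaultdict

# Hypothetical toy route table of (src, dest) pairs; not real airline data.
ROUTES = [
    ("SFO", "LAX"), ("SFO", "SEA"), ("LAX", "NRT"), ("SEA", "NRT"),
    ("NRT", "SZX"), ("LAX", "HKG"), ("HKG", "SZX"), ("SFO", "SZX"),
]


def build_adjacency(routes):
    """Index outgoing edges by source airport so each hop is a dictionary lookup."""
    adj = defaultdict(list)
    for src, dest in routes:
        adj[src].append(dest)
    return adj


def routes_with_stops(adj, start, goal, stops):
    """Return every path from start to goal with exactly `stops` intermediate airports."""
    paths = [[start]]
    # Two stops means three hops, like three chained out() steps.
    for _ in range(stops + 1):
        paths = [p + [nxt] for p in paths for nxt in adj.get(p[-1], [])]
    return [p for p in paths if p[-1] == goal]


adj = build_adjacency(ROUTES)
for path in routes_with_stops(adj, "SFO", "SZX", stops=2):
    print(" -> ".join(path))
```

Changing `stops` changes the number of hops without rewriting the query, whereas the SQL version on slide 7 needs one more self-join on `routes` per extra hop.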

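The partitioning question on slide 13 can be made concrete with a small sketch that measures what each scheme costs. Everything here is an assumption for illustration: the toy edge list, the function names, and the naive modulo placement standing in for a hash. An edge-cut scheme assigns vertices to partitions and pays for every edge that crosses partitions; a vertex-cut scheme assigns edges to partitions and pays for every extra copy of a vertex.

```python
from collections import defaultdict

# Hypothetical toy graph: edges between integer vertex ids.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]


def edge_cut(edges, k):
    """Place each vertex on partition (id % k); count edges spanning two partitions."""
    return sum(1 for u, v in edges if u % k != v % k)


def vertex_cut(edges, k):
    """Place each edge on partition (edge index % k); count extra vertex replicas."""
    homes = defaultdict(set)
    for i, (u, v) in enumerate(edges):
        homes[u].add(i % k)
        homes[v].add(i % k)
    # A vertex present on p partitions needs p - 1 replicas beyond its first copy.
    return sum(len(parts) - 1 for parts in homes.values())


for k in (1, 2, 3):
    print(k, "partitions:", edge_cut(EDGES, k), "cut edges,",
          vertex_cut(EDGES, k), "vertex replicas")
```

Neither placement looks at graph structure, which is the slide's point: such schemes work, but they do little for data locality, and adding a partition reshuffles most of the data.
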
Editor's notes

  1. Add a footnote on each page with my email address: Demai.Ni@huawei.com