SlideShare ist ein Scribd-Unternehmen logo
1 von 38
hosted by
HBaseConAsia2018
JanusGraph —
Distributed graph database with HBase
XueMin Zhang @ TalkingData
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
About us
• Seven years of practical experience in technical research and development(R&D),focusing
on distributed storage, distributed computing, real-time computing, etc.
• Successively worked in Sina Weibo and TalkingData, and served as the big data Team
Leader of Sina r&d center.
• Technical speechers on the platforms of China Hadoop, Strata Hadoop/Data Conference
and DTCC.
About me
• Founded in 2011, TalkingData is China’s leading third-party big data platform. With
SmartDP as the core of its data intelligence application ecosystem, TalkingData empowers
enterprises and helps them achieve a data-driven digital transformation.
• From the beginning, TalkingData’s vision of using “big data for smarter business
decisions and a better world” has allowed it to gradually become China’s leading data
intelligence solution provider. TalkingData creates value for clients and serves as their
“performance partner,” helping modern enterprises achieve data-driven transformation
and accelerating the digitization of clients from various industries. Using data-generated
insights to change how people see the world and themselves, TalkingData hopes to
ultimately improve people’s lives.
About TalkingData
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
Something about Graph
What is a Graph Database
 As name suggests, it is a database.
 Uses graph structures for semantic queries with nodes, edges
and properties to represent and store data.
 Allow data in the store to be linked together directly.
 compare with traditional relational databases
 Hybrid relations.
 Handy in finding connections between entities.
hosted by
Something about Graph
Graph Structures - Vertices
 Vertices are the nodes or
points in a graph
structure
 Every vertex may contain
a unique ID.
hosted by
Something about Graph
Graph Structures - Vertices
 Vertices are the nodes or
points in a graph
structure
 Every vertex may contain
a unique ID.
 Vertices can be
associated with a set of
properties (key-value
pairs)
hosted by
Something about Graph
Graph Structures - Edges
 Edges are the
connections between the
vertices in a graph
hosted by
Something about Graph
Graph Structures - Edges
 Edges are the connections
between the vertices in a
graph
 Edges can be
nondirectional, directional,
or bidirectional
hosted by
Something about Graph
Graph Structures - Edges
 Edges are the connections
between the vertices in a
graph
 Edges can be
nondirectional,
directional, or
bidirectional
 Edges like vertices can
have properties and id
hosted by
Something about Graph
Graph Structures - Graph
 G = (V, E)
 The graph is the
collection of vertices,
edges, and associated
properties
 Vertices and edges can
use label classification
hosted by
Something about Graph
Graph Storage Model - Adjacency Matrix
0 1 1 1 0 0
1 0 0 0 0 0
1 0 0 1 0 0
1 0 1 0 1 0
0 0 0 1 0 0
0 0 1 0 0 0
1 2 3 4 5 6G.vertices =
G.edges = 1
2
3
4
5
6
1 2 3 4 5 6
hosted by
Something about Graph
Graph Storage Model - Adjacency Lists
1
2
3
4
5
6
2 3 4 Λ
1 Λ
1 4 6 Λ
1 3 5 Λ
4 Λ
3 Λ
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
Introduction to JanusGraph
 Scalable graph database distribute on multi-maching clusters with
pluggable storage and indexing.
 Fully compliant with Apache TinkerPop graph computing framework.
 Optimized for storing/querying billions of vertices and edges.
 Supports thousands of concurrent users.
 Can execute local queries (OLTP) or cross-cluster distributed queries
(OLAP).
 Sponsored by the Linux Foundation.
 Apache License 2.0
hosted by
Introduction to JanusGraph
Architecture
hosted by
Introduction to JanusGraph
Apache Tinkerpop & Gremlin
 A graph computing
framework for both graph
databases (OLTP) and
graph analytic systems
(OLAP)
 Gremlin graph traversal
language
hosted by
Introduction to JanusGraph
Schema and Data Modeling
 Consist of edge labels, property keys, vertex labels ,index
 Explicit or Implicit
 Can evolve over time without database downtime
hosted by
Introduction to JanusGraph
Schema - Edge Label Multiplicity
 MULTI: Multiple edges of the same label between vertices
 SIMPLE: One edge with that label (unique per label)
 MANY2ONE: One outgoing edge with that label
 ONE2MANY: One incoming edge with that label
 ONE2ONE: One incoming, one outgoing edge with that label
hosted by
Introduction to JanusGraph
Schema - Property Key Data Types
hosted by
Introduction to JanusGraph
Schema - Property Key Cardinality
 SINGLE: At most one value per element.
 LIST: Arbitrary number of values per element. Allows duplicates.
 SET: Multiple values, but no duplicates.
hosted by
Introduction to JanusGraph
Storage Model
hosted by
Introduction to JanusGraph
What is Graph Partitioning?
 When the JanusGraph cluster consists of multiple storage backend
instances, the graph must be partitioned across those machines.
 Stores graph in an adjacency list , ssignment of vertices to machines
determines the partitioning.
 Different ways to partition a graph
 Random Graph Partitioning
 Explicit Graph Partitioning
hosted by
Introduction to JanusGraph
Random Graph Partitioning
 Pros
 Very efficient
 Requires no configuration
 Results in balanced partitions
 Cons
 Less efficient query processing as the cluster grows
 Requires more cross-instance communication to retrieve the
desired
hosted by
Introduction to JanusGraph
Explicit Graph Partitioning
 Pros
 Ensures strongly connected subgraphs are stored on the same
instance
 Reduces the communication overhead significantly
 Easy to setup
 Cons
 Only enabled against storage backends that support ordered key
 Hotspot issue
hosted by
Introduction to JanusGraph
Edge Cut & Vertex Cut
 Edge Cut
 Vertices are hosted on separate machines.
 Optimization aims to reduce the cross communication and thereby
improve query execution.
 Vertex Cut (by label)
 A vertex label can be defined as partitioned which means that all
vertices of that label will be partitiond across the cluster.
 In other words, Storing a subset of that vertex’s adjacency list on each
partition .
 Address the hotspot issue caused by vertices with a large number of
incident edges.
hosted by
Introduction to JanusGraph
What is Graph Index?
 graph indexes : efficient retrieval of vertices or edges by their
properties
 Composite Index (supported through the primary storage
backend)
 Mixed Index (supported through external indexing backend)
 vertex-centric indexes : effectively address query performance for
large degree vertices
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
 Tight integration with the Apache Hadoop ecosystem.
 Native support for strong consistency.
 Linear scalability with the addition of more machines.
 Scalability and partitioning
 Read and write speed
 Big enough for your biggest graph
 Support for exporting metrics via JMX.
 Great open community
hosted by
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
 Simple configuration
 storage.backend=hbase
 storage.hostname=zk-host1,zk-host2,zk-host3
 storage.hbase.table=janusgraph
 storage.port=2181
 storage.hbase.ext.zookeeper.znode.parent=/hbase
hosted by
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
 A variety of reading and writing way
 Batch to mutate
 Get or Multi Get
 Key range scan
 ColumnRangeFilter
 ColumnPaginationFilter
hosted by
JanusGraph with HBase
HBase Storage Model - Column Families
CF attributes can be set. E.g. compression, TTL.
 Edge store -> e
 Index store -> g
 Id store -> i
 Transaction log store -> l
 System property store -> s
hosted by
JanusGraph with HBase
HBase Storage Model - Edge store -> e
 Storage vertex label, edge, property data
 RowKey -> Vertex ID
• Count
• ID padding
• Partition ID
 Vertex label save as edge
 Vertex property and edge save as relation
• Relation ID( Property key id / Edge label id + direction )
hosted by
JanusGraph with HBase
HBase Storage Model - Edge store -> e
hosted by
JanusGraph with HBase
HBase Storage Model - Index store -> g
 Storage graph indexes (Composite Index) data
 Rowkey -> property values
 Cell value->
• relationId
• outVertexId
• typeId
• inVertexId
hosted by
JanusGraph with HBase
Optimization Suggestions
 hbase.regionserver.thread.compaction.large/small
 hbase.hstore.flusher.count
 hbase.hregion.memstore.flush.sizeh
 base.hregion.memstore.block.multiplier
 hbase.hregion.percolumnfamilyflush.size.lower.bound
 hbase.regionserver.global.memstore.size
 hfile.block.cache.size
 hbase.regionserver.global.memstore.size.lower.limit
(hbase.regionserver.global.memstore.lowerLimit)
 Random vs. Explicit Partitioning
hosted by
Thanks

Weitere ähnliche Inhalte

Was ist angesagt?

Hardware Assisted Latency Investigations
Hardware Assisted Latency InvestigationsHardware Assisted Latency Investigations
Hardware Assisted Latency Investigations
ScyllaDB
 

Was ist angesagt? (20)

Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 
Apache Spark 2.4 and 3.0 What's Next?
Apache Spark 2.4 and 3.0  What's Next? Apache Spark 2.4 and 3.0  What's Next?
Apache Spark 2.4 and 3.0 What's Next?
 
Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Hive on Spark の設計指針を読んでみた
Hive on Spark の設計指針を読んでみたHive on Spark の設計指針を読んでみた
Hive on Spark の設計指針を読んでみた
 
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCERCEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
 
MapReduce/YARNの仕組みを知る
MapReduce/YARNの仕組みを知るMapReduce/YARNの仕組みを知る
MapReduce/YARNの仕組みを知る
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Hardware Assisted Latency Investigations
Hardware Assisted Latency InvestigationsHardware Assisted Latency Investigations
Hardware Assisted Latency Investigations
 
Apache Hadoopに見るJavaミドルウェアのcompatibility(Open Developers Conference 2020 Onli...
Apache Hadoopに見るJavaミドルウェアのcompatibility(Open Developers Conference 2020 Onli...Apache Hadoopに見るJavaミドルウェアのcompatibility(Open Developers Conference 2020 Onli...
Apache Hadoopに見るJavaミドルウェアのcompatibility(Open Developers Conference 2020 Onli...
 
Using ScyllaDB for Distribution of Game Assets in Unreal Engine
Using ScyllaDB for Distribution of Game Assets in Unreal EngineUsing ScyllaDB for Distribution of Game Assets in Unreal Engine
Using ScyllaDB for Distribution of Game Assets in Unreal Engine
 
リペア時間短縮にむけた取り組み@Yahoo! JAPAN #casstudy
リペア時間短縮にむけた取り組み@Yahoo! JAPAN #casstudyリペア時間短縮にむけた取り組み@Yahoo! JAPAN #casstudy
リペア時間短縮にむけた取り組み@Yahoo! JAPAN #casstudy
 
これがCassandra
これがCassandraこれがCassandra
これがCassandra
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 

Ähnlich wie HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase

Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Flink Forward
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Andrey Vykhodtsev
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang
 

Ähnlich wie HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase (20)

Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil Ambagade
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
The ABC of Big Data
The ABC of Big DataThe ABC of Big Data
The ABC of Big Data
 
Apache Hive for modern DBAs
Apache Hive for modern DBAsApache Hive for modern DBAs
Apache Hive for modern DBAs
 

Mehr von Michael Stack

Mehr von Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 

Kürzlich hochgeladen

Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
ChloeMeadows1
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
lolsDocherty
 

Kürzlich hochgeladen (16)

Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirts
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirts
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdf
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of apps
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
GOOGLE Io 2024 At takes center stage.pdf
GOOGLE Io 2024 At takes center stage.pdfGOOGLE Io 2024 At takes center stage.pdf
GOOGLE Io 2024 At takes center stage.pdf
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirt
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdf
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 

HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase

  • 1. hosted by HBaseConAsia2018 JanusGraph — Distributed graph database with HBase XueMin Zhang @ TalkingData
  • 2. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 3. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 4. hosted by About us • Seven years of practical experience in technical research and development(R&D),focusing on distributed storage, distributed computing, real-time computing, etc. • Successively worked in Sina Weibo and TalkingData, and served as the big data Team Leader of Sina r&d center. • Technical speechers on the platforms of China Hadoop, Strata Hadoop/Data Conference and DTCC. About me • Founded in 2011, TalkingData is China’s leading third-party big data platform. With SmartDP as the core of its data intelligence application ecosystem, TalkingData empowers enterprises and helps them achieve a data-driven digital transformation. • From the beginning, TalkingData’s vision of using “big data for smarter business decisions and a better world” has allowed it to gradually become China’s leading data intelligence solution provider. TalkingData creates value for clients and serves as their “performance partner,” helping modern enterprises achieve data-driven transformation and accelerating the digitization of clients from various industries. Using data-generated insights to change how people see the world and themselves, TalkingData hopes to ultimately improve people’s lives. About TalkingData
  • 5. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 6. hosted by Something about Graph What is a Graph Database  As name suggests, it is a database.  Uses graph structures for semantic queries with nodes, edges and properties to represent and store data.  Allow data in the store to be linked together directly.  compare with traditional relational databases  Hybrid relations.  Handy in finding connections between entities.
  • 7. hosted by Something about Graph Graph Structures - Vertices  Vertices are the nodes or points in a graph structure  Every vertex may contain a unique ID.
  • 8. hosted by Something about Graph Graph Structures - Vertices  Vertices are the nodes or points in a graph structure  Every vertex may contain a unique ID.  Vertices can be associated with a set of properties (key-value pairs)
  • 9. hosted by Something about Graph Graph Structures - Edges  Edges are the connections between the vertices in a graph
  • 10. hosted by Something about Graph Graph Structures - Edges  Edges are the connections between the vertices in a graph  Edges can be nondirectional, directional, or bidirectional
  • 11. hosted by Something about Graph Graph Structures - Edges  Edges are the connections between the vertices in a graph  Edges can be nondirectional, directional, or bidirectional  Edges like vertices can have properties and id
  • 12. hosted by Something about Graph Graph Structures - Graph  G = (V, E)  The graph is the collection of vertices, edges, and associated properties  Vertices and edges can use label classification
  • 13. hosted by Something about Graph Graph Storage Model - Adjacency Matrix 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 2 3 4 5 6G.vertices = G.edges = 1 2 3 4 5 6 1 2 3 4 5 6
  • 14. hosted by Something about Graph Graph Storage Model - Adjacency Lists 1 2 3 4 5 6 2 3 4 Λ 1 Λ 1 4 6 Λ 1 3 5 Λ 4 Λ 3 Λ
  • 15. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 16. hosted by Introduction to JanusGraph  Scalable graph database distribute on multi-maching clusters with pluggable storage and indexing.  Fully compliant with Apache TinkerPop graph computing framework.  Optimized for storing/querying billions of vertices and edges.  Supports thousands of concurrent users.  Can execute local queries (OLTP) or cross-cluster distributed queries (OLAP).  Sponsored by the Linux Foundation.  Apache License 2.0
  • 17. hosted by Introduction to JanusGraph Architecture
  • 18. hosted by Introduction to JanusGraph Apache Tinkerpop & Gremlin  A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP)  Gremlin graph traversal language
  • 19. hosted by Introduction to JanusGraph Schema and Data Modeling  Consist of edge labels, property keys, vertex labels ,index  Explicit or Implicit  Can evolve over time without database downtime
  • 20. hosted by Introduction to JanusGraph Schema - Edge Label Multiplicity  MULTI: Multiple edges of the same label between vertices  SIMPLE: One edge with that label (unique per label)  MANY2ONE: One outgoing edge with that label  ONE2MANY: One incoming edge with that label  ONE2ONE: One incoming, one outgoing edge with that label
  • 21. hosted by Introduction to JanusGraph Schema - Property Key Data Types
  • 22. hosted by Introduction to JanusGraph Schema - Property Key Cardinality  SINGLE: At most one value per element.  LIST: Arbitrary number of values per element. Allows duplicates.  SET: Multiple values, but no duplicates.
  • 23. hosted by Introduction to JanusGraph Storage Model
  • 24. hosted by Introduction to JanusGraph What is Graph Partitioning?  When the JanusGraph cluster consists of multiple storage backend instances, the graph must be partitioned across those machines.  Stores graph in an adjacency list , ssignment of vertices to machines determines the partitioning.  Different ways to partition a graph  Random Graph Partitioning  Explicit Graph Partitioning
  • 25. hosted by Introduction to JanusGraph Random Graph Partitioning  Pros  Very efficient  Requires no configuration  Results in balanced partitions  Cons  Less efficient query processing as the cluster grows  Requires more cross-instance communication to retrieve the desired
  • 26. hosted by Introduction to JanusGraph Explicit Graph Partitioning  Pros  Ensures strongly connected subgraphs are stored on the same instance  Reduces the communication overhead significantly  Easy to setup  Cons  Only enabled against storage backends that support ordered key  Hotspot issue
  • 27. hosted by Introduction to JanusGraph Edge Cut & Vertex Cut  Edge Cut  Vertices are hosted on separate machines.  Optimization aims to reduce the cross communication and thereby improve query execution.  Vertex Cut (by label)  A vertex label can be defined as partitioned which means that all vertices of that label will be partitiond across the cluster.  In other words, Storing a subset of that vertex’s adjacency list on each partition .  Address the hotspot issue caused by vertices with a large number of incident edges.
  • 28. hosted by Introduction to JanusGraph What is Graph Index?  graph indexes : efficient retrieval of vertices or edges by their properties  Composite Index (supported through the primary storage backend)  Mixed Index (supported through external indexing backend)  vertex-centric indexes : effectively address query performance for large degree vertices
  • 29. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 30. hosted by JanusGraph with HBase HBase – Perfect Storage Backend for JanusGraph  Tight integration with the Apache Hadoop ecosystem.  Native support for strong consistency.  Linear scalability with the addition of more machines.  Scalability and partitioning  Read and write speed  Big enough for your biggest graph  Support for exporting metrics via JMX.  Great open community
  • 31. hosted by JanusGraph with HBase HBase – Perfect Storage Backend for JanusGraph  Simple configuration  storage.backend=hbase  storage.hostname=zk-host1,zk-host2,zk-host3  storage.hbase.table=janusgraph  storage.port=2181  storage.hbase.ext.zookeeper.znode.parent=/hbase
  • 32. hosted by JanusGraph with HBase HBase – Perfect Storage Backend for JanusGraph  A variety of reading and writing way  Batch to mutate  Get or Multi Get  Key range scan  ColumnRangeFilter  ColumnPaginationFilter
  • 33. hosted by JanusGraph with HBase HBase Storage Model - Column Families CF attributes can be set. E.g. compression, TTL.  Edge store -> e  Index store -> g  Id store -> i  Transaction log store -> l  System property store -> s
  • 34. hosted by JanusGraph with HBase HBase Storage Model - Edge store -> e  Storage vertex label, edge, property data  RowKey -> Vertex ID • Count • ID padding • Partition ID  Vertex label save as edge  Vertex property and edge save as relation • Relation ID( Property key id / Edge label id + direction )
  • 35. hosted by JanusGraph with HBase HBase Storage Model - Edge store -> e
  • 36. hosted by JanusGraph with HBase HBase Storage Model - Index store -> g  Storage graph indexes (Composite Index) data  Rowkey -> property values  Cell value-> • relationId • outVertexId • typeId • inVertexId
  • 37. hosted by JanusGraph with HBase Optimization Suggestions  hbase.regionserver.thread.compaction.large/small  hbase.hstore.flusher.count  hbase.hregion.memstore.flush.sizeh  base.hregion.memstore.block.multiplier  hbase.hregion.percolumnfamilyflush.size.lower.bound  hbase.regionserver.global.memstore.size  hfile.block.cache.size  hbase.regionserver.global.memstore.size.lower.limit (hbase.regionserver.global.memstore.lowerLimit)  Random vs. Explicit Partitioning