POWERS OF TEN REDUX
JASON PLURAD • @pluradj
IBM • APACHE TINKERPOP • JANUSGRAPH
DATA DAY TEXAS • #DDTX18 • JANUARY 27, 2018
OPEN SOURCE GRAPH TECH
• Property Graph: Connected Data Model
• Apache TinkerPop™: Graph Computing Framework
• JanusGraph®: Scalable Graph Database
Image credits: Apache TinkerPop (ALv2) and JanusGraph (CC-BY-4.
POWERS OF
TEN
Stephen Mallette
Image credit: spmallette on Twitter
10^1
TEN
Dart Paper Airplane
Image credit: Akkana on Wikimedia Commons, CC BY-SA 3.0
GRAPH TRAVERSALS
Vertex
id: 0
label: person
• name: Jason
Vertex
id: 2
label: airplane
• name: Dart
• type: paper
Edge
id: 5, outV: 0, inV: 2
label: throws
• distance: 10
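The vertex and edge on this slide can be sketched with plain Python dictionaries. This is only an in-memory stand-in to make the model concrete: the `vertices`/`edges` layout and the `out` helper are assumptions for illustration, not the TinkerPop API.

```python
# Illustrative in-memory stand-in for the property graph on the slide.
# Plain dicts are an assumption made for this sketch; they are not TinkerPop's API.
vertices = {
    0: {"label": "person", "properties": {"name": "Jason"}},
    2: {"label": "airplane", "properties": {"name": "Dart", "type": "paper"}},
}
edges = {
    5: {"label": "throws", "outV": 0, "inV": 2, "properties": {"distance": 10}},
}


def out(vertex_id, edge_label):
    """Return ids of vertices reached over outgoing edges with the given label."""
    return [e["inV"] for e in edges.values()
            if e["outV"] == vertex_id and e["label"] == edge_label]


# Roughly what g.V(0).out('throws').values('name') expresses in Gremlin.
thrown = [vertices[v]["properties"]["name"] for v in out(0, "throws")]
print(thrown)
```

A traversal is then a walk from a starting vertex across labeled edges, reading properties at the end.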
10^3
ONE THOUSAND
Wright Flyer
Image credit: John T. Daniels on Wikimedia Commons, Public Domain
GREMLIN CONSOLE
• Read-Eval-Print Loop
• Instant gratification
• Help with reproducible scripts
Image credit: Apache TinkerPop, ALv2
AIR ROUTES DATA, CSV TO PROPERTY GRAPH
Vertex
id: 0
label: airport
• code: AAE
• desc: Annaba
Vertex
id: 2
label: airport
• code: ALG
• desc: Algiers
Edge
id: 5, outV: 0, inV: 2
label: route
• distance: 254
airports.csv (3,374)
routes.csv (43,400)
CSV LOADING
• Leverage CSV libraries
• Be aware of auto-iteration
• Get-or-Create pattern with coalesce()
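The get-or-create idea can be sketched without a graph database, using Python's `csv` module and a dictionary in place of an index lookup. The column names, the sample rows, and the `get_or_create_airport` helper are assumptions for illustration. In Gremlin the same pattern is commonly written as `fold().coalesce(unfold(), addV(...))`. Keep in mind that the Gremlin Console iterates traversals automatically, whereas a script or program has to end the traversal with a terminal step such as `next()` or `iterate()`.

```python
import csv
import io

# Sample rows modelled on the slide; the column names are assumptions.
AIRPORTS_CSV = "code,desc\nAAE,Annaba\nALG,Algiers\nAAE,Annaba\n"

vertices = {}    # vertex id -> properties
code_index = {}  # airport code -> vertex id (stands in for an index lookup)


def get_or_create_airport(code, desc):
    """Return the id of the airport vertex for `code`, creating it only if missing."""
    vertex_id = code_index.get(code)
    if vertex_id is None:
        vertex_id = len(vertices)
        vertices[vertex_id] = {"label": "airport", "code": code, "desc": desc}
        code_index[code] = vertex_id
    return vertex_id


# The duplicate AAE row resolves to the existing vertex instead of creating a new one.
ids = [get_or_create_airport(row["code"], row["desc"])
       for row in csv.DictReader(io.StringIO(AIRPORTS_CSV))]
```

Because the loader is idempotent per key, re-running it over the same file does not create duplicate airports.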
10^5
ONE HUNDRED THOUSAND
Spirit of St. Louis
Image credit: Ad Meskens on Wikimedia Commons, CC BY-SA 3.0
GREMLIN SERVER AND REMOTE GRAPHS
• Gremlin Language Variants (GLV) for queries, not for bulkload
• Gremlin Client Drivers enable efficient batch scripting
• Use Script Parameterization. Period.
Image credit: Apache TinkerPop, ALv2
NO PARAMETERIZATION
• Each script gets compiled and cached on the server – EXPENSIVE
• Eventually will exceed the GC overhead limit
BASIC PARAMETERIZATION
• Script is compiled once and reused on future requests
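The difference between the two approaches can be made concrete by counting distinct script strings, since the server's script cache is keyed on the script text. The sketch below only builds strings and dictionaries; the script text and the `code` binding name are illustrative assumptions.

```python
codes = ["AAE", "ALG"]

# No parameterization: the value is baked into the script text,
# so each distinct value yields a distinct script to compile and cache.
inlined = {"g.V().has('airport', 'code', '%s')" % c for c in codes}

# Basic parameterization: the script text is constant; only the bindings vary.
SCRIPT = "g.V().has('airport', 'code', code)"
requests = [{"gremlin": SCRIPT, "bindings": {"code": c}} for c in codes]
parameterized = {r["gremlin"] for r in requests}

print(len(inlined), len(parameterized))
```

With thousands of distinct values, the inlined form produces thousands of cache entries while the parameterized form still produces one.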
ADVANCED PARAMETERIZATION
• Leverage Groovy script evaluation to handle more complex scripts
Gremlin-Groovy script
Parameters JSON
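A sketch of how a script and its parameters JSON might fit together: one constant Gremlin-Groovy script loops over a list bound to `rows`, so a whole batch travels in a single request. The script body, the `rows` binding name, and the `build_request` helper are hypothetical. The message layout is modelled on the Gremlin Server `eval` request and should be checked against the driver in use.

```python
import json
import uuid

# Hypothetical Gremlin-Groovy script: loops over a list bound to `rows`.
SCRIPT = """
rows.each { row ->
  g.V().has('airport', 'code', row.code).fold().
    coalesce(unfold(), addV('airport').property('code', row.code)).
    property('desc', row.desc).iterate()
}
rows.size()
"""


def build_request(rows):
    """Build an eval request whose only varying part is the `rows` binding."""
    return {
        "requestId": str(uuid.uuid4()),
        "op": "eval",
        "processor": "",
        "args": {
            "gremlin": SCRIPT,
            "bindings": {"rows": rows},
            "language": "gremlin-groovy",
        },
    }


batch = [{"code": "AAE", "desc": "Annaba"}, {"code": "ALG", "desc": "Algiers"}]
payload = json.dumps(build_request(batch))
```

The script text never changes between batches, so it is compiled once, and the batch size can be tuned without touching the script.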
STRUCTURED RETURN VALUES
• Serializing all vertex properties and values can be expensive
• Judiciously decide what to include in the response
• Leverage Groovy scripting in combination with Gremlin traversals for maximum efficiency
Image credit: Apache TinkerPop, ALv2
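The cost of returning everything can be illustrated by comparing serialized sizes. The vertex below, including its oversized `notes` property, is made up for the sketch, and `slim` is a hypothetical helper. In Gremlin, a comparable effect comes from returning `values('code')` or `valueMap('code')` rather than whole vertices.

```python
import json

# Made-up vertex with one large property, standing in for a "heavy" vertex.
vertex = {
    "id": 0,
    "label": "airport",
    "properties": {"code": "AAE", "desc": "Annaba", "notes": "x" * 1000},
}


def slim(v, keys):
    """Keep only the id and the named properties the caller will actually use."""
    return {"id": v["id"], **{k: v["properties"][k] for k in keys}}


full_size = len(json.dumps(vertex))
slim_size = len(json.dumps(slim(vertex, ["code"])))
print(full_size, slim_size)
```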
10^6
ONE MILLION
Cessna 172 Skyhawk
Image credit: Adrian Pingstone on Wikimedia Commons, Public Domain
JANUSGRAPH
• Open source project with open governance
• Community driven development
• Full implementation of Apache TinkerPop
• Apache license
• Broad adoption
Image credits: The Linux Foundation® and JanusGraph (CC-BY-4.
JANUSGRAPH STORAGE BACKENDS
• In-Memory
• Apache Cassandra, ScyllaDB
• Apache HBase, Google Cloud Bigtable
• Oracle Berkeley DB Java Edition
• Amazon DynamoDB
Image credit: Apache TinkerPop, ALv2
JANUSGRAPH SCHEMA AND INDEXING
• Graph schema
  • Vertex labels
  • Edge labels: multiplicity
  • Vertex properties: data types, cardinality
• Indexing
  • Composite index: exact matches
  • Mixed index: full-text search, numerical range, geospatial
  • Vertex-centric index: local per vertex, a solution for supernodes
Image credit: JanusGraph, CC-BY-4.0
JANUSGRAPH QUICK-START DISTRIBUTION
• Local server mode
  • Client, Storage, and Gremlin Server on a single machine
  • Great for testing out JanusGraph, but not recommended for production use
JANUSGRAPH DEPLOYMENT OPTIONS
• Remote server mode
  • Client on first machine
  • Storage on second machine
• Remote server mode with Gremlin Server
  • Client on first machine
  • Gremlin Server on second machine
  • Storage on third machine
Image credit: JanusGraph, CC-BY-4.0
10^7
TEN MILLION
Bombardier CRJ700
Image credit: Aero Icarus on Wikimedia Commons, CC BY-SA 2.0
BATCHGRAPH FOR BOUTIQUE GRAPHS
• Wrapper for a graph instance
• Handle intermediate commits
• Maintain vertex cache
• For loading data only
• Not in Apache TinkerPop 3 or JanusGraph
• Moved away from graph wrapper approach
Image credit: Apache TinkerPop, ALv2
REPLACING BATCHGRAPH
• Intermediate commits
  • Count the mutations and commit periodically
• Vertex cache
  • Enable fast lookup of vertices to connect with edges
  • Composite index
  • LRU cache https://github.com/ben-manes/caffeine
  • Pre-sort the data to maximize cache hits
Image credit: Apache TinkerPop, ALv2
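The two BatchGraph replacements can be sketched together: an LRU vertex cache and a loader that commits every `batch_size` mutations. This is a minimal sketch under stated assumptions. `VertexCache` and `load_edges` are hypothetical names, the `lookup` and `commit` callables stand in for an index lookup and a graph transaction commit, and the sample codes are made up. Python's `OrderedDict` plays the role that a library such as Caffeine would play on the JVM.

```python
from collections import OrderedDict


class VertexCache:
    """Small LRU cache mapping a business key to a vertex id."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = loader(key)                   # e.g. an index lookup in the graph
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value


def load_edges(rows, cache, lookup, commit, batch_size):
    """Resolve edge endpoints through the cache, committing every `batch_size` edges."""
    pending = 0
    edges = []
    for src, dst in sorted(rows):             # pre-sort to group repeated sources
        edges.append((cache.get(src, lookup), cache.get(dst, lookup)))
        pending += 1
        if pending == batch_size:
            commit()
            pending = 0
    if pending:
        commit()                              # flush the final partial batch
    return edges


commits = []
rows = [("ALG", "AAE"), ("AAE", "ALG"), ("AAE", "ORY")]
cache = VertexCache(capacity=2)
edges = load_edges(rows, cache, lookup=lambda code: "v-" + code,
                   commit=lambda: commits.append(1), batch_size=2)
```

Sorting by source key keeps repeated sources adjacent, which is what lets a small cache still produce hits.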
storage.batch-loading
• Disables automatic schema
• Disables transaction logging
• Disables transactions on storage backend
• Bigger dirty transaction cache size
• Disables external vertex existence checks
• Disables consistency checks (verify uniqueness, acquire locks)
Image credit: Apache TinkerPop, ALv2
MULTI-MODEL APPROACHES
• Only store the data you need for graph queries in the graph
• Rehydrate non-graph properties from another store
• Direct index queries
Image credit: Apache TinkerPop, ALv2
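One way to picture the multi-model split: the graph keeps only what traversals need, and descriptive fields are fetched from a second store after the traversal. Both stores below are plain dictionaries, and `document_store` and `destinations` are hypothetical names used only for illustration.

```python
# Graph side: only what traversals need (ids, labels, connectivity, join key).
graph_vertices = {
    0: {"label": "airport", "code": "AAE"},
    2: {"label": "airport", "code": "ALG"},
}
graph_edges = [{"label": "route", "outV": 0, "inV": 2, "distance": 254}]

# Hypothetical external store keyed by airport code, holding non-graph properties.
document_store = {"AAE": {"desc": "Annaba"}, "ALG": {"desc": "Algiers"}}


def destinations(vertex_id):
    """Traverse in the graph, then rehydrate each result from the other store."""
    result = []
    for edge in graph_edges:
        if edge["outV"] == vertex_id and edge["label"] == "route":
            code = graph_vertices[edge["inV"]]["code"]
            result.append({"code": code, **document_store[code]})
    return result


routes = destinations(0)
```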
10^8
ONE HUNDRED MILLION
Boeing 737
Image credit: JTOcchialini on Wikimedia Commons, CC BY-SA 2.0
FAUNUS / TITAN-HADOOP
• Faunus was the distributed graph analytics engine from Aurelius
• Used Hadoop to do breadth-first traversals using MapReduce
• OLAP abstraction was pulled into Apache TinkerPop 3
Image credit: Apache TinkerPop, ALv2
HADOOPGRAPH I/O FORMATS
• TinkerPop formats pull from files
  • GraphSONInputFormat
  • GryoInputFormat
  • ScriptInputFormat
• JanusGraph formats pull from storage
  • Cassandra3InputFormat
  • HBaseInputFormat
Image credit: JanusGraph, CC-BY-4.0
SPARKGRAPHCOMPUTER AND BULKLOADERVERTEXPROGRAM
• Flexible Spark deployment options
  • Spark local with multiple threads
  • Spark master with multiple workers
• Configure BLVP with ScriptInputFormat
  • Script and data shared across workers via HDFS
• Assorted tips
  • Pre-define schema before loading
  • Define an index on “bulkLoader.vertex.id”
  • gremlin.spark.persistStorageLevel=DISK_ONLY
Image credit: Apache TinkerPop, ALv2
10^9
ONE BILLION
Airbus A380
Image credit: Maarten Visser on Wikipedia, CC BY-SA 2.0
FULLY-DISTRIBUTED CLUSTER COMPUTING
• Same loading mechanics as pseudo-distributed
• Consider a Hadoop distribution, like Apache Ambari or Hortonworks Data Platform
• Be aware of differences between distributions, especially software versions
Image credit: Apache TinkerPop, ALv2
DON’T WHEELIE THE DUCATI
Ducati Wheelie
Image credit: David Hurt on Flickr, CC BY 2.0
THANK
YOU!
@pluradj
RESOURCES
• Apache TinkerPop
• @apachetinkerpop
• https://tinkerpop.apache.org
• JanusGraph
• @janusgraph
• https://janusgraph.org
• Powers of Ten
• Stephen Mallette @spmallette
• https://www.datastax.com/dev/blog/powers-of-ten-part-i
• https://www.datastax.com/dev/blog/powers-of-ten-part-ii
• Practical Gremlin
• Kelvin Lawrence @gfxman
• https://github.com/krlawrence/graph
• JanusGraph Code Patterns
• IBM Code @ibmcode
• https://github.com/IBM/janusgraph-utils
• HadoopMarcʼs Blog
• http://yaaics.blogspot.com
• JanusGraph Nuts and Bolts
• Ted Wilmes @trwilmes
• https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance

 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 

Powers of Ten Redux

  • 1. POWERS OF TEN REDUX JASON PLURAD • @pluradj IBM • APACHE TINKERPOP • JANUSGRAPH DATA DAY TEXAS • #DDTX18 • JANUARY 27, 2018
  • 2. OPEN SOURCE GRAPH TECH Property Graph Connected Data Model Apache TinkerPop™ Graph Computing Framework JanusGraph® Scalable Graph Database Image credits: Apache TinkerPop (ALv2) and JanusGraph (CC-BY-4.
  • 3. POWERS OF TEN Stephen Mallette Image credit: spmallette on Twitter
  • 4. 10¹ TEN Dart Paper Airplane Image credit: Akkana on Wikimedia Commons, CC BY-SA 3.0
  • 5. GRAPH TRAVERSALS Vertex id: 0 label: person • name: Jason Vertex id: 2 label: airplane • name: Dart • type: paper Edge id: 5, outV: 0, inV: 2 label: throws • distance: 10
  • 6. 10³ ONE THOUSAND Wright Flyer Image credit: John T. Daniels on Wikimedia Commons, Public Domain
  • 7. GREMLIN CONSOLE • Read-Eval-Print Loop • Instant gratification • Help with reproducible scripts Image credit: Apache TinkerPop, ALv2
  • 8. AIR ROUTES DATA, CSV TO PROPERTY GRAPH Vertex id: 0 label: airport • code: AAE • desc: Annaba Vertex id: 2 label: airport • code: ALG • desc: Algiers Edge id: 5, outV: 0, inV: 2 label: route • distance: 254 airports.csv (3,374) routes.csv (43,400)
  • 9. CSV LOADING • Leverage CSV libraries • Be aware of auto-iteration • Get-or-Create pattern with coalesce()
  • 10. 10⁵ ONE HUNDRED THOUSAND Spirit of St. Louis Image credit: Ad Meskens on Wikimedia Commons, CC BY-SA 3.0
  • 11. GREMLIN SERVER AND REMOTE GRAPHS • Gremlin Language Variants (GLV) for queries, not for bulkload • Gremlin Client Drivers enable efficient batch scripting • Use Script Parameterization. Period. Image credit: Apache TinkerPop, ALv2
  • 12. NO PARAMETERIZATION • Each script gets compiled and cached on the server – EXPENSIVE • Eventually will exceed the GC overhead limit
  • 13. BASIC PARAMETERIZATION • Script is compiled once and reused on future requests
  • 14. ADVANCED PARAMETERIZATION • Leverage Groovy script evaluation to handle more complex scripts Gremlin-Groovy script Parameters JSON
  • 15. STRUCTURED RETURN VALUES • Serializing all vertex properties and values can be expensive • Judiciously decide what to include in the response • Leverage Groovy scripting in combination with Gremlin traversals for maximum efficiency Image credit: Apache TinkerPop, ALv2
  • 16. 10⁶ ONE MILLION Cessna 172 Skyhawk Image credit: Adrian Pingstone on Wikimedia Commons, Public Domain
  • 17. JANUSGRAPH • Open source project with open governance • Community driven development • Full implementation of Apache TinkerPop • Apache license • Broad adoption Image credits: The Linux Foundation® and JanusGraph (CC-BY-4.
  • 18. JANUSGRAPH STORAGE BACKENDS • In-Memory • Apache Cassandra, ScyllaDB • Apache HBase, Google Cloud Bigtable • Oracle Berkeley DB Java Edition • Amazon DynamoDB Image credit: Apache TinkerPop, ALv2
  • 19. JANUSGRAPH SCHEMA AND INDEXING • Graph schema • Vertex labels • Edge labels: multiplicity • Vertex properties: data types, cardinality • Indexing • Composite index: exact matches • Mixed index: full-text search, numerical range, geospatial • Vertex-centric index: local per vertex, a solution for supernodes Image credit: JanusGraph, CC-BY-4.0
  • 20. JANUSGRAPH QUICK-START DISTRIBUTION • Local server mode • Client, Storage, and Gremlin Server on a single machine • Great for testing out JanusGraph, but not recommended for production use
  • 21. JANUSGRAPH DEPLOYMENT OPTIONS • Remote server mode • Client on first machine • Storage on second machine • Remote server mode with Gremlin Server • Client on first machine • Gremlin Server on second machine • Storage on third machine Image credit: JanusGraph, CC-BY-4.0
  • 22. 10⁷ TEN MILLION Bombardier CRJ700 Image credit: Aero Icarus on Wikimedia Commons, CC BY-SA 2.0
  • 23. BATCHGRAPH FOR BOUTIQUE GRAPHS • Wrapper for a graph instance • Handle intermediate commits • Maintain vertex cache • For loading data only • Not in Apache TinkerPop 3 or JanusGraph • Moved away from graph wrapper approach Image credit: Apache TinkerPop, ALv2
  • 24. REPLACING BATCHGRAPH • Intermediate commits • Count the mutations and commit periodically • Vertex cache • Enable fast lookup of vertices to connect with edges • Composite index • LRU cache https://github.com/ben-manes/caffeine • Pre-sort the data to maximize cache hits Image credit: Apache TinkerPop, ALv2
  • 25. storage.batch-loading • Disables automatic schema • Disables transaction logging • Disables transactions on storage backend • Bigger dirty transaction cache size • Disables external vertex existence checks • Disables consistency checks (verify uniqueness, acquire locks) Image credit: Apache TinkerPop, ALv2
  • 26. MULTI-MODEL APPROACHES • Only store the data you need for graph queries in the graph • Rehydrate non-graph properties from another store • Direct index queries Image credit: Apache TinkerPop, ALv2
  • 27. 10⁸ ONE HUNDRED MILLION Boeing 737 Image credit: JTOcchialini on Wikimedia Commons, CC BY-SA 2.0
  • 28. FAUNUS / TITAN-HADOOP • Faunus was the distributed graph analytics engine from Aurelius • Used Hadoop to do breadth-first traversals using MapReduce • OLAP abstraction was pulled into Apache TinkerPop 3 Image credit: Apache TinkerPop, ALv2
  • 29. HADOOPGRAPH I/O FORMATS • TinkerPop formats pull from files • GraphSONInputFormat • GryoInputFormat • ScriptInputFormat • JanusGraph formats pull from storage • Cassandra3InputFormat • HBaseInputFormat Image credit: JanusGraph, CC-BY-4.0
  • 30. SPARKGRAPHCOMPUTER AND BULKLOADERVERTEXPROGRAM • Flexible Spark deployment options • Spark local with multiple threads • Spark master with multiple workers • Configure BLVP with ScriptInputFormat • Script and data shared across workers via HDFS • Assorted tips • Pre-define schema before loading • Define an index on “bulkLoader.vertex.id” • gremlin.spark.persistStorageLevel=DISK_ONLY Image credit: Apache TinkerPop, ALv2
  • 31. 10⁹ ONE BILLION Airbus A380 Image credit: Maarten Visser on Wikimedia Commons, CC BY-SA 2.0
  • 32. FULLY-DISTRIBUTED CLUSTER COMPUTING • Same loading mechanics as pseudo-distributed • Consider a Hadoop distribution, like Apache Ambari or Hortonworks Data Platform • Be aware of differences between distributions, especially software versions Image credit: Apache TinkerPop, ALv2
  • 33. DON’T WHEELIE THE DUCATI Ducati Wheelie Image credit: David Hurt on Flickr, CC BY 2.0
  • 34. THANK YOU! @pluradj RESOURCES • Apache TinkerPop • @apachetinkerpop • https://tinkerpop.apache.org • JanusGraph • @janusgraph • https://janusgraph.org • Powers of Ten • Stephen Mallette @spmallette • https://www.datastax.com/dev/blog/powers-of-ten-part-i • https://www.datastax.com/dev/blog/powers-of-ten-part-ii • Practical Gremlin • Kelvin Lawrence @gfxman • https://github.com/krlawrence/graph • JanusGraph Code Patterns • IBM Code @ibmcode • https://github.com/IBM/janusgraph-utils • HadoopMarc’s Blog • http://yaaics.blogspot.com • JanusGraph Nuts and Bolts • Ted Wilmes @trwilmes • https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance
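
Slides 9 and 12 to 14 recommend the get-or-create pattern with coalesce() and script parameterization. The following is a minimal Python sketch of how the two might fit together on the client side, not the code from the talk: the CSV columns, the sample rows, and the helper names (`build_requests`, `inline_script`) are assumptions for illustration, and the step that actually submits each script and its bindings through a Gremlin driver is left out. It only shows that a parameterized script keeps the script text constant across rows, while splicing values into the text yields a new script for each distinct row.

```python
import csv
import io

# Hypothetical sample rows; the real airports.csv layout is not shown in the slides.
AIRPORTS_CSV = """code,desc
AAE,Annaba
ALG,Algiers
AAE,Annaba
"""

# One constant Gremlin-Groovy script. The values arrive as bindings
# (code, desc), so the script text is identical for every row.
GET_OR_CREATE_AIRPORT = (
    "g.V().has('airport', 'code', code).fold()"
    ".coalesce(__.unfold(),"
    " __.addV('airport').property('code', code).property('desc', desc))"
)


def build_requests(csv_text):
    """Return one (script, bindings) pair per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (GET_OR_CREATE_AIRPORT, {"code": row["code"], "desc": row["desc"]})
        for row in reader
    ]


def inline_script(code, desc):
    """Anti-pattern for comparison: values spliced into the script text."""
    return (
        "g.V().has('airport', 'code', '%s').fold()"
        ".coalesce(__.unfold(), __.addV('airport')"
        ".property('code', '%s').property('desc', '%s'))" % (code, code, desc)
    )


requests = build_requests(AIRPORTS_CSV)

# Distinct script texts the server would have to compile in each style.
distinct_parameterized = {script for script, _ in requests}
distinct_inlined = {inline_script(b["code"], b["desc"]) for _, b in requests}
print(len(distinct_parameterized), len(distinct_inlined))
```

Because the traversal folds the lookup and only adds the vertex when nothing was found, replaying a row that was already loaded should not create a duplicate airport.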

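Slide 24 names two pieces that replace BatchGraph: counting mutations to commit periodically, and an LRU vertex cache for fast edge wiring. A rough Python sketch of both ideas follows. It is a stand-in, not the Caffeine library linked on the slide and not JanusGraph code; the class names `VertexCache` and `BatchCommitter` and the `commit_fn` callback are hypothetical.

```python
from collections import OrderedDict


class VertexCache:
    """Small LRU cache mapping a natural key (e.g. airport code) to a vertex id."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, vertex_id):
        self._entries[key] = vertex_id
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


class BatchCommitter:
    """Counts mutations and calls commit_fn once batch_size is reached."""

    def __init__(self, commit_fn, batch_size):
        self.commit_fn = commit_fn  # hypothetical hook, e.g. a transaction commit
        self.batch_size = batch_size
        self.pending = 0
        self.commits = 0

    def mutated(self, count=1):
        self.pending += count
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        # Commit whatever is left; call once more at the end of the load.
        if self.pending:
            self.commit_fn()
            self.commits += 1
            self.pending = 0


# Usage: five mutations with a batch size of two, then a final flush.
committer = BatchCommitter(commit_fn=lambda: None, batch_size=2)
for _ in range(5):
    committer.mutated()
committer.flush()

cache = VertexCache(capacity=2)
cache.put("AAE", 0)
cache.put("ALG", 2)
cache.get("AAE")      # touching AAE makes ALG the eviction candidate
cache.put("CDG", 4)   # exceeds capacity, so ALG is evicted
```

Pre-sorting the input, as the slide suggests, keeps edges that share endpoints close together, so a small cache like this sees more hits before entries are evicted.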
Editor's notes

  1. One of the first problems a developer encounters when evaluating a graph database is how to construct a graph efficiently. Recognizing this need in 2014, TinkerPop's Stephen Mallette penned a series of blog posts titled "Powers of Ten" which addressed several bulkload techniques for Titan. Since then Titan has gone away, and the open source graph database landscape has evolved significantly. Do the same approaches stand the test of time? In this session, we will take a deep dive into strategies for loading data of various sizes into modern Apache TinkerPop graph systems. We will discuss bulkloading with JanusGraph, the scalable graph database forked from Titan, to better understand how its architecture can be optimized for ingestion.
  2. spmallette on Twitter https://twitter.com/spmallette/status/931575876046729217
  3. Akkana on Wikimedia Commons, CC BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Paperairplane.png
  4. John T. Daniels on Wikimedia Commons, Public Domain https://commons.wikimedia.org/wiki/File:First_flight2.jpg
  5. Ad Meskens on Wikimedia Commons, CC BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Spirit_Of_St_Louis2.jpg
  6. Adrian Pingstone on Wikimedia Commons, Public Domain https://commons.wikimedia.org/wiki/File:Cessna_172S_Skyhawk_at_Bristol_Airport_(England)_23Aug2014_arp.jpg
  7. Aero Icarus on Wikimedia Commons, CC BY-SA 2.0 https://commons.wikimedia.org/wiki/File:Delta_Connection_Canadair_CRJ700;_N603QX@SLC;09.10.2011_621ds_(6299961315).jpg
  8. JTOcchialini on Wikimedia Commons, CC BY-SA 2.0 https://commons.wikimedia.org/wiki/File:WestJet_C-GWSZ_Disney_World_JTPI_9598_(14506120928).jpg
  9. Maarten Visser on Wikimedia Commons, CC BY-SA 2.0 https://commons.wikimedia.org/wiki/File:A6-EDY_A380_Emirates_31_jan_2013_jfk_(8442269364)_(cropped).jpg
  10. David Hurt on Flickr, CC BY 2.0 https://www.flickr.com/photos/davidht/1787402541