Airline Reservations and Routing: A Graph Use Case
Jason Plurad
Chin Huang
DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation
Pilots
Jason Plurad is a software developer in IBM Digital Business Group. He
develops open source software and builds open communities in the big data
and analytics space, with a current focus on graph databases and graph
analytics. He is a Technical Steering Committee member and committer on
JanusGraph and Apache TinkerPop.
Chin Huang is a software engineer in IBM Open Technologies and
Performance. He has worked on various enterprise and open source
projects. His current focus is JanusGraph and Node.js development and
performance characterization.
How Did We Get Here?
Jason
• Raleigh (RDU)
• Detroit (DTW)
• Amsterdam (AMS)
• Berlin (TXL)
Chin
• San Francisco (SFO)
• Copenhagen (CPH)
• Berlin (TXL)
Graphs are not new
Graph Data Use Cases
Social network analysis
Configuration management database
Master data management
Recommendation engines
Knowledge graphs
Internet of things
Cyber security attack analysis
Property Graph
Figure: airport vertices RDU, DTW, AMS, TXL, SFO, CPH
Type: vertex
Label: airport
Name: Berlin Tegel
Code: TXL
City: Berlin
Country: Germany
Type: edge
Label: route
Flight: 343
Distance: 501
Depart: 13:05
Arrive: 14:57
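To make the model concrete, here is a minimal Python sketch of the two elements above: each vertex and edge carries a label plus a map of key/value properties. The `Vertex` and `Edge` dataclasses are illustrative assumptions, not JanusGraph or TinkerPop classes, and because the slide does not say which airports this route connects, the AMS origin is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: Vertex  # vertex the edge leaves (origin airport)
    in_v: Vertex   # vertex the edge enters (destination airport)
    properties: Dict[str, Any] = field(default_factory=dict)


# Properties copied from the slide.
txl = Vertex("airport", {"name": "Berlin Tegel", "code": "TXL",
                         "city": "Berlin", "country": "Germany"})

# Hypothetical origin: the slide does not name the route's endpoints.
ams = Vertex("airport", {"code": "AMS"})

route = Edge("route", ams, txl, {"flight": 343, "distance": 501,
                                 "depart": "13:05", "arrive": "14:57"})

print(route.label, route.out_v.properties["code"], "->",
      route.in_v.properties["code"], "flight", route.properties["flight"])
```

Unlike a table row, each element can carry its own set of properties, which is what lets airports and routes live in the same graph.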
Gremlin: Graph Traversal Language
What is the shortest path to Berlin?
Apache TinkerPop
https://tinkerpop.apache.org
> g.V(rdu).
    repeat( out('route').simplePath() ).
    until( has('code', 'TXL') ).
    limit(5).
    path().by('code').
    toList()
==> [RDU, JFK, TXL]
==> [RDU, LAX, TXL]
==> [RDU, MIA, TXL]
==> [RDU, YYZ, TXL]
==> [RDU, SFO, TXL]
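The traversal can be read as a search over `route` edges that never revisits an airport (`simplePath()`), stops extending a path once it reaches TXL (`until()`), and keeps the first five paths (`limit(5)`). The Python sketch below mimics that behavior with an explicit breadth-first queue on a small, hypothetical route map. It illustrates the semantics only; it is not how TinkerPop executes the query, and it does not establish that a Gremlin engine returns paths in the same order.

```python
from collections import deque
from typing import Dict, List


def simple_paths(routes: Dict[str, List[str]], start: str, goal: str,
                 limit: int) -> List[List[str]]:
    """Breadth-first search for up to `limit` cycle-free paths start -> goal."""
    results: List[List[str]] = []
    queue = deque([[start]])
    while queue and len(results) < limit:
        path = queue.popleft()
        for nxt in routes.get(path[-1], []):
            if nxt in path:                  # simplePath(): no repeated vertices
                continue
            new_path = path + [nxt]
            if nxt == goal:                  # until(): this path is finished
                results.append(new_path)
                if len(results) == limit:    # limit(): stop collecting
                    break
            else:
                queue.append(new_path)       # repeat(): keep following routes
    return results


# Hypothetical route map (origin -> destinations), loosely based on the slide.
routes = {
    "RDU": ["JFK", "LAX", "MIA", "YYZ", "SFO", "DTW"],
    "JFK": ["TXL", "AMS"], "LAX": ["TXL"], "MIA": ["TXL"],
    "YYZ": ["TXL"], "SFO": ["TXL", "CPH"], "DTW": ["AMS"],
    "AMS": ["TXL"], "CPH": ["TXL"],
}

for p in simple_paths(routes, "RDU", "TXL", limit=5):
    print(p)
```

With this map the sketch returns the same five one-stop paths as the slide output; raising `limit` also surfaces longer paths such as RDU, JFK, AMS, TXL.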
JanusGraph
Maintainer: The Linux Foundation
License: Apache
Releases: 0.3.0 planned for 2Q 2018
https://janusgraph.org
• Established in January 2017
• Fork of TitanDB
• Scalable graph database distributed
on multi-machine clusters with
pluggable storage and indexing
• Vendor-neutral, open community with
open governance
• Founders: Expero, Google, Grakn,
Hortonworks, IBM
• Members: Amazon, Huawei,
Netflix, Orchestral Developments,
Seeq, Uber
• In production: Celum, Finc, G-Data, IBM Cloud, Seeq
JanusGraph Architecture
http://docs.janusgraph.org/latest/arch-overview.html
Graph database storage backends: Performance evaluation
Graph use case: Air travel reservation
Performance Test Environment
Server spec
• Physical servers: x3650 M5, 2 sockets x 14
cores, 384 GB (12 x 32 GB) memory
• CPU: Intel Xeon Processor E5-2690 v4 14C
2.6GHz 35MB Cache 2400MHz
• Network interface: Emulex VFA5.2 ML2 Dual
Port 10GbE SFP+ Adapter
• Disk: 720 GB SSD, RAID 5
• Operating system: Ubuntu 16.04.2 LTS
Public tools
• JMeter - load testing tool
• nmon, nmon analyser - system performance
monitoring and analysis tools
• VisualVM - all-in-one Java
troubleshooting and profiling tool
• GCeasy - garbage collection log analysis tool
• Prometheus and Grafana - monitoring
dashboard
JanusGraph Utility Tools
How about graph data in volume?
• Existing data is lacking or unavailable for performance evaluation
• What are the performance characteristics at various data volumes?
• Graph Data Generator generates graph data in different sizes and
shapes, so you can simulate real data and evaluate performance
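A graph data generator can be as simple as a seeded random generator that writes vertex and edge CSV files. This Python sketch is a hypothetical stand-in, not the Graph Data Generator from janusgraph-utils; the column names and value ranges are assumptions, and the property counts (two per vertex, one per edge) only mirror the shape of the insert tests in this deck.

```python
import csv
import io
import random
from typing import Tuple


def generate_graph_csv(num_vertices: int, num_edges: int,
                       seed: int = 42) -> Tuple[str, str]:
    """Return (vertices_csv, edges_csv) for a random directed graph.

    Needs num_vertices >= 2. Vertices get 2 properties, edges get 1.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible data sets

    v_buf = io.StringIO()
    v_out = csv.writer(v_buf)
    v_out.writerow(["id", "code", "weight"])
    for i in range(num_vertices):
        v_out.writerow([i, f"A{i:05d}", rng.randint(1, 100)])

    e_buf = io.StringIO()
    e_out = csv.writer(e_buf)
    e_out.writerow(["out_id", "in_id", "distance"])
    for _ in range(num_edges):
        src = rng.randrange(num_vertices)
        dst = rng.randrange(num_vertices - 1)
        if dst >= src:  # shift past src: no self-loops, still uniform over the rest
            dst += 1
        e_out.writerow([src, dst, rng.randint(50, 5000)])

    return v_buf.getvalue(), e_buf.getvalue()


vertices_csv, edges_csv = generate_graph_csv(num_vertices=1_000, num_edges=3_000)
print(len(vertices_csv.splitlines()) - 1, "vertices,",
      len(edges_csv.splitlines()) - 1, "edges")
```

Varying `num_vertices` and `num_edges` changes both the volume and the average degree, which is one simple way to vary the "shape" of the data.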
How to manage graph schema?
• Lack of graph schema management tools
• Graph schemas may need to change to achieve optimal performance
• Graph Schema Loader enables you to quickly load and update
schema definitions in JanusGraph
How to massively load data into a graph database?
• Many RDBMSs support exporting data to CSV files
• I have millions/billions of records!
• Data Batch Importer allows you to fully utilize system resources to
import data in CSV files into JanusGraph
Open source code: https://github.com/IBM/janusgraph-utils
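The core idea of a batch importer, splitting a large CSV file into fixed-size batches and committing them in parallel, can be sketched in Python as follows. `write_batch` is a hypothetical callback standing in for one graph-database transaction, and the batch size and worker count are illustrative defaults, not values taken from the Data Batch Importer.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(rows: Iterable[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_csv(csv_text: str, write_batch: Callable[[List[List[str]]], int],
               batch_size: int = 1000, workers: int = 4) -> int:
    """Hand CSV rows to `write_batch` in parallel batches; return rows written.

    `write_batch` is a hypothetical stand-in for committing one transaction
    to the graph database; it must return the number of rows it wrote.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batches(reader, batch_size)))


sample = "id,code\n" + "\n".join(f"{i},A{i}" for i in range(2500))
print(import_csv(sample, write_batch=len, batch_size=1000))  # 2500
```

`Executor.map` submits every batch up front, so an importer handling millions or billions of rows would also bound the number of in-flight batches to keep memory in check.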
Performance Test Topology
Figure: a load injector sends insert, update, and query requests to JanusGraph, which is backed by a database cluster running Cassandra, HBase + HDFS + ZooKeeper, or Scylla
Performance Evaluation: Insert Vertices
• 40 million vertices in total
• 2 properties per vertex
• Insert scenario
• Load injectors run at full capacity to generate
load against the databases
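To drive the databases at full capacity, a load injector runs many concurrent operations and reports throughput. The tests here used JMeter; the Python harness below only illustrates the measurement, and `insert_vertex` is a hypothetical in-memory stub rather than a real JanusGraph call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load(op: Callable[[int], None], total_ops: int, threads: int) -> float:
    """Run op(0) .. op(total_ops - 1) on `threads` workers; return ops per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every call and re-raises any worker exception.
        list(pool.map(op, range(total_ops)))
    return total_ops / (time.perf_counter() - start)


store: Dict[int, Dict[str, object]] = {}
lock = threading.Lock()


def insert_vertex(i: int) -> None:
    """Hypothetical stub: 'inserts' a vertex with two properties into a dict."""
    with lock:
        store[i] = {"code": f"V{i}", "weight": i % 100}


rate = run_load(insert_vertex, total_ops=10_000, threads=8)
print(f"{len(store)} vertices inserted at about {rate:,.0f} ops/s")
```

Python's global interpreter lock limits CPU-bound concurrency, but for I/O-bound calls to a remote database, threads are a reasonable way to keep many requests in flight.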
Performance Evaluation: Insert Edges
• 30 million edges in total
• 1 property per edge
• Query and update scenario
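One plausible reading of the query and update scenario is that each new edge first requires looking up its two endpoint vertices, then writing the edge. The Python sketch below shows that pattern with a plain dict standing in for a JanusGraph index lookup; the function name, vertex IDs, and distances are hypothetical.

```python
from typing import Dict, List, Tuple


def insert_edges(vertex_index: Dict[str, int],
                 adjacency: Dict[int, List[Tuple[int, int]]],
                 edges: List[Tuple[str, str, int]]) -> int:
    """Insert (out_code, in_code, distance) edges; return how many were added."""
    inserted = 0
    for out_code, in_code, distance in edges:
        # Query step: find both endpoint vertices by their unique key.
        out_id = vertex_index.get(out_code)
        in_id = vertex_index.get(in_code)
        if out_id is None or in_id is None:
            continue  # unknown endpoint: skip rather than create a dangling edge
        # Update step: attach the edge and its single property.
        adjacency.setdefault(out_id, []).append((in_id, distance))
        inserted += 1
    return inserted


index = {"RDU": 0, "DTW": 1, "AMS": 2, "TXL": 3}   # code -> vertex id
adj: Dict[int, List[Tuple[int, int]]] = {}
n = insert_edges(index, adj, [("RDU", "DTW", 500), ("DTW", "AMS", 4000),
                              ("AMS", "XXX", 100)])
print(n, adj)  # 2 {0: [(1, 500)], 1: [(2, 4000)]}
```

Because every edge costs two lookups before the write, edge loading depends on read performance as well as write throughput, unlike the pure vertex-insert scenario.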
Performance Evaluation: Graph Traversal
Lessons Learned: Storage Backends
Cassandra
• Cluster bootstrapping takes more effort
• Smaller memory footprint
HBase
• Uneven CPU utilization caused by hot regions
• Read and write cache settings need careful
configuration for better throughput
Scylla
• Easy clustering: multiple nodes can be added at once
• Self-tunes well, but lacks documentation
• Load is evenly distributed
• Fully utilizes system resources
• Reported CPU utilization misrepresents the real load
• Good monitoring dashboard (Prometheus +
Grafana)
• Works with existing Cassandra utility clients
Flight Search Use Case
Flight search
• All flights from airport A to airport B on a given date and time
• Number of stops: non-stop, one-stop, two-stop…
Data spec
• 600+ airports, 350K+ flight schedules
Graph Model
• Vertex: Airport, with property airport code
• Vertex: Country, with property country name
• Edge: Flight Schedule, with properties flight #, departure date, and arrival date
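On this model, the flight search is a bounded traversal over Flight Schedule edges: at most `max_stops + 1` legs, a first leg departing on the requested date, and enough time at each connection to change planes. In the Python sketch below, the `Flight` record, the 45-minute minimum connection time, and the sample flights are all hypothetical; the Country vertex is not needed for this query and is omitted.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Flight:
    number: str
    origin: str
    dest: str
    depart: datetime
    arrive: datetime


def search(flights: List[Flight], origin: str, dest: str, day: date,
           max_stops: int,
           min_connect: timedelta = timedelta(minutes=45)) -> List[List[Flight]]:
    """Itineraries origin -> dest whose first leg departs on `day`.

    k legs means k - 1 stops; connections need at least `min_connect`,
    and no airport is visited twice (like Gremlin's simplePath()).
    """
    by_origin: Dict[str, List[Flight]] = {}
    for f in flights:
        by_origin.setdefault(f.origin, []).append(f)

    results: List[List[Flight]] = []
    frontier = [[f] for f in by_origin.get(origin, []) if f.depart.date() == day]
    while frontier:
        next_frontier: List[List[Flight]] = []
        for legs in frontier:
            last = legs[-1]
            if last.dest == dest:
                results.append(legs)          # complete itinerary
                continue
            if len(legs) > max_stops:         # one more leg would exceed max_stops
                continue
            visited = {origin} | {leg.dest for leg in legs}
            for f in by_origin.get(last.dest, []):
                if f.dest not in visited and f.depart >= last.arrive + min_connect:
                    next_frontier.append(legs + [f])
        frontier = next_frontier
    return results


def t(h: int, m: int, d: int = 17) -> datetime:
    return datetime(2018, 4, d, h, m)


flights = [
    Flight("F1", "RDU", "DTW", t(7, 0), t(8, 50)),
    Flight("F2", "DTW", "TXL", t(10, 0), t(18, 0)),
    Flight("F3", "DTW", "AMS", t(9, 0), t(17, 0)),   # 10-minute connection from F1
    Flight("F4", "RDU", "TXL", t(12, 0), t(20, 0)),
    Flight("F5", "DTW", "AMS", t(9, 45), t(16, 0)),
    Flight("F6", "AMS", "TXL", t(17, 0), t(18, 30)),
    Flight("F7", "RDU", "DTW", t(7, 0, 18), t(8, 50, 18)),  # wrong day
]

for itinerary in search(flights, "RDU", "TXL", date(2018, 4, 17), max_stops=2):
    print([f.number for f in itinerary])
```

The result lists the non-stop F4 first, then the one-stop F1 + F2, then the two-stop F1 + F5 + F6; F3 is rejected because its connection is too short, and F7 departs on the wrong date.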
Lessons Learned: Flight Search
Model your graph database for performance
• Design the data model for your use cases!
• Understand your workload's read/write ratio
• What kinds of queries do you want to support? How
many levels deep does a traversal go?
• Consider denormalization…
• Design and use the various index types supported by
JanusGraph
Try different approaches to get results back faster
• Use a pre-processor in a custom application
• Use Gremlin queries, applying filters as early as
possible in a query to limit the number of
traversals
• Use Groovy methods as programmable extensions
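Why filtering early matters can be seen with a toy two-hop query. The Python sketch below, with hypothetical data and a hypothetical `two_hop` function, counts how many second-hop edges are examined when a date filter is applied before expanding versus only after. In Gremlin terms, the early variant corresponds to placing the filter right after the first hop rather than at the end of the traversal.

```python
from typing import Dict, List, Tuple

# origin -> list of (destination, departure date); hypothetical data.
Edges = Dict[str, List[Tuple[str, str]]]


def two_hop(edges: Edges, origin: str, day: str,
            filter_early: bool) -> Tuple[List[Tuple[str, str]], int]:
    """Two-hop routes whose first leg departs on `day`.

    Returns (matching (via, destination) pairs, second-hop edges examined).
    """
    examined = 0
    matches: List[Tuple[str, str]] = []
    for via, first_day in edges.get(origin, []):
        if filter_early and first_day != day:
            continue                      # prune before the second hop
        for dest, _ in edges.get(via, []):
            examined += 1
            if first_day == day:          # late filter (a no-op once pruned)
                matches.append((via, dest))
    return matches, examined


edges: Edges = {
    "RDU": [("DTW", "2018-04-17"), ("JFK", "2018-04-18"), ("ATL", "2018-04-19")],
    "DTW": [("AMS", "2018-04-17"), ("CDG", "2018-04-17")],
    "JFK": [("TXL", "2018-04-18"), ("LHR", "2018-04-18"), ("FRA", "2018-04-18")],
    "ATL": [("MUC", "2018-04-19")],
}

for early in (True, False):
    matches, examined = two_hop(edges, "RDU", "2018-04-17", filter_early=early)
    print(f"filter_early={early}: {matches}, examined {examined} edges")
```

Both variants return the same two routes, but the early filter examines 2 second-hop edges instead of 6; on a real graph with hundreds of thousands of schedules, that gap compounds with every extra hop.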
Fine-tune for your workloads and systems
• JanusGraph supports pluggable storage and index backends,
so tune your backends!
• JanusGraph Server configuration, such as
threadPoolBoss and threadPoolWorker
• JVM configuration, such as -Xms (initial and
minimum Java heap size) and -Xmx (maximum
Java heap size). You don't want to see
java.lang.OutOfMemoryError exceptions
or long, slow GC pauses.
• Use multiple threads and/or instances up to your
system's capacity
• Consider cloud deployment and auto-scaling
• Be thorough and be patient, because it will take a
few iterations!
Thank you
compose.com/databases/janusgraph
twitter.com/pluradj
twitter.com/chinhuang007
github.com/IBM/janusgraph-utils
developer.ibm.com/code/patterns
DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation
21DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation

Weitere ähnliche Inhalte

Was ist angesagt?

Start Flying with Python & Apache TinkerPop
Start Flying with Python & Apache TinkerPopStart Flying with Python & Apache TinkerPop
Start Flying with Python & Apache TinkerPopJason Plurad
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardDemai Ni
 
Graph Processing with Apache TinkerPop and Gremlin
Graph Processing with Apache TinkerPop and GremlinGraph Processing with Apache TinkerPop and Gremlin
Graph Processing with Apache TinkerPop and GremlinJason Plurad
 
Enabling Multimodel Graphs with Apache TinkerPop
Enabling Multimodel Graphs with Apache TinkerPopEnabling Multimodel Graphs with Apache TinkerPop
Enabling Multimodel Graphs with Apache TinkerPopJason Plurad
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyJason Plurad
 
Community-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphCommunity-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphJason Plurad
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRAkbajda
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
Graph Processing with Titan and Scylla
Graph Processing with Titan and ScyllaGraph Processing with Titan and Scylla
Graph Processing with Titan and ScyllaJason Plurad
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBigData_Europe
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkKNIMESlides
 
The IoT and big data
The IoT and big dataThe IoT and big data
The IoT and big dataGal Ben-Haim
 
NetApp Flash Storage Facts
NetApp Flash Storage FactsNetApp Flash Storage Facts
NetApp Flash Storage FactsNetApp Insight
 
KNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIMESlides
 
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"Ibrahim Muhammadi
 
Lightweight Collection and Storage of Software Repository Data with DataRover
Lightweight Collection and Storage of  Software Repository Data with DataRoverLightweight Collection and Storage of  Software Repository Data with DataRover
Lightweight Collection and Storage of Software Repository Data with DataRoverChristoph Matthies
 
Quix presto ide, presto summit IL
Quix presto ide, presto summit ILQuix presto ide, presto summit IL
Quix presto ide, presto summit ILOri Reshef
 

Was ist angesagt? (20)

Start Flying with Python & Apache TinkerPop
Start Flying with Python & Apache TinkerPopStart Flying with Python & Apache TinkerPop
Start Flying with Python & Apache TinkerPop
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforward
 
Graph Processing with Apache TinkerPop and Gremlin
Graph Processing with Apache TinkerPop and GremlinGraph Processing with Apache TinkerPop and Gremlin
Graph Processing with Apache TinkerPop and Gremlin
 
Enabling Multimodel Graphs with Apache TinkerPop
Enabling Multimodel Graphs with Apache TinkerPopEnabling Multimodel Graphs with Apache TinkerPop
Enabling Multimodel Graphs with Apache TinkerPop
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph Technology
 
Community-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphCommunity-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraph
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
Graph Processing with Titan and Scylla
Graph Processing with Titan and ScyllaGraph Processing with Titan and Scylla
Graph Processing with Titan and Scylla
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical Overview
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with Spark
 
Big data groningen
Big data groningenBig data groningen
Big data groningen
 
Data Science in the Cloud
Data Science in the CloudData Science in the Cloud
Data Science in the Cloud
 
The IoT and big data
The IoT and big dataThe IoT and big data
The IoT and big data
 
NetApp Flash Storage Facts
NetApp Flash Storage FactsNetApp Flash Storage Facts
NetApp Flash Storage Facts
 
KNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To Deployment
 
Big data groningen
Big data groningenBig data groningen
Big data groningen
 
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about"
 
Lightweight Collection and Storage of Software Repository Data with DataRover
Lightweight Collection and Storage of  Software Repository Data with DataRoverLightweight Collection and Storage of  Software Repository Data with DataRover
Lightweight Collection and Storage of Software Repository Data with DataRover
 
Quix presto ide, presto summit IL
Quix presto ide, presto summit ILQuix presto ide, presto summit IL
Quix presto ide, presto summit IL
 

Ähnlich wie Airline Reservations and Routing: A Graph Use Case

Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringDevOps.com
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3DataWorks Summit
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningModusOptimum
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software OverviewKNIMESlides
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise DataWorks Summit
 
What's New In Neo4j 3.4 & Bloom Update
What's New In Neo4j 3.4 & Bloom UpdateWhat's New In Neo4j 3.4 & Bloom Update
What's New In Neo4j 3.4 & Bloom UpdateNeo4j
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowDataWorks Summit
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comKarin Patenge
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Databricks
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceCambridge Semantics
 

Ähnlich wie Airline Reservations and Routing: A Graph Use Case (20)

Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps Monitoring
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software Overview
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
 
What's New In Neo4j 3.4 & Bloom Update
What's New In Neo4j 3.4 & Bloom UpdateWhat's New In Neo4j 3.4 & Bloom Update
What's New In Neo4j 3.4 & Bloom Update
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache Arrow
 
Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.com
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data Science
 

Kürzlich hochgeladen

VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 

Kürzlich hochgeladen (20)

VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording

Airline Reservations and Routing: A Graph Use Case

  • 1. Airline Reservations and Routing: A Graph Use Case. Jason Plurad, Chin Huang. DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation
  • 2. Pilots
    Jason Plurad is a software developer in the IBM Digital Business Group. He develops open source software and builds open communities in the big data and analytics space, with a current focus on graph databases and graph analytics. He is a Technical Steering Committee member and committer on JanusGraph and Apache TinkerPop.
    Chin Huang is a software engineer at IBM Open Technologies and Performance. He has worked on various enterprise and open source projects. His current focus is JanusGraph and Node.js development and performance characterization.
  • 3. How Did We Get Here?
    • Jason: Raleigh (RDU) → Detroit (DTW) → Amsterdam (AMS) → Berlin (TXL)
    • Chin: San Francisco (SFO) → Copenhagen (CPH) → Berlin (TXL)
  • 4. Graphs are not new
  • 5. Graph Data Use Cases
    • Social network analysis
    • Configuration management database
    • Master data management
    • Recommendation engines
    • Knowledge graphs
    • Internet of things
    • Cyber security attack analysis
  • 6. Property Graph (example: airports RDU, DTW, AMS, TXL, SFO, and CPH connected by routes)
    • Vertex: label airport; properties name: Berlin Tegel, code: TXL, city: Berlin, country: Germany
    • Edge: label route; properties flight: 343, distance: 501, depart: 13:05, arrive: 14:57
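The property graph on this slide can be sketched as a small in-memory structure in which vertices and edges each carry a label plus a map of key/value properties. This is a minimal Python illustration under stated assumptions, not how JanusGraph stores data: the class and method names are hypothetical, and because the slide does not say which airports the example route edge connects, the sketch attaches it to RDU → DTW purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A labeled vertex with arbitrary key/value properties."""
    vid: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled, directed edge from out_v to in_v with its own properties."""
    label: str
    out_v: str
    in_v: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph (hypothetical helper, not a JanusGraph API)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        return self.vertices[vid]

    def add_edge(self, label, out_v, in_v, **props):
        # Both endpoints must already exist before an edge can connect them.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError(f"unknown endpoint: {out_v} or {in_v}")
        edge = Edge(label, out_v, in_v, props)
        self.edges.append(edge)
        return edge

    def out(self, vid, label):
        """Ids of vertices reached by following outgoing edges with this label."""
        return [e.in_v for e in self.edges if e.out_v == vid and e.label == label]


graph = PropertyGraph()
graph.add_vertex("TXL", "airport", name="Berlin Tegel", code="TXL", city="Berlin", country="Germany")
graph.add_vertex("RDU", "airport", code="RDU")
graph.add_vertex("DTW", "airport", code="DTW")
# Route properties come from the slide; the RDU -> DTW endpoints are illustrative only.
graph.add_edge("route", "RDU", "DTW", flight=343, distance=501, depart="13:05", arrive="14:57")
```

Keeping properties in a plain dictionary mirrors the schema-flexible nature of a property graph: different vertices with the same label can carry different property keys, as the RDU and TXL vertices do here.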
  • 7. Gremlin: Graph Traversal Language (Apache TinkerPop, https://tinkerpop.apache.org)
    What is the shortest path to Berlin?
        > g.V(rdu).
            repeat( out('route').simplePath() ).
            until( has('code', 'TXL') ).
            limit(5).
            path().by('code').
            toList()
        ==> [RDU, JFK, TXL]
        ==> [RDU, LAX, TXL]
        ==> [RDU, MIA, TXL]
        ==> [RDU, YYZ, TXL]
        ==> [RDU, SFO, TXL]
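To make the traversal's steps concrete, here is a rough plain-Python analogue, assuming a hypothetical route map (not the dataset behind the slide). `out('route')` becomes a neighbor lookup, `simplePath()` becomes the check that rejects revisiting an airport, `until(has('code', 'TXL'))` ends a path at the target, `limit(5)` caps the number of results, and `path()` returns the visited codes. The sketch explores breadth-first, so it yields fewest-hop paths first; it illustrates the idea and is not a description of how TinkerPop executes `repeat()`.

```python
from collections import deque

# Hypothetical route map: airport code -> airports reachable by one route edge.
ROUTES = {
    "RDU": ["JFK", "LAX", "DTW"],
    "JFK": ["TXL", "LAX"],
    "LAX": ["TXL"],
    "DTW": ["AMS"],
    "AMS": ["TXL"],
}


def paths_to(routes, start, target, limit):
    """Return up to `limit` cycle-free paths from start to target, fewest hops first."""
    results = []
    queue = deque([[start]])
    while queue and len(results) < limit:
        path = queue.popleft()
        # until() placed after repeat() is checked after each iteration,
        # so the start vertex itself never counts as a match.
        if len(path) > 1 and path[-1] == target:
            results.append(path)              # stop extending once the target is reached
            continue
        for nxt in routes.get(path[-1], []):  # out('route')
            if nxt not in path:               # simplePath(): no repeated airports
                queue.append(path + [nxt])
    return results
```

With this toy map, `paths_to(ROUTES, "RDU", "TXL", 5)` returns the two 2-hop paths via JFK and LAX, followed by two 3-hop paths; asking for `limit=2` keeps only the two shortest.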
  • 8. JanusGraph (https://janusgraph.org)
    • Maintainer: The Linux Foundation; License: Apache; Releases: 0.3.0 planned for 2Q 2018
    • Established in January 2017
    • Fork of TitanDB
    • Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
    • Vendor-neutral, open community with open governance
    • Founders: Expero, Google, Grakn, Hortonworks, IBM
    • Members: Amazon, Huawei, Netflix, Orchestral Developments, Seeq, Uber
    • In production: Celum, Finc, G Data, IBM Cloud, Seeq
  • 9. JanusGraph Architecture (http://docs.janusgraph.org/latest/arch-overview.html)
  • 10. Graph database storage backends: performance evaluation. Graph use case: air travel reservation
  • 11. Performance Test Environment
    • Server spec
        • Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32 GB) memory
        • CPU: Intel Xeon Processor E5-2690 v4, 14 cores, 2.6 GHz, 35 MB cache, 2400 MHz
        • Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
        • Disk: 720 GB SSD, RAID 5
        • Operating system: Ubuntu 16.04.2 LTS
    • Public tools
        • JMeter: load testing tool
        • nmon, nmon analyser: system performance monitoring and analysis tools
        • VisualVM: all-in-one Java troubleshooting/profiling tool
        • GCeasy: garbage collection log analysis tool
        • Prometheus and Grafana: monitoring dashboard
  • 12. JanusGraph Utility Tools
    • What about large volumes of graph data?
        • Existing data is lacking or unavailable for performance evaluation
        • What are the performance characteristics at various volumes?
        • Graph Data Generator generates graph data in different sizes and shapes, so you can easily simulate real data and measure performance
    • How do you manage the graph schema?
        • There is a lack of graph schema management tools
        • Graph schemas may change for optimal performance
        • Graph Schema Loader lets you quickly load and update schema definitions in JanusGraph
    • How do you load data into a graph database in bulk?
        • Many RDBMSs support exporting data to CSV files
        • I have millions/billions of records!
        • Data Batch Importer lets you fully utilize system resources to import data from CSV files into JanusGraph
    • Open source code: https://github.com/IBM/janusgraph-utils
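A data generator of the kind described on this slide can be sketched in a few lines: it emits vertex and edge CSV text of a chosen size, which a batch importer then splits into chunks (for example, one chunk per transaction commit). This is a hypothetical Python illustration; the function names and the CSV column layout are assumptions, not the janusgraph-utils formats. The 2 properties per vertex and 1 property per edge mirror the insert tests on slides 14 and 15.

```python
import csv
import io
import random


def generate_graph_csv(num_vertices, num_edges, seed=42):
    """Return (vertex_csv, edge_csv) text for a random airport/route graph.

    The column layout is a hypothetical example, not the janusgraph-utils format.
    """
    if num_edges and num_vertices < 2:
        raise ValueError("need at least two vertices to create edges")
    rng = random.Random(seed)                           # seeded so runs are repeatable
    vbuf, ebuf = io.StringIO(), io.StringIO()
    vertices, edges = csv.writer(vbuf), csv.writer(ebuf)

    vertices.writerow(["id", "code", "name"])           # 2 properties per vertex
    for i in range(num_vertices):
        vertices.writerow([i, f"A{i:05d}", f"Airport {i}"])

    edges.writerow(["out_id", "in_id", "distance"])     # 1 property per edge
    for _ in range(num_edges):
        src, dst = rng.sample(range(num_vertices), 2)   # two distinct ids: no self-loops
        edges.writerow([src, dst, rng.randint(100, 9000)])

    return vbuf.getvalue(), ebuf.getvalue()


def batches(rows, size):
    """Yield consecutive slices of `size` rows, e.g. one slice per transaction commit."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


vertex_csv, edge_csv = generate_graph_csv(num_vertices=1000, num_edges=5000)
edge_rows = list(csv.reader(io.StringIO(edge_csv)))[1:]  # skip the header row
```

Seeding the generator keeps test runs comparable across backends, and the batch size becomes a tuning knob: larger batches mean fewer commits but bigger transactions to hold in memory.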
  • 13. Performance Test Topology (diagram): load injectors send insert, update, and query requests to JanusGraph, backed by a three-node database cluster in which each node runs Cassandra, HBase + HDFS + ZooKeeper, and Scylla
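Because JanusGraph's storage layer is pluggable, switching between the backends in this topology is mostly a configuration change. The sketch below renders a minimal JanusGraph properties file per backend, assuming the CQL adapter (`storage.backend=cql`) for both Cassandra and Scylla and the HBase adapter for HBase, where `storage.hostname` points at the ZooKeeper quorum. The deck does not say which adapter or settings were used; the helper name and host names are hypothetical, and a real deployment would add index and tuning settings.

```python
# Hypothetical mapping: the adapter names are JanusGraph storage.backend values,
# but which adapter each test used is an assumption for illustration.
ADAPTERS = {"cassandra": "cql", "scylla": "cql", "hbase": "hbase"}


def janusgraph_properties(backend, hosts):
    """Render a minimal JanusGraph properties file for one storage backend."""
    if backend not in ADAPTERS:
        raise ValueError(f"unsupported backend: {backend}")
    lines = [
        "gremlin.graph=org.janusgraph.core.JanusGraphFactory",
        f"storage.backend={ADAPTERS[backend]}",
        # For HBase this is the ZooKeeper quorum; for CQL it lists the contact points.
        f"storage.hostname={','.join(hosts)}",
    ]
    return "\n".join(lines) + "\n"


for name in ("cassandra", "hbase", "scylla"):
    print(f"# {name}")
    print(janusgraph_properties(name, ["node1", "node2", "node3"]))
```

Scylla reuses the CQL adapter because it speaks the Cassandra wire protocol, which is also why, per slide 17, it works with existing Cassandra client utilities.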
  • 14. Performance Evaluation: Insert Vertices
    • 40 million vertices in total
    • 2 properties per vertex
    • Insert scenario
    • Injectors fully utilized to generate load against the databases
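Slide 11 lists JMeter as the load-testing tool. As a hypothetical stand-in, the sketch below shows the basic shape of a load injector: a thread pool pushes inserts as fast as the workers allow and reports throughput. `FakeStore` is an assumed placeholder for a real graph client; measuring an actual backend would replace `insert_vertex` with a real JanusGraph call. Threads suit this job because a real insert spends most of its time waiting on the network.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical stand-in for a graph database client; records inserted vertices."""

    def __init__(self):
        self._lock = threading.Lock()
        self.vertices = {}

    def insert_vertex(self, vid):
        # Two properties per vertex, mirroring the insert test on this slide.
        with self._lock:
            self.vertices[vid] = {"code": f"A{vid:08d}", "name": f"Airport {vid}"}
        return vid


def run_injector(insert_fn, items, workers):
    """Call insert_fn on every item from a pool of worker threads.

    Returns (operations, elapsed_seconds, operations_per_second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = sum(1 for _ in pool.map(insert_fn, items))  # re-raises worker errors
    elapsed = time.perf_counter() - start
    rate = done / elapsed if elapsed > 0 else float("inf")
    return done, elapsed, rate


store = FakeStore()
ops, seconds, ops_per_sec = run_injector(store.insert_vertex, range(10_000), workers=8)
print(f"{ops} inserts in {seconds:.3f} s ({ops_per_sec:,.0f}/s)")
```

Raising `workers` until throughput stops improving is one way to find the point where the injector, rather than the database, becomes the bottleneck.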
  • 15. Performance Evaluation: Insert Edges
    • 30 million edges in total
    • 1 property per edge
    • Query and update scenario
  • 16. Performance Evaluation: Graph Traversal
  • 17. Lessons Learned: Storage Backends
    • Cassandra
        • Cluster bootstrapping takes more effort
        • Smaller memory footprint
    • HBase
        • Uneven CPU% caused by hot regions
        • Read and write cache settings need careful configuration for better throughput
    • Scylla
        • Easy clustering: multiple nodes can be added at once
        • Well self-tuned, but lacks documentation
        • Load is evenly distributed
        • Fully utilizes system resources
        • CPU utilization misrepresents real load
        • Nice monitoring dashboard (Prometheus + Grafana)
        • Works with existing Cassandra client utilities
  • 18. Flight Search Use Case
    • Flight search
        • All flights from airport A to airport B on a given date and time
        • Number of stops: non-stop, one-stop, two-stop, …
    • Data spec: 600+ airports, 350K+ flight schedules
    • Graph model
        • Vertex: Airport (airport code)
        • Vertex: Country (country name)
        • Edge: Flight Schedule (flight #, departure date, arrival date)
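The flight search on this slide can be sketched against the same model: airports as vertices, flight schedules as edges carrying departure and arrival times. The Python sketch below is a hypothetical illustration. The flights are made up (the first leg reuses the 13:05/14:57 times from slide 6), and the 45-minute minimum connection time and the no-revisit rule are assumptions the deck does not state. A search with `max_stops=0` finds non-stop flights, `1` adds one-stop itineraries, and so on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Flight:
    """A flight schedule edge between two airport vertices."""
    number: str
    origin: str
    dest: str
    depart: datetime
    arrive: datetime


def search(flights, origin, dest, earliest, max_stops,
           min_connection=timedelta(minutes=45)):
    """Itineraries from origin to dest with at most max_stops intermediate stops.

    The first leg departs at or after `earliest`; each later leg departs at least
    `min_connection` (an assumed rule) after the previous leg arrives; no airport
    is visited twice. Results are sorted by final arrival time.
    """
    by_origin = {}
    for f in flights:
        by_origin.setdefault(f.origin, []).append(f)
    results = []

    def extend(legs, airport, ready_at, visited):
        for f in by_origin.get(airport, []):
            if f.depart < ready_at or f.dest in visited:
                continue
            trip = legs + [f]
            if f.dest == dest:
                results.append(trip)
            elif len(trip) <= max_stops:  # stopping at f.dest stays within max_stops
                extend(trip, f.dest, f.arrive + min_connection, visited | {f.dest})

    extend([], origin, earliest, {origin})
    return sorted(results, key=lambda trip: trip[-1].arrive)


def at(day, hour, minute=0):
    """Hypothetical schedule times: day 0 is 2018-04-17."""
    return datetime(2018, 4, 17) + timedelta(days=day, hours=hour, minutes=minute)


FLIGHTS = [  # made-up schedules for illustration
    Flight("F1", "RDU", "DTW", at(0, 13, 5), at(0, 14, 57)),
    Flight("F2", "DTW", "AMS", at(0, 16), at(1, 6)),
    Flight("F3", "AMS", "TXL", at(1, 8), at(1, 9, 20)),
    Flight("F4", "DTW", "AMS", at(0, 15, 10), at(1, 5, 10)),  # too tight after F1
    Flight("F5", "RDU", "JFK", at(0, 12), at(0, 13, 30)),
    Flight("F6", "JFK", "TXL", at(0, 17), at(1, 7)),
]

for stops in (0, 1, 2):
    trips = search(FLIGHTS, "RDU", "TXL", at(0, 8), max_stops=stops)
    print(stops, [[f.number for f in trip] for trip in trips])
```

F4 never appears in any result: it leaves Detroit only 13 minutes after F1 lands, inside the assumed connection window. The time check prunes a branch before it is expanded, which is the same "filter early" idea listed on the next slide.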
  • 19. Lessons Learned: Flight Search
    • Model your graph database for performance
        • Design the data model for your use cases!
        • Understand the workload's read/write ratio
        • What kinds of queries do you want to support? How many levels deep does a traversal go?
        • Consider denormalization
        • Design and use the various indexes supported in JanusGraph
    • Try different approaches to get results back faster
        • Use a pre-processor in a custom app
        • Use Gremlin queries, applying filters as early as possible in a query to limit the number of traversals
        • Use Groovy methods as programmable extensions
    • Fine-tune for your workloads and systems
        • JanusGraph supports multiple storage and index backends, so tune your backends!
        • Tune JanusGraph server configurations, such as threadPoolBoss and threadPoolWorker
        • Tune JVM configurations, such as Xms (initial and minimum Java heap size) and Xmx (maximum Java heap size). You don't want to see java.lang.OutOfMemoryError exceptions or long, slow GC pauses.
        • Use multiple threads and/or instances up to your system's capacity
        • Consider cloud and auto-scaling
        • Be thorough and patient, because it will take a few iterations!
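Why filtering early matters can be shown by counting work. The hypothetical sketch below finds 2-leg trips whose legs all depart on a given day, once pruning non-matching edges at every hop and once filtering complete paths at the end. Both return the same trips, but pruning early touches fewer edges. In Gremlin terms this is roughly the difference between filtering route edges inside `repeat()` and filtering finished paths afterwards. The route data and day labels are made up for illustration.

```python
# Hypothetical routes: airport -> list of (destination, departure day) route edges.
DAY_ROUTES = {
    "RDU": [("JFK", "mon"), ("JFK", "tue"), ("DTW", "tue"), ("ATL", "wed")],
    "JFK": [("TXL", "mon"), ("TXL", "tue"), ("AMS", "wed")],
    "DTW": [("AMS", "mon"), ("AMS", "tue")],
    "ATL": [("TXL", "wed"), ("AMS", "wed")],
}


def traverse(routes, start, hops, day, filter_early):
    """Return (paths, edges_visited) for `hops`-leg trips whose legs all depart on `day`."""
    edges_visited = 0
    frontier = [(start, [])]                 # (current airport, legs so far)
    for _ in range(hops):
        next_frontier = []
        for airport, legs in frontier:
            for dest, dep_day in routes.get(airport, []):
                edges_visited += 1
                if filter_early and dep_day != day:
                    continue                 # prune now; this branch is never expanded
                next_frontier.append((dest, legs + [(airport, dest, dep_day)]))
        frontier = next_frontier
    paths = [legs for _, legs in frontier]
    if not filter_early:                     # filter only once every path is built
        paths = [legs for legs in paths if all(d == day for _, _, d in legs)]
    return paths, edges_visited


early = traverse(DAY_ROUTES, "RDU", hops=2, day="tue", filter_early=True)
late = traverse(DAY_ROUTES, "RDU", hops=2, day="tue", filter_early=False)
print("early:", early[1], "edges visited; late:", late[1], "edges visited")
```

On this toy graph the early-filter run visits 9 edges against 14 for the late filter, and the gap widens with every extra hop.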