Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Improve Your Experience by Using
Graph Analytics
Slides from my session at
“Women Who Code” Meetup | 2018-05-23 | Berlin
Karin Patenge | @kpatenge |  karin.patenge@oracle.com
Business Development Manager Technology (Europe North)
Oracle Deutschland B.V. & Co. KG
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Oracle Code Berlin
June 12th 2018
Free full-day event @ Funkhaus Berlin
https://developer.oracle.com/code/berlin-june-2018
Including panel discussion:
Go for IT! Make Diversity Matter: Digital Transformation
as a Chance for Women in Coding
Agenda
• Data of Interest
• Questions of Interest
• Data Processing Workflow
• Key Takeaways
• Q&A
Just briefly about myself
Since Nov. 2016: Business Development Manager focusing on new(er)/emerging
technologies & modern data management platforms for Europe North
Joined Oracle in 2007: As Sales Consultant for Core Tech Products. Special topics:
Spatial Technologies, Graph & Semantic Technologies, NoSQL, …
Before Oracle: Since 1989 worked as Computer Scientist in several IT roles | depts for
Radio Technology Manufacturer | Public Sector | Pharma (Schering, Bayer Health Care)
Setting the Scene
Data of Interest
Direct relations not (yet) analyzed
• Data retrieval via REST API
https://www.meetup.com/meetup_api
• Different API methods & versions
• API Key required
• Sample request
• Data returned as JSON
Edge labels in the data model diagram: is_interested_in, is_member_of, is_assigned_to, has_registered_for, takes_place_in, is_located_in
Questions of Interest
• Which Meetup groups are most active in terms of:
– # members
– # events
– # event attendees
• Who and where are influencers in the Meetup community?
• Where are connections between the Meetup groups in different locations?
• Which topics are “hot”?
• How close/similar are groups?
• …
Data Processing Workflow: Overview
1. Retrieve & Prepare: Prepare source data
• Using R for data retrieval via REST API and conversion JSON → CSV → OPV/OPE
2. Load & Build: Load nodes and edges data into a graph
• Use Oracle NoSQL DB as Graph data store
3. Analyze: Analyze graph data
• Using Graph Analytics Engine (PGX) and Property Graph Query Language (PGQL)
4. Visualize: Visualize graph data
• Using Cytoscape
5. Results: Summarize results
Important Note
• Code and result (data) files can be downloaded from:
– https://github.com/karinpatenge/AnalyticsandDataSummit2018
Working Environment
• Available for free:
Oracle Big Data Lite VM 4.11 running in Oracle VirtualBox
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– Big Data Spatial and Graph (BDSG) 2.4 including Property Graph Query Language (PGQL) 1.0
• Gremlin, Apache Groovy Shell
• Zeppelin Notebook with PGX Interpreter
– Oracle NoSQL Database (Minimal instance with 1 node, no replication, aka kvlite)
– RStudio
• Additional R packages loaded
– Cytoscape 3.6.0
• Big Data Spatial and Graph 2.4 support installed
Modeling Data as Graphs
The more connected the data is, the better a Graph fits
Oracle NoSQL DB with Big Data Spatial and Graph
Graphic source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/
What is a Property Graph?
• A set of nodes (aka vertices)
– each vertex has a unique identifier
– each vertex has a set of in/out edges
– each vertex has a collection of key-value properties
• A set of edges
– each edge has a unique identifier
– each edge has a head/tail vertex
– each edge has a label denoting the type of relationship between two vertices
– each edge has a collection of key-value properties
• Blueprints Java APIs
• Implementations
– Oracle (Spatial and Graph, Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, Dex, Sail, MongoDB, …
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
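To make the property graph model concrete, here is a minimal Python sketch of vertices and edges carrying identifiers, labels, and key-value properties. It is not the Blueprints Java API; the class names, IDs, and sample property values are illustrative assumptions, and only the edge label `is_member_of` is taken from the Meetup data model.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int                                          # unique identifier
    properties: dict = field(default_factory=dict)   # key-value properties
    out_edges: list = field(default_factory=list)    # IDs of outgoing edges
    in_edges: list = field(default_factory=list)     # IDs of incoming edges


@dataclass
class Edge:
    id: int                                          # unique identifier
    label: str                                       # type of relationship
    tail: int                                        # ID of the source vertex
    head: int                                        # ID of the destination vertex
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, dict(props))
        return self.vertices[vid]

    def add_edge(self, eid, tail, head, label, **props):
        edge = Edge(eid, label, tail, head, dict(props))
        self.vertices[tail].out_edges.append(eid)
        self.vertices[head].in_edges.append(eid)
        self.edges[eid] = edge
        return edge


# Hypothetical sample data: one member belonging to one group.
g = PropertyGraph()
g.add_vertex(1, type="Member", member_name="Alice")
g.add_vertex(2, type="Group", group_name="Women Who Code Berlin")
g.add_edge(100, 1, 2, "is_member_of")
```

Storing edge IDs on the vertices (rather than object references) keeps the two record types independent, which mirrors how nodes and edges end up in separate flat files later in the workflow.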
Data Processing Workflow, Step 1: Retrieve & Prepare
Requesting Data via Meetup REST API
Request URL (Example)
• https://api.meetup.com/Women-Who-Code-Berlin-Germany/events?&key=506c1916524f6d3a6c782432645f5eb&status=past,upcoming&omit=description
• Important note:
– For most requests, data are only returned for the city that matches the location assigned to the user profile possessing the API key
Response (JSON)
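The request format above can be assembled and its JSON response parsed in a few lines. The talk used R for this step; the following Python version is only a sketch of the same idea. The helper names, the placeholder key `YOUR_API_KEY`, and the sample response body are hypothetical, and no network call is made.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.meetup.com"


def build_events_url(group_urlname, api_key,
                     status=("past", "upcoming"), omit=("description",)):
    """Build an events request URL in the format shown on the slide."""
    query = urlencode(
        {"key": api_key, "status": ",".join(status), "omit": ",".join(omit)},
        safe=",",  # keep commas readable instead of encoding them as %2C
    )
    return f"{API_BASE}/{group_urlname}/events?{query}"


def parse_events(body):
    """Keep a few fields per event; the field names are assumed for illustration."""
    return [
        {"id": e["id"], "name": e["name"],
         "yes_rsvp_count": e.get("yes_rsvp_count", 0)}
        for e in json.loads(body)
    ]


url = build_events_url("Women-Who-Code-Berlin-Germany", "YOUR_API_KEY")

# Hypothetical response body, standing in for what the API would return.
sample_body = '[{"id": "abc123", "name": "Sample event", "yes_rsvp_count": 42}]'
events = parse_events(sample_body)
```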
Run R Code via RStudio
Intermediate Results: CSV text files
• Transform JSON into a flat structure: One record per instance of information type
– Cities
– Categories
– Groups
– Members
– Events
– Topics
• Store data in .csv
– Not required but convenient to have as intermediate format
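As a rough illustration of the flattening step (done in R in the talk, shown here in Python), the sketch below turns nested JSON group records into one flat record per group and serializes them as CSV. The input field names and the sample values are assumptions made for this example, not the actual API schema.

```python
import csv
import io
import json


def flatten_groups(groups):
    """One flat record per group; nested fields are pulled up to the top level."""
    rows = []
    for g in groups:
        rows.append({
            "group_id": g["id"],
            "group_name": g["name"],
            "group_members": g.get("members", 0),
            "city_name": g.get("city", ""),
            "category_name": (g.get("category") or {}).get("name", ""),
        })
    return rows


def to_csv(rows):
    """Serialize flat records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample record.
sample = json.loads(
    '[{"id": 1, "name": "Sample Group", "members": 1234, '
    '"city": "Berlin", "category": {"id": 34, "name": "Tech"}}]'
)
csv_text = to_csv(flatten_groups(sample))
```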
Final Results: Flat File Structure for Property Graph
• Extract attribute values from flat structure
• Append each as single record into nodes and edges files
– Nodes (aka vertices) in flat file format
– Edges in flat file format
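The following Python sketch shows how single-property records for the nodes (.opv) and edges (.ope) files might be generated. The column layout (ID, key, type code, string value, numeric value, date value for vertices; edge ID, source, destination, label, then the same property columns for edges), the type codes 1 (string) and 2 (integer), and the `%20`-style escapes reflect my understanding of the Oracle flat file format and should be checked against the Big Data Spatial and Graph documentation. The function names are hypothetical.

```python
# Assumed type codes of the flat file format: 1 = string, 2 = integer.
TYPE_STRING, TYPE_INTEGER = 1, 2

# Assumed escapes for special characters; '%' must be replaced first.
ESCAPES = [("%", "%25"), ("\t", "%09"), (" ", "%20"), ("\n", "%0A"), (",", "%2C")]


def encode(text):
    """Replace characters that would break the comma-separated layout."""
    for raw, escaped in ESCAPES:
        text = text.replace(raw, escaped)
    return text


def opv_line(vid, key, value):
    """One vertex property per line: vid,key,type,string,number,date."""
    if isinstance(value, int):
        return f"{vid},{key},{TYPE_INTEGER},,{value},"
    return f"{vid},{key},{TYPE_STRING},{encode(str(value))},,"


def ope_line(eid, src, dst, label):
    """An edge without properties: placeholder key, empty type and values."""
    return f"{eid},{src},{dst},{label},%20,,,,"


lines = [
    opv_line(1, "group_name", "Women Who Code Berlin"),
    opv_line(1, "group_members", 2500),   # illustrative value
    ope_line(100, 2, 1, "is_member_of"),
]
```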
Useful Tips
• When creating nodes and edges files (.opv, .ope)
– Assign the right data type to attributes
– Check for NULL values
– Replace special characters
– Remove duplicates
– Check pattern of IDs used in source(s). Generate surrogate IDs if necessary.
• Keep original ID by storing it as property if necessary
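Several of these tips can be combined in one small cleaning pass. The Python sketch below removes duplicates, replaces missing values with a default, generates numeric surrogate IDs, and keeps the source ID as a property. The function name, the field names, and the sample records are hypothetical; the default "NA" is chosen only because it matches the string default used in the graph configuration of this deck.

```python
def assign_surrogate_ids(records, id_field, start=1, null_default="NA"):
    """Deduplicate by original ID, fill NULLs, and assign surrogate IDs."""
    seen = {}
    for rec in records:
        original = rec[id_field]
        if original in seen:
            continue  # remove duplicates: the first occurrence wins
        clean = {
            key: (null_default if value is None else value)
            for key, value in rec.items()
            if key != id_field
        }
        clean["id"] = start + len(seen)    # surrogate ID
        clean["original_id"] = original    # keep the original ID as a property
        seen[original] = clean
    return list(seen.values())


# Hypothetical source records with a duplicate and a missing value.
groups = [
    {"urlname": "sample-group-berlin", "city": None},
    {"urlname": "sample-group-berlin", "city": "Berlin"},
    {"urlname": "sample-group-hamburg", "city": "Hamburg"},
]
cleaned = assign_surrogate_ids(groups, "urlname")
```

Whether the first or the most complete duplicate should win depends on the source data; this sketch simply keeps the first one.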
Data Processing Workflow, Step 2: Load & Build
Architecture of Property Graph Support
• Access: REST/Web Service/Notebooks; Java, Groovy, Python, …; Java APIs; Java APIs/JDBC/SQL/PLSQL
• Graph Analytics: Parallel In-Memory Graph Analytics (PGX) / Graph Querying (PGQL)
• Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud; RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
• Property Graph formats: GraphML, GML, GraphSON, Flat Files
• Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase
• Apache Spark
Import Nodes and Edges into a Property Graph
// Start Groovy Shell connecting to Oracle NoSQL DB
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
./gremlin-opg-nosql.sh
server = new ArrayList();
server.add("bigdatalite.localdomain:5000");
// Create a graph config that contains the graph name "meetup"
// Name of KV store is "kvstore"
// Make sure to add all vertex/edge properties used in PGQL queries
cfg = GraphConfigBuilder.forPropertyGraphNosql() 
.setName("meetup") 
.setStoreName("kvstore") 
.setHosts(server) 
.addVertexProperty("type", PropertyType.STRING, "NA") 
.addVertexProperty("city_name", PropertyType.STRING, "NA") 
.addVertexProperty("city_country", PropertyType.STRING, "NA") 
.addVertexProperty("city_member_count", PropertyType.INTEGER, 0) 
.addVertexProperty("group_country", PropertyType.STRING, "NA") 
.addVertexProperty("group_visibility", PropertyType.STRING, "NA") 
.addVertexProperty("group_members", PropertyType.INTEGER, 0) 
.addVertexProperty("group_name", PropertyType.STRING, "NA") 
.addVertexProperty("member_name", PropertyType.STRING, "NA") 
.addVertexProperty("topic_name", PropertyType.STRING, "NA") 
.addVertexProperty("topic_urlkey", PropertyType.STRING, "NA") 
.addVertexProperty("event_yes_rsvp_count", PropertyType.INTEGER, 0) 
.addVertexProperty("event_rating_count", PropertyType.INTEGER, 0) 
.addVertexProperty("event_rating_average", PropertyType.INTEGER, 0) 
.addVertexProperty("event_waitlist_count", PropertyType.INTEGER, 0) 
.hasEdgeLabel(true) 
.setLoadEdgeLabel(true) 
.setMaxNumConnections(2).build();
// Create an instance of the graph
opg = OraclePropertyGraph.getInstance(cfg);
opg.setClearTableDOP(2);
opg.clearRepository();
opg.getKVStoreConfig();
// Create an instance for the graph loader
opgdl=OraclePropertyGraphDataLoader.getInstance();
vfile="/home/oracle/Documents/Meetup/data/meetup.opv";
efile="/home/oracle/Documents/Meetup/data/meetup.ope";
// Load data into the graph
opgdl.loadData(opg, vfile, efile, 2);
// Do some checks
// Count vertices and edges
opg.countVertices();
opg.countEdges();
// Get vertices and edges
opg.getVertices();
opg.getEdges();
...
// Shut down instance and close shell
opg.shutdown();
:q
Data Processing Workflow, Step 3: Analyze
How to Analyze Property Graph Data
PGX – Graph Analytics Engine
• Toolkit for in-memory, parallel graph analysis containing
– PGX shell
– Analyst API with a large collection of built-in algorithms
– and more
• Developed by Oracle Labs
• https://docs.oracle.com/cd/E56133_01/latest/index.html
• https://event.cwi.nl/grades/2018/07-VanRest.pdf
PGQL – Property Graph Query Language
• http://pgql-lang.org/
• Graph pattern matching combined with SQL
– WHERE clause: set of comma-separated constraints
• Developed by Oracle Labs
• Proposed for standardization
Analyze Property Graph Data using PGX
• Start PGX server
/opt/oracle/oracle-spatial-graph/property_graph/pgx/bin/start-server
• Start / Return to Groovy Shell
// Create in-memory session and analyst for analytics
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();
// Read the graph from Oracle NoSQL DB into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());
// Working with In-Memory Analyst
// Execute PageRank (max. error 0.0001, damping factor 0.85, max. 100 iterations)
rank=analyst.pagerank(pgxGraph, 0.0001, 0.85, 100);
// Get top 10 vertices
rank.getTopKValues(10);
// Betweenness Centrality
bc=analyst.vertexBetweennessCentrality(pgxGraph);
// Get top 10 vertices
bc.getTopKValues(10);
...
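To give an intuition for what a PageRank call with a tolerance, a damping factor, and an iteration limit computes, here is a small textbook power-iteration sketch in plain Python. It is not the PGX implementation, which may differ in normalization and convergence details; the function names and the three-vertex sample graph are illustrative assumptions.

```python
def pagerank(edges, tol=0.0001, damping=0.85, max_iter=100):
    """Textbook PageRank by power iteration over (source, destination) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(max_iter):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / count + damping * dangling / count
            for n in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        diff = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if diff < tol:  # stop once the ranks change less than the tolerance
            break
    return rank


def top_k(rank, k):
    """Return the k highest-ranked vertices with their scores."""
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical graph: "a" and "b" both point to "c", and "c" points back to "a".
sample_rank = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
best = top_k(sample_rank, 1)
```

In this sample, "c" ranks highest because it receives links from both other vertices, while "b" receives none.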
Analyzing Property Graph Data using PGQL
• Topology constraints
– (n)-[e]->(m)
– (n)-[e1]->(m1), (n)-[e2]->(m2)
– (n1)-[e1]->(n2)-[e2]->(n3)-[e3]->(n4)
– (n1)-[e1]->(n2)<-[e2]-(n3)
• Label matching
– (x:Person) -[e:likes]-> (y:Person)
– (:Person) -[:likes]-> (:Person)
– (x:Student|Professor) -[e:likes|knows]-> (y:Student|Professor)
• Value constraints
– (x) -> (y), x.name = 'John', y.age > 25
• In-line constraints
– (n WITH name = 'John' OR name = 'James', type = 'Person') -[e WITH type = 'workAt', workHours < 40]-> ()
• …

Syntax form                                           Example
Basic form                                            (n)-[e]->(m)
Omit variable name of the source vertex               ()-[e]->(m)
Omit variable name of the destination vertex          (n)-[e]->()
Omit variable names in both vertices                  ()-[e]->()
Omit variable name in edge                            (n)-->(m)
Omit variable name in edge (alternative, one dash)    (n)->(m)
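Conceptually, a topology constraint with label matching, such as (x) -[:is_organizer_of]-> (y) -[:is_located_in]-> (z), asks for all vertex combinations connected by edges with those labels. The Python sketch below imitates that matching for a two-hop path with an optional in-line constraint on the middle vertex. It only illustrates the idea and says nothing about how PGX evaluates PGQL; the function name and the sample vertices and edges are hypothetical.

```python
def match_path(vertices, edges, label1, label2, mid_filter=None):
    """Return (x, y, z) ID triples matching (x)-[:label1]->(y)-[:label2]->(z)."""
    # Index the second hop by its source vertex for quick lookup.
    second_hop = {}
    for src, dst, label in edges:
        if label == label2:
            second_hop.setdefault(src, []).append(dst)

    results = []
    for x, y, label in edges:
        if label != label1:
            continue
        if mid_filter is not None and not mid_filter(vertices[y]):
            continue  # in-line constraint on the middle vertex not satisfied
        for z in second_hop.get(y, []):
            results.append((x, y, z))
    return results


# Hypothetical sample graph: one organizer, two groups, one city.
vertices = {
    1: {"type": "Member", "member_name": "Alice"},
    2: {"type": "Group", "group_members": 1500},
    3: {"type": "City", "city_name": "Berlin"},
    4: {"type": "Group", "group_members": 200},
}
edges = [
    (1, 2, "is_organizer_of"),
    (2, 3, "is_located_in"),
    (1, 4, "is_organizer_of"),
    (4, 3, "is_located_in"),
]

all_matches = match_path(vertices, edges, "is_organizer_of", "is_located_in")
large_groups = match_path(
    vertices, edges, "is_organizer_of", "is_located_in",
    mid_filter=lambda props: props.get("group_members", 0) > 1000,
)
```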
Analyzing Property Graph Data using PGQL
• Start / Return to Groovy Shell
// Some PGQL queries
pgxResultSet = pgxGraph.queryPgql("SELECT * WHERE (x) -[e1:is_organizer_of]-> (y) -[e2:is_located_in]-> (z)");
pgxResultSet.print(5);
pgxResultSet.getNumResults();
pgxResultSet = pgxGraph.queryPgql("SELECT x.type, y.type, y.group_name, y.group_members " +
"WHERE (x) -[e1:is_organizer_of]-> (y WITH group_members > 1000) -[e2:is_located_in]-> (z) " +
"ORDER BY y.group_members DESC");
pgxResultSet.print(5);
pgxResultSet = pgxGraph.queryPgql("SELECT x.member_name, y.group_name, y.group_members " +
"WHERE (x) -[e1:is_organizer_of]-> (y WITH group_members > 1000) -[e2:is_located_in]-> (z)");
pgxResultSet.print(5);
pgxResultSet = pgxGraph.queryPgql("SELECT * " +
"WHERE (x WITH event_yes_rsvp_count > 250) -[e1:is_organized_by]-> (y) -[e2:is_located_in]-> (z)");
pgxResultSet.print(5);
...
https://blogs.oracle.com/bigdataspatialgraph/how-many-ways-to-run-property-graph-query-language-pgql-in-bdsg-i
Data Processing Workflow, Step 4: Visualize
Visualizing Data using Cytoscape connected to Big Data Spatial and Graph
PGQL – Examples (visualized using Cytoscape)
SELECT *
WHERE (x WITH type='Event') -[e1]-> (y WITH type='Group' and group_name = 'Women Who Code Berlin')
<-[e2:is_assigned_to]- (z WITH type='Topic')
https://blogs.oracle.com/bigdataspatialgraph/how-many-ways-to-run-property-graph-query-language-pgql-in-bdsg-ii
SELECT *
WHERE (x) -[e1:is_organized_by]-> (y WITH type='Group' and group_name = 'Women Who Code Berlin')
<-[e2:is_assigned_to]- (z WITH type='Topic'), (y) -[e3:is_located_in]-> (w)
SELECT *
WHERE (x WITH type='Topic' and topic_name = 'Women in Technology') -[e1]-> (y WITH type='Group')
-[e2]-> (z WITH type = 'City' and (city_name = 'Berlin' or city_name = 'Hamburg' or city_name = 'München'))
SELECT *
WHERE (x WITH type='Event' and event_yes_rsvp_count >= 250) -[e1]- (y WITH type='Group') -[e2]- (z WITH type='City')
SELECT *
WHERE (x WITH type='Group' and group_name = 'Women Who Code Berlin') <-[e1:is_assigned_to]- (y WITH type='Topic')
-[e2]-> (z WITH group_members >= 2000) -[e3:is_located_in]-> (w)
More Visualization Examples using Cytoscape
Meetup Groups in relation to organizers (cities shown: Copenhagen, Berlin, Hamburg, Munich)
Data Processing Workflow, Step 5: Results
Summarize (Preliminary) Results
Who are important people in the Meetup landscape?
Which Meetup groups should we talk to for certain topics?
Which Meetup groups are relevant in terms of #Members, #Participants of events, #Events?
Which Meetup groups are related and how?
...
Key Takeaways – So far
• The graph data model is perfect for focusing on connectivity
• Code is written once and can be reused many times to retrieve data for any desired location (city)
• Visual analysis helps a great deal to understand how data are connected
• Big variety of analytic tools and frameworks to answer all kinds of questions
– Integrated distributed, in-memory Graph analytics engine
• Use case of how to combine Open Source with Oracle Technologies
• Please also check latest Graph talks during
Analytics and Data Summit in March 2018
– https://analyticsanddatasummit.org/schedule/
Oracle Code Berlin
June 12th 2018
See you there!

How to Troubleshoot Apps for the Modern Connected Worker
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

Graph Analytics on Data from Meetup.com

  • 1. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | Improve Your Experience by Using Graph Analytics Slides from my session at “Women Who Code” Meetup | 2018-05-23 | Berlin Karin Patenge | @kpatenge |  karin.patenge@oracle.com Business Development Manager Technology (Europe North) Oracle Deutschland B.V. & Co. KG 1
  • 2. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Oracle Code Berlin June 12th 2018 Free full-day event @ Funkhaus Berlin https://developer.oracle.com/code/berlin-june-2018 Including panel discussion: Go for IT! Make Diversity Matter: Digital Transformation as a Chance for Women in Coding
  • 3. Agenda • Data of Interest • Questions of Interest • Data Processing Workflow • Key Takeaways • Q&A
  • 4. Just briefly about myself • Since Nov. 2016: Business Development Manager focusing on new(er)/emerging technologies & modern data management platforms for Europe North • Joined Oracle in 2007 as Sales Consultant for Core Tech Products. Special topics: Spatial Technologies, Graph & Semantic Technologies, NoSQL, … • Before Oracle: worked since 1989 as a computer scientist in several IT roles and departments for a radio technology manufacturer, the public sector, and pharma (Schering, Bayer Health Care)
  • 5. Setting the Scene
  • 6. Data of Interest • Data retrieval via REST API: https://www.meetup.com/meetup_api • Different API methods & versions • API key required • Sample request • Data returned as JSON • Edge labels shown in the diagram: is_interested_in, is_member_of, is_assigned_to, has_registered_for, takes_place_in, is_located_in • Direct relations not (yet) analyzed
  • 7. Questions of Interest • Which Meetup groups are most active in terms of: – # members – # events – # event attendees • Who and where are influencers in the Meetup community? • Where are connections between the Meetup groups in different locations? • Which topics are “hot”? • How close/similar are groups? • …
  • 8. Data Processing Workflow: Overview • Retrieve & Prepare: prepare source data, using R for data retrieval via REST API and conversion JSON → CSV → OPV/OPE • Load & Build: load node and edge data into a graph, using Oracle NoSQL DB as graph data store • Analyze: analyze graph data, using the Graph Analytics Engine (PGX) and the Property Graph Query Language (PGQL) • Visualize: visualize graph data, using Cytoscape • Results: summarize results
  • 9. Important Note • Code and result (data) files can be downloaded from: https://github.com/karinpatenge/AnalyticsandDataSummit2018
  • 10. Working Environment
      • Available for free: Oracle Big Data Lite VM 4.11 running in Oracle VirtualBox
          – Big Data Spatial and Graph (BDSG) 2.4, including Property Graph Query Language (PGQL) 1.0 (http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html)
              • Gremlin, Apache Groovy Shell
              • Zeppelin Notebook with PGX Interpreter
          – Oracle NoSQL Database (minimal instance with 1 node, no replication, aka kvlite)
          – RStudio
              • Additional R packages loaded
          – Cytoscape 3.6.0
              • Big Data Spatial and Graph 2.4 support installed
  • 11. Modeling Data as Graphs • The more connected the data is, the better a graph fits • Oracle NoSQL DB with Big Data Spatial and Graph • Graphic source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/
  • 12. What is a Property Graph?
      • A set of nodes (aka vertices)
          – each vertex has a unique identifier
          – each vertex has a set of in/out edges
          – each vertex has a collection of key-value properties
      • A set of edges
          – each edge has a unique identifier
          – each edge has a head/tail vertex
          – each edge has a label denoting the type of relationship between two vertices
          – each edge has a collection of key-value properties
      • Blueprints Java APIs
      • Implementations
          – Oracle (Spatial and Graph, Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, Dex, Sail, MongoDB, …
      • Source: https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
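The property graph definition on slide 12 can be sketched in a few lines of Python. This is only an illustration of the model, not the Blueprints API or Oracle's implementation; the class and method names (`Vertex`, `Edge`, `PropertyGraph`, `add_vertex`, `add_edge`, `out_edges`, `in_edges`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vid: int  # unique identifier
    properties: Dict[str, Any] = field(default_factory=dict)  # key-value properties


@dataclass
class Edge:
    eid: int     # unique identifier
    source: int  # id of the vertex the edge leaves (tail)
    target: int  # id of the vertex the edge enters (head)
    label: str   # type of relationship, e.g. "is_member_of"
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph (illustrative only)."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vid: int, **properties: Any) -> Vertex:
        if vid in self.vertices:
            raise ValueError(f"duplicate vertex id {vid}")
        self.vertices[vid] = Vertex(vid, dict(properties))
        return self.vertices[vid]

    def add_edge(self, eid: int, source: int, target: int, label: str,
                 **properties: Any) -> Edge:
        if eid in self.edges:
            raise ValueError(f"duplicate edge id {eid}")
        for vid in (source, target):
            if vid not in self.vertices:
                raise KeyError(f"unknown vertex id {vid}")
        self.edges[eid] = Edge(eid, source, target, label, dict(properties))
        return self.edges[eid]

    def out_edges(self, vid: int) -> List[Edge]:
        return [e for e in self.edges.values() if e.source == vid]

    def in_edges(self, vid: int) -> List[Edge]:
        return [e for e in self.edges.values() if e.target == vid]
```

A member vertex and a group vertex joined by an `is_member_of` edge would then be created with two `add_vertex` calls and one `add_edge` call; the in/out edge sets of each vertex are derived from the edge collection rather than stored twice.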
  • 13. Data Processing Workflow, step 1: Retrieve & Prepare • Prepare source data, using R for data retrieval via REST API and conversion JSON → CSV → OPV/OPE
  • 14. Requesting Data via Meetup REST API • Request URL (example): https://api.meetup.com/Women-Who-Code-Berlin-Germany/events?&key=506c1916524f6d3a6c782432645f5eb&status=past,upcoming&omit=description • Important note: for most requests, data are only returned for the city that matches the location assigned to the user profile possessing the API key • Response (JSON)
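The request on slide 14 can be assembled programmatically. The talk itself uses R for retrieval; the following Python sketch only mirrors the URL shape shown on the slide. The function name `build_events_url` and the placeholder key are assumptions, and the endpoint may no longer be served in this form.

```python
from urllib.parse import urlencode

API_BASE = "https://api.meetup.com"  # host taken from the slide's example URL


def build_events_url(group_urlname, api_key,
                     statuses=("past", "upcoming"), omit=("description",)):
    """Build an events request URL shaped like the example on the slide.

    group_urlname: group path segment, e.g. "Women-Who-Code-Berlin-Germany"
    api_key:       personal API key (use your own; never commit it)
    """
    query = urlencode(
        {
            "key": api_key,
            "status": ",".join(statuses),
            "omit": ",".join(omit),
        },
        safe=",",  # keep the comma-separated lists readable
    )
    return f"{API_BASE}/{group_urlname}/events?{query}"
```

Keeping the key out of the source file, for example by reading it from an environment variable, avoids publishing it together with slides or code.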
  • 15. Run R Code via RStudio
  • 16. Intermediate Results: CSV text files • Transform JSON into a flat structure: one record per instance of information type – Cities – Categories – Groups – Members – Events – Topics • Store data in .csv – Not required, but convenient to have as an intermediate format
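The flattening step on slide 16 (one record per group, member, event, and so on) might look as follows in Python. The talk does this in R; this sketch is a stand-in, and the JSON field names (`id`, `name`, `members`, `city`) are assumptions about the response rather than values taken from the slides.

```python
import csv
import io
import json

GROUP_FIELDS = ["group_id", "group_name", "group_members", "city_name"]


def flatten_groups(payload):
    """Turn a JSON array of group objects into flat records (one per group)."""
    rows = []
    for group in json.loads(payload):
        rows.append({
            "group_id": group["id"],
            "group_name": group["name"],
            "group_members": group.get("members", 0),  # default mirrors the 0 used in the graph config
            "city_name": group.get("city", "NA"),      # default mirrors the "NA" used in the graph config
        })
    return rows


def to_csv(rows, fieldnames):
    """Serialize flat records as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The CSV stage is optional, as the slide says, but it gives a checkpoint that can be inspected before node and edge files are generated.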
  • 17. Final Results: Flat File Structure for Property Graph • Extract attribute values from the flat structure • Append each as a single record to the nodes and edges files • Nodes (aka vertices) in flat file format • Edges in flat file format
  • 18. Useful Tips • When creating nodes and edges files (.opv, .ope) – Assign the right data type to attributes – Check for NULL values – Replace special characters – Remove duplicates – Check the pattern of IDs used in the source(s) and generate surrogate IDs if necessary • Keep the original ID by storing it as a property if necessary
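The tips on slide 18 can be illustrated with a small Python sketch that emits vertex rows. The row layout used here (`id,key,type_code,string_value,numeric_value,date_value`, with type code 1 for strings and 2 for integers, and `%2C` for commas) is an assumption about the flat file format and should be checked against the Oracle documentation before use. All function and class names are hypothetical.

```python
TYPE_STRING = 1   # assumed type code for string values
TYPE_INTEGER = 2  # assumed type code for integer values


def encode_text(value):
    """Replace NULL/empty values and characters that would break a CSV-like row."""
    if value is None or str(value).strip() == "":
        return "NA"
    return (str(value)
            .replace("%", "%25")
            .replace(",", "%2C")
            .replace("\n", "%0A")
            .replace("\t", "%09"))


class SurrogateIds:
    """Hand out one numeric ID per (kind, original ID) pair.

    Looking up the same pair twice returns the same ID, which also
    removes duplicates coming from overlapping source files.
    """

    def __init__(self, start=1):
        self._next = start
        self._ids = {}

    def get(self, kind, original_id):
        key = (kind, str(original_id))
        if key not in self._ids:
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]


def vertex_rows(vid, properties):
    """Emit one flat-file row per property, typed as integer or string."""
    rows = []
    for key, value in properties.items():
        if isinstance(value, int) and not isinstance(value, bool):
            rows.append(f"{vid},{key},{TYPE_INTEGER},,{value},")
        else:
            rows.append(f"{vid},{key},{TYPE_STRING},{encode_text(value)},,")
    return rows
```

Storing the source system's identifier as an extra property (for example `original_id`) alongside the surrogate ID keeps the graph traceable back to the API data.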
  • 19. Data Processing Workflow, step 2: Load & Build • Load node and edge data into a graph, using Oracle NoSQL DB as the store
  • 20. Architecture of Property Graph Support (components shown in the diagram) • Access: REST/Web Service/Notebooks; Java, Groovy, Python, … • Graph Analytics: Parallel In-Memory Graph Analytics (PGX) / Graph Querying (PGQL); Apache Spark; Java APIs • Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud; Java APIs/JDBC/SQL/PLSQL • Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase • Property Graph formats: GraphML, GML, GraphSON, Flat Files • RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
  • 21. Import Nodes and Edges into a Property Graph
      // Start Groovy Shell connecting to Oracle NoSQL DB
      cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
      ./gremlin-opg-nosql.sh

      server = new ArrayList();
      server.add("bigdatalite.localdomain:5000");

      // Create a graph config that contains the graph name "meetup"
      // Name of KV store is "kvstore"
      // Make sure to add all vertex/edge properties used in PGQL queries
      cfg = GraphConfigBuilder.forPropertyGraphNosql()
        .setName("meetup")
        .setStoreName("kvstore")
        .setHosts(server)
        .addVertexProperty("type", PropertyType.STRING, "NA")
        .addVertexProperty("city_name", PropertyType.STRING, "NA")
        .addVertexProperty("city_country", PropertyType.STRING, "NA")
        .addVertexProperty("city_member_count", PropertyType.INTEGER, 0)
        .addVertexProperty("group_country", PropertyType.STRING, "NA")
        .addVertexProperty("group_visibility", PropertyType.STRING, "NA")
        .addVertexProperty("group_members", PropertyType.INTEGER, 0)
        .addVertexProperty("group_name", PropertyType.STRING, "NA")
        .addVertexProperty("member_name", PropertyType.STRING, "NA")
        .addVertexProperty("topic_name", PropertyType.STRING, "NA")
        .addVertexProperty("topic_urlkey", PropertyType.STRING, "NA")
        .addVertexProperty("event_yes_rsvp_count", PropertyType.INTEGER, 0)
        .addVertexProperty("event_rating_count", PropertyType.INTEGER, 0)
        .addVertexProperty("event_rating_average", PropertyType.INTEGER, 0)
        .addVertexProperty("event_waitlist_count", PropertyType.INTEGER, 0)
        .hasEdgeLabel(true)
        .setLoadEdgeLabel(true)
        .setMaxNumConnections(2).build();

      // Create an instance of the graph
      opg = OraclePropertyGraph.getInstance(cfg);
      opg.setClearTableDOP(2);
      opg.clearRepository();
      opg.getKVStoreConfig();

      // Create an instance for the graph loader
      opgdl = OraclePropertyGraphDataLoader.getInstance();
      vfile = "/home/oracle/Documents/Meetup/data/meetup.opv";
      efile = "/home/oracle/Documents/Meetup/data/meetup.ope";

      // Load data into the graph
      opgdl.loadData(opg, vfile, efile, 2);

      // Do some checks
      // Count vertices and edges
      opg.countVertices();
      opg.countEdges();

      // Get vertices and edges
      opg.getVertices();
      opg.getEdges();
      ...

      // Shut down instance and close shell
      opg.shutdown();
      :q
  • 22. Data Processing Workflow, step 3: Analyze • Analyze graph data, using the Graph Analytics Engine (PGX) and the Property Graph Query Language (PGQL)
  • 23. How to Analyze Property Graph Data
      • PGX – Graph Analytics Engine
          – Toolkit for in-memory, parallel graph analysis containing the PGX shell, the Analyst API with a large collection of built-in algorithms, and more
          – Developed by Oracle Labs
          – https://docs.oracle.com/cd/E56133_01/latest/index.html
          – https://event.cwi.nl/grades/2018/07-VanRest.pdf
      • PGQL – Property Graph Query Language
          – http://pgql-lang.org/
          – Graph pattern matching combined with SQL
          – WHERE clause: a set of comma-separated constraints
          – Developed by Oracle Labs
          – Proposed for standardization
  • 24. Analyze Property Graph Data using PGX
      • Start PGX server
      /opt/oracle/oracle-spatial-graph/property_graph/pgx/bin/start-server

      • Start / Return to Groovy Shell
      // Create in-memory session and analyst for analytics
      session = Pgx.createSession("session_ID_1");
      analyst = session.createAnalyst();

      // Read the graph from Oracle NoSQL DB into memory
      pgxGraph = session.readGraphWithProperties(opg.getConfig());

      // Working with In-Memory Analyst
      // Execute Page Rank
      rank = analyst.pagerank(pgxGraph, 0.0001, 0.85, 100);
      // Get top 10 vertices
      rank.getTopKValues(10);

      // Betweenness Centrality
      bc = analyst.vertexBetweennessCentrality(pgxGraph);
      // Get top 10 vertices
      bc.getTopKValues(10);
      ...
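To make the PageRank call on slide 24 less of a black box, here is a plain-Python power-iteration sketch. It reads the three numeric arguments of the slide's call as error tolerance (0.0001), damping factor (0.85), and maximum iterations (100); that reading is an assumption, and the code is not how PGX implements the algorithm. The function names are hypothetical.

```python
def pagerank(edges, tolerance=0.0001, damping=0.85, max_iterations=100):
    """Power-iteration PageRank over a list of (source, target) pairs.

    Rank held by vertices without outgoing edges is redistributed
    evenly, so the ranks always sum to 1.
    """
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    out_neighbors = {v: [] for v in vertices}
    for source, target in edges:
        out_neighbors[source].append(target)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iterations):
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in vertices}
        for v in vertices:
            if out_neighbors[v]:
                share = damping * rank[v] / len(out_neighbors[v])
                for w in out_neighbors[v]:
                    new_rank[w] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if change < tolerance:
            break
    return rank


def top_k(rank, k):
    """Return the k highest-ranked (vertex, score) pairs."""
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)[:k]
```

On a toy graph where three members point to one group, the group ends up with a higher score than any single member, which is the intuition behind using PageRank to look for influential groups or people.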
  • 25. Analyzing Property Graph Data using PGQL
      • Topology constraints
          (n)-[e]->(m)
          (n)-[e1]->(m1), (n)-[e2]->(m2)
          (n1)-[e1]->(n2)-[e2]->(n3)-[e3]->(n4)
          (n1)-[e1]->(n2)<-[e2]-(n3)
      • Label matching
          (x:Person) -[e:likes]-> (y:Person)
          (:Person) -[:likes]-> (:Person)
          (x:Student|Professor) -[e:likes|knows]-> (y:Student|Professor)
      • Value constraints
          (x) -> (y), x.name = 'John', y.age > 25
      • In-line constraints
          (n WITH name = 'John' OR name = 'James', type = 'Person') -[e WITH type = 'workAt', workHours < 40]-> ()
      • …
      • Syntax forms (form: example)
          Basic form: (n)-[e]->(m)
          Omit variable name of the source vertex: ()-[e]->(m)
          Omit variable name of the destination vertex: (n)-[e]->()
          Omit variable names in both vertices: ()-[e]->()
          Omit variable name in edge: (n)-->(m)
          Omit variable name in edge (alternative, one dash): (n)->(m)
  • 26. Analyzing Property Graph Data using PGQL
      • Start / Return to Groovy Shell
      // Some PGQL queries
      pgxResultSet = pgxGraph.queryPgql("SELECT * WHERE (x) -[e1:is_organizer_of]-> (y) -[e2:is_located_in]-> (z)");
      pgxResultSet.print(5);
      pgxResultSet.getNumResults();

      pgxResultSet = pgxGraph.queryPgql("SELECT x.type, y.type, y.group_name, y.group_members WHERE (x) -[e1:is_organizer_of]-> (y WITH group_members > 1000) -[e2:is_located_in]-> (z) order by y.group_members desc");
      pgxResultSet.print(5);

      pgxResultSet = pgxGraph.queryPgql("SELECT x.member_name, y.group_name, y.group_members WHERE (x) -[e1:is_organizer_of]-> (y WITH group_members > 1000) -[e2:is_located_in]-> (z)");
      pgxResultSet.print(5);

      pgxResultSet = pgxGraph.queryPgql("SELECT * WHERE (x WITH event_yes_rsvp_count > 250) -[e1:is_organized_by]-> (y) -[e2:is_located_in]-> (z)");
      pgxResultSet.print(5);
      ...
      • See also: https://blogs.oracle.com/bigdataspatialgraph/how-many-ways-to-run-property-graph-query-language-pgql-in-bdsg-i
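What a two-hop pattern such as `(x) -[:is_organizer_of]-> (y WITH group_members > 1000) -[:is_located_in]-> (z)` asks for can be spelled out in plain Python. This is only an illustrative sketch of pattern matching over an edge list, not how PGX evaluates PGQL; the function name and data layout are hypothetical.

```python
def match_two_hop(vertices, edges, first_label, second_label, min_members):
    """Find (x, y, z) with x -[first_label]-> y -[second_label]-> z.

    vertices: dict mapping vertex id to a property dict
    edges:    list of (source, label, target) triples
    Only y vertices with group_members > min_members are kept, and
    results are ordered by y's group_members, largest first.
    """
    # Index outgoing edges by (source, label) for the second hop.
    targets_by_source_label = {}
    for source, label, target in edges:
        targets_by_source_label.setdefault((source, label), []).append(target)

    results = []
    for x, label, y in edges:
        if label != first_label:
            continue
        if vertices[y].get("group_members", 0) <= min_members:
            continue
        for z in targets_by_source_label.get((y, second_label), []):
            results.append((x, y, z))

    results.sort(key=lambda row: vertices[row[1]]["group_members"], reverse=True)
    return results
```

Each result row corresponds to one binding of the variables `x`, `y`, and `z`, which is also what a PGQL result set contains before the SELECT clause projects individual properties.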
  • 27. Data Processing Workflow, step 4: Visualize • Visualize graph data, using Cytoscape
  • 28. Visualizing Data using Cytoscape connected to Big Data Spatial and Graph
  • 29. PGQL – Examples (visualized using Cytoscape)
      SELECT * WHERE (x WITH type='Event') -[e1]-> (y WITH type='Group' and group_name = 'Women Who Code Berlin') <-[e2:is_assigned_to]- (z WITH type='Topic')
      • See also: https://blogs.oracle.com/bigdataspatialgraph/how-many-ways-to-run-property-graph-query-language-pgql-in-bdsg-ii
  • 30. PGQL – Examples (visualized using Cytoscape)
      SELECT * WHERE (x) -[e1:is_organized_by]-> (y WITH type='Group' and group_name = 'Women Who Code Berlin') <-[e2:is_assigned_to]- (z WITH type='Topic'), (y) -[e3:is_located_in]-> (w)
  • 31. PGQL – Examples (visualized using Cytoscape)
      SELECT * WHERE (x WITH type='Topic' and topic_name = 'Women in Technology') -[e1]-> (y WITH type='Group') -[e2]-> (z WITH type = 'City' and (city_name = 'Berlin' or city_name = 'Hamburg' or city_name = 'München'))
  • 32. PGQL – Examples (visualized using Cytoscape)
      SELECT * WHERE (x WITH type='Event' and event_yes_rsvp_count >= 250) -[e1]- (y WITH type='Group') -[e2]- (z WITH type='City')
  • 33. PGQL – Examples (visualized using Cytoscape)
      SELECT * WHERE (x WITH type='Group' and group_name = 'Women Who Code Berlin') <-[e1:is_assigned_to]- (y WITH type='Topic') -[e2]-> (z WITH group_members >= 2000) -[e3:is_located_in]-> (w)
  • 34. More Visualization Examples using Cytoscape • Meetup groups in relation to organizers • Cities shown: Copenhagen, Berlin, Hamburg, Munich
  • 35. More Visualization Examples using Cytoscape
  • 36. Data Processing Workflow, step 5: Results • Summarize results
  • 37. Summarize (Preliminary) Results • Who are important people in the Meetup landscape? • Which Meetup groups should we talk to for certain topics? • Which Meetup groups are relevant in terms of #Members, #Participants of events, #Events? • Which Meetup groups are related and how? • ...
  • 38. Key Takeaways – So far • The graph data model is a very good fit when the focus is on connectivity • Code is written once and can be reused to retrieve data for any desired location (city) • Visual analysis helps a great deal to understand how data are connected • A big variety of analytic tools and frameworks is available to answer all kinds of questions – Integrated, distributed, in-memory graph analytics engine • Use case of how to combine open source with Oracle technologies • Please also check the latest graph talks from the Analytics and Data Summit in March 2018 – https://analyticsanddatasummit.org/schedule/
  • 40. Oracle Code Berlin June 12th 2018 • See you there