20181123 dn2018 graph_analytics_k_patenge
- 1. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Visual Analysis of Social Media Data from Meetup
using Graph Technologies
DATA NATIVES 2018 | Nov 22-23, 2018 | Berlin
Karin Patenge | Principal Solution Engineer | Cloud & Core Technologies
@kpatenge | karin.patenge@oracle.com
Oracle Deutschland B.V. & Co. KG | Potsdam | Schiffbauergasse 14
- 2. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | @kpatenge @datanativesconf #DN18
- 3. Accessing Data Entities
• Data retrieval via REST API
https://www.meetup.com/meetup_api
• Different API methods & versions
• API Key required
• Sample request
• Data returned as JSON
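The retrieval step listed on this slide can be sketched in Python. This is a minimal sketch and not the code used in the talk. The base URL, the `find/groups` path, the `key` and `text` parameter names, and the sample JSON payload are assumptions for illustration only; the actual methods and versions are described in the Meetup API documentation.

```python
import json
from urllib.parse import urlencode


def build_request_url(base_url: str, method_path: str, api_key: str, **params: str) -> str:
    """Compose a GET request URL that carries the API key and query parameters."""
    query = urlencode({"key": api_key, **params})
    return f"{base_url.rstrip('/')}/{method_path.lstrip('/')}?{query}"


# Hypothetical endpoint and parameters, for illustration only.
url = build_request_url(
    "https://api.example.com", "find/groups", "YOUR_API_KEY", text="Big Data"
)
print(url)

# Hypothetical JSON payload standing in for a real API response.
sample_response = '[{"id": 1, "name": "Big Data Berlin", "city": "Berlin", "members": 1200}]'
groups = json.loads(sample_response)
for group in groups:
    print(group["name"], group["city"], group["members"])
```

No network call is made here; the point is only that the response arrives as JSON and can be turned into plain records for the later graph-building steps.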
- 4. Potential Questions of Interest
• Which Meetup groups are most active in terms of:
– # members
– # events
– # event attendees
• Who are the influencers in the Meetup community, and where are they?
• Where are connections between the Meetup groups in different locations?
• Which topics are “hot” and where?
• How close/similar are groups?
• …
- 5. Approach: Modeling Data as Graphs
The more connected the data is, the better a Graph fits
Source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/
- 6. What is a Property Graph?
• A set of nodes (aka vertices)
– each vertex has a unique identifier
– each vertex has a set of in/out edges
– each vertex has a collection of key-value properties
• A set of edges
– each edge has a unique identifier
– each edge has a head/tail vertex
– each edge has a label denoting type of relationship between two vertices
– each edge has a collection of key-value properties
• Implementations
– Oracle (Spatial and Graph/Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, …
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
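The property graph definition on this slide maps directly onto a small data structure. The following is a minimal Python sketch under stated assumptions: the class names, method names, and the sample member/group data are hypothetical and do not correspond to the Oracle or TinkerPop APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int                                   # unique identifier
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int                                   # unique identifier
    tail: int                                 # id of the vertex the edge leaves
    head: int                                 # id of the vertex the edge points to
    label: str                                # type of relationship
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def add_vertex(self, vertex_id: int, **properties: Any) -> None:
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))

    def add_edge(self, edge_id: int, tail: int, head: int, label: str, **properties: Any) -> None:
        if tail not in self.vertices or head not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[edge_id] = Edge(edge_id, tail, head, label, dict(properties))

    def out_edges(self, vertex_id: int) -> List[Edge]:
        return [e for e in self.edges.values() if e.tail == vertex_id]

    def in_edges(self, vertex_id: int) -> List[Edge]:
        return [e for e in self.edges.values() if e.head == vertex_id]


# Hypothetical sample data: one member joining one group.
graph = PropertyGraph()
graph.add_vertex(1, type="member", name="Alice")
graph.add_vertex(2, type="group", name="Big Data Berlin", city="Berlin")
graph.add_edge(10, tail=1, head=2, label="member_of", since=2018)
print(graph.out_edges(1))
```

Scanning all edges in `out_edges` and `in_edges` keeps the sketch short; a real graph store would index edges per vertex.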
- 7. Graph Algorithms and their Applications
• PageRank, Weighted PageRank
– Find influencers, critical vertices
• Personalized PageRank
– Find important people/products/... with respect to a given starting point
• Sparsification
– Trim down the graph to make it more fragmented
• Clustering
– Find communities which can be the basis of segmentation and/or recommendation, anomaly detection, churn analysis
• Centrality
– Find critical people/devices/...
• Shortest path
– Discover links, find a suspect's close collaborators, transportation routing
• Breadth-First-Search (BFS)
– Impact analysis, link analysis
• Matrix factorization
– Recommendation
• Reachability
– Connectivity test
• ...
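To make the first entry of this list concrete, here is a minimal Python sketch of unweighted PageRank computed by power iteration. It is a textbook formulation, not the PGX implementation; the damping factor of 0.85, the fixed iteration count, and the member/group edge list are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank on a directed edge list (source, destination)."""
    nodes = sorted({vertex for edge in edges for vertex in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical data: members pointing at the groups they belong to.
edges = [("alice", "g1"), ("bob", "g1"), ("carol", "g1"), ("carol", "g2")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

With edges pointing from members to groups, the group with the most members collects the highest score, which is the "find influencers" use named above.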
- 8. Social Network Analysis Algorithms (1)
Pathfinding
– fattestPath
– shortestPathBellmanFord
– shortestPathBellmanFordReverse
– shortestPathDijkstra
– shortestPathDijkstraBidirectional
– shortestPathFilteredDijkstra
– shortestPathFilteredDijkstraBidirectional
– shortestPathHopDist
– shortestPathHopDistReverse
Ranking
– closenessCentralityUnitLength
– degreeCentrality
– eigenvectorCentrality
– Hyperlink-Induced Topic Search (HITS)
– inDegreeCentrality
– nodeBetweennessCentrality
– outDegreeCentrality
– PageRank, weighted PageRank
– approximatePagerank
– personalizedPagerank
– randomWalkWithRestart
https://tinyurl.com/pgxdocs
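The idea behind a hop-distance shortest path can be sketched with a plain breadth-first search. This Python sketch does not call PGX and makes no claim about how the PGX algorithms listed above are implemented; the function name and the sample adjacency list are hypothetical.

```python
from collections import deque
from typing import Dict, List


def hop_distances(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Return the minimum number of hops from source to every reachable vertex."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in distances:      # first visit is the shortest in hops
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


# Hypothetical directed graph given as an adjacency list.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
distances = hop_distances(adjacency, "a")
print(distances)
```

Vertices that are missing from the result are unreachable from the source, so the same traversal also answers a reachability question.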
- 9. Social Network Analysis Algorithms (2)
Structure Evaluation
– Conductance
– countTriangles
– inDegreeDistribution
– outDegreeDistribution
– partitionConductance
– partitionModularity
– sparsify
– K-Core computes
Community Detection
– communitiesLabelPropagation
Recommendation
– salsa
– personalizedSalsa
– whomToFollow
Classic - Connected Components
– sccKosaraju
– sccTarjan
– wcc
https://tinyurl.com/pgxdocs
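Weakly connected components, the last entry above, group vertices that are linked when edge direction is ignored. A minimal Python sketch using a union-find structure follows; it is one common way to compute them and is not taken from PGX. The function name and sample edges are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def weakly_connected_components(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group vertices into components, treating every edge as undirected."""
    edge_list = list(edges)
    all_nodes = set(nodes) | {vertex for edge in edge_list for vertex in edge}
    parent: Dict[str, str] = {node: node for node in all_nodes}

    def find(node: str) -> str:
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edge_list:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    components: Dict[str, Set[str]] = {}
    for node in all_nodes:
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


# Hypothetical data: "f" is an isolated vertex.
components = weakly_connected_components(
    ["a", "b", "c", "d", "e", "f"], [("a", "b"), ("c", "b"), ("d", "e")]
)
print(sorted(sorted(component) for component in components))
```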
- 10. Architecture of Oracle Property Graph Analytics
[Architecture diagram; recoverable labels: Property Graph formats (GraphML, GML, GraphSON, Flat Files)]
- 11. Property Graph Analytics Engine
PGX
• Toolkit for In-Memory, Parallel Graph Analytics containing
– PGX shell
– Analyst API with a large collection of built-in algorithms (45+)
– Extensible with user-defined algorithms written in Green-Marl
– Tutorials, JavaDoc, Use Cases, and more
• Developed by Oracle Labs
• https://docs.oracle.com/cd/E56133_01/latest/index.html
PGQL – Property Graph Query Language
• http://pgql-lang.org/
• Graph Pattern Matching combined with SQL
• Developed by Oracle Labs
• Proposed for standardization
• Changes in Version 1.1: http://pgql-lang.org/spec/1.1/#breaking-syntax-changes-since-pgql-10
- 12. Data Processing and Analysis Workflow: Overview
• Retrieve & Prepare: prepare source data
– Using R for data retrieval via REST API and conversion JSON → CSV → OPV/OPE
• Load & Build: load nodes and edges data into a graph
– Using Oracle NoSQL DB as Graph data store
• Analyze: analyze graph data
– Using Graph Analytics Engine (PGX) and Property Graph Query Language (PGQL)
• Visualize: visualize graph data
– Using Cytoscape
• Results: summarize results
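The first workflow stage, turning JSON records into node and edge files, can be sketched as follows. The talk names R for this step; Python is used here only for illustration. The JSON field names, the `group:`/`city:` id prefixes, the `located_in` label, and the CSV column layout are all assumptions, and the output is a generic CSV, not the Oracle OPV/OPE flat-file layout.

```python
import csv
import io
import json
from typing import Tuple

# Hypothetical JSON payload standing in for data retrieved via the REST API.
sample_json = """[
  {"id": 101, "name": "Big Data Berlin", "city": "Berlin"},
  {"id": 102, "name": "Graph Munich", "city": "Munich"},
  {"id": 103, "name": "Data Science Berlin", "city": "Berlin"}
]"""


def groups_to_csv(json_text: str) -> Tuple[str, str]:
    """Convert group records into one CSV of nodes and one CSV of edges."""
    groups = json.loads(json_text)
    nodes_out, edges_out = io.StringIO(), io.StringIO()
    node_writer = csv.writer(nodes_out, lineterminator="\n")
    edge_writer = csv.writer(edges_out, lineterminator="\n")
    node_writer.writerow(["node_id", "type", "name"])
    edge_writer.writerow(["source", "target", "label"])

    seen_cities = set()
    for group in groups:
        group_id = f"group:{group['id']}"
        city_id = f"city:{group['city']}"
        node_writer.writerow([group_id, "group", group["name"]])
        if city_id not in seen_cities:         # emit each city node only once
            seen_cities.add(city_id)
            node_writer.writerow([city_id, "city", group["city"]])
        edge_writer.writerow([group_id, city_id, "located_in"])
    return nodes_out.getvalue(), edges_out.getvalue()


nodes_csv, edges_csv = groups_to_csv(sample_json)
print(nodes_csv)
print(edges_csv)
```

A further conversion step would be needed to produce the OPV/OPE files that the slide mentions; their exact layout is defined in the Oracle documentation and is not reproduced here.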
- 13. Demo
- 14. 'Big Data' Groups in relation to Topics and Cities
- 15. 'Big Data' Groups in relation to Organizers, Cities and Events
- 16. 'Big Data' Groups in relation to Organizers and Cities
Weakly Connected Components (WCC)
- 17. Ranking via PageRank (Top 10+1)
- 18. Ranking via PageRank (Top 10+1)
- 19. Ranking via PageRank (Top 10+1)
- 20. Ranking via PageRank (Top 10+1)
- 21. Some Results
✓ Which cities are tech hot spots?
✓ Who are important people in the Meetup landscape?
✓ Which Meetup groups cover which topics?
✓ Which Meetup groups are relevant in terms of #Members, #Participants of events, #Events?
✓ Which Meetup groups are related and how?
✓ Which topics are related and how?
• The way you model the graph influences the results of executing graph algorithms
• The choice of edge directions matters, depending on the algorithm
• Attaching weights to edges is useful for certain algorithms
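The remark about edge directions can be illustrated with a small Python sketch. It ranks vertices by in-degree on the same relationships modeled in both directions. The member and group names are hypothetical, and in-degree is used only because it is the simplest direction-sensitive measure; it is not the ranking shown in the demo.

```python
from collections import Counter
from typing import List, Tuple


def in_degree(edges: List[Tuple[str, str]]) -> Counter:
    """Count incoming edges per vertex for a directed edge list."""
    return Counter(dst for _, dst in edges)


# Hypothetical membership data modeled as member -> group.
member_to_group = [("alice", "g1"), ("bob", "g1"), ("carol", "g1"), ("carol", "g2")]
# The same relationships modeled as group -> member.
group_to_member = [(dst, src) for src, dst in member_to_group]

top_forward = in_degree(member_to_group).most_common(1)
top_reversed = in_degree(group_to_member).most_common(1)
print(top_forward)
print(top_reversed)
```

The first model ranks the largest group on top, the second ranks the most active member on top, although both describe the same data.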
- 22. Key Takeaways
• The graph data model is ideal when the focus is on connectivity
• Graph databases are powerful tools, complementing relational and other databases
– Especially strong for analysis of graph topology and connectedness
• Visual analysis helps a great deal in understanding how data are connected
– New insights, especially into relationships, dependencies and behavioral patterns
• A wide variety of analytic tools and frameworks is available to answer all kinds of questions
• Oracle Graph Technologies can be combined with open source or third-party tools
- 23. Follow us @kpatenge @SpatialHannes @JeanIhm
karin.patenge@oracle.com
GitHub:
https://github.com/karinpatenge/DN2018
Blogs:
https://blogs.oracle.com/bigdataspatialgraph/
https://blogs.oracle.com/oraclespatial/
AskTom Office Hours for Property Graph:
https://asktom.oracle.com/pls/apex/f?p=100:551