KNOWLEDGE
INFORMATION
DATA
Adding Value Through Graph Analysis

Matthias Broecheler, CTO
@mbroecheler
AURELIUS
THINKAURELIUS.COM
March V, MMXIII

Communities of Interest
Finding Influencers
Understanding Behavior

Information Integration
Recommendation
Question Answering

Fraud Detection
Risk Analysis
Market Valuation
Knowledge
Information
Data
(diagram label: Value)
Knowledge
  likes(Jane Joe, cute mammals):0.8
Information
  userid:3552 --clicked (timestamp: 93932342)--> addid:9914
Data
  2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
Graph Databases & Graph Analysis
Adding Value through graph analysis using Titan and Faunus
I.  Graph Foundation
Graph: vertices and properties
  Vertex (name: Neptune, type: god)
  Vertex (name: Alcmene, type: god)
  Vertex (name: Saturn, type: titan)
  Vertex (name: Jupiter, type: god)
  Vertex (name: Hercules, type: demigod)
  Vertex (name: Pluto, type: god)
  Vertex (name: Cerberus, type: monster)
Graph: edges, edge types, and edge properties
  Vertices: Neptune, Alcmene, Saturn, Jupiter, Hercules, Pluto, Cerberus
  Edge types (labels): brother (2 edges), mother, father (2 edges), battled, pet
  Edge property: time:12 on the battled edge
Path
  [Figure: the same example graph, illustrating a path]
Degree
  [Figure: the same example graph, illustrating vertex degree]
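The terms introduced in this section (vertex, property, edge, edge type, edge property, path, degree) can be made concrete with a minimal Python sketch. The vertex properties and edge labels are taken from the slides; the edge endpoints and directions are not recoverable from the slide text, so the ones below are assumptions, chosen to be consistent with the traversal from Hercules used later in the deck. The helper names `degree` and `is_path` are illustrative only.

```python
# Vertices with their properties, as listed on the slides.
vertices = {
    "Neptune": {"type": "god"},
    "Alcmene": {"type": "god"},
    "Saturn": {"type": "titan"},
    "Jupiter": {"type": "god"},
    "Hercules": {"type": "demigod"},
    "Pluto": {"type": "god"},
    "Cerberus": {"type": "monster"},
}

# Each edge is (source, label, target, edge properties).
# ASSUMPTION: endpoints and directions are guesses; only the labels
# and the time:12 property appear on the slides.
edges = [
    ("Jupiter", "father", "Saturn", {}),
    ("Jupiter", "brother", "Neptune", {}),
    ("Jupiter", "brother", "Pluto", {}),
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Pluto", "pet", "Cerberus", {}),
]


def degree(name):
    """Number of edges touching the vertex, regardless of direction."""
    return sum((src == name) + (dst == name) for src, _, dst, _ in edges)


def is_path(names):
    """True if consecutive vertices are joined by an edge in either direction."""
    linked = {(s, d) for s, _, d, _ in edges} | {(d, s) for s, _, d, _ in edges}
    return all((a, b) in linked for a, b in zip(names, names[1:]))


print(degree("Jupiter"))
print(is_path(["Saturn", "Jupiter", "Hercules", "Cerberus"]))
```

Under these assumed endpoints, Jupiter touches four edges, and Saturn, Jupiter, Hercules, Cerberus form a path.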
Aurelius Graph Cluster (Apache 2)
  TITAN: stores a massive-scale property graph allowing real-time traversals and updates
  FAUNUS: batch processing of large graphs with Hadoop
  FULGORA: runs global graph algorithms on large, compressed, in-memory graphs
  Diagram labels: Map/Reduce, Load, Bulk Load, Analysis results back into Titan
II.  Titan Graph Database
Titan Features
  Numerous Concurrent Users
  Many Short Transactions
    read/write
  Real-time Traversals (OLTP)
  High Availability
  Dynamic Scalability
  Variable Consistency Model
    ACID or eventual consistency
  Real-time Big Graph Data
Storage Backends
  [Figure: storage backends positioned by Consistency, Availability, and Partitionability]
$ ./titan-0.2.0/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules').next()
==>v[4]
gremlin> v.out('father').out('brother').name
  [Figure: the example graph, illustrating the traversal v.out('father').out('brother').name starting at Hercules]
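What a two-step traversal such as `v.out('father').out('brother').name` does can be sketched in plain Python. This is not Titan or Gremlin code. The adjacency table `OUT_EDGES` and the helper `out` are hypothetical, and the edge endpoints are assumptions, since the slides show only the edge labels.

```python
# Outgoing adjacency: vertex -> list of (edge label, neighbour).
# ASSUMPTION: endpoints and directions are guesses based on the edge labels.
OUT_EDGES = {
    "Hercules": [("father", "Jupiter"), ("mother", "Alcmene"), ("battled", "Cerberus")],
    "Jupiter": [("father", "Saturn"), ("brother", "Neptune"), ("brother", "Pluto")],
    "Pluto": [("pet", "Cerberus")],
}


def out(frontier, label):
    """One traversal step: follow outgoing edges carrying the given label."""
    return [
        dst
        for vertex in frontier
        for (edge_label, dst) in OUT_EDGES.get(vertex, [])
        if edge_label == label
    ]


# Start at Hercules, step to the father, then to the father's brothers.
uncles = out(out(["Hercules"], "father"), "brother")
print(uncles)
```

Each call to `out` maps the current set of vertices to the vertices reachable over one matching edge, which is why the steps chain naturally.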
Vertex-Centric Indices
  Sort and index edges per vertex by primary key
    Primary key can be composite
  Enables efficient focused traversals
    Only retrieve edges that matter
  Uses push-down predicates for quick, index-driven retrieval
Vertex query, narrowed step by step (edges of vertex v that remain after each step)
  v.query()
    battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father, fought, fought
  v.query().direction(OUT)
    battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father
  v.query().direction(OUT).labels('battled')
    battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9)
  v.query().direction(OUT).labels('battled').has('time',T.lt,5)
    battled (time: 1), battled (time: 3)
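The idea behind a vertex-centric index can be sketched in Python: keep each vertex's edges grouped by direction and label and sorted by a key property, so a predicate like `time < 5` becomes a binary search rather than a scan over all incident edges. This is an illustrative sketch only and says nothing about Titan's actual storage layout. The function names, the neighbour names `a` to `h`, and the choice to treat the two fought edges as incoming are assumptions.

```python
import math
from bisect import bisect_left

# Edges of one vertex v as (direction, label, time, other_vertex).
# time is None where the slide shows no time property.
raw_edges = [
    ("OUT", "battled", 9, "d"), ("OUT", "mother", None, "e"),
    ("OUT", "battled", 1, "a"), ("IN", "fought", None, "g"),
    ("OUT", "father", None, "f"), ("OUT", "battled", 5, "c"),
    ("IN", "fought", None, "h"), ("OUT", "battled", 3, "b"),
]


def build_index(edges):
    """Group edges by (direction, label) and sort each group by time."""
    groups = {}
    for direction, label, time, other in edges:
        key = math.inf if time is None else time  # untimed edges sort last
        groups.setdefault((direction, label), []).append((key, other))
    index = {}
    for group_key, group in groups.items():
        group.sort()
        index[group_key] = ([k for k, _ in group], [o for _, o in group])
    return index


def query(index, direction, label, time_lt):
    """Edges with this direction and label whose time is strictly below time_lt."""
    keys, others = index.get((direction, label), ([], []))
    cut = bisect_left(keys, time_lt)  # binary search instead of a full scan
    return list(zip(keys[:cut], others[:cut]))


index = build_index(raw_edges)
print(query(index, "OUT", "battled", 5))
```

The lookup touches only the battled group and stops at the first edge with `time >= 5`, which mirrors the "only retrieve edges that matter" point.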
Titan Features
  I.  Data Management
  II.  Vertex-Centric Indices
  III.  Graph Partitioning
  IV.  Edge Compression
III.  TITAN 0.3.0 [-SNAPSHOT]
Titan Embedding
  Rexster RexPro
    lightweight Gremlin Server
    binary protocol
  Titan Gremlin Engine
  Embedded Storage Backend
    in-JVM method calls
  Native clients
    Java, Python, Clojure
Graph Indexing
  Vertex and Edge indexing
  Pluggable index provider
    ElasticSearch
    Lucene
  Full-text search
  Numeric range search
  Geographic search
Example graph with indexed properties
  Vertex (name: Neptune, age: 5200, title: God of the earth and ocean)
  Vertex (name: Alcmene, age: 3300)
  Vertex (name: Saturn, age: 5900)
  Vertex (name: Jupiter, age: 4800, title: God of the heaven and skies)
  Vertex (name: Hercules, title: Divine hero)
  Vertex (name: Pluto, age: 4900, title: God of the underworld)
  Vertex (name: Cerberus, title: Ugly beast of the underworld)
  Edge types (labels): brother (2 edges), mother, father (2 edges), battled, pet
  Edge properties on battled: time:12, location: (38.071,23.745)
g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
g.query().has('title',Txt.CONTAINS,'god').vertices()
g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
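The semantics of the three search types (numeric range, full-text, geographic) can be sketched in Python over the property values shown on the slides. This is a sketch under stated assumptions, not Titan's index API: a linear scan stands in for the external index, full-text matching is approximated as a case-insensitive whole-word match, and the circle radius is assumed to be in kilometres. All function names are hypothetical.

```python
import math

vertices = [
    {"name": "Neptune", "age": 5200, "title": "God of the earth and ocean"},
    {"name": "Alcmene", "age": 3300},
    {"name": "Saturn", "age": 5900},
    {"name": "Jupiter", "age": 4800, "title": "God of the heaven and skies"},
    {"name": "Hercules", "title": "Divine hero"},
    {"name": "Pluto", "age": 4900, "title": "God of the underworld"},
    {"name": "Cerberus", "title": "Ugly beast of the underworld"},
]
edges = [{"label": "battled", "time": 12, "location": (38.071, 23.745)}]


def greater_than(key, bound):
    """Numeric range predicate: property exists and exceeds the bound."""
    return lambda el: key in el and el[key] > bound


def text_contains(key, word):
    """Full-text predicate, approximated as a case-insensitive word match."""
    return lambda el: word.lower() in el.get(key, "").lower().split()


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def within_circle(key, center, radius_km):
    """Geographic predicate: point lies within radius_km of center (ASSUMED km)."""
    return lambda el: key in el and haversine_km(el[key], center) <= radius_km


def select(elements, *predicates):
    """Keep the elements that satisfy every predicate (a stand-in for chained has())."""
    return [el for el in elements if all(p(el) for p in predicates)]


old = select(vertices, greater_than("age", 5000))
gods = select(vertices, text_contains("title", "god"))
old_gods = select(vertices, greater_than("age", 5000), text_contains("title", "god"))
nearby = select(edges, within_circle("location", (38, 23), 100))
print([v["name"] for v in old_gods], len(nearby))
```

Passing several predicates to `select` plays the role of chaining `has(...)` calls: an element is returned only if it satisfies all of them.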
IV.  Faunus Graph Analytics
Faunus Features
  Hadoop-based Graph Computing Framework
  Graph Analytics
  Breadth-first Traversals
  Global Graph Computations
  Batch Big Graph Data
Faunus Architecture
  g._()
Faunus Work Flow

g.V.out.out.count()

  hdfs://user/ubuntu/
    output/job-0/
    output/job-1/
    output/job-2/
  Job output: { graph*, sideeffect* }
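A batch traversal like `g.V.out.out.count()` can be pictured as rounds of breadth-first message passing, where each round pushes per-vertex path counts along the out-edges. The Python sketch below illustrates that idea on a hypothetical toy graph. It is not Faunus code and does not claim to match how Faunus compiles steps into MapReduce jobs. `GRAPH` and `out_step` are made-up names.

```python
from collections import Counter

# Hypothetical toy graph as outgoing adjacency lists.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}


def out_step(path_counts, graph):
    """One breadth-first round: push each vertex's path count along its out-edges."""
    next_counts = Counter()
    for vertex, count in path_counts.items():   # map: emit count to each neighbour
        for neighbour in graph.get(vertex, []):
            next_counts[neighbour] += count     # reduce: sum counts per target vertex
    return next_counts


start = Counter({vertex: 1 for vertex in GRAPH})  # g.V: one traverser per vertex
after_two = out_step(out_step(start, GRAPH), GRAPH)
total = sum(after_two.values())                   # count(): number of 2-step paths
print(total)
```

Because only counts travel between rounds, the work per round is proportional to the number of edges rather than the number of paths.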
Compressed HDFS Graphs
  stored in sequence files
  variable length encoding
  prefix compression
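The two compression techniques named above can be sketched generically in Python. Variable-length encoding spends fewer bytes on small integers, and prefix compression stores each sorted key as the length of the prefix it shares with its predecessor plus the remaining suffix. This shows the general techniques only; the exact byte layout Faunus uses is not given in the slides, and all function names here are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int in 7-bit groups, least significant group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def prefix_compress(sorted_keys):
    """Store each key as (length of prefix shared with previous key, suffix)."""
    prev, out = "", []
    for key in sorted_keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(entries):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    prev, keys = "", []
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


print(encode_varint(300), prefix_compress(["father", "fought", "friend"]))
```

Both schemes pay off on graph data because vertex ids tend to be small or close together and sorted keys tend to share long prefixes.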
What’s New
  Faunus 0.1 released
  Bulk Import / Export for Titan
    loaded graph into Titan
    loading derivations into Titan
    RDF support
  Many optimizations
    vertex compression
Faunus Setup

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
Build a Knowledge Graph
  Based on DBpedia
    Graph version of Wikipedia
    ~290 million edges (~1B triples)
1.  Bulk load RDF into Faunus
    6 m1.xlarge
2.  Convert to property graph
3.  Bulk load into Titan
    3 m1.xlarge with Cassandra
4.  OLTP+OLAP
    Total Time: ~ 2 hours
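Step 2, converting RDF to a property graph, can be sketched in Python under one common convention: triples whose object is a literal become vertex properties, and triples whose object is another resource become edges. The slides do not say which mapping was actually used, so this is an assumption. The sample triples, predicate names, and the function `to_property_graph` are illustrative only.

```python
# Each triple is (subject, predicate, (kind, value)), where kind is
# "literal" or "uri". ASSUMPTION: illustrative data, not taken from the slides.
triples = [
    ("dbr:Random_walker_algorithm", "rdfs:label", ("literal", "Random walker algorithm")),
    ("dbr:Random_walker_algorithm", "dbo:wikiPageWikiLink", ("uri", "dbr:Random_walk")),
    ("dbr:Random_walker_algorithm", "dbo:wikiPageWikiLink", ("uri", "dbr:Laplacian_matrix")),
    ("dbr:Random_walk", "rdfs:label", ("literal", "Random walk")),
]


def to_property_graph(triples):
    """Literal objects become vertex properties; resource objects become edges."""
    vertices, edges = {}, []
    for subject, predicate, (kind, value) in triples:
        properties = vertices.setdefault(subject, {})
        if kind == "literal":
            properties[predicate] = value
        else:
            vertices.setdefault(value, {})  # make sure the target vertex exists
            edges.append((subject, predicate, value))
    return vertices, edges


vertices, edges = to_property_graph(triples)
print(len(vertices), len(edges))
```

Folding literal triples into properties is one way a triple count can be much larger than the resulting edge count.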
Graph OLTP

gremlin> g = TitanFactory.open('bin/cassandra.local')
==>titangraph[cassandrathrift:10.176.213.110]

gremlin> g.V('name','Random_walker_algorithm').both.name
==>Random_walk
==>Segmentation_(image_processing)
==>Graph_(mathematics)
==>Laplacian_matrix
==>Graph
==>Laplacian_matrix
==>Electrical_network
==>Resistor
==>Electrical_resistance_and_conductance
==>Ground_(electricity)
==>Direct_current
==>Voltage_source
==>Precomputation
==>Category:Computer_vision
==>Random_Walker_(Computer_Vision)
==>List_of_algorithms
==>Segmentation_(image_processing)
==>Watershed_(image_processing)
==>Random_walker_(computer_vision)
==>Random_Walker_(computer_vision)
gremlin> g.V('name','Learning').out.out.out.out[0..10].name
==>Latium
==>Roman_Kingdom
==>Roman_Republic
==>Roman_Empire
==>Middle_Ages
==>Early_modern_Europe
==>Armenian_Kingdom_of_Cilicia
==>Lingua_franca
==>Vatican_City
==>Vulgar_Latin
==>Romance_languages
Aurelius Graph Cluster
  aureliusgraphs@googlegroups.com
The Graph Landscape
  [Figure: graph systems plotted on two axes, Speed of Traversal/Process and Size of Graph. Illustration only, not to scale]
TINKERPOP.COM
Thanks!
  Vadas Gintautas (@vadasg)
  Marko Rodriguez (@twarko)
  Stephen Mallette (@spmallette)
  Daniel LaRocque
We are Hiring



   AURELIUS
  THINKAURELIUS.COM

Weitere ähnliche Inhalte

Andere mochten auch

Recuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateRecuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateAndrea Lazzarotto
 
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Martin Junghanns
 
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiRicostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiAndrea Lazzarotto
 
TinkerPop and Titan from a Python State of Mind
TinkerPop and Titan from a  Python State of MindTinkerPop and Titan from a  Python State of Mind
TinkerPop and Titan from a Python State of MindDenise Gosnell, Ph.D.
 
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...DataStax Academy
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsJoshua Shinavier
 
Come si creano le app Android
Come si creano le app AndroidCome si creano le app Android
Come si creano le app AndroidAndrea Lazzarotto
 
Building Knowledge Graphs in DIG
Building Knowledge Graphs in DIGBuilding Knowledge Graphs in DIG
Building Knowledge Graphs in DIGPalak Modi
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Stephen Mallette
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax
 
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinCaleb Jones
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 

Andere mochten auch (15)

Recuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateRecuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiate
 
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
 
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiRicostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
 
TinkerPop and Titan from a Python State of Mind
TinkerPop and Titan from a  Python State of MindTinkerPop and Titan from a  Python State of Mind
TinkerPop and Titan from a Python State of Mind
 
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBs
 
Come si creano le app Android
Come si creano le app AndroidCome si creano le app Android
Come si creano le app Android
 
Building Knowledge Graphs in DIG
Building Knowledge Graphs in DIGBuilding Knowledge Graphs in DIG
Building Knowledge Graphs in DIG
 
PSL Overview
PSL OverviewPSL Overview
PSL Overview
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
 
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Mehr von Matthias Broecheler

Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Matthias Broecheler
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Matthias Broecheler
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraMatthias Broecheler
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksMatthias Broecheler
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksMatthias Broecheler
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Matthias Broecheler
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksMatthias Broecheler
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksMatthias Broecheler
 

Mehr von Matthias Broecheler (10)

Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3
 
Titan NYC Meetup March 2014
Titan NYC Meetup March 2014Titan NYC Meetup March 2014
Titan NYC Meetup March 2014
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with Cassandra
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large Networks
 
Probabilistic Soft Logic
Probabilistic Soft LogicProbabilistic Soft Logic
Probabilistic Soft Logic
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
 

Adding Value through graph analysis using Titan and Faunus

  • 1. KNOWLEDGE INFORMATION DATA Adding Value Through Graph Analysis Matthias Broecheler, CTO @mbroecheler AURELIUS March V, MMXIII THINKAURELIUS.COM
  • 2. Communities of Interest, Finding Influencers, Understanding Behavior
  • 3. Information Integration, Recommendation, Question Answering
  • 4. Fraud Detection, Risk Analysis, Market Valuation
  • 5. Knowledge Value Information Data
  • 6. Knowledge: likes(Jane Joe, cute mammals):0.8. Information: userid:3552 clicked addid:9914, timestamp: 93932342. Data: 2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
  • 7. Graph Databases & Graph Analysis, shown next to the same three layers. Knowledge: likes(Jane Joe, cute mammals):0.8. Information: userid:3552 clicked addid:9914, timestamp: 93932342. Data: 2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
  • 11. I. Graph Foundation
  • 12. Graph (callouts: Vertex, Property). Vertices: Neptune (type: god), Alcmene (type: god), Saturn (type: titan), Jupiter (type: god), Hercules (type: demigod), Pluto (type: god), Cerberus (type: monster).
  • 13. Graph (callouts: Edge, Edge Property, Edge Type). Same vertices as slide 12, connected by edges labeled brother, mother, father, father, battled (time:12), brother, pet.
  • 14. Path (highlighted on the same graph as slide 13).
  • 15. Degree (highlighted on the same graph as slide 13).
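The vertex, edge, path, and degree notions on slides 12 to 15 can be sketched in a few lines of Python. This is only an illustrative model, not Titan's data structures: `Edge`, `degree`, and `find_path` are hypothetical names, and the edge endpoints and directions are an assumption, because the flattened slide text lists the edge labels but not what they connect.

```python
from collections import deque, namedtuple

# Vertex properties as listed on slide 12.
vertices = {
    "Neptune": {"type": "god"},
    "Alcmene": {"type": "god"},
    "Saturn": {"type": "titan"},
    "Jupiter": {"type": "god"},
    "Hercules": {"type": "demigod"},
    "Pluto": {"type": "god"},
    "Cerberus": {"type": "monster"},
}

Edge = namedtuple("Edge", "src label dst props")

# Edge labels come from slide 13; endpoints and directions are assumed.
edges = [
    Edge("Jupiter", "brother", "Neptune", {}),
    Edge("Jupiter", "brother", "Pluto", {}),
    Edge("Jupiter", "father", "Saturn", {}),
    Edge("Hercules", "father", "Jupiter", {}),
    Edge("Hercules", "mother", "Alcmene", {}),
    Edge("Hercules", "battled", "Cerberus", {"time": 12}),
    Edge("Pluto", "pet", "Cerberus", {}),
]


def degree(name):
    """Number of edges touching a vertex, counting both directions."""
    return sum(1 for e in edges if e.src == name or e.dst == name)


def find_path(start, goal):
    """Breadth-first search along outgoing edges; returns vertex names or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for e in edges:
            if e.src == path[-1] and e.dst not in seen:
                seen.add(e.dst)
                queue.append(path + [e.dst])
    return None


print(degree("Jupiter"))
print(find_path("Hercules", "Saturn"))
```

A path is simply an alternating sequence of vertices and edges, and the degree of a vertex is the number of edges incident to it; both fall out of the edge list directly.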
  • 16. Aurelius Graph Cluster (Apache 2). TITAN: stores a massive-scale property graph allowing real-time traversals and updates. FAUNUS: batch processing of large graphs with Hadoop. FULGORA: runs global graph algorithms on large, compressed, in-memory graphs. Data flows: Map/Reduce bulk load; load analysis results back into Titan.
  • 17. II. Titan Graph Database
  • 18. Titan Features: numerous concurrent users; many short transactions (read/write); real-time traversals (OLTP); high availability; dynamic scalability; variable consistency model (ACID or eventual consistency); real-time big graph data
  • 19. Storage Backends: Partitionability, Consistency, Availability
  • 20. $ ./titan-0.2.0/bin/gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> g = TitanFactory.open('/tmp/titan')
        ==>titangraph[local:/tmp/titan]
        gremlin> v = g.V('name','Hercules')
        ==>v[4]
        gremlin> v.out('father').out('brother').name
  • 21. The traversal shown on the same graph as slide 13: gremlin> v.out('father').out('brother').name
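To make the two-step traversal on slides 20 and 21 concrete, here is a minimal Python sketch of what `v.out('father').out('brother').name` computes. It is not Gremlin and not Titan's API: the `adjacency` map and the `out` helper are hypothetical, and the edge directions are assumed from the edge labels on the slides.

```python
from collections import defaultdict

# adjacency[vertex][label] -> list of target vertices (directions assumed).
adjacency = defaultdict(lambda: defaultdict(list))
for src, label, dst in [
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Jupiter", "father", "Saturn"),
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Cerberus"),
    ("Pluto", "pet", "Cerberus"),
]:
    adjacency[src][label].append(dst)


def out(frontier, label):
    """One traversal step: follow outgoing edges carrying the given label."""
    return [dst for v in frontier for dst in adjacency[v][label]]


# Start at Hercules, step to his father, then to the father's brothers.
names = out(out(["Hercules"], "father"), "brother")
print(names)
```

Each step turns the current set of vertices into the set reachable over one labeled edge, so under the assumed directions the traversal yields the names of Hercules' uncles.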
  • 22. Vertex-Centric Indices: sort and index edges per vertex by primary key; primary key can be composite; enables efficient focused traversals; only retrieve edges that matter; uses push-down predicates for quick, index-driven retrieval
  • 23. Edges incident to vertex v: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father, fought, fought. Query so far: v.query()
  • 24. Remaining edges: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father. Query so far: v.query().direction(OUT)
  • 25. Remaining edges: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9). Query so far: v.query().direction(OUT).labels('battled')
  • 26. Remaining edges: battled (time: 1), battled (time: 3). Query: v.query().direction(OUT).labels('battled').has('time',T.lt,5)
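The narrowing shown on slides 23 to 26 can be sketched as a per-vertex index: edges are bucketed by direction and label, and each bucket is kept sorted by the `time` key so that a range predicate becomes a binary search rather than a scan. This is an illustrative Python model only; `build_index`, `query`, and the neighbor names `m1` to `m4`, `a`, `b`, `x`, `y` are hypothetical, and Titan's actual storage layout is not described in the slides.

```python
from bisect import bisect_left
from collections import defaultdict

# (direction, label, time, other_vertex) for one vertex v, mirroring slide 23.
# The 'fought' edges are assumed to be incoming, since direction(OUT) drops them.
edges_of_v = [
    ("OUT", "battled", 5, "m3"),
    ("OUT", "battled", 1, "m1"),
    ("OUT", "battled", 9, "m4"),
    ("OUT", "battled", 3, "m2"),
    ("OUT", "mother", None, "a"),
    ("OUT", "father", None, "b"),
    ("IN", "fought", None, "x"),
    ("IN", "fought", None, "y"),
]


def build_index(edges):
    """Group edges by (direction, label); sort timed buckets by time."""
    index = defaultdict(list)
    for direction, label, time, other in edges:
        index[(direction, label)].append((time, other))
    for bucket in index.values():
        if all(t is not None for t, _ in bucket):
            bucket.sort()
    return index


def query(index, direction, label, time_lt):
    """Edges with the given direction and label whose time is below time_lt.

    Assumes every edge in the selected bucket carries a time value.
    """
    bucket = index.get((direction, label), [])
    times = [t for t, _ in bucket]
    return bucket[: bisect_left(times, time_lt)]


index = build_index(edges_of_v)
print(query(index, "OUT", "battled", 5))
```

Only the `battled` bucket is touched, and within it only the prefix with `time < 5`, which is the sense in which the query retrieves "only edges that matter".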
  • 27. Titan Features I.  Data Management II.  Vertex-Centric Indices
  • 28. Titan Features III.  Graph Partitioning IV.  Edge Compression
  • 29. III. TITAN 0.3.0 [-SNAPSHOT]
  • 30. Titan Embedding: Rexster RexPro (lightweight Gremlin Server, binary protocol); Titan Gremlin Engine; Embedded Storage Backend (in-JVM method calls); Native clients (Java, Python, Clojure)
  • 31. Graph Indexing: vertex and edge indexing; pluggable index provider (ElasticSearch, Lucene); full-text search; numeric range search; geographic search
  • 32. The example graph with additional properties. Vertices: Neptune (age: 5200, title: God of the earth and ocean), Alcmene (age: 3300), Jupiter (age: 4800, title: God of the heaven and skies), Saturn (age: 5900), Hercules (title: Divine hero), Pluto (age: 4900, title: God of the underworld), Cerberus (title: Ugly beast of the underworld). Edges: brother, mother, father, father, battled (time:12, location: (38.071,23.745)), brother, pet.
  • 33. Numeric range search on the graph of slide 32: g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
  • 34. Full-text search on the graph of slide 32: g.query().has('title',Txt.CONTAINS,'god').vertices()
  • 35. Combined search on the graph of slide 32: g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
  • 36. Geographic search on the graph of slide 32: g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
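The predicates behind the four index queries on slides 33 to 36 can be illustrated with plain Python filters. This sketch shows only what each predicate selects; it scans every element, whereas an index provider such as ElasticSearch or Lucene would answer from an index. The property-to-vertex assignment is reconstructed from the flattened slide text, the helper names are hypothetical, and treating the circle radius as kilometers is an assumption.

```python
import math

# Property assignment reconstructed from slide 32 (an assumption).
vertices = {
    "Neptune": {"age": 5200, "title": "God of the earth and ocean"},
    "Alcmene": {"age": 3300},
    "Jupiter": {"age": 4800, "title": "God of the heaven and skies"},
    "Saturn": {"age": 5900},
    "Hercules": {"title": "Divine hero"},
    "Pluto": {"age": 4900, "title": "God of the underworld"},
    "Cerberus": {"title": "Ugly beast of the underworld"},
}
battled = {"label": "battled", "time": 12, "location": (38.071, 23.745)}


def age_greater_than(props, limit):
    """Numeric range predicate; vertices without an age never match."""
    return "age" in props and props["age"] > limit


def title_contains(props, word):
    """Case-insensitive whole-word match, a simple stand-in for text search."""
    return word.lower() in props.get("title", "").lower().split()


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


old = sorted(n for n, p in vertices.items() if age_greater_than(p, 5000))
gods = sorted(n for n, p in vertices.items() if title_contains(p, "god"))
both = sorted(set(old) & set(gods))
# Circle centered at (38, 23) with radius 100, assumed to be kilometers.
near = haversine_km(battled["location"], (38.0, 23.0)) <= 100

print(old, gods, both, near)
```

Chaining two `has` conditions corresponds to intersecting the two result sets, which is why the combined query is narrower than either query alone.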
  • 37. IV. Faunus Graph Analytics
  • 38. Faunus Features: Hadoop-based graph computing framework; graph analytics; breadth-first traversals; global graph computations; batch big graph data
  • 40. Faunus Work Flow: g.V.out.out.count(). Output under hdfs://user/ubuntu/: output/job-0/, output/job-1/, output/job-2/ {graph*, sideeffect*}. Compressed HDFS Graphs: stored in sequence files; variable length encoding; prefix compression
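A breadth-first evaluation of `g.V.out.out.count()` can be sketched as rounds of count propagation: every vertex starts with one traverser, and each `out` step pushes the current counts along outgoing edges. This is a single-process Python illustration of the idea, not Faunus's implementation; how Faunus maps steps onto MapReduce jobs is not spelled out in the slides, and the example edges (with assumed directions) are borrowed from the earlier graph.

```python
from collections import Counter

# (source, target) pairs; directions are assumed, as in the earlier slides.
edges = [
    ("Jupiter", "Neptune"),
    ("Jupiter", "Pluto"),
    ("Jupiter", "Saturn"),
    ("Hercules", "Jupiter"),
    ("Hercules", "Alcmene"),
    ("Hercules", "Cerberus"),
    ("Pluto", "Cerberus"),
]
all_vertices = {v for edge in edges for v in edge}


def out_step(counts):
    """One round: move each vertex's traverser count to its out-neighbors."""
    nxt = Counter()
    for src, dst in edges:
        if counts[src]:
            nxt[dst] += counts[src]
    return nxt


start = Counter({v: 1 for v in all_vertices})  # g.V: one traverser per vertex
after_two = out_step(out_step(start))          # .out.out
total = sum(after_two.values())                # .count()
print(total)
```

Because each round only needs the counts from the previous round plus the edge list, the computation is naturally expressed as a sequence of batch passes over the graph.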
  • 41. Aurelius Graph Cluster (Apache 2). TITAN: stores a massive-scale property graph allowing real-time traversals and updates. FAUNUS: batch processing of large graphs with Hadoop. FULGORA: runs global graph algorithms on large, compressed, in-memory graphs. Data flows: Map/Reduce bulk load; load analysis results back into Titan.
  • 42. What's New: Faunus 0.1 released; bulk import / export for Titan (loaded graph into Titan, loading derivations into Titan); RDF support; many optimizations (vertex compression)
  • 43. Faunus Setup
        $ bin/gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
        ==>faunusgraph[titanhbaseinputformat]
        gremlin> g.getProperties()
        ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
        ==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
        ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
        ==>faunus.output.location=dbpedia
        ==>faunus.output.location.overwrite=true
        gremlin> g._()
        12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
        12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
        12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
  • 44. Build a Knowledge Graph: based on DBPedia (graph version of Wikipedia; ~290 million edges, ~1B triples). Steps: 1. Bulk load RDF into Faunus (6 m1.xlarge); 2. Convert to property graph; 3. Bulk load into Titan (3 m1.xlarge with Cassandra); 4. OLTP+OLAP. Total time: ~2 hours
  • 45. Graph OLTP
        gremlin> g = TitanFactory.open('bin/cassandra.local')
        ==>titangraph[cassandrathrift:10.176.213.110]
        gremlin> g.V('name','Random_walker_algorithm').both.name
        ==>Random_walk
        ==>Segmentation_(image_processing)
        ==>Graph_(mathematics)
        ==>Laplacian_matrix
        ==>Graph
        ==>Laplacian_matrix
        ==>Electrical_network
        ==>Resistor
        ==>Electrical_resistance_and_conductance
        ==>Ground_(electricity)
        ==>Direct_current
        ==>Voltage_source
        ==>Precomputation
        ==>Category:Computer_vision
        ==>Random_Walker_(Computer_Vision)
        ==>List_of_algorithms
        ==>Segmentation_(image_processing)
        ==>Watershed_(image_processing)
        ==>Random_walker_(computer_vision)
        ==>Random_Walker_(computer_vision)
  • 47. Aurelius Graph Cluster (Apache 2). TITAN: stores a massive-scale property graph allowing real-time traversals and updates. FAUNUS: batch processing of large graphs with Hadoop. FULGORA: runs global graph algorithms on large, compressed, in-memory graphs. Data flows: Map/Reduce bulk load; load analysis results back into Titan. Mailing list: aureliusgraphs@googlegroups.com
  • 48. The Graph Landscape (illustration only, not to scale). Axes: Speed of Traversal/Process against Size of Graph.
  • 50. Thanks! Vadas Gintautas (@vadasg), Marko Rodriguez (@twarko), Stephen Mallette (@spmallette), Daniel LaRocque
  • 51. We are Hiring AURELIUS THINKAURELIUS.COM