Efficient Multi-View Maintenance in the Social Semantic Web
1. Views on Social Networks
  • Query = Subgraph matching query
  • View = Query for which the answer set is maintained as the social network database is updated
  [Figure: example view query graph. Nodes: ?person, ?doc, ?author, ?article, ?msg, ?expert, Health Care. Edge labels: publish, associated, comments, topic, references, tweet, expert.]

2. Multiple Views
  • On large social networks, multiple views are often maintained concurrently.
  • Maintaining multiple views is very expensive, in particular for rapidly changing databases.
  • e.g. Twitter has over 340 million tweets / day
  [Figure: several view query graphs with overlapping structure. Nodes: Health Care, Business Analytics, ?article1, ?article2, ?msg1, ?msg2, ?expert, ?person, ?other. Edge labels: topic, references, expert, publish, tweet, follows, associated.]

3. Merging Views
  IDEA: Very often, view queries have overlapping subgraph structures (bold arcs). If we can overlay the different view queries such that these shared substructures can be matched jointly rather than independently, we can save a lot of time.
  Social network updates are edge insertions (removals), hence graph merging has to center around the inserted edge type.
  Developed subgraph matching algorithm that can process merged view queries efficiently.
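The definition of a view as a query whose answer set is maintained under edge insertions can be illustrated with a small sketch. This is not the poster's algorithm: it is a naive, single-view baseline, assuming edges are (source, label, target) string triples, variables start with "?", and matching is by homomorphism (two variables may bind to the same node). The names `View`, `unify`, and `extend`, and the example query and data, are hypothetical.

```python
def is_var(term):
    """Variables are written with a leading '?', as in the poster's figures."""
    return term.startswith("?")


def unify(pattern, edge, binding):
    """Extend `binding` so that `pattern` matches `edge`; return None on conflict."""
    if pattern[1] != edge[1]:  # edge labels must be identical
        return None
    new = dict(binding)
    for p, v in ((pattern[0], edge[0]), (pattern[2], edge[2])):
        if is_var(p):
            if new.setdefault(p, v) != v:  # variable already bound to another node
                return None
        elif p != v:  # constant in the query must equal the node
            return None
    return new


def extend(patterns, edges, binding):
    """Backtracking search: match the remaining query edges against the graph."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for edge in edges:
        b = unify(first, edge, binding)
        if b is not None:
            yield from extend(rest, edges, b)


class View:
    """A query whose answer set is kept up to date as edges are inserted."""

    def __init__(self, query):
        self.query = list(query)
        self.edges = set()
        self.answers = set()

    def insert(self, edge):
        if edge in self.edges:
            return
        self.edges.add(edge)
        # Only answers that use the new edge can be new, so seed the search
        # from every query edge that the inserted edge can be mapped to.
        for i, pattern in enumerate(self.query):
            seed = unify(pattern, edge, {})
            if seed is None:
                continue
            rest = self.query[:i] + self.query[i + 1:]
            for b in extend(rest, self.edges, seed):
                self.answers.add(frozenset(b.items()))


# Hypothetical query: authors who published an article on the topic Health Care.
query = [("?author", "publish", "?article"), ("?article", "topic", "Health Care")]
view = View(query)
view.insert(("alice", "publish", "a1"))
after_first = len(view.answers)
view.insert(("a1", "topic", "Health Care"))
view.insert(("bob", "publish", "a1"))
```

Maintaining several such views independently repeats the search once per view on every insertion, which is the cost that merging overlapping view queries aims to reduce.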




4. Example Merge
  [Figure: merged query graph. Nodes: ?a1, ?a2, ?v3 to ?v16, Health Care, Business Analytics. Edge labels: publish, expert, follows, associated, topic, tweet, references, comments.]
  LEGEND: Edges mapped by 1, 2 and 3; edges mapped only by 1; edges mapped only by 2; edges mapped only by 3.

5. Merge Optimality
  Many possible ways to merge query graphs. We want high connected overlap which results in most savings at update time.
  Define merged view score as the sum of edge overlaps.
  Finding the optimal view wrt the merged view score is NP-hard.
  Our greedy view merging algorithm finds near-optimal views in practice.

6. Optimal Merge
  [Figure: optimally merged query graph. Nodes: ?a1, ?a2, ?v3 to ?v9, ?v11 to ?v13, Health Care, Business Analytics. Edge labels: publish, associated, topic, comments, references, expert, tweet, follows.]
  LEGEND: Edges mapped by 1, 2 and 3; edges mapped only by 1; edges mapped only by 2; edges mapped only by 3.
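The merged view score under Merge Optimality is described only as "the sum of edge overlaps". The sketch below shows one possible reading, under the assumption (not stated on the poster) that the overlap of a merged edge is the number of view queries mapped onto it minus one, and that a merge is given as one variable renaming per query. The function name `merged_view_score` and the example queries are hypothetical, and the connectivity requirement on the overlap is not modelled.

```python
from collections import defaultdict


def merged_view_score(queries, renamings):
    """Score a merge of several view queries.

    queries:   list of queries, each a list of (source, label, target) triples
    renamings: one dict per query, mapping its variables to merged variable names
    Assumption: overlap of a merged edge = (number of queries mapped to it) - 1.
    """
    mapped_by = defaultdict(set)
    for qid, (query, renaming) in enumerate(zip(queries, renamings)):
        for src, label, dst in query:
            # Constants are not in the renaming and keep their own name.
            merged_edge = (renaming.get(src, src), label, renaming.get(dst, dst))
            mapped_by[merged_edge].add(qid)
    return sum(len(qids) - 1 for qids in mapped_by.values())


# Two hypothetical view queries that share a "publish" edge.
q1 = [("?author", "publish", "?article"), ("?article", "topic", "Health Care")]
q2 = [("?expert", "publish", "?msg"), ("?msg", "topic", "Business Analytics")]

# Merge A overlays the two publish edges; merge B keeps the queries disjoint.
merge_a = [{"?author": "?v1", "?article": "?v2"}, {"?expert": "?v1", "?msg": "?v2"}]
merge_b = [{"?author": "?v1", "?article": "?v2"}, {"?expert": "?v3", "?msg": "?v4"}]

score_a = merged_view_score([q1, q2], merge_a)
score_b = merged_view_score([q1, q2], merge_b)
```

Under this reading, merge A scores higher than merge B because the shared publish edge would be matched once for both views. A greedy merger would search for renamings that increase this score, since finding the optimum is stated to be NP-hard.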




7. Experiments
  Compared our merged multi-view maintenance algorithm against standard independent view maintenance.
  • 6 real world social network datasets with up to 540 million edges.
  • Randomly generated 12,000 queries with varying degree of overlap and averaged results over 750 trials.
  • All algorithms implemented in Java on top of the COSI graph database middleware.

8. Multi-View Maintenance Performance
  [Figure: chart with left axis "Improvement" (0% to 900%) and right axis "Outperforming" (70.0% to 100.0%) for the datasets Physics, Enron, YouTube, Flickr, LiveJournal, and Orkut.]
  Performance improvement of the Multi-View Maintenance algorithm on 6 different social networks

9. Applications
  Maintaining multiple views jointly as a merged view leads to significant improvements.
  477% faster than standard view maintenance
  Applications include:
  • Monitoring social networks
  • Fraud, security applications, alerts
  • Business Analytics
  • Knowledge Discovery
  • Caching frequently asked queries

Matthias Broecheler, Andrea Pugliese, and VS Subrahmanian

Efficient Multi-View Maintenance in the Social Semantic Web

1 Views on Social Networks
  •  Query = Subgraph matching query
  •  View = Query for which the answer set is maintained as the social network database is updated
  [Figure: example view query graphs over variables such as ?person, ?doc, ?author, ?article, ?msg and ?expert, the topics Health Care and Business Analytics, and the edge labels publish, topic, references, tweet, follows, expert, comments and associated]

2 Multiple Views
  •  On large social networks, multiple views are often maintained concurrently.
  •  Maintaining multiple views is very expensive, in particular for rapidly changing databases.
  •  e.g. Twitter has over 340 million tweets / day

3 Merging Views
  •  IDEA: Very often, view queries have overlapping subgraph structures (bold arcs). If we can overlay the different view queries such that these shared substructures can be matched jointly rather than independently, we can save a lot of time.
  •  Social network updates are edge insertions (removals), hence graph merging has to center around the inserted edge type.
  •  Developed subgraph matching algorithm that can process merged view queries efficiently.

4 Example Merge
  [Figure: three view queries overlaid into one merged query graph. Legend: edges mapped by 1, 2 and 3; edges mapped only by 1; edges mapped only by 2; edges mapped only by 3]

5 Merge Optimality
  •  Many possible ways to merge query graphs. We want high connected overlap, which results in the most savings at update time.
  •  Define merged view score as the sum of edge overlaps.
  •  Finding the optimal view with respect to the merged view score is NP-hard.
  •  Our greedy view merging algorithm finds near-optimal views in practice.

6 Optimal Merge
  [Figure: optimal merged query graph for the same three view queries, with the same legend]

7 Experiments
  •  Compared our merged multi-view maintenance algorithm against standard independent view maintenance.
  •  6 real world social network datasets with up to 540 million edges.
  •  Randomly generated 12,000 queries with varying degree of overlap and averaged results over 750 trials.
  •  All algorithms implemented in Java on top of the COSI graph database middleware.

8 Multi-View Maintenance Performance
  [Figure: performance improvement of the Multi-View Maintenance algorithm on 6 different social networks. Datasets: Flickr, Orkut, Enron, YouTube, LiveJournal, Physics. Axes: Improvement (0% to 900%) and Outperforming (70% to 100%)]

9 Applications
  •  Maintaining multiple views jointly as a merged view leads to significant improvements.
  •  477% faster than standard view maintenance
  •  Applications include:
       Monitoring social networks
       Fraud, security applications, alerts
       Business Analytics
       Knowledge Discovery
       Caching frequently asked queries

Matthias Broecheler, Andrea Pugliese, and VS Subrahmanian
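
The definition of a view given above (a subgraph matching query whose answer set is kept up to date as edges are inserted) can be illustrated with a small sketch. This is not the authors' Java/COSI implementation and it does not perform any view merging; it only shows how a single view might be maintained by anchoring the matching at the inserted edge. The names `View`, `unify`, `extend` and `on_insert` are hypothetical, edges are assumed to be `(source, label, target)` triples, terms starting with `?` are treated as variables, matching is assumed to be homomorphic (two variables may bind to the same node), and only insertions of previously absent edges are handled.

```python
from typing import Dict, FrozenSet, Iterator, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]   # (source, label, target); assumed representation
Binding = Dict[str, str]      # variable name -> node


def is_var(term: str) -> bool:
    return term.startswith("?")


def unify(pattern: Edge, edge: Edge, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` maps onto `edge`, or return None."""
    if pattern[1] != edge[1]:             # edge labels must agree
        return None
    new = dict(binding)
    for term, value in ((pattern[0], edge[0]), (pattern[2], edge[2])):
        if is_var(term):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:               # constants must match exactly
            return None
    return new


def extend(patterns: List[Edge], graph: Set[Edge], binding: Binding) -> Iterator[Binding]:
    """Yield every binding that satisfies all remaining patterns (backtracking)."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for edge in graph:
        new = unify(first, edge, binding)
        if new is not None:
            yield from extend(rest, graph, new)


class View:
    """A query whose answer set is maintained as edges are inserted."""

    def __init__(self, query: List[Edge]):
        self.query = query
        self.answers: Set[FrozenSet[Tuple[str, str]]] = set()

    def on_insert(self, graph: Set[Edge], edge: Edge) -> None:
        """Call after `edge` was added to `graph`; adds only the new matches."""
        for i, pattern in enumerate(self.query):
            seed = unify(pattern, edge, {})   # anchor this pattern at the new edge
            if seed is None:
                continue
            rest = self.query[:i] + self.query[i + 1:]
            for match in extend(rest, graph, seed):
                self.answers.add(frozenset(match.items()))


# Usage: authors who published an article on the topic "Health Care".
graph: Set[Edge] = set()
view = View([("?author", "publish", "?article"),
             ("?article", "topic", "Health Care")])
for e in [("alice", "publish", "a1"),
          ("a1", "topic", "Health Care"),
          ("bob", "publish", "a2")]:
    graph.add(e)
    view.on_insert(graph, e)
```

Every new match must use the inserted edge for at least one query edge, so seeding each compatible query edge with the new edge and extending the rest over the whole graph finds all new answers without re-evaluating the query from scratch. Maintaining several such views independently repeats this work for every shared substructure, which is the cost that merging the views is meant to avoid.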