This poster describes an efficient approach to maintaining multiple views on large, evolving social networks.
Abstract:
The Social Semantic Web (SSW) refers to the mix of RDF data in web content, and social network data associated with those who posted that content. Applications to monitor the SSW are becoming increasingly popular. For instance, marketers want to look for semantic patterns relating to the content of tweets and Facebook posts relating to their products. Such applications allow multiple users to specify patterns of interest, and monitor them in real-time as new data gets added to the web or to a social network. In this paper, we develop the concept of SSW view servers in which all of these types of applications can be simultaneously monitored from such servers. The patterns of interest are views. We show that a given set of views can be compiled in multiple possible ways to take advantage of common substructures, and define the concept of an optimal merge. We develop a very fast MultiView algorithm that scalably and efficiently maintains multiple subgraph views. We show that our algorithm is correct, study its complexity, and experimentally demonstrate that our algorithm can scalably handle updates to hundreds of views on real-world SSW databases with up to 540M edges.
Efficient Multi-View Maintenance in the Social Semantic Web
1 Views on Social Networks
• Query = Subgraph matching query
• View = Query for which the answer set is maintained as the social network database is updated
[Figure: example view queries. Variables: ?person, ?doc, ?author, ?article, ?article1, ?article2, ?msg, ?msg1, ?msg2, ?expert, ?other. Edge labels: publish, topic, comments, references, tweet, follows, associated, expert. Constants: Health Care, Business Analytics.]

2 Multiple Views
• On large social networks, multiple views are often maintained concurrently.
• Maintaining multiple views is very expensive, in particular for rapidly changing databases.
• e.g. Twitter has over 340 million tweets / day

3 Merging Views
• IDEA: Very often, view queries have overlapping subgraph structures (bold arcs). If we can overlay the different view queries such that these shared substructures can be matched jointly rather than independently, we can save a lot of time.
• Social network updates are edge insertions (removals), hence graph merging has to center around the inserted edge type.
• Developed subgraph matching algorithm that can process merged view queries efficiently.
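The definitions in panels 1 and 3 can be illustrated with a small sketch. It is not the poster's algorithm: it maintains a single view by naive backtracking, and it only shows the idea of seeding the match at the inserted edge instead of re-running the whole query. The `View` class, the `unify` and `extend` helpers, the triple representation of edges, and the sample data are all assumptions made for this illustration. Matching uses homomorphism semantics (two variables may bind to the same node), which is also an assumption.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]      # (source, label, target); hypothetical encoding
Binding = Dict[str, str]         # variable name -> node


def is_var(term: str) -> bool:
    return term.startswith("?")


def unify(pattern: Edge, edge: Edge, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `edge`, or return None."""
    if pattern[1] != edge[1]:                      # edge labels must agree
        return None
    new = dict(binding)
    for term, node in ((pattern[0], edge[0]), (pattern[2], edge[2])):
        if is_var(term):
            if new.setdefault(term, node) != node:  # conflicting binding
                return None
        elif term != node:                          # constant mismatch
            return None
    return new


def extend(query: List[Edge], remaining: List[int],
           binding: Binding, graph: Set[Edge]) -> Iterator[Binding]:
    """Backtracking match of the remaining query edges against the graph."""
    if not remaining:
        yield binding
        return
    first, rest = remaining[0], remaining[1:]
    for edge in graph:
        b = unify(query[first], edge, binding)
        if b is not None:
            yield from extend(query, rest, b, graph)


class View:
    """A query whose answer set is kept up to date under edge insertions."""

    def __init__(self, query: List[Edge]) -> None:
        self.query = query
        self.answers: Set[frozenset] = set()

    def on_insert(self, edge: Edge, graph: Set[Edge]) -> None:
        """Add the answers created by `edge`; `graph` already contains it."""
        for i, pattern in enumerate(self.query):
            seed = unify(pattern, edge, {})        # anchor at the inserted edge
            if seed is None:
                continue
            others = [j for j in range(len(self.query)) if j != i]
            for b in extend(self.query, others, seed, graph):
                self.answers.add(frozenset(b.items()))


# Hypothetical view: people who tweet a message about Health Care.
graph: Set[Edge] = set()
view = View([("?person", "tweet", "?msg"), ("?msg", "topic", "Health Care")])
for e in [("alice", "tweet", "m1"),
          ("m1", "topic", "Health Care"),
          ("bob", "tweet", "m2")]:
    graph.add(e)
    view.on_insert(e, graph)
```

Only query edges whose label equals the label of the inserted edge can seed a match, which is one way to read the statement that merging has to center around the inserted edge type.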
4 Example Merge
[Figure: three view queries overlaid into one merged query graph over variables ?v3 to ?v16, ?a1, ?a2. Legend: edges mapped by 1, 2 and 3; edges mapped only by 1; edges mapped only by 2; edges mapped only by 3.]

5 Merge Optimality
• Many possible ways to merge query graphs. We want high connected overlap which results in most savings at update time.
• Define merged view score as the sum of edge overlaps.
• Finding the optimal view wrt the merged view score is NP-hard.
• Our greedy view merging algorithm finds near optimal views in practice.

6 Optimal Merge
[Figure: optimal merge of the same three view queries. Legend: edges mapped by 1, 2 and 3; edges mapped only by 1; edges mapped only by 2; edges mapped only by 3.]
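To make the merged view score concrete, the sketch below scores two candidate overlays of three small view queries. The poster only says that the score is the sum of edge overlaps, so the exact definition used here is an assumption: a merged edge shared by k view queries contributes k - 1, and the connectedness of the overlap is ignored. The `overlay` and `merged_view_score` functions, the variable renamings, and the three queries are hypothetical and are not the greedy merging algorithm itself.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]   # (source, label, target); hypothetical encoding


def overlay(views: Dict[int, List[Edge]],
            renamings: Dict[int, Dict[str, str]]) -> Dict[Edge, Set[int]]:
    """Rename each view's variables into the merged graph and record,
    for every merged edge, which views are mapped onto it."""
    owners: Dict[Edge, Set[int]] = {}
    for view_id, edges in views.items():
        rename = renamings[view_id]
        for src, label, dst in edges:
            merged_edge = (rename.get(src, src), label, rename.get(dst, dst))
            owners.setdefault(merged_edge, set()).add(view_id)
    return owners


def merged_view_score(owners: Dict[Edge, Set[int]]) -> int:
    """Assumed score: each merged edge contributes (number of views on it - 1)."""
    return sum(len(view_ids) - 1 for view_ids in owners.values())


views = {
    1: [("?person", "tweet", "?msg"), ("?msg", "topic", "Health Care")],
    2: [("?expert", "tweet", "?m"), ("?m", "references", "?article")],
    3: [("?author", "publish", "?article"), ("?article", "topic", "Health Care")],
}

# Overlay A: every view keeps its own variables, so nothing is shared.
disjoint = {
    1: {"?person": "?a1", "?msg": "?a2"},
    2: {"?expert": "?b1", "?m": "?b2", "?article": "?b3"},
    3: {"?author": "?c1", "?article": "?c2"},
}

# Overlay B: views 1 and 2 share the tweet edge, views 1 and 3 share the topic edge.
shared = {
    1: {"?person": "?v1", "?msg": "?v2"},
    2: {"?expert": "?v1", "?m": "?v2", "?article": "?v3"},
    3: {"?author": "?v4", "?article": "?v2"},
}

score_disjoint = merged_view_score(overlay(views, disjoint))   # 0
score_shared = merged_view_score(overlay(views, shared))       # 2
```

Under this assumed definition, overlay B is preferred because its two shared edges would be matched once at update time instead of once per view.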
7 Experiments
• Compared our merged multi-view maintenance algorithm against standard independent view maintenance.
• 6 real world social network datasets with up to 540 million edges.
• Randomly generated 12,000 queries with varying degree of overlap and averaged results over 750 trials.
• All algorithms implemented in Java on top of the COSI graph database middleware.

8 Multi-View Maintenance Performance
[Figure: Performance improvement of the Multi-View Maintenance algorithm on 6 different social networks. Axes: Improvement (0% to 900%) and Outperforming (70% to 100%). Datasets: Flickr, Orkut, Enron, YouTube, LiveJournal, Physics.]

9 Applications
• Maintaining multiple views jointly as a merged view leads to significant improvements.
• 477% faster than standard view maintenance
• Applications include:
  • Monitoring social networks
  • Fraud, security applications, alerts
  • Business Analytics
  • Knowledge Discovery
  • Caching frequently asked queries
Matthias Broecheler, Andrea Pugliese,
and VS Subrahmanian