- how to compute recommendations using a graph with 40m edges and 11m nodes in 0.2s (200ms)
- new perspective on near real-time social recommendations in enterprise social platforms using Linked Data
- recommender system that is easy to integrate with social networks and legacy data
- application of data analytics in enterprise context
2. ADVANSSE
Advances in social semantic enterprise
HTTP://ADVANSSE.DERI.IE/
MACIEJ DABROWSKI
BENJAMIN HEITMANN
CONOR HAYES
KEITH GRIFFIN
10TH JULY 2013
15. Problem 2: data level issues
DISTRIBUTION
MULTIPLE DOMAINS
AND TYPES OF ENTITIES
PEOPLE
INTERESTS
CONTENT
16. Requirements - personalization
USE BACKGROUND KNOWLEDGE
ALLOW CROSS-DOMAIN, MULTI-SOURCE PERSONALIZATION
EXPLOIT SOCIAL GRAPH
ALLOW REAL-TIME APPLICATIONS
17. Requirements - data
DATA LEVEL
• FLEXIBLE
• COMPACT
• ENABLE CRUD
• GRAPH?
TRANSPORT PROTOCOL:
• RELIABLE
• EFFICIENT
• PUBSUB?
18. What?
A PLATFORM BASED ON OPEN STANDARDS
THAT IS EASILY PLUGGABLE TO EXISTING
INFRASTRUCTURES AND THAT EXPLOITS
LEGACY INFORMATION, SOCIAL GRAPH
AND INTEREST GRAPH TO PROVIDE A
PERSONALIZED INFORMATION
“DASHBOARD” IN NEAR REAL-TIME.
22. Step 2: Exploit interest graphs
BENEFITS OF USING INTEREST GRAPHS:
1. FLEXIBLE SOURCE OF BACKGROUND KNOWLEDGE
2. ANY DATASET CAN BE “PLUGGED-IN” IF NEEDED
3. CROSS-DOMAIN RECOMMENDATIONS
4. VERY GOOD IN DISCOVERING INTERESTING
RECOMMENDATIONS
OUR APPROACH: SPREADING ACTIVATION
24. Our Approach
A PLATFORM FOR SOCIAL NETWORKS:
§ ENTERPRISE FOCUS: PEOPLE, COMMUNITIES, INFORMATION
§ EFFICIENCY USING XMPP PUBSUB AND SPARQL 1.1 UPDATE
§ EXPLOIT INTEREST GRAPH AND VARIOUS DATA SOURCES
TO PROVIDE PERSONALIZATION THROUGH SOPHISTICATED
NEAR REAL-TIME RECOMMENDATIONS
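As an illustration of the SPARQL 1.1 Update side of this approach, here is a minimal sketch of how a change in a social platform could be serialised as an `INSERT DATA` request. The function name `build_insert_data`, the named graph and the example IRIs are hypothetical; the slides do not show the actual payloads or how they are wrapped in XMPP PubSub messages. The sketch only handles triples whose subject, predicate and object are all IRIs.

```python
def build_insert_data(graph_iri, triples):
    """Build a SPARQL 1.1 Update INSERT DATA request for IRI-only triples.

    graph_iri and the triples are illustrative; literals and blank nodes
    are deliberately not handled in this sketch.
    """
    body = "\n".join(f"    <{s}> <{p}> <{o}> ." for s, p, o in triples)
    return f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"


# Hypothetical example: a new "knows" relation published by a social platform.
update = build_insert_data(
    "http://example.org/graph/social",
    [("http://example.org/alice",
      "http://xmlns.com/foaf/0.1/knows",
      "http://example.org/bob")],
)
print(update)
```

The resulting string could then be carried as the payload of a PubSub item and applied to a receiving RDF store, which is one plausible reading of the "XMPP PubSub and SPARQL 1.1 Update" combination named on the slide.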
25. Demonstrator
EASY TO INTEGRATE WITH CISCO INFRASTRUCTURE
OPEN STANDARDS (XMPP, SPARQL 1.1 UPDATE)
SCALABLE RECOMMENDATIONS BASED ON SOCIAL
GRAPH WITH OVER 10M ENTITIES AND 40M EDGES
COMPUTED BELOW 1 SECOND (0.2S ON AVERAGE).
MORE DETAILS: HTTP://ADVANSSE.DERI.IE/
30. Technical considerations
NON-NATIVE IMPORT OF RDF
STARTUP TIME WITH DBPEDIA
• 12 MIN ON 24 CORES, 96GB RAM TO LOAD
PARALLEL PROCESSING OF ACTIVATIONS
• STATE FOR EACH USER AT EACH NODE
SCALABILITY ISSUES
LACK OF GLOBAL ALGORITHM CONTROL
IMMATURE CODE BASE, LACK OF
DOCUMENTATION
32. Server design
[Architecture diagram. Labels recovered from the figure:]
KEY TECHNOLOGIES: XMPP, SPREADING ACTIVATION, HDT
MAIN BLOCKS:
• ADVANSSE connected social platform
• ADVANSSE server
COMPONENTS SHOWN:
• XMPP client: Ignite Smack
• XMPP server: Ignite OpenFire
• Web application: Tomcat + Servlet
• RDF store: Jena Fuseki
• R/W RDF store: Jena Fuseki
• Fast, R/O RDF store: HDT
• Personalisation component
• Recommendation algorithm
• Link resolver
CONNECTOR LABELS: XMPP, SPARQL, Java API, SPARQL + Java API, File import
33. configuration
• DISTANCE CONSTRAINT DISABLED
• FANOUT CONSTRAINT ENABLED
• 10 TARGET ACTIVATIONS
• ACTIVATION THRESHOLD 0.5
• INITIAL ACTIVATION 4.0
• MAXIMUM OUT EDGES 500
• MAXIMUM OF 10 WAVES AND 1 PHASE
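To make the configuration above concrete, here is a minimal sketch of one way spreading activation can be run with these parameters. It is not the ADVANSSE implementation: the function name, the adjacency-dict graph format, the `is_target` predicate and the rule of dividing a node's activation equally among its out-edges are all assumptions made for illustration. The distance constraint is simply absent, matching "disabled", and only a single phase is modelled.

```python
from collections import defaultdict


def spread_activation(graph, source, is_target,
                      initial_activation=4.0, threshold=0.5,
                      max_out_edges=500, max_waves=10, target_count=10):
    """Rank target nodes reachable from `source` by accumulated activation.

    graph: dict mapping node -> list of neighbour nodes (assumed format).
    is_target: predicate selecting the nodes that may be recommended.
    """
    activation = defaultdict(float)
    activation[source] = initial_activation
    frontier = {source: initial_activation}

    def activated_targets():
        return [n for n in activation
                if n != source and is_target(n) and activation[n] >= threshold]

    for _ in range(max_waves):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            # Fan-out constraint: hub nodes with too many out-edges do not fire.
            if not neighbours or len(neighbours) > max_out_edges:
                continue
            # Nodes that received less than the threshold do not fire either.
            if energy < threshold:
                continue
            share = energy / len(neighbours)  # assumed: equal split per edge
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = dict(next_frontier)
        # Stop early once enough target nodes have been activated.
        if len(activated_targets()) >= target_count:
            break

    ranked = sorted(activated_targets(), key=lambda n: (-activation[n], n))
    return [(n, activation[n]) for n in ranked[:target_count]]


# Tiny hypothetical interest graph: a user, two interests, two documents.
graph = {
    "alice": ["jazz", "python"],
    "jazz": ["doc1", "doc2"],
    "python": ["doc2"],
}
print(spread_activation(graph, "alice", lambda n: n.startswith("doc")))
```

In this toy graph `doc2` is reached through both interests and therefore accumulates more activation than `doc1`, which is the intuition behind using an interest graph as background knowledge.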
35. The value
SOCIAL CAPITAL IN ENTERPRISE
SOCIAL NETWORKS IS NOT FULLY
EXPLOITED.
ENTERPRISE SOCIAL PLATFORMS
ARE DISTRIBUTED AND INCLUDE
VARIOUS SOURCES OF
INFORMATION.
VALUABLE INFORMATION IN AN
ORGANIZATION IS NOT
DISCOVERED BY THE RELEVANT
EMPLOYEES.
DISCOVER AND CONNECT WITH
RELEVANT PEOPLE IN THE
ORGANIZATION.
AGGREGATE INFORMATION FROM
VARIOUS DISTRIBUTED SOCIAL
PLATFORMS USING OPEN
STANDARDS
PROVIDE NEAR REAL-TIME
PERSONALIZATION BASED ON
LARGE, DYNAMIC GRAPH DATA.
37. Lessons learned
• GREATER RELEVANCE TO REAL PROBLEMS
• CLEARER REQUIREMENTS (AND MORE)
• ACCESS TO ACTUAL USAGE DATA (REAL USERS)
• PATENTS VS. PUBLISHING
• PROTOTYPE INTEGRATION CONSUMES RESOURCES
• MORE FOCUS ON FEATURE DEVELOPMENT
• LESS EXPLORATION AND HYPOTHESIS TESTING
38. major considerations
ACCESS TO INDUSTRY
DATA
INTEGRATION WITH
THE PRODUCT?
https://www.keytrac.net/assets/industry-social-networks.jpg http://www.autointhenews.com/wp-content/uploads/2010/05/volvo-s60-crash-video-image.jpg
39. Summary
PROBLEM
§ INFORMATION OVERLOAD AND INEFFICIENT INFORMATION
DISCOVERY IN DISTRIBUTED ENTERPRISE SOCIAL NETWORKS
SOLUTION
§ RECOMMENDER SYSTEM THAT EXPLOITS SOCIAL GRAPH
§ UTILIZE INTEREST GRAPH AND LEGACY INFORMATION
§ NEAR REAL-TIME PERSONALIZATION
TECHNOLOGY
§ OPEN SOURCE COMPONENT FOR RDF DATA AGGREGATION
USING XMPP AND SPARQL 1.1 UPDATE
§ PERSONALIZATION COMPONENT BASED ON SPREADING
ACTIVATION APPLICABLE TO MULTI-SOURCE, CROSS-DOMAIN
DATA
Broader picture: users expect personalised experiences. Preferences are distributed and cover many domains. New site: no user profile, no recommendations. Goal: use any user information for recommendations from any target domain or data set. Domain -
This problem is also visible in enterprises, where a single company contains different domains, social platforms, etc.
SHOW DEMO
Non-native import of RDF: The Giraph paradigm for reading input data is incompatible with reading N-Triples files, as it assumes that each node of the graph is described on exactly one line of the input data. Lines therefore need to be merged in a pre-processing step. In addition, Giraph does not support true multi-graphs.
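The merging step mentioned in this note can be sketched roughly as follows: group all triples by subject so that each node ends up on exactly one line. This is an illustrative pre-processing sketch, not the project's actual tooling. The tab-separated output layout and the function name are assumptions, the parser is naive (whitespace splitting, no escape handling), and triples with literal or blank-node objects are skipped because they do not produce edges between IRI nodes.

```python
from collections import defaultdict


def merge_ntriples_by_subject(lines):
    """Group N-Triples statements by subject, one output line per node.

    Output line (assumed layout): subject, then one "predicate object"
    field per outgoing edge, separated by tabs.
    """
    adjacency = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        if not obj.startswith("<"):
            continue  # keep only IRI objects, i.e. node-to-node edges
        adjacency[subject].append((predicate, obj))
    return ["\t".join([s] + [f"{p} {o}" for p, o in edges])
            for s, edges in adjacency.items()]


# Hypothetical input: statements about <a> are scattered across the file.
triples = [
    '<a> <knows> <b> .',
    '<c> <knows> <a> .',
    '<a> <likes> <c> .',
    '<a> <name> "Alice" .',
]
for merged_line in merge_ntriples_by_subject(triples):
    print(merged_line)
```

For a DBpedia-sized dump this in-memory grouping would not be practical as written; sorting the file by subject first and streaming over it is the usual alternative.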
Out of the 56 users without enough recommendations, 64% had only 2 out-links and 20% had only 3 out-links (