This document discusses a project called MEM0R1ES that aims to automatically organize a person's digital information from various devices and online services to generate useful digital memories. The project develops techniques for entity search, typing, clustering, and elicitation to extract, integrate and expose personal information from heterogeneous graphs. It has produced several open-source software components and published results in top conferences. The document outlines current research directions and concludes that the project addresses important societal issues through stimulating collaboration between institutions.
2. Motivation (1/3)
Commoditization of digital equipment
■ Desktops, laptops, netbooks, mobile phones,
tablets, e-book readers, set-top boxes, personal
GPSs, digital cameras, TVs, etc.
Fragmentation of information across devices
3. Motivation (2/3)
The story of my life...
■ Where are the pictures of my niece’s birthday?
■ How should I consolidate/backup my emails?
Fortunately there’s the cloud, right?
4. Motivation (3/3)
2014 twist on Personal Information Management:
lifelogging, health-monitoring
■ Everylog, Memoto, Google Glasses, Nike's FuelBand,
FitBit, Samsung GearFit & competitors...
➡Urgent need to index & integrate continuous personal
feeds for automated processing
5. Problem Definition
Personal digital information is today fragmented
and externalized
➡“Each site is a silo, walled off from the others…”
[TBL 10.2010]
■ Data partitioning
■ Loss of governance
How shall one automatically reclaim and
meaningfully organize his/her digital information
dispersed online and on various devices to
generate useful digital memories?
7. the -Team
Prof. Dr. Philippe
Cudré-Mauroux
Prof. Dr. Karl
Aberer
Prof. Dr. Maria Sokhn
Julien Tscherrig
Joël Dumoulin
Michele
Catasta
Dr. Gianluca Demartini
Alberto Tonon
8. Last Year…
Device & Service Wrappers [EIA-FR]
■ Generic Wrapper Architecture:
SMTP, Gmail, Google Drive, Facebook, DBPedia,
Flickr, LinkedIn
■ Browser wrapper: [EPFL]
Lifelogging rich features (context, user activities
and focus, etc.) from the browser
Storage Infrastructure
■ Multi-purpose, declarative & elastic storage
layer [UNIFR]
9. Result from the Digital Reclaiming
➡Heterogeneous Graphs of Entities
Information duplication
Sometimes with different facets
Missing information
10. Today’s Focus
Meaningful information integration from
heterogeneous graphs of entities
1. Entity Search (AOR)
2. Entity Typing (TRank)
3. Entity Clustering (ZenCrowd, MemorySense,
Predict)
4. Entity Elicitation (Transactive Search)
Use-case: leveraging digital mem0r1es from a
conference participation (demonstrators)
11. 1. Entity Search [UNIFR]
Main idea: combine unstructured and
structured search to find relevant entities in
the graph
■ Inverted index to locate first candidates
■ Graph queries to refine the results
■ Graph traversals (queries on object properties)
■ Graph neighborhoods (queries on data type properties)
13. 2. Entity Typing [UNIFR+EPFL]
Entities can have many types (facets)
■ Which fine-grained types are most relevant given
the context?
Thing
American
Billionaire
s
People
from King
County
People
from
Seattle
Windows
People
Agent
Person
Living
People
American
People of
Scottish
Descent
Harvard
University
People
American
Computer
Programmers
American
Philanthropists
People
from
Seattle
14. 2. Entity Typing
Integrates BigData types from the Web of data
■ Tree of 447’260 types
■ Rooted on <owl:Thing>
■ Depth of 19
Ranks relevant types by analyzing the context
■ Textual context
■ Graph context
■ Decision trees
■ Linear regression
15. 3. Entity Clustering
Several efforts to cluster entities into
meaningful groups depending on context:
PREDIct [EIA-FR]
■ Extracts Web
information through
wrappers
■ Models topics through
Latent Dirichlet Allocation
■ Predictions based on
topic trends
16. 3. Entity Clustering
MemorySense [EPFL]
■ Clusters mobile data into
macro-activities
■ Leverages location, machine-
learning and an activity ontology
B-hist [UNIFR+EPFL+EIA-FR]
■ Better browser history
clustering through
entity typing and
machine-learning
17. 4. Entity Elicitation [EPFL+UNIFR]
Filling the gaps in mem0r1es entity graphs
■ e.g., ‘who also attended WWW03 last year?’
■ Traditional methods (Web crawling, machine-
learning, micro-task crowdsourcing) are insufficient
■ Errors and lack of discriminative features (➘precision)
■ Lack of public data (➘recall)
18. 4. Entity Elicitation
Adapting the concept of transactive memories
(group memories) from psychology
➡Transactive search methods to elicit
information
■ Social network analysis (to direct the search)
■ Crowdsourcing (to get the information)
■ 46% improvement (F1) over best alternative
19. Demo
Use-case on scientific conference memories
Based on 4 demonstrators:
■ Visualizing clustered mobile data (MemorySense)
■ Information elicitation through Transactive Search
(Hippocampus)
■ Browsing clustered Web history (B-hist)
■ Clustering and prediction of topics based on
extracted information (PREDIct)
20. Dissemination (1)
Papers at top research venues:
■ Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and structured
search for ad-hoc object retrieval. SIGIR 2012.
■ Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank,
Ranking Entity Types Using the Web of Data. International Semantic Web Conference ISWC 2013.
■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-
Mauroux: Hippocampus, answering memory queries using transactive search. WWW 2014.
■ Michele Catasta, Alberto Tonon, Vincent Pasquier, Gianluca Demartini, Karl Aberer, Philippe Cudré-
Mauroux: B-hist, Better Entity-Centric Search over Personal Web Browsing History. International
Semantic Web Conference ISWC 2014.
■ Michele Catasta, Alberto Tonon, Gianluca Demartini, Jean-Eudes Ranvier, Karl Aberer, Philippe Cudré-
Mauroux: B-hist, Entity-Centric Search over Personal Web Browsing History. Journal of Web Semantics,
2014 (to appear).
■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudre-
Mauroux: TransactiveDB: Tapping into Collective Human Memories. PVLDB, 2014 (in revision).
■ Julien Tscherrig, Philippe Cudre-Mauroux, Elena Mugellini, Omar Abou Khaled, Maria Sokhn:
SemantiConverter: A Flexible Framework to Convert Semi-Structured Data into RDF. Submitted for
publication.
21. Dissemination (2)
Android app on Google Play
Open-source release of most components
■ https://github.com/MEM0R1ES
ISWC 2013 Best-Paper Award nominee (TRank)
Semantic Web Challenge 2013 Finalist (B-hist)
Wall Street Journal mention (B-hist, 30.10.2013)
Technology transfer
■ Extracting entities (Google Zurich)
■ MemorySense (Samsung)
■ TRank (Yahoo!)
Start-up (?)
22. Current Research Directions
Modelling tail-entities
Transactive DB operator
Automatic capture of important memories
■ Google Glasses
Software integration
23. Conclusions
Exciting project
■ Important, timely societal issues
■ Fundamental research questions
■ Data Storage, Data Integration, Data Clustering, Data Elicitation
Stimulating collaboration
■ Involving 3 (4) institutions
➡Thanks to all partners for their contributions!
A number of tangible results already
■ Open-source software components
■ Publications at top research venues
■ Industry transfer
24. Thanks a lot for your attention,
… and many thanks to the Hasler Stiftung
for funding this project!
Questions?
Hasler Stiftung SmartWorld Workshop, June 19, 2014, Thun — Switzerland
Reclaim your Digital Life