SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Effect of Heuristics on Serendipity in Path-Based
Storytelling with Linked Data
Laurens De Vocht
Christian Beecks, Ruben Verborgh, Erik Mannens, Thomas Seidl, Rik Van de Walle
BA
Introduction
Pathfinding
Semantic Distance
Evaluation
Conclusions & Next Steps
BA
Introduction
Pathfinding
Semantic Distance
Evaluation
Conclusions & Next Steps
BA
?
How to consistently improve and tailor existing
pathfinding approaches? [pathfinding]
How well do heuristics effect user expectations so
users are able to discover feeling confident about the
story facts relevance? [serendipity]
Is semantic distance between facts a good criterion
for optimizing the paths forming a story? [user
judgments]
6
- trivial
randomness -
familiarity
surprise
sense
makin
g
+ discovery
Serendipity
BA
Introduction
Pathfinding
Semantic Distance
Evaluation
Conclusions & Next Steps
8
Original Core Algorithm A* based
A*
h =
Jaccard
Distance
w = Common
Node Degree
Optimizations
9
Improved Algorithm Wraps Core Algorithm
h
w
Domain Delineation
Iterative Refinement
to increase semantic relatedness
10
Heuristics [h]
Jaccard
Normalized
DBpedia Distance
Confidence
11
Weights [w]
Jaccard
Jiang-Conrath
Distance (JCW)
Common Node
Degree (CND)
BA
Introduction
Pathfinding
Semantic Distance
Evaluation
Conclusions & Next Steps
13
Semantic
Distance
0.62 via Physics
0.45 via Hume
Einstein
Newton
Physics
Hume
:influences
:discipline
:birthPlace :deathPlace
Semantic Distances
14
Normalized Web Search Distance
e.g. Google Distance, Bing Distance

15
Motivating Example
16
17
18
Semantic Distances (continued)
BA
Introduction
Pathfinding
Semantic Distance
Evaluation
Conclusions & Next Steps
20
Serendipity – Semantic Distance
21
Serendipity – User Judgments
22
Serendipity – User Judgments
Least agreement (high standard deviation):
Carl Linnaeus and Albert Einstein [JCWJaccard]
Carl Linnaeus and Baruch Spinoza are Expert, Intellectual and Scholar Baruch
Spinoza’s and Albert Einstein’s are both Pantheists Intellectuals and Jewish
Philosophers
Most relevant and consistent:
Charles Darwin and Carl Linnaeus [CNDJaccard]
Copley Medal’s the award of Alfred Russel Wallace and Charles Darwin
Alfred Russel Wallace’s and Charles Darwin’s awards are Royal Medal and
Copley Medal Alfred Russel Wallace and Charles Darwin are known for their
Natural selection Carl Linnaeus and Alfred Russel Wallace have as subject
‘Fellows of the Royal Society’ Carl Linnaeus and Alfred Russel Wallace are
Biologists and Colleagues
BA
Introduction
Pathfinding
Semantic Distance
Evaluation
Conclusions & Next Steps
24
Conclusions
 Reducing the number of arbitrary resources/facts revealed for a story.
 Dbpedia example: telling a story with better link estimation, in cases
where the original algorithm did not make optimal choices of links.
 The most consistent output was generated with the Jaccard distance
used both as weight and heuristic; or as heuristic in combination with
the Jiang-Conrath distance as weight.
 The most arbitrary facts occur in a story when using the combined
node degree as weight with the Jaccard distance as heuristic, both in
the optimized and the original algorithm.
 User judgments conïŹrm the ïŹndings for the Jiang-Conrath weight,
original algorithm and for the Jaccard distance used as weight and
heuristic in terms of discovery.
25
 Validate the correlation between the effect of the link estimation
on the arbitrariness as perceived by users and computational
semantic relatedness measures such as SemRank.
 Measure the scalability of the approach by implementing the
algorithms:
(i) solely on the client,
(ii) completely on the sever, and
(iii) in a distributed client/server architecture.
Next Steps
Additional questions?
@laurens_d_v
laurens.devocht@ugent.be
http://slideshare.net/laurensdv

Weitere Àhnliche Inhalte

Andere mochten auch

Opportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataOpportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership Metadata
Miel Vander Sande
 
Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...
Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...
Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...
Pieter Heyvaert
 
Querying federations ‹of Triple Pattern Fragments
Querying federations ‹of Triple Pattern FragmentsQuerying federations ‹of Triple Pattern Fragments
Querying federations ‹of Triple Pattern Fragments
Ruben Verborgh
 
LDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triplesLDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triples
Miel Vander Sande
 
Towards a Uniform User Interface for Editing Mapping Definitions
Towards a Uniform User Interface for Editing Mapping DefinitionsTowards a Uniform User Interface for Editing Mapping Definitions
Towards a Uniform User Interface for Editing Mapping Definitions
Pieter Heyvaert
 
Researcher Profiling based on Semantic Analysis in Social Networks
Researcher Profiling based on Semantic Analysis in Social NetworksResearcher Profiling based on Semantic Analysis in Social Networks
Researcher Profiling based on Semantic Analysis in Social Networks
Laurens De Vocht
 

Andere mochten auch (19)

Machines are the new Digital Natives
Machines are the new Digital NativesMachines are the new Digital Natives
Machines are the new Digital Natives
 
Opportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataOpportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership Metadata
 
iRail: History & current issues
iRail: History & current issuesiRail: History & current issues
iRail: History & current issues
 
Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...
Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...
Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-...
 
Querying federations ‹of Triple Pattern Fragments
Querying federations ‹of Triple Pattern FragmentsQuerying federations ‹of Triple Pattern Fragments
Querying federations ‹of Triple Pattern Fragments
 
Situation of open data in Flanders
Situation of open data in FlandersSituation of open data in Flanders
Situation of open data in Flanders
 
Querying Heterogeneous Linked Date Interfaces through Reasoning
Querying Heterogeneous Linked Date Interfaces through ReasoningQuerying Heterogeneous Linked Date Interfaces through Reasoning
Querying Heterogeneous Linked Date Interfaces through Reasoning
 
Towards an Interface for User-Friendly Linked Data Generation Administration
Towards an Interface for User-Friendly Linked Data Generation AdministrationTowards an Interface for User-Friendly Linked Data Generation Administration
Towards an Interface for User-Friendly Linked Data Generation Administration
 
LDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triplesLDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triples
 
Time travelling through DBpedia
Time travelling through DBpediaTime travelling through DBpedia
Time travelling through DBpedia
 
ESWC2015 - Query Optimization for Clients of Linked Data Fragments
ESWC2015 - Query Optimization for Clients of Linked Data FragmentsESWC2015 - Query Optimization for Clients of Linked Data Fragments
ESWC2015 - Query Optimization for Clients of Linked Data Fragments
 
Towards a Uniform User Interface for Editing Mapping Definitions
Towards a Uniform User Interface for Editing Mapping DefinitionsTowards a Uniform User Interface for Editing Mapping Definitions
Towards a Uniform User Interface for Editing Mapping Definitions
 
Presentation Data Science Challenge
Presentation Data Science ChallengePresentation Data Science Challenge
Presentation Data Science Challenge
 
DBpedia Mappings Quality Assessment
DBpedia Mappings Quality AssessmentDBpedia Mappings Quality Assessment
DBpedia Mappings Quality Assessment
 
Scaling out federated queries for Life Sciences Data In Production
Scaling out federated queries for Life Sciences Data In ProductionScaling out federated queries for Life Sciences Data In Production
Scaling out federated queries for Life Sciences Data In Production
 
ComparativeMotifFinding
ComparativeMotifFindingComparativeMotifFinding
ComparativeMotifFinding
 
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsRMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
 
A Visual Exploration Workflow as Enabler for the Exploitation of Linked Open ...
A Visual Exploration Workflow as Enabler for the Exploitation of Linked Open ...A Visual Exploration Workflow as Enabler for the Exploitation of Linked Open ...
A Visual Exploration Workflow as Enabler for the Exploitation of Linked Open ...
 
Researcher Profiling based on Semantic Analysis in Social Networks
Researcher Profiling based on Semantic Analysis in Social NetworksResearcher Profiling based on Semantic Analysis in Social Networks
Researcher Profiling based on Semantic Analysis in Social Networks
 

Ähnlich wie Effect of Heuristics on Serendipity in Path-Based Storytelling with Linked Data

Adaptive cluster distance bounding
Adaptive cluster distance boundingAdaptive cluster distance bounding
Adaptive cluster distance bounding
Ocular Systems
 
Cannonical correlation
Cannonical correlationCannonical correlation
Cannonical correlation
domsr
 
Cannonical Correlation
Cannonical CorrelationCannonical Correlation
Cannonical Correlation
domsr
 
671_JeevanRavula_CEE
671_JeevanRavula_CEE671_JeevanRavula_CEE
671_JeevanRavula_CEE
Jeevan Reddy R
 
Adaptive image search with hash codes
Adaptive image search with hash codesAdaptive image search with hash codes
Adaptive image search with hash codes
Ecway Technologies
 
Matlab adaptive image search with hash codes
Matlab  adaptive image search with hash codesMatlab  adaptive image search with hash codes
Matlab adaptive image search with hash codes
Ecway Technologies
 
UROP Symposium Poster
UROP Symposium PosterUROP Symposium Poster
UROP Symposium Poster
Brad Schwartz
 
Advanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic DataAdvanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic Data
Dmitry Grapov
 
Graphical Models 4dummies
Graphical Models 4dummiesGraphical Models 4dummies
Graphical Models 4dummies
xamdam
 
Download
DownloadDownload
Download
butest
 

Ähnlich wie Effect of Heuristics on Serendipity in Path-Based Storytelling with Linked Data (20)

240408_JW_labseminar[Asymmetric Transitivity Preserving Graph Embedding].pptx
240408_JW_labseminar[Asymmetric Transitivity Preserving Graph Embedding].pptx240408_JW_labseminar[Asymmetric Transitivity Preserving Graph Embedding].pptx
240408_JW_labseminar[Asymmetric Transitivity Preserving Graph Embedding].pptx
 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
 
Adaptive cluster distance bounding
Adaptive cluster distance boundingAdaptive cluster distance bounding
Adaptive cluster distance bounding
 
Advanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data AnalysisAdvanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data Analysis
 
Cannonical correlation
Cannonical correlationCannonical correlation
Cannonical correlation
 
Cannonical Correlation
Cannonical CorrelationCannonical Correlation
Cannonical Correlation
 
Weighting NN
Weighting NNWeighting NN
Weighting NN
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 
671_JeevanRavula_CEE
671_JeevanRavula_CEE671_JeevanRavula_CEE
671_JeevanRavula_CEE
 
Dsto tr-1436
Dsto tr-1436Dsto tr-1436
Dsto tr-1436
 
Adaptive image search with hash codes
Adaptive image search with hash codesAdaptive image search with hash codes
Adaptive image search with hash codes
 
Matlab adaptive image search with hash codes
Matlab  adaptive image search with hash codesMatlab  adaptive image search with hash codes
Matlab adaptive image search with hash codes
 
cannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfcannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdf
 
UROP Symposium Poster
UROP Symposium PosterUROP Symposium Poster
UROP Symposium Poster
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
Advanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic DataAdvanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic Data
 
Graphical Models 4dummies
Graphical Models 4dummiesGraphical Models 4dummies
Graphical Models 4dummies
 
421_PrakashMudholkar
421_PrakashMudholkar421_PrakashMudholkar
421_PrakashMudholkar
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdf
 
Download
DownloadDownload
Download
 

KĂŒrzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

KĂŒrzlich hochgeladen (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Effect of Heuristics on Serendipity in Path-Based Storytelling with Linked Data

Hinweis der Redaktion

  1. Algorithmic storytelling can be seen as a particular kind of querying data. Given a set of keywords or entities, which are typically, but not necessarily dissimilar, it aims at generating a story by explicitly relating the query context with a path that includes semantically related resources. Storytelling is utilized for example in entertaining applications and visualizations in order to enrich related Linked Data resources with data from multimedia archives and social media as well as in scientiïŹc research ïŹelds such as bio-informatics where biologists try to relate sets of genes arising from different experiments by investigating the implicated pathways or discovering stories through linked books.
  2. Algorithmic storytelling can be seen as a particular kind of querying data. Given a set of keywords or entities, which are typically, but not necessarily dissimilar, it aims at generating a story by explicitly relating the query context with a path that includes semantically related resources. Storytelling is utilized for example in entertaining applications and visualizations in order to enrich related Linked Data resources with data from multimedia archives and social media as well as in scientiïŹc research ïŹelds such as bio-informatics where biologists try to relate sets of genes arising from different experiments by investigating the implicated pathways or discovering stories through linked books.
  3. Research questions: Relate to the title of the paper, we want to get an indication of the effect of using different heuristics and find suitable measure to gain insight in them: 3 components: [pathfinding] [serendipity] and [user judgments]
  4. The aspects that make a story a good story are captured in the term serendipity. The term depicts a mixture between casual, lucky, helpful and unforeseen facts, also in an information context. In fact, users want to be surprised and they want to discover, conïŹrm, and extend knowledge - but not feel unsure while doing so. This means that users can always relate presented story facts to their background knowledge.
  5. The most frequently encountered algorithm to determine a path between multiple resources is the A* algorithm. This algorithm, which is based on a graph representation of the underlying data (i.e., resources and links between them deïŹne nodes and edges, respectively) determines an optimal solution in form of a lowest-cost traversable path between two resources (determined using H and W). The optimality of a path, which is guaranteed by the A* algorithm, does not necessarily comply with the users’ expectations
  6. Our algorithm reduces the arbitrariness of a path between these resources by increasing the relevance of the links between the nodes using a domain-delineation step. The path is reïŹned by iteratively applying the A* algorithm and with each iteration attempting to improve the overall semantic relatedness between the resources until a ïŹxed number of iterations or a certain similarity threshold is reached. H and W are not fixed as in the original core algorithm.
  7. Jaccard statistical approach taking into account the relative number of common predicates of two nodes. The higher the number of common predicates, the more likely similar properties of the nodes and thus the semantically closer in terms of distance the corresponding nodes. NDD See normalized web distance but with f being a different function: f (n)∈N denotes thenumber of DBpedia nodes linking to node n∈G , f (n,m)∈N denotes the number of DBpedia nodes linking to both nodes n and m. Confidence asymmetrical statistical measure that can be thought of as the probability that node a occurs provided that node b has already occurred.
  8. Jaccard (see prev) JCW looking at the classes of each of the nodes and determining the most common denominator of those classes in the ontology. Once this type is determined, the number of subjects that exist with this type is divided by the total number of subjects. The higher this number, the more generic the class, thus the more different two nodes. CND can be used to compute a weight that encourages rarity of items in a path. It ranks more rare resources higher, thereby guaranteeing that paths between resources prefer speciïŹc relations. The main idea is to avoid that paths go via generic nodes. It makes use of the node degree, the number of in and outgoing links.
  9. f(x) and f(y) are the number of hits for search terms x and y, respectively; andf(x, y) is the number of web pages on which both x and y occur.
  10. Semantic relatedness is improved (distance is decreased) in all cases except when entities were already closely related. NGD: normalized web search distance implemented on top of Bing Search API.
  11. A set of paths is not a presentable story yet. We note that even if a path comprise just the start and destination (indicating they are linked via common hops or directly to each other), the story will contain interesting facts. This is because each step in the path is separated with at least one hop from the next node. For example, to present a story about Carl Linnaeus and Charles Darwin, the story could start from a path that goes via J.W. von Goethe. The resulting statements serve as basic facts, which are relation-object statements, that make up the story. It is up to the application or visualization engine to present it to end-users and enrich it with descriptions, media or further facts.
  12. We applied SemRank to evaluate the paths, in particular to capture the serendipity of each path. The serendipity is measured by using a factor ” to indicate the so called ‘refraction’ how different each new step in a path is compared to the previous averaged over the entire path. Furthermore the information gain is modulated using the same factor ” . The information gain is computed from the weakest point along the path and an average of the rest. * The information gain indicates how much new information is added over all the possibilities that were in fact available. It is averaged over all the steps (facts). If a fact is more rare it will contribute relatively more than others at a certain point. This involves looking at the frequency of the predicates and the resources in the path and how likely it was that a certain resource was included a certain point in a path. In information theory, the amount of information contained in an event is measured by the negative logarithm of the probability of occurrence of the event. More details -> refer to SemRank paper: Kemafor Anyanwu, Angela Maduko, and Amit Sheth. 2005. SemRank: ranking complex relationship search results on the semantic web. In Proceedings of the 14th international conference on World Wide Web (WWW '05). ACM, New York, NY, USA, 117-127. DOI=http://dx.doi.org/10.1145/1060745.1060766 * Average re- fraction R(p): path -> how different is the item from the previous, i.e. thus how much does it contribute to the path so far. There are three special cases [4]: conventional with ” = 0 leading to 1 I(p) SemRank(0, p) = , serendipity plays no role and so no emphasis is put one newly gained or unexpected information; mixed with ” = 0.5 leading toSemRank(0.5, p)= R(p) I(p) 1 2I(p) + ]×−[1+ [ ], a balance between unexpected and newly gained information; 2 2 and discovery with ” = 1 leading to SemRank(1, p) = I(p)×−[1+R(p)], emphasizing unexpected and newly gained information.
  13. On the one hand the conventional and mixed mode for SemRank put less emphasis on novelty and focuses mainly on semantic association and information content. The jaccard distance combination used as weight and heuristic is not entirely surprisingly the best choice for this scenario. On the other hand the results of the original algorithm making use of the common node degree as weight together with the jaccard distance is conïŹrmed by the results of the improved algorithm with the common node degree however with a slightly lower rank in the new algorithm. Using the JCW however leads to even higher ranks. In terms of discovery, the original algorithm outperforms the JaccardJaccard combination. The CNDJaccard improved algorithm is able to slightly outperform all the other combinations.
  14. The effect of the heuristics according to user judgments compared to the overall median. The JCWJaccard conïŹrms already good results with SemRank. The CNDJaccard scores relatively well too. The scores for relevancy, consistency and discovery as unexpected - but relevant - facts are highly dependent on the user who judges. Some users might be interested in the more trivial path as well in some cases. Nevertheless, we used the overall judgment as a baseline to compare the judgments with the same combinations of heuristics and weights as before.
  15. The suggested stories that center around a certain via-fact are not always considered relevant by some users even though the algorithms might consider them so.
  16. Algorithmic storytelling can be seen as a particular kind of querying data. Given a set of keywords or entities, which are typically, but not necessarily dissimilar, it aims at generating a story by explicitly relating the query context with a path that includes semantically related resources. Storytelling is utilized for example in entertaining applications and visualizations in order to enrich related Linked Data resources with data from multimedia archives and social media as well as in scientiïŹc research ïŹelds such as bio-informatics where biologists try to relate sets of genes arising from different experiments by investigating the implicated pathways or discovering stories through linked books.
  17. We proposed an optimized pathïŹnding algorithm for storytelling that reduces the number of arbitrary resources revealed in paths contained in the story. Preliminary evaluation results using the DBpedia dataset indicate that our proposal succeeds in telling a story featuring better link estimation, especially in cases where the previous algorithm did not make seemingly optimal choices of links. By deïŹning stories as chains of links in Linked Data, we optimized the storytelling algorithm and tested with several heuristics and weights. The most consistent output was generated with the Jaccard distance used both as weight and heuristic; or as heuristic in combination with the Jiang-Conrath distance as weight. The most arbitrary facts occur in a story when using the combined node degree as weight with the Jaccard distance as heuristic, both in the optimized and the original algorithm. User judgments conïŹrm the ïŹndings for the Jiang-Conrath weight and the original algorithm and for the Jaccard distance used as weight and heuristic in terms of discovery. There is no clear positive effect however according the users in terms of consistency and relevancy there.
  18. Future work will focus on validating the correlation between the effect of the link estimation on the arbitrariness as perceived by users and computational semantic relatedness measures such as SemRank. Additionally, we will measure the scalability of our approach by implementing the algorithms (i) solely on the client, (ii) completely on the sever, and (iii) in a distributed client/server architecture.