Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Mathieu d’Aquin and Enrico Motta
Knowledge Media Institute, The Open University, Milton Keynes, UK
Hey, Data! I Love Data!

“Let’s see… You there, what are you about?”
One great dataset: “Me? My name is ‘one great dataset’ and my namespace is http://datasets.com/greatone/”

“OK, but what’s there?”
One great dataset: “1,254,245 triples. I also have a SPARQL endpoint!”

“Can you be more explicit?”
One great dataset: “Uh… I have a VoID description… with links and all…”

“Hmm… I mean, what are these triples saying?”
One great dataset: “You mean you want to see… my ontology?”

“That would help… but can you tell me what I can ask you?”
One great dataset: “Like example SPARQL queries?”

“Yeah… but I don’t know SPARQL, and how do you choose your examples anyway?”
One great dataset: “…”

One great dataset: “Well… figure it out by yourself then!”
Summarizing an RDF dataset with questions
We would like to provide an entry point to a dataset by showing the questions it is good at answering, in a way that can be navigated.
Example (Tom Heath’s FOAF profile): “Who are the people Tom knows?”
A question
A question is a list of characteristics of objects (clauses), based on the relationships between objects. For example: things that are people, i.e. instances of <Person>, and that are related to <tom> through the relation <knows>.
The answer to a question is a set of objects: all the objects that satisfy the clauses of the question.
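To make this definition concrete, here is a minimal Python sketch that evaluates a question, represented as a list of clauses, against a handful of triples. The clause encoding (`class`, `from`, `to`), the function names and the toy data are hypothetical illustrations, not the authors' implementation; the sketch only shows that the answer is the set of objects satisfying every clause.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Clause = Tuple[str, ...]

# Hypothetical toy data loosely modelled on the FOAF example ("a" stands for rdf:type).
TRIPLES: List[Triple] = [
    ("tom", "a", "Person"),
    ("enrico", "a", "Person"),
    ("mathieu", "a", "Person"),
    ("jeff", "a", "Person"),
    ("tom", "knows", "enrico"),
    ("tom", "knows", "mathieu"),
    ("jeff", "knows", "tom"),
]


def satisfies(obj: str, clause: Clause, triples: List[Triple]) -> bool:
    """Check a single clause against a single candidate object."""
    kind = clause[0]
    if kind == "class":  # obj is an instance of the given class
        return (obj, "a", clause[1]) in triples
    if kind == "from":  # <subject> <property> obj
        return (clause[1], clause[2], obj) in triples
    if kind == "to":  # obj <property> <target>
        return (obj, clause[1], clause[2]) in triples
    raise ValueError(f"unknown clause kind: {kind}")


def answers(clauses: List[Clause], triples: List[Triple]) -> Set[str]:
    """Return every resource that satisfies all the clauses of the question."""
    # Candidates: all subjects, plus objects that are not class names (objects of "a").
    candidates = {s for s, _, _ in triples} | {o for _, p, o in triples if p != "a"}
    return {c for c in candidates if all(satisfies(c, cl, triples) for cl in clauses)}


# "Who are the people Tom knows?"
print(sorted(answers([("class", "Person"), ("from", "tom", "knows")], TRIPLES)))
# → ['enrico', 'mathieu']
```

Adding a clause can only shrink the answer set, which is why more specific questions sit lower in the hierarchy discussed later.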
Formal concept analysis
Formal context: objects described by binary attributes.
Lattice of concepts: each concept is a set of objects (extension) together with the properties they have in common (intension).
Example from: http://en.wikipedia.org/wiki/Formal_concept_analysis
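The following Python sketch shows the two derivation operators of FCA and a naive enumeration of all concepts, on a small hypothetical context (a few numbers and their properties, in the spirit of the Wikipedia example). A concept is a pair (extent, intent) in which each set is exactly what the other one derives. The brute-force enumeration is for illustration only; the implementation described later relies on a dedicated tool (CORON) for this step.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Context = Dict[str, FrozenSet[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)

# Hypothetical toy formal context: objects -> binary attributes.
CONTEXT: Context = {
    "2": frozenset({"even", "prime"}),
    "3": frozenset({"odd", "prime"}),
    "4": frozenset({"even", "composite", "square"}),
    "9": frozenset({"odd", "composite", "square"}),
}


def common_attributes(objs: Iterable[str], ctx: Context) -> FrozenSet[str]:
    """A': attributes shared by all objects in objs (all attributes if objs is empty)."""
    result = frozenset().union(*ctx.values())
    for o in objs:
        result &= ctx[o]
    return result


def objects_having(attrs: FrozenSet[str], ctx: Context) -> FrozenSet[str]:
    """B': objects that have every attribute in attrs."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)


def concepts(ctx: Context) -> Set[Concept]:
    """Enumerate all formal concepts by closing every subset of objects.

    Exponential in the number of objects: only suitable for toy contexts.
    """
    found: Set[Concept] = set()
    objs = list(ctx)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, ctx)
            found.add((objects_having(intent, ctx), intent))
    return found


for extent, intent in sorted(concepts(CONTEXT), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Ordering these pairs by inclusion of their extents gives the concept lattice: the top concept contains every object and the bottom concept contains every attribute.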
RDF instances as individuals in a formal context
Represent the relations of objects as binary attributes, where p:-o means “has relation p to o” and s-:p means “is the target of relation p from s”:
RDF: tom a Person. tom knows enrico. jeff knows tom.
FCA: tom: {Class:-Person, knows:-enrico, jeff-:knows}
Include implicit information based on the ontology (the superclasses of tom’s class and of the classes of the objects he is related to):
tom: {Class:-Person, Class:-Agent, Class:-Thing, knows:-enrico, knows:-Person, knows:-Agent, knows:-Thing, jeff-:knows, Person-:knows, Agent-:knows, Thing-:knows}
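A minimal Python sketch of this transformation, assuming the slide's attribute notation and a hypothetical ontology fragment in which Person is a subclass of Agent, itself a subclass of Thing. It also assumes that enrico and jeff are explicitly typed as Person; the only reasoning performed is the transitive closure of the class hierarchy (no domain/range inference). This is an illustration, not the SPARQL2RCF tool used in the actual implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("tom", "a", "Person"),
    ("enrico", "a", "Person"),  # assumption: enrico and jeff are typed as Person
    ("jeff", "a", "Person"),
    ("tom", "knows", "enrico"),
    ("jeff", "knows", "tom"),
]
# Hypothetical ontology fragment: Person subClassOf Agent subClassOf Thing.
PARENTS: Dict[str, List[str]] = {"Person": ["Agent"], "Agent": ["Thing"]}


def with_superclasses(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The class itself plus all its transitive superclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def types_of(x: str, triples: List[Triple], parents: Dict[str, List[str]]) -> Set[str]:
    """All asserted and inherited classes of resource x."""
    result: Set[str] = set()
    for s, p, o in triples:
        if s == x and p == "a":
            result |= with_superclasses(o, parents)
    return result


def to_formal_context(triples: List[Triple], parents: Dict[str, List[str]]) -> Dict[str, FrozenSet[str]]:
    """Turn every typed instance into an FCA object with binary attributes."""
    instances = {s for s, p, _ in triples if p == "a"}
    ctx: Dict[str, FrozenSet[str]] = {}
    for x in instances:
        attrs = {f"Class:-{c}" for c in types_of(x, triples, parents)}
        for s, p, o in triples:
            if p == "a":
                continue
            if s == x:  # outgoing relation: x p o
                attrs.add(f"{p}:-{o}")
                attrs |= {f"{p}:-{c}" for c in types_of(o, triples, parents)}
            if o == x:  # incoming relation: s p x
                attrs.add(f"{s}-:{p}")
                attrs |= {f"{c}-:{p}" for c in types_of(s, triples, parents)}
        ctx[x] = frozenset(attrs)
    return ctx


print(sorted(to_formal_context(TRIPLES, PARENTS)["tom"]))
```

The printed set for tom has the eleven attributes listed on the slide. The blow-up from three explicit attributes to eleven illustrates why the number of candidate questions grows quickly.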
Example lattice: Tom’s FOAF Profile
Eliminating redundancies
Example: “Who are the people Tom knows?”
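The slides do not spell out the exact procedure. One plausible way to eliminate redundancy, sketched below under our own assumptions, is to drop every attribute of an intent that is implied by another attribute of the same intent through the class hierarchy or through the types of the individuals involved. Applied to a hypothetical full intent for “the people Tom knows”, this leaves just {Class:-Person, tom-:knows}. The hierarchy, the types and the function names are illustrative, and the hierarchy is assumed to be acyclic.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical ontology fragment and asserted types of individuals.
PARENTS: Dict[str, List[str]] = {"Person": ["Agent"], "Agent": ["Thing"]}
TYPES: Dict[str, Set[str]] = {"tom": {"Person"}, "enrico": {"Person"}, "mathieu": {"Person"}}


def with_superclasses(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The class itself plus all its transitive superclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def classes_implied_by(term: str, types: Dict[str, Set[str]], parents: Dict[str, List[str]]) -> Set[str]:
    """For an individual: all its classes. For a class: its strict superclasses."""
    if term in types:
        return set().union(*(with_superclasses(t, parents) for t in types[term]))
    return with_superclasses(term, parents) - {term}


def reduce_intent(intent: Iterable[str], types: Dict[str, Set[str]],
                  parents: Dict[str, List[str]]) -> FrozenSet[str]:
    """Remove attributes implied by another attribute of the same intent."""
    intent = frozenset(intent)
    implied: Set[str] = set()
    for a in intent:
        if a.startswith("Class:-"):
            cls = a[len("Class:-"):]
            implied |= {f"Class:-{c}" for c in with_superclasses(cls, parents) - {cls}}
        elif ":-" in a:  # outgoing: p:-target
            p, target = a.split(":-", 1)
            implied |= {f"{p}:-{c}" for c in classes_implied_by(target, types, parents)}
        elif "-:" in a:  # incoming: source-:p
            source, p = a.split("-:", 1)
            implied |= {f"{c}-:{p}" for c in classes_implied_by(source, types, parents)}
    return intent - implied


FULL_INTENT = {"Class:-Person", "Class:-Agent", "Class:-Thing",
               "tom-:knows", "Person-:knows", "Agent-:knows", "Thing-:knows"}
print(sorted(reduce_intent(FULL_INTENT, TYPES, PARENTS)))
# → ['Class:-Person', 'tom-:knows']
```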
A concept in the lattice is a question
Intension = the clauses of the question; extension = the answers. All the objects of the extension satisfy the clauses of the question.
Different areas of the lattice focus on different topics, and the questions are organized in a hierarchy.
Example: {Class:-Person, tom-:knows} → “What are the (Person) that (tom knows)?”
Other questions from the hierarchy: “What are tom’s current projects?”, “What are the people?”, “What are the people that tom knows?”
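Because a concept's intent is a set of clauses, it can be phrased as a question and also translated into a SPARQL query that should return the concept's extension, assuming the formal context was built from the same triples. The sketch below reproduces the slide's phrasing “What are the (Person) that (tom knows)?”; the SPARQL translation, the function names and the use of the fictional namespace http://datasets.com/greatone/ from the opening dialogue are our own illustrative assumptions, and local names are assumed to be valid SPARQL prefixed names.

```python
from typing import FrozenSet, Set


def _render(attr: str) -> str:
    """Turn a relation attribute into a phrase, e.g. 'tom-:knows' -> 'tom knows'."""
    if ":-" in attr:  # outgoing: p:-target
        prop, target = attr.split(":-", 1)
        return f"{prop} {target}"
    source, prop = attr.split("-:", 1)  # incoming: source-:p
    return f"{source} {prop}"


def to_question(intent: FrozenSet[str]) -> str:
    """Phrase an intent as a question in the style used on the slides."""
    kinds = [a.split(":-", 1)[1] for a in sorted(intent) if a.startswith("Class:-")]
    rels = [_render(a) for a in sorted(intent) if not a.startswith("Class:-")]
    subject = "(" + ", ".join(kinds) + ")" if kinds else "things"
    if not rels:
        return f"What are the {subject}?"
    return f"What are the {subject} that " + " and ".join(f"({r})" for r in rels) + "?"


def _pattern(attr: str, classes: Set[str], i: int) -> str:
    """Translate one attribute into SPARQL triple patterns about ?x."""
    if attr.startswith("Class:-"):
        return f"?x a :{attr[len('Class:-'):]} ."
    if ":-" in attr:
        prop, target = attr.split(":-", 1)
        if target in classes:  # "has relation prop to some instance of target"
            return f"?x :{prop} ?v{i} . ?v{i} a :{target} ."
        return f"?x :{prop} :{target} ."
    source, prop = attr.split("-:", 1)
    if source in classes:  # "is the target of prop from some instance of source"
        return f"?v{i} :{prop} ?x . ?v{i} a :{source} ."
    return f":{source} :{prop} ?x ."


def to_sparql(intent: FrozenSet[str], classes: Set[str],
              prefix: str = "http://datasets.com/greatone/") -> str:
    """Build a SELECT query whose results should be the concept's extension."""
    patterns = [_pattern(a, classes, i) for i, a in enumerate(sorted(intent))]
    body = "\n  ".join(patterns)
    return f"PREFIX : <{prefix}>\nSELECT DISTINCT ?x WHERE {{\n  {body}\n}}"


intent = frozenset({"Class:-Person", "tom-:knows"})
print(to_question(intent))  # → What are the (Person) that (tom knows)?
print(to_sparql(intent, {"Person"}))
```

The second call prints a query with the patterns `?x a :Person .` and `:tom :knows ?x .`, which is the kind of example query a user who does not know SPARQL would otherwise have to write.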
But… The RDFFormal Context process can generate a lot of attributes and so a lot of questions Ranging from things uninterestingly general          What are the Things? To the ones that might be interesting only in very specific cases         What are the indian restaurants located in San Diego that have been rated OK and are called “Chez Bob”? Need to extract a list of questions as an entry point
How to measure the interestingness of a question: metrics
Inspired by ontology summarization:
Coverage: when providing a list of questions, the questions should cover the entire lattice (i.e., at least one question per branch).
Level: questions that are too general or too specific are not useful.
Density: the number of clauses has an impact (avoid questions that are too complex as well as ones that are too simple).
Inspired by FCA:
Support: the cardinality of the extension, i.e. the number of answers.
Intensional stability: how much a concept depends on particular elements of the extension.
Extensional stability: how much a concept depends on particular elements of the intension.
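The slides give no formulas for the FCA-inspired metrics. The sketch below assumes the usual stability indices from the FCA literature, which match the descriptions above: intensional stability is the fraction of subsets of the extent whose common attributes are still exactly the intent, and extensional stability is the fraction of subsets of the intent that still select exactly the extent. The toy context and function names are hypothetical, and exact computation is exponential, so it is only practical for small concepts.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator

Context = Dict[str, FrozenSet[str]]

# Hypothetical toy formal context.
CONTEXT: Context = {
    "2": frozenset({"even", "prime"}),
    "3": frozenset({"odd", "prime"}),
    "4": frozenset({"even", "composite", "square"}),
    "9": frozenset({"odd", "composite", "square"}),
}


def common_attributes(objs: Iterable[str], ctx: Context) -> FrozenSet[str]:
    """Attributes shared by all objects in objs (all attributes if objs is empty)."""
    result = frozenset().union(*ctx.values())
    for o in objs:
        result &= ctx[o]
    return result


def objects_having(attrs: FrozenSet[str], ctx: Context) -> FrozenSet[str]:
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)


def _subsets(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """All subsets of items, including the empty set and items itself."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def support(extent: FrozenSet[str]) -> int:
    """Number of answers to the question."""
    return len(extent)


def intensional_stability(extent: FrozenSet[str], intent: FrozenSet[str], ctx: Context) -> float:
    hits = sum(1 for c in _subsets(extent) if common_attributes(c, ctx) == intent)
    return hits / 2 ** len(extent)


def extensional_stability(extent: FrozenSet[str], intent: FrozenSet[str], ctx: Context) -> float:
    hits = sum(1 for d in _subsets(intent) if objects_having(d, ctx) == extent)
    return hits / 2 ** len(intent)


extent, intent = frozenset({"4", "9"}), frozenset({"composite", "square"})
print(support(extent), intensional_stability(extent, intent, CONTEXT),
      extensional_stability(extent, intent, CONTEXT))
# → 2 0.25 0.75
```

Here the extensional stability is high because either attribute alone already selects {4, 9}: dropping a clause does not change the answers.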
Experiment: finding the relevant metrics
4 datasets in different domains; 12 evaluators providing questions of interest for these datasets.
We obtained 44 questions, of which 27 are valid (no overlap).
Some are too complicated for our model (they include disjunction, negation, or aggregation functions), e.g. “What is the highest point in Florida?”
A large part do not comply with the initial instructions, which required questions to be self-contained and answered by a list of objects, e.g. “How high is mountain x?” or “What are the restaurants in a given city?”
Results
Level: questions lie between levels 3 and 7; 4.46 is the average.
Density: questions have between 1 and 3 clauses.
Support: very large variations amongst the obtained questions.
Intensional stability: very large variations amongst the obtained questions.
Extensional stability: high values (between 0.75 and 1.0), especially compared to the average (0.4).
Conclusion: to establish a list of the questions most likely to be of interest, a combination of level, density and extensional stability, together with coverage, should be used.
Evaluation
An algorithm generates, from the lattice of an RDF dataset, a set of questions that cover the entire lattice and are believed to be interesting according to a given measure.
Datasets from data.open.ac.uk: 614 course descriptions and 1,706 video podcasts.
Metrics used: random, closeness to the middle level, density close to 2, support, extensional stability, and an aggregated measure: Aggregated = 1/3 level + 1/3 density + 1/3 stability.
6 users scored the resulting sets of questions (6 metrics on 2 datasets: 12 sets in total) according to their interestingness.
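The slides state the weights of the aggregated measure but not how each component is normalized, nor how coverage is enforced. The sketch below is one possible reading, under explicit assumptions: the level score peaks at the middle level of the lattice, the density score peaks at 2 clauses, stability means extensional stability, and coverage is obtained greedily by keeping the best-scoring question in each branch. The `Candidate` fields, the `branch` labels and the sample questions are hypothetical, and levels are assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    question: str
    level: int        # depth of the concept below the top of the lattice (precomputed)
    density: int      # number of clauses in the question
    stability: float  # extensional stability, in [0, 1]
    branch: str       # hypothetical label of the top-level branch containing the concept


def level_score(level: int, max_level: int) -> float:
    """1.0 at the middle level, decreasing linearly towards the top and the bottom."""
    middle = max_level / 2
    return 1.0 - abs(level - middle) / middle if middle else 1.0


def density_score(density: int, target: int = 2) -> float:
    """1.0 for questions with `target` clauses, lower as the count moves away from it."""
    return 1.0 / (1 + abs(density - target))


def aggregated(c: Candidate, max_level: int) -> float:
    """Aggregated = 1/3 level + 1/3 density + 1/3 stability (each score in [0, 1])."""
    return (level_score(c.level, max_level) + density_score(c.density) + c.stability) / 3


def select_questions(cands: List[Candidate]) -> List[str]:
    """Keep the best-scoring question of each branch, so that every branch is covered."""
    max_level = max(c.level for c in cands)
    best: Dict[str, Tuple[float, Candidate]] = {}
    for c in cands:
        score = aggregated(c, max_level)
        if c.branch not in best or score > best[c.branch][0]:
            best[c.branch] = (score, c)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [c.question for _, c in ranked]


CANDIDATES = [
    Candidate("What are the people?", 1, 1, 0.5, "people"),
    Candidate("What are the people that tom knows?", 2, 2, 0.75, "people"),
    Candidate("What are the (Person, Agent) that (tom knows) and (jeff knows)?", 4, 4, 1.0, "people"),
    Candidate("What are tom's current projects?", 2, 2, 0.5, "projects"),
]
print(select_questions(CANDIDATES))
```

With these made-up numbers, the mid-level, two-clause question wins the “people” branch over both the overly general and the overly specific alternatives, which is the behaviour the level and density metrics aim for.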
Results
Implementation: the whatoask interface
Offline: dataset with SPARQL endpoint → SPARQL2RCF → formal context → CORON → lattice
Online: lattice → parser → interface generation (using metrics) → interface with navigation in the browser → user
Example: Open educational material (OpenLearn)
Example: Database of reading experiences (Arts History project)
Example: Open University Buildings
Conclusion
The technique presented provides both a summary of and an exploration mechanism over RDF data, using the underlying ontology and formal concept analysis.
It provides an interface for documenting the dataset by examples rather than by specification.
It favors serendipity in the exploration of the dataset, without the need for prior, specialized knowledge.
The current interface, in beta, is available as an online demo.
We need to improve the question generation and navigation mechanisms.
Ongoing experiment: including information gathered through the links to external datasets, to generate unanticipated questions.
Use cases in research projects in the Arts and Humanities.
Thank you!
More info:
Demo: http://lucero-project.info/lb/2011/06/what-to-ask-linked-data/
data.open.ac.uk (for some of the datasets used)
@mdaquin – m.daquin@open.ac.uk
