This document proposes a Benchmark for End-User Structured Data User Interfaces (BESDUI) to evaluate and compare semantic and relational data exploration tools from a user perspective. The benchmark includes a set of 12 representative user tasks on a sample dataset, along with metrics to measure tools' capability, operation count, time, and time efficiency in completing the tasks. The document describes applying BESDUI to four tools (Rhizomer, Virtuoso FCT, Sieuferd and PepeSearch) and finds that Rhizomer offers the best ratio of capability to time. It encourages the community to adopt and contribute to BESDUI to help drive research on improving semantic search and exploration user interfaces.
2. Authors
Roberto García
GRIHO - HCI & Data Integration
Research Group
Universitat de Lleida, Spain
Eirik Bakke
Computer Science and Artificial
Intelligence Laboratory
MIT, USA
Rosa Gil
GRIHO - HCI & Data Integration
Research Group
Universitat de Lleida, Spain
David R. Karger
Computer Science and Artificial
Intelligence Laboratory
MIT, USA
Juan Manuel Gimeno
GRIHO - HCI & Data Integration
Research Group
Universitat de Lleida, Spain
3. Motivation
• Inability to reach users is traditionally cited as
one of the main barriers to Semantic Web
uptake
• No killer app for the Semantic Web?
Desired outcome?
• Client applications should hide the complexities
of semantic technologies
• For specific tasks, task-specific user interfaces
better satisfy user needs without breaking user
experience
4. Motivation
• Anyway, opportunity for Semantic Web user interfaces:
datasets without dedicated user interface
• New data collections or rarely used
• Combination of existing datasets
• Provide users power of Web-wide connected data to
explore and discover unforeseen connections…
• Semantic Web killer app?
• Current proposals:
• Linked Data browsers, Controlled Natural Language query
engines, faceted browsers,…
• Difficult to compare from the user perspective
• What ways of exploring the data do they provide?
• How efficient are they from a Quality in Use perspective?
5. Proposal
• Benchmark for comparing user interfaces
• Set of typical user tasks
• Procedure for measuring performance per task
• Low cost and easy to apply, not requiring the
intervention of real users
• For UI tools based on semantic or relational data
• Longer term
• Trigger a community discussion leading to a
framework for comparing, measuring,…
…encourage better semantic search/exploration tools
6. User Tasks
• Criteria:
• Avoid introducing bias from our a priori conception
of the problem or experience developing our own
tools
• Looked outward to find sets of typical end-user tasks related
to structured data exploration
• Applicable both to relational and semantic data
• Somewhere to start:
• Berlin SPARQL Benchmark (BSBM), Explore Use Case
• Intended for measuring the computational performance but
based on a set of realistic queries inspired by common
information needs
7. User Tasks
1. BSBM-1 Find products for a given set of generic features
2. ADDED Find products for a given set of alternative features
3. BSBM-2 Retrieve basic information about a specific product for display purposes
4. BSBM-3 Find products having some specific features and not having one feature
5. BSBM-4 Find products matching two different sets of features
6. BSBM-5 Find products that are similar to a given product
7. BSBM-6 Find products having a label that contains a specific string
8. BSBM-7 Retrieve in-depth information about a specific product including offers
and reviews
9. BSBM-8 Give me recent reviews in English for a specific product
10. BSBM-9 Get information about a reviewer
11. COMBINED BSBM-10 Get offers for a given product which fulfill specific requirements
and BSBM-11 Get all information about an offer
12. BSBM-12 Export the chosen offer into another information system which uses a
different schema
8. User Tasks
• BESDUI includes for each Task, considering the
sample dataset:
• Information need:
• “List products of type sheeny with product features
stroboscopes OR gadgeteers, and a
productPropertyNumeric1 greater than 450”
• Expected output:
• “aliter tiredest”, “auditoriums reducing pappies”,
“boozed”, “byplay”, “closely jerries”
9. User Tasks
• Set of tasks is not closed, work in progress,
contributions appreciated
• However, quite complete.
References for evaluation:
• Information Seeking Strategies (Belkin et al., 1995)
• All dimensions covered by the current tasks
• Method of Interaction:
Searching (known item) / Scanning (unknown)
• Goal of Interaction:
Learning / Selecting (for retrieval)
• Mode of Retrieval:
Recognition (by association) / Specification (identified items)
• Resource Considered:
Information / Meta-information
10. User Tasks
• Frameworks of Information Exploration - Towards the
Evaluation of Exploration Systems (Nunes & Schwabe,
2016)
• Work in progress…
but complete for some operations and criteria
• Boolean Expressivity
• Conjunction values Same Relation and Different Relations
Product feature “A” and feature “B”
Product feature “A” and price “100”
• Disjunction values Same Relation and Different Relations
Product feature “A” or feature “B”
Product feature “A” or price “100”
• Negation
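The Boolean operations listed above can be illustrated with a minimal Python sketch. The product records, field names and values below are hypothetical and are not taken from the BESDUI dataset; they only show what each kind of filter selects.

```python
# Hypothetical product records; names, features and prices are illustrative only.
products = [
    {"name": "p1", "features": {"A", "B"}, "price": 100},
    {"name": "p2", "features": {"A"}, "price": 250},
    {"name": "p3", "features": {"B"}, "price": 100},
    {"name": "p4", "features": set(), "price": 300},
]


def names(predicate):
    """Return the names of the products that satisfy the predicate."""
    return [p["name"] for p in products if predicate(p)]


# Conjunction, same relation: feature "A" and feature "B"
conj_same = names(lambda p: {"A", "B"} <= p["features"])

# Conjunction, different relations: feature "A" and price 100
conj_diff = names(lambda p: "A" in p["features"] and p["price"] == 100)

# Disjunction, same relation: feature "A" or feature "B"
disj_same = names(lambda p: bool({"A", "B"} & p["features"]))

# Disjunction, different relations: feature "A" or price 100
disj_diff = names(lambda p: "A" in p["features"] or p["price"] == 100)

# Negation: products that do not have feature "A"
negation = names(lambda p: "A" not in p["features"])

print(conj_same, conj_diff, disj_same, disj_diff, negation)
```

A tool supports one of these operations only if a user can build the equivalent filter through its interface, which is what the Capability metric records per task.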
12. Metrics
• BESDUI
• Alpha Frontal Asymmetry, related to Valence (Pleasure):
“Method for Improving EEG Based Emotion Recognition…”
(López-Gil et al., 2016)
• “Using SWET-QUM to Compare the Quality in Use of
Semantic Web Exploration Tools”
(González et al., 2013) http://rhizomik.net/swet-qum/
13. Metrics
• Effectiveness
degree to which users can achieve the tasks with precision and completeness
• BESDUI Metric:
Capability: Is performing the task possible with the given system?
0% No – 100% Yes (50% if task has 2 parts)
• Efficiency
degree to which users can achieve tasks investing appropriate amount of resources
• BESDUI Metrics:
Operation Count: How many basic steps
(mouse clicks, keyboard entry, scrolling)
must be performed to carry out the task?
Time: How quickly can these steps be
executed? Map operations to time using
the Keystroke Level Model (Card et al., 1980)
Time Efficiency: capability / time,
“goals per second” measure
KLM Operator and Time (secs.):
K: button press or keystroke 0.2
P: pointing to a target on a display with a mouse 1.1
H: homing the hand(s) on the keyboard or other device 0.4
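The efficiency metrics above can be computed with a few lines of Python. This is a minimal sketch, not the benchmark's own spreadsheet: the function names are hypothetical, and it assumes capability is expressed as a percentage (0 to 100) so that time efficiency is in percentage points per second.

```python
# KLM operator times in seconds, as listed in the table above.
KLM_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4}


def operation_count(ops):
    """Total number of basic steps, given counts per KLM operator."""
    return sum(ops.values())


def klm_time(ops):
    """Estimated task time in seconds: each operator count times its KLM time."""
    return sum(KLM_SECONDS[op] * n for op, n in ops.items())


def time_efficiency(capability_percent, seconds):
    """Capability divided by time, the "goals per second" style measure."""
    return capability_percent / seconds


# Hypothetical task: 5 keystrokes, 2 pointing actions, 1 homing action.
ops = {"K": 5, "P": 2, "H": 1}
seconds = klm_time(ops)
print(operation_count(ops), round(seconds, 1), round(time_efficiency(100, seconds), 2))
```

A task the tool cannot perform has a capability of 0%, so its time efficiency is 0 regardless of how the time is counted.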
14. Applying BESDUI
1. Anyone, but preferably an experienced tool
user, loads the dataset and performs the 12
Tasks
2. For each one, record whether the tool is capable of
completing it. If so, detail all interaction steps
required
3. Map interaction steps to task time (using
provided spreadsheet)
15. Applying BESDUI
• Task 1:
“Look for products of type sheeny with product features stroboscopes
AND gadgeteers, and a productPropertyNumeric1 greater than 450”
• Tools
• Rhizomer:
• Capability: 0%
no support for conjunction of values of the same property
• Virtuoso FCT (Faceted Browser):
• Capability: 100%
16. Virtuoso FCT – Task 1
1. Type “sheeny” and “Enter”, then click “ProductType10”. (9K, 2P, 3H)
2. Click “Go” for “Start New Facet”, then click “Options”. (2K, 2P)
3. For “Inference Rule”, click and select the rules graph, then “Apply”. (2K, 2P)
4. Click “Attributes”, then “productFeature” and “stroboscopes”. (3K, 3P)
5. Click “Attributes”, then “productFeature” and “gadgeteers”. (3K, 3P)
6. Click “Attributes” and “productPropertyNumeric1”. (2K, 2P)
7. Click “Add condition: None” and select “>”. (2K, 2P)
8. Type “450” and click “Set Condition”. (5K, 2P, 2H)
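As a worked example, the per-step operator counts for this task can be totalled and mapped to time with the KLM values from the Metrics slide. The sketch below is an assumption about how the mapping could be scripted; the benchmark itself provides a spreadsheet, and the helper name `parse_step` is hypothetical.

```python
import re

# KLM operator times in seconds (K: keystroke, P: pointing, H: homing).
KLM_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4}

# Operator counts for the eight Virtuoso FCT steps of Task 1, in order.
steps = [
    "9K, 2P, 3H",
    "2K, 2P",
    "2K, 2P",
    "3K, 3P",
    "3K, 3P",
    "2K, 2P",
    "2K, 2P",
    "5K, 2P, 2H",
]


def parse_step(annotation):
    """Turn an annotation such as "9K, 2P, 3H" into counts per operator."""
    counts = {"K": 0, "P": 0, "H": 0}
    for number, operator in re.findall(r"(\d+)\s*([KPH])", annotation):
        counts[operator] += int(number)
    return counts


totals = {"K": 0, "P": 0, "H": 0}
for step in steps:
    for operator, count in parse_step(step).items():
        totals[operator] += count

task_count = sum(totals.values())
task_time = sum(KLM_SECONDS[op] * n for op, n in totals.items())

print(totals)               # {'K': 28, 'P': 18, 'H': 5}
print(task_count)           # 51
print(round(task_time, 1))  # 27.4
```

Under these counts the task takes 51 basic operations and an estimated 27.4 seconds.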
17. Applying BESDUI
• Task 2:
“Look for products of type sheeny with product features stroboscopes
OR gadgeteers, and a productPropertyNumeric1 greater than 450”
• Tools
• Rhizomer:
• Capability: 100%
• Virtuoso FCT:
• Capability: 100%
18. Rhizomer – Task 2
1. Click menu “ProductType” and then “Sheeny” submenu. (2K, 2P, 1H)
2. Click “Show values” for facet “Product Feature”. (1K, 1P)
3. Click facet value “stroboscopes”. (1K, 1P)
4. Type in input “Search Product Feature” “gad...” (4K, 1P, 1H)
5. Select “gadgeteers” from autocomplete (1K, 1P, 1H)
6. Set left side of “Product Property Numeric1” slider to “450”. (1K, 2P)
20. Results
• Currently, BESDUI applied to:
• Rhizomer
a semantic data exploration tool with facets and pivoting
• Virtuoso FCT
the faceted browser for the Virtuoso RDF data store
• Sieuferd
a general-purpose user interface for relational databases
• PepeSearch
a search interface for querying SPARQL endpoints
21. Results & Conclusions
• Sieuferd is the most capable but the least performant,
with the most complex user interface
• PepeSearch is the least capable but the most performant,
with the least complex user interface
• Rhizomer has the best effectiveness/efficiency ratio,
i.e. more “goals per second”
Averages per Tool:
Tool | Capability | K (0.2s) | P (1.1s) | H (0.4s) | Operator Count | Time (secs.) | Time Efficiency (Capability/Time)
Rhizomer | 58% | 15.9 | 10.9 | 2.6 | 29.3 | 16.1 | 3.60
Virtuoso FCT | 54% | 20.4 | 12.7 | 3.0 | 36.1 | 19.3 | 2.80
Sieuferd | 96% | 48.7 | 19.7 | 2.9 | 71.3 | 32.6 | 2.94
PepeSearch | 25% | 10.3 | 5.3 | 5.3 | 21.0 | 10.1 | 2.48
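The Time Efficiency column can be reproduced from the reported averages with a short Python sketch. The variable names are hypothetical; the capability and time values are copied from the table above, and capability is taken as a percentage.

```python
# Reported averages per tool: (capability in %, time in seconds).
averages = {
    "Rhizomer": (58, 16.1),
    "Virtuoso FCT": (54, 19.3),
    "Sieuferd": (96, 32.6),
    "PepeSearch": (25, 10.1),
}

# Time efficiency = capability / time, rounded to two decimals as in the table.
efficiency = {
    tool: round(capability / seconds, 2)
    for tool, (capability, seconds) in averages.items()
}

# Rank tools from most to least time-efficient.
ranking = sorted(efficiency, key=efficiency.get, reverse=True)

for tool in ranking:
    print(f"{tool}: {efficiency[tool]:.2f}")
```

The ranking by this measure is Rhizomer, Sieuferd, Virtuoso FCT, PepeSearch, which differs from the ranking by capability alone or by time alone.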
22. Conclusions
• Importance of benchmarks to drive research in a
domain
• Simple benchmark (perhaps too simple?), but adoption is key
• BSBM useful source of tasks and data
• Synthetic nature results in funny product names like
“waterskiing sharpness horseshoes”
…but no significant impact (no real users)
• Measure UI without having to involve users
• Less reliable but cheaper
• Ideal during early dev stages or to compare tools
23. Future Work
• Continue reviewing the tasks and extend the set of user tasks
• Consider additional tools:
• Direct manipulation (Explorator, Tabulator,…)
• Interactive Query Building (YASGUI, iSPARQL…)
• Relational data (Cipher, BrioQuery,…)
• …
• Improve metrics to consider users' mental effort
• A SPARQL command line would be the best UI from a KLM point of view
• Consider GOMS, which includes cognitive and perceptual operators
• Compare results with real users tests
• Available as GitHub repository: http://w3id.org/BESDUI
• Please, FORK and CONTRIBUTE!
24. Thank you for your attention
Questions?
rgarcia@diei.udl.cat
http://rhizomik.net/~roberto/
BESDUI Persistent URI:
http://w3id.org/BESDUI