My talk from the Semtech Biz conference in London.
I argued that it is time to move beyond discussing the size of datasets and to encourage a more nuanced view of their quality and utility.
The RDF Report Card is offered as one simple, high-level visualization.
13. Dataset Information Spectrum
Low Detail: Summary and overview of dataset content
High Detail: Detailed data model documentation & guides
14. Dataset Information Spectrum
Low Detail: Summary and overview of dataset content
High Detail: Detailed data model documentation & guides
More Information
15. Dataset Information Spectrum
Low Detail to High Detail
Metadata
● Title, Description
● Provenance
● Publication dates
● Licensing
● Usage cues
● Related datasets
16. Dataset Information Spectrum
Low Detail to High Detail
Scope
● What types of entity?
● How many of each type?
● Coverage
● Geographic
● Events (time)
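
The first two scope questions (what types of entity, and how many of each) can be answered by counting rdf:type statements. The following is a minimal sketch, not the method used in the talk: it assumes the dataset is already available as plain (subject, predicate, object) string tuples, and the function name `count_entities_by_type` and the example.org sample data are hypothetical.

```python
from collections import Counter

# The standard rdf:type predicate URI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def count_entities_by_type(triples):
    """Return a Counter mapping each class URI to its number of distinct instances.

    `triples` is any iterable of (subject, predicate, object) string tuples.
    Repeated rdf:type statements for the same subject and class count once.
    """
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return Counter(cls for _, cls in typed)


# Hypothetical sample data, for illustration only.
sample_triples = [
    ("http://example.org/alice", RDF_TYPE, "http://schema.org/Person"),
    ("http://example.org/bob", RDF_TYPE, "http://schema.org/Person"),
    ("http://example.org/bob", RDF_TYPE, "http://schema.org/Person"),  # duplicate statement
    ("http://example.org/london", RDF_TYPE, "http://schema.org/Place"),
    ("http://example.org/alice", "http://schema.org/name", "Alice"),
]

type_counts = count_entities_by_type(sample_triples)
for cls, n in type_counts.most_common():
    print(f"{cls}: {n}")
```

Counting distinct (subject, class) pairs rather than raw statements keeps the figures meaningful when a dataset repeats its type assertions.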
20. Summarising Content of a Dataset
● Find all classes in all datasets in Kasabi
● Tag each class against a pre-defined set of categories
● Customized version of top-level schema.org classes
● Generate a report card for each dataset listing types of entity
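
The steps above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the Kasabi implementation: the class-to-category mapping, the category names, the per-class counts, and the helpers `build_report_card` and `render_report_card` are all hypothetical.

```python
from collections import Counter

# Hypothetical tagging of classes against a small set of top-level categories.
CATEGORY_BY_CLASS = {
    "http://xmlns.com/foaf/0.1/Person": "Person",
    "http://schema.org/Person": "Person",
    "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing": "Place",
}


def build_report_card(class_counts, category_by_class, default="Other"):
    """Roll per-class instance counts up into per-category totals.

    Classes with no tag in `category_by_class` fall into `default`.
    """
    totals = Counter()
    for cls, n in class_counts.items():
        totals[category_by_class.get(cls, default)] += n
    return dict(totals)


def render_report_card(dataset_name, card):
    """Format a report card as plain text, largest category first."""
    lines = [f"Report card: {dataset_name}"]
    for category, n in sorted(card.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"  {category}: {n}")
    return "\n".join(lines)


# Hypothetical per-class counts for one dataset.
example_counts = {
    "http://xmlns.com/foaf/0.1/Person": 120,
    "http://schema.org/Person": 30,
    "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing": 40,
    "http://example.org/Widget": 7,
}

card = build_report_card(example_counts, CATEGORY_BY_CLASS)
report = render_report_card("example-dataset", card)
print(report)
```

Keeping the tagging in a plain lookup table means the category set can be revised without touching the analysis itself.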
26. Summary
● Triple counts tell us nothing
● Vital to present the quality & utility of our data
● Data publishing platforms should support this
● "Progressive disclosure"
● Right detail at the right time
● Dataset analysis can generate useful summaries
● e.g. an RDF report card