LODStats: Web Application
● Aggregates datasets from the largest data portals
● Calculates statistical metrics
● User interface
● SPARQL interface
“LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
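The slide only names the step of calculating statistical metrics. As a rough illustration, the following Python sketch computes a few typical dataset statistics (triple count, distinct subjects, property usage, class usage) in a single pass over N-Triples lines. It is not the LODStats code or API: `DatasetStats`, `parse_ntriples_line` and `compute_stats` are hypothetical names, and the line splitter is a simplification that assumes well-formed input with one triple per line.

```python
from collections import Counter
from dataclasses import dataclass, field

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


@dataclass
class DatasetStats:
    """Hypothetical container for a few LODStats-style metrics."""
    triples: int = 0
    subjects: set = field(default_factory=set)
    property_usage: Counter = field(default_factory=Counter)
    class_usage: Counter = field(default_factory=Counter)


def parse_ntriples_line(line):
    """Minimal N-Triples splitter (not a full parser).

    Skips blank lines and comments; assumes subject and predicate contain
    no whitespace, and keeps the object (IRI or literal) as raw text.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if not line.endswith("."):
        raise ValueError(f"malformed triple: {line!r}")
    subject, predicate, obj = line[:-1].strip().split(None, 2)
    return subject, predicate, obj.strip()


def compute_stats(lines):
    """Update all metrics while streaming over the lines once."""
    stats = DatasetStats()
    for line in lines:
        triple = parse_ntriples_line(line)
        if triple is None:
            continue
        s, p, o = triple
        stats.triples += 1
        stats.subjects.add(s)
        stats.property_usage[p] += 1
        if p == RDF_TYPE:
            # Class usage: count objects of rdf:type statements.
            stats.class_usage[o] += 1
    return stats
```

Keeping the distinct-subject set in memory grows with the dataset; counters for properties and classes stay small, which is one reason per-triple streaming is attractive for this kind of analysis.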
LODStats: System Architecture
● CKAN Aggregator
○ Scans the largest CKAN repositories
○ Filters the results down to RDF datasets
“Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
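To illustrate the filtering step, here is a minimal sketch that works on CKAN package dictionaries, such as the entries of the `results` list returned by CKAN's `package_search` action, where each item in `resources` carries `format` and `url` fields. The set of format strings treated as RDF is an assumption (real portals use many spelling variants), and `filter_rdf_datasets` is a hypothetical helper, not part of the CKAN Aggregator.

```python
# Assumed set of format labels treated as RDF; normalized to lower case.
RDF_FORMATS = {
    "rdf", "rdf/xml", "application/rdf+xml",
    "turtle", "ttl", "text/turtle",
    "n-triples", "nt", "application/n-triples",
    "json-ld", "application/ld+json",
}


def is_rdf_resource(resource):
    """True if a CKAN resource dict declares one of the assumed RDF formats."""
    fmt = (resource.get("format") or "").strip().lower()
    return fmt in RDF_FORMATS


def filter_rdf_datasets(packages):
    """Keep packages with at least one RDF resource.

    Returns a list of (package name, [RDF resource URLs]) tuples.
    """
    selected = []
    for pkg in packages:
        urls = [r.get("url") for r in pkg.get("resources", []) if is_rdf_resource(r)]
        if urls:
            selected.append((pkg.get("name"), urls))
    return selected
```

Matching on the declared `format` field is cheap but trusts portal metadata; a stricter pipeline would also check the downloaded content.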
Data Web Statistics Summary
More statistics are available from the SPARQL endpoint.
                2011          2016
Datasets        422           9,644
Links           3%            40%
Data portals    datahub.io    publicdata.eu, data.gov, datahub.io
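A minimal sketch of querying the endpoint listed on the Availability slide from Python, using only the standard library and the SPARQL 1.1 Protocol (the query is sent as the `query` parameter of an HTTP GET). The query itself is hypothetical: it assumes dataset descriptions are typed as `void:Dataset`, which should be checked against the actual LODStats schema before relying on it.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://lodstats.aksw.org/sparql"  # endpoint from the Availability slide

# Hypothetical query: assumes datasets are described as void:Dataset.
QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(DISTINCT ?ds) AS ?datasets)
WHERE { ?ds a void:Dataset }
"""


def build_query_url(endpoint, query):
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and decode SPARQL JSON results (needs network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_query(ENDPOINT, QUERY)` requires a reachable endpoint; in the SPARQL JSON results format the rows are found under `results` → `bindings`.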
Use Cases
How can you use LODStats data?
● Privacy Analysis: Does the dataset contain sensitive information?
● Coverage Analysis: Does the dataset contain the necessary information?
● Quality Analysis: Define quality metrics using statistical data.
● Vocabulary Reuse: Find a suitable vocabulary for your dataset.
● Link Target Identification: Which datasets are good candidates for interlinking?
“Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
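The cited ESWC’2016 work identifies similar datasets with topic modelling. As a much simpler stand-in (explicitly not the method of [3]), the sketch below ranks candidate link targets by the cosine similarity of their vocabulary-usage profiles, for example how often each namespace occurs in a dataset. The profile format and the names `cosine_similarity` and `rank_link_candidates` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {term: count} dicts."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


def rank_link_candidates(target, profiles, top_k=3):
    """Rank the other datasets by profile similarity to `target`, best first."""
    scores = [
        (name, cosine_similarity(profiles[target], profile))
        for name, profile in profiles.items()
        if name != target
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

For example, with hypothetical profiles `{"geo-a": {"geo": 120, "rdfs": 40}, "geo-b": {"geo": 80, "rdfs": 20, "foaf": 5}, "people": {"foaf": 200, "rdfs": 10}}`, `rank_link_candidates("geo-a", profiles)` places `geo-b` ahead of `people`. Raw cosine over counts favors datasets that share dominant vocabularies; topic models can also capture looser thematic overlap.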
Availability
● Application
○ Online at: http://lodstats.aksw.org
○ LODStats processing module: https://github.com/aksw/lodstats
○ LODStats frontend including SPARQLify mappings: https://github.com/aksw/lodstats_www
○ Deployment setup (docker): https://github.com/AKSW/lodstats.docker
● Dataset
○ Online at: http://lodstats.aksw.org/sparql
○ Datahub.io: https://datahub.io/dataset/lodstats
○ Can be deployed in Virtuoso using docker-compose from the deployment repository
Conclusions & Future Work
● LODStats is easily replicable using Docker technology
● Future work:
○ Processing of very large datasets (Spark/Hadoop)
○ Improving usability of the frontend
○ Extending data collection to crawling
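Partial statistics such as property-usage counts can be merged by simple addition, which is what makes a map/reduce decomposition on Spark or Hadoop plausible for very large datasets. The plain-Python sketch below shows that decomposition without any framework; it is an assumption about how such processing could be organized, not a description of LODStats, and all function names are hypothetical. Distinct counts (e.g. distinct subjects) do not merge by addition and would need sets or approximate sketches instead.

```python
from collections import Counter
from functools import reduce
from itertools import islice


def chunked(iterable, size):
    """Split an iterable of lines into partitions of at most `size` lines."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def map_partition(lines):
    """Map step: count predicate usage in one partition of N-Triples lines."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 2)
        # Skip blank lines, comments, and lines without subject/predicate/object.
        if len(parts) == 3 and not line.lstrip().startswith("#"):
            counts[parts[1]] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: partial counts are merged by addition."""
    return a + b


def property_usage(lines, partition_size=10000):
    """Run the map step per partition and fold the partial results together."""
    partials = (map_partition(chunk) for chunk in chunked(lines, partition_size))
    return reduce(reduce_counts, partials, Counter())
```

Because the merge is associative and the result does not depend on partition boundaries, the partitions could be processed on different workers in any order.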
References
[1] LODStats – An Extensible Framework for High-performance Dataset Analytics by Jan Demter, Sören Auer, Michael Martin, and Jens Lehmann, in Proceedings of EKAW 2012
[2] Linked Open Data Statistics: Collection and Exploitation by Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer, in Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web (KESW 2013)
[3] Detecting Similar Linked Datasets Using Topic Modelling by Michael Röder, Axel-Cyrille Ngonga Ngomo, Ivan Ermilov, and Andreas Both, in The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings