
Tomas Knap: UnifiedViews in COMSODE pilot projects

Published in: Data & Analytics

  1. UnifiedViews in COMSODE pilot projects
     Tomas Knap (1, 2), Jakub Klimek (2)
     (1) EEA s.r.o., http://www.eea.sk/
     (2) Charles University in Prague, Department of Software Engineering, XML and Web Engineering Research Group
     The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.
  2. Agenda
     • UnifiedViews
     • UnifiedViews in Open Data Node
     • Pilot Applications
       • Slovak Environmental Agency
       • Czech Trade Inspection Authority
  3. UnifiedViews
  4. UnifiedViews
     • A tool for management of RDF data processing tasks
     • Task = progression of data processing units (DPUs)
     • Sample task:
       • Extract data from SPARQL Endpoint A
       • Extract data from CSV file B
       • Refine data with SPARQL queries X, Y, Z
       • Deduplicate data using Linker L
       • Publish data to SPARQL Endpoint B
  5. RDF Data Processing Task
  6. UnifiedViews
     • UnifiedViews allows users to define, execute, monitor, debug, schedule, and share tasks
     • UnifiedViews is an ETL tool for RDF data
       • It differs from other ETL tools by natively supporting RDF data
     • UnifiedViews provides a set of plugins (DPUs) for working with RDF data, and new custom plugins may be easily created
     • Open source, http://unifiedviews.eu
  7. UnifiedViews Team http://unifiedviews.eu
  8. UnifiedViews Demo
  9. UnifiedViews in Open Data Node
  10. Open Data Node
     • Publication platform for (Linked) Open Data
     • Open Source
     • Developed in COMSODE project
     • 2013-2015
  11. Open Data Node http://opendatanode.org/
  12. Pilot Application: Slovak Environmental Agency
  13. Mission of the Slovak Environmental Agency (SEA)
     • Policy support
       • Design and Implementation
     • Data provider/integrator
       • LandCover, Environmental burden, waste dumps
     • Infrastructure provider
       • Data Services (DB servers)
     • Consultancy provider
       • Analysis and design of environmental information systems
  14. Initial Situation/Motivation
     • SEA publishes various geospatial data from the environmental domain
     • SEA wanted to explore the potential to increase re-use of their data if published as Linked Data
  15. Goals
     • Publish datasets as Linked Data on:
       • Protected sites, species distribution, bio-geographical regions, land cover, contaminated sites registered as environmental burdens
     • Harvest and convert source data to RDF
       • Source data is available in the Geography Markup Language (GML) via an API provided by the Web Feature Service (WFS), typically in INSPIRE format
       • Initial barrier: the vocabularies mapping the INSPIRE XML schemas to RDF were not available
     • Interlink with relevant RDF/Linked Data resources
     • Provide visualizations and an interface for querying
  16. Approach and IT solution
     • Successfully deployed ODN with UnifiedViews on the remote cloud infrastructure of SEA
     • For each dataset we built a data processing pipeline in UnifiedViews, which harvested the data from the data service and converted it to RDF via XSL transformations
     • We also created pipelines for enriching the datasets with links to external datasets
     • We associated these pipelines with datasets in the catalog
  17. Approach and IT solution: Data Transformation
     • Since GML is an XML format, we converted it to RDF via XSL transformations
     • We extended XSL transformations developed by the GeoKnow project (http://geoknow.eu)
     • The target vocabularies produced by the transformations were derived from the INSPIRE schemas and were simplified and adjusted to match Linked Data conventions
     • Done in cooperation with the SmartOpenData project (http://www.w3.org/2015/03/inspire)
  18. Approach and IT solution: Data Enrichment
     • We linked the datasets to external datasets, including Geonames.org and datasets from the European Environment Agency:
       • Biogeographical regions 2011
       • Natura 2000
       • EUNIS
  19. Benefits of the Semantic Solution
     • A key benefit of the RDF version of the SEA datasets is that they are straightforward to combine with third-party datasets
     • We linked them to the GeoNames, Natura 2000, and EUNIS datasets
  20. Lessons Learned
     • Open Data Node (and UnifiedViews) was able to transform, enrich, and publish RDF data in a simple way, allowing easy maintenance in the future
     • Making the data you publish adhere to common standards, such as the INSPIRE schemas, makes it more reusable
     • Reuse of XSL transformations from other projects
  21. Next Steps
     • Linking more third-party datasets and extending the coverage of the source data included in the RDF version
     • Data visualizations are being designed
       • Developed as extensions of LDVMi (http://ldvm.net)
  22. Demo
  23. Pilot Application: Czech Trade Inspection Authority
  24. Mission of Czech Trade Inspection Authority (CTIA)
     • Monitors and inspects businesses and individuals who
       • Supply goods
       • Sell goods
       • Provide services
       • Provide consumer credit
       • Operate marketplaces
  25. Initial Situation
     • CTIA publishes CSV data about
       • Inspections
       • Penalties
       • Bans
  26. Motivation
     • CTIA wanted to publish their data
       • To be used by third-party applications
       • Instead of building their own map visualizations
  27. Goals
     • CTIA wanted to (and managed to) be the first Czech administrative government institution to publish data in RDF (LOD)
     • CTIA wanted to publish additional anonymized datasets
  28. Approach and IT solution
     • UnifiedViews was successfully deployed, and pipelines were prepared to publish the source data as Linked Open Data
  29. Benefits of the Semantic Solution
     • A map application emerged
       • Uses RDF data combined with other datasets
         • Registry of Business Entities
         • Google Maps
  30. Lessons Learned and Next Steps
     • Publishing data as LOD pays off
     • Publishing data as LOD is not difficult
       • All you need to start is a spare PC
     • CTIA is in the process of implementing the COMSODE methodology for publishing open data
  31. Demo
     • Resulting data published:
       • http://www.coi.cz/cz/spotrebitel/open-data-databaze-kontrol-sankci-a-zakazu/ (in Czech)
  32. Conclusions
  33. Conclusions
     • UnifiedViews
       • http://unifiedviews.eu
     • Open Data Node
       • http://opendatanode.org
     • Pilots:
       • Slovak Environmental Agency
       • Czech Trade Inspection Authority
  34. UnifiedViews in COMSODE pilot projects
     Tomas Knap (1, 2), Jakub Klimek (2)
     (1) EEA s.r.o., http://www.eea.sk/
     (2) Charles University in Prague, Department of Software Engineering, XML and Web Engineering Research Group
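
The slides describe a task as a progression of data processing units (DPUs), and the CTIA pilot as turning CSV data into Linked Open Data. As a rough illustration only, the idea can be sketched in plain Python. This is not the UnifiedViews API: the names `csv_extractor`, `deduplicate`, `run_task`, and `to_ntriples`, the `http://example.org/` namespace, and the sample columns are all hypothetical. The deduplication step here only drops repeated triples, which is much simpler than the entity linking a real linker performs.

```python
import csv
import io
from typing import Callable, List, Optional, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]             # (subject IRI, predicate IRI, literal)
DPU = Callable[[List[Triple]], List[Triple]]

BASE = "http://example.org/"              # hypothetical namespace, not from the slides


def csv_extractor(csv_text: str, key_column: str) -> DPU:
    """Build a DPU that appends one triple per non-key CSV cell."""
    def run(triples: List[Triple]) -> List[Triple]:
        out = list(triples)
        for row in csv.DictReader(io.StringIO(csv_text)):
            subject = BASE + "record/" + quote(row[key_column], safe="")
            for column, value in row.items():
                if column != key_column:
                    predicate = BASE + "prop/" + quote(column, safe="")
                    out.append((subject, predicate, value))
        return out
    return run


def deduplicate(triples: List[Triple]) -> List[Triple]:
    """Drop repeated triples, keeping first-seen order (a stand-in for a linker)."""
    return list(dict.fromkeys(triples))


def run_task(dpus: List[DPU], triples: Optional[List[Triple]] = None) -> List[Triple]:
    """Run the DPUs in order, feeding each one the previous one's output."""
    data = list(triples or [])
    for dpu in dpus:
        data = dpu(data)
    return data


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples with plain string literals."""
    lines = []
    for s, p, o in triples:
        escaped = (o.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "id,penalty\n1,500\n1,500\n2,300\n"   # made-up data
    print(to_ntriples(run_task([csv_extractor(sample, "id"), deduplicate])))
```

Modelling each DPU as a function from triples to triples keeps the steps composable, which mirrors the "progression of DPUs" wording; a real deployment would publish the result to a SPARQL endpoint rather than print it.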