UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It was developed mainly at Charles University in Prague as a student project called ODCleanStore (version 2). It builds on the experience SWC gained with the LOD Management Suite (LODMS), used in WP7, and on ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the ETL tool, and it has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
LOD2 Webinar: UnifiedViews
1. Creating Knowledge out of Interlinked Data
LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu
LOD2 is a large-scale integrating project co-funded by the European
Commission within the FP7 Information and Communication Technologies
Work Programme. This 4-year project comprises leading Linked Open
Data technology researchers, companies, and service providers. Coming
from across 12 countries the partners are coordinated by the Agile
Knowledge Engineering and Semantic Web Research Group at the
University of Leipzig, Germany.
LOD2 will integrate and syndicate Linked Data with existing large-scale
applications. The project shows the benefits in the scenarios of Media and
Publishing, Corporate Data intranets and eGovernment.
Once per month the LOD2 webinar series offers a free webinar about
tools and services along the Linked Open Data Life Cycle.
Stay with us and learn more about acquisition, editing, composing,
connected applications – and finally publishing Linked Open Data.
UnifiedViews
Tomáš Knap, Semantica.cz
Helmut Nagy, Semantic Web Company
Agenda
• What is UnifiedViews
• Short History: From ODCleanStore & LOD Manager to UnifiedViews
• Presentation of UnifiedViews
• Outlook, Impact, the UnifiedViews project
Motivation for UnifiedViews
• Consider a Linked Data consumer defining a data processing task, for example
building a data mart that integrates information from various RDF and non-RDF
sources.
– There are tools available for RDF data extraction, enrichment, linking, transforming, ...
– Any23, Virtuoso, Silk, …
• Still, the consumer has to (among other activities):
– Write their own script that executes the tools in the required order and with the required
configurations
– Schedule the script
– Add notification capabilities, such as sending an email in case of problems
• Maintaining such a task is challenging
– In case of problems, the consumer has to manually launch the problematic tool with the proper
input data and the problematic configuration, load the output data into an RDF store, and
browse/query these data
– The consumer can quickly get lost as the number of configurations and tools in use
grows; as a result, they may start creating duplicate configurations
– The consumer cannot share prepared configurations or reuse configurations
prepared by others
Problem and Our Solution
• General Problem: Consumers have to write most of the logic to define,
execute, monitor, schedule, and share the data processing tasks
• We propose UnifiedViews, an Extract-Transform-Load (ETL) framework
– The data processing task is a central concept
– Native support for the RDF data format and ontologies is another central concept
Short History: From ODCleanStore & LOD Manager to UnifiedViews
Two tools targeting the same purpose with different strengths
One tool aligning the ideas of both tools and going beyond that
Presentation of UnifiedViews
• Basic Concepts
• Key Features
• Demo
Basic Concepts in UnifiedViews – A Pipeline
• Every data processing task is modelled as a pipeline.
Basic Concepts in UnifiedViews - Data Processing Unit (DPU)
• A component (plugin, module) on the pipeline
• Every DPU has certain inputs, outputs, business logic and configuration.
Based on the input and the configuration, the outputs are created.
– E.g., a DPU may apply a set of SPARQL Update queries to the input RDF data and produce
output RDF data.
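The DPU contract described here (inputs, configuration, business logic, outputs) can be sketched roughly as follows. This is an illustration in Python only, not the UnifiedViews plugin API: the names `ReplacePredicateConfig` and `ReplacePredicateDpu`, the tuple-based triple representation, and the predicate-rewriting logic (standing in for a SPARQL Update query) are all assumptions made for the example.

```python
from dataclasses import dataclass

# A triple is modelled as a plain (subject, predicate, object) tuple;
# a real DPU would work with RDF data instead (assumption for this sketch).
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ReplacePredicateConfig:
    """Hypothetical DPU configuration object."""
    old_predicate: str
    new_predicate: str


class ReplacePredicateDpu:
    """Hypothetical transformer DPU: rewrites one predicate to another."""

    def __init__(self, config: ReplacePredicateConfig) -> None:
        self.config = config

    def execute(self, input_triples: set[Triple]) -> set[Triple]:
        # Business logic: the output depends only on the input and the configuration.
        output = set()
        for s, p, o in input_triples:
            if p == self.config.old_predicate:
                p = self.config.new_predicate
            output.add((s, p, o))
        return output


config = ReplacePredicateConfig("ex:name", "foaf:name")
dpu = ReplacePredicateDpu(config)
result = dpu.execute({("ex:alice", "ex:name", "Alice"), ("ex:alice", "ex:age", "30")})
```

Keeping the configuration in its own object mirrors the idea that the same business logic can be reused with different settings on different pipelines.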
Key Features
• Web administration interface:
– Define and manage pipelines
– Validate, execute, monitor and debug pipelines
– Possibility to schedule tasks, set up notifications about the pipeline executions
– Define and manage DPUs
– Possibility to debug inputs to/outputs from DPU
– Possibility to share pipelines and DPUs
– Possibility to get notifications about the result of the pipeline execution
– Multi-user environment
• Robust engine running the tasks
– Ensures that DPUs on the pipeline are executed in the proper order
– It may send notifications about the result of the pipeline execution
• Core DPUs to work with RDF data
• Easy way to extend UnifiedViews with your own DPUs
– Every DPU is an OSGi bundle; as a result, two DPUs that require different
versions of the same library may coexist in the framework
– Possibility to reload DPUs on the fly
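One way to picture how an engine can guarantee that DPUs run in the proper order is a topological sort over the pipeline's data flow. The sketch below is a minimal illustration under that assumption, not the actual UnifiedViews engine; the DPU names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each DPU name maps to the DPUs whose output it consumes.
pipeline = {
    "extract_sparql": set(),
    "extract_csv": set(),
    "refine": {"extract_sparql", "extract_csv"},
    "deduplicate": {"refine"},
    "publish": {"deduplicate"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return DPU names so that every DPU comes after all of its inputs."""
    # static_order() raises graphlib.CycleError if the pipeline contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(pipeline)
```

Both extractors come first (in either order), followed by the refining, deduplicating, and publishing steps.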
Demo
• Part A – instance http://odcs.xrg.cz:8080/unifiedviews
– Introduction to the Web user interface (2mins)
– Simple pipeline and basic operations with the pipeline (5mins)
– DPU templates, how they can be managed (1-2mins)
• Part B – instance http://odcs.xrg.cz:8080/odcleanstore
– More complex pipelines (1-5mins)
Related Work
• Non-RDF ETL Frameworks
– Plenty of ETL frameworks, some of them are open source
– No support for RDF data format and ontologies in the framework itself
• E.g., DPUs are not prepared to suggest ontological terms in DPU configurations
– No native support for exchanging RDF data between DPUs
– No RDF data processing units available out of the box
• Linked Data Integration Framework (LDIF)
• DERI Pipes
– When adding new DPUs, Core must be rebuilt
– It is not possible to reload DPUs on the fly
– No solution for library version clashes
– No possibility to debug inputs/outputs of DPUs
Impact
• Integrated into the LOD2 stack
• Replacing the existing LOD Manager integration
• Used in LOD2
• WP9a, to process public contracts data
• WP7, to enrich documents with links to DBpedia and WKD Thesauri
• Used by other projects
• OpenData.cz initiative
• INTLIB
• COMSODE FP7 project (2013-2015)
• OpenFridge project.
• Used commercially by Semantica.cz (Czech Republic) and
Semantic Web Company (Austria) to help their customers prepare and process RDF
data
How to try UnifiedViews?
• UnifiedViews is available under an open source license
– GPLv3 + LGPLv3
• Hosted on GitHub
– Repository: https://github.com/UnifiedViews/Core
• Latest version: UnifiedViews 1.0 Candidate
– Branch in the repository
• User Documentation:
– https://grips.semantic-web.at/display/UDDOC/UnifiedViews+User+Documentation
How to develop new DPUs?
• Guide for Plugin (DPU) developers:
– https://grips.semantic-web.at/display/UDDOC/Creation+of+Plugins
• In short, every DPU typically consists of 4 main files
– Core DPU file
• Implement execute() method
• Define inputs, outputs
– pom.xml file
– DPU dialog
– DPU config object
How to contribute?
• Guideline for contributors:
– https://grips.semantic-web.at/display/UDDOC/Guidelines+for+Contributors
Conclusions
• We presented UnifiedViews, an ETL framework with native support for
processing RDF data, which addresses the problem of sustainable RDF data
processing
– Users may define, execute, monitor, debug, schedule, and share data processing tasks
(pipelines)
– Users may create their own plugins - data processing units
• UnifiedViews has an active community and is already used in many
projects
– It is maintained by Semantic Web Company and Semantica.cz
Credits
Jingle R.E.M., Martin Kaltenböck, Florian Kondert
Coordination Thomas Thurner
Martin Kaltenböck
Moderation Martin Kaltenböck
Presented by Tomas Knap, Helmut Nagy
Hope you enjoyed staying with us – if you need more detailed
information, visit us at www.lod2.eu and let us know how we can
improve to meet your expectations!
Don’t forget to register for our next webinar
20.12. 2011 - Virtuoso (Open Link Software)
24.01. 2012 - OntoWiki (University of Leipzig, Germany)
Have a great day and don’t forget ...
Editor's notes
Extract data from SPARQL Endpoint A
Extract data from CSV files B
Refine data with SPARQL queries X,Y, Z
Deduplicate data using Linker L
Publish data to SPARQL Endpoint B
A pipeline consists of components and defines the data flow between them
I will probably refactor to two slides
present the graphical user interface, all tabs
List of pipelines
List of DPU templates, show detail
Execution monitor
Scheduler
Settings
Simple Pipeline
Copy a prepared pipeline (with one extractor, one loader),
Open, rename, set proper visibility
Show DPUs in the tree, how they can be placed on the canvas
debug the pipeline, show
Add transformer, debug, show inputs/outputs to the new DPU
Close the pipeline
Run the pipeline, open exec monitor, show that the pipeline is there
Schedule the pipeline (just show)
Show how easy it is to import new DPU, replace new DPU
Show more complex pipeline examples:
CZSO – Codelists
INTLIB nsoud
Pipelines with Silk linker – creating sameAs links between BEs
The goal of the OpenData.cz initiative is to extract, transform, and publish Czech open data in the form of Linked Data, thereby contributing to the Czech Linked (Open) Data cloud. The UV framework has been used successfully for this effort since September 2013.
The INTLIB project aims at extracting (1) references between legislation documents, such as decisions and acts, (2) entities (e.g., a citizen, a president) defined by these documents, and (3) the rights and obligations of these extracted entities. UV is used in INTLIB to extract data from selected sources of legislation documents, convert it to RDF, and provide it as Linked Data.
The COMSODE FP7 project aims to create a platform for publishing (linked) open data. ODCleanStore is used there as the core tool for converting hundreds of original datasets to RDF/Linked Data.