4. UnifiedViews
Motivation
▸ Maintaining RDF data
processing tasks is challenging
▹ Different tools
▹ Different configurations
▹ Tens of data processing tasks
sharing parts of the data
processing
▸ Debugging
5. UnifiedViews
Approach
▸ UnifiedViews is an ETL tool for
RDF data processing
▹ Allows users to manage RDF data
processing tasks
▹ Natively supports the RDF data
format
6. UnifiedViews
Approach
▸ Standard maintenance interface
▹ Define, execute, monitor, schedule, and
share data processing tasks
▹ Predefined and customizable building
blocks (plugins) to set up the individual
data processing tasks
▸ Debugging features
▸ Simplified documentation
▹ Visualizations of the prepared tasks
■ Plugins
■ Data flow
8. UnifiedViews
Core Components
▸ Web administration interface
▹ Define and maintain pipelines
▹ Validate, execute, monitor pipelines
▹ Possibility to schedule pipelines
■ Notifications
▹ Possibility to debug pipelines
▹ Possibility to share pipelines and plugins
▹ Define and maintain plugins
▹ Multi-user environment, SSO support
▸ Robust engine running the tasks
▸ API to work with tasks, executions,
and scheduled events
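
A minimal sketch of what validating and ordering a pipeline could involve, assuming a pipeline can be modeled as a directed graph of plugins connected by data flow. The plugin names and the `execution_order` function are hypothetical illustrations, not part of the UnifiedViews API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each plugin maps to the set of plugins whose output it consumes.
# The names are illustrative, not real UnifiedViews plugin identifiers.
PIPELINE = {
    "extract_csv": set(),
    "csv_to_rdf": {"extract_csv"},
    "sparql_update": {"csv_to_rdf"},
    "load_to_store": {"sparql_update"},
}


def execution_order(pipeline):
    """Return plugins in an order that respects the data flow, or raise ValueError."""
    known = set(pipeline)
    for plugin, inputs in pipeline.items():
        missing = inputs - known
        if missing:
            raise ValueError(f"{plugin} depends on undefined plugins: {sorted(missing)}")
    try:
        # static_order() yields every plugin after all of its inputs.
        return list(TopologicalSorter(pipeline).static_order())
    except CycleError as exc:
        raise ValueError(f"pipeline contains a cycle: {exc.args[1]}") from exc
```

Under this model, validation amounts to rejecting edges that point at undefined plugins and rejecting cycles. An engine could then run the plugins in the returned order.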
9. UnifiedViews
Core Plugins
▸ Set of Core plugins available
▹ Extractors
■ Obtaining external sources (CSV, DBF, XLS, XML
files, RDF data, or relational tables)
▹ Transformers
■ Transforming the sources between various formats
(e.g. CSV files to RDF data, relational tables to
RDF data)
■ Executing typical transformations such as
SPARQL Update queries, or XSL
transformations
▹ Loaders
■ Loading the transformed and curated data to
external systems and repositories
▸ 35+ plugins
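
The three plugin roles above can be sketched in a few lines of Python. This is an illustration only: the functions `extract`, `transform`, and `load`, the `BASE` namespace, and the list standing in for a triple store are all assumptions, and none of this reflects the actual UnifiedViews plugin interface.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace (assumption)


def extract(csv_text):
    """Extractor: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def transform(rows, key_column):
    """Transformer: emit one N-Triples line per non-empty, non-key cell."""
    triples = []
    for row in rows:
        subject = f"<{BASE}resource/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def load(triples, store):
    """Loader: append triples to a target; a list stands in for a triple store."""
    store.extend(triples)
    return len(triples)
```

A run would chain the stages, for example `load(transform(extract(text), "id"), store)`, which mirrors the extractor, transformer, loader sequence of a pipeline.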
12. PoolParty Semantic Integrator and UnifiedViews
▸ UnifiedViews is part of PoolParty
Semantic Integrator
▸ A semantic technology suite
▹ Organize and maintain company
knowledge
▹ Annotate documents with concepts from
the knowledge base
▹ Provide focused search on top of the
annotated document space
▸ https://www.poolparty.biz/
▹ Or visit the PoolParty booth
13. UnifiedViews
Availability
▸ Available under an open source
license (GPL + LGPL v3)
▹ Commercial license also available as part
of PoolParty Semantic Integrator
▸ Hosted on GitHub
▹ https://github.com/UnifiedViews
▸ Latest release (June 2016):
▹ UnifiedViews Core 2.3.1
▸ http://unifiedviews.eu
15. 3 Use Cases
1. Aligned Project
▹ Extraction/Annotation of data
from Atlassian Confluence/JIRA
2. Boehringer Ingelheim
▹ Publication tracker
3. World Bank
▹ Annotation of World Bank docs
▹ Integration with MarkLogic
17. About
▸ Aligned project:
▹ H2020, http://aligned-project.eu/
▸ One of the goals:
▹ Integrate outputs from commercial
tools such as Atlassian Confluence
and JIRA to bring a data-centric approach
to the governance of software and data
engineering
18. UnifiedViews
Use Case
▸ UnifiedViews pipeline
▹ Extracting data from Atlassian
Confluence, JIRA
▹ Annotating textual content with a
taxonomy maintained in PoolParty
▹ Loading everything to a remote
triple store
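
The annotation step of such a pipeline can be illustrated with a deliberately naive sketch that matches taxonomy labels as whole phrases. The `TAXONOMY` contents, the example IRIs, and the choice of `dcterms:subject` as the linking predicate are assumptions made for illustration. The real pipeline relies on PoolParty for annotation, which is not reproduced here.

```python
import re

# Hypothetical taxonomy: preferred label -> concept IRI (placeholder IRIs).
TAXONOMY = {
    "data governance": "http://example.org/concept/data-governance",
    "software engineering": "http://example.org/concept/software-engineering",
    "triple store": "http://example.org/concept/triple-store",
}


def annotate(text, taxonomy):
    """Return the concept IRIs whose label occurs in the text as a whole phrase."""
    found = []
    for label, concept in taxonomy.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(concept)
    return found


def annotation_triples(page_iri, text, taxonomy):
    """Emit one N-Triples line linking the page to each matched concept."""
    predicate = "<http://purl.org/dc/terms/subject>"  # assumed linking predicate
    return [f"<{page_iri}> {predicate} <{c}> ." for c in annotate(text, taxonomy)]
```

The resulting lines could then be handed to a loader that writes them to the remote triple store.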
20. Benefits, Lessons Learned
▸ Predefined plugins which may be
used out of the box
▹ No heavy programming
▸ Easy pipeline management via user
interface
▸ Further support when preparing
the pipeline
▹ Pipeline validation
▹ Pipeline debugging
22. About
▸ Boehringer Ingelheim wanted a
better overview of worldwide
research activities
▸ Extract and annotate articles available
on PubMed
▹ http://www.ncbi.nlm.nih.gov/pubmed
▸ Link unstructured and structured,
internal and external information
25. Benefits, Lessons Learned
▸ Pipelines in UnifiedViews may
be easily
▹ scheduled
▹ extended in the future
▸ Detailed information about
the pipeline executions is
available
▹ Events, logs
▸ Maintenance simplified
26. Benefits, Lessons Learned
▸ Missing features
▹ Long-running pipelines
■ Tighter integration of UnifiedViews
and PoolParty Semantic Integrator
▹ Loops, conditional execution of
plugins
28. About
▸ Goal: Search over annotated
World Bank documents
▹ World Bank topical taxonomy
▹ Geo taxonomy
▸ Demo:
▹ http://marklogic-demo.poolparty.biz
29. UnifiedViews
Use Case
▸ UnifiedViews pipeline to
annotate portions of the World
Bank documents
▹ Country & region information
annotated with Geo taxonomy
▹ Full text, topics annotated with
World Bank topical taxonomy
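
The split described above, with geographic fields going to one taxonomy and textual fields to another, can be sketched as a simple field-to-taxonomy routing table. The field names, the taxonomy contents, and the IRIs are hypothetical, and the sketch only handles single-word labels.

```python
# Hypothetical taxonomies; labels and IRIs are placeholders.
GEO = {
    "kenya": "http://example.org/geo/kenya",
    "africa": "http://example.org/geo/africa",
}
TOPICS = {
    "education": "http://example.org/topic/education",
    "health": "http://example.org/topic/health",
}

# Which taxonomy applies to which document field (field names are assumptions).
FIELD_TAXONOMY = {"country": GEO, "region": GEO, "topics": TOPICS, "full_text": TOPICS}


def annotate_document(doc):
    """Map each field to the sorted concept IRIs whose label appears in that field."""
    result = {}
    for field, taxonomy in FIELD_TAXONOMY.items():
        # Split on whitespace after turning commas into spaces; single-word labels only.
        words = set(doc.get(field, "").lower().replace(",", " ").split())
        result[field] = sorted(iri for label, iri in taxonomy.items() if label in words)
    return result
```

Keeping the routing in one table means a country name mentioned in the full text is not annotated with a topic, and a topic word in a country field is ignored.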
33. Summary
▸ UnifiedViews
▹ UnifiedViews and PoolParty
Semantic Integrator
▸ UnifiedViews Use Cases
▹ Conversion of sources to RDF data
▹ Annotation of sources
▹ Enrichment of the data
▹ Publication of the curated data to
the target store
▸ UnifiedViews 2.0 in 5 minutes
34. Summarized Lessons Learned
▸ Easy pipeline management via user interface
▸ Predefined plugins which may be used
out of the box
▹ No heavy programming
▹ Simplified pipeline creation
▸ Further support when preparing the pipeline
▹ Pipeline validation
▹ Pipeline debugging
▸ Pipeline scheduling