In the space of just a few years we’ve seen the transformational power of open data, both for transparency and accountability in public data, and for efficiency and innovation with businesses in private data. In its first year, institutions and individuals throughout Europe have supported public sector bodies in releasing data, and numerous start-ups, developers and SMEs in reusing this data for economic benefit.
However, we are still at the beginning of the open data movement, and there is still more that can be done to make open data simpler to use and to make it available to a wider audience.
The core goal of the DaPaaS project is to provide a Data- and Platform-as-a-Service environment, where third parties (such as governmental organisations, SMEs, developers and larger companies) can publish and host both data sets and data-intensive applications, which can then be accessed by end-user applications in a cross-platform manner. You can find out more about DaPaaS on the detailed about page.
Essentially, DaPaaS aims to make publishing, consumption, and reuse of open data, as well as deploying open data applications, easier and cheaper for SMEs and small public bodies which otherwise may not have sufficient technical expertise, infrastructure and resources required to do so.
See also: http://www.slideshare.net/eswcsummerschool/wed-roman-tutopendatapub-38742186
Enabling Low-cost Open Data Publishing and Reuse
1. DaPaaS: Enabling Low-cost Open Data
Publishing and Reuse
@ Data Summit Brussels
March 5th, 2015
http://dapaas.eu/
Marin Dimitrov, Ontotext, Bulgaria
Amanda Smith, Open Data Institute, UK
2. Open Data Benefits
• Businesses can develop new ideas, services and applications;
improve decision making, cost savings
• Can increase government transparency and accountability, quality
of public services
• Citizens get better and more timely access to public services
Source: McKinsey
http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Gartner:
By 2016, the use of "open data" will continue to
increase — but slowly, and predominantly limited to
Type A enterprises.
By 2017, over 60% of government open data
programs that do not effectively use open data
internally, will be scaled back or discontinued.
By 2020, enterprises and governments will fail to
protect 75% of sensitive data and will declassify and
grant broad / public access to it.
Source: Gartner
http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf
3. Lots of open datasets on the Web…
• A large number of open datasets published in recent years
• Various domains: cultural heritage, science, finance, statistics,
transport and smart cities, environment, …
• Various formats: tabular (e.g. CSV, XLS), HTML/XML, JSON, LOD,
Web APIs…
4. …but few actually used
• Few applications utilizing open and distributed datasets at present
• Challenges for data consumers
– Data quality issues
– Difficult or unreliable data access
– Licensing issues
• Challenges for data publishers
– Lack of expertise & resources: not easy to publish & maintain high
quality data
– Unclear monetization & sustainability
Open Data Portal   Datasets    Applications
data.gov           ~ 110 000   ~ 350
publicdata.eu      ~ 50 000    ~ 80
data.gov.uk        ~ 20 000    ~ 350
data.norge.no      ~ 300       ~ 40
5. Open Data is mostly tabular data
– Records organized in silos of
collections
– Very few links within and/or
across collections
– Difficult to understand the nature
of the data
– Difficult to integrate / query
[Charts: tabular datasets on publicdata.eu and data.gov.uk]
6. Linked Data is great for Open Data
• Linked Data as a great means to represent and integrate
disparate and heterogeneous open data sources
• How Linked Data can improve Open Data:
– Easier integration, free data from silos
– Seamless interlinking of data
– Understand the data
– New ways to query and interact with data
• Challenges with using Linked Data
– Lack of tooling & expertise to publish high quality Linked Data
– Lack of resources to host LOD endpoints / unreliable data access
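The integration benefit can be illustrated with a minimal sketch. It assumes two independent publishers that happen to use the same IRI for the same city; the `http://example.org/` namespace, the property names and the sample values are all hypothetical. Because both sources identify the city the same way, combining them is a plain set union, and a query can then follow links across what used to be separate silos.

```python
# Minimal sketch: triples as (subject, predicate, object) tuples.
# All IRIs and values below are hypothetical, for illustration only.
EX = "http://example.org/"

# Source A: a statistics publisher.
source_a = {
    (EX + "city/oslo", EX + "def/population", "700000"),
}

# Source B: a geography publisher that reuses the same city IRI.
source_b = {
    (EX + "city/oslo", EX + "def/country", EX + "country/no"),
    (EX + "country/no", EX + "def/name", "Norway"),
}

# Integration is a set union, because shared IRIs line up automatically.
graph = source_a | source_b


def objects(graph, subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def country_names(graph, city):
    """Follow city -> country -> name across the merged sources."""
    return {
        name
        for country in objects(graph, city, EX + "def/country")
        for name in objects(graph, country, EX + "def/name")
    }


names = country_names(graph, EX + "city/oslo")
```

With tabular data the same question would need an explicit join on a column that both publishers agreed to format identically.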
7. DaPaaS: making Open (Linked) Data easier
to use
• A data hosting platform: to make it easy for
publishers to put data on the web
• A data portal: to help advertise data
availability
• Data transformation tools to make it easier
to publish large amounts of high quality data
• Open source tools with high-quality
documentation
Make Linked Data more
accessible to everyone!
9. Grafter
• Grafter is a DSL and a suite of
tools for data transformation &
cleaning
• Primarily used for handling
data conversions from:
– tabular data formats to tabular
data formats
– tabular data formats to RDF
• “lazy” / stream processing, no
need to load whole dataset
• Robust & efficient for large
scale processing
• Transformations can be
packaged as REST services
• Open Source (EPL)
– http://github.com/swirrl/grafter
– http://grafter.org/
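Grafter itself is a Clojure DSL and its API is not shown in these slides, so the following is not Grafter code. It is a minimal Python sketch of the "lazy" / stream processing idea, assuming a small CSV with hypothetical `city` and `population` columns: each stage is a generator, so rows flow through one at a time and the whole dataset is never loaded.

```python
import csv
import io
from typing import Iterable, Iterator


def read_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed row at a time from an iterable of CSV lines."""
    yield from csv.DictReader(lines)


def drop_empty(rows: Iterable[dict], column: str) -> Iterator[dict]:
    """Cleaning step: skip rows where `column` is blank."""
    for row in rows:
        if row[column].strip():
            yield row


def normalise(rows: Iterable[dict], column: str) -> Iterator[dict]:
    """Transformation step: trim whitespace and title-case one column."""
    for row in rows:
        yield {**row, column: row[column].strip().title()}


def pipeline(lines: Iterable[str]) -> Iterator[dict]:
    # Each stage wraps the previous one; no row is read until the
    # result is consumed.
    return normalise(drop_empty(read_rows(lines), "city"), "city")


# Hypothetical sample input; a real run would pass an open file instead.
sample = io.StringIO("city,population\n  oslo ,700000\n,123\nsofia,1200000\n")
cleaned = list(pipeline(sample))
```

Passing an open file handle instead of the in-memory sample keeps memory use flat regardless of file size, which is the property the slide attributes to stream processing.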
10. Tabular data (spreadsheet)
to RDF Linked Data (graph)
1. Define a pipeline of tabular transformations for data cleaning and
transformation.
2. Create the graph fragments resulting in the generation of an RDF
graph.
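The two steps above can be sketched as follows. This is an illustration in Python rather than the Grafter DSL, and it makes several assumptions: the column names, the `http://example.org/` namespace and the `def/population` property are hypothetical, and the output is written as N-Triples lines built by hand. Step 1 cleans the rows; step 2 turns each cleaned row into a small graph fragment.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def clean(rows):
    """Step 1: tabular cleaning. Keep rows with a city and a numeric population."""
    for row in rows:
        city = row["city"].strip()
        population = row["population"].strip()
        if city and population.isdigit():
            yield {"city": city, "population": int(population)}


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Step 2: one graph fragment (two triples) per cleaned row."""
    subject = f"<{BASE}city/{quote(row['city'].lower())}>"
    label = escape_literal(row["city"])
    yield f'{subject} <{RDFS_LABEL}> "{label}" .'
    yield f'{subject} <{BASE}def/population> "{row["population"]}"^^<{XSD_INTEGER}> .'


def to_ntriples(lines):
    """Run both steps lazily over an iterable of CSV lines."""
    for row in clean(csv.DictReader(lines)):
        yield from row_to_triples(row)


# Hypothetical sample: the second row fails cleaning and produces no triples.
sample = io.StringIO("city,population\nOslo,700000\nNew York,n/a\n")
triples = list(to_ntriples(sample))
```

Keeping the cleaning step separate from the graph templates means the same cleaned table could also be written back out as tabular data.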
12. Grafterizer
• GUI tool for the Grafter suite
• Open Source (EPL)
– github.com/dapaas/grafterizer
13. Use Case: Transformation and Mapping to
RDF
• Import raw data
• Clean up and transform using Grafter / Grafterizer
• Define ontology mapping using Grafterizer
• Generate the RDF graph
[Diagram labels: Raw Data, Transform, Prepared Data, Map (ontology mapping to Ontology X), Generate RDF, RDF Graph]
14. RDF database-as-a-service
• Enables live data services, instead of static datasets
– A new RDF database can be operational within seconds
• Automated backups, operations, maintenance
• Based on an enterprise-grade RDF database
– Linked Data Fragments servers to be deployed too
• Designed for scalability & availability, in the cloud
• Data import services (Grafter pipelines)
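To illustrate what a live data service offers over a static dataset, here is a minimal sketch of how a client might prepare a query. It assumes the hosted database exposes a standard SPARQL 1.1 Protocol endpoint, which the slides do not state explicitly; the endpoint URL is hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; a real one would come from the hosting platform.
ENDPOINT = "http://example.org/repositories/demo"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    params = urlencode({"query": query})
    return Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_sparql_request(
    ENDPOINT,
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
```

Sending the request with `urllib.request.urlopen(request)` would return the current state of the data, so consumers would not need to download and re-parse a file each time it changes.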
15. Summary
• Open Data has big potential for governments,
enterprises and citizens
• Lots of open datasets available, but very few actually
used
• Linked Data is a promising technology for Open Data,
but still difficult to use for publishers and application
developers
• DaPaaS – enabling low-cost Open (Linked) Data
publishing and reuse
– Platform, portal, methodology, APIs
– Repeatable and scalable data transformations
– Scalable Linked Data hosting