2. “Data is the new oil”
…but many of us just need gasoline
Data-as-a-Service
…is the new filling station
3. Data-as-a-Service
• Outsourcing of various data operations to the cloud
• Eliminates
– upfront costs on data infrastructure
– ongoing investment of time and resources in managing the data infrastructure
• Complete package for
– transformation of raw data into meaningful data assets
– reliable delivery of data assets
4. Example #1: Using open data – petroleum activities on the Norwegian continental shelf
Tabular data on the Web (factpages.npd.no, data.brreg.no/oppslag/enhetsregisteret)
• ~70 tabular datasets
• Difficult to query across tables or integrate with other data, e.g. the Business Registry
Integration and querying service
• Simplified integration with external datasets
• Distribution of integrated dataset
• Live service
• Reliable access
• …
Data Insights
• Which companies have been owners in license X?
• What is the oil production for each field in year X?
• What is the total production of the top 10 companies by number of employees in year X?
• …
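The question about the top companies by number of employees combines two sources: production figures from the petroleum tables and employee counts from the business registry. A minimal Python sketch of that join follows. The record layouts, column names, and values are hypothetical and made up for illustration; they do not come from factpages.npd.no or data.brreg.no.

```python
from collections import defaultdict

# Hypothetical, illustrative rows; the real datasets use other columns and values.
production = [
    {"company": "A", "field": "F1", "year": 2014, "oil": 10.0},
    {"company": "A", "field": "F2", "year": 2014, "oil": 5.0},
    {"company": "B", "field": "F1", "year": 2014, "oil": 7.5},
    {"company": "C", "field": "F3", "year": 2013, "oil": 2.0},
]
registry = [
    {"company": "A", "employees": 300},
    {"company": "B", "employees": 1200},
    {"company": "C", "employees": 50},
]


def total_production_of_top_companies(production, registry, year, top_n):
    """Sum production in `year` for the `top_n` companies with the most employees."""
    ranked = sorted(registry, key=lambda r: r["employees"], reverse=True)
    top = {r["company"] for r in ranked[:top_n]}
    totals = defaultdict(float)
    for row in production:
        if row["year"] == year and row["company"] in top:
            totals[row["company"]] += row["oil"]
    return dict(totals)


result = total_production_of_top_companies(production, registry, year=2014, top_n=2)
```

The join key here is a shared company identifier, which is exactly what separate tabular sources often lack; that gap is what an integration service has to close.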
5. Example #2: Reporting state-owned real estate properties in Norway
Report
• Published as a 314-page hard copy and as a PDF file
• 6 person-months
• Data collection with spreadsheets
• Quality assurance through e-mails and phone correspondence
Pains
• Time-consuming
• Poor data quality
• Static report without live updating
Reporting Service
• Live service
• Efficient sharing of data
• Simplified integration with external datasets
• Live updating
• Reliable access
• …
3rd party services
• Risk and vulnerability analysis, e.g. buildings affected by flooding
• Analysis of leasing prices
7. Example #3: Personalized and Localized Urban Quality Index (PLUQI)
The index includes data from various domains:
• Daily life satisfaction: weather, transportation, community, …
• Healthcare level: number of doctors, hospitals, suicide statistics, …
• Safety and security: number of police stations, fire stations, crimes per capita, …
• Financial satisfaction: prices, incomes, housing, savings, debt, insurance, pension, …
• Level of opportunity: jobs, unemployment, education, re-education, …
• Environmental needs and efficiency: green space, air quality, …
9. DataGraft was developed to allow data workers to manage their data in a simple, effective, and efficient way
Powerful data transformation and reliable data access capabilities
10. Tabular Data and Graph Data
Tabular Data
• Open Data is mostly tabular data
• Excel, CSV, TSV, etc.
• Records organized in silos of collections
• Very few links within and/or across collections
• Difficult to understand the nature of the data
• Difficult to integrate / query
Graph Data
Based on Linked Data
• Method for publishing data on the Web
• Self-describing data and relations
• Interlinking
• Accessed using semantic queries
• Open standards by W3C
– Data format: RDF
– Knowledge representation: RDFS/OWL
– Query language: SPARQL
http://www.w3.org/standards/semanticweb/data
europeandataportal.eu
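To make the step from tabular data to graph data concrete, the sketch below turns the rows of a small CSV table into RDF triples in N-Triples syntax, using only the Python standard library. The base URI `http://example.org/`, the column names, and the sample rows are assumptions for illustration; this is not a description of how DataGraft performs the mapping.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

# Hypothetical sample table: one row per field, "operator" refers to another record.
SAMPLE_CSV = "id,name,operator\nf1,Field One,c1\nf2,Field Two,c1\n"


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, key_column, link_columns=()):
    """Emit one triple per non-key cell; cells in `link_columns` become URIs."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column:
                continue
            predicate = f"<{BASE}property/{column}>"
            if column in link_columns:
                obj = f"<{BASE}resource/{value}>"  # a link to another resource
            else:
                obj = f'"{escape_literal(value)}"'  # a plain string literal
            triples.append(f"{subject} {predicate} {obj} .")
    return triples


triples = rows_to_ntriples(SAMPLE_CSV, key_column="id", link_columns=("operator",))
```

Treating the `operator` column as a link rather than a string is what creates the interlinking that plain tables lack: both fields now point at the same resource `c1`.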
11. Data Transformation and RDF Publication Process
• Interactive design of transformations?
• Repeatable transformations?
• Reuse/share transformations (user-based access)?
• Cloud-based deployment of transformations?
• Self-serviced process?
• Data and Transformation as-a-Service?
Semantic graph database
32.
Data records (rows)
• Add row
• Take row(s)
• Drop row(s)
• Shift row
• Filter rows (grep)
• Remove duplicate rows
Entire dataset
• Sort
• Reshape dataset
• Group (categorize) and aggregate
Columns
• Add column(s)
• Take column(s)
• Drop column(s)
• Move column
• Merge columns
• Split column
• Rename column(s)
• Apply function to all values in a column
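A few of the operations listed above can be sketched as plain Python functions over a list of row dictionaries and chained into a repeatable pipeline. The function names and the sample rows are hypothetical and chosen for illustration; they are not DataGraft's own API.

```python
from collections import defaultdict


def filter_rows(rows, predicate):
    """Filter rows: keep only the rows for which `predicate` is true."""
    return [row for row in rows if predicate(row)]


def remove_duplicate_rows(rows):
    """Remove duplicate rows, keeping the first occurrence of each."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def rename_column(rows, old, new):
    """Rename column `old` to `new` in every row."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def apply_to_column(rows, column, func):
    """Apply `func` to all values in one column."""
    return [{**row, column: func(row[column])} for row in rows]


def group_and_sum(rows, group_column, value_column):
    """Group rows by `group_column` and sum `value_column` per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_column]] += row[value_column]
    return [{group_column: k, value_column: v} for k, v in totals.items()]


# Hypothetical raw spreadsheet rows: strings everywhere, a duplicate, a blank key.
rows = [
    {"city": "oslo", "pop": "10"},
    {"city": "oslo", "pop": "10"},
    {"city": "bergen", "pop": "4"},
    {"city": "oslo", "pop": "5"},
    {"city": "", "pop": "1"},
]

cleaned = filter_rows(rows, lambda r: r["city"] != "")
cleaned = remove_duplicate_rows(cleaned)
cleaned = apply_to_column(cleaned, "pop", int)
cleaned = rename_column(cleaned, "pop", "population")
summary = group_and_sum(cleaned, "city", "population")
```

Because each step is a pure function that returns a new list, the same sequence can be rerun unchanged on an updated spreadsheet.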
38. Data pages and federated querying
What is the population of locations and total number of persons employed in Human health and social work activities?
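A question like this one spans two datasets, which SPARQL 1.1 can express with the `SERVICE` keyword for federated queries. The sketch below only builds the query text in Python. The endpoint URLs, the `ex:` vocabulary, and the property names are hypothetical; the actual data pages and their schemas are not given in the slides.

```python
def build_federated_query(population_endpoint, employment_endpoint, activity_label):
    """Build a SPARQL 1.1 federated query joining two endpoints on ?location."""
    # The ex: vocabulary below is invented for illustration.
    return f"""
PREFIX ex: <http://example.org/vocab#>
SELECT ?location ?population (SUM(?employed) AS ?totalEmployed)
WHERE {{
  SERVICE <{population_endpoint}> {{
    ?location ex:population ?population .
  }}
  SERVICE <{employment_endpoint}> {{
    ?obs ex:location ?location ;
         ex:activity "{activity_label}" ;
         ex:personsEmployed ?employed .
  }}
}}
GROUP BY ?location ?population
""".strip()


query = build_federated_query(
    "https://example.org/population/sparql",   # hypothetical endpoint
    "https://example.org/employment/sparql",   # hypothetical endpoint
    "Human health and social work activities",
)
```

The join works only if both datasets identify a location with the same URI, which is the interlinking that Linked Data publishing is meant to provide.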
44. DataGraft key feature: Flexible management and sharing of data and transformations
• Fork, reuse and extend transformations built by other professionals from DataGraft’s transformations catalogue
• Interactively build, modify and share data transformations
• Share transformations privately or publicly
• Reuse transformations to repeatably clean and transform spreadsheet data
• Programmatically access transformations and the transformation catalogue
45. Reuse of transformations in environmental data publishing
TRAGSA Pilot
• Number of transformations: 42
– Created via reuse: 25
• Number of triples: ~7.7M
ARPA Pilot
• Number of transformations: 5
– Created via reuse: 2
• Number of triples: ~14K
Forking/reusing transformations helped us spend less time on creating new transformations
46. DataGraft key feature: Reliable data hosting and querying services
• Host data on DataGraft’s reliable, cloud-based semantic graph database
• Share data privately or publicly
• Query data through your own SPARQL endpoint
• Programmatically access the data catalogue
Operations & maintenance performed on behalf of users
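Querying a SPARQL endpoint from an application usually follows the SPARQL 1.1 Protocol: the query is sent as a `query` parameter over HTTP and the result format is chosen with the `Accept` header. A minimal standard-library sketch follows. The endpoint URL is a placeholder, not a real DataGraft address, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url, query):
    """Build an HTTP GET request for a SPARQL SELECT query with JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint_url}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_sparql_request(
    "https://example.org/sparql",  # hypothetical endpoint URL
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
# To run the query, pass `request` to urllib.request.urlopen() and parse the
# response body with json.load(); that step needs a reachable endpoint.
```

A private endpoint would also need some form of authentication; the slides do not say which mechanism is used, so none is shown.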
48. DataGraft – 1 package, 2 audiences
• Data Publisher: helping to integrate and publish data
• Application Developer: giving better, easier tools
49. DataGraft – targeted impacts
• Reduction in costs for organisations which lack sufficient expertise and resources to make their data available
• Reduction in the dependency of data owners on generic Cloud platforms to build, deploy and maintain their linked data from scratch
• Increase in the speed of publishing new datasets and updating existing datasets
• Reduction in the cost and complexity of developing applications that use data
• Increase in the reuse of data by providing reliable access to numerous datasets hosted on DataGraft.net
50. Example: The benefit of DataGraft in PLUQI
Before: Datasets gathering → Data transformation → Data provisioning/access → Implementing App
After (with DataGraft): Datasets gathering → Data transformation → Data provisioning/access → Implementing App
1. 23% reduction in development cost
• Reducing cost for implementing transformations
• Integrating the process is simpler
2. Able to focus on service quality
• Gathering enough good datasets
• Designing/implementing
51. DataGraft in numbers (as of end of Jan 2016)
• Registered users: 238
• Registered data transformations: 607 (208 public)
• Uploaded files: 1828
• Public data pages: 192
52. DataGraft in the wild
• Investigating crime data in small geographies
• Used DataGraft to transform data and publish RDF
http://benproctor.co.uk/investigating-crime-data-at-small-geographies/
53. Data Science and DataGraft
Greater Data Science:
1. Data Exploration and Preparation
2. Data Representation and Transformation
3. Computing with Data
4. Data Visualization and Presentation
5. Data Modeling
6. Science about Data Science
“50 years of Data Science” by David Donoho
http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf
DataGraft
54. Summary
• DataGraft – emerging Data-as-a-Service solution for making (linked) data more accessible
– Platform, portal, methodology, APIs
– Online service, functional and documented
– Validated through several use cases
• Key features:
– Support for Sharable/Repeatable/Reusable Data Transformations
– Reliable RDF Database-as-a-Service