Data-as-a-Service
DataGraft
Dumitru Roman
dumitru.roman@sintef.no
https://datagraft.net
“Data is the new oil”
…but many of us just need gasoline
Data-as-a-Service
…is the new filling station
Data-as-a-Service
• Outsourcing of various data operations to the cloud
• Eliminates
– upfront costs on data infrastructure
– ongoing investment of time and resources in managing the data infrastructure
• Complete package for
– transformation of raw data into meaningful data assets
– reliable delivery of data assets
Example #1: Using open data – petroleum activities on the Norwegian continental shelf
Tabular data on the Web (factpages.npd.no, data.brreg.no/oppslag/enhetsregisteret)
• ~70 tabular datasets
• Difficult to query across tables or integrate with other data, e.g. the Business Registry
Integration and querying service
• Simplified integration with external datasets
• Distribution of integrated dataset
• Live service
• Reliable access
• …
Data insights
• Which companies have been owners in license X?
• What is the oil production for each field in year X?
• What is the total production of the top 10 companies by number of employees in year X?
• …
Example #2: Reporting state-owned real estate properties in Norway
Report
• Published as a 314-page hard copy and as a PDF file
• 6 person-months
• Data collection with spreadsheets
• Quality assurance through e-mails and phone correspondence
Pains
• Time consuming
• Poor data quality
• Static report without live updating
Reporting service
• Live service
• Efficient sharing of data
• Simplified integration with external datasets
• Live updating
• Reliable access
• …
3rd party services
• Risk and vulnerability analysis, e.g. buildings affected by flooding
• Analysis of leasing prices
Sample data
Cleaning, Transformation, Publishing, Integration, Querying, Visualization, Service Access
Example #3: Personalized and Localized Urban Quality Index (PLUQI)
The index includes data from various domains:
• Daily life satisfaction: weather, transportation, community, …
• Healthcare level: number of doctors, hospitals, suicide statistics, …
• Safety and security: number of police stations, fire stations, crimes per capita, …
• Financial satisfaction: prices, incomes, housing, savings, debt, insurance, pension, …
• Level of opportunity: jobs, unemployment, education, re-education, …
• Environmental needs and efficiency: green space, air quality, …
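The slides do not say how PLUQI combines its domains. One simple possibility, shown here purely as an assumption, is a weighted average of per-domain scores in which the weights carry the personalization. The domain names come from the slide; the scores, the weights, and the `pluqi_score` function are hypothetical.

```python
# Hypothetical sketch: a personalized index as a weighted average of domain scores.
# The weighting scheme is an assumption, not the published PLUQI method.

DOMAINS = [
    "daily_life_satisfaction",
    "healthcare_level",
    "safety_and_security",
    "financial_satisfaction",
    "level_of_opportunity",
    "environmental_needs_and_efficiency",
]


def pluqi_score(scores, weights):
    """Return the weighted average of domain scores (each assumed to lie in 0..100)."""
    total_weight = sum(weights[d] for d in DOMAINS)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(scores[d] * weights[d] for d in DOMAINS) / total_weight


# Made-up scores for one location and made-up weights for one user.
scores = dict(zip(DOMAINS, [70, 80, 60, 50, 40, 90]))
weights = dict(zip(DOMAINS, [1, 2, 1, 1, 1, 2]))
index = pluqi_score(scores, weights)
```

A user who cares more about healthcare and the environment doubles those two weights, which pulls the index towards the scores of those domains.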
Sample data
DataGraft was developed to allow data workers to manage their data in a simple, effective, and efficient way
Powerful data transformation and reliable data access capabilities
Tabular Data
• Open Data is mostly tabular data
• Excel, CSV, TSV, etc.
• Records organized in silos of collections
• Very few links within and/or across collections
• Difficult to understand the nature of the data
• Difficult to integrate / query
Graph Data
Based on Linked Data
• Method for publishing data on the Web
• Self-describing data and relations
• Interlinking
• Accessed using semantic queries
• Open standards by W3C
– Data format: RDF
– Knowledge representation: RDFS/OWL
– Query language: SPARQL
http://www.w3.org/standards/semanticweb/data
europeandataportal.eu
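To make the step from tabular to graph data concrete, the sketch below maps one record to RDF triples and serializes them as N-Triples. It is a minimal illustration and not DataGraft's own mapping: the `http://example.org/` namespace, the column names, and the helper functions are assumptions, and identifiers are assumed to be safe to embed in a URI.

```python
# Sketch: turn one tabular record into RDF triples serialized as N-Triples.
# The namespace and column names are made up for illustration.

BASE = "http://example.org/"  # hypothetical namespace


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row, id_column):
    """Map a row (dict of column -> value) to a list of N-Triples lines.

    The id column becomes the subject URI; every other column becomes a
    predicate whose object is a plain string literal.
    """
    subject = f"<{BASE}resource/{row[id_column]}>"
    triples = []
    for column, value in row.items():
        if column == id_column:
            continue
        predicate = f"<{BASE}vocab/{column}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


row = {"id": "field-1", "name": "Example Field", "year": "2015"}
lines = row_to_ntriples(row, "id")
```

Each non-id cell yields one subject, predicate, object statement, which is what allows records from different tables to be linked through shared URIs.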
Data Transformation and RDF Publication Process
• Interactive design of transformations?
• Repeatable transformations?
• Reuse/share transformations (user-based access)?
• Cloud-based deployment of transformations?
• Self-serviced process?
• Data and Transformation as-a-Service?
DataGraft: Data-as-a-Service
For the Data Transformation and RDF Publication Process
Diagram labels: Tabular Data, Graph Data, Semantic graph database
Example: Using statistical data
https://www.ssb.no/statistikkbanken
Data records (rows)
• Add row
• Take row(s)
• Drop row(s)
• Shift row
• Filter rows (grep)
• Remove duplicate rows
Entire dataset
• Sort
• Reshape dataset
• Group (categorize) and aggregate
Columns
• Add column(s)
• Take column(s)
• Drop column(s)
• Move column
• Merge columns
• Split column
• Rename column(s)
• Apply function to all values in a column
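A few of the operations listed above can be sketched on a table held as a list of dictionaries. The function names and the sample rows are illustrative assumptions and are not DataGraft's API.

```python
# Sketch of four of the listed operations on a table held as a list of dicts.
# Function names are illustrative and are not DataGraft's API.
import re


def filter_rows(rows, column, pattern):
    """Filter rows (grep): keep rows whose value in `column` matches `pattern`."""
    regex = re.compile(pattern)
    return [r for r in rows if regex.search(str(r[column]))]


def remove_duplicate_rows(rows):
    """Remove duplicate rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))  # assumes hashable cell values
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def rename_column(rows, old, new):
    """Rename a column in every row."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


def apply_to_column(rows, column, func):
    """Apply a function to all values in a column."""
    return [{**r, column: func(r[column])} for r in rows]


# Made-up sample rows.
rows = [
    {"place": "alpha", "pop": "10"},
    {"place": "alpha", "pop": "10"},
    {"place": "beta", "pop": "5"},
]
table = remove_duplicate_rows(rows)
table = apply_to_column(table, "pop", int)
table = rename_column(table, "pop", "population")
table = filter_rows(table, "place", "^a")
```

Because every step takes a table and returns a new one, the same sequence can be rerun on a fresh upload, which is the sense in which a transformation is repeatable.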
Data pages and federated querying
What is the population of locations and total number of persons employed in Human health and social work activities?
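A question like this spans two datasets, which is where the SPARQL 1.1 `SERVICE` keyword comes in: it delegates part of a query to another endpoint. The sketch below only builds the query text. The endpoint URL and the `ex:` vocabulary are hypothetical placeholders, not the actual DataGraft data model.

```python
# Sketch: build a SPARQL 1.1 federated query as a string.
# The endpoint URL and the ex: vocabulary are hypothetical placeholders.

def build_federated_query(remote_endpoint, activity):
    """Join local population data with employment data from a remote endpoint."""
    return f"""PREFIX ex: <http://example.org/vocab#>
SELECT ?location ?population ?employed
WHERE {{
  ?location ex:population ?population .
  SERVICE <{remote_endpoint}> {{
    ?obs ex:location ?location ;
         ex:activity "{activity}" ;
         ex:personsEmployed ?employed .
  }}
}}"""


query = build_federated_query(
    "http://example.org/employment/sparql",
    "Human health and social work activities",
)
```

The shared `?location` variable is what joins the two sources, so both datasets need to identify locations with the same URIs.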
Configuring data visualizations
APIs
DataGraft key feature: Flexible management and sharing of data and transformations
• Fork, reuse and extend transformations built by other professionals from DataGraft’s transformations catalogue
• Interactively build, modify and share data transformations
• Share transformations privately or publicly
• Reuse transformations to repeatably clean and transform spreadsheet data
• Programmatically access transformations and the transformation catalogue
Reuse of transformations in environmental data publishing
TRAGSA Pilot
• Number of transformations: 42
– Created via reuse: 25
• Number of triples: ~7.7M
ARPA Pilot
• Number of transformations: 5
– Created via reuse: 2
• Number of triples: ~14K
Forking/reusing transformations helped us spend less time on creating new transformations
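Expressed as shares, the figures above mean that roughly 60% of the TRAGSA transformations (25 of 42) and 40% of the ARPA transformations (2 of 5) were created via reuse. The short calculation below uses only the numbers from the slide; the variable and function names are illustrative.

```python
# Share of transformations created via reuse, using the figures from the slide.
pilots = {
    "TRAGSA": {"transformations": 42, "created_via_reuse": 25},
    "ARPA": {"transformations": 5, "created_via_reuse": 2},
}


def reuse_share(pilot):
    """Return the fraction of a pilot's transformations that were created via reuse."""
    return pilot["created_via_reuse"] / pilot["transformations"]


# Percentages rounded to one decimal place.
shares = {name: round(100 * reuse_share(p), 1) for name, p in pilots.items()}
```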
DataGraft key feature: Reliable data hosting and querying services
• Host data on DataGraft’s reliable, cloud-based semantic graph database
• Share data privately or publicly
• Query data through your own SPARQL endpoint
• Programmatically access the data catalogue
• Operations & maintenance performed on behalf of users
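Querying a SPARQL endpoint from a program typically follows the SPARQL 1.1 Protocol: the query travels in a `query` parameter and the `Accept` header selects the result format. The sketch below prepares such a request with the Python standard library without sending it. The endpoint URL is a placeholder, and any authentication a private dataset might need is not covered.

```python
# Sketch: prepare a SPARQL Protocol query request with the standard library.
# The endpoint URL is a placeholder; a real one would come from the data page.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint


def build_sparql_request(endpoint, query):
    """Return a GET request that asks the endpoint for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
# To send it: pass `request` to urllib.request.urlopen and parse the JSON body.
```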
DataGraft Enablers
• Grafter
• Grafterizer
• Semantic Graph DBaaS
• Data Portal
DataGraft – 1 package, 2 audiences
• Data Publisher: help with integrating and publishing data
• Application Developer: better, easier tools
DataGraft – targeted impacts
• Reduction in costs for organisations which lack sufficient expertise and resources to make their data available
• Reduction in the dependency of data owners on generic Cloud platforms to build, deploy and maintain their linked data from scratch
• Increase in the speed of publishing new datasets and updating existing datasets
• Reduction in the cost and complexity of developing applications that use data
• Increase in the reuse of data by providing reliable access to numerous datasets hosted on DataGraft.net
Example: The benefit of DataGraft in PLUQI
Process steps, before and after (with DataGraft): Datasets gathering, Data transformation, Data provisioning/access, Implementing App
1. 23% reduction in development cost
• Reducing cost for implementing transformations
• Integrating the process is simpler
2. Able to focus on service quality
• Gathering enough good datasets
• Designing/implementing
DataGraft in numbers (as of end of Jan 2016)
• Registered users: 238
• Registered data transformations: 607 (208 public)
• Uploaded files: 1828
• Public data pages: 192
DataGraft in the wild
• Investigating crime data in small geographies
• Used DataGraft to transform data and publish RDF
http://benproctor.co.uk/investigating-crime-data-at-small-geographies/
Data Science and DataGraft
Greater Data Science:
1. Data Exploration and Preparation
2. Data Representation and Transformation
3. Computing with Data
4. Data Visualization and Presentation
5. Data Modeling
6. Science about Data Science
“50 years of Data Science” by David Donoho
http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf
DataGraft
Summary
• DataGraft – emerging Data-as-a-Service solution for making (linked) data more accessible
– Platform, portal, methodology, APIs
– Online service, functional and documented
– Validated through several use cases
• Key features:
– Support for Sharable/Repeatable/Reusable Data Transformations
– Reliable RDF Database-as-a-Service
https://datagraft.net
Thank you!
Contact: dumitru.roman@sintef.no

BabyOno dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 

Data-as-a-Service: DataGraft

  • 2. “Data is the new oil” …but many of us just need gasoline. Data-as-a-Service …is the new filling station.
  • 3. Data-as-a-Service
      • Outsourcing of various data operations to the cloud
      • Eliminates
          – upfront costs on data infrastructure
          – ongoing investment of time and resources in managing the data infrastructure
      • Complete package for
          – transformation of raw data into meaningful data assets
          – reliable delivery of data assets
  • 4. Example #1: Using open data – petroleum activities on the Norwegian continental shelf
      Tabular data on the Web (factpages.npd.no, data.brreg.no/oppslag/enhetsregisteret)
          • ~70 tabular datasets
          • Difficult to query across tables or to integrate with other data, e.g. the Business Registry
      Integration and querying service
          • Simplified integration with external datasets
          • Distribution of integrated dataset
          • Live service
          • Reliable access
          • …
      Data insights
          • Which companies have been owners in license X?
          • What is the oil production for each field in year X?
          • What is the total production of the top 10 companies by number of employees in year X?
          • …
  • 5. Example #2: Reporting state-owned real estate properties in Norway
      Report
          • Published as a 314-page hard copy and as a PDF file
          • 6 person-months of effort
          • Data collection with spreadsheets
          • Quality assurance through e-mail and phone correspondence
      Pains
          • Time consuming
          • Poor data quality
          • Static report without live updating
      Reporting service
          • Live service
          • Efficient sharing of data
          • Simplified integration with external datasets
          • Live updating
          • Reliable access
          • …
      3rd party services
          • Risk and vulnerability analysis, e.g. buildings affected by flooding
          • Analysis of leasing prices
  • 6. Sample data: Cleaning, Transformation, Publishing, Integration, Querying, Visualization, Service Access
  • 7. Example #3: Personalized and Localized Urban Quality Index (PLUQI)
      The index includes data from various domains:
          • Daily life satisfaction: weather, transportation, community, …
          • Healthcare level: number of doctors, hospitals, suicide statistics, …
          • Safety and security: number of police stations, fire stations, crimes per capita, …
          • Financial satisfaction: prices, incomes, housing, savings, debt, insurance, pension, …
          • Level of opportunity: jobs, unemployment, education, re-education, …
          • Environmental needs and efficiency: green space, air quality, …
  • 9. DataGraft was developed to allow data workers to manage their data in a simple, effective, and efficient way. It offers powerful data transformation and reliable data access capabilities.
  • 10. Tabular Data and Graph Data
      Tabular Data
          • Open Data is mostly tabular data
          • Excel, CSV, TSV, etc.
          • Records organized in silos of collections
          • Very few links within and/or across collections
          • Difficult to understand the nature of the data
          • Difficult to integrate / query
      Graph Data
          • Based on Linked Data
          • Method for publishing data on the Web
          • Self-describing data and relations
          • Interlinking
          • Accessed using semantic queries
          • Open standards by W3C
              – Data format: RDF
              – Knowledge representation: RDFS/OWL
              – Query language: SPARQL
      Sources: http://www.w3.org/standards/semanticweb/data, europeandataportal.eu
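The contrast on slide 10 between tabular records and self-describing graph data can be illustrated with a minimal sketch that turns CSV rows into RDF triples in N-Triples syntax. This is not DataGraft's or Grafter's implementation; the `http://example.org/` namespace, the sample rows, and the function names are all assumptions made for illustration, and every value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and sample data, used only for illustration.
BASE = "http://example.org/"

CSV_TEXT = """field,year,oil_production
FieldA,2014,12.5
FieldB,2014,3.0
"""


def literal(value: str) -> str:
    """Render a plain N-Triples string literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def rows_to_ntriples(text: str, key: str) -> list[str]:
    """Map each CSV row to one subject and one triple per remaining column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # The key column identifies the row and becomes the subject IRI.
        subject = f"<{BASE}resource/{quote(row[key])}>"
        for column, value in row.items():
            if column == key:
                continue
            # Each remaining column name becomes a predicate IRI.
            predicate = f"<{BASE}property/{quote(column)}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


triples = rows_to_ntriples(CSV_TEXT, key="field")
for triple in triples:
    print(triple)
```

Each row becomes a named resource, so rows from different tables that share an identifier end up describing the same node, which is what makes interlinking and querying across collections possible.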
  • 11. Data Transformation and RDF Publication Process (target: semantic graph database)
      • Interactive design of transformations?
      • Repeatable transformations?
      • Reuse/share transformations (user-based access)?
      • Cloud-based deployment of transformations?
      • Self-serviced process?
      • Data and Transformation as-a-Service?
  • 12. Tabular Data to Graph Data. DataGraft: Data-as-a-Service for the Data Transformation and RDF Publication Process
  • 32. Transformation operations
      Data records (rows)
          • Add row
          • Take row(s)
          • Drop row(s)
          • Shift row
          • Filter rows (grep)
          • Remove duplicate rows
      Entire dataset
          • Sort
          • Reshape dataset
          • Group (categorize) and aggregate
      Columns
          • Add column(s)
          • Take column(s)
          • Drop column(s)
          • Move column
          • Merge columns
          • Split column
          • Rename column(s)
          • Apply function to all values in a column
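A few of the operations listed on slide 32 can be sketched as small composable functions. The sketch below uses only the Python standard library and is not Grafter's or Grafterizer's API; the function names, column names, and sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
RAW = """name,city,employees
Acme,Oslo,10
Acme,Oslo,10
Beta,Bergen,250
Gamma,Oslo,40
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data record."""
    return list(csv.DictReader(io.StringIO(text)))


def remove_duplicate_rows(rows):
    """Keep the first occurrence of each identical row."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def rename_column(rows, old, new):
    """Rename one column, preserving column order."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def apply_to_column(rows, column, func):
    """Apply a function to all values in one column."""
    return [{**row, column: func(row[column])} for row in rows]


def sort_rows(rows, column):
    """Sort the entire dataset by one column."""
    return sorted(rows, key=lambda row: row[column])


def transform(text):
    """A repeatable pipeline: the same steps run on any input with these columns."""
    rows = read_rows(text)
    rows = remove_duplicate_rows(rows)
    rows = rename_column(rows, "employees", "employee_count")
    rows = apply_to_column(rows, "employee_count", int)
    rows = filter_rows(rows, lambda row: row["city"] == "Oslo")
    return sort_rows(rows, "employee_count")


result = transform(RAW)
print(result)
```

Because `transform` is an ordinary function of its input, it can be re-run unchanged whenever an updated spreadsheet arrives, which is the sense in which such transformations are repeatable and shareable.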
  • 38. Data pages and federated querying. Example question: What is the population of locations and total number of persons employed in Human health and social work activities?
  • 44. DataGraft key feature: Flexible management and sharing of data and transformations
      • Fork, reuse and extend transformations built by other professionals from DataGraft’s transformations catalog
      • Interactively build, modify and share data transformations
      • Share transformations privately or publicly
      • Reuse transformations to repeatably clean and transform spreadsheet data
      • Programmatically access transformations and the transformation catalogue
  • 45. Reuse of transformations in environmental data publishing
      TRAGSA Pilot
          • Number of transformations: 42
              – Created via reuse: 25
          • Number of triples: ~7.7M
      ARPA Pilot
          • Number of transformations: 5
              – Created via reuse: 2
          • Number of triples: ~14K
      “Forking/reusing transformations helped us spend less time on creating new transformations”
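The reuse figures on slide 45 can be expressed as a share of all transformations per pilot. The short calculation below uses only the counts stated on the slide; the variable and function names are assumptions.

```python
# Counts taken from slide 45; the names used here are hypothetical.
pilots = {
    "TRAGSA": {"transformations": 42, "created_via_reuse": 25},
    "ARPA": {"transformations": 5, "created_via_reuse": 2},
}


def reuse_percentage(pilot):
    """Percentage of a pilot's transformations that were created via reuse."""
    return 100 * pilot["created_via_reuse"] / pilot["transformations"]


shares = {name: round(reuse_percentage(p), 1) for name, p in pilots.items()}
print(shares)
```

Under this reading, roughly three in five TRAGSA transformations and two in five ARPA transformations were created by reusing existing ones.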
  • 46. DataGraft key feature: Reliable data hosting and querying services
      • Host data on DataGraft’s reliable, cloud-based semantic graph database
      • Share data privately or publicly
      • Query data through your own SPARQL endpoint
      • Programmatically access the data catalogue
      • Operations & maintenance performed on behalf of users
  • 47. DataGraft Enablers: Grafter, Grafterizer, Semantic Graph DBaaS, Data Portal
  • 48. DataGraft – 1 package, 2 audiences
      • Data Publisher: help with integrating and publishing data
      • Application Developer: better, easier tools
  • 49. DataGraft – targeted impacts
      • Reduction in costs for organisations which lack sufficient expertise and resources to make their data available
      • Reduction in the dependency of data owners on generic Cloud platforms to build, deploy and maintain their linked data from scratch
      • Increase in the speed of publishing new datasets and updating existing datasets
      • Reduction in the cost and complexity of developing applications that use data
      • Increase in the reuse of data by providing reliable access to numerous datasets hosted on DataGraft.net
  • 50. Example: The benefit of DataGraft in PLUQI
      Development stages, before and after (with DataGraft): Datasets gathering, Data transformation, Data provisioning/access, Implementing App
      1. 23% reduction in development cost
          • Reducing cost for implementing transformations
          • Integrating the process is simpler
      2. Able to focus on service quality
          • Gathering enough good datasets
          • Designing/implementing
  • 51. DataGraft in numbers (as of end of Jan 2016)
      • 238 registered users
      • 607 registered data transformations (208 public)
      • 1828 uploaded files
      • 192 public data pages
  • 52. DataGraft in the wild
      • Investigating crime data in small geographies
      • Used DataGraft to transform data and publish RDF
      Source: http://benproctor.co.uk/investigating-crime-data-at-small-geographies/
  • 53. Data Science and DataGraft
      Greater Data Science, from “50 years of Data Science” by David Donoho (http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf):
          1. Data Exploration and Preparation
          2. Data Representation and Transformation
          3. Computing with Data
          4. Data Visualization and Presentation
          5. Data Modeling
          6. Science about Data Science
  • 54. Summary
      • DataGraft – emerging Data-as-a-Service solution for making (linked) data more accessible
          – Platform, portal, methodology, APIs
          – Online service, functional and documented
          – Validated through several use cases
      • Key features:
          – Support for Sharable/Repeatable/Reusable Data Transformations
          – Reliable RDF Database-as-a-Service