SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
LODStats
Introduction
Description and System Architecture
Dataset Model
Use Cases
Agenda
Data Web Statistics (Summary)
Conclusions
How to comprehend this
data?
3
● Data portals
● Big nucleus datasets
● SPARQL endpoints
Introduction
9960+
RDF Datasets on the Data Portals
4
Calculate statistical metrics User interface
5
Aggregates datasets from the largest data portals
LODStats: Web Application
SPARQL interface
“LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
6
CKAN Aggregator
LODStats: System Architecture
Scan largest CKAN repos Filter out RDF datasets
“Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
7
LODStats core application
LODStats: System Architecture (cont.)
Queue RDF datasets Calculate statistics
LODStats:
Provisioning
Docker image per component
docker-compose.yml for the whole project
Sustainable and platform independent deployment
8
LODStats:
Provisioning (cont.)
9
web:
restart: always
build: ./web
links:
- db
- rabbitmq
environment:
- LODSTATS_DB=db
- RABBITMQ=rabbitmq
rabbitmq:
restart: always
image: rabbitmq:3.6.1
db:
restart: always
build: ./db
virtuoso:
restart: always
build: ./virtuoso
environment:
- DBA_PASSWORD=dba
- SPARQL_UPDATE=false
- DEFAULT_GRAPH=http://lodstats.aksw.org/
nginx:
build: ./nginx
restart: always
links:
- web
- virtuoso
environment:
- VIRTUAL_HOST=lodstats.aksw.org,stats.lod2.eu
LODStats:
Provisioning (cont.)
10
$ git pull https://github.com/AKSW/lodstats.docker
$ docker-compose build
$ docker-compose up -d
11
Data Model
12
Data Web Statistics Summary
More statistics are available from SPARQL endpoint
2011 2016
Datasets 422 9,644
Links 3% 40%
Data Portals datahub.io publicdata.eu,
data.gov, datahub.io
Privacy Analysis
Does dataset
contain sensitive
information?
Coverage Analysis
Does dataset
contain necessary
information?
Quality Analysis
Define quality
metrics using
statistical data.
Vocabulary Reuse
Find a suitable
vocabulary for
your dataset.
13
How can you use LODStats data?
Use Cases
Link Target Identification
Which datasets are good
candidates for
interlinking?
“Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
14
Sparql Endpoint
http://lodstats.aksw.org/sparql
Availability
● Application
○ Online at: http://lodstats.aksw.org
○ LODStats processing module: https://github.com/aksw/lodstats
○ LODStats frontend including SPARQLify mappings:
https://github.com/aksw/lodstats_www
○ Deployment setup (docker): https://github.com/AKSW/lodstats.docker
● Dataset
○ Online at: http://lodstats.aksw.org/sparql
○ Datahub.io: https://datahub.io/dataset/lodstats
○ Can be deployed in Virtuoso using docker-compose from deployment repo
Processing of very large datasets (Spark/Hadoop)
Improving usability of the frontend
Extending data collection to crawling
Conclusions & Future Work
LODStats is easily replicable using Docker technology
Augustusplatz 10,
Room P905,
04109 Leipzig, Germany
Address
+49-341-97-32260
Phone
iermilov@informatik.uni-leipzig.de
Email
twitter.com/akswgroup
http://aksw.com/IvanErmilov
17
Contact Information
Thank You
Ivan Ermilov <iermilov@informatik.uni-leipzig.de>
Linked Open Data Statistics: Collection and Exploitation by Ivan Ermilov, Michael Martin,
Jens Lehmann, and Sören Auer in Proceedings of the 4th Conference on Knowledge Engineering
and Semantic Web
LODStats---An Extensible Framework for High-performance Dataset Analytics by Jan
Demter, Sören Auer, Michael Martin, and Jens Lehmann in Proceedings of the EKAW 2012
References
1
2
Detecting Similar Linked Datasets Using Topic Modelling by Michael Röder, Axel-Cyrille
Ngonga Ngomo, Ivan Ermilov, and Andreas Both in The Semantic Web. Latest Advances and
New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 --
June 2, 2016, Proceedings
3

Weitere ähnliche Inhalte

Was ist angesagt?

Societal Challenge 6: Social Sciences - Spending Comparison
Societal Challenge 6: Social Sciences - Spending ComparisonSocietal Challenge 6: Social Sciences - Spending Comparison
Societal Challenge 6: Social Sciences - Spending ComparisonBigData_Europe
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBigData_Europe
 
SC1 Workshop 2 Technical overview
SC1 Workshop 2 Technical overviewSC1 Workshop 2 Technical overview
SC1 Workshop 2 Technical overviewBigData_Europe
 
Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)EOSC-hub project
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...BigData_Europe
 
Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alina Vilk
 
10 basic terms so you can talk to data engineer
10 basic terms so you can  talk to data engineer10 basic terms so you can  talk to data engineer
10 basic terms so you can talk to data engineerWorapol Alex Pongpech, PhD
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Sergio Fernández
 
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...OpenNebula Project
 
shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014
shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014
shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014Gerd König
 
ICIC 2016: New Product Introductions FIZ Karlsruhe / STN
ICIC 2016: New Product Introductions FIZ Karlsruhe / STNICIC 2016: New Product Introductions FIZ Karlsruhe / STN
ICIC 2016: New Product Introductions FIZ Karlsruhe / STNDr. Haxel Consult
 
Data access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery PortalData access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery PortalGasperi Jerome
 
Hadoop at LinkedIn
Hadoop at LinkedInHadoop at LinkedIn
Hadoop at LinkedInKeith Dsouza
 
Spark - The beginnings
Spark -  The beginningsSpark -  The beginnings
Spark - The beginningsDaniel Leon
 
New web service oriented ARC
New web service oriented ARCNew web service oriented ARC
New web service oriented ARCFerenc Szalai
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cMartin Toshev
 

Was ist angesagt? (20)

Societal Challenge 6: Social Sciences - Spending Comparison
Societal Challenge 6: Social Sciences - Spending ComparisonSocietal Challenge 6: Social Sciences - Spending Comparison
Societal Challenge 6: Social Sciences - Spending Comparison
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical Overview
 
SC1 Workshop 2 Technical overview
SC1 Workshop 2 Technical overviewSC1 Workshop 2 Technical overview
SC1 Workshop 2 Technical overview
 
Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
 
Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.
 
10 basic terms so you can talk to data engineer
10 basic terms so you can  talk to data engineer10 basic terms so you can  talk to data engineer
10 basic terms so you can talk to data engineer
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
 
shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014
shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014
shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014
 
ICIC 2016: New Product Introductions FIZ Karlsruhe / STN
ICIC 2016: New Product Introductions FIZ Karlsruhe / STNICIC 2016: New Product Introductions FIZ Karlsruhe / STN
ICIC 2016: New Product Introductions FIZ Karlsruhe / STN
 
Data access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery PortalData access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery Portal
 
Hadoop at LinkedIn
Hadoop at LinkedInHadoop at LinkedIn
Hadoop at LinkedIn
 
Spark - The beginnings
Spark -  The beginningsSpark -  The beginnings
Spark - The beginnings
 
New web service oriented ARC
New web service oriented ARCNew web service oriented ARC
New web service oriented ARC
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
Mastro
MastroMastro
Mastro
 

Andere mochten auch

Infographic - MSP AWS Migration
Infographic - MSP AWS MigrationInfographic - MSP AWS Migration
Infographic - MSP AWS MigrationCopperEgg
 
First Day
First Day First Day
First Day jmori1
 
Compare mysql5.1.50 mysql5.5.8
Compare mysql5.1.50 mysql5.5.8Compare mysql5.1.50 mysql5.5.8
Compare mysql5.1.50 mysql5.5.8Philip Zhong
 
Java swing tips
Java swing tipsJava swing tips
Java swing tipsTuan Ngo
 
Elements of Art - nf
Elements of Art - nfElements of Art - nf
Elements of Art - nfmindartpower
 
Assignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementAssignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementJyotpreet Kaur
 
Traditional may day celebrations
Traditional may day celebrationsTraditional may day celebrations
Traditional may day celebrationsbalada65
 
upcoming commercial projects in dwarka expressway 7428424386
upcoming commercial projects in dwarka expressway 7428424386upcoming commercial projects in dwarka expressway 7428424386
upcoming commercial projects in dwarka expressway 7428424386Adore Global Pvt. Ltd
 
Infographic - The State of Application Performance Monitoring
Infographic - The State of Application Performance MonitoringInfographic - The State of Application Performance Monitoring
Infographic - The State of Application Performance MonitoringCopperEgg
 
Kmeans vs kmeanspp_20151124
Kmeans vs kmeanspp_20151124Kmeans vs kmeanspp_20151124
Kmeans vs kmeanspp_20151124博三 太田
 
Track2 -刘继伟--openstack in gamewave
Track2 -刘继伟--openstack in gamewaveTrack2 -刘继伟--openstack in gamewave
Track2 -刘继伟--openstack in gamewaveOpenCity Community
 
Burgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 hBurgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 hMieke Sanden, van der
 
เครื่องใช้ไฟฟ้า
เครื่องใช้ไฟฟ้าเครื่องใช้ไฟฟ้า
เครื่องใช้ไฟฟ้าthananat
 
Backtoschoolnight
BacktoschoolnightBacktoschoolnight
Backtoschoolnighthdaleo
 
Toward a malaria-free world - Tools' information
Toward a malaria-free world - Tools' informationToward a malaria-free world - Tools' information
Toward a malaria-free world - Tools' informationXplore Health
 
How are drugs developed? - Tools' information
How are drugs developed? - Tools' informationHow are drugs developed? - Tools' information
How are drugs developed? - Tools' informationXplore Health
 
Burgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 hBurgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 hMieke Sanden, van der
 

Andere mochten auch (20)

Infographic - MSP AWS Migration
Infographic - MSP AWS MigrationInfographic - MSP AWS Migration
Infographic - MSP AWS Migration
 
First Day
First Day First Day
First Day
 
Compare mysql5.1.50 mysql5.5.8
Compare mysql5.1.50 mysql5.5.8Compare mysql5.1.50 mysql5.5.8
Compare mysql5.1.50 mysql5.5.8
 
Java swing tips
Java swing tipsJava swing tips
Java swing tips
 
Elements of Art - nf
Elements of Art - nfElements of Art - nf
Elements of Art - nf
 
1interview1 golda
1interview1 golda1interview1 golda
1interview1 golda
 
Assignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementAssignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute Management
 
Traditional may day celebrations
Traditional may day celebrationsTraditional may day celebrations
Traditional may day celebrations
 
upcoming commercial projects in dwarka expressway 7428424386
upcoming commercial projects in dwarka expressway 7428424386upcoming commercial projects in dwarka expressway 7428424386
upcoming commercial projects in dwarka expressway 7428424386
 
MyEpcTeam v1.1
MyEpcTeam v1.1MyEpcTeam v1.1
MyEpcTeam v1.1
 
Infographic - The State of Application Performance Monitoring
Infographic - The State of Application Performance MonitoringInfographic - The State of Application Performance Monitoring
Infographic - The State of Application Performance Monitoring
 
Kmeans vs kmeanspp_20151124
Kmeans vs kmeanspp_20151124Kmeans vs kmeanspp_20151124
Kmeans vs kmeanspp_20151124
 
Track2 -刘继伟--openstack in gamewave
Track2 -刘继伟--openstack in gamewaveTrack2 -刘继伟--openstack in gamewave
Track2 -刘继伟--openstack in gamewave
 
Burgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 hBurgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 h
 
เครื่องใช้ไฟฟ้า
เครื่องใช้ไฟฟ้าเครื่องใช้ไฟฟ้า
เครื่องใช้ไฟฟ้า
 
Backtoschoolnight
BacktoschoolnightBacktoschoolnight
Backtoschoolnight
 
Toward a malaria-free world - Tools' information
Toward a malaria-free world - Tools' informationToward a malaria-free world - Tools' information
Toward a malaria-free world - Tools' information
 
How are drugs developed? - Tools' information
How are drugs developed? - Tools' informationHow are drugs developed? - Tools' information
How are drugs developed? - Tools' information
 
Lightning Talk
Lightning TalkLightning Talk
Lightning Talk
 
Burgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 hBurgers tonen lef masterthese definitief 3 h
Burgers tonen lef masterthese definitief 3 h
 

Ähnlich wie Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016

BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overviewBigData_Europe
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Marc Dutoo
 
Introduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStackIntroduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStackAbderrahmane TEKFI
 
BigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigData_Europe
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesDataWorks Summit
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigData_Europe
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformMarc Dutoo
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Database Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wiDatabase Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wiOllieShoresna
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OW2
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware
 
OCCIware@OW2con 2016
OCCIware@OW2con 2016OCCIware@OW2con 2016
OCCIware@OW2con 2016Marc Dutoo
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OW2
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Stargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIStargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIData Con LA
 

Ähnlich wie Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016 (20)

BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
 
Introduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStackIntroduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStack
 
BigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE Platform
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query Engines
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal Pilots
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - Introduction
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Database Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wiDatabase Integrated Analytics using R InitialExperiences wi
Database Integrated Analytics using R InitialExperiences wi
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...
 
OCCIware@OW2con 2016
OCCIware@OW2con 2016OCCIware@OW2con 2016
OCCIware@OW2con 2016
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Stargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIStargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data API
 

Kürzlich hochgeladen

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016

  • 2. Introduction Description and System Architecture Dataset Model Use Cases Agenda Data Web Statistics (Summary) Conclusions
  • 3. How to comprehend this data? 3 ● Data portals ● Big nucleus datasets ● SPARQL endpoints Introduction
  • 4. 9960+ RDF Datasets on the Data Portals 4
  • 5. Calculate statistical metrics User interface 5 Aggregates datasets from the largest data portals LODStats: Web Application SPARQL interface “LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
  • 6. 6 CKAN Aggregator LODStats: System Architecture Scan largest CKAN repos Filter out RDF datasets “Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
  • 7. 7 LODStats core application LODStats: System Architecture (cont.) Queue RDF datasets Calculate statistics
  • 8. LODStats: Provisioning Docker image per component docker-compose.yml for the whole project Sustainable and platform independent deployment 8
  • 9. LODStats: Provisioning (cont.) 9 web: restart: always build: ./web links: - db - rabbitmq environment: - LODSTATS_DB=db - RABBITMQ=rabbitmq rabbitmq: restart: always image: rabbitmq:3.6.1 db: restart: always build: ./db virtuoso: restart: always build: ./virtuoso environment: - DBA_PASSWORD=dba - SPARQL_UPDATE=false - DEFAULT_GRAPH=http://lodstats.aksw.org/ nginx: build: ./nginx restart: always links: - web - virtuoso environment: - VIRTUAL_HOST=lodstats.aksw.org,stats.lod2.eu
  • 10. LODStats: Provisioning (cont.) 10 $ git pull https://github.com/AKSW/lodstats.docker $ docker-compose build $ docker-compose up -d
  • 12. 12 Data Web Statistics Summary More statistics are available from SPARQL endpoint 2011 2016 Datasets 422 9,644 Links 3% 40% Data Portals datahub.io publicdata.eu, data.gov, datahub.io
  • 13. Privacy Analysis Does dataset contain sensitive information? Coverage Analysis Does dataset contain necessary information? Quality Analysis Define quality metrics using statistical data. Vocabulary Reuse Find a suitable vocabulary for your dataset. 13 How can you use LODStats data? Use Cases Link Target Identification Which datasets are good candidates for interlinking? “Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
  • 15. Availability ● Application ○ Online at: http://lodstats.aksw.org ○ LODStats processing module: https://github.com/aksw/lodstats ○ LODStats frontend including SPARQLify mappings: https://github.com/aksw/lodstats_www ○ Deployment setup (docker): https://github.com/AKSW/lodstats.docker ● Dataset ○ Online at: http://lodstats.aksw.org/sparql ○ Datahub.io: https://datahub.io/dataset/lodstats ○ Can be deployed in Virtuoso using docker-compose from deployment repo
  • 16. Processing of very large datasets (Spark/Hadoop) Improving usability of the frontend Extending data collection to crawling Conclusions & Future Work LODStats is easily replicable using Docker technology
  • 17. Augustusplatz 10, Room P905, 04109 Leipzig, Germany Address +49-341-97-32260 Phone iermilov@informatik.uni-leipzig.de Email twitter.com/akswgroup http://aksw.com/IvanErmilov 17 Contact Information
  • 18. Thank You Ivan Ermilov <iermilov@informatik.uni-leipzig.de>
  • 19. Linked Open Data Statistics: Collection and Exploitation by Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer in Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web LODStats---An Extensible Framework for High-performance Dataset Analytics by Jan Demter, Sören Auer, Michael Martin, and Jens Lehmann in Proceedings of the EKAW 2012 References 1 2 Detecting Similar Linked Datasets Using Topic Modelling by Michael Röder, Axel-Cyrille Ngonga Ngomo, Ivan Ermilov, and Andreas Both in The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 -- June 2, 2016, Proceedings 3