Autodiscovery
or
The long tail of open data
Christopher Gutteridge
University of Southampton
& data.ac.uk
Bragsheet
Christopher Gutteridge - @cgutteridge
• Previously: Lead Developer of EPrints
(Open access research repository software).
• “Linked Open Data Architect” for University of
Southampton.
(or whatever we currently call doing LOD stuff for an
organisation)
• Benevolent technical dictator of data.ac.uk
(recently deposed)
• Webmaster WWW2006
• Assistant Webmaster WWW2007, WWW2009
Image Attributions:
• Backgrounds:
– http://www.fansshare.com/gallery/photos/14646865/abstract-background-brown-and-blue-circles/
– http://www.pptback.com/old-machine-gears-pptbackground.html
• Cliff leap pic: Justin De La Ornellas @ Flickr
• Train tracks: duncanh1 @ Flickr
• Lego bricks: rawdonfox @ Flickr
• Meccano box: Lady alys @ Wikipedia
• Stickle Bricks: Simon Jobling @ Flickr
• Free Universal Construction Kit: F.A.T. Lab + Sy-Lab.
• Telescope: Brongaeh @ Flickr
• Pinata: Peasap @ Flickr
• Containers: l2f1 @ Flickr
Why don’t organisations
share data?
(and what stops them)
We early adopters have shared
data because it’s cool.
We were not 100% clear on the
benefits, but it looked like fun and
might gain us reputation.
Fear. Uncertainty. Doubt.
Open Data Excuse Bingo
• Terrorists will use it
• We'll get spam
• It's too big
• It's not very interesting
• Thieves will use it
• I don't mind, but someone else might
• We will get too many enquiries
• Lawyers want a custom licence
• There's no API
• Poor quality
• There's already a project to...
• We might want to use it in a paper
• It's too complicated
• Data Protection
• People may misinterpret the data
• What if we want to sell it later
Don’t get depressed! Go here for antidotes: http://is.gd/odbingo
Menu
Burger ….. £3.50
Chips ….. £1.50
≠
Greater than the sum of
its parts
Interoperable datasets
allow results that are
greater than the sum
of the parts…
http://bus.southampton.ac.uk/
http://www.minecraftworldmap.com/worlds/xO3X4/full#/4469/64/-1806/-3/0/0
data.southampton.ac.uk
Discrete
Facts
Statistics
What I want from data
• Where am I going?
• How can I get there?
• Where can I get a coffee en route?
Why aren’t they using
our data?
“If you build it, they will come.”
Probability of open dataset reuse =
Value of dataset to audience
× Potential audience size
× Ease of discovery
× Ease of grasping the value of the dataset
× Ease of exploiting dataset
× Perceived quality & reliability
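A rough illustration of why the factors are multiplied rather than added: the sketch below scores each factor between 0 and 1 and takes the product. The function name, the factor keys, the 0 to 1 scale and the example scores are all assumptions made up for this illustration, not figures from the talk.

```python
from math import prod

# Factor names follow the slide; scoring each on a 0..1 scale is an assumption.
FACTORS = (
    "value_to_audience",
    "potential_audience_size",
    "ease_of_discovery",
    "ease_of_grasping_value",
    "ease_of_exploiting",
    "perceived_quality_and_reliability",
)


def reuse_score(scores: dict[str, float]) -> float:
    """Multiply all factor scores; a factor that is missing counts as 0."""
    return prod(scores.get(name, 0.0) for name in FACTORS)


# Hypothetical dataset: strong on every factor except discovery.
hidden_gem = dict.fromkeys(FACTORS, 0.9)
hidden_gem["ease_of_discovery"] = 0.0

# The same dataset once it is easy to find.
discoverable = dict(hidden_gem, ease_of_discovery=0.9)

print(reuse_score(hidden_gem))    # one zero factor wipes out the whole product
print(reuse_score(discoverable))
```

Because the terms multiply, improving the weakest factor (often discovery) is likely to matter more than polishing one that is already strong.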
…Autodiscoverable
and interoperable data
can massively increase
the potential audience
$ ./generate-world Demo --postcode PO381NL --size 250
data.ac.uk
• Automatically discovers equipment data from
all .ac.uk sites
– 2769 websites
– 42 providing data
– 11,028 records
• Automation massively reduces staffing costs
• Low effort for institutions:
– A third just provide a well-structured spreadsheet!
• Not a single-point-of-failure
.ac.uk
UK National Equipment Portal
http://equipment.data.ac.uk
UNIQUIP
Column Heading                 Required
Type                           No
Name, Description              At least one of these fields must be completed.
Related Facility ID            No
Technique(:cpv) or (:N8)       No
Location                       No
Contact Name                   No
Contact Telephone, Contact URL, Contact Email
                               At least one of these fields must be completed.
Secondary Contact Name         No
Secondary Contact Telephone, Secondary Contact URL, Secondary Contact Email
                               At least one of these fields must be completed with second contact name.
ID                             No
Photo                          No
Department                     No
Site Location                  Yes
Building                       No
Service Level                  No
Web Address                    No
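The required-field rules above could be checked per spreadsheet row with something like the following. This is a minimal sketch, not the validator used by equipment.data.ac.uk: the function name and the error wording are invented here, and it assumes the secondary contact rule only applies once a Secondary Contact Name has been given.

```python
PRIMARY_CONTACT = ("Contact Telephone", "Contact URL", "Contact Email")
SECONDARY_CONTACT = (
    "Secondary Contact Telephone",
    "Secondary Contact URL",
    "Secondary Contact Email",
)


def missing_requirements(row: dict) -> list:
    """Return a list of problems for one row; an empty list means the row passes."""

    def filled(column: str) -> bool:
        # Treat absent, None and whitespace-only cells as empty.
        return bool((row.get(column) or "").strip())

    problems = []
    if not filled("Site Location"):
        problems.append("Site Location is required")
    if not any(filled(c) for c in ("Name", "Description")):
        problems.append("Need at least one of Name or Description")
    if not any(filled(c) for c in PRIMARY_CONTACT):
        problems.append("Need at least one contact telephone, URL or email")
    # Assumed reading of the slide: secondary details are only needed
    # when a secondary contact name has been entered.
    if filled("Secondary Contact Name") and not any(
        filled(c) for c in SECONDARY_CONTACT
    ):
        problems.append("Secondary contact needs a telephone, URL or email")
    return problems


row = {"Name": "Confocal microscope", "Site Location": "Highfield Campus"}
print(missing_requirements(row))
```

Rows read with `csv.DictReader` can be passed in directly, since each row is a dict keyed by column heading.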
.ac.uk
Doin’ it on the cheap
Ensuring a sustainable
service through
autodiscovery
Sustainability via Autodiscovery
• How do we add new datasets?
• How are changes made?
• How do we know the data is open data?
Sustainability via Autodiscovery
• Have a machine-readable document
describing the institution and any open
datasets (with licences)
• Place a link to it on the institution’s homepage
/.well-known/openorg
http://www.soton.ac.uk/.well-known/openorg
or
<link rel="openorg" href="http://id.southampton.ac.uk/dataset/profile/latest">
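A crawler could look for a profile document using either route. The sketch below is one possible shape, using only the Python standard library; the function names, the order in which the two routes are tried, and the injectable `fetch` parameter are assumptions for illustration, not the data.ac.uk implementation.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin
from urllib.request import urlopen


class _OpenOrgLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="openorg"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "openorg" in rels and attributes.get("href"):
            self.href = attributes["href"]


def find_link(homepage_html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL named by <link rel="openorg">, if present."""
    finder = _OpenOrgLinkFinder()
    finder.feed(homepage_html)
    return urljoin(base_url, finder.href) if finder.href else None


def fetch_text(url: str) -> Optional[str]:
    """Fetch a URL and return its body, or None on any failure."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def discover_opd(
    homepage_url: str,
    fetch: Callable[[str], Optional[str]] = fetch_text,
) -> Optional[str]:
    """Try /.well-known/openorg first, then fall back to the homepage link."""
    well_known = urljoin(homepage_url, "/.well-known/openorg")
    if fetch(well_known) is not None:
        return well_known
    html = fetch(homepage_url)
    return find_link(html, homepage_url) if html else None


# Offline demonstration with a stand-in for the network.
pages = {"http://example.org/": '<link rel="openorg" href="/profile.ttl">'}
print(discover_opd("http://example.org/", fetch=pages.get))
```

Passing `fetch` in as a parameter keeps the discovery logic testable without touching the network; a real crawler would also want politeness delays and caching.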
What is an Organisation Profile Document?
An RDF document that describes the organisation:
– General information provided:
• Official name, postal address, contact phone number, the correct logo,
physical location
– Links to the parts of the organisation:
• Admissions, Alumni, Freedom of Information, Complaints
– A semantic sitemap:
• Key pages such as jobs, news, events…
– Links to the organisation’s discoverable open datasets and APIs:
• The equipment dataset
Autodiscovery
• Dataset publicly available on website.
• Dataset has to be added manually, along with all the institution’s details,
contacts etc.
Requires staff time (especially if any dataset changes location)
• Organisation has an OPD linking to dataset
• The OPD has to be added manually, but the dataset location and
institution info is consumed directly from the OPD.
Requires less staff time (as any changes made to the OPD are picked up automatically)
• Link to OPD from organisation’s home page
• OPD autodiscovered, so the dataset is automatically added to the
service.
Requires no staff time (as data is autodiscovered)
Never appeal to a man’s “better
nature.” He may not have one.
Invoking his “self-interest”
gives you more leverage.
- Robert Heinlein, “The Notebooks of Lazarus Long”
Status Report – Contributors and data statistics
Bronze  Silver  Gold    Criterion
✔       ✔       ✔       Data is on the internet and in an acceptable format.
-       ✔       ✔       Description of dataset is provided by a remotely hosted OPD.
-       -       ✔       The OPD is discovered via autodiscovery.
-       -       ✔       The OPD/dataset has a recognised and supported open licence (e.g. CC0, ODCA or OGL).
-       -       ✔       All items in the dataset are assigned an ID code which is unique within the assigning organisation.
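The award levels can be read as a simple decision rule. The sketch below assumes the levels are cumulative and that the criteria with a single tick apply to Gold only; the class and function names are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    online_in_acceptable_format: bool
    described_by_remote_opd: bool
    opd_autodiscovered: bool
    open_licence: bool
    unique_item_ids: bool


def award(c: Contribution) -> str:
    """Return 'gold', 'silver', 'bronze' or 'none' for one contributor."""
    if not c.online_in_acceptable_format:
        return "none"
    if not c.described_by_remote_opd:
        return "bronze"
    # Assumption: all three remaining criteria are needed for gold.
    if c.opd_autodiscovered and c.open_licence and c.unique_item_ids:
        return "gold"
    return "silver"


# A well-structured spreadsheet on the web and nothing else.
print(award(Contribution(True, False, False, False, False)))  # bronze
```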
Exploiting profile
documents
Exploiting profile documents
• We’ve barely begun
• Let’s try a live demo...
Warning:
Metaphor mixing detected
Needless heterogeneity means research doesn’t join up.
Aligning datasets every time
costs too much.
Tools can’t be reused
So what do we do about it?
Building easy-to-use tools to cross
between formats, platforms and
paradigms is very specialist work.
The solutions need to be discoverable.
Just putting it on Github is not
making a tool discoverable!
https://github.com/cgutteridge/
Organisation Datasets
Well known formats
available for:
• Events
• Publications
• News headlines
Nothing in common use for:
• Staff Expertise
• Programmes of Events
• Vacancies
• Organisational Structure
• Buildings, Rooms
• Points of service
• Products
– Food Menus
RDF or XML Vocabularies
don’t solve the problem
by themselves.
You need:
Examples to copy.
Tools which consume
and produce the format.
Online checking tools.
A dataset should
solve at least one
use case.
Over-modelling is fun.
Stop it.
• TODO:
• OPD DOCUMENTATION
Thank-you.
Christopher Gutteridge
University of Southampton
@cgutteridge
cjg@ecs.soton.ac.uk
http://opd.data.ac.uk/

Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
 
Semantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleSemantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scale
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Elegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property GraphsElegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property Graphs
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical Sciences
 

Kürzlich hochgeladen

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Autodiscovery or The long tail of open data

  • 1. Autodiscovery or The long tail of open data Christopher Gutteridge University of Southampton & data.ac.uk
  • 2. Bragsheet Christopher Gutteridge - @cgutteridge • Previously: Lead Developer of EPrints (open access research repository software). • “Linked Open Data Architect” for University of Southampton (or whatever we’re currently calling doing LOD stuff for an organisation). • Benevolent technical dictator of data.ac.uk (recently deposed) • Webmaster WWW2006 • Assistant Webmaster WWW2007, WWW2009
  • 3. Image Attributions: • Backgrounds: – http://www.fansshare.com/gallery/photos/14646865/abstract-background-brown-and-blue-circles/ – http://www.pptback.com/old-machine-gears-pptbackground.html • Cliff leap pic: Justin De La Ornellas @ Flickr • Train tracks: duncanh1 @ Flickr • Lego bricks: rawdonfox @ Flickr • Mechano Box: Lady alys @ Wikipedia • Stickle Bricks: Simon Jobling @ Flickr • Free Universal Construction Kit: F.A.T. Lab + Sy-Lab. • Telescope: Brongaeh @ Flickr • Pinata: Peasap @ Flickr • Containers: l2f1 @ Flickr
  • 5. Why don’t organisations share data? (and what stops them)
  • 6. We early adopters shared data because it’s cool. We were not 100% clear on the benefits, but it looked like fun and might gain us some reputation.
  • 8. Open Data Excuse Bingo: Terrorists will use it | We'll get spam | It's too big | It's not very interesting | Thieves will use it | I don't mind, but someone else might | We will get too many enquiries | Lawyers want a custom License | There's no API | Poor Quality | There's already a project to... | We might want to use it in a paper | It's too complicated | Data Protection | People may misinterpret the data | What if we want to sell it later. Don’t get depressed! Go here for antidotes: http://is.gd/odbingo
  • 10. Greater than the sum of its parts
  • 11. Interoperable datasets allow results that are greater than the sum of the parts…
  • 21. What I want from data • Where am I going? • How can I get there? • Where can I get a coffee en route?
  • 23. Why aren’t they using our data?
  • 24. “If you build it, they will come.”
  • 26. Probability of open dataset reuse = Value of dataset to audience × Potential audience size × Ease of discovery × Ease of grasping the value of the dataset × Ease of exploiting dataset
  • 27. Probability of open dataset reuse = Value of dataset to audience × Potential audience size × Ease of discovery × Ease of grasping the value of the dataset × Ease of exploiting dataset × Perceived quality & reliability
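
Because the factors on this slide are multiplied rather than added, a single factor near zero wipes out the rest. The sketch below illustrates that point only: the 0 to 1 scale, the factor identifiers and the example scores are assumptions made for illustration, not measurements from the talk.

```python
from math import prod

# Factor names paraphrase the slide; scoring each one from 0.0 to 1.0 is an
# illustrative assumption, not something the talk specifies.
FACTORS = (
    "value_to_audience",
    "audience_size",
    "ease_of_discovery",
    "ease_of_grasping_value",
    "ease_of_exploiting",
    "perceived_quality",
)


def reuse_score(scores: dict[str, float]) -> float:
    """Multiply every factor together; a missing factor raises KeyError."""
    for name in FACTORS:
        value = scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return prod(scores[name] for name in FACTORS)


# A dataset that is excellent in every respect except that nobody can find it.
good_but_hidden = dict.fromkeys(FACTORS, 0.9) | {"ease_of_discovery": 0.0}
print(reuse_score(good_but_hidden))  # 0.0
```

This is the argument for autodiscovery in miniature: improving an already strong factor helps little while discovery stays close to zero.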
  • 28. …Autodiscoverable and interoperable data can massively increase the potential audience
  • 29. $ ./generate-world Demo --postcode PO381NL --size 250
  • 32. Equipment data from .ac.uk sites
      • Automatically discovers equipment data from all .ac.uk sites
          – 2769 websites
          – 42 providing data
          – 11,028 records
      • Automation massively reduces staffing costs
      • Low effort for institutions
          – A third just provide a well-structured spreadsheet!
      • Not a single point of failure
  • 34. UK National Equipment Portal http://equipment.data.ac.uk
  • 35. UNIQUIP (Column Heading: Required)
      – Type: No
      – Name, Description: At least one of these fields must be completed.
      – Related Facility ID: No
      – Technique(:cpv) or (:N8): No
      – Location: No
      – Contact Name: No
      – Contact Telephone, Contact URL, Contact Email: At least one of these fields must be completed.
      – Secondary Contact Name: No
      – Secondary Contact Telephone, Secondary Contact URL, Secondary Contact Email: At least one of these fields must be completed with second contact name.
      – ID: No
      – Photo: No
      – Department: No
      – Site Location: Yes
      – Building: No
      – Service Level: No
      – Web Address: No
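
The UNIQUIP rules on this slide are simple enough to check mechanically before a spreadsheet is published. The following is a minimal sketch, assuming each spreadsheet row has been read into a dict keyed by column heading (for example with `csv.DictReader`). The grouping of the "at least one of" rules is my reading of the flattened table, and `validate_row` is a hypothetical helper, not part of any published UNIQUIP tooling.

```python
# Rules as read from the slide; the grouping of columns is an interpretation.
REQUIRED = ["Site Location"]
AT_LEAST_ONE = [
    ["Name", "Description"],
    ["Contact Telephone", "Contact URL", "Contact Email"],
]
SECONDARY_DETAILS = [
    "Secondary Contact Telephone",
    "Secondary Contact URL",
    "Secondary Contact Email",
]


def filled(row: dict, column: str) -> bool:
    """True when the column is present and not just whitespace."""
    return bool((row.get(column) or "").strip())


def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for column in REQUIRED:
        if not filled(row, column):
            problems.append(f"missing required column: {column}")
    for group in AT_LEAST_ONE:
        if not any(filled(row, column) for column in group):
            problems.append("need at least one of: " + ", ".join(group))
    # Assumed reading: secondary details are only needed once a name is given.
    if filled(row, "Secondary Contact Name") and not any(
        filled(row, column) for column in SECONDARY_DETAILS
    ):
        problems.append(
            "secondary contact named but no telephone, URL or email given"
        )
    return problems


row = {
    "Name": "Electron microscope",
    "Site Location": "Main campus",
    "Contact Email": "equipment@example.ac.uk",
}
print(validate_row(row))  # []
```

Returning a list of problems instead of raising on the first one lets an institution fix a whole spreadsheet in one pass.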
  • 37. Doin’ it on the cheap
  • 39. Ensuring a sustainable service through autodiscovery
  • 40. Sustainability via Autodiscovery • How do we add new datasets? • How are changes made? • How do we know the data is open data?
  • 41. Sustainability via Autodiscovery • Have a machine-readable document describing the institution and any open datasets (with licences) • Place a link to it on the institution’s homepage
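
The homepage link is what lets a crawler find the profile document without anyone registering it by hand. The sketch below shows one way a crawler might look for such a link in homepage HTML using only the Python standard library. The slide does not say how the link is marked up, so the `rel` value `openorg`, the example URLs and the helper names are all assumptions here and should be checked against the OPD documentation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# ASSUMPTION: the link relation used to mark the profile document.
OPD_REL = "openorg"


class ProfileLinkFinder(HTMLParser):
    """Collects href values of <link> tags whose rel includes OPD_REL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.profile_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if OPD_REL in rels and href:
            # Resolve relative links against the homepage URL.
            self.profile_urls.append(urljoin(self.base_url, href))


def find_profile_links(html: str, base_url: str) -> list[str]:
    finder = ProfileLinkFinder(base_url)
    finder.feed(html)
    return finder.profile_urls


# Hypothetical homepage; a real crawler would fetch this over HTTP.
homepage = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="openorg" href="/opd.ttl">
</head><body>Welcome</body></html>"""

print(find_profile_links(homepage, "https://www.example.ac.uk/"))
# ['https://www.example.ac.uk/opd.ttl']
```

Running this over a list of institutional homepages is the whole registration process from the aggregator's side, which is why new datasets and changed locations need no staff time.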
  • 44. What is an Organisation Profile Document? An RDF document that describes the organisation:
      – General information provided
          • Official name, postal address, contact phone number, the correct logo, physical location
      – Links to the parts of the organisation
          • Admissions, Alumni, Freedom of Information, Complaints
      – A semantic sitemap
          • Key pages such as jobs, news, events…
      – Links to the organisation’s discoverable open data sets and APIs
          • The equipment dataset
  • 50. Autodiscovery
      – Dataset publicly available on website: the dataset has to be added manually, along with all the institution’s details, contacts etc. Requires staff time (especially if any dataset changes location).
      – Organisation has an OPD linking to dataset: the OPD has to be added manually, but the dataset location and institution info are consumed directly from the OPD. Requires less staff time (as any changes made to the OPD will get updated).
      – Link to OPD from organisation’s home page: the OPD is autodiscovered, so the dataset is automatically added to the service. Requires no staff time (as data is autodiscovered).
  • 51. Never appeal to a man’s “better nature.” He may not have one. Invoking his self-interest gives you more leverage. - Robert Heinlein, “The Notebooks of Lazarus Long”
  • 52. Status Report – Contributors and data statistics
  • 54. Bronze / Silver / Gold
      – Data is on the internet and in an acceptable format: Bronze ✔, Silver ✔, Gold ✔
      – Description of dataset is provided by a remotely hosted OPD: Silver ✔, Gold ✔
      – The OPD is discovered via autodiscovery: Gold ✔
      – The OPD/dataset has a recognised and supported open licence (e.g. CC0, ODCA or OGL): Gold ✔
      – All items in the dataset are assigned an ID code which is unique within the assigning organisation: Gold ✔
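
The Bronze/Silver/Gold criteria amount to a small decision procedure. Below is a minimal sketch of it, assuming that rows with three ticks apply to all levels, two ticks to Silver and Gold, and one tick to Gold only (the transcript has lost the column alignment, so that mapping is inferred). The function and parameter names are hypothetical.

```python
from typing import Optional


def rating(
    acceptable_format: bool,
    described_by_remote_opd: bool,
    opd_autodiscovered: bool,
    open_licence: bool,
    unique_ids: bool,
) -> Optional[str]:
    """Return 'Gold', 'Silver', 'Bronze', or None if not even Bronze."""
    if not acceptable_format:
        return None  # the data cannot be consumed at all
    if not described_by_remote_opd:
        return "Bronze"
    # Gold needs every remaining criterion; anything less stays at Silver.
    if opd_autodiscovered and open_licence and unique_ids:
        return "Gold"
    return "Silver"


# An institution with an autodiscovered, openly licensed OPD but no item IDs.
print(rating(True, True, True, True, False))  # Silver
```

Treating the levels as cumulative means an institution can see exactly which missing criterion keeps it from the next level.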
  • 58. Exploiting profile documents • We’ve barely begun • Let’s try a live demo....
  • 63. Needless heterogeneity means research doesn’t join up. Aligning datasets every time costs too much. Tools can’t be reused.
  • 64. So what do we do about it?
  • 69. Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work. The solutions need to be discoverable. Just putting it on Github is not making a tool discoverable! https://github.com/cgutteridge/
  • 70. Organisation Datasets
      – Well known formats available for:
          • Events
          • Publications
          • News headlines
      – Nothing in common use for:
          • Staff Expertise
          • Programmes of Events
          • Vacancies
          • Organisational Structure
          • Buildings, Rooms
          • Points of service
          • Products – Food Menus
  • 72. RDF or XML Vocabularies don’t solve the problem by themselves. You need: Examples to copy. Tools which consume and produce the format. Online checking tools.
  • 73. A dataset should solve at least one use case. Over-modelling is fun. Stop it.
  • 74. • TODO: • OPD DOCUMENTATION
  • 76. Thank you. Christopher Gutteridge University of Southampton @cgutteridge cjg@ecs.soton.ac.uk http://opd.data.ac.uk/