Christopher Gutteridge's slides from Connected Data London. Christopher, who is an Open Data Architect at the University of Southampton, presented why and how people should employ an Open Data strategy at their organisation.
2. Bragsheet
Christopher Gutteridge - @cgutteridge
• Previously: Lead Developer of EPrints
(open access research repository software).
• “Linked Open Data Architect” for University of Southampton
(or whatever we’re currently calling doing LOD stuff for an organisation).
• Benevolent technical dictator of data.ac.uk
(recently deposed)
• Webmaster WWW2006
• Assistant Webmaster WWW2007, WWW2009
8. Open Data Excuse Bingo
• Terrorists will use it
• We'll get spam
• It's too big
• It's not very interesting
• Thieves will use it
• I don't mind, but someone else might
• We will get too many enquiries
• Lawyers want a custom licence
• There's no API
• Poor quality
• There's already a project to...
• We might want to use it in a paper
• It's too complicated
• Data protection
• People may misinterpret the data
• What if we want to sell it later?
Don’t get depressed! Go here for antidotes: http://is.gd/odbingo
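27.
$$
\begin{aligned}
\text{Probability of open dataset reuse} = {} & \text{Value of dataset to audience} \\
& \times \text{Potential audience size} \\
& \times \text{Ease of discovery} \\
& \times \text{Ease of grasping the value of the dataset} \\
& \times \text{Ease of exploiting dataset} \\
& \times \text{Perceived quality \& reliability}
\end{aligned}
$$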
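Because the slide's terms are multiplied rather than added, a dataset that scores close to zero on any single factor has a near-zero chance of reuse, however good it is otherwise. The sketch below makes that concrete under a loud assumption: each factor gets a hypothetical score normalised to the range 0 to 1 (the slide specifies no scale or units), and the factor names are our own labels, not the author's.

```python
from math import prod

# Hypothetical labels for the six terms in the slide's formula. The slide gives
# no scale, so each score here is ASSUMED to be normalised to 0.0-1.0.
FACTORS = (
    "value_to_audience",
    "audience_size",
    "ease_of_discovery",
    "ease_of_grasping_value",
    "ease_of_exploiting",
    "perceived_quality",
)


def reuse_probability(scores: dict[str, float]) -> float:
    """Multiply the six factor scores together, as the slide's formula does."""
    for name in FACTORS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {scores[name]}")
    return prod(scores[name] for name in FACTORS)


if __name__ == "__main__":
    well_run = dict.fromkeys(FACTORS, 0.9)
    # Same dataset, but nobody can find it.
    hidden = {**well_run, "ease_of_discovery": 0.0}
    print(round(reuse_probability(well_run), 3))  # 0.531
    print(reuse_probability(hidden))  # 0.0
```

With every factor at 0.9 the product is only about 0.53, and setting discoverability alone to zero drops it to 0.0: a weak factor cannot be compensated for by strong ones.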
32.
• Automatically discovers equipment data from all .ac.uk sites
– 2769 websites
– 42 providing data
– 11,028 records
• Automation massively reduces staffing costs
• Low effort for institutions:
– A third just provide a well-structured spreadsheet!
• Not a single point of failure
35. UNIQUIP

| Column heading | Required |
|---|---|
| Type | No |
| Name, Description | At least one of these fields must be completed. |
| Related Facility ID | No |
| Technique(:cpv) or (:N8) | No |
| Location | No |
| Contact Name | No |
| Contact Telephone, Contact URL, Contact Email | At least one of these fields must be completed. |
| Secondary Contact Name | No |
| Secondary Contact Telephone, Secondary Contact URL, Secondary Contact Email | If a secondary contact name is given, at least one of these fields must be completed. |
| ID | No |
| Photo | No |
| Department | No |
| Site Location | Yes |
| Building | No |
| Service Level | No |
| Web Address | No |
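The UNIQUIP column rules are simple enough to check automatically, in the spirit of the online checking tools the talk calls for. The following is a minimal sketch, assuming the spreadsheet is exported as CSV with the column headings exactly as listed; the function names are hypothetical and this is not the validator used by the equipment service. The secondary-contact rule is read conditionally (enforced only when a secondary contact name is given), which is an interpretation of the slide.

```python
import csv
import io

# Rules read off the UNIQUIP column table. The conditional reading of the
# secondary-contact rule is an interpretation, not a quote from the spec.
REQUIRED = ("Site Location",)
AT_LEAST_ONE_OF = (
    ("Name", "Description"),
    ("Contact Telephone", "Contact URL", "Contact Email"),
)
SECONDARY_CONTACT_DETAILS = (
    "Secondary Contact Telephone",
    "Secondary Contact URL",
    "Secondary Contact Email",
)


def _filled(row: dict, column: str) -> bool:
    # DictReader yields None for missing trailing cells, so treat None as empty.
    return bool((row.get(column) or "").strip())


def check_row(row: dict) -> list[str]:
    """Return a list of rule violations for one spreadsheet row."""
    problems = []
    for column in REQUIRED:
        if not _filled(row, column):
            problems.append(f"missing required value: {column}")
    for group in AT_LEAST_ONE_OF:
        if not any(_filled(row, column) for column in group):
            problems.append("need at least one of: " + ", ".join(group))
    if _filled(row, "Secondary Contact Name") and not any(
        _filled(row, column) for column in SECONDARY_CONTACT_DETAILS
    ):
        problems.append(
            "secondary contact needs at least one of: "
            + ", ".join(SECONDARY_CONTACT_DETAILS)
        )
    return problems


def check_spreadsheet(csv_text: str) -> dict[int, list[str]]:
    """Map spreadsheet line number -> problems, for rows that break a rule."""
    report = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header; assumes no cell contains an embedded newline.
    for line_number, row in enumerate(reader, start=2):
        problems = check_row(row)
        if problems:
            report[line_number] = problems
    return report
```

Returning a report keyed by line number, rather than stopping at the first error, lets an institution fix its whole spreadsheet in one pass.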
41. Sustainability via Autodiscovery
• Have a machine-readable document describing the institution and any open datasets (with licences)
• Place a link to it on the institution's homepage
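Autodiscovery means a harvester only needs to know an institution's homepage. A minimal sketch of the discovery step follows, using only the Python standard library. It assumes the homepage advertises its profile document with a `<link rel="openorg" href="...">` element; that is our understanding of the OPD convention, so check the OPD specification before relying on it. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OPDLinkFinder(HTMLParser):
    """Collects absolute URLs from <link rel="openorg" href="..."> elements."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.opd_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; valueless attributes are None.
        rels = (attr.get("rel") or "").lower().split()
        if "openorg" in rels and attr.get("href"):
            # Resolve relative hrefs against the homepage URL.
            self.opd_urls.append(urljoin(self.base_url, attr["href"]))


def find_opd_links(homepage_html: str, homepage_url: str) -> list[str]:
    """Return the OPD URLs advertised in a homepage's HTML."""
    finder = OPDLinkFinder(homepage_url)
    finder.feed(homepage_html)
    finder.close()
    return finder.opd_urls
```

In a real harvester the homepage would be fetched first (for example with `urllib.request.urlopen`), and each discovered URL would then be downloaded and parsed as RDF.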
44. What is an Organisation Profile Document?
An RDF document that describes the organisation:
– General information provided:
• Official name, postal address, contact phone number, the correct logo, physical location
– Links to the parts of the organisation:
• Admissions, Alumni, Freedom of Information, Complaints
– A semantic sitemap:
• Key pages such as jobs, news, events…
– Links to the organisation’s discoverable open datasets and APIs:
• The equipment dataset
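To make the slide's list concrete, the sketch below generates a very small OPD-style document in Turtle. The vocabulary choices (FOAF for the organisation, DCAT and Dublin Core terms for the dataset) are illustrative assumptions; the OPD specification recommends its own set of predicates, and a real profile would also carry the postal address, location, organisational links and sitemap pages. The `DatasetLink` type and function names are hypothetical, and all URIs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DatasetLink:
    """Hypothetical description of one open dataset listed in the profile."""
    uri: str
    title: str
    licence: str
    download_url: str


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string literal (escape backslashes first)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def minimal_opd(org_uri: str, name: str, homepage: str, logo: str,
                phone: str, datasets: list[DatasetLink]) -> str:
    """Build a small Turtle document describing an organisation and its datasets."""
    lines = [
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{org_uri}> a foaf:Organization ;",
        f"    foaf:name {turtle_literal(name)} ;",
        f"    foaf:homepage <{homepage}> ;",
        f"    foaf:logo <{logo}> ;",
        # tel: URIs may not contain spaces, so strip them from the number.
        f"    foaf:phone <tel:{phone.replace(' ', '')}> .",
    ]
    for ds in datasets:
        lines += [
            "",
            f"<{ds.uri}> a dcat:Dataset ;",
            f"    dcterms:title {turtle_literal(ds.title)} ;",
            f"    dcterms:publisher <{org_uri}> ;",
            f"    dcterms:license <{ds.licence}> ;",
            f"    dcat:distribution [ a dcat:Distribution ; dcat:downloadURL <{ds.download_url}> ] .",
        ]
    return "\n".join(lines) + "\n"
```

Pointing each dataset back at the organisation with `dcterms:publisher` is what lets a harvester take the institution's details from the profile instead of re-entering them by hand.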
50. Autodiscovery
• Dataset publicly available on website.
• Dataset has to be added manually, along with all the institution's details, contacts, etc.
Requires staff time (especially if any dataset changes location).
• Organisation has an OPD linking to dataset.
• The OPD has to be added manually, but the dataset location and institution info are consumed directly from the OPD.
Requires less staff time (as any changes made to the OPD will be picked up).
• Link to OPD from organisation’s home page.
• OPD autodiscovered, so the dataset is automatically added to the service.
Requires no staff time (as data is autodiscovered).
51. Never appeal to a man’s “better nature.” He may not have one. Invoking his “self-interest” gives you more leverage.
- Robert Heinlein, “The Notebooks of Lazarus Long”
54. Bronze, Silver, Gold

| Criterion | Bronze | Silver | Gold |
|---|---|---|---|
| Data is on the internet and in an acceptable format. | ✔ | ✔ | ✔ |
| Description of dataset is provided by a remotely hosted OPD. | | ✔ | ✔ |
| The OPD is discovered via autodiscovery. | | | ✔ |
| The OPD/dataset has a recognised and supported open licence (e.g. CC0, ODCA or OGL). | | | ✔ |
| All items in the dataset are assigned an ID code which is unique within the assigning organisation. | | | ✔ |
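The Bronze/Silver/Gold criteria are cumulative, so a rating can be computed by checking them in order. This sketch assumes that the autodiscovery, open-licence and unique-ID criteria are required only for Gold; the `DatasetStatus` fields and the `rating` function are hypothetical names, not part of any published tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetStatus:
    """Hypothetical yes/no answers to each criterion in the table."""
    online_acceptable_format: bool
    described_by_remote_opd: bool
    opd_autodiscovered: bool
    recognised_open_licence: bool
    unique_item_ids: bool


def rating(s: DatasetStatus) -> Optional[str]:
    """Return "Bronze", "Silver", "Gold", or None if not even Bronze."""
    if not s.online_acceptable_format:
        return None
    if not s.described_by_remote_opd:
        return "Bronze"
    # ASSUMPTION: these three criteria are Gold-only requirements.
    if s.opd_autodiscovered and s.recognised_open_licence and s.unique_item_ids:
        return "Gold"
    return "Silver"
```

A dataset with a remotely hosted OPD but no homepage link, for example, stops at Silver however good its licence and IDs are.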
69. Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work.
The solutions need to be discoverable.
Just putting it on GitHub doesn't make a tool discoverable!
https://github.com/cgutteridge/
70. Organisation Datasets
Well-known formats
available for:
• Events
• Publications
• News headlines
Nothing in common use for:
• Staff Expertise
• Programmes of Events
• Vacancies
• Organisational Structure
• Buildings, Rooms
• Points of service
• Products
– Food Menus
72. RDF or XML Vocabularies don’t solve the problem by themselves.
You need:
• Examples to copy.
• Tools which consume and produce the format.
• Online checking tools.
73. A dataset should solve at least one use case.
Over-modelling is fun.
Stop it.