Session 7/8. Development strategy. The Strategic Content Alliance, JISC-sponsored workshops on Maximising Online Resource Effectiveness, held on several occasions throughout 2010 and delivered by Netskills.
8. Compiling metadata
Subject          Property       Value
me               name           “George Munroe”
me               organisation   my organisation
me               email address  “george@netskills.biz”
my organisation  name           “Platypus Consultancy”
my organisation  year formed    “2004”
my organisation  town/city      my home town
my home town     name           “Belfast”
my home town     country        UK
my home town     population     “1 million”
9. Compiling metadata
Subject  Property       Value
GM       name           “George Munroe”
GM       organisation   PCL
GM       email address  “george@netskills.biz”
PCL      name           “Platypus Consultancy”
PCL      year formed    “2004”
PCL      town/city      BFS
BFS      name           “Belfast”
BFS      country        UK
BFS      population     “1 million”
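A minimal Python sketch of the table above, assuming each row is stored as a (subject, property, value) tuple. It shows why short identifiers such as GM, PCL and BFS matter: because they appear both as values and as subjects, a program can follow one statement on to the next. The helper names `value` and `follow` are illustrative only.

```python
# The slide 9 statements as (subject, property, value) triples
TRIPLES = [
    ("GM", "name", "George Munroe"),
    ("GM", "organisation", "PCL"),
    ("GM", "email address", "george@netskills.biz"),
    ("PCL", "name", "Platypus Consultancy"),
    ("PCL", "year formed", "2004"),
    ("PCL", "town/city", "BFS"),
    ("BFS", "name", "Belfast"),
    ("BFS", "country", "UK"),
    ("BFS", "population", "1 million"),
]


def value(subject, prop):
    """Return the first value recorded for this subject and property, or None."""
    for s, p, v in TRIPLES:
        if s == subject and p == prop:
            return v
    return None


def follow(subject, *props):
    """Follow a chain of properties, using each value as the next subject."""
    for prop in props:
        subject = value(subject, prop)
        if subject is None:
            return None
    return subject


# "In which town is GM's organisation?"
print(follow("GM", "organisation", "town/city", "name"))  # Belfast
```

With free-text values such as “my organisation” in place of PCL, there would be no reliable key to join on, which is the point of moving from the slide 8 table to the slide 9 one.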
12-17. [Diagram, built up over several slides: the statements above drawn as a graph. GM, PCL and BFS are nodes joined by arcs such as organisation, home town and based in, with values attached by name, email, formed in (2004), population (1 million) and country (UK). Later builds add Christine Cahoon (CJ, christine@netskills.biz), with arcs including knows and organisation, and Brian Kelly (BK, brian@netskills.biz), linked to UKOLN (ULN), which is based in Bath (BTH).]
18. Real metadata
Resource description framework
‣ RDF is a generic framework for attaching well-defined metadata to web resources.
‣ RDF describes "things" (identified by uniform resource identifiers, URIs) by assigning
them properties and corresponding values; each statement is known as a "triple" consisting
of [subject] [predicate] [object].
‣ The predicate URI usually references a term in a standard metadata vocabulary,
giving it an unambiguous meaning.
‣ Any part of the triple can be a URI, and URIs can point to other URIs that can be
read using HTTP and extended (or related) in other web resources, making the
model both scalable and very flexible.
http://www.w3.org/TR/rdf-primer/
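As an illustration, the sketch below writes a few statements from the earlier tables as triples in the N-Triples text format. URIs go in angle brackets and literal values go in quotes. The `http://example.org/id/` namespace and the `organisation` property are hypothetical. `foaf:name` and `foaf:mbox` are real terms from the FOAF vocabulary.

```python
EX = "http://example.org/id/"         # hypothetical namespace for our entities
FOAF = "http://xmlns.com/foaf/0.1/"   # FOAF vocabulary namespace


class URI(str):
    """Marks a string as a URI rather than a literal value."""


def nt_term(t):
    """Write a URI as <...> and anything else as a quoted, escaped literal."""
    if isinstance(t, URI):
        return f"<{t}>"
    escaped = t.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)


gm, pcl = URI(EX + "GM"), URI(EX + "PCL")
triples = [
    (gm, URI(FOAF + "name"), "George Munroe"),
    (gm, URI(FOAF + "mbox"), URI("mailto:george@netskills.biz")),
    (pcl, URI(FOAF + "name"), "Platypus Consultancy"),
    # hypothetical property: the object is a URI that is also a subject above
    (gm, URI(EX + "organisation"), pcl),
]
print(to_ntriples(triples), end="")
```

Because the last triple's object is the URI of PCL, a consumer can look up PCL's own statements, which is the "object of one triple is the subject of another" linking the slide describes.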
21. What is RDFa?
RDFa = Resource Description Framework (a generic model for the provision of metadata) in HTML attributes
22. An RDFa basics tutorial by Manu Sporny
http://www.youtube.com/watch?v=ldl0m-5zLz4&feature=player_embedded
23. Simple RDFa web page
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:v="http://rdf.data-vocabulary.org/#">
  <head profile="http://www.w3.org/1999/xhtml/vocab">
    <title>Simple RDFa example</title>
  </head>
  <body>
    <!-- typeof makes this div a new subject of type v:Person -->
    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
      My name is <span property="v:name">George Munroe</span>,
      also known online as <span property="v:nickname">mungeo</span>.
      I am involved in several ventures but my home web site is at:
      <!-- rel with href links the person to a URL rather than a text value -->
      <a href="http://www.platypusconsultancy.com"
         rel="v:url">www.platypusconsultancy.com</a>.
      I live in
      <!-- rel without href: the nested typeof element becomes its object -->
      <span rel="v:address">
        <span typeof="v:Address">
          <span property="v:locality">Donegal</span>,
          <span property="v:region">Ulster</span>
        </span>
      </span>
      and work as a <span property="v:title">consultant trainer</span>
      at <span property="v:affiliation">Netskills</span>.
    </div>
  </body>
</html>
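To make the mapping from attributes to triples concrete, here is a toy Python extractor for just the three RDFa attributes this page relies on (typeof, property and rel). It is a sketch under heavy assumptions, not a conforming RDFa processor: CURIEs such as v:name are left unexpanded, blank nodes are given invented labels (_:b1, _:b2) and attributes such as about, resource and content are ignored. For real checking, use a proper tool such as the W3C distiller.

```python
from html.parser import HTMLParser


class ToyRdfaExtractor(HTMLParser):
    """Toy extractor for a tiny subset of RDFa 1.0: typeof, property, rel/href."""

    def __init__(self):
        super().__init__()
        self.triples = []    # (subject, predicate, object) tuples
        self.stack = []      # per open element: (subject, hanging rel or None)
        self.literals = []   # open property captures: [subject, predicate, parts, depth]
        self.bnodes = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject, hanging = self.stack[-1] if self.stack else (None, None)
        if "typeof" in a:
            # typeof without about: a new blank-node subject of that type
            self.bnodes += 1
            node = f"_:b{self.bnodes}"
            self.triples.append((node, "rdf:type", a["typeof"]))
            if hanging is not None:
                # complete a rel that had no href, e.g. rel="v:address"
                self.triples.append((hanging[0], hanging[1], node))
                hanging = None
            subject = node
        if "rel" in a:
            if "href" in a:
                self.triples.append((subject, a["rel"], a["href"]))
            else:
                hanging = (subject, a["rel"])
        if "property" in a:
            # the element's text content becomes a literal value
            self.literals.append([subject, a["property"], [], len(self.stack)])
        self.stack.append((subject, hanging))

    def handle_data(self, data):
        for capture in self.literals:
            capture[2].append(data)

    def handle_endtag(self, tag):
        self.stack.pop()
        depth = len(self.stack)
        while self.literals and self.literals[-1][3] == depth:
            subj, pred, parts, _ = self.literals.pop()
            self.triples.append((subj, pred, " ".join("".join(parts).split())))


page = """<div typeof="v:Person">
My name is <span property="v:name">George Munroe</span>,
also known online as <span property="v:nickname">mungeo</span>.
<a href="http://www.platypusconsultancy.com" rel="v:url">www.platypusconsultancy.com</a>.
I live in <span rel="v:address"><span typeof="v:Address">
<span property="v:locality">Donegal</span>, <span property="v:region">Ulster</span>
</span></span>
and work as a <span property="v:title">consultant trainer</span>
at <span property="v:affiliation">Netskills</span>.
</div>"""

extractor = ToyRdfaExtractor()
extractor.feed(page)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

The nested address shows the one subtle rule included here: a rel with no href "hangs" until a child element with typeof supplies its object.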
27. RDFa distiller
Extract RDF from HTML + RDFa
W3C service to identify and list RDF from a web page
‣ using web address, local file or direct text inputs
‣ provides a “clean” view of the data hierarchy
‣ enables a simple check of both markup validity *and* intended meaning
http://www.w3.org/2007/08/pyRdfa/
29. Real metadata
Microformats and RDFa
‣ Previously, RDF statements were usually provided in separate .rdf files and were not
widely used because of the extra effort required to produce them.
‣ Microformats consist of informal vocabularies (not referenced in the document) that
have been established by rapid user adoption, ease of use and a desire to create
richer semantics with embedded metadata. They are applied through the “class” attribute
on <div> and <span> elements, or through “rel” on anchor <a …> tags.
‣ RDFa allows RDF statements to be included in ordinary HTML files using formally
defined attributes (such as property, rel and typeof) on elements like <span>, <div>
and <a>, with the metadata vocabularies declared by namespace in the document.
http://www.w3.org/TR/rdfa-in-html/ http://microformats.org/wiki/Main_Page
30. Data integration
Seamless use of data in a web page with desktop applications
‣ use of microformat tools to generate contact information in a web page
‣ viewing of a web page containing microformats with an “aware” browser
‣ addition of data in the web page to a desktop address book
http://microformats.org/code-tools http://en.wikipedia.org/wiki/HCard
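The workflow above can be sketched in Python. A toy parser picks up a few hCard class names from markup and writes them out as a vCard, the format desktop address books import. Assumptions: only fn, org and email are handled, values are taken from element text (real hCard parsers read href for links), the enclosing vcard root class is not checked, and the output uses vCard 4.0, where FN is the only required property. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# The few hCard class names this toy handles (the real hCard vocabulary is larger)
HCARD_PROPS = ("fn", "org", "email")
VCARD_NAMES = {"fn": "FN", "org": "ORG", "email": "EMAIL"}


class ToyHCardParser(HTMLParser):
    """Collects the text of elements whose class attribute names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self.stack = []   # hCard property names opened by each open element
        self.parts = {}   # property name -> text fragments collected so far

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        props = [c for c in classes if c in HCARD_PROPS]
        for prop in props:
            self.parts[prop] = []
        self.stack.append(props)

    def handle_data(self, data):
        for props in self.stack:
            for prop in props:
                self.parts[prop].append(data)

    def handle_endtag(self, tag):
        for prop in self.stack.pop():
            text = "".join(self.parts.pop(prop))
            self.card[prop] = " ".join(text.split())  # collapse whitespace


def escape_text(value):
    """Escape a vCard text value (backslash, comma, semicolon, newline)."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def to_vcard(card):
    """Render the collected properties as a minimal vCard 4.0 record."""
    lines = ["BEGIN:VCARD", "VERSION:4.0"]
    for prop in HCARD_PROPS:
        if prop in card:
            value = card[prop]
            if prop in ("fn", "org"):
                value = escape_text(value)
            lines.append(f"{VCARD_NAMES[prop]}:{value}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


html = ('<div class="vcard"><span class="fn">George Munroe</span>, '
        '<span class="org">Platypus Consultancy</span>, '
        '<a class="email" href="mailto:george@netskills.biz">george@netskills.biz</a></div>')
parser = ToyHCardParser()
parser.feed(html)
parser.close()
print(to_vcard(parser.card))
```

The markup stays ordinary HTML for human readers; only the class names carry the extra meaning, which is what lets an “aware” browser offer the data to an address book.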
34. Metadata vocabularies
The importance of using a commonly understood and accessible metadata language
Everyone (and every computer) must have a common understanding of
what particular “things” (entities and their properties) actually are
‣ the concept of XML namespaces, which identify vocabularies (descriptions of the
properties that can be defined for a particular entity) available on the web
‣ these descriptions are supplied as RDF (or RDFa) files, each with a URL (URI)
And there’s more to it than just a flat list of entities and properties
‣ a real understanding involves being aware of the relationships between entity
classes as well as what properties are associated with an entity
‣ these relationships can be defined using other vocabularies (e.g. OWL)
‣ a very complex “ontology” can be built very simply from triples where the object of
one triple may be the subject of another
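A small Python sketch of why relationships between classes matter. Given triples stating that one class is a subclass of another, a program can infer extra types for an entity. This is a simplified form of the RDFS subclass entailment rule, run until nothing new is added. The `ex:` class names are hypothetical and chosen only for illustration.

```python
# Hypothetical vocabulary and data triples ("ex:" is an illustrative prefix)
TRIPLES = {
    ("ex:Trainer", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Consultancy", "rdfs:subClassOf", "ex:Organisation"),
    ("ex:Organisation", "rdfs:subClassOf", "ex:Agent"),
    ("ex:GM", "rdf:type", "ex:Trainer"),
    ("ex:PCL", "rdf:type", "ex:Consultancy"),
}


def infer_types(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf, up to a fixed point."""
    inferred = set(triples)
    while True:
        new = {
            (s, "rdf:type", sup)
            for (s, p, c) in inferred if p == "rdf:type"
            for (sub, q, sup) in inferred if q == "rdfs:subClassOf" and sub == c
        }
        if new <= inferred:   # nothing new: every implied type is present
            return inferred
        inferred |= new


closed = infer_types(TRIPLES)
print(sorted(o for s, p, o in closed if s == "ex:GM" and p == "rdf:type"))
```

Nothing states directly that GM is an Agent. The conclusion comes from chaining triples whose objects are the subjects of other triples, which is how a small set of simple statements can grow into an ontology.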
36. Metadata—deployment
RDFa and linked data in UK government web-sites
Mark Birbeck, Nodalities Magazine, 29 July 2009
The UK government’s Central Office of Information had a straightforward problem to solve:
how could they create a centralised web-site of information that the public could search and
access, when the source of that information could be any government department database
or any public sector web-site?
By using RDFa to address the challenge of making distributed data available in one place, the
COI avoided having to make changes to each department's systems. But once each
department is publishing RDFa, it becomes possible for third parties to consume that
information however they see fit. Such a flexible architecture is crucial in the age of open
government, and is a cornerstone of linked open data.
http://blogs.talis.com/nodalities/2009/07/rdfa-and-linked-data-in-uk-government-web-sites.php
37. Metadata—deployment
TSO announces major new platform to accelerate open data drive
TSO partners with Garlik on hosted "trillion triple" RDF platform, 18 January 2010
TSO (The Stationery Office), the public sector division of Williams Lea, has today announced
a partnership with Garlik, the leading semantic technology innovator, to launch what is
believed to be the world's most scalable, securely hosted RDF platform for use by UK Central
and Local Government departments. As the largest publisher in the UK of public sector
documents (over 8,000 titles a year), TSO has taken this proactive step to provide its core
public sector customers with the ability to participate with confidence in the Government's
open data initiative.
http://www.tso.co.uk/press/latestnews/archive/2010/triplestore/
39. The Open Graph protocol
Facebook and the Open Graph protocol
Announced at the Facebook Developers Conference, 21 April 2010
The Open Graph protocol enables you to integrate your web pages
into the social graph. It is currently designed for web pages
representing profiles of real-world things—things like movies,
sports teams, celebrities, and restaurants. Once your pages
become objects in the graph, users can establish connections to
your pages as they do with Facebook Pages. Based on the
structured data you provide via the Open Graph protocol, your
pages show up richly across Facebook: in user profiles, within
search results and in News Feed.
With the Open Graph protocol, any URL can be treated just like a
Facebook page.
http://opengraphprotocol.org/ http://developers.facebook.com/docs/opengraph
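Open Graph data is carried in <meta> elements in a page's <head>, each with a property attribute naming an og: term. The protocol's basic properties for every page are og:title, og:type, og:image and og:url. The short Python sketch below renders such tags. The function name and all example values, including the image address, are hypothetical.

```python
from html import escape


def og_meta_tags(props):
    """Render a dict of Open Graph properties as <meta property="og:..."> tags."""
    return "\n".join(
        f'<meta property="og:{escape(name)}" content="{escape(value)}" />'
        for name, value in props.items()
    )


# Example values for a hypothetical page; the logo URL is invented
tags = og_meta_tags({
    "title": "Platypus Consultancy",
    "type": "website",
    "url": "http://www.platypusconsultancy.com/",
    "image": "http://www.platypusconsultancy.com/logo.png",
})
print(tags)
```

Escaping the attribute values matters because titles often contain characters such as & or quotes that would otherwise break the markup.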
40. The “open data” movement
Linking Open Data project
The goal is to extend the web by publishing various open data sets as RDF on the web and by
setting RDF links between data items in different data sources. These RDF links then enable
navigation from a data item within one data source to related data items within other sources
using a semantic web browser.
RDF links can also be followed by the crawlers of semantic web search engines, which may
provide sophisticated search and query capabilities over crawled data. As query results are
structured data and not just links to HTML pages, they can be used within other applications.
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
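A minimal Python sketch of following RDF links from one data source into another. It assumes two hypothetical sources that describe the same person under different URIs, connected by an owl:sameAs link. All URIs and property names are invented for illustration, and a real crawler would dereference the URIs over HTTP instead of reading in-memory sets.

```python
from collections import deque

# Two hypothetical data sources, each using its own URIs
SOURCES = {
    "source A": {
        ("http://a.example/id/GM", "name", "George Munroe"),
        ("http://a.example/id/GM", "owl:sameAs", "http://b.example/person/42"),
    },
    "source B": {
        ("http://b.example/person/42", "worksFor", "http://b.example/org/netskills"),
        ("http://b.example/org/netskills", "name", "Netskills"),
    },
}


def describe(uri):
    """Collect facts about `uri`, following owl:sameAs links into other sources."""
    facts, seen, queue = [], {uri}, deque([uri])
    while queue:
        current = queue.popleft()
        for source, triples in SOURCES.items():
            for s, p, o in sorted(triples):
                if s != current:
                    continue
                if p == "owl:sameAs":
                    if o not in seen:          # avoid revisiting equivalent URIs
                        seen.add(o)
                        queue.append(o)
                else:
                    facts.append((source, p, o))
    return facts


for fact in describe("http://a.example/id/GM"):
    print(fact)
```

The result combines facts held in two separate places, which is the kind of navigation between data sources that RDF links are meant to enable.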
42. The “open data” movement
Contains 4.7 billion triples, interlinked by around 142 million RDF links
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
43. Open linked data
School of Electronics and Computer Science (ECS) at University of Southampton releases all public data in open linked data format
Joyce Lewis, 13 July 2010
In what is believed also to be a world-first, ECS has become the UK’s first University
department to release all its public data in open linked data format.
In accordance with the spirit of the open linked data initiative, ECS has released all its own
data for public reuse. This includes data about research papers in the EPrints archive
(announced this in the official global rankings as one of the top ten in the world), people in
the School, research groups, teaching modules, seminars and events, buildings and rooms.
All public (RDF) data from rdf.ecs.soton.ac.uk and eprints.ecs.soton.ac.uk is now available
and can be reused for any legal purpose, including derivative works and commercial use. The
School has opted for a creative commons public domain (CC0) license to allow the data to be
reused.
Christopher Gutteridge, ECS Web Projects Manager, comments: “We believe that in the future
this will become common practice for certain types of open data, and it is our responsibility to
lead the way in setting the standards of best practice.”
http://www.ecs.soton.ac.uk/about/news/3313
44. Open and linked data
It's all semantics: open data, linked data and the semantic web
Richard MacManus, ReadWriteWeb, 31 March 2010
Titti Cimmino put it nicely: Open Data is simply 'data on the web,' whereas Linked Data is a
'web of data.'
However, the idea of Open Data is to turn it into Linked Data. As John S. Erickson pointed
out, the first priority of Data.gov.uk (and its U.S. counterpart) is to publish lots of Open Data.
The next step is to work towards linking it all up. This is already starting to happen.
Answering a question I posed on Twitter, Kingsley Idehen confirmed that Data.gov.uk is
currently a combination of Open Data and Linked Data.
http://www.readwriteweb.com/archives/open_data_linked_data_semantic_web.php
53. Open data
Open data principles
On December 7-8, 2007, 30 open government advocates gathered in California to develop
principles of open government data, resulting in 8 fundamental principles for open government
data. Governments can become more effective, transparent, and relevant by embracing these principles.
1. Complete. All public data is made available. Public data is data that is not subject
to valid privacy, security or privilege limitations.
2. Primary. Data is as collected at the source, with the highest possible level of
granularity, not in aggregate or modified forms.
3. Timely. Data is made available as quickly as necessary to preserve the value of
the data.
4. Accessible. Data is available to the widest range of users for the widest range of
purposes.
5. Machine processable. Data is reasonably structured to allow automated
processing.
6. Non-discriminatory. Data is available to anyone, with no requirement of
registration.
7. Non-proprietary. Data is available in a format over which no entity has exclusive
control.
8. License-free. Data is not subject to any copyright, patent, trademark or trade
secret regulation. Reasonable privacy, security and privilege restrictions may be
allowed.
http://wiki.opengovdata.org/index.php?title=OpenDataPrinciples
54. Open government
The currents of our time
Carl Malamud, Gov 2.0 Summit, 7-8 September 2010
If our government is to do the jobs with which we
have entrusted it—if government is to ensure that
the air we breathe and the water we drink are
safe, or that every child is to be given a chance to
flourish—if we are to accomplish these goals, the
machinery of our government must be made to
work properly...
Our federal government spends $81.9 billion a
year on Information Technology. Much of that is
wasted effort. We build systems so badly, it is
crippling the infrastructure of government.
http://public.resource.org/currents/
56. RDFa tools
RDF/RDFa related tools
RDFa distiller (extract pure RDF from HTML + RDFa)
‣ http://www.w3.org/2007/08/pyRdfa/
‣ get RDF directly from http://example.com/sample.html using a single address:
http://www.w3.org/2007/08/pyRdfa/extract?uri=http://example.com/sample.html
RDF validator and grapher
‣ http://www.w3.org/RDF/Validator/
Google’s RDFa tutorial
‣ http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=146898
Operator plug-in for Firefox
‣ https://addons.mozilla.org/en-US/firefox/addon/4106
DBpedia applications (try e.g. the relation finder)
‣ http://wiki.dbpedia.org/Applications
OpenLink Data Explorer extension for Firefox
‣ https://addons.mozilla.org/en-US/firefox/addon/8062
List global namespaces and entities
‣ http://pingthesemanticweb.com/
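The single-address trick for the distiller can be sketched in Python with the standard library. The service address is the one given on the slide and dates from 2010, so it may have moved since. The function name is hypothetical. Percent-encoding the page address keeps any query string it contains from being mistaken for the distiller's own parameters.

```python
from urllib.parse import urlencode

# Extract-service address as given on the slide (2010); it may no longer respond here
DISTILLER = "http://www.w3.org/2007/08/pyRdfa/extract"


def distiller_url(page_url):
    """Build one address asking the distiller to extract RDF from page_url."""
    return DISTILLER + "?" + urlencode({"uri": page_url})


print(distiller_url("http://example.com/sample.html"))
```

A page address such as http://example.com/p?a=1&b=2 would otherwise be cut short at the &, because the service would read b=2 as a separate parameter of its own.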
57. Discovery using RDF links
DBpedia applications
DBpedia is RDF data extracted from well-structured Wikipedia pages
(13 million)
‣ open the web page at: http://wiki.dbpedia.org/Applications
‣ select the “Relation Finder” application
‣ on the left-hand side of the page, enter two “entities” that are likely to have several
mentions in Wikipedia
‣ select “Find Relations” and watch the RDF links begin to match up to reveal
interesting direct and indirect information about the entities
‣ explore some of the other DBpedia applications and decide whether any are
relevant to your own work
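The core idea behind a relation finder can be sketched as a breadth-first search over triples. Each link is treated as connecting two entities in either direction, and the search returns the shortest chain of statements between them. This is an illustration of the technique, not the Relation Finder's actual algorithm. The first three triples come from the earlier table. The BK, ULN and BTH triples are hypothetical additions.

```python
from collections import deque

TRIPLES = [
    ("GM", "organisation", "PCL"),
    ("PCL", "town/city", "BFS"),
    ("BFS", "country", "UK"),
    # hypothetical additions for illustration
    ("BK", "organisation", "ULN"),
    ("ULN", "town/city", "BTH"),
    ("BTH", "country", "UK"),
]


def find_relation(start, goal, triples):
    """Breadth-first search for the shortest chain of triples linking two entities.

    Links are followed in both directions; returns a list of triples, or None."""
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((o, (s, p, o)))
        neighbours.setdefault(o, []).append((s, (s, p, o)))
    previous = {start: None}          # entity -> (entity we came from, triple used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                node_before, triple = previous[node]
                path.append(triple)
                node = node_before
            return path[::-1]
        for nxt, triple in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, triple)
                queue.append(nxt)
    return None


for triple in find_relation("GM", "BK", TRIPLES):
    print(triple)
```

Here the indirect relation surfaces through a shared value (both organisations are in UK towns), the same kind of "interesting indirect information" the exercise asks you to look for.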
58. SPARQL
Exploring, mining and combining RDF triples using a simple query language
‣ requires a SPARQL “endpoint”, a query engine that examines the data and presents results
‣ endpoints may be generic (querying any data set) or specific (serving one particular data set)
‣ a typical SPARQL query using the DBpedia data set:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# list each resource that has a foaf:name and is typed foaf:Person
# in at least two different named graphs
SELECT DISTINCT ?person
FROM <http://dbpedia.org/>
WHERE {
  ?person foaf:name ?name .
  GRAPH ?g1 { ?person a foaf:Person }
  GRAPH ?g2 { ?person a foaf:Person }
  FILTER(?g1 != ?g2) .
}
‣ potentially a very powerful way to search across many resources
http://dbpedia.org/snorql/
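What an endpoint does at its core is match a pattern of triples containing variables against the data and return every consistent set of variable bindings. The toy Python matcher below illustrates that idea for a basic graph pattern only. It is an assumption-laden sketch, not SPARQL: there is no FILTER, GRAPH, OPTIONAL or prefix handling, and variables are simply strings starting with "?". The data is taken from the earlier table.

```python
def is_var(term):
    """Treat strings starting with '?' as query variables."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns (a join)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # bind the variable, or check it agrees with an earlier binding
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # every position matched: continue with the remaining patterns
            yield from match(rest, triples, new)


DATA = [
    ("GM", "name", "George Munroe"),
    ("GM", "organisation", "PCL"),
    ("PCL", "name", "Platypus Consultancy"),
    ("PCL", "town/city", "BFS"),
    ("BFS", "name", "Belfast"),
]

# Roughly: SELECT ?person WHERE { ?person organisation ?org .
#                                 ?org town/city ?town . ?town name "Belfast" }
query = [("?person", "organisation", "?org"),
         ("?org", "town/city", "?town"),
         ("?town", "name", "Belfast")]
for result in match(query, DATA):
    print(result)
```

Shared variables such as ?org and ?town are what join one statement to the next, the same mechanism that lets a real SPARQL query combine triples from many resources.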
59. The “open data” movement
http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html