Linked_Open_Data_Rome_Netcamp_13

OPEN DATA & OPEN CULTURE
Michele Piunti
Whitehall Reply

OUTLINE

• Background
• Motivations
• Approaches
• Open Data As A Service
• Cultural Heritage Hacking
• Case Study


Obama Vision
"In the coming year, we'll also work to rebuild people's faith in
the institution of government. Because you deserve to know exactly
how and where your tax dollars are being spent, you'll be able to go
to a website and get that information for the very first time in
history. Because you deserve to know when your elected officials are
meeting with lobbyists, I ask Congress to do what the White House has
already done -- put that information online."

President Obama
The State of the Union Speech - 2011
http://www.whitehouse.gov/state-of-the-union-2011

What Open Data is
Open data is the idea that certain data should be freely available to
everyone to use and republish as they wish, without restrictions from
copyright, patents or other mechanisms of control.

The goals of the open data movement are similar to those of other
"Open" movements such as open source, open content, and open
access (Ref. Wikipedia)

Citizen centricity comes from citizen empowerment, namely
disintermediation with respect to traditional actors

Expected Payoff
• Ubiquitous access
• Reusability
• Optimization
• Social and Cultural enrichment

ROI: "A greater than 100X return on investment in direct Federal IT spending
through economies of scope is achievable by equipping agencies with an
Open Data platform that is the shared foundation for numerous programs that
are independently funded today"
[http://www.socrata.com/blog/open-data-as-a-platform/]

Open Data turns out to be a formidable tool for:

• Analyzing spending reviews of administrations' expenses
• Enforcing fact checking on declarations, policies and campaigns

Where Open Data is
http://census.okfn.org/
https://nycopendata.socrata.com/
https://dati.lombardia.it
..and counting


Italian Digital Agenda : Open Data + E-Gov
In 2012 Italy established a "control room" of experts aimed at promoting Open
Data in the context of a digital agenda

Open Data is integrated with E-Gov
1. Enabling Infrastructures
2. PA digital switchover
3. Purposive and regulative set of norms and rules
4. Communication plan

The challenge is optimizing services and costs:
• Digital Identities and related services, unified and web based registry
offices, e-payments, continuous census, interoperability of EU platforms
• Digital health, Cultural Heritage
• eLearning, eProcurement, eRecruitment

dati.gov.it

dati.gov.it/content/infografica

dati.gov.it/content/parte-lopen-data-default

Roadmap to Open Data

Data assets analysis
• Relevant Datasets
• Customer internal processes analysis

Identify Use Cases and Final Users
• Best Practices and similar datasets
• Linked Data Cloud

Identify ROI
• Risk assessment
• Savings
• Identify non quantifiable ROI

Architecture Definition
• Identify service level
• W3C Compliance

Legal Issues
• Copyright
• Licensing
• Liability of Data update

LOD Feasibility report, Executive Plan and Road Map

Identify Datasets
• Data Analysis
• Data transformation
• Normalization

Development
• Architecture Definition
• Datasets Store
• Internal linking
• SPARQL Endpoint

Data Enrichment
• Metadata description, Ontology, RDF
• External Linking
• External component (GIS, Data-Mining, BI Analysis modules)

Validation and Publication
• W3C Compliance
• Data Localization, History
• Communication Plan

Composition of Services
• Documentation
• Build Ecosystem
• Public API

LOD Services and Platform

Service Development Knowledge Transfer


Legal Restrictions, Privacy, Licenses
Multiple legal or regulatory restrictions may apply to the use of the data.


5★ Open Data
Tim Berners-Lee, the inventor of the Web and Linked Data
initiator, suggested a 5 star deployment scheme for Open Data.

★ make PUBLIC stuff available on the Web (whatever
format, .jpeg .pdf) under an open license

★★ make it available as structured data (e.g., Excel
instead of image scan of a table)

★★★ use non-proprietary formats
(e.g., CSV instead of Excel)

★★★★ use URIs to denote things, so that people
can point at your stuff

★★★★★ link your data to other data to provide context
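
The scheme is cumulative: each star presupposes the ones before it. A minimal Python sketch of that reading follows; the `Dataset` fields and the `star_rating` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_license: bool      # on the Web under an open license
    structured: bool        # structured data (e.g. a spreadsheet, not a scan)
    non_proprietary: bool   # non-proprietary format (e.g. CSV)
    uses_uris: bool         # things are denoted by URIs
    linked: bool            # links out to other data


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level counts only if all lower levels hold."""
    levels = [ds.open_license, ds.structured, ds.non_proprietary,
              ds.uses_uris, ds.linked]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


scanned_pdf = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(scanned_pdf), star_rating(csv_file))  # prints: 1 3
```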


W3C Roadmap
Having Standard Names/URIs for All Government
Objects aids in discoverability, improves
metadata, and ensures authenticity.

• Provide permanent, patterned and/or
discoverable URI/URLs to your data
• Create a web page with a plain language
description of the dataset to help search
engines find the data, so people can use it.
• Provide links out to other data and documentation.
• Ensure that data is findable and can be referenced for as long as
people need it
• Data published in industry standards like (X)HTML, XML and RDF
can be used as an object database or RESTful API
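
A "patterned" URI can be as simple as a fixed path template. The sketch below assumes a hypothetical publisher domain (`data.example.org`) and a hypothetical `/id/<sector>/<dataset>/<item>` pattern; neither comes from the W3C text.

```python
from urllib.parse import quote

BASE = "http://data.example.org"  # hypothetical publisher domain


def dataset_uri(sector: str, dataset: str, item_id: str = "") -> str:
    """Build a predictable URI of the form /id/<sector>/<dataset>[/<item>]."""
    parts = [sector, dataset] + ([item_id] if item_id else [])
    # Lower-case, hyphenate and percent-encode every path segment.
    path = "/".join(
        quote(p.strip().lower().replace(" ", "-"), safe="") for p in parts
    )
    return f"{BASE}/id/{path}"


print(dataset_uri("culture", "Museum Collections"))
print(dataset_uri("culture", "Museum Collections", "obj 42"))
```

Because the pattern is fixed, anyone who knows one URI can guess its neighbours, which is what makes the data discoverable.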

Linked Open Data
Recommended best practice for exposing, sharing, and connecting
pieces of data, information, and knowledge on the Semantic Web
using URIs, OWL and RDF.

1. Requires Ontologies to be applied to
data

2. Allows heterogeneous Nodes to be
traversed in a semantically coherent
fashion


Botticelli Case
One may specify that the author mentioned for "La Primavera" at the Uffizi
Museum LINKS to exactly the same person as the one described on
DBpedia (the LOD version of Wikipedia)

http://live.dbpedia.org/page/Primavera_(painting)

http://live.dbpedia.org/page/Sandro_Botticelli

http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli)

The link is not just a hyperlink because it is typed.
In the BOTTICELLI page, the information about his life and works is
structured, by means of the topology
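
A typed link can be pictured as a (subject, predicate, object) statement in which the predicate names the kind of link. In this sketch the museum URI is a hypothetical placeholder; the two predicates are the Dublin Core `creator` and OWL `sameAs` terms, chosen here only as plausible link types.

```python
DBP = "http://live.dbpedia.org/page/"
ARTWORK = "http://museum.example.org/artwork/primavera"  # hypothetical museum URI

CREATOR = "http://purl.org/dc/terms/creator"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Each statement is (subject, predicate, object); the predicate is the link type.
triples = [
    (ARTWORK, CREATOR, DBP + "Sandro_Botticelli"),
    (ARTWORK, SAME_AS, DBP + "Primavera_(painting)"),
]


def objects_of(statements, subject, predicate):
    """Follow only the links of one given type from a subject."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def to_ntriples(statements):
    """Serialize the statements in an N-Triples-like line format."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


print(objects_of(triples, ARTWORK, CREATOR))
print(to_ntriples(triples))
```

A plain hyperlink would only say that the two pages are related; the predicate says how.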

Semantic Network
Enables reasoning: OWL-DL, based on Description Logics, represents a
decidable fragment of First Order Logic

Sandro_Botticelli → category:Italian_Renaissance_painters
category:Italian_Renaissance_painters → category:Quattrocento_painters
therefore: Sandro_Botticelli → category:Quattrocento_painters

http://live.dbpedia.org/page/Category:Italian_Renaissance_painters

http://live.dbpedia.org/page/Sandro_Botticelli

http://live.dbpedia.org/page/Category:Quattrocento_painters
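
The inference in this example can be sketched in plain Python as a transitive walk over category links. This is only a toy illustration of the reasoning step, not an OWL-DL reasoner; the dictionary names are hypothetical.

```python
# Asserted facts, taken from the example on this slide.
member_of = {"Sandro_Botticelli": {"Italian_Renaissance_painters"}}
subcategory_of = {"Italian_Renaissance_painters": {"Quattrocento_painters"}}


def inferred_categories(entity: str) -> set:
    """Return every category of an entity, following category links transitively."""
    result = set()
    frontier = list(member_of.get(entity, ()))
    while frontier:
        category = frontier.pop()
        if category in result:
            continue  # already visited, avoids looping on cycles
        result.add(category)
        frontier.extend(subcategory_of.get(category, ()))
    return result


print(sorted(inferred_categories("Sandro_Botticelli")))
# prints: ['Italian_Renaissance_painters', 'Quattrocento_painters']
```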


Linked Open Data Cloud (2011)

Doubled in size every 10 months, since 2007

Domains: Media, User-generated, Geographic, Publications, Government,
Cross-Domain, Life Sciences


Recipes for Serving Information as Linked Data
• Entities must be identified with referenceable HTTP URIs.
• When a client requests the MIME type application/rdf+xml, the data
source must return an RDF/XML description.
• RDF descriptions should also contain RDF links to resources
provided by other data sources, so that clients can navigate the Web
of Data as a whole by following RDF links.
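
A minimal sketch of the three recipes, assuming a simplified Accept-header check (q-values are ignored) and hypothetical helper names `wants_rdf` and `describe`. The RDF/XML is built with Python's standard `xml.etree.ElementTree`.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def wants_rdf(accept_header: str) -> bool:
    """True if the client lists application/rdf+xml (q-values ignored here)."""
    offered = [p.split(";")[0].strip().lower() for p in accept_header.split(",")]
    return "application/rdf+xml" in offered


def describe(uri: str, label: str, see_also: str, accept_header: str):
    """Return (content_type, body) for the entity identified by an HTTP URI."""
    if not wants_rdf(accept_header):
        return "text/html", f"<html><body><h1>{html.escape(label)}</h1></body></html>"
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    ET.SubElement(desc, f"{{{RDFS}}}label").text = label
    # RDF link to a resource provided by another data source.
    ET.SubElement(desc, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": see_also})
    return "application/rdf+xml", ET.tostring(root, encoding="unicode")


content_type, body = describe(
    "http://museum.example.org/artwork/primavera",   # hypothetical entity URI
    "La Primavera",
    "http://live.dbpedia.org/page/Primavera_(painting)",
    "application/rdf+xml, text/html;q=0.8",
)
print(content_type)
print(body)
```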


Towards Government 2.0

"Government IT needs to redefine itself as Government as a Platform"

Open Data is the platform for Open Government.
Actors:
• Institutions: to better deliver services to citizens
• Civic-minded developers: to serve themselves and others by extending
the platform (e.g. mash-ups, applications)

What actors need: Open Data management platforms, consistent admin tools
and a powerful Open Data Catalog to consolidate the entire Open Data
lifecycle (STEP 1-5)


Open Data As-A-Service

Mobile Apps and Web Apps access the data through a REST API

Data-on-Demand: data are not locked inside CMS applications but are
consumed on demand, As-a-Service
Data as Web Resources: RESTful APIs make it possible to retrieve data as
a web resource (through a URI)
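
As a sketch of "data as a web resource": each dataset gets its own URI, and filters become query parameters. The host, the `/resource/<id>.json` layout and the sample response below are all hypothetical; a real platform defines its own conventions.

```python
import json
from urllib.parse import urlencode


def resource_url(base: str, dataset_id: str, **filters: str) -> str:
    """Build the URI of a dataset resource, with optional filter parameters."""
    url = f"{base.rstrip('/')}/resource/{dataset_id}.json"
    query = urlencode(sorted(filters.items()))
    return f"{url}?{query}" if query else url


url = resource_url("https://data.example.org/", "museums", city="Firenze")
print(url)  # prints: https://data.example.org/resource/museums.json?city=Firenze

# Hypothetical JSON body that a client (mobile or web app) would receive.
sample_response = '[{"museum": "Uffizi", "city": "Firenze"}]'
rows = json.loads(sample_response)
print(rows[0]["museum"])  # prints: Uffizi
```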

Socrata: GovStat Approach
Socrata is releasing parts of its platform as Open Source
on GitHub
https://github.com/open-data-standards

The business model is moving towards advanced data analysis tools, mining,
real-time monitoring and decision support systems
http://www.socrata.com/govstat/


Open Data in a Cultural Heritage Scenario
Art Galleries, Libraries, Archives and Museums (GLAMS) are exploring the
added value of sharing their data resources as LOD

Key facts:
• Rich and structured data sets accumulated over many years by experts
• Ability to reach out to audiences both to enrich datasets and to evaluate
services
• Long-standing expertise in metadata management and (co-)curation
• Authoritative knowledge on a wide range of subjects

GLAMS LOD Examples
In Agora, the Rijksmuseum Amsterdam and the
Netherlands Institute for Sound and Vision collaborate
with the Computer Science and History departments at
the VU to integrate their collections and enrich them with
historical information, to facilitate a more comprehensive
understanding of the historical dimension of objects in
online heritage collections. [http://agora.cs.vu.nl/]

The Amsterdam Museum was the first museum in the
Netherlands to convert its complete museum collection
database to RDF. The resulting resource consists of
more than 5 million RDF triples describing more than
70,000 cultural heritage objects. Several working
examples use this dataset, such as a mobile city
guide.

GLAMS LOD Examples
Europeana is a pan-European initiative that provides
access to millions of objects as LOD through an API. The
Europeana Thought Lab search interface shows how
LOD principles can aid the search process. Europeana
has been a strong supporter of the uptake of CC0, the
"no rights reserved" option among the Creative Commons licenses
[http://pro.europeana.eu]

Open Images provides access to a large and growing
collection of Creative Commons licensed archive
material. The metadata is converted to RDF, allowing
the creation of rich semantic links to other
datasets such as the Amsterdam Museum dataset
[http://www.openimages.eu/]


PROS and CONS of LOD for GLAMS
PROS
• Driving users to online content held by GLAMS (e.g., by improved
search engine optimization);
• Stimulating collaboration in the library, archives and museums
domain and beyond, for instance by inviting people to clean/enrich
existing data;
• Enabling new scholarship that can only be done with open data;
• Allowing the creation of new services for discovery;
• quoting Verwayen (2011), "increas[ing] relevance to digital society."

CONS
• Loss of attribution to the "memory institution", which may in turn
decrease the value of the artworks
• Loss of potential income: open data may not be sold

Metrics of Success

Income: measured in money

Public Outreach: to measure the
number of (online) visitors

Reuse: to measure the use of data and content by heritage
institutions themselves and by others

Public Participation: to measure the amount of added metadata
and content


Developing Open Linked Data
(with Graph Database)

Developing Open Linked Data
We may recognize a few contingencies in our scenario:
• Exponential growth in data volumes
• Rise of connectedness
• Increase in degrees of semi-structure
• Structures and schemas emerge rather than being defined upfront

Key facts:
• Volume: the size of the stored data
• Velocity: the rate at which data changes over time
• Variety: the degree to which data is regularly or irregularly
structured, dense or sparse, and importantly connected or
disconnected

ER Approach
We do not know the structure of the documents at design time.
Adopting an ER approach, we have to define vertical tables
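
A "vertical table" stores one row per attribute instead of one column per attribute. The sketch below uses Python's built-in `sqlite3`; the table name, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (document, attribute): the schema never changes,
# whatever attributes the documents turn out to have.
conn.execute("CREATE TABLE doc_attribute (doc_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO doc_attribute VALUES (?, ?, ?)",
    [
        (1, "title", "La Primavera"),
        (1, "author", "Sandro Botticelli"),
        (2, "title", "Venue list"),
        (2, "city", "Firenze"),  # an attribute that document 1 does not have
    ],
)


def load_document(doc_id: int) -> dict:
    """Reassemble one document from its attribute rows."""
    rows = conn.execute(
        "SELECT name, value FROM doc_attribute WHERE doc_id = ?", (doc_id,)
    )
    return dict(rows.fetchall())


print(load_document(1))
print(load_document(2))
```

The price of this flexibility is that every read has to reassemble the document, and the database can no longer check types or required fields.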


Relational Model Weakness
In the ER model, relationships are semantics-free (they carry no direction or name)

• As the amount of semi-structured information increases, the
relational model becomes burdened
• Maintenance overheads: join tables and maintaining foreign key constraints
just to make the database work.
• Large join tables, sparsely populated rows and lots of null-checking logic

• Reciprocal queries are difficult to handle in today's semi-structured,
real-world cases
• Recommendation systems, social networks
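
A small sketch of the join-table pattern, using Python's built-in `sqlite3` with a hypothetical `person`/`friend` schema. Each extra hop needs another self-join on the join table, and the reciprocal question needs its own query in the opposite direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friend (person_id INTEGER REFERENCES person(id),
                     friend_id INTEGER REFERENCES person(id));
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO friend VALUES (1, 2), (2, 3);
""")

# Depth 2: one additional self-join on the join table per hop.
FRIENDS_OF_FRIENDS = """
SELECT DISTINCT p.name
FROM friend f1
JOIN friend f2 ON f2.person_id = f1.friend_id
JOIN person p ON p.id = f2.friend_id
WHERE f1.person_id = ? AND f2.friend_id <> f1.person_id
ORDER BY p.name
"""

# Reciprocal query: who lists this person as a friend?
BEFRIENDED_BY = """
SELECT p.name
FROM friend f
JOIN person p ON p.id = f.person_id
WHERE f.friend_id = ?
ORDER BY p.name
"""


def friends_of_friends(person_id: int) -> list:
    return [row[0] for row in conn.execute(FRIENDS_OF_FRIENDS, (person_id,))]


def befriended_by(person_id: int) -> list:
    return [row[0] for row in conn.execute(BEFRIENDED_BY, (person_id,))]


print(friends_of_friends(1))  # prints: ['Carol']
print(befriended_by(3))       # prints: ['Bob']
```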


Aggregate Stores Weakness
Aggregates can mimic relationships by embedding cross-store
identifiers, but:
• It is up to the developer to manage, infer and reify useful knowledge
from them
• They do not provide index-free adjacency
• Deletions must be checked

• Traversing relationships is expensive, each link requiring an index
lookup
• Brute-force computation over an entire data set is O(n), since all n aggregates in
the data store must be considered. That's far too costly where we'd prefer
O(log n)
• Impractical in real-time scenarios
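
The points above can be sketched with a plain dictionary standing in for an aggregate store. All keys and helper names are hypothetical; the point is only the asymmetry between the forward lookup and the reverse scan.

```python
# Aggregates keyed by id; relationships are mimicked by embedding foreign ids.
store = {
    "user:1": {"name": "Alice", "owns": ["doc:1", "doc:2"]},
    "user:2": {"name": "Bob", "owns": ["doc:3"]},
    "doc:1": {"title": "La Primavera"},
    "doc:2": {"title": "Adoration of the Magi"},
    "doc:3": {"title": "Venues"},
}


def documents_of(user_id: str) -> list:
    """Forward direction: one key lookup per embedded identifier."""
    return [store[doc_id]["title"] for doc_id in store[user_id]["owns"]]


def owners_of(doc_id: str) -> list:
    """Reverse direction: nothing points back, so all n aggregates are scanned."""
    return [key for key, agg in store.items() if doc_id in agg.get("owns", [])]


def delete(doc_id: str) -> None:
    """The store does not check references; the application has to."""
    if owners_of(doc_id):
        raise ValueError(f"{doc_id} is still referenced")
    del store[doc_id]


print(documents_of("user:1"))  # prints: ['La Primavera', 'Adoration of the Magi']
print(owners_of("doc:3"))      # prints: ['user:2']
```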


Storing data in Graphs
Graph theory was pioneered by Euler in the 18th century and has received
multidisciplinary contributions across the centuries
• Facebook, Google and Twitter have centered their business models
around their own proprietary distributed graph technologies

Facebook TAO
Twitter FlockDB

Graph databases store information in ways that much more closely
resemble the way the world is organized and the way humans "think
about" data.
Top 10 Gartner IT technologies in 2013: "[..] are designed to support
new transaction, interaction and observation use cases involving web
scale, mobile, cloud and clustered environments"


From Relational to Graph based Modeling
Graph DBs treat relationships as first-class abstractions of the data
model

• A graph contains nodes and relationships
• Nodes contain properties (key-value pairs)
• Relationships are named, directed
and always have a start and end
node
• Relationships can also contain
properties

A Graph -[:RECORDS_DATA_IN]-> Nodes -[:WHICH_HAVE]->
Properties.
Nodes -[:LINKED_BY]-> Relationships
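
The model just described fits in a few lines of Python. This is a toy in-memory sketch, not how any particular graph database stores data; the class layout, the `relate` helper and the sample `since` property are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    properties: dict = field(default_factory=dict)               # key-value pairs
    outgoing: list = field(default_factory=list, repr=False)     # direct references


@dataclass(eq=False)
class Relationship:
    name: str            # relationships are named ...
    start: Node          # ... directed, with a start node ...
    end: Node            # ... and an end node
    properties: dict = field(default_factory=dict)  # and may carry properties


def relate(start: Node, name: str, end: Node, **properties) -> Relationship:
    """Create a relationship and attach it to its start node."""
    rel = Relationship(name, start, end, properties)
    start.outgoing.append(rel)
    return rel


user = Node({"name": "Alice"})
document = Node({"title": "La Primavera"})
owns = relate(user, "OWNS", document, since=2013)  # sample property value

# Following a relationship is a direct reference, with no lookup by identifier.
print([r.end.properties["title"] for r in user.outgoing if r.name == "OWNS"])
# prints: ['La Primavera']
```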

From Relational to Graph based Modeling

Shake an RDBMS while keeping all the relationships, and you'll see a
graph

Where RDBMSs are optimized for aggregated data, Graph Databases
are optimized for highly connected data

Traversing Map Performances
Friend of Friend (FoF) problem : for any person in a social network,
look for a route to some other person in the graph at most depth=N
hops away.
For a social network containing 1,000,000 people, each with ~50 friends,
the results (*) show that graph databases are the best choice

Depth   RDBMS Execution Time (s)   Neo4j Execution Time (s)   Returned Records
2       0.016                      0.01                       ~2500
3       30.267                     0.168                      ~110,000
4       1543.505                   1.359                      ~600,000
5       Unfinished                 2.132                      ~800,000

(*) Graph Databases, O'Reilly, to appear
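
The FoF traversal itself can be sketched as a depth-limited breadth-first search over adjacency lists. This is a toy illustration on a four-person graph, not the benchmark cited above; the function name and sample data are hypothetical.

```python
from collections import deque


def within_depth(graph: dict, start: str, max_depth: int) -> dict:
    """Return {person: hops} for everyone reachable in at most max_depth hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    del hops[start]
    return hops


friends = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob", "Dave"],
    "Dave": ["Carol"],
}
print(within_depth(friends, "Alice", 2))  # prints: {'Bob': 1, 'Carol': 2}
```

The work done depends on the neighbourhood that is actually visited, not on the total number of people stored.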

A Case Study
Using Neo4j and Spring Data

Neo4j Graph DB

• intuitive, using a graph model for data representation
• reliable, with full ACID transactions
• durable and fast, using a custom disk-based, native storage engine
• massively scalable, up to several billion nodes/relationships/properties
• highly-available, when distributed across multiple machines
• expressive, with a powerful, human readable graph query language
• fast, with a powerful traversal framework for high-speed graph queries
• embeddable, with a few small jars
• simple, accessible by a convenient REST interface or an
object-oriented Java API


Spring Data and Neo4J

Promotes POJO-based development for the Neo4j Graph Database.
It maps annotated entity classes to the Neo4j Graph Database with
advanced mapping functionality.

Seamless integration of the Cypher Query Language


Spring Data Neo4j
It is possible to derive queries for domain entities from finder method
names like Iterable<T>

• @Indexed fields will be converted into index lookups in the start
clause
• navigation along relationships will be reflected in the match clause
• properties with operators will end up as expressions in the
where clause

Open Linked Graph

User -[:OWNS]-> Document
Document -[:INCLUDES]-> Node
Node -[:DBP_LINKED]-> DBPedia URI
Node -[:LOCATED]-> Venue
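
A minimal sketch of this graph as a list of typed edges, with one traversal across it. The edge directions and all instance identifiers (`user:alice`, `doc:1`, ...) are assumptions made for illustration; only the relationship names come from the slide.

```python
# (start, relationship, end) edges; instance identifiers are hypothetical.
edges = [
    ("user:alice", "OWNS", "doc:1"),
    ("doc:1", "INCLUDES", "node:primavera"),
    ("doc:1", "INCLUDES", "node:adoration"),
    ("node:primavera", "DBP_LINKED",
     "http://live.dbpedia.org/page/Primavera_(painting)"),
    ("node:primavera", "LOCATED", "venue:uffizi"),
    ("node:adoration", "LOCATED", "venue:uffizi"),
]


def follow(starts, relationship: str) -> set:
    """Follow one relationship type from a set of start vertices."""
    starts = set(starts)
    return {end for start, rel, end in edges
            if rel == relationship and start in starts}


def venues_for(user: str) -> set:
    """User -> OWNS -> Document -> INCLUDES -> Node -> LOCATED -> Venue."""
    documents = follow({user}, "OWNS")
    nodes = follow(documents, "INCLUDES")
    return follow(nodes, "LOCATED")


print(venues_for("user:alice"))  # prints: {'venue:uffizi'}
print(follow({"node:primavera"}, "DBP_LINKED"))
```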

Ideas for Changing the Future

1. User Centric Experience
2. Relevance Based Approach
3. Big Data
4. Environments and Societies
5. Smart Cities


Thanks

Michele Piunti
m.piunti@reply.eu

Modular Approach

• Mobile Interface
• Web UI: JQuery UI, Kendo UI, Bootstrap
• Public API: Spring REST
• Open CMS
• Enterprise Framework: Spring, J2EE
• SaaS Manager
• OLAP Analysis, Mining: Pentaho
• Persistence Manager: Spring Data, MyBatis, Hibernate
• SOCRATA (**), CKAN
• Geographic Information System
• Data Storage: NoSQL (Neo4J, MongoDB), SQL DBMS (Oracle, MySQL, PostgreSQL)
• Ontology: RDF, AML (*)

Ongoing collaborations:
(*) ST-Lab, ISTC-CNR
(**) Socrata, Inc.
