OPEN DATA & OPEN CULTURE
                 Michele Piunti
                Whitehall Reply
OUTLINE


    •   Background
    •   Motivations
    •   Approaches
    •   Open Data As A Service
    •   Cultural Heritage Hacking
    •   Case Study




Background
Obama Vision
    “In the coming year, we’ll also work to rebuild people’s faith in the
     institution of government. Because you deserve to know exactly how and
     where your tax dollars are being spent, you’ll be able to go to a website
     and get that information for the very first time in history. Because you
     deserve to know when your elected officials are meeting with lobbyists,
     I ask Congress to do what the White House has already done -- put that
     information online.”

    President Obama, The State of the Union Speech - 2011
    http://www.whitehouse.gov/state-of-the-union-2011
What Open Data is
    Open data is the idea that certain data should be freely available to
     everyone to use and republish as they wish, without restrictions from
     copyright, patents or other mechanisms of control.

    The goals of the open data movement are similar to those of other
     "Open" movements such as open source, open content, and open
     access (Ref. Wikipedia)




    Citizen centricity comes from citizen empowerment, namely
     disintermediation with respect to traditional actors
Expected Payoff
    •   Ubiquitous access
    •   Reusability
    •   Optimization
    •   Social and Cultural enrichment



    ROI: “A greater than 100X return on investment in direct Federal IT spending
    through economies of scope is achievable by equipping agencies with an
    Open Data platform that is the shared foundation for numerous programs that
    are independently funded today”
    [http://www.socrata.com/blog/open-data-as-a-platform/]

Open Data turns out to be a formidable tool for:

• Analyzing spending reviews of administrations’ expenses
• Enforcing fact checking on declarations, policies and campaigns
Where Open Data is
http://census.okfn.org/
https://nycopendata.socrata.com/
https://dati.lombardia.it
..and counting




Italian Digital Agenda : Open Data + E-Gov
    In 2012 Italy established a “control room” of experts aimed at promoting Open
      Data in the context of a digital agenda

    Open Data is integrated with E-Gov
      1. Enabling Infrastructures
      2. Public Administration (PA) digital switchover
      3. Purposive and regulative set of norms and rules
      4. Communication plan

    The challenge is optimizing services and costs:
    • Digital Identities and related services, unified and web based registry
      offices, e-payments, continuous census, interoperability of EU platforms
    • Digital health, Cultural Heritage
    • eLearning, eProcurement, eRecruitment
dati.gov.it



                                                    dati.gov.it/content/infografica




     dati.gov.it/content/parte-lopen-data-default
Approaches
Roadmap to Open Data

     Steps leading to the LOD Feasibility report, Executive Plan and Road Map:

      1. Data assets analysis
         • Relevant Datasets
         • Customer internal processes analysis
      2. Identify Use Cases and Final Users
         • Best Practices and similar datasets
         • Linked Data Cloud
      3. Identify ROI
         • Risk assessment
         • Savings
         • Identify non quantifiable ROI
      4. Architecture Definition
         • Identify service level
         • W3C Compliance
      5. Legal Issues
         • Copyright
         • Licensing
         • Liability of Data update

     Steps leading to the LOD Services and Platform
     (phases: Analysis, Service Development, Knowledge Transfer):

      1. Identify Datasets
         • Data Analysis
         • Data transformation
         • Normalization
      2. Development
         • Architecture Definition
         • Datasets Store
         • Internal linking
         • SPARQL Endpoint
      3. Data Enrichment
         • Metadata description, Ontology, RDF
         • External Linking
         • External component (GIS, Data-Mining, BI modules)
      4. Validation and Publication
         • W3C Compliance
         • Data Localization, History
         • Communication Plan
      5. Composition of Services
         • Documentation
         • Build Ecosystem
         • Public API
Legal Restrictions, Privacy, Licenses
     Multiple legal or regulatory restrictions on the use of the data.




5★ Open Data
     Tim Berners-Lee, the inventor of the Web and Linked Data
     initiator, suggested a 5 star deployment scheme for Open Data.


     ★       make PUBLIC stuff available on the Web (whatever format, .jpeg .pdf)
             under an open license
     ★★      make it available as structured data (e.g., Excel instead of image
             scan of a table)
     ★★★     use non-proprietary formats (e.g., CSV instead of Excel)
     ★★★★    use URIs to denote things, so that people can point at your stuff
     ★★★★★   link your data to other data to provide context
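The 5 star scheme is cumulative: each star presupposes the ones before it. A minimal Python sketch of that reading follows; the `Dataset` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (names are assumptions)."""
    on_web_open_license: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level counts only if all previous levels are met."""
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_pdf = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
```

Under these assumptions a scanned PDF under an open license rates one star, while an openly licensed CSV file rates three.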
W3C Roadmap
 Having Standard Names/URIs for All Government
  Objects aids in discoverability, improves
  metadata, and ensures authenticity.

 • Provide permanent, patterned and/or
   discoverable URI/URLs to your data
 • Create a web page with a plain language
   description of the dataset to help search
   engines find the data, so people can use it.
 • Provide links out to other data and documentation.
 • Ensure that data is findable and can be referenced for as long as
   people need it
 • Data published in industry standards like (X)HTML, XML and RDF
   can be used as an object database or RESTful API
Linked Open Data
     Recommended best practice for exposing, sharing, and connecting
      pieces of data, information, and knowledge on the Semantic Web
      using URIs, OWL and RDF.

                                   1. Requires Ontologies to be applied to
                                      data

                                   2. Allows heterogeneous Nodes to be
                                      traversed in a semantically coherent
                                      fashion




Botticelli Case
      One may specify that the author of “La Primavera”, as recorded at the Uffizi
       Museum, LINKS to exactly the same person as the one described on
       DBpedia (the LOD version of Wikipedia)

                                            http://live.dbpedia.org/page/Primavera_(painting)




http://live.dbpedia.org/page/Sandro_Botticelli




                      http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli)



      The link is not just a hyperlink because it is typed.
      In the BOTTICELLI page, the information about his life and works is
        structured by means of the graph topology
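A typed link can be pictured as a (subject, predicate, object) triple, where the predicate carries the type. The sketch below uses plain Python tuples rather than an RDF library; the predicate name "author" is an illustrative assumption and is not taken from the DBpedia vocabulary.

```python
# Each statement is a (subject, predicate, object) triple; the predicate is the link type.
# The predicate name "author" is an assumption made for illustration only.
DBP = "http://live.dbpedia.org/page/"

triples = [
    (DBP + "Primavera_(painting)", "author", DBP + "Sandro_Botticelli"),
    (DBP + "Adoration_of_the_Magi_of_1475_(Botticelli)", "author", DBP + "Sandro_Botticelli"),
]


def subjects_with(predicate, obj, store):
    """Return, sorted, every subject linked to obj through the given predicate."""
    return sorted(s for s, p, o in store if p == predicate and o == obj)


works = subjects_with("author", DBP + "Sandro_Botticelli", triples)
```

Because both paintings point at the same URI, asking for everything authored by that URI returns both works, which an untyped hyperlink could not express.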
Semantic Network
     Enable Reasoning: OWL-DL, based on Description Logics, represents a
      decidable fragment of First Order Logic

     Sandro_Botticelli → category:Italian_Renaissance_painters
     category:Italian_Renaissance_painters → category:Quattrocento_painters
     ⇒ Sandro_Botticelli → category:Quattrocento_painters
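The inference step above can be approximated as reachability over category links. The following Python sketch is plain graph traversal, not an OWL-DL reasoner; the `broader` mapping and the function name are hypothetical.

```python
# Direct links as stated on the slide: entity -> category, category -> broader category.
broader = {
    "Sandro_Botticelli": {"category:Italian_Renaissance_painters"},
    "category:Italian_Renaissance_painters": {"category:Quattrocento_painters"},
}


def infer_categories(entity, links):
    """Return every category reachable from entity by following links transitively."""
    found, stack = set(), [entity]
    while stack:
        for parent in links.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


inferred = infer_categories("Sandro_Botticelli", broader)
```

The second category is never asserted directly for Botticelli; it is derived by following the two stated links.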



                                             http://live.dbpedia.org/page/Category:Italian_Renaissance_painters


 http://live.dbpedia.org/page/Sandro_Botticelli




                                                  http://live.dbpedia.org/page/Category:Quattrocento_painters

Linked Open Data Cloud (2011)


                                 Doubled in size
                                 every 10 months,
                                 since 2007

                                 Media
                                 User-generated
                                 Geographic
                                 Publications
                                 Government
                                 Cross-Domain
                                 Life Sciences




Recipes for Serving Information as Linked Data
     • Entities must be identified with referenceable HTTP URIs.
     • When a client requests the MIME type application/rdf+xml, the data
       source must return an RDF/XML description.
     • RDF descriptions should also contain RDF links to resources
       provided by other data sources, so that clients can navigate the Web
       of Data as a whole by following RDF links.
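The second recipe amounts to HTTP content negotiation. Below is a minimal Python sketch of the decision only, with no web server involved; the function names, the fallback to text/html and the example URI are assumptions, and q-values in the Accept header are deliberately ignored.

```python
RDF_XML = "application/rdf+xml"


def negotiate(accept_header: str) -> str:
    """Pick a representation from an HTTP Accept header (q-values are ignored here)."""
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return RDF_XML if RDF_XML in offered else "text/html"


def describe(uri: str, accept_header: str):
    """Return (content_type, body) for the entity identified by uri."""
    content_type = negotiate(accept_header)
    if content_type == RDF_XML:
        body = (
            '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
            f'<rdf:Description rdf:about="{uri}"/>'
            "</rdf:RDF>"
        )
    else:
        body = f"<html><body>{uri}</body></html>"
    return content_type, body


ctype, body = describe("http://example.org/id/botticelli", "application/rdf+xml")
```

A real data source would fill the description with properties and RDF links to other sources, as the third recipe requires.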




Open Data As A Service
Towards Government 2.0

     “Government IT needs to redefine itself as Government as a Platform”

     Open Data is the platform for Open Government.
      Actors:
     • Institutions: to better deliver services to citizens
     • Civic-minded developers: to serve themselves and others by extending
       the platform (e.g. mash-ups, applications)

     What actors need: Open Data management platforms, consistent admin tools
      and a powerful Open Data Catalog to consolidate the entire Open Data
      lifecycle (STEP 1-5)




Open Data As-A-Service

     [Diagram: two Mobile Apps and a Web App, each consuming the data through a REST API]

     Data-on-Demand: data are not locked inside CMS applications but are
      consumed on-demand, As-a-Service
     Data as Web Resources: RESTful APIs make it possible to retrieve data as
      a web resource (through a URI)
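Treating data as a web resource means that every dataset, and every filtered view of it, has its own URI. A small Python sketch of how a client might build such a URI follows; the host, the `/resource/<id>.json` path layout and the filter names are hypothetical and do not describe any specific platform's API.

```python
from urllib.parse import urlencode


def dataset_uri(base_url: str, dataset_id: str, **filters) -> str:
    """Build the URI of a dataset resource, with optional query filters.

    The path layout used here is an assumption made for illustration.
    """
    uri = f"{base_url.rstrip('/')}/resource/{dataset_id}.json"
    if filters:
        # Sort the filters so the same request always yields the same URI.
        uri += "?" + urlencode(sorted(filters.items()))
    return uri


uri = dataset_uri("https://data.example.org", "abcd-1234", city="Rome", limit=10)
```

Any mobile or web app can then retrieve the same data with a plain HTTP GET on that URI, independently of the CMS that produced it.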
Socrata: GovStat Approach
     Socrata is releasing parts of the platform as open source
      on GitHub
     https://github.com/open-data-standards

     The business model is moving towards advanced data analysis tools, mining,
      real-time monitoring and decision-making support systems
     http://www.socrata.com/govstat/




Cultural Heritage Hacking
Open Data in a Cultural Heritage Scenario
     Art Galleries, Libraries, Archives and Museums (GLAMS) are exploring the
     added value of sharing their data resources as LOD




     Key facts:
     • Rich and structured data sets accumulated over many years by experts
     • Ability to reach out to audiences both to enrich datasets and to
       evaluate services
     • Long-standing expertise in metadata management and (co-)curation
     • Authoritative knowledge on a wide range of subjects
GLAMS LOD Examples
     In Agora, the Rijksmuseum Amsterdam and the
     Netherlands Institute for Sound and Vision collaborate
     with the Computer Science and History departments at
     the VU to integrate their collections and enrich them with
     historical information, facilitating a more comprehensive
     understanding of the historical dimension of objects in
     online heritage collections. [http://agora.cs.vu.nl/]

     The Amsterdam Museum was the first museum in the
     Netherlands to convert its complete museum collection
     database to RDF. The resulting resource consists of
     more than 5 million RDF triples describing more than
     70,000 cultural heritage objects. Several working
     examples use this dataset, such as a mobile city
     guide.
GLAMS LOD Examples
     Europeana is a pan-European initiative that provides
     access to millions of objects as LOD through an API. The
     Europeana Thought Lab[5] search interface shows how
     LOD principles can aid the search process. Europeana
     has been a strong supporter of the uptake of CC0, the
     "no rights reserved" option among Creative Commons licenses
     [http://pro.europeana.eu]

     Open Images provides access to a large and growing
     collection of Creative Commons licensed archive
     material. The metadata is converted to RDF, allowing
     the creation of rich semantic links with other
     datasets such as the Amsterdam Museum dataset
     [http://www.openimages.eu/]
PROS and CONS of LOD for GLAMS
PROS
     • Driving users to online content held by GLAMS (e.g., by improved
       search engine optimization);
     • Stimulating collaboration in the library, archives and museums
       domain and beyond, for instance by inviting people to clean/enrich
       existing data;
     • Enabling new scholarship that can only be done with open data;
     • Allowing the creation of new services for discovery;
     • Quoting Verwayen (2011), “increas[ing] relevance to digital society.”

CONS
     • Loss of attribution to the “memory institution”, which may in turn
       decrease the value of the artworks
     • Loss of potential income: open data cannot be sold
Metrics of Success

  Income: measured in money

 Public Outreach: to measure the
  number of (online) visitors

 Reuse: to measure the use of data and content by heritage
  institutions themselves and by others

 Public Participation: to measure the amount of added metadata
  and content


Developing Open Linked Data
      (with Graph Database)
Developing Open Linked Data
     We can recognize a few trends in our scenario:
     • Exponential growth in data volumes
     • Rise of connectedness
     • Increase in degrees of semi-structure
     • Structures and schemas emerge rather than being defined
       upfront



     Key facts:
     • Volume: the size of the stored data
     • Velocity: the rate at which data changes over time
     • Variety: the degree to which data is regularly or irregularly
       structured, dense or sparse, and importantly connected or
       disconnected
ER Approach
     We do not know the structure of the documents at design time.
     Adopting an ER approach, we have to define vertical tables
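A vertical table stores one row per (entity, attribute, value), so attributes unknown at design time need no schema change. The sketch below uses Python's standard `sqlite3` module; the table name, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entity, attribute, value): new attributes need no schema change.
conn.execute("CREATE TABLE document_attr (doc_id INTEGER, attr TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO document_attr VALUES (?, ?, ?)",
    [
        (1, "title", "La Primavera"),
        (1, "author", "Sandro Botticelli"),
        (2, "title", "Adoration of the Magi"),
        (2, "venue", "Uffizi"),  # an attribute that document 1 does not have
    ],
)


def load_document(doc_id):
    """Reassemble one document from its attribute rows."""
    rows = conn.execute(
        "SELECT attr, value FROM document_attr WHERE doc_id = ?", (doc_id,)
    )
    return dict(rows.fetchall())


doc = load_document(2)
```

The price of this flexibility is that every document has to be reassembled row by row, and all values share a single untyped column.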




Relational Model Weakness
     In the ER model, relationships are semantics-free (no direction, no name)

     • As the amount of semi-structured information increases, the
       relational model becomes burdened
     • Maintenance overheads: join tables and foreign key constraints must be
       maintained just to make the database work
     • Large join tables, sparsely populated rows and lots of null-checking logic


     • Reciprocal queries are difficult to handle in today’s semi-structured,
       real-world cases
     • Examples: recommendation systems, social networks
Aggregate Stores Weakness
     Aggregates can mimic relationships by embedding cross-store
      identifiers, but:
     • It is up to the developer to manage, infer and reify useful knowledge
        from them
     • They do not provide index-free adjacency
     • Deletes must be checked


     • Traversing relationships is expensive, since each link requires an index
       lookup
     • Brute-force computation over an entire data set is O(n), since all n
       aggregates in the data store must be considered. That’s far too costly
       where we’d prefer O(log n)
     • Impractical in real-time scenarios
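The cost asymmetry can be seen in a few lines of Python. In this sketch each aggregate embeds the identifiers of its friends; the `users` data and function names are hypothetical. Following a link forward is a key lookup, while the reciprocal question forces a scan of all n aggregates.

```python
# Aggregates keyed by id; relationships exist only as embedded identifiers.
users = {
    "alice": {"friends": ["bob"]},
    "bob": {"friends": ["carol"]},
    "carol": {"friends": ["alice", "bob"]},
}


def friends_of(user_id):
    """Forward direction: a single key lookup on the aggregate."""
    return users[user_id]["friends"]


def who_lists_as_friend(user_id):
    """Reverse direction: every one of the n aggregates must be inspected, O(n)."""
    return sorted(uid for uid, agg in users.items() if user_id in agg["friends"])


followers = who_lists_as_friend("bob")
```

Nothing in the store guarantees that an embedded identifier still points at an existing aggregate, which is why deletes have to be checked by the application.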
Storing data in Graphs
     Graph theory was pioneered by Euler in the 18th century and has received
       multidisciplinary contributions over the centuries
     • Facebook, Google and Twitter have centered their business models
       around their own proprietary distributed graph technologies

                                Facebook TAO
                                Twitter FlockDB


     Graph databases store information in ways that much more closely
      resemble the ways the world is organized and the ways humans “think
      about” data.
     Top 10 Gartner IT technologies in 2013: “[..] are designed to support
      new transaction, interaction and observation use cases involving web
      scale, mobile, cloud and clustered environments”
From Relational to Graph based Modeling
     Graph databases treat relationships as first-class abstractions of the data
      model

     • A graph contains nodes and relationships
     • Nodes contain properties (key-value pairs)
     • Relationships are named, directed and always have a start and end
       node
     • Relationships can also contain properties


     (Graph)-[:RECORDS_DATA_IN]->(Nodes)-[:WHICH_HAVE]->(Properties)
     (Nodes)-[:LINKED_BY]->(Relationships)
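The property graph model described above can be sketched with three small Python classes. This illustrates the data model only, not how a graph database stores it; all class, method and relationship names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    name: str    # relationships are named
    start: Node  # and directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)  # they can carry properties too


@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def add_node(self, **properties) -> Node:
        node = Node(properties)
        self.nodes.append(node)
        return node

    def relate(self, start: Node, name: str, end: Node, **properties) -> Relationship:
        rel = Relationship(name, start, end, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, name: str) -> list:
        """End nodes of the relationships called `name` that start at `node`."""
        return [r.end for r in self.relationships if r.start is node and r.name == name]


g = Graph()
painter = g.add_node(name="Sandro Botticelli")
painting = g.add_node(title="La Primavera")
g.relate(painter, "PAINTED", painting, source="illustrative")
```

Because the relationship is directed, asking for `PAINTED` from the painting yields nothing, while asking from the painter yields the painting. The linear scan in `outgoing` keeps the sketch short; a native graph store would hold adjacency on the node itself.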
From Relational to Graph based Modeling




  Shake an RDBMS while keeping all the relationships, and you’ll see a
  graph

  Where RDBMSs are optimized for aggregated data, graph databases
  are optimized for highly connected data
Traversing Map Performances
     Friend of Friend (FoF) problem: for any person in a social network,
      look for a route to some other person in the graph at most depth=N
      hops away.
     For a social network containing 1,000,000 people, each with ~50 friends,
      the results (*) show that graph databases are the best choice

     Depth   RDBMS Execution Time (s)   Neo4j Execution Time (s)   Returned Records
     2       0.016                      0.01                       ~2500
     3       30.267                     0.168                      ~110,000
     4       1543.505                   1.359                      ~600,000
     5       Unfinished                 2.132                      ~800,000

     (*) Graph Databases, O’Reilly (to appear)
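The Friend of Friend query is a depth-limited breadth-first traversal. The Python sketch below runs on an in-memory adjacency map with hypothetical sample data; it shows the shape of the query and is not the benchmark code behind the figures above.

```python
from collections import deque


def within_depth(friends, start, max_depth):
    """Return {person: hops} for everyone reachable from start in at most max_depth hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    del hops[start]  # the start person is not their own friend of friend
    return hops


friends = {
    "ann": ["bob"],
    "bob": ["ann", "cid"],
    "cid": ["bob", "dan"],
    "dan": ["cid"],
}
reachable = within_depth(friends, "ann", 2)
```

The work done depends on the part of the graph actually visited rather than on the total number of people stored, which is the intuition behind the gap in the table.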
A Case Study
Using Neo4j and Spring Data
Neo4j Graph DB


 intuitive, using a graph model for data
   representation
 reliable, with full ACID transactions
 durable and fast, using a custom disk-based, native storage engine
 massively scalable, up to several billion nodes/relationships/properties
 highly-available, when distributed across multiple machines
 expressive, with a powerful, human readable graph query language
 fast, with a powerful traversal framework for high-speed graph queries
 embeddable, with a few small jars
 simple, accessible by a convenient REST interface or an object-
   oriented Java API

Spring Data and Neo4J

     Promotes POJO based development for the Graph Database Neo4j.
     It maps annotated entity classes to the Neo4j Graph Database with
       advanced mapping functionality.




     Seamless integration of the Cypher Query Language




Spring Data Neo4j
     It is possible to derive queries for domain entities from finder method
       names like Iterable<T>




     @Indexed fields will be converted into index lookups in the start
      clause; navigation along relationships will be reflected in the match
      clause; properties with operators will end up as expressions in the
      where clause
Open Linked Graph
     [Diagram, built up over four slides]

     (User)-[:OWNS]->(Document)
     (Document)-[:INCLUDES]->(Node)
     (Node)-[:DBP_LINKED]->(DBPedia URI)
     (Node)-[:LOCATED]->(Venue)
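A path through the Open Linked Graph can be sketched as a walk over typed, directed edges. In the Python sketch below the relationship names come from the diagram labels, while the edge directions, the node identifiers and the sample data are assumptions made for illustration.

```python
# Typed, directed edges: (start, relationship, end).
# Directions and identifiers are assumptions; only the relationship names come from the diagram.
edges = [
    ("user:1", "OWNS", "doc:1"),
    ("user:1", "OWNS", "doc:2"),
    ("doc:1", "INCLUDES", "node:1"),
    ("doc:2", "INCLUDES", "node:2"),
    ("node:1", "DBP_LINKED", "http://live.dbpedia.org/page/Sandro_Botticelli"),
    ("node:1", "LOCATED", "venue:uffizi"),
    ("node:2", "LOCATED", "venue:uffizi"),
]


def follow(starts, relationship):
    """One traversal step: all end points of `relationship` edges leaving `starts`."""
    return {end for start, rel, end in edges if start in starts and rel == relationship}


def walk(start, *path):
    """Follow a sequence of relationship types from a single start node."""
    frontier = {start}
    for relationship in path:
        frontier = follow(frontier, relationship)
    return frontier


uris = walk("user:1", "OWNS", "INCLUDES", "DBP_LINKED")
venues = walk("user:1", "OWNS", "INCLUDES", "LOCATED")
```

Starting from a user, the same two hops through documents and nodes lead either to external DBpedia URIs or to venues, depending on the last relationship type followed.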
Ideas for Changing the Future


     1. User Centric Experience
     2. Relevance Based
        Approach
     3. Big Data
     4. Environments and
        Societies
     5. Smart Cities




Thanks

    Michele Piunti
m.piunti@reply.eu
Annex
Modular Approach

      Interface
      • Public API: Spring REST
      • Mobile and Web UI: JQuery UI, Kendo UI, Bootstrap

      Manager
      • Enterprise Framework: Spring, J2EE
      • OLAP Analysis, Mining: Pentaho
      • Persistence Manager: Spring Data, MyBatis, Hibernate

      Geographic Information System

      Data Storage
      • NoSQL: Neo4J, MongoDB
      • Ontology: RDF, AML *
      • SQL DBMS: Oracle, MySQL, PostgreSQL

      Open CMS
      SaaS: CKan, SOCRATA **

      Ongoing collaborations:
      (*) ST-Lab, ISTC-CNR
      (**) Socrata, Inc.

Weitere ähnliche Inhalte

Andere mochten auch

Embodied Organizations A unifying perspective in programming Agents, Organiza...
Embodied Organizations A unifying perspective in programming Agents, Organiza...Embodied Organizations A unifying perspective in programming Agents, Organiza...
Embodied Organizations A unifying perspective in programming Agents, Organiza...Michele Piunti
 
Graphs in the Real World
Graphs in the Real WorldGraphs in the Real World
Graphs in the Real WorldNeo4j
 
Alimentaria Istituto "G. Caporale"
Alimentaria Istituto "G. Caporale"Alimentaria Istituto "G. Caporale"
Alimentaria Istituto "G. Caporale"Michele Piunti
 
Executive Briefing: Strategic Issues Surrounding Cloud Services
Executive Briefing:  Strategic Issues Surrounding Cloud ServicesExecutive Briefing:  Strategic Issues Surrounding Cloud Services
Executive Briefing: Strategic Issues Surrounding Cloud ServicesWhitmeyerTuffin
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldabaux singapore
 

Andere mochten auch (7)

Embodied Organizations A unifying perspective in programming Agents, Organiza...
Embodied Organizations A unifying perspective in programming Agents, Organiza...Embodied Organizations A unifying perspective in programming Agents, Organiza...
Embodied Organizations A unifying perspective in programming Agents, Organiza...
 
Piunti coin10
Piunti coin10Piunti coin10
Piunti coin10
 
Graphs in the Real World
Graphs in the Real WorldGraphs in the Real World
Graphs in the Real World
 
Alimentaria Istituto "G. Caporale"
Alimentaria Istituto "G. Caporale"Alimentaria Istituto "G. Caporale"
Alimentaria Istituto "G. Caporale"
 
PhD dissertation 2010
PhD dissertation 2010PhD dissertation 2010
PhD dissertation 2010
 
Executive Briefing: Strategic Issues Surrounding Cloud Services
Executive Briefing:  Strategic Issues Surrounding Cloud ServicesExecutive Briefing:  Strategic Issues Surrounding Cloud Services
Executive Briefing: Strategic Issues Surrounding Cloud Services
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
 

Ähnlich wie Linked_Open_Data_Rome_Netcamp_13

Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked .
 
Data Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementData Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementRENDER project
 
Semantic Search: We're Living in a Golden Age for Information
Semantic Search: We're Living in a Golden Age for InformationSemantic Search: We're Living in a Golden Age for Information
Semantic Search: We're Living in a Golden Age for Information3 Round Stones
 
Big Data Session Presentations
Big Data Session PresentationsBig Data Session Presentations
Big Data Session PresentationsePSI Platform
 
Smart Cities, Open Data and SMW - SMWCon Spring 2012 Keynote
Smart Cities, Open Data and SMW - SMWCon Spring 2012 KeynoteSmart Cities, Open Data and SMW - SMWCon Spring 2012 Keynote
Smart Cities, Open Data and SMW - SMWCon Spring 2012 KeynoteJoel Natividad
 
Closing plenary: the future of public sector websites #BPCW11
Closing plenary: the future of public sector websites #BPCW11Closing plenary: the future of public sector websites #BPCW11
Closing plenary: the future of public sector websites #BPCW11Headstar
 
Linked Data as a Service
Linked Data as a ServiceLinked Data as a Service
Linked Data as a ServicePeter Haase
 
Myth Busters III: I’m Building a Data Lake, So I Don’t Need Data Virtualization
Myth Busters III: I’m Building a Data Lake, So I Don’t Need Data VirtualizationMyth Busters III: I’m Building a Data Lake, So I Don’t Need Data Virtualization
Myth Busters III: I’m Building a Data Lake, So I Don’t Need Data VirtualizationDenodo
 
Sentara Linked Data Workshop - Sept 10, 2012
Sentara Linked Data Workshop - Sept 10, 2012Sentara Linked Data Workshop - Sept 10, 2012
Sentara Linked Data Workshop - Sept 10, 20123 Round Stones
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentPeter Haase
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareIMC Technologies
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementMarieke Guy
 
Open Data Open Innovation and The Cloud gayler berlin nov12
Open Data Open Innovation and The Cloud   gayler berlin nov12Open Data Open Innovation and The Cloud   gayler berlin nov12
Open Data Open Innovation and The Cloud gayler berlin nov12Mark Gayler
 
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)Denodo
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015Comsode - FP7 project
 
The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...Peter Haase
 
The Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationThe Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationInside Analysis
 

Linked_Open_Data_Rome_Netcamp_13

  • 1. OPEN DATA & OPEN CULTURE Michele Piunti Whitehall Reply
  • 2. OUTLINE • Background • Motivations • Approaches • Open Data As A Service • Cultural Heritage Hacking • Case Study
  • 4. Obama Vision “In the coming year, we’ll also work to rebuild people’s faith in the institution of government. Because you deserve to know exactly how and where your tax dollars are being spent, you’ll be able to go to a website and get that information for the very first time in history. Because you deserve to know when your elected officials are meeting with lobbyists, I ask Congress to do what the White House has already done -- put that information online.” President Obama, The State of the Union Speech, 2011 http://www.whitehouse.gov/state-of-the-union-2011
  • 5. What Open Data is: Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "Open" movements such as open source, open content, and open access (Ref. Wikipedia). Citizen centricity comes from citizen empowerment, namely disintermediation with respect to traditional actors.
  • 6. Expected Payoff • Ubiquitous access • Reusability • Optimization • Social and Cultural enrichment ROI: “A greater than 100X return on investment in direct Federal IT spending through economies of scope is achievable by equipping agencies with an Open Data platform that is the shared foundation for numerous programs that are independently funded today” [http://www.socrata.com/blog/open-data-as-a-platform/] Open Data turns out to be a formidable tool for: • Analyzing Spending Review on administrations' expenses • Enforcing Fact Checking on declarations, policies and campaigns
  • 7. Where Open Data is: http://census.okfn.org/ https://nycopendata.socrata.com/ https://dati.lombardia.it ... and counting
  • 8. Italian Digital Agenda: Open Data + E-Gov. In 2012 Italy established a “control room” of experts aimed at promoting Open Data in the context of a digital agenda. Open Data is integrated with E-Gov: 1. Enabling Infrastructures 2. PA digital switchover 3. Purposive and regulative set of norms and rules 4. Communication plan. The challenge is optimizing services and costs: • Digital Identities and related services, unified and web-based registry offices, e-payments, continuous census, interoperability of EU platforms • Digital health, Cultural Heritage • eLearning, eProcurement, eRecruitment
  • 9. dati.gov.it dati.gov.it/content/infografica dati.gov.it/content/parte-lopen-data-default
  • 11. Roadmap to Open Data. First track: 1. Data assets analysis (Relevant Datasets; Customer internal processes analysis) 2. Identify Use Cases and Final Users (Best Practices and similar datasets; Linked Data Cloud) 3. Identify ROI (Risk assessment; Savings; Identify non quantifiable ROI) 4. Architecture Definition (Identify service level; W3C Compliance) 5. Legal Issues (Copyright; Licensing; Liability of Data update) 6. LOD Feasibility report, Executive Plan and Road Map. Second track: 1. Identify Datasets (Data Analysis; Data transformation; Normalization) 2. Development (Architecture Definition; Datasets Store and Platform; SPARQL Endpoint) 3. Data Enrichment (Metadata description, Ontology, RDF; Internal linking; External component: GIS, Data-Mining, BI Analysis modules) 4. Validation and Publication (W3C Compliance; Data Localization, History; External Linking; Communication Plan) 5. Composition of Services, LOD Services (Documentation; Build Ecosystem; Public API). Cross-cutting: Service Development, Knowledge Transfer
  • 12. Legal Restrictions, Privacy, Licenses: Multiple legal or regulatory restrictions on the use of the data.
  • 13. 5★ Open Data: Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5 star deployment scheme for Open Data. ★ make PUBLIC stuff available on the Web (whatever format, .jpeg .pdf) under an open license ★★ make it available as structured data (e.g., Excel instead of image scan of a table) ★★★ use non-proprietary formats (e.g., CSV instead of Excel) ★★★★ use URIs to denote things, so that people can point at your stuff ★★★★★ link your data to other data to provide context
  • 14. W3C Roadmap: Having Standard Names/URIs for All Government Objects aids in discoverability, improves metadata, and ensures authenticity. • Provide permanent, patterned and/or discoverable URI/URLs to your data • Create a web page with a plain language description of the dataset to help search engines find the data, so people can use it • Provide links out to other data and documentation • Ensure that data is findable and can be referenced for as long as people need it • Data published in industry standards like (X)HTML, XML and RDF can be used as an object database or RESTful API
  • 15. Linked Open Data: Recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs, OWL and RDF. 1. Requires Ontologies to be applied to data 2. Allows heterogeneous Nodes to be traversed in a semantically coherent fashion
  • 16. Botticelli Case: One may specify that the author’s mention of “La Primavera” at the Uffizi Museum LINKS to exactly the same person as the one described on DBpedia (the LOD version of Wikipedia) http://live.dbpedia.org/page/Primavera_(painting) http://live.dbpedia.org/page/Sandro_Botticelli http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli) The link is not just a hyperlink because it is typed. In the BOTTICELLI page, the information about his life and works is structured, by means of the topology
  • 17. Semantic Network: Enables Reasoning. OWL-DL, based on Description Logics, represents a decidable fragment of First Order Logic. Sandro_Botticelli → category:Italian_Renaissance_painters; category:Italian_Renaissance_painters → category:Quattrocento_painters; hence Sandro_Botticelli → category:Quattrocento_painters http://live.dbpedia.org/page/Category:Italian_Renaissance_painters http://live.dbpedia.org/page/Sandro_Botticelli http://live.dbpedia.org/page/Category:Quattrocento_painters
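The inference on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not an OWL-DL reasoner: the two dictionaries and the function name `inferred_categories` are assumptions made for the example, and the only facts encoded are the two statements taken from the slide.

```python
# Hypothetical mini knowledge base holding the two statements from the slide.
member_of = {"Sandro_Botticelli": {"Italian_Renaissance_painters"}}
subcategory_of = {"Italian_Renaissance_painters": {"Quattrocento_painters"}}


def inferred_categories(entity):
    """Return every category the entity belongs to, directly or by inheritance."""
    seen = set()
    stack = list(member_of.get(entity, ()))
    while stack:
        category = stack.pop()
        if category in seen:
            continue  # already expanded; also guards against cycles
        seen.add(category)
        # Membership propagates upwards along the subcategory links.
        stack.extend(subcategory_of.get(category, ()))
    return seen


print(sorted(inferred_categories("Sandro_Botticelli")))
# ['Italian_Renaissance_painters', 'Quattrocento_painters']
```

The second category is never asserted for Botticelli; it is derived, which is the kind of conclusion a typed, structured network makes possible.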
  • 18. Linked Open Data Cloud (2011): Doubled in size every 10 months, since 2007. Domains: Media, User-generated, Geographic, Publications, Government, Cross-Domain, Life Sciences
  • 19. Recipes for Serving Information as Linked Data • Entities must be identified with referenceable HTTP URIs. • When a client asks for the MIME type application/rdf+xml, the data source must return an RDF/XML description. • RDF descriptions should also contain RDF links to resources provided by other data sources, so that clients can navigate the Web of Data as a whole by following RDF links.
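From the client side, the second recipe amounts to HTTP content negotiation. The sketch below only builds the request and does not send it; the helper name `rdf_request` is an assumption, and the URI is borrowed from the Botticelli slide purely as a placeholder.

```python
from urllib.request import Request


def rdf_request(entity_uri):
    """Build (but do not send) a GET request asking for an RDF/XML description."""
    return Request(entity_uri, headers={"Accept": "application/rdf+xml"})


# Placeholder URI taken from the Botticelli slide; no network call is made here.
req = rdf_request("http://live.dbpedia.org/page/Sandro_Botticelli")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the `Accept` header depends on that server.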
  • 20. Open Data As A Service
  • 21. Towards Government 2.0: “Government IT needs to redefine itself as Government as a Platform.” Open Data is the platform for Open Government. Actors: • Institutions: to better deliver services to citizens • Civic-minded developers: to serve themselves and others by extending the platform (i.e. mash-ups, applications). What actors need: Open Data management platforms, consistent admin tools and a powerful Open Data Catalog to consolidate the entire Open Data lifecycle (STEP 1-5)
  • 23. Open Data As-A-Service (diagram: Mobile Apps and a Web App, each consuming data through a REST API). Data-on-Demand: data are not closed inside CMS applications but are consumed on demand, As-a-Service. Data as Web Resources: RESTful APIs make it possible to retrieve data as a web resource (through a URI)
  • 24. Socrata: GovStat Approach. Socrata is releasing fragments of the platform as Open Source on GitHub https://github.com/open-data-standards The business model is moving to advanced data analysis tools, mining, real time monitoring, decision making support systems http://www.socrata.com/govstat/
  • 26. Open Data in a Cultural Heritage Scenario: Art Galleries, Libraries, Archives and Museums (GLAMS) are exploring the added value of sharing their data resources as LOD. Key facts: • Rich and structured data sets accumulated over many years by experts • Ability to reach out to audiences both to enrich datasets and to evaluate services • Long-standing expertise in meta-data management and (co-)curation • Authoritative knowledge on a wide range of subjects
  • 27. GLAMS LOD Examples: In Agora, the Rijksmuseum Amsterdam and the Netherlands Institute for Sound and Vision collaborate with the Computer Science and History departments at the VU to integrate their collections and enrich them with historical information, to facilitate a more comprehensive understanding of the historical dimension of objects in online heritage collections. [http://agora.cs.vu.nl/] The Amsterdam Museum was the first museum in the Netherlands to convert its complete museum collection database to RDF. The resulting resource consists of more than 5 million RDF triples describing more than 70,000 cultural heritage objects. Several working examples use this dataset, such as a mobile city guide.
  • 28. GLAMS LOD Examples: Europeana is a pan-European initiative that provides access to millions of objects as LOD through an API. The Europeana Thought Lab search interface shows how LOD principles can aid the search process. Europeana has been a strong supporter of the uptake of CC0, the "no rights reserved" option among Creative Commons licenses [http://pro.europeana.eu] Open Images provides access to a large and growing collection of Creative Commons licensed archive material. The meta-data is converted to RDF, allowing the creation of rich semantic links with other datasets such as the Amsterdam Museum dataset [http://www.openimages.eu/]
  • 29. PROS and CONS of LOD for GLAMS. PROS • Driving users to online content held by GLAMS (e.g., by improved search engine optimization) • Stimulating collaboration in the library, archives and museums domain and beyond, for instance by inviting people to clean/enrich existing data • Enabling new scholarship that can only be done with open data • Allowing the creation of new services for discovery • Quoting Verwayen (2011), “increas[ing] relevance to digital society.” CONS • Loss of attribution to the “memory institution”, which may in turn decrease the value of the artworks • Loss of potential income: open data may not be sold
  • 30. Metrics of Success. Income: measured in money. Public Outreach: measured by the number of (online) visitors. Reuse: measured by the use of data and content by heritage institutions themselves and by others. Public Participation: measured by the amount of added metadata and content
  • 31. Developing Open Linked Data (with Graph Database)
  • 32. Developing Open Linked Data: We may recognize a few trends in our scenario: • Exponential growth in data volumes • Rise of connectedness • Increase in degrees of semi-structure • Structures and schemas emerge rather than being defined upfront. Key facts: • Volume: the size of the stored data • Velocity: the rate at which data changes over time • Variety: the degree to which data is regularly or irregularly structured, dense or sparse, and importantly connected or disconnected
  • 33. ER Approach: We do not know the structure of the documents at design time. Adopting an ER approach, we have to define vertical tables
  • 34. Relational Model Weakness: In the ER model, relationships carry no semantics (no direction, no name) • As the amount of semi-structured information increases, the relational model becomes burdened • Maintenance overheads: join tables and foreign key constraints must be maintained just to make the database work • Large join tables, sparsely populated rows and lots of null-checking logic • Difficult to handle reciprocal queries in today's semi-structured, real-world cases, such as recommendation systems and social networks
  • 35. Aggregate Stores Weakness: Aggregates make it possible to mimic relationships by embedding cross-store identifiers, but: • It is up to the developer to manage, infer and reify useful knowledge from them • They do not provide index-free adjacency • Deletes must be checked • Traversing relationships is expensive, each link requiring an index lookup • Brute-force computation over an entire data set is O(n), since all n aggregates in the data store must be considered. That’s far too costly where we’d prefer O(log n) • Impractical in real-time scenarios
  • 36. Storing data in Graphs: Graph theory was pioneered by Euler in the 18th century and has received multidisciplinary contributions across the centuries • Facebook, Google and Twitter have centered their business models around their own proprietary distributed graph technologies (Facebook TAO, Twitter FlockDB). Graph databases store information in ways that much more closely resemble the way the world is organized and the way humans “think about” data. Top 10 Gartner IT technologies in 2013: “[..] are designed to support new transaction, interaction and observation use cases involving web scale, mobile, cloud and clustered environments”
  • 37. From Relational to Graph based Modeling: Graph DBs place relationships as first-class abstractions of the data model • A graph contains nodes and relationships • Nodes contain properties (key-value pairs) • Relationships are named, directed and always have a start and end node • Relationships can also contain properties. A Graph –[:RECORDS_DATA_IN] Nodes –[:WHICH_HAVE] Properties. Nodes –[:LINKED_BY] Relationships
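The property graph model listed on this slide can be written down as two small Python data classes. This is a sketch of the data model only, not of any Neo4j API; the class names, the `label` field and the sample property values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # illustrative tag, not part of the slide's list
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    name: str                       # relationships are named ...
    start: Node                     # ... directed, with a start node ...
    end: Node                       # ... and an end node
    properties: dict = field(default_factory=dict)   # relationships may carry properties too


painter = Node("Person", {"name": "Sandro Botticelli"})
painting = Node("Artwork", {"title": "La Primavera"})
painted = Relationship("PAINTED", painter, painting, {"note": "illustrative"})

print(f"({painted.start.properties['name']}) -[:{painted.name}]-> "
      f"({painted.end.properties['title']})")
```

Because a relationship is an object of its own with a name, a direction and properties, it can be queried and traversed directly instead of being reconstructed from join tables.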
  • 38. From Relational to Graph based Modeling: Shake an RDBMS while keeping all the relationships, and you’ll see a graph. Where RDBMSs are optimized for aggregated data, graph databases are optimized for highly connected data
  • 39. Traversing Map Performances: Friend of Friend (FoF) problem: for any person in a social network, look for a route to some other person in the graph at most depth=N hops away. For a social network containing 1,000,000 people, each with ~50 friends, the results (*) show that graph databases are the best choice. Depth 2: RDBMS 0.016 s, Neo4j 0.01 s, ~2,500 returned records. Depth 3: RDBMS 30.267 s, Neo4j 0.168 s, ~110,000 returned records. Depth 4: RDBMS 1543.505 s, Neo4j 1.359 s, ~600,000 returned records. Depth 5: RDBMS unfinished, Neo4j 2.132 s, ~800,000 returned records. (*) Graph Databases, O’Reilly, To Appear
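The FoF query itself is a bounded breadth-first traversal. The sketch below runs it over an in-memory adjacency map; the function name `within_hops` and the five-person network are made up for the example, and it says nothing about how Neo4j or an RDBMS executes the query internally.

```python
from collections import deque


def within_hops(friends, start, max_depth):
    """Return {person: hops} for everyone reachable from start in at most max_depth hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in friends.get(person, ()):
            if friend not in hops:  # first visit is the shortest route in BFS
                hops[friend] = hops[person] + 1
                queue.append(friend)
    del hops[start]  # the starting person is not part of the result
    return hops


# Tiny made-up social network (adjacency lists).
friends = {
    "ann": ["bob", "cid"],
    "bob": ["ann", "dee"],
    "cid": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee"],
}

print(within_hops(friends, "ann", 2))  # {'bob': 1, 'cid': 1, 'dee': 2}
```

Each step follows a direct reference from a person to their friends, so the cost grows with the part of the network actually visited rather than with the total number of people stored.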
  • 40. A Case Study Using Neo4j and Spring Data
  • 41. Neo4j Graph DB: intuitive, using a graph model for data representation; reliable, with full ACID transactions; durable and fast, using a custom disk-based, native storage engine; massively scalable, up to several billion nodes/relationships/properties; highly available, when distributed across multiple machines; expressive, with a powerful, human readable graph query language; fast, with a powerful traversal framework for high-speed graph queries; embeddable, with a few small jars; simple, accessible by a convenient REST interface or an object-oriented Java API
  • 42. Spring Data and Neo4J: Promotes POJO-based development for the graph database Neo4j. It maps annotated entity classes to the Neo4j graph database with advanced mapping functionality, with seamless integration of the Cypher Query Language
  • 43. Spring Data Neo4j: It is possible to derive queries for domain entities from finder method names like Iterable<T>. @Indexed fields will be converted into index lookups in the start clause; navigation along relationships will be reflected in the match clause; properties with operators will end up as expressions in the where clause
  • 44. Open Linked Graph (diagram): a User node
  • 45. Open Linked Graph (diagram): the User [:OWNS] two Document nodes
  • 46. Open Linked Graph (diagram): the User [:OWNS] two Documents; the Documents [:INCLUDES] six Node elements
  • 47. Open Linked Graph (diagram): the User [:OWNS] two Documents; the Documents [:INCLUDES] six Node elements, with [:DBP_LINKED] relationships to four DBPedia URIs and [:LOCATED] relationships to four Venues
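The graph built up over the last four slides can be mimicked with plain (start, relationship, end) triples. This is a hedged sketch: the identifiers, the helper names `follow` and `dbpedia_links_for`, and the choice to attach [:DBP_LINKED] and [:LOCATED] to Node elements are assumptions, since the transcript does not preserve which elements the diagram actually connects.

```python
# Made-up instance of the User / Document / Node / DBPedia URI / Venue graph.
edges = [
    ("user:1", "OWNS", "doc:1"),
    ("user:1", "OWNS", "doc:2"),
    ("doc:1", "INCLUDES", "node:1"),
    ("doc:1", "INCLUDES", "node:2"),
    ("doc:2", "INCLUDES", "node:3"),
    ("node:1", "DBP_LINKED", "http://live.dbpedia.org/page/Sandro_Botticelli"),
    ("node:1", "LOCATED", "venue:uffizi"),
    ("node:3", "DBP_LINKED", "http://live.dbpedia.org/page/Primavera_(painting)"),
]


def follow(sources, relationship):
    """Return the end points of every edge of the given type leaving any source."""
    wanted = set(sources)
    return [end for start, name, end in edges
            if start in wanted and name == relationship]


def dbpedia_links_for(user):
    """Walk User -[:OWNS]-> Document -[:INCLUDES]-> Node -[:DBP_LINKED]-> URI."""
    documents = follow([user], "OWNS")
    nodes = follow(documents, "INCLUDES")
    return follow(nodes, "DBP_LINKED")


for uri in dbpedia_links_for("user:1"):
    print(uri)
```

In a graph database the same walk would be expressed as a single path pattern in the query language rather than as three explicit passes over an edge list.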
  • 48. Ideas for Changing the Future: 1. User Centric Experience 2. Relevance Based Approach 3. Big Data 4. Environments and Societies 5. Smart Cities
  • 49. Thanks Michele Piunti m.piunti@reply.eu
  • 50. Annex
  • 51. Modular Approach (architecture diagram). UI Interface (Mobile, Web): JQuery UI, Kendo UI, Bootstrap. Public API: Spring REST. Open CMS. Enterprise Framework: Spring, J2EE. OLAP Analysis, Mining: Pentaho. SaaS Manager: SOCRATA (**), CKan. Persistence Manager: Spring Data, MyBatis, Hibernate. Geographic Information System. Data Storage: Ontology (RDF, AML) (*); NoSQL (Neo4J, MongoDB); SQL DBMS (Oracle, MySQL, PostgreSQL). Ongoing collaborations: (*) ST-Lab, ISTC-CNR; (**) Socrata, Inc.