SESAM
    Lars Marius Garshol, Baksia, 2013-02-13
    larsga@bouvet.no, http://twitter.com/larsga

Agenda

    •   Project background
    •   Functional overview
    •   Under the hood
    •   Similar projects (if time)




Lars Marius Garshol
• Consultant in Bouvet since 2007
  – focus on information architecture and semantics
• Worked with semantic technologies since 1999
  –   mostly with Topic Maps
  –   co-founder of Ontopia, later CTO
  –   editor of several Topic Maps ISO standards 2001-
  –   co-chair of TMRA conference 2006-2011
  –   developed several key Topic Maps technologies
  –   consultant in a number of Topic Maps projects
• Published a book on XML with Prentice-Hall
• Implemented Unicode support in the Opera web
  browser
My role on the project
• The overall architecture is the brainchild of Axel
  Borge
• SDshare came from an idea by Graham Moore
• I only contributed parts of the design
   – and some parts of the implementation
• Don’t actually know the whole system
Hafslund SESAM
Hafslund ASA
• Norwegian energy company
  –   founded 1898
  –   53% owned by the city of Oslo
  –   responsible for energy grid around Oslo
  –   1.4 million customers
• A conglomerate of companies
  –   Nett (electricity grid)
  –   Fjernvarme (remote heating)
  –   Produksjon (power generation)
  –   Venture
  –   ...
What if...?

[Diagram: data scattered across systems (work order, transformer, cables, and meter in the ERP; bill and customer in the CRM) connected across the two systems]
Hafslund SESAM
• An archive system, really
• Generally, archive systems are glorified trash cans
  – putting a document in the archive effectively means hiding it
• Because archives are not important, are they?
• Except, when you need that contract from 1937
  about the right to build a power line across...
Problems with archives
• Many documents aren’t there
  – even though they should be,
  – because entering metadata is too much hassle
• Poor metadata, because nobody bothers to enter
  it properly
  – yet, much of the metadata exists in the user context
• Not used by anybody
  – strange, separate system with poor interface
  – (and the metadata is poor, too)
• Contains only documents
  – not connected to anything else
Goals for the SESAM project
• Increase percentage of archived documents
  – to do this, archiving must be made easy
• Increase metadata quality
  – by automatically harvesting metadata
• Make it easy to find archived documents
  – by building a well-tuned search application
What SESAM does
• Automatically enrich document metadata
  – to do that we have to collect background data from the
    source systems
• Connect document metadata with structured
  business data
  – this is a side-effect of the enrichment
• Provide search across the whole information
  space
  – once we’ve collected the data this part is easy
As seen by customer

[Screenshot]
High-level architecture

[Diagram: ERP, CRM, and Sharepoint feed the triple store over SDshare; documents go to the archive over CMIS; the triple store feeds the search engine and the archive over SDshare]
Main principle of data extraction
• No canonical model!
  – instead, data reflects model of source system
• One ontology per source system
  – subtyped from core ontology where possible
• Vastly simplifies data extraction
  – for search purposes we lose nothing
  – and translation is easier once the data is in the triple store
Simplified core ontology

[Diagram: the simplified core ontology]
Data structure in triple store

[Diagram: separate graphs for ERP, Archive, CRM, and Sharepoint data, connected by sameAs links]
Connecting data
• Contacts in many different systems
  – ERP has one set of contacts,
  – these are mirrored in the archive
• Different IDs in these two systems
  – using “sameAs” we know which are the same
  – can do queries across the systems
  – can also translate IDs from one system to the other

[Diagram: an ERP contact linked to the corresponding archive contact with sameAs]
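A minimal sketch of the sameAs-based ID translation, using rdflib; the ex:id property and all URIs are invented for illustration:

# Sketch: translate an ERP contact ID to the archive's ID via sameAs.
# The ex:id property and all URIs here are hypothetical.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex:  <http://example.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
ex:erp-contact-1234  ex:id "1234" ;
                     owl:sameAs ex:arch-contact-887 .
ex:arch-contact-887  ex:id "887" .
""", format="turtle")

q = """
PREFIX ex:  <http://example.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?archiveId WHERE {
  ?erp ex:id "1234" ;
       owl:sameAs ?arch .
  ?arch ex:id ?archiveId .
}"""
for row in g.query(q):
    print(row.archiveId)   # -> 887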
Duplicate suppression
• Suppliers, customers, and companies live in the ERP, CRM, and billing systems
• Duke compares the records and writes owl:sameAs links into the data hub over SDshare

Field       Record 1     Record 2      Probability
Name        acme inc     acme inc      0.9
Assoc no    177477707                  0.5
Zip code    9161         9161          0.6
Country     norway       norway        0.51
Address 1   mb 113       mailbox 113   0.49
Address 2                              0.5

http://code.google.com/p/duke/
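The per-field probabilities are combined into one overall match probability; a sketch of the naive Bayesian combination of the kind Duke uses, fed the numbers from the table above (0.5 is neutral, above pulls the match probability up, below pulls it down):

# Sketch: combine per-field match probabilities, Bayes-style.
def combine(p1, p2):
    return (p1 * p2) / (p1 * p2 + (1 - p1) * (1 - p2))

fields = [0.9, 0.5, 0.6, 0.51, 0.49, 0.5]   # the table above
p = 0.5
for f in fields:
    p = combine(p, f)
print(p)   # ~0.93: above threshold, so an owl:sameAs link is created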
When archiving
• The user works on the document in some system
  – ERP, CRM, whatever
• This system knows the context
  – what user, project, equipment, etc is involved
• This information is passed to the CMIS server
  – it uses already gathered information from the triple store
    to attach more metadata
Auto-tagging

[Diagram: a document sent to the archive is tagged with its work order, and through it the project, manager, customer, and equipment]
Archive integration
• Clients deliver documents via CMIS
  – an OASIS standard for CMS interfaces
  – lots of available implementations
• Metadata translation
  – not only auto-tagging
  – client vocabulary translated to archive vocabulary
  – required static metadata added automatically
• Benefits
  – clients can reuse existing software for connecting
  – interface is independent of the actual archive
  – archive integration is reusable
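A sketch of client-side delivery with a CMIS library (Python's cmislib here; the endpoint, credentials, folder path, and the work-order property are invented for illustration):

# Sketch: deliver a document to the archive over CMIS with cmislib.
# Endpoint, credentials, folder path, and properties are hypothetical.
from cmislib import CmisClient

client = CmisClient('http://archive.example.com/cmis/atom', 'user', 'secret')
repo = client.defaultRepository
folder = repo.getObjectByPath('/incoming')

with open('contract.pdf', 'rb') as f:
    folder.createDocument(
        'contract.pdf',
        contentFile=f,
        properties={
            'cmis:objectTypeId': 'cmis:document',
            # context the source system knows; the CMIS server enriches
            # this with more metadata from the triple store
            'ex:workOrder': 'WO-34121',
        })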
Archive integration elsewhere
• Generally, archive integrations are hard
  – web site integration: 400 hours
  – #2 integration: performance problems
• They also reproduce the same functionality over and over again
• Hard bindings against
  – internal archive model
  – archive interface
• Switching archive systems will be very hard...
  – even upgrading is hard

[Diagram: system #1, system #2, a regulation system, Outlook, and the web site each bound directly to the archive]
Showing context in the ERP system
Access control
• Users only see objects they’re allowed to see
• Implemented by search engine
  – all objects have lists of users/groups allowed to see them
  – on login, a SPARQL query lists the user’s access control group
    memberships (see the sketch below)
  – search engine uses this to filter search results
• In some cases, complex access rules are run to
  resolve ACLs before loading into triple store
  – e.g: archive system
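A sketch of the login step, assuming a SPARQLWrapper client against a hypothetical endpoint; the property names are invented:

# Sketch: on login, list the user's groups and build a search filter.
# Endpoint and property names are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://triplestore.example.com/sparql")
sparql.setQuery("""
PREFIX ex: <http://example.com/>
SELECT ?group WHERE {
  ?user ex:username "larsga" ;
        ex:member-of ?group .
}""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

groups = [b["group"]["value"] for b in results["results"]["bindings"]]
# search filter: only objects at least one of the user's groups may see
filter_query = " OR ".join('allowed:"%s"' % g for g in groups)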
Data volumes
    Graph             Statements
    IFS data                   5,417,260
    Public 360 data            3,725,963
    GeoNIS data                    44,242
    Tieto CAB data           138,521,810
    Hummingbird 1             32,619,140
    Hummingbird 2            165,671,179
    Hummingbird 3            192,930,188
    Hummingbird 4             48,623,178
    Address data               2,415,315
    Siebel data               36,117,786
    Duke links                      4,858
    Total                   626,090,919
Under the hood



RDF

     • The data hub is an RDF database
       – RDF = Resource Description Framework
       – a W3C standard for data
       – also interchange format and query language (SPARQL)
     • Many implementations of the standards
       – lots of alternatives to choose between
     • Technology relatively well known
       – books, specifications, courses, conferences, ...




How RDF works
     ‘PERSON’ table
     ID NAME                   EMAIL
     1   Stian Danenbarger     stian.danenbarger@
     2   Lars Marius Garshol   larsga@bouvet.no
     3   Axel Borge            axel.borge@bouvet

            RDF-ized data
            SUBJECT                       PROPERTY   OBJECT
            http://example.com/person/1   rdf:type   ex:Person
            http://example.com/person/1   ex:name    Stian Danenbarger
            http://example.com/person/1   ex:email   stian.danenbarger@
            http://example.com/person/2   rdf:type   ex:Person
            http://example.com/person/2   ex:name    Lars Marius Garshol
            ...                           ...        ...
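The same RDF-ization in code, as a sketch using rdflib (the ex: namespace as in the table):

# Sketch: build the RDF-ized rows above with rdflib, print as Turtle.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.com/")
g = Graph()
g.bind("ex", EX)

person1 = URIRef("http://example.com/person/1")
g.add((person1, RDF.type, EX.Person))
g.add((person1, EX.name, Literal("Stian Danenbarger")))
g.add((person1, EX.email, Literal("stian.danenbarger@")))

print(g.serialize(format="turtle"))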

RDF/XML

     • Standard XML format for RDF
       – not really very nice
     • However, it’s a generic format
       – that is, it’s the same format regardless of what data
         you’re transmitting
     • Therefore
       – all data flows are in the same format
       – absolutely no syntax transformations whatsoever




Plug-and-play
• Data source integrations are pluggable
  – if you need another data source, connect and pull in the data
• RDF database is schemaless
  – no need to create tables and columns beforehand
• No common data model
  – do not need to transform data to store it

[Diagram: system #1 and system #2 plugged into the data hub]
Product queue

     • Since sources are pluggable we can guide
       the project with a product queue
       – consisting of new sources and data elements
     • We analyse the items in the queue
       – estimate complexity and amount of work
       – also to see how it fits into what’s already there
     • Customer then decides what to do, and in
       what order




Don’t push!
• The general IT approach is to push data
  – source calls services provided by recipient
• Leads to high complexity
  – two-way dependency between systems
  – have to extend source system with extra logic
  – extra triggers/threads in source system
  – many moving parts

[Diagram: system #1 pushing data to the recipient]
Pull!
• We let the recipient pull
  – always using same protocol and same format
• Wrap the source
  – the wrapper must support 3 simple functions
• A different kind of solution
  – one-way dependency
  – often zero code
  – wrapper is thin and stateless
  – data moving done by reused code

[Diagram: the recipient pulling from a thin wrapper around system #1]
The data integration
• All data transport done by SDshare
• A simple Atom-based specification for
  synchronizing RDF data
  – http://www.sdshare.org
• Provides two main features
  – snapshot of the data
  – fragments for each updated resource
Basics of SDshare
• Source offers
  – a dump of the entire data set
  – a list of fragments changed since time t
  – a dump of each fragment
• Completely generic solution
  – always the same protocol
  – always the same data format (RDF/XML)
• A generic SDshare client then transfers the data
  – to the recipient, whatever it is
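The three functions, sketched as a source-side interface (in the actual protocol each is an Atom feed over HTTP, not a function call; the names here are invented):

# Sketch of the three functions an SDshare source wrapper exposes.
class SourceWrapper:
    def snapshot(self):
        """Return a dump of the entire data set (RDF/XML)."""

    def fragments_since(self, t):
        """Return (resource URI, change time) for resources changed since t."""

    def fragment(self, resource):
        """Return all triples about one resource (RDF/XML)."""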
SDshare service structure

[Diagram: SDshare service structure: an overview feed lists collections; each collection offers a snapshot feed and a fragment feed]
Typical usage of SDshare
• Client downloads snapshot
  – client now has complete data set
• Client polls fragment feed
  – each time asking for new fragments since last check
  – client keeps track of time of last check
  – fragments are applied to data, keeping them in sync
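A minimal sketch of that polling loop, using requests and ElementTree; the feed URL and the name of the since parameter are assumptions:

# Sketch: poll an SDshare fragment feed and apply each fragment.
import requests
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def apply_fragment(rdfxml):
    """Delete the resource's old triples, insert the new ones (sketched later)."""

last_check = "2012-09-06T00:00:00"
feed = requests.get("http://source.example.com/fragments",
                    params={"since": last_check})
root = ET.fromstring(feed.content)

for entry in root.iter(ATOM + "entry"):
    link = entry.find(ATOM + "link").get("href")
    apply_fragment(requests.get(link).content)
    last_check = entry.find(ATOM + "updated").text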
Implementing the fragment feed
select objid, objtype, change_time
from history_log                   <atom>
where change_time > :since:         <title>Fragments for ...</title>
order by change_time asc            ...

                                     <entry>
                                        <title>Change to 34121</title>
                                        <link rel=fragment href=“...”/>
                                        <sdshare:resource>http://...</sdshare:resource>
                                        <updated>2012-09-06T08:22:23</updated>
                                     </entry>
                                     <entry>
                                        <title>Change to 94857</title>
                                        <link rel=fragment href=“...”/>
                                        <sdshare:resource>http://...</sdshare:resource>
                                        <updated>2012-09-06T08:22:24</updated>
                                     </entry>
                                    ...
The SDshare client

[Diagram: a frontend feeds fragments to the client core, which pushes them on through a SPARQL backend (to a triple store) or a POST backend (to a web service)]

http://code.google.com/p/sdshare-client/
Getting data out of the triple store
• Set up SPARQL queries to extract the data
• Server does the rest
• Queries can be configured to produce
  – any subset of data
  – data in any shape

[Diagram: the SDshare server running SPARQL queries against the RDF store]
Contacts into the archive
• We want some resources in the triple store to be
  written into the archive as “contacts”
  – need to select which resources to include
  – must also transform from source data model
• How to achieve this without hard-wiring anything?
Contacts solution
• Create a generic archive object writer
  – type of RDF resource specifies type of object to create
  – name of RDF property (within namespace) specifies which
    property to set
• Set up RDF mapping from source data
  – type1 maps-to type2
  – prop1 maps-to prop2
  – only mapped types/properties included
• Use SPARQL to
  – create SDshare feed
  – do data translation with CONSTRUCT query
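A sketch of such a translation, run with rdflib; both vocabularies are invented for illustration:

# Sketch: translate source data into the archive vocabulary with a
# SPARQL CONSTRUCT query. All names here are hypothetical.
from rdflib import Graph

g = Graph()   # assume the source data is already loaded

translated = g.query("""
PREFIX erp:  <http://example.com/erp/>
PREFIX arch: <http://example.com/archive/>
CONSTRUCT {
  ?c a arch:Contact ;
     arch:name ?name .
}
WHERE {
  ?c a erp:Supplier ;
     erp:name ?name .
}""").graph   # the translated triples, ready for the SDshare feed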
Properties of the system
• Uniform integration approach
   – everything is done the same way
• Really simple integration
   – setting up a data source is generally very easy
• Loose bindings
   – components can easily be replaced
• Very little state
   – most components are stateless (or have little state)
• Idempotent
   – applying a fragment 1 or many times: same result (see the sketch after this list)
• Clear and reload
   – can delete everything and reload at any time
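A sketch of why fragment application is idempotent (rdflib; delete-then-insert is a common way to apply SDshare fragments):

# Sketch: idempotent fragment application with SPARQL Update.
# Delete everything known about the resource, then add the fragment;
# running this twice leaves the store in exactly the same state.
from rdflib import Graph

def apply_fragment(store: Graph, resource: str, fragment: Graph):
    store.update("DELETE WHERE { <%s> ?p ?o }" % resource)
    for triple in fragment:
        store.add(triple)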
Conclusion
Project outcome

     • Big, innovative project
       – many developers over a long time period
       – innovated a number of techniques and tools
     • Delivered on time, and working
       – despite an evolving project context
     • Very satisfied customer
       – they claim the project has already paid for itself in
         cost savings at the document center




Archive of the year 2012

”The organization has been innovative and strategic in its use of technology, both for the old paper archives and for the digital archive. They have a strong focus on increasing data capture and simplifying the use of metadata. Their goal has been that users should only need to enter one piece of metadata. They are now down to two (everything else is added automatically by the solution), but the goal is still to get down to just one.”

http://www.arkivrad.no/utdanning.html?newsid=10677
Highly flexible solution

     • ERP integration replaced twice
       – without affecting other components
     • CRM system replaced while in production
       – all customers got new IDs
    – Sesam hid this from users, making the transition 100%
      transparent




Status now

     • In production since autumn 2011
     • Used by
       – Hafslund Fjernvarme
       – Hafslund support centre




Other implementations

     • A very similar system has been
       implemented for Statkraft
     • A system based on the same architecture
       has been implemented for DSS
       – basically an intranet system




DSS Intranet



DSS project background

     • Delivers IT services and platform to
       ministries
        – archive, intranet, email, ...
     • Intranet currently based on EPiServer
        – in addition, want to use Sharepoint
        – of course, many other related systems...
     • How to integrate all this?
        – integrations must be loose, as systems in 13
          ministries come and go




Metadata structure

[Diagram: the metadata model. Persons belong to or have roles in organizational units (section, department, ministry, agency), which have responsibility for projects/cases and for topics/work areas (archive keys). Persons are involved in projects/cases, are experts on or work with topics, and have a title, telephone, address, and competence profile. Projects/cases have a case number and a case/process type, and are about topics. Documents/content belong to a project/case, are about topics, and carry document type, file type, date, and status.]
Project scope

[Diagram: IDM supplies user information and access groups to both EPiServer (intranet) and Sharepoint]
How we built it




The web part
• Displays data, given
  – a query
  – an XSLT stylesheet
• Can be run anywhere
  – standard protocol to the database
• Easy to include in other systems, too

[Diagram: web parts in EPiServer and Sharepoint query the Virtuoso RDF DB over SPARQL; Novell IDM, Active Directory, and Regjeringen.no feed the database over SDshare]
Data flow (simplified)

[Diagram: EPiServer delivers categories and page structure, and Novell IDM delivers employees and org structure, into the Virtuoso RDF DB; Sharepoint receives the org structure and categories, and delivers workspaces, documents, and lists back]
Independence of source

     • Solutions need to be independent of the
       data source
       – when we extract data, or
       – when we run queries against the data hub
     • Independence isolates clients from changes
       to the sources
       – allows us to replace sources completely, or
       – change the source models
     • Of course, total independence is impossible
       – if structures are too different it won’t work
       – but surprisingly often they’re not

Independence of source #1

[Diagram: in the core model, core:Person is linked to core:Project by core:participant; in the IDM model, idm:Person and idm:Project are linked by idm:has-member, and in the Sharepoint model, sp:Person and sp:Project by sp:member-of; both source models are subtyped from the core model]
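A sketch of how a core-model query finds data from both sources once the source properties are declared rdfs:subPropertyOf the core property (SPARQL 1.1 property paths, run with rdflib; the data is invented):

# Sketch: one query in core terms matches data from both sources.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix core: <http://example.com/core/> .
@prefix idm:  <http://example.com/idm/> .
@prefix sp:   <http://example.com/sp/> .

idm:has-member rdfs:subPropertyOf core:participant .
sp:member-of   rdfs:subPropertyOf core:participant .

idm:project1 idm:has-member idm:person1 .
sp:person2   sp:member-of   sp:project2 .
""", format="turtle")

q = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX core: <http://example.com/core/>
SELECT ?s ?o WHERE {
  ?p rdfs:subPropertyOf* core:participant .
  ?s ?p ?o .
}"""
for row in g.query(q):
    print(row.s, row.o)   # data from both sources shows up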
Independence of source #2
• “Facebook streams” require queries across sources
  – every source has its own model
• We can reduce the differences with annotation
  – i.e.: we describe the data with RDF
• Allows data models to
  – change, or
  – be plugged in
• with no changes to the query

sp:doc-created a dss:StreamEvent;
    rdfs:label "opprettet";
    dss:time-property sp:Created;
    dss:user-property sp:Author.

sp:doc-updated a dss:StreamEvent;
    rdfs:label "endret";
    dss:time-property sp:Modified;
    dss:user-property sp:Editor.

idm:ny-ansatt a dss:StreamEvent;
    rdfs:label "ble ansatt i";
    dss:time-property idm:ftAnsattFra .
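A sketch of how the annotations can drive one generic stream query (rdflib; the dss: namespace URI is invented, and dss:user-property is ignored for brevity):

# Sketch: build the stream from the dss:StreamEvent annotations, so
# new event types can be plugged in with no change to this code.
from rdflib import Graph

g = Graph()   # assume the annotations and the source data are loaded

events = g.query("""
PREFIX dss:  <http://example.com/dss/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?timeprop WHERE {
  ?event a dss:StreamEvent ;
         rdfs:label ?label ;
         dss:time-property ?timeprop .
}""")

for label, timeprop in events:
    # for each declared event type: what did it happen to, and when?
    q = "SELECT ?thing ?time WHERE { ?thing <%s> ?time }" % timeprop
    for thing, time in g.query(q):
        print(thing, label, time)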
Conclusion



Execution

     • Customer asked for start August 15,
       delivery January 1
       – sources: EPiServer, Sharepoint, IDM
     • We offered fixed price, done November 1
• Delivered October 20
       – also integrated with ActiveDirectory
       – added Regjeringen.no for extra value
     • Integration with ACOS Websak
       – has been analyzed and described
       – but not implemented


The data sources

     Source             Connector
     Active Directory   LDAP
     IDM                LDAP
     Intranet           EPiServer
     Regjeringen.no     EPiServer
     Sharepoint         Sharepoint




Conclusion

     • We could do this so quickly because
       – the architecture is right
        – we have components we can reuse
       – the architecture allows us to plug together the
         components in different settings
     • Once we have the data, the rest is usually
       simple
       – it’s getting the data that’s hard





More Related Content

What's hot

HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
Realtime Apache Hadoop at Facebook
Realtime Apache Hadoop at FacebookRealtime Apache Hadoop at Facebook
Realtime Apache Hadoop at Facebookparallellabs
 
Presentacion oracle exadata & exalogic f. podesta -yatch club 19 de abril 2012
Presentacion oracle exadata & exalogic   f. podesta -yatch club 19 de abril 2012Presentacion oracle exadata & exalogic   f. podesta -yatch club 19 de abril 2012
Presentacion oracle exadata & exalogic f. podesta -yatch club 19 de abril 2012ValeVilloslada
 
Moving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMoving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMongoDB
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoopDataWorks Summit
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloudImaginea
 
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 Chris Almond
 
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...Cloudera, Inc.
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInDataWorks Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014hadooparchbook
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012larsgeorge
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 

What's hot (20)

HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
Realtime Apache Hadoop at Facebook
Realtime Apache Hadoop at FacebookRealtime Apache Hadoop at Facebook
Realtime Apache Hadoop at Facebook
 
Presentacion oracle exadata & exalogic f. podesta -yatch club 19 de abril 2012
Presentacion oracle exadata & exalogic   f. podesta -yatch club 19 de abril 2012Presentacion oracle exadata & exalogic   f. podesta -yatch club 19 de abril 2012
Presentacion oracle exadata & exalogic f. podesta -yatch club 19 de abril 2012
 
Moving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMoving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDB
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoop
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloud
 
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
 
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedIn
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 

Similar to Hafslund SESAM - Semantic integration in practice

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...Simon Ambridge
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platformmartinbpeters
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...Lucas Jellema
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Databricks
 

Similar to Hafslund SESAM - Semantic integration in practice (20)

Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Operational-Analytics
Operational-AnalyticsOperational-Analytics
Operational-Analytics
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 

More from Lars Marius Garshol

JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformationLars Marius Garshol
 
Data collection in AWS at Schibsted
Data collection in AWS at SchibstedData collection in AWS at Schibsted
Data collection in AWS at SchibstedLars Marius Garshol
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityLars Marius Garshol
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
Linked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLinked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLars Marius Garshol
 
NoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityNoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityLars Marius Garshol
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Experiments in genetic programming
Experiments in genetic programmingExperiments in genetic programming
Experiments in genetic programmingLars Marius Garshol
 

More from Lars Marius Garshol (20)

JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformation
 
Data collection in AWS at Schibsted
Data collection in AWS at SchibstedData collection in AWS at Schibsted
Data collection in AWS at Schibsted
 
Kveik - what is it?
Kveik - what is it?Kveik - what is it?
Kveik - what is it?
 
Nature-inspired algorithms
Nature-inspired algorithmsNature-inspired algorithms
Nature-inspired algorithms
 
Collecting 600M events/day
Collecting 600M events/dayCollecting 600M events/day
Collecting 600M events/day
 
History of writing
History of writingHistory of writing
History of writing
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativity
 
Norwegian farmhouse ale
Norwegian farmhouse aleNorwegian farmhouse ale
Norwegian farmhouse ale
 
Archive integration with RDF
Archive integration with RDFArchive integration with RDF
Archive integration with RDF
 
The Euro crisis in 10 minutes
The Euro crisis in 10 minutesThe Euro crisis in 10 minutes
The Euro crisis in 10 minutes
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Linked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLinked Open Data for the Cultural Sector
Linked Open Data for the Cultural Sector
 
NoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityNoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativity
 
Bitcoin - digital gold
Bitcoin - digital goldBitcoin - digital gold
Bitcoin - digital gold
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Hops - the green gold
Hops - the green goldHops - the green gold
Hops - the green gold
 
Big data 101
Big data 101Big data 101
Big data 101
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Approximate string comparators
Approximate string comparatorsApproximate string comparators
Approximate string comparators
 
Experiments in genetic programming
Experiments in genetic programmingExperiments in genetic programming
Experiments in genetic programming
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Hafslund SESAM - Semantic integration in practice

  • 1. SESAM Lars Marius Garshol, Baksia, 2013-02-13 larsga@bouvet.no, http://twitter.com/larsga 1
  • 2. Agenda • Project background • Functional overview • Under the hood • Similar projects (if time) 2
  • 3. Lars Marius Garshol • Consultant in Bouvet since 2007 – focus on information architecture and semantics • Worked with semantic technologies since 1999 – mostly with Topic Maps – co-founder of Ontopia, later CTO – editor of several Topic Maps ISO standards 2001- – co-chair of TMRA conference 2006-2011 – developed several key Topic Maps technologies – consultant in a number of Topic Maps projects • Published a book on XML on Prentice-Hall • Implemented Unicode support in the Opera web browser
  • 4. My role on the project • The overall architecture is the brainchild of Axel Borge • SDshare came from an idea by Graham Moore • I only contributed parts of the design – and some parts of the implementation • Don’t actually know the whole system
  • 6. Hafslund ASA • Norwegian energy company – founded 1898 – 53% owned by the city of Oslo – responsible for energy grid around Oslo – 1.4 million customers • A conglomerate of companies – Nett (electricity grid) – Fjernvarme (remote heating) – Produksjon (power generation) – Venture – ...
  • 7. What if...? ERP CRM Work Bill order Transfor Cables Meter Customer mer
  • 8. Hafslund SESAM • An archive system, really • Generally, archive systems are glorified trash cans – putting it in the archive effectively means hiding it • Because archives are not important, are they? • Except, when you need that contract from 1937 about the right to build a power line across...
  • 9.
  • 10. Problems with archives • Many documents aren’t there – even though they should be, – because entering metadata is too much hassle • Poor metadata, because nobody bothers to enter it properly – yet, much of the metadata exists in the user context • Not used by anybody – strange, separate system with poor interface – (and the metadata is poor, too) • Contains only documents – not connected to anything else
  • 11. Goals for the SESAM project • Increase percentage of archived documents – to do this, archiving must be made easy • Increase metadata quality – by automatically harvesting metadata • Make it easy to find archived documents – by building a well-tuned search application
  • 12. What SESAM does • Automatically enrich document metadata – to do that we have to collect background data from the source systems • Connect document metadata with structured business data – this is a side-effect of the enrichment • Provide search across the whole information space – once we’ve collected the data this part is easy
  • 13. As seen by customer 13
  • 14. High-level architecture Share- ERP CRM point SDshare CMIS Search SDshare Triple store SDshare Archive engine
  • 15. Main principle of data extraction
    • No canonical model!
      – instead, data reflects model of source system
    • One ontology per source system
      – subtyped from core ontology where possible
    • Vastly simplifies data extraction
      – for search purposes it loses us nothing
      – and translation is easier once the data is in the triple store
  • 17. Data structure in triple store
    [Diagram: separate graphs for ERP, CRM, Sharepoint, and the archive, linked by sameAs statements]
  • 18. Connecting data
    • Contacts in many different systems
      – ERP has one set of contacts
      – these are mirrored in archive
    • Different IDs in these two systems
      – using “sameAs” we know which are the same
      – can do queries across the systems (see the sketch below)
      – can also translate IDs from one system to the other
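    To make the linking concrete, here is a minimal sketch of a SPARQL query that follows the sameAs links to translate an ERP contact ID into its archive counterpart. The erp: and arc: vocabularies are invented for illustration; only owl:sameAs is standard, and the real ontologies are not shown here.

        # Hypothetical vocabularies (erp:, arc:)
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX erp: <http://example.org/erp/>
        PREFIX arc: <http://example.org/archive/>

        SELECT ?archiveId WHERE {
          ?contact erp:id "4711" .        # the ID we know, from the ERP
          ?contact owl:sameAs ?same .
          ?same a arc:Contact ;
                arc:id ?archiveId .       # the same contact's archive ID
        }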
  • 19. Duplicate suppression
    [Diagram: supplier, customer, and company records from ERP, CRM, and billing are compared by Duke, which writes owl:sameAs links into the data hub via SDshare]

    Field      Record 1    Record 2     Probability
    Name       acme inc    acme inc     0.9
    Assoc no   177477707                0.5
    Zip code   9161        9161         0.6
    Country    norway      norway       0.51
    Address 1  mb 113      mailbox 113  0.49
    Address 2                           0.5

    http://code.google.com/p/duke/
  • 20. When archiving
    • The user works on the document in some system
      – ERP, CRM, whatever
    • This system knows the context
      – what user, project, equipment, etc is involved
    • This information is passed to the CMIS server
      – it uses already gathered information from the triple store to attach more metadata
  • 21. Auto-tagging
    [Diagram: a document sent to the archive is automatically tagged with manager, project, customer, work order, and equipment]
  • 22. Archive integration
    • Clients deliver documents via CMIS (sketched below)
      – an OASIS standard for CMS interfaces
      – lots of available implementations
    • Metadata translation
      – not only auto-tagging
      – client vocabulary translated to archive vocab
      – required static metadata added automatically
    • Benefits
      – clients can reuse existing software for connecting
      – interface is independent of actual archive
      – archive integration is reusable
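    Because CMIS is a standard, a client can deliver documents with off-the-shelf libraries. As a minimal sketch, delivery could look roughly like this with the Apache Chemistry cmislib package for Python; the endpoint URL, credentials, folder path, type ID, and property names are all assumptions, not the actual SESAM setup.

        # Minimal sketch using Apache Chemistry cmislib (pip install cmislib).
        # URL, credentials, type and property names below are assumptions.
        from cmislib import CmisClient

        client = CmisClient('http://archive.example.org/cmis/atom',  # hypothetical
                            'user', 'password')
        repo = client.defaultRepository
        folder = repo.getObjectByPath('/incoming')   # hypothetical folder

        # Pass the context the source system knows as CMIS properties;
        # the server side can then enrich further from the triple store.
        with open('contract.pdf', 'rb') as f:
            folder.createDocument('contract.pdf',
                                  contentFile=f,
                                  properties={'cmis:objectTypeId': 'D:ex:document',
                                              'ex:project': 'P-1234'})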
  • 23. Archive integration elsewhere
    • Generally, archive integrations are hard
      – web site integration: 400 hours
      – #2 integration: performance problems
    • They also reproduce the same functionality over and over again
    • Hard bindings against
      – internal archive model
      – archive interface
    • Switching archive system will be very hard...
      – even upgrading is hard
    [Diagram: several systems (regulation, Outlook, web site, system #1, system #2) each bound directly to the archive]
  • 24. Showing context in the ERP system
  • 25. Access control
    • Users only see objects they’re allowed to see
    • Implemented by search engine
      – all objects have lists of users/groups allowed to see them
      – on login a SPARQL query lists user’s access control group memberships (see the sketch below)
      – search engine uses this to filter search results
    • In some cases, complex access rules are run to resolve ACLs before loading into triple store
      – e.g: archive system
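    A hedged sketch of what the login-time membership query could look like; the ACL vocabulary is invented for illustration, and the SPARQL 1.1 property path (member-of+) follows nested group memberships:

        PREFIX ex: <http://example.org/acl/>   # hypothetical vocabulary

        SELECT ?group WHERE {
          ?user ex:username "larsga" ;
                ex:member-of+ ?group .    # + walks nested groups too
        }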
  • 26. Data volumes

    Graph             Statements
    IFS data            5,417,260
    Public 360 data     3,725,963
    GeoNIS data            44,242
    Tieto CAB data    138,521,810
    Hummingbird 1      32,619,140
    Hummingbird 2     165,671,179
    Hummingbird 3     192,930,188
    Hummingbird 4      48,623,178
    Address data        2,415,315
    Siebel data        36,117,786
    Duke links              4,858
    Total             626,090,919
  • 34. RDF
    • The data hub is an RDF database
      – RDF = Resource Description Framework
      – a W3C standard for data
      – also interchange format and query language (SPARQL)
    • Many implementations of the standards
      – lots of alternatives to choose between
    • Technology relatively well known
      – books, specifications, courses, conferences, ...
  • 35. How RDF works

    ‘PERSON’ table
    ID  NAME                 EMAIL
    1   Stian Danenbarger    stian.danenbarger@
    2   Lars Marius Garshol  larsga@bouvet.no
    3   Axel Borge           axel.borge@bouvet

    RDF-ized data
    SUBJECT                      PROPERTY  OBJECT
    http://example.com/person/1  rdf:type  ex:Person
    http://example.com/person/1  ex:name   Stian Danenbarger
    http://example.com/person/1  ex:email  stian.danenbarger@
    http://example.com/person/2  rdf:type  ex:Person
    http://example.com/person/2  ex:name   Lars Marius Garshol
    ...                          ...       ...
  • 36. RDF/XML
    • Standard XML format for RDF
      – not really very nice
    • However, it’s a generic format
      – that is, it’s the same format regardless of what data you’re transmitting
    • Therefore
      – all data flows are in the same format
      – absolutely no syntax transformations whatsoever
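    For illustration, the first row of the RDF-ized table on the previous slide comes out roughly like this in RDF/XML (the ex: namespace URI is invented):

        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:ex="http://example.com/ontology/">
          <!-- typed node element: implies rdf:type ex:Person -->
          <ex:Person rdf:about="http://example.com/person/1">
            <ex:name>Stian Danenbarger</ex:name>
          </ex:Person>
        </rdf:RDF>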
  • 37. Plug-and-play
    • Data source integrations are pluggable
      – if you need another data source, connect and pull in the data
    • RDF database is schemaless
      – no need to create tables and columns beforehand
    • No common data model
      – do not need to transform data to store it
    [Diagram: systems #1 and #2 plugged into the data hub]
  • 38. Product queue
    • Since sources are pluggable we can guide the project with a product queue
      – consisting of new sources and data elements
    • We analyse the items in the queue
      – estimate complexity and amount of work
      – also to see how it fits into what’s already there
    • Customer then decides what to do, and in what order
  • 39. Don’t push!
    • The general IT approach is to push data
      – source calls services provided by recipient
    • Leads to high complexity
      – two-way dependency between systems
      – have to extend source system with extra logic
      – extra triggers/threads in source system
      – many moving parts
  • 40. Pull!
    • We let the recipient pull
      – always using same protocol and same format
    • Wrap the source
      – the wrapper must support 3 simple functions
    • A different kind of solution
      – one-way dependency
      – often zero code
      – wrapper is thin and stateless
      – data moving done by reused code
  • 41. The data integration
    • All data transport done by SDshare
    • A simple Atom-based specification for synchronizing RDF data
      – http://www.sdshare.org
    • Provides two main features
      – snapshot of the data
      – fragments for each updated resource
  • 42. Basics of SDshare
    • Source offers
      – a dump of the entire data set
      – a list of fragments changed since time t
      – a dump of each fragment
    • Completely generic solution
      – always the same protocol
      – always the same data format (RDF/XML)
    • A generic SDshare client then transfers the data
      – to the recipient, whatever it is
  • 44. Typical usage of SDshare
    • Client downloads snapshot
      – client now has complete data set
    • Client polls fragment feed (a minimal sketch follows below)
      – each time asking for new fragments since last check
      – client keeps track of time of last check
      – fragments are applied to data, keeping them in sync
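    As an illustration of the client side, here is a minimal polling sketch in Python. The feed URL and the since-parameter convention are assumptions, and applying a fragment to the local store is left as a stub; the real client (linked on slide 46) does considerably more.

        # Minimal sketch of an SDshare fragment-feed poller.
        import urllib.request
        import xml.etree.ElementTree as ET

        ATOM = '{http://www.w3.org/2005/Atom}'
        FEED_URL = 'http://example.org/sdshare/fragments'   # hypothetical

        def apply_fragment(url):
            # stub: fetch the RDF/XML fragment and update the local store
            pass

        def poll(since):
            # ask only for fragments changed since our last check
            with urllib.request.urlopen(FEED_URL + '?since=' + since) as f:
                feed = ET.parse(f).getroot()
            for entry in feed.findall(ATOM + 'entry'):
                link = entry.find(ATOM + "link[@rel='fragment']")
                apply_fragment(link.get('href'))
                since = entry.find(ATOM + 'updated').text  # time of last change
            return since   # pass back in on the next poll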
  • 45. Implementing the fragment feed
    A query against the source’s change log:

        select objid, objtype, change_time
          from history_log
         where change_time > :since:
         order by change_time asc

    ...becomes an Atom fragment feed:

        <atom>
          <title>Fragments for ...</title>
          ...
          <entry>
            <title>Change to 34121</title>
            <link rel="fragment" href="..."/>
            <sdshare:resource>http://...</sdshare:resource>
            <updated>2012-09-06T08:22:23</updated>
          </entry>
          <entry>
            <title>Change to 94857</title>
            <link rel="fragment" href="..."/>
            <sdshare:resource>http://...</sdshare:resource>
            <updated>2012-09-06T08:22:24</updated>
          </entry>
          ...
  • 46. The SDshare client
    [Diagram: a frontend and a core with pluggable backends: a SPARQL backend writing to a triple store, and a POST backend writing to web services]
    http://code.google.com/p/sdshare-client/
  • 47. Getting data out of the triple store
    • Set up SPARQL queries to extract the data
    • Server does the rest
    • Queries can be configured to produce
      – any subset of data
      – data in any shape
    [Diagram: SPARQL server producing SDshare feeds from the RDF store]
  • 48. Contacts into the archive
    • We want some resources in the triple store to be written into the archive as “contacts”
      – need to select which resources to include
      – must also transform from source data model
    • How to achieve this without hard-wiring anything?
  • 49. Contacts solution
    • Create a generic archive object writer
      – type of RDF resource specifies type of object to create
      – name of RDF property (within namespace) specifies which property to set
    • Set up RDF mapping from source data
      – type1 maps-to type2
      – prop1 maps-to prop2
      – only mapped types/properties included
    • Use SPARQL to
      – create SDshare feed
      – do data translation with CONSTRUCT query (sketched below)
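    As a hedged example of the last point, a translation query could look roughly like this; the erp: and arc: vocabularies are invented for illustration, not taken from the actual system:

        # Hypothetical CONSTRUCT query reshaping ERP suppliers into the
        # archive's contact vocabulary for the SDshare feed.
        PREFIX erp: <http://example.org/erp/>
        PREFIX arc: <http://example.org/archive/>

        CONSTRUCT {
          ?c a arc:Contact ;
             arc:name  ?name ;
             arc:email ?email .
        }
        WHERE {
          ?c a erp:Supplier ;
             erp:supplier-name ?name .
          OPTIONAL { ?c erp:contact-email ?email }   # email triple only if bound
        }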
  • 50. Properties of the system
    • Uniform integration approach
      – everything is done the same way
    • Really simple integration
      – setting up a data source is generally very easy
    • Loose bindings
      – components can easily be replaced
    • Very little state
      – most components are stateless (or have little state)
    • Idempotent
      – applying a fragment once or many times: same result
    • Clear and reload
      – can delete everything and reload at any time
  • 51. Conclusion
  • 52. Project outcome
    • Big, innovative project
      – many developers over a long time period
      – innovated a number of techniques and tools
    • Delivered on time, and working
      – despite an evolving project context
    • Very satisfied customer
      – they claim the project has already paid for itself in cost savings at the document center
  • 53. Archive of the year 2012
    “The organization has been innovative and strategic in its use of technology, both for the old paper archives and for the digital archive. They focus strongly on increasing data capture and simplifying the use of metadata. Their goal has been that users should only have to enter a single piece of metadata. They are now down to two – everything else is added automatically by the solution – but the goal is still to get down to just one.”
    http://www.arkivrad.no/utdanning.html?newsid=10677
  • 54. Highly flexible solution
    • ERP integration replaced twice
      – without affecting other components
    • CRM system replaced while in production
      – all customers got new IDs
      – Sesam hid this from users, making the transition 100% transparent
  • 55. Status now
    • In production since autumn 2011
    • Used by
      – Hafslund Fjernvarme
      – Hafslund support centre
  • 56. Other implementations
    • A very similar system has been implemented for Statkraft
    • A system based on the same architecture has been implemented for DSS
      – basically an intranet system
  • 58. DSS project background
    • Delivers IT services and platform to ministries
      – archive, intranet, email, ...
    • Intranet currently based on EPiServer
      – in addition, want to use Sharepoint
      – of course, many other related systems...
    • How to integrate all this?
      – integrations must be loose, as systems in 13 ministries come and go
  • 59. Metadata structure
    [Diagram (in Norwegian): a metadata model connecting person, organizational unit (section, department, ministry, agency), topic/work area, archive key, project/case (case number, case/process type), and document/content (title, status, document type, file type, date, description/expertise, phone, address), via relations such as “is responsible for”, “belongs to / has a role in”, “is about”, “is involved in”, “is relevant for”, “is an expert on”, “works with”, and “has written”]
  • 60. Project scope
    [Diagram: EPiServer (intranet), IDM, and Sharepoint, connected through user information and access groups]
  • 79. How we built it
  • 80. The web part
    • Displays data given
      – a query
      – an XSLT stylesheet
    • Can run anywhere
      – standard protocol to the database
    • Easy to include in other systems too
    [Diagram: web parts in EPiServer and Sharepoint querying a Virtuoso RDF database over SPARQL; SDShare feeds flow in from Novell IDM, Active Directory, and Regjeringen.no]
  • 81. Data flow (simplified)
    [Diagram: Sharepoint (workspaces, documents, lists), EPiServer (org structure, categories, page structure), and Novell IDM (employees, org. structure, categories) all feed into the Virtuoso RDF DB]
  • 82. Independence of source
    • Solutions need to be independent of the data source
      – when we extract data, or
      – when we run queries against the data hub
    • Independence isolates clients from changes to the sources
      – allows us to replace sources completely, or
      – change the source models
    • Of course, total independence is impossible
      – if structures are too different it won’t work
      – but surprisingly often they’re not
  • 83. Independence of source #1
    [Diagram: a core model where core:Person and core:Project are connected by core:participant; in IDM, idm:Person and idm:Project are connected by idm:has-member; in Sharepoint, sp:Person and sp:Project are connected by sp:member-of; the source types are mapped to the core model]
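    A sketch of how the source vocabularies could be tied to the core model with RDFS; the namespace URIs and the direction of core:participant are assumptions, since the diagram does not show them:

        # Hypothetical namespaces; subclass/subproperty axioms let queries
        # against the core model also match IDM and Sharepoint data.
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix owl:  <http://www.w3.org/2002/07/owl#> .
        @prefix core: <http://example.org/core/> .
        @prefix idm:  <http://example.org/idm/> .
        @prefix sp:   <http://example.org/sp/> .

        idm:Person     rdfs:subClassOf    core:Person .
        idm:Project    rdfs:subClassOf    core:Project .
        idm:has-member rdfs:subPropertyOf core:participant .

        sp:Person  rdfs:subClassOf core:Person .
        sp:Project rdfs:subClassOf core:Project .
        # sp:member-of runs person-to-project, the opposite direction, so it
        # is declared as an inverse (a simplification: this makes the two
        # properties exact inverses rather than a one-way subsumption)
        sp:member-of owl:inverseOf core:participant .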
  • 84. Independence of source #2
    • “Facebook streams” require queries across sources
      – every source has its own model
    • We can reduce the differences with annotation
      – i.e. we describe the data with RDF
    • Allows data models to
      – change, or
      – be plugged in
    • with no changes to the query

        sp:doc-created a dss:StreamEvent;
          rdfs:label "opprettet";          # "created"
          dss:time-property sp:Created;
          dss:user-property sp:Author.

        sp:doc-updated a dss:StreamEvent;
          rdfs:label "endret";             # "modified"
          dss:time-property sp:Modified;
          dss:user-property sp:Editor.

        idm:ny-ansatt a dss:StreamEvent;
          rdfs:label "ble ansatt i";       # "was hired by"
          dss:time-property idm:ftAnsattFra .
  • 86. Execution
    • Customer asked for a start on August 15 and delivery January 1
      – sources: EPiServer, Sharepoint, IDM
    • We offered a fixed price, done November 1
    • Delivered October 20
      – also integrated with Active Directory
      – added Regjeringen.no for extra value
    • Integration with ACOS Websak
      – has been analyzed and described
      – but not implemented
  • 87. The data sources

    Source            Connector
    Active Directory  LDAP
    IDM               LDAP
    Intranet          EPiServer
    Regjeringen.no    EPiServer
    Sharepoint        Sharepoint
  • 88. Conclusion
    • We could do this so quickly because
      – the architecture is right
      – we have components we can reuse
      – the architecture allows us to plug the components together in different settings
    • Once we have the data, the rest is usually simple
      – it’s getting the data that’s hard