The MetaLex Document Server
Rinke Hoekstra
Universiteit van Amsterdam
The Problem
• Knowledge
• Provenance
  [Figure: Regulation A (01-01-2011); Art 12 (04-02-2011); Art 14, lid 3, 2e volzin, i.e. paragraph 3, second sentence (11-06-2008); the same provision (01-07-2011)]
• Open Data: public service falls short
• Large scale validation of CEN MetaLex
• “Linked Open Government Data”
Current Situation
Public content services hosted at wetten.nl
Wetten.nl XML Service
    http://wetten.overheid.nl/xml.php?regelingID=...

• Only available format is BWB XML
• Only current version
• Content at document level
• Identification at document level
• Identifiers are not dereferenceable
• Hardly any metadata (e.g. version date)
• Only available context is position in text
BWBId Web Service
http://wetten.overheid.nl/BWBIdService/BWBIdList.xml.zip




NB: The problem with the XML processing instruction was reported and fixed, but returned sometime last week
Identifiers & Juriconnect
1.0:c:BWBR0005416&artikel=6
vs
http://wetten.overheid.nl/cgi-bin/deeplink/law1/bwbid=BWBR0005416/article=6/date=2005-01-14
vs
http://wetten.overheid.nl/BWBR0005416/TitelII698946/HoofdstukII/Artikel16/geldigheidsdatum_14-01-2005

• Juriconnect?
  • URN-based... but no naming server
    • (cf. Digital Object Identifiers)
  • Named elements do not carry an identifier
  • No explicit version information, only contextual
Sources used...

• List of all regulations in “XML”
• Wetten.nl XML Service
• Metadata in HTML table on wetten.nl
  (the “info page”)



• ... so let’s get started already
Step 1
Requirements
Our Goals
• “Deserialize” regulation content
  (e.g. topic-based browsing)

• Extract and reconstruct implicit information
  (identifiers, metadata)

• Annotate regulations
  (reconstructed metadata, third-party metadata)

• Annotate using regulations
  (knowledge-based systems, services, business processes ...)

• Accessible and reusable for any other party
  (shared vocabularies, standard access)
Requirements


• Unique, persistent identification
• Generic XML structure of documents
• Extensible metadata framework
• Flexible web services
Technology Choices


• URL-like URIs
• CEN MetaLex XML documents
• Linked Data / RDF metadata
  (extensibility to OWL, RIF)

• Transparent REST-services
Step 2
Come up with persistent identifiers at element level and a solid versioning scheme
Identification

• Web-enabled “URL-like” URIs
  • e.g. http://doc.metalex.eu/....
• “Cool” URIs (http://www.w3.org/TR/cooluris/)
  • “Accept”-header based dereferencing
  • Different types of content at same URI
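"Accept"-header based dereferencing means the server inspects the client's Accept header and picks one of the representations it holds for the URI. The sketch below is a simplified negotiator under stated assumptions: the `OFFERED` table and all function names are hypothetical, and q=0 entries are merely skipped rather than treated as exclusions, as a complete HTTP implementation would.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table of representations held for one Cool URI,
# keyed by media type (values are file extensions).
OFFERED: Dict[str, str] = {
    "application/x-turtle": "ttl",
    "application/rdf+xml": "rdf",
    "text/xml": "xml",
    "text/plain": "net",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    # sorted() is stable even with reverse=True, so equal-q ranges keep header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as */* or text/* covers a media type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(header: str, offered: Dict[str, str] = OFFERED) -> Optional[str]:
    """Pick the offered media type the client prefers, or None (HTTP 406)."""
    for media_range, q in parse_accept(header):
        if q <= 0:
            continue  # simplification: skipped, not treated as an exclusion
        for media_type in offered:
            if matches(media_range, media_type):
                return media_type
    return None
```

For example, `negotiate("text/html;q=0.5, text/xml")` selects `text/xml`, because the unqualified range outranks the q=0.5 one.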
Levels of Identification
• IFLA FRBR levels (bibliographic entities)
  • Work: Regulation
  • Expression (realizes a Work): Version of regulation
  • Manifestation (embodies an Expression): XML version of regulation
  • Item (exemplifies a Manifestation): XML version of regulation on my hard disk
Transparent Identifiers


• Hierarchical information (work)
  http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1
  http://doc.metalex.eu/id/BWBR0011823/artikel/1


• Version and language (expression)
  http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01



• Format information (manifestation)
  http://doc.metalex.eu/doc/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01/data.xml
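The three transparent URI patterns can be derived mechanically from each other. A minimal sketch, reproducing only the layout visible in the examples above; the function names are hypothetical and not taken from the MDS code base.

```python
from typing import Sequence, Tuple

BASE = "http://doc.metalex.eu"


def work_uri(bwb_id: str, path: Sequence[Tuple[str, str]]) -> str:
    """Work level: BWB identifier followed by (label, number) hierarchy steps."""
    segments = [bwb_id] + [f"{label}/{number}" for label, number in path]
    return f"{BASE}/id/" + "/".join(segments)


def expression_uri(work: str, lang: str, date: str) -> str:
    """Expression level: append language and version date to a work URI."""
    return f"{work}/{lang}/{date}"


def manifestation_uri(expression: str, fmt: str) -> str:
    """Manifestation level: /id/ becomes /doc/ and a data file name is appended."""
    return expression.replace(f"{BASE}/id/", f"{BASE}/doc/", 1) + f"/data.{fmt}"
```

Calling `work_uri("BWBR0011823", [("hoofdstuk", "1"), ("artikel", "1")])` and passing the result through the other two functions yields exactly the three example URIs listed above.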
Problem
• URIs don’t carry semantics...
• Detect changes:
  • which element versions are the same
  • ... and which versions are different?
[Figure: two versions of Art. 44, lid 4 (2011-03-26 and 2011-04-05), from: Besluit prudentiële regels Wft, BWBR0020420]
Opaque Identifiers
http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/34b0cee26ee5138c74aa2c62caf2c117d3c616e9
[Figure: elements s1 and s2 are each realized (frbr:realizes) by versions s1t1, s1t2, s1t3 and s2t1, s2t2, s2t3, ...; the versions are linked via owl:sameAs to content identifiers AE6, B9C, 3F5]
• Content information
• Unique SHA1 hash of text
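A content-based identifier lets the server detect that two element versions carry identical text. The sketch below hashes element text with SHA1 and groups versions by hash. The whitespace normalisation step is an assumption made for illustration; the slides do not say whether the MDS normalises text before hashing.

```python
import hashlib
import re
from typing import Dict, List


def normalise(text: str) -> str:
    # Assumption: whitespace-only differences should not yield a new identifier
    return re.sub(r"\s+", " ", text).strip()


def content_hash(text: str) -> str:
    """40-character SHA1 hex digest of the normalised text."""
    return hashlib.sha1(normalise(text).encode("utf-8")).hexdigest()


def group_versions(versions: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each content hash to the version identifiers that share that text."""
    groups: Dict[str, List[str]] = {}
    for version_id, text in versions.items():
        groups.setdefault(content_hash(text), []).append(version_id)
    return groups
```

Versions that end up in the same group are candidates for the owl:sameAs links in the figure, and the digest itself can serve as the final path segment of an opaque URI.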
Step 3
Convert BWB XML to a generic XML format (CEN MetaLex) with appropriate metadata
Procedure
For each BWB XML file listed, if it has been updated since the latest run, download the latest version, scrape its metadata, and produce:
• Persistent URIs
• CEN MetaLex + citations
• Inline RDFa (optional) or an RDF graph (optional)
• Pajek “.net” files (optional)
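The incremental update loop can be sketched as follows. This assumes, hypothetically, that each entry in the regulation list carries a last-modified date in ISO 8601 form; `convert` is a placeholder for the download, scrape and produce steps, not an actual MDS function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Listing:
    bwb_id: str
    last_modified: str  # hypothetical: ISO 8601 date taken from the regulation list


def run(listing: Iterable[Listing], last_run: Dict[str, str],
        convert: Callable[[str], None]) -> List[str]:
    """Convert only regulations changed since the previous run; return their IDs."""
    converted = []
    for entry in listing:
        previous = last_run.get(entry.bwb_id)
        # ISO 8601 dates compare correctly as plain strings
        if previous is not None and entry.last_modified <= previous:
            continue  # unchanged since the latest run
        convert(entry.bwb_id)  # download, scrape metadata, produce outputs
        last_run[entry.bwb_id] = entry.last_modified
        converted.append(entry.bwb_id)
    return converted
```

Keeping `last_run` as persistent state between runs is what makes a daily update cheap: unchanged regulations are skipped without being downloaded.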
CEN MetaLex

• Straightforward 1:1 mapping
 • ... some minor fixes
• Mint URIs on the fly
• Convert citations on the fly
• Generate metadata on the fly
 • “inline” inside mcontainer elements
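A 1:1 mapping with on-the-fly URI minting can be written as a single recursive copy. In this sketch the `TYPE_OF` table, the `nr` numbering attribute on BWB elements, and the use of `name` and `about` attributes on the output are all assumptions chosen for illustration; the real converter's mapping and its corrections are more extensive.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of a BWB-to-CEN-MetaLex element mapping.
TYPE_OF = {
    "hoofdstuk": "hcontainer",
    "artikel": "hcontainer",
    "lid": "hcontainer",
    "kop": "htitle",
    "al": "block",
    "nadruk": "inline",
}


def convert(elem: ET.Element, parent_uri: str) -> ET.Element:
    """Copy a BWB element as a generic CEN MetaLex element, minting URIs."""
    bwb_name = elem.tag
    out = ET.Element(TYPE_OF.get(bwb_name, "container"), {"name": bwb_name})
    number = elem.get("nr")  # hypothetical: element number stored as an attribute
    uri = parent_uri
    if number is not None:
        uri = f"{parent_uri}/{bwb_name}/{number}"
        out.set("about", uri)
    out.text = elem.text
    for child in elem:
        converted = convert(child, uri)
        converted.tail = child.tail  # preserve mixed content around inline elements
        out.append(converted)
    return out
```

Because the URI is passed down the recursion, each numbered element receives a work-level URI reflecting its position in the hierarchy, matching the transparent identifier pattern.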
Results
Table 1. Conversion performance for 300 randomly selected regulations.

Substitutions              Number      %
  container                 22312     29 %
  hcontainer                 3730      5 %
  htitle                     3730      5 %
  block                     34325     44 %
  inline                    13527     17 %
  Total                     77624

Corrections                Number      %
  artikel                    2525     72 %
  divisie                     519     15 %
  colspec                     289      8 %
  illustratie                  54      2 %
  others                       99      3 %
  Total                      3486

Total no. of regulations      300
Revoked regulations           109     30 %
Correction %                            4 %
(Full description in draft ISWC 2011 paper.) Excerpt:
Lastly, the MDS offers a simple search interface for finding regulations based on the title and version date.
6 Conclusion and Results
We ran the MetaLex conversion script on all regulations available through the wetten.nl portal, resulting in a total of 27,687 versions of regulations being con-
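The percentages in Table 1 follow from the raw counts. The check below recomputes them, assuming (as the numbers suggest) that "Correction %" is the number of corrections relative to the number of substitutions.

```python
from typing import Dict, Tuple

substitutions = {"container": 22312, "hcontainer": 3730, "htitle": 3730,
                 "block": 34325, "inline": 13527}
corrections = {"artikel": 2525, "divisie": 519, "colspec": 289,
               "illustratie": 54, "others": 99}


def shares(counts: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Total count and each category's share, rounded to whole percent."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total) for k, v in counts.items()}


sub_total, sub_pct = shares(substitutions)
cor_total, cor_pct = shares(corrections)
# Assumption: "Correction %" = corrections / substitutions
correction_rate = round(100 * cor_total / sub_total)
```

Both totals (77624 and 3486) and every rounded share match the table; the correction rate comes out at about 4.5 %, which rounds to the reported 4 %.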
Citations

• Juriconnect citations:
  1.0:v:BWBR0020486&artikel=6
  1.0:c:BWBR0020486&artikel=6


• MetaLex identifiers:
  http://doc.metalex.eu/id/BWBR0020486/artikel/6
  http://doc.metalex.eu/id/BWBR0020486/artikel/6/2009-01-01
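Converting a simple Juriconnect citation into a MetaLex identifier is mostly a matter of rewriting its `key=value` pairs as path steps. This is a minimal sketch: it treats every parameter as a hierarchy step, ignores the meaning of the second field (`c` or `v`), and takes the version date as a separate, optional argument, since where that date comes from is an assumption here.

```python
from typing import Optional

ID_BASE = "http://doc.metalex.eu/id/"


def juriconnect_to_metalex(citation: str, date: Optional[str] = None) -> str:
    """Translate e.g. '1.0:c:BWBR0020486&artikel=6' into a MetaLex work or expression URI."""
    version, _kind, rest = citation.split(":", 2)  # _kind ('c' or 'v') is not interpreted
    if version != "1.0":
        raise ValueError(f"unsupported Juriconnect version: {version}")
    bwb_id, *params = rest.split("&")
    segments = [bwb_id]
    for param in params:
        key, _, value = param.partition("=")
        segments += [key, value]
    uri = ID_BASE + "/".join(segments)
    return f"{uri}/{date}" if date else uri
```

Without a date the result is the work URI; with one, it is the expression URI in the form shown above.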
Metadata Vocabularies
• “RDFized” BWB elements
• MetaLex ontology
  • FRBR type, modification events, structure
• Dublin Core
  • title, alternativeTitle, version
• FOAF
  • page, homepage
• Simple Event Model (SEM)
• Open Provenance Model vocabulary (OPMV)
• W3C Time Ontology
Events & Provenance
[Figure: RDF graph describing the provenance of one expression.
• The expression (version) URI of a regulation, http://doc.metalex.eu/id/BWBR0017869/2009-10-23, typed ml:BibliographicExpression and opmv:Artifact
• The process that generated the expression, http://doc.metalex.eu/id/process/BWBR0017869/2009-10-23 (opmv:Process), linked via opmv:wasGeneratedBy
• The creation event of the regulation, http://doc.metalex.eu/id/event/BWBR0017869/2009-10-23 (sem:Event; event type ml:LegislativeModification), linked via ml:resultOf
• The date at which the expression was created, http://doc.metalex.eu/id/date/2009-10-23 (time:Instant, ml:Date, sem:Time), linked via opmv:wasGeneratedAt, with value "2009-10-23"^^xsd:date
Other properties in the graph: time:hasEnd, sem:hasTime, ml:date, sem:eventType, sem:timeType, sem:hasTimeStamp, time:inXSDDateTime, rdf:value]
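The core of this provenance pattern can be emitted as Turtle with plain string formatting. The sketch covers only a subset of the links in the figure, reading the arrows as described above; that reading, and the MetaLex (`ml:`) namespace URI, are assumptions. The OPMV, SEM, OWL-Time, RDF and XSD namespaces are the published ones.

```python
# The ml: namespace below is an assumption; the others are published vocabularies.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix opmv: <http://purl.org/net/opmv/ns#> .\n"
    "@prefix sem: <http://semanticweb.cs.vu.nl/2009/11/sem/> .\n"
    "@prefix time: <http://www.w3.org/2006/time#> .\n"
    "@prefix ml: <http://www.metalex.eu/metalex/2008-05-02#> .\n"
)


def provenance_turtle(bwb_id: str, date: str) -> str:
    """Describe the expression of bwb_id dated `date`, its process, event and date."""
    base = "http://doc.metalex.eu/id"
    expression = f"{base}/{bwb_id}/{date}"
    process = f"{base}/process/{bwb_id}/{date}"
    event = f"{base}/event/{bwb_id}/{date}"
    instant = f"{base}/date/{date}"
    return PREFIXES + f"""
<{expression}> a ml:BibliographicExpression, opmv:Artifact ;
    opmv:wasGeneratedBy <{process}> ;
    opmv:wasGeneratedAt <{instant}> ;
    ml:resultOf <{event}> .

<{process}> a opmv:Process ;
    time:hasEnd <{instant}> .

<{event}> a sem:Event ;
    sem:hasTime <{instant}> .

<{instant}> a time:Instant, sem:Time ;
    rdf:value "{date}"^^xsd:date .
"""
```

Minting the process, event and date URIs from the BWB identifier and the version date keeps them stable across conversion runs.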
Step 4
Publish: The MetaLex Document Server (MDS)
Document Serving

• RESTful API
  • Implement Cool URIs
      (Dereference to XML, RDF, .net)

  • Shorthands (‘/latest’)
  • SPARQL endpoint
  • Citation graphs
• Rudimentary (and unpredictable) search
• CSS Stylesheet for CEN MetaLex XML
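A "/latest" shorthand only needs the list of known version dates for a work. In this sketch the path layout and the `on_or_before` cut-off are assumptions. The cut-off addresses the fact that consolidated versions can carry a validity date in the future, so "latest" may need to mean "latest in force on a given day".

```python
from typing import Iterable, Optional


def resolve_latest(path: str, versions: Iterable[str],
                   on_or_before: Optional[str] = None) -> Optional[str]:
    """Replace a trailing '/latest' with the most recent version date."""
    if not path.endswith("/latest"):
        return path  # not a shorthand: leave untouched
    # ISO 8601 dates sort correctly as plain strings
    dates = sorted(d for d in versions if on_or_before is None or d <= on_or_before)
    if not dates:
        return None  # no qualifying version: a server would answer 404
    return path[: -len("latest")] + dates[-1]
```

The resolved path can then be dereferenced like any other expression URI.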
Dereferencing (RDF)
1. Client requests URI http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01 with Accept: application/x-turtle
2. Server redirects to manifestation URI (HTTP 303) http://doc.metalex.eu/doc/BWBR0011823/nl/2010-09-01/data.ttl
3. Server sends a SPARQL query to the triplestore for the Symmetric Concise Bounded Description (SCBD), see http://www.w3.org/Submission/CBD
4. Triplestore returns a JSON serialisation of the SCBD
5. MDS returns a file containing the Turtle serialisation of the SCBD
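Steps 2 and 3 can be sketched as two small functions. The media-type-to-extension table is hypothetical (only `.ttl` and `.xml` appear on the slides). The SPARQL query is an approximation of an SCBD: it returns triples with the resource as subject or object, but does not follow blank nodes recursively as a full SCBD does.

```python
from typing import Dict, Optional, Tuple

ID_MARKER = "://doc.metalex.eu/id/"
DOC_MARKER = "://doc.metalex.eu/doc/"

# Hypothetical media type -> manifestation file extension table.
EXTENSIONS: Dict[str, str] = {
    "application/x-turtle": "ttl",
    "application/rdf+xml": "rdf",
    "text/xml": "xml",
}


def redirect_for(uri: str, media_type: str) -> Optional[Tuple[int, str]]:
    """Step 2: map an /id/ URI and negotiated type to (303, manifestation URI)."""
    ext = EXTENSIONS.get(media_type)
    if ID_MARKER not in uri or ext is None:
        return None
    doc = uri.replace(ID_MARKER, DOC_MARKER, 1).rstrip("/")
    return 303, f"{doc}/data.{ext}"


def scbd_query(resource: str) -> str:
    """Step 3, approximated: one-hop triples in both directions, no blank-node closure."""
    return (
        f"CONSTRUCT {{ <{resource}> ?p ?o . ?s ?q <{resource}> }}\n"
        f"WHERE {{ {{ <{resource}> ?p ?o }} UNION {{ ?s ?q <{resource}> }} }}"
    )
```

The 303 status tells the client that the /id/ URI names the expression itself, while the /doc/ URI names a document describing it.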
Dereferencing (XML)
1. Client requests URI http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01 with Accept: text/xml
2. Server redirects to manifestation URI (HTTP 303) http://doc.metalex.eu/doc/BWBR0011823/nl/2010-09-01/data.xml
3. Server queries the file store (glob) for the XML manifestation
4. If no manifestation exists, it is extracted from the parent
5. Triplestore returns the URI of the manifestation
6. MDS redirects to the location of the manifestation (HTTP 302): http://doc.metalex.eu/files/BWBR0011823_2010-03-01_mls.xml
(Clients may render the XML using the CSS stylesheet.)
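Step 4, extracting a missing manifestation from its parent, can be sketched as a walk up the identifier hierarchy. In this sketch a dict stands in for the file store (which the slide says is searched with a glob), and it is assumed that converted elements carry their work URI in an `about` attribute. Both are assumptions, as are the function names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional

ID_BASE = "http://doc.metalex.eu/id/"


def ancestors(uri: str) -> Iterator[str]:
    """Yield the expression URI, then its parents, dropping one (label, number) step at a time."""
    head, lang, date = uri.rsplit("/", 2)
    bwb_id, *steps = head[len(ID_BASE):].split("/")
    while True:
        yield ID_BASE + "/".join([bwb_id] + steps + [lang, date])
        if not steps:
            return
        steps = steps[:-2]


def dereference_xml(uri: str, store: Dict[str, str]) -> Optional[ET.Element]:
    """Return the stored manifestation, or extract the element from the nearest stored parent."""
    work = uri.rsplit("/", 2)[0]  # strip language and date
    for candidate in ancestors(uri):
        xml_text = store.get(candidate)
        if xml_text is None:
            continue
        root = ET.fromstring(xml_text)
        if candidate == uri:
            return root
        for elem in root.iter():
            if elem.get("about") == work:  # assumption: @about holds the work URI
                return elem
        return None  # nearest stored parent does not contain the element
    return None
```

Extracting on demand means only whole-regulation manifestations need to be stored, while every article, paragraph or chapter URI still dereferences to XML.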
Dereferencing (...)
• Other RDF syntaxes
  application/rdf+xml, text/rdf+n3

• HTML clients
  application/xml, application/xhtml+xml, text/html

  • Redirect (303) to Marbles browser
• Pajek clients
  text/plain

  • Download .net file
  • View using Gephi Toolkit
     http://gephi.org
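A Pajek ".net" file for a citation graph lists numbered vertices under `*Vertices` and directed links under `*Arcs`. A minimal writer, assuming citations arrive as (citing URI, cited URI) pairs; the function name is hypothetical, and labels containing double quotes would need escaping, which this sketch does not do.

```python
from typing import Dict, Iterable, List, Tuple


def to_pajek(citations: Iterable[Tuple[str, str]]) -> str:
    """Serialise (citing, cited) pairs as a Pajek .net file with directed arcs."""
    index: Dict[str, int] = {}
    arcs: List[Tuple[int, int]] = []
    for source, target in citations:
        for uri in (source, target):
            if uri not in index:
                index[uri] = len(index) + 1  # Pajek vertex numbers start at 1
        arcs.append((index[source], index[target]))
    lines = [f"*Vertices {len(index)}"]
    lines += [f'{n} "{uri}"' for uri, n in index.items()]
    lines.append("*Arcs")
    lines += [f"{s} {t}" for s, t in arcs]
    return "\n".join(lines) + "\n"
```

The same file can be opened in Pajek or loaded into Gephi.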
Technical Details
•   Current situation
    •   Approximately 27 thousand regulations
    •   87.9 million triples (legislation.gov.uk: 1.9 billion)
    •   Updated daily
•   Technical details
    •   Dell PowerEdge T110 II, 32 GB RAM
    •   Garlik 4Store triplestore (http://4store.org)
    •   Python Django web applications
    •   Tomcat servlet + Gephi Toolkit API


•   See http://doc.metalex.eu
Step 5
Use: social network analysis and concept extraction (ongoing work)
Network Analysis

• Impact of regulation on other
  regulations
  (combine with work on court rulings)

• Connectedness
• “Importance” of articles
• Analysis tools
 • Pajek, Gephi
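Two of the measures above can be approximated directly on citation pairs before turning to Pajek or Gephi. In this sketch, in-degree (how often an article is cited) stands in for "importance", and the number of weakly connected components stands in for connectedness. Both proxies are assumptions chosen for illustration, not the measures used in the ongoing work.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def most_cited(citations: Iterable[Tuple[str, str]], top: int = 3) -> List[Tuple[str, int]]:
    """Articles ranked by in-degree, counting each distinct citation once."""
    indegree = Counter(target for _, target in set(citations))
    return sorted(indegree.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def component_count(nodes: Iterable[str], citations: Iterable[Tuple[str, str]]) -> int:
    """Number of weakly connected components, via union-find."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in citations:
        parent.setdefault(s, s)
        parent.setdefault(t, t)
        rs, rt = find(s), find(t)
        if rs != rt:
            parent[rs] = rt
    return len({find(n) for n in parent})
```

Articles that are never cited and cite nothing show up as singleton components, which is one quick way to spot isolated provisions.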
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked Data

More Related Content

Viewers also liked

Viewers also liked (9)

Eter finishing by shinigami
Eter finishing by shinigamiEter finishing by shinigami
Eter finishing by shinigami
 
Hipertermia maligna
Hipertermia malignaHipertermia maligna
Hipertermia maligna
 
Karya Ilmiah Remaja
Karya Ilmiah RemajaKarya Ilmiah Remaja
Karya Ilmiah Remaja
 
HIPERTERMIA
HIPERTERMIAHIPERTERMIA
HIPERTERMIA
 
Varicella
VaricellaVaricella
Varicella
 
Makalah penyakit menular dan tidak menular
Makalah penyakit menular dan tidak menularMakalah penyakit menular dan tidak menular
Makalah penyakit menular dan tidak menular
 
Cacar air
Cacar airCacar air
Cacar air
 
Penulisan karya ilmiah skripsi
Penulisan karya ilmiah skripsiPenulisan karya ilmiah skripsi
Penulisan karya ilmiah skripsi
 
5-star linked open council decisions
5-star linked open council decisions5-star linked open council decisions
5-star linked open council decisions
 

Similar to The MetaLex Document Server - Legal Documents as Versioned Linked Data

Metadata gebruiken, wat komt er bij kijken
Metadata gebruiken, wat komt er bij kijkenMetadata gebruiken, wat komt er bij kijken
Metadata gebruiken, wat komt er bij kijkenovonder
 
Metadata oplossingen
Metadata oplossingenMetadata oplossingen
Metadata oplossingengrus001
 
Versiebeheer van database changes
Versiebeheer van database changesVersiebeheer van database changes
Versiebeheer van database changesArjen van Vliet
 
2010 iska - tim m - nosql iska
2010   iska - tim m - nosql iska2010   iska - tim m - nosql iska
2010 iska - tim m - nosql iskaTim Mahy
 
metadata & open source #osgeonl dag 2012
metadata & open source #osgeonl dag 2012 metadata & open source #osgeonl dag 2012
metadata & open source #osgeonl dag 2012 pvangenuchten
 
Nord Toelichting Techniek
Nord Toelichting TechniekNord Toelichting Techniek
Nord Toelichting Techniektjercus
 
M4B e-commercedag VVB 17/03/2014
M4B e-commercedag VVB 17/03/2014M4B e-commercedag VVB 17/03/2014
M4B e-commercedag VVB 17/03/2014boek_be
 
TYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en Fluid
TYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en FluidTYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en Fluid
TYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en FluidTYPO3 Nederland
 
PinkWeb SBR seminar Batavia XBRL Services: Validatie
PinkWeb SBR seminar Batavia XBRL Services: ValidatiePinkWeb SBR seminar Batavia XBRL Services: Validatie
PinkWeb SBR seminar Batavia XBRL Services: ValidatieVisma | PinkWeb
 
Open Source ECM Alternatief Alfresco
Open Source ECM Alternatief AlfrescoOpen Source ECM Alternatief Alfresco
Open Source ECM Alternatief AlfrescoEdwin van der Geest
 
Rf meetup 25feb2020 robo_con
Rf meetup 25feb2020 robo_conRf meetup 25feb2020 robo_con
Rf meetup 25feb2020 robo_conchristiantester
 
Oracle Discoverer to Oracle BI EE
Oracle Discoverer to Oracle BI EEOracle Discoverer to Oracle BI EE
Oracle Discoverer to Oracle BI EEDaan Bakboord
 
2007-may-31 HL7 NL Themamiddag V3 Architecture
2007-may-31 HL7 NL Themamiddag V3 Architecture2007-may-31 HL7 NL Themamiddag V3 Architecture
2007-may-31 HL7 NL Themamiddag V3 ArchitectureMichael van der Zel
 
Remco van Veenendaal (Nationaal Archief) = persistent identifiers
Remco van Veenendaal (Nationaal Archief) = persistent identifiersRemco van Veenendaal (Nationaal Archief) = persistent identifiers
Remco van Veenendaal (Nationaal Archief) = persistent identifiersNetwerk Digitaal Erfgoed
 
M4B Gebruikersoverleg - 30 mei 2013
M4B Gebruikersoverleg - 30 mei 2013 M4B Gebruikersoverleg - 30 mei 2013
M4B Gebruikersoverleg - 30 mei 2013 boek_be
 

Similar to The MetaLex Document Server - Legal Documents as Versioned Linked Data (20)

Metadata gebruiken, wat komt er bij kijken
Metadata gebruiken, wat komt er bij kijkenMetadata gebruiken, wat komt er bij kijken
Metadata gebruiken, wat komt er bij kijken
 
Metadata oplossingen
Metadata oplossingenMetadata oplossingen
Metadata oplossingen
 
Versiebeheer van database changes
Versiebeheer van database changesVersiebeheer van database changes
Versiebeheer van database changes
 
2010 iska - tim m - nosql iska
2010   iska - tim m - nosql iska2010   iska - tim m - nosql iska
2010 iska - tim m - nosql iska
 
metadata & open source #osgeonl dag 2012
metadata & open source #osgeonl dag 2012 metadata & open source #osgeonl dag 2012
metadata & open source #osgeonl dag 2012
 
Nord Toelichting Techniek
Nord Toelichting TechniekNord Toelichting Techniek
Nord Toelichting Techniek
 
M4B e-commercedag VVB 17/03/2014
M4B e-commercedag VVB 17/03/2014M4B e-commercedag VVB 17/03/2014
M4B e-commercedag VVB 17/03/2014
 
Excellent rest met de web api
Excellent rest met de web apiExcellent rest met de web api
Excellent rest met de web api
 
TYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en Fluid
TYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en FluidTYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en Fluid
TYPO3 Congres 2012 - Aan de slag met TYPO3 Extbase en Fluid
 
PinkWeb SBR seminar Batavia XBRL Services: Validatie
PinkWeb SBR seminar Batavia XBRL Services: ValidatiePinkWeb SBR seminar Batavia XBRL Services: Validatie
PinkWeb SBR seminar Batavia XBRL Services: Validatie
 
Open Source ECM Alternatief Alfresco
Open Source ECM Alternatief AlfrescoOpen Source ECM Alternatief Alfresco
Open Source ECM Alternatief Alfresco
 
Rf meetup 25feb2020 robo_con
Rf meetup 25feb2020 robo_conRf meetup 25feb2020 robo_con
Rf meetup 25feb2020 robo_con
 
Oracle Discoverer to Oracle BI EE
Oracle Discoverer to Oracle BI EEOracle Discoverer to Oracle BI EE
Oracle Discoverer to Oracle BI EE
 
Implementing Rule-based Systems with Semantic MediaWiki
Implementing Rule-based Systems with Semantic MediaWikiImplementing Rule-based Systems with Semantic MediaWiki
Implementing Rule-based Systems with Semantic MediaWiki
 
2007-may-31 HL7 NL Themamiddag V3 Architecture
2007-may-31 HL7 NL Themamiddag V3 Architecture2007-may-31 HL7 NL Themamiddag V3 Architecture
2007-may-31 HL7 NL Themamiddag V3 Architecture
 
Adlib webservices
Adlib webservicesAdlib webservices
Adlib webservices
 
Verwerking van ontvangen digitale data. De opbouw van het LIAS-preintgestproces
Verwerking van ontvangen digitale data. De opbouw van het LIAS-preintgestprocesVerwerking van ontvangen digitale data. De opbouw van het LIAS-preintgestproces
Verwerking van ontvangen digitale data. De opbouw van het LIAS-preintgestproces
 
Debat Wegwijs in het landschap van archiefbeheersysteem
Debat Wegwijs in het landschap van archiefbeheersysteemDebat Wegwijs in het landschap van archiefbeheersysteem
Debat Wegwijs in het landschap van archiefbeheersysteem
 
Remco van Veenendaal (Nationaal Archief) = persistent identifiers
Remco van Veenendaal (Nationaal Archief) = persistent identifiersRemco van Veenendaal (Nationaal Archief) = persistent identifiers
Remco van Veenendaal (Nationaal Archief) = persistent identifiers
 
M4B Gebruikersoverleg - 30 mei 2013
M4B Gebruikersoverleg - 30 mei 2013 M4B Gebruikersoverleg - 30 mei 2013
M4B Gebruikersoverleg - 30 mei 2013
 

More from Rinke Hoekstra

Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseRinke Hoekstra
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataRinke Hoekstra
 
QBer - Connect your data to the cloud
QBer - Connect your data to the cloudQBer - Connect your data to the cloud
QBer - Connect your data to the cloudRinke Hoekstra
 
Jurix 2014 welcome presentation
Jurix 2014 welcome presentationJurix 2014 welcome presentation
Jurix 2014 welcome presentationRinke Hoekstra
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Rinke Hoekstra
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationRinke Hoekstra
 
Linkitup: Link Discovery for Research Data
Linkitup: Link Discovery for Research DataLinkitup: Link Discovery for Research Data
Linkitup: Link Discovery for Research DataRinke Hoekstra
 
Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?Rinke Hoekstra
 
Linked Science - Building a Web of Research Data
Linked Science - Building a Web of Research DataLinked Science - Building a Web of Research Data
Linked Science - Building a Web of Research DataRinke Hoekstra
 
Semantic Representations for Research
Semantic Representations for ResearchSemantic Representations for Research
Semantic Representations for ResearchRinke Hoekstra
 
A Slightly Different Web of Data
A Slightly Different Web of DataA Slightly Different Web of Data
A Slightly Different Web of DataRinke Hoekstra
 
The Knowledge Reengineering Bottleneck
The Knowledge Reengineering BottleneckThe Knowledge Reengineering Bottleneck
The Knowledge Reengineering BottleneckRinke Hoekstra
 
Concept- en Definitie Extractie
Concept- en Definitie ExtractieConcept- en Definitie Extractie
Concept- en Definitie ExtractieRinke Hoekstra
 
SIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web LanguagesSIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web LanguagesRinke Hoekstra
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of DataRinke Hoekstra
 
History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)Rinke Hoekstra
 
Making Sense of Design Patterns
Making Sense of Design PatternsMaking Sense of Design Patterns
Making Sense of Design PatternsRinke Hoekstra
 

More from Rinke Hoekstra (20)

Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities Data
 
QBer - Connect your data to the cloud
QBer - Connect your data to the cloudQBer - Connect your data to the cloud
QBer - Connect your data to the cloud
 
Jurix 2014 welcome presentation
Jurix 2014 welcome presentationJurix 2014 welcome presentation
Jurix 2014 welcome presentation
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Linkitup: Link Discovery for Research Data
Linkitup: Link Discovery for Research DataLinkitup: Link Discovery for Research Data
Linkitup: Link Discovery for Research Data
 
Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?
 
Linked Science - Building a Web of Research Data
Linked Science - Building a Web of Research DataLinked Science - Building a Web of Research Data
Linked Science - Building a Web of Research Data
 
COMMIT/VIVO
COMMIT/VIVOCOMMIT/VIVO
COMMIT/VIVO
 
Semantic Representations for Research
Semantic Representations for ResearchSemantic Representations for Research
Semantic Representations for Research
 
A Slightly Different Web of Data
A Slightly Different Web of DataA Slightly Different Web of Data
A Slightly Different Web of Data
 
The Knowledge Reengineering Bottleneck
The Knowledge Reengineering BottleneckThe Knowledge Reengineering Bottleneck
The Knowledge Reengineering Bottleneck
 
Linked Census Data
Linked Census DataLinked Census Data
Linked Census Data
 
Concept- en Definitie Extractie
Concept- en Definitie ExtractieConcept- en Definitie Extractie
Concept- en Definitie Extractie
 
SIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web LanguagesSIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web Languages
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
 
History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)
 
Making Sense of Design Patterns
Making Sense of Design PatternsMaking Sense of Design Patterns
Making Sense of Design Patterns
 

The MetaLex Document Server - Legal Documents as Versioned Linked Data

  • 1. The MetaLex Document Server Rinke Hoekstra Universiteit van Amsterdam
  • 2. The Problem • Knowledge • Provenance Regulation A Art 12 Art 14, lid 3, 2e volzin Art 14, lid 3, 2e volzin (01-01-2011) (04-02-2011) (11-06-2008) (01-07-2011) • Open Data: public service falls short • Large scale validation of CEN MetaLex • “Linked Open Government Data”
  • 3. Current Situation Public content services hosted at wetten.nl
  • 4. Wetten.nl XML Service http://wetten.overheid.nl/xml.php?regelingID=... • Only available format is BWB XML • Only current version • Content at document level • Identification at document level • Identifiers are not dereferencable • Hardly any metadata (e.g. version date) • Only available context is position in text
  • 5. BWBId Web Service http://wetten.overheid.nl/BWBIdService/BWBIdList.xml.zip NB: The problem with the XML processing instruction was reported and fixed, but returned sometime last week
  • 6. Identifiers & Juriconnect 1.0:c:BWBR0005416&artikel=6 vs http://wetten.overheid.nl/cgi-bin/deeplink/law1/bwbid=BWBR0005416/article=6/date=2005-01-14 vs http://wetten.overheid.nl/BWBR0005416/TitelII698946/HoofdstukII/Artikel16/ geldigheidsdatum_14-01-2005 • Juriconnect? • URN-based... but no naming server • (cf. Document Object Identifiers) • Named elements do not carry identifier • No explicit version information, only contextual
  • 7. Sources used... • List of all regulations in “XML” • Wetten.nl XML Service • Metadata in HTML table on wetten.nl (the “info page”) • ... so let’s get started already
  • 9. Our Goals • “Deserialize” regulation content (e.g. topic-based browsing) • Extract and reconstruct implicit information (identifiers, metadata) • Annotate regulations (reconstructed metadata, third-party metadata) • Annotate using regulations (knowledge based systems, services, business processes ...) • Accessible and reusable for any other party (shared vocabularies, standard access)
  • 10. Requirements • Unique, persistent identification • Generic XML structure of documents • Extensible metadata framework • Flexible web services
  • 11. Technology Choices • URL-like URIs • CEN MetaLex XML documents • Linked Data / RDF metadata (extensibility to OWL, RIF) • Transparent REST-services
  • 12. Step 2 Come up with persistent identifiers at element level and a solid versioning scheme
  • 13. Identification • Web-enabled “URL-like” URIs • e.g. http://doc.metalex.eu/.... • “Cool” URIs (http://www.w3.org/TR/cooluris/) • “Accept”-header based dereferencing • Different types of content at same URI
  • 14. Levels of Identification Bibliographic Work Entity realizes • IFLA FRBR levels Expression embodies • Work Manifestation exemplifies • Expression Item • Manifestation XML version of regulation on XML version of Version of Regulation regulation regulation my harddisk
  • 15. Transparent Identifiers • Hierarchical information (work) http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1 http://doc.metalex.eu/id/BWBR0011823/artikel/1 • Version and language (expression) http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01 • Format information (manifestation) http://doc.metalex.eu/doc/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01/data.xml
  • 16. Problem • URIs don’t carry semantics... • Detect changes: • which element versions are the same • ... and which versions are different? Art. 44, lid 4 (2011-03-26) Art. 44, lid 4 (2011-04-05) from: Besluit prudentiële regels Wft, BWBR0020420
  • 17. Opaque Identifiers http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/34b0cee26ee5138c74aa2c62caf2c117d3c616e9 s1 s2 frbr:realizes frbr:realizes s1t1 s1t2 s1t3 s2t1 s2t2 s2t3 ... owl:sameAs owl:sameAs owl:sameAs owl:sameAs AE6 B9C 3F5 • Content information • Unique SHA1 Hash of text
  • 18. Step 3 Generic conversion of BWB XML to a generic XML format (CEN MetaLex) and appropriate metadata
  • 19. Procedure For each BWB XML file listed, if update has occurred since latest run, download latest version, scrape metadata, and produce: Persistent URIs CEN MetaLex + Citations Inline RDFa (optional) or RDF graph (optional), Pajek “.net” files (optional)
  • 20. CEN MetaLex • Straightforward 1:1 mapping • ... some minor fixes • Mint URI’s on the fly • Convert citations on the fly • Generate metadata on the fly • “inline” inside mcontainer elements
  • 21. Results 14 Table 1. Conversion performance for 300 randomly selected regulations. Number % Number % 42 Substitutions Corrections container 22312 29 % artikel 2525 72 % hcontainer 3730 5% divisie 519 15 % htitle 3730 5% colspec 289 8% block 34325 44 % illustratie 54 2% inline 13527 17 % others 99 3% Total 77624 Total 3486 Total no. of regulations 300 Revoked regulations 109 30 % Correction % 4% Lastly, the MDS offers a simple search interface for finding regulations based on the title and version date. 6 Conclusion(full description in draft ISWC 2011 paper) and Results We ran the MetaLex conversion script on all regulations available through the wetten.nl portal, resulting in a total of 27.687 versions of regulations being con- 40
  • 22. Citations
    • Juriconnect citations:
      1.0:v:BWBR0020486&artikel=6
      1.0:c:BWBR0020486&artikel=6
    • MetaLex identifiers:
      http://doc.metalex.eu/id/BWBR0020486/artikel/6
      http://doc.metalex.eu/id/BWBR0020486/artikel/6/2009-01-01
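The Python sketch below translates the Juriconnect references on slide 22 into MetaLex URIs. It handles only the form shown on the slide: version prefix, reference type, BWB id, then `key=value` pairs that become path segments. The reference type (`c` or `v`) is ignored. Because these references carry no version date, the caller must supply a date from context to obtain an expression URI. The function name is hypothetical.

```python
from typing import Optional

BASE = "http://doc.metalex.eu/id"


def juriconnect_to_metalex(ref: str, date: Optional[str] = None) -> str:
    """Translate a Juriconnect 1.0 reference into a MetaLex work or expression URI."""
    version, _kind, rest = ref.split(":", 2)  # e.g. "1.0", "c", "BWBR0020486&artikel=6"
    if version != "1.0":
        raise ValueError(f"unsupported Juriconnect version: {version}")
    bwb_id, *params = rest.split("&")
    segments = [bwb_id]
    for param in params:
        key, _, value = param.partition("=")
        segments += [key, value]  # artikel=6 -> artikel/6
    uri = f"{BASE}/" + "/".join(segments)
    # The references on this slide carry no version date; it comes from the caller.
    return f"{uri}/{date}" if date else uri


print(juriconnect_to_metalex("1.0:c:BWBR0020486&artikel=6", "2009-01-01"))
```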
  • 23. Metadata Vocabularies
    • “RDFized” BWB elements
    • MetaLex ontology
      • FRBR type, modification events, structure
    • Dublin Core
      • title, alternativeTitle, version
    • FOAF
      • page, homepage
    • Simple Event Model (SEM)
    • Open Provenance Model Vocabulary (OPMV)
    • W3C Time Ontology
  • 25. Events & Provenance
    [Diagram: provenance graph for one regulation version.
    • Expression http://doc.metalex.eu/id/BWBR0017869/2009-10-23: the expression (version) URI of a regulation; types ml:BibliographicExpression, opmv:Artifact
    • Event http://doc.metalex.eu/id/event/BWBR0017869/2009-10-23: the creation event of the regulation; types sem:Event, ml:LegislativeModification
    • Process http://doc.metalex.eu/id/process/BWBR0017869/2009-10-23: the process that generated the expression; type opmv:Process
    • Date http://doc.metalex.eu/id/date/2009-10-23: the date at which the expression was created; types time:Instant, ml:Date, sem:Time; value "2009-10-23"^^xsd:date
    • Linking properties: ml:resultOf, opmv:wasGeneratedBy, opmv:wasGeneratedAt, sem:hasTime, ml:date, time:hasEnd, sem:eventType, sem:timeType, sem:hasTimeStamp, time:inXSDDateTime, rdf:value]
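The Python sketch below generates the core of slide 25's provenance graph for any regulation version, using prefixed names and omitting prefix declarations. The URI patterns for event, process, and date are copied from the slide. The edge directions and the subset of properties used are an assumed reading of the diagram, not a confirmed MDS schema. The function name is hypothetical.

```python
from typing import List, Tuple

BASE = "http://doc.metalex.eu/id"
Triple = Tuple[str, str, str]


def provenance_triples(bwb_id: str, date: str) -> List[Triple]:
    """Triples (with prefixed names) linking a version to its creation event."""
    expression = f"<{BASE}/{bwb_id}/{date}>"
    event = f"<{BASE}/event/{bwb_id}/{date}>"
    process = f"<{BASE}/process/{bwb_id}/{date}>"
    instant = f"<{BASE}/date/{date}>"
    # Edge directions below are an assumed reading of the slide's diagram.
    return [
        (expression, "rdf:type", "ml:BibliographicExpression"),
        (expression, "rdf:type", "opmv:Artifact"),
        (expression, "ml:resultOf", event),
        (expression, "opmv:wasGeneratedBy", process),
        (expression, "opmv:wasGeneratedAt", instant),
        (event, "rdf:type", "sem:Event"),
        (event, "rdf:type", "ml:LegislativeModification"),
        (event, "sem:hasTime", instant),
        (process, "rdf:type", "opmv:Process"),
        (process, "time:hasEnd", instant),
        (instant, "rdf:type", "time:Instant"),
        (instant, "rdf:value", f'"{date}"^^xsd:date'),
    ]


for s, p, o in provenance_triples("BWBR0017869", "2009-10-23"):
    print(s, p, o, ".")
```

Because every node's URI is derived from the BWB id and the version date, the same event, process, and date resources are reused whenever another version shares that date, rather than being minted anew.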
  • 26. Step 4 Publish: The MetaLex Document Server (MDS)
  • 27. Document Serving
    • RESTful API
    • Cool URIs: dereference to XML, RDF, or Pajek .net
    • Shorthands (‘/latest’)
    • SPARQL endpoint
    • Citation graphs
    • Rudimentary (and unpredictable) search
    • CSS stylesheet for CEN MetaLex XML
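The Python sketch below resolves the `/latest` shorthand. It assumes that `/latest` means the version with the most recent date, which is not necessarily the version in force today. It also assumes that the server knows the available version dates per work and that expression URIs include a language segment, as on slide 15. The function name and the `versions` lookup table are hypothetical.

```python
from typing import Dict, List


def resolve_latest(uri: str, versions: Dict[str, List[str]], lang: str = "nl") -> str:
    """Rewrite '<work>/latest' to the expression URI of the most recent version."""
    if not uri.endswith("/latest"):
        return uri
    work = uri[: -len("/latest")]
    dates = versions.get(work)
    if not dates:
        raise KeyError(f"no versions known for {work}")
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    return f"{work}/{lang}/{max(dates)}"
```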
  • 28. Dereferencing (RDF)
    1. Client requests http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01 with Accept: application/x-turtle
    2. Server redirects to the manifestation URI http://doc.metalex.eu/doc/BWBR0011823/nl/2010-09-01/data.ttl (HTTP 303)
    3. Server queries the triplestore for the Symmetric Concise Bounded Description (SCBD; see http://www.w3.org/Submission/CBD)
    4. SPARQL triplestore returns a JSON serialisation of the SCBD
    5. MDS returns a file containing the Turtle serialisation of the SCBD
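Steps 2 and 3 of the RDF dereferencing flow are sketched in Python below. The redirect target is derived as in the slide's example. The query is a simplification. It returns only triples that have the resource as subject or as object. A full SCBD, as defined in the W3C submission, also follows blank nodes recursively and includes reifications. The function names are hypothetical, and the MDS's actual query is not shown on the slide.

```python
from typing import Tuple

# One-hop approximation: the W3C submission's SCBD also follows blank nodes
# (and reifications) recursively, which this query does not do.
SCBD_QUERY = (
    "CONSTRUCT {{ <{uri}> ?p ?o . ?s ?q <{uri}> . }}\n"
    "WHERE {{ {{ <{uri}> ?p ?o }} UNION {{ ?s ?q <{uri}> }} }}"
)


def see_other(id_uri: str, ext: str = "ttl") -> Tuple[int, str]:
    """Status and Location for the 303 redirect from an /id/ URI to its data file."""
    return 303, id_uri.replace("/id/", "/doc/", 1) + f"/data.{ext}"


def scbd_query(uri: str) -> str:
    """SPARQL CONSTRUCT for triples with the resource as subject or as object."""
    return SCBD_QUERY.format(uri=uri)


print(see_other("http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01"))
```

The 303 keeps the identifier of the abstract regulation version separate from the URI of the document that describes it, following the Cool URIs practice.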
  • 29. Dereferencing (XML)
    1. Client requests http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01 with Accept: text/xml
    2. Server redirects to the manifestation URI http://doc.metalex.eu/doc/BWBR0011823/nl/2010-09-01/data.xml (HTTP 303)
    3. Server queries the file store for the XML manifestation (manifestation glob)
    4. If no manifestation exists, it is extracted from the parent
    5. Triplestore returns the URI of the manifestation
    6. MDS redirects to the location of the manifestation, http://doc.metalex.eu/files/BWBR0011823_2010-03-01_mls.xml (HTTP 302)
    Clients may render the XML using the CSS stylesheet.
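Step 4 ("extract from parent") can be sketched in Python as follows. The code walks up the element's expression URI until it finds a stored file. It then pulls the element out of that file. The sketch assumes a few things: expression URIs end in `/<lang>/<date>`, each element carries its expression URI in an `about` attribute, and the file store is a plain dict. All three assumptions are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def parent_expression(uri: str) -> Optional[str]:
    """Drop the innermost (type, number) pair, keeping language and date."""
    head, lang, date = uri.rsplit("/", 2)
    parts = head.split("/")  # ['http:', '', 'doc.metalex.eu', 'id', 'BWBR...', ...]
    if len(parts) <= 5:
        return None  # already the regulation-level expression
    return "/".join(parts[:-2]) + f"/{lang}/{date}"


def find_manifestation(uri: str, store: Dict[str, str]) -> Optional[str]:
    """Return stored XML for `uri`, or extract it from the nearest stored ancestor."""
    current: Optional[str] = uri
    while current is not None:
        if current in store:
            if current == uri:
                return store[current]
            # Assumption: every element carries its expression URI in @about.
            for elem in ET.fromstring(store[current]).iter():
                if elem.get("about") == uri:
                    elem.tail = None  # do not serialise text that follows the element
                    return ET.tostring(elem, encoding="unicode")
            return None
        current = parent_expression(current)
    return None
```

Only whole-regulation files need to be stored. Article-level manifestations can be cut out on demand and, if desired, cached under their own URI afterwards.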
  • 30. Dereferencing (...)
    • Other RDF syntaxes: application/rdf+xml, text/rdf+n3
    • HTML clients: application/xml, application/xhtml+xml, text/html
      • Redirect (303) to the Marbles browser
    • Pajek clients: text/plain
      • Download the .net file
      • View using the Gephi Toolkit (http://gephi.org)
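The media types on slides 28 to 30 amount to a content-negotiation table. The Python sketch below picks a response from an Accept header, honouring q-values. Wildcards such as `*/*` are not handled and fall back to the default. The returned labels (`ttl`, `html`, `net`, ...) are hypothetical names. The file extensions for RDF/XML and N3 are assumptions, since the slides only show `data.ttl` and `data.xml`.

```python
# Media types from the slides; the returned labels ("ttl", "html", ...) are
# hypothetical names for the MDS response, not part of any published API.
ACTIONS = {
    "application/x-turtle": "ttl",
    "application/rdf+xml": "rdf",
    "text/rdf+n3": "n3",
    "text/xml": "xml",
    "application/xml": "html",        # HTML clients: 303 to the Marbles browser
    "application/xhtml+xml": "html",
    "text/html": "html",
    "text/plain": "net",              # Pajek clients: download the .net file
}


def negotiate(accept: str, default: str = "html") -> str:
    """Pick a response for an Accept header, honouring q-values (no wildcards)."""
    ranked = []
    for position, part in enumerate(accept.split(",")):
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media.lower()))
    # Highest q first; ties keep the client's original order.
    for _, _, media in sorted(ranked):
        if media in ACTIONS:
            return ACTIONS[media]
    return default
```

Note that the slides route `application/xml` to the HTML browser and reserve `text/xml` for the raw CEN MetaLex file. The table preserves that split.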
  • 31. Technical Details
    • Current situation
      • +/- 27 thousand regulations
      • 87.9 million triples (legislation.gov.uk: 1.9 billion)
      • Updated daily
    • Technical details
      • Dell PowerEdge II T110, 32GB RAM
      • Garlik 4store triplestore (http://4store.org)
      • Python Django web applications
      • Tomcat servlet + Gephi Toolkit API
    • See http://doc.metalex.eu
  • 32. Step 5 Use: social network analysis and concept extraction (ongoing work)
  • 33. Network Analysis
    • Impact of regulations on other regulations (combine with work on court rulings)
    • Connectedness
    • “Importance” of articles
    • Analysis tools: Pajek, Gephi
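The Python sketch below shows how a citation graph can be handed to Pajek or Gephi. It writes a directed graph in Pajek `.net` format, with repeated citations collapsed into arc weights. It also counts, for each node, how many distinct nodes cite it, as a crude "importance" measure. The input is assumed to be a list of (citing, cited) pairs. Node labels containing double quotes are not escaped. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def to_pajek(edges: List[Tuple[str, str]]) -> str:
    """Serialise a directed citation graph in Pajek .net format (weighted arcs)."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[node]} "{node}"' for node in nodes]
    lines.append("*Arcs")
    for (source, target), weight in sorted(Counter(edges).items()):
        lines.append(f"{index[source]} {index[target]} {weight}")
    return "\n".join(lines) + "\n"


def cited_by(edges: List[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct citing nodes per node: a crude 'importance' score."""
    return dict(Counter(target for _, target in set(edges)))


print(to_pajek([("A", "B"), ("C", "B"), ("A", "B"), ("B", "C")]))
```

Counting distinct citers rather than raw citations keeps one heavily self-referencing regulation from dominating the score. Measures such as PageRank or betweenness are better computed in Pajek or Gephi themselves.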