Sasaki datathon-madrid-2015

Felix Sasaki, TH Brandenburg / Cornelsen Verlag GmbH
Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Roundtripping of NIF based Linguistic Linked Data with non linked data sources
Felix Sasaki
DFKI / W3C Fellow
Slides:
http://de.slideshare.net/atcfsenzoku/sasaki-datathonmadrid2015
What is NIF?
• Natural Language Processing Interchange Format
– See http://nlp2rdf.org/
• LLD (Linguistic Linked Data) format to store annotations and to organize NLP pipelines
• API specification to create NIF workflows
• More details: after the coffee break
• Following slides: main roles for NIF
Example (Partial; JSON-LD Syntax)
{ "@graph" : [ {
  "@id" : "p:char=0,18",
  "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
  "anchorOf" : "Welcome to Prague.",
  "beginIndex" : "0",
  "endIndex" : "18",
  "isString" : "Welcome to Prague.",
  "referenceContext" : "p:char=0,18"
}, {
  "@id" : "p:char=11,17",
  "@type" : [ "nif:RFC5147String", "nif:Word" ], …
  "referenceContext" : "p:char=0,18",
  "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] }
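Roles of NIF shown in the example:
• Identifying and typing annotations ("@id", "@type")
• Identifying annotation offsets ("beginIndex", "endIndex")
• Adding additional knowledge, e.g. a named entity identifier ("taIdentRef")
• Interrelating annotations ("referenceContext")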
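The roles listed above can be sketched in a few lines of Python. This is only an illustration: the helper name `nif_string` is hypothetical, the property names and the `p:` prefix are copied from the slide, and the JSON-LD `@context` is left out just as in the partial example.

```python
import json

def nif_string(prefix, text, begin, end, types, context_id=None, **extra):
    """Build one JSON-LD node for the substring text[begin:end].

    Hypothetical helper for illustration; property names follow the slide.
    """
    node = {
        "@id": f"{prefix}char={begin},{end}",   # identifier encodes the offsets
        "@type": types,                          # typing the annotation
        "anchorOf": text[begin:end],
        "beginIndex": str(begin),
        "endIndex": str(end),
    }
    if context_id is not None:
        node["referenceContext"] = context_id    # interrelating annotations
    node.update(extra)                           # additional knowledge
    return node

text = "Welcome to Prague."
context = nif_string("p:", text, 0, len(text),
                     ["nif:Context", "nif:Sentence", "nif:RFC5147String"],
                     isString=text)
# As on the slide, the context node refers to itself.
context["referenceContext"] = context["@id"]

begin = text.index("Prague")
word = nif_string("p:", text, begin, begin + len("Prague"),
                  ["nif:RFC5147String", "nif:Word"],
                  context_id=context["@id"],
                  taIdentRef="http://dbpedia.org/resource/Prague")

doc = {"@graph": [context, word]}
print(json.dumps(doc, indent=2))
```

Deriving the identifier from the offsets keeps "@id", "beginIndex" and "endIndex" consistent by construction.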
A NIF workflow
Diagram components:
• Existing content
• Content analytics, e.g. named entity recognition
• Conversion to NIF
• Deploying knowledge from the LLD cloud
Potential scenario: roundtripping
Diagram components:
• Existing content
• Content analytics, e.g. named entity recognition
• Conversion to NIF
• Storing annotations in original content
• Deploying knowledge from the LLD cloud
Roundtripping
• Roundtripping: storing the outcome of content processing (analytics) tasks in the original content
• Not always needed, but sometimes it is. Examples:
– Enriching Web content with named entity information; generating Schema.org markup via NIF pipelines. Format: HTML
– Enriching localisation content, to add value beyond translation. Format: XLIFF
Example: HTML
Example roundtripping workflow
Input:
… <p>Welcome to Prague!</p> …
1) Conversion to NIF
2) NER processing
3) Back conversion to HTML
Output:
… <p>Welcome to <span … itemtype="http://schema.org/Place">Prague</span>!</p> …
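Step 3, the back conversion, might look like the following minimal Python sketch. It assumes the paragraph contains no other markup and that offsets count characters. The function name `annotate_paragraph` is hypothetical, and because the slide elides the other attributes of the span, the `itemscope` attribute is an assumption added here (HTML microdata expects it alongside `itemtype`).

```python
from html import escape

def annotate_paragraph(text, begin, end, itemtype):
    """Wrap text[begin:end] in a <span> carrying a Schema.org type.

    Assumes the paragraph text contains no other markup.
    """
    before, entity, after = text[:begin], text[begin:end], text[end:]
    return (
        f"<p>{escape(before)}"
        # itemscope is an assumption; the slide only shows itemtype.
        f'<span itemscope itemtype="{escape(itemtype, quote=True)}">'
        f"{escape(entity)}</span>"
        f"{escape(after)}</p>"
    )

html = annotate_paragraph("Welcome to Prague!", 11, 17, "http://schema.org/Place")
print(html)
```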
Example: XLIFF
Example roundtripping workflow
Input:
… <xlf:source>Welcome to Prague!</xlf:source> …
1) Conversion to NIF
2) NER processing
3) Back conversion to XLIFF
Output:
… <xlf:source>Welcome to <mrk … its:taClassRef="http://schema.org/Place">Prague</mrk>!</xlf:source> …
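For XML-based formats, the back conversion can be done on the parsed tree rather than on strings. The sketch below uses Python's standard `xml.etree.ElementTree`. It is simplified under stated assumptions: the `source` element has no namespace and no inline markup, and the helper name `mark_entity` is hypothetical. Only the ITS namespace URI is taken from the W3C ITS specification.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)  # serialise the attribute as its:taClassRef

def mark_entity(source, begin, end, class_ref):
    """Wrap source.text[begin:end] in a <mrk> child element.

    Assumes <source> holds plain text only (no existing inline elements).
    """
    text = source.text or ""
    source.text = text[:begin]              # text before the entity
    mrk = ET.SubElement(source, "mrk")
    mrk.set(f"{{{ITS_NS}}}taClassRef", class_ref)
    mrk.text = text[begin:end]              # the entity itself
    mrk.tail = text[end:]                   # text after the entity
    return mrk

source = ET.fromstring("<source>Welcome to Prague!</source>")
mark_entity(source, 11, 17, "http://schema.org/Place")
print(ET.tostring(source, encoding="unicode"))
```

Concatenating all text nodes of the result still yields the original sentence, which is a cheap sanity check for any roundtripping step.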
Example usage scenario: FREME project
• See http://www.freme-project.eu/
• Developing interfaces for multilingual and semantic enrichment of digital content
• Relies on NIF-based enrichment workflows
– See FREME API version 0.1: http://api.freme-project.eu/doc/0.1/
• Deploys aspects of the LIDER reference architecture for LLD processing
– See D3.1.1 at http://lider-project.eu/?q=doc/deliverables
• Focuses on four business cases (BC), including:
– Localization BC requires XLIFF roundtripping
– Web content personalisation BC requires HTML roundtripping
Challenges for roundtripping
• Source format
– How to store enrichment information (annotations)
– How to handle existing information
• Annotation model
– NIF = a general graph-based annotation model
– Source format and annotation motivation may require restriction of the model
How to store annotations in various source formats
• Solvable for markup languages like HTML or XLIFF
• Challenge: preserving existing markup, e.g. "<p>Welcome to <b>Prague</b>!</p>"
• General issue with complex and proprietary formats:
– "My own" storage mechanism = no tool support
– Using existing storage mechanisms may mean overloading semantics
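One way to approach the existing-markup challenge is to map plain-text offsets back to the nodes that hold the text. The following Python sketch does this with the standard `xml.etree.ElementTree`, treating the HTML fragment as well-formed XML (an assumption). The function names `text_segments` and `segment_for` are hypothetical.

```python
import xml.etree.ElementTree as ET

def text_segments(root):
    """Return (plain_text, segments); each segment is (begin, end, element, slot).

    slot is "text" or "tail", i.e. where ElementTree stores that piece of text.
    """
    segments, pos = [], 0

    def visit(elem):
        nonlocal pos
        if elem.text:
            segments.append((pos, pos + len(elem.text), elem, "text"))
            pos += len(elem.text)
        for child in elem:
            visit(child)
            if child.tail:
                segments.append((pos, pos + len(child.tail), child, "tail"))
                pos += len(child.tail)

    visit(root)
    return "".join(root.itertext()), segments

def segment_for(segments, begin, end):
    """Return the single segment that fully contains [begin, end), else None."""
    for seg in segments:
        if seg[0] <= begin and end <= seg[1]:
            return seg
    return None

root = ET.fromstring("<p>Welcome to <b>Prague</b>!</p>")
plain, segments = text_segments(root)
seg = segment_for(segments, 11, 17)   # the annotation on "Prague"
print(plain, seg[2].tag, seg[3])
```

An annotation that falls inside one segment can be wrapped without disturbing the existing elements. One that spans several segments (here, offsets 8 to 17, "to Prague") returns `None` and needs one of the overlap strategies discussed later in the talk.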
Source format example: Word
Content storage (Word file unzipped), before enrichment:
… <w:t>Welcome to Prague!</w:t> …
Enrichment process; storing enrichment as comments
Content storage after enrichment (change of original content: creation of anchor):
… <w:commentRangeStart w:id="0"/><w:t>Prague</w:t>
<w:commentRangeEnd w:id="0"/>
<w:r w:rsidR="00987079"> …
Comment storage (comment stored separately; refers to anchor: "standoff approach"):
<w:p w:rsidRPr="00987079">… Enrichment: type "http://schema.org/Place"…</w:p>
Annotation models
• NIF: like RDF, a general graph model
– Consisting of nodes and arcs
– Example: node p:char=11,17, arc taIdentRef, node dbp:Prague
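The node-and-arc view can be made concrete with plain Python tuples. This is a sketch only: the prefixed names are kept as opaque strings exactly as on the slides, no RDF library is involved, and the helper `outgoing` is hypothetical.

```python
# Each triple is (node, arc label, node), using the prefixed names from the slides.
triples = [
    ("p:char=11,17", "taIdentRef", "dbp:Prague"),
    ("p:char=11,17", "referenceContext", "p:char=0,18"),
    ("p:char=11,17", "rdf:type", "nif:Word"),
]

def outgoing(triples, node):
    """Return the arcs leaving `node` as a {label: [target, ...]} mapping."""
    arcs = {}
    for subject, predicate, obj in triples:
        if subject == node:
            arcs.setdefault(predicate, []).append(obj)
    return arcs

print(outgoing(triples, "p:char=11,17"))
```

Nothing in this model forces the arcs to form a tree, which is why a source format may require restricting it.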
Restricting graphs: tree-structured annotations on several layers
• Tree structures for syntactic annotations
• Several annotation layers for the same text
• Concurrent hierarchies
• Only one of these hierarchies can be represented directly in roundtripping with XML, because XML elements must nest properly
Example taken from TEI http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html
Representing overlapping hierarchies with markup (1/2)
Solutions advertised by the TEI
• Multiple encoding of the same information
– One XML document per annotation
• Boundary marking with empty "milestone" elements
– Also used by XLIFF
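The milestone idea can be sketched in Python as follows. The element name `ms` and its attributes `kind` and `ref` are hypothetical placeholders. TEI and XLIFF define their own milestone elements, which are not reproduced here.

```python
from xml.sax.saxutils import escape

def add_milestones(text, spans):
    """Insert empty start/end marker elements for each (span_id, begin, end).

    Because the markers are empty elements, crossing spans stay well-formed.
    """
    events = []
    for span_id, begin, end in spans:
        events.append((begin, 1, f'<ms kind="start" ref="{span_id}"/>'))
        events.append((end, 0, f'<ms kind="end" ref="{span_id}"/>'))
    # At the same offset, emit end markers (0) before start markers (1).
    events.sort(key=lambda e: (e[0], e[1]))

    out, pos = [], 0
    for offset, _, marker in events:
        out.append(escape(text[pos:offset]))
        out.append(marker)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)

text = "Welcome to Prague."
# Two crossing spans: "Welcome to Prague" and "Prague."
marked = add_milestones(text, [("a", 0, 17), ("b", 11, 18)])
print(marked)
```

The two spans cross, so they could not both be written as ordinary nested elements. As empty markers they can share one document.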
Representing overlapping hierarchies with markup (2/2)
Solutions advertised by the TEI
• Fragmentation and reconstitution of virtual elements
– One hierarchy explicit, others with interrelated marked-up spans
• Stand-off markup
– Separation of text and annotations, interlinked via anchor and reference
– Cf. Word example
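A stand-off representation keeps the text untouched and stores annotations separately, each pointing into the text by offsets. The Python sketch below illustrates this. The class `Annotation`, the layer names and the label values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    begin: int   # offset of the first character
    end: int     # offset after the last character
    layer: str   # hypothetical layer name
    value: str

text = "Welcome to Prague."

# The text is never modified; annotations only refer to it.
annotations = [
    Annotation(0, 10, "layerA", "chunk-1"),                  # "Welcome to"
    Annotation(8, 17, "layerB", "chunk-2"),                  # "to Prague"
    Annotation(11, 17, "entity", "http://schema.org/Place"), # "Prague"
]

def crosses(a, b):
    """True if the spans overlap without one containing the other."""
    return a.begin < b.begin < a.end < b.end or b.begin < a.begin < b.end < a.end

def covering(annotations, offset):
    """All annotations whose span includes the character at `offset`."""
    return [a for a in annotations if a.begin <= offset < a.end]

print(crosses(annotations[0], annotations[1]))
print([a.layer for a in covering(annotations, 12)])
```

Crossing spans are unproblematic here because nothing has to nest. The cost is that anchors must stay valid whenever the text changes.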
Representing overlapping hierarchies in RDF
POWLA (cf. Chiarcos, 2012)
• RDF representation for corpus annotation, based on the PAULA XML standoff format
• Allows representing hierarchical, multi-layer corpora in RDF and querying them with SPARQL
• Not relevant for roundtripping, but for linguistic annotation representation and processing in RDF
Lessons learned
• Choose the overlap solution that fits your roundtripping modelling and processing needs
• Consider off-the-shelf tooling
– For 100% hierarchical data: XPath / CSS selectors, DOM, …
• Consider libraries
– For extraction only: Tika http://tika.apache.org/
– For roundtripping: Okapi http://okapi.opentag.com/ (in FREME currently being adapted for roundtripping in selected formats)
• Make sure the annotation survives in the original format (cf. Word example)
– Soon to be made easier by using Okapi
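As an illustration of the off-the-shelf tooling point, strictly hierarchical annotations can be read back with a path expression. The Python sketch below uses the XPath subset supported by the standard `xml.etree.ElementTree` and assumes the enriched HTML fragment is well-formed XML. The sample string is modelled on the HTML example from earlier in the talk, without the elided attributes.

```python
import xml.etree.ElementTree as ET

enriched = '<p>Welcome to <span itemtype="http://schema.org/Place">Prague</span>!</p>'
root = ET.fromstring(enriched)

# ".//span[@itemtype]" selects every descendant <span> that has an itemtype attribute.
entities = [(span.text, span.get("itemtype"))
            for span in root.findall(".//span[@itemtype]")]
print(entities)  # [('Prague', 'http://schema.org/Place')]
```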
22
Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Roundtripping of NIF based
Linguistic Linked Data with non
linked data sources
Felix Sasaki
DFKI / W3C Fellow
23
1 von 23

Recomendados

Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level... von
Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...Till Blume
364 views20 Folien
Lider Reference Model ld4lt session March, 3rd, 2015 von
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015Sebastian Hellmann
1.2K views18 Folien
Sasaki mlkrep-20150710 von
Sasaki mlkrep-20150710Sasaki mlkrep-20150710
Sasaki mlkrep-20150710FREMEProjectH2020
536 views30 Folien
LDP-DL: A language to define the design of Linked Data Platforms von
LDP-DL: A language to define the design of Linked Data PlatformsLDP-DL: A language to define the design of Linked Data Platforms
LDP-DL: A language to define the design of Linked Data PlatformsMohammad Noorani Bakerally
116 views33 Folien
Visual Ontology Modeling for Domain Experts and Business Users with metaphactory von
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryPeter Haase
270 views15 Folien
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4j von
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4jExplicit Semantics in Graph DBs Driving Digital Transformation With Neo4j
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4jConnected Data World
1.4K views17 Folien

Más contenido relacionado

Was ist angesagt?

Getting Started with Knowledge Graphs von
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge GraphsPeter Haase
13.9K views74 Folien
Linked base Registries | The Scottish Government - Webinar 2017 von
Linked base Registries | The Scottish Government - Webinar 2017Linked base Registries | The Scottish Government - Webinar 2017
Linked base Registries | The Scottish Government - Webinar 2017Raf Buyle
587 views31 Folien
Linked data-tooling-xml von
Linked data-tooling-xmlLinked data-tooling-xml
Linked data-tooling-xmlFelix Sasaki
1.1K views20 Folien
Semantic Variation Graphs the case for RDF & SPARQL von
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLJerven Bolleman
566 views27 Folien
ESWC 2017 Tutorial Knowledge Graphs von
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsPeter Haase
2.8K views100 Folien
Hybrid Enterprise Knowledge Graphs von
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsPeter Haase
169 views12 Folien

Was ist angesagt?(20)

Getting Started with Knowledge Graphs von Peter Haase
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
Peter Haase13.9K views
Linked base Registries | The Scottish Government - Webinar 2017 von Raf Buyle
Linked base Registries | The Scottish Government - Webinar 2017Linked base Registries | The Scottish Government - Webinar 2017
Linked base Registries | The Scottish Government - Webinar 2017
Raf Buyle587 views
Linked data-tooling-xml von Felix Sasaki
Linked data-tooling-xmlLinked data-tooling-xml
Linked data-tooling-xml
Felix Sasaki1.1K views
Semantic Variation Graphs the case for RDF & SPARQL von Jerven Bolleman
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQL
Jerven Bolleman566 views
ESWC 2017 Tutorial Knowledge Graphs von Peter Haase
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
Peter Haase2.8K views
Hybrid Enterprise Knowledge Graphs von Peter Haase
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge Graphs
Peter Haase169 views
What_do_Knowledge_Graph_Embeddings_Learn.pdf von Heiko Paulheim
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
Heiko Paulheim170 views
The Power of Semantic Technologies to Explore Linked Open Data von Ontotext
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
Ontotext1.3K views
ROI in Linking Content to CRM by Applying the Linked Data Stack von Martin Voigt
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data Stack
Martin Voigt856 views
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud von Ontotext
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext2.2K views
Property graph vs. RDF Triplestore comparison in 2020 von Ontotext
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
Ontotext17K views
Semantic Technologies and Triplestores for Business Intelligence von Marin Dimitrov
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov4.1K views
Regal - a Repository for Electronic Documents and Bibliographic Data von Felix Ostrowski
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic Data
Felix Ostrowski829 views
Hacktoberfest 2020 - Intro to Knowledge Graphs von ArangoDB Database
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database221 views
Linked Data Experiences at Springer Nature von Michele Pasin
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer Nature
Michele Pasin3.8K views
Linked Data at the National Széchényi Library : road to the publication von horvadam
Linked Data at the National Széchényi Library : road to the publicationLinked Data at the National Széchényi Library : road to the publication
Linked Data at the National Széchényi Library : road to the publication
horvadam287 views
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS von Harsh Thakkar
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Harsh Thakkar1.1K views

Similar a Sasaki datathon-madrid-2015

MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data von
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data21Style
1.4K views51 Folien
The Nature.com ontologies portal - Linked Science 2015 von
The Nature.com ontologies portal - Linked Science 2015The Nature.com ontologies portal - Linked Science 2015
The Nature.com ontologies portal - Linked Science 2015Michele Pasin
1.7K views25 Folien
The nature.com ontologies portal: nature.com/ontologies von
The nature.com ontologies portal: nature.com/ontologiesThe nature.com ontologies portal: nature.com/ontologies
The nature.com ontologies portal: nature.com/ontologiesTony Hammond
141 views27 Folien
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015 von
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Sergio Fernández
3.2K views20 Folien
Data integration with a façade. The case of knowledge graph construction. von
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Enrico Daga
335 views40 Folien
Eclipse RDF4J - Working with RDF in Java von
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaJeen Broekstra
426 views20 Folien

Similar a Sasaki datathon-madrid-2015(20)

MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data von 21Style
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
21Style1.4K views
The Nature.com ontologies portal - Linked Science 2015 von Michele Pasin
The Nature.com ontologies portal - Linked Science 2015The Nature.com ontologies portal - Linked Science 2015
The Nature.com ontologies portal - Linked Science 2015
Michele Pasin1.7K views
The nature.com ontologies portal: nature.com/ontologies von Tony Hammond
The nature.com ontologies portal: nature.com/ontologiesThe nature.com ontologies portal: nature.com/ontologies
The nature.com ontologies portal: nature.com/ontologies
Tony Hammond141 views
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015 von Sergio Fernández
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Sergio Fernández3.2K views
Data integration with a façade. The case of knowledge graph construction. von Enrico Daga
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
Enrico Daga335 views
Eclipse RDF4J - Working with RDF in Java von Jeen Broekstra
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
Jeen Broekstra426 views
Sasaki practical-linked-data von Felix Sasaki
Sasaki practical-linked-dataSasaki practical-linked-data
Sasaki practical-linked-data
Felix Sasaki985 views
Graph databases & data integration - the case of RDF von Dimitris Kontokostas
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDF
Linked Open Data: A simple how-to von nvitucci
Linked Open Data: A simple how-toLinked Open Data: A simple how-to
Linked Open Data: A simple how-to
nvitucci76 views
A year on the Semantic Web @ W3C von Ivan Herman
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3C
Ivan Herman865 views
The Rhizomer Semantic Content Management System von Roberto García
The Rhizomer Semantic Content Management SystemThe Rhizomer Semantic Content Management System
The Rhizomer Semantic Content Management System
Roberto García1.1K views
Data Integration And Visualization von Ivan Ermilov
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
Ivan Ermilov985 views
Knowledge graph construction with a façade - The SPARQL Anything Project von Enrico Daga
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
Enrico Daga360 views
Querying the Wikidata Knowledge Graph von Ioan Toma
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
Ioan Toma2.3K views
Querying the Wikidata Knowledge Graph von LDBC council
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
LDBC council201 views
State of the Semantic Web von Ivan Herman
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
Ivan Herman923 views
FIWARE Global Summit - IDS Implementation with FIWARE Software Components von FIWARE
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE353 views
RDF Linked Data - Automatic Exchange of BIM Containers von Safe Software
RDF Linked Data - Automatic Exchange of BIM ContainersRDF Linked Data - Automatic Exchange of BIM Containers
RDF Linked Data - Automatic Exchange of BIM Containers
Safe Software336 views

Más de Felix Sasaki

Thb tag-des-offenen-fensters-2021-sasaki-graphdatenbanken von
Thb tag-des-offenen-fensters-2021-sasaki-graphdatenbankenThb tag-des-offenen-fensters-2021-sasaki-graphdatenbanken
Thb tag-des-offenen-fensters-2021-sasaki-graphdatenbankenFelix Sasaki
214 views43 Folien
XML Seminar von
XML SeminarXML Seminar
XML SeminarFelix Sasaki
379 views102 Folien
Sasaki Presentation at EVA 2016 von
Sasaki Presentation at EVA 2016Sasaki Presentation at EVA 2016
Sasaki Presentation at EVA 2016Felix Sasaki
294 views19 Folien
Freme at feisgiltt 2015 freme & linked data & localisers von
Freme at feisgiltt 2015   freme & linked data & localisersFreme at feisgiltt 2015   freme & linked data & localisers
Freme at feisgiltt 2015 freme & linked data & localisersFelix Sasaki
808 views17 Folien
Freme at feisgiltt 2015 freme use cases von
Freme at feisgiltt 2015   freme use casesFreme at feisgiltt 2015   freme use cases
Freme at feisgiltt 2015 freme use casesFelix Sasaki
626 views22 Folien
1114 sasaki-metadata von
1114 sasaki-metadata1114 sasaki-metadata
1114 sasaki-metadataFelix Sasaki
9.7K views54 Folien

Más de Felix Sasaki(13)

Thb tag-des-offenen-fensters-2021-sasaki-graphdatenbanken von Felix Sasaki
Thb tag-des-offenen-fensters-2021-sasaki-graphdatenbankenThb tag-des-offenen-fensters-2021-sasaki-graphdatenbanken
Thb tag-des-offenen-fensters-2021-sasaki-graphdatenbanken
Felix Sasaki214 views
Sasaki Presentation at EVA 2016 von Felix Sasaki
Sasaki Presentation at EVA 2016Sasaki Presentation at EVA 2016
Sasaki Presentation at EVA 2016
Felix Sasaki294 views
Freme at feisgiltt 2015 freme & linked data & localisers von Felix Sasaki
Freme at feisgiltt 2015   freme & linked data & localisersFreme at feisgiltt 2015   freme & linked data & localisers
Freme at feisgiltt 2015 freme & linked data & localisers
Felix Sasaki808 views
Freme at feisgiltt 2015 freme use cases von Felix Sasaki
Freme at feisgiltt 2015   freme use casesFreme at feisgiltt 2015   freme use cases
Freme at feisgiltt 2015 freme use cases
Felix Sasaki626 views
1114 sasaki-metadata von Felix Sasaki
1114 sasaki-metadata1114 sasaki-metadata
1114 sasaki-metadata
Felix Sasaki9.7K views
Its2 ontology-localization von Felix Sasaki
Its2 ontology-localizationIts2 ontology-localization
Its2 ontology-localization
Felix Sasaki708 views
Sasaki ins-netz-gegangen-20111117 von Felix Sasaki
Sasaki ins-netz-gegangen-20111117Sasaki ins-netz-gegangen-20111117
Sasaki ins-netz-gegangen-20111117
Felix Sasaki707 views
"Warum Metadaten? Ein Plädoyer und mehr …" - webtechcon 2011 Präsentation von Felix Sasaki
"Warum Metadaten? Ein Plädoyer und mehr …" - webtechcon 2011 Präsentation"Warum Metadaten? Ein Plädoyer und mehr …" - webtechcon 2011 Präsentation
"Warum Metadaten? Ein Plädoyer und mehr …" - webtechcon 2011 Präsentation
Felix Sasaki1.1K views
HTML5 - presentation at W3C-Tag 2009 von Felix Sasaki
HTML5 - presentation at W3C-Tag 2009HTML5 - presentation at W3C-Tag 2009
HTML5 - presentation at W3C-Tag 2009
Felix Sasaki718 views

Último

Amine el bouzalimi von
Amine el bouzalimiAmine el bouzalimi
Amine el bouzalimiAmine EL BOUZALIMI
5 views38 Folien
WITS Deck von
WITS DeckWITS Deck
WITS DeckW.I.T.S.
27 views22 Folien
The Dark Web : Hidden Services von
The Dark Web : Hidden ServicesThe Dark Web : Hidden Services
The Dark Web : Hidden ServicesAnshu Singh
19 views24 Folien
ATPMOUSE_융합2조.pptx von
ATPMOUSE_융합2조.pptxATPMOUSE_융합2조.pptx
ATPMOUSE_융합2조.pptxkts120898
35 views70 Folien
Affiliate Marketing von
Affiliate MarketingAffiliate Marketing
Affiliate MarketingNavin Dhanuka
20 views30 Folien
cis5-Project-11a-Harry Lai von
cis5-Project-11a-Harry Laicis5-Project-11a-Harry Lai
cis5-Project-11a-Harry Laiharrylai126
9 views11 Folien

Último(10)

Sasaki datathon-madrid-2015

  • 1. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Roundtripping of NIF based Linguistic Linked Data with non linked data sources Felix Sasaki DFKI / W3C Fellow Slides: http://de.slideshare.net/atcfsenzoku/sasaki-datathonmadrid2015 1
  • 2. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 What is NIF? • Natural Language Processing Interchange Format – See http://nlp2rdf.org/ • LLD format to store annotations & to organize NLP pipelines • API specification to create NIF workflows • More details: after the coffee break  • Following slides: main roles for NIF 2
  • 3. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example (Partial; JSON-LD Syntax) { "@graph" : [ { "@id" : "p:char=0,18", "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ], "anchorOf" : "Welcome to Prague.", "beginIndex" : "0", "endIndex" : "18", "isString" : "Welcome to Prague.", "referenceContext" : "p:char=0,18” }, { "@id" : "p:char=11,17", "@type" : [ "nif:RFC5147String", "nif:Word" ], … "referenceContext" : "p:char=0,18", "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] } 3
  • 4. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example (Partial; JSON-LD Syntax) { "@graph" : [ { "@id" : "p:char=0,18", "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ], "anchorOf" : "Welcome to Prague.", "beginIndex" : "0", "endIndex" : "18", "isString" : "Welcome to Prague.", "referenceContext" : "p:char=0,18” }, { "@id" : "p:char=11,17", "@type" : [ "nif:RFC5147String", "nif:Word" ], … "referenceContext" : "p:char=0,18", "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] } 4 • Identifying and typing annotations • Identifying annotation offsets • Adding additional knowledge, e.g. named entity identifier • Interrelating annotations
  • 5. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example (Partial; JSON-LD Syntax) { "@graph" : [ { "@id" : "p:char=0,18", "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ], "anchorOf" : "Welcome to Prague.", "beginIndex" : "0", "endIndex" : "18", "isString" : "Welcome to Prague.", "referenceContext" : "p:char=0,18” }, { "@id" : "p:char=11,17", "@type" : [ "nif:RFC5147String", "nif:Word" ], … "referenceContext" : "p:char=0,18", "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] } 5 • Identifying and typing annotations • Identifying annotation offsets • Adding additional knowledge, e.g. named entity identifier • Interrelating annotations
  • 6. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example (Partial; JSON-LD Syntax) { "@graph" : [ { "@id" : "p:char=0,18", "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ], "anchorOf" : "Welcome to Prague.", "beginIndex" : "0", "endIndex" : "18", "isString" : "Welcome to Prague.", "referenceContext" : "p:char=0,18” }, { "@id" : "p:char=11,17", "@type" : [ "nif:RFC5147String", "nif:Word" ], … "referenceContext" : "p:char=0,18", "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] } 6 • Identifying and typing annotations • Identifying annotation offsets • Adding additional knowledge, e.g. named entity identifier • Interrelating annotations
  • 7. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example (Partial; JSON-LD Syntax) { "@graph" : [ { "@id" : "p:char=0,18", "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ], "anchorOf" : "Welcome to Prague.", "beginIndex" : "0", "endIndex" : "18", "isString" : "Welcome to Prague.", "referenceContext" : "p:char=0,18” }, { "@id" : "p:char=11,17", "@type" : [ "nif:RFC5147String", "nif:Word" ], … "referenceContext" : "p:char=0,18", "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] } 7 • Identifying and typing annotations • Identifying annotation offsets • Adding additional knowledge, e.g. named entity identifier • Interrelating annotations
  • 8. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 A NIF workflow 8 Existing content Content analytics, e.g. named entity recognition Conversion to NIF Deploying knowledge from the LLD cloud
  • 9. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Potential scenario: roundtripping 9 Existing content Content analytics, e.g. named entity recognition Conversion to NIF Storing annotations in original content Deploying knowledge from the LLD cloud
  • 10. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Roundtripping • Roundtripping: Storing the outcome of content processing (analytics) tasks in the original content • Not always needed, but sometimes – examples: – Enriching Web content with named entity information; generating Schema.org markup via NIF pipelines. Format: HTML – Enriching localisation content, to add value beyond translation: Format: XLIFF 10
  • 11. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example: HTML Example roundtripping workflow 11 … <p>Welcome to Prague!</p>… …<p>Welcome to <span … itemtype="http://schema.org/Place">Prague</span>!< /p>… 1) Conversion to NIF 2) NER processing 3) Back conversion to HTML
  • 12. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example: XLIFF Example roundtripping workflow 12 … <xlf:source>Welcome to Prague!</xlf:source> … … <xlf:source>Welcome to <mrk … its:taClassRef="http://schema.org/Place">Prague </mrk>!</xlf:source> … 1) Conversion to NIF 2) NER processing 3) Back conversion to HTML
  • 13. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015 Example usage scenario: FREME project • See http://www.freme-project.eu/ • Developing interfaces for multilingual and semantic enrichment of digital content • Relies on NIF based enrichment workflows – See FREME API version 0.1 http://api.freme-project.eu/doc/0.1/ • Deploys aspects of the LIDER reference architecture for LLD processing – See D3.1.1 at http://lider-project.eu/?q=doc/deliverables • Focuses on four business cases – Localization BC requires XLIFF roundtripping – Web content personalisation BC requires HTML roundtripping 13
  • 14. Challenges for roundtripping
    • Source format
      – How to store enrichment information (annotations)
      – How to handle existing information
    • Annotation model
      – NIF = a general graph-based annotation model
      – Source format and annotation motivation may require restricting the model
  • 15. How to store annotations in various source formats
    • Solvable for markup languages like HTML or XLIFF
    • Challenge: preserving existing markup, e.g. "<p>Welcome to <b>Prague</b>!</p>"
    • General issue with complex and proprietary formats:
      – "My own" storage mechanism = no tool support
      – Using existing storage mechanisms may mean overloading semantics
  • 16. Source format example: Word (Word file unzipped)
    • Content storage before enrichment: … <w:t>Welcome to Prague!</w:t> …
    • Enrichment process; storing enrichment as comments
    • Content storage after enrichment: … <w:commentRangeStart w:id="0"/><w:t>Prague</w:t> <w:commentRangeEnd w:id="0"/> <w:r w:rsidR="00987079"> …
      – Change of original content: creation of anchor
    • Comment storage: … <w:p w:rsidRPr="00987079">… Enrichment: type "http://schema.org/Place"…</w:p>
      – Comment stored separately; refers to anchor: "standoff approach"
  • 17. Annotation models
    • NIF: like RDF = general graph model
      – Consisting of nodes and arcs
    • Example (diagram): node p:char=11,17, arc taIdentRef, node dbp:Prague
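To make the "nodes and arcs" view concrete, the graph can be written down as a set of triples. This is a plain-Python sketch that uses no RDF library. The prefixed names are kept as opaque strings as they appear on the slides, and the helper `objects` is a hypothetical name introduced only for this illustration.

```python
# A graph as a set of (subject node, arc label, object node) triples.
# Prefixed names are kept as plain strings; no namespace expansion is done.
graph = {
    ("p:char=11,17", "rdf:type", "nif:Word"),
    ("p:char=11,17", "referenceContext", "p:char=0,18"),
    ("p:char=11,17", "taIdentRef", "dbp:Prague"),
}

def objects(graph, subject, predicate):
    """Return all nodes reached from `subject` via arcs labelled `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

# Follow the taIdentRef arc from the annotated string to the entity node.
entity = objects(graph, "p:char=11,17", "taIdentRef")
print(entity)
```

Nothing in this model forces the annotations into a tree: any node may link to any other, which is the generality a source format may later have to restrict.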
  • 18. Restricting graphs: tree-structured annotations on several layers
    • Tree structures for syntactic annotations
    • Several annotation layers for the same text
    • Concurrent hierarchies
    • Only one of these hierarchies can be represented when roundtripping with XML
    • Example taken from TEI http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html
  • 19. Representing overlapping hierarchies with markup (1/2): solutions advertised by the TEI
    • Multiple encoding of the same information
      – One XML document per annotation
    • Boundary marking with empty "milestone" elements
      – Also used by XLIFF
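Boundary marking with empty elements can be sketched as follows. The element names `<start/>` and `<end/>` and the function `insert_milestones` are hypothetical and chosen for illustration only; they are not the actual TEI or XLIFF element names. The sketch also assumes the text contains no characters that need XML escaping.

```python
def insert_milestones(text, spans):
    """Mark span boundaries with empty elements instead of enclosing elements.

    spans: list of (span_id, begin, end) character offsets; spans may overlap.
    """
    events = []
    for span_id, begin, end in spans:
        events.append((begin, 1, '<start id="%s"/>' % span_id))
        events.append((end, 0, '<end id="%s"/>' % span_id))
    # At the same offset, emit end markers before start markers.
    events.sort(key=lambda e: (e[0], e[1]))
    out, pos = [], 0
    for offset, _, marker in events:
        out.append(text[pos:offset])
        out.append(marker)
        pos = offset
    out.append(text[pos:])
    return "".join(out)

# Two overlapping spans: "Welcome to Prague" (0..17) and "Prague!" (11..18).
marked = insert_milestones("Welcome to Prague!", [("a", 0, 17), ("b", 11, 18)])
print(marked)
```

Because every marker is an empty element, the result stays well-formed XML even though the two spans cross, which enclosing start and end tags could not express.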
  • 20. Representing overlapping hierarchies with markup (2/2): solutions advertised by the TEI
    • Fragmentation and reconstitution of virtual elements
      – One hierarchy explicit, others with interrelated marked-up spans
    • Stand-off markup
      – Separation of text and annotations, interlinked via anchor and reference
      – Cf. Word example
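The stand-off idea can be illustrated by keeping the text untouched and storing annotations as separate records that point into it by offset. This is a sketch under stated assumptions: the record layout, the layer names and the function `crosses` are made up for this example, and the third annotation is an artificial span added only to produce a crossing case.

```python
from itertools import combinations

text = "Welcome to Prague!"

# Stand-off annotations: the text is never modified; each record refers to it by offsets.
annotations = [
    {"id": "a1", "layer": "sentence", "begin": 0, "end": 18},
    {"id": "a2", "layer": "entity", "begin": 11, "end": 17},
    {"id": "a3", "layer": "artificial", "begin": 8, "end": 14},  # "to Pra", for illustration
]

def crosses(a, b):
    """True if the two spans overlap without one containing the other."""
    first, second = sorted((a, b), key=lambda s: (s["begin"], -s["end"]))
    return first["begin"] < second["begin"] < first["end"] < second["end"]

# Pairs that could not both be written as properly nested inline elements.
crossing = [(a["id"], b["id"]) for a, b in combinations(annotations, 2) if crosses(a, b)]
print(crossing)
```

A check like this helps decide, before writing annotations back, whether inline markup is enough or whether one of the overlap strategies listed on these slides is required.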
  • 21. Representing overlapping hierarchies in RDF: POWLA (cf. Chiarcos, 2012)
    • RDF representation for corpus annotation, based on the PAULA XML standoff format
    • Allows representing hierarchical, multi-layer corpora in RDF and querying them with SPARQL
    • Not relevant for roundtripping, but for linguistic annotation representation and processing in RDF
  • 22. Lessons learned
    • Choose the overlap solution that fits your roundtripping modelling and processing needs
    • Consider off-the-shelf tooling
      – For 100% hierarchical data: XPath / CSS selectors, DOM, …
    • Consider libraries
      – For extraction only: Tika http://tika.apache.org/
      – For roundtripping: Okapi http://okapi.opentag.com/ (in FREME currently being adapted for roundtripping in selected formats)
    • Make sure the annotation survives in the original format
      – Cf. Word example
      – Soon to be made easier by using Okapi
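For fully hierarchical data, off-the-shelf tooling is often enough to read annotations back. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The input fragment is an assumed well-formed version of the enriched HTML example, with the elided span attributes left out.

```python
import xml.etree.ElementTree as ET

# Assumed enriched fragment; it must be well-formed XML for ElementTree to parse it.
fragment = '<p>Welcome to <span itemtype="http://schema.org/Place">Prague</span>!</p>'
root = ET.fromstring(fragment)

# Select every span that carries an itemtype attribute.
found = [(el.text, el.get("itemtype")) for el in root.findall(".//span[@itemtype]")]

# Recover the original plain text by concatenating all text nodes.
plain_text = "".join(root.itertext())

print(found)
print(plain_text)
```

Recovering the plain text unchanged is a simple check that the annotation was added without altering the original content.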
  • 23. Roundtripping of NIF-based Linguistic Linked Data with non-linked data sources. Felix Sasaki, DFKI / W3C Fellow