NIF Tutorial – 2013/09/24 – Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
AKSW, Universität Leipzig
Sebastian Hellmann
Content Analysis
and the Semantic Web
NIF 2.0 Tutorial
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
Sebastian Hellmann – researcher working on LOD2 EU Project
AKSW – Agile Knowledge and the Semantic Web research group in Leipzig -
http://aksw.org
InfAI – Institute for Applied Informatics - http://infai.org
ALL DEMOS ARE AVAILABLE AT:
http://nlp2rdf.org/leipzig-24-9-2013
Introduction
End users have tasks for NLP, but:
Each new tool is a challenge:
• How to download and start it?
• What kind of annotations does it use?
• How well does it perform (on my domain)?
• If badly, are there any alternatives? How can I find them?
• Open source?
• Lots of know-how needed to exploit NLP.
• Lots of data needed to exploit NLP.
Barriers to NLP
The Semantic Gap
• Part 1: exploiting free, open and interoperable (FOI)
language resources
• Part 2: Connecting text to these resources
• Part 3: tools, demos, infrastructure
From a walled garden to
an interoperable infrastructure
• Part 1: exploiting free, open and interoperable (FOI)
language resources
From a walled garden to
an interoperable infrastructure
http://lod-cloud.net
Linguistic/NLP Data currently filed
under “cross-domain”
http://lod-cloud.net
Linked Open Data
- All datasets provide open access to individual records via HTTP
- Many are free (no payment required, as in royalty-free)
- Some are openly licensed, e.g. CC-0 or CC-BY-SA
=> Open access also applies to published HTML on the WWW, but in LOD the data
itself is published unrendered via RDF
Question:
• Who knows how to add a new bubble to the LOD cloud?
From a walled garden to
an interoperable infrastructure
• Who knows how to add a new bubble to the LOD cloud?
http://datahub.io/group/linguistics
https://github.com/jmccrae/llod-cloud.py
http://validator.lod-cloud.net/validate.php
From a walled garden to
an interoperable infrastructure
Question:
• What are the most important data sets and ontologies for NLP?
• Who has used what?
FOI data
Analysis of mentions of Wikipedia / DBpedia at LREC 2012:
• https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2
→ 163 papers
• https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2
→ 24 papers
FOI data 1: Wikipedia / DBpedia
• Training data for NLP, e.g. URI, surrounding text, surface form
• Probabilities:
• P(sf|URI): P that wikipedia:Apple_Inc. is written as “apple” in text
• P(URI|sf): P that “apple” in text refers to wikipedia:Apple_Inc.
FOI data 1: Wikipedia / DBpedia
http://wiki.dbpedia.org/Datasets/NLP
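A minimal sketch of how the two probabilities could be estimated from link counts. The (surface form, URI) pairs below are invented for illustration and are not taken from the DBpedia NLP datasets; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (surface form, URI) pairs, standing in for anchor texts
# harvested from Wikipedia links. The counts are invented for illustration.
mentions = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(mentions)
sf_counts = Counter(sf for sf, _ in mentions)
uri_counts = Counter(uri for _, uri in mentions)


def p_uri_given_sf(uri, sf):
    """P(URI|sf): share of occurrences of the surface form that link to this URI."""
    return pair_counts[(sf, uri)] / sf_counts[sf]


def p_sf_given_uri(sf, uri):
    """P(sf|URI): share of links to this URI that use this surface form."""
    return pair_counts[(sf, uri)] / uri_counts[uri]
```

With these toy counts, “apple” links to wikipedia:Apple_Inc. in two of its three occurrences, while two of the four links to wikipedia:Apple_Inc. use the surface form “apple”.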
FOI data: Wikipedia / DBpedia
http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=sodium
Available data for “Sodium”
http://dbpedia.org/snorql
select ?labels where {
<http://dbpedia.org/resource/Sodium> rdfs:label ?labels .
} LIMIT 100
select ?altlabel where {
?redirect dbpedia-owl:wikiPageRedirects <http://dbpedia.org/resource/Sodium> .
?redirect rdfs:label ?altlabel .
} LIMIT 100
http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
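Outside the snorql web form, the label query can also be sent as a plain HTTP request. The sketch below only builds the request URL and does not send it. It assumes the public endpoint at http://dbpedia.org/sparql and a `format` parameter as accepted by Virtuoso-based endpoints; both are assumptions, not part of the slides. Unlike snorql, a raw request has to declare the rdfs: prefix itself.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"


def label_query_url(resource_uri, limit=100):
    """Build a GET URL for the rdfs:label query shown above."""
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?labels WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?labels .\n"
        f"}} LIMIT {limit}"
    )
    # Assumption: "format" selects the result serialization on this endpoint.
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


url = label_query_url("http://dbpedia.org/resource/Sodium")
```

The resulting URL can be passed to curl or to `urllib.request.urlopen`.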
Wiktionary2RDF – Mediator Wrapper
http://dbpedia.org/Wiktionary
Wiktionary2RDF – Mediator Wrapper
http://dbpedia.org/Wiktionary
Mediator
Lemon
Wiktionary2RDF – Mediator Wrapper
http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
https://en.wiktionary.org/wiki/sodium#English
http://wiktionary.dbpedia.org/resource/sodium
Lemon Ontology - http://lemon-model.net
Lemon Ontology - http://lemon-model.net
IntersectiveDataPropertyAdjective("extinct", dbpedia:conservationStatus, "EX")
IntersectiveDataPropertyAdjective("endangered", dbpedia:conservationStatus, "EN")
https://github.com/cunger/lemon.dbpedia
Christina Unger, John McCrae, Sebastian Walter, Sara Winter and Philipp Cimiano (2013):
A lemon lexicon for DBpedia. NLP & DBpedia Workshop
• Part 2: Connecting text to these resources
From a walled garden to
an interoperable infrastructure
From a walled garden to
an interoperable infrastructure
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
From a walled garden to
an interoperable infrastructure
Overview of existing tools:
• http://en.wikipedia.org/wiki/Knowledge_extraction#Tools
From a walled garden to
an interoperable infrastructure
Developer's nightmare:
• All tools belong to a similar class of NLP tools
→ Wikifier or Named Entity Linking, SOA principle
But they all have:
• Heterogeneous output formats (JSON, XML)
• Heterogeneous API parameters
• Heterogeneous ways of annotating text:
• Some remove HTML internally, offsets not usable
• Some use byte offset instead of char offset
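The byte offset versus character offset mismatch is easy to reproduce. A minimal sketch with an invented sentence; `byte_to_char_offset` is a hypothetical helper, not part of any of the tools listed above:

```python
text = "Zo\u00eb visited Leipzig"  # "\u00eb" is the precomposed character "ë"
mention = "Leipzig"

# Character (code point) offset, as Python's str.index reports it.
char_begin = text.index(mention)

# Byte offset of the same mention in the UTF-8 encoding of the text.
byte_begin = text.encode("utf-8").index(mention.encode("utf-8"))


def byte_to_char_offset(s, byte_offset, encoding="utf-8"):
    """Convert a byte offset into a character offset by decoding the prefix."""
    return len(s.encode(encoding)[:byte_offset].decode(encoding))
```

Here the two offsets differ by one because “ë” occupies two bytes in UTF-8. A client that mixes both conventions will highlight the wrong span as soon as the text contains a non-ASCII character before the mention.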
From a walled garden to
an interoperable infrastructure
Demo
• http://rdface.aksw.org/new/tinymce/examples/rdface.html
ITS 2.0 - http://www.w3.org/TR/its20/
The Internationalization Tag Set (ITS) 2.0 – enhances the foundation to
integrate automated processing of human language into core Web
technologies.
• Currently in Last Call
• Driven by localization industry
• Embed translation aids into HTML and XML
• Robust way to encode NLP information in HTML
• ITS 2.0 describes 20 data categories → ontology
NIF overview
Summary
• Motivated the Walled Garden problem
• Overview of the emerging Web of Language resources
• Motivated the NLP tool heterogeneity problem
• Introduction of ITS 2.0 Use case for NIF
• Now: NIF 2.0
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
• Reuse of existing standards such as RDF, OWL 2, the PROV Ontology, LAF
(ISO 24612), Unicode and RFC 5147
• Standardize access parameters, annotations (e.g. tokenization), validation
and log messages.
• A NIF workflow, however, can obviously not provide any better performance
(F-measure, speed) than a properly configured UIMA or GATE pipeline with
the same components.
• Lower entry barrier, easy data integration, reusability of tools and
conceptualisation, off-the-shelf solutions for common tasks.
NIF Overview
Relation of NIF to UIMA and GATE
• A Formal Framework for Linguistic Annotation (2000) by Steven Bird, Mark
Liberman
• take home message: generic annotation formats should be based on
graphs
• Ontologies in NIF (e.g. OliA, lemon) can be hard compiled for internal use (as
is done in Stanbol)
WP3 Task 3.2 – Community work: NLP2RDF
Not primarily aimed at
increasing features or
performance (F-Measure)
WP3 Task 3.2 – NIF overview
• NIF turns out to have a unique selling proposition regarding NLP and RDF
• NIF will be the recommended RDF conversion of the W3C Internationalization
Tag Set 2.0 (ITS 2.0) - http://www.w3.org/TR/its20/
• There was no alternative RDF vocabulary for this conversion available.
NIF Overview
WP3 Task 3.2 – Community work: NLP2RDF
RDFa parsers lose all provenance information:
<http://examples.com/books/wikinomics> dc:title "Wikinomics" .
https://en.wikipedia.org/wiki/RDFa
Available resources:
http://persistence.uni-leipzig.org/nlp2rdf/
Disclaimer
Migration to the online presence is still on-going, but there are 15 scientific
publications, e.g.
Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th
International Semantic Web Conference, 21-25 October 2013, Sydney, Australia, (2013) -
http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf
NIF Overview
Question:
• What is a String?
NIF Basics
Counting strings is more difficult than it seems:
• Three ways to count Unicode:
• Code Units
• Code Points
• Graphemes
• Encoding:
• UTF-8, 16, 32
NIF Basics Unicode
• Code Unit. The minimal bit combination that can represent a unit of encoded
text for processing or interchange. The Unicode Standard uses 8-bit code
units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding
form, and 32-bit code units in the UTF-32 encoding form.
• Code Point. (1) Any value in the Unicode codespace; that is, the range of
integers from 0 to 10FFFF₁₆. Not all code points are assigned to encoded
characters. See code point type. (2) A value, or position, for a character, in
any coded character set.
• Unicode Normal Form C
• http://unicode.org/reports/tr15/#Norm_Forms
Unicode
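The difference between code units and code points can be checked directly. A small sketch; `count_units` is a hypothetical helper, and the two sample strings are chosen for illustration. Graphemes are not counted here, since the Python standard library has no grapheme segmentation.

```python
def count_units(s):
    """Return several lengths for the same string."""
    return {
        "code_points": len(s),                                # Python str counts code points
        "utf8_code_units": len(s.encode("utf-8")),            # 8-bit code units
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf32_code_units": len(s.encode("utf-32-le")) // 4,  # 32-bit code units
    }


clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane
e_acute = "e\u0301"   # "e" + COMBINING ACUTE ACCENT: one grapheme, two code points

clef_counts = count_units(clef)
e_acute_counts = count_units(e_acute)
```

The clef is a single code point but needs four UTF-8 code units and two UTF-16 code units (a surrogate pair). The decomposed “é” is displayed as one grapheme but consists of two code points.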
• Recommendation for RDF Literals
• http://unicode.org/reports/tr15/#Norm_Forms
Unicode Normal Form C
• NIF uses Unicode Normal Form C
• NIF counts in Code Points
Unicode
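Both rules combined, as a sketch: normalize to NFC first, then count code points. The helper name `nif_length` is hypothetical; it only illustrates the counting rule and is not taken from a NIF implementation.

```python
import unicodedata


def nif_length(s):
    """Length in code points after normalizing the string to NFC."""
    return len(unicodedata.normalize("NFC", s))


decomposed = "Zoe\u0308"  # "e" followed by COMBINING DIAERESIS
composed = "Zo\u00eb"     # precomposed "ë"

raw_length = len(decomposed)                # counts the combining mark separately
normalized_length = nif_length(decomposed)  # counts after composing to "ë"
```

Without normalization the two spellings of the same word yield different lengths, so offsets computed on one would not match the other.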
• Sadly, there are still implementation problems:
• Java length() vs. PHP strlen() function
• curl --data-urlencode i=" 대 " -d f=text "http://nlp2rdf.lod2.eu/nif-ws.php"
• The Korean character is URL-encoded as %EB%8C%80 (three UTF-8 bytes) and
counted as 3 characters, because PHP strlen() counts bytes rather than
code points
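The numbers in this example can be reproduced without the web service. The sketch below mimics what the two length functions see: Java's length() counts UTF-16 code units, PHP's strlen() counts bytes. It does not call the NIF web service.

```python
from urllib.parse import quote

char = "\ub300"  # the Hangul syllable "대" from the curl example

code_points = len(char)                           # what NIF counts
utf8_bytes = len(char.encode("utf-8"))            # what a byte-based strlen sees
utf16_units = len(char.encode("utf-16-le")) // 2  # what a UTF-16 based length sees
percent_encoded = quote(char)                     # URL encoding of the UTF-8 bytes
```

For this character a UTF-16 based length agrees with the code point count, while a byte-based length reports three.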
Demo
• Now some RDF (finally):
• Note that in NIF a document is not the same thing as its content:
• Two different documents can have the same content
=> so they must not share the same URI
Context
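A sketch of what this means in practice: the same text in two documents yields two distinct context resources. The code emits Turtle as plain strings and assumes the nif-core namespace under http://persistence.uni-leipzig.org/nlp2rdf/ontologies/ together with the terms nif:Context, nif:RFC5147String, nif:isString, nif:beginIndex and nif:endIndex. The example.org URIs and both function names are hypothetical, and the escaping only handles single-line text.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def char_uri(doc_uri, begin, end):
    """RFC 5147 style fragment identifying a character range in a document."""
    return f"{doc_uri}#char={begin},{end}"


def context_turtle(doc_uri, text):
    """Serialize a context resource for `text` as Turtle (single-line text only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    uri = char_uri(doc_uri, 0, len(text))
    return (
        f"@prefix nif: <{NIF}> .\n"
        f"@prefix xsd: <{XSD}> .\n"
        f"<{uri}>\n"
        f"    a nif:Context , nif:RFC5147String ;\n"
        f'    nif:isString "{escaped}" ;\n'
        f'    nif:beginIndex "0"^^xsd:nonNegativeInteger ;\n'
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .\n'
    )


content = "Sodium is a metal."
doc_a = context_turtle("http://example.org/doc1", content)
doc_b = context_turtle("http://example.org/doc2", content)
```

Both outputs carry the same nif:isString value, but the subjects differ, so annotations attached to one document do not leak into the other.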
Annotations
Tokenization
Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations.
Language Resources and Evaluation 46(1): 53-74 (2012)
NIF
Demo:
http://nlp2rdf.lod2.eu/demo.php
• SPARQL queries produce (find) errors
• http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t
• RLOG – An RDF Logging Ontology
• ./validate.jar -i nif-erroneous-model.ttl -t file
• Demo → character count
• Demo → all errors
Validation over specification
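The actual test suite consists of SPARQL queries. The sketch below only restates, in plain Python, the kind of consistency check behind the character count demo: the annotated string must equal the slice of the context text given by the begin and end index. The function name and the error messages are hypothetical and are not taken from the validator.

```python
def check_annotation(context, begin, end, anchor_of):
    """Return a list of error messages for one annotation; empty if consistent."""
    errors = []
    if not 0 <= begin <= end <= len(context):
        errors.append(
            f"indices {begin},{end} are outside the context (length {len(context)})"
        )
    elif context[begin:end] != anchor_of:
        errors.append(
            f"anchorOf {anchor_of!r} does not match "
            f"context[{begin}:{end}] = {context[begin:end]!r}"
        )
    return errors


context = "Sodium is a metal."
ok = check_annotation(context, 0, 6, "Sodium")
off_by_one = check_annotation(context, 0, 5, "Sodium")
out_of_range = check_annotation(context, 10, 40, "a metal.")
```

An off-by-one index, as produced by a byte-based counter on non-ASCII text, is reported as a mismatch instead of silently pointing at the wrong span.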
NIF
• http://www.w3.org/TR/its20/#conversion-to-nif
• http://www.w3.org/TR/its20/#nif-backconversion
NIF
• Demo
• Load Terminological model or Inference Model
Reasoning
Open Community – All feedback is welcome!
http://slideshare.net/kurzum
Websites:
http://dbpedia.org
http://nlp2rdf.org
http://lod2.eu
Thanks for your attention
20230104 - machine vision
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 

NIF 2.0 Tutorial: Content Analysis and the Semantic Web

  • 1. NIF Tutorial – 2013/09/24 – http://lod2.eu
      NIF 2.0 Tutorial: Content Analysis and the Semantic Web
      Sebastian Hellmann, AKSW, Universität Leipzig
      LOD2: Creating Knowledge out of Interlinked Data
      http://nlp2rdf.org http://lod2.eu http://slideshare.net/kurzum
  • 2. Introduction
      • Sebastian Hellmann: researcher working on the LOD2 EU project
      • AKSW: Agile Knowledge and the Semantic Web research group in Leipzig, http://aksw.org
      • InfAI: Institute for Applied Informatics, http://infai.org
      All demos are available at: http://nlp2rdf.org/leipzig-24-9-2013
  • 3. Introduction
  • 4. Barriers to NLP
      End users have tasks for NLP, but each new tool is a challenge:
      • How to download and start it?
      • What kind of annotations does it use?
      • How well does it perform (on my domain)?
      • If badly, are there any alternatives? How can I find them?
      • Open source?
      • Lots of know-how needed to exploit NLP.
      • Lots of data needed to exploit NLP.
  • 5. The Semantic Gap
  • 7. From a walled garden to an interoperable infrastructure
      • Part 1: Exploiting free, open and interoperable (FOI) language resources
      • Part 2: Connecting text to these resources
      • Part 3: Tools, demos, infrastructure
  • 8. From a walled garden to an interoperable infrastructure
      • Part 1: Exploiting free, open and interoperable (FOI) language resources
  • 9. http://lod-cloud.net: Linguistic/NLP data is currently filed under “cross-domain”
  • 10. Linked Open Data (http://lod-cloud.net)
      • All datasets provide open access to individual records via HTTP
      • Many are free (no payment required, as in royalty-free)
      • Some are openly licensed, e.g. CC-0 or CC-BY-SA
      => Open access also applies to HTML published on the WWW, but in LOD the data itself is published unrendered, as RDF
  • 11. From a walled garden to an interoperable infrastructure
      Question:
      • Who knows how to add a new bubble to the LOD cloud?
  • 12. From a walled garden to an interoperable infrastructure
      Who knows how to add a new bubble to the LOD cloud?
      • http://datahub.io/group/linguistics
      • https://github.com/jmccrae/llod-cloud.py
      • http://validator.lod-cloud.net/validate.php
  • 15. FOI data
      Question:
      • What are the most important data sets and ontologies for NLP?
      • Who has used what?
  • 16. FOI data 1: Wikipedia / DBpedia
      Analysis of mentions of Wikipedia / DBpedia at LREC 2012:
      • https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2 → 163 papers
      • https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2 → 24 papers
  • 17. FOI data 1: Wikipedia / DBpedia (http://wiki.dbpedia.org/Datasets/NLP)
      • Training data for NLP, e.g. URI, surrounding text, surface form (sf)
      • Probabilities:
          • P(URI|sf): probability that “apple” in text refers to wikipedia:Apple_Inc.
          • P(sf|URI): probability that wikipedia:Apple_Inc. appears as “apple” in text
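The two conditional probabilities can be estimated from link counts. The following is a minimal sketch, assuming a list of (surface form, URI) pairs such as could be harvested from wiki links; the pairs and their counts are invented for illustration and are not taken from the DBpedia NLP datasets.

```python
from collections import Counter

# Hypothetical (surface form, URI) pairs; the counts are made up for this sketch.
links = (
    [("apple", "wikipedia:Apple_Inc.")] * 6
    + [("apple", "wikipedia:Apple")] * 4
    + [("Apple Inc.", "wikipedia:Apple_Inc.")] * 10
)

pair_counts = Counter(links)
sf_counts = Counter(sf for sf, _ in links)
uri_counts = Counter(uri for _, uri in links)


def p_uri_given_sf(uri: str, sf: str) -> float:
    """Share of the occurrences of surface form `sf` that link to `uri`."""
    return pair_counts[(sf, uri)] / sf_counts[sf]


def p_sf_given_uri(sf: str, uri: str) -> float:
    """Share of the links to `uri` that are written as surface form `sf`."""
    return pair_counts[(sf, uri)] / uri_counts[uri]


print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))  # 6 of the 10 "apple" links
print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))  # 6 of the 16 links to the URI
```

With these toy counts the two values differ (0.6 versus 0.375), which is why both directions are worth keeping as separate statistics.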
  • 18. FOI data: Wikipedia / DBpedia. Available data for “Sodium”
      • http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=sodium
      • http://dbpedia.org/snorql
          select ?labels where { <http://dbpedia.org/resource/Sodium> rdfs:label ?labels . } LIMIT 100
          select ?altlabel where { ?redirect dbpedia-owl:wikiPageRedirects <http://dbpedia.org/resource/Sodium> . ?redirect rdfs:label ?altlabel . } LIMIT 100
      • http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
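The label query can also be sent programmatically instead of through the SNORQL form. The sketch below only builds the request URL, so it runs offline. It assumes the public endpoint http://dbpedia.org/sparql (the slide names only SNORQL) and the standard SPARQL protocol `query` parameter; the helper name `sparql_get_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: DBpedia's public SPARQL endpoint; not mentioned on the slide.
ENDPOINT = "http://dbpedia.org/sparql"

# The first query from the slide, with the rdfs prefix spelled out.
LABEL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?labels WHERE {
  <http://dbpedia.org/resource/Sodium> rdfs:label ?labels .
} LIMIT 100
"""


def sparql_get_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Build a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(LABEL_QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == LABEL_QUERY.strip())
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose, since the result depends on the live service.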
  • 19. Wiktionary2RDF – Mediator Wrapper: http://dbpedia.org/Wiktionary
  • 20. http://dbpedia.org/Wiktionary
  • 21. http://dbpedia.org/Wiktionary
  • 22. Wiktionary2RDF – Mediator Wrapper: http://dbpedia.org/Wiktionary (Mediator, Lemon)
  • 23. Wiktionary2RDF – Mediator Wrapper
      • http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
      • https://en.wiktionary.org/wiki/sodium#English
      • http://wiktionary.dbpedia.org/resource/sodium
  • 24. Lemon Ontology - http://lemon-model.net
  • 25. Lemon Ontology - http://lemon-model.net
      IntersectiveDataPropertyAdjective("extinct", dbpedia:conservationStatus, "EX")
      IntersectiveDataPropertyAdjective("endangered", dbpedia:conservationStatus, "EN")
      https://github.com/cunger/lemon.dbpedia
      Christina Unger, John McCrae, Sebastian Walter, Sara Winter and Philipp Cimiano (2013): A lemon lexicon for DBpedia. NLP & DBpedia Workshop
  • 26. From a walled garden to an interoperable infrastructure
      • Part 2: Connecting text to these resources
  • 27. From a walled garden to an interoperable infrastructure: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
  • 28. From a walled garden to an interoperable infrastructure
      Overview of existing tools:
      • http://en.wikipedia.org/wiki/Knowledge_extraction#Tools
  • 29. From a walled garden to an interoperable infrastructure
      Developer's nightmare:
      • All tools belong to a similar class of NLP tools → Wikifier or Named Entity Linking, SOA principle
      But they all have:
      • Heterogeneous output formats (JSON, XML)
      • Heterogeneous API parameters
      • Heterogeneous ways of annotating text:
          • Some remove HTML internally, so the offsets are not usable
          • Some use byte offsets instead of character offsets
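The byte-versus-character mismatch is easy to reproduce. The sketch below uses an invented sample sentence and assumes UTF-8 as the byte encoding; it shows that the same mention gets two different start offsets and that a byte offset applied to the character string points at the wrong text.

```python
text = "Café in Zürich"   # invented sample with two non-ASCII characters
target = "Zürich"

# Offset counted in characters (Python strings index by code point).
char_offset = text.index(target)

# Offset counted in bytes of the UTF-8 encoding; "é" takes two bytes.
byte_offset = text.encode("utf-8").index(target.encode("utf-8"))

print(char_offset, byte_offset)

# Slicing the character string with the byte offset misses the mention.
correct = text[char_offset:char_offset + len(target)]
shifted = text[byte_offset:byte_offset + len(target)]
print(repr(correct), repr(shifted))
```

A consumer that combines annotations from two such tools has to know which unit each tool counts in before the offsets can be compared.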
  • 30. From a walled garden to an interoperable infrastructure
      Demo:
      • http://rdface.aksw.org/new/tinymce/examples/rdface.html
  • 31. ITS 2.0 - http://www.w3.org/TR/its20/
      The Internationalization Tag Set (ITS) 2.0 enhances the foundation to integrate automated processing of human language into core Web technologies.
      • Currently in Last Call
      • Driven by the localization industry
      • Embed translation aids into HTML and XML
      • Robust way to encode NLP information in HTML
      • ITS 2.0 describes 20 data categories → ontology
  • 32. NIF overview
      Summary:
      • Motivated the walled garden problem
      • Overview of the emerging Web of language resources
      • Motivated the NLP tool heterogeneity problem
      • Introduction of ITS 2.0, use case for NIF
      • Now: NIF 2.0
  • 33. NIF Overview
      The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
      • Reuse of existing standards such as RDF, OWL 2, the PROV Ontology, LAF (ISO 24612), Unicode and RFC 5147
      • Standardize access parameters, annotations (e.g. tokenization), validation and log messages.
      • A NIF workflow, however, can obviously not provide any better performance (F-measure, speed) than a properly configured UIMA or GATE pipeline with the same components.
      • Lower entry barrier, easy data integration, reusability of tools and conceptualisation, off-the-shelf solutions for common tasks.
  • 34. Relation of NIF to UIMA and GATE
      • A Formal Framework for Linguistic Annotation (2000) by Steven Bird and Mark Liberman
          • Take-home message: generic annotation formats should be based on graphs
      • Ontologies in NIF (e.g. OLiA, lemon) can be hard-compiled for internal use (as is done in Stanbol)
      • WP3 Task 3.2 – Community work: NLP2RDF. Not primarily aimed at increasing features or performance (F-measure)
  • 35. WP3 Task 3.2 – NIF overview
  • 36. NIF Overview
      • NIF turns out to have a unique selling proposition regarding NLP and RDF
      • NIF will be the recommended RDF conversion of the W3C Internationalization Tag Set 2.0 (ITS 2.0) - http://www.w3.org/TR/its20/
      • There was no alternative RDF vocabulary available for this conversion.
  • 37. WP3 Task 3.2 – Community work: NLP2RDF
      RDFa parsers lose all provenance information:
      <http://examples.com/books/wikinomics> dc:title "Wikinomics" .
      https://en.wikipedia.org/wiki/RDFa
  • 38. NIF Overview
      Available resources: http://persistence.uni-leipzig.org/nlp2rdf/
      Disclaimer: migration to the online presence is still ongoing, but there are 15 scientific publications, e.g.
      Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia (2013) - http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf
  • 39. NIF Basics
      Question:
      • What is a String?
  • 40. NIF Basics: Unicode
      Counting strings is more difficult than it seems:
      • Three ways to count Unicode:
          • Code units
          • Code points
          • Graphemes
      • Encoding:
          • UTF-8, UTF-16, UTF-32
  • 41. Unicode
      • Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form.
      • Code Point. (1) Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF (hexadecimal). Not all code points are assigned to encoded characters. See code point type. (2) A value, or position, for a character, in any coded character set.
      • Unicode Normal Form C
          • http://unicode.org/reports/tr15/#Norm_Forms
  • 42. Unicode Normal Form C
      • Recommendation for RDF literals
      • http://unicode.org/reports/tr15/#Norm_Forms
  • 43. Unicode
      • NIF uses Unicode Normal Form C
      • NIF counts in code points
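The difference between the counting units, and the effect of Normal Form C, can be checked with the Python standard library. This is a minimal sketch with invented sample strings; the helper name `counts` is hypothetical, and grapheme counting is not covered because the standard library offers no grapheme segmentation.

```python
import unicodedata


def counts(s: str) -> dict:
    """Count a string in code points and in UTF-8 / UTF-16 code units."""
    return {
        "code_points": len(s),                       # Python strings index by code point
        "utf8_code_units": len(s.encode("utf-8")),   # 8-bit units
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,  # 16-bit units
    }


decomposed = "e\u0301"                               # "e" + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9
emoji = "\U0001F600"                                 # outside the Basic Multilingual Plane

for label, s in [("decomposed", decomposed), ("NFC", composed), ("emoji", emoji)]:
    print(label, counts(s))
```

Normalizing to NFC before counting gives one code point for "é" regardless of how the input was typed, which is what makes code-point offsets comparable between tools.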
  • 44. Demo
      Sadly, there are still implementation problems:
      • Java length() vs. PHP strlen() function
      • curl --data-urlencode i="대" -d f=text "http://nlp2rdf.lod2.eu/nif-ws.php"
      • The Korean character is URL-encoded (#%EB%8C%80) and counted as 3 characters (not NFC in PHP)
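The count of 3 matches the number of UTF-8 bytes of the Korean character, which is what a byte-based length function such as PHP's strlen() returns. A small sketch reproducing the numbers offline, without calling the web service:

```python
from urllib.parse import quote

ch = "대"  # the Korean character from the demo, U+B300

code_points = len(ch)                 # what NIF counts
utf8_bytes = len(ch.encode("utf-8"))  # what a byte-based length function sees
url_encoded = quote(ch)               # percent-encoding of the UTF-8 bytes

print(code_points, utf8_bytes, url_encoded)
```

Java's length() counts UTF-16 code units instead, so it agrees with the code-point count for this character but not for characters outside the Basic Multilingual Plane.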
  • 45. Context
      Now some RDF (finally):
      • Note that in NIF the document != the content of the document.
      • Two different documents can have the same content => they must not have the same URI
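One way to picture the distinction is to serialize a context for two documents that share the same text. The sketch below is an illustration under assumptions: the namespace and the names nif:Context, nif:RFC5147String, nif:isString, nif:beginIndex and nif:endIndex are written as I recall them from the NIF 2.0 core ontology and should be checked against it, the `#char=begin,end` fragment follows RFC 5147, and the helper `context_turtle` and the example.org URIs are hypothetical.

```python
# Assumption: namespace of the NIF 2.0 core ontology.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def context_turtle(doc_uri: str, text: str) -> str:
    """Return a Turtle snippet describing the whole text of one document."""
    end = len(text)  # Python's len() counts code points, as NIF does
    # Escape backslashes, quotes and newlines for a short Turtle literal.
    literal = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return (
        f"@prefix nif: <{NIF}> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
        f"<{doc_uri}#char=0,{end}>\n"
        "    a nif:Context , nif:RFC5147String ;\n"
        f'    nif:isString "{literal}" ;\n'
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;\n'
        f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger .\n'
    )


text = "My favourite actress is Natalie Portman."
doc_a = context_turtle("http://example.org/a.txt", text)
doc_b = context_turtle("http://example.org/b.txt", text)
print(doc_a)
```

Both snippets carry the same nif:isString literal, yet their subjects differ because the URI is derived from the document, not from its content.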
  • 46. Annotations
  • 47. Tokenization
      Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations. Language Resources and Evaluation 46(1): 53-74 (2012)
  • 48. NIF Demo: http://nlp2rdf.lod2.eu/demo.php
  • 49. Validation over specification
      • SPARQL queries produce (find) errors
      • http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t
      • RLOG – An RDF Logging Ontology
      • ./validate.jar -i nif-erroneous-model.ttl -t file
      • Demo → character count
      • Demo → all errors
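The character-count check can be illustrated without RDF tooling. The sketch below is a plain-Python analogue, not the SPARQL test suite or validate.jar: it assumes each annotation carries a begin index, an end index and the anchored string, and it reports every annotation whose indices do not select that string from the context. The class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    begin: int       # start offset in code points
    end: int         # end offset in code points (exclusive)
    anchor_of: str   # the string the annotation claims to cover


def check_offsets(context: str, annotations: List[Annotation]) -> List[str]:
    """Return one error message per annotation whose offsets are inconsistent."""
    errors = []
    for a in annotations:
        if not (0 <= a.begin <= a.end <= len(context)):
            errors.append(f"indices out of range: {a.begin},{a.end}")
        elif context[a.begin:a.end] != a.anchor_of:
            found = context[a.begin:a.end]
            errors.append(
                f"mismatch at {a.begin},{a.end}: "
                f"context has {found!r}, annotation says {a.anchor_of!r}"
            )
    return errors


context = "Natalie Portman was born in Jerusalem."
good = [Annotation(0, 15, "Natalie Portman"), Annotation(28, 37, "Jerusalem")]
bad = [Annotation(29, 38, "Jerusalem"), Annotation(30, 99, "Jerusalem")]

print(check_offsets(context, good))
for message in check_offsets(context, bad):
    print(message)
```

A shifted offset like the first entry in `bad` is the typical symptom of a tool that counts bytes or strips markup before counting.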
  • 50. NIF Demo: http://nlp2rdf.lod2.eu/demo.php
  • 51. NIF
  • 52. NIF
      • http://www.w3.org/TR/its20/#conversion-to-nif
      • http://www.w3.org/TR/its20/#nif-backconversion
  • 53. Reasoning
      • Demo
      • Load Terminological model or Inference Model
  • 54. Thanks for your attention
      Open Community – All feedback is welcome! http://slideshare.net/kurzum
      Websites: http://dbpedia.org http://nlp2rdf.org http://lod2.eu