Creating Knowledge out of Interlinked Data
http://lod2.eu

ISWC – 2013/10/23 – Page 1

Integrating NLP using Linked Data
Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer

http://slideshare.net/kurzum
http://nlp2rdf.org
http://lod2.eu


AKSW, Universität Leipzig

ISWC – 2013/10/23 – Page 2

Introduction

ISWC – 2013/10/23 – Page 3

Introduction

Core problems in integrating NLP:
1. Too much heterogeneity
2. Almost no open standards available
3. Lack of open collaboration
4. Difficult and large domain

ISWC – 2013/10/23 – Page 4

Problem analysis
Hardly any reusability in NLP
• Free software (as in free beer), but no open licenses
• Few standards and few mappings
• Integration is hard-wired (you have to write software)
– for each tool, for each framework
Main benefits of using RDF, OWL and Linked Data are:
• lower entry barrier (as a client / user)
• easy data integration (linking, mapping)
• reusability of tools and conceptualisations (ontologies)
• off-the-shelf solutions for common tasks

ISWC – 2013/10/23 – Page 5

The Semantic Gap

ISWC – 2013/10/23 – Page 6

ISWC – 2013/10/23 – Page 7

NLP2RDF project
NLP2RDF (http://nlp2rdf.org)
- community project bootstrapped by LOD2
- develops NLP Interchange Format (NIF)
- umbrella project to combine (and consolidate) existing work

ISWC – 2013/10/23 – Page 8

NIF Overview
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
→ to create an ecosystem of interoperable web services

ISWC – 2013/10/23 – Page 9


NIF Overview
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.

• Reuse of existing standards such as RDF, OWL 2, the PROV Ontology, LAF (ISO 24612), Unicode and RFC 5147
• Standardize access parameters, annotations (e.g. tokenization), validation and log messages
• Reuse of existing ontologies:
ISWC – 2013/10/23 – Page 10


Example NIF Workflow

A NIF workflow, however, naturally cannot provide better performance (F-measure, speed) than a properly configured UIMA or GATE pipeline with the same components.
ISWC – 2013/10/23 – Page 11

Use Cases
• Internationalization Tag Set 2.0
• Part of Speech Tagging
• Wikifier API access via RDFaCE (Entity Linking)

ISWC – 2013/10/23 – Page 12


UC1 - Internationalisation Tagset 2.0

• NIF will be the recommended RDF conversion of the W3C Internationalization Tag Set 2.0 (ITS 2.0): http://www.w3.org/TR/its20/
• NIF turns out to have a unique selling proposition regarding NLP and RDF
• There was no suitable alternative RDF vocabulary available for this conversion.
ISWC – 2013/10/23 – Page 13

Source: http://www.w3.org/TR/its20/#EX-HTML-whitespace-normalization


ITS 2.0

RDFa parsers lose all provenance information:
<http://examples.com/books/wikinomics> dc:title "Wikinomics" .

Source: https://en.wikipedia.org/wiki/RDFa
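The RDFa triple above keeps only the title. It loses where the string came from. The Python sketch below is a minimal example of how one annotated substring could instead be written as NIF 2.0 Turtle, so that the full text, the character offsets and the link all survive together. Several details are assumptions for illustration: the NIF Core namespace, the use of `itsrdf:taIdentRef` for the link, the base URI `http://example.org/doc`, the sample sentence and all helper names. The NIF 2.0 specification remains authoritative.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def lit(s: str) -> str:
    """Quote a string as a Turtle literal (escapes backslash, quote, newline)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def idx(n: int) -> str:
    """Typed integer literal for nif:beginIndex / nif:endIndex."""
    return f'"{n}"^^xsd:nonNegativeInteger'


def nif_annotation(base: str, text: str, begin: int, end: int, ident_ref: str) -> str:
    """Serialize `text` as a nif:Context and text[begin:end] as a linked substring.

    Offsets are Python indices (Unicode code points); URIs use the
    RFC 5147 'char=begin,end' fragment scheme.
    """
    ctx = f"<{base}#char=0,{len(text)}>"
    sub = f"<{base}#char={begin},{end}>"
    return "\n".join([
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{ctx} a nif:Context , nif:RFC5147String ;",
        f"    nif:isString {lit(text)} ;",
        f"    nif:beginIndex {idx(0)} ;",
        f"    nif:endIndex {idx(len(text))} .",
        "",
        f"{sub} a nif:RFC5147String ;",
        f"    nif:referenceContext {ctx} ;",
        f"    nif:anchorOf {lit(text[begin:end])} ;",
        f"    nif:beginIndex {idx(begin)} ;",
        f"    nif:endIndex {idx(end)} ;",
        f"    itsrdf:taIdentRef <{ident_ref}> .",
    ])


if __name__ == "__main__":
    # Hypothetical document text; "Wikinomics" spans code points 7..17.
    print(nif_annotation("http://example.org/doc", "Title: Wikinomics", 7, 17,
                         "http://examples.com/books/wikinomics"))
```

In contrast to the RDFa output, anyone consuming this result can look up the original string and its offsets and link them back to the source text.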
ISWC – 2013/10/23 – Page 14

UC1 - Internationalisation Tagset 2.0

ISWC – 2013/10/23 – Page 15

UC1 - Internationalisation Tagset 2.0

String offset based on:
- Unicode NFC, code points
- ISO 24612
- RFC 5147
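The Python sketch below shows one way to compute such offsets. It assumes that Python string indices act as the code-point counter. The text is NFC-normalized before searching. A second helper converts offsets reported in UTF-16 code units, which is how Java and JavaScript index strings, into code points. The function names and the base URI are hypothetical.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize to Unicode Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", s)


def char_uri(base: str, text: str, surface: str) -> str:
    """RFC 5147-style URI for the first occurrence of `surface` in `text`.

    Offsets are Python string indices, i.e. Unicode code points, counted on
    the NFC-normalized text.
    """
    norm_text, norm_surface = nfc(text), nfc(surface)
    begin = norm_text.find(norm_surface)
    if begin < 0:
        raise ValueError(f"{surface!r} not found in text")
    return f"{base}#char={begin},{begin + len(norm_surface)}"


def utf16_to_codepoint(text: str, utf16_offset: int) -> int:
    """Convert an offset counted in UTF-16 code units (the unit of Java and
    JavaScript string indices) into a code-point offset."""
    units = 0
    for i, ch in enumerate(text):
        if units == utf16_offset:
            return i
        # Characters outside the Basic Multilingual Plane need two UTF-16 units.
        units += 2 if ord(ch) > 0xFFFF else 1
    if units == utf16_offset:
        return len(text)
    raise ValueError("offset lies inside a surrogate pair or past the end")


if __name__ == "__main__":
    # 'e' + combining accent: NFC merges it into one code point, shifting offsets.
    print(char_uri("http://example.org/doc", "Cafe\u0301 Leipzig", "Leipzig"))
    # http://example.org/doc#char=5,12
    # An emoji is one code point but two UTF-16 units.
    print(utf16_to_codepoint("\U0001F600 NIF", 3))  # 2
```

Without NFC, the decomposed input would place "Leipzig" at 6..13. This is why tools that exchange offsets must agree on both the normalization form and the unit they count in.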


ISWC – 2013/10/23 – Page 16

UC2 – Part of Speech Tagging

Please see the paper:

http://purl.org/olia
ISWC – 2013/10/23 – Page 17

UC3 – Wikifier API access via RDFaCE

https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki

ISWC – 2013/10/23 – Page 18

UC3 - Wikifier API access via RDFaCE
http://rdface.aksw.org/

ISWC – 2013/10/23 – Page 19

UC3 - Wikifier API access via RDFaCE
http://rdface.aksw.org/

ISWC – 2013/10/23 – Page 20

Evaluation
Please see the paper!
1) Quantitative Analysis with Google Wikilinks Corpus as NIF RDF
• Crawl of 3 million web sites, 40 million Wikipedia links
• ~ 477 million triples in NIF
2) Questionnaire and Developers Study for NIF 1.0
• NIF 1.0 was released in September 2009
• Over 30 known implementations (22 not from authors)
• 14 developers participated in the study
• A minimal NIF implementation requires fewer than 500 lines of code (LoC)
3) Qualitative Comparison with other Frameworks and Formats

ISWC – 2013/10/23 – Page 21

State of NIF 2.0
Corpora as Linked Data
• Wikilinks corpus - http://wiki-link.nlp2rdf.org
• KORE 50 - http://www.yovisto.com/labs/ner-benchmarks/
• DBpedia Spotlight dataset
Tools
• entityclassifier.eu – http://entityclassifier.eu
• Spotlight - http://spotlight.dbpedia.org
• OpenNLP
• Stanford CoreNLP - https://github.com/NLP2RDF/software
• Validator - https://github.com/NLP2RDF/software

ISWC – 2013/10/23 – Page 22

State of NIF 2.0
• Rollout is in progress
• Implementation is distributed and proceeds at different speeds and levels of quality
• Software lifecycle:
  • Implementation
  • Testing/Validation
  • Integration in the main software
  • Deployment as a web service
• Hosted web services are often not up to date, even when the code base is

ISWC – 2013/10/23 – Page 23

How to join - http://nlp2rdf.org

ISWC – 2013/10/23 – Page 24

For ontology creators
NLP2RDF provides infrastructure for your NLP ontologies

• Redundant, persistent hosting
• Maven packages
• Code and documentation generation
• Continuous Integration (planned)
• Indexing
• Validation of instance data

Please write to me or the mailing list
nlp2rdf@lists.informatik.uni-leipzig.de


ISWC – 2013/10/23 – Page 25

Take home message
• Early industrial uptake
  • OpenLink, Vistatech.ie, Zemanta, Tenforce, Unister
  • The W3C ITS 2.0 standard was driven by the localization industry
• NIF is open and free (CC0 planned)
• NIF is designed to be a cost-saver, not primarily aimed at increasing features or performance (F-measure)
ISWC – 2013/10/23 – Page 26

Thanks for your attention
Open Community – All feedback is welcome!
http://slideshare.net/kurzum
Websites:
http://nlp2rdf.org
http://lod2.eu

ISWC – 2013/10/23 – Page 27

Annotations

ISWC – 2013/10/23 – Page 28

NIF

ISWC – 2013/10/23 – Page 29

Scalability - Salzburg Research KMT

https://bitbucket.org/srfgkmt/stanbol-nlp

ISWC – 2013/10/23 – Page 30

Unicode Normalization Form C

• Recommendation for RDF Literals
• http://unicode.org/reports/tr15/#Norm_Forms
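A short Python check shows why normalization matters for literals. It is an illustration and is not part of any NIF tool, and it needs Python 3.8+ for `unicodedata.is_normalized`. The same visible word can be encoded with a precomposed accent or a decomposed one. The two encodings compare equal, and have the same length, only after both are converted to NFC.

```python
import unicodedata

composed = "Caf\u00e9"     # 'é' as the single code point U+00E9
decomposed = "Cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def nfc(s: str) -> str:
    """Normalize to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


if __name__ == "__main__":
    print(composed == decomposed)                        # False: different code points
    print(len(composed), len(decomposed))                # 4 5: offsets would disagree
    print(nfc(composed) == nfc(decomposed))              # True after normalization
    print(unicodedata.is_normalized("NFC", decomposed))  # False
```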

ISWC – 2013/10/23 – Page 31

Tokenization

Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations.
Language Resources and Evaluation 46(1): 53-74 (2012)
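NIF expresses every token as a character span over the same string, so conflicting tokenizations can be compared directly. The Python sketch below computes the finest segmentation that is compatible with two tokenizations: it splits the text at every boundary proposed by either tool. This boundary-union approach is a simplification chosen for illustration. It is not the merging algorithm of the paper cited above. The token spans in the example are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) code-point offsets, end exclusive


def merge_tokenizations(text: str, a: List[Span], b: List[Span]) -> List[Span]:
    """Split `text` at every token boundary proposed by either tokenization
    and keep only segments that contain non-whitespace characters."""
    cuts = sorted({pos for span in a + b for pos in span})
    return [(begin, end) for begin, end in zip(cuts, cuts[1:])
            if text[begin:end].strip()]


if __name__ == "__main__":
    text = "don't stop"
    tok_a = [(0, 2), (2, 5), (6, 10)]  # do | n't | stop
    tok_b = [(0, 3), (3, 5), (6, 10)]  # don | 't | stop
    for begin, end in merge_tokenizations(text, tok_a, tok_b):
        print(f"char={begin},{end}", repr(text[begin:end]))
```

Each merged segment can then be traced back to the original tokens of either tool by comparing offsets.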


ISWC – 2013/10/23 – Page 32

Validation over specification

• SPARQL queries produce (find) errors
• http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t
• RLOG – An RDF Logging Ontology
• ./validate.jar -i nif-erroneous-model.ttl -t file
• Demo → character count
• Demo → all errors

ALL DEMOS ARE AVAILABLE AT:
http://nlp2rdf.org/leipzig-24-9-2013
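The slide describes validation through SPARQL queries over the RDF graph. The Python sketch below is a swapped-in illustration of the kind of offset check involved, similar in spirit to the "character count" demo. It tests one annotation, held in a plain dataclass rather than an RDF store, for consistent offsets. The field names mirror NIF properties, but the class, the function and the error messages are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    uri: str                         # e.g. http://example.org/doc#char=7,17
    context_string: str              # nif:isString of the nif:referenceContext
    begin: int                       # nif:beginIndex
    end: int                         # nif:endIndex
    anchor_of: Optional[str] = None  # nif:anchorOf, if present


def validate(a: Annotation) -> List[str]:
    """Return error messages; an empty list means no error was found."""
    errors = []
    if not 0 <= a.begin <= a.end <= len(a.context_string):
        errors.append(f"{a.uri}: indices {a.begin},{a.end} out of range")
    elif a.anchor_of is not None and a.context_string[a.begin:a.end] != a.anchor_of:
        errors.append(f"{a.uri}: anchorOf {a.anchor_of!r} does not match "
                      f"{a.context_string[a.begin:a.end]!r}")
    # The RFC 5147 fragment in the URI must agree with the explicit indices.
    m = re.search(r"#char=(\d+),(\d+)$", a.uri)
    if m and (int(m[1]), int(m[2])) != (a.begin, a.end):
        errors.append(f"{a.uri}: fragment disagrees with beginIndex/endIndex")
    return errors


if __name__ == "__main__":
    ok = Annotation("http://example.org/doc#char=7,17", "Title: Wikinomics", 7, 17, "Wikinomics")
    bad = Annotation("http://example.org/doc#char=7,16", "Title: Wikinomics", 7, 17, "Wikinomic")
    print(validate(ok))   # []
    print(validate(bad))  # two messages: anchor mismatch and fragment mismatch
```

In an RDF setting, the same conditions would be written as SPARQL queries over the graph. Each query returns the offending resources, and the validator reports them as errors.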
ISWC – 2013/10/23 – Page 33

NIF

Demo:
http://nlp2rdf.lod2.eu/demo.php

ISWC – 2013/10/23 – Page 34

OLiA

http://purl.org/olia

ISWC – 2013/10/23 – Page 35

NIF

ISWC – 2013/10/23 – Page 36

NIF
