CLTL Software and Web Services Guide


Rubén Izquierdo Beviá

CLTL: Description of web services and software. Nijmegen 2013


CLTL Software and Web Services
Rubén Izquierdo Beviá

Rubén Izquierdo Beviá
About me
- 5-year degree in Computer Science (University of Alicante, Alicante, Spain)
- National NLP projects and one European project (QALLME) (University of Alicante, Alicante, Spain)
- Thesis on NLP & Word Sense Disambiguation (University of Alicante, Alicante, Spain, Sept 2010)
- Postdoc position in the DutchSemCor project (University of Tilburg, Tilburg, Sept 2011 to Sept 2012)
- Postdoc position in the OpeNER project (Vrije Universiteit, Amsterdam, since Sept 2012)

CLTL software
- In general, a common input/output format:
  - KAF
  - NAF, as an extension of KAF
- Single components performing single tasks:
  - Integration of existing modules, adapting their input/output formats
  - Development of new ones

KAF
Kyoto Annotation Format
- Stand-off, layered, XML-based representation format
- Different types of information are stored in different layers
- Layers are linked by means of references
- Suitable for creating pipelines based on this format
- Layers:
  - Text: tokens
  - Term: lemmas, part-of-speech, term sentiment, word senses
  - Entities, chunks, opinions…
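A minimal sketch of how the layers reference each other, using Python's standard xml.etree.ElementTree on a tiny hand-written KAF fragment. The element and attribute names (wf/wid in the text layer, term/tid with span/target in the term layer) follow KAF conventions but are simplified; the sample content, the POS values and the helper name terms_with_tokens are hypothetical, and real KAF files carry more attributes and layers.

```python
import xml.etree.ElementTree as ET

# Tiny, hand-written KAF fragment (illustrative only, not output of a CLTL module).
KAF_SAMPLE = """<KAF xml:lang="en">
  <text>
    <wf wid="w1" sent="1">Nice</wf>
    <wf wid="w2" sent="1">rooms</wf>
  </text>
  <terms>
    <term tid="t1" lemma="nice" pos="G">
      <span><target id="w1"/></span>
    </term>
    <term tid="t2" lemma="room" pos="N">
      <span><target id="w2"/></span>
    </term>
  </terms>
</KAF>"""


def terms_with_tokens(kaf_xml):
    """Resolve every term of the term layer to the token strings it spans."""
    root = ET.fromstring(kaf_xml)
    # Text layer: token id -> surface form.
    tokens = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    result = []
    for term in root.iter("term"):
        # The term layer points back into the text layer via <target id="..."/>.
        ids = [t.get("id") for t in term.find("span").iter("target")]
        result.append((term.get("tid"), term.get("lemma"), term.get("pos"),
                       [tokens[i] for i in ids]))
    return result
```

Because each layer only stores references to the layer below, a new module can add its own layer without rewriting the existing ones.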

NAF
NewsReader Annotation Format
- Extension of KAF
- Allows cross-document processing
  - e.g. event coreference
- IDs are converted into valid URIs
- Can store the same type of information provided by different tools
  - e.g. the output of two different POS taggers
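One way to turn document-local annotation IDs into globally valid URIs, so that annotations in one document can point to mentions in another, is to append them as fragments to the document's URI. The sketch below assumes exactly that scheme (document URI + "#" + ID); the actual NAF convention may differ, and the function names and example URI are hypothetical.

```python
from urllib.parse import quote, unquote, urldefrag


def to_global_uri(doc_uri, local_id):
    """Assumed scheme: document URI (without fragment) + '#' + percent-encoded ID."""
    base, _ = urldefrag(doc_uri)
    return f"{base}#{quote(local_id, safe='')}"


def from_global_uri(uri):
    """Split a global URI back into (document URI, local ID)."""
    base, frag = urldefrag(uri)
    return base, unquote(frag)
```

With globally unique identifiers, a cross-document relation such as event coreference can simply be stored as a pair of URIs.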

How the software is provided I
- All modules are publicly available on GitHub:
  - CLTL GitHub: http://github.com/cltl
  - NewsReader GitHub: http://github.com/newsreader
  - OpeNER GitHub: http://github.com/opener-project/

How the software is provided II
- Some modules are available as web services:
  - Exposed as REST web services
  - Accept an input stream (KAF/NAF)
  - Generate an output stream (KAF/NAF)
  - Easy to call from the command line with cURL
  - Easy to chain into module pipelines, the same way you build a Linux command pipeline
- http://wordpress.let.vupr.nl/web-services/
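Because each service reads KAF/NAF and writes KAF/NAF, a pipeline is just function composition, the Python counterpart of piping one cURL call into the next. This sketch assumes each service accepts the raw XML document as the body of an HTTP POST and returns the annotated document; the request format, the Content-Type header and the function names are assumptions, not the documented interface of the CLTL services.

```python
import urllib.request
from functools import reduce


def call_service(url, doc):
    """POST a KAF/NAF document to a REST service and return the response body.
    Assumption: the service takes the raw XML as the request body."""
    req = urllib.request.Request(
        url,
        data=doc.encode("utf-8"),
        headers={"Content-Type": "application/xml"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def run_pipeline(doc, urls, call=call_service):
    """Feed each service's output into the next one, like `a | b | c` in a shell."""
    return reduce(lambda d, u: call(u, d), urls, doc)
```

Usage, with placeholder endpoints only: run_pipeline(kaf, ["http://example.org/tokenizer", "http://example.org/pos-tagger"]). The call parameter lets you test the chaining logic without a network connection.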

Our software I
- General modules (integrated):
  - Tokenizers: whitespace-based, trained OpenNLP models...
  - Sentence splitters: rule-based, OpenNLP
  - POS taggers: TreeTagger, OpenNLP POS taggers
  - Chunker: trained on Alpino data with OpenNLP
  - Parsers: Alpino (nl), Stanford (en)
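To illustrate the simplest of these modules, here is a whitespace tokenizer combined with a naive rule-based sentence splitter that writes a KAF-style text layer. The splitting rule (a token ending in ".", "!" or "?" closes the sentence), the chosen attributes and the function name are assumptions for the sketch; the integrated tokenizers and OpenNLP models handle abbreviations and attached punctuation far more carefully.

```python
import re
import xml.etree.ElementTree as ET


def tokenize_to_kaf_text(raw):
    """Whitespace tokenizer plus naive sentence splitting into a <text> layer.
    Assumed rule: a token ending in '.', '!' or '?' closes the current sentence."""
    text = ET.Element("text")
    sent = 1
    for i, m in enumerate(re.finditer(r"\S+", raw), start=1):
        wf = ET.SubElement(text, "wf", wid=f"w{i}", sent=str(sent),
                           offset=str(m.start()), length=str(len(m.group())))
        wf.text = m.group()
        if m.group()[-1] in ".!?":
            sent += 1
    return text
```

Note that a pure whitespace tokenizer keeps punctuation attached ("rooms." is one token), which is one reason trained tokenizers are also integrated.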

Our software II
- General modules (developed by us):
  - WordNet tools
    - Functions to use a WordNet in LMF format
  - Word Sense Disambiguation systems
    - UKB: unsupervised
    - SVM: supervised (for nl, derived from DutchSemCor)
  - Multiword tagger
    - Detects multiword sequences of terms according to the WordNet
  - OntoTagger
    - Inserts (semantic) labels into the KAF representation on the basis of the lemma or WordNet synset representation of the text
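A multiword tagger of this kind can be sketched as a greedy longest-match over the term lemmas. Here the multiword list is a hypothetical in-memory set of lemma tuples standing in for the entries read from the WordNet, and the function name and max_len parameter are illustrative, not the module's actual interface.

```python
def tag_multiwords(lemmas, multiwords, max_len=4):
    """Greedy longest match, left to right.
    `multiwords` is a set of lemma tuples, e.g. {("ice", "cream")}.
    Returns (start, end) term spans, end exclusive, for each multiword found."""
    spans = []
    i = 0
    while i < len(lemmas):
        # Try the longest candidate first, down to two lemmas.
        for n in range(min(max_len, len(lemmas) - i), 1, -1):
            if tuple(lemmas[i:i + n]) in multiwords:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no multiword starts here
    return spans
```

Preferring the longest match means "ice cream cone" wins over "ice cream" when both are listed.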

Our software III
- General modules (developed by us):
  - Named Entity Recognizer
    - Detects dates and locations using specific resources + GeoNames
  - KyBot
    - Extracts tuples and relations using a set of profiles formulated with semantic and structural properties
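A resource-based recognizer for dates and locations can be sketched as a date pattern plus a gazetteer lookup. The tiny GAZETTEER set below is a hypothetical stand-in for a GeoNames-derived resource, and the single "day month-name year" pattern is an assumption; the actual module uses its own resources.

```python
import re

# Hypothetical stand-in for a GeoNames-derived gazetteer (lowercased names).
GAZETTEER = {"amsterdam", "nijmegen", "alicante"}

# One assumed date pattern: day, month name, year, e.g. "12 March 2013".
MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b", re.IGNORECASE)


def find_entities(text):
    """Return (type, surface string, character offset) triples, sorted by offset."""
    found = [("DATE", m.group(), m.start()) for m in DATE_RE.finditer(text)]
    for m in re.finditer(r"\w+", text):
        if m.group().lower() in GAZETTEER:
            found.append(("LOCATION", m.group(), m.start()))
    return sorted(found, key=lambda e: e[2])
```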

Our software IV
- OpeNER related (developed by us):
  - Hotel property tagger
    - Detects aspects related to cleanliness, staff, breakfast, rooms…
  - Term polarity tagger
    - Positive/negative terms, intensifiers, negators…
  - Opinion miner
    - Detects opinions: target + holder + expression
    - Two rule-based versions and one machine-learning version
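How intensifiers and negators interact with polar terms can be sketched with a small lexicon-based scorer. The lexicons, weights and scoring rule below are hypothetical toy values chosen for illustration; they are not the OpeNER polarity lexicons or the tagger's actual algorithm.

```python
# Hypothetical toy lexicons (not the OpeNER resources).
POLARITY = {"good": 1.0, "clean": 1.0, "friendly": 1.0, "dirty": -1.0, "rude": -1.0}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}
NEGATORS = {"not", "never"}


def score_sentence(lemmas):
    """Sum term polarities: intensifiers multiply the next polar term,
    a negator flips its sign; modifiers reset after each polar term."""
    total, weight, negated = 0.0, 1.0, False
    for lemma in lemmas:
        if lemma in NEGATORS:
            negated = True
        elif lemma in INTENSIFIERS:
            weight *= INTENSIFIERS[lemma]
        elif lemma in POLARITY:
            value = POLARITY[lemma] * weight
            total += -value if negated else value
            weight, negated = 1.0, False
    return total
```

Simple sign flipping is a crude model of negation ("not very dirty" becomes strongly positive), which illustrates why an opinion miner also needs target, holder and expression structure.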

Our software V
- NewsReader related (developed by us):
  - Discourse Module
    - Splits incoming texts into headers and paragraphs
  - Factuality Classifier
    - Classifies whether a statement is factual/probable/possible or not
  - Event Coreference
    - Compares descriptions of events within and across documents to decide whether they refer to the same events
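The comparison step can be sketched as a rule over event mentions: same predicate lemma, compatible time, and sufficient overlap of participants. The EventMention fields, the Jaccard measure and the 0.5 threshold are all hypothetical choices for illustration; the NewsReader module's actual features and decision procedure are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class EventMention:
    doc: str                    # source document (not used by the rule below)
    lemma: str                  # lemma of the event predicate, e.g. "acquire"
    participants: set = field(default_factory=set)  # argument lemmas or URIs
    time: str = ""              # normalised date, empty if unknown


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def corefer(m1, m2, threshold=0.5):
    """Hypothetical rule: same predicate lemma, no conflicting times,
    and participant overlap at or above the threshold."""
    if m1.lemma != m2.lemma:
        return False
    if m1.time and m2.time and m1.time != m2.time:
        return False
    return jaccard(m1.participants, m2.participants) >= threshold
```

Since the rule ignores the doc field, the same comparison applies to mention pairs within one document and across documents.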
