Natural Language Processing
for Irish
Teresa Lynn, PhD
Research Fellow
ADAPT Centre
Dublin City University
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
www.adaptcentre.ie
Outline
o Overview of Natural Language Processing (NLP)
o English - Irish machine translation
o NLP for User-generated Content
o Importance of technology for Minority Languages
What is Natural Language Processing?
“Using computers to analyse, derive meaning and understand text”
o An ‘attempt’ to understand how humans speak and use language
Why do computers need to understand language?
o Text summarisation
o Sentiment analysis
o Topic extraction (Information Retrieval)
o Grammar Checking
o Text Mining (Big Data problem)
o Machine Translation
o Question-Answering Systems
Challenges of processing language
• Human languages are:
– Elegant
– Efficient
– Flexible
– Complex
• One word/sentence may mean many things
• Many ways of saying the same thing
• Meaning depends on context
• Literal and figurative language (metaphor)
• Language and culture (different ways of conceptualising the same thing)
Ambiguous Headlines
Syntactic ambiguity:
EYE DROPS OFF SHELF
SQUAD HELPS DOG BITE VICTIM
ENRAGED COW INJURES FARMER WITH AXE
STOLEN PAINTING FOUND BY TREE
Semantic ambiguity:
PANDA MATING FAILS; VETERINARIAN TAKES OVER
SAFETY EXPERTS SAY SCHOOL BUS PASSENGERS SHOULD BE BELTED
POLICE BEGIN CAMPAIGN TO RUN DOWN JAYWALKERS
Source: http://www.alta.asn.au/events/altss_w2003_proc/altss/courses/somers/headlines.htm
What does a machine know about language?
Sentence = a string/sequence of characters:
“The man saw the boy with the telescope”
Who is doing what? Who has the telescope?
Syntactic Parsing 101
Who is doing what? Who has the telescope? = Parsing
“The man saw the boy with the telescope”
DET NOUN VERB DET NOUN PREP DET NOUN
Part-of-speech Tagging
Parsing 101 – ambiguity
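One way to make the ambiguity concrete is to count how many parse trees a small grammar assigns to the tag sequence DET NOUN VERB DET NOUN PREP DET NOUN. The sketch below is a minimal CKY-style chart counter; the six binary rules are assumptions made for illustration, not a grammar taken from the talk.

```python
from collections import defaultdict

# Hypothetical binary rules (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "DET", "NOUN"),
    ("NP", "NP", "PP"),    # "the boy with the telescope"
    ("VP", "VERB", "NP"),
    ("VP", "VP", "PP"),    # "saw ... with the telescope"
    ("PP", "PREP", "NP"),
]


def count_parses(tags, root="S"):
    """Count the distinct parse trees for a sequence of POS tags."""
    n = len(tags)
    chart = defaultdict(int)  # (start, end, symbol) -> number of trees
    for i, tag in enumerate(tags):
        chart[i, i + 1, tag] = 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]


tags = "DET NOUN VERB DET NOUN PREP DET NOUN".split()
print(count_parses(tags))
```

Under these assumed rules the sentence receives two trees: one attaches ‘with the telescope’ to the verb (the man used it to see), the other attaches it to ‘the boy’ (the boy has it).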
Traditional Approach – Rules
‘I like ice-cream in summer’
‘I like summer in ice-cream’ ….??
Syntactic Parsing Rules:
S → NP VP
S → NP VP PP
NP → Noun | Pronoun
VP → Verb NP | Verb PP
PP → Preposition Noun
Noun → ‘ice-cream’ | ‘summer’
Pronoun → ‘I’
Verb → ‘like’
Preposition → ‘in’
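The rules above can be run as a small recogniser to show why hand-written rules overgenerate: both example sentences are accepted, although only one makes sense. This is a minimal sketch in Python; the grammar and lexicon are transcribed from the slide, while the function names are illustrative.

```python
# Grammar and lexicon transcribed from the slide.
GRAMMAR = {
    "S": [["NP", "VP"], ["NP", "VP", "PP"]],
    "NP": [["Noun"], ["Pronoun"]],
    "VP": [["Verb", "NP"], ["Verb", "PP"]],
    "PP": [["Preposition", "Noun"]],
}
LEXICON = {
    "Noun": {"ice-cream", "summer"},
    "Pronoun": {"I"},
    "Verb": {"like"},
    "Preposition": {"in"},
}


def ends(symbol, tokens, start):
    """Yield every position where `symbol` can end if it starts at `start`."""
    if symbol in LEXICON:
        if start < len(tokens) and tokens[start] in LEXICON[symbol]:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:
            positions = {e for p in positions for e in ends(part, tokens, p)}
        yield from positions


def accepts(sentence):
    """True if the whole sentence can be derived from S."""
    tokens = sentence.split()
    return len(tokens) in set(ends("S", tokens, 0))


print(accepts("I like ice-cream in summer"))
print(accepts("I like summer in ice-cream"))
```

The grammar only checks word categories, so it cannot tell the sensible sentence from the nonsensical one.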
Machine Learning – data driven approaches
Supervised Machine Learning requires a LOT of:
• structured data
• annotated data
• reliable data
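To illustrate what ‘annotated data’ means in practice, here is a minimal sketch of a most-frequent-tag baseline learned from hand-tagged sentences. The two training sentences and the fallback tag are assumptions for illustration; this is a simple baseline, not the tagger used in the work described here.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Learn the most frequent tag for each word seen in the annotated data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def predict(model, words, default="NOUN"):
    """Tag each word; unseen words fall back to `default` (an assumption)."""
    return [model.get(word.lower(), default) for word in words]


# Hypothetical hand-annotated training data.
TRAIN = [
    [("The", "DET"), ("man", "NOUN"), ("saw", "VERB"), ("the", "DET"), ("boy", "NOUN")],
    [("The", "DET"), ("boy", "NOUN"), ("saw", "VERB"), ("the", "DET"), ("telescope", "NOUN")],
]

model = train(TRAIN)
print(predict(model, ["the", "boy", "saw", "the", "man"]))
print(predict(model, ["chonaic"]))  # unseen word: the model can only guess
```

Any word missing from the annotated data gets the fallback tag, which is why the approach needs a lot of data.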
Machine Learning – data sparsity
Source: expertsystem.com, redbubble.com
Irish – Long distance dependencies
Word order: VSO (verb-subject-object)
English: ‘The boy who was looking through the telescope yesterday on the street saw the man’
Irish: Chonaic an buachaill a bhí ag féachaint tríd an teileascóp inné ar an tsráid an fear sin
Literal translation: [Saw]v [the boy who was looking through the telescope yesterday on the street]subj [the man]obj
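A rough way to see the effect is to measure how many tokens separate the verb from the head of its object in each sentence. The sketch below uses raw token positions as a crude proxy for dependency length; the helper name and the choice of head words are assumptions made for illustration.

```python
def distance(tokens, head, dependent):
    """Token distance between the first occurrences of two words."""
    return abs(tokens.index(head) - tokens.index(dependent))


# Example sentences from the slide ('féachaint' written with its accent).
english = (
    "The boy who was looking through the telescope "
    "yesterday on the street saw the man"
).split()
irish = (
    "Chonaic an buachaill a bhí ag féachaint tríd an teileascóp "
    "inné ar an tsráid an fear sin"
).split()

print(distance(english, "saw", "man"))     # verb to object head, English
print(distance(irish, "Chonaic", "fear"))  # verb to object head, Irish
```

In the English sentence the object sits two tokens after the verb; in the Irish sentence the whole subject phrase comes between them, so the two are fifteen tokens apart.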
Irish Language Features – Sparsity
Word Order
English: ‘I saw the boy’
Irish: Chonaic mé an buachaill
Literal translation: Saw I the boy
Irish Language Features
Vowel Harmony
Caithim – ‘I spend’
Casaim – ‘I turn’
Rithfinn – ‘I would run’
D’íosfainn – ‘I would eat’
Outline
o Overview of Natural Language Processing (NLP)
o English - Irish machine translation
o NLP for User-generated Content
o Importance of technology for Minority Languages
User-Generated Content & NLP
Where do we find UGC?
Blogs
Social Media sites
Micro-blogs (Twitter)
Informal Emails
What is difficult about UGC for NLP?
Unstructured Text
Ungrammatical
Text Speak
Difficult to predict
Various symbols (e.g. Emojis, Hashtags)
My Work – Minority Language Twitter
Social Media Bandwagon
My Work – Minority Language Twitter
o Code-switching
o Diacritics
o Verb drop
o Spacing issue
o Phonetic spelling
o Abbreviations
grma -> go raibh maith agat
t7ain -> tseachtain
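Abbreviations like these can be expanded with a lookup table before tagging. This is a minimal sketch in Python: the two entries come from the slide, while the function name and the whitespace tokenisation are assumptions, and a real normaliser would need a far larger lexicon.

```python
# Abbreviation lexicon; both entries are the examples from the slide.
ABBREVIATIONS = {
    "grma": "go raibh maith agat",
    "t7ain": "tseachtain",
}


def expand(tweet):
    """Replace known abbreviations, leaving every other token untouched."""
    return " ".join(
        ABBREVIATIONS.get(token.lower(), token) for token in tweet.split()
    )


print(expand("GRMA a chara"))
print(expand("an t7ain seo"))
```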
My Work – Minority Language Twitter
Goals:
o Build a corpus of POS-tagged Irish tweets
o Train a statistical POS tagger for Irish tweets
o Assess how we can leverage existing resources
o Examine the impact of noisy UG text on existing resources
Crawled corpus of Irish Tweets
POS-tagged tweets (with standard POS-tagger)
"<RT>"
"RT" Guess Abr
"<@NiallSF>"
"@NiallSF" Guess Unknown Noun
"<:>"
":" Punct Int
"<Sásta>"
"sásta" Adj Comp
"<go>"
"go" Part Vb Cmpl
"go" Part Vb Subj
"<raibh>"
"bí" Verb VI PastInd Dep Ecl
"bí" Verb VI PresSubj Ecl
"<sé>"
"sé" Pron Pers 3P Sg Masc Sbj
"<suaimhneach>"
"suaimhneach" Adj Base
POS-tags for Irish Tweets
Adapted from Gimpel et al. (2011)
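Collapsing the standard tagger's fine-grained analyses into a small tweet tagset can be sketched as a lookup plus a few token-level rules. In the sketch below the input is the tagger output shown above and the single-letter tags are those used for tweets in this talk; the rule order, the choice of the first analysis for ambiguous tokens, and the ‘?’ fallback are assumptions, not the actual mapping procedure.

```python
# Coarse tags keyed by the main category in the fine-grained analysis.
COARSE = {"Adj": "A", "Part": "T", "Verb": "V", "Pron": "O"}

# (token, first analysis) pairs from the standard tagger output.
TAGGED = [
    ("RT", "Guess Abr"),
    ("@NiallSF", "Guess Unknown Noun"),
    (":", "Punct Int"),
    ("Sásta", "Adj Comp"),
    ("go", "Part Vb Cmpl"),
    ("raibh", "Verb VI PastInd Dep Ecl"),
    ("sé", "Pron Pers 3P Sg Masc Sbj"),
    ("suaimhneach", "Adj Base"),
]


def map_tweet(tagged):
    """Map each (token, analysis) pair to a single-letter tweet tag."""
    coarse = []
    for i, (token, analysis) in enumerate(tagged):
        if token.startswith("@"):
            coarse.append("@")  # at-mention
        elif token == "RT":
            coarse.append("~")  # retweet marker
        elif token == ":" and i >= 2 and tagged[i - 2][0] == "RT":
            coarse.append("~")  # colon closing an "RT @user :" prefix
        else:
            features = analysis.split()
            coarse.append(next((COARSE[f] for f in features if f in COARSE), "?"))
    return coarse


print(map_tweet(TAGGED))
```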
Mapped POS tags
"<RT>"
"RT" ~
"<@NiallSF>"
"@NiallSF" @
"<:>"
":" ~
"<Sásta>"
"sásta" A
"<go>"
"go" T
"<raibh>"
"bí" V
"<sé>"
"sé" O
"<suaimhneach>"
"suaimhneach" A
Application of our work
o Sociolinguistic studies
o Improved automated translation of tweets
o Improved sentiment analysis
o Cross-lingual social media analysis
Outline
o Overview of Natural Language Processing (NLP)
o NLP for UGC
o Irish UGC
o Importance of technology for Minority Languages
Image: National Folklore Collection UCD
Source: indigenoustweets.com
Challenging beliefs through technology
Source: indigenoustweets.com
Conclusion
Harness technology to encourage language use:
o at school
o at home (phone technology, games)
o at work (through content creation tools, MT systems)
o online
Influence Government Policy with statistics gathered through:
o online use analysis
o demand for technology
o empirically demonstrating evolution of language
#GRMA
Go Raibh Maith Agaibh
Thank you!
Questions?
Contact: teresa.lynn@adaptcentre.ie
Language at Risk
Need to ensure continuing language usage through technology:
o Edutainment packages
o Word processing tools
o Webpage translation
o Search engines
o Games
o Social media
o Summarise discussions
o Monitor user sentiment
o Track misuse
Source: http://www.leuphana.de/institute/ies/llt2015.html
