The SAWA Corpus - A Parallel Corpus English - Swahili
Guy De Pauw (guy.depauw@aflat.org)
Peter Waiganjo Wagacha (waiganjo@aflat.org)
Gilles-Maurice de Schryver (gillesmaurice.deschryver@aflat.org)
Resource-scarceness
Data-driven approaches
Machine Translation
Data-driven: learn translation from examples. !! Parallel corpus !!
Parallel Corpus
Example
3 phases
Data Collection
Available data in SAWA Corpus
All manually sentence aligned!

                     English     Kiswahili   English   Kiswahili
                     Sentences   Sentences   Words     Words
New Testament        16.4k       16.3k       189.2k    151.1k
Quran                14.3k       14.5k       165.5k    124.3k
Declaration of HR    0.2k        0.2k        1.8k      1.8k
Kamusi.org           5.6k        5.6k        35.5k     26.7k
Movie Subtitles      9.0k        9.0k        72.2k     58.4k
Investment Reports   3.2k        3.1k        52.9k     54.9k
Local Translator     1.5k        1.6k        25.0k     25.7k
Total                50.2k       50.3k       542.1k    442.9k
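The figures on this slide can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the slide (in thousands), and it assumes that the three rows listing a single sentence count (Declaration of HR, Kamusi.org, Movie Subtitles) share that count across both languages. The names `CORPUS`, `column_totals` and `words_per_sentence` are illustrative and not part of the SAWA tooling.

```python
# Figures copied from the slide, in thousands (k).
# Columns: English sentences, Kiswahili sentences, English words, Kiswahili words.
# Assumption: rows with one sentence count on the slide share it across both languages.
CORPUS = {
    "New Testament":      (16.4, 16.3, 189.2, 151.1),
    "Quran":              (14.3, 14.5, 165.5, 124.3),
    "Declaration of HR":  (0.2, 0.2, 1.8, 1.8),
    "Kamusi.org":         (5.6, 5.6, 35.5, 26.7),
    "Movie Subtitles":    (9.0, 9.0, 72.2, 58.4),
    "Investment Reports": (3.2, 3.1, 52.9, 54.9),
    "Local Translator":   (1.5, 1.6, 25.0, 25.7),
}

# The "Total" row as reported on the slide.
REPORTED_TOTAL = (50.2, 50.3, 542.1, 442.9)


def column_totals(rows):
    """Sum each of the four columns, rounded to one decimal like the slide."""
    return tuple(round(sum(column), 1) for column in zip(*rows.values()))


def words_per_sentence(words_k, sentences_k):
    """Average sentence length; the 'k' units cancel out."""
    return words_k / sentences_k


if __name__ == "__main__":
    totals = column_totals(CORPUS)
    print("totals match slide:", totals == REPORTED_TOTAL)
    en_sent, sw_sent, en_words, sw_words = totals
    print("English words per sentence:  ", round(words_per_sentence(en_words, en_sent), 1))
    print("Kiswahili words per sentence:", round(words_per_sentence(sw_words, sw_sent), 1))
```

Under the shared-count assumption the column sums reproduce the reported Total row. The Kiswahili side also comes out shorter in words per sentence, which fits the later slides where a single Kiswahili verb corresponds to several English words.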
Thanks to Mahmoud Shokrollahi-Far, University College of Nabiye Akram (Iran)
Thanks to Dr. James Omboga Zaja, University of Nairobi
Word alignment
English: No she ' s uh , , up north
Kiswahili: La , , , yuko , aa juu kaskazini
Word alignment
English: You caught me skiving , I ' m afraid .
Kiswahili: Samahani , umenidaka nikihepa .
Current results
Precision   Recall   F(β=1)
39.4%       44.5%    41.79%
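For readers unfamiliar with the metric, here is a minimal sketch of how precision, recall and F(β=1) could be computed for word alignment. It assumes that an alignment is a set of (English index, Kiswahili index) links scored against a gold-standard set; the slides do not spell out the exact evaluation protocol, and the function names are illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def score_alignment(predicted, gold):
    """Score predicted alignment links against gold links.

    Both arguments are sets of (english_index, kiswahili_index) pairs.
    This link-level scoring is an assumption, not the authors' documented setup.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall)


if __name__ == "__main__":
    # Hypothetical toy links, for illustration only.
    predicted = {(0, 0), (1, 1), (2, 1)}
    gold = {(0, 0), (1, 1), (3, 2)}
    print(score_alignment(predicted, gold))

    # Plugging in the precision and recall reported on the slide.
    print(f"{f_beta(0.394, 0.445):.2%}")
```

With the reported precision of 39.4% and recall of 44.5%, the harmonic mean comes to roughly 41.8%, in line with the F value on the slide.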
Alignment problems
Kiswahili: nimemkatalia
English: I have turned him down
Morphological decomposition
English: I have turned him down
Kiswahili: ni+ me+ m+ katalia
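The decomposition on this slide can be imitated with a toy prefix splitter. This is a sketch under strong assumptions, not the morphological analyzer used for the SAWA corpus: the three prefix tables below are tiny, hand-picked and hypothetical, and the splitter peels off at most one subject, one tense and one object prefix, in that order.

```python
# Tiny, hand-picked prefix tables (hypothetical; real Kiswahili morphology is far richer).
# Longer prefixes are listed first so they are tried before single letters.
SUBJECT_PREFIXES = ("ni", "tu", "wa", "u", "a", "m")
TENSE_PREFIXES = ("me", "na", "li", "ta", "ki")
OBJECT_PREFIXES = ("ni", "ku", "tu", "wa", "m")

MIN_STEM_LENGTH = 2  # never strip a prefix if fewer characters than this would remain


def segment(verb):
    """Greedily split a verb into subject+ tense+ object+ stem.

    Each slot is optional and is filled by the first listed prefix that matches.
    """
    morphemes = []
    rest = verb
    for slot in (SUBJECT_PREFIXES, TENSE_PREFIXES, OBJECT_PREFIXES):
        for prefix in slot:
            if rest.startswith(prefix) and len(rest) - len(prefix) >= MIN_STEM_LENGTH:
                morphemes.append(prefix + "+")
                rest = rest[len(prefix):]
                break
    morphemes.append(rest)
    return morphemes


if __name__ == "__main__":
    for word in ("nimemkatalia", "umenidaka", "nikihepa"):
        print(word, "->", " ".join(segment(word)))
```

On the slide's example the splitter yields `ni+ me+ m+ katalia`, so each English word has its own Kiswahili token to align with. A greedy splitter like this over-segments stems that happen to begin with a listed prefix, which is one reason a real system needs a proper morphological analyzer.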
Current results
Precision   Recall   F(β=1)
50.2%       64.5%    55.8%
Future work
Conclusion
