Arabic Script SHOULD NOT be so Scary!
What else Unicode still needs to do for the Perso-Arabic script
Behnam Esfahbod
behnam.es
37th Internationalization and Unicode Conference – October 2013 – Santa Clara, CA
In the past decade...

The FarsiWeb Project
● National Standard for Persian Unicode text
● Standard Persian keyboard layouts on all major OSs
● Standard-based Open Source Persian TTF fonts
● Persian support in GNOME & Mozilla projects

IRNIC, The .IR ccTLD Registry
● Deploy Persian (second-level) IDNs
● Apply for the IDN ccTLD (actually, 2 of them!)

Persian Computing Community
● Over 300 engineers, designers,
● SIGs, e.g. Persian Typography

ICANN & IETF
● IDN Variant Issues Project
● MESWG Task Force on Arabic-Script IDNs
Previously on Unicode
Perso-Arabic Script

Mainly Arabic and Indo-Iranian languages
● Is the basis for many alphabets
● Persian, Urdu, Pashto, …, and Arabic!

No alphabet uses all kinds of shapes
● Four-dots is not used in Arabic or Persian
● But 99% of the shapes are recognized by everyone

Many writing styles
● Naskh, Nasta’liq, etc.

Not all alphabets use all the features
● The Arabic language/alphabet traditionally doesn’t use ZWNJ
  Rooted in the language properties and its conjugation forms
● But recently it is being used for non-Arabic words
  IBM ⇒ آی‌بی‌ام
Semantic Encoding
Writing Direction
● Letters: right-to-left
● Digits: left-to-right
⇒ Unicode Bidirectional Algorithm (Bidi/UBA)
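
The letters-versus-digits split above is visible in the bidirectional class that Unicode assigns to each character. A minimal sketch using Python's standard `unicodedata` module follows; the sample characters and the helper name `bidi_class` are chosen here for illustration and are not from the slides.

```python
import unicodedata

def bidi_class(ch: str) -> str:
    """Return the Unicode Bidi_Class of a single character."""
    return unicodedata.bidirectional(ch)

# Sample characters picked for illustration.
samples = {
    "ARABIC LETTER SEEN": "\u0633",
    "EXTENDED ARABIC-INDIC DIGIT ONE (Persian)": "\u06f1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
    "LATIN SMALL LETTER A": "a",
}

for label, ch in samples.items():
    # AL = Arabic letter (right-to-left), EN = European number,
    # AN = Arabic number, L = left-to-right letter.
    print(f"{label}: {bidi_class(ch)}")
```

The Bidi algorithm uses these classes to lay out digit runs left-to-right inside right-to-left text, without any direction markers stored alongside the letters.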

س ل ا م
One code-point for each letter
⇒ Unicode Arabic Joining Algorithm
سلـام
سلام
● Joining Control Characters
Dual-Joining (yellow) & Right-Joining (blue)
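
Semantic encoding means the stored text holds one code point per letter, while the contextual shape is picked at rendering time. As a hedged illustration, the sketch below takes a word spelled with legacy presentation-form code points (one per glyph shape) and folds it back to the logical letters with NFKC. The word and the helper names are assumptions made for this example.

```python
import unicodedata

# Logical (semantic) spelling: SEEN, LAM, ALEF, MEEM.
LOGICAL = "\u0633\u0644\u0627\u0645"

# The same word spelled glyph by glyph with presentation forms:
# SEEN INITIAL FORM, LIGATURE LAM WITH ALEF FINAL FORM, MEEM ISOLATED FORM.
SHAPED = "\ufeb3\ufefc\ufee1"

def to_logical(text: str) -> str:
    """Fold presentation forms back to the base letters (compatibility mapping)."""
    return unicodedata.normalize("NFKC", text)

def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(SHAPED))
print(code_points(to_logical(SHAPED)))
print(to_logical(SHAPED) == LOGICAL)
```

Searching, sorting, and comparison only have to deal with the four logical letters; which joining form each one takes is left to the renderer.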
Special Characters

1. U+0640 ARABIC TATWEEL
● a.k.a. Kashida
● کشیــــــــــــــــــده ⇒ کشیده

2. U+200C ZERO WIDTH NON-JOINER
● a.k.a. ZWNJ
● نامه‌های ⇒ نامههای

3. U+200D ZERO WIDTH JOINER
● a.k.a. ZWJ
● ه‍.ش. ⇒ ه.ش.
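
The three special characters can be inspected directly from the Unicode Character Database. The sketch below uses Python's `unicodedata`; the constant names and the `describe` helper are introduced here for illustration only.

```python
import unicodedata

KASHIDA = "\u0640"  # ARABIC TATWEEL
ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"      # ZERO WIDTH JOINER

def describe(ch: str) -> str:
    """Return code point, official name, and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"

for ch in (KASHIDA, ZWNJ, ZWJ):
    print(describe(ch))

# The same letters with and without ZWNJ are different strings,
# even though the ZWNJ itself has no visible width.
with_zwnj = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0647\u0627\u06cc"
without_zwnj = with_zwnj.replace(ZWNJ, "")
print(with_zwnj == without_zwnj, len(with_zwnj), len(without_zwnj))
```

Kashida is classified as a modifier letter, while ZWNJ and ZWJ are format characters. That difference is one reason tools tend to treat them inconsistently.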
‍
Status Quo

1. Kashida
● On the keyboard: too similar to Hyphen
● Makes the text hard to process (search, …)

2. ZWNJ
● Inaccessible in non-standard keyboard layouts
● Confuses search engines and applications
● IDNA2003
● UAX #31 (Unicode Identifier and Pattern Syntax)
● IDNA2008
● UTR #36 (Unicode Security Considerations)

3. ZWJ
● Too little use
  But by the time it’s needed, the user is already tired of the other issues
Be Stylish
Typographic Styles

Western styles for Latin-like scripts
● Bold
● Italic & Slanted (including Iranic: left-slanted)
● Underline
Question:
When, how, and by whom did these three become the de facto standard?

Justification & Elongation
● Traditionally used in manuscripts
● Used to be available in movable type too
● Recent study shows Elongation is still the right way to emphasize text in Perso-Arabic script

© 2013 Nasser Hadjloo, “Curious Case of Word Stylers”, TEDxTehran
1000 persons × 10 emphasis methods
⇒ Justification & Elongation lead the way!
Typographic Emphasis

Letter Spacing
LATIN SCRIPT

Elongation + Word Spacing
خــــط فـــــارســــی

CSS
● “letter-spacing” is useless (خ ‍ط)
● “text-justify” works, but only for justification
● ⇒ No sane way to do elongation
Verbal Emphasis

Letter Repetition
Latin script… Noooo!

Elongation
خط فارسی… بـلـــــی!

Keyboard
● Direct mapping of characters to typographic elements (glyphs) is bad!
● Pressing a Kashida key a few times to emphasize some letter: bad again!
Text Input & Storage
Joining Control

ZWNJ is very common in Persian
● Necessary to create compound words
● And mandatory for some words
  خانه‌های
  بی‌بی
● But preferential in some others
  خط‌کش
  می‌شود

Larry Tesler: “no modes”
● Mode: a distinct setting in which the same user input will produce perceived results different from those it would in other settings
Modeless Input Methods

Current keyboard techniques
● Based on the typewriter technologies
● Based on the needs of Latin script

Kashida insertion should be implicit
● Three key presses give you three letters
● Three letters need three key presses
  [ب]...............[ل]....[ه]
  بـــــــــــــــــــلــــــه

ZWNJ insertion should be implicit
● Instead of “mode”, we need a “modifier”
● Shift key would make much sense!
Note: only applies to dual-joining letters, if followed by a joining letter
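
The modifier idea can be sketched as a tiny input buffer: holding the modifier while typing a letter asks for a break before the next letter, and a ZWNJ is stored only when it would actually change the rendering (the previous letter is dual-joining and the next one joins). Everything below is an assumption made for illustration: the class name `ModelessBuffer`, the `shift` parameter, and the two joining tables, which are a small hand-picked subset of letters rather than the full Unicode joining data. Combining marks are ignored.

```python
ZWNJ = "\u200c"

# Illustrative subset only: letters that join to the preceding letter but not the next.
RIGHT_JOINING = frozenset(
    "\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648\u0698"
)
# Illustrative subset only: letters that join on both sides.
DUAL_JOINING = frozenset(
    "\u0626\u0628\u062a\u062b\u062c\u062d\u062e\u0633\u0634\u0635\u0636\u0637"
    "\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u064a"
    "\u067e\u0686\u06a9\u06af\u06cc"
)

class ModelessBuffer:
    """Hypothetical text buffer where a modifier key requests a joining break."""

    def __init__(self) -> None:
        self.text = ""
        self._break_pending = False

    def press(self, letter: str, shift: bool = False) -> None:
        """Type one letter; shift=True means 'do not join this letter to the next'."""
        if self._break_pending and self._zwnj_is_visible(letter):
            self.text += ZWNJ
        self.text += letter
        self._break_pending = shift

    def _zwnj_is_visible(self, nxt: str) -> bool:
        # A ZWNJ matters only between a dual-joining letter and a letter
        # that would otherwise connect to it.
        prev = self.text[-1:]
        return prev in DUAL_JOINING and (nxt in DUAL_JOINING or nxt in RIGHT_JOINING)

def type_keys(keys: list[tuple[str, bool]]) -> str:
    """Feed (letter, shift) pairs into a fresh buffer and return the stored text."""
    buf = ModelessBuffer()
    for letter, shift in keys:
        buf.press(letter, shift)
    return buf.text
```

With this design the user never types an invisible character directly, and a modifier press after a right-joining letter, or before a space, stores nothing.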
Obstacles

Input
● Need to remove unnecessary ZWNJ dynamically
● Hard to implement in keyboard engines
● Has to be implemented in an IME

Processing
● Search
  ● Kashida should be ignored
  ● ZWNJ should be respected, sometimes
● Identifiers
  ● Standards have to deal with them, individually!

Output
● A few apps still have problems with ZWNJ
● Kashida not handled correctly in most fonts
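
For the search case, ignoring Kashida amounts to comparing folded keys instead of raw strings. A minimal sketch follows; `search_key` and `matches` are hypothetical helper names, and ZWNJ is deliberately left untouched because the slides only say it should be respected "sometimes".

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL

def search_key(text: str) -> str:
    """Fold text for matching: drop every Kashida, keep everything else (including ZWNJ)."""
    return text.replace(KASHIDA, "")

def matches(query: str, document: str) -> bool:
    """Substring match that is insensitive to elongation."""
    return search_key(query) in search_key(document)

plain = "\u06a9\u0634\u06cc\u062f\u0647"                 # the word without elongation
elongated = "\u06a9\u0634\u06cc" + KASHIDA * 5 + "\u062f\u0647"  # same word, elongated

print(matches(plain, elongated))
print(matches(elongated, plain))
```

Folding both sides keeps the stored text intact, so the elongation the author typed is still there for display.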
Multilingual Environments
HCI Model & Language

Language in the computer

Language in the user’s mind
Encoding Challenges
Characters with similar shapes (not in all joining forms)
Arabic Yeh vs. Persian Yeh
© SIL International
Preferred shapes of the Heh letter family, by language
Internationalized Identifiers

Confused about Kashida
● Use NFKC to get rid of it

Confused about ZWNJ
● Some protocols mandate some rules
  ● And FAIL if the input doesn’t comply
  ● UAX #31 (Unicode Identifier and Pattern Syntax) rule 2.3.A1
  ● UTS #46 ⇢ IDNA2008 ⇢ UAX #31
● No standard approach to make it more user-friendly
  ● Although the semantics are obvious to the user

No harmony in same-shape problems
● Protocols barely talk about it
● Policy-making works only for a few cases
● Applications decide how to handle it, if they do
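
One way an application might pre-process an identifier before handing it to a stricter protocol layer is sketched below. It is an assumption-laden illustration, not a rule from any of the standards named above: `normalize_identifier` is a hypothetical helper. It strips U+0640 explicitly rather than relying on normalization alone, because, as far as the Unicode Character Database goes, a bare U+0640 carries no compatibility decomposition.

```python
import unicodedata

KASHIDA = "\u0640"  # ARABIC TATWEEL

def normalize_identifier(label: str) -> str:
    """Hypothetical identifier folding: NFKC, then drop Kashida, then NFKC again.

    The second pass re-composes any sequence that only became adjacent
    once the Kashida between its parts was removed.
    """
    compat = unicodedata.normalize("NFKC", label)
    stripped = compat.replace(KASHIDA, "")
    return unicodedata.normalize("NFKC", stripped)

plain = "\u0633\u0644\u0627\u0645"
elongated = "\u0633" + KASHIDA * 3 + "\u0644\u0627\u0645"

print(normalize_identifier(elongated) == plain)
```

ZWNJ is left alone here; whether a given ZWNJ is acceptable is a contextual decision that the protocols listed above make with their own rules.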
Developing a Solution
Existing Similar Methods

Case-Mapping
● Letter cases (different representation of the same concept)
● Useful for western scripts (Latin, Cyrillic, Greek, …)
  ● Doesn’t work for the other scripts
● Language-dependent

Normalization Form Canonical
● Deal with encoding issues
  ● Ensure backward-compatibility
● Allow more expansion of UCD
  ● Better forward-compatibility

Normalization Form Compatibility
● Works well for Arabic script (Kashida)
● But too much damage to other scripts
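
The contrast between the canonical and compatibility forms is easy to see on a few non-Arabic samples. The sketch below uses Python's `unicodedata.normalize`; the sample strings are picked here for illustration and show the kind of distinctions that compatibility normalization erases.

```python
import unicodedata

def nfc(text: str) -> str:
    """Canonical composition: only fixes alternative encodings of the same text."""
    return unicodedata.normalize("NFC", text)

def nfkc(text: str) -> str:
    """Compatibility composition: also folds away formatting distinctions."""
    return unicodedata.normalize("NFKC", text)

samples = [
    "e\u0301",   # e + combining acute accent
    "x\u00b2",   # x followed by SUPERSCRIPT TWO
    "\ufb01le",  # LATIN SMALL LIGATURE FI + "le"
    "\u2460",    # CIRCLED DIGIT ONE
]

for s in samples:
    print(repr(s), "NFC:", repr(nfc(s)), "NFKC:", repr(nfkc(s)))
```

NFC only unifies the two spellings of the accented letter, while NFKC additionally turns the superscript, the ligature, and the circled digit into plain characters. That loss is acceptable for some identifiers but not for general text.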
The Solution SHALL be able to…
● Remove unnecessary (invisible) ZWNJs
● Remove Kashida characters
● Place text into a specific language
  ● Maintaining the expected shape
● Stay consistent when language is not specified
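
The first requirement can be sketched as a filter that keeps a ZWNJ only where it changes the rendering: after a dual-joining letter and before a letter that would otherwise connect to it. The function name `strip_invisible_zwnj` and the two joining tables are assumptions made for illustration. The tables cover only a small hand-picked set of letters, and combining marks between letters are not handled.

```python
ZWNJ = "\u200c"

# Illustrative subset only: letters that join to the preceding letter but not the next.
RIGHT_JOINING = frozenset(
    "\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648\u0698"
)
# Illustrative subset only: letters that join on both sides.
DUAL_JOINING = frozenset(
    "\u0626\u0628\u062a\u062b\u062c\u062d\u062e\u0633\u0634\u0635\u0636\u0637"
    "\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u064a"
    "\u067e\u0686\u06a9\u06af\u06cc"
)
JOINS_TO_PREVIOUS = DUAL_JOINING | RIGHT_JOINING

def strip_invisible_zwnj(text: str) -> str:
    """Drop every ZWNJ that has no visible effect; collapse runs of ZWNJ to one."""
    out: list[str] = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != ZWNJ:
            out.append(text[i])
            i += 1
            continue
        # Treat a run of consecutive ZWNJs as a single candidate.
        j = i
        while j < n and text[j] == ZWNJ:
            j += 1
        prev = out[-1] if out else ""
        nxt = text[j] if j < n else ""
        if prev in DUAL_JOINING and nxt in JOINS_TO_PREVIOUS:
            out.append(ZWNJ)
        i = j
    return "".join(out)
```

Because the decision depends only on the neighbouring letters, the filter needs no language information and is idempotent: running it a second time changes nothing.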
Arabic Shape Mapping
● Language-less Basic Normalization
  ● Remove any unnecessary (invisible) ZWNJ
● Language-less Identifier Normalization
  ● Basic normalization
  ● Remove any Kashida
● Language-based Shape Mapping
  ● Map characters with same joining-form shapes to the right one for the target language
● Language-less Shape Mapping
  ● Not straightforward at all!
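
Language-based shape mapping can be pictured as a per-language translation table. The sketch below is an illustration under stated assumptions, not the mapping proposed in the talk: the function `map_shapes`, the language tags, and the table contents are hypothetical. The Yeh pair comes from the Encoding Challenges slide; the Kaf/Keheh pair is added here as a further commonly cited example. Letters such as Alef Maksura and the Heh family are not handled.

```python
# Arabic letter -> Persian counterpart with the same initial/medial shape.
ARABIC_TO_PERSIAN = {
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (assumption)
}
PERSIAN_TO_ARABIC = {v: k for k, v in ARABIC_TO_PERSIAN.items()}

# Hypothetical language tags mapped to translation tables.
TABLES = {
    "fa": str.maketrans(ARABIC_TO_PERSIAN),
    "ar": str.maketrans(PERSIAN_TO_ARABIC),
}

def map_shapes(text: str, lang: str) -> str:
    """Map same-shape letters to the preferred code point for the target language.

    Text in an unknown language is returned unchanged, since the
    language-less case has no obvious right answer.
    """
    table = TABLES.get(lang)
    return text.translate(table) if table is not None else text

arabic_spelling = "\u0639\u0644\u064a"   # ends with ARABIC LETTER YEH
persian_spelling = "\u0639\u0644\u06cc"  # ends with ARABIC LETTER FARSI YEH

print(map_shapes(arabic_spelling, "fa") == persian_spelling)
print(map_shapes(persian_spelling, "ar") == arabic_spelling)
```

A blanket table like this ignores the "not in all joining forms" caveat: the two Yehs look the same in initial and medial position but differ in final and isolated position, so a real mapping would have to decide what the user expects to see there.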
‫‪T H E‬‬
‫‪E N D‬‬
‫پـــــــــــــــــــــــایان‬
