Arabic Spell Checkers
Natural Language Processing - CS465
Supervised by:
Dr. Amal Al-Saif
Done by:
Hanan Al-Mohammadi
Mona Al-Mutairi
Imam Muhammad ibn Saud University, Department of
Computer Science and Information System
First Paper
“An Approach for Analyzing and Correcting
Spelling Errors for Non-native Arabic learners”
o Works in a question-based environment: the learner answers spelling questions.
• Error Detection
Two types of errors:
1. Ill-formed word errors.
o Detected using Buckwalter’s Arabic Morphological Analyzer.
Ex. ‘ ’ is an ill-formed spelling of the word ‘ ’.
2. Semantically incorrect errors.
Ex. A spelling question displays a happy face to a learner and asks him to
write a word that describes the picture, but he enters ‘ ’/helped instead of
‘ ’/happy. The answer is a valid word with the wrong meaning.
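The two error types can be told apart with a simple decision, sketched below under assumptions: a toy set of valid words stands in for Buckwalter's analyzer (a word with no analysis is ill-formed), and each question is assumed to store its expected answer. The names VALID_WORDS and detect_error are hypothetical, and English placeholders replace the lost Arabic examples.

```python
# Hypothetical stand-in for Buckwalter's analyzer: any word listed here
# is treated as having at least one morphological analysis.
VALID_WORDS = {"happy", "helped", "school"}


def detect_error(answer: str, expected: str) -> str:
    """Classify a learner's answer to a picture-based spelling question."""
    if answer == expected:
        return "correct"
    if answer not in VALID_WORDS:
        # No analysis found: an ill-formed word error.
        return "ill-formed"
    # A real word, but not the one the picture asks for.
    return "semantically incorrect"


print(detect_error("hapy", "happy"))    # ill-formed
print(detect_error("helped", "happy"))  # semantically incorrect
```

The order of the checks matters: a semantic error can only be reported once the word has passed the well-formedness check.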
• Error Correction
Candidate corrections are generated with the edit distance technique.
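The slides do not say which edit distance variant the paper uses. A minimal sketch, assuming plain Levenshtein distance (insertions, deletions, substitutions) over a toy lexicon, shows how candidates within a distance threshold are collected. The max_dist default of 3 is an assumption chosen to match the up-to-three errors per word in the test data; the lexicon is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def candidates(word: str, lexicon: set[str], max_dist: int = 3) -> list[str]:
    """Lexicon words within max_dist edits, closest first."""
    scored = [(edit_distance(word, w), w) for w in lexicon]
    return [w for d, w in sorted(scored) if d <= max_dist]


print(edit_distance("kitten", "sitting"))                      # 3
print(candidates("hapy", {"happy", "hop", "sad"}, max_dist=1))  # ['happy']
```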
• Filtering
1. Morphological Analyzer Filter.
Ex. After the correction techniques are applied to the word ‘ ’, the candidate ‘ ’
appears as a correction. The morphological filter excludes it because the analyzer cannot analyze it.
2. Gloss Filter.
Ex. Suppose the user misspells the word ‘ ’/happy as ‘ ’ (the second letter
‘ ’ is incorrectly replaced by the short vowel Fatha). Applying the correction
techniques yields two possible corrections, ‘ ’/happy and ‘ ’/helped, and
both are valid Arabic words. Applying the gloss filter excludes ‘ ’/helped,
whose meaning does not match the question.
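The two filters can be sketched as successive list filters. This assumes a mock analyzer (MOCK_ANALYZER, hypothetical) that maps Buckwalter-transliterated words to English glosses: sEyd is "happy" and sAEd is "helped". The candidate sEdd is an invented non-word used only to show the morphological filter dropping an unanalyzable string; the real analyzer returns much richer output.

```python
# Hypothetical mock of a morphological analyzer: Buckwalter-transliterated
# word -> English gloss. Words absent from it count as unanalyzable.
MOCK_ANALYZER = {"sEyd": "happy", "sAEd": "helped"}


def morphological_filter(cands: list[str]) -> list[str]:
    """Drop candidates that the analyzer cannot analyze."""
    return [w for w in cands if w in MOCK_ANALYZER]


def gloss_filter(cands: list[str], expected_gloss: str) -> list[str]:
    """Keep candidates whose gloss matches the meaning the question asks for."""
    return [w for w in cands if MOCK_ANALYZER[w] == expected_gloss]


cands = ["sEyd", "sAEd", "sEdd"]          # hypothetical correction output
valid = morphological_filter(cands)       # ['sEyd', 'sAEd']
final = gloss_filter(valid, "happy")      # ['sEyd']
print(valid, final)
```

The gloss filter runs after the morphological filter because it needs an analysis (and therefore a gloss) for every remaining candidate.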
• Evaluation:
Performed on real test data of 190 misspelled words, including both
single-error and multi-error misspellings with up to three errors per word.
The average word length is 5 letters.
• Result
Recall above 80% and precision above 90% were achieved for each type of
spelling error.
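For reference, the two reported metrics can be computed as below. How the paper counts true positives is not stated on the slide; the sketch assumes a misspelling counts as a true positive when a correct correction is proposed. The counts in the usage line are made up for illustration and are not the paper's data.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall for one error type.

    tp: misspellings for which a correct correction was proposed
    fp: proposed corrections that were wrong
    fn: misspellings for which no correct correction was proposed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up counts for illustration only (not the paper's data).
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```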
Second Paper
“Towards Automatic Spell Checking for
Arabic”
• The system is composed of an Arabic morphological
analyzer, a lexicon, a spelling detector, and a spelling
corrector.
• Spelling detection
• Two possibilities:
1. The misspelled word is an invalid word (non-word error), Ex. ‘ ’ for
‘ ’
2. The misspelled word is a valid word (real-word error), Ex. ‘ ’ in
place of ‘ ’
• Spelling correction:
• Add a missing character: the candidates for the misspelled word ‘ ’ are
‘ ’, ‘ ’ and ‘ ’.
• Replace an incorrect character: the candidates for the misspelled word ‘ ’ are
‘ ’, ‘ ’ and ‘ ’.
• Remove an excessive character: the candidates for the misspelled word
‘ ’ are ‘ ’ and ‘ ’.
• Add a space to split merged words: the candidates for the misspelled word ‘ ’
are ‘ ’ and ‘ ’.
• Arabic morphological analyzer
• Breaks the inflected word ‘ ’ down into the prefix
‘ ’, the suffix ‘ ’, and the stem ‘ ’, then looks the stem up in the
lexicon: if the stem has an entry in the lexicon, the stem is correct.
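The four correction operations and the stem lookup fit together roughly as below. This is a sketch under stated assumptions, not the paper's implementation: ALPHABET, PREFIXES, SUFFIXES and STEMS are hypothetical toy resources in Buckwalter transliteration, and a word is accepted when some prefix + stem + suffix split yields a stem found in the lexicon.

```python
# Hypothetical toy resources in Buckwalter transliteration; the paper's
# real lexicon and affix inventory are far larger.
ALPHABET = "AbtvjHxd*rzs$SDTZEgfqklmnhwy"  # the 28 Arabic letters
PREFIXES = ["", "w", "Al", "wAl"]          # e.g. wa- "and", al- "the"
SUFFIXES = ["", "p", "At", "wn"]           # ta marbuta, fem. plural, masc. plural
STEMS = {"ktAb", "mdrs", "qAl"}            # book, teacher, said


def is_valid(word: str) -> bool:
    """Accept a word if some prefix + stem + suffix split has a stem in STEMS."""
    for pre in PREFIXES:
        for suf in SUFFIXES:
            if (word.startswith(pre) and word.endswith(suf)
                    and len(word) > len(pre) + len(suf)):
                stem = word[len(pre):len(word) - len(suf)]
                if stem in STEMS:
                    return True
    return False


def generate_candidates(word: str) -> list[str]:
    """Apply the four correction operations and keep only valid results."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    added = {l + c + r for l, r in splits for c in ALPHABET}
    replaced = {l + c + r[1:] for l, r in splits if r
                for c in ALPHABET if c != r[0]}
    removed = {l + r[1:] for l, r in splits if r}
    result = {w for w in added | replaced | removed if is_valid(w)}
    # Add a space: accept a split where both halves are valid words.
    for l, r in splits:
        if l and r and is_valid(l) and is_valid(r):
            result.add(l + " " + r)
    return sorted(result)


print(generate_candidates("ktb"))        # includes 'ktAb'
print(generate_candidates("AlktAbqAl"))  # includes 'AlktAb qAl'
```

Checking validity through affix stripping, rather than a full-form word list, lets a small stem lexicon accept many inflected forms.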
Third Paper
- Algorithm defined by B. Haddad and M. Yassen
- Error patterns
Simple errors:
Substitution: fāl → qāl (he said); the letter f is mistakenly substituted by q.
Deletion: ’sḫdama → ’staḫdama (he or it used); the letter t is missing.
Insertion: makttūb → maktūb (a letter, in the sense of a message); an extra t is inserted.
Transposition: ’ğmitā‘ → ’ğtimā‘ (meeting); the letter t is swapped with the neighboring m.
Editing errors and boundary problems:
ra’īs’alğami‘h → ra’īs ’alğami‘h (the university president); the space between the two words is missing.
fa qāl → faqāl (and then he said); a space is wrongly inserted.
Cognitive and phonetic mistakes:
hādā or hāzā → hadā (the demonstrative particle).
Syntax errors:
The girl went to [the] school: dahaba is used instead of dahabat; the verb must be feminine to agree with "the girl".
Semantic errors:
red rebuking cells → red blood cells: ’ldam (the rebuking) is used instead of ’ldam (the blood); the two valid words differ in a single letter.
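The four simple error types, plus the boundary case, can be recognized mechanically when the intended word is known. In the sketch below the strings are our own Buckwalter renderings of the transliterated examples above (an assumption, since the slide's Arabic script was lost), and classify_error is a hypothetical helper that only handles a single error per word. Cognitive, syntax and semantic errors need context and are out of its scope.

```python
def classify_error(wrong: str, right: str) -> str:
    """Name the single error that turns `right` into `wrong`."""
    if wrong != right and wrong.replace(" ", "") == right.replace(" ", ""):
        return "boundary"
    if len(wrong) == len(right):
        diffs = [i for i, (a, b) in enumerate(zip(wrong, right)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and wrong[diffs[0]] == right[diffs[1]]
                and wrong[diffs[1]] == right[diffs[0]]):
            return "transposition"
    if len(wrong) + 1 == len(right):
        # One letter of the correct word is missing.
        if any(wrong == right[:i] + right[i + 1:] for i in range(len(right))):
            return "deletion"
    if len(wrong) == len(right) + 1:
        # One extra letter was inserted.
        if any(right == wrong[:i] + wrong[i + 1:] for i in range(len(wrong))):
            return "insertion"
    return "other"


print(classify_error("fAl", "qAl"))        # substitution
print(classify_error("Asxdm", "Astxdm"))   # deletion
print(classify_error("mkttwb", "mktwb"))   # insertion
print(classify_error("AjmtAE", "AjtmAE"))  # transposition
print(classify_error("f qAl", "fqAl"))     # boundary
```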
- Knowledge base:
D&C = (DAWKB, NDAKB, CORSTR)
- Derivative Arabic Word Knowledge Base (DAWKB)
- For each valid Arabic root there is a certain number of consistent patterns.
- A root-pattern relationship denotes a word that has at least one lexical occurrence
in the Arabic vocabulary.
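- $d_{w_j} = \left(\mathrm{Pref}_{ji} + \mathrm{Pt}_{j}\,\Theta\,\mathrm{sub}_{\mathrm{MGR}_i} + \mathrm{Suff}_{ji}\right)\ \mathrm{MSR}\ \mathrm{PNGR}_i$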
- Database for NDW & AW
These words are treated as stems or lexemes collected in the knowledge base.
- Non-Word Recognition and Error Correction Strategy
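One plausible reading of the dwj formula (our interpretation, not confirmed by the slides) is that a derived word is a prefix, then a pattern Pt_j interdigitated (Θ) with a root, then a suffix, and that a root-pattern pair only counts if the result is attested in the vocabulary. The sketch assumes patterns written in Buckwalter transliteration with slots '1', '2', '3' for the three radicals; ATTESTED, apply_pattern, derive and is_attested are hypothetical names.

```python
# Hypothetical lexicon of attested words: a root-pattern combination counts
# as a word only if it has a lexical occurrence (here: membership in this set).
ATTESTED = {"ktAb", "mktwb", "kAtb"}


def apply_pattern(root: str, pattern: str) -> str:
    """Insert the three radicals of `root` into the slots '1', '2', '3' of `pattern`."""
    word = pattern
    for slot, radical in zip("123", root):
        word = word.replace(slot, radical)
    return word


def derive(prefix: str, root: str, pattern: str, suffix: str) -> str:
    """Prefix + (pattern applied to root) + suffix, mirroring the shape of d_wj."""
    return prefix + apply_pattern(root, pattern) + suffix


def is_attested(root: str, pattern: str) -> bool:
    """True if the root-pattern combination occurs in the toy vocabulary."""
    return apply_pattern(root, pattern) in ATTESTED


print(apply_pattern("ktb", "m12w3"))     # mktwb (maktūb)
print(derive("Al", "ktb", "12A3", ""))   # AlktAb
print(is_attested("ktb", "1A23"))        # True (kAtb)
```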
Fourth Paper
- Paper proposed by A. Hattab and A. Hussein.
- The proposed system consists of three models.
- The detection and correction model classifies words
as either non-words or misspellings.
Evaluation:
- The proposed system was run twice: first without the detection and correction
method, and then with it.
- The same data were used in both experiments. The results of these experiments
are shown in the tables.
- The detection and correction algorithm outperformed the Bayes algorithm by about
10%. Without misspelling checking, the accuracy is 68.85%, while the average
accuracy of the classification system with misspelling detection and correction is
71.77%.
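The gap between the two reported accuracies can be worked out directly. This arithmetic covers only the two runs of the proposed system (68.85% and 71.77%), not the comparison with the Bayes algorithm.

```python
# Accuracies as reported on the slide (percent).
baseline = 68.85    # without misspelling detection and correction
with_dc = 71.77     # average, with detection and correction

abs_gain = with_dc - baseline            # percentage points
rel_gain = abs_gain / baseline * 100     # relative improvement, percent

print(f"{abs_gain:.2f} points ({rel_gain:.2f}% relative)")  # 2.92 points (4.24% relative)
```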