SlideShare ist ein Scribd-Unternehmen logo
1 von 7
Downloaden Sie, um offline zu lesen
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. VI (May – Jun. 2015), PP 25-31
www.iosrjournals.org
DOI: 10.9790/0661-17362531 www.iosrjournals.org 25 | Page
A Word Stemming Algorithm for Hausa Language
Muazzam Bashir1
, Azilawati Binti Rozaimee2
, Wan Malini Binti Wan Isa3
1, 2, 3
(Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin Tembila
Campus 22200 Besut Terengganu, Malaysia)
Abstract: Hausa, a highly inflected language, needs a worthy stemming approach for efficient information
retrieval (IR). However, there is a limited or unavailable study to stemming in the language. Stemming refers to
the systematic way of reducing a word to its base or root form. It is a crucial aspect in the field of natural
language processing (NLP) such as text summarization and machine translation. As such, this study
inspirationally presents an automatic word stemming system for Hausa language with a view to contributing to
the field of electronic text processing, as well as NLP, in general. The proposed method is a modification of
Porter’s algorithm to fit Hausa morphological rules. The system has an accuracy of 73.8% for implementation
with 2573 words extracted from four different articles from Hausa Leadership newspaper. If immensely
improved over time (employing more exceptional cases in future work), it would inspire the development of
more tools for the language. Hence, the language would rapidly adopt the advancement in technology.
Keywords: Hausa language, Information retrieval, Natural language processing, Stemming.
I. Introduction
A stemming algorithm refers to a computational method of reducing all words having the same root-
form to the same form by stripping affixes or infixes attached to those words. Stemming programs are also
called stemming algorithms or stemmers [1]. One of the advantages of stemming is improving the performance
of information retrieval (IR) by providing morphological variants of key terms with a view to matching the
documents with the given query. The words in the request might have a lot of variants [2]. For instance, a user
might have “stemming” in their query and would likely be interested in “stemmed” or “stem”. Conflation is the
process of matching morphological variant terms. Conflation is either manual, employing some regular
expressions or automatic – using stemmers. Moreover, automatic conflation can be language dependent or
language independent, each of which has its advantages over the other. Several studies are still going on both of
the two conflations. Several studies that include English stemmer in [1], UniSZA stemmer for Malay [3], etc.,
are language dependent studies. The improvement in technology drew the attention of many researchers to
conduct studies for language neutral stemming [4] [5].
Stemming is inappropriate in two ways namely: over-stemming and under-stemming [2]. Over-
stemming a word refers to too much removal of its part by stemming process, and this has a serious effect on IR
as it would lead to the retrieval of non-relevant document due to conflation of unrelated terms. Under-stemming
a word means removing too little part of a word, and this can prevent the retrieval of relevant documents.
This study focuses on language dependent conflation. It is the process of using a given language set of
standard rules of inflection or derivation of a similar group of words to reduce any given word, which belongs to
such groups, to its stem. Fortunately, the need for a corpus of the target language is very limited as compared to
Language Independent Approach. Furthermore, stemming is also a crucial aspect in the field of automatic text
summarization especially the abstractive form – that requires a sentence and word modification. It is believed
that enhancing the rules for a particular language is significantly reducing errors in an automated conflation [3].
There are very limited or unavailable studies that attempted to develop a stemming algorithm for Hausa
language despite the fact that Hausa language is an International language widely spoken in West Africa with
estimated speakers of over 52 million. It is a very popular language in many African countries such as Nigeria,
Niger, Togo, Ghana and Mali. Nigeria has at least 40 million Hausa speakers. It happens to be the top language
in Africa [6]. It has two forms of writing system namely: Ajami and the Boko system. The Ajami uses modified
Arabic alphabets and became acknowledged in early 17th
century. While the Boko system, Latin-based, was
introduced in 1930s by the colonial administration. The Boko system is the most prevalent in the growing
printed Hausa literature such as books, novels, newspaper and plays. The language also has different dialects
such as “Kananci”, “Katsinanci” and “Zazzaganci”. The “Kananci” (from Kano) is the standard in Hausa
literature and it is adopted by the international Hausa news broadcasters such as BBC Hausa and VOA Hausa.
Several high degrees (Masters and Ph.D.) in Hausa language are offered overseas universities in the US, UK,
and Germany [7].
In this study, a word stemming system based on Hausa language morphological rules was proposed
which modified the Porter‟s stemming algorithm [12] and also used some rules proposed in [3] accordingly.
A Word Stemming Algorithm for Hausa Language
DOI: 10.9790/0661-17362531 www.iosrjournals.org 26 | Page
II. Literature review
There are four categories of Automated Conflation based on their approaches namely: Successor
Variety, Affix Removal, n-gram and Table Lookup methods. Fig. 1 below shows the two forms of conflation as
mentioned earlier and stemmers‟ approaches [2].
1.1 Successor Variety
Successor variety employed frequencies of letter series, [8], to segment a word into its stem by
utilizing a word corpus in the process of stemming. For instance, consider a word corpus: “kanwa”, “Katanga”,
“kunya”, “kubewa” and “kango”. To find the successor variety of the word “kaya” then the initial letter of the
target word “k” matches five words with distinct two second letters. The next prefix “ka” matches three words
with two different third letters. The next “kay” matches no words, likewise the complete word “kaya”. Hence,
the word “kaya” has the successor variety of zero. Having determining the Successor Varieties for a test word
taking peaks into consideration, then the detection and segmentation of the word preceded.
2.2 Affix Removal
In this approach, the suffix, prefix and infix are stripped from the words with a view to producing their
stems. This method has been widely used since the last five decades and still common in nowadays stemmers.
The method is mainly on two principles namely: Iteration and longest-match [1].
2.2.1 Iteration
This approach uses affix stripping with looping around the given order-classes and without repeating a
match of the affix within each particular order-class.
2.2.2 Longest-match
Longest-match approach maintains that within a given set of endings, should more than one ending
produce a match then the removal of the longest one followed.
2.3 N-Gram
A method of conflating terms called the shared digram proposed by [9]. A digram refers to a couple of
successive letters. A research in [10] generalized the digram method and called it n-gram method. That is after
testing for trigram, tetra-gram, etc.
In this approach, association measures are determined by the couple of terms based on shared distinct
digrams. For instance, the term “dangantaka” (relationships) and “dangana” (to rely on) can be broken into
digram as follows:
dangantaka => da an ng ga an nt ta ak ka
unique digram = ak an da ga ka ng nt ta
dangana => da an ng ga an na
unique digram = an da ga na ng
Conflation Methods
Manual Automatic (Stemmers)
Successor
Variety
Affix
Removal
n-gram Table
Lookup
Iteration Longest-match
Figure 1: Conflation methods
A Word Stemming Algorithm for Hausa Language
DOI: 10.9790/0661-17362531 www.iosrjournals.org 27 | Page
The term “dangantaka” has nine digrams eight of which are unique while “dangana” has six digrams
five of which are unique. And the two words share four digrams. After the determination of the unique digrams,
then the similarity measure can be evaluated using Dice‟s coefficient [2] as follows:
P is the number of unique digrams of the first term. Q is the number of unique digrams of the second
term while C is the number of unique digrams shared by P and Q.
Hence, for the above example, we have: P=8; Q=5; and C=4 and therefore S = 2*4/(8+5) = 0.62.
Similarly, this is calculated for all pairs of words in the database and then clustered into groups. This similarity
measure gives a hint where the stem lies in every pair.
2.4 Table Lookup
Table Lookup is the process of storing all the words with their corresponding stems in a stand as
reported in [11]. All the terms in both the query and indexes are stemmed through lookup from the table. The
major obstacles to this approach include extensive use of a particular language to the stems of each word; it is
also a time-consuming process and requires a significant storage overhead for such tables.
2.5 Related Work
The earliest published studies in the field of stemming emerged in the 1960s and all of which are
English language dependent. The first published research in stemming algorithm, [1], is for English language. It
was a context sensitive based on longest-match rule developed for use in a library to maximize the performance
of IR. Later, the second published, Porter stemmer, proposed in [12] that concentrated on suffix removal in
English words. The system was reasonably straightforward, fast, having small lines of codes and also better than
the existing methods. The studies in [1] and [12] served as the basis for the later studies particularly in other
languages. Some of the studies that aimed at developing a stemming algorithm in other languages include
Wolaytta text stemmer proposed in [13] which was a prototype context sensitive iterative stemmer. The system
has an accuracy of 90.6% on training using 3537 Words. Moreover, the system has an accuracy of 86.9% with
unseen 884 words. A light weight stemmer, a rule-based, for Urdu text [14] ensures very abundant exceptional
cases before undergoing stemming process. The system is quite enough to be effective in IR.
A UniSZA Stemmer is an enhanced system developed in [3] for Malay language that was determined
to improve the processing speed of the previous methods. The study proposed seven simple rules that would
disregard the dependency on Malay dictionary even though it also relies on Malay root words. Subsequently, the
system has superior performance over the existing systems. Fortunately, this study used the idea of suffix
stripping in Porter stemmer [12] as a hint due to the presence of infixes in Hausa words. In addition, the study
also considered some of the rules proposed in UniSZA stemmer [3] all with a view to fitting the Hausa language
stemming approach.
III. Proposed method
Preprocessing Stem each word based on the
given stemming rule
Raw Hausa text Dictionary
Return the
stemmed words
Check a dictionary for those
words with affixes-like which are
not
Figure 2: Hausa stemmer framework
A Word Stemming Algorithm for Hausa Language
DOI: 10.9790/0661-17362531 www.iosrjournals.org 28 | Page
This study proposed a system for Hausa stemming which is a context sensitive and iterative in nature.
The study modified the algorithm in [12] and used some rules in [3], to handle exceptional cases that occur in
Hausa language. The study thoroughly analyzed the Hausa word morphology presented in [15], [16], [6] and
[7]. For this reason, this study developed the Hausa stemmer that utilizes the following rules check length,
Stopword list, masculine & feminine gender marker, prefix list, suffix list, infix rule and check a dictionary. The
proposed system was designed using Java programming language. It starts by reading the Hausa text from the
collected article samples saved in notepad file. Subsequently, the file undergoes preprocessing, which involves
the removal of punctuations, removal of words starting with a digit and changing the entire text case to lower
case as well as tokenizing the text. For every word in the text, the stemming rule is applied and checks a
dictionary where applicable. Finally, the system returns the stemmed words. The fig. 2 above demonstrates
architecture of the system.
3.1 Check a Dictionary
Hausa words, sometimes, have complexity in the Affix stripping process. There exist many words that
have prefix-like, suffix-like and sometimes both of which are neither suffix nor prefix. For instance, the word
“babur” (motorbike) has prefix-like of “ba” and suffix-like of “r”. In this case, a dictionary can be used to tackle
the problem.
3.2 Prefix list
This list contains the morphemes that are used with stems to derive new words in Hausa language.
Each of them would be checked on any given word for stemming by the system. Some of them are “ba”, “ma”,
“mai”, “yan”, etc.
3.3 Suffix list
This list contains the morphemes that are attached to stems for formation of new words in Hausa
language. For example, the word dakakku (pounded) was derived from daka (pound); in this case “kku” is a
suffix. Other suffixes include wa, iya, uwa, anya, etc. But in some instances, Hausa new words are formed by
changing the final vowel of a word. For example, derivation of kujeru (chairs) from kujera (chair), the character
„a‟ changed to „u‟. A Hausa word sometimes contains gender marker and a pronoun attached to it. Therefore,
that attachment is regarded as a suffix in this system. Consider the word motarsa (his car), the feminine gender
marker „r‟ and the pronoun “sa” (his) are attached to the “mota” (car). Other suffixes include nka, nshi, rta, etc.
3.4 Infix rule
Hausa morphology has complex infixes rules. These rules stripped words having unlisted suffixes in
the suffix list with a view to decreasing the dimensions of the suffix list. For instance, the plural word lambobi
(numbers) derived from lamba (number) belongs to the o…i rule: C1V1…Cn-1Vn-1CnVn having where Cn-1 = Cn
and Vn-1=o, Vn=i. If this rule is correct, then Vn-1CnVn would be replaced by a character „a‟. More Hausa words
belonging to this instruction include harkoki (activities), hanyoyi (ways), kofofi (doors), bindigogi (guns), etc.
Another rule is e…a…i rule: For instance, the word karesani (heifers) derived from karsana (heifer). In
this rule, the „e‟ is dropped, and the „i' is changed to „a‟ to get the stem back. The words that belong to this rule
include karefasi, tarewadi, etc.
3.5 Check a length
In this function, the lengths of the words are determined, and only those words exceed the given
threshold value would undergo the stemming process. This study sets the limit to be three.
3.6 Stopword list
This list stored all the Hausa words that frequently occur in a text. The system discards stopwords upon
encountered. The system moves to the next word if it meets a word that belongs to the stopword list. These
words include “a”, “wannan”, “nan”, “can”, “cewa”, etc.
3.7 Masculine & Feminine gender marker
In Hausa texts, the definite and indefinite articles are attached to the words using „r‟ for feminine and
„n‟ for masculine. Feminine marker is attached to feminine words while masculine to masculine words.
Therefore, this gender marker has to be stripped to stem the particular word. And there are some words that are
exceptional, e.g. “babur”.
A Word Stemming Algorithm for Hausa Language
DOI: 10.9790/0661-17362531 www.iosrjournals.org 29 | Page
Here is the proposed algorithm:
1. Read the raw text
2. Tokenize the text into words
3. READ the word to be stemmed.
4. IF the word contains letters only
Go to 5
ELSE
Go to 10
5. IF the size of the word is greater than three
IF the word is in the Stopword List
Go to 10
ELSE
Go to 6
ELSE
Go to 7
6. IF the word ends in „r‟ or „n‟
IF the word ends in „r‟
IF „u‟ or „i‟ comes before the „r‟
Go to 7
ELSE
Remove the „r‟ and Go to 8
ELSE
Remove the „n‟ and Go to 8
7. RETURN the word and RECORD it in a stem dictionary and Go to 10
8. IF the word starts with any of the prefixes in the Prefixes list
Remove the prefix and
CHECK Dictionary IF the resulting word exists
Go to 7
ELSE
IF the word ends with any of the Suffixes list
Remove the suffix and Go to 7
ELSE
Restore the Prefix and Go to 9
ELSE
IF there exists immediate duplicate of first three letters in the word
Remove the first three letters and Go to 7
ELSE
IF the word is a double word
Return the first word & Go to 7
ELSE Go to 9
9. IF the word ends with any of the Suffixes in the Suffixes list
Remove the suffix and
CHECK Dictionary IF the word exist
Go to 7
ELSE
IF the word has any infix rule
Remove the infix and Go to 7
ELSE
Go to 7
ELSE
IF the word has any infix rule
Remove the infix and Go to 7
ELSE
Go to 7
10. IF the end of the text is not met Go to 3
ELSE Go to 11
11. Stop
A Word Stemming Algorithm for Hausa Language
DOI: 10.9790/0661-17362531 www.iosrjournals.org 30 | Page
IV. Result and discussion
As mentioned earlier, with a very limited research in stemming for Hausa, then the proposed system‟s
outcome was judged by a linguistic expert in the language. The study used a collection of 2573 words from four
different articles in Hausa Leadership newspaper to evaluate the system.
Out of 2573 words, 966 words are detected as Stop words. The remaining words are 1607 and only 741
discrete words. Hence, 741 discrete words were used in the system on which 547 words were stemmed
correctly. On the other hand, 110 and 84 words were over-stemmed and under-stemmed respectively as
stemming error. The system has an accuracy of 73.8%. Table 1 provides a summary of this evaluation.
Table 1: Highlights of the System Outcome
Words Expected Output System Output
(Stemmed Words)
Remarks
asibitin (the hospital) asibiti asibiti Stemmed properly
musulmai (Muslims) musulmi musulmi Stemmed properly
sa‟o‟i (hours) sa‟a sa‟a Stemmed properly
sojojin (the soldiers) soja soja Stemmed properly
shugabannin (the leaders) shugaba shugaba Stemmed properly
wayoyin (the phones) waya waya Stemmed properly
kare (dog) kare ka Over-stemmed
kazanta (mess) kazanta kaza Over-stemmed
kunne (ear) kunne ku Over-stemmed
mummunan (the ugly) muni muna Under-stemmed
sabuwar (the new) sabo sabu Under-stemmed
The Table 1 above indicates that even with the use of the Hausa Dictionary, the system still
encountered stemming error. When the system gets the word “kare” it is expected to return “kare” as the stem
but the system applied the stemming rule and checked in the dictionary and found the output “ka” is valid. Same
goes with the words kazanta with a stem “kaza” and “kunne” with “ku”. One way of solving this problem,
though quite expensive, is to find all the words with this ambiguity and put them into an exceptional list.
The problem of under-stemming has to do with the complexity of the Hausa morphology as it has so
many inconsistent rules. Hausa verbs ending in „a‟ are the root forms, and all others are derived form. The rule
is sometimes inconsistent as there are others (exceptional cases) with ending in „e‟, „o‟, „i‟, „u‟ as the root forms.
Same goes with plural formation in Hausa nouns, endings in „a‟ are typically considered plurals. For example,
the word “mata” (women) from “mace” (woman) and “yara” (boys) from “yaro” (boy), etc. But in some cases,
endings in „u‟ are plurals, for instance, the word “kujeru” (chairs) from “kujera” (chair). Therefore, there are
also singular words ending in „a‟.
Hence, the stemming errors in this system are as a result of these inconsistencies. To overcome the
mistakes, find all these exceptional cases and employ them in the system.
V. Conclusion
It is possible to develop Hausa morphological stemmer and improve it over time despite the scarcity of
linguistic resources. The proposed system can efficiently stem Hausa word with accuracy of 73.8%. The system
is quite good even though it lacks abundant exceptional cases. The results showed that Hausa stemming should
not depend on a dictionary but rather on exempting some particular group of words from the process. Over-
stemming and under-stemming with 26.2% are, often, due to this unavailability. Provision of such exceptions
will enhance the efficiency of the system.
Our future work includes the provision of more exceptional cases (Hausa proper nouns inclusive) as
well as additional rules to help in improving the system. Also, future work will involve utilizing part of speech
tagging. Part of speech tagging will be conducive in further dividing the Hausa stemming process into several
modules such as noun and verb modules stemming.
Acknowledgements
The authors would like to extend appreciation of the support given by University Sultan Zainal Abidin
(UniSZA) and the Kano State Government. Also, our gratitude to Dr. Tijjani Shehu (a lecturer, Department of
Linguistics, Bayero University, Kano) for his tireless support in examining the results.
A Word Stemming Algorithm for Hausa Language
DOI: 10.9790/0661-17362531 www.iosrjournals.org 31 | Page
References
[1]. Lovins, Julie B. Development of a stemming algorithm. MIT Information Processing Group, Electronic Systems Laboratory, 1968.
[2]. Frakes, W. B. "Introduction to information storage and retrieval systems." Space 14 (1992): 10.
[3]. Abdullah, Syed, Engku Fadzli Hasan, Syarilla Ahmad Saany, Hasni Hassan, Mohd Satar, and Siti Dhalila. "Simple Rules Malay
Stemmer." In The International Conference on Informatics and Applications (ICIA2012), pp. 28-35. The Society of Digital
Information and Wireless Communication, 2012.
[4]. Pande, B. P., Pawan Tamta, and H. S. Dhami. "Generation, Implementation and Appraisal of an N-gram based Stemming
Algorithm." arXiv preprint arXiv:1312.4824 (2013).
[5]. Dianati, Mohammad Hassan, Mohammad Hadi Sadreddini, Amir Hossein Rasekh, Seyed Mostafa Fakhrahmad, and Hossein Taghi-
Zadeh. "Words Stemming Based on Structural and Semantic Similarity." Computer Engineering and Applications Journal 3, no. 2
(2014): 89-99.
[6]. Smirnova, Mirra Aleksandrovna. The Hausa language: a descriptive grammar. Routledge & Kegan Paul, 1982.
[7]. Newman, Paul. The Hausa language: An encyclopedic reference grammar. Yale University Press, 2000.
[8]. Hafer, Margaret A., and Stephen F. Weiss. "Word segmentation by letter successor varieties." Information storage and retrieval 10,
no. 11 (1974): 371-385.
[9]. Adamson, George W., and Jillian Boreham. "The use of an association measure based on character structure to identify semantically
related pairs of words and document titles." Information storage and retrieval 10.7 (1974): 253-260.
[10]. Mcnamee, Paul, and James Mayfield. "Character n-gram tokenization for European language text retrieval." Information retrieval
7.1-2 (2004): 73-97.
[11]. Frakes, William B. "Term conflation for information retrieval." Proceedings of the 7th annual international ACM SIGIR conference
on Research and development in information retrieval. British Computer Society, 1984.
[12]. Porter, Martin F. "An algorithm for suffix stripping." Program 14, no. 3 (1980): 130-137.
[13]. Lessa, Lemma. "Development of stemming algorithm for wolaytta text.", master‟s thesis, Department of Information Science,
Faculty of Informatics, Addis Ababa University, Ethiopia (2003).
[14]. Khan, Sajjad Ahmad, Waqas Anwar, Usama Ijaz Bajwa, and Xuan Wang. "A Light Weight Stemmer for Urdu Language: A Scarce
Resourced Language." In 24th International Conference on Computational Linguistics, p. 69. 2012.
[15]. Abdoulaye, Mahamane L. "Aspects of Hausa morphosyntax in role and reference grammar." PhD diss., State University of New
York at Buffalo, 1992.
[16]. Al-Hassan, Bello SY. "Transfixation in Hausa: A Hypothetical Analysis." Studies of the Department of African Languages and
Cultures 45 (2011).

Weitere ähnliche Inhalte

Was ist angesagt?

THE STRUCTURED COMPACT TAG-SET FOR LUGANDA
THE STRUCTURED COMPACT TAG-SET FOR LUGANDATHE STRUCTURED COMPACT TAG-SET FOR LUGANDA
THE STRUCTURED COMPACT TAG-SET FOR LUGANDAijnlc
 
11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)ThennarasuSakkan
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indianeSAT Publishing House
 
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...ijnlc
 
A survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languagesA survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languagesijnlc
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERijnlc
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING kevig
 
Ny3424442448
Ny3424442448Ny3424442448
Ny3424442448IJERA Editor
 
Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Jorge Baptista
 
Establishment of a List of Non-Compositional Multi-Word Combinations for Eng...
 Establishment of a List of Non-Compositional Multi-Word Combinations for Eng... Establishment of a List of Non-Compositional Multi-Word Combinations for Eng...
Establishment of a List of Non-Compositional Multi-Word Combinations for Eng...Research Journal of Education
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Reviewinscit2006
 
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONEFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONIJDKP
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...Editor IJARCET
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMijnlc
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textijnlc
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translationbehzad66
 

Was ist angesagt? (18)

THE STRUCTURED COMPACT TAG-SET FOR LUGANDA
THE STRUCTURED COMPACT TAG-SET FOR LUGANDATHE STRUCTURED COMPACT TAG-SET FOR LUGANDA
THE STRUCTURED COMPACT TAG-SET FOR LUGANDA
 
11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indian
 
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...
 
A survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languagesA survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languages
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING
 
Ny3424442448
Ny3424442448Ny3424442448
Ny3424442448
 
Ijetcas14 575
Ijetcas14 575Ijetcas14 575
Ijetcas14 575
 
Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)
 
Establishment of a List of Non-Compositional Multi-Word Combinations for Eng...
 Establishment of a List of Non-Compositional Multi-Word Combinations for Eng... Establishment of a List of Non-Compositional Multi-Word Combinations for Eng...
Establishment of a List of Non-Compositional Multi-Word Combinations for Eng...
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Review
 
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONEFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
 

Andere mochten auch

Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithmsRaghu nath
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Dhabal Sethi
 
Designing a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo TextDesigning a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo TextWaqas Tariq
 
Tokenizer and stemmer for telugu language
Tokenizer and stemmer for telugu languageTokenizer and stemmer for telugu language
Tokenizer and stemmer for telugu languageSri Anusha Medarametla
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...baskaran_md
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological AnalysisKarthik Sankar
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageLushanthan Sivaneasharajah
 
Introduce prefixes suffixes roots affixes power point
Introduce prefixes suffixes roots affixes power pointIntroduce prefixes suffixes roots affixes power point
Introduce prefixes suffixes roots affixes power pointDaphna Doron
 
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466IJRAT
 
Top Tips For Working Smarter
Top Tips For Working SmarterTop Tips For Working Smarter
Top Tips For Working SmarterInterQuest Group
 

Andere mochten auch (13)

22 ijcse-01208
22 ijcse-0120822 ijcse-01208
22 ijcse-01208
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithms
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11
 
Designing a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo TextDesigning a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo Text
 
Tokenizer and stemmer for telugu language
Tokenizer and stemmer for telugu languageTokenizer and stemmer for telugu language
Tokenizer and stemmer for telugu language
 
Munnar
MunnarMunnar
Munnar
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
project doc (1)
project doc (1)project doc (1)
project doc (1)
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological Analysis
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil Language
 
Introduce prefixes suffixes roots affixes power point
Introduce prefixes suffixes roots affixes power pointIntroduce prefixes suffixes roots affixes power point
Introduce prefixes suffixes roots affixes power point
 
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466
 
Top Tips For Working Smarter
Top Tips For Working SmarterTop Tips For Working Smarter
Top Tips For Working Smarter
 

Ă„hnlich wie A Word Stemming Algorithm for Hausa Language

Designing A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language TextDesigning A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language TextCSCJournals
 
Designing A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language TextDesigning A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language TextCSCJournals
 
Machine transliteration survey
Machine transliteration surveyMachine transliteration survey
Machine transliteration surveyunyil96
 
A new hybrid metric for verifying
A new hybrid metric for verifyingA new hybrid metric for verifying
A new hybrid metric for verifyingcsandit
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introductionThennarasuSakkan
 
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...kevig
 
Design of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageDesign of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageWaqas Tariq
 
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONFURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONkevig
 
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONFURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONijnlc
 
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMSCOMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMSIJMIT JOURNAL
 
Corpus study design
Corpus study designCorpus study design
Corpus study designbikashtaly
 
An Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop StemmerAn Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop Stemmerkevig
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...ijaia
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...gerogepatton
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...gerogepatton
 
A statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfA statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfJasmine Dixon
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALcsandit
 

Ă„hnlich wie A Word Stemming Algorithm for Hausa Language (20)

Designing A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language TextDesigning A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language Text
 
Designing A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language TextDesigning A Rule Based Stemming Algorithm for Kambaata Language Text
Designing A Rule Based Stemming Algorithm for Kambaata Language Text
 
Machine transliteration survey
Machine transliteration surveyMachine transliteration survey
Machine transliteration survey
 
A new hybrid metric for verifying
A new hybrid metric for verifyingA new hybrid metric for verifying
A new hybrid metric for verifying
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introduction
 
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
 
Design of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageDesign of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa Language
 
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONFURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
 
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONFURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
 
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMSCOMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
 
Corpus study design
Corpus study designCorpus study design
Corpus study design
 
An Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop StemmerAn Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop Stemmer
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
A statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfA statistical approach to term extraction.pdf
A statistical approach to term extraction.pdf
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
 

Mehr von iosrjce

An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...iosrjce
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?iosrjce
 
Childhood Factors that influence success in later life
Childhood Factors that influence success in later lifeChildhood Factors that influence success in later life
Childhood Factors that influence success in later lifeiosrjce
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...iosrjce
 
Customer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in DubaiCustomer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in Dubaiiosrjce
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...iosrjce
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model ApproachConsumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model Approachiosrjce
 
Student`S Approach towards Social Network Sites
Student`S Approach towards Social Network SitesStudent`S Approach towards Social Network Sites
Student`S Approach towards Social Network Sitesiosrjce
 
Broadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeBroadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeiosrjce
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...iosrjce
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...iosrjce
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on BangladeshConsumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladeshiosrjce
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...iosrjce
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...iosrjce
 
Media Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & ConsiderationMedia Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & Considerationiosrjce
 
Customer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyCustomer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyiosrjce
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...iosrjce
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...iosrjce
 
Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...iosrjce
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...iosrjce
 

Mehr von iosrjce (20)

An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
 
Childhood Factors that influence success in later life
Childhood Factors that influence success in later lifeChildhood Factors that influence success in later life
Childhood Factors that influence success in later life
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
 
Customer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in DubaiCustomer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in Dubai
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model ApproachConsumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
 
Student`S Approach towards Social Network Sites
Student`S Approach towards Social Network SitesStudent`S Approach towards Social Network Sites
Student`S Approach towards Social Network Sites
 
Broadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeBroadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperative
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on BangladeshConsumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
 
Media Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & ConsiderationMedia Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & Consideration
 
Customer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyCustomer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative study
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
 
Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
 

KĂĽrzlich hochgeladen

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 

KĂĽrzlich hochgeladen (20)

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 

A Word Stemming Algorithm for Hausa Language

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. VI (May – Jun. 2015), PP 25-31 www.iosrjournals.org DOI: 10.9790/0661-17362531 www.iosrjournals.org 25 | Page A Word Stemming Algorithm for Hausa Language Muazzam Bashir1 , Azilawati Binti Rozaimee2 , Wan Malini Binti Wan Isa3 1, 2, 3 (Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin Tembila Campus 22200 Besut Terengganu, Malaysia) Abstract: Hausa, a highly inflected language, needs a worthy stemming approach for efficient information retrieval (IR). However, there is a limited or unavailable study to stemming in the language. Stemming refers to the systematic way of reducing a word to its base or root form. It is a crucial aspect in the field of natural language processing (NLP) such as text summarization and machine translation. As such, this study inspirationally presents an automatic word stemming system for Hausa language with a view to contributing to the field of electronic text processing, as well as NLP, in general. The proposed method is a modification of Porter’s algorithm to fit Hausa morphological rules. The system has an accuracy of 73.8% for implementation with 2573 words extracted from four different articles from Hausa Leadership newspaper. If immensely improved over time (employing more exceptional cases in future work), it would inspire the development of more tools for the language. Hence, the language would rapidly adopt the advancement in technology. Keywords: Hausa language, Information retrieval, Natural language processing, Stemming. I. Introduction A stemming algorithm refers to a computational method of reducing all words having the same root- form to the same form by stripping affixes or infixes attached to those words. Stemming programs are also called stemming algorithms or stemmers [1]. One of the advantages of stemming is improving the performance of information retrieval (IR) by providing morphological variants of key terms with a view to matching the documents with the given query. The words in the request might have a lot of variants [2]. For instance, a user might have “stemming” in their query and would likely be interested in “stemmed” or “stem”. Conflation is the process of matching morphological variant terms. Conflation is either manual, employing some regular expressions or automatic – using stemmers. Moreover, automatic conflation can be language dependent or language independent, each of which has its advantages over the other. Several studies are still going on both of the two conflations. Several studies that include English stemmer in [1], UniSZA stemmer for Malay [3], etc., are language dependent studies. The improvement in technology drew the attention of many researchers to conduct studies for language neutral stemming [4] [5]. Stemming is inappropriate in two ways namely: over-stemming and under-stemming [2]. Over- stemming a word refers to too much removal of its part by stemming process, and this has a serious effect on IR as it would lead to the retrieval of non-relevant document due to conflation of unrelated terms. Under-stemming a word means removing too little part of a word, and this can prevent the retrieval of relevant documents. This study focuses on language dependent conflation. It is the process of using a given language set of standard rules of inflection or derivation of a similar group of words to reduce any given word, which belongs to such groups, to its stem. Fortunately, the need for a corpus of the target language is very limited as compared to Language Independent Approach. Furthermore, stemming is also a crucial aspect in the field of automatic text summarization especially the abstractive form – that requires a sentence and word modification. It is believed that enhancing the rules for a particular language is significantly reducing errors in an automated conflation [3]. There are very limited or unavailable studies that attempted to develop a stemming algorithm for Hausa language despite the fact that Hausa language is an International language widely spoken in West Africa with estimated speakers of over 52 million. It is a very popular language in many African countries such as Nigeria, Niger, Togo, Ghana and Mali. Nigeria has at least 40 million Hausa speakers. It happens to be the top language in Africa [6]. It has two forms of writing system namely: Ajami and the Boko system. The Ajami uses modified Arabic alphabets and became acknowledged in early 17th century. While the Boko system, Latin-based, was introduced in 1930s by the colonial administration. The Boko system is the most prevalent in the growing printed Hausa literature such as books, novels, newspaper and plays. The language also has different dialects such as “Kananci”, “Katsinanci” and “Zazzaganci”. The “Kananci” (from Kano) is the standard in Hausa literature and it is adopted by the international Hausa news broadcasters such as BBC Hausa and VOA Hausa. Several high degrees (Masters and Ph.D.) in Hausa language are offered overseas universities in the US, UK, and Germany [7]. In this study, a word stemming system based on Hausa language morphological rules was proposed which modified the Porter‟s stemming algorithm [12] and also used some rules proposed in [3] accordingly.
  • 2. A Word Stemming Algorithm for Hausa Language DOI: 10.9790/0661-17362531 www.iosrjournals.org 26 | Page II. Literature review There are four categories of Automated Conflation based on their approaches namely: Successor Variety, Affix Removal, n-gram and Table Lookup methods. Fig. 1 below shows the two forms of conflation as mentioned earlier and stemmers‟ approaches [2]. 1.1 Successor Variety Successor variety employed frequencies of letter series, [8], to segment a word into its stem by utilizing a word corpus in the process of stemming. For instance, consider a word corpus: “kanwa”, “Katanga”, “kunya”, “kubewa” and “kango”. To find the successor variety of the word “kaya” then the initial letter of the target word “k” matches five words with distinct two second letters. The next prefix “ka” matches three words with two different third letters. The next “kay” matches no words, likewise the complete word “kaya”. Hence, the word “kaya” has the successor variety of zero. Having determining the Successor Varieties for a test word taking peaks into consideration, then the detection and segmentation of the word preceded. 2.2 Affix Removal In this approach, the suffix, prefix and infix are stripped from the words with a view to producing their stems. This method has been widely used since the last five decades and still common in nowadays stemmers. The method is mainly on two principles namely: Iteration and longest-match [1]. 2.2.1 Iteration This approach uses affix stripping with looping around the given order-classes and without repeating a match of the affix within each particular order-class. 2.2.2 Longest-match Longest-match approach maintains that within a given set of endings, should more than one ending produce a match then the removal of the longest one followed. 2.3 N-Gram A method of conflating terms called the shared digram proposed by [9]. A digram refers to a couple of successive letters. A research in [10] generalized the digram method and called it n-gram method. That is after testing for trigram, tetra-gram, etc. In this approach, association measures are determined by the couple of terms based on shared distinct digrams. For instance, the term “dangantaka” (relationships) and “dangana” (to rely on) can be broken into digram as follows: dangantaka => da an ng ga an nt ta ak ka unique digram = ak an da ga ka ng nt ta dangana => da an ng ga an na unique digram = an da ga na ng Conflation Methods Manual Automatic (Stemmers) Successor Variety Affix Removal n-gram Table Lookup Iteration Longest-match Figure 1: Conflation methods
  • 3. A Word Stemming Algorithm for Hausa Language DOI: 10.9790/0661-17362531 www.iosrjournals.org 27 | Page The term “dangantaka” has nine digrams eight of which are unique while “dangana” has six digrams five of which are unique. And the two words share four digrams. After the determination of the unique digrams, then the similarity measure can be evaluated using Dice‟s coefficient [2] as follows: P is the number of unique digrams of the first term. Q is the number of unique digrams of the second term while C is the number of unique digrams shared by P and Q. Hence, for the above example, we have: P=8; Q=5; and C=4 and therefore S = 2*4/(8+5) = 0.62. Similarly, this is calculated for all pairs of words in the database and then clustered into groups. This similarity measure gives a hint where the stem lies in every pair. 2.4 Table Lookup Table Lookup is the process of storing all the words with their corresponding stems in a stand as reported in [11]. All the terms in both the query and indexes are stemmed through lookup from the table. The major obstacles to this approach include extensive use of a particular language to the stems of each word; it is also a time-consuming process and requires a significant storage overhead for such tables. 2.5 Related Work The earliest published studies in the field of stemming emerged in the 1960s and all of which are English language dependent. The first published research in stemming algorithm, [1], is for English language. It was a context sensitive based on longest-match rule developed for use in a library to maximize the performance of IR. Later, the second published, Porter stemmer, proposed in [12] that concentrated on suffix removal in English words. The system was reasonably straightforward, fast, having small lines of codes and also better than the existing methods. The studies in [1] and [12] served as the basis for the later studies particularly in other languages. Some of the studies that aimed at developing a stemming algorithm in other languages include Wolaytta text stemmer proposed in [13] which was a prototype context sensitive iterative stemmer. The system has an accuracy of 90.6% on training using 3537 Words. Moreover, the system has an accuracy of 86.9% with unseen 884 words. A light weight stemmer, a rule-based, for Urdu text [14] ensures very abundant exceptional cases before undergoing stemming process. The system is quite enough to be effective in IR. A UniSZA Stemmer is an enhanced system developed in [3] for Malay language that was determined to improve the processing speed of the previous methods. The study proposed seven simple rules that would disregard the dependency on Malay dictionary even though it also relies on Malay root words. Subsequently, the system has superior performance over the existing systems. Fortunately, this study used the idea of suffix stripping in Porter stemmer [12] as a hint due to the presence of infixes in Hausa words. In addition, the study also considered some of the rules proposed in UniSZA stemmer [3] all with a view to fitting the Hausa language stemming approach. III. Proposed method Preprocessing Stem each word based on the given stemming rule Raw Hausa text Dictionary Return the stemmed words Check a dictionary for those words with affixes-like which are not Figure 2: Hausa stemmer framework
  • 4. A Word Stemming Algorithm for Hausa Language DOI: 10.9790/0661-17362531 www.iosrjournals.org 28 | Page This study proposed a system for Hausa stemming which is a context sensitive and iterative in nature. The study modified the algorithm in [12] and used some rules in [3], to handle exceptional cases that occur in Hausa language. The study thoroughly analyzed the Hausa word morphology presented in [15], [16], [6] and [7]. For this reason, this study developed the Hausa stemmer that utilizes the following rules check length, Stopword list, masculine & feminine gender marker, prefix list, suffix list, infix rule and check a dictionary. The proposed system was designed using Java programming language. It starts by reading the Hausa text from the collected article samples saved in notepad file. Subsequently, the file undergoes preprocessing, which involves the removal of punctuations, removal of words starting with a digit and changing the entire text case to lower case as well as tokenizing the text. For every word in the text, the stemming rule is applied and checks a dictionary where applicable. Finally, the system returns the stemmed words. The fig. 2 above demonstrates architecture of the system. 3.1 Check a Dictionary Hausa words, sometimes, have complexity in the Affix stripping process. There exist many words that have prefix-like, suffix-like and sometimes both of which are neither suffix nor prefix. For instance, the word “babur” (motorbike) has prefix-like of “ba” and suffix-like of “r”. In this case, a dictionary can be used to tackle the problem. 3.2 Prefix list This list contains the morphemes that are used with stems to derive new words in Hausa language. Each of them would be checked on any given word for stemming by the system. Some of them are “ba”, “ma”, “mai”, “yan”, etc. 3.3 Suffix list This list contains the morphemes that are attached to stems for formation of new words in Hausa language. For example, the word dakakku (pounded) was derived from daka (pound); in this case “kku” is a suffix. Other suffixes include wa, iya, uwa, anya, etc. But in some instances, Hausa new words are formed by changing the final vowel of a word. For example, derivation of kujeru (chairs) from kujera (chair), the character „a‟ changed to „u‟. A Hausa word sometimes contains gender marker and a pronoun attached to it. Therefore, that attachment is regarded as a suffix in this system. Consider the word motarsa (his car), the feminine gender marker „r‟ and the pronoun “sa” (his) are attached to the “mota” (car). Other suffixes include nka, nshi, rta, etc. 3.4 Infix rule Hausa morphology has complex infixes rules. These rules stripped words having unlisted suffixes in the suffix list with a view to decreasing the dimensions of the suffix list. For instance, the plural word lambobi (numbers) derived from lamba (number) belongs to the o…i rule: C1V1…Cn-1Vn-1CnVn having where Cn-1 = Cn and Vn-1=o, Vn=i. If this rule is correct, then Vn-1CnVn would be replaced by a character „a‟. More Hausa words belonging to this instruction include harkoki (activities), hanyoyi (ways), kofofi (doors), bindigogi (guns), etc. Another rule is e…a…i rule: For instance, the word karesani (heifers) derived from karsana (heifer). In this rule, the „e‟ is dropped, and the „i' is changed to „a‟ to get the stem back. The words that belong to this rule include karefasi, tarewadi, etc. 3.5 Check a length In this function, the lengths of the words are determined, and only those words exceed the given threshold value would undergo the stemming process. This study sets the limit to be three. 3.6 Stopword list This list stored all the Hausa words that frequently occur in a text. The system discards stopwords upon encountered. The system moves to the next word if it meets a word that belongs to the stopword list. These words include “a”, “wannan”, “nan”, “can”, “cewa”, etc. 3.7 Masculine & Feminine gender marker In Hausa texts, the definite and indefinite articles are attached to the words using „r‟ for feminine and „n‟ for masculine. Feminine marker is attached to feminine words while masculine to masculine words. Therefore, this gender marker has to be stripped to stem the particular word. And there are some words that are exceptional, e.g. “babur”.
  • 5. A Word Stemming Algorithm for Hausa Language DOI: 10.9790/0661-17362531 www.iosrjournals.org 29 | Page Here is the proposed algorithm: 1. Read the raw text 2. Tokenize the text into words 3. READ the word to be stemmed. 4. IF the word contains letters only Go to 5 ELSE Go to 10 5. IF the size of the word is greater than three IF the word is in the Stopword List Go to 10 ELSE Go to 6 ELSE Go to 7 6. IF the word ends in „r‟ or „n‟ IF the word ends in „r‟ IF „u‟ or „i‟ comes before the „r‟ Go to 7 ELSE Remove the „r‟ and Go to 8 ELSE Remove the „n‟ and Go to 8 7. RETURN the word and RECORD it in a stem dictionary and Go to 10 8. IF the word starts with any of the prefixes in the Prefixes list Remove the prefix and CHECK Dictionary IF the resulting word exists Go to 7 ELSE IF the word ends with any of the Suffixes list Remove the suffix and Go to 7 ELSE Restore the Prefix and Go to 9 ELSE IF there exists immediate duplicate of first three letters in the word Remove the first three letters and Go to 7 ELSE IF the word is a double word Return the first word & Go to 7 ELSE Go to 9 9. IF the word ends with any of the Suffixes in the Suffixes list Remove the suffix and CHECK Dictionary IF the word exist Go to 7 ELSE IF the word has any infix rule Remove the infix and Go to 7 ELSE Go to 7 ELSE IF the word has any infix rule Remove the infix and Go to 7 ELSE Go to 7 10. IF the end of the text is not met Go to 3 ELSE Go to 11 11. Stop
  • 6. A Word Stemming Algorithm for Hausa Language DOI: 10.9790/0661-17362531 www.iosrjournals.org 30 | Page IV. Result and discussion As mentioned earlier, with a very limited research in stemming for Hausa, then the proposed system‟s outcome was judged by a linguistic expert in the language. The study used a collection of 2573 words from four different articles in Hausa Leadership newspaper to evaluate the system. Out of 2573 words, 966 words are detected as Stop words. The remaining words are 1607 and only 741 discrete words. Hence, 741 discrete words were used in the system on which 547 words were stemmed correctly. On the other hand, 110 and 84 words were over-stemmed and under-stemmed respectively as stemming error. The system has an accuracy of 73.8%. Table 1 provides a summary of this evaluation. Table 1: Highlights of the System Outcome Words Expected Output System Output (Stemmed Words) Remarks asibitin (the hospital) asibiti asibiti Stemmed properly musulmai (Muslims) musulmi musulmi Stemmed properly sa‟o‟i (hours) sa‟a sa‟a Stemmed properly sojojin (the soldiers) soja soja Stemmed properly shugabannin (the leaders) shugaba shugaba Stemmed properly wayoyin (the phones) waya waya Stemmed properly kare (dog) kare ka Over-stemmed kazanta (mess) kazanta kaza Over-stemmed kunne (ear) kunne ku Over-stemmed mummunan (the ugly) muni muna Under-stemmed sabuwar (the new) sabo sabu Under-stemmed The Table 1 above indicates that even with the use of the Hausa Dictionary, the system still encountered stemming error. When the system gets the word “kare” it is expected to return “kare” as the stem but the system applied the stemming rule and checked in the dictionary and found the output “ka” is valid. Same goes with the words kazanta with a stem “kaza” and “kunne” with “ku”. One way of solving this problem, though quite expensive, is to find all the words with this ambiguity and put them into an exceptional list. The problem of under-stemming has to do with the complexity of the Hausa morphology as it has so many inconsistent rules. Hausa verbs ending in „a‟ are the root forms, and all others are derived form. The rule is sometimes inconsistent as there are others (exceptional cases) with ending in „e‟, „o‟, „i‟, „u‟ as the root forms. Same goes with plural formation in Hausa nouns, endings in „a‟ are typically considered plurals. For example, the word “mata” (women) from “mace” (woman) and “yara” (boys) from “yaro” (boy), etc. But in some cases, endings in „u‟ are plurals, for instance, the word “kujeru” (chairs) from “kujera” (chair). Therefore, there are also singular words ending in „a‟. Hence, the stemming errors in this system are as a result of these inconsistencies. To overcome the mistakes, find all these exceptional cases and employ them in the system. V. Conclusion It is possible to develop Hausa morphological stemmer and improve it over time despite the scarcity of linguistic resources. The proposed system can efficiently stem Hausa word with accuracy of 73.8%. The system is quite good even though it lacks abundant exceptional cases. The results showed that Hausa stemming should not depend on a dictionary but rather on exempting some particular group of words from the process. Over- stemming and under-stemming with 26.2% are, often, due to this unavailability. Provision of such exceptions will enhance the efficiency of the system. Our future work includes the provision of more exceptional cases (Hausa proper nouns inclusive) as well as additional rules to help in improving the system. Also, future work will involve utilizing part of speech tagging. Part of speech tagging will be conducive in further dividing the Hausa stemming process into several modules such as noun and verb modules stemming. Acknowledgements The authors would like to extend appreciation of the support given by University Sultan Zainal Abidin (UniSZA) and the Kano State Government. Also, our gratitude to Dr. Tijjani Shehu (a lecturer, Department of Linguistics, Bayero University, Kano) for his tireless support in examining the results.
  • 7. A Word Stemming Algorithm for Hausa Language DOI: 10.9790/0661-17362531 www.iosrjournals.org 31 | Page References [1]. Lovins, Julie B. Development of a stemming algorithm. MIT Information Processing Group, Electronic Systems Laboratory, 1968. [2]. Frakes, W. B. "Introduction to information storage and retrieval systems." Space 14 (1992): 10. [3]. Abdullah, Syed, Engku Fadzli Hasan, Syarilla Ahmad Saany, Hasni Hassan, Mohd Satar, and Siti Dhalila. "Simple Rules Malay Stemmer." In The International Conference on Informatics and Applications (ICIA2012), pp. 28-35. The Society of Digital Information and Wireless Communication, 2012. [4]. Pande, B. P., Pawan Tamta, and H. S. Dhami. "Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm." arXiv preprint arXiv:1312.4824 (2013). [5]. Dianati, Mohammad Hassan, Mohammad Hadi Sadreddini, Amir Hossein Rasekh, Seyed Mostafa Fakhrahmad, and Hossein Taghi- Zadeh. "Words Stemming Based on Structural and Semantic Similarity." Computer Engineering and Applications Journal 3, no. 2 (2014): 89-99. [6]. Smirnova, Mirra Aleksandrovna. The Hausa language: a descriptive grammar. Routledge & Kegan Paul, 1982. [7]. Newman, Paul. The Hausa language: An encyclopedic reference grammar. Yale University Press, 2000. [8]. Hafer, Margaret A., and Stephen F. Weiss. "Word segmentation by letter successor varieties." Information storage and retrieval 10, no. 11 (1974): 371-385. [9]. Adamson, George W., and Jillian Boreham. "The use of an association measure based on character structure to identify semantically related pairs of words and document titles." Information storage and retrieval 10.7 (1974): 253-260. [10]. Mcnamee, Paul, and James Mayfield. "Character n-gram tokenization for European language text retrieval." Information retrieval 7.1-2 (2004): 73-97. [11]. Frakes, William B. "Term conflation for information retrieval." Proceedings of the 7th annual international ACM SIGIR conference on Research and development in information retrieval. British Computer Society, 1984. [12]. Porter, Martin F. "An algorithm for suffix stripping." Program 14, no. 3 (1980): 130-137. [13]. Lessa, Lemma. "Development of stemming algorithm for wolaytta text.", master‟s thesis, Department of Information Science, Faculty of Informatics, Addis Ababa University, Ethiopia (2003). [14]. Khan, Sajjad Ahmad, Waqas Anwar, Usama Ijaz Bajwa, and Xuan Wang. "A Light Weight Stemmer for Urdu Language: A Scarce Resourced Language." In 24th International Conference on Computational Linguistics, p. 69. 2012. [15]. Abdoulaye, Mahamane L. "Aspects of Hausa morphosyntax in role and reference grammar." PhD diss., State University of New York at Buffalo, 1992. [16]. Al-Hassan, Bello SY. "Transfixation in Hausa: A Hypothetical Analysis." Studies of the Department of African Languages and Cultures 45 (2011).