SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
DOI : 10.5121/ijnlc.2014.3202 11
DEVELOPEMNT AND EVALUATION OF A
WEB BASED QUESTION ANSWERING
SYSTEM FOR ARABIC LANGUAG
Heba Kurdi, Sara Alkhaider, Nada Alfaifi
Department of Computer Science
Al Imam Muhammad Ibn Saud Islamic University
Riyadh, SA
ABSTRACT
Question Answering (QA) systems are gaining great importance due to the increasing amount of web
content and the high demand for digital information that regular information retrieval techniques cannot
satisfy. A question answering system enables users to have a natural language dialog with the machine,
which is required for virtually all emerging online service systems on the Internet. The need for such
systems is higher in the context of the Arabic language. This is because of the scarcity of Arabic QA
systems, which can be attributed to the great challenges they present to the research community,including
theparticularities of Arabic, such as short vowels, absence of capital letters, complex morphology, etc. In
this paper, we report the design and implementation of an Arabic web-based question answering
system,which we called “JAWEB”, the Arabic word for the verb “answer”. Unlike all Arabic question-
answering systems, JAWEB is a web-based application,so it can be accessed at any time and from
anywhere. Evaluating JAWEBshowed that it gives the correct answer with 100% recall and 80% precision
on average. When comparedto ask.com, the well-established web-based QA system, JAWEBprovided 15-
20% higher recall.These promising results give clear evidence that JAWEB has great potential as a QA
platform and is much needed by Arabic-speaking Internet users across the world.
KEYWORDS
Question Answering system, Natural Language Processing , Arabic language tools.
1.INTRODUCTION
Question-answering is a challenging task in general. It involves state-of-the-art techniques in
various fields, such as Information Retrieval (IR), Natural Language Processing (NLP), Artificial
Intelligence (AI) and software technologies. The main goal of a question-answering system, is to
give a precise answer to users’ queries in natural language [1] [2]. This has the great advantage of
helping users to get desired answers without searching large pools of information. Unlike search
engines, such as Google and Yahoo, which allow a user to retrieve webpages or documents that
are partially relevant to given keyword(s), while leaving the job of excluding irrelevant hyperlinks
and locating desired passages to users, QA systems provide the user with highly relevant
information in answer to their questions at the passage or sentence level [3]. QA systems
categorize questions into three main groups: first, fact-seeking questions, i.e. factoid questions,
asking about (who, when, where, what and how much) which relate to different types of entities:
person, location, organization, time and quantity. This type of question is the most common form
of questions and therefore is the focus of this work. Second is list questions (e.g. List all the
countries of the European Union), asking about multiple pieces of information with a common
relation between them. Third is why-type questions (e.g. Why is the sky blue?) which seeks
explanations of causes for a certain phenomenon or event. The second and third categories are
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
12
less common in QA systems and require more advanced NLP and AI techniques and were
therefore excluded from this study.
Based on the knowledge domain, QA systems classify questions aseither closed-domain
questions, which ask about information related to a specific domain (for example medical or
sport, etc.), or open-domain questions which ask about information in any knowledge discipline.
They require wider knowledge and advanced techniques to extract the most relevant answers [4].
The latter is the focus of this paper.
A QA system consists of three main components: a question-classification component, an
information-retrieval component, and an answer-extraction component. The question-
classification component playsa primary role as it categorizes questions based on their keywords:
who, when, where, what and how much. The information-retrieval componentsearches for
relevant answers in information sources by looking for matching keywords. Finally, the answer-
extraction componentselects the most relevant answers and ranks them accordingly.The quality of
a QA system heavily depends on the effectiveness of this module as it makes the decision about
the final list of answers to be presented to the user [5] [6].
Although the systems have common core components ,they differ vastly in the techniques utilized
by each component. This can be attributed mainly to the supported language(s) by the system,
since each language has special features that affect how it should be analyzed, searched and
retrieved.
Arabic is among the most popular languages with nearly 300 million speakers across the globe. It
is a Semitic language. These are well known for their non-concatenative morphology, in which
roots consist of an isolated set of constants, rather than syllables or words. This feature among
many others has rated NLP and IR as grand challenging tasks for the Arabic language, despite
some attempts such as [7]-[15]. Consequently, the huge Arabic content on the Internet is still
underutilized[16]. Important emerging applications, question-answering systems for online
services in particular, are simply not available in Arabic, despite their abundance in many other
languages, especially English and other Latin-based languages .The challenging features of the
Arabic language ,sacristy of Arabic QA systems and lack of web-based Arabic QA systems are
the main drivers for this paper. It adds several contributions to this important field, including
comprehensively surveying Arabic QA systems and a comparing between them, developing and
evaluating of JAWEB, the first Arabic web-based QA system, and extending the Arabic Corpus
provided in [30].
The remaining of this paper is organized as follows: In section 2, the challenges of the Arabic
language are highlighted. Section 3 reviews related work in QA systems. The system architecture
and functional components of JAWEB are introduced in section 4. In section 5, the experimental
results are presented and discussed. Finally, section 6 concludes the paper and indicates future
work.
2.CHALLENGES OF ARABIC LANGUAGE
Arabic is highly inflectional and derivational. This results in sparseness of terms in text, which
leads to inefficiency in many statistical IR and NLP techniques. The absence of diacritics in
Modern Standard Arabic also adds a lot of ambiguity to Question Analysis and Answer
Extraction [16].
Many aspects are involved in the slow progress of Arabic NLP and IR, compared to the
accomplishments in English and other languages. These aspects ,as highlighted in [17], include:
1. Arabic has a complex morphology. It is highly derivational and inflectional, which extremely
complicates morphology analysis.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
13
• Arabic is a derivational language, to find a word in an Arabic dictionary, we start by
extracting the root and then we search for the root in the dictionary. The word
derivation is done by adding affixes (prefix, infix, or suffix) to the root, according to
several patterns [18] [19], Figure 1 shows an example of that.
Figure 1 An example of Arabic Language derivation
• Arabic is an inflectional language, which means that the construction of a word
involves finding the root and adding affixes (prefix, infix and suffix) to it, as
illustrated in Figure 2.
Figure 2an example of Arabic words composition
2. The correct meaning of an Arabic word can be ambiguous in the absence of diacritical marks
(short vowels);so a written word may hold different meanings as illustrated in Figure 3.
3. Arabic does not use capital letters which makes differentiating between named entities and
other words difficult, as shown in Figure 4.
Disconnected َ َ‫ـــ‬َ
Is disconnected ِ ُ‫ـــ‬َ
So connect َِ‫ــــ‬َ
Disconnection, chapter, semester ْ َ‫ــــ‬ٌ
Detail (verb) َ‫ــــ‬ْ
Disconnected parts َ ِ‫ــــ‬
Figure 3: An example of the effect of Arabic diacritics in the meaning of a word
Figure 4. An example of absence of capital letters in Arabic
4. Numerals are written form left to right, while alphabets are written from right to left. This
makes editing Arabic text difficult when both numbers and letters are presented on the same line.
5. The most used encodings for Arabic text, UTF-8 and Unicode, present many problems when
processing Arabic texts.
6. Lack of Arabic corpora, lexicon and electronic dictionaries.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
14
3.RELATED WORK
Question-Answering systems present a good solution for textual information retrieval and
knowledge sharing and discovery. This is why a large number of QA systems have been
developed in various languages. Some languages, such as English, are better served than others,
such as Arabic. This might be related to language features and the maturity of research in the
countries speaking it. This section surveys QA systems for English and for Arabic and compares
between them.
A. For Latin languages
Due to the popularity, importance and features of the English language, tens of QA systems are
available in English. Extensive surveys are available in [3] [20]. Here we highlight some of the
most prominent work on this area.
• QALC [21] is the first QA system developed for English in the Text REtrieval Conference
(TREC) evolution campaign in 1999. It providesanswers to English factoid questions based on
syntactic and semantic analysis.
• QRISTAL [22] is a multilingual QA system (English, French, Portuguese, Italian and Polish). It
retrieves answers from a local database or from the web, based on named entities’ recognition and
conceptual and thematic analysis.
• WebQA [23] is a web based QA system,which uses the template-mapping technique to define
the question type. It retrieves passages from Google search engine and employs a clustering
technique to extract multiple answer blocks from different web pages.
• Ask.com [24], originally known as Ask Jeeves, is a famous web-based question-answering
system that supports English, Arabic and many other languages. It allows users to get answers to
questions expressed in everyday, natural language, as well as by traditional keyword searching. It
also supports math, dictionary, and conversion questions.
B. For Arabic languages
The situation is less bright for the Arabic language, despite of its wide spread. Although research
in the field of Arabic QA systemshas already started [7]-[15],it is slow progressing and has
limited results. Computerized tools and resources in general are lacking in Arabic [16], which has
reflected negatively on the number of Arabic QA systems. The list includes:
• AQSA (1993) [25] seems to be the first QA system in the language. It is a knowledge-based
QA system that extracts answers from structured data only. It uses the frames technique to
present the knowledge fromthe radiation domain. However,no published evaluation is
available for the system.
• QARAB (2004) [26]is a stand-alone (non-web-based)QA systems that uses IR and NLP
techniques to extract answers from a collection of Arabic newspaper texts. It provides
answers to factoid questions but does not support other types of questions, such as how or
why.
• ArabiQA (2007) [27] is a stand-alone Arabic QA system that deals with factoid
questions,using Named Entity Recognitiontechniques and Java Information Retrieval System
(JIRS) for Arabic text. It is designed specifically for Arabic factoid questions. However,
system implementation has not been completed yet.
• QASAL (2009) [28] is a stand-alone Arabic QA system for factoid questions which uses the
NooJ platform [29] as a linguistic development environment. The system takes advantage of
some linguistic techniques from IR and NLP to process Arabic text documents containing
answers to factoid questions. Published work based on this systems does not include
experimental resultsor performance metrics. The overall functionality of the system is limited
to the amount of available tools developed for the Arabic language by the NooJ Arabic
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
15
modules.
• ArQA (2011) [17] is a stand-alone QA system that provides answers to factoid questions
expressed in Arabic. This system has a pipeline architecture with four modules: question
processing, passage retrieval, answer extraction and answer validation modules. Each module
is the result of combining several IR and NLP techniques and tools to improve validity of
returned answers.
• AQuASys (2011) [30]: a stand-alone Arabic QAsystem for factoid questions. It
extensivelyutilizes NLP techniques to analyses questions and retrieves answers from an
Arabic corpus that has been developed by the authors. Retrieved answers are scored and
presented based on their relevance.
Table1 illustrates the differences between some state-of-art QA systems in English and Arabic
languages. Based on the table it is clear that web-based Arabic QA systems are lacking.
Table 1: comparison between QA Systems
3.JAWEB ARCHITECTURE
JAWEBis an Arabic web-based QA system that focuses on factoid questions. It accepts questions
related to any named entity,including person, location, organization, time etc.. Afteranalyzing the
question, important information is extracted to retrieve the most relevant answers from an Arabic
corpus. The system is composed of fourcomponents: user interface, question analyzer, passage
retrieval and answer extractor. The user interface runs at the client side while all other modules
run at the server side.As illustrated in Figure 5, each component is structured of several modules
with a distinct task carried by each.
Main Features
QALC
QRISTAL
Ask.com
QARAB
ArabiQA
ArQA
QASAL
AQuASys
JAWEB
Web-based
system
× × √ × × × × × √
Retrieves answers
from a corpus
√ × × √ √ √ √ √ √
Retrieves answers
from the web
× √ √ × × × × × ×
Natural language
processing tools
√ √ √ √ × √ √ √ √
Named entity
recognition
× √ √ × √ √ √ × ×
Answers factoid
questions
√ × √ × √ √ √ √ √
Answers Open
domain questions
√ √ √ × √ × √ √ √
Supports multiple
languages
× √ √ × × × × × ×
Supports Arabic
language
× × √ √ √ √ √ √ √
Provides Short
answers block
√ √ × √ √ √ √ √ √
Measures answer
precision
- √ × - √ √ - √ √
Measures answer
recall
- √ √ - √ √ √ √ √
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
16
Figure 5: JAWEB system detailedarchitecture
• User Interface
The user interface is implemented as a simple webpage with an input form that accepts Arabic
factoid questions expressed in natural language. The question handler module validates entered
questions and passes them to the server side. Later on, when the answers areretrieved from the
corpus, the answer-viewer module formats and presents themin descendingorder of relevance; the
desired answer is displayed at the top,followed by a list of candidate answers.
• Question-Analyzer:
This component aims at identifying the type of the question. It has five modules:tokenizer,
answer-type detector, question keyword extractor, extra keywords generator and question words
stemmer. The user enters a question in a web-based form at the client side.The question is sent to
the server side,where the tokenizer breaks the question into separate words.Afterwards, the
answer-type detector identifies the category of the answer, which can be time, organization,
person, location etc., based on the interrogative nouns ( who, what, ‫أ‬ where and when),as
illustrated in Table2. The question keywords are the rest of the words in the question, after
excluding stop-words and interrogative nouns;they are identified by the question keywords
extractor module. The extra keywords generatorproducessynonyms of the keywords to expand the
range ofrelated answers. For example, for a question like: ‫ا‬ (When has Tunisia
become independent? ) the extra keywords are( ‫م‬ , year , , month etc.). Extracted
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
17
keywords are stemmed, if necessary, so affixes are removed by the question keywords stemmer
module to ease computing similarity with the keywords in the extracted answers later. The high-
level algorithm of the Question Analyzer is illustrated in Alg. 1.
Table2: Example of question interrogative nouns and expected answer type
Expected Answer type interrogative nouns
Time when
Location ‫أ‬ where
Object what
Person who
Quantity ‫آ‬ how much
Quantity ‫آ‬ how many
• AnswersRetriever:
This component searches for potential answers and retrieves them from the corpus. It locates all
sentences that contains a pattern that matches any word from the list of question keywords and
extra keywords, as shown in Alg. 2.
• Answers Extractor
This component is responsible for selecting the most relevant answers from the potential answers
list. It is composed of three modules. The first module isthe answer keywords stemmer whichgets
roots of the keywords in theretrieved answer to ease checking similarity between a user question
and potential answers, a taskcarried out by the second module, which is the answer-similarity
checker. Similarity is measured by counting the number of matching keywords between the
question and each retrieved answer. Based on this, the third module, answers ranker,sorts answers
in descending order of their relevance.The most relevant answer is considered as the desired
answer while the rest of answers are candidate answers. The main steps carried by this module are
illustrated in Alg. 3.
Alg. 2 Potential Answer Retrieval
Input: question type, list of stemmed question keywords and extra keywords
Output: list of potential answers
n = 0 // no. of potential answers
While not endOfCorpus
potentialAnswer[n]= locateMatchingSentence (questionKeywords, extraKeywords)
n++
Alg. 1 Question Analyzer
Input: user question in natural language
Output: type of expected answers, stemmed question keywords and extra
keywords
tokenVector = tokenizer (userQuestion)
interrogativeNoun = tokenVector.interrogativeNoun ();
If (interrogativeNoun == " ‫)"أ‬ // where
Then{ answerExpectedType = "Location"
If (interrogativeNoun == " ") // who
Then{ answerExpectedType = "Person"
If (interrogativeNoun == " ") // when
Then{ answerExpectedType = "Time"
If (interrogativeNoun == " ") // when
Then{ answerExpectedType = "Thing"
If (interrogativeNoun ==" ‫)"آ‬ // how much/many
Then{ answerExpectedType = "Quantity"
questionKeywords = tokenVector.questionKeywords ();
stemmedQuestionKeywords = stemmer (questionKeywords)
extraKeywords = extraQuestionKeyword.Generator (questionKeywords) }
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
18
4.IMPLEMENTATION AND TESTING
JAWEB has been built based on AQuASys [30] and as a natural extension to it.It provides a web
interface to the system, anadditional support for Arabic language presentation in web browsers,
an extended corpus as well as an extensive evaluation and testing framework.The user interface
was developed as a JSP webpage that acceptsArabic factoid questions. It provides a user-friendly
interface where a usercan type his/her questions. The question should begin with an interrogative
noun:
Dreamweaver, an Adobe proprietary web development application, was used to design the
interface. The servlet and java classes on the server side were implemented usingNetBeans IDE.
The server was a GlassFish Server 3.1.2,which is an open-source application server project started
by Sun Microsystems for the Java EE platform and now sponsored by Oracle Corporation.This
server ran on 2.50 GHz CPU and an internal memory of 4 GB.
An extended version of the Arabic corpus developed by [30], containing 39,660 words with size
of 457 KB,was used as the information pool to retrieve answers.The Arabic Khoja's stemmer [31]
was adopted for keywords stemming.
The system was tested thoroughly to insure correctness of obtained results. The following
example illustrates the main steps carried out by the system to answer the Arabic factoid question:
“when did Ibn Khaldun die?”
1) In JAWEB webpage, illustrated in Figure 6, the user typed the question in the textbox:
Figure 6: Jaweb system input interface
2) The user clickedthe " " button,sent the question to the sever side, where the question
analyzer received it. The following processes were done:
Alg. 3 AnswerExtractor (potentialAnswer [],stemmedQuestionKeywords)
Input:list of potential answers,stemmed question keywords
Output: list of extracted answers ranked in descending order of similarity
n = 0 // no. of potential answers
Repeat
stemmedAnswerKeywords = stemmer (potentialAnswer [n])
similarity = noOfSimilarWords (stemmedQuestionKeywords,
stemmedAnswerkeywords)
potentialAnswer [n]).similarity =similarity
sortDescending (potentialAnswer [])
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
19
a. Question tokenization
b. Question keywords extraction
c. Question keywords Stemming
d. Answer type detection
e. Extra keywords generation
3) The system located all answers that containany keyword or extra keyword from the corpus, as
shown in Figure 7.
Figure 7: Retrieved answer based on extra keyword
4) In each retrieved answer,
a. Keywordswere stemmed.
b. Similarity was calculatedby counting how many keywords are contained in each
answer.
c. Answers were ranked based onsimilarity.
5) The system displayed the answers in order, based on similarity. The answer with highest rank
was placed at the top, as shown in Figure 8.
Time
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
20
Figure 8: View the answers based on the rank
5.PERFORMANCE EVLAUATION
6.1 Performance Measures
The end objective of JAWEB is to develop a web-based QA system that answers Arabic
factoid questions with high accuracy and speed. To evaluate this system,three performance
measures were considered:
• RECALL [32]: the ratio of the number of relevant sentences retrieved to the total number
of relevant sentences in the corpus. It is usually expressed as a percentage and
calculated as:
RECALL= A/A+B *100% (1)
Where
A: number of retrieved relevant answers, B: number of relevant answer not retrieved
• PRECISION[32]: the ratio of the number of relevant sentences retrieved to the total
number of sentences retrieved. It is usually expressed as a percentage and calculated as:
PRECISION= A/A+C *100% (2)
Where
A: number of retrieved relevant answers, C: number of retrieved irrelevant answers
• RESPONSE TIME: the time from when a user hits the search button until aresponse is
displayed on the browser.
6.2 System Performance
We ran tens of experimentswith various type of factoid questions. The results of recall, precision
and response time are presented in Figure9, Figure 10 and Figure 11.
As illustrated in Figure 9, The RECALL of JAWEB was always an incredible 100%.This is
because it was always able to return all relevant answers in the corpus successfully. This means
that B, the number of relevant answer not retrieved, was maintained at zero.It is also important to
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
21
note that the first retrieved answersof all questions were correct and accurate,which can be
attributed to the use of the stemmer(in both question and answer modules), the extra keywords
generatorand the similarity checker which has resulted in accurate ranking of retrieved answers.
Figure 10 shows that the average PRECISIONwas 80%,aremarkable number, considering the
limited size of our corpus. Even the “how much” questions, which had the least precision, had a
percentage no lower than 70%. “What” and “how many”, the most straightforward questions,
received the most precise answers. It should be noted these results are also highly dependent
onthe amount of relevant and irrelevant information available in the corpus that relates to each
question.
In Figure 11, the averageRESPONSE TIMEis calculated in nanosecondsand presented for each
type of questions. The average response time was 108.2 nanoseconds. “How much” questions
consumed the longest response time, while questions starting with “who”took the least.This
makes sense, as the latter requires less processing, since searching for people’s namesis usually
straight-forward. However, the RESPONSE TIMEin general depends on the corpus size and the
power of the server machine.
Figure 9: Recallof different types of questions
Figure 10: Precision of different types of questions
0
20
40
60
80
100
when where who what how
much
how
many
Recall(%)
Question Type
0
20
40
60
80
100
when where who what how
much
how
many
Precision(%)
Question Type
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
22
Figure 11: Response time of different types of questions
6.3 Benchmarking
To have clear insight into JAWEB performance, we used a well-established QA system that
supports the Arabic Language, Ask.com [10], as a benchmark. Ask.com is a famous web-based
question answering system with multilingual support. Itaccepts colloquially-expressed questions
and retrieves hyperlinks to webpages that contain similar keywords to those in the questions.
We ran several experiments with each of the supported type of factoid questions to get answers
form Ask.com and JAWEB. Snapshots of answers to some example questions, listed in Table 3,
are presented in Figure 12 –Figure 17. All figures illustrate that JAWEB has consistently given
the right and most accurate answer as the first responseswhich can be attributed to the use of the
stemmer (in both question and answer modules), the extra keywords generator and the similarity
checker which has resulted in accurate ranking of retrieved answers . In contrast, Ask.com
provided the correct answer in the second or third hyperlinks.
Comparisons between Ask.com and JAWEB in terms ofRECALL, PRECISION and RESPONSE
TIMEare depicted in the graphs in Figure18, Figure 19 and Figure 20, respectively.It is important
to note that having no access to the knowledge base of Ask .com and in order to be able to
calculate the RECALL and PRECISION, we have only considered the first five retrieved pages,
which may slightly affect the accuracy of the experiments.
Figure 18 shows thatRECALL of JAWEB was alwayshigher than that of Ask.comand maintained
at 100%, as the formerhas successfully retrieved all relevant answers from the corpus. The
JAWEBPRECISIONwas admittedly less than ask.com, as shown in Figure 19,but it should be
noted that it performed as well as the famous website, scoring over 90% in precision, despite the
large difference in corpus sizes. In addition, even at its worst, for a question type that ask.com had
the lowest precision too, JAWEB was not less precise than 70%. Regardless, the precision can
easily be improved as the project launches, when feedback from users is attained and the corpus is
grown in size and capacity. We believe that this is clear evidence that JAWEB has great potential
as a QA platform and is much needed by Arabic-speaking Internet users across the world.
Naturally, ask.com wasfaster than our system,as illustrated in Figure 20, thanks to the use of high-
power server CPUs for ask.com, as opposed toJAWEB 2.50 GHz. Again, if the project launches,
the response time is expected to improve dramatically, when servers that are more powerful are
provided.
Table 3: Experimental questions
No. Type Arabic English
Q1 Who ‫؟‬ ‫ه‬ Who is Muhammad Tangier?
Q2 When ‫؟‬ ‫د‬ ‫ا‬ ‫ا‬ ‫ا‬ ‫ت‬ When was Kingdom of Saudi Arabia united?
Q3 What ‫؟‬ ‫ا‬ ‫ت‬ ‫ا‬ ‫ه‬ ‫ا‬ ‫ه‬ What are Egyptianpyramids?
Q4 Where ‫د‬ ‫ا‬ ‫ا‬ ‫ا‬ ‫أ‬‫؟‬ Where is the Kingdom of Saudi Arabia located?
Q5 How much ‫؟‬ ‫ر‬ ‫ا‬ ‫ة‬ ‫ا‬ ‫ارة‬ ‫در‬ ‫آ‬ How much is the temperature oftheEarth’scrust?
Q6 How many ‫ض؟‬ ‫ا‬ ‫ن‬ ‫د‬ ‫آ‬ How many residents are there in Riyadh?
90
95
100
105
110
115
120
125
when where who what how
much
how
many
Responsetime(Nano
sec.)
Question Type
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
23
Figure 12.a: JAWEB answers for a who question
Figure 12.b: Ask answers for a who question
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
24
Figure 13.a: JAWEB answers for a when question
Figure 13.b: Ask.com Answers for when question
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
25
Figure 14.a: JAWEB answers for a what question
Figure 14.b: Ask answers for a what question
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
26
Figure 15.a: JAWEB answers for where question
Figure 15.b: Ask answers for a where question
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
27
Figure 16.a: JAWEB answers for a how much question
Figure 16.b: Ask answers for a how much question
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
28
Figure 17.a: JAWEB answers for a how many question
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
29
Figure 17.b: Ask answers for a how many question
Figure 18: Comparison of the Recall
0
20
40
60
80
100
when where who what how much how many
Recall(%)
Question Type
Ask JAWEB
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
30
Figure 19: Comparison of the Precision
Figure 20: Comparison of the performance-Time
6.CONCLUSION & FUTURE WORK
JAWEB is a web-based Arabic question answering application system. It takes native Arabic
questions from the end user and processes them in the server. The processing consists of three
modules: question analysis, passage retrieval, and answer extraction.JAWEB finds and extracts
accurate answers for the user. It also retrieves the most relevant potential answers and ranks them
on the web page.
The evaluation experiment shows the high recall performance of the system, which is attributable
to stemming. In the future, we expect to improve user interaction to incorporate user feedback for
more precise results. We also plan on using the Arabic Named Entity Recognition to provide
more accurate answers. In addition, we will consider the "why" and "how" questions, which
require deep linguistic and semantic analysis.
0
20
40
60
80
100
when where who What how much how many
Precision(%)
Question Type
Ask JAWEB
0
20
40
60
80
100
120
when where who What how much how many
ResponseTime(Microsec.)
Question Type
Ask JAWEB
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
31
ACKNOWLEDGMENT
We would like to thank Dr. Sman Bekhti and his colleagues fortheir valuable support and
comments based on their Arabic QA systemAQUASYS. We would also like to thank Dr.Shereen
Khoja for providing us with the Arabic stemmerpackage that she has developed.
REFERENCES
1. R. Alshalabi “Experimenting with a Question Answering System for the Arabic Language”,
Computers & the Humanities, Vol. 38, pp. 397-415, Nov 2004.
2. K. Arai and A. Handayani “Question Answering System for an Effective Collaborative Learning”,
International Journal of Advanced Computer Science and Applications, Vol. 3, pp.60-64,
3. A. Allam and M. Haggag, “The Question Answering Systems: A Survey,” International Journal of
Research and Reviews in Information Sciences (IJRRIS), Vol. 2, No. 3, September 2012.
4. D. Hai, L. Kosseim, “The Problem of Precision in Restricted-Domain Question Answering. Some
Proposed Methods of Improvement”, In Proceedings of the ACL 2004 Workshop on Question
Answering in Restricted Domains, Barcelona, Spain, Publisher of Association for Computational
Linguistics, July 2004, PP.8-15.
5. A. Lamjiri, “A Syntactic Candidate Ranking Method for Answering Non-copulative Questions,”
Ph.D. thesis, Concordia University, 2007.
6. P. Walke, S. Karale, “Implementation approaches for various categories of question answering
system,” In Proceedings of IEEE Conference on Information & Communication Technologies
(ICT), 2013, pp.402,407, 11-12 April 2013.
7. R. Alshalabi,“Pattern-based stemmer for finding Arabic roots,”Information Technology Journal, Vol.
4, pp. 38-43, 2005.
8. R. Grishman, B. Sundheim,“Message Understanding Conference-6: A Brief History,”In Proceedings
ofInternational Conference on Computational Linguistics, 1996.
9. A. Borthwick,“A Maximum Entropy Approach to Named Entity Recognition”. PhD thesis, New
York University, 1999.
10. Y. Benajiba, P. Rosso , J. Ruiz, “ANERsys: An Arabic Named Entity Recognition system based on
Maximum Entropy”, In Proceedings of CICLing, 2007
11. M. Maamouri, A. Bies, T. Buckwalter, “The Penn Arabic Treebank: Building a Large-scale
Annotated Arabic Corpus,”In Proceedings ofArabic Language Resources and Tools, 2004.
12. S. Mesfar,“Standard Arabic formalization and linguistic platform for its analysis,”In Proceedings of
The Challenge of Arabic for NLP/MTConference, London, England, pp. 84-95, 2006.
13. A. Kabbaj,K. Bouzoubaa,“Amine Platform Page on SourceForge”, http://amine-
platform.sourceforge.net/, checked Aug.2nd, 2013
14 .JM Gómez, D. Buscaldi, P. Rosso, E. Sanchis, “JIRS Language-Independent Passage Retrieval
System A Comparative Study,”In Proceedings of 5th International Conference on Natural Language
Processing, ICON-2007, Hyderabad, India, January 4-6, 2007
15. W. Black, S. Elkateb, H. Rodriguez, M. Alkhalifa, P. Vossen, A. Pease, C. Fellbaum“Introducing the
Arabic WordNet project,” In Proceedings of3rd International WordNet Conference (GWC-06),
2006.
16. A. Ezzeldin, M. Shaheen “A Survey of Arabic question answering: challenges, tasks, approaches,
tools, and future trends”, In Proceedings ofThe International Arab Conference on Information
Technology, 2012, pp. 280-287.
17. O. Badawy, M. Shaheen and A. Hamadene “ARQA High-Performance Arabic Question Answering
System”, In Proceedings ofArabic Language Technology International Conference, 2011, pp. 129-
136.
18. P.Brihaye,“Technical principles of the Morphological Analysis of AraMorph,”
http://www.nongnu.org/aramorph/english/principles.html, checked Aug.2nd, 2013
19. H. Harmanani, W. Keirouz, S. Raheel, “A Rule-Based Extensible Stemmer for Information Retrieval
with Application to Arabic,”International Arab Journal of Information Technology, Vol.3, No. 3,
2006
20. S. Kalaivani, K. Duraiswamy, “Comparison of Question Answering Systems Based on Ontology and
Semantic Web in Different Environment,” Journal of Computer Science, Vol. 8, No. 9, 2012): 1407.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014
32
21. O. Ferret, B. Grau, M. Huraults-Plantet, “Finding an answer based on the recognition of the issue
focus,”In Proceedings ofTREC-10.
22. D. Laurent, P. Séguéla, S. NègreCross,“Lingual Question Answering using QRISTAL for CLEF
2006, Lecture Notes in Computer Science, Vol. 4730, 2007, pp. 339-350.
23. S. Parthasarathy, J. Chen “A Web-based Question Answering System for Effective e-Learning,” In
Proceedings ofIEEE International Conference on Advanced Learning Technologies, 2007, pp. 142-
146.
24. Ask.com, checked Aug. 2nd, 2013.
25. F. Mohammed, K. Nasser, H. Harb “A knowledge-based Arabic Question Answering System
(AQAS),” In Proceedings ofACM SIGART Bulletin, 1993, pp. 21-33.
26. B. Hammo, H. Abu-Salem, S. Lytinen. “QARAB: A Question Answering System to Support the
Arabic Language”. In Proceedings ofthe workshop on computational approaches to Semitic
languages, 2002, pp. 55-65, Philadelphia.
27. Y. Benajiba, P. Rosso, A. Lyhyaoui “Implementation of the ArabiQA Question Answering System's
components.”, In Proceedings ofWorkshop on Arabic Natural Language Processing, 2nd Information
Communication Technologies Int. Symposium, ICTIS, 2007.
28. W. Brini, M. Ellouze, S. Mesfar, L. Belguith “An Arabic Question-Answering system for factoid
questionns,”In Proceedings of IEEE International Conference on Natural Language Processing and
Knowledge Engineering, 2009.
29. http://www.nooj4nlp.net/pages/introduction.html, checked may 6th, 2013.
30. S. Bekhti, A. Rehman, M. Al-Harbi, T. Saba “AQuASys: an Arabic Question Answering System
Based on Extensive Question Analysis and Answer Relevance Scoring,” International Journal of
Academic Research, Vol. 3, pp.45-54, July 2011.
31. S. Khoja, “Stemming Arabic Text,”" http://zeus.cs.pacificu.edu/shereen/research.htm, checked may
6th, 2013.
32. K. Arai, A. Handayani “Question Answering System for an Effective Collaborative
Learning,”International Journal of Advanced Computer Science and Applications, Vol. 3, pp.60-64,
2012.

Weitere ähnliche Inhalte

Was ist angesagt?

ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationIJECEIAES
 
Hybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic textsHybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic textsijnlc
 
Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...ijnlc
 
IRJET- Review of Chatbot System in Marathi Language
IRJET- Review of Chatbot System in Marathi LanguageIRJET- Review of Chatbot System in Marathi Language
IRJET- Review of Chatbot System in Marathi LanguageIRJET Journal
 
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...ijnlc
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMijnlc
 
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...IJECEIAES
 
Recognizing named entities in
Recognizing named entities inRecognizing named entities in
Recognizing named entities incsandit
 
Exploring the effects of stemming on
Exploring the effects of stemming onExploring the effects of stemming on
Exploring the effects of stemming onijaia
 
Named Entity Recognition using Tweet Segmentation
Named Entity Recognition using Tweet SegmentationNamed Entity Recognition using Tweet Segmentation
Named Entity Recognition using Tweet SegmentationIRJET Journal
 
IRJET- Review of Chatbot System in Hindi Language
IRJET-  	  Review of Chatbot System in Hindi LanguageIRJET-  	  Review of Chatbot System in Hindi Language
IRJET- Review of Chatbot System in Hindi LanguageIRJET Journal
 
Comparison Analysis of Post- Processing Method for Punjabi Font
Comparison Analysis of Post- Processing Method for Punjabi FontComparison Analysis of Post- Processing Method for Punjabi Font
Comparison Analysis of Post- Processing Method for Punjabi FontIRJET Journal
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...Syeful Islam
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageEditor IJCATR
 
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUEIMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUEcsandit
 
IRJET - Chat-Bot for College Information System using AI
IRJET -  	  Chat-Bot for College Information System using AIIRJET -  	  Chat-Bot for College Information System using AI
IRJET - Chat-Bot for College Information System using AIIRJET Journal
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
 
DEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAF
DEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAFDEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAF
DEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAFcsandit
 

Was ist angesagt? (20)

ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliteration
 
Hybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic textsHybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic texts
 
A017420108
A017420108A017420108
A017420108
 
Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...
 
IRJET- Review of Chatbot System in Marathi Language
IRJET- Review of Chatbot System in Marathi LanguageIRJET- Review of Chatbot System in Marathi Language
IRJET- Review of Chatbot System in Marathi Language
 
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
 
Ac04507168175
Ac04507168175Ac04507168175
Ac04507168175
 
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
 
Recognizing named entities in
Recognizing named entities inRecognizing named entities in
Recognizing named entities in
 
Exploring the effects of stemming on
Exploring the effects of stemming onExploring the effects of stemming on
Exploring the effects of stemming on
 
Named Entity Recognition using Tweet Segmentation
Named Entity Recognition using Tweet SegmentationNamed Entity Recognition using Tweet Segmentation
Named Entity Recognition using Tweet Segmentation
 
IRJET- Review of Chatbot System in Hindi Language
IRJET-  	  Review of Chatbot System in Hindi LanguageIRJET-  	  Review of Chatbot System in Hindi Language
IRJET- Review of Chatbot System in Hindi Language
 
Comparison Analysis of Post- Processing Method for Punjabi Font
Comparison Analysis of Post- Processing Method for Punjabi FontComparison Analysis of Post- Processing Method for Punjabi Font
Comparison Analysis of Post- Processing Method for Punjabi Font
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUEIMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
 
IRJET - Chat-Bot for College Information System using AI
IRJET -  	  Chat-Bot for College Information System using AIIRJET -  	  Chat-Bot for College Information System using AI
IRJET - Chat-Bot for College Information System using AI
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
 
DEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAF
DEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAFDEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAF
DEVELOPMENT OF TOOL TO PROMOTE WEB ACCESSIBILITY FOR DEAF
 

Andere mochten auch

Building a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemBuilding a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemijnlc
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicijnlc
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysisijnlc
 
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...ijnlc
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONijnlc
 
Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...ijnlc
 
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATIONHANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATIONijnlc
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textijnlc
 
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSINGCLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSINGijnlc
 
Chunking in manipuri using crf
Chunking in manipuri using crfChunking in manipuri using crf
Chunking in manipuri using crfijnlc
 
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELNERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELijnlc
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rulesijnlc
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...ijnlc
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...ijnlc
 
An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...ijnlc
 
Evaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymyEvaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymyijnlc
 
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
S ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAlS ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAl
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAlijnlc
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELSijnlc
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationijnlc
 
C. Porchietto Libero News.It 28.05.09
C. Porchietto Libero News.It 28.05.09C. Porchietto Libero News.It 28.05.09
C. Porchietto Libero News.It 28.05.09Claudia Porchietto
 

Andere mochten auch (20)

Building a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemBuilding a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl system
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabic
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
 
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...
 
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATIONHANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
 
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSINGCLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
 
Chunking in manipuri using crf
Chunking in manipuri using crfChunking in manipuri using crf
Chunking in manipuri using crf
 
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELNERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rules
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
 
An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...
 
Evaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymyEvaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymy
 
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
S ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAlS ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAl
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarization
 
C. Porchietto Libero News.It 28.05.09
C. Porchietto Libero News.It 28.05.09C. Porchietto Libero News.It 28.05.09
C. Porchietto Libero News.It 28.05.09
 

Ähnlich wie Developemnt and evaluation of a web based question answering system for arabic languag

Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCar-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCSCJournals
 
Diacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemDiacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemCSCJournals
 
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer IJCSIS Research Publications
 
Using linguistic analysis to translate
Using linguistic analysis to translateUsing linguistic analysis to translate
Using linguistic analysis to translateIJwest
 
Clustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language BrowsingClustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language Browsingkevig
 
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSCUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSgerogepatton
 
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSCUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSijaia
 
Customer Opinions Evaluation: A Case Study on Arabic Tweets
Customer Opinions Evaluation: A Case Study on Arabic Tweets Customer Opinions Evaluation: A Case Study on Arabic Tweets
Customer Opinions Evaluation: A Case Study on Arabic Tweets gerogepatton
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageIJERA Editor
 
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...Cemal Ardil
 
A New Concept Extraction Method for Ontology Construction From Arabic Text
A New Concept Extraction Method for Ontology Construction From Arabic TextA New Concept Extraction Method for Ontology Construction From Arabic Text
A New Concept Extraction Method for Ontology Construction From Arabic TextCSCJournals
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
Application of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsApplication of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsijcsa
 
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...ijmpict
 
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...ijmpict
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...ijnlc
 
Mediterranean Arabic Language and Speech Technology Resources
Mediterranean Arabic Language and Speech Technology ResourcesMediterranean Arabic Language and Speech Technology Resources
Mediterranean Arabic Language and Speech Technology ResourcesHend Al-Khalifa
 
04. 9990 16097-1-ed (edited arf)
04. 9990 16097-1-ed (edited arf)04. 9990 16097-1-ed (edited arf)
04. 9990 16097-1-ed (edited arf)IAESIJEECS
 
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...IJCI JOURNAL
 

Ähnlich wie Developemnt and evaluation of a web based question answering system for arabic languag (20)

Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCar-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
 
Diacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemDiacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval System
 
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
 
Using linguistic analysis to translate
Using linguistic analysis to translateUsing linguistic analysis to translate
Using linguistic analysis to translate
 
Clustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language BrowsingClustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language Browsing
 
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSCUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
 
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSCUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETS
 
Customer Opinions Evaluation: A Case Study on Arabic Tweets
Customer Opinions Evaluation: A Case Study on Arabic Tweets Customer Opinions Evaluation: A Case Study on Arabic Tweets
Customer Opinions Evaluation: A Case Study on Arabic Tweets
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
 
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
 
A New Concept Extraction Method for Ontology Construction From Arabic Text
A New Concept Extraction Method for Ontology Construction From Arabic TextA New Concept Extraction Method for Ontology Construction From Arabic Text
A New Concept Extraction Method for Ontology Construction From Arabic Text
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
Application of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsApplication of hidden markov model in question answering systems
Application of hidden markov model in question answering systems
 
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
 
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
THE CORRELATION BETWEEN ARABIC STUDENT’S ENGLISH PROFICIENCY AND THEIR COMPUT...
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
 
Mediterranean Arabic Language and Speech Technology Resources
Mediterranean Arabic Language and Speech Technology ResourcesMediterranean Arabic Language and Speech Technology Resources
Mediterranean Arabic Language and Speech Technology Resources
 
04. 9990 16097-1-ed (edited arf)
04. 9990 16097-1-ed (edited arf)04. 9990 16097-1-ed (edited arf)
04. 9990 16097-1-ed (edited arf)
 
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
 

Kürzlich hochgeladen

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Kürzlich hochgeladen (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Developemnt and evaluation of a web based question answering system for arabic languag

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 DOI : 10.5121/ijnlc.2014.3202 11 DEVELOPEMNT AND EVALUATION OF A WEB BASED QUESTION ANSWERING SYSTEM FOR ARABIC LANGUAG Heba Kurdi, Sara Alkhaider, Nada Alfaifi Department of Computer Science Al Imam Muhammad Ibn Saud Islamic University Riyadh, SA ABSTRACT Question Answering (QA) systems are gaining great importance due to the increasing amount of web content and the high demand for digital information that regular information retrieval techniques cannot satisfy. A question answering system enables users to have a natural language dialog with the machine, which is required for virtually all emerging online service systems on the Internet. The need for such systems is higher in the context of the Arabic language. This is because of the scarcity of Arabic QA systems, which can be attributed to the great challenges they present to the research community,including theparticularities of Arabic, such as short vowels, absence of capital letters, complex morphology, etc. In this paper, we report the design and implementation of an Arabic web-based question answering system,which we called “JAWEB”, the Arabic word for the verb “answer”. Unlike all Arabic question- answering systems, JAWEB is a web-based application,so it can be accessed at any time and from anywhere. Evaluating JAWEBshowed that it gives the correct answer with 100% recall and 80% precision on average. When comparedto ask.com, the well-established web-based QA system, JAWEBprovided 15- 20% higher recall.These promising results give clear evidence that JAWEB has great potential as a QA platform and is much needed by Arabic-speaking Internet users across the world. KEYWORDS Question Answering system, Natural Language Processing , Arabic language tools. 1.INTRODUCTION Question-answering is a challenging task in general. It involves state-of-the-art techniques in various fields, such as Information Retrieval (IR), Natural Language Processing (NLP), Artificial Intelligence (AI) and software technologies. The main goal of a question-answering system, is to give a precise answer to users’ queries in natural language [1] [2]. This has the great advantage of helping users to get desired answers without searching large pools of information. Unlike search engines, such as Google and Yahoo, which allow a user to retrieve webpages or documents that are partially relevant to given keyword(s), while leaving the job of excluding irrelevant hyperlinks and locating desired passages to users, QA systems provide the user with highly relevant information in answer to their questions at the passage or sentence level [3]. QA systems categorize questions into three main groups: first, fact-seeking questions, i.e. factoid questions, asking about (who, when, where, what and how much) which relate to different types of entities: person, location, organization, time and quantity. This type of question is the most common form of questions and therefore is the focus of this work. Second is list questions (e.g. List all the countries of the European Union), asking about multiple pieces of information with a common relation between them. Third is why-type questions (e.g. Why is the sky blue?) which seeks explanations of causes for a certain phenomenon or event. The second and third categories are
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 12 less common in QA systems and require more advanced NLP and AI techniques and were therefore excluded from this study. Based on the knowledge domain, QA systems classify questions aseither closed-domain questions, which ask about information related to a specific domain (for example medical or sport, etc.), or open-domain questions which ask about information in any knowledge discipline. They require wider knowledge and advanced techniques to extract the most relevant answers [4]. The latter is the focus of this paper. A QA system consists of three main components: a question-classification component, an information-retrieval component, and an answer-extraction component. The question- classification component playsa primary role as it categorizes questions based on their keywords: who, when, where, what and how much. The information-retrieval componentsearches for relevant answers in information sources by looking for matching keywords. Finally, the answer- extraction componentselects the most relevant answers and ranks them accordingly.The quality of a QA system heavily depends on the effectiveness of this module as it makes the decision about the final list of answers to be presented to the user [5] [6]. Although the systems have common core components ,they differ vastly in the techniques utilized by each component. This can be attributed mainly to the supported language(s) by the system, since each language has special features that affect how it should be analyzed, searched and retrieved. Arabic is among the most popular languages with nearly 300 million speakers across the globe. It is a Semitic language. These are well known for their non-concatenative morphology, in which roots consist of an isolated set of constants, rather than syllables or words. This feature among many others has rated NLP and IR as grand challenging tasks for the Arabic language, despite some attempts such as [7]-[15]. Consequently, the huge Arabic content on the Internet is still underutilized[16]. Important emerging applications, question-answering systems for online services in particular, are simply not available in Arabic, despite their abundance in many other languages, especially English and other Latin-based languages .The challenging features of the Arabic language ,sacristy of Arabic QA systems and lack of web-based Arabic QA systems are the main drivers for this paper. It adds several contributions to this important field, including comprehensively surveying Arabic QA systems and a comparing between them, developing and evaluating of JAWEB, the first Arabic web-based QA system, and extending the Arabic Corpus provided in [30]. The remaining of this paper is organized as follows: In section 2, the challenges of the Arabic language are highlighted. Section 3 reviews related work in QA systems. The system architecture and functional components of JAWEB are introduced in section 4. In section 5, the experimental results are presented and discussed. Finally, section 6 concludes the paper and indicates future work. 2.CHALLENGES OF ARABIC LANGUAGE Arabic is highly inflectional and derivational. This results in sparseness of terms in text, which leads to inefficiency in many statistical IR and NLP techniques. The absence of diacritics in Modern Standard Arabic also adds a lot of ambiguity to Question Analysis and Answer Extraction [16]. Many aspects are involved in the slow progress of Arabic NLP and IR, compared to the accomplishments in English and other languages. These aspects ,as highlighted in [17], include: 1. Arabic has a complex morphology. It is highly derivational and inflectional, which extremely complicates morphology analysis.
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 13 • Arabic is a derivational language, to find a word in an Arabic dictionary, we start by extracting the root and then we search for the root in the dictionary. The word derivation is done by adding affixes (prefix, infix, or suffix) to the root, according to several patterns [18] [19], Figure 1 shows an example of that. Figure 1 An example of Arabic Language derivation • Arabic is an inflectional language, which means that the construction of a word involves finding the root and adding affixes (prefix, infix and suffix) to it, as illustrated in Figure 2. Figure 2an example of Arabic words composition 2. The correct meaning of an Arabic word can be ambiguous in the absence of diacritical marks (short vowels);so a written word may hold different meanings as illustrated in Figure 3. 3. Arabic does not use capital letters which makes differentiating between named entities and other words difficult, as shown in Figure 4. Disconnected َ َ‫ـــ‬َ Is disconnected ِ ُ‫ـــ‬َ So connect َِ‫ــــ‬َ Disconnection, chapter, semester ْ َ‫ــــ‬ٌ Detail (verb) َ‫ــــ‬ْ Disconnected parts َ ِ‫ــــ‬ Figure 3: An example of the effect of Arabic diacritics in the meaning of a word Figure 4. An example of absence of capital letters in Arabic 4. Numerals are written form left to right, while alphabets are written from right to left. This makes editing Arabic text difficult when both numbers and letters are presented on the same line. 5. The most used encodings for Arabic text, UTF-8 and Unicode, present many problems when processing Arabic texts. 6. Lack of Arabic corpora, lexicon and electronic dictionaries.
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 14 3.RELATED WORK Question-Answering systems present a good solution for textual information retrieval and knowledge sharing and discovery. This is why a large number of QA systems have been developed in various languages. Some languages, such as English, are better served than others, such as Arabic. This might be related to language features and the maturity of research in the countries speaking it. This section surveys QA systems for English and for Arabic and compares between them. A. For Latin languages Due to the popularity, importance and features of the English language, tens of QA systems are available in English. Extensive surveys are available in [3] [20]. Here we highlight some of the most prominent work on this area. • QALC [21] is the first QA system developed for English in the Text REtrieval Conference (TREC) evolution campaign in 1999. It providesanswers to English factoid questions based on syntactic and semantic analysis. • QRISTAL [22] is a multilingual QA system (English, French, Portuguese, Italian and Polish). It retrieves answers from a local database or from the web, based on named entities’ recognition and conceptual and thematic analysis. • WebQA [23] is a web based QA system,which uses the template-mapping technique to define the question type. It retrieves passages from Google search engine and employs a clustering technique to extract multiple answer blocks from different web pages. • Ask.com [24], originally known as Ask Jeeves, is a famous web-based question-answering system that supports English, Arabic and many other languages. It allows users to get answers to questions expressed in everyday, natural language, as well as by traditional keyword searching. It also supports math, dictionary, and conversion questions. B. For Arabic languages The situation is less bright for the Arabic language, despite of its wide spread. Although research in the field of Arabic QA systemshas already started [7]-[15],it is slow progressing and has limited results. Computerized tools and resources in general are lacking in Arabic [16], which has reflected negatively on the number of Arabic QA systems. The list includes: • AQSA (1993) [25] seems to be the first QA system in the language. It is a knowledge-based QA system that extracts answers from structured data only. It uses the frames technique to present the knowledge fromthe radiation domain. However,no published evaluation is available for the system. • QARAB (2004) [26]is a stand-alone (non-web-based)QA systems that uses IR and NLP techniques to extract answers from a collection of Arabic newspaper texts. It provides answers to factoid questions but does not support other types of questions, such as how or why. • ArabiQA (2007) [27] is a stand-alone Arabic QA system that deals with factoid questions,using Named Entity Recognitiontechniques and Java Information Retrieval System (JIRS) for Arabic text. It is designed specifically for Arabic factoid questions. However, system implementation has not been completed yet. • QASAL (2009) [28] is a stand-alone Arabic QA system for factoid questions which uses the NooJ platform [29] as a linguistic development environment. The system takes advantage of some linguistic techniques from IR and NLP to process Arabic text documents containing answers to factoid questions. Published work based on this systems does not include experimental resultsor performance metrics. The overall functionality of the system is limited to the amount of available tools developed for the Arabic language by the NooJ Arabic
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 15 modules. • ArQA (2011) [17] is a stand-alone QA system that provides answers to factoid questions expressed in Arabic. This system has a pipeline architecture with four modules: question processing, passage retrieval, answer extraction and answer validation modules. Each module is the result of combining several IR and NLP techniques and tools to improve validity of returned answers. • AQuASys (2011) [30]: a stand-alone Arabic QAsystem for factoid questions. It extensivelyutilizes NLP techniques to analyses questions and retrieves answers from an Arabic corpus that has been developed by the authors. Retrieved answers are scored and presented based on their relevance. Table1 illustrates the differences between some state-of-art QA systems in English and Arabic languages. Based on the table it is clear that web-based Arabic QA systems are lacking. Table 1: comparison between QA Systems 3.JAWEB ARCHITECTURE JAWEBis an Arabic web-based QA system that focuses on factoid questions. It accepts questions related to any named entity,including person, location, organization, time etc.. Afteranalyzing the question, important information is extracted to retrieve the most relevant answers from an Arabic corpus. The system is composed of fourcomponents: user interface, question analyzer, passage retrieval and answer extractor. The user interface runs at the client side while all other modules run at the server side.As illustrated in Figure 5, each component is structured of several modules with a distinct task carried by each. Main Features QALC QRISTAL Ask.com QARAB ArabiQA ArQA QASAL AQuASys JAWEB Web-based system × × √ × × × × × √ Retrieves answers from a corpus √ × × √ √ √ √ √ √ Retrieves answers from the web × √ √ × × × × × × Natural language processing tools √ √ √ √ × √ √ √ √ Named entity recognition × √ √ × √ √ √ × × Answers factoid questions √ × √ × √ √ √ √ √ Answers Open domain questions √ √ √ × √ × √ √ √ Supports multiple languages × √ √ × × × × × × Supports Arabic language × × √ √ √ √ √ √ √ Provides Short answers block √ √ × √ √ √ √ √ √ Measures answer precision - √ × - √ √ - √ √ Measures answer recall - √ √ - √ √ √ √ √
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 16 Figure 5: JAWEB system detailedarchitecture • User Interface The user interface is implemented as a simple webpage with an input form that accepts Arabic factoid questions expressed in natural language. The question handler module validates entered questions and passes them to the server side. Later on, when the answers areretrieved from the corpus, the answer-viewer module formats and presents themin descendingorder of relevance; the desired answer is displayed at the top,followed by a list of candidate answers. • Question-Analyzer: This component aims at identifying the type of the question. It has five modules:tokenizer, answer-type detector, question keyword extractor, extra keywords generator and question words stemmer. The user enters a question in a web-based form at the client side.The question is sent to the server side,where the tokenizer breaks the question into separate words.Afterwards, the answer-type detector identifies the category of the answer, which can be time, organization, person, location etc., based on the interrogative nouns ( who, what, ‫أ‬ where and when),as illustrated in Table2. The question keywords are the rest of the words in the question, after excluding stop-words and interrogative nouns;they are identified by the question keywords extractor module. The extra keywords generatorproducessynonyms of the keywords to expand the range ofrelated answers. For example, for a question like: ‫ا‬ (When has Tunisia become independent? ) the extra keywords are( ‫م‬ , year , , month etc.). Extracted
  • 7. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 17 keywords are stemmed, if necessary, so affixes are removed by the question keywords stemmer module to ease computing similarity with the keywords in the extracted answers later. The high- level algorithm of the Question Analyzer is illustrated in Alg. 1. Table2: Example of question interrogative nouns and expected answer type Expected Answer type interrogative nouns Time when Location ‫أ‬ where Object what Person who Quantity ‫آ‬ how much Quantity ‫آ‬ how many • AnswersRetriever: This component searches for potential answers and retrieves them from the corpus. It locates all sentences that contains a pattern that matches any word from the list of question keywords and extra keywords, as shown in Alg. 2. • Answers Extractor This component is responsible for selecting the most relevant answers from the potential answers list. It is composed of three modules. The first module isthe answer keywords stemmer whichgets roots of the keywords in theretrieved answer to ease checking similarity between a user question and potential answers, a taskcarried out by the second module, which is the answer-similarity checker. Similarity is measured by counting the number of matching keywords between the question and each retrieved answer. Based on this, the third module, answers ranker,sorts answers in descending order of their relevance.The most relevant answer is considered as the desired answer while the rest of answers are candidate answers. The main steps carried by this module are illustrated in Alg. 3. Alg. 2 Potential Answer Retrieval Input: question type, list of stemmed question keywords and extra keywords Output: list of potential answers n = 0 // no. of potential answers While not endOfCorpus potentialAnswer[n]= locateMatchingSentence (questionKeywords, extraKeywords) n++ Alg. 1 Question Analyzer Input: user question in natural language Output: type of expected answers, stemmed question keywords and extra keywords tokenVector = tokenizer (userQuestion) interrogativeNoun = tokenVector.interrogativeNoun (); If (interrogativeNoun == " ‫)"أ‬ // where Then{ answerExpectedType = "Location" If (interrogativeNoun == " ") // who Then{ answerExpectedType = "Person" If (interrogativeNoun == " ") // when Then{ answerExpectedType = "Time" If (interrogativeNoun == " ") // when Then{ answerExpectedType = "Thing" If (interrogativeNoun ==" ‫)"آ‬ // how much/many Then{ answerExpectedType = "Quantity" questionKeywords = tokenVector.questionKeywords (); stemmedQuestionKeywords = stemmer (questionKeywords) extraKeywords = extraQuestionKeyword.Generator (questionKeywords) }
  • 8. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 18 4.IMPLEMENTATION AND TESTING JAWEB has been built based on AQuASys [30] and as a natural extension to it.It provides a web interface to the system, anadditional support for Arabic language presentation in web browsers, an extended corpus as well as an extensive evaluation and testing framework.The user interface was developed as a JSP webpage that acceptsArabic factoid questions. It provides a user-friendly interface where a usercan type his/her questions. The question should begin with an interrogative noun: Dreamweaver, an Adobe proprietary web development application, was used to design the interface. The servlet and java classes on the server side were implemented usingNetBeans IDE. The server was a GlassFish Server 3.1.2,which is an open-source application server project started by Sun Microsystems for the Java EE platform and now sponsored by Oracle Corporation.This server ran on 2.50 GHz CPU and an internal memory of 4 GB. An extended version of the Arabic corpus developed by [30], containing 39,660 words with size of 457 KB,was used as the information pool to retrieve answers.The Arabic Khoja's stemmer [31] was adopted for keywords stemming. The system was tested thoroughly to insure correctness of obtained results. The following example illustrates the main steps carried out by the system to answer the Arabic factoid question: “when did Ibn Khaldun die?” 1) In JAWEB webpage, illustrated in Figure 6, the user typed the question in the textbox: Figure 6: Jaweb system input interface 2) The user clickedthe " " button,sent the question to the sever side, where the question analyzer received it. The following processes were done: Alg. 3 AnswerExtractor (potentialAnswer [],stemmedQuestionKeywords) Input:list of potential answers,stemmed question keywords Output: list of extracted answers ranked in descending order of similarity n = 0 // no. of potential answers Repeat stemmedAnswerKeywords = stemmer (potentialAnswer [n]) similarity = noOfSimilarWords (stemmedQuestionKeywords, stemmedAnswerkeywords) potentialAnswer [n]).similarity =similarity sortDescending (potentialAnswer [])
  • 9. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 19 a. Question tokenization b. Question keywords extraction c. Question keywords Stemming d. Answer type detection e. Extra keywords generation 3) The system located all answers that containany keyword or extra keyword from the corpus, as shown in Figure 7. Figure 7: Retrieved answer based on extra keyword 4) In each retrieved answer, a. Keywordswere stemmed. b. Similarity was calculatedby counting how many keywords are contained in each answer. c. Answers were ranked based onsimilarity. 5) The system displayed the answers in order, based on similarity. The answer with highest rank was placed at the top, as shown in Figure 8. Time
  • 10. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 20 Figure 8: View the answers based on the rank 5.PERFORMANCE EVLAUATION 6.1 Performance Measures The end objective of JAWEB is to develop a web-based QA system that answers Arabic factoid questions with high accuracy and speed. To evaluate this system,three performance measures were considered: • RECALL [32]: the ratio of the number of relevant sentences retrieved to the total number of relevant sentences in the corpus. It is usually expressed as a percentage and calculated as: RECALL= A/A+B *100% (1) Where A: number of retrieved relevant answers, B: number of relevant answer not retrieved • PRECISION[32]: the ratio of the number of relevant sentences retrieved to the total number of sentences retrieved. It is usually expressed as a percentage and calculated as: PRECISION= A/A+C *100% (2) Where A: number of retrieved relevant answers, C: number of retrieved irrelevant answers • RESPONSE TIME: the time from when a user hits the search button until aresponse is displayed on the browser. 6.2 System Performance We ran tens of experimentswith various type of factoid questions. The results of recall, precision and response time are presented in Figure9, Figure 10 and Figure 11. As illustrated in Figure 9, The RECALL of JAWEB was always an incredible 100%.This is because it was always able to return all relevant answers in the corpus successfully. This means that B, the number of relevant answer not retrieved, was maintained at zero.It is also important to
  • 11. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 21 note that the first retrieved answersof all questions were correct and accurate,which can be attributed to the use of the stemmer(in both question and answer modules), the extra keywords generatorand the similarity checker which has resulted in accurate ranking of retrieved answers. Figure 10 shows that the average PRECISIONwas 80%,aremarkable number, considering the limited size of our corpus. Even the “how much” questions, which had the least precision, had a percentage no lower than 70%. “What” and “how many”, the most straightforward questions, received the most precise answers. It should be noted these results are also highly dependent onthe amount of relevant and irrelevant information available in the corpus that relates to each question. In Figure 11, the averageRESPONSE TIMEis calculated in nanosecondsand presented for each type of questions. The average response time was 108.2 nanoseconds. “How much” questions consumed the longest response time, while questions starting with “who”took the least.This makes sense, as the latter requires less processing, since searching for people’s namesis usually straight-forward. However, the RESPONSE TIMEin general depends on the corpus size and the power of the server machine. Figure 9: Recallof different types of questions Figure 10: Precision of different types of questions 0 20 40 60 80 100 when where who what how much how many Recall(%) Question Type 0 20 40 60 80 100 when where who what how much how many Precision(%) Question Type
  • 12. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 22 Figure 11: Response time of different types of questions 6.3 Benchmarking To have clear insight into JAWEB performance, we used a well-established QA system that supports the Arabic Language, Ask.com [10], as a benchmark. Ask.com is a famous web-based question answering system with multilingual support. Itaccepts colloquially-expressed questions and retrieves hyperlinks to webpages that contain similar keywords to those in the questions. We ran several experiments with each of the supported type of factoid questions to get answers form Ask.com and JAWEB. Snapshots of answers to some example questions, listed in Table 3, are presented in Figure 12 –Figure 17. All figures illustrate that JAWEB has consistently given the right and most accurate answer as the first responseswhich can be attributed to the use of the stemmer (in both question and answer modules), the extra keywords generator and the similarity checker which has resulted in accurate ranking of retrieved answers . In contrast, Ask.com provided the correct answer in the second or third hyperlinks. Comparisons between Ask.com and JAWEB in terms ofRECALL, PRECISION and RESPONSE TIMEare depicted in the graphs in Figure18, Figure 19 and Figure 20, respectively.It is important to note that having no access to the knowledge base of Ask .com and in order to be able to calculate the RECALL and PRECISION, we have only considered the first five retrieved pages, which may slightly affect the accuracy of the experiments. Figure 18 shows thatRECALL of JAWEB was alwayshigher than that of Ask.comand maintained at 100%, as the formerhas successfully retrieved all relevant answers from the corpus. The JAWEBPRECISIONwas admittedly less than ask.com, as shown in Figure 19,but it should be noted that it performed as well as the famous website, scoring over 90% in precision, despite the large difference in corpus sizes. In addition, even at its worst, for a question type that ask.com had the lowest precision too, JAWEB was not less precise than 70%. Regardless, the precision can easily be improved as the project launches, when feedback from users is attained and the corpus is grown in size and capacity. We believe that this is clear evidence that JAWEB has great potential as a QA platform and is much needed by Arabic-speaking Internet users across the world. Naturally, ask.com wasfaster than our system,as illustrated in Figure 20, thanks to the use of high- power server CPUs for ask.com, as opposed toJAWEB 2.50 GHz. Again, if the project launches, the response time is expected to improve dramatically, when servers that are more powerful are provided. Table 3: Experimental questions No. Type Arabic English Q1 Who ‫؟‬ ‫ه‬ Who is Muhammad Tangier? Q2 When ‫؟‬ ‫د‬ ‫ا‬ ‫ا‬ ‫ا‬ ‫ت‬ When was Kingdom of Saudi Arabia united? Q3 What ‫؟‬ ‫ا‬ ‫ت‬ ‫ا‬ ‫ه‬ ‫ا‬ ‫ه‬ What are Egyptianpyramids? Q4 Where ‫د‬ ‫ا‬ ‫ا‬ ‫ا‬ ‫أ‬‫؟‬ Where is the Kingdom of Saudi Arabia located? Q5 How much ‫؟‬ ‫ر‬ ‫ا‬ ‫ة‬ ‫ا‬ ‫ارة‬ ‫در‬ ‫آ‬ How much is the temperature oftheEarth’scrust? Q6 How many ‫ض؟‬ ‫ا‬ ‫ن‬ ‫د‬ ‫آ‬ How many residents are there in Riyadh? 90 95 100 105 110 115 120 125 when where who what how much how many Responsetime(Nano sec.) Question Type
  • 13. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 23 Figure 12.a: JAWEB answers for a who question Figure 12.b: Ask answers for a who question
  • 14. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 24 Figure 13.a: JAWEB answers for a when question Figure 13.b: Ask.com Answers for when question
  • 15. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 25 Figure 14.a: JAWEB answers for a what question Figure 14.b: Ask answers for a what question
  • 16. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 26 Figure 15.a: JAWEB answers for where question Figure 15.b: Ask answers for a where question
  • 17. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 27 Figure 16.a: JAWEB answers for a how much question Figure 16.b: Ask answers for a how much question
  • 18. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 28 Figure 17.a: JAWEB answers for a how many question
  • 19. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 29 Figure 17.b: Ask answers for a how many question Figure 18: Comparison of the Recall 0 20 40 60 80 100 when where who what how much how many Recall(%) Question Type Ask JAWEB
  • 20. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 30 Figure 19: Comparison of the Precision Figure 20: Comparison of the performance-Time 6.CONCLUSION & FUTURE WORK JAWEB is a web-based Arabic question answering application system. It takes native Arabic questions from the end user and processes them in the server. The processing consists of three modules: question analysis, passage retrieval, and answer extraction.JAWEB finds and extracts accurate answers for the user. It also retrieves the most relevant potential answers and ranks them on the web page. The evaluation experiment shows the high recall performance of the system, which is attributable to stemming. In the future, we expect to improve user interaction to incorporate user feedback for more precise results. We also plan on using the Arabic Named Entity Recognition to provide more accurate answers. In addition, we will consider the "why" and "how" questions, which require deep linguistic and semantic analysis. 0 20 40 60 80 100 when where who What how much how many Precision(%) Question Type Ask JAWEB 0 20 40 60 80 100 120 when where who What how much how many ResponseTime(Microsec.) Question Type Ask JAWEB
  • 21. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 31 ACKNOWLEDGMENT We would like to thank Dr. Sman Bekhti and his colleagues fortheir valuable support and comments based on their Arabic QA systemAQUASYS. We would also like to thank Dr.Shereen Khoja for providing us with the Arabic stemmerpackage that she has developed. REFERENCES 1. R. Alshalabi “Experimenting with a Question Answering System for the Arabic Language”, Computers & the Humanities, Vol. 38, pp. 397-415, Nov 2004. 2. K. Arai and A. Handayani “Question Answering System for an Effective Collaborative Learning”, International Journal of Advanced Computer Science and Applications, Vol. 3, pp.60-64, 3. A. Allam and M. Haggag, “The Question Answering Systems: A Survey,” International Journal of Research and Reviews in Information Sciences (IJRRIS), Vol. 2, No. 3, September 2012. 4. D. Hai, L. Kosseim, “The Problem of Precision in Restricted-Domain Question Answering. Some Proposed Methods of Improvement”, In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, Barcelona, Spain, Publisher of Association for Computational Linguistics, July 2004, PP.8-15. 5. A. Lamjiri, “A Syntactic Candidate Ranking Method for Answering Non-copulative Questions,” Ph.D. thesis, Concordia University, 2007. 6. P. Walke, S. Karale, “Implementation approaches for various categories of question answering system,” In Proceedings of IEEE Conference on Information & Communication Technologies (ICT), 2013, pp.402,407, 11-12 April 2013. 7. R. Alshalabi,“Pattern-based stemmer for finding Arabic roots,”Information Technology Journal, Vol. 4, pp. 38-43, 2005. 8. R. Grishman, B. Sundheim,“Message Understanding Conference-6: A Brief History,”In Proceedings ofInternational Conference on Computational Linguistics, 1996. 9. A. Borthwick,“A Maximum Entropy Approach to Named Entity Recognition”. PhD thesis, New York University, 1999. 10. Y. Benajiba, P. Rosso , J. Ruiz, “ANERsys: An Arabic Named Entity Recognition system based on Maximum Entropy”, In Proceedings of CICLing, 2007 11. M. Maamouri, A. Bies, T. Buckwalter, “The Penn Arabic Treebank: Building a Large-scale Annotated Arabic Corpus,”In Proceedings ofArabic Language Resources and Tools, 2004. 12. S. Mesfar,“Standard Arabic formalization and linguistic platform for its analysis,”In Proceedings of The Challenge of Arabic for NLP/MTConference, London, England, pp. 84-95, 2006. 13. A. Kabbaj,K. Bouzoubaa,“Amine Platform Page on SourceForge”, http://amine- platform.sourceforge.net/, checked Aug.2nd, 2013 14 .JM Gómez, D. Buscaldi, P. Rosso, E. Sanchis, “JIRS Language-Independent Passage Retrieval System A Comparative Study,”In Proceedings of 5th International Conference on Natural Language Processing, ICON-2007, Hyderabad, India, January 4-6, 2007 15. W. Black, S. Elkateb, H. Rodriguez, M. Alkhalifa, P. Vossen, A. Pease, C. Fellbaum“Introducing the Arabic WordNet project,” In Proceedings of3rd International WordNet Conference (GWC-06), 2006. 16. A. Ezzeldin, M. Shaheen “A Survey of Arabic question answering: challenges, tasks, approaches, tools, and future trends”, In Proceedings ofThe International Arab Conference on Information Technology, 2012, pp. 280-287. 17. O. Badawy, M. Shaheen and A. Hamadene “ARQA High-Performance Arabic Question Answering System”, In Proceedings ofArabic Language Technology International Conference, 2011, pp. 129- 136. 18. P.Brihaye,“Technical principles of the Morphological Analysis of AraMorph,” http://www.nongnu.org/aramorph/english/principles.html, checked Aug.2nd, 2013 19. H. Harmanani, W. Keirouz, S. Raheel, “A Rule-Based Extensible Stemmer for Information Retrieval with Application to Arabic,”International Arab Journal of Information Technology, Vol.3, No. 3, 2006 20. S. Kalaivani, K. Duraiswamy, “Comparison of Question Answering Systems Based on Ontology and Semantic Web in Different Environment,” Journal of Computer Science, Vol. 8, No. 9, 2012): 1407.
  • 22. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.2, April 2014 32 21. O. Ferret, B. Grau, M. Huraults-Plantet, “Finding an answer based on the recognition of the issue focus,”In Proceedings ofTREC-10. 22. D. Laurent, P. Séguéla, S. NègreCross,“Lingual Question Answering using QRISTAL for CLEF 2006, Lecture Notes in Computer Science, Vol. 4730, 2007, pp. 339-350. 23. S. Parthasarathy, J. Chen “A Web-based Question Answering System for Effective e-Learning,” In Proceedings ofIEEE International Conference on Advanced Learning Technologies, 2007, pp. 142- 146. 24. Ask.com, checked Aug. 2nd, 2013. 25. F. Mohammed, K. Nasser, H. Harb “A knowledge-based Arabic Question Answering System (AQAS),” In Proceedings ofACM SIGART Bulletin, 1993, pp. 21-33. 26. B. Hammo, H. Abu-Salem, S. Lytinen. “QARAB: A Question Answering System to Support the Arabic Language”. In Proceedings ofthe workshop on computational approaches to Semitic languages, 2002, pp. 55-65, Philadelphia. 27. Y. Benajiba, P. Rosso, A. Lyhyaoui “Implementation of the ArabiQA Question Answering System's components.”, In Proceedings ofWorkshop on Arabic Natural Language Processing, 2nd Information Communication Technologies Int. Symposium, ICTIS, 2007. 28. W. Brini, M. Ellouze, S. Mesfar, L. Belguith “An Arabic Question-Answering system for factoid questionns,”In Proceedings of IEEE International Conference on Natural Language Processing and Knowledge Engineering, 2009. 29. http://www.nooj4nlp.net/pages/introduction.html, checked may 6th, 2013. 30. S. Bekhti, A. Rehman, M. Al-Harbi, T. Saba “AQuASys: an Arabic Question Answering System Based on Extensive Question Analysis and Answer Relevance Scoring,” International Journal of Academic Research, Vol. 3, pp.45-54, July 2011. 31. S. Khoja, “Stemming Arabic Text,”" http://zeus.cs.pacificu.edu/shereen/research.htm, checked may 6th, 2013. 32. K. Arai, A. Handayani “Question Answering System for an Effective Collaborative Learning,”International Journal of Advanced Computer Science and Applications, Vol. 3, pp.60-64, 2012.