This document discusses the performance evaluation of a web search engine for the Tetum language based on user experience in Timor-Leste. A questionnaire was distributed to 133 Timorese internet users who use Tetum as their primary search language. Key findings include: 1) 76% of users search for academic information, 53% for news, and 29% for other topics; 2) 83% make queries using individual words while 17% use paragraphs; 3) search results are considered very or somewhat relevant by 45% of users; and 4) the top 10 search results match the query only 40-48% of the time and contain mixed languages for 54-90% of users.
Performance Evaluation on the Web Search Engine to the Tetum Language
Borja L. C. Patrocinio Antonino
Informatics Engineering Department
Faculty of Engineering, Science and Technology
Hera, Dili
Email: borja.antonino@untl.edu.tl
Aristidis de Jesus Ornai
Informatics Engineering Department
Faculty of Engineering, Science and Technology
Hera, Dili
Email: aristornai@gmail.com
Abstract—The major task of information retrieval is representing,
storing, organizing, and offering access to information. Since the
internet has become a vast repository, we can search for information
in many languages, from humorous content to religious articles.
When searching for information on the internet, we submit queries
to a web search engine. Although the official languages of
Timor-Leste are Portuguese and Tetum, some users mix other
languages into their queries, which affects retrieval performance.
In this paper we present a performance evaluation of a web search
engine for the Tetum language based on user experience.
Keywords—Information Retrieval, Web Search Engine, Performance
Evaluation, Tetum Language, User Experience
I. INTRODUCTION
Timor-Leste, currently categorized as a developing country,
is expected to see economic and educational development in
the future. On the other hand, language remains one of the
main problems of Timor-Leste, because the country suffers
from a lack of information with which to develop its own language.
Tetum is one of the official languages of Timor-Leste, alongside
Portuguese.
Tetum is an Austronesian language, like most indigenous
languages of the eastern part. Tetum has adopted many words
from Portuguese, as well as some common words from Malay.
According to the 2010 census, 53.4% of the population of
Timor-Leste can speak, read and write Tetum, rather than
Portuguese or English [3]. As an official language and the
language of unity, Tetum must be developed to help the
Timorese people face globalization.
One of the keys to facing globalization is the internet, which
has become a vast repository of information. There are 340,000
Timorese internet users in a population of 1,200,000 (as of
June 2016) [1]. Note that this figure is based on Facebook
penetration, so it does not reflect the number of Timorese users
who actually access the internet every day. The number of
internet users who access the internet other than through
Facebook is only 14,030 (as of July 2016) [2].
Information retrieval is a way to discover information given a
query, especially in a database stored on a computer. Two main
approaches exist for finding information: matching the words of
the query against the database (keyword searching) and traversing
the database using hypertext or hypermedia links. A web
search engine is a program designed to help find information
stored on the World Wide Web (WWW). Search engines are
essentially massive full-text indexes of web pages. They also
group texts into categories, allowing users to find the set of
texts of interest more easily. Furthermore, they regularly
update their indexes so that searches run quickly and efficiently.
As a result, search engines allow users to ask for information
(typically pages containing a given word or phrase) and retrieve
a list of references which match the criteria [4].
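Keyword searching against a full-text index can be sketched as follows. This is a minimal illustration, not the way any particular search engine is built: the whitespace tokenizer, the function names, and the three toy documents are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical toy collection; the texts are illustrative only.
documents = {
    1: "kafé timor",
    2: "labarik hemu kafé",
    3: "timor leste",
}
index = build_index(documents)
print(search(index, "kafé"))        # [1, 2]
print(search(index, "kafé timor"))  # [1]
```

A word that never occurs in the collection yields an empty result, which is one reason a small number of indexed documents in a language limits what a query in that language can retrieve.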
One of the biggest problems for web search engines is the
quality of their indexes. How a search engine retrieves texts
for a given query directly affects the quality of the search
results, which makes this a problem not only for search engines
but also for their users. To work effectively, a search engine
must properly handle the query given by the user, because the
quality of the retrieved results depends on it.
The objective of this research is to measure how satisfied
Timorese users are when they use a web search engine to
search for information in their own language, and to understand
how a web search engine should retrieve results for a query
that includes Tetum words. To this end, the authors distributed
a questionnaire to Timorese users who use Tetum as their
primary language when searching for information with web
search engines.
In the next section, we briefly describe the Tetum language
and its orthography. In Section III we briefly describe
information retrieval, focusing on web search engines and
performance evaluation techniques. In Section IV we present
the results, and the conclusion appears in Section V.
II. TETUM
The first Tetum, Tetum-Terique, had already established
itself as a lingua franca before the arrival of the Portuguese,
apparently as a result of the conquest of the eastern part of
the island by the kingdom of Ue-Hali, the province of the
Belos, and the need for an instrument of communication for
trade. With the arrival of the Portuguese, contact between
the Timorese community and the Portuguese presence in the
mission was established in the administrative and commercial
fields, leading to the adoption of Portuguese terms and
concepts to fill gaps in the Tetum lexicon. After that, the
Jesuits of Soibada in 1938 had already translated a part of
the Bible into Tetum, and in 1916 the governor Filomeno
da Camara adopted the book Tetum-Portuguese Primer as an
official manual for learning in schools. In 1981 Tetum became
the official language of the Church in the Liturgical Acts.
Portuguese was the language of teaching and administration in
Portuguese Timor, while Tetum Franca was the language
of social and commercial interaction among all ethnic-linguistic
groups. When Indonesian forces invaded and occupied
Timor-Leste in 1975, they declared it the 27th province of the
Republic of Indonesia and banned the use of Portuguese. But
instead of adopting the Indonesian language as a liturgical
language, the Catholic Church opted for Tetum, making it a
pillar of cultural and national identity.
Currently, Tetum is the most spoken language in Timor-Leste.
Although Tetum Franca has regional and social variations,
its use has broadened to the point that almost all East
Timorese can use and understand it. As a result, Tetum Franca
was adopted as an official language under the designation
Tetum Official.
A. Orthography
The modern orthography of Tetum is phonetic and largely
corresponds to the actual pronunciation of words [5]. Like
Portuguese, Tetum uses the Latin alphabet, with 5 vowels
(a, e, i, o, u) and 20 consonants (b, d, f, g, h, j, k,
l, ll, m, n, ñ, p, r, s, t, v, x, y, and z), in addition to the
apostrophe ['] as a tone. The letters c and q are excluded
from Tetum. Furthermore, g, h, s and x do not always have the
same values as in Portuguese. The INL (Instituto Nacional de
Linguistica) has adopted the Galician ll and ñ for the
Portuguese digraphs lh and nh (for example: Toalha → Toalla,
Linha → Liña).
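The two substitutions named above can be written as a small helper. This is only a sketch of those two rules: the function names are hypothetical, only lowercase digraphs are handled, and real loanword adaptation involves more than these replacements.

```python
# Letters that the text says are excluded from the Tetum alphabet.
EXCLUDED_LETTERS = {"c", "q"}


def adapt_digraphs(word):
    """Rewrite the Portuguese digraphs lh and nh as ll and ñ (lowercase only)."""
    return word.replace("lh", "ll").replace("nh", "ñ")


def uses_excluded_letters(word):
    """Return True if the word contains a letter excluded from the Tetum alphabet."""
    return any(letter in EXCLUDED_LETTERS for letter in word.lower())


print(adapt_digraphs("Toalha"))       # Toalla
print(adapt_digraphs("Linha"))        # Liña
print(uses_excluded_letters("Café"))  # True
print(uses_excluded_letters("Kafé"))  # False
```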
The stress in Tetum usually falls on the penultimate
syllable of a word, and in irregular cases on the last or
the antepenultimate syllable. The stressed syllable is marked
with an acute accent when it is the last or the antepenultimate
syllable. Words stressed on the penultimate syllable need not
be spelled with an accent. The circumflex accent is not
used [5]. Table I shows some examples of differences
between Tetum and Portuguese, with the corresponding English
words.
TABLE I. EXAMPLE WORDS WITH ACCENTUATION IN LINGUA TETUM [5]
Tetum                 Portuguese          English
Matak [má-ták]        Cru, Verde          Raw, Green
Labarik [la-bá-rik]   Criança             Child
Kafé                  Café                Coffee
Xumingál              Pastilha Elástica   Bubble Gum
Portugés              Português           Portuguese
Kópia                 Cópia               Copy
III. INFORMATION RETRIEVAL
The task of information retrieval is representing, storing,
organizing, and offering access to information. Information
retrieval differs from data retrieval, which finds precise data
in databases with a given structure. In information retrieval
systems, the information is not structured data but free-form
text (web pages or other documents) or multimedia content [7].
The retrieval models are divided into classic models and struc-
tured models. In classic models, each document is described by
a set of representative keywords - also called indexing terms -
that represents the subject of the document and summarizes its
content in a meaningful way. In structured models, users can
specify, in addition to the keywords, some information about
the structure of the text (such as sections to be searched, fonts,
proximity of words, among other information) [7].
For users, the information retrieval process starts from a need
for information. Users provide, to an information retrieval
system, a query formulated from their need for information.
The system then compares the query with stored documents
[9]. The task of the system is to return to the users the
documents which satisfy the users’ need.
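One simple way to compare a query with documents described by sets of index terms is to score their overlap. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration and not a method taken from this paper; the document ids and index terms are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query_terms, index_terms_by_doc):
    """Return (doc_id, score) pairs with a positive score, best match first."""
    scored = [
        (doc_id, jaccard(query_terms, terms))
        for doc_id, terms in index_terms_by_doc.items()
    ]
    scored = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score, breaking ties by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical index terms for three documents.
index_terms_by_doc = {
    "d1": {"tetum", "language", "official"},
    "d2": {"tetum", "history"},
    "d3": {"football"},
}
ranking = rank({"tetum", "history"}, index_terms_by_doc)
print(ranking)  # [('d2', 1.0), ('d1', 0.25)]
```

Documents that share no term with the query are dropped, so the user only sees candidates that might satisfy the information need.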
A. Web Search Engine Evaluation
A web search engine is an information retrieval system for
the internet that allows users to find information across the
World Wide Web. Nevertheless, information retrieval on the
web is quite different from retrieval in traditional indexed
databases. The difference comes from the distinctive qualities
of the web, such as its high degree of dynamism, its
hyperlinked character, the absence of a controlled indexing
vocabulary, the heterogeneity of documents and authoring
styles, and its ease of use for different types of users.
Evaluation of search engines can be broadly classified into two
categories, namely testimonials and shootouts. Testimonials
are generally carried out by the trade press or by computer
industry organizations, which compare search engines based
on speed, ease of use, interface design or other features
that are readily visible to users. Though testimonials give
users some useful information for deciding which search engine
to use, they only indirectly suggest which search engine is
the most effective at retrieving relevant web pages.
Many studies of web search evaluation have been reported;
however, it is difficult to determine which method should be
used. In this work, after closely examining various evaluation
techniques, we decided to use three of the eight categories
presented by Ali and Beg [11]: relevance of the search results
to the user query, coverage of the web by the search, and
satisfaction of the user with the search results.
IV. RESULT
We distributed the questionnaire via social media and
through direct interviews with university students, targeting
those who use Tetum to search for information on a web search
engine. The questionnaire consists of 11 questions in 4 main
parts, which reflect the 3 categories listed in the previous
section. The profile of the respondents is shown below.
Profile of the Respondent
          Student   Academics   Professional   Total
Male         44        10           32           86
Female       29         2           16           47
Total        73        12           48          133
[Chart: respondents by occupation. Students 54.89%, Academics 9.02%, Professionals 36.09%]
[Chart: respondents by gender. Male 65%, Female 35%]
We collected 133 answers from Timorese internet users,
specifically those who use Tetum to query a search engine.
The table above shows that 65% of the answers came from
male users and the rest from female users. In addition,
36% of the respondents are professional workers, 55% are
students, and 9% are academics.
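The shares reported for the respondent profile follow directly from the counts in the table. The short sketch below recomputes them; the counts are copied from the table, while the variable and function names are illustrative.

```python
# Respondent counts copied from the profile table.
profile = {
    "Male": {"Student": 44, "Academics": 10, "Professional": 32},
    "Female": {"Student": 29, "Academics": 2, "Professional": 16},
}


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


total = sum(sum(row.values()) for row in profile.values())
by_gender = {
    gender: percentage(sum(row.values()), total) for gender, row in profile.items()
}
occupations = ["Student", "Academics", "Professional"]
by_occupation = {
    occupation: percentage(sum(row[occupation] for row in profile.values()), total)
    for occupation in occupations
}

print(total)
print(by_gender)
print(by_occupation)
```

The same calculation applies to every single-choice question in the questionnaire, since each respondent contributes exactly one answer to the total of 133.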
How often do you use a web search engine?
            Student   Academics   Professional   Total
Rarely         19         2           11           32
Sometimes      39         8           28           75
Often          15         2            9           26
[Chart: Rarely 24%, Sometimes 56%, Often 20%]
The table above shows that 56% of the respondents sometimes
use a search engine to search for information in Tetum,
20% use one often, and 24% use one rarely.
Which search engine do you use?
          Student   Academics   Professional   Total
Google       73        12           47          132
Bing          0         0            1            1
[Chart: Google 99%, Bing 1%]
The table above shows that almost all respondents (99%) use
Google as their search engine; only 1% use Bing.
A. Relevance of search result to user query
To assess the relevance of search results to the user query,
we need to know why users use the search engine and how they
formulate their queries, because a query is not associated
with a single document in a collection. On the contrary,
a query returns several documents. The results are shown
below.
Type of information searched
            Student   Academics   Professional   Total
News           31         6           34           71
Academics      59        11           31          101
Other          17         2           19           38
[Chart: News 53%, Academics 76%, Other 29%]
The table above shows that 76% of users look for books or
other information related to academic subjects, 53% look
for news, and 29% look for other things, for example
information about public figures.
How do you make a query?
            Student   Academics   Professional   Total
Word           63        10           36          109
Paragraph      10         2           12           24
[Chart: Word 83%, Paragraph 17%]
The results show that 83% of users query the search engine
with individual words, while the other 17% use paragraphs.
Do you mix other languages into your queries?
             Student   Academics   Professional   Total
Tetum            9         3           19           31
Portuguese      41         6           18           65
English         29         6           25           60
Indonesian      34         4           15           53
[Chart: Tetum 37%, Portuguese 78%, English 72%, Indonesian 64%]
The table above shows that 37% of users search in Tetum
only, while 78% mix in Portuguese, 72% mix in English,
and 64% mix in Indonesian.
Are the search results relevant to your query?
                Student   Academics   Professional   Total
Not Relevant        2         2            4            8
Somewhat            7         1            9           17
Ok                 29         4           16           49
Relevant           21         2           14           37
Very Relevant      14         3            5           22
[Chart: Not Relevant 5%, Somewhat 13%, Ok 37%, Relevant 28%, Very Relevant 17%]
The table above shows how relevant users found the results
retrieved by the search engine for their queries: 17% of
users said Very Relevant, 28% said Relevant, 37% said Ok,
13% said Somewhat and 5% said Not Relevant.
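A five-point answer distribution like this one can also be condensed into a single figure. The sketch below computes a mean score and the share of the two most favorable answers from the counts in the table. Scoring the answers from 1 (Not Relevant) to 5 (Very Relevant) is an assumption of this example and is not part of the questionnaire.

```python
# Answer counts copied from the relevance table, ordered from least to most favorable.
relevance_counts = {
    "Not Relevant": 8,
    "Somewhat": 17,
    "Ok": 49,
    "Relevant": 37,
    "Very Relevant": 22,
}

# Assumed numeric scores: 1 for the first label up to 5 for the last.
scores = {label: score for score, label in enumerate(relevance_counts, start=1)}

respondents = sum(relevance_counts.values())
mean_score = sum(scores[label] * n for label, n in relevance_counts.items()) / respondents
top_two_share = (
    100 * (relevance_counts["Relevant"] + relevance_counts["Very Relevant"]) / respondents
)

print(respondents)              # 133
print(round(mean_score, 2))     # 3.36
print(round(top_two_share, 1))  # 44.4
```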
B. Coverage of the web to the search
We summarize the coverage of the web search for the user
query in 2 questions, one of which concerns the language of
the retrieved results.
Does the result you are searching for appear in the first 10 lines?
            Student   Academics   Professional   Total
Never           6         2            5           13
Seldom         10         2            7           19
Sometimes      26         2           25           53
Often          23         4           10           37
Always          7         2            1           10
[Chart: Never 10%, Seldom 14%, Sometimes 40%, Often 28%, Always 8%]
From the above result, we can see that 10% of users never
found a result related to their query among the first 10
articles shown by the search engine. The remaining answers
were 14% seldom, 40% sometimes, 28% often, and 8% always.
Are the retrieved results mixed with any other language?
             Student   Academics   Professional   Total
Tetum           10         4           21           35
Portuguese      44         6           22           72
English         21         3           12           36
Indonesian      27         1           10           38
[Chart: Tetum 26%, Portuguese 54%, English 27%, Indonesian 29%]
According to the previous result, more than 90% of users could
not always find the exact answer among the 10 most relevant
articles returned by the search engine. Therefore, we asked
them whether the retrieved results contained more than one
language. The table above shows that only 26% of users said
all the retrieved documents were written in Tetum, while 54%
said the results were mixed with Portuguese, 27% said the
results included English articles, and 29% said the results
were mixed with Indonesian.
C. Satisfaction of Tetum language users with the web search engine
The last question in the questionnaire concerns overall
satisfaction with the web search engine when retrieving
documents or information in Tetum.
Are you satisfied with the performance of the web search engine for the Tetum language?
                    Student   Academics   Professional   Total
Very Dissatisfied       9         2           10           21
Dissatisfied           10         1            8           19
Neither                22         4           15           41
Satisfied              15         2           10           27
Very Satisfied         17         3            5           25
[Chart: Very Dissatisfied 16%, Dissatisfied 14%, Neither 31%, Satisfied 20%, Very Satisfied 19%]
The table above shows that 16% of Tetum users were very
dissatisfied with the performance of the Google search engine,
and 14% were dissatisfied. On the other hand, 31% of users
were neutral, 20% were satisfied, and 19% were very satisfied.
V. CONCLUSION
A web search engine is one of the means by which anyone can
obtain information. In this study the authors found that
Google is the search engine most used by Tetum users, and
that 39% of respondents were satisfied or very satisfied with
its performance, while a further 31% were neutral. However, as
mentioned at the beginning, Tetum still suffers from a lack
of information about its language structure, and only a small
number of Tetum documents are indexed on the internet.
This work investigated factors that affect retrieval
performance. We found that both queries and retrieval results
mix Tetum with 3 other languages (Portuguese, English, and
Indonesian). Language-mixed queries may degrade search
results, and we found that search results in mixed languages
disappointed users: more than half of the respondents could
not reliably find documents related to their queries among
the first 10 articles of the search results.
VI. ACKNOWLEDGMENT
The authors would like to thank Satoshi Tamura, PhD, and
Hidekazu Fukai, PhD, for all their support during this
research.
REFERENCES
[1] Internet World Stats - Usage and Population Statistics,
http://www.internetworldstats.com/asia.htm#tp
[2] Timor-Leste Internet Users,
http://www.internetlivestats.com/internet-users/timor-leste/
[3] General Directorate of Statistics Timor-Leste. Population and Housing
Census 2010, Analytical Report on Education (Volume 9), pp. 40-42.
[4] Pourmir, R. M. Introduction to Web Search Engine. Journal of Novel
Applied Sciences, pp. 724–748, 1987-3-7, ISSN 2322-5149, © 2014
JNAS.
[5] Hull, G. Manual de Lingua Tetum para Timor Leste (in Portuguese)
[Manual for the Tetum Language of Timor Leste]. Sebastiao Aparicio da
Silva Project for the Protection and Promotion of East Timorese
Languages, pp. 4-5, 2004.
[6] Antonino, B. L. C. P. An Information Retrieval System for Tetum
Language. Master's thesis, Informatics Department, University
of Evora, Portugal, April 2013.
[7] Blair, D. C., Maron, M. E. Full Text Information Retrieval: Further
Analysis and Clarification. Information Processing and Management,
pp. 437-447, 1990.
[8] Inkpen, D. Information Retrieval on the Internet. University
of Ottawa, p. 2.
[9] Manning, C. D., Raghavan, P., Schütze, H. An Introduction to
Information Retrieval. Cambridge University Press, Cambridge, England,
Online edition © 2009, pp. 152-154.
[10] Ohtsuka, T., Eguchi, K., Yamana, H. An Evaluation Method of Web
Search Engine based on User's Sense. Working Notes of NTCIR-4,
© 2004 National Institute of Informatics, Tokyo, Japan.
[11] Ali, R., Sufyan Beg, M. M. An Overview of Web Search Evaluation
Method. Journal Computer and Electrical Engineering, Volume 3 Issue
6, November 2011, pp. 835-848, Pergamon Press, Inc., Tarrytown, NY,
USA.
[12] Harter, S. P., Hert, C. A. Evaluation of Information Retrieval Systems:
Approaches, Issues and Methods. Annual Rev Information Sci Technol
1997; 32: 3–79.
[13] Cleverdon, C. W., Mills, J., Keen, E. M. Factors Affecting the
Performance of Indexing Systems. ASLIB, Cranfield Research Project,
Volume 2, pp. 37-59, Bedford, UK, 1966.
[14] Chignell, M. H., Gwizdka, J., Bodner, R. C. Discriminating
Meta-search: A Framework for Evaluation. Inf. Process. Manage. 1999;
35: 337–63.
[15] Büttcher, S., Clarke, C. L. A., Cormack, G. V. Information Retrieval:
Implementing and Evaluating Search Engines. The MIT Press,
Cambridge, Massachusetts; London, England, © 2010,
ISBN: 978-0-262-52887-0.