SlideShare a Scribd company logo
1 of 7
Download to read offline
PRE-RANKING DOCUMENTS
VALORIZATION IN THE INFORMATION
RETRIEVAL PROCESS
Chkiwa Mounira1, Jedidi Anis1 and Faiez Gargouri1
1

Multimedia, InfoRmation systems and Advanced Computing Laboratory
Sfax University, Tunisia
m.chkiwa@gmail.com, jedidianis@gmail.com, faiez.gargouri@isimsf.rnu.tn

ABSTRACT
In this short paper we present three methods to valorise score relevance of some documents
basing on their characteristics in order to enhance their ranking. Our framework is an
information retrieval system dedicated to children. The valorisation methods aim to increase the
relevance score of some documents by an additional value which is proportional to the number
of multimedia objects included, the number of objects linked to the user particulars and the
included topics. All of the three valorization methods use fuzzy rules to identify the valorization
value.

KEYWORDS
Information Retrieval, Pre-ranking valorization, Fuzzy Logic, Fuzzy Rules

1. INTRODUCTION
In the information retrieval process, younger users have particularities concerning what they
really need in results. In this paper we study those particularities in order to have maximum
coverage of relevant documents. To do it, we present the Pre-ranking Documents Valorization
which aims to increase some documents relevance score by an additional value in order to
enhance their ranking. In order to make the additional value be proportional to the document
characteristics we use fuzzy logic principles. The pre-ranking document valorization takes place
after running the querying process which finds the relevant documents. Our framework
materializes collaboration between two axes: the Semantic Web [1 and 2] and the Fuzzy Logic [3,
4 and 5] .We use semantic web technologies as RDF [6 and 7] to annotate semantically our
collection of web documents and SPARQL [8] for querying the annotation. Also, we use Fuzzy
rules to find the exact value added to a score relevance in order to enhance the ranking of the
correspondent document. The rest of the paper is organized as follows: in section 2 we introduce
some preliminaries about our framework which is an information retrieval system dedicated for
children. Section 3 we present the pre-ranking document valorization by explaining its three
types. Finally section 4 concludes the paper.

David C. Wyld et al. (Eds) : CST, ITCS, JSE, SIP, ARIA, DMS - 2014
pp. 353–359, 2014. © CS & IT-CSCP 2014

DOI : 10.5121/csit.2014.4132
354

Computer Science & Information Technology (CS & IT)

2. FRAMEWORK
To ensure a high quality of use, available information retrieval systems dedicated for children
have common highlighted facts:
−

−
−
−

Security: Since the web covers a large amount of uncontrollable data, security represents the
first factor taken into account to create a safe information retrieval system dedicated for
children.
Design: it represents the main visual factor to call users attention.
Profile: it represents the common way used to personalize a search process taking into
account the user’s personal interests.
Querying: this factor makes the difference between information search engines even if it
follows outwardly the same demarche: the annotation, the query/document matching, the
ranking and the result display.

In addition of considering the cited facts, our particularity resides in the “pre-ranking document
valorisation” which is localized in the figure below.
User
query Q

Querying process
Search information
interface

relevant
documents

1. Query/ document matching => score relevance RS
2. pre-ranking document valorization => added value V to RS
3. ranking process

Figure 1. Our querying particularity

In order to explore young web users searching interests, we do a survey covering 723 children
between 8 and 15 years. The results show that more than 75% of them prefer results rich by
multimedia objects (especially pictures and video sequences). Likewise, more than 70% of
surveyed children want to see in returned results information dealing with their own particulars
(own country, avenue, friends …) especially with the success of social networks use. In the next
section we present two flexible methods that give the priority to relevant results rich by
multimedia objects and/or dealing with the current user particulars. In the other side, we consider
the fact that a returned document may be exhaustive and deal with different topics; this fact could
misplace a user attention especially if it is a child. We propose therefore a method that avoid
“multi-topic” documents and make priority to documents focusing only on query main topic. The
three pre-ranking valorization methods are submitted to fuzzy rules which mainly use variables
representing the user ages; Figure 2 shows the membership function representing web young
user’s age as input set which ranges from 3 to 14
Computer Science & Information Technology (CS & IT)

355

1.2
preschool

1

main childhood

preteen

0.8
0.6
0.4
0.2
0
1

2

3

4

5

6

7

8

9

10 11 12 13 14 15

Figure 2. Fuzzy sets representing the different childhood periods of young web users.

3. PRE-RANKING DOCUMENT VALORIZATION
After running a classical query/document process, we get the score relevance of each document to
the query. The pre-ranking document valorization aims to increase score relevance of relevant
documents depending on their particularities. It’s much easier and time-saver for the system to
handle only relevant documents and not all the collection documents. For that, we choose to not
run the document valorization while the querying process but after it. We define three types of
document valorization: the “multimedia richer” document valorization, the “Nearest personal
information” document valorization and “same-topic” document valorization. All of the three
types of the pre-ranking document valorization are submitted to the application of fuzzy rules.

3.1. “Multimedia Richer” Document Valorization
Given that younger user are interested more in multimedia objects (images, video sequences …)
while the information retrieval process. The “multimedia richer” document valorization aims to
increase the relevance score of relevant document including more multimedia objects. This fact is
submitted to fuzzy rules aiming to decide the value added V to score relevance proportionally to
the user age UA and the Number of Multimedia Objects in the Document NMOD.

─ If UA is in preschool period and NMOD is high then V is high.
─ If UA is in main childhood period and NMOD is high then V is medium.
─ If UA is in preteen period and NMOD is high or medium then V is low.
The Number of Multimedia Objects in the Document NMOD is an input set represented by
triangular membership functions which ranges from Min to Max (see figure 3). Min and Max are
variables representing respectively the minimum number of multimedia objects included in a
document in the collection and the maximum number of multimedia objects included in a
document in the collection.
356

Computer Science & Information Technology (CS & IT)
1.2
Low

1

medium

high

0.8
0.6
0.4
0.2
NMOD

0
Min

(min+max)*0,25

(min+max)*0,5

(min+max)*0,75

Max

Figure 3. Fuzzy sets representing the variation range of NMOD.

3.2. “Nearest Personal Information” Document Valorization
As we mention before, more than 70% of surveyed children want to see in returned results
information dealing with their own particulars (own country, own avenue, own school, friends
…). The idea is to familiarize the information searching concept to children through the
maximum coverage of their own particular in the information searching process. Referring to this
observation, we suppose that results will be considered relevant if they include some personal
data. The nearest personal information document valorization exploits the user age UA and the
Number of Personal information Items found in a Document NPID as numerical inputs of the
fuzzification operation submitted to the fuzzy rules. A personal information item found in a
document may be a piece of text, a picture, or any multimedia object dealing with the current user
particularities. The fuzzy rules listed below make decision about the value V added to a document
relevance score in order to valorize documents dealing with user personal information:
─ If UA is in main childhood period and NPID is high then V is medium.
─ If UA in preteen period and NPID is high then V is high.
─ If UA in preteen period and NPID is medium or low then V is low.
As the NMOD variable, The NPID is an input set represented by triangular membership functions
which ranges from 0 to Max (see figure 4). Max is a variable representing the maximum number
of personal information items concerning a user found in a document of the collection and 0
means that a document didn’t include any information items concerning the current user.
1.2

Low

medium

high

1
0.8
0.6
0.4
0.2
NPID

0
0

max*0,25

max*0,5

max*0,75

max

Figure 4. Fuzzy sets representing the variation range of NPID.
Computer Science & Information Technology (CS & IT)

357

3.3. “Same-Topic” Document Valorization
A “Same-topic” document valorization aims to increase relevance score of document referring to
the topics mentioned therein. The Figure 5 gives a structural view of this type of document
valorization. The meta-document is introduced in [9] and it is able to annotate multimedia objects
as well as web documents in a way that ensures its reusability. The querying process matches the
user query with the meta-documents in order to identify the score relevance of the document to
the query. We define the “topic cloud” as groups of weighted terms concerning a given topic.
Simply, we collect potential terms representing a given topic to construct a topic cloud. The
terms’ weights express the ability of each term to represent the topic. After running a usual
querying process matching the query and the meta-documents, we get the relevance score for
each annotated resource or document. At this point, the topic clouds are used to enhance ranking
results in the benefit of relevant documents focusing mainly on query interests.
WD1T1

Document
1

MetaDocument 1

topic
1

Document
2
.
.
.
.
.
.
.
.
.
.

MetaDocument 2
.
.
.
.
.
.
.
.
.
.

Topic
2
.
.
.
.
.
.
.
.
.
.

Document
n

MetaDocument n

Topic
m

WQT1
Query Q
WQT2

Figure 5. The “Same-topic” document valorization.

To run the “Same-topic” document valorization, first, we establish the meta-document/topic
weighted links WDT. WDT expresses the potential topics mentioned by the document. To assign a
weight WDT to a meta-document/topic link, we simply sum the weights of topic terms existing in
the meta-document. Then we establish query/topics weighted links which express the ability of
each topic to represent the query. To assign a weight WQT to a Query/Topic link, we use the
classic similarity measure between two weighted terms vectors:
W୕୘ = simሺQ, Tiሻ =

෍

౪

౪

ౠసభ

୛౧ౠ ∗ ୛౪౟ౠ

ሺ୛౧ౠ ሻమ ∗ ෍

ඨ෍

ౠసభ

౪

ሺ୛౪౟ౠ ሻమ

(1)

ౠసభ

The next step of the “Same topic” document valorization is to calculate for each document his
topic similarity relative to the query in order to increase or decrease its relevance score in terms
of the value of the topic similarity. The topic similarity TS is calculated as follow:
୩

TS ሺQ, Dሻ = ෍

หW୕୘౟ − Wୈ୘౟ ห

୧ୀଵ

(2)

The main goal of a “Same topic” document valorization is to increase relevance of documents
focusing on the same query topics and valorize “mono-topic” documents to users at an early age
in order to facilitate its comprehension. The TS (Q, D) value is optimal when its value is minimal;
this means that the query and the document are focusing on the same topics with approached
values. Contrariwise, if the TS is high, this means that the document deals with other topics in
addition to the query topics. Finally, the increase value V affected to a document Relevance Score
RS is based on the following fuzzy rules:
─ If UA is in (pre-school or main childhood period) and TS is low then V is high
─ If UA is in preteen period and TS is low or medium then V is medium
358

Computer Science & Information Technology (CS & IT)

Remains to mention that this valorization method is inspired from our previous work [10], except
that we exclude the rule that decrease relevance score of documents having a high TS value. This
exclusion allows to not penalizing some document because of the diversity of topics included
therein. Also, we include the UA variable in this valorization method instead of score relevance
RS. The TS variable is an input set represented by triangular membership functions which ranges
from 0 to Max (see figure 6). Maximum TS means that the current document deals with several
topics in addition to the query topics and a null TS means that the document and the query are
dealing with the same topics.
1.2

Low

medium

1
0.8
0.6
0.4
0.2
TS

0
0

max*0,25

max*0,5

max*0,75

max

Figure 6. Fuzzy sets representing the variation range of TS.
After the application of valorization methods separately to a document Di we sum the eventually
deduced values to get the value Vi which is added to the document score relevance in order to get
the new score relevance. After applying the valorization process to the relevant documents set
found in the querying process, we pass finally to the ranking procedure. Table 1 summarizes the
document valorization process.
Table 1. Recapitulation of the pre-ranking document valorization process.
Valorization method
Multimedia Richer
Nearest Personal Information
Same-Topic
Valorization process

Inputs variables
UA
NMOD

output
v1

NPID

UA

v2

TS

UA

v3

Document Di

Vi=∑ଷ v୩
௞ୀଵ

4. CONCLUSION
In this paper, we present three methods to valorize score relevance of some documents depending
on their characteristics concerning the multimedia included objects, the user particulars and the
topics mentioned therein. In this work, we use fuzzy rules to define in a flexible way the value
added to the score relevance of valorized documents. Our framework is under development and it
represents an information retrieval system dedicated for kids. We are working in the short-term
on the identification of the relevant range and shape of the membership function representing the
added value V usable on the three valorization methods.

REFERENCES
[1]
[2]

Tim Berners-Lee, James Hendler and Ora Lassila the Semantic Web. Scientific American: Feature
Article: The Semantic Web: May 2001
W3C. Semantic Web http://www.w3.org/standards/semanticweb/
Computer Science & Information Technology (CS & IT)
[3]

359

Zadeh, L.A.: Fuzzy sets. Information and Control (1965) 338–353
http://www-bisc.cs.berkeley.edu/Zadeh-1965.pdf
[4] Lotfi A. Zadeh: Knowledge Representation in Fuzzy Logic. IEEE Trans. Knowl. Data Eng. 1(1): 89100 (1989)
[5] Lotfi A. Zadeh: A Summary and Update of "Fuzzy Logic". GrC 2010: 42-44
[6] Manola, F. Miller, E., Beckett, D. and Herman, I. 2007. RDF Primer
http://www.w3.org/2007/02/turtle/primer/
[7] RDF Working Group. Resource Description Framework (RDF) http://www.w3.org/RDF/
[8] Eric Prud'hommeaux, W3C. SPARQL Query Language for RDF W3C Recommendation 15 January
2008 http://www.w3.org/TR/rdf-sparql-query/
[9] Jedidi A. (July 2005). « modélisation générique de documents multimédia par des métadonnées :
mécanismes d’annotation et d’interrogation » Thesis of « Université TOULOUSE III Paul Sabatier »,
France.
[10] Chkiwa, M., Jedidi, A., Gargouri, F. (October 2013). Handling Uncertainty in Semantic Information
Retrieval Process. URSW 2013: 29-33, Sydney Australia

More Related Content

What's hot

Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...IJERA Editor
 
Automatic detection of online abuse and analysis of problematic users in wiki...
Automatic detection of online abuse and analysis of problematic users in wiki...Automatic detection of online abuse and analysis of problematic users in wiki...
Automatic detection of online abuse and analysis of problematic users in wiki...Melissa Moody
 
Data Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training SimulationData Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training SimulationMelissa Moody
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notesBAIRAVI T
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
An effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systemsAn effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systemsijdms
 
Bioinformatioc: Information Retrieval
Bioinformatioc: Information RetrievalBioinformatioc: Information Retrieval
Bioinformatioc: Information RetrievalDr. Rupak Chakravarty
 
STATS415-Final_report
STATS415-Final_reportSTATS415-Final_report
STATS415-Final_reportYilei Zhang
 
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTSOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTARAVINDRM2
 
Lectures 1,2,3
Lectures 1,2,3Lectures 1,2,3
Lectures 1,2,3alaa223
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarityrahulmonikasharma
 
2. an efficient approach for web query preprocessing edit sat
2. an efficient approach for web query preprocessing edit sat2. an efficient approach for web query preprocessing edit sat
2. an efficient approach for web query preprocessing edit satIAESIJEECS
 

What's hot (19)

Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
 
Automatic detection of online abuse and analysis of problematic users in wiki...
Automatic detection of online abuse and analysis of problematic users in wiki...Automatic detection of online abuse and analysis of problematic users in wiki...
Automatic detection of online abuse and analysis of problematic users in wiki...
 
Data Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training SimulationData Collection Methods for Building a Free Response Training Simulation
Data Collection Methods for Building a Free Response Training Simulation
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Introduction abstract
Introduction abstractIntroduction abstract
Introduction abstract
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
Mam assign
Mam assignMam assign
Mam assign
 
An effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systemsAn effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systems
 
Bioinformatioc: Information Retrieval
Bioinformatioc: Information RetrievalBioinformatioc: Information Retrieval
Bioinformatioc: Information Retrieval
 
Lec 2
Lec 2Lec 2
Lec 2
 
Lec1,2
Lec1,2Lec1,2
Lec1,2
 
STATS415-Final_report
STATS415-Final_reportSTATS415-Final_report
STATS415-Final_report
 
Hu3414421448
Hu3414421448Hu3414421448
Hu3414421448
 
Ijetcas14 446
Ijetcas14 446Ijetcas14 446
Ijetcas14 446
 
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTSOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
 
Lectures 1,2,3
Lectures 1,2,3Lectures 1,2,3
Lectures 1,2,3
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarity
 
2. an efficient approach for web query preprocessing edit sat
2. an efficient approach for web query preprocessing edit sat2. an efficient approach for web query preprocessing edit sat
2. an efficient approach for web query preprocessing edit sat
 
Kp3518241828
Kp3518241828Kp3518241828
Kp3518241828
 

Similar to PRE-RANKING DOCUMENTS VALORIZATION IN THE INFORMATION RETRIEVAL PROCESS

Characterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining TechniquesCharacterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining TechniquesIJTET Journal
 
Evidence Based Healthcare Design
Evidence Based Healthcare DesignEvidence Based Healthcare Design
Evidence Based Healthcare DesignCarmen Martin
 
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...IRJET Journal
 
Designing a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital LearnersDesigning a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital LearnersIJITE
 
Designing a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital Learners Designing a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital Learners IJITE
 
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsProjection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsIRJET Journal
 
Final review m score
Final review m scoreFinal review m score
Final review m scoreazhar4010
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsIOSR Journals
 
Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesKaran Deep Singh
 
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata PrivacyTwo-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacydbpublications
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining14894
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detectionIJICTJOURNAL
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfDr. Radhey Shyam
 
Big Data: Privacy and Security Aspects
Big Data: Privacy and Security AspectsBig Data: Privacy and Security Aspects
Big Data: Privacy and Security AspectsIRJET Journal
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleDr. Radhey Shyam
 
Fundamentals of data mining and its applications
Fundamentals of data mining and its applicationsFundamentals of data mining and its applications
Fundamentals of data mining and its applicationsSubrat Swain
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal
 
Research in Big Data - An Overview
Research in Big Data - An OverviewResearch in Big Data - An Overview
Research in Big Data - An Overviewieijjournal
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal1
 

Similar to PRE-RANKING DOCUMENTS VALORIZATION IN THE INFORMATION RETRIEVAL PROCESS (20)

Characterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining TechniquesCharacterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining Techniques
 
Evidence Based Healthcare Design
Evidence Based Healthcare DesignEvidence Based Healthcare Design
Evidence Based Healthcare Design
 
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
 
Designing a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital LearnersDesigning a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital Learners
 
Designing a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital Learners Designing a Survey Study to Measure the Diversity of Digital Learners
Designing a Survey Study to Measure the Diversity of Digital Learners
 
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsProjection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets
 
Final review m score
Final review m scoreFinal review m score
Final review m score
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logs
 
Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and Issues
 
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata PrivacyTwo-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detection
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
 
Big Data: Privacy and Security Aspects
Big Data: Privacy and Security AspectsBig Data: Privacy and Security Aspects
Big Data: Privacy and Security Aspects
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
Fundamentals of data mining and its applications
Fundamentals of data mining and its applicationsFundamentals of data mining and its applications
Fundamentals of data mining and its applications
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 
Research in Big Data - An Overview
Research in Big Data - An OverviewResearch in Big Data - An Overview
Research in Big Data - An Overview
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

PRE-RANKING DOCUMENTS VALORIZATION IN THE INFORMATION RETRIEVAL PROCESS

  • 1. PRE-RANKING DOCUMENTS VALORIZATION IN THE INFORMATION RETRIEVAL PROCESS Chkiwa Mounira1, Jedidi Anis1 and Faiez Gargouri1 1 Multimedia, InfoRmation systems and Advanced Computing Laboratory Sfax University, Tunisia m.chkiwa@gmail.com, jedidianis@gmail.com, faiez.gargouri@isimsf.rnu.tn ABSTRACT In this short paper we present three methods to valorise score relevance of some documents basing on their characteristics in order to enhance their ranking. Our framework is an information retrieval system dedicated to children. The valorisation methods aim to increase the relevance score of some documents by an additional value which is proportional to the number of multimedia objects included, the number of objects linked to the user particulars and the included topics. All of the three valorization methods use fuzzy rules to identify the valorization value. KEYWORDS Information Retrieval, Pre-ranking valorization, Fuzzy Logic, Fuzzy Rules 1. INTRODUCTION In the information retrieval process, younger users have particularities concerning what they really need in results. In this paper we study those particularities in order to have maximum coverage of relevant documents. To do it, we present the Pre-ranking Documents Valorization which aims to increase some documents relevance score by an additional value in order to enhance their ranking. In order to make the additional value be proportional to the document characteristics we use fuzzy logic principles. The pre-ranking document valorization takes place after running the querying process which finds the relevant documents. Our framework materializes collaboration between two axes: the Semantic Web [1 and 2] and the Fuzzy Logic [3, 4 and 5] .We use semantic web technologies as RDF [6 and 7] to annotate semantically our collection of web documents and SPARQL [8] for querying the annotation. Also, we use Fuzzy rules to find the exact value added to a score relevance in order to enhance the ranking of the correspondent document. The rest of the paper is organized as follows: in section 2 we introduce some preliminaries about our framework which is an information retrieval system dedicated for children. Section 3 we present the pre-ranking document valorization by explaining its three types. Finally section 4 concludes the paper. David C. Wyld et al. (Eds) : CST, ITCS, JSE, SIP, ARIA, DMS - 2014 pp. 353–359, 2014. © CS & IT-CSCP 2014 DOI : 10.5121/csit.2014.4132
  • 2. 354 Computer Science & Information Technology (CS & IT) 2. FRAMEWORK To ensure a high quality of use, available information retrieval systems dedicated for children have common highlighted facts: − − − − Security: Since the web covers a large amount of uncontrollable data, security represents the first factor taken into account to create a safe information retrieval system dedicated for children. Design: it represents the main visual factor to call users attention. Profile: it represents the common way used to personalize a search process taking into account the user’s personal interests. Querying: this factor makes the difference between information search engines even if it follows outwardly the same demarche: the annotation, the query/document matching, the ranking and the result display. In addition of considering the cited facts, our particularity resides in the “pre-ranking document valorisation” which is localized in the figure below. User query Q Querying process Search information interface relevant documents 1. Query/ document matching => score relevance RS 2. pre-ranking document valorization => added value V to RS 3. ranking process Figure 1. Our querying particularity In order to explore young web users searching interests, we do a survey covering 723 children between 8 and 15 years. The results show that more than 75% of them prefer results rich by multimedia objects (especially pictures and video sequences). Likewise, more than 70% of surveyed children want to see in returned results information dealing with their own particulars (own country, avenue, friends …) especially with the success of social networks use. In the next section we present two flexible methods that give the priority to relevant results rich by multimedia objects and/or dealing with the current user particulars. In the other side, we consider the fact that a returned document may be exhaustive and deal with different topics; this fact could misplace a user attention especially if it is a child. We propose therefore a method that avoid “multi-topic” documents and make priority to documents focusing only on query main topic. The three pre-ranking valorization methods are submitted to fuzzy rules which mainly use variables representing the user ages; Figure 2 shows the membership function representing web young user’s age as input set which ranges from 3 to 14
  • 3. Computer Science & Information Technology (CS & IT) 355 1.2 preschool 1 main childhood preteen 0.8 0.6 0.4 0.2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Figure 2. Fuzzy sets representing the different childhood periods of young web users. 3. PRE-RANKING DOCUMENT VALORIZATION After running a classical query/document process, we get the score relevance of each document to the query. The pre-ranking document valorization aims to increase score relevance of relevant documents depending on their particularities. It’s much easier and time-saver for the system to handle only relevant documents and not all the collection documents. For that, we choose to not run the document valorization while the querying process but after it. We define three types of document valorization: the “multimedia richer” document valorization, the “Nearest personal information” document valorization and “same-topic” document valorization. All of the three types of the pre-ranking document valorization are submitted to the application of fuzzy rules. 3.1. “Multimedia Richer” Document Valorization Given that younger user are interested more in multimedia objects (images, video sequences …) while the information retrieval process. The “multimedia richer” document valorization aims to increase the relevance score of relevant document including more multimedia objects. This fact is submitted to fuzzy rules aiming to decide the value added V to score relevance proportionally to the user age UA and the Number of Multimedia Objects in the Document NMOD. ─ If UA is in preschool period and NMOD is high then V is high. ─ If UA is in main childhood period and NMOD is high then V is medium. ─ If UA is in preteen period and NMOD is high or medium then V is low. The Number of Multimedia Objects in the Document NMOD is an input set represented by triangular membership functions which ranges from Min to Max (see figure 3). Min and Max are variables representing respectively the minimum number of multimedia objects included in a document in the collection and the maximum number of multimedia objects included in a document in the collection.
  • 4. 356 Computer Science & Information Technology (CS & IT) 1.2 Low 1 medium high 0.8 0.6 0.4 0.2 NMOD 0 Min (min+max)*0,25 (min+max)*0,5 (min+max)*0,75 Max Figure 3. Fuzzy sets representing the variation range of NMOD. 3.2. “Nearest Personal Information” Document Valorization As we mention before, more than 70% of surveyed children want to see in returned results information dealing with their own particulars (own country, own avenue, own school, friends …). The idea is to familiarize the information searching concept to children through the maximum coverage of their own particular in the information searching process. Referring to this observation, we suppose that results will be considered relevant if they include some personal data. The nearest personal information document valorization exploits the user age UA and the Number of Personal information Items found in a Document NPID as numerical inputs of the fuzzification operation submitted to the fuzzy rules. A personal information item found in a document may be a piece of text, a picture, or any multimedia object dealing with the current user particularities. The fuzzy rules listed below make decision about the value V added to a document relevance score in order to valorize documents dealing with user personal information: ─ If UA is in main childhood period and NPID is high then V is medium. ─ If UA in preteen period and NPID is high then V is high. ─ If UA in preteen period and NPID is medium or low then V is low. As the NMOD variable, The NPID is an input set represented by triangular membership functions which ranges from 0 to Max (see figure 4). Max is a variable representing the maximum number of personal information items concerning a user found in a document of the collection and 0 means that a document didn’t include any information items concerning the current user. 1.2 Low medium high 1 0.8 0.6 0.4 0.2 NPID 0 0 max*0,25 max*0,5 max*0,75 max Figure 4. Fuzzy sets representing the variation range of NPID.
  • 5. Computer Science & Information Technology (CS & IT) 357 3.3. “Same-Topic” Document Valorization A “Same-topic” document valorization aims to increase relevance score of document referring to the topics mentioned therein. The Figure 5 gives a structural view of this type of document valorization. The meta-document is introduced in [9] and it is able to annotate multimedia objects as well as web documents in a way that ensures its reusability. The querying process matches the user query with the meta-documents in order to identify the score relevance of the document to the query. We define the “topic cloud” as groups of weighted terms concerning a given topic. Simply, we collect potential terms representing a given topic to construct a topic cloud. The terms’ weights express the ability of each term to represent the topic. After running a usual querying process matching the query and the meta-documents, we get the relevance score for each annotated resource or document. At this point, the topic clouds are used to enhance ranking results in the benefit of relevant documents focusing mainly on query interests. WD1T1 Document 1 MetaDocument 1 topic 1 Document 2 . . . . . . . . . . MetaDocument 2 . . . . . . . . . . Topic 2 . . . . . . . . . . Document n MetaDocument n Topic m WQT1 Query Q WQT2 Figure 5. The “Same-topic” document valorization. To run the “Same-topic” document valorization, first, we establish the meta-document/topic weighted links WDT. WDT expresses the potential topics mentioned by the document. To assign a weight WDT to a meta-document/topic link, we simply sum the weights of topic terms existing in the meta-document. Then we establish query/topics weighted links which express the ability of each topic to represent the query. To assign a weight WQT to a Query/Topic link, we use the classic similarity measure between two weighted terms vectors: W୕୘ = simሺQ, Tiሻ = ෍ ౪ ౪ ౠసభ ୛౧ౠ ∗ ୛౪౟ౠ ሺ୛౧ౠ ሻమ ∗ ෍ ඨ෍ ౠసభ ౪ ሺ୛౪౟ౠ ሻమ (1) ౠసభ The next step of the “Same topic” document valorization is to calculate for each document his topic similarity relative to the query in order to increase or decrease its relevance score in terms of the value of the topic similarity. The topic similarity TS is calculated as follow: ୩ TS ሺQ, Dሻ = ෍ หW୕୘౟ − Wୈ୘౟ ห ୧ୀଵ (2) The main goal of a “Same topic” document valorization is to increase relevance of documents focusing on the same query topics and valorize “mono-topic” documents to users at an early age in order to facilitate its comprehension. The TS (Q, D) value is optimal when its value is minimal; this means that the query and the document are focusing on the same topics with approached values. Contrariwise, if the TS is high, this means that the document deals with other topics in addition to the query topics. Finally, the increase value V affected to a document Relevance Score RS is based on the following fuzzy rules: ─ If UA is in (pre-school or main childhood period) and TS is low then V is high ─ If UA is in preteen period and TS is low or medium then V is medium
  • 6. 358 Computer Science & Information Technology (CS & IT) Remains to mention that this valorization method is inspired from our previous work [10], except that we exclude the rule that decrease relevance score of documents having a high TS value. This exclusion allows to not penalizing some document because of the diversity of topics included therein. Also, we include the UA variable in this valorization method instead of score relevance RS. The TS variable is an input set represented by triangular membership functions which ranges from 0 to Max (see figure 6). Maximum TS means that the current document deals with several topics in addition to the query topics and a null TS means that the document and the query are dealing with the same topics. 1.2 Low medium 1 0.8 0.6 0.4 0.2 TS 0 0 max*0,25 max*0,5 max*0,75 max Figure 6. Fuzzy sets representing the variation range of TS. After the application of valorization methods separately to a document Di we sum the eventually deduced values to get the value Vi which is added to the document score relevance in order to get the new score relevance. After applying the valorization process to the relevant documents set found in the querying process, we pass finally to the ranking procedure. Table 1 summarizes the document valorization process. Table 1. Recapitulation of the pre-ranking document valorization process. Valorization method Multimedia Richer Nearest Personal Information Same-Topic Valorization process Inputs variables UA NMOD output v1 NPID UA v2 TS UA v3 Document Di Vi=∑ଷ v୩ ௞ୀଵ 4. CONCLUSION In this paper, we present three methods to valorize score relevance of some documents depending on their characteristics concerning the multimedia included objects, the user particulars and the topics mentioned therein. In this work, we use fuzzy rules to define in a flexible way the value added to the score relevance of valorized documents. Our framework is under development and it represents an information retrieval system dedicated for kids. We are working in the short-term on the identification of the relevant range and shape of the membership function representing the added value V usable on the three valorization methods. REFERENCES [1] [2] Tim Berners-Lee, James Hendler and Ora Lassila the Semantic Web. Scientific American: Feature Article: The Semantic Web: May 2001 W3C. Semantic Web http://www.w3.org/standards/semanticweb/
  • 7. Computer Science & Information Technology (CS & IT) [3] 359 Zadeh, L.A.: Fuzzy sets. Information and Control (1965) 338–353 http://www-bisc.cs.berkeley.edu/Zadeh-1965.pdf [4] Lotfi A. Zadeh: Knowledge Representation in Fuzzy Logic. IEEE Trans. Knowl. Data Eng. 1(1): 89100 (1989) [5] Lotfi A. Zadeh: A Summary and Update of "Fuzzy Logic". GrC 2010: 42-44 [6] Manola, F. Miller, E., Beckett, D. and Herman, I. 2007. RDF Primer http://www.w3.org/2007/02/turtle/primer/ [7] RDF Working Group. Resource Description Framework (RDF) http://www.w3.org/RDF/ [8] Eric Prud'hommeaux, W3C. SPARQL Query Language for RDF W3C Recommendation 15 January 2008 http://www.w3.org/TR/rdf-sparql-query/ [9] Jedidi A. (July 2005). « modélisation générique de documents multimédia par des métadonnées : mécanismes d’annotation et d’interrogation » Thesis of « Université TOULOUSE III Paul Sabatier », France. [10] Chkiwa, M., Jedidi, A., Gargouri, F. (October 2013). Handling Uncertainty in Semantic Information Retrieval Process. URSW 2013: 29-33, Sydney Australia