Nature-Inspired Methods for the Semantic Web

                     Claudiu Mihăilă and Magdalena Jitcă

                         Faculty of Computer Science,
                        "Al.I. Cuza" University of Iași,
                          16, G-ral Berthelot Street,
                             700483 Iași, Romania
              {claudiu.mihaila, magdalena.jitca}

      Abstract. Recently, significant research efforts have been made towards
      uncertainty representation and reasoning in ontologies for the Semantic
      Web. This work reports on contributions that use nature-inspired
      methods in multiple Semantic Web domains, such as information retrieval
      and extraction, clustering, and personalisation. Furthermore, it briefly
      describes attempts at modelling uncertainty.

      Key words: Semantic Web, nature-inspired methods, soft computing,
      Web mining, uncertainty modelling

1   Introduction

In the context of an ever-expanding World Wide Web (www), more than 100
million registered domains [1], over 25 billion indexed pages [2], and more than
one trillion unique urls [3] have been reported. The variety of information
available on the web has led researchers in multiple directions, one of the
most important being the difference between human- and machine-understandable
information, and another being information uncertainty. Until the past few
years, Semantic Web models included little explicit information about
uncertainty representation and processing because of concerns about the
scalability and computational complexity of such an approach. Much research
interest focusses on techniques for extracting incomplete, partial or
uncertain knowledge, as well as on handling uncertainty when representing
extracted information using ontologies.
    This report provides an overview of the contributions to this research area
regarding the development or improvement of the currently available Semantic
Web tools and models by means of soft computing. It also presents the work
dealing with the representation of uncertain knowledge and reasoning in the
presence of uncertainty.
    In the near future, semantic web systems are expected to integrate a consis-
tent set of the available soft computing techniques, including uncertainty repre-
sentations, statistical measures, fuzzy rules or belief networks for transmission
across the Web.
In the first part of the report, we describe the uses of nature-inspired methods
in the Web and then in the Semantic Web. In the second part, we describe the
attempts at modelling uncertainty.

2     Current use of nature-inspired methods in the Web

Due to the vastness and diversity of the Web, it has become impossible to
create software which covers it completely and which correctly understands
the information it contains. The lack of structure and patterns and
the large amount of data have led researchers to develop nature-inspired
methodologies, which can often find good approximate solutions to
NP-complete problems.
    Methods inspired by nature are used in various Web domains. For example,
SnapAd.com uses genetic algorithms to produce advertisements. This service
begins with a base population of ad variations and, by applying the genetic
algorithm, selects their best-performing characteristics in order to
create an improved advertisement.
    Other works, such as [4, 5], use genetic algorithms to determine clusters of
similar users in social networks. The algorithms use fitness functions which
measure the number of intra- and inter-connections for groups, and variation
operators which considerably reduce the space of possible solutions.
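
The idea of a fitness function based on intra- and inter-group connections can be illustrated with a minimal sketch. It is not the fitness function of [4] or [5]; the function name `partition_fitness`, the scoring rule and the toy graph are assumptions made only for illustration.

```python
def partition_fitness(edges, membership):
    """Score a grouping of users (illustrative rule, not taken from [4, 5]).

    edges      -- iterable of (u, v) pairs of an undirected graph
    membership -- dict mapping each node to a group label
    Returns (intra - inter) / total, a value in [-1, 1].
    """
    intra = 0  # edges whose endpoints share a group
    inter = 0  # edges that cross group boundaries
    for u, v in edges:
        if membership[u] == membership[v]:
            intra += 1
        else:
            inter += 1
    total = intra + inter
    return (intra - inter) / total if total else 0.0


# Toy data: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
bad = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

print(partition_fitness(edges, good))  # one triangle per group
print(partition_fitness(edges, bad))   # groups cut through both triangles
```

Under this simple rule, placing all users in a single group trivially obtains the maximum score, so a practical fitness function has to normalise or penalise group size in some way; a genetic algorithm would evaluate such a function on each candidate grouping in its population.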
    In addition, nature-inspired methods have been successfully used in search
engines [6], information retrieval [7], and question answering [8] systems.

3     Nature-inspired methods in the Semantic Web

Web mining is the area of data mining which deals with the analysis and
extraction of interesting knowledge from the World Wide Web. However, when
working with large amounts of mixed and poorly tagged information, which is
constantly changing, problems are very likely to arise. According to [9], the main
problems concern handling context-sensitive queries, summarisation, deduction,
personalisation and learning. Fig. 1 depicts the subtasks of web mining, which
are discussed below along with the problems they might raise.

                           Fig. 1. Web mining subtasks

Information retrieval The issues which may occur during the task of
information retrieval (ir) are related to the uncertainty and the accuracy of the user
queries, as well as to the deduction and decision capabilities of the system.
Several fuzzy logic approaches, which address the formulation of queries and
the relevance of the resulting documents with respect to the input query, are
surveyed in [9]. The results show that systems based on fuzzy
Boolean ir models would be most suitable for representing both the document
contents and the information needs.
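
A fuzzy Boolean model can be sketched as follows, assuming the standard Zadeh operators (minimum for AND, maximum for OR, complement for NOT). The index contents, the membership degrees and the helper names `mu`, `f_and`, `f_or`, `f_not` and `rank` are hypothetical and are not taken from [9].

```python
# Hypothetical index: degree in [0, 1] to which each term describes a document.
index = {
    "doc1": {"semantic": 0.9, "web": 0.8, "fuzzy": 0.1},
    "doc2": {"semantic": 0.3, "web": 0.9, "fuzzy": 0.7},
    "doc3": {"fuzzy": 0.8},
}


def mu(doc, term):
    """Membership degree of a term in a document (0.0 if the term is absent)."""
    return index[doc].get(term, 0.0)


def f_and(*degrees):
    return min(degrees)


def f_or(*degrees):
    return max(degrees)


def f_not(degree):
    return 1.0 - degree


def rank(query):
    """Evaluate a query (a function doc -> degree) and sort by decreasing degree."""
    scored = [(query(doc), doc) for doc in index]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def query(doc):
    # Query: semantic AND (web OR fuzzy)
    return f_and(mu(doc, "semantic"), f_or(mu(doc, "web"), mu(doc, "fuzzy")))


for degree, doc in rank(query):
    print(doc, degree)
```

Unlike a crisp Boolean model, every document receives a graded degree of relevance, so the result is a ranking rather than a yes/no set.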
    Artificial neural networks (ann) also provide a convenient method of
knowledge representation for ir applications, as their learning ability eases the task of
implementing adaptive systems. The system in [10] first encodes the initial
knowledge base, and then constantly refines it by means of the neural networks. The
advantage of this approach is that the correctness of the initial information does
not directly influence the output, as this information is improved at each step
by extracting rules from the knowledge-based nns.
    The genetic algorithms (ga) that have been used for this purpose assign
so-called relevance coefficients to the html tags, which are deduced from the
training text set. As regards the sub-task of query optimisation, gas have been
used to reweight the document indexing without having to expand the queries [11].
    A novel approach using evolutionary algorithms in a distributed environment
is reported in [12]. The intention is to determine to which information sources
the queries should be sequentially sent. By combining a query sampling method
and an evolutionary method, the resource descriptions are retrieved and
integrated optimally. The process of ontological mediation with query-based
sampling is depicted in Fig. 2 [13]. While the crawlers sample the resource
descriptions of the information sources, the mediator conducts the process of ontological
mediation for the integration of the obtained ontologies into a single large one [14].

  Fig. 2. A whole process of ontological mediation with query-based sampling. [13]

    Moreover, because the crawlers continue to obtain semantic information
from the sources, the ontologies evolve over time. This process is achieved
by employing a genetic algorithm within the mediator, which determines the
best mapping between the obtained semantic substructures and the estimated
local ontology. The reported experimental results support the scalability of
the entire contextual mediation.
    Another technique that can be used to solve the task of approximate
information retrieval is rough set (rs) theory [9], which considers that the set of
relevant documents may not be known exactly and that it can be represented by its
"upper" and "lower" approximations. The lower one corresponds to the most
specific set, which is definitely relevant to the searched item, and the upper one
refers to the most general set, which may possibly be relevant. This concept can
further be used to improve the efficiency of ir systems by implementing a
dynamic and focused search, based on the technique described above.
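
The lower and upper approximations can be sketched in a few lines, assuming the documents have already been partitioned into classes of mutually indiscernible items (for instance, documents sharing the same index terms). The function name `approximations` and the toy data are hypothetical.

```python
def approximations(classes, target):
    """Rough-set approximations of `target` with respect to a partition.

    classes -- list of disjoint sets (indiscernibility classes)
    target  -- set of documents known or judged to be relevant
    """
    lower, upper = set(), set()
    for block in classes:
        if block <= target:   # every member is relevant: definitely relevant
            lower |= block
        if block & target:    # at least one member is relevant: possibly relevant
            upper |= block
    return lower, upper


# Toy data: six documents grouped into four indiscernibility classes.
classes = [{"d1", "d2"}, {"d3"}, {"d4", "d5"}, {"d6"}]
relevant = {"d1", "d2", "d4"}

lower, upper = approximations(classes, relevant)
print(sorted(lower))  # ['d1', 'd2']
print(sorted(upper))  # ['d1', 'd2', 'd4', 'd5']
```

Here d5 cannot be told apart from the relevant document d4, so it belongs to the upper approximation only; the difference between the two sets is the boundary region on which a focused search would concentrate.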

Information extraction Information extraction (ie) is the task of identifying
specific fragments of a single document representing its core semantic content.
The most effective ie methods found so far involve working with
wrappers, i.e. procedures for extracting information from web resources. However,
they have the drawback of being specific to a certain resource, hence they
cannot be applied to every available web resource.
    This performance can be improved by using nns with a boosted wrapper
induction (bwi) technique [15]. By using the AdaBoost algorithm, bwi
repeatedly reweights the training examples so that subsequent patterns handle training
examples missed by previous rules. The results of the learning process are
comparable to the ones obtained with the hidden Markov model (hmm) technique
for learning and then extracting the information [16].
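
The reweighting step can be sketched with the textbook binary AdaBoost update. This is an assumption made for illustration: bwi as described in [15] may use a different boosting variant, and the function name `reweight` and the toy weights are hypothetical.

```python
import math


def reweight(weights, correct):
    """One round of the textbook AdaBoost weight update.

    weights -- current example weights, summing to 1
    correct -- booleans, True where the current rule handles the example
    Returns the normalised new weights and the rule's vote `alpha`.
    """
    eps = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    if not 0.0 < eps < 0.5:
        raise ValueError("the rule must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Shrink the weight of handled examples, grow the weight of missed ones.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]  # the fourth example was missed
new_weights, alpha = reweight(weights, correct)
print(new_weights)
```

After the update the missed example carries half of the total weight, which is what pushes the next learned pattern towards the examples that earlier rules failed to cover.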
    Another approach is that of Inductive Logic Programming [17], in which
logical rules are learned in order to identify phrases to be extracted from a
document [18].

Clustering Clustering is an important issue when dealing with web documents,
as it supports tasks such as measuring relevance or speed, obtaining
browsable summaries or working with overlapping data. However, there are still
some unresolved problems regarding efficient clustering, arising from the nature
of web data itself. A fuzzy clustering technique for web log data mining, based
on an algorithm for clustering user sessions, is presented in [9]. It analyses the
structure of a certain website and the urls in order to compute the
degree of similarity between two user sessions.
    The ability of nns to model complex nonlinear functions can also be used
for this task [9], for example in classifying web pages, as well as user patterns,
in both supervised and unsupervised manners.
    Another soft computing method used for document clustering is rs theory,
within which variable precision and tolerance relations are significant for this
task. In particular, rough mereology has been used for mining multimedia
objects, as well as web graphs or semantic structures [19].
    An evolutionary approach for the conceptual clustering of semantic
knowledge bases is presented in [20]. The method can be applied to multi-relational
knowledge bases and exploits, effectively and language-independently,
a semi-distance dissimilarity measure defined for the space of individual
resources. Such clusterings of semantically annotated resources are of high
interest because they can define new emerging concepts (concept
formation), which can induce new concept definitions or a refinement of
existing ones (ontology evolution). The evolutionary algorithm,
which extends distance-based clustering procedures employing medoids as cluster
prototypes, remains stable across multiple repetitions, converging towards
clusterings of comparable quality with generally the same number of clusters, and
avoiding being trapped in local minima. Furthermore, the work could
be extended in order to create hierarchies of clusters of specific granularity.
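
The role of medoids as cluster prototypes can be illustrated with one assignment and update step of a plain distance-based procedure. This sketch is not the evolutionary algorithm of [20]: the absolute difference between numbers stands in for the semi-distance between individual resources, and the names `assign` and `medoid` are hypothetical.

```python
def assign(items, medoids, dist):
    """Attach every item to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for item in items:
        nearest = min(medoids, key=lambda m: dist(item, m))
        clusters[nearest].append(item)
    return clusters


def medoid(cluster, dist):
    """The member with the smallest total distance to all other members."""
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))


def dist(a, b):
    # Placeholder dissimilarity; a real system would compare resources semantically.
    return abs(a - b)


items = [1, 2, 3, 10, 11, 12]
clusters = assign(items, medoids=[1, 10], dist=dist)
new_medoids = [medoid(members, dist) for members in clusters.values()]
print(clusters)     # {1: [1, 2, 3], 10: [10, 11, 12]}
print(new_medoids)  # [2, 11]
```

Because a medoid is always an actual member of its cluster, only pairwise dissimilarities are needed, which suits resources for which no meaningful average can be computed.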

Personalisation Personalisation involves using technology to accommodate
the differences between individuals, but in this context it refers to the fact that the
retrieved content and the search results should match users' preferences
and interests. The most effective way of learning user profiles is by using
training data collected from several users or systems. "Syskill and Webert", an agent
which learns user profiles using a Bayesian classifier, is introduced in [9]. As
an extension, it can be used to determine whether the users would be interested
in a similar page. This decision is made by analysing the html source of
a page, but the prerequisite for this is the previous retrieval of the considered
page.
    An improved way of obtaining useful, high-quality "aggregate user profiles"
from patterns is given in [21]. This approach relies on two techniques involving
clustering of both user transactions and page views with the purpose of obtaining
the overlapping aggregate profiles, which can later be used by recommender
systems for real-time personalisation.

3.1   Uncertainty modelling
The issue of uncertainty on the Semantic Web is still a challenging research field,
as this domain deals with imprecise information from different applications, each
with its special knowledge representation needs (e.g., multimedia processing,
face recognition, gps systems). To deal with uncertainty in the Semantic Web
and its applications, many researchers have proposed extending owl and the
Description Logic (dl) formalisms with special mathematical frameworks.
    A probabilistic method, based on Bayesian networks (bn), is proposed in [22]
to represent and compute the overlap in concept hierarchies. The overlap between
a pair of concepts (selected vs. referred) is a numeric value in the [0, 1] range
and indicates how well a data item matches the query concept. It approaches
0 in case of disjoint concepts and 1 when the referred concept is subsumed by
the selected one. Based upon the possible relations between concepts, a graph
notation has been used for representing the degree of overlap in the concept
hierarchy. The goal of this approach is to represent the overlap between concepts
from a taxonomic structure, without requiring any prior knowledge of
probability theory or bns from the user.
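
The boundary behaviour of the overlap value can be illustrated by treating each concept as a plain set of instances and taking the share of the referred concept that falls inside the selected one. This set ratio is an assumption chosen to match the two extremes described above; it does not reproduce the Bayesian network computation of [22], and the function name `overlap` and the toy concepts are hypothetical.

```python
def overlap(selected, referred):
    """Share of the referred concept's instances covered by the selected concept."""
    if not referred:
        raise ValueError("the referred concept must have at least one instance")
    return len(selected & referred) / len(referred)


# Toy concepts, each given as a set of instance identifiers.
europe = {"fr", "de", "ro", "tr"}
asia = {"tr", "jp"}
romania = {"ro"}

print(overlap(europe, romania))  # 1.0: referred is subsumed by selected
print(overlap(asia, romania))    # 0.0: disjoint concepts
print(overlap(europe, asia))     # 0.5: partial overlap
```

The measure is asymmetric: `overlap(europe, asia)` and `overlap(asia, europe)` generally differ, which matches the distinction between the selected and the referred concept.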
    A probabilistic framework for modelling uncertainty in semantic web
ontologies based on Bayesian networks has been developed in [23]. Its goal is to
convert any owl ontology into a bn by using probabilistic extensions to
description logics. The translated bn is semantically consistent with the original
ontology and satisfies all the given probabilistic constraints. The drawback of
this approach is that the probabilistic information must be added to the
ontology by the human modeller and this task requires knowledge of probability
theory. This framework, called BayesOWL, is currently at version 1.0, and it is
available for download as a Java extension.
    More recently, a World Wide Web Consortium (w3c) Incubator Group on
Uncertainty Reasoning for the World Wide Web was created in order to describe
situations where uncertainty reasoning would significantly improve information
extraction, to identify methodologies which can be applied to these cases, and to
develop a standardised representation of uncertainty [24]. The most commonly
used approaches to uncertainty for the www that the group identified are
probabilistic theories (e.g., bn), fuzzy logic, and belief functions. After analysing 16
use cases, the group developed an uncertainty ontology and concluded that the
uncertainty came either from data, or from reasoning.

4     Conclusions

In this report, we have summarised the achievements of soft computing
methodologies in the context of the Semantic Web and briefly described their
principles. We have then briefly introduced uncertainty modelling and given
an overview of some approaches.
   Many important aspects still remain open for future research. Specifically,
there is a need for scalable formalisms to support uncertainty and vagueness in
ontology languages, and implementations of these formalisms.

 1. DomainTools, LLC: Domain Counts & Internet Statistics. Accessed 10 January 2010.
 2. de Kunder, M.: The size of the World Wide Web. Accessed 10 January 2010.
 3. Alpert, J., Hajaj, N.: We knew the web was big... (25 July
    2008) Accessed 10 January 2010.
 4. Pizzuti, C.: Community detection in social networks with genetic algorithms. In:
    GECCO ’08: Proceedings of the 10th annual conference on Genetic and evolution-
    ary computation, New York, NY, USA, ACM (2008) pp. 1137–1138
 5. Lipczak, M., Milios, E.: Agglomerative genetic algorithm for clustering in social
    networks. In: GECCO ’09: Proceedings of the 11th Annual conference on Genetic
    and evolutionary computation, New York, NY, USA, ACM (2009) pp. 1243–1250
 6. Picarougne, F., Monmarché, N., Oliver, A., Venturini, G.: Geniminer: Web mining
    with a genetic-based (2002)
 7. Xu, Y., Deli, Y., Yu, L.: Efficient annealing-inspired genetic algorithm for
    information retrieval from web-document. In: GEC ’09: Proceedings of the first
    ACM/SIGEVO Summit on Genetic and Evolutionary Computation, New York,
    NY, USA, ACM (2009) pp. 1017–1020
 8. Figueroa, A.G., Neumann, G.: Genetic algorithms for data-driven web question
    answering. Evolutionary Computation 16(1) (2008) pp. 89–125
 9. Pal, S.K., Talwar, V., Mitra, P.: Web mining in soft
    computing framework: Relevance, state of the art and future directions. IEEE
    Transactions on Neural Networks 13 (2002) pp. 1163–1177
10. Shavlik, J., Towell, G.G.: Knowledge-based artificial neural networks. Artificial
    Intelligence 70(1/2) (1994) pp. 119–165
11. Yang, J.J., Korfhage, R.R.: Query modification using genetic algorithms in vector
    space models. International Journal of Expert Systems 7(2) (1994) pp. 165–191
12. Jung, J.J.: An evolutionary approach to query-sampling for heterogeneous systems.
    Expert Systems with Applications 37(1) (2010) pp. 226–232
13. Jung, J.J.: Ontological framework based on contextual mediation for collaborative
    information retrieval. Information Retrieval 10(1) (2007) pp. 85–109
14. Noy, N.F., Musen, M.A.: Prompt: Algorithm and tool for automated ontology
    merging and alignment. In: Proceedings of the Seventeenth National Conference
    on Artificial Intelligence and Twelfth Conference on Innovative Applications of
    Artificial Intelligence, AAAI Press / The MIT Press (2000) pp. 450–455
15. Freitag, D., Kushmerick, N.: Boosted wrapper induction. In: Proceedings of the
    Seventeenth National Conference on Artificial Intelligence and Twelfth Conference
    on Innovative Applications of Artificial Intelligence, AAAI Press / The MIT Press
    (2000) pp. 577–583
16. Bikel, D.M., Schwartz, R., Weischedel, R.M.: An algorithm that learns what's in
    a name. Machine Learning 34(1-3) (1999) pp. 211–231
17. Muggleton, S., ed.: Inductive Logic Programming. Academic Press, New York,
    NY (1992)
18. Freitag, D.: Toward general-purpose learning for information extraction. In: Pro-
    ceedings of the 17th international conference on Computational linguistics, Mor-
    ristown, NJ, USA, Association for Computational Linguistics (1998) pp. 404–408
19. Polkowski, L., Skowron, A.: Rough mereology: A new paradigm for approximate
    reasoning. Int. J. Approx. Reasoning 15(4) (1996) pp. 333–365
20. Fanizzi, N., d’Amato, C., Esposito, F.: Evolutionary conceptual clustering based
    on induced pseudo-metrics. International Journal on Semantic Web & Information
    Systems 4(3) (2008) pp. 44–67
21. Mobasher, B., Dai, H., Luo, T., Nakagawa, M.: Discovery and evaluation of aggre-
    gate usage profiles for web personalization. Data Min. Knowl. Discov. 6(1) (2002)
    pp. 61–82
22. Holi, M., Hyvönen, E.: Modeling uncertainty in semantic web taxonomies.
    Springer-Verlag, Berlin (2006)
23. Ding, Z., Peng, Y.: A probabilistic extension to ontology language owl. In: HICSS
    ’04: Proceedings of the 37th Annual Hawaii International
    Conference on System Sciences (HICSS’04) - Track 4, Washington, DC, USA, IEEE
    Computer Society (2004) p. 40111.1
24. W3C Incubator Group Report: Uncertainty Reasoning for the World Wide Web. (31 March 2008) Accessed
    10 January 2010.

An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded content
Concept drift and machine learning model for detecting fraudulent transaction...
Concept drift and machine learning model for detecting fraudulent transaction...Concept drift and machine learning model for detecting fraudulent transaction...
Concept drift and machine learning model for detecting fraudulent transaction...
Introduction abstract
Introduction abstractIntroduction abstract
Introduction abstract
Spe165 t
Spe165 tSpe165 t
Spe165 t
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
A genetic based research framework 3
A genetic based research framework 3A genetic based research framework 3
A genetic based research framework 3
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication

Kürzlich hochgeladen

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

Kürzlich hochgeladen (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

Nature-Inspired Methods for the Semantic Web

Claudiu Mihăilă and Magdalena Jitcă

Faculty of Computer Science, "Al.I. Cuza" University of Iași, 16, G-ral Berthelot Street, 700483 Iași, Romania
{claudiu.mihaila, magdalena.jitca}

Abstract. Recently, significant research efforts have been made towards uncertainty representation and reasoning in ontologies for the Semantic Web. This work reports on contributions that use methods inspired from nature in multiple Semantic Web domains, such as information retrieval and extraction, clustering, and personalisation. Furthermore, it briefly describes the attempts at modelling uncertainty.

Key words: semantic Web, methods inspired from nature, soft computing, Web mining, uncertainty modelling

1 Introduction

In the context of an ever-expanding World Wide Web (WWW), more than 100 million registered domains [1], over 25 billion indexed pages [2], and more than one trillion unique URLs [3] have been reported. The variety of information available on the web has led researchers in multiple directions, one of the most important being the difference between human- and machine-understandable information, and another being information uncertainty. Until the past few years, Semantic Web models have included little explicit information about uncertainty representation and processing, because of concerns about the scalability and computational complexity of such an approach. Much research interest focusses on techniques for extracting incomplete, partial or uncertain knowledge, as well as on handling uncertainty when representing extracted information using ontologies.

This report provides an overview of the contributions to this research area regarding the development or improvement of the currently available Semantic Web tools and models by means of soft computing. It also presents the work dealing with the representation of uncertain knowledge and with reasoning in the presence of uncertainty.

In the near future, semantic web systems are expected to integrate a consistent set of the available soft computing techniques, including uncertainty representations, statistical measures, fuzzy rules or belief networks for transmission across the Web.
In the first part of the report, we describe the uses of nature-inspired methods in the Web and then in the Semantic Web. In the second part, we describe the attempts at modelling uncertainty.

2 Current use of nature-inspired methods in the Web

Due to the vastness and diversity of the Web, it has become impossible to create software which covers it completely and which correctly understands the information it contains. The lack of structure and patterns and the large amount of data have led researchers to develop nature-inspired methodologies, which can find, most of the time, an optimal or near-optimal solution to NP-complete problems.

Methods inspired from nature are used in various Web domains. For example, SnapAd.com uses genetic algorithms to produce advertisements. This service begins with a base population of ad variations and, by applying the genetic algorithm, selects their best-performing characteristics and combines them into the final advertisement.

Other works, such as [4, 5], use genetic algorithms to determine clusters of similar users in social networks. The algorithms use fitness functions which measure the number of intra- and inter-connections for groups, and variation operators which considerably reduce the space of possible solutions.

In addition, nature-inspired methods have been successfully used in search engines [6], information retrieval [7], and question answering [8] systems.

3 Nature-inspired methods in the Semantic Web

Web mining is the area of data mining which deals with the analysis and extraction of interesting knowledge from the World Wide Web. However, when working with large amounts of mixed and poorly tagged information, which is constantly changing, problems are very likely to arise. According to [9], the main problems concern handling context-sensitive queries, summarisation, deduction, personalisation and learning. Fig. 1 depicts the subtasks of web mining, which are discussed below along with the problems they might raise.

Fig. 1. Web mining subtasks
Information retrieval. The issues which may occur during the task of information retrieval (IR) are related to the uncertainty and the accuracy of the user queries, as well as to the deduction and decision capabilities of the system. Several fuzzy logic approaches, which address the formulation of queries and the relevance of the retrieved documents with respect to the input query, are surveyed in [9]. The results show that systems based on fuzzy Boolean IR models would be most suitable for representing both the document contents and the information needs.

Artificial neural networks (ANN) also provide a convenient method of knowledge representation for IR applications, as their learning ability eases the task of implementing adaptive systems. The system described in [10] first encodes the initial knowledge base, and then constantly refines it by means of the neural networks. The advantage of this approach is that the correctness of the initial information does not directly influence the output, as this information is improved at each step by extracting rules from the knowledge-based NNs.

The genetic algorithms (GA) that have been used for this purpose assign so-called relevance coefficients to the HTML tags, which are deduced from the training text set. As regards the sub-task of query optimisation, GAs have been used for reweighting the document indexing without having to expand the queries [11].

A novel approach using evolutionary algorithms in a distributed environment is reported in [12]. The intention is to determine to which information sources the queries should be sequentially sent. By combining a query sampling method and an evolutionary method, the resource descriptions are retrieved and integrated optimally. The process of ontological mediation with query-based sampling is depicted in Fig. 2 [13]. While the crawlers sample the resource descriptions of the information sources, the mediator conducts the process of ontological mediation for the integration of the obtained ontologies into a single large one [14].

Fig. 2. A whole process of ontological mediation with query-based sampling. [13]

Moreover, because the crawlers continue obtaining semantic information from the sources, the ontologies evolve over time. This process is achieved by employing a genetic algorithm within the mediator, which determines the best mapping between the obtained semantic substructures and the estimated local ontology. The results of the conducted experiments indicate that the entire contextual mediation is scalable.

Another technique that can be used to solve the task of approximate information retrieval is rough sets (RS) theory [9], considering that the set of relevant documents may not be known precisely and that it can be represented by its "upper" and "lower" approximations. The lower approximation corresponds to the most specific set, which is definitely relevant to the searched item, and the upper approximation refers to the most general set, which may possibly be relevant. This concept can further be used to improve the efficiency of IR systems by implementing a dynamic and focused search based on the technique described above.

Information extraction. Information extraction (IE) is the task of identifying specific fragments of a single document representing its core semantic content. The most effective IE methods discovered until now involve working with wrappers, which are procedures for extracting information from web resources. However, they have the drawback of being particular to a certain resource, hence they cannot be applied to every available web resource.

This performance can be improved by using NNs with a boosted wrapper induction (BWI) technique [15]. By using the AdaBoost algorithm, BWI repeatedly reweights the training examples so that subsequent patterns handle training examples missed by previous rules. The results of the learning process are comparable to the ones obtained with the HMM technique for learning and then extracting the information [16]. Another approach is that of Inductive Logic Programming [17], in which logical rules are learned in order to identify phrases to be extracted from a document [18].

Clustering. Clustering is an important issue when dealing with web documents, in order to cover tasks such as measuring the relevance or the speed, obtaining browsable summaries or working with overlapping data. However, there are still some unresolved problems regarding efficient clustering, arising from the nature of web data itself.

A fuzzy clustering technique for web log data mining, based on an algorithm for clustering user sessions, is presented in [9]. It analyses the structure of a certain website and the URLs in order to compute the degree of similarity between two user sessions.

The ability of NNs to model complex nonlinear functions can also be used for this task [9], for example in classifying web pages, as well as user patterns, in both supervised and unsupervised manners.

Another soft computing method used for document clustering is RS theory, within which variable precision and tolerance relations are particularly significant for this task. In particular, rough mereology has been used for mining multimedia objects, as well as web graphs or semantic structures [19].

An evolutionary approach for the conceptual clustering of semantic knowledge bases is presented in [20]. The method can be applied to multi-relational knowledge bases to exploit, effectively and, most importantly, language-independently, a semi-distance dissimilarity measure defined for the space of individual resources. Such clusterings of semantically annotated resources are of great interest due to their ability to define new emerging concepts (concept formation), which can induce new concept definitions or a refinement of existing ones (ontology evolution). The evolutionary algorithm developed in [20], which extends distance-based clustering procedures employing medoids as cluster prototypes, remains stable across multiple repetitions, converging towards clusterings of comparable quality with generally the same number of clusters, and avoiding being caught in local minima. Furthermore, the work could be extended in order to create hierarchies of clusters of specific granularity.

Personalisation. Personalisation involves using technology to accommodate the differences between individuals; in this context, it means that the retrieved content and the search results should match the users' preferences and interests. The most effective way of learning user profiles is by using training data collected from several users or systems. "Syskill and Webert", an agent which learns user profiles using the Bayesian classifier, is introduced in [9]. As an extension, it can be used to determine whether the users would be interested in a similar page. This decision is made by analysing the HTML source of a page, which requires that the considered page has been retrieved beforehand.

An improved way of obtaining high-quality and useful "aggregate user profiles" from patterns is given in [21]. This approach relies on two techniques involving the clustering of both user transactions and page views, with the purpose of obtaining overlapping aggregate profiles, which can later be used by recommender systems for real-time personalisation.
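The Bayesian profile learning mentioned above for agents such as "Syskill and Webert" can be illustrated with a minimal naive Bayes sketch. It is not the implementation from [9]: the Boolean word features, the "hot"/"cold" labels, the Laplace smoothing and the function names `train` and `predict` are all assumptions made for the illustration.

```python
import math
from collections import Counter


def train(pages):
    """Learn a profile from (text, label) pairs; labels are hypothetical ratings."""
    class_counts = Counter()   # number of rated pages per label
    word_counts = {}           # label -> number of pages containing each word
    vocab = set()
    for text, label in pages:
        class_counts[label] += 1
        words = set(text.lower().split())   # Boolean features: word present or not
        word_counts.setdefault(label, Counter()).update(words)
        vocab |= words
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    words = set(text.lower().split())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)          # class prior
        for w in vocab:
            # Laplace-smoothed estimate of P(word present | label)
            p = (word_counts[label][w] + 1) / (n + 2)
            log_p += math.log(p if w in words else 1 - p)
        scores[label] = log_p
    return max(scores, key=scores.get)


# Toy ratings given by one hypothetical user
rated = [
    ("genetic algorithm clustering ontology", "hot"),
    ("fuzzy ontology reasoning semantic web", "hot"),
    ("football match results league", "cold"),
    ("celebrity gossip football news", "cold"),
]
profile = train(rated)
print(predict(profile, "ontology clustering with fuzzy rules"))
print(predict(profile, "football league news"))
```

Words that never occurred in the rated pages are ignored at prediction time, which keeps the sketch short; a real agent would also need to strip the HTML markup before extracting words.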
3.1 Uncertainty modelling

The issue of uncertainty on the Semantic Web is still a challenging research field, as this domain deals with imprecise information from different applications, each with its special knowledge representation needs (e.g., multimedia processing, face recognition, GPS systems). To deal with uncertainty in the Semantic Web and its applications, many researchers have proposed extending OWL and the Description Logic (DL) formalisms with special mathematical frameworks.

A probabilistic method based on Bayesian networks (BN) is proposed in [22] to represent and compute the overlap in concept hierarchies. The overlap between a pair of concepts (selected vs. referred) is a numeric value in the [0, 1] range and indicates how well a data item matches the query concept. It is 0 for disjoint concepts and 1 when the referred concept is subsumed by the selected one. Based upon the possible relations between concepts, a graph notation has been used for representing the degree of overlap in the concept hierarchy. The goal of this approach is to represent the overlap between concepts from a taxonomic structure without requiring the user to have any prior knowledge of probability theory or BNs.
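The behaviour of the overlap value can be illustrated with a small sketch. It assumes, for illustration only, that each concept is modelled as a finite set of instances and that the overlap is the share of the referred concept covered by the selected one; the Bayesian network computation of [22] is not reproduced here, and the concept names and the function `overlap` are hypothetical.

```python
def overlap(selected, referred):
    """Share of the referred concept's instances that fall inside the selected concept."""
    if not referred:
        raise ValueError("the referred concept has no instances")
    return len(selected & referred) / len(referred)


# Hypothetical geographic concepts, each modelled as a set of regions
europe = {"finland", "romania", "western_russia"}
asia = {"japan", "eastern_russia"}
russia = {"western_russia", "eastern_russia"}
lapland = {"finland"}

print(overlap(europe, asia))      # disjoint concepts
print(overlap(europe, lapland))   # referred concept subsumed by the selected one
print(overlap(europe, russia))    # partial overlap
```

The measure is asymmetric: swapping the selected and the referred concept generally changes the result, which matches the reading of the value as "how well a data item annotated with the referred concept matches the query concept".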
A probabilistic framework for modelling uncertainty in semantic web ontologies based on Bayesian networks has been developed in [23]. Its goal is to convert any OWL ontology into a BN by using probabilistic extensions to description logics. The translated BN is semantically consistent with the original ontology and satisfies all the given probabilistic constraints. The drawback of this approach is that the probabilistic information must be added to the ontology by the human modeller, and this task requires knowledge of probability theory. This framework, called BayesOWL, is currently at version 1.0, and it is available for download (footnote 2: ~ypeng/BayesOWL/) as a Java extension.

More recently, a World Wide Web Consortium (W3C) Incubator Group on Uncertainty Reasoning for the World Wide Web was created in order to describe situations where uncertainty reasoning would significantly improve information extraction, to identify methodologies which can be applied to these cases, and to develop a standardised representation of uncertainty [24]. The most commonly used approaches to uncertainty for the WWW that the group identified are probabilistic theories (e.g., BN), fuzzy logic, and belief functions. After analysing 16 use cases, the group developed an uncertainty ontology and concluded that the uncertainty came either from data or from reasoning.

4 Conclusions

In this report, we have summarised the achievements of soft computing methodologies in the context of the Semantic Web and briefly described their principles. We have then briefly introduced uncertainty modelling and given an overview of some approaches.

Many important aspects still remain open for future research. Specifically, there is a need for scalable formalisms to support uncertainty and vagueness in ontology languages, and for implementations of these formalisms.

References

1. DomainTools, LLC: Domain Counts & Internet Statistics. Accessed 10 January 2010.
2. de Kunder, M.: The size of the World Wide Web. Accessed 10 January 2010.
3. Alpert, J., Hajaj, N.: We knew the web was big... (25 July 2008) Accessed 10 January 2010.
4. Pizzuti, C.: Community detection in social networks with genetic algorithms. In: GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation, New York, NY, USA, ACM (2008) pp. 1137–1138
5. Lipczak, M., Milios, E.: Agglomerative genetic algorithm for clustering in social networks. In: GECCO '09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, New York, NY, USA, ACM (2009) pp. 1243–1250
6. Picarougne, F., Monmarché, N., Oliver, A., Venturini, G.: Geniminer: Web mining with a genetic-based (2002)
7. Xu, Y., Deli, Y., Yu, L.: Efficient annealing-inspired genetic algorithm for information retrieval from web-document. In: GEC '09: Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, New York, NY, USA, ACM (2009) pp. 1017–1020
8. Figueroa, A.G., Neumann, G.: Genetic algorithms for data-driven web question answering. Evolutionary Computation 16(1) (2008) pp. 89–125
9. Pal, S.K., Talwar, V., Mitra, P.: Web mining in soft computing framework: Relevance, state of the art and future directions. IEEE Transactions on Neural Networks 13 (2002) pp. 1163–1177
10. Shavlik, J., Towell, G.G.: Knowledge-based artificial neural networks. Artificial Intelligence 70(1/2) (1994) pp. 119–165
11. Yang, J.J., Korfhage, R.R.: Query modification using genetic algorithms in vector space models. International Journal of Expert Systems 7(2) (1994) pp. 165–191
12. Jung, J.J.: An evolutionary approach to query-sampling for heterogeneous systems. Expert Systems with Applications 37(1) (2010) pp. 226–232
13. Jung, J.J.: Ontological framework based on contextual mediation for collaborative information retrieval. Information Retrieval 10(1) (2007) pp. 85–109
14. Noy, N.F., Musen, M.A.: Prompt: Algorithm and tool for automated ontology merging and alignment. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, AAAI Press / The MIT Press (2000) pp. 450–455
15. Freitag, D., Kushmerick, N.: Boosted wrapper induction. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, AAAI Press / The MIT Press (2000) pp. 577–583
16. Bikel, D.M., Schwartz, R., Weischedel, R.M.: An algorithm that learns what's in a name. Machine Learning 34(1-3) (1999) pp. 211–231
17. Muggleton, S., ed.: Inductive Logic Programming. Academic Press, New York, NY (1992)
18. Freitag, D.: Toward general-purpose learning for information extraction. In: Proceedings of the 17th international conference on Computational linguistics, Morristown, NJ, USA, Association for Computational Linguistics (1998) pp. 404–408
19. Polkowski, L., Skowron, A.: Rough mereology: A new paradigm for approximate reasoning. Int. J. Approx. Reasoning 15(4) (1996) pp. 333–365
20. Fanizzi, N., d'Amato, C., Esposito, F.: Evolutionary conceptual clustering based on induced pseudo-metrics. International Journal on Semantic Web & Information Systems 4(3) (2008) pp. 44–67
21. Mobasher, B., Dai, H., Luo, T., Nakagawa, M.: Discovery and evaluation of aggregate usage profiles for web personalization. Data Min. Knowl. Discov. 6(1) (2002) pp. 61–82
22. Holi, M., Hyvönen, E.: Modeling uncertainty in semantic web taxonomies. Springer-Verlag, Berlin (2006)
23. Ding, Z., Peng, Y.: A probabilistic extension to ontology language OWL. In: HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 4, Washington, DC, USA, IEEE Computer Society (2004) p. 40111.1
24. W3C Incubator Group Report: Uncertainty Reasoning for the World Wide Web. (31 March 2008) Accessed 10 January 2010.