Project Name: Building a Semantic Search in a Library
Project members
• IJAZ UL HAQ (EJO) – GROUP LEADER (ID: 2820150066)
• CHEN JIAYU
• RON WONG SEE
• ALEX
• ZHENG YUE
• GARY
• PANG PENGFEI
Semantic Web
• Experts expect online information to be organized in smarter, more useful ways in the coming years, but they dispute whether the improvements will match the vision of the Semantic Web's proponents.
Sir Tim Berners-Lee's vision is a web that allows software agents to carry out sophisticated tasks for users, making meaningful connections between bits of information so that "computers can perform more of the tedious work involved in finding, combining, and acting upon information on the web."
Project Description
• Our project develops semantic search for a library. Semantic Web concepts are used to re-engineer the search process of library catalogue systems and make it more efficient.
• An ontology is a higher-level view of a database schema; it helps generate queries that extract data from the database by selecting classes and relationships.
Applications of ontology
• Searching & browsing
• Decision support system
• Question answering system
• Recommendation
• Data integration
• Etc.
Semantic digital library
• An approach is proposed for managing, organizing and populating an ontology for the document collections of a digital library.
• Document metadata and content are inserted into a knowledge base, which allows sophisticated querying and searching.
• The first step is an ontology-based information retrieval model built on the classic vector space model; it includes document annotation, instance-based weighting and concept-based ranking.
Semantic digital library
• General architecture
Ontology
Cont…
Apache Jena APIs
Jena is a programming toolkit for the Java programming language. There are a few command-line tools that help with some key tasks, but mostly you use Jena by writing Java programs.
Eclipse and MySQL
• We used the Apache Jena APIs in Java (Eclipse) to load the ontology file and search the MySQL database through that ontology. This project is not a complete search engine for the book library; it only demonstrates the difference between simple syntactic search and semantic search. The database holds data about Software Engineering books (data describing the books, not the books themselves), and searches run through the ontology. That is what semantic search is: searching documents or the web using a predefined ontology.
Future Work
• In the future, the search engine should be smart enough to satisfy every requirement expressed in the user's query,
• and show users what they actually want to see.
• For our project, it should search through the documents of the books and show users the books or articles they want.
• There are many semantic search engines; one of them is Swoogle.
Any questions?
What is RDF
RDF
Resource Description Framework (RDF) is a W3C standard for describing web resources, such as the title, author, date, content and copyright information of a web page.
1 A framework for describing resources on the Web
2 Provides a model for the data and a syntax
3 Designed to be read and understood by computers
4 Written in XML; not intended for display to people
5 Uses attributes and attribute values to describe resources
Resource: http://www.w3school.com.cn/rdf
Attributes: author, homepage
Attribute values: David, http://www.w3school.com.cn
<?xml version="1.0"?>
<RDF>
<Description about="http://www.w3school.com.cn/RDF">
<author>David</author>
<homepage>http://www.w3school.com.cn</homepage>
</Description>
</RDF>
Example
RDF statement: "The author of http://www.w3school.com.cn/rdf is David."
Subject: http://www.w3school.com.cn/rdf
Predicate: author
Object: David
Title              Artist         Country   Company       Price   Year
Empire Burlesque   Bob Dylan      USA       Columbia      10.90   1985
Hide your heart    Bonnie Tyler   UK        CBS Records   9.90    1988
RDF instance
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
<cd:artist>Bob Dylan</cd:artist>
<cd:country>USA</cd:country>
<cd:company>Columbia</cd:company>
<cd:price>10.90</cd:price>
<cd:year>1985</cd:year>
</rdf:Description>
<rdf:Description rdf:about="http://www.recshop.fake/cd/Hide your heart">
<cd:artist>Bonnie Tyler</cd:artist>
<cd:country>UK</cd:country>
<cd:company>CBS Records</cd:company>
<cd:price>9.90</cd:price>
<cd:year>1988</cd:year>
</rdf:Description>
</rdf:RDF>
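To make the link between the RDF/XML instance and subject-predicate-object statements concrete, here is a minimal sketch that reads a shortened copy of the CD example with Python's standard XML parser and lists its triples. The function name extract_triples is an assumption made for illustration; a real project would use an RDF library such as Jena rather than a plain XML parser, which only handles this simple flat layout.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the CD example (one description, two properties).
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

Each child element of a Description becomes one statement: the rdf:about value is the subject, the element name is the predicate, and the element text is the object.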
How to convert sentences to RDF
The proposed framework for semantic annotation of Chinese Web pages
From sentences to RDF
<div id="navfirst">
<ul id="menu">
<li id="h"><a href="/h.asp" title="HTML 系列教程">HTML 系列教程</a></li>
<li id="b"><a href="/b.asp" title="浏览器脚本教程">浏览器脚本</a></li>
<li id="s"><a href="/s.asp" title="服务器脚本教程">服务器脚本</a></li>
<li id="d"><a href="/d.asp" title="ASP.NET 教程">ASP.NET 教程</a></li>
<li id="x"><a href="/x.asp" title="XML 系列教程">XML 系列教程</a></li>
<li id="ws"><a href="/ws.asp" title="Web Services 系列教程">Web Services 系列教程</a></li>
<li id="w"><a href="/w.asp" title="建站手册">建站手册</a></li>
</ul>
</div>
There are a large number of HTML documents on the Web. These documents are written for human reading, not for machine processing, so they carry no semantic knowledge that a computer can use.
In general, semantic tagging is the process of expressing the knowledge in a document in a formal representation under the guidance of a domain ontology. It is usually divided into two steps.
First: type tagging (TT)
Second: relation extraction (RE)
Three Steps
1、Data preparation
2、Identification stage
3、Assembly phase
Data Preparation
1、Domain ontology
2、Domain vocabulary
1、Domain ontology
The domain ontology is the core data for semantic annotation. It holds the concept definitions, the properties, and the instance data pre-stored in the ontology.
Concepts   Object properties   Data type properties   Instance data   Total
422        87                  147                    2420            3096
Protege
Automatic program extraction
Domain expert manual extraction
2、Domain vocabulary
The domain vocabulary is built by statistical methods. The data source is the set of web pages downloaded by a focused crawler; these pages are split into clauses and processed into a set of natural-language sentences.
2、Identification stage
Explicit attribute type labeling algorithm (EPTT)
Input: the word segmentation of a sentence
Output: the set of annotated types and a new word segmentation
Begin:
Step 1: Apply the identification rules to recognize general-purpose entity types in the sentence and label their types.
Step 2: Apply the vocabulary list: match the words in the sentence exactly and mark the corresponding types.
Step 3: Apply N-gram segmentation and fuzzy-match the sentence against the words in the annotation vocabulary list; on success, mark the corresponding types.
Step 4: Adjust the segmentation result so that no typed word is cut apart. If segmentation has split such a word, merge the pieces back into one word.
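Example:
Original text: 上海电力上市时间是1990年1月26日。 (The listing date of Shanghai Electric Power is January 26, 1990.)
Unlabeled segmentation result: 上海/电力/上市/时间/是/1月/26日
After labeling: 上海电力/上市时间/是/1990年1月26日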
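The vocabulary-matching and merging steps of the identification stage can be sketched as follows. This is only an illustration under stated assumptions: the function name merge_typed_terms, the type labels "Company" and "ListingDate", the token list and the tiny vocabulary are all hypothetical, matching is exact rather than fuzzy, and the rule-based recognition of general entities such as dates is left out.

```python
def merge_typed_terms(tokens, vocabulary, max_n=4):
    """Greedily merge adjacent tokens whose concatenation is a typed vocabulary term.

    Returns a list of (word, type) pairs; type is None for unlabeled words.
    """
    result = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so that longer terms win over their parts.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate in vocabulary:
                match = (candidate, vocabulary[candidate], n)
                break
        if match:
            result.append((match[0], match[1]))
            i += match[2]
        else:
            result.append((tokens[i], None))
            i += 1
    return result


# Hypothetical input: a segmenter has split two domain terms apart.
tokens = ["上海", "电力", "上市", "时间", "是", "1990年", "1月", "26日"]
vocabulary = {"上海电力": "Company", "上市时间": "ListingDate"}

labeled = merge_typed_terms(tokens, vocabulary)
print(labeled)
```

The two split domain terms are merged back into single typed words, while the remaining tokens stay unlabeled; merging the date tokens would be the job of the identification rules in Step 1.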
3、Assembly phase
Dependency grammar:
The words of a sentence are linked by direct syntactic relations. Each relation is directed: usually one word governs another, and these governor-dependent relations reflect how the words in the sentence relate to each other.
1、Dependency pair: Relation(Gov, Dep)
Gov: governing word    Dep: subordinate word
Relation: grammatical relation
2、Dependency tree
The dependency pairs form a dependency tree in which Gov is the parent node and Dep is the child node.
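A small sketch of how Relation(Gov, Dep) pairs can be assembled into a tree, assuming each pair is given as a (relation, governor, dependent) tuple. The function name and the example pairs are hypothetical; they are not the output of any particular parser.

```python
from collections import defaultdict


def build_dependency_tree(pairs):
    """Group dependency pairs by governor and find the root words.

    Returns (roots, children), where children maps each governor to a list
    of (relation, dependent) tuples.
    """
    children = defaultdict(list)
    dependents = set()
    for relation, gov, dep in pairs:
        children[gov].append((relation, dep))
        dependents.add(dep)
    # A root governs other words but is never itself a dependent.
    roots = [gov for gov in children if gov not in dependents]
    return roots, dict(children)


# Hypothetical pairs for a simplified English gloss of a sentence.
pairs = [
    ("nsubj", "listed", "PetroChina"),
    ("tmod", "listed", "today"),
    ("conj", "PetroChina", "Shanghai Electric Power"),
]

roots, tree = build_dependency_tree(pairs)
print(roots)
print(tree)
```

With one tree per clause, running this on each clause's pairs gives the set of trees that together describe the whole sentence.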
Stanford University syntactic parser
Example sentence: 中石油和上海电力今天在上海证券交易所上市,法人代表为张华和李刚。 (PetroChina and Shanghai Electric Power were listed on the Shanghai Stock Exchange today; their legal representatives are Zhang Hua and Li Gang.)
3、Dependency forest
A sentence can be divided into several clauses, each of which forms a dependency tree; together these trees form the dependency forest of the whole sentence.
Based on dependency tree Relation extraction algorithm(D
Grammatical relationship triple (GRT)
Conclusion
1、The explicit type marking method (EPTT) and the relation extraction method (DTRE) are effective.
2、The domain vocabulary list is still tagged manually; the next step is to automate this with machine learning methods.
Reference
1、http://www.w3school.com.cn/rdf/rdf_intro.asp
2、Jin Tao, Zuo Wanli, Sun Jigui, and Che Haiyuan. Semantic Annotation of Chinese Web Page: From Sentences to RDF Representation. College of Computer Science and Technology, Jilin University, Changchun 130012.
Information Retrieval
Building a semantic search engine in library
GROUP ?
陈嘉玉 JiaYU Chen
2120150978
CONTENTS
Introduction
Information Retrieval
Search engine
Algorithm
Work
Reference
Introduction
Building a semantic search engine in library
--semantic search systems
Semantic search systems might combine a
range of techniques, ranging from statistics
based IR methods for ranking, database
methods for efficient indexing and query
processing, up to complex reasoning
techniques for making inferences!
History of Search Engine
The originator of search engines: Archie
The origin of the modern search engine: Wanderer, Yahoo
The first search engines in the modern sense: Lycos, Infoseek
The first meta-search engine: Metacrawler
The first search engine to support natural language search: AltaVista
The belated king: Google
The top-ranked Chinese search engine: Baidu
Concentrated, aggressive, plain, humility, setting up (making money) as a fairy tale
Focus on technology and Chinese search
Usage Frequency of Internet Applications in China
Category               Application             Frequency
Information channels   Internet news           77.3%
Information channels   Search engine           74.8%
Information channels   Write a blog            19.1%
Communication tools    Instant messaging       69.8%
Communication tools    E-mail                  55.4%
Entertainment tools    Online music            68.5%
Entertainment tools    Online video            61.1%
Entertainment tools    Online games            47.0%
Life assistant         Job hunting             15.2%
Life assistant         Online education        24.0%
Life assistant         Online shopping         25.5%
Life assistant         Internet sales          4.3%
Life assistant         Online travel booking   3.9%
Life assistant         Internet banking        20.9%
Life assistant         Online stock trading    14.1%
Information Retrieval
Definition: Information retrieval (IR) is the process of finding specific information in a data source to meet a need.
IR examples:
Traditional: looking up the pronunciation and meaning of a word in a dictionary by its spelling; looking up contact information in a phone book.
Electronic information age: looking up sentences that contain a given word in an electronic dictionary.
Information retrieval is the field concerned with the structure, analysis, organization, storage, search and access of information.
1 Universal search: search on the World Wide Web is the most common application of information retrieval.
2 Vertical search: a special form of Web search in which the search is limited to specific topics.
3 Enterprise search: finds the needed information among a large number of distributed computer files in an intranet.
4 Desktop search: the personal edition of enterprise search. The source is the collection of files stored on a personal computer, including documents, source code, mail and web browsing history.
5 P2P search: searches across the nodes of a network without a centralized controller.
Key Issues In Information Retrieval
Relevance is a basic concept in information retrieval (measured by precision and recall).
A retrieval model is a formal representation of the matching process between a query and a document.
Evaluation: the quality of a ranked list of documents depends on how well the list matches the user's requirement.
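As a small worked example of the two relevance measures named above: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name and the document ids below are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query from two id collections."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).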
Search Engine
A search engine is the practical application of information retrieval technology to large-scale text collections.
Important issues in the design of a search engine:
• All the problems of information retrieval:
• effective ranking algorithms
• evaluation
• user interaction
• Large-scale data brings many other problems to a search engine:
• response time
• query throughput
• indexing speed
The most important issue is the performance of the search engine.
Algorithm
preprocessing
Tokenizing is an important step in text preprocessing.
Documents and queries must be transformed into tokens in the same way.
For a given text there may be several possible segmentations, and the choice affects the retrieval results.
Removal of stopwords
Stopwords are the words that appear most frequently in documents and carry little meaning of their own, for example function words such as "the", "of", "to" and "for".
The problem with using a stoplist is that if the user submits a query such as "to be or not to be" or "down under", the search engine is unlikely to return useful results.
Solution: use a very small stoplist in the indexing phase and a larger stoplist in the query phase.
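The two-stoplist idea can be sketched like this. Both stoplists are hypothetical and far shorter than real ones, and the fallback to the raw tokens when every query word is a stopword is an added assumption, not something the slides prescribe.

```python
import re

# Hypothetical stoplists: a very small one for indexing, a larger one for queries.
INDEX_STOPLIST = {"the", "of"}
QUERY_STOPLIST = INDEX_STOPLIST | {"to", "for", "or", "a"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_terms(text):
    """Terms kept at indexing time (small stoplist)."""
    return [t for t in tokenize(text) if t not in INDEX_STOPLIST]


def query_terms(text):
    """Terms kept at query time (larger stoplist)."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in QUERY_STOPLIST]
    # Assumed fallback: if the query consists only of stopwords, keep them all
    # so that the query is not emptied.
    return kept or tokens


print(index_terms("The history of the library"))
print(query_terms("guide to the library"))
print(query_terms("to be or not to be"))
```

Because the index keeps words such as "to" and "or", a query made mostly of stopwords can still be matched against it.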
preprocessing
The task of stemming is to normalize words derived from the same stem.
For example, "fish", "fishes" and "fishing" are grouped into one equivalence class.
Stemming usually yields a slight improvement in ranking. Like stopword removal, it is optional.
Stemming every word may cause search problems.
Information extraction recognizes more complex index terms, but it usually requires more computation.
Named entity recognition can detect names of people, places, organizations, dates and so on.
Index creation
• 1. Index terms
The text conversion module converts documents into index terms or features.
• 2. Document statistics
The document statistics component summarizes and records statistics about the words in documents.
• 3. Weight computing
Term weights reflect the relative importance of terms and are used to compute ranking scores.
• 4. Inversion component
The inversion component is the key component of indexing; it converts the document-term stream into a term-document stream.
• 5. Index allocation
The index allocation component distributes the index across computers or network nodes.
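The inversion step can be illustrated with a minimal in-memory sketch: documents arrive as document-to-text pairs and come out as a term-to-documents mapping with per-document counts. The function name and the sample documents are assumptions; tokenization here is a plain whitespace split.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Turn {doc_id: text} into {term: {doc_id: term_count}}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


# Hypothetical two-document collection.
docs = {
    "d1": "semantic search in a library",
    "d2": "library catalogue search search",
}

index = build_inverted_index(docs)
print(index["search"])
print(index["library"])
```

A query term can then be answered by a single lookup in the index instead of a scan over every document.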
Query processing components
user-interface ranking
evaluation
index
Document
database
Log
database
Ranking
1 TF (term frequency): the number of times a term occurs in a document.
2 DF_t (document frequency): the number of documents in which term t appears.
3 IDF (inverse document frequency): DF is often higher than TF by several orders of magnitude, so using DF directly would swamp the effect of TF. DF therefore has to be mapped to a smaller value. If the corpus contains N documents, the IDF of term t is IDF_t = log(N / DF_t).
TF-IDF
We want the weight of term t to obey the following rules:
(1) If t appears many times in only a few documents, its weight is very high;
(2) If t appears few times in a document, or appears in many documents, its weight is lower than in (1), and its effect on the final relevance score is small;
(3) If t appears in all the documents, its weight is the minimum.
TF and IDF are combined to form the term's final weight: TF-IDF(t, d) = TF(t, d) × IDF_t.
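A minimal sketch of the weighting, assuming raw term counts for TF and the common definition IDF_t = log(N / DF_t); other TF and IDF variants exist. The function name and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {doc_id: {term: tf * idf}} for a {doc_id: text} collection."""
    n_docs = len(documents)
    term_counts = {d: Counter(text.lower().split()) for d, text in documents.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    return {
        d: {t: tf * math.log(n_docs / df[t]) for t, tf in counts.items()}
        for d, counts in term_counts.items()
    }


# Hypothetical collection of three short documents.
docs = {
    "d1": "ontology ontology search",
    "d2": "search engine",
    "d3": "search library",
}

weights = tf_idf(docs)
print(weights["d1"])
```

"search" occurs in every document, so its weight is the minimum (zero), while "ontology" occurs twice in a single document and gets the highest weight, in line with the three rules listed for the term weight.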
Document Similarity
• Point distance
• sim(d1,d2) = |V(d1)-V(d2)|
• This value depends on the length of the documents: in tf-idf, tf varies with the length of the document.
Calculate the cosine similarity between the query vector and each document vector.
Sort the documents by similarity and choose the K most similar documents.
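The ranking step can be sketched as follows, with term-weight vectors stored as sparse dictionaries. The function names, the tie-break by document id and the sample vectors are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    # Highest similarity first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical weighted vectors.
doc_vecs = {
    "d1": {"ontology": 2.0, "search": 1.0},
    "d2": {"search": 1.0, "engine": 1.0},
    "d3": {"library": 1.0},
}

print(top_k({"ontology": 1.0}, doc_vecs, 2))
```

Unlike the point distance, the cosine divides by the vector lengths, so a document is not favored or penalized only because it is long.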
Work
Design a part of the database
We divide the books into many categories. Art is one category of books, and Art also has its own structure.
Realize keyword search
This picture shows part of the structure of the data in our database.
Realize keyword search
Part of the code that implements a simple function of the search system.
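Reference
1 Introduction to Semantic Search Engine
2 MFernandez-semanticIR
3 Semantically enhanced Information Retrieval: an ontology-based approach
4 Research on Concept-Based Information Retrieval Models (基于概念的信息检索模型研究)
5 Information Retrieval and Data Mining Based on Open Web Knowledge (基于开放网络知识的信息检索与数据挖掘)
6 Semantic Search (语义搜索)
THANKS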
BIT
Jena RDF Framework
Name: Zheng Yue (郑越)
Number: 2120151072
CONTENTS
1. Overview
2. Implementation Procedure
3. Reference
Part 1
Overview
What is Jena?
Jena is a Java framework for building Semantic Web applications.
It provides interfaces and classes for creating RDF.
It also provides classes and interfaces for managing OWL-based ontologies.
What is RDFS?
RDFS is the weakest ontology language supported by Jena. RDFS allows the ontologist to build a simple hierarchy of concepts and a hierarchy of properties. Consider the following trivial characterization.
A simple example:
Jena API and ontology languages
Jena aims to provide a consistent programming interface for ontology application development, independent of which ontology language you use in your programs.
The Jena Ontology API is language-neutral: the Java class names are not specific to the underlying language.
To represent the differences between the various representations, each ontology language has a profile.
OWL:
RDFS: null (RDFS does not define object properties)
Ontology Model
The ontology model (OntModel) is an extended version of Jena's Model class. The base Model gives access to the statements in a collection of RDF data.
OntModel extends the base Model by adding support for the kinds of constructs expected in an ontology: classes (in a class hierarchy), properties (in a property hierarchy) and individuals.
All of the state information remains encoded as RDF triples stored in the RDF model.
The ontology API does not change the RDF representation of ontologies; it just adds a set of convenience classes and methods that make it easier to write programs that manipulate the underlying RDF triples.
Part 2
Implementation Process
Create RDF Models: Resources
A simple example: a person resource
The resource "http://…/JohnSmith" represents a person.
"John Smith" is the value of one of its properties.
In Jena, resources are represented by the Resource class and their properties by the Property class. The overall model is expressed with the Model class. A Model object can contain multiple resources.
Create RDF Models: Statements
Each arrow in the model diagram is a statement. A statement is composed of three parts: subject, predicate and object.
Subject: the resource the arrow starts from.
Predicate: the arrow itself, that is, a property of the resource.
Object: what the arrow points to, that is, the value of the property. It can be a literal (text) or another resource.
Output RDF
A model can be written to an output stream with its write method.
• model.write(OutputStream): equivalent to model.write(OutputStream, null); uses the default output format.
• model.write(OutputStream, "RDF/XML-ABBREV"): writes the RDF in the abbreviated XML syntax.
• model.write(OutputStream, "N-TRIPLE"): writes the RDF in N-Triples format.
Input RDF
A model can be populated from an input stream with its read method.
Operation in Model
Model.remove: deletes statements from the model.
Model.add: adds statements to the model.
Model.intersection(Model model): intersection. Creates a new Model containing the statements that appear in both models.
Model.union(Model model): union. Creates a new Model containing all the statements of both models.
Model.difference(Model model): difference. Creates a new Model containing the statements that are in this Model but not in the Model passed as the parameter.
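Because a model is a set of statements, the three model operations behave like set operations. The sketch below is a Python analogy using sets of (subject, predicate, object) tuples; it is not the Jena API, and the example.org URIs and the e-mail value are made up.

```python
FN = "vcard:FN"
EMAIL = "vcard:EMAIL"
JOHN = "http://example.org/JohnSmith"  # hypothetical resource URI

# Two small "models", each a set of (subject, predicate, object) statements.
model_a = {(JOHN, FN, "John Smith")}
model_b = {
    (JOHN, FN, "John Smith"),
    (JOHN, EMAIL, "john@example.org"),
}

union = model_a | model_b         # every statement from both models
intersection = model_a & model_b  # statements present in both models
difference = model_b - model_a    # statements in model_b but not in model_a

print(len(union), intersection, difference)
```

The statement that occurs in both models appears only once in the union, since a set cannot hold the same statement twice.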
union operation in model
Both models contain the same "vcard:FN" property.
After the union operation, the repeated "vcard:FN" value appears only once.
Reasoner
Jena ships with a set of built-in reasoning rules. They mainly cover ontology-level characteristics: checking relationships between classes, and property characteristics such as transitivity, inverse-of and disjointness.
These can be called general rules, but they cannot meet the requirements of information retrieval in some specific domains. In that case, we can write custom rules for the specific requirements. Custom rules supplement the general rules and reflect the needs of the actual application.
Rule: (?x work in ?y), (?y use ?z) -> (?x use ?z)
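To show what a custom rule of this shape does, here is a small forward-chaining sketch in plain Python. It is not Jena's rule engine; the predicate names "workIn" and "use" and the facts are assumptions for illustration.

```python
def apply_rule(triples):
    """Forward-chain the rule (?x workIn ?y), (?y use ?z) -> (?x use ?z)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        inferred = {
            (x, "use", z)
            for (x, p1, y) in facts if p1 == "workIn"
            for (y2, p2, z) in facts if p2 == "use" and y2 == y
        }
        # Repeat until a pass produces no statement that is not already known.
        if not inferred <= facts:
            facts |= inferred
            changed = True
    return facts


# Hypothetical facts: Alice works in a lab, and the lab uses Jena.
facts = {
    ("alice", "workIn", "lab"),
    ("lab", "use", "Jena"),
}

closure = apply_rule(facts)
print(("alice", "use", "Jena") in closure)
```

The derived statement is not stored in the original data; it only becomes visible once the rule has been applied, which is what lets a semantic search return results that a plain keyword match would miss.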
The reasoner's workflow can be summarized as follows:
(1) Create the reasoner from the resources and ontology, which have already been created or read in as RDF triples.
(2) Obtain the inference model object (InfGraph) through the Model API and Ontology API.
(3) Use inference over the concepts to complete the semantic information retrieval, and obtain the desired results through the Ontology API and Model API.
RDQL
Query the vocabulary entry for "domestic dog" in WordNet
RDQL
Query the hypernym of "panther" and "tiger" in WordNet
Persistent ontology to database
A persistent model is created in any database by the following steps:
1) Load the database's JDBC driver
2) Create a database connection
3) Create a ModelMaker for the database
4) Create a model for the ontology
Table name        Content
jena_g1t1_stmt    Ontology data
jena_g1t0_reif    Processed ontology data
jena_sys_stmt     System metadata
jena_graph        Each user's name and unique identifier
jena_long_lit     Long character constants that are not easy to store directly in a statement
jena_long_uri     Long URIs that are not easy to store directly in a statement
jena_prefix       URI prefixes
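Part 3
Reference
Research on an Ontology-Based Semantic Retrieval Model in Digital Libraries (数字图书馆中基于本体的语义检索模型研究.pdf)
Research on an Ontology-Based Semantic Information Retrieval Model (基于Ontology的语义信息检索模型研究.pdf)
Apache Jena Ontology API
Detailed guide to using the API (on csdn.net)
Research on the Jena Inference Mechanism and Its Applications (Jena推理机制及应用研究)
Thank you for watching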
The Challenges of Semantic Search Engine.
Alex N. Mugire. 2820150025
Beijing Institute of Technology
Digital Library 2015
Introduction.
• Today's big problem in the information society is information overload, a problem made worse by the huge size of the World Wide Web (WWW). The Web has given us access to millions of resources, irrespective of their physical location and language.
• As the WWW continues to grow, we expect search engines to have a hard time maintaining the quality of retrieval results. Moreover, they only access static content and ignore the dynamic part of the web (pages generated from databases or updated data).
• I therefore explain some major challenges in this presentation.
• Challenges:
• The Availability of Content.
Currently, there is little Semantic Web content available. Existing web content should be upgraded to Semantic Web content, including static Hypertext Markup Language (HTML) pages, existing XML content, dynamic content, multimedia and web services.
Scalability of Semantic Web Content.
Once we have Semantic Web content, we need to worry about how to manage it in a scalable manner: how to organize it, where to store it and how to find the right content. Effective exploitation of linked data requires infrastructure that scales to a large and ever-growing collection of interlinked data.
Heterogeneity.
Effective exploitation of the data web requires effective mechanisms for
finding the relevant data sources,
integrating data sources, and
combining elements from different data sources.
• Uncertainty.
Incomplete representation of users' needs and of content meaning
• Users cannot completely specify their needs, so required data is missed.
Example: "Find action films directed by some Hong Kong film director and starring Chinese martial actors". A query like this leaves it uncertain which data in the search space is wanted.
• The semantic information in the search space is incomplete.
• Multilinguality
This problem already exists in the current Web and should also be tackled in the Semantic Web. Any Semantic Web approach should provide facilities to access information in several languages, allowing semantic web search content to be created and accessed independently of the native language of content providers and users.
• Language statistics from eMarketer (source: Vilaweb.com): English 68.4%, Japanese 5.9%, German 5.8%, Chinese 3.9%, French 3.0%, Spanish 2.4%, Russian 1.9%, Italian 1.6%, Portuguese 1.4%, Korean 1.3%, Other 4.6%
Language Search Problem
The development of a domain ontology.
Ontologies play a big role in enabling the Semantic Web.
Semantic Web communities develop ontologies in their own domains. This involves many experts in the same domain, each with his or her own social challenges. What does this create?
• It requires a technical team to control, manage and coordinate the collaboration, which is hard to organize. Most ontology development tools today, such as Protégé-2000, are personal ontology editors and lack these functions.
Unnecessary Adverts Challenge.
Some websites show adverts whenever you log in, and you are forced to interact with them. These adverts can disrupt your search query, and installing the apps they advertise can even harm your computer. This gets in the way of the user's search intent and wastes time.
See snapshot below,
Conspiracy and Hoax Websites.
Some people benefit from deceiving others and publish false information to suit their own ends. This leads a user's search astray.
• Measures against this challenge are badly needed to save time and speed up semantic search.
Conclusion,
• I have tried to identify some of the challenges that affect semantic web search today and will continue to do so as search tools advance worldwide.
• These challenges affect daily users of web search and even mislead people in certain cases; web tools should be developed with these challenges in mind.
• Finally, every website should offer at least an English version as a second language, to make information easier to search.
References
• Miriam Fernandez (KMI, Open University, UK), Thanh Tran (Institute AIFB, KIT, DE), Peter Mika (Yahoo Research, Spain)
• Protégé-2000. Protégé Project, http://protege.stanford.edu/index.html. 2004.
• V. Richard Benjamins, Jesús Contreras, Oscar Corcho and Asunción Gómez-Pérez. Intelligent Software Components, S.A., www.isoco.com (Spain)
• Wikipedia, https://en.wikipedia.org/wiki/Semantic_search
• Thanks for your attention!
Questions welcome.

Weitere ähnliche Inhalte

Was ist angesagt?

Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introductionnimmyjans4
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibEl Habib NFAOUI
 
Lectures 1,2,3
Lectures 1,2,3Lectures 1,2,3
Lectures 1,2,3alaa223
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENT
AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENTAUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENT
AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENTijnlc
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
Building a Semantic search Engine in a library

  • 1. Project Name: Building a Semantic Search in a Library
  • 2. Project members: IJAZ UL HAQ (EJO), group leader (ID: 2820150066); CHEN JIAYU; RON WONG SEE; ALEX; ZHENG YUE; GARY; PANG PENGFEI
  • 3. Semantic Web: Experts expect online information to be organized in smarter, more useful ways in the coming years, but there is a dispute about whether the improvements will match the vision behind the semantic web. Sir Tim Berners-Lee's vision is a web that allows software agents to carry out sophisticated tasks for users, making meaningful connections between bits of information so that “computers can perform more of the tedious work involved in finding, combining, and acting upon information on the web.”
  • 4. Project Description: Our project develops semantic search for a library. Semantic web concepts are used to re-engineer the search process of library catalogue search systems and to make it more efficient. An ontology is a higher-level view of a database schema; it helps generate queries that extract data from the database by selecting classes and relationships.
  • 5. Applications of ontology • Searching & browsing • Decision support system • Question answering system • Recommendation • Data integration • Etc.
  • 6. Semantic digital library • The proposed approach manages, organizes and populates an ontology for the document collections of a digital library. • Document metadata and content are inserted into a knowledge base, which allows sophisticated querying and searching. • The first step is to propose an ontology-based information retrieval model built on the classic vector space model; it includes document annotation, instance-based weighting and concept-based ranking.
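The vector space model mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: each document is represented by the ontology concepts it was annotated with (the concept names and the three sample documents are hypothetical), and plain smoothed TF-IDF with cosine similarity stands in for the instance-based weighting and concept-based ranking, whose exact formulas the slides do not give.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (concept -> weight) per document, plus the idf table."""
    n = len(docs)
    # Document frequency: in how many documents each concept occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, so a concept that occurs everywhere still gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return document indices ordered by decreasing similarity to the query."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


# Hypothetical concept annotations for three catalogue entries.
DOCS = [
    ["software_engineering", "testing", "testing"],
    ["database", "sql"],
    ["software_engineering", "requirements"],
]

print(rank(["software_engineering"], DOCS))
```

Documents that share no concept with the query get a score of zero and are left out of the result.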
  • 7. Semantic digital library • General architecture
  • 10. Apache Jena APIs: Jena is a programming toolkit, using the Java programming language. While there are a few command-line tools to help you perform some key tasks using Jena, mostly you use Jena by writing Java programs.
  • 11. Eclipse and MySQL: We used the Apache Jena APIs in Eclipse (Java) to load the ontology file and to search the MySQL database through that ontology. This project is not a complete search engine for the book library; it only demonstrates the difference between simple syntactic search and semantic search. The database holds data about software engineering books, not the books themselves, and every search goes through the ontology. That is what semantic search is: searching documents or the web using a predefined ontology.
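The difference between syntactic and semantic search described above can be illustrated with a small sketch. It does not reproduce the project's Jena and MySQL code, which the slides do not show; the catalogue rows and the subclass hierarchy below are hypothetical stand-ins for the database table and the ontology file.

```python
# Hypothetical catalogue rows (title, topic), standing in for the MySQL table.
BOOKS = [
    ("Clean Code", "Refactoring"),
    ("The Art of Software Testing", "Software Testing"),
    ("Database System Concepts", "Databases"),
]

# Hypothetical ontology fragment: each class maps to its direct superclass.
SUBCLASS_OF = {
    "Refactoring": "Software Engineering",
    "Software Testing": "Software Engineering",
    "Software Engineering": "Computer Science",
    "Databases": "Computer Science",
}


def syntactic_search(keyword):
    """Plain keyword search: return titles that literally contain the keyword."""
    kw = keyword.lower()
    return [title for title, _ in BOOKS if kw in title.lower()]


def is_a(topic, concept):
    """Walk up the subclass chain to test whether topic falls under concept."""
    while topic is not None:
        if topic == concept:
            return True
        topic = SUBCLASS_OF.get(topic)
    return False


def semantic_search(concept):
    """Ontology-guided search: return titles whose topic is a kind of the concept."""
    return [title for title, topic in BOOKS if is_a(topic, concept)]


print(syntactic_search("software engineering"))
print(semantic_search("Software Engineering"))
```

The keyword search finds nothing because no title contains the phrase, while the ontology-guided search returns both books whose topics are subclasses of Software Engineering.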
  • 12. Future work: In the future, the search engine should be smart enough to satisfy every requirement expressed in a user query and to show users what they actually want to see. For our project, it should search through the documents of the books so that it can show the user the books or articles they want. Many semantic search engines already exist; one of them is Swoogle.
  • 18. RDF: The Resource Description Framework (RDF) is a W3C standard for describing web resources, for example the title, author, date, content and copyright information of a web page.
  • 19. RDF is: (1) a framework for describing resources on the Web; (2) a provider of both a data model and a syntax; (3) designed to be read and understood by computers; (4) not intended for display to people; (5) written in XML.
  • 20. RDF uses attributes and attribute values to describe resources. Resource: http://www.w3school.com.cn/rdf. Attributes: author, homepage. Attribute values: David, http://www.w3school.com.cn
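  • 22. RDF statement: Subject, Predicate, Object. Example: "The author of http://www.w3school.com.cn/rdf is David." Here the subject is http://www.w3school.com.cn/rdf, the predicate is author, and the object is David.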
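As a minimal sketch, the statement above can be held as a subject-predicate-object triple and written out in an N-Triples-like line. The predicate URI is a hypothetical placeholder, since the slide names the attribute only as "author", and the object is assumed to be a plain string literal.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the attribute of that resource
    obj: str        # the attribute value (assumed here to be a plain literal)


# Hypothetical predicate URI; the slide only calls the attribute "author".
AUTHOR = "http://example.org/terms#author"

statement = Triple("http://www.w3school.com.cn/rdf", AUTHOR, "David")


def to_ntriples(t):
    """Format a triple with a literal object as one N-Triples-style line."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


print(to_ntriples(statement))
```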
  • 23. RDF instance
      Title             Artist        Country  Company      Price  Year
      Empire Burlesque  Bob Dylan     USA      Columbia     10.90  1985
      Hide your heart   Bonnie Tyler  UK       CBS Records   9.90  1988
  • 24.
      <?xml version="1.0"?>
      <rdf:RDF
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:cd="http://www.recshop.fake/cd#">
        <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
          <cd:artist>Bob Dylan</cd:artist>
          <cd:country>USA</cd:country>
          <cd:company>Columbia</cd:company>
          <cd:price>10.90</cd:price>
          <cd:year>1985</cd:year>
        </rdf:Description>
        <rdf:Description rdf:about="http://www.recshop.fake/cd/Hide your heart">
          <cd:artist>Bonnie Tyler</cd:artist>
          <cd:country>UK</cd:country>
          <cd:company>CBS Records</cd:company>
          <cd:price>9.90</cd:price>
          <cd:year>1988</cd:year>
        </rdf:Description>
      </rdf:RDF>
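To show how a program reads such a file, the sketch below flattens the RDF/XML above into (subject, predicate, object) triples using only Python's standard XML parser. It is an assumption-laden illustration: it handles only flat rdf:Description elements with literal property values, whereas a real application would use an RDF library such as Jena.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Hide your heart">
    <cd:artist>Bonnie Tyler</cd:artist>
    <cd:country>UK</cd:country>
    <cd:company>CBS Records</cd:company>
    <cd:price>9.90</cd:price>
    <cd:year>1988</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, predicate, object) for every property of every rdf:Description."""
    root = ET.fromstring(rdf_xml)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            result.append((subject, predicate, prop.text))
    return result


for triple in triples(RDF_XML):
    print(triple)
```

Each table row becomes five triples that share one subject, which is how the table on the previous slide maps onto the RDF data model.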
  • 25. How to turn sentences into RDF
  • 26. From sentences to RDF: the proposed framework for semantic annotation of Chinese web pages
  • 27.
      <div id="navfirst">
        <ul id="menu">
          <li id="h"><a href="/h.asp" title="HTML tutorial series">HTML tutorial series</a></li>
          <li id="b"><a href="/b.asp" title="Browser scripting tutorials">Browser scripting</a></li>
          <li id="s"><a href="/s.asp" title="Server scripting tutorials">Server scripting</a></li>
          <li id="d"><a href="/d.asp" title="ASP.NET tutorials">ASP.NET tutorials</a></li>
          <li id="x"><a href="/x.asp" title="XML tutorial series">XML tutorial series</a></li>
          <li id="ws"><a href="/ws.asp" title="Web Services tutorial series">Web Services tutorial series</a></li>
          <li id="w"><a href="/w.asp" title="Site-building handbook">Site-building handbook</a></li>
        </ul>
      </div>
    The Web contains a large number of HTML documents. These documents are written for human readers, not for machine processing, so they carry no semantic knowledge that a computer can use.
  • 28. In general, semantic tagging is the process of producing a knowledge representation of a document under the guidance of a domain ontology. It is usually divided into two steps: first, type tagging (TT); second, relation extraction (RE).
  • 31. 1. Data preparation: (1) Domain ontology. The domain ontology is the core data of semantic annotation. It contains the ontology definitions, their attributes, and the instance data stored in the ontology in advance.
  • 32. Ontology contents (built with Protege, automatic program extraction, and manual extraction by domain experts)
      Concept  Object properties  Data type properties  Instance data  Total
      422      87                 147                   2420           3096
  • 33. 1. Data preparation: (2) Domain vocabulary. The domain vocabulary is built by statistical methods. The data source is web pages downloaded by a focused crawler; the pages are split into clauses and processed into a set of natural-language sentences.
  • 34. 2. Identification stage: Explicit attribute type labeling algorithm (EPTT). Input: the word segmentation of a sentence. Output: the set of annotated types and the new word segmentation of the sentence.
  • 35. 2. Identification stage. Begin. Step 1: Apply the identification rules to recognize general-purpose entity types in the sentence and label them with their types. Step 2: Apply the vocabulary list: match the words in the sentence exactly against it and label the corresponding types.
  • 36. 2. Identification stage. Step 3: Apply the N-tuple (N-gram) segmentation technique and approximately match the sentence against the words in the annotation vocabulary list; on success, label the corresponding types. Step 4: Adjust the sentence segmentation result so that no typed word is cut apart; if the segmentation process has split such a word, merge the pieces back into one word.
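The four steps above can be sketched as follows. This is not the authors' EPTT implementation: the rules, the vocabulary list, the similarity threshold, and the English sample tokens (joined with spaces) are all hypothetical, and difflib's similarity ratio stands in for the approximate matching, whose exact definition the slides do not give.

```python
import re
from difflib import SequenceMatcher

# Hypothetical identification rules for general-purpose entity types (Step 1).
RULES = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "Date"),
    (re.compile(r"^\d+(\.\d+)?$"), "Number"),
]

# Hypothetical annotation vocabulary list: word -> type (Steps 2 and 3).
VOCAB = {
    "PetroChina": "Company",
    "Shanghai Stock Exchange": "Organization",
}


def eptt(tokens, max_n=3, threshold=0.85):
    """Return (annotations, new_segmentation) for one segmented sentence."""
    annotations = {}

    # Step 1: rule-based recognition of general-purpose types.
    for tok in tokens:
        for pattern, type_name in RULES:
            if pattern.match(tok):
                annotations[tok] = type_name
                break

    # Step 2: exact match of single tokens against the vocabulary list.
    for tok in tokens:
        if tok in VOCAB:
            annotations[tok] = VOCAB[tok]

    # Steps 3 and 4: approximate N-gram match; a matched N-gram is merged
    # into one token so that a typed word is never left split.
    merged, i = [], 0
    while i < len(tokens):
        match = None
        for n in range(min(max_n, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            for entry, type_name in VOCAB.items():
                ratio = SequenceMatcher(None, candidate.lower(), entry.lower()).ratio()
                if ratio >= threshold:
                    match = (candidate, type_name, n)
                    break
            if match:
                break
        if match:
            candidate, type_name, n = match
            annotations[candidate] = type_name
            merged.append(candidate)
            i += n
        else:
            merged.append(tokens[i])
            i += 1
    return annotations, merged


TOKENS = ["PetroChina", "listed", "on", "Shanghai", "Stock", "Exchange", "2024-01-05"]
annotations, segmentation = eptt(TOKENS)
print(annotations)
print(segmentation)
```

The three tokens that make up the exchange name are merged back into a single typed word, which is the adjustment Step 4 describes.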
  • 38. 3. Assembly phase: Dependency grammar. The words of a sentence stand in direct syntactic relations to one another. Each relation is directed: usually one word governs another, and these governing and governed relations reflect how the words in the sentence relate to each other.
  • 39. 3. Assembly phase: (1) Dependency pair: Relation(Gov, Dep), where Gov is the governing word, Dep is the subordinate word, and Relation is the grammatical relation between them. (2) Dependency tree: the dependency pairs form a dependency tree in which each Gov is a parent node and each Dep is its child node.
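  • 40. 3. Assembly phase: (2) Dependency tree, produced with the Stanford University syntactic parser. Example sentence: 中石油和上海电力今天在上海证券交易所上市,法人代表为张华和李刚。 ("PetroChina and Shanghai Electric Power were listed on the Shanghai Stock Exchange today; the legal representatives are Zhang Hua and Li Gang.")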
  • 41. 3. Assembly phase: (3) Dependency forest. A sentence can be divided into a number of clauses, each of which forms a dependency tree; together these trees form the dependency forest of the whole sentence.
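The step from dependency pairs to trees and a forest can be sketched as below. The pairs are hand-written, hypothetical examples in the Relation(Gov, Dep) form rather than real parser output, and the sketch assumes every word occurs only once in the sentence; a real parser would attach a position index to each word.

```python
from collections import defaultdict

# Hypothetical dependency pairs as (relation, gov, dep); not real parser output.
PAIRS = [
    ("nsubj", "listed", "PetroChina"),
    ("prep_on", "listed", "Exchange"),
    ("nn", "Exchange", "Shanghai"),
    ("nsubj", "are", "representatives"),
    ("nn", "representatives", "legal"),
]


def build_forest(pairs):
    """Group the pairs into trees; return one nested dict per root (one per clause)."""
    children = defaultdict(list)
    dependents = set()
    for relation, gov, dep in pairs:
        children[gov].append((relation, dep))
        dependents.add(dep)

    # A root is a governing word that is never itself a dependent.
    roots = [gov for gov in children if gov not in dependents]

    def subtree(word):
        return {
            "word": word,
            "children": [
                {"relation": rel, **subtree(dep)}
                for rel, dep in children.get(word, [])
            ],
        }

    return [subtree(root) for root in roots]


forest = build_forest(PAIRS)
print([tree["word"] for tree in forest])
```

The two roots correspond to the two clauses of the sample sentence, so the returned list is the dependency forest for the whole sentence.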
  • 42. 3. Assembly phase: Relation extraction algorithm based on the dependency tree (D…); grammatical relationship triple (GRT)
  • 43. Conclusion: 1. The explicit type marking method (EPTT) and the relation extraction method (DTRE) are effective. 2. The domain vocabulary list is still tagged manually; the next step is to use machine learning methods to automate it.
  • 44. Reference: 1. http://www.w3school.com.cn/rdf/rdf_intro.asp 2. Jin Tao, Zuo Wanli, Sun Jigui, and Che Haiyuan. Semantic Annotation of Chinese Web Pages: From Sentences to RDF Representation. College of Computer Science and Technology, Jilin University, Changchun 130012.
  • 45. Information Retrieval Building a semantic search engine in library GROUP ? 陈嘉玉 JiaYU Chen 2120150978
  • 48. Building a semantic search engine in library: semantic search systems. Semantic search systems may combine a range of techniques, from statistics-based IR methods for ranking and database methods for efficient indexing and query processing up to complex reasoning techniques for making inferences.
  • 49. History of Search Engines. The originator of search engines: Archie. The origin of modern search engines: Wanderer, Yahoo. The first search engines in the modern sense: Lycos, Infoseek. The first meta-search engine: Metacrawler. The first search engine to support natural language search: AltaVista. The belated king: Google. The leading Chinese search engine: Baidu. Concentrated, aggressive, plain, humble; starting up (making money) as a fairy tale. Focus on technology and Chinese search.
  • 50. Usage Frequency of Internet Applications in China (application: frequency). Information channels: Internet news 77.3%, search engine 74.8%, writing a blog 19.1%. Communication tools: instant messaging 69.8%, e-mail 55.4%. Entertainment tools: online music 68.5%, online video 61.1%, online games 47.0%. Life assistant: job hunting 15.2%, online education 24.0%, online shopping 25.5%, Internet sales 4.3%, online travel booking 3.9%, Internet banking 20.9%, online stock trading 14.1%.
  • 52. Definition: Information retrieval (IR) is the process of finding specific information in a data source to meet a need. Information retrieval is the field concerned with the structure, analysis, organization, storage, search and access of information. Examples, spanning traditional IR and the electronic information age: looking up the pronunciation and meaning of a word in a dictionary by its spelling; looking up contact information in a phone's contacts; looking up sentences that contain a word in an electronic dictionary.
  • 53. 1. Universal search: search on the World Wide Web is the most common application of information retrieval. 2. Vertical search: a special form of Web search in which the search is restricted to specific topics. 3. Enterprise search: finds the needed information among a large number of computer files distributed across an intranet. 4. Desktop search: the personal edition of enterprise search. The source is the collection of files stored on a personal computer, including documents, source code, mail and web browsing history. 5. P2P search: searches across network nodes without a centralized controller.
  • 54. Key Issues in Information Retrieval. Relevance is a basic concept in information retrieval (precision, recall). A retrieval model is a formal representation of the matching process between a query and a document. Evaluation: the quality of a ranked list of documents depends on how well the list matches the user's requirement.
  • 56. A search engine is the practical application of information retrieval technology to large-scale text collections. Important issues in search engine design include all the problems of information retrieval: • effective ranking algorithms • evaluation • user interaction. Large-scale data brings further problems to a search engine: • response time • query throughput • indexing speed. The most important issue is the performance of the search engine.
  • 58. Preprocessing. Tokenizing is an important step in text preprocessing. Documents and queries must be transformed into tokens in the same way. A given text may have several possible segmentations, which affects the retrieval result. Stopword removal: stopwords are the words that appear most frequently in documents and carry little meaning of their own, for example function words such as “the”, “of”, “to” and “for”. The problem with a stoplist is that if the user submits a query such as “to be or not to be” or “down under”, the search engine is unlikely to return useful results. Solution: use a very small stoplist at indexing time and a larger stoplist at query time.
  • 59. Preprocessing. Stemming normalizes words derived from the same stem, for example placing "fish", "fishes" and "fishing" in one equivalence class. Stemming usually gives a slight improvement in ranking. Like stopword removal, it is optional. Stemming every word may cause search problems. Information extraction recognizes more complex index terms, but it usually requires more computation. Named entity recognition can detect names of people, places, organizations, dates and so on.
  • 60. Index creation  1. Index terms: the text conversion module converts documents into index terms or features.  2. Document statistics: the document statistics component summarizes and records statistics about the words in the documents.  3. Weight computing: the weights of terms reflect their relative importance and are used to compute the ranking score.  4. Inversion component: the inversion component is the key component of indexing; it converts the document-term stream into a term-document stream.  5. Index allocation: the index allocation component distributes the index across computers or nodes of a network.
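The inversion step (document-term stream to term-document stream) can be sketched as below. This is a minimal illustration under stated assumptions: the regular-expression tokenizer, the four-word stoplist and the sample documents are made up for the example and only handle plain English text.

```python
import re
from collections import defaultdict

STOPLIST = {"the", "of", "to", "for"}  # tiny illustrative stoplist

def tokenize(text):
    """Lower-case the text, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPLIST]

def build_inverted_index(docs):
    """Invert a doc_id -> text mapping into term -> sorted list of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "The history of art", 2: "Art for the library", 3: "Library search"}
index = build_inverted_index(docs)
```

Looking up `index["art"]` then returns the ids of the documents that contain the term, without scanning the documents themselves.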
  • 61. Query processing components: user interface, ranking, evaluation, index, document database, log database.
  • 63. 1. TF (term frequency): the number of times a term occurs in a document. 2. DF_t (document frequency): the number of documents in which term t appears. DF is often larger than TF by several orders of magnitude, so it would swamp the effect of TF; DF therefore needs to be mapped to a smaller value. 3. IDF (inverse document frequency): assuming the corpus contains N documents, the IDF of term t is usually defined as idf_t = log(N / DF_t).
  • 64. TF-IDF. We want the weight of term t to obey the following rules: (1) if t appears many times in only a few documents, its weight is very high; (2) if t appears only a few times in a document, or appears in many documents, its weight is lower than in (1), so its effect on the final relevance score is small; (3) if t appears in all the documents, its weight is at its minimum. TF and IDF are combined to form the term's final weight.
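The three rules can be checked with a small sketch. It assumes the common weighting tf × log(N / df), which is one of several TF-IDF variants and not necessarily the exact formula used in the project; the sample documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    `docs` is a list of token lists; weight = tf * log(N / df).
    """
    n = len(docs)
    # df counts each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [
    ["jena", "rdf", "jena", "library"],
    ["rdf", "library"],
    ["search", "library"],
]
w = tf_idf(docs)
```

In the first document, "jena" (frequent, and found in one document only) gets the highest weight, "rdf" a lower one, and "library", which occurs in every document, gets weight zero.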
  • 65. Document Similarity • Point distance: sim(d1, d2) = |V(d1) − V(d2)| • This value depends on the length of the documents, because in tf-idf the tf varies with document length. Instead, compute the cosine similarity between the query vector and each document vector, sort the documents by similarity, and choose the K most similar documents.
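A minimal sketch of cosine ranking over sparse term-weight vectors follows. The vectors and the function names `cosine` and `top_k` are illustrative assumptions; in practice the weights would come from a TF-IDF computation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k(query, doc_vectors, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

doc_vectors = {
    "d1": {"jena": 2.0, "rdf": 1.0},
    "d2": {"jena": 4.0, "rdf": 2.0},  # same direction as d1, twice as long
    "d3": {"search": 1.0},
}
query = {"jena": 1.0}
```

Here d1 and d2 point in the same direction, so their cosine similarity is 1 (up to rounding) even though d2 is twice as long, which is exactly the length independence that the point distance lacks.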
  • 66. Work
  • 67. Designing part of the database. We divide the books into many categories. Art is one category of books, and it has its own structure.
  • 68. Implementing keyword search. This picture shows part of the structure of the data in our database.
  • 69. Implementing keyword search. Part of the code that implements a simple function of the search system.
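  • 71. 1. Introduction to Semantic Search Engine 2. MFernandez-semanticIR 3. Semantically enhanced Information Retrieval: an ontology-based approach 4. 基于概念的信息检索模型研究 (Research on concept-based information retrieval models) 5. 基于开放网络知识的信息检索与数据挖掘 (Information retrieval and data mining based on open Web knowledge) 6. 语义搜索 (Semantic search)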
  • 73. Jena RDF Framework Name: 郑越 Number:2120151072
  • 76. 1 What is Jena? Jena is a Java framework for building Semantic Web applications. It provides interfaces and classes for creating RDF, and also classes and interfaces for managing OWL-based ontologies.
  • 77. 2 What is RDFS? RDFS is the weakest ontology language supported by Jena. RDFS allows the ontologist to build a simple hierarchy of concepts and a hierarchy of properties. Consider the following trivial characterization. A simple example:
  • 78. 3 Jena API and ontology languages. Jena aims to provide a consistent programming interface for ontology application development, independent of which ontology language you are using in your programs. The Jena Ontology API is language-neutral: the Java class names are not specific to the underlying language. To represent the differences between the various representations, each of the ontology languages has a profile. For example, the URI for the object property construct is: OWL: owl:ObjectProperty; RDFS: null (RDFS does not define object properties).
  • 79. 4 Ontology Model. The ontology model is an extended version of Jena's Model class. The base Model allows access to the statements in a collection of RDF data. OntModel extends the base Model by adding support for the kinds of constructs expected in an ontology: classes (in a class hierarchy), properties (in a property hierarchy) and individuals. All of the state information remains encoded as RDF triples stored in the RDF model. The ontology API does not change the RDF representation of ontologies; it just adds a set of convenience classes and methods that make it easier to write programs that manipulate the underlying RDF triples.
  • 80. 5
  • 82. 6 Create RDF Models: Resources. A simple example: people resources. The resource “http://…/JohnSmith” represents a person, and “John Smith” is the value of one of its properties. In Jena, a resource is represented by the Resource class and a property by the Property class; the overall model is represented by the Model class. A Model object can contain multiple resources.
  • 83. 7 Create RDF Models: Statements. Each arrow in a Model is a statement. A statement has three parts: subject, predicate and object. Subject: where the arrow starts in the diagram; it represents the resource. Predicate: the arrow itself; it represents a property of the resource. Object: where the arrow points; it represents the value of the property, which can be text or another resource.
  • 84. 8 Output RDF. We can write a Model to an output stream with the Model's write method. • model.write(OutputStream): equivalent to model.write(OutputStream, null); uses the default output format. • model.write(OutputStream, "RDF/XML-ABBREV"): writes RDF in the abbreviated XML syntax. • model.write(OutputStream, "N-TRIPLE"): writes in N-Triples format.
  • 85. 9 Input RDF. We can read RDF from an input stream into a Model with the Model's read method.
  • 86. 10 Operations on a Model. Model.remove: deletes statements. Model.add: adds statements. Model.intersection(Model model): intersection; creates a new Model containing the statements that are in both Models. Model.union(Model model): union; creates a new Model containing the statements that are in either of the two Models. Model.difference(Model model): difference; creates a new Model containing the statements that are in this Model but not in the Model passed as the parameter.
  • 87. 11 Union operation on models. Both models contain the same “vcard:FN” statement. After the union operation, the repeated “vcard:FN” value appears only once.
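The behavior of these operations can be imitated with plain sets of triples. The sketch below is not Jena code; it only mirrors the semantics in Python, and the URIs and property values are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple; a model is a set of them.
# The URIs and values below are hypothetical.
model_a = {
    ("http://example.org/JohnSmith", "vcard:FN", "John Smith"),
    ("http://example.org/JohnSmith", "vcard:EMAIL", "john@example.org"),
}
model_b = {
    ("http://example.org/JohnSmith", "vcard:FN", "John Smith"),
    ("http://example.org/JohnSmith", "vcard:TEL", "555-0100"),
}

union = model_a | model_b          # statements in either model
intersection = model_a & model_b   # statements in both models
difference = model_a - model_b     # statements in model_a but not in model_b
```

Because a model behaves like a set of statements, the shared vcard:FN statement occurs only once in `union`, which then holds three statements rather than four.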
  • 88. 12 Reasoner. Jena includes a set of built-in reasoning rules, mainly for checking relationships between concepts and classes and for property characteristics such as transitivity, inverse and disjointness. These can be called general rules, but they cannot meet the requirements of information retrieval in some specific domains. In that case we can define custom rules. Custom rules supplement the general rules and address the needs of the particular application. Rule: (?x work in ?y), (?y use ?z) -> (?x use ?z)
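To show what such a custom rule does, here is a small forward-chaining sketch over a set of triples. It is not Jena's rule engine; the predicate name `workIn` (standing in for "work in") and the sample facts are assumptions made for the example.

```python
def apply_rule(triples):
    """Apply (?x workIn ?y), (?y use ?z) -> (?x use ?z) until nothing new is derived."""
    facts = set(triples)
    while True:
        derived = {
            (x, "use", z)
            for (x, p1, y) in facts if p1 == "workIn"
            for (y2, p2, z) in facts if p2 == "use" and y2 == y
        }
        if derived <= facts:      # fixed point: every derived fact is already known
            return facts
        facts |= derived

# Hypothetical facts.
facts = apply_rule({
    ("alice", "workIn", "lab"),
    ("lab", "use", "jena"),
})
```

After the rule fires, `facts` also contains `("alice", "use", "jena")`, a statement that was never asserted directly.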
  • 89. 13 The reasoner works as follows: (1) Create the reasoner from the resources and the ontology that have been created or read in as RDF triples. (2) Obtain the inference model object (InfGraph) through the Model API and the Ontology API. (3) By reasoning over the concepts, carry out semantics-based information retrieval and get the desired results using the Ontology API and the Model API.
  • 90. 14 RDQL: query the WordNet entry for “domestic dog”.
  • 91. 15 RDQL: query the hypernyms (“upper words”) of “panther” and “tiger” in WordNet.
  • 92. 16 Persisting the ontology to a database. A persistent model for any database is created in the following steps: 1) load the database JDBC driver; 2) create a database connection; 3) create a ModelMaker for the database; 4) create a model for the ontology.
  • 93. 17 Table name: content. jena_g1t1_stmt: ontology data. jena_g1t0_reif: processed ontology data. jena_sys_stmt: system metadata. jena_graph: each user's name and unique identifier. jena_long_lit: long character constants that are not easy to store directly in a statement. jena_long_uri: long URIs that are not easy to store directly in a statement. jena_prefix: URI prefixes.
  • 94. 16
  • 95. 19
  • 96. 20
  • 100. The Challenges of Semantic Search Engine. Alex N. Mugire. 2820150025 Beijing Institute of Technology Digital Library 2015
  • 101. Introduction.  Today's big problem in the information society is information overload, a problem made worse by the huge size of the World Wide Web (WWW). The Web has given us access to millions of resources, irrespective of their physical location and language.  With the expected continuous growth of the WWW, we expect search engines to have a hard time maintaining the quality of retrieval results. Moreover, they only access static content and ignore the dynamic part of the web (pages generated from databases or updated data).  I therefore explain some major challenges in this presentation.
  • 102.  Challenges:  The availability of content. Currently, little Semantic Web content is available. Existing web content should be upgraded to Semantic Web content, including static HyperText Markup Language (HTML) pages, existing XML content, dynamic content, multimedia and web services.
  • 103. Scalability of Semantic Web Content. Once we have the Semantic Web content, we need to worry about how to manage it in a scalable manner, that is how to organize it, where to store it and how to find the right content. Effective exploitation of the linked data requires infrastructure that scales to a large and ever growing collection of interlinked data.
  • 104. Heterogeneity. Effective exploitation of the data web requires an effective mechanism for finding the relevant data sources, integrating data sources, and combining elements from different data sources.
  • 105.  Uncertainty. Incomplete representation of users' needs and of content meaning.  Users cannot completely specify their need, which results in required data being missed. Example: “Find action films directed by some Hong Kong film director and starring Chinese martial actors”. This creates uncertainty about the data in the search space.  The semantic information in the search space is incomplete.
  • 106.  Multilinguality. This problem already exists in the current Web and should also be tackled in the Semantic Web. Any Semantic Web approach should provide facilities to access information in several languages, allowing semantic web search content to be created and accessed independently of the native language of content providers and users.  Language statistics from eMarketer (source: Vilaweb.com): English 68.4%, Japanese 5.9%, German 5.8%, Chinese 3.9%, French 3.0%, Spanish 2.4%, Russian 1.9%, Italian 1.6%, Portuguese 1.4%, Korean 1.3%, Other 4.6%
  • 108. The development of a domain ontology. Ontologies play a big role in enabling the Semantic Web. Semantic Web communities develop ontologies for their domains; this involves many experts in the same domain, each with his or her own social challenges. What does this create?  It requires a technical team to control, manage, coordinate and support the collaboration, and organizing such development is itself a challenge. Most ontology development tools today, such as Protégé-2000, are personal ontology editors and lack these functions.
  • 109. Unnecessary adverts challenge. This has become a common problem: some websites force you to interact with unnecessary adverts whenever you log in, which can disrupt your search query, and can even crash your computer if you install some of the advertised apps. This gets in the way of searching for what you intend and wastes a lot of time. The snapshot below illustrates the issue.
  • 110.
  • 111. Conspiracy and hoax websites. Some people benefit from deceiving others and publish false information to suit their own aims. This sends a user's search in the wrong direction.  Measures against this challenge are badly needed, to save time and speed up semantic search.
  • 112. Conclusion.  I have tried to identify some of the challenges of semantic web search that affect us today and will continue to do so as search tools advance worldwide.  These challenges affect daily users of web search and in some cases even mislead people; web tools should be developed with these challenges in mind.  Finally, every website should offer at least an English version as a second language, to make searching for information easier.
  • 113. References  Miriam Fernandez (KMI, Open University, UK), Thanh Tran (Institute AIFB, KIT, DE), Peter Mika (Yahoo Research, Spain).  Protégé-2000. Protégé Project, http://protege.stanford.edu/index.html. 2004.  V. Richard Benjamins, Jesús Contreras, Oscar Corcho and Asunción Gómez-Pérez. Intelligent Software Components, S.A., www.isoco.com (Spain).  Wikipedia, https://en.wikipedia.org/wiki/Semantic_search
  • 114.  Thanks for your attention! Questions welcome.

Editor's notes

  1. Resource Description Framework (RDF) is a W3C standard for describing web resources, such as the title, author, modification date, content and copyright information of a web page.
  2. RDF stands for Resource Description Framework. RDF is a framework for describing resources on the Web. RDF provides a model for data, and a syntax, so that independent parties can exchange and use it. RDF is designed to be read and understood by computers. RDF is not designed for being displayed to people. RDF is written in XML. RDF is part of the W3C Semantic Web Activity. RDF is a W3C Recommendation.
  3. RDF rules: RDF uses Web identifiers (URIs) to identify resources, and describes resources with properties and property values. A resource is anything that can have a URI, such as "http://www.w3school.com.cn/rdf". A property is a resource that has a name, such as "author" or "homepage". A property value is the value of a property, such as "David" or "http://www.w3school.com.cn".
  4. <?xml version="1.0"?> <RDF> <Description about="http://www.w3school.com.cn/RDF"> <author>David</author> <homepage>http://www.w3school.com.cn</homepage> </Description> </RDF>
  5. RDF statements. The combination of a resource, a property and a property value forms a statement (known as the subject, predicate and object of the statement). Here is a concrete example to make this clearer. Statement: "The author of http://www.w3school.com.cn/rdf is David." The subject of the statement is: http://www.w3school.com.cn/rdf. The predicate is: author. The object is: David.
  6. These are a few rows of a CD list (columns: Title, Artist, Country, Company, Price, Year): Empire Burlesque, Bob Dylan, USA, Columbia, 10.90, 1985; Hide your heart, Bonnie Tyler, UK, CBS Records, 9.90, 1988. These are a few lines of an RDF document: <?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cd="http://www.recshop.fake/cd#"> <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque"> <cd:artist>Bob Dylan</cd:artist> <cd:country>USA</cd:country> <cd:company>Columbia</cd:company> <cd:price>10.90</cd:price> <cd:year>1985</cd:year> </rdf:Description> <rdf:Description rdf:about="http://www.recshop.fake/cd/Hide your heart"> <cd:artist>Bonnie Tyler</cd:artist> <cd:country>UK</cd:country> <cd:company>CBS Records</cd:company> <cd:price>9.90</cd:price> <cd:year>1988</cd:year> </rdf:Description> . . . </rdf:RDF>
  7. The first line of this RDF document is the XML declaration. The XML declaration is followed by the root element of the RDF document: <rdf:RDF>. The xmlns:rdf namespace specifies that elements with the prefix rdf come from the namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#". The xmlns:cd namespace specifies that elements with the prefix cd come from the namespace "http://www.recshop.fake/cd#". The <rdf:Description> element contains the description of the resource identified by the rdf:about attribute. The elements <cd:artist>, <cd:country>, <cd:company> and so on are properties of this resource.
  8. The Web holds a large number of HTML documents. These documents are meant for human reading, not machine processing, and carry no semantic knowledge that a computer can use. They need semantic annotation so that the knowledge in them is normalized and can be processed by machines.
  9. Broadly speaking, semantic annotation is the process of adding a normalized knowledge representation to a document under the guidance of a domain ontology, that is, describing the textual knowledge in the document in RDF. The process usually has two steps. 1) Mark the words in the document that correspond to concepts in the ontology as instances of those concepts, usually represented as RDF resources. 2) Find the relations among these instances that correspond to properties in the ontology; two related instances and the relation between them are usually written as (R1, P, R2), i.e. one RDF statement. The first step is type tagging (TT); the second step is relation extraction (RE).
  10. Data preparation: domain ontology and domain vocabulary list.
  11. The domain ontology is the core data of semantic annotation: the meta-information defined in the ontology (concepts, properties and so on) and the instance data stored in the ontology in advance. In the identification stage it provides the type annotator with type annotation information and annotation lists. In the assembly phase it is used to verify the validity of knowledge triples.
  12. My work is the semantic annotation of Chinese domain web pages, building a domain ontology that matches the reality of Chinese finance. Because time was short, only the definitions of stock-related concepts, properties and similar information were completed. Built with the Protégé tool, the ontology includes 422 concepts, 87 objects, data type…
  13. The domain vocabulary list is built with statistical methods. Its data comes from the set of web pages downloaded by a focused crawler; after sentence splitting, the data is processed into a set of natural-language text sentences.
  14. Begin. Step 1: Apply recognition rules to identify general-purpose entities in the sentence, such as numbers, money and dates, and label their types. Step 2: Apply the annotation vocabulary list: match the words in the sentence exactly against the list and label the corresponding types.
  15. Step 3: Apply N-gram segmentation to match words in the sentence approximately against the annotation vocabulary list; when a match succeeds, label the corresponding type. Step 4: Adjust the word segmentation result so that words already labeled with a type are not split; if the segmenter has split such a word, merge the pieces back into one word. Convert the numbers, dates, money and similar expressions in the sentence into canonical forms that match the numbers and dates in the ontology, and build a lookup table between the original forms and the new forms.
  16. The dependency pairs of a sentence are connected with Gov as the parent node and Dep as the child node, forming a dependency tree that describes the dependency relations of the sentence.