SlideShare ist ein Scribd-Unternehmen logo
1 von 6
Downloaden Sie, um offline zu lesen

Abstract— Information on the web is tremendously increasing in
recent years with the faster rate. This massive or voluminous data
has driven intricate problems for information retrieval and
knowledge management. As the data resides in a web with several
forms, the Knowledge management in the web is a challenging
task. Here the novel 'Semantic Web' concept may be used for
understanding the web contents by the machine to offer
intelligent services in an efficient way with a meaningful
knowledge representation. The data retrieval in the traditional
web source is focused on 'page ranking' techniques, whereas in
the semantic web the data retrieval processes are based on the
‘concept based learning'. The proposed work is aimed at the
development of a new framework for automatic generation of
ontology and RDF to some real time Web data, extracted from
multiple repositories by tracing their URI’s and Text Documents.
Improved inverted indexing technique is applied for ontology
generation and turtle notation is used for RDF notation. A
program is written for validating the extracted data from
multiple repositories by removing unwanted data and considering
only the document section of the web page.
Index Terms— Semantic Web, Resource description
framework, Ontology, Improved inverted indexing technique,
Knowledge management.
I. INTRODUCTION
World Wide Web (WWW) is considered as a global
information repository that identifies documents and other web
resources by Uniform Resource Locators, interlinked by
hypertext links. Search engines are used to retrieve the
information from the web. Data overburden is the most
concerning issue in these days for the existing system.
Evolution of web includes the web versions of web 1.0, 2.0
etc. In this series, the web version 3.0 is referred to as semantic
web [1] is evolved as a knowledge management support across
the globe. Search engines should be enriched with semantic
web capabilities that analyze webpage content and provide
more relevant results corresponding to the user query.
Semantic web standards include resource description
framework (RDF), web ontology, RDF Schema and rule
interchange format (RIF) for handling data. Resource
description framework (RDF) provides a conceptual
description of information for representing the web resources
like Turtle syntax, N-Triples etc. Resource Description
Framework (RDF) describes data on the Web in graph form
[2]. Ontologies consist of the finite set of terms, relationships,
constraints and axioms [3]. Ontologies have proven to be
useful for effective knowledge modeling and information
retrieval. The remaining paper is arranged as follows: In
Section 2 the related work is presented. The proposed work
and its methodology are discussed in Section 3 & 4. The
results are presented in Section 5. Conclusions are given in
Section 6.
II. RELATED WORK
M.S.P.Babu et.al [4] provided the overview of some of the
semantic search engines that yield unique search experience
for users. Wilkinson et.al [5] proposed an information retrieval
system using document structure. Amel Grissa Touzi et.al [6]
suggested the Fuzzy Ontology of Data mining (FODM) for
processing automated generation of ontologies in the domain
of data mining. Amira Aloui et.al [7] implemented a plugin
named “FO-FQ Tab plug-in”, which can be integrated with
protégé editor for building the fuzzy ontologies from large
databases. To overcome the drawbacks of the existing system
for accessing the related science information, M.S.P.Babu et.al
[8] proposed a new framework for automatic generation of
ontology and RDF for real-time web data. Tahani Alsubait
et.al [9] developed the e-learning suite, with the set of
questions designed using ontological representation.
A.H.M.Rupasingha et.al [10] suggested that the performance
of the ontology generation is always dependent on the
specificity of the terms. Seongwook Youn et.al [11] discussed
pros and cons of tools like protégé 2000, OilEd, Apollo,
OntoLingua, Onto Edit, webODE, KAON, ICOM, DEO,
webOnto that is used for ontology creation.Kgotatso Desmond
Mogotlane et.al[12] presented a comparative study of plugins
of protege tool like DB2OWL and Data Master.
Sudeepthi Govathoti1, M.S. Prasasd Babu2
Research Scholar, Andhra University, Visakhapatnam &Assistant Professor, Anurag Group of Institutions, Hyderabad,
Professor, Department of CS&SE, Andhra University, Visakhapatnam,
sudeepthicse@cvsr.ac.in1
, profmspbabu@gmail.com2
An Implementation of a New Framework for
Automatic Generation of Ontology and RDF to
Real Time Web and Journal Data
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 1, January 2018
89 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
III. PROPOSED WORK
Semantic web capabilities like RDF & ontology are applied
to enrich the knowledge. The proposed work is an
implementation of the framework proposed by the authors [8].
The framework is designed with reference to the semantic web
Stack. It is carried out in two phases, namely Data extraction
phase and Data representation phase. Web scraping is
performed using HTML parsing technique in data extraction
phase by giving sample search query as an input to multiple
repositories. DOM parsing and HTML parsing techniques are
applied to validate the data retrieved from multiple repositories
by considering only the document section of the webpage.
Extensible markup language (XML) is the base for the
semantic web representations; the validated information is
converted into semi-structured notations by using XSD
declaration from DOM tree and passed as an input for the next
layers of the proposed framework. XML notation is given as
an input to data representation phase. RDF notation is
generated and represented in graphical form using Graphviz
tool. A textual representation of RDF graph is provided using
Turtle, the Terse RDF Triple Language. Improved Inverted
Indexing technique is applied for ontological representation of
words by excluding the stop words.
Figure 1: Proposed framework
IV METHODOLOGY
Implementation of the framework proposed in Section III will
be carried out in two phases namely data extraction and data
representation phases. The details are given below.
Phase 1: Data extraction
Data extraction phase performs web scraping from multiple
repositories and stores the scraped data into the database. The
Data extraction phase is sub- divided into three steps namely
web scraping, data validation, XML Conversion. The scraped
data is further validated by removing the unwanted data in the
considering document section of web page. The data stored in
table format in the database, after the data validation process,
is converted into the Semi-Structured Notations (i.e. XML
Notations) and passed as an input to the data representation
phase.
Step 1: Web Scraping
Web scraping, also be referred as screen scraping or Web
harvesting, is used to fetch and extract the data from a web
document using HTML parsing techniques. Here Web pages
are crawled and the content of the Web page is extracted. The
data in the Web page includes three sections namely: Web
page statistics bar, document section and descriptive section.
The three items are stored in a database as three different
attributes in a database table. HTML parsing technique is used
for scraping data from the web documents is shown in Fig 2:
Figure 2: Content of the Web page
Step 2: Data Validation
In the Data Validation Step, the data collected from step 1 is
validated using HTML and DOM parsing techniques. Here
unwanted data is removed and the necessary portion of URLs
is retained. In this step web status bar and descriptive sections
are removed in the database table. The validated data is stored
in a database. Document section displays the results in the
form of page title, URL, Snippet (description) for the given
search query.
Figure 3: Content of the Web page after Data validation
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 1, January 2018
90 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
Descriptive section provides the Wikipedia information
about the input query. The Data validation process is carried
out by considering only the document section of the web page
as shown in Fig: 3.
Step.3: XML Conversion
In XML Conversion Step the data, validated in step 2, is converted
into a DOM tree using XML Schema Definition (XSD). The
conversion is performed on data that is validated and stored in
database by considering each individual field/ attribute into
namespace convention. The XSD declaration of DOM tree
has hierarchical structures which have root node, representing
the search key word and three child nodes, representing Title,
URL and Description respectively. The XSD declaration of
the DOM tree with an example is shown in Fig 4:
Figure 4: XSD Declaration of DOM tree
Phase2: Data representation
Data extracted from steps 1,2 and 3 is maintained in an XML
format and is given as an input to data representation phase. In
Semantic Web architecture, the major source of data
representation imposes RDF-ization and Ontology generation.
Hence the data representation phase is sub- divided into two
steps namely RDF-ization and ontology generation, which are
explained in detail in step 4 & 5 respectively.
Step 4: RDF-ization
The Resource Description Framework (RDF) is the basic
building block in semantic web, promoting conceptual
modeling of web data [13]. The RDF-ization process is carried
out using Turtle notation and Graphviz tool. In this step the
XML notation data stored in extraction phase is given as an
input to RDF-ization. The RDF notation is visualized in the
form of RDF graph using Graphviz tool. Decomposition of
tuple creates a new blank node corresponding to the row and a
new triple set is obtained. Each tuple in a relational database is
decomposed as RDF triples, namely: the title is taken as
subject, URL is considered as predicate and description is
taken as object. A node can be a URI reference, literal or the
blank node. The graph in Fig: 5 is an example of RDF-ization
process of a semantic net.
Figure 5: RDF Triple
The triple is represented as a <subject, predicate, object>
format by exploring the relationship among the nodes [14].
The XML conversion carried out in step 3 is represented in
RDF syntax using Turtle notation from the convention
specified in “http://www.w3.org/1999/02/22-rdf-syntax-ns#“
as shown in Fig 6:
Fig-6: RDF-ization
The <rdf: Description> element provides the description of
resource identified by <rdf: about> attribute. The tags <rbss:
title>, <rbss: keyword>, <rbss: URL> are the properties of the
resource identified. The RDF represented in turtle notation is
visualized in a graphical format using Graphviz tool .It is open
source software that is used for generating graphs.
Step 5: Ontology Generation
Ontology is defined as a formal specification of
conceptualization of the domain of Interest. In ontology
generation step, the RDF notation obtained from step 4 is used
to create a vocabulary of words using improved inverted
indexing algorithm. Improved Inverted Indexing algorithm is
employed on real time web data collected from multiple
repositories and text documents. The words from the
description tags are extracted by excluding the stop words and
frequency count/Term frequency (TF) of each word is
maintained. The illustration of improved inverted indexing
algorithm is presented as follows:
Algorithm: Improved inverted indexing
Input: Database D= {T1, T2…Tn}, Storage Database
Output: Attributes {A1, A2…An}, where Ai, for i=1,2…n
are representing ontology vocabulary.
Parameters: Swrdk= Array of Stop Words
attsLq= Snippet attribute
Wordsk= Words stemmed from snippet attribute
attsLf= Word frequencies after stemming
attsL= ontology along with the frequencies count.
1. Swrdk={};
2. for i=0;i<=i+,i≤ D do
3. attsLq=Query Coverage(D,i);
4. Wordsk=Words Separate(attsLq,Swrdk);
5. attsLf= Words Usage frequency(D,attsLq,WordsK);
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 1, January 2018
91 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
6. attsL=attsLqU attsLf
7. f=highest_freq(attsL)
8. if (f<freq(attsL)) then
9. sort(Wordsk,freq(attsL))
10. end if
11.end for
12.return (Wordsk,freq(attsL)
V. RESULTS AND DISCUSSION
The Semantic web stack proposed by Tim Berners Lee [15] is
implemented by using the frame work proposed by the authors
in section III. It is implemented in PHP version: 5 (Open
Source scripting language) and MySQL version: 5 (open-
source relational database management system) environments.
It is tested on an input with test dataset comprising of sample
search keywords. Response time is the amount of time that
elapses from the receipt of the query until the results are
displayed to user. Response time can be measured on server
side or client side as shown in Fig 7.
Figure 7: Response Time
Throughput is defined as number of queries executed per
second (qps). Throughput and response time are observed for
the set of retrieval operations with respect to the page load
times. The performance of framework implemented with
respect to throughput is shown in Fig 8.
Figure 8: Throughput
.
A sample search query is given as an input and web scraping
results are shown in Fig 9:
Figure 9: Web scraping results
Web scraping performance is evaluated by considering the
following parameters like database size and count of URL’s
extracted which is shown in Fig 10:
Figure 10: Web scraping analysis
Scraped data from multiple repositories is given as an input to
data validation step. The validated data is obtained as an
output to data validation process by applying HTML and
DOM parsing technique. Data validation considers only the
document section of a web page. Data validation results are
shown in Fig 11:
Figure 11: Data validation results
The performance of data validation processing with wanted
and unwanted data from the scraped data by considering the
following parameters like database size and count of URL’s is
shown in Fig 12:
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 1, January 2018
92 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
Figure12: Data Validation
The validated data stored in database is converted into the
XML notations by applying XSD declaration as shown in
Fig 13:
Figure13: XML Conversion
Resource Description Framework (RDF) is a recommended
standard of World Wide Web Consortium (W3C) [16]. RDF
representation of data in turtle form is shown in Fig 14:
Figure14: RDF-ization results Turtle form
RDF generation for sample relation named “testrdf” which has
an attributes as <name, description, freq> is considered. The
“testrdf” represents the relation name is considered as a class
in an RDF graph and has set of three nodes that are connecting
the testrdf in depth wise manner represents the tuple of a
relation as shown in Fig 15:
Figure15: RDF Graph generation
Ontology generation for the data obtained from multiple
repositories as well as the text file. Improved inverted
indexing technique is applied for extracting the words with
their frequencies discarding the stop words, in the order of
highest precedence. The result of ontology generation for real
time web data with frequency is shown in Fig 16:
Figure16: Ontology Generation for Real time web data
The highest frequency word is considered as a frequent search
term for the purpose of rule framing using description logic.
The rule mapping is done for the efficient retrieval operation
which will be future work. The result of ontology generation
for text document is shown in Fig 17.
Figure17: Ontology Generation for text document
VI. CONCLUSION
The evolution of web has taken many forms namely web 1.0,
web 2.0, web 3.0 , web 4.0 which lead to high-end
information retrieval systems using semantic web. The existing
traditional system collects the data from search engines is
exhibiting average performance in retrieval. Implementation of
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 1, January 2018
93 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
proposed framework for automatic generation of ontology’s
and RDF improves the performance of traditional search
engines by incorporating semantic capabilities. It includes the
application of HTML parsing technique, DOM parsing
techniques and Turtle notation of graphviz tool. The algorithm
improves information retrieval in Semantic Web and Expert
Systems. The future work includes applying efficient
cryptography for securing database and rule framing for the
design of an expert system.
REFERENCES
[1] Sareh Aghaei, Mohammad Ali Nematbakhsh and Hadi Khosravi
Farsani, “Evolution Of The World Wide Web: From Web 1.0 To
Web 4.0”, International Journal of Web & Semantic Technology
(IJWesT) Vol.3, No.1, January 2012.
[2] Abdeslem DENNAI, Sidi Mohammed BENSLIMANE,"Semantic
Indexing of Web Documents Based on Domain Ontology", I.J.
Information Technology and Computer Science, 2015, 02, 1-11.
[3] Seema Redekar, Vishal Chekkala, Siddhapa Gouda, Swapnil
Yalgude,"Web Search Engine Using Ontology Learning",International
Journal of Innovative Research in Computer and Communication
Engineering,Vol. 5, Issue 3, March 2017.
[4] G Sudeepthi, G Anuradha, M Surendra Prasad Babu,” A survey on
semantic web search engine” International Journal of Computer
Science Issues, 2012/3 IJCSI, Volume 9 Issue 2 Pages 241-245.
[5] R. Wilkinson, ‟Effective retrieval of structured documents‟. (S.-V. New
York, Ed.) Pages 311 – 317, 1994.
[6] Amel Grissa Touzi, Hela Ben Massoud and Alaya Ayadi,” Automatic
Ontology Generation for Data Mining Using FCA and Clustering”,
arxiv.org, no. 1311.1764.
[7] Amira Aloui, ENIT, Tunis, Tunisia,”A Fuzzy Ontology-Based Platform
for Flexible Querying”, International Journal of Service Science,
Management, Engineering, and Technology, Vol.6, Issue 3, July-
September 2015, pp 12-26.
[8] Prof. M Surendra Prasad Babu, Sudeepthi Govathoti,”A Semantic Model
for Building Integrated Ontology Databases”, 7th
IEEE International
conference on software engineering and service science.
[9] Tahani Alsubait, Bijan Parsia, Ulrike Sattler,”Ontology-Based Multiple
Choice Question Generation”, Kunstl Intell, Vol. 30, 2016, pp 183-188.
[10] Rupasingha A. H. M. Rupasingha, “Improving Web Service Clustering
through a Novel Ontology Generation Method by Domain Specificity “,
IEEE 24th International Conference on Web Services, 2017.
[11] Seongwook Youn, Anchit Arora, Preetham Chandrasekhar, Paavany
Jayanty, Ashish Mestry and Shikha Sethi,”Survey about Ontology
Development Tools for Ontology-based Knowledge Management”.
[12] Kgotatso Desmond Mogotlane, Jean Vincent Fonou-
Dombeu,”Automatic Conversion of Relational Databases into
Ontologies”.
[13] FatemeAbiri, Mohsen Kahani, FataneZarinkalam"An Entity Based RDF
Indexing Schema Using Hadoop and HBase", 2011.
[14] Faizan Shaikh, Usman A. Siddiqui, IramShahzadi, SyedUami, Zubair A.
Shaik "SWISE: Semantic Web based Intelligent Search Engine", 2010.
[15] Gopal Pandey,” The Semantic Web: An Introduction and Issues”,
International Journal of Engineering Research and Applications, Vol. 2,
Issue 1,Jan-Feb 2012, pp.780-786.
[16] Resource Description Framework (RDF). Model and Syntax
Specification. Technical
report, W3C.
Prof. M.S.Prasad Babu was born on 12 -
08-1956 in Andhra Pradesh, India. He
obtained his bachelors to doctoral degrees
from Andhra University. He has 39 years of
teaching and research experience. He guided
12 PhD's and 210 PG students for their
thesis. He was the president of ICT section
of ISCA in 2006-07. He attended about 50
National and International Conferences in
India and abroad and presented keynotes. He contributed about
250 papers National and International journals and
conferences. He developed 10 international reputed Web
portals. He won ISCA Young Scientist Award in 1986, State
Best teacher award for engineering in 2015 and Dr. Sarvepalli
Radhakrishnan Best Academician Award of Andhra
University for 2014. He was the Conference Steering Chair
for eight IEEE ICSESS of Beijing, China from 2010 to 2017.
Prof Babu is presently working as Vice Principal of AU Engg
College, Chairman, Faculty of Engineering of Andhra
University and senior Professor in Computer Science &
Systems Engineering discipline.
G. SUDEEPTHI was born in,
Rajahmundry, Andhra Pradesh, India in
1983. She received B. TECH from National
Institute of Technology Warangal, India in
2005 and M. Tech in Computer Science &
Engineering from KLC Vijayawada, India
in 2007. She is working as part time
research scholar in the Dept. of CS & SE, Andhra University
and as a Assistant Professor at Anurag Group of Institutions
Hyderabad, India. She has got more than ten years of teaching
experience. She Qualified FET-2011 Conducted by JNTUH.
She contributed seven research papers in International journals
and Presented one research paper at International conference in
Beijing, China and another in Warangal, India.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 1, January 2018
94 https://sites.google.com/site/ijcsis/
ISSN 1947-5500

Weitere ähnliche Inhalte

Was ist angesagt?

Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...dannyijwest
 
XML Retrieval: A Survey
XML Retrieval: A SurveyXML Retrieval: A Survey
XML Retrieval: A Surveyijceronline
 
Linked Data for Czech Legislation
Linked Data for Czech LegislationLinked Data for Czech Legislation
Linked Data for Czech LegislationMartin Necasky
 
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...IJwest
 
Unit 4 rdbms study_material
Unit 4  rdbms study_materialUnit 4  rdbms study_material
Unit 4 rdbms study_materialgayaramesh
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data HypercubesDave Reynolds
 
Paper id 25201463
Paper id 25201463Paper id 25201463
Paper id 25201463IJRAT
 
Information Intermediaries
Information IntermediariesInformation Intermediaries
Information IntermediariesDave Reynolds
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data toIJwest
 
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...eswcsummerschool
 
Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)Joseba Abaitua
 
Unit 3 rdbms study_materials-converted
Unit 3  rdbms study_materials-convertedUnit 3  rdbms study_materials-converted
Unit 3 rdbms study_materials-convertedgayaramesh
 
Short Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDFShort Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDFAkram Abbasi
 

Was ist angesagt? (19)

Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
 
XML Retrieval: A Survey
XML Retrieval: A SurveyXML Retrieval: A Survey
XML Retrieval: A Survey
 
Metadata mapping
Metadata mappingMetadata mapping
Metadata mapping
 
Mods0210
Mods0210Mods0210
Mods0210
 
Linked Data for Czech Legislation
Linked Data for Czech LegislationLinked Data for Czech Legislation
Linked Data for Czech Legislation
 
Metadata Mapping & Crosswalks
Metadata Mapping & CrosswalksMetadata Mapping & Crosswalks
Metadata Mapping & Crosswalks
 
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
 
Unit 4 rdbms study_material
Unit 4  rdbms study_materialUnit 4  rdbms study_material
Unit 4 rdbms study_material
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data Hypercubes
 
Paper id 25201463
Paper id 25201463Paper id 25201463
Paper id 25201463
 
Information Intermediaries
Information IntermediariesInformation Intermediaries
Information Intermediaries
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
 
Spotlight
SpotlightSpotlight
Spotlight
 
Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)
 
Phd presentation
Phd presentationPhd presentation
Phd presentation
 
Unit 3 rdbms study_materials-converted
Unit 3  rdbms study_materials-convertedUnit 3  rdbms study_materials-converted
Unit 3 rdbms study_materials-converted
 
Metadata crosswalks
Metadata crosswalksMetadata crosswalks
Metadata crosswalks
 
Short Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDFShort Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDF
 

Ähnlich wie An Implementation of a New Framework for Automatic Generation of Ontology and RDF to Real-Time Web and Journal Data

A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...cscpconf
 
A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...csandit
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...IJwest
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...Computer Science Journals
 
Semantic Knowledge Acquisition of Information for Syntactic web
Semantic Knowledge Acquisition of Information for Syntactic web Semantic Knowledge Acquisition of Information for Syntactic web
Semantic Knowledge Acquisition of Information for Syntactic web dannyijwest
 
A Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web DatabasesA Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web DatabasesIJMER
 
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...IOSR Journals
 
Vision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result RecordsVision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result RecordsIJMER
 
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...iosrjce
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataMatthew Rowe
 
Annotation for query result records based on domain specific ontology
Annotation for query result records based on domain specific ontologyAnnotation for query result records based on domain specific ontology
Annotation for query result records based on domain specific ontologyijnlc
 
Annotating Search Results from Web Databases
Annotating Search Results from Web Databases Annotating Search Results from Web Databases
Annotating Search Results from Web Databases Mohit Sngg
 

Ähnlich wie An Implementation of a New Framework for Automatic Generation of Ontology and RDF to Real-Time Web and Journal Data (20)

A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
 
A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...
 
H017554148
H017554148H017554148
H017554148
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description Framework
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
 
Semantic Knowledge Acquisition of Information for Syntactic web
Semantic Knowledge Acquisition of Information for Syntactic web Semantic Knowledge Acquisition of Information for Syntactic web
Semantic Knowledge Acquisition of Information for Syntactic web
 
A Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web DatabasesA Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web Databases
 
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
 
L017418893
L017418893L017418893
L017418893
 
ANALYSIS OF RESEARCH ISSUES IN WEB DATA MINING
ANALYSIS OF RESEARCH ISSUES IN WEB DATA MINING ANALYSIS OF RESEARCH ISSUES IN WEB DATA MINING
ANALYSIS OF RESEARCH ISSUES IN WEB DATA MINING
 
Vision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result RecordsVision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result Records
 
Gt ea2009
Gt ea2009Gt ea2009
Gt ea2009
 
The Social Data Web
The Social Data WebThe Social Data Web
The Social Data Web
 
R01765113122
R01765113122R01765113122
R01765113122
 
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 
Annotation for query result records based on domain specific ontology
Annotation for query result records based on domain specific ontologyAnnotation for query result records based on domain specific ontology
Annotation for query result records based on domain specific ontology
 
Annotating Search Results from Web Databases
Annotating Search Results from Web Databases Annotating Search Results from Web Databases
Annotating Search Results from Web Databases
 

Kürzlich hochgeladen

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Kürzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

An Implementation of a New Framework for Automatic Generation of Ontology and RDF to Real-Time Web and Journal Data

  • 1.  Abstract— Information on the web is tremendously increasing in recent years with the faster rate. This massive or voluminous data has driven intricate problems for information retrieval and knowledge management. As the data resides in a web with several forms, the Knowledge management in the web is a challenging task. Here the novel 'Semantic Web' concept may be used for understanding the web contents by the machine to offer intelligent services in an efficient way with a meaningful knowledge representation. The data retrieval in the traditional web source is focused on 'page ranking' techniques, whereas in the semantic web the data retrieval processes are based on the ‘concept based learning'. The proposed work is aimed at the development of a new framework for automatic generation of ontology and RDF to some real time Web data, extracted from multiple repositories by tracing their URI’s and Text Documents. Improved inverted indexing technique is applied for ontology generation and turtle notation is used for RDF notation. A program is written for validating the extracted data from multiple repositories by removing unwanted data and considering only the document section of the web page. Index Terms— Semantic Web, Resource description framework, Ontology, Improved inverted indexing technique, Knowledge management. I. INTRODUCTION World Wide Web (WWW) is considered as a global information repository that identifies documents and other web resources by Uniform Resource Locators, interlinked by hypertext links. Search engines are used to retrieve the information from the web. Data overburden is the most concerning issue in these days for the existing system. Evolution of web includes the web versions of web 1.0, 2.0 etc. In this series, the web version 3.0 is referred to as semantic web [1] is evolved as a knowledge management support across the globe. Search engines should be enriched with semantic web capabilities that analyze webpage content and provide more relevant results corresponding to the user query. Semantic web standards include resource description framework (RDF), web ontology, RDF Schema and rule interchange format (RIF) for handling data. Resource description framework (RDF) provides a conceptual description of information for representing the web resources like Turtle syntax, N-Triples etc. Resource Description Framework (RDF) describes data on the Web in graph form [2]. Ontologies consist of the finite set of terms, relationships, constraints and axioms [3]. Ontologies have proven to be useful for effective knowledge modeling and information retrieval. The remaining paper is arranged as follows: In Section 2 the related work is presented. The proposed work and its methodology are discussed in Section 3 & 4. The results are presented in Section 5. Conclusions are given in Section 6. II. RELATED WORK M.S.P.Babu et.al [4] provided the overview of some of the semantic search engines that yield unique search experience for users. Wilkinson et.al [5] proposed an information retrieval system using document structure. Amel Grissa Touzi et.al [6] suggested the Fuzzy Ontology of Data mining (FODM) for processing automated generation of ontologies in the domain of data mining. Amira Aloui et.al [7] implemented a plugin named “FO-FQ Tab plug-in”, which can be integrated with protégé editor for building the fuzzy ontologies from large databases. To overcome the drawbacks of the existing system for accessing the related science information, M.S.P.Babu et.al [8] proposed a new framework for automatic generation of ontology and RDF for real-time web data. Tahani Alsubait et.al [9] developed the e-learning suite, with the set of questions designed using ontological representation. A.H.M.Rupasingha et.al [10] suggested that the performance of the ontology generation is always dependent on the specificity of the terms. Seongwook Youn et.al [11] discussed pros and cons of tools like protégé 2000, OilEd, Apollo, OntoLingua, Onto Edit, webODE, KAON, ICOM, DEO, webOnto that is used for ontology creation.Kgotatso Desmond Mogotlane et.al[12] presented a comparative study of plugins of protege tool like DB2OWL and Data Master. Sudeepthi Govathoti1, M.S. Prasasd Babu2 Research Scholar, Andhra University, Visakhapatnam &Assistant Professor, Anurag Group of Institutions, Hyderabad, Professor, Department of CS&SE, Andhra University, Visakhapatnam, sudeepthicse@cvsr.ac.in1 , profmspbabu@gmail.com2 An Implementation of a New Framework for Automatic Generation of Ontology and RDF to Real Time Web and Journal Data International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 1, January 2018 89 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 2. III. PROPOSED WORK Semantic web capabilities like RDF & ontology are applied to enrich the knowledge. The proposed work is an implementation of the framework proposed by the authors [8]. The framework is designed with reference to the semantic web Stack. It is carried out in two phases, namely Data extraction phase and Data representation phase. Web scraping is performed using HTML parsing technique in data extraction phase by giving sample search query as an input to multiple repositories. DOM parsing and HTML parsing techniques are applied to validate the data retrieved from multiple repositories by considering only the document section of the webpage. Extensible markup language (XML) is the base for the semantic web representations; the validated information is converted into semi-structured notations by using XSD declaration from DOM tree and passed as an input for the next layers of the proposed framework. XML notation is given as an input to data representation phase. RDF notation is generated and represented in graphical form using Graphviz tool. A textual representation of RDF graph is provided using Turtle, the Terse RDF Triple Language. Improved Inverted Indexing technique is applied for ontological representation of words by excluding the stop words. Figure 1: Proposed framework IV METHODOLOGY Implementation of the framework proposed in Section III will be carried out in two phases namely data extraction and data representation phases. The details are given below. Phase 1: Data extraction Data extraction phase performs web scraping from multiple repositories and stores the scraped data into the database. The Data extraction phase is sub- divided into three steps namely web scraping, data validation, XML Conversion. The scraped data is further validated by removing the unwanted data in the considering document section of web page. The data stored in table format in the database, after the data validation process, is converted into the Semi-Structured Notations (i.e. XML Notations) and passed as an input to the data representation phase. Step 1: Web Scraping Web scraping, also be referred as screen scraping or Web harvesting, is used to fetch and extract the data from a web document using HTML parsing techniques. Here Web pages are crawled and the content of the Web page is extracted. The data in the Web page includes three sections namely: Web page statistics bar, document section and descriptive section. The three items are stored in a database as three different attributes in a database table. HTML parsing technique is used for scraping data from the web documents is shown in Fig 2: Figure 2: Content of the Web page Step 2: Data Validation In the Data Validation Step, the data collected from step 1 is validated using HTML and DOM parsing techniques. Here unwanted data is removed and the necessary portion of URLs is retained. In this step web status bar and descriptive sections are removed in the database table. The validated data is stored in a database. Document section displays the results in the form of page title, URL, Snippet (description) for the given search query. Figure 3: Content of the Web page after Data validation International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 1, January 2018 90 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 3. Descriptive section provides the Wikipedia information about the input query. The Data validation process is carried out by considering only the document section of the web page as shown in Fig: 3. Step.3: XML Conversion In XML Conversion Step the data, validated in step 2, is converted into a DOM tree using XML Schema Definition (XSD). The conversion is performed on data that is validated and stored in database by considering each individual field/ attribute into namespace convention. The XSD declaration of DOM tree has hierarchical structures which have root node, representing the search key word and three child nodes, representing Title, URL and Description respectively. The XSD declaration of the DOM tree with an example is shown in Fig 4: Figure 4: XSD Declaration of DOM tree Phase2: Data representation Data extracted from steps 1,2 and 3 is maintained in an XML format and is given as an input to data representation phase. In Semantic Web architecture, the major source of data representation imposes RDF-ization and Ontology generation. Hence the data representation phase is sub- divided into two steps namely RDF-ization and ontology generation, which are explained in detail in step 4 & 5 respectively. Step 4: RDF-ization The Resource Description Framework (RDF) is the basic building block in semantic web, promoting conceptual modeling of web data [13]. The RDF-ization process is carried out using Turtle notation and Graphviz tool. In this step the XML notation data stored in extraction phase is given as an input to RDF-ization. The RDF notation is visualized in the form of RDF graph using Graphviz tool. Decomposition of tuple creates a new blank node corresponding to the row and a new triple set is obtained. Each tuple in a relational database is decomposed as RDF triples, namely: the title is taken as subject, URL is considered as predicate and description is taken as object. A node can be a URI reference, literal or the blank node. The graph in Fig: 5 is an example of RDF-ization process of a semantic net. Figure 5: RDF Triple The triple is represented as a <subject, predicate, object> format by exploring the relationship among the nodes [14]. The XML conversion carried out in step 3 is represented in RDF syntax using Turtle notation from the convention specified in “http://www.w3.org/1999/02/22-rdf-syntax-ns#“ as shown in Fig 6: Fig-6: RDF-ization The <rdf: Description> element provides the description of resource identified by <rdf: about> attribute. The tags <rbss: title>, <rbss: keyword>, <rbss: URL> are the properties of the resource identified. The RDF represented in turtle notation is visualized in a graphical format using Graphviz tool .It is open source software that is used for generating graphs. Step 5: Ontology Generation Ontology is defined as a formal specification of conceptualization of the domain of Interest. In ontology generation step, the RDF notation obtained from step 4 is used to create a vocabulary of words using improved inverted indexing algorithm. Improved Inverted Indexing algorithm is employed on real time web data collected from multiple repositories and text documents. The words from the description tags are extracted by excluding the stop words and frequency count/Term frequency (TF) of each word is maintained. The illustration of improved inverted indexing algorithm is presented as follows: Algorithm: Improved inverted indexing Input: Database D= {T1, T2…Tn}, Storage Database Output: Attributes {A1, A2…An}, where Ai, for i=1,2…n are representing ontology vocabulary. Parameters: Swrdk= Array of Stop Words attsLq= Snippet attribute Wordsk= Words stemmed from snippet attribute attsLf= Word frequencies after stemming attsL= ontology along with the frequencies count. 1. Swrdk={}; 2. for i=0;i<=i+,i≤ D do 3. attsLq=Query Coverage(D,i); 4. Wordsk=Words Separate(attsLq,Swrdk); 5. attsLf= Words Usage frequency(D,attsLq,WordsK); International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 1, January 2018 91 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 4. 6. attsL=attsLqU attsLf 7. f=highest_freq(attsL) 8. if (f<freq(attsL)) then 9. sort(Wordsk,freq(attsL)) 10. end if 11.end for 12.return (Wordsk,freq(attsL) V. RESULTS AND DISCUSSION The Semantic web stack proposed by Tim Berners Lee [15] is implemented by using the frame work proposed by the authors in section III. It is implemented in PHP version: 5 (Open Source scripting language) and MySQL version: 5 (open- source relational database management system) environments. It is tested on an input with test dataset comprising of sample search keywords. Response time is the amount of time that elapses from the receipt of the query until the results are displayed to user. Response time can be measured on server side or client side as shown in Fig 7. Figure 7: Response Time Throughput is defined as number of queries executed per second (qps). Throughput and response time are observed for the set of retrieval operations with respect to the page load times. The performance of framework implemented with respect to throughput is shown in Fig 8. Figure 8: Throughput . A sample search query is given as an input and web scraping results are shown in Fig 9: Figure 9: Web scraping results Web scraping performance is evaluated by considering the following parameters like database size and count of URL’s extracted which is shown in Fig 10: Figure 10: Web scraping analysis Scraped data from multiple repositories is given as an input to data validation step. The validated data is obtained as an output to data validation process by applying HTML and DOM parsing technique. Data validation considers only the document section of a web page. Data validation results are shown in Fig 11: Figure 11: Data validation results The performance of data validation processing with wanted and unwanted data from the scraped data by considering the following parameters like database size and count of URL’s is shown in Fig 12: International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 1, January 2018 92 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 5. Figure12: Data Validation The validated data stored in database is converted into the XML notations by applying XSD declaration as shown in Fig 13: Figure13: XML Conversion Resource Description Framework (RDF) is a recommended standard of World Wide Web Consortium (W3C) [16]. RDF representation of data in turtle form is shown in Fig 14: Figure14: RDF-ization results Turtle form RDF generation for sample relation named “testrdf” which has an attributes as <name, description, freq> is considered. The “testrdf” represents the relation name is considered as a class in an RDF graph and has set of three nodes that are connecting the testrdf in depth wise manner represents the tuple of a relation as shown in Fig 15: Figure15: RDF Graph generation Ontology generation for the data obtained from multiple repositories as well as the text file. Improved inverted indexing technique is applied for extracting the words with their frequencies discarding the stop words, in the order of highest precedence. The result of ontology generation for real time web data with frequency is shown in Fig 16: Figure16: Ontology Generation for Real time web data The highest frequency word is considered as a frequent search term for the purpose of rule framing using description logic. The rule mapping is done for the efficient retrieval operation which will be future work. The result of ontology generation for text document is shown in Fig 17. Figure17: Ontology Generation for text document VI. CONCLUSION The evolution of web has taken many forms namely web 1.0, web 2.0, web 3.0 , web 4.0 which lead to high-end information retrieval systems using semantic web. The existing traditional system collects the data from search engines is exhibiting average performance in retrieval. Implementation of International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 1, January 2018 93 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 6. proposed framework for automatic generation of ontology’s and RDF improves the performance of traditional search engines by incorporating semantic capabilities. It includes the application of HTML parsing technique, DOM parsing techniques and Turtle notation of graphviz tool. The algorithm improves information retrieval in Semantic Web and Expert Systems. The future work includes applying efficient cryptography for securing database and rule framing for the design of an expert system. REFERENCES [1] Sareh Aghaei, Mohammad Ali Nematbakhsh and Hadi Khosravi Farsani, “Evolution Of The World Wide Web: From Web 1.0 To Web 4.0”, International Journal of Web & Semantic Technology (IJWesT) Vol.3, No.1, January 2012. [2] Abdeslem DENNAI, Sidi Mohammed BENSLIMANE,"Semantic Indexing of Web Documents Based on Domain Ontology", I.J. Information Technology and Computer Science, 2015, 02, 1-11. [3] Seema Redekar, Vishal Chekkala, Siddhapa Gouda, Swapnil Yalgude,"Web Search Engine Using Ontology Learning",International Journal of Innovative Research in Computer and Communication Engineering,Vol. 5, Issue 3, March 2017. [4] G Sudeepthi, G Anuradha, M Surendra Prasad Babu,” A survey on semantic web search engine” International Journal of Computer Science Issues, 2012/3 IJCSI, Volume 9 Issue 2 Pages 241-245. [5] R. Wilkinson, ‟Effective retrieval of structured documents‟. (S.-V. New York, Ed.) Pages 311 – 317, 1994. [6] Amel Grissa Touzi, Hela Ben Massoud and Alaya Ayadi,” Automatic Ontology Generation for Data Mining Using FCA and Clustering”, arxiv.org, no. 1311.1764. [7] Amira Aloui, ENIT, Tunis, Tunisia,”A Fuzzy Ontology-Based Platform for Flexible Querying”, International Journal of Service Science, Management, Engineering, and Technology, Vol.6, Issue 3, July- September 2015, pp 12-26. [8] Prof. M Surendra Prasad Babu, Sudeepthi Govathoti,”A Semantic Model for Building Integrated Ontology Databases”, 7th IEEE International conference on software engineering and service science. [9] Tahani Alsubait, Bijan Parsia, Ulrike Sattler,”Ontology-Based Multiple Choice Question Generation”, Kunstl Intell, Vol. 30, 2016, pp 183-188. [10] Rupasingha A. H. M. Rupasingha, “Improving Web Service Clustering through a Novel Ontology Generation Method by Domain Specificity “, IEEE 24th International Conference on Web Services, 2017. [11] Seongwook Youn, Anchit Arora, Preetham Chandrasekhar, Paavany Jayanty, Ashish Mestry and Shikha Sethi,”Survey about Ontology Development Tools for Ontology-based Knowledge Management”. [12] Kgotatso Desmond Mogotlane, Jean Vincent Fonou- Dombeu,”Automatic Conversion of Relational Databases into Ontologies”. [13] FatemeAbiri, Mohsen Kahani, FataneZarinkalam"An Entity Based RDF Indexing Schema Using Hadoop and HBase", 2011. [14] Faizan Shaikh, Usman A. Siddiqui, IramShahzadi, SyedUami, Zubair A. Shaik "SWISE: Semantic Web based Intelligent Search Engine", 2010. [15] Gopal Pandey,” The Semantic Web: An Introduction and Issues”, International Journal of Engineering Research and Applications, Vol. 2, Issue 1,Jan-Feb 2012, pp.780-786. [16] Resource Description Framework (RDF). Model and Syntax Specification. Technical report, W3C. Prof. M.S.Prasad Babu was born on 12 - 08-1956 in Andhra Pradesh, India. He obtained his bachelors to doctoral degrees from Andhra University. He has 39 years of teaching and research experience. He guided 12 PhD's and 210 PG students for their thesis. He was the president of ICT section of ISCA in 2006-07. He attended about 50 National and International Conferences in India and abroad and presented keynotes. He contributed about 250 papers National and International journals and conferences. He developed 10 international reputed Web portals. He won ISCA Young Scientist Award in 1986, State Best teacher award for engineering in 2015 and Dr. Sarvepalli Radhakrishnan Best Academician Award of Andhra University for 2014. He was the Conference Steering Chair for eight IEEE ICSESS of Beijing, China from 2010 to 2017. Prof Babu is presently working as Vice Principal of AU Engg College, Chairman, Faculty of Engineering of Andhra University and senior Professor in Computer Science & Systems Engineering discipline. G. SUDEEPTHI was born in, Rajahmundry, Andhra Pradesh, India in 1983. She received B. TECH from National Institute of Technology Warangal, India in 2005 and M. Tech in Computer Science & Engineering from KLC Vijayawada, India in 2007. She is working as part time research scholar in the Dept. of CS & SE, Andhra University and as a Assistant Professor at Anurag Group of Institutions Hyderabad, India. She has got more than ten years of teaching experience. She Qualified FET-2011 Conducted by JNTUH. She contributed seven research papers in International journals and Presented one research paper at International conference in Beijing, China and another in Warangal, India. International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 1, January 2018 94 https://sites.google.com/site/ijcsis/ ISSN 1947-5500