IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661, ISBN: 2278-8727 Volume 5, Issue 6 (Sep-Oct. 2012), PP 25-29
www.iosrjournals.org
www.iosrjournals.org 25 | Page
Data Mining for XML Query-Answering Support
KC. Ravi Kumar1, E. Krishnaveni Reddy2, Ramadevi.G3
1, 2, 3 (CSE, Sridevi Women’s Engineering College, Hyderabad, Andhra Pradesh)
Abstract: XML has become a de facto standard for storing, sharing and exchanging information across
heterogeneous platforms. XML content is growing rapidly, and enterprises need to query XML databases
frequently. Because so much XML data is available, extracting the required data from an XML database is
challenging, and answering queries without any support is computationally expensive. In this paper we present
a technique that mines Tree-based Association Rules (TARs). The mined rules provide information on both the
structure and the content of an XML file, and the TARs are themselves stored in XML format. The mined
knowledge (TARs) is used later for XML query-answering support, which enables quick and accurate
answering. We also developed a prototype application to demonstrate the efficiency of the proposed system. The
empirical results are positive, and the query-answering approach is expected to be useful in real-time applications.
Index Terms: XML, query answering support, data mining, tree-based association rules
I. Introduction
XML has become a popular format for storing and sharing data across heterogeneous platforms. The
XML format is neutral, flexible and interoperable [5]. It is widely used because it allows applications to
communicate even though they are built on different platforms. Enterprises hold large numbers of XML
documents, and data can be retrieved from them in two ways. In the first approach, the user gives keywords
and the program searches for relevant documents. In the second approach, the user gives XML queries that are
answered. The first approach uses a conventional information retrieval technique [4] that searches on the
given search word. Query answering, in contrast, is not easy to process. To make this searching easy, this
paper presents data mining for XML query-answering support. XML documents are validated against either a
DTD or a schema. However, the presence of a schema is not mandatory for processing an XML file [3].
This paper presents a data mining framework for XML query-answering support. The essence of the
XML documents is extracted and kept in another XML file in the form of TARs. With the help of this file,
XML query answering becomes easy.
II. Proposed Framework
The proposed XML query-answering support framework is shown in fig. 1. The purpose of this
framework is to perform data mining on XML and obtain intensional knowledge. The intensional knowledge is
also in the form of XML. It consists of rules with support and confidence. In other words, the result of
data mining is a set of TARs (Tree-based Association Rules).
Fig. 1 – Proposed XML query answering support framework
As can be seen in fig. 1, the framework applies data mining to XML query-answering support.
When an XML file is given as input, a DOM parser checks it for well-formedness and validity. If the given XML
document is valid, it is parsed and loaded into a DOM object which can be navigated easily. The parsed XML
file is given to the data mining subsystem, which is responsible for subtree generation and TAR extraction.
The generated TARs are used by the Query Processor subsystem. This module takes an XML query from the end
user and makes use of the mined knowledge to answer the query quickly.
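The parsing step of the framework can be sketched as follows. This is a minimal illustration in Python rather than the prototype's Java/JAXP code, and it makes two assumptions: the helper names `load_dom` and `count_element_nodes` are hypothetical, and only well-formedness is checked, because the standard-library `xml.dom.minidom` parser does not validate against a DTD or schema.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def load_dom(xml_text):
    """Return a DOM Document if xml_text is well-formed, otherwise None.

    Assumption: well-formedness only; minidom performs no DTD/schema validation.
    """
    try:
        return minidom.parseString(xml_text)
    except ExpatError:
        return None


def count_element_nodes(node):
    """Count the element nodes in the subtree rooted at node (hypothetical helper)."""
    total = 1 if node.nodeType == node.ELEMENT_NODE else 0
    for child in node.childNodes:
        total += count_element_nodes(child)
    return total


# Illustrative input, not taken from the paper's data sets.
SAMPLE = "<library><book><author>Zaki</author><year>2005</year></book></library>"

doc = load_dom(SAMPLE)                        # well-formed: a navigable DOM object
broken = load_dom("<library><book></library>")  # mismatched tags: rejected
```

Once loaded, the DOM object can be walked node by node, which is what a mining subsystem would need in order to enumerate subtrees.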
TAR Extraction
Extracting TARs through data mining is a two-step process. In the first step, frequent subtrees that
satisfy the given support are mined. In the second step, interesting rules that have confidence above the given
threshold are calculated from the frequent subtrees. Finding frequent subtrees is described in [1], [2], [6], [7],
[8], [9]. Algorithm 1 finds frequent subtrees and calculates interesting rules.
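The two steps can be illustrated with a deliberately simplified sketch. It is not Algorithm 1 and not a tree miner such as CMTreeMiner: as an assumption made only for illustration, each child of the root is treated as a flat record of (tag, text) pairs, so a "subtree" becomes a small set of such pairs. Support is taken as the fraction of records containing a pattern, and confidence as count(head) / count(body), where the body is a proper subset of the head. The function names and the sample data are hypothetical.

```python
from itertools import combinations
import xml.etree.ElementTree as ET


def record_items(record):
    """Flatten one record element into a set of (tag, text) pairs."""
    return frozenset((child.tag, (child.text or "").strip()) for child in record)


def mine_rules(xml_text, min_support, min_confidence, max_size=2):
    """Return rules as (body, head, support, confidence) tuples."""
    records = [record_items(r) for r in ET.fromstring(xml_text.strip())]
    total = len(records)

    # Step 1: count candidate patterns and keep those meeting the support threshold.
    counts = {}
    for items in records:
        for size in range(1, max_size + 1):
            for pattern in combinations(sorted(items), size):
                key = frozenset(pattern)
                counts[key] = counts.get(key, 0) + 1
    frequent = {p: c for p, c in counts.items() if c / total >= min_support}

    # Step 2: from each frequent pattern (head), derive rules body => head
    # whose confidence meets the confidence threshold.
    rules = []
    for head, head_count in frequent.items():
        for size in range(1, len(head)):
            for body in combinations(sorted(head), size):
                body = frozenset(body)
                confidence = head_count / counts[body]
                if confidence >= min_confidence:
                    rules.append((body, head, head_count / total, confidence))
    return rules


# Illustrative data, not from the paper's experiments.
SAMPLE = """
<library>
  <book><author>Zaki</author><topic>trees</topic></book>
  <book><author>Zaki</author><topic>trees</topic></book>
  <book><author>Zaki</author><topic>graphs</topic></book>
  <book><author>Han</author><topic>graphs</topic></book>
</library>
"""

rules = mine_rules(SAMPLE, min_support=0.5, min_confidence=0.6)
```

On this sample, the pattern "author Zaki with topic trees" appears in two of four records, so rules built from it have support 0.5; the rule whose body is "topic trees" has confidence 1.0 because every record on trees is by Zaki.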
The rules obtained from Algorithm 1 are written to an XML file, and an index is then built. Afterwards, when XML
queries are made, the proposed system uses the index and the TARs to answer the query quickly.
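A possible shape for the rule file and the index is sketched below. The paper does not give the file layout or the index structure, so everything here is an assumption: the element names (`rules`, `rule`, `body`, `head`), the attribute names, the single-pair rule format, and the dictionary-based index are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mined rules:
# (body tag, body value, head tag, head value, support, confidence)
RULES = [
    ("topic", "trees", "author", "Zaki", 0.5, 1.0),
    ("author", "Zaki", "topic", "trees", 0.5, 0.67),
]


def rules_to_xml(rules):
    """Serialize rules into an XML string (element names are illustrative)."""
    root = ET.Element("rules")
    for b_tag, b_val, h_tag, h_val, sup, conf in rules:
        rule = ET.SubElement(root, "rule", support=str(sup), confidence=str(conf))
        ET.SubElement(ET.SubElement(rule, "body"), b_tag).text = b_val
        ET.SubElement(ET.SubElement(rule, "head"), h_tag).text = h_val
    return ET.tostring(root, encoding="unicode")


def build_index(rules_xml):
    """Map each (tag, value) in a rule body to the heads it implies."""
    index = {}
    for rule in ET.fromstring(rules_xml):
        body = rule.find("body")[0]
        head = rule.find("head")[0]
        entry = (head.tag, head.text, float(rule.get("confidence")))
        index.setdefault((body.tag, body.text), []).append(entry)
    return index


def answer(index, tag, value):
    """Answer from the rules only: what accompanies tag=value, best rule first."""
    return sorted(index.get((tag, value), []), key=lambda e: -e[2])


rules_xml = rules_to_xml(RULES)
index = build_index(rules_xml)
result = answer(index, "topic", "trees")
```

The point of the design is that a query is answered by a dictionary lookup over the rule file, without scanning the original XML document.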
III. Experiments And Results
Environment
The environment used to develop the prototype application includes JSE (Java Standard Edition) 6.0 and the
NetBeans IDE running on Windows 7. A PC with 2 GB RAM and a 2.9x GHz processor is used. The Java
Swing API is used to build the graphical user interface, while IO and JAXP (Java API for XML Processing) are
used to implement the functionality. The main application GUI is shown in fig. 2.
Fig. 2 – The GUI of the prototype application
As can be seen, the GUI has a provision to choose an XML file as input. It also allows choosing a file for
storing the extracted rules. A text area shows the XML file content. The View Tree button shows the XML
file as a graphical tree. The Generate Large ItemSet button generates the frequent subtrees that will be
used for further processing. Clicking the GenerateRuleFile button extracts TARs from the given XML file
and saves the extracted rules into the given TAR file. The QueryAnswering button invokes a form where the
user can enter queries. The query interface is shown in fig. 3.
Fig. 3 – Query interface and results
Fig. 3 (a) shows the interface for making queries. The queries given here are processed
faster using the rule files extracted from XML files. The result of the query given in fig. 3 (a) is shown in
fig. 3 (b), with the results that satisfied the given support and confidence.
IV. Results
We have performed four types of experiments. They cover the time required to extract intensional
knowledge from XML; the time required to answer intensional and extensional queries; the extraction time
for given support and confidence values; and the accuracy of intensional answers.
As can be seen in fig. 4, the TAR extraction time increases with the number of nodes in the XML document.
In other words, the time taken to extract TARs is directly proportional to the number of nodes in the given
XML document.
As can be seen in fig. 5, the same holds for XML documents generated using XMark: the TAR extraction
time increases with the number of nodes and is directly proportional to it.
As can be seen in fig. 6, the same holds for XML documents with fixed depth: the TAR extraction time
increases with the number of nodes and is directly proportional to it.
Fig. 7 – Extraction time growth using CMTreeMiner with respect to number of nodes
As can be seen in fig. 7, when CMTreeMiner is used the TAR extraction time also increases with the
number of nodes in the XML document and is directly proportional to it.
Fig. 8 plots the time taken for intensional and extensional query answering. Intensional query
answering takes much less time than extensional answering.
V. Conclusion
In this paper we presented a framework for extracting TARs from a given XML file so as to support
XML queries. The aim is to mine frequent association rules and store the mined content in XML format, and
then to use the TARs to support query answering or to gain information from XML databases.
A prototype application was built to test the efficiency of the proposed framework. The application takes an
XML file as input and generates TARs and then an index file that helps in query processing. The experimental
results indicate that the proposed application is useful and can be used in real-time applications.
[Figure: extraction time in seconds (y-axis, 0 to 50) against number of nodes (x-axis, 0 to 10); single series labelled Series1]
References
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. of the 20th Int. Conf. on Very Large Data Bases, pages 487–499. Morgan Kaufmann Publishers Inc., 1994.
[2] T. Asai, H. Arimura, T. Uno, and S. Nakano. Discovering frequent substructures in large unordered trees. In Technical Report DOI-TR216, Department of Informatics, Kyushu University. http://www.i.kyushuu.ac.jp/doitr/trcs216.pdf, 2003.
[3] D. Barbosa, L. Mignet, and P. Veltri. Studying the xml web: Gathering statistics from an xml sample. World Wide Web, 8(4):413–438, 2005.
[4] Gary Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41–46, 2006.
[5] World Wide Web Consortium. Extensible Markup Language (XML) 1.0, 1998. http://www.w3C.org/TR/REC-xml/.
[6] K. Wang and H. Liu. Discovering typical structures of documents: a road map approach. In Proc. of the 21st Int. Conf. on Research and Development in Information Retrieval, pages 146–154, 1998.
[7] Y. Xiao, J. F. Yao, Z. Li, and M. H. Dunham. Efficient data mining for maximal frequent subtrees. In Proc. of the 3rd IEEE Int. Conf. on Data Mining, page 379. IEEE Computer Society, 2003.
[8] X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In Proc. of the 9th ACM Int. Conf. on Knowledge Discovery and Data Mining, pages 286–295. ACM Press, 2003.
[9] M. J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035, 2005.
About Authors:
Mr. K.C Ravi Kumar holds an M.Tech in CSE from JNTU Hyderabad. He is currently the head of
department for the M.Tech CSE programme at Sridevi Women’s Engineering College and has 17
years of academic experience. He is a life member of IEEE & IST. His areas of research include
Data Mining & Data Warehousing, Information Retrieval Systems, and Information Security.
Mrs. E. Krishnaveni Reddy, B.Tech (C.S.E), M.Tech (S.E), Asst. Prof.
Ms. Ramadevi. G holds a B.E in C.S.E from Muffakham Jah College of Engineering and
Technology and an M.Tech in C.S.E from Sridevi Women’s Engineering College.

Salesforce Community Group Quito, Salesforce 101
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Data Mining for XML Query-Answering Support

IOSR Journal of Computer Engineering (IOSRJCE), ISSN: 2278-0661, ISBN: 2278-8727, Volume 5, Issue 6 (Sep-Oct. 2012), PP 25-29, www.iosrjournals.org

KC. Ravi Kumar, E. Krishnaveni Reddy, Ramadevi.G (CSE, Sridevi Women's Engineering College, Hyderabad, Andhra Pradesh)

Abstract: XML has become a de facto standard for storing, sharing and exchanging information across heterogeneous platforms. The volume of XML content is growing rapidly, and enterprises need to query XML databases frequently. With so much XML data available, extracting the required data from an XML database is a challenging task, and answering queries without any support is computationally expensive. In this paper we present a technique based on Tree-based Association Rules (TARs): mined rules that provide the required information on the structure and content of an XML file. The TARs are themselves stored in XML format. The mined knowledge (TARs) is used later for XML query-answering support, which enables quick and accurate answering. We also developed a prototype application to demonstrate the efficiency of the proposed system. The empirical results are very positive, and query answering is expected to be useful in real-time applications.

Index Terms: XML, query answering support, data mining, tree-based association rules

I. Introduction
XML has become a popular format for storing and sharing data across heterogeneous platforms. The XML format is neutral, flexible and interoperable [5]. It is widely used because it allows applications built on different platforms to communicate. Enterprises hold large numbers of XML documents, and data can be retrieved from them in two ways. In the first approach, the user gives keywords and the program searches for relevant documents. In the second approach, the user gives XML queries that are answered.

The first approach uses conventional information retrieval techniques [4], in which the search process is driven by the given search word. The second approach, query answering, is not easy to process. To make it easier, this paper presents a data mining framework for XML query-answering support. XML documents are validated against either a DTD or a schema; however, a schema is not mandatory for processing an XML file [3]. The essence of the XML documents is extracted and kept in another XML file in the form of TARs. With the help of this file, XML query answering becomes easy.

II. Proposed Framework
The proposed XML query-answering support framework is shown in fig. 1. Its purpose is to perform data mining on XML and obtain intensional knowledge. The intensional knowledge is also in the form of XML and consists of rules with support and confidence values. In other words, the result of data mining is a set of TARs (Tree-based Association Rules).

Fig. 1 – Proposed XML query answering support framework

As can be seen in fig. 1, the framework applies data mining to support XML query answering. When an XML file is given as input, a DOM parser checks it for well-formedness and validity. If the given XML document is valid, it is parsed and loaded into a DOM object, which can be navigated easily. The parsed XML file is given to the data mining subsystem, which is responsible for subtree generation and TAR extraction.
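The parsing step of the framework can be sketched with the standard JAXP DOM API, which the prototype is reported to use. This is a minimal illustration and not the authors' code: the class name `DomLoader`, its method names, and the sample document in the usage note are assumptions made for the example. It loads XML text into a DOM tree, rejects input that is not well-formed, and counts element nodes by walking the tree.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not taken from the paper's prototype.
final class DomLoader {

    /** Parses XML text into a DOM tree; throws if the text is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    /** Counts element nodes in the subtree rooted at the given node. */
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }
}
```

For example, `DomLoader.countElements(DomLoader.parse("<library><book/></library>"))` visits the `library` and `book` elements. The sketch checks well-formedness only; DTD validation could be requested with `DocumentBuilderFactory.setValidating(true)` together with an error handler, which is omitted here for brevity.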
The generated TARs are used by the query processor subsystem. This module takes an XML query from the end user and uses the mined knowledge to answer the query quickly.

TAR Extraction
Extracting TARs through data mining is a two-step process. In the first step, frequent subtrees that satisfy the given support are mined. In the second step, interesting rules whose confidence is above the given threshold are calculated from the frequent subtrees. Finding frequent subtrees is described in [1], [2], [6], [7], [8], [9]. Algorithm 1 finds frequent subtrees and calculates interesting rules. The rules obtained from Algorithm 1 are written to an XML file, and an index is then built. Afterwards, when XML queries are made, the proposed system uses the index and the TARs to answer the query quickly.

III. Experiments And Results
Environment
The prototype application was developed with Java SE (Java Standard Edition) 6.0 and the NetBeans IDE running on Windows 7. A PC with 2 GB RAM and a 2.9x GHz processor was used. The Java Swing API is used to build the graphical user interface, while IO and JAXP (Java API for XML Processing) are used to implement the functionality. The main application GUI is shown in fig. 2.

Fig. 2 – The GUI of the prototype application

As can be seen, the GUI lets the user choose an XML file as input and a file for storing the extracted rules. A text area shows the XML file content. The View Tree button shows the XML file as a graphical tree. The Generate Large ItemSet button generates the frequent subtrees that are used for further processing. Clicking the GenerateRuleFile button extracts TARs from the given XML file and saves the extracted rules into the given TAR file. The QueryAnswering button opens a form where the user can enter queries. The query interface is shown in fig. 3.
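The second step of the TAR extraction described above, keeping only rules that meet the support and confidence thresholds, can be sketched as follows. The paper does not spell out its formulas, so this sketch assumes the usual definitions for tree-based association rules: for a rule body ⇒ head, where the body is a subtree of the head, support is the number of occurrences of the head divided by the number of nodes in the document, and confidence is the number of occurrences of the head divided by the number of occurrences of the body. The class `TarFilter`, the string encoding of subtrees, and the candidate pairs are all hypothetical; the occurrence counts are assumed to come from a frequent-subtree miner run in the first step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of rule filtering; not the paper's Algorithm 1.
final class TarFilter {

    /** A rule body => head with its computed support and confidence. */
    static final class Rule {
        final String body;
        final String head;
        final double support;
        final double confidence;

        Rule(String body, String head, double support, double confidence) {
            this.body = body;
            this.head = head;
            this.support = support;
            this.confidence = confidence;
        }
    }

    /**
     * counts:      occurrences of each frequent subtree (assumed output of step one)
     * candidates:  {body, head} pairs, where body is assumed to be a subtree of head
     * cardinality: number of nodes in the XML document
     */
    static List<Rule> interestingRules(Map<String, Integer> counts,
                                       List<String[]> candidates,
                                       int cardinality,
                                       double minSupport,
                                       double minConfidence) {
        List<Rule> kept = new ArrayList<>();
        for (String[] pair : candidates) {
            Integer bodyCount = counts.get(pair[0]);
            Integer headCount = counts.get(pair[1]);
            if (bodyCount == null || headCount == null || bodyCount == 0) {
                continue; // skip pairs whose subtrees were not counted
            }
            double support = (double) headCount / cardinality;
            double confidence = (double) headCount / bodyCount;
            if (support >= minSupport && confidence >= minConfidence) {
                kept.add(new Rule(pair[0], pair[1], support, confidence));
            }
        }
        return kept;
    }
}
```

With a body that occurs 10 times, a head that occurs 8 times, and a document of 40 nodes, the rule has support 0.2 and confidence 0.8. Raising either threshold shrinks the rule file, which trades coverage of queries against extraction and lookup cost.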
Fig. 3 – Query interface and results

Fig. 3 (a) shows the interface for making queries. Queries given here are processed faster by using the rule files extracted from the XML files. Fig. 3 (b) shows the result of the query given in fig. 3 (a), listing the results that satisfy the given support and confidence.

IV. Results
We performed four types of experiments: the time required to extract intensional knowledge from XML; the time required to answer intensional and extensional queries; extraction time for given support and confidence values; and the accuracy of intensional answers.

As can be seen in fig. 4, TAR extraction time increases with the number of nodes in the XML document. In other words, the time taken to extract TARs is directly proportional to the number of nodes in the given XML document.
As can be seen in fig. 5, the same holds for XML documents generated using XMark: the time taken to extract TARs is directly proportional to the number of nodes in the document.

As can be seen in fig. 6, the same holds for XML documents with fixed depth: the time taken to extract TARs is directly proportional to the number of nodes in the document.

Fig. 7 – Extraction time growth using CMTreeMiner with respect to number of nodes

As can be seen in fig. 7, when CMTreeMiner is used, the time taken to extract TARs is again directly proportional to the number of nodes in the given XML document.

Fig. 8 plots the time taken for intensional and extensional query answering. Intensional query answering takes much less time than extensional answering.

[Plot residue: time in seconds plotted against number of nodes]

V. Conclusion
In this paper we presented a framework for extracting TARs from a given XML file in order to support XML queries. The aim is to mine frequent association rules and store the mined content in XML format, and then to use the TARs to support query answering or to gain information from XML databases. A prototype application was built to test the efficiency of the proposed framework. The application takes an XML file as input, generates TARs, and finally builds an index file that helps in query processing. The experimental results show that the proposed application is useful and can be used in real-time applications.
References
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. of the 20th Int. Conf. on Very Large Data Bases, pages 487–499. Morgan Kaufmann Publishers Inc., 1994.
[2] T. Asai, H. Arimura, T. Uno, and S. Nakano. Discovering frequent substructures in large unordered trees. In Technical Report DOI-TR216, Department of Informatics, Kyushu University. http://www.i.kyushuu.ac.jp/doitr/trcs216.pdf, 2003.
[3] D. Barbosa, L. Mignet, and P. Veltri. Studying the xml web: Gathering statistics from an xml sample. World Wide Web, 8(4):413–438, 2005.
[4] Gary Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41–46, 2006.
[5] World Wide Web Consortium. Extensible Markup Language (XML) 1.0, 1998. http://www.w3C.org/TR/REC-xml/.
[6] K. Wang and H. Liu. Discovering typical structures of documents: a road map approach. In Proc. of the 21st Int. Conf. on Research and Development in Information Retrieval, pages 146–154, 1998.
[7] Y. Xiao, J. F. Yao, Z. Li, and M. H. Dunham. Efficient data mining for maximal frequent subtrees. In Proc. of the 3rd IEEE Int. Conf. on Data Mining, page 379. IEEE Computer Society, 2003.
[8] X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In Proc. of the 9th ACM Int. Conf. on Knowledge Discovery and Data Mining, pages 286–295. ACM Press, 2003.
[9] M. J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035, 2005.

About Authors:
Mr. K.C. Ravi Kumar holds an M.Tech in CSE from JNTU Hyderabad. He is currently the head of department for the M.Tech CSE programme at Sridevi Women's Engineering College and has 17 years of academic experience. He is a life member of IEEE & IST. His areas of research include Data Mining & Data Warehousing, Information Retrieval Systems, and Information Security.
Mrs. E. Krishnaveni Reddy, B.Tech (C.S.E), M.Tech (S.E), Asst. Prof.
Ms. Ramadevi. G holds a B.E in C.S.E from Muffakham Jah College of Engineering and Technology and an M.Tech in C.S.E from Sridevi Women's Engineering College.