Topic Maps for Association Rule Mining. Tomáš Kliegr, Jan Zemánek, Marek Ovečka. Department of Information and Knowledge Engineering, Faculty of Informatics and Statistics, University of Economics, Prague
Data Mining using CRISP-DM The goal of data mining is to obtain useful non-trivial patterns from the data. Analytical Report
Common data mining tasks: association rules, clustering, classification. Example association rule: Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad)
Association Rule Mining. Unlike clustering and classification, association rules provide true “nuggets”: rules meeting selected interest measures. EXAMPLE: Duration(2y+) and District(Prague) => Loan Quality(good), where the left-hand side is the antecedent and the right-hand side is the consequent. THE PROBLEM WITH INTEREST MEASURES: It is usually not possible to tweak the interest measure thresholds so that only the really interesting rules are output. To be on the safe side, we often get (many!) more rules than desired. THE QUEST FOR TOPIC MAPS: Select the really interesting rules from the rules output automatically. Help searching through the results.
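The slides do not say which interest measures were used. As a minimal sketch, assuming the two most common ones (support and confidence), the example rule can be scored as follows. The four records are invented toy data, not taken from the dataset, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def matches(record: Record, conditions: Record) -> bool:
    """True if the record carries every attribute=value pair in conditions."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def support_confidence(
    records: List[Record], antecedent: Record, consequent: Record
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy records (assumption: made up for illustration only).
records = [
    {"Duration": "2y+", "District": "Prague", "Loan Quality": "good"},
    {"Duration": "2y+", "District": "Prague", "Loan Quality": "good"},
    {"Duration": "2y+", "District": "Prague", "Loan Quality": "bad"},
    {"Duration": "1y", "District": "Brno", "Loan Quality": "good"},
]

support, confidence = support_confidence(
    records,
    antecedent={"Duration": "2y+", "District": "Prague"},
    consequent={"Loan Quality": "good"},
)
```

A rule is output when both values exceed user-chosen thresholds; lowering the thresholds "to be on the safe side" is what inflates the number of rules.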
The quest: more precise tasks, or automatic rule filtering. The lingua franca for exchange of data mining models is PMML.
Predictive Model Markup Language (PMML): an XML Schema. PMML is the leading standard for statistical and data mining models. Supported by over 20 vendors and organizations. Covers the technical part of the CRISP-DM cycle. Examples: http://www.dmg.org/pmml_examples/index.html
PMML is “just” an XML Schema. Developed for deploying mining models. Good for migration from one data mining environment to another. But: no explicit links between nodes; verbose; self-contained, lacking support for interlinking multiple PMML documents and for interlinking PMML with other information.
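To illustrate the "no explicit links between nodes" point: in a PMML association model, a rule refers to its itemsets, and itemsets to their items, only through id-valued attributes that the consumer has to resolve. The sketch below parses a simplified fragment with Python's standard library. The fragment is an assumption for illustration: it omits the PMML namespace and the other required elements and attributes of a real document, and the values are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment in the style of a PMML AssociationModel.
PMML_FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Duration=2y+"/>
  <Item id="2" value="District=Prague"/>
  <Item id="3" value="Quality=good"/>
  <Itemset id="10"><ItemRef itemRef="1"/><ItemRef itemRef="2"/></Itemset>
  <Itemset id="11"><ItemRef itemRef="3"/></Itemset>
  <AssociationRule antecedent="10" consequent="11"
                   support="0.5" confidence="0.67"/>
</AssociationModel>
"""


def resolve_rules(xml_text):
    """Follow the id references by hand and return (antecedent, consequent) pairs."""
    root = ET.fromstring(xml_text.strip())
    # Item id -> item value
    items = {item.get("id"): item.get("value") for item in root.iter("Item")}
    # Itemset id -> list of item values, resolved through ItemRef/@itemRef
    itemsets = {
        itemset.get("id"): [items[ref.get("itemRef")] for ref in itemset.iter("ItemRef")]
        for itemset in root.iter("Itemset")
    }
    return [
        (itemsets[rule.get("antecedent")], itemsets[rule.get("consequent")])
        for rule in root.iter("AssociationRule")
    ]


rules = resolve_rules(PMML_FRAGMENT)
```

The two lookup dictionaries are exactly the links that the XML leaves implicit and that a topic map would state as associations.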
Association Rule Mining Ontology. The ontology is a “semantization” of the PMML XML Schema. DESIGN GUIDELINES: The key design principle was to allow easy transformation of data from PMML to AROn. SCOPE: The ontology is limited to the subset of PMML relevant to association rule mining: 60 topic types, 50 association types and 20 occurrence types. USE: No automatic transformation is available yet, but we are working on one using the OKS framework. Currently, data can be input using Ontopoly.
Design guidelines: Elements. xs:element is mapped to a topic type. Topics are assigned the same names as PMML nodes, but respecting spaces between words and capitalization. Superclasses are introduced for semantically similar XML nodes. Named elements used as children in other elements that carry most of the semantics of their parents are merged with the parent. If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as the subject identifier.
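The naming guideline for elements can be sketched as a small helper that inserts spaces at the word boundaries of a PMML node name. This is only one plausible reading of "respecting spaces between words and capitalization"; the function name is hypothetical and the exact names used in the ontology may differ.

```python
import re

# A word boundary is assumed wherever a lowercase letter or digit
# is directly followed by an uppercase letter.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def topic_name(pmml_node_name: str) -> str:
    """Turn a PMML node name such as 'AssociationRule' into 'Association Rule'."""
    return _WORD_BOUNDARY.sub(" ", pmml_node_name)


names = [topic_name(n) for n in ("AssociationRule", "MiningSchema", "Itemset", "PMML")]
```

Acronyms such as PMML are left untouched because they contain no lowercase-to-uppercase transition.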
Design guidelines: Attributes. An enumeration restriction on an attribute is mapped as a topic type with an enumeration superclass (this is a workaround for missing TMCL support in OKS). Attributes that could be interpreted as references to other elements become associations. Other attributes become occurrence types.
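The three attribute guidelines amount to a simple decision rule, sketched here. The class and field names are hypothetical, and checking the enumeration case before the reference case is an assumption; the slides do not say which guideline wins when both apply.

```python
from dataclasses import dataclass
from enum import Enum


class Construct(Enum):
    TOPIC_TYPE = "topic type with enumeration superclass"
    ASSOCIATION = "association"
    OCCURRENCE_TYPE = "occurrence type"


@dataclass
class AttributeInfo:
    name: str
    has_enumeration: bool      # the schema restricts the attribute to listed values
    references_element: bool   # the value is the id of another element


def map_attribute(attr: AttributeInfo) -> Construct:
    """Pick the Topic Maps construct for one PMML attribute."""
    if attr.has_enumeration:       # assumption: enumeration is checked first
        return Construct.TOPIC_TYPE
    if attr.references_element:
        return Construct.ASSOCIATION
    return Construct.OCCURRENCE_TYPE


# Example attributes, flagged by hand for illustration.
mapped = {
    a.name: map_attribute(a)
    for a in (
        AttributeInfo("usageType", has_enumeration=True, references_element=False),
        AttributeInfo("antecedent", has_enumeration=False, references_element=True),
        AttributeInfo("support", has_enumeration=False, references_element=False),
    )
}
```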
Design guidelines: Associations. Names for association types are chosen freely so that they are as descriptive as possible. Introduce fewer rather than more associations: this minimizes the effort when populating the ontology from PMML and avoids unnecessary inflation of the topic map. Link only the semantically closest topics. Additional “soft” relations can be introduced with inference statements or derived with tolog.
Design guidelines: Role types. Topic types used to map PMML elements are used as role types, unless multiple topics are permitted in an association end. In that case a superclass is used as the role type, or a new role type is introduced.
Two alternative association rule representations: (Item-Itemset) and (Boolean Attributes)
Ongoing work. Support for background knowledge: “already known association rules”. Support for schema mapping: “linking of background knowledge with mining results”. Both are already in the ontology, distinguished by the base of the subject identifier: Schema Mapping http://keg.vse.cz/sma/XXX, Background Knowledge http://keg.vse.cz/bko/xxx
Data Mining Use Case. PREDICT LOAN QUALITY: Find client characteristics that could be used to predict their attitude to paying back a loan. BASED ON PAST RECORDS: Input data: records on already given loans.
The data: 6181 clients in the PKDD’99 financial dataset. Data were preprocessed, i.e.
… And perhaps 9997 other association rules. Preprocessed data. Association Rule Learner.
WE CAN’T PRESENT ALL 10,000 RULES TO THE CLIENT. ASK THE CLIENT WHAT HE KNOWS: “If loan duration is more than two years and the loan was given in the Prague district, we can expect good loan quality.” … background knowledge
Semantize the results
Formalize Background Knowledge
Schema Mapping. Background knowledge can use a different “vocabulary” than the data. If we are to use background knowledge in querying, we need to interlink it with the data. The same approach would apply if we interlink several mining models (PMMLs).
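A minimal sketch of the schema mapping idea, in plain Python rather than Topic Maps: a lookup table links each attribute-value pair in the client's vocabulary to the corresponding pair in the mined data. All names on the background-knowledge side and the mapping itself are invented for illustration.

```python
from typing import Dict, FrozenSet, Tuple

Condition = Tuple[str, str]  # (attribute, value)

# Hypothetical mapping from the client's vocabulary to the data vocabulary.
SCHEMA_MAPPING: Dict[Condition, Condition] = {
    ("Loan duration", "more than two years"): ("Duration", "2y+"),
    ("Loan given in", "Prague district"): ("District", "Prague"),
    ("Expected quality", "good"): ("Loan Quality", "good"),
}


def translate(
    conditions: FrozenSet[Condition], mapping: Dict[Condition, Condition]
) -> FrozenSet[Condition]:
    """Rewrite conditions into the data vocabulary; unmapped pairs pass through unchanged."""
    return frozenset(mapping.get(pair, pair) for pair in conditions)


background_antecedent = frozenset({
    ("Loan duration", "more than two years"),
    ("Loan given in", "Prague district"),
})
translated = translate(background_antecedent, SCHEMA_MAPPING)
```

Once translated, background knowledge and mined rules use the same attribute-value pairs and can be compared directly.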
Deleting information with Topic Maps. Find association rules that subsume background knowledge. Visualization of a tolog query.
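The slide performs this filtering with a tolog query over the topic map; the query itself is not reproduced in the transcript. The Python sketch below swaps in a plain set-based check instead. It assumes one possible definition of "subsume": a mined rule is treated as already known when it contains every antecedent and consequent condition of some background-knowledge rule. The names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Condition = Tuple[str, str]  # (attribute, value)


class Rule(NamedTuple):
    antecedent: FrozenSet[Condition]
    consequent: FrozenSet[Condition]


def covers(mined: Rule, known: Rule) -> bool:
    """Assumed subsumption test: the mined rule contains all conditions of the known rule."""
    return known.antecedent <= mined.antecedent and known.consequent <= mined.consequent


def drop_known(mined_rules: List[Rule], background: List[Rule]) -> List[Rule]:
    """Keep only mined rules that do not repeat any background-knowledge rule."""
    return [r for r in mined_rules if not any(covers(r, k) for k in background)]


known = Rule(
    frozenset({("Duration", "2y+"), ("District", "Prague")}),
    frozenset({("Loan Quality", "good")}),
)
same_as_known = Rule(known.antecedent, known.consequent)
more_specific = Rule(known.antecedent | {("Sex", "M")}, known.consequent)
unrelated = Rule(
    frozenset({("Sex", "M"), ("Salary", "Low"), ("District", "Havlickuv Brod")}),
    frozenset({("Quality", "Bad")}),
)

remaining = drop_known([same_as_known, more_specific, unrelated], [known])
```

Only the unrelated rule survives, which is the kind of reduction needed before presenting results to the client.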
Summary. Methodology for transferring XML Schema to Topic Maps. Association Rule Mining Ontology based on PMML. Easily extensible to other data mining algorithms. Initial attempts to formalize background knowledge. Initial attempts to use Topic Maps for schema mapping. AROn on-line: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr
