This paper investigates the possibilities for post-processing the results of association rule mining algorithms with topic maps. Converting discovered association rules (DARs) as well as background knowledge to a topic map representation allows the interestingness of the discovered rules to be assessed automatically with a topic map query language. This paper introduces a DAR ontology based on the GUHA method, a background knowledge ontology, and a way of linking these two ontologies. An example shows how these topic map ontologies can be used to represent particular mining data and how the tolog query language can be used to find interesting rules in such a representation automatically.
Topic Maps for Association Rule Mining
1. Topic Maps for Association Rule Mining. Tomáš Kliegr, Jan Zemánek, Marek Ovečka, Department of Information and Knowledge Engineering, Faculty of Informatics and Statistics, University of Economics, Prague
2. Data Mining Using CRISP-DM. The goal of data mining is to obtain useful, non-trivial patterns from the data. The output of the process is an analytical report.
3. Common Data Mining Tasks. Association rules, clustering and classification. An example association rule: Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad).
4. Association Rule Mining. Unlike clustering and classification, association rules provide true “nuggets”: rules meeting selected interest measures. Example: Duration(2y+) and District(Prague) => Loan Quality(good), where the left-hand side is the antecedent and the right-hand side the consequent. The problem with interest measures: it is usually not possible to tweak the interest measure thresholds so that only the really interesting rules are output. To be on the safe side, we often get (many!) more rules than desired. The quest for topic maps: select the really interesting rules from the output automatically, and help users search through the results.
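The interest measures on this slide can be made concrete. The abstract states that the DAR ontology is based on the GUHA method, whose rules are evaluated on a four-fold contingency table. The Python below is a minimal sketch, not code from AROn or any mining tool, of support, confidence and GUHA's founded implication quantifier (confidence at least p and at least Base supporting records). The names `FourFold` and `founded_implication` and all counts are hypothetical illustrations, not results on the loan data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourFold:
    """Four-fold contingency table of a rule Antecedent => Consequent.

    a: records satisfying both antecedent and consequent
    b: records satisfying the antecedent but not the consequent
    c: records satisfying the consequent but not the antecedent
    d: records satisfying neither
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def support(self) -> float:
        # Share of all records that satisfy both sides of the rule.
        return self.a / self.n

    def confidence(self) -> float:
        # Share of antecedent records that also satisfy the consequent.
        return self.a / (self.a + self.b)


def founded_implication(t: FourFold, p: float, base: int) -> bool:
    """GUHA founded implication: a / (a + b) >= p and a >= base."""
    return t.a + t.b > 0 and t.confidence() >= p and t.a >= base


# Hypothetical counts for two rules over 1000 records.
rules = {
    "Duration(2y+) and District(Prague) => Loan Quality(good)": FourFold(90, 10, 500, 400),
    "Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad)": FourFold(3, 1, 60, 936),
}

kept = [name for name, t in rules.items() if founded_implication(t, p=0.7, base=20)]
print(kept)
```

Both rules reach confidence 0.7, but only the first has enough supporting records. Loosening p or Base lets more rules through, which is how a learner ends up emitting thousands of candidates.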
5. The Quest. More precise tasks, or automatic rule filtering. The lingua franca for exchanging data mining models is PMML.
6. Predictive Model Markup Language (PMML). PMML is an XML Schema and the leading standard for statistical and data mining models. It is supported by over 20 vendors and organizations and covers the technical part of the CRISP-DM cycle. Examples: http://www.dmg.org/pmml_examples/index.html
7. PMML Is “Just” an XML Schema. It was developed for deploying mining models and is good for migrating from one data mining environment to another. But it has no explicit links between nodes, it is verbose, and each document is self-contained: PMML lacks support for interlinking multiple PMML documents and for interlinking PMML with other information.
8. Association Rule Mining Ontology. The ontology (AROn) is a “semantization” of the PMML XML Schema. Design guidelines: the key design principle was to allow easy transformation of data from PMML to AROn. Scope: the ontology is limited to the subset of PMML relevant to association rule mining, with 60 topic types, 50 association types and 20 occurrence types. Use: no automatic transformation is available yet, but we are working on one using the OKS framework. Currently, data can be input using Ontopoly.
9. Design Guidelines: Elements. Each xs:element is mapped to a topic type. Topics are assigned the same names as the PMML nodes, but with spaces between words and with capitalization respected. Superclasses are introduced for semantically similar XML nodes. Named elements that are used as children of other elements and carry most of the semantics of their parents are merged with the parent. If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as the subject identifier.
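The naming and identifier guidelines above can be sketched in a few lines of Python. This assumes PMML element names are CamelCase and that "respecting spaces between words" means splitting them into words; the fragment form used by `subject_identifier` and the base URI `http://example.org/pmml.xsd` are hypothetical, since the slide does not say how the element URI within the schema is formed.

```python
import re


def topic_name(element_name: str) -> str:
    """Derive a topic name from a PMML element name, e.g. 'AssociationRule' -> 'Association Rule'.

    Assumption: element names are CamelCase. A space is inserted between a lowercase
    letter or digit and a following capital, and before the last capital of an acronym
    run that starts a new word. Capitalization is left as it is.
    """
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", element_name)
    return re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)


def subject_identifier(schema_uri: str, element_name: str) -> str:
    """Hypothetical URI of an element within a schema: schema URI plus a fragment."""
    return f"{schema_uri}#{element_name}"


for name in ["AssociationModel", "AssociationRule", "Itemset", "MiningSchema"]:
    print(name, "->", topic_name(name), "|", subject_identifier("http://example.org/pmml.xsd", name))
```

Keeping the original element name in the subject identifier while prettifying only the topic name means the mapping stays reversible, which fits the stated goal of easy transformation from PMML.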
10. Design Guidelines: Attributes. An enumeration restriction on an attribute is mapped as a topic type with an enumeration superclass (this is a workaround for the missing TMCL support in OKS). Attributes that could be interpreted as references to other elements become associations. Other attributes become occurrence types.
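The three attribute rules form a simple decision procedure. A minimal Python sketch, assuming each XSD attribute has already been annotated with its enumeration values and, where a human judged it to be a reference, the element it points to (PMML declares such attributes as plain strings, so this cannot be read off the type alone). `XsdAttribute`, `Target` and `map_attribute` are hypothetical names; the example attributes are real PMML attributes on `DataField` and `AssociationRule`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Target(Enum):
    ENUMERATION_TOPIC_TYPE = "topic type with an enumeration superclass"
    ASSOCIATION_TYPE = "association type"
    OCCURRENCE_TYPE = "occurrence type"


@dataclass(frozen=True)
class XsdAttribute:
    name: str
    # Values of an xs:enumeration restriction, empty if the attribute is unrestricted.
    enumeration: Tuple[str, ...] = ()
    # Element the value points to, filled in by hand when the attribute
    # "could be interpreted as a reference"; None otherwise.
    references: Optional[str] = None


def map_attribute(attr: XsdAttribute) -> Target:
    """Apply the three attribute guidelines in the order they are listed on the slide."""
    if attr.enumeration:
        return Target.ENUMERATION_TOPIC_TYPE
    if attr.references is not None:
        return Target.ASSOCIATION_TYPE
    return Target.OCCURRENCE_TYPE


examples = [
    XsdAttribute("optype", enumeration=("categorical", "ordinal", "continuous")),  # on DataField
    XsdAttribute("antecedent", references="Itemset"),  # on AssociationRule
    XsdAttribute("confidence"),  # on AssociationRule
]
for attr in examples:
    print(attr.name, "->", map_attribute(attr).value)
```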
11. Design Guidelines: Associations. Names for association types are chosen freely so that they are as descriptive as possible. Introduce fewer rather than more associations: this minimizes the effort of populating the ontology from PMML and avoids unnecessary inflation of the topic map. Link only the semantically closest topics. Additional “soft” relations can be introduced with inference statements or derived with tolog.
12. Design Guidelines: Role Types. The topic types used to map PMML elements are used as role types, unless multiple topics are permitted at an association end. In that case a superclass is used as the role type, or a new role type is introduced.
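The role type rule can be read as a small decision function. A minimal Python sketch, assuming the set of topic types allowed at an association end and a table of superclasses are known in advance; `choose_role_type`, the superclass table and the fallback name "Rule Part" are hypothetical and not taken from AROn.

```python
from typing import Mapping, Optional, Sequence


def choose_role_type(
    allowed: Sequence[str],
    superclass_of: Mapping[str, str],
    new_role_type: Optional[str] = None,
) -> str:
    """Pick the role type for one association end.

    allowed: topic types that may play this role (one per PMML element type).
    superclass_of: topic type -> its superclass in the ontology.
    new_role_type: name to fall back on when no shared superclass exists.
    """
    distinct = set(allowed)
    if len(distinct) == 1:
        # Only one kind of topic can play the role: reuse its topic type.
        return next(iter(distinct))
    parents = {superclass_of.get(t) for t in distinct}
    if len(parents) == 1 and None not in parents:
        # Several kinds of topics share one superclass: use the superclass.
        return parents.pop()
    if new_role_type is None:
        raise ValueError(f"no common superclass for {sorted(distinct)}; name a new role type")
    return new_role_type


# Hypothetical superclass table for illustration only.
SUPERCLASS = {"Categorical Item": "Item", "Numeric Item": "Item"}
print(choose_role_type(["Itemset"], SUPERCLASS))
print(choose_role_type(["Categorical Item", "Numeric Item"], SUPERCLASS))
print(choose_role_type(["Itemset", "Categorical Item"], SUPERCLASS, new_role_type="Rule Part"))
```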
14. Ongoing Work. Support for background knowledge (“already known association rules”) and support for schema mapping (“linking of background knowledge with mining results”). Both are already in the ontology, distinguished by the base of the subject identifier: Schema Mapping uses http://keg.vse.cz/sma/XXX and Background Knowledge uses http://keg.vse.cz/bko/xxx.
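Telling the parts of the ontology apart by subject identifier base amounts to a prefix test. A minimal Python sketch using the two bases from the slide; the identifiers after the base are elided on the slide, so the ones in the example are hypothetical, and treating every other identifier as "other" is an assumption.

```python
from typing import Dict

# Bases of subject identifiers as given on the slide; the part after the base is elided there.
BASES: Dict[str, str] = {
    "http://keg.vse.cz/sma/": "schema mapping",
    "http://keg.vse.cz/bko/": "background knowledge",
}


def knowledge_source(subject_identifier: str) -> str:
    """Classify a topic by the base of its subject identifier."""
    for base, source in BASES.items():
        if subject_identifier.startswith(base):
            return source
    # Assumption: anything else belongs to the core association rule part of the ontology.
    return "other"


for psi in ["http://keg.vse.cz/bko/loan-duration", "http://keg.vse.cz/sma/duration-mapping"]:
    print(psi, "->", knowledge_source(psi))
```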
15. Data Mining Use Case: Predict Loan Quality. Find client characteristics that could be used to predict their attitude to paying back a loan, based on past records. Input data: records on loans that have already been given.
16. The Data. 6181 clients in the PKDD’99 financial dataset. The data were preprocessed, i.e.
17. The preprocessed data are fed to an association rule learner, which outputs … and perhaps 9997 other association rules.
18. We Can’t Present All 10,000 Rules to the Client. Ask the client what he knows, i.e. the background knowledge: “If the loan duration is more than two years and the loan was given in the Prague district, we can expect good loan quality.”
21. Schema Mapping. Background knowledge can use a different “vocabulary” than the data. If we are to use background knowledge in querying, we need to interlink it with the data. The same approach would apply to interlinking several mining models (PMML documents).
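The abstract describes finding interesting rules with tolog queries over the linked topic maps. The Python below is neither tolog nor the authors' implementation; it is a minimal stand-in, assuming a rule is a set of antecedent literals plus one consequent, that shows the logic: background knowledge is translated through a schema mapping into the vocabulary of the mining results, and discovered rules that match it exactly are set aside as already known. The mapping entries, the phrasing of the background terms and the names `Rule`, `to_data_vocabulary` and `split_known` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: str


# Hypothetical schema mapping: background-knowledge term -> attribute(value) used in the data.
MAPPING: Dict[str, str] = {
    "loan duration more than two years": "Duration(2y+)",
    "loan given in Prague district": "District(Prague)",
    "good loan quality": "Loan Quality(good)",
}


def to_data_vocabulary(rule: Rule, mapping: Dict[str, str]) -> Rule:
    """Rewrite a background-knowledge rule in the vocabulary of the mining results."""
    return Rule(frozenset(mapping[term] for term in rule.antecedent), mapping[rule.consequent])


def split_known(
    discovered: Iterable[Rule], background: Iterable[Rule], mapping: Dict[str, str]
) -> Tuple[List[Rule], List[Rule]]:
    """Separate discovered rules the client already knows from the remaining ones."""
    known_rules = {to_data_vocabulary(r, mapping) for r in background}
    known: List[Rule] = []
    novel: List[Rule] = []
    for rule in discovered:
        (known if rule in known_rules else novel).append(rule)
    return known, novel


background = [
    Rule(frozenset({"loan duration more than two years", "loan given in Prague district"}), "good loan quality"),
]
discovered = [
    Rule(frozenset({"Duration(2y+)", "District(Prague)"}), "Loan Quality(good)"),
    Rule(frozenset({"Sex(M)", "Salary(Low)", "District(Havlickuv Brod)"}), "Quality(Bad)"),
]
known, novel = split_known(discovered, background, MAPPING)
print("already known:", known)
print("left for the analyst:", novel)
```

A fuller filter would probably also treat specializations of a known rule (a larger antecedent with the same consequent) as unsurprising; exact matching keeps the sketch short.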
23. Summary. A methodology for transferring an XML Schema to Topic Maps. An association rule mining ontology based on PMML, easily extensible to other data mining algorithms. Initial attempts to formalize background knowledge. Initial attempts to use Topic Maps for schema mapping. AROn online: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr