Transformation and aggregation preprocessing for top-k recommendation GAP rules induction
Marta Vomlelova, Michal Kopecky and Peter Vojtas
Charles University Prague
Content
• Data
• Task
• Mining – heuristics, domain specific, …
• Some results
• Mining – transferable methods, data aggregations
• Some results
• Oracle DB Data Miner
• Second order logic GAP rules
• Conclusions
RuleML-2015 Challenge Rule-based RS
for the web of data
Task
• Run the Python script for the training data – the intermediate join is big and redundant (for each UserID, MovieID pair the 5003 movie data repeat)
• For each user u, find the 5 movies that best match the user profile: top5(u)
• Submit in CSV format: userId, movieId, score
• Observations
• The score does not affect the system response; only (unordered) sets are compared
• P, R and F@5 are computed between top5(u) and a target of varying size (the estimated average target size is 9.4 or 8, depending on assumptions)
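The set-based comparison described above can be sketched as follows. This is a minimal illustration, assuming the usual definitions of precision, recall and F-measure over unordered sets; the function name `prf_at_k` and the movie ids are hypothetical, and the slides do not say how the challenge averaged the values over users.

```python
def prf_at_k(recommended, target):
    """Precision, recall and F-measure between two unordered sets of movie ids."""
    recommended, target = set(recommended), set(target)
    hits = len(recommended & target)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(target) if target else 0.0
    # F is the harmonic mean of precision and recall; 0 when nothing matches.
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


# Hypothetical user: 5 recommended movies, a target set of size 8, 2 hits.
top5 = {"m1", "m2", "m3", "m4", "m5"}
target = {"m1", "m2", "m10", "m11", "m12", "m13", "m14", "m15"}
p, r, f = prf_at_k(top5, target)
print(p, r, f)
```

Under these definitions, a target set of size 8 caps recall at 5/8 even when all five recommendations are correct, which is why the target size matters for the reported F values.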
Mining – heuristics, domain specific, …
• 5003 DBPedia attributes – most frequent, clusters of properties; tried mining, no relevant results (acquaintance with data)
• Per attribute:
• relative frequency in ratings, NLP extraction
MAKEUP, VISUAL, SMIX, SEDIT, SPIELBERG, NY, CALIF, NOVELS, CAMERON, LA, ARIZONA, WILLIAMS
• KSI: pure first order logic with weighted average, F = 0.05262 (our third)
• 0-1 order agreement with ratings (good properties)
• 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• SCS_CUNI “Spielberg”: F = 0.10681 (our best)
• The script downloaded the table Xratings – DB Ratings gave a surprise
• Disqualified: did not use only the training/test set, F = 0.6987
• Precision: 0.9994 * 5000 = 4997 – three users have a target set of size 4
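The weighted formula 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG can be turned into a top-5 list roughly as below. This is only a sketch: the feature values are invented, Spielberg and Original are assumed to be 0-1 flags, and the helper names `score` and `top_k` are hypothetical rather than taken from the authors' code.

```python
import heapq

# Hypothetical per-movie features. The keys mirror the column names on the
# slide; the values are invented for illustration.
movies = {
    "m1": {"Spielberg": 1, "Original": 0, "BayesAVG": 0.71},
    "m2": {"Spielberg": 0, "Original": 1, "BayesAVG": 0.80},
    "m3": {"Spielberg": 0, "Original": 0, "BayesAVG": 0.95},
    "m4": {"Spielberg": 1, "Original": 1, "BayesAVG": 0.60},
    "m5": {"Spielberg": 0, "Original": 0, "BayesAVG": 0.40},
    "m6": {"Spielberg": 0, "Original": 1, "BayesAVG": 0.55},
    "m7": {"Spielberg": 0, "Original": 0, "BayesAVG": 0.65},
}


def score(features):
    """Weighted sum from the slide: 100*Spielberg + 50*Original + BayesAVG."""
    return (100 * features["Spielberg"]
            + 50 * features["Original"]
            + features["BayesAVG"])


def top_k(candidates, k=5):
    """Return the ids of the k highest-scoring candidate movies, best first."""
    scored = ((score(features), movie_id)
              for movie_id, features in candidates.items())
    return [movie_id for _, movie_id in heapq.nlargest(k, scored)]


top5 = top_k(movies)
print(top5)
```

If BayesAVG stays well below 50, as assumed here, the large weights make the ordering effectively lexicographic: Spielberg first, then Original, with BayesAVG only breaking ties.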
Transferable methods, data aggregations
• GenreMatch (genres in a user's ratings versus movie genres) and a drastically pruned decision tree
• KTIML: data mining combined with first order logic, F = 0.10085 (our second)
RulePreference  Rule
0.11            R1: GoodProperty=1
0.25            R2: 113.5 < CNT < 400
0.29            R3: R1 and R2
0.58            R4: GoodProperty=0 & CNT > 399
0.57            R5: GoodProperty=1 & CNT > 399
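One way to read the rule table is as a lookup from (GoodProperty, CNT) to a preference value. The sketch below assumes that when several rules fire the most specific one wins, and that a movie matched by no rule gets preference 0.0; the slide states neither, and the function name `rule_preference` is hypothetical.

```python
def rule_preference(good_property, cnt):
    """Return (rule name, preference) for the most specific matching rule.

    Assumptions not stated on the slide: the most specific rule wins when
    several fire, and the default preference is 0.0 when none fires.
    """
    if cnt > 399:
        # R4 and R5 split the high-count movies by GoodProperty.
        return ("R5", 0.57) if good_property == 1 else ("R4", 0.58)
    mid_count = 113.5 < cnt < 400
    if good_property == 1 and mid_count:
        return ("R3", 0.29)  # R1 and R2 both hold
    if mid_count:
        return ("R2", 0.25)
    if good_property == 1:
        return ("R1", 0.11)
    return (None, 0.0)


for good_property, cnt in [(1, 500), (0, 500), (1, 200), (0, 200), (1, 50), (0, 50)]:
    print(good_property, cnt, rule_preference(good_property, cnt))
```

In this reading the count dominates: above 399 ratings the preference is about 0.57 to 0.58 regardless of GoodProperty.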
Oracle DB Data Miner
Second order logic GAP rules
• DB aggregations → second order logic
• “Simple” queries can be transformed to rules, e.g.
SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …
… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• corresponds to the GAP rule
• SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3 ← SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3
• Semantics so far:
• 2GAP – facts extended by atomic predicates corresponding to tables resulting from database aggregations, e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
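The rule above can be evaluated in a single pass over annotated facts, as in the following sketch. It assumes each predicate is stored as a table from movie id to annotation, with invented values; `apply_gap_rule` is a hypothetical helper, and the sketch does not attempt the full fixpoint semantics of GAP programs.

```python
# Annotated facts: predicate name -> {movie id -> annotation}.
# On the slides these tables result from database aggregations; the values
# here are invented for illustration.
facts = {
    "SPIELBERG": {"m1": 1, "m2": 0},
    "ORIGINAL": {"m1": 0, "m2": 1},
    "BAYESAVG": {"m1": 0.7, "m2": 0.9},
}


def apply_gap_rule(facts, body, head_annotation):
    """Evaluate head(m): f(x1, ..., xn) <- p1(m): x1 & ... & pn(m): xn."""
    # The rule fires only for movies that have a fact for every body predicate.
    movies = set.intersection(*(set(facts[p]) for p in body))
    return {m: head_annotation(*(facts[p][m] for p in body)) for m in movies}


scs_cuni_movie = apply_gap_rule(
    facts,
    body=["SPIELBERG", "ORIGINAL", "BAYESAVG"],
    head_annotation=lambda x1, x2, x3: 100 * x1 + 50 * x2 + x3,
)
print(sorted(scs_cuni_movie.items()))
```

Because the user variable u does not occur in the rule body, the derived annotation depends only on the movie, so this particular rule ranks movies identically for every user.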
Conclusions
• Data too big for rule induction tools – all processing done in a relational DB
• Transformation via NLP extraction; clustering and importance of attributes
• Database aggregation – CNT, AVG, …
• “Simple” rules (in a second order logic GAP)
• Rules give explanations that are intuitive for humans
• Precision – in the ideal case we gave 75% of users at least one correct recommendation
• Future work – distribution of learning quality across users (not only AVG)
