Challenge@RuleML2015 Transformation and aggregation preprocessing for top-k recommendation GAP rules induction


In this paper we describe the KTIML team's approach to the RuleML 2015 Rule-based Recommender Systems for the Web of Data Challenge Track. The task is to estimate the top 5 movies for each user separately in a semantically enriched MovieLens 1M dataset. We have three results. The best is a domain-specific method along the lines of "recommend the same set of Spielberg movies to all users". Our contributions are domain-independent data mining methods tailored for top-k, which combine second order logic data aggregations and transformations of metadata (especially the 5003 open data attributes) with general GAP rules mining methods.


Transformation and aggregation
preprocessing for top-k recommendation
GAP rules induction
Marta Vomlelova, Michal Kopecky and Peter Vojtas
Charles University Prague

Content
• Data
• Task
• Mining – heuristics, domain specific, …
• Some results
• Mining – transferable methods, data aggregations
• Some results
• Oracle DB Data Miner
• Second order logic GAP rules
• Conclusions
RuleML-2015 Challenge Rule-based RS
for the web of data
Transformation and aggregation preprocessing for top-k
recommendation GAP rules induction

Mining – heuristics, domain specific, …
• 5003 DBPedia attributes – most frequent, clusters of properties, tried
mining, no relevant results (acquaintance with data)
• per attribute:
• relative frequency in ratings, NLP extraction
MAKEUP,VISUAL,SMIX,SEDIT,SPIELBERG,NY,CALIF,NOVELS,CAMERON,LA,ARIZONA,WILLIAMS
• KSI Pure first order logic with weighted average F = 0.05262 (our third)
• 0-1 order agreement with ratings (good properties)
• 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• SCS_CUNI “Spielberg” F = 0.10681 (our best)
• Script downloaded table Xratings DB Ratings gave surprise
• Disqualified: did not use only the training/test set, F = 0.6987
• Precision: 0.9994 * 5000 = 4997 – three users have a target set of size 4
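
The SCS_CUNI scoring formula on this slide can be sketched in Python as follows. This is a minimal illustration, not the team's implementation: the `Movie` record, the sample values, and the helper names `scs_cuni_score` and `top_k` are hypothetical, and `bayes_avg` is assumed to be a precomputed Bayesian average rating on the usual rating scale, so the weights 100 and 50 dominate it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    movie_id: int
    spielberg: int    # 1 if the movie has the SPIELBERG attribute, else 0
    original: int     # 1 if the movie has the ORIGINAL attribute, else 0
    bayes_avg: float  # assumed: precomputed Bayesian average rating


def scs_cuni_score(m: Movie) -> float:
    # Weights as on the slide: 100*Spielberg + 50*Original + BayesAVG
    return 100 * m.spielberg + 50 * m.original + m.bayes_avg


def top_k(movies, k=5):
    # Highest score first; ties are broken by movie_id to keep the order stable
    return sorted(movies, key=lambda m: (-scs_cuni_score(m), m.movie_id))[:k]


# Hypothetical sample data, only to show the ordering effect of the weights
movies = [
    Movie(1, 1, 0, 4.1),
    Movie(2, 0, 1, 4.5),
    Movie(3, 0, 0, 4.9),
    Movie(4, 1, 1, 3.2),
]
ranking = [m.movie_id for m in top_k(movies, k=3)]
```

Because the score does not depend on the user, every user receives the same ranked list, which matches the "same set of movies for all users" description of this result.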

Transferable methods, data aggregations
• GenreMatch (genres in a user's ratings versus movie genres) and drastic
decision tree pruning
• KTIML Data mining combined with first order 0.10085 (our second)
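
The slide does not define GenreMatch beyond "genres in a user's ratings versus movie genres". One possible reading, sketched below under that assumption, builds a genre frequency profile from the movies a user rated and sums the profile over a candidate movie's genres. The function names and sample data are hypothetical.

```python
from collections import Counter


def genre_profile(rated_genres):
    """Relative frequency of each genre across the movies a user rated."""
    counts = Counter(g for genres in rated_genres for g in genres)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def genre_match(profile, movie_genres):
    """Sum of the user's genre frequencies over the candidate movie's genres."""
    return sum(profile.get(g, 0.0) for g in movie_genres)


# Hypothetical user who rated three movies
rated = [{"Drama", "War"}, {"Drama"}, {"Comedy"}]
profile = genre_profile(rated)
score_drama = genre_match(profile, {"Drama"})
score_scifi = genre_match(profile, {"Sci-Fi"})
```

A feature of this kind is an aggregation over the ratings table, so it can be computed once per user and then fed to a decision tree learner as an ordinary attribute.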
Preference  Rule
0.11        R1: GoodProperty=1
0.25        R2: 113.5 < CNT < 400
0.29        R3: R1 and R2
0.58        R4: GoodProperty=0 & CNT > 399
0.57        R5: GoodProperty=1 & CNT > 399
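
The rule table above can be evaluated with a few lines of Python. This is only a sketch: the slide does not say how the preferences of several matching rules are combined, so taking the maximum is an assumption here, and the function names are hypothetical. CNT is treated as a numeric count aggregate.

```python
def rule_preferences(good_property: int, cnt: float) -> dict:
    """Return the preference of every rule in the table whose body holds."""
    r1 = good_property == 1
    r2 = 113.5 < cnt < 400
    fired = {}
    if r1:
        fired["R1"] = 0.11
    if r2:
        fired["R2"] = 0.25
    if r1 and r2:
        fired["R3"] = 0.29
    if good_property == 0 and cnt > 399:
        fired["R4"] = 0.58
    if good_property == 1 and cnt > 399:
        fired["R5"] = 0.57
    return fired


def preference(good_property: int, cnt: float) -> float:
    # Assumption: combine all fired rules by taking the maximum preference
    fired = rule_preferences(good_property, cnt)
    return max(fired.values(), default=0.0)
```

For example, `preference(1, 200)` fires R1, R2 and R3 and yields 0.29, while `preference(0, 500)` fires only R4 and yields 0.58.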

Oracle DB Data Miner

Second order logic GAP rules
• DB aggregations  second order logic
• “simple” queries can be transformed to rules. E.g.
SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …
… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• corresponds to GAP rule
• SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3
  ← SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3
• Semantics so far:
• 2GAP - facts extended by atomic predicates corresponding to tables resulting
from database aggregations e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
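
The correspondence between the query and the rule can be tried out with Python's built-in `sqlite3` module. The sketch below is an illustration under assumptions, not the Oracle setup used by the team: the `Users` table, the `Scored` view, the sample rows, and the way `OrdNr` is derived (one plus the number of higher-scored movies for the same user) are all hypothetical. Only `Ordered_Prediction`, `OrdNr`, `UserID`, `MovieID` and the `Movies` columns come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY,
                     Spielberg INTEGER, Original INTEGER, BayesAVG REAL);
INSERT INTO Movies VALUES (1,1,0,4.1),(2,0,1,4.5),(3,0,0,4.9),
                          (4,1,1,3.2),(5,0,0,3.0),(6,0,1,2.5);
CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
INSERT INTO Users VALUES (10),(11);

-- Head annotation of the GAP rule: 100*x1 + 50*x2 + x3
CREATE VIEW Scored AS
SELECT u.UserID, m.MovieID,
       100*m.Spielberg + 50*m.Original + m.BayesAVG AS Score
FROM Users u CROSS JOIN Movies m;

-- Assumed ordering: rank = 1 + number of better-scored movies per user
CREATE VIEW Ordered_Prediction AS
SELECT s.UserID, s.MovieID,
       1 + (SELECT COUNT(*) FROM Scored t
            WHERE t.UserID = s.UserID AND t.Score > s.Score) AS OrdNr
FROM Scored s;
""")

# The query from the slide, with an ORDER BY added for readable output
rows = conn.execute(
    "SELECT UserID, MovieID, 5 FROM Ordered_Prediction "
    "WHERE OrdNr <= 5 ORDER BY UserID, OrdNr"
).fetchall()
```

Each body atom of the rule (SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)) corresponds to a column that was itself produced by an aggregation or transformation step, which is where the second order flavour comes from.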

Conclusions
• Data too big for rule induction tools – all processing in a relational DB
• Transformation via NLP extraction. Clustering and importance of
attributes
• Database aggregation – CNT, AVG, …
• “simple” rules (in a second order logic GAP)
• Rules give explanation intuitive for humans
• Precision – in the ideal case we gave 75% of users at least one correct
recommendation
• Future work – distribution of learning quality across users (not only
AVG)
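
The two quantities mentioned in the last bullets, average precision of the top-5 lists and the share of users with at least one correct recommendation, can be computed as in the following sketch. It is a generic precision-at-k calculation, not the challenge's official evaluation script; the function names and sample data are hypothetical.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the first k recommended items that are relevant."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k if k else 0.0


def evaluate(recs_by_user, relevant_by_user, k=5):
    users = list(recs_by_user)
    precisions = [
        precision_at_k(recs_by_user[u], relevant_by_user.get(u, set()), k)
        for u in users
    ]
    return {
        "mean_precision": sum(precisions) / len(users),
        # Share of users with at least one correct recommendation
        "users_with_hit": sum(1 for p in precisions if p > 0) / len(users),
        # Per-user values, useful for looking at the distribution, not only AVG
        "per_user": dict(zip(users, precisions)),
    }


# Hypothetical data: the same top-5 list for four users
recs = {u: [1, 2, 3, 4, 5] for u in (1, 2, 3, 4)}
relevant = {1: {1, 9}, 2: {2, 3}, 3: {8}, 4: {5}}
result = evaluate(recs, relevant)
```

In this toy example three of four users get at least one hit, so `users_with_hit` is 0.75 while the mean precision is only about 0.2, which shows why the per-user distribution carries information that the average hides.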

Viewers also liked

Traditionally, nurse call systems in hospitals are rather simple: patients have a button next to their bed to call a nurse. Which specific nurse is called cannot be controlled, as there is no extra information available. This is different for solutions based on semantic knowledge: if the state of care givers (busy or free), their current position, and for example their skills are known, a system can always choose the best suitable nurse for a call. In this paper we describe such a semantic nurse call system implemented using the EYE reasoner and Notation3 rules. The system is able to perform OWL-RL reasoning. Additionally, we use rules to implement complex decision trees. We compare our solution to an implementation using OWL-DL, the Pellet reasoner, and SPARQL queries. We show that our purely rule-based approach gives promising results. Further improvements will lead to a mature product which will significantly change the organization of modern hospitals.

RuleML 2015: Ontology Reasoning using Rules in an eHealth Context

RuleML

OWL-XML-Summer-School-09

Duncan Hull

Semantic Sensor Web is a new trend of research integrating Semantic Web technologies with sensor networks. It uses Semantic Web standards to describe both the data produced by the sensors, but also the sensors and their networks, which enables interoperability of sensor networks, and provides a way to formally analyze and reason about these networks. Since sensors produce data at a very high rate, they require solutions to reason efficiently about what complex events occur based on the data captured. In this paper we propose T Rev as a solution to combine the detection of complex events with the execution of transactions for these domains. T Rev is an abstract logic to model and execute reactive transactions. The logic is parametric on a pair of oracles defining the basic primitives of the domain, which makes it suitable for a wide range of applications. In this paper we provide oracle instantiations combining RDF/OWL and relational database semantics for T Rev. Afterwards, based on these oracles, we illustrate how T Rev can be useful for these domains.

RuleML2015: How to combine event stream reasoning with transactions for the...

RuleML

Since the development of Notation3 Logic, several years have passed in which the theory has been refined and used in practice by different reasoning engines such as cwm, FuXi or EYE. Nevertheless, a clear model-theoretic definition of its semantics is still missing. This leaves room for individual interpretations and renders it difficult to make clear statements about its relation to other logics such as DL or FOL or even about such basic concepts as correctness. In this paper we address one of the main open challenges: the formalization of implicit quantification. We point out how the interpretation of implicit quantifiers differs in two of the above mentioned reasoning engines and how the specification, proposed in the W3C team submission, could be formalized. Our formalization is then put into context by integrating it into a model-theoretic definition of the whole language. We finish our contribution by arguing why universal quantification should be handled differently than currently prescribed.

RuleML 2015: Semantics of Notation3 Logic: A Solution for Implicit Quantifica...

RuleML

History of Induction and Recursion B

Damien MacFarland

Induction and Decision Tree Learning (Part 1)

butest

Big data, with its four main characteristics (Volume, Velocity, Variety, and Veracity) pose challenges to the gathering, management, analytics, and visualization of events. These very same four characteristics, however, also hold a great promise in unlocking the story behind data. In this talk, we focus on the observation that event creation is guided by processes. For example, GPS information, emitted by buses in an urban setting follow the bus scheduled route. Also, RTLS information about the whereabouts of patients and nurses in a hospital is guided by the predefined schedule of work. With this observation at hand, we thoroughly seek a method for mining, not the data, but rather the rules that guide data creation and show how, by knowing such rules, big data tasks become more efficient and more effective. In particular, we demonstrate how, by knowing the rules that govern event creation, we can detect complex events sooner and make use of historical data to predict future behaviors.

RuleML 2015: When Processes Rule Events

RuleML

11X1 T14 08 mathematical induction 1 (2011)

Nigel Simmons

Augmenting a feature set using mappings to the Web of data is an up-and-coming way to enrich data in the original dataset. Those enrichments are valuable especially for the recent preference learning algorithms and recommender systems. In this paper, we describe the process of mapping and augmenting the movie ratings dataset Movi- eTweetings from the perspective of RecSysRules 2015 Challenge. The ad-hoc queries to DBpedia are used as an underlying concept. To the best of our knowledge, there is no existing mapping dataset of movies for MovieTweetings.We also provide a brief discussion about the benets of the augmented feature set for an elementary rule-based representation of the user preferences.

Challenge@rule ml2015 rule based recommender systems for the Web of Data

RuleML

Java and SPARQL

Raji Ghawi

An Introduction to the Jena API

Craig Trim

Java and OWL

Raji Ghawi

Ontology development in protégé-آنتولوژی در پروتوغه

sadegh salehi

Iteration, induction, and recursion

Mohammed Hussein

Math induction principle (slides)

IIUM

Protege tutorial

Comércio de Portugal

Mathematical induction

sonia -

Mathematical induction

Sman Abbasi

5.4 mathematical induction

math260

Principle of mathematical induction

Kriti Varshney

Viewers also liked (20)

RuleML 2015: Ontology Reasoning using Rules in an eHealth Context

OWL-XML-Summer-School-09

RuleML2015: How to combine event stream reasoning with transactions for the...

RuleML 2015: Semantics of Notation3 Logic: A Solution for Implicit Quantifica...

History of Induction and Recursion B

Induction and Decision Tree Learning (Part 1)

RuleML 2015: When Processes Rule Events

11X1 T14 08 mathematical induction 1 (2011)

Challenge@rule ml2015 rule based recommender systems for the Web of Data

Java and SPARQL

An Introduction to the Jena API

Java and OWL

Ontology development in protégé-آنتولوژی در پروتوغه

Iteration, induction, and recursion

Math induction principle (slides)

Protege tutorial

Mathematical induction

5.4 mathematical induction

Principle of mathematical induction

Similar to Challenge@RuleML2015 Transformation and aggregation preprocessing for top-k recommendation GAP rules induction

We live in a profoundly connected world. From supply chains to payment networks to digital business and complex portfolios, our ability to understand and navigate not just data, but relationships inside the data, play an increasingly important role in all aspects of business. Highly connected value chains that generate massive volumes of connected data create an opportunity for graph analysis, which Gartner describes as "the single most single most effective competitive differentiator for organizations pursuing data-driven operations and decisions." This talk will introduce the power of graph databases and share how the latest IBM Power Systems offerings featuring the POWER8 processor and CAPI-attached Flash enable unique scaling, performance and price-performance advantages for Neo4j workloads.

Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j

Neo4j

Guiding through a typical Machine Learning Pipeline

Michael Gerke

Watch full webinar here: https://bit.ly/2JzypTx There are myths about data virtualization that are based on misconceptions and even falsehoods. These myths can confuse and worry people who - quite rightly - look at data virtualization as a critical technology for a modern, agile data architecture. We've decided that we need to set the record straight, so we put together this webinar series. It's time to bust a few myths! In the first webinar of the series, we’ll be busting the 'performance' myth. “What about performance?” is usually the first question that we get when talking to people about data virtualization. After all, the data virtualization layer sits between you and your data, so how does this affect the performance of your queries? Sometimes the myth is perpetuated by people with alternative solutions…the ‘Put all your data in our Cloud and everything will be fine. Data virtualization? Nah, you don’t need that! It can't handle big queries anyway,’ type of thing. Join us for this webinar to look at the basis of the 'performance' myth and examine whether there is any underlying truth to it.

Can data virtualization uphold performance with complex queries?

Denodo

Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong

Ceph Community

Trafficshifting: Avoiding Disasters & Improving Performance at Scale

APNIC

Using Bayesian Optimization to Tune Machine Learning Models: In this talk we briefly introduce Bayesian Global Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different parameters is time-consuming or expensive. We will motivate the problem and give example applications. We will also talk about our development of a robust benchmark suite for our algorithms including test selection, metric design, infrastructure architecture, visualization, and comparison to other standard and open source methods. We will discuss how this evaluation framework empowers our research engineers to confidently and quickly make changes to our core optimization engine. We will end with an in-depth example of using these methods to tune the features and hyperparameters of a real world problem and give several real world applications.

Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016

MLconf

MLConf 2016 SigOpt Talk by Scott Clark

SigOpt

Data Platform Architecture Principles and Evaluation Criteria

ScyllaDB

BSSML17 - Deepnets

BigML, Inc

LinkedIn serves traffic for its 467 million members from four data centers and multiple PoPs spread geographically around the world. Serving live traffic from from many places at the same time has taken us from a disaster recovery model to a disaster avoidance model where we can take an unhealthy data center or PoP out of rotation and redistribute its traffic to a healthy one within minutes, with virtually no visible impact to users. The geographical distribution of our infrastructure also allows us to optimize the end-user's experience by geo routing users to the best possible PoP and datacenter. This talk provide details on how LinkedIn shifts traffic between its PoPs and data centers to provide the best possible performance and availability for its members. We will also touch on the complexities of performance in APAC, how IPv6 is helping our members and how LinkedIn stress tests data centers verify its disaster recovery capabilities.

APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...

Michael Kehoe

My sql cluster case study apr16

Sumi Ryu

Benchmarking

Steve Loughran

Policy 2012 presentation

bdemchak

AlphaPy

Robert Scott

PostgreSQL as a Big Data Platform

Chris Travers

What exactly is a Data Warehouse? Termed as a special type of database, a Data Warehouse is used for storing large amounts of data, such as analytics, historical, or customer data, which can be leveraged to build large reports and also ensure data mining against it.@ http://maxonlinetraining.com/why-is-data-warehousing-online-training-important/ What is Data mining? The process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions’ Call us at For any queries, please contact: +1 940 440 8084 / +91 953 383 7156 TODAY to join our Online IT Training course & find out how Max Online Training.com can help you embark on an exciting and lucrative IT career. TODAY to join our Online IT Training course & find out how Max Online Training.com can help you embark on an exciting and lucrative IT career. Visit www.maxonlinetraining.com For Complete Course Overview and to a book @https://goo.gl/QbTVal

Difference between data warehouse and data mining

maxonlinetr

PLNOG 3: John Evans - Best Practices in Network Planning

PROIDEA

In this paper we present the initial results of our work to run BigBench on Spark. First, we evaluated the data scalability behavior of the existing MapReduce implementation of BigBench. Next, we executed the group of 14 pure HiveQL queries on Spark SQL and compared the results with the respective Hive results. Our experiments show that: (1) for both MapReduce and Spark SQL, BigBench queries perform with the increase of the data size on average better than the linear scaling behavior and (2) pure HiveQL queries perform faster on Spark SQL than on Hive. http://clds.sdsc.edu/wbdb2015.ca/program

WBDB 2015 Performance Evaluation of Spark SQL using BigBench

t_ivanov

Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. In this tutorial, we review existing approaches on automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present pros and cons of each approach. We also highlight real-world applications and systems and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics

Automatic Parameter Tuning for Databases and Big Data Systems

Jiaheng Lu

Geo2tag performance evaluation, Zaslavsky, Krinkin

OSLL

Similar to Challenge@RuleML2015 Transformation and aggregation preprocessing for top-k recommendation GAP rules induction (20)

Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j

Guiding through a typical Machine Learning Pipeline

Can data virtualization uphold performance with complex queries?

Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong

Trafficshifting: Avoiding Disasters & Improving Performance at Scale

Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016

MLConf 2016 SigOpt Talk by Scott Clark

Data Platform Architecture Principles and Evaluation Criteria

BSSML17 - Deepnets

APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...

My sql cluster case study apr16

Benchmarking

Policy 2012 presentation

AlphaPy

PostgreSQL as a Big Data Platform

Difference between data warehouse and data mining

PLNOG 3: John Evans - Best Practices in Network Planning

WBDB 2015 Performance Evaluation of Spark SQL using BigBench

Automatic Parameter Tuning for Databases and Big Data Systems

Geo2tag performance evaluation, Zaslavsky, Krinkin

More from RuleML

Aggregates in Recursion: Issues and Solutions

RuleML

TeleoR is a major extension of Nilsson’s Teleo-Reactive (TR) rule based robotic agent programming language. Programs comprise sequences of guarded action rules grouped into parameterised procedures. The guards are deductive queries to a set of rapidly changing percept and other dynamic facts in the agent’s Belief Store. The actions are either tuples of primitive actions for external robotic resources, to be executed in parallel, or a single call to a TeleoR procedure, which can be a recursive call. The guards form a sub-goal tree routed at the guard of the first rule. When partially instantiated by the arguments of some call, this guard is the goal of the call. TeleoR extends TR in being typed and higher order, with extra forms of rules that allow finer control over sub-goal achieving task behaviour. Its Belief Store inference language is a higher order logic+function rule language, QuLog. QuLog also has action rules and primitive actions for updating the Belief Store and sending messages. The action of a TeleoR rule may be a combination of the action of a TR rule and a sequence of QuLog actions. TeleoR’s most important extension of TR is the concept of task atomic procedures, some arguments of which belong to a special but application specific resource type. This allows the high level programming of multitasking agents using multiple robotic resources. When two or more tasks need to use overlapping resources their use is alternated between task atomic calls in each task, in such a way that there is no interference, deadlock or task starvation. This multi-task programming is illustrated by giving the essentials of a program for an agent controlling two robotic arms in multiple block tower assembly tasks. It has been used to control both a Python interactive graphical simulation and a Baxter robot building real block towers, in each case with help or hindrance from a human. The arms move in parallel whenever it can be done without risk of clashing.

A software agent controlling 2 robot arms in co-operating concurrent tasks

RuleML

The Decision Management (DM) Community Challenge of March 2016 consisted of creating decision models from ten English Port Clearance Rules inspired by the International Ship and Port Facility Security Code. Based on an analysis of the moderately controlled English rules and current online solutions, we formalized the rules in PositionalSlotted, Object-Applicative (PSOA) RuleML. This resulted in: (1) a reordering, subgrouping, and explanation of the original rules on the specialized decision-model expressiveness level of (deontically contextualized) near-Datalog, non-recursive, near-deterministic, ground-queried, and non-subpredicating rules; (2) an object-relational PSOA RuleML rulebase which was complemented by facts to form a knowledge base queried in PSOATransRun for decision-making. Thus, the DM and logical formalizations get connected, which leads to generalized decision models with Hornlog, recursive, non-deterministic, non-ground-queried, and subpredicating rules.

Port Clearance Rules in PSOA RuleML: From Controlled-English Regulation to Ob...

RuleML

In order to enhance interoperability and productivity in the develop-ment of situation-aware applications for disaster management, proper mecha-nisms and guidelines are required. They must address the lack of semantics in modelling emergency situations. In addition, the ever-changing and unpredicta-ble nature of disaster scenarios present challenges for information processing and collaboration. This paper proposes a framework that combines the follow-ing elements: (i) a foundational ontology for temporal conceptualization; (ii) well-founded specifications of structural and behavioral models; (iii) a CEP en-gine based on a distributed rule-based platform for situation management; (iv) a model-driven approach. We illustrate the operation of the framework with a scenario for monitoring tuberculosis epidemy.

Challenge@RuleML2015 Developing Situation-Aware Applications for Disaster Man...

RuleML

Symbolic Machine Learning systems and applications, especially when applied to real-world domains, must face the problem of concepts that cannot be captured by a single definition, but require several alternate definitions, each of which covers part of the full concept extension. This problem is particularly relevant for incremental systems, where progressive covering approaches are not applicable, and the learning and refinement of the various definitions is interleaved during the learning phase. In these systems, not only the learned model depends on the order in which the examples are provided, but it also depends on the choice of the specific definition to be refined. This paper proposes different strategies for determining the order in which the alternate definitions of a concept should be considered in a generalization step, and evaluates their performance on a real-world domain dataset.

Rule Generalization Strategies in Incremental Learning of Disjunctive Concepts

RuleML

Constraint Handling Rules (CHR) is both a versatile theoretical formalism based on logic and an efficient practical high-level programming language based on rules and constraints. Procedural knowledge is often expressed by if-then rules, events and actions are related by reaction rules, change is expressed by update rules. Algorithms are often specified using inference rules, rewrite rules, transition rules, sequents, proof rules, or logical axioms. All these kinds of rules can be directly written in CHR. The clean logical semantics of CHR facilitates non-trivial program analysis and transformation. About a dozen implementations of CHR exist in Prolog, Haskell, Java, Javascript and C. Some of them allow to apply millions of rules per second. CHR is also available as WebCHR for online experimentation with more than 40 example programs. More than 200 academic and industrial projects worldwide use CHR, and about 2000 research papers reference it.

RuleML 2015 Constraint Handling Rules - What Else?

RuleML

The traditional semantics for First Order Logic (sometimes called Tarskian semantics) is based on the notion of interpretations of constants. Herbrand semantics is an alternative semantics based directly on truth assignments for ground sentences rather than interpretations of constants. Herbrand semantics is simpler and more intuitive than Tarskian semantics; and, consequently, it is easier to teach and learn. Moreover, it is more expressive. For example, while it is not possible to finitely axiomatize integer arithmetic with Tarskian semantics, this can be done easily with Herbrand Semantics. The downside is a loss of some common logical properties, such as compactness and completeness. However, there is no loss of inferential power. Anything that can be proved according to Tarskian semantics can also be proved according to Herbrand semantics. In this presentation, we define Herbrand semantics; we look at the implications for research on logic and rules systems and automated reasoning; and and we assess the potential for popularizing logic.

RuleML2015 The Herbrand Manifesto - Thinking Inside the Box

RuleML

RuleML2015 PSOA RuleML: Integrated Object-Relational Data and Rules

RuleML

Industry@RuleML2015: Norwegian State of Estate A Reporting Service for the St...

RuleML

A Service for Improving the Assignments of Common Agriculture Policy Funds to...

RuleML

UML class diagrams (UCDs) are a widely adopted formalism for modeling the intensional structure of a software system. Although UCDs are typically guiding the implementation of a system, it is common in practice that developers need to recover the class diagram from an implemented system. This process is known as reverse engineering. A fundamental property of reverse engineered (or simply re-engineered) UCDs is consistency, showing that the system is realizable in practice. In this work, we investigate the consistency of re-engineered UCDs, and we show is pspace-complete. The upper bound is obtained by exploiting algorithmic techniques developed for conjunctive query answering under guarded Datalog+/-, that is, a key member of the Datalog+/- family of KR languages, while the lower bound is obtained by simulating the behavior of a polynomial space Turing machine.

Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-

RuleML

It has been acknowledged that emerging Web applications require features that are not available in standard rule languages like Datalog or Answer Set Programming (ASP), e.g., they are not powerful enough to deal with anonymous values (objects that are not explicitly mentioned in the data but whose existence is implied by the background knowledge). In this paper, we introduce a new rule language based on ASP extended with function symbols, which can be used to reason about anonymous values. In particular, we define binary frontier-guarded programs (BFG programs) that allow for disjunction, function symbols, and negation under the stable model semantics. In order to ensure decidability, BFG programs are syntactically restricted by allowing at most binary predicates and by requiring rules to be frontier-guarded. BFG programs are expressive enough to simulate ontologies expressed in popular Description Logics (DLs), capture their recent non-monotonic extensions, and can simulate conjunctive query answering over many standard DLs. We provide an elegant automata-based algorithm to reason in BFG programs, which yields a 3ExpTime upper bound for reasoning tasks like deciding consistency or cautious entailment. Due to existing results, these problems are known to be 2ExpTime-hard.

RuleML2015: Binary Frontier-guarded ASP with Function Symbols

RuleML

API4KP (API for Knowledge Platforms) is a standard development effort that targets the basic administration services as well as the retrieval, modification and processing of expressions in machine-readable languages, including but not limited to knowledge representation and reasoning (KRR) languages, within heterogeneous (multi-language, multi-nature) knowledge platforms. KRR languages of concern in this paper include but are not limited to RDF(S), OWL, RuleML and Common Logic, and the knowledge platforms may support one or several of these. Additional languages are integrated using mappings into KRR languages. A general notion of structure for knowledge sources is developed using monads. The presented API4KP metamodel, in the form of an OWL ontology, provides the foundation of an abstract syntax for communications about knowledge sources and environments, including a classification of knowledge source by mutability, structure, and an abstraction hierarchy as well as the use of performatives (inform, query, ...), languages, logics, dialects, formats and lineage. Finally, the metamodel provides a classification of operations on knowledge sources and environments which may be used for requests (message-passing).

RuleML2015: API4KP Metamodel: A Meta-API for Heterogeneous Knowledge Platforms

RuleML

We present Dexter, a browser-based, domain-independent structured-data explorer for users. Dexter enables users to explore data from multiple local and Web-accessible heterogeneous data sources such as files, Web pages, APIs and databases in the form of tables. Dexter’s users can also compute tables from existing ones as well as validate the tables (base or computed) through declarative rules. Dexter enables users to perform ad hoc queries over their tables with higher expressivity than that is supported by the underlying data sources. Dexter evaluates a user’s query on the client side while evaluating sub-queries on remote sources whenever possible. Dexter also allows users to visualize and share tables, and export (e.g., in JSON, plain XML, and RuleML) tables along with their computation rules. Dexter has been tested for a variety of data sets from domains such as government and apparel manufacturing. Dexter is available online at http://dexter.stanford.edu.

RuleML2015: Rule-Based Exploration of Structured Data in the Browser

RuleML

Data quality assessment and data cleaning are context dependent activities. Starting from this observation, in previous work a context model for the assessment of the quality of a database was proposed. A context takes the form of a possibly virtual database or a data integration system into which the database under assessment is mapped, for additional analysis, processing, and quality data extraction. In this work, we extend contexts with dimensions, and by doing so, multidimensional data quality assessment becomes possible. At the core of multidimensional contexts we find ontologies written as Datalog ± programs with provably good properties in terms of query answering. We use this language to represent dimension hierarchies, dimensional constraints, dimensional rules, and specifying quality data. Query answering relies on and triggers dimensional navigation, and becomes an important tool for the extraction of quality data.

RuleML2015: Ontology-Based Multidimensional Contexts with Applications to Qua...

RuleML

Context-aware systems have gained huge popularity in recent years due to the rapid evolution of personal mobile devices. Equipped with a variety of sensors, such devices are sources of a lot of valuable information that allows the system to act in an intelligent way. However, the certainty and presence of this information may depend on many factors, such as measurement accuracy or sensor availability. Such a dynamic nature of information may cause the system not to work properly or not to work at all. To make a context-aware system robust, it should be provided with an uncertainty handling mechanism. Several approaches have been developed to handle uncertainty in context knowledge bases, including probabilistic reasoning, fuzzy logic, and certainty factors. In this paper, we present a representation method that combines the strengths of rules based on attributive logic and Bayesian networks. Such a combination allows the conditional probability distributions of random variables to be encoded efficiently into a reasoning structure called XTT2. This provides a method for building hybrid context-aware systems that allows for robust inference in uncertain knowledge bases.

RuleML2015: Compact representation of conditional probability for rule-based...

RuleML

We provide a general framework for learning characterization rules of a set of objects in Geographic Information Systems (GIS), relying on the definition of distance quantified paths. Such expressions specify how to navigate between the different layers of the GIS, starting from the target set of objects to characterize. We have defined a generality relation between quantified paths and proved that it is monotone with respect to the notion of coverage, which allows us to develop an interactive and effective algorithm to explore the search space of possible rules. We describe GISMiner, an interactive system that we have developed based on our framework. Finally, we present our experimental results from a real GIS about mineral exploration.

RuleML2015: Learning Characteristic Rules in Geographic Information Systems

RuleML

Over the last two decades, frequent itemset and association rule mining has attracted huge attention from the scientific community, which has resulted in numerous publications, models, algorithms, and optimizations of basic frameworks. In this paper we introduce an extension of the frequent itemset framework, called substitutive itemsets. Substitutive itemsets allow the discovery of equivalences between items, i.e., they represent pairs of items that can be used interchangeably in many contexts. In the paper we present basic notions pertaining to substitutive itemsets, describe the implementation of the proposed method available as a RapidMiner plugin, and illustrate the use of the framework for mining substitutive object properties in Linked Data.

RuleML2015: Using Substitutive Itemset Mining Framework for Finding Synonymou...

RuleML

The Semantic Web uses ontologies to associate meaning with Web content so machines can process it. One problem inherent to this approach is that, as its popularity increases, there is an ever-growing number of ontologies available to be used, leading to difficulties in choosing appropriate ones. With that in mind, we created a system that allows users to evaluate ontologies/rules. It is composed of the Metadata description For Ontologies/Rules (MetaFOR), an ontology in OWL, and a tool to convert any OWL ontology to MetaFOR. With the MetaFOR version of an ontology, it is possible to use SWRL rules to identify anomalies in it. These can be problems already documented in the literature or user-defined ones. SWRL is familiar to users, so it is easier to define new project-specific anomalies. We present a case study where the system detects nine problems from the literature and two user-defined ones.

RuleML2015: User Extensible System to Identify Problems in OWL Ontologies and...

RuleML

Access control systems often use rule-based frameworks to express access policies. These frameworks not only simplify the representation of policies, but also provide reasoning capabilities that can be used to verify the policies. In this work, we propose to use defeasible reasoning to simplify the specification of role-based access control policies and make them modular and more robust. We use the Flora-2 rule-based reasoner for representing a role-based access control policy. Our early experiments show that the wide range of features provided by Flora-2 greatly simplifies the task of building the requisite ontologies and the reasoning components for such access control systems.

RuleML2015: Representing Flexible Role-Based Access Control Policies Using Ob...

RuleML
