This talk will cover various aspects of Logic Programming. We examine Logic Programming in the contexts of Programming Languages, Mathematical Logic and Machine Learning.
We will start with an introduction to Prolog and metaprogramming in Prolog. We will also discuss how miniKanren and Core.Logic differ from Prolog while maintaining the paradigms of logic programming.
We will then cover the Unification Algorithm in depth and examine the mathematical motivations which are rooted in Skolem Normal Form. We will describe the process of converting a statement in first order logic to clausal form logic. We will also discuss the applications of the Unification Algorithm to automated theorem proving and type inferencing.
Finally we will look at the role of Prolog in the context of Machine Learning. This is known as Inductive Logic Programming. In that context we will briefly review Decision Tree Learning and its relationship to ILP. We will then examine Sequential Covering Algorithms for learning clauses in Propositional Calculus and then the more general FOIL algorithm for learning sets of Horn clauses in First Order Predicate Calculus. Examples will be given in both Common Lisp and Clojure for these algorithms.
Pierre de Lacaze has over 20 years’ experience with Lisp and AI based technologies. He holds a Bachelor of Science in Applied Mathematics and Computer Science and a Master’s Degree in Computer Science. He is the president of LispNYC.org.
2. The Logic Programming Model
• Logic Programming is an abstract model of computation.
• Lambda Calculus is another abstract model of
computation.
• Prolog is a particular implementation of the logic
programming model in much the same way that Clojure
and Haskell are particular implementations of the lambda
calculus.
• OPS5 is another implementation of the logic programming
model.
• The use of mathematical logic to represent and execute
computer programs is also a feature of the lambda
calculus.
• Prolog is classified as a logic programming language (Wikipedia)
3. Prolog Introduction
• Prolog is a declarative language
• Prolog is a logic programming language
• Invented in 1972 by Colmerauer & Roussel
– Edinburgh Prolog
– Marseilles Prolog
• Initially used for Natural Language Processing
• Programs consist of facts & rules
• A fact is a clause in FOPC (First Order Predicate Calculus)
• A rule is an inference: B ← A1,…,An
• Use queries to run programs and perform retrievals
4. Logic Programming Paradigm
• A program is a logical description of your
problem from which a solution is logically
derivable
• The execution of a program is very much like
the mathematical proof of a theorem
• Where’s my program?
– N! is (N-1)! times N
– 0! is 1
5. Horn Clauses
• A Horn clause is a disjunction of literals with at most one positive literal
• In mathematical logic and logic programming, a Horn clause is a logical
formula of a particular rule-like form which gives it useful properties for
use in logic programming, formal specification, and model theory.
• Horn clauses are named for the logician Alfred Horn (1951)
• (u ← p ∧ q ∧ ... ∧ t) is equivalent to (u ∨ ¬p ∨ ¬q ∨ ... ∨ ¬t)
• In the non-propositional case, all variables in a clause are implicitly
universally quantified with scope the entire clause. Thus, for example:
1. ¬ human(X) ∨ mortal(X) stands for:
2. ∀X( ¬ human(X) ∨ mortal(X) ) which is logically equivalent to:
3. ∀X ( human(X) → mortal(X) )
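The equivalence between the rule form and the clause form can be checked mechanically. A minimal Python sketch (truth-table enumeration over three propositional variables; the function names are illustrative):

```python
from itertools import product

def rule(p, q, u):
    # The rule form: (p and q) implies u, i.e. u <- p ^ q.
    return (not (p and q)) or u

def clause(p, q, u):
    # The clause form: u v ~p v ~q.
    return u or (not p) or (not q)

# Verify both forms agree on every truth assignment.
agree = all(rule(p, q, u) == clause(p, q, u)
            for p, q, u in product([True, False], repeat=3))
print(agree)  # True
```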
6. Warren Abstract Machine
• In 1983, David H. D. Warren designed an abstract machine for the execution of
Prolog consisting of a memory architecture and an instruction set.
• This design became known as the Warren Abstract Machine (WAM) and has
become the de facto standard target for Prolog compilers.
• Prolog code is reasonably easy to translate to WAM instructions which can be
more efficiently interpreted.
• Also, subsequent code improvements and compilation to native code are often
easier to perform on the more low-level representation.
• In order to write efficient Prolog programs, a basic understanding of how the WAM
works can be advantageous.
• Some of the most important WAM concepts are first argument indexing and its
relation to choice-points, tail call optimization and memory reclamation on failure
• http://en.wikipedia.org/wiki/Warren_Abstract_Machine
7. Prolog Facts
• Facts: <predicate>(<arg1>,…,<argN>)
• Example: likes(mary, john)
• Constants must be in lowercase
• Variables must be in uppercase or start with
an underscore.
• Example: eats(mikey, X)
• Example: believes(peter, likes(mary, john))
8. Basic Inferences & Variables
likes(john, cheese)
likes(mary, cheese)
likes(bob, meat)
similar(X,Y) :- likes(X,Z), likes(Y,Z)
Note: You can use [‘<filename/pathname>’]. to compile and load files
GNU Prolog 1.4.1
By Daniel Diaz
Copyright (C) 1999-2012 Daniel Diaz
| ?- ['C:/Projects/Languages/code/Prolog/similar.pl'].
compiling C:/Projects/Languages/code/Prolog/similar.pl for byte code...
C:/Projects/Languages/code/Prolog/similar.pl compiled, 4 lines read - 935
bytes written,
(16 ms) yes
9. Filling in the Blanks
| ?- similar(john, mary).
yes
| ?- similar(john, bob).
no
| ?- similar(mary, X).
X = john ?
yes
| ?- similar(X, Y).
X = john
Y = john ? ;
X = john
Y = mary ? ;
X = mary
Y = john ? ;
X = mary
Y = mary ?
Note: you can use ; and a to get the next answer or all answers
10. Unification
• The Unification Algorithm is a famous algorithm from the field of AI,
often used in theorem proving, game playing, planning, etc…
• It can loosely be thought of as an algorithm that tries to make
two non-ground terms the same.
• P(X, 2) = P(1, Y) ⇒ X=1 & Y=2 ⇒ P(1, 2)
• P(X, X) = P(Y, 5) ⇒ X=5 & Y=5 ⇒ P(5, 5)
• P(X, Y) = P(2, Z) ⇒ X=2 & Y=Z ⇒ P(2, Z)
• See Artificial Intelligence (Russell & Norvig)
11. Prolog Rules
• Rules: <head> :- <body>
• Head: Single clause typically with variables
• Body: Conjunction of goals with variables
• Examples:
ancestor(X,Y) :- parent(X,Y)
ancestor(X,Y) :- parent(X,Z), parent(Z,Y)
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y)
14. Using Rules in Both Directions
% Find all ancestors
| ?- ancestor(X, p4).
X = p3 ? ;
X = p1 ? ;
X = p2 ? ;
no
% Find all descendants
| ?- ancestor(p1, X).
X = p2 ? ;
X = p3 ? ;
X = p4 ? ;
no
15. Accessing Elements of a List
| ?- [1, 2, 3] = [X | Y].
X = 1
Y = [2,3]
| ?- [1, 2, 3] = [_, X | Y].
X = 2
Y = [3]
16. Lists and Math (1)
count(0, []).
count(Count, [Head|Tail]) :-
count(TailCount, Tail), Count is TailCount + 1.
sum(0, []).
sum(Total, [Head|Tail]) :-
sum(Sum, Tail), Total is Head + Sum.
average(Average, List) :-
sum(Sum, List),
count(Count, List),
Average is Sum/Count.
19. Solving Sudoku (3)
• Finite Domain variables: A new type of data is introduced: FD
variables which can only take values in their domains. The
initial domain of an FD variable is 0..fd_max_integer where
fd_max_integer represents the greatest value that any FD
variable can take.
• fd_domain(Board, 1, 4).
Used to specify the range of values of each Sudoku cell.
• fd_all_different(X).
Used to specify that all elements in the list must have distinct values.
20. Structure Inspection
| ?- functor(father(tom, harry), P, A).
A = 2
P = father
yes
| ?- arg(1, father(tom, harry), A1).
A1 = tom
yes
| ?- arg(2, father(tom, harry), A2).
A2 = harry
yes
| ?- functor(X, father, 2).
X = father(_, _)
yes
| ?- father(tom, harry) =.. [X, Y, Z].
X = father
Y = tom
Z = harry
yes
| ?- X =.. [father, tom, harry].
X = father(tom, harry)
yes
| ?- X =.. [father, tom, harry], assertz(X).
X = father(tom, harry)
yes
| ?- father(tom, harry).
yes
functor, arg and =..
21. Meta-Logical Predicates
• Outside scope of first-order logic
• Query and affect the state of the proof
• Treat variables as objects
• Convert data structures to goals
• Type Predicates:
• var(<term>)
• nonvar(<term>)
• Variables as objects: freeze & melt
• Dynamically Affecting the Knowledge Base
• assert(<goal>)
• retract(<goal>)
• The Meta-Variable Facility: call(<goal>)
• Memoization: lemma(<goal>)
22. OPS5
• OPS5 is a rule-based or production system computer language, notable as the
first such language to be used in a successful expert system, the R1/XCON
system used to configure VAX computers.
• The OPS family was developed in the late 1970s by Charles Forgy while at
Carnegie Mellon University.
• Allen Newell's research group in artificial intelligence had been working on
production systems for some time,
• Forgy's implementation, based on his Rete algorithm, was especially efficient,
sufficiently so that it was possible to scale up to larger problems involving
hundreds or thousands of rules.
• OPS5 uses a forward chaining inference engine.
• Programs execute by scanning "working memory elements" (which are
vaguely object-like, with classes and attributes) looking for matches with the
rules in "production memory".
• Rules have actions that may modify or remove the matched element, create
new ones, perform side effects such as output, and so forth. Execution
continues until no more matches can be found.
23. miniKanren
• miniKanren is an embedded Domain Specific Language for logic programming
• miniKanren is a simplified version of KANREN.
• First Introduced in The Reasoned Schemer by Daniel P. Friedman, William E. Byrd and Oleg Kiselyov
(MIT Press, 2005)
• KANREN: is a declarative logic programming system with first-class relations, embedded in a pure
functional subset of Scheme.
• KANREN has a set-theoretical semantics, true unions, fair scheduling, first-class relations, lexically-
scoped logical variables, depth-first and iterative deepening strategies. The system achieves high
performance and expressivity without cuts.
• The core miniKanren language is very simple, with only three logical operators and one interface
operator.
• miniKanren has been implemented in a growing number of host languages, including Scheme,
Racket, Clojure, Haskell, Python, JavaScript, Scala, Ruby, OCaml, and PHP, among many other
languages.
• miniKanren is designed to be easily modified and extended; extensions include Constraint Logic
Programming, probabilistic logic programming, nominal logic programming, and tabling.
• http://minikanren.org/
24. Core.Logic
• Core.Logic is a Clojure based implementation of miniKanren written by David
Nolen.
• https://github.com/clojure/core.logic
• Core.logic supports the following Logic Programming paradigms
– CLP: Constraint Logic Programming
– CLP(FD): constraint logic programming over finite domains
– Tabling: Certain kinds of logic programs that would not terminate in Prolog
will terminate in core.logic if you create a tabled goal.
– Nominal Logic Programming: Nominal logic programming makes it easier to
write programs that must reason about binding and scope.
25. Unification
• Original Algorithm: Robinson (1965)
• Efficient Algorithm: Montanari (1982)
• Intuition: Make two terms the same
• Input: Two terms.
• Output: A set of bindings (aka substitutions)
• Example: P(X, 2) = P(1, Y) ⇒ {X=1, Y=2}
26. Unification Algorithm
(from Wikipedia)
• A variable which is uninstantiated can be unified with an
atom, a term, or another uninstantiated variable, thus
effectively becoming its alias.
• In many modern Prolog dialects and in first-order logic, a
variable cannot be unified with a term that contains it;
this is the so-called occurs check.
• Two atoms can only be unified if they are identical.
• Similarly, a term can be unified with another term if the
top function symbols and arities of the terms are
identical and if the parameters can be unified
simultaneously. Note that this is a recursive behavior.
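The recursive behavior described above can be sketched in Python. This is an illustrative sketch, not Prolog's actual implementation: terms are modeled as tuples (functor, arg1, …), and the convention that variables are capitalized strings is an assumption of the sketch.

```python
def is_var(t):
    # Convention (an assumption of this sketch): variables are
    # strings starting with an uppercase letter, as in Prolog text.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings to their current value.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # The occurs check: does variable v appear inside term t?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(t1, t2, subst=None):
    # Returns a substitution (dict of bindings), or None on failure.
    if subst is None:
        subst = {}
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:                      # identical atoms or aliased vars
        return subst
    if is_var(t1):
        if occurs(t1, t2, subst):     # occurs check
            return None
        return {**subst, t1: t2}
    if is_var(t2):
        return unify(t2, t1, subst)
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # Same functor and arity required; unify arguments pairwise.
        if t1[0] != t2[0] or len(t1) != len(t2):
            return None
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# p(X, 2) = p(1, Y)  =>  {X: 1, Y: 2}
print(unify(('p', 'X', 2), ('p', 1, 'Y')))
```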
27. Unification Applications: Type Inferencing
(from Wikipedia)
• Unification is used during type inference, for instance in the functional programming
language Haskell.
• Used for both type inferencing and type error detection.
• The Haskell expression 1:['a','b','c'] is not correctly typed
– the list construction function ":" is of type a->[a]->[a]
– for the first argument "1", the polymorphic type variable "a" has to denote the type Int
– "['a','b','c']" is of type [Char],
– "a" cannot be both Char and Int at the same time.
• Unification for Type Inferencing
– Any type variable unifies with any type expression, and is instantiated to that expression. A
specific theory might restrict this rule with an occurs check.
– Two type constants unify only if they are the same type.
– Two type constructions unify only if they are applications of the same type constructor and all of
their component types recursively unify.
– Due to its declarative nature, the order in a sequence of unifications is (usually) unimportant.
• Algorithm W: Hindley-Milner Type Inferencing. Unification + Constraint Satisfaction
28. Example Type Inferencing in Haskell
GHCi, version 7.4.2: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
ghci> 1:[2,3,4]
[1,2,3,4]
ghci> 1:['a', 'b', 'c']
<interactive>:3:1:
No instance for (Num Char) arising from the literal `1'
Possible fix: add an instance declaration for (Num Char)
In the first argument of `(:)', namely `1'
In the expression: 1 : ['a', 'b', 'c']
In an equation for `it': it = 1 : ['a', 'b', 'c']
instance Num Char where
fromInteger x = chr (fromIntegral x)
That would cast the '1' into the character with the ascii code 1. This is of course a terrible idea :-)
Note: Can't declare a union type to do exactly what you want, as in Haskell we have tagged unions, not untagged ones.
Tagged union: sum type corresponds to intuitionistic logical disjunction under the Curry–Howard correspondence.
29. Origins of Theorem Proving
• Roots of formalized logic go back to Aristotle.
• Frege's Begriffsschrift (1879) introduced both a complete propositional calculus and
what is essentially modern predicate logic.
• His Foundations of Arithmetic, published 1884, expressed (parts of) mathematics in
formal logic.
• This approach was continued by Russell and Whitehead in their influential Principia
Mathematica, first published 1910–1913, and with a revised second edition in 1927.
• Russell and Whitehead thought they could derive all mathematical truth using axioms
and inference rules of formal logic, in principle opening up the process to
automation.
• In 1920, Thoralf Skolem simplified a previous result by Leopold Löwenheim, leading to
the Löwenheim–Skolem theorem
• And in 1930, to the notion of a Herbrand universe and a Herbrand interpretation, which
allowed the (un)satisfiability of first-order formulas (and hence the validity of a theorem) to
be reduced to (potentially infinitely many) propositional satisfiability problems.
30. Resolution Theorem Proving
(from Wikipedia)
• In mathematical logic and automated theorem proving, resolution is a rule of
inference leading to a refutation theorem-proving technique for sentences in
propositional logic and first-order logic.
• In other words, iteratively applying the resolution rule in a suitable way allows for
telling whether a propositional formula is satisfiable and for proving that a first-
order formula is unsatisfiable.
• Attempting to prove a satisfiable first-order formula as unsatisfiable may result in a
nonterminating computation; this problem doesn't occur in propositional logic.
• The resolution technique uses proof by contradiction and is based on the fact that
any sentence in propositional logic can be transformed into an equivalent
sentence in conjunctive normal form.
• In Boolean logic, a formula is in conjunctive normal form (CNF) or clausal normal
form if it is a conjunction of clauses, where a clause is a disjunction of literals;
otherwise put, it is an AND of ORs. As a normal form, it is useful in automated
theorem proving. It is similar to the product of sums form used in circuit theory
31. Unification based Theorem Proving
(from Wikipedia)
• One approach: Proofs by resolution refutation.
• Resolution Rule: Elimination of complementary literals across two clauses
– e.g. (a V ¬b) and (c V b) resolve to (a V c)
• Modus ponens can be seen as a special case of resolution of a one-literal
clause and a two-literal clause.
• The resolution rule can be traced back to Davis and Putnam (1960); however,
their algorithm required trying all ground instances of the given formula.
• This source of combinatorial explosion was eliminated in 1965 by John Alan
Robinson's syntactical unification algorithm, which allowed one to instantiate
the formula during the proof "on demand" just as far as needed to keep
refutation completeness
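A resolution refutation can be illustrated in the propositional case. A minimal Python sketch (clauses as frozensets of string literals with '~' marking negation; a naive saturation loop, for illustration only):

```python
def negate(lit):
    # '~p' <-> 'p'
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    # All clauses obtainable by resolving c1 and c2 on one
    # complementary pair of literals.
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def refute(clauses):
    # Saturate under resolution; deriving the empty clause
    # means the clause set is unsatisfiable.
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolvents(a, b):
                    if not r:
                        return True   # empty clause: contradiction
                    new.add(r)
        if new <= clauses:
            return False              # saturated, no contradiction
        clauses |= new

# Prove q from {p, p -> q} by refuting {p, ~p v q, ~q}.
kb = [frozenset({'p'}), frozenset({'~p', 'q'}), frozenset({'~q'})]
print(refute(kb))  # True
```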
32. Skolem Normal Form
(from Wikipedia)
• In mathematical logic, reduction to Skolem normal form (SNF) is a method for
removing existential quantifiers from formal logic statements, often
performed as the first step in an automated theorem prover.
• Skolemization works by applying a second-order equivalence in conjunction to
the definition of first-order satisfiability. The equivalence provides a way for
"moving" an existential quantifier before a universal one.
• Intuitively, the sentence "for every x there exists a y such that R(x, y)" is converted
into the equivalent form "there exists a function f mapping every x into a y
such that, for every x, R(x, f(x)) holds".
• Thoralf Skolem (1887–1963) was a Norwegian mathematician and logician.
33. First Order Logic to Normal Form
(from Artificial Intelligence, Russell & Norvig, 1995)
• Eliminate implication
a ⇒ b becomes ¬a V b
• Move ¬ Inside
¬(a V b) becomes (¬a ∧ ¬b)
• Standardize variables
(∃x p(x)) V (∀x g(x)) becomes (∃x1 p(x1)) V (∀x2 g(x2))
• Move quantifiers left
p V ∀x q becomes ∀x (p V q)
• Skolemize
∀x person(x) => ∃y heart(y) ∧ has(x, y) becomes
∀x person(x) => heart(F(x)) ∧ has(x, F(x))
• Distribute ∧ over V
(a ∧ b) V c becomes (a V c) ∧ (b V c)
• Flatten nested conjunctions and disjunctions
(a V b) V c becomes (a V b V c)
• Convert disjunctions to implications
(¬a V ¬b V c V d) becomes (a ∧ b) => (c V d)
• See Russell & Norvig, Chapter 9, for a resolution refutation proof that first converts to normal form.
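The first two rewrite steps above can be sketched as term rewriting in Python (formulas as nested tuples; this representation and the limited step coverage are illustrative assumptions):

```python
# Formulas as nested tuples: ('imp', a, b), ('and', a, b),
# ('or', a, b), ('not', a), or an atom string.

def elim_imp(f):
    # Step 1: a => b  becomes  ~a v b, applied recursively.
    if isinstance(f, str):
        return f
    if f[0] == 'imp':
        return ('or', ('not', elim_imp(f[1])), elim_imp(f[2]))
    return (f[0],) + tuple(elim_imp(x) for x in f[1:])

def push_not(f):
    # Step 2: De Morgan + double negation, moving ~ onto atoms.
    if isinstance(f, str):
        return f
    if f[0] == 'not':
        g = f[1]
        if isinstance(g, str):
            return f
        if g[0] == 'not':
            return push_not(g[1])
        if g[0] == 'and':
            return ('or', push_not(('not', g[1])), push_not(('not', g[2])))
        if g[0] == 'or':
            return ('and', push_not(('not', g[1])), push_not(('not', g[2])))
    return (f[0],) + tuple(push_not(x) for x in f[1:])

f = ('not', ('imp', 'a', 'b'))        # ~(a => b)
print(push_not(elim_imp(f)))          # ('and', 'a', ('not', 'b'))
```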
34. Inductive Logic Programming
• Inductive logic programming (ILP) is a subfield of machine
learning
• Uses a logic programming representation uniformly for:
– hypotheses
– examples
– background knowledge
• Input: an encoding of the known background knowledge and
a set of examples represented as a logical database of facts
• Output: hypothesized logic program which entails all the
positive and none of the negative examples.
35. Decision Tree Learning
• Quinlan, J. R., (1986). Induction of Decision Trees.
• A tree can be "learned" by splitting the source set into subsets
based on an attribute value test, then recursively repeated.
• This process of top-down induction of decision trees (TDIDT)
is an example of a greedy algorithm, and it is by far the most
common strategy for learning decision trees from data.
• Partitioning is based on Information Gain as measured by the
entropy of an attribute.
36. The ID3 Algorithm
(from Wikipedia)
• Calculate the entropy of every attribute using the training set
• Split the set into subsets using the attribute for which entropy
is minimum (maximum information gain)
• Make a decision tree node containing that attribute
• Recur on subsets using remaining attributes.
• C4.5 Algorithm (Quinlan) extends ID3
– Handling both continuous and discrete attributes
– Handling training data with missing attribute values
– Handling attributes with differing costs.
– Pruning trees after creation
37. Play Tennis Training Data
Tom Mitchell, Machine Learning, Chapter 3
Outlook Temperature Humidity Wind Play Tennis
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
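On this training set, the entropy of the Play Tennis labels is about 0.940 and the information gain of Outlook about 0.247. A Python sketch of both quantities (the Remainder/Gain decomposition follows the standard ID3 definitions):

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum p_i * log2(p_i) over label proportions.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    n = len(rows)
    rem = 0.0
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        rem += len(sub) / n * entropy(sub)
    return entropy(labels) - rem

outlook = ['Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast',
           'Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain']
play    = ['No','No','Yes','Yes','Yes','No','Yes',
           'No','Yes','Yes','Yes','Yes','Yes','No']
rows = [{'Outlook': o} for o in outlook]
print(round(entropy(play), 3))                     # 0.94
print(round(info_gain(rows, 'Outlook', play), 3))  # 0.247
```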
39. Decision Trees as Prolog Programs
play_tennis :- outlook(rain), wind(weak).
play_tennis :- outlook(overcast).
play_tennis :- outlook(sunny), humidity(normal).
[Decision tree figure: OUTLOOK at the root; rain branch → WIND (weak → +); overcast branch → +; sunny branch → HUMIDITY (normal → +)]
Take each positive branch of the decision tree and add it as a Prolog rule for the
target attribute.
40. Sequential Covering Algorithms
Sequential-covering (target-attribute, attributes, examples, threshold)
learned-rules ← {}
rule ← learn-one-rule (target-attribute, attributes, examples)
while performance (rule, examples) > threshold
learned-rules ← learned-rules + rule
examples ← examples – (examples correctly classified by rule)
rule ← learn-one-rule (target-attribute, attributes, examples)
learned-rules ← sort learned-rules based on performance
return (learned-rules)
Performance (h, target-attribute, examples) ←
entropy (subset of examples that match h wrt target-attribute)
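The loop above can be sketched concretely. A compact Python illustration, with two simplifying assumptions: rule accuracy is used as the performance measure instead of entropy, and the dataset is a made-up toy example.

```python
# Each example is (attribute-dict, boolean label); a rule is a
# list of (attribute, value) constraints, all of which must hold.

def covers(rule, ex):
    return all(ex[a] == v for a, v in rule)

def performance(rule, examples):
    # Accuracy of the rule on the examples it covers.
    matched = [lab for ex, lab in examples if covers(rule, ex)]
    return sum(matched) / len(matched) if matched else 0.0

def learn_one_rule(attributes, examples):
    # Greedily add the single best (attr = value) constraint.
    rule = []
    while performance(rule, examples) < 1.0:
        candidates = [rule + [(a, ex[a])]
                      for ex, _ in examples for a in attributes
                      if (a, ex[a]) not in rule]
        best = max(candidates, key=lambda r: performance(r, examples))
        if performance(best, examples) <= performance(rule, examples):
            break
        rule = best
    return rule

def sequential_covering(attributes, examples, threshold=0.9):
    learned, remaining = [], list(examples)
    while remaining:
        rule = learn_one_rule(attributes, remaining)
        if performance(rule, remaining) <= threshold:
            break
        learned.append(rule)
        # Remove the positives this rule covers, keep the rest.
        remaining = [(ex, lab) for ex, lab in remaining
                     if not (covers(rule, ex) and lab)]
    return learned

data = [({'outlook': 'overcast', 'wind': 'weak'},   True),
        ({'outlook': 'rain',     'wind': 'weak'},   True),
        ({'outlook': 'rain',     'wind': 'strong'}, False),
        ({'outlook': 'sunny',    'wind': 'weak'},   False)]
print(sequential_covering(['outlook', 'wind'], data))
```

On this toy data the sketch learns one rule per positive cluster, e.g. outlook=overcast, and outlook=rain with wind=weak.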
41. Learn-one-Rule¹
Learn-one-Rule (target-attribute, attributes, examples)
Initialize best-hypothesis to {} and candidate-hypotheses ← {best-hypothesis}
While candidate-hypotheses not empty
1. Generate next most specific candidate-hypotheses
all-constraints ← all constraints of the form (a = v) in examples
new-candidate-hypotheses ←
for each h in candidate-hypotheses and each c in all-constraints
create a specialization of h by adding c to it
remove from new-candidate-hypotheses duplicates, inconsistent hypotheses and non-maximally specific ones
2. Update best-hypothesis
for all h in new-candidate-hypotheses
If performance (h, examples, target-attribute) > performance (best-hypothesis, examples, target-attribute)
then best-hypothesis ← h
3. Update candidate-hypotheses
candidate-hypotheses ← best k hypotheses according to the performance metric
Return a rule of the form “if <best-hypothesis> then <prediction>” where prediction is the most frequently
occurring value for the target attribute amongst covered examples.
¹Based on the CN2 algorithm by Clark & Niblett (1989)
42. Learning Horn Clauses: FOIL
FOIL (target-predicate, predicates, examples)
pos ← those examples for which target-predicate is true
neg ← those examples for which target-predicate is false
learned-rules ← {}
while pos is not empty do
learn a new rule:
new-rule ← rule that predicts target-predicate with no preconditions
new-rule-neg ← neg
while new-rule-neg do
add a new literal to specialize new-rule:
candidate-literals ← new literal candidates based on predicates
best-literal ← argmax (FOIL-Gain (literal, new-rule)) over candidate-literals
add best-literal to the preconditions of new-rule
new-rule-neg ← subset of new-rule-neg which satisfies new-rule preconditions
learned-rules ← learned-rules + new-rule
pos ← pos – (members of pos covered by new-rule)
return learned-rules
43. FOIL Information Gain Criteria
Gain(R0, R1) := t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) )
• R0 denotes a rule before adding a new literal.
• R1 is an extension of R0.
• p0 denotes the number of positive examples covered by R0,
• p1 the number of positive examples covered by R1.
• n0 and n1 are the numbers of negative examples covered by the corresponding rule.
• t is the number of positive examples covered by both R0 and R1.
• http://www-ai.cs.uni-dortmund.de/kdnet/auto?self=$81d91e8ddbd8094353
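The gain formula translates directly into code. A small Python sketch; the example counts are made up for demonstration:

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    # Gain(R0, R1) = t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) )
    # t: positives covered by both R0 and its specialization R1.
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# A literal that keeps 8 of 10 positives while cutting
# covered negatives from 10 to 2 scores:
print(round(foil_gain(10, 10, 8, 2, 8), 3))  # 5.425
```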