Slides for my talk at the 15th International Conference on Artificial Intelligence and Law (ICAIL 2015), June 11, 2015.
The full ICAIL 2015 paper is available on ResearchGate at bit.ly/1qCnLJq.
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Semantics
1. How to Ground
A Language for Legal Discourse
In a Prototypical Perceptual Semantics
L. Thorne McCarty
Rutgers University
2. Background Papers
● “An Implementation of Eisner v. Macomber,” in ICAIL-'95.
– Computational reconstruction of a 1920 corporate tax case.
– Based on a theory of “prototypes and deformations.”
● “Some Arguments About Legal Arguments,” in ICAIL-'97.
– Critical review of the literature.
– Discussion of “The Correct Theory” in Section 5:
● “Legal reasoning is a form of theory construction... A judge
rendering a decision in a case is constructing a theory of that
case... If we are looking for a computational analogue of
this phenomenon, the first field that comes to mind is
machine learning...”
3. ICAIL-'97, Section 5:
… Most machine learning algorithms assume that concepts have
“classical” definitions, with necessary and sufficient conditions, but
legal concepts tend to be defined by prototypes. When you first look at
prototype models [Smith and Medin, 1981], they seem to make the
learning problem harder, rather than easier, since the space of possible
concepts seems to be exponentially larger in these models than it is in
the classical model. But empirically, this is not the case. Somehow, the
requirement that the exemplar of a concept must be “similar” to a
prototype (a kind of “horizontal” constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of “vertical” constraint). How is
this possible? This is one of the great mysteries of cognitive science.
It is also one of the great mysteries of legal theory. ...
4. Summary
Contemporary trends in machine learning have now shed new light on
the subject. In this paper, I will describe my
● Recent work on “manifold learning”:
“Clustering, Coding and the Concept of Similarity,” arXiv:1401.2411 [cs.LG] (10 Jan 2014).
● Work in progress on “deep learning” (forthcoming, 2015):
“Differential Similarity in Higher Dimensional Spaces: Theory and Applications.”
“Deep Learning with a Riemannian Dissimilarity Metric.”
Taken together, this work leads to a logical language grounded in a
prototypical perceptual semantics, with implications for legal theory.
5. Prototype Coding
What is prototype coding?
● The basic idea is to represent a point in an n-dimensional space by
measuring its distance from a prototype in several specified
directions.
● Furthermore, we want to select a prototype that lies at the origin of
an embedded, low-dimensional, nonlinear subspace, which is in
some sense “optimal”.
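A minimal sketch of the idea in Python (the names and the toy data are illustrative, not from the paper): a point is coded by its displacement from a prototype, measured along a few specified unit directions.

```python
# Prototype coding sketch: represent a point x in an n-dimensional space by
# its distances from a prototype along k specified directions, with k << n.
import numpy as np

def prototype_code(x, prototype, directions):
    """Code x by its displacement from `prototype` along each row of
    `directions` (a (k, n) array of unit vectors)."""
    return directions @ (x - prototype)

# Example: a 5-dimensional point coded by 2 orthonormal directions.
rng = np.random.default_rng(0)
p = rng.normal(size=5)
D = np.linalg.qr(rng.normal(size=(5, 2)))[0].T  # two orthonormal directions
x = p + 0.3 * D[0] + 0.1 * D[1]
print(prototype_code(x, p, D))  # approximately [0.3, 0.1]
```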
6. Manifold Learning
S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold
Tangent Classifier,” in NIPS 2011:
Three hypotheses:
1. ...
2. The (unsupervised) manifold hypothesis, according to which real world
data presented in high dimensional spaces is likely to concentrate in the
vicinity of non-linear sub-manifolds of much lower dimensionality ...
[citations omitted]
3. The manifold hypothesis for classification, according to which points of
different classes are likely to concentrate along different sub-manifolds,
separated by low density regions of the input space.
7. Manifold Learning
The Probabilistic Model:
Brownian motion with a drift term. More precisely, a diffusion process
generated by the following differential operator:
● The invariant probability measure is proportional to e^{U(x)}.
● Thus ∇U(x) is the gradient of the log of the probability density.
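The operator itself appears only as an image in this transcript. As a hedged illustration, one standard convention with invariant density proportional to e^{U(x)} is the diffusion dX_t = (1/2)∇U(X_t) dt + dW_t; the sketch below simulates it with Euler–Maruyama and checks the invariant-density claim on a Gaussian example.

```python
# Euler–Maruyama simulation of dX_t = (1/2)∇U(X_t) dt + dW_t, whose
# invariant density is proportional to e^{U(x)}.  With the example potential
# U(x) = -|x|^2/2, the invariant density is a standard Gaussian, so the
# long-run variance of each coordinate should approach 1.
import numpy as np

def grad_U(x):
    return -x  # ∇U for U(x) = -|x|^2 / 2

rng = np.random.default_rng(0)
x = np.zeros(2)
dt, n_steps = 1e-2, 100_000
samples = np.empty((n_steps, 2))
for t in range(n_steps):
    x = x + 0.5 * grad_U(x) * dt + np.sqrt(dt) * rng.normal(size=2)
    samples[t] = x

print(samples[n_steps // 2:].var(axis=0))  # ≈ [1.0, 1.0]
```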
9. Manifold Learning
The Geometric Model:
To implement the idea of prototype coding, we choose:
● A radial coordinate, ρ, which follows ∇U(x).
● The directional coordinates, θ1, θ2, ..., θn−1, orthogonal to ∇U(x).
But we actually want a lower-dimensional subspace, obtained by
projecting our diffusion process onto a k−1 dimensional subset of the
directional coordinates. The device we need is a Riemannian metric,
g_ij(x), which we interpret as a measure of dissimilarity. Crucially, the
dissimilarity metric should depend on the probability measure.
10. Manifold Learning
● Find a principal axis for the ρ coordinate.
● Choose the principal directions for the θ1, θ2, ..., θk−1 coordinates.
● To compute the coordinate curves, follow the geodesics of the
Riemannian metric in each of the k−1 principal directions.
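A minimal sketch of the frame construction, assuming (purely for illustration) that the principal directions come from a local PCA after removing the radial component along ∇U(x); the slides' actual construction follows geodesics of the Riemannian metric.

```python
# Choose a local coordinate frame at a prototype: the radial axis follows
# ∇U, and the k-1 principal directions are taken here from an SVD of the
# data with the radial component removed (a stand-in for the geodesic
# construction with the Riemannian metric).
import numpy as np

def local_frame(points, prototype, grad_U_at_p, k):
    radial = grad_U_at_p / np.linalg.norm(grad_U_at_p)
    centered = points - prototype
    # Remove the radial component, then take principal directions of the rest.
    tangential = centered - np.outer(centered @ radial, radial)
    _, _, vt = np.linalg.svd(tangential, full_matrices=False)
    return radial, vt[:k - 1]  # the rho axis and the theta_1 .. theta_{k-1} axes

rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 5))
rho_axis, thetas = local_frame(pts, np.zeros(5), np.ones(5), k=3)
print(rho_axis.shape, thetas.shape)  # (5,) (2, 5)
```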
11. Manifold Learning
Prototypical Clusters
● Probability density is a mixture: e^{U(x)} ≈ p1·e^{U1(x)} + p2·e^{U2(x)}.
● These two prototypical clusters are “exponentially” far apart.
It is natural to refer to this model as
a theory of differential similarity.
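A one-dimensional illustration with two Gaussian components (an assumption; the slide does not specify the component densities): the density between the two prototypes is exponentially small, which is the sense in which the clusters are far apart.

```python
# Mixture of two prototypical clusters:
#     e^{U(x)} ≈ p1 * e^{U1(x)} + p2 * e^{U2(x)}
# Gaussian components (an illustrative choice) with prototypes at -3 and +3.
import numpy as np

p1, p2 = 0.5, 0.5
mu1, mu2 = -3.0, 3.0

def density(x):
    return p1 * np.exp(-0.5 * (x - mu1) ** 2) + p2 * np.exp(-0.5 * (x - mu2) ** 2)

for x in (-3.0, 0.0, 3.0):
    print(f"x = {x:+.1f}   density = {density(x):.4f}")
# At the midpoint the density is down by a factor of roughly e^{-4}:
# the low-density region that separates the two clusters.
```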
12. Deep Learning
S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent
Classifier,” in NIPS 2011:
Three hypotheses:
1. The semi-supervised learning hypothesis, according to which learning
aspects of the input distribution p(x) can improve models of the conditional
distribution of the supervised target p(y|x) ... [citation omitted]. This hypothesis
underlies not only the strict semi-supervised setting where one has many more
unlabeled examples at his disposal than labeled ones, but also the successful
unsupervised pretraining approach for learning deep architectures … [citations
omitted].
2. ...
3. ...
13. Deep Learning
Standard Example: MNIST
● 28×28 pixels
● 60,000 training set images
● 10,000 test set images
Historically, MNIST has been used as a benchmark for supervised learning:
● Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to
Document Recognition,” Proceedings of the IEEE, 86(11): 2278-2324 (November, 1998).
We will treat it as a problem in unsupervised feature learning.
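A sketch of the unsupervised setup, with the assumptions flagged in the comments: fetch_openml supplies MNIST, and the patch size (8×8) and count per image (10, which would yield the 600,000 patches cited on the next slide) are illustrative guesses, not the paper's parameters.

```python
# Treat MNIST as unsupervised feature-learning data: extract small patches
# from the 28x28 training images.  Patch geometry is assumed for illustration.
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", as_frame=False)
images = mnist.data[:60_000].reshape(-1, 28, 28) / 255.0

rng = np.random.default_rng(0)
patch, n_per_image = 8, 10  # assumed: ten random 8x8 patches per image
patches = np.empty((len(images) * n_per_image, patch * patch))
for i, img in enumerate(images):
    for j in range(n_per_image):
        r, c = rng.integers(0, 28 - patch, size=2)
        patches[i * n_per_image + j] = img[r:r + patch, c:c + patch].ravel()
print(patches.shape)  # (600000, 64)
```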
15. Deep Learning
● ∇U(x) is estimated from the data using the mean shift algorithm.
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of 600,000 patches.
[Figure: 35 prototypes]
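A minimal mean shift sketch (the Gaussian kernel and the bandwidth are assumptions): with a Gaussian kernel, the mean-shift vector is proportional to an estimate of ∇U(x), so iterating it climbs to a mode, i.e., a prototype where ∇U(x) = 0.

```python
# Mean shift as an estimator of ∇U(x): the shift vector points toward a
# mode of the density, and the fixed points (zero shift) are the prototypes.
import numpy as np

def mean_shift_step(x, data, bandwidth=1.0):
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * bandwidth ** 2))
    return w @ data / w.sum() - x  # proportional to an estimate of ∇U(x)

def find_prototype(x, data, tol=1e-6):
    while True:
        shift = mean_shift_step(x, data)
        x = x + shift
        if np.linalg.norm(shift) < tol:  # ∇U(x) ≈ 0: we are at a prototype
            return x

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
print(find_prototype(data[0], data))  # converges near one of the two modes
```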
19. Deep Learning
General Procedure:
● Construct the product manifold from the encoded values of the
smaller patches.
● Construct a submanifold using the Riemannian dissimilarity metric.
[Diagram: “encode”, 48 dimensions → 12 dimensions; Category: 4]
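A sketch of the general procedure in Python, with PCA standing in for the Riemannian construction (the paper's submanifold is built from the dissimilarity metric, not PCA); the 48 and 12 dimensions follow the figure labels.

```python
# General procedure sketch: concatenate the coded values of the smaller
# patches (the product manifold), then project to a lower-dimensional chart
# (a stand-in for the submanifold built with the Riemannian metric).
import numpy as np
from sklearn.decomposition import PCA

def product_point(patch_codes):
    # Product manifold: the concatenation of the per-patch codes.
    return np.concatenate(patch_codes)

rng = np.random.default_rng(0)
# Assume four patches per image, each coded in 12 dimensions -> 48 dimensions.
product_points = np.stack(
    [product_point([rng.normal(size=12) for _ in range(4)]) for _ in range(1000)]
)
submanifold = PCA(n_components=12).fit_transform(product_points)
print(submanifold.shape)  # (1000, 12)
```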
20. The Logical Language
Rewrite the top four patches as a logical product:
Use the syntax of my Language for Legal Discourse (LLD):
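The logical product itself appears only as an image in this transcript. As a purely hypothetical rendering (Patch_i and Four are invented predicate names, not actual LLD syntax):

```latex
% Hypothetical rendering only: the four patch predicates combine as a
% logical product (a conjunction), whose interpretation is a product
% manifold in the semantics developed on the next slides.
\mathsf{Patch}_1(x) \wedge \mathsf{Patch}_2(x) \wedge
\mathsf{Patch}_3(x) \wedge \mathsf{Patch}_4(x) \;\vdash\; \mathsf{Four}(x)
```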
22. The Logical Language
For this interpretation, we need a logical language based on category
theory:
Define: Categorical Product
● In Man, this is the product manifold.
Define: Categorical Subobject
● In Man, this is a submanifold.
       objects                  morphisms              logic
Set    abstract sets            arbitrary mappings     classical
Top    topological spaces       continuous mappings    intuitionistic
Man    differential manifolds   smooth mappings        ????
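For reference, the standard universal properties behind these two definitions, which is presumably what the omitted formulas expressed:

```latex
% Categorical product A \times B: projections plus the universal property.
\pi_1 : A \times B \to A, \qquad \pi_2 : A \times B \to B
% For every f : C \to A and g : C \to B there is a unique pairing
% \langle f, g \rangle : C \to A \times B with:
\pi_1 \circ \langle f, g \rangle = f, \qquad \pi_2 \circ \langle f, g \rangle = g
% Categorical subobject: an equivalence class of monomorphisms
% m : S \rightarrowtail A; in Man, a smooth embedding of a submanifold.
```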
23. The Logical Language
Sequent Calculus:
● Actor and Corporation are interpreted as differential manifolds.
● macomber and so are interpreted as points on these manifolds.
● Control is interpreted as a submanifold of the product manifold.
● A sequent is interpreted as a morphism.
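A hedged sketch of this interpretation in standard categorical-semantics notation (the example is invented for illustration, not taken from the slides):

```latex
% Interpretation sketch: sorts as manifolds, constants as points,
% predicates as submanifolds of product manifolds.
\mathsf{Actor} \rightsquigarrow M_A, \qquad
\mathsf{Corporation} \rightsquigarrow M_C, \qquad
\mathsf{Control} \rightsquigarrow S \subseteq M_A \times M_C
% A sequent \Gamma \vdash \varphi is interpreted as a morphism of Man,
% i.e., a smooth mapping between the interpreting manifolds:
\llbracket \Gamma \vdash \varphi \rrbracket :
\llbracket \Gamma \rrbracket \longrightarrow \llbracket \varphi \rrbracket
```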
24. The Logical Language
Structural Rule for cut:
Introduction and Elimination Rules for conjunction:
Horn Axioms:
This is sufficient for Horn clause logic programming.
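The rule schemas themselves are image-only in this transcript; the standard sequent-calculus forms, which these rules presumably follow, are:

```latex
% Cut:
\frac{\Gamma \vdash A \qquad A, \Delta \vdash B}{\Gamma, \Delta \vdash B}
% Conjunction introduction and elimination:
\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}
\qquad
\frac{\Gamma \vdash A \wedge B}{\Gamma \vdash A}
\qquad
\frac{\Gamma \vdash A \wedge B}{\Gamma \vdash B}
% Horn axiom schema (the A_i and B atomic):
A_1, \ldots, A_n \vdash B
```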
25. The Logical Language
Novel Property:
A proof is a composition of morphisms in the category Man, i.e., it is a
smooth mapping of differential manifolds.
26. The Logical Language
Novel Property:
A subspace is not always a submanifold.
● Implications for Gödel's Theorem?
● Implications for Learnability?
Note: If we are looking for a learnable
knowledge representation language, we
want it to be as restrictive as possible.
[Plot: a subspace of the (x, y) plane, over the range −1.0 to 1.0 on each axis, that is not a submanifold]
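A standard example of the phenomenon (not necessarily the curve that was plotted):

```latex
% The union of the coordinate axes is a subspace of R^2 but not a
% submanifold: no neighborhood of the origin in X is homeomorphic to an
% open interval, so X has no manifold chart at (0, 0).
X = \{\, (x, y) \in \mathbb{R}^2 \mid x y = 0 \,\}
```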
27. The Logical Language
Introduction and Elimination Rules for existential quantifiers:
Introduction and Elimination Rules for universal quantifiers:
Introduction and Elimination Rules for implication:
Axioms for simple embedded implications:
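These rule schemas are also image-only here; the standard intuitionistic forms are as follows (the embedded-implication schema at the end is an illustrative sketch, not the slide's exact formulation):

```latex
% Existential quantifier (t any term; y fresh, i.e., not free in \Gamma or B):
\frac{\Gamma \vdash A[t/x]}{\Gamma \vdash \exists x\, A}
\qquad
\frac{\Gamma \vdash \exists x\, A \qquad \Gamma,\, A[y/x] \vdash B}{\Gamma \vdash B}
% Universal quantifier (y fresh; t any term):
\frac{\Gamma \vdash A[y/x]}{\Gamma \vdash \forall x\, A}
\qquad
\frac{\Gamma \vdash \forall x\, A}{\Gamma \vdash A[t/x]}
% Implication:
\frac{\Gamma,\, A \vdash B}{\Gamma \vdash A \rightarrow B}
\qquad
\frac{\Gamma \vdash A \rightarrow B \qquad \Gamma \vdash A}{\Gamma \vdash B}
% Simple embedded implication (illustrative schema: an implication may
% itself occur in the body of a clause):
\Gamma,\, (A \rightarrow B) \vdash C
```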
28. The Logical Language
Conclusion:
We have thus reconstructed, with a semantics grounded in the category
of differential manifolds, Man, the full intuitionistic logic
programming language in:
● “Clausal Intuitionistic Logic. I. Fixed-Point Semantics,” J. of Logic
Programming, 5(1): 1-31 (1988).
● “Clausal Intuitionistic Logic. II. Tableau Proof Procedures,” J. of
Logic Programming, 5(2): 93-132 (1988).
29. Defining the Ontology of LLD
From “A Language for Legal Discourse. I. Basic Features,” in
ICAIL-'89:
● “There are many common sense categories underlying the
representation of a legal problem domain: space, time, mass, action,
permission, obligation, causation, purpose, intention, knowledge,
belief, and so on. The idea is to select a small set of these common
sense categories, ... and … develop a knowledge representation
language that faithfully mirrors the structure of this set. The
language should be formal: it should have a compositional syntax, a
precise semantics and a well-defined inference mechanism. ...”
30. Defining the Ontology of LLD
● Count Terms and Mass Terms
● Events/Actions and Modalities Over Actions
– “Permissions and Obligations,” IJCAI-'83.
– “Modalities Over Actions,” KR-'94.
● Knowledge and Belief
– S.N. Artemov, “The Logic of Justification,” Rev. of Symbolic Logic, 7(1): 1-36 (2008).
– M. Fitting, “Reasoning with Justifications” (2009).
39. Toward a Theory of Coherence
[Diagram: Logic, Geometry, and Probability, linked by mutual constraints]
Logic is constrained by the geometry.
Geometric model is constrained by
the probabilistic model.
Probability measure is constrained by the data.
Conjecture: The existence of these mutual constraints makes
theory construction possible.
41. Toward a Theory of Coherence
ICAIL-'97, Section 5:
… Somehow, the requirement that the exemplar of a concept must be
“similar” to a prototype (a kind of “horizontal” constraint) seems to
reinforce the requirement that the exemplar must be placed at some
determinate level of the concept hierarchy (a kind of “vertical” constraint).
How is this possible?
This is one of the great mysteries of cognitive science.
It is also one of the great mysteries of legal theory.
Q: Is the mystery now solved?