Comprehension as compression

Comprehension and Compression
Scientific Understanding, Pattern Recognition, and Kolmogorov Complexity
John S. Wilkins
School of Historical and Philosophical Studies

John S Wilkins john@wilkins.id.au
Outline
1. Understanding in music, business, and science maybe
2. Traditional accounts and recent work
3. The mechanics of understanding
4. Kolmogorov complexity and compression
5. The subjects that understand
6. Handwaving
7. Phylogenies
8. Conclusions [tentative]

Understanding perplexing subjects
“The accompanying diagram will aid us in understanding this rather perplexing subject.” [Darwin, Origin, chapter 4]

Biology and the big data problem
“It’s human DNA!”

From the data up
“But besides this practical concern, there is a second basic motivation for the scientific quest, namely, man's insatiable intellectual curiosity, his deep concern to know the world he lives in, and to explain, and thus to understand, the unending flow of phenomena it presents to him.”
[Carl Hempel 1962]
“Information is not knowledge.
Knowledge is not wisdom.
Wisdom is not truth.”
[Zappa 1979]
“Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth.”
[Robert Royar 1994]
Hempel, Carl G. “Explanation in Science and History.” In Frontiers of Science and Philosophy, ed. R. C. Colodny, pp. 9–19. Pittsburgh: The University of Pittsburgh Press, 1962.
Royar, Robert. “New Horizons, Clouded Vistas.” Computers and Composition 11, no. 2 (January 1, 1994): 93–105.
Zappa, Frank. “Packard Goose.” 1979. Joe’s Garage: Acts I, II & III. FZ Records.

The missing level in the DIKW pyramid
Understanding?
If we approach this from the machine learning perspective, we might get a better idea of human scientific understanding

Traditional accounts of understanding
Since Aristotle, understanding has been seen as based on causal explanation (cf. de Regt 2004, Baumberger et al. 2017 for summaries)
Hempel and the logical empiricists rejected subjective elements such as feelings of confidence or enlightenment
• “Empathic insight and subjective understanding provide no warrant of objective validity, no basis for the systematic prediction or explanation of phenomena…” [Hempel 1965, 163]
The objectivist approach requires ultimately some pragmatist notion:
• Of prediction (Trout 2002, Wittgenstein 1953 §152)
• Of explanation (van Fraassen 1980)
The utility of causal accounts ultimately constitutes understanding.
Subjectivist or phenomenological accounts of understanding are merely psychologistic on this approach.
Van Fraassen, Bas C. The Scientific Image. Oxford: Clarendon Press, 1980.
Hempel, Carl G. Aspects of Scientific Explanation, and Other Essays in the Philosophy of Science. New York: The Free Press, 1965.
Regt, Henk W. de. “Discussion Note: Making Sense of Understanding.” Philosophy of Science 71, no. 1 (January 1, 2004): 98–109.
Trout, J. D. “Scientific Explanation and the Sense of Understanding.” Philosophy of Science 69, no. 2 (June 1, 2002): 212–33.
Baumberger, Christoph, Claus Beisbart, and Georg Brun. “What Is Understanding? An Overview of Recent Debates in Epistemology and Philosophy of Science.” In Explaining Understanding: New Perspectives from Epistemology and Philosophy of Science, edited by Stephen Grimm, Christoph Baumberger, and Sabine Ammon, 1–34. Routledge, 2017.

Recent work on understanding
Henk de Regt and his collaborators have developed an account of understanding as:
• Intelligibility
• Empirical adequacy
• Consistency
These are contextual features of disciplinary or professional understanding, without reference to subjects
De Regt, Henk W. Understanding Scientific Understanding. Oxford Studies in Philosophy of Science. Oxford University Press, 2017.
De Regt, Henk W., Sabine Leonelli, and Kai Eigner. Scientific Understanding: Philosophical Perspectives. Pittsburgh: University of Pittsburgh Press, 2009.

De Regt’s Criteria
1. Intelligibility: the value that scientists attribute to the cluster of qualities of a theory (in one or more of its representations) that facilitate the use of the theory: A scientific theory T (in one or more of its representations) is intelligible for scientists (in context C) if they can recognize qualitatively characteristic consequences of T without performing exact calculations.
2. Empirical adequacy
Nuff said (for now; I’ll get back to this)
3. Consistency: A phenomenon P is understood scientifically if and only if there is an explanation of P that is based on an intelligible theory T and conforms to the basic epistemic values of empirical adequacy and internal consistency.
Note the passive nature here. It’s as if scientists don’t do very much in understanding

Formal mechanics
The mechanism of understanding is not generally considered in epistemology…
• because we do not have the requisite neurobiological or neuropsychological knowledge yet (we do not understand understanding);
• but also because there are objections to subjectivism in psychological approaches (with which I concur).
How to approach this?
One domain in which such matters are dealt with mechanically (i.e., mechanisms are sought) is machine learning (ML, a subset of AI).
• The mechanisms can be opaque “black boxes” in their operation (deep learning)
• There is no subjectivity required (but there is a “subject”; that is, a learning system)
I will attempt to generalise ML and algorithmic information theoretic tools to apply to this problem of understanding within knowing systems

Kinematic versus dynamic
It may help if we avail ourselves of an old distinction in physics between two kinds of description:
• Kinematics: description of motions without consideration of forces
• Dynamics: the relations between forces and motion*
In brief: between phenomenal accounts and causal explanations.
We can explore the problem of understanding in knowing systems via the shift from kinematic accounts (what the behaviour of the system that understands is) to dynamic explanations (what forces the behaviour of the understanding system)
• Of systems that are the subject of understanding
• Of systems that understand
These coincide: as we move from kinematic descriptions to dynamic explanations of knowing systems, we also move from considering knowledge to considering understanding
*Dunamis [δύναμις] is an ordinary Greek word for possibility or capability. Depending on context, it could be translated "potency", "potential", "capacity", "ability", "power", "capability", "strength", "possibility", "force" [Wikipedia]

Pattern recognition
Knowing systems have a limit to their working memory under realistic assumptions
Knowing systems also have to deal with noisy data and uncertainty
Once the scale of the data set exceeds the system’s cognitive limits [working memory, processing power], pattern recognition must be out-sourced
• E.g., regression and other statistical analyses
• Machine Learning classifications (ANNs)
• Distance analyses (Hamming, Manhattan)
Less memory and fewer cognitive resources are required to retrieve and interpret patterns
A pattern therefore has information the data does not
[Here, information is a property of the state of a knowing system]
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0069191
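The two distance analyses named on this slide can be sketched in a few lines. This is only an illustration of what "out-sourcing" pattern recognition to a mechanical measure might look like; the function names and the toy inputs are assumptions, not part of the talk.

```python
from typing import Sequence

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (the L1 distance)."""
    if len(p) != len(q):
        raise ValueError("Manhattan distance needs equal-length vectors")
    return sum(abs(x - y) for x, y in zip(p, q))

# Hypothetical toy data: two short aligned sequences and two feature vectors.
print(hamming("GATTACA", "GACTATA"))                # the strings differ at two sites
print(manhattan([1.0, 2.0, 3.0], [2.0, 0.0, 3.5]))  # 1.0 + 2.0 + 0.5
```

In both cases a whole comparison between two objects is reduced to one number, which is the sense in which a distance analysis is already a compression.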

Machine learning and understanding
As any ML system is a Turing machine, it follows that, even when we do not know it, there is an algorithm and data set A that generates the output pattern P [where P is the phenomenon]
The information content of P is therefore the complexity of A
• If P is lossy, then its information content is lower in complexity than the data set
• If P is lossless, then the complexity of A equals the complexity of P [defn]
So A is a compression algorithm*. By analogy, when we generate a pattern (such as a regression curve) from a large or noisy data set, we are compressing the data to generate information, and from that information we can fairly say we know the domain the data set samples or represents.
The complexity of P is specified in Algorithmic Information Theory (AIT) as the length (in bits) of the shortest program that generates P.
* The worst compression is, of course, the data set itself plus a “print” command
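The regression example mentioned above can be made concrete with a small sketch. It assumes a hypothetical data set drawn from a made-up law (y = 3x + 2 plus Gaussian noise) and uses ordinary least squares; neither the law nor the fitting method comes from the talk.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data set: 1,000 noisy samples of an assumed law y = 3x + 2.
rng = random.Random(0)
xs = [i / 10 for i in range(1000)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

slope, intercept = fit_line(xs, ys)

# The pattern P is two numbers standing in for 1,000 measurements.
# It is lossy: the individual noise terms cannot be recovered from it.
print(f"{len(ys)} data points -> 2 parameters: "
      f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted pair (slope, intercept) plays the role of the pattern P here, and the fitting routine plus the data plays the role of A.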

Kolmogorov compressed
AIT is based on the work of Andrey Kolmogorov (1903–1987) and Ray Solomonoff (1926–2009)
Kolmogorov Complexity (K) is the length of the shortest program in a language L that generates a string w
• The most complex string in L is a fully random string
• A fully random string is incompressible
• The working memory of a ML system is thus the largest random string that it can hold in memory while applying functions to it
Thus K gives the complexity of a string (of digits, or other structured data)
If we conceive of ourselves as knowing systems analogous to ML systems, the information in data is thus the K “program” in our cognitive processes
Li, Ming, and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. 4th ed. Texts in Computer Science. Springer International Publishing, 2019.
Grünwald, Peter, and Paul M. B. Vitányi. “Shannon Information and Kolmogorov Complexity.” CoRR cs.IT/0410002 (2004).
Wallace, C. S., and D. L. Dowe. “Minimum Message Length and Kolmogorov Complexity.” The Computer Journal 42, no. 4 (January 1, 1999): 270–83.
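K itself is not computable, so any hands-on illustration has to settle for a stand-in. The sketch below uses the output size of a general-purpose compressor (Python's standard `zlib` module) as a rough upper bound on K; that substitution is an assumption made for illustration, and the helper name `compressed_bits` is hypothetical.

```python
import os
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed form of `data`.

    K is uncomputable; the size of a real compressor's output (plus the
    fixed size of the decompressor) only bounds it from above.
    """
    return 8 * len(zlib.compress(data, 9))

n = 10_000
patterned = b"ab" * (n // 2)   # a highly regular string
random_like = os.urandom(n)    # bytes from the operating system's random source

print("raw bits:        ", 8 * n)
print("patterned bits:  ", compressed_bits(patterned))
print("random-like bits:", compressed_bits(random_like))
```

The regular string shrinks to a small fraction of its raw size, while the random-like string does not shrink at all, which matches the claim that a fully random string is incompressible.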

Reducing the complexity of data
Note that while using complexity metrics is based on the subject – the ML or the knowing system – it is not subjective, as the experience felt by the knower is not relevant for understanding the information
[Nor is this an objectivist account, as it does not depend upon a truth operator]
Complexity of the information is inversely related to the informativeness for a limited knowing system
• Put another way, the tractability of inferences increases as complexity reduces
[Conversely, tractability reduces as data complexity increases, unless the data is compressible]
• Compression of data to information is a lossy process; the structure/pattern in the data is simplified through regression, etc.
• The function of this information is to set the expectations of the knowing system [but that’s another talk; contrastive explanation is why]
• In short: the information derived from the data sets the prior probabilities for future data, highlighting anomalies and points of interest
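One loose way to picture "setting expectations" is to compress past observations into a mean and a standard deviation, then flag new observations that fall far outside that summary. The three-standard-deviation threshold, the function names, and the measurements below are all assumptions chosen for illustration; this is a simple stand-in, not a full Bayesian treatment of priors.

```python
import statistics

def summarise(history):
    """Compress past observations to two numbers: mean and standard deviation."""
    return statistics.fmean(history), statistics.stdev(history)

def is_anomalous(value, mean, sd, threshold=3.0):
    """True if `value` lies more than `threshold` standard deviations from the mean."""
    return abs(value - mean) > threshold * sd

# Hypothetical past measurements and new observations.
history = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, sd = summarise(history)

for value in [10.1, 9.6, 12.5]:
    label = "anomaly" if is_anomalous(value, mean, sd) else "expected"
    print(f"{value}: {label}")
```

Only the last observation is flagged: the two-number summary is enough to make it stand out without consulting the original measurements again.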

Compression and generalisation
“Solomonoff held that all inference problems can be cast in the form of extrapolation from an ordered sequence of binary symbols.” [Li and Vitányi 2019, 63]
[Diagram: the levels Data, Information, Knowledge, Understanding, with the labels “Measurement compresses to”, “Analysis compresses to”, “Kinematic rules: y = f(x)”, “applies to”, and “Dynamic or causal models: y = f(x)”]
What counts as wisdom is left to you to decide
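The idea of extrapolating from an ordered binary sequence can be illustrated with a deliberately crude toy. Solomonoff's scheme weighs every program that could have produced the sequence; the sketch below considers only periodic hypotheses and picks the shortest one that fits, so it is a drastic simplification under that stated assumption. The function names and the observed string are hypothetical.

```python
def shortest_period(bits: str) -> int:
    """Smallest p such that bits[i] == bits[i - p] for every i >= p."""
    for p in range(1, len(bits) + 1):
        if all(bits[i] == bits[i - p] for i in range(p, len(bits))):
            return p
    return len(bits)

def extrapolate(bits: str, k: int) -> str:
    """Continue `bits` for k symbols using its shortest periodic description."""
    p = shortest_period(bits)
    out = list(bits)
    for _ in range(k):
        out.append(out[-p])  # repeat the symbol one period back
    return "".join(out[len(bits):])

observed = "0110110110"
print(shortest_period(observed))  # the repeating unit "011" has length 3
print(extrapolate(observed, 5))
```

The shortest description that reproduces the data is also the one used to predict what comes next, which is the link between compression and generalisation that this slide points to.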

Subjects understanding
So, how might this analogy work in the arena of science? Scientists seek to understand the universe, of course, as Hempel said.
• But what does “scientists” refer to? What are the knowing systems, or subjects, in science?
• particular scientists
• disciplines and sub-disciplines
• research groups
I would take this approach to apply equally well to both individual and collective knowing systems/subjects.
We can say that Professor X understands some phenomenon, or that Discipline D understands some phenomenon independently (individuals can fail to understand what the discipline does, and vice versa)
[Does this mean we have actual Turing machines in our heads? No, it’s an abstract way to consider the problem (just as neurones are not artificial neurones or v.v.)]

Insert case studies here
I do not have time here to give more than one proper case study; so I will handwave at some:
• Gas laws (Boyle’s Law, Charles’s Law, Gay-Lussac’s Law, Avogadro’s Law, etc.) leading to the Ideal Gas Law, to van der Waals’ equation
• Periodic table construction by empirical lab work, followed first by valency, then electron shell, then quantum mechanical causal accounts
• Mendelian genetics (phenomenal) through to population genetics (kinematic), through to molecular genetics (dynamic)
• Arguably: observational to theoretical ecology, through to ecological thermodynamics and “ecological orbits” (Haynie 2001, Ginzburg and Colyvan 2004)
• Tectonic drift theory (Oreskes and LeGrand 2003)
Ginzburg, Lev R., and Mark Colyvan. Ecological Orbits: How Planets Move and Populations Grow. Oxford; New York: Oxford University Press, 2004.
Haynie, Donald T. Biological Thermodynamics. Cambridge; New York: Cambridge University Press, 2001.
Oreskes, Naomi, and Homer E. LeGrand. Plate Tectonics: An Insider’s History of the Modern Theory of the Earth. Boulder, CO: Westview Press, 2003.
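The gas-law case can be written out to show what is being compressed. The forms below are the standard textbook statements, with k₁…k₄ as unnamed proportionality constants, R the gas constant, and a, b the substance-specific van der Waals constants; reading them as successive compressions is the talk's suggestion, not something the equations themselves assert.

```latex
\begin{align*}
  \text{Boyle:}         \quad & PV = k_1   && (n,\ T \text{ fixed}) \\
  \text{Charles:}       \quad & V/T = k_2  && (n,\ P \text{ fixed}) \\
  \text{Gay-Lussac:}    \quad & P/T = k_3  && (n,\ V \text{ fixed}) \\
  \text{Avogadro:}      \quad & V/n = k_4  && (P,\ T \text{ fixed}) \\
  \text{Ideal gas:}     \quad & PV = nRT \\
  \text{van der Waals:} \quad & \left(P + \frac{a n^2}{V^2}\right)(V - nb) = nRT
\end{align*}
```

Four separate regularities, each with its own constant, collapse into one relation with a single constant R. Van der Waals' equation then spends two extra parameters per substance to recover behaviour that the ideal law leaves out.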

Case: Phylogenies
• Phylogenies used to be done qualitatively/subjectively
• In cases like the palaeontology of the horse, only morphological-skeletal characters were available
• Consequently, the diagrams relied upon the apprehension of small data sets combined with the “expertise” of the taxonomist
• As phylogenetics became mathematised with “numerical” taxonomy (computer-based, “total evidence”), it was supposed to become more objective
• It wasn’t (choice of principal components skewed the results unpredictably)
• Enter molecular data
[Figure from Stirton, R. A. 1940. Phylogeny of North American Equidae. Bull. Dept. Geol. Sci., Univ. California 25(4): 165–198.]

Molecular phylogenies
• At first, DNA-DNA hybridisation studies
• E.g., Sibley and Ahlquist’s taxonomy of birds in the 1980s
• Based on percentage of similarity of total DNA [T50H]; a single number representing billions of bases
• “With few exceptions the DNA comparisons ‘see’ the same clusters of species that are seen by the human eye.”
Sibley, Charles G., and Jon E. Ahlquist. “Phylogeny and Classification of Birds Based on the Data of DNA-DNA Hybridization.” Current Ornithology. Boston, MA: Springer US, 1983.

Sequence data
• Full genomic sequences
• Molecular clocks
• Problems
  • Paralogy and homology
  • Statistical probabilities of sequences
  • Hybridisation and lateral transfer
• What do the diagrams summarise?

Phylogenies are compressed data
• A cladogram is a binary tree [in the computing sense of the term]
• Its topology represents the analysed structure in the very large data set
• Contrary to Hollywood/TV, a biologist cannot eyeball a DNA sequence and identify the species
  • In fact, they cannot even identify alignments
• The bigger the data set, the more compression of the data is required to be able to understand the relationships
  • This is because humans, and epistemic groups of humans, have limits on the working memory, cognitive power, and information transmission channels that they can bring to a problem set, like any ML system
Images: https://commons.wikimedia.org/wiki/File:Cladogram_stomiid.svg
http://fishesofaustralia.net.au/home/family/208
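A toy version of this compression step might look like the following. It assumes four made-up aligned sequences, measures pairwise Hamming distances, and clusters them with UPGMA (average linkage), emitting the topology as a Newick string. UPGMA is chosen here only because it is short to write; it is not claimed to be the method behind the cladograms shown, and all names and data are hypothetical.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(seqs: dict) -> str:
    """Cluster aligned sequences by average pairwise distance; return a Newick topology."""
    # Each cluster maps its Newick label to the list of taxa it contains.
    clusters = {name: [name] for name in seqs}
    dist = {frozenset(pair): hamming(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

    def average_distance(c1, c2):
        members1, members2 = clusters[c1], clusters[c2]
        total = sum(dist[frozenset((m, n))] for m in members1 for n in members2)
        return total / (len(members1) * len(members2))

    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[f"({a},{b})"] = merged
    return next(iter(clusters)) + ";"

# Hypothetical aligned sequences for four taxa.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACCTTA",
    "D": "TCGAACCTTA",
}
tree = upgma(seqs)
raw_size = sum(len(s) for s in seqs.values())
print(tree)
print(f"{raw_size} sequence characters summarised in {len(tree)} tree characters")
```

The tree keeps only the grouping structure and discards the individual sites, so it is a lossy summary. With real genomes the ratio between raw data and topology is many orders of magnitude larger than in this toy.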

Big data is not always better
The more complete a large data set, the more compression is required to comprehend the data structure
A total census, for example, is reduced to trends, thresholds, averages, etc. Economic activity is reduced to Gross Domestic Products, etc.
In light of Big Data, even empirical adequacy becomes a statistical property, as error creeping into measurement is magnified
So, to understand, compress.
Mind, compression is not, ipso facto, comprehension.

Conclusions
Science is, in my view, an attempt to express the greatest number of empirical consequences in the fewest terms. Understanding thus sets and constitutes:
• Empirical adequacy (van Fraassen)
• Generalisation/-ability
• Causal explanation
which are pretty much de Regt’s three criteria restated.
The initial epistemic activities of measurement, analysis and kinematic generalisation all involve compression so that we may comprehend the things needing explanation
• Compression of data is a kind of pattern matching, on the basis of which ampliative inferences rest.
• Big data is not, in its own right, a good thing, so there!

Acknowledgements
For asking the question:
• Adam Ford
For discussion:
• The late John Collier
• Malte Ebach
• Ward Wheeler
• Marcus Hutter
For funding:
• ISHPSSB Oslo conference organisers
For ear nibbles: Clio, a muse
Thank you
