Descriptive Granularity - Building Foundations of Data Mining

DESCRIPTIVE GRANULARITY
Building Foundations of Data
Mining

In Memory of my Professors: Zdzislaw Pawlak,
Helena Rasiowa and Roman Sikorski

Anita Wasilewska
Computer Science Department
Stony Brook University
Stony Brook, NY
1

Part 1: INTRODUCTION

2

We all have scientiﬁc history;

All problems we work on have history;

It is important to trace history

of problems we work on;

We all build scientiﬁc history;

The future belongs to us,

and so does the past.

3

We all have scientiﬁc history;

Here is my LATEST history (of building Foun-
dations of Data Mining)

1995- 1998 I supervised PhD Thesis of

Ernestina Menasalvas, now Professor and a
Vice-Rector of Madrid Polytechnic.

We (with some others) went from building
models for concrete implementations (1996-
2002) to

developing a general language for Founda-
tions of Data Mining (2002 -2004) to

building a general foundational model for Data
Mining (2005- ).

4

It has been a slow process but finally a com-
munity and specialized conferences devel-
oped, books started to appear:

Foundations and Novel Approaches in Data
Mining, T.Y. Lin, S. Ohsuga, C. J. Liau,
and X. Hu , editors, Springer 2006,

Data Mining: Foundations and Practice, Tsau
Young Lin, Ying Xie, Anita Wasilewska,
Churn-Jung Liau, editors, Studies in Com-
putational Intelligence (SCI)118, Springer-
Verlag 2008,

and a field Foundations of Data Mining was
created.

We all build the scientific history and it takes
TIME and patience to do so.

5

Our work in Data Mining Foundations matured
and finally we were invited by T.Y.
Lin to write a 20-page entry about
our research in the Encyclopedia of Complexity
and Systems Science published by
Springer in 2008.

The Encyclopedia is Springer’s latest and
prestigious initiative with its Board of Editors
including, among others, Ahmed Zewail,
Nobel in Chemistry, Thomas Schelling,
Nobel in Economics, Richard E. Stearns,
1993 Turing Award, Pierre-Louis Lions, 1994
Fields Medal, and Lotfi Zadeh, IEEE Medal
of Honor.

All entries were by invitation only and the in-
clusion of our work shows the recognition
of the need for foundational studies in
newly developing domains.

6

All problems we work on have history

Short History of Foundational Studies

The origins of Foundational Studies can be
traced back to David Hilbert, a German
mathematician, recognized as one of the
most inﬂuential and universal mathemati-
cians of the 19th and early 20th centuries.

7

Hilbert Problems: In 1900 he proposed at the
Paris conference of the International Congress
of Mathematicians 23 problems for the fu-
ture century.

Several of them turned out to be very inﬂu-
ential for 20th century mathematics and
later Computer Science.

Of the cleanly-formulated Hilbert problems,

TEN problems: 3, 7, 10, 11, 13, 14, 17, 19,
20, and 21 have solutions that are ac-
cepted by consensus.

8

TWO Problems: 1, 2 are FOUNDATIONAL
Problems; 1, concerning the Continuum Hypothesis,
was solved by Cohen in 1963, and 2,
concerning the Consistency of Arithmetic, was
solved by Godel and Gentzen in 1936.

FIVE Problems: 5, 9, 15, 18, and 22 have
partial solutions,

FOUR problems: 4, 6, 16, and 23 are too
loosely formulated to ever be described
as solved.

TWO Problems: 8 (the Riemann Hypothesis,
which also includes the Goldbach conjecture)
and 12 are still OPEN, both
being in number theory.

9

Riemann hypothesis was proposed by Bern-
hard Riemann (1859)

It is a conjecture about the distribution of the
zeros of the Riemann zeta function which
states that all non-trivial zeros have real
part 1/2.

The Riemann hypothesis implies results about
the distribution of prime numbers that are
in some ways as good as possible.

Along with suitable generalizations, it is con-
sidered by some mathematicians to be the
most important unresolved problem in pure
mathematics.

10

Pierre Deligne proved in 1973 an analogue of the
Riemann Hypothesis for zeta functions of
varieties defined over finite fields.

The full version of the hypothesis remains un-
solved, although

computer calculations have shown that the
first 10 trillion zeros lie on the critical line.

11

Goldbach’s conjecture (1742) is one of the
oldest unsolved problems in number theory
and in all of mathematics. It states:

Every even integer greater than 2 can
be expressed as the sum of two primes

For example;

4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5,

10 = 7 + 3, or 5 + 5, 12 = 5 + 7, 14 = ....

T. Oliveira e Silva is running a distributed
computer search that has verified the conjecture
for n ≤ 1.609 × 10^18 and some higher
small ranges up to 4 × 10^18.

12

Hilbert Program

Hilbert proposed, in 1920, a research project
that became known as Hilbert’s Program.

1. He wanted mathematics to be formulated
on a solid and complete logical founda-
tion.

2. He believed that in principle this could be
done, by showing that all of mathematics
follows from a correctly-chosen ﬁnite sys-
tem of axioms and that some such axiom
system is provably consistent.

3. He also believed that one can have such
a system in which proofs of theorems can
be deduced automatically from the way
the theorems are built.

13

In 1931 Kurt Godel showed that Hilbert’s grand
plan 1. and 2. was impossible as stated.

Godel proved, in what is now called Godel’s
Incompleteness Theorem, that any non-contradictory
formal system comprehensive enough to include
at least arithmetic is incomplete and cannot
demonstrate its own consistency by way of its
own axioms.

In 1933-34 Gerhard Gentzen gave a positive
answer to 3. in the case of classical propositional
logic, and a partially positive answer in
the case of (semi-undecidable) predicate logic.

Nevertheless Hilbert’s and Godel’s work led
to the development of recursion theory
and then mathematical logic and foun-
dations of mathematics as autonomous
disciplines.

14

Gentzen’s work led to the development of Proof
Theory and Automated Theorem Prov-
ing as separate Mathematics and Computer
Science domains.

Godel inspired works of Alonzo Church and
Alan Turing that became the basis for
theoretical computer science and also
led to the further development of a unique
phenomenon called the Polish School of
Mathematics and later to the creation of
Foundational Studies in Computer Science.

15

Personal History: my Master’s Thesis in Computer
Science (under Pawlak and Rasiowa)
consisted of a solution of Gentzen’s conjecture
for Modal S4 and S5 Logics, and
consequently I also developed the world’s first
theorem prover for S4 Modal Logic in
1967.

As a result I spent the first 15 years of my
scientific life (before coming to the USA) working
in Proof Theory for non-classical logics,
formulated (as a pure mathematician)
a General Theory of Gentzen Type Formalizations
and established various results
about connections and relationships
between certain Classes of Logics, Formal
Languages and Theory of Programs
(as a computer scientist).

16

Polish School of Mathematics

The term Polish School of Mathematics refers
to groups of mathematicians of the 1920’s
and 1930’s working on common subjects.

The main two groups were situated in War-
saw and Lvov (now Lviv, the biggest city
in Western Ukraine).

We talk hence more speciﬁcally about War-
saw and Lvov Schools of Mathematics
and additionally of Warsaw-Lvov School
of Logic working in Warsaw.

17

Any list of important twentieth century math-
ematicians contains Polish names in a fre-
quency out of proportion to the size of the
country.

Poland was partitioned by Russia, Germany,
and Austria and was under foreign domination
for 123 years, from 1795 until the
end of World War I.

What was to become known as the Polish
School of Mathematics was possible be-
cause it was carefully planned, agreed
upon, and executed.

18

Independent Poland was created in 1918 and
the University of Warsaw re-opened with
Janiszewski, Mazurkiewicz, and Sierpinski
as professors of mathematics.

They chose logic, set theory, point-set topol-
ogy, and real functions as the area of
concentration.

The journal Fundamenta Mathematicae was
founded in 1920 and is still in print.

It was the ﬁrst specialized mathematical
journal in the world.

19

The choice of title was deliberate to reﬂect
that all areas published there were to be
connected with foundational studies.

It should be remembered that at the time
these areas had not yet received full
acceptance by the mathematical commu-
nity.

The choice reflected both insight and courage.

20

The notable mathematicians of the Warsaw
and Lvov Schools of Mathematics were,
among others, Stefan Banach, Stanislaw
Ulam and, after the war, Roman
Sikorski.

Stefan Banach was a self-taught mathematics
prodigy and the founder of modern functional
analysis.

Mathematical concepts named after Banach
include the Banach-Tarski paradox, Hahn-Banach
theorem, Banach-Steinhaus theorem,
Banach-Mazur game and Banach spaces.

21

Stanislaw Ulam emigrated to America just before
the war and became an American mathematician
of Polish-Jewish origins.

He participated in the Manhattan Project
and originated the Teller-Ulam design of
thermonuclear weapons.

He also invented nuclear pulse propulsion and
developed a number of mathematical tools
in number theory, set theory, ergodic the-
ory and algebraic topology.

22

Roman Sikorski’s reputation was established by
his outstanding results in Boolean algebras,
functional analysis, theories of distribution,
measure theory, general topology, descriptive
set theory, and in Algebraic Mathematical
Logic (in collaboration with
Rasiowa).

In axiomatic set theory, the Rasiowa-Sikorski
Lemma is one of the most fundamental
facts used in the technique of forcing.

23

The notable logicians of the Lvov-Warsaw
School of Logic were:

Alfred Tarski - since 1942 in Berkeley and
founder of American School of Founda-
tions of Mathematics,

Jan Lukasiewicz, Andrzej Mostowski, and
after the second world war Helena Ra-
siowa.

24

Helena Rasiowa became, in 1977, the founder
of Fundamenta Informaticae, the world’s first
journal specialized in foundations of computer
science.

The choice of the title Fundamenta Infor-
maticae was again deliberate.

It reflected not only the subject, but also
stressed that the new research area being
developed in Warsaw was a direct continuation
of the tradition of the Foundational
Studies of the Polish School of Mathematics.

25

Part 2:
DESCRIPTIVE GRANULARITY
A Model for Data Mining

26

We present here a formal syntax and seman-
tics for a notion of a descriptive granu-
larity.

We do so in terms of three abstract models:
Descriptive, Semantic, and Granular.

Descriptive model formalizes the syntactical
concepts and properties of the data min-
ing, or learning process.

Semantic model formalizes its semantical prop-
erties.

Granular model establishes a relationship be-
tween the Descriptive and Semantic mod-
els in terms of a formal satisfaction rela-
tion.

27

Data Mining - Informal Deﬁnition

One of the main goals of Data Mining is to
provide comprehensible descriptions of
information extracted from the data bases.

We are hence interested in building models
for descriptive data mining, i.e.
data mining whose main goal is to produce
a set of descriptions in a language easily
comprehensible to the user.

28

The descriptions come in different forms.

In case of classification problems it might be
a set of characteristic or discriminant rules,
it might be a decision tree or a neural net-
work with fixed set of weights.

In case of association analysis it is a set of
associations (frequent item sets), or asso-
ciation rules with accuracy parameters.

In case of cluster analysis it is a set of clus-
ters, each of which has its own description
and a cluster name.

29

In case of approximate classiﬁcation by the
Rough Set analysis it is usually a set of dis-
criminant or characteristic rules (with or
without accuracy parameters) or a set of
decision tables.

Data Mining results are usually presented to
the user in their descriptive, i.e. syntac-
tic form as it is the most natural form of
communication.

But the Data Mining process is deeply
semantical in its nature.

We hence build our Granular Model on two
levels: syntactic and semantic.

30

SYNTAX

We understand by syntax, or syntactical
concepts, simple relations among symbols
and expressions of formal symbolic languages.

A symbolic language is a pair
L = (A, E),
where A is an alphabet and E is the set of
expressions of L.

The expressions of formal languages, even if
created with a specific meaning in mind,
do not themselves carry any meaning; they
are just finite sequences of certain symbols.

The meaning is being assigned to them
by establishing a proper semantics.

31

SEMANTICS

Semantics for a given symbolic language L
assigns a specific interpretation in some
domain to all symbols and expressions
of the language.

It also involves related ideas such as truth
and model. They are called semantical
concepts to distinguish them from the syn-
tactical ones.

32

MODEL

The word model is used in many situations
and has many meanings but they all reﬂect
some parts, if not all, of its following formal
meaning.

A structure M , called also an interpretation,
is a model for a set E0 ⊆ E of expressions
of a formal language L if and only if every
expression E ∈ E0 is true in M .

33

All our Models are abstract structures that
allow us to formalize some general prop-
erties of Data Mining process and address
the semantics-syntax duality inherent to
any Data Mining process.

Moreover, it allows us to provide a formal def-
inition of a generalization and of Data
Mining as the process of information gen-
eralization.

34

The notion of generalization is deﬁned in
terms of granularity of steps of the pro-
cess.

Data is represented in the model in a form of
Knowledge Systems.

Each Knowledge System has a granularity
associated with it and the process changes,
or not, its granularity.

Granularity is crucial for defining some
notions and components of the model, hence
the Granular Model name.

35

Granular Model

Granular Model is a system
GM = (SM, DM, |=) where:

• SM is a Semantic Model;

• DM is a Descriptive Model;

• |= ⊆ P(U ) × E is called a satisfaction
relation, where U is the universe of SM
and E is the set of descriptions deﬁned
by the DM.

Satisfaction |= establishes a truth relationship
between the semantic model and the
descriptive model.

36

Semantic Model deﬁnition motivation.

The first step in any data mining procedure is to
drop the key attribute.

This step allows us to introduce similarities
in the database, as records no longer have their
unique identification.

The input into the data mining process is
hence always a data table obtained from
the target data by removal of the key attribute.

We call it a target data table.

37

As the next step we represent, following the Rough
Set model, our target data table as Pawlak’s
Information System with the universe U
by adding a new, non-attribute column for
the record names, i.e. objects of U. We
take this set U as the universe of our model
SM.

Why Information system?

We want to model Data Mining as a process
of generalization.

In order to model this process we first have
to define what it means, from the semantical
point of view, that one stage of the
process is more general than another.

38

The idea behind it is very simple. It is the
same as saying that (a + b)^2 = a^2 + 2ab + b^2
is a more general formula than the formula
(2 + 3)^2 = 2^2 + 2 · 2 · 3 + 3^2.

This means that one description (formula)
is more general than the other if it
describes more objects.

From the semantical point of view it means that
the data mining process consists of putting
objects (records) in sets of objects.

From the syntactical point of view the data
mining process consists of building descriptions
(in terms of (attribute, value) pairs)
of these sets of objects, with
some extra parameters, if needed.

39

To model a situation that allows us to talk
about descriptions of sets of records (ob-
jects) we extend the notion of Pawlak’s
model of information system to our notion
of Knowledge System.

The universe of a knowledge system con-
tains some subsets of U , i.e. elements of
P(U ).

For example a target data table (after pre-
processing) and the corresponding repre-
sentation by Pawlak’s information system,
and a knowledge system with universe
U of granularity one are as follows.

40

Target Data Table T0
a1      a2      a3
small   small   medium
medium  small   medium
small   small   medium
big     small   small
medium  medium  big
small   small   medium
big     small   small
medium  medium  big
small   small   medium
big     small   medium
medium  medium  small
small   small   medium
big     small   big
medium  medium  small

Target Information System I0
U    a1      a2      a3
x1   small   small   medium
x2   medium  small   medium
x3   small   small   medium
x4   big     small   small
x5   medium  medium  big
x6   small   small   medium
x7   big     small   small
x8   medium  medium  big
x9   small   small   medium
x10  big     small   medium
x11  medium  medium  small
x12  small   small   medium
x13  big     small   big
x14  medium  medium  small

41

Knowledge System of granularity one (all
objects are one-element sets) corresponding
to target table T0 is as follows.

Target Knowledge System K0
P1(U)   a1      a2      a3
{x1}    small   small   medium
{x2}    medium  small   medium
{x3}    small   small   medium
{x4}    big     small   small
{x5}    medium  medium  big
{x6}    small   small   medium
{x7}    big     small   small
{x8}    medium  medium  big
{x9}    small   small   medium
{x10}   big     small   medium
{x11}   medium  medium  small
{x12}   small   small   medium
{x13}   big     small   big
{x14}   medium  medium  small

42

Assume now that we have applied some algorithm
ALG1 and it has returned the following
set
D = {D1, D2, ..., D7}
of descriptions.

D1 : (a1 = s) ∩ (a2 = s) ∩ (a3 = m),

D2 : (a1 = m) ∩ (a2 = s) ∩ (a3 = m),

D3 : (a1 = m) ∩ (a2 = m) ∩ (a3 = b),

D4 : (a1 = m) ∩ (a2 = m) ∩ (a3 = s),

D5 : (a1 = b) ∩ (a2 = s) ∩ (a3 = s),

D6 : (a1 = b) ∩ (a2 = s) ∩ (a3 = m),

D7 : (a1 = b) ∩ (a2 = s) ∩ (a3 = b).

43

Questions

Q1 How well does this set of descriptions describe
our original data, i.e. how accurate is the
algorithm ALG1 we have used to find them?

Q2 How accurate is the knowledge we have
thus obtained out of our data?

The answer is formulated in terms of the tar-
get information system with the universe
U , and the sets S(D) deﬁned (after Pawlak)
for any description D ∈ D as follows.

S(D) = {x ∈ U : D}.

We call S(D) the truth set for D.

44

Intuitively, the sets

S(D) = {x ∈ U : D}
contain all records (i.e. their identifiers)
with the same description given in terms
of (attribute, value) pairs.

The descriptions do not need to utilize all
attributes of the target data, as is often
the case, and one of the ultimate goals of data
mining is to find descriptions with as few
attributes as possible.

45

In association analysis the descriptions can rep-
resent the frequent item sets.

For example, for a frequent three-itemset
D = i1i2i3, the truth set S(D) represents
all transactions that contain items i1, i2, i3.

In general, descriptions come in different forms,
depending on the data mining goal and application.

We define formally a general form of descriptions
as a part of the Descriptive Model.
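
As a minimal sketch of this reading of S(D), assume a small, purely hypothetical transaction table (the transaction ids t1..t5 and their contents are invented for illustration and do not come from the source). The truth set of an itemset is then the set of transactions that contain all of its items:

```python
# Hypothetical transactions (illustrative only): id -> set of items.
transactions = {
    "t1": {"i1", "i2", "i3", "i4"},
    "t2": {"i1", "i2"},
    "t3": {"i1", "i2", "i3"},
    "t4": {"i2", "i3"},
    "t5": {"i1", "i2", "i3", "i5"},
}

def truth_set(itemset, db):
    """S(D) for an itemset D: ids of all transactions containing every item of D."""
    return {tid for tid, items in db.items() if itemset <= items}

D = {"i1", "i2", "i3"}
S_D = truth_set(D, transactions)
support = len(S_D) / len(transactions)  # relative frequency of the itemset
```

Here the frequency of the itemset is simply the size of its truth set divided by the number of transactions.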

46

For the target data and descriptions Di ∈ D
presented in the above examples the sets
S(Di) are as follows.

S1 = S(D1 ) = {x ∈ U : D1 } = {x1 , x3 , x6 , x9 , x12 },

S2 = S(D2 ) = {x ∈ U : D2 } = {x2 },

S3 = S(D3 ) = {x ∈ U : D3 } = {x5 , x8 },

S4 = S(D4 ) = {x ∈ U : D4 } = {x11 , x14 },

S5 = S(D5 ) = {x ∈ U : D5 } = {x4 , x7 },

S6 = S(D6 ) = {x ∈ U : D6 } = {x10 },

S7 = S(D7 ) = {x ∈ U : D7 } = {x13 }.
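
The truth sets above can be recomputed mechanically. The following is a minimal sketch, assuming that row i of the target data table T0 is the record x_i and that objects are represented by their indices 1..14; the names `T0`, `U`, and `truth_set` are chosen here for illustration only.

```python
# Target data table T0; row i is the record x_i.
# Values are abbreviated as in the descriptions: s = small, m = medium, b = big.
T0 = [
    ("s", "s", "m"), ("m", "s", "m"), ("s", "s", "m"), ("b", "s", "s"),
    ("m", "m", "b"), ("s", "s", "m"), ("b", "s", "s"), ("m", "m", "b"),
    ("s", "s", "m"), ("b", "s", "m"), ("m", "m", "s"), ("s", "s", "m"),
    ("b", "s", "b"), ("m", "m", "s"),
]
ATTRS = ("a1", "a2", "a3")

# Information system: object index i -> {attribute: value}.
U = {i + 1: dict(zip(ATTRS, row)) for i, row in enumerate(T0)}

def truth_set(description, universe=U):
    """S(D): indices of all objects satisfying every (attribute = value) pair of D."""
    return {x for x, rec in universe.items()
            if all(rec[a] == v for a, v in description.items())}

# Descriptions D1..D7 returned by ALG1, as conjunctions of (attribute = value).
D = {
    "D1": {"a1": "s", "a2": "s", "a3": "m"},
    "D2": {"a1": "m", "a2": "s", "a3": "m"},
    "D3": {"a1": "m", "a2": "m", "a3": "b"},
    "D4": {"a1": "m", "a2": "m", "a3": "s"},
    "D5": {"a1": "b", "a2": "s", "a3": "s"},
    "D6": {"a1": "b", "a2": "s", "a3": "m"},
    "D7": {"a1": "b", "a2": "s", "a3": "b"},
}
truth_sets = {name: truth_set(d) for name, d in D.items()}
```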

47

We represent our results in a form of a Knowledge
System as follows.

Resulting Knowledge System K1
P(U)                     a1  a2  a3
{x1, x3, x6, x9, x12}    s   s   m
{x2}                     m   s   m
{x5, x8}                 m   m   b
{x11, x14}               m   m   s
{x4, x7}                 b   s   s
{x10}                    b   s   m
{x13}                    b   s   b

P(U)  a1  a2  a3
S1    s   s   m
S2    m   s   m
S3    m   m   b
S4    m   m   s
S5    b   s   s
S6    b   s   m
S7    b   s   b

48

The representation of data mining results in
a form of a knowledge system allows us to
deﬁne how good is the knowledge ob-
tained by a given algorithm.

In our case the knowledge obtained describes
100% of our target data as

S1 ∪ S2 ∪ S3 ∪ ... ∪ S7 = {x1, x2, ..., x14} = U.

Observe that the sets S1, ..., S7 are also disjoint
and non-empty, i.e. they form a partition
of the universe U.

We deﬁne such knowledge as exact.

49

Moreover, we can see that the resulting system
K1 is more general than the input
data K0 because its granularity is higher
than the granularity of K0.

Deﬁnition: Granularity of a knowledge sys-
tem is the maximum of cardinality of its
granules, i.e. elements of its universe.

The granularity of all Target Knowledge Sys-
tems is one.

Granularity of K1 is

max{|S1|, ..., |S7|} = max{5, 1, 2, 2, 2, 1, 1} = 5.
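
A minimal sketch of these two notions, assuming a knowledge system is represented only by its list of granules (sets of object indices, with 1..14 standing for x1..x14); the function names are illustrative and not part of the model.

```python
U = set(range(1, 15))  # object indices 1..14 stand for x1..x14

# Granules of the two knowledge systems from the example.
K0 = [{x} for x in sorted(U)]
K1 = [{1, 3, 6, 9, 12}, {2}, {5, 8}, {11, 14}, {4, 7}, {10}, {13}]

def granularity(granules):
    """Maximum cardinality over the granules of a knowledge system."""
    return max(len(s) for s in granules)

def is_exact(granules, universe):
    """True if the granules are non-empty, pairwise disjoint, and cover the universe."""
    if any(not s for s in granules):
        return False
    covered = set().union(*granules)
    # Sizes add up to the size of the union exactly when no element is shared.
    disjoint = sum(len(s) for s in granules) == len(covered)
    return disjoint and covered == set(universe)
```

With these definitions, `granularity(K0)` is 1 and `granularity(K1)` is 5, and both systems are exact.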

50

Now assume that we have applied to our target
data T0 (represented by K0) another
algorithm ALG2 and it returned two descriptions
D1, D2 under the condition that we
need only descriptions of length 2 and
with frequency ≥ 30%. The descriptions
are:

D1 : (a1 = s) ∩ (a2 = s),

D2 : (a2 = s) ∩ (a3 = m).

Now we evaluate:

S1 = S(D1 ) = {x1 , x3 , x6 , x9 , x12 },

S2 = S(D2 ) = {x1 , x2 , x3 , x6 , x9 , x10 , x12 }.

51

Incorporating the algorithm parameters im-
posed by the ALG2 into our Knowledge
System we obtain the following table.

Resulting Knowledge System K2
P(U)  a1  a2  a3  # of attr  frequency
S1    s   s   -   2          36%
S2    -   s   m   2          50%

The sets S1, S2 do not form a partition of the
universe U as S1 ∩ S2 ≠ ∅ and moreover,
S1 ∪ S2 ≠ U.

The knowledge obtained by the algorithm ALG2
is hence not exact.

It describes only 50% of the target data
(S1 ∪ S2 = S2 contains 7 of the 14 records), and
what is described is described following certain
(frequency) conditions.

Of course K2 is more general than K0.
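
The ALG2 example can be checked with a short sketch under the same assumptions as before (row i of T0 is the record x_i, objects are indexed 1..14; all names are illustrative). It computes the truth sets of the two length-2 descriptions, their frequencies, and the fraction of U covered by S1 ∪ S2 on T0 as listed.

```python
# Target data table T0; row i is the record x_i (s = small, m = medium, b = big).
T0 = [
    ("s", "s", "m"), ("m", "s", "m"), ("s", "s", "m"), ("b", "s", "s"),
    ("m", "m", "b"), ("s", "s", "m"), ("b", "s", "s"), ("m", "m", "b"),
    ("s", "s", "m"), ("b", "s", "m"), ("m", "m", "s"), ("s", "s", "m"),
    ("b", "s", "b"), ("m", "m", "s"),
]
ATTRS = ("a1", "a2", "a3")
U = {i + 1: dict(zip(ATTRS, row)) for i, row in enumerate(T0)}

def truth_set(description):
    """S(D): indices of objects satisfying every (attribute = value) pair of D."""
    return {x for x, rec in U.items()
            if all(rec[a] == v for a, v in description.items())}

S1 = truth_set({"a1": "s", "a2": "s"})   # D1 of ALG2
S2 = truth_set({"a2": "s", "a3": "m"})   # D2 of ALG2

n = len(U)
frequency = {"S1": round(100 * len(S1) / n), "S2": round(100 * len(S2) / n)}
overlap = S1 & S2                 # non-empty, so S1 and S2 are not disjoint
coverage = len(S1 | S2) / n       # fraction of U described by K2
is_partition = not overlap and (S1 | S2) == set(U)
```

Since the granules overlap and do not cover U, `is_partition` is false, which is the formal reason K2 is not exact.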

52

The algorithm ALG2 generalized the target
data, even if in an incomplete way.

The formal deﬁnitions of Information System,
Knowledge and Target Knowledge Systems,
and their granularity and exactness are as
follows.

53

Knowledge System is an extension of the fol-
lowing notion of Pawlak’s information sys-
tem.

Information System is a system

I = (U, A, VA, f),
where U ≠ ∅ is called a set of objects,
A ≠ ∅, VA ≠ ∅ are called the set of attributes
and the set of values of attributes, respectively,
f is called an information function and
f : U × A −→ VA

54

A knowledge system based on the informa-
tion system

I = (U, A, VA, f )
is a system

KI = (P(U ), A, E, VA, VE , g)

where

E is a ﬁnite set of knowledge attributes (k-
attributes) such that A ∩ E = ∅.

VE is a ﬁnite set of values of k- attributes.

55

g is a partial function called knowledge
information function (k-function)

g : P(U) × (A ∪ E) −→ (VA ∪ VE)
such that

(i) g | (⋃_{x∈U} {x} × A) = f

(ii) ∀S ∈ P(U) ∀a ∈ A ((S, a) ∈ dom(g) ⇒ g(S, a) ∈ VA)

(iii) ∀S ∈ P(U) ∀e ∈ E ((S, e) ∈ dom(g) ⇒ g(S, e) ∈ VE)

56

We use the above notion of knowledge system
to define the granules of the universe
and the granularity of the system, and hence
later, the granularity of the data mining
process.

Granule: Any set S ∈ P(U ) i.e. S ⊆ U is
called a granule of U .

Granularity of S: The cardinality |S| of S is
called a granularity of S.

Granule Universe: The set

GrK = {S ∈ P(U) : ∃b ∈ (E ∪ A) ((S, b) ∈ dom(g))}
is called a granule universe of KI.

Granularity of K: The number grK = max{|S| :
S ∈ GrK} is called the granularity of K.

57

A knowledge system K = (P(U ), A, E, VA, VE , g)
is called exact if and only if all its granules
GrK form a partition of the universe U .
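
To make the formal definitions concrete, here is a minimal sketch that stores the partial k-function g of the system K1 as a Python dictionary keyed by (granule, attribute) pairs. This representation is an assumption made for illustration: the set E of k-attributes is taken to be empty, objects are indexed 1..14, and the attribute values follow the descriptions D1..D7.

```python
# Attribute values of K1 per granule (s = small, m = medium, b = big).
rows = {
    frozenset({1, 3, 6, 9, 12}): ("s", "s", "m"),
    frozenset({2}): ("m", "s", "m"),
    frozenset({5, 8}): ("m", "m", "b"),
    frozenset({11, 14}): ("m", "m", "s"),
    frozenset({4, 7}): ("b", "s", "s"),
    frozenset({10}): ("b", "s", "m"),
    frozenset({13}): ("b", "s", "b"),
}
A = ("a1", "a2", "a3")

# Partial k-function g: defined only on the (S, a) pairs stored in the dict.
g = {(S, a): v for S, vals in rows.items() for a, v in zip(A, vals)}

def granule_universe(g):
    """Gr_K: all granules S on which g is defined for at least one attribute."""
    return {S for (S, _attr) in g}

def granularity(g):
    """gr_K: the maximum cardinality of a granule in Gr_K."""
    return max(len(S) for S in granule_universe(g))

def is_exact(g, universe):
    """True if the granules of Gr_K form a partition of the universe."""
    granules = granule_universe(g)
    covered = set().union(*granules)
    pairwise_disjoint = sum(len(S) for S in granules) == len(covered)
    return all(granules) and pairwise_disjoint and covered == set(universe)
```

Keying the dictionary by `frozenset` keeps granules hashable, and leaving a pair out of the dictionary models the points where the partial function g is undefined.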

Operators: In our Model we represent data
mining algorithms as certain operators.

For example our ALG1 is represented in the
semantic model by an operator p1 acting
on some subset of a set K of knowledge
systems, such that

p1(K0) = K1.

ALG2 is represented in the model by an operator
p2 also acting on some (possibly different)
subset of the set K of knowledge
systems, such that

p2(K0) = K2.

58

We put all the above observations into a for-
mal notion of a semantic model.

Semantic Model is a system

SM = (P(U), K, G),
where:

• U ≠ ∅ is the universe;

• K ≠ ∅ is a set of knowledge systems,
called also data mining process states;

• G ≠ ∅ is the set of operators;

• Each operator p ∈ G is a partial function
on the set of all data mining process
states, i.e. p : K −→ K.

59

The semantic model is always being built for
a given application.

The target data is represented first in the form
of the target information system with the universe
U, and then in the form of the target
knowledge system K0, as we showed in our
examples.

60

The semantic model based on our examples
is as follows.

SM = (P(U), K, G),
where:

• U = {x1, x2, ..., x14};

• K = {K0, K1, K2};

• G = {p1, p2};

• Each pi ∈ G (for i = 1, 2) is a partial
function pi : K −→ K, such that
p1(K0) = K1, p2(K0) = K2.

61

Data Mining as Generalization

We model data mining as a process of gen-
eralization in terms of the generalization
relation based on a notion of granularity
and generalization operators.

Definition: A relation ⪯ ⊆ K × K is called a
generalization relation if the following
condition holds for any K, K′ ∈ K.

K ⪯ K′ if and only if grK ≤ grK′,
where grK denotes the granularity of K.

62

Observe that for K0, K1, K2 from our exam-
ples grK0 = 1 ≤ 5 = grK1 ≤ 7 = grK2 , and
the system K2 is the most general.

But at the same time K1 is exact and K2 is
not exact, so we have a trade oﬀ between
exactness and generality.

Definition: an operator g ∈ G is called a
generalization operator if for any K, K′ ∈ K
such that g(K) = K′, we have that

K ⪯ K′.

Observe that both operators p1, p2 in our ex-
ample are generalization operators.
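
The generalization relation and the check that p1, p2 are generalization operators can be sketched as follows. This is an illustration under simplifying assumptions: each knowledge system is reduced to a tuple of its granules (objects indexed 1..14), and an operator is stored as a dictionary, which makes it a partial function that is undefined on every state not listed.

```python
# Knowledge systems from the examples, reduced to their granule universes.
K0 = tuple(frozenset({x}) for x in range(1, 15))
K1 = (frozenset({1, 3, 6, 9, 12}), frozenset({2}), frozenset({5, 8}),
      frozenset({11, 14}), frozenset({4, 7}), frozenset({10}), frozenset({13}))
K2 = (frozenset({1, 3, 6, 9, 12}), frozenset({1, 2, 3, 6, 9, 10, 12}))

def gr(K):
    """Granularity of a knowledge system: maximum granule cardinality."""
    return max(len(S) for S in K)

def generalizes(K, K_prime):
    """The generalization relation: K precedes K' iff gr(K) <= gr(K')."""
    return gr(K) <= gr(K_prime)

# Operators as partial functions on states (undefined outside the dict keys).
p1 = {K0: K1}
p2 = {K0: K2}

def is_generalization_operator(p):
    """True if every defined step p(K) = K' satisfies K precedes K'."""
    return all(generalizes(K, K_out) for K, K_out in p.items())
```

An operator mapping K1 back to K0 would fail this check, since it lowers the granularity from 5 to 1.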

63

Data Mining Operators G

In the data mining process the preprocessing and
data mining proper are disjoint,
inclusive/exclusive categories.

The preprocessing is an integral and very im-
portant stage of the data mining process
and needs as careful analysis as the data
mining proper.

Our framework allows us to distinguish two disjoint
classes of operators: the preprocessing
operators Gprep and data mining proper
operators Gdm and we put

G = Gprep ∪ Gdm.

64

We also provide detailed formal definitions,
their motivation, and discussion of these
two classes.

Data Mining and preprocessing operators define
different kinds of generalizations.

The model presented in our examples didn’t
include the preprocessing stage; it used the
data mining proper operators only.

65

The main idea behind the concept of the
operator is to capture not only the fact
that data mining techniques generalize the
data but also to categorize existing meth-
ods.

We deﬁne within our model three classes of
data mining operators: classiﬁcation Gclass,
clustering Gclust, and association Gassoc.

We don’t include in our analysis purely sta-
tistical methods like regression, etc...

66

We prove the following theorem.

Theorem Let Gclass, Gclust and Gassoc be the
sets of all classiﬁcation, clustering, and as-
sociation operators, respectively.

The following conditions hold.

(1) Gclass ≠ Gclust ≠ Gassoc

(2) Gassoc ∩ Gclass = ∅,

(3) Gassoc ∩ Gclust = ∅.

67

Data Mining Process

Deﬁnition Any sequence

K1, K2, ....Kn (n ≥ 1)
of data mining states is called a data pre-
processing process, if there is a prepro-
cessing operator G ∈ Gprep, such that

G(Ki) = Ki+1, i = 1, 2, ...n − 1.

Deﬁnition Any sequence

K1, K2, ....Kn (n ≥ 1)
of data mining states is called a data min-
ing proper process , if there is a data
mining proper operator G ∈ Gdm, such
that

G(Ki) = Ki+1, i = 1, 2, ...n − 1.

68

The data mining process consists of the pre-
processing process (that might be empty)
and the data mining proper process.

We know that the sets Gprep and Gdm are disjoint.
This justifies the following definition.

Definition Data mining process is any
sequence

K1, K2, ....Kn (n ≥ 1)
of data mining states, such that

K1, ..Ki (0 ≤ i ≤ n)
is a preprocessing process and

Ki+1, ...., Kn
is a data mining proper process.

69

Granular Model
Syntax- Semantic Duality of Data Mining





Satisfaction |= establishes relationship between
the semantic model and the descriptive model.

70

Descriptive Model

With any Semantic Model SM = (P(U), K, G)
we associate its descriptive counterpart
defined below.

A Descriptive Model is a system

DM = ( L, E, DK ),
where:

L = ( A, E ) is called a descriptive lan-
guage;

A is a countably inﬁnite set called the alpha-
bet;

E ≠ ∅ and E ⊆ A∗ is the set of descriptive
expressions of L;

71

DK ≠ ∅ and DK ⊆ P(E) is a set of descriptions
of knowledge states.

As in the case of the semantic model, we build the
descriptive model for a given application.

We define here only a general form of the
model.

We assume however, that whatever the application
is, the descriptions are always built
in terms of attributes and values of the
attributes, some logical connectives, some
predicates and some extra parameters, if
needed.

The commonly used descriptions have the form
(a = v) to denote that the attribute a has
a value v, but one might also use, as it is
often done, a predicate form a(v) or a(x, v)
instead.

72

For example, a neural network with its nodes
and weights can be seen as a formal de-
scription (in an appropriate descriptive lan-
guage), and the knowledge states would
represent changes in parameters during the
neural network training process.

The model we build here is a model for, what
we call a descriptive data mining, i.e. the
data mining for which the goal of the data
mining process is to produce a set of de-
scriptions in a language easily comprehen-
sible to the user.

For that purpose in the model we identify the
decision tree constructed by the classiﬁca-
tion by Decision Tree algorithm with the
set of discriminant rules obtained from the
tree.

73





We define the Satisfaction |= component of
the Granular Model GM in the following
stages.

Stage1 For each K ∈ K, we define its own
descriptive language LK = (AK, EK).

74

Stage2 For each K ∈ K, and descriptive expression
D ∈ EK, we define what it means
that D is satisfied in K; i.e. we define
a satisfaction relation |=K.

Stage3 For each K ∈ K, and descriptive expression
D ∈ EK, we define what it means
that D is true in K, i.e. |=K D.

Stage4 We use the satisfaction relation |=K
to define, for each K ∈ K, the set DK ⊆
P(EK ) of descriptions of its own knowl-
edge.

Stage5 We use the languages LK to define
the descriptive language L.

Stage6 We use the descriptive expressions
EK of LK to define the set E of descriptive
expressions of L.

Stage7 We use the satisfaction relations |=K
to define the satisfaction relation |= of
the Granular Model GM.

75

Part 3: TRACING THE
HISTORY
Mathematics Genealogy Project
genealogy.math.ndsu.nodak.edu

76

We all have a history

We are all mathematicians

Mission Statement of the Mathematics Ge-
nealogy Project defines a mathematician
as follows.

“... Throughout this project when we use
the word ‘mathematics’ or ‘mathematician’
we mean that word in a very inclusive
sense. Thus, all relevant data from
statistics, computer science, or operations
research is welcome....”

Computer Science classification within the
project is: Mathematics Subject Classification:
68 Computer Science.

77

The Genealogy Project solicits information from
all schools who participate in the devel-
opment of research level mathematics and
from all individuals who may know desired
information. It means Computer Science
as well.

For them, and the history, we are all math-
ematicians.

78

Below are some links (sequences of connected
people) for a computer scientist.

Any two people in the sequence are listed in
order PhD student, Adviser.

If a person has more than one adviser the adviser
is preceded with a number; i.e.

adviser 1 is listed as 1. adviser Name,

adviser 2 is listed as 2. adviser Name, etc...

79

A mathematician would say:

For any element A of the sequence, if A
has n > 1 advisers, then for any
1 ≤ k ≤ n, adviser k is listed as k. Name
of adviser k,
and the number in front of the name is
omitted otherwise.
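
A programmer might state the same rule as a short function. This is only a sketch of the numbering convention described above; the function name `list_advisers` is hypothetical and is not part of the Mathematics Genealogy Project.

```python
from typing import List


def list_advisers(advisers: List[str]) -> List[str]:
    """Format adviser names: numbered "k. Name" only when there is more than one."""
    if len(advisers) > 1:
        # Several advisers: prefix each with its position, starting from 1.
        return [f"{k}. {name}" for k, name in enumerate(advisers, start=1)]
    # Zero or one adviser: the number in front of the name is omitted.
    return list(advisers)


# Andrzej Mostowski appears in the links below with two advisers.
print(list_advisers(["Kazimierz Kuratowski", "Alfred Tarski"]))
# Helena Rasiowa appears with a single adviser, so no number is shown.
print(list_advisers(["Andrzej Mostowski"]))
```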

80

Link to Nicolaus Copernicus
(Mikolaj Kopernik)
He has 1598 descendants

Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. Warsaw
University, 1950, Andrzej Mostowski,
Ph.D. Warsaw University, 1938, 2. Alfred
Tarski, Ph.D. Warsaw University, 1924,
Stanislaw Lesniewski, Ph.D. University of
Lvov, 1912, Kazimierz Twardowski, Ph.D.
Universitat Wien, 1891, Franz Clemens
Brentano, Ph.D. Eberhard Karls Universi-
tat, Tubingen 1862, 2. Friedrich Adolf
Trendelenburg, Dr. phil. Universitat Leipzig,
1826, 1. Georg Ludwig Konig, Artium
Liberalium Magister, Georg August Univer-
sitat, Gottingen, 1790, Christian Heyne,
Magister Juris, Universitat Leipzig, 1752,

81

1. Johann August Bach, Magister philosophiae,
Universitat Leipzig, 1744, 1. Christian Kustner,
Magister philosophiae, Universitat Leipzig,
1742, Johann Ernesti, Magister philosophiae,
Universitat Leipzig, 1730, Johann Gesner,
Magister artium, Friedrich Schiller Univer-
sitat Jena, 1715, Johann Buddeus, Magis-
ter artium, Martin Luther Universitat, Halle
Wittenberg, 1687, Michael Walther, Jr.,
Magister artium, Theol. Dr., Martin Luther
Universitat, Halle Wittenberg, 1661, 1687,
2. Johann Quenstedt, Magister artium, Theol.
Dr., Universitat Helmstedt, Martin Luther
Universitat, Halle Wittenberg, 1643, 1644,
Christoph Notnagel, Magister artium, Mar-
tin Luther Universitat, Halle Wittenberg,
1630, Ambrosius Rhodius, Magister artium,
Medicinae Dr., Martin Luther Universitat,
Halle Wittenberg, 1600, 1610,

82

1. Melchior Jostel, Magister artium, Medicinae
Dr., Martin Luther Universitat, Halle
Wittenberg, 1583, 1600, 1. Valentin Otto,
Magister artium, Martin Luther Universi-
tat, Halle Wittenberg, 1570, Georg Joachim
Rheticus, Magister artium, Martin Luther
Universitat, Halle Wittenberg 1535,

2. Nicolaus Copernicus, Juris utriusque
Doctor, Uniwersytet Jagiellonski (Cracow
Jagiellonian University), Universita
di Bologna, Universita degli Studi di
Ferrara, Universita di Padova, 1499,
Poland-Italy,

2. Domenico Novara da Ferrara, Universita di
Firenze, 1483, 1. Johannes Regiomon-
tanus, Magister artium, Universitat Leipzig,
Universitat Wien, 1457,

83

Georg von Peuerbach, Magister artium, Uni-
versitat Wien, 1440, Johannes von Gmunden,
Magister artium, Universitat Wien, 1406,
Heinrich von Langenstein, Magister artium,
Theol. Dr., Universite de Paris, 1363,
1375, unknown.

Heinrich von Langenstein, 1375, is my "oldest"
ancestor.

THERE ARE 3 more lines of ancestry; also
interesting, if not so illustrious. Here they
are.

84

Link to Gottfried Leibniz
(54209 descendants),
Immanuel Kant
(2176 descendants), and
Desiderius Erasmus of Rotterdam
(57416 descendants)

Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. Warsaw
University, 1950, Andrzej Mostowski,
Ph.D. Warsaw University, 1938, 2. Alfred
Tarski, Ph.D. Warsaw University, 1924,
Stanislaw Lesniewski, Ph.D. University of
Lvov, 1912, Kazimierz Twardowski, Ph.D.
Universitat Wien, 1891, Franz Clemens
Brentano, Ph.D. Eberhard Karls Univer-
sitat, Tubingen 1862, 2. Friedrich Adolf
Trendelenburg, Dr. Phil. Universitat Leipzig,
1826, 2. Karl Reinhold, PhD.,

85

Immanuel Kant, Ph.D. Universitat Konigs-
berg 1770,

Martin Knutzen, Dr. Phil. Universitat Konigs-
berg, 1732, Christian von Wolﬀ, Dr. phil.,
Universitat Leipzig, 1700,

2. Gottfried Leibniz, Dr. jur. Universitat
Altdorf, 1666,

2. Christiaan Huygens, Artium Liberalium
Magister, Juris utriusque Doctor, Universiteit
Leiden, Universite d’Angers, 1647, 1655,
Frans van Schooten, Jr., Artium Liberal-
ium Magister, Universiteit Leiden, 1635,
Jacobus Golius, Artium Liberalium Magis-
ter, Philosophiae Doctor Universiteit Lei-
den, 1612, 1621, 1. Willebrord (Snel van
Royen) Snellius, Artium Liberalium Magister,
Universiteit Leiden, 1607, 2. Rudolph
(Snel van Royen) Snellius, Artium Liberalium
Magister, Universitat zu Koln, Ruprecht
Karls Universitat Heidelberg, 1572, 1. Valen-
tine Naibod, Magister Artium, Martin Luther
Universitat, Halle Wittenberg, Universitat
Erfurt, Erasmus Reinhold, Magister Artium,
Martin Luther Universitat, Halle Witten-
berg, 1535, Jakob Milich, Liberalium Ar-
tium Magister, Med. Dr., Albert Ludwigs
Universitat Freiburg, Breisgau, Universitat
Wien, 1520, 1524,

Desiderius Erasmus Roterodamus (sometimes
known as Desiderius Erasmus of Rot-
terdam), University of Paris, Theologiae
Baccalaureus, College de Montaigu, 1497,

Jan Standonck, Magister Artium, Theol. Dr.,
College Sainte-Barbe, College de Montaigu,
1474, 1490, unknown

Link to Pierre-Simon Laplace
(50295 descendants) and
Jean Le Rond d’Alembert

Ph.D. Warsaw University, 1938, 1. Kaz-
imierz Kuratowski, Ph.D. Warsaw Uni-
versity, 1921, 1. Stefan Mazurkiewicz,
Ph.D. University of Lvov, 1913, Waclaw
Sierpinski, Ph.D. Uniwersytet Jagiellonski,
1906, 1. Stanislaw Zaremba, Ph.D. Uni-
versite Paris IV-Sorbonne, 1889, Gaston
Darboux, Ph.D. Ecole Normale Superieure,
Paris, 1866, Michel Chasles, Ph.D. Ecole
Polytechnique, 1814, Simeon Poisson, Ph.D.
Ecole Polytechnique, 1800, 2. Pierre-Simon
Laplace, Ph.D., Jean Le Rond d’Alembert,
unknown

87

Link to Emile Borel
(2506 descendants),
Leonhard Euler
(52555 descendants)

Ph.D. Warsaw University, 1938, 2. Zyg-
munt Janiszewski, Ph.D. Ecole Normale
Superieure Paris, 1911, Henri Lebesgue,
Ph.D. Universite Henri Poincare Nancy 1,
1902, Emile Borel, Ph.D. Ecole Normale
Superieure, Paris, 1893, Gaston Darboux,
Ph.D. Ecole Normale Superieure, Paris, 1866,
Michel Chasles, Ph.D., Ecole Polytechnique,
1814, Simeon Poisson, Ph.D. Ecole Poly-
technique, 1800,

88

1. Joseph Lagrange, no degree, student of
Leonhard Euler, Ph.D. Universitat Basel,
1726, 1. Johann Bernoulli, Dr. med.
Universitat Basel, 1694, Jacob Bernoulli, Dr.
hab. Sci. Universitat Basel, 1684, Gottfried
Wilhelm Leibniz, Dr. jur. Universitat
Altdorf, 1666, 1. Erhard Weigel, Ph.D.
Universitat Leipzig, 1650, unknown.

89

Link to Andrei Markov
(4824 descendants), and
Pafnuty Chebyshev (5964 descendants)

Ph.D. Warsaw University, 1938, 1. Kazimierz
Kuratowski, Ph.D. Warsaw University, 1921,
1. Stefan Mazurkiewicz,
Ph.D. University of Lvov, 1913, Waclaw
Sierpinski, Ph.D. Uniwersytet Jagiellonski,
1906, 2. Georgy Fedoseevich Voronoy,
Ph.D. University of St. Petersburg, 1896,
Andrei Markov, Ph.D. University of St.
Petersburg, 1884, Pafnuty Chebyshev,
Ph.D. University of St. Petersburg, 1849,
Nikolai Dmitrievich Brashman, Ph.D. Moscow
State University, 1834, Joseph Johann von
Littrow, Ph.D., unknown

90

MY PhD COUSINS include

Kurt Goedel

Alan Turing

Alonzo Church

Roman Sikorski

Zdzislaw Pawlak

and many others... I am sure some of them
are in this room!

91

In the Stony Brook CS Department I traced 10
of them.

WE ARE ALL A BIG SCIENTIFIC
FAMILY!

92
