DESCRIPTIVE GRANULARITY
  Building Foundations of Data
             Mining




In Memory of my Professors: Zdzislaw Pawlak,
Helena Rasiowa and Roman Sikorski




           Anita Wasilewska
     Computer Science Department
        Stony Brook University
           Stony Brook, NY
Part 1:   INTRODUCTION




We all have scientific history;


All problems we work on have history;


It is important to trace history


of problems we work on;


We all build scientific history;


The future belongs to us,


and so does the past.



We all have scientific history;


Here is my LATEST history (of building Foundations
   of Data Mining):


1995-1998: I supervised the PhD thesis of


Ernestina Menasalvas, now Professor and a
   Vice-Rector of Madrid Polytechnic.


We (with some others) went from building
  models for concrete implementations (1996-2002) to


developing a general language for Foundations
  of Data Mining (2002-2004) to


building a general foundational model for Data
   Mining (2005- ).

It has been a slow process, but finally a community
    and specialized conferences developed, and
    books started to appear:


Foundations and Novel Approaches in Data
  Mining, T.Y. Lin, S. Ohsuga, C. J. Liau,
  and X. Hu , editors, Springer 2006,


 Data Mining: Foundations and Practice, Tsau
  Young Lin, Ying Xie, Anita Wasilewska,
  Churn-Jung Liau, editors, Studies in Com-
  putational Intelligence (SCI)118, Springer-
  Verlag 2008,


and the field of Foundations of Data Mining was
  created.


We all build the scientific history and it takes
  TIME and patience to do so.

Our work in Data Mining Foundations matured,
  and finally we were invited by T.Y. Lin
  to write a 20-page entry about
  our research in the Encyclopedia of Complexity
  and Systems Science, published by
  Springer in 2008.

   The Encyclopedia is Springer’s latest and
   prestigious initiative, with its Board of Editors
   including, among others, Ahmed Zewail,
   Nobel in Chemistry, Thomas Schelling,
   Nobel in Economics, Richard E. Stearns,
   1993 Turing Award, Pierre-Louis Lions, 1994
   Fields Medal, and Lotfi Zadeh, IEEE Medal
   of Honor.


All entries were by invitation only, and the inclusion
    of our work shows the recognition
    of the need for foundational studies in
    newly developing domains.

All problems we work on have history


Short History of Foundational Studies


The origins of Foundational Studies can be
  traced back to David Hilbert, a German
  mathematician, recognized as one of the
  most influential and universal mathemati-
  cians of the 19th and early 20th centuries.




Hilbert Problems: In 1900, at the International Congress
    of Mathematicians in Paris, he proposed 23 problems
    for the coming century.


Several of them turned out to be very influ-
   ential for 20th century mathematics and
   later Computer Science.


Of the cleanly-formulated Hilbert problems,


TEN problems: 3, 7, 10, 11, 13, 14, 17, 19,
  20, and 21 have solutions that are ac-
  cepted by consensus.




TWO Problems: 1, 2 are FOUNDATIONAL
  Problems; 1, concerning the Continuum Hypothesis,
  was solved by Cohen in 1963, and 2,
  concerning the Consistency of Arithmetic, was
  solved by Gödel and Gentzen in 1936.


FIVE Problems: 5, 9, 15, 18, and 22 have
   partial solutions.


FOUR Problems: 4, 6, 16, and 23 are too
  loosely formulated to ever be described
  as solved.


TWO Problems: 8 (which includes the Riemann Hypothesis
  and the Goldbach conjecture) and 12 are
  still OPEN; both are in number theory.


The Riemann hypothesis was proposed by Bernhard
   Riemann (1859).


It is a conjecture about the distribution of the
     zeros of the Riemann zeta function which
     states that all non-trivial zeros have real
     part 1/2.


The Riemann hypothesis implies results about
  the distribution of prime numbers that are
  in some ways as good as possible.


Along with suitable generalizations, it is con-
   sidered by some mathematicians to be the
   most important unresolved problem in pure
   mathematics.


In 1973 Pierre Deligne proved an analogue of the
   Riemann Hypothesis for zeta functions of
   varieties defined over finite fields.


The full version of the hypothesis remains un-
  solved, although


computer calculations have shown that the
   first 10 trillion zeros lie on the critical line.




Goldbach’s conjecture (1742) is one of the
   oldest unsolved problems in number theory
   and in all of mathematics. It states:

    Every even integer greater than 2 can
   be expressed as the sum of two primes


For example:

      4 = 2 + 2,   6 = 3 + 3,   8 = 3 + 5,

   10 = 7 + 3, or 5 + 5,   12 = 5 + 7, 14 = ....


T. Oliveira e Silva is running a distributed computer
   search that has verified the conjecture
   for n ≤ 1.609 × 10^18 and some higher
   small ranges up to 4 × 10^18.
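
The small cases listed above can be reproduced mechanically. The following is a minimal sketch, not the distributed search mentioned on this slide; the helper names is_prime and goldbach_pairs are illustrative and not taken from any library.

```python
def is_prime(n):
    """Trial division; adequate only for the tiny inputs used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_pairs(n):
    """All ways to write the even number n as p + q with primes p <= q."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]


for n in (4, 6, 8, 10, 12, 14):
    print(n, goldbach_pairs(n))

# Every even number from 4 to 1000 has at least one decomposition.
assert all(goldbach_pairs(n) for n in range(4, 1001, 2))
```

Such a check only confirms the conjecture on a finite range; it says nothing about a proof.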



Hilbert Program


In 1920 Hilbert proposed a research project
    that became known as Hilbert’s Program.


1. He wanted mathematics to be formulated
   on a solid and complete logical founda-
   tion.


2. He believed that in principle this could be
   done, by showing that all of mathematics
   follows from a correctly-chosen finite sys-
   tem of axioms and that some such axiom
   system is provably consistent.


3. He also believed that one can have such
   a system in which proofs of theorems can
   be deduced automatically from the way
   the theorems are built.

In 1931 Kurt Gödel showed that Hilbert’s grand
    plan 1. and 2. was impossible as stated.


Gödel proved, in what is now called Gödel’s
  Incompleteness Theorem, that any non-contradictory
  formal system comprehensive enough to include at
  least arithmetic cannot demonstrate its completeness
  by way of its own axioms.


In 1933-34 Gerhard Gentzen gave a positive
    answer to 3. in the case of classical propositional
    logic, and a partially positive answer in
    the case of (semi-undecidable) predicate logic.


Nevertheless, Hilbert’s and Gödel’s work led
   to the development of recursion theory
   and then mathematical logic and foundations
   of mathematics as autonomous
   disciplines.

Gentzen’s work led to the development of Proof
   Theory and Automated Theorem Prov-
   ing as separate Mathematics and Computer
   Science domains.


Gödel inspired the work of Alonzo Church and
  Alan Turing that became the basis for
  theoretical computer science and also
  led to the further development of a unique
  phenomenon called the Polish School of
  Mathematics and later to the creation of
  Foundational Studies in Computer Science.




Personal History: my Master’s Thesis in Computer
   Science (under Pawlak and Rasiowa)
   consisted of a solution of Gentzen’s conjecture
   for the Modal S4 and S5 Logics, and
   consequently I also developed the world’s first
   theorem prover for S4 Modal Logic in
   1967.


As a result I spent the first 15 years of my
  scientific life (before coming to the USA) working
  in Proof Theory for non-classical logics,
  formulated (as a pure mathematician)
  a General Theory of Gentzen Type Formalizations,
  and established various results
  about connections and relationships
  between certain Classes of Logics, Formal
  Languages and Theory of Programs
  (as a computer scientist).


Polish School of Mathematics




The term Polish School of Mathematics refers
  to groups of mathematicians of the 1920’s
  and 1930’s working on common subjects.


The main two groups were situated in War-
  saw and Lvov (now Lviv, the biggest city
  in Western Ukraine).


Hence we speak more specifically of the Warsaw
  and Lvov Schools of Mathematics,
  and additionally of the Lvov-Warsaw School
  of Logic, working in Warsaw.



Any list of important twentieth century mathematicians
   contains Polish names with a frequency
   out of proportion to the size of the
   country.


Poland was partitioned by Russia, Germany,
   and Austria and was under foreign domination
   for 123 years, from 1795 until the
   end of World War I.


What was to become known as the Polish
 School of Mathematics was possible be-
 cause it was carefully planned, agreed
 upon, and executed.




Independent Poland was created in 1918 and
   the University of Warsaw re-opened with
   Janiszewski, Mazurkiewicz, and Sierpinski
   as professors of mathematics.


They chose logic, set theory, point-set topology,
  and real functions as the areas of
  concentration.


The journal Fundamenta Mathematicae was
  founded in 1920 and is still in print.


It was the first specialized mathematical
   journal in the world.




The choice of title was deliberate to reflect
  that all areas published there were to be
  connected with foundational studies.


It should be remembered that at the time
   these areas had not yet received full
   acceptance by the mathematical commu-
   nity.


The choice reflected both insight and courage.




The notable mathematicians of the Warsaw
  and Lvov Schools of Mathematics were,
  among others, Stefan Banach, Stanislaw
  Ulam and, after the war, Roman
  Sikorski.


Stefan Banach was a self-taught mathematics
   prodigy and the founder of modern functional
   analysis.


Mathematical concepts named after Banach
  include the Banach-Tarski paradox, the Hahn-Banach
  theorem, the Banach-Steinhaus theorem,
  the Banach-Mazur game and Banach spaces.




Stanislaw Ulam emigrated to America just before
   the war and became an American mathematician
   of Polish-Jewish origins.


He participated in the Manhattan Project
   and originated the Teller-Ulam design of
   thermonuclear weapons.


He also invented nuclear pulse propulsion and
   developed a number of mathematical tools
   in number theory, set theory, ergodic the-
   ory and algebraic topology.




Roman Sikorski’s reputation was established by
  his outstanding results in Boolean algebras,
  functional analysis, theory of distributions,
  measure theory, general topology, descriptive
  set theory, and in Algebraic Mathematical
  Logic (in collaboration with
  Rasiowa).


 In axiomatic set theory, the Rasiowa-Sikorski
    Lemma is one of the most fundamental
    facts used in the technique of forcing.




The notable logicians of the Lvov-Warsaw
  School of Logic were:


Alfred Tarski, in Berkeley since 1942 and
    founder of the American School of Foundations
    of Mathematics,


 Jan Lukasiewicz, Andrzej Mostowski, and,
   after the Second World War, Helena
   Rasiowa.




In 1977 Helena Rasiowa became the founder
   of Fundamenta Informaticae, the world’s first
   journal specialized in foundations of computer
   science.


The choice of the title Fundamenta Informaticae
  was again deliberate.


It reflected not only the subject, but also
    stressed that the new research area being
    developed in Warsaw was a direct continuation
    of the tradition of the Foundational
    Studies of the Polish School of Mathematics.




Part 2:
DESCRIPTIVE GRANULARITY
  A Model for Data Mining




We present here a formal syntax and seman-
  tics for a notion of a descriptive granu-
  larity.


We do so in terms of three abstract models:
  Descriptive, Semantic, and Granular.


The Descriptive model formalizes the syntactical
   concepts and properties of the data mining,
   or learning, process.


The Semantic model formalizes its semantical
  properties.


The Granular model establishes a relationship
   between the Descriptive and Semantic models
   in terms of a formal satisfaction relation.

Data Mining - Informal Definition



One of the main goals of Data Mining is to
  provide comprehensible descriptions of
  information extracted from databases.


We are hence interested in building models
  for descriptive data mining, i.e.
  data mining whose main goal is to produce
  a set of descriptions in a language easily
  comprehensible to the user.




The descriptions come in different forms.


In the case of classification problems it might be
    a set of characteristic or discriminant rules,
    a decision tree, or a neural network
    with a fixed set of weights.


In the case of association analysis it is a set of
    associations (frequent item sets), or association
    rules with accuracy parameters.


In the case of cluster analysis it is a set of clusters,
    each of which has its own description
    and a cluster name.




In the case of approximate classification by
    Rough Set analysis it is usually a set of discriminant
    or characteristic rules (with or
    without accuracy parameters) or a set of
    decision tables.


Data Mining results are usually presented to
   the user in their descriptive, i.e. syntac-
   tic form as it is the most natural form of
   communication.


    But the Data Mining process is deeply
    semantical in its nature.


 We hence build our Granular Model on two
   levels: syntactic and semantic.


SYNTAX


By syntax, or syntactical concepts, we
  understand simple relations among symbols
  and expressions of formal symbolic languages.


A symbolic language is a pair
                   L = (A, E),
   where A is an alphabet and E is the set of
   expressions of L.


The expressions of formal languages, even if
  created with a specific meaning in mind,
  do not themselves carry any meaning; they
  are just finite sequences of certain symbols.


   Meaning is assigned to them
   by establishing a proper semantics.

SEMANTICS


Semantics for a given symbolic language L
  assigns a specific interpretation in some
  domain to all symbols and expressions
  of the language.


It also involves related ideas such as truth
   and model. They are called semantical
   concepts to distinguish them from the syn-
   tactical ones.




MODEL


The word model is used in many situations
  and has many meanings but they all reflect
  some parts, if not all, of its following formal
  meaning.


A structure M , called also an interpretation,
   is a model for a set E0 ⊆ E of expressions
   of a formal language L if and only if every
   expression E ∈ E0 is true in M .




All our Models are abstract structures that
    allow us to formalize some general properties
    of the Data Mining process and address
    the semantics-syntax duality inherent in
    any Data Mining process.


Moreover, they allow us to provide a formal
  definition of generalization and of Data
  Mining as a process of information
  generalization.




The notion of generalization is defined in
  terms of granularity of steps of the pro-
  cess.


Data is represented in the model in a form of
   Knowledge Systems.


Each Knowledge System has a granularity
  associated with it and the process changes,
  or not, its granularity.


Granularity is crucial for defining some
   notions and components of the model, hence
   the name Granular Model.




Granular Model




Granular Model is a system
   GM = (SM, DM, |=) where:

    • SM is a Semantic Model;

    • DM is a Descriptive Model;

    • |= ⊆ P(U ) × E is called a satisfaction
      relation, where U is the universe of SM
      and E is the set of descriptions defined
      by the DM.


Satisfaction |= establishes a truth relationship
   between the semantic model and the
   descriptive model.

Semantic Model definition motivation.


The first step in any data mining procedure is to
    drop the key attribute.


This step allows us to introduce similarities
   in the database, as records no longer have their
   unique identification.


The input into the data mining process is
  hence always a data table obtained from
  the target data by removal of the key
  attribute.


We call it a target data table.



As the next step we represent, following the Rough
   Set model, our target data table as Pawlak’s
   Information System with the universe U
   by adding a new, non-attribute column for
   the record names, i.e. the objects of U. We
   take this set U as the universe of our model
   SM.


Why Information system?


We want to model Data Mining as a process
  of generalization.


In order to model this process we first have
    to define what it means, from the semantical
    point of view, that one stage of the
    process is more general than another.


The idea behind it is very simple. It is the
  same as saying that (a + b)^2 = a^2 + 2ab + b^2
  is a more general formula than the formula
  (2 + 3)^2 = 2^2 + 2 · 2 · 3 + 3^2.


This means that one description (formula)
  is more general than another if it
  describes more objects.


From the semantical point of view it means that
   the data mining process consists of putting
   objects (records) in sets of objects.


From the syntactical point of view the data mining
  process consists of building descriptions
  (in terms of (attribute, value of attribute)
  pairs) of these sets of objects, with
  some extra parameters, if needed.

To model a situation that allows us to talk
   about descriptions of sets of records (ob-
   jects) we extend the notion of Pawlak’s
   model of information system to our notion
   of Knowledge System.


The universe of a knowledge system con-
  tains some subsets of U , i.e. elements of
  P(U ).


For example a target data table (after pre-
   processing) and the corresponding repre-
   sentation by Pawlak’s information system,
   and a knowledge system with universe
   U of granularity one are as follows.




Target Data Table T0
   a1        a2        a3
   small     small     medium
   medium    small     medium
   small     small     medium
   big       small     small
   medium    medium    big
   small     small     medium
   big       small     small
   medium    medium    big
   small     small     medium
   big       small     medium
   medium    medium    small
   small     small     medium
   big       small     big
   medium    medium    small


   Target Information System I0
U      a1        a2        a3
x1     small     small     medium
x2     medium    small     medium
x3     small     small     medium
x4     big       small     small
x5     medium    medium    big
x6     small     small     medium
x7     big       small     small
x8     medium    medium    big
x9     small     small     medium
x10    big       small     medium
x11    medium    medium    small
x12    small     small     medium
x13    big       small     big
x14    medium    medium    small



Knowledge System of granularity one (all
  objects are one element sets) correspond-
  ing to target table T0 is as follows.

               Target Knowledge System K0
          P1(U)      a1        a2        a3
          {x1}       small     small     medium
          {x2}       medium    small     medium
          {x3}       small     small     medium
          {x4}       big       small     small
          {x5}       medium    medium    big
          {x6}       small     small     medium
          {x7}       big       small     small
          {x8}       medium    medium    big
          {x9}       small     small     medium
          {x10}      big       small     medium
          {x11}      medium    medium    small
          {x12}      small     small     medium
          {x13}      big       small     big
          {x14}      medium    medium    small
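
The three representations above differ only in how records are named. As a minimal sketch (the variable names T0, I0, K0 follow the slides; the dictionary encoding and the abbreviations s, m, b for small, medium, big are assumptions made for brevity), they can be built as follows:

```python
# Target data table T0, one string per record (a1 a2 a3),
# abbreviated as s = small, m = medium, b = big.
T0 = "ssm msm ssm bss mmb ssm bss mmb ssm bsm mms ssm bsb mms".split()
ATTRS = ("a1", "a2", "a3")

# Information system I0: name the records x1..x14
# (this is the added non-attribute column).
I0 = {f"x{i}": dict(zip(ATTRS, row)) for i, row in enumerate(T0, start=1)}

# Knowledge system K0 of granularity one:
# every object becomes a one-element granule.
K0 = {frozenset({x}): values for x, values in I0.items()}

print(len(I0), I0["x10"])
print(max(len(granule) for granule in K0))
```

Using frozenset for granules keeps them hashable, so they can serve as dictionary keys.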




Assume now that we have applied some algorithm
   ALG1 and it has returned the following
   set
                      D = {D1, D2, ..., D7}
       of descriptions.

D1 :    (a1 = s) ∩ (a2 = s) ∩ (a3 = m),



D2 :    (a1 = m) ∩ (a2 = s) ∩ (a3 = m),



D3 :    (a1 = m) ∩ (a2 = m) ∩ (a3 = b),



D4 :    (a1 = m) ∩ (a2 = m) ∩ (a3 = s),



D5 :    (a1 = b) ∩ (a2 = s) ∩ (a3 = s),



D6 :    (a1 = b) ∩ (a2 = s) ∩ (a3 = m),



D7 :    (a1 = b) ∩ (a2 = s) ∩ (a3 = b).


Questions


Q1 How well does this set of descriptions describe
  our original data, i.e. how accurate is the
  algorithm ALG1 we have used to find them?


Q2 How accurate is the knowledge we have
  thus obtained out of our data?


The answer is formulated in terms of the tar-
  get information system with the universe
  U , and the sets S(D) defined (after Pawlak)
  for any description D ∈ D as follows.


               S(D) = {x ∈ U : D}.


We call S(D) the truth set for D.

Intuitively, the sets

               S(D) = {x ∈ U : D}
   contain all records (i.e. their identifiers)
   with the same description given in terms
   of (attribute, value of attribute) pairs.


The descriptions do not need to utilize all attributes
  of the target data, as is often
  the case, and one of the ultimate goals of data
  mining is to find descriptions with as few
  attributes as possible.




In association analysis the descriptions can rep-
    resent the frequent item sets.


For example, for a frequent three-itemset
   D = i1i2i3, the truth set S(D) represents
   all transactions that contain items i1, i2, i3.
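
A small sketch of the itemset case: the truth set of an itemset is the set of transactions that contain every item in it. The transaction table below is hypothetical toy data invented for illustration; only the item names i1, i2, i3 come from the slide.

```python
# Hypothetical transactions (toy data, not from the presentation).
transactions = {
    "t1": {"i1", "i2", "i3"},
    "t2": {"i1", "i3"},
    "t3": {"i1", "i2", "i3", "i4"},
    "t4": {"i2", "i4"},
}


def truth_set(itemset):
    """S(D): identifiers of all transactions containing every item of D."""
    return {t for t, items in transactions.items() if itemset <= items}


print(sorted(truth_set({"i1", "i2", "i3"})))
```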


In general, descriptions come in different forms,
    depending on the data mining goal and
    application.


We formally define a general form of descriptions
  as a part of the Descriptive Model.




For the target data and descriptions Di ∈ D
  presented in the above examples the sets
  S(Di) are as follows.


S1 = S(D1 ) = {x ∈ U : D1 } = {x1 , x3 , x6 , x9 , x12 },



S2 = S(D2 ) = {x ∈ U : D2 } = {x2 },



S3 = S(D3 ) = {x ∈ U : D3 } = {x5 , x8 },



S4 = S(D4 ) = {x ∈ U : D4 } = {x11 , x14 },



S5 = S(D5 ) = {x ∈ U : D5 } = {x4 , x7 },



S6 = S(D6 ) = {x ∈ U : D6 } = {x10 },



S7 = S(D7 ) = {x ∈ U : D7 } = {x13 }.
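
The truth sets listed above can be recomputed directly from the target information system. This is a sketch under an assumed encoding: a description is a dictionary of (attribute, value) pairs, values are abbreviated s, m, b, and the function name truth_set is illustrative.

```python
T0 = "ssm msm ssm bss mmb ssm bss mmb ssm bsm mms ssm bsb mms".split()
ATTRS = ("a1", "a2", "a3")
I0 = {f"x{i}": dict(zip(ATTRS, row)) for i, row in enumerate(T0, start=1)}


def truth_set(description):
    """S(D) = {x in U : D}: records matching every (attribute, value) pair of D."""
    return {x for x, record in I0.items()
            if all(record[a] == v for a, v in description.items())}


# D1..D7 as returned by ALG1, in the same order as on the slide.
CODES = "ssm msm mmb mms bss bsm bsb".split()
D = {f"D{i}": dict(zip(ATTRS, code)) for i, code in enumerate(CODES, start=1)}

for name, description in D.items():
    members = sorted(truth_set(description), key=lambda x: int(x[1:]))
    print(name, members)
```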




We represent our results in a form of a Knowl-
  edge System as follows.

             Resulting Knowledge System K1
             P(U)                      a1  a2  a3
             {x1, x3, x6, x9, x12}     s   s   m
             {x2}                      m   s   m
             {x5, x8}                  m   m   b
             {x11, x14}                m   m   s
             {x4, x7}                  b   s   s
             {x10}                     b   s   m
             {x13}                     b   s   b




                   P(U)     a1   a2   a3
                   S1       s    s    m
                   S2       m    s    m
                   S3       m    m    b
                   S4       m    m    s
                   S5       b    s    s
                   S6       b    s    m
                   S7       b    s    b




The representation of data mining results in
  the form of a knowledge system allows us to
  define how good the knowledge obtained
  by a given algorithm is.


In our case the knowledge obtained describes
    100% of our target data as

   S1 ∪ S2 ∪ S3 ∪ ... ∪ S7 = {x1, x2, ..., x14} = U.


Observe that the sets S1, ..., S7 are also disjoint
  and non-empty, i.e. they form a partition
  of the universe U.


We define such knowledge as exact.
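
Exactness is therefore a partition test on the granules. A minimal sketch (the function name is_partition is an assumption, not part of the model):

```python
from itertools import combinations

U = {f"x{i}" for i in range(1, 15)}
S = [
    {"x1", "x3", "x6", "x9", "x12"}, {"x2"}, {"x5", "x8"},
    {"x11", "x14"}, {"x4", "x7"}, {"x10"}, {"x13"},
]


def is_partition(granules, universe):
    """True if the granules are non-empty, pairwise disjoint, and cover the universe."""
    nonempty = all(granules)  # an empty set is falsy
    disjoint = all(a.isdisjoint(b) for a, b in combinations(granules, 2))
    covers = set().union(*granules) == universe
    return nonempty and disjoint and covers


print(is_partition(S, U))
```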



Moreover, we can see that the resulting system
  K1 is more general than the input
  data K0 because its granularity is higher
  than the granularity of K0.


Definition: Granularity of a knowledge system
  is the maximum cardinality of its
  granules, i.e. elements of its universe.


The granularity of all Target Knowledge Sys-
  tems is one.


Granularity of K1 is

   max{|S1|, ..., |S7|} = max{5, 1, 2} = 5.
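
The same computation as a short sketch, with granules written as Python sets (the function name granularity is illustrative):

```python
def granularity(granules):
    """Maximum cardinality over the granules of a knowledge system."""
    return max(len(granule) for granule in granules)


K0_granules = [{f"x{i}"} for i in range(1, 15)]
K1_granules = [
    {"x1", "x3", "x6", "x9", "x12"}, {"x2"}, {"x5", "x8"},
    {"x11", "x14"}, {"x4", "x7"}, {"x10"}, {"x13"},
]

print(granularity(K0_granules), granularity(K1_granules))
```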




Now assume that we have applied to our target
  data T0 (represented by K0) another
  algorithm ALG2 and it returned two descriptions
  D1, D2 under the condition that we
  need only descriptions of length 2 and
  with frequency ≥ 30%. The descriptions
  are:


D1 : (a1 = s) ∩ (a2 = s),




D2 : (a2 = s) ∩ (a3 = m).



Now we evaluate:


S1 = S(D1 ) = {x1 , x3 , x6 , x9 , x12 },




S2 = S(D2 ) = {x1 , x2 , x3 , x6 , x9 , x10 , x12 }.




Incorporating the algorithm parameters im-
   posed by the ALG2 into our Knowledge
   System we obtain the following table.

                  Resulting Knowledge System K2
          P(U )    a1   a2   a3   #of attr   frequency
          S1        s    s    -      2          36%
          S2        -    s   m       2          50%



The sets S1, S2 do not form a partition of the
  universe U as S1 ∩ S2 ≠ ∅ and, moreover,
  S1 ∪ S2 ≠ U.


The knowledge obtained by the algorithm ALG2
  is hence not exact.


It describes only 50% of the target data
    (S1 ∪ S2 = S2 contains 7 of the 14 records), and
    what is described is described following certain
    (frequency) conditions.
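
As a check, the following sketch recomputes the two truth sets, their frequencies, and the share of records covered by S1 ∪ S2 directly from the target table. The encoding (s, m, b abbreviations, dictionary descriptions) is an assumption made for brevity.

```python
T0 = "ssm msm ssm bss mmb ssm bss mmb ssm bsm mms ssm bsb mms".split()
ATTRS = ("a1", "a2", "a3")
I0 = {f"x{i}": dict(zip(ATTRS, row)) for i, row in enumerate(T0, start=1)}
U = set(I0)


def truth_set(description):
    """Records matching every (attribute, value) pair of the description."""
    return {x for x, record in I0.items()
            if all(record[a] == v for a, v in description.items())}


S1 = truth_set({"a1": "s", "a2": "s"})   # D1 of ALG2
S2 = truth_set({"a2": "s", "a3": "m"})   # D2 of ALG2

for name, S in (("S1", S1), ("S2", S2)):
    print(name, len(S), f"{100 * len(S) / len(U):.0f}%")

print("overlap:", bool(S1 & S2))          # granules are not disjoint
print("covers U:", S1 | S2 == U)          # granules do not cover U
print("share covered:", len(S1 | S2) / len(U))
```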


Of course K2 is more general than K0.

The algorithm ALG2 generalized the target
  data, even if in an incomplete way.


The formal definitions of Information System,
  Knowledge and Target Knowledge Systems,
  and their granularity and exactness are as
  follows.




Knowledge System is an extension of the fol-
  lowing notion of Pawlak’s information sys-
  tem.


Information System is a system

                I = (U, A, VA, f),
   where U ≠ ∅ is called a set of objects,
   A ≠ ∅, VA ≠ ∅ are called the set of attributes
   and the set of values of attributes,
   respectively,
   f is called an information function and
   f : U × A −→ VA
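
One possible way to encode this definition, shown as a sketch: the class name InformationSystem and the choice of a dictionary for f are assumptions, and the example uses only the first two records of the target table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationSystem:
    U: frozenset   # objects
    A: frozenset   # attributes
    VA: frozenset  # values of attributes
    f: dict        # information function: (object, attribute) -> value

    def __post_init__(self):
        if not (self.U and self.A and self.VA):
            raise ValueError("U, A and VA must be non-empty")
        # f must be a total function on U x A ...
        if set(self.f) != {(x, a) for x in self.U for a in self.A}:
            raise ValueError("f must be defined exactly on U x A")
        # ... with values in VA.
        if not set(self.f.values()) <= self.VA:
            raise ValueError("f must take values in VA")


I = InformationSystem(
    U=frozenset({"x1", "x2"}),
    A=frozenset({"a1", "a2", "a3"}),
    VA=frozenset({"small", "medium", "big"}),
    f={("x1", "a1"): "small", ("x1", "a2"): "small", ("x1", "a3"): "medium",
       ("x2", "a1"): "medium", ("x2", "a2"): "small", ("x2", "a3"): "medium"},
)
print(I.f[("x2", "a1")])
```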




A knowledge system based on the informa-
   tion system

                  I = (U, A, VA, f )
    is a system


           KI = (P(U ), A, E, VA, VE , g)

   where


E is a finite set of knowledge attributes (k-
   attributes) such that A ∩ E = ∅.


VE is a finite set of values of k- attributes.




g is a partial function called knowledge in-
    formation function(k-function)


         g : P(U ) × (A ∪ E) −→ (VA ∪ VE )
   such that


(i) g | (⋃x∈U {x} × A) = f


(ii) ∀S∈P(U )∀a∈A((S, a) ∈ dom(g) ⇒ g(S, a) ∈
    VA)


(iii) ∀S∈P(U )∀e∈E ((S, e) ∈ dom(g) ⇒ g(S, e) ∈
     VE )
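
Conditions (i)-(iii) can be checked mechanically once g is stored as a partial map. The sketch below is one reading of the definition, not the author's implementation: g is a dictionary keyed by (granule, attribute), pairs absent from the dictionary are outside dom(g), and condition (i) is read as "g is defined on every one-element granule and agrees with f there". The k-attribute name "frequency" and all data are hypothetical.

```python
def is_knowledge_system(U, A, E, VA, VE, f, g):
    """Check the k-function g, given as a dict {(granule, attribute): value}."""
    # dom(g) must consist of pairs (S, b) with S a subset of U and b in A ∪ E.
    well_typed = all(S <= U and b in A | E for (S, b) in g)
    # (i) on one-element granules and attributes from A, g coincides with f
    cond_i = all(g.get((frozenset({x}), a)) == f[(x, a)] for x in U for a in A)
    # (ii) attribute columns take values in VA
    cond_ii = all(v in VA for (S, b), v in g.items() if b in A)
    # (iii) k-attribute columns take values in VE
    cond_iii = all(v in VE for (S, b), v in g.items() if b in E)
    return not (A & E) and well_typed and cond_i and cond_ii and cond_iii


U, A, E = {"x1", "x2"}, {"a1"}, {"frequency"}
VA, VE = {"s", "m"}, {"100%"}
f = {("x1", "a1"): "s", ("x2", "a1"): "m"}
g = {
    (frozenset({"x1"}), "a1"): "s",
    (frozenset({"x2"}), "a1"): "m",
    (frozenset({"x1", "x2"}), "frequency"): "100%",
}
print(is_knowledge_system(U, A, E, VA, VE, f, g))
```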




We use the above notion of knowledge system
  to define the granules of the universe
  and the granularity of the system, and hence,
  later, the granularity of the data mining
  process.


Granule:    Any set S ∈ P(U ) i.e. S ⊆ U is
   called a granule of U .


Granularity of S: The cardinality |S| of S is
   called a granularity of S.


Granule Universe:    The set

   GrK = {S ∈ P(U) : ∃b ∈ (E ∪ A)((S, b) ∈ dom(g))}
   is called a granule universe of KI.


Granularity of K: A number grK = max{|S| :
   S ∈ GrK } is called a granularity of K.
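
With g stored as a dictionary keyed by (granule, attribute), the granule universe is simply the set of granules that occur in dom(g). A sketch using the K2 example; the dictionary encoding and the k-attribute name "frequency" are assumptions.

```python
def granule_universe(g):
    """GrK: all granules S for which g is defined on at least one attribute."""
    return {S for (S, b) in g}


def granularity(g):
    """grK: the maximum cardinality of a granule in the granule universe."""
    return max(len(S) for S in granule_universe(g))


S1 = frozenset({"x1", "x3", "x6", "x9", "x12"})
S2 = frozenset({"x1", "x2", "x3", "x6", "x9", "x10", "x12"})

# k-function of K2: undefined where the table shows "-".
g_K2 = {
    (S1, "a1"): "s", (S1, "a2"): "s", (S1, "frequency"): "36%",
    (S2, "a2"): "s", (S2, "a3"): "m", (S2, "frequency"): "50%",
}

print(len(granule_universe(g_K2)), granularity(g_K2))
```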

A knowledge system K = (P(U ), A, E, VA, VE , g)
  is called exact if and only if all its granules
  GrK form a partition of the universe U .


Operators: In our Model we represent data
  mining algorithms as certain operators.


For example our ALG1 is represented in the
   semantic model by an operator p1 acting
   on some subset of a set K of knowledge
   systems, such that

                 p1(K0) = K1.


ALG2 is represented in the model by an operator
  p2, also acting on some (possibly different)
  subset of the set K of knowledge
  systems, such that

                 p2(K0) = K2.

We put all the above observations into a for-
  mal notion of a semantic model.


Semantic Model is a system

              SM = (P(U), K, G),
   where:

    • U ≠ ∅ is the universe;

    • K ≠ ∅ is a set of knowledge systems,
      called also data mining process states;

    • G ≠ ∅ is the set of operators;

    • Each operator p ∈ G is a partial function
      on the set of all data mining process
      states, i.e. p : K −→ K.



The semantic model is always being built for
  a given application.


The target data is represented first in the form
  of the target information system with the universe
  U, and then in the form of the target
  knowledge system K0, as we showed in our
  examples.




The semantic model based on our examples
  is as follows.

              SM = (P(U), K, G),
   where:

    • U = {x1, x2, ..., x14};

    • K = {K0, K1, K2};

    • G = {p1, p2};

    • Each pi ∈ G (for i = 1, 2) is a partial
      function pi : K −→ K, such that
      p1(K0) = K1, p2(K0) = K2.
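
A compact sketch of this example model: states are referred to by name only, and each operator is a partial function stored as a dictionary that is defined on some states and undefined elsewhere. The helper name apply_operator is an assumption.

```python
# Data mining process states, referred to by name.
K = {"K0", "K1", "K2"}

# Each operator is a partial function K -> K; here both are defined on K0 only.
G = {
    "p1": {"K0": "K1"},
    "p2": {"K0": "K2"},
}


def apply_operator(name, state):
    """Apply operator `name` to `state`; None signals that it is undefined there."""
    return G[name].get(state)


print(apply_operator("p1", "K0"), apply_operator("p2", "K0"))
print(apply_operator("p2", "K1"))
```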




Data Mining as Generalization


We model data mining as a process of gen-
  eralization in terms of the generalization
  relation based on a notion of granularity
  and generalization operators.


Definition: A relation ⪯ ⊆ K × K is called a
  generalization relation if the following
  condition holds for any K, K′ ∈ K:

       K ⪯ K′   if and only if grK ≤ grK′,
   where grK denotes the granularity of K.




Observe that for K0, K1, K2 from our exam-
  ples grK0 = 1 ≤ 5 = grK1 ≤ 7 = grK2 , and
  the system K2 is the most general.

   But at the same time K1 is exact and K2 is
   not exact, so we have a trade off between
   exactness and generality.


Definition: an operator g ∈ G is called a generalization
  operator if for any K, K′ ∈ K
  such that g(K) = K′, we have that

                    K ⪯ K′.


Observe that both operators p1, p2 in our ex-
  ample are generalization operators.
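
Both definitions reduce to comparing granularities. A minimal sketch using the granularities 1, 5, 7 of K0, K1, K2 from the examples; the function names and the counterexample operator are hypothetical.

```python
GRANULARITY = {"K0": 1, "K1": 5, "K2": 7}


def generalizes(k, k_prime):
    """The generalization relation: k is related to k_prime iff gr(k) <= gr(k_prime)."""
    return GRANULARITY[k] <= GRANULARITY[k_prime]


def is_generalization_operator(op):
    """An operator (a dict state -> state) never decreases granularity."""
    return all(generalizes(src, dst) for src, dst in op.items())


p1 = {"K0": "K1"}
p2 = {"K0": "K2"}
specialize = {"K2": "K0"}   # hypothetical operator, for contrast only

print(is_generalization_operator(p1), is_generalization_operator(p2))
print(is_generalization_operator(specialize))
```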




Data Mining Operators G


In the data mining process, preprocessing and
   data mining proper are disjoint,
   inclusive/exclusive categories.


The preprocessing is an integral and very im-
  portant stage of the data mining process
  and needs as careful analysis as the data
  mining proper.


Our framework allows us to distinguish two disjoint
  classes of operators: the preprocessing
  operators Gprep and the data mining proper
  operators Gdm, and we put

                G = Gprep ∪ Gdm.


We also provide detailed formal definitions,
  their motivation, and a discussion of these
  two classes.


Data mining and preprocessing operators
  define different kinds of generalizations.


The model presented in our examples didn’t
  include the preprocessing stage; it used the
  data mining proper operators only.




                                        65
The main idea behind the concept of the
  operator is not only to capture the fact
  that data mining techniques generalize the
  data, but also to categorize existing
  methods.


We define within our model three classes of
  data mining operators: classification Gclass,
  clustering Gclust, and association Gassoc.


We do not include purely statistical methods,
  such as regression, in our analysis.




                                        66
We prove the following theorem.


Theorem Let Gclass, Gclust and Gassoc be the
  sets of all classification, clustering, and as-
  sociation operators, respectively.


    The following conditions hold.


(1) Gclass ≠ Gclust ≠ Gassoc,


(2) Gassoc ∩ Gclass = ∅,


(3) Gassoc ∩ Gclust = ∅.




                                          67
Data Mining Process


Definition   Any sequence

             K1, K2, ..., Kn (n ≥ 1)
   of data mining states is called a data
   preprocessing process if there is a
   preprocessing operator G ∈ Gprep such that

       G(Ki) = Ki+1,     i = 1, 2, ..., n − 1.


Definition Any sequence

             K1, K2, ..., Kn (n ≥ 1)
   of data mining states is called a data
   mining proper process if there is a data
   mining proper operator G ∈ Gdm such
   that

       G(Ki) = Ki+1,     i = 1, 2, ..., n − 1.

                                           68
The data mining process consists of the pre-
  processing process (that might be empty)
  and the data mining proper process.


We know that the sets Gprep and Gdm are
  disjoint. This justifies the following
  definition.


Definition A data mining process is any
  sequence

             K1, K2, ..., Kn (n ≥ 1)
   of data mining states, such that

              K1, ..., Ki (0 ≤ i ≤ n)
   is a preprocessing process and

                  Ki+1, ..., Kn
   is a data mining proper process.
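The definition can be read as a check on a sequence of states. The sketch below makes two assumptions that the definition does not state explicitly: each step may use a different operator from the relevant class, and the data mining proper part is non-empty. The state names "raw" and "cleaned" and the single preprocessing operator are hypothetical.

```python
def is_process(states, operators):
    """True if each consecutive pair of states is linked by some operator."""
    return all(
        any(op.get(a) == b for op in operators)
        for a, b in zip(states, states[1:])
    )


def is_data_mining_process(states, prep_ops, dm_ops):
    """True if some split gives a preprocessing prefix and a proper suffix."""
    n = len(states)
    # i = 0 is the empty preprocessing prefix; i < n keeps the suffix non-empty.
    return n >= 1 and any(
        is_process(states[:i], prep_ops) and is_process(states[i:], dm_ops)
        for i in range(n)
    )


# Hypothetical preprocessing operator, plus p1 and p2 from the example.
PREP_OPS = [{"raw": "cleaned"}]
DM_OPS = [{"K0": "K1"}, {"K0": "K2"}]
```

Note that the step from the last preprocessing state to the first data mining proper state is not checked, since the definition as stated does not constrain it.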

                                       69
Granular Model
Syntax- Semantic Duality of Data Mining




A Granular Model is a system
   GM = ( SM, DM, |= ) where:

    • SM is a Semantic Model;

    • DM is a Descriptive Model;

    • |= ⊆ P(U ) × E is called a satisfaction
      relation, where U is the universe of SM
      and E is the set of descriptions defined
      by the DM.


Satisfaction |= establishes a relationship between
   the semantic model and the descriptive model.

                                        70
Descriptive Model

With any Semantic Model SM = (P(U), K, G)
  we associate its descriptive counterpart,
  defined below.


A Descriptive Model is a system

              DM = ( L, E, DK ),
   where:


L = ( A, E ) is called a descriptive lan-
  guage;


A is a countably infinite set called the alpha-
   bet;


E ≠ ∅ and E ⊆ A∗ is the set of descriptive
  expressions of L;

                                         71
DK ≠ ∅ and DK ⊆ P(E) is a set of
  descriptions of knowledge states.


As in the case of the semantic model, we build the
   descriptive model for a given application.


We define here only a general form of the
  model.


We assume, however, that whatever the
  application, the descriptions are always built
  in terms of attributes and values of the
  attributes, some logical connectives, some
  predicates and some extra parameters, if
  needed.


The commonly used descriptions have the form
  (a = v) to denote that the attribute a has
  a value v, but one might also use, as it is
  often done, a predicate form a(v) or a(x, v)
  instead.
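As an illustration of descriptions of the form (a = v), here is a minimal sketch. The table, the attribute names, and the helpers render and extent are all hypothetical; the sketch only shows the two written forms of a description and the set of objects for which the attribute a has the value v.

```python
# Hypothetical data table: object -> {attribute: value}.
TABLE = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "large"},
    "x3": {"color": "blue", "size": "small"},
}


def render(attribute, value, style="equation"):
    """Write a description as (a = v) or in the predicate form a(v)."""
    if style == "equation":
        return f"({attribute} = {value})"
    return f"{attribute}({value})"


def extent(attribute, value, table):
    """Objects of the table whose attribute a has the value v."""
    return {x for x, row in table.items() if row.get(attribute) == value}
```

For example, render("color", "red") yields "(color = red)" and extent("color", "red", TABLE) yields the objects x1 and x2.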

                                        72
For example, a neural network with its nodes
   and weights can be seen as a formal de-
   scription (in an appropriate descriptive lan-
   guage), and the knowledge states would
   represent changes in parameters during the
   neural network training process.


The model we build here is a model for what
  we call descriptive data mining, i.e.
  data mining whose goal is to produce a set of
  descriptions in a language easily
  comprehensible to the user.


For that purpose, in the model we identify the
   decision tree constructed by the
   Classification by Decision Tree algorithm with the
   set of discriminant rules obtained from the
   tree.

                                          73
A Granular Model is a system
   GM = ( SM, DM, |= ) where:

    • SM is a Semantic Model;

    • DM is a Descriptive Model;

    • |= ⊆ P(U ) × E is called a satisfaction
      relation, where U is the universe of SM
      and E is the set of descriptions defined
      by the DM.


Satisfaction |= establishes a relationship between
   the semantic model and the descriptive model.


We define the Satisfaction |= component of
  the Granular Model GM in the following
  stages.


Stage1 For each K ∈ K, we define its own
   descriptive language LK = ( AK , EK ).
                                        74
Stage2 For each K ∈ K and descriptive
   expression D ∈ EK, we define what it
   means that D is satisfied in K; i.e. we define
   a satisfaction relation |=K .


Stage3 For each K ∈ K and descriptive
   expression D ∈ EK, we define what it
   means that D is true in K, i.e. |=K D.


Stage4 We use the satisfaction relation |=K
   to define, for each K ∈ K, the set DK ⊆
   P(EK ) of descriptions of its own knowl-
   edge.


Stage5 We use the languages LK to define
   the descriptive language L.


Stage6    We use the descriptive expressions
   EK of LK to define the set E of descriptive
   expressions of L.


Stage7 We use the satisfaction relations |=K
   to define the satisfaction relation |= of
   the Granular Model GM.



                                        75
Part 3:    TRACING THE
             HISTORY
Mathematics Genealogy Project
genealogy.math.ndsu.nodak.edu




                            76
We all have a history


We are all mathematicians


The Mission Statement of the Mathematics
   Genealogy Project defines a mathematician
   as follows.


" ... Throughout this project when we use
   the word "mathematics" or "mathematician"
   we mean that word in a very inclusive
   sense. Thus, all relevant data from
   statistics, computer science, or operations
   research is welcome...."


Computer Science classification within the
  project is: Mathematics Subject
  Classification: 68 Computer Science.

                                        77
The Genealogy Project solicits information from
  all schools that participate in the
  development of research-level mathematics and
  from all individuals who may know desired
  information. This includes Computer Science
  as well.


For them, and the history, we are all math-
   ematicians.




                                      78
Below are some links (sequences of connected
  people) for a computer scientist.


Any two consecutive people in the sequence are
   listed in the order PhD student, Adviser.


If a person has more than one adviser, the
    adviser is preceded by a number; i.e.


adviser 1 is listed as 1. adviser Name,


adviser 2 is listed as 2. adviser Name, etc...




                                          79
A mathematician would say:


 For any element A of the sequence, if A
  has n > 1 advisers, then for any
  1 ≤ k ≤ n, adviser k is listed as k. Name
  of adviser k,
  and the number in front of the name is
  omitted otherwise.
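The same rule can be stated as a short function. This is only an illustrative sketch; the function name and its parameters are assumptions made for the example.

```python
def format_adviser(name, k, adviser_count):
    """Prefix the adviser's name with its number k only if there are several advisers."""
    if not 1 <= k <= adviser_count:
        raise ValueError("k must satisfy 1 <= k <= adviser_count")
    return f"{k}. {name}" if adviser_count > 1 else name
```

For instance, Alfred Tarski as the second of two advisers is written "2. Alfred Tarski", while a sole adviser such as Helena Rasiowa is written without a number.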




                                        80
Link to Nicolaus Copernicus
          (Mikolaj Kopernik)
        He has 1598 descendants


Anita Wasilewska, Ph.D. Warsaw University,
  1975, Poland, Helena Rasiowa, Ph.D. Warsaw
  University, 1950, Andrzej Mostowski,
  Ph.D. Warsaw University, 1938, 2. Alfred
  Tarski, Ph.D. Warsaw University, 1924,
  Stanislaw Lesniewski, Ph.D. University of
  Lvov, 1912, Kazimierz Twardowski, Ph.D.
  Universitat Wien, 1891, Franz Clemens
  Brentano, Ph.D. Eberhard Karls Universi-
  tat, Tubingen 1862, 2. Friedrich Adolf
  Trendelenburg, Dr. phil. Universitat Leipzig,
  1826, 1. Georg Ludwig Konig, Artium
  Liberalium Magister, Georg August Univer-
  sitat, Gottingen, 1790, Christian Heyne,
  Magister Juris, Universitat Leipzig, 1752,

                                       81
1. Johann August Bach, Magister philosophiae,
   Universitat Leipzig, 1744, 1. Christian Kustner,
   Magister philosophiae, Universitat Leipzig,
   1742, Johann Ernesti, Magister philosophiae,
   Universitat Leipzig, 1730, Johann Gesner,
   Magister artium, Friedrich Schiller Univer-
   sitat Jena, 1715, Johann Buddeus, Magis-
   ter artium, Martin Luther Universitat, Halle
   Wittenberg, 1687, Michael Walther, Jr.,
   Magister artium, Theol. Dr., Martin Luther
   Universitat, Halle Wittenberg, 1661, 1687,
   2. Johann Quenstedt, Magister artium, Theol.
   Dr., Universitat Helmstedt, Martin Luther
   Universitat, Halle Wittenberg, 1643, 1644,
   Christoph Notnagel, Magister artium, Mar-
   tin Luther Universitat, Halle Wittenberg,
   1630, Ambrosius Rhodius, Magister artium,
   Medicinae Dr., Martin Luther Universitat,
   Halle Wittenberg, 1600, 1610,


                                         82
1. Melchior Jostel, Magister artium, Medicinae
  Dr., Martin Luther Universitat, Halle
  Wittenberg, 1583, 1600, 1. Valentin Otto,
  Magister artium, Martin Luther Universi-
  tat, Halle Wittenberg, 1570, Georg Joachim
  Rheticus, Magister artium, Martin Luther
  Universitat, Halle Wittenberg 1535,


2. Nicolaus Copernicus, Juris utriusque,
  Doctor, Uniwersytet Jagiellonski (Cra-
  cow Jagellonian University), Universita
  di Bologna, Universita degli Studi di
  Ferrara, Universita di Padova, 1499,
  Poland-Italy,


2. Domenico Novara da Ferrara, Universita di
  Firenze, 1483, 1. Johannes Regiomon-
  tanus, Magister artium, Universitat Leipzig,
  Universitat Wien, 1457,

                                        83
Georg von Peuerbach, Magister artium, Uni-
   versitat Wien, 1440, Johannes von Gmunden,
   Magister artium, Universitat Wien, 1406,
   Heinrich von Langenstein, Magister artium,
   Theol. Dr., Universite de Paris, 1363,
   1375, unknown.




Heinrich von Langenstein, 1375, is my "oldest"
   ancestor.


THERE ARE 3 more lines of ancestry; also
  interesting, if not so illustrious. Here they
  are.




                                         84
Link to Gottfried Leibniz
          (54209 descendants),
            Immanuel Kant
        ( 2176 descendants), and
   Desiderius Erasmus of Rotterdam
          (57416 descendants)


Anita Wasilewska, Ph.D. Warsaw University,
  1975, Poland, Helena Rasiowa, Ph.D. War-
  saw University, 1950, Andrzej Mostowski,
  Ph.D. Warsaw University, 1938, 2. Alfred
  Tarski, Ph.D. Warsaw University, 1924,
  Stanislaw Lesniewski, Ph.D. University of
  Lvov, 1912, Kazimierz Twardowski, Ph.D.
  Universitat Wien, 1891, Franz Clemens
  Brentano, Ph.D. Eberhard Karls Univer-
  sitat, Tubingen 1862, 2. Friedrich Adolf
  Trendelenburg, Dr. Phil. Universitat Leipzig,
  1826, 2. Karl Reinhold, PhD.,

                                       85
Immanuel Kant, Ph.D. Universitat Konigs-
  berg 1770,


Martin Knutzen, Dr. Phil. Universitat Konigs-
  berg, 1732, Christian von Wolff, Dr. phil.,
  Universitat Leipzig, 1700,


2. Gottfried Leibniz, Dr. jur. Universitat
   Altdorf, 1666,


2. Christiaan Huygens, Artium Liberalium
     Magister, Juris utriusque Doctor, Universiteit
     Leiden, Universite d’Angers, 1647, 1655,
     Frans van Schooten, Jr., Artium Liberal-
     ium Magister, Universiteit Leiden, 1635,
     Jacobus Golius, Artium Liberalium Magis-
     ter, Philosophiae Doctor Universiteit Lei-
     den, 1612, 1621, 1. Willebrord (Snel van
     Royen) Snellius, Artium Liberalium Magis-
     ter, Universiteit Leiden, 1607, 2. Rudolph
                                           86
(Snel van Royen) Snellius, Artium liberal-
  ium Magister, Universitat zu Koln, Ruprecht
  Karls Universitat Heidelberg, 1572, 1. Valen-
  tine Naibod, Magister Artium, Martin Luther
  Universitat, Halle Wittenberg, Universitat
  Erfurt, Erasmus Reinhold, Magister Artium,
  Martin Luther Universitat, Halle Witten-
  berg, 1535, Jakob Milich, Liberalium Ar-
  tium Magister, Med. Dr., Albert Ludwigs
  Universitat Freiburg, Breisgau, Universitat
  Wien, 1520, 1524,


Desiderius Erasmus Roterodamus (sometimes
  known as Desiderius Erasmus of Rot-
  terdam), University of Paris, Theologiae
  Baccalaureus, College de Montaigu, 1497,


Jan Standonck, Magister Artium, Theol. Dr.,
  College Sainte-Barbe, College de Montaigu,
  1474, 1490, unknown


Link to Pierre-Simon Laplace
         ( 50295 descendants) and
       Jean Le Rond d’Alembert


Anita Wasilewska, Ph.D. Warsaw University,
   1975, Poland, Helena Rasiowa, Ph.D. War-
   saw University, 1950, Andrzej Mostowski,
   Ph.D. Warsaw University, 1938, 1. Kaz-
   imierz Kuratowski, Ph.D. Warsaw Uni-
   versity, 1921, 1. Stefan Mazurkiewicz,
   Ph.D. University of Lvov, 1913, Waclaw
   Sierpinski, Ph.D. Uniwersytet Jagiellonski,
   1906, 1. Stanislaw Zaremba, Ph.D. Uni-
   versite Paris IV-Sorbonne, 1889, Gaston
   Darboux, Ph.D. Ecole Normale Superieure,
   Paris, 1866, Michel Chasles, Ph.D. Ecole
   Polytechnique, 1814, Simeon Poisson, Ph.D.
   Ecole Polytechnique, 1800, 2. Pierre-Simon
   Laplace, Ph.D., Jean Le Rond d’Alembert,
   unknown


                                      87
Link to Emile Borel
            (2506 descendants),
              Leonhard Euler
            (52555 descendants)


Anita Wasilewska, Ph.D. Warsaw University,
   1975, Poland, Helena Rasiowa, Ph.D. War-
   saw University, 1950, Andrzej Mostowski,
   Ph.D. Warsaw University, 1938, 2. Zyg-
   munt Janiszewski, Ph.D. Ecole Normale
   Superieure Paris, 1911, Henri Lebesgue,
   Ph.D. Universite Henri Poincare Nancy 1,
   1902, Emile Borel, Ph.D. Ecole Normale
   Superieure, Paris, 1893, Gaston Darboux,
   Ph.D. Ecole Normale Superieure, Paris, 1866,
   Michel Chasles, Ph.D., Ecole Polytechnique,
   1814, Simeon Poisson, Ph.D. Ecole Poly-
   technique, 1800,


                                       88
1. Joseph Lagrange, no degree, student of
  Leonhard Euler, Ph.D. Universitat Basel,
  1726, 1. Johann Bernoulli, Dr. med.
  Universitat Basel, 1694, Jacob Bernoulli, Dr.
  hab. Sci. Universitat Basel, 1684, Gottfried
  Wilhelm Leibniz, Dr. jur. Universitat
  Altdorf, 1666, 1. Erhard Weigel, Ph.D.
  Universitat Leipzig, 1650, unknown.




                                        89
Link to Andrei Markov
         (4824 descendants), and
  Pafnuty Chebyshev (5964 descendants)


Anita Wasilewska, Ph.D. Warsaw University,
   1975, Poland, Helena Rasiowa, Ph.D. War-
   saw University, 1950, Andrzej Mostowski,
   Ph.D. Warsaw University, 1938, 1. Kazimierz
   Kuratowski, Ph.D. Warsaw University,
   1921, 1. Stefan Mazurkiewicz,
   Ph.D. University of Lvov, 1913, Waclaw
   Sierpinski, Ph.D. Uniwersytet Jagiellonski,
   1906, 2. Georgy Fedoseevich Voronoy,
   Ph.D. University of St. Petersburg, 1896,
   Andrei Markov, Ph.D. University of St.
   Petersburg, 1884, Pafnuty Chebyshev,
   Ph.D. University of St. Petersburg, 1849,
   Nikolai Dmitrievich Brashman, Ph.D. Moscow
   State University, 1834, Joseph Johann von
   Littrow, Ph.D., unknown

                                      90
MY PhD COUSINS include


Kurt Goedel


Alan Turing


Alonzo Church


Roman Sikorski


Zdzislaw Pawlak


and many others.... I am sure some of them
  are in this room!


                                    91
In the Stony Brook CS Department I traced 10
    of them.




WE ALL ARE A BIG SCIENTIFIC
 FAMILY!




                                   92

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 

Descriptive Granularity - Building Foundations of Data Mining

  • 1. DESCRIPTIVE GRANULARITY: Building Foundations of Data Mining. In Memory of my Professors: Zdzislaw Pawlak, Helena Rasiowa and Roman Sikorski. Anita Wasilewska, Computer Science Department, Stony Brook University, Stony Brook, NY
  • 2. Part 1: INTRODUCTION
  • 3. We all have scientific history; All problems we work on have history; It is important to trace history of problems we work on; We all build scientific history; The future belongs to us, and so does the past.
  • 4. We all have scientific history; Here is my LATEST history (of building Foundations of Data Mining). In 1995-1998 I supervised the PhD thesis of Ernestina Menasalvas, now Professor and a Vice-Rector of Madrid Polytechnic. We (with some others) went from building models for concrete implementations (1996-2002) to developing a general language for Foundations of Data Mining (2002-2004) to building a general foundational model for Data Mining (2005- ).
  • 5. It has been a slow process, but finally a community and specialized conferences developed, and books started to appear: Foundations and Novel Approaches in Data Mining, T.Y. Lin, S. Ohsuga, C. J. Liau, and X. Hu, editors, Springer 2006; Data Mining: Foundations and Practice, Tsau Young Lin, Ying Xie, Anita Wasilewska, Churn-Jung Liau, editors, Studies in Computational Intelligence (SCI) 118, Springer-Verlag 2008; and a field, Foundations of Data Mining, was created. We all build the scientific history and it takes TIME and patience to do so.
  • 6. Our work in Data Mining Foundations matured and finally we were invited by T.Y. Lin to write a 20-page entry about our research in the Encyclopedia of Complexity and Systems Science published by Springer in 2008. The Encyclopedia is Springer's latest and prestigious initiative, with its Board of Editors including, among others, Ahmed Zewail, Nobel Prize in Chemistry, Thomas Schelling, Nobel Prize in Economics, Richard E. Stearns, 1993 Turing Award, Pierre-Louis Lions, 1994 Fields Medal, and Lotfi Zadeh, IEEE Medal of Honor. All entries were by invitation only, and the inclusion of our work shows the recognition of the need for foundational studies in newly developing domains.
  • 7. All problems we work on have history. Short History of Foundational Studies: The origins of Foundational Studies can be traced back to David Hilbert, a German mathematician, recognized as one of the most influential and universal mathematicians of the 19th and early 20th centuries.
  • 8. Hilbert Problems: In 1900 he proposed, at the Paris conference of the International Congress of Mathematicians, 23 problems for the future century. Several of them turned out to be very influential for 20th century mathematics and later Computer Science. Of the cleanly-formulated Hilbert problems, TEN problems: 3, 7, 10, 11, 13, 14, 17, 19, 20, and 21 have solutions that are accepted by consensus.
  • 9. TWO Problems: 1 and 2 are FOUNDATIONAL Problems; 1, concerning the Continuum Hypothesis, was solved by Cohen in 1963, and 2, concerning the Consistency of Arithmetic, was solved by Gödel and Gentzen in 1936. FIVE Problems: 5, 9, 15, 18, and 22 have partial solutions. FOUR Problems: 4, 6, 16, and 23 are too loosely formulated to ever be described as solved. TWO Problems: 8 (the Riemann Hypothesis; the Goldbach conjecture is also a part of it) and 12 are still OPEN, both being in number theory.
  • 10. The Riemann hypothesis was proposed by Bernhard Riemann (1859). It is a conjecture about the distribution of the zeros of the Riemann zeta function which states that all non-trivial zeros have real part 1/2. The Riemann hypothesis implies results about the distribution of prime numbers that are in some ways as good as possible. Along with suitable generalizations, it is considered by some mathematicians to be the most important unresolved problem in pure mathematics.
  • 11. Pierre Deligne proved in 1973 an analogue of the Riemann Hypothesis for zeta functions of varieties defined over finite fields. The full version of the hypothesis remains unsolved, although computer calculations have shown that the first 10 trillion zeros lie on the critical line.
  • 12. Goldbach's conjecture (1742) is one of the oldest unsolved problems in number theory and in all of mathematics. It states: Every even integer greater than 2 can be expressed as the sum of two primes. For example: 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, 10 = 7 + 3 or 5 + 5, 12 = 5 + 7, 14 = .... T. Oliveira e Silva is running a distributed computer search that has verified the conjecture for n ≤ 1.609 × 10^18 and some higher small ranges up to 4 × 10^18.
  • 13. Hilbert Program: Hilbert proposed, in 1920, a research project that became known as Hilbert's Program. 1. He wanted mathematics to be formulated on a solid and complete logical foundation. 2. He believed that in principle this could be done, by showing that all of mathematics follows from a correctly-chosen finite system of axioms and that some such axiom system is provably consistent. 3. He also believed that one can have such a system in which proofs of theorems can be deduced automatically from the way the theorems are built.
  • 14. In 1931 Kurt Gödel showed that Hilbert's grand plan 1. and 2. was impossible as stated. Gödel proved, in what is now called Gödel's Incompleteness Theorem, that any non-contradictory formal system which is comprehensive enough to include at least arithmetic cannot demonstrate its completeness by way of its own axioms. In 1933-34 Gerhard Gentzen gave a positive answer to 3. in the case of classical propositional logic, and a partially positive answer in the case of (semi-undecidable) predicate logic. Nevertheless, Hilbert's and Gödel's work led to the development of recursion theory and then mathematical logic and foundations of mathematics as autonomous disciplines.
  • 15. Gentzen's work led to the development of Proof Theory and Automated Theorem Proving as separate Mathematics and Computer Science domains. Gödel inspired the works of Alonzo Church and Alan Turing that became the basis for theoretical computer science and also led to the further development of a unique phenomenon called the Polish School of Mathematics and later to the creation of Foundational Studies in Computer Science.
  • 16. Personal History: my Master's Thesis in Computer Science (under Pawlak and Rasiowa) consisted of a solution of Gentzen's conjecture for Modal S4 and S5 Logics, and consequently I also developed the world's first theorem prover for S4 Modal Logic in 1967. As a result I spent the first 15 years of my scientific life (before coming to the USA) working in Proof Theory for non-classical logics, formulated (as a pure mathematician) a General Theory of Gentzen Type Formalizations, and established various results about connections and relationships between certain Classes of Logics, Formal Languages and Theory of Programs (as a computer scientist).
  • 17. Polish School of Mathematics: The term Polish School of Mathematics refers to groups of mathematicians of the 1920's and 1930's working on common subjects. The two main groups were situated in Warsaw and Lvov (now Lviv, the biggest city in Western Ukraine). We hence talk more specifically about the Warsaw and Lvov Schools of Mathematics and additionally of the Warsaw-Lvov School of Logic working in Warsaw.
  • 18. Any list of important twentieth century mathematicians contains Polish names in a frequency out of proportion to the size of the country. Poland was partitioned by Russia, Germany, and Austria and was under foreign domination for more than 120 years, from 1795 until the end of World War I. What was to become known as the Polish School of Mathematics was possible because it was carefully planned, agreed upon, and executed.
  • 19. Independent Poland was created in 1918 and the University of Warsaw re-opened with Janiszewski, Mazurkiewicz, and Sierpinski as professors of mathematics. They chose logic, set theory, point-set topology, and real functions as the area of concentration. The journal Fundamenta Mathematicae was founded in 1920 and is still in print. It was the first specialized mathematical journal in the world.
  • 20. The choice of title was deliberate, to reflect that all areas published there were to be connected with foundational studies. It should be remembered that at the time these areas had not yet received full acceptance by the mathematical community. The choice reflected both insight and courage.
  • 21. The notable mathematicians of the Warsaw and Lvov Schools of Mathematics were, among others, Stefan Banach, Stanislaw Ulam and, after the war, Roman Sikorski. Stefan Banach was a self-taught mathematics prodigy and the founder of modern functional analysis. Mathematical concepts named after Banach include the Banach-Tarski paradox, the Hahn-Banach theorem, the Banach-Steinhaus theorem, the Banach-Mazur game and Banach spaces.
  • 22. Stanislaw Ulam emigrated to America just before the war and became an American mathematician of Polish-Jewish origins. He participated in the Manhattan Project and originated the Teller-Ulam design of thermonuclear weapons. He also invented nuclear pulse propulsion and developed a number of mathematical tools in number theory, set theory, ergodic theory and algebraic topology.
  • 23. Roman Sikorski's reputation was established by his outstanding results in Boolean algebras, functional analysis, theories of distribution, measure theory, general topology, descriptive set theory, and in Algebraic Mathematical Logic (in collaboration with Rasiowa). In axiomatic set theory, the Rasiowa-Sikorski Lemma is one of the most fundamental facts used in the technique of forcing.
  • 24. The notable logicians of the Lvov-Warsaw School of Logic were: Alfred Tarski (since 1942 in Berkeley, and founder of the American School of Foundations of Mathematics), Jan Lukasiewicz, Andrzej Mostowski, and, after the second world war, Helena Rasiowa.
  • 25. Helena Rasiowa became, in 1977, the founder of Fundamenta Informaticae, the world's first journal specialized in foundations of computer science. The choice of the title Fundamenta Informaticae was again deliberate. It reflected not only the subject, but also stressed that the new research area being developed in Warsaw is a direct continuation of the tradition of the Foundational Studies of the Polish School of Mathematics.
  • 26. Part 2: DESCRIPTIVE GRANULARITY: A Model for Data Mining
  • 27. We present here a formal syntax and semantics for a notion of a descriptive granularity. We do so in terms of three abstract models: Descriptive, Semantic, and Granular. The Descriptive model formalizes the syntactical concepts and properties of the data mining, or learning, process. The Semantic model formalizes its semantical properties. The Granular model establishes a relationship between the Descriptive and Semantic models in terms of a formal satisfaction relation.
  • 28. Data Mining - Informal Definition: One of the main goals of Data Mining is to provide comprehensible descriptions of information extracted from databases. We are hence interested in building models for descriptive data mining, i.e. data mining whose main goal is to produce a set of descriptions in a language easily comprehensible to the user.
  • 29. The descriptions come in different forms. In the case of classification problems it might be a set of characteristic or discriminant rules, a decision tree, or a neural network with a fixed set of weights. In the case of association analysis it is a set of associations (frequent item sets), or association rules with accuracy parameters. In the case of cluster analysis it is a set of clusters, each of which has its own description and a cluster name.
  • 30. In the case of approximate classification by Rough Set analysis it is usually a set of discriminant or characteristic rules (with or without accuracy parameters) or a set of decision tables. Data Mining results are usually presented to the user in their descriptive, i.e. syntactic, form, as it is the most natural form of communication. But the Data Mining process is deeply semantical in its nature. We hence build our Granular Model on two levels: syntactic and semantic.
  • 31. SYNTAX: By syntax, or syntactical concepts, we understand simple relations among symbols and expressions of formal symbolic languages. A symbolic language is a pair L = (A, E), where A is an alphabet and E is the set of expressions of L. The expressions of formal languages, even if created with a specific meaning in mind, do not themselves carry any meaning; they are just finite sequences of certain symbols. The meaning is assigned to them by establishing a proper semantics.
  • 32. SEMANTICS: Semantics for a given symbolic language L assigns a specific interpretation in some domain to all symbols and expressions of the language. It also involves related ideas such as truth and model. They are called semantical concepts to distinguish them from the syntactical ones.
  • 33. MODEL: The word model is used in many situations and has many meanings, but they all reflect some parts, if not all, of its following formal meaning. A structure M, also called an interpretation, is a model for a set E_0 ⊆ E of expressions of a formal language L if and only if every expression E ∈ E_0 is true in M.
  • 34. All our Models are abstract structures that allow us to formalize some general properties of the Data Mining process and address the semantics-syntax duality inherent to any Data Mining process. Moreover, they allow us to provide a formal definition of a generalization and of Data Mining as the process of information generalization.
  • 35. The notion of generalization is defined in terms of granularity of steps of the process. Data is represented in the model in the form of Knowledge Systems. Each Knowledge System has a granularity associated with it, and the process changes, or not, its granularity. Granularity is crucial for defining some notions and components of the model, hence the Granular Model name.
  • 36. Granular Model: A Granular Model is a system GM = (SM, DM, |=) where: SM is a Semantic Model; DM is a Descriptive Model; |= ⊆ P(U) × E is called a satisfaction relation, where U is the universe of SM and E is the set of descriptions defined by the DM. Satisfaction |= establishes a truth relationship between the data mining model and the descriptive model.
  • 37. Semantic Model definition motivation. The first step in any data mining procedure is to drop the key attribute. This step allows us to introduce similarities in the database, as records do not have their unique identification anymore. The input into the data mining process is hence always a data table obtained from the target data by removal of the key attribute. We call it a target data table.
  • 38. As the next step we represent, following the Rough Set model, our target data table as Pawlak's Information System with the universe U by adding a new, non-attribute column for the record names, i.e. objects of U. We take this set U as the universe of our model SM. Why an Information System? We want to model Data Mining as a process of generalization. In order to model this process we first have to define what it means, from the semantical point of view, that one stage of the process is more general than the other.
  • 39. The idea behind it is very simple. It is the same as saying that (a + b)^2 = a^2 + 2ab + b^2 is a more general formula than the formula (2 + 3)^2 = 2^2 + 2 · 2 · 3 + 3^2. This means that one description (formula) is more general than the other if it describes more objects. From the semantical point of view it means that the data mining process consists of putting objects (records) in sets of objects. From the syntactical point of view the data mining process consists of building descriptions (in terms of (attribute, value of attribute) pairs) of these sets of objects, with some extra parameters, if needed.
  • 40. To model a situation that allows us to talk about descriptions of sets of records (objects) we extend the notion of Pawlak's model of information system to our notion of Knowledge System. The universe of a knowledge system contains some subsets of U, i.e. elements of P(U). For example, a target data table (after preprocessing), the corresponding representation by Pawlak's information system, and a knowledge system with universe U of granularity one are as follows.
  • 41. Target Data Table T0 and Target Information System I0
      Target Data Table T0
        a1      a2      a3
        small   small   medium
        medium  small   medium
        small   small   medium
        big     small   small
        medium  medium  big
        small   small   medium
        big     small   small
        medium  medium  big
        small   small   medium
        big     small   medium
        medium  medium  small
        small   small   medium
        big     small   big
        medium  medium  small
      Target Information System I0
        U     a1      a2      a3
        x1    small   small   medium
        x2    medium  small   medium
        x3    small   small   medium
        x4    big     small   small
        x5    medium  medium  big
        x6    small   small   medium
        x7    big     small   small
        x8    medium  medium  big
        x9    small   small   medium
        x10   big     small   medium
        x11   medium  medium  small
        x12   small   small   medium
        x13   big     small   big
        x14   medium  medium  small
  • 42. The Knowledge System of granularity one (all objects are one-element sets) corresponding to the target table T0 is as follows.
      Target Knowledge System K0
        P^1(U)   a1      a2      a3
        {x1}     small   small   medium
        {x2}     medium  small   medium
        {x3}     small   small   medium
        {x4}     big     small   small
        {x5}     medium  medium  big
        {x6}     small   small   medium
        {x7}     big     small   small
        {x8}     medium  medium  big
        {x9}     small   small   medium
        {x10}    big     small   medium
        {x11}    medium  medium  small
        {x12}    small   small   medium
        {x13}    big     small   big
        {x14}    medium  medium  small
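The passage from the target data table to the information system and then to the knowledge system of granularity one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `T0_ROWS`, `information_system` and `target_knowledge_system` are hypothetical and do not come from the slides, and granules are modelled as `frozenset` objects.

```python
# Rows of the target data table T0 (attributes a1, a2, a3), in slide order.
T0_ROWS = [
    ("small", "small", "medium"),
    ("medium", "small", "medium"),
    ("small", "small", "medium"),
    ("big", "small", "small"),
    ("medium", "medium", "big"),
    ("small", "small", "medium"),
    ("big", "small", "small"),
    ("medium", "medium", "big"),
    ("small", "small", "medium"),
    ("big", "small", "medium"),
    ("medium", "medium", "small"),
    ("small", "small", "medium"),
    ("big", "small", "big"),
    ("medium", "medium", "small"),
]
ATTRIBUTES = ("a1", "a2", "a3")


def information_system(rows):
    """Name the records x1..xn; the result plays the role of f : U x A -> V_A."""
    return {
        f"x{i}": dict(zip(ATTRIBUTES, row))
        for i, row in enumerate(rows, start=1)
    }


def target_knowledge_system(info):
    """Wrap every object of U in a one-element granule."""
    return {frozenset({x}): values for x, values in info.items()}


I0 = information_system(T0_ROWS)
K0 = target_knowledge_system(I0)

# Granularity is the largest granule size; for a target system it is one.
granularity_K0 = max(len(granule) for granule in K0)
print(len(I0), len(K0), granularity_K0)
```

Keying the knowledge system by sets rather than by object names is what later allows granules with more than one element to be stored in the same structure.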
  • 43. Assume now that we have applied some algorithm ALG1 and it has returned the following set D = {D1, D2, ..., D7} of descriptions.
      D1 : (a1 = s) ∩ (a2 = s) ∩ (a3 = m),
      D2 : (a1 = m) ∩ (a2 = s) ∩ (a3 = m),
      D3 : (a1 = m) ∩ (a2 = m) ∩ (a3 = b),
      D4 : (a1 = m) ∩ (a2 = m) ∩ (a3 = s),
      D5 : (a1 = b) ∩ (a2 = s) ∩ (a3 = s),
      D6 : (a1 = b) ∩ (a2 = s) ∩ (a3 = m),
      D7 : (a1 = b) ∩ (a2 = s) ∩ (a3 = b).
  • 44. Questions. Q1: How well does this set of descriptions describe our original data, i.e. how accurate is the algorithm ALG1 we have used to find them? Q2: How accurate is the knowledge we have thus obtained out of our data? The answer is formulated in terms of the target information system with the universe U, and the sets S(D) defined (after Pawlak) for any description D ∈ D as follows: S(D) = {x ∈ U : D}. We call S(D) the truth set for D.
  • 45. Intuitively, the sets S(D) = {x ∈ U : D} contain all records (i.e. their identifiers) with the same description given in terms of (attribute, value of attribute) pairs. The descriptions do not need to utilize all attributes of the target data, as is often the case, and one of the ultimate goals of data mining is to find descriptions with as few attributes as possible.
  • 46. In association analysis the descriptions can represent the frequent item sets. For example, for a frequent three-itemset D = i1 i2 i3, the truth set S(D) represents all transactions that contain items i1, i2, i3. In general, descriptions come in different forms, depending on the data mining goal and application. We formally define a general form of descriptions as a part of the Descriptive Model.
  • 47. For the target data and descriptions Di ∈ D presented in the above examples the sets S(Di) are as follows.
      S1 = S(D1) = {x ∈ U : D1} = {x1, x3, x6, x9, x12},
      S2 = S(D2) = {x ∈ U : D2} = {x2},
      S3 = S(D3) = {x ∈ U : D3} = {x5, x8},
      S4 = S(D4) = {x ∈ U : D4} = {x11, x14},
      S5 = S(D5) = {x ∈ U : D5} = {x4, x7},
      S6 = S(D6) = {x ∈ U : D6} = {x10},
      S7 = S(D7) = {x ∈ U : D7} = {x13}.
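The truth sets listed above can be recomputed mechanically from the target information system. The sketch below assumes a description is a conjunction of (attribute, value) pairs stored as a Python dict; the helper name `truth_set` and the compact row encoding (s, m, b for small, medium, big) are illustrative choices, not notation from the slides.

```python
# Target information system I0; each row string lists the values of a1, a2, a3.
ROWS = "ssm msm ssm bss mmb ssm bss mmb ssm bsm mms ssm bsb mms".split()
I0 = {
    f"x{i}": dict(zip(("a1", "a2", "a3"), row))
    for i, row in enumerate(ROWS, start=1)
}


def truth_set(description, info):
    """S(D): all objects whose attribute values satisfy every pair in D."""
    return {
        x
        for x, values in info.items()
        if all(values[a] == v for a, v in description.items())
    }


# The seven descriptions returned by ALG1 in the example.
DESCRIPTIONS = {
    "D1": {"a1": "s", "a2": "s", "a3": "m"},
    "D2": {"a1": "m", "a2": "s", "a3": "m"},
    "D3": {"a1": "m", "a2": "m", "a3": "b"},
    "D4": {"a1": "m", "a2": "m", "a3": "s"},
    "D5": {"a1": "b", "a2": "s", "a3": "s"},
    "D6": {"a1": "b", "a2": "s", "a3": "m"},
    "D7": {"a1": "b", "a2": "s", "a3": "b"},
}

TRUTH_SETS = {name: truth_set(d, I0) for name, d in DESCRIPTIONS.items()}
for name, members in TRUTH_SETS.items():
    print(name, sorted(members, key=lambda x: int(x[1:])))
```

Because a description is just a dict, descriptions that use fewer attributes need no special handling: attributes that are not mentioned impose no condition.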
  • 48. We represent our results in the form of a Knowledge System as follows.
      Resulting Knowledge System K1
        P(U)                     a1  a2  a3
        {x1, x3, x6, x9, x12}    s   s   m
        {x2}                     m   s   m
        {x5, x8}                 m   m   b
        {x11, x14}               m   m   s
        {x4, x7}                 b   s   s
        {x10}                    b   s   m
        {x13}                    b   s   b
      The same system, written with the names of the truth sets:
        P(U)   a1  a2  a3
        S1     s   s   m
        S2     m   s   m
        S3     m   m   b
        S4     m   m   s
        S5     b   s   s
        S6     b   s   m
        S7     b   s   b
  • 49. The representation of data mining results in the form of a knowledge system allows us to define how good the knowledge obtained by a given algorithm is. In our case the knowledge obtained describes 100% of our target data, as S1 ∪ S2 ∪ ... ∪ S7 = {x1, x2, ..., x14} = U. Observe that the sets S1, ..., S7 are also disjoint and non-empty, i.e. they form a partition of the universe U. We define such knowledge as exact.
  • 50. Moreover, we can see that the resulting system K1 is more general than the input data K0 because its granularity is higher than the granularity of K0. Definition: Granularity of a knowledge system is the maximum of the cardinalities of its granules, i.e. elements of its universe. The granularity of all Target Knowledge Systems is one. The granularity of K1 is max{|S1|, ..., |S7|} = max{5, 1, 2, 2, 2, 1, 1} = 5.
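Both properties used here, exactness (the granules form a partition of U) and granularity (the largest granule size), are simple set computations. A minimal sketch, with the hypothetical helper names `granularity` and `is_exact`:

```python
# Universe U and the granules S1..S7 of the resulting knowledge system K1.
U = {f"x{i}" for i in range(1, 15)}
K1_GRANULES = [
    {"x1", "x3", "x6", "x9", "x12"},
    {"x2"},
    {"x5", "x8"},
    {"x11", "x14"},
    {"x4", "x7"},
    {"x10"},
    {"x13"},
]


def granularity(granules):
    """Maximum cardinality over all granules."""
    return max(len(g) for g in granules)


def is_exact(granules, universe):
    """True when the granules are non-empty, pairwise disjoint, and cover U."""
    non_empty = all(len(g) > 0 for g in granules)
    covered = set().union(*granules)
    # Sizes add up to the size of the union only if no object is shared.
    disjoint = sum(len(g) for g in granules) == len(covered)
    return non_empty and disjoint and covered == universe


print(granularity(K1_GRANULES), is_exact(K1_GRANULES, U))  # prints: 5 True
```

Dropping any granule from the list breaks the cover of U, so the same check reports such a system as not exact.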
  • 51. Now assume that we have applied to our target data T (represented by K0) another algorithm ALG2 and it returned two descriptions D1, D2 under the condition that we need only descriptions of length 2 and with frequency ≥ 30%. The descriptions are:
      D1 : (a1 = s) ∩ (a2 = s),
      D2 : (a2 = s) ∩ (a3 = m).
    Now we evaluate:
      S1 = S(D1) = {x1, x3, x6, x9, x12},
      S2 = S(D2) = {x1, x2, x3, x6, x9, x10, x12}.
  • 52. Incorporating the algorithm parameters imposed by ALG2 into our Knowledge System we obtain the following table.
      Resulting Knowledge System K2
        P(U)   a1  a2  a3  # of attr  frequency
        S1     s   s   -   2          36%
        S2     -   s   m   2          50%
    The sets S1, S2 do not form a partition of the universe U, as S1 ∩ S2 ≠ ∅ and, moreover, S1 ∪ S2 ≠ U. The knowledge obtained by the algorithm ALG2 is hence not exact. It describes only 50% of the target data (S1 ∪ S2 = S2 contains 7 of the 14 objects), and what is described is described following certain (frequency) conditions. Of course K2 is more general than K0.
  • 53. The algorithm ALG2 generalized the target data, even if in an incomplete way. The formal definitions of Information System, Knowledge and Target Knowledge Systems, and their granularity and exactness are as follows.
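The frequency column and the failure of exactness for the two ALG2 descriptions can be checked directly against the target data. The sketch below only evaluates the two descriptions given in the example; it does not attempt to reproduce ALG2 itself, which the slides do not specify. The helper names and the compact row encoding (s, m, b) are assumptions made for illustration.

```python
# Target information system I0; each row string lists the values of a1, a2, a3.
ROWS = "ssm msm ssm bss mmb ssm bss mmb ssm bsm mms ssm bsb mms".split()
I0 = {
    f"x{i}": dict(zip(("a1", "a2", "a3"), row))
    for i, row in enumerate(ROWS, start=1)
}
U = set(I0)


def truth_set(description, info):
    """S(D): objects satisfying every (attribute, value) pair of D."""
    return {
        x
        for x, values in info.items()
        if all(values[a] == v for a, v in description.items())
    }


def frequency(description, info):
    """Fraction of the universe described by D."""
    return len(truth_set(description, info)) / len(info)


D1 = {"a1": "s", "a2": "s"}
D2 = {"a2": "s", "a3": "m"}
S1, S2 = truth_set(D1, I0), truth_set(D2, I0)

overlap = S1 & S2          # non-empty, so S1 and S2 are not disjoint
uncovered = U - (S1 | S2)  # non-empty, so S1 and S2 do not cover U

print(f"{frequency(D1, I0):.0%} {frequency(D2, I0):.0%}")
print(sorted(uncovered, key=lambda x: int(x[1:])))
```

Either a non-empty overlap or a non-empty set of uncovered objects is enough to show that the granules do not partition U.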
  • 54. A Knowledge System is an extension of the following notion of Pawlak's information system. An Information System is a system I = (U, A, V_A, f), where U ≠ ∅ is called a set of objects; A ≠ ∅ and V_A ≠ ∅ are called the set of attributes and the set of values of attributes, respectively; f is called an information function, f : U × A → V_A.
  • 55. A knowledge system based on the information system I = (U, A, V_A, f) is a system K_I = (P(U), A, E, V_A, V_E, g), where E is a finite set of knowledge attributes (k-attributes) such that A ∩ E = ∅, and V_E is a finite set of values of k-attributes.
  • 56. g is a partial function called the knowledge information function (k-function), g : P(U) × (A ∪ E) → (V_A ∪ V_E), such that
      (i) g | (⋃_{x∈U} {x} × A) = f,
      (ii) ∀S ∈ P(U) ∀a ∈ A ((S, a) ∈ dom(g) ⇒ g(S, a) ∈ V_A),
      (iii) ∀S ∈ P(U) ∀e ∈ E ((S, e) ∈ dom(g) ⇒ g(S, e) ∈ V_E).
  • 57. We use the above notion of knowledge system to define the granules of the universe and the granularity of the system, and hence, later, the granularity of the data mining process. Granule: Any set S ∈ P(U), i.e. S ⊆ U, is called a granule of U. Granularity of S: The cardinality |S| of S is called the granularity of S. Granule Universe: The set Gr_K = {S ∈ P(U) : ∃b ∈ (E ∪ A) ((S, b) ∈ dom(g))} is called the granule universe of K_I. Granularity of K: The number gr_K = max{|S| : S ∈ Gr_K} is called the granularity of K.
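One way to make these definitions concrete is to store the partial k-function g as a Python dict keyed by (granule, attribute) pairs: a missing key then means that g is undefined there. The sketch below encodes the system K2 from the earlier example in this way; the k-attribute names `n_attr` and `frequency` and the helper names are hypothetical.

```python
# Granules of K2 from the ALG2 example.
S1 = frozenset({"x1", "x3", "x6", "x9", "x12"})
S2 = frozenset({"x1", "x2", "x3", "x6", "x9", "x10", "x12"})

A = {"a1", "a2", "a3"}       # attributes of the information system
E = {"n_attr", "frequency"}  # knowledge attributes (k-attributes), A and E disjoint

# Partial function g : P(U) x (A u E) -> (V_A u V_E), stored on its domain only.
g = {
    (S1, "a1"): "s", (S1, "a2"): "s",
    (S1, "n_attr"): 2, (S1, "frequency"): 0.36,
    (S2, "a2"): "s", (S2, "a3"): "m",
    (S2, "n_attr"): 2, (S2, "frequency"): 0.50,
}


def granule_universe(g, names):
    """Gr_K: granules S for which g is defined on (S, b) for some b in A u E."""
    return {S for (S, b) in g if b in names}


def system_granularity(g, names):
    """gr_K: the largest cardinality among the granules of Gr_K."""
    return max(len(S) for S in granule_universe(g, names))


Gr_K2 = granule_universe(g, A | E)
gr_K2 = system_granularity(g, A | E)
print(len(Gr_K2), gr_K2)
```

The pair (S1, "a3") is absent from the dict, which mirrors the dash in the K2 table: the description of S1 does not use attribute a3.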
  • 58. A knowledge system K = (P(U), A, E, V_A, V_E, g) is called exact if and only if all its granules Gr_K form a partition of the universe U. Operators: In our Model we represent data mining algorithms as certain operators. For example, our ALG1 is represented in the semantic model by an operator p1 acting on some subset of a set K of knowledge systems, such that p1(K0) = K1. ALG2 is represented in the model by an operator p2 also acting on some (possibly different) subset of the set K of knowledge systems, such that p2(K0) = K2.
  • 59. We put all the above observations into a formal notion of a semantic model. A Semantic Model is a system SM = (P(U), K, G), where: U ≠ ∅ is the universe; K ≠ ∅ is a set of knowledge systems, also called data mining process states; G ≠ ∅ is the set of operators; each operator p ∈ G is a partial function on the set of all data mining process states, i.e. p : K → K.
  • 60. The semantic model is always built for a given application. The target data is represented first in the form of the target information system with the universe U, and then in the form of the target knowledge system K0, as we showed in our examples.
  • 61. The semantic model based on our examples is as follows. SM = (P(U), K, G), where: U = {x1, x2, ..., x14}; K = {K0, K1, K2}; G = {p1, p2}; each p_i ∈ G (i = 1, 2) is a partial function p_i : K → K such that p1(K0) = K1, p2(K0) = K2.
  • 62. Data Mining as Generalization: We model data mining as a process of generalization in terms of the generalization relation based on a notion of granularity and generalization operators. Definition: A relation ⪯ ⊆ K × K is called a generalization relation if the following condition holds for any K, K' ∈ K: K ⪯ K' if and only if gr_K ≤ gr_K', where gr_K denotes the granularity of K.
  • 63. Observe that for K0, K1, K2 from our examples gr_K0 = 1 ≤ 5 = gr_K1 ≤ 7 = gr_K2, and the system K2 is the most general. But at the same time K1 is exact and K2 is not exact, so we have a trade-off between exactness and generality. Definition: an operator g ∈ G is called a generalization operator if for any K, K' ∈ K such that g(K) = K', we have that K ⪯ K'. Observe that both operators p1, p2 in our example are generalization operators.
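The generalization relation (one state is at most as general as another when its granularity is no greater) and the test for a generalization operator can be sketched as follows. Operators are modelled as dicts from state names to state names, so they are partial functions by construction; this encoding and all helper names are assumptions made for illustration.

```python
def granules(*groups):
    """Build a list of granules from space-separated object names."""
    return [frozenset(group.split()) for group in groups]


# The three data mining process states of the running example,
# each reduced to its granule universe.
STATES = {
    "K0": [frozenset({f"x{i}"}) for i in range(1, 15)],
    "K1": granules("x1 x3 x6 x9 x12", "x2", "x5 x8", "x11 x14",
                   "x4 x7", "x10", "x13"),
    "K2": granules("x1 x3 x6 x9 x12", "x1 x2 x3 x6 x9 x10 x12"),
}


def granularity(state):
    """gr_K: the largest granule size of a state."""
    return max(len(g) for g in state)


def is_at_most_as_general(k, k_prime):
    """Generalization relation: holds when gr_K <= gr_K'."""
    return granularity(k) <= granularity(k_prime)


def is_generalization_operator(op):
    """Every defined pair op(K) = K' must satisfy the generalization relation."""
    return all(
        is_at_most_as_general(STATES[src], STATES[dst])
        for src, dst in op.items()
    )


p1 = {"K0": "K1"}  # operator representing ALG1
p2 = {"K0": "K2"}  # operator representing ALG2

print([granularity(STATES[name]) for name in ("K0", "K1", "K2")])  # prints: [1, 5, 7]
print(is_generalization_operator(p1), is_generalization_operator(p2))
```

A hypothetical operator mapping K1 back to K0 would fail the check, since it lowers the granularity from 5 to 1.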
  • 64. Data Mining Operators G. In the data mining process, preprocessing and data mining proper are disjoint, inclusive/exclusive categories. Preprocessing is an integral and very important stage of the data mining process and needs as careful an analysis as data mining proper. Our framework allows us to distinguish two disjoint classes of operators: the preprocessing operators Gprep and the data mining proper operators Gdm, and we put G = Gprep ∪ Gdm.
  • 65. We also provide detailed formal definitions of these two classes, their motivation, and a discussion. Data mining and preprocessing operators define different kinds of generalization. The model presented in our examples did not include the preprocessing stage; it used the data mining proper operators only.
  • 66. The main idea behind the concept of the operator is to capture not only the fact that data mining techniques generalize the data but also to categorize existing methods. We define within our model three classes of data mining operators: classification Gclass, clustering Gclust, and association Gassoc. We do not include in our analysis purely statistical methods such as regression.
  • 67. We prove the following theorem. Theorem: Let Gclass, Gclust and Gassoc be the sets of all classification, clustering, and association operators, respectively. The following conditions hold. (1) Gclass ≠ Gclust ≠ Gassoc, (2) Gassoc ∩ Gclass = ∅, (3) Gassoc ∩ Gclust = ∅.
  • 68. Data Mining Process. Definition: Any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states is called a data preprocessing process if there is a preprocessing operator G ∈ Gprep such that G(Ki) = Ki+1 for i = 1, 2, ..., n − 1. Definition: Any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states is called a data mining proper process if there is a data mining proper operator G ∈ Gdm such that G(Ki) = Ki+1 for i = 1, 2, ..., n − 1.
  • 69. The data mining process consists of the preprocessing process (which might be empty) and the data mining proper process. We know that the sets Gprep and Gdm are disjoint. This justifies the following definition. Definition: A data mining process is any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states such that K1, ..., Ki (0 ≤ i ≤ n) is a preprocessing process and Ki+1, ..., Kn is a data mining proper process.
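The definition on this slide amounts to finding a split point i in a sequence of states. The sketch below is one possible reading, with several assumptions: states are plain names, operators are dictionaries encoding partial functions, the empty sequence is accepted as a process (matching "might be empty"), and the state names `raw` and `clean` as well as the operator `clean_op` are hypothetical.

```python
def follows_one_operator(states, operators):
    """True if a single operator in `operators` maps every state to its successor."""
    if not states:
        return True  # assumption: the empty process is allowed
    steps = list(zip(states, states[1:]))
    return any(all(op.get(a) == b for a, b in steps) for op in operators)


def is_data_mining_process(states, prep_ops, dm_ops):
    """Look for a split i such that states[:i] is preprocessing and states[i:] is data mining proper."""
    return any(
        follows_one_operator(states[:i], prep_ops)
        and follows_one_operator(states[i:], dm_ops)
        for i in range(len(states) + 1)
    )


# Hypothetical preprocessing operator and the operator p1 from the examples.
clean_op = {"raw": "clean", "clean": "K0"}
p1 = {"K0": "K1"}

print(is_data_mining_process(["raw", "clean", "K0", "K1"], [clean_op], [p1]))  # True
print(is_data_mining_process(["K0", "K1", "clean"], [clean_op], [p1]))         # False
```

Note that, as in the definition, nothing here requires an operator to connect the last preprocessing state with the first data mining proper state.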
  • 70. Granular Model: Syntax-Semantic Duality of Data Mining. A Granular Model is a system GM = (SM, DM, |=), where:
      • SM is a Semantic Model;
      • DM is a Descriptive Model;
      • |= ⊆ P(U) × E is called a satisfaction relation, where U is the universe of SM and E is the set of descriptions defined by the DM.
    Satisfaction |= establishes the relationship between the semantic model and the descriptive model.
  • 71. Descriptive Model. With any Semantic Model SM = (P(U), K, G) we associate its descriptive counterpart, defined below. A Descriptive Model is a system DM = (L, E, DK), where:
      • L = (A, E) is called a descriptive language;
      • A is a countably infinite set called the alphabet;
      • E ≠ ∅ and E ⊆ A∗ is the set of descriptive expressions of L;
  • 72. DK ≠ ∅ and DK ⊆ P(E) is a set of descriptions of knowledge states. As in the case of the semantic model, we build the descriptive model for a given application. We define here only a general form of the model. We assume, however, that whatever the application, the descriptions are always built in terms of attributes and values of the attributes, some logical connectives, some predicates, and some extra parameters, if needed. The commonly used descriptions have the form (a = v) to denote that the attribute a has the value v, but one might also use, as is often done, a predicate form a(v) or a(x, v) instead.
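To make the (a = v) form of descriptions concrete, here is a small sketch of how such descriptions could be evaluated against an information table. Everything in it is an illustrative assumption: the table, the attribute names `outlook` and `windy`, the restriction to conjunctions, and the function names `satisfies` and `granule` are not taken from the slides.

```python
# Hypothetical information table: object -> {attribute: value}.
table = {
    "x1": {"outlook": "sunny", "windy": "no"},
    "x2": {"outlook": "sunny", "windy": "yes"},
    "x3": {"outlook": "rainy", "windy": "no"},
}


def satisfies(obj, description):
    """`description` is a list of (attribute, value) pairs read as a conjunction of (a = v)."""
    return all(table[obj].get(a) == v for a, v in description)


def granule(description):
    """Set of all objects of the table that satisfy the description."""
    return {x for x in table if satisfies(x, description)}


print(sorted(granule([("outlook", "sunny")])))                   # ['x1', 'x2']
print(sorted(granule([("outlook", "sunny"), ("windy", "no")])))  # ['x1']
```

Adding a conjunct can only shrink the set of satisfying objects, which is one simple way to see how descriptions and granules of the universe relate.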
  • 73. For example, a neural network with its nodes and weights can be seen as a formal description (in an appropriate descriptive language), and the knowledge states would represent changes in parameters during the neural network training process. The model we build here is a model for what we call descriptive data mining, i.e. data mining whose goal is to produce a set of descriptions in a language easily comprehensible to the user. For that purpose, in the model we identify the decision tree constructed by the Decision Tree classification algorithm with the set of discriminant rules obtained from the tree.
  • 74. A Granular Model is a system GM = (SM, DM, |=), where:
      • SM is a Semantic Model;
      • DM is a Descriptive Model;
      • |= ⊆ P(U) × E is called a satisfaction relation, where U is the universe of SM and E is the set of descriptions defined by the DM.
    Satisfaction |= establishes the relationship between the semantic model and the descriptive model. We define the satisfaction component |= of the Granular Model GM in the following stages.
    Stage 1: For each K ∈ K, we define its own descriptive language LK = (AK, EK).
  • 75. Stage 2: For each K ∈ K and descriptive expression F ∈ EK, we define what it means that F is satisfied in K, i.e. we define a satisfaction relation |=K.
    Stage 3: For each K ∈ K and descriptive expression F ∈ EK, we define what it means that F is true in K, i.e. |=K F.
  • 76. Stage 4: We use the satisfaction relation |=K to define, for each K ∈ K, the set DK ⊆ P(EK) of descriptions of its own knowledge.
    Stage 5: We use the languages LK to define the descriptive language L.
    Stage 6: We use the descriptive expressions EK of LK to define the set E of descriptive expressions of L.
    Stage 7: We use the satisfaction relations |=K to define the satisfaction relation |= of the Granular Model GM.
  • 77. Part 3: TRACING THE HISTORY. Mathematics Genealogy Project, genealogy.math.ndsu.nodak.edu
  • 78. We all have a history. We are all mathematicians. The Mission Statement of the Mathematics Genealogy Project defines a mathematician as follows: "... Throughout this project when we use the word "mathematics" or "mathematician" we mean that word in a very inclusive sense. Thus, all relevant data from statistics, computer science, or operations research is welcome...." The Computer Science classification within the project is: Mathematics Subject Classification: 68 Computer Science.
  • 79. The Genealogy Project solicits information from all schools who participate in the development of research level mathematics and from all individuals who may know desired information. That includes Computer Science as well. For them, and for history, we are all mathematicians.
  • 80. Below are some links (sequences of connected people) for a computer scientist. Any two consecutive people in the sequence are listed in the order PhD student, adviser. If a person has more than one adviser, each adviser is preceded by a number; i.e. adviser 1 is listed as "1. Name", adviser 2 is listed as "2. Name", etc.
  • 81. A mathematician would say: For any element A of the sequence, if A has n > 1 advisers, then for any 1 ≤ k ≤ n, adviser k is listed as "k. Name of adviser k"; otherwise the number in front of the name is omitted.
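The listing rule stated on this slide is simple enough to express as code. This is a minimal sketch; the function name `format_advisers` is hypothetical, and the sample names are advisers that appear in the links that follow.

```python
def format_advisers(names):
    """Prefix each adviser with its 1-based position only when there are several advisers."""
    if len(names) > 1:
        return [f"{k}. {name}" for k, name in enumerate(names, start=1)]
    return list(names)


print(format_advisers(["Helena Rasiowa"]))
# ['Helena Rasiowa']
print(format_advisers(["Kazimierz Kuratowski", "Alfred Tarski"]))
# ['1. Kazimierz Kuratowski', '2. Alfred Tarski']
```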
  • 82. Link to Nicolaus Copernicus (Mikolaj Kopernik). He has 1598 descendants. Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland; Helena Rasiowa, Ph.D. Warsaw University, 1950; Andrzej Mostowski, Ph.D. Warsaw University, 1938; 2. Alfred Tarski, Ph.D. Warsaw University, 1924; Stanislaw Lesniewski, Ph.D. University of Lvov, 1912; Kazimierz Twardowski, Ph.D. Universitat Wien, 1891; Franz Clemens Brentano, Ph.D. Eberhard Karls Universitat, Tubingen, 1862; 2. Friedrich Adolf Trendelenburg, Dr. phil. Universitat Leipzig, 1826; 1. Georg Ludwig Konig, Artium Liberalium Magister, Georg August Universitat, Gottingen, 1790; Christian Heyne, Magister Juris, Universitat Leipzig, 1752;
  • 83. 1. Johann August Bach, Magister philosophiae, Universitat Leipzig, 1744; 1. Christian Kustner, Magister philosophiae, Universitat Leipzig, 1742; Johann Ernesti, Magister philosophiae, Universitat Leipzig, 1730; Johann Gesner, Magister artium, Friedrich Schiller Universitat Jena, 1715; Johann Buddeus, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1687; Michael Walther, Jr., Magister artium, Theol. Dr., Martin Luther Universitat, Halle Wittenberg, 1661, 1687; 2. Johann Quenstedt, Magister artium, Theol. Dr., Universitat Helmstedt, Martin Luther Universitat, Halle Wittenberg, 1643, 1644; Christoph Notnagel, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1630; Ambrosius Rhodius, Magister artium, Medicinae Dr., Martin Luther Universitat, Halle Wittenberg, 1600, 1610;
  • 84. 1. Melchior Jostel, Magister artium, Medicinae Dr., Martin Luther Universitat, Halle Wittenberg, 1583, 1600; 1. Valentin Otto, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1570; Georg Joachim Rheticus, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1535; 2. Nicolaus Copernicus, Juris utriusque Doctor, Uniwersytet Jagiellonski (Cracow Jagellonian University), Universita di Bologna, Universita degli Studi di Ferrara, Universita di Padova, 1499, Poland-Italy; 2. Domenico Novara da Ferrara, Universita di Firenze, 1483; 1. Johannes Regiomontanus, Magister artium, Universitat Leipzig, Universitat Wien, 1457;
  • 85. Georg von Peuerbach, Magister artium, Universitat Wien, 1440; Johannes von Gmunden, Magister artium, Universitat Wien, 1406; Heinrich von Langenstein, Magister artium, Theol. Dr., Universite de Paris, 1363, 1375; unknown. Heinrich von Langenstein, 1375, is my "oldest" ancestor. THERE ARE 3 more lines of ancestry; also interesting, if not so illustrious. Here they are.
  • 86. Link to Gottfried Leibniz (54209 descendants), Immanuel Kant (2176 descendants), and Desiderius Erasmus of Rotterdam (57416 descendants). Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland; Helena Rasiowa, Ph.D. Warsaw University, 1950; Andrzej Mostowski, Ph.D. Warsaw University, 1938; 2. Alfred Tarski, Ph.D. Warsaw University, 1924; Stanislaw Lesniewski, Ph.D. University of Lvov, 1912; Kazimierz Twardowski, Ph.D. Universitat Wien, 1891; Franz Clemens Brentano, Ph.D. Eberhard Karls Universitat, Tubingen, 1862; 2. Friedrich Adolf Trendelenburg, Dr. phil. Universitat Leipzig, 1826; 2. Karl Reinhold, Ph.D.;
  • 87. Immanuel Kant, Ph.D. Universitat Konigsberg, 1770; Martin Knutzen, Dr. phil. Universitat Konigsberg, 1732; Christian von Wolff, Dr. phil., Universitat Leipzig, 1700; 2. Gottfried Leibniz, Dr. jur. Universitat Altdorf, 1666; 2. Christiaan Huygens, Artium Liberalium Magister, Juris utriusque Doctor, Universiteit Leiden, Universite d’Angers, 1647, 1655; Frans van Schooten, Jr., Artium Liberalium Magister, Universiteit Leiden, 1635; Jacobus Golius, Artium Liberalium Magister, Philosophiae Doctor, Universiteit Leiden, 1612, 1621; 1. Willebrord (Snel van Royen) Snellius, Artium Liberalium Magister, Universiteit Leiden, 1607; 2. Rudolph
  • 88. (Snel van Royen) Snellius, Artium Liberalium Magister, Universitat zu Koln, Ruprecht Karls Universitat Heidelberg, 1572; 1. Valentine Naibod, Magister Artium, Martin Luther Universitat, Halle Wittenberg, Universitat Erfurt; Erasmus Reinhold, Magister Artium, Martin Luther Universitat, Halle Wittenberg, 1535; Jakob Milich, Liberalium Artium Magister, Med. Dr., Albert Ludwigs Universitat Freiburg, Breisgau, Universitat Wien, 1520, 1524; Desiderius Erasmus Roterodamus (sometimes known as Desiderius Erasmus of Rotterdam), University of Paris, Theologiae Baccalaureus, College de Montaigu, 1497; Jan Standonck, Magister Artium, Theol. Dr., College Sainte-Barbe, College de Montaigu, 1474, 1490; unknown.
  • 89. Link to Pierre-Simon Laplace (50295 descendants) and Jean Le Rond d’Alembert. Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland; Helena Rasiowa, Ph.D. Warsaw University, 1950; Andrzej Mostowski, Ph.D. Warsaw University, 1938; 1. Kazimierz Kuratowski, Ph.D. Warsaw University, 1921; 1. Stefan Mazurkiewicz, Ph.D. University of Lvov, 1913; Waclaw Sierpinski, Ph.D. Uniwersytet Jagiellonski, 1906; 1. Stanislaw Zaremba, Ph.D. Universite Paris IV-Sorbonne, 1889; Gaston Darboux, Ph.D. Ecole Normale Superieure, Paris, 1866; Michel Chasles, Ph.D. Ecole Polytechnique, 1814; Simeon Poisson, Ph.D. Ecole Polytechnique, 1800; 2. Pierre-Simon Laplace, Ph.D.; Jean Le Rond d’Alembert; unknown.
  • 90. Link to Emile Borel (2506 descendants), Leonhard Euler (52555 descendants). Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland; Helena Rasiowa, Ph.D. Warsaw University, 1950; Andrzej Mostowski, Ph.D. Warsaw University, 1938; 2. Zygmunt Janiszewski, Ph.D. Ecole Normale Superieure, Paris, 1911; Henri Lebesgue, Ph.D. Universite Henri Poincare Nancy 1, 1902; Emile Borel, Ph.D. Ecole Normale Superieure, Paris, 1893; Gaston Darboux, Ph.D. Ecole Normale Superieure, Paris, 1866; Michel Chasles, Ph.D. Ecole Polytechnique, 1814; Simeon Poisson, Ph.D. Ecole Polytechnique, 1800;
  • 91. 1. Joseph Lagrange, no degree, student of Leonhard Euler, Ph.D. Universitat Basel, 1726; 1. Johann Bernoulli, Dr. med. Universitat Basel, 1694; Jacob Bernoulli, Dr. hab. Sci. Universitat Basel, 1684; Gottfried Wilhelm Leibniz, Dr. jur. Universitat Altdorf, 1666; 1. Erhard Weigel, Ph.D. Universitat Leipzig, 1650; unknown.
  • 92. Link to Andrei Markov (4824 descendants) and Pafnuty Chebyshev (5964 descendants). Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland; Helena Rasiowa, Ph.D. Warsaw University, 1950; Andrzej Mostowski, Ph.D. Warsaw University, 1938; 1. Kazimierz Kuratowski, Ph.D. Warsaw University, 1921; 1. Stefan Mazurkiewicz, Ph.D. University of Lvov, 1913; Waclaw Sierpinski, Ph.D. Uniwersytet Jagiellonski, 1906; 2. Georgy Fedoseevich Voronoy, Ph.D. University of St. Petersburg, 1896; Andrei Markov, Ph.D. University of St. Petersburg, 1884; Pafnuty Chebyshev, Ph.D. University of St. Petersburg, 1849; Nikolai Dmitrievich Brashman, Ph.D. Moscow State University, 1834; Joseph Johann von Littrow, Ph.D.; unknown.
  • 93. MY PhD COUSINS include Kurt Goedel, Alan Turing, Alonzo Church, Roman Sikorski, Zdzislaw Pawlak, and many others... I am sure some of them are in this room!
  • 94. In the Stony Brook CS Department I traced 10 of them. WE ARE ALL A BIG SCIENTIFIC FAMILY!