Résumé

Over the last few decades, the precision of readily available data has improved considerably. Geolocation, for example, has become increasingly accurate and can now identify which room of a building a user is in. Yet is this level of precision always necessary, or even useful? A user may simply want to know their approximate position through sentences such as "You are near the train station" or "You are still far from the centre".

These two sentences highlight two particular words, "near" and "far". Why? Because they are vague by nature: we can read and understand them, yet we cannot give them a precise meaning. Their interpretation is not fixed; it depends on the context, on the beliefs of the person who utters them, or on some phenomenon still unknown.

Human communication is full of vague words; we use them every day, at work and in society, without really paying attention to them. Surprisingly, we manage to communicate without creating ambiguity even when our definitions of these words differ. Moreover, the use of vague words leaves a communication open to interpretation: that task falls to the listener, who must adapt the semantics of the words to the person who used them. Vagueness can be seen as a degree of belief about a concept; it therefore depends on the person who uses the words and requires particular processing by the listener.

This report explores the use of vague terms such as tall, small and big for the automatic production of summaries from numeric data. Nowadays, in the era of Big Data, large amounts of numeric data are available, but little work focuses on the presentation of these data. This project explores the humanization of data with the help of vague terms and machine learning algorithms, in order to propose an elegant and adaptive model for automatic linguistic summarization.

Automatic summarization tries to generalize a concept over a dataset and aims to transmit only the relevant information. That is why vague words seem to be the right tool and a serious avenue to explore: by their very nature, vague terms leave more room for interpretation by the listener and thus allow the generalization of a concept.
Abstract

In the last decade there has been a major increase in the precision of some types of readily available data. Geolocation, for example, has become more and more precise and can now pinpoint a location to a particular room of a certain building. Yet is this level of precision always required, or even helpful? An individual may simply want to know their approximate location through sentences like "You are near the train station" or "You are still away from the centre". These two examples highlight two words, "near" and "away". Why? Because they are vague: you read such words without noticing that, in fact, you cannot give them a precise definition. Their interpretation is not fixed; it depends on the context, on the beliefs of the person who utters them, or on some unknown phenomenon.

Human communication is full of vagueness; we use it unconsciously every day of our lives, at work and in society. Surprisingly, we manage to communicate without ambiguity even when we do not share the same definitions of such words. Moreover, vagueness gives communication a freedom of interpretation, leaving to the listener the difficult task of computing the semantics. Vagueness can be seen as a degree of belief about a particular concept; it therefore depends on the speaker's state of mind and also requires particular treatment by the listener.

This report explores the use of vagueness for producing automatic summaries of numeric data, using vague terms such as tall, small and big. Nowadays, in the big data era, a great deal of numeric data is available, yet there is a lack of work on the presentation of these data. This project explores the humanization of data with the help of vague terms and machine learning, in order to propose an elegant and adaptive model for automatic summarization.

Automatic summarization tries to generalize a concept over a dataset and aims to transmit only the useful information. In that sense vagueness seems to be the perfect tool: by their nature, vague terms allow a wider range of interpretation and thus allow the generalization of a concept.
HOST ESTABLISHMENT PRESENTATION
A. Bristol University
The University of Bristol is a red brick research university located in Bristol, United Kingdom. It received its royal charter in 1909; its predecessor institution, University College Bristol, had been in existence since 1876. Bristol is organised into six academic faculties composed of multiple schools and departments running over 200 undergraduate courses, largely situated in the Clifton area along with three of its nine halls of residence. The other six halls are located in Stoke Bishop, an outer city suburb 1.8 miles away. The university is the largest independent employer in Bristol.

The University of Bristol is ranked 11th in the UK for its research, according to the Research Excellence Framework. It is ranked 37th by the QS World University Rankings 2015/16, placing it amongst the top ten UK universities. It is the youngest British university to be ranked among the top 40 institutions in the world according to the QS World University Rankings, and has also been ranked 15th in the world in terms of reputation.
B. Laboratory: Intelligent Systems Laboratory
The University of Bristol has a long tradition of excellence in Artificial Intelligence, with
research groups in Engineering dating back to the 1970s and 1980s. Now all these traditions
have converged to form the Intelligent Systems Laboratory (ISL), a leading research unit
counting 15 members of staff (four professors) and about 50 PhD students and postdocs.
Research activities include foundational work in machine learning (many of the ISL members
work in this central area of research), and applications to web intelligence, machine translation,
bioinformatics, semantic image analysis, robotics, as well as natural intelligent systems. Besides
these applications, research in ISL is a key enabler in a number of strategic research directions.
Data Science is one of the main frontiers for modern AI, dealing with vast masses of data, both
enabling their exploitation and benefiting from them. Another key frontier for intelligent
systems research is interacting with modern biology, both taking inspiration from it and providing tools for it.
C. Supervisor: Doctor Jonathan Lawry
Dr Lawry is focussed on developing probabilistic models of vagueness (fuzziness) and
applying them across a number of application domains in Artificial Intelligence. His approach is
the identification of vagueness or fuzziness with linguistic (semantic) uncertainty. This
approach allows for a much more flexible representation framework in which both propositions
and valuations can be ordered in terms of their relative vagueness, and in which we can
capture both stronger and weaker versions of an assertion e.g. absolutely short, quite short
etc. This opens the possibility of developing choice models of assertion in which there is a clear
rationale for choosing a vague statement over a (more) crisp one in the presence of
uncertainty.
Table of Contents
I. Definition of Vagueness
A. What is vagueness? Why is language vague?
B. Reasoning and use of vagueness
C. Modelling of vagueness
II. Summarization of numeric data with words
A. State of the art
B. Vagueness in automatic summary, a fuzzy question
III. Mathematical framework for automatic summary
A. General presentation
B. Detailed framework, end to be vague
IV. Results
V. Case study
A. Application to the web
B. Results
VI. Overview
A. Contribution of this framework within the axes of investigation
B. Future investigation and improvements
VII. Conclusion
A. Personal experience
B. Professional overview
C. Greetings
Bibliography
I. Definition of Vagueness

A. What is Vagueness? Why is Language Vague?

An unexpected new trend is emerging in apps and web services: the information presented to the user is growing vague. So what is vagueness?

Vagueness can be explained with the help of classical logic, in which a sentence is either true or false; this is known as Boolean logic. But for some words and concepts of human language, such as "tall" and "small", classical logic does not work well. These words are what we call vague. If you ask two people to classify a set of humans according to their height into two categories, small and tall, the two will reach roughly the same classification, but some people will be classified as tall by one and as small by the other. This is vagueness: even if in some cases you will find no difference from Boolean logic, in other cases an object can be tall and not tall at the same time. This effect of vagueness cannot be represented with Boolean logic; it is called a borderline case. There is no precise, known height which defines the line between a person who is tall and a person who is not. That is why the term borderline is used: there is a range in which you cannot be absolutely sure whether someone is tall or not.

Why, then, do we not simply adopt as a definition that "tall" means above a particular threshold? How do humans represent vagueness and process it?

From a communication point of view, we might first think that vagueness is suboptimal because it can create confusion and thus lead to misunderstanding. But in reality our brains and our communication deal perfectly well with vagueness, leaving to us the task of interpreting it. Even if, because of your experience, you do not think a given house is tall, you can imagine and try to understand the other point of view. Vagueness allows different interpretations depending on the protagonists of the conversation; yet even with this ambiguity, they manage to understand each other and accomplish complex tasks based on vague information. This is the reason why vagueness is treated in the field of artificial intelligence, and more particularly in reasoning under uncertainty.

"A concept expression is vague if it is indeterminate, for some objects, whether or not the concept applies."
Gottlob Frege
B. Reasoning and Use of Vagueness

We saw in the previous section the nature of vagueness and its particularities, but what can we do with it? Vagueness is proper to language, and so to human communication; the first application of vagueness we might think of is therefore humanizing information. For example, in conversation a person will rarely say "I just bought a new shirt for one hundred and twenty-five pounds and fifty-five pence"; instead they will say something like "I just bought an expensive shirt" or "I just bought a shirt for around a hundred pounds". In this example we see, just by reading, the difference between the crisp assertion and the vague one. One seems natural and could have been uttered by a human; the other is clearly unnatural, emotionless, too mathematical.

Our brain generalizes information from our environment every second, and it does the same when we talk with other people. We sum up the information that we want to transmit using vagueness; this is done naturally, unconsciously, and we have no trouble understanding each other. This is because we focus on the important information, the information that transmits the true concept. In the example the price was not the primary information; the important fact was that I bought a shirt. Furthermore, with the word "expensive" and the information I have about the speaker, I can predict in what range the price could be.

Reasoning with vagueness, as we see in this little example, is a matter of balancing crisp and vague terms to express information, concepts, feelings and emotions. But the particularity of vagueness is that it depends on the interpretations of the two people involved, and on how they can reach a common understanding. One well-known failure mode of human communication is the misunderstanding that arises when speaker and listener interpret a concept differently without realising it. Vagueness is always present in the communication process and lies at the base of human speech. Any model of vagueness will therefore involve a speaker and a listener, because vagueness relies on the beliefs and goals of the two people involved in the discussion.
C. Modelling of Vagueness

So the first question we have to explore is: how can vagueness be modelled? We have seen that classical logic cannot do it, but some researchers have proposed variations of Boolean logic. New mathematical tools have emerged for modelling vagueness; they have their basis in philosophy, and more especially in the problem of the representation of knowledge. In order to understand these tools we first have to define the difference between the truth value of a vague term and that of a classical logical term.

A classical term has only two truth values, true or false. Vague terms follow this pattern, but on top of it they admit borderline cases. If we take the example of a person's height and its characterisation by the words small and tall, we can find a threshold below which we are absolutely sure that the person is small, and another threshold above which we are definitely sure that the person is tall.
For a person of height h, with thresholds τ_S and τ_T for Small and Tall respectively:

Tall is true if h ≥ τ_T. Small is true if h ≤ τ_S. Borderline case: τ_S < h < τ_T.

Figure 1: Vagueness with borderline cases
According to Figure 1, we have crisp thresholds to state whether a person is tall or small. But in the borderline case we cannot determine whether the person is tall or small: it is undefined. Moreover, the value of the thresholds is problematic: how should these thresholds be chosen?
Epistemicists about vagueness want to retain classical logic, and they endorse the somewhat surprising claim that there actually is such a threshold (they claim we know the existential generalisation 'there is an n such that such-and-such' even if there is no particular n of which we know that such-and-such). Many philosophers, however, find this claim too hard to swallow and take it as evidence that classical logic should be modified.

One problem that all these philosophers try to solve, and where classical logic cannot be used, is the sorites paradox. There are many variants of the sorites paradox, but all of them rely on the difficulty of representing and dealing with vague terms.

Heap Paradox
1 grain of wheat does not make a heap.
If 1 grain of wheat does not make a heap then 2 grains of wheat do not.
If 2 grains of wheat do not make a heap then 3 grains do not.
…
If 9,999 grains of wheat do not make a heap then 10,000 do not.
Therefore, 10,000 grains of wheat do not make a heap.

Figure 2: The sorites paradox

Solving this problem with classical logic forces us to reach absurd inferences, such as that 10,000 grains do not form a heap. That is why different logics have emerged to deal with this kind of paradox; for each model below we will return to this issue and see how the model proposes to solve it.

(All the following accounts of the modelling of vagueness are drawn from the book [1] by the researcher Kees van Deemter.)
1. Supervaluationism
According to supervaluationists, borderline statements lack a truth value. This neatly explains why it is impossible to know the truth value of a borderline statement (recall that the truth value of a statement p, in standard logic, is either True or False). Supervaluationism keeps the law of excluded middle when treating vague terms: for example, instead of evaluating the predicate "Charles is a baby" directly, it holds that "Charles is a baby or it is not the case that Charles is a baby" is true. Thus the supervaluationist method allows one to retain all the theorems of standard logic while admitting truth-value gaps. The basic thought underlying supervaluationism is that vagueness is a matter of underdetermination of meaning. This thought is captured by the idea that the use we make of an expression does not decide between a number of admissible candidates for making the expression precise. For example, we can make it precise by saying that x is a baby just in case x is less than one year old; but the use of the expression will allow other ways of making it precise, like 'less than one year plus a second'. If Martin is one year old, the sentence 'Martin is a baby' will be true on some ways of making 'baby' precise and false on others. Since our use does not decide which of the ways of making precise is correct, the truth value of the sentence 'Martin is a baby' is left unsettled. By supervaluationist standards, a sentence is true just in case it is true on every way of making precise the vague expressions contained in it (that is, 'truth is supertruth'). A precisification is a way of making precise all the expressions of the language, so that every sentence gets a truth value (true or false but not both) in each precisification. In this sense, a precisification is a classical truth-value assignment.

As part of their solution to the sorites paradox, supervaluationists will assert 'There is an n such that n grains are not a heap but n+1 grains are', for this statement comes out true under all admissible precisifications of heap. However, when pressed, the supervaluationist will add an unofficial clarification: "Oh, of course I do not mean that there really is a sharp threshold for a heap."
2. Fuzzy Logic

Fuzzy logic, introduced by the mathematician Lotfi A. Zadeh, is a form of many-valued logic in which the truth value of a variable lies in the range [0, 1] and is called a membership degree. Fuzzy logic, in opposition to Boolean logic, deals with sets of objects for which there is no precisely defined criterion of membership. Fuzzy set theory handles this kind of class: to define the set old, for example, there is no precise way to decide whether an object is old or not. Instead, a fuzzy set works with a membership function which assigns to each object a continuous value expressing how strongly the object belongs to the set.

So how can fuzzy logic be used for vagueness? As we said before, a vague term is a particular object that admits borderline cases; to illustrate how fuzzy logic can deal with vagueness we will use the set old. In pure set theory, "the class of old people" cannot constitute a set or class in the usual mathematical sense of these terms. But with fuzzy theory it can, because of the continuous value of the membership function: the class of old people is represented with a degree of confidence. To come closer to the way humans process this kind of linguistic term, fuzzy set theory combines different terms; most often the antonym is taken to represent the complementary class of a vague term. In our example the antonym of old is young, so we can define two fuzzy sets, as shown in the following figure.
Figure 3: Fuzzy set example for the vague term Age
With this representation, for an input x representing the age, you will get different membership values for young, middle-aged and old. The theory tries to model how our brain works with different beliefs: every person has a particular mechanism for representing a vague class such as old people. In that sense you will probably roughly agree with another person when classifying someone as old according to their age, that is, you will share quite similar membership functions and thresholds. But in some cases, because of the thresholds chosen, you will infer that a particular age counts as old where another person would call it middle-aged. Inference in fuzzy logic is called defuzzification; it consists in choosing, given the different membership values, which one to retain. The simplest way of doing this is to take the maximum of the membership functions to classify the object.
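To make this concrete, here is a minimal Python sketch of such fuzzy sets and of defuzzification by maximum membership; the term boundaries below are invented for illustration and are not a standard reference partition:

    def trapezoid(x, a, b, c, d):
        """Trapezoidal membership: 0 outside (a, d), fully 1 on [b, c]."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)

    # Hypothetical fuzzy sets over the age domain (boundaries chosen for the example).
    AGE_SETS = {
        "young":      lambda x: trapezoid(x, -1, 0, 25, 40),
        "middle age": lambda x: trapezoid(x, 25, 40, 55, 70),
        "old":        lambda x: trapezoid(x, 55, 70, 120, 121),
    }

    def classify(age):
        """Defuzzification by maximum: keep the label with the highest membership."""
        return max(AGE_SETS, key=lambda label: AGE_SETS[label](age))

    print({label: round(f(32), 2) for label, f in AGE_SETS.items()})
    print(classify(32))  # 'young': membership 0.53 against 0.47 for 'middle age'

Note that an age of 32 gets non-zero membership in two sets at once: this is exactly the borderline case that Boolean logic cannot express.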
Fuzzy set theory is a promising theory and can also be viewed in terms of degrees of belief about classification into sets. Moreover, fuzzy set theory proposes an elegant way to solve the sorites paradox. Taking back our example of the heap: with classical implication rules we were forced to infer that 10,000 grains do not form a heap, but with fuzzy logic the paradox is dissolved by the membership function. When the input variable x representing the number of grains is high, it will have a very low membership value for the class "not a heap" and a greater one for the class "is a heap". The sorites paradox is solved without difficulty because in fuzzy logic heaphood is just a matter of degree of membership.

Even if fuzzy logic solves the sorites paradox, a bias remains: modelling a fuzzy set requires choosing, for the membership function, thresholds below or above which you are sure that the concept is true or false (as in Figure 1). The choice of the membership function and of the associated thresholds is crucial; these choices are made by empirical experiment, and as yet there is no accepted and shared methodology for doing so. What is missing in fuzzy logic is an emergent process to compute these thresholds automatically, and to choose the shape of the membership functions.
3. Many-valued Logic

Many-valued logics come from the field of non-classical logics and differ from the most common logic, Boolean logic. They are similar to classical logic in that they accept the principle of truth-functionality, namely that the truth of a compound sentence is determined by the truth values of its component sentences (and so remains unaffected when one of its component sentences is replaced by another sentence with the same truth value). But they differ from classical logic in the fundamental fact that they do not restrict the number of truth values to only two: they allow a larger set W of truth degrees.

Fuzzy set theory, which we have already presented, comes from this point of view; we will introduce another theory, three-valued logic. Three-valued logic is a good starting point for understanding the mechanisms behind many-valued logics. This logic was introduced by the mathematician Kleene and consists of three truth values {0, ½, 1}, where ½ corresponds to an undefined truth value. The theory can be applied to vagueness, where crisp terms always take values in the domain {0, 1} and vague terms in the domain {0, ½, 1}. The mechanism of inference in this logic relies on the two major connectives of disjunction and conjunction: disjunction is generally computed as the maximum of the truth values, and conjunction as the minimum.

But this is only one way to treat these operators; another widely used semantics mixes in probability theory. The following figure sums up the different semantics for many-valued logic and groups the two most used definitions of disjunction and conjunction.

Figure 4: Logical operators

With the probabilistic interpretation of these operators the sorites paradox can be solved: the chain of implications decreases the truth value of the concept. The truth value of each sentence is the product of the truth values of the previous ones; since at some point a step has truth value ½, the product decreases and approaches zero. Thus the concept of heap will at some point have a truth value of zero and become false.
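A small Python sketch of this probabilistic reading; the borderline region below is an invented example, and the product semantics is one of the interpretations grouped in Figure 4:

    # Kleene-style connectives on truth degrees (valid on {0, 0.5, 1} and on [0, 1]).
    def conj(a, b):  # conjunction: minimum
        return min(a, b)

    def disj(a, b):  # disjunction: maximum
        return max(a, b)

    def not_heap_truth(n, borderline_start=100, borderline_end=10_000):
        """Truth of "n grains do not make a heap" as the product of the truth
        values of all the implication steps leading up to n."""
        truth = 1.0
        for k in range(1, n):
            # Each step "if k grains are not a heap then k+1 are not" is fully
            # true below the borderline region and only half-true inside it.
            step = 1.0 if k < borderline_start else 0.5 if k < borderline_end else 0.0
            truth *= step
        return truth

    print(not_heap_truth(50))   # 1.0: clearly not a heap
    print(not_heap_truth(120))  # ~1e-06: the truth value collapses towards zero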
4. Contextualism

Epistemic contextualism (EC) is a recent and hotly debated position. EC is roughly the view that what is expressed by a knowledge attribution, a claim to the effect that S 'knows' that p, depends partly on something in the context of the attributor, and hence the view is often called 'attributor contextualism'. So EC, of the sort which concerns us here, is a semantic thesis: it concerns the truth conditions of knowledge sentences, and/or the propositions expressed by utterances of them. The thesis is that it is only relative to a contextually determined standard that a knowledge sentence expresses a complete proposition: change the standard, and you change what the sentence expresses; acontextually, no such proposition is expressed. In this respect, knowledge utterances are supposed to resemble utterances involving uncontroversially context-sensitive terms. For instance, just what proposition is expressed by an utterance of:

1. He is a tall person
2. That's red
3. He is a nice person

depends in certain obvious ways upon such facts as the location or identity of the speaker, and/or the referent of the demonstrative. Contextualism rests on the idea that the truth value of a predicate always depends on the context. To illustrate this view, let us suppose we have a jury of two people who have to classify persons according to their height, and consider two scenarios:

Each person to be judged comes onto the stage one by one and then leaves.
Each person to be judged comes onto the stage and stays.

In the first scenario the context is not built dynamically, because each person leaves the stage once they have been classified. There will be disparities in the jury's judgements, because each judge relies on their personal knowledge and interpretation of the predicates tall and small. In the second scenario, on the other hand, the judges have all the people in front of them; the context is then built dynamically from the set of people presented. They will therefore reach more or less the same classification, because they have built the same context. With this reasoning the sorites paradox is solved.
But it is solved only theoretically: contextualism does not propose a mathematical model of this view. It is more a philosophical position on which one can start to build a more concrete implementation using mathematical tools.

Many criticisms have been made of this solution, because contextualism only displaces the issue of vagueness into the human mind. For contextualists, every definition we can give to a vague term depends entirely on our psychological state, which was shaped by our life experience. A good example is a comparison of two children:

One who grew up in a wealthy family.
One who grew up in a poor family.

As teenagers they will not have the same definition of a vague term such as "expensive". They have built two different bodies of knowledge about this term according to their development and the environment they were confronted with. It is as if, during childhood development, and especially in the first years of life, the brain learns and fixes by habit the thresholds used to represent vague terms. That is why, when you are older, it is harder to change the meaning you give to a vague predicate: you are conditioned by your personal experience.
Personal choice

There are many other possible ways to deal with vagueness, but we chose to explore the most used ones. After careful reflection, we chose in our project to explore the use of fuzzy logic mixed with contextualism. The motivation is the wish to develop an adaptive, autonomous framework to represent vagueness. The contextualist point of view can be explored with unsupervised learning algorithms, in order to propose an emergent mechanism for learning the right thresholds for vague terms. Furthermore, we think this methodology is closer to the mechanisms at work in the human brain; this is supported by readings in psychology and cognitive science on children's language development.
II. Summarization of Numeric Data with Words

The summarization of data is an active field of research, especially in artificial intelligence and data mining. With the growth of data, a lot of information can be extracted, yet we struggle with the presentation of this information. Much progress has been made, but it focuses on machine learning and classification tasks, and understanding and handling the extracted knowledge remains reserved to specialists. Interest is now returning to the presentation of data: the goal is to make the information understandable by most people, without requiring special knowledge. This field brings together researchers from natural language processing, artificial intelligence, mathematics and beyond. Vagueness, as a basic ingredient of human language, is widely exploited in this field. We will introduce and present the different methods explored by researchers in this field and see how vagueness plays an important role in this area.
A. State of the Art

Automatic summarization provides a new tool to extract knowledge from a large set of information. It is a communication process in which, given some inputs, information must be extracted and transmitted to someone else. To represent this process, a two-agent model can be used, composed of a speaker and a listener. The speaker is the major actor: he is in charge of processing the input information in order to extract patterns. These patterns are then translated into a communication channel and transmitted to the second agent, the listener. The listener is a passive actor who only receives the information from the speaker and interprets it. In some applications he has to choose an action to perform depending on the message he received. This kind of problem requires a more complex model; the field of game theory is often used, where the speaker-listener pair is represented as a game. The goal is then to find the best summary to transmit in order to maximise the reward, as in a typical game model; the difficulty is to find the right reward for the application.

The speaker is the important part: the listener, in most cases, is human, and no work has to be done on that side. The modelling of the speaker agent depends on the application; it is composed of different, specific parts. For example, the speaker must implement an algorithm to process the input information and extract patterns and knowledge; this is where machine learning is used.
The information inferred by this part is indigestible for a human; that is why the speaker must also implement a linguistic process to communicate the extracted patterns. The field of natural language generation, together with game theory, proposes elegant architectures and mathematical frameworks. In the case where the listener uses the information from the speaker to perform an action, the speaker must take this into account in the summarization process: he has to adopt a mechanism that exploits the beliefs and actions of the listener to produce and choose the right message. This kind of problem relies on treatment and decision under uncertainty, and game-theoretic algorithms are again widely explored.

To sum up and make precise what has been said, Yager in [2] proposes an elegant and modular model for automatic summary:

Figure 5: Yager's summary model

The summarizer is a linguistic sentence, most of the time a vague one. To illustrate this model, let us take an easy example:

S = middle age
Q = most
T = degree of truth (computed over the dataset; Yager uses fuzzy logic to compute it).
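As a minimal sketch of how T might be computed in such a model; the membership function for "middle age" and the ramp for the quantifier "most" below are illustrative assumptions, not Yager's exact definitions:

    def middle_age(x):
        """Illustrative membership function for the summarizer S = 'middle age'."""
        return max(0.0, 1.0 - abs(x - 45) / 20)  # peaks at 45, zero outside [25, 65]

    def most(p):
        """Illustrative fuzzy quantifier Q = 'most': a ramp over the proportion p."""
        return min(1.0, max(0.0, (p - 0.3) / 0.4))  # 0 below 30%, 1 above 70%

    ages = [38, 42, 47, 51, 23, 44, 60, 49]

    # Degree of truth T of "most of the clients are middle-aged".
    proportion = sum(middle_age(a) for a in ages) / len(ages)
    T = most(proportion)
    print(round(T, 2))  # 0.84 on this toy sample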
Yager lays the foundations of summarization and proposes a general architecture that can be made more complex depending on the application. For the computation of T, fuzzy logic is used most of the time. P. Villar et al. in [2] use Yager's model to propose automatic summarization of opinion data about tourism hotels. Villar et al. go deeper in the use of linguistic terms to describe heterogeneous data and propose a fuzzy model based on semantic translation as a tool to produce linguistic summaries. Their goal is to benefit from the use of linguistic terms so that both data analysts and non-specialists can understand and exploit the inferred information. Their work starts from Yager's model with, upstream, a process to identify and classify the input information; in their application, sentiment classification.
This classification is important because it is used to determine the reputation of a hotel from different textual and numeric inputs.

To model vagueness they chose fuzzy logic, with trapezoidal curves as membership functions. Their main work focuses on the combination and calculation of degrees of truth from different vague terms. They go much further in this direction and propose several methods for the aggregation of vague terms, and different approaches to balance the loss of information induced by vagueness.

A. Ramos et al. in [3] also explore the use of fuzzy logic to produce automatic textual short-term weather forecasts for the region of Galicia. Their approach is mainly based on fuzzy and crisp operators, but they innovate with the use of an intermediate language to capture vagueness and produce linguistic terms. In contrast to Yager's model, the architecture of A. Ramos implements a pre-processing stage where information is extracted and computed into intermediate code, from which natural language templates finally produce the linguistic weather forecast.

Figure 6: Architecture for short-term weather forecasts

Even if many architectures for linguistic summarization exist, they all build on Yager's architecture. The two other architectures we saw make it more complex and focus on specific parts of the summary, such as a finer modelling of vagueness, and are dependent on the application.

In the next part we will focus on and go deeper into the mathematical formalism adopted to model and represent vagueness in these two articles. Fuzzy logic will be further explained with a presentation of the work of D. Dubois, a leading figure in this field.
B. Vagueness in Automatic Summary, a Fuzzy Question

In this part we focus on the formalism adopted in [2], [3] and [4], and in particular on the use of fuzzy logic. Fuzzy logic was introduced by Zadeh to formalise human knowledge; Dubois in [5] explains how fuzzy set theory and vagueness are related, even if Zadeh wanted a distinction. The claim that fuzzy sets are a basic tool for addressing the vagueness of linguistic terms has been around for a long time. But some researchers, such as Novák, oppose vagueness to uncertainty; on this view a vague term must fulfil three features: the existence of borderline cases, unsharp boundaries, and susceptibility to the sorites paradox.

Fuzzy logic has been controversial among philosophers, many of whom are reluctant to consider a truth-value system different from the Boolean one. One of the reasons for the misunderstanding between fuzzy sets and the philosophy of vagueness may lie in the fact that Zadeh was trained in engineering mathematics, not in philosophy. In particular, vagueness is often understood as a defect of natural language (since it is not appropriate for devising formal proofs, it calls into question the usual rational forms of reasoning). Indeed, the vagueness of linguistic terms was considered a logical nightmare by early twentieth-century philosophers. In contrast, for Zadeh, going from Boolean logic to fuzzy logic is a positive move: it captures tolerance to errors (softening blunt threshold effects in algorithms) and may account for the flexible use of words by people. It also helps information summarization: detailed descriptions are sometimes hard to make sense of, while summaries, even if imprecise, are easier to grasp. The link between fuzzy set theory and vague terms can be argued from the idea that it is natural to represent incomplete knowledge with sets. Fuzzy logic has been understood in various ways; it helps to model uncertainty and degrees of belief, and can even be connected with modal logic.

Vagueness is a phenomenon observed in the way people use language, and is characterised by variability in the use of some concepts between listener and speaker. It may be that one cause of such variability is the gradual perception of some concepts or some words in natural language. This variability of interpretation and perception can be used in automatic summary to capture a concept, as in [4], where A. Ramos-Soto et al. use it to generate weather forecasts. In [3] and [5] the authors also use vagueness to produce automatic summaries; even though they all use fuzzy logic to represent vagueness, they differ in the way they implement it.
Ramos-Soto: Weather Forecasts

Ramos et al. in [4] proposed an architecture to automatically produce weather forecast summaries for the 315 Galician municipalities. Formally, each municipality M has an associated forecast data series which includes data series for the input variables considered: sky state, wind, and maximum and minimum temperature. For clarity, in what follows we will consider a single municipality's data series.

For each forecast data series, Ramos et al. obtain linguistic descriptions of seven forecast variables, namely cloud coverage, precipitation, wind, maximum and minimum temperature variation, and maximum and minimum temperature climatic behaviour. For this they have devised a computational method divided into several linguistic description generation operators. Here is the process where fuzzy logic is used to translate these features into vague terms; we will take the sky data as an illustration.

The first stage of Ramos et al.'s application is to transform the chronological data series into temporal linguistic terms. To do so they use fuzzy sets to represent the linguistic temporal terms {Beginning, Half, End}, each with an associated membership function.

The second stage is to capture the concept associated with the main feature; here the sky data are translated into the fuzzy sets CCL = {C, PC, VC} ("clear", "partly cloudy", "very cloudy"). The procedure then concatenates all these temporal descriptions, taking the maximum degree of membership in the fuzzy sets. The output is an intermediate code with vague terms which describes the weather in precise time windows. The overall process is repeated for each feature. Ramos et al. choose to first translate the numeric data into vague terms in order to finally produce, with template and NLG methods, a linguistic weather forecast.
Ramón A. Carrasco: Automatic Summaries for Tourism Web Data

In this paper [3], Carrasco et al. propose a novel model to aggregate heterogeneous data from various websites containing opinions about hotels. Carrasco et al. focus on the mathematical modelling of vague terms and on how to solve the issue of crisp boundaries. They use the same architecture as Yager in [1] but go deeper into the representation of vagueness and the formalism adopted. As in [4], they gather various data from websites about hotel opinions, but they differ in that they use linguistic input from forums and from the comment sections of rating websites (TripAdvisor and others). We will only focus on the way they treat and implement fuzzy logic with vague terms, as this is the core issue; to illustrate the discussion we will take as an example the age of the clients of a hotel.
A set of seven terms for the age of the hotel guest could be given as follows: s0 = baby, s1 = child, s2 = teenager, s3 = young, s4 = adult, s5 = mature and s6 = old. The semantics, i.e. the membership value, is calculated with unbalanced trapezoidal functions. A trapezoidal function is represented by a 4-tuple (a, b, c, d), where [b, c] is the interval on which the concept is totally true (membership 1), and a and d are the lower and upper thresholds beyond which the concept is false (membership 0). The vague intervals are (a, b) and (c, d), and the membership value always lies in [0, 1].
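A sketch of such a term set in Python; the 4-tuples below are invented for illustration and differ from the paper's actual partition:

    def trapmf(x, a, b, c, d):
        """4-tuple trapezoidal membership: fully true on [b, c], false outside (a, d)."""
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)

    # Hypothetical unbalanced linguistic partition of guest age into seven terms.
    AGE_TERMS = {
        "baby":     (-1, 0, 1, 2),
        "child":    (1, 2, 10, 13),
        "teenager": (10, 13, 17, 20),
        "young":    (17, 20, 30, 35),
        "adult":    (30, 35, 50, 55),
        "mature":   (50, 55, 63, 67),
        "old":      (63, 67, 120, 121),
    }

    age = 33
    print({term: round(trapmf(age, *t), 2) for term, t in AGE_TERMS.items()})
    # age 33 is partly 'young' (0.4) and partly 'adult' (0.6): borderline by construction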
To soften the crisp thresholds, another parameter, D, is added: it represents the range of translation of the interval on which the concept is true. To handle this, two sets, one for high translation and one for low translation, are created in order to compute the truth degree of a concept. The idea of the translation is to capture the different possible interpretations arising from the choice of the thresholds. Furthermore, the authors propose a weighted model, for example to highlight metrics from valuable clients. They define two aggregation operators for this purpose: the first is just a weighted sum, the second is a more complex aggregation that can be viewed as a quasi-arithmetic average. This procedure is applicable to vague terms but also to crisp ones; to deal with both, the authors propose a grammar G in which they store the space of interpretation of the terms. For example, primary terms have no D parameter, whereas high/low comparative terms do.
The following figure summarises the process proposed by Carrasco et al.:

Figure 7: Fuzzy model based on semantic translation
Janusz Kacprzyk: Fuzzy Logic for Linguistic Summarization of Databases

In his paper, Kacprzyk aims to produce a new query interface to present the information contained in a database. The system is based on fuzzy logic and builds on Yager's architecture {S: summarizer, Q: quantity in agreement, T: degree of truth}. The innovation of the paper lies not in the way Kacprzyk uses fuzzy logic to represent vagueness, but in the way he treats the combinatorial issue of automatic summarization.

To highlight this issue, let us take the case study of Kacprzyk's paper, a computer retailer. To summarize the sales of computers, many options can arise, such as "most of the sales are second-hand"; a summary can be made more precise by adding conjunctions and disjunctions, as in "most of the sales are second-hand and/or recent computers". Given a set of attributes A and a vocabulary V to describe the features, with connectives like AND and OR, the search space of possible summaries is huge and becomes an issue to compute over a large database.

Calculating the validity of each summary is a considerable task, and George and Srikanth (1996) use a genetic algorithm to find the most appropriate summary in the search space. In his approach, Kacprzyk also uses a genetic algorithm, and the overall quality (goodness) of a summary is given by the weighted sum of partial quality indicators T_1, ..., T_s (cf. Kacprzyk and Yager, 1999), where the weights are derived from expert testimony through pairwise comparisons between the indicators using Saaty's AHP method. Thus, basically, the problem is to find an optimal summary S* in the set of candidate summaries {S} such that:

S* = argmax over S of ( w_1 T_1(S) + ... + w_s T_s(S) )

where w_i is the weight associated with the quality indicator T_i.

In this paper the author highlights the combinatorial issue of automatic summarization; in our work a special focus is placed on this part, in order to propose a fast and reliable algorithm to generate the optimal summary.
Vagueness to Classify

We saw in all these different papers that vagueness is a widely used phenomenon, especially in summarization tasks. This is because the goal of summarization is to find the most global summary that encompasses different concepts, such as "well-paid workers". In this summary the interpretation of "well paid" admits a larger set of possible objects than the sentence "workers paid 120$ per hour". Summarization can thus be viewed as a classification problem where the goal is to create sets using vague concepts, and this is why vague adjectives are a good way to do so: they allow a larger range of interpretation, and can therefore capture fuzzy concepts and carry more information.

In our work we build on all the techniques we saw previously; our wish was to propose a modular architecture for automatic summary. But we focus more on classification as the final goal, for example discriminating between diabetic and non-diabetic patients with the use of vagueness. Moreover, our work also focused on proposing solutions for the biases that exist in the modelling of vagueness, such as fixed thresholds and decision making. In order to be closer to human thinking, we explored papers in psychology and psycholinguistics on children's language development, to propose a mathematical model directly inspired by the behaviour of the human brain.
III. Mathematical Framework for Automatic Summary

A. General presentation

We saw in the previous sections the nature of vagueness and the ways to use it for automatic summary. In the papers [3], [4] and [5] the authors propose new models and architectures for different applications of summarization with vagueness, more precisely with fuzzy logic. In [2] Yager lays out a basic architecture which can be made more complex according to the application. Furthermore, we saw that modelling vagueness implies some biases depending on the choice of logic; the most critical one, when fuzzy logic is used, is the choice of the membership function and of the thresholds. To come closer to human thinking, we explored the development of language and of the meaning of vague adjectives in children. S. Andersen in [7] explores the process by which children treat vague concepts such as the words cup and glass. The idea is thus to take inspiration from the emergence of language in children: how do they treat vague concepts? How do they learn the boundaries involved in a vague concept? The overall process explored in this work builds on progress in cognitive science, that is, on mixing points of view from mathematics, psychology and logic to understand and model human cognition. We wanted to follow this line of reasoning to propose a general and reliable framework inspired by the behaviour of the human brain.

To do so, we split the framework into different parts, each focused on a specific treatment; the goal was to propose, based on Yager's model [1], a cognitively inspired architecture.

Figure 8: Framework architecture
Extraction of Data

This part deals with the extraction of information from a dataset. In this work we use the UCI repository, a well-known repository for machine learning; data extraction is implemented for the CSV format. CSV stands for comma-separated values; on top of this convention, the class attribute has to be the last one in each row. The nature of the information is not restricted to numeric data: we extend it to nominal values, but only to attributes with fewer than 8 possible values.
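A minimal sketch of this extraction step (the file name is a placeholder), assuming as described that the class attribute is the last column:

    import csv

    def load_dataset(path):
        """Load a UCI-style CSV file whose last column is the class attribute."""
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        header, data = rows[0], rows[1:]
        records = []
        for row in data:
            *features, label = row
            # Numeric attributes are parsed as floats; anything else stays nominal.
            records.append(([_parse(v) for v in features], label))
        return header[:-1], records

    def _parse(value):
        try:
            return float(value)
        except ValueError:
            return value  # nominal value kept as a string

    # attributes, records = load_dataset("pima-diabetes.csv")  # hypothetical file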
This is not a crucial part of the project, but it is essential for the rest of the framework; it became central when we proposed a concrete application, a web data extractor. The idea is to extract data directly from a website using a web crawler; in this case the Selenium library was used. In the interest of having a modular framework, an inheritance model was built: to adapt the process to a specific website, only a few parameters have to be set. This mechanism allows us to test our framework both on real data from the internet and on regular datasets like those of the UCI repository.

Model

In this framework a model can be viewed as the definition of an object, that is, all the features that describe this object: the names of the attributes, the number of classes and their names, and the ranges of values for nominal features. The model part is the basis of the framework: it is the definition of the object and provides the right linguistic terms to produce the summary at the end. For example, to describe the Pima diabetes dataset, several features are used, such as the plasma glucose concentration in the blood after 2 hours, the age, and so on. These attributes are used to produce sentences at the end of the summary process, with connectives (AND, OR) to link them and produce a more convincing summary. This is inspired by Ramos et al. in [4], who use templates for the generation of linguistic summaries; the motivation is to focus only on the vague terms and not on the crisp ones.

Furthermore, this modelling allows the framework to be adapted to different models: for any dataset, only one class has to be written to describe the object.
Math: Fuzzy Sets

This part is the mathematical definition of fuzzy sets; it gathers all the fuzzy terms with their mathematical definitions. The choice of membership functions and of the other parameters proper to fuzzy set theory is made in this part of the framework. Together with the other mathematical parts, it constitutes the basis for treating vagueness according to the contextualist point of view.

Math: Statistics / Machine Learning

Given a dataset to summarize, a lot of statistical metrics can be extracted, such as the distribution over one or several particular features, or dependencies learned between sets of objects. In this framework, this part focuses on extracting distribution graphics such as cumulative histograms and representations of objects in several dimensions. It relies more on computational modelling than on algorithmic processes; its role is to translate the input data into a mathematical presentation. We use libraries such as SciPy and NumPy to allow easy manipulation. Libraries such as scikit-learn have been widely used to test and include machine learning algorithms, like clustering along several dimensions. Following our idea of a "cognitively inspired architecture", this part is the translation stage, where raw data are processed to allow the extraction of knowledge. It is linked with the context part, as shown in Figure 8; these two parts embody the contextualist point of view. The context is dynamically built from the input data, which are translated and computed to produce the mathematical context for treating vagueness.

Math: Context

To recall the contextualist point of view: to treat vagueness, this logic proposes that the thresholds proper to vague terms exist but are not fixed. These thresholds are computed dynamically according to the set of objects presented. This class is directly inspired by that philosophy: the context is represented by a fraction of the dataset, and for every vague term the thresholds are computed on the fly.
Language

This part groups all the vague vocabulary used to produce summaries, such as big, low, high, normal. Following Zadeh's fuzzy theory, as in [8], where fuzzy quantifiers are used to modify the sense of vague terms, we used very and most. For example, very is a quantifier which can be used to amplify the sense of a vague term such as tall, and thus influences the membership function. Other quantifiers are used to capture the group characterised by the summary, depending on the distribution, that is, the fraction of objects represented by the summary. For example, when someone says "some of the birds are smart", the word some refers to a portion of a set, but not a precise one. That is why it can be used when the summary does not fit the whole target concept (i.e. the class), to balance the summary and keep its interpretation true. Moreover, as Zadeh proposed in [8], it can be combined with other distributional vague terms such as "most of all" or "the majority", in order to cover and transmit the maximum information about the target concept with the summary.
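A sketch of such a mapping from the fraction of the target concept covered by a summary to a quantifier; the cut-off points are assumptions for illustration:

    def quantity_in_agreement(fraction):
        """Map the covered fraction of the target class to a linguistic quantifier."""
        if fraction >= 0.95:
            return "all"
        if fraction >= 0.7:
            return "most"
        if fraction >= 0.4:
            return "many"
        if fraction >= 0.1:
            return "some"
        return "few"

    print(quantity_in_agreement(0.82))  # 'most', as in "most of the birds are smart"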
Summarizer

This part follows the architecture proposed by Yager in [2]; it groups the 3-tuple {S, Q, T}: summarizer, quantity in agreement, truth value. The difference is that in this framework the summarizer is viewed as a classification task, the goal being to find the most accurate linguistic summary to discriminate between the different classes present in the input dataset. In our model the quantity in agreement and the truth value use the same mathematical metric: the distribution of a given summary over the dataset. That is, according to the probability assigned to a given summary, a quantity in agreement is computed in order to capture the distribution and transmit it with a linguistic term. From an information-theoretic perspective, this process can be viewed as a choice problem: given a specific distribution, a quantity in agreement has to be chosen so that the summary transmits the right interpretation.

Decision Making

Here the mathematics of decision under uncertainty, i.e. with probabilities, is widely used. In computational terms, two methods were explored to deal with this issue. The first consists of translating numeric features into words before the summary process, for example "1.86 meters" into "high". This method keeps the framework very fast, but on the other hand it discards some information that is useful for the decision-making part. That is why, in the second method, all the membership values are kept and treated at the end, where a mathematical decision equation is used to perform the classification choice.
Display Summary
In order to produce a linguistic summary, a specific part is devoted to it: from the output of the decision-making part and the model part, a summary is generated. A very basic NLG pipeline is used here to produce the sentences; most of the production is done with the template, i.e. the model.

We have described here the general parts of the framework, and the behaviour and role of each of them. The whole architecture was inspired by neurofunctional knowledge; in the next part a direct link will be made, through the evolution of the different versions of the framework.
B. Detailed Framework, End to Be Vague

In the following part the architecture is detailed, with arguments for the design choices and the mathematical tools used in each part.

1. Framework Version 1.0

The first version of the framework was very simple and focused on trying different hypotheses before modelling the whole architecture. We started with this very simple architecture:

Figure 9: Framework v1 architecture

The first architecture, shown in Figure 9, is quite basic and was tested on the Iris dataset from the UCI repository. The idea of this first version is to test the following hypothesis:
Suppose that our data has n attributes a_1, ..., a_n, so that attribute a_j takes values in a domain Ω_j for j = 1, ..., n. Let Ω = Ω_1 × ... × Ω_n, and let x = (x_1, ..., x_n) denote the vector of attribute values. A dataset then takes the form D = {x^(1), ..., x^(m)}, where x^(i) ∈ Ω.

Now we define a language L with propositional variables which are, in our case, the vague terms Low_j and High_j for each attribute. Then, for every attribute of the dataset, we ground the propositional variables of L according to the following rule: given a value alpha, two thresholds are computed over the cumulative histogram F_j of the attribute. The lower threshold is given by θ_L = F_j^-1(alpha), and the upper threshold by θ_U = F_j^-1(1 - alpha); we then have θ_L ≤ θ_U, and the condition on alpha is alpha < 0.5, due to the upper bound given by F_j^-1(1 - alpha).
Figure 10: Example of threshold calculation
Given these thresholds over all the attributes of D, we can, with fuzzy set theory, calculate the membership values of the propositional variables. To model the membership functions we chose trapezoidal curves, of type Right for the lower bound (Low) and of type Left for the upper bound (High); this choice is purely empirical.
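A sketch of this stage, reading the two thresholds off the empirical cumulative distribution as quantiles and building the two shoulder-shaped membership functions (the toy data are invented):

    import numpy as np

    def thresholds(values, alpha):
        """Thresholds from the cumulative distribution: the values below which a
        fraction alpha (resp. 1 - alpha) of the data falls."""
        assert 0 < alpha < 0.5
        return np.quantile(values, alpha), np.quantile(values, 1 - alpha)

    def mu_low(x, lo, hi):
        """Right-shoulder trapezoid: fully 'Low' below lo, not 'Low' at all above hi."""
        return float(np.clip((hi - x) / (hi - lo), 0.0, 1.0))

    def mu_high(x, lo, hi):
        """Left-shoulder trapezoid: fully 'High' above hi, not 'High' at all below lo."""
        return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

    sepal_length = np.array([4.6, 5.0, 5.4, 5.8, 6.1, 6.5, 7.0, 7.7])
    lo, hi = thresholds(sepal_length, alpha=0.2)
    print(mu_low(5.0, lo, hi), mu_high(7.0, lo, hi))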
With these two membership functions, a truth value for each propositional variable is calculated for every record in the dataset. The decision maker in this version is based on a maximum of averages; to illustrate it, we take the Iris dataset. Given a class c and an attribute a_j, with the language L = {Low, High}, the goal is to find the best sentence to describe that attribute for the given class. For the class Iris-setosa with the attribute sepal length, the right vague term is the one that maximises the average membership over the records D_c of the class:

w* = argmax over w in {Low, High} of (1/|D_c|) Σ over x in D_c of μ_w(x_j)

Figure 11: Decision making in framework v1
This first version of the framework was not guided by a cognitive perspective; the goal was to see the discrimination offered by the restricted language L = {Low, High}. We did not explore the classification of classes using these linguistic descriptions; that is done in the next version of the framework.
2. Framework Version 2.0
After the results highlighted by the first version of the framework, we decided to take inspiration from neuroscience, and particularly from the treatment of language. In the human brain two different areas are mainly involved in the production and comprehension of language. The first one, Broca's area, was discovered by Paul Broca in the nineteenth century when he examined the brain of a deceased patient who had suffered from an unusual disorder: the patient had been unable to talk, even though no motor lesion of his tongue or mouth was noticed. Broca examined the brain and discovered a lesion in the posterior portion of the frontal lobe of the left hemisphere. Years later Carl Wernicke, a German neurologist, discovered another part of the brain, involved in understanding language, in the posterior portion of the left temporal lobe. People with a lesion at this location could speak, but their speech was often incoherent and made no sense.

From this we decided to build an architecture that tries to mimic this organisation by dissociating language from its semantics. In our model this was done by adding two parts: a Language part, which plays the role of Broca's area, and a Context part, which plays the role of Wernicke's area. The first gathers all the vocabulary and the rules to produce the right sentences. The second is involved in making sense of these words, in our case by assigning semantics to vague terms with the use of fuzzy logic.

Figure 12: Framework v2 architecture
The method used to assign semantics to vague terms is the same as in the first version of the framework, with thresholds computed from the cumulative distribution of each attribute. Furthermore, we focused on using summarization and vagueness for a classification task: the goal was to find the summary that allows the best discrimination between the different classes. The problem then became to find the best summary, given a vocabulary, that has the highest probability of being true for the target class and the lowest probability of being true for the other classes. This problem can be viewed as a game, so algorithms from game theory can be used, especially tree search guided by a heuristic.

Taking the same definitions as previously, with a dataset D described by a vector of features, and a language L composed of propositional variables, we add the logical connectives ∧ and ∨. The task becomes to find the conjunction or disjunction of attributes that best discriminates a specific class, using the following heuristic to guide the search in the space:

h(θ) = P(θ | C_target) - P(θ | C_others)

where θ is the sentence explored in the tree search, a combination of conjunctions or disjunctions of attributes with vague terms. The heuristic directly translates the idea that a sentence has to be mostly true for the target class and only slightly true for the other classes.
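A sketch of the search under this heuristic; the exhaustive enumeration over small conjunctions below stands in for the tree search, and the data layout (records already translated into vague labels, as in the next paragraph) is an assumption of the example:

    from itertools import combinations

    def truth(sentence, record):
        """Truth of a conjunction of (attribute, vague term) pairs for one record,
        where `record` maps each attribute to its vague label."""
        return all(record[attr] == term for attr, term in sentence)

    def heuristic(sentence, target, others):
        """Often true on the target class, rarely true on the other classes."""
        p_target = sum(truth(sentence, r) for r in target) / len(target)
        p_others = sum(truth(sentence, r) for r in others) / len(others)
        return p_target - p_others

    def best_summary(vocabulary, target, others, max_terms=2):
        """Enumerate conjunctions of up to `max_terms` pairs and keep the best one."""
        candidates = [
            conj for k in range(1, max_terms + 1)
            for conj in combinations(vocabulary, k)
        ]
        return max(candidates, key=lambda s: heuristic(s, target, others))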
For computational and performance reasons, we chose to first translate all the numeric data into our language, in this case L = {Low, Normal, High}. Following Zadeh in [8], we introduced the quantifiers very and quite into our language; these quantifiers act directly on the membership function. The quantifier very raises the membership value of each word to a power (the square), while for the quantifier quite the square root is used. The same membership functions were taken to model Low and High; for the word Normal we chose a Gaussian membership function. This choice was made because even if you cannot describe how a single random event happens, a whole mass of them together will act like a Gaussian.
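A sketch of the two modifiers very and quite mentioned above, following the concentration/dilation convention just described:

    import math

    def very(mu):
        """'very' concentrates a membership value: squaring pushes it down."""
        return mu ** 2

    def quite(mu):
        """'quite' dilates a membership value: the square root pushes it up."""
        return math.sqrt(mu)

    mu_tall = 0.7
    print(very(mu_tall))   # 0.49: being 'very tall' is harder to satisfy
    print(quite(mu_tall))  # ~0.84: being 'quite tall' is easier to satisfy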
So, given a dataset with features described by the vector x, we translate the numeric data into linguistic data by keeping, for each attribute, the term of L with the maximum membership value. Figure 13 shows the result of this process, with the translation of numeric input into words:

Figure 13: Example of numeric-to-words translation

This translation gives the search process a better computational time: the algorithm exhaustively searches for word matchings over the sentences θ generated from the vocabulary.
The search algorithm, guided by the heuristic, computes for a given class the summary that best discriminates it. Here is an example of the output generated by this version of the framework:

Figure 14: Iris summary for the class Iris-versicolor (class 1)

The other issue treated in this version of the framework is the choice of the thresholds over the cumulative distribution for the upper and lower bounds of our vague vocabulary. One approach we explored took inspiration from Kacprzyk in [5], who used a genetic algorithm to find the right summary (where our framework uses a tree search). We took inspiration from this and tried to apply a genetic algorithm to find the right thresholds, as sketched below, where:

The population is made of threshold vectors covering all the attributes.
The crossover function is based on taking a max/mean of the selected individuals.
The fitness function is the probability computed for a given summary of a specific class.
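A toy sketch of this genetic search; the averaging crossover with Gaussian mutation is a simplifying assumption, and `fitness` stands for the summary probability described above:

    import random

    def evolve_thresholds(fitness, bounds, pop_size=20, generations=50):
        """Genetic search for a threshold vector (one value per entry of `bounds`).
        `fitness` maps a threshold vector to the quality of the best summary."""
        population = [
            [random.uniform(lo, hi) for lo, hi in bounds]
            for _ in range(pop_size)
        ]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            parents = population[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                # Crossover by averaging the parents, plus a small mutation.
                children.append([
                    (x + y) / 2 + random.gauss(0, 0.05 * (hi - lo))
                    for x, y, (lo, hi) in zip(a, b, bounds)
                ])
            population = parents + children
        return max(population, key=fitness)

    # best = evolve_thresholds(my_fitness, bounds=[(4.0, 8.0), (2.0, 4.5)])  # hypothetical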
The application of a genetic algorithm to find thresholds was not a complete success: the computational time was too high relative to the results. We chose to explore another method to dynamically find the right thresholds for vague terms, drawing on findings about children's language development.
3. Framework Version 3.0
In the last version of the framework we decided to focus first on the computation of the thresholds for vague terms. After several readings on children's language development, especially in psycholinguistics, we found an interesting alternative to fixed thresholds. The study in [7] highlights that a child learning new word concepts, such as cup, works like a clustering algorithm. That is, to capture the link between a concept and a word, the child first creates sets of different concepts and refines them during development. When new words with new concepts arise in the child's vocabulary, conflicts of words and meanings force the child to refine the definitions. The conjunction of these concepts creates more precise boundaries, and a concept thus becomes more precise through learning and error making. To treat vague terms we explored a similar mechanism: we state, like contextualism, that the semantics of vague terms comes from experience. During development the child learns vague concepts such as distance (away, close) or size (small, tall) from the environment. In our framework, the environment from which the knowledge is first extracted is a fraction of the input data, the training set (TS). This TS is used to find the thresholds with an unsupervised algorithm, k-means. In the k-means algorithm the goal is to find the centroids that best partition our data. In our case the vocabulary is {Low, Normal, High}, so a 3-means algorithm is used on the TS for each attribute, the goal being to output the three centroid values found by the algorithm. These centroids are then used, as previously, as thresholds for the membership functions; the motivation for doing this is the adaptability of the method.
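A sketch of this stage with scikit-learn's KMeans; the toy values are invented, and in the framework the procedure runs on the training-set values of each attribute:

    import numpy as np
    from sklearn.cluster import KMeans

    def vague_thresholds(train_values):
        """Cluster one attribute into three groups and use the sorted centroids
        as the anchors of the Low / Normal / High membership functions."""
        X = np.asarray(train_values, dtype=float).reshape(-1, 1)
        km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
        low_c, normal_c, high_c = sorted(float(c[0]) for c in km.cluster_centers_)
        return low_c, normal_c, high_c

    ages = [22, 25, 27, 41, 43, 45, 47, 68, 71, 74]
    print(vague_thresholds(ages))  # roughly (~25, ~44, ~71) on this toy sample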
Moreover, we focused on the decision part. In the previous version the numeric data were first translated into the language L; the main issue is that we lose information with the max operator. Given a summary for a class, for each record to classify we now keep the membership values in [0, 1] and inject them into a decision equation.
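A sketch of this second method; the product-based decision equation below is one simple choice, shown as an assumption rather than the framework's exact equation:

    def graded_truth(summary, record, memberships):
        """Graded truth of a summary for one record: the product of the membership
        values of its (attribute, vague term) pairs, instead of a hard 0/1 match."""
        value = 1.0
        for attribute, term in summary:
            value *= memberships[attribute][term](record[attribute])
        return value

    def decide(record, summaries, memberships):
        """Assign the class whose summary keeps the highest graded truth."""
        return max(summaries, key=lambda c: graded_truth(summaries[c], record, memberships))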
The final model proposed is the evolution of version two, incorporating the dynamic computation of thresholds and a new decision process; the model is presented in Figure 8.