The Fuzzy Logical Databases and An Efficient Evaluation of a Fuzzy Equi-Join
Supervised by: Dr. Bassam Hammo
Presented by: Alaa AlZoubi
Outline
1. Introduction to fuzzy concepts in databases.
2. Interpretation of fuzzy terms.
3. We propose a new measure for fuzzy equality.
4. We define a new type of fuzzy equi-join based on the new fuzzy equality.
5. A sort-merge join algorithm based on a partial order of intervals is used to evaluate the fuzzy equi-join.
6. Experimental results show a significant improvement in efficiency when FE (fuzzy equality) indicators are used with the sort-merge join algorithm.
ABSTRACT
Many real-world applications, such as business decision making, medical diagnosis, and criminal justice, have to deal with information that is uncertain or imprecise. Classical database models are often unable to represent and manipulate imprecise and uncertain information. Example: the age of Tom is "About 32". Knowledge-base and database systems should support such applications directly by providing functionality to store and manipulate ill-known data.
In recent years, various fuzzy data models and fuzzy database systems have been proposed. These models and systems extend relational and object-oriented data models with fuzzy set theory and possibility theory, which lets them represent ill-known data and answer queries that contain soft restrictions.
These models can be classified into two categories:
1. Similarity-based: similarity relationships are specified for some attributes so that the values of these attributes can be grouped into similarity classes.
2. Possibility-based: ill-known data is represented by a possibility distribution, which gives, for each crisp attribute value, the possibility that it is the actual value of the data.
Among the algebraic operations, the fuzzy join is an important and expensive one, and evaluating it efficiently is harder than evaluating an ordinary join. There are two reasons for the difficulty:
1. Diverse semantics: in a fuzzy join, two tuples may join even if they do not completely satisfy the join condition. The extent to which they satisfy it is usually represented by a satisfaction degree.
2. Lack of fast access paths: the most efficient join techniques of ordinary relational databases, such as indexing and hashing, do not apply directly to fuzzy relational databases.
Fuzzy concepts in databases
Introducing fuzziness
A table is a relation: its columns are attributes and its rows are tuples. Each attribute has a domain. For example, the domain of the attribute JobType might be the set {Academic, Industry, Government}.
FirstName   SecondName   JobType      Expertise
Smith       Tom          Academic     AI
Alan        Smith        Industry     Expert Systems
Bill        Alin         Government   Statistics
Cindy       Smith        Government   Robotics
Imprecision in attribute values can be introduced using a similarity matrix, e.g. for the attribute Expertise. Matrices of this type can be used to determine the matching degree between an applicant and a job opening.
                AI    Expert Systems   Statistics   Robotics
AI              1.0   0.9              0.2          0.6
Expert Systems  0.9   1.0              0.2          0.6
Statistics      0.2   0.2              1.0          0.2
Robotics        0.6   0.6              0.2          1.0
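A minimal sketch of how such a matrix could be used to rank applicants for an opening. It assumes the matrix is stored as a nested list in the slide's row/column order; `similarity` and `rank_applicants` are hypothetical helper names, not part of the original work.

```python
# Similarity matrix for the Expertise attribute, in the row/column order of the slide.
EXPERTISE = ["AI", "Expert Systems", "Statistics", "Robotics"]
SIM = [
    [1.0, 0.9, 0.2, 0.6],
    [0.9, 1.0, 0.2, 0.6],
    [0.2, 0.2, 1.0, 0.2],
    [0.6, 0.6, 0.2, 1.0],
]


def similarity(x: str, y: str) -> float:
    """Degree to which two Expertise values are considered the same."""
    return SIM[EXPERTISE.index(x)][EXPERTISE.index(y)]


def rank_applicants(applicants: dict[str, str], required: str) -> list[tuple[str, float]]:
    """Order applicants by how well their expertise matches the required one."""
    scored = [(name, similarity(exp, required)) for name, exp in applicants.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank_applicants({"Alan": "Expert Systems", "Bill": "Statistics", "Cindy": "Robotics"}, "AI"))
```

For an opening that requires AI, this prints `[('Alan', 0.9), ('Cindy', 0.6), ('Bill', 0.2)]`: no applicant matches exactly, but Expert Systems is judged closest.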
Imprecision in attribute values can also be introduced using a linguistic variable. For example, the attribute Height could take the values {short, average, tall}, each of which could be modeled as a fuzzy set. Another way to introduce imprecision is to permit partial membership of a tuple in a relation; the degree to which a tuple is a member is stored as a special attribute. Tuples of this type are sometimes called weighted tuples. The following relation stores information on endangered species:
Name          Habitat       …   Membership
Field Mouse   grassland     …   1.0
Wren          rain forest   …   0.4
Black Duck    coastal       …   0.4
An example: with a conventional (crisp) definition, a person who is 7 feet tall is "TALL" and a person who is 5 feet tall is "NOT TALL". In Boolean logic each person is either TALL (1) or NOT TALL (0).
Fuzzy sets can instead express a degree of membership. If S is the set of all people in the universe, each person in S is assigned a degree of membership in the subset TALL, based on the person's height in feet:
$$\mathrm{TALL}(x) = \begin{cases} 0, & \text{if } \mathrm{Height}(x) < 5,\\ (\mathrm{Height}(x) - 5)/2, & \text{if } 5 \le \mathrm{Height}(x) < 7,\\ 1, & \text{if } \mathrm{Height}(x) \ge 7. \end{cases}$$
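The piecewise definition translates directly into a function. This is a small illustrative sketch; the name `tall` is our own.

```python
def tall(height_ft: float) -> float:
    """Degree of membership in TALL for a height given in feet."""
    if height_ft < 5:
        return 0.0
    if height_ft < 7:
        return (height_ft - 5) / 2  # linear ramp: 0 at 5 ft, 1 at 7 ft
    return 1.0


for h in (4.5, 5.0, 6.0, 6.5, 7.0):
    print(f"{h} ft -> {tall(h)}")
```

Unlike the Boolean version, a 6 ft person is TALL to degree 0.5 and a 6.5 ft person to degree 0.75.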
Degree of relationship
Imprecise queries: a user may pose an imprecise query to a database. This can be due to the use of:
1. Imprecise conditions: "Find all taxpayers who were audited in 2008 and whose income is low."
2. Imprecise operators: "Find all countries whose export revenue is about the same as their import revenue."
3. Imprecise quantifiers: "Find the companies whose customers are mostly from government agencies."
Interpretation of Fuzzy Terms
A fuzzy data value has an uncertain or imprecise value. We associate each fuzzy data value v with a fuzzy term and a membership function (of a fuzzy set). The membership function, denoted µv, maps each crisp value x in the universe of v to a membership degree µv(x) in [0, 1] that indicates the possibility that v = x. A membership function can be defined in a number of ways. Over a numerical universe, a membership function is typically convex and normal (at least one member has degree 1). We consider membership functions of trapezoidal shape and denote them by MF(a, b, c, d), where the parameters mark the corner points of the trapezoid.
If a value v has the membership function MF(a, b, c, d), the interval [a, d] is called the supporting interval of v. As special cases, MF(l, l, u, u) defines an interval [l, u], MF(v, v, v, v) defines a crisp value v, and MF(a, b, b, d) is a triangular function. (Over a categorical universe, a membership function is written µv = x1/m1 + x2/m2 + … + xk/mk, where xi is a value in the universe and mi is the membership degree of xi. The membership function of a single crisp value v is µv = v/1.) For example, the membership function defining the fuzzy term F2 in this figure is MF(20, 30, 40, 50).
FUZZY RELATIONS
In this section, we briefly describe the representation of data in a fuzzy relational database. A data value is crisp if it is certain and precise, and fuzzy otherwise. A fuzzy (sub)set F of an ordinary set U is characterized by a membership function µF: U → [0, 1].
The idea of fuzzy sets: fuzzy sets are functions that map a value, which might be a member of a set, to a number between zero and one that indicates its degree of membership. A degree of zero means that the value is not in the set, and a degree of one means that the value is completely representative of the set.
For every crisp value x ∈ U, µF(x) is the membership degree of x with respect to (wrt) F; that is, µF(x) = 1 if x is a full member, 0 < µF(x) < 1 if x is a partial member, and µF(x) = 0 if x is not a member of F. Without loss of generality, x is in F only if µF(x) > 0. A fuzzy data value v is represented by a possibility distribution restricted by a fuzzy set F, in the sense that v is a member of F and the possibility that v equals a member x of F is exactly µF(x).
A membership function can be defined in a number of ways. Over a numerical universe, a membership function is typically convex and normal (at least one member has degree 1). The following generic parameterized function defines such membership functions:
$$\mathrm{MF}(a,b,c,d)(x) = \begin{cases} 0, & \text{if } x \le a < b \text{ or } x < a = b,\\ (x-a)/(b-a), & \text{if } a < x < b,\\ 1, & \text{if } b \le x \le c,\\ (d-x)/(d-c), & \text{if } c < x < d,\\ 0, & \text{if } c < d \le x \text{ or } c = d < x. \end{cases}$$
Here the parameters a, b, c, and d are values in the universe satisfying a ≤ b ≤ c ≤ d. In general, the curve of the generic function is a trapezoid, as shown in the following figure, but it can also take other shapes. For example, MF(a, b, b, d) defines a triangular function, since the second and third parameters are the same.
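A minimal sketch of the generic parameterized function. The class name `MF` mirrors the slides' notation but is our own; the code assumes, without checking, that a ≤ b ≤ c ≤ d. The same branches cover the trapezoid, the triangle (b = c), the interval (a = b, c = d), and the crisp value (all four equal).

```python
from typing import NamedTuple


class MF(NamedTuple):
    """Generic parameterized membership function MF(a, b, c, d), a <= b <= c <= d."""
    a: float
    b: float
    c: float
    d: float

    def __call__(self, x: float) -> float:
        a, b, c, d = self
        if b <= x <= c:        # core: full membership
            return 1.0
        if a < x < b:          # rising edge (empty when a == b)
            return (x - a) / (b - a)
        if c < x < d:          # falling edge (empty when c == d)
            return (d - x) / (d - c)
        return 0.0             # outside the open support

    def support(self) -> tuple[float, float]:
        """Supporting interval [a, d]."""
        return (self.a, self.d)


F2 = MF(20, 30, 40, 50)          # the fuzzy term F2 from the slide
about_32 = MF(30, 32, 32, 34)    # triangular: b == c
print(F2(25), F2(35), F2(47.5), F2.support())   # 0.5 1.0 0.25 (20, 50)
print(about_32(31), MF(32, 32, 32, 32)(32))     # 0.5 1.0
```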
Over a nonnumerical universe, a membership function takes the form µF = x1/m1 + x2/m2 + … + xk/mk, where xi is a value in the universe and mi is the membership degree of xi with respect to F. In this case, the degenerate membership function of a crisp value v is µv = v/1.
The universe of an attribute A, denoted U(A), is the set of crisp values that may appear in the attribute. The domain of an attribute A, denoted D(A), is the set of all (crisp and fuzzy) values defined over U(A). A fuzzy relation R with schema (A1, A2, …, An) is a fuzzy set of tuples in D(A1) × … × D(An).
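The discrete notation x1/m1 + … + xk/mk maps naturally onto a dictionary from values to degrees. This is an illustrative sketch only; the helper names and the fuzzy JobType value `mostly_academic` are hypothetical examples, not taken from the original work.

```python
def discrete_mf(*pairs: tuple[str, float]) -> dict[str, float]:
    """mu_F = x1/m1 + ... + xk/mk as a dict; members with degree 0 are dropped."""
    mf: dict[str, float] = {}
    for x, m in pairs:
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"degree of {x!r} must lie in [0, 1], got {m}")
        if m > 0.0:
            mf[x] = m
    return mf


def crisp(v: str) -> dict[str, float]:
    """Degenerate membership function of a crisp value: mu_v = v/1."""
    return {v: 1.0}


def degree(mf: dict[str, float], x: str) -> float:
    """mu_F(x); values not listed are non-members (degree 0)."""
    return mf.get(x, 0.0)


# Hypothetical fuzzy value over U(JobType) = {Academic, Industry, Government}.
mostly_academic = discrete_mf(("Academic", 1.0), ("Government", 0.3), ("Industry", 0.0))
print(mostly_academic, degree(mostly_academic, "Industry"), crisp("Industry"))
```

Dropping zero-degree entries matches the convention above that x is in F only if µF(x) > 0.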
FUZZY EQUI-JOIN
A FUZZY EQUI-JOIN
In this section, we first define a fuzzy equality and then use it to define a fuzzy equi-join. The following example shows the need for a fuzzy equi-join.
Example: consider the following relation R and the query "Find all pairs of persons from R whose ages are equal to a degree no less than 0.5".
NAME    OCCUPATION   AGE
Smith   Engineer     20
Alan    DBA          31
Bill    Teacher      About 32
Cindy   Lawyer       About 34
Mike    Teacher      Middle age
Tom     Farmer       58
Solution: join R with itself on the AGE attribute using a fuzzy equality comparison. Since AGE contains fuzzy values, we must determine the degree to which two fuzzy ages, say About 32 and Middle age, are equal (that is, satisfy the join condition AGE = AGE), where About 32 = MF(30, 32, 32, 34), About 34 = MF(32, 34, 34, 36), and Middle age = MF(30, 35, 45, 50). The example makes clear that computing the satisfaction degree of the fuzzy equality comparison is the key to the meaning of the fuzzy equi-join.
In the following, we propose a new measure for the fuzzy equality comparison based on the similarity of fuzzy values.
Definition: Let D be a set of values. The fuzzy equality on D is a mapping ~=: D × D → [0, 1] that, for every pair of values v1 = MF(a1, b1, c1, d1) and v2 = MF(a2, b2, c2, d2) in D, gives
$$(v_1 \mathrel{\tilde{=}} v_2) = \frac{\int \min(\mu_{v_1}(x), \mu_{v_2}(x))\,dx}{\int \max(\mu_{v_1}(x), \mu_{v_2}(x))\,dx},$$
where ∫ is taken over the universe on which the membership functions are defined and is interpreted as a summation if the universe is discrete. Intuitively, ∫min(µv1(x), µv2(x))dx is the accumulated membership degree of the intersection, and ∫max(µv1(x), µv2(x))dx is that of the union, of the two fuzzy sets defining v1 and v2.
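For trapezoidal values the measure can be computed exactly: between consecutive corner points both membership functions are linear, so after splitting each piece at the point where the two functions cross, min and max are linear and the midpoint rule integrates them exactly. This is a sketch, not the authors' implementation; `mf` and `fuzzy_equal` are hypothetical names, and returning ordinary equality when both values are zero-width (crisp) is our assumption, chosen to agree with the remark that fuzzy equality on crisp data is a "hard" comparison.

```python
Trap = tuple[float, float, float, float]  # (a, b, c, d) with a <= b <= c <= d


def mf(v: Trap, x: float) -> float:
    """Generic parameterized membership function MF(a, b, c, d) evaluated at x."""
    a, b, c, d = v
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzy_equal(v1: Trap, v2: Trap) -> float:
    """(v1 ~= v2) = integral of min(mu_v1, mu_v2) / integral of max(mu_v1, mu_v2)."""
    pts = sorted(set(v1) | set(v2))
    inter = union = 0.0
    for lo, hi in zip(pts, pts[1:]):
        # Both functions are linear on (lo, hi), so their difference g is too;
        # locate its root from two interior samples and split there if needed.
        q1, q3 = lo + (hi - lo) / 4, lo + 3 * (hi - lo) / 4
        g1, g3 = mf(v1, q1) - mf(v2, q1), mf(v1, q3) - mf(v2, q3)
        cuts = [lo, hi]
        if g1 != g3:
            root = q1 + (q3 - q1) * g1 / (g1 - g3)
            if lo < root < hi:
                cuts = [lo, root, hi]
        for s, t in zip(cuts, cuts[1:]):
            m = (s + t) / 2  # min/max are linear on (s, t): midpoint rule is exact
            inter += (t - s) * min(mf(v1, m), mf(v2, m))
            union += (t - s) * max(mf(v1, m), mf(v2, m))
    if union == 0.0:  # both values are crisp points
        return 1.0 if v1 == v2 else 0.0
    return inter / union


about_32, about_34, middle_age = (30, 32, 32, 34), (32, 34, 34, 36), (30, 35, 45, 50)
print(round(fuzzy_equal(about_32, middle_age), 4))   # 0.0721
print(round(fuzzy_equal(about_32, about_34), 4))     # 0.1429
```

With the definitions of the AGE example, About 32 ~= Middle age = 8/111 ≈ 0.072 and About 32 ~= About 34 = 1/7 ≈ 0.143, so neither pair meets the 0.5 threshold of that query.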
Definition: A fuzzy equi-join of fuzzy relations R and S on attributes R.A and S.B with a threshold i ≥ 0, denoted
$$R \bowtie_{(R.A \mathrel{\tilde{=}} S.B) \ge i} S,$$
is a fuzzy relation T with the membership function
$$\mu_T(xy) = \min(\mu_R(x), \mu_S(y), \mu_q(xy)),$$
where x is a tuple in R, y is a tuple in S, and
$$\mu_q(xy) = \begin{cases} 0, & \text{if } (x[A] \mathrel{\tilde{=}} y[B]) < i,\\ x[A] \mathrel{\tilde{=}} y[B], & \text{otherwise.} \end{cases}$$
Since this fuzzy equi-join allows the threshold value to be specified, it is very flexible and can be evaluated more efficiently than existing ones.
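A minimal nested-loop sketch of this definition. To stay short it assumes interval values MF(l, l, u, u), written as (l, u), for which the fuzzy equality reduces to overlap length divided by union length. The `(id, membership, value)` tuple layout and the function names are hypothetical. Every pair of tuples is compared, which is exactly the cost a sort-merge evaluation tries to avoid.

```python
Interval = tuple[float, float]  # MF(l, l, u, u) written as (l, u)


def interval_equal(v1: Interval, v2: Interval) -> float:
    """Fuzzy equality for interval values: overlap length / union length."""
    (l1, u1), (l2, u2) = v1, v2
    overlap = max(0.0, min(u1, u2) - max(l1, l2))
    union = (u1 - l1) + (u2 - l2) - overlap
    if union == 0:  # two crisp points
        return 1.0 if v1 == v2 else 0.0
    return overlap / union


def fuzzy_equi_join(R, S, i, eq=interval_equal):
    """Nested-loop evaluation of R join_{(R.A ~= S.B) >= i} S.

    R and S are lists of (tuple_id, membership, join_value) triples.
    Returns (x_id, y_id, mu_T) for every pair with mu_T > 0.
    """
    result = []
    for x_id, mu_r, a in R:
        for y_id, mu_s, b in S:
            degree = eq(a, b)
            mu_q = degree if degree >= i else 0.0  # threshold cut
            mu_t = min(mu_r, mu_s, mu_q)
            if mu_t > 0:
                result.append((x_id, y_id, mu_t))
    return result


R = [("r", 1.0, (10, 40))]
S = [(f"s{k}", 1.0, iv) for k, iv in enumerate(
    [(5, 20), (6, 9), (10, 40), (11, 16), (15, 45), (20, 30), (20, 50), (32, 36), (35, 60)], start=1)]
print([y for _, y, _ in fuzzy_equi_join(R, S, 0.5)])  # ['s3', 's5', 's7']
print([y for _, y, _ in fuzzy_equi_join(R, S, 0.9)])  # ['s3']
```

The data here are the r and s1…s9 intervals of the FE-indicator example later in the deck; the join results agree with the values listed there.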
Compared with existing measures, the new measure seems more natural:
1. It allows the algebraic operations to be composed.
2. The degree is obtained by considering all possible values of both fuzzy data values rather than one best possible value of each, which makes it more intuitive.
3. Fuzzy data can be regarded as the subjective representation of a real-world value as seen by an observer.
4. For fuzzy data, the satisfaction degree must always be treated as uncertain.
5. For crisp data, the fuzzy equality is the same as ordinary equality; that is, it is a "hard" comparison.
AN INTERVAL-BASED FUZZY JOIN ALGORITHM
We now present a Sort-Merge Fuzzy Equi-Join (SMFEJ) algorithm whose purpose is to evaluate the fuzzy equi-join efficiently. SMFEJ assumes that the fuzzy join attributes have numeric universes and that their membership functions are defined by the generic parameterized function.
2. Joining phase. In the joining phase, each page of R is read once. For each tuple r in R, the S-tuples that may join with r lie in the range of r, rngs(r). Thus only the pages containing rngs(r) need to be read into a buffer, and only the tuples in rngs(r) need to be scanned to see whether they actually join with r. The time complexity of the algorithm is therefore O(cost(sorting) + n + m), where n and m are the sizes of R and S in pages, respectively, and cost(sorting) is the time spent sorting R and S, including both I/O and CPU time. Typically, cost(sorting) = O(n log n + m log m).
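The exact definition of rngs(r) is not preserved in this transcript, so the sketch below assumes one that reproduces the example that follows: S is sorted by (lower, upper) endpoint of each supporting interval, and rngs(r) is the contiguous run from the first S-tuple whose interval reaches r's lower end to the last S-tuple that starts at or before r's upper end. The function names and this definition are hypothetical.

```python
from bisect import bisect_right

Interval = tuple[float, float]  # supporting interval (lower, upper)


def sort_by_interval(S: list[tuple[str, Interval]]) -> list[tuple[str, Interval]]:
    """Assumed sorting phase: order S-tuples by (lower, upper) of their intervals."""
    return sorted(S, key=lambda t: t[1])


def rngs(r_iv: Interval, S_sorted: list[tuple[str, Interval]]) -> list[tuple[str, Interval]]:
    """Assumed rngs(r): contiguous run of sorted S-tuples that may overlap r.

    Tuples from `end` onward start after r's upper end, and tuples before
    `start` end before r's lower end, so neither group can join with r.
    """
    lo, hi = r_iv
    lowers = [iv[0] for _, iv in S_sorted]  # O(m) per call; fine for a sketch
    end = bisect_right(lowers, hi)
    start = next((k for k in range(end) if S_sorted[k][1][1] >= lo), end)
    return S_sorted[start:end]


S = [("s1", (5, 20)), ("s2", (6, 9)), ("s3", (10, 40)), ("s4", (11, 16)), ("s5", (15, 45)),
     ("s6", (20, 30)), ("s7", (20, 50)), ("s8", (32, 36)), ("s9", (35, 60))]
S_sorted = sort_by_interval(S)
print([sid for sid, _ in rngs((10, 40), S_sorted)])  # s1 ... s9
print([sid for sid, _ in rngs((21, 25), S_sorted)])  # s3 ... s7
```

Because the range is contiguous in sort order, it can contain tuples that cannot join: s2 lies in rngs((10, 40)) and s4 = (11, 16) lies in rngs((21, 25)) although neither overlaps the respective R-interval.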
FUZZY EQUALITY INDICATORS
We now consider how to use the SMFEJ algorithm to evaluate the fuzzy equi-join efficiently. For practical reasons, we assume that only limited buffer space is available to the algorithm. Thus, during the joining phase, some pages in rngs(r) may have to be swapped out of the buffer to make room for other pages, and then swapped back in because they are also in the range of the next R-tuple. In this case, the key to efficient evaluation of the fuzzy equi-join is to determine appropriate intervals to associate with the fuzzy attribute values.
Example: assume that R has a tuple r with r[A] = MF(10, 10, 40, 40), and S contains exactly the tuples s1, …, s9 with s1[B] = MF(5, 5, 20, 20), s2[B] = MF(6, 6, 9, 9), s3[B] = MF(10, 10, 40, 40), s4[B] = MF(11, 11, 16, 16), s5[B] = MF(15, 15, 45, 45), s6[B] = MF(20, 20, 30, 30), s7[B] = MF(20, 20, 50, 50), s8[B] = MF(32, 32, 36, 36), and s9[B] = MF(35, 35, 60, 60). Thus rngs(r) is [s1, …, s9].
With a little calculation, we obtain the values below. Because all these values are intervals, the measure reduces to overlap length divided by union length; for s5, for instance, the overlap [15, 40] has length 25 and the union [10, 45] has length 35, giving 25/35 ≈ 0.71. If the join condition is (R.A ~= S.B) ≥ 0.5, only s3, s5, and s7 join with r. If the threshold is raised from 0.5 to 0.9, only s3 joins with r. In both cases, however, all S-tuples must be scanned.
Tuple   r[A] ~= si[B]
s1      0.29
s2      0
s3      1
s4      0.17
s5      0.71
s6      0.33
s7      0.5
s8      0.13
s9      0.1
Since every tuple in rngs(r) must be scanned during the join, efficiency can be improved by moving as many irrelevant tuples out of rngs(r) as possible. This can be achieved if the intervals assigned to attribute values are an appropriate function of the threshold value, so that sorting rearranges the tuples accordingly.
Intuitively, if f is an FE indicator over the domain of the join attributes of a fuzzy equi-join, assigning intervals to join attribute values using f guarantees that, after sorting according to the partial order of intervals, every relevant S-tuple lies in rngs(r) for every tuple r in R. However, it does not guarantee that every tuple in rngs(r) joins with r unless f is a perfect FE indicator. If both f and g are FE indicators over the domains of the join attributes and f is stronger than g, f assigns smaller intervals to values than g does, and thus may move more irrelevant tuples out of rngs(r) for every r.
EXPERIMENT RESULTS
We study the performance of algorithm SMFEJ using various types of data and FE indicators. The study is based on a simulation of SMFEJ on synthetic data; the experiments were performed on a Sun SPARCstation 5. Performance is measured by:
1. The number of pages read from the inner relation, as the I/O cost.
2. The number of comparisons made, as the CPU cost. For each pair of R- and S-tuples, if their join attribute values overlap, two comparisons are recorded: one to determine that they overlap and another to determine whether they actually join. If the two values do not overlap, one comparison is recorded.
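The CPU cost model above can be stated as a short counting function. This sketch assumes that "overlap" is tested on supporting intervals treated as closed intervals; the function names are hypothetical.

```python
Interval = tuple[float, float]


def overlaps(x: Interval, y: Interval) -> bool:
    """Closed-interval overlap test on supporting intervals (assumption)."""
    return max(x[0], y[0]) <= min(x[1], y[1])


def cpu_comparisons(r_iv: Interval, scanned: list[Interval]) -> int:
    """Comparisons recorded for one R-tuple: 2 per overlapping S-value
    (overlap test + join test), 1 per non-overlapping one."""
    return sum(2 if overlaps(r_iv, s_iv) else 1 for s_iv in scanned)


S_ivs = [(5, 20), (6, 9), (10, 40), (11, 16), (15, 45), (20, 30), (20, 50), (32, 36), (35, 60)]
print(cpu_comparisons((10, 40), S_ivs))  # 17
```

For r[A] = MF(10, 10, 40, 40) scanned against s1…s9 from the earlier example, eight supporting intervals overlap and only s2 does not, so 2·8 + 1 = 17 comparisons are recorded.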
Experiment Result
SMFEJ is implemented to take advantage of page buffers. For each page of relation R, one page of relation S is read at a time, and all join results obtainable from the two pages are produced before the next page of S is read. It is straightforward to see that a larger buffer space will:
1. Reduce the I/O cost.
2. Save more CPU cost than I/O cost.
CONCLUSION
In this paper, we:
1. Propose a new fuzzy equality comparison operator with a measure that combines the possibility measure with the similarity measure.
2. Define a type of fuzzy equi-join based on the new fuzzy equality comparison operator, which allows threshold values to be associated with individual predicates of the join condition.
3. Use a sort-merge join algorithm based on a partial order of intervals to evaluate the fuzzy equi-join.
4. Define FE indicators, which determine appropriate intervals for fuzzy data, and identify them for data sets with different characteristics.
5. Report experimental results from a preliminary simulation of the algorithm that show a significant improvement in efficiency when FE indicators are used in conjunction with the sort-merge join algorithm.
Directions for future work:
1. Studying other types of data correlations.
2. Finding efficient join algorithms that apply to both numeric and discrete attributes, which is an important issue.
3. Finding new types of fast access paths that handle both crisp and fuzzy data efficiently, which is a challenging task.

Weitere ähnliche Inhalte

Was ist angesagt?

When to use a structure vs classes in c++
When to use a structure vs classes in c++When to use a structure vs classes in c++
When to use a structure vs classes in c++Naman Kumar
 
Dbms important questions and answers
Dbms important questions and answersDbms important questions and answers
Dbms important questions and answersLakshmiSarvani6
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R NotesLakshmiSarvani6
 
Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)smumbahelp
 
Relational database
Relational  databaseRelational  database
Relational databaseamkrisha
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern MatchingIOSR Journals
 
Lecture18 structurein c.ppt
Lecture18 structurein c.pptLecture18 structurein c.ppt
Lecture18 structurein c.ppteShikshak
 
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...cscpconf
 
Chapter01 introductory handbook
Chapter01 introductory handbookChapter01 introductory handbook
Chapter01 introductory handbookRaman Kannan
 
Sv data types and sv interface usage in uvm
Sv data types and sv interface usage in uvmSv data types and sv interface usage in uvm
Sv data types and sv interface usage in uvmHARINATH REDDY
 
Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...Raja Ram
 
Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...
Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...
Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...Jinho Choi
 
Fundamentals of Database Systems Questions and Answers
Fundamentals of Database Systems Questions and AnswersFundamentals of Database Systems Questions and Answers
Fundamentals of Database Systems Questions and AnswersAbdul Rahman Sherzad
 

Was ist angesagt? (19)

When to use a structure vs classes in c++
When to use a structure vs classes in c++When to use a structure vs classes in c++
When to use a structure vs classes in c++
 
Dbms important questions and answers
Dbms important questions and answersDbms important questions and answers
Dbms important questions and answers
 
Lesson 2.2 abstraction
Lesson 2.2   abstractionLesson 2.2   abstraction
Lesson 2.2 abstraction
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R Notes
 
C# program structure
C# program structureC# program structure
C# program structure
 
Structures in c++
Structures in c++Structures in c++
Structures in c++
 
Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)
 
Relational database
Relational  databaseRelational  database
Relational database
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
 
Lecture18 structurein c.ppt
Lecture18 structurein c.pptLecture18 structurein c.ppt
Lecture18 structurein c.ppt
 
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...
 
Chapter01 introductory handbook
Chapter01 introductory handbookChapter01 introductory handbook
Chapter01 introductory handbook
 
Relational model
Relational modelRelational model
Relational model
 
C structure and union
C structure and unionC structure and union
C structure and union
 
Sv data types and sv interface usage in uvm
Sv data types and sv interface usage in uvmSv data types and sv interface usage in uvm
Sv data types and sv interface usage in uvm
 
Chapter 3 Entity Relationship Model
Chapter 3 Entity Relationship ModelChapter 3 Entity Relationship Model
Chapter 3 Entity Relationship Model
 
Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...
 
Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...
Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...
Robust Coreference Resolution and Entity Linking on Dialogues: Character Iden...
 
Fundamentals of Database Systems Questions and Answers
Fundamentals of Database Systems Questions and AnswersFundamentals of Database Systems Questions and Answers
Fundamentals of Database Systems Questions and Answers
 

Ähnlich wie An Efficient Evaluation of a Fuzzy Equi-Join

FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATION
FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATIONFUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATION
FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATIONijdms
 
MemFunc.doc
MemFunc.docMemFunc.doc
MemFunc.docbutest
 
Application for Logical Expression Processing
Application for Logical Expression Processing Application for Logical Expression Processing
Application for Logical Expression Processing csandit
 
A Framework for Computing Linguistic Hedges in Fuzzy Queries
A Framework for Computing Linguistic Hedges in Fuzzy QueriesA Framework for Computing Linguistic Hedges in Fuzzy Queries
A Framework for Computing Linguistic Hedges in Fuzzy Queriesijdms
 
A fuzzy frequent pattern growth
A fuzzy frequent pattern growthA fuzzy frequent pattern growth
A fuzzy frequent pattern growthIJDKP
 
Emerging Approach to Computing Techniques.pptx
Emerging Approach to Computing Techniques.pptxEmerging Approach to Computing Techniques.pptx
Emerging Approach to Computing Techniques.pptxPoonamKumarSharma
 
Trajectory Data Fuzzy Modeling : Ambulances Management Use Case
Trajectory Data Fuzzy Modeling : Ambulances Management Use CaseTrajectory Data Fuzzy Modeling : Ambulances Management Use Case
Trajectory Data Fuzzy Modeling : Ambulances Management Use Caseijdms
 
An incremental approach to attribute reduction of dynamic set-valued informat...
An incremental approach to attribute reduction of dynamic set-valued informat...An incremental approach to attribute reduction of dynamic set-valued informat...
An incremental approach to attribute reduction of dynamic set-valued informat...Guangming Lang
 
Fuzzy logic and fuzzy time series edited
Fuzzy logic and fuzzy time series   editedFuzzy logic and fuzzy time series   edited
Fuzzy logic and fuzzy time series editedProf Dr S.M.Aqil Burney
 
Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...eSAT Journals
 
Citython presentation
Citython presentationCitython presentation
Citython presentationAnkit Tewari
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkshesnasuneer
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkshesnasuneer
 
Literature Review on Vague Set Theory in Different Domains
Literature Review on Vague Set Theory in Different DomainsLiterature Review on Vague Set Theory in Different Domains
Literature Review on Vague Set Theory in Different Domainsrahulmonikasharma
 
Ijarcet vol-2-issue-4-1363-1367
Ijarcet vol-2-issue-4-1363-1367Ijarcet vol-2-issue-4-1363-1367
Ijarcet vol-2-issue-4-1363-1367Editor IJARCET
 

Ähnlich wie An Efficient Evaluation of a Fuzzy Equi-Join (20)

FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATION
FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATIONFUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATION
FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATION
 
MemFunc.doc
MemFunc.docMemFunc.doc
MemFunc.doc
 
Fb35884889
Fb35884889Fb35884889
Fb35884889
 
Fuzzy set
Fuzzy set Fuzzy set
Fuzzy set
 
Fuzzy logic1
Fuzzy logic1Fuzzy logic1
Fuzzy logic1
 
FuzzySet.pptx
FuzzySet.pptxFuzzySet.pptx
FuzzySet.pptx
 
Fuzzy logic
Fuzzy logicFuzzy logic
Fuzzy logic
 
Application for Logical Expression Processing
Application for Logical Expression Processing Application for Logical Expression Processing
Application for Logical Expression Processing
 
A Framework for Computing Linguistic Hedges in Fuzzy Queries
A Framework for Computing Linguistic Hedges in Fuzzy QueriesA Framework for Computing Linguistic Hedges in Fuzzy Queries
A Framework for Computing Linguistic Hedges in Fuzzy Queries
 
A fuzzy frequent pattern growth
A fuzzy frequent pattern growthA fuzzy frequent pattern growth
A fuzzy frequent pattern growth
 
Emerging Approach to Computing Techniques.pptx
Emerging Approach to Computing Techniques.pptxEmerging Approach to Computing Techniques.pptx
Emerging Approach to Computing Techniques.pptx
 
Trajectory Data Fuzzy Modeling : Ambulances Management Use Case
Trajectory Data Fuzzy Modeling : Ambulances Management Use CaseTrajectory Data Fuzzy Modeling : Ambulances Management Use Case
Trajectory Data Fuzzy Modeling : Ambulances Management Use Case
 
An incremental approach to attribute reduction of dynamic set-valued informat...
An incremental approach to attribute reduction of dynamic set-valued informat...An incremental approach to attribute reduction of dynamic set-valued informat...
An incremental approach to attribute reduction of dynamic set-valued informat...
 
Fuzzy logic and fuzzy time series edited
Fuzzy logic and fuzzy time series   editedFuzzy logic and fuzzy time series   edited
Fuzzy logic and fuzzy time series edited
 
Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
Literature Review on Vague Set Theory in Different Domains
Literature Review on Vague Set Theory in Different DomainsLiterature Review on Vague Set Theory in Different Domains
Literature Review on Vague Set Theory in Different Domains
 
Ijarcet vol-2-issue-4-1363-1367
Ijarcet vol-2-issue-4-1363-1367Ijarcet vol-2-issue-4-1363-1367
Ijarcet vol-2-issue-4-1363-1367
 

An Efficient Evaluation of a Fuzzy Equi-Join

  • 1. The Fuzzy Logical Databases and An Efficient Evaluation of a Fuzzy Equi-Join Supervised by : Dr. Bassam Hammo Presented by : Alaa AlZoubi
  • 2. OutLines 1 . Introduction to Fuzzy concepts in database. 2. Interpretation of Fuzzy Terms. 3. We propose a new measure for a fuzzy equality. 4. We define a new type of fuzzy equi-join that is based on the new fuzzy equality. 5. A sort-merge join algorithm based on a partial order of intervals is used to evaluate the fuzzy equi-join 6. Experiment results to show a Significant improvement of efficiency when FE indicators are used with the Sort-merge join algorithm
  • 3. ABSTRACT In many real world Applications Such as business decision making, medical diagnosis, and criminal justice, have to deal with information that is uncertain or imprecise . Classical database models often suffer from their incapability of representing and manipulating imprecise and uncertain information. Example : The age of Tom is “About 32” . knowledge - base and database systems should directly support such applications by providing functionalities to store and to manipulate ill - known data .
  • 4.
  • 5. These models can be classified into two categories: 1-similarity-based In a similarity - based model, some similarity relationships are specified for some attributes so that values of these attributes may be grouped into similarity classes. 2-possibility-based In a possibility - based model, an ill - known data is represented by a possibility distribution which describes the possibility for each crisp attribute value to be the actual value of the data.
  • 6. Among the algebraic operations, fuzzy join is an important and expensive one, and its efficient evaluation is more difficult than that of an ordinary join . There are two reasons for the difficulty: 1- diverse semantics : In a fuzzy join, two tuples may join even if they do not completely satisfy the join condition. The extent to which they do satisfy the join condition is usually Represented by some satisfaction degrees 2-lack of fast access paths :most efficient join algorithms such as indexing and hashing used in ordinary databases relational do not apply directly to fuzzy relational databases.
  • 8. Introducing fuzziness. The table is a relation, the columns are attributes, and the rows are tuples. Each attribute has a domain; for example, the domain of the attribute JobType might be the set {Academic, Industry, Government}.
    FirstName | SecondName | JobType | Expertise
    Smith | Tom | Academic | AI
    Alan | Smith | Industry | Expert Systems
    Bill | Alin | Government | Statistics
    Cindy | Smith | Government | Robotics
  • 9. Imprecision in attribute values can be introduced using a similarity matrix, e.g. for the attribute Expertise. Matrices of this type can be used to determine the matching degree between an applicant and a job opening:
                   | AI  | Expert Systems | Statistics | Robotics
    AI             | 1.0 | 0.9            | 0.2        | 0.6
    Expert Systems | 0.9 | 1.0            | 0.2        | 0.6
    Statistics     | 0.2 | 0.2            | 1.0        | 0.2
    Robotics       | 0.6 | 0.6            | 0.2        | 1.0
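As a minimal sketch (the slides contain no code, so Python and the names `SIMILARITY` and `matching_degree` are our own), the similarity matrix can be stored as a nested dictionary and looked up to rank expertise values against the expertise a job opening asks for:

```python
# Similarity matrix for the Expertise attribute, copied from the slide.
SIMILARITY = {
    "AI": {"AI": 1.0, "Expert Systems": 0.9, "Statistics": 0.2, "Robotics": 0.6},
    "Expert Systems": {"AI": 0.9, "Expert Systems": 1.0, "Statistics": 0.2, "Robotics": 0.6},
    "Statistics": {"AI": 0.2, "Expert Systems": 0.2, "Statistics": 1.0, "Robotics": 0.2},
    "Robotics": {"AI": 0.6, "Expert Systems": 0.6, "Statistics": 0.2, "Robotics": 1.0},
}


def matching_degree(applicant_expertise: str, required_expertise: str) -> float:
    """Degree to which an applicant's expertise matches the expertise a job opening asks for."""
    return SIMILARITY[applicant_expertise][required_expertise]


# Rank every expertise value against a (hypothetical) opening that asks for "AI".
ranking = sorted(SIMILARITY, key=lambda e: matching_degree(e, "AI"), reverse=True)
```

Because the matrix is symmetric, the argument order of `matching_degree` does not matter here; an asymmetric matrix would need a convention for which side is the applicant.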
  • 10. Imprecision in attribute values can also be introduced using a linguistic variable. For example, the attribute Height could be stored as one of the values (short, average, tall), each of which could be modeled as a fuzzy set. Another way to introduce imprecision is to permit partial membership of a tuple in a relation; the degree to which a tuple is a member is stored as a special attribute. Tuples of this type are sometimes called weighted tuples. The following relation (table) stores information about endangered species:
    Name | Habitat | … | Membership
    Field Mouse | grassland | … | 1.0
    Wren | rain forest | … | 0.4
    Black Duck | coastal | … | 0.4
  • 11. An example: with conventional (Boolean) logic we might call a person "TALL" if their height is 7 feet and "NOT TALL" if it is 5 feet; each person is either TALL (1) or NOT TALL (0). Fuzzy sets can instead express a degree of membership: if S is the set of all people in the universe, each person in S is assigned a degree of membership in the subset TALL, based on the person's height in feet:
    $$\mathrm{TALL}(x) = \begin{cases} 0, & \text{if } \mathrm{Height}(x) < 5 \\ (\mathrm{Height}(x) - 5)/2, & \text{if } 5 \le \mathrm{Height}(x) < 7 \\ 1, & \text{if } \mathrm{Height}(x) \ge 7 \end{cases}$$
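The TALL membership function is a direct piecewise definition. A small Python sketch (the function name `tall` is our own) shows how it grades the heights between 5 and 7 feet that Boolean logic would force to 0 or 1:

```python
def tall(height_ft: float) -> float:
    """Membership degree in the fuzzy set TALL for a height given in feet."""
    if height_ft < 5:
        return 0.0
    if height_ft < 7:
        return (height_ft - 5) / 2  # linear ramp from 0 at 5 ft to 1 at 7 ft
    return 1.0


# Degrees for a few sample heights.
degrees = {h: tall(h) for h in (4.5, 5.0, 6.0, 6.5, 7.0)}
```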
  • 13. Imprecise queries: a user may pose an imprecise query to a database. This can be due to the use of:
    1. Imprecise conditions: "Find all taxpayers who were audited in 2008 and whose income is low."
    2. Imprecise operators: "Find all countries whose export revenue is about the same as their import revenue."
    3. Imprecise quantifiers: "Find the companies whose customers are mostly from government agencies."
  • 14. Interpretation of Fuzzy Terms. A fuzzy datum has an uncertain or imprecise value. We associate each fuzzy datum v with a fuzzy term and a membership function (of a fuzzy set). The membership function, denoted µv, maps each crisp value x in the universe of v to a membership degree µv(x) in [0, 1] that indicates the possibility that v = x. A membership function can be defined in a number of ways. Over a numerical universe, a membership function is typically convex (with a convex curve) and normal (at least one member has degree 1). We consider membership functions of trapezoidal shape and denote them by MF(a, b, c, d), where the parameters mark the endpoints of the shape.
  • 15. If a value v has a membership function defined by MF(a, b, c, d), the interval [a, d] is called the supporting interval of v. As special cases, MF(l, l, u, u) defines an interval [l, u], MF(v, v, v, v) defines a crisp value v, and MF(a, b, b, d) is a triangular function. (Over a categorical universe, a membership function is written µv = x1/m1 + x2/m2 + ⋯ + xk/mk, where xi is a value in the universe and mi is the membership degree of xi; the membership function of a single crisp value v is µv = v/1.) For example, the membership function defining the fuzzy term F2 in the figure is denoted MF(20, 30, 40, 50).
  • 16. FUZZY RELATIONS In this section, we briefly describe the representation of data in a fuzzy relational database. A datum is crisp if it is certain and precise, and fuzzy otherwise. A fuzzy (sub)set F of an ordinary set U is characterized by a membership function µF: U → [0, 1]. The idea of fuzzy sets: fuzzy sets are functions that map a value, which might be a member of a set, to a number between zero and one indicating its actual degree of membership. A degree of zero means that the value is not in the set, and a degree of one means that the value is completely representative of the set.
  • 17. For every crisp value x ∈ U, µF(x) is the membership degree of x with respect to (wrt) F: µF(x) = 1 if x is a full member, 0 < µF(x) < 1 if x is a partial member, and µF(x) = 0 if x is not a member of F. Without loss of generality, x is in F only if µF(x) > 0. A fuzzy datum v is represented by a possibility distribution restricted by a fuzzy set F, in the sense that v is a member of F and the possibility that v equals a member x of F is exactly µF(x).
  • 18. A membership function can be defined in a number of ways. Over a numerical universe, a membership function is typically convex (with a convex curve) and normal (at least one member has degree 1). Such membership functions can be defined by the following generic parameterized function:
    $$\mathrm{MF}(a,b,c,d)(x) = \begin{cases} 0, & \text{if } x \le a < b \text{ or } x < a = b \\ \dfrac{x-a}{b-a}, & \text{if } a < x < b \\ 1, & \text{if } b \le x \le c \\ \dfrac{d-x}{d-c}, & \text{if } c < x < d \\ 0, & \text{if } c < d \le x \text{ or } c = d < x \end{cases}$$
  • 19. Here the parameters a, b, c, and d are values in the universe satisfying a ≤ b ≤ c ≤ d. In general, the curve of the generic function is a trapezoid, as shown in the following figure, but it can also take other shapes. For example, MF(a, b, b, d) defines a triangular function, since the second and third parameters are the same.
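The generic parameterized function translates directly into code. The following Python sketch (the slides contain no code, so the language and the names `mf`, `middle_age`, `about_32`, `interval` and `crisp_20` are our own) returns a closure for MF(a, b, c, d), follows the boundary cases of the piecewise definition, and builds the special cases mentioned earlier: a trapezoid, a triangle, an interval and a crisp value.

```python
def mf(a: float, b: float, c: float, d: float):
    """Return the membership function MF(a, b, c, d); requires a <= b <= c <= d."""
    if not a <= b <= c <= d:
        raise ValueError("MF parameters must satisfy a <= b <= c <= d")

    def mu(x: float) -> float:
        if b <= x <= c:      # core: full membership
            return 1.0
        if a < x < b:        # rising edge (empty when a == b)
            return (x - a) / (b - a)
        if c < x < d:        # falling edge (empty when c == d)
            return (d - x) / (d - c)
        return 0.0           # outside the supporting interval [a, d]
    return mu


middle_age = mf(30, 35, 45, 50)   # trapezoid
about_32 = mf(30, 32, 32, 34)     # triangle: MF(a, b, b, d)
interval = mf(10, 10, 40, 40)     # interval [10, 40]: MF(l, l, u, u)
crisp_20 = mf(20, 20, 20, 20)     # crisp value 20: MF(v, v, v, v)
```

Checking the core first makes the degenerate cases fall out naturally: with a = b the rising edge is empty, so the value jumps from 0 to 1 at a, which is exactly what the interval and crisp cases need.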
  • 20. Over a nonnumerical universe, a membership function takes the form µF = x1/m1 + x2/m2 + … + xk/mk, where xi is a value in the universe and mi is the membership degree of xi with respect to F. In this case, the degenerate membership function of a crisp value v is µv = v/1. The universe of an attribute A, denoted U(A), is the set of crisp values that may appear in the attribute. The domain of an attribute A, denoted D(A), is the set of all (crisp and fuzzy) values defined over U(A). A fuzzy relation R with schema (A1, A2, …, An) is a fuzzy set of tuples in D(A1) × … × D(An).
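Over a nonnumerical universe, the notation x1/m1 + … + xk/mk maps naturally onto a dictionary from values to degrees. A minimal Python sketch, in which the names `CategoricalMF`, `crisp`, `degree` and the fuzzy JobType value are hypothetical illustrations:

```python
# A categorical membership function x1/m1 + ... + xk/mk, stored as {xi: mi}.
CategoricalMF = dict[str, float]


def crisp(value: str) -> CategoricalMF:
    """Degenerate membership function v/1 of a crisp categorical value."""
    return {value: 1.0}


def degree(mu: CategoricalMF, x: str) -> float:
    """Membership degree of x; values that are not listed have degree 0."""
    return mu.get(x, 0.0)


# Hypothetical fuzzy JobType value over {Academic, Industry, Government}:
# certainly Academic, possibly Government.
job_type = {"Academic": 1.0, "Government": 0.4}
industry = crisp("Industry")
```

Omitting zero-degree values keeps the representation sparse, matching the convention that x is in F only if µF(x) > 0.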
  • 22. A FUZZY EQUI-JOIN In this section, we first define a fuzzy equality and then use it to define a fuzzy equi-join. The following example shows the need for a fuzzy equi-join. Example: consider the relation R shown in the following table and the query "Find all pairs of persons from R whose ages are equal to a degree no less than 0.5".
    NAME | OCCUPATION | AGE
    Smith | Engineer | 20
    Alan | DBA | 31
    Bill | Teacher | About 32
    Cindy | Lawyer | About 34
    Mike | Teacher | Middle age
    Tom | Farmer | 58
  • 23. Solution: a join of R with itself on the AGE attribute using a fuzzy equality comparison. Since AGE contains fuzzy values, we must determine the degree to which two fuzzy ages, say About 32 and Middle age, are equal (that is, satisfy the join condition AGE = AGE), where About 32 = MF(30, 32, 32, 34), About 34 = MF(32, 34, 34, 36), and Middle age = MF(30, 35, 45, 50). The example makes clear that computing the satisfaction degree of the fuzzy equality comparison is the key to the meaning of the fuzzy equi-join.
  • 24. In the following, we propose a new measure for the fuzzy equality comparison based on the similarity of fuzzy values. Definition: Let D be a set of values. The fuzzy equality on D is a mapping ~= : D × D → [0, 1] that, for every pair of values v1 = MF(a1, b1, c1, d1) and v2 = MF(a2, b2, c2, d2) in D, gives
    $$(v_1 \sim= v_2) = \frac{\int \min(\mu_{v_1}(x), \mu_{v_2}(x))\,dx}{\int \max(\mu_{v_1}(x), \mu_{v_2}(x))\,dx}$$
    where ∫ is taken over the universe on which the membership functions are defined and is interpreted as a summation if the universe is discrete. Intuitively, ∫min(μv1(x), μv2(x))dx is the accumulated membership degree of the intersection, and ∫max(μv1(x), μv2(x))dx is that of the union of the two fuzzy sets defining v1 and v2.
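For numeric universes the ratio can be computed exactly, because MF(a, b, c, d) is piecewise linear. The Python sketch below is our own evaluation strategy, not one given in the slides: it splits the axis at all eight parameters and at any point where the two lines cross, after which min and max are linear on every piece and the midpoint rule integrates them exactly. The names `mf` and `fuzzy_equal` are hypothetical, and returning ordinary equality when both values are crisp points (where both integrals are 0) is an assumption chosen to match the slides' remark about crisp data.

```python
def mf(a, b, c, d):
    """Membership function MF(a, b, c, d) with a <= b <= c <= d (generic trapezoid)."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def fuzzy_equal(v1, v2):
    """(v1 ~= v2) = integral of min(mu_v1, mu_v2) / integral of max(mu_v1, mu_v2).

    v1 and v2 are parameter tuples (a, b, c, d) over a numeric universe.
    """
    mu1, mu2 = mf(*v1), mf(*v2)
    points = sorted(set(v1) | set(v2))
    inter = union = 0.0
    for lo, hi in zip(points, points[1:]):
        # Two interior samples determine the linear difference mu1 - mu2 on (lo, hi).
        x1, x2 = lo + (hi - lo) / 4, lo + 3 * (hi - lo) / 4
        g1, g2 = mu1(x1) - mu2(x1), mu1(x2) - mu2(x2)
        cuts = [lo, hi]
        if g1 != g2:
            cross = x1 + g1 / (g1 - g2) * (x2 - x1)  # zero of the difference
            if lo < cross < hi:
                cuts = [lo, cross, hi]
        for u, w in zip(cuts, cuts[1:]):
            m = (u + w) / 2  # midpoint rule is exact for linear pieces
            inter += (w - u) * min(mu1(m), mu2(m))
            union += (w - u) * max(mu1(m), mu2(m))
    if union == 0.0:
        # Both values are crisp points (zero-width support): assume ordinary equality.
        return 1.0 if tuple(v1) == tuple(v2) else 0.0
    return inter / union


about_32, about_34, middle_age = (30, 32, 32, 34), (32, 34, 34, 36), (30, 35, 45, 50)
```

For the ages in the example, About 32 ~= About 34 works out to 0.5 / 3.5 = 1/7, and About 32 ~= Middle age to (8/7) / (111/7) = 8/111. Both are well below the 0.5 degree the query asks for.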
  • 25. Definition: A fuzzy equi-join of fuzzy relations R and S on attributes R.A and S.B with a threshold i ≥ 0, denoted $R \bowtie_{(R.A \sim= S.B) \ge i} S$, is a fuzzy relation T with the membership function
    $$\mu_T(xy) = \min(\mu_R(x), \mu_S(y), \mu_q(xy)), \qquad \mu_q(xy) = \begin{cases} 0, & \text{if } (x[A] \sim= y[B]) < i \\ x[A] \sim= y[B], & \text{otherwise} \end{cases}$$
    where x is a tuple in R and y is a tuple in S. Since this fuzzy equi-join allows a threshold value to be specified, it is very flexible and can be evaluated more efficiently than existing ones.
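A direct, nested-loop reading of the definition helps pin down the semantics before any efficient algorithm is considered. This Python sketch is our own illustration: to stay self-contained it plugs in the fuzzy equality restricted to interval values MF(l, l, u, u), where the ratio reduces to intersection length over union length, and the names `interval_equal` and `fuzzy_equi_join` are hypothetical. Any other equality function can be passed as `eq`.

```python
Interval = tuple[float, float]  # an interval value [l, u], i.e. MF(l, l, u, u)


def interval_equal(v1: Interval, v2: Interval) -> float:
    """Fuzzy equality restricted to interval values: |v1 ∩ v2| / |v1 ∪ v2|."""
    (l1, u1), (l2, u2) = v1, v2
    overlap = max(0.0, min(u1, u2) - max(l1, l2))
    union = (u1 - l1) + (u2 - l2) - overlap
    if union == 0:
        return 1.0 if v1 == v2 else 0.0  # two crisp points
    return overlap / union


def fuzzy_equi_join(R, S, threshold, eq=interval_equal):
    """Nested-loop evaluation of R ⋈_{(R.A ~= S.B) >= threshold} S.

    R and S are lists of (tuple, membership degree) pairs; each tuple is a dict holding
    the join attribute under "A" (in R) or "B" (in S). Returns (x, y, mu_T(xy)) for
    every pair whose membership in the result T is greater than 0.
    """
    result = []
    for x, mu_x in R:
        for y, mu_y in S:
            e = eq(x["A"], y["B"])
            mu_q = e if e >= threshold else 0.0  # predicate degree after the threshold
            mu_t = min(mu_x, mu_y, mu_q)
            if mu_t > 0:
                result.append((x, y, mu_t))
    return result


# Data from the slides' example: r[A] = [10, 40] against s1..s9 (all full members).
R = [({"name": "r", "A": (10, 40)}, 1.0)]
S = [({"name": f"s{k}", "B": iv}, 1.0) for k, iv in enumerate(
    [(5, 20), (6, 9), (10, 40), (11, 16), (15, 45), (20, 30), (20, 50), (32, 36), (35, 60)],
    start=1)]
joined_05 = [y["name"] for _, y, _ in fuzzy_equi_join(R, S, 0.5)]
joined_09 = [y["name"] for _, y, _ in fuzzy_equi_join(R, S, 0.9)]
```

This reproduces the slides' outcome (s3, s5 and s7 at threshold 0.5; only s3 at 0.9), but it compares every pair, which is exactly the cost the sort-merge algorithm is meant to avoid.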
  • 26. Compared with existing measures, the new measure seems more natural:
    1. It allows the algebraic operations to be composed.
    2. The degree is obtained by considering all possible values of both fuzzy data rather than one best possible value of each, so it is more intuitive.
    3. Fuzzy data can be regarded as the subjective representation of real-world data viewed by an observer.
    4. For fuzzy data, the satisfaction degree must always be treated as uncertain.
    5. For crisp data, the fuzzy equality is the same as ordinary equality; that is, it is a "hard" comparison.
  • 27. AN INTERVAL-BASED FUZZY JOIN ALGORITHM We now present a Sort-Merge Fuzzy Equi-Join (SMFEJ) algorithm for evaluating the fuzzy equi-join efficiently. The SMFEJ algorithm assumes that the fuzzy join attributes have numeric universes and that membership functions are defined by the generic parameterized function.
  • 29. 2. Joining phase. In the joining phase, each page of R is read once. For each tuple r in R, the S-tuples that may join with r lie in the range of r, rngs(r), as defined below. Thus only the pages containing rngs(r) need to be read into a buffer, and only the tuples in rngs(r) need to be scanned to see whether they actually join with r. The time complexity of the algorithm is therefore O(cost(sorting) + n + m), where n and m are the sizes of R and S in pages, and cost(sorting) is the time spent sorting R and S, including both I/O and CPU time. Typically cost(sorting) = n log n + m log m.
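The joining phase can be approximated with a sweep over both relations sorted by interval. The Python sketch below is our own simplification, not the slides' SMFEJ: the exact partial order and the definition of rngs(r) are not reproduced in this transcript, so we assume interval values (l, u), sort by lower endpoint and keep a window of S-tuples that can still overlap the current R-tuple as a stand-in for rngs(r). Comparisons are counted as in the experiments (one overlap test per scanned pair, plus one join test when the values overlap). All names are hypothetical.

```python
def interval_equal(v1, v2):
    """Fuzzy equality restricted to interval values (l, u): |v1 ∩ v2| / |v1 ∪ v2|."""
    (l1, u1), (l2, u2) = v1, v2
    overlap = max(0.0, min(u1, u2) - max(l1, l2))
    union = (u1 - l1) + (u2 - l2) - overlap
    if union == 0:
        return 1.0 if v1 == v2 else 0.0
    return overlap / union


def smfej_sketch(R, S, threshold):
    """Sort-merge style fuzzy equi-join of interval-valued relations.

    R and S are lists of (name, (l, u)) pairs. Returns (joined name pairs, comparisons).
    """
    # Sorting phase (assumed order): by lower endpoint, then upper endpoint.
    r_sorted = sorted(R, key=lambda t: t[1])
    s_sorted = sorted(S, key=lambda t: t[1])
    window, j = [], 0  # the window stands in for rngs(r)
    joined, comparisons = [], 0
    for r_name, (rl, ru) in r_sorted:
        # Admit S-tuples whose interval starts no later than r's interval ends.
        while j < len(s_sorted) and s_sorted[j][1][0] <= ru:
            window.append(s_sorted[j])
            j += 1
        # Retire S-tuples that end before r starts: later R-tuples start even later.
        window = [s for s in window if s[1][1] >= rl]
        for s_name, (sl, su) in window:
            comparisons += 1  # overlap test
            if sl <= ru and su >= rl:
                comparisons += 1  # join test
                e = interval_equal((rl, ru), (sl, su))
                if e > 0 and e >= threshold:
                    joined.append((r_name, s_name))
    return joined, comparisons


# Example from the slides: r[A] = [10, 40] and s1..s9 as intervals.
S_EXAMPLE = [("s1", (5, 20)), ("s2", (6, 9)), ("s3", (10, 40)), ("s4", (11, 16)),
             ("s5", (15, 45)), ("s6", (20, 30)), ("s7", (20, 50)), ("s8", (32, 36)),
             ("s9", (35, 60))]
pairs, comparisons = smfej_sketch([("r", (10, 40))], S_EXAMPLE, 0.5)
```

In the example only s2 falls outside the window, so eight S-tuples are still scanned to find three matches. The next slides address this by choosing intervals that move more irrelevant tuples out of rngs(r).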
  • 30. FUZZY EQUALITY INDICATORS We now consider how to use the SMFEJ algorithm to evaluate the fuzzy equi-join efficiently. For practical reasons, we assume that only limited buffer space is available to the algorithm. Thus, during the joining phase, some pages in rngs(r) for some tuple r may have to be swapped out of the buffer to make room for other pages, and then swapped back in because they are also in the range of the next R-tuple. In this case, the key to evaluating the fuzzy equi-join efficiently is to determine appropriate intervals to associate with the fuzzy attribute values.
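The swapping effect can be seen by counting page reads for a sequence of overlapping ranges under a small buffer. The slides do not name a replacement policy, so this Python sketch assumes LRU; the page ranges and the name `count_page_reads` are hypothetical.

```python
from collections import OrderedDict


def count_page_reads(page_ranges, buffer_pages):
    """Count S-page reads for a sequence of R-tuples under an LRU buffer.

    page_ranges[k] = (first, last) is the run of S pages holding rngs(r) for the
    k-th R-tuple; buffer_pages is the number of S pages the buffer can hold.
    """
    buffer = OrderedDict()  # page number -> None, least recently used first
    reads = 0
    for first, last in page_ranges:
        for page in range(first, last + 1):
            if page in buffer:
                buffer.move_to_end(page)  # hit: mark as most recently used
            else:
                reads += 1  # miss: the page is read (possibly again)
                buffer[page] = None
                if len(buffer) > buffer_pages:
                    buffer.popitem(last=False)  # evict the least recently used page
    return reads


# Hypothetical ranges for three consecutive R-tuples whose ranges overlap.
ranges = [(0, 2), (0, 2), (1, 3)]
reads_small = count_page_reads(ranges, buffer_pages=2)
reads_large = count_page_reads(ranges, buffer_pages=3)
```

With room for two pages, every page of the second range has already been evicted and must be read again (7 reads in total); with room for three, each page is read only once (4 reads). Shorter ranges, which is what well-chosen intervals produce, have a similar effect.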
  • 31. Example: Assume that R has a tuple r with r[A] = MF(10, 10, 40, 40) and S contains exactly the tuples s1, …, s9 with s1[B] = MF(5, 5, 20, 20), s2[B] = MF(6, 6, 9, 9), s3[B] = MF(10, 10, 40, 40), s4[B] = MF(11, 11, 16, 16), s5[B] = MF(15, 15, 45, 45), s6[B] = MF(20, 20, 30, 30), s7[B] = MF(20, 20, 50, 50), s8[B] = MF(32, 32, 36, 36), and s9[B] = MF(35, 35, 60, 60). Thus rngs(r) = [s1, …, s9].
  • 32. With a little calculation, we have:
    si | (r[A] ~= si[B])
    s1 | 0.29
    s2 | 0
    s3 | 1
    s4 | 0.17
    s5 | 0.71
    s6 | 0.33
    s7 | 0.5
    s8 | 0.13
    s9 | 0.1
    If the join condition is (R.A ~= S.B) ≥ 0.5, only s3, s5, and s7 join with r. If the threshold value is raised from 0.5 to 0.9, only s3 joins with r. In both cases, however, all S-tuples must be scanned.
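All values in this example are intervals MF(l, l, u, u), whose membership functions are 1 on the interval and 0 elsewhere. The min in the fuzzy equality is therefore the indicator of the intersection and the max the indicator of the union, so each entry is a ratio of lengths. Three rows of the table worked out this way:

```latex
\[ (r[A] \sim= s_5[B]) = \frac{|[10,40] \cap [15,45]|}{|[10,40] \cup [15,45]|} = \frac{40 - 15}{45 - 10} = \frac{25}{35} \approx 0.71 \]
\[ (r[A] \sim= s_7[B]) = \frac{|[10,40] \cap [20,50]|}{|[10,40] \cup [20,50]|} = \frac{40 - 20}{50 - 10} = \frac{20}{40} = 0.5 \]
\[ (r[A] \sim= s_2[B]) = \frac{|[10,40] \cap [6,9]|}{|[10,40]| + |[6,9]|} = \frac{0}{30 + 3} = 0 \]
```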
  • 34. Since every tuple in rngs(r) must be scanned during the join, efficiency can be improved by moving as many irrelevant tuples out of rngs(r) as possible. This can be achieved if the intervals assigned to the attribute values are an appropriate function of the threshold value, so that sorting rearranges the tuples appropriately.
  • 35. Intuitively, if f is an FE indicator over the domain of the join attributes of a fuzzy equi-join, then assigning intervals to join attribute values using f guarantees that, after sorting according to the partial order of the assigned intervals, every relevant S-tuple is in rngs(r) for every tuple r in R. However, it does not guarantee that every tuple in rngs(r) joins with r unless f is a perfect FE indicator. If both f and g are FE indicators over the domains of the join attributes and f is stronger than g, then f assigns smaller intervals to values than g would, and thus may move more irrelevant tuples out of rngs(r) for every r.
  • 36. EXPERIMENT RESULTS We study the performance of algorithm SMFEJ using various types of data and FE indicators. The performance study is based on a simulation of algorithm SMFEJ on synthetic data, run on a Sun SPARCstation 5. The performance of the algorithm is measured by:
    1. the number of I/O pages read from the inner relation, as the I/O cost;
    2. the number of comparisons made, as the CPU cost. For each pair of R- and S-tuples, if the values in the join attributes overlap, two comparisons are recorded: one to determine that they overlap and one to determine whether they actually join. If the two values do not overlap, one comparison is recorded.
  • 38. The algorithm SMFEJ is implemented to take advantage of page buffers. For each page of relation R, one page of relation S is read at a time, and all join results obtainable from the two pages are produced before the next page of relation S is read. It is straightforward to see that a larger buffer space will:
    1. reduce the I/O cost;
    2. save more CPU cost than I/O cost.
  • 39. CONCLUSION In this paper, we:
    1. propose a new fuzzy equality comparison operator with a measure that combines the possibility measure with the similarity measure;
    2. define a type of fuzzy equi-join based on the new fuzzy equality comparison operator, which allows threshold values to be associated with individual predicates of the join condition;
    3. use a sort-merge join algorithm based on a partial order of intervals to evaluate the fuzzy equi-join;
    4. define FE indicators, which determine appropriate intervals for fuzzy data, and identify them for data sets with different characteristics;
    5. report experimental results from a preliminary simulation of the algorithm that show a significant improvement in efficiency when FE indicators are used in conjunction with the sort-merge join algorithm.
  • 40. It may be interesting to study:
    1. other types of data correlations;
    2. efficient join algorithms that can be applied to both numeric and discrete attributes, which is an important issue;
    3. new types of fast access paths that handle both crisp and fuzzy data efficiently, which is a challenging task.