Web Page Clustering Using a Fuzzy Logic Based
   Representation and Self-organizing Maps

    Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez
                     NLP & IR Group, UNED

                       December 12, 2008


                                          Table of Contents



             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
             3   Experiment Description
             4   Results
             5   Conclusion






                                                  Objectives


              Group HTML documents by content similarity.
              Use Self-Organizing Maps (SOM) to organize, visualize and
              navigate through the collection.
              Define a term weighting function that takes advantage of HTML tags:
                      it combines, by means of fuzzy logic, heuristic criteria based on
                      the inherent semantics of some HTML tags and on word positions
                      in the document.

       Hypothesis
       An improvement in document representation will lead to an
       increase in map quality.





                                          Table of Contents


             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
                   1   Fuzzy Logic
                   2   EFCC
                   3   Linguistic Variables
                   4   Knowledge Base
             3   Experiment Description
             4   Results
             5   Conclusion






                                                 Fuzzy logic



              Capturing human expert knowledge.
              Close to natural language.
              Knowledge base: defined by a set of IF-THEN rules.
              Linguistic variables
                      Defined using natural language words and fuzzy sets.
                      These sets describe the degree of membership of
                      an object in a particular class.
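
A minimal sketch of these ideas, not the EFCC system itself: the variable names `frequency` and `emphasis`, the three fuzzy sets, their breakpoints and the single rule are all hypothetical and chosen only to illustrate membership degrees and an IF-THEN rule whose AND is evaluated as a minimum.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set (feet a and c, peak b)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def shoulder_up(x, a, b):
    """Membership that rises from 0 at a to 1 at b and stays at 1 beyond b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def fuzzy_sets(x):
    """Hypothetical linguistic variable on [0, 1] with sets low, medium, high."""
    return {
        "low": 1.0 - shoulder_up(x, 0.0, 0.5),
        "medium": triangular(x, 0.0, 0.5, 1.0),
        "high": shoulder_up(x, 0.5, 1.0),
    }


def rule_strength(frequency, emphasis):
    """Hypothetical rule: IF frequency is high AND emphasis is high
    THEN importance is high. The AND is taken as the minimum of the degrees."""
    return min(fuzzy_sets(frequency)["high"], fuzzy_sets(emphasis)["high"])
```

For instance, a value of 0.75 is "medium" and "high" to the same degree (0.5) and not "low" at all, which is the kind of graded class membership the slide refers to.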






                   Extended Fuzzy Combination of Criteria






                                        Linguistic Variables






                                           Knowledge Base






                                          Table of Contents


             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
             3   Experiment Description
                   1   Dimensionality Reduction
                   2   Document Map
                   3   Evaluation Methods
             4   Results
             5   Conclusion






                                  Dimensionality Reduction


              Input vector dimensions ranging from 100 to 5000.
              Stopwords, punctuation marks, suffixes, and words occurring
              fewer than 50 times in the whole corpus were removed.
              Two well-known methods:
                      Document frequency reduction.
                      Random projection method.
              Three proposed rank-based methods:
                      Most Valued Terms.
                      Fixed reduction method.
                      More Frequent Terms until n level (MFTn).
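
The two well-known methods can be sketched roughly as follows. This is an illustration under assumptions rather than the setup used in the experiments: the function names are hypothetical, document frequency reduction is read here as "keep the terms that occur in the most documents", and the random projection uses a simple matrix of +1/-1 entries.

```python
import random
from collections import Counter


def document_frequency_reduction(docs, n_terms):
    """Keep the n_terms terms that occur in the largest number of documents.

    docs is a list of token lists. Assumed reading of the method, not
    necessarily the exact variant used by the authors.
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:n_terms]


def random_projection(vectors, target_dim, seed=0):
    """Project each vector onto target_dim random directions with +1/-1 entries."""
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    matrix = [[rng.choice((-1.0, 1.0)) for _ in range(source_dim)]
              for _ in range(target_dim)]
    scale = 1.0 / target_dim ** 0.5
    return [[scale * sum(r * x for r, x in zip(row, vec)) for row in matrix]
            for vec in vectors]
```

The first function selects a subset of the original vocabulary, while the second builds new features as random combinations of all terms, so only the first keeps dimensions that can be read as words.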






                              Document Map Construction



              Benchmark dataset for clustering: Banksearch [1]
                      10000 documents
                      10 classes
              The SOM size was set equal to the number of classes in the input
              documents, i.e. 5x2 (10 units), in order to compare clustering results.
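
For readers unfamiliar with SOM training, a minimal sketch of a 5x2 map is given below. It is a generic textbook-style SOM, not the implementation used in the experiments: the function names, the linear decay schedules, and the values of `epochs`, `lr0` and `sigma0` are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)))


def train_som(data, rows=5, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM on data (a list of equal-length float lists)."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [(r, c) for r in range(rows) for c in range(cols)]  # grid positions
    weights = [[rng.random() for _ in range(dim)] for _ in units]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 0.01     # shrinking neighborhood
            br, bc = units[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(units):
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
            step += 1
    return weights
```

After training, each document vector is mapped to its best matching unit, so a 5x2 map yields at most 10 groups, which is what makes the comparison with the 10 classes possible.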




            [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing
       Systems: Design, Management, and Applications, 2002.


                                        Evaluation Methods



              Weighted average of the F-measure for each class.
              After mapping the collection onto the trained map, the class
              with the greatest number of documents mapped to a neuron is
              selected to label that unit.
              All document vectors in a neuron whose class differs
              from the neuron label are counted as errors.
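
The procedure above can be sketched as follows. This is one plausible reading of the slide, not the authors' evaluation code: the function name and argument layout are hypothetical, ties in the majority vote are broken by first occurrence, and each class F-measure is weighted by the share of documents in that class.

```python
from collections import Counter, defaultdict


def weighted_f_measure(unit_of_doc, class_of_doc):
    """unit_of_doc[i] is the SOM unit of document i, class_of_doc[i] its true class."""
    # Label each unit with the majority class of the documents mapped to it.
    per_unit = defaultdict(Counter)
    for unit, cls in zip(unit_of_doc, class_of_doc):
        per_unit[unit][cls] += 1
    label = {unit: counts.most_common(1)[0][0] for unit, counts in per_unit.items()}

    # A document is predicted to belong to the class that labels its unit;
    # any document whose class differs from that label counts as an error.
    predicted = [label[unit] for unit in unit_of_doc]

    total = len(class_of_doc)
    score = 0.0
    for cls, size in Counter(class_of_doc).items():
        tp = sum(1 for p, t in zip(predicted, class_of_doc) if p == cls and t == cls)
        n_pred = sum(1 for p in predicted if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / size
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (size / total) * f  # weight by class size
    return score
```

A class that never wins the majority vote on any unit gets precision and F-measure 0 in this sketch, so maps that merge two classes into the same units are penalized.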






             Best reduction for each term weighting function






                         MFTn reduction provides stability






             EFCC+MFTn obtains its best results with the
                   smallest number of features






                                                 Conclusion


              Unsupervised document representation method, based on
              fuzzy logic, focused on clustering HTML documents by means
              of self-organizing maps.
              MFTn reduction is the most stable reduction in all cases.
              The EFCC representation obtains better results using a
              smaller vocabulary.
              Fewer features are needed to represent the input
              documents and SOM unit vectors, which reduces
              computational cost.






                                            Thank You!






                                                 Related Work

         Work                                                     VSM   Topic         Document   Weighting            Modifies
                                                                        Information   Type       Function             SOM
         Self organization of a Massive Document Collection [2]   Yes   Yes           Text       Shannon's entropy    No
         Document Clustering using Phrases [3]                    Yes   No            Text       Binary, TF, TF-IDF   No
         Document Clustering using WordNet [4]                    Yes   Yes           Text       ESVM, HSVM, HyM      No
         Conceptional SOM [5]                                     Yes   No            Text       TF                   Yes


            [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a
       massive document collection. IEEE Trans. on Neural Networks, 2000.
            [3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
            [4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J.
       Hybrid Intell. Syst., 2004.
            [5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing,
       2008.
Synthetic aperture radar_advancedSynthetic aperture radar_advanced
Synthetic aperture radar_advanced
 
Feature Extraction and Principal Component Analysis
Feature Extraction and Principal Component AnalysisFeature Extraction and Principal Component Analysis
Feature Extraction and Principal Component Analysis
 
Radar 2009 a 14 airborne pulse doppler radar
Radar 2009 a 14 airborne pulse doppler radarRadar 2009 a 14 airborne pulse doppler radar
Radar 2009 a 14 airborne pulse doppler radar
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 
Radar 2009 a 18 synthetic aperture radar
Radar 2009 a 18 synthetic aperture radarRadar 2009 a 18 synthetic aperture radar
Radar 2009 a 18 synthetic aperture radar
 
GEOPROCESSING IN QGIS
GEOPROCESSING IN QGISGEOPROCESSING IN QGIS
GEOPROCESSING IN QGIS
 
Remote Sensing And GIS Application In Mineral , Oil , Ground Water MappingMin...
Remote Sensing And GIS Application In Mineral , Oil , Ground Water MappingMin...Remote Sensing And GIS Application In Mineral , Oil , Ground Water MappingMin...
Remote Sensing And GIS Application In Mineral , Oil , Ground Water MappingMin...
 
Steps for Principal Component Analysis (pca) using ERDAS software
Steps for Principal Component Analysis (pca) using ERDAS softwareSteps for Principal Component Analysis (pca) using ERDAS software
Steps for Principal Component Analysis (pca) using ERDAS software
 
Matlab Feature Extraction Using Segmentation And Edge Detection
Matlab Feature Extraction Using Segmentation And Edge DetectionMatlab Feature Extraction Using Segmentation And Edge Detection
Matlab Feature Extraction Using Segmentation And Edge Detection
 

Kürzlich hochgeladen

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

  • 1. Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps. Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, UNED. December 12, 2008
  • 2. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC); 3 Experiment Description; 4 Results; 5 Conclusion
  • 4. Objectives: Group HTML documents by content similarity, using Self-Organizing Maps (SOM) to organize, visualize and navigate through the collection. Term weighting function that takes advantage of HTML tags: it combines, by means of fuzzy logic, heuristic criteria based on the inherent semantics of some HTML tags and on word positions in the document. Hypothesis: an improvement in document representation will bring an increase in map quality.
  • 5. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC) (2.1 Fuzzy Logic; 2.2 EFCC; 2.3 Linguistic Variables; 2.4 Knowledge Base); 3 Experiment Description; 4 Results; 5 Conclusion
  • 6. Fuzzy logic: Captures human expert knowledge. Close to natural language. Knowledge base: defined by a set of IF-THEN rules. Linguistic variables: defined using natural language words and fuzzy sets. These sets describe the degree of membership of an object in a particular class.
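The ideas on this slide (fuzzy sets, linguistic variables, IF-THEN rules) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the variable name `FREQUENCY`, the labels, the breakpoints, and the use of `min` as the AND operator are hypothetical and are not taken from the EFCC knowledge base, which the transcript does not reproduce.

```python
def left_shoulder(x, b, c):
    """Membership is 1 up to b, then falls linearly to 0 at c."""
    if x <= b:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - b)


def right_shoulder(x, a, b):
    """Membership is 0 up to a, then rises linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangular(x, a, b, c):
    """Membership rises from 0 at a to 1 at b, then falls to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# A hypothetical linguistic variable: each natural-language label is a fuzzy set
# over a normalized input in [0, 1]. The breakpoints are illustrative only.
FREQUENCY = {
    "low": lambda x: left_shoulder(x, 0.2, 0.5),
    "medium": lambda x: triangular(x, 0.2, 0.5, 0.8),
    "high": lambda x: right_shoulder(x, 0.5, 0.8),
}


def fuzzify(variable, x):
    """Return the membership degree of x in every fuzzy set of the variable."""
    return {label: mu(x) for label, mu in variable.items()}


def rule_strength(*degrees):
    """Firing strength of 'IF a AND b THEN ...', taking min as the AND (an assumption)."""
    return min(degrees)


if __name__ == "__main__":
    freq = fuzzify(FREQUENCY, 0.65)   # partly "medium", partly "high"
    title = fuzzify(FREQUENCY, 0.90)  # fully "high"
    # Hypothetical rule: IF frequency IS high AND title IS high THEN importance IS high
    print(freq, rule_strength(freq["high"], title["high"]))
```

A value of 0.65 belongs to "medium" and "high" at the same time, each to a partial degree, which is what lets rules written in near-natural language fire with graded strength.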
  • 8-18. Extended Fuzzy Combination of Criteria (figure slides, no transcribed text)
  • 20-25. Linguistic Variables (figure slides, no transcribed text)
  • 27-30. Knowledge Base (figure slides, no transcribed text)
  • 31. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC); 3 Experiment Description (3.1 Dimensionality Reduction; 3.2 Document Map; 3.3 Evaluation Methods); 4 Results; 5 Conclusion
  • 32. Dimensionality Reduction: Input vector dimension ranging from 100 to 5000. Stopwords, punctuation marks, suffixes, and words occurring fewer than 50 times in the whole corpus were removed. Two well-known methods: document frequency reduction; random projection. Three proposed rank-based methods: Most Valued Terms; fixed reduction; More Frequent Terms until n level (MFTn).
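As a rough illustration of the preprocessing and of document frequency reduction, the sketch below prunes rare terms and then keeps the n terms that appear in the most documents. The function names are hypothetical, and reading "document frequency reduction" as "keep the top n terms by document frequency" is an assumption; the slides do not give the exact variant used, and the three rank-based methods are not sketched because their definitions are not in the transcript.

```python
from collections import Counter


def remove_rare_terms(docs, min_count):
    """Drop terms whose total count over the whole corpus is below min_count."""
    totals = Counter(term for doc in docs for term in doc)
    return [[t for t in doc if totals[t] >= min_count] for doc in docs]


def top_terms_by_document_frequency(docs, n):
    """Return the n terms that occur in the largest number of documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:n]


def to_vectors(docs, vocabulary):
    """Represent each document as raw term counts over the reduced vocabulary."""
    return [[doc.count(term) for term in vocabulary] for doc in docs]


if __name__ == "__main__":
    # Toy corpus of already tokenized documents; the slide uses a threshold of 50
    # on a 10000-document collection, here 2 is used so the effect is visible.
    docs = [
        ["som", "map", "web"],
        ["som", "fuzzy", "web"],
        ["som", "fuzzy", "html"],
        ["map", "rare"],
    ]
    pruned = remove_rare_terms(docs, min_count=2)
    vocabulary = top_terms_by_document_frequency(pruned, n=3)
    print(vocabulary)
    print(to_vectors(pruned, vocabulary))
```

The vectors produced this way have exactly n components, which is the quantity the slide varies between 100 and 5000.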
  • 34. Document Map Construction: Benchmark dataset for clustering: Banksearch [1], 10000 documents, 10 classes. SOM size was set equal to the number of classes of the input documents, i.e. 5x2, in order to compare clustering results. [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
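To make the 5x2 map concrete, here is a minimal self-organizing map sketch in plain Python. It is a generic online SOM, not the authors' configuration: the learning-rate schedule, the Gaussian neighborhood, the initial radius, the number of epochs and the random initialization are all assumptions, and `train_som` and `best_matching_unit` are hypothetical names.

```python
import math
import random


def best_matching_unit(units, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(units, key=lambda pos: sum((w - v) ** 2 for w, v in zip(units[pos], x)))


def train_som(data, rows=5, cols=2, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a rows x cols SOM online; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit the data in random order
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)                     # linearly decaying rate
            radius = max(radius0 * (1.0 - progress), 0.5)   # shrinking neighborhood
            winner = best_matching_unit(units, x)
            for pos, weights in units.items():
                grid_d2 = (pos[0] - winner[0]) ** 2 + (pos[1] - winner[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    weights[i] += lr * influence * (x[i] - weights[i])
            step += 1
    return units


if __name__ == "__main__":
    # Toy 2-dimensional inputs standing in for document vectors.
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
    som = train_som(data)
    for x in data:
        print(x, "->", best_matching_unit(som, x))
```

With ten units and ten classes, each unit can be read as one candidate cluster, which is what allows the map to be scored like a clustering.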
  • 36. Evaluation Methods: Weighted average of the F-measure for each class. After mapping the collection onto the trained map, the class with the greatest number of documents mapped to a neuron is selected to label that unit. All document vectors in a neuron whose class differs from the neuron label are counted as errors.
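The evaluation procedure described here can be sketched as follows. Two details are assumptions, since the slide does not state them: the F-measure is taken as the harmonic mean of precision and recall (F1), and each class is weighted by its share of the documents. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def label_units(assignments):
    """assignments: list of (unit, true_class). Label each unit with its majority class.

    Ties are broken alphabetically so the labeling is deterministic.
    """
    per_unit = defaultdict(Counter)
    for unit, cls in assignments:
        per_unit[unit][cls] += 1
    return {unit: max(sorted(counts), key=counts.get) for unit, counts in per_unit.items()}


def count_errors(assignments):
    """Number of documents whose class differs from the label of their unit."""
    labels = label_units(assignments)
    return sum(1 for unit, cls in assignments if labels[unit] != cls)


def weighted_f_measure(assignments):
    """Average of the per-class F1, weighted by class size (weighting is an assumption)."""
    labels = label_units(assignments)
    true_pos, predicted, actual = Counter(), Counter(), Counter()
    for unit, cls in assignments:
        guess = labels[unit]
        predicted[guess] += 1
        actual[cls] += 1
        if guess == cls:
            true_pos[cls] += 1
    total = len(assignments)
    score = 0.0
    for cls, size in actual.items():
        precision = true_pos[cls] / predicted[cls] if predicted[cls] else 0.0
        recall = true_pos[cls] / size
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (size / total) * f1
    return score


if __name__ == "__main__":
    # (unit index, true class) for six documents mapped onto two units.
    mapped = [(0, "a"), (0, "a"), (0, "b"), (1, "b"), (1, "b"), (1, "b")]
    print(label_units(mapped), count_errors(mapped), weighted_f_measure(mapped))
```

In the toy data, unit 0 is labeled "a" and unit 1 is labeled "b", so the single "b" document on unit 0 is the only error.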
  • 38. Best reduction for each term weighting function
  • 39. MFTn reduction provides stability
  • 40. EFCC+MFTn obtains its best results with the smallest number of features
  • 42. Conclusion: Unsupervised document representation method, based on fuzzy logic, focused on clustering HTML documents by means of self-organizing maps. MFTn reduction is the most stable reduction in all cases. The EFCC representation obtains better results using a smaller vocabulary. A smaller number of features is needed to represent the input documents and the SOM unit vectors, which implies an improvement in computational cost.
  • 43. Thank You!
  • 44. Related Work

      Work | VSM | Topic Information | Document Type | Weighting Function | Modifies SOM
      Self organization of a Massive Document Collection [2] | Yes | Yes | Text | Shannon's Entropy | No
      Document Clustering using Phrases [3] | Yes | No | Text | Binary, TF, TF-IDF | No
      Document Clustering using WordNet [4] | Yes | Yes | Text | ESVM, HSVM, HyM | No
      Conceptional SOM [5] | Yes | No | Text | TF | Yes

      [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
      [3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
      [4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.
      [5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing, 2008.