BEIRA: A geo-semantic clustering
   method for area summary




 Osamu Masutani, Hirotoshi Iwasaki
 Denso IT Laboratory, Inc.


Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
Summary
        Background
        Concept
        System architecture
        Evaluation
        Conclusions & Future works




Background – Map service
        Target
          -      Car navigation or PND (Personal
                 Navigation Devices)
          -      GPS mobile phone
          -      Web-based Map Service
        Major functionalities of map
        service
          -      View maps around the current position
          -      Search for a route to a destination
          -      Search for favorite POIs (Points of Interest)
A scenario : A visitor to Nancy
        No previous knowledge of Nancy.
          -      Japanese
          -      A little interest in art
        He has some free time.
          -      No plan.
          -      He can’t speak French.
          -      He has a GPS mobile phone.
        The only available information comes
        from a mobile map service.
          -      He’d like to search for POIs using the service.
          -      What is the problem?

Use cases : Searching POIs on mobile
        3 ways to search
        Location based search
          -      Nearby area
        Category based search
          -      “Restaurant” / “Italian” / …
          -      “Public” / “Library” / …
        Keyword based search
          -      “chocolate cake”, “soccer”,
                 “beautiful”, “calm” , …

Problem in location based search
        Filtering by the specified area
        Sometimes results are
        numerous
          -      In a central urban area
          -      When a broad area is chosen
        Selection is very hard
          -      The UI is limited (especially on mobile).




Problem in category based search
        Filtering by a specific
        category
        Sometimes results are
        numerous
          -      When the user doesn’t specify a
                 detailed category
        Information awareness
          -      Once the user has chosen the “Museum”
                 category, he can’t find “Place
                 Stanislas”.
        [Figure: map with “museum” and “park” category labels and Place Stanislas]




Problem of keyword based search
        Filtering by keyword match
        Information awareness
          -      The user is required to know the
                 keyword in advance
          -      “Art Nouveau” is a good keyword for
                 finding Nancy’s features.
          -      But if the user mistakes the keyword
                 for “Art Deco”, the results will be poor
        [Figure: map labeled “Art nouveau” and “Place Stanislas”]




Problems
        Information overload
          -      Numerous candidates
          -      Millions of POIs in a mobile phone service
        Information awareness
          -      Both fixed-category and free-keyword
                 search have a similar problem.
        Solution
          -      Reduce the candidates
          -      But keep information awareness
          -      Clustering and summarization of
                 information
Clustering and summarization
        Similar concept
          -      Web search engine “Vivisimo”
          -      Displays clusters of search results
                 together with their topics
          -      Dynamic categories
        Easy to choose from, yet
        comprehensive
          -      The number of candidates is reduced,
                 but the view remains comprehensive

Is Vivisimo enough?
        It provides only a semantic
        (topic) view.
          -      When used with a map service,
                 switching between the semantic and
                 geographic views is complicated
        Can these two views be
        combined?
          -      Use only the map view
          -      Cluster = area


BEIRA: Bird’s Eye Information Retrieval Application
        Topic-based IR through a geographic
        view.
          -      Use AOIs (Areas of Interest) instead of POIs
          -      An AOI consists of an area (cluster) and its summary
                 (a word list)
        [Figure: map area labeled with its summary word list, e.g. “Art Nouveau”]
System architecture
        POI database
          -      Address of POI
          -      Text of POI (guide text, reputation text etc.)
        Preprocessing
          -      Geo-coding and Topic vector generation.
        Geo-semantic clustering and summarization
        Display AOI
        [Figure: POI database (POI ID, address, text, etc.) -> geographic preprocessing
        (latitude, longitude) and semantic preprocessing (topic vector) -> geo-semantic
        clustering -> geo-semantic summarization -> AOI (AOI ID, area polygon, summary)]
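
As a rough sketch of the records that move through this architecture, the Python below models the POI table, the preprocessed features, and the AOI table. The class names, field names, and the injected `geocode` / `to_topic_vector` callables are assumptions made for illustration; the slides only name the table columns and the processing stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class POIRecord:
    """One row of the POI database (columns named on the slide)."""
    poi_id: str
    address: str
    text: str  # guide text, reputation text, etc.


@dataclass
class PreprocessedPOI:
    """Output of geographic and semantic preprocessing (hypothetical layout)."""
    poi_id: str
    latitude: float
    longitude: float
    topic_vector: List[float]


@dataclass
class AOIRecord:
    """One Area of Interest: a polygon plus its summary word list."""
    aoi_id: str
    area_polygon: List[Tuple[float, float]]
    summary: List[str]


def preprocess(
    poi: POIRecord,
    geocode: Callable[[str], Tuple[float, float]],
    to_topic_vector: Callable[[str], List[float]],
) -> PreprocessedPOI:
    # Both callables are placeholders: the slides do not say which
    # geocoder or topic model was used.
    latitude, longitude = geocode(poi.address)
    return PreprocessedPOI(poi.poi_id, latitude, longitude, to_topic_vector(poi.text))
```

Passing the geocoder and the topic model in as functions keeps the sketch independent of any particular GIS or text-mining tool.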


Implementation
        A combination of GIS and text-mining
        tools




Geo-semantic clustering
        Geographic clustering doesn’t reflect area topics:
        it yields circular areas
        Semantic clustering doesn’t consider the geographic
        view: it yields scattered areas
        Geo-semantic clustering solves both problems
        [Figure: three panels: semantic clustering, G/S clustering, geographic clustering]




Geo-semantic clustering
        Co-clustering with geographic and
        semantic features
          -      Geographic features: latitude, longitude
          -      Semantic features: high-dimensional matrix (latent
                 semantic indexing, LSI)

        G/S ratio R: the combination ratio
          -      R = geographic bias / semantic bias

                         Geographic features (weight R)    Semantic features (weight 1)
              POI ID     Latitude       Longitude          LSI1         LSI2   LSI3
              ・・・        ・・・            ・・・                ・・・          ・・・    ・・・
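
The weighting in the feature table can be sketched as follows: geographic columns are multiplied by R, semantic (LSI) columns by 1, and the concatenated rows are clustered. This is a minimal illustration, not the authors' implementation. Plain k-means is used as a stand-in because the slides do not name the clustering algorithm at this point, and the function names and toy coordinates are assumptions.

```python
from typing import List, Sequence


def combine_features(
    geo: Sequence[Sequence[float]],
    sem: Sequence[Sequence[float]],
    r: float,
) -> List[List[float]]:
    """Scale geographic features by R, keep semantic features at weight 1."""
    return [[r * g for g in g_row] + list(s_row) for g_row, s_row in zip(geo, sem)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means (a stand-in); centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (made up): two locations, two topics.
geo = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
sem = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

geographic_labels = kmeans(combine_features(geo, sem, r=100.0), k=2)
semantic_labels = kmeans(combine_features(geo, sem, r=0.001), k=2)
```

With a large R the toy points group by location, and with a small R they group by topic, which mirrors the two extremes the slides describe; intermediate values of R trade one off against the other.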


Evaluation : geo-semantic clustering
        Dataset: cafes in Shibuya
          -      Text content: restaurant review website
                 “asku.com”
          -      272 cafes in the region (Shibuya ward).
        Ground-truth cluster data
          -      Generated manually
          -      13 clusters in the region
        Evaluation measure
          -      F-measure
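
The slides do not say which variant of the F-measure was used for comparing clusters. As one common possibility, the sketch below computes a pairwise F-measure: every pair of items counts as a true positive when both the predicted and the manual clustering put the pair in the same cluster. The function name and the toy labels are assumptions.

```python
from itertools import combinations
from typing import Sequence


def pairwise_f_measure(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Pairwise F-measure between two clusterings (assumed variant)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Cluster IDs are arbitrary, so a relabeled but identical clustering scores 1.0.
perfect = pairwise_f_measure([1, 1, 0, 0], [0, 0, 1, 1])
partial = pairwise_f_measure([0, 0, 0, 1], [0, 0, 1, 1])
```

A pairwise formulation avoids having to match predicted cluster IDs to manual cluster IDs, which is convenient when the two clusterings have different numbers of clusters.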



Results of clustering
        Geo-semantic clustering produces non-circular
        areas that follow their topics.
        [Figure: clustering results in three panels: Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06)]
Evaluation of clustering
        We confirmed that geo-semantic
        clustering performs better than either
        clustering alone
          -      An intermediate ratio (R = 0.01) is optimal.
        [Figure: score (0 to 0.6) versus G/S ratio R, from 1.0E-04 (semantic) to 1.0E+06 (geographic), for MLSA and Tensor-Kmeans]




Area summarization
        Document summarization
        Term weighting: e.g., TF-IDF
          -      A term that occurs many times in a
                 document is important (TF: term
                 frequency)
          -      A term that is rare across the entire document set is
                 important (IDF: inverse document
                 frequency)
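
The two bullets above can be sketched in a few lines of Python. The slides do not state which TF-IDF variant was used, so this example assumes raw term counts for TF and log(N / df) for IDF; the function name and the toy documents are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count in document) * log(N / document frequency)."""
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return weights


# Toy corpus: "cafe" appears in every document, so its IDF is zero.
docs = [["cafe", "cafe", "wedding"], ["cafe", "onion"]]
tfidf_weights = tf_idf(docs)
```

In this toy corpus "cafe" is frequent but appears everywhere, so it receives weight 0, while "wedding" and "onion" receive positive weights.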


Problem of IDF
         Simple IDF cannot extract regionally
         characteristic words
           -      According to IDF, “onion” and “wedding” have almost the same weight
           -      “wedding” should be regarded as more important because the
                  areas where weddings are held are geographically biased.


           Normal term                     Place name                Area term
           “onion”                         “Dogenzaka”               “Wedding”
IDF        3.08                            3.51                      3.04
K          4.41                            54.0                      9.93
Location aware IDF
        Use the geographic distribution of each word
         -      Term occurrences in geographic space
        A more spatially concentrated term is regarded as more important
         -      Measure: K-value (a point distribution analysis method)
        Location-aware weight = IDF * K
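
As a worked example of the IDF * K weighting, the sketch below multiplies the IDF and K values given in the slides for “onion”, “Dogenzaka”, and “Wedding”. Only the arithmetic is shown; how K itself is computed is not specified beyond "point distribution analysis method", so K is taken as given. The variable and function names are assumptions.

```python
# IDF and K values copied from the term table in the slides.
terms = {
    "onion": {"idf": 3.08, "k": 4.41},       # normal term
    "Dogenzaka": {"idf": 3.51, "k": 54.0},   # place name
    "wedding": {"idf": 3.04, "k": 9.93},     # area term
}


def location_aware_idf(idf: float, k: float) -> float:
    """Location-aware weight as stated on the slide: IDF * K."""
    return idf * k


weights = {t: location_aware_idf(v["idf"], v["k"]) for t, v in terms.items()}

# Rankings under plain IDF and under IDF * K, highest weight first.
idf_ranking = sorted(terms, key=lambda t: terms[t]["idf"], reverse=True)
location_aware_ranking = sorted(weights, key=weights.get, reverse=True)
```

Under plain IDF, “onion” (3.08) ranks slightly above “wedding” (3.04). After multiplying by K the weights are roughly 13.6 for “onion”, 30.2 for “wedding”, and 189.5 for “Dogenzaka”, so the area term now clearly outranks the normal term.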


Evaluation of location aware IDF
         Evaluation measure: extraction rate of
         location names
           -      Area-characteristic terms have a geographic
                  distribution similar to that of location names
Evaluation of location aware IDF
        Evaluation data
          -      All words in the Shibuya area.
          -      Top 1,000 weighted terms
        Location-aware IDF (IDF*K) extracts
        location names more efficiently than
        the conventional weights
        [Figure: density of location names [%] (0 to 30) versus rank (1 to 900) for IDF, K, and IDF*K]
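
The slides do not define exactly how "density of location names" is computed at each rank. One plausible reading, sketched below, is the percentage of location names among the top-n terms of a ranking. The function name, the cumulative definition, and the toy term list are all assumptions made for illustration.

```python
from typing import Sequence, Set


def location_name_density(
    ranked_terms: Sequence[str],
    location_names: Set[str],
    top_n: int,
) -> float:
    """Percentage of location names among the top_n ranked terms (assumed definition)."""
    top = ranked_terms[:top_n]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in location_names)
    return 100.0 * hits / len(top)


# Toy ranking (made up), highest weight first.
ranked = ["Dogenzaka", "wedding", "Shibuya", "onion"]
names = {"Dogenzaka", "Shibuya"}

density_top1 = location_name_density(ranked, names, top_n=1)
density_top2 = location_name_density(ranked, names, top_n=2)
```

Computing this value for the rankings produced by IDF, K, and IDF * K at increasing values of n would give curves comparable in shape to the plot described above.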


Conclusions
        BEIRA addresses two issues in map
        services
          -      Information overload
          -      Information awareness
        Combining geographic and semantic features
        and processing can be used to build a
        view of area characteristics.
        Future work
          -      Automatic adaptation of the G/S ratio
          -      Evaluation on other content
        [Image: Hokkai Takashima (1850-1931)]

Thank you for your attention!



