Experiences in Data Analysis Using
Text Mining



                                               Dr. Alisa Kongthon

                  Researcher, Human Language Technology Laboratory
           National Electronics and Computer Technology Center (NECTEC)



                                                                  1
Text Mining is about…



 “Sifting through vast collections of unstructured or
 semistructured data beyond the reach of data mining
 tools, text mining tracks information sources, links isolated
 concepts in distant documents, maps relationships
 between activities, and helps answer questions.”


                                   Tapping the Power of Text Mining
                             Communications of the ACM, Sept. 2006



                                                                      2
Humans VS. Computers
• Humans: Ability to distinguish and apply linguistic patterns to text

   – Can overcome language difficulties such as slang, spelling
     variations, and contextual meaning


• Computers: Ability to process text in large volumes at high speed
   – Can sift through a large collection of texts to find simple statistics
     and relationships among terms almost instantly


• Text mining requires a combination of both:
   humans' linguistic capability (NLP) + computers' speed and accuracy (Data Mining)
Text Mining Tasks

• Information extraction:
  – Analyze unstructured text to identify keywords or
    phrases and the relationships among them
• Topic detection and tracking:
  – Filter and present only documents relevant to the user
    profile
• Summarization:
  – Text summarization reduces the content by retaining
    only its main points and overall meaning



                                                             4
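Of the tasks above, summarization is the easiest to illustrate in a few lines. The following is a minimal sketch of frequency-based extractive summarization, not the method used in this talk: each sentence is scored by the average whole-text frequency of its content words, and the top k sentences are returned in their original order. The stop-word list, the regex sentence splitter, and the function names are illustrative assumptions; because it only keeps ASCII letters, it would not handle Thai text without a word segmenter.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "its", "for", "on", "with"}

def split_sentences(text: str) -> list[str]:
    """Naively split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text: str, k: int = 1) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order.

    A sentence's score is the sum of its content words' frequencies across
    the whole text, divided by its number of content words.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Averaging rather than summing the word scores keeps long sentences from winning just by being long; real summarizers typically add TF-IDF weighting and redundancy removal.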
Text Mining Tasks

• Categorization:
  – Automatically classify documents into predefined
    categories
• Clustering:
  – Group documents according to their similarity to one another
• Concept linkage:
  – Connect related documents by identifying their shared
    concepts, helping users find information they might
    not have found through traditional search methods



                                                             5
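Clustering can be sketched with bag-of-words vectors and cosine similarity. This is a minimal single-pass ("leader") clustering under assumed settings: each document joins the most similar existing cluster if the cosine similarity to that cluster's summed term counts reaches a hypothetical threshold of 0.3, otherwise it starts a new cluster. Production systems more often use TF-IDF weighting with k-means or hierarchical clustering.

```python
import math
import re
from collections import Counter

def vectorize(doc: str) -> Counter:
    """Bag-of-words term-frequency vector over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", doc.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Single-pass clustering; returns lists of document indices."""
    clusters: list[list[int]] = []
    centroids: list[Counter] = []  # summed term counts of each cluster's members
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters
```

The result depends on document order and on the threshold, which is the usual trade-off for a single pass over a large collection.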
Text Mining Tasks

• Information visualization:
  – Represent documents or information in graphical
    formats for easy browsing, viewing, or searching
• Question answering (Q&A):
  – Search for and extract the best answer to a given question




                                                             6
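A very small sketch of the answer-selection step of Q&A, assuming candidate passages have already been retrieved: each passage is scored by how many content words it shares with the question, and the best-scoring passage is returned. The stop-word list and function names are hypothetical; real Q&A systems add question-type analysis and extract a specific answer span rather than a whole passage.

```python
import re

# Hypothetical stop words, including common question words.
STOP_WORDS = {"what", "which", "who", "when", "where", "how", "is", "are",
              "the", "a", "an", "of", "in", "does", "do"}

def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def best_answer(question: str, passages: list[str]) -> str:
    """Return the passage sharing the most content words with the question."""
    q = content_words(question)
    return max(passages, key=lambda p: len(q & content_words(p)))
```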
Applications: Tech Mining

• Tech Mining is the application of text mining
  tools to science and technology (S&T)
  information, particularly bibliographic abstracts

• It exploits S&T databases to spot patterns,
  detect associations, and foresee opportunities




                                                     7
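Detecting associations in S&T databases often starts from keyword co-occurrence. The sketch below assumes each bibliographic record has already been reduced to a list of keywords (a hypothetical input format) and counts how often each unordered pair of keywords appears in the same record.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records: list[list[str]]) -> Counter:
    """Count how often each unordered keyword pair appears in the same record."""
    pairs: Counter = Counter()
    for keywords in records:
        # Deduplicate and sort so ("a", "b") and ("b", "a") are the same key.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs
```

`most_common()` on the result then lists the strongest associations first; raw counts favor frequent keywords, so normalized measures are often preferred in practice.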
Tech Mining Process




                      8
Technical Intelligence:
Who, What, When, Where?
• Digest multiple S&T information resources
• Profile Research Domains:
  –   Who?
  –   What?
  –   When?
  –   Where?
• Map Relationships: Topics & Teams
• Analyze Trends: What’s Hot & What’s Coming
• And do so quickly

                                               9
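At its simplest, the Who/What/When/Where profile is a set of frequency counts over record fields. A minimal sketch, assuming a hypothetical `Record` type with `authors`, `keywords`, `year`, and `country` fields; real bibliographic exports use their own field names and usually need cleaning (for example, merging author-name variants) first.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical bibliographic record used only for this illustration."""
    authors: list[str]
    keywords: list[str]
    year: int
    country: str

def profile(records: list[Record], top: int = 3) -> dict[str, list[tuple]]:
    """Summarize who publishes, on what, when, and where."""
    who = Counter(a for r in records for a in r.authors)
    what = Counter(k for r in records for k in r.keywords)
    when = Counter(r.year for r in records)
    where = Counter(r.country for r in records)
    return {
        "who": who.most_common(top),
        "what": what.most_common(top),
        "when": sorted(when.items()),  # publications per year, to read the trend
        "where": where.most_common(top),
    }
```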
What if I don’t have Tech
Mining Software?




                            10
What if I don’t have Tech
Mining Software?




                            11
Output example from Tech
Mining Software




Source: A.L. Porter, QTIP: quick technology intelligence processes, Technol. Forecast. Soc. Change 72 (2005)   12
Applications: Expert Finder




                              13
Applications: Expert Finder




                              14
Applications: Expert Finder




                              15
Applications: ABDUL
(Artificial BudDy U Love)

• An online information service that currently provides
  access to Thai linguistic resources (e.g., a dictionary and sentence
  translation) and information resources (e.g., weather
  conditions, stock prices, gas prices, and traffic conditions)


• Users can interact with ABDUL in natural language
  via Instant Messaging (IM) protocols, a Web
  browser, or mobile devices




                                                                16
Applications: ABDUL
(Artificial BudDy U Love)




                            17
Applications: ABDUL
(Artificial BudDy U Love)




                            18
Web 1.0 VS. Web 2.0




                      19
User-Generated Content

• With Web 2.0 and social networking websites, the
  amount of user-generated content has increased
  exponentially


• User-generated content often contains opinions and/or
  sentiments


• An in-depth analysis of these opinionated texts could
  reveal potentially useful information, e.g.,
  – People's preferences regarding many different topics, including news
    events, social issues, and commercial products



                                                                         20
Online Opinion Resources


Characteristics of Online
Reviews
• Natural language and unstructured text format

• Some reviews are long and contain only a few
  sentences expressing opinions on the product

• It can be difficult for a potential reader to
  read and analyze every review that
  may be relevant to his or her decision-making


                                                  22
Opinion Mining

• Opinion mining, or sentiment analysis, is the task of
  analyzing and summarizing what people think about a
  certain topic


• Opinion mining has attracted considerable interest in the
  text mining and NLP communities


• Three granularities of opinion mining:
  – Document level
  – Sentence level
  – Feature level

                                                               23
Feature-Based Opinion Mining

• This approach typically consists of the following two
  steps:
      1. Identifying and extracting features of an object,
         topic, or event from each sentence
      2. Determining whether the opinions regarding those
         features are positive or negative




                                                             24
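The two steps can be sketched with small hand-made lexicons. This is a minimal illustration, not the system behind the hotel and mobile-operator results that follow: step 1 matches words against a hypothetical feature list, and step 2 takes the polarity of the nearest opinion word within a few tokens, flipping it when the opinion word directly follows a negator such as "not". The lexicons, window size, and function names are all assumptions.

```python
import re

# Hypothetical lexicons; a real system would learn or curate these per domain.
FEATURES = {"room", "staff", "location", "breakfast", "price"}
POSITIVE = {"clean", "friendly", "great", "good", "delicious"}
NEGATIVE = {"dirty", "rude", "bad", "noisy", "expensive"}
NEGATORS = {"not", "never", "no"}

def polarity_at(tokens: list[str], j: int) -> int:
    """+1 / -1 for an opinion word at position j (flipped after a negator), else 0."""
    p = (tokens[j] in POSITIVE) - (tokens[j] in NEGATIVE)
    if p and j > 0 and tokens[j - 1] in NEGATORS:
        p = -p
    return p

def feature_opinions(sentence: str, window: int = 3) -> dict[str, str]:
    tokens = re.findall(r"[a-z]+", sentence.lower())
    results: dict[str, str] = {}
    for i, tok in enumerate(tokens):
        # Step 1: identify a feature word.
        if tok not in FEATURES:
            continue
        # Step 2: find opinion words within the window and keep the closest one.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        candidates = [j for j in range(lo, hi) if polarity_at(tokens, j)]
        if candidates:
            nearest = min(candidates, key=lambda j: (abs(j - i), j))
            results[tok] = "positive" if polarity_at(tokens, nearest) > 0 else "negative"
    return results
```

Choosing the nearest opinion word lets one sentence carry opposite opinions on different features, e.g. a clean room but rude staff.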
Opinion Mining on Hotel Reviews in
Thailand (Graphical Display)




                                     25
Opinion Mining on Hotel Reviews in
Thailand (Textual Display)




                                     26
Comparison among Hotels




                          27
Opinion Mining on Mobile
Network Operators in Thailand




                                28
Opinion Mining on Mobile
Network Operators in Thailand




                                29
Challenges in Text Mining

• Text Mining = NLP + Data Mining
• Statistical NLP
  –   Ambiguity
  –   Context
  –   Tokenization and sentence detection
  –   POS tagging
• Data Mining
  – Ability to process the data
  – Massive amounts of data
  – Determining and extracting information of interest

                                                         30
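Tokenization is a challenge in its own right for languages such as Thai, which is written without spaces between words. A common baseline is dictionary-based greedy longest matching; the sketch below assumes a toy dictionary and treats any character it cannot match as a single-character token.

```python
def longest_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right longest-matching word segmentation."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible dictionary word starting at position i first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens
```

With the dictionary {"ตา", "ตาก", "กลม", "ลม"}, the string ตากลม has two valid readings, ตา|กลม ("round eyes") and ตาก|ลม ("exposed to the wind"), but greedy longest matching always returns ตาก|ลม. This is why ambiguity and context appear alongside tokenization in the list above.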
Conclusions

• As the amount of data grows, text-mining tools
  that can sift through it will become increasingly
  valuable

• Text mining has various applications for both
  academic and industrial use




                                                    31
Thank you for your attention


           Q&A



                               32
