Tweet segmentation and its application to named entity recognition

+91-9994232214,8144199666, ieeeprojectchennai@gmail.com,
www.projectsieee.com, www.ieee-projects-chennai.com

IEEE PROJECTS 2015-2016
-----------------------------------
Contact:+91-9994232214,+91-8144199666
Email:ieeeprojectchennai@gmail.com

Support:
-------------
Projects Code
Documentation
PPT
Projects Video File
Projects Explanation
Teamviewer Support

Abstract:
Twitter has attracted millions of users who share and disseminate the most up-to-date information, producing large volumes of data every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models that derive local context from the linguistic features and from the term dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts, compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable than term dependency for learning local context. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.

Existing System:
Many organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions. A targeted Twitter stream is usually constructed by filtering tweets with predefined selection criteria (e.g., tweets published by users from a geographical region, or tweets that match one or more predefined keywords). Because the timely information in these tweets has great business value, it is imperative to understand the language of tweets for a large body of downstream applications, such as named entity recognition (NER), event detection and summarization, opinion mining, sentiment analysis, and many others. Given the limited length of a tweet (i.e., 140 characters) and the absence of restrictions on its writing style, tweets often contain grammatical errors, misspellings, and informal abbreviations. The error-prone and short nature of tweets often makes word-level language models less reliable for tweets. For example, given a tweet “I call her, no answer.

Proposed System:
To achieve high-quality tweet segmentation, we propose a generic tweet segmentation framework named HybridSeg. HybridSeg learns from both global and local contexts and is able to learn from pseudo feedback.

Global context: Tweets are posted for information sharing and communication, so named entities and semantic phrases are well preserved in them. The global context derived from Web pages (e.g., the Microsoft Web N-Gram corpus) or Wikipedia therefore helps identify the meaningful segments in tweets.

Hardware Requirements:
• System : Pentium IV 2.4 GHz
• Hard Disk : 40 GB
• Floppy Drive : 1.44 MB
• Monitor : 15-inch VGA Colour
• Mouse : Logitech
• RAM : 256 MB

Software Requirements:
• Operating system : Windows XP
• Front End : JSP
• Back End : SQL Server

Software Requirements:
• Operating system : Windows XP
• Front End : .NET
• Back End : SQL Server
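
The abstract states that HybridSeg picks the segmentation of a tweet that maximizes the sum of the stickiness scores of its segments, with each score drawing on global context (English phrases) and local context (the tweet batch). Because that objective is additive over contiguous segments, the best split can be found by dynamic programming over word positions. The sketch below is a minimal illustration under stated assumptions: `stickiness`, `segment_tweet`, the weight `lam`, the segment-length cap `max_len`, and the probability tables are all hypothetical, and the linear blend of global and local probabilities is a stand-in for the paper's actual scoring, which is not reproduced here.

```python
from functools import partial


def stickiness(segment, global_prob, local_prob, lam=0.5):
    """Hypothetical stickiness score for a candidate segment (a list of words).

    Linearly blends a global-context probability (segment is an English phrase)
    with a local-context probability (segment is a phrase in this tweet batch).
    This blend is an illustrative assumption, not the HybridSeg formula.
    """
    key = " ".join(segment)
    return lam * global_prob.get(key, 0.0) + (1 - lam) * local_prob.get(key, 0.0)


def segment_tweet(words, score, max_len=5):
    """Split `words` into contiguous segments maximizing the sum of score(segment).

    best[i] is the highest total score for words[:i]; back[i] is the start index
    of the last segment in that best split. Segments are at most max_len words.
    """
    n = len(words)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        # Try every admissible last segment words[j:i].
        for j in range(max(0, i - max_len), i):
            candidate = best[j] + score(words[j:i])
            if candidate > best[i]:
                best[i] = candidate
                back[i] = j
    # Follow back-pointers from the end to recover the chosen segments.
    segments = []
    i = n
    while i > 0:
        j = back[i]
        segments.append(words[j:i])
        i = j
    segments.reverse()
    return segments, best[n]


if __name__ == "__main__":
    # Made-up probabilities for illustration only.
    global_p = {"watching": 0.4, "new": 0.3, "york": 0.2, "marathon": 0.5,
                "new york": 0.9, "new york marathon": 0.6}
    local_p = {"watching": 0.2, "marathon": 0.3,
               "new york": 0.5, "new york marathon": 0.9}
    score = partial(stickiness, global_prob=global_p, local_prob=local_p, lam=0.5)
    segments, total = segment_tweet(["watching", "new", "york", "marathon"], score)
    print([" ".join(s) for s in segments], round(total, 2))
    # prints ['watching', 'new york', 'marathon'] 1.4
```

Because the objective is a plain sum, any scoring function whose single-word scores add up to more than the score of the phrase they form will favour splitting that phrase. The toy values here are hand-picked so that "new york" survives as one segment; a real stickiness function has to handle that calibration itself.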