(I)
TABLE OF CONTENTS
Chapter No. Topics
Student Declaration
Certificate from the Supervisor
Acknowledgement
Summary (Not more than 250 words)
List of Figures
List of Tables
List of Symbols and Acronyms
Chapter-1 Introduction
1.1 General Introduction
1.2 List some relevant current/open problems
1.3 Problem Statement
1.4 Overview of proposed solution approach and Novelty/benefits
1.5 Give tabular comparison of other existing approaches/solutions to the problem framed
Chapter-2 Literature Survey
2.1 Summary of papers studied
2.2 Integrated summary of the literature studied
Chapter-3 Analysis, Design and Modeling
3.1 Overall description of the project
3.2 Functional requirements
3.3 Non Functional requirements
3.4 Logical database requirements
3.5 Design Diagrams
3.5.1 Use Case diagrams
3.5.2 Class diagrams / Control Flow Diagrams
3.5.3 Sequence Diagram/Activity diagrams
Chapter-4 Implementation and Testing
4.1 Implementation details and issues
4.1.1 Implementation Issues
4.1.2 Algorithms (Module wise- with respect to design)
4.2 Risk Analysis and Mitigation
Chapter-5 Testing (Focus on Quality of Robustness and Testing)
5.1 Testing Plan
5.2 Component decomposition and type of testing required
5.3 List all test cases
5.4 Error and Exception Handling
5.5 Limitations of the solution
Chapter-6 Findings & Conclusion
6.1 Findings
6.2 Conclusion
6.3 Future Work
References
Brief Bio-data (Resume)
(II)
DECLARATION
I hereby declare that this submission is my own work and that, to the best of my knowledge and
belief, it contains no material previously published or written by another person nor material which
has been accepted for the award of any other degree or diploma of the university or other institute of
higher learning, except where due acknowledgment has been made in the text.
Place: Noida Signature:
Date: 04-06-2015 Name:Utkarsh
Enrollment No:9911103587
(III)
CERTIFICATE
This is to certify that the work titled Sentiment Analysis of Opinions submitted by Utkarsh in
partial fulfillment for the award of degree of B. Tech of Jaypee Institute of Information
Technology University, Noida has been carried out under my supervision. This work has not been
submitted partially or wholly to any other University or Institute for the award of this or any other
degree.
Signature of Supervisor
Name of Supervisor Mr. Sudhanshu Kulshrestha
Designation Asst. Prof., Deptt. of CSE
Date 04-06-2015
(IV)
ACKNOWLEDGEMENT
The satisfaction which accompanies the successful completion of any project is incomplete without
the mention of names of those who made it possible, because success is the epitome of hard work,
perseverance, undeterred courage, zeal, determination and the most encouraging guidance and
advice which serve as the beacon light and crown our effort with success.
I would like to thank Mrs. Shelly Sachdeva (Project Coordinator) for her constructive instructions
and appreciation.
I am deeply indebted to Mr. Sudhanshu Kulshrestha (Project Guide) for his constant guidance,
constructive counselling and unfailing encouragement throughout the completion of this project.
I would also like to thank the other faculty members of the CSE department for their continuous
help in the effective implementation of this project and in finalizing this project report.
Lastly, I thank my family for their support and encouragement.
Signature of the Student
Name of Student Utkarsh
Enrollment Number 9911103587
Date 04-06-2015
(V)
SUMMARY
Sentiment analysis of opinions is a task in Natural Language Processing, a field in which many
problems remain open. The problem addressed in this project is to detect the sentiments in a given
text and output a score for its overall sentiment. The input is one or more opinions written as text
in simple English. The score is positive if the overall sentiment of the text is positive, negative if
it is negative, and zero if it is neutral. The project follows a linguistic approach and uses the
open-source Python library NLTK. Other available methods, such as Naive Bayes classifiers, can
also be used to detect and report sentiments. The project is written in Python 2.7 with IDLE as the
editor; the tkinter module is used to read the text input and to display the output in a separate
window.
__________________ __________________
Signature of Student Signature of Supervisor
Name: Utkarsh Name :Mr.Sudhanshu Kulshrestha
Date:04 - 06 - 2015
(VI)
LIST OF FIGURES
Sentiment Analysis
Use case diagrams
Class diagrams
Activity or Flow diagrams
IG diagram
(VII)
LIST OF TABLES
Tables
1. Tabular comparison of other existing approaches/solutions to the problem framed
3. Risk analysis and mitigation plan
4. Component testing
5. Top risk on the basis of IG diagram
6. Mitigation Approaches
7. Additional resources needed for mitigation
8. Testing Plan
9. Testing Team Details
10. Test Environment
11. Component decomposition and type of testing
12. List of all test cases
(VIII)
LIST OF SYMBOLS & ACRONYMS
1. NLTK: Natural Language Toolkit
2. NLP: Natural Language Processing
3. IDLE: Python's Integrated Development and Learning Environment (the editor used)
4. IDE: Integrated Development Environment
5. OS: Operating System
6. DOS: Disk Operating System
Chapter-1 Introduction
1.1 General Introduction
Sentiment analysis is a part of natural language processing (NLP). NLP is used in data analytics to
extract meaningful information from large volumes of text. It is one way to learn about current
trends, that is, what people are thinking or talking about on social media. Practical applications
include finding out which party is gaining popularity in an election, or helping a customer who
reads reviews before buying something online. Such tasks become harder as the volume of data
keeps increasing. A large part of the effort goes into arranging the data into something meaningful
before analysing it. This arrangement step is called text classification.
In this project, sentiment classification and analysis are performed in Python using the NLTK
module, a Python library for natural language processing tasks. NLTK supports multiple languages,
such as English, Hindi and Chinese, for classifying text or data into something meaningful.
Text classification can be performed in the following ways:
1. Sentiment classification
2. Feature-based sentiment classification
3. Summarization of sentiments
Sentiment classification classifies the complete document according to the sentiments or opinions
expressed in the text. The feature-based approach, in contrast, classifies sentiments by the
attributes of the entities (nouns) mentioned in the text, and so reveals what is good or bad about a
particular entity from the details given with it. Opinion summarization is similar to text
summarization, but it gives a clear indication of the sentiment attached to the text. Its output is
not a substring of the given text; instead it describes the entities in positive or negative words, so
that a whole document can be described in a few words without losing its essence. These kinds of
classification can be carried out before the text is actually analysed. After classification, the words
are tagged.
Sentiment classification can be performed at different levels:
1. Document level
2. Sentence level
3. Word level
English is one of the most widely used languages in natural language processing. This project
works only on opinions written in English and does not support any other language.
Consider two examples:
"I watched the movie burger. The movie was very good and the actor did an awesome job."
"When Modi returned from U.S.A., I got my 15 lakhs as promised by PM Modi"
The first clearly speaks about the movie and the actor and states a positive review. The second is
sarcastic, and the sentiment classifier is still not able to classify sarcasm. Sarcasm remains a big
problem for data analytics and a topic of research, and handling it by machine is much harder.
Two approaches are used to perform sentiment analysis:
1. Linguistic approach
2. Machine learning
1.2 Current Open Problems/Issues
1. Linguistic approach:
This is the basic approach to sentiment analysis. It tags the tokens and then analyses them.
The problems with this approach are:
• Negation: The approach does not handle negation well. It sometimes produces a result
opposite in sense to the intended one.
• Grammatically incorrect sentences: The approach matches words against datasets during
tagging. If a sentence with polarity is grammatically incorrect, it may not be possible to
match it against the existing datasets of polar words. To be effective, the datasets must
contain all the polar words used in everyday language.
• Indirect meaning: Users sometimes say one thing but mean another. Such sentences make
sense to a human reader but cannot be analysed by a machine.
2. Machine learning approach: Classification methods such as Naive Bayes or SVM also suffer
from problems such as:
1. Sarcasm
2. Jumbling of words
3. Chat text or tweets: limited number of words to type
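The negation problem listed for the linguistic approach can be illustrated with a minimal sketch. The word list and the `naive_score` helper below are hypothetical stand-ins for the project's datasets and code; the only point is that summing per-word polarity ignores a preceding "not".

```python
# Hypothetical mini-lexicon; the project's real datasets are separate files.
POLARITY = {"good": 1, "awesome": 1, "bad": -1, "boring": -1}

def naive_score(text):
    """Sum per-word polarity, ignoring context such as negation."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(POLARITY.get(word, 0) for word in words)

print(naive_score("The movie was good"))      # 1
print(naive_score("The movie was not good"))  # 1 as well: the negation is missed
```

Handling this case would need an extra rule, for example flipping the sign of a polar word that directly follows a negation word.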
1.3 Problem Statement
The task is to take the given text as input, analyse it, and show the polarity score of the text. A
score greater than zero means the sentiment is positive, a score less than zero means it is negative,
and a score of zero means it is neutral.
The input is taken from the user as a string. The approach used in this project then splits
(tokenizes) the text into tokens. When tokenization is complete, each token is tagged and then
evaluated. This produces an integer score, which is converted to a string and displayed on the
screen.
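The mapping from score to polarity, and the integer-to-string conversion before display, can be sketched as follows. The function names and the output wording are illustrative assumptions, not taken from the project's source.

```python
def polarity_label(score):
    """Map an integer sentiment score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def format_score(score):
    """Convert the integer score to the string that would be displayed."""
    return "Sentiment score: %d (%s)" % (score, polarity_label(score))

print(format_score(3))   # Sentiment score: 3 (positive)
print(format_score(0))   # Sentiment score: 0 (neutral)
```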
1.4 Overview of proposed solution approach and Novelty/benefits
The proposed solution is built in Python 2.7, using the nltk module for text processing and tkinter
to take input from the user and display the output. It is based on the linguistic approach: it takes
the string, tokenizes it, and matches the tagged tokens against the datasets stored with the project.
Finally, it evaluates the score of each polar word and calculates the score for the given text.
The source file is divided into the following classes:
• splitter_class: The input is a string, and a whole paragraph cannot be evaluated as it is. This
class first splits the text into tokens (words) using the tokenizer functions of the nltk
module.
• pos_tagger_class: Once splitting is done, this class adds a tag to each token so that the
tokens can be classified as verbs, nouns, adverbs, adjectives and so on. It returns the tagged
sentences.
• dictionary_tagger_class: This class uses the datasets supplied with the project to build a
dictionary for the tokens produced by splitter_class, and returns the tokens with their
dictionary tags.
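The three-stage pipeline described above might be outlined as below. This is a sketch under stated assumptions, not the project's code: the class and method names only mirror the report, the regular-expression tokenizer and the `TOY_TAGS` lookup table are stand-ins for NLTK's tokenizers and part-of-speech tagger (for example `nltk.word_tokenize` and `nltk.pos_tag`), and the polarity dictionary is hypothetical.

```python
import re

class Splitter(object):
    """Split a paragraph into sentences, and each sentence into word tokens."""
    def split(self, text):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        return [re.findall(r"[A-Za-z']+", s) for s in sentences]

class POSTagger(object):
    """Attach a part-of-speech tag to every token (toy lookup, not a real tagger)."""
    TOY_TAGS = {"movie": "NN", "actor": "NN", "good": "JJ",
                "awesome": "JJ", "very": "RB", "was": "VBD"}

    def pos_tag(self, sentences):
        return [[(tok, self.TOY_TAGS.get(tok.lower(), "UNK")) for tok in sent]
                for sent in sentences]

class DictionaryTagger(object):
    """Add polarity tags taken from a word -> tags dictionary."""
    def __init__(self, dictionary):
        self.dictionary = dictionary

    def tag(self, pos_tagged_sentences):
        return [[(tok, [pos] + self.dictionary.get(tok.lower(), []))
                 for tok, pos in sent]
                for sent in pos_tagged_sentences]

text = "The movie was very good. The actor did an awesome job."
sentences = Splitter().split(text)
pos_tagged = POSTagger().pos_tag(sentences)
tagged = DictionaryTagger({"good": ["positive"], "awesome": ["positive"]}).tag(pos_tagged)
print(tagged[0][4])  # ('good', ['JJ', 'positive'])
```

Keeping the stages as separate classes means each one can be replaced on its own, for instance swapping the toy tagger for a trained one without touching the dictionary lookup.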
1.5 Tabular comparison of other existing approaches/solutions to the problem framed
Approach | Description
Linguistic approach | Simple approach, easy to code, and gives good results with simple texts.
Naive Bayes classifier approach | Machine learning approach. Works by first learning from part of the text and then evaluating the rest of the text based on the learning outcome. Outperforms the linguistic approach.
Support Vector Machine approach | Machine learning approach. Works by analysing the data and finding patterns in it to evaluate. Gives a better classification and better results.
Chapter-2 Literature Studied
2.1 Summary of Papers
• Paper-1: Sentiment Analysis And Opinion Mining
By: G. Vinodhini and RM. Chandrasekaran [June 6, 2012]
Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar-608002.
Summary
The internet today holds a large and constantly growing volume of data from social networks,
news, entertainment, reviews, blogs and discussion forums, which together provide a large number
of opinions. Data analytics focuses on these opinions for sentiment analysis. Researchers are
currently working to build software that detects and classifies the texts available online. Precise
information extracted from these resources can tell us a lot about what users like or dislike and
what they do or do not want to buy. Sellers can use this information to offer better deals to users,
and in the case of reviews it can help users find better deals. Once classified and analysed, the data
available on the internet can be very valuable to users.
This paper is a survey of the methods used in data analytics and of the problems that exist in the
area of data analytics and sentiment analysis.
Weblink: http://www.dmi.unict.it/~faro/tesi/sentiment_analysis/SA2.pdf
• Paper-2: Boost up! Sentiment Categorization with Machine Learning Techniques
By: Andrés Cassinelli, Chih-Wei Chen [June 5, 2009]
Summary
For calculating the sentiment of a given text, opinion or review, the paper notes that its methods
give an analysis close to that of earlier work on sentiment analysis of reviews, while working more
precisely. If these methods are applied to multi-class classification, the results are expected to be
much the same. The classification techniques first use part of the data as a training set and then
evaluate the rest of the data, so the technique described in the paper captures the relationship
between the objects efficiently.
Weblink: http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
• Paper-3: Twitter as a Corpus for Sentiment Analysis and Opinion Mining
By: Alexander Pak, Patrick Paroubek [2010]
Université de Paris-Sud, Laboratoire LIMSI-CNRS, Bâtiment 508, F-91405 Orsay Cedex, France
Summary
Social networking sites such as Twitter, Facebook, Google Plus and LinkedIn are popular tools for
communicating with other people on the internet. Thousands of people share information with
each other. This information may be useful to some and worthless to others. If properly analysed,
the data can be very useful; it may take the form of opinions or of results relevant to others. These
sites can therefore be very effective in generating useful information about many aspects of
modern life. However, comparatively little work has been done so far, because these sites came
into existence only recently. In this paper, the authors describe in detail how Twitter, one of the
most popular social networks at present, can be used for sentiment analysis.
Weblink: http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
2.2 Integrated Summary of the literature studied
Sentiment analysis is currently one of the popular topics in research. Work is going on in this area
for languages that have not been studied much until now, such as Arabic, Hindi and Thai. Open-source
libraries are available for programming languages such as Python and R, which make it
easier to analyse and process text. Sentiment analysis can be used for many purposes, such as
reviewing movies, a company's products or the company itself, or gauging the feelings of citizens
about a country. The most popular way to obtain this information is to collect it from social media
and analyse it. To turn it into something meaningful, classification techniques must be used.
The data must be in a readable format and in English. Classifiers are used to tokenize or classify
the data. Supervised learning is used in the machine learning approach: the classifier learns to
detect sentiments from one part of the text and then analyses the sentiments of the rest. The
linguistic approach is unsupervised: the text is first split into tokens, which are then tagged in
order to evaluate the sentiment of the text.
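The supervised idea described above (learn from one labelled part of the text, then evaluate the rest) can be sketched with a very small Naive Bayes classifier. This is a toy illustration under stated assumptions, not the project's method: the training sentences are invented, tokenization is a plain whitespace split, and add-one (Laplace) smoothing is assumed.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_texts):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in labelled_texts:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return {"word_counts": word_counts,
            "label_counts": label_counts,
            "vocabulary": vocabulary}

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_logp = None, float("-inf")
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Prior probability of the label.
        logp = math.log(doc_count / float(total_docs))
        # Add-one smoothing keeps unseen words from zeroing the probability.
        for word in text.lower().split():
            logp += math.log((counts[word] + 1.0) / (total_words + vocab_size))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Invented training data, for illustration only.
training = [
    ("good awesome movie", "positive"),
    ("great acting good story", "positive"),
    ("bad boring movie", "negative"),
    ("terrible plot bad acting", "negative"),
]
model = train_naive_bayes(training)
print(classify(model, "good story"))   # positive
print(classify(model, "boring plot"))  # negative
```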
Sources of large amounts of data to evaluate:
• Social sites
1. Facebook.com
2. Twitter.com
3. LinkedIn.com
• News websites and comments
• Movie review sites
• Product selling sites
1. Flipkart
2. Snapdeal
• Blogs etc.
Techniques used at present:
• Machine learning
1. Naive Bayes classifier
2. Support Vector Machine
3. Decision tree
Text structure:
• An array of sentences
• Each sentence is in turn split into tokens
• Each word or token is padded with 2 other tags in dictionary format. These added tags allow
each token to be recognized as a verb, noun, adjective, adverb etc., and are used to check
whether the token is a polar word or not.
• Separate datasets are kept so that each token can be matched against the words present in
them.
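As an illustration of the structure listed above, a tagged text might be held in nested lists as in this sketch. The exact layout and the tag names ("JJ", "positive", "inc") are assumptions made for the example, not the project's documented format.

```python
# One inner list per sentence; each token is a (word, [tags]) pair.
tagged_text = [
    [("The", ["DT"]), ("movie", ["NN"]), ("was", ["VBD"]),
     ("very", ["RB", "inc"]), ("good", ["JJ", "positive"])],
    [("The", ["DT"]), ("actor", ["NN"]), ("did", ["VBD"]), ("an", ["DT"]),
     ("awesome", ["JJ", "positive"]), ("job", ["NN"])],
]

def polar_tokens(tagged_sentences):
    """Return the words that carry a 'positive' or 'negative' tag."""
    return [word
            for sentence in tagged_sentences
            for word, tags in sentence
            if "positive" in tags or "negative" in tags]

print(polar_tokens(tagged_text))  # ['good', 'awesome']
```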
Collecting the data is the first concern: useful data is required before any analysis can be done.
Sentiment analysis is performed on data about a product or a review, where the user wants to
know whether the item is good or not. Sentiments can carry various types of polarity or emotion
about a particular thing.
Summarizing opinions is also a major concern for today's researchers. Summarizing sentiments
does not mean printing a subset or one part of the text. It means expressing the data precisely in
fewer words, while retaining the subject of the text.
[Fig. 1: Sentiment classification and analysis. Flow: Product Reviews -> Sentiment identification -> (opinionative words or phrases) -> Feature selection -> (features) -> Sentiment classification -> Sentiment polarity]
Chapter-3 Analysis, Design and Modeling
3.1 Overall description of the project
3.1.1 Introduction
This software was built on a 64-bit Windows 8 platform using the 32-bit version of Python 2.7.
It uses the "nltk" module, which can be downloaded from nltk.org. The input section uses the
tkinter module to read the input and to display the output, so tkinter must be installed before
the software can run on any system. All the setups listed above are available free of charge
from their official websites.
• Purpose
This software can be used by anyone who wants to classify movie reviews, product
reviews or any other opinions as positive or negative.
• Scope
The opinions must be written in English using simple words. Other languages are not
supported. Sarcasm and negation may not be handled well, in which case the result may
vary or be unexpected.
• Product perspective
The software does not depend on any hardware or software other than the resources
provided by the system. A Python setup with the nltk and tkinter modules does all the
work required.
• Product functions
The software takes a string typed by the user and produces the sentiment score. The
user types a string and waits for the output. Processing may take some time, depending
on the size of the text typed by the user.
• User characteristics
The user can be anyone who wants to analyse data on the basis of the polarity of its
sentences. The software works the same way for every user, and the execution time of
text processing depends on the size of the data supplied.
• Constraints
The user must know English and know how to install the Python setups. Once these are
installed, no prerequisite knowledge is needed to use the software. The hardware
configuration requirements must be met.
• Assumptions and dependencies
The system must support 32-bit Python 2.7. The code may not work with Python 3.x,
where the Tkinter module is renamed tkinter and the syntax has changed. The Windows
platform must be XP, Vista, 7 or 8. At least 512 MB of memory is required. The system
handles text files.
3.2 Functional Requirements
Sentiment analysis has to be performed on text in English, and the output is one of:
a. Positive
b. Negative
c. Neutral (zero)
3.3 Non Functional Requirements
• Data selection: Data can be downloaded from the Stanford site, from various user review
sites or from social networks. Movie reviews and product reviews must be checked
against the separate datasets listed in the database.
• Accessibility: To access the data distributed with nltk, run nltk.download() in IDLE.
• Documentation: Each file contains comments that explain the code.
• Maintainability: The code does not need maintenance unless it is altered.
• Portability: The user only needs to run the .py file on a system to analyse
reviews/opinions.
• Reliability: It depends on the language structure of the opinions.
• Response time: Long reviews take more time to pre-process and tokenize.
3.4 Logical database requirements
For the database, separate files with the extension .yaml are supplied in a separate folder alongside
the source code. YAML maps easily onto data structures that are common to many programming
languages, such as arrays and dictionaries. No SQL or other database concepts are used in the
project. The dataset files are linked to the source code by their directory/file paths and read using
Python file handling.
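The report does not show the layout of the .yaml dataset files. As a sketch only, assume each line holds one flat entry of the form `word: [tag]`; the sample contents and the `load_flat_dictionary` helper below are hypothetical. In practice a YAML library would normally do the parsing (for example PyYAML's `yaml.safe_load`); the hand-written reader here only keeps the example free of third-party dependencies.

```python
def load_flat_dictionary(lines):
    """Parse lines such as 'good: [positive]' into {'good': ['positive']}."""
    dictionary = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, _, raw_tags = line.partition(":")
        tags = [t.strip() for t in raw_tags.strip().strip("[]").split(",") if t.strip()]
        dictionary[word.strip()] = tags
    return dictionary

# Hypothetical contents of a dataset file, given inline instead of read from disk.
sample = """\
# positive words and intensifiers (illustrative)
good: [positive]
awesome: [positive]
very: [inc]
""".splitlines()

words = load_flat_dictionary(sample)
print(words["good"])  # ['positive']
```

With a real file, the same function could be fed the result of `open(path).readlines()`, where `path` is the dataset path given to the dictionary tagger.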
3.5 Design Diagrams
• 3.5.1 Use case diagram
[Figure: use case diagram. Steps between the user and the background processing:
1. The user inputs a string
2. The system takes the input
3. The user presses Enter
4. Background processing starts: processing the data, tokenization
5. The user waits for the output
6. The output screen appears with the sentiment score]
• 3.5.2 Class diagrams / Control flow diagrams
[Figure: class diagram. The classes are shown together with Python's object class.
Splitter_class: +init(), +split()
Pos_Tagger_Class: +init(), +pos_tag()
Dictionary_tagger_Class: +init(), +tag(), +tag_sentence()]
• 3.5.3 Sequence diagram / Activity diagrams [3]
Chapter-4 Implementation details and issues
4.1 Implementation details and issues
The implementation is done in Python 2.7 using the nltk and tkinter modules. NLTK, which is
open source, is used for text processing. It provides many corpora for data analytics. These corpora
can be used to build grammars or taggers, which in turn can be used to tag the tokens and produce
well-classified data.
To download corpora such as chats, books or novels for use in data analytics, run nltk.download()
in the Python editor. This opens the NLTK downloader, from which the data required for sentiment
analysis can be fetched and then used after "import nltk".
The code uses Python file handling, so check the paths carefully first. All the dataset files must be
in place, and their path names must be given to dictionary_tagger_class.
Python 3.x may not be compatible with this code, because many functions, including the tkinter
module, changed in the 3.x versions of Python. The code contains 3 classes:
• splitter_class: splits the text into tokens
• pos_tagger_class: tags the tokens
• dictionary_tagger_class: turns the tagged tokens into a dictionary data type
4.1.1 Implementation Issues
Finding compatible functions in the nltk module and in the HTML parsing functions were
among the issues faced in the project. There are many differences between Python 2.7 and
3.x, so keeping the syntax compatible with one version was another issue. Tkinter also
differs between Python 2.7 and Python 3.x, because of the changes introduced in Python 3.x.
4.1.2 Algorithms (Module wise- with respect to design)
The first module copies content from the web in order to download the reviews.
The second module tokenizes the text and converts it into lists of strings.
The third module tags the tokens with accurate tags.
The fourth module handles files, linking the dataset files to the source code.
The fifth module builds the dictionary-tagged data members for the text tokens.
The sixth module displays the polar words found in the text together with the result.
For input and output, Tkinter is used with Python. It takes the input and passes it to the
sentiment analysis code; after processing, that code returns the sentiment score, which is
displayed on the screen using Tkinter. Tkinter is a separate Python module.
The approach is the linguistic approach. The text is first tokenized using the tokenizer
function, and tags are then added to the tokens. The tokens are matched against the existing
datasets, which are stored separately in .yaml files. If a token is found, the tag attached to it
is examined, and on the basis of that tag the token is judged positive or negative. If the token
is not found in the datasets, it is treated as neutral. Adjectives or adverbs increase the score
in the direction of the polarity of the words.
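The scoring rule in the last paragraph can be sketched as follows. The tag names, the "inc" tag for intensifying adverbs and the doubling factor are assumptions chosen for illustration; the report does not state the exact values used.

```python
# Assumed tag values: +1 for a positive word, -1 for a negative word.
POLARITY = {"positive": 1, "negative": -1}

def sentence_score(tagged_tokens):
    """Score a list of (word, tags) pairs.

    A token tagged 'inc' (an intensifier such as 'very') doubles the value
    of the token that immediately follows it. Tokens without a polarity
    tag count as neutral (0).
    """
    score = 0
    multiplier = 1
    for word, tags in tagged_tokens:
        if "inc" in tags:
            multiplier *= 2
            continue
        value = sum(POLARITY.get(tag, 0) for tag in tags)
        score += multiplier * value
        multiplier = 1  # the boost applies only to the next token
    return score

review = [("The", []), ("movie", []), ("was", []),
          ("very", ["inc"]), ("good", ["positive"])]
print(sentence_score(review))  # 2
```

Because the multiplier scales the signed value, an intensifier pushes a negative word further below zero in the same way it pushes a positive word above it.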
4.2 Risk Analysis and Mitigation
Risk Id | Description of risk | Risk area | Probability (P) | Impact (I) | RE (P*I) | Risk selected for mitigation (Y/N) | Mitigation plan | Classification
1 | Memory overflow/underflow | Memory | 0.001 | L | 0.001 | Y | Try/catch block | Code and unit test
2 | Invalid input (not string) | Conversion of data type problem, too large numbers, passing string of greater size than allowed | 0.3 | L | 0.3 | N | - | Code and unit test
7 | Improper use of function (not passing required parameters) | Prototyping | 0.3 | M | 0.9 | N | - | Coding implementation
3 | Performance | Time of execution | 0.3 | M | 0.9 | N | - | Development process
4 | Compiler not working | Compiler problem | 0.001 | L | 0.001 | Y | Re-install/re-open | Environment and test
5 | Code not working | Code altered | 0.3 | M | 0.9 | N | - | Engineering specialities
6 | Unwanted output | Code altered | 0.1 | L | 0.1 | N | - | Engineering specialities
Interrelationship Graph
[Figure: interrelationship graph (IG). Nodes and weights: Memory (wt: 0.001), Code not working (wt: 0.9), Performance (wt: 0.9), Unwanted output (wt: 0.1), Prototyping (wt: 0.9), Compiler problem (wt: 0.001), Data type/range (wt: 0.3)]
S.No | Risk area | # of risk statements | Weights (in+out) | Total weight | Priority
1 | Code altered | 4 | 0.1+0.1+0.1+0.9 | 1.2 | High
2 | Memory | 2 | 0.001+0.3 | 0.301 | Low
3 | Data type/range | 2 | 0.3+0.9 | 1.2 | High
4 | Performance | 1 | 0.1 | 0.1 | Low
5 | Prototyping | 2 | 0.3+0.9 | 1.2 | High
6 | Compiler problem | 1 | 0.9 | 0.9 | Medium
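The arithmetic behind the two risk tables can be checked with a short sketch. The numeric values for the impact levels (L = 1, M = 3) are not stated in the report; they are an inference from the rows of the risk table (for example 0.3 with impact M gives 0.9) and should be read as an assumption. The function names are illustrative.

```python
# Assumed numeric values for the impact levels (inferred, not stated in the report).
IMPACT_WEIGHT = {"L": 1, "M": 3}

def risk_exposure(probability, impact):
    """Probability multiplied by the numeric impact, rounded for display."""
    return round(probability * IMPACT_WEIGHT[impact], 3)

def total_weight(edge_weights):
    """Sum of the incoming and outgoing weights of one risk area."""
    return round(sum(edge_weights), 3)

print(risk_exposure(0.3, "M"))             # 0.9
print(risk_exposure(0.001, "L"))           # 0.001
print(total_weight([0.1, 0.1, 0.1, 0.9]))  # 1.2
print(total_weight([0.001, 0.3]))          # 0.301
```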
Top risks, taken as the ones with the maximum total weight in the graph:
Risk Id | Risk statement | Risk area | Priority of risk area in IG
1 | Code not working/unwanted output | Code altered | 1
Mitigation Approaches
Use a try/except block (Python's form of try/catch) to handle invalid input.
Make function definitions private.
For a compiler problem, re-install or re-open it, or check the Python path in the environment
variables.
For unwanted output, check the range of the input values or the prototypes of the functions.
Date started | Date to complete | Owner
1-May-2015 | 15-May-2015 | Utkarsh
Additional resources needed for mitigation
Copy the source code for backup.
Chapter-5 Testing (Focus on Quality of Robustness and Testing)
5.1 Testing Plan
The source code for sentiment analysis was checked with different reviews taken from different
sites. A test file is maintained for this purpose in a separate folder, and its output is also saved.
The types of testing performed are listed here:
Type of test | Will test be performed? | Comments/explanation | Software component
Requirement testing | Yes | - | -
Unit | Yes | Listed in first program | Source files
Integration | Yes | Linked with source file using file handling | Database files
Performance | Yes | Depends on the execution of text input | Length of text in tkinter
Stress | Yes | - | Compiled py files
Compliance | No | - | -
Security | No | Not hidden | Dot py file for implementation
Load | No | - | -
Volume | No | - | -
Example test cases | Yes | A number of test cases are written in the main file and added with the datasets | Main files and datasets
Compilation | Yes | For syntactical errors | Python source files
Test Team Details
Tester | Utkarsh | Performed all the test cases
Test Schedule
Activity | Start date | Completion date | Hours | Comments
Obtain input data | 01/05/2015 | 10/05/2015 | 3 hours/day | Input taken from various sources on internet
Test region setup | 11/05/2015 | 15/05/2015 | 3 hours/day | Input taken from various sources on internet
TEST ENVIRONMENT - Description of test platforms
Software items: Windows 8 operating system, Notepad, Python editor and compiler, tkinter and nltk
Hardware items: A complete system with pre-installed software for running Python programs and
the nltk and tkinter modules
5.2 Component decomposition and type of testing required
S.No | List of various components | Type of testing required | Technique of writing test cases
1 | TEST1 | Integration | White box
2 | TEST2 | Performance | Black box
3 | TEST3 | Example test cases | Black box
4 | TEST4 | Compilation | White box
5.3 List all test cases in prescribed format
Test cases for component
Test case Id | Input | Output | Status
TEST1 | Linked with file | Console output score | Pass
TEST2 | Datasets | Console output score | Pass
TEST3 | Numbers | Integral | Fail
TEST3 | String | Score | Pass
TEST3 | Review from online site | Score | Pass
TEST4 | Example test cases linked with separate files | Console output | Pass
5.4 Error and Exception Handling (mention debugging techniques with which you have corrected errors)
Test case id | Test case for component | Debugging technique
1 | Tkinter | Print or tracing
2 | Source code | Backtracking
5.5 Limitations of the solution
The source code does not work well for the following cases:
• Grammatically ill-formed sentences.
• Sentences containing sarcasm.
• Negation, which may not be handled well by the source code.
• Very large texts (text files with megabytes of data), which Python takes a long time to process.
• Jumbled words in sentences.
Chapter-6 Findings & Conclusion
6.1 Findings
The sentiment analysis works well for simple English, but not for any other language. Sentences
must be simple and straightforward, because the code does not handle cases such as jumbled
words or sarcastic sentences. Input can be taken through tkinter in text format, and the output is
displayed in the same way. The nltk module works really well for natural language processing. It
also provides other techniques for classifying text, such as the Naive Bayes classifier or SVM.
NLTK includes different kinds of tagging functions for adding tags to tokens.
6.2 Conclusion
The approach used in this project works efficiently with plain English text. It is easy to code and
simple to understand, and it does not require constructing regular expressions. Built-in taggers are
available that can be used directly on the text. To make the analysis more effective, different
techniques can be combined. A Naive Bayes classifier or an SVM can work better on complex
sentences.
6.3 Future Work
• Use machine learning techniques such as supervised learning: train on one part of the text
and use this training to analyse the rest.
• Combine different techniques and study the results of the combined approach.
• Extend this work to other languages, such as Hindi.
• Construct a regular grammar and generate custom regular expressions to make the tagging
step more efficient.
References
[1] http://en.wikipedia.org/wiki/sentiment-analysis1
[2] http://inltk.org
[3] http://marl.gi2mo.org/img/class_diagram_v0.2.png
[4] http://www.nltk.org/books
[5] http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
[6] https://wiki.python.org/moin/TkInter
[7] www.tutorialspoint.com/python/python_sending_email.htm
Appendix
A. Time Line
[Figure: project timeline. Date marks (dd-mm): 01-02, 04-03, 20-03, 25-04, 10-05, 25-05, 04-06. Activities, in order: Synopsis; Study research papers and Implementation; Midterm report; Implementation; Testing; Report]
Resume
Utkarsh
Date of Birth: 15-08-1993
E-Mail: soniutkarsh@ymail.com
Phone No.: +91-8468088422 Codechef Profile:Utkarsh3587
Interests:
 Data Structures
 Algorithms
 Operating Systems
 Object Oriented Programming
Education:
 B.Tech., Computer Science & Engineering-2015
Jaypee Institute of Information Technology , Noida
4th year (7th Semester), Current CGPA: 6.2/10.
 Senior Secondary-2010
Sardar Patel Public Senior Secondary School , Delhi
CBSE with 74.6% .
 Secondary-2008
Sardar Patel Public Senior Secondary School , Delhi
CBSE with 83.8%.
Skillset:
Programming Languages: C , C++
Operating Systems : Ubuntu , Windows
Web Technologies: HTML, CSS, JavaScript
Projects:
 Hybrid Cross Platform Application
This Project was done on PhoneGap Platform using web technologies like html, css and java
script. Under this project I have implemented some functionalities like downloading study
material, playing quizzes , reading newspaper and few other functions etc.
 Face Recognition Application using OpenCV for Android
It was an android application project based on Image Processing using OpenCV libraries. It
detects faces and recognizes them on the basis of stored images.

Project report

  • 1. 1 (I) TABLE OF CONTENTS Chapter No. Topics Page No. Student Declaration II Certificate from the Supervisor III Acknowledgement IV Summary (Not more than 250 words) V List of Figures VI List of Tables VII List of Symbols and Acronyms VIII Chapter-1 Introduction 10-13 1.1 General Introduction 1.2 List some relevant current/open problems. 1.3 Problem Statement 1.4 Overview of proposed solution approach and Novelty/benefits 1.5 Give tabular comparison of other existing approaches/ solution to the problem framed Chapter-2 Literature Survey 14-17 2.1 Summary of papers studied 2.2 Integrated summary of the literature studied Chapter 3: Analysis, Design and Modeling 18-21 3.1 Overall description of the project 3.2 Functional requirements 3.3 Non Functional requirements 3.4 Logical database requirements 3.5 Design Diagrams 3.3.1Use Case diagrams
  • 2. 2 3.3.2 Class diagrams / Control Flow Diagrams 3.3.3 Sequence Diagram/Activity diagrams Chapter-4 Implementation and Testing 22-25 4.1 Implementation details and issues 4.1.1 Implementation Issues 4.1.2 Algorithms (Module wise- with respect to design) 4.2 Risk Analysis and Mitigation Chapter-5 Testing (Focus on Quality of Robustness and Testing) 26-28 5.1 Testing Plan 5.2 Component decomposition and type of testing required 5.3 List all test cases 5.4 Error and Exception Handling 5.5 Limitations of the solution Chapter-6 Findings & Conclusion 29-29 5.1 Findings 5.2 Conclusion 5.3 Future Work References 30-30 Brief Bio-data (Resume) 31-31
  • 3. 3 (II) DECLARATION I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgment has been made in the text. Place: Noida Signature: Date: 04-06-2015 Name:Utkarsh Enrollment No:9911103587
  • 4. 4 (III) CERTIFICATE This is to certify that the work titled Sentiment Analysis of Opinions submitted by Utkarsh in partial fulfillment for the award of degree of B. Tech of Jaypee Institute of Information Technology University, Noida has been carried out under my supervision. This work has not been submitted partially or wholly to any other University or Institute for the award of this or any other degree. Signature of Supervisor Name of Supervisor Mr. Sudhanshu Kulshrestha Designation Asst. Prof., Deptt. of CSE Date 04-06-2015
  • 5. 5 (IV) ACKNOWLEDGEMENT The satisfaction which accompanies the successful completion of any project is incomplete without the mention of names of those who made it possible, because success is the epitome of hard work, perseverance, undeterred courage, zeal, determination and the most encouraging guidance and advice which serve as the beacon light and crown our effort with success. I would like to thank Mrs. Shelly Sachdeva(Project Coordinator) for constructive instructions and appreciation. I am deeply indebted to Mr. Sudhanshu Kulshrestha(Project Guide) for his constant guidance, constructive consoling and unfailing encouragement throughout the completion of this project. I would also like to thank other faculty of CSE department for their continuous help for effective implementation of this project and also for finalization of this project report. Lastly, I thank our families for their support and encouragement. Signature of the Student Name of Student Utkarsh Enrollment Number 9911103587 Date 04-06-2015
  • 6. (V) SUMMARY
Sentiment analysis of opinions is one of the tasks in Natural Language Processing, and various open problems exist in this field of study. The problem addressed in this project is to detect sentiments in a given text and output a score for its overall sentiment. The input is an opinion given as text in simple English. The score is positive if the overall sentiment of the text is positive, negative if it is negative, and zero if it is neutral. The project is based on the linguistic approach and uses modules of the open source Python library NLTK. Other available methods, such as naive Bayes classifiers, can also be used for detecting sentiments. The project is written in Python 2.7 using IDLE as the editor, and the tkinter module is used to take text as input and to display the output in a separate window.
Signature of Student: __________________
Signature of Supervisor: __________________
Name: Utkarsh
Name: Mr. Sudhanshu Kulshrestha
Date: 04-06-2015
  • 7. (VI) LIST OF FIGURES
Sentiment Analysis: page 17
Use case diagrams: page 20
Class diagrams: page 20
Activity or flow diagrams: page 21
IG diagram: page 24
  • 8. (VII) LIST OF TABLES
Table | Page Number
1. Tabular comparison of other existing approaches/solutions to the problem framed | 15
3. Risk analysis and mitigation plan | 23
4. Component testing | 24
5. Top risk on the basis of IG diagram | 24
6. Mitigation approaches | 25
7. Additional resources needed for mitigation | 25
8. Testing plan | 26
9. Testing team details | 27
10. Test environment | 27
11. Component decomposition and type of testing | 28
12. List of all test cases | 28
  • 9. (VIII) LIST OF SYMBOLS & ACRONYMS
1. NLTK: Natural Language Toolkit
2. NLP: Natural Language Processing
3. IDLE: Integrated Development and Learning Environment (the editor bundled with Python)
4. IDE: Integrated Development Environment
5. OS: Operating System
6. DOS: Disk Operating System
  • 10. Chapter-1 Introduction
1.1 General Introduction
Sentiment analysis is a part of natural language processing (NLP). NLP is used in data analytics to extract meaningful information from large volumes of data. It is one way to learn about current market trends from what people are thinking or saying on social media. Practical applications include finding out which party is gaining popularity in an election, or helping a customer who checks reviews before buying something online. Such tasks get harder as the size of the data keeps increasing. A large part of the effort goes into arranging the data into something meaningful before analysing it. This arrangement of data is called text classification.
Sentiment classification and analysis is performed here in Python using the nltk module. NLTK is a Python module for natural language processing tasks. It supports multiple languages such as English, Hindi and Chinese for classifying text or data into something meaningful.
Text classification can be performed in the following ways:
1. Sentiment classification
2. Feature-based sentiment classification
3. Summarization of sentiments
Sentiment classification classifies the complete document according to the sentiments or opinions in the text. The feature-based approach, however, classifies sentiments by the features of the entity (noun) mentioned in the text. It reveals the good or bad qualities of particular entities from the details given about them. Opinion summarization is similar to text summarization, but it gives a clear indication of the sentiment attached to the text.
Opinion summarization does not output a substring of the given text. Instead, it describes the entities in positive or negative words, so that a whole document can be described in a few words without losing its gist. These types of classification can be performed before actually analysing the text. After text classification, the words are tagged. Sentiment classification can be performed at different levels:
1. Document level
  • 11. 2. Sentence level
3. Word level
English is one of the most preferred languages for natural language processing work. This project handles opinions in English only and does not support other languages. Consider two examples:
"I watched the movie burger. The movie was very good and the actor did an awesome job."
"When Modi returned from U.S.A., I got my 15 lakhs as promised by PM Modi"
The first clearly gives a positive review of the movie and the actor. The second is sarcastic, and the sentiment classifier is still not able to classify sarcasm. Sarcasm remains a big problem for data analytics and a topic of research, and detecting it by machine is much harder. There are two approaches to sentiment classification:
1. Linguistic approach
2. Machine learning
1.2 Current Open Problems/Issues
1. Linguistic approach: This is the basic approach to sentiment analysis. It tags the tokens and then analyses them. The problems with this approach are:
    • Negation: This approach cannot deal with negation very well. It sometimes produces a result opposite in sense to the text.
    • Grammatically incorrect sentences: This approach uses datasets to match words during tagging. If a sentence with polarity is grammatically incorrect, it may not be possible to match it with the existing datasets of polar words. The datasets must contain all the polar words used in regular language to make the approach more efficient.
    • Users sometimes say one thing but mean something else. Such sentences make sense to a reader but cannot be analysed by a machine.
2. Machine learning approach: Methods that classify text, such as naive Bayes or SVM, also suffer from problems like:
1. Sarcasm
2. Jumbling of words
  • 12. 3. Chatting text or tweets: limited words to type
1.3 Problem Statement
To take the given text as input, perform analysis on it and show the polarity score of the input text. A score greater than zero means the sentiment is positive, a score less than zero means it is negative, and zero means it is neutral. The input is taken from the user as a string. The approach used in this project then splits the text into tokens. When tokenization is complete, each token is tagged and then evaluated. This generates an integer score, which is converted to a string and displayed on the screen.
1.4 Overview of proposed solution approach and Novelty/benefits
The proposed solution is built in Python 2.7 using the nltk module, with tkinter used to take input from the user and to display the output. It is based on the linguistic approach. It takes the string, tokenizes it, tags the tokens and matches the tagged tokens against the datasets. Finally it scores each polar word and calculates the score for the given text. The code file is divided into classes:
    • splitter_class: The input is a string, and a whole paragraph cannot be evaluated as it is. This class first splits the text into tokens/words using the tokenizer function of the nltk module.
    • pos_tagger_class: When splitting is done, this class adds a tag to each token so that tokens can be classified as verbs, nouns, adverbs, adjectives and so on. It does the tagging work and returns the tagged sentences.
    • dictionary_tagger_class: This class uses the datasets supplied with the project to build a dictionary of tagged tokens from the tokens produced by splitter_class.
1.5 Tabular comparison of other existing approaches/solutions to the problem framed
  • 13. Approach | Remarks
Linguistic approach | Simple approach, easy to code, good results with simple texts
Naive Bayes classifier approach | Machine learning approach; works by first learning from part of the text and then evaluating the rest based on the learning outcome; outperforms the linguistic approach
Support Vector Machine approach | Machine learning approach; works by analysing the data and finding patterns in it to evaluate; gives a better classification and better results
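The splitting step described for splitter_class in section 1.4 can be illustrated with a minimal sketch. The report's code uses the tokenizer functions of nltk; to stay self-contained, the version below substitutes simple regular expressions, so its token boundaries only approximate what nltk would produce. The function name split_text is hypothetical.

```python
import re

def split_text(text):
    """Split text into sentences, then each sentence into tokens.

    Stand-in for an nltk-based splitter: a sentence ends at '.', '!' or '?',
    and a token is a run of letters, digits or apostrophes, or a single
    punctuation mark.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", s) for s in sentences]

tokens = split_text("The movie was very good. The actor did an awesome job.")
```

The result is a list of sentences, each of which is a list of string tokens, which is the shape the tagging step expects.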
  • 14. Chapter-2 Literature Studied
2.1 Summary of Papers
    • Paper-1: Sentiment Analysis And Opinion Mining
By: G. Vinodhini and RM. Chandrasekaran [June 6, 2012], Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar-608002.
Summary
The large volume of data on the internet today, with constantly updated and growing social networks, news, entertainment, reviews, blogs and discussion forums, provides a large number of opinions. Data analytics focuses on these opinions for sentiment analysis work. Researchers are currently working to build software that detects and classifies the texts available online. Precise information extracted from these resources can tell us a lot about what users like and dislike and what they do or do not want to buy. Other parties can use this information to offer users better deals, and in the case of reviews it can help users find better deals themselves. After classification and analysis, the data available on the internet can be very valuable to users. The paper is a survey of the methods used in data analytics and the problems that exist in the area of data analytics/sentiment analysis.
Weblink: http://www.dmi.unict.it/~faro/tesi/sentiment_analysis/SA2.pdf
    • Paper-2: Boost up! Sentiment Categorization with Machine Learning Techniques
By: Andrés Cassinelli, Chih-Wei Chen [June 5, 2009]
Summary
The paper notes that, for calculating the sentiment of a given text, opinion or review, its methods give results close to those of past work on review and sentiment analysis, with somewhat better precision. If these methods are applied to multi-class classification, the results could be much the same.
When a classification technique is applied to the data, it first uses part of the data as a training set to train itself and then evaluates the rest of the data, so the technique described in the paper captures the relationship between the objects efficiently.
Weblink: http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
    • Paper-3: Twitter as a Corpus for Sentiment Analysis and Opinion Mining
By: Alexander Pak, Patrick Paroubek [2010], Université de Paris-Sud, Laboratoire LIMSI-CNRS, Bâtiment 508, F-91405 Orsay Cedex, France
  • 15. Summary
Today social network sites like Twitter, Facebook, Google Plus and LinkedIn are popular tools for communicating with other people on the internet. Thousands of people share information with each other. This information may be useful for some and worthless for others. If properly analysed, this data could be very useful for some purposes. It may be in the form of opinions or results for others. These social sites can therefore be very effective in generating useful information about many aspects of life today. Little work has been done so far, however, because these social networking sites came into existence only recently. In this paper, the authors describe using Twitter, one of the most popular social networks at present, for sentiment analysis.
Weblink: http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
2.2 Integrated Summary of the literature studied
Sentiment analysis is currently one of the popular topics in research. Work is going on in this area for languages not studied until now, such as Arabic, Hindi and Thai. Open source libraries are available for programming languages such as Python and R, which make it easy to analyse and process text. Sentiment analysis can be used for various purposes, such as reviewing movies, a company's products or the company itself, or gauging the feelings of citizens about their country. The most popular way is to collect this information from social media and analyse it. To turn it into something meaningful, classifier techniques must be used. The data must be in a readable format, in English. Classifiers are used to tokenize or classify the data. Supervised learning is used with the machine learning approach: the classifier learns the sentiments from one part of the text and then analyses the sentiments of the rest. The unsupervised technique is the linguistic approach, in which the text is first split into tokens and the tokens are tagged in order to evaluate the sentiments of the text.
How to get lots of data to evaluate:
    • Social sites
1. Facebook.com
2. Twitter.com
3. LinkedIn.com
    • News websites and comments
    • Movie reviewing sites
    • Product selling sites
1. Flipkart
2. Snapdeal
    • Blogs etc.
Techniques used presently are:
    • Machine learning
1. Naive Bayes classifier
2. Support Vector Machine
3. Decision tree
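The supervised approach listed above learns from labelled examples and then classifies unseen text. As a rough illustration, here is a from-scratch multinomial naive Bayes classifier with add-one smoothing. It is a toy sketch and not the report's method (the project itself uses the linguistic approach); the training sentences and the function names train and predict are made up for the example.

```python
from __future__ import division  # true division on Python 2.7 as well

import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, label). Returns the counts needed to predict."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of training texts
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # add-one (Laplace) smoothing avoids zero probabilities
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, for illustration only.
model = train([
    ("good movie awesome acting", "positive"),
    ("very good story", "positive"),
    ("bad movie boring plot", "negative"),
    ("very bad acting", "negative"),
])
```

With this tiny training set, predict(model, "good story") picks the positive label and predict(model, "bad plot") the negative one. A real classifier would need far more training data.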
  • 16. Text Structure:
    • The text is an array of sentences.
    • Each sentence is split again into words, called tokens.
    • Each word or token is padded with 2 other tags in dictionary format. These added tags let each token be recognized as a verb, noun, adjective, adverb etc., and are used to check whether the token is a polar word or not.
    • Separate datasets are kept so that each token can be matched against the words present in them.
Collecting data is the first concern: useful data is required before any analysis can start. Sentiment analysis is performed on data about a product or review when the user wants to know whether it is good or not. Sentiments can carry various types of polarity or emotion about a particular thing. Summarizing opinions is also a major concern for today's researchers. Summarizing sentiments does not mean printing a subset or one part of the text. It means stating the data with precise sense in fewer words while retaining the subject of the text.
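The text structure described above can be pictured as nested Python data. The sketch below is only one possible layout: the report does not give the exact field names, so the keys word, pos and polarity are assumptions made for illustration. The part-of-speech values are Penn Treebank tags (DT, NN, VBD, JJ), the tag set used by NLTK's default English tagger.

```python
# Hypothetical layout: a list of sentences, each a list of token dictionaries.
# Every token carries two extra tags: a part-of-speech tag and a polarity tag.
tagged_text = [
    [  # first sentence
        {"word": "The", "pos": "DT", "polarity": None},
        {"word": "movie", "pos": "NN", "polarity": None},
        {"word": "was", "pos": "VBD", "polarity": None},
        {"word": "good", "pos": "JJ", "polarity": "positive"},
    ],
    [  # second sentence
        {"word": "The", "pos": "DT", "polarity": None},
        {"word": "ending", "pos": "NN", "polarity": None},
        {"word": "was", "pos": "VBD", "polarity": None},
        {"word": "bad", "pos": "JJ", "polarity": "negative"},
    ],
]

# Collect only the polar words, i.e. the tokens that matched a dataset entry.
polar_words = [
    (token["word"], token["polarity"])
    for sentence in tagged_text
    for token in sentence
    if token["polarity"]
]
```

Tokens whose polarity is None found no match in the datasets and count as neutral.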
  • 17. [Figure] Fig 1: Sentiment classification and analysis. Stages shown: Product Reviews, Sentiment identification, Feature selection, Sentiment classification, Sentiment polarity. Labels: opinionative words or phrases; features.
  • 18. Chapter-3 Analysis, Design and Modeling
3.1 Overall description of the project
3.1.1 Introduction
This software was built on a 64-bit Windows 8 platform with the 32-bit version of Python 2.7. It uses the "nltk" module, which can be downloaded from nltk.org. The input section uses the tkinter module to get the input and to display the output. Tkinter must be installed before the software can run on any system. All the setups listed above are available free of charge.
    • Purpose: This software can be used by anyone who wants to classify movie reviews, product reviews or any other opinions as positive or negative.
    • Scope: The opinions must be in English and use simple words. Other languages are not supported. Sarcasm and negation may not be handled well, so in those cases the result may vary or be unexpected.
    • Product perspective: This software does not depend on any hardware or software other than the resources provided by the system. A Python setup with the nltk and tkinter modules does all the work required.
    • Product functions: This software takes a string typed by the user and produces the sentiment score. The user types a string and waits for the output. Processing time depends on the size of the text typed by the user.
    • User characteristics: The user can be anyone who wants to analyse data by the polarity of its sentences. The software works the same way for every user, and the execution time of text processing depends on the size of the data given to it.
    • Constraints: The user must know English and know how to install the Python setups. Once these are installed, no prerequisite knowledge is required to use the software. The hardware configuration must be met.
    • Assumptions and dependencies: The system must support 32-bit Python 2.7. The code may not work unchanged with Python 3.x, where the Tkinter module was renamed tkinter and some syntax changed. The Windows platform must be XP, Vista, Windows 7 or 8. Memory must be at least 512 MB. The system handles text files.
3.2 Functional Requirements
Sentiment analysis has to be performed on text in English, and it gives one of these outputs:
a. Positive
b. Negative
c. Neutral (zero)
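The three outputs follow directly from the sign of the score. A minimal sketch of that mapping, with the hypothetical function name label_for, could look like this:

```python
def label_for(score):
    """Map a numeric sentiment score to one of the three required outputs."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"
```

For example, a score of 3 maps to "Positive", -1 to "Negative" and 0 to "Neutral".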
  • 19. 3.3 Non Functional Requirements
    • Data selection: Data can be downloaded from the Stanford site, from various user review sites or from social networks. Movie reviews and product reviews must be checked against the separate datasets listed in the database.
    • Accessibility: To access the data listed on nltk, run nltk.download() in IDLE.
    • Documentation: Each file contains comments that explain it.
    • Maintainability: The code does not need maintenance unless it is altered.
    • Portability: The user just needs to run the .py file on any system to analyse reviews/opinions.
    • Reliability: It depends on the language structure of the opinions.
    • Response time: Long reviews take more time to pre-process and tokenize.
3.4 Logical database requirements
For the database, separate files with the extension .yaml are kept with the source code in a separate folder. The .yaml format maps easily onto data structures that are common to many languages, such as arrays and dictionaries. No SQL or other database concepts are used in the project. The dataset files are attached to the source code through their directory/file paths using Python file handling.
3.5 Design Diagrams
    • 3.3.1 Use Case diagrams
[Figure: use case diagram. Actor: user. Steps: 1. Input string; 2. Takes input; 3. Press Enter; 4. Start processing the data, tokenization; 5. Waiting for the output; 6. Output screen appears with the sentiment score. Box label: Background Processing.]
  • 20.     • Class diagrams / Control Flow Diagrams
[Figure: class diagram]
Object class (Python)
Splitter_class: +init(), +split()
Pos_Tagger_Class: +init(), +pos_tag()
Dictionary_tagger_Class: +init(), +tag(), +tag_sentence()
  • 22. Chapter-4 Implementation details and issues
4.1 Implementation details and issues
The implementation is done in Python 2.7 using the nltk and tkinter modules. NLTK is an open source module used here for text processing. It provides many corpora for data analytics. These corpora can be used to build grammars or taggers, which in turn can be applied to the tokens for tagging and for generating well-classified data. To download corpora such as chats, books or novels for data analytics, run nltk.download() in the Python editor. This lets you download the documents required for sentiment analysis, which can then be used after "import nltk". The project uses file handling in Python, so check the paths carefully first. All the files must be in place and their path names must be given to the dictionary_tagger_class. Python 3.x may not be compatible with this code, as many functions, including those of tkinter, changed in the 3.x versions of Python. The code contains 3 classes:
    • splitter_class: splits text into tokens
    • pos_tagger_class: tags the tokens
    • dictionary_tagger_class: turns the tagged tokens into a dictionary data type
4.1.1 Implementation Issues
Finding compatible functions in the nltk module and HTML parsing functions were among the issues with the project. There are many differences between Python 2.7 and 3.x, so keeping the syntax compatible with one version was also an issue. Tkinter also differs between Python 2.7 and Python 3.x because of the syntax changes in Python 3.x.
4.1.2 Algorithms (Module wise, with respect to design)
The first module copies content from the web to download the reviews.
The second module tokenizes the text and converts it into lists of strings.
The third module tags the tokens with accurate tags.
The fourth module handles files, attaching the dataset files to the source code.
The fifth module builds the dictionary of tagged text tokens.
The sixth module displays the text together with its polar words and the result.
Tkinter is used with Python for input and output. It takes the input and passes it to the sentiment analysis code, which after processing returns the sentiment score. The score is then displayed on the screen using Tkinter. Tkinter is a separate module for Python.
The approach is the linguistic approach. The text is first tokenized using the tokenizer function, and tags are then added to the tokens. The tokens are matched against the existing datasets, which are stored separately in .yaml files. If a token is found, the tag attached to it is compared. On the basis of the attached tag, the token is evaluated as positive or negative. If the token is not found in the datasets, it is treated as neutral. Adjectives or adverbs increase the score in the direction of the polarity of the words.
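The scoring rule described above can be sketched in a few lines. This is a simplified stand-in and not the project's code: the small inline dictionaries replace the .yaml dataset files, a regular expression replaces the nltk tokenizer, part-of-speech tagging is skipped, and the rule that an intensifying adverb doubles the next polar word is an assumption about how "increases the score" could be realized. All names and word lists are hypothetical.

```python
import re

# Tiny inline stand-ins for the project's .yaml datasets (hypothetical entries).
POLAR_WORDS = {"good": 1, "awesome": 1, "bad": -1, "boring": -1}
INTENSIFIERS = {"very", "really", "extremely"}

def sentiment_score(text):
    """Sum the polarity of every known polar word in the text.

    A word missing from POLAR_WORDS counts as neutral (0). An intensifier
    directly before a polar word doubles that word's contribution, pushing
    the score further in the word's own direction.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = POLAR_WORDS.get(token, 0)
        if value and i > 0 and tokens[i - 1] in INTENSIFIERS:
            value *= 2
        score += value
    return score

score = sentiment_score(
    "The movie was very good and the actor did an awesome job."
)
```

Here "very good" contributes 2 and "awesome" contributes 1, so the example sentence scores 3. A sentence with no polar words scores 0 and is treated as neutral.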
  • 23. 4.2 Risk Analysis and Mitigation
Risk Id | Description of risk | Risk area | Probability (P) | Impact (I) | RE (P*I) | Risk selected for mitigation (Y/N) | Mitigation plan | Classification
1 | Memory overflow/underflow | Memory | 0.001 | L | 0.001 | Y | Try/except block | Code and unit test
2 | Invalid input (not string) | Conversion of data type problem, too large numbers, passing string of greater size than allowed | 0.3 | L | 0.3 | N | | Code and unit test
7 | Improper use of function (not passing required parameters) | Prototyping | 0.3 | M | 0.9 | N | | Coding implementation
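  • 24. Risk Id | Description of risk | Risk area | Probability (P) | Impact (I) | RE (P*I) | Risk selected for mitigation (Y/N) | Mitigation plan | Classification
3 | Performance | Time of execution | 0.3 | M | 0.9 | N | | Development process
4 | Compiler not working | Compiler problem | 0.001 | L | 0.001 | Y | Re-install/re-open | Environment and test
5 | Code not working | Code altered | 0.3 | M | 0.9 | N | | Engineering specialities
6 | Unwanted output | Code altered | 0.1 | L | 0.1 | N | | Engineering specialities
Interrelationship Graph
[Figure: interrelationship graph. Nodes and weights: Memory (0.001), Code not working (0.9), Performance (0.9), Unwanted output (0.1), Prototyping (0.9), Compiler problem (0.001), Data type/range (0.3).]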
  • 25. S.No | Risk area | # of risk statements | Weights (in+out) | Total weight | Priority
1 | Code altered | 4 | 0.1+0.1+0.1+0.9 | 1.2 | High
2 | Memory | 2 | 0.001+0.3 | 0.301 | Low
3 | Data type/range | 2 | 0.3+0.9 | 1.2 | High
4 | Performance | 1 | 0.1 | 0.1 | Low
5 | Prototyping | 2 | 0.3+0.9 | 1.2 | High
6 | Compiler problem | 1 | 0.9 | 0.9 | Medium
Top risks (the ones with maximum total weight from the graph)
Risk Id | Risk statement | Risk area | Priority of risk area in IG
1 | Code not working/unwanted output | Code altered | 1
Mitigation Approaches
Use a try/except block for invalid input constraints. Make function definitions private. For a compiler problem, re-install or re-open it, or check the Python path in the environment variable. For unwanted output, check the range of the input values or the prototypes of the functions.
Date started | Date to complete | Owner
1-May-2015 | 15-May-2015 | Utkarsh
Additional resources needed for mitigation
Copy the source code for backup.
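The first mitigation, guarding against invalid input with an exception handler, might look like the sketch below. The function names, the error messages and the MAX_CHARS limit are hypothetical and chosen only for illustration. The analyse argument stands for whatever function computes the score.

```python
MAX_CHARS = 100000  # hypothetical upper bound on input size

def read_review(raw):
    """Return the cleaned review text, or raise ValueError if it is unusable."""
    if not isinstance(raw, str):
        raise ValueError("input must be a string")
    text = raw.strip()
    if not text:
        raise ValueError("input is empty")
    if len(text) > MAX_CHARS:
        raise ValueError("input is too long")
    return text

def safe_analyse(raw, analyse):
    """Run analyse on valid input; report invalid input instead of crashing."""
    try:
        return str(analyse(read_review(raw)))
    except ValueError as err:
        return "Invalid input: %s" % err
```

With this guard, passing a number instead of a string yields a readable message rather than a traceback, which corresponds to risk 2 in the table above.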
  • 26. Chapter-5 Testing (Focus on Quality of Robustness and Testing)
5.1 Testing Plan
The source code for sentiment analysis is checked against different reviews taken from different sites. A test file is also maintained for this purpose in a separate folder, and its output is saved as well. The types of testing performed are listed here:
Type of test | Will test be performed? | Comments/explanation and software component
Requirement testing | Yes |
Unit | Yes | Listed in first program source files
Integration | Yes | Linked with source file using file handling; database files
Performance | Yes | Depends on the execution of text input; length of text in tkinter
Stress | Yes | Compiled py files
Compliance | No |
Security | No | Not hidden; dot py file for implementation
Load | No |
Volume | No |
Example test cases | Yes | A number of test cases are written in the main file and added with the datasets; main files and datasets
Compilation | Yes | For syntactical errors; Python source files
Test Team Details
Tester: Utkarsh. Performed all the test cases.
Test Schedule
Activity | Start date | Completion date | Hours | Comments
Obtain input data | 01/05/2015 | 10/05/2015 | 3 hours/day | Input taken from various sources on internet
  • 27. Test region setup | 11/05/2015 | 15/05/2015 | 3 hours/day | Input taken from various sources on internet
TEST ENVIRONMENT: Description of test platforms
Software items: Operating system Windows 8; Notepad; Python editor and compiler; tkinter and nltk
Hardware items: A complete system with pre-installed software for running Python programs, nltk and tkinter modules
5.2 Component decomposition and type of testing required
S.No | List of various components | Type of testing required | Technique of writing test cases
1 | TEST1 | Integration | White box
2 | TEST2 | Performance | Black box
3 | TEST3 | Example test cases | Black box
4 | TEST4 | Compilation | White box
5.3 List all test cases in prescribed format
Test cases for component
Test case Id | Input | Output | Status
TEST1 | Linked with file | Console output score | Pass
TEST2 | Datasets | Console output score | Pass
TEST3 | Numbers | Integral | Fail
TEST3 | String | Score | Pass
  • 28. TEST3 | Review from online site | Score | Pass
TEST4 | Example test cases linked with separate files | Console output | Pass
5.4 Error and Exception Handling (mention debugging techniques with which you have corrected errors)
Test case id | Test case for component | Debugging technique
1 | Tkinter | Print or tracing
2 | Source code | Backtracking
5.5 Limitations of the solution
The source code does not work for the following test cases:
    • Grammatically ill-formed sentences.
    • Sentences containing sarcasm.
    • Negation, which may not be handled well by the source code.
    • Very large text (text files of megabyte size). Python takes a lot of time to process this much data.
    • Jumbling of words in sentences.
  • 29. Chapter-6 Findings & Conclusion
6.1 Findings
The sentiment analysis is efficient for simple English and not for any other language. Sentences must be simple and straightforward, because the approach does not handle cases such as jumbled words or sarcastic sentences. Input can be taken from tkinter in text format and displayed in the same way. The nltk module works really well for natural language processing. It also provides other techniques to classify text, such as a naive Bayes classifier or SVM. NLTK includes different kinds of tagging functions to add tags to tokens.
6.2 Conclusion
The approach used in this project works efficiently with plain English text. It is easy to code and simple to understand, and it does not require the construction of regular expressions. Built-in taggers are available which can be used directly on the texts. To make it more efficient, different techniques can be grouped together. A naive Bayes classifier or SVM can work better for complex sentences.
6.3 Future Work
    • Use other techniques such as machine learning with supervised learning: train on one part of the text and use this training to analyse the rest of the text.
    • Combine different techniques to see the result of a combined approach of algorithms.
    • Extend this work to other languages such as Hindi.
    • Construct a regular grammar to make the tagging part more efficient, generating the project's own regular expressions.
  • 30. References
[1] http://en.wikipedia.org/wiki/sentiment-analysis1
[2] http://inltk.org
[3] http://marl.gi2mo.org/img/class_diagram_v0.2.png
[4] http://www.nltk.org/books
[5] http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
[6] https://wiki.python.org/moin/TkInter
[7] www.tutorialspoint.com/python/python_sending_email.htm
Appendix A. Time Line
Dates: 01-02, 04-03, 20-03, 25-04, 10-05, 25-05, 04-06
Milestones: Synopsis; Study research papers and Implementation; Midterm report; Implementation; Testing; Report
  • 31. Resume
Utkarsh
Date of Birth: 15-08-1993
E-Mail: soniutkarsh@ymail.com
Phone No.: +91-8468088422
Codechef Profile: Utkarsh3587
Interests:
    • Data Structures
    • Algorithms
    • Operating Systems
    • Object Oriented Programming
Education:
    • B.Tech., Computer Science & Engineering, 2015: Jaypee Institute of Information Technology, Noida. 4th year (7th Semester), current CGPA: 6.2/10.
    • Senior Secondary, 2010: Sardar Patel Public Senior Secondary School, Delhi. CBSE with 74.6%.
    • Secondary, 2008: Sardar Patel Public Senior Secondary School, Delhi. CBSE with 83.8%.
Skillset:
Programming Languages: C, C++
Operating Systems: Ubuntu, Windows
Web Technologies: HTML, CSS, JavaScript
Projects:
    • Hybrid Cross Platform Application: This project was done on the PhoneGap platform using web technologies like HTML, CSS and JavaScript. In this project I implemented functionality such as downloading study material, playing quizzes, reading newspapers and a few other functions.
    • Face Recognition Application using OpenCV for Android: This was an Android application project based on image processing using the OpenCV libraries. It detects faces and recognizes them on the basis of stored images.