An ongoing Natural Language Processing project (using Python and the NLTK toolkit) that extracts the sentiment of a question and its title on www.stackoverflow.com and determines its polarity. Based on these findings, it checks whether the rules and guidelines that the SO community imposes on its users are actually followed.
2. NLP Definition:
The term Natural Language Processing encompasses a broad set of techniques for the automated generation, manipulation and analysis of natural (human) languages.
So, NLP mainly comprises three things:
• Automated generation of natural languages.
• Text manipulation of natural languages.
• General analysis of natural languages.
3. Diving into the project
This project focuses on the following website:
www.stackoverflow.com
A major question-and-answer forum for developers and programmers.
4. Brilliance of Stack Overflow:
• One can generally expect an answer to a question within 10 to 15 minutes.
• Covers tags ranging from “python” and “java” to fields such as “image processing” and “artificial intelligence”.
5. Downsides of Stack Overflow:
• Has a very strict voting mechanism, which is especially difficult for a beginner to handle.
• Questions and answers written in an odd or unclear way attract down-votes, which also reduce the user's overall reputation.
6. Major reasons behind receiving down-votes:
• Questions showing no research effort.
• Attempts that were made but not mentioned in the question. “What have you tried?” is a very common reply to questions that show no personal effort.
• Questions or answers containing broken links are likely to get down-voted.
7. Major reasons behind receiving down-votes (continued):
• If the title of the question is badly formatted, for example if it starts with “How do I”, the question is likely to receive down-votes.
• Questions whose titles or posts carry negative polarity are unlikely to go viral. We infer this from the Jonah Berger and Katherine L. Milkman paper on viral content on the internet.
8. What is a badly formatted title?
According to the Stack Overflow community, the following are examples of badly formatted titles:
• Titles starting with “How do I” or “How can I”.
• Titles that contain a tag keyword. If the title of a question contains one of its tag words, it is also considered badly formatted.
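The two rules above can be sketched as a small Python check. This is only an illustration, not the project's actual code: the function name `is_badly_formatted`, the tokenising regular expression and the whole-word matching of tags are all assumptions.

```python
import re

def is_badly_formatted(title, tags):
    """Return True if the title breaks either of the two rules listed above.

    `title` is the question title and `tags` is the list of tags attached to
    the question. Both names are hypothetical, chosen for this sketch.
    """
    normalized = title.strip().lower()

    # Rule 1: the title starts with "How do I" or "How can I".
    # \b stops "How do idle threads ..." from matching by accident.
    if re.match(r"how\s+(do|can)\s+i\b", normalized):
        return True

    # Rule 2: the title contains one of the question's tags as a whole word.
    # The character class keeps tags such as "c#", "c++" or ".net" intact;
    # a trailing full stop is treated as punctuation and dropped.
    words = {w.rstrip(".") for w in re.findall(r"[a-z0-9+#.-]+", normalized)}
    return any(tag.lower() in words for tag in tags)
```

For example, `is_badly_formatted("How do I reverse a list?", ["python"])` and `is_badly_formatted("Reversing a list in Python", ["python"])` both return `True`, while `is_badly_formatted("Reversing a list in place", ["python"])` returns `False`. Hyphenated tags such as “image-processing” only match when the title spells them the same way, which a fuller implementation would need to handle.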
9. Major goals of the project
• Do highly rated questions have well-formatted titles?
• Does the sentiment of a question's title play a significant role in the success of the question? Do titles with positive sentiment draw more attention?
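One way to approach the second goal is to label each title with a polarity and compare the average vote score per label. The sketch below is an assumption-laden illustration: the two word lists are tiny hypothetical stand-ins for a real sentiment lexicon, and the function names are invented here. The slides do not say which scorer the project uses; NLTK's VADER analyzer (`nltk.sentiment.vader.SentimentIntensityAnalyzer`) is one candidate that could replace `title_polarity`.

```python
import re

# Hypothetical, deliberately tiny word lists, for illustration only.
POSITIVE_WORDS = {"best", "clean", "efficient", "elegant", "good", "simple"}
NEGATIVE_WORDS = {"bad", "broken", "fails", "ugly", "wrong", "worst"}

def title_polarity(title):
    """Label a title 'positive', 'negative' or 'neutral' by counting words."""
    words = re.findall(r"[a-z']+", title.lower())
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"

def mean_score_by_polarity(questions):
    """Average the vote score of (title, score) pairs within each polarity label."""
    totals = {}  # label -> (number of questions, sum of scores)
    for title, score in questions:
        label = title_polarity(title)
        count, total = totals.get(label, (0, 0))
        totals[label] = (count + 1, total + score)
    return {label: total / count for label, (count, total) in totals.items()}
```

Given a list of `(title, score)` pairs taken from real questions, `mean_score_by_polarity` returns one average per polarity label, which gives a first rough indication of whether positive titles tend to score higher. A word-count scorer like this ignores negation and context, so it should be read as a baseline only.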
10. Major benefits of this project:
• New programmers and developers will be able to spot their mistakes while framing a question or an answer. This will reduce their chances of receiving down-votes and of being blocked by moderators.
• This will also result in neater and cleaner questions, which will make it easier for existing developers to answer them.
11. END
• Anirban Ghosh, Roll – 05, IT Sec – A, 3rd year
• Aryak Sengupta, Roll – 14, IT Sec – A, 3rd year
Mentor
Prof. Tapan Kumar Hazra