A group project done for a web mining class.
Data source: Amazon movie reviews, published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.
Preparing Dataset:
1- Wrote a parser in R to convert the txt file into CSV
2- Developed a NodeJS middleware to gather information about each movie
Model selection & optimization:
1- Calculated basic sentiment score for each review
2- Created a word cloud for each movie by combining all reviews
3- Calculated Pointwise Mutual Information (PMI) score for each movie
4- Aggregated all the scores to get an overall movie score
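Steps 1 and 2 above can be illustrated with a minimal sketch. It is written in Python rather than the R packages the project used, and it is not the team's code: the lexicon weights, the stopword list, and the function names `sentiment_score` and `term_frequencies` are all hypothetical, chosen only to show the idea of a lexicon-based score and of the term counts that feed a word cloud.

```python
import re
from collections import Counter

# Toy lexicon with made-up weights (an assumption, not the AFINN word list).
TOY_LEXICON = {"excellent": 3, "great": 3, "good": 2,
               "boring": -2, "poor": -2, "terrible": -3}

# Tiny illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "and", "was", "but", "is", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon weights of every token; unknown words count as 0."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def term_frequencies(reviews, stopwords=STOPWORDS):
    """Count non-stopword tokens across all reviews of one movie."""
    counts = Counter()
    for review in reviews:
        counts.update(t for t in tokenize(review) if t not in stopwords)
    return counts


reviews = ["Excellent cast and a great story",
           "The plot was boring but the cast was good"]
scores = [sentiment_score(r) for r in reviews]
freqs = term_frequencies(reviews)
print(scores)
print(freqs.most_common(3))
```

The term counts are what a word cloud renderer would scale font sizes by; stemming, which the project did with SnowballC, is left out of this sketch.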
Value:
If Amazon adds the overall aggregate score and word cloud alongside its general rating for each product, users will no longer need to read through all the reviews and can easily picture the overall user sentiment regarding that product.
Sentiment Analysis on Amazon Movie Reviews Dataset
1. SENTIMENT ANALYSIS
AMAZON MOVIE REVIEW DATASET
IS 688 – WEB MINING
INSTRUCTOR: CHRISTOPHER MARKSON
TEAM MEMBERS: Maham | Amit | Mashael | Karan | Nidhish
2. OUTLINE
• Data Source, Collection & Parsing
• Model Selection & Optimizing Parameters
• Methods / Code Sample
• Results Overview & Value
3. DATA SOURCE, COLLECTION & PARSING
Amazon movie reviews, published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.
4. PROBLEMS
• Format was not R-Friendly
• Only partial information was available; data context was missing
• We had reviews but no information about the movies
5. WORKAROUND / SOLUTION
• Wrote a parser in R to convert the JSON txt file into CSV
• Developed a NodeJS middleware to gather information about each movie
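The parsing step can be sketched as follows. This is a Python illustration, not the team's R parser, and it rests on an assumption about the input: each review is a run of `key: value` lines, with a blank line between reviews. The slides do not show the file layout, so the field names and product IDs in the sample are hypothetical.

```python
import csv
import io


def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines.

    Assumes records are separated by blank lines (an assumption about
    the input format, not something the slides confirm).
    """
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            if record:
                yield record
                record = {}
            continue
        # Split on the first ': ' only, so colons inside the value survive.
        key, sep, value = line.partition(": ")
        if sep:
            record[key] = value
    if record:
        yield record


def reviews_to_csv(lines, out, fields):
    """Write the parsed reviews to `out` as CSV with the given columns."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for rec in parse_reviews(lines):
        writer.writerow({f: rec.get(f, "") for f in fields})


# Hypothetical sample input with made-up product IDs.
SAMPLE = """product/productId: B000000001
review/score: 5.0
review/text: Great film, excellent cast.

product/productId: B000000002
review/score: 2.0
review/text: Poor pacing: it dragged.
"""

FIELDS = ["product/productId", "review/score", "review/text"]
buf = io.StringIO()
reviews_to_csv(SAMPLE.splitlines(), buf, FIELDS)
csv_text = buf.getvalue()
print(csv_text)
```

Using `csv.DictWriter` means commas and quotes inside review text are escaped automatically, which is the usual failure point of hand-rolled CSV output.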
6. PREPARED FILES
After parsing and gathering more data using Amazon Web Service, we got the following 2 files:
Reviews & Movie Details
7. MODEL SELECTION & OPTIMIZATION
• Basic Sentiment Score for Each Review, using Syuzhet package
• Provides 4 methods: bing, afinn, nrc, stanford; AFINN is a weighted lexicon of 2477 words and phrases
• Mainly uses the coreNLP and stringr libraries; can trace the emotional trajectory of a review
• Create WordCloud for Each Movie, using wordcloud package
• Combined all reviews into one variable, calculated term frequency & generated WordCloud images
• Used tm (text mining), SnowballC (text stemming), RColorBrewer (color palettes) alongside
• Pointwise Mutual Information (PMI) Sentiment Score for Each Movie, using RCurl package
• Wrote our own function
• Movie_Title vs Excellent/Poor, Movie_Genre vs Excellent/Poor
• Final score was the ratio of Movie_Title / Movie_Genre
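The slides do not give the PMI formula the team wrote, so the sketch below is only one plausible reading. It assumes a Turney-style semantic orientation, log2 of (hits near "excellent" × hits for "poor") over (hits near "poor" × hits for "excellent"), computed once for the movie title and once for the genre, followed by the title/genre ratio described above. The function names, the `smoothing` parameter, and every hit count are hypothetical; in practice the counts would come from search queries (the project used RCurl for its requests).

```python
import math


def pmi_orientation(near_excellent, near_poor,
                    hits_excellent, hits_poor, smoothing=0.01):
    """Turney-style semantic orientation from co-occurrence hit counts.

    `smoothing` is added to the co-occurrence counts to avoid dividing
    by zero; this formula is an assumption, not the team's function.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def title_vs_genre_score(title_so, genre_so):
    """Ratio of the title orientation to the genre orientation."""
    if genre_so == 0:
        raise ValueError("genre orientation is zero; ratio is undefined")
    return title_so / genre_so


# Made-up hit counts, for illustration only.
title_so = pmi_orientation(400, 100, 1000, 1000, smoothing=0)
genre_so = pmi_orientation(200, 100, 1000, 1000, smoothing=0)
pmi_score = title_vs_genre_score(title_so, genre_so)
print(title_so, genre_so, pmi_score)
```

A ratio of two log scores changes sign when either orientation is negative, so a real implementation would need to decide how to treat those cases; the slides do not say how the team handled them.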
8. MODEL SELECTION & OPTIMIZATION
• Aggregated all the Sentiment Scores
• Took the median of all users' review scores
• Took the median of all users' review text sentiment scores
• Assigned an overall Sentiment Score to each movie
• Took the median of:
• User Review Score Aggr,
• User Review Text Sentiment Score Aggr,
• Movie_Title vs Genre PMI Score
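The two-level median described on this slide can be sketched as below. This is a Python illustration with made-up numbers and a hypothetical function name, `overall_movie_score`; it assumes the three components are combined as-is, since the slides do not say whether they were rescaled to a common range first.

```python
from statistics import median


def overall_movie_score(review_scores, text_sentiment_scores, pmi_score):
    """Median of the three per-movie components.

    review_scores: star ratings given by users for one movie
    text_sentiment_scores: sentiment score of each review text
    pmi_score: the title-vs-genre PMI score for the movie
    """
    score_aggr = median(review_scores)
    sentiment_aggr = median(text_sentiment_scores)
    return median([score_aggr, sentiment_aggr, pmi_score])


# Made-up inputs for a single movie.
overall = overall_movie_score(
    review_scores=[5, 4, 3, 5],
    text_sentiment_scores=[2, 6, 0],
    pmi_score=3.0,
)
print(overall)
```

With three inputs the median simply picks the middle value, so a single outlying component cannot dominate the overall score.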
12. RESULT OVERVIEW & VALUE
The Count of Monte Cristo [Region 2]
Far from Home
Phonics Volume 1
13. RESULT OVERVIEW & VALUE
• Alongside aggregate user reviews, Amazon can present
• overall rating score, and
• Word Cloud local to that product
• This will save users the time of reading through all the reviews, and they can easily picture the overall user sentiment regarding that product.