Sentiment Analysis on Amazon Movie Reviews Dataset

A group project for a web mining class, built on the Amazon movie reviews dataset published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.

Preparing the dataset:
1- Wrote a parser in R to convert the .txt file into CSV
2- Developed a NodeJS middleware to gather information about each movie

Model selection & optimization:
1- Calculated a basic sentiment score for each review
2- Created a word cloud for each movie by combining all of its reviews
3- Calculated a Pointwise Mutual Information (PMI) score for each movie
4- Aggregated all the scores into an overall movie score

Value: If Amazon showed the overall aggregate score and the word cloud alongside its general rating for each product, users would save a lot of time otherwise spent reading through all the reviews, and they could easily picture the overall user sentiment about that product.

SENTIMENT ANALYSIS
AMAZON MOVIE REVIEW DATASET
IS 688 – WEB MINING
INSTRUCTOR: CHRISTOPHER MARKSON
TEAM MEMBERS: Maham | Amit | Mashael | Karan | Nidhish

OUTLINE
• Data Source, Collection & Parsing
• Model Selection & Optimizing Parameters
• Methods / Code Sample
• Results Overview & Value

DATA SOURCE, COLLECTION & PARSING
Amazon movie reviews, published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.

PROBLEMS
• The format was not R-friendly
• Only partial information was available; context data was missing
• We had the reviews but no information about the movies

WORKAROUND / SOLUTION
• Wrote a parser in R to convert the JSON .txt file into CSV
• Developed a NodeJS middleware to gather information about each movie
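A minimal Python sketch of the parsing step follows. The team wrote theirs in R, and the slide does not show the raw layout. The sketch assumes that each review is a block of `key: value` lines ending with a blank line. The field names in `FIELDS` (`product/productId`, `review/userId`, `review/score`, `review/summary`, `review/text`) are assumptions to check against your copy of the file. If the file really stores one JSON object per line, read each line with `json.loads` instead.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO

# Assumed field names; verify them against the raw file before use.
FIELDS = ["product/productId", "review/userId", "review/score", "review/summary", "review/text"]


def parse_reviews(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per review; a blank line ends the current record."""
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if record:
                yield record
                record = {}
            continue
        # Split only on the first ": " so review text that contains colons stays intact.
        key, sep, value = line.partition(": ")
        if sep:
            record[key.strip()] = value.strip()
    if record:  # the last record may not be followed by a blank line
        yield record


def reviews_to_csv(src: TextIO, dst: TextIO, fields: List[str] = FIELDS) -> int:
    """Write parsed reviews as CSV rows; missing fields become empty cells. Returns the row count."""
    writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in parse_reviews(src):
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical two-review sample in the assumed layout.
    sample = (
        "product/productId: B000000001\n"
        "review/score: 5.0\n"
        "review/text: Great film: moving and well acted\n"
        "\n"
        "product/productId: B000000002\n"
        "review/score: 1.0\n"
        "review/text: Dull\n"
    )
    out = io.StringIO()
    print(reviews_to_csv(io.StringIO(sample), out))
    print(out.getvalue())
```

When you write to a real file, open the CSV with `newline=""`, as the `csv` module documentation recommends.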

PREPARED FILES
After parsing the reviews and gathering additional data through Amazon Web Services, we obtained two files:
• Reviews
• Movie Details

MODEL SELECTION & OPTIMIZATION
• Basic sentiment score for each review, using the Syuzhet package
  • Offers several methods, including bing, afinn, nrc and stanford; the AFINN lexicon assigns weights to 2,477 words and phrases
  • Relies mainly on the coreNLP and stringr libraries; can also trace the emotional trajectory of a review
• Word cloud for each movie, using the wordcloud package
  • Combined all reviews of a movie into one variable, calculated term frequencies and generated word cloud images
  • Also used tm (text mining), SnowballC (text stemming) and RColorBrewer (color palettes)
• Pointwise Mutual Information (PMI) sentiment score for each movie, using the RCurl package
  • Wrote our own function
  • Scored Movie_Title vs. Excellent/Poor and Movie_Genre vs. Excellent/Poor
  • Final score was the ratio of the Movie_Title score to the Movie_Genre score
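The first two steps on this slide are a per-review lexicon score and per-movie term frequencies for the word cloud. Both can be sketched in Python as below. This is not the Syuzhet or tm API. `TOY_LEXICON` holds a few hypothetical, illustrative weights in the AFINN style, not the published AFINN values. `STOPWORDS` is a stand-in list, and stemming (done with SnowballC in the original) is omitted.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical toy lexicon in the AFINN style (integer weights; negative = negative sentiment).
# The weights are illustrative only and are not copied from the AFINN list.
TOY_LEXICON: Dict[str, int] = {
    "excellent": 3, "great": 3, "good": 2, "boring": -2, "bad": -3, "awful": -3,
}
# Hypothetical minimal stopword list.
STOPWORDS: Set[str] = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to", "but"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> int:
    """Sum the weights of all tokens found in the lexicon; unknown words add 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def term_frequencies(reviews: Iterable[str], stopwords: Set[str] = STOPWORDS) -> Counter:
    """Combine all reviews of one movie into one text and count the non-stopword terms."""
    combined = " ".join(reviews)
    return Counter(token for token in tokenize(combined) if token not in stopwords)


if __name__ == "__main__":
    reviews = ["Great acting and an excellent plot", "A bad ending, but great music"]
    print([lexicon_score(r) for r in reviews])
    print(term_frequencies(reviews).most_common(3))
```

Summing word weights is the simplest lexicon approach. It ignores negation, so "not good" still scores as positive. The term counts are the input a word cloud renderer needs: more frequent terms are drawn larger.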

MODEL SELECTION & OPTIMIZATION
• Aggregated all the sentiment scores
• Took the median of all user review scores (star ratings) for each movie
• Took the median of all user review-text sentiment scores for each movie
• Assigned an overall sentiment score to each movie, taking the median of:
  • the aggregated user review score,
  • the aggregated review-text sentiment score, and
  • the Movie_Title vs. Genre PMI score
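A minimal Python sketch of this aggregation follows. It assumes that each review has already been reduced to a `(movie_id, star_rating, text_sentiment)` tuple and that each movie has one PMI score. The tuple layout and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def overall_movie_score(star_ratings: List[float], text_scores: List[float], pmi_score: float) -> float:
    """Median of (median star rating, median review-text sentiment, PMI score)."""
    user_review_score_aggr = median(star_ratings)
    review_text_sentiment_aggr = median(text_scores)
    return median([user_review_score_aggr, review_text_sentiment_aggr, pmi_score])


def score_all_movies(
    reviews: Iterable[Tuple[str, float, float]], pmi_by_movie: Dict[str, float]
) -> Dict[str, float]:
    """Group reviews by movie and score every movie that also has a PMI score."""
    stars: Dict[str, List[float]] = defaultdict(list)
    texts: Dict[str, List[float]] = defaultdict(list)
    for movie_id, star_rating, text_sentiment in reviews:
        stars[movie_id].append(star_rating)
        texts[movie_id].append(text_sentiment)
    return {
        movie_id: overall_movie_score(stars[movie_id], texts[movie_id], pmi_by_movie[movie_id])
        for movie_id in stars
        if movie_id in pmi_by_movie
    }
```

The median of three values is simply the middle one, so a single outlying component cannot drag the overall score to an extreme. The slide does not mention rescaling. However, star ratings (1 to 5), summed lexicon scores and PMI ratios sit on different scales, so mapping them to a common range before taking the median may be worth considering.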

METHODS / CODE SAMPLE
Basic Sentiment Score
WordCloud

METHODS / CODE SAMPLE
Aggregation
PMI
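The slides do not give the formula behind the PMI score (Movie_Title and Movie_Genre each scored against "Excellent"/"Poor"). A common formulation for this kind of comparison is Turney-style semantic orientation: SO(t) = PMI(t, "excellent") - PMI(t, "poor"), estimated from search hit counts. If each PMI is written as log2(hits(t, w) * N / (hits(t) * hits(w))), then hits(t) and N cancel in the difference. The Python sketch below assumes that formulation. The team's own function, which used RCurl, may differ. The smoothing constant of 0.01 and all hit counts are assumptions for illustration.

```python
import math


def pmi(joint_hits: float, hits_x: float, hits_y: float, total: float) -> float:
    """PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with each probability estimated as hits / total."""
    return math.log2((joint_hits * total) / (hits_x * hits_y))


def semantic_orientation(
    near_excellent: float, near_poor: float, hits_excellent: float, hits_poor: float, smoothing: float = 0.01
) -> float:
    """PMI(term, "excellent") - PMI(term, "poor"); the term's own count and the total cancel.
    `smoothing` (an assumed constant) avoids log(0) or division by zero for zero co-occurrence counts."""
    return math.log2(
        ((near_excellent + smoothing) * hits_poor) / ((near_poor + smoothing) * hits_excellent)
    )


def title_to_genre_ratio(title_so: float, genre_so: float) -> float:
    """Ratio of the title's orientation to its genre's orientation, as the slide describes."""
    if genre_so == 0:
        raise ValueError("genre orientation is 0, so the ratio is undefined")
    return title_so / genre_so


if __name__ == "__main__":
    # Made-up hit counts, for illustration only.
    title_so = semantic_orientation(near_excellent=120, near_poor=30, hits_excellent=10_000, hits_poor=8_000)
    genre_so = semantic_orientation(near_excellent=900, near_poor=450, hits_excellent=10_000, hits_poor=8_000)
    print(round(title_so, 3), round(genre_so, 3), round(title_to_genre_ratio(title_so, genre_so), 3))
```

Semantic orientation can be negative or close to zero. A ratio of two such values can therefore flip sign or grow very large when the genre score is near 0, and the guard above only catches an exact 0.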

RESULT OVERVIEW & VALUE
The Count of Monte Cristo [Region 2]
Far from Home
Phonics Volume 1

RESULT OVERVIEW & VALUE
• Alongside the aggregate user rating, Amazon could present:
  • an overall sentiment score, and
  • a word cloud specific to that product
• This would save users the time of reading through every review and let them picture the overall user sentiment about that product at a glance.
