Anime Recommendation
Executive Summary
• Problem Statement
• Business Values
• Project Requirements
Problem Statement
• What rating would a user give an anime if we recommend it to them?
• How popular is each anime, based on its followers?
• How many groups do anime fall into, based on their genres?
• Which anime should be recommended to a user, based on their
preferences?
Business Values
• Able to choose anime that match current viewers
• Able to push advertisements to potential viewers
• Able to upsell similar products for each anime
• Able to accurately predict anime rating and popularity for license
acquisition
Requirements
• Anime data and user rating
• Recommendation algorithm using ALS
• Clustering algorithm using K-Means
• Model evaluation using RMSE
Data
• From https://www.kaggle.com/CooperUnion/anime-recommendations-database
• Contains preference data from 73,516 users on 12,294 anime
• Each user can add anime to their completed list and give it a
rating; this data set is a compilation of those ratings
Data
• 2 files, anime.csv and rating.csv
• Data volume
• 12,294 rows for anime.csv
• 7,813,737 rows for rating.csv
Schema
• anime.csv
• anime_id: myanimelist.net's unique id identifying an anime
• name: full name of anime
• genre: comma separated list of genres for this anime
• type: movie, TV, OVA, etc.
• episodes: how many episodes in this show. (1 if movie)
• rating: average rating out of 10 for this anime
• members: number of community members that are in this anime's "group"
Schema
• rating.csv
• user_id: non identifiable randomly generated user id
• anime_id: the anime that this user has rated
• rating: rating out of 10 this user has assigned (-1 if the user watched it but
didn't assign a rating)
Feature
• Use the original dataset to build the recommendation model
• Extract the unique genres from the genre column in anime.csv to build
the clustering model
Feature
• Recommendation
• anime_id
• rating, also used as target
• user_id
Feature
• Clustering
• anime_id
• Pivoted genres (Action, Adventure, Comedy, Drama, …)
• type
• episodes
• rating
• members
• Use predicted cluster as target
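The "pivoted genres" feature above can be sketched as a one-column-per-genre table. This is a minimal illustration using only the Python standard library; the helper name pivot_genres and the three sample rows are assumptions made for the example, not code or data from the original project.

```python
def pivot_genres(rows):
    """Turn a comma-separated genre string into one 0/1 column per genre."""
    # Split each genre string and drop surrounding whitespace.
    split = [[g.strip() for g in r["genre"].split(",") if g.strip()] for r in rows]
    # The set of unique genres becomes the list of new columns.
    genres = sorted({g for gs in split for g in gs})
    pivoted = []
    for r, gs in zip(rows, split):
        out = {"anime_id": r["anime_id"]}
        for g in genres:
            out[g] = 1 if g in gs else 0
        pivoted.append(out)
    return genres, pivoted

# Made-up rows in the anime.csv layout (only the columns needed here).
sample = [
    {"anime_id": 1, "genre": "Action, Comedy"},
    {"anime_id": 2, "genre": "Drama"},
    {"anime_id": 3, "genre": "Action, Drama"},
]
genres, table = pivot_genres(sample)
print(genres)    # ['Action', 'Comedy', 'Drama']
print(table[0])  # {'anime_id': 1, 'Action': 1, 'Comedy': 1, 'Drama': 0}
```

The remaining clustering features (type, episodes, rating, members) would be appended to each row alongside these genre columns.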
Running Prototyping Experiment
• Get data
• Data pre-processing
• Feature engineering
• Train the model
• Model evaluation
Get Data
• Dataset was downloaded from
https://www.kaggle.com/CooperUnion/anime-recommendations-database
• Data is in comma-separated values (CSV) file format
• See data information in the "Data" section
Data Pre-Processing
• The retrieved data is well-formed
• Some NULL values were found in rating
• Unknown episode counts are represented as "Unknown"
• Rows with NULL and/or Unknown values were filtered out
• About 500 rows were filtered out in total
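The filtering rule above can be sketched as follows. This is a minimal illustration using only the Python standard library; the inline sample rows and the helper name clean_anime_rows are assumptions, since the slides do not show the original pre-processing code.

```python
import csv
import io

# Tiny made-up sample in the anime.csv layout (not real rows from the dataset).
SAMPLE = """anime_id,name,genre,type,episodes,rating,members
1,Example A,"Action, Comedy",TV,24,8.1,1000
2,Example B,Drama,Movie,1,,500
3,Example C,"Action, Drama",TV,Unknown,7.2,800
"""

def clean_anime_rows(text):
    """Keep only rows with no empty (NULL) field and no 'Unknown' episode count."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        has_null = any(value is None or value.strip() == "" for value in row.values())
        if has_null or row["episodes"] == "Unknown":
            continue
        kept.append(row)
    return kept

rows = clean_anime_rows(SAMPLE)
print([r["anime_id"] for r in rows])  # ['1']
```

In the sample, row 2 is dropped for its empty rating and row 3 for its "Unknown" episode count.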
Feature Engineering
• Use original data schema
• Processed only data in rating.csv
• Use anime_id, user_id and rating as features
• rating also used as target
Train the Model
• Processed only data in rating.csv
• Ratio of train-to-test data is 80:20
• Use ALS algorithm to build rating predictive model
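To make the training step concrete, here is a small NumPy sketch of an 80:20 split followed by alternating least squares (ALS) on (user_id, anime_id, rating) triples. It is an illustration under stated assumptions, not the project's implementation: the slides do not say which ALS library was used, and the function names, the rank, the regularization value, and the synthetic ratings are all hypothetical.

```python
import numpy as np

def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle (user, item, rating) triples and split them, 80:20 by default."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ratings))
    n_test = int(len(ratings) * test_fraction)
    test = [ratings[i] for i in order[:n_test]]
    train = [ratings[i] for i in order[n_test:]]
    return train, test

def als(train, n_users, n_items, rank=2, reg=0.1, iterations=20, seed=0):
    """Alternate regularized least-squares solves for user and item factors."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, rank))   # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))   # item factors
    by_user, by_item = {}, {}
    for u, i, r in train:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    ridge = reg * np.eye(rank)
    for _ in range(iterations):
        for u, pairs in by_user.items():      # fix V, solve for each user row
            A = V[[i for i, _r in pairs]]
            y = np.array([r for _i, r in pairs], dtype=float)
            U[u] = np.linalg.solve(A.T @ A + ridge, A.T @ y)
        for i, pairs in by_item.items():      # fix U, solve for each item row
            A = U[[u for u, _r in pairs]]
            y = np.array([r for _u, r in pairs], dtype=float)
            V[i] = np.linalg.solve(A.T @ A + ridge, A.T @ y)
    return U, V

# Synthetic ratings on a 1-10 scale for 4 users and 4 items (made-up data).
a = [1.0, 1.5, 2.0, 2.5]
b = [1.0, 2.0, 3.0, 4.0]
ratings = [(u, i, a[u] * b[i]) for u in range(4) for i in range(4)]

train, test = train_test_split(ratings)
U, V = als(train, n_users=4, n_items=4)
train_rmse = float(np.sqrt(np.mean([(U[u] @ V[i] - r) ** 2 for u, i, r in train])))
print(len(train), len(test), round(train_rmse, 3))
```

A predicted rating for user u and anime i is the dot product U[u] @ V[i]. On the real rating.csv, user_id and anime_id would first need to be mapped to contiguous row indices.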
Model Evaluation
• Data in anime.csv is used to map anime_id to a human-readable
name
• Predicted ratings were of type “floating point”
• Using RMSE as an evaluation method
• Some rows of the test data could not be predicted; the result was "NaN"
• NaN (Not-a-Number) predictions were filtered out before evaluation
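The evaluation rule above, RMSE = sqrt(mean((actual - predicted)^2)) computed only over rows that received a prediction, can be sketched as below. The helper name and sample pairs are illustrative assumptions. One plausible cause of NaN predictions, which the slides do not state, is a test row whose user or anime never appeared in the training split.

```python
import math

def rmse_ignoring_nan(pairs):
    """RMSE over (actual, predicted) pairs, skipping pairs whose prediction is NaN."""
    valid = [(a, p) for a, p in pairs if not math.isnan(p)]
    if not valid:
        return float("nan")   # nothing left to evaluate
    return math.sqrt(sum((a - p) ** 2 for a, p in valid) / len(valid))

# Made-up (actual, predicted) ratings; the third prediction is missing.
pairs = [(8.0, 7.0), (6.0, 8.0), (9.0, float("nan"))]
score = rmse_ignoring_nan(pairs)
print(round(score, 4))  # 1.5811
```

Dropping NaN rows means the reported RMSE only covers the rows the model could score, so the number of dropped rows is worth reporting alongside it.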
Anime Recommendation
Part 2
Contents
• Clustering model with K-Means
• Real-time data processing
• Visualization
Clustering with K-Means
Environment
• CRAN R 3.4.2
• Anime data file (anime.csv)
• Genres distance file (distance.csv)
Build a Clustering Model
• Try building the model with 5 to 10 clusters
• Use distance.csv file to determine the distance
• Visualizing clusters
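Trying 5 to 10 clusters can be sketched with a plain K-Means loop. The original work used R and a genre distance file (distance.csv); this sketch swaps in Python with NumPy and squared Euclidean distance on random stand-in features, so the function name, the data, and the distance measure are all assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, iterations=20, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iterations):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:      # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    # Final assignment and within-cluster sum of squares for the fitted centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(dists.min(axis=1).sum())
    return labels, centroids, inertia

# Random stand-in for 60 anime with 4 numeric features (not real data).
X = np.random.default_rng(42).random((60, 4))

results = {}
for k in range(5, 11):                # 5 to 10 clusters, as in the slide
    labels, centroids, inertia = kmeans(X, k)
    results[k] = (labels, inertia)
    print(k, round(inertia, 3))
```

Comparing the within-cluster sum of squares across the values of k is one common way to pick a cluster count before visualizing the result.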
Discussion
• The distance value can be determined as described in the discussion
"How to produce a pretty plot of the results of k-means cluster analysis?"
(https://stats.stackexchange.com/questions/31083/how-to-produce-a-pretty-plot-of-the-results-of-k-means-cluster-analysis)
• The distance value used for anime clustering should be normalized
• It could be the percentage of running time devoted to each genre
• Example: if action scenes run for 12 minutes of a 24-minute episode,
the distance value for action is 50%
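The normalization proposed above amounts to dividing each genre's screen time by the episode length. A minimal sketch follows; the helper name genre_shares and the Comedy figure are made up for illustration, and only the 12-of-24-minutes action example comes from the slide.

```python
def genre_shares(minutes_by_genre, total_minutes):
    """Express each genre's screen time as a fraction of the episode length."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return {genre: minutes / total_minutes for genre, minutes in minutes_by_genre.items()}

# Action: 12 of 24 minutes (from the slide); Comedy: hypothetical extra genre.
shares = genre_shares({"Action": 12, "Comedy": 6}, total_minutes=24)
print(shares)  # {'Action': 0.5, 'Comedy': 0.25}
```

Because every value falls between 0 and 1, genres become directly comparable as clustering features.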
Real-time Data Processing
Environment
• Web API
• Kafka
• Spark Streaming
Environment
• Architecture diagram (labels only): three Clients, Request, Response, Producer, Consumer
Demonstration
Visualization
Demonstration

 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Anime recommendation (Big Data Certification#6)

  • 2. Executive Summary • Problem Statement • Business Values • Project Requirements
  • 3. Problem Statement • What rating would a user give an anime if we showed it to them? • How popular is each anime, based on its followers? • How many groups do the anime form, based on their genres? • Which anime should be recommended to a user, based on their preferences?
  • 4. Business Values • Able to choose anime that match current viewers • Able to push advertisements to potential viewers • Able to upsell similar products for each anime • Able to accurately predict anime rating and popularity for license acquisition
  • 5. Requirements • Anime data and user rating • Recommendation algorithm using ALS • Clustering algorithm using K-Means • Model evaluation using RMSE
  • 6. Data • From https://www.kaggle.com/CooperUnion/anime-recommendations-database • Contains user preference data from 73,516 users on 12,294 anime • Each user can add anime to their completed list and give it a rating; this data set is a compilation of those ratings
  • 7. Data • 2 files, anime.csv and rating.csv • Data volume • 12,294 rows for anime.csv • 7,813,737 rows for rating.csv
  • 8. Schema • anime.csv • anime_id: myanimelist.net's unique id identifying an anime • name: full name of anime • genre: comma separated list of genres for this anime • type: movie, TV, OVA, etc. • episodes: how many episodes in this show. (1 if movie) • rating: average rating out of 10 for this anime • members: number of community members that are in this anime's "group"
  • 10. Schema • rating.csv • user_id: non-identifiable, randomly generated user id • anime_id: the anime that this user has rated • rating: rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating)
  • 12. Feature • Use the original dataset to build the recommendation model • Extract the unique genres from the genre column in anime.csv to build the clustering model
  • 13. Feature • Recommendation • anime_id • rating, also used as target • user_id
  • 14. Feature • Clustering • anime_id • Pivoted genres (Action, Adventure, Comedy, Drama, …) • type • episodes • rating • members • Use predicted cluster as target
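
The "pivoted genres" feature above can be sketched as follows. This is a minimal illustration, assuming each row is a dict with `anime_id` and a comma-separated `genre` string as in the schema; the function name `pivot_genres` and the two sample rows are made up for the example and are not taken from the deck.

```python
def pivot_genres(rows):
    """Turn each row's comma-separated genre string into 0/1 columns."""
    # Collect the unique genres across all rows; sort for a stable column order.
    genres = sorted(
        {g.strip() for row in rows for g in row["genre"].split(",") if g.strip()}
    )
    pivoted = []
    for row in rows:
        present = {g.strip() for g in row["genre"].split(",")}
        out = {"anime_id": row["anime_id"]}
        for g in genres:
            out[g] = 1 if g in present else 0
        pivoted.append(out)
    return genres, pivoted

# Made-up sample rows in the shape of anime.csv (not real data).
sample = [
    {"anime_id": 1, "genre": "Action, Adventure"},
    {"anime_id": 2, "genre": "Comedy, Drama, Action"},
]
genres, pivoted = pivot_genres(sample)
```

Each anime then has one numeric column per genre, which can be combined with type, episodes, rating and members as clustering input.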
  • 15. Running Prototyping Experiment • Get data • Data pre-processing • Feature engineering • Train the model • Model evaluation
  • 16. Get Data • Dataset was downloaded from https://www.kaggle.com/CooperUnion/anime-recommendations-database • Data is in comma-separated values (CSV) file format • See data information in the “Data” section
  • 17. Data Pre-Processing • The retrieved data are well-formed • Some NULL values were found in rating • Unknown episode counts are represented as “Unknown” • Rows with NULL and/or Unknown values were filtered out • About 500 rows were filtered out in total
  • 18. Feature Engineering • Use the original data schema • Processed only the data in rating.csv • Use anime_id, user_id and rating as features • rating is also used as the target
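
The filtering step above might look like the sketch below. It assumes that a NULL rating shows up as an empty CSV field (the deck does not say how NULL is encoded) and that the rows come from anime.csv, since that is the file with an episodes column. The function name `clean_anime_rows` and the three sample rows are invented for illustration.

```python
import csv
import io

def clean_anime_rows(rows):
    """Keep rows that have a rating and a known episode count."""
    kept = []
    for row in rows:
        rating = (row.get("rating") or "").strip()
        episodes = (row.get("episodes") or "").strip()
        # Assumption: NULL is an empty field; unknown episodes are "Unknown".
        if rating == "" or episodes == "" or episodes == "Unknown":
            continue
        kept.append(row)
    return kept

# Made-up sample in the anime.csv column layout (not real data).
SAMPLE_CSV = """anime_id,name,genre,type,episodes,rating,members
1,Example A,"Action, Drama",TV,24,8.1,1000
2,Example B,Comedy,TV,Unknown,7.0,500
3,Example C,Drama,Movie,1,,200
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_anime_rows(rows)
```

Only the first sample row survives: the second has an unknown episode count and the third has no rating.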
  • 19. Train the Model • Processed only data in rating.csv • Ratio of train-to-test data is 80:20 • Use ALS algorithm to build rating predictive model
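
The deck names ALS and an 80:20 split but does not show training code, and it presumably relied on a library implementation. The block below is only a toy, pure-Python sketch of the same idea on explicit ratings: user and item factor vectors are re-fitted in turn by regularized least squares. All function names, the rank, the regularization value, the iteration count and the four sample ratings are assumptions made for illustration.

```python
import math
import random

def train_test_split(ratings, train_fraction=0.8, seed=0):
    """Shuffle a copy of the ratings and cut it at the train fraction."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def update(fixed, ratings_by_key, rank, reg):
    """Re-fit one side's factors by ridge regression, other side held fixed."""
    out = {}
    for key, pairs in ratings_by_key.items():
        # Normal equations: (sum f f^T + reg * I) x = sum f * rating
        A = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            f = fixed[other]
            for i in range(rank):
                b[i] += f[i] * r
                for j in range(rank):
                    A[i][j] += f[i] * f[j]
        out[key] = solve(A, b)
    return out

def als(ratings, rank=2, reg=0.1, iters=10, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, rank, reg)
        items = update(users, by_item, rank, reg)
    return users, items

def predict(users, items, u, i):
    """Dot product of the two factor vectors; NaN for unseen ids."""
    if u not in users or i not in items:
        return math.nan
    return sum(a * b for a, b in zip(users[u], items[i]))

# Made-up (user_id, anime_id, rating) triples, not taken from rating.csv.
ratings = [(1, 10, 2.0), (1, 20, 4.0), (2, 10, 4.0), (2, 20, 8.0)]
users, items = als(ratings)
```

On the real data one would call `train_test_split` on the rating.csv triples first and fit `als` on the training part only; a user or anime that appears only in the test part then gets a NaN prediction from `predict`.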
  • 20. Model Evaluation • Data in anime.csv is used to map anime_id to a human-readable name • Predicted ratings are floating-point values • RMSE is used as the evaluation metric • Some rows of the test data cannot be predicted and yield “NaN” • NaN (Not-a-Number) predictions were filtered out
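
The evaluation step can be illustrated with a small sketch: drop the NaN predictions, then take the square root of the mean squared error over what remains. In ALS implementations a NaN commonly appears when a test row's user or anime never occurred in the training split, though the deck does not state the cause. The function name `rmse_ignoring_nan` and the sample pairs are assumptions for the example.

```python
import math

def rmse_ignoring_nan(pairs):
    """RMSE over (actual, predicted) pairs, skipping NaN predictions."""
    usable = [(a, p) for a, p in pairs if not math.isnan(p)]
    if not usable:
        return math.nan  # nothing left to score
    return math.sqrt(sum((a - p) ** 2 for a, p in usable) / len(usable))

# Made-up (actual, predicted) pairs; the last prediction is missing.
pairs = [(8.0, 7.0), (6.0, 7.0), (9.0, math.nan)]
score = rmse_ignoring_nan(pairs)
```

Filtering NaN rows means the score only covers user/anime pairs the model could predict, so the share of dropped rows is worth reporting next to the RMSE.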
  • 22. Contents • Clustering model with K-Means • Real-time data processing • Visualization
  • 24. Environment • CRAN R 3.4.2 • Anime data file (anime.csv) • Genres distance file (distance.csv)
  • 25. Build a Clustering Model • Try building with 5 to 10 clusters • Use the distance.csv file to determine the distance • Visualize the clusters
  • 26. Discussion • The distance value can be determined as indicated in the “How to produce a pretty plot of the results of k-means cluster analysis?” discussion (https://stats.stackexchange.com/questions/31083/how-to-produce-a-pretty-plot-of-the-results-of-k-means-cluster-analysis) • The distance value in anime clustering should be a normalized value • It can be the percentage of running time devoted to each genre • Example: action scenes run for 12 minutes out of 24 minutes, so the distance for action is 50%
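
The deck built its clusters in R from a distance.csv file whose layout is not shown. As a stand-in, the sketch below runs a plain k-means with Euclidean distance on made-up 0/1 genre vectors and records the within-cluster sum of squares for k = 5 to 10. The function names, the random sample points and the use of Euclidean distance are all assumptions, not the author's setup.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: assign points to the nearest center, then re-center."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares: lower means tighter clusters.
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss

# Made-up data: 40 anime, each described by six 0/1 genre flags.
rng = random.Random(1)
points = [[rng.randint(0, 1) for _ in range(6)] for _ in range(40)]
results = {k: kmeans(points, k)[2] for k in range(5, 11)}
```

Comparing the values in `results` across k is one simple way to choose among 5 to 10 clusters before plotting them.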
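
The normalization proposed above amounts to dividing the minutes spent on each genre by the episode length. A minimal sketch, assuming per-genre running times were available (the dataset described earlier does not contain them); the function name `genre_shares` and the Comedy figure are made up, while the Action figure follows the 12-of-24-minutes example.

```python
def genre_shares(minutes_by_genre, episode_minutes):
    """Fraction of an episode's running time spent on each genre."""
    if episode_minutes <= 0:
        raise ValueError("episode_minutes must be positive")
    return {g: m / episode_minutes for g, m in minutes_by_genre.items()}

# Action: 12 of 24 minutes, as in the slide; Comedy is an invented value.
shares = genre_shares({"Action": 12, "Comedy": 6}, episode_minutes=24)
```

Here `shares["Action"]` is 0.5, i.e. the 50% from the slide, and every value lies between 0 and 1, so genres are comparable across anime with different episode lengths.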
  • 28. Environment • Web API • Kafka • Spark Streaming