My academic major project: Movie Recommendation using Artificial Intelligence. We also developed a website named Movie Engine for recommending movies.
Movie recommendation Engine using Artificial Intelligence
1. Movie Recommendation Engine using Artificial Intelligence
Under the guidance of:
Mrs. G. Sujatha
Asst. Professor, Dept. of CSE, KPRIT
By,
D. Harivamshi -16RA1A0512
U. Laxman -16RA1A0516
G. Vishnu Priya -15RA1A0510
2. Contents
• ABSTRACT
• EXISTING SYSTEM AND DISADVANTAGES
• PROPOSED SYSTEM AND ADVANTAGES
• SYSTEM REQUIREMENTS
• MODULES
• SYSTEM ARCHITECTURE
• UML DIAGRAMS
• INPUTS
• PYTHON LIBRARIES
• IMPLEMENTATION
• SCREENSHOTS
• TEST CASES
• CONCLUSION
• FUTURE ENHANCEMENTS
• BIBLIOGRAPHY
3. Abstract
• Recommender systems generate meaningful recommendations of items or products that might interest a collection of users.
• Movie recommendation is important in our social life because it helps people find better entertainment.
• Although a number of movie recommendation systems have been proposed, most of them either cannot recommend movies to existing users efficiently or cannot recommend movies to a new user at all.
• In this project we propose a movie recommendation system that can recommend movies to new users as well as existing ones.
4. Existing System
• The existing system is based on clustering the data. Clustering is the task of dividing the population or data points into groups such that data points in the same group are more similar to each other than to those in other groups.
• K-Means clustering is a popular algorithm that has been used because the given dataset has no target variable, i.e. the learning is unsupervised. It therefore makes no predictions; instead it forms groups or clusters.
5. Disadvantages:
• Does not work well with large datasets: In large datasets, the cost of calculating the distance between a new point and each existing point is huge, which degrades the performance of the algorithm.
• Does not work well with high dimensions: The K-Means clustering algorithm does not work well with high-dimensional data because, with a large number of dimensions, it becomes difficult for the algorithm to compute meaningful distances.
• Depends on initial values: For a low k, you can mitigate this dependence by running k-means several times with different initial values and picking the best result. As k increases, you need advanced versions of k-means that pick better initial centroids (called k-means seeding).
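The restart-and-seed idea from the last bullet can be sketched with scikit-learn. This is only an illustration on made-up 2-D points, not the existing system's code; the data, the cluster count and the random seed are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three made-up, well-separated 2-D blobs standing in for real data.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

# n_init=10 runs k-means from 10 different starting points and keeps the
# run with the lowest inertia; init="k-means++" is a seeding scheme that
# spreads the initial centroids apart.
model = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

print(model.inertia_)          # within-cluster sum of squared distances
print(model.cluster_centers_)  # one centroid per cluster
```

Raising n_init trades extra computation for a lower chance of ending in a poor local optimum.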
6. Proposed System
• The proposed system is based on classification of the data, that is, grouping the data according to its content.
• We use two methods in our project: content-based filtering and filtering based on user interests. Combining the two is generally called a hybrid filtering technique. These are discussed in more detail in later slides.
Advantages:
• Works on the user's interests and on labelled target data: It recommends only what the user is interested in watching, and the target here is labelled data.
• Prediction is more accurate: Unlike the K-Means clustering algorithm, this approach uses a similarity measure calculated with cosine similarity and gives comparatively more accurate results.
7. System Requirements
Software Requirements:
Operating System: Windows 10 Home
Front-End: HTML, CSS, JS
Back-End: Flask (Python)
Coding Language: Python
Software Environment: Jupyter Notebook, Google Colab
Hardware Requirements:
System: Intel i5 Processor
Hard Disk: 1000 GB
RAM: 8 GB
8. Modules
There are three modules in our system:
1. Input Module
2. Processing Module
3. Output Module
9. System Architecture
[Diagram blocks: Input from users; Processing Module (Libraries, Pre-processing of data, Weighted Rating Calculation, Cosine Similarity Algorithm); Generate Top Movies; Similar Movies Recommended; Recommended movies output]
Fig: Architecture of different modules of movie recommendation system
15. Inputs
We used the dataset obtained from the TMDB API. We use two types of data:
1. Full Dataset.
2. Small Dataset.
16. Python Libraries
Sklearn:
We used this module to vectorise the data and to calculate the cosine similarity.
Matplotlib:
We used this module to plot graphs comparing the different features of the data.
Pandas:
We used this module to modify and filter the dataset according to the requirements.
Numpy:
We used this module to manipulate the data as large arrays and to apply a few mathematical functions required on the data.
17. Implementation
Simpler Recommender
The Simpler Recommender is based on the idea that movies that are more popular and more critically acclaimed have a higher probability of being liked by the average audience.
[Figure: Simpler Recommender using the full dataset, producing Top Rated Movies, Top Genre Movies and Mostly Watched Movies]
• This recommender is common to all users.
18. We used the TMDB ratings to come up with our Top Movies chart.
We used IMDB's weighted rating formula to construct the chart.
Mathematically, it is represented as follows:
Weighted Rating (WR) = (v / (v + m)) * R + (m / (v + m)) * C
where
• v is the number of votes for the movie
• m is the minimum number of votes required to be listed in the chart
• R is the average rating of the movie
• C is the mean vote across the whole report (that is, over all movies in the dataset)
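As a rough sketch of how the weighted rating formula above can be turned into a chart, the following uses plain Python. The movie rows and the threshold m = 100 are made up for illustration, and how the project actually chose m is not stated here.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating for one movie.

    v: number of votes, R: average rating of the movie,
    m: minimum votes required, C: mean rating over all movies.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical rows: (title, vote_count, vote_average).
movies = [
    ("Movie A", 1000, 8.0),
    ("Movie B", 10, 9.5),
    ("Movie C", 500, 7.0),
]

m = 100  # assumed minimum vote count needed to qualify for the chart
C = sum(avg for _, _, avg in movies) / len(movies)  # mean rating over all movies

# Keep only movies with at least m votes, then rank by weighted rating.
chart = sorted(
    ((title, weighted_rating(v, R, m, C)) for title, v, R in movies if v >= m),
    key=lambda row: row[1],
    reverse=True,
)

for title, score in chart:
    print(f"{title}: {score:.2f}")
```

A movie with few votes is pulled towards the overall mean C, so a handful of very high ratings cannot push it to the top of the chart.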
19. Content-Based Recommender
A content-based recommendation system recommends movies based on the content, i.e. the metadata, of the movie dataset we have.
[Figure: Content-Based Recommender using the small dataset; input: a movie title; output: movies similar to the given title]
20. We built two content-based recommenders based on:
• Movie overviews and taglines.
• Movie cast, crew, keywords and genre.
We used cosine similarity to obtain a numeric value for the similarity between two movies.
Mathematically, it is defined as follows:
cosine(x, y) = (x . y^T) / (||x|| * ||y||)
We used the linear_kernel function from the sklearn module, which computes only the dot product x . y^T; when the vectors are already normalised to unit length this equals the cosine similarity and is much faster to compute.
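A minimal sketch of an overview-based recommender along these lines is shown below. It assumes TF-IDF vectorisation (the slides only say the data was vectorised with sklearn), and the toy titles, overviews and the get_recommendations signature are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy data standing in for the small dataset (titles and overviews are made up).
movies = pd.DataFrame({
    "title": ["Space War", "Galaxy Battle", "Love in Paris", "Paris Romance"],
    "overview": [
        "rebels fight an empire in a space war among the stars",
        "a space fleet fights a battle across the galaxy and stars",
        "two strangers fall in love during a summer in paris",
        "a romance blooms between two artists in paris",
    ],
})

# TfidfVectorizer L2-normalises each row by default, so the plain dot
# product computed by linear_kernel equals the cosine similarity.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["overview"])
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Map lower-cased titles to row positions so that lookups ignore case.
indices = pd.Series(movies.index, index=movies["title"].str.lower())


def get_recommendations(title, top_n=10):
    """Return the titles most similar to `title` as a pandas Series."""
    idx = indices[title.lower()]
    # Similarity of the chosen movie to every other movie (itself excluded).
    scores = pd.Series(cosine_sim[idx], index=movies.index).drop(idx)
    best = scores.sort_values(ascending=False).head(top_n)
    return movies["title"].loc[best.index]


print(get_recommendations("SPACE WAR", top_n=2))
```

Lower-casing the titles in the lookup index is one simple way to make the title parameter case-insensitive.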
22. Figure: Screenshot of the output showing the top 15 movies ranked by the weighted rating calculation.
23. Figure: Screenshot of the output showing the top movies for a particular genre chosen by the user.
24. Figure: Screenshot of the output showing the top 10 movies ranked for a particular input from the user.
25. Test Cases
A test case is a set of conditions or variables under which a tester determines whether a system under test satisfies requirements or works correctly. The process of developing test cases can also help find problems in the requirements or design of an application.
26. Test results based on different cases:
Test Case | Description | Output | Result
Return type of weighted_rating function | weighted_rating function should return float type | Returned the required type | Pass
Value of weighted_rating function | Value returned by weighted_rating should be accurate | Returned the accurate value | Pass
Return type of get_recommendations | get_recommendations function should return series type | Returned the series type | Pass
get_recommendations title parameter is not case sensitive | get_recommendations should return the value whether the title parameter is passed in lower or upper case | Returned the value for both upper and lower case of title | Pass
Value of get_recommendations | Value returned by get_recommendations should be according to the cosine similarity measure | Returned the value exactly | Pass
27. Conclusion
Recommender systems are a powerful new technology for extracting additional value for a business from its user databases. These systems help users find items they want to buy from a business. Recommender systems benefit users by enabling them to find items they are actually interested in.
Recommender systems are being stressed by the huge volume of user data in existing corporate databases, and will be stressed even more by the increasing volume of user data available on the Web.
Using content-based recommendation, we were able to recommend movies that users might be interested in.
28. Future Enhancements
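Cosine similarity does not work well when we do not have enough ratings for a movie or when a user's rating for some movie is exceptionally high or low. As an improvement on this project, other methods such as adjusted cosine similarity can be used to compute similarity.
In equation form, the adjusted cosine similarity between items i and j is expressed as:
sim(i, j) = sum_u (R_u,i - Rbar_u) * (R_u,j - Rbar_u) / ( sqrt(sum_u (R_u,i - Rbar_u)^2) * sqrt(sum_u (R_u,j - Rbar_u)^2) )
where R_u,i is the rating of user u for item i, Rbar_u is the average rating given by user u, and the sums run over the users who rated both items.
The main advantage of this approach is that, in item-based collaborative filtering, the item vectors consist of ratings from different users who often have varying rating scales; subtracting each user's average rating removes those differences.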
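The adjusted cosine similarity proposed above could be sketched in plain Python as follows. The nested-dictionary layout, the function name adjusted_cosine and the sample ratings are assumptions made for illustration; this is not part of the current project.

```python
from math import sqrt


def adjusted_cosine(ratings, item_i, item_j):
    """Adjusted cosine similarity between two items.

    ratings: {user: {item: rating}}. Each user's mean rating is subtracted
    before comparing, and only users who rated both items are counted.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[item_i] - mean
            dj = user_ratings[item_j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-rating users, or no deviation from the user means
    return num / (sqrt(den_i) * sqrt(den_j))


# Made-up ratings: u1 uses the whole 1-5 scale, u2 rates everything low.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 2, "B": 2, "C": 1},
}

print(adjusted_cosine(ratings, "A", "B"))  # positive: both above each user's mean
print(adjusted_cosine(ratings, "A", "C"))  # negative: C is below each user's mean
```

Because each user's mean is subtracted first, a strict rater and a generous rater who rank the movies the same way contribute in the same direction.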
29. Bibliography
Google developers docs: https://developers.google.com/machine-learning/recommendation/
Matt Harrison, Michael Prentiss - Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization
Alberto Cairo - The Functional Art: An Introduction to Information Graphics and Visualization - New Riders (2013)
Andreas C. Müller, Sarah Guido - Introduction to Machine Learning with Python: A Guide for Data Scientists - O'Reilly Media (2016)
Building a Movie Recommendation Engine in Python using Scikit-Learn, Medium by Heroku.
Movie Dataset from Kaggle, www.Kaggle.com